Towards beating Factorio autonomously
I’ve recently become enchanted with the idea of beating Factorio autonomously.
Besides tool-assisted speedruns (which script the entire game on a known map layout) there has not yet been much progress in autonomous play for Factorio. The leading agent-based system has successfully produced moderate-complexity items, but only in a ‘lab-play’ environment which removes many constraints of the full game.
As far as I can tell, the title of ‘first autonomous rocket launch’ (even for liberal definitions of autonomous) is still up for grabs.
As a target for autonomous play, Factorio presents significant challenges:
- Building and iterating on a factory requires solving complex spatial reasoning problems
- Game progression relies on the repeated composition of sub-goals to achieve higher-level tasks
- Resources are produced asynchronously which obfuscates factory state, throughput, and bottlenecks
- The game environment evolves dynamically through resource depletion, pollution, and biters
Each of these is compounded by Factorio’s scale: the conceptual, spatial, and temporal challenges are very different for a ten-entity factory and a thousand-entity factory.
Ways to play
For practicality I’m treating ‘launching a rocket’ as ‘beating the game’ and ignoring the Space Age expansion.
Up top, this is a hard game and I’m not sure today’s frontier models are cut out for it. The combination of long-horizon planning, spatial reasoning, and reflexes needed to beat the vanilla game may be too much for a system without significant scripted assistance (what exactly counts as ‘cheating’ will be discussed below).
Regardless, the full experience is not the place to start. I’m imagining a sort of ‘difficulty ladder’, which progressively backs off Factorio’s most challenging elements without completely giving the game away:
- Biters: The unpredictability and speed of the biters make them the biggest obstacle in beating the full game. Strategies like pollution management and nest elimination are very indirect and hard to autonomously develop without ‘hints’.
- Water and cliffs: Landscape obstacles complicate the spatial reasoning problems of factory design. While a small amount of water is necessary for steam power, an ‘easier’ map layout with a large, contiguous buildable area will help to lower the difficulty.
- Resource depletion: Initial nearby resource patches are usually exhausted in the midgame, requiring exploration and (typically) train infrastructure to continue. This is one of the longer asynchronous cycles in the game environment and isn’t worth contending with until an ‘infinite patches’ environment is mastered.
- Power: Designing and scaling the factory’s power system is a core challenge in Factorio. I think ‘automatic’ power could be useful while pursuing spatial design capability which would hopefully make the power ‘dimension’ easier to implement once added.
Similarly, I can imagine a range of system designs with varying legitimacy for the ‘autonomous rocket’ goal:
- Unlimited scripting: A small deterministic program combined with pre-made blueprints for every stage of development is probably sophisticated enough to succeed on a random-seed map (subject to some of the environmental ‘aids’ above). This is close enough to a tool-assisted speedrun as to be uninteresting.
- No blueprints: Almost anything goes, but no providing the system with pre-made layouts. Textual game knowledge, fine tuning, and some spatial tools (e.g. connectEntities(a,b)) are all allowed. To me this is still not ‘autonomous’, but would be fun to make work.
- No spatial tools: All entity placement must be done directly by the system or using programs that the system creates. Beating the vanilla game in this manner would be autonomous victory.
- Self-learning only: No knowledge base, knowledge graph, or game-specific info provided. To the extent possible, the system must learn everything about the game itself.
- Desktop interface only: The system interacts with the game as a human would, receiving frames and responding with mouse clicks and key presses. If this is ever feasible then we have bigger fish to fry.
Each of these ladders becomes an axis on my Factorio Automation Matrix™:
Automatic power and unlimited resources help to simplify the problem space and focus on the core challenge of the game: scaling the chained production of resources. If that goes well then I think launching a rocket with no spatial tools would be a respectable goal, even with unlimited resources. If spatial progress is slower then allowing some pathing tools might be an acceptable compromise in the context of a more rigorous game environment.
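To make ‘pathing tools’ concrete, here is a minimal sketch of the kind of helper that might sit behind something like connectEntities(a,b). Everything in it is an assumption for illustration: the function name `connect_tiles`, the tile-grid representation, and the choice of breadth-first search are mine, not the implementation of any existing tool. A real belt router would also have to handle belt direction, underground belts, and entity footprints.

```python
from collections import deque
from typing import Optional

Cell = tuple[int, int]  # (x, y) tile coordinates; hypothetical representation


def connect_tiles(start: Cell, goal: Cell, blocked: set[Cell],
                  width: int, height: int) -> Optional[list[Cell]]:
    """Return a shortest 4-connected tile path from start to goal, or None."""
    if start in blocked or goal in blocked:
        return None
    came_from: dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_bounds = 0 <= nx < width and 0 <= ny < height
            if in_bounds and nxt not in blocked and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None  # goal is walled off


# Usage: route around a two-tile wall on a 5x3 grid.
wall = {(2, 0), (2, 1)}
route = connect_tiles((0, 0), (4, 0), wall, width=5, height=3)
```

Handing the system a tool like this removes exactly the low-level placement work that the ‘no spatial tools’ tier would require it to do itself, which is why it sits lower on the legitimacy ladder.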
Self-learning only makes sense once a particular system design has proven effective; that would be more about re-deriving the game knowledge from first principles. Unfortunately, the multimodal cases are non-starters since I don’t have $10k to blow on tokens.
As for the vanilla game, I struggle to imagine a solution to biter defense that is both effective and satisfying. If the system is manipulated into pre-addressing attacks with overwhelming defense then that feels too scripting-adjacent. On the other hand, the long-term planning and short-term reacting needed to intelligently fight the biters the way a human would seem hard to develop organically.
For me, the north star is a biter-free rocket launch with no pre-built spatial solutions. Providing unstructured game knowledge seems fair (who hasn’t scrolled the wiki?) and removing it can be saved until a working solution is being generalized.
Literature review
Most of the prior art in autonomous Factorio play has been done by the team behind Factorio Learning Environment (FLE). As of v0.3, their best ‘lab-play’ result (where a model must reach a throughput target using provided materials in a small ‘lab’ map) is Claude Opus 4.1 achieving >16 military science packs per minute. The best ‘open-play’ (vanilla game) result from the original FLE paper is Claude Sonnet 3.5 constructing logistic science packs (one level below military) in under 5k steps.
Besides advancing the state-of-the-art in autonomous play, FLE has also contributed critical infrastructure to the community:
- A CLI for starting and managing multiple headless Factorio servers on a single machine, enabling rapid testing
- A Lua server which communicates with the Factorio game via RCON over TCP and exposes an API for interacting with the game state
These two components can be combined with any client implementation, enabling quick testing of a wide array of game-playing systems!
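For a sense of what ‘RCON over TCP’ means on the wire, here is a minimal sketch of packet framing. It assumes Factorio’s RCON follows the Source RCON packet layout (little-endian size, request id, and type, followed by a null-terminated body and one more null byte). The helper names are hypothetical and this is not FLE’s client code; the command string is just a placeholder payload.

```python
import struct

# Packet types as defined by the Source RCON protocol (assumed to apply here).
SERVERDATA_AUTH = 3
SERVERDATA_EXECCOMMAND = 2
SERVERDATA_RESPONSE_VALUE = 0


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Frame one RCON packet: size prefix, id, type, body, two null bytes."""
    payload = (struct.pack("<ii", request_id, packet_type)
               + body.encode("utf-8") + b"\x00\x00")
    # The size field counts everything after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    """Parse one complete RCON packet into (request_id, type, body)."""
    (size,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    # Body sits between the 12-byte header and the two trailing nulls.
    body = data[12:4 + size - 2].decode("utf-8")
    return request_id, packet_type, body


# Usage: round-trip a command packet without touching the network.
packet = encode_packet(7, SERVERDATA_EXECCOMMAND, "/help")
decoded = decode_packet(packet)
```

A real client would first send a `SERVERDATA_AUTH` packet carrying the RCON password over a TCP socket, then send command packets and read replies by the same framing.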
Beyond FLE, most Factorio-related research (1, 2) focuses on pure math applications for transport belt and assembler layouts. For additional inspiration we must turn our focus to other games.
Research in Minecraft is more substantial, with projects like VistaWise and Ghost in the Minecraft finding success in hierarchical goal decomposition: to mine diamond you need an iron pickaxe, which requires smelting, which requires iron ore, etc.
Unfortunately, these systems aren’t much help in the spatial reasoning area. The typical ‘mine diamond’ objective for autonomous Minecraft play isn’t much of a spatial challenge: systems can walk and dig randomly until they find the ore they need, and vertical navigation is often delegated to a tool that digs up or down to a specified level.
MineAnyBuild is the only Minecraft project I found that directly assesses spatial reasoning by asking models to generate blueprints for structures like a house or the Olympic rings. Unfortunately, their conclusion is that proprietary models suck and open models are even worse:
For some low-parameter MLLM-based agents (e.g. InternVL2.5-2B/4B), they tend to understand the basic elements in the given instruction or image, and offer a simple or detailed plan for constructing. However, they often encounter difficulties when generating the executable spatial plan and cannot convert their planning into an executable 3D matrix, thus leading to scores under 0.4 as shown in Table 1.
For some large-parameter models, they can usually understand the block materials partly compared to low-parameter ones, and can actively select some diverse blocks to build structures. Nevertheless, they often fail to understand the correlations between various combinations of block materials, resulting in a faulty completion of the final building and thus poor scores (from 0.63 to 1.29 in Table 1) by critic model.
This is good to keep in mind as we consider how a system might express its factory layouts.
I also enjoyed this blog and associated video on beating Pokemon Red with reinforcement learning. There’s probably room for some RL in a Factorio solution, but we’ll get to that.
Design ideas
Factorio is a Russian nesting doll of objectives, a high-dimensional Tower of Hanoi puzzle: you need ore to make plates to make gears to make inserters to make packs, and at every level you’re laying out transport belts, redirecting power, and desperately scaling your existing production to supply materials.
The high-level goal decomposition is the easiest part of this: today’s models (even small ones) can do it quickly and accurately. The rub is in the implementation.
Some questions and ideas:
- How do we convert an unstructured, abstract factory description to a sequence of API calls?
- Can an intermediate DAG serve as a bridge between unstructured and structured factory designs?
- Can RL solve the lowest-level placement problems that LLMs fumble?
- Can asynchronous production be accounted for by treating rates/throughput as if they were inventory items?
- Is incrementality valuable, or do we allow the system to rebuild the factory from scratch at every stage?
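Two of these ideas, the intermediate DAG and rates-as-inventory, can be sketched together. The snippet below is one possible reading, not a settled design: the recipe table is a hand-written illustration (ratios written from memory, per unit of output, so treat them as placeholders), and `required_rates` is a hypothetical helper. It walks the recipe DAG from consumers to ingredients and fills a dictionary of per-minute rates that can be read like an inventory.

```python
from graphlib import TopologicalSorter

# Illustrative recipe table: item -> {ingredient: units needed per unit of output}.
# Items that never appear as a key are treated as raw inputs.
RECIPES = {
    "logistic_pack": {"inserter": 1.0, "belt": 1.0},
    "inserter": {"gear": 1.0, "iron_plate": 1.0, "circuit": 1.0},
    "belt": {"gear": 0.5, "iron_plate": 0.5},
    "gear": {"iron_plate": 2.0},
    "circuit": {"iron_plate": 1.0, "copper_cable": 3.0},
    "copper_cable": {"copper_plate": 0.5},
}


def required_rates(target: str, rate_per_min: float) -> dict[str, float]:
    """Propagate a target output rate down the recipe DAG."""
    # TopologicalSorter takes node -> predecessors. Listing each item's
    # consumers as its predecessors makes consumers come out first, so an
    # item's total demand is final before it is passed to its ingredients.
    consumers: dict[str, set[str]] = {}
    for product, ingredients in RECIPES.items():
        consumers.setdefault(product, set())
        for ingredient in ingredients:
            consumers.setdefault(ingredient, set()).add(product)

    rates = {item: 0.0 for item in consumers}
    rates[target] = rate_per_min
    for item in TopologicalSorter(consumers).static_order():
        for ingredient, qty in RECIPES.get(item, {}).items():
            rates[ingredient] += rates[item] * qty
    return rates


# Usage: demand implied by 10 packs per minute.
rates = required_rates("logistic_pack", 10.0)
```

Because gears and iron plates are shared by several consumers, a plain tree expansion would need to merge duplicates afterwards. The topological pass handles that directly, and the resulting `rates` dictionary gives the system a single ‘inventory’ of throughput targets to compare against what the factory actually produces.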
These questions and more will need to be answered – onward!