Towards beating Factorio autonomously

I’ve recently become enchanted with the idea of beating Factorio autonomously.

Besides tool-assisted speedruns (which script the entire game on a known map layout) there has not yet been much progress in autonomous play for Factorio. The leading agent-based system has successfully produced moderate-complexity items, but only in a ‘lab-play’ environment which removes many constraints of the full game.

As far as I can tell, the title of ‘first autonomous rocket launch’ (even for liberal definitions of autonomous) is still up for grabs.

As a target for autonomous play, Factorio presents significant challenges:

Each of these is compounded by Factorio’s scale: the conceptual, spatial, and temporal challenges are very different for a ten-entity factory and a thousand-entity factory.

Ways to play

For practicality I’m treating ‘launching a rocket’ as ‘beating the game’ and ignoring the Space Age expansion.

Up front: this is a hard game and I’m not sure today’s frontier models are cut out for it. The combination of long-horizon planning, spatial reasoning, and reflexes needed to beat the vanilla game may be too much for a system without significant scripted assistance (what exactly counts as ‘cheating’ will be discussed below).

Regardless, the full experience is not the place to start. I’m imagining a sort of ‘difficulty ladder’, which progressively backs off Factorio’s most challenging elements without completely giving the game away:

Similarly, I can imagine a range of system designs with varying legitimacy for the ‘autonomous rocket’ goal:

Each of these ladders becomes an axis on my Factorio Automation Matrix™:

Automatic power and unlimited resources help to simplify the problem space and focus on the core challenge of the game: scaling the chained production of resources. If that goes well then I think launching a rocket with no spatial tools would be a respectable goal, even with unlimited resources. If spatial progress is slower then allowing some pathing tools might be an acceptable compromise in the context of a more rigorous game environment.

Self-learning only makes sense once a particular system design has proven effective; that would be more about re-deriving the game knowledge from first principles. Unfortunately, the multimodal cases are non-starters since I don’t have $10k to blow on tokens.

As for the vanilla game, I struggle to imagine a solution to biter defense that is both effective and satisfying. If the system is manipulated into pre-addressing attacks with overwhelming defense then that feels too scripting-adjacent. On the other hand, the long-term planning and short-term reacting needed to intelligently fight the biters the way a human would seems hard to develop organically.

For me, the north star is a biter-free rocket launch with no pre-built spatial solutions. Providing unstructured game knowledge seems fair (who hasn’t scrolled the wiki?) and removing it can be saved until a working solution is being generalized.

Literature review

Most of the prior art in autonomous Factorio play has been done by the team behind Factorio Learning Environment (FLE). As of v0.3, their best ‘lab-play’ result (where a model must reach a throughput target using provided materials in a small ‘lab’ map) is Claude Opus 4.1 achieving >16 military science packs per minute. The best ‘open-play’ (vanilla game) result from the original FLE paper is Claude Sonnet 3.5 constructing logistic science packs (one level below military) in under 5k steps.
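To get a feel for what a throughput target like this means in factory terms, here is a rough back-of-the-envelope sketch. The recipe figures (10 seconds per craft, 2 packs per craft) and the crafting speeds (0.5 and 0.75) are my recollection of the base game, not something taken from FLE, so treat them as assumptions and check them against the wiki. The `machines_needed` helper is hypothetical.

```python
import math
from fractions import Fraction


def machines_needed(target_per_min, craft_seconds, outputs_per_craft, crafting_speed):
    """Return how many machines are needed to hit a per-minute output target.

    All recipe numbers passed in are assumptions supplied by the caller.
    Fractions are used so that exact multiples do not round up by accident.
    """
    speed = Fraction(str(crafting_speed))
    crafts_per_min = Fraction(60) / Fraction(str(craft_seconds)) * speed
    per_machine = crafts_per_min * outputs_per_craft
    return math.ceil(Fraction(str(target_per_min)) / per_machine)


# Assumed military science pack recipe: 10 s per craft, 2 packs per craft.
slow = machines_needed(16, 10, 2, 0.5)   # assumed speed of the first-tier assembler
fast = machines_needed(16, 10, 2, 0.75)  # assumed speed of the second-tier assembler
print(slow, fast)
```

The final assemblers are the small part of the problem; the same calculation has to be repeated for every intermediate item feeding them, which is where the scale comes from.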

Besides advancing the state-of-the-art in autonomous play, FLE has also contributed critical infrastructure to the community:

These two components can be combined with any client implementation, enabling quick testing of a wide array of game-playing systems!

Beyond FLE, most Factorio-related research (1, 2) focuses on pure math applications for transport belt and assembler layouts. For additional inspiration we must turn our focus to other games.

Research in Minecraft is more substantial, with projects like VistaWise and Ghost in the Minecraft finding success in hierarchical goal decomposition: to mine diamond you need an iron pickaxe, which requires smelting, which requires iron ore, etc.

Unfortunately, these systems aren’t much help in the spatial reasoning area. The typical ‘mine diamond’ objective for autonomous Minecraft play isn’t much of a spatial challenge: systems can walk and dig randomly until they find the ore they need, and vertical navigation is often delegated to a tool that digs up or down to a specified level.

MineAnyBuild is the only Minecraft project I found that directly assesses spatial reasoning by asking models to generate blueprints for structures like a house or the Olympic rings. Unfortunately, their conclusion is that proprietary models suck and open models are even worse:

For some low-parameter MLLM-based agents (e.g. InternVL2.5-2B/4B), they tend to understand the basic elements in the given instruction or image, and offer a simple or detailed plan for constructing. However, they often encounter difficulties when generating the executable spatial plan and cannot convert their planning into an executable 3D matrix, thus leading to scores under 0.4 as shown in Table 1.

For some large-parameter models, they can usually understand the block materials partly compared to low-parameter ones, and can actively select some diverse blocks to build structures. Nevertheless, they often fail to understand the correlations between various combinations of block materials, resulting in a faulty completion of the final building and thus poor scores (from 0.63 to 1.29 in Table 1) by critic model.

This is good to keep in mind as we consider how a system might express its factory layouts.
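As one hypothetical example of a layout format that is easier to check than a raw matrix, a system could emit a list of entity placements and have a validator reject overlaps before anything touches the game. This is only a sketch: the `Placement` type, the `find_collisions` helper, and the footprints used below (3x3 assembler, 1x1 belt and inserter) are my own assumptions for illustration, not part of MineAnyBuild or FLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A hypothetical entity placement: top-left tile plus footprint size."""
    name: str
    x: int
    y: int
    width: int = 1
    height: int = 1

    def tiles(self):
        """Return the set of (x, y) tiles this entity covers."""
        return {
            (self.x + dx, self.y + dy)
            for dx in range(self.width)
            for dy in range(self.height)
        }


def find_collisions(placements):
    """Return (first_name, second_name, tile) for every overlapping tile."""
    occupied = {}
    collisions = []
    for placement in placements:
        for tile in sorted(placement.tiles()):
            if tile in occupied:
                collisions.append((occupied[tile].name, placement.name, tile))
            else:
                occupied[tile] = placement
    return collisions


layout = [
    Placement("assembler", 0, 0, width=3, height=3),
    Placement("belt", 4, 1),
    Placement("inserter", 2, 1),  # mistake: sits inside the assembler footprint
]
print(find_collisions(layout))
```

A validator like this gives the model a concrete, local error to fix (which entity, which tile) instead of a failed build with no explanation.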

I also enjoyed this blog and associated video on beating Pokemon Red with reinforcement learning. There’s probably room for some RL in a Factorio solution, but we’ll get to that.

Design ideas

Factorio is a Russian nesting doll of objectives, a high-dimensional Tower of Hanoi puzzle: you need ore to make plates to make gears to make inserters to make packs, and at every level you’re laying out transport belts, redirecting power, and desperately scaling your existing production to supply materials.

The high-level goal decomposition is the easiest part of this: today’s models (even small ones) can do it quickly and accurately. The rub is in the implementation.
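To show how mechanical the decomposition step is, here is a minimal sketch that recursively expands an item into its raw inputs. The recipe table is a small hand-written subset that I believe matches the base game, but it is an assumption and should be verified against the wiki; `RECIPES` and `raw_requirements` are hypothetical names.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed recipe subset: item -> (outputs per craft, {ingredient: count}).
# Anything without an entry is treated as a raw input.
RECIPES = {
    "logistic-science-pack": (1, {"inserter": 1, "transport-belt": 1}),
    "inserter": (1, {"electronic-circuit": 1, "iron-gear-wheel": 1, "iron-plate": 1}),
    "transport-belt": (2, {"iron-gear-wheel": 1, "iron-plate": 1}),
    "electronic-circuit": (1, {"copper-cable": 3, "iron-plate": 1}),
    "copper-cable": (2, {"copper-plate": 1}),
    "iron-gear-wheel": (1, {"iron-plate": 2}),
}


def raw_requirements(item, amount=Fraction(1), totals=None):
    """Accumulate the raw inputs needed to produce `amount` of `item`."""
    if totals is None:
        totals = defaultdict(Fraction)
    if item not in RECIPES:
        totals[item] += amount  # no recipe: count it as a raw input
        return totals
    outputs_per_craft, ingredients = RECIPES[item]
    crafts = Fraction(amount) / outputs_per_craft
    for ingredient, count in ingredients.items():
        raw_requirements(ingredient, crafts * count, totals)
    return totals


for name, qty in sorted(raw_requirements("logistic-science-pack").items()):
    print(name, float(qty))
```

Under these assumed recipes, one logistic science pack comes out to 5.5 iron plates and 1.5 copper plates. A few lines of recursion answer the "what do I need" question; nothing here says where any of it goes on the map.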

Some questions and ideas:

These questions and more will need to be answered – onward!