Faketorio: an agent testbed

Faketorio (GitHub) is a toy agent sandbox that I built this weekend with Codex. As the name suggests, it is based on the very early stages of Factorio. The world is constrained to a 50x50 grid containing water, iron, stone, and coal, and the game supports burner drills, burner inserters, transport belts, stone furnaces, and stone chests.

The game is written in vanilla TypeScript and runs in the browser, exposing an API for manipulating the player and the entities in the world. Alongside the game, a simple sidecar Node.js server can play the game autonomously using a selected OpenAI model together with a built-in prompt and set of tools.
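To give a rough feel for what a tool-facing game API like this can look like, here is a minimal sketch. Every name in it (`World`, `ToolCall`, `dispatch`, the `move` and `inspect` tools) is hypothetical and made up for illustration; it is not taken from the Faketorio source, and the real API may be shaped quite differently. The idea is simply that the model emits a structured tool call, the game validates it against the world, and a short text result goes back to the model.

```typescript
// Hypothetical sketch: none of these names come from the Faketorio source.
type Tile = "empty" | "water" | "iron" | "stone" | "coal";

interface World {
  size: number; // 50 for a 50x50 grid
  tiles: Tile[][]; // indexed as tiles[y][x]
  player: { x: number; y: number };
}

// The two illustrative tools a model could call.
type ToolCall =
  | { name: "move"; args: { dx: number; dy: number } }
  | { name: "inspect"; args: { x: number; y: number } };

interface ToolResult {
  ok: boolean;
  message: string; // short text that would be fed back to the model
}

function createWorld(size = 50): World {
  const tiles: Tile[][] = Array.from({ length: size }, () =>
    Array.from({ length: size }, (): Tile => "empty"),
  );
  return { size, tiles, player: { x: 0, y: 0 } };
}

function inBounds(world: World, x: number, y: number): boolean {
  return (
    Number.isInteger(x) &&
    Number.isInteger(y) &&
    x >= 0 &&
    y >= 0 &&
    x < world.size &&
    y < world.size
  );
}

// Validate a tool call against the world and apply it if it is legal.
function dispatch(world: World, call: ToolCall): ToolResult {
  switch (call.name) {
    case "move": {
      const x = world.player.x + call.args.dx;
      const y = world.player.y + call.args.dy;
      if (!inBounds(world, x, y)) {
        return { ok: false, message: `(${x}, ${y}) is outside the map` };
      }
      if (world.tiles[y]?.[x] === "water") {
        return { ok: false, message: "cannot walk on water" };
      }
      world.player = { x, y };
      return { ok: true, message: `player is now at (${x}, ${y})` };
    }
    case "inspect": {
      const { x, y } = call.args;
      if (!inBounds(world, x, y)) {
        return { ok: false, message: `(${x}, ${y}) is outside the map` };
      }
      return { ok: true, message: `tile (${x}, ${y}) is ${world.tiles[y]?.[x]}` };
    }
  }
}
```

Returning a descriptive failure message rather than throwing gives the model something to react to on its next turn, which is one plausible way to keep an agent loop moving after an illegal action.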

After some prompt and tool tweaking, I was able to get gpt-4.1-mini to achieve a goal of smelting ten iron plates!

I was inspired by the Factorio Learning Environment and spent several hours playing with it before embarking on this project. The orchestration of headless servers in that project is impressive, and after some tweaking I was able to run simple tests against the real game!

Unfortunately, the experience was a bit lackluster without graphics and didn’t lend itself to rapid manual iteration. Hence, Faketorio!

I was impressed by how capable even a fast, cheap model could be, and unnerved by how much small changes to the prompt could swing performance. I’d like to do more testing with throughput-based goals, which should push the agent toward more automation, and see how far the mini model can get me.

There’s very little optimization, so the successful ‘10 iron plates’ run took about 360k tokens (most of them input tokens) and cost roughly 14 cents!
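As a sanity check on that figure, here is a back-of-the-envelope cost estimate. The per-token rates below are an assumption on my part (roughly the list price for gpt-4.1-mini at the time of writing, $0.40 per million input tokens and $1.60 per million output tokens) and may be out of date. The exact input/output split for the run isn't recorded here, so the example just treats all 360k tokens as input.

```typescript
// Assumed list prices in USD per one million tokens; verify against current pricing.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const ASSUMED_GPT_41_MINI: Pricing = { inputPerMillion: 0.4, outputPerMillion: 1.6 };

// Estimate the cost of a run from its token counts and a price table.
function estimateCostUsd(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1e6
  );
}

// Treating all 360k tokens as input: 360,000 * 0.40 / 1,000,000 is about 0.144 USD.
const allInputEstimate = estimateCostUsd(360000, 0, ASSUMED_GPT_41_MINI);
```

Under those assumed rates the all-input estimate comes to about 14 cents, which lines up with the reported cost. Output tokens are priced higher and cached input tokens are typically priced lower, so the true split would nudge the number a little in either direction.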