Developing Robust AI in DwarfCorp
How we developed robust AI in DwarfCorp
DwarfCorp is an open-source (MIT licensed) base building strategy game from Completely Fair Games, slated for release in September 2017. While the game is in development, we provide free builds on our website. The game is inspired by Dwarf Fortress by Bay 12 Games, and of course Minecraft, among other games in the genre.
In DwarfCorp, the player manages a colony of capitalist Dwarves who mine, collect resources, build and fight in a 3D, procedurally generated voxel world. The Dwarves are controlled through a simple RTS-style control scheme where the player selects employees and assigns them tasks.
In a game like this, the units under the player’s control have to be largely autonomous. That is, they should make their own decisions and respond to changes in the world in real-time.
Since DwarfCorp has such a complicated world (3D destroyable terrain, water, lava and other hazards, enemy creatures that can harass the Dwarves, etc.) this is not an easy task. Since we started working on the game in late 2012, we've gone through several redesigns of the AI system. In this article, I'd like to give an overview of what worked and what didn't. And, since the game is open-source, you can read along in our source code to get an idea of what I'm talking about. Warning: this is a technical article full of jargon and code.
Task Execution vs. Management
The first thing that we have to make clear is what we mean by AI. The AI system in DwarfCorp is divided into two parts: the task management system, and the task execution system. Roughly speaking, the task management system assigns tasks among many Dwarves, while the task execution system figures out how to get one dwarf to complete one task.
Task Execution
The task execution system answers the question “How do I accomplish my goal?”. Given some task (for example, gathering an item or mining a block) we want to know the sequence of actions that the Dwarf should take to accomplish that goal, and we want to control exactly how those actions are carried out. We also want to do this in a robust way. The Dwarf might get interrupted during the execution of the task. The terrain might change. Things might disappear. The Dwarf might get set on fire. I don’t know, and neither does the Dwarf ahead of time — but he/she needs to figure out how to respond should any of those things happen.
Your first thought might be “Okay, I will just write a script that tells the Dwarf how to accomplish its task”. That’s fine, but also very labor intensive and complicated, especially if you think about all the things that could go wrong while the dwarf is following the script.
You might also wonder about the best way to script an AI system within a game loop. Should the script be in a thread? Should it work on a tick-by-tick basis? Should it be state-based? Should it be asynchronous? Event-driven? How much should be generated on the fly, versus pre-scripted by hand? Eventually you will settle on one of the basic patterns of AI development. All of the patterns work, but all of them have certain flaws that you should know about before using them.
All of these questions plagued DwarfCorp early in development, and we explored a lot of alternatives before settling on something that worked for us.
First Attempt: State Machines
If you’ve spent any time AI programming, you’ve probably come across Finite State Machines (FSMs), and it’s no wonder. They’re a very old concept, and are extremely simple and general. That can make them fairly expressive and powerful. When we were starting out, this seemed like the obvious choice for task execution, but, as we shall see, they have flaws which make them difficult to write and maintain.
With an FSM, you’ve got certain “States” that the agent can be “In”, and a series of “Transitions” between these “States” based on certain conditions. Within each “State”, the agent performs a certain action until it can transition out of the state to another one. To design the AI, you define a bunch of states, and hook them up using a transition table. This is a frankly tedious process that takes a lot of skill to get right.
Our FSM class started out looking something like this:
// A state is a condition that the Dwarf can be in, and tells it how to behave.
class State
{
    public string Name;
    // The agent that's in the State.
    public AI Agent;
    // Do stuff when the state starts.
    public Action OnEnter;
    // Update the state one tick at a time. Called in the game update loop.
    public Action OnTick;
    // Do stuff when the state exits (like cleaning up).
    public Action OnLeave;
}
and then you might have a State Machine class that looks like this:
// Represents a collection of states and transitions between them.
class StateMachine
{
    public List<State> States { get; set; }

    // Represents the conditions under which the machine moves from one
    // state to another.
    public class Transition
    {
        // The state to transition from.
        public State Source;
        // The state to transition to.
        public State Destination;
        // The conditions (as a list of functions) that must hold before
        // the transition will take place.
        public IEnumerable<Func<bool>> Conditions;

        // The state should transition to another when all the conditions are met.
        public bool ShouldTransition()
        {
            return Conditions.All(condition => condition());
        }
    }

    // A map from states to their transition criteria.
    public Dictionary<State, Transition> TransitionTable { get; set; }

    private State CurrentState { get; set; }

    public void SetState(State state)
    {
        if (CurrentState != null)
        {
            CurrentState.OnLeave();
        }
        CurrentState = state;
        state.OnEnter();
    }

    // Update the state machine.
    public void Tick()
    {
        if (CurrentState != null)
        {
            // Determine if the state machine should transition to the next state.
            var transitions = TransitionTable[CurrentState];
            if (transitions.ShouldTransition())
            {
                SetState(transitions.Destination);
            }
            CurrentState.OnTick();
        }
    }
}
And then you’d set up a state machine like this:
AI dwarf;
// Some state.
State firstState = new State()
{
    Name = "First State",
    Agent = dwarf,
    OnTick = () => { Console.Out.WriteLine("Hello from first state!"); },
    OnEnter = () => { Console.Out.WriteLine("Entering first state!"); },
    OnLeave = () => { Console.Out.WriteLine("Exiting first state!"); }
};
// Some other state.
State secondState = new State()
{
    Name = "Second State",
    Agent = dwarf,
    OnTick = () => { Console.Out.WriteLine("Hello from second state!"); },
    OnEnter = () => { Console.Out.WriteLine("Entering second state!"); },
    OnLeave = () => { Console.Out.WriteLine("Exiting second state!"); }
};
// We will keep track of this iteration variable to determine when to transition states.
int iter = 0;
StateMachine machine = new StateMachine()
{
    States = new List<State>
    {
        firstState,
        secondState
    },
    TransitionTable = new Dictionary<State, StateMachine.Transition>()
    {
        // Transitions for the first state.
        {
            firstState,
            new StateMachine.Transition()
            {
                Source = firstState,
                Destination = secondState,
                // Transition to the second state when the number of iterations exceeds 4.
                Conditions = new List<Func<bool>>()
                {
                    () => { return iter > 4; }
                }
            }
        },
        // Transitions for the second state.
        {
            secondState,
            new StateMachine.Transition()
            {
                Source = secondState,
                Destination = firstState,
                // Transition back to the first state when the number of
                // iterations exceeds 9.
                Conditions = new List<Func<bool>>()
                {
                    () => { return iter > 9; }
                }
            }
        }
    }
};
// Initialize the state machine.
machine.SetState(firstState);
// Update the state machine.
for (iter = 0; iter < 11; iter++)
{
    machine.Tick();
}
The output of this “simple” machine will be:
Entering first state!
Hello from first state! // iter = 0
Hello from first state! // iter = 1
Hello from first state! // iter = 2
Hello from first state! // iter = 3
Hello from first state! // iter = 4
Exiting first state!
Entering second state!
Hello from second state! // iter = 5
Hello from second state! // iter = 6
Hello from second state! // iter = 7
Hello from second state! // iter = 8
Hello from second state! // iter = 9
Exiting second state!
Entering first state!
Hello from first state! // iter = 10
Even using a bunch of code-shortening tricks like lambdas and explicit initialization, that’s a lot of work for something that is the equivalent of two for loops, and it’s not immediately clear at first glance what the behavior of this machine should be. If that sounds bad, imagine what happens when we have to do something more complicated.
Here is a real example of a State Machine we wrote in 2012 for building an item:
This is one of the simpler FSMs we had. Notice the structure of the graph. There is a catchall state called “Idle” which must have transitions to all other states in case of failure, and has to keep track of what has succeeded or failed. Almost all the transitions in the graph are based entirely on whether or not the previous state succeeded, and not on some intrinsic true state of the Dwarf or World. This kind of graph certainly works, but it’s rather tedious to write.
We quickly realized that FSMs were just not going to scale, and we went searching for alternatives which would make writing complex, robust behaviors easier.
Second Attempt: Action Planning
In my day job, I do robotics research. At around the time that I was messing with FSMs in DwarfCorp, I was working at the Personal Robotics Lab at Carnegie Mellon. The lab’s primary focus is on an academic subject called planning. If you’ve ever worked with game AI, you’ve probably done at least some planning — for example, your everyday A* algorithm is a simple path planning algorithm on graphs.
Planners take a state space (a set of numbers or symbols that describe the world and agent) and a set of actions that take you from one state to another given pre-conditions and post-conditions, and produce a path from some starting state to some goal state. In the case of the simple A* graph planner, the state space consists of nodes in a graph, and the actions are edges between the nodes. If we restrict the graph to a grid, and connect things together based on their proximity, we wind up with a vanilla grid planner.
Long before we started looking at alternatives to FSMs, we were already using a graph planner to tell Dwarves how to get from point A to point B. The graph consisted of nodes which were free voxels in our 3D world, and the edges were actions (such as Walk, Jump, Fly, and Climb) which connected these voxels together. Clearly this was a kind of task execution system. The Dwarf has a goal: get from one place to another, and a set of commands it can issue to reach that goal.
Could the whole idea of task execution be rolled directly into the planner? Could all dwarf actions be described as simply as walking or jumping from one voxel to another?
It turns out this is a very old idea, though not quite as old as Finite State Machines. Newell and Simon were looking at the idea in the late 50’s and called the concept the “General Problem Solver”, though the applications of their approach were limited to proving theorems. Perhaps the first useful action planner was STRIPS in 1971, which was used to command the proto-robot Shakey at the Stanford Research Institute.
The idea behind STRIPS is pretty simple. You start out with a series of symbols that define your state space. You then define a series of actions on those symbols which take you from one state to another. Actions must have pre-conditions, the set of symbols which must be a certain value before the action can execute, and post-conditions, the set of symbols whose values change based on the action. By defining all the states and actions you can derive a graph called the state-action graph. You then plan in this graph exactly as you would using A* (although you have to carefully consider what kinds of heuristics to use, because the state space is very different from something like a grid). From there, your robot (or in my case Dwarf) can figure out how to do basically anything on its own, without having to write out a tedious transition table.
From the game industry, a popular algorithm called GOAP (Goal-oriented action planning) is basically an implementation of STRIPS and was used in some famous games like F.E.A.R.
Let’s see if we can write the “build item” task as an action planning domain. First we start by defining the state space as a set of true or false assertions. By default, these assertions will be false:
// Determines whether the dwarf has the given resource in the inventory
dwarf_has(dwarf, resource): true/false
// Determines whether the dwarf is at the given place
dwarf_at(dwarf, place): true/false
// Determines whether the dwarf can build the given item using a resource
can_build(resource,item): true/false
// Determines whether the given resource is at a place.
resource_at(resource, place): true/false
//Determines whether the given item has already been built.
item_built(item): true/false
// Determines whether the given item is at the place.
item_at(item, place): true/false
We can also define a set of actions and what they do:
// Gather causes a dwarf to hold a resource.
gather(dwarf, resource, place) {
// The dwarf can't already be holding the resource. They must also be
// at the place where the resource also is.
preconditions = [ resource_at(resource, place),
not dwarf_has(dwarf, resource),
dwarf_at(dwarf, place)];
// The dwarf is holding the resource. The resource is not at the place.
postconditions = [not resource_at(resource, place),
dwarf_has(dwarf, resource)];
cost = 1;
}
// Go to causes a dwarf to go from a place to another place.
go_to(dwarf, from_place, to_place) {
// The dwarf is at the starting location.
preconditions = [dwarf_at(dwarf, from_place)];
// The dwarf is at the ending location and not the starting location.
postconditions = [not dwarf_at(dwarf, from_place),
dwarf_at(dwarf, to_place)];
cost = 5;
}
// Causes the dwarf to build the given item using the resource at the place.
build(dwarf, item, resource, place) {
// The dwarf must be holding the resource, the resource needs to be
// able to be used to build the item. The item can't already be built.
// The dwarf must be at the place.
preconditions = [ dwarf_at(dwarf, place),
dwarf_has(dwarf, resource),
can_build(resource, item),
not item_built(item),
item_at(item, place) ];
// The item is now built. The resource has left the dwarf's inventory.
postconditions = [ item_built(item),
not dwarf_has(dwarf, resource)];
cost = 10;
}
Then we can set up an initial and goal state:
// The initial symbols.
Place stockpile;
Place worksite;
Place field;
Item trap;
Resource iron;
Dwarf dwarf;
// The dwarf is in the field, the resource is in the stockpile, and the trap
// is not built but can be built using iron.
initial_state = [ dwarf_at(dwarf, field),
resource_at(iron, stockpile),
can_build(iron, trap),
item_at(trap, worksite) ];
// The dwarf wants to build the trap.
goal_state = [ item_built(trap) ];
Now, we run this through a planner. I won’t get into the details of how to build a planner, but you could use one of the existing solvers for these kinds of problems, like Fast Downward.
If it works, the planner should produce something like this:
// The dwarf first goes to the stockpile and then gathers iron from it.
// Then the dwarf goes to the worksite and builds the item.
plan = [ go_to(dwarf, field, stockpile),
gather(dwarf, iron, stockpile),
go_to(dwarf, stockpile, worksite),
build(dwarf, trap, iron, worksite)]
Then, the dwarf merely executes the actions to complete the task. Notice how there’s no error handling here. If the dwarf ends up failing an action because a precondition isn’t met, the dwarf simply replans to achieve its goals.
Say for example the dwarf gathered iron from the stockpile and then was terrified by a goblin, dropping the iron. Well, now the game state is that the iron is in the field, the dwarf doesn’t have the iron, and the dwarf still wants to build the trap, so the resulting replan is:
plan = [ gather(dwarf, iron, field),
         go_to(dwarf, field, worksite),
         build(dwarf, trap, iron, worksite)]
In this way the dwarf can deal with any changes to its environment provided that the changes are visible to this symbolic state.
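For the curious, the core of such a planner can be surprisingly small. Here is a minimal sketch of a breadth-first forward search over boolean facts, in the spirit of STRIPS. The PlanAction type and string-based facts are invented for illustration; a real planner like Fast Downward uses informed search and much better heuristics:

// A minimal sketch of a forward state-space planner over boolean facts.
// "PlanAction" and the string facts are hypothetical simplifications.
using System.Collections.Generic;
using System.Linq;

class PlanAction
{
    public string Name;
    public HashSet<string> Preconditions;  // Facts that must be true to apply the action.
    public HashSet<string> AddEffects;     // Facts the action makes true.
    public HashSet<string> DeleteEffects;  // Facts the action makes false.
}

static class TinyPlanner
{
    // Breadth-first search from the initial facts to any state containing the goal facts.
    public static List<PlanAction> Plan(HashSet<string> initial, HashSet<string> goal,
                                        List<PlanAction> actions)
    {
        var frontier = new Queue<KeyValuePair<HashSet<string>, List<PlanAction>>>();
        var visited = new HashSet<string>();
        frontier.Enqueue(new KeyValuePair<HashSet<string>, List<PlanAction>>(
            initial, new List<PlanAction>()));
        while (frontier.Count > 0)
        {
            var node = frontier.Dequeue();
            var state = node.Key;
            if (goal.IsSubsetOf(state))
            {
                return node.Value; // Found a sequence of actions reaching the goal.
            }
            // Canonicalize the state so we never expand the same state twice.
            if (!visited.Add(string.Join(",", state.OrderBy(s => s))))
            {
                continue;
            }
            foreach (var action in actions.Where(a => a.Preconditions.IsSubsetOf(state)))
            {
                var next = new HashSet<string>(state.Except(action.DeleteEffects));
                next.UnionWith(action.AddEffects);
                var plan = new List<PlanAction>(node.Value) { action };
                frontier.Enqueue(new KeyValuePair<HashSet<string>, List<PlanAction>>(next, plan));
            }
        }
        return null; // No plan exists.
    }
}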
When I first read about this idea I got really excited about it and set out to write up an implementation of GOAP. It would mean having very smart dwarves that could be creative and dictate their own actions on the fly. I even started using this concept in my robotics research and looking into the classical planning work of Lozano-Pérez and Kaelbling for inspiration.
Unfortunately, this method has serious pitfalls which make it almost unusable for practical game design.
Here are some of the problems with it:
No symbolic state can sufficiently capture the complexity of the game world.
Defining the state variables and operators for a complex game is tricky. This is known as the frame problem. Unless your game is Chess, Checkers or Connect-4, the number of state variables and operators is going to be huge. And that means…
The Curse of Dimensionality will kill you.
The more actions you add and the more states you add, the bigger the planning problem gets. In fact, the amount of time required to find a plan increases exponentially. With just a few tens of symbols and actions it might take a few milliseconds to generate a plan. With hundreds, the solution will take seconds to minutes. Dwarves will be sitting around forever just trying to figure out how to do simple stuff.
I have once heard this phenomenon described as “just because you know how to put two sticks together doesn’t mean you know how to build a skyscraper” — your dwarf might know how to do the basic atomic actions it needs to get around the world, but actually formulating and solving the problem might still be too hard.
Debugging and tweaking is really hard
If you go down this road, you will inevitably run into logical bugs. The result is that the plans will just fail over and over again. Then you will have to go into the world of action costs, heuristics, etc. and dive deep into the planner to even slightly change the agent’s behavior. In my case I started to cheat by building “meta-actions” which were just state machines so that I could edit what the dwarf was doing at any time.
Sometimes “Creative” behavior is bad.
As one example, my initial formulation of dwarf gathering and building made it so that dwarves could take from or put into any “container”. “Containers” included the inventory of the dwarf. The natural result was that dwarves were constantly deciding to steal items from each other, sometimes tossing an item back and forth as they replanned their actions. It was creative, interesting, and a little bit funny — but ultimately not good for the game.
This led me to add another state to the system that meant you could “own” a container. This kind of bug found its way into every aspect of the game. In general, dwarves found unexpected ways of doing things which were counterproductive and made the game not fun.
These problems caused me to abandon action planning as a task execution system in DwarfCorp and focus on something more maintainable. I would really like to go back to the idea some day because of its simplicity and beauty — but simplicity and beauty aren’t necessarily required or even desired when it comes to making games.
Third Attempt: Behavior Trees of IEnumerable Coroutines
This leads me to the third and final system we implemented for task execution in DwarfCorp, which has served us well since about 2014. I gutted the GOAP planning system, extracted the actions (and especially the “meta-actions”), and cobbled them back together in the form of a Behavior Tree.
Behavior trees are a kind of restricted state machine where “States” are replaced with “Behaviors” (in DwarfCorp we call them Acts to save on typing) that are connected in a tree structure. Each Behavior can have one parent and many children. The transitions of the FSM are abandoned in favor of implicit transitions that act on the children of a particular Behavior. This makes Behavior Trees slightly less expressive than FSMs, but much easier to write and maintain.
There are special nodes in the tree called “Control Flow Nodes” which change how children are transitioned to. The simplest one is called Sequence, which just performs each of the children in order. For example, here is a simple behavior tree consisting of a single Sequence, where each child is tabbed underneath its parent:
Root
    Sequence1
        A
        B
        Sequence2
            D
            E
        C
Each update, the behavior tree visits the nodes from the root to the leaves, as directed by the control flow nodes. For the simple tree above, the order in which the nodes are executed is
Root, Sequence1, A, B, Sequence2, D, E, C
Now, the interesting part comes in where nodes can have return codes. In our case, the return codes are Running, Success, and Failure. The control flow nodes execute their children depending on the return codes of the children. Here are some of the control flow nodes and their behavior:
Node | Visit Order | Exit on |
---|---|---|
Sequence(...) | All children sequentially. | Any child returns Failure, or all children return Success. |
Select(...) | All children sequentially. | Any child returns Success, or all children return Failure. |
Parallel(...) | All children at the same time. | Any child returns Failure, or all children return Success. |
Not(...) | One child. | Returns the opposite of the child. |
Repeat(..., Condition) | One child. | Returns Running until Condition is met, then returns Success. |
Domain(..., Condition) | One child. | Returns Failure if Condition is not met; otherwise returns what the child returns. |
Queue(...) | All children in priority order. | Same as Select. |
These nodes can be mixed and matched to produce complex behavior, though one of the beautiful things about behavior trees is that they make the simple, dumb cases (which in my experience account for 99% of everything an AI has to do) very easy to implement. Let’s take the “build an item” example again and express it as a behavior tree:
// At the core, the behavior is just a sequence. Get the resources, then build the item.
// There is *A LOT* of error handling that's left out here. That will come later.
Behavior = new Sequence(
new SearchForResource(iron, stockpile, ...),
new GoTo(stockpile, ...),
new GatherResource(stockpile, iron, ...),
new GoTo(worksite, ...),
new BuildItem(trap, iron)
);
Each sub-behavior (such as GoTo) is itself a behavior with many sub-behaviors. In this way, elements can be easily composed and re-used.
Adding robustness
You might have seen that the above behavior was just a simple sequence — which means the Dwarf will have no idea how to respond if, say, SearchForResource or GoTo fails for whatever reason. To make behaviors robust, we abide by two rules: contingency plans and sequential composition.
Contingency Plans
All of our behaviors implement OnCanceled and OnSuspended methods. When behaviors get canceled or suspended for whatever reason, the Dwarf can clean up state within that behavior and return to it later.
Additionally, we wrap most behaviors with a Select and Domain, which ends up catching the error and engaging in a recovery behavior. This is conceptually similar to try/catch blocks in a programming language. For example, when building an item, the task might end up getting canceled. In this case the dwarf must return any held resources to the stockpile. So we wrap it like this:
// The build item behavior wrapped with contingency plans
Behavior = new Sequence(
new SearchForResource(iron, stockpile, ...),
new GoTo(stockpile, ...),
new GatherResource(stockpile, iron, ...),
// Here, the behavior `GoTo` is wrapped by a domain check that ensures
// the trap is still intended to be built. Otherwise, the Select
// falls back on a "ReturnResources" behavior.
new Select(new Sequence(new Domain(new GoTo(worksite, ...), () => IsBuildOrder(trap)),
                        new BuildItem(trap, iron)),
           new ReturnResources(iron))
);
Sequential Composition
These days, in my day job I work for a company that is famous for kicking robots that don’t fall over.
I can’t go into too many details about why these robots are so good at keeping upright, but one of the main concepts we use is something called Sequential Composition.
The main idea behind this esoteric style of programming is that all behaviors must be expressed inside a domain (similar to the “preconditions” in GOAP or STRIPS), that the domain should be re-checked every tick (regardless of how costly that is), and, further, that the order of execution should be from the goal state to the start state (the opposite of what I’ve shown so far).
By doing this, you ensure that the agent is always doing something sensible, and that it is preferentially moving toward its goal. Interestingly, this concept fits in well with Behavior Trees. All we have to do to implement it is reverse the order of the list and attach meaningful domains to the behaviors:
// The build item behavior using sequential composition. It is a queue because each domain has to be evaluated every tick.
Behavior = new Queue(
// Here the queue is expressed in reverse order. We start with the goal: building the item.
// This can only happen if we have a build order, we have the resources required to build,
// and we're at the work site.
new Domain(new BuildItem(trap, iron),
() => IsBuildOrder(trap) && HasResource(iron) && IsAtLocation(worksite)),
// Otherwise, go to the worksite if we're not already there and we have the resources.
new Domain(new GoTo(worksite),
() => IsBuildOrder(trap) && HasResource(iron)),
// Otherwise, gather the resources from the stockpile if we don't already have them and are
// at the stockpile.
new Domain(new GatherResource(iron, stockpile, ...),
() => IsBuildOrder(trap) && IsAtLocation(stockpile) && !HasResource(iron)),
// Otherwise, go to the stockpile.
new Domain(new GoTo(stockpile),
() => IsBuildOrder(trap) && !HasResource(iron)),
// If all else fails, return any resources we're carrying.
new Domain(new ReturnResources(iron),
() => HasResource(iron)),
new Idle()
);
Running through this behavior step by step:
- The dwarf starts far away from the stockpile and without any resources. It has been given a build order to build an item.
- The behavior enters the first domain check: does the dwarf have resources, is it at the worksite, and is there a build order? No, this fails, so it moves on to the next domain check.
- The behavior falls all the way down to the domain IsBuildOrder(trap) && !HasResource(iron), which evaluates to "true". This causes the dwarf to go to the stockpile.
- When the dwarf reaches the stockpile, the domain IsBuildOrder(trap) && IsAtLocation(stockpile) && !HasResource(iron) is active, so it gathers resources from the stockpile.
- After gathering resources, the domain IsBuildOrder(trap) && HasResource(iron) evaluates to "true", so the dwarf goes to the worksite.
- Finally, the topmost domain succeeds, and the dwarf builds the trap.
Notice how the behaviors at the top of the queue have the smallest domain (that is, they have the most conditions that must be true), while the behaviors at the bottom of the queue have the largest domain. That’s on purpose. In sequential composition, the agent moves from the largest, most general domains to the smallest, most specific ones — the larger, more general domains “funnel” the agent into the smaller domains until finally the agent is at the goal. This kind of programming can make you go crazy. It has similar limitations to the action planning approach in that it might not always be clear what the domain of a behavior is. However, it also produces very robust AI with built-in responses to failure.
IEnumerable Coroutines
In the traditional formulation of behavior trees, each leaf behavior is implemented as a class which has a single function called Tick, which returns Success, Failure or Running. That’s fine, but it restricts what behaviors can do to a single update tick. Any memory that they have must be explicitly stored in the class. As a toy example, let’s consider a behavior called JumpNTimes, which tells the dwarf to jump some number of times.
// In DwarfCorp, we call Behaviors "Acts"
class JumpNTimes : Act
{
    // We have to store the memory of the Act here.
    private int TimesJumped = 0;
    // The number of times to jump.
    private int N = 0;

    public JumpNTimes(int numTimesToJump)
    {
        N = numTimesToJump;
    }

    // Called each time the Act gets visited.
    public override Act.Status Tick()
    {
        // If we've jumped this number of times, return success.
        if (TimesJumped >= N)
        {
            return Act.Status.Success;
        }
        // Otherwise jump, and report that we're still running.
        Jump();
        TimesJumped++;
        return Act.Status.Running;
    }
}
This kind of thing can get really tedious, really fast — especially for the more complicated behaviors. For this reason, we use Coroutines instead of functions for Behaviors, making use of the C# IEnumerable trick for coroutines. A Coroutine, unlike a function, remembers where it left off when it returned, allowing you to do loops and other complex logic without blocking the main thread — and without all the overhead and headache of true threads.
In C#, you can make a coroutine just by returning IEnumerable<Type> instead of Type directly, and then calling the function using foreach. Instead of return, you yield return the result. Here’s the JumpNTimes behavior, simplified by coroutines.
class JumpNTimes : Act
{
private int N = 0;
public JumpNTimes(int numTimesToJump)
{
N = numTimesToJump;
}
public override IEnumerable<Act.Status> Tick()
{
// Here's the trick. We can just use a simple for loop here instead of
// having extra members to keep track of the state in this function.
for (int i = 0; i < N; i++)
{
Jump();
yield return Act.Status.Running;
}
// We can even do stuff in this part after the dwarf jumps.
// PlayAnimation(...)
yield return Act.Status.Success;
}
}
The use of IEnumerable for coroutines greatly simplified our codebase and allowed us to do a lot more without excessive boilerplate.
Passing Data Between Behaviors
One thing that’s noticeably missing from the discussion so far is the ability to compute something in one behavior and then pass it on to another behavior. In regular programming, it’s useful to get the result of one function and pass it into another:
var a = A();
var b = B(a);
var c = ...
Can we do a similar thing using Behavior Trees? The solution we settled on was a Blackboard object shared among all behaviors. The Blackboard is just a simple key-value store where we can stash bits of data. By convention, behaviors have their inputs and outputs marked as keys in the Blackboard. For example, this is how we get a resource:
// Missing all the usual robustness contingencies...
new Sequence(
// Finds the given resource and puts it in the blackboard at "FoundResource"
new SearchForResource(iron, "FoundResource"),
// Plans a path to "FoundResource" and puts it in the blackboard at "PathToResource"
new PlanPathTo("FoundResource", "PathToResource"),
// Follows the path "PathToResource" from the blackboard.
new FollowPath("PathToResource"),
// Gathers "FoundResource".
new GatherResource("FoundResource"));
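At its core, a Blackboard like this need be little more than a dictionary with typed accessors. A minimal sketch (the real class has more conveniences):

// A minimal Blackboard sketch: shared key-value storage for behaviors.
using System.Collections.Generic;

class Blackboard
{
    private readonly Dictionary<string, object> Data = new Dictionary<string, object>();

    // Behaviors write their outputs under agreed-upon keys...
    public void SetData<T>(string key, T value)
    {
        Data[key] = value;
    }

    // ...and later behaviors read them back as inputs.
    public T GetData<T>(string key, T defaultValue = default(T))
    {
        object value;
        return Data.TryGetValue(key, out value) ? (T)value : defaultValue;
    }

    public void Erase(string key)
    {
        Data.Remove(key);
    }

    public void Clear()
    {
        Data.Clear();
    }
}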
We of course also have several behaviors which just do bookkeeping on the blackboard, like clearing it or setting specific values.
Task Management
The Task Management system answers the question: “What should I be doing right now?” It takes a list of tasks and a list of agents, and assigns the tasks to the agents. Throughout development, the Task Management system has not changed much. We have a Task class that represents a goal for a Dwarf to achieve, with a cost and priority.
// Represents a goal for a Dwarf to carry out, such as collecting an item or
// mining.
class Task
{
// Dwarves prefer to do tasks with higher priority over tasks with lower
// priority. Dwarves will drop their current task
// to carry out one with higher priority.
enum Priority
{
Eventually,
Low,
Normal,
High,
Urgent
}
public Priority TaskPriority {get; set;}
// Get the cost of carrying out this task for the given agent. Dwarves
// perform tasks from least to most costly within priority order.
public virtual float GetCost(AI agent)
{
//...
}
// Some agents simply cannot carry out a given task.
public virtual bool IsPossible(AI agent)
{
// ..
}
// If the task has failed, we need to know whether it is worth it to retry
// the task.
public virtual bool ShouldRetry(AI agent)
{
    // ...
}
}
To make new tasks, we simply extend Task and fill out the necessary virtual functions. Some example tasks include DestroyBlock, GatherItem, KillEntity, GuardPlace, LookInteresting and HealThyself. The cost of a task is usually a heuristic, like the distance from the dwarf to the item they’re supposed to interact with, or the number of items in the Dwarf’s backpack.
Once we have a list of tasks, we can assign them to Dwarves in a way which minimizes the total cost. An optimal way of doing this is the Hungarian Algorithm. We use this when the number of tasks is small, but unfortunately, since the algorithm runs in O(N^3) time, we have to resort to a simple greedy algorithm when the number of tasks/dwarves increases beyond something like 10. The greedy algorithm simply assigns the highest-priority Task to the Dwarf with the lowest cost. This has the unfortunate side effect that one Dwarf might get assigned all the tasks just because he/she is the closest. To mitigate this effect, we multiply the cost of a Task by the number of tasks a Dwarf has already been assigned.
Each Dwarf individually has a Task Queue, which takes the form of a priority queue sorted into bins from highest to lowest priority, and from lowest to highest cost within each bin. It ends up looking something like this:
Task | Priority | Cost |
---|---|---|
Kill Goblin 92 | Urgent | 1.5 |
Kill Elf 105 | Urgent | 1.0 |
Heal Thyself | High | 3.4 |
Mine block 382 | Normal | 1.0 |
Mine block 118 | Normal | 1.8 |
Gather wood | Normal | 2.4 |
Talk with friend | Low | 0.4 |
Make crafts | Low | 4.8 |
Look interesting | Eventually | 0.0 |
Current Task: Kill Goblin 92
The dwarf prefers to do the things of highest priority first (like eliminating nearby threats), followed by things of lesser priority (like doing tasks that the player directly told them to do), and finally tasks that were generated by the Dwarf just to make things interesting (like playing animations or leisure tasks).
Whenever a task of higher priority than the currently active task gets pushed onto the queue, the Dwarf cancels the current task, puts it back onto the queue, and executes the higher priority task first. This allows the Dwarf to continue on with his/her day even when something interrupts what they’re doing (which is critical for having robust AI). We implemented this system soon after combat was introduced to the game, because Dwarves would get confused about whether they were supposed to be following the player’s orders or responding to critical threats from enemies.
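The interrupt logic itself is small. A sketch, with hypothetical CurrentTask and TaskQueue members and a hypothetical Cancel method:

// Sketch of task preemption. "CurrentTask", "TaskQueue" and "Cancel" are
// hypothetical names, not the actual DwarfCorp API.
public void OnTaskPushed(Task newTask)
{
    if (CurrentTask != null && newTask.TaskPriority > CurrentTask.TaskPriority)
    {
        // Suspend the current task and put it back on the queue so the
        // dwarf can resume it once the urgent task is done.
        CurrentTask.Cancel();
        TaskQueue.Enqueue(CurrentTask);
        CurrentTask = newTask;
    }
    else
    {
        TaskQueue.Enqueue(newTask);
    }
}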
The dreaded Task Loop
Another important thing to note is that Dwarves should always be doing something, even if they haven’t been explicitly assigned anything to do — even if the something they’re doing is just playing an idle animation or wandering around the rooms of the colony. Otherwise, they look too robotic and static.
One of the ways this problem can manifest itself is in what we like to call the dreaded Task Loop. If the Dwarf has only one task (like mining a block), and that task repeatedly fails for whatever reason (say for example the block is unreachable, or the dwarf keeps getting interrupted by damage), we don’t want the dwarf to just be sitting there trying again and again. So each time a task fails, we increase its cost a little, and eventually knock it down a peg in priority until the Dwarf prefers not to do it anymore.
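Concretely, the failure handling might look something like this sketch; the penalty fields and numbers are hypothetical:

// Sketch: each failure inflates the task's cost, and repeated failures demote
// its priority, so the dwarf eventually prefers anything else over retrying.
// "FailureCount" and "CostMultiplier" are hypothetical members.
public void OnTaskFailed(Task task)
{
    task.FailureCount++;
    task.CostMultiplier *= 1.5f; // GetCost() folds this multiplier in.
    if (task.FailureCount > 3 && task.TaskPriority > Task.Priority.Eventually)
    {
        // Knock the task down a peg in priority.
        task.TaskPriority = (Task.Priority)((int)task.TaskPriority - 1);
    }
}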
Generating Tasks
One last thing: we now have both a way of assigning tasks and a way of executing them, but how do we generate tasks in the first place? One of the ways is obvious: direct player control. But there is also a whole host of actions that dwarves can perform without prompting.
To do this, we give Dwarves some basic needs like hunger, sleepiness, fear, boredom and a desire for social interaction. These needs build up over time until they are sufficiently large to produce a new Task. For example, if a Dwarf gets bored, he or she will go build something for fun. If a Dwarf gets hungry, he or she will go make food.
Comments
Nice article!
As I posted on Reddit (https://www.reddit.com/r/gamedev/comments/6qb97w/developing_robust_ai_in_dwarfcorp/)…
Q: Do your BT examples imply that you’d rather build your BT programmatically (instead of manually in a config file or via a custom editor)?
Yes, we build them programmatically. I imagine we could do it with config files, but we find it way easier to do it in code.
Hey there, nice article!
I have a quick question on your BT implementation. How do you save the state of nodes that are implemented as co-routines? For instance, in your example you might save the game state when j = 5. As this is a local variable, you won’t have access to it from elsewhere.
I originally implemented my BT in the same way but then moved back to stateful nodes. I created simple delegate nodes like such to make it easier to code with:
var tree = Builder
.BeginTree()
.InSequence()
.Do(so => { if ( so.counter < 5) { so.counter++; jump(); } })
.EndSequence()
.EndTree();
There are three ways we store state in coroutines: using local variables, using the blackboard, and using members of the behavior class. If something must be saved, it is stored either in the blackboard or as a member of the class, both of which get serialized.
Hi Matthew,
Thanks for your reply! Unfortunately the e-mail notification went to my junk e-mail so I missed it 🙁
To be more precise about my question: if you have the following code:
public IEnumerable GetNumbers()
{
yield return 1;
yield return 2;
yield return 3;
}
Then a state machine is generated in the background for you. However you cannot access this state machine and the state variables.
Given the following scenario:
1. Start game
2. First number yielded (giving 1)
3. Game saved…
I cannot see how it’s possible to do:
4. Load game
5. Next number yielded (giving 2)
Without modifying the coroutine, at which point you may as well be implementing the state machine yourself.
So I guess when you re-initialize you are not saving the complete state of your behaviour tree, and instead re-evaluating it?
Thanks!
Thanks for this post. I’m developing my own dwarf fortress-y game and I was struggling to figure out a good way to implement behavior trees. I stumbled upon your post while googling and it helped me a lot. I have one question, for the time being.
I don’t understand the difference between your concept of a Selector node and a Queue node, as to the order of child execution. To my understanding, a selector executes all nodes from start to finish every tick, until one child returns success (or running). For the queue, you explain that “It is a queue because each domain has to be evaluated every tick”, which a selector would do as well, since each domain is a child of the selector?
Also, according to the table of control nodes and their behavior, you explain that the selector visits each child sequentially, while a queue visits them in priority order. I can’t deduce the difference based on the examples provided; the queue node seems to visit all children sequentially, just as a selector would, their priority order being the order in which they were added to the queue constructor.
Thank you in advance!
I sort of figured out the answer to my question myself. Apparently, I was wrong about the selectors, in that they continue execution from any child that returned running, whereas the queue always ticks its children from the beginning.
I have another question now. What is the benefit of combining the same condition with other conditions into AND expressions across multiple domains, instead of nesting smaller domains within the larger one? For example, using only one domain with the condition “isbuildorder”, and its child being a queue of lesser conditions.
Thanks again