Genetic Programming: Cart Centering
 


 

Matthew Frederick
matthew@mich.distance.net

Cart Centering

The goal of the cart centering problem is to evolve a controller that can stop a cart in the exact center of the track before time runs out.  At every time step, the controller has two options at its disposal: push right or push left.  The track is frictionless, so nothing will slow down the cart but an opposite force.  Unlike the controller in the inverted pendulum problem, this controller can't choose to do nothing.  The magnitude of the force that the controller can apply is constant.  Each time the simulation starts, the cart is given a random initial position and velocity.  This way, the controller that develops will be more robust.
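
The setup above can be sketched as a small simulation step. This is only an illustration: the force magnitude, the cart mass, the step length, the range of the random start, and the use of simple Euler integration are all assumptions, and the names `Cart`, `random_start`, and `step` are hypothetical rather than taken from the simulator.

```python
import random
from dataclasses import dataclass

# All numeric values are illustrative assumptions, not the simulator's settings.
FORCE = 1.0  # constant magnitude of every push
MASS = 1.0   # cart mass
DT = 0.05    # length of one time step, in seconds


@dataclass
class Cart:
    position: float  # signed offset from the center of the track (assumed signed)
    velocity: float  # signed horizontal velocity


def random_start(rng: random.Random, spread: float = 1.0) -> Cart:
    """Draw a random initial position and velocity for one run."""
    return Cart(rng.uniform(-spread, spread), rng.uniform(-spread, spread))


def step(cart: Cart, push_right: bool) -> Cart:
    """Advance one time step on a frictionless track (explicit Euler)."""
    # The controller must push one way or the other; there is no "do nothing".
    acceleration = (FORCE if push_right else -FORCE) / MASS
    position = cart.position + cart.velocity * DT
    velocity = cart.velocity + acceleration * DT
    return Cart(position, velocity)
```

Because there is no friction term, the velocity changes only through the push, which is why the cart can be stopped only by an opposing force.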
 
Since the computer simulation is composed of discrete time slices, it would be practically impossible for the controller to center the cart over the line at the exact moment its velocity is 0.0.  The simulator therefore treats anything less than 0.05 as 0.0 when evaluating an expression.  On the screen, two numbers in green give the cart position and the cart velocity.  When the cart is within the range of the simulator's approximate 0.0, these numbers turn blue.  This is simply a way for users to see how close a controller is to being successful.
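
A minimal sketch of that tolerance check follows. The 0.05 threshold comes from the text; reading "less than 0.05" as a test on the magnitude of the value is an assumption, and the function names are hypothetical.

```python
TOLERANCE = 0.05  # threshold stated in the text


def is_zero(value: float, tolerance: float = TOLERANCE) -> bool:
    """Treat a value as 0.0 when its magnitude is below the tolerance (assumed reading)."""
    return abs(value) < tolerance


def is_centered(position: float, velocity: float) -> bool:
    """The cart counts as centered when position and velocity are both approximately zero."""
    return is_zero(position) and is_zero(velocity)
```

For example, `is_centered(0.04, -0.03)` is true under this reading, while `is_centered(0.06, 0.0)` is not.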

This problem was adapted from a problem proposed by John Koza (1992).

Expressions

Each expression can be composed of basic math operations, constants, conditionals, and some environmental variables.  Add, subtract, multiply, divide, absolute value, cosine, sine, inverse cosine, and inverse sine are available to expressions, as well as a branching conditional: "if less than zero, branch one way, else branch the other way".  Constants range in value between -10.0 and 10.0.  The following environmental variables are available to expressions:
 
cartVelocity: the horizontal velocity of the cart
cartDistance: the distance of the cart from the center of the track
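
One way to picture such an expression is as a nested tree that is evaluated recursively. The sketch below is hypothetical: the tuple representation, the operator names, the protected division (returning 1.0 on a zero divisor), and the clamping of inverse sine and cosine arguments to [-1, 1] are all assumptions, not details of the actual implementation.

```python
import math


def protected_div(a: float, b: float) -> float:
    """Division that returns 1.0 instead of failing on a zero divisor (assumed convention)."""
    return a / b if abs(b) > 1e-9 else 1.0


def clamp_unit(x: float) -> float:
    """Clamp to [-1, 1] so the inverse trig functions never raise (assumed convention)."""
    return max(-1.0, min(1.0, x))


FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
    "abs": abs,
    "cos": math.cos,
    "sin": math.sin,
    "acos": lambda a: math.acos(clamp_unit(a)),
    "asin": lambda a: math.asin(clamp_unit(a)),
}


def evaluate(expr, env):
    """Evaluate a tree made of numbers, variable names, and (operator, args...) tuples."""
    if isinstance(expr, (int, float)):
        return float(expr)              # constant
    if isinstance(expr, str):
        return env[expr]                # environmental variable
    op, *args = expr
    if op == "iflz":                    # if less than zero: only one branch is evaluated
        condition, if_negative, otherwise = args
        chosen = if_negative if evaluate(condition, env) < 0.0 else otherwise
        return evaluate(chosen, env)
    return FUNCTIONS[op](*(evaluate(arg, env) for arg in args))


# A made-up expression, purely for illustration.
example = ("sub", ("mul", -1.0, "cartDistance"),
                  ("mul", "cartVelocity", ("abs", "cartVelocity")))
result = evaluate(example, {"cartVelocity": 0.5, "cartDistance": -1.0})
```

With the values shown, `result` is 1.0 - 0.25 = 0.75.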

The simulation is composed of a series of time steps, each representing about one-twentieth of a second in real time.  The output of each controller tells the simulator how to push the cart at each time step.  A value of 0.0 or more means apply a force to the right.  Otherwise, the simulator will apply a force to the left.
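
The rule for turning a controller's output into a push can be written in a few lines. In this sketch the force magnitude of 1.0 is an assumed value, and `force_from_output` and `simulated_seconds` are hypothetical helper names.

```python
FORCE = 1.0            # assumed constant magnitude of the push
STEPS_PER_SECOND = 20  # each step is about one-twentieth of a second


def force_from_output(output: float, magnitude: float = FORCE) -> float:
    """A value of 0.0 or more pushes right (+); anything else pushes left (-)."""
    return magnitude if output >= 0.0 else -magnitude


def simulated_seconds(steps: int) -> float:
    """Approximate real time represented by a number of time steps."""
    return steps / STEPS_PER_SECOND
```

Only the sign of the output matters, so an output of 0.0 and an output of 10.0 produce the same push to the right.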

Criterion for evaluating the controller

Each expression is given 3 chances to successfully center the cart.  Each time, the cart starts at a slightly different position and velocity.  Since a controller cannot be allowed to run forever, each attempt is limited to 350 time steps.  After the controller is successful, or after 350 time steps have passed, the expression is evaluated using such factors as:

Based on these criteria, points are given to each expression.  The higher the number, the better the chance that the expression will be bred in later generations.
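
The evaluation loop can be sketched as follows. The 3 trials, the 350-step limit, and the 0.05 tolerance come from the text. Everything else is an assumption: the physical constants, the range of starting states, and above all the scoring rule, which is a placeholder (steps left over on each successful trial) because the actual scoring factors are not given here.

```python
import random

TRIALS = 3        # chances per expression (from the text)
MAX_STEPS = 350   # time-step limit per trial (from the text)
TOLERANCE = 0.05  # approximate-zero threshold (from the text)
FORCE, MASS, DT = 1.0, 1.0, 0.05  # assumed physical constants


def run_trial(controller, position, velocity):
    """Return the number of steps used to center the cart, or None on timeout."""
    steps = 0
    while True:
        if abs(position) < TOLERANCE and abs(velocity) < TOLERANCE:
            return steps
        if steps == MAX_STEPS:
            return None
        output = controller(position, velocity)
        acceleration = (FORCE if output >= 0.0 else -FORCE) / MASS
        position += velocity * DT        # explicit Euler: uses the old velocity
        velocity += acceleration * DT
        steps += 1


def score(controller, rng):
    """Placeholder score: total steps left over across the successful trials."""
    points = 0
    for _ in range(TRIALS):
        start_position = rng.uniform(-0.75, 0.75)  # assumed starting range
        start_velocity = rng.uniform(-0.75, 0.75)
        steps = run_trial(controller, start_position, start_velocity)
        if steps is not None:
            points += MAX_STEPS - steps
    return points
```

Here `controller` is any callable that takes the position and velocity and returns a number, for example a wrapper around an evolved expression. A higher score would then make the expression more likely to be chosen for breeding.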

Results

The experiment was successful: a controller that can center the cart can usually be found.

References

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.