Genetic Programming: Inverted Pendulum

Matthew Frederick
matthew@mich.distance.net

Inverted Pendulum

The goal of the inverted pendulum problem is to evolve a controller that keeps the pendulum above the horizontal plane while also keeping the cart from going off the end of the track. At every time step, the controller has three options: push right, push left, or do nothing. The magnitude of the force that the controller can apply is constant. Each time the simulation starts, the pole is initially skewed, so the controller is forced to deal immediately with gravity pulling the pendulum down.

Once a successful controller has been evolved, it can be tested for robustness.  Clicking on the display with the left or right mouse button will add about 5 degrees to the angle of the pole in that direction.

Expressions

Each expression can be composed of basic math operations, constants, conditionals, and some environmental variables. Add, subtract, multiply, divide, cosine, sine, inverse cosine, and inverse sine are available to expressions, as well as a branching conditional: "if less than zero, branch one way; else branch the other way". Constants range in value between -100.0 and 100.0. The following environmental variables are available to expressions:

cartVelocity: the horizontal velocity of the cart
poleAngVelocity: the angular velocity of the inverted pendulum
cartDistance: the distance of the cart from the center of the track
poleAngle: the angle the pole makes with the vertical axis
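Below is a minimal sketch of how such expressions might be represented and evaluated. The original gives no code, so the tuple-based tree encoding, the operator names (including the hypothetical `iflz` for the "if less than zero" conditional), and the guards for division by zero and for inverse-cosine/inverse-sine arguments outside [-1, 1] are all assumptions; the text does not say how those cases are handled.

```python
import math
import random

# Hypothetical encoding: a node is a numeric constant, a variable name (str),
# or a tuple (operator, child, ...).
VARIABLES = ("cartVelocity", "poleAngVelocity", "cartDistance", "poleAngle")


def protected_div(a, b):
    # Assumption: division by (near) zero returns 1.0 instead of raising.
    return a / b if abs(b) > 1e-9 else 1.0


def clamp_unit(x):
    # acos/asin are only defined on [-1, 1]; clamping is one possible guard.
    return max(-1.0, min(1.0, x))


OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
    "cos": math.cos,
    "sin": math.sin,
    "acos": lambda a: math.acos(clamp_unit(a)),
    "asin": lambda a: math.asin(clamp_unit(a)),
}


def evaluate(node, env):
    """Recursively evaluate an expression tree against the environment dict."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return env[node]
    op, *args = node
    if op == "iflz":
        # "If less than zero, branch one way; else branch the other way".
        # Only the chosen branch is evaluated.
        test, if_negative, otherwise = args
        branch = if_negative if evaluate(test, env) < 0.0 else otherwise
        return evaluate(branch, env)
    return OPERATORS[op](*(evaluate(arg, env) for arg in args))


def random_terminal(rng):
    # A leaf is either an environmental variable or a constant in [-100, 100].
    if rng.random() < 0.5:
        return rng.choice(VARIABLES)
    return rng.uniform(-100.0, 100.0)


# Usage: push left when poleAngle is negative, otherwise push right.
env = {"cartVelocity": 0.0, "poleAngVelocity": 0.0,
       "cartDistance": 0.0, "poleAngle": -0.1}
print(evaluate(("iflz", "poleAngle", -150.0, 150.0), env))  # -150.0
```

Evaluating only the selected branch of the conditional keeps the cost of a call proportional to the path actually taken, which matters when every expression is run hundreds of times per trial.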

The simulation is composed of a series of time steps, each representing about one-twentieth of a second in real time. At each time step, the output of the controller tells the simulator how to push the cart: a value of 100.0 or more means apply a force to the right, a value of -100.0 or less means apply a force to the left, and any other value tells the simulator to leave the cart alone until the next time step. The dynamics equations for simulating an inverted pendulum can be found in David Fogel's book, Evolutionary Computation.
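The output thresholds and a single simulation step can be sketched as follows. Because the equations from Fogel's book are not reproduced here, the `step` function substitutes the classic cart-pole equations commonly attributed to Barto, Sutton and Anderson (1983), integrated with a simple Euler step of 0.05 s. The masses, pole length, and 10 N force magnitude are assumed values, not figures from the original, and a positive angle is taken to mean the pole leans toward positive x.

```python
import math

# Assumed constants: classic cart-pole values, a stand-in for Fogel's equations.
GRAVITY = 9.8      # m/s^2
CART_MASS = 1.0    # kg
POLE_MASS = 0.1    # kg
HALF_POLE = 0.5    # m, pivot to the pole's centre of mass
FORCE_MAG = 10.0   # N, constant push magnitude (assumed value)
DT = 0.05          # s, one time step is about 1/20 s


def output_to_force(output):
    """Map a controller's raw output to one of the three actions."""
    if output >= 100.0:
        return FORCE_MAG   # push right
    if output <= -100.0:
        return -FORCE_MAG  # push left
    return 0.0             # leave the cart alone


def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step of length DT."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * HALF_POLE * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - POLE_MASS * HALF_POLE * theta_acc * cos_t / total_mass
    return (x + DT * x_dot,
            x_dot + DT * x_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc)


# Usage: a pole leaning 0.1 rad with no push starts to fall further.
print(step((0.0, 0.0, 0.1, 0.0), output_to_force(42.0)))
```

Pushing the cart toward the side the pole leans on produces a negative angular acceleration, which is why a controller can catch the initially skewed pole at all.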

Criteria for Evaluating the Controller

Each expression is given 3 chances to balance the inverted pendulum, and each time the pole starts at a slightly different angle. Since a trial cannot be allowed to run forever, each one is capped at 800 time steps; a successful controller balances for all 800. Once 800 time steps have passed, or earlier if the cart crashes or the pole falls, the expression is evaluated using such factors as:

Based on these criteria, each expression is awarded points. The higher its score, the better the chance that the expression will be bred in later generations.
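The trial loop and the breeding step can be sketched together. The text does not list its scoring factors, so `fitness` below uses a hypothetical score (total time steps survived across the three trials); the start angles, the 2.4 m track half-length, and the classic cart-pole equations are likewise assumptions. Controllers are modelled as Python callables taking the four environmental variables as keyword arguments. Roulette-wheel (fitness-proportional) selection is shown as one common way to give higher-scoring expressions a better chance of being bred; the original does not name its selection scheme.

```python
import math
import random

# Assumed constants (classic cart-pole values, not taken from the source).
G, M_CART, M_POLE, HALF_POLE, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.05
TRACK_HALF = 2.4                     # hypothetical half-length of the track, m
MAX_STEPS = 800                      # cap on each trial, as in the text
START_ANGLES = (0.05, -0.08, 0.12)   # hypothetical "slightly different" skews, rad


def step(x, x_dot, th, th_dot, force):
    total = M_CART + M_POLE
    temp = (force + M_POLE * HALF_POLE * th_dot ** 2 * math.sin(th)) / total
    th_acc = (G * math.sin(th) - math.cos(th) * temp) / (
        HALF_POLE * (4.0 / 3.0 - M_POLE * math.cos(th) ** 2 / total))
    x_acc = temp - M_POLE * HALF_POLE * th_acc * math.cos(th) / total
    return x + DT * x_dot, x_dot + DT * x_acc, th + DT * th_dot, th_dot + DT * th_acc


def run_trial(controller, start_angle):
    """Return the number of steps simulated before failure, up to MAX_STEPS."""
    x, x_dot, th, th_dot = 0.0, 0.0, start_angle, 0.0
    for t in range(MAX_STEPS):
        out = controller(cartVelocity=x_dot, poleAngVelocity=th_dot,
                         cartDistance=x, poleAngle=th)
        force = FORCE if out >= 100.0 else -FORCE if out <= -100.0 else 0.0
        x, x_dot, th, th_dot = step(x, x_dot, th, th_dot, force)
        # Cart ran off the track, or the pole dropped to the horizontal.
        if abs(x) > TRACK_HALF or abs(th) >= math.pi / 2:
            return t + 1
    return MAX_STEPS


def fitness(controller):
    # Hypothetical score: total steps survived over the three trials.
    return sum(run_trial(controller, a) for a in START_ANGLES)


def select_parent(population, scores, rng):
    """Fitness-proportional (roulette-wheel) selection of one parent."""
    return rng.choices(population, weights=scores, k=1)[0]


# Usage: an idle controller fails quickly from every skewed start.
idle = lambda **env: 0.0
print(fitness(idle))
```

Because every trial scores at least one step, the weights passed to `select_parent` are always positive, so even the weakest expression keeps a small chance of being bred.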

Results

The experiment was successful: a controller that can balance the inverted pendulum indefinitely can usually be evolved in under 200 generations.

References

Fogel, David B. (1995). Evolutionary Computation. Piscataway, NJ: IEEE Press.