Matthew Frederick
matthew@mich.distance.net
Inverted Pendulum
The goal of the inverted pendulum problem is to evolve a controller that keeps the pendulum above the horizontal plane without running off either end of the track. At every time step, the controller has three options at its disposal: push right, push left, or do nothing. The magnitude of the force that the controller can apply is constant. Each time the simulation starts, the pole is initially skewed, so the controller is forced to learn immediately how to deal with gravity pulling the pendulum down.
Once a successful controller has been evolved, it can be tested for robustness. Clicking on the display with the left or right mouse button will add about 5 degrees to the angle of the pole in that direction.
Expressions
Each expression can be composed of basic math operations,
constants, conditionals, and some environmental variables. Add, subtract,
multiply, divide, cosine, sine, inverse cosine, and inverse sine are available
to expressions, as well as a branching conditional "if less than zero branch
one way else branch the other way". Constants range in value between
-100.0 and 100.0. The following environmental variables
are available to expressions:
cartVelocity    | the horizontal velocity of the cart
poleAngVelocity | the angular velocity of the inverted pendulum
cartDistance    | the distance of the cart from the center of the track
poleAngle       | the angle the pole makes with the vertical axis
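As an illustration, an expression of this kind can be represented as a nested tuple and evaluated recursively. The sketch below is a minimal Python interpretation, not the original implementation. The tuple encoding, the operator names (such as `if_ltz`), the protected division that returns 1.0 for a near-zero divisor, the clamping of inverse sine and inverse cosine arguments to [-1, 1], and the handling of non-finite trigonometric inputs are all assumptions, since the text does not say how those cases are handled.

```python
import math

def _trig(fn, x):
    # Assumption: non-finite inputs evaluate to 0.0 instead of raising.
    return fn(x) if math.isfinite(x) else 0.0

def _clamp_unit(x):
    # Assumption: arguments to acos/asin are clamped to [-1, 1].
    return max(-1.0, min(1.0, x))

def evaluate(node, env):
    """Evaluate an expression tree (nested tuples) against env,
    a dict holding the four environmental variables."""
    if isinstance(node, (int, float)):
        return float(node)              # constant in [-100.0, 100.0]
    if isinstance(node, str):
        return env[node]                # environmental variable
    op, *args = node
    if op == "if_ltz":
        # "If less than zero": only the selected branch is evaluated.
        cond, if_negative, otherwise = args
        branch = if_negative if evaluate(cond, env) < 0.0 else otherwise
        return evaluate(branch, env)
    vals = [evaluate(a, env) for a in args]
    if op == "add":
        return vals[0] + vals[1]
    if op == "sub":
        return vals[0] - vals[1]
    if op == "mul":
        return vals[0] * vals[1]
    if op == "div":
        # Assumption: protected division, 1.0 when the divisor is near zero.
        return vals[0] / vals[1] if abs(vals[1]) > 1e-9 else 1.0
    if op == "cos":
        return _trig(math.cos, vals[0])
    if op == "sin":
        return _trig(math.sin, vals[0])
    if op == "acos":
        return math.acos(_clamp_unit(vals[0]))
    if op == "asin":
        return math.asin(_clamp_unit(vals[0]))
    raise ValueError(f"unknown operator: {op}")

# Example: 50 * (poleAngle + poleAngVelocity)
env = {"cartVelocity": 0.0, "poleAngVelocity": 0.2,
       "cartDistance": 0.0, "poleAngle": 0.1}
expr = ("mul", 50.0, ("add", "poleAngle", "poleAngVelocity"))
output = evaluate(expr, env)            # approximately 15.0
```

Evaluating only the selected branch of the conditional keeps the interpreter cheap for deep trees; an evolved expression is simply a larger nesting of the same tuples.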
The simulation is composed of a series of time steps, each representing about one-twentieth of a second of real time. The output of each controller tells the simulator how to push the cart at each time step. A value of 100.0 or more means apply a force to the right. A value of -100.0 or less means apply a force to the left. Any other value tells the simulator to leave the cart alone until the next time step. The dynamics equations for simulating an inverted pendulum can be found in David Fogel's book, Evolutionary Computation.
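The output thresholds and the time step described above can be sketched in Python as follows. The threshold of 100.0 and the step of one-twentieth of a second come from the text; everything else is an assumption. In particular, the dynamics here are the widely used frictionless cart-pole equations with explicit Euler integration, which may differ from the equations in the referenced book, and the masses, pole length, and force magnitude are placeholder values.

```python
import math

DT = 0.05                # one time step, about 1/20 s (from the text)
THRESHOLD = 100.0        # controller output threshold (from the text)

# Assumed physical parameters; the text does not give them.
GRAVITY = 9.8            # m/s^2
CART_MASS = 1.0          # kg
POLE_MASS = 0.1          # kg
POLE_HALF_LENGTH = 0.5   # m
FORCE_MAG = 10.0         # N, constant magnitude of each push

def select_force(output):
    """Map a raw controller output to the force applied this time step."""
    if output >= THRESHOLD:
        return FORCE_MAG         # push right
    if output <= -THRESHOLD:
        return -FORCE_MAG        # push left
    return 0.0                   # leave the cart alone

def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one time step.
    theta is measured from the vertical axis, in radians."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    # Explicit Euler update using the accelerations computed above.
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

# Example: a pole skewed by about 5 degrees, with an output below the threshold.
state = (0.0, 0.0, math.radians(5.0), 0.0)
state = step(state, select_force(42.0))   # no push; gravity starts tipping the pole
```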
Criterion for evaluating the controller
Each expression is given 3 chances to balance the inverted pendulum. Each time, the pole starts at a slightly different angle. Since a controller cannot be allowed to run forever, each run is limited to 800 time steps. Once 800 time steps have passed, the cart has crashed, or the pole has fallen, the expression is evaluated using such factors as:
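The trial procedure in this section can be sketched in Python as follows. The three trials and the 800-step cap come from the text, as does the fall condition (the pole dropping to the horizontal plane); the sketch reads the cap as applying to each trial separately. The track half-length, the starting angles, the `controller` and `step` callables, and the use of total steps survived as the score are hypothetical stand-ins, because the actual evaluation factors are not listed here.

```python
import math

MAX_STEPS = 800                     # cap on each trial (from the text)
START_ANGLES = (0.05, 0.10, -0.05)  # assumption: 3 slightly different skews, in radians
TRACK_HALF_LENGTH = 2.4             # assumption: track limits are not given in the text
FALL_ANGLE = math.pi / 2            # pole at or below the horizontal plane

def run_trial(controller, step, state, max_steps=MAX_STEPS):
    """Return how many time steps the controller survived (at most max_steps).
    state is (x, x_dot, theta, theta_dot); controller(state) returns an action;
    step(state, action) returns the next state."""
    for t in range(max_steps):
        x, _, theta, _ = state
        if abs(x) > TRACK_HALF_LENGTH or abs(theta) >= FALL_ANGLE:
            return t                # cart crashed or pole fell
        state = step(state, controller(state))
    return max_steps

def evaluate_controller(controller, step, start_angles=START_ANGLES):
    """Run one trial per starting angle and sum the steps survived.
    Stand-in score only; the original evaluation factors are not known."""
    return sum(run_trial(controller, step, (0.0, 0.0, angle, 0.0))
               for angle in start_angles)

# Usage with trivial stand-ins: a world that never changes and an idle controller.
def frozen_step(state, action):
    return state

def idle(state):
    return 0.0

score = evaluate_controller(idle, frozen_step)   # every trial reaches the cap
```

Passing `step` and `controller` in as callables keeps the trial loop independent of any particular dynamics model or expression representation.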
Results
The experiment was successful. A controller that can balance the inverted pendulum indefinitely can usually be evolved in under 200 generations.
References
Fogel, David B. (1995). Evolutionary Computation. Piscataway, NJ: IEEE Press.