Evolving robust and specialized car racing skills



between neutral and forward drive is also used as a way of
keeping a certain speed, an approach many engineers would
use when designing a controller for a vehicle that can only
be controlled with discrete inputs. Doing so is, however,
practically impossible for a human driver, and analysis of
action traces for human drivers shows far fewer changes
to the commands given. As an example, one of the authors
changes drive command only four times per lap when racing
on track 3.
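To make the contrast concrete, the following is a minimal sketch of speed holding by rapid switching between two discrete drive commands, together with a helper that counts command changes in an action trace. The point-mass model with linear drag, the parameter values, and the names hold_speed and count_changes are illustrative assumptions and are not taken from the simulation used in this paper.

```python
def hold_speed(target, steps=200, dt=0.1, accel=2.0, drag=0.5):
    """Keep speed near `target` by toggling between two discrete commands.

    Assumed toy model (not the paper's physics): a point mass with constant
    thrust when driving forward and drag proportional to speed.
    """
    speed = 0.0
    commands = []
    for _ in range(steps):
        # Bang-bang rule: drive forward below the target, coast above it.
        command = "forward" if speed < target else "neutral"
        thrust = accel if command == "forward" else 0.0
        speed += (thrust - drag * speed) * dt
        commands.append(command)
    return speed, commands


def count_changes(commands):
    """Count how often consecutive commands in an action trace differ."""
    return sum(1 for a, b in zip(commands, commands[1:]) if a != b)


final_speed, trace = hold_speed(target=2.0)
print(f"final speed: {final_speed:.2f}, command changes: {count_changes(trace)}")
```

In this toy model the switching rule changes command on almost every time step once the target speed is reached, which is the kind of trace a human driver would not produce.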

VIII. Conclusions and future work

We believe that the results presented above answer, at least
in part, the several questions posed in section I-B. Different
tracks can indeed be constructed that have various difficulty
levels, in terms of the probability that an evolutionary run
starting from scratch will produce a proficient controller
within a given number of generations, and the mean fitness of
evolved controllers. Difficulty levels range from very easy,
where the evolutionary algorithm almost always succeeds,
to very hard, for which no successful controllers have been
found, and agree with intuitive human difficulty ratings of
the same tracks. These skills are, however, not transferable:
a controller evolved from scratch to perform well on a
given track usually performs very poorly on all other tracks.
Evolving sensor parameters along with network weights
makes for fewer proficient controllers (probably because of
more local optima), a result which is not inconsistent with
the finding in [4] that the good controllers that do emerge
are slightly superior to fixed-sensor ones.

As for the question of whether we can automatically
create controllers with driving skills so general that they can
proficiently race all tracks in our training set, this can be done
by using incremental evolution, going from simpler to more
complex tracks, with sensor mutation turned off. Attempts
to evolve general controllers with sensor mutation turned
on failed, as did attempts to evolve controllers on all tracks
simultaneously. Once a general controller has been created,
its fitness can be increased through continued evolution with
sensor mutation turned on. Specialized controllers can be
created by further evolving a general controller, using only
one track in the fitness function. These specialized controllers
invariably have very high fitness. Much to our surprise, this
was true even for one hard track on which the general
controller had not been evolved, on which it had very low
fitness, and for which we have not been able to evolve proficient
controllers from scratch. Apparently, the general controller
is somehow closer in search space to a proficient controller
for that track, even though it has no proficiency itself on that
track. Exactly how this works remains to be found out.
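The procedure summarized above (incremental evolution over progressively harder tracks, followed by specialization on a single track) can be sketched as follows. This is only an illustration under stated assumptions: the truncation-selection scheme, the use of the minimum fitness over the active tracks, the threshold, and the toy fitness function standing in for the racing simulation are all hypothetical choices and not the exact setup used in our experiments.

```python
import random


def evolve(population, fitness, generations, rng, sigma=0.1):
    """Truncation selection: keep the best half, refill with mutated copies."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: len(ranked) // 2]
        offspring = [[w + rng.gauss(0.0, sigma) for w in parent] for parent in elite]
        population = elite + offspring
    return sorted(population, key=fitness, reverse=True)


def min_fitness_over(tracks, track_fitness):
    """Fitness of a controller = its worst fitness on any of the given tracks."""
    return lambda controller: min(track_fitness(controller, t) for t in tracks)


def incremental_evolution(population, tracks, track_fitness, threshold, rng,
                          generations_per_round=20, max_rounds=50):
    """Add tracks one at a time (easiest first) once the current set is mastered."""
    active = []
    for track in tracks:
        active.append(track)
        fitness = min_fitness_over(list(active), track_fitness)
        for _ in range(max_rounds):
            population = evolve(population, fitness, generations_per_round, rng)
            if fitness(population[0]) >= threshold:
                break  # proficient on all active tracks; move to the next track
    return population


def specialize(population, track, track_fitness, rng, generations=100):
    """Continue evolution from a general population using a single track."""
    return evolve(population, lambda c: track_fitness(c, track), generations, rng)


def toy_track_fitness(controller, track):
    # Hypothetical stand-in for a simulated race: closeness to a target vector.
    return -sum((w - t) ** 2 for w, t in zip(controller, track))


rng = random.Random(0)
tracks = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]  # assumed order: easiest first
population = [[rng.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(20)]
general = incremental_evolution(population, tracks, toy_track_fitness,
                                threshold=-0.3, rng=rng)
specialist = specialize(general, tracks[2], toy_track_fitness, rng)
```

Because the best half of the population is carried over unchanged in every generation, the best specialist can never be worse on its track than the general controller it started from.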

We hope that our results are relevant both to game AI
development, as they suggest a way of using already evolved
solutions as bases for further evolution to quickly and reliably
produce specialized solutions for particular tasks, and to
evolutionary robotics, as they go some way to demonstrate
the scalability and generality of car racing as an ER testbed.

Currently, our efforts are focused on extending the model
to allow competitive coevolution between several vehicles on
the same track. Further, we are planning to use this model to
investigate controller architectures permitting internal state,
such as plastic networks [15] or recurrent networks. We are
also planning to compare evolutionary learning to other
forms of reinforcement learning, such as TD-learning, which
has been shown to be considerably faster in some domains.

However, to be able to handle really complex environments,
the controller will need high-bandwidth sensor data
of some kind, without which complex object recognition
and resolution of perceptual aliasing are impossible. We are
therefore working towards incorporating visual or visual-
like input, using either the current 2D simulation, or some
other 3D-enabled simulator. We have previously tried the
“naive” approach of connecting high-bandwidth (on the order
of 10,000 inputs) 2D vision directly to evolvable single- or
multi-layer perceptrons, but had only limited success. An
integral part of the ongoing project is therefore to develop
a modular architecture which can reuse weights so as to
reduce the dimensionality of the space in which to search for
such controllers.
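As a rough illustration of why weight reuse shrinks the search space, the following sketch compares the number of evolvable parameters in a directly connected layer with that of a small set of shared kernels applied across the input. The 100 x 100 input grid (matching the order of 10,000 inputs), the 10 hidden units, and the 5 x 5 kernels are assumed values chosen for illustration only. They do not describe the architecture under development.

```python
def dense_parameter_count(n_inputs, n_hidden):
    """Weights plus biases when every input connects to every hidden unit."""
    return n_inputs * n_hidden + n_hidden


def shared_kernel_parameter_count(kernel_size, n_kernels):
    """Weights plus one bias per kernel when each kernel is reused at every
    image position, so the count does not depend on the input size."""
    return n_kernels * (kernel_size * kernel_size + 1)


dense = dense_parameter_count(n_inputs=100 * 100, n_hidden=10)
shared = shared_kernel_parameter_count(kernel_size=5, n_kernels=10)
print(dense, shared)  # prints: 100010 260
```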

References

[1] DARPA, “Grand challenge web site,” http://www.grandchallenge.org/,
2005.

[2] S. Nolfi and D. Floreano, Evolutionary robotics. Cambridge, MA:
MIT Press, 2000.

[3] B. F. Skinner, Science and Human Behavior. The Free Press, 1953,
ch. Behaviorism.

[4] J. Togelius and S. M. Lucas, “Evolving controllers for simulated car
racing,” in Proceedings of the Congress on Evolutionary Computation,
2005, pp. 1906-1913.

[5] K. O. Stanley, N. Kohl, R. Sherony, and R. Miikkulainen,
“Neuroevolution of an automobile crash warning system,” in
Proceedings of the Genetic and Evolutionary Computation Conference
(GECCO-2005), 2005.

[6] “Rars (robot auto racing simulator) homepage,”
http://rars.sourceforge.net/, 1995.

[7] D. Floreano, T. Kato, D. Marocco, and E. Sauser, “Coevolution of
active vision and feature selection,” Biological Cybernetics, vol. 90,
pp. 218-228, 2004.

[8] M. Hewat, “Carworld driving simulator,”
http://carworld.sourceforge.net/, 2000.

[9] I. Tanev, M. Joachimczak, H. Hemmi, and K. Shimohara, “Evolution
of the driving styles of anticipatory agent remotely operating a scaled
model of racing car,” in Proceedings of the 2005 IEEE Congress on
Evolutionary Computation (CEC-2005), 2005, pp. 1891-1898.

[10] K. Wloch and P. J. Bentley, “Optimising the performance of a
formula one car using a genetic algorithm,” in Proceedings of the Eighth
International Conference on Parallel Problem Solving From Nature,
2004, pp. 702-711.

[11] D. A. Pomerleau, “Neural network vision for robot driving,” in The
Handbook of Brain Theory and Neural Networks, 1995.

[12] “Forza motorsport drivatars,” http://research.microsoft.com/mlp/forza/,
2005.

[13] D. M. Bourg, Physics for Game Developers. O’Reilly, 2002.

[14] M. Monster, “Car physics for games,”
http://home.planet.nl/~monstrous/tutcar.html, 2003.

[15] D. Floreano and F. Mondada, “Evolution of plastic neurocontrollers for
situated agents,” in From Animals to Animats IV: Proceedings of the
Fourth International Conference on Simulation of Adaptive Behavior,
P. Maes, M. Mataric, J. Meyer, J. Pollack, H. Roitblat, and S. Wilson,
Eds. Cambridge, MA: MIT Press-Bradford Books, 1996.


