mmiles | Projects

In this project I implement an evolutionary model, built using genetic algorithms, that plays my first Python project. If you haven't seen it yet, you can play it here.

It uses a system that "evolves" its way of playing, inspired by how nature improves species over generations. At the beginning, the program creates a bunch of players with no knowledge of the environment, who are rewarded for achieving high scores and told they can move up or down.

Initially, they simply press these keys randomly, until, within that chaos, one manages to dodge the first wall. This allows it to score higher than the rest, and that small "success" is passed on to the next generation. The cycle repeats over and over. As the process converges, they begin to discover that dodging walls and collecting stars leads to higher scores.

Below you can watch in real-time how this genetic algorithm is trained. Each generation starts with 20 individuals; when all die, a new generation is created by combining the best genes from the previous one along with some random mutations, and the process starts again. You can try playing against the bot—the longer you let it train, the better it will be.

After the simulator, I leave the more technical details for those who are interested.

AI Live TrainingSelect Training Difficulty:

Training Paused

LIVE
Gen: 1
Alive: 0
Best Score: 0
3
Use 'W' and 'S' to move

Let the AI train for a bit first!

Genetic Algorithms

Genetic algorithms are artificial intelligence techniques based on the principles of biological evolution. They work through iterative selection of candidate solutions, favoring those with better performance.

The process consists of the following stages:

Initial population: Multiple individuals are generated with random parameters (neural networks with randomly initialized weights and connections)
Evaluation: Each individual executes the task and accumulates a score (fitness) based on its performance
Selection: Individuals with higher fitness are selected as parents
Reproduction and mutation: New individuals are created by combining characteristics from the parents and applying random modifications
Iteration: The cycle repeats, progressively improving the population's performance

This approach allows finding complex solutions without the need to explicitly program the desired behavior. Unlike traditional supervised learning, where a labeled dataset is required, genetic algorithms discover solutions through heuristic search in the space of possible configurations.

NEAT: NeuroEvolution of Augmenting Topologies

NEAT (NeuroEvolution of Augmenting Topologies) is a specialized genetic algorithm that simultaneously evolves both the weights and the structure of neural networks. Unlike conventional methods where the architecture is fixed, NEAT starts from minimal networks that become more complex according to the problem's needs.

Main characteristics of NEAT:

Incremental complexification: Starts with networks without hidden layers (minimal topology), adding neurons and connections through structural mutations. This approach solves the evolutionary bootstrapping problem, where initially complex structures hinder convergence
Speciation: Groups similar individuals into species using a compatibility coefficient that measures topological and weight differences. This protects structural innovations that may initially reduce fitness, allowing their refinement before competing with established solutions
Historical innovation markers: Uses unique identifiers (innovation numbers) to perform coherent genetic crossovers between different topologies, solving the alignment problem in crossover of networks with different structures

This approach allows the network's complexity to emerge naturally during training, avoiding both the overfitting of excessively complex architectures and the underfitting of overly simple structures.

Implemented Genetic Operators

The algorithm applies two main types of mutations:

Weight mutations (80% probability):

Gaussian perturbation of existing connections (±0.5 of current value)
Complete weight replacement (10% of mutations)
Allowed range: [-30, 30] to avoid extreme values that destabilize learning

Structural mutations:

Connection addition (50% probability): Creates links between previously disconnected neurons
Node addition (20% probability): Splits an existing connection by inserting an intermediate neuron
Component deletion (50% and 20% respectively): Keeps the population diverse without unnecessary complexity

The configuration uses the tanh activation function, which transforms input values to a bounded range between -1 and 1. This introduces the non-linearity necessary for the network to adjust its behavior flexibly.

Speciation and Diversity

The algorithm maintains compatibility threshold = 3.0, where the genetic distance between two individuals is calculated as:

δ = c₁·(E/N) + c₂·W̄

Where:

E: Disjoint/excess genes (structural differences)
W̄: Average weight difference in matching genes
c₁ = 1.0, c₂ = 0.5: Coefficients that prioritize structural differences over weight variations
N: Normalization by the larger genome size

Species with stagnation > 20 generations (without fitness improvement) are eliminated, except for the 2 best species (species elitism). This balances exploration of new solutions and exploitation of promising strategies.

Project Implementation

The game consists of an endless runner where a spaceship must dodge procedural obstacles (upper and lower columns) while collecting power-ups. The objective is to train an agent using NEAT that learns to maximize survival.

Input/Output Architecture

The neural network processes 5 normalized input variables:

Rocket's vertical position (0-300 pixels): Agent's current state
Horizontal distance to next obstacle (0-1024 pixels): Critical temporal information
Obstacle's vertical position (-200 or 100 pixels): Discriminates between upper/lower threat
Horizontal distance to next power-up (0-3072 pixels): Reward opportunity
Power-up's vertical position (50-250 pixels): Spatial location of the secondary objective

Generates 2 outputs that determine the action through threshold activations (>0.5):

Output 0: Upward vertical displacement (10 pixels/frame)
Output 1: Downward vertical displacement (10 pixels/frame)
Both <0.5: Maintain position (updates internal timers)

This representation faces the temporal credit assignment problem: how to attribute success/failure to actions taken several frames before the event? NEAT solves this implicitly through cumulative fitness.

Multi-Objective Fitness Function

The evaluation combines three components that reflect different aspects of optimal behavior:

Base survival (0.1 points/frame):

Rewards exploration and risk avoidance
At 30 FPS, equals 3 points/second of survival
Implicitly penalizes passive strategies through termination upon collision

Power-up collection (10 points/unit):

Incentivizes active searching behavior
Requires balance between survival and risk-taking
Power-ups appear with 30% probability after each obstacle

Obstacle clearance (1 point/unit):

Temporal progress metric
Obstacles appear approximately every 400 pixels
Correlates directly with survival time

The trade-off between these objectives emerges naturally: maximizing power-ups implies risky maneuvers, while avoiding obstacles favors conservative trajectories.

Difficulty Scaling

The system implements three modes that modify the scenario scroll speed:

Easy (15 px/frame): Reaction window of ~40 frames for an obstacle at typical distance
Normal (25 px/frame): Window of ~24 frames, requires anticipation
Hard (35 px/frame): Window of ~17 frames, demands near-instantaneous responses

This variation fundamentally alters the fitness landscape: higher speeds penalize reactive strategies and favor predictive policies based on distance information. The difficulty increase is not linear due to the game's physics (acceleration limits, gap sizes, etc.).

Training Strategies

Each generation evaluates 20 genomes in parallel for a maximum of 50 generations, with:

Elitism = 2: The two best individuals from each species pass directly to the next generation without mutation
Survival threshold = 20%: Only the top quintile of each species reproduces
Fitness limit = 1000 points: Early termination criterion (approx. 300 seconds of perfect survival)

Training typically converges in no more than 10 generations.

First Project Part 2

AI Live Training

Training Paused