Reinforcement Learning: Theory and Implementation in Gaming Environments

Reinforcement Learning (RL) has emerged as a powerful paradigm in artificial intelligence, enabling agents to learn optimal behaviors through interaction with their environment. In this blog post, I present a short review of RL algorithms and their implementation in gaming environments, with a particular focus on the RLs project.
Literature Review
The foundation of modern RL is laid out by Sutton and Barto (2018), who formalize the Markov Decision Process (MDP) framework. In this framework, an agent interacts with an environment through states, actions, and rewards, aiming to maximize cumulative reward over time.
Image 1: RL Approach based on environment knowledge
- Model-Free Approaches
Model-free RL algorithms have gained significant attention due to their ability to learn directly from experience. Q-learning, introduced by Watkins (1989), represents a fundamental approach where the agent learns action-value functions through temporal difference learning. The Q-value update rule is given by:
[ Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'}Q(s',a') - Q(s,a)] ]
where (\alpha) is the learning rate and (\gamma) is the discount factor.
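To make this concrete, here is a minimal tabular sketch of the update, assuming a NumPy Q-table indexed by discrete states and actions (the names are illustrative, not taken from the RLs code):

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of a tabular action-value function."""
    td_target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (td_target - Q[s, a])    # move Q(s, a) toward the target
    return Q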
Deep Q-Networks (DQN), proposed by Mnih et al. (2015), extended Q-learning by incorporating deep neural networks to approximate Q-values. Key innovations included:
- Experience replay buffer
- Target network
- Double Q-learning (Van Hasselt et al., 2016)
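To illustrate how the target network and Double Q-learning fit together, here is a hedged sketch of how minibatch TD targets could be computed; online_model and target_model are illustrative Keras models, not names from the RLs code:

import numpy as np

def double_dqn_targets(online_model, target_model, states, actions,
                       rewards, next_states, dones, gamma=0.99):
    """Build TD targets for a sampled minibatch, Double-DQN style."""
    targets = online_model.predict(states, verbose=0)           # current Q(s, a) estimates
    next_online = online_model.predict(next_states, verbose=0)  # online net *selects* a'
    next_target = target_model.predict(next_states, verbose=0)  # target net *evaluates* a'
    best_actions = np.argmax(next_online, axis=1)
    max_next_q = next_target[np.arange(len(states)), best_actions]
    targets[np.arange(len(states)), actions] = rewards + gamma * max_next_q * (1.0 - dones)
    return targets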
Policy gradient methods, such as REINFORCE (Williams, 1992) and PPO (Schulman et al., 2017), directly optimize the policy function:
[ \nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s)\, Q^\pi(s,a)] ]
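As a toy illustration of this gradient, here is a minimal REINFORCE-style update for a linear softmax policy, using the Monte Carlo return in place of Q^\pi(s,a); it is a plain-NumPy sketch, not the PPO algorithm or the RLs implementation:

import numpy as np

def softmax_policy(theta, s):
    """Action probabilities of a linear softmax policy; theta has shape (n_actions, n_features)."""
    logits = theta @ s
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    """episode: list of (state_features, action, reward) tuples from one rollout."""
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G                     # Monte Carlo return from this step onward
        probs = softmax_policy(theta, s)
        grad_log = -np.outer(probs, s)        # d log pi(a|s) / d theta for all actions
        grad_log[a] += s                      # extra term for the action actually taken
        theta += alpha * G * grad_log         # stochastic gradient ascent on J(theta)
    return theta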
- Model-Based Approaches
Model-based RL methods learn a model of the environment dynamics. Dyna-Q (Sutton, 1990) combines real experience with simulated experience from the learned model. Monte Carlo Tree Search (MCTS), popularized by Silver et al. (2016) in AlphaGo, uses simulation and tree search to plan actions.
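A compact sketch of Dyna-Q's planning loop shows how simulated experience from the learned model supplements real transitions; the dictionary-based model and the n_planning_steps parameter below are illustrative choices, assuming deterministic dynamics:

import random
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next, alpha=0.1, gamma=0.99, n_planning_steps=10):
    """One real Q-learning update followed by several simulated (planning) updates."""
    # Direct RL: learn from the real transition
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    # Model learning: remember the observed outcome for this (state, action) pair
    model[(s, a)] = (r, s_next)
    # Planning: replay randomly chosen, previously observed pairs from the model
    for _ in range(n_planning_steps):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])
    return Q, model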
Implementation: RLs Project
The RLs project demonstrates these RL methods in practice through a modular framework designed for experimentation in gaming environments. The source code is available at: https://github.com/Genereux-akotenou/RLs
Project Structure
- algorithms/: Implementations of Q-Learning, DQN, SARSA, etc.
- environments/: Includes standard (FrozenLake) and custom (EscapeGame) environments.
- config/: Configuration files for experiments.
- prebuilt/: Stores trained models.
- x-samples/: Notebooks and demos.
- main.py: Launcher for experiments.
Setup Instructions
- Clone the repository and create a virtual environment:
git clone https://github.com/Genereux-akotenou/RLs.git
cd RLs
python -m venv .rl_env
source .rl_env/bin/activate
pip install -r requirements.txt
- Use .rl_env/bin/python as the kernel in notebooks.
Running Experiments
- Test a Pretrained Model
python main.py --env "FrozenLake" --algo "DQN" --mode "test" --test_episodes 3 --verbose "1" --model_path "prebuilt/frozenlake-v1/weights_0150.weights.h5"
- Train a DQN Agent
python main.py --env "FrozenLake" --algo "DQN" --mode "train" --output_dir "prebuilt/frozenlake-v1" --map "SFFF" "FHFH" "FFFH" "HFFG" --batch_size 32 --n_episodes 1000 --max_steps 300 --verbose "0"
- EscapeGame Environment
  - Fully custom logic and graphical visualization.
  - Uses chambers and rules of motion defined in escape_game.py.
  - Standalone demo: Escape Game Notebook
- Add New Algorithms
  - Add to algorithms/ with minimal config.
  - Existing examples: DQN, Q-Learning, SARSA
Sample Algorithm: DQN
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam

class DQNAgent:
    def __init__(...):
        ...
        # Feed-forward network approximating Q(s, a): one linear output per action
        self.model = Sequential([
            Input(shape=(state_size,)),
            Dense(32, activation='relu'),
            Dense(action_size, activation='linear')
        ])
        # Mean-squared error between predicted Q-values and TD targets
        self.model.compile(loss='mse', optimizer=Adam(...))
EscapeGame Environment Highlights
- Custom graph-like chamber connections.
- Sparse reward and deterministic transitions.
- Visualized using matplotlib.
- Q-table convergence shown in standalone notebook.
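To give a feel for how such an environment can be expressed, here is a minimal, hypothetical chamber-graph environment in a gym-like style; the actual layout, actions, and rewards in escape_game.py may differ:

class MiniEscapeEnv:
    """Toy chamber-graph environment: deterministic moves, sparse reward at the exit."""
    CHAMBERS = {0: [1], 1: [0, 2], 2: [1, 3], 3: []}   # hypothetical adjacency; chamber 3 is the exit

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        doors = self.CHAMBERS[self.state]
        if action < len(doors):          # a valid door index moves to the connected chamber
            self.state = doors[action]
        done = self.state == 3           # episode ends when the exit chamber is reached
        reward = 1.0 if done else 0.0    # sparse reward: only reaching the exit pays off
        return self.state, reward, done, {}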
Results and Observations
- Q-Learning:
  - Efficient for simple, discrete environments.
  - Struggles in high-dimensional or sparse-reward settings.
- DQN:
  - Leverages neural networks for state-action value approximation.
  - Benefits from replay buffers and target networks.
- EscapeGame:
  - Designed to test logical reasoning.
  - The agent successfully learns to escape after a few episodes.
Future Enhancements
- Add multi-agent support.
- Extend to continuous state/action spaces.
- Benchmark multiple algorithms on the same environment.
- Enable distributed training and visualization logs.
References
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- Mnih, V. et al. (2015). Human-level control through deep reinforcement learning. Nature.
- Van Hasselt, H. et al. (2016). Deep Reinforcement Learning with Double Q-learning. AAAI.
- Silver, D. et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature.
- Schulman, J. et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint.
The RLs project is available at: https://github.com/Genereux-akotenou/RLs