Check out our manuscript for this repository.
Alpha Tank is a multi-agent tank battle game built with Pygame and designed for Reinforcement Learning (RL) training. We aim to provide a fully customizable RL pipeline (from environment to learning algorithms) that showcases how an RL agent can learn from its opponents (depending on who they are, perhaps another RL agent (e.g. PPO, SAC) or an intelligent bot (e.g. BFS bot, A* bot)), exploit their characteristics along with the environment setup, and fight against them to optimize its reward.
We support both wandb logging and saving the agent_dict together with the agent parameters in the .pt file (this base config file needs to remain the same for training and inference). Check out real-time training on this wandb report.
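As a minimal sketch only (the exact keys and file layout are assumptions here; the real format is defined by the training scripts and the base config file), bundling the policy weights and the agent parameters into one .pt checkpoint while logging to wandb might look like this:

```python
import torch
import torch.nn as nn
import wandb

# Hypothetical agent parameters -- in the repo these come from the base config file.
agent_parameters = {"algorithm": "ppo", "lr": 3e-4, "gamma": 0.99}

policy = nn.Linear(8, 5)  # placeholder for the real policy network

# Offline mode keeps this sketch runnable without a wandb account.
run = wandb.init(project="alpha_tank", mode="offline", config=agent_parameters)
for step in range(3):
    wandb.log({"reward": float(step)})  # real-time curves appear on the wandb report

# Bundle weights + parameters so inference can rebuild the agent from a single file.
torch.save({"agent_dict": policy.state_dict(),
            "agent_parameters": agent_parameters}, "checkpoint.pt")
run.finish()
```

Because the parameters ride along with the weights, inference can rebuild the agent from the checkpoint alone.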
conda create -n alpha_tank python=3.9
conda activate alpha_tank
pip install -r requirements.txt

| Player | Movement | Shoot | Reset Game |
|---|---|---|---|
| Player 1 | WASD | F | R |
| Player 2 | Arrow Keys | Space | R |
- Bullets will bounce off walls.
- Press `R` to reset the game.
- Press `V` to enable/disable visualizing the tank aiming direction.
- Press `T` to enable/disable visualizing the bullet trajectory.
- Press `B` to enable/disable visualizing the BFS shortest path.
The complete documentation of the environment can be found here.
We support many different modes. To avoid confusion, we will go over them one by one; the general structure is as follows:
algorithms
├── bot_mode
│ ├── single_mode
│ ├── cycle_learning
│ ├── team_mode
├── agent_mode
│ ├── single_mode
│ ├── team_mode
Notice that this is not exactly how the code is structured, but rather a conceptual framework of our system:
- Algorithms: two popular RL algorithms, PPO & SAC. These are the main algorithms for training the learning agent; we will explain more later.
- Bot mode: many different types of human-heuristic bots.
  - Supports cycle training + curriculum learning, but only for the single agent-to-bot mode.
  - Supports team play against a team of agents; teams are fully customizable with mixes of agents, bots, and human players.
- Agent mode: different algorithms fighting against each other.
  - Supports team play against a team of agents; teams are fully customizable with mixes of agents, bots, and human players.
We try to keep our codebase as modular and contained as possible, so we have separated out the base team-playing environment from the single agent-to-agent and agent-to-bot environments while maintaining a coherent API. Similarly, we have separated out the main learning-agent training/inference loop for clarity for now; later on we will add abstract classes for agents, similar to what we have done with the bots.
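As a rough illustration of the abstraction we are aiming for (not the repo's actual classes), a shared bot interface plus a small "bot factory" could look like the sketch below; the class and registry names here are hypothetical:

```python
# Illustrative sketch only -- the repo's bot implementations and factory may differ.
from abc import ABC, abstractmethod

class BaseBot(ABC):
    """Common interface every heuristic bot exposes to the environment."""

    @abstractmethod
    def act(self, observation):
        """Return an action given the current observation."""

class AggressiveBot(BaseBot):
    def act(self, observation):
        return "chase_and_shoot"  # placeholder strategy

class DefensiveBot(BaseBot):
    def act(self, observation):
        return "retreat_and_block"  # placeholder strategy

# Map a --bot-type string to its bot class.
BOT_REGISTRY = {"aggressive": AggressiveBot, "defensive": DefensiveBot}

def make_bot(bot_type: str) -> BaseBot:
    return BOT_REGISTRY[bot_type]()

bot = make_bot("aggressive")
print(bot.act(observation=None))
```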
python play_env.py --mode play
python play_env.py --mode team
python play_env.py --mode bot

We support a variety of "intelligent" (manually crafted strategy) bots/experts, built with our very own bot factory, to train our learning agent. Run the following to see bots fighting against each other (choose from smart, random, aggressive, defensive, dodge); the complete documentation of the environment can be found here.
python bot_arena.py --bot1 defensive --bot2 dodge

For all training, the variable TERMINATE_TIME in this config file is very critical, as it determines whether or not the agent has infinite life during training, promoting chasing and attacking actions. This flag should be set to None during inference at all times.
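To make the role of TERMINATE_TIME concrete, here is a purely illustrative sketch of such a config dictionary; the key names other than TERMINATE_TIME are assumptions, and the authoritative values live in the repo's config file:

```python
# Illustrative only -- the real keys and values are defined in the config file.
TRAIN_ENV_CONFIG = {
    "TERMINATE_TIME": 2000,  # training-time termination control discussed above
    "bot_type": "smart",     # one of: smart, random, aggressive, defensive, dodge
}

# At inference the same base config must be reused, but with TERMINATE_TIME disabled.
INFERENCE_ENV_CONFIG = dict(TRAIN_ENV_CONFIG, TERMINATE_TIME=None)
```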
When training, choose the bot type from smart, random, aggressive, defensive, dodge. All the basic environment configs are taken in as a dictionary specified in this config file.
python train_ppo_bot.py --bot-type smart
python train_ppo_cycle.py
python train_ppo_ppo.py

*Team playing is also a form of cycle learning, just happening all at once.
All the specific config_var entries are taken in as a dictionary specified in this config file. Notice that the experiment_name should be consistent between training and inference.
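As a purely hypothetical illustration of how a config_var entry such as team_vs_bot_configs might be laid out (all keys below are assumptions; the real dictionary is defined in the config file):

```python
# Hypothetical sketch -- the actual structure is defined in the repo's config file.
team_vs_bot_configs = {
    "experiment_name": "2a_vs_2b",     # must stay consistent between training and inference
    "team_a": ["ppo", "ppo"],          # learning agents on team A
    "team_b": ["smart", "defensive"],  # heuristic bots on team B
}
```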
python train_multi_ppo.py --experiment_name 2a_vs_2b --config_var team_vs_bot_configs

When running inference in the single agent-to-bot setting, you can choose the bot type from smart, random, aggressive, defensive, dodge.
python inference.py --mode bot --bot-type smart --weakness 0.1 --algorithm ppo
python inference.py --mode agent --algorithm ppo

Similar to team training, all the configs are taken in as a dictionary specified in this config file. Notice that the experiment_name should be consistent between training and inference. There is no need to pass in the config_var, as everything will be saved in the checkpointing system.
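The sketch below shows why config_var can be omitted at inference time: the checkpoint already carries both the weights and the config it was trained with. The path and dictionary keys are assumptions for illustration only:

```python
import torch

# Hypothetical checkpoint path; the real layout is managed by the checkpointing system.
checkpoint = torch.load("checkpoints/2a_vs_2b.pt", map_location="cpu")
agent_state = checkpoint["agent_dict"]         # network weights
train_config = checkpoint["agent_parameters"]  # config captured at training time
```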
python inference_multi.py --experiment_name 2a_vs_2b

We have provided a list of checkpoints that we have run, for baseline comparisons. Run our trained single agent-to-bot model with the following commands. The demos default to the settings below; you can change the bot-type argument or the experiment_name to load different demo checkpoints.
Single agent checkpoints:
- Single: ppo vs. aggressive bot (octagon)
- Single: ppo vs. smart bot (non-octagon)
Team playing checkpoints:
- Team: 2A ppo vs. 2A ppo (octagon)
- Team: 2A ppo vs. 2B smart (non-octagon)
- Team: 2A ppo vs. 3B smarts + defensive (octagon)
- Team: 2A ppo vs. 4B smarts + defensive (non-octagon)
- Team: 2A ppo vs. 1B defensive (octagon)
For the team playing mode, we support using a joy_stick_controller to play against the agent; you will be replacing one of the bots. This is only supported in demo mode.
python inference.py --mode bot --bot-type aggressive --algorithm ppo --demo True
python inference_multi.py --experiment_name 2a_vs_3b --demo True --joy_stick_controller True

For 2a_vs_2b, 2a_vs_4b, and ppo_vs_smart, the octagon setting needs to be turned off.
