RL-Transformer-Trading-Bot

Python 3.10+ | Ray RLlib | PyTorch | License: MIT

Russian version | English

🧠 Abstract

This project implements a high-performance algorithmic trading agent for Bitcoin futures, leveraging Deep Reinforcement Learning (PPO) powered by a custom Transformer architecture. Unlike standard implementations that rely on LSTM or MLP networks, this system uses a Self-Attention mechanism to capture long-range temporal dependencies and complex market patterns from a lookback window of 336 candles. The agent operates within a realistic simulation environment featuring conservative (pessimistic) liquidation logic and high leverage (30x), demonstrating robust performance with a +57% return on validation data.

🛠️ The Architecture

The core model abandons off-the-shelf components in favor of a domain-specific architecture tailored for financial time series (a minimal PyTorch sketch follows the list):

  • Input Data: A comprehensive lookback window of 336 30-minute candles, providing deep historical context.
  • Feature Fusion:
    • Time Encoding: Dedicated embedding layers for temporal features to capture cyclical patterns (hours, days).
    • Candlestick & Indicator Encoding: Separate encoders process raw OHLCV data and technical indicators (MACD, RSI, etc.), preserving their distinct statistical properties.
    • Account State Encoding: Real-time portfolio metrics (balance, margin, PnL) are injected directly into the attention mechanism.
  • Core: A Custom Transformer Encoder featuring Multi-Head Attention and Residual Connections. This allows the model to dynamically weigh the importance of different historical events regardless of their distance in time.
  • Action Space: A discrete action space (Long, Short, Hold, Close) optimized for trend-following and swing trading strategies.
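Putting the pieces together, a minimal PyTorch sketch of this fusion scheme is shown below. All dimensions, layer names, and the exact way the account-state token is injected are illustrative assumptions; the notebook contains the actual model.

```python
# Minimal sketch of the feature-fusion Transformer described above.
# Dimensions, names, and the account-token injection are assumptions.
import torch
import torch.nn as nn

class MarketTransformer(nn.Module):
    def __init__(self, n_ohlcv=5, n_ind=10, n_account=3,
                 d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Time encoding: embeddings for cyclical features
        self.slot_emb = nn.Embedding(48, d_model // 4)  # 48 half-hour slots/day
        self.dow_emb = nn.Embedding(7, d_model // 4)    # day of week
        # Separate encoders preserve each group's statistical properties
        self.candle_enc = nn.Linear(n_ohlcv, d_model // 4)
        self.ind_enc = nn.Linear(n_ind, d_model // 4)
        # Account state (balance, margin, PnL) becomes an extra token
        self.account_enc = nn.Linear(n_account, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)  # multi-head attention + residuals
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 4)  # Long / Short / Hold / Close logits

    def forward(self, ohlcv, indicators, slot, dow, account):
        # Fuse the per-candle feature groups into one d_model vector
        x = torch.cat([self.candle_enc(ohlcv), self.ind_enc(indicators),
                       self.slot_emb(slot), self.dow_emb(dow)], dim=-1)
        # Prepend the account token so attention can read portfolio state
        tokens = torch.cat([self.account_enc(account).unsqueeze(1), x], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # act from the account token's output

# Smoke test on a batch of 2 windows of 336 candles
model = MarketTransformer()
out = model(torch.randn(2, 336, 5), torch.randn(2, 336, 10),
            torch.randint(0, 48, (2, 336)), torch.randint(0, 7, (2, 336)),
            torch.randn(2, 3))
print(out.shape)  # torch.Size([2, 4])
```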

🔬 Development Journey

The development of this agent followed a rigorous research-driven methodology, acting as a proof-of-concept for RL efficacy in high-friction markets:

  1. Hypothesis: The primary hypothesis was that the Attention mechanism offers superior capability in identifying non-linear market regimes compared to recurrent architectures. Crucially, the experiment aimed to prove that an RL agent can extract profitable strategies using only simple market data (OHLCV + standard indicators) without external datasets, effectively solving the market using pure Price Action.
  2. Environment Design: A custom FuturesTradingEnv was engineered to mimic real-world friction. It implements High Leverage (30x) and Conservative Liquidation Logic, triggering liquidations based on the Low (for Longs) or the High (for Shorts) rather than the Close price. This pessimistic approach creates an extremely hostile environment, effectively stress-testing the agent's ability to survive and profit under harsh conditions (see the liquidation sketch after this list).
  3. Training Process: The training pipeline uses Ray RLlib for distributed PPO optimization. A custom SafeAdamPPOLearner with a manual Learning Rate Schedule was implemented to stabilize convergence and prevent catastrophic forgetting during late-stage training (a schedule sketch also follows the list).
  4. Results: The agent evolved from random behavior to a disciplined trader, achieving a +57% return on unseen validation data, validating the Transformer's ability to generalize even with a 30x leverage constraint.
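As a concrete illustration of step 2's pessimistic rule, here is a minimal liquidation check. The maintenance-margin figure and the function's signature are assumptions; the essential point is that Longs are tested against the candle Low and Shorts against the High, never the Close.

```python
# Conservative intrabar liquidation check (sketch; parameters are assumed).
def is_liquidated(entry_price: float, candle_low: float, candle_high: float,
                  side: str, leverage: float = 30.0,
                  maintenance_margin: float = 0.005) -> bool:
    # Adverse move (fraction of entry) that exhausts the initial margin
    ruin_move = 1.0 / leverage - maintenance_margin  # ~2.83% at 30x
    if side == "long":
        # Pessimistic: assume price touched the candle's Low intrabar
        return candle_low <= entry_price * (1.0 - ruin_move)
    # Short: assume price touched the candle's High intrabar
    return candle_high >= entry_price * (1.0 + ruin_move)

# A 30x long from 50,000 is liquidated if the Low dips below ~48,583
print(is_liquidated(50_000, 48_300, 50_200, "long"))  # True
```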
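And a sketch of the manual learning-rate schedule idea from step 3, implemented as a simple piecewise-constant lookup; the breakpoints and rates are placeholders, not the project's actual values.

```python
# Piecewise-constant LR schedule (illustrative breakpoints and rates).
def lr_at(timestep: int) -> float:
    schedule = [(0, 3e-4), (1_000_000, 1e-4), (5_000_000, 3e-5)]
    lr = schedule[0][1]
    for start, rate in schedule:
        if timestep >= start:
            lr = rate  # take the rate of the last breakpoint passed
    return lr

print(lr_at(0), lr_at(2_000_000), lr_at(6_000_000))  # 0.0003 0.0001 3e-05
```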

📊 Results & Metrics

Key performance indicators extracted from the final validation run:

  • Initial Balance: 1000 USDT
  • Final Balance: 1570.50 USDT (+57.05%)
  • Max Drawdown: Maintained within acceptable risk limits through dynamic position management.
  • Volatility: The agent successfully navigated high-volatility periods without triggering ruin thresholds.

📸 Visuals

Training Log: training progress and metrics

Validation Graph: cumulative return on the validation set

Trading Logs: detailed trade execution logs

💻 Tech Stack

  • Language: Python 3.10
  • ML Core: PyTorch, Ray (RLlib), Gymnasium
  • Data Processing: Pandas, NumPy (fully vectorized feature engineering, designed to avoid look-ahead bias; see the sketch below)
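To illustrate the look-ahead-safe, vectorized style (column names and indicator windows here are assumptions, not the project's actual feature set):

```python
# Vectorized feature engineering with a one-candle shift so the agent
# only ever sees fully closed bars (sketch; columns/windows are assumed).
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["ret_1"] = out["close"].pct_change()          # 1-bar return
    out["sma_20"] = out["close"].rolling(20).mean()   # 20-bar moving average
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)     # simple-MA RSI
    # Shift features by one candle: the value known at bar t is used at t+1,
    # which is what eliminates look-ahead bias
    cols = ["ret_1", "sma_20", "rsi_14"]
    out[cols] = out[cols].shift(1)
    return out.dropna()
```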

🚀 How to Run

  1. Install Requirements:
    pip install numpy pandas mplfinance gymnasium "ray[rllib]" torch
  2. Run the Notebook: Launch Jupyter Notebook and execute RL-Transformer-Trading-Bot.ipynb. The notebook contains the full pipeline: Data Loading -> Training -> Backtesting (the training loop's overall shape is sketched below).
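For orientation, the overall shape of an RLlib PPO run looks like the sketch below. It uses a built-in Gymnasium environment because FuturesTradingEnv is defined inside the notebook, and the hyperparameters are placeholders rather than the project's settings.

```python
# Skeleton of an RLlib PPO training loop (stand-in env and hyperparameters).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # the notebook registers FuturesTradingEnv here
    .framework("torch")
    .training(lr=3e-5, train_batch_size=4000)
)
algo = config.build()
for _ in range(3):
    result = algo.train()
    print("iteration", result["training_iteration"])
algo.stop()
```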

Made for research purposes in algorithmic trading.
