RL-Transformer-Trading-Bot

Python 3.10+ | Ray RLlib | PyTorch | License: MIT

Russian version | English

🧠 Abstract

This project implements a high-performance algorithmic trading agent for Bitcoin futures, leveraging Deep Reinforcement Learning (PPO) powered by a custom Transformer architecture. Unlike standard implementations that rely on LSTM or MLP networks, this system uses a Self-Attention mechanism to capture long-range temporal dependencies and complex market patterns from a lookback window of 336 candles. The agent operates within a realistic simulation environment featuring conservative (pessimistic) liquidation logic and high leverage (30x), demonstrating robust performance with a +57% return on validation data.

🛠️ The Architecture

The core model abandons off-the-shelf components in favor of a domain-specific architecture tailored for financial time series (a minimal PyTorch sketch follows the list):

  • Input Data: A comprehensive lookback window of 336 30-minute candles, providing deep historical context.
  • Feature Fusion:
    • Time Encoding: Dedicated embedding layers for temporal features to capture cyclical patterns (hours, days).
    • Candlestick & Indicator Encoding: Separate encoders process raw OHLCV data and technical indicators (MACD, RSI, etc.), preserving their distinct statistical properties.
    • Account State Encoding: Real-time portfolio metrics (balance, margin, PnL) are injected directly into the attention mechanism.
  • Core: A Custom Transformer Encoder featuring Multi-Head Attention and Residual Connections. This allows the model to dynamically weigh the importance of different historical events regardless of their distance in time.
  • Action Space: A discrete action space (Long, Short, Hold, Close) optimized for trend-following and swing trading strategies.
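Putting the pieces together, a minimal PyTorch sketch of this fusion scheme is shown below. All dimensions, layer names, and the exact way the account-state token is injected are illustrative assumptions; the notebook contains the actual model.

```python
# Minimal sketch of the feature-fusion Transformer described above.
# Dimensions, names, and the account-token injection are assumptions.
import torch
import torch.nn as nn

class MarketTransformer(nn.Module):
    def __init__(self, n_ohlcv=5, n_ind=10, n_account=3,
                 d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Time encoding: embeddings for cyclical features
        self.slot_emb = nn.Embedding(48, d_model // 4)  # 48 half-hour slots/day
        self.dow_emb = nn.Embedding(7, d_model // 4)    # day of week
        # Separate encoders preserve each group's statistical properties
        self.candle_enc = nn.Linear(n_ohlcv, d_model // 4)
        self.ind_enc = nn.Linear(n_ind, d_model // 4)
        # Account state (balance, margin, PnL) becomes an extra token
        self.account_enc = nn.Linear(n_account, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)  # multi-head attention + residuals
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 4)  # Long / Short / Hold / Close logits

    def forward(self, ohlcv, indicators, slot, dow, account):
        # Fuse the per-candle feature groups into one d_model vector
        x = torch.cat([self.candle_enc(ohlcv), self.ind_enc(indicators),
                       self.slot_emb(slot), self.dow_emb(dow)], dim=-1)
        # Prepend the account token so attention can read portfolio state
        tokens = torch.cat([self.account_enc(account).unsqueeze(1), x], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # act from the account token's output

# Smoke test on a batch of 2 windows of 336 candles
model = MarketTransformer()
out = model(torch.randn(2, 336, 5), torch.randn(2, 336, 10),
            torch.randint(0, 48, (2, 336)), torch.randint(0, 7, (2, 336)),
            torch.randn(2, 3))
print(out.shape)  # torch.Size([2, 4])
```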

🔬 Development Journey

The development of this agent followed a rigorous research-driven methodology, acting as a proof-of-concept for RL efficacy in high-friction markets:

  1. Hypothesis: The primary hypothesis was that the Attention mechanism offers superior capability in identifying non-linear market regimes compared to recurrent architectures. Crucially, the experiment aimed to prove that an RL agent can extract profitable strategies using only simple market data (OHLCV + standard indicators) without external datasets, effectively solving the market using pure Price Action.
  2. Environment Design: A custom FuturesTradingEnv was engineered to mimic real-world friction. It implements High Leverage (30x) and Conservative Liquidation Logic, triggering liquidations based on the Low (for Longs) or the High (for Shorts) rather than the Close price. This pessimistic approach creates an extremely hostile environment, effectively stress-testing the agent's ability to survive and profit under harsh conditions (see the liquidation sketch after this list).
  3. Training Process: The training pipeline uses Ray RLlib for distributed PPO optimization. A custom SafeAdamPPOLearner with a manual Learning Rate Schedule was implemented to stabilize convergence and prevent catastrophic forgetting during late-stage training (a schedule sketch also follows the list).
  4. Results: The agent evolved from random behavior to a disciplined trader, achieving a +57% return on unseen validation data, validating the Transformer's ability to generalize even with a 30x leverage constraint.
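As a concrete illustration of step 2's pessimistic rule, here is a minimal liquidation check. The maintenance-margin figure and the function's signature are assumptions; the essential point is that Longs are tested against the candle Low and Shorts against the High, never the Close.

```python
# Conservative intrabar liquidation check (sketch; parameters are assumed).
def is_liquidated(entry_price: float, candle_low: float, candle_high: float,
                  side: str, leverage: float = 30.0,
                  maintenance_margin: float = 0.005) -> bool:
    # Adverse move (fraction of entry) that exhausts the initial margin
    ruin_move = 1.0 / leverage - maintenance_margin  # ~2.83% at 30x
    if side == "long":
        # Pessimistic: assume price touched the candle's Low intrabar
        return candle_low <= entry_price * (1.0 - ruin_move)
    # Short: assume price touched the candle's High intrabar
    return candle_high >= entry_price * (1.0 + ruin_move)

# A 30x long from 50,000 is liquidated if the Low dips below ~48,583
print(is_liquidated(50_000, 48_300, 50_200, "long"))  # True
```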
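And a sketch of the manual learning-rate schedule idea from step 3, implemented as a simple piecewise-constant lookup; the breakpoints and rates are placeholders, not the project's actual values.

```python
# Piecewise-constant LR schedule (illustrative breakpoints and rates).
def lr_at(timestep: int) -> float:
    schedule = [(0, 3e-4), (1_000_000, 1e-4), (5_000_000, 3e-5)]
    lr = schedule[0][1]
    for start, rate in schedule:
        if timestep >= start:
            lr = rate  # take the rate of the last breakpoint passed
    return lr

print(lr_at(0), lr_at(2_000_000), lr_at(6_000_000))  # 0.0003 0.0001 3e-05
```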

📊 Results & Metrics

Key performance indicators extracted from the final validation run:

  • Initial Balance: 1000 USDT
  • Final Balance: 1570.50 USDT (+57.05%)
  • Max Drawdown: Maintained within acceptable risk limits through dynamic position management.
  • Volatility: The agent successfully navigated high-volatility periods without triggering ruin thresholds.

📸 Visuals

Training Log: training progress and metrics

Validation Graph: cumulative return on the validation set

Trading Logs: detailed trade execution logs

💻 Tech Stack

  • Language: Python 3.10
  • ML Core: PyTorch, Ray (RLlib), Gymnasium
  • Data Processing: Pandas, NumPy (fully vectorized feature engineering, designed to avoid look-ahead bias; see the sketch below)
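To illustrate the look-ahead-safe, vectorized style (column names and indicator windows here are assumptions, not the project's actual feature set):

```python
# Vectorized feature engineering with a one-candle shift so the agent
# only ever sees fully closed bars (sketch; columns/windows are assumed).
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["ret_1"] = out["close"].pct_change()          # 1-bar return
    out["sma_20"] = out["close"].rolling(20).mean()   # 20-bar moving average
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)     # simple-MA RSI
    # Shift features by one candle: the value known at bar t is used at t+1,
    # which is what eliminates look-ahead bias
    cols = ["ret_1", "sma_20", "rsi_14"]
    out[cols] = out[cols].shift(1)
    return out.dropna()
```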

🚀 How to Run

  1. Install Requirements:
    pip install numpy pandas mplfinance gymnasium "ray[rllib]" torch
  2. Run the Notebook: Launch Jupyter Notebook and execute RL-Transformer-Trading-Bot.ipynb. The notebook contains the full pipeline: Data Loading -> Training -> Backtesting (the training loop's overall shape is sketched below).
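For orientation, the overall shape of an RLlib PPO run looks like the sketch below. It uses a built-in Gymnasium environment because FuturesTradingEnv is defined inside the notebook, and the hyperparameters are placeholders rather than the project's settings.

```python
# Skeleton of an RLlib PPO training loop (stand-in env and hyperparameters).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # the notebook registers FuturesTradingEnv here
    .framework("torch")
    .training(lr=3e-5, train_batch_size=4000)
)
algo = config.build()
for _ in range(3):
    result = algo.train()
    print("iteration", result["training_iteration"])
algo.stop()
```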

Made for research purposes in algorithmic trading.
