Q-learning and its application to Forex
An Application of Deep Reinforcement Learning to Algorithmic Trading
During the last couple of weeks, I’ve been writing about the power of algorithms for forex trading. Forex is no more than a hobby for me, something that excites me when I get it right and teaches me something when I get it wrong. Like a couple of weeks ago, when I assumed the USD/MXN would go up due to the change of administration. I got it wrong, but I learned two things. First, there is a whole set of parameters that impact the ratio between one currency and another. Second, I had no lean methodology for keeping track of my trades. I think it is more like when you are building software: you have a stack, a set of hypotheses, and milestones to achieve, and you work with that. I like forex because, in the end, the incentives and disincentives are right there. It is not rocket science; you just need to figure out how to follow them and how to weigh them, a perfect job for Machine Learning.
This week I read a great paper about algorithmic trading and Q-learning.
The act of taking an action and claiming a reward when the outcome is desirable is a Pavlovian method of training that reinforces that action. Reinforcement Learning (RL) refers to such a process applied through machine learning, where an agent learns which actions to take in an environment to maximize cumulative reward. The agent learns from the outcomes of its actions, without being explicitly programmed with task-specific rules. What happens when we give these algorithms the ability to learn from experience? This is where RL comes into play, and more specifically, Deep Q-Learning.
In this article I try to show, in a more relatable way, how Deep Q-Learning can be applied to algorithmic trading, focusing on a step-by-step use case inspired by the research paper on the Trading Deep Q-Network (TDQN).
What is Deep Q-learning?
Deep Q-learning is a type of reinforcement learning where an agent (in our case, a trading algorithm) interacts with an environment (the stock market) to maximize cumulative rewards (profits). The agent takes actions (buy, sell, hold) and receives feedback (profits or losses) to improve future decisions.
The “Q” in Q-Learning stands for the quality of an action, representing how good a specific action is for the agent in a particular situation.
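To make the idea of “quality” concrete, here is the classic tabular Q-Learning update rule. This is a minimal sketch that borrows our trading actions as labels, not the paper’s implementation:

```python
# Tabular Q-Learning update (illustrative; deep variants replace the table with a network).
from collections import defaultdict

alpha = 0.1   # learning rate: how fast new information overwrites old estimates
gamma = 0.95  # discount factor: how much future rewards count today

Q = defaultdict(lambda: {"buy": 0.0, "sell": 0.0, "hold": 0.0})

def update_q(state, action, reward, next_state):
    # Bellman update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```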
Step-by-Step: Deep Q-Learning in Algorithmic Trading
Let’s walk through a practical experiment using Deep Q-Learning in algorithmic trading. This use case is inspired by the TDQN approach and aims to maximize the Sharpe ratio (a performance measure) by making intelligent trading decisions.
Step 1: Problem Definition
In algorithmic trading, the goal is to determine whether to take a long (buy) or short (sell) position in a stock at any given time. For this, the trading algorithm must analyze stock market data and decide the best action based on past performance and future predictions.
Objective: Maximize profits while minimizing risk.
Performance Measure: Use the Sharpe ratio to balance returns with risk.
Step 2: Data Collection
To train our Deep Q-Learning agent, we need historical stock market data. For simplicity, we focus on Open-High-Low-Close-Volume (OHLCV) data, which provides key information about stock prices and trading volumes and is available on platforms such as TradingView.
In our case, we divide the data into two sets, as the loading sketch after this list shows:
Training Set: Used to teach the algorithm how to trade.
Test Set: Used to evaluate how well the algorithm performs on unseen data.
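As a rough sketch, assuming the candles have been exported to a CSV file (the file name and column names here are hypothetical), loading and splitting could look like this:

```python
# Load daily OHLCV candles and split them chronologically (file name is hypothetical).
import pandas as pd

df = pd.read_csv("ohlcv_daily.csv", parse_dates=["Date"], index_col="Date")
df = df[["Open", "High", "Low", "Close", "Volume"]].sort_index()

# Never shuffle a time series before splitting: shuffling would leak
# future prices into the training set.
split = int(len(df) * 0.8)
train_df, test_df = df.iloc[:split], df.iloc[split:]
```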
Step 3: Environment Setup
In reinforcement learning, the environment is where the agent (algorithm) interacts. For our use case, the stock market becomes the environment, and at every time step (a trading day), the agent makes decisions; a simplified sketch of such an environment follows the list below.
States: The agent observes market conditions such as past stock prices and its current portfolio (e.g., how many shares it owns and how much cash it has).
Actions: The agent can either buy, sell, or hold the stock.
Rewards: The agent earns rewards based on its profit (increase in portfolio value) after making a decision.
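Here is a deliberately stripped-down sketch of such an environment. The class name, the fixed one-share trades, and the absence of transaction costs are my simplifications; the TDQN paper’s environment is richer.

```python
# A simplified trading environment (illustrative; the paper's version handles
# transaction costs, position sizing, and richer state features).
import numpy as np

class SimpleTradingEnv:
    ACTIONS = ("hold", "buy", "sell")  # action indices 0, 1, 2

    def __init__(self, prices, window=30, cash=10_000.0):
        self.prices, self.window, self.start_cash = prices, window, cash

    def reset(self):
        self.t, self.cash, self.shares = self.window, self.start_cash, 0
        return self._state()

    def _state(self):
        # State: a window of recent prices plus the current holdings.
        recent = self.prices[self.t - self.window:self.t]
        return np.concatenate([recent, [self.cash, self.shares]])

    def step(self, action):
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:    # buy one share
            self.cash -= price
            self.shares += 1
        elif action == 2 and self.shares > 0:     # sell one share
            self.cash += price
            self.shares -= 1
        self.t += 1
        done = self.t >= len(self.prices) - 1
        # Reward: mark-to-market change in portfolio value after the price move.
        reward = self.shares * (self.prices[self.t] - price)
        return self._state(), reward, done
```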
Step 4: Deep Q-Learning Algorithm
Here’s where the magic of Deep Q-Learning happens. The agent uses a neural network to approximate the best possible actions by learning from past experiences; a minimal training sketch follows the list below.
Q-Table: Traditional Q-Learning uses a table to store the quality of actions in various states. However, since stock market states are complex, we use a Deep Q-Network (DQN) to approximate the Q-values.
Experience Replay: To improve learning stability, the algorithm stores past experiences (state, action, reward, next state) and randomly samples them during training. This helps prevent the algorithm from overfitting to recent market conditions.
Discount Factor (γ): The discount factor controls how much weight future rewards carry relative to immediate ones; with γ below 1, the algorithm values immediate rewards more than distant ones.
Training Process: During training, the algorithm repeatedly adjusts its Q-values to maximize its expected future rewards. Over time, the agent becomes more confident in its decisions.
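Putting the last three ideas together, here is a compressed sketch in PyTorch. The layer sizes, buffer size, and learning rate are illustrative choices rather than the paper’s exact settings; a STATE_DIM of 32 matches the 30-day price window plus cash and share count from the environment sketch above.

```python
# Minimal DQN with experience replay (illustrative hyperparameters, not TDQN's exact settings).
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 32, 3, 0.95

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),          # one Q-value per action: hold, buy, sell
)
target_net = copy.deepcopy(q_net)      # slowly-updated copy that stabilizes the targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
replay = deque(maxlen=100_000)         # experience replay buffer of (s, a, r, s', done)

def train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    # Random sampling breaks the temporal correlation of consecutive market days.
    states, actions, rewards, next_states, dones = zip(*random.sample(replay, batch_size))
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Current estimates Q(s, a) for the actions that were actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped targets: r + gamma * max_a' Q_target(s', a'); no future term at episode end.
    with torch.no_grad():
        targets = rewards + GAMMA * (1 - dones) * target_net(next_states).max(1).values
    loss = nn.functional.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```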
Step 5: Action Execution
Once trained, the agent is ready to trade! At each time step, it observes the current market conditions and selects the best action (buy, sell, or hold) based on its Q-values (a minimal selection snippet follows below).
If the agent predicts an upward trend, it might buy stock.
If it predicts a drop, it may sell to avoid losses.
Each action directly impacts the agent’s portfolio value, and the goal is to consistently increase this value while minimizing risks.
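A minimal selection routine, reusing the q_net and N_ACTIONS from the training sketch above, might look like this; epsilon is set to zero at test time so the agent acts purely greedily:

```python
# Epsilon-greedy action selection; at test time epsilon=0 means pure exploitation.
import torch

def choose_action(state, epsilon=0.0):
    if torch.rand(1).item() < epsilon:            # explore: random action
        return int(torch.randint(0, N_ACTIONS, (1,)).item())
    with torch.no_grad():                         # exploit: best-known action
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax())
```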
Step 6: Performance Evaluation
To evaluate the success of the trading strategy, we use the Sharpe ratio, which measures risk-adjusted returns. In other words, it shows how much profit the agent makes relative to the risk it takes.
After running the trading strategy on the test set, we calculate the following metrics, sketched in code after the list:
Profit and Loss: How much money did the agent make overall?
Sharpe Ratio: How well did the agent balance risk and reward?
Maximum Drawdown: What was the largest drop in portfolio value during trading?
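All three metrics fall out of the series of daily portfolio values the agent produced on the test set; here is a small numpy sketch (the function and variable names are mine):

```python
# Compute P&L, annualized Sharpe ratio, and maximum drawdown from daily portfolio values.
import numpy as np

def evaluate(portfolio_values, periods_per_year=252):
    values = np.asarray(portfolio_values, dtype=float)
    daily_returns = values[1:] / values[:-1] - 1.0

    pnl = values[-1] - values[0]                   # overall profit and loss
    # Sharpe ratio, annualized, with the risk-free rate assumed to be zero.
    sharpe = np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()
    # Maximum drawdown: the worst peak-to-trough decline, as a negative fraction.
    running_peak = np.maximum.accumulate(values)
    max_drawdown = ((values - running_peak) / running_peak).min()
    return pnl, sharpe, max_drawdown
```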
The TDQN algorithm, as demonstrated in the research, often outperforms traditional strategies like buy-and-hold by learning to adapt to various market conditions.
What’s next?
Algorithmic trading using Deep Q-Learning offers a powerful tool for making more informed, data-driven trading decisions. I think that, as in any other profession, this kind of sophistication will become part of the basic setup for performing your trading tasks in the near future.
If you want to dig deeper into it, here is the paper:
An Application of Deep Reinforcement Learning to Algorithmic Trading
And a shout-out to this Medium article that ran the experiments for us.