Is Machine Learning Possible?
This gives rise to two questions:
1) How can different indicator signals be combined into one decision?
2) How can you correctly weight the relative reliability of each indicator?
The good thing about chart indicators is that there’s a continuous feedback of decision and outcome. From future prices you can directly measure how good any indicator signal was at any time. In other words, you can categorically confirm whether a buy, sell or hold signal from an indicator was the right decision or not.
This is what I’ll look at in this article. What I describe here is a decision-based trading system that trades on inputs from several chart indicators. This strategy learns the “relative reliability” of each indicator input from experience. In other words, it’s a type of machine learning.
I’ll demonstrate this with a simple automated system to trade Bollinger squeezes. The method can be used equally well in other scenarios and I’d be interested to hear of any results you have in this or other areas.
The Problem of Multiple Inputs
Most algorithmic trading systems rely on more than one indicator to make their trading decisions. The tricky question then arises as to how to interpret the inputs from different indicators in a meaningful way. To understand the problem let’s look at the following toy example:
Suppose I create a simple trading system and I use these conditions to indicate a buy signal:
Indicator 1: Momentum equal to or above 100 (F1)
Indicator 2: MACD between 30 and 50 (F2)
Indicator 3: RSI between 40 and 50 (F3)
Now suppose I find at one instant that Momentum=99.9, MACD=45, and RSI=45.
Indicator 1 is just outside of my hard limit. But indicators 2 and 3 indicate a strong buy. I therefore reject a buy signal at this instant. But was that the right decision or not?
Because I’ve put in hard limits, there is no flexibility or “weighing up” of the inputs from the three indicator measurements. In this case, I may have wrongly rejected a good buy entry signal.
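The hard-limit rule from this toy example can be sketched in a few lines of Python. The function name `hard_limit_buy` is purely illustrative; the thresholds are the ones listed above.

```python
def hard_limit_buy(momentum: float, macd: float, rsi: float) -> bool:
    """Return True only if every indicator sits inside its fixed range."""
    return momentum >= 100 and 30 <= macd <= 50 and 40 <= rsi <= 50


# Momentum misses its limit by 0.1, so the whole signal is rejected,
# even though MACD and RSI are both well inside their ranges.
near_miss = hard_limit_buy(momentum=99.9, macd=45, rsi=45)
clear_hit = hard_limit_buy(momentum=100.0, macd=45, rsi=45)
print(near_miss, clear_hit)
```

A difference of 0.1 in a single input flips the decision, which is exactly the lack of "weighing up" described here.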
Worse still, I could have accepted a buy or sell signal when the opposite would have been the correct choice.
In other words trading decisions are rarely black and white. It could have been the case that during a period, indicator 1 (momentum) had proven less reliable than the others. I could easily check this against real data. In this case, should I logically have accepted the signals from the other indicators and entered a buy side trade?
This highlights several deficiencies with this approach. First, there is no way to account for the relative importance of each indicator. One indicator may be more reliable than another, especially when it lies within a certain range or at certain times.
Second, this method does not handle conflicts easily. For example what should I do when one input is just outside of the hard range, but the others are firmly within?
Third, the logic for trading decisions has to be developed by “trial and error” rather than by a rigorous approach. In other words, it’s all subjective.
The greater the number of indicators used the weaker and more muddled this approach becomes.
Multivariate Analysis
What I really need is a system that “weighs up” all of the inputs together. So what data analysis techniques are available to help with this problem?
One approach is multivariate analysis. With this, I can do several useful things:
- Weight each indicator input according to its predictive reliability (discriminatory value)
- Vary the probability of each outcome (buy, sell, hold) according to other inputs
- Allow the system to adapt to different data or “learn” from experience by itself
The idea of a trading strategy learning as it goes along sounds a bit far-fetched. In practice, though, it is not that difficult to do, and it’s what many advanced trading systems do all the time.
The LDA Model
Let’s consider a simple example trading system. The goal of this system is to identify Bollinger squeezes and trade on the breakouts that often follow them. Sometimes these events are called volatility breakouts and I’ve talked about them elsewhere. An example of one such event is shown in Figure 1 below.
To identify these chart patterns I start by using the following three indicators:
- Bollinger bandwidth (F1)
- Standard deviation (F2)
- Slope of the moving average line (F3)
As short hand, I’ve labelled these F1, F2 and F3. These are the inputs.
The first thing I have to do is train my trading system with these indicators and with the data. To start with I examine a sample of these indicator values from various points along the EUR/USD M5 timeseries. What this training phase will do is examine some data points, and objectively measure how good each of the above indicators (F1, F2, & F3) was at identifying these chart events.
To create a set of training examples, I visually inspected the EURUSD chart and classified a number of “training cases” into one of three clusters: Buy, Sell or Hold.
Part of this data is shown in Table 1.
Table 1: Training samples
Class | F1 | F2 | F3 |
BUY | 4.537 | 1.891 | -0.033 |
BUY | 4.028 | 0.916 | -0.077 |
BUY | 4.621 | 1.940 | -0.138 |
SELL | 5.598 | 1.959 | 0.051 |
SELL | 5.307 | 1.350 | 0.053 |
SELL | 4.995 | 0.769 | 0.055 |
SELL | 4.947 | 0.736 | 0.055 |
HOLD | 10.086 | 2.429 | -0.201 |
HOLD | 14.520 | 17.736 | -0.173 |
HOLD | 28.238 | 23.336 | -0.030 |
HOLD | 39.327 | 22.684 | 0.202 |
HOLD | 41.027 | 20.685 | 0.467 |
The “hold” class is the background case (do nothing). It is not needed in practice but it is useful to generate it to see how clearly the buy/sell classes are separated from the market’s background noise.
From the clusters, I then work out the mean and covariance matrices of the three classes. What do these tell us? The mean and covariances will tell me if the indicators are likely to be any good at discriminating each of the three cases. Namely, the trading decisions to buy, sell or hold.
If the means are close together for each class, then I need the variances to be low. If the clusters are highly spread, this will make classification almost impossible.
If the clusters group tightly with clear boundaries, this will make classification much easier. This will tell me I have picked good indicators as they have greater discriminatory value.
When choosing a set of indicators, it is important to select those with the strongest discriminatory strength. This will normally involve a bit of experimentation and number crunching to find the optimal set for a given pattern.
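One simple way to put a number on "discriminatory strength" is to compare how far apart the class means are with how spread out each class is. The sketch below does this per indicator for the Table 1 rows. The `separation_score` function and its definition (variance of the class means divided by the average within-class variance) are my own illustration, not necessarily the measure used in the spreadsheet, and the data is only the part of the training set shown in the table.

```python
from statistics import mean, pvariance, variance

# Rows from Table 1 (only the part of the training data shown in the article).
TRAINING = {
    "BUY": [(4.537, 1.891, -0.033), (4.028, 0.916, -0.077), (4.621, 1.940, -0.138)],
    "SELL": [(5.598, 1.959, 0.051), (5.307, 1.350, 0.053),
             (4.995, 0.769, 0.055), (4.947, 0.736, 0.055)],
    "HOLD": [(10.086, 2.429, -0.201), (14.520, 17.736, -0.173), (28.238, 23.336, -0.030),
             (39.327, 22.684, 0.202), (41.027, 20.685, 0.467)],
}


def separation_score(samples, column):
    """Variance of the class means divided by the average within-class variance."""
    columns = [[row[column] for row in rows] for rows in samples.values()]
    between = pvariance([mean(values) for values in columns])
    within = mean(variance(values) for values in columns)
    return between / within


# Higher score = class means far apart relative to the spread inside each class.
scores = {f"F{i + 1}": separation_score(TRAINING, i) for i in range(3)}
for name, score in scores.items():
    print(name, round(score, 3))
```

A score like this looks at one indicator at a time, so it ignores correlations between indicators. The covariance matrix used by the classifier takes those into account.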
Figure 2 below illustrates the process of separating classes. I’ve shown just two dimensions here as Indicator 1 and Indicator 2. This method does extend to 3, 4 or even higher dimensions.
Trading Decision to Identify Bollinger Squeezes
Once I have calculated the covariances and means, the next step is to create a function that will make trading decisions from these three indicators. What this will do is create the decision lines as shown in Figure 2. From those decision lines, I can make the optimum trading decision based on my indicators.
To recap, what I am trying to do here is solve the following:
Which class has the highest probability, given the indicator inputs and the prior probability for each class?
In other words, given inputs from the indicators, what action should I take: buy, sell or do nothing.
This should not be a naïve decision. It should be based on the “track record” of how well the indicators produced reliable signals. What I want is weaker signals to be lower weighted and stronger signals to be higher weighted.
There are many solutions to this problem. The simplest, and the one I will describe here, is linear discriminant analysis (LDA). It creates a set of decision boundaries against which I can test any data point. These boundaries tell me the class with the highest probability, given the indicator inputs. The method assumes that the indicator values within each class are normally distributed and that the classes share the same covariance matrix.
Let’s say at tick n, I have the following values from my indicators
X=(F1, F2, F3)
To use the LDA, I put the indicator values into this formula:
Here C^{-1} is the inverse of the covariance matrix, µ are the class means. X is my data point, and p is the prior probability matrix. Both C^{-1} and µ_{i} are from my algorithm training phase.
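For reference, the standard linear discriminant function, written with the symbols defined here, has the form below. This is the textbook expression and I am assuming, rather than confirming, that it matches the spreadsheet term for term. X and µ_{i} are treated as row vectors, and p_{i} is the prior probability of class i.

```latex
f_i = \mu_i \, C^{-1} X^{T} \;-\; \tfrac{1}{2}\, \mu_i \, C^{-1} \mu_i^{T} \;+\; \ln(p_i)
```

The first term measures how well the data point lines up with the class mean, the second corrects for the size of the mean itself, and the third shifts the score according to the prior.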
This function returns three values, one for each class. The class with the highest value is the optimum choice. In effect, the values act as a weighted probability score for each of the three choices: buy, sell or hold.
The prior probability matrix sets out how likely each of the three cases are a priori – or before observed data is considered. You might set the priors to be equal or you might weight one more than the other. For example, in an uptrend the prior for buying might be set higher than selling.
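The effect of the priors can be illustrated with a small sketch. It assumes the prior enters the score as an additive ln(p) term, as it does in the standard discriminant function. The base scores below are made-up numbers, not values from the article's data, and the function names are illustrative.

```python
import math

CLASSES = ("BUY", "SELL", "HOLD")


def apply_priors(base_scores, priors):
    """Add ln(prior) to each class score (assumed additive prior term)."""
    return [score + math.log(p) for score, p in zip(base_scores, priors)]


def decide(scores):
    """Pick the class with the highest score."""
    return CLASSES[scores.index(max(scores))]


# Hypothetical scores before the prior term is added.
base = [1.00, 1.10, 0.20]

equal = decide(apply_priors(base, [1 / 3, 1 / 3, 1 / 3]))  # priors cancel out
uptrend = decide(apply_priors(base, [0.6, 0.2, 0.2]))      # buying favoured a priori
print(equal, uptrend)
```

With equal priors the marginally higher sell score wins. Once buying is weighted more heavily, the same indicator readings tip over into a buy.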
The trading decision chosen is then the one where the above formula gives the highest value. So for example, point 1 is
X = ( 8.6769, 1.2374, -0.0398 )
f = (0.9152, -0.1896, 0.8060)
This classifies as BUY because class 1 (0.9152) is the most probable class.
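The whole calculation can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard linear discriminant with a pooled covariance matrix and equal priors. It is trained only on the rows shown in Table 1, so its scores will not reproduce the values quoted in this article, which come from the full training set. All function names are my own.

```python
import math

# Partial training data from Table 1 (the full set is not shown in the article).
TRAINING = {
    "BUY": [(4.537, 1.891, -0.033), (4.028, 0.916, -0.077), (4.621, 1.940, -0.138)],
    "SELL": [(5.598, 1.959, 0.051), (5.307, 1.350, 0.053),
             (4.995, 0.769, 0.055), (4.947, 0.736, 0.055)],
    "HOLD": [(10.086, 2.429, -0.201), (14.520, 17.736, -0.173), (28.238, 23.336, -0.030),
             (39.327, 22.684, 0.202), (41.027, 20.685, 0.467)],
}


def class_mean(rows):
    """Column-wise mean of a list of indicator rows."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(len(rows[0]))]


def pooled_covariance(samples, means):
    """Within-class scatter summed over classes, divided by (N - number of classes)."""
    dim = len(next(iter(means.values())))
    cov = [[0.0] * dim for _ in range(dim)]
    total = 0
    for label, rows in samples.items():
        total += len(rows)
        for r in rows:
            d = [r[k] - means[label][k] for k in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    cov[a][b] += d[a] * d[b]
    return [[v / (total - len(samples)) for v in row] for row in cov]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def lda_scores(x, means, cov, priors):
    """One discriminant score per class: w.x - 0.5 * w.mu + ln(prior), with w = C^-1 mu."""
    scores = {}
    for label, mu in means.items():
        w = solve(cov, mu)
        scores[label] = (sum(wi * xi for wi, xi in zip(w, x))
                         - 0.5 * sum(wi * mi for wi, mi in zip(w, mu))
                         + math.log(priors[label]))
    return scores


MEANS = {label: class_mean(rows) for label, rows in TRAINING.items()}
COV = pooled_covariance(TRAINING, MEANS)
PRIORS = {label: 1 / 3 for label in TRAINING}


def classify(x):
    """Return the class with the highest discriminant score."""
    scores = lda_scores(x, MEANS, COV, PRIORS)
    return max(scores, key=scores.get)


print(classify((4.4, 1.6, -0.08)))
```

Solving the linear system for each class mean avoids forming the inverse matrix explicitly, which is a little more stable numerically and gives the same scores.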
My Excel spreadsheet which you can download here includes all of the worked examples and will allow you to try this with your own data.
The table below shows examples for points on the chart.
LDA examples
Point | F1 | F2 | F3 | d1 | d2 | d3 | Classification |
#1 | 8.6769 | 1.2374 | -0.0398 | 0.9152 | -0.1896 | 0.8060 | BUY |
#2 | 11.6451 | 3.0811 | 0.1924 | -2.0745 | -0.9070 | -2.1609 | SELL |
#3 | 36.3914 | 11.5561 | -0.3704 | 13.1418 | 4.6028 | 16.7644 | HOLD |
#4 | 11.4298 | 4.4139 | -0.0451 | 0.8009 | -0.3424 | 1.1916 | HOLD |
#5 | 5.1084 | 0.9634 | 0.0547 | -1.8826 | -1.2660 | -2.4716 | SELL |
#6 | 9.7661 | 0.7820 | 0.2417 | -2.6207 | -0.9507 | -3.1333 | SELL |
#7 | 9.3702 | 0.4546 | -0.0602 | 1.8456 | 0.2296 | 1.7654 | BUY |
The EURUSD charts in Figures 3-6 show the data points from the above table.
With the spreadsheet you can adjust the settings, for example, the prior probabilities, and check the results for yourself.
You can also create your own experiment with your own training data and your own indicators. Even if you don’t use the full method, this is a useful exercise because you can find out the discriminatory power of any indicator. That is, you can measure objectively how good it is at generating correct trading signals.
Removing Redundancy: Principal Components Analysis
If your trading strategy relies on a lot of inputs from different indicators, then it might help to simplify this data before making decisions upon it.
There are strong correlations between many chart indicators. This means a set of indicator measurements can contain a lot of redundancy. For example, the Bollinger bandwidth, the standard deviation, and the ATR indicator are all strongly correlated because they are all measures of volatility.
Principal components analysis is a useful way of simplifying the inputs when there are correlations between them. It works by defining a new set of orthogonal transformed variables, or principal components. Component 1 captures the highest degree of variation, component 2 the second highest, and so on. By doing this, it is sometimes possible to reduce a nine or ten dimensional set of indicators to a couple of principal components that capture 90% of the variability. The principal components, rather than the indicators themselves, then become the inputs to the trading decision I described above.
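To make the idea concrete, here is a minimal sketch that finds the first principal component of three correlated indicators using power iteration on their covariance matrix. The readings in `ROWS` are made-up values chosen to move together, not real market data. The inputs are deliberately left unstandardized to keep the code short; in practice you would usually rescale each indicator first so that no single one dominates because of its units.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(dim)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(dim)] for a in range(dim)]


def first_component(cov, iterations=200):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                     for a in range(dim))
    return v, eigenvalue


# Hypothetical readings of three volatility-style indicators that rise and fall together.
ROWS = [
    (4.0, 0.9, 11.8),
    (5.1, 1.4, 15.3),
    (10.2, 2.4, 30.1),
    (14.6, 3.8, 44.2),
    (28.1, 7.1, 83.9),
    (39.4, 9.7, 118.5),
]

cov = covariance_matrix(ROWS)
component, variance = first_component(cov)

# Share of the total variance captured by the first component alone.
explained = variance / sum(cov[k][k] for k in range(len(cov)))
print(round(explained, 4))
```

When the explained share is close to 1, a single component carries almost all of the information in the three indicators, and the other two dimensions can be dropped before classification.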
Bootstrapping
Training your system can be tedious and it’s also prone to errors.
The training part can be automated so that the system “learns” from the outcome of each trading decision on the fly. This is called “bootstrapping”, because the system adapts itself according to the data it comes across.
The bootstrapping phase takes place in a simulated trading environment. When the system becomes stable, only then does the algorithm switch over to live trading.
Bootstrapping is often better than manual training because it allows the trading system to adjust according to market conditions rather than relying on static training data samples.
The bootstrap algorithm tries to find a set of training cases that will maximize the profit with the lowest risk (outcome variability). In this way, the bootstrap scans through the chart data looking for highly specific patterns in the indicators. The ideal patterns have low variance and “high uniqueness”. In other words, the bootstrap can be set up to find an optimized set of training points that separate the classes clearly.
Bootstrap Example
As an example, let’s say I use three indicator measurements as my trading signal. With the bootstrap method, my trading system will initially make randomized decisions. This is pass one. Then based on the outcome of each trade, the bootstrap places those indicator measurements from the trade entry points into the appropriate class. The table below demonstrates how this works.
Pass | Trade | Action | P/L | Bootstrap Class |
1 | #1 | Buy | 60 | Buy |
1 | #2 | Buy | -50 | Sell |
1 | #3 | Sell | -2 | Hold |
1 | #4 | Buy | -4 | Hold |
1 | #5 | Hold | 100 | Buy |
2 | #6 | Buy | 70 | Buy |
2 | #7 | Hold | 2 | Hold |
2 | #8 | Sell | 10 | Sell |
2 | #9 | Buy | -40 | Sell |
2 | #2 | Hold | 2 | Hold |
3 | … | … | … | … |
The bootstrap class is the assignment that the algorithm should have made based on the resulting P/L of the trade in that pass.
On the second pass, the bootstrap uses the assignments of the first pass. This becomes the training set in pass two. In pass two, the trading decisions should be more accurate than in pass one, so the total profit on that pass should be higher. In pass three, the bootstrap uses the assignments from pass two as the training data. And so on. The bootstrapping continues in this way until the trading decisions reach a stable and optimal point. You know the system is stabilizing when the bootstrap makes no changes or fewer changes to each classification on each pass.
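The relabeling step can be sketched as below. The rule is inferred from the example table and is an assumption on my part: a result within a small threshold counts as Hold, a profitable trade keeps its direction, a losing trade flips it, and the P/L of a Hold decision is read as that of a hypothetical long position. The threshold of 5 and both function names are illustrative.

```python
def bootstrap_class(action: str, pnl: float, threshold: float = 5.0) -> str:
    """Label a trade with the class that, in hindsight, would have been right."""
    if abs(pnl) <= threshold:
        return "Hold"                      # market went nowhere
    if action == "Sell":
        return "Sell" if pnl > 0 else "Buy"
    # Buy, or Hold treated as a hypothetical long position.
    return "Buy" if pnl > 0 else "Sell"


def fraction_changed(previous, current):
    """Share of labels that differ between two passes over the same points."""
    return sum(p != c for p, c in zip(previous, current)) / len(current)


# (action, P/L) pairs from the first two passes of the example table.
TRADES = [("Buy", 60), ("Buy", -50), ("Sell", -2), ("Buy", -4), ("Hold", 100),
          ("Buy", 70), ("Hold", 2), ("Sell", 10), ("Buy", -40), ("Hold", 2)]

labels = [bootstrap_class(action, pnl) for action, pnl in TRADES]
print(labels)
```

Tracking `fraction_changed` from one pass to the next gives a simple stopping test: once it stays near zero, the classifications have stabilized.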
Results
I created an algorithmic strategy to test the above ideas. This was tested in several scenarios over a ten-year time frame using both GBP/USD and EUR/USD at the M5 (five-minute) scale.
The patterns I am trading on are Bollinger squeezes and the system is designed to make the correct trading decision at these points in the chart with high probability.
Training phase
The Bollinger squeeze patterns were identified by eye in the first phase. This was done using a separate part of the data series outside of the test frame. This formed the training data set.
Decision phase
Once the training data was generated, the test run then applied the classifier on a different part of the time series to make real trading decisions.
Bollinger squeeze patterns don’t appear very often so the number of trades the system made was quite low. On average there was only about one entry signal per week. The algorithm was also restricted to trade only on high confidence outputs from the classifier. Weak or ambiguous signals were not traded. This decision was based on the weighted output probability from the LDA function.
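One way to implement such a confidence filter is to turn the three discriminant values into probabilities and trade only when the winning class clears a cut-off. The sketch below assumes the scores behave like log-probabilities up to a shared constant, which holds for the standard linear discriminant, so a softmax recovers the class probabilities. The 0.6 cut-off and the function names are illustrative, not the settings used in the test runs.

```python
import math

CLASSES = ("BUY", "SELL", "HOLD")


def posteriors(scores):
    """Softmax of the discriminant scores (shifted by the max for stability)."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def confident_decision(scores, min_probability=0.6):
    """Return BUY or SELL only when that class is probable enough, else None."""
    probs = posteriors(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if CLASSES[best] == "HOLD" or probs[best] < min_probability:
        return None
    return CLASSES[best]


# Scores for points #1 and #2 from the LDA examples table.
point_1 = confident_decision([0.9152, -0.1896, 0.8060])    # BUY only just ahead of HOLD
point_2 = confident_decision([-2.0745, -0.9070, -2.1609])  # SELL clearly ahead
print(point_1, point_2)
```

Under these assumptions point #1 would be skipped, because buy and hold are almost equally likely, while point #2 would be traded as a sell.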
The first chart shows the result on GBP/USD. The system made 640 trades with total profit of $50k.
The second chart shows the result on EUR/USD. The system made 568 trades with total profit of $28k.
The test results show that this method can create a stable and reliable strategy.
Summary
The big challenge of using indicators is that of interpretation. The main questions are:
- Which indicator is reliable, and when?
- How do you make reliable trading decisions based on the indicator outputs?
In this article I’ve demonstrated a simple decision-based strategy that learns and adapts based on the data it sees: real chart data, of which there is an abundance.
I’ve tried this on Bollinger squeeze patterns but the method is very general and can be applied to any chart patterns with any indicators as long as they have numerical output. It can also be used to combine outputs from other expert systems.
Multivariate analysis is used to interpret the outputs and account for the discriminatory value of each indicator. This kind of system can be trained by manually identifying prototype cases. Alternatively, bootstrapping allows the system to learn on its own automatically using an iterative training algorithm.
The Excel spreadsheet for multivariate classifications is included below.
Steve, thank you for this article. Very interesting to me as I am looking at building a neural net EA for a while now.
What do you recommend is the highest number of indicators you can use with this technique? I want to combine the signals of about 10 indicators.
It can extend indefinitely; the only limit is computation time. The inversion of the covariance matrix can be a bottleneck if you are planning on putting in a huge number of indicator outputs.
But in my view more is not always better. Especially as I said above when you look at the redundancy issue that many indicators have. PCA reduction can come in useful and could drastically reduce the data set. With a massive set of inputs there could also be a problem with training and over-fitting of the data.
As for the neural net idea yes that’s certainly a choice for this kind of learning algorithm. There’s also some overlap with the methods.
Is there an EA for this? If yes, where can I download it? Thanks