Introduction

The foreign exchange market, known as Forex or FX, is a financial market where currencies are bought and sold simultaneously. Forex is the world’s largest financial market, with a daily trading volume of more than $5 trillion. It is a decentralized market that operates 24 h a day, except for weekends, which makes it quite different from other financial markets.

The characteristics of Forex show differences compared to other markets. These differences can bring advantages to Forex traders for more profitable trading opportunities. Some of these advantages include no commissions, no middlemen, no fixed lot size, low transaction costs, high liquidity, almost instantaneous transactions, low margins/high leverage, 24-h operations, no insider trading, limited regulation, and online trading opportunities. Two types of techniques are used to predict future values for typical financial time series—fundamental analysis and technical analysis—and both can be used for Forex. The former uses macroeconomic factors while the latter uses historical data to forecast the future price or the direction of the price.

The main decision in Forex involves forecasting the directional movement between two currencies. Traders can profit from transactions with correct directional prediction and lose with incorrect prediction. Therefore, identifying directional movement is the problem addressed in this study.

We chose the Euro/US dollar (EUR/USD) pair for the analysis since it is the largest traded Forex currency pair in the world, accounting for more than 80% of the total Forex volume.

In recent years, deep learning tools, such as long short-term memory (LSTM), have become popular and have been found to be effective for many time-series forecasting problems. In general, such problems focus on determining the future values of time-series data with high accuracy. However, in direction prediction problems, accuracy cannot be defined as simply the difference between actual and predicted values. Therefore, a novel rule-based decision layer needs to be added after obtaining predictions from LSTMs.

In this work, we propose a hybrid model composed of a macroeconomic LSTM model and a technical LSTM model, named after the types of data they use. We first separately investigated the effects of these data on directional movement. After that, we combined the results to significantly improve prediction accuracy. The macroeconomic LSTM model utilizes several financial factors, including interest rates, Federal Reserve (FED) funds rate, inflation rates, Standard and Poor’s (S&P) 500, and Deutscher Aktien IndeX (DAX) market indexes. Each factor has important effects on the trend of the EUR/USD currency pair. This can be interpreted as a fundamental analysis of price data. The other model is the technical LSTM model, which takes advantage of technical analysis. Technical analysis is based on technical indicators that are mathematical functions used to predict future price action. The feature set in our model uses popular technical indicators such as moving average (MA), moving average convergence divergence (MACD), rate of change (ROC), momentum, relative strength index (RSI), Bollinger bands (BB), and the commodity channel index (CCI).

The contributions of this study are as follows:

  • A popular deep learning tool called LSTM, which is frequently used to forecast values in time-series data, is adopted to predict direction in Forex data.

  • Both macroeconomic and technical indicators are used as features to make predictions.

  • A novel hybrid model is proposed that combines two different models with smart decision rules to increase decision accuracy by eliminating transactions with weaker confidence.

  • The proposed model and baseline models are tested using recent real data to demonstrate that the proposed hybrid model outperforms the others.

The rest of this paper is organized as follows. The “Related work” section examines related studies of the financial time-series prediction problem. The “Forex preliminaries”–“Technical indicators” sections provide background information about Forex, LSTM, and the technical indicators. Then, “The data set” section presents the data set used in the experiments. The “LSTM-based hybrid model using macroeconomic and technical indicators” section introduces the proposed algorithm for the directional movement prediction problem and explains the preprocessing and postprocessing phases in detail. The “Experiments” section presents the results of the experiments and the classification performances of the proposed model. The “Discussion” and “Conclusion” sections discuss the experimental results and provide insight for future research directions.

Related work

Various forecasting methods have been considered in the finance domain, including machine learning approaches (e.g., support vector machines and neural networks) and new methods such as deep learning. Unfortunately, there are not many survey papers on these methods. Cavalcante et al. (2016), Bahrammirzaee (2010), and Saad and Wunsch (1998) have provided overviews of the field. The most recent of these, by Cavalcante et al. (2016), categorized the approaches used in different financial markets. Although that study mainly introduced methods proposed for the stock market, it also discussed applications for foreign exchange markets.

There has been a great deal of work on predicting future values in stock markets using various machine learning methods. We discuss some of them below.

Selvamuthu et al. (2019) used neural networks based on Levenberg–Marquardt, scaled conjugate gradient, and Bayesian regularization for stock market prediction based on tick data and 15-min-interval data for an Indian company.

Patel et al. (2015b) developed a two-stage fusion structure to predict the future values of the stock market index for 1–10, 15, and 30 days using 10 technical indicators. In the first stage, support vector machine regression (SVR) was applied to these inputs, and the results were fed into an artificial neural network (ANN). SVR and random forest (RF) models were used in the second stage. They compared the fusion model with standalone ANN, SVR, and RF models. They reported that the fusion model significantly improved upon the standalone models.

Guresen et al. (2011) explored several ANN models for predicting stock market indexes. These models include multilayer perceptron (MLP), dynamic artificial neural network (DAN2), and hybrid neural networks with generalized autoregressive conditional heteroscedasticity (GARCH). Applying mean-square error (MSE) and mean absolute deviation (MAD), their results showed that MLP performed slightly better than DAN2 and GARCH-MLP while GARCH-DAN2 had the worst results.

Weng et al. (2018) developed a financial expert system using ensemble methods (i.e., neural network regression ensemble (NNRE), support vector regression ensemble (SVRE), boosted regression tree (BRT), and random forest regression (RFR)) to predict stock prices 1 day ahead. Market prices, technical indicators, financial news, Google Trends, and the number of unique visitors to Wikipedia pages were used as inputs. They also investigated the effect of PCA on performance. They reported that ensembles with PCA performed better than those without PCA. They also noted that BRT and RFR were the best while SVRE was the worst in terms of mean absolute percentage error.

Huang et al. (2005) examined forecasting weekly stock market movement direction using SVM. They compared SVM with linear discriminant analysis, quadratic discriminant analysis, and Elman back-propagation neural networks. They also proposed a model that combined SVM with other classifiers. They used not only the NIKKEI 225 index but also macroeconomic variables as features for the model. Their direction calculation was based on the first-order difference natural logarithmic transformation, and the directions were either increasing or decreasing. SVM outperformed the other models with an accuracy of 73% while the combined model was the best, with an accuracy of 75%.

Kara et al. (2011) compared the performance of ANN and SVM for predicting the direction of stock price index movement. Ten technical indicators were used as inputs for the model. They found that ANN, with an accuracy of 75.74%, performed significantly better than SVM, which had an accuracy of 71.52%.

Patel et al. (2015a) compared the performance of four classifiers (ANN, SVM, random forest, and naive Bayes) for stock price index direction using two approaches. In the first approach, they used 10 technical indicator values as inputs with different parameter settings for classifiers. Prediction accuracy fell within the range of 0.7331–0.8359. In the other approach, they represented the same 10 technical indicator results as directions (up and down), which were used as inputs for the classifiers. Using this approach, they enhanced accuracy by about 15% for all of the classifiers. Although their experiments concerned short-term prediction, the direction period was not explicitly explained.

Ballings et al. (2015) evaluated ensemble methods (random forest, AdaBoost, and kernel factory) against neural networks, logistic regression, SVM, and k-nearest neighbor for predicting 1 year ahead. They used different stock market domains in their experiments. According to the median area under curve (AUC) scores, random forest showed the best performance, followed by SVM, kernel factory, and AdaBoost.

Hu et al. (2018) introduced an improved sine–cosine algorithm (ISCA) for optimizing the weights and biases of BPNN to predict the directions of open stock prices of the S&P 500 and Dow Jones Industrial Average indices. Using Google Trends data in addition to the opening, high, low, and closing price, as well as trading volume, in their experiments, they obtained an 86.81% hit ratio for the S&P 500 index and an 88.98% hit ratio for the Dow Jones Industrial Average Index.

Gui et al. (2015) investigated SVM for predicting stock price index direction with different parameter settings. That study also compared the result for SVM with BPNN and case-based reasoning models; multiple technical indicators were used as inputs for the models. That study found that SVM outperformed the other models with an accuracy of 57.8313% while the other models had accuracies of 54.7332% and 51.9793%, respectively.

Qiu and Song (2016) developed a genetic algorithm (GA)-based optimized ANN to predict the direction of the next day’s price in the stock market index. GA was used to optimize the initial weights and bias of the model. Two types of input sets were generated using several technical indicators of the daily price of the Nikkei 225 index and fed into the model. They obtained accuracies of 60.87% for the first set and 81.27% for the second set.

Zhong and Enke (2017) investigated three dimensionality reduction techniques applied to ANN for forecasting the daily direction of the S&P 500 Index ETF (SPY). Principal component analysis (PCA), fuzzy robust principal component analysis (FRPCA), and kernel-based principal component analysis (KPCA) were used to reduce the number of features. Their experiments indicated that ANN with PCA performed slightly better than the other two techniques.

Zhong and Enke (2019) used deep neural networks and ANNs to forecast the daily return direction of the stock market. They performed experiments on both untransformed and PCA-transformed data sets to validate the model.

In addition to classical machine learning methods, researchers have recently started to use deep learning methods to predict future stock market values. LSTM has emerged as a deep learning tool for application to time-series data, such as financial data.

Zhang et al. (2017) proposed a state-frequency memory recurrent network, which is a modification of LSTM, to forecast stock prices. By decomposing the hidden states of memory cells into multiple frequency components, they could learn the trading patterns of those frequencies. They used state-frequency components to predict future price values through nonlinear regression. They used stock prices from several sectors and performed experiments to make forecasts for 1, 3, and 5 days. They compared the results with LSTM and autoregressive integrated moving average (ARIMA) in terms of mean-square error. They obtained errors of 5.57, 17.00, and 28.90 for the different steps, which outperformed the other models.

Fulfillment et al. (2016) studied stock market forecasting in six different domains using LSTM. The aim was to predict the next 3 h using hourly historical stock data. The model was trained to classify three classes: increasing 0–1%, increasing above 1%, and not increasing (less than 0%). The accuracy results ranged from 49.75 to 59.5%. That study also built a stock trading simulator to test the model on real-world stock trading activity. With that simulator, the model made a profit in all six stock domains, with an average of 6.89%.

Nelson et al. (2017) examined LSTM for predicting 15-min trends in stock prices using technical indicators. They used 175 technical indicators (i.e., external technical analysis library) and the open, close, minimum, maximum, and volume as inputs for the model. They compared their model with a baseline consisting of multilayer perceptron, random forest, and pseudo-random models. The accuracy of LSTM for different stocks ranged from 53 to 55.9%. They concluded that LSTM performed significantly better than the baseline models, according to the Kruskal–Wallis test.

More recently, Fischer and Krauss (2018) applied LSTM to the stock market. They investigated many different aspects of the stock market and found that LSTM was very successful for predicting future prices for that type of time-series data. They also compared LSTM with more traditional machine learning tools to show its superior performance.

Similarly, Di Persio and Honchar (2016) applied LSTM and two other traditional neural network based machine learning tools to future price prediction. They also analyzed ensemble-based solutions by combining results obtained using different tools.

In addition to traditional exchanges, many studies have also investigated Forex. Some studies of Forex based on traditional machine learning tools are discussed below.

Galeshchuk and Mukherjee (2017) investigated the performance of a convolutional neural network (CNN) for predicting the direction of change in Forex. Using the daily closing rates of EUR/USD, GBP/USD, and USD/JPY, they compared the results of CNN with their baseline models and SVM. While the baseline models and SVM had an accuracy of around 65%, their proposed CNN model had an accuracy of about 75%.

Meanwhile, Kayal (2010) investigated the use of MLP in Forex. That work used basic technical indicators as inputs.

Ghazali et al. (2009) also investigated the use of neural networks for Forex. They proposed a higher-order neural network called a dynamic ridge polynomial neural network (DRPNN). In their experiments, DRPNN performed better than a ridge polynomial neural network (RPNN) and a pi-sigma neural network (PSNN).

To predict exchange rates, Majhi et al. (2009) proposed using new ANNs, referred to as a functional link artificial neural network (FLANN) and a cascaded functional link artificial neural network (CFLANN). They demonstrated that those new networks were more robust and had lower computational costs compared to an MLP trained with back-propagation.

In what is commonly called a mark-to-market approach, market prices are increasingly being used to calibrate models to quantify risk in several sectors. The net present value of a financial institution, for example, is an important input for estimating both bankruptcy risk (e.g., Kou et al. 2020) and the likelihood that shocks will propagate throughout the financial system (Kou et al. 2019). In such a context, stock price crashes not only dramatically damage the capital market but also have medium-term adverse effects on the financial sector as a whole (Wen et al. 2019). Credit risk is a major factor in financial shocks. Therefore, a realistic appraisal of solvency needs to be an objective for banks. At the level of the individual borrower, credit scoring is a field in which machine learning methods have been used for a long time (e.g., Shen et al. 2020; Wang et al. 2020).

Deep learning methods such as LSTM are rarely used for Forex. In one recent work, Shen et al. (2015) proposed a modified deep belief network. They were able to show that deep learning approaches outperformed traditional methods.

Even though LSTM is starting to be used in financial markets, using it in Forex for direction forecasting between two currencies, as proposed in the present work, is a novel approach.

Forex preliminaries

Forex has characteristics that are quite different from those of other financial markets (Archer 2010; Ozorhan et al. 2017). To explain Forex, we start by describing how a trade is made. Profit/loss calculations are made using the difference between the final ratio and the initial ratio of the currency pair that has been traded. If the ratio of the currency pair increases and the trader goes long, or the currency pair ratio decreases and the trader goes short, the trader will profit from that transaction when it is closed. Otherwise, the trader will not profit. For example, let us assume the EUR/USD ratio was 1.1500 when the trader started a transaction, going long with an initial amount of $10,000. If the position closes (i.e., the transaction ends) with a ratio of 1.1550, the trader will gain \(10000 \times (1.1550 - 1.1500) = \$50\). If the position instead closes with a ratio of 1.1450, the trader will lose \(10000 \times (1.1500 - 1.1450) = \$50\). These calculations assume no leverage. If the trader uses a leverage value such as 10, both the loss and the gain are multiplied by 10.
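The profit/loss arithmetic in the example above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `profit_loss` and its parameters are hypothetical, and spread, swap, and any other costs are ignored.

```python
def profit_loss(open_rate, close_rate, amount, leverage=1.0, long=True):
    """Return the profit (positive) or loss (negative) of a closed position.

    Hypothetical helper for illustration: spread, swap, and other costs
    are ignored, and leverage simply scales the result.
    """
    direction = 1.0 if long else -1.0
    return direction * amount * leverage * (close_rate - open_rate)


# Long position opened at 1.1500 with $10,000 and no leverage.
print(round(profit_loss(1.1500, 1.1550, 10_000), 2))  # 50.0
print(round(profit_loss(1.1500, 1.1450, 10_000), 2))  # -50.0
# Short position with a leverage of 10 when the ratio falls.
print(round(profit_loss(1.1500, 1.1450, 10_000, leverage=10, long=False), 2))  # 500.0
```

The sign convention makes the symmetry explicit: a short position profits exactly when a long position with the same parameters loses.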

Detailed definitions of commonly used concepts and terms in Forex can be found in Forex (2018), Archer (2010) and Özorhan (2017). Here, we explain only the most important ones.

Base currency, which is also called the transaction currency, is the first currency in the currency pair while quote currency is the second one in the pair. To illustrate, in the EUR/USD pair, EUR is the base currency, and USD is the quote currency.

Being long (or going long) means buying the base currency or selling the quote currency in the currency pair. Being short (or going short) means selling the base currency or buying the quote currency in the currency pair. Pip is an abbreviation for “percentage of point,” defined as the smallest amount of change occurring in the currency ratio. In general, a pip corresponds to the fourth decimal place of the currency ratio (i.e., 0.0001). Pipette is the fractional pip, which corresponds to the fifth decimal place (i.e., 0.00001). In other words, 1 pip equals 10 pipettes.
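As a small illustration of these units, the sketch below converts a change in the currency ratio into pips and pipettes. The constant and function names are hypothetical, and the pip size of 0.0001 follows the general definition given above.

```python
PIP = 0.0001       # fourth decimal place, as defined in the text
PIPETTE = 0.00001  # fifth decimal place; 1 pip equals 10 pipettes


def to_pips(rate_change):
    """Express a change in the currency ratio in pips."""
    return rate_change / PIP


def to_pipettes(rate_change):
    """Express a change in the currency ratio in pipettes."""
    return rate_change / PIPETTE


change = 1.1550 - 1.1500  # the move used in the earlier trade example
print(round(to_pips(change), 1))      # 50.0
print(round(to_pipettes(change), 1))  # 500.0
```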

Leverage corresponds to the use of borrowed money when making transactions. A leverage of 1:100 indicates that if one opens a position with a volume of 1, the actual transaction volume will be 100. After using leverage, one can either gain or lose 100 times the amount of that volume. Margin refers to money borrowed by a trader that is supplied by a broker to make investments using leverage. In this way, one can multiply his/her gains or losses.

Bid price is the price at which the trader can sell the base currency. Ask price is the price at which the trader can buy the base currency. Spread is the difference between the ask and bid prices. A lower spread means the trader can profit from small price changes. Spread value is dependent on market volatility and liquidity. Stop loss is an order to sell a currency when it reaches a specified price. This order is used to prevent larger losses for the trader. Take profit is an order by the trader to close the open position (transaction) for a gain when the price reaches a predefined value. This order guarantees profit for the trader without having to worry about changes in the market price. Market order is an order that is performed instantly at the current price. Swap is a simultaneous buy and sell action for the currency at the same amount at a forward exchange rate. This protects traders from fluctuations in the interest rates of the base and quote currencies. If the base currency has a higher interest rate and the quote currency has a lower interest rate, then a positive swap will occur; in the reverse case, a negative swap will occur.

Fundamental analysis and technical analysis are the two techniques commonly used for predicting future prices in Forex. While the first is based on economic factors, the latter is related to price actions (Archer 2010).

Fundamental analysis focuses on the economic, social, and political factors that can cause prices to move higher, move lower, or stay the same (Archer 2010; Murphy 1999). These factors are also called macroeconomic factors. Economic data reports, interest rates, monetary policy, and international trade/investment flows are some examples (Ozorhan et al. 2017).

Technical analysis uses only the price to predict future price movements (Kritzer and Service 2012). This approach studies the effect of price movement. Technical analysis mainly uses open, high, low, close, and volume data to predict market direction or generate sell and buy signals (Archer 2010). It is based on the following three assumptions (Murphy 1999):

  • Market action discounts everything.

  • Price moves in trends.

  • History repeats itself.

Chart analysis and price analysis using technical indicators are the two main approaches in technical analysis. While the former is used to detect patterns in price charts, the latter is used to predict future price actions (Ozorhan et al. 2017).

Long short-term memory (LSTM)

Long short-term memory (LSTM) was proposed by Hochreiter and Schmidhuber (1997). LSTM is a recurrent neural network architecture that was designed to overcome the vanishing gradient problem found in conventional recurrent neural networks (RNNs) (Biehl 2005). Errors between layers tend to vanish or blow up, which causes oscillating weights or unacceptably long convergence times. The initial LSTM structure solves this problem by introducing the constant error carousel (CEC). In this way, the architecture ensures constant error flow between the self-connected units (Hochreiter and Schmidhuber 1997).

The memory cell of the initial LSTM structure consists of an input gate and an output gate. While the input gate decides which information should be kept or updated in the memory cell, the output gate controls which information should be output. This standard LSTM was extended with the introduction of a new feature called the forget gate (Gers et al. 2000). The forget gate is responsible for resetting a memory state that contains outdated information. Furthermore, peephole connections and full back-propagation through time (BPTT) training are final features that were added to the LSTM architecture (Gers and Schmidhuber 2000; Greff et al. 2017). With these modifications, the architecture was renamed Vanilla LSTM (Greff et al. 2017), as shown in Fig. 1.

Fig. 1 Vanilla LSTM (Greff et al. 2017)

LSTM offers an effective and scalable model for learning problems that involve sequential data (Greff et al. 2017). It has been used in many different fields, including handwriting recognition (Graves et al. 2009; Pham et al. 2014) and generation (Graves 2013), language modeling (Zaremba et al. 2014) and translation (Luong et al. 2015), acoustic modeling of speech (Zia and Zahid 2019), speech synthesis (Fan et al. 2014), protein secondary structure prediction (Sønderby and Winther 2014), audio analysis (Marchi et al. 2014), and video data analysis (Donahue et al. 2017; Greff et al. 2017).

Forward pass

One of the two main operations of LSTM, shown in Fig. 1, is called the forward pass. In the forward pass, the calculation moves forward through the network, computing the block outputs from the inputs and the current weights (Greff et al. 2017). The weights of LSTM can be categorized as follows:

  • Input weights: \(W_z, W_i, W_f, W_o \in \mathbb{R}^{N \times M}\)

  • Recurrent weights: \(R_z, R_i, R_f, R_o \in \mathbb{R}^{N \times N}\)

  • Peephole weights: \(p_i, p_f, p_o \in \mathbb{R}^{N}\)

  • Bias weights: \(b_z, b_i, b_f, b_o \in \mathbb{R}^{N}\),

where z is the block input, i is the input gate, f is the forget gate, o is the output gate, N is the number of LSTM blocks, and M is the number of inputs. By introducing \(x^t\) as the input vector, \(y^t\) as the block output, and \(c^t\) as the cell at time t, the formulation of the forward pass in Vanilla LSTM can be defined as below:

$$\begin{aligned} {{\bar{z}}^{t}}&= {W_z}{x^t} + {R_z}{y^{t-1}} + {b_z}, \end{aligned}$$
(1)
$$\begin{aligned} {z^t}&= g({\bar{z}}^{t}), \end{aligned}$$
(2)
$$\begin{aligned} {{\bar{i}}^{t}}&= {W_i}{x^t} + {R_i}{y^{t-1}} + {p_i}\odot {c^{t-1}} + {b_i}, \end{aligned}$$
(3)
$$\begin{aligned} {i^t}&= \sigma ({\bar{i}}^{t}), \end{aligned}$$
(4)
$$\begin{aligned} {{\bar{f}}^{t}}&= {W_f}{x^t} + {R_f}{y^{t-1}} + {p_f}\odot {c^{t-1}} + {b_f}, \end{aligned}$$
(5)
$$\begin{aligned} {f^t}&= \sigma ({\bar{f}}^{t}), \end{aligned}$$
(6)
$$\begin{aligned} {c^{t}}&= {z^t}\odot {i^t} + {c^{t-1}}\odot {f^t}, \end{aligned}$$
(7)
$$\begin{aligned} {{\bar{o}}^{t}}&= {W_o}{x^t} + {R_o}{y^{t-1}} + {p_o}\odot {c^t} + {b_o}, \end{aligned}$$
(8)
$$\begin{aligned} {o^t}&= \sigma ({\bar{o}}^{t}), \end{aligned}$$
(9)
$$\begin{aligned} {y^{t}}&= {h(c^t)}\odot {o^t}, \end{aligned}$$
(10)

where \(\sigma \) is the logistic sigmoid function, g and h are hyperbolic tangent functions, and \(\odot \) is the point-wise multiplication of the two vectors.
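To make Eqs. 1–10 concrete, the following NumPy sketch computes one forward step of a Vanilla LSTM layer and runs it over a short random sequence. It is an illustration under stated assumptions rather than the implementation used in this study: the function name `lstm_forward_step`, the dictionary layout of the weights, the toy dimensions, and the random initialization are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, the gate activation in Eqs. 4, 6, and 9."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward_step(x_t, y_prev, c_prev, W, R, p, b):
    """One forward step of Vanilla LSTM following Eqs. 1-10.

    W, R, and b are dicts keyed by 'z', 'i', 'f', 'o';
    p (peephole weights) is a dict keyed by 'i', 'f', 'o'.
    """
    # Block input (Eqs. 1-2), with g = tanh.
    z = np.tanh(W["z"] @ x_t + R["z"] @ y_prev + b["z"])
    # Input gate (Eqs. 3-4) peeks at the previous cell state.
    i = sigmoid(W["i"] @ x_t + R["i"] @ y_prev + p["i"] * c_prev + b["i"])
    # Forget gate (Eqs. 5-6) also peeks at the previous cell state.
    f = sigmoid(W["f"] @ x_t + R["f"] @ y_prev + p["f"] * c_prev + b["f"])
    # New cell state (Eq. 7).
    c = z * i + c_prev * f
    # Output gate (Eqs. 8-9) peeks at the new cell state.
    o = sigmoid(W["o"] @ x_t + R["o"] @ y_prev + p["o"] * c + b["o"])
    # Block output (Eq. 10), with h = tanh.
    y = np.tanh(c) * o
    return y, c


# Toy dimensions (assumptions): N = 3 LSTM blocks, M = 2 inputs, 5 time steps.
N, M, steps = 3, 2, 5
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(N, M)) for k in "zifo"}
R = {k: rng.normal(scale=0.1, size=(N, N)) for k in "zifo"}
p = {k: rng.normal(scale=0.1, size=N) for k in "ifo"}
b = {k: np.zeros(N) for k in "zifo"}

y, c = np.zeros(N), np.zeros(N)
for x_t in rng.normal(size=(steps, M)):
    y, c = lstm_forward_step(x_t, y, c, W, R, p, b)
print(y.shape, c.shape)  # (3,) (3,)
```

Because the block output is the product of a hyperbolic tangent and a sigmoid, every component of `y` lies strictly between -1 and 1, whatever the weights are.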

Back-propagation through time

The other main operation is back-propagation. Back-propagation through time (BPTT) is the process of calculating the deltas of LSTM blocks and the gradient of the weights (Greff et al. 2017).

First, the deltas (\(\delta \)) of LSTM blocks and the inputs are calculated. In the below equations, \(\Delta ^t\) is the vector of the deltas passed down from the above layer, and T is the transposition operator. Calculation of the deltas is performed as follows:

$$\begin{aligned} {{\delta }y^{t}}&= \Delta ^t + {R_z}^T{\delta }{\bar{z}}^{t+1} + {R_i}^T{\delta }{\bar{i}}^{t+1} + {R_f}^T{\delta }{\bar{f}}^{t+1} + {R_o}^T{\delta }{\bar{o}}^{t+1}, \end{aligned}$$
(11)
$$\begin{aligned} {{\delta }{\bar{o}}^{t}}&= {\delta }{y^t} \odot h(c^t) \odot \sigma '({\bar{o}}^{t}), \end{aligned}$$
(12)
$$\begin{aligned} {{\delta }{c^t}}&= {\delta }{y^t} \odot o^t \odot h'(c^t) + p_o \odot {\delta }{\bar{o}}^{t} + p_i \odot {\delta }{\bar{i}}^{t+1} + p_f \odot {\delta }{\bar{f}}^{t+1} + {\delta }{c^{t+1}} \odot f^{t+1}, \end{aligned}$$
(13)
$$\begin{aligned} {{\delta }{\bar{f}}^{t}}&= {\delta }{c^t} \odot c^{t-1} \odot \sigma '({\bar{f}}^{t}), \end{aligned}$$
(14)
$$\begin{aligned} {{\delta }{\bar{i}}^{t}}&= {\delta }{c^t} \odot z^t \odot \sigma '({\bar{i}}^{t}), \end{aligned}$$
(15)
$$\begin{aligned} {{\delta }{\bar{z}}^{t}}&= {\delta }{c^t} \odot i^t \odot g'({\bar{z}}^{t}), \end{aligned}$$
(16)
$$\begin{aligned} {{\delta }{x^t}}&= {W_z}^T {\delta }{\bar{z}}^{t} + {W_i}^T {\delta }{\bar{i}}^{t} + {W_f}^T {\delta }{\bar{f}}^{t} + {W_o}^T {\delta }{\bar{o}}^{t}. \end{aligned}$$
(17)

Then, the gradients of the weights are calculated. In the formulas below, \(*\) can be any of {\({\bar{z}}, {\bar{i}}, {\bar{f}}, {\bar{o}}\)}, \(<*_1, *_2>\) corresponds to the outer product of the two vectors, and T, used here as the upper summation limit, is the length of the sequence (not the transposition operator used in Eqs. 11 and 17). The calculations are as follows:

$$\begin{aligned} {{\delta }W_*}&= \sum _{t=0}^T{<{\delta }*^t, x^t>}, \end{aligned}$$
(18)
$$\begin{aligned} {{\delta }R_*}&= \sum _{t=0}^{T-1}{<{\delta }*^{t+1}, y^t>}, \end{aligned}$$
(19)
$$\begin{aligned} {{\delta }b_*}&= \sum _{t=0}^T{{\delta }*^t}, \end{aligned}$$
(20)
$$\begin{aligned} {{\delta }p_i}&= \sum _{t=0}^{T-1}{c^t} \odot {\delta }{\bar{i}}^{t+1}, \end{aligned}$$
(21)
$$\begin{aligned} {{\delta }p_f}&= \sum _{t=0}^{T-1}{c^t} \odot {\delta }{\bar{f}}^{t+1}, \end{aligned}$$
(22)
$$\begin{aligned} {{\delta }p_o}&= \sum _{t=0}^{T-1}{c^t} \odot {\delta }{\bar{o}}^t. \end{aligned}$$
(23)

Using Eqs. 11–23, all weights are updated.
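The derivatives \(\sigma '\), \(g'\), and \(h'\) that appear in Eqs. 12–16 have standard closed forms. Because \(\sigma \) is the logistic sigmoid and g and h are hyperbolic tangents, they can be written as

$$\begin{aligned} \sigma '(x)&= \sigma (x) \odot (1 - \sigma (x)), \\ g'(x) = h'(x)&= 1 - \tanh ^2(x). \end{aligned}$$

In particular, \(\sigma '({\bar{o}}^{t}) = o^t \odot (1 - o^t)\) and \(g'({\bar{z}}^{t}) = 1 - z^t \odot z^t\), so these terms can be evaluated from quantities already computed in the forward pass.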

Technical indicators

A technical indicator is a time series that is obtained from mathematical formula(s) applied to another time series, which is typically a price (TIO 2018). These formulas generally use the close, open, high, low, and volume data. Technical indicators can be applied to anything that can be traded in an open market (e.g., stocks, futures, commodities, and Forex). They are empirical assistants that are widely used in practice to identify future price trends and measure volatility (Ozorhan et al. 2017). By analyzing historical data, they can help forecast the future prices.

According to their functionalities, technical indicators can be grouped into three categories: lagging, leading, and volatility. Lagging indicators, also referred to as trend indicators, follow the past price action. MA and MACD are the best examples of lagging indicators. Leading indicators, also known as momentum-based indicators, aim to predict future price trend directions and show rates of change in the price. ROC and RSI are the best-known examples of leading indicators. Volatility-based indicators measure volatility levels in the price. BB is the most widely used volatility-based indicator.

The technical indicators used in this study are described below.

Moving average (MA)

Moving average (MA) is a trend-following (or lagging) indicator that smooths prices by averaging them in a specified period. In this way, MA can help filter out noise. MA can not only identify the trend direction but also determine potential support and resistance levels (TIO 2018).
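As an illustration, a simple (unweighted) moving average can be sketched as follows. The choice of the simple variant, the function name, and the toy prices are assumptions made for this example; the text above does not fix a particular MA variant or period.

```python
def simple_moving_average(prices, period):
    """Return the simple MA of `prices`.

    The first `period - 1` entries are None because a full window
    is not yet available for them.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    out = [None] * len(prices)
    for t in range(period - 1, len(prices)):
        window = prices[t - period + 1 : t + 1]
        out[t] = sum(window) / period
    return out


closes = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy closing prices
print(simple_moving_average(closes, 3))  # [None, None, 2.0, 3.0, 4.0]
```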

Moving average convergence divergence (MACD)

Moving average convergence divergence (MACD) is a momentum oscillator developed by Gerald Appel in the late 1970s. It is a trend-following indicator that uses the short- and long-term exponential moving averages of prices (Appel 2005). MACD uses the short-term moving average to identify price changes quickly and the long-term moving average to emphasize trends (Ozorhan et al. 2017).
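A minimal sketch of the MACD calculation is given below. The periods of 12, 26, and 9 are widely used conventional defaults rather than values taken from this study, and the signal line (an exponential moving average of the MACD line) is a common companion series included here as an assumption. The function names and the synthetic prices are hypothetical.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]  # seed with the first value (one common convention)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, short=12, long=26, signal=9):
    """Return (macd_line, signal_line) for a list of closing prices."""
    short_ema = ema(closes, short)
    long_ema = ema(closes, long)
    macd_line = [s - l for s, l in zip(short_ema, long_ema)]
    return macd_line, ema(macd_line, signal)


closes = [1.10 + 0.001 * t for t in range(60)]  # synthetic steady uptrend
macd_line, signal_line = macd(closes)
# In a steady uptrend the short EMA lags the price less than the long EMA,
# so the MACD line is positive.
print(macd_line[-1] > 0)  # True
```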

Rate of change (ROC)

Rate of change (ROC) is a momentum oscillator that defines the velocity of the price. This indicator measures the percentage change in the price by calculating the ratio between the current closing price and the closing price a specified number of periods earlier (Ozorhan et al. 2017).
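The following sketch shows one common way of computing ROC as a percentage. The exact formula, the function name, and the toy prices are assumptions made for illustration.

```python
def rate_of_change(closes, period):
    """Percentage change between each close and the close `period` steps earlier."""
    out = [None] * len(closes)
    for t in range(period, len(closes)):
        previous = closes[t - period]
        out[t] = (closes[t] - previous) / previous * 100.0
    return out


# Toy closing prices; the first `period` entries have no earlier reference.
print(rate_of_change([100.0, 102.0, 105.0, 103.0], 2))
```

Here the third value compares 105 with 100, a rise of 5%, and the fourth compares 103 with 102, a rise of just under 1%.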

Momentum

Momentum measures the amount of change in the price during a specified period (Colby 2003). It is a leading indicator that either shows rises and falls in the price or remains stable when the current trend continues. Momentum is calculated based on the differences in prices for a set time interval (Murphy 1999).
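Momentum, computed as the difference between the current price and the price a fixed number of periods earlier, can be sketched as follows. The function name and the toy prices are hypothetical.

```python
def momentum(closes, period):
    """Difference between each close and the close `period` steps earlier."""
    out = [None] * len(closes)
    for t in range(period, len(closes)):
        out[t] = closes[t] - closes[t - period]
    return out


print(momentum([100.0, 102.0, 105.0, 103.0], 2))  # [None, None, 5.0, 1.0]
```

Unlike ROC, momentum is expressed in price units rather than as a percentage, so its scale depends on the price level of the instrument.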

Relative strength index (RSI)

The relative strength index (RSI) is a momentum indicator developed by J. Welles Wilder in 1978. RSI is based on the ratio between the average gain and average loss, which is called the relative strength (RS) (Ozorhan et al. 2017; Wilder 1978). RSI is an oscillator, which means its values change between 0 and 100. It determines overbought and oversold levels in the prices.
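The RSI calculation can be sketched from the relative strength as \(100 - 100/(1 + RS)\). The version below uses plain averages of gains and losses over the window, which is a simplifying assumption; Wilder's original formulation applies a smoothed average instead. The default period of 14 is a conventional choice, not a value taken from this study, and the function name is hypothetical.

```python
def rsi(closes, period=14):
    """RSI using simple averages of gains and losses over `period` price changes."""
    out = [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    for t in range(period, len(closes)):
        window = changes[t - period : t]  # the `period` changes ending at index t
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            # No losses in the window; a flat window is also mapped to 100
            # here, which is a simplification.
            out[t] = 100.0
        else:
            rs = avg_gain / avg_loss  # relative strength (RS)
            out[t] = 100.0 - 100.0 / (1.0 + rs)
    return out


# Equal gains and losses give RS = 1 and therefore an RSI of 50.
print(rsi([1.0, 2.0, 1.0, 2.0, 1.0], period=4)[-1])  # 50.0
```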

Bollinger bands (BB)

Bollinger bands (BB) refers to a volatility-based indicator developed by John Bollinger in the 1980s. It has three bands that provide relative definitions of high and low according to the base (Bollinger 2001). While the middle band is the moving average in a specific period, the upper and lower bands are calculated by the standard deviations in the price, which are placed above and below the middle band. The distance between the bands depends on the volatility of the price (Bollinger 2001; Ozturk et al. 2016).
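A minimal sketch of the three bands is shown below. The 20-period window and the band width of two standard deviations are conventional defaults assumed for illustration, as is the use of the population standard deviation; the function name is hypothetical.

```python
import statistics


def bollinger_bands(closes, period=20, num_std=2.0):
    """Return (lower, middle, upper) band lists; early entries are None."""
    lower, middle, upper = ([None] * len(closes) for _ in range(3))
    for t in range(period - 1, len(closes)):
        window = closes[t - period + 1 : t + 1]
        mid = sum(window) / period                   # middle band: moving average
        width = num_std * statistics.pstdev(window)  # population standard deviation
        lower[t], middle[t], upper[t] = mid - width, mid, mid + width
    return lower, middle, upper


lower, middle, upper = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0], period=5)
print(middle[-1])  # 3.0
```

When the price does not move within the window, the standard deviation is zero and the three bands coincide, which reflects the dependence of the band distance on volatility.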

Commodity channel index (CCI)

The commodity channel index (CCI) is a momentum-based indicator developed by Donald Lambert in 1980. CCI is based on the principle that current prices should be examined based on recent past prices, not those in the distant past, to avoid confusing present patterns (Lambert 1983). This indicator can be used to highlight a new trend or warn against extreme conditions. Moreover, CCI identifies overbought and oversold conditions (Özorhan 2017).
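As a sketch, CCI is commonly computed from the typical price (the mean of high, low, and close), its moving average, and its mean absolute deviation, scaled by Lambert's constant of 0.015. The function name `cci` and the handling of a zero deviation are assumptions made for illustration.

```python
def cci(highs, lows, closes, period=20, constant=0.015):
    """CCI of the latest bar, based on the last `period` typical prices."""
    typical = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, closes)][-period:]
    sma = sum(typical) / len(typical)
    mean_dev = sum(abs(tp - sma) for tp in typical) / len(typical)
    if mean_dev == 0:
        return 0.0  # assumption: a flat window is reported as neutral
    return (typical[-1] - sma) / (constant * mean_dev)
```

Readings far above or below zero indicate that the latest price deviates strongly from its recent average.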

The data set

Interest and inflation rates are two fundamental indicators of the strength of an economy. In the case of low interest rates, individuals tend to buy investment tools that strengthen the economy. In the opposite case, the economy becomes fragile. If supply does not meet demand, inflation occurs, and interest rates also increase (IRD 2018).

Germany and the US are two of the world’s most powerful economies. In such economies, the stock markets have strong relationships with their currencies. DAX is the German stock index, which has a strong relationship with the price of the EUR, while the S&P 500 is a US stock index that affects the USD. Central banks’ interest rates are also important factors determining the prices of currencies. Therefore, the interest rates determined by the European Central Bank and the Fed directly affect EUR and USD prices, respectively.

In this work, to investigate the effect of macroeconomic factors on the value of the EUR/USD currency pair, we used the factors described in Table 1, as well as the close, open, high, and low values of the EUR/USD pair, which were retrieved from EUR/USD historical data (EUR 2018). The rest of the data were obtained from various online resources, including the ECB Statistical Data Warehouse (ECB 2018; EU 2018; Germany 2018), Bureau of Labor Statistics Data (2018), Federal Reserve Economic Data (EFFR 2018), and Yahoo Finance (DAX 2018).

The data set was created with values from the period January 2013–January 2018. This 5-year period contains 1234 data points in which the markets were open. There were 613 increases and 620 decreases for the EUR/USD ratio during this period. Table 1 presents explanations for each field in the data set. Monthly inflation rates were collected from the websites of central banks, and they were repeated for all days of the corresponding month to fill the fields in our daily records.
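The repetition of monthly values across daily records can be sketched as follows. The function name and the sample rates are hypothetical and serve only to illustrate the filling step; they are not values from the data set.

```python
from datetime import date


def fill_monthly_to_daily(trading_days, monthly_rates):
    """Repeat each month's rate for every trading day in that month."""
    return [monthly_rates[(d.year, d.month)] for d in trading_days]


days = [date(2013, 1, 30), date(2013, 1, 31), date(2013, 2, 1)]
rates = {(2013, 1): 2.0, (2013, 2): 1.8}  # illustrative values, not real data
daily = fill_monthly_to_daily(days, rates)
```

Here the two January days receive the January rate and the February day receives the February rate.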

Table 1 Macroeconomic data and the currency pair used in the data set

LSTM-based hybrid model using macroeconomic and technical indicators

Using LSTM, we constructed a hybrid model to forecast directional movement in the EUR/USD currency pair that uses both macroeconomic and technical indicators. This hybrid model consists of two separate LSTM models that learn different parameter settings for different input sets (Yıldırım and Toroslu 2019). These models are called “macroeconomic LSTM” (ME_LSTM) and “technical LSTM” (TI_LSTM); they are explained below in the “Macroeconomic LSTM model” and “Technical LSTM model” sections, respectively.

The main structure of the hybrid model, as shown in Fig. 2, can be summarized as follows:

  1. Preprocess the dataset.

  2. Train ME_LSTM and postprocess its results.

  3. Train TI_LSTM and postprocess its results.

  4. Apply different strategies to combine these LSTMs and use their individual results.

Fig. 2 Hybrid LSTM model. The macroeconomic LSTM model is on the left, and the technical indicator LSTM is on the right

Baseline LSTMs

As baselines, ME_LSTM and TI_LSTM were tested separately. In addition, by combining all of the features of these two into a single model, we generated a third baseline model: ME_TI_LSTM.

Macroeconomic LSTM model

This LSTM model (ME_LSTM) was built to investigate the effects of macroeconomic factors on the price movement of the EUR/USD pair. These factors, which are explained in detail in “The data set” section, are listed below:

  • Interest rates of Germany and the EU

  • FED funds rate (for the US)

  • Inflation rates in the EU and the US

  • Close value of the S&P 500 market index

  • Close value of the DAX market index

After the preprocessing phase, the ME_LSTM model was trained using all of these macroeconomic factors together with the closing values of the EUR/USD pair.

Technical LSTM model

This LSTM model (TI_LSTM) is formed by using technical indicators to observe their effects on the price movement of the EUR/USD pair. These technical indicators are listed below:

  • MA with a period of 10

  • MACD with short- and long-term periods of 12 and 26, respectively

  • ROC with a period of 2

  • Momentum with a period of 4

  • RSI with a period of 10

  • BB with a period of 20

  • CCI with a period of 20

After the preprocessing stage, the TI_LSTM model was trained using these seven technical indicators together with the closing values of the EUR/USD pair.

Macroeconomic and technical LSTM model

This LSTM model (ME_TI_LSTM) was formed using all of the macroeconomic and technical indicators taken together to observe the effects of the combined set of indicators. After the preprocessing stage, ME_TI_LSTM was trained using the macroeconomic and technical indicators mentioned above together with the closing values of the EUR/USD currency pair.

Proposed model: hybrid LSTM model

Our proposed model does not combine the features of the two baseline LSTMs into a single model. Instead, we propose a rule-based decision mechanism that acts as a kind of postprocessing; it is used to combine the results of the baselines into a final decision (Yıldırım and Toroslu 2019).

Training classifiers and labeling the data

We trained ME_LSTM, TI_LSTM, and ME_TI_LSTM using the same settings. The data set was split into training and test sets, with ratios of 80% and 20%, respectively. The training phase was carried out with different numbers of iterations (50, 100, 150, and 200).

Our data points were labeled based on a histogram analysis and the entropy approach. At the end of these operations, we divided the data points into three classes by using a threshold value:

  • \(Class\_inc\): Corresponds to an increase in a price that is more than a threshold value.

  • \(Class\_dec\): Corresponds to a decrease in a price that is more than a threshold value.

  • \(Class\_noact\): Corresponds to a price change that is less than a threshold value.

In addition to the usual classes, increase and decrease, we introduced a third class, no_action, which corresponds to changes that remain within a predefined threshold range that is sufficiently small and thus negligible. The next data point is labeled as an increase (or decrease) only when the difference between two consecutive data points is greater than the threshold (or less than its negative). Otherwise, we treated the next data point as unaltered. This new class enabled us to exclude data points that would otherwise generate risky trade orders, which improved our results compared to the binary classification results. This approach generates fewer trades but with higher accuracy, as reported in the “Experiments” section.
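The three-class labeling described above can be sketched as follows. The numeric codes (0 for no action, 1 for decrease, 2 for increase) follow the encoding used in Algorithm 2; the function name `label_changes` is illustrative.

```python
NO_ACTION, DECREASE, INCREASE = 0, 1, 2


def label_changes(closes, threshold):
    """Label each consecutive-day change as increase, decrease, or no action."""
    labels = []
    for previous, current in zip(closes[:-1], closes[1:]):
        diff = current - previous
        if diff > threshold:
            labels.append(INCREASE)
        elif diff < -threshold:
            labels.append(DECREASE)
        else:
            labels.append(NO_ACTION)  # change too small to act on
    return labels
```

With a threshold of 0.0023, for example, a move from 1.1030 to 1.1025 is labeled as no action, whereas a move from 1.1025 to 1.0990 is labeled as a decrease.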

Histogram analysis and threshold calculation

In addition to the decrease and increase classes, we needed to determine the threshold used to generate a third class, the no-action class, which corresponds to insignificant changes in the data. Algorithm 1 was used to determine the upper bound of this threshold value. The aim was to avoid exploring all of the possible difference values and thus narrow the search space. In other words, we assumed that the optimal threshold value should be in the range of [0, threshold_upper_bound] instead of [0, max_of_differences].

The idea of Algorithm 1 is to determine the upper bound of the threshold based on 85% coverage of all differences. To do that, first, histogram analysis was performed on the closing prices of the EUR/USD pair to determine the distributions of price changes occurring in the data during consecutive days.

We placed the EUR/USD ratio differences between consecutive days into 10 equal-width bins (the \(number\_of\_bins\) value) spanning the range between the minimum difference (which is 0) and the maximum difference. We determined the count of each bin and sorted the bins in descending order of count. The counts of the bins were then summed until the sum exceeded 85% of the whole count (the data set size). The maximum difference value of the last bin added was used as the upper bound of the threshold value.

Algorithm 1 has two phases. In the first phase, which corresponds to line 2 of the algorithm, the whole data set is processed linearly to determine the distribution of the differences using a simple histogram construction function. The second phase, which corresponds to the rest of the algorithm, computes the upper bound of the potential threshold as the value that is larger than 85% of the differences between two consecutive days’ closing values.
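The histogram-based procedure can be sketched as follows. This is an illustrative reading of the description rather than a transcription of Algorithm 1: it assumes absolute differences, and it interprets “the maximum difference value of the last bin added” as the right edge of that bin.

```python
def threshold_upper_bound(closes, number_of_bins=10, coverage=0.85):
    """Upper bound of the threshold from a histogram of absolute daily differences."""
    diffs = [abs(b - a) for a, b in zip(closes[:-1], closes[1:])]
    max_diff = max(diffs)
    if max_diff == 0:
        return 0.0
    width = max_diff / number_of_bins
    counts = [0] * number_of_bins
    for d in diffs:
        # The maximum difference itself is placed in the last bin.
        counts[min(int(d / width), number_of_bins - 1)] += 1
    # Visit the bins from the most to the least populated.
    order = sorted(range(number_of_bins), key=lambda i: counts[i], reverse=True)
    covered = 0
    for idx in order:
        covered += counts[idx]
        if covered > coverage * len(diffs):
            return (idx + 1) * width  # assumed: right edge of the last bin added
    return max_diff
```

Because most daily differences are small, the most populated bins are usually the lowest ones, so the returned bound is typically a small fraction of the maximum difference.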

Algorithm 1 (figure a): determination of the upper bound of the threshold value

The threshold value should be determined based on entropy. Entropy is related to the distribution of the data. The following formula defines entropy where \(p_i\) corresponds to the probability of the occurrence of class i:

$$\begin{aligned} \mathrm{Entropy} = -\sum _{i} p_{i} \log p_{i}. \end{aligned} \tag{24}$$

To obtain a balanced class distribution, we calculated the entropy of the class distribution iteratively for each candidate threshold value. Instead of iterating up to the maximum difference value, we used the precalculated upper bound of the threshold. After limiting the iterations to the upper bound found in the histogram analysis, we searched for the final threshold \(\tau \) that maximizes entropy. Algorithm 2 shows the details of our approach.

In Algorithm 2, to find the best threshold, candidate threshold values are tried in increments of 0.00001. Lowering the maximum candidate threshold is thus very important for reducing the search space. The main while loop tries each threshold value between 0 and \(threshold\_upper\_bound\) in increments of 0.00001. For each threshold value, the numbers of increases (labeled 2) and decreases (labeled 1) that exceed the threshold value are determined, and the rest of the changes are assumed to be \(no\_change\) (labeled 0). Then, the entropy value for this distribution is calculated. At the end of the while loop, the distribution that gives the highest entropy is selected, and that distribution is used to determine the increase, decrease, and no-change classes.
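The entropy-based search can be sketched as follows. This is a simplified illustration, not a transcription of Algorithm 2: the function names are hypothetical, the natural logarithm is assumed, and the class counts are recomputed from scratch for every candidate threshold for clarity rather than efficiency.

```python
import math


def entropy(counts):
    """Shannon entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def best_threshold(closes, upper_bound, step=0.00001):
    """Scan thresholds in [0, upper_bound] and keep the one maximizing entropy."""
    diffs = [b - a for a, b in zip(closes[:-1], closes[1:])]
    best_tau, best_entropy = 0.0, -1.0
    for k in range(int(round(upper_bound / step)) + 1):
        tau = k * step
        inc = sum(1 for d in diffs if d > tau)     # increases above the threshold
        dec = sum(1 for d in diffs if d < -tau)    # decreases above the threshold
        noact = len(diffs) - inc - dec             # everything else: no action
        h = entropy([noact, dec, inc])
        if h > best_entropy:                       # keep the smallest maximizer
            best_tau, best_entropy = tau, h
    return best_tau, best_entropy
```

The entropy is largest when the three classes are equally populated, which is why maximizing it yields a balanced labeling.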

In our experiments, we observed that in most cases, the threshold upper bound approach significantly reduced the search space (i.e., searching for the threshold value). In a typical case, this improvement corresponds to reducing the search space to around 20% of the original. For example, in one case, the maximum difference value was 0.029, but our approach determined the upper bound of the threshold value to be 0.00652. In this case, the optimum threshold value was found to be 0.0023.

Algorithm 2 (figure b): entropy-based search for the threshold value

Postprocessing

The purpose of this processing is to determine the final class decision. We combined the predictions of the ME_LSTM and TI_LSTM models with the following set of rules:

  • If either model’s prediction is class_noact, then the final decision will be class_noact.

  • If both models agree on the labels, we set the final decision as this label.

  • If the predictions of the two models differ, the final decision is the prediction with the higher probability. If the probabilities are equal, we choose the prediction of the TI_LSTM model.

This is a type of conservative approach to trading; it reduces the number of trades and favors only high-accuracy predictions.
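The three rules above can be expressed as a small decision function. This is a minimal sketch; the function name `combine` and its argument names are illustrative, and the probabilities are assumed to be the class probabilities reported by each model for its predicted label.

```python
NO_ACTION, DECREASE, INCREASE = 0, 1, 2


def combine(me_label, me_prob, ti_label, ti_prob):
    """Combine the ME_LSTM and TI_LSTM predictions into a final decision."""
    if me_label == NO_ACTION or ti_label == NO_ACTION:
        return NO_ACTION   # rule 1: either model abstains
    if me_label == ti_label:
        return me_label    # rule 2: both models agree
    if me_prob > ti_prob:
        return me_label    # rule 3: the more confident prediction wins
    return ti_label        # TI_LSTM is more confident, or the probabilities tie
```

For example, `combine(INCREASE, 0.6, DECREASE, 0.6)` yields a decrease because ties are resolved in favor of TI_LSTM.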

Performance metric

Measuring the accuracy of the decisions made by these models also requires a new approach. Suppose that during the testing phase of one of the LSTMs, our model predicts the class as “increase” (or “decrease”), but according to our three-class classification, the actual class is “no_act.” In that case, we check whether the actual movement is in the same direction as the prediction; that is, there was an “increase” (or “decrease”) but smaller than the threshold value. If so, the prediction is correct, and we treat this test case as a correct classification.

We introduced a new performance metric to measure the success of our proposed method. We defined profit_accuracy as the accuracy computed over the increase and decrease predictions only. This metric can be interpreted as the ratio of the number of profitable transactions to the total number of transactions, defined using Table 2. The formula below uses the following values:

  • True_dec: the number of correct decrease predictions

  • True_inc: the number of correct increase predictions

  • False_dec_noact: the number of no-action instances predicted as decreases

  • False_inc_noact: the number of no-action instances predicted as increases

  • False_inc_dec: the number of decrease instances predicted as increases

  • False_dec_inc: the number of increase instances predicted as decreases

Note that the formula below has no terms for the “True_inc_noact” and “True_dec_noact” counts since we converted such decisions into “True_inc” and “True_dec,” respectively, as explained above.

$$\begin{aligned} \mathrm{Profit\_accuracy} = \frac{True\_dec + True\_inc}{False\_dec\_noact + False\_inc\_noact + True\_dec + False\_inc\_dec + False\_dec\_inc + True\_inc}. \end{aligned} \tag{25}$$
Table 2 Sample table for profit_accuracy calculation
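The metric can be sketched directly from predicted labels and actual price changes. In this illustration, a transaction is any increase or decrease prediction, and it counts as profitable when the actual change has the predicted sign, which also covers the no-action cases that are converted into correct predictions. The function name and the choice to return `None` when no transaction is generated are assumptions.

```python
NO_ACTION, DECREASE, INCREASE = 0, 1, 2


def profit_accuracy(predicted, actual_changes):
    """Share of increase/decrease predictions whose direction was correct."""
    transactions = profitable = 0
    for label, change in zip(predicted, actual_changes):
        if label == NO_ACTION:
            continue  # no transaction is opened
        transactions += 1
        if (label == INCREASE and change > 0) or (label == DECREASE and change < 0):
            profitable += 1
    # Assumption: the metric is undefined when no transaction is generated.
    return profitable / transactions if transactions else None
```

Predictions of the no-action class do not enter the denominator, so the metric rewards models that trade only when the direction is predicted correctly.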

Experiments

After applying the labeling algorithm, we obtained a balanced distribution of the three classes over the data set. This algorithm calculates different threshold values for each period and forms different sets of class distributions. For predictions of different periods, the thresholds and the corresponding numbers of data points in each class (given separately for the training and test sets) are calculated, as shown in Table 3.

This table shows that the class distributions of the training and test data have slightly different characteristics. While the class decrease has a higher ratio in the training set and a lower ratio in the test set, the class increase shows opposite behavior. Class \(no\_action\), meanwhile, is more stable in both sets. This is because a split is made between the training and test sets without shuffling the data sets to preserve the order of the data points.

We collected daily EUR/USD rates for a total of 1214 consecutive days. We used the first 971 days of this data to train our models and the last 243 days to test them. Our models aim to determine whether there will be an “increase” or “decrease” on the next day, 3 days ahead, and 5 days ahead of the day of the prediction. If one of these is predicted, a transaction is considered to start on the test day and end on the day of the prediction (1, 3, or 5 days ahead). Otherwise, no transaction is started. A transaction is successful, and the trader profits, if the prediction of the direction is correct.

Table 3 Data set statistics (training and test sets)

Experiments on long-term real data

For time-series data, LSTM is typically used to forecast the value for the next time point. It can also forecast values for later time points by setting the training target to the value a chosen number of data points ahead rather than the value of the next time point. This way, during the test phase, the model predicts the value for that many time points ahead. However, as expected, the accuracy of the forecast usually diminishes as the distance becomes longer.

Zhang et al. (2017) used a very similar LSTM model for stock price prediction. They defined it as an n-step prediction as follows:

$$\begin{aligned} \acute{p}_{t+n} = f(p_1, p_2,\ldots , p_t). \end{aligned}$$

This simply corresponds to mapping the history of prices from \(p_1\) to \(p_t\) into n-steps ahead. They performed experiments for 1, 3, and 5 days ahead. In their experiments, the accuracy of the prediction decreased as n became larger.
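The construction of n-step-ahead training pairs can be sketched as follows. As a simplifying assumption, a fixed-length window of recent prices stands in for the full history \(p_1, \ldots , p_t\); the function name `make_supervised` is illustrative.

```python
def make_supervised(prices, window, n_ahead):
    """Build (history, target) pairs where the target lies n_ahead steps ahead."""
    samples = []
    for end in range(window, len(prices) - n_ahead + 1):
        history = prices[end - window:end]   # the last observed price is p_t
        target = prices[end + n_ahead - 1]   # p_{t+n}
        samples.append((history, target))
    return samples
```

Setting `n_ahead` to 1, 3, or 5 gives the one-day-, three-days-, and five-days-ahead targets, respectively; larger values leave fewer usable samples at the end of the series.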

Our experiments also involved 1-day, 3-day, and 5-day predictions of the directional movement of the EUR/USD currency pair. We used individual LSTM models and the simple combined LSTM as baselines and compared them with our proposed hybrid model. We also present the number of total transactions made on test data for each experiment. Accuracy results are obtained for transactions that are made.

For each experiment, we performed 50, 100, 150, and 200 iterations in the training phases to properly compare different models. The execution times of the experiments were almost linear with the number of iterations. For our data set, using a typical high-end laptop (MacBook Pro, 2.7 GHz dual-core Intel Core i5 processor, 8 GB memory, 256 GB disk space), the training phase for 200 iterations took more than 7 h.

Forecasting one day ahead

Macroeconomic LSTM model results

As seen in Table 4, this model shows huge variance in the number of transactions. Meanwhile, the profit_accuracy results show small variance, with 50.69% ± 3.72% accuracy on average. Additionally, the average predicted transaction number is 149.50, which corresponds to 61.52% of the test data.

Table 4 ME_LSTM model: one-day-ahead result summary

Technical LSTM model results

In these experiments, whose results are shown in Table 5, the profit_accuracy results are also close to each other, with 52.18% ± 1.93% accuracy on average. For this LSTM model, the average predicted transaction number is 155.25, which corresponds to 63.89% of the test data.

Table 5 TI_LSTM model: one-day-ahead result summary

Macroeconomic and technical LSTM model results

The results for this model are shown in Table 6. The profit_accuracy results have higher variance, with 53.05% ± 7.42% accuracy on average. The average predicted transaction number is 157.25, which corresponds to 64.71% of the test data. The major difference for this model occurs at 200 iterations. For this test case, the accuracy significantly increased, but the number of transactions dropped even more significantly.

Table 6 ME_TI_LSTM model: one-day-ahead result summary

Hybrid LSTM model results

Table 7 summarizes the profit_accuracy values and the number of transactions for each case in this model. In some experiments, the number of transactions is quite low. In particular, for 200 iterations, our model generated very few transactions, which correspond to the “increase” and “decrease” predictions. The total number of decrease and increase predictions is in the range of [8, 137], with an overall average of 64.75. That value corresponds to a transaction ratio of \({64.75/243 = 26.65}\)%. Moreover, we obtained an average profit_accuracy in 16 cases of 77.32% ± 7.82% and 77.76% ± 8.33% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively, where 7.82 and 8.33 represent standard deviations.

When we analyze the results for one-day-ahead predictions, we observe that although the baseline models made more transactions (89.25 more on average out of 243), our hybrid model predicted more accurately (25.57% better on average).

Table 7 Hybrid model: one-day-ahead predictions

Forecasting three days ahead

Macroeconomic LSTM model results

Table 8 presents the results of these experiments. According to the results, profit_accuracy had high variance, with 51.31% ± 7.83% accuracy on average. Additionally, the average predicted transaction number is 174.50, which corresponds to 71.81% of the test data. One significant observation concerns the huge drop in the number of transactions for 200 iterations without any increase in accuracy.

Table 8 ME_LSTM model: three-days-ahead result summary

Technical LSTM model results

As shown in Table 9, in this set of experiments, the profit_accuracy results showed smaller variance, with 48.58% ± 3.95% on average. Furthermore, the variance in the number of transactions is also smaller; the average predicted transaction number is 146.50, which corresponds to 60.29% of the test data. There is a drop in the number of transactions for 200 iterations but not as much as with the macroeconomic LSTM.

Table 9 TI_LSTM model: three-days-ahead result summary

Macroeconomic and technical LSTM model results

The results for this model are presented in Table 10. The profit_accuracy results are very close to each other, except at 200 iterations, with 53.84% ± 21.25% accuracy on average. Additionally, the average predicted transaction number is 158.50, which corresponds to 65.23% of the test data. However, the case with 200 iterations is quite different from the others, with only 10 transactions out of a possible 243 generating a very high profit accuracy.

Table 10 ME_TI_LSTM model: three-days-ahead result summary

Hybrid LSTM model results

Table 11 shows the profit_accuracy values and the number of transactions for each case. The total number of “decrease” and “increase” predictions is in the range of [2, 155]. On average, this value is 65.13, which corresponds to a transaction ratio of \({65.13/243 = 26.80}\)%. Moreover, the average profit_accuracies are 78.98% ± 15.02% and 79.23% ± 15.06% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively. There are also some very striking cases with 100% accuracy, involving 200 iterations for at least one of the LSTM models. However, all of these cases produced a very small number of transactions.

When we compare the results, similar to the one-day-ahead cases, we observe that the baseline models produced more transactions (more than 94.70 out of 243 on average), but the hybrid model predicted more accurately (27.87% better on average).

Table 11 Hybrid model: three-days-ahead predictions

Forecasting 5 days ahead

Macroeconomic LSTM model results

The results of these experiments are shown in Table 12. According to the results, the profit_accuracy values have small variance, with 47.31% ± 4.71% accuracy on average. Additionally, the average predicted transaction number is 206.25, corresponding to 85.23% of the test data.

Table 12 ME_LSTM model: five-days-ahead result summary

Technical LSTM model results

Table 13 shows the results of these experiments. The profit_accuracy results have higher variance in these experiments, especially in the case of 200 iterations, with 49.88% ± 9.92% accuracy on average. The average predicted transaction number is 151.50, corresponding to 62.60% of the test data. Again, the case of 200 iterations differs hugely from the other cases, generating fewer than half as many transactions as the lowest count among the others.

Table 13 TI_LSTM model: five-days-ahead result summary

Macroeconomic and technical LSTM model results

Table 14 shows the results of these experiments. Similar to the technical LSTM model, the profit_accuracy results are close to each other, except at 200 iterations, with an overall average accuracy of 48.73% ± 8.49%. Meanwhile, the average predicted transaction number is 138.75, corresponding to 57.34% of the test data. In terms of the number of transactions, however, the case of 200 iterations is not the only outlier; there is huge variance among all of the cases.

Table 14 ME_TI_LSTM model: five-days-ahead result summary

Hybrid LSTM model results

Table 15 presents the profit_accuracy values and the number of transactions for each case in these experiments. The total number of “decrease” and “increase” predictions is in the range of [0, 112]. On average, this value is 69.31, corresponding to a transaction ratio of \({69.31/242 = 28.64}\)%. Moreover, the overall average profit_accuracies are 84.08% ± 6.54% and 83.44% ± 6.69% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively.

From the five-days-ahead prediction experiments, we observe that, similar to the one-day- and three-days-ahead experiments, the baseline models produced more transactions (more than 96.19 on average out of 242), but the hybrid model predicted more accurately (35.12% better on average).

Table 15 Hybrid model: five-days-ahead predictions

Experiments using recent real data

To further validate our results, we extended our data set with more recent data, namely EUR/USD rates from January 1, 2018, to April 1, 2019. This extended data set has 1539 data points, which contain 761 increases and 777 decreases overall. Applying our labeling algorithm, we formed a data set with a balanced distribution of three classes. Table 16 presents the statistics of the extended data set.

Table 16 Extended data set statistics (training and test sets)

The extended data set is split into training and test sets, with ratios of 90% and 10%, respectively. Below, we report one-day-, three-days-, and five-days-ahead prediction results for our hybrid model based on the extended data.

Forecasting one day ahead

Table 17 presents the profit_accuracy values and the number of transactions for each case. The total number of “decrease” and “increase” predictions is in the range of [52, 97]. The average number of predictions is 73.19, corresponding to a transaction ratio of \({73.19/152 = 48.15}\)%. Moreover, the average profit_accuracies in the 16 cases are 70.93% ± 10.60% and 72.19% ± 10.14% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively.

Table 17 Hybrid model (on extended dataset): one-day-ahead predictions

Forecasting three days ahead

The results of these experiments are shown in Table 18. The total number of generated transactions is in the range of [2, 83]. Some cases with 200 iterations produced a very small number of transactions. The average number of transactions is 39.88, for a transaction ratio of \({39.88/152 = 26.24}\)%. Also, the average profit_accuracies are 71.76% ± 13.77% and 70.30% ± 14.15% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively.

Table 18 Hybrid model (on extended dataset): three-days-ahead predictions

Forecasting 5 days ahead

Table 19 shows the results for the five-days-ahead prediction experiments. Interestingly, the total numbers of predictions are much closer to each other in all of the cases compared to the one-day- and three-days-ahead predictions. These numbers are in the range of [59, 84]. On average, the number of transactions is 71.56, corresponding to a transaction ratio of \({71.56/152 = 47.08}\)%. Moreover, the average profit_accuracy values are 71.24% ± 5.40% and 68.25% ± 4.95% for the ME_LSTM- and TI_LSTM-based modified hybrid models, respectively.

Table 19 Hybrid model (on extended dataset): five-days-ahead predictions

Discussion

Table 20 summarizes the overall results of the experiments. In the one-day-ahead predictions, the individual LSTM models had a slightly better profit_accuracy than ME_TI_LSTM (by less than 1%). However, they produced 3.91% fewer transactions than ME_TI_LSTM on average. Moreover, when we combined the predictions of the individual models in our proposed model, it reached a much higher profit_accuracy of 73.09% (22.30% improvement) on average while reducing the number of transactions to 37.96%.

In the three-days-ahead predictions, the individual models had even better profit_accuracy results than ME_TI_LSTM by 5.81% but, again, with fewer transactions on average. In these experiments, there were huge differences in terms of the number of transactions generated by the two different LSTMs. While ME_LSTM produced more than 90% of the transactions, TI_LSTM only generated around 66%. Moreover, our proposed hybrid model showed a much better performance than the other three with a profit_accuracy of 68.31% (a 19.29% average improvement over the others). As in the above case, this higher accuracy was obtained by reducing the number of transactions to 42.57%.

Finally, in the five-days-ahead predictions, the profit_accuracy results for the individual LSTMs and ME_TI_LSTM were very close. Similar to the three-days-ahead predictions, ME_LSTM produced a very high number of transactions, with more than 97%, while ME_TI_LSTM had the lowest, with around 63%. Moreover, the hybrid model showed an exceptional accuracy performance of 79.42% (34.33% improvement) by reducing the number of transactions to 32.72%.

Additional results of these experiments can be summarized as follows:

  • ME_LSTM: The profit_accuracy of three-days-ahead predictions was slightly better than that of one-day-ahead predictions (by just 0.40%). Also, both were higher than the five-days-ahead predictions, by 5.48% and 5.08%, respectively. The number of transactions increased with longer forecasting horizons and averaged 87.52%. It is difficult to form a simple interpretation of these results, but, in general, we can say that with macroeconomic indicators, more transactions are generated.

  • TI_LSTM: Profit_accuracy decreased when we extended the prediction period, falling within the range of 45.11–51.43%. The number of transactions was lower in the five-days-ahead predictions than in the one-day- and three-days-ahead predictions. Compared to the ME_LSTM, these results show that there is no winner between the two individual LSTMs.

  • ME_TI_LSTM: The profit_accuracy of the one-day-ahead predictions was the highest, at 49.89%. Additionally, the profit_accuracy of the five-days-ahead prediction was 1.34% higher than the three-days-ahead prediction. The transaction number ratio over the test data varied and was around 75.44% on average. These results also show that a simple combination of two sets of indicators did not produce better results than those obtained individually from the two sets.

  • Hybrid model: Our proposed model, as expected, generated much higher accuracy results than the other three models. In both one-day- and three-days-ahead cases, the improvement was above 20%, and in the five-days-ahead case, it was even higher, with an improvement of more than 30% compared to the other three LSTMs. Interestingly, the performance of profit_accuracy was the highest in the five-days-ahead predictions. Moreover, in all cases, it generated the smallest number of transactions compared to the other models (40.37% on average).

The main motivation for our hybrid model solution was to avoid the drawbacks of the two different LSTMs (i.e., macroeconomic and technical LSTMs). When the ME_LSTM and TI_LSTM were executed separately using the features of their corresponding data sets (i.e., macroeconomic features and technical indicator features), they generated too many transactions. Some of these transactions were based on weak signals and thus had lower accuracy. When all features were simply appended to each other, in what we call ME_TI_LSTM, the results did not change much.

Although the two individual baseline LSTMs used completely different data sets, their results seemed to be very similar. Actually, their accuracy results can be interpreted as failure since they were around 50%. Even though LSTMs are, in general, quite successful in time-series predictions, even for applications such as stock price prediction, when it comes to predicting price direction, they fail if used directly. That is why there are not many results reported involving using LSTMs for Forex.

Moreover, combining two data sets into one seemed to improve accuracy only slightly. For that reason, we developed a hybrid model that takes the results of two individual LSTMs separately and merges them using smart decision logic. In real data, fluctuations in the EUR/USD ratio are usually very small. That is why incorrect directional predictions made by LSTMs correspond to a very small amount of errors. This causes LSTMs to produce models making many such predictions with incorrect directions.

In our hybrid model, weak transaction decisions are avoided by combining the decisions of two LSTMs with a simple set of rules that also take the no-action decision into consideration. This extension significantly reduced the number of transactions, by mostly preventing risky ones. As can be seen in Table 20, which summarizes all of the results, the new approach predicted fewer transactions than the other models. Moreover, the accuracy of the proposed transactions of the hybrid approach is much higher than that of the other models.
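The combination step described above can be illustrated with a minimal sketch. The exact rule set is not restated in this section, so the agreement rule below is an assumption chosen for illustration only: a transaction is opened only when both models give the same directional signal, and every other combination falls back to no action. The names `Signal` and `combine_signals` are hypothetical.

```python
from enum import Enum


class Signal(Enum):
    DECREASE = -1
    NO_ACTION = 0
    INCREASE = 1


def combine_signals(me_signal: Signal, ti_signal: Signal) -> Signal:
    """Merge the outputs of the macroeconomic and technical LSTMs.

    Assumed rule (illustrative, not necessarily the rule set of the study):
    trade only if both models agree on a direction; otherwise take no action.
    """
    if me_signal is ti_signal and me_signal is not Signal.NO_ACTION:
        return me_signal
    return Signal.NO_ACTION


# Agreement on a direction opens a transaction; disagreement does not.
agreed = combine_signals(Signal.INCREASE, Signal.INCREASE)
conflict = combine_signals(Signal.INCREASE, Signal.DECREASE)
one_sided = combine_signals(Signal.NO_ACTION, Signal.DECREASE)
```

Under such a rule, any disagreement between the two models removes a transaction, which is consistent with the reduced number of transactions reported for the hybrid model.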

Table 20 Summary of all experiments conducted on the main data set

Comparing the performance of the hybrid model on the main data set and the extended data set, we see some decreases in the profit_accuracy results and some changes in the number of transactions. We present this comparison in Table 21. From these results, we can say the hybrid model’s behavior on the extended data set was very similar to that obtained using the main data set. In other words, the best performance occurred for five-days-ahead predictions, and one-day-ahead predictions were slightly better than three-days-ahead predictions, by 0.33%. Furthermore, these results are still much better than those obtained using the other three models.

We can also conclude that the accuracy of the model decreased as the number of transactions increased. This was an expected result, and it was observed in all of the experiments. Depending on the data set, the number of transactions generated by our model could vary. In this specific experiment, we also observed a case in which the number of transactions decreased; there, the accuracy decreased much less than in the cases with large increases in the number of transactions.

In most financial markets, a prediction accuracy above 50% technically generates profits. Considering other costs and risks, we can conclude that a prediction accuracy above 60% is a very successful result, and we showed that our hybrid model always had an accuracy greater than 60%.

Table 21 Performance comparison of hybrid model

This research focused on deciding when to start a transaction and determining the direction of the transaction for the Forex system. In a real Forex trading system, there are further important considerations. For example, closing the transaction (in addition to our closing points of one, three, or five days ahead) can be done based on additional events, such as the occurrence of a stop-loss, take-profit, or reverse signal. Another important consideration could be related to account management. The amount of the account to be invested at each transaction could vary. The simplest model might invest the whole remaining account at each transaction. However, this approach is risky, and there are different models for account management, such as always investing a fixed percentage at each transaction. Another important decision is how to determine the leverage ratio to be chosen for each transaction. Simple models use fixed ratios for all transactions. Alternatively, the leverage ratio can be determined using the strength of the model’s decision.
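To make the account-management point concrete, the following sketch compares investing the whole remaining account with investing a fixed percentage at each transaction. It is only an illustration under stated assumptions: the function name `simulate_account` is hypothetical, the per-transaction returns are made-up numbers rather than results of this study, and leverage and transaction costs are ignored.

```python
def simulate_account(initial_balance, trade_returns, fraction):
    """Return the final balance after a sequence of transactions.

    fraction: share of the current balance staked on each transaction
              (1.0 = whole remaining account, 0.1 = fixed 10%).
    trade_returns: hypothetical relative return of each transaction
                   on the staked amount (e.g., -0.5 means a 50% loss).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    balance = initial_balance
    for r in trade_returns:
        stake = balance * fraction  # amount invested in this transaction
        balance += stake * r        # profit or loss is applied to the account
    return balance


# One large loss followed by an equally large relative gain (made-up values).
returns = [-0.5, 0.5]
whole_account = simulate_account(1000.0, returns, fraction=1.0)
fixed_ten_percent = simulate_account(1000.0, returns, fraction=0.1)
```

With these illustrative numbers, the whole-account strategy ends well below its starting balance, whereas the fixed-percentage strategy stays close to it, which is the sense in which the former is riskier.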

Conclusion

This study applied two separate LSTM models to forecast the directional movement of the EUR/USD currency pair. Our predictions covered periods of one day, three days, and five days ahead. We designed a classifier to determine the direction of the EUR/USD pair. In our proposed model, there are three classes: \(no\_action\), decrease, and increase. \(No\_action\) means that if the change between two time points is below a predefined threshold, it is negligible and requires no action. This enabled us to introduce a new performance metric, \(profit\_accuracy\), which gives the ratio of the number of profitable transactions to the total number of transactions. We defined a profitable transaction simply as a correct prediction of the decrease or increase class. Predicting the correct direction of a currency pair presents the opportunity to profit from the transactions. This was the main objective of our study. This metric met our expectations completely since the predicted class \(no\_action\) made no contribution to the profit/loss of the transactions.
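The definition of \(profit\_accuracy\) given above can be written out as a short sketch. The function name, the string labels, and the sample data are assumptions made for illustration; only the definition itself (profitable transactions divided by all transactions, with predicted no_action excluded) comes from the text.

```python
def profit_accuracy(actual, predicted, no_action="no_action"):
    """Return (profit_accuracy, number_of_transactions).

    A transaction is any prediction other than no_action.
    A transaction is profitable when the predicted class equals the actual class.
    Returns (None, 0) if no transaction was generated.
    """
    transactions = [(a, p) for a, p in zip(actual, predicted) if p != no_action]
    if not transactions:
        return None, 0
    profitable = sum(1 for a, p in transactions if a == p)
    return profitable / len(transactions), len(transactions)


# Made-up labels for five time points.
actual = ["increase", "decrease", "no_action", "increase", "decrease"]
predicted = ["increase", "increase", "no_action", "no_action", "decrease"]
accuracy, n_transactions = profit_accuracy(actual, predicted)
```

In this made-up example, three of the five predictions are transactions and two of them are correct, so the metric is 2/3; the two no_action predictions affect neither the numerator nor the denominator.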

We used a balanced data set with almost the same number of increases and decreases. Thus, our results were not biased. Two baseline models were implemented, using only macroeconomic or technical indicator data. We observed that, compared to TI_LSTM, ME_LSTM had a slightly better performance in terms of both profit_accuracy and the number of transactions generated. However, the difference was very small and insignificant. Furthermore, combining all of the features into a single LSTM, called ME_TI_LSTM, did not significantly increase accuracy.

Meanwhile, our proposed hybrid model had the best performance in terms of profit_accuracy for predictions in all periods (73.61% on average). It also reduced the number of transactions compared to the baseline models (40.37% on average). The increase in accuracy can be attributed to dropping risky transactions.

The proposed hybrid model was also tested using a recent data set. The results of the experiments were in line with the other experiments, showing only a small decrease in profit_accuracy.

The main contributions of this work can be summarized as follows:

  • Macroeconomic and technical indicators can both be used to train LSTMs, separately or together, to predict the directional movement of currency pairs in Forex. We showed that rather than combining these parameters into a single LSTM, processing them separately with different LSTMs and combining their results using smart decision logic improved prediction accuracy significantly.

  • Rather than trying to determine whether the currency pair rate will increase or decrease, a third class was introduced—a no-change class—corresponding to small changes between the prices of two consecutive days. This, too, improved the accuracy of direction prediction. We described a novel way to determine the most appropriate threshold value for defining the no-change class.

  • LSTMs can be trained to determine not only the next day’s value but also the values for k-days ahead. We used this feature to predict three days and five days ahead, with some decreases in accuracy values.

  • Typically, the accuracy of LSTMs can be improved by increasing the number of iterations during training. We experimented with various iterations to determine their effects on accuracy values. The results showed that more iterations increased accuracy while decreasing the number of transactions (i.e., potential profits and risks are simultaneously reduced).
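The three-class labeling and the k-days-ahead setting mentioned in the contributions above can be sketched as follows. This is an illustration under assumptions, not the procedure of the study: the change is taken as the absolute price difference, the threshold value is a placeholder rather than the tuned value, and the function name and sample prices are hypothetical.

```python
def label_k_days_ahead(prices, k, threshold):
    """Label each day t by the price change between day t and day t + k.

    Changes whose magnitude is below the threshold are labeled no_action;
    larger changes are labeled increase or decrease by their sign.
    The last k days have no label because their future price is unknown.
    """
    labels = []
    for t in range(len(prices) - k):
        change = prices[t + k] - prices[t]
        if abs(change) < threshold:
            labels.append("no_action")
        elif change > 0:
            labels.append("increase")
        else:
            labels.append("decrease")
    return labels


# Made-up EUR/USD closing prices and a placeholder threshold.
prices = [1.1000, 1.1002, 1.1050, 1.0990, 1.0995]
one_day_labels = label_k_days_ahead(prices, k=1, threshold=0.001)
two_day_labels = label_k_days_ahead(prices, k=2, threshold=0.001)
```

The same function covers the one-, three-, and five-days-ahead cases by changing `k`; a larger threshold moves more days into the no_action class.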

In future research, our work could be extended to other currency pairs, such as EUR/GBP, GBP/USD, USD/CHF, GBP/CHF, and EUR/CHF. Additionally, a trading simulator could be developed to further validate the model. Such a simulator could be useful for observing the real-time behavior of our model. However, for such a simulator to be meaningful, several issues related to real trading (e.g., closing the account, account management, leverage ratio decision) must be carefully investigated.