### What is an ETF?

ETF stands for Exchange Traded Fund. As the name implies, an ETF is an investment fund that trades on an exchange like a stock. You can buy and sell ETF shares intraday, which you cannot do with a mutual fund, whose shares are priced once a day.

### Who trades ETFs? Why are they popular?

Investors across the globe, especially in the United States, Canada and Europe, are moving away from actively managed portfolios and into ETFs. Trillions of dollars are invested in Exchange Traded Funds. ETFs are generally more tax efficient than comparable mutual funds, and a diversified ETF is typically less risky than leveraged instruments such as futures and options.

### Who is involved in ETFs?

Several actors are involved in ETFs: the ETF creators, buyers and sellers. The main players are market makers, Authorized Participants (APs), specialists, retail traders and institutional investors. Investors include hedge fund managers, high net worth individuals (HNIs) and the like.

### The problem statement

Exchange Traded Funds have two prices:

1. The net asset value – NAV price

2. Fair Value – the actual value of the ETF based on the underlying assets

Typically, the NAV stays close to the fair value. But funds drift out of balance because of volatility, news, rumours, greed, volume spikes and more. This is where arbitrage trading comes into the picture: the difference between the NAV price and the fair price is where traders can make money at the end of the day.
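To make the gap concrete, here is a minimal sketch of the calculation. The function name `spread_bps` and the two prices are hypothetical and chosen only for illustration; the sketch simply expresses the difference between the two prices in basis points of the fair value.

```python
def spread_bps(nav: float, fair_value: float) -> float:
    """Gap between NAV and fair value, in basis points of fair value."""
    if fair_value <= 0:
        raise ValueError("fair_value must be positive")
    return (nav - fair_value) / fair_value * 10_000


# Hypothetical prices, for illustration only.
nav = 100.25
fair_value = 100.00

gap = spread_bps(nav, fair_value)
# A positive value means the NAV is above the fair value.
print(f"spread: {gap:.1f} bps")
```

A model that predicts this gap before it closes is what the rest of this article is about.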

### Where does Deep Learning / Neural Networks come into the picture?

Artificial Neural Networks (ANNs) are built on architectures and learning algorithms loosely modelled on features of the human brain. ANNs do not require explicit assumptions about the underlying probability distribution of the data. Instead, they learn relationships from the data itself, however mathematically complex those relationships are. For regression problems like the one above, a Multi Layer Perceptron (MLP) is a well-suited approach.

To solve the ETF problem above, a Deep Learning approach using ANNs predicts the difference between the NAV and the fair price accurately enough to let traders make informed decisions. Using this solution, a hedge fund trader made a profit of 16 million dollars in a single day.

### Solution Approach

- Collect data containing the NAV and fair price of ETFs
- Clean and format the data as necessary
- Train and validate ANNs on the data
- Collect new data using current approximations and interpolations
- Compare the results of the ANN with current approximations

### Software Used

AWS Kinesis, S3 buckets, MySQL – for data collection

MS Excel, Tableau, MySQL – data formatting, visualization and data transfer

Python, TensorFlow, matplotlib, pandas, NumPy – for data splitting, model construction, model training and model validation

### Solution Details

- The solution was designed as an Artificial Neural Network (ANN) with an input layer, multiple hidden layers and an output layer.
- The input layer consisted of features such as OHLC (open, high, low, close) prices, volatility, volume data, sentiment analysis and more. To remove redundant factors and normalize the data, Principal Component Analysis (PCA) was applied.
- In the hidden layers, each neuron computes a weighted sum of all inputs leading to it, adds a bias term and applies a transformation to that sum. The transformed sum is passed on as an input to the nodes in the next layer until the output is reached.
- ReLU was used as the activation function, as it speeds up training and mitigates the vanishing gradient problem. Three hidden layers with 400 nodes each gave a very low MSE.
- When the calculations of the ANN are complete, the output is compared against the labeled values of the training data and a cost function is computed.
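The PCA step applied to the input features can be sketched as follows. This is a minimal NumPy version built on the singular value decomposition, not necessarily the implementation used in the project. The function name `pca_reduce`, the number of components and the synthetic feature matrix are all assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Standardize the columns of X and project onto the top principal components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


# Synthetic stand-in for a feature matrix (rows are observations).
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
# Six columns, three of which are near-duplicates of the first three.
X = np.hstack([base, base + 0.01 * rng.normal(size=(500, 3))])

X_reduced = pca_reduce(X, n_components=3)
print(X_reduced.shape)  # (500, 3)
```

Because the last three columns carry almost no new information, three components retain nearly all of the variance, which is the sense in which PCA removes redundant factors.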

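$$y = \sum_i w_i x_i + b$$

$$x = \sigma(y)$$

Here $w_i$ are the weights, $b$ is the bias and $\sigma$ is the activation function. The transformed value $x$ becomes an input to the next layer.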
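The two equations above, applied layer by layer, give the forward pass of the network. Below is a minimal NumPy sketch under stated assumptions: the three hidden layers of 400 nodes and ReLU come from the text, while the input width of 10, the He-style weight initialization, the linear output layer and the synthetic batch are assumptions made for illustration.

```python
import numpy as np


def relu(y: np.ndarray) -> np.ndarray:
    """ReLU activation: max(y, 0) element-wise."""
    return np.maximum(y, 0.0)


def init_layers(sizes, rng):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialization (an assumption, commonly paired with ReLU).
        W = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        layers.append((W, b))
    return layers


def forward(X, layers):
    """Apply y = sum(w * x) + b followed by x = relu(y) in each hidden layer."""
    a = X
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W_out, b_out = layers[-1]
    return a @ W_out + b_out  # linear output for regression


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
n_features = 10  # assumed input width
layers = init_layers([n_features, 400, 400, 400, 1], rng)

X = rng.normal(size=(32, n_features))  # one batch of synthetic inputs
y_true = rng.normal(size=(32, 1))      # synthetic targets
y_pred = forward(X, layers)
print(y_pred.shape, mse(y_pred, y_true))
```

With untrained weights the error is arbitrary; training consists of adjusting the weights and biases to reduce it.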

- Backpropagation is the learning algorithm used to propagate the error back through the previous layers and compute the gradient of the cost function with respect to the weights. Weights are updated backwards, from the output towards the input.
- Computed outputs are compared with the original outputs, and the model is then tuned by adjusting the weights and biases to narrow the difference and minimize the cost function.
- Mean squared error (MSE) is used as the cost function here, as it has an established track record in applying ANNs to derivatives pricing.
- The procedure is repeated until the error between the predicted output and the actual output is minimized, at which point the model is considered trained and ready to be applied to new data.

- As an alternative to plain gradient descent, the Adam optimizer, a variation of gradient descent, was used to improve the efficiency of training.

Adam (Adaptive Moment Estimation) works by adapting the learning rate for each parameter. The method stores exponentially decaying averages of past gradients and of past squared gradients.
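To illustrate how Adam adapts the step for each parameter, here is a small NumPy sketch of the update rule applied to a toy linear regression with an MSE cost. The function name `adam_step`, the toy data and the learning rate of 0.01 are assumptions for illustration. In practice a framework optimizer would be used instead of hand-written updates.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: recover w in y = X @ w by minimizing the MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 2001):
    grad = 2.0 / len(X) * X.T @ (X @ w - y)  # gradient of the MSE with respect to w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)

print(np.round(w, 3))
```

Dividing by the square root of the squared-gradient average is what gives each parameter its own effective learning rate.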

### Training and Test Data

The data needs to be split into two sets. One set is used to train the model, while the other is used to validate the training. The ratio of train to test data can be 80:20, 75:25 or similar, depending on the amount of data available for the use case.

Approximately 20 million rows of data were imported from data feeds and stored files using Kinesis and Python scripts. The rows were shuffled into random order. Given the large amount of input data, the data was split 90/10: the train_test_split() function was called with a test size of 0.1. model.fit() was then used to train the model and model.score() to evaluate it on the held-out data.
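The shuffle and 90/10 split can be sketched in plain NumPy as below. This is a stand-in that mimics what a shuffled split with a test size of 0.1 does, not the library function itself. The helper name `shuffle_split`, the seed and the small synthetic arrays are assumptions for illustration.

```python
import numpy as np


def shuffle_split(X, y, test_size=0.1, seed=0):
    """Shuffle rows, then hold out a fraction `test_size` for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Synthetic stand-in data: 1,000 rows instead of roughly 20 million.
X = np.arange(5000, dtype=float).reshape(1000, 5)
y = np.arange(1000, dtype=float)

X_train, X_test, y_train, y_test = shuffle_split(X, y, test_size=0.1)
print(X_train.shape, X_test.shape)  # (900, 5) (100, 5)
```

The same index permutation is applied to features and targets, so each row stays paired with its label after shuffling.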

### Conclusion

Demand for ETFs has increased tremendously in recent years. Investors and authorized participants use arbitrage trading to make large sums of money by predicting the difference between the net asset value (NAV) and the fair price of an ETF. While closed-form mathematical solutions exist for calculating the present value of financial instruments, they can be inaccurate and may give badly wrong results in volatile conditions.

Deep learning enables the program to learn the complex relationships in the data. The neural network approach makes accurate predictions and helps traders make informed decisions.