In the previous post, we analyzed raw price changes of cryptocurrencies. The problem with that approach is that the prices of different cryptocurrencies are on very different scales, so their changes cannot be compared with a common metric.

In this post, we describe the benefits of using log returns to analyze price changes. You can download this Jupyter Notebook and the data.

Follow me on Twitter to get the latest updates.

Bitcoin, Ethereum, and Litecoin Log Returns

Disclaimer

I am not a trader and this blog post is not financial advice. This is purely introductory knowledge. The conclusions here can be misleading because we analyze a time period with immense growth.

Requirements

For other requirements, see my first blog post of this series.

Load the data

import pandas as pd

df_btc = pd.read_csv('BTC_USD_Coinbase_hour_2017-12-24.csv', index_col='datetime')
df_eth = pd.read_csv('ETH_USD_Coinbase_hour_2017-12-24.csv', index_col='datetime')
df_ltc = pd.read_csv('LTC_USD_Coinbase_hour_2017-12-24.csv', index_col='datetime')
df = pd.DataFrame({'BTC': df_btc.close,
                   'ETH': df_eth.close,
                   'LTC': df_ltc.close})
df.index = df.index.map(pd.to_datetime)
df = df.sort_index()
df.head()
BTC ETH LTC
2017-10-02 08:00:00 4448.85 301.37 54.72
2017-10-02 09:00:00 4464.49 301.84 54.79
2017-10-02 10:00:00 4461.63 301.95 54.63
2017-10-02 11:00:00 4399.51 300.02 54.01
2017-10-02 12:00:00 4383.00 297.51 53.71
df.describe()
BTC ETH LTC
count 2001.000000 2001.000000 2001.000000
mean 9060.256122 407.263793 106.790100
std 4404.269591 149.480416 89.142241
min 4150.020000 277.810000 48.610000
25% 5751.020000 301.510000 55.580000
50% 7319.950000 330.800000 63.550000
75% 11305.000000 464.390000 100.050000
max 19847.110000 858.900000 378.660000

Why Log Returns?

The benefit of using returns rather than prices is normalization: all variables are measured in a comparable metric, which makes it possible to evaluate relationships between two or more variables even though they originate from price series of unequal values (for details, see Why Log Returns).

Let’s define the return as:

$$r_i = \frac{p_i - p_j}{p_j}$$

where $r_i$ is the return at time $i$, $p_i$ is the price at time $i$, and $j = i-1$.
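As a small worked example of this definition, the sketch below computes simple returns for the first three BTC closing prices shown in `df.head()` above. The names `prices` and `simple_returns` are illustrative, and the comparison with `pct_change` is only a sanity check, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First three BTC closing prices from the df.head() output above
prices = pd.Series([4448.85, 4464.49, 4461.63])

# r_i = (p_i - p_j) / p_j with j = i - 1; the first value has no predecessor
simple_returns = (prices - prices.shift(1)) / prices.shift(1)

# pandas offers the same calculation as pct_change
builtin_returns = prices.pct_change(fill_method=None)

print(simple_returns)
```

The first entry is NaN because there is no earlier price to compare with; the same happens in the log returns table later in the post.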

Calculate Log Returns

The author of Why Log Returns outlines several benefits of using log returns instead of returns, so we transform the returns equation into the log returns equation:

$$\log(1 + r_i) = \log\frac{p_i}{p_j} = \log(p_i) - \log(p_j)$$

Now, we apply the log returns equation to the closing prices of the cryptocurrencies:

import numpy as np

# shift(1) moves values down by one row, so each price is compared with the previous hour
df_change = df.apply(lambda x: np.log(x) - np.log(x.shift(1)))
df_change.head()
BTC ETH LTC
2017-10-02 08:00:00 NaN NaN NaN
2017-10-02 09:00:00 0.003509 0.001558 0.001278
2017-10-02 10:00:00 -0.000641 0.000364 -0.002925
2017-10-02 11:00:00 -0.014021 -0.006412 -0.011414
2017-10-02 12:00:00 -0.003760 -0.008401 -0.005570
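One commonly cited benefit of log returns is that they add up over time: the sum of hourly log returns equals the log return over the whole period. The sketch below illustrates this with the five BTC closing prices from `df.head()`; the variable names are illustrative, and `np.log(prices).diff()` is used here as an equivalent shorthand for the lambda above.

```python
import numpy as np
import pandas as pd

# First five BTC closing prices from the df.head() output above
prices = pd.Series([4448.85, 4464.49, 4461.63, 4399.51, 4383.00])

# log(p_i) - log(p_{i-1}), the same quantity as in df_change
log_returns = np.log(prices).diff()

# Summing the hourly log returns gives the log return of the whole period
total_from_sum = log_returns.sum()  # NaN in the first row is skipped
total_direct = np.log(prices.iloc[-1] / prices.iloc[0])

print(total_from_sum, total_direct)
```

Simple returns do not have this property; they have to be compounded by multiplication instead.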

Visualize Log Returns

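We plot the normalized changes of closing prices for the last 50 hours. For small changes, a log difference is approximately equal to the percentage change.

# tail(50) selects the last 50 hourly rows, since the index is sorted in ascending order
df_change.tail(50).plot(figsize=(15, 10)).axhline(color='black', linewidth=2)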
Bitcoin, Ethereum, and Litecoin Log Returns for last 50 hours
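The reading of log differences as percentage changes is an approximation that holds best for small moves. A minimal sketch with made-up log return values (not taken from the data) shows how the exact simple return, $e^{x} - 1$, drifts away from the log return $x$ as the move gets larger:

```python
import numpy as np

# Hypothetical log returns, from a small hourly move to a very large one
log_returns = np.array([0.001, 0.01, 0.1, 0.5])

# Exact conversion from a log return to a simple return: exp(x) - 1
simple_returns = np.expm1(log_returns)

# The gap between the two grows with the size of the move
gap = simple_returns - log_returns

for lr, sr in zip(log_returns, simple_returns):
    print('log return %.3f -> simple return %.4f' % (lr, sr))
```

For hourly moves of around one percent the two are nearly identical, which is why the approximation is usually good enough for charts like the one above.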

Are LTC prices distributed log-normally?

If we assume that prices are distributed log-normally, then $\log(1 + r_i)$ is conveniently normally distributed (for details, see Why Log Returns).

On the chart below, we plot the distribution of LTC hourly closing prices. We also estimate the parameters of a log-normal distribution and plot the fitted distribution with a red line.

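import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))

values = df['LTC']

# fit a log-normal distribution; scipy returns (shape, loc, scale)
shape, loc, scale = stats.lognorm.fit(values)
x = np.linspace(values.min(), values.max(), len(values))
pdf = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
# loc and scale are scipy's fit parameters, not the mean and std of the prices
label = 'loc=%.4f, scale=%.4f, shape=%.4f' % (loc, scale, shape)

# density=True replaces the normed argument, which newer matplotlib versions removed
ax.hist(values, bins=30, density=True)
ax.plot(x, pdf, 'r-', lw=2, label=label)
ax.legend(loc='best')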
Distribution of LTC prices
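The three parameters returned by `stats.lognorm.fit` are easier to interpret when the location is fixed at zero. Under that assumption (which the fit above does not make), `shape` corresponds to the standard deviation of the log of the values and `scale` to the exponential of their mean. The sketch below checks this on synthetic data; `mu`, `sigma`, and `samples` are hypothetical and unrelated to the LTC prices.

```python
import numpy as np
import scipy.stats as stats

# Synthetic log-normal samples with known parameters (not LTC data)
mu, sigma = 4.0, 0.5
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=mu, sigma=sigma, size=5000)

# floc=0 fixes the location, leaving shape and scale to be estimated
shape, loc, scale = stats.lognorm.fit(samples, floc=0)

print('shape=%.4f (sigma=%.1f)' % (shape, sigma))
print('log(scale)=%.4f (mu=%.1f)' % (np.log(scale), mu))
```

With a free `loc`, as in the LTC fit, the parameters no longer map this cleanly onto the mean and standard deviation of the log prices.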

Are LTC log returns normally distributed?

On the chart below, we plot the distribution of LTC log returns. We also estimate the parameters of a normal distribution and plot the fitted distribution with a red line.

import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

values = df_change['LTC'][1:]  # skip first NA value
x = np.linspace(values.min(), values.max(), len(values))

loc, scale = stats.norm.fit(values)
param_density = stats.norm.pdf(x, loc=loc, scale=scale)
label = 'mean=%.4f, std=%.4f' % (loc, scale)

fig, ax = plt.subplots(figsize=(10, 6))
# density=True replaces the normed argument, which newer matplotlib versions removed
ax.hist(values, bins=30, density=True)
ax.plot(x, param_density, 'r-', label=label)
ax.legend(loc='best')
Distribution of LTC Log Returns
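A histogram with a fitted curve gives only a visual impression. One possible way to put a number on the question in the heading, which is not part of the original analysis, is to look at excess kurtosis and the D'Agostino-Pearson test from `scipy.stats`. The sketch below uses synthetic samples, one normal and one heavy-tailed, rather than the LTC log returns; with the real data, `df_change['LTC'].dropna()` could be passed in the same way.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Synthetic stand-ins: a normal sample and a heavy-tailed Student t sample
normal_sample = rng.normal(loc=0.0, scale=0.01, size=2000)
heavy_sample = 0.01 * rng.standard_t(df=3, size=2000)

for name, sample in [('normal', normal_sample), ('heavy-tailed', heavy_sample)]:
    # Excess kurtosis is about 0 for a normal distribution
    excess_kurtosis = stats.kurtosis(sample)
    # Null hypothesis of normaltest: the sample comes from a normal distribution
    statistic, p_value = stats.normaltest(sample)
    print('%s: excess kurtosis=%.2f, p-value=%.4f' % (name, excess_kurtosis, p_value))

heavy_kurtosis = stats.kurtosis(heavy_sample)
heavy_p_value = stats.normaltest(heavy_sample)[1]
```

A small p-value together with a large positive excess kurtosis suggests heavier tails than the normal distribution allows.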

Pearson Correlation with log returns

We calculate the Pearson correlation from log returns. The correlation matrix below has values similar to the one at Sifr Data. There are differences because:

Observations

  • BTC and ETH have a moderate positive relationship,
  • LTC and ETH have a strong positive relationship.
import seaborn as sns
import matplotlib.pyplot as plt

# Compute the correlation matrix
corr = df_change.corr()

# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed in newer NumPy versions
mask[np.triu_indices_from(mask)] = True

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(10, 10))

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, annot=True, fmt = '.4f', mask=mask, center=0, square=True, linewidths=.5)
Correlation matrix with BTC, ETH and LTC
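To make clear what `corr()` computes, the sketch below evaluates the Pearson coefficient directly from its definition and compares it with the pandas result. The data is synthetic: two hypothetical series `A` and `B` that share a common component, not the cryptocurrency log returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two synthetic series that share a common component (not real returns)
common = rng.normal(size=1000)
toy = pd.DataFrame({'A': common + rng.normal(size=1000),
                    'B': common + rng.normal(size=1000)})


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())


manual = pearson(toy['A'], toy['B'])
builtin = toy.corr().loc['A', 'B']

print('manual=%.4f, pandas=%.4f' % (manual, builtin))
```

Note that `DataFrame.corr` ignores missing values pairwise, which is why the NaN in the first row of `df_change` does not need to be removed before computing the matrix.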

Conclusion

We showed how to calculate log returns from raw prices with a practical example. This way we normalized prices, which simplifies further analysis. We also showed how to estimate parameters for normal and log-normal distributions.