In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis (DFA) is a method for determining the statistical self-affinity of a signal. It is useful for analyzing time series that appear to be long-memory processes and noise.
█ OVERVIEW
We introduced the concept of the Hurst Exponent in our previous open indicator, Hurst Exponent (Simple). It is an indicator that measures market state from autocorrelation. Here, however, we apply a more advanced and accurate method of calculating the Hurst Exponent rather than a simple approximation, so we recommend using this version over our previous publication going forward. The method we use is called detrended fluctuation analysis (DFA). (Readers who are not interested in the math behind the calculation can skip to the "Features" and "How to Use" sections. However, we recommend reading it all to gain a better understanding of the mathematical reasoning.)
█ Detrended Fluctuation Analysis
Detrended Fluctuation Analysis was first introduced by Peng, C.K. (Original Paper) in order to measure the long-range power-law correlations in DNA sequences. DFA measures the scaling behavior of the second-moment fluctuations; the scaling exponent is a generalization of the Hurst exponent.
The traditional way of measuring the Hurst exponent is the rescaled range (RS) method. However, DFA provides the following benefits over the RS method:
• Can be applied to non-stationary time series. While asset returns are generally stationary, DFA can measure Hurst more accurately in the instances where they are non-stationary.
• According to the asymptotic distributions of DFA and RS, the latter usually overestimates the Hurst exponent (even after the Anis-Lloyd correction), so the expected value of the RS Hurst exponent is close to 0.54 instead of the 0.5 that it should be. This makes it harder to determine the autocorrelation from the expected value. With the DFA method, the expected value of the Hurst Exponent (HE) is significantly closer to 0.5, which makes that threshold much more useful.
• Lastly, DFA requires a smaller sample size than the RS method. While the RS method generally requires thousands of observations to reduce the variance of HE, DFA only needs a sample size greater than a hundred to accomplish the same. (Reference)
█ Calculation
DFA is a modified root-mean-square (RMS) analysis of a random walk. In short, DFA computes the RMS error of linear fits over progressively larger bins (non-overlapping “boxes” of similar size) of an integrated time series.
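For readers who prefer notation, the procedure detailed in the numbered steps of this section can be summarized roughly as follows. The symbols are our own shorthand and do not appear in the script: r_i are the log returns, X_t is the cumulative sum (profile), n is the window size, W_k is the k-th of M windows, and \hat{X}_{k,t} is the regression line fitted inside window k. The averaging of per-window RMS values follows the description given in this section; other presentations of DFA instead take the root of the mean squared residual over all windows.

```latex
\begin{aligned}
X_t &= \sum_{i=1}^{t} \left( r_i - \bar{r} \right), \qquad t = 1, \dots, N \\
\mathrm{RMS}_k(n) &= \sqrt{ \frac{1}{n} \sum_{t \in W_k} \left( X_t - \hat{X}_{k,t} \right)^2 }, \qquad k = 1, \dots, M \\
F(n) &= \frac{1}{M} \sum_{k=1}^{M} \mathrm{RMS}_k(n) \\
\log F(n) &\approx \alpha \log n + c
\end{aligned}
```

The slope \alpha of the last line is the exponent reported by the indicator.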
1. Our signal time series is the log returns. First, we subtract the mean from the log returns to obtain the demeaned returns. Then we calculate the cumulative sum of the demeaned returns; this mean-centered cumulative sum (the signal profile) is the series we apply DFA to. Subtracting the mean eliminates the “global trend” of the signal. The advantage of applying scaling analysis to the signal profile instead of the signal itself is that the original signal is allowed to be non-stationary when needed. (For example, this process converts an i.i.d. white noise process into a random walk.)
2. We slice the cumulative sum into windows of equal size and run a linear regression on each window to measure the linear trend. We then detrend the series by subtracting the regression line from the cumulative sum in each window. The fluctuation is the difference between the cumulative sum and the regression line.
3. We use different window sizes on the same cumulative sum series. The window sizes are log spaced, e.g. powers of 2: 2, 4, 8, 16... This is where the scale-free measurement comes in: it is how we measure the fractal nature and self-similarity of the time series, as well as how well the smaller scales represent the larger scale.
As the window size decreases, we use more regression lines to measure the trend. Therefore, the fit of the regression should be better, with smaller fluctuations. It allows one to zoom into the “picture” to see the details. The linear regressions are like rulers: if you use more rulers to measure the smaller-scale details, you get a more precise measurement.
The exponent we are measuring describes the relationship between the window size and the fit of the regression (the rate of change). The more complex the time series is, the more the fit depends on decreasing the window size (using more regression lines to measure it). The less complex the series, or the more trend it has, the less the fit depends on the window size. The fit is measured by the average of the root mean square (RMS) errors of the regressions across all windows.
4. The root mean square error of a window is the square root of the mean of the squared differences between the cumulative sum and the regression line. The following chart displays the average RMS for different window sizes. As the chart shows, smaller window sizes show more detail due to the higher complexity of the measurement.
5. The last step is to measure the power-law exponent, which we do by measuring the slope on a log-log plot. The x axis is the log of the window size and the y axis is the log of the average RMS. We run a linear regression through the plotted points; the slope of that regression is the exponent. The relationship between RMS and window size is easy to see on the chart: a larger RMS means a worse fit of the regression. We know the RMS will increase (the fit will worsen) as we increase the window size (use fewer regressions to measure), so we focus on how fast the RMS increases as the window size increases.
If the slope is < 0.5, the RMS increases slowly as the window size increases. The fit is therefore much better when the series is measured by a large number of linear regression lines, so the series is more complex (mean reversion, negative autocorrelation).
If the slope is > 0.5, the RMS increases faster as the window size increases. Even when the window size is large, the larger trend can be measured well by a small number of regression lines, so the series has a trend with positive autocorrelation.
If the slope = 0.5, the series follows a random walk (the returns are uncorrelated).
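The five steps above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the indicator's source code: the function name `dfa_exponent` and the `scales` argument are hypothetical, and window sizes below 4 are skipped because a line fitted through only two points leaves zero residual, which would make the log undefined.

```python
import numpy as np

def dfa_exponent(returns, scales):
    """Estimate the DFA scaling exponent of a return series (illustrative sketch)."""
    x = np.asarray(returns, dtype=float)
    # Step 1: demean the returns and integrate them into the profile.
    profile = np.cumsum(x - x.mean())
    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        if n < 4 or n_windows < 1:
            continue  # window too small for a meaningful residual, or too large
        # Step 2: non-overlapping windows of length n, one column per window.
        windows = profile[: n_windows * n].reshape(n_windows, n).T
        t = np.arange(n)
        # Fit a straight line to every window at once and detrend.
        slope, intercept = np.polyfit(t, windows, 1)
        residuals = windows - (np.outer(t, slope) + intercept)
        # Step 4: RMS error per window, then the average over windows.
        rms = np.sqrt(np.mean(residuals ** 2, axis=0))
        log_n.append(np.log(n))
        log_f.append(np.log(rms.mean()))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    # Step 5: slope of log(average RMS) against log(window size).
    return np.polyfit(log_n, log_f, 1)[0]

# Usage with simulated data (assumed inputs, for illustration only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
scales = [4, 8, 16, 32, 64, 128]
h_random = dfa_exponent(noise, scales)               # uncorrelated returns
h_trending = dfa_exponent(np.cumsum(noise), scales)  # strongly persistent input
h_reverting = dfa_exponent(np.diff(noise), scales)   # strongly anti-persistent input
print(h_random, h_trending, h_reverting)
```

The three simulated inputs are chosen so that the estimates should land near, well above, and well below 0.5 respectively, matching the three slope cases described above.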
█ FEATURES
• Sample Size is the lookback period for the calculation. Even though DFA requires a smaller sample size than RS, a sample size larger than 50 is recommended for accurate measurement.
• When a larger sample size is used (for example, a lookback length of 1000), loading may be slower due to the longer calculation. Date Range is used to limit the number of historical bars calculated. When loading is too slow, change the Date Range from "all" to a number of weeks/days/hours to reduce loading time. (Credit to allanster)
• The “show filter” option applies a moving average to smooth the exponent.
• Log scale is my workaround for dynamic log-space scaling. Traditionally, the smallest log spacing for bars is powers of 2. An accurate regression requires at least 10 points, which makes the minimum lookback 1024 (2^10). I made some changes that round fractional log-spaced sizes to integer numbers of bars, which allows the log spacing to be less than 2.
• For a more accurate calculation, a larger "Base Scale" and "Max Scale" should be selected. However, when the sample size is small, a larger value would cause issues. Therefore, the general rule is: select a larger "Base Scale" and "Max Scale" for a larger sample size. We recommend trying a larger scale as long as increasing the value doesn't cause issues.
The following chart shows the change in value using various scales. As shown, increasing the scale sometimes makes the exponent value messy and causes it to overshoot.
When using the lowest scale (4,2), the value seems stable. When we increase the scale to (8,2), the value is still alright. However, when we increase it to (8,4), it begins to look messy. And when we increase it to (16,4), it starts overshooting. Therefore, (8,2) seems to be optimal for our use.
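The idea of rounding fractional log-spaced sizes to whole bars can be sketched as follows. This is only an illustration of the general technique: the function `log_spaced_windows` and its parameters `min_size`, `max_size` and `ratio` are hypothetical and are not the script's "Base Scale" and "Max Scale" inputs, whose exact meaning is not spelled out here.

```python
import numpy as np

def log_spaced_windows(min_size, max_size, ratio=2 ** 0.5):
    """Integer window sizes spaced geometrically between min_size and max_size."""
    if min_size < 1 or max_size < min_size or ratio <= 1:
        raise ValueError("expected 1 <= min_size <= max_size and ratio > 1")
    # Number of geometric steps that fit in the range (small tolerance for float error).
    steps = int(np.floor(np.log(max_size / min_size) / np.log(ratio) + 1e-9))
    raw = min_size * ratio ** np.arange(steps + 1)
    # Round fractional sizes to whole bars and drop duplicates created by rounding.
    return np.unique(np.round(raw).astype(int)).tolist()

powers_of_two = log_spaced_windows(4, 64, ratio=2)  # classic doubling
finer_spacing = log_spaced_windows(4, 64)           # ratio sqrt(2), twice as many steps
print(powers_of_two)
print(finer_spacing)
```

With a ratio below 2, the same lookback yields more points on the log-log plot, which is what makes a regression feasible on shorter samples.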
█ How to Use
As with Hurst Exponent (Simple), 0.5 is the level used to determine long-term memory.
• Under the efficient market hypothesis, the market follows a random walk and the Hurst Exponent should be 0.5. When the Hurst Exponent is significantly different from 0.5, the market is inefficient.
• When the Hurst Exponent is > 0.5: positive autocorrelation. The market is trending. Positive returns tend to be followed by positive returns and vice versa.
• When the Hurst Exponent is < 0.5: negative autocorrelation. The market is mean reverting. Positive returns tend to be followed by negative returns and vice versa.
However, we can't tell whether a Hurst Exponent value was generated by random chance just by looking at the 0.5 level. Even if we measure a pure random walk, the Hurst Exponent will never be exactly 0.5; it will be close, like 0.506, but not equal to 0.5. That's why we need a level that tells us whether the Hurst Exponent is significant.
So we also computed the 95% confidence interval by Monte Carlo simulation. The confidence interval adjusts to the sample size. When the Hurst Exponent is above the top or below the bottom of the confidence interval, its value is statistically significant: the efficient market hypothesis is rejected and the market has significant inefficiency.
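One way such a confidence band could be obtained is sketched below, assuming the null hypothesis is i.i.d. Gaussian returns. The details of the simulation behind the indicator are not given here, so the function names, the number of simulations and the window sizes are all our own assumptions; `dfa_exponent` is a compact illustrative DFA estimator included only to keep the example self-contained.

```python
import numpy as np

def dfa_exponent(returns, scales):
    """Compact DFA estimate: slope of log(average RMS) against log(window size)."""
    profile = np.cumsum(returns - np.mean(returns))
    log_n, log_f = [], []
    for n in scales:
        m = len(profile) // n
        if n < 4 or m < 1:
            continue
        windows = profile[: m * n].reshape(m, n).T
        t = np.arange(n)
        slope, intercept = np.polyfit(t, windows, 1)
        resid = windows - (np.outer(t, slope) + intercept)
        log_n.append(np.log(n))
        log_f.append(np.log(np.sqrt((resid ** 2).mean(axis=0)).mean()))
    return np.polyfit(log_n, log_f, 1)[0]

def monte_carlo_band(sample_size, scales, n_sims=200, level=0.95, seed=0):
    """Quantile band of the exponent under simulated uncorrelated returns."""
    rng = np.random.default_rng(seed)
    estimates = [dfa_exponent(rng.standard_normal(sample_size), scales)
                 for _ in range(n_sims)]
    tail = (1 - level) / 2
    lower, upper = np.quantile(estimates, [tail, 1 - tail])
    return float(lower), float(upper)

def classify(h, lower, upper):
    """Label the market state relative to the band."""
    if h > upper:
        return "significant trend"
    if h < lower:
        return "significant mean reversion"
    return "not significant"

scales = [4, 8, 16, 32]
band_small = monte_carlo_band(128, scales)   # short lookback
band_large = monte_carlo_band(1024, scales)  # long lookback
print(band_small, band_large)
```

Comparing the two bands illustrates why the interval has to adjust to the sample size: the shorter lookback produces noisier estimates, so a value needs to be further from 0.5 before it counts as significant.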
The state of the market is painted in different colors, as the following chart shows. Users can also read the state from the table displayed on the right.
An important point is that the Hurst value only represents the market state according to past measurements, which means it only tells you the market state now and in the past. If the Hurst Exponent with a sample size of 100 shows a significant trend, it means that according to the past 100 bars the market is trending significantly. It doesn't mean the market will continue to trend. It does not forecast the future market state.
However, there is another way to use it. The market is not always random and not always inefficient; the state switches from time to time. But there is one pattern: when the market stays inefficient for too long, market participants see this and try to take advantage of it, so the inefficiency is traded away. That is why the Hurst Exponent won't stay in significant trend or mean reversion for too long. When it is significant, market participants see that as well, and the market adjusts itself back to normal.
The Hurst Exponent can itself be used as a mean-reverting oscillator. In a liquid market, the value tends to return inside the confidence interval after significant moves. (In smaller markets, it could stay inefficient for a long time.) So when the Hurst Exponent first shows significant values, the market has just entered a significant trend or mean reversion state. However, when it stays outside the confidence interval for too long, it suggests the market might be closer to the end of the trend or mean reversion instead.
A larger sample size makes the Hurst Exponent statistics more reliable. Therefore, if the user wants to know whether long-term memory exists in general on the selected ticker, they can use a large sample size and maximize the log scale, e.g. a sample size of 1024 with scale (16,4).
The following chart is Bitcoin on the daily timeframe with a 1024 lookback. It suggests the Bitcoin market tends to have long-term memory in general. It generally shows a significant trend and was more inefficient in its early stage.