OPEN-SOURCE SCRIPT

Accurate Bollinger Bands mcbw_ [True Volatility Distribution]

The Bollinger Bands have become a very important technical tool for discretionary and algorithmic traders alike over the last decades. It was designed to give traders an edge on the markets by setting probabilistic values to different levels of volatility. However, some of the assumptions that go into its calculations make it unusable for traders who want to get a correct understanding of the volatility that the bands are trying to be used for. Let's go through what the Bollinger Bands are said to show, how their calculations work, the problems in the calculations, and how the current indicator I am presenting today fixes these.

--> If you just want to know how the settings work then skip straight to the end or click on the little (i) symbol next to the values in the indicator settings window when its on your chart <--

  1. --------------------------- What Are Bollinger Bands ---------------------------

    The Bollinger Bands were formed in the 1980's, a time when many retail traders interacted with their symbols via physically printed charts and computer memory for personal computer memory was measured in Kb (about a factor of 1 million smaller than today). Bollinger Bands are designed to help a trader or algorithm see the likelihood of price expanding outside of its typical range, the further the lines are from the current price implies the less often they will get hit. With a hands on understanding many strategies use these levels for designated levels of breakout trades or to assist in defining price ranges.

  2. --------------------------- How Bollinger Bands Work ---------------------------

    The calculations that go into Bollinger Bands are rather simple. There is a moving average that centers the indicator and an equidistant top band and bottom band are drawn at a fixed width away. The moving average is just a typical moving average (or common variant) that tracks the price action, while the distance to the top and bottom bands is a direct function of recent price volatility. The way that the distance to the bands is calculated is inspired by formulas from statistics. The standard deviation is taken from the candles that go into the moving average and then this is multiplied by a user defined value to set the bands position, I will call this value 'the multiple'. When discussing Bollinger Bands, that trading community at large normally discusses 'the multiple' as a multiplier of the standard deviation as it applies to a normal distribution (gaußian probability). On a normal distribution the number of standard deviations away (which trades directly use as 'the multiple') you are directly corresponds to how likely/unlikely something is to happen:

    1 standard deviation equals 68.3%, meaning that the price should stay inside the 1 standard deviation 68.3% of the time and be outside of it 31.7% of the time;
    2 standard deviation equals 95.5%, meaning that the price should stay inside the 2 standard deviation 95.5% of the time and be outside of it 4.5% of the time;
    3 standard deviation equals 99.7%, meaning that the price should stay inside the 3 standard deviation 99.7% of the time and be outside of it 0.3% of the time.


    Therefore when traders set 'the multiple' to 2, they interpret this as meaning that price will not reach there 95.5% of the time.


  3. ---------------- The Problem With The Math of Bollinger Bands ----------------

    In and of themselves the Bollinger Bands are a great tool, but they have become misconstrued with some incorrect sense of statistical meaning, when they should really just be taken at face value without any further interpretation or implication.
    In order to explain this it is going to get a bit technical so I will give a little math background and try to simplify things. First let's review some statistics topics (distributions, percentiles, standard deviations) and then with that understanding explore the incorrect logic of how Bollinger Bands have been interpreted/employed.

    ---------------- Quick Stats Review ----------------
    .

    (If you are comfortable with statistics feel free to skip ahead to the next section)
    .

    -------- I: Probability distributions --------
    When you have a lot of data it is helpful to see how many times different results appear in your dataset. To visualize this people use "histograms", which just shows how many times each element appears in the dataset by stacking each of the same elements on top of each other to form a graph. You may be familiar with the bell curve (also called the "normal distribution", which we will be calling it by). The normal distribution histogram looks like a big hump around zero and then drops off super quickly the further you get from it. This shape (the bell curve) is very nice because it has a lot of very nifty mathematical properties and seems to show up in nature all the time. Since it pops up in so many places, society has developed many different shortcuts related to it that speed up all kinds of calculations, including the shortcut that 1 standard deviation = 68.3%, 2 standard deviations = 95.5%, and 3 standard deviations = 99.7% (these only apply to the normal distribution). Despite how handy the normal distribution is and all the shortcuts we have for it are, and how much it shows up in the natural world, there is nothing that forces your specific dataset to look like it. In fact, your data can actually have any possible shape. As we will explore later, economic and financial datasets *rarely* follow the normal distribution.

    -------- II: Percentiles --------
    After you have made the histogram of your dataset you have built the "probability distribution" of your own dataset that is specific to all the data you have collected. There is a whole complicated framework for how to accurately calculate percentiles but we will dramatically simplify it for our use. The 'percentile' in our case is just the number of data points we are away from the "middle" of the data set (normally just 0). Lets say I took the difference of the daily close of a symbol for the last two weeks, green candles would be positive and red would be negative. In this example my dataset of day by day closing price difference is:

    week 1: [-5, -5, -10, -0, +4]
    week 2: [+20, +2, +0, -1, +7]


    sorting all of these value into a single dataset I have:
    [-10, -5, -5, -1, 0, 0, 2, 4, 7, 20]


    I can separate the positive and negative returns and explore their distributions separately:
    negative return distribution = [0, -1, -5, -5, -10]
    positive return distribution = [0, +2, +4, +7, +20]


    Taking the 25th% percentile of these would just be taking the value that is 25% towards the end of the end of these returns. Or akin the 100%th percentile would just be taking the vale that is 100% at the end of those:

    negative return distribution (50%) = -5
    positive return distribution (50%) = +4
    negative return distribution (100%) = -10
    positive return distribution (100%) = +20


    Or instead of separating the positive and negative returns we can also look at all of the differences in the daily close as just pure price movement and not account for the direction, in this case we would pool all of the data together by ignoring the negative signs of the negative reruns

    combined return distribution = [0, 0, (1), 2, 4, (5), (5), 7, (10), 20]


    In this case the 50%th and 100%th percentile of the combined return distribution would be:

    combined return distribution (50%) = 4
    combined return distribution (100%) = 10


    Sometimes taking the positive and negative distributions separately is better than pooling them into a combined distribution for some purposes. Other times the combined distribution is better.

    Most financial data has very different distributions for negative returns and positive returns. This is encapsulated in sayings like "Price takes the stairs up and the elevator down".


    -------- III: Standard Deviation --------
    The formula for the standard deviation (refereed to here by its shorthand 'STDEV') can be intimidating, but going through each of its elements will illuminate what it does. The formula for STDEV is equal to:

    square root ( (sum [(x - mean) ^ 2]) / N )


    Going back the the dataset that you might have, the variables in the formula above are:

    'mean' is the average of your entire dataset

    'x' is just representative of a single point in your dataset (one point at a time)

    'N' is the total number of things in your dataset.


    Going back to the STDEV formula above we can see how each part of it works. Starting with the '(x - mean)' part. What this does is it takes every single point of the dataset and measure how far away it is from the mean of the entire dataset. Taking this value to the power of two: '(x - mean) ^ 2', means that points that are very far away from the dataset mean get 'penalized' twice as much. Points that are very close to the dataset mean are not impacted as much. In practice, this would mean that if your dataset had a bunch of values that were in a wide range but always stayed in that range, this value ('(x - mean) ^ 2') would end up being small. On the other hand, if your dataset was full of the exact same number, but had a couple outliers very far away, this would have a much larger value since the square par of '(x - mean) ^ 2' make them grow massive. Now including the sum part of 'sum [(x - mean) ^ 2]', this just adds up all the of the squared distanced from the dataset mean. Then this is divided by the number of values in the dataset ('N'), and then the square root of that value is taken.


    There is nothing inherently special or definitive about the STDEV formula, it is just a tool with extremely widespread use and adoption. As we saw here, all the STDEV formula is really doing is measuring the intensity of the outliers.



  4. --------------------------- Flaws of Bollinger Bands ---------------------------

    The largest problem with Bollinger Bands is the assumption that price has a normal distribution. This is assumption is massively incorrect for many reasons that I will try to encapsulate into two points:

    Price return do not follow a normal distribution, every single symbol on every single timeframe has is own unique distribution that is specific to only itself. Therefore all the tools, shortcuts, and ideas that we use for normal distributions do not apply to price returns, and since they do not apply here they should not be used. A more general approach is needed that allows each specific symbol on every specific timeframe to be treated uniquely.

    The distributions of price returns on the positive and negative side are almost never the same. A more general approach is needed that allows positive and negative returns to be calculated separately.

    In addition to the issues of the normal distribution assumption, the standard deviation formula (as shown above in the quick stats review) is essentially just a tame measurement of outliers (a more aggressive form of outlier measurement might be taking the differences to the power of 3 rather than 2). Despite this being a bit of a philosophical question, does the measurement of outlier intensity as defined by the STDEV formula really measure what we want to know as traders when we're experiencing volatility? Or would adjustments to that formula better reflect what we *experience* as volatility when we are actively trading? This is an open ended question that I will leave here, but I wanted to pose this question because it is a key part of what how the Bollinger Bands work that we all assume as a given.

    Circling back on the normal distribution assumption, the standard deviation formula used in the calculation of the bands only encompasses the deviation of the candles that go into the moving average and have no knowledge of the historical price action. Therefore the level of the bands may not really reflect how the price action behaves over a longer period of time.



  5. ------------ Delivering Factually Accurate Data That Traders Need------------

    In light of the problems identified above, this indicator fixes all of these issue and delivers statistically correct information that discretionary and algorithmic traders can use, with truly accurate probabilities. It takes the price action of the last 2,000 candles and builds a huge dataset of distributions that you can directly select your percentiles from. It also allows you to have the positive and negative distributions calculated separately, or if you would like, you can pool all of them together in a combined distribution. In addition to this, there is a wide selection of moving averages directly available in the indicator to choose from.

    Hedge funds, quant shops, algo prop firms, and advanced mechanical groups all employ the true return distributions in their work. Now you have access to the same type of data with this indicator, wherein it's doing all the lifting for you.

  6. ------------------------------ Indicator Settings ------------------------------
    .


    ---- Moving average ----
    Select the type of moving average you would like and its length

    ---- Bands ----
    The percentiles that you enter here will be pulled directly from the return distribution of the last 2,000 candles. With the typical Bollinger Bands, traders would select 2 standard deviations and incorrectly think that the levels it highlights are the 95.5% levels. Now, if you want the true 95.5% level, you can just enter 95.5 into the percentile value here. Each of the three available bands takes the true percentile you enter here.

    ---- Separate Positive & Negative Distributions----
    If this box is checked the positive and negative distributions are treated indecently, completely separate from each other. You will see that the width of the top and bottom bands will be different for each of the percentiles you enter.
    If this box is unchecked then all the negative and positive distributions are pooled together. You will notice that the width of the top and bottom bands will be the exact same.

    ---- Distribution Size ----
    This is the number of candles that the price return is calculated over. EG: to collect the price return over the last 33 candles, the difference of price from now to 33 candles ago is calculated for the last 2,000 candles, to build a return distribution of 2000 points of price differences over 33 candles.

    NEGATIVE NUMBERS(<0) == exact number of candles to include;
    EG: setting this value to -20 will always collect volatility distributions of 20 candles

    POSITIVE NUMBERS(>0) == number of candles to include as a multiple of the Moving Average Length value set above;
    EG: if the Moving Average Length value is set to 22, setting this value to 2 will use the last 22*2 = 44 candles for the collection of volatility distributions

    MORE candles being include will generally make the bands WIDER and their size will change SLOWER over time.




I wish you focus, dedication, and earnest success on your journey.

Happy trading :)

免責事項