MBAD (Moment-Based Adaptive Detection): a method applicable to a wide range of purposes, like outlier or novelty detection, that requires building a sensible interval/set of thresholds. Unlike other methods that are static and rely on optimizations that inevitably lead to underfitting/overfitting, it dynamically adapts to your data distribution without any optimizations, MLE, or stuff, and provides a set of data-driven adaptive thresholds, based on closed-form solution with O(n) algo complexity.
1.5 years ago, when I was still living in Versailles at my friend's house not knowing what was gonna happen in my life tomorrow, I made a damn right decision not to give up on one idea and to actually R&D it and see what’s up. It allowed me to create this one.
The Method Explained I’ve been wandering about z-values, why exactly 6 sigmas, why 95%? Who decided that? Why would you supersede your opinion on data? Based on what? Your ego?
Then I consciously noticed a couple of things:
1) In control theory & anomaly detection, the popular threshold is 3 sigmas (yet nobody can firmly say why xD). If your data is Laplace, 3 sigmas is not enough; you’re gonna catch too many values, so it needs a higher sigma.
2) Yet strangely, the normal distribution has kurtosis of 3, and 6 for Laplace.
3) Kurtosis is a standardized moment, a moment scaled by stdev, so it means "X amount of something measured in stdevs."
4) You generate synthetic data, you check on real data (market data in my case, I am a quant after all), and you see on both that:
lower extension = mean - standard deviation * kurtosis ≈ data minimum upper extension = mean + standard deviation * kurtosis ≈ data maximum
Why not simply use max/min? - Lower info gain: We're not using all info available in all data points to estimate max/min; we just pick the current higher and lower values. Lol, it’s the same as dropping exponential smoothing with alpha = 0 on stationary data & calling it a day. You can’t update the estimates of min and max when new data arrives containing info about the matter. All you can do is just extend min and max horizontally, so you're not using new info arriving inside new data. - Mixing order and non-order statistics is a bad idea; we're losing integrity and coherence. That's why I don't like the Hurst exponent btw (and yes, I came up with better metrics of my own). - Max & min are not even true order statistics, unlike a percentile (finding which requires sorting, which requires multiple passes over your data). To find min or max, you just need to do one traversal over your data. Then with or without any weighting, 100th percentile will equal max. So unlike a weighted percentile, you can’t do weighted max. Then while you can always check max and min of a geometric shape, now try to calculate the 56th percentile of a pentagram hehe.
TL;DR max & min are rather topological characteristics of data, just as the difference between starting and ending points. Not much to do with statistics.
Now the second part of the ballet is to work with data asymmetry:
1) Skewness is also scaled by stdev -> so it must represent a shift from the data midrange measured in stdevs -> given asymmetric data, we can include this info in our models. Unlike kurtosis, skewness has a sign, so we add it to both thresholds:
lower extension = mean - standard deviation * kurtosis + standard deviation * skewness upper extension = mean + standard deviation * kurtosis + standard deviation * skewness
2) Now our method will work with skewed data as well, omg, ain’t it cool?
3) Hold up, but what about 5th and 6th moments (hyperskewness & hyperkurtosis)? They should represent something meaningful as well.
4) Perhaps if extensions represent current estimated extremums, what goes beyond? Limits, beyond which we expect data not to be able to pass given the current underlying process generating the data?
When you extend this logic to higher-order moments, i.e., hyperskewness & hyperkurtosis (5th and 6th moments), they measure asymmetry and shape of distribution tails, not its core as previous moments -> makes no sense to mix 4th and 3rd moments (skewness and kurtosis) with 5th & 6th, so we get:
lower limit = mean - standard deviation * hyperkurtosis + standard deviation * hyperskewness upper limit = mean + standard deviation * hyperkurtosis + standard deviation * hyperskewness
While extensions model your data’s natural extremums based on current info residing in the data without relying on order statistics, limits model your data's maximum possible and minimum possible values based on current info residing in your data. If a new data point trespasses limits, it means that a significant change in the data-generating process has happened, for sure, not probably—a confirmed structural break.
And finally we use time and volume weighting to include order & process intensity information in our model.
I can't stress it enough: despite the popularity of these non-weighted methods applied in mainstream open-access time series modeling, it doesn’t make ANY sense to use non-weighted calculations on time series data. Time = sequence, it matters. If you reverse your time series horizontally, your means, percentiles, whatever, will stay the same. Basically, your calculations will give the same results on different data. When you do it, you disregard the order of data that does have order naturally. Does it make any sense to you? It also concerns regressions applied on time series as well, because even despite the slope being opposite on your reversed data, the centroid (through which your regression line always comes through) will be the same. It also might concern Fourier (yes, you can do weighted Fourier) and even MA and AR models—might, because I ain’t researched it extensively yet.
I still can’t believe it’s nowhere online in open access. No chance I’m the first one who got it. It’s literally in front of everyone’s eyes for centuries—why no one tells about it?
How to use That’s easy: can be applied to any, even non-stationary and/or heteroscedastic time series to automatically detect novelties, outliers, anomalies, structural breaks, etc. In terms of quant trading, you can try using extensions for mean reversion trades and limits for emergency exits, for example. The market-making application is kinda obvious as well.
The only parameter the model has is length, and it should NOT be optimized but picked consciously based on the process/system you’re applying it to and based on the task. However, this part is not about sharing info & an open-access instrument with the world. This is about using dem instruments to do actual business, and we can’t talk about it.