normdist(z)
Parameters:
z (float): The z-score for which the CDF is calculated.
Returns: (float) The cumulative probability Φ(z) of the standard normal distribution at the input z-score.
Notes:
- Uses a closed-form polynomial approximation of the normal CDF, so no numerical integration is needed.
- The constant `0.2316419` matches the classical five-term approximation (Abramowitz and Stegun, formula 26.2.17), whose stated absolute error is below 7.5e-8. The rounded PDF constant adds a small further deviation.
Formula:
- `Φ(z) ≈ 1 - f(z) * P(t)` if `z > 0`, otherwise `Φ(z) ≈ f(z) * P(t)`, where:
- `f(z) = 0.3989423 * exp(-z^2 / 2)` is the PDF of the standard normal distribution (`0.3989423 ≈ 1 / sqrt(2π)`),
- `t = 1 / (1 + 0.2316419 * |z|)`,
- `P(t) = Σ c_i * t^i` for `i = 1..5`, with fixed coefficients `c_1` to `c_5`.
Implementation details:
- The approximation uses five coefficients for the polynomial part of the CDF.
- Negative values of `z` are handled through the symmetry `Φ(-z) = 1 - Φ(z)`, which is why `t` depends on `|z|`.
Constants:
- The coefficients and scaling factors are chosen to minimize approximation errors.
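The formula above can be sketched in Python as follows. This is an illustration, not the library's source: the five coefficients are the standard Abramowitz and Stegun 26.2.17 values, which is an assumption based on the constant `0.2316419`, and the name `_C` is hypothetical.

```python
import math

# Assumed coefficients c_1..c_5 (Abramowitz and Stegun 26.2.17).
_C = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def normdist(z):
    """Approximate the standard normal CDF at z."""
    t = 1.0 / (1.0 + 0.2316419 * abs(z))
    # P(t) = c_1*t + c_2*t^2 + ... + c_5*t^5
    poly = sum(c * t ** (i + 1) for i, c in enumerate(_C))
    # f(z) * P(t) is the probability in the tail beyond |z|.
    tail = 0.3989423 * math.exp(-z * z / 2.0) * poly
    return 1.0 - tail if z > 0 else tail


if __name__ == "__main__":
    for z in (-2.0, 0.0, 1.96):
        print(z, normdist(z))
```

Comparing the output against `0.5 * (1 + math.erf(z / math.sqrt(2)))` is a quick way to check the size of the approximation error for a given `z`.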
gamma(x)
Parameters:
x (float): The input value for which the Gamma function is calculated. Must be greater than 0. For `x <= 0` the function returns `na`: the Gamma function has poles at 0 and at the negative integers, and this implementation does not cover negative arguments.
Returns: (float) Approximation of the Gamma function for the input `x`.
Notes:
- The Lanczos approximation provides a numerically stable and efficient method to compute the Gamma function.
- Uses precomputed Lanczos coefficients for accuracy.
- Includes handling for small numerical inaccuracies.
Formula:
- With the shifted argument `z = x - 1`, the Gamma function is approximated as `Γ(x) = Γ(z + 1) ≈ sqrt(2π) * t^(z + 0.5) * e^(-t) * A(z)`, where `t = z + g + 0.5`, `A(z) = p[0] + Σ p[k] / (z + k)` for `k >= 1`, and `p` is the array of Lanczos coefficients.
Implementation details:
- Lanczos coefficients (`p`) are precomputed and stored in an array.
- The summation iterates over these coefficients to compute the final result.
- The constant `g` is tied to the coefficient set and controls the precision of the approximation (commonly `g = 7`).
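A minimal Python sketch of the Lanczos approximation is shown below. It assumes `g = 7` with the widely used nine-coefficient set; the library's actual coefficients are not given in this reference, so the values and the name `lanczos_gamma` are illustrative. Python's `math.nan` stands in for `na`.

```python
import math

_G = 7  # assumed value of g
# Widely used Lanczos coefficients for g = 7 (assumption, not from the library).
_P = (
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
)


def lanczos_gamma(x):
    """Approximate Gamma(x) for x > 0; return NaN otherwise."""
    if x <= 0:
        return math.nan
    z = x - 1.0  # the series approximates Gamma(z + 1)
    a = _P[0]
    for k in range(1, len(_P)):
        a += _P[k] / (z + k)
    t = z + _G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * math.exp(-t) * a


if __name__ == "__main__":
    for x in (0.5, 1.0, 5.0):
        print(x, lanczos_gamma(x), math.gamma(x))
```

The `__main__` block prints the sketch next to `math.gamma` so the two can be compared directly.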
t_cdf(t, df)
Parameters:
t (float): The t-statistic for which the CDF value is calculated.
df (int): Degrees of freedom of the t-distribution.
Returns: (float) Approximate CDF value for the given t-statistic.
Notes:
- The result is one-tailed: it describes the probability on one side of `t` only.
- Relies on an approximation formula using gamma functions and standard t-distribution properties.
- May not be as accurate as specialized statistical libraries for extreme values or very high degrees of freedom. For reference, the exact CDF for `t >= 0` is `1 - I_x(df / 2, 1 / 2) / 2`, where `I_x` is the regularized incomplete beta function, so results should be checked against a reference implementation where accuracy matters.
Formula:
- Let `x = df / (t^2 + df)`.
- The approximation formula is derived using: `CDF(t, df) ≈ 1 - [sqrt(df * π) * Γ(df / 2) / Γ((df + 1) / 2)] * x^((df + 1) / 2) / 2`, where Γ represents the gamma function.
Implementation details:
- Computes the gamma ratio for normalization.
- Applies the t-distribution formula for one-tailed probabilities.
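For checking the approximation, the t-distribution CDF can also be obtained by integrating its density numerically. The Python sketch below is a swapped-in reference technique (composite Simpson's rule), not the library's formula; the names `t_pdf`, `t_cdf_reference` and the `steps` parameter are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(t * t / df))


def t_cdf_reference(t, df, steps=2000):
    """CDF via Simpson's rule on [0, |t|]; steps must be even."""
    if t == 0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0  # probability mass between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


if __name__ == "__main__":
    print(t_cdf_reference(1.0, 1))
    print(t_cdf_reference(2.015, 5))
```

With `df = 1` the distribution is the Cauchy distribution, whose CDF `0.5 + atan(t) / π` offers an independent check of the integration.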
tStatForPValue(p, df)
Parameters:
p (float): P-value for which the t-statistic is calculated. Must be in the open interval (0, 1).
df (int): Degrees of freedom of the t-distribution.
Returns: (float) The t-statistic corresponding to the given p-value.
Notes:
- If `p` is outside the interval (0, 1), the function returns `na` to signal an error.
- The function uses binary search with a fixed number of iterations and a defined tolerance.
- The tolerance (default: 0.0001) applies to the cumulative probability, so the search stops once the probability at the candidate t-statistic is within that distance of `p`.
- Relies on the cumulative distribution function (CDF) `t_cdf` for the t-distribution.
Formula:
- Uses the cumulative distribution function (CDF) of the t-distribution to iteratively find the t-statistic.
Implementation details:
- `low` and `high` define the search interval for the t-statistic.
- The midpoint (`mid`) is iteratively refined until the difference between the cumulative probability and the target p-value is smaller than the tolerance.
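The binary search described above can be sketched in Python as follows. Several details are assumptions made for illustration: the search bounds of ±100, the cap of 100 iterations, the convention that the search solves `CDF(t) = p`, and the helper `_t_cdf`, which integrates the density numerically instead of calling the library's `t_cdf`.

```python
import math


def _t_cdf(t, df, steps=2000):
    """Hypothetical stand-in for t_cdf: Simpson integration of the t density."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2.0 * math.log1p(u * u / df))

    if t == 0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area


def t_stat_for_p_value(p, df, tol=1e-4, max_iter=100):
    """Find t with CDF(t) close to p by bisection; NaN if p is not in (0, 1)."""
    if not 0.0 < p < 1.0:
        return math.nan
    low, high = -100.0, 100.0  # assumed search interval
    mid = 0.0
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        prob = _t_cdf(mid, df)
        if abs(prob - p) < tol:
            break
        if prob < p:
            low = mid   # CDF is increasing, so the root lies to the right
        else:
            high = mid
    return mid


if __name__ == "__main__":
    print(t_stat_for_p_value(0.95, 5))
```

Because the CDF is monotonically increasing, each iteration halves the interval that contains the solution, which is why a fixed iteration cap is sufficient.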
jarqueBera(n, s, k)
Parameters:
n (series float): Number of observations in the dataset.
s (series float): Skewness of the dataset.
k (series float): Kurtosis of the dataset (raw kurtosis, which equals 3 for a normal distribution).
Returns: (float) The Jarque-Bera test statistic.
Formula:
JB = n * [(S^2 / 6) + ((K - 3)^2 / 24)]
Notes:
- A higher JB value suggests that the data deviates more from a normal distribution.
- Under the null hypothesis of normality, the JB statistic is asymptotically chi-squared distributed with 2 degrees of freedom.
- Use this value to calculate a p-value to determine the significance of the result.
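As a worked illustration of the formula and the p-value step, here is a short Python sketch. The function names are hypothetical. The p-value uses the fact that the upper tail of a chi-squared distribution with 2 degrees of freedom is `exp(-JB / 2)`, which holds only asymptotically for this test.

```python
import math


def jarque_bera(n, s, k):
    """Jarque-Bera statistic from sample size, skewness and raw kurtosis."""
    return n * (s * s / 6.0 + (k - 3.0) ** 2 / 24.0)


def jarque_bera_p_value(jb):
    """Asymptotic p-value: upper tail of chi-squared with 2 degrees of freedom."""
    return math.exp(-jb / 2.0)


if __name__ == "__main__":
    jb = jarque_bera(100, 0.5, 4.0)
    print(jb, jarque_bera_p_value(jb))
```

For `n = 100`, `S = 0.5`, `K = 4` the statistic is `100 * (0.25 / 6 + 1 / 24) = 25 / 3`, roughly 8.33.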
skewness(data)
Parameters:
data (series float): Input data series.
Returns: (float) The skewness value.
Notes:
- Handles missing values (`na`) by ignoring invalid points.
- Includes error handling for zero variance to avoid division-by-zero scenarios.
- Skewness is calculated as the third central moment of the data divided by the cube of the standard deviation.
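The calculation can be sketched in Python as below. It assumes population moments (division by `n`) and returns NaN for empty or zero-variance input; whether the library uses the same convention is not stated in this reference.

```python
import math


def skewness(data):
    """Third central moment divided by the cube of the standard deviation."""
    vals = [v for v in data if v is not None and not math.isnan(v)]
    n = len(vals)
    if n == 0:
        return math.nan
    mean = sum(vals) / n
    m2 = sum((v - mean) ** 2 for v in vals) / n  # population variance
    if m2 == 0:
        return math.nan  # constant series: skewness is undefined
    m3 = sum((v - mean) ** 3 for v in vals) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(skewness([0.0, 0.0, 0.0, 4.0]))
```

A symmetric series such as `[1, 2, 3, 4, 5]` gives a skewness of zero, while a single large value on the right gives a positive result.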
kurtosis(data)
Parameters:
data (series float): Input data series.
Returns: (float) The kurtosis value.
Notes:
- Handles missing values (`na`) by ignoring invalid points.
- Includes error handling for zero variance to avoid division-by-zero scenarios.
- Kurtosis is calculated as the fourth central moment of the data divided by the squared variance.
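A matching Python sketch for kurtosis follows. It assumes population moments and returns raw kurtosis (3 for a normal distribution), which is the convention that the `K - 3` term in the Jarque-Bera formula implies; the exact convention used by the library is an assumption.

```python
import math


def kurtosis(data):
    """Fourth central moment divided by the squared variance (raw kurtosis)."""
    vals = [v for v in data if v is not None and not math.isnan(v)]
    n = len(vals)
    if n == 0:
        return math.nan
    mean = sum(vals) / n
    m2 = sum((v - mean) ** 2 for v in vals) / n  # population variance
    if m2 == 0:
        return math.nan  # constant series: kurtosis is undefined
    m4 = sum((v - mean) ** 4 for v in vals) / n
    return m4 / (m2 * m2)


if __name__ == "__main__":
    print(kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For `[1, 2, 3, 4, 5]` the variance is 2 and the fourth central moment is 6.8, so the raw kurtosis is `6.8 / 4 = 1.7`.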
regression(y, x, lag)
Parameters:
y (series float): Dependent series (observed values).
x (series float): Independent series (explanatory variable).
lag (int): Number of lags applied to the independent series (x).
Returns: (tuple) A tuple containing the following values:
- n: Number of valid observations.
- alpha: Intercept of the regression line.
- beta: Slope of the regression line.
- t_stat: T-statistic for the beta coefficient.
- p_value: Two-tailed p-value for the beta coefficient.
- r_squared: Coefficient of determination (R²) indicating goodness of fit.
- skew: Skewness of the residuals.
- kurt: Kurtosis of the residuals.
Notes:
- Handles missing data (`na`) by ignoring invalid points.
- Includes basic error handling for zero variance and division-by-zero scenarios.
- Computes residual-based statistics (skewness and kurtosis) for model diagnostics.
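The core of a lagged simple regression can be sketched in Python as follows. This is a reduced illustration, not the library's implementation: it returns only `n`, `alpha`, `beta`, `t_stat` and `r_squared`, assumes `x` and `y` have equal length, assumes that a lag pairs `y[i]` with `x[i - lag]`, and uses NaN where the library would return `na`.

```python
import math


def regression(y, x, lag=0):
    """Ordinary least squares of y on lagged x; returns (n, alpha, beta, t_stat, r_squared)."""
    nan = math.nan
    # Pair each y[i] with x[i - lag] and drop pairs with missing values.
    pairs = [(x[i - lag], y[i]) for i in range(lag, len(y))
             if not (math.isnan(x[i - lag]) or math.isnan(y[i]))]
    n = len(pairs)
    if n < 3:
        return n, nan, nan, nan, nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    if sxx == 0:
        return n, nan, nan, nan, nan  # x has zero variance
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in pairs)
    sst = sum((b - mean_y) ** 2 for _, b in pairs)
    r_squared = 1.0 - sse / sst if sst > 0 else nan
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of beta
    t_stat = beta / std_err if std_err > 0 else nan
    return n, alpha, beta, t_stat, r_squared


if __name__ == "__main__":
    print(regression([1.0, 3.0, 2.0, 5.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0]))
```

The t-statistic has `n - 2` degrees of freedom, so a two-tailed p-value would follow from the t-distribution CDF as `2 * (1 - CDF(|t_stat|, n - 2))`.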