Geometric distribution

Geometric
	Probability mass function
	Cumulative distribution function
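Parameters	$0<p\leq 1$ success probability (real)
Support	$k$ trials where $k\in \mathbb {N} =\{1,2,3,\dotsc \}$
PMF	$(1-p)^{k-1}p$
CDF	$1-(1-p)^{\lfloor x\rfloor }$ for $x\geq 1$; $0$ for $x<1$
Mean	${\frac {1}{p}}$
Median	$\left\lceil {\frac {-1}{\log _{2}(1-p)}}\right\rceil$ (not unique if $-1/\log _{2}(1-p)$ is an integer)
Mode	$1$
Variance	${\frac {1-p}{p^{2}}}$
Skewness	${\frac {2-p}{\sqrt {1-p}}}$
Excess kurtosis	$6+{\frac {p^{2}}{1-p}}$
Entropy	${\frac {-(1-p)\log(1-p)-p\log p}{p}}$
MGF	${\frac {pe^{t}}{1-(1-p)e^{t}}}$ for $t<-\ln(1-p)$
CF	${\frac {pe^{it}}{1-(1-p)e^{it}}}$
PGF	${\frac {pz}{1-(1-p)z}}$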

In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:

The probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on $\mathbb {N} =\{1,2,3,\ldots \}$ ;
The probability distribution of the number $Y=X-1$ of failures before the first success, supported on $\mathbb {N} _{0}=\{0,1,2,\ldots \}$ .

Which of these is called the geometric distribution is a matter of convention and convenience.

These two geometric distributions should not be confused with each other. The name shifted geometric distribution is often used for the former (the distribution of $X$); however, to avoid ambiguity, it is advisable to state which one is intended by giving the support explicitly.

The geometric distribution gives the probability that the first success requires $k$ independent trials, each with success probability $p$. The probability that the $k$-th trial is the first success is

\Pr(X=k)=(1-p)^{k-1}p

for $k=1,2,3,4,\dots$

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

\Pr(Y=k)=\Pr(X=k+1)=(1-p)^{k}p

for $k=0,1,2,3,\dots$

In either case, the sequence of probabilities is a geometric sequence.
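The two conventions can be compared numerically. The sketch below uses hypothetical helper names `pmf_trials` and `pmf_failures` (not taken from any library) for the two probability mass functions, and checks that $\Pr(Y=k)=\Pr(X=k+1)$, that consecutive probabilities have the constant ratio $1-p$, and that the probabilities sum to 1 up to truncation.

```python
def pmf_trials(k: int, p: float) -> float:
    """Pr(X = k): first success on trial k, support {1, 2, 3, ...}."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def pmf_failures(k: int, p: float) -> float:
    """Pr(Y = k): k failures before the first success, support {0, 1, 2, ...}."""
    return (1 - p) ** k * p if k >= 0 else 0.0


p = 0.25
# Pr(Y = k) equals Pr(X = k + 1) term by term.
shift_ok = all(pmf_failures(k, p) == pmf_trials(k + 1, p) for k in range(50))
# Consecutive probabilities differ by the constant ratio 1 - p (a geometric sequence).
ratios = [pmf_trials(k + 1, p) / pmf_trials(k, p) for k in range(1, 10)]
# Truncated total probability; the omitted tail has mass (1 - p) ** 200.
total = sum(pmf_trials(k, p) for k in range(1, 201))
print(shift_ok, ratios[:3], total)
```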

Definition

The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function is $P(X=k)=(1-p)^{k-1}p$ where $k=1,2,3,\dotsc$ is the number of trials and $p$ is the probability of success in each trial.^[1]^{: 260–261}

Alternatively, some texts define the distribution where $k=0,1,2,\dotsc$ and call the former the zero-truncated geometric distribution. This alters the probability mass function into:^[2]^: 66 $P(Y=k)=(1-p)^{k}p$.

An example of a geometric distribution arises from rolling a six-sided die until a "1" appears. Each roll is independent, with a $1/6$ chance of success. The number of rolls needed follows a geometric distribution with $p=1/6$.

Properties

Memorylessness

The geometric distribution is the only memoryless discrete probability distribution.^[3] It is the discrete analogue of the memorylessness of the exponential distribution.^[4] The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success. Expressed in terms of conditional probability, $\Pr(X>m+n\mid X>n)=\Pr(X>m)$, where $m$ and $n$ are natural numbers. For $Y$, supported on $\mathbb {N} _{0}$, the same equality holds with ≥ in place of >: $\Pr(Y\geq m+n\mid Y\geq n)=\Pr(Y\geq m)$.^[2]^: 71
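The conditional-probability statement can be checked exactly with rational arithmetic. This sketch relies only on the survival function $\Pr(X>k)=(1-p)^{k}$ for $X$ supported on $\mathbb {N}$; the helper names are illustrative.

```python
from fractions import Fraction


def survival(k: int, p: Fraction) -> Fraction:
    """Pr(X > k) for X on {1, 2, 3, ...}: the first k trials all fail."""
    return (1 - p) ** k


def memoryless_holds(p: Fraction, max_k: int = 12) -> bool:
    """Check Pr(X > m + n | X > n) == Pr(X > m) exactly for 0 <= m, n < max_k."""
    return all(
        # {X > m + n} is contained in {X > n}, so the conditional probability
        # is a ratio of survival probabilities.
        survival(m + n, p) / survival(n, p) == survival(m, p)
        for m in range(max_k)
        for n in range(max_k)
    )


print(memoryless_holds(Fraction(1, 6)))  # True
```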

Moments and cumulants

The expected value and variance of a geometrically distributed random variable $X$ defined over $\mathbb {N}$ are^[1]^: 261 $\operatorname {E} (X)={\frac {1}{p}},\qquad \operatorname {var} (X)={\frac {1-p}{p^{2}}}.$ For a geometrically distributed random variable $Y$ defined over $\mathbb {N} _{0}$, the expected value becomes $\operatorname {E} (Y)={\frac {1-p}{p}},$ while the variance stays the same.^[5]^{: 114–115}

For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is ${\frac {1}{1/6}}=6$ and the average number of failures is ${\frac {1-1/6}{1/6}}=5$ .
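As a quick numerical check of these formulas, the hypothetical helper below sums the (truncated) probability mass function directly. For the die example it should give a mean of 6 for $X$, a mean of 5 for $Y$, and a variance of $(1-p)/p^{2}=30$ for both.

```python
def mean_and_variance(p: float, start: int, terms: int = 2000) -> tuple[float, float]:
    """Mean and variance by summing the PMF over start, start + 1, ... (truncated).

    start=1 gives X (trials until the first success); start=0 gives Y (failures).
    """
    ks = range(start, start + terms)
    probs = [(1 - p) ** (k - start) * p for k in ks]
    mean = sum(k * w for k, w in zip(ks, probs))
    second = sum(k * k * w for k, w in zip(ks, probs))
    return mean, second - mean ** 2


p = 1 / 6  # rolling a fair die until a "1" appears
print(mean_and_variance(p, start=1))  # close to (6, 30)
print(mean_and_variance(p, start=0))  # close to (5, 30)
```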

The moments for the number of failures before the first success are given by

{\begin{aligned}\mathrm {E} (Y^{n})&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k^{n}\\&{}=p\operatorname {Li} _{-n}(1-p)&({\text{for }}n\neq 0)\end{aligned}}

where $\operatorname {Li} _{-n}(1-p)$ is the polylogarithm function.
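The polylogarithm identity can be spot-checked for small $n$. The sketch below uses the standard closed forms $\operatorname {Li} _{-1}(z)=z/(1-z)^{2}$ and $\operatorname {Li} _{-2}(z)=z(1+z)/(1-z)^{3}$ instead of a general polylogarithm routine, and compares them with a truncated series for $\mathrm {E} (Y^{n})$. The restriction $n\neq 0$ matters because the polylogarithm series starts at $k=1$ and so misses the $k=0$ term, which contributes $p$ when $n=0$.

```python
def raw_moment_failures(n: int, p: float, terms: int = 5000) -> float:
    """E(Y^n) = sum over k >= 0 of (1 - p)^k * p * k^n, truncated after `terms` terms."""
    q = 1 - p
    return sum(q ** k * p * k ** n for k in range(terms))


def polylog_neg(n: int, z: float) -> float:
    """Closed forms of Li_{-n}(z) for n = 1, 2 only (valid for |z| < 1)."""
    if n == 1:
        return z / (1 - z) ** 2
    if n == 2:
        return z * (1 + z) / (1 - z) ** 3
    raise ValueError("this sketch only implements n = 1 and n = 2")


p = 0.2
for n in (1, 2):
    print(n, raw_moment_failures(n, p), p * polylog_neg(n, 1 - p))
```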

The cumulants $\kappa _{n}$ of the probability distribution of Y satisfy the recursion

\kappa _{n+1}=\mu (\mu +1){\frac {d\kappa _{n}}{d\mu }}.

where $\mu ={\frac {1-p}{p}}$ , the expected value of a geometrically distributed random variable defined over $\mathbb {N} _{0}$ .
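Because each $\kappa _{n}$ is a polynomial in $\mu$, the recursion can be run mechanically. This is a minimal sketch in which polynomials are stored as plain coefficient lists (lowest degree first), a representation chosen here for illustration. It checks the results against central moments computed by direct summation, using $\kappa _{2}=\operatorname {var} (Y)$, $\kappa _{3}=\mathrm {E} [(Y-\mu )^{3}]$ and $\kappa _{4}=\mathrm {E} [(Y-\mu )^{4}]-3\kappa _{2}^{2}$.

```python
def derivative(c: list[int]) -> list[int]:
    """Derivative of a polynomial given as coefficients, lowest degree first."""
    return [i * c[i] for i in range(1, len(c))]


def multiply(a: list[int], b: list[int]) -> list[int]:
    """Product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cumulant_polys(count: int) -> list[list[int]]:
    """kappa_1..kappa_count as polynomials in mu, via kappa_{n+1} = mu(mu+1) d(kappa_n)/d(mu)."""
    polys = [[0, 1]]  # kappa_1 = mu
    for _ in range(count - 1):
        # [0, 1, 1] represents mu + mu^2 = mu(mu + 1)
        polys.append(multiply([0, 1, 1], derivative(polys[-1])))
    return polys


def evaluate(c: list[int], mu: float) -> float:
    return sum(coef * mu ** i for i, coef in enumerate(c))


def central_moment_failures(r: int, p: float, terms: int = 5000) -> float:
    """E[(Y - mu)^r] by truncated summation over the support {0, 1, 2, ...}."""
    q = 1 - p
    mu = q / p
    return sum(q ** k * p * (k - mu) ** r for k in range(terms))


print(cumulant_polys(4))  # [[0, 1], [0, 1, 1], [0, 1, 3, 2], [0, 1, 7, 12, 6]]
```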

Proof of expected value

Consider the expected value $\mathrm {E} (X)$ of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability $p$, or we fail with probability $1-p$. If we fail, the remaining mean number of trials until a success is identical to the original mean. This follows from the fact that all trials are independent. From this we get the formula:

$\mathrm {E} (X)=p\cdot 1+(1-p)\cdot (1+\mathrm {E} (X)),$

which simplifies to $\mathrm {E} (X)=1+(1-p)\,\mathrm {E} (X)$, so that $p\,\mathrm {E} (X)=1$. Solving for $\mathrm {E} (X)$ gives:

$\mathrm {E} (X)={\frac {1}{p}}.$

The expected value of $Y$ can be found from the linearity of expectation, $\mathrm {E} (Y)=\mathrm {E} (X-1)=\mathrm {E} (X)-1={\frac {1}{p}}-1={\frac {1-p}{p}}$ . It can also be shown in the following way:

${\begin{aligned}\mathrm {E} (Y)&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k\\&{}=p\sum _{k=0}^{\infty }(1-p)^{k}k\\&{}=p(1-p)\sum _{k=0}^{\infty }(1-p)^{k-1}\cdot k\\&{}=p(1-p)\left[{\frac {d}{dp}}\left(-\sum _{k=0}^{\infty }(1-p)^{k}\right)\right]\\&{}=p(1-p){\frac {d}{dp}}\left(-{\frac {1}{p}}\right)={\frac {1-p}{p}}.\end{aligned}}$

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.

Summary statistics

The mean of the geometric distribution is its expected value which is, as previously discussed in § Moments and cumulants, ${\frac {1}{p}}$ or ${\frac {1-p}{p}}$ when defined over $\mathbb {N}$ or $\mathbb {N} _{0}$ respectively.

The median of the geometric distribution is $\left\lfloor -{\frac {\log 2}{\log(1-p)}}\right\rfloor$ when defined over $\mathbb {N} _{0}$ .^[2]^: 69
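The formula can be compared with the smallest $m$ whose cumulative probability $1-(1-p)^{m+1}$ reaches $1/2$. In this sketch the helper names are hypothetical, and values of $p$ for which $-\log 2/\log(1-p)$ is an integer (such as $p=1/2$) are avoided, since the median is then not unique and the two rules may pick different medians. The median over $\mathbb {N}$ is one more than the value returned here.

```python
import math


def median_failures(p: float) -> int:
    """floor(-log 2 / log(1 - p)): a median of Y on {0, 1, 2, ...}."""
    return math.floor(-math.log(2) / math.log(1 - p))


def median_by_cdf(p: float) -> int:
    """Smallest m with Pr(Y <= m) = 1 - (1 - p)^(m + 1) >= 1/2."""
    q = 1 - p
    m = 0
    while 1 - q ** (m + 1) < 0.5:
        m += 1
    return m


for p in (0.05, 0.1, 0.3, 0.7):
    print(p, median_failures(p), median_by_cdf(p))
```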

The mode of the geometric distribution is the first value in the support set. This is 1 when defined over $\mathbb {N}$ and 0 when defined over $\mathbb {N} _{0}$ .^[2]^: 69

The skewness of the geometric distribution is ${\frac {2-p}{\sqrt {1-p}}}$ .^[5]^: 115

The kurtosis of the geometric distribution is $9+{\frac {p^{2}}{1-p}}$.^[5]^: 115 The excess kurtosis of a distribution is the difference between its kurtosis and the kurtosis of a normal distribution, $3$.^[6]^: 217 Therefore, the excess kurtosis of the geometric distribution is $6+{\frac {p^{2}}{1-p}}$. Since ${\frac {p^{2}}{1-p}}\geq 0$, the excess kurtosis is always positive, so the distribution is leptokurtic.^[2]^: 69 In other words, the tail of a geometric distribution decays more slowly than that of a Gaussian.^[6]^: 217
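These closed forms can be checked against moments computed by direct summation. A minimal sketch, working with the $\mathbb {N} _{0}$ form (shifting the support by 1 does not change skewness or kurtosis); the function names are illustrative.

```python
import math


def skew_and_excess_kurtosis(p: float, terms: int = 5000) -> tuple[float, float]:
    """Standardized third and fourth moments of Y by truncated summation."""
    q = 1 - p
    probs = [q ** k * p for k in range(terms)]
    mean = sum(k * w for k, w in enumerate(probs))
    var, m3, m4 = [sum((k - mean) ** r * w for k, w in enumerate(probs)) for r in (2, 3, 4)]
    return m3 / var ** 1.5, m4 / var ** 2 - 3


def closed_forms(p: float) -> tuple[float, float]:
    """Skewness (2 - p)/sqrt(1 - p) and excess kurtosis 6 + p^2/(1 - p)."""
    return (2 - p) / math.sqrt(1 - p), 6 + p ** 2 / (1 - p)


for p in (0.2, 0.5, 0.9):
    print(p, skew_and_excess_kurtosis(p), closed_forms(p))
```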

General properties

The probability generating functions of geometric random variables $X$ and $Y$ defined over $\mathbb {N}$ and $\mathbb {N} _{0}$ are, respectively,^[5]^{: 114–115}

{\begin{aligned}G_{X}(s)&={\frac {s\,p}{1-s\,(1-p)}},\\[10pt]G_{Y}(s)&={\frac {p}{1-s\,(1-p)}},\quad |s|<(1-p)^{-1}.\end{aligned}}
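As a sketch, the generating functions can be compared with truncated power series. The code below uses the relation $X=Y+1$, so that $G_{X}(s)=s\,G_{Y}(s)$, and evaluates points inside the radius of convergence $|s|<(1-p)^{-1}$; the function names are illustrative.

```python
def pgf_by_sum(s: float, p: float, shifted: bool, terms: int = 2000) -> float:
    """Truncated E(s^Y); multiplied by s for X = Y + 1 when shifted=True."""
    q = 1 - p
    g_y = sum(p * (q * s) ** j for j in range(terms))
    return s * g_y if shifted else g_y


def pgf_closed(s: float, p: float, shifted: bool) -> float:
    """p / (1 - s(1 - p)) for Y, times s for X."""
    g_y = p / (1 - s * (1 - p))
    return s * g_y if shifted else g_y


p = 0.3  # radius of convergence is 1 / (1 - p), about 1.43
for s in (-0.9, 0.5, 1.2):
    print(s, pgf_by_sum(s, p, True), pgf_closed(s, p, True))
```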

Among all discrete probability distributions supported on $\mathbb {N}$ with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.^{[citation needed]}
The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any positive integer n, there exist independent identically distributed random variables Y₁, ..., Y_n whose sum has the same distribution that Y has. These will not be geometrically distributed unless n = 1; they follow a negative binomial distribution.
The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables.^{[citation needed]} For example, the hundreds digit D has this probability distribution:

\Pr(D=d)={q^{100d} \over 1+q^{100}+q^{200}+\cdots +q^{900}},

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.
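The digit formula, and the independence of the digits, can be checked numerically. This sketch sums the probability mass function of $Y$ directly; the truncation points and helper names are illustrative choices, and the truncation error is negligible for the parameters used.

```python
def hundreds_digit_direct(p: float, max_y: int = 40_000) -> list[float]:
    """Pr(D = d) for the hundreds digit D of Y, by summing the PMF of Y up to max_y."""
    q = 1 - p
    dist = [0.0] * 10
    for y in range(max_y):
        dist[(y // 100) % 10] += q ** y * p
    return dist


def hundreds_digit_formula(p: float) -> list[float]:
    """q^(100 d) / (1 + q^100 + ... + q^900) for d = 0, ..., 9."""
    q = 1 - p
    weights = [q ** (100 * d) for d in range(10)]
    total = sum(weights)
    return [w / total for w in weights]


def max_independence_gap(p: float, max_y: int = 2_000) -> float:
    """Largest |Pr(T=t, U=u) - Pr(T=t) Pr(U=u)| for the tens digit T and units digit U of Y."""
    q = 1 - p
    joint = [[0.0] * 10 for _ in range(10)]
    for y in range(max_y):
        joint[(y // 10) % 10][y % 10] += q ** y * p
    tens = [sum(row) for row in joint]
    units = [sum(joint[t][u] for t in range(10)) for u in range(10)]
    return max(abs(joint[t][u] - tens[t] * units[u]) for t in range(10) for u in range(10))


p = 0.005
print([round(a - b, 12) for a, b in zip(hundreds_digit_direct(p), hundreds_digit_formula(p))])
print(max_independence_gap(0.05))
```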

Golomb coding, with its parameter chosen according to $p$, is an optimal prefix code for the geometric distribution.^[7]
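A minimal Golomb encoder for a nonnegative integer $n$ (such as an outcome of $Y$) writes the quotient $\lfloor n/m\rfloor$ in unary and the remainder in truncated binary. The parameter rule $m=\lceil -\log(2-p)/\log(1-p)\rceil$ used below is the choice commonly attributed to Gallager and van Voorhis; treat it, and the bit conventions (ones terminated by a zero for the unary part), as assumptions of this sketch rather than a reference implementation.

```python
import math


def golomb_parameter(p: float) -> int:
    """m = ceil(-log(2 - p) / log(1 - p)) for 0 < p < 1 (assumed optimality rule)."""
    return max(1, math.ceil(-math.log(2 - p) / math.log(1 - p)))


def golomb_encode(n: int, m: int) -> str:
    """Golomb codeword for an integer n >= 0 with parameter m >= 1, as a bit string."""
    quotient, r = divmod(n, m)
    prefix = "1" * quotient + "0"  # unary part: `quotient` ones, then a terminating zero
    b = (m - 1).bit_length()       # b = ceil(log2(m))
    if b == 0:                     # m == 1: the remainder is always 0, so no bits are needed
        return prefix
    cutoff = (1 << b) - m          # the first `cutoff` remainders use b - 1 bits
    if r < cutoff:
        return prefix + format(r, f"0{b - 1}b")
    return prefix + format(r + cutoff, f"0{b}b")


m = golomb_parameter(0.1)
print(m, [golomb_encode(n, m) for n in range(4)])
```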

Related distributions

The sum of $r$ independent geometric random variables with parameter $p$ is a negative binomial random variable with parameters $r$ and $p$ .^[8] The geometric distribution is a special case of the negative binomial distribution, with $r=1$ .
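The negative binomial claim can be checked by convolving probability mass functions. In this sketch the summands are supported on $\mathbb {N}$, so the sum counts trials, and it is compared with the negative binomial probability ${\binom {k-1}{r-1}}p^{r}(1-p)^{k-r}$ that the $r$-th success occurs on trial $k$. The helper names are hypothetical.

```python
from math import comb


def pmf_trials(k: int, p: float) -> float:
    """Geometric PMF on {1, 2, 3, ...}."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def convolve(a: list[float], b: list[float]) -> list[float]:
    """PMF of the sum of two independent variables on {0, 1, ...}."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sum_of_geometrics(r: int, p: float, max_k: int) -> list[float]:
    """PMF of X_1 + ... + X_r on 0..max_k; truncating above max_k cannot affect these entries."""
    single = [pmf_trials(k, p) for k in range(max_k + 1)]
    dist = [1.0]  # the empty sum is 0 with probability 1
    for _ in range(r):
        dist = convolve(dist, single)[: max_k + 1]
    return dist


def negative_binomial_trials(k: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on trial k."""
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r) if k >= r else 0.0


conv = sum_of_geometrics(3, 0.3, 40)
print(max(abs(conv[k] - negative_binomial_trials(k, 3, 0.3)) for k in range(41)))
```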

The geometric distribution is a special case of discrete compound Poisson distribution.
The minimum of $n$ independent geometric random variables with parameters $p_{1},\dotsc ,p_{n}$ is also geometrically distributed with parameter $1-\prod _{i=1}^{n}(1-p_{i})$.^[9]
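For variables supported on $\mathbb {N}$, the minimum exceeds $k$ exactly when every variable does, so its survival function is $\prod _{i}(1-p_{i})^{k}$. The sketch below checks this exactly with rational arithmetic for one illustrative choice of parameters.

```python
from fractions import Fraction
from math import prod


def min_survival(ps: list[Fraction], k: int) -> Fraction:
    """Pr(min X_i > k) for independent X_i on {1, 2, ...}: every X_i must exceed k."""
    return prod((1 - p) ** k for p in ps)


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)]
p_min = 1 - prod(1 - p for p in ps)  # parameter claimed for the minimum
print(p_min)  # 7/10
print(all(min_survival(ps, k) == (1 - p_min) ** k for k in range(30)))  # True
```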

Suppose $0<r<1$, and for $k=1,2,3,\dots$ the random variable $X_{k}$ has a Poisson distribution with expected value $r^{k}/k$. Then

\sum _{k=1}^{\infty }k\,X_{k}

has a geometric distribution taking values in $\mathbb {N} _{0}$, with expected value $r/(1-r)$.^{[citation needed]}
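One way to see this is the following sketch using probability generating functions. A Poisson variable with mean $\lambda$ has generating function $e^{\lambda (s-1)}$, so by independence, for $|s|\leq 1$,

\operatorname {E} \left(s^{\sum _{k}kX_{k}}\right)=\prod _{k=1}^{\infty }\exp \left({\frac {r^{k}}{k}}(s^{k}-1)\right)=\exp \left(\sum _{k=1}^{\infty }{\frac {(rs)^{k}}{k}}-\sum _{k=1}^{\infty }{\frac {r^{k}}{k}}\right)={\frac {1-r}{1-rs}},

using $\sum _{k\geq 1}x^{k}/k=-\log(1-x)$ for $|x|<1$. This is the generating function $G_{Y}(s)$ of a geometric distribution on $\mathbb {N} _{0}$ with $p=1-r$, whose mean is $(1-p)/p=r/(1-r)$.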

The exponential distribution is the continuous analogue of the geometric distribution. Applying the floor function to the exponential distribution with parameter $\lambda$ creates a geometric distribution with parameter $p=1-e^{-\lambda }$ defined over $\mathbb {N} _{0}$ .^[2]^: 74 This can be used to generate geometrically distributed random numbers as detailed in § Random variate generation.
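A minimal simulation sketch of the floor construction, using Python's `random.Random.expovariate` for the exponential draws; the seed, sample size and tolerance are arbitrary choices. The exact probabilities follow from $\Pr(\lfloor E\rfloor =k)=e^{-\lambda k}(1-e^{-\lambda })$, which is $(1-p)^{k}p$ with $p=1-e^{-\lambda }$.

```python
import math
import random


def floor_exponential(lam: float, rng: random.Random) -> int:
    """floor(E) for E exponential with rate lam."""
    return math.floor(rng.expovariate(lam))


def empirical_vs_pmf(lam: float, n: int = 200_000, seed: int = 1) -> list[tuple[int, float, float]]:
    """(k, empirical frequency of floor(E) = k, (1 - p)^k p with p = 1 - exp(-lam)) for k < 6."""
    rng = random.Random(seed)
    p = 1 - math.exp(-lam)
    counts: dict[int, int] = {}
    for _ in range(n):
        k = floor_exponential(lam, rng)
        counts[k] = counts.get(k, 0) + 1
    return [(k, counts.get(k, 0) / n, (1 - p) ** k * p) for k in range(6)]


for row in empirical_vs_pmf(0.7):
    print(row)
```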

If p = 1/n and X is geometrically distributed with parameter p, then the distribution of X/n approaches an exponential distribution with expected value 1 as n → ∞, since

{\begin{aligned}\Pr(X/n>a)=\Pr(X>na)&=(1-p)^{na}=\left(1-{\frac {1}{n}}\right)^{na}=\left[\left(1-{\frac {1}{n}}\right)^{n}\right]^{a}\\&\to [e^{-1}]^{a}=e^{-a}{\text{ as }}n\to \infty .\end{aligned}}

More generally, if p = λ/n, where λ is a parameter, then as n→ ∞ the distribution of X/n approaches an exponential distribution with rate λ:

\lim _{n\to \infty }\Pr(X>nx)=\lim _{n\to \infty }(1-\lambda /n)^{nx}=e^{-\lambda x}

therefore the distribution function of X/n converges to $1-e^{-\lambda x}$ , which is that of an exponential random variable.
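The convergence can be observed numerically. This sketch evaluates $\Pr(X/n>x)=(1-p)^{\lfloor nx\rfloor }$ exactly for $p=\lambda /n$ and increasing $n$; the values of $\lambda$ and $x$ are illustrative.

```python
import math


def scaled_survival(lam: float, x: float, n: int) -> float:
    """Pr(X / n > x) for X on {1, 2, ...} with p = lam / n; X > n x iff X > floor(n x)."""
    p = lam / n
    return (1 - p) ** math.floor(n * x)


lam, x = 2.0, 0.75
for n in (10, 1_000, 100_000):
    print(n, scaled_survival(lam, x, n), math.exp(-lam * x))
```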

Statistical inference

The true parameter $p$ of an unknown geometric distribution can be inferred through estimators ${\hat {p}}$ and conjugate distributions.

Method of moments

Provided they exist, the first $l$ moments of a probability distribution can be estimated from a sample $x_{1},\dotsc ,x_{n}$ using the formula $m_{i}={\frac {1}{n}}\sum _{j=1}^{n}x_{j}^{i}$ where $m_{i}$ is the $i$ th sample moment and $1\leq i\leq l$ .^[10]^{: 349–350} Estimating $\mathrm {E} (X)$ with $m_{1}$ gives the sample mean, denoted ${\bar {x}}$ . Substituting this estimate in the formula for the expected value of a geometric distribution and solving for $p$ gives the estimators ${\hat {p}}={\frac {1}{\bar {x}}}$ and ${\hat {p}}={\frac {1}{{\bar {x}}+1}}$ when supported on $\mathbb {N}$ and $\mathbb {N} _{0}$ respectively. These estimators are biased since $\mathrm {E} \left({\frac {1}{\bar {x}}}\right)>{\frac {1}{\mathrm {E} ({\bar {x}})}}=p$ as a result of Jensen's inequality.^[11]^: 53–54
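The estimators, and the upward bias predicted by Jensen's inequality, can be illustrated by simulation. This is a sketch under stated assumptions: samples are drawn by inverse transform (as described in § Random variate generation), and the helper names, seeds and sample sizes are arbitrary.

```python
import math
import random


def sample_trials(p: float, size: int, rng: random.Random) -> list[int]:
    """Inverse-transform draws of X on {1, 2, ...}; U = 1 - random() lies in (0, 1]."""
    log_q = math.log(1 - p)
    return [max(1, math.ceil(math.log(1.0 - rng.random()) / log_q)) for _ in range(size)]


def estimate_p(sample: list[int], support_starts_at_one: bool = True) -> float:
    """Method-of-moments estimate: 1 / xbar on N, or 1 / (xbar + 1) on N_0."""
    xbar = sum(sample) / len(sample)
    return 1 / xbar if support_starts_at_one else 1 / (xbar + 1)


def average_estimate(p: float, n: int, reps: int = 20_000, seed: int = 0) -> float:
    """Average of the estimator over many samples of size n (approximates E(p_hat))."""
    rng = random.Random(seed)
    return sum(estimate_p(sample_trials(p, n, rng)) for _ in range(reps)) / reps


print(average_estimate(0.3, n=5))  # noticeably above 0.3 for small samples
```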

Maximum likelihood estimation

The maximum likelihood estimator of $p$ is the value that maximizes the likelihood function given a sample.^[10]^: 308 By finding the zero of the derivative of the log-likelihood function when the distribution is defined over $\mathbb {N}$ , the maximum likelihood estimator can be found to be ${\hat {p}}={\frac {1}{\bar {x}}}$ , where ${\bar {x}}$ is the sample mean.^[12] If the domain is $\mathbb {N} _{0}$ , then the estimator shifts to ${\hat {p}}={\frac {1}{{\bar {x}}+1}}$ . As previously discussed in § Method of moments, these estimators are biased.

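Regardless of the domain, the bias is, to first order in $1/n$,

b\equiv \operatorname {E} {\bigg [}\;({\hat {p}}_{\mathrm {mle} }-p)\;{\bigg ]}\approx {\frac {p\,(1-p)}{n}}

which yields the bias-corrected maximum likelihood estimator

{\hat {p\,}}_{\text{mle}}^{*}={\hat {p\,}}_{\text{mle}}-{\hat {b\,}}

where ${\hat {b\,}}={\hat {p\,}}_{\text{mle}}(1-{\hat {p\,}}_{\text{mle}})/n$ is obtained by substituting the estimate for $p$ in the bias formula.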
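A simulation sketch of the correction, in which $\hat b$ is computed by substituting ${\hat {p}}_{\text{mle}}$ for $p$ in the bias formula (a plug-in choice assumed here). It compares the average error of the raw and corrected estimators over many simulated samples; the sampler, seed and sizes are illustrative.

```python
import math
import random


def sample_trials(p: float, size: int, rng: random.Random) -> list[int]:
    """Inverse-transform draws of X on {1, 2, ...}."""
    log_q = math.log(1 - p)
    return [max(1, math.ceil(math.log(1.0 - rng.random()) / log_q)) for _ in range(size)]


def mle(sample: list[int]) -> float:
    return len(sample) / sum(sample)  # 1 / xbar for support {1, 2, ...}


def bias_corrected_mle(sample: list[int]) -> float:
    p_hat = mle(sample)
    b_hat = p_hat * (1 - p_hat) / len(sample)  # plug-in estimate of the bias
    return p_hat - b_hat


def average_errors(p: float, n: int, reps: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Average error (estimate - p) of the raw and bias-corrected estimators."""
    rng = random.Random(seed)
    raw = corrected = 0.0
    for _ in range(reps):
        sample = sample_trials(p, n, rng)
        raw += mle(sample)
        corrected += bias_corrected_mle(sample)
    return raw / reps - p, corrected / reps - p


print(average_errors(0.3, n=20))
```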

Bayesian inference

In Bayesian inference, the parameter $p$ is treated as a random variable with a prior distribution, and its posterior distribution is calculated using Bayes' theorem after observing samples.^[11]^: 167 If a beta distribution is chosen as the prior, then the posterior is also a beta distribution; the beta distribution is therefore called the conjugate prior. In particular, if a $\mathrm {Beta} (\alpha ,\beta )$ prior is selected, then the posterior, after observing samples $k_{1},\dotsc ,k_{n}\in \mathbb {N}$, is^[13] $p\sim \mathrm {Beta} \left(\alpha +n,\ \beta +\sum _{i=1}^{n}(k_{i}-1)\right).$ Alternatively, if the samples are in $\mathbb {N} _{0}$, the posterior distribution is^[14] $p\sim \mathrm {Beta} \left(\alpha +n,\beta +\sum _{i=1}^{n}k_{i}\right).$ The posterior mean approaches the maximum likelihood estimate ${\widehat {p}}$ as $\alpha$ and $\beta$ approach zero, regardless of the support.
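The conjugate update only requires counting successes and failures. This minimal sketch (hypothetical helper names) returns the posterior beta parameters for either support, and shows the posterior mean $\alpha '/(\alpha '+\beta ')$ approaching $1/{\bar {x}}$ when the prior parameters are close to zero.

```python
def posterior_params(alpha: float, beta: float, samples: list[int],
                     support_starts_at_one: bool = True) -> tuple[float, float]:
    """Beta(alpha, beta) prior -> Beta(alpha + n, beta + total failures) posterior."""
    n = len(samples)
    # On {1, 2, ...} each observation k contains k - 1 failures; on {0, 1, ...} it contains k.
    failures = sum(k - 1 for k in samples) if support_starts_at_one else sum(samples)
    return alpha + n, beta + failures


def posterior_mean(a: float, b: float) -> float:
    return a / (a + b)


trials = [3, 1, 4, 2]  # observations on {1, 2, ...}, sample mean 2.5
print(posterior_params(1, 1, trials))                          # (5, 7)
print(posterior_params(1, 1, [k - 1 for k in trials], False))  # (5, 7)
print(posterior_mean(*posterior_params(1e-9, 1e-9, trials)))   # about 0.4 = 1 / 2.5
```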

Random variate generation

The geometric distribution can be generated experimentally from i.i.d. standard uniform random variables by finding the first such random variable to be less than or equal to $p$ . However, the number of random variables needed is also geometrically distributed and the algorithm slows as $p$ decreases.^[15]^: 498

Random generation can be done in constant time by truncating exponential random numbers. A standard exponential random variable $E$ (with rate 1) becomes geometrically distributed with parameter $p$ through $\lceil -E/\log(1-p)\rceil$. In turn, $E$ can be generated as $E=-\log U$ from a standard uniform random variable $U$, which changes the formula into $\lceil \log(U)/\log(1-p)\rceil$.^[15]^{: 499–500}^[16]
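Both generators described above can be sketched in a few lines; the function names are hypothetical. The first counts uniform draws and therefore needs about $1/p$ iterations on average; the second uses the logarithmic formula and takes constant time. Drawing $U$ as one minus Python's `random()` keeps $U$ in $(0,1]$, which avoids $\log 0$.

```python
import math
import random


def geometric_by_trials(p: float, rng: random.Random) -> int:
    """Count standard uniform draws until one is <= p (about 1 / p draws on average)."""
    k = 1
    while rng.random() > p:
        k += 1
    return k


def geometric_by_inversion(p: float, rng: random.Random) -> int:
    """Constant-time draw of X on {1, 2, ...}: ceil(log(U) / log(1 - p))."""
    u = 1.0 - rng.random()  # U in (0, 1], so log(U) is defined
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))  # max guards the U == 1 edge case


rng = random.Random(42)
for draw in (geometric_by_trials, geometric_by_inversion):
    sample = [draw(0.2, rng) for _ in range(100_000)]
    print(draw.__name__, sum(sample) / len(sample))  # both close to 1 / 0.2 = 5
```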

Computational methods

In the programming language R, the function dgeom(k, prob) calculates the probability of k failures before the first success, with success probability prob on each trial.

In Microsoft Excel, the function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of number_f failures before the number_s-th success, with success probability probability_s on each trial. Setting number_s to 1 gives the geometric distribution.^[17]
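For readers without R or Excel, the documented meaning of the two functions can be mirrored in Python. The names `dgeom_like` and `negbinomdist_like` are hypothetical and are not part of any library; the sketch checks that the negative binomial probability with one required success equals the geometric probability of `k` failures.

```python
from math import comb


def dgeom_like(k: int, prob: float) -> float:
    """Probability of k failures before the first success (the quantity R's dgeom reports)."""
    return (1 - prob) ** k * prob if k >= 0 else 0.0


def negbinomdist_like(number_f: int, number_s: int, probability_s: float) -> float:
    """Probability of number_f failures before the number_s-th success."""
    return (comb(number_f + number_s - 1, number_s - 1)
            * probability_s ** number_s * (1 - probability_s) ** number_f)


print(dgeom_like(2, 0.5))            # 0.125
print(negbinomdist_like(2, 1, 0.5))  # 0.125
```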

References

[1] Nagel, Werner; Steyer, Rolf (2017-04-04). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.
[2] Chattamvelli, Rajan; Shanmugam, Ramalingam (2020). Discrete Distributions in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics. Cham: Springer International Publishing. doi:10.1007/978-3-031-02425-2. ISBN 978-3-031-01297-6.
[3] Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. p. 50. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
[4] Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005-08-19). Univariate Discrete Distributions. Wiley Series in Probability and Statistics (1st ed.). Wiley. p. 228. doi:10.1002/0471715816. ISBN 978-0-471-27246-5.
[5] Forbes, Catherine; Evans, Merran; Hastings, Nicholas; Peacock, Brian (2010-11-29). Statistical Distributions (1st ed.). Wiley. doi:10.1002/9780470627242. ISBN 978-0-470-39063-4.
[6] Chan, Stanley (2021). Introduction to Probability for Data Science (1st ed.). Michigan Publishing. ISBN 978-1-60785-747-1.
[7] Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
[8] Pitman, Jim (1993). Probability. New York, NY: Springer New York. p. 372. doi:10.1007/978-1-4612-4374-8. ISBN 978-0-387-94594-1.
[9] Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl:2060/19940028569. S2CID 1505801.
[10] Evans, Michael; Rosenthal, Jeffrey (2023). Probability and Statistics: The Science of Uncertainty (2nd ed.). ISBN 978-1429224628.
[11] Held, Leonhard; Sabanés Bové, Daniel (2020). Likelihood and Bayesian Inference: With Applications in Biology and Medicine. Statistics for Biology and Health. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-662-60792-3. ISBN 978-3-662-60791-6.
[12] Siegrist, Kyle (2020-05-05). "7.3: Maximum Likelihood". Statistics LibreTexts. Retrieved 2024-06-20.
[13] Fink, Daniel. "A Compendium of Conjugate Priors". CiteSeerX 10.1.1.157.5540.
[14] "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
[15] Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York, NY: Springer New York. doi:10.1007/978-1-4613-8643-8. ISBN 978-1-4613-8645-2.
[16] Knuth, Donald Ervin (1997). The Art of Computer Programming. Vol. 2 (3rd ed.). Reading, Mass: Addison-Wesley. p. 136. ISBN 978-0-201-89683-1.
[17] "3.5 Geometric Probability Distribution using Excel Spreadsheet". Statistics LibreTexts. 2021-07-24. Retrieved 2023-10-20.

External links

Geometric distribution on MathWorld.
