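If the probability mass in the induced distribution is spread smoothly along the real line, with no point mass concentrations, there is a probability density function $f_X$ which satisfies

$$P(X \in M) = \int_M f_X(t)\,dt \quad \text{(area under the graph of } f_X \text{ over } M\text{)}$$

At each $t$, $f_X(t)$ is the mass per unit length in the probability distribution. The density function has three characteristic properties:

(f1) $f_X \ge 0$
(f2) $\int_{\mathbb{R}} f_X = 1$
(f3) $F_X(t) = \int_{-\infty}^{t} f_X$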
A random variable (or distribution) which has a density is called absolutely continuous. This term comes from measure theory. We often simply abbreviate as continuous distribution.
Remarks
There is a technical mathematical description of the condition “spread smoothly with no point mass concentrations.” Strictly speaking, the integrals are Lebesgue integrals rather than the ordinary Riemann kind, but for practical cases the two agree, so that we are free to use ordinary integration techniques.
By the fundamental theorem of calculus, $f_X(t) = F_X'(t)$ at every point of continuity of $f_X$.
Any integrable, nonnegative function $f$ with $\int f = 1$ determines a distribution function $F$, which in turn determines a probability distribution. If $\int f \ne 1$, multiplication by the appropriate positive constant gives a suitable $f$. An argument based on the Quantile Function shows the existence of a random variable with that distribution.
In the literature on probability, it is customary to omit the indication of the region of integration when integrating over the whole line. Thus

$$\int g(t) f_X(t)\,dt = \int_{\mathbb{R}} g(t) f_X(t)\,dt$$

The first expression is not an indefinite integral. In many situations, $f_X$ will be zero outside an interval. Thus, the integrand effectively determines the region of integration.
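The remarks on normalizing a nonnegative function and on recovering the density as the derivative of the distribution function can be illustrated numerically. The following is a minimal Python sketch, not part of the text's m-function toolbox. The helper names (`integrate`, `make_density`, `make_distribution`) and the example function $g(t) = t(2 - t)$ on $[0, 2]$ are assumptions chosen for illustration, and a simple trapezoidal rule stands in for exact integration.

```python
def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def make_density(g, a, b):
    """Scale a nonnegative g, assumed zero outside [a, b], to integrate to one."""
    c = integrate(g, a, b)
    if c <= 0:
        raise ValueError("g must have positive total integral")
    return lambda t: g(t) / c if a <= t <= b else 0.0


def make_distribution(f, a, b):
    """Distribution function F(t) = integral of f up to t (f is zero below a)."""
    def F(t):
        if t <= a:
            return 0.0
        return integrate(f, a, min(t, b))
    return F


# Hypothetical example: g(t) = t(2 - t) on [0, 2] has integral 4/3, not 1.
g = lambda t: t * (2.0 - t)
f = make_density(g, 0.0, 2.0)
F = make_distribution(f, 0.0, 2.0)

print(F(2.0))   # total probability mass, close to 1
print(F(1.0))   # close to 0.5, since g is symmetric about t = 1

# A central difference of F approximates f at a point of continuity.
h = 1e-3
print((F(0.5 + h) - F(0.5 - h)) / (2 * h), f(0.5))
```

Because the example function vanishes outside $[0, 2]$, integrating over that interval is the same as integrating over the whole line, which is the situation the last remark describes.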
Some common absolutely continuous distributions
Uniform $(a, b)$.
Mass is spread uniformly on the interval $[a, b]$. It is immaterial whether or not the end points are included, since the probability associated with each individual point is zero. The probability of any subinterval is proportional to the length of the subinterval. The probability of being in any two subintervals of the same length is the same. This distribution is used to model situations in which it is known that $X$ takes on values in $[a, b]$ but is equally likely to be in any subinterval of a given length. The density must be constant over the interval (zero outside), and the distribution function increases linearly with $t$ in the interval. Thus,

$$f_X(t) = \frac{1}{b - a} \quad a < t < b \quad \text{(zero outside the interval)}$$

The graph of $F_X$ rises linearly, with slope $1/(b - a)$, from zero at $t = a$ to one at $t = b$.
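As a small illustration of the uniform case, the Python sketch below writes out a constant density and a linearly increasing distribution function, then checks that two subintervals of the same length carry the same probability. The function names and the interval $(2, 10)$ are hypothetical choices, not taken from the text.

```python
def uniform_density(t, a, b):
    """Constant density 1/(b - a) on (a, b), zero outside."""
    return 1.0 / (b - a) if a < t < b else 0.0


def uniform_cdf(t, a, b):
    """Distribution function: rises linearly from 0 at t = a to 1 at t = b."""
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return (t - a) / (b - a)


def uniform_prob(c, d, a, b):
    """P(c < X <= d) as a difference of distribution function values."""
    return uniform_cdf(d, a, b) - uniform_cdf(c, a, b)


a, b = 2.0, 10.0                      # illustrative interval
print(uniform_prob(3.0, 5.0, a, b))   # subinterval of length 2
print(uniform_prob(7.0, 9.0, a, b))   # another subinterval of length 2
```

Both probabilities equal the subinterval length divided by $b - a$, here $2/8$.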
Symmetric triangular $(-a, a)$.

$$f_X(t) = \begin{cases} (a + t)/a^2 & -a \le t < 0 \\ (a - t)/a^2 & 0 \le t \le a \end{cases} \quad \text{(zero elsewhere)}$$

This distribution is used frequently in instructional numerical examples because probabilities can be obtained geometrically. It can be shifted, with a shift of the graph, to different sets of values. It appears naturally (in shifted form) as the distribution for the sum or difference of two independent random variables uniformly distributed on intervals of the same length. This fact is established with the use of the moment generating function (see Transform Methods). More generally, the density may have a triangular graph which is not symmetric.
Use of a triangular distribution
Suppose $X \sim$ symmetric triangular $(100, 300)$. Determine $P(120 < X \le 250)$.
Remark. Note that in the continuous case, it is immaterial whether the end points of the intervals are included or not.
Solution
To get the area under the triangle between 120 and 250, we take one minus the area of the right triangles between 100 and 120 and between 250 and 300. Each half of the full triangle has base 100 and area 1/2. Using the fact that areas of similar triangles are proportional to the square of any side, we have

$$P = 1 - \frac{1}{2}\left((20/100)^2 + (50/100)^2\right) = 0.855$$
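The geometric argument can be cross-checked with a distribution function for a symmetric triangular distribution on a general interval. The Python sketch below is an illustration only. The function name `triangular_cdf` is hypothetical, and the formula is the area of the left or right similar triangle, each half of the full triangle having area 1/2.

```python
def triangular_cdf(t, lo, hi):
    """Distribution function for the symmetric triangular distribution on (lo, hi)."""
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo)          # base of each half triangle
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    if t <= mid:
        # Area of the left similar triangle with side (t - lo).
        return 0.5 * ((t - lo) / half) ** 2
    # One minus the area of the right similar triangle with side (hi - t).
    return 1.0 - 0.5 * ((hi - t) / half) ** 2


p = triangular_cdf(250, 100, 300) - triangular_cdf(120, 100, 300)
print(round(p, 4))   # 0.855
```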
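Exponential $(\lambda)$: $f_X(t) = \lambda e^{-\lambda t}$, $t \ge 0$ (zero elsewhere).
Integration shows $F_X(t) = 1 - e^{-\lambda t}$, $t \ge 0$ (zero elsewhere).
We note that $P(X > t) = 1 - F_X(t) = e^{-\lambda t}$, $t \ge 0$. This leads to an extremely important property of the exponential distribution. Since $X > t + h$, $h > 0$, implies $X > t$, we have

$$P(X > t + h \mid X > t) = \frac{P(X > t + h)}{P(X > t)} = \frac{e^{-\lambda (t + h)}}{e^{-\lambda t}} = e^{-\lambda h} = P(X > h)$$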
Because of this property, the exponential distribution is often used in reliability problems. Suppose $X$ represents the time to failure (i.e., the life duration) of a device put into service at $t = 0$. If the distribution is exponential, this property says that if the device survives to time $t$ (i.e., $X > t$) then the (conditional) probability it will survive $h$ more units of time is the same as the original probability of surviving for $h$ units of time. Many devices have the property that they do not wear out. Failure is due to some stress of external origin. Many solid state electronic devices behave essentially in this way, once initial “burn in” tests have removed defective units. Use of Cauchy's equation (Appendix B) shows that the exponential distribution is the only continuous distribution with this property.
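The statement that a device which has survived to time $t$ is "as good as new" can be checked numerically. The Python sketch below compares the conditional survival probability, computed from $P(X > t) = e^{-\lambda t}$, with the unconditional probability of surviving $h$ units, and adds a seeded Monte Carlo estimate. The function names and the values $\lambda = 0.5$, $t = 3$, $h = 2$ are illustrative assumptions.

```python
import math
import random


def exp_survival(t, lam):
    """P(X > t) for X ~ exponential(lam); equals 1 for t < 0."""
    return math.exp(-lam * t) if t >= 0 else 1.0


def conditional_survival(t, h, lam):
    """P(X > t + h | X > t), computed from the survival function."""
    return exp_survival(t + h, lam) / exp_survival(t, lam)


def simulated_conditional_survival(t, h, lam, n=200_000, seed=0):
    """Monte Carlo estimate of P(X > t + h | X > t) from n simulated lifetimes."""
    rng = random.Random(seed)
    survived_t = 0
    survived_t_plus_h = 0
    for _ in range(n):
        x = rng.expovariate(lam)
        if x > t:
            survived_t += 1
            if x > t + h:
                survived_t_plus_h += 1
    return survived_t_plus_h / survived_t


lam, t, h = 0.5, 3.0, 2.0   # illustrative values, not from the text
print(exp_survival(h, lam))                        # P(X > h)
print(conditional_survival(t, h, lam))             # P(X > t + h | X > t)
print(simulated_conditional_survival(t, h, lam))   # simulation estimate
```

The first two numbers agree up to floating point rounding; the third should be close to them, with sampling error.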
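Gamma distribution $(\alpha, \lambda)$: $f_X(t) = \dfrac{\lambda^{\alpha} t^{\alpha - 1} e^{-\lambda t}}{\Gamma(\alpha)}$, $t \ge 0$ (zero elsewhere)
We have an m-function gammadbn to determine values of the distribution function for $X \sim$ gamma $(\alpha, \lambda)$. Use of moment generating functions shows that for $\alpha = n$, a random variable $X \sim$ gamma $(n, \lambda)$ has the same distribution as the sum of $n$ independent random variables, each exponential $(\lambda)$. A relation to the Poisson distribution is described in Sec 7.5.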
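For integer $\alpha = n$, the gamma $(n, \lambda)$ distribution function can be written as a finite Poisson sum, $F(t) = 1 - \sum_{k=0}^{n-1} e^{-\lambda t} (\lambda t)^k / k!$ for $t > 0$. The Python sketch below uses that identity as a stand-in for the m-function gammadbn, whose internals are not shown in the text; the name `erlang_cdf` is hypothetical. It is evaluated at $n = 10$, $\lambda = 3$, $t = 4$ and $6$, the values used in the arrival problem that follows.

```python
import math


def erlang_cdf(n, lam, t):
    """P(T <= t) for T ~ gamma(n, lam) with integer n >= 1."""
    if t <= 0:
        return 0.0
    x = lam * t
    term = math.exp(-x)      # Poisson(x) probability of exactly 0 events
    below = 0.0              # accumulates P(fewer than n events by time t)
    for k in range(n):
        below += term
        term *= x / (k + 1)  # step to the probability of k + 1 events
    return 1.0 - below


for hours in (4, 6):
    print(hours, round(erlang_cdf(10, 3, hours), 4))
# prints 0.7576 for 4 hours and 0.9846 for 6 hours
```

With $n = 1$ the sum has a single term and the function reduces to the exponential distribution function $1 - e^{-\lambda t}$.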
An arrival problem
On a Saturday night, the times (in hours) between arrivals in a hospital emergency unit may be represented by a random quantity which is exponential $(\lambda = 3)$. As we show in the chapter Mathematical Expectation, this means that the average interarrival time is 1/3 hour or 20 minutes. What is the probability of ten or more arrivals in four hours? In six hours?
Solution
The time for ten arrivals is the sum of ten interarrival times. If we suppose these are independent, as is usually the case, then the time for ten arrivals is gamma $(10, 3)$. There are ten or more arrivals in $t$ hours if and only if the tenth arrival occurs by time $t$, so the required probabilities are the values of the gamma distribution function at $t = 4$ and $t = 6$.
>> p = gammadbn(10,3,[4 6])
p = 0.7576 0.9846
Normal, or Gaussian $(\mu, \sigma^2)$

$$f_X(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2\right) \quad \forall\, t$$

We generally indicate that a random variable $X$ has the normal or gaussian distribution by writing $X \sim N(\mu, \sigma^2)$, putting in the actual values for the parameters.
The gaussian distribution plays a central role in many aspects of applied probability theory, particularly in the area of statistics. Much of its importance comes from the central limit theorem (CLT), which is a term applied to a number of theorems in analysis. Essentially, the CLT shows that the sum of a sufficiently large number of independent random variables has approximately the gaussian distribution. Thus, the gaussian distribution appears naturally in such topics as theory of errors or theory of noise, where the quantity observed is an additive combination of a large number of essentially independent quantities.
Examination of the expression shows that the graph for $f_X(t)$ is symmetric about its maximum at $t = \mu$. The greater the parameter $\sigma^2$, the smaller the maximum value and the more slowly the curve decreases with distance from $\mu$. Thus parameter $\mu$ locates the center of the mass distribution and $\sigma^2$ is a measure of the spread of mass about $\mu$. The parameter $\mu$ is called the mean value and $\sigma^2$ is the variance. The parameter $\sigma$, the positive square root of the variance, is called the standard deviation. While we have an explicit formula for the density function, it is known that the distribution function, as the integral of the density function, cannot be expressed in terms of elementary functions. The usual procedure is to use tables obtained by numerical integration.
Since there are two parameters, this raises the question whether a separate table is needed for each pair of parameters. It is a remarkable fact that this is not the case. We need only have a table of the distribution function for $X \sim N(0, 1)$. This is referred to as the standardized normal distribution. We use $\varphi$ and $\Phi$ for the standardized normal density and distribution functions, respectively.
Standardized normal: $\varphi(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$, so that the distribution function is $\Phi(t) = \int_{-\infty}^{t} \varphi(u)\,du$.
The graph of the density function is the well known bell shaped curve, symmetrical about the origin (see [link]). The symmetry about the origin contributes to its usefulness. Note that the area to the left of $-t$ is the same as the area to the right of $t$, for any $t$, so that we have

$$\Phi(-t) = 1 - \Phi(t) \quad \forall\, t$$

This indicates that we need only a table of values of $\Phi(t)$ for $t > 0$ to be able to determine $\Phi(t)$ for any $t$. We may use the symmetry for any case. Note that $\Phi(0) = 1/2$.
Standardized normal calculations
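Suppose $X \sim N(0, 1)$. Determine $P(-1 \le X \le 2)$ and $P(|X| > 1)$.
Solution
1. $P(-1 \le X \le 2) = \Phi(2) - \Phi(-1) = \Phi(2) - [1 - \Phi(1)] = \Phi(2) + \Phi(1) - 1$
2. $P(|X| > 1) = P(X > 1) + P(X < -1) = 1 - \Phi(1) + \Phi(-1) = 2[1 - \Phi(1)]$
From a table of standardized normal distribution function (see Appendix D), we find $\Phi(2) = 0.9772$ and $\Phi(1) = 0.8413$, which gives $P(-1 \le X \le 2) = 0.8185$ and $P(|X| > 1) = 0.3174$.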
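Table lookups for the standardized normal distribution can be cross-checked in code, using the identity $\Phi(t) = \frac{1}{2}\left[1 + \operatorname{erf}(t/\sqrt{2})\right]$. The Python sketch below is an illustration rather than the text's method. The function names are hypothetical, and the two events evaluated, $P(-1 \le X \le 2)$ and $P(|X| > 1)$ for $X \sim N(0, 1)$, are illustrative choices. `Phi_by_symmetry` mimics a table that lists only nonnegative arguments.

```python
import math


def Phi(t):
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def Phi_by_symmetry(t):
    """Evaluate Phi using nonnegative arguments only, as with a table for t >= 0."""
    return Phi(t) if t >= 0 else 1.0 - Phi(-t)


p_interval = Phi_by_symmetry(2.0) - Phi_by_symmetry(-1.0)   # P(-1 <= X <= 2)
p_tails = 2.0 * (1.0 - Phi_by_symmetry(1.0))                # P(|X| > 1)
print(round(p_interval, 4), round(p_tails, 4))              # 0.8186 0.3173
```

Arithmetic with four-place table entries ($\Phi(2) \approx 0.9772$, $\Phi(1) \approx 0.8413$) gives 0.8185 and 0.3174 instead; the difference in the last digit comes from rounding in the table.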
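General gaussian distribution

For $X \sim N(\mu, \sigma^2)$, the density maintains the bell shape, but is shifted with different spread and height. [link] shows the distribution function and density function for $X \sim N(2, 0.1)$. The density is centered about $t = 2$. It has height 1.2616 as compared with 0.3989 for the standardized normal density. Inspection shows that the graph is narrower than that for the standardized normal. The distribution function reaches 0.5 at the mean value 2.
A change of variables in the integral shows that the table for standardized normal distribution function can be used for any case.

$$F_X(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right) dx = \int_{-\infty}^{t} \varphi\left(\frac{x - \mu}{\sigma}\right) \frac{1}{\sigma}\,dx$$

Make the change of variable and corresponding formal changes

$$u = \frac{x - \mu}{\sigma}, \qquad du = \frac{1}{\sigma}\,dx, \qquad x = t \ \text{corresponds to}\ u = \frac{t - \mu}{\sigma}$$

to get

$$F_X(t) = \int_{-\infty}^{(t - \mu)/\sigma} \varphi(u)\,du = \Phi\left(\frac{t - \mu}{\sigma}\right)$$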
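The standardization $F_X(t) = \Phi\left((t - \mu)/\sigma\right)$ can be checked against direct numerical integration of the density. The Python sketch below does this for mean 2 and variance 0.1, the variance consistent with the height 1.2616 quoted above. The function names are hypothetical and are not the text's m-functions gaussian and gaussdensity; the trapezoidal rule and the lower cutoff at ten standard deviations below the mean are assumptions made for the check.

```python
import math


def phi(u):
    """Standardized normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def gaussian_density(mu, var, t):
    """Density of N(mu, var) at t, written in terms of phi."""
    sigma = math.sqrt(var)
    return phi((t - mu) / sigma) / sigma


def gaussian_cdf(mu, var, t):
    """Distribution function of N(mu, var) at t, by standardization."""
    return Phi((t - mu) / math.sqrt(var))


def integrated_cdf(mu, var, t, n=20_000):
    """Trapezoidal integral of the density from mu - 10*sigma up to t."""
    a = mu - 10.0 * math.sqrt(var)
    if t <= a:
        return 0.0
    h = (t - a) / n
    total = 0.5 * (gaussian_density(mu, var, a) + gaussian_density(mu, var, t))
    for k in range(1, n):
        total += gaussian_density(mu, var, a + k * h)
    return total * h


mu, var = 2.0, 0.1
print(round(gaussian_density(mu, var, mu), 4))   # 1.2616, the peak height
print(gaussian_cdf(mu, var, mu))                 # 0.5 at the mean
print(gaussian_cdf(mu, var, 2.3), integrated_cdf(mu, var, 2.3))
```

The last line prints two values that should agree to several decimal places, one from the table-style formula and one from integrating the density directly.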
General gaussian calculation
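Suppose $X \sim N(3, 16)$ (i.e., $\mu = 3$ and $\sigma^2 = 16$). Determine $P(-1 \le X \le 11)$ and $P(|X - 3| > 4)$.
Solution
1. $F_X(11) - F_X(-1) = \Phi\left(\frac{11 - 3}{4}\right) - \Phi\left(\frac{-1 - 3}{4}\right) = \Phi(2) - \Phi(-1) = 0.8185$
2. $P(X - 3 < -4) + P(X - 3 > 4) = F_X(-1) + [1 - F_X(7)] = \Phi(-1) + 1 - \Phi(1) = 0.3174$
In each case the problem reduces to that in [link].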
We have m-functions gaussian and gaussdensity to calculate values of the distribution and density function for any reasonable value of the parameters.
The following are solutions of [link] and [link], using the m-function gaussian.
Beta $(r, s)$, $r > 0$, $s > 0$

$$f_X(t) = \frac{\Gamma(r + s)}{\Gamma(r)\,\Gamma(s)}\, t^{r - 1} (1 - t)^{s - 1} \quad 0 < t < 1 \quad \text{(zero elsewhere)}$$

[link] and [link] show graphs of the densities for various values of $r, s$. The usefulness comes in approximating densities on the unit interval. By using scaling and shifting, these can be extended to other intervals. The special case $r = s = 1$ gives the uniform distribution on the unit interval. The Beta distribution is quite useful in developing the Bayesian statistics for the problem of sampling to determine a population proportion. If $r, s$ are integers, the density function is a polynomial. For the general case we have two m-functions, beta and betadbn, to perform the calculations.
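The Beta density is easy to evaluate directly. The Python sketch below assumes the standard form $f(t) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} t^{r-1}(1-t)^{s-1}$ on $(0, 1)$, with parameters here called $r$ and $s$. The names `beta_density` and `beta_cdf` are hypothetical and are not the text's m-functions beta and betadbn; the distribution function is approximated with a midpoint rule, which is an assumption made to keep the example self-contained.

```python
import math


def beta_density(r, s, t):
    """Beta(r, s) density on (0, 1), zero elsewhere."""
    if not 0.0 < t < 1.0:
        return 0.0
    c = math.gamma(r + s) / (math.gamma(r) * math.gamma(s))
    return c * t ** (r - 1) * (1.0 - t) ** (s - 1)


def beta_cdf(r, s, t, n=20_000):
    """Midpoint-rule approximation to the integral of the density over (0, t)."""
    if t <= 0.0:
        return 0.0
    t = min(t, 1.0)
    h = t / n
    return h * sum(beta_density(r, s, (k + 0.5) * h) for k in range(n))


print(beta_density(1, 1, 0.3))   # r = s = 1: uniform density, value 1
print(beta_density(2, 3, 0.5))   # integer case: polynomial 12 t (1 - t)^2
print(beta_cdf(2, 3, 1.0))       # total mass, close to 1
print(beta_cdf(2, 3, 0.5))       # close to 0.6875
```

For $r = 2$, $s = 3$ the normalizing constant is $\Gamma(5)/(\Gamma(2)\Gamma(3)) = 12$, so the density is the polynomial $12\,t(1 - t)^2$ and the distribution function at $0.5$ can be checked by hand.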
Weibull $(\alpha, \lambda, \nu)$: $F_X(t) = 1 - e^{-\lambda (t - \nu)^{\alpha}}$, $\alpha > 0$, $\lambda > 0$, $\nu \ge 0$, $t \ge \nu$

The parameter $\nu$ is a shift parameter. Usually we assume $\nu = 0$. Examination shows that for $\alpha = 1$ the distribution is exponential $(\lambda)$. The parameter $\alpha$ provides a distortion of the time scale for the exponential distribution. [link] and [link] show graphs of the Weibull density for some representative values of $\alpha$ and $\lambda$ ($\nu = 0$). The distribution is used in reliability theory. We do not make much use of it. However, we have m-functions weibull (density) and weibulld (distribution function) for shift parameter $\nu = 0$ only. The shift can be obtained by subtracting a constant from the $t$ values.
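The remarks on the shift parameter and on the exponential special case can be illustrated with a short Python sketch. It assumes the parametrization $F(t) = 1 - e^{-\lambda (t - \nu)^{\alpha}}$ for $t \ge \nu$, which is consistent with the statement that $\alpha = 1$ gives exponential $(\lambda)$. The function names and argument order are hypothetical and do not reproduce the m-functions weibull and weibulld.

```python
import math


def weibull_cdf(alpha, lam, t, nu=0.0):
    """Weibull distribution function; the shift nu is applied by subtraction."""
    s = t - nu
    if s <= 0.0:
        return 0.0
    return 1.0 - math.exp(-lam * s ** alpha)


def weibull_density(alpha, lam, t, nu=0.0):
    """Derivative of weibull_cdf for t > nu; taken as zero at and below nu."""
    s = t - nu
    if s <= 0.0:
        return 0.0
    return alpha * lam * s ** (alpha - 1) * math.exp(-lam * s ** alpha)


# alpha = 1 reduces to the exponential distribution function 1 - exp(-lam * t).
print(weibull_cdf(1, 2.0, 0.7), 1.0 - math.exp(-2.0 * 0.7))

# A shift of nu = 1 is the same as evaluating the unshifted function at t - 1.
print(weibull_cdf(2, 0.5, 3.0, nu=1.0), weibull_cdf(2, 0.5, 2.0))
```

Each printed pair should agree, the first up to floating point rounding and the second exactly.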