Notice that we cannot form a density estimate by simply differentiating the empirical CDF, since this function contains discontinuities at the sample locations. Rather, we need to estimate the probability that a random variable will fall within a particular interval of the real axis. In this section, we will describe a common method known as the histogram.
Our goal is to estimate an arbitrary probability density function, $f_X(x)$, within a finite region of the $x$-axis. We will do this by partitioning the region into $L$ equally spaced subintervals, or “bins”, and forming an approximation for $f_X(x)$ within each bin. Let our region of support start at the value $x_0$ and end at $x_L$. Our $L$ subintervals of this region will be $(x_0, x_1]$, $(x_1, x_2]$, ..., $(x_{L-1}, x_L]$. To simplify our notation we will define $\mathrm{bin}(k)$ to represent the interval $(x_{k-1}, x_k]$, $k = 1, 2, \ldots, L$, and define the quantity $\Delta$ to be the length of each subinterval.
$$\mathrm{bin}(k) = (x_{k-1}, x_k], \quad k = 1, 2, \ldots, L$$
$$\Delta = \frac{x_L - x_0}{L}$$
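We will also define $\tilde{f}(k)$ to be the probability that $X$ falls into $\mathrm{bin}(k)$.
$$\tilde{f}(k) = P\big(X \in \mathrm{bin}(k)\big) = \int_{x_{k-1}}^{x_k} f_X(x)\,dx \approx f_X(x)\,\Delta \quad \text{for } x \in \mathrm{bin}(k)$$
The approximation $\tilde{f}(k) \approx f_X(x)\,\Delta$ holds only for an appropriately small bin width $\Delta$.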
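To see how the bin width affects this approximation, the following is a minimal sketch that compares exact bin probabilities with the density evaluated at the bin midpoint times the bin width. The density $e^{-x}$, the region $[0, 4]$, and all variable names are assumptions chosen purely for illustration; they do not come from the lab.

```matlab
% Illustrative check of ftilde(k) ~ f_X(x)*Delta for shrinking bin widths.
% Assumed example density (not from the lab): f_X(x) = exp(-x) for x >= 0.
x0 = 0; xL = 4;                 % assumed region of support
F = @(x) 1 - exp(-x);           % CDF of the assumed density
f = @(x) exp(-x);               % the assumed density itself

Ls = [4 16 64];                 % numbers of bins to try
maxerr = zeros(size(Ls));
for i = 1:numel(Ls)
    L = Ls(i);
    Delta = (xL - x0) / L;                          % bin width
    edges = x0 + (0:L) * Delta;                     % x_0, x_1, ..., x_L
    ftilde = F(edges(2:end)) - F(edges(1:end-1));   % exact bin probabilities
    mid = (edges(1:end-1) + edges(2:end)) / 2;      % bin midpoints
    approx = f(mid) * Delta;                        % density times bin width
    maxerr(i) = max(abs(ftilde - approx));          % worst-case gap over bins
    fprintf('L = %2d, Delta = %.4f, max error = %.2e\n', L, Delta, maxerr(i));
end
```

The printed error should shrink as the number of bins grows, which is the sense in which the approximation needs a small bin width.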
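Next we introduce the concept of a histogram of a collection of i.i.d. random variables $\{X_1, X_2, \ldots, X_N\}$. Let us start by defining a function that will indicate whether or not the random variable $X_n$ falls within $\mathrm{bin}(k)$.
$$I_n(k) = \begin{cases} 1 & \text{if } X_n \in \mathrm{bin}(k) \\ 0 & \text{if } X_n \notin \mathrm{bin}(k) \end{cases}$$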
The histogram of $X_n$ at $\mathrm{bin}(k)$, denoted as $H(k)$, is simply the number of random variables that fall within $\mathrm{bin}(k)$. This can be written as
$$H(k) = \sum_{n=1}^{N} I_n(k)$$
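The counting definition can be sketched directly in Matlab: build a 0/1 indicator vector for each bin and sum it. The sample values, the choice of four bins on $[0, 1]$, and the names `I`, `H`, and `edges` are hypothetical and used only for illustration.

```matlab
% Count samples per bin using an explicit indicator, one bin at a time.
X = [0.05 0.12 0.40 0.41 0.77 0.98];   % hypothetical samples
x0 = 0; xL = 1; L = 4;                 % assumed region and number of bins
Delta = (xL - x0) / L;                 % bin width
edges = x0 + (0:L) * Delta;            % x_0, x_1, ..., x_L
N = numel(X);

H = zeros(1, L);
for k = 1:L
    % Indicator of "sample n lies in the half-open interval (x_{k-1}, x_k]"
    I = (X > edges(k)) & (X <= edges(k+1));
    H(k) = sum(I);                     % number of samples in bin k
end
disp(H)
```

With half-open bins of this form, a sample exactly equal to the left end of the region would not be counted; for continuous random variables this happens with probability zero.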
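We can show that the normalized histogram, $H(k)/N$, is an unbiased estimate of the probability of $X$ falling in $\mathrm{bin}(k)$. Let us compute the expected value of the normalized histogram.
$$E\left[\frac{H(k)}{N}\right] = \frac{1}{N}\sum_{n=1}^{N} E\left[I_n(k)\right] = \frac{1}{N}\sum_{n=1}^{N} \Big\{ 1 \cdot P\big(X_n \in \mathrm{bin}(k)\big) + 0 \cdot P\big(X_n \notin \mathrm{bin}(k)\big) \Big\} = \tilde{f}(k)$$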
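The last equality results from the definition of $\tilde{f}(k)$, and from the assumption that the $X_n$'s have the same distribution. A similar argument may be used to show that the variance of $H(k)/N$ is given by
$$\mathrm{Var}\left[\frac{H(k)}{N}\right] = \frac{1}{N}\,\tilde{f}(k)\big(1 - \tilde{f}(k)\big)$$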
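These two moments can be checked empirically. The sketch below assumes uniform samples on $[0, 1]$ and a single bin $(0.2, 0.3]$, so the bin probability is 0.1; the sample size, trial count, and variable names are illustrative assumptions. It repeats the experiment many times and compares the sample mean and variance of the normalized count with the bin probability and with the bin probability times one minus itself, divided by the number of samples.

```matlab
% Monte Carlo check of the mean and variance of a normalized histogram bin.
N = 100;                 % samples per experiment (assumed)
trials = 5000;           % number of repeated experiments (assumed)
a = 0.2; b = 0.3;        % a single assumed bin (a, b]
ftilde = b - a;          % exact bin probability for uniform samples on [0,1]

U = rand(trials, N);                     % each row is one experiment
Hn = sum(U > a & U <= b, 2) / N;         % normalized count for each experiment

est_mean = mean(Hn);                     % should be close to ftilde
est_var = var(Hn);                       % should be close to theory_var
theory_var = ftilde * (1 - ftilde) / N;

fprintf('mean: %.4f (theory %.4f)\n', est_mean, ftilde);
fprintf('var : %.2e (theory %.2e)\n', est_var, theory_var);
```

Because the samples are random, the printed estimates fluctuate from run to run; they should agree with the theoretical values only to within sampling error.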
Therefore, as $N$ grows large, the bin probabilities $\tilde{f}(k)$ can be approximated by the normalized histogram $H(k)/N$.
$$\tilde{f}(k) \approx \frac{H(k)}{N}$$
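Using the approximation $\tilde{f}(k) \approx f_X(x)\,\Delta$, we may then approximate the density function $f_X(x)$ within $\mathrm{bin}(k)$ by
$$f_X(x) \approx \frac{H(k)}{N\Delta} \quad \text{for } x \in \mathrm{bin}(k)$$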
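Notice this estimate is a staircase function of $x$ which is constant over each interval $\mathrm{bin}(k)$. It can also easily be verified that this density estimate integrates to 1: each bin contributes an area of $\frac{H(k)}{N\Delta} \cdot \Delta = \frac{H(k)}{N}$, and the counts $H(k)$ sum to $N$ provided every sample falls within $[x_0, x_L]$.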
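A minimal sketch of this density estimate follows, dividing the bin counts by the number of samples times the bin width and confirming that the resulting staircase has unit area. The samples are generated as the square root of uniform random numbers (density $2x$ on $[0, 1]$), which is an assumption made only for this illustration, as are the variable names.

```matlab
% Histogram-based density estimate: counts / (N * Delta), with an area check.
N = 1000; L = 20;                 % assumed sample size and number of bins
x0 = 0; xL = 1;                   % assumed region of support
Delta = (xL - x0) / L;            % bin width

X = sqrt(rand(1, N));             % hypothetical samples with density 2x on [0,1]

centers = x0 + ((1:L) - 0.5) * Delta;   % midpoints of the L bins
H = hist(X, centers);                   % counts per bin
fhat = H / (N * Delta);                 % staircase density estimate, one value per bin

area = sum(fhat * Delta);               % integral of the staircase function
fprintf('area under the estimate = %.6f\n', area);
```

If a plot is wanted, something like `stairs([centers - Delta/2, xL], [fhat, fhat(end)])` would draw the staircase, with the true density overlaid for comparison.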
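Let $U$ be a uniformly distributed random variable on the interval [0,1] with the following cumulative probability distribution, $F_U(u)$:
$$F_U(u) = \begin{cases} 0 & \text{if } u < 0 \\ u & \text{if } 0 \le u \le 1 \\ 1 & \text{if } u > 1 \end{cases}$$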
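We can calculate the cumulative probability distribution for the new random variable $X = U^{1/3}$.
$$F_X(x) = P(X \le x) = P\big(U^{1/3} \le x\big) = P\big(U \le x^3\big) = F_U(u)\big|_{u = x^3} = \begin{cases} 0 & \text{if } x < 0 \\ x^3 & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$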
Plot $F_X(x)$ for $x \in [0,1]$. Also, analytically calculate the probability density $f_X(x)$, and plot it for $x \in [0,1]$.
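Using $L = 20$, $x_0 = 0$, and $x_L = 1$, use Matlab to compute $\tilde{f}(k)$, the probability of $X$ falling into $\mathrm{bin}(k)$. Plot $\tilde{f}(k)$ for $k = 1, \ldots, L$ using the stem function.
Use stem to plot $\tilde{f}(k)$, and put all three plots ($F_X(x)$, $f_X(x)$, and $\tilde{f}(k)$) on a single figure using subplot.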
Generate 1000 samples of a random variable $U$ that is uniformly distributed between 0 and 1 (using the rand command). Then form the random vector $X$ by computing $X = U^{1/3}$.
Use the Matlab function hist to plot a normalized histogram for your samples of $X$, using 20 bins uniformly spaced on the interval $[0,1]$. Use the Matlab command H=hist(X,(0.5:19.5)/20) to obtain the histogram, and then normalize H.
Use the stem command to plot the normalized histogram $H(k)/N$ and $\tilde{f}(k)$ together on the same figure using subplot.
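The bin centers passed to hist in the command above may be easier to interpret with a small check. The sketch below uses plain uniform samples rather than the lab's $X$ (an assumption made so that it does not work the exercise), and confirms that the centers (0.5:19.5)/20 correspond to 20 bins with edges 0, 0.05, ..., 1. The variable names `Hcheck` and `Hnorm` are hypothetical.

```matlab
% Relate the bin centers used with hist to explicit bin edges, then normalize.
N = 1000;
U = rand(1, N);                  % illustrative uniform samples (not the lab's X)

centers = (0.5:19.5) / 20;       % bin centers 0.025, 0.075, ..., 0.975
H = hist(U, centers);            % counts per bin, as in the command from the text

% Count the same 20 bins explicitly from their edges 0, 0.05, ..., 1
edges = (0:20) / 20;
Hcheck = zeros(1, 20);
for k = 1:20
    Hcheck(k) = sum(U >= edges(k) & U < edges(k+1));
end

Hnorm = H / N;                   % normalized histogram; entries sum to 1
fprintf('counts agree: %d, sum of normalized histogram = %.4f\n', ...
        isequal(H, Hcheck), sum(Hnorm));
```

For plotting, one possible layout is `subplot(2,1,1); stem(centers, Hnorm)` for the normalized histogram, with the theoretical bin probabilities drawn by a second stem call in `subplot(2,1,2)`.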