Suppose we make noisy measurements of a smooth function:
$$Y_i = f^*(i/n) + W_i,$$
where $i = 1, \dots, n$ and $W_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$. The unknown function $f^*$ is a map
$$f^* : [0,1] \to \mathbb{R}.$$
In Lecture 4, we considered this problem in the case where $f^*$ was Lipschitz on $[0,1]$. That is, $f^*$ satisfied
$$|f^*(t) - f^*(s)| \le L |t - s|, \quad \forall \, t, s \in [0,1],$$
where $L > 0$ is a constant. In that case, we showed that by using a piecewise constant function on a partition of $O(n^{1/3})$ equal-size bins we were able to obtain an estimator whose mean square error was
$$\mathbb{E}\left[\| f^* - \hat{f}_n \|^2\right] = O\left(n^{-2/3}\right).$$
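To make this baseline concrete, here is a minimal sketch of the piecewise constant estimator (a Python illustration with hypothetical names, not code from the notes). Averaging within a bin of width $1/k$ has squared bias at most $(L/k)^2$ by the Lipschitz condition and variance of roughly $\sigma^2 k / n$, and $k \sim n^{1/3}$ balances the two, giving the $n^{-2/3}$ rate.

```python
import numpy as np

def piecewise_constant_denoise(y):
    """Average the observations within ~n^(1/3) equal-size bins.

    Bin width balances squared bias (~ (L/k)^2 by the Lipschitz
    condition) against the variance of each bin average (~ sigma^2 k/n).
    """
    n = len(y)
    k = max(1, round(n ** (1 / 3)))        # number of bins ~ n^(1/3)
    edges = np.linspace(0, n, k + 1).astype(int)
    f_hat = np.empty(n)
    for j in range(k):
        lo, hi = edges[j], edges[j + 1]
        f_hat[lo:hi] = y[lo:hi].mean()     # constant fit on bin j
    return f_hat

# Example: n = 1000 noisy samples of a Lipschitz function
rng = np.random.default_rng(0)
n = 1000
t = np.arange(1, n + 1) / n
f_star = np.abs(t - 0.5)                   # Lipschitz with L = 1
y = f_star + rng.normal(0.0, 0.1, size=n)
mse = np.mean((piecewise_constant_denoise(y) - f_star) ** 2)
```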
In this lecture, we will use the Maximum Complexity-Regularized Likelihood Estimation result we derived in Lecture 14 to extend our denoising scheme in several important ways. To begin with, let's consider a broader class of functions.
For $0 < \alpha \le 1$, define the space of functions
$$H^\alpha(C_\alpha) = \left\{ f : |f(t) - f(s)| \le C_\alpha |t - s|^\alpha, \ \forall \, t, s \in [0,1] \right\}$$
for some constant $C_\alpha < \infty$. The space $H^\alpha(C_\alpha)$ with $\alpha < 1$ contains functions that are bounded, but less smooth than Lipschitz functions. Indeed, the space of Lipschitz functions can be defined as
$$H^1(C_1) = \left\{ f : |f(t) - f(s)| \le C_1 |t - s|, \ \forall \, t, s \in [0,1] \right\}$$
for $C_1 < \infty$. Functions in $H^\alpha(C_\alpha)$ are continuous for every $0 < \alpha \le 1$, but for $\alpha < 1$ they are not Lipschitz in general.
Let's also consider functions that are smoother than Lipschitz. For $1 < \alpha < 2$, define
$$H^\alpha(C_\alpha) = \left\{ f : f' \text{ exists and } |f'(t) - f'(s)| \le C_\alpha |t - s|^{\alpha - 1}, \ \forall \, t, s \in [0,1] \right\}.$$
In other words, $H^\alpha(C_\alpha)$ with $1 < \alpha < 2$ contains Lipschitz functions that are also differentiable, and their derivatives are Hölder smooth with smoothness $\alpha - 1$. And finally, let
$$H^2(C_2) = \left\{ f : f' \text{ exists and } |f'(t) - f'(s)| \le C_2 |t - s|, \ \forall \, t, s \in [0,1] \right\}$$
contain functions that have Lipschitz (hence continuous) derivatives, but that are not necessarily twice-differentiable.
If $f \in H^\alpha(C_\alpha)$, then we say that $f$ is Hölder-$\alpha$ smooth with Hölder constant $C_\alpha$. The notion of Hölder smoothness can also be extended to $\alpha > 2$ in a straightforward way, by placing the Hölder condition on higher-order derivatives.
Note: If $f \in H^\alpha(C_\alpha)$ and $0 < \alpha' \le \alpha \le 1$, then $f \in H^{\alpha'}(C_\alpha)$, since $|t - s| \le 1$ implies $|t - s|^\alpha \le |t - s|^{\alpha'}$; the spaces are nested in order of decreasing smoothness.
Summarizing, we can describe Hölder spaces as follows. If $f \in H^\alpha(C_\alpha)$ for some $0 < \alpha \le 2$ and $C_\alpha < \infty$, then
$$|f(t) - f(s)| \le C_\alpha |t - s|^\alpha, \ \forall \, t, s \in [0,1], \qquad \text{if } 0 < \alpha \le 1,$$
$$|f'(t) - f'(s)| \le C_\alpha |t - s|^{\alpha - 1}, \ \forall \, t, s \in [0,1], \qquad \text{if } 1 < \alpha \le 2.$$
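To make the definitions concrete, here is a quick worked example (added for illustration). Take $f(t) = |t - \tfrac{1}{2}|^{1/2}$. Then
$$|f(t) - f(s)| = \left| |t - \tfrac{1}{2}|^{1/2} - |s - \tfrac{1}{2}|^{1/2} \right| \le \left| |t - \tfrac{1}{2}| - |s - \tfrac{1}{2}| \right|^{1/2} \le |t - s|^{1/2},$$
so $f \in H^{1/2}(1)$: continuous, but not Lipschitz at $t = 1/2$. Likewise, $f(t) = t^2$ has $|f'(t) - f'(s)| = 2|t - s|$, so $f \in H^2(2)$: differentiable with a Lipschitz derivative.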
Note that in general there is a natural relationship between the Hölder space containing the function and the approximation class used to estimate the function. Here we will consider functions that are Hölder smooth with $0 < \alpha \le 2$ and work with piecewise linear approximations. If we were to consider smoother functions ($\alpha > 2$), we would need to consider higher-order approximations, i.e., quadratic, cubic, etc.
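To make the piecewise linear approximation concrete, here is a minimal sketch (with a hypothetical helper name; it fits an unconstrained line in each bin by least squares, which is one of several possible conventions):

```python
import numpy as np

def piecewise_linear_fit(t, y, k):
    """Least-squares fit of an (unconstrained) piecewise linear
    function on a uniform partition of [0, 1] into k bins."""
    f_hat = np.empty_like(y, dtype=float)
    bins = np.clip((t * k).astype(int), 0, k - 1)  # bin index of each sample
    for j in range(k):
        mask = bins == j
        if mask.sum() >= 2:
            slope, intercept = np.polyfit(t[mask], y[mask], deg=1)
            f_hat[mask] = slope * t[mask] + intercept
        elif mask.any():
            f_hat[mask] = y[mask].mean()   # too few points: constant fit
    return f_hat
```

Each bin contributes two free parameters (slope and intercept), which matches the degrees of freedom needed to track Hölder smoothness up to $\alpha \le 2$.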
Now let's assume $f^* \in H^\alpha(C_\alpha)$ for some unknown $\alpha \in (0, 2]$; i.e., we don't know how smooth $f^*$ is. We will use our observations
$$Y_i = f^*(i/n) + W_i, \quad i = 1, \dots, n,$$
to construct an estimator $\hat{f}_n$. Intuitively, the smoother $f^*$ is, the better we should be able to estimate it. Can we take advantage of extra smoothness in $f^*$ if we don't know how smooth it is? The smoother $f^*$ is, the more averaging we can perform to reduce noise; in other words, for smoother $f^*$ we should average over larger bins. We will also need to exploit the extra smoothness in our approximation of $f^*$. To that end, we will consider candidate functions that are piecewise linear on uniform partitions of $[0,1]$, choosing the partition size in a data-driven way, as sketched below.
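As a rough illustration of this adaptive idea (a sketch under assumptions, not the notes' exact construction): fit the piecewise linear candidates for each partition size $k$ and choose $k$ by a complexity-penalized fit in the spirit of the Lecture 14 result. The penalty below (two parameters per bin, $\log n$ per parameter) is an illustrative choice, and `piecewise_linear_fit` is the hypothetical helper from the earlier sketch.

```python
import numpy as np

def select_k(t, y, sigma2, k_max):
    """Complexity-penalized choice of partition size: minimize
    RSS(k) + penalty(k) over k = 1, ..., k_max. The penalty form
    (~ 2 parameters per bin, log n per parameter) is illustrative."""
    n = len(y)
    best_score, best_k, best_fit = np.inf, 1, None
    for k in range(1, k_max + 1):
        f_hat = piecewise_linear_fit(t, y, k)
        rss = np.sum((y - f_hat) ** 2)
        pen = 2 * sigma2 * (2 * k) * np.log(n)  # illustrative penalty
        if rss + pen < best_score:
            best_score, best_k, best_fit = rss + pen, k, f_hat
    return best_k, best_fit
```

A smoother $f^*$ drives the selected $k$ down (larger bins, more averaging), while a rougher $f^*$ drives it up; the point of the complexity-regularization result is that this data-driven choice performs nearly as well as the best $k$ for the unknown $\alpha$.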