In contrast, for the Kalman filter algorithm — like I said, over here I just have the update step, and on the other board I had the predict step — you can carry out the computation on both of these steps in constant time. So on every time step you perform one Kalman filter update: every time you get one more observation, you do one more predict/update pair, and the cost of that doesn't depend on how many time steps you've already seen. The amount of stuff you need to keep around in memory doesn't grow with the number of time steps either. Okay?

This is actually the way we run Kalman filters in practice. Initially I have just my first observation, so I compute P(x_1 | y_1), and now I know where I think my helicopter is at time step one. Having computed that, some time passes — say a second — and I get another observation, and what I'll do is combine these two together to get P(x_2 | y_1, y_2). Then maybe another second passes and I get another observation; my helicopter has moved a little bit more because another second's passed. I combine these to compute P(x_3 | y_1, y_2, y_3). And it turns out that in order to compute this I don't need to remember any of the earlier observations. Okay? So this is how you actually run it in real time.

Okay, cool. So — oh, drat, running out of time. The last thing I want to do is actually put these things together. Putting Kalman filters together with LQR control, you get an algorithm called LQG control, which stands for linear-quadratic Gaussian. In this type of control problem we have a linear dynamical system, and I'm now adding actions back in: x_{t+1} = A x_t + B a_t + w_t. So in the LQG problem, or linear-quadratic Gaussian problem, I have a linear dynamical system that I want to control, and I don't get to observe the states directly. I only get to observe these variables y_t — noisy observations of the actual state.

It turns out that you can solve an LQG control problem as follows. At every time step, we'll use a Kalman filter to estimate the state. Concretely, let's say you know the initial state; then you initialize s_{0|0} = s_0 and Sigma_{0|0} = 0. If you don't know the initial state exactly, then s_{0|0} is just the mean of your initial state estimate and Sigma_{0|0} is the covariance of your initial state estimate. So you just initialize your Kalman filter this way, and then you use the Kalman filter on every step to estimate what the state is. Here's the predict step — previously we had s_{t+1|t} equals A s_{t|t}, and so on — and then you have an update step, same as before. The one change I'm going to make to the predict step is that now I'm going to take the action into account as well. This is just saying: suppose my estimate of the previous state is s_{t|t}; what do I think my next state s_{t+1|t} will be, given no other observations? And the answer is really just this equation: s_{t+1|t} = A s_{t|t} + B a_t.
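To make the constant-time recursion concrete, here is a minimal sketch of one predict/update step, assuming a linear system x_{t+1} = A x_t + B a_t + w_t with observations y_t = C x_t + v_t. The matrix names and the noise covariances Sigma_w, Sigma_v are placeholder choices for this sketch, not anything fixed by the lecture.

```python
import numpy as np

def kalman_predict(s, Sigma, A, B, a, Sigma_w):
    # Predict step: propagate the current estimate through the dynamics,
    # now including the action a_t:
    #   s_{t+1|t}     = A s_{t|t} + B a_t
    #   Sigma_{t+1|t} = A Sigma_{t|t} A^T + Sigma_w
    s_pred = A @ s + B @ a
    Sigma_pred = A @ Sigma @ A.T + Sigma_w
    return s_pred, Sigma_pred

def kalman_update(s_pred, Sigma_pred, C, y, Sigma_v):
    # Update step: fold in the new noisy observation y_{t+1} = C x_{t+1} + v_{t+1}.
    K = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + Sigma_v)
    s_new = s_pred + K @ (y - C @ s_pred)
    Sigma_new = Sigma_pred - K @ C @ Sigma_pred
    return s_new, Sigma_new
```

Notice that each step touches only the current estimate (s, Sigma) and the newest observation, which is exactly why the memory and per-step cost don't grow with the number of time steps.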
And then, so this takes care of, sort of, the observations. The other thing you do is compute the L_t's using LQR: you just look at the linear dynamical system, forget about the observations for now, and compute the optimal policy. Previously we said the optimal policy was to choose actions a_t = L_t s_t, for these matrices L_t. So for the other part of this problem you use LQR to compute these matrices L_t, ignoring the fact that you don't actually observe the state. And the very final step of LQG control is that — well, when you're actually flying a helicopter, when you're actually doing whatever you're doing, you can't plug in the actual state, because in an LQG problem you don't get to observe the state exactly. So what you do when you actually execute the policy is choose the action according to your best estimate of the state. Okay?
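As a sketch of that second piece, here is one common way to compute the gains L_t with a backward Riccati recursion. I'm assuming a finite-horizon quadratic cost of the form sum_t (s_t' Q s_t + a_t' R a_t); the lecture's own sign and reward conventions may differ, so treat Q, R, and the sign of L_t here as assumptions.

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    # Backward recursion for finite-horizon LQR (observations are ignored here;
    # LQR assumes the state is fully observed).
    P = Q            # value-function matrix at the final time step
    L = [None] * T   # L[t] gives the policy a_t = L[t] @ s_t
    for t in reversed(range(T)):
        L[t] = -np.linalg.inv(R + B.T @ P @ B) @ (B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ L[t])
    return L
```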
So in other words, you don't know what s_t is, but your best estimate of the state at any time is s_{t|t}. So you just plug that in, take L_t times your best estimate of the state, and then you go ahead and execute the action a_t on your system, on your helicopter, or whatever. Okay? And it turns out that for this specific class of problems, this is actually the optimal procedure. This will actually cause you to act optimally in your LQG problem.

And there's this intuition that, as I said earlier, in LQR problems it's almost as if the noise doesn't matter: in a pure LQR problem the w_t terms don't affect the optimal policy, so it's as if you can ignore the noise. It turns out that by elaborating that proof, which I'm not going to do — you're welcome to prove it for yourself at home — that same intuition means you can actually ignore the noise in your observations as well. The estimate s_{t|t} is sort of your best estimate, so it's as if your true state s_t is equal to s_{t|t} plus noise. So in LQG control, what we're going to do is ignore the noise and just plug in s_{t|t}, and this turns out to be the optimal thing to do.

I should say, this turns out to be a very special case of a problem where you can ignore the noise and still act optimally. This property is called the separation principle: you can design an algorithm for estimating the states and design an algorithm for controlling your system, just glom the two together, and that turns out to be optimal. This is a very unusual property, and it's pretty much true only for LQG; it doesn't hold true for many other systems. Once you change almost anything in this problem — a non-linear dynamical system, some other noise model — this will no longer hold true. You could still estimate the states and plug that estimate into a controller that was designed assuming you could observe the states fully, but once you change almost anything, that will no longer turn out to be optimal. For the LQG problem specifically, though, it's kind of convenient that you can do this. Just one quick question to actually close.
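Putting the two pieces together, a sketch of the full LQG loop might look like the following: run the Kalman filter online and, at each step, act on the current state estimate as if it were the true state. The callback get_observation is a hypothetical stand-in for however your system returns the next noisy measurement; it isn't part of the lecture's notation.

```python
import numpy as np

def run_lqg(A, B, C, Sigma_w, Sigma_v, L, s0, Sigma0, get_observation, T):
    # LQG control via the separation principle: estimate the state with a
    # Kalman filter, then apply the LQR policy to the estimate s_{t|t}.
    s_hat, Sigma = s0, Sigma0          # initial state estimate and covariance
    for t in range(T):
        a = L[t] @ s_hat               # act on the best estimate: a_t = L_t s_{t|t}
        # Predict step, including the effect of the action we just took.
        s_pred = A @ s_hat + B @ a
        Sigma_pred = A @ Sigma @ A.T + Sigma_w
        # Update step with the next noisy observation y_{t+1}.
        y = get_observation(a)         # hypothetical callback returning y_{t+1}
        K = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + Sigma_v)
        s_hat = s_pred + K @ (y - C @ s_pred)
        Sigma = Sigma_pred - K @ C @ Sigma_pred
    return s_hat, Sigma
```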
Student: [Inaudible]
Instructor (Andrew Ng): Oh, yes. Yeah. In everything I've described I'm assuming that you've already learned A and B or something, so to –
Student: [Inaudible]
Instructor (Andrew Ng): Yeah, right. Okay. Sorry, we're running a little bit late; let's close for today, and next time I'll talk a bit more about these partially observed problems.
[End of Audio]
Duration: 79 minutes