
And one other thing about this: to be slightly more efficient (efficiency isn't really the issue here), you can actually forget about the Ψ_t's if you want. You don't need to compute them at all. Now, the other interesting property of this is that the matrix Σ_w appears only in my dynamic programming update for the Ψ_t's. It doesn't actually appear in my updates for the Φ_t's. So remember, my model was that s_{t+1} = A_t s_t + B_t a_t + w_t, where these noise terms w_t have covariance Σ_w, and so the only place the covariance of the noise terms appears is in those Ψ_t's. But I just said that you can run this entire dynamic programming algorithm without the Ψ_t's. So what this means is that you can actually find the optimal policy without knowing what the covariance of the noise terms is. Okay?
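For reference, here is a hedged reconstruction of the two updates being referred to, written in my own notation rather than copied from the board (following the convention that the reward is -(s_t^T U_t s_t + a_t^T V_t a_t) and the value function at time t is s^T Φ_t s + Ψ_t; the exact symbols in the lecture may differ):

Φ_t = -U_t + A_t^T Φ_{t+1} A_t - A_t^T Φ_{t+1} B_t (B_t^T Φ_{t+1} B_t - V_t)^{-1} B_t^T Φ_{t+1} A_t
Ψ_t = Ψ_{t+1} + tr(Σ_w Φ_{t+1})
a_t = L_t s_t,  with  L_t = -(B_t^T Φ_{t+1} B_t - V_t)^{-1} B_t^T Φ_{t+1} A_t

Notice that Σ_w shows up only in the Ψ_t recursion, while the gain L_t is built entirely from Φ_{t+1}, A_t, B_t, and the cost matrices.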

So this is a very special property of LQR systems. Once you change anything, once you go away from a linear dynamical system, or once you change almost any aspect of this problem, say by having discrete states or discrete actions or whatever, this property will no longer hold true, because it's a very special property of LQR that the optimal policy does not actually depend on the magnitude of these noise terms. Okay? The only important property is that the noise terms have zero mean. So there's this intuition that to compute the optimal policy you can just ignore the noise terms. In other words, so long as you know the expected value of your next state s_{t+1}, namely that on average s_{t+1} is A_t s_t + B_t a_t, it's as if you can ignore the noise in your next state s_{t+1}, and the optimal policy doesn't change. Okay?
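Since this certainty-equivalence property is easy to miss, here is a minimal numerical sketch of the finite-horizon backward recursion. It is written in the cost-minimization convention rather than the lecture's reward form, and every name in it (Q, R, P, p, K, lqr_backward_pass) is mine, not the lecture's; P[t] and p[t] play the roles of Φ_t and Ψ_t.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, T, Sigma_w=None):
    """Finite-horizon LQR backward recursion (cost-minimization convention).

    Dynamics: s_{t+1} = A s_t + B a_t + w_t, with w_t ~ N(0, Sigma_w).
    Cost: sum over t of (s_t^T Q s_t + a_t^T R a_t).
    Value function: V_t(s) = s^T P[t] s + p[t]; optimal action a_t = -K[t] s_t.
    """
    n = A.shape[0]
    if Sigma_w is None:
        Sigma_w = np.zeros((n, n))
    P = [None] * (T + 1)
    p = np.zeros(T + 1)
    K = [None] * T
    P[T] = Q                                      # terminal cost
    for t in reversed(range(T)):
        # Gain: depends only on A, B, R, and P[t+1]; Sigma_w never appears here.
        K[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
        # Quadratic term of the value function (Riccati update), also noise-free.
        P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])
        # Constant term: the ONLY place the noise covariance enters.
        p[t] = p[t + 1] + np.trace(P[t + 1] @ Sigma_w)
    return K, P, p

# Same gains with or without noise; only the value-function constants change.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K0, _, p0 = lqr_backward_pass(A, B, Q, R, T=50)
K1, _, p1 = lqr_backward_pass(A, B, Q, R, T=50, Sigma_w=0.5 * np.eye(2))
assert all(np.allclose(k0, k1) for k0, k1 in zip(K0, K1))
print(p0[0], p1[0])   # the noisier system has a strictly worse cost-to-go constant
```

The point is visible right in the loop: K[t] and P[t] never touch Sigma_w, so the policy is identical for any noise level, while p[t], and hence the value function, gets worse as the noise grows.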

So we'll actually come back to this in a minute. Later on, when we talk about Kalman filters, we'll actually use this property of LQR systems. Just to point out, note that the value function does depend on the noise covariance, right? The value function here does depend on the Ψ_t's, and therefore on Σ_w, so the larger the noise in your system, the worse your value function. So the value does depend on the noise; it's the optimal policy that doesn't depend on the noise. We'll use this property later. Okay, so let's see how we're doing on time. Right. Okay. So let's put this aside for now.

What I want to do now is tell you about one specific way of applying LQR called differential dynamic programming. As in most of these examples, think of trying to control a system, like a helicopter or a car or even a chemical plant, with some continuous state. For the sake of thinking through this example, just imagine trying to control a helicopter. Let's say you have some simulator that gives you the next state as a function of the previous state and action, s_{t+1} = f(s_t, a_t), and let's say the model in your simulator is non-linear but deterministic. Okay? I said just now that the noise terms don't matter very much, so let's just work with a deterministic simulator for now, but let's say f is non-linear. And let's say there's some specific trajectory that you want the helicopter to follow, all right? So I want to talk about how to apply LQR to control a helicopter, or a car, or a chemical plant, where your state variables may be the amounts of different chemicals and the mixtures of chemicals you have in different vats. But it's easiest to think about a helicopter: let's say there's some trajectory you want the helicopter to follow.

So here's what differential dynamic programming does. The first step is to come up with what I'm going to call a nominal trajectory, which we'll write as s̄_0, ā_0, s̄_1, ā_1, and so on. Okay? One way to come up with this would be if you had some very bad controller, say someone hacked together a controller for flying the helicopter that's not a good controller at all. You might then go ahead and fly the helicopter using that very bad, very sloppy controller, and you get out some sequence of states and actions. I'm just going to call that sequence of states and actions the nominal trajectory.

Then I will linearize f around this nominal trajectory. Okay? So, i.e., I'll use that same linearization we just saw: for each time t, approximate s_{t+1} as f(s̄_t, ā_t), plus the gradient of f with respect to the state, evaluated at (s̄_t, ā_t), times (s_t − s̄_t), plus the gradient of f with respect to the action, times (a_t − ā_t). Okay? And then you distill this down to something of the form s_{t+1} ≈ A_t s_t + B_t a_t. Okay? So this will actually be the first time that I make explicit use of the ability of LQR, of these finite-horizon problems, to handle non-stationary dynamics. In particular, for this example, it will be important that A_t and B_t depend on time. Okay?
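To make that linearization step concrete, here is a rough sketch of how one might compute A_t and B_t numerically around a nominal trajectory, assuming only that you have a black-box deterministic simulator f(s, a) that takes and returns 1-D numpy arrays. The function and variable names are illustrative rather than anything from the lecture, and the Jacobians are taken by finite differences purely for simplicity; if your simulator is differentiable you would use its analytic Jacobians instead.

```python
import numpy as np

def linearize_around_trajectory(f, s_bar, a_bar, eps=1e-5):
    """Linearize s_{t+1} = f(s_t, a_t) around a nominal trajectory.

    s_bar: nominal states [s_0, ..., s_T]; a_bar: nominal actions [a_0, ..., a_{T-1}].
    Returns A_t, B_t, c_t for each t such that, near the nominal trajectory,
        s_{t+1} ≈ A_t s_t + B_t a_t + c_t.
    The constant c_t can be absorbed by appending a constant 1 to the state,
    which recovers the pure A_t s_t + B_t a_t form used on the board.
    """
    A_list, B_list, c_list = [], [], []
    for t in range(len(a_bar)):
        s0 = np.asarray(s_bar[t], dtype=float)
        a0 = np.asarray(a_bar[t], dtype=float)
        f0 = np.asarray(f(s0, a0), dtype=float)
        A_t = np.zeros((f0.size, s0.size))
        B_t = np.zeros((f0.size, a0.size))
        for i in range(s0.size):            # ∂f/∂s by central differences
            ds = np.zeros(s0.size); ds[i] = eps
            A_t[:, i] = (f(s0 + ds, a0) - f(s0 - ds, a0)) / (2 * eps)
        for j in range(a0.size):            # ∂f/∂a by central differences
            da = np.zeros(a0.size); da[j] = eps
            B_t[:, j] = (f(s0, a0 + da) - f(s0, a0 - da)) / (2 * eps)
        c_t = f0 - A_t @ s0 - B_t @ a0      # offset so the fit is exact at (s̄_t, ā_t)
        A_list.append(A_t); B_list.append(B_t); c_list.append(c_t)
    return A_list, B_list, c_list
```

Because the nominal state and action are different at every time step, the resulting A_t, B_t, and c_t are different at every time step too, which is exactly where the non-stationary dynamics come in.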

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4