Student: [Inaudible]
Instructor (Andrew Ng): In this sample? Yes, yeah, right, yeah. So I’m going to – let’s pick some kind of horizon T. So I’m going to run through my entire trajectory in my simulator, so I end up with a new nominal trajectory to linearize around, right? Okay? Yeah?
Student: So does this method give you, like, a Gaussian for performing, like, a certain action? Like you talked about, like, the 90-degree turn thing or something.
Instructor (Andrew Ng): Right.
Student: So is this from one, like, is this from one, like, one 90-degree turn or can you [inaudible]?
Instructor (Andrew Ng): Yeah. So it turns out – so where this is used – let’s see. Think about this as if there’s a specific trajectory that you want to follow, whether it’s a car or a helicopter or it could be in a chemical plant, right? If there’s some specific sequence of states you expect the system to go through over time, then at different times you want different linear approximations to your dynamics, right? So even though I have a stationary simulator, right? I mean, this function F may be the same function F for all time steps, but the point of DDP is that I may want to use different linearizations for different time steps. So a lot of the inner loop of the algorithm is just coming up with better and better places around which to linearize, where at different times I’ll linearize around different points. Does that make sense? Cool. Okay, cool. So that was DDP.
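To make that inner loop concrete, here is a rough Python sketch under simplifying assumptions: a toy two-dimensional simulator f, a quadratic regulation cost, affine gains to absorb the linearization offset, and no line search. All names and numbers are hypothetical stand-ins, not the system from the lecture; the point is only to show "roll out, re-linearize at every time step, re-solve the LQR, repeat."

```python
import numpy as np

def f(s, a):
    # Toy nonlinear simulator: 2-D state (angle, angular velocity), 1-D action.
    return np.array([s[0] + 0.1 * s[1],
                     s[1] + 0.1 * np.sin(s[0]) + 0.1 * a[0]])

def linearize(s_bar, a_bar, eps=1e-5):
    # Finite-difference Jacobians A_t, B_t and offset c_t so that, near the
    # nominal point, f(s, a) is approximately A_t s + B_t a + c_t.
    n, m = len(s_bar), len(a_bar)
    A = np.zeros((n, n)); B = np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n); d[i] = eps
        A[:, i] = (f(s_bar + d, a_bar) - f(s_bar - d, a_bar)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m); d[j] = eps
        B[:, j] = (f(s_bar, a_bar + d) - f(s_bar, a_bar - d)) / (2 * eps)
    c = f(s_bar, a_bar) - A @ s_bar - B @ a_bar
    return A, B, c

def backward_pass(lin, Q, R, n):
    # Finite-horizon LQR recursion on the time-varying linearizations.
    # The offset c_t is folded in by augmenting the state with a constant 1,
    # so each resulting policy is affine: a_t = K_t s_t + k_t.
    Q_aug = np.zeros((n + 1, n + 1)); Q_aug[:n, :n] = Q
    P = Q_aug.copy(); gains = []
    for A, B, c in reversed(lin):
        A_aug = np.block([[A, c[:, None]], [np.zeros((1, n)), np.ones((1, 1))]])
        B_aug = np.vstack([B, np.zeros((1, B.shape[1]))])
        L = -np.linalg.solve(R + B_aug.T @ P @ B_aug, B_aug.T @ P @ A_aug)
        P = Q_aug + A_aug.T @ P @ (A_aug + B_aug @ L)
        gains.append((L[:, :n], L[:, n]))
    return gains[::-1]

T, n = 50, 2
Q, R = np.eye(n), 0.1 * np.eye(1)
s0 = np.array([1.0, 0.0])
actions = [np.zeros(1) for _ in range(T)]          # initial nominal controls

for it in range(5):                                # outer loop of the method
    # 1. Run the whole trajectory through the simulator: the nominal path.
    states = [s0]
    for a in actions:
        states.append(f(states[-1], a))
    # 2. Linearize the dynamics at a different point for every time step.
    lin = [linearize(states[t], actions[t]) for t in range(T)]
    # 3. Solve the time-varying LQR problem for new (affine) gains.
    gains = backward_pass(lin, Q, R, n)
    # 4. Roll out the new controller; this becomes the next nominal trajectory.
    s, actions = s0, []
    for K, k in gains:
        a = K @ s + k
        actions.append(a)
        s = f(s, a)
    print(f"iteration {it}: final state {s}")
```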
And I’ll show examples of DDP results in the next lecture. So the last thing I wanted to do was talk about Kalman filters and LQG control, linear-quadratic Gaussian control. And what I want to do is actually talk about a different type of MDP problem where we don’t get to observe the state explicitly, right? So far, in everything I’ve been talking about, I’ve been assuming that at every time step you know what the state of the system is, so you can compute a policy as some function of the state you’re in. Recall we already had that, you know, the action we take is L_t times s_t, right? So to compute the action you need to know what the state is.
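In code, that dependence is immediate: evaluating the linear policy literally requires having the state in hand. The gain matrix and state below are made up just to show the shape of the computation.

```python
import numpy as np

L_t = np.array([[0.5, 0.1, 0.0, 0.2]])   # hypothetical gain matrix for time step t
s_t = np.array([2.0, -1.0, 0.3, 0.0])    # the current state, assumed fully observed

a_t = L_t @ s_t                          # the policy a_t = L_t s_t needs s_t itself
print(a_t)
```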
What I want to do now is talk about a different type of problem where you don’t get to observe the state explicitly. In fact, before we even talk about the control, let me just talk about a different problem where – just forget about control for now and just look at some dynamical systems where you may not get to observe the state explicitly, and only later we’ll tie this back to controlling some systems. Okay? As a concrete example, sort of just an example to think about, imagine using a radar to track a helicopter, right? So we may model the helicopter, and this will be an amazingly simplified model of a helicopter, as, you know, some linear dynamical system. So [inaudible] s_t+1 = A s_t + w_t, and we’ll forget about controls for now, okay? We’ll fill the controls back in later. And just with this example, I’m gonna use an extremely simplified state, right? Where my state is just the position and velocity in the X and Y directions, so you may choose an A matrix like this as a – okay?
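As a sketch of what such a model might look like: the actual A matrix is written on the board, so the constant-velocity form below, the time step dt, and the noise covariance are only illustrative assumptions for a state consisting of position and velocity in X and Y.

```python
import numpy as np

dt = 0.1                                    # assumed time between radar readings

# State s_t = [x, x_velocity, y, y_velocity]: position and velocity in X and Y.
A = np.array([[1, dt, 0,  0],
              [0,  1, 0,  0],
              [0,  0, 1, dt],
              [0,  0, 0,  1]])

def step(s, noise_cov=0.01 * np.eye(4)):
    # s_{t+1} = A s_t + w_t, with w_t drawn from a zero-mean Gaussian.
    w = np.random.multivariate_normal(np.zeros(4), noise_cov)
    return A @ s + w

s = np.array([0.0, 1.0, 0.0, 0.5])          # initial position and velocity
trajectory = [s]
for _ in range(20):
    s = step(s)
    trajectory.append(s)
```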