Instructor (Andrew Ng): Yeah, right. So you're saying that point one may not be that the simulator is inaccurate; it may be that the hardware is doing something strange with the controls, passing them through some strange transformation before they actually reach the helicopter. I don't know. Yeah. I've definitely seen that happen on some other robots before. So maybe diagnostic one here is better phrased as deciding whether the simulator matches the actual hardware, I don't know. Yeah. That's another class of errors to watch out for. If you suspect that's the case, I can't think of a good diagnostic right now to confirm it, but that, again, would be a good thing to try to come up with a diagnostic for, to see if there might be something wrong with the hardware. And I think these are by no means the definitive diagnostics or anything like that; they're just examples. But if you come up with other diagnostics to check that the hardware is working properly, that would be a great thing to do, too. Okay. Last couple of questions before we move on?
Student: What did you say the reward function was?
Instructor (Andrew Ng): Oh, in this example I was just using a quadratic cost. On the helicopter we often use things that are much more complicated.
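[For reference, the quadratic reward typically used in LQR looks something like the following. The notation here, with weighting matrices U_t and V_t, follows the usual CS229 LQR convention and is an editorial addition, not a transcription of the board.]

```latex
% Sketch of the standard LQR quadratic reward (notation assumed):
% the reward penalizes large states and large actions, so it is
% maximized by staying near zero with small controls.
R(s_t, a_t) = -\left( s_t^{\top} U_t\, s_t + a_t^{\top} V_t\, a_t \right),
\qquad U_t \succeq 0,\; V_t \succeq 0 .
```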
Student: [Inaudible] You have no way of knowing what the, like, desired position is?
Instructor (Andrew Ng): Yeah. So you can, sort of, figure it out – ask the human pilot to hover in place and guess what his desired position was. Again, these aren't constants that are given to you, yeah.
Student: [Inaudible] physics-based learning problems. Do you – does it actually work best to use a physics-based model, you know, just sort of what physics tells you, or do you just sort of do learning on [inaudible]?
Instructor (Andrew Ng): Yeah. Do the physics models work well, right? So the answer is it varies a lot from problem to problem. It turns out that the aerodynamics of helicopters, I think, aren't understood well enough that you can look at the "specs" of a helicopter and build a very good physics simulator. So on the helicopter, we actually learn the dynamics; I don't know how to build a physics model for it. For other problems, like if you actually have an inverted pendulum problem or something, there are many problems for which the dynamics are much better understood and for which physics simulators work perfectly fine. But it depends a lot on the problem: on some problems physics simulators work great, and on others they probably aren't great. Okay? Cool.
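[As a rough illustration of what "learning the dynamics" can mean in the simplest case, here is a minimal sketch that fits a linear model s_{t+1} ≈ A s_t + B a_t to logged state/action data by least squares. This is an editorial example, not the actual helicopter model, which is far more involved; all array names and shapes below are assumptions.]

```python
import numpy as np

def fit_linear_dynamics(states, actions):
    """Least-squares fit of s_{t+1} ~= A s_t + B a_t from logged data.

    states:  array of shape (T+1, n_s) -- observed state trajectory
    actions: array of shape (T,   n_a) -- actions taken at each step
    Returns (A, B) with shapes (n_s, n_s) and (n_s, n_a).
    """
    s_t, s_next = states[:-1], states[1:]
    # Stack [s_t, a_t] as regressors and solve for [A, B] jointly.
    X = np.hstack([s_t, actions])                    # (T, n_s + n_a)
    W, *_ = np.linalg.lstsq(X, s_next, rcond=None)   # (n_s + n_a, n_s)
    n_s = states.shape[1]
    A = W[:n_s].T
    B = W[n_s:].T
    return A, B

# Hypothetical usage with random data standing in for real flight logs:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(101, 4))   # 100 transitions, 4-dim state
    actions = rng.normal(size=(100, 2))  # 2-dim action
    A, B = fit_linear_dynamics(states, actions)
    print(A.shape, B.shape)              # (4, 4) (4, 2)
```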
So, I guess, retract the chalkboard, please. All right. So how much time do I have? Twenty minutes. Okay. So let's go back to our discussion of LQR control, linear-quadratic regulation control, and then I want to take that a little bit further and tell you about one variation on LQR called differential dynamic programming. Just to recap, just to remind you what LQR, or linear-quadratic regulation control, is: in the last lecture I defined a finite-horizon problem where your goal is to maximize, right, just sort of the finite-horizon sum of rewards, and there's no discounting anymore, and then we came up with this dynamic programming algorithm, right? Okay.
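[As a written recap of the setup being referred to here: the finite-horizon objective and the backward dynamic programming recursion, in the standard CS229 notation. This is an editorial sketch of the recap, not a verbatim reproduction of the board.]

```latex
% Finite-horizon objective (no discounting), over horizon T:
\max \;\; \mathbb{E}\!\left[\, \sum_{t=0}^{T} R(s_t, a_t) \,\right]

% Dynamic programming works backwards from the horizon:
V^{*}_{T}(s) = \max_{a} \, R(s, a), \qquad
V^{*}_{t}(s) = \max_{a} \Big( R(s, a)
    + \mathbb{E}_{s' \sim P_{sa}}\big[\, V^{*}_{t+1}(s') \,\big] \Big)
\quad \text{for } t = T-1, \dots, 0 .
```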