<< Chapter < Page Chapter >> Page >

So, of course, this is just one example of how you might debug a reinforcement learning algorithm. And these particular set of diagnostics happen to apply only because we’re fortunate enough to be able to an amazingly good human pilot who can fly a helicopter for us, right? If you didn’t have a very good pilot then maybe some of these diagnostics won’t apply and you’ll have to come up with other ones. But want to go through this again as an example of the source of diagnostics you use to debug a reinforcement learning algorithm. And the point of this example isn’t so much that I want you to remember the specifics of the diagnostics, right? The point of this is really for your own problem, be it supervised learning, unsupervised learning, reinforcement learning, whatever. You very often have to come up with your own diagnostics, your own debugging tools, to figure out why an algorithm is working and why an algorithm isn’t working. And this is one example of what we actually do on the helicopter. Okay? Questions about this? Yeah, Justin?

Student: I’m just curious how do you collect, like, training data? Like in our homework the pendulum fell over lots of times, but how do you work that with an expensive helicopter?

Instructor (Andrew Ng) :Yeah. So, I see, right. So on the helicopter the way we collect data to learn [inaudible] and improbabilities is we usually – not – done lots of things, but to first approximation, the most basic merger is if you ask a human pilot to just fly the helicopter around for you and he’s not gonna crash it. So you can collect lots of data as is being controlled by a human pilot. And, as I say, it turns out in data collection the, sort of, a few standard ways to collect data are to let you – so a helicopter can do lots of things and you don’t want to collect data representing only one small part of the flight regime. So concretely what we often do is ask the pilot to carry out frequency sweeps and what that means, very informally, is you imagine them holding a control stick, right? In their hand. Frequency sweeps are a process where you start off making very slow oscillations and then you start taking your control stick and moving it back and forth faster and faster, so you sort of sweep out the range of frequencies ranging from very slow slightly [inaudible]by oscillations until you go faster and faster and you’re sort of directing the control seat back and forth. So this is – oh, good. Thank you. So that’s, sort of, one standard way that we use to collect data on various robotics. It may or may not apply to different robots or to different systems you work on. And, as I say, in reality we do a lot of things. Sometimes we have a controllers collect data autonomously too, but that’s other more complex algorithms. Anything else?

Student: In the first point the goal communicates that the other is perhaps not mapping the actions similar to the simulator, so in [inaudible] this could be very common that you could have [inaudible]? Just any specific matter that pilots use to take care of that variable, so all [inaudible]?

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Machine learning' conversation and receive update notifications?

Ask