If you want an example that is kind of fun but unusual, and I’m actually going to use this example a little bit in today’s lecture, it’s the inverted pendulum problem, which is sort of a long-running classic in reinforcement learning. Imagine that you have a little cart that’s on a rail. The rail ends at some point, and imagine that you have a pole attached to the cart by a free hinge, so the pole can rotate freely. Your goal is to control the cart and to move it back and forth on this rail so as to keep the pole balanced. Yeah, there’s no long pole in this class but you know what I mean, so you can imagine. Oh, is there a long pole here?
Student: Back in the corner.
Instructor (Andrew Ng): Oh, thanks. Cool. So I did not practice this, but you can take a long pole and sort of hold it up and balance it, so imagine that you can do it better than I can. Imagine these are [inaudible] just moving back and forth to try to keep the pole balanced, so you can actually use a reinforcement learning algorithm to do that. This is actually one of the longstanding classic problems that people [inaudible] implement and play off using reinforcement learning algorithms, and so for this, the states would be x and θ, so x would be the position of the cart, and θ would be the orientation of the pole, and also the linear velocity and the angular velocity of the pole, so I’ll actually use this example a couple times.
So given a continuous state space, how can you apply an algorithm like value iteration or policy iteration to solve the MDP to control, like, a car or a helicopter or something like the inverted pendulum? So one thing you can do, and this is maybe the most straightforward thing, is this. Say you have a two-dimensional continuous state space, where S-1 and S-2 are my state variables. In all the examples they are, I guess, between 4-dimensional and 12-dimensional; I’ll just draw 2-D here. The most straightforward thing to do would be to take the continuous state space and discretize it into a number of discrete cells.
And I use S-bar to denote the discretized, or discrete, states, and so you can [inaudible] with this continuous state problem with a finite or discrete set of states, and then you can use policy iteration or value iteration to solve for V*(s-bar) and π*(s-bar). And if your robot is then in some state given by that dot, you would then figure out what discretized state it is in. In this case it’s in this discretized grid cell that’s called S-bar, and then you execute. You choose the action given by π* applied to that discrete state. So discretization is maybe the most straightforward way to turn a continuous state problem into a discrete state problem.
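The lookup described here can be sketched in a few lines of Python. This is only an illustration under assumed values: the state bounds, the 4-by-4 grid, the helper name `discretize`, and the all-zeros `policy` table are hypothetical stand-ins, not anything from the lecture. In practice the table would come from running value iteration or policy iteration on the discretized MDP.

```python
import numpy as np

def discretize(state, lows, highs, bins_per_dim):
    """Map a continuous state vector to the id of its grid cell (s-bar)."""
    state = np.asarray(state, dtype=float)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    bins = np.asarray(bins_per_dim)
    # Fraction of the way across each dimension, clipped to the grid bounds.
    frac = np.clip((state - lows) / (highs - lows), 0.0, 1.0)
    # Per-dimension cell index; the upper edge is assigned to the last cell.
    idx = np.minimum((frac * bins).astype(int), bins - 1)
    # Flatten the per-dimension indices into a single discrete state id.
    return int(np.ravel_multi_index(tuple(idx), tuple(bins)))

# Assumed 2-D state space with 4 cells per dimension, so 16 discrete states.
lows, highs, bins = [-1.0, -1.0], [1.0, 1.0], [4, 4]

# Placeholder policy table with one action per discrete state (hypothetical).
policy = np.zeros(16, dtype=int)

s = [0.3, -0.7]                               # the "dot" in the continuous space
s_bar = discretize(s, lows, highs, bins)      # which grid cell it falls in
action = policy[s_bar]                        # action chosen for that cell
```

Every continuous state that lands in the same cell gets the same action, which is exactly the approximation that discretization makes.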
Sometimes you can sort of make this work, but there are a couple of reasons why this does not work very well. One reason is the following, and for this picture, let’s even temporarily put aside reinforcement learning. Let’s just think about doing regression for now, and so suppose you have some input variable X and suppose I have some data, and I want to fit a function, Y as a function of X. So discretization is saying that I’m going to take my horizontal axis and chop it up into a number of intervals. Sometimes I call these intervals buckets as well. We chop my horizontal axis up into a number of buckets and then we approximate this function using something that’s piecewise constant in each of these buckets. And just look at this.
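To make the bucket picture concrete, here is a minimal Python sketch of a piecewise-constant fit. The lecture does not say how the constant in each bucket is chosen; using the mean of the samples in the bucket is an assumption made here, and the function names and the toy data are hypothetical.

```python
import numpy as np

def fit_piecewise_constant(x, y, low, high, n_buckets):
    """Fit one constant per bucket on [low, high] (assumed: the bucket mean)."""
    edges = np.linspace(low, high, n_buckets + 1)
    # Bucket index of each sample; points on the upper edge go in the last bucket.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_buckets - 1)
    values = np.full(n_buckets, np.nan)  # NaN marks buckets with no data
    for b in range(n_buckets):
        mask = idx == b
        if mask.any():
            values[b] = y[mask].mean()
    return edges, values

def predict(x, edges, values):
    """Return the constant stored for the bucket each x falls in."""
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(values) - 1)
    return values[idx]

# Toy data: two buckets on [0, 1], two samples in each.
x = np.array([0.1, 0.2, 0.6, 0.9])
y = np.array([1.0, 3.0, 5.0, 7.0])
edges, values = fit_piecewise_constant(x, y, 0.0, 1.0, 2)
y_hat = predict(np.array([0.3, 0.7]), edges, values)
```

The prediction is the same anywhere inside a bucket, so the fitted curve is a staircase over the horizontal axis.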