<< Chapter < Page | Chapter >> Page > |
Just for example, if you say that if you want to estimate your next state ST+1 as a linear function of your previous state in your action and so A here will be a 4 by 4 matrix, and B would be a 4-dimensional vector, then you would choose the values of A and B that minimize this. So if you want your model to be that ST+1 is some linear function of the previous stated action, then you might pose this optimization objective and choose A and B to minimize the sum of squares error in your predictive value for ST+1 as the linear function of ST and AT. I should say that this is one specific example where you’re using a linear function of the previous stated action to predict the next state.
Of course, you can also use other algorithms like low [inaudible] weight to linear regression or linear regression with nonlinear features or kernel linear regression or whatever to predict the next state as a nonlinear function of the current state as well, so this is just [inaudible]linear problems. And it turns out that low [inaudible] weight to linear regression is for many robots turns out to be an effective method for this learning problem as well.
And so having learned to model, having learned the parameters A and B, you then have a model where you say that ST+1 is AST plus BAT, and so that would be an example of a deterministic model or having learned the parameters A and B, you might say that ST+1 is equal to AST + BAT + ?T. And so these would be very reasonable ways to come up with either a deterministic or a stochastic model for your inverted pendulum MDP. And so just to summarize, what we have now is a model, meaning a piece of code, where you can input a state and an action and get an ST+1. And so if you have a stochastic model, then to influence this model, you would actually sample ?T from this [inaudible] distribution in order to generate ST+1.
So it actually turns out that in a preview, I guess, in the next lecture it actually turns out that in the specific case of linear dynamical systems, in the specific case where the next state is a linear function of the current stated action, it actually turns out that there’re very powerful algorithms you can use. So I’m actually not gonna talk about that today. I’ll talk about that in the next lecture rather than right now but turns out that for many problems of inverted pendulum go if you use low [inaudible] weights and linear regressionals and long linear algorithm ‘cause many systems aren’t really linear. You can build a nonlinear model.
So what I wanna do now is talk about given a model, given a simulator for your MDP, how to come up with an algorithm to approximate the alpha value function piece. Before I move on, let me check if there’re questions on this. Okay, cool. So here’s the idea. Back when we talked about linear regression, we said that given some inputs X in supervised learning, given the input feature is X, we may choose some features of X and then approximate the type of variable as a linear function of various features of X, and so just do exactly the same thing to approximate the optimal value function, and in particular, we’ll choose some features 5-S of a state S.
Notification Switch
Would you like to follow the 'Machine learning' conversation and receive update notifications?