Something we’ll do later today, I don’t want to do this now, but you can actually go to a website like news.google.com, and that’s an example of a website that uses a clustering algorithm every day to group related news articles together to display to you, so that you can see the thousand news articles today on whatever the top story of today is, and the 500 news articles on all the different websites on a different story of the day. And a very solid [inaudible] actually talks about image segmentation, which is the application where you take a picture and group together different subsets of the picture into coherent pieces of pixels to try to understand what’s contained in the picture. So that’s yet another application of clustering. The next idea is: given a data set like this, given a set of points, can you automatically group the data into coherent clusters? Let’s see. I’m still waiting for the laptop to come back so I can show you an example. You know what, why don’t I just start to write out the specific clustering algorithm and then I’ll show you the animation later. So this is called the k-means clustering algorithm, for finding clusterings in a data set. The input to the algorithm will be an unlabeled data set, which I write as X1, X2, [inaudible], and because we’re now talking about unsupervised learning, you see a lot of this as [inaudible] with just the Xs and no class labels Y. So what the k-means algorithm does is the following.
This will all make a bit more sense when I show you the animation on my laptop. First, you initialize a set of points, called the cluster centroids, [inaudible] randomly, and so if your [inaudible] of training data are [inaudible], then your cluster centroids, these mus, will also be vectors and [inaudible], and then you repeat until convergence the following two steps. So the cluster centroids will be your guesses for where the centers of each of the clusters are. In one of those steps you look at each point XI and you look at which cluster centroid J is closest to it, and this step is called assigning your point XI to cluster J. So that’s looking at each point and picking the cluster centroid that’s closest to it. The other step is you update each cluster centroid to be the mean of all the points that have been assigned to it. Okay. Let’s see. Could you please bring down the display for the laptop? Excuse me. Okay. Okay. There we go. Okay. So here’s an example of the k-means algorithm, and I hope the animation will make this clearer. This is an inch chopped off. This is basically an example I got from Michael Jordan at Berkeley. So these points in green are my data points, and I’m going to randomly initialize a pair of cluster centroids. So the [inaudible] blue crosses denote the positions of mu1 and mu2, say, if I’m going to guess that there are two clusters in this data. The steps of the k-means algorithm are as follows. I’m going to repeatedly go through all of the points in my data set and associate each of the green dots with the closer of the two cluster centroids, so visually, I’m gonna denote that by painting each of the dots either blue or red depending on which is the closer cluster centroid. Okay.
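The two steps described above, assigning each point to its closest cluster centroid and then moving each centroid to the mean of its assigned points, can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code behind the animation: the function names (`assign_points`, `update_centroids`, `k_means`) are made up for this example, the centroids are initialized by picking k of the data points at random (one common choice; the lecture only says they are initialized randomly), distances are squared Euclidean, and "convergence" is taken to mean that no point changes cluster.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_points(points, centroids):
    """Step 1: for each point, the index j of the closest cluster centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(x, centroids[j]))
        for x in points
    ]

def update_centroids(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [x for x, c in zip(points, labels) if c == j]
        if not members:
            # Assumption: a centroid with no assigned points stays where it is.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(x[d] for x in members) / len(members) for d in range(len(old)))
        )
    return new_centroids

def k_means(points, k, seed=0, max_iters=100):
    """Alternate the two steps until the assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initialize the centroids as k randomly chosen data points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels

# Two visibly separate groups of 2-D points, guessing k = 2 clusters.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, k=2)
```

In terms of the animation, `labels` plays the role of the blue or red paint on each dot, and `centroids` holds the positions of the two crosses after the final update.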