Student: [Inaudible]. [Inaudible] clusters [inaudible] far away from the data point [inaudible] points and the same cluster?
Instructor (Andrew Ng): I see. Right. So yes, k-means is susceptible to local optima. This function, J of c, mu, is not a convex function, and so k-means, which is in a sense coordinate descent on a non-convex function, is not guaranteed to converge to the global optimum. So k-means is susceptible to local optima [inaudible]. One thing you can do is try multiple random initializations, run clustering a bunch of times, and pick the solution that ended up with the lowest value for the distortion function. Yeah.
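The restart strategy described in this answer can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the function names (`kmeans_once`, `kmeans_restarts`), the `restarts` and `seed` parameters, and the choice to initialize centroids at randomly sampled data points are all assumptions made for the example. The distortion is taken to be the sum of squared distances from each point to its assigned centroid.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean (two-norm) distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_once(points, k, rng, iters=100):
    """One k-means run from a random start: (distortion, centroids, assignments)."""
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so this run has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # an empty cluster keeps its old centroid in this sketch
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    distortion = sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignments)
    )
    return distortion, centroids, assignments


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest distortion."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])
```

Each run can still stop at a local optimum; keeping the lowest-distortion run only makes a poor final answer less likely, it does not guarantee the global optimum.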
Student: [Inaudible]
Instructor (Andrew Ng): Yeah, let’s see. Right. So what if one cluster centroid has no points assigned to it? Again, one thing you could do is just eliminate it and continue exactly the same. Another thing you can do is just reinitialize it randomly if you really [inaudible]. More questions? Yeah.
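The two options mentioned for a centroid with no assigned points can be sketched as a variant of the centroid update step. The function name `update_centroids`, the `on_empty` flag, and the choice to reinitialize at a randomly chosen data point are assumptions made for this illustration; the lecture only says to eliminate the centroid or reinitialize it randomly.

```python
import random


def update_centroids(points, assignments, centroids, rng, on_empty="reinit"):
    """Recompute each centroid as the mean of the points assigned to it.

    on_empty="drop":   remove centroids that received no points (k shrinks).
    on_empty="reinit": move such a centroid to a randomly chosen data point.
    """
    if on_empty not in ("drop", "reinit"):
        raise ValueError("on_empty must be 'drop' or 'reinit'")
    updated = []
    for j in range(len(centroids)):
        members = [p for p, c in zip(points, assignments) if c == j]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        elif on_empty == "reinit":
            updated.append(list(rng.choice(points)))
        # on_empty == "drop": append nothing, so the empty cluster disappears
    return updated
```

After a drop, the centroid indices shift, so the assignments have to be recomputed against the returned list before the next update; in the usual k-means loop that happens anyway in the following assignment step.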
Student: [Inaudible] as a norm or can you [inaudible] or infinity norm or –
Instructor (Andrew Ng): I see. Right. Is it usually the two-norm? Let’s see. For the vast majority of applications I’ve seen for k-means, you do take the two-norm when you have data [inaudible]. I’m sure there are others who have taken the infinity norm and the one-norm as well. I personally haven’t seen that very often, but there are other variations on this algorithm that use different norms; the one I described is probably the most commonly used. Okay. So that was k-means clustering.
What I want to do next, and this will take longer to describe, is talk about a closely related problem. In particular, what I want to do is talk about density estimation. As an example, this is a problem that I know some guys worked on. Let’s say you work for an aircraft company, you’re building aircraft engines, and as the engines roll off the assembly line, you test them and measure various properties of them, and to use [inaudible] example I’m gonna write these properties as heat and vibrations. Right.
In reality, you’d measure different vibrations, different frequencies and so on. We’ll just write the amount of heat produced and the vibrations produced. Let’s say that maybe it looks like that, and what you would like to do is estimate the density of these [inaudible] of the joint distribution of the amount of heat produced and the amount of vibrations, because you would like to detect [inaudible], so that as a new aircraft engine rolls off the assembly line, you can then measure the same heat and vibration properties. If you get a point there, you can then ask, “How likely is it that there was an undetected flaw in this aircraft engine, so that it needs to undergo further inspection?” And so if we look at the typical distribution of features we get, and we build a model for P of X, then if P of X is very small for some new aircraft engine, that would raise a red flag and we’ll say there’s an anomalous aircraft engine and we should subject it to further inspections before we let someone fly with the engine.
So this problem I just described is an instance of what is called anomaly detection. A common way of doing anomaly detection is to take your training set and, from this data set, build a model, P of X, of the density of the typical data you’re seeing, and if you ever then see an example with very low probability under P of X, then you may flag that as an anomalous example.
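As a rough illustration of this recipe, the sketch below fits a density model to (heat, vibration) measurements and flags a new engine when P of X falls below a threshold. The lecture has not committed to a particular model at this point, so the single two-dimensional Gaussian used here is an assumption, as are the function names and the threshold parameter `epsilon`. Any measurements passed in would be made-up illustrative values, not real engine data.

```python
import math


def fit_gaussian_2d(data):
    """Maximum-likelihood mean and covariance of 2-D points (heat, vibration)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    return (mx, my), (sxx, sxy, syy)


def density(point, mean, cov):
    """Value of the fitted 2-D Gaussian density P(x) at the given point."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def is_anomaly(point, mean, cov, epsilon):
    """Flag a point whose density under the model is below epsilon."""
    return density(point, mean, cov) < epsilon
```

The threshold `epsilon` is a design choice: a larger value sends more engines to further inspection, a smaller one lets more through. The sketch also assumes the fitted covariance is non-singular, which fails if all training points lie on a line.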