Events that contain the desired t-tbar pairs are inevitably accompanied by a much larger number of undesired background events. The vast majority of these background events involve the creation of much lighter quarks; they are called “QCD” events, after quantum chromodynamics, the theory that describes the behavior of these particles. QCD events are several million times more common than t-tbar events at the energies presently used at the LHC. A minority of background events involve the creation of W bosons. Although these W events are rarer, occurring at a rate of “only” a few hundred per t-tbar event, their detectable features are very similar to those of t-tbar events, so they are harder to distinguish from the desired t-tbar signal.
The author began investigating this classification problem as part of an independent study course taken with Dr. Paul Padley. Dr. Padley works at Bonner Lab at Rice University and is manager of the Endcap Muon Subdetector, a large component of the Compact Muon Solenoid (CMS) experiment at the LHC. The course covered the particle physics background needed to understand the classification problem.
The strategy over the summer was to figure out how to use a popular event-generating program called Pythia in conjunction with a popular machine learning toolkit called WEKA. The author wrote a small program in C that interfaces with Pythia to generate a large number of t-tbar, W, and QCD events. The C program ran the events through a simple filter, a “trigger” in physics parlance, to quickly eliminate a large number of events that had a relatively low probability of being t-tbar events. (The trigger rejects 95% of t-tbar events, but it rejects 99.9% of the QCD background events.) Enough data were collected to form a training data set and a test data set; each set had 10,000 events of each type.
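The effect of the trigger on the mix of events can be worked out from the two rejection fractions quoted above. The following is a minimal sketch in Python, not the author's C program; the function name is hypothetical, and the starting ratio of 3 million QCD events per t-tbar event is only a placeholder for the “several million” mentioned earlier.

```python
def background_per_signal_after_filter(ratio_before, signal_rejected, background_rejected):
    """Background events per signal event after a filter has been applied.

    ratio_before        -- background events per signal event before filtering
    signal_rejected     -- fraction of signal events the filter throws away
    background_rejected -- fraction of background events the filter throws away
    """
    signal_kept = 1.0 - signal_rejected
    background_kept = 1.0 - background_rejected
    if signal_kept <= 0.0:
        raise ValueError("the filter rejects every signal event")
    return ratio_before * background_kept / signal_kept


# Placeholder starting ratio (assumption): 3 million QCD events per t-tbar event.
before = 3.0e6
after = background_per_signal_after_filter(before, 0.95, 0.999)
improvement = before / after  # how much the QCD-to-signal ratio shrinks
```

With the quoted fractions the trigger keeps 5% of the t-tbar events and 0.1% of the QCD events, so the QCD-to-signal ratio shrinks by a factor of about 50. The signal is still heavily outnumbered, which is why a classifier is needed after the trigger.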
When Pythia generates an event, it makes available a wide array of information useful for constructing features. The features chosen for this project included how many of each type of lepton (electrons, muons, and tau particles) were created in each event, how many “jets” corresponding to quarks were generated, and the minimum angle between any pair of quark jets. In addition, the missing transverse momentum, indicative of an invisible neutrino, was measured. Finally, the total transverse energy (energy perpendicular to the beam axis) of all of the quark jets was measured. A large transverse energy is strongly associated with “head-on” collisions capable of releasing enough energy to make top quarks. Transverse energy turned out to be the most important single feature for identifying top quarks.
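The kinematic features described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the author's C code and not the Pythia interface: each jet or visible particle is assumed to be given as a momentum triple (px, py, pz) with the beam along the z axis, jets are treated as massless so that transverse energy equals transverse momentum, and the angle between jets is taken to be the opening angle between their momentum vectors. All function names are hypothetical.

```python
import math
from itertools import combinations


def transverse_momentum(p):
    """Magnitude of the momentum component perpendicular to the beam (z) axis."""
    px, py, _ = p
    return math.hypot(px, py)


def total_transverse_energy(jets):
    """Sum of jet transverse energies, using E_T = p_T (massless-jet assumption)."""
    return sum(transverse_momentum(j) for j in jets)


def min_jet_angle(jets):
    """Smallest opening angle (radians) between any pair of jets, or None if < 2 jets."""
    best = None
    for a, b in combinations(jets, 2):
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        if norm == 0.0:
            continue  # skip jets with zero momentum
        cos_angle = sum(x * y for x, y in zip(a, b)) / norm
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
        angle = math.acos(cos_angle)
        if best is None or angle < best:
            best = angle
    return best


def missing_transverse_momentum(visible):
    """Magnitude of the transverse momentum needed to balance all visible particles."""
    sum_px = sum(p[0] for p in visible)
    sum_py = sum(p[1] for p in visible)
    return math.hypot(sum_px, sum_py)


# Made-up momenta for illustration only (arbitrary units, not generator output).
jets = [(3.0, 4.0, 0.0), (0.0, 5.0, 0.0), (-3.0, 0.0, 4.0)]
features = {
    "n_jets": len(jets),
    "total_et": total_transverse_energy(jets),
    "min_angle": min_jet_angle(jets),
    "missing_pt": missing_transverse_momentum(jets),
}
```

The lepton counts would be added to the same feature record by counting particles of each type in the event; that step depends on how the generator labels particles and is omitted here.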
Because of the author's lack of experience with machine learning, a script was developed that ran each of the classifiers in WEKA against the training data and extracted the relevant statistics. Each classifier was run with its default parameters. For each classifier, the number of t-tbar events that were correctly identified (true positives) was determined, as well as the number of QCD and W events that were misclassified as t-tbar events (false positives). From the ratio of these numbers and the cross-sections of the relevant processes, it was possible to determine the total beam luminosity needed to confirm the existence of top quarks in a set of events using each classifier.
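One way such a luminosity estimate could be made is sketched below. The text does not state the exact criterion used, so this sketch assumes a common rule of thumb: the signal count S must exceed the background fluctuation by a factor Z, that is, S / sqrt(B) >= Z with Z = 5. Since both counts grow in proportion to the integrated luminosity L, solving for L gives L = Z^2 * b / s^2, where s and b are the accepted signal and background cross-sections. The function name and the numbers in the usage lines are hypothetical placeholders, not measured values.

```python
def required_luminosity(signal_xsec, signal_eff, backgrounds, z=5.0):
    """Integrated luminosity at which S / sqrt(B) reaches z (assumed criterion).

    signal_xsec -- signal cross-section (e.g. in pb)
    signal_eff  -- fraction of signal events surviving trigger and classifier
                   (trigger pass fraction times true-positive rate)
    backgrounds -- list of (cross-section, efficiency) pairs, one per background,
                   where efficiency is trigger pass fraction times false-positive rate
    Returns luminosity in inverse cross-section units (e.g. pb^-1).
    """
    s_rate = signal_xsec * signal_eff               # accepted signal cross-section
    b_rate = sum(x * e for x, e in backgrounds)     # accepted background cross-section
    if s_rate <= 0.0:
        raise ValueError("no signal events are accepted")
    # S = L * s_rate and B = L * b_rate, so S / sqrt(B) = sqrt(L) * s_rate / sqrt(b_rate).
    return z * z * b_rate / (s_rate * s_rate)


# Placeholder inputs for illustration only; these are not physical cross-sections.
lumi = required_luminosity(
    signal_xsec=1.0,
    signal_eff=0.5,
    backgrounds=[(100.0, 0.01)],
)
```

Under this criterion a classifier that halves the false-positive rate at the same true-positive rate halves the luminosity needed, while doubling the true-positive rate at the same false-positive rate cuts it by a factor of four. This is why the two kinds of counts were tabulated for every classifier.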