Tag N’ Train: a technique to train improved classifiers on unlabeled data

Abstract: There has been substantial progress in applying machine learning techniques to classification problems in collider and jet physics. But as these techniques grow in sophistication, they are becoming more sensitive to subtle features of jets that may not be well modeled in simulation. Therefore, relying on simulations for training will lead to sub-optimal performance in data, but the lack of true class labels makes it difficult to train on real data. To address this challenge we introduce a new approach, called Tag N’ Train (TNT), that can be applied to unlabeled data that has two distinct sub-objects. The technique uses a weak classifier for one of the objects to tag signal-rich and background-rich samples. These samples are then used to train a stronger classifier for the other object. We demonstrate the power of this method by applying it to a dijet resonance search. By starting with autoencoders trained directly on data as the weak classifiers, we use TNT to train substantially improved classifiers. We show that Tag N’ Train can be a powerful tool in model-agnostic searches and discuss other potential applications.

Bibliographic Details
Main Authors: Oz Amram, Cristina Mantilla Suarez
Affiliation (both authors): Department of Physics and Astronomy, The Johns Hopkins University
Format: Article
Language: English
Published: SpringerOpen 2021-01-01
Series: Journal of High Energy Physics
ISSN: 1029-8479
Subjects: Jet substructure; Beyond Standard Model; Exotics; Hadron-Hadron scattering (experiments); Jets
Online Access: https://doi.org/10.1007/JHEP01(2021)153
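The Tag N’ Train procedure summarized in the abstract can be illustrated with a small toy. The following is a minimal NumPy sketch under loudly stated assumptions, not the authors' implementation. The events are hypothetical two-jet Gaussian toys. The autoencoder weak classifier is replaced by a simple anomaly score (squared distance from the sample mean). The stronger classifier is a hand-written logistic regression. The quantile cuts that define the signal-rich and background-rich samples (`sig_q`, `bkg_q`) are arbitrary. All names (`make_events`, `weak_score`, `tag_n_train`, and so on) are hypothetical. Truth labels are used only to evaluate the result on a fresh sample.

```python
import numpy as np

# Toy settings: all values are hypothetical, chosen only for illustration.
N_EVENTS = 20_000
SIG_FRAC = 0.10
N_FEAT = 4
SIG_SHIFT = np.array([1.5, 1.5, 0.0, 0.0])


def make_events(n, rng):
    """Toy 'dijet' events: two jets per event, N_FEAT features per jet.

    Background jets ~ N(0, I), signal jets ~ N(SIG_SHIFT, I). Within each
    class the two jets are drawn independently of each other.
    """
    is_sig = rng.random(n) < SIG_FRAC
    shift = is_sig[:, None] * SIG_SHIFT
    j1 = rng.normal(size=(n, N_FEAT)) + shift
    j2 = rng.normal(size=(n, N_FEAT)) + shift
    return j1, j2, is_sig


def weak_score(jets):
    """Stand-in for an autoencoder's reconstruction error: squared distance
    from the mean of the (mostly background) sample. Higher = more anomalous."""
    return np.sum((jets - jets.mean(axis=0)) ** 2, axis=1)


def with_bias(x):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([x, np.ones((len(x), 1))])


def train_logistic(x, y, lr=0.1, n_iter=500):
    """Full-batch gradient descent on the mean logistic loss."""
    xb = with_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def tag_n_train(tag_jets, train_jets, sig_q=0.8, bkg_q=0.5):
    """One TNT step: the weak score on `tag_jets` splits events into a
    signal-rich sample (score above the sig_q quantile) and a background-rich
    sample (score below the bkg_q quantile); a classifier is then trained on
    `train_jets` to tell those two samples apart. No true labels are used."""
    s = weak_score(tag_jets)
    hi, lo = np.quantile(s, [sig_q, bkg_q])
    sig_rich, bkg_rich = s > hi, s < lo
    x = np.vstack([train_jets[sig_rich], train_jets[bkg_rich]])
    y = np.concatenate([np.ones(sig_rich.sum()), np.zeros(bkg_rich.sum())])
    return train_logistic(x, y)


def auc(scores, labels):
    """ROC AUC from the Mann-Whitney rank statistic (assumes no ties)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = np.random.default_rng(0)
j1, j2, _ = make_events(N_EVENTS, rng)   # "data": truth labels discarded
w2 = tag_n_train(j1, j2)                 # jet 1 tags, jet-2 classifier is trained
w1 = tag_n_train(j2, j1)                 # same step with the roles swapped

# Truth labels are used only here, on a fresh toy sample, to score the result.
t1, t2, t_sig = make_events(N_EVENTS, rng)
auc_weak = auc(weak_score(t2), t_sig)
auc_tnt = auc(with_bias(t2) @ w2, t_sig)
auc_tnt_j1 = auc(with_bias(t1) @ w1, t_sig)
print(f"jet 2: weak AUC {auc_weak:.3f}, TNT AUC {auc_tnt:.3f}")
print(f"jet 1: TNT AUC {auc_tnt_j1:.3f}")
```

Because the two jets in each toy event are independent within each class, cutting on one jet's weak score enriches signal in the other jet's sample without ever labeling it, so the trained jet-2 classifier should separate signal from background better than the weak score applied to jet 2. Exact AUC values depend on the toy settings and the random seed.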