ThickBrick: optimal event selection and categorization in high energy physics. Part I. Signal discovery

Abstract We provide a prescription called ThickBrick to train optimal machine-learning-based event selectors and categorizers that maximize the statistical significance of a potential signal excess in high energy physics (HEP) experiments, as quantified by any of six different performance measures....


Bibliographic Details
Main Authors: Konstantin T. Matchev, Prasanth Shyamsundar
Format: Article
Language: English
Published: SpringerOpen 2021-03-01
Series: Journal of High Energy Physics
Subjects: Supersymmetry Phenomenology
Online Access: https://doi.org/10.1007/JHEP03(2021)291
id doaj-0f869208a93c4df7ad2659cd8e049cd1
record_format Article
spelling doaj-0f869208a93c4df7ad2659cd8e049cd1; 2021-04-04T11:07:11Z; eng; SpringerOpen; Journal of High Energy Physics; 1029-8479; 2021-03-01; 2021; 3; 1; 53; 10.1007/JHEP03(2021)291; ThickBrick: optimal event selection and categorization in high energy physics. Part I. Signal discovery; Konstantin T. Matchev (Institute for Fundamental Theory, Physics Department, University of Florida); Prasanth Shyamsundar (Institute for Fundamental Theory, Physics Department, University of Florida); https://doi.org/10.1007/JHEP03(2021)291; Supersymmetry Phenomenology
collection DOAJ
language English
format Article
sources DOAJ
author Konstantin T. Matchev
Prasanth Shyamsundar
title ThickBrick: optimal event selection and categorization in high energy physics. Part I. Signal discovery
publisher SpringerOpen
series Journal of High Energy Physics
issn 1029-8479
publishDate 2021-03-01
description Abstract We provide a prescription called ThickBrick to train optimal machine-learning-based event selectors and categorizers that maximize the statistical significance of a potential signal excess in high energy physics (HEP) experiments, as quantified by any of six different performance measures. For analyses where the signal search is performed in the distribution of some event variables, our prescription ensures that only the information complementary to those event variables is used in event selection and categorization. This eliminates a major misalignment with the physics goals of the analysis (maximizing the significance of an excess) that exists in the training of typical ML-based event selectors and categorizers. In addition, this decorrelation of event selectors from the relevant event variables prevents the background distribution from becoming peaked in the signal region as a result of event selection, thereby ameliorating the challenges imposed on signal searches by systematic uncertainties. Our event selectors (categorizers) use the output of machine-learning-based classifiers as input and apply optimal selection cutoffs (categorization thresholds) that are functions of the event variables being analyzed, as opposed to flat cutoffs (thresholds). These optimal cutoffs and thresholds are learned iteratively, using a novel approach with connections to Lloyd’s k-means clustering algorithm. We provide a public, Python implementation of our prescription, also called ThickBrick, along with usage examples.
topic Supersymmetry Phenomenology
url https://doi.org/10.1007/JHEP03(2021)291
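The abstract states that the optimal cutoffs and thresholds are learned iteratively, with connections to Lloyd's k-means clustering algorithm. For readers unfamiliar with that algorithm, the following is a minimal sketch of textbook Lloyd iteration in one dimension. It is not the ThickBrick algorithm and does not use the ThickBrick package API; the function names `assign` and `lloyd_kmeans_1d` are hypothetical and chosen only for illustration. How the paper adapts this assign-then-update pattern to event selection is described in the article itself.

```python
def assign(values, centers):
    """Return, for each value, the index of its nearest center.

    Ties go to the lower index because min() keeps the first minimum.
    """
    return [
        min(range(len(centers)), key=lambda k: abs(v - centers[k]))
        for v in values
    ]


def lloyd_kmeans_1d(values, centers, max_iter=100):
    """Textbook Lloyd iteration in 1D (illustrative only, not ThickBrick).

    Alternates two steps until the centers stop changing:
      1. assignment: attach each value to its nearest center;
      2. update: move each center to the mean of its attached values.
    """
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(values, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == k]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:  # fixed point: nothing moved
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    return centers, assign(values, centers)


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 9.0, 9.5, 10.0]
    final_centers, final_labels = lloyd_kmeans_1d(data, [0.0, 1.0])
    print(final_centers, final_labels)
```

The loop stops as soon as an update leaves every center unchanged, which is the fixed-point condition that defines convergence for this kind of alternating scheme.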