The Prediction of Batting Averages in Major League Baseball

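The prediction of yearly batting averages in Major League Baseball is a notoriously difficult problem where standard errors using the well-known PECOTA (Player Empirical Comparison and Optimization Test Algorithm) system are roughly 20 points. This paper considers the use of ball-by-ball data provided by the Statcast system in an attempt to predict batting averages. The publicly available Statcast data and resultant predictions supplement proprietary PECOTA forecasts. With detailed Statcast data, we attempt to account for a luck component involving batting averages. It is anticipated that the luck component will not be repeated in future seasons. The two predictions (Statcast and PECOTA) are combined via simple linear regression to provide improved forecasts of batting average.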

Bibliographic Details
Main Authors: Sarah R. Bailey, Jason Loeppky, Tim B. Swartz
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Stats
Subjects: big data; forecasting; logistic regression; PECOTA; Statcast
Online Access: https://www.mdpi.com/2571-905X/3/2/8
id doaj-f938ab5d88784239879e06723b821820
record_format Article
spelling doaj-f938ab5d88784239879e06723b821820; 2020-11-25T02:21:57Z; eng; MDPI AG; Stats; ISSN 2571-905X; 2020-04-01; vol. 3, no. 2, pp. 84-93; DOI 10.3390/stats3020008
The Prediction of Batting Averages in Major League Baseball
Sarah R. Bailey (0); Jason Loeppky (1); Tim B. Swartz (2)
(0) Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
(1) Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia Okanagan, 3187 University Way, Kelowna, BC V1V 1V7, Canada
(2) Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
collection DOAJ
language English
format Article
sources DOAJ
author Sarah R. Bailey
Jason Loeppky
Tim B. Swartz
title The Prediction of Batting Averages in Major League Baseball
publisher MDPI AG
series Stats
issn 2571-905X
publishDate 2020-04-01
description The prediction of yearly batting averages in Major League Baseball is a notoriously difficult problem where standard errors using the well-known PECOTA (Player Empirical Comparison and Optimization Test Algorithm) system are roughly 20 points. This paper considers the use of ball-by-ball data provided by the Statcast system in an attempt to predict batting averages. The publicly available Statcast data and resultant predictions supplement proprietary PECOTA forecasts. With detailed Statcast data, we attempt to account for a luck component involving batting averages. It is anticipated that the luck component will not be repeated in future seasons. The two predictions (Statcast and PECOTA) are combined via simple linear regression to provide improved forecasts of batting average.
topic big data
forecasting
logistic regression
PECOTA
Statcast
url https://www.mdpi.com/2571-905X/3/2/8