Developing winner prediction models of professional baseball using data mining techniques


Bibliographic Details
Main Authors: Chi-Wen Chen, 陳麒文
Other Authors: Chin-Hsung Kao
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/62916248454485292037
Description
Summary: Doctoral === National Taiwan Sport University === Graduate Institute of Physical Education === 99 === The purposes of this study were (a) to use data mining techniques to develop winner prediction models for professional baseball, (b) to analyze the home/away betting odds in order to construct odds prediction models, (c) to determine the important variables affecting the winner models, (d) to identify the important variables that affect how bookmakers set the home/away betting odds, (e) to compare the strengths and weaknesses of the winner prediction models and the odds prediction models, and (f) to validate the winner prediction models through simulated betting. The data were the season records of the Yankees and the Red Sox from 2006 to 2010. The winner prediction models and odds prediction models were built using discriminant analysis (DA), logistic regression analysis (LRA), artificial neural networks (ANNs), multivariate adaptive regression splines (MARS), and support vector machines (SVM). The results were as follows:
1. Overall, the Yankees had a higher winning percentage than the Red Sox both at home and away. In addition, most of the Yankees' pitching and batting records were better than those of the Red Sox. Therefore, a “home field advantage” clearly existed for the Yankees.
2. In the home/away betting odds they set, the bookmakers still favored the team with the “home field advantage.” In addition, a cross analysis of “bookmaker-favored winning team” against “actual game result” showed that the bookmakers' prediction accuracy was 56.04%.
3. For both the winner prediction models and the odds prediction models, MARS not only had the highest overall correct classification rate (83.33% and 88.89%, respectively) but could also identify the important variables. MARS is therefore recommended.
4. The winner prediction model built with MARS was used for simulated betting; it achieved an overall correct classification rate of 77.78% across nine “Yankees vs. Red Sox” games.
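
The summary reports model quality as a correct classification rate taken from a cross analysis of predicted and actual winners. A minimal Python sketch of that calculation follows. The function names (`cross_tab`, `correct_classification_rate`), the team codes, and the nine game outcomes are all hypothetical and are not the thesis data; the outcomes were chosen only so that seven of nine predictions match, which is the ratio that yields 77.78%.

```python
from collections import Counter


def cross_tab(predicted, actual):
    """Count (predicted, actual) pairs: a simple cross analysis table."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return Counter(zip(predicted, actual))


def correct_classification_rate(predicted, actual):
    """Share of games whose predicted winner matches the actual winner."""
    table = cross_tab(predicted, actual)
    total = sum(table.values())
    if total == 0:
        raise ValueError("no games to score")
    hits = sum(n for (p, a), n in table.items() if p == a)
    return hits / total


# Hypothetical outcomes for nine games (NOT the thesis data).
actual = ["NYY", "BOS", "NYY", "NYY", "BOS", "NYY", "BOS", "NYY", "BOS"]
predicted = ["NYY", "BOS", "NYY", "BOS", "BOS", "NYY", "BOS", "NYY", "NYY"]

rate = correct_classification_rate(predicted, actual)
print(f"{rate:.2%}")  # 7 of 9 correct -> 77.78%
```

The same function could score any of the five model types, or the favored-team column from the odds data, since it needs only two equal-length lists of labels.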