Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases

Premise Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees fo...

Bibliographic Details
Main Authors: Brianna K. Almeida, Manish Garg, Miroslav Kubat, Michelle E. Afkhami
Format: Article
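Language: English
Published: Wiley 2020-07-01
Series: Applications in Plant Sciences
Subjects: decision tree; information gain; machine learning; plant identification; TRY plant trait database
Online Access: https://doi.org/10.1002/aps3.11379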
id doaj-aeaa7c0d982b441990bb7b2eec402dbb
record_format Article
spelling doaj-aeaa7c0d982b441990bb7b2eec402dbb
Last updated: 2020-11-25T03:12:08Z
Language: eng
Publisher: Wiley
Journal: Applications in Plant Sciences
ISSN: 2168-0450
Published: 2020-07-01
Volume 8, Issue 7
Pages: n/a
DOI: 10.1002/aps3.11379
Title: Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases
Authors:
Brianna K. Almeida (0), Department of Biology, University of Miami, 1301 Memorial Drive, Coral Gables, Florida 33143, USA
Manish Garg (1), Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, Florida 33143, USA
Miroslav Kubat (2), Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, Florida 33143, USA
Michelle E. Afkhami (3), Department of Biology, University of Miami, 1301 Memorial Drive, Coral Gables, Florida 33143, USA
Premise: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa.
Methods: Decision trees were induced using 16 vegetative and floral traits (689 species, 20 genera). We assessed how well the algorithm classified species from test data and pinpointed those traits that were important for identification across diverse taxa.
Results: The unpruned tree correctly placed 98% of the species in our data set into genera, indicating its promise for distinguishing among the species used to construct them. Furthermore, in the pruned tree, an average of 89% of the species from the test data sets were properly classified into their genera, demonstrating the flexibility of decision trees to also classify new species into genera within the tree. Closer inspection revealed that seven of the 16 traits were sufficient for the classification, and these traits yielded approximately two times more initial information gain than those not included.
Discussion: Our findings demonstrate the potential for tree‐based machine learning and big data in distinguishing among taxa and determining which traits are important for plant identification.
URL: https://doi.org/10.1002/aps3.11379
Keywords: decision tree; information gain; machine learning; plant identification; TRY plant trait database
collection DOAJ
language English
format Article
sources DOAJ
author Brianna K. Almeida
Manish Garg
Miroslav Kubat
Michelle E. Afkhami
title Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases
publisher Wiley
series Applications in Plant Sciences
issn 2168-0450
publishDate 2020-07-01
description Premise: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa.
Methods: Decision trees were induced using 16 vegetative and floral traits (689 species, 20 genera). We assessed how well the algorithm classified species from test data and pinpointed those traits that were important for identification across diverse taxa.
Results: The unpruned tree correctly placed 98% of the species in our data set into genera, indicating its promise for distinguishing among the species used to construct them. Furthermore, in the pruned tree, an average of 89% of the species from the test data sets were properly classified into their genera, demonstrating the flexibility of decision trees to also classify new species into genera within the tree. Closer inspection revealed that seven of the 16 traits were sufficient for the classification, and these traits yielded approximately two times more initial information gain than those not included.
Discussion: Our findings demonstrate the potential for tree‐based machine learning and big data in distinguishing among taxa and determining which traits are important for plant identification.
topic decision tree
information gain
machine learning
plant identification
TRY plant trait database
url https://doi.org/10.1002/aps3.11379
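
The abstract ranks traits by information gain, the criterion commonly used to choose splits during decision-tree induction. The sketch below shows the standard entropy-based definition for a categorical trait. It is an illustration only, not the authors' implementation: the function names and the four-species toy data are hypothetical and do not come from the TRY database.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(trait_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical trait."""
    total = len(labels)
    groups = {}
    for value, label in zip(trait_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (assumption): genus labels and two categorical traits.
genera = ["Quercus", "Quercus", "Acer", "Acer"]
leaf_arrangement = ["alternate", "alternate", "opposite", "opposite"]
flower_color = ["green", "yellow", "green", "yellow"]

gain_leaf = information_gain(leaf_arrangement, genera)
gain_color = information_gain(flower_color, genera)
print(f"leaf arrangement: {gain_leaf:.2f} bits")
print(f"flower color: {gain_color:.2f} bits")
```

In this toy set, leaf arrangement separates the two genera perfectly while flower color carries no information about genus, so a tree-induction algorithm using this criterion would split on leaf arrangement first.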