How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format

Objective: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format. Methods: We constructed...


Bibliographic Details
Main Author:	Sebo, P. (Author)
Format:	Article
Language:	English
Published:	NLM (Medline) 2022
Subjects:	accuracy China Chinese female Female gender detection human Humans male Male misclassification name Names name-to-gender nomenclature performance
Online Access:	View Fulltext in Publisher


LEADER	02327nam a2200325Ia 4500
001	10.5195-jmla.2022.1289
008	220510s2022 CNT 000 0 und d
020			|a 15589439 (ISSN)
245	1	0	|a How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
260		0	|b NLM (Medline) |c 2022
856			|z View Fulltext in Publisher |u https://doi.org/10.5195/jmla.2022.1289
520	3		|a Objective: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format. Methods: We constructed two datasets for the purpose of the study. File #1 was created by randomly drawing 20,000 names from a gender-labeled database of 52,414 Chinese given names in Pinyin format. File #2, which contained 9,077 names, was created by removing from File #1 all unisex names that we were able to identify (i.e., those that were listed in the database as both male and female names). We recorded for both files the number of correct classifications (correct gender assigned to a name), misclassifications (wrong gender assigned to a name), and nonclassifications (no gender assigned). We then calculated the proportion of misclassifications and nonclassifications (errorCoded). Results: For File #1, errorCoded was 53% for NamSor, 65% for Gender API, and 90% for Wiki-Gendersort. For File #2, errorCoded was 43% for NamSor, 66% for Gender API, and 94% for Wiki-Gendersort. Conclusion: We found that all three gender detection tools inaccurately predicted the gender of individuals with Chinese given names in Pinyin format and therefore should not be used in this population. Copyright © 2022 Paul Sebo.
650	0	4	|a accuracy
650	0	4	|a China
650	0	4	|a Chinese
650	0	4	|a female
650	0	4	|a Female
650	0	4	|a gender detection
650	0	4	|a human
650	0	4	|a Humans
650	0	4	|a male
650	0	4	|a Male
650	0	4	|a misclassification
650	0	4	|a name
650	0	4	|a Names
650	0	4	|a name-to-gender
650	0	4	|a nomenclature
650	0	4	|a performance
700	1		|a Sebo, P. |e author
773			|t Journal of the Medical Library Association : JMLA
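The abstract defines errorCoded as the share of names that were either misclassified or left unclassified, and builds File #2 by dropping names listed in the database under both genders. The following is a minimal Python sketch of both steps under stated assumptions: the database is represented as hypothetical (name, gender) pairs, a tool's non-answer is encoded as None, and the function names are illustrative only, not taken from the study or from the NamSor, Gender API, or Wiki-Gendersort interfaces.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

# Hypothetical record layout (assumption): (given_name_in_pinyin, gender_label)
Record = Tuple[str, str]


def find_unisex_names(database: Iterable[Record]) -> Set[str]:
    """Return names listed in the database as both a male and a female name."""
    labels_by_name: Dict[str, Set[str]] = {}
    for name, gender in database:
        labels_by_name.setdefault(name, set()).add(gender)
    return {name for name, labels in labels_by_name.items()
            if {"male", "female"} <= labels}


def remove_unisex(sample: Sequence[Record], database: Iterable[Record]) -> List[Record]:
    """Build a File #2-style subset: drop every sampled name flagged as unisex."""
    unisex = find_unisex_names(database)
    return [(name, gender) for name, gender in sample if name not in unisex]


def error_coded(true_genders: Sequence[str],
                predicted: Sequence[Optional[str]]) -> float:
    """(misclassifications + nonclassifications) / all names.

    A prediction of None counts as a nonclassification (no gender assigned).
    """
    if len(true_genders) != len(predicted):
        raise ValueError("true and predicted lists must have the same length")
    if not true_genders:
        raise ValueError("cannot compute errorCoded on an empty list")
    correct = misclassified = unclassified = 0
    for truth, guess in zip(true_genders, predicted):
        if guess is None:
            unclassified += 1   # no gender assigned
        elif guess == truth:
            correct += 1        # correct gender assigned
        else:
            misclassified += 1  # wrong gender assigned
    return (misclassified + unclassified) / (correct + misclassified + unclassified)


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not from the study).
    database = [("A", "male"), ("A", "female"), ("B", "female"), ("C", "male")]
    sample = list(database)
    print(remove_unisex(sample, database))  # [('B', 'female'), ('C', 'male')]
    truth = [gender for _, gender in sample]
    guesses = ["male", "male", None, "female"]
    print(error_coded(truth, guesses))      # 0.75
```

Checking unisex status against the full database rather than only the drawn sample is an assumption based on the abstract's wording ("listed in the database as both male and female names"); a name that appears in the sample with only one label would still be removed if the database lists it under both.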
