Entity Extraction of Key Elements in 110 Police Reports Based on Large Language Models

With the rapid advancement of Internet technology and the increasing volume of police reports, relying solely on extensive human labor and traditional natural language processing methods for key element extraction has become impractical. Applying advanced technologies such as large language models t...


Bibliographic Details
Published in:	Applied Sciences
Main Authors:	Xintao Xing, Peng Chen
Format:	Article
Language:	English
Published:	MDPI AG 2024-09-01
Subjects:	large language models; police reports; information extraction; data enhancement; fine-tuning
Online Access:	https://www.mdpi.com/2076-3417/14/17/7819

_version_	1850042820249780224
author	Xintao Xing; Peng Chen
collection	DOAJ
container_title	Applied Sciences
description	With the rapid advancement of Internet technology and the increasing volume of police reports, relying solely on extensive human labor and traditional natural language processing methods for key element extraction has become impractical. Applying advanced technologies such as large language models to improve the effectiveness of police report extraction has become an inevitable trend in the field of police data analysis. This study addresses the characteristics of Chinese police reports and the need to extract key elements by employing large language models specific to the public security domain for entity extraction. Several lightweight (6B/7B-parameter) open-source large language models were tested as base models. To enhance model performance, LoRA fine-tuning was employed, combined with data engineering approaches. A zero-shot data augmentation method based on ChatGPT and prompt engineering techniques tailored for police reports were proposed to further improve model performance. The key police report data from a certain city in 2019 were used as a sample for testing. Compared to the base models, prompt engineering improved the F1 score by approximately 3%, while fine-tuning led to an increase of 10–50% in the F1 score. After fine-tuning and comparing different base models, the Baichuan model demonstrated the best overall performance in extracting key elements from police reports. Using the data augmentation method to double the data size resulted in an additional 4% increase in the F1 score, achieving optimal model performance. Compared to the fine-tuned universal information extraction (UIE) large language model, the police report entity extraction model constructed in this study improved the F1 score for each element by approximately 5%, with a 42% improvement in the F1 score for the “organization” element. Finally, ChatGPT was employed to align the extracted entities, resulting in a high-quality entity extraction outcome.
format	Article
id	doaj-art-65b079facbf64fca9df8f2103892761a
institution	Directory of Open Access Journals
issn	2076-3417
language	English
publishDate	2024-09-01
publisher	MDPI AG
record_format	Article
spelling	doaj-art-65b079facbf64fca9df8f2103892761a; 2025-08-20T00:30:56Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-09-01; 14; 17; 7819; 10.3390/app14177819; Entity Extraction of Key Elements in 110 Police Reports Based on Large Language Models; Xintao Xing; Peng Chen; School for Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China; https://www.mdpi.com/2076-3417/14/17/7819; large language models; police reports; information extraction; data enhancement; fine-tuning
title	Entity Extraction of Key Elements in 110 Police Reports Based on Large Language Models
topic	large language models; police reports; information extraction; data enhancement; fine-tuning
url	https://www.mdpi.com/2076-3417/14/17/7819
work_keys_str_mv	AT xintaoxing entityextractionofkeyelementsin110policereportsbasedonlargelanguagemodels AT pengchen entityextractionofkeyelementsin110policereportsbasedonlargelanguagemodels
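
The description reports results as F1 scores for extracted elements, but this record does not say how the score was computed. As a rough illustration only, the sketch below assumes exact matching on (element type, text) pairs with micro-averaging; the function name `entity_f1` and the sample entities are hypothetical and are not taken from the article.

```python
from collections import Counter


def entity_f1(gold, pred):
    """Micro-averaged exact-match F1 over (element_type, text) pairs.

    Assumption: an extracted entity counts as correct only if both its
    type and its text match a gold entity exactly.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Multiset intersection gives the number of correctly extracted entities.
    true_pos = sum((gold_counts & pred_counts).values())
    precision = true_pos / sum(pred_counts.values()) if pred_counts else 0.0
    recall = true_pos / sum(gold_counts.values()) if gold_counts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented sample data: the "organization" span is only partially extracted,
# so it counts as both a false positive and a false negative.
gold = [("person", "Zhang San"), ("organization", "City Bank"), ("location", "Main Street")]
pred = [("person", "Zhang San"), ("organization", "Bank"), ("location", "Main Street")]
score = entity_f1(gold, pred)
```

Under this exact-match assumption a partially correct span earns no credit, which is one possible reason a single element type such as "organization" can lag the others.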


Similar Items