An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity
Traffic flow is one of the fundamental parameters for traffic analysis and planning. With the rapid development of intelligent transportation systems, a large number of detectors of various types have been deployed on urban roads and, consequently, a huge amount of traffic flow data has been accumul...
Main Authors: | Yang Wang, Yu Xiao, Jianhui Lai, Yanyan Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Faculty of Transport, Warsaw University of Technology, 2020-06-01 |
Series: | Archives of Transport |
Subjects: | missing traffic data, similarity metrics, k nearest neighbour method, stochastic characteristics |
Online Access: | http://aot.publisherspanel.com/gicid/01.3001.0014.2968 |
id |
doaj-b1867ff55ff2479d8df01036f80d5bcd |
---|---|
record_format |
Article |
spelling |
doaj-b1867ff55ff2479d8df01036f80d5bcd; 2020-12-29T13:09:32Z; eng; Faculty of Transport, Warsaw University of Technology; Archives of Transport; 0866-9546; 2300-8830; 2020-06-01; 54; 2; 59; 73; 10.5604/01.3001.0014.2968; 01.3001.0014.2968; An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity; Yang Wang (0); Yu Xiao (1); Jianhui Lai (2); Yanyan Chen (3); Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing, China; Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing, China; Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing, China; Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China; Traffic flow is one of the fundamental parameters for traffic analysis and planning. With the rapid development of intelligent transportation systems, a large number of detectors of various types have been deployed on urban roads and, consequently, a huge amount of traffic flow data has been accumulated and is now available. However, the traffic flow data collected by these detectors are often degraded by missing values, which can lead to erroneous analysis and decisions if no appropriate processing is carried out. To remedy this issue, considerable research effort has been made and various imputation techniques have been proposed in recent years, among which the k nearest neighbour algorithm (kNN) has gained great popularity because it is easy to implement and imputes missing data effectively. In the work presented in this paper, we first analyse the stochastic effect of traffic flow, to which the degraded performance of the kNN algorithm can be attributed. This motivates us to improve the algorithm while eliminating the requirement to predefine parameters. Such a parameter-free algorithm is realized by introducing a new similarity metric, which is combined with the conventional metric so as to avoid parameter setting, which usually requires adequate domain knowledge. Unlike the conventional version of the kNN algorithm, the proposed algorithm employs a multivariate linear regression model to estimate the weights for the final output, based on a set of data smoothed by a wavelet technique. A series of experiments has been performed, based on a set of traffic flow data reported from several different countries, to examine the adaptive determination of parameters and the smoothing effect. Additional experiments have been conducted to evaluate the performance of the proposed algorithm by comparing it with a number of widely used imputation algorithms.; http://aot.publisherspanel.com/gicid/01.3001.0014.2968; missing traffic data; similarity metrics; k nearest neighbour method; stochastic characteristics |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yang Wang Yu Xiao Jianhui Lai Yanyan Chen |
author_sort |
Yang Wang |
title |
An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity |
publisher |
Faculty of Transport, Warsaw University of Technology |
series |
Archives of Transport |
issn |
0866-9546 2300-8830 |
publishDate |
2020-06-01 |
description |
Traffic flow is one of the fundamental parameters for traffic analysis and planning. With the rapid development of intelligent transportation systems, a large number of detectors of various types have been deployed on urban roads and, consequently, a huge amount of traffic flow data has been accumulated and is now available. However, the traffic flow data collected by these detectors are often degraded by missing values, which can lead to erroneous analysis and decisions if no appropriate processing is carried out. To remedy this issue, considerable research effort has been made and various imputation techniques have been proposed in recent years, among which the k nearest neighbour algorithm (kNN) has gained great popularity because it is easy to implement and imputes missing data effectively. In the work presented in this paper, we first analyse the stochastic effect of traffic flow, to which the degraded performance of the kNN algorithm can be attributed. This motivates us to improve the algorithm while eliminating the requirement to predefine parameters. Such a parameter-free algorithm is realized by introducing a new similarity metric, which is combined with the conventional metric so as to avoid parameter setting, which usually requires adequate domain knowledge. Unlike the conventional version of the kNN algorithm, the proposed algorithm employs a multivariate linear regression model to estimate the weights for the final output, based on a set of data smoothed by a wavelet technique. A series of experiments has been performed, based on a set of traffic flow data reported from several different countries, to examine the adaptive determination of parameters and the smoothing effect. Additional experiments have been conducted to evaluate the performance of the proposed algorithm by comparing it with a number of widely used imputation algorithms.
|
topic |
missing traffic data similarity metrics k nearest neighbour method stochastic characteristics |
url |
http://aot.publisherspanel.com/gicid/01.3001.0014.2968 |
_version_ |
1724367628493914112 |
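
The description above takes the conventional kNN algorithm as its starting point. As a rough illustration only, the sketch below shows what such a baseline imputation might look like. It is not the authors' adaptive method: the fixed `k`, the Euclidean distance, the inverse-distance weighting, and the names `knn_impute`, `history`, and `target` are all assumptions made for this example, whereas the paper replaces the fixed parameters with a second similarity metric, regression-estimated weights, and wavelet smoothing.

```python
import numpy as np


def knn_impute(history, target, k=3):
    """Fill NaN entries of `target` from the k most similar rows of `history`.

    history: (n_profiles, n_intervals) array of complete flow profiles.
    target:  (n_intervals,) flow profile with NaN marking missing intervals.
    Hypothetical baseline: fixed k, Euclidean distance, inverse-distance weights.
    """
    history = np.asarray(history, dtype=float)
    target = np.asarray(target, dtype=float)

    missing = np.isnan(target)
    if not missing.any():
        return target.copy()
    observed = ~missing
    if not observed.any():
        raise ValueError("target has no observed values to match on")

    # Similarity is measured only on the intervals the target actually has.
    diffs = history[:, observed] - target[observed]
    dist = np.sqrt((diffs ** 2).sum(axis=1))

    # Pick the k closest historical profiles (or all of them if fewer exist).
    k = min(k, len(history))
    nearest = np.argsort(dist)[:k]

    # Closer neighbours get larger weights; the small constant avoids 1/0.
    weights = 1.0 / (dist[nearest] + 1e-9)
    weights /= weights.sum()

    # Each missing interval becomes a weighted mean of the neighbours' values.
    filled = target.copy()
    filled[missing] = weights @ history[nearest][:, missing]
    return filled
```

A call such as `knn_impute(history, target, k=2)` returns a copy of `target` with the gaps filled and the observed values left unchanged. Having to choose `k` and the weighting scheme by hand is the kind of parameter setting the description says the proposed algorithm is designed to avoid.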