Malware Detection Inside App Stores Based on Lifespan Measurements

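Potentially Harmful Apps (PHAs), like any other type of malware, remain a problem. Although Google tries to maintain a clean app ecosystem, the Google Play Store is still one of the main vectors for spreading PHAs. In this paper, we propose a solution based on machine learning algorithms to detect PHAs inside application markets. Because application markets are one of the main entry vectors, a solution capable of detecting PHAs that have been, or are being, submitted to those markets is needed. Our solution detects PHAs inside an application market and can be used as a filter that automatically blocks the publication of novel PHAs. The proposed solution is based on static analysis of applications; although several static analysis solutions have already been developed, the novelty of this system lies in how it is trained and how its dataset was built. We created a new dataset that uses the lifespan of an application inside Google Play as the criterion for labeling it: the shorter the time an application remains active inside an application market, the higher the probability that it is a PHA. This criterion was adopted to avoid relying on antivirus engines to label malware, and therefore to avoid inheriting their bias. By using lifespan as the criterion, we created a new detection method that does not replicate any existing antivirus engine. Experimental results show that this solution obtains 90% accuracy on a dataset of 91,203 applications published on the Google Play Store. Although its accuracy is lower than that of other machine learning models focused on detecting PHAs, it should be taken into account that this is a different, complementary method. Because its training is drastically different, being based on lifespan measurements, the presented work can be combined with other static and dynamic machine learning models.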
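The abstract's central idea is that an app's lifespan on Google Play can replace an antivirus verdict when labeling training data. The sketch below illustrates that idea; it is not the authors' implementation. The abstract does not state a threshold, a data source, or how still-listed apps are handled, so the 30-day cutoff, the `AppRecord` fields, and the `lifespan_label` helper are hypothetical. It also simplifies the stated relationship (shorter lifespan, higher probability of being a PHA) into a hard threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical cutoff: apps removed from the store fewer than this many days
# after publication are labeled as likely PHAs. The paper's actual criterion
# is not given in the abstract.
SHORT_LIFESPAN_DAYS = 30


@dataclass
class AppRecord:
    # Hypothetical fields for an app observed in periodic store snapshots.
    package_name: str
    published: date         # first date the app was seen listed
    removed: Optional[date]  # date it disappeared, or None if still listed


def lifespan_days(app: AppRecord, observed_until: date) -> int:
    """Days the app stayed listed; still-listed apps are measured up to observed_until."""
    end = app.removed if app.removed is not None else observed_until
    return (end - app.published).days


def lifespan_label(app: AppRecord, observed_until: date,
                   threshold_days: int = SHORT_LIFESPAN_DAYS) -> Optional[int]:
    """Return 1 (likely PHA), 0 (likely benign), or None (too recent to label).

    An app removed before threshold_days is treated as a PHA. An app that
    survived at least threshold_days is treated as benign. An app that is
    still listed but younger than threshold_days gets no label yet.
    """
    days = lifespan_days(app, observed_until)
    if app.removed is not None and days < threshold_days:
        return 1
    if days >= threshold_days:
        return 0
    return None


if __name__ == "__main__":
    snapshot = date(2020, 6, 1)
    apps = [
        AppRecord("com.example.flashlight", date(2020, 1, 10), date(2020, 1, 20)),
        AppRecord("com.example.notes", date(2019, 3, 1), None),
        AppRecord("com.example.newgame", date(2020, 5, 25), None),
    ]
    for app in apps:
        print(app.package_name, lifespan_label(app, snapshot))
    # com.example.flashlight 1
    # com.example.notes 0
    # com.example.newgame None
```

Returning `None` for recent, still-listed apps avoids calling them benign before they have had time to be removed. In a pipeline like the one the abstract describes, labels of this kind would then be paired with static-analysis features to train the classifier.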

Bibliographic Details
Main Authors: Carlos Cilleruelo, Enrique-Larriba, Luis De-Marcos, Jose-Javier Martinez-Herraiz
Format: Article
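Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Machine learning; app stores; Google Play malware; Android malware; malware detection; potentially harmful apps
Online Access: https://ieeexplore.ieee.org/document/9522135/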
ISSN: 2169-3536
Volume: 9, pages 119967-119976
DOI: 10.1109/ACCESS.2021.3107903
Affiliation: Computer Science Department, University of Alcalá, Alcalá de Henares, Spain (all authors)
ORCID: Carlos Cilleruelo, https://orcid.org/0000-0001-7107-8655; Luis De-Marcos, https://orcid.org/0000-0003-0718-8774