TY - GEN
T1 - CHAMP
T2 - 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021
AU - Hu, Yangyu
AU - Wang, Haoyu
AU - Ji, Tiantong
AU - Xiao, Xusheng
AU - Luo, Xiapu
AU - Gao, Peng
AU - Guo, Yao
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - Millions of mobile apps have been available through various app markets. Although most app markets have enforced a number of automated or even manual mechanisms to vet each app before it is released to the market, thousands of low-quality apps still exist in different markets, some of which violate the explicitly specified market policies. In order to identify these violations accurately and timely, we resort to user comments, which can form an immediate feedback for app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process, and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that it achieves both high precision and recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can be a whistle blower that assigns policy-violation scores and identifies most informative comments for apps.
AB - Millions of mobile apps have been available through various app markets. Although most app markets have enforced a number of automated or even manual mechanisms to vet each app before it is released to the market, thousands of low-quality apps still exist in different markets, some of which violate the explicitly specified market policies. In order to identify these violations accurately and timely, we resort to user comments, which can form an immediate feedback for app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process, and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that it achieves both high precision and recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can be a whistle blower that assigns policy-violation scores and identifies most informative comments for apps.
KW - App market
KW - Undesired behavior
KW - User comment
UR - http://www.scopus.com/inward/record.url?scp=85114803377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114803377&partnerID=8YFLogxK
U2 - 10.1109/ICSE43902.2021.00089
DO - 10.1109/ICSE43902.2021.00089
M3 - Conference contribution
AN - SCOPUS:85114803377
T3 - Proceedings - International Conference on Software Engineering
SP - 933
EP - 945
BT - Proceedings - 2021 IEEE/ACM 43rd International Conference on Software Engineering, ICSE 2021
PB - IEEE Computer Society
Y2 - 22 May 2021 through 30 May 2021
ER -