TY - GEN
T1 - WebEvo
T2 - 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
AU - Shao, Fei
AU - Xu, Rui
AU - Haque, Wasif
AU - Xu, Jingwei
AU - Zhang, Ying
AU - Yang, Wei
AU - Ye, Yanfang
AU - Xiao, Xusheng
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/11
Y1 - 2021/7/11
N2 - The development of Web technology and the beginning of the Big Data era have led to the development of technologies for extracting data from websites, such as information retrieval (IR) and robotic process automation (RPA) tools. As websites are constantly evolving, to prevent these tools from functioning improperly due to website evolution, it is important to monitor the changes in websites and report them to the developers and testers. Existing monitoring tools mainly use DOM-tree based techniques to detect changes in the new web pages. However, these monitoring tools incorrectly report content-based changes (i.e., web content refreshed every time a web page is retrieved) as the changes that will adversely affect the performance of the IR and RPA tools. This results in false warnings since the IR and RPA tools typically consider these changes as expected and retrieve dynamic data from them. Moreover, these monitoring tools cannot identify GUI widget evolution (e.g., moving a button), and thus cannot help the IR and RPA tools adapt to the evolved widgets (e.g., automatic repair of locators for the evolved widgets). To address the limitations of the existing monitoring tools, we propose an approach, WebEvo, that leverages historic pages to identify the DOM elements whose changes are content-based changes, which can be safely ignored when reporting changes in the new web pages. Furthermore, to identify refactoring changes that preserve semantics and appearances of GUI widgets, WebEvo adapts computer vision (CV) techniques to identify the mappings of the GUI widgets from the old web page to the new web page on an element-by-element basis. Empirical evaluations on 13 real-world websites from 9 popular categories demonstrate the superiority of WebEvo over the existing DOM-tree based detection or whole-page visual comparison in terms of both effectiveness and efficiency.
AB - The development of Web technology and the beginning of the Big Data era have led to the development of technologies for extracting data from websites, such as information retrieval (IR) and robotic process automation (RPA) tools. As websites are constantly evolving, to prevent these tools from functioning improperly due to website evolution, it is important to monitor the changes in websites and report them to the developers and testers. Existing monitoring tools mainly use DOM-tree based techniques to detect changes in the new web pages. However, these monitoring tools incorrectly report content-based changes (i.e., web content refreshed every time a web page is retrieved) as the changes that will adversely affect the performance of the IR and RPA tools. This results in false warnings since the IR and RPA tools typically consider these changes as expected and retrieve dynamic data from them. Moreover, these monitoring tools cannot identify GUI widget evolution (e.g., moving a button), and thus cannot help the IR and RPA tools adapt to the evolved widgets (e.g., automatic repair of locators for the evolved widgets). To address the limitations of the existing monitoring tools, we propose an approach, WebEvo, that leverages historic pages to identify the DOM elements whose changes are content-based changes, which can be safely ignored when reporting changes in the new web pages. Furthermore, to identify refactoring changes that preserve semantics and appearances of GUI widgets, WebEvo adapts computer vision (CV) techniques to identify the mappings of the GUI widgets from the old web page to the new web page on an element-by-element basis. Empirical evaluations on 13 real-world websites from 9 popular categories demonstrate the superiority of WebEvo over the existing DOM-tree based detection or whole-page visual comparison in terms of both effectiveness and efficiency.
KW - Computer Vision
KW - DOM-Tree Analysis
KW - Web Application Evolution
UR - http://www.scopus.com/inward/record.url?scp=85111440516&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111440516&partnerID=8YFLogxK
U2 - 10.1145/3460319.3464800
DO - 10.1145/3460319.3464800
M3 - Conference contribution
AN - SCOPUS:85111440516
T3 - ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
SP - 16
EP - 28
BT - ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
A2 - Cadar, Cristian
A2 - Zhang, Xiangyu
PB - Association for Computing Machinery, Inc
Y2 - 11 July 2021 through 17 July 2021
ER -