TY - GEN
T1 - BYTEWEIGHT
T2 - 23rd USENIX Security Symposium
AU - Bao, Tiffany
AU - Burket, Jonathan
AU - Woo, Maverick
AU - Turner, Rafael
AU - Brumley, David
N1 - Publisher Copyright:
Copyright © 2014 USENIX Security Symposium. All rights reserved.
PY - 2014
Y1 - 2014
N2 - Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.
AB - Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.
UR - http://www.scopus.com/inward/record.url?scp=85076265022&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076265022&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85076265022
T3 - Proceedings of the 23rd USENIX Security Symposium
SP - 845
EP - 860
BT - Proceedings of the 23rd USENIX Security Symposium
PB - USENIX Association
Y2 - 20 August 2014 through 22 August 2014
ER -