TY - JOUR
T1 - Role of Mixup in Topological Persistence-Based Knowledge Distillation for Wearable Sensor Data
AU - Jeon, Eun Som
AU - Choi, Hongjun
AU - Buman, Matthew P.
AU - Turaga, Pavan
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The analysis of wearable sensor data has enabled success in many applications. To represent high-sampling-rate time series with sufficient detail, topological data analysis (TDA) has been explored and found to complement other time-series features. Nonetheless, because extracting topological features through TDA is time-consuming and computationally demanding, it is difficult to deploy topological knowledge in machine learning and various applications. To tackle this problem, knowledge distillation (KD) can be adopted: a model-compression and transfer-learning technique that generates a smaller model by transferring knowledge from a larger network. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, yielding a superior student that uses only time-series data. Meanwhile, mixup is widely used as a robust data-augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies: in KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending pairs of labels. This shared smoothness is the link between the two methods. Although the interplay between mixup and KD has been widely studied, most studies focus on image-based analysis only, and it remains to be understood how mixup behaves in KD that incorporates multimodal data, such as both time-series and topological knowledge from wearable sensor data. In this article, we analyze the role of mixup in KD with time series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup, supported by empirical results on wearable sensor data. We observe that applying mixup when training a student in KD improves performance, and we suggest a general set of recommendations for obtaining an enhanced student.
AB - The analysis of wearable sensor data has enabled success in many applications. To represent high-sampling-rate time series with sufficient detail, topological data analysis (TDA) has been explored and found to complement other time-series features. Nonetheless, because extracting topological features through TDA is time-consuming and computationally demanding, it is difficult to deploy topological knowledge in machine learning and various applications. To tackle this problem, knowledge distillation (KD) can be adopted: a model-compression and transfer-learning technique that generates a smaller model by transferring knowledge from a larger network. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, yielding a superior student that uses only time-series data. Meanwhile, mixup is widely used as a robust data-augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies: in KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending pairs of labels. This shared smoothness is the link between the two methods. Although the interplay between mixup and KD has been widely studied, most studies focus on image-based analysis only, and it remains to be understood how mixup behaves in KD that incorporates multimodal data, such as both time-series and topological knowledge from wearable sensor data. In this article, we analyze the role of mixup in KD with time series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup, supported by empirical results on wearable sensor data. We observe that applying mixup when training a student in KD improves performance, and we suggest a general set of recommendations for obtaining an enhanced student.
KW - Knowledge distillation (KD)
KW - time series
KW - topological persistence
KW - wearable sensor data
UR - http://www.scopus.com/inward/record.url?scp=85212854552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212854552&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2024.3517653
DO - 10.1109/JSEN.2024.3517653
M3 - Article
AN - SCOPUS:85212854552
SN - 1530-437X
VL - 25
SP - 5853
EP - 5865
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 3
ER -
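
Editor's note: the abstract connects two mechanisms, mixup's label blending and KD's temperature-smoothed teacher targets. Below is a minimal PyTorch-style sketch of how the two can combine in a multiple-teacher setup. It is an illustrative assumption, not the authors' released code: the function names, the logit averaging of the two teachers, the choice to show the frozen teachers clean inputs while the student sees mixed inputs, and the loss weight w_kd are all hypothetical design choices, some of which are exactly the kinds of choices the article analyzes.

import torch
import torch.nn.functional as F

def mixup(x, y, alpha=0.2):
    # Draw lambda ~ Beta(alpha, alpha) and blend each sample with a
    # randomly permuted partner: x_mixed = lam * x_i + (1 - lam) * x_j.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

def kd_loss(student_logits, teacher_logits, T=4.0):
    # The student matches the teacher's temperature-smoothed distribution;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def distill_step(student, teacher_ts, teacher_topo, x_ts, x_topo, y,
                 alpha=0.2, T=4.0, w_kd=0.7):
    # Mixup is applied to the student's time-series input; the frozen
    # time-series and topological teachers see clean inputs and their
    # logits are averaged (an illustrative fusion choice).
    x_mixed, y_a, y_b, lam = mixup(x_ts, y, alpha)
    s_logits = student(x_mixed)
    with torch.no_grad():
        t_logits = 0.5 * (teacher_ts(x_ts) + teacher_topo(x_topo))
    # Mixup-style cross-entropy on the blended labels plus the KD term.
    ce = lam * F.cross_entropy(s_logits, y_a) + (1 - lam) * F.cross_entropy(s_logits, y_b)
    return (1 - w_kd) * ce + w_kd * kd_loss(s_logits, t_logits, T)

Both losses push the student toward smoothed targets, which is the shared-smoothness connection the abstract draws between mixup and KD.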