TY - GEN
T1 - ImVerde
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
AU - Wu, Jun
AU - He, Jingrui
AU - Liu, Yongming
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/22
Y1 - 2019/1/22
N2 - Imbalanced data widely exist in many high-impact applications. An example is in air traffic control, where among all three types of accident causes, historical accident reports with 'personnel issues' are much more than the other two types ('aircraft issues' and 'environmental issues') combined. Thus, the resulting data set of accident reports is highly imbalanced. On the other hand, this data set can be naturally modeled as a network, with each node representing an accident report, and each edge indicating the similarity of a pair of accident reports. Up until now, most existing work on imbalanced data analysis focused on the classification setting, and very little is devoted to learning the node representations for imbalanced networks. To bridge this gap, in this paper, we first propose Vertex-Diminished Random Walk (VDRW) for imbalanced network analysis. It is significantly different from the existing Vertex Reinforced Random Walk by discouraging the random particle to return to the nodes that have already been visited. This design is particularly suitable for imbalanced networks as the random particle is more likely to visit the nodes from the same class, which is a desired property for learning node representations. Furthermore, based on VDRW, we propose a semi-supervised network representation learning framework named ImVerde for imbalanced networks, where context sampling uses VDRW and the limited label information to create node-context pairs, and balanced-batch sampling adopts a simple under-sampling method to balance these pairs from different classes. Experimental results demonstrate that ImVerde based on VDRW outperforms state-of-the-art algorithms for learning network representations from imbalanced data.
AB - Imbalanced data widely exist in many high-impact applications. An example is in air traffic control, where among all three types of accident causes, historical accident reports with 'personnel issues' are much more than the other two types ('aircraft issues' and 'environmental issues') combined. Thus, the resulting data set of accident reports is highly imbalanced. On the other hand, this data set can be naturally modeled as a network, with each node representing an accident report, and each edge indicating the similarity of a pair of accident reports. Up until now, most existing work on imbalanced data analysis focused on the classification setting, and very little is devoted to learning the node representations for imbalanced networks. To bridge this gap, in this paper, we first propose Vertex-Diminished Random Walk (VDRW) for imbalanced network analysis. It is significantly different from the existing Vertex Reinforced Random Walk by discouraging the random particle to return to the nodes that have already been visited. This design is particularly suitable for imbalanced networks as the random particle is more likely to visit the nodes from the same class, which is a desired property for learning node representations. Furthermore, based on VDRW, we propose a semi-supervised network representation learning framework named ImVerde for imbalanced networks, where context sampling uses VDRW and the limited label information to create node-context pairs, and balanced-batch sampling adopts a simple under-sampling method to balance these pairs from different classes. Experimental results demonstrate that ImVerde based on VDRW outperforms state-of-the-art algorithms for learning network representations from imbalanced data.
KW - Network representation
KW - imbalanced data
KW - random walk
UR - http://www.scopus.com/inward/record.url?scp=85062601565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062601565&partnerID=8YFLogxK
U2 - 10.1109/BigData.2018.8622603
DO - 10.1109/BigData.2018.8622603
M3 - Conference contribution
AN - SCOPUS:85062601565
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 871
EP - 880
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Song, Yang
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Abe, Naoki
A2 - Pu, Calton
A2 - Qiao, Mu
A2 - Ahmed, Nesreen
A2 - Kossmann, Donald
A2 - Saltz, Jeffrey
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Liu, Huan
A2 - Hu, Xiaohua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2018 through 13 December 2018
ER -