TY - GEN
T1 - MultiC2
T2 - 17th SIAM International Conference on Data Mining, SDM 2017
AU - Zhou, Yao
AU - Ying, Lei
AU - He, Jingrui
N1 - Funding Information:
This work is supported by NSF research grants IIS-1552654, CNS-1618768, and ECCS-1547294, ONR research grant N00014-15-1-2821, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.
Publisher Copyright:
Copyright © by SIAM.
PY - 2017
Y1 - 2017
N2 - Crowdsourcing is now commonly used to collect label information both effectively and efficiently. One major challenge in crowdsourcing is diverse worker quality, which determines the accuracy of the labels that workers provide. Motivated by the observation that, on many crowdsourcing platforms, the same set of workers typically works on the same set of tasks, we propose to model diverse worker quality by studying worker behavior across multiple related tasks. To this end, we propose an optimization framework named MultiC2 for learning from task and worker dual heterogeneity. It uses a weight tensor to represent the workers' behaviors across multiple tasks and seeks the optimal solution of the tensor by exploiting its structured information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted using a set of entropy weights. Finally, we test the performance of MultiC2 on various data sets and demonstrate its superiority over state-of-the-art crowdsourcing techniques.
AB - Crowdsourcing is now commonly used to collect label information both effectively and efficiently. One major challenge in crowdsourcing is diverse worker quality, which determines the accuracy of the labels that workers provide. Motivated by the observation that, on many crowdsourcing platforms, the same set of workers typically works on the same set of tasks, we propose to model diverse worker quality by studying worker behavior across multiple related tasks. To this end, we propose an optimization framework named MultiC2 for learning from task and worker dual heterogeneity. It uses a weight tensor to represent the workers' behaviors across multiple tasks and seeks the optimal solution of the tensor by exploiting its structured information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted using a set of entropy weights. Finally, we test the performance of MultiC2 on various data sets and demonstrate its superiority over state-of-the-art crowdsourcing techniques.
KW - Crowdsourcing
KW - Multi-task learning
KW - Tensor representation
UR - http://www.scopus.com/inward/record.url?scp=85027831142&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027831142&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974973.65
DO - 10.1137/1.9781611974973.65
M3 - Conference contribution
AN - SCOPUS:85027831142
T3 - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
SP - 579
EP - 587
BT - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
A2 - Chawla, Nitesh
A2 - Wang, Wei
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 27 April 2017 through 29 April 2017
ER -