Learning from local to global: An efficient distributed algorithm for modeling time-to-event data

Rui Duan; Chongliang Luo; Martijn J. Schuemie; Jiayi Tong; C. Jason Liang; Howard H. Chang; Mary Regina Boland; Jiang Bian; Hua Xu; John H. Holmes; Christopher B. Forrest; Sally C. Morton; Jesse A. Berlin; Jason H. Moore; Kevin B. Mahoney; Yong Chen

doi:10.1093/jamia/ocaa044

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network. Results: In the simulation study, ODAC provided estimates nearly identical to the pooled estimator, that is, the estimator obtained by analyzing the combined patient-level data from all sites in a single dataset. The relative bias was <0.1% across all scenarios, and the accuracy of ODAC remained high across different sample sizes and event rates. In contrast, the meta-analysis estimator, obtained as the inverse variance-weighted average of the site-specific estimates, had substantial bias when the event rate was <5%, with the relative bias reaching 20% when the event rate was 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates had a relative bias <5% for 15 of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias. Conclusions: ODAC is a privacy-preserving and noniterative method for implementing time-to-event analyses across multiple sites. It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it well suited to studying rare events and diseases in a distributed manner.
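
As an illustration of the comparator described in the abstract, the following minimal sketch computes an inverse variance-weighted average of site-specific estimates and a relative bias against a reference value. It is not the authors' code and does not implement ODAC or its surrogate likelihood. The function names, the example numbers, and the absolute-value definition of relative bias are assumptions made for illustration only.

```python
def inverse_variance_average(estimates, variances):
    """Fixed-effect meta-analysis: weight each site's estimate by 1 / variance.

    Returns the combined estimate and its variance.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one positive variance per estimate")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    return combined, 1.0 / total_weight


def relative_bias(estimate, reference):
    """Absolute difference from the reference, as a fraction of the reference.

    Assumed definition; the abstract does not state the exact formula.
    """
    return abs(estimate - reference) / abs(reference)


# Hypothetical log hazard ratio estimates and variances from three sites.
site_estimates = [0.52, 0.40, 0.60]
site_variances = [0.01, 0.04, 0.04]

meta_estimate, meta_variance = inverse_variance_average(site_estimates, site_variances)

# Hypothetical pooled estimate used as the reference value.
pooled_reference = 0.50
print(f"meta-analysis estimate: {meta_estimate:.4f}")
print(f"relative bias vs pooled: {relative_bias(meta_estimate, pooled_reference):.2%}")
```

In this toy input the first site has one quarter of the variance of the others, so it receives four times the weight. With rare events, site-specific estimates and their variances can be unstable, which is the setting in which the abstract reports the meta-analysis estimator performing poorly.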

Original language	English (US)
Pages (from-to)	1028-1036
Number of pages	9
Journal	Journal of the American Medical Informatics Association
Volume	27
Issue number	7
DOIs	https://doi.org/10.1093/jamia/ocaa044
State	Published - Jul 1 2020
Externally published	Yes

Keywords

Cox proportional hazards model
data integration
distributed algorithm
electronic health record
meta-analysis

ASJC Scopus subject areas

Health Informatics

Access to Document

10.1093/jamia/ocaa044

Cite this

Duan, R., Luo, C., Schuemie, M. J., Tong, J., Liang, C. J., Chang, H. H., Boland, M. R., Bian, J., Xu, H., Holmes, J. H., Forrest, C. B., Morton, S. C., Berlin, J. A., Moore, J. H., Mahoney, K. B., & Chen, Y. (2020). Learning from local to global: An efficient distributed algorithm for modeling time-to-event data. Journal of the American Medical Informatics Association, 27(7), 1028-1036. https://doi.org/10.1093/jamia/ocaa044

Duan, R, Luo, C, Schuemie, MJ, Tong, J, Liang, CJ, Chang, HH, Boland, MR, Bian, J, Xu, H, Holmes, JH, Forrest, CB, Morton, SC, Berlin, JA, Moore, JH, Mahoney, KB & Chen, Y 2020, 'Learning from local to global: An efficient distributed algorithm for modeling time-to-event data', Journal of the American Medical Informatics Association, vol. 27, no. 7, pp. 1028-1036. https://doi.org/10.1093/jamia/ocaa044

@article{9ff1bd45252f4de091620804528fdabb,

title = "Learning from local to global: An efficient distributed algorithm for modeling time-to-event data",

abstract = "Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network. Results: On the one hand, our simulation study showed that ODAC provided estimates nearly the same as the estimator obtained by analyzing, in a single dataset, the combined patient-level data from all sites (ie, the pooled estimator). The relative bias was <0.1% across all scenarios. The accuracy of ODAC remained high across different sample sizes and event rates. On the other hand, the meta-analysis estimator, which was obtained by the inverse variance weighted average of the site-specific estimates, had substantial bias when the event rate is <5%, with the relative bias reaching 20% when the event rate is 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates have a relative bias <5% for 15 out of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias than ODAC. Conclusions: ODAC is a privacy-preserving and noniterative method for implementing time-to-event analyses across multiple sites. 
It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it extremely suitable for studying rare events and diseases in a distributed manner.",

keywords = "Cox proportional hazards model, data integration, distributed algorithm, electronic health record, meta-analysis",

author = "Rui Duan and Chongliang Luo and Schuemie, {Martijn J.} and Jiayi Tong and Liang, {C. Jason} and Chang, {Howard H.} and Boland, {Mary Regina} and Jiang Bian and Hua Xu and Holmes, {John H.} and Forrest, {Christopher B.} and Morton, {Sally C.} and Berlin, {Jesse A.} and Moore, {Jason H.} and Mahoney, {Kevin B.} and Yong Chen",

year = "2020",

month = jul,

day = "1",

doi = "10.1093/jamia/ocaa044",

language = "English (US)",

volume = "27",

pages = "1028--1036",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "7",

}

TY - JOUR

T1 - Learning from local to global

T2 - An efficient distributed algorithm for modeling time-to-event data

AU - Duan, Rui

AU - Luo, Chongliang

AU - Schuemie, Martijn J.

AU - Tong, Jiayi

AU - Liang, C. Jason

AU - Chang, Howard H.

AU - Boland, Mary Regina

AU - Bian, Jiang

AU - Xu, Hua

AU - Holmes, John H.

AU - Forrest, Christopher B.

AU - Morton, Sally C.

AU - Berlin, Jesse A.

AU - Moore, Jason H.

AU - Mahoney, Kevin B.

AU - Chen, Yong

PY - 2020/7/1

Y1 - 2020/7/1

AB - Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network. Results: On the one hand, our simulation study showed that ODAC provided estimates nearly the same as the estimator obtained by analyzing, in a single dataset, the combined patient-level data from all sites (ie, the pooled estimator). The relative bias was <0.1% across all scenarios. The accuracy of ODAC remained high across different sample sizes and event rates. On the other hand, the meta-analysis estimator, which was obtained by the inverse variance weighted average of the site-specific estimates, had substantial bias when the event rate is <5%, with the relative bias reaching 20% when the event rate is 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates have a relative bias <5% for 15 out of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias than ODAC. Conclusions: ODAC is a privacy-preserving and noniterative method for implementing time-to-event analyses across multiple sites. 
It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it extremely suitable for studying rare events and diseases in a distributed manner.

KW - Cox proportional hazards model

KW - data integration

KW - distributed algorithm

KW - electronic health record

KW - meta-analysis

UR - http://www.scopus.com/inward/record.url?scp=85088492383&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85088492383&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocaa044

DO - 10.1093/jamia/ocaa044

M3 - Article

C2 - 32626900

AN - SCOPUS:85088492383

SN - 1067-5027

VL - 27

SP - 1028

EP - 1036

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 7

ER -
