TY - JOUR
T1 - Generalized model for mapping bicycle ridership with crowdsourced data
AU - Nelson, Trisalyn
AU - Roy, Avipsa
AU - Ferster, Colin
AU - Fischer, Jaimy
AU - Brum-Bastos, Vanessa
AU - Laberee, Karen
AU - Yu, Hanchen
AU - Winters, Meghan
N1 - Funding Information:
This work was supported by a grant (#1516-HQ-000064) from the Public Health Agency of Canada. MW is supported by a Scholar Award from the Michael Smith Foundation for Health Research. The authors would like to thank Tetsuro Ide of the City of Ottawa for providing intersection count data; Joe Castiglione of the San Francisco County Transportation Authority for providing spatial data on the transportation infrastructure in San Francisco city; Tori Winters of the San Francisco Municipal Transportation Agency (SFMTA) for providing spatial data on the official counts for San Francisco; Jamie Parks (SFMTA) for providing official count data for San Francisco; Alex Phillips and Alex Hyde-Write from the City of Boulder and Boulder County for sharing official count data and supporting our work throughout; and to Jay Douillard and John Hicks from the Capital Regional District (Victoria, BC) for providing manual count data.
Publisher Copyright:
© 2021 The Author(s)
PY - 2021/4
Y1 - 2021/4
N2 - Fitness apps, such as Strava, are a growing source of data for mapping bicycling ridership, due to large samples and high resolution. To overcome bias introduced by data generated from only fitness app users, researchers build statistical models that predict total bicycling by integrating Strava data with official counts and geographic data. However, studies conducted on single cities provide limited insight on best practices for modeling bicycling with Strava as generalizability is difficult to assess. Our goal is to develop a generalized approach to modeling bicycling ridership using Strava data. In doing so we enable detailed mapping that is more inclusive of all bicyclists and will support more equitable decision-making across cities. We used Strava data, official counts, and geographic data to model Average Annual Daily Bicycling (AADB) in five cities: Boulder, Ottawa, Phoenix, San Francisco, and Victoria. Using a machine learning approach, LASSO, we identify variables important for predicting ridership in all cities, and independently in each city. Using the LASSO-selected variables as predictors in Poisson regression, we built generalized and city-specific models and compared accuracy. Our results indicate generalized prediction of bicycling ridership on a road segment in concert with Strava data should include the following variables: number of Strava riders, percentage of Strava trips categorized as commuting, bicycling safety, and income. Inclusion of city-specific variables increased model performance, as the R² for generalized and city-specific models ranged from 0.08–0.80 and 0.68–0.92, respectively. However, model accuracy was influenced most by the official count data used for model training. For best results, official count data should capture diverse street conditions, including low ridership areas. Counts collected continuously over a long time period, rather than at peak periods, may also improve modeling. Modeling bicycling from Strava and geographic data enables mapping of bicycling ridership that is more inclusive of all bicyclists and better able to support decision-making.
KW - Bias-correction
KW - Bicycling ridership
KW - Big data
KW - Exposure
KW - LASSO
KW - Strava
UR - http://www.scopus.com/inward/record.url?scp=85101619467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101619467&partnerID=8YFLogxK
U2 - 10.1016/j.trc.2021.102981
DO - 10.1016/j.trc.2021.102981
M3 - Article
AN - SCOPUS:85101619467
SN - 0968-090X
VL - 125
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 102981
ER -