
Precious Kamsiyo Ejikeme Master’s Thesis Defense, Tuesday, December 2, 2025 @ 8:30 am Central Time
December 2 @ 8:30 am - 10:00 am
COMMITTEE CHAIR: Dr. Noushin Ghaffari
COMMITTEE CO-CHAIR: Dr. Judy Perkins
TITLE: INTEGRATING GIS AND MACHINE LEARNING TO UNCOVER SPATIAL, TEMPORAL, AND CONTRIBUTING FACTORS TO BICYCLE CRASHES
ABSTRACT: The recent drive for a cleaner environment has led to an increase in greener modes of transportation. Cycling is becoming an increasingly popular choice for environmentally conscious individuals seeking to reduce their carbon footprint. In 2024, the Bureau of Transportation reported a 620% increase in household spending on bicycles and accessories. This rise in bicycle use has been accompanied by an increase in bicycle-related crashes. According to the CDC, an estimated 1,150 cyclists were killed in 2023, and 120,000 sustained non-fatal injuries. Transportation planners and agencies are under pressure to make cycling safer by providing dedicated bike lanes and improving existing infrastructure. To do this, they need to understand the factors that contribute to crash severity across different roadways. Despite recent advances in modeling bicycle crash severity, most studies still rely on classical methods such as ordered probit, ordered logit, and logistic regression models. Tree-based models such as Decision Trees, Random Forest, and Gradient Boosting handle nonlinearities and high-dimensional data better but still struggle with class imbalance. Previous studies have used oversampling and undersampling techniques such as SMOTE to address class imbalance, and recursive feature elimination for feature selection. However, most research remains region-specific. This study integrates multi-state data (Texas, Colorado, and California) to improve model transferability and explicitly includes lane-type information extracted from national road network data. It also embeds spatial clustering and predictive modeling in a single pipeline, addresses class imbalance, applies hybrid encoding (one-hot encoding plus frequency encoding) in XGBoost, and uses high-performance computing (Bridges-2) for large-scale training. The XGBoost model achieved the best predictive performance across the three states (accuracy = 90%, precision = 0.85, recall = 0.66, F1 = 0.71, AUC-ROC = 0.84), outperforming all other tested algorithms.
Feature-importance analysis revealed that lighting conditions, roadway type, traffic exposure, and time of day are the dominant predictors of severity. Hotspot analysis identified clusters of severe crashes along multilane arterials and at poorly lit intersections. This study presents a scalable, data-driven framework that integrates GIS visualization, machine learning, and high-performance computing to support bicycle safety planning and policy development. The multistate model could be hosted online, enabling researchers and policymakers to generate region-specific crash-severity predictions and plan safer cycling infrastructure.
Keywords: Bicycle crashes; crash severity; GIS; machine learning; HPC; XGBoost; feature selection; class imbalance; spatial analysis; transportation safety.
Room Location: SR Collins Meeting Room
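The abstract mentions pairing one-hot encoding with frequency encoding before training XGBoost. The sketch below shows one common form of such a hybrid scheme. It is an illustration under stated assumptions, not the thesis pipeline. The function name `hybrid_encode`, the cardinality cutoff `max_onehot`, the column names, and the toy records are all hypothetical. Low-cardinality columns (for example, lighting condition) become one indicator column per category. High-cardinality columns (for example, county) become a single numeric column that holds the share of rows with that category.

```python
from collections import Counter


def hybrid_encode(rows, categorical_cols, max_onehot=5):
    """Hypothetical hybrid encoder: one-hot for low-cardinality columns,
    frequency encoding (share of rows with that category) for the rest.

    rows: list of dicts mapping column name -> category value.
    max_onehot: assumed cardinality cutoff; columns with at most this many
    distinct values are one-hot encoded.
    Returns (encoded_rows, feature_names).
    """
    n = len(rows)
    onehot_levels = {}  # column -> sorted list of its categories
    freq_maps = {}      # column -> {category: relative frequency}
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        if len(counts) <= max_onehot:
            onehot_levels[col] = sorted(counts)
        else:
            freq_maps[col] = {level: c / n for level, c in counts.items()}

    # Names of the encoded columns, in the same order as each row vector.
    feature_names = []
    for col in categorical_cols:
        if col in onehot_levels:
            feature_names += [f"{col}={level}" for level in onehot_levels[col]]
        else:
            feature_names.append(f"{col}_freq")

    encoded = []
    for row in rows:
        vec = []
        for col in categorical_cols:
            if col in onehot_levels:
                # One indicator per category: 1.0 where the row matches.
                vec += [1.0 if row[col] == level else 0.0
                        for level in onehot_levels[col]]
            else:
                # Single numeric column: how common this category is.
                vec.append(freq_maps[col][row[col]])
        encoded.append(vec)
    return encoded, feature_names


# Toy records with hypothetical column names (not thesis data).
crashes = [
    {"lighting": "daylight", "county": "Harris"},
    {"lighting": "dark_unlit", "county": "Denver"},
    {"lighting": "daylight", "county": "Harris"},
    {"lighting": "dark_lit", "county": "Los Angeles"},
    {"lighting": "daylight", "county": "Travis"},
]
X, names = hybrid_encode(crashes, ["lighting", "county"], max_onehot=3)
print(names)
for vec in X:
    print(vec)
```

In this toy run, `lighting` has three categories and is one-hot encoded. `county` has four categories and is frequency encoded, so Harris maps to 0.4. In a real pipeline, the frequencies would be fit on the training split only, and unseen categories would need a fallback value before the matrix is passed to a classifier such as XGBoost.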

