COMMITTEE CHAIR: Dr. Lijun Qian

TITLE: UNIFIED DEEP LEARNING TECHNIQUES FOR SPATIAL DETECTION AND TEMPORAL FORECASTING ACROSS VISUAL DOMAINS

ABSTRACT: This dissertation proposes a unified deep learning framework for spatial detection and temporal forecasting across visual domains, designed to address limited supervision, class imbalance, and scale variability. The framework is structured around four complementary principles: Domain-Aware Input Rebalancing, Representation-Centric Learning, Diversity-Driven Robustness, and Transfer Across Scale and Modality, which together enable robust visual representation learning across heterogeneous data sources.

Domain-Aware Input Rebalancing mitigates non-uniform and sparse data distributions by actively reshaping inputs prior to learning through class-aware augmentation, sampling, resolution manipulation, and super-resolution. This principle underpins object detection in overhead satellite imagery (xView), dense urban aerial scenes (CADOT), temporally sparse NDVI signals, and low-contrast microscopy images, where sensitivity to small or rare structures is critical.

Representation-Centric Learning emphasizes shared feature encoders as the primary mechanism for generalization. YOLO-based spatial encoders serve as the backbone for object detection across satellite, aerial, and microscopy domains, while sequence encoders and pretrained transformers are employed for NDVI forecasting. By prioritizing transferable representations over task-specific heuristics, the framework supports both spatial localization and temporal sequence modeling within a unified learning paradigm.

To enhance generalization without increasing annotation cost, Diversity-Driven Robustness introduces architectural and representational diversity. For NDVI forecasting, model-family comparisons under few-shot settings further demonstrate the stabilizing role of diversity.

Finally, Transfer Across Scale and Modality enables the framework to generalize learned principles beyond individual tasks. Spatial learning strategies inform temporal NDVI forecasting, pretrained sequence models adapt to environmental time series, and super-resolution enhances cavity detection in microscopy, confirming scale-invariant behavior across modalities.

Collectively, this dissertation demonstrates that spatial detection and temporal forecasting can be unified as data-efficient representation learning problems, providing a principled framework for robust visual intelligence in data-constrained environments.
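The class-aware sampling idea behind Domain-Aware Input Rebalancing can be illustrated with a minimal sketch. The snippet below is not taken from the dissertation; it assumes a simple inverse-frequency weighting scheme (the function name `inverse_frequency_weights` and the example labels are hypothetical) in which samples from rare classes receive larger draw weights, so a weighted sampler sees them more often during training.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weights inversely proportional to class frequency.

    Rare classes get larger weights, so a weighted random sampler
    draws them more often, rebalancing the input distribution
    before any learning takes place.
    """
    counts = Counter(labels)          # how often each class appears
    n_classes = len(counts)
    total = len(labels)
    # Weight for a sample of class y: total / (n_classes * count(y)).
    # With this normalization, a perfectly balanced dataset gives
    # every sample a weight of 1.0.
    return [total / (n_classes * counts[y]) for y in labels]

# Hypothetical imbalanced label list: "plane" is the rare class.
labels = ["car", "car", "car", "plane"]
weights = inverse_frequency_weights(labels)
```

In practice such weights would feed a weighted sampler (e.g., PyTorch's `WeightedRandomSampler`) so that minority-class objects in imagery like xView or CADOT are not drowned out by dominant classes.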

Keywords: Deep learning, computer vision, object detection, time-series data, NDVI forecasting

Room Location: Electrical and Computer Engineering Department Conference Room 315D