DEAP Institute in Research and Education for Science Translation via Low-Resource Neural Machine Translation

(Supported by NASA)

Project Number: 80NSSC22KM0052
Duration: March 2023 – March 2026

This is a collaborative project of teams from Prairie View A&M University (PVAMU), Texas Southern University (TSU), and Texas A&M University (TAMU).

Team of Prairie View A&M University: ECE: L. Qian (lead), X. Dong, X. Li, P. Obiomon, R. Wilkins; CS: L. Li; JJ: L. Wu
Team of Texas Southern University: W. Li (lead), R. Holmes, Penn-Marshall
Team of Texas A&M University: N. Duffield (lead), X. Ye, D. Rodriguez

Goal:

The goal of this NASA-DEAP project is to build an AI-based system that delivers interactive, instantaneous, and user-relevant Earth science information, making NASA science more discoverable and accessible to a broad audience.

Question answering has become an increasingly important application of AI, especially in natural language processing and machine intelligence. However, most existing question-answering solutions focus on “shallow” tasks that merely test a system’s ability to attend to specific words and pieces of text. In contrast, answering complex questions requires far more sophisticated and specialized knowledge and reasoning: the AI engine must understand the question and draw inferences that resemble human expertise in the specialized domains of interest to NASA. Furthermore, the proposed system must be able to answer questions from non-scientists, which are often ambiguous and depend on implicit knowledge that is not stated in the question. Real-time questions are also frequently hurried and rife with malformed syntax and spelling errors.

In this project, we plan to address this challenge by developing a novel method, based on low-resource neural machine translation (NMT), for mapping natural-language questions to database queries. NMT has significantly advanced machine translation through end-to-end models that automatically translate a source language into a target language. Because large amounts of training data, such as annotated question-query pairs, may not be available, low-resource NMT techniques will be developed for this project. Specifically, we plan to build a semi-supervised NMT model that improves mapping performance by leveraging large amounts of unlabeled data. In addition, we will explore mapping questions (text) to NASA database queries using large language models (LLMs) and prompt engineering.
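The text does not specify the prompt format, model, or database used for the LLM-based mapping. As a minimal sketch under those gaps, the Python below builds a few-shot prompt (a schema, a few question/SQL demonstrations, then the new question) and checks that a model's SQL output at least compiles against the schema. The table `observations`, its columns, the example questions, and the helpers `build_prompt`, `extract_sql`, and `is_valid_sql` are all hypothetical, and the call to an actual LLM is replaced by a mocked completion.

```python
import re
import sqlite3

# Hypothetical schema standing in for an Earth-science table;
# the table and column names are illustrative, not a real NASA database.
SCHEMA = """CREATE TABLE observations (
    station_id TEXT,
    obs_date   TEXT,   -- ISO-8601 date string
    variable   TEXT,   -- e.g. precipitation, surface_temperature
    value      REAL
);"""

# Hand-written question/query pairs used as few-shot demonstrations (hypothetical).
FEW_SHOT = [
    ("What was the highest surface temperature recorded at station TX01?",
     "SELECT MAX(value) FROM observations "
     "WHERE station_id = 'TX01' AND variable = 'surface_temperature';"),
    ("How many precipitation readings are there for 2024?",
     "SELECT COUNT(*) FROM observations "
     "WHERE variable = 'precipitation' AND obs_date LIKE '2024-%';"),
]

FENCE = "`" * 3  # chat models often wrap SQL in a Markdown code fence


def build_prompt(question: str) -> str:
    """Assemble a few-shot text-to-SQL prompt: instruction, schema, examples, new question."""
    parts = ["Translate the question into a single SQLite query.", "Schema:", SCHEMA, ""]
    for q, sql in FEW_SHOT:
        parts += [f"Question: {q}", f"SQL: {sql}", ""]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


def extract_sql(completion: str) -> str:
    """Strip code fences and keep only the first statement of a model completion."""
    text = re.sub(re.escape(FENCE) + r"(?:sql)?", "", completion).strip()
    # Naive split: a ';' inside a string literal would cut the query short.
    return text.split(";")[0].strip() + ";"


def is_valid_sql(sql: str) -> bool:
    """Return True if the query compiles against SCHEMA.

    EXPLAIN makes SQLite prepare the statement (resolving tables and columns)
    and return its bytecode without running the query itself.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    prompt = build_prompt("What was the average precipitation at station TX01 in 2023?")
    print(prompt)
    # A real system would send `prompt` to an LLM; this completion is mocked.
    completion = (FENCE + "sql\nSELECT AVG(value) FROM observations "
                  "WHERE station_id = 'TX01' AND variable = 'precipitation' "
                  "AND obs_date LIKE '2023-%';\n" + FENCE)
    sql = extract_sql(completion)
    print(sql, is_valid_sql(sql))
```

Checking with SQLite's `EXPLAIN` rejects queries that reference unknown tables or columns without executing them, which makes it an inexpensive filter before running generated SQL; it says nothing about whether the query actually answers the question.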

Publications:

U. Okechukwu, L. Li, L. Qian, X. Dong (2026). “Exploring Interpretable Deep Knowledge Tracing Through Integrating SHAP with LLMs.” Submitted to the 2026 7th International Conference on Artificial Intelligence, Robotics, and Control (AIRC).
S. Sarker, X. Dong, L. Qian (2026). “From Tokens to Transitions: A Structured Jensen-Shannon Knowledge Distillation Method for Medication Identification and Medical Event Classification.” Submitted to the IEEE Transactions on Knowledge and Data Engineering.
S. Sarker, X. Dong, X. Li, L. Qian (2025). “Enhancing LLM Fine-tuning for Text-to-SQL by SQL Quality Measurement.” 2025 International Joint Conference on Neural Networks (IJCNN 2025).
T. Ogunsusi, J. Mangue, W. Bai, X. Ye, L. Qian, X. Dong (2025). “LLM-based Text-to-SQL: A Case Study of Geospatial Information Retrieval.” 20th International Conference on Semantic Computing.
S. Sarker, X. Dong, L. Qian (2025). “Integrating Non-Parametric Attention and Prompt Refinement to Enhance LLM-Based Text-to-SQL Without External Knowledge.” The IEEE International Conference on Data Mining (ICDM).
S. Sarker, X. Dong, X. Li, L. Qian (2025). “Text Generator and Text Discriminator for NIST GenAI T2T Challenge.” NIST GenAI Workshop.
X. Ye, J. Du, X. Li, et al. (2025). “Human-centered GeoAI foundation models: where GeoAI meets human dynamics.” Urban Informatics, 4(1), 2.
X. Ye, T. Yigitcanlar, M. Goodchild, et al. (2025). “Artificial intelligence in urban science: why does it matter?” Annals of GIS, 31(2), 181-189.
X. Ye, T. Huang, Y. Song, X. Li, G. Newman, Z. Lin, D. J. Wu (2025). “Geodesign in the era of artificial intelligence.” Frontiers of Urban and Rural Planning, 3(1), 1-12.
C. Liu, X. Ye, X. Huang, Y. Xu (2025). “Vertical Dimension of Urban Thermal Environments: A Literature Survey.” Cities, 158, 105629.
Y. Han, X. Ye, C. Zhu (2024). “The Unequal Impact of Disasters: Assessing the Interplay Between Social Vulnerability, Public Assistance, Flood Insurance, and Migration in the US.” Urban Informatics, 3(1), 1-12.
S. L. Shaw, X. Ye, M. Goodchild, D. Sui (2024). “Human Dynamics Research in GIScience: Challenges and Opportunities.” Computational Urban Science, 4(1), 31.
J. Du, X. Ye, X. Huang, Y. Qiang, C. Zhu (2024). “Unveiling Multifaceted Resilience: A Heterogeneous Graph Neural Network Approach for Analyzing Locale Recovery Patterns.” Environment and Planning B: Urban Analytics and City Science.
R. Dvir, A. Vedlitz, X. Ye (2024). “Worried (and) Sick: How Environmental Hazards Affect Americans' Health-Related Risk Attitudes.” Urban Informatics, 3(1), 26.
S. Sarker, L. Qian, X. Dong (2023). “Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification.” The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI).
T. Abdul-Quddoos, X. Dong, X. Li (2023). “Systematic Comparative Analysis of Pre-trained Large Language Models on Contextualized Medication Event Extraction.” The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI).
J. Bello, X. Dong, X. Li (2023). “Semi-supervised Single-Shot Object Detection for Table Detection in Scanned Documents.” Proceedings of the Future Technologies Conference, pp. 430-441. Cham: Springer Nature Switzerland.

Educational and Training Activities:

A week-long summer school introducing participants to the fundamentals of Scientific Machine Learning (SciML), in particular Physics-Informed Neural Networks (PINNs) and Physics-Informed Gaussian Processes (PIGPs), was organized and delivered on May 12-16, 2025. A cohort of 40 students from PVAMU, TSU, and TAMU participated in the summer school, and the feedback was very positive.
The PI (Qian) delivered a talk titled “Artificial Intelligence for Mission-Critical Applications: State-of-the-Art and Future Trends” during the Dean’s Lecture Series in 2025.
Python tutorial training was offered to students in March 2024.
Offered a 5-day PhD Workshop for the NASA-DEAP program on June 10-14, 2024, where 15 PhD students participated and very positive feedback was received.

This project was highlighted at HBCU Week in Philadelphia on September 15-19, 2024, by Roderick Chappell from NASA and PVAMU President Dr. LeGrande.
On September 16, 2024, Lynn Vernon, Chief Engineer for IT at NASA Johnson Space Center, spoke at the Texas A&M Institute of Data Science (TAMIDS) research seminar. He also met with individual faculty of the NASA-DEAP project to explore research collaborations.
TAMIDS hosted a student lunch at which Vernon discussed student internship programs in science and engineering at NASA.
A one-day Workshop on Generative AI Mastery: From Theory to Practice was attended by over 80 staff researchers and students from the NASA-DEAP partner schools. The free workshop covered generative AI, with a special focus on large language models. It began with an exploration of the history, theory, and evolution of generative AI, giving participants a solid foundation in machine learning, neural networks, and natural language processing. Attendees then gained hands-on experience with key technologies, including prompt engineering, retrieval-augmented generation (RAG), and cutting-edge models.
Taught a new course in Data Ethics at PVAMU in Fall 2023 (Dr. X. Li).
Proposed a new course in Generative AI (LLMs and prompt engineering).
The team and NVIDIA co-organized a Workshop and Training Courses on Deep Learning and Artificial Intelligence at PVAMU on March 30-31, 2023. About 65 students completed the training and received certificates from NVIDIA.

The team organized the TAMIDS Data Science Bootcamp at TAMU on July 6-12, 2023: https://tamids.tamu.edu/event/tamids-data-science-bootcamp/

Last Modified: January 2026
