DEAP Institute in Research and Education for Science Translation via Low-Resource Neural Machine Translation

(Supported by NASA)

Project Number: 80NSSC22KM0052
Duration: March 2023 – March 2026

This is a collaborative project of teams from Prairie View A&M University (PVAMU), Texas Southern University (TSU), and Texas A&M University (TAMU).

  • Team of Prairie View A&M University: ECE: L. Qian (lead), X. Dong, X. Li, P. Obiomon, R. Wilkins; CS: L. Li; JJ: L. Wu
  • Team of Texas Southern University: W. Li (lead), R. Holmes, Penn-Marshall
  • Team of Texas A&M University: N. Duffield (lead), X. Ye, D. Rodriguez

Goal: 

The goal of this NASA-DEAP project is to build an AI-based system that delivers interactive, instantaneous, and user-relevant Earth science information, making NASA science more discoverable and more accessible to a broad audience.

Question answering has become an increasingly important application of AI, especially in natural language processing and machine intelligence. However, most existing question answering solutions focus on “shallow” tasks that merely test whether a system can attend to specific words and passages of text. Answering complex questions, by contrast, demands knowledge and reasoning so sophisticated and specialized that the AI engine must understand the question and draw inferences resembling human expertise in the specialized domains of interest to NASA. Furthermore, the proposed system must answer questions from non-scientists, which are often ambiguous and depend on implicit knowledge that is not stated in the question; real-time questions are also frequently hurried and rife with malformed syntax and spelling errors.

In this project, we plan to accomplish this task by developing a novel method, based on low-resource neural machine translation (NMT), that maps user questions to database queries. NMT has substantially advanced machine translation through end-to-end models that automatically translate a source language into a target language. Because large collections of annotated question-query pairs may not be available, we will develop low-resource NMT techniques for this project. Specifically, we plan to build a semi-supervised NMT model that improves mapping performance by leveraging large amounts of unlabeled data. We will also explore mapping questions (text) to NASA database queries using large language models (LLMs) and prompt engineering.
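One common way to exploit unlabeled questions in a semi-supervised setting is self-training (pseudo-labeling): a model trained on the small gold set labels the unlabeled questions, and only its confident predictions are added back to the training data for the next round. The following Python sketch illustrates that loop under stated assumptions. It is not the project's implementation; `self_train`, `fit_nearest_neighbor`, and the 0.8 threshold are hypothetical choices for illustration, the nearest-neighbor "translator" is a toy stand-in for an NMT model (it returns the query of the most similar known question, using Jaccard token overlap as its confidence), and the table and column names in the example query are invented.

```python
from typing import Callable, List, Optional, Tuple

# A training pair: (natural-language question, database query).
Pair = Tuple[str, str]
# A trained model maps a question to (predicted query or None, confidence in [0, 1]).
Model = Callable[[str], Tuple[Optional[str], float]]


def self_train(
    labeled: List[Pair],
    unlabeled: List[str],
    fit: Callable[[List[Pair]], Model],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[List[Pair], List[str]]:
    """Sketch of a generic self-training loop (not the project's implementation):
    retrain, pseudo-label the pool, keep only confident outputs, repeat."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(train)  # retrain on gold pairs plus accepted pseudo-labels
        accepted: List[Pair] = []
        remaining: List[str] = []
        for question in pool:
            query, confidence = model(question)
            if query is not None and confidence >= threshold:
                accepted.append((question, query))
            else:
                remaining.append(question)
        if not accepted:  # no new confident pseudo-labels: stop early
            break
        train.extend(accepted)
        pool = remaining
    return train, pool


def tokenize(text: str) -> set:
    return set(text.lower().strip("?").split())


def fit_nearest_neighbor(pairs: List[Pair]) -> Model:
    """HYPOTHETICAL toy stand-in for an NMT model: copies the query of the most
    similar training question; Jaccard token overlap serves as confidence."""
    index = [(tokenize(q), sql) for q, sql in pairs]

    def model(question: str) -> Tuple[Optional[str], float]:
        words = tokenize(question)
        best, score = None, 0.0
        for src, sql in index:
            union = words | src
            sim = len(words & src) / len(union) if union else 0.0
            if sim > score:
                best, score = sql, sim
        return best, score

    return model


# Illustrative data only; the table and column names are invented.
labeled = [("show sea surface temperature for 2020",
            "SELECT sst FROM ocean WHERE year = 2020")]
unlabeled = [
    "show the sea surface temperature for 2020",
    "please show the sea surface temperature for 2020",
    "list active wildfires",
]

train, pool = self_train(labeled, unlabeled, fit_nearest_neighbor)
print(len(train), pool)  # 3 ['list active wildfires']
```

The close paraphrase (overlap 6/7) is accepted in round 1; the longer paraphrase scores only 0.75 against the gold question, so it clears the threshold in round 2, once the first pseudo-label has joined the training set; the unrelated question is never labeled. In a real system, `fit` would retrain the NMT model, and the threshold would trade pseudo-label volume against label noise.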

Publications:

  • Sarker, S., Dong, X., Li, X., & Qian, L. (2025). Enhancing LLM Fine-tuning for Text-to-SQL by SQL Quality Measurement. 2025 International Joint Conference on Neural Networks (IJCNN 2025). Submitted.
  • Sarker, S., Dong, X., & Qian, L. (2025). Integrating Attention-based Data Preprocessing with LLMs for Text-to-SQL. In preparation.
  • Sarker, S., Dong, X., Li, X., & Qian, L. (2025). Text Generator and Text Discriminator for NIST GenAI T2T Challenge. In preparation.
  • Liu, C., Ye, X., Huang, X., & Xu, Y. (2025). Vertical Dimension of Urban Thermal Environments: A Literature Survey. Cities, 158, 105629.
  • Han, Y., Ye, X., & Zhu, C. (2024). The Unequal Impact of Disasters: Assessing the Interplay Between Social Vulnerability, Public Assistance, Flood Insurance, and Migration in the US. Urban Informatics, 3(1), 1-12.
  • Shaw, S. L., Ye, X., Goodchild, M., & Sui, D. (2024). Human Dynamics Research in GIScience: Challenges and Opportunities. Computational Urban Science, 4(1), 31.
  • Du, J., Ye, X., Huang, X., Qiang, Y., & Zhu, C. (2024). Unveiling Multifaceted Resilience: A Heterogeneous Graph Neural Network Approach for Analyzing Locale Recovery Patterns. Environment and Planning B: Urban Analytics and City Science.
  • Dvir, R., Vedlitz, A., & Ye, X. (2024). Worried (and) Sick: How Environmental Hazards Affect Americans' Health-Related Risk Attitudes. Urban Informatics, 3(1), 26.
  • Ni, W., Zhang, Y., & Li, W. (2024). Optimal Task Admission Control of Private Cloud Data Centers With Limited Resources. IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC).
  • Sarker, S., Qian, L., & Dong, X. (2023). Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI).
  • Abdul-Quddoos, T., Dong, X., & Li, X. (2023). Systematic Comparative Analysis of Pre-trained Large Language Models on Contextualized Medication Event Extraction. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI).
  • Bello, J., Dong, X., & Li, X. (2023). Semi-supervised Single-Shot Object Detection for Table Detection in Scanned Documents. In Proceedings of the Future Technologies Conference (pp. 430-441). Cham: Springer Nature Switzerland.

Educational and Training Activities:

  • Offered Python tutorial training to students in March 2024.
  • Offered a 5-day PhD workshop for the NASA-DEAP program on June 10-14, 2024; 15 PhD students participated, and their
    feedback was very positive.

5-day PhD Workshop for the NASA-DEAP program

  • Roderick Chappell of NASA and PVAMU President Dr. LeGrande highlighted this project at HBCU Week in Philadelphia
    on September 15-19, 2024.
  • On September 16, 2024, Lynn Vernon, Chief Engineer for IT at NASA Johnson Space Center, spoke at the Texas A&M Institute of
    Data Science (TAMIDS) research seminar. He also met with individual faculty members of the NASA-DEAP project to explore research
    collaborations. TAMIDS hosted a student lunch where Vernon discussed NASA's student internship programs in science and engineering.
  • More than 80 staff researchers and students from the NASA-DEAP partner schools attended a one-day workshop, "Generative AI
    Mastery: From Theory to Practice." The free workshop covered generative AI, with a special focus on large language models.
    It opened with the history, theory, and evolution of generative AI, giving participants a solid foundation in machine
    learning, neural networks, and natural language processing. Attendees then gained hands-on experience with key technologies,
    including prompt engineering, retrieval-augmented generation (RAG), and the use of cutting-edge models.
  • Taught a new course in Data Ethics at PVAMU in Fall 2023 (Dr. X. Li).
  • Proposed a new course in Generative AI (LLMs and prompt engineering).
  • The team and NVIDIA co-organized a workshop and training courses on Deep Learning and Artificial Intelligence
    at PVAMU on March 30-31, 2023. About 65 students completed the training and received certificates from NVIDIA.

NVIDIA co-organized Workshop and Training Courses on Deep Learning and Artificial Intelligence

Last Modified: January 2025