DEAP Institute in Research and Education for Science Translation via Low-Resource Neural Machine Translation

(Supported by NASA)

Project Number: 80NSSC22KM0052
Duration: March 2023 – March 2026

This is a collaborative project among teams from Prairie View A&M University (PVAMU), Texas Southern University (TSU), and Texas A&M University (TAMU).

  • Team of Prairie View A&M University: ECE: L. Qian (lead), X. Dong, X. Li, P. Obiomon, R. Wilkins; CS: L. Li; JJ: L. Wu
  • Team of Texas Southern University: W. Li (lead), R. Holmes, Penn-Marshall
  • Team of Texas A&M University: N. Duffield (lead), X. Ye, D. Rodriguez

Goal: 

The goal of this NASA-DEAP project is to build an AI-based system that delivers interactive, instantaneous, and user-relevant Earth science information, making NASA science more discoverable and more accessible to a broad audience.

Question answering has become a crucial application area in AI, especially in natural language processing and machine intelligence. However, most existing question answering solutions focus on “shallow” tasks that merely test whether a system can attend to specific words and pieces of text. In contrast, the knowledge and reasoning required to answer complex questions are far more sophisticated and specialized: the AI engine must understand the question and make inferences that resemble human expertise in the specialized domains of interest to NASA. Furthermore, the proposed system must be able to answer questions from non-scientists that are ambiguous and depend on implicit knowledge not stated in the question. Real-time questions are also often hurried and rife with malformed syntax and spelling errors.

In this project, we plan to address this task by developing a novel method, based on low-resource neural machine translation (NMT), that maps natural-language questions to database queries. NMT has significantly advanced machine translation through end-to-end models that automatically translate a source language into a target language. Because large amounts of annotated data, such as paired questions and queries, may not be available, low-resource NMT techniques will be developed for this project. Specifically, we plan to build a semi-supervised NMT model that improves mapping performance by leveraging large amounts of unlabeled data. In addition, we will explore mapping questions (text) to NASA database queries by leveraging large language models (LLMs) and prompt engineering.
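
As a rough illustration of the prompt-engineering route mentioned above, the following minimal Python sketch assembles a few-shot prompt that asks an LLM to turn a question into a query. It is not the project's implementation: the example questions, the table and column names, the SQL-like query syntax, and the names Example, FEW_SHOT, and build_prompt are all hypothetical and chosen only for illustration. No LLM is called; the sketch only builds the prompt text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One annotated (question, query) pair used as a demonstration."""
    question: str
    query: str


# Hypothetical demonstrations. The schema and query syntax are invented
# for illustration and do not describe any real NASA database.
FEW_SHOT = [
    Example(
        "average sea surface temperature in the gulf of mexico in 2020",
        "SELECT AVG(sea_surface_temp) FROM observations "
        "WHERE region = 'Gulf of Mexico' AND year = 2020",
    ),
    Example(
        "how much rain fell in texas in 2019",
        "SELECT SUM(precipitation) FROM observations "
        "WHERE region = 'Texas' AND year = 2019",
    ),
]


def build_prompt(question, examples=FEW_SHOT):
    """Return a few-shot prompt ending with an open 'Query:' slot."""
    lines = ["Translate each question into a database query.", ""]
    for ex in examples:
        lines.append(f"Question: {ex.question}")
        lines.append(f"Query: {ex.query}")
        lines.append("")
    # Collapse stray whitespace; spelling errors are deliberately left
    # for the language model to interpret.
    cleaned = " ".join(question.split())
    lines.append(f"Question: {cleaned}")
    lines.append("Query:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("  whats   the rainfal in texas  "))
```

The handful of annotated pairs plays the role of the scarce labeled data: the prompt shows the model the target query format, and the user's question, typos included, is appended with the query left open for the model to complete.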

Publications:

  1. Sarker S, Qian L, Dong X. (2023). “Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification.” The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2023.
  2. Abdul-Quddoos T, Dong X, Li X. (2023). “Systematic Comparative Analysis of Pre-trained Large Language Models on Contextualized Medication Event Extraction.” The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2023.
  3. Bello, J., Dong, X., & Li, X. (2023). “Semi-supervised Single-Shot Object Detection for Table Detection in Scanned Documents.” In Proceedings of the Future Technologies Conference (pp. 430-441). Cham: Springer Nature Switzerland.
  4. W. Ni, Y. Zhang, and W. Li (2024). “Optimal Task Admission Control of Private Cloud Data Centers With Limited Resources.” IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC).

Educational and Training Activities:

  • Dr. X. Li taught a new course on Data Ethics at PVAMU in Fall 2023.
  • Proposed a new course in Generative AI (LLMs and prompt engineering).
  • The team and NVIDIA co-organized the Workshop and Training Courses on Deep Learning and Artificial Intelligence on March 30-31, 2023, at PVAMU. About 65 students completed the training and received a certificate from NVIDIA.

 

Last Modified: February 2024