Research

Making AI systems learn more effectively from data, and helping humans learn more efficiently from AI

Research Philosophy

My research is driven by a simple yet powerful question: How can we make learning more efficient? This applies to both artificial intelligence systems and human learners. I believe the most impactful advances come from work that bridges theoretical understanding with practical implementation, creating solutions that are both scientifically rigorous and immediately applicable.

Core Research Themes

Efficiency • Adaptability • Practicality

Current Research Areas

🎯 Domain Adaptation for Large Language Models

**The Challenge**: General-purpose AI models often lack the specialized knowledge needed for expert-level tasks in specific domains like astronomy, medicine, or law.

**My Approach**: Developing cost-effective methodologies to adapt large language models for domain-specific applications without the computational expense of training from scratch.

**Key Work**: The ORBIT methodology demonstrates how to curate high-quality, domain-specific datasets from noisy web sources, achieving significant performance improvements with minimal computational overhead.

**Impact**: Making specialized AI accessible to organizations without massive computational resources.

📊 Efficient Dataset Curation

**The Challenge**: High-quality training data is crucial for AI performance, but manually curating datasets is expensive and time-consuming.

**My Approach**: Developing automated systems that can identify and extract high-quality, domain-specific content from massive, noisy web-scale datasets.

**Key Innovation**: ORBIT reduces dataset curation costs while improving model performance, demonstrated across multiple domains including astronomy, law, and medicine.

**Future Directions**: Expanding to real-time dataset curation and multi-modal data sources.
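To make the idea of filtering noisy web text for domain-specific content concrete, here is a minimal sketch. It is not the ORBIT pipeline: it swaps in a plain keyword-density heuristic for illustration, and the vocabulary `ASTRO_TERMS`, the function names, and the thresholds `min_score` and `min_tokens` are all hypothetical choices.

```python
import re
from typing import Iterable, List, Set

# Hypothetical seed vocabulary for an astronomy filter (illustrative only).
ASTRO_TERMS: Set[str] = {
    "galaxy", "telescope", "redshift", "supernova", "exoplanet", "nebula",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def domain_score(text: str, vocabulary: Set[str]) -> float:
    """Return the fraction of tokens in `text` that appear in `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in vocabulary)
    return hits / len(tokens)


def filter_documents(
    docs: Iterable[str],
    vocabulary: Set[str],
    min_score: float = 0.1,   # assumed threshold, tune per domain
    min_tokens: int = 5,      # drop fragments too short to judge
) -> List[str]:
    """Keep documents that are long enough and dense in domain terms."""
    kept = []
    for doc in docs:
        if len(tokenize(doc)) < min_tokens:
            continue
        if domain_score(doc, vocabulary) >= min_score:
            kept.append(doc)
    return kept


if __name__ == "__main__":
    sample = [
        "The telescope measured the redshift of a distant galaxy last night.",
        "Buy cheap shoes online now with free shipping today.",
        "galaxy",
    ]
    for doc in filter_documents(sample, ASTRO_TERMS):
        print(doc)
```

A keyword heuristic like this is cheap enough to run over very large corpora, which is why it is often used as a first pass before a more expensive learned quality classifier. Whether a given pipeline uses one, the other, or both is a design decision this sketch does not settle.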

🔍 Advanced Information Retrieval

**The Challenge**: AI systems need efficient access to relevant information to give accurate and up-to-date responses.

**My Approach**: Researching advanced retrieval-augmented generation (RAG) systems that can efficiently access and integrate information in real time.

**Key Work**: Contributing to the SIGIR LiveRAG workshop on next-generation retrieval systems.

**Applications**: Improving AI systems' ability to provide accurate, current information across various domains.
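The retrieval step of a RAG system can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the system built for the LiveRAG work: it ranks passages by bag-of-words cosine similarity where a production system would typically use a learned retriever, and the names `retrieve` and `build_prompt` and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {p}" for _, p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    corpus = [
        "Redshift measures how light from distant galaxies is stretched.",
        "Contract law governs agreements between parties.",
        "Aspirin is used to reduce fever and pain.",
    ]
    print(build_prompt("What does redshift measure for galaxies?", corpus, k=1))
```

The resulting prompt would then be passed to a language model. Keeping retrieval separate from generation means the corpus can be updated without retraining the model, which is what lets RAG systems serve current information.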

Research Timeline

2024: Breakthrough Year

  • Published ORBIT methodology in ACL Findings
  • Presented LiveRAG research at SIGIR Workshop
  • Demonstrated cross-domain applicability of efficiency techniques
  • Open-sourced datasets and methodologies for community impact

2023-2024: Deep Dive into Domain Adaptation

  • Developed core ORBIT framework
  • Validated approach across multiple domains
  • Established collaboration patterns between academia and industry

2022-2023: Foundation Building

  • Explored intersection of learning efficiency and AI systems
  • Built relationships with mentors and collaborators
  • Identified key gaps in current methodologies

Current Research Projects

🚀 ORBIT 2.0: Multi-Modal Adaptation

Status: Active Development

Target Publication: ICLR 2025

Goal: Extending ORBIT methodology to handle text, images, and code simultaneously for comprehensive domain adaptation

Key Innovation: Cross-modal quality assessment and unified filtering pipeline

Collaboration: Prof. Chengxiang Zhai (UIUC), Industry Partners

Timeline: 2024-2025

Recent Progress: Preliminary experiments show a 40% improvement on multi-modal domain adaptation tasks.

🧠 Human-AI Learning Efficiency Framework

Status: Data Collection Phase

Target Publication: CHI 2025

Goal: Quantitative framework for measuring and optimizing learning efficiency in human-AI collaborative systems

Key Innovation: Bidirectional learning insights between AI training and human education

Focus: Deliberate practice principles applied to AI system design

Timeline: 2024-2026

Methodology: Conducting longitudinal studies with 200+ participants across different learning domains.

💼 Industrial AI Deployment Lessons

Status: Industry Collaboration

Target Publication: KDD Industry Track 2025

Goal: Documenting real-world deployment challenges and solutions for domain-adapted AI systems

Key Innovation: Bridging academic research with production system requirements

Partnership: Capital One AI/ML Engineering Team

Timeline: 2024-2025

Impact: Systems serving 1M+ daily users, 95% uptime, significant cost reduction vs. general models.

Research Vision & Timeline

🔮 Long-term Research Vision (2025-2030)

Ultimate Goal: Create a unified theory of learning efficiency that applies across AI systems, human cognition, and human-AI collaborative environments.

Phase 1: Foundation (2024-2025)

  • ORBIT 2.0 multi-modal extension
  • Human-AI learning efficiency metrics
  • Industrial deployment frameworks

Phase 2: Integration (2025-2027)

  • Cross-domain efficiency principles
  • Real-time adaptive systems
  • Collaborative learning frameworks

Phase 3: Unification (2027-2030)

  • Universal learning efficiency theory
  • Self-improving AI-human systems
  • Next-generation educational tools

Mentorship & Collaboration

My research is greatly enhanced by working with exceptional mentors and collaborators:

Current Mentor

Professor Chengxiang Zhai

University of Illinois Urbana-Champaign

Expert in information retrieval and text mining. Guiding my work on advanced NLP techniques and domain-specific AI applications.

Former Mentor

Professor Volodymyr Kindratenko

NCSA / University of Illinois

Expert in high-performance computing and AI acceleration. Provided foundation in computational aspects of AI research.

Research Impact & Metrics

Publications Impact

  • ACL Findings: Top-tier venue with significant community reach
  • SIGIR Workshop: Cutting-edge retrieval research community
  • Open Source: Code and datasets available for reproducibility

Practical Impact

  • Cost Reduction: ORBIT methodology reduces dataset curation costs by orders of magnitude
  • Performance: Consistent improvements across multiple domains and benchmarks
  • Accessibility: Making specialized AI more accessible to smaller organizations

Community Engagement

  • Open Source Contributions: All methodologies and datasets publicly available
  • Mentoring: Supporting undergraduate and graduate researchers
  • Industry Collaboration: Bridging academic research with practical applications

Future Directions

Short-term (2024-2025)

  • Extend ORBIT to real-time data curation
  • Explore multi-modal dataset curation techniques
  • Develop more sophisticated domain adaptation methods

Medium-term (2025-2027)

  • Investigate human-AI learning efficiency connections
  • Build comprehensive frameworks for domain-specific AI
  • Establish industry partnerships for practical validation

Long-term Vision

  • Create a unified theory of learning efficiency across AI and human systems
  • Develop tools that make specialized AI accessible to any organization
  • Close the gap between academic research and industry application for good

Interested in Collaboration?

I'm always looking for opportunities to collaborate on research that bridges theory and practice in AI learning efficiency.
