Research

Making AI systems learn more effectively from data, and helping humans learn more efficiently from AI

Research Philosophy

My research is driven by a simple yet powerful question: How can we make learning more efficient? This applies to both artificial intelligence systems and human learners. I believe the most impactful advances come from work that bridges theoretical understanding with practical implementation, creating solutions that are both scientifically rigorous and immediately applicable.

Core Research Themes

Efficiency • Adaptability • Practicality

Current Research Areas

🎯 Domain Adaptation for Large Language Models

**The Challenge**: General-purpose AI models often lack the specialized knowledge needed for expert-level tasks in specific domains like astronomy, medicine, or law.

**My Approach**: Developing cost-effective methodologies to adapt large language models for domain-specific applications without the computational expense of training from scratch.

**Key Work**: The ORBIT methodology demonstrates how to curate high-quality, domain-specific datasets from noisy web sources, achieving significant performance improvements with minimal computational overhead.

**Impact**: Making specialized AI accessible to organizations without massive computational resources.

📊 Efficient Dataset Curation

**The Challenge**: High-quality training data is crucial for AI performance, but manually curating datasets is expensive and time-consuming.

**My Approach**: Developing automated systems that can identify and extract high-quality, domain-specific content from massive, noisy web-scale datasets.

**Key Innovation**: ORBIT reduces dataset curation costs while improving model performance, as demonstrated across multiple domains including astronomy, law, and medicine.

**Future Directions**: Expanding to real-time dataset curation and multi-modal data sources.
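To give a feel for what automated domain filtering involves, here is a deliberately simple sketch: each document is scored by the fraction of its tokens drawn from a small domain vocabulary, and only documents above a threshold are kept. This is a generic keyword heuristic for illustration only, not the ORBIT pipeline; the vocabulary `DOMAIN_TERMS`, the `min_score` threshold, and the function names are all hypothetical.

```python
import re
from typing import Iterable

# Hypothetical astronomy vocabulary (illustration only); real curation
# pipelines rely on much richer quality and relevance signals.
DOMAIN_TERMS = {"galaxy", "redshift", "telescope", "exoplanet", "nebula", "spectroscopy"}

TOKEN_RE = re.compile(r"[a-z]+")


def domain_score(text: str, terms: set[str] = DOMAIN_TERMS) -> float:
    """Fraction of lowercase alphabetic tokens in `text` that are domain terms."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(tok in terms for tok in tokens) / len(tokens)


def filter_documents(docs: Iterable[str], min_score: float = 0.1) -> list[str]:
    """Keep documents whose domain score meets the hypothetical threshold."""
    return [doc for doc in docs if domain_score(doc) >= min_score]


if __name__ == "__main__":
    corpus = [
        "The telescope measured the redshift of a distant galaxy.",
        "Top ten pasta recipes for a busy weeknight.",
        "",
    ]
    print(filter_documents(corpus))
```

Only the astronomy sentence survives here: three of its nine tokens are domain terms, while the recipe line and the empty string score zero. A keyword ratio is cheap but brittle; it is shown only because it makes the score-then-threshold structure of automated curation easy to see.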

🔍 Advanced Information Retrieval

**The Challenge**: AI systems need efficient access to relevant information for accurate and up-to-date responses.

**My Approach**: Researching advanced retrieval-augmented generation (RAG) systems that can efficiently access and integrate information in real time.

**Key Work**: Contributing to the SIGIR LiveRAG workshop on next-generation retrieval systems.

**Applications**: Improving AI systems' ability to provide accurate, current information across various domains.
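The retrieval step at the heart of a RAG system can be sketched in a few lines: rank a collection of passages against the query, then place the top passages into the prompt that is handed to a generator model. The example below uses plain bag-of-words cosine similarity and a hypothetical prompt template purely for illustration; it is not the LiveRAG system, and practical retrievers usually rely on BM25 or dense embeddings instead.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts for a piece of text."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Hypothetical prompt template: retrieved context followed by the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "SIGIR is a conference on information retrieval.",
        "Redshift measures how far light has been stretched.",
        "Retrieval-augmented generation grounds answers in retrieved text.",
    ]
    print(build_prompt("What is retrieval-augmented generation?", docs, k=1))
```

With `k=1`, the passage about retrieval-augmented generation is selected because it shares the most query terms. The resulting prompt string would then be passed to a language model, which is the "generation" half that this sketch leaves out.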

Research Timeline

2024: Breakthrough Year

Published ORBIT methodology in ACL Findings
Presented LiveRAG research at SIGIR Workshop
Demonstrated cross-domain applicability of efficiency techniques
Open-sourced datasets and methodologies for community impact

2023-2024: Deep Dive into Domain Adaptation

Developed core ORBIT framework
Validated approach across multiple domains
Established collaborations between academia and industry

2022-2023: Foundation Building

Explored the intersection of learning efficiency and AI systems
Built relationships with mentors and collaborators
Identified key gaps in current methodologies

Current Research Projects

🚀 ORBIT 2.0: Multi-Modal Adaptation

Status: Active Development

Target Publication: ICLR 2025

Goal: Extending the ORBIT methodology to handle text, images, and code simultaneously for comprehensive domain adaptation

Key Innovation: Cross-modal quality assessment and unified filtering pipeline

Collaboration: Prof. Chengxiang Zhai (UIUC), Industry Partners

Timeline: 2024-2025

Recent Progress: Preliminary experiments show 40% improvement in multi-modal domain adaptation tasks.

🧠 Human-AI Learning Efficiency Framework

Status: Data Collection Phase

Target Publication: CHI 2025

Goal: Developing a quantitative framework for measuring and optimizing learning efficiency in human-AI collaborative systems

Key Innovation: Bidirectional learning insights between AI training and human education

Focus: Deliberate practice principles applied to AI system design

Timeline: 2024-2026

Methodology: Conducting longitudinal studies with 200+ participants across different learning domains.

💼 Industrial AI Deployment Lessons

Status: Industry Collaboration

Target Publication: KDD Industry Track 2025

Goal: Documenting real-world deployment challenges and solutions for domain-adapted AI systems

Key Innovation: Bridging academic research with production system requirements

Partnership: Capital One AI/ML Engineering Team

Timeline: 2024-2025

Impact: Systems serving 1M+ daily users with 95% uptime and a significant cost reduction compared with general-purpose models.

Research Vision & Timeline

🔮 Long-term Research Vision (2024-2030)

Ultimate Goal: Create a unified theory of learning efficiency that applies across AI systems, human cognition, and human-AI collaborative environments.

Phase 1: Foundation (2024-2025)

ORBIT 2.0 multi-modal extension
Human-AI learning efficiency metrics
Industrial deployment frameworks

Phase 2: Integration (2025-2027)

Cross-domain efficiency principles
Real-time adaptive systems
Collaborative learning frameworks

Phase 3: Unification (2027-2030)

Universal learning efficiency theory
Self-improving AI-human systems
Next-generation educational tools

Mentorship & Collaboration

My research benefits greatly from working with exceptional mentors and collaborators:

Current Mentor

Professor Chengxiang Zhai

University of Illinois Urbana-Champaign

Expert in information retrieval and text mining. Guiding my work on advanced NLP techniques and domain-specific AI applications.


Former Mentor

Professor Volodymyr Kindratenko

NCSA / University of Illinois

Expert in high-performance computing and AI acceleration. Provided foundation in computational aspects of AI research.


Research Impact & Metrics

Publications Impact

ACL Findings: Top-tier venue with significant community reach
SIGIR Workshop: Cutting-edge retrieval research community
Open Source: Code and datasets available for reproducibility

Practical Impact

Cost Reduction: ORBIT methodology reduces dataset curation costs by orders of magnitude
Performance: Consistent improvements across multiple domains and benchmarks
Accessibility: Making specialized AI more accessible to smaller organizations

Community Engagement

Open Source Contributions: All methodologies and datasets publicly available
Mentoring: Supporting undergraduate and graduate researchers
Industry Collaboration: Bridging academic research with practical applications

Future Directions

Short-term (2024-2025)

Extend ORBIT to real-time data curation
Explore multi-modal dataset curation techniques
Develop more sophisticated domain adaptation methods

Medium-term (2025-2027)

Investigate connections between human and AI learning efficiency
Build comprehensive frameworks for domain-specific AI
Establish industry partnerships for practical validation

Long-term Vision

Create a unified theory of learning efficiency across AI and human systems
Develop tools that make specialized AI accessible to any organization
Bridge the gap between academic research and industry application in a lasting way

Resources & Tools

💻 ORBIT Code 🤗 Models 📊 Datasets

Interested in Collaboration?

I'm always looking for opportunities to collaborate on research that bridges theory and practice in AI learning efficiency.
