Scaling Sparse Mixture-of-Experts to 1T Parameters
A. Chen, M. Kowalski, R. Patel, S. Nakamura
We demonstrate stable training dynamics for trillion-parameter sparse
MoE architectures with a novel load-balancing loss and expert-choice routing,
achieving 3.2x higher throughput than dense baselines.
ARCHITECTURE · SCALING · MOE
2026 · NEURIPS
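The expert-choice routing named here inverts the usual token-choice scheme: experts pick tokens rather than tokens picking experts, so every expert processes the same number of tokens and load balance holds by construction. Below is a minimal NumPy sketch of that idea; the function name, shapes, and `capacity_factor` default are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def expert_choice_route(x, w_router, capacity_factor=1.0):
    """x: (n_tokens, d) token states; w_router: (d, n_experts)."""
    n_tokens = x.shape[0]
    n_experts = w_router.shape[1]
    scores = x @ w_router                                # (n_tokens, n_experts)
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)            # softmax over experts
    # Each expert takes exactly `capacity` tokens, so load balance holds
    # by construction rather than only via an auxiliary balancing loss.
    capacity = int(capacity_factor * n_tokens / n_experts)
    token_ids = np.argsort(-probs, axis=0)[:capacity]    # (capacity, n_experts)
    gate = np.take_along_axis(probs, token_ids, axis=0)  # combine weights
    return token_ids, gate

rng = np.random.default_rng(0)
ids, gate = expert_choice_route(rng.normal(size=(16, 8)),
                                rng.normal(size=(8, 4)))
print(ids.shape, gate.shape)  # (4, 4) (4, 4)
```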
Constitutional AI Alignment via Recursive Reward Modeling
L. Wagner, J. Kim, D. Okafor, T. Zhang
A scalable alignment framework that trains models to follow constitutional
principles through iterative self-critique and refinement, reducing harmful
outputs by 94% without human annotation overhead.
ALIGNMENT · SAFETY · RLHF
2025 · ARXIV
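The self-critique-and-refinement loop described in the abstract can be sketched as follows, assuming only a generic `model(prompt) -> str` callable; the prompt templates and stub model are illustrative, not the paper's actual pipeline.

```python
def constitutional_refine(model, prompt, principles, n_rounds=2):
    """Iteratively critique and rewrite a response against each principle."""
    response = model(prompt)
    for _ in range(n_rounds):
        for principle in principles:
            critique = model(
                f"Principle: {principle}\nResponse: {response}\n"
                "Point out any way the response violates the principle."
            )
            response = model(
                f"Response: {response}\nCritique: {critique}\n"
                "Rewrite the response to fully address the critique."
            )
    return response

# Toy run with a stub model; in recursive reward modeling, the resulting
# (original, refined) pairs can serve as preference data without human labels.
echo = lambda p: p.splitlines()[-1]
print(constitutional_refine(echo, "Explain phishing.", ["be harmless"], 1))
```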
Autonomous Tool-Use Agents with Hierarchical Planning
S. Nakamura, A. Chen, R. Patel
We introduce HATS, a hierarchical agent-task system that decomposes complex
multi-step objectives into executable sub-goals, achieving a 78% success rate
on the WebArena benchmark with 4x fewer steps than flat baselines.
AGENTS · PLANNING · BENCHMARKS
2025 · ICLR
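The decompose-then-execute loop at the heart of such hierarchical agents can be sketched in a few lines. The `plan` and `execute` callables here are hypothetical stand-ins for planner and executor model calls; HATS's actual interfaces are not described in this summary.

```python
def run_hierarchical_agent(objective, plan, execute, max_replans=3):
    """Decompose `objective` into sub-goals, execute them in order,
    and re-plan from the completed prefix when a sub-goal fails."""
    completed, replans = [], 0
    subgoals = list(plan(objective, completed))
    while subgoals:
        goal = subgoals.pop(0)
        if execute(goal):
            completed.append(goal)
        elif replans < max_replans:
            replans += 1
            subgoals = list(plan(objective, completed))  # re-plan remainder
        else:
            break
    return completed

# Toy run with stub planner/executor callables.
steps = run_hierarchical_agent(
    "book a flight",
    plan=lambda obj, done: [] if done else ["search", "select", "pay"],
    execute=lambda g: True,
)
print(steps)  # ['search', 'select', 'pay']
```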
Efficient Long-Context Training with Ring Attention
J. Kim, T. Zhang, L. Wagner
Ring attention with blockwise computation enables training on 1M+ token
contexts at near-linear scaling efficiency across 512 GPUs, unlocking new
capabilities in document understanding and code synthesis.
TRAINING · CONTEXT · SYSTEMS
2025 · EMNLP
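In ring attention, each device keeps its query block fixed while key/value blocks rotate around the ring, and partial results accumulate with an online (streaming) softmax so no device ever materializes the full attention matrix. Below is a single-process NumPy simulation of that blockwise pass; the in-process "ring" and shapes are illustrative, and a real system overlaps each step with device-to-device communication.

```python
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    n = len(q_blocks)
    outs = []
    for i in range(n):                          # "device" holding q block i
        q = q_blocks[i]
        m = np.full(q.shape[0], -np.inf)        # running max per query row
        l = np.zeros(q.shape[0])                # running softmax denominator
        acc = np.zeros_like(q)                  # running weighted-V sum
        for step in range(n):                   # K/V block arriving this step
            j = (i + step) % n
            s = q @ k_blocks[j].T               # block of attention logits
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)           # rescale old partial sums
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ v_blocks[j]
            m = m_new
        outs.append(acc / l[:, None])
    return np.concatenate(outs)

rng = np.random.default_rng(0)
blocks = lambda x: np.split(x, 4)               # 4-way "ring" of blocks
out = ring_attention(blocks(rng.normal(size=(8, 4))),
                     blocks(rng.normal(size=(8, 4))),
                     blocks(rng.normal(size=(8, 4))))
print(out.shape)  # (8, 4)
```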
Multimodal Grounding via Cross-Attention Fusion
M. Kowalski, D. Okafor, S. Nakamura
A unified cross-attention architecture that grounds language in visual,
audio, and sensor modalities simultaneously, surpassing single-modality
fine-tuned models on 12 of 15 multimodal benchmarks.
MULTIMODAL · VISION · FUSION
2025 · COLM
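A minimal sketch of the fusion pattern described above: language tokens query a shared pool of vision, audio, and sensor tokens through one cross-attention step. The pre-encoded features, projection weights, and dimensions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(text, modalities, wq, wk, wv):
    """Language tokens attend over all modality tokens simultaneously."""
    pool = np.concatenate(modalities, axis=0)   # fuse modality token streams
    q, k, v = text @ wq, pool @ wk, pool @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
    return text + attn @ v                      # residual fused update

rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(5, d))
vision, audio, sensor = (rng.normal(size=(n, d)) for n in (10, 6, 3))
fused = cross_attend(text, [vision, audio, sensor],
                     *(rng.normal(size=(d, d)) for _ in range(3)))
print(fused.shape)  # (5, 16)
```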
Speculative Decoding with Dynamic Draft Trees
R. Patel, A. Chen, T. Zhang
Dynamic draft tree construction adapts speculative decoding depth to
token-level uncertainty, achieving a 2.8x inference speedup while preserving
the exact output distribution of the target model.
INFERENCE · OPTIMIZATION · DECODING
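One way to adapt tree shape to token-level uncertainty is to let confident draft steps extend deep single chains while uncertain steps spend budget on sibling branches. The sketch below assumes a hypothetical `draft_next(prefix) -> (tokens, probs)` draft-model call, and the entropy threshold and budgets are illustrative; target-model verification, which is what preserves the exact output distribution, is omitted.

```python
import math

def build_draft_tree(prefix, draft_next, budget=16, max_depth=6):
    """Grow a candidate-token tree whose branching tracks draft entropy."""
    tree, frontier = [], [(list(prefix), 0)]
    while frontier and budget > 0:
        node, depth = frontier.pop(0)
        if depth >= max_depth:
            continue
        tokens, probs = draft_next(node)
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        width = 1 if entropy < 1.0 else min(3, budget)  # branch when unsure
        for tok in tokens[:width]:
            child = node + [tok]
            tree.append(child)
            frontier.append((child, depth + 1))
            budget -= 1
    return tree

# Toy run: a confident stub draft model yields one deep chain, not a bush.
cands = build_draft_tree(
    [101], draft_next=lambda pre: ([7, 8, 9], [0.9, 0.07, 0.03]))
print(len(cands), cands[-1])  # 6 [101, 7, 7, 7, 7, 7, 7]
```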
TEAM
RESEARCHERS & ENGINEERS
Dr. Alex Chen
Chief Scientist
Former DeepMind, Stanford PhD. 40+ papers in scaling laws & architecture design.
Dr. Lena Wagner
Head of Alignment
Previously OpenAI safety. Pioneering constitutional AI and recursive reward modeling.
Dr. Jae Kim
Research Lead
MIT PhD. Long-context training, ring attention, distributed systems at scale.