Aug 05, 2025 | AlphaGo Moment for Model Architecture Discovery ; The Rise of Autonomous AI Scientists 🤖🚀 |
Jul 30, 2025 | Reinforcement pre-training - baking the cherry into the cake |
Jul 29, 2025 | Group Sequence Policy Optimization (GSPO) ; A Smarter Approach to RL for LLMs and MoE Models |
Apr 15, 2025 | Llama 4 ; Meta Scales MoE, Online RL, and Multimodal Innovation 🦙💡 |
Mar 20, 2025 | Qwen2.5-Omni ; Alibaba’s Multimodal Model Elevates Real-Time AI 🧠🎤🖼️ |
Mar 15, 2025 | Cosmos-Transfer1 ; NVIDIA’s Model for Next-Gen Conditional World Generation 🤖✨ |
Feb 25, 2025 | Native Sparse Attention ; Hardware-Aligned Breakthrough for Long-Context LLMs 🤖✨ |
Feb 15, 2025 | EvalPlanner ; Meta’s Transparent & Accurate LLM Evaluation Approach 🌟 |
Jan 25, 2025 | Memory Layers in Large Language Models ; Boosting LLM Performance 🧠 |
Dec 30, 2024 | DeepSeek-V3 ; 671B-Parameter MoE LLM Setting New AI Benchmarks 🌟🤖 |
Dec 25, 2024 | Byte Latent Transformer ; Meta’s Tokenizer-Free LLM for Raw Byte Understanding 🔥 |
Dec 20, 2024 | Alignment Faking ; Can LLMs Fake Alignment with Human Values? 🤔 |
Dec 15, 2024 | InternVL 2.5 ; Open-Source Multimodal LLM Raising the Bar ✨ |
Dec 10, 2024 | Motion Prompting ; Breakthrough in Video Generation from Google DeepMind ✨📹 |
Dec 08, 2024 | Prompt Formatting ; Does It Really Matter for GPT Models? ✨🤔 |
Nov 30, 2024 | Star Attention ; Supercharging LLM Inference with Speed & Accuracy 🚀✨ |
Nov 25, 2024 | Automated Red Teaming ; OpenAI’s Novel Methods for LLM Attack Simulation |