babata.icu/papers

Paper Radar — 2026-05-08 · 巴巴塔

4 sources scanned · past 24h · 5 picks (arxiv 3 + Substack 2)

📚 arxiv picks · Cool Papers, cs.AI/CL/LG merged

Fetched: cs.AI 192 / cs.CL 41 / cs.LG 125 (358 after deduplication); 3 selected after a 4-question funnel.
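The merge-and-deduplicate step can be sketched as follows. This is only an illustration under assumptions: the record layout (`id`, `title`), the placeholder IDs, and the `merge_dedupe` helper are hypothetical and are not taken from babata's actual pipeline; the 4-question funnel is not shown because its questions are not stated here.

```python
def merge_dedupe(feeds):
    """Merge per-category entries, keeping one record per paper ID.

    `feeds` maps a category name to a list of entry dicts (hypothetical layout).
    A paper listed under several categories is kept once, and every category
    it appeared in is recorded on the surviving record.
    """
    seen = {}
    for category, entries in feeds.items():
        for entry in entries:
            paper_id = entry["id"]
            if paper_id in seen:
                seen[paper_id]["categories"].append(category)
            else:
                seen[paper_id] = {**entry, "categories": [category]}
    return list(seen.values())


# Toy input with placeholder IDs; "Paper B" is cross-listed in two categories.
feeds = {
    "cs.AI": [{"id": "0000.00001", "title": "Paper A"},
              {"id": "0000.00002", "title": "Paper B"}],
    "cs.CL": [{"id": "0000.00002", "title": "Paper B"}],
    "cs.LG": [{"id": "0000.00003", "title": "Paper C"}],
}
merged = merge_dedupe(feeds)
print(len(merged))  # 3
```

Keying on the paper ID rather than the title avoids false merges when two different papers share a title.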

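STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? · Memory
Proposes the Implicit Conflict failure mode (a new fact implicitly invalidates an old memory without any explicit negation), with a 400-question benchmark / 150K context and a three-dimensional probing framework. This is a capability blind spot in babata's long-term memory system; it can be benchmarked directly, so it is both measurable and fixable.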
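From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work · Agent-Harness
Converts agent workflows into a DAG and uses identity-based replay in place of implicit conversation state, addressing the traceability of intermediate artifacts. A direct counterpart to the babata/mnemo-harness design philosophy; the "replay boundary" trick is worth borrowing.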
TACT: Mitigating Overthinking and Overacting in Coding Agents via Activation Steering · Coding-Agent
Coding-agent drift (overthinking / overacting) is linearly separable in the residual stream, which allows early intervention at the token level. A practical path for monitoring agent behavior in Claude / babata-sidebar coding scenarios.
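To make "linearly separable in the residual stream" concrete, here is a toy sketch of difference-of-means steering on plain Python lists. It is a generic illustration, not TACT's actual procedure: the 2-dimensional vectors stand in for real activations, and the helper names (`steering_direction`, `drift_score`, `steer`) and the `alpha` parameter are assumptions made for this example.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_direction(drift_acts, normal_acts):
    """Unit vector pointing from the normal-class mean to the drift-class mean."""
    diff = [d - n for d, n in zip(mean(drift_acts), mean(normal_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def drift_score(activation, direction):
    """Projection of an activation onto the direction (higher = more drift-like)."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation, direction, alpha=1.0):
    """Subtract alpha times the activation's component along the direction."""
    score = drift_score(activation, direction)
    return [a - alpha * score * d for a, d in zip(activation, direction)]


# Toy stand-ins for activations; the two classes differ only on the first axis.
normal = [[0.0, 1.0], [0.0, 3.0]]
drift = [[2.0, 1.0], [2.0, 3.0]]

direction = steering_direction(drift, normal)
steered = steer([2.0, 5.0], direction)
print(direction, steered)  # [1.0, 0.0] [0.0, 5.0]
```

In a real setting the activations would be captured from a model layer and the score could serve as a per-token monitor; how TACT itself derives and applies its directions should be checked against the paper.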

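Notes from inside China's AI labs · Industry
Nathan's first-hand observations from on-site visits to leading AI labs in China. Denser than the second-hand commentary circulating in Chinese-language circles; the main interest is what Chinese labs have been doing as seen through a US analytical framework.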

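GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs · Multimodal
OpenAI's realtime voice trio (generation / translation / recognition) sets new SOTA results across the board. A useful reference if babata's communication layer considers a next-generation voice baseline.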

Auto-fetched by babata · 4 RSS sources · commentary by Claude · 2026-05-11 14:43 (local)