Tech Weekly | 2026-05-11

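In the first week of May, Anthropic used Code w/ Claude 2026 to announce the year's largest compute deal: exclusive capacity at the Colossus data center, signed with SpaceX/xAI, at 300MW / $5B/yr, alongside an annualized ARR growth rate of 8000%. In the same period, Mozilla used the Claude Mythos preview to hunt for vulnerabilities in Firefox and surfaced "hundreds" of AI-generated reports whose quality was unsettlingly good; Simon Willison's comment was "suddenly the bugs are very good". METR's chart of the "doubling time of AI task length" was also pushed past its upper limit by Mythos. Within a single week, two events together confirmed that frontier coding agents can actually get hands-on and fix real production code; this is no longer a paper-level story.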

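OpenAI followed a different script and shipped in one batch: GPT-5.5 Instant (new ChatGPT default + system card), GPT-5.5-Cyber (Trusted Access for defenders), the GPT-Realtime-2 / -Translate / -Whisper trio of SOTA voice APIs, and MRC (Multipath Reliable Connection), a network protocol for large-scale training clusters that it is open-sourcing through OCP. At the same time it stepped into the deep end of commercialization: ChatGPT began testing ads + a self-serve Ads Manager + CPC bidding, alongside the Trusted Contact safety feature and a run of enterprise case studies from PwC / Uber / Parloa / Singular Bank / Simplex. This week OpenAI packed four tracks, "model + commercialization + infrastructure + safety", into a single release window.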

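The opposing camp was also unusually loud: Gary Marcus published "Autonomous Agents are a Shitshow", "Agents and ROI", and "Misplaced panic over AI progress" (picking apart the METR chart) in quick succession; Ed Zitron fired twice on wheresyoured.at with "AI's Circular Psychosis" and "The AI Compute Demand Story Is A Lie" (arguing that today's capacity crunch reflects hyperscaler imbalance, not real demand); Joan Westenberg wrote about the "war between fast and legitimate"; and Sean Goedecke offered a contrarian, optimistic view in "AI makes weak engineers less harmful". On the security side, the Canvas data extortion swept through 9000 schools across the US and 275M student records, and with ShinyHunters continuing to break through big-name defenses, the education and enterprise supply chains were both lit up at once. Business gossip: Y Combinator holds 0.6% of OpenAI (dug up by Gruber; at an $852B valuation that back-calculates to roughly $5B), and NYT court filings revealed that Google owns a large stake in Anthropic, so both supposedly "neutral" backers were called out for conflicts of interest.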
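The stake figure above is easy to sanity-check. A minimal sketch of the back-calculation, assuming only the 0.6% stake and the $852B valuation quoted in this digest; the variable names are illustrative and this is not an independent valuation:

```python
# Back-of-the-envelope check of the YC stake value quoted above.
# Both inputs are taken from this digest, not from an external source.
stake_fraction = 0.006   # 0.6% stake
valuation_usd = 852e9    # $852B valuation

stake_value_usd = stake_fraction * valuation_usd
print(f"Implied stake value: ${stake_value_usd / 1e9:.2f}B")
```

The result lands slightly above $5.1B, consistent with the "roughly $5B" reading.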

TOP 01

Behind the Scenes Hardening Firefox with Claude Mythos Preview

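After getting access to the Claude Mythos preview, Mozilla used it to locate and fix hundreds of vulnerabilities in Firefox. The key passage Simon quotes from the original post is "Suddenly, the bugs are very good": a few months ago AI-generated security reports were mostly known-bad slop, and now their quality has jumped from noise to signal. Anthropic showcased this directly in its Code w/ Claude keynote in the same period.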

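📌 Why we recommend it: AI security research has crossed the threshold from "the demo runs" to "it keeps producing real CVEs in an industrial-grade codebase". For everyone who maintains an open-source project, this is a signal to switch the "dismiss AI reports by default" policy back to "review seriously by default"; for blue teams / red teams, the tool stack needs reordering.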

TOP 02

Notes on the xAI/Anthropic data center deal

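Anthropic signed a contract with SpaceX/xAI to "take up all of Colossus's capacity". Latent Space's deep dive the same week gives the numbers: 300MW / $5B/yr / 8000% annualized ARR growth. Colossus is Musk's Memphis data center, the one criticized over pollution permits for its gas turbines. Simon called this the biggest news of the day outside the keynote.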

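📌 Why we recommend it: the latest step in AI capacity concentrating in a handful of hyperscale sites. It also puts "Anthropic's environmental narrative" in direct conflict with "Colossus's environmental controversy", which activists and regulators will keep pushing on. For independent vendors, the compute monopoly deepens further.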
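For a sense of scale, an annualized growth figure can be converted into an equivalent monthly rate. A rough sketch, assuming "8000% annualized" means revenue multiplying by 81x over twelve months of steady compounding; the digest does not say how the figure was computed, so both that assumption and the helper name are illustrative:

```python
def monthly_rate_from_annual(annual_growth_pct: float) -> float:
    """Equivalent compound monthly growth (in %) for a given annual growth (in %)."""
    annual_multiple = 1.0 + annual_growth_pct / 100.0    # 8000% growth -> 81x
    monthly_multiple = annual_multiple ** (1.0 / 12.0)   # assumes steady compounding
    return (monthly_multiple - 1.0) * 100.0

rate = monthly_rate_from_annual(8000.0)
print(f"Equivalent monthly growth: {rate:.1f}%")
```

Under that assumption the figure corresponds to a bit over 44% growth per month, which is why annualized numbers taken from a short window should be read with care.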

TOP 03

GPT-5.5 Instant: smarter, clearer, and more personalized

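ChatGPT's default model was upgraded to GPT-5.5 Instant, which focuses on reducing hallucinations and adding personalization controls. A system card was released alongside it. As a companion piece, OpenAI also wrote up how it runs Codex safely inside the company (sandboxing / approvals / network policies / agent-native telemetry) as a separate blog post, a counterpart to Anthropic's "harness-layer bug postmortem".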

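📌 Why we recommend it: the mainline model cadence has not paused. 5.5 Instant is usable the moment it ships and affects what ordinary users feel more directly than 5.5 thinking does. The post on running Codex safely is rare reference material on "agent harness protections inside a frontier lab" and is worth reading for agent system developers.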
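To make the "approvals + network policies" idea concrete, here is a minimal sketch of a policy gate in front of agent tool calls. It is not OpenAI's implementation, and the digest does not reproduce the post's details; the `ToolPolicy` class, the tool names, and the allowed host are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ToolPolicy:
    """Hypothetical policy: free tools, approval-gated tools, reachable hosts."""
    auto_approved: set = field(default_factory=lambda: {"read_file"})
    needs_approval: set = field(default_factory=lambda: {"write_file", "http_get"})
    allowed_hosts: set = field(default_factory=lambda: {"example.com"})


def check_tool_call(policy: ToolPolicy, tool: str, args: dict,
                    approved_by_human: bool = False) -> str:
    """Return 'allow', 'ask', or 'deny' for a proposed tool call."""
    # Network policy comes first: a disallowed host is denied even with approval.
    url = args.get("url")
    if url is not None and urlparse(url).hostname not in policy.allowed_hosts:
        return "deny"
    if tool in policy.auto_approved:
        return "allow"
    if tool in policy.needs_approval:
        return "allow" if approved_by_human else "ask"
    return "deny"  # unknown tools are denied by default


policy = ToolPolicy()
print(check_tool_call(policy, "read_file", {"path": "README.md"}))               # allow
print(check_tool_call(policy, "http_get", {"url": "https://example.com/data"}))  # ask
print(check_tool_call(policy, "http_get", {"url": "https://evil.test/x"},
                      approved_by_human=True))                                   # deny
```

The design choice worth noting is the ordering: the network allowlist is evaluated before human approval, so an approval cannot widen what the sandbox can reach.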

TOP 04

OpenAI open-sources MRC (Multipath Reliable Connection), a network protocol for training clusters

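OpenAI is open-sourcing MRC, its in-house supercomputer network protocol, through OCP (Open Compute Project) to address the reliability and performance problems of large-scale training fabrics. It is a heavily software-defined transport layer that uses multipath and replaces part of the RDMA / RoCE path.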

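📌 Why we recommend it: a signal for "software-defined fabric" in AI infra. Large-model training is approaching the limits of the physical fabric, and once the protocol layer is open-sourced it will shift the balance of influence among NVIDIA / Broadcom / Cisco. Worth following for engineers who track systems / networking.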
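As a toy illustration of the general multipath idea only: packets are spread across several paths, and anything sent over a failed path is resent over a healthy one. This is not MRC's algorithm (the digest gives no protocol details); the function names and the round-robin choice are assumptions made for the sketch.

```python
from collections import defaultdict


def spray_packets(packet_ids, paths):
    """Assign packets to paths round-robin (a simple stand-in for multipath spraying)."""
    assignment = defaultdict(list)
    for i, pkt in enumerate(packet_ids):
        assignment[paths[i % len(paths)]].append(pkt)
    return dict(assignment)


def deliver(assignment, failed_paths):
    """Return (delivered, retransmitted); packets on failed paths are resent elsewhere."""
    if all(path in failed_paths for path in assignment):
        raise RuntimeError("no healthy path available")
    delivered, retransmitted = [], []
    for path, pkts in assignment.items():
        if path in failed_paths:
            retransmitted.extend(pkts)  # lost in flight, must be resent
        else:
            delivered.extend(pkts)
    delivered.extend(retransmitted)     # resent packets arrive via a healthy path
    return sorted(delivered), sorted(retransmitted)


assignment = spray_packets(list(range(8)), ["A", "B", "C", "D"])
delivered, retransmitted = deliver(assignment, failed_paths={"B"})
print(delivered)      # [0, 1, 2, 3, 4, 5, 6, 7]
print(retransmitted)  # [1, 5]
```

With one of four paths down, every packet still arrives and only the packets originally routed over the failed path need a second send.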

TOP 05

Pushing Local Models With Focus And Polish — Armin Ronacher

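Flask author Armin wrote a long essay arguing "why I really want local models to be usable". It is not a technical review but an argument that "locking the barrier to experimentation behind hosted APIs will permanently shut ordinary developers out of AI experimentation". As a counterpoint: this week's hot r/LocalLLaMA thread, "Hugging Face co-founder says Qwen 3.6 27B in airplane offline mode ≈ Claude Code Opus" (1736 upvotes).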

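📌 Why we recommend it: it is not that open-source / local models had nothing new this week; rather, there is now real evidence of them "approaching the feel of commercial hosted models". Read this together with the IBM Granite 4.1 / NVIDIA Star Elastic / Qwen 3.6 series releases; it is the main thread on the open-source side this week.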

TOP 06

Code w/ Claude 2026: Simon Willison's on-site live blog

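Anthropic's once-a-year home event. Simon drew on years of experience live-blogging developer conferences to produce the most complete on-site record. Beyond Mythos / Colossus, it covered Claude Code toolchain updates and enterprise case studies. Companion reading: Simon's same-day "Using Claude Code: The Unreasonable Effectiveness of HTML", which covers the argument by Thariq Shihipar of Anthropic's internal team for having Claude output HTML rather than Markdown as the standard agent output format.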

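📌 Why we recommend it: this is the factual baseline for this week's AI product / agent engineering. Code w/ Claude is the window that shapes Anthropic's roadmap most deeply each year, and it bears directly on the capability boundary of babata itself (which shares its origins with Claude Code).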

TOP 07

Canvas Breach Disrupts Schools & Colleges Nationwide

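Canvas (an LMS widely used in US K-12 and higher education) was hit by data extortion affecting data on 275M students / faculty and staff and nearly 9000 schools. The attackers put the ransom note directly on the login page. Troy Hunt's weekly update the same week singled out ShinyHunters' sustained output: on the surface a "low-tech teenage group", yet it keeps breaching major brands, with a leverage ratio that makes even traditional APTs look bad.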

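📌 Why we recommend it: an indicator event for both the education vertical and the supply chain. Once a SaaS service becomes the de facto monopoly in a vertical, it is a single point of risk, and ed-tech has now had that demonstrated for real.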

TOP 08

The AI Compute Demand Story Is A Lie — Ed Zitron

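Zitron's argument: the AI industry's current "compute shortage" narrative is not driven by real demand but is a byproduct of hyperscalers piling up capacity and digesting capital expenditure; in his telling, Anthropic cannot afford its cloud bills and pays only when others pay too. In the same period he also published "AI's Circular Psychosis", which lays out the same logic. Gary Marcus also published three pieces during the week (Agents and ROI / Autonomous Agents are a Shitshow / Misplaced panic over AI progress).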

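📌 Why we recommend it: the opposing camp was louder this week than at any point in the past six months, in sharp contrast to the optimistic narrative of Anthropic's 8000% ARR. It may not all be right, but everyone doing AI infra / agent business planning should read it once as a stress test.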

🤖 AI & ML

  • GPT-5.5 Instant mainline upgrade + System Card | new ChatGPT default | openai.com
  • GPT-5.5 Instant System Card | full safety / capability evaluation | openai.com
  • Granite 4.1 3B SVG Pelican Gallery | IBM Apache 2.0 family (3B/8B/30B); Simon runs the pelican benchmark | simonwillison.net
  • NVIDIA Star Elastic | one checkpoint containing 30B/23B/12B reasoning models, zero-shot slicing | reddit.com
  • HF co-founder: Qwen 3.6 27B airplane mode ≈ Opus on Claude Code | viral post with 1736 upvotes | reddit.com
  • Qwen 3.6 35b a3b 8GB VRAM + 32GB RAM ~190k context | community-verified low-spec setup | reddit.com
  • DeepSeek V4 Pro at home | community show-off post, 207 upvotes | reddit.com
  • DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation | 85 tok/s @ 524k on dual RTX PRO 6000 | reddit.com
  • MTP benchmark: speculative inference speeds up coding, slows down creative writing | decisive single-variable comparison | reddit.com
  • EMO: Pretraining mixture of experts for emergent modularity | AllenAI HF blog | huggingface.co
  • gemma-4-26b-a4b one-shotting three.js | consistent community reports | reddit.com
  • Hello from 10KM high! - Qwen 3.6 35b a3b running offline on a plane | local-model airplane-mode joke post | reddit.com
  • vLLM V0 to V1: Correctness Before Corrections in RL | ServiceNow / HF blog | huggingface.co
  • Adding Benchmaxxer Repellant to the Open ASR Leaderboard | private holdout data to counter benchmaxxing | huggingface.co
  • Why hasn't longer-horizon training slowed AI progress? (Sean Goedecke) | response to a Dwarkesh question | seangoedecke.com
  • Notes from inside China's AI labs (Nathan Lambert) | overview from visits to China's major labs | interconnects.ai
  • The distillation panic | Nathan Lambert picks apart the "distillation attack" rhetoric | interconnects.ai
  • Open weights are quietly closing up - and that's a problem | open weights as the key to keeping frontier pricing honest | martinalderson.com
  • Running Codex safely at OpenAI | full stack: sandboxing / approvals / network policies / telemetry | openai.com
  • Using Claude Code: The Unreasonable Effectiveness of HTML (Simon Willison) | HTML output instead of Markdown | simonwillison.net
  • Vibe coding and agentic engineering are getting closer than I'd like (Simon Willison) | the two tracks are starting to converge | simonwillison.net
  • The Roadmap to Mastering Tool Calling in AI Agents | MLM tutorial | machinelearningmastery.com
  • Implementing Permission-Gated Tool Calling in Python Agents | MLM hands-on | machinelearningmastery.com
  • Implementing Statistical Guardrails for Non-Deterministic Agents | MLM agent testing | machinelearningmastery.com
  • Agentic RAG Explained in 3 Levels of Difficulty | MLM primer | machinelearningmastery.com
  • Asimov's three laws are merely a suggestion | idiallo argues Asimov's three laws do not apply to LLM agents | idiallo.com
  • AI didn't delete your database, you did (idiallo) | rebuttal to the viral tweet about Cursor deleting a production database | idiallo.com
  • Our AI started a cafe in Stockholm (Andon Labs) | Project Mona field experiment | andonlabs.com
  • Breaking: Autonomous Agents are a Shitshow (Gary Marcus) | garymarcus.substack.com
  • GenericAgent (170 stars/day) | self-evolving agent, 3.3K seed → full system, 6x fewer tokens | github.com/lsdefine/GenericAgent
  • everything-claude-code (1011 stars/day) | agent harness performance optimization | github.com/affaan-m/everything-claude-code
  • addyosmani/agent-skills (1092 stars/day) | production-grade agent skills | github.com/addyosmani/agent-skills
  • bytedance/UI-TARS-desktop (656 stars/day) | multimodal agent stack | github.com/bytedance/UI-TARS-desktop
  • OpenAI: Advancing voice intelligence with new models in the API | the full Realtime-2 family | openai.com
  • [AINews] GPT-Realtime-2, -Translate, -Whisper: new SOTA realtime voice APIs | Latent Space commentary | latent.space
  • How OpenAI delivers low-latency voice AI at scale | WebRTC stack rebuilt | openai.com
  • Quoting Luke Curley: WebRTC drops your prompt | hands-on gripes about the voice stack | simonwillison.net
  • [AINews] Anthropic growing 10x/year while other big firms lay off >10% | a contrast signal | latent.space
  • [AINews] Anthropic-SpaceXai Colossus deal: 300MW/$5B/yr, ARR 8000% | Latent Space deep dive | latent.space
  • [AINews] Silicon Valley gets Serious about Services | services are AI's next opportunity | latent.space
  • [AINews] The Other vs The Utility | the Clippy vs Anton debate over AI character orientation | latent.space
  • Y Combinator's Stake in OpenAI: ~0.6% | Daring Fireball back-calculates ~$5B | daringfireball.net
  • Google Owns a Big Chunk of Anthropic | NYT on court filings, gift link | nytimes.com
  • Anthropic executive a year ago: Fully AI Employees Are a Year Away | revisiting an old Axios piece | axios.com
  • Testing ads in ChatGPT | clear labeling + privacy controls | openai.com
  • New ways to buy ChatGPT ads | beta self-serve Ads Manager + CPC | openai.com
  • How frontier firms are pulling ahead (OpenAI B2B Signals) | enterprise AI adoption report | openai.com
  • OpenAI + PwC: reimagine office of CFO | finance workflow automation | openai.com
  • Uber uses OpenAI for drivers/riders | voice for drivers + AI assistant for vehicle owners | openai.com
  • Parloa builds service agents | voice customer-service vertical | openai.com
  • Singular Bank: 60-90 min/day saved | private bank, GPT + Codex internal assistant | openai.com
  • Simplex rethinks software development with Codex | software R&D efficiency gains | openai.com
  • Introducing ChatGPT Futures: Class of 2026 | 26 student founders | openai.com
  • Premium: AI's Circular Psychosis (Ed Zitron) | wheresyoured.at
  • Premium: The AI Compute Demand Story Is A Lie (Ed Zitron) | wheresyoured.at
  • Am I Meant To Be Impressed? (Ed Zitron) | wheresyoured.at
  • The war between fast and legitimate is here (Joan Westenberg) | joanwestenberg.com
  • The growing AI backlash (Gary Marcus) | garymarcus.substack.com
  • Agents and ROI (Gary Marcus) | cites an MIT report | garymarcus.substack.com
  • Breaking news: "they hadn't figured out how OpenAI would pay for it" (Gary Marcus) | garymarcus.substack.com
  • Misplaced panic over AI progress (Gary Marcus) | picks apart the METR time horizon graph | garymarcus.substack.com
  • Claude Mythos broke METR graph (Reddit) | reddit.com
  • AI makes weak engineers less harmful (Sean Goedecke) | seangoedecke.com
  • The left-wing case for AI (Sean Goedecke) | seangoedecke.com
  • Anthropic 8000% ARR growth | the week's strongest traction figure (cited by Latent Space and others)
  • What matters at the Musk-OpenAI trial (Gary Marcus) | garymarcus.substack.com
  • Doing Vibe Physics (Alex Lupsasca / OpenAI) | GPT-5.x derives a new quantum gravity result | latent.space
  • Local AI needs to be the norm | HN, 400 upvotes | unix.foo
  • Maryland $2B power grid for AI data centers | HN, 80 upvotes, policy side | tomshardware.com
  • Meta capturing employee mouse movements/keystrokes for AI training | Reuters | reuters.com
  • Scaling Trusted Access for Cyber with GPT-5.5 / GPT-5.5-Cyber | exclusive to defenders | openai.com
  • How ChatGPT learns about the world while protecting privacy | openai.com
  • Introducing Trusted Contact in ChatGPT | reaches out to a trusted contact on self-harm risk | openai.com
  • Advancing youth safety in EMEA | European youth safety blueprint | openai.com
  • Signals: most informative agent traces without LLM judges | r/ML evaluation method | reddit.com
  • Getting a feel for how fast X tokens/second really is | r/LocalLLaMA intuitive comparison, 273 upvotes | reddit.com

🔧 Tools & Development

🔐 Security

  • Canvas Breach Disrupts Schools & Colleges Nationwide (Krebs) | 275M students / 9000 schools | krebsonsecurity.com
  • Troy Hunt Weekly Update 502 | ShinyHunters' continued wins / the leverage argument | troyhunt.com
  • CloakBrowser (567 stars/day) | stealth Chromium / Playwright replacement, 30/30 bot detection tests passed | github.com/CloakHQ/CloakBrowser

🌐 Platform / Business / Society

🗒️ Long-term / Culture / Personal Writing

  • RSS Feeds Send Me More Traffic Than Google (shkspr) | shkspr.mobi
  • I Will Not Add Query Strings to Your URLs (susam) | susam.net
  • From RSS to Atom (susam) | susam.net
  • Wander Console 0.6.0 (susam) | susam.net
  • Links to CSS colour palettes (Julia Evans) | jvns.ca
  • Hi stranger (idiallo) | idiallo.com
  • Notes on the Hantavirus Outbreak (borretti) | borretti.me
  • Extremely low frequencies (computer.rip) | computer.rip
  • Book Review: The Names by Florence Knapp | shkspr.mobi
  • Prolost Watches 1.0 | watch collection app | prolost.com
  • Pedometer++ 8.0 (David Smith) | watchOS maps, 6 years of polish | david-smith.org
  • Chess Peace iOS game | logic game of placing all the pieces so that none threatens another | chesspeace.app
  • Quoting John Gruber on YC's OpenAI stake | simonwillison.net
  • Quoting Andrew Quinn | fun story of 3GB SQLite → 7MB FST | simonwillison.net
  • Quoting Andy Masley on data center land use | simonwillison.net
  • April 2026 newsletter (Simon Willison) | simonwillison.net
  • MachinaCheck: Multi-Agent CNC Manufacturability System on AMD MI300X | HF hackathon | huggingface.co
  • Pluralistic series, 4 posts (Cory Doctorow) | Trump policy / bubble commentary / Lee Lai comics / three armies of the post-American world | pluralistic.net

Hacker News

GitHub Trending (this week)

  • bytedance/UI-TARS-desktop ⭐ 656/day | multimodal agent stack
  • anthropics/financial-services ⭐ 1479/day | Anthropic financial services templates
  • addyosmani/agent-skills ⭐ 1092/day | production agent skills
  • affaan-m/everything-claude-code ⭐ 1011/day | agent harness optimization system
  • decolua/9router ⭐ 806/day | multi-provider AI coding router (40+ providers)
  • datawhalechina/hello-agents ⭐ 756/day | Chinese-language agent tutorial
  • datawhalechina/easy-vibe ⭐ 642/day | Chinese-language vibe coding primer
  • playcanvas/supersplat ⭐ 604/day | 3D Gaussian Splat editor
  • CloakHQ/CloakBrowser ⭐ 567/day | stealth Chromium / Playwright replacement
  • HKUDS/AI-Trader ⭐ 255/day | fully automated agent-native trading
  • jundot/omlx ⭐ 187/day | Apple Silicon LLM inference (menu bar)
  • lsdefine/GenericAgent ⭐ 170/day | self-evolving agent

Reddit Hot Threads

  • HF cofounder: Qwen 3.6 27B airplane mode ≈ Opus on Claude Code (r/ClaudeAI, ⬆️ 1736 / 💬 235)
  • What's up, Claude? (r/ClaudeAI, ⬆️ 986 / 💬 35) | Claude behaving oddly
  • I deleted a guy's entire Windows install with one backslash. 717 GB. Gone. (r/ClaudeAI, ⬆️ 840 / 💬 154)
  • I read threads complaining about Claude every week... what are y'alls workflows? (r/ClaudeAI, ⬆️ 465 / 💬 103)
  • NVIDIA Star Elastic: 30B/23B/12B in one checkpoint, zero-shot slicing (r/LocalLLaMA, ⬆️ 293 / 💬 57)
  • Getting a feel for how fast X tokens/second really is (r/LocalLLaMA, ⬆️ 273 / 💬 80)
  • Trojan in "claude code" Google search first result (r/ClaudeAI, ⬆️ 258 / 💬 61) | poisoned search result impersonating the real tool
  • I have DeepSeek V4 Pro at home (r/LocalLLaMA, ⬆️ 207 / 💬 113)
  • Opus said something that reframed AI agent failures for me (r/ClaudeAI, ⬆️ 138 / 💬 47)
  • Hello from 10KM high! Qwen 3.6 35b a3b on plane (r/LocalLLaMA, ⬆️ 136 / 💬 37)
  • Opus 4.7 is English-only; using German burns tokens (r/ClaudeAI, ⬆️ 135 / 💬 59)
  • Claude Mythos broke METR graph (r/ClaudeAI, ⬆️ 96 / 💬 80)
  • Got parented by Claude (r/ClaudeAI, ⬆️ 83 / 💬 47)
  • I made Claude Code aware of its own usage limits (r/ClaudeAI, ⬆️ 74 / 💬 28)
  • Running Qwen 3.6 35b a3b on 8GB VRAM + 32GB RAM ~190k context (r/LocalLLaMA, ⬆️ 67 / 💬 44)

ClawHub's skill ecosystem kept up a high update cadence this week, with two overall trends:

Self-evolving / harness skills are the high-download mainstay:
  • self-improving-agent v3.0.21 (Captures learnings/errors/corrections): consistently among the high downloads
  • self-improving v1.2.16 (self-reflection + self-criticism + self-learning): same direction
  • proactive-agent v3.1.0: turns a task-follower into a proactive agent
  • ontology v1.0.4: typed knowledge graph for structured agent memory

Security / audit skills are strong:
  • skill-vetter v1.0.0: security gate for third-party skills
  • skillscan v1.1.6: a must-run before any new skill goes live
  • publish-skill-vettr v2.0.3: static-analysis security scanning

A few interesting new releases:
  • meta-healing v0.1.2 / runtime-doctor v0.1.3: local OpenClaw runtime/config drift diagnosis
  • regenerative-intelligence v1.0.0: energy-efficient harm-reducing memory
  • ambient-stamina v1.0.1: smart rest / sleep regulation
  • novel-multi-agent-skill v1.0.0: multi-agent collaborative novel writing
  • superwise-drift-detection-skill v1.0.0: feature drift detection for tabular ML models
  • huo15-openclaw-enhance v6.5.7: context guard with "subagent accumulation + predictive reminders"
  • base-stable-arb-radar v0.1.3: stablecoin arbitrage on the Base chain (read-only)
  • iris-pro v1.0.1: Gmail inbox intelligence
  • video-editing-ai-tool v1.0.0: 3 min screen recording → 10 min video
  • ai-diabetes-coach v1.0.4: diabetes recovery (Chinese)
  • local-knowledge-retrieval v3.0.5: local-first document search (PPT/PDF)

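On the trending side, a batch of official skills (sonos / gog / github / weather / whisper / notion / obsidian / gemini / mcporter) was listed together 14h ago, which suggests ClawHub is filling in integration entry points for mainstream systems.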

Metrics
RSS articles: 179
Community hot posts: 36 (HN 2 + GitHub 12 + Reddit 22)
New ClawHub releases: 20+
Must-read picks: 8
All picks: 120+
Coverage window: 168 hours (May 4 - May 10)
RSS sources: 103 / 107 succeeded
Raw items fetched: 4595
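
The metrics above can be cross-checked with a few lines of arithmetic. A small sketch using only the figures printed in this digest; the variable names are illustrative:

```python
# Figures copied from the metrics block above.
community_posts = {"HN": 2, "GitHub": 12, "Reddit": 22}
reported_total = 36
rss_ok, rss_total = 103, 107
window_hours = 168

total = sum(community_posts.values())
success_rate = rss_ok / rss_total
window_days = window_hours / 24

print(f"Community posts add up: {total == reported_total}")
print(f"RSS source success rate: {success_rate:.1%}")
print(f"Coverage window: {window_days:.0f} days")
```

The per-source counts sum to the reported 36, the fetch success rate is a little over 96%, and 168 hours matches the seven-day window from May 4 to May 10.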