From the engine room.
Case studies, benchmarks, and observations from building the agent output pipeline.
March 11, 2026
Essay
Git merges text, not logic.
When multiple AI coding agents work on the same codebase simultaneously, they produce patches that are individually correct but collectively incompatible. This problem rarely surfaced when humans wrote code, because people coordinate as they go. Here's why it shows up now, what it looks like in practice (a minimal sketch follows), and what we're building to fix it.
Coming soon →
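To make the premise concrete before the essay lands, here is a minimal, hypothetical sketch of the failure mode: one agent renames a helper and fixes its callers, while another agent adds a new caller against the old name in a different file. The names (`get_user`, `fetch_user`, `build_report`) and the two-file layout are invented for illustration, not taken from any benchmark run.

```python
# Hypothetical sketch: two agent patches that Git merges without conflict
# but that break at runtime. Both "files" are shown inline so the example runs.

# users.py after agent 1's branch: the helper was renamed fetch_user -> get_user
# and every caller inside users.py was updated to match.
def get_user(user_id: int) -> dict:
    return {"id": user_id, "name": "example"}

# reports.py after agent 2's branch: a brand-new caller, written against the
# old name. The hunks touch different files, so the textual merge is clean.
def build_report(user_id: int) -> dict:
    user = fetch_user(user_id)  # NameError: the symbol no longer exists
    return {"user": user, "status": "ok"}

if __name__ == "__main__":
    build_report(7)  # the merge was conflict-free, but this call fails
```

Running the merged result raises a NameError, even though Git reported no conflicts and the diff looked routine in review.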
March 2026
Benchmark
18 conflicts, 5 branches, 0.97 seconds.
We ran 5 simulated AI agents on the same codebase: Cursor on Python, Copilot on Go, Codex on TypeScript, Claude Code on cross-language work, Windsurf on Ruby. Git merged everything cleanly and the tests passed. Here's what Rosentic flagged that Git missed.
Coming soon →
March 2026
Case Study
What Alibaba's SWE-CI tells us about the next 12 months.
Alibaba tested AI coding agents on 100 real codebases spanning 233 days. 75% of models broke previously working code during maintenance. Here's what that failure rate means for production engineering teams over the next 12 months.
Coming soon →
More posts on the way.
We're scanning public repos, running benchmarks, and writing about what we find. Subscribe below to get notified.