Sid @sidgraph

Energy, Taste and Intelligence e/acc @lossfunk, @Basethesislabs, @GenesisAILabs
basethesis.com
Earth
Joined September 2019

Tweets: 767
Followers: 1K
Following: 2K
Likes: 20K

Sid @sidgraph

15 hours ago

I was actually enjoying my Saturday 😭

0 0 1 25 0

really well-written blog on the emerging shift from optimization-limited to rollout-limited frontier RL. what stood out to me was the observation that, as reasoning trajectories become longer and more heterogeneous, GPU utilization and learner-generator synchronization increasingly dominate training efficiency. one underexplored question is whether capability jumps themselves can be detected through staleness statistics. if old trajectories suddenly become much less useful for learning, that may indicate the model has discovered a qualitatively new reasoning strategy. in that sense, replay-buffer value decay could become an observable signal of phase transitions in capability acquisition. if you're into async-RL/large-scale training systems, this is a really worthwhile read.

Luke J. Huang @whatthelukh

5 days ago

New blog! Is frontier asynchronous RL solved? The blog covers Async RL theory and infrastructure, surveying 8 open-weight frontier labs for the algorithmic techniques and systems fixes to handle train-inference mismatch. Also answered: why do current methods still fail at high

16 134 1K 236K 2K

2 27 297 43K 346
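A rough sketch of the staleness idea in the note above: track, per training step, some scalar proxy for how useful stale (old-policy) trajectories still are, and flag steps where it falls sharply against its recent average. Everything here is an assumption for illustration: the function name `detect_value_drops`, the `window` and `ratio` parameters, and the made-up series are hypothetical, and the choice of usefulness proxy (for example mean absolute advantage on stale samples) is not specified by the blog or the tweet.

```python
def detect_value_drops(stale_value, window=3, ratio=0.5):
    """Flag steps where the usefulness of stale trajectories collapses.

    stale_value: per-step scalar proxy for how useful old trajectories are
                 (hypothetical; the proxy itself is an assumption).
    window:      number of preceding steps used as the trailing baseline.
    ratio:       a step is flagged if its value < ratio * trailing mean.
    """
    flagged = []
    for t in range(window, len(stale_value)):
        baseline = sum(stale_value[t - window:t]) / window
        if baseline > 0 and stale_value[t] < ratio * baseline:
            flagged.append(t)
    return flagged


# Made-up series: stable usefulness, then a sudden drop at step 4.
series = [1.0, 0.95, 1.05, 1.0, 0.4, 0.38, 0.41, 0.4]
drops = detect_value_drops(series)
print(drops)
```

A trailing-mean threshold is only the simplest possible change detector; whether such drops actually line up with new reasoning strategies is the open question the note raises, not something this sketch establishes.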

Radhika Sharma @Radhika00586574

a week ago

The open-source protein ML space just got a massive upgrade. Phenomenal work by @anindyadeeps and @try_litefold on dropping the biggest protein data collection on Hugging Face

Anindyadeep @anindyadeeps

a week ago

We have released the biggest protein data collection on Hugging Face, guys! We have been working on this for more than 3 weeks now, starting from curating the raw data, doing a lot of filtering, splitting the datasets, sharding them, and doing a lot of analysis. Everything is

38 74 359 86K 201

1 5 10 2K 3

Sid @sidgraph

3 weeks ago

Check this out 👇

Shubham Sharma @HappyyPablo

3 weeks ago

open sourcing Marlin-2B 🐟 a tiny VLM to extract structured information from videos Marlin is finetuned for two questions devs want to ask in their videos: what is happening, and when? Best open model in its weight class, competitive with Gemini-2.5-flash at only 2B params 🧵

135 523 5K 305K 5K

0 0 6 440 1

Sid @sidgraph

3 weeks ago

Damn, it's so over for OpenAI 🫡

Andrej Karpathy @karpathy

3 weeks ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

8K 11K 150K 27.4M 14K

0 0 2 256 0

Sid @sidgraph

3 weeks ago

@nileshtrivedi yes i mean it was very obvious because this exists; but amazing work anyways :) arxiv.org/pdf/2403.12014

0 0 0 23 1

Richard Sutton @RichardSSutton

3 weeks ago

The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.

136 977 7K 572K 3K

Sid @sidgraph

3 weeks ago

@Kautukkundan 🔥🔥🚀🚀

0 0 0 142 0

Joykirat @joykiratsingh

3 weeks ago

🚨Excited to announce Agent-BRACE! LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state + policy models, jointly trained via RL.
Key takeaways:
1️⃣ 🎯 The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history.
2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length.
3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic belief decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode.
👇🧵

2 39 67 16K 23
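To make the belief-state description above concrete, here is a minimal sketch of a belief as a set of atomic claims with ordinal certainty labels, plus the "proportion of claims at the strongest level" statistic the thread reports. This is not Agent-BRACE's actual representation: the `Certainty` and `Claim` types, the intermediate labels `POSSIBLE` and `LIKELY`, and the example claims are all hypothetical. Only the endpoints "certain" and "unknown" come from the announcement.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """Ordinal certainty labels; only UNKNOWN and CERTAIN are from the source."""
    UNKNOWN = 0
    POSSIBLE = 1  # hypothetical intermediate label
    LIKELY = 2    # hypothetical intermediate label
    CERTAIN = 3


@dataclass(frozen=True)
class Claim:
    """One atomic natural-language claim with its verbalized certainty."""
    text: str
    certainty: Certainty


def strongest_fraction(belief):
    """Fraction of claims held at the strongest certainty level."""
    if not belief:
        return 0.0
    strongest = sum(1 for c in belief if c.certainty == Certainty.CERTAIN)
    return strongest / len(belief)


# Made-up belief state for an embodied task.
belief = [
    Claim("The key is in the kitchen drawer.", Certainty.LIKELY),
    Claim("The front door is locked.", Certainty.CERTAIN),
    Claim("The garage contains a ladder.", Certainty.UNKNOWN),
    Claim("The hallway light is on.", Certainty.POSSIBLE),
]
print(strongest_fraction(belief))
```

Because the policy would condition on a list like `belief` rather than the raw history, its input size depends on the number of claims, not on episode length, which is the property the thread highlights.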

Sid @sidgraph

4 weeks ago

@thinkymachines released interaction models! ☄️
➡️ decoder-only transformer running over a single interleaved (in_k, out_k) stream of 200ms micro-turns across audio, video, and text. Encoder-free early fusion: dMel intensity-binned mel filterbanks + light embed for audio, 40×40 patches through an hMLP stem for video, token embed for text; all collapsed into a bag of embeddings per 200ms chunk and co-trained from scratch.
➡️ Audio out is a flow-matching head over mel frames. No Whisper, no separate TTS, no VAD, no turn boundaries, no dialog manager.
276B/12B-active interaction model stays in real-time; an async background model handles tool use and long-horizon reasoning over shared context, with partial results merged back into the live stream at moments appropriate to user state. KAME and MoshiRAG did this pattern for retrieval; TML generalizes it.
Streaming sessions (upstreamed to SGLang) ship each 200ms chunk as a separate request and append to a persistent in-GPU KV sequence, eliminating per-turn realloc and metadata recompute. MoE kernels use gather+gemv instead of grouped GEMM for decode shapes.
Bitwise trainer-sampler alignment via batch-invariant kernels from their Sep 2025 work: NVLS deterministic all-reduce on Blackwell, Split-KV attention with consistent accumulation order between prefill and decode (SM-aligned 4096-token left splits), under 5% e2e overhead.
Moshi hits 200ms but is audio-only. Qwen3-Omni is ~234ms but uses a separate AuT encoder + sliding-window DiT, not encoder-free. AURA does visual proactivity but wraps ASR/TTS around a VideoLLM (text-out, half-duplex). gpt-realtime-2 and Gemini Live still rely on VAD turn detection.

Thinking Machines @thinkymachines

4 weeks ago

The technical report includes our motivation, early evaluation results, and technical approach. thinkingmachines.ai/blog/interacti…

11 21 387 93K 151

0 0 3 296 3

Sid @sidgraph

4 weeks ago

Amazing talk by @DrJimFan -- exciting times for Physical AGI!
TL;DR VLA architecture is parameter-misallocated toward language and should be replaced by World Action Models → pretrained video diffusion models that jointly predict future world states and robot actions, instantiated by Dream Zero (a 14B model running real-time control at 7Hz with 2× generalization gains over VLAs, @SeonghyeonYe). arxiv.org/pdf/2602.15922
His central data claim is that egocentric human video is the FSD-equivalent ambient data flywheel for robotics, and EgoScale (@ruijie_zheng12) demonstrates a near-perfect log-linear scaling law (ℒ(N) = a − b log N, R² = 0.9983) between 1K and 20K hours of pretraining data and downstream dexterity performance. arxiv.org/pdf/2602.16710
His central environment claim is that classical physics simulators will be replaced by neural simulators, and Dream Dojo (@ShenyuanGao) demonstrates this with 44K hours of human video pretraining, 10 FPS real-time interaction, and Pearson r = 0.995 policy-evaluation fidelity. arxiv.org/pdf/2602.06949
Significant gaps in the talk:
- it does not address runtime semantics (skill installation, behavior consistency, run-update separation)
- it does not address the model-exploitation failure mode of training policies against learned simulators or learned rewards.
My running notes w/ Opus 4.7 👇 docs.google.com/document/d/e/2… @NVIDIAAI

Jim Fan @DrJimFan

4 weeks ago

I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to my last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy

160 545 3K 559K 4K

1 2 12 1K 13

Sid @sidgraph

4 weeks ago

Notes on Robotics' End Game: Nvidia's Jim Fan! - @sidgraph and Opus 4.7 youtu.be/3Y8aq_ofEVs?si…
TL;DR Vision-Language-Action architecture is parameter-misallocated toward language and should be replaced by World Action Models → pretrained video diffusion models that jointly predict future world states and robot actions, instantiated by Dream Zero (a 14B model running real-time control at 7Hz with 2× generalization gains over VLAs).
His central data claim is that egocentric human video is the FSD-equivalent ambient data flywheel for robotics, and EgoScale demonstrates a near-perfect log-linear scaling law (ℒ(N) = a − b log N, R² = 0.9983) between 1K and 20K hours of pretraining data and downstream dexterity performance.
His central environment claim is that classical physics simulators will be replaced by neural simulators, and Dream Dojo demonstrates this with 44K hours of human video pretraining, 10 FPS real-time interaction, and Pearson r = 0.995 policy-evaluation fidelity.
The framework is strongest on data (genuinely empirically grounded), partial on architecture (Dream Zero shows the substrate works but is GPT-2-stage, not GPT-3-stage), and weakest on the predictive claims about timeline and the "VLA is dead" rhetorical framing.
Three significant gaps in the talk: it does not address runtime semantics (skill installation, behavior consistency, run-update separation), it does not address safety, and it does not address the model-exploitation failure mode of training policies against learned simulators or learned rewards.
docs.google.com/document/d/e/2…

0 0 13 408 6
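For readers who want to see what fitting the log-linear law ℒ(N) = a − b log N quoted in the notes above involves, here is a small sketch using ordinary least squares on log N and the usual R² definition. It is illustrative only: the function name `fit_log_linear` is hypothetical, the data points are synthetic values generated from a made-up a = 2.0 and b = 0.1 (not EgoScale's measurements), and the natural logarithm is an assumption since the notes do not state the base.

```python
import math


def fit_log_linear(ns, losses):
    """Least-squares fit of loss = a - b * ln(N); returns (a, b, r_squared)."""
    xs = [math.log(n) for n in ns]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(losses) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    slope = sxy / sxx            # slope of loss against ln(N)
    a = mean_y - slope * mean_x  # intercept
    b = -slope                   # the law is written with a minus sign
    ss_res = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Synthetic points spanning 1K to 20K hours, generated from a made-up law.
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
losses = [2.0 - 0.1 * math.log(h) for h in hours]
a, b, r2 = fit_log_linear(hours, losses)
print(a, b, r2)
```

On noiseless synthetic data the fit recovers the generating constants and R² is essentially 1; a reported R² of 0.9983 on real measurements would mean the residual variance is about 0.17% of the total variance in the measured quantity.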

Sid @sidgraph

4 weeks ago

docs.google.com/document/d/e/2…

0 0 0 99 1

Sid @sidgraph

4 weeks ago

@yacineMTB check out SWE-WebDevBench🥵 x.com/sidgraph/statu…

Sid @sidgraph

4 weeks ago

Vibe Coding is not vibing? Agents perform <60% on SWE-WebDevBench 👀. Code-level benchmarks (HumanEval, SWE-bench, FeatBench) take the specification as given and grade the patch. Vibe coding inverts this: the user gives natural-language intent, the platform must do PM,

1 3 9 3K 2

0 0 1 978 0

Sid @sidgraph

4 weeks ago

@GauravSarkar99 No worries, see you next time :)

0 0 2 38 0

Sid @sidgraph

4 weeks ago

Annnddd it's a full house 😄🥳 It's so fun to bring the best researchers of BLR together. The conversations are sick cool

Sid @sidgraph

a month ago

We’re hosting an invitation-only gathering of researchers, primarily in AI (but not limited to it), over dinner in Indiranagar, BLR! Just thoughtful conversations over good food, about seminal papers, emerging fields, research of your group/ recent papers you’ve published 🫶 If you