Day 1 of #SnowflakeSummit is a lot. Booth visits, back-to-back talks, so many handshakes. Tomorrow night, relax with us. 🍻
Data Renegades Happy Hour, Tuesday June 2, 6-9pm, 7 min walk from the summit.
Drinks, not slides. 😉
luma.com/vnyf1nij?utm_s…
Data review flagged 99.999% row-count variance. PR was two lines.
Base: 5 years of prod history. Current: 1-hour CI build.
False alarms train reviewers to scroll past. That's the damage.
blog.reccehq.com/session-base-p… #dbt #DataEngineering
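To make the arithmetic concrete, here is a minimal sketch of how a base/current row-count comparison can produce a number like that. The function name and the row counts are hypothetical, chosen only for illustration; this is not Recce's actual calculation.

```python
def row_count_variance_pct(base_rows: int, current_rows: int) -> float:
    """Percent difference between current and base row counts, relative to base."""
    if base_rows == 0:
        raise ValueError("base_rows must be non-zero")
    return abs(current_rows - base_rows) / base_rows * 100


# Hypothetical counts: a base built from years of history vs. a tiny CI build.
base_rows = 100_000
current_rows = 1
variance = row_count_variance_pct(base_rows, current_rows)
print(f"{variance:.3f}% row-count variance")

# Same window on both sides: no variance, so nothing to flag.
print(row_count_variance_pct(1_000, 1_000))
```

The gap here comes entirely from the two builds covering different time windows, not from anything the PR changed.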
90% of enterprise programmers spend their time on maintenance, not greenfield development.
Michael Stonebraker's take on where AI actually earns its keep inverts the marketing narrative completely.
Michael Stonebraker was right about CODASYL. Right about NoSQL. Now he's run text-to-SQL on a real enterprise warehouse and got 10% accuracy against an 80% benchmark.
The pattern is hard to ignore.
blog.reccehq.com/benchmarks-lie… #DataEngineering #TextToSQL #AI
A wiki is something you look at. A shared AI system is something you work through.
When knowledge lives inside the workflow, it stays current. Every time someone runs a skill, outdated entries get noticed. Gaps get filled.
blog.reccehq.com/we-didnt-set-o… #ClaudeCode #DataEngineering
The loop: code → review → handoff → skills update. Every session makes the next one smarter. One aggregation bug became a permanent rule enforced automatically.
@data_dori broke it all down at Data Debug SF. Full writeup:
blog.reccehq.com/ai-skills-for-… #DataEngineering #AI
AI coding tools generate plausible but wrong SQL constantly. The fix isn't waiting for a smarter model.
AI skills are markdown files that encode domain knowledge into coding tools. No framework, just structured text in a repo.
One subagent fetches full PR context via a single GitHub GraphQL MCP call (replacing 5-10 gh CLI round-trips). The other explores data through 6 Recce MCP tools, including lineage_diff, schema_diff, row_count_diff, and custom queries.
Our own Kent Chen wrote up the multi-agent architecture the team built for Recce's AI Data Review.
Single agent kept forgetting findings as PRs got complex. Fix: orchestrator + two specialists, each with its own 200k context window.
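A rough sketch of the orchestrator-plus-specialists shape, assuming each specialist keeps its own working context and hands back only a short summary. All class names, function names, and stub outputs below are hypothetical; the team's actual implementation is described in the writeup, not here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Specialist:
    """A subagent with its own private context (hypothetical stand-in)."""
    name: str
    work: Callable[[str], List[str]]  # stub for the real tool calls
    context: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # Detailed notes stay in this specialist's own context.
        self.context.extend(self.work(task))
        last = self.context[-1] if self.context else "nothing found"
        # Only a compact summary goes back to the orchestrator.
        return f"{self.name}: {len(self.context)} notes, latest: {last}"


@dataclass
class Orchestrator:
    """Keeps only the summaries, so findings are not crowded out."""
    specialists: List[Specialist]
    findings: Dict[str, str] = field(default_factory=dict)

    def review(self, task: str) -> Dict[str, str]:
        for specialist in self.specialists:
            self.findings[specialist.name] = specialist.run(task)
        return self.findings


# Stubbed work functions standing in for PR-context and data-exploration calls.
def fetch_pr_context(task: str) -> List[str]:
    return [f"diff for {task}", f"comments for {task}"]


def explore_data(task: str) -> List[str]:
    return [f"row counts for {task}"]


orchestrator = Orchestrator([
    Specialist("pr_context", fetch_pr_context),
    Specialist("data_explorer", explore_data),
])
for name, summary in orchestrator.review("example-pr").items():
    print(summary)
```

The point of the split is that each specialist can fill its own context with detail while the orchestrator holds only the short findings.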
"Pandas in 2011 was essentially book-driven development, quite literally." @wesmckinn wrote features because he needed them for the book chapter.
#pandasPython #DataRenegades
"If I had taken three years longer to do things the right way, it would have been too late." -- @wesmckinn on why pandas shipped imperfect and won.
#pandasPython #DataRenegades