When we shipped v2.0 with the Context Engine and five-database connectivity, the question we kept hearing was: "Can it stream changes in real time?" The answer is now yes — and we built a lot more around it.
Datavor v2.5 adds Change Data Capture, per-record fault tolerance, a Recipe Manager, Proactive Suggestions, and a dependency-aware DAG scheduler. The result is a pipeline that doesn't just run — it learns, adapts, and keeps itself healthy. Eleven new MCP tools bring the total to 45.
⚡ Change Data Capture — ~50ms from write to analytics
Traditional sync polls your database on a schedule. That means your analytics dashboard is always minutes or hours behind. CDC eliminates the gap by listening directly to the database's internal change log — the PostgreSQL WAL or MySQL binlog — and streaming every INSERT, UPDATE, and DELETE to your target the moment it happens.
Three new tools power this: start_cdc, stop_cdc, and cdc_status. Starting a stream is one sentence in Claude:
✅ CDC stream started on 3 tables
Replication slot: datavor_cdc_slot
Mode: PostgreSQL WAL (wal_level = logical)
Lag: ~50ms · Transforms: applying 2 saved rules
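To make the idea concrete, here is a minimal Python sketch of what consuming a change stream looks like on the target side. It is not Datavor's implementation: the ChangeEvent shape, apply_event, replay, and the in-memory dict standing in for the target are all assumptions made for illustration. The point is only that each logged change is applied in order, so the replica converges on the source without re-reading whole tables.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One row-level change, in a hypothetical shape a log-based reader might emit."""
    op: str                      # "INSERT", "UPDATE", or "DELETE"
    table: str
    key: Any                     # primary-key value of the affected row
    row: Optional[dict] = None   # new row image; None for DELETE


def apply_event(target: dict, event: ChangeEvent) -> None:
    """Apply a single change to an in-memory stand-in for the target database."""
    rows = target.setdefault(event.table, {})
    if event.op in ("INSERT", "UPDATE"):
        rows[event.key] = dict(event.row)   # upsert the new row image
    elif event.op == "DELETE":
        rows.pop(event.key, None)           # deleting a missing row is a no-op
    else:
        raise ValueError(f"unknown operation: {event.op}")


def replay(events) -> dict:
    """Apply events in log order and return the resulting target state."""
    target: dict = {}
    for event in events:
        apply_event(target, event)
    return target


events = [
    ChangeEvent("INSERT", "orders", 1, {"id": 1, "status": "new"}),
    ChangeEvent("UPDATE", "orders", 1, {"id": 1, "status": "paid"}),
    ChangeEvent("INSERT", "orders", 2, {"id": 2, "status": "new"}),
    ChangeEvent("DELETE", "orders", 2),
]
replica = replay(events)
print(replica)  # {'orders': {1: {'id': 1, 'status': 'paid'}}}
```

Because INSERT and UPDATE are both treated as upserts in this sketch, replaying the same event twice leaves the target unchanged, which is a convenient property when a stream reconnects.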
🛡️ Per-Record Fault Tolerance
Before v2.5, a single malformed row could halt an entire sync job. Now Datavor uses a bulk-first, row-fallback strategy: it attempts the full batch, and if any rows fail, it retries them individually — isolating and reporting each failure without stopping the rest.
Each failed row is recorded with its error type, and every row that can be written still is. The Context Engine learns from recurring failures and can proactively suggest a fix rule next time.
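The bulk-first, row-fallback idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Datavor's code: sync_batch, write_bulk, and write_row are hypothetical names, and the sketch assumes the bulk write is all-or-nothing, so retrying every row individually after a failed batch cannot write a row twice.

```python
def sync_batch(rows, write_bulk, write_row):
    """Try the whole batch first; if it fails, retry row by row and collect failures."""
    try:
        write_bulk(rows)
        return len(rows), []
    except Exception:
        pass  # assumed atomic: nothing was written, so fall back to per-row writes

    written, failures = 0, []
    for index, row in enumerate(rows):
        try:
            write_row(row)
            written += 1
        except Exception as exc:
            # Record why this row failed and keep going with the rest.
            failures.append({
                "index": index,
                "row": row,
                "error_type": type(exc).__name__,
                "message": str(exc),
            })
    return written, failures


# A toy target that rejects rows with a NULL phone number.
target = []


def write_row(row):
    if row.get("phone") is None:
        raise ValueError("phone must not be NULL")
    target.append(row)


def write_bulk(rows):
    # Validate everything before writing anything, to mimic an atomic batch.
    for row in rows:
        if row.get("phone") is None:
            raise ValueError("phone must not be NULL")
    target.extend(rows)


rows = [
    {"id": 1, "phone": "555-0100"},
    {"id": 2, "phone": None},
    {"id": 3, "phone": "555-0102"},
]
written, failures = sync_batch(rows, write_bulk, write_row)
print(written, [f["index"] for f in failures])  # 2 [1]
```

The fast path costs one bulk write when the data is clean; the slower per-row path only runs for batches that actually contain a bad row.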
📋 Recipe Manager — name it once, reuse forever
If you've run the same sync with the same transforms three times, Datavor v2.5 has probably already captured it. The Recipe Manager stores named transform configurations — column renames, type casts, value remaps, row filters — and lets you apply them by name across any sync job.
save_recipe
Save any transform config as a named recipe. Automatically captured from frequent patterns or saved explicitly.
list_recipes
Browse all saved recipes and filter them by connection or table tag. See when each was last applied.
apply_recipe
Apply a saved recipe to any sync job in one step. Eliminates repeat configuration for recurring pipelines.
Auto-capture
Recipes are detected and suggested automatically when Datavor spots the same transform pattern recurring across sessions.
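As a rough mental model, a recipe can be thought of as a named bundle of the four transform kinds listed above. The Python sketch below is an assumption about shape only: register_recipe, run_recipe, and the config keys (rename, cast, remap, filter) are hypothetical and are not Datavor's storage format or tool API.

```python
recipes = {}  # recipe name -> transform config (hypothetical in-memory store)


def register_recipe(name, config):
    """Store a transform config under a reusable name."""
    recipes[name] = config


def run_recipe(name, rows):
    """Apply renames, casts, remaps, then the row filter, in that order."""
    config = recipes[name]
    renames = config.get("rename", {})
    casts = config.get("cast", {})
    remaps = config.get("remap", {})
    keep = config.get("filter", lambda row: True)

    result = []
    for row in rows:
        new = {renames.get(col, col): value for col, value in row.items()}
        for col, caster in casts.items():
            if col in new and new[col] is not None:
                new[col] = caster(new[col])
        for col, mapping in remaps.items():
            if col in new:
                new[col] = mapping.get(new[col], new[col])  # unmapped values pass through
        if keep(new):
            result.append(new)
    return result


register_recipe("orders_clean", {
    "rename": {"amt": "amount"},
    "cast": {"amount": float},
    "remap": {"status": {"P": "paid", "N": "new"}},
    "filter": lambda row: row["amount"] > 0,
})

source_rows = [
    {"id": 1, "amt": "19.90", "status": "P"},
    {"id": 2, "amt": "0", "status": "N"},
]
cleaned = run_recipe("orders_clean", source_rows)
print(cleaned)  # [{'id': 1, 'amount': 19.9, 'status': 'paid'}]
```

Note that casts and filters in this sketch refer to the renamed column names, since renames are applied first.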
💡 Proactive Suggestions — the pipeline that watches itself
The Context Engine now surfaces actionable suggestions based on everything it's learned: schema changes in your source, recurring sync errors, and performance patterns. You can accept or dismiss each one — and the engine learns from both responses.
New column detected: orders.discount_code
Your source added a new VARCHAR(64) column not present in the analytics replica. Apply a cast rule and add it to the sync?
Recurring NULL constraint failure on customers.phone
Seen 47 times this week. Add a COALESCE(phone, '') rule to prevent future failures?
products sync is 3× slower than last week
Row count grew from 80K to 240K. Consider switching from full sync to incremental on updated_at.
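The last suggestion, moving from full sync to incremental on updated_at, usually comes down to remembering a high-water mark between runs. A minimal Python sketch follows, assuming every source row carries a reliable updated_at value; incremental_batch and the sample rows are hypothetical and not part of Datavor.

```python
from datetime import datetime


def incremental_batch(source_rows, watermark):
    """Return the rows changed since the watermark, plus the new watermark."""
    changed = [
        row for row in source_rows
        if watermark is None or row["updated_at"] > watermark
    ]
    # Keep the old watermark when nothing changed.
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return changed, new_watermark


products = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 9, 5)},
]

# First run: no watermark yet, so everything is copied.
first, mark = incremental_batch(products, None)

# One row changes at the source; the next run only picks up that row.
products[0]["updated_at"] = datetime(2024, 1, 1, 10, 0)
second, mark = incremental_batch(products, mark)
print(len(first), [row["id"] for row in second])  # 2 [1]
```

Two caveats with this approach: rows deleted at the source never show up in an updated_at query, and the strict comparison can miss rows that share the exact watermark timestamp, so real pipelines often add an overlap window or fall back to CDC.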
🔗 DAG Scheduling — dependency-aware pipelines
Schedulers that run jobs in isolation miss a fundamental truth: most pipelines have dependencies. You can't sync order_items until orders is done. Datavor v2.5 introduces scheduler_add_dependency and scheduler_show_graph, letting you define and visualise a directed acyclic graph of sync jobs that run in the right order, every time.
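For intuition, running jobs "in the right order" is a topological sort of the dependency graph. Here is a small sketch using Python's standard graphlib module (Python 3.9+); the job names beyond orders and order_items are made up, and this says nothing about how Datavor's scheduler is implemented internally.

```python
from graphlib import TopologicalSorter

# job -> set of jobs that must finish first (hypothetical pipeline)
dependencies = {
    "order_items": {"orders"},
    "orders": {"customers"},
    "customers": set(),
    "products": set(),
}

# static_order() yields every job after all of its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Jobs with no path between them, such as products and customers here, can appear in either order, which is also why a scheduler is free to run them in parallel.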
The graph is validated for cycles at definition time — if you introduce a circular dependency, Datavor rejects it before it ever runs. Failed jobs back off exponentially: 1m → 2m → 4m → 8m, with full visibility in cdc_status and the sync dashboard.
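Both behaviours are easy to sketch. The Python below is a hedged illustration, not Datavor's scheduler: DependencyGraph, add_dependency, and backoff_minutes are hypothetical names, and the only detail taken from the schedule above is that the delay starts at one minute and doubles per failed attempt.

```python
class DependencyGraph:
    """Rejects an edge at definition time if it would close a cycle."""

    def __init__(self):
        self.upstream = {}  # job -> set of jobs it waits for

    def _reaches(self, start, goal):
        """True if goal can be reached from start by following upstream edges."""
        stack, seen = [start], set()
        while stack:
            job = stack.pop()
            if job == goal:
                return True
            if job not in seen:
                seen.add(job)
                stack.extend(self.upstream.get(job, ()))
        return False

    def add_dependency(self, job, depends_on):
        # The new edge closes a cycle if depends_on already waits on job
        # (this also catches a job depending on itself).
        if self._reaches(depends_on, job):
            raise ValueError(f"circular dependency: {depends_on} already depends on {job}")
        self.upstream.setdefault(job, set()).add(depends_on)
        self.upstream.setdefault(depends_on, set())


def backoff_minutes(attempt, base=1):
    """Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ..."""
    return base * 2 ** (attempt - 1)


graph = DependencyGraph()
graph.add_dependency("orders", "customers")
graph.add_dependency("order_items", "orders")

try:
    graph.add_dependency("customers", "order_items")  # would close the loop
    rejected = False
except ValueError:
    rejected = True

delays = [backoff_minutes(n) for n in range(1, 5)]
print(rejected, delays)  # True [1, 2, 4, 8]
```

Checking reachability before inserting the edge means an invalid graph never exists, even briefly, so nothing downstream has to handle a cyclic state.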
Upgrading to v2.5
One command. No config changes. Your existing connections, scheduled jobs, and Context Engine knowledge are preserved.
# Verify
datavor --version
> 2.5.0 · 45 tools · 5 engines
For CDC specifically, PostgreSQL requires wal_level = logical in your database config, and MySQL requires binlog_format = ROW. Datavor's start_cdc tool checks the prerequisite for your engine and tells you exactly what to change if it's missing.
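If you would rather check these settings yourself first, the two queries below are standard: SHOW wal_level on PostgreSQL and SHOW VARIABLES LIKE 'binlog_format' on MySQL. The small Python helper around them is only a sketch; cdc_prerequisite_problem is a hypothetical function, not how start_cdc works, and you would feed it the value your own database client returns.

```python
# Queries to run with your usual database client.
POSTGRES_CHECK_SQL = "SHOW wal_level;"
MYSQL_CHECK_SQL = "SHOW VARIABLES LIKE 'binlog_format';"

# Setting name and required value per engine, as stated in the release notes.
REQUIRED = {
    "postgresql": ("wal_level", "logical"),
    "mysql": ("binlog_format", "ROW"),
}


def cdc_prerequisite_problem(engine, reported_value):
    """Return None if the setting is fine, otherwise a short description of the problem."""
    setting, expected = REQUIRED[engine]
    if reported_value.strip().lower() == expected.lower():
        return None
    return f"{setting} is '{reported_value}', expected '{expected}'"


print(cdc_prerequisite_problem("postgresql", "replica"))
# wal_level is 'replica', expected 'logical'
print(cdc_prerequisite_problem("mysql", "ROW"))
# None
```

Keep in mind that on PostgreSQL a change to wal_level only takes effect after a server restart.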
What's coming in v3.0
The next milestone brings a local visual web dashboard at localhost:3000: a full-screen view of your pipelines, dependency graphs, sync history, and suggestion feed, without needing Claude Desktop open. We're also planning dbt integration and expanded connectors including MongoDB and BigQuery.
For now: 45 tools, five engines, real-time CDC, fault-tolerant sync, self-healing recipes, and a scheduler that understands your dependencies. All free. All local. All yours.
Ready to stream in real time?
Install Datavor v2.5 in one command and connect your first CDC stream in under 5 minutes.
⬇ npm install -g datavor