When we shipped v2.0 with the Context Engine and five-database connectivity, the question we kept hearing was: "Can it stream changes in real time?" The answer is now yes — and we built a lot more around it.

Datavor v2.5 adds Change Data Capture, per-record fault tolerance, a Recipe Manager, Proactive Suggestions, and a dependency-aware DAG scheduler. The result is a pipeline that doesn't just run — it learns, adapts, and keeps itself healthy. Eleven new MCP tools bring the total to 45.

"Fivetran moves your data. Datavor understands it." — and now, it watches it change in real time.
45 MCP tools · ~50ms CDC lag · 5 DB engines · $0, always free

⚡ Change Data Capture — ~50ms from write to analytics

Traditional sync polls your database on a schedule. That means your analytics dashboard is always minutes or hours behind. CDC eliminates the gap by listening directly to the database's internal change log — the PostgreSQL WAL or MySQL binlog — and streaming every INSERT, UPDATE, and DELETE to your target the moment it happens.

CDC Pipeline Flow

SOURCE DB (MySQL / PostgreSQL) → WAL / binlog stream → DATAVOR CDC (decodes changes, applies transforms, ~50ms lag) → write → TARGET DB (analytics / replica, always current). The CONTEXT ENGINE logs, learns, and suggests auto-rules along the way. Handles INSERT / UPDATE / DELETE · fault-tolerant per record · no polling required.
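
Curious what Datavor is actually listening to? You can watch PostgreSQL's logical decoding by hand. A minimal sketch, assuming wal_level = logical is already set and using the built-in test_decoding plugin; the database, slot, and table names here are illustrative, not anything Datavor creates:

bash
# Peek at PostgreSQL logical decoding, the mechanism Datavor's CDC rides on.
# Names below (production, demo_slot, orders) are illustrative.
psql -d production -c \
  "SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');"

# Write a row, then read the decoded change straight out of the WAL:
psql -d production -c "INSERT INTO orders (id, total) VALUES (42, 99.50);"
psql -d production -c \
  "SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);"

# Drop the demo slot when done so it doesn't pin old WAL segments:
psql -d production -c "SELECT pg_drop_replication_slot('demo_slot');"

Datavor's CDC does the same thing continuously: it holds a replication slot open, decodes each change as it arrives, and applies your saved transforms before the write lands in the target.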

Three new tools power this: start_cdc, stop_cdc, and cdc_status. Starting a stream is one sentence in Claude:

Claude Desktop
> Start CDC from my production PostgreSQL to analytics_db, tables: orders, customers, products

✅ CDC stream started on 3 tables
Replication slot: datavor_cdc_slot
Mode: PostgreSQL WAL (wal_level = logical)
Lag: ~50ms · Transforms: applying 2 saved rules

🛡️ Per-Record Fault Tolerance

Before v2.5, a single malformed row could halt an entire sync job. Now Datavor uses a bulk-first, row-fallback strategy: it attempts the full batch, and if any rows fail, it retries them individually — isolating and reporting each failure without stopping the rest.

Bulk-First, Row-Fallback Strategy

BATCH ATTEMPT (1,000 rows at once) → all OK? Yes: ✅ committed, all rows synced. No: retry row by row → e.g. 998 synced · 2 failed, reported by error type.

Every failed row is reported individually with its error type. The sync doesn't stop — it skips what it can't handle, records why, and moves on. The Context Engine learns from recurring failures and can proactively suggest a fix rule next time.
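
In code, the strategy is only a few lines. A minimal TypeScript sketch of the idea, not Datavor's actual implementation; insertBatch and insertRow are hypothetical stand-ins for your target database's write path:

typescript
type Row = Record<string, unknown>;
type Failure = { row: Row; error: string };

// Bulk-first, row-fallback: try the whole batch in one shot, and only on
// failure retry each row individually so one bad record can't sink the rest.
async function syncBatch(
  rows: Row[],
  insertBatch: (rows: Row[]) => Promise<void>, // hypothetical bulk write
  insertRow: (row: Row) => Promise<void>,      // hypothetical single-row write
): Promise<{ synced: number; failures: Failure[] }> {
  try {
    await insertBatch(rows); // fast path: one round trip
    return { synced: rows.length, failures: [] };
  } catch {
    // Slow path: isolate each failing row and record why it failed.
    const failures: Failure[] = [];
    for (const row of rows) {
      try {
        await insertRow(row);
      } catch (err) {
        failures.push({ row, error: err instanceof Error ? err.message : String(err) });
      }
    }
    return { synced: rows.length - failures.length, failures };
  }
}

The fast path costs a single round trip per batch; the row-by-row path only engages for batches that actually contain a bad record, so the common case stays fast.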

📋 Recipe Manager — name it once, reuse forever

If you've run the same sync with the same transforms three times, Datavor v2.5 has probably already captured it. The Recipe Manager stores named transform configurations — column renames, type casts, value remaps, row filters — and lets you apply them by name across any sync job.

💾 save_recipe · Save any transform config as a named recipe. Automatically captured from frequent patterns or saved explicitly.

📋 list_recipes · Browse all saved recipes, filter by connection or table tag. See when each was last applied.

▶️ apply_recipe · Apply a saved recipe to any sync job in one step. Eliminates repeat configuration for recurring pipelines.

🔁 Auto-capture · Recipes are detected and suggested automatically when Datavor spots the same transform pattern recurring across sessions.
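
Under the hood, a recipe is conceptually just a named list of transform steps. A hypothetical TypeScript sketch of the shape, with illustrative field names rather than Datavor's storage format:

typescript
// Illustrative only, not Datavor's storage format.
type TransformStep =
  | { kind: "rename"; from: string; to: string }                       // column rename
  | { kind: "cast"; column: string; to: "int" | "text" | "timestamp" } // type cast
  | { kind: "remap"; column: string; map: Record<string, string> }     // value remap
  | { kind: "filter"; predicate: string };                             // row filter

interface Recipe {
  name: string;
  steps: TransformStep[];
}

// A recipe bundling the transform kinds named above:
const ordersCleanup: Recipe = {
  name: "orders_cleanup",
  steps: [
    { kind: "rename", from: "ord_dt", to: "order_date" },
    { kind: "cast", column: "order_date", to: "timestamp" },
    { kind: "filter", predicate: "status != 'test'" },
  ],
};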

💡 Proactive Suggestions — the pipeline that watches itself

The Context Engine now surfaces actionable suggestions based on everything it's learned: schema changes in your source, recurring sync errors, and performance patterns. You can accept or dismiss each one — and the engine learns from both responses.

schema_change · New column detected: orders.discount_code
Your source added a new VARCHAR(64) column not present in the analytics replica. Apply a cast rule and add it to the sync?

error_fix · Recurring NULL constraint failure on customers.phone
Seen 47 times this week. Add a COALESCE('') rule to prevent future failures?

optimisation · products sync is 3× slower than last week
Row count grew from 80K to 240K. Consider switching from full sync to incremental on updated_at.
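
Whatever the internal storage looks like, each suggestion boils down to three things: a type, the evidence behind it, and a proposed action. A hypothetical TypeScript sketch with illustrative field names, not Datavor's actual format:

typescript
// Illustrative only, not Datavor's actual suggestion format.
interface Suggestion {
  kind: "schema_change" | "error_fix" | "optimisation";
  summary: string;        // e.g. "New column detected: orders.discount_code"
  evidence: string;       // what the Context Engine observed
  proposedAction: string; // the one-step fix it offers
}

// The second card above, expressed in this shape:
const nullPhoneFix: Suggestion = {
  kind: "error_fix",
  summary: "Recurring NULL constraint failure on customers.phone",
  evidence: "Seen 47 times this week",
  proposedAction: "Add a COALESCE('') rule before insert",
};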

🔗 DAG Scheduling — dependency-aware pipelines

Schedulers that run jobs in isolation miss a fundamental truth: most pipelines have dependencies. You can't sync order_items until orders is done. Datavor v2.5 introduces scheduler_add_dependency and scheduler_show_graph, letting you define and visualise a directed acyclic graph of sync jobs that run in the right order, every time.

Example DAG — nightly pipeline execution order

Layer 1 (parallel): customers (01:00 · full sync) · products (01:00 · incremental)
Layer 2: orders (after customers · CDC)
Layer 3: order_items (after orders · full)
Layer 4: analytics_summary

✓ cycle-checked at definition · exponential backoff on failure

The graph is validated for cycles at definition time — if you introduce a circular dependency, Datavor rejects it before it ever runs. Failed jobs back off exponentially: 1m → 2m → 4m → 8m, with full visibility in cdc_status and the sync dashboard.
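
The cycle check at definition time is the classic topological-sort test: if Kahn's algorithm can't visit every job, whatever remains must sit on a cycle. A minimal TypeScript sketch using the nightly pipeline above (illustrative, not Datavor's scheduler code):

typescript
// Kahn's algorithm: a dependency graph is a valid DAG iff a topological
// sort can visit every job. Assumes every dependency is itself declared.
function hasCycle(deps: Map<string, string[]>): boolean {
  const remaining = new Map<string, number>();    // unmet dependencies per job
  const dependents = new Map<string, string[]>(); // reverse edges
  for (const [job, parents] of deps) {
    remaining.set(job, parents.length);
    for (const p of parents) {
      dependents.set(p, [...(dependents.get(p) ?? []), job]);
    }
  }
  // Start with every job that has no dependencies, then "complete" jobs
  // one by one, releasing dependents as their counts hit zero.
  const ready = [...remaining].filter(([, n]) => n === 0).map(([job]) => job);
  let visited = 0;
  while (ready.length > 0) {
    const job = ready.pop()!;
    visited++;
    for (const child of dependents.get(job) ?? []) {
      const n = remaining.get(child)! - 1;
      remaining.set(child, n);
      if (n === 0) ready.push(child);
    }
  }
  return visited < remaining.size; // leftover jobs can only sit on a cycle
}

// The nightly pipeline above, as job -> jobs it must wait for:
const nightly = new Map<string, string[]>([
  ["customers", []],
  ["products", []],
  ["orders", ["customers"]],
  ["order_items", ["orders"]],
  ["analytics_summary", ["order_items"]],
]);
// hasCycle(nightly) === false. Make customers depend on analytics_summary
// and it flips to true, which is exactly what Datavor rejects at definition.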

Upgrading to v2.5

One command. No config changes. Your existing connections, scheduled jobs, and Context Engine knowledge are preserved.

bash
npm install -g datavor

# Verify
datavor --version
> 2.5.0 · 45 tools · 5 engines

For CDC specifically, PostgreSQL requires wal_level = logical set in your database config. MySQL requires binlog_format = ROW. Datavor's start_cdc tool checks both prerequisites and tells you exactly what to change if they're missing.
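
You can verify both settings yourself before starting a stream, using standard psql and mysql client commands (host, user, and database names below are placeholders). Note that changing wal_level takes effect only after a PostgreSQL server restart:

bash
# PostgreSQL: should print "logical" (changing it requires a server restart)
psql -h prod-host -U datavor -d production -c "SHOW wal_level;"

# MySQL: should print "ROW" (the default since MySQL 5.7.7)
mysql -h prod-host -u datavor -p -e "SHOW VARIABLES LIKE 'binlog_format';"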

What's coming in v3.0

The next milestone brings a local visual web dashboard at localhost:3000 — a full-screen view of your pipelines, DAG graphs, sync history, and suggestion feed, without needing Claude Desktop open. We're also planning dbt integration and expanded connectors including MongoDB and BigQuery.

For now: 45 tools, five engines, real-time CDC, fault-tolerant sync, self-healing recipes, and a scheduler that understands your dependencies. All free. All local. All yours.

Ready to stream in real time?

Install Datavor v2.5 in one command and connect your first CDC stream in under 5 minutes.

⬇ npm install -g datavor
Free · No account · No credit card · 45 MCP tools