
Data Pipelines in 2026

2026-01-30

How AI Forces a Shift to DataOps

As AI systems move from experimentation into sustained, operational use, data pipelines are coming under new pressure. What worked for exploratory analytics or one-off projects increasingly struggles when models depend on continuous, reliable access to large and evolving datasets.

In recent predictions discussions, our analysts consistently highlighted that this shift exposes the limits of manual, project-based data engineering. Pipelines built to deliver data once, or only occasionally, are now expected to run every day, with predictable behavior and auditable outcomes.

“AI needs persistent, governed pipelines, and that immediately breaks the project-based model. You can’t keep rebuilding data prep for every use case when the workloads run every day and depend on consistent lineage,” explains Darrel Kent.

This change marks a transition away from treating data engineering as a delivery function and toward operating data pipelines as long-lived systems.

When Governance Becomes an Operational Constraint

Running data pipelines continuously places new demands on reliability, visibility, and control. As AI workloads scale, pipelines often reveal accumulated technical debt, particularly where manual processes and ad hoc scripts have grown over time. Says William McKnight, “AI exposes how much technical debt organizations have been carrying in their data pipelines. When everything is stitched together manually, it works for a demo, but it falls apart in production.”

Reliability at this scale requires more than robust code. When AI models depend on live data feeds, small upstream changes can have disproportionate downstream effects. Testing, monitoring, and validation therefore need to be continuous rather than episodic.
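As a minimal sketch of what continuous validation might look like, assume a pipeline whose batches must carry a fixed column set and arrive within an hour; the check below runs on every execution rather than once at build time. The column names and freshness threshold are illustrative, not drawn from any specific tool.

```python
from datetime import datetime, timedelta, timezone

# Illustrative expectations; a real pipeline would load these from config.
EXPECTED_COLUMNS = {"customer_id", "event_type", "event_ts"}
MAX_STALENESS = timedelta(hours=1)

def validate_batch(rows: list[dict]) -> None:
    """Run on every pipeline execution, not just at build time."""
    if not rows:
        raise ValueError("empty batch: upstream feed may have stalled")

    # Schema check: a renamed or dropped column upstream fails fast here,
    # before a model silently trains or scores on bad data.
    for row in rows:
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"missing columns: {missing}")

    # Freshness check: small upstream delays surface as explicit failures.
    newest = max(row["event_ts"] for row in rows)
    if datetime.now(timezone.utc) - newest > MAX_STALENESS:
        raise ValueError(f"data is stale: newest event at {newest}")

# Example run with a well-formed batch:
validate_batch([{
    "customer_id": 42,
    "event_type": "login",
    "event_ts": datetime.now(timezone.utc),
}])
```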

Alongside these operational challenges, governance concerns become more immediate. Lineage and provenance are no longer compliance artifacts produced after the fact, but functional requirements that directly affect whether models can be trusted. “Full lineage across multi-cloud environments isn’t a compliance checkbox anymore; it’s an operational requirement for AI. If you can’t trace the data, you can’t trust the model,” says Darrel Kent.
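A sketch of lineage as a functional requirement, under the assumption that each pipeline step emits a structured record of its inputs, output, and code version on every run. The field names and storage paths below are hypothetical; production systems would typically emit standardized events (e.g., in the OpenLineage format) to a lineage backend instead.

```python
import json
from datetime import datetime, timezone

def lineage_record(step: str, inputs: list[str], output: str,
                   code_version: str) -> dict:
    """Capture who produced what, from what, on every run."""
    return {
        "step": step,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,              # upstream datasets this run read
        "output": output,              # dataset this run produced
        "code_version": code_version,  # ties the data to the transform code
    }

# Hypothetical paths and version, for illustration only.
record = lineage_record(
    step="feature_build",
    inputs=["s3://raw/events/2026-01-30/"],
    output="s3://features/customer_daily/2026-01-30/",
    code_version="git:3f9c2a1",
)
# A model trained on this output can be traced back to its raw inputs.
print(json.dumps(record, indent=2))
```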

The Increasing Role of Sovereignty

Sovereignty reinforces this shift. As AI moves into production, restrictions on where data can reside and how it can move begin to shape pipeline design in practical ways.

As noted by Darrel Kent, “When AI moves from experiments to production, data sovereignty stops being a policy conversation and becomes an engineering constraint.”

Andrew Brust adds that these constraints increasingly reflect forces beyond technology alone: “Data sovereignty isn’t abstract anymore. Between geopolitics, energy issues, and territorial moves, organizations have to think seriously about where data sits and how AI systems depend on it.”
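One way that constraint can surface in pipeline code: a residency check consulted before any replication or placement decision, so a violation fails the job instead of moving the data. The policy table, dataset names, and regions below are invented for illustration; a real deployment would source them from a governance catalog.

```python
# Hypothetical residency policy: which regions each dataset may occupy.
# In practice this would come from a governance catalog, not a literal dict.
RESIDENCY_POLICY = {
    "eu_customer_events": {"eu-west-1", "eu-central-1"},
    "us_telemetry": {"us-east-1", "us-west-2"},
}

def check_placement(dataset: str, target_region: str) -> None:
    """Fail a replication or training job before data crosses a border."""
    allowed = RESIDENCY_POLICY.get(dataset)
    if allowed is None:
        raise PermissionError(f"no residency policy defined for {dataset!r}")
    if target_region not in allowed:
        raise PermissionError(
            f"{dataset!r} may not be placed in {target_region!r}; "
            f"allowed regions: {sorted(allowed)}"
        )

check_placement("eu_customer_events", "eu-central-1")   # passes
# check_placement("eu_customer_events", "us-east-1")    # raises PermissionError
```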

Together, these pressures make data operations both more complex and more central to AI success.

What To Do About It: Toward DataOps

In response, the analysts described a clear shift toward more automated, policy-aware data pipelines, an operating model often referred to as DataOps. The emphasis is less on adopting specific tools and more on changing how data systems are designed, owned, and run over time. “Data engineering task automation proceeds apace and drives demands for new skills and platforms,” notes Darrel Kent.

In practice, this means treating data pipelines as operational infrastructure rather than project deliverables. Teams take responsibility for the ongoing health of pipelines, including observability, failure handling, and remediation. Automation becomes a prerequisite for scale, not an optimization.
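As a rough sketch of that ownership model, consider running each step under a wrapper that retries transient failures, logs every attempt, and escalates once retries are exhausted. The alert function here is a placeholder for whatever paging or observability integration a team actually uses.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def alert(message: str) -> None:
    # Placeholder for a real paging/observability integration.
    log.error("ALERT: %s", message)

def run_step(name: str, fn, retries: int = 3, backoff_s: float = 2.0):
    """Run a pipeline step with retries, logging, and escalation."""
    for attempt in range(1, retries + 1):
        try:
            result = fn()
            log.info("step %s succeeded on attempt %d", name, attempt)
            return result
        except Exception as exc:
            log.warning("step %s failed on attempt %d: %s", name, attempt, exc)
            if attempt == retries:
                alert(f"step {name} exhausted retries; manual remediation needed")
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

# Example: a flaky extract that succeeds once the upstream source recovers.
state = {"calls": 0}
def flaky_extract():
    state["calls"] += 1
    if state["calls"] < 2:
        raise ConnectionError("upstream temporarily unavailable")
    return ["row1", "row2"]

rows = run_step("extract", flaky_extract)
```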

It also reshapes multi-cloud and hybrid strategies. Placement decisions increasingly reflect control, jurisdiction, and data locality requirements rather than simple flexibility or redundancy. We recommend that organizations:

1. Treat data pipelines as operational systems
2. Build governance, lineage, and validation into the pipeline
3. Automate for scale, resilience, and control (see the sketch after this list)
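To show how these three recommendations compose, here is a deliberately simplified end-to-end sketch in which extraction, validation, transformation, and loading form one governed unit, and every run emits a small audit record. All function names and the output URI are invented for illustration.

```python
from datetime import datetime, timezone

def extract() -> list[dict]:
    # Stand-in for reading from an upstream source.
    return [{"customer_id": 42, "event_ts": datetime.now(timezone.utc)}]

def validate(rows: list[dict]) -> list[dict]:
    # Governance and validation run inside the pipeline, on every execution.
    if not rows:
        raise ValueError("empty batch")
    for row in rows:
        if "customer_id" not in row or "event_ts" not in row:
            raise ValueError(f"malformed row: {row}")
    return rows

def transform(rows: list[dict]) -> list[dict]:
    return [{**row, "processed_at": datetime.now(timezone.utc)} for row in rows]

def load(rows: list[dict]) -> str:
    # Stand-in for writing to a governed output location.
    return "warehouse://features/customer_daily"

def run_pipeline() -> dict:
    """Each run produces data *and* the metadata needed to trust it."""
    rows = validate(extract())
    output = load(transform(rows))
    return {
        "output": output,
        "rows": len(rows),
        "run_at": datetime.now(timezone.utc).isoformat(),
        "validated": True,  # audit/lineage record travels with the run
    }

print(run_pipeline())
```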

The result is a more disciplined approach to data operations. Pipelines are designed to be persistent, observable, and governed by default. DataOps, in this sense, is less about a new methodology and more about recognizing that once AI systems run continuously, data itself becomes part of the runtime.