Adopting an open data lakehouse is like upgrading your data environment from a crowded regional terminal to a modern international airport. The modernization itself is the moment you realize the runways are too short, the control tower is running on outdated radar, and half the jetways don’t line up with the planes.
In other words, getting to an open data lakehouse is hard. Much harder than architectural diagrams suggest, and certainly harder than most vendors admit. CDAOs, CIOs and IT architects rarely fail because they misunderstand open formats. They fail because modernization exposes everything the legacy estate was quietly holding together with workarounds, tribal knowledge and heroic acts of “we’ll fix it later.”
Across industries and especially in highly regulated ones, organizations tend to run into the same set of challenges:
The myth of the simple lift-and-shift
Lakehouse modernization is often framed as a clean migration. It’s more like a live renovation, and the building never closes.
Teams quickly discover pipeline logic no one has documented in years, transformations hiding in stored procedures or macros, and workloads that simply don’t migrate without meaningful refactoring. Open formats preserve flexibility, but they don’t magically repair poor data quality or undocumented assumptions.
Analyst research reinforces this risk. Gartner has warned that a majority of AI initiatives that lack AI-ready data will be abandoned, not because models fail, but because the underlying data estate was never modernized with enough rigor or consistency.
This reality often surfaces early in the modernization project. Consider a banking example. Fraud and credit risk teams begin moving feature pipelines to Iceberg tables, only to discover that several critical features are still calculated in a forgotten script that only one person ever understood. That person retired years ago.
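When hidden logic like this surfaces, the practical first step is to lift it into a documented, unit-testable function before any tables move. A minimal sketch of what that might look like; the feature name, weighting scheme, and cap below are hypothetical, standing in for whatever the legacy script actually computed:

```python
# Hypothetical example: a fraud feature recovered from an undocumented
# legacy script and rewritten as a testable function ahead of migration.
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    merchant_risk_score: float  # 0.0 (low risk) to 1.0 (high risk)


def velocity_weighted_risk(txns: list[Transaction], window_count: int) -> float:
    """Average merchant risk weighted by transaction amount, scaled by
    transaction velocity. Encodes (and documents) the legacy script's
    behavior, including its edge cases and its cap on the multiplier."""
    if not txns:
        return 0.0
    total_amount = sum(t.amount for t in txns)
    if total_amount == 0:
        return 0.0
    weighted = sum(t.amount * t.merchant_risk_score for t in txns) / total_amount
    # Illustrative assumption: the old script capped the velocity factor at 3x.
    return weighted * min(window_count / len(txns), 3.0)
```

The point is not the formula itself but that the behavior, including edge cases, now lives in versioned, reviewable code rather than in one retiree’s memory.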
The problem isn’t the lakehouse. It’s the structure required to achieve it.
Governance gaps become deal-breakers
Many organizations treat governance as something to “add later.” In an open lakehouse, governance is the prerequisite for everything else.
Without consistent lineage, metadata standards and auditability, openness can increase fragmentation rather than reduce it. Modernization simply reveals where controls were uneven or missing all along.
This becomes painfully clear during model validation. In a typical bank, training datasets may be well governed, while monitoring datasets are not. Freshness checks apply to one feed but not another. Lineage stops at a deprecated workflow. To regulators, this is not modernization: it’s inconsistency.
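One concrete remedy is to apply a single freshness policy uniformly across feeds, rather than per pipeline, so training and monitoring datasets are held to the same standard. A minimal sketch; the feed names and the six-hour SLA are illustrative assumptions:

```python
# Illustrative sketch: one shared freshness SLA checked against every feed,
# so monitoring data cannot quietly fall outside the policy applied to training.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=6)  # assumed policy, applied to all feeds


def stale_feeds(last_updated: dict[str, datetime],
                now: datetime,
                sla: timedelta = FRESHNESS_SLA) -> list[str]:
    """Return every feed whose latest update breaches the shared SLA."""
    return sorted(name for name, ts in last_updated.items() if now - ts > sla)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
feeds = {
    "credit_risk_training": now - timedelta(hours=2),    # fresh
    "credit_risk_monitoring": now - timedelta(hours=30), # stale, previously unchecked
}
print(stale_feeds(feeds, now))  # ['credit_risk_monitoring']
```

The check itself is trivial; what matters to a regulator is that the same check runs everywhere.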
Operational complexity spikes
There is a persistent myth that open lakehouses instantly simplify the data estate. They can, eventually. But during the transition, complexity often spikes.
Organizations suddenly face multiple engines hitting the same tables with different assumptions, years of inconsistent partitioning and schema conventions colliding, and “temporary” datasets that turn out to be business-critical. Metadata systems proliferate before consolidation begins. As unstructured data is introduced to support retrieval, enrichment or AI pipelines, these issues intensify.
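A pre-consolidation audit can make these collisions visible before multiple engines start making conflicting pruning assumptions. A rough sketch of such an audit for date-partition conventions; the table names, partition specs, and the set of date-like column names are invented for the example:

```python
# Illustrative sketch: detect competing date-partition conventions across
# tables before several query engines are pointed at them.
from collections import Counter

# Assumed set of date-like partition column names seen in the estate.
DATE_COLUMN_CONVENTIONS = {"ds", "dt", "event_date", "partition_date"}


def audit_date_partitions(specs: dict[str, list[str]]) -> dict:
    """Count which date-partition conventions are in use. More than one
    convention means engines will disagree about how to prune the same data."""
    conventions = Counter()
    for table, partition_cols in specs.items():
        for col in partition_cols:
            if col in DATE_COLUMN_CONVENTIONS:
                conventions[col] += 1
    return {"conventions": dict(conventions), "consistent": len(conventions) <= 1}


specs = {
    "fraud.features": ["ds"],
    "risk.exposures": ["event_date"],  # same concept, different convention
    "payments.ledger": ["ds", "region"],
}
print(audit_date_partitions(specs))
```

In practice the partition specs would be read from the table format’s metadata rather than hard-coded, but the audit logic is the same.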
Modernization doesn’t create complexity. It turns on the lights and reveals fraying wiring that has been quietly deteriorating for years.
Tool sprawl and integration overhead shift inward
Standardizing on open formats doesn’t eliminate the need for tooling decisions. It changes who owns the complexity. During modernization, enterprises layer in new ingestion frameworks, catalogs, governance tools, streaming platforms and query engines – all while legacy systems remain in flight. Because the architecture is modular, the integration responsibility shifts from the vendor to the organization.
In banking, the tension becomes obvious. Payment fraud teams want low latency. Risk teams demand reproducibility. Compliance insists on immutable logs. Data engineering optimizes for throughput. All of this happens on the same open tables.
Cultural and organizational friction slows progress
No lakehouse modernization fails purely for technical reasons. It fails when teams resist shared ownership, when BI and data science distrust each other’s outputs, when governance is seen as an obstruction rather than an accelerator and when business users continue to ask for “one more extract, just in case.” It also fails when no one owns the data contract that spans teams and use cases.
Open lakehouses require behavioral change. Shared data products, shared SLAs, shared definitions, and shared accountability. Without these, even the best architecture eventually stalls, and leadership loses confidence in the numbers being presented.
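The shared artifact underpinning all of this is the data contract someone must own. A minimal sketch of what such a contract could look like as code, with both producer and consumer validating against the same definition; the column names, SLA value, and owner field are illustrative assumptions, not a standard:

```python
# Minimal sketch of a cross-team data contract: one shared, owned definition
# that producer and consumer both validate against.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    required_columns: frozenset
    max_staleness_hours: float
    owner: str  # one accountable team, not "whoever touched it last"


def violations(contract: DataContract,
               columns: set,
               staleness_hours: float) -> list:
    """Run the same check on both sides of the handoff."""
    problems = []
    missing = contract.required_columns - columns
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if staleness_hours > contract.max_staleness_hours:
        problems.append(f"stale by {staleness_hours - contract.max_staleness_hours:.1f}h")
    return problems


# Hypothetical contract for a risk feature table.
features = DataContract(frozenset({"account_id", "risk_score"}), 6.0, "risk-data")
print(violations(features, {"account_id"}, 9.0))
```

The value is less in the validation logic than in the ownership: the contract names the team accountable when the check fails.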
But these challenges aren’t reasons to abandon the journey. They are reasons to approach modernization as a staged transformation rather than a simple tooling upgrade.