DVRF™: Moving Beyond the Medallion Architecture in the Age of GenAI
Written by Jon Farr, Founder and CEO of TDAA Technologies, Creator of DataPancake, A Native App in the Snowflake Marketplace
GenAI Disclosure: I wrote this article myself, with AI tools used only for outlining, light editing, and fact-checking support.
What if organizations could accurately measure the value of their raw data, and understand how the value can increase at every stage of refinement? Would that lead to more effective prioritization and stronger outcomes for generative AI (GenAI)?
Data design patterns that focus on staging and layering, like the Medallion architecture, have served a valid purpose for years and have been adopted across the major data cloud platforms. But in the age of GenAI, we need to move beyond the limits of these patterns.
We need a framework that accurately identifies the value of raw data independently from its stage of refinement, creating a comprehensive measure of value.
Business and technology teams need a shared system to prioritize which data to refine, and to what level, with visibility into its current and potential value, and the impact it can have on the business. The staging-only design of the Medallion architecture cannot provide this level of insight. Yet this insight is essential for organizations to fully capitalize on the opportunities GenAI creates.
DVRF™, the Data Value Refinement Framework™, is a patent-pending framework inspired by the DataPancake methodology. DVRF scores the value of the raw data separately from the refinement process it must undergo.
DVRF introduces the idea of a “gap metric”, the measurable distance between what data is worth today and what it could be worth tomorrow.
DVRF helps business and technology teams prioritize data preparation to maximize its usability and impact for the business.
Without DVRF, organizations are left stitching together a patchwork of tools, catalogs, and spreadsheets that are difficult to maintain. Once the quality of the data behind the valuation slips, confidence in the results erodes just as quickly. This fragile approach drains resources, creates inconsistency, and collapses at scale.
The Current Challenge and Language Limitations of the Medallion Architecture
Data teams today have been trained to manage refinement as a series of layers: raw and untouched (bronze), transformed and cleansed (silver), and denormalized, aggregated and enriched for business use (gold). While this layered approach provides organization and clarity for the technical teams, it does not translate into a shared language for the business.
For example, no business executive would ever describe raw FHIR data (patient health records exchanged in the HL7 FHIR format) as bronze; they would more likely call it rhodium, a precious metal that has often traded well above the price of gold. Nor will it resonate to call that same data gold at its most refined state simply because the technology team uses the terminology of the Medallion architecture. The disconnect is immediate.
Meanwhile, the stages of governing, documenting and contextualizing data become siloed processes handled by different tools and teams. The refinement process quickly becomes fragmented and disjointed.
And as semantic intelligence becomes more important to power RAG (retrieval-augmented generation) use cases, a unified and orchestrated framework for value and refinement becomes essential.
As the presence, or absence, of data quality and availability becomes increasingly visible, business and technology teams must effectively communicate and prioritize together. The organizations that do this best will be the ones to lead and innovate the fastest.
The Paradigm Shift: DVRF
DVRF was inspired by the process we call “Data Pancaking”. Data Pancaking is a methodology that brings the entire data refinement process together for semi-structured data like JSON and XML. DataPancake unifies accurate flattening and normalization, cleansing, enrichment, security, documentation, and semantic modeling into one cohesive process rather than multiple siloed processes that often create technical debt and slow innovation.
DVRF is a data maturity model that starts with the premise that raw data should be valued and scored first by the business.
Once this initial score (DVRFs™) is assigned, an additional score can be assigned based on the refinement process the data has undergone. The combination of these scores gives business and technology teams a clear view of the current-state value of their data.
The teams can then analyze the additional refinement stages available to calculate and assign a maximum potential score. The gap between the current and potential scores highlights the highest priorities for data preparation.
Collections of datasets can be grouped by business impact to create an index (DVRFi™) that measures an organization’s progress and value contribution over time.
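To make the relationship between these scores concrete, here is a minimal Python sketch. It is not the DVRF scoring method, which this article does not disclose. It assumes, purely for illustration, that the current score is the business-assigned raw value plus credit for completed refinement, that the potential score adds credit for the refinement still available, and that an index can be read as the share of a group's potential value already realized. The class, the function names, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetScore:
    """Hypothetical scorecard entry; field names and scales are illustrative."""
    name: str
    raw_value: float             # value of the raw data, assigned by the business
    refinement_done: float       # credit for refinement already completed
    refinement_available: float  # credit for refinement still possible

    @property
    def current(self) -> float:
        # Assumption: raw value and refinement credit combine additively.
        return self.raw_value + self.refinement_done

    @property
    def potential(self) -> float:
        return self.current + self.refinement_available

    @property
    def gap(self) -> float:
        # Distance between today's score and the maximum potential score.
        return self.potential - self.current


def prioritize(datasets):
    """Order datasets by gap, largest first."""
    return sorted(datasets, key=lambda d: d.gap, reverse=True)


def portfolio_index(datasets):
    """Share of a group's total potential value already realized (0 to 1)."""
    total_potential = sum(d.potential for d in datasets)
    if total_potential == 0:
        return 0.0
    return sum(d.current for d in datasets) / total_potential


# Placeholder numbers, not real valuations.
group = [
    DatasetScore("web_logs", raw_value=10, refinement_done=15, refinement_available=5),
    DatasetScore("claims", raw_value=40, refinement_done=10, refinement_available=30),
]
for d in prioritize(group):
    print(d.name, d.current, d.potential, d.gap)
print(round(portfolio_index(group), 2))
```

With these placeholder inputs, the claims dataset ranks first because its gap is larger, even though both datasets have already received some refinement. Tracking the index for the same group over time would show whether preparation work is closing the gap.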
DVRF outputs include prioritization matrices, scorecards, dashboards, and strategy recommendations, providing organizations both a quantitative measure and an actionable roadmap for maximizing the business value of their data.
Staying with the FHIR data example, the raw data starts with a baseline score. Once it is “Pancaked” into a series of accurately defined relational tables, its score increases. A further refinement might be the mapping of lab result names to standardized LOINC codes for interoperability. After this code standardization process, the dataset’s comprehensive score would rise again, reflecting its greater utility, readiness, and value.
And as additional refinements are identified, DVRF calculates a potential score. Comparing this potential score with the current score creates a measurable gap that can be used to prioritize future data preparation work.
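The FHIR walkthrough above can be traced stage by stage in a few lines of Python. This is a sketch under stated assumptions rather than the actual DVRF calculation: it assumes each refinement stage adds a fixed uplift to the score, and the baseline, the uplift values, and the function name are placeholders chosen only to show the shape of the arithmetic.

```python
def score_trajectory(baseline, completed, remaining):
    """Trace a dataset's score through its refinement stages.

    baseline  -- score assigned to the raw data
    completed -- list of (stage_name, uplift) already applied
    remaining -- list of (stage_name, uplift) identified but not yet applied
    Returns (history, current, potential, gap).
    Assumption: each stage contributes an additive uplift.
    """
    history = [("raw", baseline)]
    score = baseline
    for stage, uplift in completed:
        score += uplift
        history.append((stage, score))
    potential = score + sum(uplift for _, uplift in remaining)
    return history, score, potential, potential - score


# Placeholder values for the FHIR example; not real scores.
history, current, potential, gap = score_trajectory(
    baseline=50,
    completed=[
        ("pancaked into relational tables", 20),
        ("lab result names mapped to LOINC codes", 10),
    ],
    remaining=[("documentation and semantic modeling", 15)],
)
for stage, score in history:
    print(f"{stage}: {score}")
print(f"current={current} potential={potential} gap={gap}")
```

The history shows the score rising after each completed stage, and the gap is whatever the remaining stages would still add.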
Enhancing Collaboration and Accelerating Innovation
Innovation thrives on collaboration. Positive communication coupled with positive tension, both built on trusted data, enables the kind of teamwork that accelerates innovation rather than reinforcing the status quo.
DVRF eliminates the noise and confusion around which datasets should be prioritized and how the refined data will benefit the business. It brings business, product, engineering, information security, data governance, and AI teams together with a unified framework that promotes alignment and builds shared value.
To become GenAI-ready, organizations need every team’s skills working in harmony, like a symphony. When the entire organization can see the finish line, and knows that reaching it requires data to be accurate, enriched, secured, documented and contextualized, teams will encourage and help each other to cross that finish line, because they all win together.
GenAI Readiness — The New Finish Line: Accurate, Dependable, and Less Prone to Hallucinations
As our world is coming to terms with the impact and possibilities created by GenAI, it is also recognizing a hard truth: GenAI’s output will only be as good as its input. Yet stated this way, the idea is so generalized that it’s almost meaningless.
DVRF gives organizations a methodical, measurable, and meaningful path to achieve the highest possible value for maximum confidence as they embrace the opportunities created by GenAI.
Data Strategy is now inseparable from AI Strategy. Organizations cannot have one without the other. GenAI has made data readiness the new finish line. DVRF provides the framework to accurately measure value, prioritize refinement, and close the gap between data as it exists today and its true potential.
In future articles, I’ll share examples of how DVRF can be applied, the secret sauce of how scores are calculated, and how we are incorporating DVRF into technology that will enable organizations to apply it directly to their own data.
