Migration of business intelligence reports from an aging Pentaho platform to Power BI: data model enrichment, refresh industrialization, access governance, and integration with the existing Microsoft ecosystem.
Modernization of a Pentaho platform to Power BI Premium: conversion of Pentaho Data Integration (Kettle) transformations to Azure Data Factory or Power Query, redesign of Pentaho Report Designer and Community Dashboard Editor reports as Power BI reports, and redesign of Mondrian cubes as tabular models.
Pentaho was one of the reference open source BI suites between 2005 and 2018, adopted by mid-sized and large companies seeking an alternative to IBM Cognos, SAP BO, or Oracle BI licenses. The suite comprises a powerful ETL tool (Pentaho Data Integration, also known as Kettle), a BI server (Pentaho BI Server, accessed through the Pentaho User Console), a report studio (Report Designer), a community dashboard editor (CDE), and an OLAP engine (Mondrian). Acquired by Hitachi in 2015 and then integrated into Hitachi Vantara, Pentaho has experienced several roadmap slowdowns. The Community Edition has been progressively neglected in favor of the paid Enterprise Edition, and many companies now face a choice: pay for Hitachi Vantara licenses, maintain an aging Community version, or migrate.
For organizations already committed to Microsoft 365 and Azure, Power BI Premium is a natural migration target for Pentaho. Kettle transformations find their equivalent in Azure Data Factory (cloud-native orchestration) or in Power Query M (lightweight transformations on the dataset side). Mondrian schemas translate to tabular models in Power BI, which perform better thanks to the VertiPaq engine. Report Designer reports and CDE dashboards are rebuilt as Power BI visuals, with a considerable leap in user experience (interactivity, mobile, Office integration). Our AI, Data, and Automation expertise covers these migrations end-to-end, framed by the ATLAS methodology.
Four signals converge toward the migration decision. First, the Hitachi Vantara license renewal is approaching and its cost becomes hard to justify against an already-paid Microsoft stack. Second, the internal Pentaho team is shrinking: Kettle experts leave and recruiting replacements is nearly impossible. Third, business users demand modern features that are absent or lagging on Pentaho (smooth mobile experience, self-service, generative AI). Finally, the global data strategy shifts to Microsoft Fabric, Azure Synapse, or Databricks, and the visualization and ETL layer must follow. When three of these four signals are present, migration becomes an IT priority.
Pentaho BI Server, Pentaho Data Integration (Kettle), Report Designer, CDE, Mondrian
Power BI Premium, tabular models, Azure Data Factory, Power Query M, Microsoft Fabric (optional)
Default choice for Microsoft organizations. ETL orchestrated in Azure Data Factory for complex pipelines, Power Query for lightweight transformations, Power BI Premium for visualization. Main recommendation for the majority of Pentaho migrations.
More ambitious project including a modern lakehouse, OneLake, Spark notebooks, and Direct Lake. Relevant when the Pentaho migration is part of a complete data redesign, with real-time ingestion and data science.
The organization keeps a dedicated enterprise ETL, distinct from the BI platform. Relevant when Kettle pipelines are very large or integrate non-Microsoft sources where a specialized ETL adds value.
Specific cases where sovereignty or a modern open source stack is the priority. The migration effort is significant because the semantics of these targets differ more from Pentaho than those of the Microsoft target.
A Pentaho to Power BI migration program generally runs **six to eighteen months** depending on ETL volume and number of reports. For an estate of **fifty to one hundred fifty Kettle transformations** and **two hundred to four hundred reports**, plan **eight to twelve months** with a team of five to seven people: a Power BI / Azure data architect, a senior Azure Data Factory data engineer, two Power BI / DAX developers, an ETL developer for Kettle translation, a business representative allocated at 30%, and a project manager. For very large estates including Mondrian, plan a twelve to twenty-four month program in successive waves.
Underestimating the effort of Kettle transformation translation. A complex Pentaho transformation uses very specific steps (ScriptValueMod, Modified JavaScript, JSON Input, JMS Producer) that don't all have a direct equivalent in Azure Data Factory.
Systematic mapping of Kettle steps used across the entire estate, with inventory of their frequency. Each rare or complex step is the subject of a dedicated feasibility study (native ADF equivalent, custom Azure Function, or logic refactoring). Choices are recorded in the discrepancy registry, for validation and documentation. See the ATLAS methodology.
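The step inventory described above can be sketched in a few lines of Python. This is a minimal illustration, not the program's actual tooling: it assumes transformations are stored as `.ktr` XML files in which each `<step>` element has a `<type>` child, and the function names and the sample transformation are hypothetical. Estates stored in a database repository would need an export first.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_step_types(ktr_xml: str) -> Counter:
    """Count the step types declared in one Kettle transformation (.ktr XML)."""
    root = ET.fromstring(ktr_xml)
    # Each <step> element carries a <type> child naming the Kettle step plugin.
    return Counter(
        step.findtext("type", default="UNKNOWN").strip()
        for step in root.iter("step")
    )


def inventory_estate(folder: str) -> Counter:
    """Aggregate step-type frequencies over every .ktr file under a folder."""
    totals = Counter()
    for path in Path(folder).rglob("*.ktr"):
        totals += count_step_types(path.read_text(encoding="utf-8"))
    return totals


# Hypothetical, heavily trimmed transformation used only for illustration.
SAMPLE_KTR = """
<transformation>
  <step><name>Read orders</name><type>TableInput</type></step>
  <step><name>Clean fields</name><type>ScriptValueMod</type></step>
  <step><name>Read customers</name><type>TableInput</type></step>
</transformation>
"""

sample_counts = count_step_types(SAMPLE_KTR)
for step_type, n in sample_counts.most_common():
    print(f"{step_type}: {n}")
```

Sorting the aggregated counter by ascending frequency surfaces the rare steps, which are the ones that call for a dedicated feasibility study.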
Reproducing Report Designer and CDE reports identically. These tools have a display logic very different from Power BI, and copying pixel-perfect rendering produces heavy, non-interactive Power BI visuals that don't leverage the platform.
Targeted redesign: for each critical report, a 30-minute workshop with the business redefines the objective (what the user actually needs to see) before rebuilding in Power BI. Purely operational reports (monthly PDF export) are reproduced as Paginated Reports. CDE dashboards become interactive Power BI reports with slicers and drill-through.
Migrating Mondrian schemas from MDX to DAX without rethinking the model. MDX and DAX belong to different paradigms (multidimensional vs tabular), and a literal translation produces inefficient DAX.
Tabular re-design: each Mondrian schema is analyzed to identify hierarchies, calculations, and granularity, then rebuilt as a Power BI tabular model. DAX measures rely on published patterns (DAX Patterns, time intelligence). Historical Mondrian optimizations (caches, materialized aggregations) are replaced by native Power BI optimizations (automatic aggregations, Direct Lake mode).
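As an illustration of the analysis step, the sketch below lists the hierarchies, measures, and calculated members of a Mondrian schema so that each can be reviewed before the tabular model is designed. It assumes the usual Mondrian XML layout (`Schema`, `Cube`, `Dimension`, `Hierarchy`, `Level`, `Measure`, `CalculatedMember`). The sample schema and the `inventory_schema` helper are hypothetical, and shared dimensions referenced through `DimensionUsage` are deliberately not resolved here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed Mondrian schema used only for illustration.
SAMPLE_SCHEMA = """
<Schema name="Sales">
  <Cube name="Orders">
    <Table name="fact_orders"/>
    <Dimension name="Time" foreignKey="date_id">
      <Hierarchy hasAll="true" primaryKey="date_id">
        <Table name="dim_date"/>
        <Level name="Year" column="year"/>
        <Level name="Month" column="month"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Revenue" column="amount" aggregator="sum"/>
    <CalculatedMember name="Average Basket" dimension="Measures">
      <Formula>[Measures].[Revenue] / [Measures].[Order Count]</Formula>
    </CalculatedMember>
  </Cube>
</Schema>
"""


def inventory_schema(schema_xml: str) -> list:
    """Return one summary dict per cube: hierarchies, measures, MDX formulas."""
    root = ET.fromstring(schema_xml)
    cubes = []
    for cube in root.iter("Cube"):
        cubes.append({
            "cube": cube.get("name"),
            # One entry per hierarchy, with its levels from coarse to fine.
            "hierarchies": [
                {"dimension": dim.get("name"),
                 "levels": [lvl.get("name") for lvl in hier.iter("Level")]}
                for dim in cube.iter("Dimension")
                for hier in dim.iter("Hierarchy")
            ],
            # Base measures: candidates for simple DAX aggregations.
            "measures": [(m.get("name"), m.get("aggregator"))
                         for m in cube.iter("Measure")],
            # MDX formulas: each one needs a manual DAX rewrite.
            "calculated_members": [
                (c.get("name"), (c.findtext("Formula") or "").strip())
                for c in cube.iter("CalculatedMember")
            ],
        })
    return cubes


for summary in inventory_schema(SAMPLE_SCHEMA):
    print(summary)
```

The output is only a worklist: the calculated members still have to be redesigned in DAX by hand, not translated literally.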
Ignoring Pentaho schedules and triggers. The estate runs thanks to scheduled jobs (Quartz, cron-like) that orchestrate Kettle. Forgetting them means losing critical data production chains on cutover day.
Schedule inventory from the capture phase: number of jobs, frequencies, dependencies, criticality. For each schedule, an Azure equivalent is designed (ADF trigger, Logic App, scheduled Azure Function). Critical chains are migrated in dual run for two to four weeks to validate parity before Pentaho decommissioning.
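Parity during the dual run can be checked with an order-insensitive comparison of the two outputs. The sketch below is one possible approach under stated assumptions: both outputs have already been loaded as lists of dictionaries with normalized types and formats, and the names `row_fingerprint` and `compare_outputs` are hypothetical.

```python
import hashlib
from collections import Counter


def row_fingerprint(row: dict) -> str:
    """Hash a row independently of column order."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_outputs(legacy_rows: list, target_rows: list) -> dict:
    """Compare the Pentaho output with the Azure output, ignoring row order."""
    legacy = Counter(row_fingerprint(r) for r in legacy_rows)
    target = Counter(row_fingerprint(r) for r in target_rows)
    return {
        "legacy_count": sum(legacy.values()),
        "target_count": sum(target.values()),
        # Counter subtraction keeps only the positive differences.
        "only_in_legacy": sum((legacy - target).values()),
        "only_in_target": sum((target - legacy).values()),
    }


# Hypothetical sample: the second pipeline produced a one-cent discrepancy.
pentaho_rows = [
    {"order_id": 1, "amount": "10.00"},
    {"order_id": 2, "amount": "5.50"},
]
azure_rows = [
    {"amount": "5.50", "order_id": 2},
    {"order_id": 1, "amount": "10.01"},
]

report = compare_outputs(pentaho_rows, azure_rows)
parity_ok = report["only_in_legacy"] == 0 and report["only_in_target"] == 0
print(report, "parity:", parity_ok)
```

Matching row counts alone are not sufficient, as the sample shows: both sides contain two rows, yet one row differs.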
Letting Pentaho and Power BI coexist without an exit plan. Dual maintenance (two ETLs, two report engines, two security logics) quickly consumes more than the migration itself.
Dated cutover plan per functional perimeter. Each perimeter follows a short cycle: migration over two to three months, dual run of two to four weeks, Pentaho decommissioning within three months. Global coexistence must not exceed twelve to eighteen months, with milestones and exit criteria validated in program governance.
The effort depends on the number of Kettle transformations, report complexity, and the presence or absence of Mondrian schemas. As a benchmark, an estate of fifty to one hundred fifty transformations and two hundred to four hundred reports is typically migrated in eight to twelve months with a team of five to seven people in nearshore co-delivery. For very large estates or those including Mondrian, a multi-year program structured in waves is necessary.
Three options depending on the pattern. First, simple transformations (extraction, joins, aggregations) become Azure Data Factory pipelines or Power Query M steps. Second, complex transformations using Modified JavaScript or custom Kettle steps are rewritten as Azure Functions called from ADF. Third, purely orchestration transformations (Pentaho jobs that chain other jobs) become ADF pipelines with explicit dependencies. Each Kettle step pattern used is the subject of systematic mapping at the start of the program.
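The three options can be expressed as a simple routing rule applied to the step inventory. The sketch below is illustrative only: the step and job-entry identifiers in each set are assumptions to be checked against the actual estate, and anything unrecognized falls back to a feasibility study instead of being forced into a category.

```python
from enum import Enum


class Target(Enum):
    ADF_OR_POWER_QUERY = "Azure Data Factory pipeline or Power Query M step"
    AZURE_FUNCTION = "Azure Function called from ADF"
    ADF_ORCHESTRATION = "ADF pipeline with explicit dependencies"
    FEASIBILITY_STUDY = "Dedicated feasibility study"


# Illustrative identifiers (assumptions); the real lists come from the inventory.
SIMPLE_STEPS = {"TableInput", "MergeJoin", "GroupBy", "SelectValues", "FilterRows"}
SCRIPTED_STEPS = {"ScriptValueMod"}
ORCHESTRATION_ENTRIES = {"JOB", "TRANS"}


def route(step_type: str) -> Target:
    """Map one Kettle step or job-entry type to its migration option."""
    if step_type in SIMPLE_STEPS:
        return Target.ADF_OR_POWER_QUERY
    if step_type in SCRIPTED_STEPS:
        return Target.AZURE_FUNCTION
    if step_type in ORCHESTRATION_ENTRIES:
        return Target.ADF_ORCHESTRATION
    # Rare or unknown steps are not guessed: they get an explicit study.
    return Target.FEASIBILITY_STUDY


for step in ["TableInput", "ScriptValueMod", "JOB", "SomeCustomPlugin"]:
    print(f"{step} -> {route(step).value}")
```

Keeping the fallback explicit makes the share of the estate that still needs analysis visible from the first inventory run.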
Mondrian is a multidimensional OLAP engine based on XML schemas and MDX queries. Power BI uses a tabular engine with DAX. The two paradigms differ, so a redesign is necessary. In practice, each Mondrian schema is analyzed to identify its hierarchies, measures, and calculations, then rebuilt as a Power BI Premium tabular model. MDX calculations are rewritten in DAX based on published patterns. For cases where the multidimensional pattern must absolutely be preserved, Microsoft Analysis Services Multidimensional remains an option, although it is used less and less.
Pentaho relies on Quartz Scheduler or external cron jobs to trigger Kettle transformations. Migration to Azure follows three steps. First, the complete schedule inventory is frozen at the start of the program (frequency, criticality, dependencies). Then, each schedule is translated to an Azure Data Factory trigger, a Logic App, or a scheduled Azure Function depending on complexity. Finally, critical chains run in parallel on both platforms for two to four weeks to validate that outputs are identical before Pentaho decommissioning.
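Once the inventory records dependencies, a topological sort gives an order in which schedules can be translated and switched so that upstream jobs always come first. The sketch below uses the standard-library `graphlib` module (Python 3.9+). The `Schedule` record, the job names, and the Quartz-style cron strings are hypothetical examples, not data from a real estate.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Schedule:
    job: str
    cron: str            # expression as captured from the scheduler
    criticality: str     # e.g. "high", "medium", "low"
    depends_on: tuple = ()


def migration_order(schedules: list) -> list:
    """Return job names so that every job appears after its dependencies."""
    graph = {s.job: set(s.depends_on) for s in schedules}
    # static_order() raises graphlib.CycleError if the inventory has a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-job chain captured during the inventory.
inventory = [
    Schedule("refresh_dashboard", "0 0 6 * * ?", "high", ("load_sales",)),
    Schedule("load_sales", "0 0 4 * * ?", "high", ("extract_erp",)),
    Schedule("extract_erp", "0 0 2 * * ?", "high"),
]

order = migration_order(inventory)
print(order)
```

A cycle error at this stage is itself a useful finding: it points to an inventory mistake or to a circular chain that has to be untangled before translation.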
Pentaho manages authorizations via folder-level and report-level ACLs, plus Mondrian constraints on the cube side. Power BI uses workspace-level and dataset-level permissions, plus row-level security (RLS) via roles and DAX filters. The migration follows three steps: extraction of the Pentaho security model (users, groups, ACLs, Mondrian constraints), design of the equivalent Power BI model (workspace permissions, dynamic RLS via the DAX function USERPRINCIPALNAME, object-level security if necessary), and validation by tests with representative user accounts before each production deployment.
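Dynamic RLS is commonly driven by a security table that maps each user principal name to the rows that user may see. The sketch below shows how such a table could be derived from extracted Pentaho role data and then checked offline with representative accounts. All names, roles, and regions are hypothetical, and the Python filter only emulates what the RLS role would enforce in the Power BI model.

```python
def build_security_table(role_members: dict, role_scopes: dict) -> list:
    """Flatten role membership and role scopes into (user, scope) rows."""
    rows = set()
    for role, members in role_members.items():
        for upn in members:
            for scope in role_scopes.get(role, ()):
                # Lower-case the principal name so comparisons are consistent.
                rows.add((upn.lower(), scope))
    return sorted(rows)


def visible_rows(fact_rows: list, security_table: list, upn: str) -> list:
    """Emulate the RLS filter: keep rows whose region is allowed for the user."""
    allowed = {scope for user, scope in security_table if user == upn.lower()}
    return [row for row in fact_rows if row["region"] in allowed]


# Hypothetical data extracted from the Pentaho security model.
role_members = {
    "sales_north": ["Alice@example.com"],
    "sales_all": ["bob@example.com"],
}
role_scopes = {
    "sales_north": ["North"],
    "sales_all": ["North", "South"],
}
facts = [
    {"region": "North", "revenue": 100},
    {"region": "South", "revenue": 80},
]

security_table = build_security_table(role_members, role_scopes)
alice_view = visible_rows(facts, security_table, "alice@example.com")
bob_view = visible_rows(facts, security_table, "bob@example.com")
print(security_table)
print(len(alice_view), len(bob_view))
```

Comparing these expected views with what each test account actually sees in the deployed report gives a repeatable check before every production release.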
Five typical gains. First, Hitachi Vantara license costs and Pentaho infrastructure costs are eliminated. Second, user experience improves markedly (mobile, Excel integration, Teams, native accessibility). Third, the delay between a business request and report delivery decreases thanks to Power BI self-service and the central tabular model. Fourth, data governance consolidates around shared datasets and Microsoft Purview. Finally, the platform benefits at no additional cost from Microsoft AI evolutions (Copilot in Power BI, narratives, natural language Q&A).
We frame the trajectory, the budget, and the deliverables in a first thirty-minute conversation. A short POC can be proposed before committing to the full program.
Start this path →