Legacy ↔ Python architect
Bridge between COBOL batch/analytical and the Python data ecosystem, COMP-3/PIC pattern mapping, pipeline design
Migration of COBOL batch, compute, and analytical workloads to Python, pandas, NumPy, and scientific libraries. ATLAS methodology, proven functional parity, integration with the modern data ecosystem (Databricks, Jupyter).
Each phase aggregates one or more of the ten ATLAS steps. No phase starts until the previous one has delivered its validated artefact. The ATLAS methodology makes COBOL migration predictable and auditable.
Inventory of candidate COBOL programs (analytical batch, actuarial computing, scoring, reporting); transactional programs are excluded and routed to Java/.NET. Choice of Python target (pandas+PostgreSQL, Databricks/PySpark, FastAPI). Functional specification of the calculation rules rebuilt from the code.
Capture of representative production datasets (inputs, expected outputs), freezing the ground truth. Dependency mapping: VSAM/QSAM files, copybooks, JCL scheduling, DB2 integrations.
Python pipeline design: decimal.Decimal for financial calculations, pandas for transformations, relational (PostgreSQL) or columnar (Parquet/Delta Lake) modeling per volume, Airflow or Databricks Workflows orchestration, observability (logging, metrics). Characterization test suite preparation.
Pattern-by-pattern translation: PIC S9(n)V9(n) → decimal.Decimal, sequential PERFORMs → pandas/NumPy vectorized operations (5-20× gain), CALL → Python modules, VSAM files → tables or DataFrames. COBOL/Python parallel runs for 4-8 weeks on production datasets, automatic line-by-line comparison.
Progressive go-live with COBOL ↔ Python coexistence during transition. Databricks or Jupyter integration for exploration. Packaging (poetry/uv) and containerization (Docker/Kubernetes or serverless Cloud Run). Ops handover to client team with documentation and runbooks.
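The translation phase above maps VSAM/QSAM files to tables or DataFrames. As a minimal sketch of the first step, the snippet below decodes a fixed-length EBCDIC record into Python values. The copybook layout, the field names, the record length, and the `cp037` code page are all assumptions made for illustration; a real layout comes from the program's copybooks, and signed or packed fields need extra handling.

```python
from decimal import Decimal

EBCDIC = "cp037"  # assumed code page; the real one depends on the source system

# Hypothetical copybook, for illustration only:
#   05 POLICY-ID    PIC X(8).
#   05 HOLDER-NAME  PIC X(12).
#   05 PREMIUM      PIC 9(5)V99.   (unsigned, DISPLAY usage)
# Each entry: (name, offset, length, implied decimal places or None for text)
LAYOUT = [
    ("policy_id", 0, 8, None),
    ("holder_name", 8, 12, None),
    ("premium", 20, 7, 2),
]
RECORD_LENGTH = 27


def parse_record(record: bytes) -> dict:
    """Decode one fixed-length record according to LAYOUT."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    row = {}
    for name, offset, length, scale in LAYOUT:
        text = record[offset:offset + length].decode(EBCDIC)
        if scale is None:
            row[name] = text.rstrip()  # drop the trailing padding spaces
        else:
            # The decimal point is implied by the PIC clause, not stored.
            row[name] = Decimal(int(text)).scaleb(-scale)
    return row


def parse_file(data: bytes) -> list[dict]:
    """Split a flat file into records and decode each one."""
    return [
        parse_record(data[i:i + RECORD_LENGTH])
        for i in range(0, len(data), RECORD_LENGTH)
    ]
```

The resulting list of dictionaries can be handed to `pandas.DataFrame(rows)` or bulk-inserted into a table. Keeping the layout as data rather than code makes it easier to trace each field back to its copybook line.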
Automatic legacy-code conversion tools produce code that compiles but stays unreadable and unmaintainable: original patterns are copied as-is, without idiomatization, with dependencies on a proprietary runtime. Pushed to production without full re-characterization, this code is neither reliable nor scalable. End-to-end automatic translation isn't a modernization method — it's a debt transfer.
Our approach is the opposite. ATLAS relies on multiple readings of the legacy code, from several angles: data flows, business rules, dependencies, edge cases. AI comes in as a comprehension accelerator — to decipher decades of accumulated business logic, reverse-document uncommented branches, surface the intent behind the code. It doesn't decide and it doesn't translate: it informs the architect's work, who then designs the target architecture (cloud, database, services) and drives the migration pattern by pattern, under parity audit.
This understanding still requires humans who know legacy languages. That's our edge: where Europe and North America face a retirement wave among mainframe and legacy developers, Tunisia retains a pool of experienced developers (COBOL, Delphi, PowerBuilder, RPG…). Paired with modern architects and developers trained in the ATLAS method, they ensure continuity between the original business intent and the target system.
Python as a COBOL migration target is only relevant on specific perimeters. For compute workloads (actuarial, risk scoring, Monte Carlo simulations, statistical analyses), Python brings an unmatched scientific ecosystem (NumPy, SciPy, pandas, scikit-learn). For batch analytical pipelines feeding a data lake or lakehouse, Python integrates naturally with Databricks, Airflow, and Jupyter. For high-availability transactional or critical financial programs, Java or .NET Core remain preferable.
Good candidates for Python migration: life or P&C insurance analytical batches (provisions, reinsurance), bank credit scoring, tax calculations for public administrations, regulatory reporting (Solvency II, BCBS 239), scientific data processing in industry or healthcare. Typical volume: 20 to 100 thousand lines of COBOL per perimeter. For larger volumes or transactional workloads, see the other paths: COBOL to Java, .NET Core, or TypeScript.
Classic analytical batches, data lake integration, Airflow orchestration. Default choice for analytical pipelines.
Massive volumes (TB), distributed workloads, ML integration. See Data engineering pipelines.
Exposing calculations as REST APIs. Lightweight, fast, easy to industrialize.
High-availability transactional, critical performance, dominant enterprise ecosystem. See COBOL to Java or COBOL to .NET Core.
A COBOL to Python migration is structured as a sequence of functional batches, with the cadence set at scoping based on volume and calculation complexity. Typical cell: a legacy-Python architect, a Python tech lead, Python developers (ideally with scientific or data engineering background), a QA engineer specialized in characterization tests, a business referent (actuary, data analyst, tax expert). Composition and headcount are not fixed upfront: they are determined after the POC and scoping, once the real work has been measured.
Using the Python float type for financial calculations. Binary floating point cannot represent most decimal fractions exactly, so rounding errors are unavoidable, and accumulated cent-level discrepancies eventually exceed the tolerance thresholds of business controls.
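A short illustration of the pitfall, using only the standard library. The amounts are arbitrary examples, not values from a real program.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)                # False
print(decimal_sum == Decimal("0.3"))   # True

# 2.675 is stored as a float slightly below 2.675, so it rounds down.
float_rounded = round(2.675, 2)
decimal_rounded = Decimal("2.675").quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

print(float_rounded)     # 2.67
print(decimal_rounded)   # 2.68
```

Note that `Decimal` values are built from strings here. Building them from floats, as in `Decimal(0.1)`, would carry the binary error into the decimal value.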
Systematic mapping of PIC S9(n)V9(n) and COMP-3 to the native decimal.Decimal Python class with explicit context precision and rounding mode. Unit parity tests on overflow, underflow, deterministic rounding division. Automatic comparison of outputs with COBOL reference datasets.
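The mapping described above can be sketched as follows. Both function names are hypothetical, and two behaviours are assumptions to confirm against the source programs: a COBOL store without `ROUNDED` is treated as truncation, and an overflow raises an error here instead of silently dropping high-order digits.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two BCD digits per byte,
    with the sign carried by the low nibble of the last byte."""
    if not raw:
        raise ValueError("empty COMP-3 field")
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    sign = 1 if sign_nibble in (0x0B, 0x0D) else 0  # 0xB and 0xD mean negative
    return Decimal((sign, tuple(digits), -scale))


def store_pic(value: Decimal, int_digits: int, scale: int,
              rounded: bool = False) -> Decimal:
    """Store a value into a PIC S9(int_digits)V9(scale) field.

    Assumption: truncation without ROUNDED, half away from zero with it.
    """
    quantum = Decimal(1).scaleb(-scale)           # e.g. 0.01 for scale=2
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN
    result = value.quantize(quantum, rounding=mode)
    if abs(result) >= Decimal(10) ** int_digits:
        # Design choice for this sketch: fail loudly on overflow.
        raise OverflowError(f"{result} does not fit S9({int_digits})V9({scale})")
    return result


amount = unpack_comp3(bytes.fromhex("12345C"), scale=2)
print(amount)                                    # 123.45
print(store_pic(amount / Decimal(7), 5, 2))      # 17.63 (truncated)
```

Routing every intermediate result through a single helper of this kind keeps precision and rounding decisions in one place, which makes parity discrepancies easier to trace.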
Reproducing sequential COBOL loops in Python without using pandas or NumPy. The result is slow Python that doesn't benefit from the language's advantages.
Systematic vectorization with pandas and NumPy for data transformations. COBOL PERFORMs become vector operations, with pandas `apply` kept as a fallback for row-wise logic that does not vectorize. For massive volumes: PySpark on Databricks. Typical performance gain: 5 to 20× vs naive translation.
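To make the contrast concrete, here is a hedged sketch of the same scoring rule written twice: once as a row-by-row loop that mirrors a COBOL PERFORM, and once as a vectorized expression. The columns, thresholds, and point values are invented for illustration. The example uses integer scores on purpose; monetary columns would still follow the decimal rule.

```python
import numpy as np
import pandas as pd

# Hypothetical scoring inputs.
applicants = pd.DataFrame({
    "age": [23, 41, 67, 35],
    "late_payments": [0, 3, 1, 0],
})


def score_loop(df: pd.DataFrame) -> list:
    """Naive translation: one iteration per record, like a PERFORM loop."""
    scores = []
    for _, row in df.iterrows():
        if row["age"] < 25:
            base = 40
        elif row["age"] < 60:
            base = 70
        else:
            base = 55
        scores.append(base - 10 * row["late_payments"])
    return scores


def score_vectorized(df: pd.DataFrame) -> pd.Series:
    """Same rule expressed on whole columns at once."""
    # np.select picks the first matching condition, like IF / ELSE IF.
    base = np.select(
        [df["age"] < 25, df["age"] < 60],
        [40, 70],
        default=55,
    )
    return base - 10 * df["late_payments"]


print(score_loop(applicants))
print(score_vectorized(applicants).tolist())
```

Both versions should produce identical scores. Keeping the loop version around as a test oracle is one way to check a vectorized rewrite before the loop is retired.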
Migrating one program at a time without revising the data model. The COBOL estate typically uses VSAM or QSAM files with positioned access — naively ported to relational SQL, performance is lost.
Relational modeling adapted to target queries with relevant composite indexes. For massive analytics: columnar storage (Parquet, Delta Lake) instead of classic PostgreSQL. See the Data engineering pipelines path for lakehouse patterns.
Declaring the migration complete after calculation conversion, without validating on complete production datasets. Domain edges (overflow, outliers, dates before 1900) hold surprises.
Principle E7 — validation on real production datasets mandatory before delivery. Parallel runs COBOL/Python for four to eight weeks on complete production datasets, automatic line-by-line comparison of outputs, registry of classified discrepancies. See the ATLAS methodology.
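The line-by-line comparison and the discrepancy registry can be sketched as below. The function name, the record shape, and above all the classification rules are assumptions chosen for the example; the actual criteria for each severity level belong to the methodology and the business referent.

```python
from decimal import Decimal


def compare_outputs(cobol_rows, python_rows, key, amount_fields):
    """Compare reference (COBOL) and candidate (Python) records by key
    and return a registry of classified discrepancies."""
    registry = []
    candidates = {row[key]: row for row in python_rows}

    for expected_row in cobol_rows:
        actual_row = candidates.pop(expected_row[key], None)
        if actual_row is None:
            registry.append({"key": expected_row[key], "field": None,
                             "severity": "CRITICAL",
                             "detail": "record missing from Python output"})
            continue
        for field, expected in expected_row.items():
            actual = actual_row.get(field)
            if actual == expected:
                continue
            # Hypothetical rules: amounts are critical, whitespace-only
            # differences are cosmetic, anything else needs review.
            if field in amount_fields:
                severity = "CRITICAL"
            elif (isinstance(expected, str) and isinstance(actual, str)
                  and expected.strip() == actual.strip()):
                severity = "COSMETIC"
            else:
                severity = "ADAPTATION"
            registry.append({"key": expected_row[key], "field": field,
                             "severity": severity,
                             "detail": f"expected {expected!r}, got {actual!r}"})

    # Records produced by Python that the reference run never emitted.
    for extra_key in candidates:
        registry.append({"key": extra_key, "field": None,
                         "severity": "CRITICAL",
                         "detail": "unexpected record in Python output"})
    return registry


reference = [
    {"id": "A1", "name": "DUPONT  ", "provision": Decimal("120.50")},
    {"id": "A2", "name": "MARTIN", "provision": Decimal("80.00")},
]
candidate = [
    {"id": "A1", "name": "DUPONT", "provision": Decimal("120.50")},
    {"id": "A2", "name": "MARTIN", "provision": Decimal("80.01")},
]
for entry in compare_outputs(reference, candidate, "id", {"provision"}):
    print(entry["key"], entry["field"], entry["severity"])
```

Run daily during the parallel period, a registry like this gives a count of open discrepancies per severity, which is a simple signal for deciding when parity has been reached.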
Several distinct profiles, mobilized over the full program duration. Reproducing this cell internally is rarely realistic — the legacy skills shortage and ATLAS expertise depth make outsourcing structurally faster and less risky.
Bridge between COBOL batch/analytical and the Python data ecosystem, COMP-3/PIC pattern mapping, pipeline design
Python 3, pandas, NumPy, decimal.Decimal, COBOL→Python translation patterns, line-by-line traceability
Ideally with scientific or data engineering background (NumPy, SciPy, vectorization, advanced pandas)
Actuary, data analyst, tax expert — resolves accumulated calculation rule ambiguities, validates parity gaps
Migration of VSAM/QSAM files to PostgreSQL or columnar storage (Parquet, Delta Lake), data parity audit
Characterization test bench, line-by-line legacy/target comparison on production datasets, classified discrepancy registry
Proven capability on COBOL migration with 10 internal POCs covering Java and TypeScript targets (39 COBOL patterns covered, 44 tracked discrepancies). The documented patterns are applicable to a Python target: decimal.Decimal for COMP-3, pandas for vectorization, Airflow or Databricks for orchestration. Capability combinable with our data engineering expertise.
No, in most cases. Python is perfectly suited to analytical batches and scientific calculations, but not to high-availability transactional (real-time banking, payments, critical transactions). For these perimeters, Java or .NET Core remain preferable. Python sweet spot: actuarial, scoring, regulatory reporting, analytical ETL.
Three levers. Systematic use of decimal.Decimal for all financial calculations (never float). An explicit precision context and rounding mode aligned with the source COBOL (typically ROUND_HALF_EVEN or ROUND_HALF_UP per business conventions). Characterization tests on production datasets with line-by-line comparison and reconciliation of discrepancies, classified as CRITICAL / ADAPTATION / COSMETIC per our ATLAS methodology.
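The choice of rounding mode is visible on tie values, as this small sketch shows. The `interest` function, its 360-day convention, and the working precision of 18 digits are illustrative assumptions, not rules taken from a real program.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")
TIES = ("0.125", "0.135", "2.675")

# The two modes only differ when the dropped part is exactly half a cent.
half_even = [Decimal(v).quantize(CENT, rounding=ROUND_HALF_EVEN) for v in TIES]
half_up = [Decimal(v).quantize(CENT, rounding=ROUND_HALF_UP) for v in TIES]

print(half_even)   # 0.12, 0.14, 2.68
print(half_up)     # 0.13, 0.14, 2.68


def interest(principal: Decimal, annual_rate: Decimal, days: int) -> Decimal:
    """Hypothetical calculation: explicit working precision for the
    intermediate result, then one final rounding to the cent."""
    with localcontext() as ctx:
        ctx.prec = 18                  # assumed intermediate precision
        ctx.rounding = ROUND_HALF_EVEN
        raw = principal * annual_rate * days / Decimal(360)
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


print(interest(Decimal("10000.00"), Decimal("0.0375"), 45))   # 46.88
```

Scoping the context with `localcontext()` keeps the precision setting local to one calculation, so two migrated programs with different conventions can coexist in the same process.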
For 30 to 80 thousand lines of compute/analytical COBOL in nearshore co-delivery, plan 400 to 800 k€, parity tests and documentation included. Cost is typically lower than for Java or .NET thanks to Python's conciseness (ratio ~10:1) and a more compact cell (4 to 6 people). See the delivery models.
Databricks: encapsulate migrated calculations as PySpark notebooks, orchestrate via Databricks Workflows or Airflow. Git versioning, CI/CD via Databricks Repos. Jupyter: for ad hoc exploration and interactive documentation of business calculations. Production: Python packaging (poetry or uv), execution in Docker containers orchestrated by Kubernetes or serverless (Cloud Run, Lambda). See the Data engineering pipelines path.
Three concrete ways to start — from a POC on your code to a full program. Python is perfectly suited to analytical batches and scientific calculations; we explicitly exclude critical transactional workloads (targeted to Java/.NET).