🏗️ JD Edwards Data Warehouse Modernization


Enterprise-grade data pipeline migrating legacy JD Edwards EnterpriseOne ERP data into a modern Azure Cloud Data Warehouse using the Medallion Architecture (Bronze → Silver → Gold).


📐 Architecture

```mermaid
flowchart LR
    subgraph Source ["JD Edwards ERP"]
        F0101["F0101 — Address Book"]
        F4211["F4211 — Sales Orders"]
    end

    subgraph Bronze ["Bronze — ADLS Gen2"]
        B["CSV → Parquet\nSnappy Compression\nTime Partitioned"]
    end

    subgraph Silver ["Silver — ADF Data Flows"]
        S1["Julian Date → ISO Date\nImplicit Decimals → Float"]
        S2["SCD Type 2\nSHA256 Change Detection"]
    end

    subgraph Gold ["Gold — Azure SQL"]
        G1["Dim_Date"]
        G2["Dim_Customer"]
        G3["Fact_Sales"]
    end

    subgraph Analytics ["Power BI / Tableau"]
        BI["Dashboards\n& Reports"]
    end

    F0101 --> B
    F4211 --> B
    B --> S1
    S1 --> S2
    S2 --> G1 & G2 & G3
    G3 --> BI

    style Source fill:#2d2d2d,color:#fff,stroke:#555
    style Bronze fill:#cd7f32,color:#fff,stroke:#a0642a
    style Silver fill:#b0b0b0,color:#000,stroke:#888
    style Gold fill:#daa520,color:#000,stroke:#b8860b
    style Analytics fill:#0078d4,color:#fff,stroke:#005a9e
```

Orchestration: Azure Data Factory • Security: Key Vault + Managed Identity • Monitoring: Logic Apps Alerts • Budget: < $200 Azure Credits


🎯 Problem Statement

JD Edwards EnterpriseOne stores data using legacy conventions that break modern analytics tools:

| Challenge | JDE Format | Example Raw Value | Correct Value |
|---|---|---|---|
| Julian Dates | CYYDDD | `123001` | `2023-01-01` |
| Implicit Decimals | Integer (÷100) | `1050` | `$10.50` |
| Cryptic Columns | Abbreviated codes | `SDAN8`, `SDAEXP` | Customer ID, Extended Price |

If these are not decoded, revenue reports are overstated by 100× and dates appear as nonsensical values like "Year 123."
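The two numeric conversions above are mechanical once the encoding is understood. A minimal Python sketch (function names are illustrative, not the repo's actual helpers; the decimal shift assumes the common two-decimal display rule):

```python
from datetime import date, timedelta

def decode_jde_julian(cyyddd: int) -> date:
    """Convert a JDE Julian date (CYYDDD) to a calendar date.

    C is the century offset from 1900 (0 = 19xx, 1 = 20xx),
    YY the two-digit year, DDD the day of the year.
    """
    century = cyyddd // 100000          # 123001 -> 1
    yy = (cyyddd // 1000) % 100         # 123001 -> 23
    ddd = cyyddd % 1000                 # 123001 -> 1
    year = 1900 + century * 100 + yy    # -> 2023
    return date(year, 1, 1) + timedelta(days=ddd - 1)

def decode_implicit_decimal(raw: int, display_decimals: int = 2) -> float:
    """JDE stores amounts as scaled integers; shift the decimal point back."""
    return raw / 10 ** display_decimals

print(decode_jde_julian(123001))        # 2023-01-01
print(decode_implicit_decimal(1050))    # 10.5
```

In the actual pipeline these conversions run inside the Silver ADF data flows rather than in Python, but the arithmetic is identical.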


✨ Key Features

  • Medallion Architecture — Bronze (raw), Silver (cleansed), Gold (star schema) for full auditability
  • SCD Type 2 Tracking — Historical changes to customers preserved using SHA256 hash comparison
  • Metadata-Driven Ingestion — Configuration-based pipeline; zero hardcoded paths
  • Point-in-Time Joins — Sales attributed to the historical customer state at time of order
  • Logic Apps Alerting — Real-time HTML email notifications on pipeline failures
  • Cost Optimized — Full platform runs under $200/month using Basic DTU + ephemeral Spark
  • CI/CD Ready — GitHub Actions validates Python, SQL, and ADF JSON on every push

📁 Repository Structure

```text
data-warehouse-migration/
├── .github/
│   └── workflows/
│       └── ci.yml                  # GitHub Actions: lint Python, validate SQL & JSON
├── adf/
│   ├── dataflow/
│   │   ├── DF_Clean_JDE.json       # Julian date + decimal conversion logic
│   │   └── DF_SCD2_Customer.json   # Slowly Changing Dimensions Type 2
│   ├── dataset/
│   │   ├── ds_bronze_csv.json      # Parameterized CSV source
│   │   ├── ds_bronze_parquet.json  # Time-partitioned Parquet sink
│   │   ├── ds_silver_parquet.json  # Cleansed Silver output
│   │   └── ds_gold_sql.json        # Azure SQL star schema tables
│   ├── linkedService/
│   │   ├── ls_adls_gen2.json       # Data Lake Storage (Managed Identity)
│   │   ├── ls_azure_sql.json       # Gold DB (Key Vault secret)
│   │   └── ls_key_vault.json       # Centralized secret management
│   └── pipeline/
│       ├── PL_Ingest_Bronze.json   # Lookup → ForEach → Copy (CSV → Parquet)
│       ├── PL_Transform_Silver.json # Execute JDE decoder + SCD2 data flows
│       ├── PL_Load_Gold.json       # Point-in-time fact table loading
│       └── PL_Master.json          # Orchestrator with failure alerting
├── config/
│   └── source_config.json          # Metadata-driven ingestion configuration
├── data/                           # Generated CSVs (gitignored)
├── sql_scripts/
│   ├── create_schemas.sql          # Bronze / Silver / Gold schema DDL
│   ├── dim_customer.sql            # SCD2 customer dimension
│   ├── dim_date.sql                # 20-year calendar dimension
│   └── fact_sales.sql              # Star schema fact table
└── src/
    └── python/
        ├── generate_jde_data.py    # Synthetic JDE data generator
        └── requirements.txt        # pandas, faker
```

🚀 Getting Started

Prerequisites

  • Azure Subscription with Data Factory, Storage Account (HNS enabled), SQL Database, and Key Vault
  • Python 3.11+ for data generation
  • Azure CLI or Azure Storage Explorer for file uploads

Step 1: Generate Synthetic Data

```bash
cd data-warehouse-migration
pip install -r src/python/requirements.txt
python src/python/generate_jde_data.py
```

This creates data/F0101.csv (50 customers) and data/F4211.csv (200 sales orders) with JDE-formatted fields.
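Conceptually, the generator has to emit CSVs whose fields already carry JDE's encodings. A simplified sketch of that idea (the real script is `src/python/generate_jde_data.py`; the column subset and value ranges here are assumptions, and real F4211 tables carry many more fields):

```python
import csv
import random
from datetime import date

def to_jde_julian(d: date) -> int:
    """Encode a calendar date as JDE CYYDDD (C = century offset from 1900)."""
    c = d.year // 100 - 19
    return c * 100000 + (d.year % 100) * 1000 + d.timetuple().tm_yday

# Assumed column subset: customer number, extended price, order date.
with open("F4211.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["SDAN8", "SDAEXP", "SDTRDJ"])
    for _ in range(200):
        writer.writerow([
            random.randint(10001, 10050),   # customer number (AN8)
            random.randint(100, 999999),    # price as an implicit-decimal integer
            to_jde_julian(date(2023, random.randint(1, 12), random.randint(1, 28))),
        ])
```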

Step 2: Upload to ADLS Gen2

Using Azure Storage Explorer or CLI, upload the CSVs to your Data Lake container:

```text
datalake/01-bronze/landing/F0101.csv
datalake/01-bronze/landing/F4211.csv
```

Also upload config/source_config.json to datalake/config/source_config.json.
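This config file is what makes the ingestion metadata-driven: the Bronze pipeline's Lookup activity reads it and a ForEach fans out one Copy per entry, so no paths are hardcoded. The authoritative schema is in `config/source_config.json`; a hypothetical shape (every field name here is illustrative) might look like:

```json
{
  "sources": [
    {
      "tableName": "F0101",
      "sourcePath": "01-bronze/landing/F0101.csv",
      "sinkPath": "01-bronze/parquet/F0101",
      "loadType": "full"
    },
    {
      "tableName": "F4211",
      "sourcePath": "01-bronze/landing/F4211.csv",
      "sinkPath": "01-bronze/parquet/F4211",
      "loadType": "full"
    }
  ]
}
```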

Step 3: Create Gold Layer Tables

Execute the SQL scripts against your Azure SQL Database in order:

```bash
sqlcmd -S your-server.database.windows.net -d sql-dw-gold -U admin -P 'password' \
  -i sql_scripts/create_schemas.sql \
  -i sql_scripts/dim_date.sql \
  -i sql_scripts/dim_customer.sql \
  -i sql_scripts/fact_sales.sql
```

Step 4: Connect ADF to GitHub

  1. In ADF Studio, open Manage → Git Configuration
  2. Select GitHub, point to this repository
  3. Set Collaboration Branch = main, Root Folder = /adf/
  4. ADF will auto-discover all pipelines, datasets, and data flows

Step 5: Run the Pipeline

Trigger PL_Master in ADF Studio. It executes sequentially:

  1. Bronze — Ingests CSVs → Parquet (partitioned)
  2. Silver — Decodes Julian dates + decimal normalization + SCD2
  3. Gold — Loads star schema with point-in-time dimension joins
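The Gold step's point-in-time join means each sale is matched to the dimension version that was active on the order date, not the current one. A minimal pure-Python sketch of that lookup (the real logic runs in ADF/SQL; column names and dates are illustrative):

```python
from datetime import date

# SCD2 customer dimension: each row is one historical version,
# valid over the half-open interval [effective, end).
dim_customer = [
    {"key": 1, "customer_id": 10001, "region": "East",
     "effective": date(2023, 1, 1), "end": date(2023, 6, 1)},
    {"key": 2, "customer_id": 10001, "region": "West",
     "effective": date(2023, 6, 1), "end": date(9999, 12, 31)},
]

def point_in_time_lookup(customer_id: int, order_date: date) -> dict:
    """Return the dimension version that was active on the order date."""
    for row in dim_customer:
        if row["customer_id"] == customer_id and row["effective"] <= order_date < row["end"]:
            return row
    raise LookupError(f"no active version for {customer_id} on {order_date}")

# A March order attaches to the old "East" version; a July order to "West".
print(point_in_time_lookup(10001, date(2023, 3, 15))["region"])  # East
print(point_in_time_lookup(10001, date(2023, 7, 1))["region"])   # West
```

Using the half-open `[effective, end)` interval avoids double-matching an order placed on the exact day a new version takes effect.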

💰 Cost Optimization Strategy

| Resource | Configuration | Monthly Cost |
|---|---|---|
| Azure SQL Database | Basic Tier (5 DTU / 2 GB) | ~$4.99 |
| ADF Data Flows | TTL = 0 (no idle billing) | Pay per use |
| ADLS Gen2 | Cool tier lifecycle policy | ~$0.01/GB |
| Key Vault | Standard tier | ~$0.03/10K ops |
| **Total** | | **< $15/month** |

FinOps Note: Debug sessions are limited to 60-minute timeouts. Spark clusters spin up on-demand and shut down immediately after execution (TTL=0).


🔄 SCD Type 2 — How It Works

```mermaid
sequenceDiagram
    participant New as New F0101 Data
    participant Hash as SHA256 Hasher
    participant Lookup as Business Key Lookup
    participant Split as Conditional Split
    participant SQL as Gold.Dim_Customer

    New->>Hash: Hash business columns
    Hash->>Lookup: Match on CustomerID
    Lookup->>Split: Compare hashes

    Split->>SQL: NEW → Insert (IsActive=1)
    Split->>SQL: CHANGED → Update old (IsActive=0)
    Split->>SQL: CHANGED → Insert new (IsActive=1)
    Split-->>Split: NO CHANGE → Discard
```
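The hash comparison at the heart of this flow reduces change detection to a single string equality check per row. A Python sketch of the idea (the production version is the `DF_SCD2_Customer` data flow; column names here are illustrative):

```python
import hashlib

def row_hash(row: dict, columns: list[str]) -> str:
    """SHA256 over the tracked business columns, pipe-delimited."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

TRACKED = ["name", "region"]  # business columns whose changes we version

existing = {"AN8": 10001, "name": "Acme", "region": "East"}   # active dim row
incoming = {"AN8": 10001, "name": "Acme", "region": "West"}   # new F0101 load

if row_hash(incoming, TRACKED) != row_hash(existing, TRACKED):
    # CHANGED branch: expire the current version, insert the new one.
    print("update old row (IsActive=0), insert new row (IsActive=1)")
else:
    # NO CHANGE branch: discard the incoming row.
    print("discard")
```

Comparing one 64-character digest instead of every tracked column individually keeps the conditional split cheap even as the tracked column list grows.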

📄 License

This project is open-source under the MIT License.


Built as a portfolio demonstration of enterprise-grade Azure Data Engineering capabilities.
