Enterprise-grade data pipeline migrating legacy JD Edwards EnterpriseOne ERP data into a modern Azure Cloud Data Warehouse using the Medallion Architecture (Bronze → Silver → Gold).
```mermaid
flowchart LR
    subgraph Source ["JD Edwards ERP"]
        F0101["F0101 — Address Book"]
        F4211["F4211 — Sales Orders"]
    end
    subgraph Bronze ["Bronze — ADLS Gen2"]
        B["CSV → Parquet\nSnappy Compression\nTime Partitioned"]
    end
    subgraph Silver ["Silver — ADF Data Flows"]
        S1["Julian Date → ISO Date\nImplicit Decimals → Float"]
        S2["SCD Type 2\nSHA256 Change Detection"]
    end
    subgraph Gold ["Gold — Azure SQL"]
        G1["Dim_Date"]
        G2["Dim_Customer"]
        G3["Fact_Sales"]
    end
    subgraph Analytics ["Power BI / Tableau"]
        BI["Dashboards\n& Reports"]
    end
    F0101 --> B
    F4211 --> B
    B --> S1
    S1 --> S2
    S2 --> G1 & G2 & G3
    G3 --> BI
    style Source fill:#2d2d2d,color:#fff,stroke:#555
    style Bronze fill:#cd7f32,color:#fff,stroke:#a0642a
    style Silver fill:#b0b0b0,color:#000,stroke:#888
    style Gold fill:#daa520,color:#000,stroke:#b8860b
    style Analytics fill:#0078d4,color:#fff,stroke:#005a9e
```
Orchestration: Azure Data Factory • Security: Key Vault + Managed Identity • Monitoring: Logic Apps Alerts • Budget: < $200 Azure Credits
JD Edwards EnterpriseOne stores data using legacy conventions that break modern analytics tools:
| Challenge | JDE Format | Example Raw Value | Correct Value |
|---|---|---|---|
| Julian Dates | CYYDDD | 123001 | 2023-01-01 |
| Implicit Decimals | Integer (÷100) | 1050 | $10.50 |
| Cryptic Columns | Abbreviated codes | SDAN8, SDAEXP | Customer ID, Extended Price |
If these are not decoded, revenue reports are overstated by 100× and dates appear as nonsensical values like "Year 123."
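The decoding the Silver data flows perform can be sketched in plain Python (a minimal sketch, not the ADF expression logic itself; function names are illustrative):

```python
from datetime import date, timedelta

def decode_jde_julian(cyyddd: int) -> date:
    """Decode a JDE 'Julian' date in CYYDDD form:
    C = centuries past 1900, YY = year within century, DDD = day of year."""
    century = cyyddd // 100000        # 123001 -> 1
    yy = (cyyddd // 1000) % 100       # 123001 -> 23
    ddd = cyyddd % 1000               # 123001 -> 1
    year = 1900 + century * 100 + yy  # -> 2023
    return date(year, 1, 1) + timedelta(days=ddd - 1)

def decode_implicit_decimal(raw: int, places: int = 2) -> float:
    """JDE stores amounts as integers with implied decimal places."""
    return raw / (10 ** places)

print(decode_jde_julian(123001))      # 2023-01-01
print(decode_implicit_decimal(1050))  # 10.5
```

Skipping either conversion produces exactly the failure modes above: day-of-year integers masquerading as years, and amounts inflated 100×.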
- Medallion Architecture — Bronze (raw), Silver (cleansed), Gold (star schema) for full auditability
- SCD Type 2 Tracking — Historical changes to customers preserved using SHA256 hash comparison
- Metadata-Driven Ingestion — Configuration-based pipeline; zero hardcoded paths
- Point-in-Time Joins — Sales attributed to the historical customer state at time of order
- Logic Apps Alerting — Real-time HTML email notifications on pipeline failures
- Cost Optimized — Full platform runs under $200/month using Basic DTU + ephemeral Spark
- CI/CD Ready — GitHub Actions validates Python, SQL, and ADF JSON on every push
```
data-warehouse-migration/
├── .github/
│   └── workflows/
│       └── ci.yml                    # GitHub Actions: lint Python, validate SQL & JSON
├── adf/
│   ├── dataflow/
│   │   ├── DF_Clean_JDE.json         # Julian date + decimal conversion logic
│   │   └── DF_SCD2_Customer.json     # Slowly Changing Dimensions Type 2
│   ├── dataset/
│   │   ├── ds_bronze_csv.json        # Parameterized CSV source
│   │   ├── ds_bronze_parquet.json    # Time-partitioned Parquet sink
│   │   ├── ds_silver_parquet.json    # Cleansed Silver output
│   │   └── ds_gold_sql.json          # Azure SQL star schema tables
│   ├── linkedService/
│   │   ├── ls_adls_gen2.json         # Data Lake Storage (Managed Identity)
│   │   ├── ls_azure_sql.json         # Gold DB (Key Vault secret)
│   │   └── ls_key_vault.json         # Centralized secret management
│   └── pipeline/
│       ├── PL_Ingest_Bronze.json     # Lookup → ForEach → Copy (CSV → Parquet)
│       ├── PL_Transform_Silver.json  # Execute JDE decoder + SCD2 data flows
│       ├── PL_Load_Gold.json         # Point-in-time fact table loading
│       └── PL_Master.json            # Orchestrator with failure alerting
├── config/
│   └── source_config.json            # Metadata-driven ingestion configuration
├── data/                             # Generated CSVs (gitignored)
├── sql_scripts/
│   ├── create_schemas.sql            # Bronze / Silver / Gold schema DDL
│   ├── dim_customer.sql              # SCD2 customer dimension
│   ├── dim_date.sql                  # 20-year calendar dimension
│   └── fact_sales.sql                # Star schema fact table
└── src/
    └── python/
        ├── generate_jde_data.py      # Synthetic JDE data generator
        └── requirements.txt          # pandas, faker
```
- Azure Subscription with Data Factory, Storage Account (HNS enabled), SQL Database, and Key Vault
- Python 3.11+ for data generation
- Azure CLI or Azure Storage Explorer for file uploads
```shell
cd data-warehouse-migration
pip install -r src/python/requirements.txt
python src/python/generate_jde_data.py
```
This creates data/F0101.csv (50 customers) and data/F4211.csv (200 sales orders) with JDE-formatted fields.
Using Azure Storage Explorer or CLI, upload the CSVs to your Data Lake container:
```
datalake/01-bronze/landing/F0101.csv
datalake/01-bronze/landing/F4211.csv
```
Also upload config/source_config.json to datalake/config/source_config.json.
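The shape of that configuration might look like the following (field names here are illustrative, not the actual schema; see config/source_config.json in the repository for the real structure):

```json
{
  "sources": [
    {
      "table": "F0101",
      "path": "01-bronze/landing/F0101.csv",
      "format": "csv"
    },
    {
      "table": "F4211",
      "path": "01-bronze/landing/F4211.csv",
      "format": "csv"
    }
  ]
}
```

Because PL_Ingest_Bronze reads this file via a Lookup activity, adding a new JDE table means adding an entry here rather than editing any pipeline JSON.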
Execute the SQL scripts against your Azure SQL Database in order:
```shell
sqlcmd -S your-server.database.windows.net -d sql-dw-gold -U admin -P 'password' \
  -i sql_scripts/create_schemas.sql \
  -i sql_scripts/dim_date.sql \
  -i sql_scripts/dim_customer.sql \
  -i sql_scripts/fact_sales.sql
```
- Open ADF Studio → Manage → Git Configuration
- Select GitHub, point to this repository
- Set Collaboration Branch = main, Root Folder = /adf/
- ADF will auto-discover all pipelines, datasets, and data flows
Trigger PL_Master in ADF Studio. It executes sequentially:
- Bronze — Ingests CSVs → Parquet (partitioned)
- Silver — Decodes Julian dates + decimal normalization + SCD2
- Gold — Loads star schema with point-in-time dimension joins
| Resource | Configuration | Monthly Cost |
|---|---|---|
| Azure SQL Database | Basic Tier (5 DTU / 2GB) | ~$4.99 |
| ADF Data Flows | TTL = 0 (no idle billing) | Pay per use |
| ADLS Gen2 | Cool tier lifecycle policy | ~$0.01/GB |
| Key Vault | Standard tier | ~$0.03/10K ops |
| Total | | < $15/month |
FinOps Note: Data flow debug sessions time out after 60 minutes, and Spark clusters spin up on demand and shut down immediately after execution (TTL = 0).
```mermaid
sequenceDiagram
    participant New as New F0101 Data
    participant Hash as SHA256 Hasher
    participant Lookup as Business Key Lookup
    participant Split as Conditional Split
    participant SQL as Gold.Dim_Customer
    New->>Hash: Hash business columns
    Hash->>Lookup: Match on CustomerID
    Lookup->>Split: Compare hashes
    Split->>SQL: NEW → Insert (IsActive=1)
    Split->>SQL: CHANGED → Update old (IsActive=0)
    Split->>SQL: CHANGED → Insert new (IsActive=1)
    Split-->>Split: NO CHANGE → Discard
```
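The hash comparison at the heart of this flow can be sketched in Python (a simplified stand-in for the DF_SCD2_Customer data flow; column names and the pipe delimiter are illustrative assumptions):

```python
import hashlib

def row_hash(row: dict, cols: list[str]) -> str:
    """SHA-256 over the pipe-joined tracked business columns.
    Column order must be fixed, or identical rows hash differently."""
    payload = "|".join(str(row[c]) for c in cols)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

TRACKED = ["Name", "City", "CreditLimit"]  # illustrative tracked columns

existing = {"CustomerID": 1001, "Name": "Acme", "City": "Denver",  "CreditLimit": 5000}
incoming = {"CustomerID": 1001, "Name": "Acme", "City": "Boulder", "CreditLimit": 5000}

if row_hash(incoming, TRACKED) != row_hash(existing, TRACKED):
    # CHANGED branch: expire the current version, insert the new one
    print("expire old row (IsActive=0); insert new row (IsActive=1)")
else:
    # NO CHANGE branch: discard the incoming row
    print("no change; discard")
```

Comparing one hash per row instead of every column individually keeps the change-detection logic cheap and uniform, regardless of how many attributes the dimension tracks.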
This project is open-source under the MIT License.
Built as a portfolio demonstration of enterprise-grade Azure Data Engineering capabilities.