Welcome to the FastCommerce Data Engineering Challenge!
You'll be building a data pipeline for an e-commerce company that processes orders, manages inventory, and generates analytics. This challenge should take 4-6 hours to complete.
- Sample Data: Representative datasets in `Evaluation-Files/`
- Challenge Instructions: Detailed requirements in `challenge_instructions.md`
- Evaluation Rubric: Scoring criteria in `evaluation_rubric.md`
- Data Generator: Script to create additional test data if needed
Design and implement a data pipeline solution using your preferred technology stack. You have complete freedom in:
- Architecture design
- Technology choices (Python, Node.js, SQL databases, etc.)
- Folder structure and organization
- Implementation approach
- Click the "Fork" button at the top of this GitHub repository
- Clone your forked repository to your local machine:

  ```bash
  git clone https://github.com/YOUR_USERNAME/lifi-data-engineer-test.git
  cd lifi-data-engineer-test
  ```

- Create a new branch for your solution:

  ```bash
  git checkout -b solution
  ```
- Read through `challenge_instructions.md` carefully
- Explore the sample data to understand the structure
- Design your solution architecture
- Implement your pipeline
- Document your approach and decisions
- Commit your work to your solution branch:

  ```bash
  git add .
  git commit -m "Complete data engineering challenge"
  git push origin solution
  ```

- Create a pull request from your `solution` branch to your `main` branch
- Share the link to your forked repository for evaluation
- 1,000 orders spanning 30 days (your solution should be designed to scale to 10K+ orders/day)
- 50 products with full catalog information
- 50 inventory records across multiple warehouses
- Multiple channels: web, mobile, API
- Various order statuses: confirmed, pending, cancelled
```
Evaluation-Files/
├── orders_stream.jsonl       # 1,000 orders in JSONL format (~372KB)
├── inventory_updates.csv     # 50 inventory records (~4KB)
├── product_catalog.json      # 50 products with metadata (~8KB)
├── broken_pipeline_code.py   # Bonus debugging challenge
├── data_generator_js.js      # Generate additional test data
└── package_json.json         # Node.js dependencies for generator
```
Sample order record (`orders_stream.jsonl`):

```json
{
  "order_id": "ORD-123456",
  "customer_id": "CUST-789",
  "timestamp": "2024-01-15T10:30:00Z",
  "channel": "web",
  "items": [
    {
      "product_id": "PROD-001",
      "quantity": 2,
      "unit_price": 29.99
    }
  ],
  "shipping_address": {
    "country": "US",
    "state": "CA",
    "city": "San Francisco"
  },
  "status": "confirmed"
}
```

Sample inventory updates (`inventory_updates.csv`):

```csv
product_id,available_quantity,warehouse_location,last_updated
PROD-001,150,WH-SF,2024-01-15T09:00:00Z
PROD-002,75,WH-NY,2024-01-15T08:45:00Z
```

Sample catalog entry (`product_catalog.json`):

```json
[
  {
    "product_id": "PROD-001",
    "category": "Electronics",
    "brand": "TechCorp",
    "price": 299.99,
    "launch_date": "2023-06-15T00:00:00.000Z",
    "description": "High-performance wireless headphones"
  }
]
```
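To get oriented, here is a minimal sketch of loading the three sample files. It assumes Python with pandas installed; file names match the tree above, and the layout is otherwise up to you:

```python
import json
from pathlib import Path

import pandas as pd

DATA_DIR = Path("Evaluation-Files")

# Orders: one JSON object per line (JSONL).
orders = pd.read_json(DATA_DIR / "orders_stream.jsonl", lines=True)

# Inventory: plain CSV with ISO-8601 timestamps.
inventory = pd.read_csv(
    DATA_DIR / "inventory_updates.csv", parse_dates=["last_updated"]
)

# Product catalog: a single JSON array of product objects.
catalog = json.loads((DATA_DIR / "product_catalog.json").read_text())
```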
- Ingest streaming orders and batch inventory updates
- Enrich orders with product catalog data
- Handle schema evolution and data validation
- Ensure exactly-once processing
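One possible reading of the enrichment and exactly-once requirements above, as a framework-free Python sketch. The in-memory `seen_order_ids` set is a stand-in for whatever durable idempotency store your design uses:

```python
from typing import Iterable


def enrich_orders(orders: Iterable[dict], catalog: list[dict]) -> list[dict]:
    """Join each order line item to its catalog entry; skip replayed orders."""
    catalog_by_id = {p["product_id"]: p for p in catalog}
    seen_order_ids: set[str] = set()  # stand-in for a durable dedup store
    enriched = []
    for order in orders:
        if order["order_id"] in seen_order_ids:
            continue  # already processed -- at most once per order_id
        seen_order_ids.add(order["order_id"])
        for item in order.get("items", []):
            product = catalog_by_id.get(item["product_id"], {})
            enriched.append({
                "order_id": order["order_id"],
                "customer_id": order["customer_id"],
                "timestamp": order["timestamp"],
                "channel": order["channel"],
                "status": order["status"],
                "country": order["shipping_address"]["country"],
                "state": order["shipping_address"]["state"],
                "product_id": item["product_id"],
                "quantity": item["quantity"],
                "unit_price": item["unit_price"],
                "category": product.get("category"),
                "brand": product.get("brand"),
            })
    return enriched
```

In a real pipeline the seen-ID set would live in durable storage (a database key or idempotency table) so replays across restarts stay deduplicated.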
- Implement data completeness and business rule validation
- Build anomaly detection for unusual patterns
- Create pipeline health monitoring
- Set up alerting for data quality issues
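As an illustration of the kind of checks intended here, a small Python sketch. The specific rules and the z-score threshold are illustrative assumptions, not part of the spec:

```python
import statistics

VALID_STATUSES = {"confirmed", "pending", "cancelled"}


def validate_order(order: dict) -> list[str]:
    """Return a list of rule violations for one order (empty means valid)."""
    errors = []
    for field in ("order_id", "customer_id", "timestamp", "items", "status"):
        if field not in order:
            errors.append(f"missing field: {field}")
    if order.get("status") not in VALID_STATUSES:
        errors.append(f"unknown status: {order.get('status')!r}")
    for item in order.get("items", []):
        if item.get("quantity", 0) <= 0:
            errors.append(f"non-positive quantity for {item.get('product_id')}")
        if item.get("unit_price", 0) < 0:
            errors.append(f"negative unit_price for {item.get('product_id')}")
    return errors


def flag_anomalous_days(daily_counts: dict[str, int], z_threshold: float = 3.0) -> list[str]:
    """Flag days whose order volume deviates from the mean by more than z_threshold std devs."""
    counts = list(daily_counts.values())
    if len(counts) < 2:
        return []
    mean, stdev = statistics.mean(counts), statistics.stdev(counts)
    if stdev == 0:
        return []
    return [day for day, n in daily_counts.items() if abs(n - mean) / stdev > z_threshold]
```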
- Daily revenue by channel and region
- Top products and customer insights
- Inventory analysis and low-stock alerts
- Operational metrics and performance tracking
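For example, the first metric could be derived from enriched line items with pandas; the column names follow the earlier sketches and are assumptions rather than a required schema:

```python
import pandas as pd


def daily_revenue(enriched: list[dict]) -> pd.DataFrame:
    """Daily revenue by channel and country, counting only confirmed orders."""
    df = pd.DataFrame(enriched)
    df = df[df["status"] == "confirmed"].copy()
    df["order_date"] = pd.to_datetime(df["timestamp"]).dt.date
    df["revenue"] = df["quantity"] * df["unit_price"]
    return (
        df.groupby(["order_date", "channel", "country"], as_index=False)["revenue"]
          .sum()
          .sort_values(["order_date", "revenue"], ascending=[True, False])
    )
```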
- Design for 10x scalability
- Address deployment and security concerns
- Plan for failure recovery and monitoring
- Document operational procedures
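As one small illustration of failure recovery, a file-based checkpoint that a batch run could resume from; this is a simplistic stand-in for whatever durable state store fits your design:

```python
import json
from pathlib import Path

CHECKPOINT = Path("pipeline_checkpoint.json")


def load_checkpoint() -> dict:
    """Resume from the last committed position; start fresh if no checkpoint exists."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    return {"last_processed_offset": 0}


def save_checkpoint(state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write cannot corrupt the checkpoint."""
    tmp = CHECKPOINT.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(CHECKPOINT)
```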
- Debug and fix `broken_pipeline_code.py`
- Improve code quality and add error handling
- Document all issues found and fixes made
- Fork this repository and work on your `solution` branch
- Commit all your work with clear commit messages
- Create a pull request from `solution` to `main` in your fork
- Share your forked repository URL for evaluation
- Ensure your solution runs and includes clear setup instructions
- Working Code: Complete pipeline implementation
- Documentation: Architecture overview and setup instructions
- Analysis: Performance considerations and design decisions
- Tests: Validation of your solution
- README: Clear instructions on how to run your solution
- Technical Excellence: Clean, scalable, maintainable code
- Problem Solving: Complete solution addressing all requirements
- Communication: Clear documentation and decision justification
- Production Mindset: Considerations for scale, monitoring, and reliability
- 1 hour: Requirements analysis and architecture design
- 2-3 hours: Core pipeline implementation
- 1 hour: Data quality and monitoring
- 1 hour: Analytics queries and documentation
If anything is unclear, make reasonable assumptions and document them. We're interested in seeing your thought process and engineering judgment.
Good luck! 🚀
This challenge tests real-world data engineering skills including pipeline design, data quality, analytics, and production readiness.