Machine Learning Approach (Primary Solution)
We reverse-engineered the legacy system with a decision-tree model and interview-driven feature engineering, cutting the score by 95% relative to the naive formula:
- Initial Score: 93,192 (naive formula)
- Final Score: 4,866 (95% improvement)
- Average Error: Reduced from $930 to $47.66
Key Breakthroughs:
- Trip Categorization: Implemented Kevin's "6 calculation paths" theory using one-hot encoded trip categories
- Two-Model Architecture: Separate specialized models for normal vs high-receipt cases (>$1400 threshold)
- Advanced Feature Engineering: Captured complex business rules from interviews in ML features
- Decision Tree Regressor: With tuned hyperparameters (max_depth=12, min_samples_leaf=3)
Component-Based KISS Approach (Previous Solution)
Our initial interpretable approach, which laid the foundation for the ML model:
- Score: 25,601 (73% improvement over naive)
- Average Error: $255
Key Discoveries:
- 5-Day Trip Penalty: Major breakthrough - 5-day trips have a consistent -$400 penalty
- Complex Receipt Processing: Receipt treatment varies dramatically by trip length:
  - 1-day trips: Heavy penalty (-0.5x for normal amounts)
  - 5-day trips: Full reimbursement (1.04x)
  - 8+ day trips: Severe penalty (0.02x)
- Tiered Mileage System: First 100 miles at $0.75/mile, remainder at $0.50/mile
- Efficiency Bonuses: Sweet spot of 180-220 miles/day earns $50 bonus
- Receipt Amount Caps: $2000+ receipts always get 0.25x multiplier
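The rules listed above can be written as small, independent functions. This is a minimal sketch, not the project's actual code: the function names are hypothetical, the sweet-spot bounds are assumed to be inclusive, the $2000+ cap is assumed to take precedence over trip length (the list says "always"), and multipliers for trip lengths the list does not mention are left undefined rather than guessed. How the components are combined into a final amount is not stated here, so the sketch stops short of that.

```python
from typing import Optional


def tiered_mileage(miles: float) -> float:
    """First 100 miles at $0.75/mile, remainder at $0.50/mile."""
    first_tier = min(miles, 100)
    remainder = max(miles - 100, 0)
    return first_tier * 0.75 + remainder * 0.50


def efficiency_bonus(days: int, miles: float) -> float:
    """$50 bonus when miles/day is in the 180-220 sweet spot.

    Assumes days >= 1 and inclusive bounds (the bounds are an assumption).
    """
    miles_per_day = miles / days
    return 50.0 if 180 <= miles_per_day <= 220 else 0.0


def five_day_penalty(days: int) -> float:
    """Consistent -$400 adjustment observed on 5-day trips."""
    return -400.0 if days == 5 else 0.0


def receipt_multiplier(days: int, receipts: float) -> Optional[float]:
    """Multiplier applied to receipts, per the documented cases only.

    Returns None for trip lengths the list does not cover.
    """
    if receipts >= 2000:  # assumed to override the trip-length rules
        return 0.25
    if days == 1:
        return -0.5  # documented for "normal amounts" only
    if days == 5:
        return 1.04
    if days >= 8:
        return 0.02
    return None


print(tiered_mileage(250))        # 100 * 0.75 + 150 * 0.50
print(efficiency_bonus(5, 1000))  # 200 miles/day is inside the sweet spot
```

Keeping each rule in its own function makes it easy to toggle one component at a time and re-run the evaluation, which mirrors the step-by-step progression in the results table.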
| Approach | Score | Average Error | Close Matches (±$1) |
|---|---|---|---|
| Naive Formula | 93,192 | $930+ | 0 |
| Basic Components | 85,358 | $852 | 0 |
| + 5-Day Penalty | 46,037 | $459 | 0 |
| + Receipt Ratios | 26,000 | $259 | 0 |
| KISS Final | 25,601 | $255 | 2 |
| ML Breakthrough | 4,866 | $47.66 | 20 |
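The improvement percentages quoted for each approach follow directly from the scores in the table. As a quick check, the sketch below recomputes them relative to the naive score; the helper name `improvement_pct` is hypothetical and only illustrates the arithmetic.

```python
NAIVE_SCORE = 93_192  # "Naive Formula" row of the table


def improvement_pct(score: float, baseline: float = NAIVE_SCORE) -> float:
    """Percentage reduction in score relative to the baseline (lower score is better)."""
    return (baseline - score) / baseline * 100


# Scores copied from the table above.
scores = {
    "Basic Components": 85_358,
    "+ 5-Day Penalty": 46_037,
    "+ Receipt Ratios": 26_000,
    "KISS Final": 25_601,
    "ML Breakthrough": 4_866,
}

for name, score in scores.items():
    print(f"{name}: {improvement_pct(score):.1f}% improvement")
```

For the KISS and ML rows this gives roughly 72.5% and 94.8%, which matches the rounded 73% and 95% figures quoted earlier.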
Machine Learning Model Features:
- Trip categories: quick_trip_high_miles, long_haul, low_efficiency, sweet_spot_efficiency, balanced
- 3-tier mileage system with explicit tiers
- Receipt multipliers based on trip length and amount
- Efficiency bonuses for optimal miles/day ratios
- Interaction features (days × miles)
- Separate models for normal vs outlier cases
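To make the feature list above concrete, here is a rough sketch of how a single case might be turned into a feature dictionary. It is illustrative only: the function and key names are hypothetical, the rules that assign a trip to a category are not given in this document (so the category is passed in rather than derived), and the mileage tiers and receipt multipliers are omitted for the same reason.

```python
TRIP_CATEGORIES = [
    "quick_trip_high_miles",
    "long_haul",
    "low_efficiency",
    "sweet_spot_efficiency",
    "balanced",
]


def one_hot_category(category: str) -> dict:
    """One-hot encode a trip category as 0/1 indicator features."""
    if category not in TRIP_CATEGORIES:
        raise ValueError(f"unknown trip category: {category!r}")
    return {f"cat_{name}": int(name == category) for name in TRIP_CATEGORIES}


def engineer_features(days: int, miles: float, receipts: float, category: str) -> dict:
    """Build a flat feature dict for one trip (assumes days >= 1)."""
    features = {
        "days": days,
        "miles": miles,
        "receipts": receipts,
        "miles_per_day": miles / days,          # basis for efficiency features
        "days_x_miles": days * miles,           # interaction feature
        "is_high_receipt": int(receipts > 1400),  # routes to the outlier model
    }
    features.update(one_hot_category(category))
    return features


example = engineer_features(5, 250, 150.75, "balanced")
print(example["days_x_miles"], example["cat_balanced"])
```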
Implementation:

    # Two-model architecture
    if receipts > 1400:
        model = outlier_model  # Specialized for high-receipt cases
    else:
        model = main_model  # Optimized for standard cases
    reimbursement = model.predict(engineered_features)

- Immediate Target: Achieve first exact matches, score <1,000
- Advanced Target: Score <500 with refined outlier handling
- Ultimate Goal: Multiple exact matches with score <100
Reverse-engineer a 60-year-old travel reimbursement system using only historical data and employee interviews.
ACME Corp's legacy reimbursement system has been running for 60 years. No one knows how it works, but it's still used daily.
8090 has built them a new system, but ACME Corp is confused by the differences in results. Your mission is to figure out the original business logic so we can explain why ours is different and better.
Your job: create a perfect replica of the legacy system by reverse-engineering its behavior from 1,000 historical input/output examples and employee interviews.
The system takes three inputs:
- trip_duration_days - Number of days spent traveling (integer)
- miles_traveled - Total miles traveled (integer)
- total_receipts_amount - Total dollar amount of receipts (float)
- A PRD (Product Requirements Document)
- Employee interviews with system hints
- Output: a single numeric reimbursement amount (float, rounded to 2 decimal places)
- public_cases.json - 1,000 historical input/output examples
top-coder-challenge/
├── models/ # Machine learning models (.joblib files)
├── scripts/ # Analysis and calculation scripts
├── docs/ # Development documentation and logs
├── public_cases.json # Historical input/output examples
├── private_cases.json # Private test cases
├── PRD.md # Product Requirements Document
├── INTERVIEWS.md # Employee interviews with system hints
├── run.sh # Main execution script
└── eval.sh # Testing script
- Analyze the data:
  - Look at public_cases.json to understand patterns
  - Look at PRD.md to understand the business problem
  - Look at INTERVIEWS.md to understand the business logic
- Create your implementation:
  - Copy run.sh.template to run.sh
  - Implement your calculation logic
  - Make sure it outputs just the reimbursement amount
- Test your solution:
  - Run ./eval.sh to see how you're doing
  - Use the feedback to improve your algorithm
- Submit:
  - Run ./generate_results.sh to get your final results.
  - Add arjun-krishna1 to your repo.
  - Complete the submission form.
Your run.sh script must:
- Take exactly 3 parameters: trip_duration_days, miles_traveled, total_receipts_amount
- Output a single number (the reimbursement amount)
- Run in under 5 seconds per test case
- Work without external dependencies (no network calls, databases, etc.)
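One way to meet these requirements is to have run.sh hand its three arguments to a small Python script. The sketch below shows only the argument handling and output formatting; all names are hypothetical, and `calculate_reimbursement` is a placeholder that returns 0.0 until real logic is plugged in.

```python
import sys


def parse_inputs(argv):
    """Validate and convert the three positional arguments."""
    if len(argv) != 3:
        raise ValueError(
            "expected 3 arguments: trip_duration_days miles_traveled total_receipts_amount"
        )
    days = int(argv[0])
    miles = float(argv[1])  # float() also accepts integer strings such as "250"
    receipts = float(argv[2])
    return days, miles, receipts


def calculate_reimbursement(days, miles, receipts):
    """Placeholder only: replace with the actual calculation logic."""
    return 0.0


def format_amount(amount):
    """Render the amount as a bare number with two decimal places."""
    return f"{amount:.2f}"


def main(argv):
    days, miles, receipts = parse_inputs(argv)
    # Print only the number, with no labels or currency symbols.
    print(format_amount(calculate_reimbursement(days, miles, receipts)))


# Demo call; a real entry point would pass sys.argv[1:] instead.
main(["5", "250", "150.75"])  # prints 0.00 with the placeholder
```

Printing nothing but the formatted number matters because the evaluation script reads the output as the reimbursement amount.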
Example:

    ./run.sh 5 250 150.75
    # Should output something like: 487.25

Run ./eval.sh to test your solution against all 1,000 cases. The script will show:
- Exact matches: Cases within ±$0.01 of the expected output
- Close matches: Cases within ±$1.00 of the expected output
- Average error: Mean absolute difference from expected outputs
- Score: Lower is better (combines accuracy and precision)
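The first three metrics above are simple to reproduce locally. The following sketch computes them from (expected, actual) pairs using the thresholds stated in the list; it is not the code inside eval.sh, the function name is hypothetical, and the score is left out because its exact formula is not given here.

```python
def summarize_errors(pairs):
    """Compute match counts and mean absolute error.

    pairs: iterable of (expected, actual) reimbursement amounts.
    """
    errors = [abs(expected - actual) for expected, actual in pairs]
    if not errors:
        raise ValueError("no cases to evaluate")
    eps = 1e-9  # small tolerance so float rounding does not drop boundary cases
    return {
        "exact_matches": sum(err <= 0.01 + eps for err in errors),
        "close_matches": sum(err <= 1.00 + eps for err in errors),
        "average_error": sum(errors) / len(errors),
    }


# Made-up illustration values, not real cases.
demo = [(100.0, 100.0), (100.0, 100.5), (100.0, 110.0)]
print(summarize_errors(demo))
```

Note that under these definitions every exact match also counts as a close match.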
Your submission will be tested against private_cases.json, which does not include the expected outputs.
When you're ready to submit:
- Push your solution to a GitHub repository
- Add arjun-krishna1 to your repository
- Submit via the submission form.
- When you submit the form you will submit your private_results.txt which will be used for your final score.
Good luck and Bon Voyage!