Top Coder Challenge: Black Box Legacy Reimbursement System

🚀 Our Solution Overview

Approaches Implemented

Machine Learning Approach (Primary Solution)

We reverse-engineered the legacy system with a decision tree regressor trained on engineered features, cutting the score by about 95% relative to the naive formula:

Initial Score: 93,192 (naive formula)
Final Score: 4,866 (95% improvement)
Average Error: Reduced from $930 to $47.66

Key Breakthroughs:

Trip Categorization: Implemented Kevin's "6 calculation paths" theory using one-hot encoded trip categories
Two-Model Architecture: Separate specialized models for normal vs high-receipt cases (>$1400 threshold)
Advanced Feature Engineering: Captured complex business rules from interviews in ML features
Decision Tree Regressor: With tuned hyperparameters (max_depth=12, min_samples_leaf=3)

Component-Based KISS Approach (Previous Solution)

Our initial, interpretable approach, which laid the foundation for the ML model:

Score: 25,601 (73% improvement over naive)
Average Error: $255

Key Discoveries:

5-Day Trip Penalty: The major breakthrough. 5-day trips consistently receive a $400 deduction
Complex Receipt Processing: Receipt treatment varies dramatically by trip length:
- 1-day trips: Heavy penalty (-0.5x for normal amounts)
- 5-day trips: Full reimbursement (1.04x)
- 8+ day trips: Severe penalty (0.02x)
Tiered Mileage System: First 100 miles at $0.75/mile, remainder at $0.50/mile
Efficiency Bonuses: Sweet spot of 180-220 miles/day earns $50 bonus
Receipt Amount Caps: $2000+ receipts always get 0.25x multiplier
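
A minimal sketch of how the discoveries above might be written as separate components. The function names are illustrative and this is not the repository's code. Only figures stated in the list are used. Whether the range boundaries are inclusive, and whether the $2000+ cap takes precedence over trip length, are assumptions.

```python
from typing import Optional


def mileage_component(miles: float) -> float:
    """First 100 miles at $0.75/mile, the remainder at $0.50/mile."""
    first_tier = min(miles, 100)
    remainder = max(miles - 100, 0)
    return first_tier * 0.75 + remainder * 0.50


def efficiency_bonus(days: int, miles: float) -> float:
    """$50 bonus when miles/day falls in the 180-220 sweet spot (assumed inclusive)."""
    if days <= 0:
        return 0.0
    miles_per_day = miles / days
    return 50.0 if 180 <= miles_per_day <= 220 else 0.0


def five_day_adjustment(days: int) -> float:
    """Flat $400 deduction applied to 5-day trips."""
    return -400.0 if days == 5 else 0.0


def receipt_multiplier(days: int, receipts: float) -> Optional[float]:
    """Multiplier applied to the receipt total, where the write-up states one."""
    if receipts >= 2000:   # assumed to override the trip-length rules
        return 0.25
    if days == 1:
        return -0.5
    if days == 5:
        return 1.04
    if days >= 8:
        return 0.02
    return None            # other trip lengths are not given in the text
```

For example, `mileage_component(250)` gives 100 × 0.75 + 150 × 0.50 = 150.0.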

Performance Comparison

Approach	Score	Average Error	Close Matches (±$1)
Naive Formula	93,192	$930+	0
Basic Components	85,358	$852	0
+ 5-Day Penalty	46,037	$459	0
+ Receipt Ratios	26,000	$259	0
KISS Final	25,601	$255	2
ML Breakthrough	4,866	$47.66	20
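
The improvement percentages quoted for each approach follow from the scores in this table as a relative reduction from the naive baseline. A small sketch of that arithmetic (the function name is illustrative):

```python
def improvement_pct(baseline_score: float, new_score: float) -> float:
    """Relative score reduction versus the baseline, in percent."""
    return (baseline_score - new_score) / baseline_score * 100


naive = 93_192
print(round(improvement_pct(naive, 25_601), 1))  # KISS Final
print(round(improvement_pct(naive, 4_866), 1))   # ML Breakthrough
```

These come out at roughly 72.5% and 94.8%, which the overview rounds to 73% and 95%.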

Current Architecture

Machine Learning Model Features:

Trip categories: quick_trip_high_miles, long_haul, low_efficiency, sweet_spot_efficiency, balanced
3-tier mileage system with explicit tiers
Receipt multipliers based on trip length and amount
Efficiency bonuses for optimal miles/day ratios
Interaction features (days × miles)
Separate models for normal vs outlier cases
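
As a rough illustration of the feature list above, the sketch below assigns each trip to one of the named categories and one-hot encodes it alongside an interaction feature. The category names come from the list. Every cut-off except the 180-220 miles/day sweet spot is hypothetical, as are the function and key names.

```python
CATEGORIES = [
    "quick_trip_high_miles",
    "long_haul",
    "low_efficiency",
    "sweet_spot_efficiency",
    "balanced",
]


def trip_category(days: int, miles: float) -> str:
    """Pick one category per trip; thresholds are placeholders, not tuned values."""
    miles_per_day = miles / days if days > 0 else 0.0
    if days <= 2 and miles_per_day > 220:    # hypothetical cut-off
        return "quick_trip_high_miles"
    if days >= 8:                            # hypothetical cut-off
        return "long_haul"
    if miles_per_day < 50:                   # hypothetical cut-off
        return "low_efficiency"
    if 180 <= miles_per_day <= 220:          # sweet spot from the KISS findings
        return "sweet_spot_efficiency"
    return "balanced"


def build_features(days: int, miles: float, receipts: float) -> dict:
    """Raw inputs, a days x miles interaction, and one-hot category flags."""
    category = trip_category(days, miles)
    features = {
        "days": days,
        "miles": miles,
        "receipts": receipts,
        "days_x_miles": days * miles,
    }
    for name in CATEGORIES:
        features[f"cat_{name}"] = 1 if name == category else 0
    return features
```

Exactly one `cat_*` flag is set per trip, so a tree-based model can split on the category directly.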

Implementation:

# Two-model architecture
if receipts > 1400:
    model = outlier_model  # Specialized for high-receipt cases
else:
    model = main_model     # Optimized for standard cases

reimbursement = model.predict(engineered_features)
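
A self-contained version of the routing above might look like the following sketch. It uses a stand-in `ConstantModel` so it runs without any dependencies. In the real project the two models would presumably be fitted regressors loaded from the .joblib files in models/. The helper names are hypothetical. One detail worth noting is that scikit-learn style `predict` methods take a 2-D input and return one value per row, so a single case is wrapped in a list and the first result is taken.

```python
from typing import Protocol, Sequence


class Regressor(Protocol):
    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


class ConstantModel:
    """Stand-in for a fitted regressor: returns the same value for every row."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]:
        return [self.value for _ in rows]


HIGH_RECEIPT_THRESHOLD = 1400.0  # dollars, from the two-model description


def predict_reimbursement(
    features: Sequence[float],
    receipts: float,
    main_model: Regressor,
    outlier_model: Regressor,
) -> float:
    """Route on the raw receipt total, then predict for a single case."""
    model = outlier_model if receipts > HIGH_RECEIPT_THRESHOLD else main_model
    prediction = model.predict([list(features)])[0]  # one row in, one value out
    return round(float(prediction), 2)
```

A receipt total of exactly $1400 goes to the main model, matching the strict `>` comparison in the snippet above.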

Next Optimization Goals

Immediate Target: Achieve first exact matches, score <1,000
Advanced Target: Score <500 with refined outlier handling
Ultimate Goal: Multiple exact matches with score <100

Reverse-engineer a 60-year-old travel reimbursement system using only historical data and employee interviews.

ACME Corp's legacy reimbursement system has been running for 60 years. No one knows how it works, but it's still used daily.

8090 has built them a new system, but ACME Corp is confused by the differences in results. Your mission is to figure out the original business logic so we can explain why ours is different and better.

Your job: create a perfect replica of the legacy system by reverse-engineering its behavior from 1,000 historical input/output examples and employee interviews.

What You Have

Input Parameters

The system takes three inputs:

trip_duration_days - Number of days spent traveling (integer)
miles_traveled - Total miles traveled (integer)
total_receipts_amount - Total dollar amount of receipts (float)

Documentation

A PRD (Product Requirements Document)
Employee interviews with system hints

Output

Single numeric reimbursement amount (float, rounded to 2 decimal places)

Historical Data

public_cases.json - 1,000 historical input/output examples
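
A short sketch for loading the examples into typed records. The file's schema is not described here, so the `input` and `expected_output` keys are an assumption to check against the actual file. The three input names are the ones listed under Input Parameters. Mileage is read as a float to be tolerant of non-integer values.

```python
import json
from pathlib import Path
from typing import List, NamedTuple


class Case(NamedTuple):
    days: int
    miles: float
    receipts: float
    expected: float


def load_cases(path: str) -> List[Case]:
    """Read a JSON list of cases; key names are assumed, not documented."""
    raw = json.loads(Path(path).read_text())
    cases = []
    for item in raw:
        inputs = item["input"]  # assumed key
        cases.append(
            Case(
                days=int(inputs["trip_duration_days"]),
                miles=float(inputs["miles_traveled"]),
                receipts=float(inputs["total_receipts_amount"]),
                expected=float(item["expected_output"]),  # assumed key
            )
        )
    return cases
```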

Project Structure

top-coder-challenge/
├── models/           # Machine learning models (.joblib files)
├── scripts/          # Analysis and calculation scripts
├── docs/             # Development documentation and logs
├── public_cases.json # Historical input/output examples
├── private_cases.json # Private test cases
├── PRD.md           # Product Requirements Document
├── INTERVIEWS.md    # Employee interviews with system hints
├── run.sh           # Main execution script
└── eval.sh          # Testing script

Getting Started

1. Analyze the data:
   - Look at public_cases.json to understand patterns
   - Look at PRD.md to understand the business problem
   - Look at INTERVIEWS.md to understand the business logic
2. Create your implementation:
   - Copy run.sh.template to run.sh
   - Implement your calculation logic
   - Make sure it outputs just the reimbursement amount
3. Test your solution:
   - Run ./eval.sh to see how you're doing
   - Use the feedback to improve your algorithm
4. Submit:
   - Run ./generate_results.sh to get your final results.
   - Add arjun-krishna1 to your repo.
   - Complete the submission form.

Implementation Requirements

Your run.sh script must:

Take exactly 3 parameters: trip_duration_days, miles_traveled, total_receipts_amount
Output a single number (the reimbursement amount)
Run in under 5 seconds per test case
Work without external dependencies (no network calls, databases, etc.)

Example:

./run.sh 5 250 150.75
# Should output something like: 487.25
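
If run.sh delegates to Python, the argument handling and output formatting could look like this sketch. The function names are hypothetical and the calculation itself is passed in as a callable, since the real logic lives elsewhere. The point is only to show the three-argument contract and the two-decimal output.

```python
from typing import Callable, Sequence, Tuple


def parse_args(argv: Sequence[str]) -> Tuple[int, float, float]:
    """Validate and convert the three positional arguments."""
    if len(argv) != 3:
        raise ValueError(
            "expected: trip_duration_days miles_traveled total_receipts_amount"
        )
    return int(argv[0]), float(argv[1]), float(argv[2])


def format_amount(amount: float) -> str:
    """Render the reimbursement as a bare number with 2 decimal places."""
    return f"{amount:.2f}"


def main(
    argv: Sequence[str],
    calculate: Callable[[int, float, float], float],
) -> None:
    """Print only the reimbursement amount, with no labels or extra text."""
    days, miles, receipts = parse_args(argv)
    print(format_amount(calculate(days, miles, receipts)))
```

A script would call `main(sys.argv[1:], your_calculation)`, and run.sh would forward its arguments to that script unchanged.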

Evaluation

Run ./eval.sh to test your solution against all 1,000 cases. The script will show:

Exact matches: Cases within ±$0.01 of the expected output
Close matches: Cases within ±$1.00 of the expected output
Average error: Mean absolute difference from expected outputs
Score: Lower is better (combines accuracy and precision)
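
The first three metrics can be reproduced outside eval.sh with a few lines, which is handy when iterating in a notebook. This is a sketch rather than the script's actual logic. It assumes the tolerances are inclusive and that exact matches also count as close matches, and it leaves out the score because its formula is not given here.

```python
from typing import Dict, Sequence

EPS = 1e-9  # guards the tolerance checks against float rounding


def evaluate(predictions: Sequence[float], expected: Sequence[float]) -> Dict[str, float]:
    """Count exact and close matches and compute the mean absolute error."""
    if len(predictions) != len(expected) or not expected:
        raise ValueError("predictions and expected must be non-empty and equal length")
    errors = [abs(p - e) for p, e in zip(predictions, expected)]
    return {
        "exact_matches": sum(err <= 0.01 + EPS for err in errors),
        "close_matches": sum(err <= 1.00 + EPS for err in errors),
        "average_error": sum(errors) / len(errors),
    }
```

For instance, predictions of 100.0, 200.5 and 310.0 against expected values of 100.0, 200.0 and 300.0 give one exact match, two close matches, and an average error of 3.5.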

Your submission will be tested against private_cases.json, which does not include the expected outputs.

Submission

When you're ready to submit:

Push your solution to a GitHub repository
Add arjun-krishna1 to your repository
Submit via the submission form.
The form asks you to upload your private_results.txt, which determines your final score.

Good luck and Bon Voyage!
