The primary analytics objective of this project is to develop useful sales leads from the public FMCSA motor carrier data.
This project also serves as a pilot demonstration of using the following tools/processes for managing the data science workflow:
- Data Science Lifecycle Process
- GitHub Projects (i.e. Kanban board)
- GitHub Actions Workflows - Automatic Containerization & Deployment
This repo leverages the Data Science Lifecycle Process, a framework for managing the data science lifecycle within a git code repository. See the full documentation about this process HERE.
├── .cloud # for storing cloud configuration files and templates (e.g. ARM, Terraform, etc)
├── .github
│ ├── ISSUE_TEMPLATE
│ │ ├── Ask.md
│ │ ├── Data.Aquisition.md
│ │ ├── Data.Create.md
│ │ ├── Experiment.md
│ │ ├── Explore.md
│ │ └── Model.md
│ ├── labels.yaml
│ └── workflows
├── .gitignore
├── README.md
├── code
│ ├── datasets # code for creating or getting datasets
│ ├── deployment # code for deploying models
│ ├── features # code for creating features
│ └── models # code for building and training models
├── data # for consistent data placement; contents are gitignored by default
│ ├── README.md
│ ├── interim # storing intermediate results (mostly for debugging)
│ ├── processed # storing transformed data used for reporting, modeling, etc
│ └── raw # storing raw data to use as inputs to rest of pipeline
├── docs
│ ├── code # documenting everything in the code directory (could be sphinx project for example)
│ ├── data # documenting datasets, data profiles, behaviors, column definitions, etc
│ ├── media # storing images, videos, etc, needed for docs.
│ ├── references # for collecting and documenting external resources relevant to the project
│ └── solution_architecture.md # describe and diagram solution design and architecture
├── environments
├── notebooks
├── pipelines # for pipeline orchestrators, e.g. AzureML Pipelines, Airflow, Luigi (Airflow sketch below the tree)
├── setup.py # if using Python, for making the packages inside code installable (minimal sketch below the tree)
└── tests # for testing your code, data, and outputs (data-validation sketch below the tree)
├── data_validation
└── unit
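The `pipelines` directory is where orchestrator definitions would live. As a non-authoritative sketch only (this repo does not mandate Airflow, and the DAG id, schedule, and placeholder callables below are assumptions rather than this project's actual pipeline):

```python
# pipelines/fmcsa_leads_dag.py -- hedged Airflow 2.x sketch; dag_id, schedule,
# and the placeholder callables are assumptions, not this project's pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def download_fmcsa_data():
    """Placeholder: pull the public FMCSA carrier extract into data/raw."""


def build_features():
    """Placeholder: transform data/raw into features under data/processed."""


with DAG(
    dag_id="fmcsa_sales_leads",       # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",      # refresh cadence is an assumption
    catchup=False,
) as dag:
    get_data = PythonOperator(task_id="get_fmcsa_data", python_callable=download_fmcsa_data)
    features = PythonOperator(task_id="build_features", python_callable=build_features)

    get_data >> features  # pull raw data before building features
```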
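The `setup.py` entry above makes the packages under `code/` pip-installable. A minimal sketch, assuming `code/` holds the packages in a src-style layout (the name and version below are placeholders, not this project's actual metadata):

```python
# setup.py -- minimal sketch; name and version are placeholders. Each package
# under code/ (datasets, features, models, deployment) needs an __init__.py.
from setuptools import find_packages, setup

setup(
    name="fmcsa_sales_leads",              # placeholder package name
    version="0.1.0",
    packages=find_packages(where="code"),  # discover packages inside code/
    package_dir={"": "code"},              # map the root package dir to code/
)
```

With a layout like this, `pip install -e .` makes the packages inside `code/` importable from notebooks and tests.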
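Data checks under `tests/data_validation` can run through pytest alongside the unit tests. A sketch under stated assumptions (the processed file path and the `dot_number` column are hypothetical stand-ins for the real FMCSA schema):

```python
# tests/data_validation/test_carrier_data.py -- sketch only; the path and
# column name are assumptions about the FMCSA extract, not the actual schema.
from pathlib import Path

import pandas as pd
import pytest

PROCESSED = Path("data/processed/carriers.csv")  # hypothetical output file


@pytest.mark.skipif(not PROCESSED.exists(), reason="processed data not present")
def test_carriers_have_unique_dot_numbers():
    df = pd.read_csv(PROCESSED)
    assert "dot_number" in df.columns      # hypothetical identifier column
    assert df["dot_number"].notna().all()  # no missing identifiers
    assert df["dot_number"].is_unique      # one row per carrier
```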