Extraction and analysis of data from the Federal Motor Carrier Safety Administration.

jdegregorio/fmcsa-analysis

Federal Motor Carrier Safety Administration (FMCSA) Analysis

Overview

The primary analytics objective of this project is to develop useful sales leads from the public FMCSA motor carrier data.

This project also serves as a pilot demonstration of using the following tools/processes for managing the data science workflow:

  • Data Science Lifecycle Process
  • GitHub Projects (i.e., Kanban board)
  • GitHub workflows for automatic containerization and deployment

Data Science Lifecycle Process (DSLP)

This repo leverages the Data Science Lifecycle Process (DSLP), a framework for managing the data science lifecycle within a git code repository. See the DSLP documentation for full details about this process.

Project Directory Structure

├── .cloud              # for storing cloud configuration files and templates (e.g. ARM, Terraform, etc)
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── Ask.md
│   │   ├── Data.Aquisition.md
│   │   ├── Data.Create.md
│   │   ├── Experiment.md
│   │   ├── Explore.md
│   │   └── Model.md
│   ├── labels.yaml
│   └── workflows
├── .gitignore
├── README.md
├── code
│   ├── datasets        # code for creating or getting datasets
│   ├── deployment      # code for deploying models
│   ├── features        # code for creating features
│   └── models          # code for building and training models
├── data                # directory is for consistent data placement. contents are gitignored by default.
│   ├── README.md
│   ├── interim         # storing intermediate results (mostly for debugging)
│   ├── processed       # storing transformed data used for reporting, modeling, etc
│   └── raw             # storing raw data to use as inputs to rest of pipeline
├── docs
│   ├── code            # documenting everything in the code directory (could be sphinx project for example)
│   ├── data            # documenting datasets, data profiles, behaviors, column definitions, etc
│   ├── media           # storing images, videos, etc, needed for docs.
│   ├── references      # for collecting and documenting external resources relevant to the project
│   └── solution_architecture.md    # describe and diagram solution design and architecture
├── environments
├── notebooks
├── pipelines           # for pipeline orchestrators i.e. AzureML Pipelines, Airflow, Luigi, etc.
├── setup.py            # if using python, for finding all the packages inside of code.
└── tests               # for testing your code, data, and outputs
    ├── data_validation
    └── unit
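The `data` layout above (`raw`, `interim`, `processed`) can be used from code through a small path helper so that scripts and notebooks agree on where files live. The following is a minimal sketch, not part of this repository: the function names `data_path` and `ensure_data_dirs` and the file name `carriers.csv` are hypothetical and only illustrate the convention.

```python
from pathlib import Path

# Stage names taken from the data/ directory layout shown above.
STAGES = ("raw", "interim", "processed")


def data_path(root, stage, filename):
    """Return the path for `filename` inside data/<stage> under `root`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return Path(root) / "data" / stage / filename


def ensure_data_dirs(root):
    """Create data/raw, data/interim, and data/processed if they are missing."""
    created = []
    for stage in STAGES:
        directory = Path(root) / "data" / stage
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    # Example: prepare the folders, then build a path for a raw input file.
    ensure_data_dirs(".")
    print(data_path(".", "raw", "carriers.csv"))  # hypothetical file name
```

Because the contents of `data` are gitignored by default, a helper like `ensure_data_dirs` may be useful after a fresh clone, when the stage folders might not exist yet.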
