Vendor Database Standardization and Consolidation Project

Introduction

This project contains scripts for parsing addresses and standardizing vendor databases.

Folder Structure

The project has two main folders:

Address Parsing Scripts: This folder contains scripts for parsing addresses. The main scripts in this folder are:
- RPA Script and Address Parse: These scripts use regex for address parsing. They are not entirely accurate, because edge cases and malformed address inputs come up constantly.
- API Address Parse: This script uses the OpenCage API for more accurate address parsing. Insert your own API key, which you can obtain from OpenCage. Note that this script is subject to API rate limits.
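
The trade-off between the two approaches can be sketched as follows. This is a minimal illustration and not the code in this folder: the regex, the function names, and the sample address format are assumptions made for the example. Only the OpenCage endpoint and its `q`, `key`, and `limit` query parameters come from the public API. The sketch builds the request URL but does not send it, so it runs without an API key.

```python
import re
import urllib.parse

# Hypothetical pattern for simple US-style addresses such as
# "123 Main St, Springfield, IL 62704". Real inputs vary far more than this.
ADDRESS_RE = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<street>[^,]+?)\s*,\s*"
    r"(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)

OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def parse_with_regex(address):
    """Return a dict of address parts, or None when the pattern does not match."""
    match = ADDRESS_RE.match(address)
    if match is None:
        return None
    parts = match.groupdict()
    parts["state"] = parts["state"].upper()
    return parts


def build_opencage_request(address, api_key):
    """Build the forward-geocoding URL for one address (no network call here)."""
    query = urllib.parse.urlencode({"q": address, "key": api_key, "limit": 1})
    return f"{OPENCAGE_URL}?{query}"


def components_from_response(payload):
    """Return the 'components' dict of the first result in a decoded JSON response."""
    results = payload.get("results", [])
    return results[0].get("components", {}) if results else {}
```

One possible design is to try `parse_with_regex` first and send only the addresses it returns `None` for to the API, which keeps the number of requests down. Consult the OpenCage documentation for your plan's rate limits before running a large batch.
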
Database Standardization Script Trials: This folder contains all the trial files we made while standardizing the databases. These files are not accurate or up to date, but they serve as the foundation for our project. The folder includes the two main MVP files that came out of these trials:
- peter_ingestion_mapping.ipynb: This is the final version of our database standardization script.
- sejal_ingestion_mapping.ipynb: This script is tailored to one specific database.
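
For readers unfamiliar with the idea, an ingestion mapping can be sketched as a rename from each vendor file's column names to one shared schema. The example below is an assumption about what such a step might look like, not the logic in either notebook. The column names, `COLUMN_MAP`, and both functions are hypothetical.

```python
# Hypothetical mapping from one vendor file's column names to a shared schema.
COLUMN_MAP = {
    "Vendor Name": "vendor_name",
    "Addr": "address",
    "Zip Code": "postal_code",
}


def standardize_row(row, column_map):
    """Rename mapped columns, trim string values, and drop unmapped columns."""
    standardized = {}
    for source_name, target_name in column_map.items():
        # Missing source columns become empty strings so every row has the same keys.
        value = row.get(source_name, "")
        standardized[target_name] = value.strip() if isinstance(value, str) else value
    return standardized


def standardize_rows(rows, column_map):
    """Apply standardize_row to every row in a list of dicts."""
    return [standardize_row(row, column_map) for row in rows]
```

A database with an unusual layout would get its own mapping dictionary, which is one way a script could end up specific to a single database.
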

How to Use

To run peter_ingestion_mapping.ipynb:

1. Open the peter_ingestion_mapping.ipynb file in Jupyter Notebook or JupyterLab.
2. To include files for upload, navigate to the section of the script where file uploading is handled, and follow the instructions in the script comments.
3. Run the script cells in the order they appear in the notebook.
4. Note that some cells are specific to certain databases. Do not run them unless you are parsing that specific database.
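
If skipping cells by hand proves error-prone, one option is to guard database-specific code with a flag set once at the top of the notebook. The sketch below shows that pattern only. `ACTIVE_DATABASE`, `should_run`, `clean_vendor_b`, and the database names are hypothetical and do not appear in the notebook.

```python
# Hypothetical names: the notebook may organize its database-specific cells differently.
ACTIVE_DATABASE = "vendor_a"


def should_run(cell_database, active_database=ACTIVE_DATABASE):
    """Return True when a database-specific cell applies to the database being parsed."""
    return cell_database == active_database


def clean_vendor_b(records):
    """Placeholder for a step that only applies to the hypothetical 'vendor_b' database."""
    return [record for record in records if record]


records = ["row 1", "", "row 2"]

# This step is skipped because the active database is 'vendor_a'.
if should_run("vendor_b"):
    records = clean_vendor_b(records)
```

With a guard like this, running every cell in order is safe, because cells for other databases leave the data unchanged.
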

Please ensure you have the necessary Python packages installed and that your files are in the correct format as specified in the script comments.

Contact

If you have any questions or need further clarification, feel free to reach out.

itspetah/Vendors-Data-Standardization
