You can find the project at Omega2020 DS API
| Stephen Plainte | Ivan Santos | Brandon Bruce | Marvin Davila |
|---|---|---|---|
| Rob Hamilton | Johana Luna | Hira Khan | Rudy Enriquez |
The Omega2020 DS API serves as the backbone for the image processing and computer vision pipelines that transfer analog paper Sudoku puzzles into digital form.
Below is an annotated breakdown of the cloud architecture for the Omega2020 solution; each step is explained below.
- (Black Arrow) - Standard Inflow of Data for uploading a Paper Sudoku Puzzle
- (Orange Arrow) - Querying a Sudoku Puzzle String to solve
- (Green Arrow) - Responses to Front End
Data Pipeline:
- Web Team's front end, deployed on Netlify at Omega2020.netlify.com.
- Elastic Beanstalk endpoint, redirected from an HTTPS-hosted website.
- Auto-scales between 1 and 4 servers to handle spikes in demand.
- Entry point to the Flask app.
  - (Black Arrow) First entry point within the Flask app; posts the raw image to S3.
  - (Orange Arrow) Passes the puzzle string to the solver.
- After the raw image is uploaded, it goes to the image processing script, which crops the Sudoku puzzle out of the image background and subdivides it into 81 cells, stored as a list of 81 NumPy arrays. Each array is 784 integers long, representing a 28x28-pixel image. (A hedged sketch of this step appears after this list.)
- Solver module.
  - (Orange Arrow) With digits passed via a GET request from the front end, the solver checks whether the submitted Sudoku puzzle is valid; if it is, the solution is returned along with the forecasted difficulty.
  - (Green Arrow) With predicted digits passed back from the SageMaker API endpoint, the solver checks whether the submitted Sudoku puzzle is valid; if it is, the solution is returned along with the forecasted difficulty.
- Amazon API Gateway endpoint called for analysis; acts as a handler between the Flask app and the SageMaker back end.
- Lambda function receives URL metadata from the API Gateway and transforms it into the SageMaker format.
- Amazon SageMaker scores the inbound predictions.
- The S3 bucket is organized into folders of raw images, processed images, and individual Sudoku cells.
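The project's actual image-processing script lives in this repository; purely as an illustrative sketch of the cell-splitting step described above, here is a minimal version assuming OpenCV and NumPy, and assuming the grid has already been cropped and perspective-corrected (the function name is hypothetical, not the project's own):

```python
import cv2
import numpy as np

def split_into_cells(processed_grid):
    """Split a square, top-down view of a Sudoku grid into 81 cell vectors.

    `processed_grid` is assumed to be a grayscale image already cropped so the
    puzzle fills the frame. Returns a list of 81 NumPy vectors, each 784 long
    (a flattened 28x28 cell), matching the format described above.
    """
    # Resize so the grid divides evenly into 9x9 cells of 28x28 pixels.
    grid = cv2.resize(processed_grid, (28 * 9, 28 * 9))
    cells = []
    for row in range(9):
        for col in range(9):
            cell = grid[row * 28:(row + 1) * 28, col * 28:(col + 1) * 28]
            cells.append(cell.reshape(784).astype(np.uint8))
    return cells
```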
Auxiliary Services:
- A. AWS Ground Truth was used to bootstrap the initial training of our model: our team individually scored 5,000 digits from a Sudoku puzzle book.
- B. The SageMaker train function reads a specific folder in the S3 bucket and runs on a schedule, allowing Omega2020 to learn over time as more data is shared (a hedged sketch of reading that folder appears below).
- C. Reference puzzles generated by our scraper are pulled on request by the front end, organized by difficulty.
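As an illustration of how a scheduled training job might pull data from the bucket, here is a minimal boto3 sketch; the `train/` prefix and the `.npy` format are assumptions for illustration, not the project's actual folder layout:

```python
import io

import boto3
import numpy as np

def load_training_arrays(bucket="omega2020-ds", prefix="train/"):
    """Yield (key, array) pairs for every .npy object under the given prefix.

    The bucket name matches the .env example further below; the `train/`
    prefix is only an assumption about how the training folder might be named.
    """
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith(".npy"):
                continue
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            yield obj["Key"], np.load(io.BytesIO(body))
```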
Using an XGBClassifier model, we have weighted digit recognition at 95%+ precision and recall across all classes, trained on over 100,000 images (a combination of manually scraped digits and MNIST) and validated on over 25,000 digits. Here is our most recent classification report and validation score (0.0 represents blank/noise cells that are not any single digit); a hedged training sketch follows the report.
Validation Accuracy: 0.9552200984651028

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0.0 | 0.99 | 0.98 | 0.99 | 1624 |
| 1.0 | 0.95 | 0.99 | 0.97 | 2936 |
| 2.0 | 0.96 | 0.96 | 0.96 | 3010 |
| 3.0 | 0.95 | 0.94 | 0.95 | 2958 |
| 4.0 | 0.94 | 0.96 | 0.95 | 2787 |
| 5.0 | 0.95 | 0.94 | 0.95 | 2680 |
| 6.0 | 0.97 | 0.98 | 0.98 | 2894 |
| 7.0 | 0.97 | 0.95 | 0.96 | 3026 |
| 8.0 | 0.95 | 0.93 | 0.94 | 2802 |
| 9.0 | 0.93 | 0.93 | 0.93 | 2907 |
| micro avg | 0.96 | 0.96 | 0.96 | 27624 |
| macro avg | 0.96 | 0.96 | 0.96 | 27624 |
| weighted avg | 0.96 | 0.96 | 0.96 | 27624 |
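For reference only, here is a minimal sketch of how such a classifier can be trained on the 784-long cell vectors, assuming the xgboost and scikit-learn packages; the data is a random placeholder and the hyperparameters are illustrative, not the values behind the reported scores:

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X: (n_samples, 784) flattened 28x28 cell images; y: labels 0-9,
# where 0 marks a blank/noise cell. Random data stands in here.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 784)).astype(np.float32)
y = rng.integers(0, 10, size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters are placeholders, not the project's tuned values.
model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("Validation Accuracy", model.score(X_val, y_val))
print(classification_report(y_val, model.predict(X_val)))
```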
Here is an example of the intermediary steps for taking a raw image and formatting it for digit recognition.
Original Photo:
Processed Image:
Cell Splicing:
Each cell is then cast into a NumPy array (each cell is 28x28 pixels, reshaped to a NumPy vector of length 784) and fed into the model; a hedged prediction sketch appears after the note below.
Predicted Sudoku Grid and Solution Grid.
Note: for the modeling, the "blank" class is a `0`, but for the front end it is a `.`.
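As a minimal sketch of the prediction step, assuming a fitted classifier like the one above (the helper name is hypothetical), the 81 cell vectors can be mapped to the 81-character puzzle string, with predicted 0s rendered as `.` for the front end:

```python
import numpy as np

def cells_to_puzzle_string(model, cells):
    """Predict a digit for each of the 81 cell vectors and build the
    81-character puzzle string used by the front end.

    `model` is assumed to be a fitted classifier; blanks are predicted
    as 0 and rendered as '.' as described in the note above.
    """
    X = np.vstack(cells).astype(np.float32)   # shape (81, 784)
    digits = model.predict(X)
    return "".join("." if int(d) == 0 else str(int(d)) for d in digits)
```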
To predict the difficulty level of a Sudoku we used a logistic regression: by counting the number of times different solving techniques are needed to solve a given puzzle, we can forecast difficulty with accuracy above 70%. A minimal sketch of the idea follows.
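The feature columns, counts, and all labels other than "Gentle" below are invented placeholders for illustration, not the project's actual features or training data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row counts how often a solving technique was needed for one puzzle.
# Columns are illustrative: [naked singles, hidden singles, naked twins].
technique_counts = np.array([
    [40, 5, 0],   # mostly trivial deductions -> easier puzzle
    [25, 12, 1],
    [10, 18, 4],
    [5, 20, 9],   # heavy use of advanced techniques -> harder puzzle
])
difficulty_labels = np.array(["Gentle", "Gentle", "Moderate", "Tough"])

clf = LogisticRegression(max_iter=1000)
clf.fit(technique_counts, difficulty_labels)

print(clf.predict([[30, 8, 0]]))  # e.g. ['Gentle']
```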
- Reference Sudoku Puzzles Scraped
- Paper Sudoku Puzzles Processed
- MNIST
- 4x4 Puzzles Dataset
- Sudoku Scraper
- 6x6 Puzzles Scraped
- 12x12 Puzzles Scraped
- 16x16 Puzzles Scraped
| route | description |
|---|---|
| POST: /demo_file | With an image attached, returns the predicted digits, Sudoku solution (if applicable), and puzzle difficulty (if applicable) |
| GET: /solve?puzzle=*puzzle_string* | For the submitted Sudoku string, returns the Sudoku solution (if applicable) and difficulty (if applicable) |
| DEV ONLY: /reset | Drops tables and reinitializes the database. Use only in testing. |
| /bulk_processing | Batch processing of images in the raw_images folder in S3; useful for updates to image processing |
| /train | Submits all valid Sudoku puzzle images (as NumPy arrays) and predicted values to a validation S3 folder to be fed into SageMaker training |
| /upload | Simple HTML page to test image upload independent of the front end (used for DS testing) |
GET Request for the puzzle solution:
API Request:
https://api.lambda-omega2020.com/solve?puzzle=.7.8..6.4.36..5.1...514...2369.78.....2...5.....46.3987...319...9.2..18.5.4..7.6.
API Response:
```json
{
  "difficulty": "Gentle",
  "puzzle_status": 1,
  "solution": "271893654436725819985146732369578421842319576157462398728631945693254187514987263",
  "values": ".7.8..6.4.36..5.1...514...2369.78.....2...5.....46.3987...319...9.2..18.5.4..7.6."
}
```
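As a usage example, the same GET call with the requests library (the endpoint URL and puzzle string are taken from above):

```python
import requests

puzzle = ".7.8..6.4.36..5.1...514...2369.78.....2...5.....46.3987...319...9.2..18.5.4..7.6."
resp = requests.get(
    "https://api.lambda-omega2020.com/solve",
    params={"puzzle": puzzle},
)
print(resp.json()["solution"], resp.json()["difficulty"])
```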
POST Request for image upload:
API Request:
https://api.lambda-omega2020.com/demo_file
Image attached in the `file` parameter.
API Response:
```json
{
  "difficulty": "Gentle",
  "puzzle_status": 1,
  "solution": "271893654436725819985146732369578421842319576157462398728631945693254187514987263",
  "values": ".7.8..6.4.36..5.1...514...2369.78.....2...5.....46.3987...319...9.2..18.5.4..7.6."
}
```
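And the equivalent upload with requests; the local filename is a placeholder, and the `file` field name follows the description above:

```python
import requests

with open("sudoku_photo.png", "rb") as image:  # any local puzzle photo
    resp = requests.post(
        "https://api.lambda-omega2020.com/demo_file",
        files={"file": image},
    )
print(resp.json())
```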
This package uses environment variables stored in a .env file to hold secrets; an example of the variables used is below:
```
FLASK_ENV=development
FLASK_DEBUG=TRUE
DATABASE_URL='postgres://username:password@hostname:5432/database'
S3_KEY = 'KEY_HERE'
S3_SECRET = 'SECRET_HERE'
S3_BUCKET = 'omega2020-ds' # S3 bucket name
S3_LOCATION = 'https://omega2020-ds.s3.amazonaws.com/' # S3 bucket URL
ExtraArgs='{"ACL": "public-read", "ContentType": "image/png", "ContentDisposition": "inline"}' # extra arguments for image uploads
MODEL_FILEPATH='data/reference_knn_model.sav' # relative path to where local model files are stored, including the reference KNN model in the package
TRAIN_DATABASE_HOST = 'database-1.us-east-1.rds.amazonaws.com' # deployed on AWS RDS
TRAIN_DATABASE_PW = 'databasepassword'
TRAIN_DATABASE_USER = 'postgres' # default value for a postgres RDS database
TRAIN_DATABASE_TABLE = 'postgres' # default value for a postgres RDS database
SAGEMAKER_API_URL = 'https://execute-api.us-east-1.amazonaws.com/test/omega-predict-digits-s3/' # used if the SageMaker endpoint is used
```
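A minimal sketch of loading these values, assuming the python-dotenv package (the project may load them differently):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the working directory

S3_BUCKET = os.getenv("S3_BUCKET")
S3_LOCATION = os.getenv("S3_LOCATION")
DATABASE_URL = os.getenv("DATABASE_URL")
```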
This app, as structured, is intended to be deployed using AWS Elastic Beanstalk. The .ebextensions folder contains configuration for CORS as well as the HTTPS certificate, but requires an updated role whose ARN is linked to the AWS Certificate Manager role for the signed SSL certificate. (SSL is required to work in production with Netlify, as Netlify will not accept HTTP traffic.)
We are documenting outstanding issues on the issues page of this repo: https://github.com/Lambda-School-Labs/omega2020-ds/issues
If you are having an issue with the existing project code, please submit a bug report under the following guidelines:
- Check first to see if your issue has already been reported.
- Check to see if the issue has recently been fixed by attempting to reproduce the issue using the latest master branch in the repository.
- Create a live example of the problem.
- Submit a detailed bug report including your environment & browser, steps to reproduce the issue, actual and expected outcomes, where you believe the issue is originating from, and any potential solutions you have considered.
We would love to hear from you about new features which would improve this app and further the aims of our project. Please provide as much detail and information as possible to show us why you think your new feature should be implemented.
If you have developed a patch, bug fix, or new feature that would improve this app, please submit a pull request. It is best to communicate your ideas with the developers first before investing a great deal of time into a pull request to ensure that it will mesh smoothly with the project.
Remember that this project is licensed under the MIT license, and by submitting a pull request, you agree that your work will be, too.
- Ensure any install or build dependencies are removed before the end of the layer when doing a build.
- Update the README.md with details of changes to the interface, including new environment variables, exposed ports, useful file locations, and container parameters.
- Ensure that your code conforms to our existing code conventions and test coverage.
- Include the relevant issue number, if applicable.
- You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.
These contribution guidelines have been adapted from this good-Contributing.md-template.
Sudoku image processing was developed with reference to Sarthak Vajpayee's article: https://medium.com/swlh/how-to-solve-sudoku-using-artificial-intelligence-8d5d3841b872
Use Your Own Algorithm with AWS SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html AWS: Bring Your Own Container: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own/container
Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda: https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
Help with Sudoku Solver Code: Peter Norvig, https://norvig.com/sudoku.html
Naked Twins Solver Technique Reference: http://hodoku.sourceforge.net/en/tech_naked.php
See Backend Documentation for details on the backend of our project.
See Front End Documentation for details on the front end of our project.