This project consists of training an NLP model to detect the "programming" language of a given code snippet.

amal-hasni/spot-language

Spot-Language Model

This project builds an NLP solution that detects the programming language of a source code snippet.

Link to Medium Article

How I Built a Classification Model for Source Code Languages

Supported Languages

The trained model supports the following languages:

C, C++, Objective-C, C#, Swift,
Ruby, Julia, Lua, Java, Groovy,
Kotlin, Scala, Shell, Batchfile, PowerShell,
Python, Markdown, HTML, PHP, CSS,
TypeScript, JavaScript, CoffeeScript, Haskell, Perl,
Go, SQL, Rust, TeX, Erlang,
Visual Basic, Dart, Pascal, Jupyter Notebook

Demonstration

To try the model out, you can follow this link to the Demo App deployed on Heroku.

Training

To train the model, first download the dataset we used from this Kaggle notebook. You can read the notebook to see how we extracted the data from the "GitHub Repos" dataset, or run all the cells and skip directly to the download link at the end.

Once you have the dataset, set the DATA_PATH variable in train.py to the dataset's location and run the script. It reports the model's accuracy, which should be around 97%.

If you need to reuse the trained model later, you can serialize it with a library such as joblib, pickle or piskle.
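
As a minimal sketch of the serialization step, the snippet below uses pickle from the Python standard library. The helper names save_model and load_model and the file name spot_language.pkl are hypothetical, and the dictionary in the demo is only a placeholder for whatever model object train.py produces. It assumes that object is picklable, which this README does not confirm.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Write a picklable model object to disk (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a previously pickled model object back from disk (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Placeholder for the object produced by train.py; replace it with the real model.
    trained_model = {"note": "placeholder for the trained model"}

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name, not taken from the repository.
        model_path = Path(tmp_dir) / "spot_language.pkl"
        save_model(trained_model, model_path)
        restored = load_model(model_path)
        print(restored == trained_model)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. If the model is a scikit-learn estimator, joblib.dump and joblib.load can be used in the same way.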


If you like the project and want to support us, you can buy us a coffee here:

Buy Me A Coffee
