pqai classifier
This service contains algorithms and ML models that classify patents into one or more of a finite set of predefined categories.
In the real world, patent classifiers are useful for a number of purposes, such as:
- identifying patents related to a particular technology area, such as steel manufacturing
- patent landscaping
- creating automated patent alerts
- routing new patent applications to the appropriate department in a patent office
In the PQAI search pipeline, patent classifiers can be used to identify the technology area of a search query, allowing the downstream search to narrow down on a specific segment of the prior-art database. For example, if a query is about OLED displays, patents related to pharmaceuticals can be excluded from the search. This is a way to reduce search latency without affecting accuracy of results.
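As an illustration of this narrowing step, the sketch below filters a small in-memory collection by predicted subclass. The `filter_by_subclass` helper, the document records and their `cpc` field are hypothetical, and the predicted subclasses are hard-coded rather than taken from a real classifier.

```python
def filter_by_subclass(documents, predicted_subclasses):
    """Keep only documents that share at least one subclass with the query."""
    wanted = set(predicted_subclasses)
    return [doc for doc in documents if wanted & set(doc["cpc"])]


# Hypothetical prior-art records; real records would come from the search index.
documents = [
    {"id": "doc-1", "cpc": ["H01L", "G09G"]},  # display technology
    {"id": "doc-2", "cpc": ["A61K"]},          # pharmaceuticals
    {"id": "doc-3", "cpc": ["H05B", "H01L"]},  # electroluminescent devices
]

# Stand-in for the classifier output for a query about OLED displays.
query_subclasses = ["H01L", "H05B"]

candidates = filter_by_subclass(documents, query_subclasses)
print([doc["id"] for doc in candidates])  # ['doc-1', 'doc-3']
```

The pharmaceutical record is dropped before any expensive similarity computation runs, which is where the latency saving would come from.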
root
  |-- core
  |     |-- classifiers.py       // defines classifiers
  |-- assets                     // files needed by classifiers, e.g. ML models
  |-- tests
  |     |-- test_server.py       // Tests for the REST API
  |     |-- test_classifiers.py  // Tests for classifiers module
  |-- main.py                    // Defines the REST API
  |
  |-- requirements.txt           // List of Python dependencies
  |
  |-- Dockerfile                 // Docker files
  |-- docker-compose.yml
  |
  |-- env                        // .env file template
  |-- deploy.sh                  // Script for setting up on local system
The classifiers module defines patent classifiers. There are two in the current implementation. Both map a given input text snippet to the most relevant of the 600+ CPC/IPC subclasses.
- BOWSubclassPredictor
- BERTSubclassPredictor
The BOWSubclassPredictor treats the input text as a bag of words. It uses a neural network model that was trained in a supervised manner to associate CPC subclass labels with patent claim preambles.
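To make the bag-of-words idea concrete, here is a minimal sketch of turning text into a fixed-length count vector over a known vocabulary. The vocabulary, the tokenizer and the `to_bow_vector` name are illustrative assumptions; the service's own preprocessing may differ.

```python
import re
from collections import Counter


def to_bow_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text, ignoring order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary; a real one would be loaded from a features file.
vocabulary = ["display", "light", "emitting", "compound", "organic"]

vector = to_bow_vector(
    "An organic light emitting display with a light guide", vocabulary
)
print(vector)  # [1, 2, 1, 0, 1]
```

Word order is discarded: only the per-word counts reach the model, which is what distinguishes this representation from a sequence model.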
The BERTSubclassPredictor is a BERT-based model that was trained in a supervised manner to associate CPC subclass labels with patent abstracts.
Both classifiers are instantiated as singleton objects, so at most one instance of each exists in memory at any time.
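One common way to get this behavior in Python is to cache the instance in `__new__`. The sketch below is a generic illustration using a hypothetical `ExpensiveModel` class; it is not necessarily how the classifiers in this repository implement it.

```python
class ExpensiveModel:
    """Hypothetical class whose instances are costly to build."""

    _instance = None

    def __new__(cls):
        # Create the object once; later calls return the cached instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.weights = cls._load_weights()
        return cls._instance

    @staticmethod
    def _load_weights():
        # Stand-in for loading a large model file from disk.
        return [0.1, 0.2, 0.3]


first = ExpensiveModel()
second = ExpensiveModel()
print(first is second)  # True
```

Because model weights can be large, sharing one instance avoids loading them again each time a classifier is requested.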
Typical usage is as follows (shown for BERTSubclassPredictor; BOWSubclassPredictor works the same way):
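# Import path assumes you are running from the repository root
from core.classifiers import BERTSubclassPredictor

clf = BERTSubclassPredictor()  # or BOWSubclassPredictor()
text = "natural language processing"
subclasses = clf.predict_subclasses(text)
print(subclasses)  # e.g. ["G06F", "G10L", "G09B", "G06Q", "G06T"]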
The assets required to run this service are stored in the /assets directory.
When you clone the GitHub repository, the /assets directory will contain nothing but a README file. You will need to download the actual asset files as a zip archive from the following link:
https://s3.amazonaws.com/pqai.s3/public/assets-pqai-classifier.zip
After downloading, extract the zip file into the /assets directory.
(Alternatively, you can use the deploy.sh script to do this step automatically; see the Setup section.)
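If you prefer to script the manual route, the following sketch downloads and unpacks the archive with the Python standard library. The function names and local file names are hypothetical, and the sketch assumes the archive's contents belong directly under the assets directory, as the instructions above describe.

```python
import urllib.request
import zipfile
from pathlib import Path

ASSETS_URL = "https://s3.amazonaws.com/pqai.s3/public/assets-pqai-classifier.zip"


def download_archive(url, zip_path):
    """Download the archive to zip_path (requires network access)."""
    urllib.request.urlretrieve(url, zip_path)


def extract_archive(zip_path, target_dir):
    """Extract every member of the archive and list what target_dir now holds."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(path.name for path in target.iterdir())


# Usage, run from the repository root:
# download_archive(ASSETS_URL, "assets-pqai-classifier.zip")
# print(extract_archive("assets-pqai-classifier.zip", "assets"))
```

The download and extraction steps are kept separate so that an archive you have already fetched can be unpacked again without another download.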
The assets contain the following files/directories:
- uncased_L-12_H-768_A-12: a BERT-based model for associating CPC subclasses with snippets of text
- pmbl2subclass.features.json: list of features (words) used by a deep learning model trained to identify CPC subclasses on the basis of claim preambles
- pmbl2subclass.h5: neural network model weights
- pmbl2subclass.json: neural network model metadata
- pmbl2subclass.targets.json: list of target labels
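Before starting the service, it can help to confirm that everything in the list above is actually in place. The check below is a small sketch; the `missing_assets` helper is hypothetical and the relative path assumes you run it from the repository root.

```python
from pathlib import Path

EXPECTED_ASSETS = [
    "uncased_L-12_H-768_A-12",
    "pmbl2subclass.features.json",
    "pmbl2subclass.h5",
    "pmbl2subclass.json",
    "pmbl2subclass.targets.json",
]


def missing_assets(assets_dir):
    """Return the expected asset names that are absent from assets_dir."""
    root = Path(assets_dir)
    return [name for name in EXPECTED_ASSETS if not (root / name).exists()]


missing = missing_assets("assets")
if missing:
    print("Missing assets:", ", ".join(missing))
else:
    print("All assets found.")
```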
Prerequisites
The following deployment steps assume that you are running a Linux distribution and have Git and Docker installed on your system.
Setup
The easiest way to get this service up and running on your local system is to follow these steps:
1. Clone the repository

   git clone https://github.com/pqaidevteam/pqai-classifier.git

2. Using the env template in the repository, create a .env file and set the environment variables.

   cd pqai-classifier
   cp env .env
   nano .env

3. Run the deploy.sh script.

   chmod +x deploy.sh
   bash ./deploy.sh
This will create a Docker image and run it as a Docker container on the port number you specified in the .env file.
Alternatively, after following steps (1) and (2) above, you can use the command python main.py to run the service in a terminal.
This service is not dependent on any other PQAI service for its operation.
The following services depend on this service:
- pqai-gateway