Mahesh Maan edited this page Dec 5, 2022 · 3 revisions

PQAI Classifier

This service contains algorithms and ML models that classify patents into one or more categories drawn from a finite, predefined set.

In practice, patent classifiers are useful for several purposes, such as:

  • identifying patents related to a particular technology area, such as steel manufacturing
  • patent landscaping
  • creating automated patent alerts
  • routing new patent applications to an appropriate department in a patent office.

In the PQAI search pipeline, patent classifiers can be used to identify the technology area of a search query, allowing the downstream search to focus on a specific segment of the prior-art database. For example, if a query is about OLED displays, patents related to pharmaceuticals can be excluded from the search. This reduces search latency without affecting the accuracy of the results.
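The narrowing step can be illustrated with a minimal sketch. The record layout, the document IDs, and the helper name `narrow_by_subclass` are hypothetical and are not taken from the PQAI codebase; the point is only that predicted subclasses act as a filter before the expensive search runs.

```python
from typing import Iterable

# Hypothetical in-memory index: each record carries the CPC subclass of the patent.
DOCS = [
    {"id": "US-A", "subclass": "H01L"},  # semiconductor devices
    {"id": "US-B", "subclass": "A61K"},  # pharmaceutical preparations
    {"id": "US-C", "subclass": "G09G"},  # display control
]

def narrow_by_subclass(docs, predicted: Iterable[str]):
    """Keep only the documents whose subclass is among the predicted ones."""
    allowed = set(predicted)
    return [d for d in docs if d["subclass"] in allowed]

# Suppose a classifier returned these subclasses for a query about OLED displays.
candidates = narrow_by_subclass(DOCS, ["H01L", "G09G"])
print([d["id"] for d in candidates])  # ['US-A', 'US-C']
```

The pharmaceutical record is dropped before any similarity computation, which is where the latency saving would come from.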

Code structure

root
  |-- core
        |-- classifiers.py			// defines classifiers
  |-- assets						// files needed by classifiers, e.g. ML models
  |-- tests
        |-- test_server.py			// Tests for the REST API
        |-- test_classifiers.py		// Tests for classifiers module
  |-- main.py						// Defines the REST API
  |
  |-- requirements.txt				// List of Python dependencies
  |
  |-- Dockerfile					// Docker files
  |-- docker-compose.yml
  |
  |-- env							// .env file template
  |-- deploy.sh						// Script for setting up on local system

Core modules

Classifiers

The classifiers module defines patent classifiers. The current implementation defines two. Both assign subclasses, chosen from the 600+ CPC/IPC subclasses, to a given input text snippet.

  1. BOWSubclassPredictor
  2. BERTSubclassPredictor

The BOWSubclassPredictor treats the input text as a bag of words. It uses a neural network model that was trained in a supervised manner to associate CPC subclass labels with patent claim preambles.
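To make the bag-of-words idea concrete, here is a minimal sketch of turning text into a count vector over a fixed vocabulary. The tokenization rule, the function name `bow_vector`, and the four-word vocabulary are assumptions made for illustration; the actual preprocessing used by BOWSubclassPredictor may differ.

```python
import re
from collections import Counter

def bow_vector(text, features):
    """Count how often each vocabulary word occurs in the text, ignoring word order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in features]

# Hypothetical vocabulary; a real one would be a much longer list of words.
features = ["display", "light", "emitting", "compound"]
print(bow_vector("A light emitting display comprising a light source", features))
# [1, 2, 1, 0]
```

A vector like this, rather than the raw text, is what a neural network of this kind would take as input.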

The BERTSubclassPredictor is a BERT-based model that was trained in a supervised manner to associate CPC subclass labels with patent abstracts.

Both of the above classifiers are instantiated as singleton objects, so there is never more than one instance of each in memory.
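One common way to get this behaviour in Python is to override `__new__`. The sketch below is a generic illustration of the singleton pattern, not the implementation used in classifiers.py; the class name and the placeholder model are hypothetical.

```python
class SingletonPredictor:
    """Illustrative singleton: the expensive model load happens only once."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._load_model()
        return cls._instance

    def _load_model(self):
        # Placeholder for an expensive step such as reading model weights from disk.
        self.model = object()

a = SingletonPredictor()
b = SingletonPredictor()
print(a is b)  # True
```

Because both names refer to the same object, constructing the predictor repeatedly (for example, once per request) does not load the model again.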

Typical usage is as follows (shown for BERTSubclassPredictor; BOWSubclassPredictor is used the same way):

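# Import path assumed from the code structure above (core/classifiers.py)
from core.classifiers import BERTSubclassPredictor

clf = BERTSubclassPredictor()  # or BOWSubclassPredictor()
text = "natural language processing"
subclasses = clf.predict_subclasses(text)
print(subclasses)  # e.g. ["G06F", "G10L", "G09B", "G06Q", "G06T"]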

Assets

The assets required to run this service are stored in the /assets directory.

When you clone the GitHub repository, the /assets directory contains only a README file. You will need to download the actual asset files as a zip archive from the following link:

https://s3.amazonaws.com/pqai.s3/public/assets-pqai-classifier.zip

After downloading, extract the zip file into the /assets directory.
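The download-and-extract step can also be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the archive unpacks directly into the files the service expects (if the zip contains a top-level folder, the destination may need adjusting).

```python
import urllib.request
import zipfile
from pathlib import Path

ASSETS_URL = "https://s3.amazonaws.com/pqai.s3/public/assets-pqai-classifier.zip"

def download_archive(url: str, zip_path: Path) -> Path:
    """Download the archive at `url` to `zip_path` and return the path."""
    urllib.request.urlretrieve(url, str(zip_path))
    return zip_path

def extract_archive(zip_path: Path, dest: Path) -> list:
    """Extract every member of the zip into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Example usage, run from the repository root (performs a network download):
# archive = download_archive(ASSETS_URL, Path("assets-pqai-classifier.zip"))
# extract_archive(archive, Path("assets"))
```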

Alternatively, the deploy.sh script can perform this step automatically (see the Deployment section).

The assets contain the following files/directories:

  • uncased_L-12_H-768_A-12: a BERT-based model for associating CPC subclasses with snippets of text
  • pmbl2subclass.features.json: list of features (words) used by a deep learning model trained to identify CPC subclasses on the basis of claim preambles
  • pmbl2subclass.h5: neural network model weights
  • pmbl2subclass.json: neural network model metadata
  • pmbl2subclass.targets.json: list of target labels
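Before starting the service, it can help to confirm that all of the files listed above are present. The helper below is a hypothetical convenience check, not part of the repository; it assumes the listed names sit directly inside the assets directory.

```python
from pathlib import Path

# Names taken from the list of assets above.
EXPECTED_ASSETS = [
    "uncased_L-12_H-768_A-12",
    "pmbl2subclass.features.json",
    "pmbl2subclass.h5",
    "pmbl2subclass.json",
    "pmbl2subclass.targets.json",
]

def missing_assets(assets_dir):
    """Return the expected asset names that do not exist under `assets_dir`."""
    root = Path(assets_dir)
    return [name for name in EXPECTED_ASSETS if not (root / name).exists()]

# An empty list means every expected file or directory was found.
print(missing_assets("assets"))
```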

Deployment

Prerequisites

The following deployment steps assume that you are running a Linux distribution and have Git and Docker installed on your system.

Setup

The easiest way to get this service up and running on your local system is to follow these steps:

  1. Clone the repository

    git clone https://github.com/pqaidevteam/pqai-classifier.git
    
  2. Using the env template in the repository, create a .env file and set the environment variables.

    cd pqai-classifier
    cp env .env
    nano .env
    
  3. Run the deploy.sh script.

    chmod +x deploy.sh
    bash ./deploy.sh
    

This will create a Docker image and run it as a Docker container on the port number you specified in the .env file.

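Alternatively, after following steps (1) and (2) above, you can install the Python dependencies listed in requirements.txt, make sure the assets are in place in the /assets directory, and then run the command python main.py to start the service in a terminal.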

Service dependency

This service is not dependent on any other PQAI service for its operation.

Dependent services

The following services depend on this service:

  • pqai-gateway