PyPMML-Spark

PyPMML-Spark is a Python PMML scoring library for PySpark, exposed as a SparkML Transformer. It is the Python API for PMML4S-Spark.

Prerequisites

  • Java >= 1.8
  • Python 2.7 or >= 3.5

Dependencies

Module           PySpark
pypmml-spark     PySpark >= 3.0.0
pypmml-spark2    PySpark >= 2.4.0, < 3.0.0
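
The table above can be read as a simple version check. The following is a minimal sketch, not part of the library: the helper name package_for_pyspark is hypothetical, and it only compares the major and minor components of a PySpark version string.

```python
def package_for_pyspark(version):
    """Return the pip package name that matches a PySpark version string.

    Hypothetical helper for illustration; it encodes the table above.
    """
    # Pad with "0" so that short strings such as "3" still yield two parts.
    parts = (version.split(".") + ["0"])[:2]
    major, minor = int(parts[0]), int(parts[1])

    if major >= 3:
        return "pypmml-spark"       # PySpark >= 3.0.0
    if (major, minor) >= (2, 4):
        return "pypmml-spark2"      # PySpark >= 2.4.0, < 3.0.0
    raise ValueError("PySpark %s is older than 2.4.0 and is not supported" % version)
```

With PySpark installed, the installed version can be passed in as package_for_pyspark(pyspark.__version__).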

Installation

pip install pypmml-spark

Or install the latest version from GitHub:

pip install --upgrade git+https://github.com/autodeployai/pypmml-spark.git

After installation, one more step is required: Spark must be able to find the jars shipped in the package pypmml_spark.jars. There are several ways to do that:

  1. The easiest way is to run the script link_pmml4s_jars_into_spark.py that is delivered with pypmml-spark:

    link_pmml4s_jars_into_spark.py
  2. Use Spark configuration options to specify the dependent jars, e.g. --jars, or spark.driver.extraClassPath and spark.executor.extraClassPath. See the Spark documentation for details about those parameters.
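
As an illustration of the second option, the sketch below collects the jar files from a directory and passes them to Spark through the spark.jars setting, which is the configuration equivalent of --jars. The helper names (find_jars, spark_jar_options, build_session) are hypothetical, and the location of the jars directory is an assumption: it is taken to be the jars folder inside the installed pypmml_spark package.

```python
import glob
import os


def find_jars(jars_dir):
    """Return the sorted absolute paths of all .jar files directly inside jars_dir."""
    return sorted(glob.glob(os.path.join(os.path.abspath(jars_dir), "*.jar")))


def spark_jar_options(jars_dir):
    """Build the Spark config entry that ships the jars to the driver and executors."""
    jars = find_jars(jars_dir)
    if not jars:
        raise FileNotFoundError("no .jar files found in %s" % jars_dir)
    # spark.jars takes a comma-separated list, like the --jars command-line option.
    return {"spark.jars": ",".join(jars)}


def build_session(jars_dir, app_name="pypmml-spark-demo"):
    """Create a SparkSession configured with the jars found in jars_dir."""
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    builder = SparkSession.builder.appName(app_name)
    for key, value in spark_jar_options(jars_dir).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Assuming the jars live next to the package's Python files, the directory could be located with os.path.join(os.path.dirname(pypmml_spark.__file__), "jars") and passed to build_session.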

Usage

  1. Load model from various sources, e.g. filename, string, or array of bytes.

    from pypmml_spark import ScoreModel
    
    # The model is from http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml
    model = ScoreModel.fromFile('single_iris_dectree.xml')
  2. Call transform(dataset) to run a batch score against an input dataset.

    # The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
    # `spark` is an active SparkSession, e.g. the one provided by the pyspark shell
    df = spark.read.csv('Iris.csv', header=True)
    score_df = model.transform(df)
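
Step 1 mentions loading a model from a filename, a string, or an array of bytes. The sketch below shows one way to dispatch between those sources. Only ScoreModel.fromFile appears in this README; the method names fromString and fromBytes are assumptions based on the naming used in PyPMML, and load_model itself is a hypothetical helper.

```python
def load_model(source, score_model_cls=None):
    """Load a PMML model from a file name, a PMML string, or raw bytes.

    score_model_cls defaults to pypmml_spark.ScoreModel; it is a parameter
    so the dispatch logic can be exercised without Spark.
    """
    if score_model_cls is None:
        from pypmml_spark import ScoreModel  # imported lazily; requires pypmml-spark
        score_model_cls = ScoreModel

    if isinstance(source, (bytes, bytearray)):
        # Assumed method name: fromBytes
        return score_model_cls.fromBytes(bytes(source))
    if source.lstrip().startswith("<"):
        # Text that starts with "<" is treated as PMML content, not a path.
        # Assumed method name: fromString
        return score_model_cls.fromString(source)
    return score_model_cls.fromFile(source)
```

For example, load_model('single_iris_dectree.xml') would take the fromFile branch, which matches the call shown in step 1.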

Use PMML in Scala or Java

See the PMML4S project. PMML4S is a PMML scoring library for Scala. It provides both Scala and Java Evaluator APIs for PMML.

Use PMML in Python

See the PyPMML project. PyPMML is a Python PMML scoring library. It is the Python API for PMML4S.

Use PMML in Spark

See the PMML4S-Spark project. PMML4S-Spark is a PMML scoring library for Spark, exposed as a SparkML Transformer.

Deploy PMML as REST API

See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX through both HTTP (REST API) and gRPC endpoints.

Support

If you have any questions about the PyPMML-Spark library, please open an issue on this repository.

Feedback and contributions to the project, no matter what kind, are always very welcome.

License

PyPMML-Spark is licensed under the Apache License 2.0 (APL 2.0).