agyryk/train-parser


  1. Install Couchbase (CB) and create an empty bucket.

  2. Load the "salaries" dataset by running 'generate_salaries_dataset.py'.

  3. Run the parser by running primitive_v3/run.py.

    APIs:

    http://localhost:5000/api/ping - returns pong

    http://localhost:5000/api/run - POST request to start indexing (the body must contain the CB server settings and the tag rules)

    http://localhost:5000/api/progress - general status of the parser
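The endpoints above can also be called from Python. The following is a minimal sketch using only the standard library. It assumes that /api/ping and /api/progress answer plain GET requests (the README does not state the method), and the helper names `build_request` and `call` are hypothetical, not part of the parser.

```python
import json
import urllib.request

# Host and port as given in the API list above.
BASE_URL = "http://localhost:5000"


def build_request(path, payload=None):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = f"{BASE_URL}{path}"
    if payload is None:
        # Assumption: ping and progress are read with GET.
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path, payload=None, timeout=5):
    """Send the request and return (HTTP status, response body as text)."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the parser running, `call("/api/ping")` and `call("/api/progress")` return the status code and the raw response body; the exact body format is not documented here, so the sketch does not try to parse it.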

  4. Start the indexer with a POST request to api/run. Example request:

    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/run -d '{"server": {"name":"localhost", "bucket":"train", "port":"8091", "password":"password"},"rules":[{"name": "city", "source":"city", "value":"", "filter":""},{"name":"position", "source": "position", "value":"any java positions", "filter":"include(java)"},{"name":"position", "source": "position", "value":"any developer position", "filter":"include(developer)"},{"name":"position", "source": "position", "value":"only java developers", "filter":"include(java, developer)"},{"name":"position", "source": "position", "value":"all qa", "filter":"include(qa)"}, {"name":"salary", "source": "salary", "value":"below 150K", "filter":"range(<150)"},{"name":"salary", "source": "salary", "value":"above 150K", "filter":"range(>=150)"}]}'

    JSON structure:

    {
      "server": {"name":"localhost", "bucket":"train", "port":"8091", "password":"password"},
      "rules": [
        {"name":"city", "source":"city", "value":"", "filter":""},
        {"name":"position", "source":"position", "value":"any java positions", "filter":"include(java)"},
        {"name":"position", "source":"position", "value":"any developer position", "filter":"include(developer)"},
        {"name":"position", "source":"position", "value":"only java developers", "filter":"include(java, developer)"},
        {"name":"position", "source":"position", "value":"all qa", "filter":"include(qa)"},
        {"name":"salary", "source":"salary", "value":"below 150K", "filter":"range(<150)"},
        {"name":"salary", "source":"salary", "value":"above 150K", "filter":"range(>150)"},
        {"name":"experience", "source":"experience", "value":"", "filter":""}
      ]
    }
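The same request body can be assembled in Python instead of being written by hand. This is a small sketch that only builds the dictionary; the helper names `rule` and `build_run_payload` are hypothetical, while the keys and example values are copied from the JSON structure above.

```python
import json


def rule(name, source, value="", filter_expr=""):
    """Build one tag rule; an empty filter_expr means the rule has no filter."""
    return {"name": name, "source": source, "value": value, "filter": filter_expr}


def build_run_payload(server, rules):
    """Combine the CB server settings and the tag rules into one request body."""
    return {"server": server, "rules": rules}


server = {"name": "localhost", "bucket": "train", "port": "8091", "password": "password"}
rules = [
    rule("city", "city"),
    rule("position", "position", "any java positions", "include(java)"),
    rule("salary", "salary", "below 150K", "range(<150)"),
]
payload = build_run_payload(server, rules)

# Serialize exactly as it would be sent in the POST body.
print(json.dumps(payload, indent=2))
```

Building the rules through a helper keeps all four keys present in every rule, which is the shape the examples in this README use.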

    The POC supports three types of filters: all values (empty filter, as in the "city" rule), include(...) (as in include(java)) and range(...) (as in range(<150)).
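To make the three filter types concrete, here is one way such filters could be evaluated. This is an illustrative sketch, not the parser's actual code. It assumes that an empty filter matches every value, that include(a, b) requires all listed terms to appear (inferred from the "only java developers" rule) and is case-insensitive, and that range(...) accepts the operators <, <=, > and >=. The names `parse_filter` and `matches` are hypothetical.

```python
import operator
import re

# Assumption: these four comparison operators are allowed inside range(...).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def parse_filter(expr):
    """Return (kind, args) for '', 'include(a, b)' or 'range(<150)'."""
    expr = expr.strip()
    if not expr:
        return ("all", None)
    match = re.fullmatch(r"(include|range)\((.*)\)", expr)
    if match is None:
        raise ValueError(f"unsupported filter: {expr!r}")
    kind, body = match.group(1), match.group(2)
    if kind == "include":
        terms = [t.strip().lower() for t in body.split(",") if t.strip()]
        return ("include", terms)
    bound = re.fullmatch(r"\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*", body)
    if bound is None:
        raise ValueError(f"unsupported range: {body!r}")
    return ("range", (bound.group(1), float(bound.group(2))))


def matches(expr, value):
    """Check whether a document value passes the filter expression."""
    kind, args = parse_filter(expr)
    if kind == "all":
        return True
    if kind == "include":
        text = str(value).lower()
        # Assumption: every listed term must occur in the value.
        return all(term in text for term in args)
    op, limit = args
    return OPS[op](float(value), limit)
```

Under these assumptions, `matches("include(java, developer)", "Senior Java Developer")` is true while `matches("include(java, developer)", "Java Architect")` is false, and a salary of exactly 150 passes range(>=150) but not range(>150).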

  5. Indexer APIs:

    /api/index_stats - returns the size of the index

    /api/query - POST request to run a query

  6. Queries supported:

    get_all_tags
    get_values_by_tag
    get_docids_by_tag_and_value
    get_tags_by_docid
    get_values_by_docid_and_tag

    Examples:

    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/query -d '{"type":"all_tags"}'
    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/query -d '{"type":"values_by_tag", "tag":"city"}'
    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/query -d '{"type":"ids_by_tag_and_value", "tag":"city", "value":"Chicago"}'
    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/query -d '{"type":"tags_by_id", "id":"13"}'
    curl -i -H "Content-Type: application/json" -XPOST http://localhost:5000/api/query -d '{"type":"value_by_id_and_tag", "tag":"city", "id":"23"}'
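The five query types can be wrapped in small Python helpers so the payloads are not typed by hand. This is a sketch under assumptions: the "type" strings and keys are copied from the curl examples above, ids are sent as strings as in those examples, and the function names and `query_request` helper are hypothetical.

```python
import json
import urllib.request

# Endpoint as given in the curl examples.
QUERY_URL = "http://localhost:5000/api/query"


def all_tags():
    return {"type": "all_tags"}


def values_by_tag(tag):
    return {"type": "values_by_tag", "tag": tag}


def ids_by_tag_and_value(tag, value):
    return {"type": "ids_by_tag_and_value", "tag": tag, "value": value}


def tags_by_id(doc_id):
    # The curl examples pass the id as a string, so convert here.
    return {"type": "tags_by_id", "id": str(doc_id)}


def value_by_id_and_tag(doc_id, tag):
    return {"type": "value_by_id_and_tag", "tag": tag, "id": str(doc_id)}


def query_request(payload):
    """Build the JSON POST request for /api/query without sending it."""
    return urllib.request.Request(
        QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually run a query, pass the result of `query_request(...)` to `urllib.request.urlopen` while the parser is running. The response format is not described in this README, so the sketch leaves the response unparsed.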
