InnerHit (#577)
Support InnerHit in Nrtsearch
waziqi89 authored Jun 8, 2023
1 parent 2d2ebbb commit 1b04273
Showing 20 changed files with 3,415 additions and 1,164 deletions.
2 changes: 1 addition & 1 deletion build.gradle
@@ -24,7 +24,7 @@ sourceCompatibility = 1.14
targetCompatibility = 1.14

allprojects {
- version = '0.24.1'
+ version = '0.25.0'
group = 'com.yelp.nrtsearch'
}

23 changes: 23 additions & 0 deletions clientlib/src/main/proto/yelp/nrtsearch/search.proto
@@ -432,6 +432,26 @@ message SearchRequest {
Highlight highlight = 24;
// If Lucene explanation should be included in the response
bool explain = 25;
// Search nested object fields for each hit
map<string, InnerHit> inner_hits = 26;
}

/* Inner Hit search request */
message InnerHit {
// Nested path to search against assuming same index as the parent Query.
string query_nested_path = 1;
// Which hit to start from (for pagination); default: 0
int32 start_hit = 2;
// How many top hits to retrieve; default: 3. It limits the hits returned, starting from index 0. For pagination: set it to startHit + window_size.
int32 top_hits = 3;
// InnerHit query to query against the nested documents specified by queryNestedPath.
Query inner_query = 4;
// Fields to retrieve; Parent's fields except its id field are unavailable in the innerHit.
repeated string retrieve_fields = 5;
// Sort hits by field (default is by relevance).
QuerySortField query_sort = 6;
// Highlight the children documents.
Highlight highlight = 7;
}

/* Virtual field used during search */
@@ -542,6 +562,7 @@ message SearchResponse {
map<string, double> facetTimeMs = 9;
double rescoreTimeMs = 10;
map<string, double> rescorersTimeMs = 11;
map<string, Diagnostics> innerHitsDiagnostics = 12;
}

message Hit {
@@ -582,6 +603,8 @@ message SearchResponse {
map<string, Highlights> highlights = 5;
// Lucene explanation of the hit
string explain = 6;
// InnerHits for each hit
map<string, HitsResult> innerHits = 7;
}

message SearchState {
1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,6 +12,7 @@ A high performance gRPC server, with optional REST APIs on top of `Apache Lucene
querying_nrtsearch
analysis
highlighting
inner_hit
additional_collectors
index_settings
index_live_settings
167 changes: 167 additions & 0 deletions docs/inner_hit.rst
@@ -0,0 +1,167 @@
InnerHit
==========================

Nested objects are stored as separate documents from their parent documents. A NestedQuery filters parent documents to those that have at least one nested (child) document matching the inner filter. However, the search response returns only the parent documents, so it carries no information about which children matched. To retrieve the matched child documents for each parent hit, use an innerHit. An innerHit can be thought of as a second-layer search run for each parent hit; an innerHit with an empty query matches all children of that hit.

Requirements
------------

To use an innerHit, a parent SearchRequest must be present. In addition, the index being searched must have the child object field registered as a nested field.

.. code-block:: json

  {
    "name": "field_name",
    "nestedDoc": true,
    "multiValued": true,
    "type": "OBJECT",
    "childFields": [
      ...
    ]
  }

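
The requirement above can be checked locally before a field definition is registered. The sketch below is a hypothetical helper, not part of Nrtsearch; it only tests the two properties shown in the example (``type`` is ``OBJECT`` and ``nestedDoc`` is true). The ``menu`` field and its child field types are illustrative assumptions.

```python
def is_nested_object_field(field_def: dict) -> bool:
    """Return True if a field definition looks like a nested object field.

    Hypothetical local sanity check: it mirrors the example definition
    (type OBJECT with nestedDoc set to true) and is not an Nrtsearch API.
    """
    return field_def.get("type") == "OBJECT" and field_def.get("nestedDoc") is True


# Illustrative field definition; the child field types are assumptions.
menu_field = {
    "name": "menu",
    "nestedDoc": True,
    "multiValued": True,
    "type": "OBJECT",
    "childFields": [
        {"name": "food_name", "type": "TEXT"},
        {"name": "price", "type": "INT"},
    ],
}

# An OBJECT field without nestedDoc would not qualify for innerHit.
flat_field = {"name": "address", "type": "OBJECT"}
```

Running the check on both definitions separates the nested field from the plain object field.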
Query Syntax
------------

This is the proto definition of the InnerHit message, which can be specified in the ``inner_hits`` map of a SearchRequest:

.. code-block:: protobuf

  /* Inner Hit search request */
  message InnerHit {
    // Nested path to search against assuming same index as the parent Query.
    string query_nested_path = 1;
    // Which hit to start from (for pagination); default: 0
    int32 start_hit = 2;
    // How many top hits to retrieve; default: 3. It limits the hits returned, starting from index 0. For pagination: set it to startHit + window_size.
    int32 top_hits = 3;
    // InnerHit query to query against the nested documents specified by queryNestedPath.
    Query inner_query = 4;
    // Fields to retrieve; Parent's fields except its id field are unavailable in the innerHit.
    repeated string retrieve_fields = 5;
    // Sort hits by field (default is by relevance).
    QuerySortField query_sort = 6;
    // Highlight the children documents.
    Highlight highlight = 7;
  }

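
The pagination rule in the comments (``top_hits`` counts from index 0, so a page needs ``start_hit + window_size``) is easy to get wrong by one window. A minimal sketch of the arithmetic, assuming zero-based page numbers; ``inner_hit_window`` is a hypothetical helper name, not an Nrtsearch function.

```python
def inner_hit_window(page: int, window_size: int) -> dict:
    """Compute start_hit and top_hits for a zero-based page of inner hits.

    top_hits is an absolute upper bound counted from index 0, so it must be
    start_hit + window_size rather than window_size alone.
    """
    if page < 0 or window_size <= 0:
        raise ValueError("page must be >= 0 and window_size must be > 0")
    start_hit = page * window_size
    return {"start_hit": start_hit, "top_hits": start_hit + window_size}


first_page = inner_hit_window(page=0, window_size=3)
third_page = inner_hit_window(page=2, window_size=5)
```

Here ``first_page`` matches the documented defaults (start at 0, top 3), and ``third_page`` asks for hits 10 through 14.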
Example Queries
---------------

Assume the following documents, shown here as YAML, are stored in `index_alpha`:

.. code-block:: yaml

  # parent document 1
  - business_name: restaurant_A
    business_address: 10 A street
    menu:
      - food_name: chicken
        price: 5
      - food_name: burger
        price: 8
  # parent document 2
  - business_name: restaurant_B
    business_address: 6 B avenue
    menu:
      - food_name: coke
        price: 4
      - food_name: cheeseburger
        price: 10

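
To make the "second-layer search per parent hit" idea concrete, the sketch below models it in plain Python over the two documents above. It is an illustration of the semantics only, under simplifying assumptions: filters are ordinary predicates, there is no scoring or sorting, and ``search_with_inner_hits`` is a hypothetical name that does not exist in Nrtsearch.

```python
from typing import Callable, List, Optional

DOCS = [
    {
        "business_name": "restaurant_A",
        "business_address": "10 A street",
        "menu": [
            {"food_name": "chicken", "price": 5},
            {"food_name": "burger", "price": 8},
        ],
    },
    {
        "business_name": "restaurant_B",
        "business_address": "6 B avenue",
        "menu": [
            {"food_name": "coke", "price": 4},
            {"food_name": "cheeseburger", "price": 10},
        ],
    },
]

Predicate = Callable[[dict], bool]


def search_with_inner_hits(
    docs: List[dict],
    nested_path: str,
    parent_filter: Optional[Predicate] = None,
    inner_filter: Optional[Predicate] = None,
    start_hit: int = 0,
    top_hits: int = 3,
) -> List[dict]:
    """Model of innerHit: filter parents, then search children per parent."""
    results = []
    for parent in docs:
        # First layer: the parent query decides which parents are hits.
        if parent_filter is not None and not parent_filter(parent):
            continue
        # Second layer: an absent inner filter matches every child.
        children = [
            child
            for child in parent.get(nested_path, [])
            if inner_filter is None or inner_filter(child)
        ]
        results.append(
            {
                "business_name": parent["business_name"],
                # top_hits is an absolute bound counted from index 0.
                "inner_hits": children[start_hit:top_hits],
            }
        )
    return results


all_children = search_with_inner_hits(DOCS, "menu")
cheap_at_a = search_with_inner_hits(
    DOCS,
    "menu",
    parent_filter=lambda p: p["business_name"] == "restaurant_A",
    inner_filter=lambda c: c["price"] < 6,
)
```

In this model ``all_children`` keeps both parents with two menu items each, while ``cheap_at_a`` keeps only restaurant_A with its chicken entry. The cases below express the same combinations as search requests.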
Case 1
^^^^^^
Get all parents: retrieve every business. No innerHit is involved.

.. code-block:: json

  {
    "indexName": "index_alpha",
    "retrieveFields": ["business_name"]
  }

Case 2
^^^^^^
Get all children: retrieve every menu item for each business. Each entry in ``inner_hits`` is keyed by a caller-chosen name (``menu_hits`` here).

.. code-block:: json

  {
    "indexName": "index_alpha",
    "retrieveFields": ["business_name"],
    "inner_hits": {
      "menu_hits": {
        "query_nested_path": "menu",
        "retrieve_fields": ["menu.food_name"]
      }
    }
  }

Case 3
^^^^^^
Get all children with parent filtering: retrieve every menu item for restaurant_A.

.. code-block:: json

  {
    "indexName": "index_alpha",
    "query": {
      "termQuery": {
        "field": "business_name",
        "textValue": "restaurant_A"
      }
    },
    "retrieveFields": ["business_name"],
    "inner_hits": {
      "menu_hits": {
        "query_nested_path": "menu",
        "retrieve_fields": ["menu.food_name"]
      }
    }
  }

Case 4
^^^^^^
Get children with child filtering: retrieve the menu items whose price is lower than 6, for each business.

.. code-block:: json

  {
    "indexName": "index_alpha",
    "retrieveFields": ["business_name"],
    "inner_hits": {
      "menu_hits": {
        "query_nested_path": "menu",
        "inner_query": {
          "rangeQuery": {
            "field": "menu.price",
            "upper": "6"
          }
        },
        "retrieve_fields": ["menu.food_name"]
      }
    }
  }

Case 5
^^^^^^
Get children with both parent and child filtering: retrieve the menu items whose price is lower than 6 within restaurant_A.

.. code-block:: json

  {
    "indexName": "index_alpha",
    "query": {
      "termQuery": {
        "field": "business_name",
        "textValue": "restaurant_A"
      }
    },
    "retrieveFields": ["business_name"],
    "inner_hits": {
      "menu_hits": {
        "query_nested_path": "menu",
        "inner_query": {
          "rangeQuery": {
            "field": "menu.price",
            "upper": "6"
          }
        },
        "retrieve_fields": ["menu.food_name"]
      }
    }
  }

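
The five cases differ only in whether a parent query and a child query are present, so the request body can be assembled by one small function. The sketch below follows the proto definition, where ``inner_hits`` is a map keyed by a caller-chosen name and the child query field is ``inner_query``. ``build_search_request`` and the key ``menu_hits`` are hypothetical names, and whether a given deployment's JSON endpoint expects exactly these field casings is an assumption.

```python
import json
from typing import List, Optional


def build_search_request(
    index_name: str,
    parent_fields: List[str],
    inner_hit_name: str,
    nested_path: str,
    child_fields: List[str],
    parent_query: Optional[dict] = None,
    inner_query: Optional[dict] = None,
) -> dict:
    """Assemble a search request body with a single inner hit (sketch)."""
    inner_hit = {
        "query_nested_path": nested_path,
        "retrieve_fields": child_fields,
    }
    # Omitting inner_query leaves the child search unfiltered.
    if inner_query is not None:
        inner_hit["inner_query"] = inner_query

    request = {
        "indexName": index_name,
        "retrieveFields": parent_fields,
        "inner_hits": {inner_hit_name: inner_hit},
    }
    # Omitting the parent query leaves the parent search unfiltered.
    if parent_query is not None:
        request["query"] = parent_query
    return request


# Case 5: filter parents by name and children by price.
case_5 = build_search_request(
    index_name="index_alpha",
    parent_fields=["business_name"],
    inner_hit_name="menu_hits",
    nested_path="menu",
    child_fields=["menu.food_name"],
    parent_query={"termQuery": {"field": "business_name", "textValue": "restaurant_A"}},
    inner_query={"rangeQuery": {"field": "menu.price", "upper": "6"}},
)
case_5_json = json.dumps(case_5, indent=2)
```

Dropping ``parent_query`` yields Case 4, dropping ``inner_query`` yields Case 3, and dropping both yields Case 2.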
59 changes: 59 additions & 0 deletions grpc-gateway/luceneserver.swagger.json
@@ -2032,6 +2032,12 @@
"type": "number",
"format": "double"
}
},
"innerHitsDiagnostics": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/SearchResponseDiagnostics"
}
}
}
},
@@ -2068,6 +2074,13 @@
"explain": {
"type": "string",
"title": "Lucene explanation of the hit"
},
"innerHits": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/luceneserverHitsResult"
},
"title": "InnerHits for each hit"
}
}
},
@@ -3688,6 +3701,45 @@
},
"title": "A suggester that matches terms anywhere in the input text, not just as a prefix. (see @lucene:org:server.InfixSuggester)"
},
"luceneserverInnerHit": {
"type": "object",
"properties": {
"query_nested_path": {
"type": "string",
"description": "Nested path to search against assuming same index as the parent Query."
},
"start_hit": {
"type": "integer",
"format": "int32",
"title": "Which hit to start from (for pagination); default: 0"
},
"top_hits": {
"type": "integer",
"format": "int32",
"description": "How many top hits to retrieve; default: 3. It limits the hits returned, starting from index 0. For pagination: set it to startHit + window_size."
},
"inner_query": {
"$ref": "#/definitions/luceneserverQuery",
"description": "InnerHit query to query against the nested documents specified by queryNestedPath."
},
"retrieve_fields": {
"type": "array",
"items": {
"type": "string"
},
"description": "Fields to retrieve; Parent's fields except its id field are unavailable in the innerHit."
},
"query_sort": {
"$ref": "#/definitions/luceneserverQuerySortField",
"description": "Sort hits by field (default is by relevance)."
},
"highlight": {
"$ref": "#/definitions/luceneserverHighlight",
"description": "Highlight the children documents."
}
},
"title": "Inner Hit search request"
},
"luceneserverIntObject": {
"type": "object",
"properties": {
@@ -4585,6 +4637,13 @@
"explain": {
"type": "boolean",
"title": "If Lucene explanation should be included in the response"
},
"inner_hits": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/luceneserverInnerHit"
},
"title": "Search nested object fields for each hit"
}
}
},