Convert Free Form Text to Enum #26

Open
akshayphilar opened this issue Nov 14, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@akshayphilar
Member

akshayphilar commented Nov 14, 2019

The availability field of the unified schema can have the following values:

  • Discontinued
  • InStock
  • InStoreOnly
  • LimitedAvailability
  • OnlineOnly
  • OutOfStock
  • PreOrder
  • PreSale
  • SoldOut
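For reference, the allowed values can be sketched as a Python `Enum`. The class name `Availability` is hypothetical, and the sketch assumes each value is serialized as the plain string listed above.

```python
from enum import Enum


class Availability(Enum):
    """Hypothetical enum mirroring the availability values listed above."""

    DISCONTINUED = "Discontinued"
    IN_STOCK = "InStock"
    IN_STORE_ONLY = "InStoreOnly"
    LIMITED_AVAILABILITY = "LimitedAvailability"
    ONLINE_ONLY = "OnlineOnly"
    OUT_OF_STOCK = "OutOfStock"
    PRE_ORDER = "PreOrder"
    PRE_SALE = "PreSale"
    SOLD_OUT = "SoldOut"


# Look up a member by its schema string; an unknown string raises ValueError.
print(Availability("LimitedAvailability").name)  # LIMITED_AVAILABILITY
```

Using the schema strings as enum values means that a label outside this set fails fast instead of silently passing through.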

As an example, the text "Limited quantity available" in the image below maps to LimitedAvailability.

[image not reproduced]

Currently, such a transformation would be very unwieldy to express with shublang expressions and would have to be managed outside of shublang.

akshayphilar changed the title from "Enum Mapping using NLP" to "Convert Free Form Text to Enum Value" on Dec 1, 2019
akshayphilar changed the title from "Convert Free Form Text to Enum Value" to "Convert Free Form Text to Enum" on Dec 1, 2019
akshayphilar added the "enhancement" (New feature or request) label on Dec 2, 2019
@renancunha
Contributor

renancunha commented Aug 25, 2020

Given the tests we're running this month on different fronts, this feature will certainly be used in several ways to simplify mapping an input to an output according to some set of rules.

One possible approach is to create a new function, map_value, that receives a parameter containing the mapping setup. Since there are cases where multiple inputs map to the same output (for example, hypothetically, both "Limited quantity available" and "Last units available" should map to LimitedAvailability), I think this mapping setup could be a dictionary where each key is the desired output and each value is the possible input or list of inputs.

To illustrate this idea, suppose we need to perform these conversions:

Input -> Output
Product available -> InStock
Limited quantity available -> LimitedAvailability
Last units available -> LimitedAvailability
Out of stock -> OutOfStock

We would then call the new map_value function like this to convert input to output:

sanitize | map_value({"InStock": "Product available", "LimitedAvailability": ["Limited quantity available", "Last units available"], "OutOfStock": "Out of stock"})
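A minimal Python sketch of the proposed semantics follows. It assumes exact string matching on already-sanitized input and returns `None` when nothing matches; both are assumptions, since the proposal does not say what happens on a miss. `map_value` here is a plain Python function for illustration, not an existing shublang operator.

```python
from typing import Dict, List, Optional, Union

# Output label -> one accepted input text, or a list of accepted input texts.
OutputKeyedMapping = Dict[str, Union[str, List[str]]]


def map_value(value: str, mapping: OutputKeyedMapping) -> Optional[str]:
    """Return the first output whose accepted inputs contain `value`.

    Hypothetical sketch: keys are checked in dict insertion order, so if
    the same input text is listed under two outputs, the earlier key wins.
    """
    for output, inputs in mapping.items():
        # Treat a bare string as shorthand for a single-element list.
        candidates = [inputs] if isinstance(inputs, str) else inputs
        if value in candidates:
            return output
    return None  # Assumed behaviour when no input matches.


availability = {
    "InStock": "Product available",
    "LimitedAvailability": ["Limited quantity available", "Last units available"],
    "OutOfStock": "Out of stock",
}

for text in ["Product available", "Last units available", "Coming soon"]:
    print(text, "->", map_value(text, availability))
```

Wrapping a bare string in a list matters: without it, `value in "Out of stock"` would perform a substring test and, for example, match the input `"stock"`.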

PS 1: I proposed using the input as the dictionary value because that way multiple inputs can point to the same output (key), but the other way around would also work.
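The two layouts carry the same information. The sketch below uses a hypothetical helper, `invert_mapping`, to convert the output-keyed form into an input-keyed one. The conversion turns each lookup into a single dict access and makes it easy to flag an input that was accidentally listed under two different outputs.

```python
from typing import Dict, List, Union


def invert_mapping(mapping: Dict[str, Union[str, List[str]]]) -> Dict[str, str]:
    """Turn {output: input(s)} into {input: output}, rejecting conflicts."""
    inverted: Dict[str, str] = {}
    for output, inputs in mapping.items():
        # A bare string stands for a single accepted input.
        candidates = [inputs] if isinstance(inputs, str) else inputs
        for text in candidates:
            previous = inverted.get(text)
            if previous is not None and previous != output:
                raise ValueError(f"{text!r} maps to both {previous!r} and {output!r}")
            inverted[text] = output
    return inverted


lookup = invert_mapping({
    "InStock": "Product available",
    "LimitedAvailability": ["Limited quantity available", "Last units available"],
    "OutOfStock": "Out of stock",
})
print(lookup["Last units available"])  # LimitedAvailability
print(lookup.get("Coming soon"))        # None
```

Raising on a conflict is a design choice made for this sketch. A first-match rule would instead silently keep the earlier output.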

PS 2: The name map_value is just an idea. If something else makes more sense, we can rename it.

In this case, we would always return the first match as the output. This is the first idea that comes to mind for implementing this function concisely while keeping it simple and readable.

Let me know what you think about it and if there is something that I miss here. 😃

cc @akshayphilar @BurnzZ

@BurnzZ
Contributor

BurnzZ commented Aug 27, 2020

Hi @renancunha

> In this case, we will be returning always the first match as output. This is the first idea that comes to my mind in terms of doing this function in a concise way, keeping it simple/readable.

This would be for the list value in the key-value pair, right? So in your example of "LimitedAvailability": ["Limited quantity available", "Last units available"], "Limited quantity available" would have a higher priority than "Last units available"?

The map_value idea sounds good to me. However, I'm wondering how we could support longer mappings. 🤔 So that something like:

... | map_value({"InStock": "Product available", "LimitedAvailability": ["Limited quantity available", "Last units available"], "OutOfStock": "Out of stock"}) | ...

would become:

... | map_value(some_already_stored_mapping) | ...

Shublang, being a DSL, is stateless: we cannot define something up front and refer to it later. The only references it uses are the inputs and outputs passed through pipes, such as:

input | operator | output
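One way to picture this stateless model is to treat each operator as a pure function of its input and the pipe as left-to-right function composition. This is an illustrative Python analogy, not shublang's actual implementation. `pipe` and `sanitize` below are simplified, hypothetical stand-ins.

```python
from functools import reduce
from typing import Callable, Iterable

Operator = Callable[[str], str]


def pipe(value: str, operators: Iterable[Operator]) -> str:
    """Apply operators left to right, like `value | op1 | op2`."""
    return reduce(lambda acc, op: op(acc), operators, value)


def sanitize(text: str) -> str:
    # Simplified stand-in: collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(repr(pipe("  Out   of stock \n", [sanitize, str.lower])))  # 'out of stock'
```

Because each step reads nothing except its input, the same input always yields the same output. A stored mapping referenced by name would become a second, hidden input, which is the statelessness concern raised here.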

I'm wondering whether this breaks its statelessness. However, it might just be the necessary evil we need to handle the large mappings required to cover everything the availability field needs.

What do you guys think?

@renancunha
Contributor

Hi @BurnzZ

> This would be for the list value in the key-value pair, right? So in your example of "LimitedAvailability": ["Limited quantity available", "Last units available"], then "Limited quantity available" would have a higher priority than "Last units available"?

In that case, ["Limited quantity available", "Last units available"] is the list of free-form texts that will be converted to LimitedAvailability. I'm defining it as a list to support mapping multiple inputs to the same output, as in the LimitedAvailability case above. For cases where only one free-form text should be mapped to a given output, you would set it up as a plain string, like "InStock": "The product is available on our store".

About long mappings, you're right; I had the same concern. With this approach we will sometimes end up with huge mappings. The idea is that you define the entire mapping once and call map_value just once (without storing the mapping). This preserves Shublang's stateless nature but results in a long mapping dict passed as a parameter to the function. As you said, though, it may be worth storing the mappings (the necessary evil 😄) as they are defined, so they can be reused later.

Projects: None yet
Development: No branches or pull requests

3 participants