Create Higher Level Abstractions to Extract Simple Field Types #35

Open
akshayphilar opened this issue Nov 30, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

The unified schema has several text fields, some of which are listed below.

  1. Weight
  2. Unit
  3. Volume
  4. Rating
  5. Date
  6. URL
  7. Phone Number
  8. Email Address
  9. Rank

On a given web page, these values are often embedded in surrounding text, so extracting them currently requires either an re_search function or a lengthier pipeline expression. For example:

+ $36.56 Shipping & Import Fees Deposit to India

Higher-level extraction functions would therefore make pipeline expressions both terser and more robust, since the intricate logic would be abstracted away inside well-tested shublang functions.
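
To make the idea concrete, here is a rough sketch of the kind of logic such an abstraction could hide, applied to the shipping-fee text above. `extract_price` and its regex are hypothetical and written only for illustration; they are not existing shublang functions, and a real implementation would probably delegate to a dedicated parsing library.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical helper (not part of shublang): a currency symbol followed by
# an amount with optional thousands separators and an optional fraction.
_PRICE_RE = re.compile(
    r"(?P<currency>[$€£])\s*(?P<amount>\d+(?:,\d{3})*(?:\.\d+)?)"
)


def extract_price(text: str) -> Optional[Decimal]:
    """Return the first price-like amount found in text, or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to Decimal.
    return Decimal(match.group("amount").replace(",", ""))


print(extract_price("+ $36.56 Shipping & Import Fees Deposit to India"))
```

The pipeline author would then only need to name the field type, instead of writing and maintaining the regex inline.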

An interesting side effect is that this would give us the building blocks to construct an AST for implementing #29.

WDYT @VMRuiz @ivankivanov @mabelvj @BurnzZ

@akshayphilar akshayphilar added the enhancement New feature or request label Nov 30, 2019
BurnzZ commented Nov 30, 2019

@akshayphilar, if I understand this correctly, are we to create abstractions for extracting these fields, similar to how price and date are handled in the following?

akshayphilar commented Nov 30, 2019

Some preprocessing may be required, but yes, essentially we will be wrapping shublang functions around such libs whenever they are available.
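
As a rough illustration of the "preprocess, then wrap a lib" pattern, the sketch below handles the URL field using the standard library's `urllib.parse`. The name `extract_url` and the punctuation-stripping rule are assumptions made for this example; they are not part of shublang.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical helper (not part of shublang): find a candidate http(s) URL.
_URL_CANDIDATE_RE = re.compile(r"https?://\S+")


def extract_url(text: str) -> Optional[str]:
    """Return the first http(s) URL found in text, or None."""
    match = _URL_CANDIDATE_RE.search(text)
    if match is None:
        return None
    # Preprocessing: drop punctuation that trails the URL in running text.
    candidate = match.group(0).rstrip(".,;:!?)\"'")
    # Delegate the structural check to the wrapped library.
    parsed = urlparse(candidate)
    return candidate if parsed.scheme and parsed.netloc else None


print(extract_url("Visit https://example.com/page, or call us."))
```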
