Added initial rule system #1737

Draft: wants to merge 6 commits into base: dev

@assafdayan commented Mar 1, 2021

Created an initial pull request for the rule system. Issue #1483

Main PR contents:

  • Added a newsflash_features table (with Alembic)
  • The only feature at this stage is is_urban
  • Added the Features object to the newsflash's context (RequestParams)
  • Added a Feature Generator class (currently a stub implementation) with versioning
  • Added synchronous, one-time (per feature data version) feature generation flow
  • Modified the returned widget array to be sorted by widget relevance; each widget's relevance is returned in its rank field, in the range [0.0, 1.0]

Details

Basics

The rule system couples each newsflash object with a features object. The features object is generated once per newsflash (per version, more on that later). The features are stored in the DB, both to avoid constant re-generation and to allow later analysis and debugging. Each widget uses the generated features to grade its own relevance to the newsflash. The widget generation system then sorts the widgets by descending relevance, from highest to lowest, based on the ranking produced by the rule system.
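As a rough illustration of the ranking step described above, here is a minimal sketch. The `Features`, `Widget`, and `rank_widgets` names and the two example rules are hypothetical and are not taken from this PR's code; only `is_urban`, the `rank` field, and the [0.0, 1.0] range come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Features:
    # is_urban is the only feature named in this PR.
    is_urban: bool


@dataclass
class Widget:
    name: str
    # Hypothetical per-widget rule: maps features to a relevance score.
    relevance: Callable[[Features], float]
    rank: float = 0.0


def rank_widgets(widgets: List[Widget], features: Features) -> List[Widget]:
    """Score every widget against the features and sort by descending rank."""
    for widget in widgets:
        # Clamp so a misbehaving rule cannot leave the [0.0, 1.0] range.
        widget.rank = min(1.0, max(0.0, widget.relevance(features)))
    # sorted() is stable, so widgets with equal rank keep their input order.
    return sorted(widgets, key=lambda w: w.rank, reverse=True)


widgets = [
    Widget("urban_only", lambda f: 1.0 if f.is_urban else 0.0),
    Widget("always_some", lambda f: 0.5),
]
ordered = rank_widgets(widgets, Features(is_urban=False))
print([(w.name, w.rank) for w in ordered])
```

Clamping inside `rank_widgets` is one possible design choice; the actual system may instead trust each widget to stay within range.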

Features

The features are a set of enrichment values used by the widgets' ranking code. The features themselves are generated by a FeatureGenerator object. The FeatureGenerator receives a newsflash and outputs a NewsflashFeatures object, which is stored in the DB. Aside from the enrichment fields and the coupled newsflash ID, the NewsflashFeatures object carries some metadata: a creation timestamp and a version.
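The shape of these objects might look roughly like the sketch below. The field names (`newsflash_id`, `created_at`, `version`), the `FEATURES_VERSION` constant, the simplified `Newsflash` class, and the `generate` method are assumptions made for illustration; the PR only states that the generator is currently a stub and that `is_urban` is the single feature.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed name for the version constant kept in the app's code.
FEATURES_VERSION = 1


@dataclass
class Newsflash:
    # Simplified stand-in for the real newsflash model.
    id: int
    location: str = ""


@dataclass
class NewsflashFeatures:
    newsflash_id: int  # the coupled newsflash
    is_urban: bool     # enrichment field (the only feature at this stage)
    version: int       # metadata: feature code version
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )  # metadata: creation timestamp


class FeatureGenerator:
    version = FEATURES_VERSION

    def generate(self, newsflash: Newsflash) -> NewsflashFeatures:
        # Stub logic: always reports a non-urban newsflash.
        return NewsflashFeatures(
            newsflash_id=newsflash.id,
            is_urban=False,
            version=self.version,
        )


features = FeatureGenerator().generate(Newsflash(id=7))
print(features)
```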

Features Version

The version is a numeric constant that denotes the current version of the feature code. It is a monotonically increasing integer. For example, it may start at 1 and be incremented to 2 after a fix to a feature's generation logic. The version MUST be incremented whenever a change is made to the feature generation code or schema.
When retrieving the stored features for a newsflash, only the latest row whose version matches the constant in the app's code is fetched, ensuring compatibility.
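The version-matched lookup can be sketched with an in-memory SQLite table. The `newsflash_features` table name comes from this PR, but the column names, the sample rows, and the `fetch_current_features` helper are assumptions, and the real project may run an equivalent query through its ORM instead of raw SQL.

```python
import sqlite3

# Assumed name for the version constant; set to 2 here to show filtering.
FEATURES_VERSION = 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE newsflash_features ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " newsflash_id INTEGER NOT NULL,"
    " version INTEGER NOT NULL,"
    " is_urban INTEGER NOT NULL,"
    " created_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO newsflash_features"
    " (newsflash_id, version, is_urban, created_at) VALUES (?, ?, ?, ?)",
    [
        (7, 1, 0, "2021-03-01T10:00:00"),  # older version, ignored
        (7, 2, 1, "2021-03-02T10:00:00"),  # current version, older row
        (7, 2, 0, "2021-03-03T10:00:00"),  # current version, latest row
    ],
)


def fetch_current_features(conn, newsflash_id, version=FEATURES_VERSION):
    """Return the latest row for this newsflash at the given version, or None."""
    return conn.execute(
        "SELECT newsflash_id, version, is_urban, created_at"
        " FROM newsflash_features"
        " WHERE newsflash_id = ? AND version = ?"
        " ORDER BY created_at DESC, id DESC LIMIT 1",
        (newsflash_id, version),
    ).fetchone()


current = fetch_current_features(conn, 7)
print(current)
```

Rows written by older versions stay in the table; they are simply filtered out by the `version = ?` condition.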

Generation flow

When a Newsflash object is retrieved from the DB, the app checks whether a NewsflashFeatures row exists for it with a version matching the current value in the app's code. If such a features object is found, it is fetched. Otherwise, it is generated and persisted. This flow ensures that whenever the features generation version changes, features objects from older versions are no longer used and new ones are generated on demand, once per newsflash. The old rows are kept rather than deleted, which allows us to perform comparative analysis between features generated by different versions.
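The fetch-or-generate flow above could be sketched as follows. The in-memory store, the dictionary rows, and all class and function names here are hypothetical stand-ins for the real DB layer and generator.

```python
from typing import Dict, List, Optional


class InMemoryFeatureStore:
    """Hypothetical stand-in for the newsflash_features table."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def latest(self, newsflash_id: int, version: int) -> Optional[Dict]:
        matches = [
            row for row in self.rows
            if row["newsflash_id"] == newsflash_id and row["version"] == version
        ]
        # Rows are appended in creation order, so the last match is the latest.
        return matches[-1] if matches else None

    def add(self, row: Dict) -> None:
        self.rows.append(row)


class CountingGenerator:
    """Stub generator that counts how often generation actually runs."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, newsflash_id: int, version: int) -> Dict:
        self.calls += 1
        return {"newsflash_id": newsflash_id, "version": version, "is_urban": False}


def get_or_generate(store, generator, newsflash_id: int, version: int) -> Dict:
    row = store.latest(newsflash_id, version)
    if row is None:
        # No row for the current version: generate once and persist it.
        row = generator.generate(newsflash_id, version)
        store.add(row)
    return row


store = InMemoryFeatureStore()
generator = CountingGenerator()
get_or_generate(store, generator, 7, version=1)  # generated and stored
get_or_generate(store, generator, 7, version=1)  # served from the store
get_or_generate(store, generator, 7, version=2)  # version bump: regenerated
print(generator.calls, len(store.rows))
```

After the version bump, the version 1 row is still in the store, which is what makes comparison between versions possible.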

TBD in Next PRs

@assafdayan requested a review from atalyaalon March 1, 2021 20:51
@atalyaalon marked this pull request as draft March 31, 2022 19:50