Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create initial warehouse star schema #1325

Open
2 tasks
blarghmatey opened this issue Oct 10, 2024 · 1 comment
Open
2 tasks

Create initial warehouse star schema #1325

blarghmatey opened this issue Oct 10, 2024 · 1 comment
Assignees
Labels
Data Analytics Data Engineering product:data-platform Issues related to the Data Platform product

Comments

@blarghmatey
Copy link
Member

blarghmatey commented Oct 10, 2024

User Story

  • As an analytics engineer I would like to have a well-factored star schema to improve the flexibility and re-use of data assets in the warehouse

Description/Context

For detailed context refer to https://github.com/mitodl/hq/pull/5749

This epic covers the creation of the initial set of fact and dimension tables needed to support building the current "mart" tables from those underlying assets.

Acceptance Criteria

  • All of the current "mart" tables can be created as views or materialized tables that are constructed from the fact and dimension tables
  • All fact and dimension tables are fully documented such that any data consumer will understand the meaning of each field and record in each table

Plan/Design

Create the following set of dimensional tables (subject to adjustment as we progress through implementation):

Dimensions

  • dim_content: Contains information about content objects including courses,
    course blocks, videos, site pages (e.g. in OCW), etc.
  • dim_platform: Information about the technical system that a given event is
    executed within the context of.
  • dim_user: Contains information about system users including
    demographic/profile attributes. This will need to be a
    slowly-changing-dimension model.
  • dim_purchaseable: Includes information about elements of content and
    services that can be sold for >=$0 (e.g. programs, course runs, bootcamps,
    subscriptions, etc.)

Facts

  • tfact_purchase: Monetary transactions (including $0 transactions, coupon
    redemption, financial aid, etc.)
  • afact_grade: Information about a grade computed for a user in a course-run
  • tfact_account: Contains events related to account actions (e.g. created,
    logged in, etc.)
  • tfact_submit: Events involving a learner submitting information to a system,
    such as answering a question
  • afact_credential: Information about the granting of a credential
    (e.g. course certificates, program certificates, etc.)
  • tfact_coupon: Events pertaining to ecommerce coupons including creation,
    consumption, deletion/deactivating
  • tfact_submit: Contains event details about information submitted by a user
    into a platform, such as answering a question, executing a search, posting a
    forum question, etc.
  • tfact_video: Contains event information about interactions with a video such
    as play, pause, seek, etc.

Draft ERD (https://erdlab.page.link/7agMHFfKTiL2yxD57)

Image

@KatelynGit
Copy link
Contributor

KatelynGit commented Dec 18, 2024

https://erdlab.page.link/xjxGkH2X17N8shaR9
Image

proposing a new model for engagement data to answer questions the xpro team has that aren't answerable in the current marts. Some questions they have are here: https://github.com/mitodl/hq/issues/6176.

Since a design couldn't be agreed on I'm not implementing this but I'll leave it here in case it's useful in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Data Analytics Data Engineering product:data-platform Issues related to the Data Platform product
Projects
None yet
Development

No branches or pull requests

3 participants