Implement .withBatchInputTable() #10

Open
EvanBoyle opened this issue Dec 28, 2019 · 0 comments

@EvanBoyle
Owner

https://github.com/EvanBoyle/pulumi-serverless-db/pull/8/files#diff-f95bdcb0f919d600e736e8e9da74022dR93

We should implement a batch counterpart to withStreamInputTable. The primary use case I have in mind is aggregating streaming events. Consider a data warehouse (DW) with two tables, impressions and clicks. The business would like to create a higher-level table that includes some aggregated hourly facts, such as click-through rate (CTR). To accomplish this, you need to run a job once an hour that scans an hour's worth of clicks and an hour's worth of impressions, and outputs an S3 file containing the summary:

adId: 1, impressions: 100, clicks: 5, CTR: 0.05
adId: 2, impressions: 10, clicks: 1, CTR: 0.1
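The per-ad aggregation behind the summary above could look roughly like the following. This is a minimal sketch assuming hypothetical `Impression`/`Click` event shapes (an `adId` plus a `timestamp`), since the real table schemas are not specified here; `summarizeHour` is likewise an illustrative name. In practice the counting would more likely be pushed down into the query against the tables.

```typescript
// Hypothetical event shapes; the actual table schemas are not defined in this issue.
interface Impression { adId: number; timestamp: Date; }
interface Click { adId: number; timestamp: Date; }

interface AdSummary { adId: number; impressions: number; clicks: number; ctr: number; }

// Aggregate the one-hour window [start, start + 1h) into one summary row per ad.
function summarizeHour(impressions: Impression[], clicks: Click[], start: Date): AdSummary[] {
  const startMs = start.getTime();
  const endMs = startMs + 60 * 60 * 1000;
  const inWindow = (t: Date) => t.getTime() >= startMs && t.getTime() < endMs;

  // Running counts keyed by adId.
  const counts = new Map<number, { impressions: number; clicks: number }>();
  const bucket = (adId: number) => {
    let c = counts.get(adId);
    if (!c) {
      c = { impressions: 0, clicks: 0 };
      counts.set(adId, c);
    }
    return c;
  };

  for (const i of impressions) if (inWindow(i.timestamp)) bucket(i.adId).impressions++;
  for (const c of clicks) if (inWindow(c.timestamp)) bucket(c.adId).clicks++;

  return Array.from(counts.entries())
    .sort(([a], [b]) => a - b) // stable output order by adId
    .map(([adId, c]) => ({
      adId,
      impressions: c.impressions,
      clicks: c.clicks,
      // Avoid dividing by zero when an ad has clicks but no impressions in the window.
      ctr: c.impressions === 0 ? 0 : c.clicks / c.impressions,
    }));
}
```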

We can provide an API, withBatchInputTable, that enables this scenario using ECS Fargate. The user would define a function to run in a container (the code that queries one or more tables, writes the results to the correct S3 location, and creates a partition if necessary), along with the interval on which the task should run.
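One possible shape for the arguments and the scheduling logic is sketched below. `BatchInputTableArgs`, `toRateExpression`, and `previousWindow` are hypothetical names, not an existing API in this repo. The only external convention assumed is the CloudWatch Events/EventBridge `rate(...)` schedule syntax, which can trigger either a scheduled Lambda (for a prototype) or a scheduled ECS task.

```typescript
// Hypothetical argument shape for withBatchInputTable; these names do not exist yet.
interface BatchInputTableArgs {
  // Tables the job reads from, e.g. ["impressions", "clicks"].
  inputTables: string[];
  // How often the job runs, in whole hours.
  intervalHours: number;
  // User code: query the inputs for [windowStart, windowEnd) and write the
  // results (and any new partition) to S3.
  run: (windowStart: Date, windowEnd: Date) => Promise<void>;
}

// Convert the interval into a CloudWatch Events / EventBridge rate expression.
function toRateExpression(intervalHours: number): string {
  if (!Number.isInteger(intervalHours) || intervalHours < 1) {
    throw new Error(`intervalHours must be a positive integer, got ${intervalHours}`);
  }
  // Rate expressions use the singular unit for a value of 1.
  return intervalHours === 1 ? "rate(1 hour)" : `rate(${intervalHours} hours)`;
}

// The most recent complete window before `now`, aligned to multiples of the
// interval since the Unix epoch (for 1 hour, the top of each UTC hour).
function previousWindow(now: Date, intervalHours: number): { start: Date; end: Date } {
  const intervalMs = intervalHours * 60 * 60 * 1000;
  const end = new Date(Math.floor(now.getTime() / intervalMs) * intervalMs);
  const start = new Date(end.getTime() - intervalMs);
  return { start, end };
}

// Example configuration for the hourly CTR job (the query and S3 write are omitted).
const hourlyCtrJob: BatchInputTableArgs = {
  inputTables: ["impressions", "clicks"],
  intervalHours: 1,
  run: async (start, end) => {
    console.log(`would aggregate ${start.toISOString()} to ${end.toISOString()}`);
  },
};
```

Passing explicit window bounds into `run`, instead of letting user code compute "now minus one hour", means a late or retried invocation still processes the same window.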

We can certainly start with a prototype using Lambda. I think that would be a good proof of concept for fleshing out the API surface area. Eventually we should probably use something like ECS Fargate, given Lambda's memory and disk limitations. ECS Fargate gives you up to 30 GB of RAM, which allows users to run much larger aggregations and queries.
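Fargate tasks cannot request arbitrary memory; each CPU size allows only certain memory values. The sketch below picks the smallest CPU size (then the smallest memory) that covers a requested amount. The table is an assumption based on the Fargate combinations published around the time of this issue (topping out at 4 vCPU / 30 GB); AWS has since changed these limits, so the values should be checked against current documentation. `pickFargateSize` is a hypothetical helper.

```typescript
// Inclusive numeric range with a fixed step.
function range(from: number, to: number, step: number): number[] {
  const out: number[] = [];
  for (let v = from; v <= to; v += step) out.push(v);
  return out;
}

// Assumed Fargate CPU/memory pairs (CPU in ECS units, memory in MiB), circa 2019.
const FARGATE_SIZES: { cpu: number; memoryOptions: number[] }[] = [
  { cpu: 256, memoryOptions: [512, 1024, 2048] },
  { cpu: 512, memoryOptions: range(1024, 4096, 1024) },
  { cpu: 1024, memoryOptions: range(2048, 8192, 1024) },
  { cpu: 2048, memoryOptions: range(4096, 16384, 1024) },
  { cpu: 4096, memoryOptions: range(8192, 30720, 1024) },
];

// Hypothetical helper: smallest CPU size, then smallest memory, with at least requiredMiB.
function pickFargateSize(requiredMiB: number): { cpu: string; memory: string } {
  for (const size of FARGATE_SIZES) {
    const memory = size.memoryOptions.find((m) => m >= requiredMiB);
    if (memory !== undefined) {
      // ECS task definitions take cpu and memory as strings.
      return { cpu: String(size.cpu), memory: String(memory) };
    }
  }
  throw new Error(`No Fargate size offers ${requiredMiB} MiB (assumed max 30720 MiB)`);
}
```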

Some examples of using Fargate, which involves building a Docker image:
https://github.com/pulumi/examples/tree/master/aws-ts-hello-fargate
https://www.pulumi.com/blog/get-started-with-docker-on-aws-fargate-using-pulumi/
https://www.pulumi.com/docs/tutorials/aws/ecs-fargate/

@EvanBoyle assigned EvanBoyle and jmaysrowland and unassigned EvanBoyle on Dec 28, 2019