We should implement a batch counterpart to `withStreamInputTable`. The primary use case I have in mind is aggregation of streaming events. Consider a DW with two tables, `impressions` and `clicks`. The business would like to create a higher-level table that includes some aggregated hourly facts, such as click-through rate. To accomplish this, once an hour you need to run a job that scans an hour's worth of clicks and an hour's worth of impressions, and outputs an S3 file with the contents of the summary.
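For concreteness, the hourly roll-up query might look something like the sketch below. This is purely illustrative: the column names (`id`, `impression_id`, `event_hour`), the bucket layout, and the assumption of an Athena-style SQL engine over hour-partitioned tables are placeholders, not part of the existing code.

```typescript
// Hypothetical hour being aggregated; in practice this would be derived
// from the schedule trigger time.
const targetHour = "2019/11/01/00";

// Illustrative roll-up: join an hour of clicks to an hour of impressions
// and compute click-through rate. All table/column names are assumptions.
const summaryQuery = `
    SELECT i.event_hour,
           COUNT(DISTINCT i.id) AS impression_count,
           COUNT(DISTINCT c.id) AS click_count,
           CAST(COUNT(DISTINCT c.id) AS DOUBLE)
               / COUNT(DISTINCT i.id) AS click_through_rate
    FROM impressions i
    LEFT JOIN clicks c ON c.impression_id = i.id
    WHERE i.event_hour = '${targetHour}'
    GROUP BY i.event_hour`;
```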
We can provide an API, `withBatchInputTable`, that enables this sort of scenario using ECS Fargate. Mainly, we can allow the user to define a function to run in a container (the code that issues a query to one or more tables, writes the results to the correct S3 location, and even creates a partition if necessary) and to define the interval that the task should run on.
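To make the proposal concrete, here is one possible shape for the API. Everything here is hypothetical: `db`, the option names, and the helper functions are placeholders for whatever we land on.

```typescript
// Hypothetical usage of the proposed withBatchInputTable; none of this
// exists yet, and all names are placeholders.
db.withBatchInputTable("hourly_summary", {
    schedule: "rate(1 hour)", // how often the batch task runs
    // User-defined function run in the container: query one or more tables,
    // write the results to the correct S3 location, and create a partition
    // if necessary. runQuery/writeResults/addPartition are hypothetical helpers.
    task: async (hour: string) => {
        const rows = await runQuery(summaryQueryFor(hour));
        await writeResults(`s3://my-dw-bucket/hourly_summary/${hour}/`, rows);
        await addPartition("hourly_summary", hour);
    },
});
```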
We can certainly start with a prototype using Lambda. I think that would be a good proof of concept to get the API surface area fleshed out. Eventually we should probably use something like ECS Fargate, given Lambda's memory and disk limitations. ECS Fargate gives you up to 30 GB of RAM, which enables more flexibility in the scale of aggregations and queries that users will be able to do.
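As a sketch of what the Lambda prototype could look like, here is a scheduled function that kicks off the roll-up through Athena. This assumes an Athena-backed warehouse; the database name and output bucket are placeholders.

```typescript
import * as aws from "@pulumi/aws";

// Minimal sketch of the Lambda prototype: a CloudWatch-scheduled function
// that runs the hourly roll-up query via Athena.
const summaryQuery = "SELECT ... FROM impressions ..."; // roll-up query as sketched above

aws.cloudwatch.onSchedule("hourlySummary", "rate(1 hour)", async () => {
    const AWS = await import("aws-sdk");
    const athena = new AWS.Athena();
    // Kick off the query; Athena writes the result file to the S3 output
    // location, which doubles as the summary table's storage.
    await athena.startQueryExecution({
        QueryString: summaryQuery,
        QueryExecutionContext: { Database: "analytics" },        // assumed DB name
        ResultConfiguration: {
            OutputLocation: "s3://my-dw-bucket/hourly_summary/", // assumed bucket
        },
    }).promise();
});
```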
https://github.com/EvanBoyle/pulumi-serverless-db/pull/8/files#diff-f95bdcb0f919d600e736e8e9da74022dR93
Some examples for Fargate, which involve building a Docker image: https://github.com/pulumi/examples/tree/master/aws-ts-hello-fargate
https://www.pulumi.com/blog/get-started-with-docker-on-aws-fargate-using-pulumi/
https://www.pulumi.com/docs/tutorials/aws/ecs-fargate/
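Building on those examples, the eventual Fargate version might look roughly like the following. This uses the classic `@pulumi/awsx` API; exact option names vary between versions, and `./batch-task` is an assumed directory containing a Dockerfile with the user's query-and-upload code.

```typescript
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

const cluster = new awsx.ecs.Cluster("batch-cluster");

// Task definition whose container image is built from a local Dockerfile.
// The image would contain the user's query-and-write-to-S3 code.
const batchTask = new awsx.ecs.FargateTaskDefinition("hourly-summary", {
    container: {
        image: awsx.ecs.Image.fromPath("hourly-summary", "./batch-task"), // assumed path
        memory: 4096, // MiB; far more headroom available than Lambda allows
        cpu: 1024,
    },
});

// Fire the task once an hour from a CloudWatch schedule.
aws.cloudwatch.onSchedule("runHourlySummary", "rate(1 hour)", async () => {
    await batchTask.run({ cluster });
});
```

Because the work runs in a container, the memory and CPU settings can scale well past Lambda's limits, which is the main motivation for moving to Fargate above.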