I procrastinate a lot by reloading webpages, looking for new content. However, I don't like being a Skinner box rat, so I wrote this digest generator to tame my FOMO.
⚠️ `no-more-f5` works only with Java 8! Java 9 is not supported.
Rough idea of the required cloud stack:
- Emails are sent using AWS SES via the SMTP protocol.
- The function itself is deployed to AWS Lambda and is triggered by a scheduled CloudWatch event (cron).
If you want to know, here's the motivation for this stack:
Why SMTP?
Since we need to scrape RSS feeds, we need Internet access. This can be configured in two ways:
- Place the Lambda function outside a VPC and connect to SES via SMTP.
- Place the Lambda function inside a VPC and route Internet traffic through a NAT Gateway. In this case we can talk to SES directly.
I use the former. SMTP emails cost a little more, but configuring a VPC and a NAT Gateway is tedious, and a NAT Gateway is certainly much more expensive than the SMTP emails. If you already have one, though, you can try the second option. YMMV.
You will need Leiningen to build your uberjar.
But first, create a list of your Atom/RSS feeds and save it in a file, e.g. `my_feeds`:
$ cat > my_feeds <<EOF
https://github.com/BurntSushi/ripgrep/releases.atom
https://github.com/atom/atom/releases.atom
EOF
Now we build a standalone uberjar and add `my_feeds` to it (remember, jars are just zip archives).
This process is automated in `prepare_package.sh` (pass your feeds file as its argument):
$ ./prepare_package.sh my_feeds
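
If you are curious, the "add the feeds file to the jar" step can be reproduced by hand. This is only a sketch and not the contents of `prepare_package.sh`: `add_feeds_to_jar` is a hypothetical helper, the jar path is whatever `lein uberjar` produced on your machine, and it assumes the feeds file belongs at the root of the archive.

```shell
# Hypothetical helper: add (or replace) a feeds file inside an existing jar.
# Assumes the `zip` tool is installed and that the file belongs at the archive root.
add_feeds_to_jar() {
  jar_path="$1"    # path to the uberjar produced by `lein uberjar`
  feeds_file="$2"  # e.g. my_feeds
  # -j stores only the file name, dropping any directory components.
  zip -j "$jar_path" "$feeds_file"
}
```

After building, `add_feeds_to_jar <path_to_your_uberjar> my_feeds` followed by `unzip -l <path_to_your_uberjar> | grep my_feeds` should list the file if it was added.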
- Verify your email address in SES.
- Create SMTP credentials and save them -- we'll need them later.
Important: Creating SMTP credentials also creates an IAM user. Do not use this IAM user's own credentials for the SMTP server; use the SMTP credentials you just saved.
- Create a new Lambda function.
- Use a standard IAM role, just enough to store CloudWatch logs.
- Select Java 8 as runtime.
- Add a CloudWatch event as a trigger. Schedule it to something like `cron(0 6 * * ? *)`, i.e. every day at 6:00 UTC.
- Choose something around 384 MB memory and 90 seconds timeout (depends heavily on the number of feeds you want to digest).
- Set handler to `no_more_f5.core::handler`.
- Now we need to set up environment variables. Add the following variables:
Variable | Note | Example |
---|---|---|
`FEEDS` | Filename of the file with your feed URLs | my_feeds |
`USER_AGENT` | See below | Mozilla/5.0 ... |
`SMTP_SERVER` | Address of your AWS SES SMTP server | email-smtp.eu-west-1.amazonaws.com |
`SMTP_PORT` | SMTP server port, check out your SES docs | 587 |
`SMTP_USER` | Use your SES SMTP credentials here | |
`SMTP_PASS` | Use your SES SMTP credentials here | |
`EMAIL_FROM` | Must be verified in AWS SES | [email protected] |
`EMAIL_TO` | All of them must be verified in AWS SES | [email protected], [email protected] |
`SINGLE_SITE_TIMEOUT` | Timeout for each fetching connection | 2000 |
You need to specify `USER_AGENT` since some sites block scrapers without it.
Just use something similar to your main browser.

`EMAIL_TO` can contain multiple addresses, separated by commas.
Make sure you use only verified addresses if you are still in the SES Sandbox mode.

`SINGLE_SITE_TIMEOUT` is helpful if some feed is unresponsive.
Instead of timing out the whole Lambda function,
you'll just get an exception message for the unresponsive feed.
Ok, you should be ready to go! Create a dummy test event
(an empty JSON object `{}` is enough as the event payload) and see if you've got a digest in your inbox!
One more thing:
Go to CloudWatch and configure log retention for your `no-more-f5` log group.
Set it to something reasonable, e.g. 7 days.
Storing a lot of logs (several GBs) might be expensive and it's just not worth it in this case.
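
If you prefer the command line over the console, the same setting can be applied with the AWS CLI. A minimal sketch, assuming the CLI is installed and configured: `set_log_retention` is a hypothetical helper, and the log group name in the comment assumes your function is called `no-more-f5` (Lambda log groups are named `/aws/lambda/<function-name>` by default).

```shell
# Hypothetical helper: set the retention period of a CloudWatch log group.
# Requires the AWS CLI with permission for logs:PutRetentionPolicy.
set_log_retention() {
  log_group="$1"  # e.g. /aws/lambda/no-more-f5 (assumes that function name)
  days="$2"       # CloudWatch accepts only certain values, e.g. 1, 3, 5, 7, 14, 30
  aws logs put-retention-policy \
    --log-group-name "$log_group" \
    --retention-in-days "$days"
}
```

For the 7-day example above, that would be `set_log_retention /aws/lambda/no-more-f5 7`.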
For local testing, create a `profiles.clj` file in the root repo folder.
Add the following map to it:
```clojure
{:dev
 {:env
  {:feeds "dev_feeds"
   :single-site-timeout "2000"
   :smtp-user "..."
   :smtp-pass "..."
   :smtp-server "email-smtp.eu-west-1.amazonaws.com"
   :smtp-port "587"
   :user-agent "..."
   :email-from "..."
   :email-to "..."}}}
```
Then just use `lein run` to run the app.
Alternatively, you can set all required environment variables and call:
$ java -cp <path_to_your_uberjar> no_more_f5.core
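
When running the jar directly, it is easy to forget one of the variables. The following sketch checks for that before starting the app. It assumes every variable from the table above is required; `missing_vars` is a hypothetical helper and not part of the project.

```shell
# Print the names of required variables that are unset or empty, one per line.
# Assumption: every variable listed in the table above is required.
missing_vars() {
  for name in FEEDS USER_AGENT SMTP_SERVER SMTP_PORT SMTP_USER SMTP_PASS \
              EMAIL_FROM EMAIL_TO SINGLE_SITE_TIMEOUT; do
    # Indirect lookup: copy the value of the variable called $name into $value.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

A guarded start could then look like `[ -z "$(missing_vars)" ] && java -cp <path_to_your_uberjar> no_more_f5.core`.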
If you have your own server running 24/7, you can schedule local execution with cron. And of course you can use your own email account, just make sure to get an app token for SMTP instead of using your password.
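
For the cron route, a crontab entry along these lines might work. This is a sketch under assumptions: `digest_cron_line` is a hypothetical helper, the env file is a shell file you write yourself that exports the variables from the table, and neither path contains spaces.

```shell
# Hypothetical helper: print a crontab entry that runs the digest every day at 06:00.
# Cron uses the server's local time zone, unlike the UTC-based CloudWatch schedule.
digest_cron_line() {
  env_file="$1"  # shell file that exports FEEDS, SMTP_USER, etc.
  jar_path="$2"  # path to your uberjar
  printf '0 6 * * * . %s && java -cp %s no_more_f5.core\n' "$env_file" "$jar_path"
}
```

Call it with the two paths and paste the printed line into `crontab -e`.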
As for the cost: no idea yet, I'll update this when I get my first monthly bill. But probably not much.