
Centralized CMS Caching #1823

Closed · wants to merge 9 commits
Conversation

anthonyshull (Contributor):

This PR moves us to a centralized Redis cache, using the Nebulex library with its Redis adapter. We also use Telemetry to get cache stats directly from the adapter.

We have two endpoints, protected by basic auth, that allow us to expire cache keys.
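For context, here is a minimal sketch of how endpoints like these are typically protected with basic auth in a Phoenix router. This is an illustration, not this PR's actual router change (which lives in apps/site/lib/site_web/router.ex): the pipeline name, route path/verb, and environment variable names below are hypothetical.

# Hedged sketch; pipeline name, route, and env var names are hypothetical.
pipeline :cms_cache_auth do
  plug :cache_basic_auth
end

scope "/cms", SiteWeb do
  pipe_through :cms_cache_auth

  patch "/:object/:id", CMSController, :reset_cache_key
end

# Plug.BasicAuth ships with Plug >= 1.10, which the plug ~> 1.14.2 dependency satisfies.
defp cache_basic_auth(conn, _opts) do
  Plug.BasicAuth.basic_auth(conn,
    username: System.fetch_env!("CMS_BASIC_AUTH_USERNAME"),
    password: System.fetch_env!("CMS_BASIC_AUTH_PASSWORD")
  )
end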

Comment on lines +197 to +199
def generate(_, _, [path, %{}]) do
"/cms/#{String.trim(path, "/")}"
end
anthonyshull (Contributor Author):
Creates a cache key from the path by prefixing it with /cms.
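For example (hypothetical arguments, just to show the shape of the generated key):

# Hypothetical call illustrating the clause above:
generate(nil, nil, ["/news/winter-weather/", %{}])
#=> "/cms/news/winter-weather"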

Comment on lines +202 to +204
def generate(_, _, [path, params]) do
"/cms/#{String.trim(path, "/")}" <> params_to_string(params)
end
anthonyshull (Contributor Author):
Creates a key like the one above, but also handles parameters, so you can end up with something like /cms/foo?bar=baz&bop=bam.
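The params_to_string/1 helper isn't shown in this hunk; here is a minimal sketch of what it might look like, assuming the params are sorted (for stable keys) and URI-encoded:

# Hedged sketch of params_to_string/1; the real implementation isn't in this diff.
defp params_to_string(params) when map_size(params) == 0, do: ""

defp params_to_string(params) do
  "?" <> (params |> Enum.sort() |> URI.encode_query())
end

# params_to_string(%{"bar" => "baz", "bop" => "bam"})
# #=> "?bar=baz&bop=bam"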

Comment on lines -50 to +65
{:httpoison, ">= 0.0.0"},
{:poison, ">= 0.0.0", override: true},
{:timex, ">= 0.0.0"},
{:plug, "~> 1.14.2"},
{:html_sanitize_ex, "1.3.0"},
{:bypass, "~> 1.0", only: :test},
{:quixir, "~> 0.9", only: :test},
{:decorator, "1.4.0"},
{:html_sanitize_ex, "1.3.0"},
{:httpoison, ">= 0.0.0"},
{:mock, "~> 0.3.3", only: :test},
{:nebulex, "2.5.2"},
{:nebulex_redis_adapter, "2.3.1"},
{:phoenix_html, "~> 3.0"},
{:repo_cache, in_umbrella: true},
{:plug, "~> 1.14.2"},
{:poison, ">= 0.0.0", override: true},
{:quixir, "~> 0.9", only: :test},
{:telemetry, "0.4.3"},
{:telemetry_metrics, "0.6.1"},
{:telemetry_metrics_statsd, "0.7.0"},
{:telemetry_poller, "0.5.1"},
{:timex, ">= 0.0.0"},
anthonyshull (Contributor Author):
Alphabetized these.

Comment on lines -64 to -78
path = "/news/2018/news-entry"
params = %{}
cache_key = {:view_or_preview, path: path, params: params}

# ensure cache is empty
case ConCache.get(Repo, cache_key) do
nil ->
:ok

{:ok, %{"type" => [%{"target_id" => "news_entry"}]}} ->
ConCache.dirty_delete(Repo, cache_key)
end

assert %NewsEntry{} = Repo.get_page(path, params)
assert {:ok, %{"type" => [%{"target_id" => "news_entry"}]}} = ConCache.get(Repo, cache_key)
anthonyshull (Contributor Author):
I just removed this for now; the final PR will add the tests back.

Comment on lines -82 to -87
path = "/basic_page_no_sidebar"
params = %{"preview" => "", "vid" => "112", "nid" => "6"}
cache_key = {:view_or_preview, path: path, params: params}
assert ConCache.get(Repo, cache_key) == nil
assert %Basic{} = Repo.get_page(path, params)
assert ConCache.get(Repo, cache_key) == nil
anthonyshull (Contributor Author):
Same here. This test will be back later.

@@ -29,6 +33,8 @@ defmodule CMS.Repo do

@cms_api Application.get_env(:cms, :cms_api)

@ttl :timer.hours(1)
anthonyshull (Contributor Author):
We set a TTL as a safety net because we don't have a sophisticated mechanism to handle failed cache-invalidation requests from Drupal to Dotcom.
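For context, a hedged sketch of how a module-level TTL like this is typically applied with Nebulex's caching decorators; only @ttl comes from the diff above, and the module, function, and backing API here are illustrative:

# Hedged sketch; not the actual CMS.Repo code.
defmodule Example.CachedPages do
  use Nebulex.Caching

  @ttl :timer.hours(1)

  # Entries expire after an hour even if nobody resets the key explicitly.
  @decorate cacheable(cache: CMS.Repo, key: path, opts: [ttl: @ttl])
  def get_page(path, params) do
    Example.CMSApi.view(path, params)
  end
end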

paulswartz (Member):
question: If you're increasing the TTL to an hour, do you need any of the Nebulex changes? It seems like that alone would make a big difference in the number of cache hits.

anthonyshull (Contributor Author):

It will, but the content creators need the ability to reset a key so that content gets pushed out more quickly. The TTL has been one minute so that they see content changes quickly. This lets us actually cache the content but also update it quickly, which is something we can't do without significant work, since our ETS caches live on Elixir nodes that don't currently talk to each other.

anthonyshull (Contributor Author):

The ideal state would actually be to use a message queue instead of sending PATCH requests. Now that I know Terraform... look out. Ultimately, there is no way around the Two Generals' Problem, so setting some TTL is a safety net.

paulswartz (Member) left a comment:

general question: how does Nebulex handle things like Redis going down? That isn't something we've needed to address with the in-memory cache.
general question: what's the performance like? we're adding both a network round trip, as well as a serialize/deserialize step.


groups = Enum.group_by(metrics, & &1.event_name)

for {event, metrics} <- groups do
  :telemetry.attach({__MODULE__, event, self()}, event, &__MODULE__.handle_event/4, metrics)
end
paulswartz (Member):

question: do you need a GenServer for this? I don't see it maintaining any state, so the metric source could do the logging directly?

anthonyshull (Contributor Author):

AFAIK, this is how you're supposed to write custom Telemetry reporters. It's how the other reporter we're using, telemetry_metrics_statsd, works: https://github.com/beam-telemetry/telemetry_metrics_statsd/blob/main/lib/telemetry_metrics_statsd.ex. It's also how the console reporter that telemetry_metrics ships with works: https://github.com/beam-telemetry/telemetry_metrics/blob/main/lib/telemetry_metrics/console_reporter.ex. The idea behind Telemetry is to separate the metric emitters from the reporters.
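For illustration, a minimal sketch of a reporter in that style; the module name and the logging it does are illustrative, not this PR's actual reporter:

# Hedged sketch of a custom Telemetry reporter: attach one handler per event group
# in init/1, then log measurements as events arrive.
defmodule Example.CacheStatsReporter do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, Keyword.fetch!(opts, :metrics))
  end

  @impl true
  def init(metrics) do
    groups = Enum.group_by(metrics, & &1.event_name)

    for {event, metrics} <- groups do
      :telemetry.attach({__MODULE__, event, self()}, event, &__MODULE__.handle_event/4, metrics)
    end

    {:ok, Map.keys(groups)}
  end

  # Invoked by :telemetry whenever one of the attached events fires.
  def handle_event(_event_name, measurements, _metadata, metrics) do
    for metric <- metrics do
      IO.inspect({metric.name, measurements})
    end
  end
end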

@@ -43,6 +43,18 @@ defmodule SiteWeb.CMSController do
|> handle_page_response(conn)
end

def reset_cache_key(conn, %{"object" => object, "id" => id}) do
CMS.Repo.delete("/cms/#{object}/#{id}")
paulswartz (Member):

question: is there a CMS PR for this, or is this built-in Drupal functionality that we're using?

apps/site/lib/site_web/router.ex (resolved)
anthonyshull (Contributor Author):

> general question: how does Nebulex handle things like Redis going down? That isn't something we've needed to address with the in-memory cache.
> general question: what's the performance like? we're adding both a network round trip, as well as a serialize/deserialize step.

Redix handles reconnections: https://github.com/whatyouhide/redix?tab=readme-ov-file#features. If there were some kind of serious error, the site wouldn't function, but the same could be said about EC2 going down, or Postgres, or Route 53, etc.

We won't have performance numbers until we're actually pushing data through the system, but I highly suspect it's going to be much faster given that the current cache hit rate is 50%.

Comment on lines +40 to +46
config :cms, CMS.Repo,
conn_opts: [
host: "127.0.0.1",
port: 6379
],
stats: true,
telemetry: true
anthonyshull (Contributor Author):

This will be different for prod; I just don't know how until we can get something running on AWS.
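A hedged sketch of what the prod configuration might eventually look like, e.g. in runtime.exs with the Redis host read from the environment; the variable names are hypothetical, since the AWS setup hadn't been settled at the time of this comment:

# Hedged sketch; env var names are hypothetical.
import Config

config :cms, CMS.Repo,
  conn_opts: [
    host: System.get_env("REDIS_HOST", "127.0.0.1"),
    port: String.to_integer(System.get_env("REDIS_PORT", "6379"))
  ],
  stats: true,
  telemetry: true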

paulswartz (Member):

> if there is some kind of serious error, the site wouldn't function. but, the same could be said about ec2 going down or postgres going down, or route 53, etc.

That's a great point, and why it's good to avoid introducing new external dependencies to the critical path. Something to think about!
