Conversation

@jheld
Contributor

@jheld jheld commented Nov 22, 2024

This adds Elasticsearch as a task event history backend for Flower.

Proposed Solution

Note: this is essentially still the original PR #821 (I don't recall why I closed it).

This is mainly two pieces:

  • indexing events into Elasticsearch in a reasonably efficient manner (for both the Flower process and Elasticsearch)
  • searching/sorting/pagination of task history & task lookup by means of Elasticsearch

Done so far (working)

  • indexing tasks into Elasticsearch

    • I have a background thread that buffers task events in a queue and sends bulk index requests to Elasticsearch
  • Searching (moderate support for different fields) and sorting on all fields. Sorting & pagination work but need more QA.

  • Dashboard able to pull from Elasticsearch (at startup)
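The buffering approach from the first bullet can be sketched roughly like this. This is illustrative only: the class and parameter names are mine, not the PR's, and the Elasticsearch bulk call is stubbed out as a plain `send_bulk` callable (something like `elasticsearch.helpers.bulk` in the real code).

```python
import queue
import threading
import time


class BulkIndexer:
    """Buffer task events on a queue; a background thread flushes them in bulk."""

    def __init__(self, send_bulk, batch_size=500, flush_interval=1.0):
        self.events = queue.Queue()
        self.send_bulk = send_bulk          # stand-in for the ES bulk API call
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def put(self, event):
        self.events.put(event)

    def stop(self):
        """Signal the thread to drain the queue, then wait for it."""
        self._stop.set()
        self._thread.join()

    def _run(self):
        batch = []
        deadline = time.monotonic() + self.flush_interval
        while not self._stop.is_set() or not self.events.empty():
            try:
                timeout = max(deadline - time.monotonic(), 0.01)
                batch.append(self.events.get(timeout=timeout))
            except queue.Empty:
                pass
            # Flush when the batch is full or the flush interval has elapsed.
            if batch and (len(batch) >= self.batch_size
                          or time.monotonic() >= deadline):
                self.send_bulk(batch)
                batch = []
            if time.monotonic() >= deadline:
                deadline = time.monotonic() + self.flush_interval
        if batch:                            # final drain on shutdown
            self.send_bulk(batch)
```

Batching by size *and* by time keeps both latency and per-request overhead bounded, which is the "efficient for the process and for Elasticsearch" trade-off mentioned above.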

Questions

  • Where will this logic live? A flower subcommand, or flower proper (with Elasticsearch flag settings)? @johnarnold

  • Originally I had the indexer outside of flower. It's now in flower, but configured in a rough standalone mode, currently driven only by argv. There are a few --elasticsearch flags to control the behavior, intended for draft/dev use only for now.

  • I am using kombu's LRUCache to cache certain search_after queries for task history pagination. It is my way around Elasticsearch's pagination restrictions, and it keeps deep pagination requests very performant in my testing.

  • Can we improve the Elasticsearch indexing process?
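The search_after caching idea from the LRUCache bullet can be sketched as follows. The PR uses kombu's LRUCache; the minimal one here only keeps the sketch self-contained, and `run_query` is a hypothetical stand-in for the actual Elasticsearch search call.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny stand-in for kombu's LRUCache, to keep the sketch self-contained."""

    def __init__(self, limit=128):
        self.limit = limit
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        while len(self.data) > self.limit:
            self.data.popitem(last=False)


def fetch_page(run_query, cache, sort_by, order, offset, length):
    """Fetch one page, resuming from a cached search_after cursor when possible.

    run_query(sort_by, order, search_after, size) stands in for the
    Elasticsearch search; each hit it returns carries a 'sort' key.
    """
    key = (sort_by, order, offset)
    after = cache.get(key)
    if after is not None:
        hits = run_query(sort_by, order, search_after=after, size=length)
    else:
        # Cold path: no cached cursor, fall back to from/size-style paging.
        hits = run_query(sort_by, order, search_after=None,
                         size=offset + length)[offset:]
    if hits:
        # Remember the last hit's sort values so the *next* page is cheap.
        cache.put((sort_by, order, offset + length), hits[-1]['sort'])
    return hits
```

Caching the last hit's sort values per (sort, offset) key is what sidesteps the deep-pagination limits: each sequential page becomes a cheap search_after query instead of an ever-growing from/size scan.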

@jheld jheld force-pushed the jheld/elasticsearch_history branch from 789688c to 47cea6f on November 23, 2024 02:46
@auvipy auvipy requested review from auvipy and Copilot August 17, 2025 15:32

Copilot AI left a comment


Pull Request Overview

This PR adds Elasticsearch support to Flower for task event history management. It introduces an alternative backend to store and query task events in Elasticsearch instead of relying solely on in-memory storage.

  • Adds Elasticsearch indexing capabilities through a background thread that buffers and bulk indexes task events
  • Implements search, sorting, and pagination functionality using Elasticsearch queries
  • Introduces new configuration options for Elasticsearch connection, indexing behavior, and data retention

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.

File Description
setup.py Adds Elasticsearch dependencies and console script entry points
flower/views/tasks.py Integrates Elasticsearch querying for task views with fallback to in-memory storage
flower/utils/tasks.py Extends task filtering parameters and adds type annotation
flower/utils/search.py Enhances search term parsing with new fields and time-based filtering
flower/urls.py Adds Elasticsearch refresh API endpoint
flower/options.py Defines new Elasticsearch configuration options
flower/logging_utils.py Adds custom logging formatter for Celery exceptions
flower/indexer_app.py Creates dedicated indexer application for Elasticsearch event processing
flower/events.py Adds Elasticsearch dashboard data retrieval functionality
flower/elasticsearch_events.py Core Elasticsearch indexing logic with background threading
flower/command.py Adds new indexer command for standalone Elasticsearch indexing
flower/api/tasks.py Integrates Elasticsearch support in task API endpoints
flower/api/elasticsearch_history.py Implements Elasticsearch-based task history API handlers
flower/__init__.py Updates version number
flower/indexer.py Entry point for standalone indexer command
examples/tasks.py Adds example chained task
docs/config.rst Documents new Elasticsearch configuration options
.pylintrc Disables too-many-positional-arguments warning


task_dict['worker'] = task_dict['worker'].hostname
self.write(dict(draw=draw, data=filtered_tasks,
                recordsTotal=total_records,
                recordsFiltered=records_filtered))  # bug?

Copilot AI Aug 17, 2025


The comment '# bug?' suggests uncertainty about the correctness of this code. The recordsFiltered value should accurately reflect the number of filtered records, but the logic may be incorrect when switching between Elasticsearch and in-memory backends.

    parsed_search['root_id'] = preprocess_search_value(query_part[len('root_id:'):])
elif query_part.startswith('parent_id:'):
    parsed_search['parent_id'] = preprocess_search_value(query_part[len('parent_id:'):])
if parsed_search:

Copilot AI Aug 17, 2025


This condition checks if parsed_search has any content, but it's set to False after each search term is processed. This logic appears incorrect as it would always evaluate parsed_search as a dict, not a boolean.

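For context on that check: in Python a dict's truthiness is simply its non-emptiness, so `if parsed_search:` fires only once at least one search term has been parsed. A quick illustration:

```python
parsed_search = {}
assert not parsed_search              # an empty dict is falsy

parsed_search['root_id'] = 'abc123'   # illustrative value, not from the PR
assert parsed_search                  # a non-empty dict is truthy
```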
@jheld
Contributor Author

jheld commented Sep 10, 2025

@auvipy (as a repost from personal account) thank you for the fixes. How far do you think this PR is from being merged? Anything I can do to help? Docs, comments, screenshots, etc.

@auvipy
Collaborator

auvipy commented Sep 11, 2025

I need some more time to properly review and test this....



EXTRAS_REQUIRE = {
"elasticsearch": ["elasticsearch>=5.4,<6.4", "elasticsearch_dsl>=5.4,<6.4", "requests>=2.13,<3", ],
Collaborator


Why not more recent versions?

Contributor Author


I think this is just outdated because it's a very long-lived branch. We should be safe to support whatever celery does.

@auvipy auvipy requested review from ask, auvipy and Copilot and removed request for ask September 11, 2025 06:59

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.



Comment on lines +66 to +67
for re_match in [m for m in re.finditer(r"<Task: \w+([.]\w+)*\((?P<task_uuid>\w+(-\w+)+)\) \w+ clock:\d+>", task.children_raw) if m]:
    task.children.append(Task(uuid=re_match.group("task_uuid")))

Copilot AI Sep 11, 2025


This regex parsing and Task creation logic is fragile and could break if the task representation format changes. Consider using a more robust method to parse task children or storing them in a structured format in Elasticsearch.

Suggested change
for re_match in [m for m in re.finditer(r"<Task: \w+([.]\w+)*\((?P<task_uuid>\w+(-\w+)+)\) \w+ clock:\d+>", task.children_raw) if m]:
    task.children.append(Task(uuid=re_match.group("task_uuid")))
# Use structured children if available, else fall back to regex parsing
if isinstance(task.children_raw, list):
    # Assume list of UUIDs or dicts with 'uuid'
    for child in task.children_raw:
        if isinstance(child, dict) and 'uuid' in child:
            task.children.append(Task(uuid=child['uuid']))
        elif isinstance(child, str):
            task.children.append(Task(uuid=child))
elif isinstance(task.children_raw, str):
    for re_match in [m for m in re.finditer(r"<Task: \w+([.]\w+)*\((?P<task_uuid>\w+(-\w+)+)\) \w+ clock:\d+>", task.children_raw) if m]:
        task.children.append(Task(uuid=re_match.group("task_uuid")))

)
cache_value = sorted_tasks.execute().hits.hits[-1]['sort']
sorted_tasks = es_s.extra(from_=0, size=length, search_after=cache_value).sort(
    {sort_by: 'asc' if not sort_order else 'desc'}, {'_uid': 'desc', }).execute().hits

Copilot AI Sep 11, 2025


The sort order logic is inverted. When sort_order is True (descending), the condition 'asc' if not sort_order else 'desc' evaluates to 'desc', but it should be 'desc' when sort_order is True. This should be 'desc' if sort_order else 'asc'.

Suggested change
{sort_by: 'asc' if not sort_order else 'desc'}, {'_uid': 'desc', }).execute().hits
{sort_by: 'desc' if sort_order else 'asc'}, {'_uid': 'desc', }).execute().hits
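Worth noting: for a boolean `sort_order`, the original expression and the suggested one are logically equivalent, so the suggestion is a readability improvement rather than a behavior change. A quick truth-table check confirms this:

```python
# Both expressions map sort_order=True -> 'desc' and sort_order=False -> 'asc'.
for sort_order in (True, False):
    original = 'asc' if not sort_order else 'desc'
    suggested = 'desc' if sort_order else 'asc'
    assert original == suggested
```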

Comment on lines +50 to +57
custom_es_setup = True
if custom_es_setup:
    app.loader.import_default_modules()
    if getattr(app.conf, 'timezone', None):
        os.environ['TZ'] = app.conf.timezone
        time.tzset()
flower_app = Flower(capp=app, options=options, **settings)


Copilot AI Sep 11, 2025


The custom_es_setup flag is hardcoded to True and creates duplicate Flower app initialization. This should be refactored to either remove the flag or make it configurable, and eliminate the duplicate app creation.

Suggested change
custom_es_setup = True
if custom_es_setup:
    app.loader.import_default_modules()
    if getattr(app.conf, 'timezone', None):
        os.environ['TZ'] = app.conf.timezone
        time.tzset()
flower_app = Flower(capp=app, options=options, **settings)
app.loader.import_default_modules()
if getattr(app.conf, 'timezone', None):
    os.environ['TZ'] = app.conf.timezone
    time.tzset()

Comment on lines +277 to +288
uuid=task.uuid,
worker=task.hostname,
info=task.info(),
received=received_time,
started=start_time,
task.name,
task.uuid,
task.hostname,
received_time,
start_time,
succeeded_time,
task.info(),

Copilot AI Sep 11, 2025


This logging statement has incorrect syntax. Lines 282-287 use keyword arguments, but lines 288-294 are positional arguments without keywords. This will cause a SyntaxError. Either use all keyword arguments or all positional arguments.
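The SyntaxError claim is easy to verify: Python rejects a positional argument that follows a keyword argument at compile time, before any code runs. A minimal check (the call text is illustrative, not the PR's actual statement):

```python
# A positional argument after a keyword argument is rejected at compile time.
try:
    compile("log(name=task.name, task.uuid)", "<example>", "eval")
    raised = False
except SyntaxError:
    raised = True

assert raised  # "positional argument follows keyword argument"
```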
