Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken downtime comment sync #10000

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Feb 12, 2024

All objects must be synced sorted by their load dependency. Otherwise, downtimes and/or comments might get synced before their respective Checkables, which will result in comments and downtimes being ignored by the other endpoint since it does not yet know about their checkables. Given that the runtime config updates event does not trigger a reload on the remote endpoint, these objects won't be synced again until the next reload.

~/master2/icinga2 (bundled-cluster-fixes ✗) ls prefix/var/lib/icinga2/api/packages/_api/*/conf.d/downtimes | wc -l
    3501
~/master2/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5666/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    1501
~/master1/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5665/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    3501

After master2 reload:

~/master2/icinga2 (bundled-cluster-fixes ✗) curl -sSku root:icinga 'https://localhost:5666/v1/objects/downtimes?pretty=1' | grep ' "attrs": {' | wc -l
    3501

closes #7786
closes #9873

TODO

@yhabteab yhabteab added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) ref/IP area/runtime Downtimes, comments, dependencies, events labels Feb 12, 2024
@cla-bot cla-bot bot added the cla/signed label Feb 12, 2024
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
@log1-c
Copy link
Contributor

log1-c commented Jul 22, 2024

Quick question: would this also fix the following issue:
Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

@Mordecaine
Copy link

Mordecaine commented Aug 13, 2024

Quick question: would this also fix the following issue: Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

It would be very helpful to get an answer to this.

@yhabteab
Copy link
Member Author

Quick question: would this also fix the following issue: Downtimes scheduled via API (sometimes) get synced/re-created with delay and are doubled #10078

It would be very helpful to get an answer to this.

Hi, we don't know for sure whether this will fix #10078 as we still haven't identified exactly what is going wrong there, other than something is not working as expected. It's unlikely that this PR will fix #10078, but we can't tell you for sure until the cause for #10078 is identified.

@Al2Klimov Al2Klimov removed their assignment Aug 20, 2024
@Al2Klimov Al2Klimov self-requested a review August 20, 2024 11:16
@dgiesselbach
Copy link

@yhabteab When will this request be completed? Is there a timeline?

@yhabteab yhabteab changed the base branch from master to enhanced-sort-types-by-load-dependencies September 16, 2024 07:23
@yhabteab yhabteab force-pushed the enhanced-sort-types-by-load-dependencies branch 4 times, most recently from cb4fe57 to eb97676 Compare September 20, 2024 14:18
@Al2Klimov Al2Klimov changed the base branch from enhanced-sort-types-by-load-dependencies to master September 20, 2024 15:30
@Al2Klimov Al2Klimov marked this pull request as draft September 20, 2024 15:31
@julianbrost
Copy link
Contributor

I believe similar problems will still exist for other types where there's no load_after dependency at the moment: many objects can refer to a time period, however, there's not a single load_after TimePeriod. There are more such examples: Host/Service -> *Command, Notification -> NotificationCommand (not necessarily a complete list).

@yhabteab yhabteab force-pushed the broken-downtime-comment-sync branch 2 times, most recently from 1d62919 to 70ccb4c Compare November 4, 2024 16:16
@yhabteab yhabteab requested review from Al2Klimov and julianbrost and removed request for Al2Klimov and julianbrost November 4, 2024 16:17
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab force-pushed the broken-downtime-comment-sync branch 2 times, most recently from 8645d1e to 1261254 Compare December 3, 2024 07:58
Copy link
Member

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I've overseen an elephant, despite I like small diffs. Sorry😅

lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Show resolved Hide resolved
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab force-pushed the broken-downtime-comment-sync branch 3 times, most recently from 845a61b to 2fd478a Compare December 3, 2024 14:39
Al2Klimov
Al2Klimov previously approved these changes Dec 3, 2024
Copy link
Member

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFTM

@yhabteab
Copy link
Member Author

yhabteab commented Dec 4, 2024

Fortunately, @julianbrost found out during testing that the objects are not being synchronised in the correct order, as they are supposed to be with this PR. Thus, I made some modifications in the DependencyGraph class to achieve the desired result without producing too many code changes.

Before

[2024-12-04 08:45:47 +0100] critical/config: Error: Validation failed for object 'dsl-groups' of type 'HostGroup'; Attribute 'groups': Object 'dns-groups' of type 'HostGroup' does not exist.
Location: in /Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf: 2:2-2:26
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(1): object HostGroup "dsl-groups" {
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(2):  groups = [ "dns-groups" ]
                                                                                                                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(3):  version = 1733298340.337465
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(4):  zone = "master"
[2024-12-04 08:45:47 +0100] critical/config: 1 error
[2024-12-04 08:45:47 +0100] critical/ApiListener: Could not create object 'dsl-groups':
[2024-12-04 08:45:47 +0100] critical/ApiListener: Error: Validation failed for object 'dsl-groups' of type 'HostGroup'; Attribute 'groups': Object 'dns-groups' of type 'HostGroup' does not exist.
Location: in /Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf: 2:2-2:26
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(1): object HostGroup "dsl-groups" {
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(2):  groups = [ "dns-groups" ]
                                                                                                                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(3):  version = 1733298340.337465
/Users/yhabteab/ClionProjects/icinga2/prefix/var/lib/icinga2/api/packages/_api/354bb4c7-3d94-4df3-bba3-add86e471cd3/conf.d/hostgroups/dsl-groups.conf(4):  zone = "master"
[2024-12-04 08:45:47 +0100] information/ConfigObjectUtility: Created and activated object 'dns-groups' of type 'HostGroup'.

After

[2024-12-04 08:51:01 +0100] information/ApiListener: Copying file 'master//_etc/overflow.conf' from config sync staging to production zones directory.
[2024-12-04 08:51:01 +0100] information/ApiListener: Copying file 'master//_etc/services.conf' from config sync staging to production zones directory.
[2024-12-04 08:51:02 +0100] information/ConfigObjectUtility: Created and activated object 'dns-groups' of type 'HostGroup'.
[2024-12-04 08:51:02 +0100] information/ConfigObjectUtility: Created and activated object 'dsl-groups' of type 'HostGroup'.
[2024-12-04 08:51:03 +0100] information/Application: Got reload command: Starting new instance.

Copy link
Member

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try the below suggestions starting with "💡" myself. (I'm kind of "waiting" for #10254 and #9990.)

Edge edge(parent, child);
if (auto it(m_Dependencies.find(edge)); it != m_Dependencies.end()) {
if (it->count == 1) {
m_Dependencies.erase(edge);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you erase by iterator as you already have it?

m_Dependencies[child][parent]++;
Edge edge(parent, child, 1);
if (auto it(m_Dependencies.find(edge)); it != m_Dependencies.end()) {
m_Dependencies.modify(it, [](Edge& e) { e.count++; });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... just like here?

if (auto it(m_Dependencies.find(edge)); it != m_Dependencies.end()) {
m_Dependencies.modify(it, [](Edge& e) { e.count++; });
} else {
m_Dependencies.insert(edge);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm wondering whether the new map

  • either allows to pass an iterator as hint on insertion
  • or supports kind of insert-if-new-else-return-old-position op

};

using DependencyMap = boost::multi_index_container<
Edge, // The value type we want to sore in the container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Maybe the stuff mentioned above can be mitigated by using kind of a multimap?

static std::mutex m_Mutex;
static std::map<Object *, std::map<Object *, int> > m_Dependencies;
static DependencyMap m_Dependencies;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 And, while on it, I could even imagine two (multi?)maps – parent-child and child-parent.

// Example: A service object is dependent on a host object, thus the service object is the *parent* and
// the host object is the *child* one :).
Object* parent;
Object* child;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 And, while on it, theoretically we could make them ConfigObject*.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But only if that's not too complicated. 😅


void DependencyGraph::AddDependency(Object *parent, Object *child)
{
std::unique_lock<std::mutex> lock(m_Mutex);
m_Dependencies[child][parent]++;
Edge edge(parent, child, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Edge edge(parent, child, 1);
Edge edge(parent, child);

{
size_t seed = 0;
boost::hash_combine(seed, edge.parent);
boost::hash_combine(seed, edge.child);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is interesting! I just had XORed two std::hash-es, but good to know.

boost::multi_index::indexed_by<
// The first indexer is used for lookups by the Edge object itself, thus it
// needs its own hash function and comparison predicate.
boost::multi_index::hashed_non_unique<boost::multi_index::identity<Edge>, Edge::HashPredicate, Edge::HashPredicate>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why NON unique if you don't use this as multi-map anyway?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/distributed Distributed monitoring (master, satellites, clients) area/runtime Downtimes, comments, dependencies, events bug Something isn't working cla/signed consider backporting Should be considered for inclusion in a bugfix release ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants