Added new elements to delete event #392
Conversation
… all canonical facts to delete event
… delete_by_id method
Reviewed this from the theoretical side. I’ll test whether it works and drop another comment.
README.md
Outdated
@@ -92,6 +92,14 @@ _prometheus_multiproc_dir_ environment variable. This is done automatically.
 python run_gunicorn.py
 ```

+Should issues occur with the server's connection to the docker service, it might
This is actually not relevant to the subject of this pull request. I would extract that to another pull request. But it should no longer be relevant, as we have #394 almost merged in.
run.py
Outdated
@@ -3,7 +3,7 @@

 from app import create_app

-config_name = os.getenv("APP_SETTINGS", "development")
+config_name = os.getenv("APP_SETTINGS", "testing")
This was probably added by accident.
api/host.py
Outdated
@@ -286,7 +286,7 @@ def delete_by_id(host_id_list):
     host_ids_to_delete = []
     for host in query.all():
         try:
-            host_ids_to_delete.append(host.id)
+            host_ids_to_delete.append({"id": host.id, **host.canonical_facts, "account": host.account})
The name of the variable doesn’t reflect its contents any more. Now it’s more hosts_to_delete than host_ids. I wonder, however: can’t we just pass the whole Host object to the delete event factory? This way we are building an arbitrary host-like dictionary for this single purpose.
We can argue whether the events module should know about the models, but I don’t see a problem in that. It’s the output that has to be independent, but the most natural input for a host is a… Host.
Good idea!
app/events.py
Outdated
-def delete(id):
-    return HostEvent(strict=True).dumps({"id": id, "timestamp": datetime.utcnow(), "type": "delete"}).data
+def delete(host):
+    return HostEvent(strict=True).dumps({"id": host['id'], "insights_id": host['insights_id'], "rhel_machine_id": host['rhel_machine_id'],
I see quite an overlap with the existing Host object serialization, which this actually is, although the format is a bit different.
I’d ask whether it’s not possible just to dump a whole serialized host in there, only adding the timestamp and type fields to that. If not (probably because of the size of the payload, e.g. the facts field), then I’d rather add a to_event_json method to the Host module to keep the logic in one place. You also have a list of Canonical Facts there, which you can use.
The CanonicalFacts.to_json method already populates all the Canonical Facts, filling None for missing values.
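For reference, the behavior being described could be sketched roughly like this; the field list and implementation here are assumptions for illustration, not the actual code in app/models.py:

```python
# Hedged sketch: the real CanonicalFacts.to_json lives in app/models.py and
# the canonical fact names listed here are illustrative.
_CANONICAL_FACTS = (
    "insights_id",
    "rhel_machine_id",
    "subscription_manager_id",
    "satellite_id",
    "bios_uuid",
    "ip_addresses",
    "fqdn",
    "mac_addresses",
    "external_id",
)


class CanonicalFacts:
    @staticmethod
    def to_json(canonical_facts):
        # Every canonical fact key appears in the output, with None filled in
        # for facts the host does not have.
        return {fact: canonical_facts.get(fact) for fact in _CANONICAL_FACTS}
```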
app/events.py
Outdated
@@ -11,7 +12,21 @@ class HostEvent(Schema):
     id = fields.UUID()
     timestamp = fields.DateTime(format="iso8601")
     type = fields.Str()
     account = fields.Str(validate=validate.Length(min=1, max=255))
I’m thinking whether it wouldn’t be possible to DRY this to some extent with the existing HostSchema using inheritance. I don’t insist on doing it in this pull request though.
I like the idea of not repeating the field definitions, but it might take some work to remove the duplication. I think we'd have to break the HostSchema down into smaller chunks (CanonicalFactsSchema, for example, which only contains the canonical facts) and then we'd have to construct the larger schemas (HostSchema, HostEventSchema, etc.) out of the smaller schemas using inheritance/composition (if that's possible using marshmallow... I've not looked into marshmallow in that much detail).
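A hypothetical sketch of that composition idea (not the project's current schemas), showing how marshmallow schemas can share fields through plain inheritance:

```python
from marshmallow import Schema, fields


class CanonicalFactsSchema(Schema):
    # Shared canonical fact fields; the full list is abbreviated here.
    insights_id = fields.Str()
    rhel_machine_id = fields.Str()
    bios_uuid = fields.Str()
    fqdn = fields.Str()


class HostEventSchema(CanonicalFactsSchema):
    # The event schema only adds event metadata on top of the shared fields.
    id = fields.UUID()
    timestamp = fields.DateTime(format="iso8601")
    type = fields.Str()
    account = fields.Str()
```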
This proposed solution sounds great! I created an issue for that, so it’s not forgotten. #398
app/events.py
Outdated
-def delete(id):
-    return HostEvent(strict=True).dumps({"id": id, "timestamp": datetime.utcnow(), "type": "delete"}).data
+def delete(host):
+    return HostEvent(strict=True).dumps({"id": host['id'], "insights_id": host['insights_id'], "rhel_machine_id": host['rhel_machine_id'],
💥 This doesn’t work if the host doesn’t have all canonical facts. If you don’t pass along all canonical facts when creating/updating a host, the key is not there. It is even removed if it contains None or an empty string. It happens here.
You can bypass it by using host.get("insights_id"), but if you use the existing Canonical Facts serialization method as suggested, you’ll get all the None’s for free.
Side note: I was thinking about changing this to actually always store all fields, although with a None if not filled in, but it has some consequences. We insert the internal representation of the canonical facts directly into the database, and it’s necessary not to have the empty fields there for the native PostgreSQL deduplication to work correctly. That happens here.
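The stripping being described might look roughly like this (a sketch for illustration, not the actual code path):

```python
def _remove_empty_canonical_facts(canonical_facts):
    # Facts whose value is None or an empty string are dropped entirely, so
    # the representation stored in the database never contains placeholder
    # keys that would break the PostgreSQL-side deduplication.
    return {fact: value for fact, value in canonical_facts.items() if value}
```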
Suggested some more small changes to the host serialization.
Tested it and I can confirm that messages are now correctly produced with the account and canonical facts fields, even when some are missing. Is it ok though to have all of those in there? I found a ticket requesting an account number (RHCLOUD-1561) and another one requesting a machine ID (RHCLOUD-1995). I guess the extra fields do no harm, just saying.
One last thing: our pre-commit hooks fail on the code. Please install the hooks in your repository so these violations are caught or fixed automatically.
pipenv run pre-commit install
The package has been added only recently, so you may need to install the dependencies.
$ pipenv sync --dev --keep-outdated
@@ -283,28 +283,28 @@ def find_hosts_by_hostname_or_id(account_number, hostname):
 def delete_by_id(host_id_list):
     query = _get_host_list_by_id_list(current_identity.account_number, host_id_list)

-    host_ids_to_delete = []
+    hosts_to_delete = []
Better! 😀
api/host.py
Outdated
     for host in query.all():
         try:
-            host_ids_to_delete.append(host.id)
+            hosts_to_delete.append(host.to_json())
I am still wondering whether we want to pass a serialized object. It’s app.events’ responsibility to turn the internal object into an external representation.
Suggested change:
-    hosts_to_delete.append(host.to_json())
+    hosts_to_delete.append(host)
I am not sure whether we could shorten all this to:
try:
    hosts_to_delete = query.all()
except …:
    …
I am afraid that the sqlalchemy.orm.exc.ObjectDeletedError exception is raised during the iteration. But if so, it would happen during the for … in and not here in the append. This is strange and I’ll make an issue for that. Let’s not touch this here.
Brandon ran a test in which accessing the host object raised that exception. I cannot remember the specifics. I do like the idea of passing the host object down to the event module. Agreed. The event code should be responsible for producing the external representation.
If that’s really so, then the exception may no longer be raised once the id property is not accessed.
Created an issue: #400
app/events.py
Outdated
-def delete(id):
-    return HostEvent(strict=True).dumps({"id": id, "timestamp": datetime.utcnow(), "type": "delete"}).data
+def delete(host):
+    return HostEvent().dumps({"timestamp": datetime.utcnow(), **host, "type": "delete"}).data
Suggested change:
-    return HostEvent().dumps({"timestamp": datetime.utcnow(), **host, "type": "delete"}).data
+    return HostEvent().dumps({"timestamp": datetime.utcnow(), **host.to_json(), "type": "delete"}).data
If it’s ok to dump all the canonical facts here, then this works. The suggestion relates to my previous comment. I don’t much like, however, that host.to_json() produces a dictionary with more fields, which are then removed by the Schema.
Suggested change:
-    return HostEvent().dumps({"timestamp": datetime.utcnow(), **host, "type": "delete"}).data
+    from app.models import CanonicalFacts
+    return HostEvent().dumps({"timestamp": datetime.utcnow(), "account": host.account, **CanonicalFacts.to_json(host.canonical_facts), "type": "delete"}).data
I would prefer if the message were not a flat dictionary of event metadata mixed with host data. That would be a breaking change though, so it’s not good to do it here.
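Purely as an illustration of the non-flat shape meant here (nothing in this pull request emits this; the values are copied from the example later in the thread):

```python
# Illustrative only: event metadata kept separate from the host payload.
nested_delete_event = {
    "type": "delete",
    "timestamp": "2019-08-27T08:46:27+00:00",
    "host": {
        "id": "fe29c3b5-aa1e-4dae-8f05-4cb89d30cf7a",
        "account": "abc123",
        "insights_id": "ec76e4c2-ba1c-4e9b-9b94-463c637cbe44",
    },
}
```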
@@ -92,7 +92,6 @@ _prometheus_multiproc_dir_ environment variable. This is done automatically.
 python run_gunicorn.py
 ```
This change shouldn’t be here, but let it be.
Pipfile
Outdated
 watchtower = "*"
 "boto3" = "*"
 kafka = "*"
+flask = "==1.1.0"
…insights-host-inventory into new_delete_conditions Made changes to test_api
This pull request is all fine now. Tests are green and the delete event works well.
I just don’t want to give an approval until we’re sure that the ObjectDeletedError handling didn’t break. I don’t insist on an automatic test, but at least a manual test is necessary. It looks like the host.id access causes a deferred load to happen in SQLAlchemy, which is where the error can happen. I am not sure though and it should be confirmed.
I ran the tests and I seem to be getting this error back every time, so as far as I can tell we haven't broken anything! I also tested deleting the row with an access to an element of host before the append, and it gives me back the same error.
Looking good.
I think we need to relax the event schema validation checks though.
@aleccohan I am not sure I follow. You are getting the ObjectDeletedError every time you delete a host?
-    return HostEvent(strict=True).dumps({"id": id, "timestamp": datetime.utcnow(), "type": "delete"}).data
+def delete(host):
+    return (
+        HostEvent()
I like the idea of using the Schemas for defining what our input and output should look like. I think it makes it easy to point someone to the schema so that they can see what is expected for input/output. Otherwise, it can be kind of difficult to determine what the expected input/output is.
I did just think of something that we need to watch out for though. I think we should remove the validation checks from this schema. As @Glutexo mentioned, we are the ones producing the data...so there shouldn't be a need to validate it on the way out. Plus if someone tries to delete an "old" host that has data that doesn't match our current rules, then it would fail to produce a delete event for that object.
Oh! No, the app works fine when I test it. What I was talking about were the tests I did to see if we broke the ObjectDeletedError handling, and from the results I'm pretty sure that it's working just as it's supposed to.
Looks good to me.
@Glutexo do you see any issues with this PR?
I am not sure about removing only the validate clauses. If a host format changes, it can be a more complex change than just a string length. Fields can be added (quite probable) or even removed (improbable, but possible). If we change the internal representation of a host, we have to cope with the problem of how to emit the events in a backward-compatible way. I’d say that if we are already validating, we should do it completely and at least have a schema compatible with anything we really want to produce.
If we want to have a schema for reference and use it as a safeguard, I’d love to see a test case instead that would ensure that valid data go through and invalid data don’t. It would be a good foundation for any possible future changes to the messages. Such a test case would also make sure that the limits (there, max=500 and similar) exactly correspond to what we allow to store, and are not just arbitrary numbers.
I was originally thinking about removing the Schema entirely, not thinking much about having it for reference. Any violations of the imaginary schema could well be caught by a similar test case.
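A hedged sketch of the kind of test case described above, assuming marshmallow 2.x (where load() returns a result object with an errors dict) and the HostEvent schema from app.events; the test names and baseline values are illustrative only:

```python
import unittest
from datetime import datetime
from uuid import uuid4

from app.events import HostEvent


class HostEventValidationTestCase(unittest.TestCase):
    def _event(self, **overrides):
        # Baseline event that should satisfy the schema; overrides let each
        # test corrupt exactly one field.
        event = {
            "id": str(uuid4()),
            "timestamp": datetime.utcnow().isoformat(),
            "type": "delete",
            "account": "abc123",
        }
        event.update(overrides)
        return event

    def test_valid_event_passes_validation(self):
        errors = HostEvent().load(self._event()).errors
        self.assertFalse(errors)

    def test_overlong_account_is_rejected(self):
        # Assumes the validate=Length(min=1, max=255) clause stays on account.
        errors = HostEvent().load(self._event(account="x" * 256)).errors
        self.assertTrue(errors)
```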
Now the validations are 1:1 with those used when adding a host. There is no problem approving this.
But! 🍑 The validation on add was added quite late, when there were already records in the database. That means there may be data in the database that wouldn’t pass the validation. The rules are quite benevolent, but we can’t just assume. It is necessary to make sure that a delete event can be created for any host in the database. To do this, running a simple script against the database would be required. It would be nice to have it at least a bit universal, so it’d be possible to add more checks in the future and run a new check whenever our validation changes.
But beware – I approved the changes because the code changes look OK to me. We still have unanswered questions though, so please don’t merge before those are resolved:
Also don’t deploy this anywhere, not even to CI (that means don’t merge), unless a check against the existing database is run. If the check fails, either consider loosening the validation or merge in a migration that sanitizes the existing data.
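A hedged sketch of such a database check script; the module paths, the way checks are registered, and the schema round-trip are assumptions based on this conversation, not the project's actual layout:

```python
from app import create_app, db
from app.events import HostEvent, delete as delete_event
from app.models import Host


def delete_event_is_valid(host):
    # Build the delete event for this host and run it back through the schema.
    # Assumes marshmallow 2.x, where loads() returns a result with an .errors
    # dict instead of raising.
    return HostEvent().loads(delete_event(host)).errors or None


def run_checks(checks):
    # Kept generic on purpose so more checks can be added later, as suggested.
    failures = []
    for host in db.session.query(Host).yield_per(100):
        for check in checks:
            error = check(host)
            if error:
                failures.append((host.id, check.__name__, error))
    return failures


if __name__ == "__main__":
    application = create_app("development")
    with application.app_context():
        for host_id, check_name, error in run_checks([delete_event_is_valid]):
            print(f"{host_id} failed {check_name}: {error}")
```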
This change in the placement of the ObjectDeletedError check removes the triggering of the 500 error and also fixes the race condition guard.
Some notes about the ObjectDeletedError.
@@ -305,7 +299,13 @@ def delete_by_id(host_id_list):
     logger.debug("Deleted hosts: %s", hosts_to_delete)

     for deleted_host in hosts_to_delete:
-        emit_event(events.delete(deleted_host))
+        try:
This change actually modifies the behavior. Before the change, if a host had already been deleted, it was not added to the hosts_to_delete list. That caused the request to abort with 404 if no hosts were actually deleted, and those not-actually-deleted hosts were not logged. Now, all hosts to delete are logged even if they have already been deleted and raise the ObjectDeletedError here.
This might change the behavior a bit, but I'm not sure how big of a change it really is.
With this change, the only difference I see is that we might not send back a 404 in the case where all the hosts that were requested to be deleted were already deleted. (This situation is likely pretty rare anyhow.) I suspect that with the try/catch added around emit_event(), we shouldn't produce a delete event in the case where a host was already deleted (that would be the correct behavior in my opinion).
In either case (200 return code vs 404 return code), the host would be deleted and an event would be logged. I'm not sure that's a big difference.
"Encountered sqlalchemy.orm.exc.ObjectDeletedError exception during delete_by_id operation. Host was " | ||
"already deleted." | ||
) | ||
hosts_to_delete.append(host) |
query.all() actually returns a plain list, so we can do hosts_to_delete = query.all() directly. But as I noted in the next comment, we are now not checking in any way whether the host has already been deleted or not.
I guess we need to figure out how to eagerly load the host here, essentially doing manually what happens on the attribute access. I would not eagerly load all of them at once, just do something like an object identity check here.
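A minimal sketch of what triggering the deferred load explicitly might look like, so the race condition surfaces here instead of in a later side effect; whether this shape fits delete_by_id as written is an assumption:

```python
from sqlalchemy.orm.exc import ObjectDeletedError

hosts_to_delete = []
for host in query.all():
    try:
        host.id  # touching an attribute forces the deferred load right here
    except ObjectDeletedError:
        logger.debug("Host already deleted by a concurrent request, skipping.")
        continue
    hosts_to_delete.append(host)
```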
If we’re not going to eagerly load in this PR, please abbreviate this to
Suggested change:
-    hosts_to_delete.append(host)
+    hosts_to_delete = query.all()
With regards to the validation of the outgoing data, we are currently not performing a full validation of the data that the REST service returns. I do not feel that validating the fields in the delete event is required.
What is the worst case scenario? If we do not validate the data for the event, then I think the worst case scenario is that we send out an event that might contain canonical fact fields that contain bad data. If we validate the data, then we might not send out an event for a host that is deleted...thus potentially orphaning a host in the apps.
Hey guys, I really appreciate the discussion here, but I would like us to move ahead with the merge to stay within the original scope: currently clients are only interested in account_number, host_id and insights_id, and they are not going to be broken if we send extraneous invalid data outside these fields. Currently we do not validate our data output when requested via REST, so I don't think it's essential that we do that when emitting events. Blocking a delete event because of a validation failure of extraneous facts would be really bad. We do need to clean our database and purge invalid data, but let's do that as a separate task.
We’re already validating the event. It is a simple, safe validation, because it contained only id, timestamp and type. After the changes, even without the length and required rules, the validation is still there, listing the allowed fields. What about listing only the fields we really want to output – account and insights_id? account would be without a length limit, as we don’t have sanitized data in the database yet. That way it would be safe and at the same time it would clearly communicate the intended message format.
class HostEvent(Schema):
    id = fields.UUID()
    timestamp = fields.DateTime(format="iso8601")
    type = fields.Str()
    account = fields.Str(required=True)
    insights_id = fields.Str(validate=verify_uuid_format)
As for the changed behavior, I can understand that it’s not a problem. The exception catching was probably added because it was the first place where the error occurred, without a clear definition of what should be logged, what the response code should be, etc. in such a case. So even though it’s not very good practice, I agree that we can let it go like that. As long as I’m right that there is no defined correct behavior that we’d be breaking.
This is the most recent push with only the fields that we intend to validate at this time. I also included the request_id since it is requested in the JIRA ticket.
I can confirm the request ID is passed to the message. A request like this
DELETE {{host}}/api/inventory/v1/hosts/fe29c3b5-aa1e-4dae-8f05-4cb89d30cf7a
x-rh-identity: eyJpZGVudGl0eSI6eyJhY2NvdW50X251bWJlciI6ImFiYzEyMyIsIm9yZ19pZCI6ImJjZDIzNCJ9fQ==
x-rh-insights-request-id: some request id
Produces an event like this
{"request_id": "some request id", "account": "abc123", "timestamp": "2019-08-27T08:46:27.611355+00:00", "type": "delete", "id": "fe29c3b5-aa1e-4dae-8f05-4cb89d30cf7a", "insights_id": "ec76e4c2-ba1c-4e9b-9b94-463c637cbe44"}
👍🏻
I am just missing a test that would confirm that not only the default request ID (-1), but also the real one from the header, is passed through. That should be an easy fix.
I still don’t like the race condition error catching, because it catches only a side effect (event emitting) instead of clearly communicating what’s going on by triggering the lazy load manually. But that can be done in a subsequent PR, although I am not happy with that. See my inline comment.
I can confirm though that if the race condition happens, the error is caught, an exception is logged (although twice: once by the logger when logging the delete, and once by our code when emitting the event), the event emitting doesn’t crash, and an event is emitted only for hosts that were really deleted in the current request context.
As for the status codes, I can confirm that 404 Not Found is returned only if no records are found when querying. If the race condition occurs and any number of the hosts is already deleted, they are still logged as being deleted and the request returns 200 OK. This behavior should be tested and documented at least by tests, or fixed if it’s not correct. But again, that can be done in a subsequent PR.
If there is not going to be eager loading in the host iteration in this PR, please abbreviate the hosts_to_delete list population. See my inline comment.
    mac_addresses = fields.List(fields.Str())
    external_id = fields.Str()
    account = fields.Str(required=True)
    insights_id = fields.Str(validate=verify_uuid_format)
    request_id = fields.Str()
Does the request ID have a fixed format? Even if so, we probably don’t want to specify it here, as we’re only blindly forwarding it.
self.assertEqual("delete", event["type"]) | ||
self.assertEqual(self.added_hosts[0].id, event["id"]) | ||
self.assertEqual(self.added_hosts[0].insights_id, event["insights_id"]) | ||
self.assertEqual("-1", event["request_id"]) |
Can we test a real forwarded request ID too along with this default unknown one?
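A hedged sketch of such a test; the helper names used here (self.delete and self._get_emitted_event) and the route are assumptions, not necessarily what the existing test suite provides:

```python
def test_delete_event_forwards_real_request_id(self):
    request_id = "some request id"
    # Delete one of the prepared hosts while forwarding a request ID header.
    self.delete(
        f"/api/inventory/v1/hosts/{self.added_hosts[0].id}",
        headers={"x-rh-insights-request-id": request_id},
    )
    event = self._get_emitted_event()
    # The emitted delete event should carry the forwarded ID, not the default "-1".
    self.assertEqual(request_id, event["request_id"])
```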
-        logger.debug("Deleted hosts: %s", host_ids_to_delete)
+        logger.debug("Deleted hosts: %s", hosts_to_delete)
This works only thanks to the fact that the logger has its own exception handling mechanism. This logging call actually calls Host.__repr__ for every host, even for those that are already deleted. That method internally accesses Host.id, which triggers the lazy loading and bang, an uncaught exception occurs.
Hopefully, this is internally caught by the logger, output to stderr, and the execution continues normally. This is probably so that failing logging calls don’t cause unexpected termination. But if the logger’s interpolation weren’t used (e.g. f"Deleted hosts: {hosts_to_delete}"), this would explode spectacularly.
So, sorry, @dehort. I understand that you don’t want to have this pending any longer, but we’re really on thin ice here. The logging doesn’t crash only because it behaves unexpectedly wisely, and catching the error when emitting an event is just randomly putting a try-except on the first side effect that triggers the lazy loading.
If you really need to have this merged in the current state, please at least let @aleccohan fix this in a subsequent PR, planned properly in a sprint. This is not something we want in our codebase. It can go well with writing an automatic test for this.
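To illustrate the difference being described (a sketch, not project code): with %s interpolation the repr of the hosts is rendered inside the logging machinery, which catches the exception and prints it to stderr, while an f-string renders the repr before logging is even called, so the exception propagates.

```python
# Swallowed: repr() of the already-deleted host runs inside the handler's
# emit/format step, and the logging module catches the ObjectDeletedError and
# reports it to stderr instead of letting it propagate.
logger.debug("Deleted hosts: %s", hosts_to_delete)

# Explodes: the f-string calls repr() before logger.debug() is ever invoked,
# so the same ObjectDeletedError aborts the request.
logger.debug(f"Deleted hosts: {hosts_to_delete}")
```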
"Encountered sqlalchemy.orm.exc.ObjectDeletedError exception during delete_by_id operation. Host was " | ||
"already deleted." | ||
) | ||
hosts_to_delete.append(host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we’re not going to eagerly load in this PR, please abbreviate this to
hosts_to_delete.append(host) | |
hosts_to_delete = query.all() |
This has been merged although there are no tests for a request ID other than the default.
Needed to add insights_id to the delete event in order to ping the legacy API to duplicate the deletion. Also added instructions to the README to reflect the difficulty I had standing up the host-inventory platform locally.