Data integrity validation #8016
jonasraoni
started this conversation in
Proposals
Replies: 1 comment
-
Reminder:
If it's not an atomic operation, there's no promise that data will still be ok at the step 3... As an example I can point out to the And using transactions will also improve some issues. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Vitaliy
The question about ensuring data integrity and DAO classes. Probably, the best way would be considering on an example with insert method in Category DAO, where categories are associated with publications:
pkp-lib/classes/category/DAO.inc.php
Line 148 in 70d7bb3
Let's say I'm trying to associate non-existing category ID or publication ID, in this case foreign key constraints will prevent this from happening (and this pattern we widely adopting). But what if I'm trying to associate publication with a category that exists but belongs to a different context? Should DAO class prevent such database operations?
asmecher
In my opinion this kind of check should be up to the Repository class to manage.
Jonas Raoni
We can't trust the user input, so that's a good practice, even though it's annoying/non-performatic =]
asmecher
Thinking about this a little further, I'd like to distinguish between "business rules" and "database integrity". Ideally the business rules should not go in the DAO, and data integrity enforcement should be done by the database.
However, Vitaliy’s example actually is a data integrity issue rather than a business rule, suggesting it should be handled by the database. We don't yet have any examples of this beyond simple foreign key integrity checks.
I think the "right" way to do it might actually be a database trigger on insertion. This would take a little research to make sure 1) it's supported by all MySQL / PostgreSQL we support, and ideally 2) the same syntax can be used (this might be unlikely).
Failing that, though, we'll have to come up with a workaround -- and for now I think that's the Repository class, in order to keep the DAO class streamlined.
Jonas Raoni
The trigger will probably require a bump to the base MySQL version we're supporting, I doubt someone is using PHP 8 with MySQL 4, so I'd vote to bump it at least to 5.7.22 (it will guarantee support for the
JSON_ARRAYAGG()
andJSON_OBJECTAGG()
).About the approach, I think it's ok (soft constraints, such as
status <> 0
are fine as well).I'd just advise against going further than that, debugging things that start in the code then pass through the database might become tough, especially that we're supporting more than one database.
asmecher
I think we should stop well short of moving business logic into the database, but I do like the principle of making it impossible to represent invalid data in the database. That should be done by good database design, but in the case of a relationship table per Vitaliy's example above, a trigger seems like the best approach.
Andrew Gearhart
Maybe this is better to continue in the testing group, but … I’d like to suggest that good database design can make a significant impact, but I don’t think it can do it alone. Particularly in the case where the database is going through many different types of transformations/moves/migrations.
There ought to be some way to validate/repair:
For example, we ran into an issue with this recently where pieces were missing from an object… and it resulted in the upload/select button not working (without a meaningful error being displayed to the end user).
Jonas Raoni
The schema validator is feasible (in case there's a ready library out there), I just know such libraries for C#/SQL Server. If not, we can just leave it as an advice and recommend some tools to do the comparison.
The auto-repair is more complex, as some modifications might be intentional (e.g. plugin creating fields/tables or an user trying to improve the performance of something), so we should at least show all the required changes 😁
This is going to get better in the next release, asmecher has been setting up the foreign keys.
natewr
Sorry for the late arrival to this conversation. One thing to consider is that designing a user-friendly application requires that constraints are checked at a layer in the code where errors can be handled sensibly.
Yes, this is a data integrity issue. But it's also a business logic issue, because any time someone is performing an action and runs into this data integrity issue, they need to receive a useful error message that they can act upon.
In other words, when taking an action, we need to perform data integrity checks during a validation step of some kind. If we add integrity checks to the DAO, these will have to be extra checks.
That's why I'm in favor of keeping the DAOs "dumb". A fatal error at this point is kind of unrecoverable, from a user's point of view.
To take this example, a database constraint error that fails at the code Vitaliy pointed out will leave malformed data in the database.
The user most likely submitted the publication form with published date, etc. If someone deletes a category while another person is saving a category assignment, the DAO insert will fail if the request hasn't been validated properly.
In such a circumstance, the request would result in a mixed response: some of the data saved correctly, some of the data didn't. It's very hard to design a UI that can clearly explain what happened to the user in such cases.
Beta Was this translation helpful? Give feedback.
All reactions