Database format 21: add JSON data format #1786
base: master
Conversation
One thing I am not sure about. If you need to update a table from version 20 or older, you need to open it with the old schema. How does that work?
If we are moving from BLOBs to JSON then we should really use the new format. See PR #800. The main benefit of the new format is that it is easier to maintain and debug. Instead of lists we use dictionaries. So, for example, we refer to the field "parent_family_list" instead of field number 9. Upgrades are no problem. We just read and write the raw data. When I have more time I'll update you on the discussion whilst you have been away.
Oh, that sounds like a great idea! I'll take a look at the JSON format and switch to that. Should work even better with the SQL JSON_EXTRACT().
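As a sketch of what the JSON_EXTRACT() approach enables, here is a minimal example of querying a named field directly in SQL. The table and field names are illustrative, not the actual Gramps schema, and it assumes the bundled SQLite has the JSON1 functions (built in by default in modern SQLite).

```python
import json
import sqlite3

# Illustrative table: one row of JSON text per person (not the real schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
person = {"handle": "abc123", "gramps_id": "I0001",
          "parent_family_list": ["F0001"]}
con.execute("INSERT INTO person VALUES (?, ?)",
            (person["handle"], json.dumps(person)))

# Query a named field directly in SQL instead of unpickling a blob
# and indexing into a positional list.
row = con.execute(
    "SELECT json_extract(json_data, '$.gramps_id') FROM person"
).fetchone()
print(row[0])  # I0001
```

The point is that a query can refer to `$.gramps_id` by name, where the blob format would require loading and unpickling every row in Python first.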
There are a few places where the new format is used, so we will get some bonus performance improvements. Feel free to make changes to my existing code if you see a benefit. You may also want to have a quick look at how we serialize …
Making some progress. It turns out the serialized format had leaked into many other places, probably for speed. Those are probably good candidates for moving into business logic.
I added a …
@Nick-Hall , I will probably need your assistance regarding the complete save/load of the to_json and from_json functions. I looked at your PR but as it touches 590 files, there is a lot there. In this PR, I can now upgrade a database, and load the people views (except for name functions which I have to figure out).
Thanks @Nick-Hall, that was very useful. I think that I will cherry-pick some of the changes (like attribute name changes and elimination of private attributes). You'll see that I made many of the same changes you did. But one thing I found is that if we want to allow upgrades from previous versions, then we need to be able to read in blob_data and write out json_data. I think my version has that covered. I'll continue to make progress.
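The read-blob/write-JSON upgrade path can be sketched roughly as below. This is a hedged illustration only: the function name and the row contents are hypothetical, not the real Gramps record layout or upgrade API.

```python
import json
import pickle

# Hypothetical upgrade step: read the old pickled positional data and
# write it back out as self-describing JSON text.
def upgrade_row(blob_data: bytes) -> str:
    raw = pickle.loads(blob_data)   # old format: a positional list
    return json.dumps(raw)          # new format: JSON text

old_blob = pickle.dumps(["I0001", "Smith", 1970])
print(upgrade_row(old_blob))  # ["I0001", "Smith", 1970]
```

Because the old pickled values are plain lists and dictionaries, a round trip through `json.dumps` is enough to convert them without consulting the object model.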
@dsblank Why are you removing the properties? The validation in the setters will no longer be called. |
@Nick-Hall , I thought that was what @prculley did for optimization, and I thought it was needed. I can put those back :)
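For context on why the properties matter: a setter can validate an assignment, while a bare attribute cannot. A minimal sketch (the class and field names are illustrative, not the Gramps object model):

```python
# A property setter intercepts assignment, so invalid values are rejected
# at the point of mutation; replacing it with a plain attribute silently
# drops that check.
class Person:
    def __init__(self):
        self._gramps_id = ""

    @property
    def gramps_id(self):
        return self._gramps_id

    @gramps_id.setter
    def gramps_id(self, value):
        if not isinstance(value, str):
            raise TypeError("gramps_id must be a string")
        self._gramps_id = value

p = Person()
p.gramps_id = "I0001"   # validated assignment
```

Assigning a non-string, e.g. `p.gramps_id = 42`, raises `TypeError` instead of corrupting the object.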
Is there something that could be posted to the Discourse forum to stir up some energy? For instance, the Isotammi Filter+ gramplet shows a timer for filtering. Is there a filter (or custom filter) that would demonstrate the newly optimized difference when run on 5.2 vs. 5.3? Posting a capture of Filter+ timings on the Example.gramps database would encourage people to archive a similar test now in 5.2 on their real-world data, and again when 5.3 comes out. That might stir up some interest in optimizing the aging internal rules.
Probably not yet.
I haven't done much in the way of speeding up yet. This PR is merely a replacement of one representation with another. Tests in #1787 show that the two formats are similar in terms of timings.
That will be next! Nothing will change much until we can exploit the database level. I'll continue that work in #1785. I'm going to start working on getting the tests to pass. Then this should be ready for review.
@dsblank I'll fix the …
Oh great! I just hit bugs in the tests that I'm working on. Thanks!
Down to two categories of pytest errors and failures:
The second one seems like a random number change, but I'm still tracking it down. @Nick-Hall, I think this is all that is needed for the Date fix: d2c301a#diff-de77e4fb6f5c704f314759c3ec0ceddea525f966ddf056e6ee1dee0ba852c5f3L760-L763 But do see that the difference between …
@dsblank We should also update the JSON schema. See PR #1789. There is a unit test that validates the example database against the schema in schema_test.py. The JSON representation of an empty date is longer than null, as you point out. |
This PR adds a new column, "json_data", to all primary tables, containing JSON data (plain TEXT) encoded from the pickled blobs.
It doesn't remove the blob_data from existing databases, so that they can still be opened in earlier versions of gramps.
However, newly created databases no longer contain pickled data.
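The migration described above can be sketched as follows: add the json_data column alongside the existing blob_data, then backfill it row by row. The table name, column names, and row contents here are illustrative only, following the wording of the description rather than the actual Gramps upgrade code.

```python
import json
import pickle
import sqlite3

# Illustrative pre-upgrade table holding pickled blobs (not the real schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB)")
con.execute("INSERT INTO person VALUES (?, ?)",
            ("abc123", pickle.dumps(["abc123", "I0001"])))

# Add json_data next to blob_data; the blob column is left in place so
# older versions can still open the database.
con.execute("ALTER TABLE person ADD COLUMN json_data TEXT")
for handle, blob in con.execute(
        "SELECT handle, blob_data FROM person").fetchall():
    con.execute("UPDATE person SET json_data = ? WHERE handle = ?",
                (json.dumps(pickle.loads(blob)), handle))

row = con.execute("SELECT json_data FROM person").fetchone()
print(row[0])  # ["abc123", "I0001"]
```

A newly created database would simply omit the blob_data column and write json_data directly.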
Work in progress.