
Database format 21: add JSON data format #1786

Draft · wants to merge 42 commits into master

Conversation

@dsblank (Member) commented Oct 10, 2024

This PR adds a new column, "json_data", to all primary tables; it contains JSON data (stored as plain TEXT) converted from the pickled blobs.

It doesn't remove blob_data from existing databases, so they can still be opened in earlier versions of Gramps.

However, newly created databases no longer contain pickled data.
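A minimal sketch of the upgrade idea, assuming SQLite and a single person table with handle and blob_data columns (the names here are illustrative; the actual upgrade iterates over every primary table and goes through the object serializers):

```python
import json
import pickle
import sqlite3

def add_json_data(conn: sqlite3.Connection) -> None:
    """Sketch only: copy pickled blob_data into a new json_data TEXT column."""
    cur = conn.cursor()
    # Keep blob_data in place; just add the JSON column alongside it.
    cur.execute("ALTER TABLE person ADD COLUMN json_data TEXT")
    rows = cur.execute("SELECT handle, blob_data FROM person").fetchall()
    for handle, blob in rows:
        data = pickle.loads(blob)  # the old serialized tuple
        cur.execute(
            "UPDATE person SET json_data = ? WHERE handle = ?",
            (json.dumps(data), handle),
        )
    conn.commit()
```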

Work in progress.

@dsblank (Member, Author) commented Oct 10, 2024

One thing I am not sure about: if you need to upgrade a table from version 20 or older, you need to open it with the old schema. How does that work?

@Nick-Hall (Member) commented

If we are moving from BLOBs to JSON then we should really use the new format. See PR #800.

The new format uses the to_json and from_json methods in the serialize module to build the JSON from the underlying classes. It comes with get_schema class methods which provide a JSON Schema, allowing the validation that we already use in our unit tests.

The main benefit of the new format is that it is easier to maintain and debug. Instead of lists we use dictionaries, so, for example, we refer to the field "parent_family_list" instead of field number 9.

Upgrades are no problem. We just read and write the raw data.
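For reference, the round trip in that format looks roughly like this (a sketch; the module paths and methods are as used in PR #800 and may differ in the final version):

```python
from gramps.gen.lib import Person
from gramps.gen.lib.serialize import to_json, from_json

person = Person()
person.set_gramps_id("I0001")

text = to_json(person)        # JSON string with named fields such as "gramps_id"
copy = from_json(text)        # rebuilds the Person from the JSON string

schema = Person.get_schema()  # JSON Schema used for validation in the unit tests
```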

When I have more time I'll update you on the discussions that took place while you were away.

@dsblank (Member, Author) commented Oct 11, 2024

Oh, that sounds like a great idea! I'll take a look at the JSON format and switch to that. It should work even better with SQL's JSON_EXTRACT().
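For example (a sketch assuming SQLite's built-in JSON functions and a person table with the new json_data column), queries could pull individual fields straight out of the stored JSON:

```python
import sqlite3

conn = sqlite3.connect("example.gramps.db")  # hypothetical database path
cur = conn.execute(
    "SELECT json_extract(json_data, '$.gramps_id') FROM person"
)
for (gramps_id,) in cur:
    print(gramps_id)
```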

@Nick-Hall (Member) commented

There are a few places where the new format is used, so we will get some bonus performance improvements.

Feel free to make changes to my existing code if you see a benefit.

You may also want to have a quick look at how we serialize GrampsType. Enough information is stored so that we can recreate the object, but I don't think that I chose to store all fields.
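As a purely hypothetical illustration of that idea (none of these names are taken from the actual GrampsType code), an object that can be rebuilt from its integer value and custom string only needs those two fields in the JSON:

```python
# Hypothetical illustration, not the real Gramps serializer.
class FakeType:
    CUSTOM = 0

    def __init__(self, value=0, string=""):
        self.value = value    # predefined type id
        self.string = string  # custom text, only meaningful when value == CUSTOM

    def to_dict(self):
        # Store only what is needed to recreate the object; any derived or
        # display-only fields are deliberately left out.
        return {"_class": type(self).__name__, "value": self.value, "string": self.string}

    @classmethod
    def from_dict(cls, data):
        return cls(data["value"], data["string"])
```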

@dsblank (Member, Author) commented Oct 12, 2024

Making some progress. It turns out the serialized format has leaked into many other places, probably for speed. Those are probably good candidates for moving into business logic.

@dsblank (Member, Author) commented Oct 13, 2024

I added to_dict() and from_dict() methods based on to_json() and from_json(). I didn't know about the object hooks. Brilliant! That saves so much code.
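For anyone following along, the object hook mentioned here is the object_hook parameter of the standard json module; a minimal sketch of the pattern (not the actual Gramps code):

```python
import json

# Sketch: dicts carrying a "_class" key are turned back into objects while
# json.loads walks the tree, so no hand-written recursive decoder is needed.

class Person:
    def __init__(self, gramps_id=""):
        self.gramps_id = gramps_id

CLASSES = {"Person": Person}

def object_hook(obj):
    cls = CLASSES.get(obj.pop("_class", None))
    if cls is None:
        return obj
    instance = cls()
    instance.__dict__.update(obj)
    return instance

person = json.loads('{"_class": "Person", "gramps_id": "I0001"}', object_hook=object_hook)
print(person.gramps_id)  # I0001
```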

@dsblank (Member, Author) commented Oct 13, 2024

@Nick-Hall, I will probably need your assistance with the complete save/load using the to_json and from_json functions. I looked at your PR, but since it touches 590 files, there is a lot there.

In this PR, I can now upgrade a database and load the People views (except for the name functions, which I still have to figure out).

[screenshot]

@Nick-Hall (Member) commented

@dsblank I have rebased PR #800 on the gramps51 branch. Only 25 files were actually changed.

You can also see the changes suggested by @prculley resulting from his testing and performance benchmarks.

@dsblank (Member, Author) commented Oct 13, 2024

Thanks @Nick-Hall, that was very useful. I think I will cherry-pick some of the changes (like the attribute name changes and the elimination of private attributes).

You'll see that I made many of the same changes you did. One thing I found, though, is that if we want to allow upgrades from previous versions, we need to be able to read in blob_data and write out json_data. I think my version has that covered.

I'll continue to make progress.

@Nick-Hall (Member) commented

@dsblank Why are you removing the properties? The validation in the setters will no longer be called.
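For context, the properties in question look roughly like this (an illustrative sketch with made-up constants, not the exact Gramps code); replacing the property with a plain attribute means assignments skip the check in the setter:

```python
class Person:
    def __init__(self):
        self._gender = 2  # illustrative: 2 = unknown

    @property
    def gender(self):
        return self._gender

    @gender.setter
    def gender(self, value):
        # This validation is silently lost if the property is removed and
        # callers assign to a plain attribute instead.
        if value not in (0, 1, 2):
            raise ValueError("gender must be 0 (female), 1 (male) or 2 (unknown)")
        self._gender = value
```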

@dsblank (Member, Author) commented Oct 14, 2024

@Nick-Hall, I thought that was what @prculley did for optimization, and I thought it was needed. I can put those back :)

@dsblank (Member, Author) commented Oct 28, 2024

Major milestone reached: the database upgrade works, and all of the views can now be rendered using the raw JSON data. Lots of work still to be done, but I wanted to take a minute to give a status update.

[screenshot]

@emyoulation (Contributor) commented

Is there something that could be posted to the Discourse forum to stir up some energy?

For instance, the Isotammi Filter+ gramplet shows a timer for filtering. Is there a filter (or custom filter) that would demonstrate the newly optimized difference when run on 5.2 vs. 5.3?

Posting a capture of the Filter+ timings for the Example.gramps database would encourage people to archive a similar test of their real-world data now in 5.2... and again when 5.3 comes out. That might stir up some interest in optimizing the aging internal rules.

@dsblank (Member, Author) commented Oct 29, 2024

> Is there something that could be posted to the Discourse forum to stir up some energy?

Probably not yet.

> For instance, the Isotammi Filter+ gramplet shows a timer for filtering. Is there a filter (or custom filter) that would demonstrate the newly optimized difference when run on 5.2 vs. 5.3?

I haven't done much in the way of speeding things up yet. This PR merely replaces one representation with another. Tests in #1787 show that the two formats have similar timings.

> Posting a capture of the Filter+ timings for the Example.gramps database would encourage people to archive a similar test of their real-world data now in 5.2... and again when 5.3 comes out. That might stir up some interest in optimizing the aging internal rules.

That will be next! Nothing will change much until we can exploit the database level. I'll continue that work in #1785.

I'm going to start working on getting the tests to pass. Then this should be ready for review.

@Nick-Hall (Member) commented

@dsblank I'll fix the Date in the JSON schema for you tomorrow.

@dsblank (Member, Author) commented Oct 29, 2024

Oh great! I just hit bugs in the tests that I'm working on. Thanks!

@dsblank (Member, Author) commented Oct 30, 2024

Down to two categories of pytest errors and failures:

  1. vCard-related failures
  2. 'Test Case Generator' and 'Check & Repair Database'

The second one seems to be caused by a random number change, but I'm still tracking it down.

@Nick-Hall, I think this is all that is needed for the Date fix: d2c301a#diff-de77e4fb6f5c704f314759c3ec0ceddea525f966ddf056e6ee1dee0ba852c5f3L760-L763

But note that the difference between null and {'format': None, 'calendar': 0, 'modifier': 0, 'quality': 0, 'dateval': [0, 0, 0, False], 'text': '', 'sortval': 0, 'newyear': 0, '_class': 'Date'} makes a big difference in the length of the JSON object.
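One possible way to avoid that overhead (a sketch only; date_to_dict and date_from_dict are hypothetical helpers standing in for whatever the PR settles on) is to write null for a default Date and turn null back into an empty Date when reading:

```python
from gramps.gen.lib import Date

def date_to_jsonable(date):
    """Sketch: emit None (JSON null) instead of the full dict for an empty Date."""
    if date is None or date.is_empty():
        return None
    return date_to_dict(date)  # hypothetical full serializer

def date_from_jsonable(value):
    """Sketch: treat JSON null as an empty Date when deserializing."""
    return Date() if value is None else date_from_dict(value)  # hypothetical
```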

@Nick-Hall (Member) commented

@dsblank We should also update the JSON schema. See PR #1789. There is a unit test that validates the example database against the schema in schema_test.py.

The JSON representation of an empty date is longer than null, as you point out.
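For reference, the check in the unit tests boils down to something like this (a sketch; the real test in schema_test.py validates the example database):

```python
import json

import jsonschema
from gramps.gen.lib import Person
from gramps.gen.lib.serialize import to_json

person = Person()
person.set_gramps_id("I0001")

# Raises jsonschema.ValidationError if the serialized object no longer
# matches the schema published by get_schema().
jsonschema.validate(instance=json.loads(to_json(person)), schema=Person.get_schema())
```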
