Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency with "source" attribute #4135

Open
pfalcon opened this issue Jul 6, 2024 · 4 comments
Open

Inconsistency with "source" attribute #4135

pfalcon opened this issue Jul 6, 2024 · 4 comments
Labels

Comments

@pfalcon
Copy link
Contributor

pfalcon commented Jul 6, 2024

An idea to have a field which identifies a source/producer of some entry in the PP data model is very useful one. And the first impression is "how smart PP to already implement that!". However, closer look shows that it's more of an unloved one, due to issues like 0617a99 .

Another sad case I'd like to share here. So, grepping thru models package, one can rejoice that SecurityEvent has a "source" attribute. But closer look shows that it's only for DividendEvent subclass! How could that be? Now, fixing that would be trivial, if not for proprietary binary formats.

Just to prove that there're inconsistency the the "source" attributes, let's look what else has them.
AttributeType has them, but why? Code like https://github.com/portfolio-performance/portfolio/blob/master/name.abuchen.portfolio/src/name/abuchen/portfolio/model/ClientSettings.java#L112 can only perplex. Like, source of "ter" attribute is ... "ter"? Really?

By common sense, what requires "source" is a value, not type. For example, you wrote a script which uses Site1's API to automatically query TER. Site1's API perhaps handles only subset of securities, and if you queried it once, you may want to re-query it again in the future. So, for values you actually set from Site1, you set source of "site1", and then during update operation, touch only values tagged like that (not touching values set by another script from Site2, and not blindly overwriting values manually entered by user (== empty source)).

Now next, Taxonomy also has "source", and one can again wonder why. By logical sense, even if some taxonomy is treated as "value", i.e. may be updated by automatic means, it can be done by the name of [root of] taxonomy. E.g., "Taxonomy from Site1". Everything under it would need to be synced with Site1. But yeah, I get the idea that "everything under" requires a recursive search, so having a "source" on taxonomy entries may be an optimization to avoid that.

But anyway, what happens here? So well, one can probably can read PP source code before sleep every night and find new exciting stories again and again. Because who might have thought, that hidden under the name "Sync with EODHistoricalData.com" in the "Online" menu, which is usually used to get quotes - there sits an ability to indeed sync a taxonomy hierarchy with external site and mapping between user's security and that taxonomy. Astonishing spywork from both EODHistoricalData.com and PP! Who's that telepath who'd know that EODHistoricalData.com is actually "We are end of date historical data dot com, but don't despair, we have things you'd never think we have if you looked only at the name". And PP? All could be easily solved by menu item name like "Sync taxonomies with EODHistoricalData.com". But no. Again, hide-and-seek prizes to both entities.

Anyway, back to "source" attribute inconsistencies. It's two different operations to update a taxonomy hierarchy vs updating assignments for taxonomy. Assignments are actual "value" which is the most useful to track. Taxonomies are more or less static, or at least belong to a particular source. For example, there can be more or less generally accepted taxonomy "Industries according to XYZ". Again, the source, XYZ, is encoded in the taxonomy name. However, it would be quite expectable situation that Site1 claims that stock FOO is a shipping stock, while Site2 claims that they make boots. So again, it's the source of assignment which should be tracked. So that if Site1 or Site2 mends their ways, only their erroneous assignments can be (automatically) updated.

@pfalcon
Copy link
Contributor Author

pfalcon commented Jul 6, 2024

If there ever would be any changes to SecurityEvent model, it's worth to think of: the whole idea of events is that they're shown on a chart. But you can put only so much (repetitive) text on a chart without clutter. But of course, one should still be able to read the full info (ideally via tooltip on the chart. If not, then in the event list, like now). So, there should be something like "label" or "title" field to be shown on the chart, in addition to existing "details".

And taking history lessons from #1604 (that people want more structured stuff, but it's boring to maintain models for them), then something like JSON-encoded "extra" field.

@buchen buchen added the question label Aug 2, 2024
@buchen
Copy link
Member

buchen commented Aug 2, 2024

Haha, source is a mess, no doubt about it.

Let's recap the "source" fields and their reasons to exist:

  • in the transaction, the source field contains the name of the PDF file. It is just for reference. The idea was to allow some "open pdf" functionality. However, that not trivial to implement because a) the app can run in a sandbox (macOS, Linux) and b) the PDF files can move
  • in the security event, the source field is used to understand from which source historical dividend payments are synced. For now only DivvyDiary. But this will be enhanced to manually add items. And maybe other sources.
  • in the attribute, the source field was introduced to allow syncing with external data source. Because the attribute has a UUID, I cannot reliable match it across different files. And using the label or column text also is language dependent. Alas, it is not really used at the moment.
  • in the taxonomy the source field was introduced for same reason: to know against which remote data to synchronize the taxonomy (it is then using the "properties" in the classification such as ISO codes).

The binary format is not preventing changes.

What are you proposing to change?

Astonishing spywork from both EODHistoricalData.com and PP!

I implemented that code some time back but never published it - in the sense of talking about it. It is marked as experimental for a reason. I never published it because it did not work well enough. For example, the country hierarchies for different instruments were different which made it very hard to sync. If you have here "Baltic" and then "Estland", it difficult to have on hierarchy for all instruments (I made this example up as I cannot remember exactly the differences). I haven't checked if it has improved now.

BTW, that was already the second attempt. The first attempt was a free service that unfortunately shut down after too many requests from PP users.

Anyway, Thomas and I are working on getting more classification data into Portfolio Report. And then I will revise the synchronization of the taxonomy again.

SecurityEvent [...] So, there should be something like "label" or "title" field to be shown on the chart, in addition to existing "details".

Fair point. It makes sense to have two properties: one as title, and one as the full text.

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 4, 2024

@buchen : Thanks for reply!

The binary format is not preventing changes.
What are you proposing to change?

So, specific proposal: Move "source" property from DividendEvent subclass to apply to the Event class in general. That's fairly trivial change, but would require migrating Protobuf binaries files, which I don't know how to do. So, let me try to prepare patch for the first part of it, and see if you can pick it up from there and handle Protobuf part. Thanks in advance.

And since that will require bumping file format version anyway, makes sense to add "description" field for Events at the same time, that will make Event schema "complete".

BTW, that was already the second attempt.

I keep being astonished by the functionality PP has, it's miles ahead of any other open-source project. But as the same time, it's sadenning, because with so much work already went into it, it's still maybe 50% of what a particular person wants, and given that these 50% are different for each person, it's unrealistic to fit everything into PP itself. Instead, it would rather be a generic (without adhoc limitations here and there) viewer/editor allowing easy external access to its data, letting people write standalone utilities to import external datasources. With access to PP data now much more streamlined, changes discussed in this ticket will allow external tools to reliably import and manage events data.

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 4, 2024

So, let me try to prepare patch for the first part of it, and see if you can pick it up from there and handle Protobuf part. Thanks in advance.

I posted #4181 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants