Statistical view of photoanalysis scores #1173

cfc62 · 2023-08-21T17:05:46Z

cfc62
Aug 21, 2023

This is graphical, so I couldn't put it in the reddit thread...

Here's the distribution of overall scores for my library:

Pretty much a bell-shaped curve. I do take a lot of snapshots as well as photos so this isn't all that surprising. The snapshots are taken as memory joggers and not necessarily for artistic content.

Curation:

I wonder if everything starts with 0.5000 and things get distributed from there.

If we work from that assumption, then it's curious that few photos are actually curated. NOTE: for this test I un-did all of my favorites. I will re-favorite photos and see if there's any change in curation scores over time.

For the rest of these, there's not a lot of commentary yet; I still need to compare pictures to the scores to get an idea of what's being scored... the name of the score is in the screenshot.

For sharply focussed it starts from zero and goes up:

I suppose Photos is giving me just 0.2 credit for taking well-composed shots... lol

I take a lot of night-time and sunset/sunrise photos, so this may account for this:

I take that back on well-composed shots...

Noise only hurts, which would be interesting if you're going for the grain look:

Another bell curve for lively color:

It does catch when there's something in the picture that isn't helping the scene. Again, likely due to me taking snapshots as opposed to pictures:

Photos does not like my subjects!

No idea of what this is yet:

Or this, but it's got a log function in it...

This seems to get a boost in sunsets/golden hour/blue skies...

I find failure to be pretty good at spotting mistakes.... but not perfect.

No idea here...

The ones with the high score are almost always videos or timelapses:

Next up, correlations to overall score in the next post.

cfc62 · 2023-08-21T17:22:09Z

cfc62
Aug 21, 2023
Author

I took each score and correlated it to score_overall.

I'll start with the strongest correlations and work my way down. I won't show items like effect size, but you can infer it from the headline of each graphic. All correlations are waayyyy below p=0.05, usually around 0.00001.
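For anyone who wants to repeat the correlation step without a stats package, here is a minimal sketch in plain Python. It assumes the scores have been exported as one dict per asset; the function names and the row layout are hypothetical, not part of osxphotos.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def rank_correlations(rows, target="score_overall"):
    """Return (score_name, r) pairs sorted by |r|, strongest first."""
    names = [key for key in rows[0] if key != target]
    ys = [row[target] for row in rows]
    results = [(name, pearson_r([row[name] for row in rows], ys)) for name in names]
    # Undefined correlations (NaN) sort last.
    return sorted(results, key=lambda pair: 1.0 if math.isnan(pair[1]) else -abs(pair[1]))
```

Calling `rank_correlations(rows)` on the exported rows gives the same "strongest first" ordering used in this post, though without p-values or effect sizes.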

3 replies

cfc62 Aug 21, 2023
Author

As it turns out, we can build a pretty darned good regression model...

That is, when we look at all of the scores, we can find what drives the score_overall value the most.

3 variables don't contribute and were removed:

lively_color
pleasant_pattern
pleasant_symmetry

After those are removed, we get a very nice model with high accuracy (r-sq of 92%).

The top contributors to score_overall are:

interesting_subject @ 16% contribution
pleasant_perspective @ 12% contribution
pleasant_lighting @ 11% contribution
harmonious_color @ 11% contribution
pleasant_composition @ 9% contribution
pleasant_camera_tilt @ 6% contribution
pleasant_post_processing @ 5% contribution
tastefully_blurred @ 4% contribution (this is bokeh and if I had more photos with bokeh this would probably rise)

Followed in order by well_framed_subject, pleasant_reflection, curation, intrusive_object_presence, noise, failure, well_timed_shot, immersiveness, well_chosen_subject, sharply_focused_subject, behavioral and low_light.

While these variables are important to the model, they get 0% contribution: interaction and highlight_visibility.

The predicted score_overall matches quite well to the actual score_overall:

Now, these are just the scores from photoanalysis. I will be adding camera/EXIF data to the mix as well, perhaps even some keywords.

Again, this is for my library. While I suspect the AI that's baked into Photos is the same for everyone, AI can and does do some odd things.
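The r-squared figure quoted above measures how much of the variation in score_overall the model explains. A minimal sketch of that calculation, assuming you already have lists of actual and predicted values from whatever regression tool you use (the function name is hypothetical):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    # Total variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left unexplained by the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

A value of 0.92 would mean the predictions leave about 8% of the variance in score_overall unexplained.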

RhetTbull Aug 21, 2023
Maintainer

Nice work!

cfc62 Aug 21, 2023
Author

I'm also running cluster analysis but so far the results aren't anything really insightful.

RhetTbull · 2023-08-21T17:37:58Z

RhetTbull
Aug 21, 2023
Maintainer

Interesting -- thanks for sharing! I wonder how the overall distribution would change if you exclude screenshots and documents? These can be excluded by adding the following to the query:

--query-eval "'Document' not in photo.labels" --not-screenshot

(Assumes an English locale; otherwise the 'Document' label might be something else)
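The same filter can be sketched in Python for use in a script. This assumes each photo object exposes a `labels` list and a `screenshot` flag, which is how osxphotos' PhotoInfo behaves to my knowledge; the stand-in objects below are purely illustrative.

```python
from types import SimpleNamespace


def is_regular_photo(photo, document_label="Document"):
    """True if the photo is neither a screenshot nor labeled as a document."""
    labels = photo.labels or []
    return not photo.screenshot and document_label not in labels


# Illustrative stand-ins for real photo objects (hypothetical data).
examples = [
    SimpleNamespace(labels=["Document", "Text"], screenshot=False),
    SimpleNamespace(labels=[], screenshot=True),
    SimpleNamespace(labels=["Sky", "Sunset"], screenshot=False),
]
regular = [p for p in examples if is_regular_photo(p)]
```

As with the command-line version, `document_label` would need to change for a non-English locale.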

1 reply

cfc62 Aug 21, 2023
Author

It most likely wouldn't change very much as I don't have a lot of them, maybe 100 out of 19,000 data points...

But I'll be taking a look regardless.

RhetTbull · 2023-08-22T20:59:09Z

RhetTbull
Aug 22, 2023
Maintainer

I posted this on reddit but re-posting here as it's easier to track the discussion here. Some additional details on what photoanalysisd is doing.

I think we can probably get more details about the analysis state from the database itself, which would be more accurate than looking at scores. For example, for two photos, one that is analyzed (has scores) and one that isn't, the following are interesting (done inside the osxphotos interactive REPL):

>>> selected[0].tables().ZASSET.ZPHOTOANALYSISATTRIBUTES
(22510,)

>>> selected[0].tables().ZASSET.ZMEDIAANALYSISATTRIBUTES
(186,)

>>> selected[0].tables().ZASSET.ZANALYSISSTATEMODIFICATIONDATE
(714412529.694129,)

>>> selected[0].tables().ZADDITIONALASSETATTRIBUTES.ZSCENEANALYSISTIMESTAMP
(712063504.375,)

>>> no_score[0].tables().ZASSET.ZPHOTOANALYSISATTRIBUTES
(None,)

>>> no_score[0].tables().ZASSET.ZMEDIAANALYSISATTRIBUTES
(None,)

>>> no_score[0].tables().ZASSET.ZANALYSISSTATEMODIFICATIONDATE
(None,)

>>> no_score[0].tables().ZADDITIONALASSETATTRIBUTES.ZSCENEANALYSISTIMESTAMP
(None,)

So there's a timestamp of when the analysis was done. The ZPHOTOANALYSISATTRIBUTES and ZMEDIAANALYSISATTRIBUTES columns are foreign keys to tables with the same name. Perhaps there's a revisit rate based on ZANALYSISSTATEMODIFICATIONDATE or ZSCENEANALYSISTIMESTAMP. I'll try to test this by writing a script that logs these and does a diff to see if the delta is consistent.
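To make those raw timestamps readable, here is a small sketch. It assumes the columns hold Core Data style timestamps, i.e. seconds since 2001-01-01 00:00:00 UTC. That is the usual convention for Core Data stores but is not confirmed in this thread; the decoded date below is at least consistent with when this was posted.

```python
from datetime import datetime, timedelta, timezone

# Assumption: values are seconds since the Core Data reference date.
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def cocoa_to_datetime(seconds):
    """Convert a Core Data timestamp to an aware UTC datetime (None stays None)."""
    if seconds is None:
        return None
    return COCOA_EPOCH + timedelta(seconds=seconds)


modified = cocoa_to_datetime(714412529.694129)
print(modified.date())  # 2023-08-22
```

Logging these converted values at intervals and comparing consecutive runs would be one way to look for a consistent revisit delta.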

3 replies

cfc62 Aug 22, 2023
Author

Cool!

I blew up my test library on my MBP today somehow when creating a new album with low scores of less than 0.3. I didn't realize how long it would take (it was running for more than 30 minutes), so I decided to hit Control-C and dumped out. However, something I did also left the database in an odd state, and when I tried some subsequent album creations based on scores, they would maybe finish but also print a lot of what looked like diagnostic output.

Also after that point the 'inspect' functionality also stopped working. It then got to a point where I had to force quit both the Terminal and Photos apps, neither would stop running.

I rebooted and the bad behavior continued, so I decided to rebuild the library, which took about 20 minutes and things have been fine since then.

(Note: this is on me, not on osxphotos. Measure twice, cut once and always work on test data on a test machine!)

After the rebuild, all scores were zeroed out. That alone is a bit interesting, but I've been interested in starting over again anyways with scores so actually a nice side benefit.

That's the prologue. What I've found as I've let photoanalysisd do its thing is that it's pretty darned quick on an M1 machine: it only took about an hour to get through about 80% of the images. But curation must be another pass as there's nothing but 0.0s in there currently.

On my fairly high-spec Intel iMac, it would take days using fast RAID-0 Thunderbolt drives - I could see the drive LEDs flicker for days and days. Clearly the extra neural engines on the M1 silicon are in play here somehow.

Before I rebuilt the library, I looked through the very low score photos, and it's disappointing.

Disappointing in that there are memorable, important and even very nice photos in that bunch. They may not be artistic, but they are still valuable to me, and I am hoping Apple doesn't somehow programmatically relegate them to the trash bin or decide to not show them to me in favor of other photos.

Are there some truly low quality photos in there that I can cull? Yep. But I'd say at least half of the very poorly scoring photos are still valuable to me.

RhetTbull Aug 22, 2023
Maintainer

blew up my test library on my MBP today somehow when creating a new album with low scores of less than 0.3. I didn't realize how long it would take (it was running for more than 30 minutes),

The "add to album" feature uses AppleScript which is buggy and slow. I'm working on a way to do this programmatically via the native API which would be as fast as doing it in Photos. Hacked on it today and made some progress.

cfc62 Aug 23, 2023
Author

It would at least appear that the analysis is done in phases.

That is, nearly all of the scores are generated quite quickly, but not all of them.

interaction, behavioral and curation come after the first phase. Interestingly, I still have some completely unscored images in my (rebuilt) library, so whatever these phases are apparently overlap to some extent - the first phase isn't complete when the next analysis phase begins.

All three of these scores are 0 to begin with.

At some point, curation splits into a series of suspiciously round numbers, plus reals. The round numbers are 0.25, 0.5, 0.6, 0.625, 0.75 and then everything is reals above that point. My guess is that there's some formulaic approach to come up with a number here quickly and then let ML process the image further at some later point. Letting the machine run overnight gives us 80% of the images tagged with 0.5, so I suspect that's the starting default value. About 5% are tagged with either 0.25, 0.6, 0.625 or 0.75 and we still have 4% with 0 score, with the remaining 11% or so having a real, unrounded figure assigned (e.g. 0.63877302).

The first snapshot I took was all 0s for curation. The next had 0 and 0.5, the one following that had all of the round numbers except 0.75, and then on the last one 0.75 appeared.

I tend to think those real, unrounded figures are the ones that have been more deeply processed, and so while yes many of the scores are generated within hours, there's much less progress on curation.

Note, all of the curation values that are real are above 0.625.
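The breakdown above (unscored, round placeholder values, real values) can be reproduced with a short sketch like the following. The category names and the tolerance are assumptions chosen for illustration; the tolerance is there because the stored values look like single-precision floats.

```python
from collections import Counter

# Round values reported in this thread; treated here as placeholders (an assumption).
PLACEHOLDER_VALUES = (0.25, 0.5, 0.6, 0.625, 0.75)


def classify_curation(value, tol=1e-6):
    """Label a curation score as unscored, a round placeholder, or a real value."""
    if abs(value) <= tol:
        return "unscored"
    for placeholder in PLACEHOLDER_VALUES:
        if abs(value - placeholder) <= tol:
            return f"placeholder {placeholder}"
    return "real"


def curation_shares(values):
    """Fraction of assets falling into each category."""
    counts = Counter(classify_curation(v) for v in values)
    total = len(values)
    return {label: count / total for label, count in counts.items()}
```

Running `curation_shares` on successive exports would show the categories shifting as photoanalysisd works through the library.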

behavioral is a bit different. Here there don't appear to be any default or formulaic numbers; either the figure is 0 (likely unprocessed) or a real, and the real range here is from 0.100000001 to 0.911086917. Here, 24% have a real value, so we're ahead of curation in that respect (which looks to be at 11%).

When I look at the 0 scores for behavioral, every single one of them corresponds to a placeholder value for the curation score. So I tend to think that the generation of the curation score has a dependency on behavioral being generated.

interaction is interesting as well. First snapshot was all 0s, then the second snapshot showed 97% completion, but to an odd set of values: 0.01, 0.02, 0.0399999999, 0.05999999 ... to a maximum of 0.37999999. 87% of values are 0.01.

So, to get a curation score we need to have both an interaction score and a behavioral score, but just because we have those scores available doesn't mean a curation score is also generated just yet.
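One way to keep testing that dependency as more scores arrive is to search for counterexamples: assets with a behavioral score of 0 but a real (non-placeholder) curation value. This is a sketch under the assumption that 0 counts as a placeholder alongside the round values listed earlier; the row layout is hypothetical.

```python
# Assumption: 0 plus the round values reported in this thread are placeholders.
PLACEHOLDERS = (0.0, 0.25, 0.5, 0.6, 0.625, 0.75)


def is_placeholder(value, tol=1e-6):
    """True if value matches one of the placeholder curation values within tol."""
    return any(abs(value - p) <= tol for p in PLACEHOLDERS)


def find_counterexamples(rows, tol=1e-6):
    """Rows where behavioral is 0 but curation already holds a real value."""
    return [
        row
        for row in rows
        if abs(row["behavioral"]) <= tol and not is_placeholder(row["curation"], tol)
    ]
```

An empty result is consistent with curation depending on behavioral; any row returned would argue against it.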

I have never seen promotion be anything other than 0 so far. Perhaps these are likes from shared album users, which I don't use yet.

It doesn't appear that score_overall changes over time, but I've only done a spot check so far. It looks to be generated once and then not changed, but I'll keep looking at this.

My guess is:

ML chews through the originals very quickly and assigns nearly all scores; this is on the order of a few hours for 100k images
Even while the first step above is still progressing, interaction analysis starts
Even while the steps above are still progressing, behavioral analysis starts
Even while all of the above steps are progressing, curation analysis starts on those assets where all other scores are generated

I'm not sure if curation is applied to all images.

Final notes, this library is on my M1 MBP and is on the internal drive. This library is the same as the other library that I analyzed above except I kept favorites in the mix here, and it's internal, which supposedly means additional spotlight capabilities.

cfc62 · 2023-08-24T16:06:44Z

cfc62
Aug 24, 2023
Author

A few observations at this point:

It's clear that whatever curation is only gets applied to a very small subset of photos; I have some ideas here but need more time to work on theories on what does/does not get curated and then to test those theories. On both my M1 MBP (with library just rebuilt) and M1 Mac mini (library running for weeks uninterrupted) the same small set of images were "chosen" for curation, and that set isn't being added to over time that I can tell so far
The analysis loves bokeh, flowers, and people; since I shoot mainly landscapes and architecture and such, I believe my photos are scoring lower
The scores are nonsensical at times, on both ends of the ranges, meaning you can have great images that have poor scores and vice versa

To some extent, I wonder if I'm looking at an incomplete scoring system, or at least one that's still a work in progress - or one that will be constantly evolving.

That is, while the db schema exists with score entries and there are values generated, that doesn't mean the models are actually working well and Apple could be tuning these over time, or releasing functionality over time (quietly, of course, as is their way). I'd wager there's a photo scoring analysis roadmap somewhere and there are plenty of items yet to be tuned, worked on or released.

And it could also be that some of these scores - current or future - depend on/will depend on ML machine hardware to either execute or execute efficiently, and I might not have the hardware that unlocks everything. This may also explain why an Intel system chews on the library for weeks and weeks while M1 hardware is essentially complete within a day or so.

Regardless of the above, I find this all fascinating and will continue looking into these scores.

I'll load in the post-rebuilt MBP scores into the stats tool and look for any meaningful differences, and the same for the Mac mini's database. Note, the MBP had favorites included in the library so that will let me know if favoriting anything changes scores or scoring methods in any way.

I'll also add camera and other information into the export query and re-analyze the data set. I've long wondered if somehow Apple isn't biasing results towards iPhone photos, that's one thing I'd like to look at. If I can figure out how to properly export generated tags so they can be correctly analyzed by the stats package, that's something I'd like to do.

While I've been burned by taking beta macOS releases in the past, perhaps Sonoma is stable enough for me to try the new Photos version. Here, I'd create a new user account on my MBP, then create a small, new Sonoma-version photoslibrary without copying images into the library, importing instead from the existing originals directory; my assumption is that EXIF data is not changed/removed for the originals. I suppose I should double-check that.

TL;DR: I suspect photoanalysis is a work in progress by Apple and it continues to evolve, so any findings I may come up with could be invalidated by a software update or model update, but I'm still delving into this for curiosity's sake.

4 replies

RhetTbull Aug 24, 2023
Maintainer

my assumption is that EXIF data is not changed/removed for the originals. I suppose I should double-check that.

In my experience, originals are never modified when importing.

Since you have an M1 you can use VirtualBuddy to virtualize a Sonoma machine for testing. I am doing that for osxphotos development. Works well but you can't (easily) login to iCloud from a virtual machine.

cfc62 Aug 24, 2023
Author

Thanks for the tip on VirtualBuddy, will check that out!

I did check the EXIF data and it looks intact. Clearly the filename changes to a unique, hashed value but that looks like the end of the changes.

cfc62 Aug 24, 2023
Author

I've been told that sometimes I can't let things go... lol

What I didn't notice in the screenshots above is that for some reason my stats package decided to take only a 20% sample of the data that I have. I fixed that, and... nothing changes. The graphs are smoother, the correlation graphs are also smoother, but the results are the same as before. I also filtered out 0 scores this time; should have done that the first time but oops.

I thought I might see some improvement in the regression model with an increase in data, but it's almost exactly the same. Still with 92% r-squared and the same contribution levels as I wrote above.

One thing I can say, tagging something as a favorite has no impact whatsoever; I included what I had favorited in this run and... it has no impact. It contributes 0 to the regression model. Apple may be missing an opportunity here to start to understand what sort of photos I like...

I still find it curious that score_overall has a bell-shaped distribution. Is there some sort of forced ranking going on here? I find it hard to believe that somehow I take photos that wind up in this distribution...

RhetTbull Aug 24, 2023
Maintainer

Interesting!

One thing I can say, tagging something as a favorite has no impact whatsoever

If you could find a correlation between favorite and other scores, it would be possible to write an osxphotos plugin that proposed "possible favorites" and added those to an album.

I still find it curious that score_overall has a bell-shaped distribution

Definitely curious. I'd have expected the distribution to be skewed towards higher scores as most of the time, the user is framing the shot and attempting to produce a good output.
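The "bell-shaped versus skewed" question can be put to a number with sample skewness. The sketch below uses the simple moment-based definition (third central moment divided by the 1.5 power of the second); the function name is hypothetical and this is only one of several skewness estimators.

```python
def skewness(values):
    """Moment-based skewness: 0 for symmetric data, negative for a tail toward low values."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical
    return m3 / m2 ** 1.5
```

If most scores piled up at the high end with a tail toward low scores, the result would be clearly negative; a value near 0 is what a symmetric bell shape gives.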

cfc62 · 2023-08-24T21:42:41Z

cfc62
Aug 24, 2023
Author

Looking at this a different way, even though almost all of the curation values are set to round values and there's a definite lack of deeper curation analysis, it's the only variable that has a relationship to favorited photos. It is a pretty strong relationship though, regression analysis above notwithstanding.

OK, that's not strictly true, as highlight_visibility also has a correlation to favorites, but I suspect that's because I've favorited all of my time lapse videos in the past to try to promote them manually - and highlight_visibility is almost always an indication of a video of some sort.

Back to curation, a shocking 98% of my favorited photos/videos have a curation score of 0.75, the other 2% are all above 0.8.

Note, I stopped favoriting photos a couple of years ago; to make this accurate I'll have to go through my library and select more. I'll do that but with 75,000 photos to look at it won't be quick (and if I cheat and use existing scores I'll likely bias my selection).

So I wouldn't draw any conclusions from this just yet.

As mentioned before, I don't think Photos is scoring my favorites higher, it's more like we "agree" that the photos I selected as favorites are curated at 0.75 or higher. But with more favoriting this may change.


cfc62 · 2023-08-24T21:57:38Z

cfc62
Aug 24, 2023
Author

Before I forget, one of the things that frustrated me very much and got me looking into this analysis in the first place is the "Days" tab of the Library.

I really like this presentation. As far as I know, it's the only place where videos get presented alongside photos. It's easily digestible and just pleasant overall.

The issue is that I cannot get items into this view. Sure, I can remove ones I don't like, but somehow, some way, Photos is making the decision on what to display there.

Is there a way to dump a list of all of the assets shown in that view? It's persistent so I'd have to think it's in the db somewhere...

6 replies

RhetTbull Aug 25, 2023
Maintainer

By the way, you can see the schema in the wiki or by running:

sqlite3 ~/Pictures/Photos\ Library.photoslibrary/database/Photos.sqlite ".schema"

cfc62 Aug 25, 2023
Author

Wow, you just keep giving me more red meat... lol.

One interesting thing about the "Days" view is that it is the only view I have ever seen that specifically mentions curating photos. When the analysis phase is still underway, say after adding another batch of photos, the very bottom of the Days tab will have a message saying that curation is underway, and if I remember right it had a thermometer bar in it as well. This message of course disappears when whatever curation process they're referring to completes, so if you were to look right now there may be no message there at all.

I see some crazy things in the schema I definitely want to view. I'm starting to think there's some other value per asset in the db that controls presentation as opposed to scores. I don't have anything to back that up but I'm going to look.

I'm going to start a different thread on mdls output, as I'm seeing some off the wall stuff in there as well - but only for photos outside of the Photos library. IIRC someone had started a thread on outside-of-Photos data; I may add it there. I won't put it here on GitHub for osxphotos as it's outside scope, but it's also relevant to what we've been discussing here.

Spoiler, there's a LOT of data being captured in spotlight that is not in Photos, especially for iPhones.

RhetTbull Aug 25, 2023
Maintainer

If you want to explore the mdls (and other data), I wrote a tool for that too. I doubt Photos is using this for curation because it's computed on device and would have to be synced to iCloud to keep the views consistent but you never know.

If you discover values in the database that would be useful let me know as it's fairly easy to add those to osxphotos.

RhetTbull Aug 25, 2023
Maintainer

When the analysis phase is still underway, say after adding another batch of photos, the very bottom of the Days tab will have a message regarding curation and it's underway and if I remember right it had a thermometer bar

One thing you could do is add a bunch of photos, immediately run osxphotos snap then wait until curation is done and run osxphotos diff

These commands create a temporary snapshot of the database, then list the differences (as a SQL diff) in the database.

cfc62 Aug 25, 2023
Author

I found the spotlight tool right after I wrote my response, ha.

I don't think there's a connection between the analysis that spotlight does vs. Photos but some things appear to be in common. I think we already know that dev groups within Apple don't always talk or coordinate between themselves.

I'll try the snap at some point soon.

cfc62 · 2023-08-27T20:58:42Z

cfc62
Aug 27, 2023
Author

Curiosity got the better of me so I've been playing with Sonoma.

It hasn't been a smooth ride. Installation to an external USB SSD failed twice, then I tried a TB-based SSD and that worked. But let's just say switching back and forth between boot drives has been a nightmare. Twice I was left with a completely unresponsive MBP, which isn't a great feeling. Turns out regardless of battery level you MUST be attached to power to reliably change boot devices.

As far as Photos goes, attempting to import more than a handful of originals fails with "unknown error" for 99% plus of the photos when trying to bulk import.

But if I choose a few hundred, they import fine.

One note, Photos does change originals to a unique hashed filename. I'm wondering if that's somehow impacting the importation process. So they're not completely untouched. Whether or not that makes a material difference... shrugs.

In Sonoma, I did not add the originals to the library; I wanted to see if that had any impact on the photoanalysisd process.

So far, things seem about the same but it's early. Somewhat surprisingly, the originals don't look like they're being read during analysis. The drive with the referenced originals on it has a power/activity light and it powers down after a few minutes if it's not being used; I haven't seen it light up once during analysis, but the external boot drive's activity LED has been flickering. That means lower quality renders are tapped for analysis.

Perhaps the generated tags are a bit more specific and more words are generated, but again it's early. I'll be trying to compare scores as well, especially photo-to-photo.

Also looks like my suspicion that I've been carrying along cruft from previous versions back to iPhoto and Aperture is correct, but also early days there. But now there's just a handful of directories.

1 reply

RhetTbull Aug 27, 2023
Maintainer

Turns out regardless of battery level you MUST be attached to power to reliably change boot devices.

I just use virtualization instead of dual booting. Very easy to spin up an instance of Sonoma for testing.

One note, Photos does change originals to a unique hashed filename. I'm wondering if that's somehow impacting the importation process. So they're not completely untouched. Whether or not that makes a material difference... shrugs.

Since Catalina (10.15), Photos has renamed imported photos to match a universally unique ID (UUID) (referred to by Photos as the "local identifier") -- this is unique to the particular library on a particular Mac. If you rebuild the library, the photo will likely get a new UUID. The same photo on another Mac will have a different UUID. But it is very convenient as the UUID is the same as the key in the Photos database for that photo so it's easy to find all assets associated with a photo without even looking in the database.

Other than the name change, Photos does not modify the original photo. I just ran a test to verify this: created a photo called test_photo.jpg. Imported that to Photos. Took md5 hash of both the original and the imported copy and they are identical:

# hash of original
❯ md5 test_photo.jpg
MD5 (test_photo.jpg) = 59dac66d9967c24e6a1a5be587273f08

# find path to the imported photo
❯ osxphotos query --quiet --print "{photo.path}" --name "test_photo.jpg"
/Users/rhet/Pictures/Test-13.0.0.photoslibrary/originals/D/D13EB1B5-DF50-4F66-9628-39FD43AA69B7.jpeg

❯ md5 /Users/rhet/Pictures/Test-13.0.0.photoslibrary/originals/D/D13EB1B5-DF50-4F66-9628-39FD43AA69B7.jpeg
MD5 (/Users/rhet/Pictures/Test-13.0.0.photoslibrary/originals/D/D13EB1B5-DF50-4F66-9628-39FD43AA69B7.jpeg) = 59dac66d9967c24e6a1a5be587273f08
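The same check can be scripted for many files at once. A minimal sketch using only the standard library; the function names are hypothetical and the in-memory streams stand in for real files.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """MD5 hex digest of a binary stream, read in chunks to limit memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """MD5 hex digest of the file at path."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


# In-memory stand-ins for an original and its imported copy (illustrative only).
original = io.BytesIO(b"example image bytes")
imported = io.BytesIO(b"example image bytes")
unchanged = md5_of_stream(original) == md5_of_stream(imported)
```

For real files, comparing `md5_of_file(original_path)` with `md5_of_file(imported_path)` mirrors the two `md5` commands above.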

cfc62 · 2023-08-28T16:46:26Z

cfc62
Aug 28, 2023
Author

Sonoma Photos first impressions:

There's something amiss with importation. Lots of failures, lots of error messages including the very helpful "unknown reason" for failure. I had hoped to start completely new and import all my originals to essentially have a "scratch install" of Photos assets and to have everything re-scored without any past scores influencing re-scoring. Not saying this happens, but wanted to keep the variables to a minimum. Well, that wasn't working out. I'll send Apple feedback on this.

Decided to use the copy-on-write functionality of APFS and created a duplicate of my existing .photoslibrary. This will (hopefully) leave the current version alone while giving me a new version that I don't mind gets updated to the new schema.

Upgrading the library to Sonoma took maybe a couple minutes at most. Then I could see both photolibraryd and photoanalysisd fire off, even while I was still in Photos. Waited for those to quiet down, which took about 30 minutes.

Ran the query at that point and:

comparing Ventura asset scores to Sonoma asset scores shows they were indeed re-calculated. But the differences are quite small; say 0.53145 vs. 0.53234.
it seems like it hustles through the entire dataset all at the same time. I couldn't detect any time when interaction or behavioral were not filled in while other scores were, but it's possible I missed it.
Suspect the curation phase may be taking place now - I walked away from the machine to let it ruminate, went back and can see disk access lights flashing where the library is located, so something is up
Even though the scores changed by some small amount, the distributions of the scores, the correlations of the scores to score_overall and the regression model are statistically identical.
Way way way more ML-generated tags, as Apple promised. For example, now I see the type of animal (giraffe) or flower (lily) as well.
Text OCR seems more accurate
Reverse geolocation tags are also more detailed, with more synonyms, down to the neighborhood and/or geographical nickname

This library did have some assets favorited, and now it's clear to see that if you favorite something, it will automatically be curated at 0.75 vs. the standard 0.5.

But there is still no correlation between score_overall and favorite. In fact, as I've mentioned before I've favorited pictures all across the score_overall range, and below is proof of that:

Is there a correlation? A slight one at best.

I looked at the relationships between me favoriting a picture and the other scores, and there's not much there, certainly not enough to stand on. I think the only thing I can safely say is that I don't favorite assets that have lower (negative) failure scores. There's some interaction influence as well but we're talking very minor.

8 replies

cfc62 Aug 29, 2023
Author

To be very clear, this is exactly the same library with snapshots before and after additional photoanalysisd processing. I made absolutely no changes in any way. In fact, I haven't touched the library whatsoever since I converted it from the Ventura version.

I'm looking through the schema now (holy smokes there's a lot in there) in the vain hope that I can spot a truly constant ID - but you'll be far far far more familiar with this than I am.

RhetTbull Aug 29, 2023
Maintainer

The fingerprint (ZFINGERPRINT I think) is unique to an image but if there are duplicates, the fingerprint will be identical. (it's a hash)

cfc62 Aug 29, 2023
Author

On the "Days" tab it finally says it's done curating.

I'll give it another night then pull the scores again.

LOTS of new items in the new schema, including a lot of facial recognition values, like gender, gaze, glasses, ethnicity (!), hair color, expression and more. These are integers so guessing coded, 1,2,3,4,5 etc.

Also see overallaestheticscore which would be nice to query if it's not already possible - it's not new in Sonoma, but there are a few other scores: iconic, "stickerconfidence" (whatever that is), blur and I think that's it. A few scores look to be for media analysis as well.

RhetTbull Aug 29, 2023
Maintainer

LOTS of new items in the new schema, including a lot of facial recognition values, like gender, gaze, glasses, ethnicity (!), hair color, expression and more.

See #1175

Also see overallaestheticscore which would be nice to query

photo.score.overall

cfc62 Aug 29, 2023
Author

Awesome, I'll add that for the next query run!

cfc62 · 2023-08-30T17:03:03Z

cfc62
Aug 30, 2023
Author

I've always wondered if the camera type influenced how Photos behaves, and it might:

Sure, of course I've gotten better at framing, subjects and all that, but in the end, there are clear differences in the scores here.

I took out any camera where either a) I don't have it and therefore someone sent me the photo or b) there weren't enough samples to be fair. The "_" is screenshots/pngs and the like, also filtered out.
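The per-camera comparison with those two filters can be sketched as follows. The `camera_model` and `score_overall` keys and the `min_samples` threshold are assumptions for illustration; the "_" bucket is skipped as described above.

```python
from collections import defaultdict


def mean_score_by_camera(rows, min_samples=30):
    """Mean score_overall per camera, dropping unknown cameras and small groups."""
    groups = defaultdict(list)
    for row in rows:
        camera = row.get("camera_model")
        if not camera or camera == "_":
            continue  # screenshots, PNGs and other assets without camera info
        groups[camera].append(row["score_overall"])
    return {
        camera: sum(scores) / len(scores)
        for camera, scores in groups.items()
        if len(scores) >= min_samples  # too few samples to be a fair comparison
    }
```

Excluding cameras you don't own would still need a manual list, since nothing in the exported data says who took the photo.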


cfc62 · 2023-08-30T18:36:02Z

cfc62
Aug 30, 2023
Author

It chewed on the library quite a bit overnight; love an easily-seen LED disk activity light... I also use iStat Menus to track both CPU and disk access activity, and it was busy last night.

Good news, no more UUID changes - everything lined up as expected, which means I could (somewhat) easily check for scores being updated. I checked a handful of scores and there were only a smattering of updates, meaning maybe 20-5000 updated scores - 5000 sounds like a lot but it's 5% of the total:

score_overall had only 22 updates, but all of them were for movie file formats
curation had 62 updates, with a mix of photos and videos
highlight_visibility had 5378 updates
behavioral had 23 updates - mainly but not completely the same as score_overall assets
failure had 20 updates, all were in the set of videos that had score_overall updates
interaction had 35 updates, all videos but not the same exact set that had the score_overall updates - perhaps 40% in common
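For anyone wanting to repeat this kind of check, here is a minimal sketch of comparing two score snapshots keyed by UUID. It assumes each snapshot is a plain dict of `{uuid: {score_name: value}}` built from two separate query runs; the function names and the tolerance are hypothetical, not part of any tool mentioned in this thread.

```python
def find_updates(before, after, tol=1e-9):
    """Return {score_name: set of UUIDs whose value changed between snapshots}.

    before, after -- dicts shaped like {uuid: {score_name: float}} (assumed layout)
    tol           -- hypothetical tolerance so float noise is not counted as an update
    Only UUIDs and score names present in both snapshots are compared.
    """
    changed = {}
    for uuid in before.keys() & after.keys():
        old, new = before[uuid], after[uuid]
        for name in old.keys() & new.keys():
            if abs(old[name] - new[name]) > tol:
                changed.setdefault(name, set()).add(uuid)
    return changed


def overlap_fraction(a, b):
    """Fraction of the UUIDs in a that also appear in b (0.0 if a is empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0
```

`len(changed["highlight_visibility"])` would give the update count for that score, and `overlap_fraction(changed["interaction"], changed["score_overall"])` gives the "in common" share between two sets of updated assets. This only works while UUIDs stay stable between runs.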

Clearly some secondary and even tertiary re-scoring going on, especially for videos. If it's going to really chew through every video I have, I could see it taking months - actually I suspect it may never catch up and new ML models or even new Photos versions may come along before it's ever really "finished."

There are 570 assets, mainly photos, with a score_overall of 0. shrugs

I need to shelve this project for a bit but there are still items to figure out and I will get back to this in maybe a week or so.

We can test the odd bell-shaped distribution by exporting only highly scored assets, creating a new library, importing them, and seeing whether Photos forces them into another bell-shaped distribution
I'm now exporting fingerprints so if something changes in the UUIDs going forward it will be easier to track
I'd like to figure out how to include machine-generated tags into my analysis
I'd like to analyze other library data sets to see how the statistics change - perhaps all of these findings are unique to me (suspect not though)
Create albums of outlier scores for each score to determine what the scores actually represent IRL
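For the outlier albums in the last item, picking the candidates could look something like the sketch below. It assumes a dict of `{uuid: value}` for a single score; the function name and `n` are made up for illustration, and creating the albums themselves is left out.

```python
def outlier_uuids(scores, n=25):
    """Return (lowest, highest): the n lowest- and n highest-scoring UUIDs.

    scores -- dict of {uuid: float} for one score (assumed layout)
    n      -- hypothetical album size; clipped to the number of assets available
    """
    ranked = sorted(scores, key=scores.get)  # UUIDs in ascending score order
    n = max(0, min(n, len(ranked)))
    lowest = ranked[:n]
    highest = ranked[len(ranked) - n:]
    return lowest, highest
```

Running this once per score name gives two UUID lists per score, which can then be eyeballed side by side to guess what the score is actually measuring.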

I need my daily driver back on Ventura! Thanks so much for this software, it's been and will be extremely useful!

RhetTbull · 2023-09-04T17:40:16Z

RhetTbull
Sep 4, 2023
Maintainer

Still too early to tell if this will work but I've found the private API for "PHAManager" which is the "Photo Analysis Manager". I'm able to access it without crashing the machine (that's the first step!). If I can get this to work, there are interesting methods such as:

stopAllBackgroundActivities
startTurboProcessing
dumpAnalysisStatusWithContext

If I can get this to work, I think a status bar app that lets you start/stop photoanalysis and check status would be useful.

1 reply

cfc62 Sep 4, 2023
Author

That would be awesome - and telling the machine to get on with it would be great too, which is what I assume TurboProcessing is...
