[Bug]: Scaled sorting over-weights communities with one or two prolific users #5210

Open
5 tasks done
andrewmoise opened this issue Nov 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@andrewmoise

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

I'm subscribed to some communities which have only a single very prolific poster, and 'Scaled' always puts them way high up in the rankings, because ca.users_active_month is very low.

As a hack, I replaced that parameter with the per-community total from this query:

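    -- Per-community interaction totals for posts published since the start of the previous calendar month
    SELECT community_id,
           SUM(comments + upvotes + downvotes) AS total_interactions
    FROM post_aggregates
    WHERE published >= date_trunc('month', CURRENT_TIMESTAMP - interval '1 month')
    GROUP BY community_id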

... and the results look a lot more sensible.
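To make the effect of swapping the parameter concrete, here is a minimal sketch. It assumes the Scaled rank has the general shape "hot rank divided by a logarithm of community activity"; that shape, the `scaled_rank` helper, the base-10 logarithm, and all the numbers are assumptions for illustration, not taken from the Lemmy source.

```python
import math

def scaled_rank(hot_rank: float, activity: float) -> float:
    """Assumed shape of the Scaled sort: hot rank damped by community activity.

    The "+ 2" keeps the logarithm positive even when activity is 0.
    """
    return hot_rank / math.log10(2 + activity)

# Hypothetical community: one prolific poster, little response.
hot = 1.0                  # same post, so the same hot rank in both cases
users_active_month = 2     # input to the current denominator
total_interactions = 150   # proposed input: comments + upvotes + downvotes

current = scaled_rank(hot, users_active_month)
proposed = scaled_rank(hot, total_interactions)
print(f"current:  {current:.3f}")
print(f"proposed: {proposed:.3f}")
```

Under these assumptions the same post gets a noticeably smaller boost once the denominator reflects how much activity the community actually has.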

My current code isn't clean, but I can try to knock up a PR that implements it in a better fashion, if there is interest.

Steps to Reproduce

Subscribe to any community that has little overall activity but lots of posts from one user, then sort by Scaled.

Technical Details

Debian 12.8, installed from source

Version

0.19.5

Lemmy Instance URL

No response

andrewmoise added the bug label Nov 18, 2024
@dessalines
Member

That's what scaled is supposed to do: give a boost to smaller / less active communities. If someone makes a new community and posts a lot to it, trying to grow it, then scaled should be boosting that community so it can get more subscribers.

I'd suggest either using the Hot or Active sort, or blocking that specific community if it's of no interest to you.

@andrewmoise
Author

Yeah, I get that. "Scaled" is great. I'm just saying that it has a particular failure mode in communities where there's a prolific series of posts but not a lot of response from the community. Basically, when the ratio of "amount of content" to "number of users" is particularly high. I think "amount of content" is a much better denominator to use when trying to surface those smaller communities, as opposed to the number of users who reacted to the content. Using the latter gives sort of a backwards incentive: it most strongly surfaces the content with the largest ratio of content posted to users who reacted to it.

I'm proposing using the former metric instead, which lets us keep all of those good factors -- giving a boost to smaller / less active communities, and boosting communities when someone makes a new community and posts a lot to it to try to grow it. But also, letting the communities without a lot of content have more of a chance in their turn, instead of consistently surfacing someone into the main feed day after day if they've been posting consistently for most of the month but not getting much response. Does that make sense?

It's up to you of course. I'm just saying that when I implemented the quick-and-dirty version of this, my scaled home feed started looking markedly better (showing the small communities without giving a boost to any of the good-naturedly-spammy communities).

@dessalines
Member

I think I follow what you're saying: using "amount of content" instead of "active users" as the denominator would boost low-content communities rather than low-user communities. Although since our "active" counts are derived from posts, comments, and likes anyway, I'm not sure how different they'd be.
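As a rough illustration of how the two kinds of count can diverge even though both come from the same activity, here is a small sketch. The event list, the user names, and the two metric definitions are hypothetical; they only mimic "distinct users who did something" versus the comments + upvotes + downvotes total from the query in the issue.

```python
from collections import Counter

# Hypothetical month of activity in one community: (user_id, kind) pairs.
events = (
    [("alice", "post")] * 60        # one prolific poster
    + [("alice", "upvote")] * 60    # e.g. a vote on each of their own posts
    + [("bob", "comment")] * 3
    + [("bob", "upvote")] * 5
)

# "Active users"-style metric: how many distinct accounts did anything.
active_users = len({user for user, _ in events})

# "Amount of content"-style metric, following the query in the issue:
# comments + upvotes + downvotes.
kinds = Counter(kind for _, kind in events)
total_interactions = kinds["comment"] + kinds["upvote"] + kinds["downvote"]

print(active_users, total_interactions)
```

In this made-up case the distinct-user count stays tiny while the interaction total grows with every post, which is the situation the issue describes.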

I think it could work as a different sort entirely, and wouldn't be opposed if you wanted to add that.

@andrewmoise
Author

Yeah, it's almost always pretty much the same metric. The only place a difference comes in is in communities which have an anomalously large amount of content contributed by an anomalously small number of users. The current metric boosts those way up in the rankings (because the denominator is tiny), which usually isn't what I want when that content surfaces frequently (even if I do want to stay subscribed to that community for some of the upvoted stuff).
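A quick sketch of that claim, with made-up numbers: four hypothetical communities are ordered under each denominator, with every post given the same hot rank so only the scaling differs. The `scaled_rank` form (hot rank over a base-10 log of activity plus 2) is an assumption for illustration, not copied from the Lemmy source.

```python
import math

def scaled_rank(hot_rank: float, activity: float) -> float:
    # Assumed form of the Scaled sort; not taken from the Lemmy source.
    return hot_rank / math.log10(2 + activity)

# Hypothetical communities: (name, users_active_month, total_interactions).
communities = [
    ("big", 5000, 40000),
    ("mid", 300, 2500),
    ("small", 20, 150),
    ("prolific", 2, 600),   # many posts, very few distinct users
]

def order_by(column: int) -> list[str]:
    """Names sorted from highest to lowest scaled rank, equal hot rank assumed."""
    ranked = sorted(
        communities,
        key=lambda c: scaled_rank(1.0, c[column]),
        reverse=True,
    )
    return [name for name, *_ in ranked]

by_users = order_by(1)          # current: scale by users_active_month
by_interactions = order_by(2)   # proposed: scale by total_interactions
print(by_users)
print(by_interactions)
```

With these numbers the three ordinary communities keep their relative order under both metrics; only the outlier with very few users and a lot of content changes position.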

I'll knock up a PR offering a new sort with a different name. Adding an option lets people compare the two before switching, and gives more chance for good feedback. It makes sense.

What should I call the new sort? "Rescaled"?

@dessalines
Member

Cool. Not sure, maybe LowContent ? Once the PR is open we can get some more suggestions.

@andrewmoise
Author

I created a PR, #5261. Looking at the complexity involved, I think it's better to just change the way that "Scaled" works. The sorting that this PR implements is basically the same as the existing "Scaled" sort; it just measures the level of activity in the community more accurately for the purposes of scaling. To me, the complexity cost of implementing a whole new option doesn't seem worth it.

One useful aspect of implementing the PR the way that I did is that it's possible to apply the PR to a development instance and then flip back and forth between the two, to compare. But I don't think the PR should go in as-is; I think just amending "Scaled" is a better way.

2 participants