-
-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aggr-] allow ranking rows by key column #2417
base: develop
Are you sure you want to change the base?
Conversation
The standard way is to convert the column into a known type, and then anything that can't be converted (errors and nulls) become |
Yes, that seems like it should work. Should the rank aggregator pick the known type, and if so, which one? Or is it the user who should convert the column? |
Since it's not obvious which type to pick, the user can convert the column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love what this is adding, and I think with a few tweaks it would be even more powerful!
There are two kinds of ranking operations people may want.
What is a good name for these two aggregators? |
8078fb6
to
c6c608e
Compare
Okay, I implemented a command that adds a column and applies an aggregator to rows after grouping them by key columns. It's To get this to work with list aggregators, I made a new class I also tried making a I'm a bit unsure about the new behavior of the
|
There is one detail about the grouping in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move most of this into features/addcol_sheetrank.py
? It's a fair amount of code and I'd like to make it self-contained as much as possible.
It is needed now that addcol-aggregate can apply stdev to groups, which may include lists of size 1.
Okay, I moved the I had to move the One notable consequence of this move is that the new visidata/visidata/features/rank.py Line 41 in ec28446
|
This PR adds a
rank
aggregator that returns a list, and a commandaddcol-rank
, which adds a new column with the rank of each row. Ranks are calculated by comparing key columns.It also fixes a bug in
memo-aggregate
where long output takes an extremely long time to show up in the statusbar.For example:
seq 1222333 |vd -
, thenz+
list
. After the list is calculated, visidata will get stuck for many seconds showingprocessing…
, because it's very slow to runformat()
on a long sequence.I think it's worth having an aggregator for rank, and the need for a simpler solution than the current method has come up before. On the other hand, I know part of Visidata philosophy is that it's not a spreadsheet. How do people feel about having a rank aggregator?
Also, in its current form, the rank aggregator will give errors when comparing key columns with different types across 2 rows:
What's the standard way to handle sorting mixed types for Visidata?