-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add automatically merging cells in tables #970
Comments
Welcome to fpdf2, @stevenlis ! Your suggestion is interesting, but raises a few tricky questions: Is a specific text string really the optimal trigger? The same string might appear in the data as legitimate cell content, after all. I expect that we'll eventually support row spans as well. How should we distinguish between the two cases in auto merging? More generally speaking: Column and row spans are essentially metadata of the table. Is it really wise trying to derive those from the cell contents? While it might be possible to get it to work in a very specific set of circumstances, a system like that seems very prone to suffering from unintended consequences. |
@gmischler Thank you for response. Indeed, there are definitely cases where you may want to leave an empty cell as it is, which is why I mention "any other user-defined placeholder" to provide users with flexibility. Regarding row spans, perhaps we can use two different characters/strings to define the rules. For example, "-" for colspan and "|" for rowspan, or any other characters that users prefer. I'm new to fpdf2 and my main challenge is that it requires looping over the table instead of taking the data as a whole. This approach provides flexibility in defining the style of each cell, but it would be easier if we had a function like As for the table layout, I took inspiration from matplotlib's mosaic, which allows you to easily define the layout of a figure. Maybe we can have a function that returns from fpdf import FPDF
pdf = FPDF()
pdf.set_font(family='helvetica')
pdf.add_page()
def remove_empty_map_colspan(table_data: list, empty_cell: str) -> tuple[list, list]:
"""
Remove the table cells with the specified value for empty_cell,
and maps the cleaned data based on the following rules:
- If the item/cell is not empty, it is mapped to 1.
- If the item/cell is empty, it is mapped to 2, and the next item in the row is skipped.
Args:
table_data (list): Nested list representing the table data.
empty_cell (string): Value representing an empty cell.
Returns:
tuple: A tuple containing the mapped colspan data and the cleaned data.
Example:
table_data = [
['0', '1', '2', '3'],
['A1', 'A2', '', 'A4'],
['B1', '', 'B3', 'B4'],
]
empty_removed_table, colspans = remove_empty_map_colspan(table_data)
print(empty_removed_table)
# Output: [['0', '1', '2', '3'], ['A1', 'A2', 'A4'], ['B1', 'B3', 'B4']]
print(colspans)
# Output: [[1, 1, 1, 1], [1, 1, 2], [1, 2, 1]]
"""
colspans = []
skip_next = False
for row in table_data:
new_row = []
for item in row:
if skip_next:
skip_next = False
continue
if item != empty_cell:
new_row.append(1)
else:
new_row.append(2)
skip_next = True
colspans.append(new_row)
empty_removed_table = [
[item for item in row if item != empty_cell] for row in table_data
]
return empty_removed_table, colspans
table_data = [
['0', '1', '2', '3'],
['A1', 'A2', '-', 'A4'],
['B1', '-', 'B3', 'B4'],
]
table_data, colpans = remove_empty_map_colspan(table_data, empty_cell='-')
with pdf.table(width=120) as table:
for data_row, colspan_row in zip(table_data, colpans):
row = table.row()
for cell, colspan in zip(data_row, colspan_row):
row.cell(cell, colspan=colspan)
pdf.output('test-file.pdf') |
The longer I think about it, the less comfortable I am with the idea of encoding structural information about a table in individual data items. This just looks loke bad coding and data management practise to me. Where do you encounter such data? When fpdf2 started, it described itself as "simplistic". It has gained quite some additional functionality since then, so we have now arrived at "simple and straightforward", essentially following the Python motto of "make simple tasks easy, and complex tasks possible". Tables are a very recent addition, and there's still room for improvement. But for simple data (without spans), you already can feed them in all at once. What you'd like to have though, is not simple. It's still possible, but you'll have to do some of the legwork yourself. If there is no way to acquire your data in a more resonable form, then the best workaround is probably to write a wrapper for |
If you perform data analysis to generate reports or check simple statistics using crosstabs, it's a part of your daily routine. I understand that fpdf2 is not specifically designed for report generation but has a more general purpose. Nonetheless... Thank you for the explanation. I don't think overwriting a class every time for simple things is very pythonic. Also, there's no need to overcomplicate things by using a context manager to create a table when it's already in a table structure. Anyway, that's just my opinion. |
Are you saying that data where column spans are defined through special cell contents are an inherent and necessary result of statistical analysis? Or are we really talking about different things here? |
To be honest, I'm not quite sure, but here's an example where I would like to generate summary statistics for different group breakdowns. import seaborn as sns
df = sns.load_dataset('tips')
df_result = (
df.groupby(['sex', 'smoker', 'time'])[['tip', 'size', 'total_bill']]
.mean().round(1).reset_index()
.pivot(index=['smoker', 'time'], columns=['sex'])
)
df_result The one above is perhaps the most readable format, but I think we can also accept the following if rowspan is a challenge. df_result.reset_index(inplace=True)
df_result.columns.names = [None, None]
df_result Then the issue is that how do we represent this structure, I think we one way is: table_data = [
['smoker', 'time' , 'tip', '-', 'size', '-', 'total_bill', '-'],
[' ', ' ', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
['Yes', 'Lunch', 2.8, 2.9, 2.2, 2.3, 17.4, 17.4],
['Yes', 'Dinner', 3.1, 2.9, 2.6, 2.2, 23.6, 18.2],
['No', 'Lunch', 2.9, 2.5, 2.5, 2.5, 18.5, 15.9],
['No', 'Dinner', 3.2, 3.0, 2.8, 2.7, 20.1, 20.0]]
] Each row represents a list with 8 items. An empty cell is denoted by a Of course, you can always convert it into a standard table structure. However, when including it in a PDF report, it becomes less readable. df_result.columns = ['_'.join(col) for col in df_result.columns.values]
df_result I closed this because I don't want you to feel like I'm requesting something just for myself. However, it's a very common use case in my experience when you create any kind of report that includes a table. |
I guess I should have made the connection to #969 myself, eh? 🙄 So we're not talking about detecting spans in the actual grid data, but rather about how to get the labelling formatted correctly. That makes it a bit less weird. Actually, in the context of a jupyter notebook, that data format makes perfect sense. The trick there is that jupyter simply doesn't draw any border lines. That way it doesn't have to do anything special with the empty cells, and they'll still give the visual appearance of a column or row span. What you're looking at there is essentially ASCII art... When you try to put the same data in a PDF for print (and with border lines), the requirements change quite a bit. In an ideal world, pandas would offer an option to give you the data in a more explicitly structured form (maybe HTML?). I've never looked at it in any detail, so I have no idea if it does. If it doesn't though, then the next necessary step is "data cleanup", to transform the ASCII art hack into something more palatable. You were hoping for fpdf2 to handle this cleanup for you. I don't think that is the right place for it (I may not have the last word on it, so that is just my take). From a software architecture perspective, I think this sort of problem would be most reasonably solved with a system of "data adapters". This could be a subclass as previously suggested, but a less tightly coupled approach might be even more flexible. Let's say we have a class that knows how to talk to a fpdf2 table. Then you create a subclass that can process and cleanup pandas output. Other subclasses can be added to handle any weird data produced by other packages. This keeps fpdf2 itself simple, while still enabling data exchange with pretty much anything out there. Don't worry about bringing this up. The topic raises some interesting questions that are clearly of general interest. While I personally don't agree with your initial approach, we do have a strong interest in enabling data transfer from other software (there's a whole section about that in the docs). I'll reopen this for now to see if someone else has a better idea than what I've come up with so far. |
This is less about generalized formatting and more specifically formatting multi-level indexes, and that feels like a pretty widely applicable use case! I'm going to take a crack at building out a more robust header/index functionality for tables, with an eye towards fixing this problem specifically. |
Both row span and row labels are features that we could use in any case, as they are necessary to create more complex tables than is currently possible. If you can come up with an implementation that offers those and is still reasonably easy to use, more power to you! 👍 The question of allowing special data items (ie. empty strings) to control the table formatting probably needs some more input and consideration. I'd recomment to postpone that aspect to a later phase. |
Hi @afriedman412, I'm sorry that your last, detailed comment was left unanswered for a year 😢 Thank you very much for the thoughts & efforts you put in this, and in PR #1046 👍 Are you still willing to continue this work, almost a year later? I'm going to review it now 🙂 |
Please explain your intent
It is common to encounter merged cells in a table. Currently, you need to manually create a merged cell using
colspan
inrow.cell()
. It would be helpful if merged cells could be handled automatically, eliminating the need to loop over your data and add multiple nested if-else conditions.Describe the solution you'd like
Assuming our table data appears as follows.
Here, we use
'-'
(or any other user-defined placeholder) to indicate an empty cell.The text was updated successfully, but these errors were encountered: