Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of Excel's tables feature #2671

Merged
merged 19 commits into from
Apr 23, 2022

Conversation

aswinkumar863
Copy link
Contributor

This is:

- [ ] a bugfix
- [x] a new feature

Checklist:

Why this change is needed?

Excel's tables feature (i.e. Select Home > Format as Table in Excel App).

Tables are similar to AutoFilter but tables have several other advantages like named ranges, easy formatting, totals row and header row with filter. Tables can also be converted to charts and pivot tables easily.

Unlike AutoFilter, Tables are not restricted to only one per worksheet.

Issue: #1816, #2337

Usage:

$table = new Table();
$table->setName('Sales_Data');
$table->setRange('A1:D17');
$spreadsheet->getActiveSheet()->addTable($table);

In this Commit:

  • Added Table API with initial support for header and totals row.
  • Added complete styling options for Table.
  • Added Xlsx Writer for Table.
  • Added samples.
  • Covered with unit tests.

To be done:

  • Filter expressions similar to AutoFilter.
  • Precalucate formulas for totals row (Check sample 2).
  • Table named ranges in formulas and calculation.

aswinkumar863 and others added 2 commits March 10, 2022 19:54
Initial implementation of Excel's tables feature (i.e. Select Home >
Format as Table in Excel App).

Tables are similar to AutoFilter but tables have other advantages like
named ranges, easy formatting, totals row and header row with filter.
Tables can also be converted to charts and pivot tables easily.

Usage:
$table = new Table();
$table->setName('Sales_Data');
$table->setRange('A1:D17');
$spreadsheet->getActiveSheet()->addTable($table);

In this Commit:
- Added Table API with initial support for header and totals row.
- Added complete styling options for Table.
- Added Xlsx Writer for Table.
- Added samples.
- Covered with unit tests.

To be done:
- Filter expressions similar to AutoFilter.
- Precalucate formulas for totals row (Check sample 2).
- Table named ranges in formulas and calculation.
@MarkBaker
Copy link
Member

MarkBaker commented Mar 25, 2022

Sorry that it's taken me a while to look at this.

Looking at the PR, I have a couple of questions:

  • Should the table be protected to prevent adding new rows/columns or deleting columns/rows within the table range? If not, then it probably needs a change to the ReferenceHelper to adjust the table range
  • What styling is applied by default to the table? Is it only the pre-defined styles, or can this be changed by developers? Does it need any new styles writing to the Xlsx? (Custom styling created a lot of problems for me the last time I tried an implementation of pivot tables). e.g. adding a number format mask to your totals row, or for monetary value columns.
  • Does the table need an entry in the Defined Names section? (e.g. to handle your formula in the totals example). There's no defined name entry in the XML for the file, but it does appear in the Name Manager in MS Excel itself

@MarkBaker
Copy link
Member

MarkBaker commented Mar 25, 2022

And I already know that structured references are going to be a real pain in the Calculation Engine. I notice that you're explicitly setting $writer->setPreCalculateFormulas(false); before saving in your example to prevent the formula hitting the CalcEngine; but this needs to the emphasised in documentation and examples until the CalcEngine can handle those structured references, otherwise we'd end up with a lot of issues being raised

Copy link
Member

@MarkBaker MarkBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few comments, just to ensure that you're considering those factors

@aswinkumar863
Copy link
Contributor Author

Sorry that it's taken me a while to look at this.

Looking at the PR, I have a couple of questions:

  • Should the table be protected to prevent adding new rows/columns or deleting columns/rows within the table range? If not, then it probably needs a change to the ReferenceHelper to adjust the table range
  • What styling is applied by default to the table? Is it only the pre-defined styles, or can this be changed by developers? Does it need any new styles writing to the Xlsx? (Custom styling created a lot of problems for me the last time I tried an implementation of pivot tables). e.g. adding a number format mask to your totals row, or for monetary value columns.
  • Does the table need an entry in the Defined Names section? (e.g. to handle your formula in the totals example). There's no defined name entry in the XML for the file, but it does appear in the Name Manager in MS Excel itself
  • No, the table is not protected. Let me check the ReferenceHelper.
  • Excel's pre-defined TableStyleMedium2 style is applied by default. Only the predefined styles can be used by developers. I have to plan about custom column formatting via dataDxfId. Custom table style is not yet planned (More priority to Filter expressions and Precalculation).
  • No, the table doesn't need an entry in the Defined Names. But the calculation engine must consider the table and its column as defined names to perform precalculation.

@MarkBaker
Copy link
Member

  • No, the table is not protected. Let me check the ReferenceHelper.

The ReferenceHelper should have the example of AutoFilters to provide you with an example; because deleting rows in a Table area is straightforward (especially as column style and calculated columns are for a future stage), but if columns are deleted you have to check remove those entries and adjust offsets. Hopefully it won't be too difficult.

@MarkBaker
Copy link
Member

MarkBaker commented Mar 26, 2022

  • No, the table doesn't need an entry in the Defined Names. But the calculation engine must consider the table and its column as defined names to perform precalculation.

If you can reach the point where formulae can be applied and written; I'll try to work out how to process structured references in the Calculation engine... until then, I think it's easiest if the save() method disables preCalculation if there are Tables on a worksheet, rather than relying on developers remembering that they need to do so.
I'm already doing a lot of work with the Calculation Engine at the moment to support array formulae and spillage; but I'll have to have a think about how to approach the complexities that structured references introduce.

@MarkBaker
Copy link
Member

Thank you for the PR and the work that you're doing.

This is probably the single most requested new feature for PhpSpreadsheet; and it's one that I've attempted (and failed) To implement twice in the past. Custom column formatting (if the style wasn't used elsewhere in the worksheet) was my biggest problem, because the Writer wasn't writing it... hopefully, if you're aware of that pitfall, you'll be able to avoid it.

aswinkumar863 and others added 6 commits April 3, 2022 18:06
Validation added for
- invalid characters
- invalid names ("C", "c", "R", or "r")
- cell references
- space separate words
- maxlength of 255 characters
- unique table names across worksheet
Option to remove the table from table collection of
worksheet
Automatically adjusts table range on insertion and
deletion of rows and columns within table range
Option to add column formula that applied automatically
for any new rows added to the table range
Copy link
Member

@MarkBaker MarkBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few more comments here. Some of these issues might seem to be nit-picking; but this is very close to being ready now

aswinkumar863 and others added 7 commits April 16, 2022 18:46
Added support for UTF-8 Table names (including
combined character)
testColumnInRange method renamed to isColumnInRange
Range must be at least 1 column and 2 rows
Replaced worksheet argument with table name
Copy link
Member

@MarkBaker MarkBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one more change that I've noted;

You're well ahead of my work on Structured References in the Calculation Engine: I've started that, and have made some progress; but am having "fun" with the capturing regexp

aswinkumar863 and others added 3 commits April 23, 2022 18:31
Table name comparison changed to UTF-8 aware and case insensitive
Table constructor now accepts AddressRange and array
of [$fromColumnIndex, $fromRow, $toColumnIndex, $toRow]
@aswinkumar863
Copy link
Contributor Author

Names are case insensitive; so you should probably force both to upper or lower case before doing a strict comparison; remembering that they are UTF-8 strings, so using strToUpper() or strToLower() from the Shared StringHelper, which is UTF-8 aware

Implemented in d414f13

Copy link
Member

@MarkBaker MarkBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution; it's much appreciated. I'll add the details to the Change Log.

@MarkBaker MarkBaker merged commit 10d175e into PHPOffice:master Apr 23, 2022
This was referenced Apr 23, 2022
@m-thalmann-athesia
Copy link

Hi all! I have a question regarding this feature: Is this documented anywhere? I can't seem to find this in the documentation...

@MarkBaker
Copy link
Member

@m-thalmann-athesia
This is documented only with samples, and was part of the 1.23.0 release in April this year.

It's a complex feature, and being implemented in stages, with creation/writing first, and Reading basic tables is the second part included in the next release. Calculated table values won't be fully supported until I can get the Calculation Engine handling Structured References, which has been driving me insane for the past 9 months. Until then, calculated values can't be calculated after reading; and it's necessary to disable calculation when writing (although the formulae can still be written).

Full documentation is likely to be created only after the calculation engine works with Structured References; though I might add some basic documentation over Christmas/New Year.

@m-thalmann-athesia
Copy link

@MarkBaker oh I see! Okay, thank you very much for the hard & great work!

@bwl21
Copy link

bwl21 commented Jun 30, 2023

@MarkBaker with this code fragment

        // Create Table Style
        $tableStyle = new TableStyle();
        $tableStyle->setTheme(TableStyle::TABLE_STYLE_MEDIUM23);
        $table->setStyle($tableStyle);
        $spreadsheet->getActiveSheet()->addTable($table);

I get

image

with

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Reparaturergebnis speichern in persons (27)0.xml</logFileName>
<summary>Fehler in Datei '/Users/beweiche/Downloads/persons (27).xlsx'</summary>
<repairedRecords summary="Die folgenden Reparaturen wurden durchgeführt:_x000d__x000a__x000d__x000a_">
<repairedRecord>Reparierte Datensätze: Tabelle von /xl/tables/table1.xml-Part (Tabelle)</repairedRecord>
</repairedRecords>
</recoveryLog>

What can I do?

@oleibman
Copy link
Collaborator

@bwl21 Samples/Table/01_Table.php uses exactly the code you asked about, and does not create a corrupt file. (It does a little additional styling, but, even when I comment out those additional statements, there is no problem with the output.) In order to see what's going wrong for you, I need more information - the script or the output file or both.

@bwl21
Copy link

bwl21 commented Nov 16, 2023

@bwl21 Samples/Table/01_Table.php uses exactly the code you asked about, and does not create a corrupt file. (It does a little additional styling, but, even when I comment out those additional statements, there is no problem with the output.) In order to see what's going wrong for you, I need more information - the script or the output file or both.

thanks for your investigatin. i will try to make a test case.

@bwl21
Copy link

bwl21 commented Dec 2, 2023

@oleibman I managed to find the reason:

  1. the content of a cell must not bei null
  2. I observe that the headers must be unique

can you confirm this?

@oleibman
Copy link
Collaborator

oleibman commented Dec 2, 2023

Confirmed. Although my following statement may not be precisely correct, it captures the gist. Excel will report a problem if a table has columns with the same name, or a column with no name. We might be able to do something to prevent us creating a spreadsheet with those conditions. Please open a new issue if you want us to look into that. In the meantime, you should have enough information to be able to get around your problem for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants