Table.traverse() is extremely slow when table has many blanks #46
Comments
A complex question. In short: odfdo is inherently slow. What traverse() does: it yields every logical row of the table one by one, expanding each "number-rows-repeated" attribute into that many separate rows.

I don't see much room for optimization. One could imagine stripping the document on opening (with optimize_width or something), but that would change the actual content of the document, which is bad. In your example, all lines have a style, so maybe the author of the original document wants all lines to have a specific appearance. If you style a whole row or column to have a "blue background", you can end up with such a big "number-rows-repeated". The optimize_width() method actually cuts the document: it removes the tail when there is no real content below, only styles. But the document is modified, so this must be a deliberate choice of the user. And maybe you really want to edit the 100,000th line.
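The trade-off described above can be sketched in a few lines. This is an illustrative model only, not odfdo's implementation: the row structure and function body are made up, but they mirror the behavior the comment describes — trimming a trailing blank-but-styled row removes both the repeat count and the styling.

```python
# Simplified stand-in for Table.optimize_width(): drop trailing rows
# that hold no real content, even if they carry a style and a huge
# "number-rows-repeated" count. (Illustrative model, not odfdo code.)

def optimize_width(stored_rows):
    rows = list(stored_rows)
    # Walk back from the end, removing rows whose cells are all empty.
    while rows and not rows[-1]["content"]:
        rows.pop()
    return rows

table = [
    {"content": "header", "repeated": 1},
    {"content": "data",   "repeated": 1},
    # A blank row styled e.g. "blue background", repeated to the sheet limit:
    {"content": "",       "repeated": 1_048_575},
]
trimmed = optimize_width(table)
logical_rows = sum(r["repeated"] for r in trimmed)
print(logical_rows)  # 2 -- the styled blank tail is gone, and so is its style
```

The point of the example: the trim makes traversal cheap, but the original author's styling of the empty tail is lost, which is why this must be an explicit user choice rather than an automatic step on open.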
Yeah, that's the sticking point, I think. The example in question was just a test document Save As'd from XLSX. The real-world documents I'm handling take about 10 minutes to run on a fairly reasonable number of records, though I'm wondering if that is affected by every cell including
Yeah, I'm only reading in files to get at their data, so I was trying to avoid this, but it looks like
I have an ODS file that includes the following silly row at the end (saved from Excel):
Attempting to run Table.traverse() on this file (or any of the row functions that rely on it) without first calling Table.optimize_width() takes an extremely long time for what is essentially just a bunch of blanks. Is this an inherent speed limitation of Python, or can the traverse repeat algorithm be optimized somehow?