Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Validations are generated one-per-cell instead of per-range #131

Closed
billblume opened this issue Apr 4, 2017 · 2 comments
Closed

Data Validations are generated one-per-cell instead of per-range #131

billblume opened this issue Apr 4, 2017 · 2 comments
Labels
help wanted writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files

Comments

@billblume
Copy link
Contributor

billblume commented Apr 4, 2017

Data Validation elements written by the Xlsx writer are generated one-per-cell instead of by column ranges. This causes two issues:

  1. The generated Xlsx files may be much larger than necessary
  2. Excel has a limit on the number of data validations per worksheet. I don't know what this limit is exactly, but I suspect that it is 64k. If you exceed this limit, excel will drop all data validations for the worksheet.

For us, we had a worksheet with 15,000 rows and 8 data validations spanning 8 different columns. This resulted in 120,000 data validation elements in the Xslx file, causing both issues described above.

Data validations should be generated per range of cells, not per cell.

Here is the fix we implemented for PHPExcel 1.8.1. Given that the data validation generation code is unchanged for PhpSpreadsheet, you should be able to use this fix with few or no changes,

        /* This function is unchanged except for the addition of the call to the new function _mergeDataValidationRanges() */
	private function _writeDataValidations(PHPExcel_Shared_XMLWriter $objWriter = null, PHPExcel_Worksheet $pSheet = null)
	{
		// Datavalidation collection
		$dataValidationCollection = $pSheet->getDataValidationCollection();

		// Write data validations?
		if (!empty($dataValidationCollection)) {
			$dataValidationCollection = $this->_mergeDataValidationRanges($dataValidationCollection);

			$objWriter->startElement('dataValidations');
			$objWriter->writeAttribute('count', count($dataValidationCollection));

			foreach ($dataValidationCollection as $coordinate => $dv) {
				$objWriter->startElement('dataValidation');
                                ...
				$objWriter->writeAttribute('sqref', $coordinate);
                                ...
				$objWriter->endElement();
			}

			$objWriter->endElement();
		}
	}

        /* New function that converts a map of cell-coordinates to data-validations to a map cell-ranges to data-validations. */
	private function _mergeDataValidationRanges(array $dataValidationCollection)
	{
		$hashedDvs = array();

		foreach ($dataValidationCollection as $coord => $dv) {
			list($column, $row) = PHPExcel_Cell::coordinateFromString($coord);
			$row = intval(ltrim($row,'$'));
			$hashCode = $column . '-' . $dv->getHashCode();

			if (! isset($hashedDvs[$hashCode])) {
				$hashedDvs[$hashCode] = (object) array(
					'dv' => $dv,
					'col' => $column,
					'rows' => array($row)
				);
			} else {
				$hashedDvs[$hashCode]->rows[] = $row;
			}
		}

		$mergedDataValidationCollection = array();
		ksort($hashedDvs);

		foreach ($hashedDvs as $hashedDv) {
			sort($hashedDv->rows);
			$rowStart = null;
			$rowEnd = null;
			$ranges = array();

			foreach ($hashedDv->rows as $row) {
				if ($rowStart === null) {
					$rowStart = $row;
					$rowEnd = $row;
				} else if ($rowEnd === $row - 1) {
					$rowEnd = $row;
				} else {
					if ($rowStart == $rowEnd) {
						$ranges[] = $hashedDv->col . $rowStart;
					} else {
						$ranges[] = $hashedDv->col . $rowStart . ':' . $hashedDv->col . $rowEnd;
					}

					$rowStart = $row;
					$rowEnd = $row;
				}
			}

			if ($rowStart !== null) {
				if ($rowStart == $rowEnd) {
					$ranges[] = $hashedDv->col . $rowStart;
				} else {
					$ranges[] = $hashedDv->col . $rowStart . ':' . $hashedDv->col . $rowEnd;
				}
			}

			foreach ($ranges as $range) {
				$mergedDataValidationCollection[$range] = $hashedDv->dv;
			}
		}

		return $mergedDataValidationCollection;
	}
@PowerKiKi
Copy link
Member

This would be a welcomed contribution. Would you have the time to make a PR for it ? ideally including a few unit tests ?

@PowerKiKi PowerKiKi added the writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files label Apr 18, 2017
billblume added a commit to billblume/PhpSpreadsheet that referenced this issue Jul 31, 2017
…er to reduce the size of outputted worksheet.
billblume added a commit to billblume/PhpSpreadsheet that referenced this issue Jul 31, 2017
Added bugfix item (issue PHPOffice#131).
@billblume
Copy link
Contributor Author

I have created Pull Request #193. This PR contains unit tests for the fix.

In this PR, I generalized my merging function and moved it to Cell.php so you can use it for other collections such as conditional formatting and hyperlink collections as well. However, I did not add this function to these other cases since I do not know whether it is safe or needed for these cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files
Development

No branches or pull requests

2 participants