Unzipping large raster corrupts file #335

Open
CeresBarros opened this issue Jun 16, 2023 · 2 comments


CeresBarros commented Jun 16, 2023

I've been hitting a problem when loading a very large raster and have recently traced it to the unzipping step.

Require("PredictiveEcology/reproducible@f5b0cf1059534b4dcaa40f1cc238fae992112e8b (HEAD)")
mainDir <- tempdir()
options("reproducible.cacheSaveFormat" = "qs",
        "reproducible.useNewDigestAlgorithm" = 2,
        "reproducible.useCache" = TRUE,
        "reproducible.destinationPath" = normPath(file.path(mainDir, "inputs")),
        "reproducible.inputPaths" = normPath(file.path(mainDir, "data")),
        "reproducible.useGDAL" = FALSE,
        "reproducible.useMemoise" = TRUE,
        "reproducible.useTerra" = TRUE,
        "reproducible.rasterRead" = "terra::rast")
        
        
rawBiomassMap <- Cache(prepInputs,
                       url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",
                       targetFile = "CA_forest_total_biomass_2015.tif",
                       archive = "CA_forest_total_biomass_2015_NN.zip",
                       datatype = "INT2U",
                       filename2 = .suffix("rawBiomassMap.tif", "test"),
                       overwrite = TRUE,
                       userTags = c("rawBiomassMap"))

Here's the output of a similar call, in which I was also passing the to = studyArea and method = "bilinear" arguments (I'm pretty sure leaving those out won't make a difference):

Running preProcess
Preparing: CA_forest_total_biomass_2015.tif
Checking local files...
Finished checking local files.
Checking local files...
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
Appending checksums to CHECKSUMS.txt. If you see this messagePrepInputs repeatedly,
  you can specify targetFile (and optionally alsoExtract) so it knows
  what to look for.
  |======================================================================================================================================| 100%
...downloading...  Downloading https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip ...
Checking local files...
Files found in CHECKSUMS.txt that match by basename; using these.
  User should specify all files (e.g., targetFile, alsoExtract, archive)
  with subfolders specified.
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping extractFromArchive: all needed files now present
Appending checksums to CHECKSUMS.txt. If you see this messagePrepInputs repeatedly,
  you can specify targetFile (and optionally alsoExtract) so it knows
  what to look for.
...using copy in getOption('reproducible.inputPaths')...
Copy of file: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015_NN.zip
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.dat.tif.aux.xml
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.dat.tif.xml
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tfw
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif.ovr, was created at: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015_NN.zip
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.dat.tif.aux.xml
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.dat.tif.xml
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tfw
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif.ovr
targetFile located at F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
Loading object into R
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE, purge = 7): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE, purge = 7): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 
> rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:

When I try a direct rast call on the unzipped .tif I get the same error:

## same error:
rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1)                        
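
One quick check that may help narrow this down: the second warning says the TIFF directory is expected at byte offset 121770457026, so a file on disk that is no longer than that offset cannot contain the directory and is probably truncated. The sketch below is only a diagnostic under that assumption; `tiff_looks_truncated` is a hypothetical helper (not part of terra or reproducible), and a `FALSE` result would not prove the file is intact.

```r
# Hypothetical helper, not part of terra or reproducible.
# A TIFF whose directory starts at `ifd_offset` must be longer than that
# offset; a shorter file cannot contain the directory.
tiff_looks_truncated <- function(path, ifd_offset) {
  size <- file.size(path)  # size in bytes as a double; NA if the file is missing
  if (is.na(size)) {
    stop("file not found: ", path)
  }
  size <= ifd_offset
}

# Self-contained demonstration on a small temporary file (10 bytes).
tmp <- tempfile(fileext = ".tif")
writeBin(as.raw(1:10), tmp)
tiff_looks_truncated(tmp, ifd_offset = 100)  # offset lies beyond the end of the file
tiff_looks_truncated(tmp, ifd_offset = 4)    # offset lies inside the file

# With the path and offset from the warning above, the call would be:
# tiff_looks_truncated(
#   "F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif",
#   ifd_offset = 121770457026
# )
```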

CeresBarros commented Jun 16, 2023

So I manually unzipped the file (into the reproducible.inputPaths directory) and then tried to rerun the call.
Right after manually unzipping, I tried to load the raster directly using rast. That worked:

> rast("F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif")
class       : SpatRaster 
dimensions  : 156966, 193936, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -2660911, 3157169, -851351.9, 3857628  (xmin, xmax, ymin, ymax)
coord. ref. : Lambert_Conformal_Conic_2SP 
source      : CA_forest_total_biomass_2015.tif 
name        : CA_forest_total_biomass_2015 
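
Since the manually unzipped file is readable, it may be worth comparing the sizes recorded in the archive with the sizes of the files that end up on disk after extraction from R. The sketch below assumes nothing about how prepInputs extracts the archive (the log does not show which method was used); `compare_archive_to_disk` is a hypothetical helper. For context, the help page for `utils::unzip` notes that its internal method may truncate files of 4 GB or more, and for an archive this large the listed sizes themselves may not be reliable, so treat the result as a hint only.

```r
# Hypothetical helper, not part of reproducible.
# Lists the archive contents and compares each recorded (uncompressed) size
# with the size of the corresponding file found under `exdir`.
compare_archive_to_disk <- function(zipfile, exdir) {
  listing <- utils::unzip(zipfile, list = TRUE)           # columns: Name, Length, Date
  listing <- listing[!grepl("/$", listing$Name), , drop = FALSE]  # drop directory entries
  on_disk <- file.size(file.path(exdir, listing$Name))    # NA if a file is missing
  data.frame(
    file            = listing$Name,
    size_in_archive = listing$Length,
    size_on_disk    = on_disk,
    sizes_match     = !is.na(on_disk) & on_disk == listing$Length,
    stringsAsFactors = FALSE
  )
}

# With the paths from this issue, the call would be:
# compare_archive_to_disk(
#   "F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015_NN.zip",
#   exdir = "F:/Data/CrossProjectRawData"
# )
```

Any row with `sizes_match` equal to `FALSE` would point at the extraction step rather than at the later copy.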

I then tried to run the same Cache(prepInputs(...)) call and got the same error, but this time there was no unzipping involved (so maybe the problem is happening on file copying/linking between reproducible.inputPaths and reproducible.destinationPath?):

Running preProcess
Preparing: CA_forest_total_biomass_2015.tif
Checking local files...
Finished checking local files.
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping download. All requested files already present
alsoExtract is unspecified; assuming that all files must be extracted
Extracting all files from archive
  Skipping extractFromArchive attempt: no files missing
... copying to getOption('reproducible.inputPaths')...
Copy of file: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif, was created at: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
targetFile located at F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
Loading object into R
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: In prepInputs(url = "https://opendata.nfis.org/downloads/forest_change/CA_forest_total_biomass_2015_NN.zip",     targetFile = "CA_forest_total_biomass_2015.tif", archive = "CA_forest_total_biomass_2015_NN.zip",     to = studyArea, datatype = "INT2U", method = "bilinear",     filename2 = .suffix("rawBiomassMap.tif", paste0("_", SAname)),     overwrite = TRUE): In do.call(theFun, args2): F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 

After the above, I tried to re-load the raster in reproducible.inputPaths again using rast, which failed; the same happens with the copy in reproducible.destinationPath. This makes me think that somehow the two copies get screwed up by prepInputs/preProcess?

## reproducible.inputPaths copy
rast("F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 

## reproducible.destinationPath copy
rast("F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
Error: [rast] cannot open this file as a SpatRaster: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif
In addition: Warning messages:
1: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFFetchDirectory:F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: Can not read TIFF directory count (GDAL error 1) 
2: F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif: TIFFReadDirectory:Failed to read directory at offset 121770457026 (GDAL error 1) 
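
To test the suspicion that prepInputs/preProcess modifies both copies, one option is to record the size and modification time of each file before and after the call and see whether anything changed. This is a minimal sketch; `snapshot_files` and `files_changed` are hypothetical helpers (not part of reproducible), and the temporary file only stands in for the real rasters.

```r
# Hypothetical helpers, not part of reproducible.
# Record size (bytes) and modification time (seconds) for a set of files.
snapshot_files <- function(paths) {
  info <- file.info(paths)
  data.frame(path = paths, size = info$size,
             mtime = as.numeric(info$mtime), stringsAsFactors = FALSE)
}

# TRUE for every file whose size or modification time differs between snapshots.
files_changed <- function(before, after) {
  stopifnot(identical(before$path, after$path))
  !mapply(identical, before$size, after$size) |
    !mapply(identical, before$mtime, after$mtime)
}

# Self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".tif")
writeBin(as.raw(1:10), tmp)
before <- snapshot_files(tmp)
writeBin(as.raw(1:5), tmp)   # stand-in for a step that rewrites the file
after <- snapshot_files(tmp)
files_changed(before, after)

# For this issue, the snapshots would be taken around the Cache(prepInputs(...)) call:
# paths <- c("F:/Data/CrossProjectRawData/CA_forest_total_biomass_2015.tif",
#            "F:/NEcosystemModelling/R/SpaDES/inputs/CA_forest_total_biomass_2015.tif")
```

If the inputPaths copy shows up as changed after the call, that would support the idea that the file is being rewritten rather than only copied or linked.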

@CeresBarros

Any news on this front, @eliotmcintire?
