This repository has been archived by the owner on Jun 27, 2020. It is now read-only.

PDF viewing errors in IE (with byte range requests) #348

Open
seanaery opened this issue Apr 13, 2016 · 2 comments

@seanaery
Contributor

In Internet Explorer (confirmed in IE10 and IE11), if you click Download on a PDF item in the DDR and then scroll the PDF before it has fully loaded in the browser, the document stops loading and the browser displays an error.

[Screenshot of the error dialog, 2016-04-12]

There was an error processing a page. There was a problem reading this document (109).

Ctrl-clicking on "OK" displays: Object label badly formatted.

With the same actions, I have also received these errors:

There was an error processing a page. There was a problem reading this document (135).
There was an error processing a page. There was a problem reading this document (14).
@seanaery seanaery added the bug label Apr 13, 2016
@dchandekstark
Member

I believe the problem is most likely caused by samvera/hydra-head#335.

A workaround that seems to be effective is to rewrite the Accept-Ranges response header in Apache when the client is IE 10 or 11, so that the server no longer advertises byte-range support to those browsers:

BrowserMatch "MSIE 1[01]" ie_10_or_11
Header always edit Accept-Ranges bytes none env=ie_10_or_11
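
To sanity-check which clients the `BrowserMatch` pattern above would tag, the regex can be tried against sample User-Agent strings. The sketch below is illustrative only: the three User-Agent strings are typical examples chosen for this sketch (not values captured from the DDR logs), `matches_workaround` is a hypothetical helper, and Python's `re.search` stands in for Apache's unanchored regex matching.

```python
import re

# The pattern from the BrowserMatch directive above.
PATTERN = re.compile(r"MSIE 1[01]")

# Sample User-Agent strings (assumptions for illustration, not taken from logs).
SAMPLE_AGENTS = {
    "IE10": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
    "IE11 default": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko",
    "Firefox": "Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0",
}


def matches_workaround(user_agent: str) -> bool:
    """Return True if the pattern is found anywhere in the User-Agent string."""
    return PATTERN.search(user_agent) is not None


# Map each sample name to whether the workaround would apply to it.
results = {name: matches_workaround(ua) for name, ua in SAMPLE_AGENTS.items()}
```

With these samples only the IE10 string matches. IE11's default User-Agent carries no `MSIE` token, so if IE11 clients turn out not to be picked up, a second pattern such as `Trident/7\.0` may be worth testing.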

@seanaery
Contributor Author

Documenting some additional troubleshooting on the issue...

IE10 SUCCESSFUL PDF STREAM FROM THE FILESYSTEM

When we bypass the repository, IE can successfully fetch a PDF piecemeal (in byte ranges) straight from the filesystem. Two different kinds of byte range requests (and corresponding responses) are at play. Most resemble this, a single byte range:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=1766912-1767689

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 1766912-1767689/1767690
Content-Type: application/pdf

But occasionally (e.g., in 13 of 80 requests), the browser asks for several byte ranges in a single request, and the server returns them together in one multipart response:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=846336-846847, 846848-847359, 847360-847871, 847872-848383, 848384-848895, 848896-849407, 849408-849919, 849920-850431, 850432-850943, 850944-851455, 851456-851967, 851968-852479, 852480-852991, 852992-853503, 853504-854015, 854016-854527, 854528-855039, 855040-855551, 855552-856063, 856064-856575, 856576-857087, 857088-857599, 857600-858111, 858112-858623, 858624-859135, 859136-859647, 859648-860159, 860160-860671, 860672-861183, 861184-861695, 861696-862207, 862208-862719, 862720-863231, 863232-863743, 863744-864255, 864256-864767, 864768-865279, 865280-865791, 865792-866303, 866304-866815, 866816-867327, 867328-867839, 867840-868351

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Type: multipart/byteranges; boundary=5304cce3ce69a22
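
Both kinds of request shown above use the same header syntax, so a server has to be prepared for a comma-separated list of ranges. A minimal sketch of parsing a Range header value into inclusive (start, end) pairs might look like the following. The function name `parse_byte_ranges` is hypothetical, and this is not the code hydra-head or Apache actually uses; it only illustrates the parsing step.

```python
from typing import List, Tuple


def parse_byte_ranges(header: str, size: int) -> List[Tuple[int, int]]:
    """Parse a Range header value into inclusive (start, end) byte offsets.

    `size` is the total length of the resource in bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or not spec.strip():
        raise ValueError(f"unsupported Range header: {header!r}")

    ranges = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if not sep:
            raise ValueError(f"malformed range: {part!r}")
        if first == "":
            # Suffix form "-N": the last N bytes of the resource.
            length = int(last)
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            # Open-ended form "N-" runs to the end of the resource.
            end = min(int(last), size - 1) if last else size - 1
        if start > end:
            raise ValueError(f"unsatisfiable range: {part!r}")
        ranges.append((start, end))
    return ranges


# The single-range request from the filesystem example above.
single = parse_byte_ranges("bytes=1766912-1767689", 1767690)

# A shortened multi-range request in the same style as the one above.
multi = parse_byte_ranges("bytes=846336-846847, 846848-847359", 1767690)
```

The key point is that the result is a list: a handler that reads only the first entry will answer multi-range requests as if they were single-range ones.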

IE10 BROKEN PDF STREAM VIA HYDRA

When the file is loaded through the repository, IE issues the same two kinds of byte range requests (some with a single range, some with multiple). The responses to multi-range requests are incorrect, so fetching the PDF piecemeal fails. Again, most requests resemble this, a single byte range:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
Range: bytes=2689536-2693631

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 2689536-2693631/2694205
Content-Type: application/pdf

The problem appears when multiple byte ranges are requested in the same HTTP request:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
**Range: bytes=2693632-2694204, 1579520-1589247**

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
**Content-Range: bytes 2693632-2694204/2694205**
**Content-Type: application/pdf**

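The response is incorrect: only the first requested range (2693632-2694204) is returned, as a plain application/pdf body, and the second range (1579520-1589247) is silently dropped. The filesystem case above instead answers multi-range requests with Content-Type: multipart/byteranges.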
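
For comparison, here is a rough sketch of what a multi-range response body could look like when assembled by hand, following the multipart/byteranges layout seen in the filesystem example: one part per range, each with its own Content-Type and Content-Range. The function name `build_multipart_byteranges` and its parameters are hypothetical, and the sketch makes no claim about how hydra-head should implement this.

```python
import uuid
from typing import Dict, List, Optional, Tuple


def build_multipart_byteranges(
    data: bytes,
    ranges: List[Tuple[int, int]],
    content_type: str = "application/pdf",
    boundary: Optional[str] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Build headers and body for a 206 response covering several ranges.

    `ranges` holds inclusive (start, end) byte offsets into `data`.
    """
    boundary = boundary or uuid.uuid4().hex
    total = len(data)
    body = bytearray()
    for start, end in ranges:
        # Each part carries its own headers, then the requested bytes.
        body += f"--{boundary}\r\n".encode("ascii")
        body += f"Content-Type: {content_type}\r\n".encode("ascii")
        body += f"Content-Range: bytes {start}-{end}/{total}\r\n\r\n".encode("ascii")
        body += data[start:end + 1]
        body += b"\r\n"
    # The closing delimiter ends the multipart body.
    body += f"--{boundary}--\r\n".encode("ascii")

    headers = {
        "Content-Type": f"multipart/byteranges; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, bytes(body)


# Stand-in payload; a real handler would read from the stored PDF.
payload = bytes(range(256)) * 4
headers, body = build_multipart_byteranges(payload, [(10, 19), (0, 4)], boundary="abc")
```

Note that the top-level Content-Type becomes multipart/byteranges and the per-range Content-Range moves inside each part, which is exactly what the broken response above lacks.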

The problem appears to be a combination of two things: (1) how the hydra-head gem parses range requests, and (2) the fact that IE’s native PDF reader issues multi-range requests in the first place. We haven’t observed the problem in other browsers’ PDF readers; they likely all issue only single byte range requests.
