Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Streams API support for networking task of PDF.js project. #8617

Merged
merged 1 commit into from
Jul 27, 2017

Conversation

mukulmishra18
Copy link
Contributor

@mukulmishra18 mukulmishra18 commented Jul 5, 2017

This PR aims to add Streams API support for networking task of PDF.js. For the same, network.js file is moved to the main thread and PDFNetwokStream is implemented at worker to talk to main via sendWithStream and/or sendWithPromise method of MessageHandler.

Worker will ask for data(when needed) by sending stream messages to main side. PDF data is returned from the handlers(main to worker) in small chunks.

TODO:

  • enqueue needs to access tranfers parameter

  • check how size and highwatermark work (for now highwatermark = 1 and every chunk size = 1)

@@ -197,6 +197,115 @@ IPDFStreamRangeReader.prototype = {
}

/** @implements {IPDFStream} */
class PDFNetworkStream {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the IRC discussion yesterday, I'm assuming that we shouldn't use class here (or elsewhere), since this code is in src/core!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..., I see. Okay, I will change it to the way we implement class in PDF.js, like: IPDFStream. Although, I still think class looks nice :-)

@@ -197,6 +197,115 @@ IPDFStreamRangeReader.prototype = {
}

/** @implements {IPDFStream} */
class PDFNetworkStream {
constructor(data, msgHandler) {
this._data = data;
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Underscore is usually used to denote a property, or method, that should be considered "private". However, in both PDFNetworkStreamReader and PDFNetworkStreamRangeReader below, you're directly accessing _data (and _msgHandler).
To me, this feels like abusing notation, and it'd be better it you change the properties to this.data = data (and this.msgHandler) here instead. (Another option would be to keep them "private" here, and add getters.)

Copy link
Contributor Author

@mukulmishra18 mukulmishra18 Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I will change it to this.data = data and this.msgHandler = msgHandler.

let sendStreamRequest = ({ stream, chunk, success, reason, }) => {
let sendStreamRequest = ({ stream, chunk, transfers,
success, reason, }) => {
if (transfers) {
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to ask: Why isn't it sufficient to simply always call

this.postMessage({ sourceName, targetName, stream, streamId,
                   chunk, success, reason, }, transfers);

and just let the postMessage method deal with it?
Also, don't you need to add similar code in _processStreamMessage below as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and just let the postMessage method deal with it?

Yeah, right. Seems like this.postMessage is doing same thing, that I am trying here. Okay, calling just this.postMessage sounds like a good idea. Thanks.

Also, don't you need to add similar code in _processStreamMessage below as well?

sendStreamResponse mostly used for signaling *_complete messages e.g. pull_complete(to resolve callbacks), and I don't think we are going to send transfers in these types of messages, at least for now. Maybe we require this in future, but not sure. @yurydelendik, do we require this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the concern, but sendStreamRequest will have optional transfers, and I recommend just call this.postMessage({...}, transfers) as mentioned above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking, do we really need same changes in sendStreamResponse? like:

let sendStreamResponse = ({ stream, transfers, success, reason, }) => {
  this.postMessage({ sourceName, targetName, stream, 
                     success, streamId, reason, }, transfers);
};

@yurydelendik
Copy link
Contributor

  1. display/api.js has one more pdfjs/core/network reference that can be removed.

  2. transfers can be used now

-            sink.enqueue(result.value);
+            sink.enqueue(new Uint8Array(result.value), 1, [result.value]);
  1. and notice, the result.value array buffer is wrapped into Uint8Array, which needs to be unwrapped:
   read() {
-    return this._reader.read();
+    return this._reader.read().then(({value, done}) => {
+      if (done) { return {value: undefined, done: true}; }
+      return {value: value.buffer, done: false};
+    });;
   },

}, this);

messageHandler.on('GetRangeReader', function(data, sink) {
this._PDFNetworkStream = new PDFNetworkStream(data.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to create new PDFNetworkStream (and data field can be removed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

this.data = data;
let source = data.source;
this._contentLength = source.length;
this.msgHandler = msgHandler;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's place this.msgHandler = msgHandler; right after the this.data = data; line, to group the properties that are set from the provided parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

this._isStreamingSupported = false;

let readableStream = this._msgHandler.sendWithStream('GetReader',
this._data);
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Indentation, let's align this directly under 'GetReader' instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

this._reader = readableStream.getReader();

this._headersReady = this._msgHandler.sendWithPromise('ReaderHeadersReady').
then((data) => {
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Two more spaces of indentation on this line please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

this.onProgress = null;

let readableStream = this._msgHandler.sendWithStream('GetRangeReader',
{ begin, end, });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Indentation, let's align this directly under 'GetRangeReader' instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

Promise.all([SystemJS.import('pdfjs/core/network'),
SystemJS.import('pdfjs/core/worker')]).then((modules) => {
Promise.resolve(SystemJS.import('pdfjs/core/worker')).
then((modules) => {
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please indent this line with two more spaces, and also change it to then(worker) => { instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

// Enqueue data chunk into sink, and transfer it
// to other side as `Transferable` object.
sink.enqueue(new Uint8Array(value), 1, [value]);
}).catch(function(reason) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an intermediate function here, }).catch(sink.error); should work just as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Thanks for pointing out.

return;
}
sink.enqueue(new Uint8Array(value), 1, [value]);
}).catch(function(reason) {
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an intermediate function here, }).catch(sink.error); should work just as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

let sendStreamRequest = ({ stream, chunk, transfers,
success, reason, }) => {
this.postMessage({ sourceName, targetName, stream, streamId,
chunk, success, reason, }, transfers);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: alignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, done with new commit. Thanks.

@@ -28,7 +28,6 @@ importScripts('./shared/compatibility.js');
importScripts('../node_modules/systemjs/dist/system.js');
importScripts('../systemjs.config.js');

Promise.all([SystemJS.import('pdfjs/core/network'),
SystemJS.import('pdfjs/core/worker')]).then(function () {
Promise.resolve(SystemJS.import('pdfjs/core/worker')).then(function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in api.js, you can directly use SystemJS.import without wrapping with Promise.resolve:

SystemJS.import('pdfjs/core/worker').then(function () {...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new commit. Thanks.

this._contentLength = null;
this._isRangeSupported = false;
this._isStreamingSupported = false;
let queueingStrategy = { highWaterMark: 100,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove queueingStrategy from here and range reader, or make explicit highWaterMark:1 and size:()=>1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in new commit. Thanks.

}, this);

messageHandler.on('GetRangeReader', function(data, sink) {
this._rangeReader =
Copy link
Contributor

@yurydelendik yurydelendik Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to let rangeReader it must be unique for every new range request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@yurydelendik yurydelendik changed the title [WIP]Adds Streams API support for networking task of PDF.js project. Adds Streams API support for networking task of PDF.js project. Jul 7, 2017
if (initialData && initialData.length > 0) {
this._queuedChunks.push(initialData);
}
this._msgHandler = msgHandler;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this._msgHandler and msgHandler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, done with new commit.

this._networkStream = new PDFWorkerStream(data.source);
} else {
this._networkStream = new PDFNetworkStream(data);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this code to the WorkerTransport constructor and remove this.networkStreamCapability promise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with new commit. Thanks.

@@ -1522,6 +1522,14 @@ var WorkerTransport = (function WorkerTransportClosure() {
this.destroyCapability = null;
this._passwordCapability = null;

this._networkStream = null;
this._fullReader = null;
if (pdfDataRangeTransport /* === true = chunkedViewerLoading */) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The comment isn't completely easy to read at first, and furthermore I'm not sure how much value it adds!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed with new commit. Thanks.

@mukulmishra18 mukulmishra18 force-pushed the network-streaming branch 2 times, most recently from 288e9c1 to ec6de1a Compare July 27, 2017 19:22
@yurydelendik
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @yurydelendik received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f0e8b6d03703f3d/output.txt

@yurydelendik
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @yurydelendik received. Current queue size: 1

Live output at: http://54.67.70.0:8877/31567a9079e83b3/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://54.215.176.217:8877/02ee71ae88e44dc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/f0e8b6d03703f3d/output.txt

Total script time: 2.32 mins

Published

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/31567a9079e83b3/output.txt

Total script time: 16.58 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/02ee71ae88e44dc/output.txt

Total script time: 29.32 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

network.js file moved to main thread and `PDFNetworkStream` implemented
at worker thread, that is used to ask for data whenever worker needs.
@yurydelendik
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @yurydelendik received. Current queue size: 1

Live output at: http://54.67.70.0:8877/b49b2e015836bb8/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/b49b2e015836bb8/output.txt

Total script time: 2.32 mins

Published

@yurydelendik
Copy link
Contributor

Thank you for the patch.

@yurydelendik yurydelendik merged commit 343b4dc into mozilla:master Jul 27, 2017
@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jul 28, 2017

It seems that this patch may have caused some weird behaviour in the Firefox addon with disableAutoFetch = true, as evident by strange loading bar behaviour when opening http://mirrors.ctan.org/info/lshort/english/lshort.pdf#disableStream=true&disableAutoFetch=true.[1]

Looking at data received through onProgress, it seems that with this patch the entire PDF file is fetched at once thus bypassing the disableAutoFetch setting. When doing skip-cache reloads, I'm getting percent equal to (in order) 101, 100, 101 when setting a break-point in PDFViewerApplication.progress.


[1] It may be necessary to do "skip-cache" reloads, i.e. Ctrl+F5 to ensure that the PDF data isn't cached.

@yurydelendik
Copy link
Contributor

It seems that this patch may have caused some weird behaviour in the Firefox addon

I was encountering with behavior before: the initial XHR fetches quickly entire file when cached. I think it's normal for small files. We just need to pay attention if data and events is handled properly in chunks and streams.

@Snuffleupagus
Copy link
Collaborator

I was encountering with behavior before: the initial XHR fetches quickly entire file when cached.

Huh, I'm seeing this when doing skip-cache reloads (as indicated above), so I don't think caching should be relevant then!?

@yurydelendik
Copy link
Contributor

yurydelendik commented Jul 28, 2017

Not sure I'm getting the same behavior locally: PDFViewerApplication.progress reports proper percent values in ascending order (that start from 4).

@Snuffleupagus
Copy link
Collaborator

Not sure I'm getting the same behavior locally: PDFViewerApplication.progress reports proper percent values in ascending order.

I can confirm that it works locally (with gulp server), but I'm seeing the weird behaviour with the Firefox addon.

@yurydelendik
Copy link
Contributor

yurydelendik commented Jul 28, 2017

but I'm seeing the weird behaviour with the Firefox addon.

FWIW I'm testing on Nightly (e10s) with current development addon. (non-e10s is unusably broken)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants