Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[File upload] Add file upload x-pack plugin #32945

Closed
wants to merge 24 commits into from

Conversation

kindsun
Copy link
Contributor

@kindsun kindsun commented Mar 11, 2019

Adds file upload as a plugin, original functionality pulled from ML file upload. Accomplishes the following:

  • Create x-pack plugin for file upload

  • Create endpoint for file upload

  • Update telemetry to track apps submitting files and types of files uploaded

  • Add new telemetry tests

  • Bubble up ES warnings to users for poorly formatted data submissions

@kindsun kindsun added enhancement New value added to drive a business result v8.0.0 v7.2.0 labels Mar 11, 2019
@kobelb kobelb self-requested a review March 11, 2019 23:05
@kindsun kindsun added the WIP Work in progress label Mar 11, 2019
@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

* you may not use this file except in compliance with the Elastic License.
*/

export const MAX_BYTES = 104857600;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100mB is a rather large default, we read the entire request into memory and then duplicate it's contents to another variable before doing the bulk insert in Elasticsearch, this will require 200mB additional heap. We'll likely want to make this smaller by default and configurable, or split this up into multiple bulk insert operations (which would alleviate these constraints).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reduce it to 10 MB initially and adjust as needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable to me.

import { once } from 'lodash';

const callWithRequest = once((server) => {
const cluster = server.plugins.elasticsearch.createCluster('fileupload');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a dedicated cluster here as opposed to the 'data' cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a historical decision that I copied over. I'll switch it over to the data cluster



function generateId() {
return Math.random().toString(36).substr(2, 9);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend using a uuid here instead of our own id generation.

return new SavedObjectsClient(internalRepository);
}

export async function incrementFileDataVisualizerIndexCreationCount(server: Server): Promise<void> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, planning to do another pass on telemetry. I'll give this a look!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the belated follow up! I diverted for some time to the related #34748 which focuses on the front-end portion. I needed this context to determine how well the plugin would work in practice for full round-trips.

I did end up revising the telemetry to be more similar to what kql was doing but I'm certain it could use another pass. I think (hopefully) we're in a reasonable spot now but it's worth nothing that we'll end up iterating on telemetry at least one more round (and maybe more) for #35809.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@kindsun kindsun mentioned this pull request Apr 8, 2019
21 tasks
@kindsun
Copy link
Contributor Author

kindsun commented May 1, 2019

@peteharverson Thanks for correcting me and getting @jgowdyelastic and @walterra on board, glad to have you both reviewing as well!

The MAX_BYTES limit here is applied per upload chunk, therefore it would be possible to upload 100MB in 10 chunks.

@jgowdyelastic Ah right. Thanks for correcting me there

Bumping the limit to 25MB might be safer.

@jgowdyelastic Done. I actually bumped it to 30 MB. This will likely change in the future but hopefully that number will work well for now. Also @peteharverson, I realized it might not be as configurable on the fly as I'd imagined (very likely by design) as that value is assigned to the config portion of the route at route initialization.

@walterra Would love to get your feedback on the telemetry. As I mentioned to Pete above, it will likely change and get more detail on future iterations, but hopefully we're close to being in a good spot for now.

General question. Would it be worth creating an issue to move ML file upload over to using this plugin following its merge with master? We could then delete the redundant code from ML. I only ask because this question comes up frequently on the GIS team. We just don't want to introduce redundant code in the form of a plugin that only we use. I'd be happy to create the issue and potentially the PR as well when the time comes.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@peteharverson
Copy link
Contributor

@aaronjcaldwell I just created a new issue #35947 to switch the ML file data visualizer over to using the new file upload plugin and remove the redundant code from ML. Once your PR is merged, we will schedule the work in - we are happy to do the changes ourselves.

Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code in general LGTM, just added a comment about the telemetry data structure.

},
filesUploadedByApp: {
...filesUploadedByApp,
[app]: {
Copy link
Contributor

@walterra walterra May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand how the server collecting the data works, this data structure using app as part of the field names will require to update the mapping and possibly reindex data every time a new type of app gets added here. Same for fileType above. So it should be fine to use this structure, it's just something to be aware of.

To make this more obvious for future developers coming across this it might be worth adding a comment and typing app and fileType more strictly for example like app: 'maps' | 'ml'.

I'd suggest verifying this with the telemetry team because I'm not 100% sure I remember all of this correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! I've added some comments above the dynamic fields. Resolution of the docs issue shortly after this should help as well. I've also created an issue for the telemetry team's review.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@kindsun
Copy link
Contributor Author

kindsun commented May 2, 2019

retest

@elasticmachine
Copy link
Contributor

💔 Build Failed

@thomasneirynck thomasneirynck added the [Deprecated-Use Team:Presentation]Team:Geo Former Team Label for Geo Team. Now use Team:Presentation label May 3, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-gis

@elasticmachine
Copy link
Contributor

💔 Build Failed

@kindsun
Copy link
Contributor Author

kindsun commented May 7, 2019

retest

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@kindsun
Copy link
Contributor Author

kindsun commented May 7, 2019

@kobelb @jgowdyelastic @peteharverson I think we should be in reasonable shape for approval on this PR but please let me know if there's anything else you'd like to see updated or changed!

@walterra Still waiting to hear back on telemetry structure but I don't think that should necessarily hold us up in terms of merging the overall plugin. We'll be iterating on telemetry regardless. Let me know what you think!

Just a general FYI, I'm not planning to merge this until #34748 is ready to merge as well. It's not so much a blocker as a "context". We absolutely wanted ML's expertise for review on this portion of the plugin, but we don't want to merge the plugin until the Maps UI portion is also approved, at that point we'll merge them both back-to-back. Hopefully that makes sense but happy to discuss if needed!

@kindsun kindsun changed the title Add file upload x-pack plugin [File upload] Add file upload x-pack plugin May 7, 2019
Copy link
Member

@jgowdyelastic jgowdyelastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, LGTM 🙌

@kindsun
Copy link
Contributor Author

kindsun commented May 10, 2019

After some internal discussion, we've elected to move this plugin to a feature branch prior to merge. This feature branch will be merged via #36404

@kindsun kindsun closed this May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Deprecated-Use Team:Presentation]Team:Geo Former Team Label for Geo Team. Now use Team:Presentation enhancement New value added to drive a business result v7.2.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants