-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[File upload] Add file upload x-pack plugin #32945
Conversation
💚 Build Succeeded |
💔 Build Failed |
💔 Build Failed |
* you may not use this file except in compliance with the Elastic License. | ||
*/ | ||
|
||
export const MAX_BYTES = 104857600; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100mB is a rather large default, we read the entire request into memory and then duplicate it's contents to another variable before doing the bulk insert in Elasticsearch, this will require 200mB additional heap. We'll likely want to make this smaller by default and configurable, or split this up into multiple bulk insert operations (which would alleviate these constraints).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reduce it to 10 MB initially and adjust as needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable to me.
import { once } from 'lodash'; | ||
|
||
const callWithRequest = once((server) => { | ||
const cluster = server.plugins.elasticsearch.createCluster('fileupload'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using a dedicated cluster here as opposed to the 'data' cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a historical decision that I copied over. I'll switch it over to the data cluster
|
||
|
||
function generateId() { | ||
return Math.random().toString(36).substr(2, 9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend using a uuid
here instead of our own id generation.
return new SavedObjectsClient(internalRepository); | ||
} | ||
|
||
export async function incrementFileDataVisualizerIndexCreationCount(server: Server): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend emulating what KQL's telemetry endpoint is doing here: https://github.com/elastic/kibana/blob/master/src/legacy/core_plugins/kibana/server/routes/api/kql_telemetry/index.js#L35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, planning to do another pass on telemetry. I'll give this a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the belated follow up! I diverted for some time to the related #34748 which focuses on the front-end portion. I needed this context to determine how well the plugin would work in practice for full round-trips.
I did end up revising the telemetry to be more similar to what kql was doing but I'm certain it could use another pass. I think (hopefully) we're in a reasonable spot now but it's worth nothing that we'll end up iterating on telemetry at least one more round (and maybe more) for #35809.
💔 Build Failed |
💔 Build Failed |
💔 Build Failed |
@peteharverson Thanks for correcting me and getting @jgowdyelastic and @walterra on board, glad to have you both reviewing as well!
@jgowdyelastic Ah right. Thanks for correcting me there
@jgowdyelastic Done. I actually bumped it to 30 MB. This will likely change in the future but hopefully that number will work well for now. Also @peteharverson, I realized it might not be as configurable on the fly as I'd imagined (very likely by design) as that value is assigned to the config portion of the route at route initialization. @walterra Would love to get your feedback on the telemetry. As I mentioned to Pete above, it will likely change and get more detail on future iterations, but hopefully we're close to being in a good spot for now. General question. Would it be worth creating an issue to move ML file upload over to using this plugin following its merge with master? We could then delete the redundant code from ML. I only ask because this question comes up frequently on the GIS team. We just don't want to introduce redundant code in the form of a plugin that only we use. I'd be happy to create the issue and potentially the PR as well when the time comes. |
💔 Build Failed |
💔 Build Failed |
5f49bc9
to
3e48977
Compare
💔 Build Failed |
💚 Build Succeeded |
@aaronjcaldwell I just created a new issue #35947 to switch the ML file data visualizer over to using the new file upload plugin and remove the redundant code from ML. Once your PR is merged, we will schedule the work in - we are happy to do the changes ourselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code in general LGTM, just added a comment about the telemetry data structure.
}, | ||
filesUploadedByApp: { | ||
...filesUploadedByApp, | ||
[app]: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand how the server collecting the data works, this data structure using app
as part of the field names will require to update the mapping and possibly reindex data every time a new type of app
gets added here. Same for fileType
above. So it should be fine to use this structure, it's just something to be aware of.
To make this more obvious for future developers coming across this it might be worth adding a comment and typing app
and fileType
more strictly for example like app: 'maps' | 'ml'
.
I'd suggest verifying this with the telemetry team because I'm not 100% sure I remember all of this correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! I've added some comments above the dynamic fields. Resolution of the docs issue shortly after this should help as well. I've also created an issue for the telemetry team's review.
💔 Build Failed |
retest |
💔 Build Failed |
Pinging @elastic/kibana-gis |
# Conflicts: # x-pack/index.js
💔 Build Failed |
retest |
💚 Build Succeeded |
@kobelb @jgowdyelastic @peteharverson I think we should be in reasonable shape for approval on this PR but please let me know if there's anything else you'd like to see updated or changed! @walterra Still waiting to hear back on telemetry structure but I don't think that should necessarily hold us up in terms of merging the overall plugin. We'll be iterating on telemetry regardless. Let me know what you think! Just a general FYI, I'm not planning to merge this until #34748 is ready to merge as well. It's not so much a blocker as a "context". We absolutely wanted ML's expertise for review on this portion of the plugin, but we don't want to merge the plugin until the Maps UI portion is also approved, at that point we'll merge them both back-to-back. Hopefully that makes sense but happy to discuss if needed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, LGTM 🙌
After some internal discussion, we've elected to move this plugin to a feature branch prior to merge. This feature branch will be merged via #36404 |
Adds file upload as a plugin, original functionality pulled from ML file upload. Accomplishes the following:
Create x-pack plugin for file upload
Create endpoint for file upload
Update telemetry to track apps submitting files and types of files uploaded
Add new telemetry tests
Bubble up ES warnings to users for poorly formatted data submissions