Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Upload Feature #1902

Merged
merged 54 commits into from
Jan 21, 2023
Merged

File Upload Feature #1902

merged 54 commits into from
Jan 21, 2023

Conversation

pzl
Copy link
Member

@pzl pzl commented Sep 22, 2022

What is the problem this PR solves?

Adding the ability to upload files from integrations, through fleet server, into elasticsearch

Link to Swagger docs for API contract

A followup PR is under way to add this API to the new openapi.yml.

Extracting a few heuristics (file size limit, timeout) into fleet server configs will also come as a followup during scale testing

How does this PR solve the problem?

Adds three HTTP routes,

  • POST /api/fleet/uploads: Begins an "upload operation". Includes the full metadata about a file, with some required fields like name and size
  • PUT /api/fleet/uploads/<uploadID>/<chunkNum>: Uploading a segment (chunk) of the file contents.
  • POST /api/fleet/uploads/<uploadID>: Completes an "upload operation". Fleet server verifies all contents were uploaded, and initially-provided checksums match

How to test this PR locally

These routes are API-Key authorized, and as well deal with file chunking and checksums. It may be difficult to test routes in isolation without a nearly-fully-operational client implementation. Both Endpoint integration and the Agent itself are developing client implementations for this release.

If one were to (for development or checking purposes) disable the API key checking, they may be able to use this gist as a mock file upload client.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (update to openapi.yml spec as a followup)
  • I have made corresponding change to the default configuration files (adding configs coming in followup)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@pzl pzl added enhancement New feature or request 8.6-candidate labels Sep 22, 2022
@elasticmachine
Copy link
Contributor

elasticmachine commented Sep 22, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-01-20T23:38:55.333+0000

  • Duration: 14 min 14 sec

Test stats 🧪

Test Results
Failed 0
Passed 515
Skipped 1
Total 516

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Contributor

@michel-laterman michel-laterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a very quick 1st look here.
I can see that currently the api handers are interacting with upload, and dl to add files to ES.

However I think a cleaner implementation would be to have the ES interactions in upload so that the caller can just call something like upload.Chunk(uplID, chunkID, r.Body) error. What do you think?

internal/pkg/api/handleUpload.go Outdated Show resolved Hide resolved
internal/pkg/api/schema.go Outdated Show resolved Hide resolved
internal/pkg/api/schema.go Outdated Show resolved Hide resolved
internal/pkg/model/schema.go Outdated Show resolved Hide resolved
internal/pkg/upload/doc.go Outdated Show resolved Hide resolved
internal/pkg/upload/upload.go Outdated Show resolved Hide resolved
internal/pkg/upload/upload.go Outdated Show resolved Hide resolved
internal/pkg/upload/upload.go Outdated Show resolved Hide resolved
internal/pkg/upload/upload.go Outdated Show resolved Hide resolved
internal/pkg/upload/upload.go Outdated Show resolved Hide resolved
}
}

func (ut *UploadT) handleUploadStart(zlog *zerolog.Logger, w http.ResponseWriter, r *http.Request) error { //nolint:unparam // log is standard first arg for the handlers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another way is to do something like this

func (ut *UploadT) handleUploadStart(_ *zerolog.Logger, w http.ResponseWriter, r *http.Request) error

Copy link
Member

@aleksmaus aleksmaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, assuming the @todos will be addressed before merging.
left few comments/nits, please respond or address.

internal/pkg/cache/cache.go Outdated Show resolved Hide resolved
internal/pkg/cache/cache.go Show resolved Hide resolved
c.mut.RLock()
defer c.mut.RUnlock()

scopedKey := "upload:" + id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe use a function that makes the key string to avoid doing this in two places, easier to make mistake if anything changes.

}

// now ensure all positions are accounted for, no gaps, etc
sort.Slice(chunks, func(i, j int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/nit: could you avoid sorting if the chunks are always selected "ordered by pos" from elasticsearch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can guarantee that they are always sorted correctly by ES. The positional information is not in a sortable field, it is in just a portion of the document's _id. Considering this is a bounded size slice (limited maximum file size), sorting was not out of the question

log.Debug().Int("chunkID", i).Msg("non-final chunk was incorrectly marked last")
return false
}
if chunk.Size != int(info.ChunkSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just have the chunk size defined as int64(or uint64) for both structs and avoid casting? the int is int64 anyways, the fleet server is built for 64bit only atm

internal/pkg/uploader/upload.go Outdated Show resolved Hide resolved
@aleksmaus
Copy link
Member

aleksmaus commented Jan 18, 2023

@cmacknz @michel-laterman would like to have somebody from the agent/fleet-server team to review this as well.

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, other than CHANGELOG missing, and few TODOs in the code.

@elastic elastic deleted a comment from mergify bot Jan 19, 2023
@pzl pzl dismissed michel-laterman’s stale review January 19, 2023 02:55

rewritten for multiple fleet server

Comment on lines 231 to 233
// Searches for Upload Metadata document in local memory cache if available
// otherwise, fetches from elasticsearch and caches for next use
func (u *Uploader) GetUploadInfo(ctx context.Context, uploadID string) (upload.Info, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion | Go convention]

Suggested change
// Searches for Upload Metadata document in local memory cache if available
// otherwise, fetches from elasticsearch and caches for next use
func (u *Uploader) GetUploadInfo(ctx context.Context, uploadID string) (upload.Info, error) {
// GetUploadInfo searches for Upload Metadata document in local memory cache if available
// otherwise, fetches from elasticsearch and caches for next use
func (u *Uploader) GetUploadInfo(ctx context.Context, uploadID string) (upload.Info, error) {

Comment on lines 13 to 14
// the only valid values of upload status according to storage spec
type Status string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion | Go convention]

Suggested change
// the only valid values of upload status according to storage spec
type Status string
// Status represents the only valid values of upload status according to storage spec
type Status string

@kevinlog
Copy link

Checked out the latest and tested again e2e, LGTM!

I'm able to get files from the Endpoint host:

image

This also includes testing with Fleet installation of the relevant indices. This includes running open my PR locally to update the mappings needed by the new fleet server implementation. I'll merge this PR soon.

image

@pzl pzl merged commit deaf781 into elastic:main Jan 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.6-candidate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants