ECS Fargate fluent-bit side-car container exit with code 139 #840

Open
zinho9 opened this issue Jul 2, 2024 · 2 comments

zinho9 commented Jul 2, 2024

Describe the question/issue

Hi, I am using the AWS Fluent Bit image as a sidecar for my server container.
I need to send logs to both S3 and OpenSearch Ingestion, so I added OUTPUT sections for S3 and OpenSearch Ingestion and then rebuilt the aws-for-fluent-bit image.

Here is the config I added:

[INPUT]
    Name                  forward
    unix_path             /var/run/fluent.sock
    Mem_Buf_Limit         200MB

[OUTPUT]
    Name http
    Match *
    Host
    Port 443
    URI 
    format json
    aws_auth true
    aws_region ap-northeast-2
    aws_service osis
    aws_role_arn 
    tls On
    net.keepalive false

[OUTPUT]
    Name s3
    Match *
    bucket 
    region us-west-2
    total_file_size 50M
    upload_timeout 60m
    use_put_object Off
    static_file_path On
    s3_key_format /$TAG[0]-$TAG[1]-$TAG[2]-$TAG[3]/%Y%m%d/%H%M%S.log
    s3_key_format_tag_delimiters ".-"
    store_dir /var/fluent-bit/state/flb-storage/s3
    compression gzip
    net.keepalive false
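
For readers unfamiliar with the s3_key_format setting above, here is a rough Python sketch of how the $TAG[n] placeholders and time specifiers might expand into an object key. It only approximates the plugin's behavior: the function name expand_s3_key and the sample tag are hypothetical, and the real plugin may treat edge cases (such as missing tag parts) differently.

```python
import re
from datetime import datetime

def expand_s3_key(key_format, tag, delimiters, now):
    """Approximate the expansion of an s3_key_format string (illustrative only)."""
    # Expand the strftime-style specifiers (%Y, %m, %d, ...) first.
    key = now.strftime(key_format)
    # Split the tag on any of the delimiter characters, dropping empty pieces.
    parts = [p for p in re.split("[" + re.escape(delimiters) + "]", tag) if p]

    def substitute(match):
        index = int(match.group(1))
        # Simplification: leave the placeholder untouched if the tag is too short.
        return parts[index] if index < len(parts) else match.group(0)

    return re.sub(r"\$TAG\[(\d+)\]", substitute, key)

# Hypothetical tag; real tags depend on how the log router names each stream.
example_key = expand_s3_key(
    "/$TAG[0]-$TAG[1]-$TAG[2]-$TAG[3]/%Y%m%d/%H%M%S.log",
    "app-firelens-0123.abcd",
    ".-",
    datetime(2024, 7, 2, 5, 36, 17),
)
print(example_key)
```

With the delimiters set to ".-", the sample tag splits into four parts, so all four placeholders in the key format are filled.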

And this is my Dockerfile:

FROM amazon/aws-for-fluent-bit:stable
ADD ./logDestinations.conf /fluent-bit/alt/fluent-bit.conf

CMD ["/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/alt/fluent-bit.conf"]

With this setup, my application logs are sent successfully to S3 and OpenSearch.
But there is a problem.

The fluent-bit container terminates 2 to 4 hours after it starts.

To get debug logs, I added the FLB_LOG_LEVEL key with the value debug.
I also added net.keepalive false to the OUTPUT sections of the Fluent Bit config.

But the container still exits with code 139.
These are the logs from when the fluent-bit container terminated with code 139:


#1 0x4fd429 in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:147
#2 0x4fe33d in prepare_destroy_conn() at src/flb_upstream.c:466
#3 0x4fe3c0 in prepare_destroy_conn_safe() at src/flb_upstream.c:492
#4 0x4fec1b in cb_upstream_conn_ka_dropped() at src/flb_upstream.c:752
#5 0x4e7a7c in output_thread() at src/flb_output_thread.c:300
#6 0x5002da in step_callback() at src/flb_worker.c:43
#7 0x7f0aaf19b44a in ???() at ???:0
#8 0x7f0aad59552e in ???() at ???:0
#9 0xffffffffffffffff in ???() at ???:0
[2024/07/02 05:36:17] [engine] caught signal (SIGSEGV)
[2024/07/02 05:36:17] [debug] [upstream] KA connection #40 to sts.ap-northeast-2.amazonaws.com:443 has been disconnected by the remote service
[2024/07/02 05:36:17] [debug] [socket] could not validate socket status for #40 (don't worry)
[2024/07/02 05:36:17] [debug] [upstream] drop keepalive connection #-1 to sts.ap-northeast-2.amazonaws.com:443 (keepalive idle timeout)
[2024/07/02 05:35:47] [ info] [output:http:http.0] :443, HTTP status=200
200 OK
[2024/07/02 05:35:47] [debug] [out flush] cb_destroy coro_id=14
[2024/07/02 05:35:47] [debug] [task] destroy task=0x7f0aa6a91a80 (task_id=0)
[2024/07/02 05:35:47] [debug] [upstream] KA connection #40 to sts.ap-northeast-2.amazonaws.com:443 is now available
[2024/07/02 05:35:46] [debug] [http_client] not using http_proxy for header
[2024/07/02 05:35:46] [debug] [aws_credentials] Retrieving credentials from the HTTP provider..
[2024/07/02 05:35:46] [debug] [http_client] not using http_proxy for header
[2024/07/02 05:35:46] [debug] [output:http:http.0] signing request with AWS Sigv4
[2024/07/02 05:35:46] [debug] [aws_credentials] Requesting credentials from the STS provider..
[2024/07/02 05:35:46] [debug] [aws_credentials] STS Provider: Refreshing credential cache.
[2024/07/02 05:35:46] [debug] [aws_credentials] Calling STS..
[2024/07/02 05:35:46] [debug] [task] created task=0x7f0aa6a91a80 id=0 OK
[2024/07/02 05:35:46] [debug] [output:http:http.0] task_id=0 assigned to thread #1
[2024/07/02 05:35:46] [debug] [output:s3:s3.1] task_id=0 assigned to thread #0
[2024/07/02 05:35:46] [debug] [out flush] cb_destroy coro_id=29
[2024/07/02 05:35:45] [debug] [input chunk] update output instances with new chunk size diff=938, records=1, input=forward.0
[2024/07/02 05:35:41] [debug] [output:s3:s3.1] Running upload timer callback (cb_s3_upload)..
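
As a side note on the exit code itself: by common shell and container convention, an exit code above 128 means the process was killed by signal number (code - 128), so 139 maps to signal 11, which matches the "caught signal (SIGSEGV)" line in the log. A minimal Python sketch of that decoding, assuming a Linux host (the helper name describe_exit_code is made up for illustration):

```python
import signal

def describe_exit_code(code):
    """Return the signal name implied by an exit code above 128, else None."""
    if code <= 128:
        return None
    try:
        # Convention: exit code = 128 + signal number.
        return signal.Signals(code - 128).name
    except ValueError:
        # The remainder does not correspond to a known signal on this platform.
        return None

print(describe_exit_code(139))
```

This only confirms that the container died from a segmentation fault; it says nothing about which code path caused it.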

Please take a look at this problem and help me.

@wkozak-eh

I am experiencing the same issue, but in my case the container exits 1 minute after it is created, with no logs other than the fluent-bit startup log.


sauravsa21 commented Oct 4, 2024

I am also facing the same issue. We are using an OpenSearch Ingestion pipeline to ingest logs into OpenSearch Serverless: the sidecar container sends logs to the ingestion pipeline, which then forwards them to OpenSearch Serverless. The container does not stop immediately; it works fine for 2 to 4 hours and then stops.

Using Image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
