[Bug]: Milvus standalone container crashes automatically #33782

Open · 1 task done
hoangph3 opened this issue Jun 12, 2024 · 8 comments
hoangph3 commented Jun 12, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.2.12
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): no
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.15
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 24 cores, 64GB
- GPU: No
- Others:

Current Behavior

When I run docker-compose up -d with the following compose file:

version: "3.0"

services:
  etcd:
    container_name: milvus_etcd
    image: hoangph3/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/db/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    restart: always
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus_minio
    image: hoangph3/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - /opt/db/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http:/localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  milvus:
    image: hoangph3/milvus:v2.2.12
    container_name: milvus_standalone
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/db/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http:/localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3

After a few seconds the container crashes and then restarts automatically, because I set restart: always.
When I try the same setup on another machine, it works.
What's going on? I have attached screenshots of the logs to this issue below.
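
A minimal sketch of host-side checks that could show why the container exits (the container name milvus_standalone comes from the compose file above; the OOM-related checks are only a guess at one possible cause):

# Exit code and whether Docker recorded an out-of-memory kill
docker inspect --format '{{.State.ExitCode}} OOMKilled={{.State.OOMKilled}}' milvus_standalone

# Last log lines before the crash
docker logs --tail 200 milvus_standalone

# Kernel-level OOM events on the host
dmesg -T | grep -i -E 'out of memory|oom'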

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

Restarting (screenshot of the restarting container attached).

Milvus logs before it crashes (screenshots attached: 2024-06-12 13-26-57 and 2024-06-12 10-27-29).

Anything else?

No response

@hoangph3 hoangph3 added kind/bug Issues or changes related to a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 12, 2024
@yanliang567
Contributor

@hoangph3 could you please retry on the latest release, v2.4.4? If it reproduces for you, please provide the Milvus logs for investigation.
For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
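
For example (a sketch, assuming the Compose v1 CLI and that it is run from the directory containing the compose file; with Compose v2 the command would be docker compose logs):

# Export logs from all services (etcd, minio, milvus) into one file
docker-compose logs --no-color > milvus.log

# Or only the Milvus service, with timestamps
docker-compose logs --no-color --timestamps milvus > milvus-standalone.log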

/assign @hoangph3
/unassign

@sre-ci-robot sre-ci-robot assigned hoangph3 and unassigned yanliang567 Jun 13, 2024
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 13, 2024
@xiaofan-luan
Contributor

What is the reason for the restart? And are there any fatal/error logs or K8s info?

@hoangph3
Author

hoangph3 commented Jun 13, 2024

@yanliang567
It isn't easy to do that because this is a standalone product we deploy on top of our customer infrastructure. To update the Milvus version, we need to create a change request with multiple steps of signing and approving the request.
It's not a big deal; it just takes time waiting for customer approval. But can you confirm that this problem will not happen again after we update Milvus to 2.4.4?

@xiaofan-luan
I don't know; I only use docker-compose, not K8s. But when I try it on other machines, it works.
Are there infrastructure factors that could affect Milvus?
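
Since the same compose file works on another machine, here is a quick sketch of host-level factors worth comparing (the path /opt/db/milvus is taken from the volumes in the compose file; the specific checks are only suggestions):

# Available memory and disk space on the host
free -h
df -h /opt/db/milvus

# Kernel and Docker runtime details that can differ between machines
uname -r
docker info | grep -i -E 'storage driver|cgroup|total memory'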

@yanliang567
Contributor

Please provide the full Milvus logs, as described in the comments above. @hoangph3

@xiaofan-luan
Contributor

(Quoting @hoangph3's reply above.)

Without enough information, it might be hard for us to give a clue. Can you collect the logs?

@hoangph3
Author

hoangph3 commented Jul 15, 2024

Yes, please help me to fix it.
This problem is recurring in my production environment, and currently I cannot restart the service.
milvus_log.zip
Note that the Milvus version is v2.2.12.
@yanliang567 @xiaofan-luan


stale bot commented Aug 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added stale indicates no updates for 30 days and removed stale indicates no updates for 30 days labels Aug 18, 2024

stale bot commented Sep 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no updates for 30 days label Sep 20, 2024
Labels
kind/bug Issues or changes related to a bug · stale indicates no updates for 30 days · triage/needs-information Indicates an issue needs more information in order to work on it.
3 participants