Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Compact timeout(~3min) after reinstallation #35088

Open
1 task done
zhuwenxing opened this issue Jul 30, 2024 · 3 comments
Open
1 task done

[Bug]: Compact timeout(~3min) after reinstallation #35088

zhuwenxing opened this issue Jul 30, 2024 · 3 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@zhuwenxing
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4-20240729-d1632070-amd64
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-07-29T07:30:55.246Z] self = <test_all_collections_after_chaos.TestAllCollection object at 0x7f98a7c73040>

[2024-07-29T07:30:55.246Z] collection_name = 'IndexChecker__z573lcNj'

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z]     @pytest.mark.tags(CaseLabel.L1)

[2024-07-29T07:30:55.246Z]     def test_milvus_default(self, collection_name):

[2024-07-29T07:30:55.246Z]         self._connect()

[2024-07-29T07:30:55.246Z]         # create

[2024-07-29T07:30:55.246Z]         name = collection_name if collection_name else cf.gen_unique_str("Checker_")

[2024-07-29T07:30:55.246Z]         t0 = time.time()

[2024-07-29T07:30:55.246Z]         schema = Collection(name=name).schema

[2024-07-29T07:30:55.246Z]         collection_w = self.init_collection_wrap(name=name, schema=schema)

[2024-07-29T07:30:55.246Z]         tt = time.time() - t0

[2024-07-29T07:30:55.246Z]         assert collection_w.name == name

[2024-07-29T07:30:55.246Z]         # get collection info

[2024-07-29T07:30:55.246Z]         schema = collection_w.schema

[2024-07-29T07:30:55.246Z]         dim = cf.get_dim_by_schema(schema=schema)

[2024-07-29T07:30:55.246Z]         int64_field_name = cf.get_int64_field_name(schema=schema)

[2024-07-29T07:30:55.246Z]         float_vector_field_name = cf.get_float_vec_field_name(schema=schema)

[2024-07-29T07:30:55.246Z]         float_vector_field_name_list = cf.get_float_vec_field_name_list(schema=schema)

[2024-07-29T07:30:55.246Z]         # compact collection before getting num_entities

[2024-07-29T07:30:55.246Z]         collection_w.flush(timeout=180)

[2024-07-29T07:30:55.246Z]         collection_w.compact()

[2024-07-29T07:30:55.246Z] >       collection_w.wait_for_compaction_completed(timeout=720)

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] testcases/test_all_collections_after_chaos.py:46: 

[2024-07-29T07:30:55.246Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-07-29T07:30:55.246Z] ../base/collection_wrapper.py:364: in wait_for_compaction_completed

[2024-07-29T07:30:55.246Z]     res = self.collection.wait_for_compaction_completed(timeout, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py:1531: in wait_for_compaction_completed

[2024-07-29T07:30:55.246Z]     return conn.wait_for_compaction_completed(self.compaction_id, timeout=timeout, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:143: in handler

[2024-07-29T07:30:55.246Z]     return func(*args, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:182: in handler

[2024-07-29T07:30:55.246Z]     return func(self, *args, **kwargs)

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:87: in handler

[2024-07-29T07:30:55.246Z]     return func(*args, **kwargs)

[2024-07-29T07:30:55.246Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f98a7bf6520>

[2024-07-29T07:30:55.246Z] compaction_id = 451474099137983097, timeout = 720, kwargs = {}

[2024-07-29T07:30:55.246Z] start = 1722237901.1416473

[2024-07-29T07:30:55.246Z] compaction_state = 

[2024-07-29T07:30:55.246Z] CompactionState

[2024-07-29T07:30:55.246Z]  - compaction id: 451474099137983097

[2024-07-29T07:30:55.246Z]  - State: Executing

[2024-07-29T07:30:55.246Z]  - executing plan number: 1

[2024-07-29T07:30:55.246Z]  - timeout plan number: 0

[2024-07-29T07:30:55.246Z]  - complete plan number: 1

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] end = 1722238140.1978414

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z]     @retry_on_rpc_failure()

[2024-07-29T07:30:55.246Z]     def wait_for_compaction_completed(

[2024-07-29T07:30:55.246Z]         self, compaction_id: int, timeout: Optional[float] = None, **kwargs

[2024-07-29T07:30:55.246Z]     ):

[2024-07-29T07:30:55.246Z]         start = time.time()

[2024-07-29T07:30:55.246Z]         while True:

[2024-07-29T07:30:55.246Z] >           time.sleep(0.5)

[2024-07-29T07:30:55.246Z] E           Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] 

[2024-07-29T07:30:55.246Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:1581: Failed

[2024-07-29T07:30:55.246Z] ------------------------------ Captured log setup ------------------------------

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:44)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:45)

[2024-07-29T07:30:55.246Z] ------------------------------ Captured log call -------------------------------

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.255.229.117', 'port': 19530} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : True  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['IndexChecker__z573lcNj', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <Dat......, kwargs: {'consistency_level': 'Strong'} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : <Collection>:

[2024-07-29T07:30:55.246Z] -------------

[2024-07-29T07:30:55.246Z] <name>: IndexChecker__z573lcNj

[2024-07-29T07:30:55.246Z] <description>: 

[2024-07-29T07:30:55.246Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT:......  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:00 - DEBUG - ci_test]: (api_request)  : [Collection.compact] args: [180], kwargs: {} (api_request.py:62)

[2024-07-29T07:30:55.246Z] [2024-07-29 07:25:01 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-07-29T07:30:55.246Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2024-07-29T07:30:55.246Z] =========================== short test summary info ============================

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[QueryChecker__1Vz7YeZT] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__15Ey8cY6] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__KDS6BxHk] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__QaaIvmLn] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__c275zJyM] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[Checker__ypx67r1N] - Failed: Timeout >240.0s

[2024-07-29T07:30:55.246Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[IndexChecker__z573lcNj] - Failed: Timeout >240.0s

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/2693/pipeline

log:
artifacts-kafka-standalone-reinstall-2693-server-second-deployment-logs.tar.gz

Anything else?

It's possible that it didn't fail, maybe it just took too long to execute.

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 30, 2024
@zhuwenxing
Copy link
Contributor Author

/assign @XuanYang-cn

PTAL

@zhuwenxing zhuwenxing added this to the 2.4.7 milestone Jul 30, 2024
@binbinlv binbinlv added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 30, 2024
@XuanYang-cn
Copy link
Contributor

@zhuwenxing Yes, it finished at the end. The scheduler schedule it until 15:29, while the compaction task generated at 15:25.

There're 10+ compaction tasks waiting to be scheduled at 15:25.
image

@xiaofan-luan
Copy link
Contributor

is this still a problem?

@yanliang567 yanliang567 modified the milestones: 2.4.7, 2.4.8 Aug 12, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.8, 2.4.10 Aug 19, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.10, 2.4.11 Sep 5, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.11, 2.4.12 Sep 18, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.12, 2.4.13 Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

5 participants