Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed resources import #5909

Merged
merged 86 commits into from
Jun 2, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
86 commits
Select commit Hold shift + click to select a range
cc6337d
Apply fixes
Marishka17 Mar 22, 2023
e82e5e0
Remove tmp_file_descriptor
Marishka17 Mar 23, 2023
2b504d3
Add result_ttl && failure_ttl
Marishka17 Mar 23, 2023
549911b
Revert on_success on_failure usage due to their incorrect work
Marishka17 Mar 23, 2023
f99512d
Fix creating resources from backups with the same names at the same time
Marishka17 Mar 24, 2023
da4be88
Fix the last problem with uploading annotations
Marishka17 Mar 24, 2023
afed207
[schema] Fix backup import
Marishka17 Mar 24, 2023
362a138
Fix rest api tests && server schema
Marishka17 Mar 27, 2023
930e7ca
Fix linters
Marishka17 Mar 27, 2023
7cc12d3
Revert used djnago-rq/rq versions
Marishka17 Mar 27, 2023
c8f808e
Fix server tests
Marishka17 Mar 27, 2023
29106ae
Fix method
Marishka17 Mar 27, 2023
6307318
Fix black issue
Marishka17 Mar 27, 2023
7f44aba
Update DatasetUploader
Marishka17 Mar 27, 2023
c4848a0
Add test for import task annotations
Marishka17 Mar 27, 2023
5e196e4
Fix typo
Marishka17 Mar 28, 2023
1c8bde8
Fix import job annotations from CS
Marishka17 Mar 28, 2023
a62a4e1
Fix test
Marishka17 Mar 28, 2023
f192792
Resolve conflicts
Marishka17 Mar 28, 2023
da08dfd
Update cvat/apps/engine/mixins.py
Marishka17 Mar 31, 2023
bc00af8
Resolve conflicts
Marishka17 May 8, 2023
d5c1614
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 8, 2023
3d10ae9
Move method
Marishka17 May 9, 2023
1521b19
COMMON_IMPORT_RQ_ID_TEMPLATE -> get_import_rq_id
Marishka17 May 9, 2023
2f79269
Apply part of comments
Marishka17 May 9, 2023
38f1730
Apply comments
Marishka17 May 11, 2023
a5f51ef
Fix schema
Marishka17 May 12, 2023
047b5b8
Rename function
Marishka17 May 12, 2023
e4b669b
Update test
Marishka17 May 12, 2023
9e84cf5
Fix linters
Marishka17 May 12, 2023
1cdc527
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 May 12, 2023
6b1a387
Fix project dataset import
Marishka17 May 12, 2023
cbc384b
Increase failed rq job lifetime
Marishka17 May 12, 2023
6b653e8
Use supported symbol
Marishka17 May 12, 2023
79d8e8c
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 May 12, 2023
e75d4f0
Fix tests
Marishka17 May 12, 2023
7d4b7b0
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 12, 2023
70489dc
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 May 12, 2023
dff240b
Add log files after failed helm tests
Marishka17 May 14, 2023
8fdab1f
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 14, 2023
4bb9e85
Fix missing parameter
Marishka17 May 14, 2023
ef7dc37
Update cvat/apps/engine/views.py
Marishka17 May 15, 2023
b616024
Add handler for deleting cache after interrupted import
Marishka17 May 19, 2023
5532142
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 19, 2023
39e530a
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 19, 2023
f894298
Add missed file
Marishka17 May 19, 2023
4a099c6
Fix attempt to get exec info from non-existent dependent job
Marishka17 May 19, 2023
1866346
Increase timeout
Marishka17 May 19, 2023
24831c3
Update datumaro version
Marishka17 May 19, 2023
d0169d7
Add endpoints description
Marishka17 May 19, 2023
857118b
Wrap datumaro errors to CVATImportError
Marishka17 May 19, 2023
59b1d0f
t
Marishka17 May 19, 2023
81813c0
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 19, 2023
2c253fa
Update server schema
Marishka17 May 19, 2023
a764791
Add comment
Marishka17 May 19, 2023
cbe4436
Refactoring && debug
Marishka17 May 19, 2023
af76504
Apply linters comments
Marishka17 May 19, 2023
b394620
Fix black
Marishka17 May 19, 2023
133de8d
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 May 19, 2023
3caa8d8
Resolve conflicts
Marishka17 May 22, 2023
c94c2a4
Fix merge
Marishka17 May 22, 2023
7e7c351
Fix helm
Marishka17 May 22, 2023
c0b0c38
Skip test for helm
Marishka17 May 22, 2023
c9fb2fd
Resolve conflicts
Marishka17 May 22, 2023
7fbc5e7
Update code to match develop
Marishka17 May 23, 2023
aae3170
debug
Marishka17 May 23, 2023
b700bde
Fix black issue
Marishka17 May 23, 2023
06d8f2e
Fix values.tests.yaml
Marishka17 May 23, 2023
ec2574e
t
Marishka17 May 23, 2023
953f577
Fix helm
Marishka17 May 24, 2023
8f94f70
Update requirements
Marishka17 May 24, 2023
ab678cd
keep only datumaro
Marishka17 May 24, 2023
123afca
Fix requirements
Marishka17 May 24, 2023
e22d576
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 May 25, 2023
f6b3d52
fix
Marishka17 May 26, 2023
46760f3
Merge branch 'mk/fix_issue_5773' of https://github.com/opencv/cvat in…
Marishka17 May 26, 2023
cb7bee5
Resolve conflicts
Marishka17 May 29, 2023
555cc31
Use docker_exec_cvat and kube_exec_cvat
Marishka17 May 29, 2023
cc1a581
d
Marishka17 May 29, 2023
9b2167d
Fix tests
Marishka17 May 29, 2023
2e715da
Rename RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER to IMPORT_CACHE_CLEAN_DELAY
Marishka17 May 29, 2023
c42faf8
Try to reduce timeout
Marishka17 May 29, 2023
8bf7f42
Fix isort issue
Marishka17 May 29, 2023
335dbd2
increase timeout
Marishka17 May 29, 2023
bf99c53
Merge branch 'develop' into mk/fix_issue_5773
Marishka17 Jun 1, 2023
6518376
Update changelog
Marishka17 Jun 1, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add handler for deleting cache after interrupted import
  • Loading branch information
Marishka17 committed May 19, 2023
commit b616024df07bd447343b7897ed5437042e2c2d72
23 changes: 23 additions & 0 deletions .vscode/launch.json
Original file line number Diff line number Diff line change
Expand Up @@ -270,6 +270,28 @@
"env": {},
"console": "internalConsole"
},
{
"name": "server: RQ - cleaning",
"type": "python",
"request": "launch",
"stopOnEntry": false,
"justMyCode": false,
"python": "${command:python.interpreterPath}",
"program": "${workspaceRoot}/manage.py",
"args": [
"rqworker",
"cleaning",
"--worker-class",
"cvat.rqworker.SimpleWorker"
],
"django": true,
"cwd": "${workspaceFolder}",
"env": {
"DJANGO_LOG_SERVER_HOST": "localhost",
"DJANGO_LOG_SERVER_PORT": "8282"
},
"console": "internalConsole"
},
{
"name": "server: migrate",
"type": "python",
Expand Down Expand Up @@ -433,6 +455,7 @@
"server: RQ - annotation",
"server: RQ - webhooks",
"server: RQ - scheduler",
"server: RQ - cleaning",
"server: git",
]
}
Expand Down
22 changes: 22 additions & 0 deletions cvat/apps/engine/handlers.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

from pathlib import Path
from time import time
from django.conf import settings
from cvat.apps.engine.log import slogger


def clear_import_cache(path: Path, creation_time: float) -> None:
"""
This function checks and removes the import files if they have not been removed from rq import jobs.
This means that for some reason file was uploaded to CVAT server but rq import job was not created.

Args:
path (Path): path to file
creation_time (float): file creation time
"""
if path.is_file() and (time() - creation_time + 1) >= settings.RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER.total_seconds():
path.unlink()
slogger.glob.warning(f"The file {str(path)} was removed from cleaning job.")
24 changes: 21 additions & 3 deletions cvat/apps/engine/mixins.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,19 +7,22 @@
import json
import os
import uuid
import django_rq
from dataclasses import asdict, dataclass
from distutils.util import strtobool
from pathlib import Path
from tempfile import NamedTemporaryFile
from unittest import mock

import django_rq
from django.conf import settings
from rest_framework import mixins, status
from rest_framework.response import Response
from tempfile import NamedTemporaryFile

from cvat.apps.engine.location import StorageType, get_location_configuration
from cvat.apps.engine.log import slogger
from cvat.apps.engine.models import Location
from cvat.apps.engine.serializers import DataSerializer
from cvat.apps.engine.handlers import clear_import_cache
from cvat.apps.engine.utils import get_import_rq_id


Expand Down Expand Up @@ -229,7 +232,7 @@ def init_tus_upload(self, request):
# we need to create unique temp file here because
# users can try to import backups with the same name at the same time
with NamedTemporaryFile(prefix=f'cvat-backup-{filename}-by-{request.user}', suffix='.zip', dir=self.get_upload_dir()) as tmp_file:
filename = tmp_file.name
filename = os.path.relpath(tmp_file.name, self.get_upload_dir())
metadata['filename'] = filename
file_path = os.path.join(self.get_upload_dir(), filename)
file_exists = os.path.lexists(file_path) and import_type != 'backup'
Expand Down Expand Up @@ -260,6 +263,21 @@ def init_tus_upload(self, request):
location = request.build_absolute_uri()
if 'HTTP_X_FORWARDED_HOST' not in request.META:
location = request.META.get('HTTP_ORIGIN') + request.META.get('PATH_INFO')

if import_type in ('backup', 'annotations', 'datasets'):
scheduler = django_rq.get_scheduler(settings.CVAT_QUEUES.CLEANING.value)
path = Path(self.get_upload_dir()) / tus_file.filename
cleaning_job = scheduler.enqueue_in(time_delta=settings.RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER,
func=clear_import_cache,
path=path,
creation_time=Path(tus_file.file_path).stat().st_atime
)
slogger.glob.info(
f'The cleaning job {cleaning_job.id} is queued.'
f'The check that the file {path} is deleted will be carried out after '
f'{settings.RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER}.'
)

return self._tus_response(
status=status.HTTP_201_CREATED,
extra_headers={'Location': '{}{}'.format(location, tus_file.file_id),
Expand Down
7 changes: 6 additions & 1 deletion cvat/apps/engine/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,12 @@ def get_list_view_name(model):
}


def get_import_rq_id(resource_type: str, resource_id: int, subresource_type: str, user: str) -> str:
def get_import_rq_id(
resource_type: str,
resource_id: int,
subresource_type: str,
user: str,
) -> str:
# import:<task|project|job>-<id|uuid>-<annotations|dataset|backup>-by-<user>
return f"import:{resource_type}-{resource_id}-{subresource_type}-by-{user}"

Expand Down
13 changes: 7 additions & 6 deletions cvat/apps/engine/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -2226,9 +2226,6 @@ def _download_file_from_bucket(db_storage, filename, key):
def _import_annotations(request, rq_id_template, rq_func, db_obj, format_name,
filename=None, location_conf=None, conv_mask_to_poly=True):

def _is_there_past_non_deleted_job():
return rq_job and request.method == 'POST' and (rq_job.is_finished or rq_job.is_failed)

format_desc = {f.DISPLAY_NAME: f
for f in dm.views.get_import_formats()}.get(format_name)
if format_desc is None:
Expand All @@ -2248,9 +2245,13 @@ def _is_there_past_non_deleted_job():
if rq_id_should_be_checked and rq_id_template.format(db_obj.pk, request.user) != rq_id:
return Response(status=status.HTTP_403_FORBIDDEN)

if _is_there_past_non_deleted_job():
rq_job.delete()
rq_job = queue.fetch_job(rq_id)
if rq_job and request.method == 'POST':
# If there is a previous job that has not been deleted
if rq_job.is_finished or rq_job.is_failed:
rq_job.delete()
rq_job = queue.fetch_job(rq_id)
else:
return Response(status=status.HTTP_409_CONFLICT, data='Import job already exists')

if not rq_job:
# If filename is specified we consider that file was uploaded via TUS, so it exists in filesystem
Expand Down
8 changes: 8 additions & 0 deletions cvat/settings/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,7 @@ class CVAT_QUEUES(Enum):
AUTO_ANNOTATION = 'annotation'
WEBHOOKS = 'webhooks'
NOTIFICATIONS = 'notifications'
CLEANING = 'cleaning'

RQ_QUEUES = {
CVAT_QUEUES.IMPORT_DATA.value: {
Expand Down Expand Up @@ -327,6 +328,12 @@ class CVAT_QUEUES(Enum):
'DB': 0,
'DEFAULT_TIMEOUT': '1h'
},
CVAT_QUEUES.CLEANING.value: {
'HOST': 'localhost',
'PORT': 6379,
'DB': 0,
'DEFAULT_TIMEOUT': '1h'
},
}

NUCLIO = {
Expand Down Expand Up @@ -663,3 +670,4 @@ class CVAT_QUEUES(Enum):

IMPORT_CACHE_FAILED_TTL = timedelta(days=90).total_seconds()
IMPORT_CACHE_SUCCESS_TTL = timedelta(hours=1).total_seconds()
RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER = timedelta(hours=2)
zhiltsov-max marked this conversation as resolved.
Show resolved Hide resolved
7 changes: 7 additions & 0 deletions cvat/settings/rest-api-testing.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Copyright (C) 2023 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

from .production import *

RUN_CLEAN_IMPORT_CACHE_FUNC_AFTER = timedelta(seconds=10)
22 changes: 22 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,28 @@ services:
networks:
- cvat

cvat_worker_cleaning:
container_name: cvat_worker_cleaning
image: cvat/server:${CVAT_VERSION:-dev}
restart: always
depends_on:
- cvat_redis
- cvat_db
environment:
CVAT_REDIS_HOST: 'cvat_redis'
CVAT_POSTGRES_HOST: 'cvat_db'
DJANGO_LOG_SERVER_HOST: vector
DJANGO_LOG_SERVER_PORT: 80
no_proxy: clickhouse,grafana,vector,nuclio,opa,${no_proxy:-}
NUMPROCS: 1
command: -c supervisord/worker.cleaning.conf
volumes:
- cvat_data:/home/django/data
- cvat_keys:/home/django/keys
- cvat_logs:/home/django/logs
networks:
- cvat

cvat_ui:
container_name: cvat_ui
image: cvat/ui:${CVAT_VERSION:-dev}
Expand Down
33 changes: 33 additions & 0 deletions supervisord/worker.cleaning.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
[unix_http_server]
file = /tmp/supervisord/supervisor.sock

[supervisorctl]
serverurl = unix:///tmp/supervisord/supervisor.sock


[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisord]
nodaemon=true
logfile=%(ENV_HOME)s/logs/supervisord.log ; supervisord log file
logfile_maxbytes=50MB ; maximum size of logfile before rotation
logfile_backups=10 ; number of backed up logfiles
loglevel=debug ; info, debug, warn, trace
pidfile=/tmp/supervisord/supervisord.pid ; pidfile location
childlogdir=%(ENV_HOME)s/logs/ ; where child log files will live

[program:ssh-agent]
command=bash -c "rm /tmp/ssh-agent.sock -f && /usr/bin/ssh-agent -d -a /tmp/ssh-agent.sock"
priority=1
autorestart=true

[program:rqworker_cleaning]
command=%(ENV_HOME)s/wait-for-it.sh %(ENV_CVAT_REDIS_HOST)s:6379 -t 0 -- bash -ic " \
exec python3 %(ENV_HOME)s/manage.py rqworker -v 3 cleaning \
--worker-class cvat.rqworker.DefaultWorker \
"
environment=SSH_AUTH_SOCK="/tmp/ssh-agent.sock",VECTOR_EVENT_HANDLER="SynchronousLogstashHandler"
numprocs=%(ENV_NUMPROCS)s
process_name=rqworker_cleaning_%(process_num)s

33 changes: 31 additions & 2 deletions tests/python/rest_api/test_tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -1523,6 +1523,7 @@ def test_can_report_correct_completed_jobs_count(tasks, jobs, admin_user):
assert task.jobs.completed == 1


@pytest.mark.usefixtures("restore_cvat_data")
class TestImportTaskAnnotations:
zhiltsov-max marked this conversation as resolved.
Show resolved Hide resolved
def _make_client(self) -> Client:
return Client(BASE_URL, config=Config(status_check_period=0.01))
Expand All @@ -1549,8 +1550,8 @@ def _delete_annotations(self, task_id):
(_, response) = api_client.tasks_api.destroy_annotations(id=task_id)
assert response.status == HTTPStatus.NO_CONTENT

@pytest.mark.parametrize("task_id", [20])
def test_can_import_annotations_after_previous_unclear_import(self, task_id):
def test_can_import_annotations_after_previous_unclear_import(self, tasks_with_shapes):
task_id = tasks_with_shapes[0]["id"]
self._check_annotations(task_id)

filename = self.tmp_dir / f"task_{task_id}_coco.zip"
Expand Down Expand Up @@ -1582,3 +1583,31 @@ def test_can_import_annotations_after_previous_unclear_import(self, task_id):
self._delete_annotations(task_id)
task.import_annotations(self.format, filename)
self._check_annotations(task_id)

@pytest.mark.timeout(40)
def test_can_import_annotations_after_previous_interrupted_upload(self, tasks_with_shapes):
task_id = tasks_with_shapes[0]["id"]
self._check_annotations(task_id)

filename = self.tmp_dir / f"task_{task_id}_coco.zip"
task = self.client.tasks.retrieve(task_id)
task.export_dataset(self.format, filename, include_images=False)
self._delete_annotations(task_id)

params = {"format": self.format, "filename": filename.name}
url = self.client.api_map.make_endpoint_url(
self.client.api_client.tasks_api.create_annotations_endpoint.path
).format(id=task_id)

uploader = Uploader(self.client)
uploader._tus_start_upload(url, query_params=params)
uploader._upload_file_data_with_tus(
url, filename, meta=params, logger=self.client.logger.debug
)
sleep(30)
assert not int(
subprocess.check_output(
f'docker exec -it test_cvat_server_1 bash -c "ls data/tasks/{task_id}/tmp | wc -l"',
shell=True,
)
)
4 changes: 2 additions & 2 deletions tests/python/shared/fixtures/init.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,14 +22,14 @@
PREFIX = "test"

CONTAINER_NAME_FILES = ["docker-compose.tests.yml"]

OVERRIDE_FILES = ['tests/docker-compose.override.yml']

DC_FILES = [
"docker-compose.dev.yml",
"tests/docker-compose.file_share.yml",
"tests/docker-compose.minio.yml",
"tests/docker-compose.test_servers.yml",
] + CONTAINER_NAME_FILES
] + CONTAINER_NAME_FILES + OVERRIDE_FILES


def pytest_addoption(parser):
Expand Down