Transient error in pre_run_hook when workflow spawns multiples of same JT #6119

Closed
AlanCoding opened this issue Feb 28, 2020 · 7 comments

@AlanCoding
Member

AlanCoding commented Feb 28, 2020

ISSUE TYPE
  • Bug Report
SUMMARY
ENVIRONMENT
  • AWX version: X.Y.Z
  • AWX install method: openshift, minishift, docker on linux, docker for mac, boot2docker
  • Ansible version: X.Y.Z
  • Operating System:
  • Web Browser:
STEPS TO REPRODUCE

Create a workflow that looks like this:

[Screenshot of the workflow: Screen Shot 2020-02-27 at 9 53 53 PM]

Have TellTelephoneᏤ run nothing but a simple fail: task.

Have FigureRespond銻 do something trivial that should succeed.

Launch that, but do it several times because this is flaky.

EXPECTED RESULTS

The JT that should succeed always succeeds.

ACTUAL RESULTS

Sometimes the JT that should succeed ends in error status instead. Logs with the relevant error:

awx_1        | 2020-02-28 02:42:25,363 DEBUG    awx.main.dispatch task 32b6d512-214c-4868-9371-d52f58f26eb2 starting awx.main.tasks.RunJob(*[86])
awx_1        | 2020-02-28 02:42:25,385 DEBUG    awx.main.models.mixins No credential configured to post back webhook status, skipping.
awx_1        | 2020-02-28 02:42:25,389 DEBUG    awx.main.dispatch publish awx.main.tasks.RunJob(27b30ad6-9ea1-4f56-8caa-5212b27401ab, queue=awx)
awx_1        | 2020-02-28 02:42:25,428 WARNING  awx.main.dispatch scaling up worker pid:2111
awx_1        | 2020-02-28 02:42:25,440 DEBUG    awx.main.dispatch task 27b30ad6-9ea1-4f56-8caa-5212b27401ab starting awx.main.tasks.RunJob(*[87])
awx_1        | 2020-02-28 02:42:25,530 DEBUG    awx.main.models.mixins No credential configured to post back webhook status, skipping.
awx_1        | 2020-02-28 02:42:25,586 DEBUG    awx.main.models.mixins No credential configured to post back webhook status, skipping.
awx_1        | 2020-02-28 02:42:25,592 DEBUG    awx.main.models.mixins No credential configured to post back webhook status, skipping.
awx_1        | 2020-02-28 02:42:25,713 INFO     awx.main.tasks Skipping project sync for job 84 (running) because commit is locally available
awx_1        | 2020-02-28 02:42:25,751 INFO     awx.main.tasks Skipping project sync for job 85 (running) because commit is locally available
awx_1        | 2020-02-28 02:42:25,759 INFO     awx.main.tasks Skipping project sync for job 86 (running) because commit is locally available
awx_1        | 2020-02-28 02:42:25,941 DEBUG    awx.main.models.mixins No credential configured to post back webhook status, skipping.
awx_1        | 2020-02-28 02:42:26,029 ERROR    awx.main.tasks job 85 (running) Exception occurred while running task
awx_1        | Traceback (most recent call last):
awx_1        |   File "/awx_devel/awx/main/tasks.py", line 1275, in run
awx_1        |     self.pre_run_hook(self.instance, private_data_dir)
awx_1        |   File "/awx_devel/awx/main/tasks.py", line 1879, in pre_run_hook
awx_1        |     job.project.scm_type, job_revision
awx_1        |   File "/awx_devel/awx/main/tasks.py", line 2255, in make_local_copy
awx_1        |     source_branch = git_repo.create_head(tmp_branch_name, scm_revision)
awx_1        |   File "/venv/awx/lib/python3.6/site-packages/git/repo/base.py", line 389, in create_head
awx_1        |     return Head.create(self, path, commit, force, logmsg)
awx_1        |   File "/venv/awx/lib/python3.6/site-packages/git/refs/symbolic.py", line 546, in create
awx_1        |     return cls._create(repo, path, cls._resolve_ref_on_create, reference, force, logmsg)
awx_1        |   File "/venv/awx/lib/python3.6/site-packages/git/refs/symbolic.py", line 513, in _create
awx_1        |     ref.set_reference(target, logmsg)
awx_1        |   File "/venv/awx/lib/python3.6/site-packages/git/refs/symbolic.py", line 329, in set_reference
awx_1        |     assure_directory_exists(fpath, is_file=True)
awx_1        |   File "/venv/awx/lib/python3.6/site-packages/git/util.py", line 181, in assure_directory_exists
awx_1        |     os.makedirs(path)
awx_1        |   File "/venv/awx/lib64/python3.6/os.py", line 220, in makedirs
awx_1        |     mkdir(name, mode)
awx_1        | FileExistsError: [Errno 17] File exists: '/var/lib/awx/projects/_71__project_aspectmatter/.git/refs/heads/awx_internal'
awx_1        | 2020-02-28 02:42:26,069 DEBUG    awx.main.tasks job 85 (running) finished running, producing 0 events.
awx_1        | 2020-02-28 02:42:26,200 INFO     awx.main.tasks Skipping project sync for job 87 (running) because commit is locally available
awx_1        | 2020-02-28 02:42:26,375 DEBUG    awx.main.dispatch publish awx.main.tasks.update_inventory_computed_fields(31bd28db-a87d-4314-8a1c-f46e6088a96d, queue=awx_private_queue)
ADDITIONAL INFORMATION
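
The traceback ends in os.makedirs raising FileExistsError for the shared refs/heads/awx_internal directory, which looks like a check-then-create race between jobs that start at nearly the same time. Below is a minimal sketch of that pattern, not the actual GitPython or AWX code. The function names and the after_check hook are hypothetical, and the hook only stands in for a concurrent job creating the directory at the worst moment.

```python
import os
import tempfile


def racy_ensure_dir(path, after_check=None):
    """Check-then-create. Unsafe if something else can create `path` in between."""
    if not os.path.isdir(path):
        if after_check is not None:
            after_check()  # hypothetical hook: a concurrent job wins the race here
        os.makedirs(path)  # raises FileExistsError if the directory now exists


def safe_ensure_dir(path, after_check=None):
    """Create the directory and tolerate it already existing."""
    if after_check is not None:
        after_check()  # same simulated concurrent job
    os.makedirs(path, exist_ok=True)


def demo():
    """Run both variants against a simulated concurrent creator."""
    with tempfile.TemporaryDirectory() as root:
        # Separate targets so the first attempt does not affect the second.
        racy_target = os.path.join(root, "a", "refs", "heads", "awx_internal")
        safe_target = os.path.join(root, "b", "refs", "heads", "awx_internal")

        try:
            racy_ensure_dir(
                racy_target,
                after_check=lambda: os.makedirs(racy_target, exist_ok=True),
            )
            racy_failed = False
        except FileExistsError:
            racy_failed = True

        safe_ensure_dir(
            safe_target,
            after_check=lambda: os.makedirs(safe_target, exist_ok=True),
        )
        safe_ok = os.path.isdir(safe_target)

    return racy_failed, safe_ok


if __name__ == "__main__":
    print(demo())
```

If this reading of the traceback is right, the narrow window between the isdir check and the makedirs call would also explain why the failure is so hard to hit on demand.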
@ryanpetrello
Contributor

ryanpetrello commented Mar 13, 2020

@AlanCoding is it fair to call this blocked, since we're waiting on a change to an upstream library (gitpython-developers/GitPython#998)?

@AlanCoding
Member Author

Some more history on this:

The original flakiness was frequent, failing roughly 1 in 3 runs.

This flake occurs under the same circumstances, but the actual 1-in-3 flake was due to a totally separate issue with mapping workflow job nodes to WFJT nodes.

This one occurs with some as-yet-unknown lower probability, low enough that I can't reliably reproduce it via AWX workflow runs.

@AlanCoding
Member Author

got merged! We will be able to fix this with an upgrade before too long.

@thibveni

I still hit this issue :( about 4 times a week.

@AlanCoding
Member Author

Well, it won't get fixed if the fix is never applied: #7860

@kdelee
Member

kdelee commented Sep 16, 2020

@shebangbash when you get back from PTO let's work on this together because I think it will be a good learning experience

@kdelee
Member

kdelee commented Oct 2, 2020

One of our downstream tests was skipping because of this bug, but is now running and passing. Going to close as verified by this automated test passing.
