
Redis Dump Permissions Error Cause AWX Exceptions #9401

Closed
danukefl opened this issue Feb 23, 2021 · 10 comments

@danukefl

ISSUE TYPE
  • Bug Report
SUMMARY

On a fresh install of AWX 17.0.1, after a few minutes of uptime Redis returns an exception that prevents most functionality, including job execution.

ENVIRONMENT
  • AWX version: 17.0.1
  • AWX install method: Ansible playbook into Kubernetes cluster
  • Ansible version: 2.9.17
  • Operating System: Debian Linux
  • Web Browser: Firefox
STEPS TO REPRODUCE

Installed into a Kubernetes cluster with ansible-playbook, modifying only the cluster information, ingress, and credential values as required. Postgres was set up through the playbook using Helm.

Once signed in, the credentials were changed, a Gitlab credential was added, and a Gitlab project was added; all project sync jobs then fail with no output displayed and a status of "Failed". Relaunching the job produces an error: POST /api/v2/projects/6/update/ 500

After restarting the AWX deployment, jobs succeed for a few minutes, until Redis errors begin approximately 5 minutes later, when background saving starts.

EXPECTED RESULTS

Project sync job to execute successfully

ACTUAL RESULTS

All jobs fail, some settings cannot be configured, various errors throughout all of AWX

ADDITIONAL INFORMATION

Based on search results for the Redis error, the command config set stop-writes-on-bgsave-error no should be run in redis-cli, since AWX does not need Redis data persistence.
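The same setting can also be changed at runtime from Python. This is a minimal sketch that assumes a redis-py client (`redis.Redis`), whose `config_get` and `config_set` methods wrap `CONFIG GET` and `CONFIG SET`. The helper name `disable_stop_writes` is hypothetical and not part of AWX.

```python
def disable_stop_writes(client) -> bool:
    """Turn off stop-writes-on-bgsave-error on a live Redis instance.

    `client` is assumed to behave like redis.Redis from redis-py:
    config_get() returns a dict of strings and config_set() sends CONFIG SET.
    Returns True if the setting was changed, False if it was already "no".
    """
    key = "stop-writes-on-bgsave-error"
    current = client.config_get(key).get(key)
    if current == "no":
        return False  # already disabled, nothing to do
    client.config_set(key, "no")
    return True
```

For example, you could call `disable_stop_writes(redis.Redis(host="localhost", port=6379))` against a reachable instance. A runtime change like this is lost when the Redis pod restarts, so it only helps until the next redeploy.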

awx-redis

1:M 23 Feb 2021 22:22:41.003 * 100 changes in 300 seconds. Saving...
1:M 23 Feb 2021 22:22:41.004 * Background saving started by pid 524
524:C 23 Feb 2021 22:22:41.005 # Failed opening the RDB file dump.rdb (in server root dir /data) for saving: Permission denied
1:M 23 Feb 2021 22:22:41.106 # Background saving error
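To spot this failure before AWX starts throwing MISCONF errors, you can scan the Redis log for the child-process line shown above. This is a minimal Python sketch. The regular expression is written against the exact wording in this log and may not match messages from other Redis versions. `find_rdb_failures` is a hypothetical helper name.

```python
import re

# Matches the line a Redis child process writes when it cannot open the RDB file.
RDB_OPEN_FAILURE = re.compile(
    r"Failed opening the RDB file (?P<file>\S+) \(in server root dir (?P<dir>\S+)\) "
    r"for saving: (?P<reason>.+)$"
)

def find_rdb_failures(log_lines):
    """Return (rdb_file, server_dir, reason) for each failed RDB open in a Redis log."""
    failures = []
    for line in log_lines:
        match = RDB_OPEN_FAILURE.search(line)
        if match:
            failures.append((match.group("file"), match.group("dir"), match.group("reason")))
    return failures
```

Feed it the lines from the awx-redis container's log. A non-empty result with reason `Permission denied` points to the ownership problem discussed later in this thread, not to a full disk.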

awx-task

2021-02-23 22:39:20,203 ERROR    awx.main.commands.run_callback_receiver encountered an error communicating with redis
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 56, in read
    res = self.redis.blpop(settings.CALLBACK_QUEUE, timeout=1)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/redis/client.py", line 1865, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/redis/client.py", line 878, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/redis/client.py", line 892, in parse_response
    response = connection.read_response()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/redis/connection.py", line 752, in read_response
    raise response
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.

awx-web

2021-02-23 22:40:07,246 ERROR    django.request Internal Server Error: /api/v2/projects/6/update/
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/transaction.py", line 284, in __exit__
    connection.set_autocommit(True)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 410, in set_autocommit
    self.run_and_clear_commit_hooks()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 636, in run_and_clear_commit_hooks
    func()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/unified_jobs.py", line 1264, in <lambda>
    connection.on_commit(lambda: self._websocket_emit_status(status))
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/unified_jobs.py", line 1254, in _websocket_emit_status
    emit_channel_notification('jobs-status_changed', status_data)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/consumers.py", line 236, in emit_channel_notification
    "text": payload_dumped
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/consumers.py", line 211, in run_sync
    event_loop.run_until_complete(func)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/channels_redis/core.py", line 650, in group_send
    key, min=0, max=int(time.time()) - self.group_expiry
aioredis.errors.ReplyError: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
@Gabisonfire

Gabisonfire commented Feb 24, 2021

Having the same issue on 17.0.1.
Downgrading redis to 6.0.10 seems to fix the issue.

Update: the issue came back after a restart at some point.

@danukefl
Author

Having the same issue on 17.0.1.
Downgrading redis to 6.0.10 seems to fix the issue.

I was able to do this, but I also tried adding stop-writes-on-bgsave-error no to the redis.conf ConfigMap template and redeploying, and that also appears to work. The errors are still present in the log, but they no longer stop AWX from functioning. I had an older test docker-compose instance of AWX that was using 6.0.10, and its logs showed it was not running the snapshot/backup job. I upgraded its Redis Docker image to 6.2; it still seems fine, but I can now see the snapshot/backup job running.
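The redis.conf change described above amounts to making sure the directive appears exactly once, with the value `no`. This is a minimal Python sketch for patching the config text before it goes into the ConfigMap. It assumes the plain `directive value` line format of redis.conf. `set_redis_directive` is a hypothetical helper and not part of the AWX installer.

```python
def set_redis_directive(conf_text: str, name: str, value: str) -> str:
    """Return redis.conf text with `name value` set exactly once.

    The first uncommented line for the directive is replaced, any later
    duplicates are dropped, and the directive is appended if it was absent.
    Comment lines (starting with '#') are left untouched.
    """
    out = []
    replaced = False
    for line in conf_text.splitlines():
        tokens = line.split()
        # Compare directive names case-insensitively.
        if tokens and tokens[0].lower() == name.lower():
            if not replaced:
                out.append(f"{name} {value}")
                replaced = True
            continue  # skip this line (it was replaced or is a duplicate)
        out.append(line)
    if not replaced:
        out.append(f"{name} {value}")
    return "\n".join(out) + "\n"
```

Because it collapses duplicates, this helper is not suitable for directives such as `save` that may legitimately appear on several lines.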

To me it looks like Redis is executing the RDB snapshots by default whereas before it did not.
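The log line `100 changes in 300 seconds. Saving...` comes from an RDB save point of the form `save <seconds> <changes>`. Redis starts a background save once at least `changes` writes have accumulated and at least `seconds` have passed since the last save. Below is a simplified Python model of that check (Redis actually evaluates it periodically in its cron loop). The example rule list is an assumption and not the config AWX ships. An empty list corresponds to `save ""`, which disables RDB snapshots entirely.

```python
from typing import Iterable, Optional, Tuple

# (seconds, changes), mirroring a `save <seconds> <changes>` rule
SavePoint = Tuple[int, int]

def triggered_save_point(save_points: Iterable[SavePoint],
                         dirty: int,
                         seconds_since_last_save: int) -> Optional[SavePoint]:
    """Return the first save point that would start a background save, or None.

    `dirty` is the number of writes since the last save.
    """
    for seconds, changes in save_points:
        if dirty >= changes and seconds_since_last_save >= seconds:
            return (seconds, changes)
    return None
```

With an example rule set of `[(3600, 1), (300, 100), (60, 10000)]`, 100 writes over 300 seconds matches the `(300, 100)` rule, which is the one reported in the awx-redis log above.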

@shanemcd
Member

Hi folks. Are you by chance using this option? Or maybe your Kubernetes clusters are forcing running containers as a non-root user? I noticed that the entrypoint for redis uses gosu to switch users inside of the container. This might be part of the problem.

@Tioborto
Contributor

Same error for us after a restart.
Fixed by adding redis_security_context_enabled=False to the inventory.
With this option disabled, the image runs as user 999 (redis) instead of 1001.

@danukefl
Author

The entrypoint for redis is the same between 6.0 and 6.2, and nothing in the changelog stands out to me as the cause, unless it's tied to one of the security updates. I'm no Redis expert, though.

I was able to replicate the issue using the same redis.conf in a bare Redis container by setting the run-as user to anything other than "999", on all three versions I tested: 6.0.10, 6.0.11, and 6.2.0.
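The basic POSIX permission rule shows why the run-as user matters. To create dump.rdb, the process needs write and execute permission on /data through the owner, group, or other bits, whichever class it falls into. Below is a simplified Python model. The ownership 999:999 and mode 0755 used in the example are assumptions about the image's /data directory, not values confirmed in this thread. ACLs and Linux capabilities are ignored.

```python
import stat

def can_create_in_dir(uid: int, gids: set, owner_uid: int, owner_gid: int, mode: int) -> bool:
    """Simplified POSIX check: may a process create files in a directory?

    Creating a file needs both write and execute (search) permission on the
    directory. Only one class applies: owner first, then group, then other.
    """
    if uid == 0:
        return True  # root bypasses mode bits here (capabilities are not modelled)
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR and mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP and mode & stat.S_IXGRP)
    return bool(mode & stat.S_IWOTH and mode & stat.S_IXOTH)
```

Under these assumed values, user 999 can write the RDB file, while user 1001 falls into the "other" class and gets `Permission denied`. This matches the behaviour described above.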

@danukefl
Author

@Tioborto I changed the setting and ran it on a new install, but with only redis_security_context_enabled=false changed, I received permission denied errors connecting to redis.sock. It sounds like you may have some other configuration differences?

@Tioborto
Contributor

Hello @danukefl,
We have only changed this parameter. For your information, we are on AWX 15.0.0.
After that change, our deployment has this for the redis container:

securityContext:
  fsGroup: 0

@Gabisonfire

For me, @danukefl's solution works so far: adding stop-writes-on-bgsave-error no to redis.conf (in the awx-config ConfigMap). The errors still appear in the logs, but AWX works fine.

@kristofre

I am also having success with @danukefl's solution.

@ryanpetrello
Contributor

👋 Hey everyone, this should be resolved in the latest version of AWX (17.1.0).

Development

No branches or pull requests

7 participants