Skip to content

Commit

Permalink
Fix corrupt git cache causes "exit status 128"
Browse files Browse the repository at this point in the history
When the stack's git cache repository is in an invalid state, tasks and
deployments will fail indicating that `git: exited with status 128`
until a human intervenes to clear the stack's git cache.

For example, when shipit first creates a stack a background job like
CacheDeploySpecJob and/or FetchDeployedRevisionJob will run shortly
after. These make use of the
StackCommands#with_temporary_working_directory method to populate the
local git cache before performing their primary work. When the initial
clone of the git cache fails the git repository may be in an invalid
state. Subsequent attempts to interact with the git repository will
result in the git program exiting with status 128.

A concrete scenario in which we've seen this surface. We recently
experienced instability due to over-scheduling of memory resources on
one of the nodes in our Kubernetes clusters on which these shipit-engine
jobs run. Some of the sidekiq workers running these jobs were OOMKilled
by the kernel due to congestion / contention on node memory. The result
is that these stacks remain in a state where they did not yet, nor would
they ever be able to, cache their deploy spec. This means they appear
"stuck" until a human realizes the on-disk git cache is broken and
manually schedules a ClearGitCacheJob through the UI for the stack.

The idea of behind this bod of work is to make shipit-engine more
resilient - self-healing - in such cases. The approach enhances
shipit-engine's existing file-system checks of the stack's git cache. If
the git cache is a working git repository, then use it. Otherwise throw
away the broken on-disk git cache and re-clone the repository.
  • Loading branch information
indiebrain committed May 6, 2021
1 parent 12e3c00 commit 00def0c
Show file tree
Hide file tree
Showing 3 changed files with 81 additions and 13 deletions.
6 changes: 3 additions & 3 deletions app/models/shipit/stack.rb
Original file line number Diff line number Diff line change
Expand Up @@ -380,15 +380,15 @@ def merge_method
delegate :git_url, to: :repository, prefix: :repo

def base_path
Rails.root.join('data', 'stacks', repo_owner, repo_name, environment)
@base_path ||= Rails.root.join('data', 'stacks', repo_owner, repo_name, environment)
end

def deploys_path
base_path.join("deploys")
@deploys_path ||= base_path.join("deploys")
end

def git_path
base_path.join("git")
@git_path ||= base_path.join("git")
end

def acquire_git_cache_lock(timeout: 15, &block)
Expand Down
26 changes: 22 additions & 4 deletions lib/shipit/stack_commands.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,19 +15,25 @@ def env

def fetch
create_directories
if Dir.exist?(@stack.git_path)
if valid_git_repository?(@stack.git_path)
git('fetch', 'origin', '--tags', @stack.branch, env: env, chdir: @stack.git_path)
else
@stack.clear_git_cache!
git_clone(@stack.repo_git_url, @stack.git_path, branch: @stack.branch, env: env, chdir: @stack.deploys_path)
end
end

def fetched?(commit)
git_dir = File.join(@stack.git_path, '.git')
if Dir.exist?(git_dir)
if valid_git_repository?(@stack.git_path)
git('rev-parse', '--quiet', '--verify', "#{commit.sha}^{commit}", env: env, chdir: @stack.git_path)
else
Command.new('test', '-d', git_dir, env: env, chdir: @stack.deploys_path)
# When the stack's git cache is not valid, the commit is
# NOT fetched. To keep the interface of this method
# consistent, we must return a Shipit::Command whose #success?
# method returns false - has a non-zero exit status. We utilize
# the POSIX 'test' command with no arguments which should
# always have an exit status of 1.
Command.new('test', env: env, chdir: @stack.deploys_path)
end
end

Expand Down Expand Up @@ -71,6 +77,18 @@ def with_temporary_working_directory(commit: nil)
end
end

def valid_git_repository?(path)
path.exist? &&
!path.empty? &&
git_cmd_succeeds?(path)
end

def git_cmd_succeeds?(path)
git("rev-parse", "--git-dir", chdir: path)
.tap(&:run)
.success?
end

def git_clone(url, path, branch: 'master', **kwargs)
git('clone', *modern_git_args, '--recursive', '--branch', branch, url, path, **kwargs)
end
Expand Down
62 changes: 56 additions & 6 deletions test/unit/deploy_commands_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,9 @@
module Shipit
class DeployCommandsTest < ActiveSupport::TestCase
def setup
@stack = shipit_stacks(:shipit)
@deploy = shipit_deploys(:shipit_pending)
@stack = @deploy.stack
@stack.stubs(:clear_git_cache!)
@commands = DeployCommands.new(@deploy)
@deploy_spec = stub(
dependencies_steps!: ['bundle install --some-args'],
Expand All @@ -21,35 +22,84 @@ def setup
end

test "#fetch calls git fetch if repository cache already exist" do
Dir.expects(:exist?).with(@stack.git_path).returns(true)
@stack.git_path.stubs(:exist?).returns(true)
@stack.git_path.stubs(:empty?).returns(false)

command = @commands.fetch

assert_equal %w(git fetch origin --tags master), command.args
end

test "#fetch calls git fetch in git_path directory if repository cache already exist" do
Dir.expects(:exist?).with(@stack.git_path).returns(true)
@stack.git_path.stubs(:exist?).returns(true)
@stack.git_path.stubs(:empty?).returns(false)

command = @commands.fetch

assert_equal @stack.git_path.to_s, command.chdir
end

test "#fetch calls git clone if repository cache do not exist" do
Dir.expects(:exist?).with(@stack.git_path).returns(false)
@stack.git_path.stubs(:exist?).returns(false)

command = @commands.fetch

expected = %W(git clone --single-branch --recursive --branch master #{@stack.repo_git_url} #{@stack.git_path})
assert_equal expected, command.args.map(&:to_s)
end

test "#fetch calls git clone if repository cache is empty" do
@stack.git_path.stubs(:exist?).returns(true)
@stack.git_path.stubs(:empty?).returns(true)

command = @commands.fetch

expected = %W(git clone --single-branch --recursive --branch master #{@stack.repo_git_url} #{@stack.git_path})
assert_equal expected, command.args
end

test "#fetch calls git clone if repository cache corrupt" do
@stack.git_path.stubs(:exist?).returns(true)
@stack.git_path.stubs(:empty?).returns(false)
StackCommands.any_instance.expects(:git_cmd_succeeds?)
.with(@stack.git_path)
.returns(false)

command = @commands.fetch

expected = %W(git clone --single-branch --recursive --branch master #{@stack.repo_git_url} #{@stack.git_path})
assert_equal expected, command.args
end

test "#fetch clears a corrupted git stash before cloning" do
@stack.expects(:clear_git_cache!)
@stack.git_path.stubs(:exist?).returns(true)
@stack.git_path.stubs(:empty?).returns(false)
StackCommands.any_instance.expects(:git_cmd_succeeds?)
.with(@stack.git_path)
.returns(false)

command = @commands.fetch

expected = %W(git clone --single-branch --recursive --branch master #{@stack.repo_git_url} #{@stack.git_path})
assert_equal expected, command.args
end

test "#fetch does not use --single-branch if git is outdated" do
Dir.expects(:exist?).with(@stack.git_path).returns(false)
@stack.git_path.stubs(:exist?).returns(false)
StackCommands.stubs(git_version: Gem::Version.new('1.7.2.30'))

command = @commands.fetch

expected = %W(git clone --recursive --branch master #{@stack.repo_git_url} #{@stack.git_path})
assert_equal expected, command.args.map(&:to_s)
end

test "#fetch calls git fetch in base_path directory if repository cache do not exist" do
Dir.expects(:exist?).with(@stack.git_path).returns(false)
@stack.git_path.stubs(:exist?).returns(false)

command = @commands.fetch

assert_equal @stack.deploys_path.to_s, command.chdir
end

Expand Down

0 comments on commit 00def0c

Please sign in to comment.