Add incremental backup-restore mechanism #29

Merged
merged 8 commits into from
Jun 13, 2018
Changes from 6 commits
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,9 @@

### Added

- Integration test for AWS
- Incremental backup of etcd: a full snapshot is taken first, then a watch is applied and the events accumulated over a certain period are persisted to the snapshot store. The restore process restores from the full snapshot, starts the embedded etcd, and applies the logged events one by one.

- Initial setup for Integration test for AWS
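
The restore order described in this changelog entry (full snapshot first, then logged events one by one) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Event` type, the `applyEvents` helper, and the in-memory map standing in for etcd state are all hypothetical.

```go
package main

import "fmt"

// EventType distinguishes the two kinds of change a watch can report.
type EventType int

const (
	Put EventType = iota
	Delete
)

// Event is a hypothetical, simplified stand-in for one logged watch event.
type Event struct {
	Revision int64
	Type     EventType
	Key      string
	Value    string
}

// applyEvents replays events, in order, on top of a copy of the base state.
// The base map plays the role of the data restored from the full snapshot.
func applyEvents(base map[string]string, events []Event) map[string]string {
	state := make(map[string]string, len(base))
	for k, v := range base {
		state[k] = v
	}
	for _, ev := range events {
		switch ev.Type {
		case Put:
			state[ev.Key] = ev.Value
		case Delete:
			delete(state, ev.Key)
		}
	}
	return state
}

func main() {
	full := map[string]string{"a": "1", "b": "2"}
	delta := []Event{
		{Revision: 3, Type: Put, Key: "a", Value: "10"},
		{Revision: 4, Type: Delete, Key: "b"},
		{Revision: 5, Type: Put, Key: "c", Value: "3"},
	}
	restored := applyEvents(full, delta)
	fmt.Println(len(restored), restored["a"], restored["c"])
}
```

Replaying in revision order matters: applying the same events in a different order could leave a deleted key present or an older value in place.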

## 0.2.3 - 2018-05-22

4 changes: 3 additions & 1 deletion README.md
@@ -83,6 +83,7 @@ brew install git
In case you have to create a new release or a new hotfix, you have to push the resulting Docker image into a Docker registry. Currently, we are using the Google Container Registry (this could change in the future). Please follow the official [installation instructions from Google](https://cloud.google.com/sdk/downloads).

### Installing `Docker` (Optional)

In case you want to build Docker images, you have to install Docker itself. We recommend using [Docker for Mac OS X](https://docs.docker.com/docker-for-mac/) which can be downloaded from [here](https://download.docker.com/mac/stable/Docker.dmg).

### Build
@@ -173,6 +174,7 @@ With sub-command `server` you can start a http server which exposes an endpoint
We use [Dep](https://github.com/golang/dep) to manage golang dependencies. In order to add a new package dependency to the project, you can perform `dep ensure -add <PACKAGE>` or edit the `Gopkg.toml` file and append the package along with the version you want to use as a new `[[constraint]]`.

### Updating dependencies

The `Makefile` contains a rule called `revendor` which performs a `dep ensure -update` and a `dep prune` command. This updates all the dependencies to their latest versions (respecting the constraints specified in the `Gopkg.toml` file). The command also installs the packages which do not already exist in the `vendor` folder but are specified in the `Gopkg.toml` (in case you have added new ones).

```sh
@@ -195,4 +197,4 @@ By default, we try to run test in parallel without computing code coverage. To g

Similarly, we use environment variable `INTEGRATION` to determine whether to execute integration test or not. By default, integration tests are executed. So, to disable integration test locally, you will have to set `INTEGRATION=false`.

[etcd]: https://github.com/coreos/etcd
[etcd]: https://github.com/coreos/etcd
4 changes: 2 additions & 2 deletions cmd/miscellaneous.go
@@ -21,6 +21,6 @@ import (
// initializeSnapstoreFlags adds the snapstore related flags to <cmd>
func initializeSnapstoreFlags(cmd *cobra.Command) {
cmd.Flags().StringVar(&storageProvider, "storage-provider", "", "snapshot storage provider")
cmd.Flags().StringVar(&storageContainer, "store-container", "", "prefix or directory inside container under which snapstore is created")
cmd.Flags().StringVar(&storagePrefix, "store-prefix", "", "container which will be used as snapstore")
cmd.Flags().StringVar(&storageContainer, "store-container", "", "container which will be used as snapstore")
cmd.Flags().StringVar(&storagePrefix, "store-prefix", "", "prefix or directory inside container under which snapstore is created")
}
33 changes: 25 additions & 8 deletions cmd/restore.go
@@ -16,11 +16,10 @@ package cmd

import (
"fmt"

"github.com/coreos/etcd/pkg/types"

"path"
"sort"

"github.com/coreos/etcd/pkg/types"
"github.com/gardener/etcd-backup-restore/pkg/snapshot/restorer"
"github.com/gardener/etcd-backup-restore/pkg/snapstore"
"github.com/sirupsen/logrus"
@@ -59,12 +58,12 @@ func NewRestoreCommand(stopCh <-chan struct{}) *cobra.Command {
if err != nil {
logger.Fatalf("failed to create snapstore from configured storage provider: %v", err)
}
logger.Infoln("Finding latest snapshot...")
snap, err := store.GetLatest()
logger.Infoln("Finding latest set of snapshot to recover from...")
baseSnap, deltaSnapList, err := getLatestFullSnapshotAndDeltaSnapList(store)
if err != nil {
logger.Fatalf("failed to get latest snapshot: %v", err)
}
if snap == nil {
if baseSnap == nil {
logger.Infof("No snapshot found. Will do nothing.")
return
}
@@ -74,14 +73,14 @@ func NewRestoreCommand(stopCh <-chan struct{}) *cobra.Command {
options := &restorer.RestoreOptions{
RestoreDataDir: restoreDataDir,
Name: restoreName,
Snapshot: *snap,
BaseSnapshot: *baseSnap,
DeltaSnapList: deltaSnapList,
ClusterURLs: clusterUrlsMap,
PeerURLs: peerUrls,
ClusterToken: restoreClusterToken,
SkipHashCheck: skipHashCheck,
}

logger.Infof("Restoring from latest snapshot: %s...", path.Join(snap.SnapDir, snap.SnapName))
err = rs.Restore(*options)
if err != nil {
logger.Fatalf("Failed to restore snapshot: %v", err)
@@ -113,3 +112,21 @@ func initialClusterFromName(name string) string {
}
return fmt.Sprintf("%s=http://localhost:2380", n)
}

// getLatestFullSnapshotAndDeltaSnapList resturns the latest snapshot
Review comment (Contributor): 'returns'

func getLatestFullSnapshotAndDeltaSnapList(store snapstore.SnapStore) (*snapstore.Snapshot, snapstore.SnapList, error) {
var deltaSnapList snapstore.SnapList
snapList, err := store.List()
if err != nil {
return nil, nil, err
}

for index := len(snapList); index > 0; index-- {
if snapList[index-1].Kind == snapstore.SnapshotKindFull {
sort.Sort(deltaSnapList)
return snapList[index-1], deltaSnapList, nil
}
deltaSnapList = append(deltaSnapList, snapList[index-1])
}
return nil, deltaSnapList, nil
}
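
The selection logic in `getLatestFullSnapshotAndDeltaSnapList` can be tried in isolation with a sketch like the one below. It assumes, as the loop above appears to, that the list comes back ordered oldest to newest. The `Snapshot` struct, the `KindFull`/`KindDelta` values, and the `latestFullAndDeltas` name are simplified stand-ins made up for this illustration, not the `snapstore` package's actual definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical kind markers; the real snapstore constants may differ.
const (
	KindFull  = "Full"
	KindDelta = "Incr"
)

// Snapshot is a simplified, hypothetical stand-in for snapstore.Snapshot.
type Snapshot struct {
	Kind          string
	StartRevision int64
}

// latestFullAndDeltas walks a list ordered oldest to newest from the back,
// collecting delta snapshots until it reaches the newest full snapshot.
// It returns nil for the base when the list holds no full snapshot.
func latestFullAndDeltas(snaps []*Snapshot) (*Snapshot, []*Snapshot) {
	var deltas []*Snapshot
	for i := len(snaps) - 1; i >= 0; i-- {
		if snaps[i].Kind == KindFull {
			// Deltas were collected newest first; restore needs oldest first.
			sort.Slice(deltas, func(a, b int) bool {
				return deltas[a].StartRevision < deltas[b].StartRevision
			})
			return snaps[i], deltas
		}
		deltas = append(deltas, snaps[i])
	}
	return nil, deltas
}

func main() {
	snaps := []*Snapshot{
		{Kind: KindFull, StartRevision: 0},
		{Kind: KindDelta, StartRevision: 1},
		{Kind: KindFull, StartRevision: 5},
		{Kind: KindDelta, StartRevision: 6},
		{Kind: KindDelta, StartRevision: 9},
	}
	base, deltas := latestFullAndDeltas(snaps)
	fmt.Println("base revision:", base.StartRevision, "deltas to apply:", len(deltas))
}
```

Deltas older than the newest full snapshot are ignored, since that full snapshot already contains their effect.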
31 changes: 10 additions & 21 deletions cmd/server.go
@@ -81,12 +81,14 @@ func NewServerCommand(stopCh <-chan struct{}) *cobra.Command {
logger.Info("Starting the http server...")
go handler.Start()
defer handler.Stop()

if snapstoreConfig == nil {
logger.Warnf("No snapstore storage provider configured. Will not start backup schedule.")
handler.Status = http.StatusOK
<-stopCh
return
}

for {
ss, err := snapstore.GetSnapstore(snapstoreConfig)
if err != nil {
@@ -106,7 +108,9 @@ func NewServerCommand(stopCh <-chan struct{}) *cobra.Command {
ss,
logger,
maxBackups,
deltaSnapshotIntervalSeconds,
time.Duration(etcdConnectionTimeout),
time.Duration(garbageCollectionPeriodSeconds),
tlsConfig)
if err != nil {
logger.Fatalf("Failed to create snapshotter from configured storage provider: %v", err)
@@ -125,37 +129,23 @@ func NewServerCommand(stopCh <-chan struct{}) *cobra.Command {
handler.Status = http.StatusServiceUnavailable
continue
}

err = ssr.TakeFullSnapshot()
if err != nil {
if etcdErr, ok := err.(*errors.EtcdError); ok == true {
logger.Errorf("Snapshotter failed with etcd error: %v", etcdErr)

} else {
logger.Fatalf("Snapshotter failed with error: %v", err)
}
handler.Status = http.StatusServiceUnavailable
continue
} else {
handler.Status = http.StatusOK
}

err = ssr.Run(stopCh)
if err != nil {
gcStopCh := make(chan bool)
go ssr.GarbageCollector(gcStopCh)
if err := ssr.Run(stopCh); err != nil {
handler.Status = http.StatusServiceUnavailable
if etcdErr, ok := err.(*errors.EtcdError); ok == true {
logger.Errorf("Snapshotter failed with etcd error: %v", etcdErr)

} else {
logger.Fatalf("Snapshotter failed with error: %v", err)
}
} else {
handler.Status = http.StatusOK
}

gcStopCh <- true
}
},
}

initializeServerFlags(serverCmd)
initializeSnapshotterFlags(serverCmd)
initializeSnapstoreFlags(serverCmd)
@@ -180,8 +170,7 @@ func ProbeEtcd(tlsConfig *snapshotter.TLSConfig) error {

ctx, cancel := context.WithTimeout(context.TODO(), time.Duration(etcdConnectionTimeout)*time.Second)
defer cancel()
_, err = client.Get(ctx, "foo")
if err != nil {
if _, err := client.Get(ctx, "foo"); err != nil {
logger.Errorf("Failed to connect to client: %v", err)
return err
}
12 changes: 9 additions & 3 deletions cmd/snapshot.go
@@ -40,6 +40,7 @@ storing snapshots on various cloud storage providers as well as local disk locat
if err != nil {
logger.Fatalf("Failed to create snapstore from configured storage provider: %v", err)
}

tlsConfig := snapshotter.NewTLSConfig(
certFile,
keyFile,
@@ -52,17 +53,20 @@ storing snapshots on various cloud storage providers as well as local disk locat
ss,
logger,
maxBackups,
deltaSnapshotIntervalSeconds,
time.Duration(etcdConnectionTimeout),
time.Duration(garbageCollectionPeriodSeconds),
tlsConfig)
if err != nil {
logger.Fatalf("Failed to create snapshotter: %v", err)
}
err = ssr.Run(stopCh)
if err != nil {
gcStopCh := make(chan bool)
go ssr.GarbageCollector(gcStopCh)
if err := ssr.Run(stopCh); err != nil {
logger.Fatalf("Snapshotter failed with error: %v", err)
}
gcStopCh <- true
logger.Info("Shutting down...")
//TODO: do cleanup work here.
return
},
}
@@ -75,8 +79,10 @@ storing snapshots on various cloud storage providers as well as local disk locat
func initializeSnapshotterFlags(cmd *cobra.Command) {
cmd.Flags().StringSliceVarP(&etcdEndpoints, "endpoints", "e", []string{"127.0.0.1:2379"}, "comma separated list of etcd endpoints")
cmd.Flags().StringVarP(&schedule, "schedule", "s", "* */1 * * *", "schedule for snapshots")
cmd.Flags().IntVarP(&deltaSnapshotIntervalSeconds, "delta-snapshot-period-seconds", "i", 10, "Period in seconds after which delta snapshot will be persisted")
cmd.Flags().IntVarP(&maxBackups, "max-backups", "m", 7, "maximum number of previous backups to keep")
cmd.Flags().IntVar(&etcdConnectionTimeout, "etcd-connection-timeout", 30, "etcd client connection timeout")
cmd.Flags().IntVar(&garbageCollectionPeriodSeconds, "garbage-collection-period-seconds", 60, "Period in seconds for garbage collecting old backups")
cmd.Flags().BoolVar(&insecureTransport, "insecure-transport", true, "disable transport security for client connections")
cmd.Flags().BoolVar(&insecureSkipVerify, "insecure-skip-tls-verify", false, "skip server certificate verification")
cmd.Flags().StringVar(&certFile, "cert", "", "identify secure client using this TLS certificate file")
20 changes: 11 additions & 9 deletions cmd/types.go
@@ -29,15 +29,17 @@ var (
logger = logrus.New()

//snapshotter flags
schedule string
etcdEndpoints []string
maxBackups int
etcdConnectionTimeout int
insecureTransport bool
insecureSkipVerify bool
certFile string
keyFile string
caFile string
schedule string
etcdEndpoints []string
deltaSnapshotIntervalSeconds int
maxBackups int
etcdConnectionTimeout int
garbageCollectionPeriodSeconds int
insecureTransport bool
insecureSkipVerify bool
certFile string
keyFile string
caFile string

//server flags
port int
4 changes: 3 additions & 1 deletion example/etcd-statefulset-aws.yaml
@@ -92,11 +92,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=S3
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:0.2.3
imagePullPolicy: Always
ports:
4 changes: 3 additions & 1 deletion example/etcd-statefulset-azure.yaml
@@ -92,11 +92,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=ABS
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:0.2.3
imagePullPolicy: Always
ports:
4 changes: 3 additions & 1 deletion example/etcd-statefulset-gcp.yaml
@@ -92,11 +92,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=GCS
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:0.2.3
imagePullPolicy: Always
ports:
4 changes: 3 additions & 1 deletion example/etcd-statefulset-local.yaml
@@ -92,11 +92,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:0.2.3
imagePullPolicy: Always
ports:
4 changes: 3 additions & 1 deletion example/etcd-statefulset-openstack.yaml
@@ -92,11 +92,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=Swift
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:0.2.3
imagePullPolicy: Always
ports:
4 changes: 3 additions & 1 deletion hack/templates/etcd-statefulset.yaml.tpl
@@ -113,11 +113,13 @@ spec:
command:
- etcdbrctl
- server
- --schedule=*/1 * * * *
- --schedule=*/5 * * * *
- --max-backups=5
- --data-dir=/var/etcd/data
- --insecure-transport=true
- --storage-provider=${provider}
- --delta-snapshot-period-seconds=10
- --garbage-collection-period-seconds=60
image: eu.gcr.io/gardener-project/gardener/etcdbrctl:${imageTag}
imagePullPolicy: Always
ports: