initial checkin
Dave North committed Nov 26, 2015
1 parent dff056e commit 9159753
Showing 19 changed files with 1,191 additions and 2 deletions.
40 changes: 38 additions & 2 deletions README.md
@@ -1,2 +1,38 @@
# dynamodb-emr-exporter
Uses EMR clusters to export dynamoDB tables to S3 and generates import steps
# DynamoDB EMR Exporter
Uses EMR clusters to export and import DynamoDB tables to/from S3. This uses the same routines as Data Pipeline, but it runs everything through a single cluster for all tables rather than a cluster per table.

## Export Usage

* Clone this repo to a folder called /usr/local/dynamodb-emr
* Install python
``` apt-get install python ```
* Install the python dependencies
``` pip install -r requirements.txt ```
* Configure at least one [boto profile](http://boto.cloudhackers.com/en/latest/boto_config_tut.html)
* Create a new IAM role called dynamodb_emr_backup_restore using the IAM policy contained in _**dynamodb_emr_backup_restore.IAMPOLICY.json**_

>The role name can be changed by editing common-json/ec2-attributes.json
* Configure the size of your EMR cluster

>Edit the *common-json/instance-groups.json* file to set the number of masters and workers (typically, a single master and a single worker are fine)
* Run the invokeEMR.sh script as follows

```
./invokeEMR.sh app_name emr_cluster_name boto_profile_name table_filter read_throughput_percentage json_output_directory S3_location
```

Where

* _**app_name**_ is a 'friendly name' for the DynamoDB table set you wish to export
* _**emr_cluster_name**_ is a name to give to the EMR cluster
* _**boto_profile_name**_ is a valid boto profile name containing your keys and a region
* _**table_filter**_ is a filter for which table names to export (e.g. MYAPP_PROD will export ALL tables starting with MYAPP_PROD)
* _**read_throughput_percentage**_ is the fraction of provisioned read throughput to use (e.g. 0.45 will use 45% of the provisioned read throughput)
* _**json_output_directory**_ is a folder to output the json files for configuring the EMR cluster for export
* _**S3_location**_ is a base S3 location to store the exports and all logs (e.g. s3://mybucket/myfolder)
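
As a rough illustration of the two less obvious parameters above, here is a minimal Python sketch of how a prefix-style `table_filter` and a fractional `read_throughput_percentage` could be interpreted. The function names and sample table names are hypothetical and are not taken from the scripts in this repository; the sketch only mirrors the behaviour described in the list above.

```python
def select_tables(table_names, table_filter):
    """Return the table names that start with table_filter (prefix match)."""
    return [name for name in table_names if name.startswith(table_filter)]


def read_units_for_export(provisioned_read_units, read_throughput_percentage):
    """Read capacity the export may consume, given a fraction such as 0.45."""
    if read_throughput_percentage <= 0:
        raise ValueError("read_throughput_percentage must be positive")
    return provisioned_read_units * read_throughput_percentage


# Hypothetical table names, for illustration only.
tables = ["MYAPP_PROD_users", "MYAPP_PROD_orders", "MYAPP_DEV_users"]
selected = select_tables(tables, "MYAPP_PROD")
budget = read_units_for_export(200, 0.45)
print(selected)
print(budget)
```

With a filter of `MYAPP_PROD`, only the two production tables are selected, and a table provisioned with 200 read units would be read at roughly 90 units.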

## Import Usage

## Workings
35 changes: 35 additions & 0 deletions common-json/.svn/all-wcprops
@@ -0,0 +1,35 @@
K 25
svn:wc:ra_dav:version-url
V 74
/signiant/product/!svn/ver/67433/DevOps/DynamoDB/Backup/v3/EMR/common-json
END
bootstrap-actions-import.json
K 25
svn:wc:ra_dav:version-url
V 104
/signiant/product/!svn/ver/73465/DevOps/DynamoDB/Backup/v3/EMR/common-json/bootstrap-actions-import.json
END
applications.json
K 25
svn:wc:ra_dav:version-url
V 92
/signiant/product/!svn/ver/65580/DevOps/DynamoDB/Backup/v3/EMR/common-json/applications.json
END
bootstrap-actions-export.json
K 25
svn:wc:ra_dav:version-url
V 104
/signiant/product/!svn/ver/73465/DevOps/DynamoDB/Backup/v3/EMR/common-json/bootstrap-actions-export.json
END
ec2-attributes.json
K 25
svn:wc:ra_dav:version-url
V 94
/signiant/product/!svn/ver/65866/DevOps/DynamoDB/Backup/v3/EMR/common-json/ec2-attributes.json
END
instance-groups.json
K 25
svn:wc:ra_dav:version-url
V 95
/signiant/product/!svn/ver/67433/DevOps/DynamoDB/Backup/v3/EMR/common-json/instance-groups.json
END
198 changes: 198 additions & 0 deletions common-json/.svn/entries
@@ -0,0 +1,198 @@
10

dir
73464
http://svn/signiant/product/DevOps/DynamoDB/Backup/v3/EMR/common-json
http://svn/signiant/product



2015-02-07T00:32:57.729202Z
67433
dnorth


svn:special svn:externals svn:needs-lock











cce678a6-1c66-44e5-9c03-45eb55cd00e1

applications.json
file




2015-08-10T20:04:42.707000Z
081e7be18fbbefb118908d8fa67f89e4
2014-12-11T16:46:26.217213Z
65580
dnorth





















64

bootstrap-actions-export.json
file
73465



2015-11-25T22:19:24.711000Z
0c563b4f6cdc377ef3290946b72b3a6b
2015-11-25T22:22:18.701041Z
73465
dnorth





















1122

bootstrap-actions-import.json
file
73465



2015-11-25T22:19:24.722000Z
7172500b80c16188e3c9d41c6bdc9891
2015-11-25T22:22:18.701041Z
73465
dnorth





















492

ec2-attributes.json
file




2015-08-10T20:04:42.674000Z
d35781b33a58d2bf7a8a1c8bff70370c
2014-12-19T16:31:24.041138Z
65866
dnorth





















56

instance-groups.json
file




2015-08-10T20:04:42.691000Z
34973e9501798511a8737b5e5879499f
2015-02-07T00:32:57.729202Z
67433
dnorth





















258

8 changes: 8 additions & 0 deletions common-json/.svn/text-base/applications.json.svn-base
@@ -0,0 +1,8 @@
[
  {
    "Name": "HIVE"
  },
  {
    "Name": "PIG"
  }
]
34 changes: 34 additions & 0 deletions common-json/.svn/text-base/bootstrap-actions-export.json.svn-base
@@ -0,0 +1,34 @@
[
  {
    "Args":[
      "-e",
      "fs.s3.enableServerSideEncryption=true"
    ],
    "Name":"bootstrap-action.enableServerSideEncryption",
    "Path":"s3://us-east-1.elasticmapreduce/bootstrap-actions/configure-hadoop"
  },
  {
    "Args":[
      "--yarn-key-value",
      "yarn.nodemanager.resource.memory-mb=11520",
      "--yarn-key-value",
      "yarn.scheduler.maximum-allocation-mb=11520",
      "--yarn-key-value",
      "yarn.scheduler.minimum-allocation-mb=1440",
      "--yarn-key-value",
      "yarn.app.mapreduce.am.resource.mb=2880",
      "--mapred-key-value",
      "mapreduce.map.memory.mb=5760",
      "--mapred-key-value",
      "mapreduce.map.java.opts=-Xmx4608M",
      "--mapred-key-value",
      "mapreduce.reduce.memory.mb=2880",
      "--mapred-key-value",
      "mapreduce.reduce.java.opts=-Xmx2304m",
      "--mapred-key-value",
      "mapreduce.map.speculative=false"
    ],
    "Name":"bootstrap-action.setMemory",
    "Path":"s3://us-east-1.elasticmapreduce/bootstrap-actions/configure-hadoop"
  }
]
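
The memory values in the `setMemory` bootstrap action above appear to follow a familiar pattern: each JVM heap (`-Xmx`) is 80% of its container size (4608/5760 for map, 2304/2880 for reduce). The following Python sketch is not part of this commit; it copies a subset of the `Args` list, pairs each flag with the `key=value` string that follows it, and checks that ratio. The helper names are hypothetical.

```python
import re

# Subset of the "Args" list from bootstrap-actions-export.json.
ARGS = [
    "--yarn-key-value", "yarn.nodemanager.resource.memory-mb=11520",
    "--yarn-key-value", "yarn.scheduler.minimum-allocation-mb=1440",
    "--mapred-key-value", "mapreduce.map.memory.mb=5760",
    "--mapred-key-value", "mapreduce.map.java.opts=-Xmx4608M",
    "--mapred-key-value", "mapreduce.reduce.memory.mb=2880",
    "--mapred-key-value", "mapreduce.reduce.java.opts=-Xmx2304m",
]


def parse_key_values(args):
    """Map each setting name to its value, ignoring the preceding flag."""
    settings = {}
    for _flag, pair in zip(args[0::2], args[1::2]):
        key, _, value = pair.partition("=")
        settings[key] = value
    return settings


def heap_mb(java_opts):
    """Extract the -Xmx heap size in megabytes from a java.opts string."""
    match = re.search(r"-Xmx(\d+)[mM]", java_opts)
    if match is None:
        raise ValueError("no -Xmx value in megabytes found: " + java_opts)
    return int(match.group(1))


settings = parse_key_values(ARGS)
map_ratio = heap_mb(settings["mapreduce.map.java.opts"]) / int(settings["mapreduce.map.memory.mb"])
reduce_ratio = heap_mb(settings["mapreduce.reduce.java.opts"]) / int(settings["mapreduce.reduce.memory.mb"])
print(map_ratio, reduce_ratio)
```

A check like this can be handy when the instance type changes and the container sizes are scaled, since the heap values need to be scaled with them.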
18 changes: 18 additions & 0 deletions common-json/.svn/text-base/bootstrap-actions-import.json.svn-base
@@ -0,0 +1,18 @@
[
  {
    "Args":[
      "-e",
      "fs.s3.enableServerSideEncryption=true"
    ],
    "Name":"bootstrap-action.enableServerSideEncryption",
    "Path":"s3://us-east-1.elasticmapreduce/bootstrap-actions/configure-hadoop"
  },
  {
    "Args":[
      "--mapred-key-value",
      "mapreduce.map.speculative=false"
    ],
    "Name":"bootstrap-action.configCluster",
    "Path":"s3://us-east-1.elasticmapreduce/bootstrap-actions/configure-hadoop"
  }
]
3 changes: 3 additions & 0 deletions common-json/.svn/text-base/ec2-attributes.json.svn-base
@@ -0,0 +1,3 @@
{
  "InstanceProfile": "dynamodb_emr_backup_restore"
}
14 changes: 14 additions & 0 deletions common-json/.svn/text-base/instance-groups.json.svn-base
@@ -0,0 +1,14 @@
[
  {
    "InstanceCount": 1,
    "Name": "Master",
    "InstanceGroupType": "MASTER",
    "InstanceType": "m1.xlarge"
  },
  {
    "InstanceCount": 1,
    "Name": "Workers",
    "InstanceGroupType": "CORE",
    "InstanceType": "m1.xlarge"
  }
]
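
The README asks you to edit the instance groups file to size the cluster. As a hedged illustration (not part of this commit), the Python sketch below loads a copy of the JSON above and returns a version with the `CORE` group's `InstanceCount` changed. The `set_worker_count` helper is hypothetical; editing the file by hand works just as well.

```python
import json

# Copy of the instance groups definition shown above.
INSTANCE_GROUPS = """
[
  {"InstanceCount": 1, "Name": "Master",
   "InstanceGroupType": "MASTER", "InstanceType": "m1.xlarge"},
  {"InstanceCount": 1, "Name": "Workers",
   "InstanceGroupType": "CORE", "InstanceType": "m1.xlarge"}
]
"""


def set_worker_count(groups, count):
    """Return a copy of groups with the CORE group's InstanceCount set to count."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [
        dict(group, InstanceCount=count)
        if group["InstanceGroupType"] == "CORE"
        else dict(group)
        for group in groups
    ]


groups = json.loads(INSTANCE_GROUPS)
resized = set_worker_count(groups, 3)
print(json.dumps(resized, indent=2))
```

The function leaves the input list untouched, so the original single-worker layout stays available for comparison.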
8 changes: 8 additions & 0 deletions common-json/applications.json
@@ -0,0 +1,8 @@
[
  {
    "Name": "HIVE"
  },
  {
    "Name": "PIG"
  }
]