-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V0.4.0 #129
Conversation
Are you ready for a review yet @aldemirenes ? |
I am not finished with the all the issues in the milestone yet, I will let you know when I am done.@alexanderdean |
Cool - I am on holiday next week, please assign @jbeemster as a reviewer when you are ready! |
@jbeemster v0.4.0 is mostly complete, only thing left to do removing binaries of the Stream Enrich and Stream Collector from the repo. This will be done after NSQ added versions of the Stream Enrich and Stream Collector is published. |
@aldemirenes on the ES issue you should be able to configure the amount of Java Heap to give it: Its likely that the newer versions do need a lot more ram... didn't realise it was this much however! |
If the new ES is that memory hungry, is this going to restrict our options to deploy Snowplow Mini (i.e. by increasing the cost of each Snowplow Mini instance)? Should we stick with the older version? |
@alexanderdean I think that's can be option too but I think it is best to make more tests with updated version before decide. I will continue to test, I will share the results with you. |
Thanks, would be good to understand the TCO impact of this ES change from @jbeemster in due course |
If we need 8 or more GB of RAM for Snowplow Mini the TCO cost will double or quadruple. So if we cannot get it running reliably on a |
@@ -27,32 +26,17 @@ echo "Event Counts:" | |||
echo " - Good: ${good_count}" | |||
echo " - Bad: ${bad_count}" | |||
|
|||
stream_enrich_pid_file=/var/run/snowplow_stream_enrich_0.10.0.pid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove all the pid testing to check the service restarts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are moved to control plane test suite under the control plane folder.
cluster { | ||
name: "elasticsearch" | ||
index: "bad" | ||
clusterType: "bad" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this get changed from type
to clusterType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related with es loader. Since we use different config library in es loader, "type" can not be used as an key because of it is one of the keyword in the Scala lang.
NSQ { | ||
host: 127.0.0.1 #ip for NSQD | ||
port: 4150 #port for NSQD | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace
) | ||
|
||
//global variable for script's path | ||
var scriptsPath string | ||
var enrichmentsPath string | ||
var configPath string | ||
var spminiVersionFile string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camelCase would be: spMiniVersionFile
"os/exec" | ||
"os" | ||
"flag" | ||
"io/ioutil" | ||
) | ||
|
||
//global variable for script's path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//global variables for script paths
http.HandleFunc("/add-domain-name", addDomainName) | ||
http.HandleFunc("/check-url", checkUrl) | ||
http.HandleFunc("/check-host-domain-name", checkHostDomainName) | ||
http.HandleFunc("/spmini-version", getSpminiVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/spmini-version to /version
if req.Method == "POST" { | ||
req.ParseForm() | ||
igluServerSuperUUID := req.Form["iglu_server_super_uuid"][0] | ||
shellScriptCommand := []string{scriptsPath + "/" + "add_iglu_server_super_uuid.sh", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move all script paths like add_iglu_server_super_uuid.sh
to constants at the top of this file.
http.Error(resp, err.Error(), 400) | ||
return | ||
} | ||
//restart SP services |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This restart sp services block has been repeated a few times - lets move it to a private function so you can re-use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better with this way because there is a function for calling restart services scripts and what I did in the these "restart SP services" blocks is that calling restartSPServices function and handling errors. Creating another function just for error handling block will be a little bit verbose I think.
sudo /etc/init.d/snowplow_mini_control_plane_api restart | ||
|
||
exit 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excess whitespace
sink_bad_pid_file=/var/run/snowplow_elasticsearch_loader_bad.pid | ||
sink_good_pid_file=/var/run/snowplow_elasticsearch_loader_good.pid | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excess whitespace
I created new branch for es and kibana are not upgraded. I created AMI for it in the eu-central-1 region of the Snowplow Engineering Sandbox. There is some problem while adding kibana index automatically in that version, therefore automatically created indexes need to be removed in the beginning, indexes need to be added manually after removing them.This issue is not very big problem, it should be solved in a short time but I could not look at it yet. Other than that "es kibana not upgraded" version working just fine. "es kibana upgraded" version is still giving memory related errors in the |
I think let's strip out the ES upgrade altogether - @jbeemster ? |
Hey @aldemirenes @alexanderdean - yep lets push that back for the moment - if it cannot run reliably then we do not want it! Do keep that commit in a branch somewhere though so we can sample it as we will likely need to revisit this at some point in the future. |
Packerfile.json
Outdated
"ssh_username": "ubuntu", | ||
"ami_name": "snowplow-mini-{{user `version`}}-{{ timestamp }}-hvm-ebs-amd64", | ||
"ami_name": "snowplow-mini-{{user `version`}}-with-es-1.7-{{ timestamp }}-hvm-ebs-amd64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove the with-es-1.7
as we are not making that distinction yet.
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline
@@ -0,0 +1 @@ | |||
snowplow-mini-control-plane-api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline
sudo sed -i 's/\(.*"uri":\)\(.*\)/\1 "'$iglu_server_uri'",/' $template_iglu_server | ||
sudo sed -i 's/\(.*"apikey":\)\(.*\)/\1 "'$iglu_server_api_key'"/' $template_iglu_server | ||
##secondly write content in the template_iglu_server to iglu_resolver.json | ||
sudo sed -i -E '/.*"repositories":.*/r '$template_iglu_server'' $iglu_resolver_config_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this operation not be done directly in GoLang? What is the rationale for using a script?
delete_all_apikeys="DELETE FROM apikeys" | ||
iglu_server_setup="INSERT INTO apikeys (uid, vendor_prefix, permission, createdat) VALUES ('${iglu_server_super_uid}','*','super',current_timestamp);" | ||
psql --host=localhost --port=5432 --username=snowplow --dbname=iglu -c "${delete_all_apikeys}" | ||
psql --host=localhost --port=5432 --username=snowplow --dbname=iglu -c "${iglu_server_setup}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this operation not be done directly in GoLang? What is the rationale for using a script?
echo "Already started" | ||
else | ||
echo "Starting $name" | ||
cd "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as caddy_init
@@ -34,9 +34,9 @@ case "$1" in | |||
echo "Starting $name" | |||
cd "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as caddy_init
@@ -34,13 +34,13 @@ case "$1" in | |||
echo "Starting $name" | |||
cd "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as caddy_init
@@ -32,7 +32,7 @@ case "$1" in | |||
echo "Starting $name" | |||
cd "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as caddy_init
@@ -34,13 +34,13 @@ case "$1" in | |||
echo "Starting $name" | |||
cd "$dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as caddy_init
95ad1c0
to
8cafe03
Compare
Any chance we could include the latest version of the scala stream collector in this release? It'd be good to get all the fixes and updates in so there's close parity between mini and R93. |
0.4.0 will embed all the R95 Zeugma applications - R95 being needed because it introduces the NSQ support. |
Makes sense - that's great news! |
# url: "http://dl.bintray.com/snowplow/snowplow-generic/{{kinesis_package}}" | ||
# dest: "{{staging_dir}}" | ||
# when: check_kinesis_packages_result.stat.exists == False | ||
# register: kinesis_packages_downloaded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be rmd, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this will be removed, I just comment out them to wait NSQ releases of the Stream Collector and Stream Enrich.
;; | ||
esac | ||
|
||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nsqd_init and nsqlookupd_init are almost identical except for the env variables, can't we refactor the common pieces out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep you are right however since they are init scripts, I think it will be best to leave them without giving them arguments. If there is cleaner way in your mind without giving any arguments to script or without initializing any env variables beforehand, feel free to suggest!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refactor the common parts in another script and source it in your nsqd_init and nsqlookupd_init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use "source" but there was an unexpected behavior while trying to use the init scripts with the "update rc.d" command after using the "source" command. However, I will try to find a solution until the release.
@@ -0,0 +1,43 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is go fmt run on those files? they seem to be space-indented instead of tab-indented?
err := restartSPServices() | ||
if err != nil { | ||
http.Error(resp, err.Error(), 500) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for return
} | ||
|
||
func restartServices(resp http.ResponseWriter, req *http.Request) { | ||
if (req.Method == "PUT") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for parens
var esLoaderGoodInit = "snowplow_elasticsearch_loader_good" | ||
var esLoaderBadInit = "snowplow_elasticsearch_loader_bad" | ||
|
||
func restartService(service string) (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for parens around error, below as well
var streamCollectorInit = "snowplow_stream_collector" | ||
var streamEnrichInit = "snowplow_stream_enrich" | ||
var esLoaderGoodInit = "snowplow_elasticsearch_loader_good" | ||
var esLoaderBadInit = "snowplow_elasticsearch_loader_bad" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are those used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nop, I forgot to remove them after moved them to config. Will remove them.
|
||
func restartSPServices() (error) { | ||
err := restartService("streamCollector") | ||
if err != nil {return err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please reformat as:
if err != nil {
return err
}
// returns public IP if the host machine is EC2 instance | ||
func getPublicEC2IP() (string, error) { | ||
// URL of the instance meta service of AWS EC2 | ||
var urlForCheckingPubIP = "http://169.254.169.254/latest/meta-data/public-ipv4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be a config?
|
||
externalIgluServer := ExternalIgluServer { | ||
ConfigPath: tmpfn, | ||
LineToInsert: 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there a cleaner way to do this? This feels hacky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had another look at the control plane api
language: go | ||
|
||
go: | ||
- 1.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move to go 1.9?
Packerfile.json
Outdated
"ssh_username": "ubuntu", | ||
"ami_name": "snowplow-mini-{{user `version`}}-{{ timestamp }}-hvm-ebs-amd64", | ||
"ami_groups": [ "all" ], | ||
"ami_regions": "us-east-2,us-west-1,us-west-2,ca-central-1,eu-west-1,eu-central-1,eu-west-2,ap-southeast-1,ap-southeast-2,ap-northeast-2,ap-northeast-1,ap-south-1,sa-east-1", | ||
"ami_regions": "eu-west-1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this cut down to only eu-west-1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in the rc commit, I will change back it after testing is done.
README.md
Outdated
* Elasticsearch Sink Bad | ||
- Reads events in from the `bad-1-pipe` | ||
- Reads events in from the `bad events nsq topic` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are those between back ticks since they don't refer to the nsq topic name?
@@ -10,10 +10,10 @@ | |||
### END INIT INFO | |||
|
|||
dir="/home/ubuntu/snowplow/bin/" | |||
cmd="./snowplow-elasticsearch-sink-0.8.0-2x --config /home/ubuntu/snowplow/configs/snowplow-elasticsearch-sink-bad.hocon" | |||
cmd="java -jar $dir/snowplow-elasticsearch-loader-http-0.10.0-rc1.jar --config /home/ubuntu/snowplow/configs/snowplow-es-loader-bad.hocon" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's 0.10.1 now
@@ -10,13 +10,13 @@ | |||
### END INIT INFO | |||
|
|||
dir="/home/ubuntu/snowplow/bin/" | |||
cmd="./snowplow-stream-collector-0.9.0 --config /home/ubuntu/snowplow/configs/snowplow-stream-collector.hocon" | |||
cmd="java -jar $dir/snowplow-elasticsearch-loader-http-0.10.0-rc1.jar --config /home/ubuntu/snowplow/configs/snowplow-es-loader-good.hocon" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -10,13 +10,13 @@ | |||
### END INIT INFO | |||
|
|||
dir="/home/ubuntu/snowplow/bin/" | |||
cmd="./snowplow-stream-enrich-0.10.0 --config /home/ubuntu/snowplow/configs/snowplow-stream-enrich.hocon --resolver file:/home/ubuntu/snowplow/configs/iglu-resolver.json --enrichments file:/home/ubuntu/snowplow/configs/enrichments" | |||
cmd="java -jar snowplow-stream-enrich-0.11.0.jar --config /home/ubuntu/snowplow/configs/snowplow-stream-enrich.hocon --resolver file:/home/ubuntu/snowplow/configs/iglu-resolver.json --enrichments file:/home/ubuntu/snowplow/configs/enrichments" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's 0.11.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since nsq version of stream enrich will be used in the v0.4.0, I think there is no need to change stream enrich to 0.11.1 right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might as well use the latest components available
00d4c35
to
e402ca0
Compare
Can you cross-check against the milestone @BenFradet ? It feels like this git history versus the milestone are rather divergent. |
I am still working on v0.4.0, I opened this PR to just show the progress. Also, binaries of stream enrich and stream collector are placed in the repo for now because their new versions with NSQ support are not published yet. After publish them, binaries will be removed from repo.