Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V0.4.0 #129

Merged
merged 18 commits into from
Dec 21, 2017
Merged

V0.4.0 #129

merged 18 commits into from
Dec 21, 2017

Conversation

aldemirenes
Copy link
Contributor

@aldemirenes aldemirenes commented Sep 6, 2017

I am still working on v0.4.0, I opened this PR to just show the progress. Also, binaries of stream enrich and stream collector are placed in the repo for now because their new versions with NSQ support are not published yet. After publish them, binaries will be removed from repo.


@aldemirenes aldemirenes added this to the Version 0.4.0 milestone Sep 6, 2017
@alexanderdean
Copy link
Member

Are you ready for a review yet @aldemirenes ?

@aldemirenes
Copy link
Contributor Author

I am not finished with the all the issues in the milestone yet, I will let you know when I am done.@alexanderdean

@alexanderdean
Copy link
Member

Cool - I am on holiday next week, please assign @jbeemster as a reviewer when you are ready!

@aldemirenes
Copy link
Contributor Author

@jbeemster v0.4.0 is mostly complete, only thing left to do removing binaries of the Stream Enrich and Stream Collector from the repo. This will be done after NSQ added versions of the Stream Enrich and Stream Collector is published.
Also, there are two problematic issues right know.First one is starting Kibana takes too much time. It changes 2 to 10 min. This related with server.basePath setting of the Kibana, more information for the reason of the problem can be found in here. However, since kibana will be started only in reboot, I think this problem can be tolerated.
Second issue is Elasticsearch 5.x is very memory hungry. It causes some problems. For example, it is not starting in the 'vagrant up' and this problem requires starting "configure ansible role" manually after vagrant machine up and running. But this is problem does not occur in the provisioning of the AMI with Packer. Another problem with Elasticsearch 5.x is that when EC2 instance has less than 8 gb ram, it is not starting in the first time after creation of the Snowplow Mini instance from the AMI. And this problem requires rebooting the machine or SSHing to the machine and starting ES manually. But this problem does no occur when EC2 instance has 8gb or more ram. It seems that this problem is related with memory but I could not find the exact reason yet. I will continue investage and try to find a solution.

@jbeemster
Copy link
Member

@aldemirenes on the ES issue you should be able to configure the amount of Java Heap to give it:

https://stackoverflow.com/questions/41120925/ubuntu-16-04-install-elasticsearch-5-x-failed-to-start/41354969

Its likely that the newer versions do need a lot more ram... didn't realise it was this much however!

@alexanderdean
Copy link
Member

If the new ES is that memory hungry, is this going to restrict our options to deploy Snowplow Mini (i.e. by increasing the cost of each Snowplow Mini instance)? Should we stick with the older version?

@aldemirenes
Copy link
Contributor Author

@alexanderdean I think that's can be option too but I think it is best to make more tests with updated version before decide. I will continue to test, I will share the results with you.

@alexanderdean
Copy link
Member

Thanks, would be good to understand the TCO impact of this ES change from @jbeemster in due course

@jbeemster
Copy link
Member

If we need 8 or more GB of RAM for Snowplow Mini the TCO cost will double or quadruple. So if we cannot get it running reliably on a t2.medium I think we need to keep it using the old version of Elasticsearch.

@@ -27,32 +26,17 @@ echo "Event Counts:"
echo " - Good: ${good_count}"
echo " - Bad: ${bad_count}"

stream_enrich_pid_file=/var/run/snowplow_stream_enrich_0.10.0.pid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove all the pid testing to check the service restarts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are moved to control plane test suite under the control plane folder.

cluster {
name: "elasticsearch"
index: "bad"
clusterType: "bad"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this get changed from type to clusterType ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related with es loader. Since we use different config library in es loader, "type" can not be used as an key because of it is one of the keyword in the Scala lang.

NSQ {
host: 127.0.0.1 #ip for NSQD
port: 4150 #port for NSQD

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

)

//global variable for script's path
var scriptsPath string
var enrichmentsPath string
var configPath string
var spminiVersionFile string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camelCase would be: spMiniVersionFile

"os/exec"
"os"
"flag"
"io/ioutil"
)

//global variable for script's path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//global variables for script paths

http.HandleFunc("/add-domain-name", addDomainName)
http.HandleFunc("/check-url", checkUrl)
http.HandleFunc("/check-host-domain-name", checkHostDomainName)
http.HandleFunc("/spmini-version", getSpminiVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/spmini-version to /version

if req.Method == "POST" {
req.ParseForm()
igluServerSuperUUID := req.Form["iglu_server_super_uuid"][0]
shellScriptCommand := []string{scriptsPath + "/" + "add_iglu_server_super_uuid.sh",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move all script paths like add_iglu_server_super_uuid.sh to constants at the top of this file.

http.Error(resp, err.Error(), 400)
return
}
//restart SP services
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This restart sp services block has been repeated a few times - lets move it to a private function so you can re-use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better with this way because there is a function for calling restart services scripts and what I did in the these "restart SP services" blocks is that calling restartSPServices function and handling errors. Creating another function just for error handling block will be a little bit verbose I think.

sudo /etc/init.d/snowplow_mini_control_plane_api restart

exit 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excess whitespace

sink_bad_pid_file=/var/run/snowplow_elasticsearch_loader_bad.pid
sink_good_pid_file=/var/run/snowplow_elasticsearch_loader_good.pid


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excess whitespace

@aldemirenes
Copy link
Contributor Author

aldemirenes commented Sep 21, 2017

I created new branch for es and kibana are not upgraded. I created AMI for it in the eu-central-1 region of the Snowplow Engineering Sandbox. There is some problem while adding kibana index automatically in that version, therefore automatically created indexes need to be removed in the beginning, indexes need to be added manually after removing them.This issue is not very big problem, it should be solved in a short time but I could not look at it yet. Other than that "es kibana not upgraded" version working just fine. "es kibana upgraded" version is still giving memory related errors in the t2.medium

@alexanderdean
Copy link
Member

I think let's strip out the ES upgrade altogether - @jbeemster ?

@jbeemster
Copy link
Member

Hey @aldemirenes @alexanderdean - yep lets push that back for the moment - if it cannot run reliably then we do not want it!

Do keep that commit in a branch somewhere though so we can sample it as we will likely need to revisit this at some point in the future.

Packerfile.json Outdated
"ssh_username": "ubuntu",
"ami_name": "snowplow-mini-{{user `version`}}-{{ timestamp }}-hvm-ebs-amd64",
"ami_name": "snowplow-mini-{{user `version`}}-with-es-1.7-{{ timestamp }}-hvm-ebs-amd64",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove the with-es-1.7 as we are not making that distinction yet.

}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline

@@ -0,0 +1 @@
snowplow-mini-control-plane-api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline

sudo sed -i 's/\(.*"uri":\)\(.*\)/\1 "'$iglu_server_uri'",/' $template_iglu_server
sudo sed -i 's/\(.*"apikey":\)\(.*\)/\1 "'$iglu_server_api_key'"/' $template_iglu_server
##secondly write content in the template_iglu_server to iglu_resolver.json
sudo sed -i -E '/.*"repositories":.*/r '$template_iglu_server'' $iglu_resolver_config_dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this operation not be done directly in GoLang? What is the rationale for using a script?

delete_all_apikeys="DELETE FROM apikeys"
iglu_server_setup="INSERT INTO apikeys (uid, vendor_prefix, permission, createdat) VALUES ('${iglu_server_super_uid}','*','super',current_timestamp);"
psql --host=localhost --port=5432 --username=snowplow --dbname=iglu -c "${delete_all_apikeys}"
psql --host=localhost --port=5432 --username=snowplow --dbname=iglu -c "${iglu_server_setup}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this operation not be done directly in GoLang? What is the rationale for using a script?

echo "Already started"
else
echo "Starting $name"
cd "$dir"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as caddy_init

@@ -34,9 +34,9 @@ case "$1" in
echo "Starting $name"
cd "$dir"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as caddy_init

@@ -34,13 +34,13 @@ case "$1" in
echo "Starting $name"
cd "$dir"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as caddy_init

@@ -32,7 +32,7 @@ case "$1" in
echo "Starting $name"
cd "$dir"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as caddy_init

@@ -34,13 +34,13 @@ case "$1" in
echo "Starting $name"
cd "$dir"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as caddy_init

@miike
Copy link
Contributor

miike commented Oct 4, 2017

Any chance we could include the latest version of the scala stream collector in this release? It'd be good to get all the fixes and updates in so there's close parity between mini and R93.

@alexanderdean
Copy link
Member

Any chance we could include the latest version of the scala stream collector in this release?

0.4.0 will embed all the R95 Zeugma applications - R95 being needed because it introduces the NSQ support.

@miike
Copy link
Contributor

miike commented Oct 4, 2017

Makes sense - that's great news!

# url: "http://dl.bintray.com/snowplow/snowplow-generic/{{kinesis_package}}"
# dest: "{{staging_dir}}"
# when: check_kinesis_packages_result.stat.exists == False
# register: kinesis_packages_downloaded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be rmd, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, this will be removed, I just comment out them to wait NSQ releases of the Stream Collector and Stream Enrich.

;;
esac

exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nsqd_init and nsqlookupd_init are almost identical except for the env variables, can't we refactor the common pieces out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep you are right however since they are init scripts, I think it will be best to leave them without giving them arguments. If there is cleaner way in your mind without giving any arguments to script or without initializing any env variables beforehand, feel free to suggest!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refactor the common parts in another script and source it in your nsqd_init and nsqlookupd_init

Copy link
Contributor Author

@aldemirenes aldemirenes Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use "source" but there was an unexpected behavior while trying to use the init scripts with the "update rc.d" command after using the "source" command. However, I will try to find a solution until the release.

@@ -0,0 +1,43 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is go fmt run on those files? they seem to be space-indented instead of tab-indented?

err := restartSPServices()
if err != nil {
http.Error(resp, err.Error(), 500)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for return

}

func restartServices(resp http.ResponseWriter, req *http.Request) {
if (req.Method == "PUT") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for parens

var esLoaderGoodInit = "snowplow_elasticsearch_loader_good"
var esLoaderBadInit = "snowplow_elasticsearch_loader_bad"

func restartService(service string) (error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for parens around error, below as well

var streamCollectorInit = "snowplow_stream_collector"
var streamEnrichInit = "snowplow_stream_enrich"
var esLoaderGoodInit = "snowplow_elasticsearch_loader_good"
var esLoaderBadInit = "snowplow_elasticsearch_loader_bad"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are those used anywhere?

Copy link
Contributor Author

@aldemirenes aldemirenes Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nop, I forgot to remove them after moved them to config. Will remove them.


func restartSPServices() (error) {
err := restartService("streamCollector")
if err != nil {return err}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please reformat as:

if err != nil {
  return err
}

// returns public IP if the host machine is EC2 instance
func getPublicEC2IP() (string, error) {
// URL of the instance meta service of AWS EC2
var urlForCheckingPubIP = "http://169.254.169.254/latest/meta-data/public-ipv4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be a config?


externalIgluServer := ExternalIgluServer {
ConfigPath: tmpfn,
LineToInsert: 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there a cleaner way to do this? This feels hacky

Copy link
Contributor

@BenFradet BenFradet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had another look at the control plane api

language: go

go:
- 1.8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move to go 1.9?

Packerfile.json Outdated
"ssh_username": "ubuntu",
"ami_name": "snowplow-mini-{{user `version`}}-{{ timestamp }}-hvm-ebs-amd64",
"ami_groups": [ "all" ],
"ami_regions": "us-east-2,us-west-1,us-west-2,ca-central-1,eu-west-1,eu-central-1,eu-west-2,ap-southeast-1,ap-southeast-2,ap-northeast-2,ap-northeast-1,ap-south-1,sa-east-1",
"ami_regions": "eu-west-1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this cut down to only eu-west-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in the rc commit, I will change back it after testing is done.

README.md Outdated
* Elasticsearch Sink Bad
- Reads events in from the `bad-1-pipe`
- Reads events in from the `bad events nsq topic`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are those between back ticks since they don't refer to the nsq topic name?

@@ -10,10 +10,10 @@
### END INIT INFO

dir="/home/ubuntu/snowplow/bin/"
cmd="./snowplow-elasticsearch-sink-0.8.0-2x --config /home/ubuntu/snowplow/configs/snowplow-elasticsearch-sink-bad.hocon"
cmd="java -jar $dir/snowplow-elasticsearch-loader-http-0.10.0-rc1.jar --config /home/ubuntu/snowplow/configs/snowplow-es-loader-bad.hocon"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 0.10.1 now

@@ -10,13 +10,13 @@
### END INIT INFO

dir="/home/ubuntu/snowplow/bin/"
cmd="./snowplow-stream-collector-0.9.0 --config /home/ubuntu/snowplow/configs/snowplow-stream-collector.hocon"
cmd="java -jar $dir/snowplow-elasticsearch-loader-http-0.10.0-rc1.jar --config /home/ubuntu/snowplow/configs/snowplow-es-loader-good.hocon"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -10,13 +10,13 @@
### END INIT INFO

dir="/home/ubuntu/snowplow/bin/"
cmd="./snowplow-stream-enrich-0.10.0 --config /home/ubuntu/snowplow/configs/snowplow-stream-enrich.hocon --resolver file:/home/ubuntu/snowplow/configs/iglu-resolver.json --enrichments file:/home/ubuntu/snowplow/configs/enrichments"
cmd="java -jar snowplow-stream-enrich-0.11.0.jar --config /home/ubuntu/snowplow/configs/snowplow-stream-enrich.hocon --resolver file:/home/ubuntu/snowplow/configs/iglu-resolver.json --enrichments file:/home/ubuntu/snowplow/configs/enrichments"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 0.11.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since nsq version of stream enrich will be used in the v0.4.0, I think there is no need to change stream enrich to 0.11.1 right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well use the latest components available

@alexanderdean
Copy link
Member

Can you cross-check against the milestone @BenFradet ? It feels like this git history versus the milestone are rather divergent.

aldemirenes and others added 18 commits December 21, 2017 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants