
V0.4.0 #129

Merged (18 commits, Dec 21, 2017)

Commits
cf896aa: Add Control Plane API call to put your own custom enrichments in Snowplow Mini (closes #66) (aldemirenes, Aug 29, 2017)
38fd09a: Add Control Plane API call to add external Iglu schema registry to Snowplow Mini (closes #62) (aldemirenes, Oct 6, 2017)
a44c941: Add Control Plane API call to add apikey for Iglu authentication (closes #116) (aldemirenes, Oct 6, 2017)
f48768c: Add Control Plane API call to change username and password for basic HTTP authentication (closes #117) (aldemirenes, Oct 6, 2017)
882212a: Add Control Plane API call to write domain name to Caddyfile for automatic TLS (closes #118) (aldemirenes, Oct 6, 2017)
a9e75d8: Add Control Plane API call to return the Snowplow Mini version (closes #128) (aldemirenes, Oct 6, 2017)
8df46ef: Refactor user_data.sh script to use Control Plane API (closes #113) (aldemirenes, Aug 29, 2017)
f641004: Create index mappings on bootstrap (closes #104) (aldemirenes, Aug 29, 2017)
9d3d0bd: Bump JavaScript Tracker to 2.8.2 (closes #71) (aldemirenes, Aug 29, 2017)
4a5f4f8: Switch to using NSQ rather than named pipes (closes #24) (aldemirenes, Sep 23, 2017)
66b472a: Use "wait_for" module instead of "pause" in ansible roles (closes #125) (aldemirenes, Sep 5, 2017)
860d60e: Ensure UI links adhere to the current protocol being used (closes #127) (aldemirenes, Sep 6, 2017)
0a998f0: Use Caddy instead of Nginx for serving the Snowplow Mini dashboard (closes #130) (aldemirenes, Oct 8, 2017)
057047e: Update build process to build Caddy from source (closes #132) (aldemirenes, Sep 22, 2017)
322d970: Add libffi-dev, libssl-dev, python-dev and markupsafe as dependencies (closes #133) (aldemirenes, Sep 19, 2017)
c770037: Upgrade pip version when launching vagrant box (closes #139) (BenFradet, Dec 6, 2017)
ab70ce8: Show Control Plane notifications in-page using react-alert (closes #119) (BenFradet, Dec 21, 2017)
7432961: Prepared for release (aldemirenes, Dec 21, 2017)
8 changes: 5 additions & 3 deletions .travis.yml

```diff
@@ -1,11 +1,13 @@
 ---
 sudo: required
-language: bash
+language: go
 
+go:
+  - 1.8
 
 services:
   - postgresql
 
 before_install:
   - sudo apt-get update -qq
@@ -14,7 +16,7 @@ install:
   - sudo pip install ansible
 
 before_script:
-  - ansible-playbook -i provisioning/inventory provisioning/with_building_ui.yml --connection=local --sudo
+  - ansible-playbook -i provisioning/inventory provisioning/with_building_ui_and_go_projects.yml --connection=local --sudo
 
 script:
   - ./integration/integration_test.sh
```

> Review comment (Contributor), on the `go: 1.8` line: should we move to go 1.9?
20 changes: 20 additions & 0 deletions CHANGELOG

```diff
@@ -1,3 +1,23 @@
+Version 0.4.0 (2017-12-21)
+--------------------------
+Add Control Plane API call to write domain name to Caddyfile for automatic TLS (#118)
+Add Control Plane API call to change username and password for basic HTTP authentication (#117)
+Add Control Plane API call to add apikey for Iglu authentication (#116)
+Add Control Plane API call to put your own custom enrichments in Snowplow Mini (#66)
+Add Control Plane API call to add external Iglu schema registry to Snowplow Mini (#62)
+Add Control Plane API call to return the Snowplow Mini version (#128)
+Switch to using NSQ rather than named pipes (#24)
+Bump JavaScript Tracker to 2.8.2 (#71)
+Ensure UI links adhere to the current protocol being used (#127)
+Show Control Plane notifications in-page using react-alert (#119)
+Create index mappings on bootstrap (#104)
+Use Caddy instead of Nginx for serving the Snowplow Mini dashboard (#130)
+Add libffi-dev, libssl-dev, python-dev and markupsafe as dependencies (#133)
+Update build process to build Caddy from source (#132)
+Use "wait_for" module instead of "pause" in ansible roles (#125)
+Refactor user_data.sh script to use Control Plane API (#113)
+Upgrade pip version when launching vagrant box (#139)
+
 Version 0.3.0 (2017-08-30)
 --------------------------
 Get username and password for basic authentication via user_data.sh (#107)
```
6 changes: 3 additions & 3 deletions Packerfile.json

```diff
@@ -1,14 +1,14 @@
 {
   "variables": {
-    "version": "0.3.0"
+    "version": "0.4.0"
   },
 
   "builders": [
     {
       "type": "amazon-ebs",
       "region": "us-east-1",
       "source_ami": "ami-05dddc6f",
-      "instance_type": "t2.small",
+      "instance_type": "t2.medium",
       "ssh_username": "ubuntu",
       "ami_name": "snowplow-mini-{{user `version`}}-{{ timestamp }}-hvm-ebs-amd64",
       "ami_groups": [ "all" ],
@@ -24,7 +24,7 @@
   "provisioners": [
     {
       "type": "ansible",
-      "playbook_file": "provisioning/without_building_ui.yml"
+      "playbook_file": "provisioning/without_building_ui_and_go_projects.yml"
     }
   ]
 }
```
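For reference, the `ami_name` template above combines the bumped `version` variable with Packer's `{{timestamp}}` placeholder (a Unix epoch). A minimal sketch of what the rendered AMI name looks like; the epoch value here is only an example, Packer substitutes the actual build time:

```python
# Illustrative only: mimics the rendering of the Packerfile's ami_name
# template. The timestamp is an example epoch, not a real build time.
version = "0.4.0"
timestamp = 1513814400  # example Unix epoch

ami_name = f"snowplow-mini-{version}-{timestamp}-hvm-ebs-amd64"
print(ami_name)  # snowplow-mini-0.4.0-1513814400-hvm-ebs-amd64
```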
37 changes: 19 additions & 18 deletions README.md

```diff
@@ -17,35 +17,36 @@ An easily-deployable, single instance version of Snowplow that serves three use
 * [x] Data is loaded into Elasticsearch
   - Can be queried directly or through a Kibana dashboard
   - Good and bad events are in distinct indexes
-* [x] Create UI to indicate what is happening with each of the different subsystems (collector, enrich etc.), so as to provide developers a very indepth way of understanding how the different Snowplow subsystems work with one another
 
 ## Topology
 
-Snowplow-Mini runs several distinct applications on the same box which are all linked by named pipes. In a production deployment each instance could be an Autoscaling Group and each named pipe would be a distinct Kinesis Stream.
+Snowplow-Mini runs several distinct applications on the same box which are all linked by NSQ topics. In a production deployment each instance could be an Autoscaling Group and each NSQ topic would be a distinct Kinesis Stream.
 
 * Scala Stream Collector:
   - Starts server listening on `http://< sp mini public ip>/` which events can be sent to.
-  - Sends "good" events to the `raw-events-pipe`
-  - Sends "bad" events to the `bad-1-pipe`
-* Stream Enrich
-  - Reads events in from the `raw-events-pipe`
-  - Sends "good" events to the `enriched-events-pipe`
-  - Sends "bad" events to the `bad-1-pipe`
-* Elasticsearch Sink Good
-  - Reads events in from the `enriched-events-pipe`
-  - Sends the events to the "good" index of the cluster
-  - On failure to insert writes error to `bad-1-pipe`
-* Elasticsearch Sink Bad
-  - Reads events in from the `bad-1-pipe`
-  - Sends the events to the "bad" index of the cluster
-
-These events can then be viewed from `http://< sp mini public ip>/kibana`.
+  - Sends "good" events to the `RawEvents` NSQ topic
+  - Sends "bad" events to the `BadEvents` NSQ topic
+* Stream Enrich:
+  - Reads events in from the `RawEvents` NSQ topic
+  - Sends events which passed the enrichment process to the `EnrichedEvents` NSQ topic
+  - Sends events which failed the enrichment process to the `BadEvents` NSQ topic
+* Elasticsearch Sink Good:
+  - Reads events from the `EnrichedEvents` NSQ topic
+  - Sends those events to the `good` Elasticsearch index
+  - On failure to insert, writes errors to `BadElasticsearchEvents` NSQ topic
+* Elasticsearch Sink Bad:
+  - Reads events from the `BadEvents` NSQ topic
+  - Sends those events to the `bad` Elasticsearch index
+  - On failure to insert, writes errors to `BadElasticsearchEvents` NSQ topic
 
+These events can then be viewed in Kibana at `http://< sp mini public ip>/kibana`.
 
 ![](https://raw.githubusercontent.com/snowplow/snowplow-mini/master/utils/topology/snowplow-mini-topology.jpg)
 
 ## Roadmap
 
 * [ ] Support loading data into Redshift. To give analysts / data teams a good idea to understand what Snowplow "does".
+* [ ] Create UI to indicate what is happening with each of the different subsystems (collector, enrich etc.), so as to provide developers a very indepth way of understanding how the different Snowplow subsystems work with one another
 
 ## Documentation
@@ -69,7 +70,7 @@ limitations under the License.
 [travis]: https://travis-ci.org/snowplow/snowplow-mini
 [travis-image]: https://travis-ci.org/snowplow/snowplow-mini.svg?branch=master
 
-[release-image]: http://img.shields.io/badge/release-0.3.0-blue.svg?style=flat
+[release-image]: http://img.shields.io/badge/release-0.4.0-blue.svg?style=flat
 [releases]: https://github.com/snowplow/snowplow-mini/releases
 
 [license-image]: http://img.shields.io/badge/license-Apache--2-blue.svg?style=flat
```
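The NSQ topology the README now describes can be sketched with in-memory queues standing in for topics. This is illustrative only: the real components are separate processes publishing to and consuming from nsqd, and the enrichment rule here (an event is "good" if it carries an `e` event-type parameter) is a stand-in for the actual Stream Enrich logic. The topic names are the ones from the README.

```python
from queue import Queue

# In-memory stand-ins for the NSQ topics named in the README.
topics = {name: Queue() for name in ("RawEvents", "EnrichedEvents", "BadEvents")}

def collect(event):
    # Scala Stream Collector: every raw hit goes to RawEvents.
    topics["RawEvents"].put(event)

def enrich():
    # Stream Enrich: route one raw event to EnrichedEvents or BadEvents.
    event = topics["RawEvents"].get()
    if event.get("e"):                      # has an event type: "good"
        topics["EnrichedEvents"].put(event)
    else:                                   # fails enrichment: "bad"
        topics["BadEvents"].put(event)

collect({"e": "pv"})   # a page view, like curl .../i?e=pv
collect({})            # no event type, like curl .../i
enrich()
enrich()
print(topics["EnrichedEvents"].qsize(), topics["BadEvents"].qsize())  # 1 1
```

In the real deployment the Elasticsearch loaders then drain `EnrichedEvents` and `BadEvents` into the `good` and `bad` indexes.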
2 changes: 1 addition & 1 deletion VERSION

```diff
@@ -1 +1 @@
-0.3.0
+0.4.0
```
2 changes: 1 addition & 1 deletion Vagrantfile

```diff
@@ -8,7 +8,7 @@ Vagrant.configure("2") do |config|
   config.vm.network :private_network, ip: '192.168.50.50' # Uncomment to use NFS
   config.vm.synced_folder '.', '/vagrant', nfs: true # Uncomment to use NFS
 
-  config.vm.network "forwarded_port", guest: 2000, host: 2000
+  config.vm.network "forwarded_port", guest: 80, host: 2000
   config.vm.network "forwarded_port", guest: 3000, host: 3000
   config.vm.network "forwarded_port", guest: 8080, host: 8080
   config.vm.network "forwarded_port", guest: 9200, host: 9200
```
40 changes: 7 additions & 33 deletions integration/integration_test.sh

```diff
@@ -2,22 +2,21 @@
 
 sudo service elasticsearch start
 sudo service iglu_server_0.2.0 start
-sudo service snowplow_stream_collector_0.9.0 start
-sudo service snowplow_stream_enrich_0.10.0 start
-sudo service snowplow_elasticsearch_sink_good_0.8.0 start
-sudo service snowplow_elasticsearch_sink_bad_0.8.0 start
+sudo service snowplow_stream_collector start
+sudo service snowplow_stream_enrich start
+sudo service snowplow_elasticsearch_loader_good start
+sudo service snowplow_elasticsearch_loader_bad start
 sudo service kibana4_init start
 sudo service nginx start
 sleep 15
 
 # Send good and bad events
 COUNTER=0
 while [ $COUNTER -lt 10 ]; do
   curl http://localhost:8080/i?e=pv
   curl http://localhost:8080/i
   let COUNTER=COUNTER+1
 done
-sleep 5
+sleep 60
 
 # Assertions
 good_count="$(curl --silent -XGET 'http://localhost:9200/good/good/_count' | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["count"]')"
@@ -27,32 +26,7 @@ echo "Event Counts:"
 echo " - Good: ${good_count}"
 echo " - Bad: ${bad_count}"
 
-stream_enrich_pid_file=/var/run/snowplow_stream_enrich_0.10.0.pid
-stream_collector_pid_file=/var/run/snowplow_stream_collector_0.9.0.pid
-sink_bad_pid_file=/var/run/snowplow_elasticsearch_sink_bad_0.8.0-2x.pid
-sink_good_pid_file=/var/run/snowplow_elasticsearch_sink_good_0.8.0-2x.pid
-
-stream_enrich_pid_old="$(cat "${stream_enrich_pid_file}")"
-stream_collector_pid_old="$(cat "${stream_collector_pid_file}")"
-sink_bad_pid_old="$(cat "${sink_bad_pid_file}")"
-sink_good_pid_old="$(cat "${sink_good_pid_file}")"
-
-req_result=$(curl --silent -XPUT 'http://localhost:10000/restart-services')
-
-stream_enrich_pid_new="$(cat "${stream_enrich_pid_file}")"
-stream_collector_pid_new="$(cat "${stream_collector_pid_file}")"
-sink_bad_pid_new="$(cat "${sink_bad_pid_file}")"
-sink_good_pid_new="$(cat "${sink_good_pid_file}")"
-
-# Bad Count is 11 due to bad logging
-if [[ "${good_count}" -eq "10" ]] && [[ "${bad_count}" -eq "11" ]] &&
-   [[ "${req_result}" == "OK" ]] &&
-   [[ "${stream_enrich_pid_old}" -ne "${stream_enrich_pid_new}" ]] &&
-   [[ "${stream_collector_pid_old}" -ne "${stream_collector_pid_new}" ]] &&
-   [[ "${sink_bad_pid_old}" -ne "${sink_bad_pid_new}" ]] &&
-   [[ "${sink_good_pid_old}" -ne "${sink_good_pid_new}" ]]; then
-
+if [[ "${good_count}" -eq "10" ]] && [[ "${bad_count}" -eq "10" ]]; then
   exit 0
 else
   exit 1
```

Review exchange on the removed PID checks:

> Member: Why remove all the pid testing to check the service restarts?
>
> Contributor (author): They are moved to control plane test suite under the control plane folder.
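The inline `python -c` one-liner in the assertions above is Python 2 (`print obj["count"]` without parentheses). The same count extraction in Python 3, shown standalone against a sample body shaped like Elasticsearch's `_count` response:

```python
import json

# Sample payload shaped like the /_count response the script pipes in
# (the _shards detail is illustrative; only "count" is used).
sample = '{"count": 10, "_shards": {"total": 5, "successful": 5, "failed": 0}}'

obj = json.loads(sample)
print(obj["count"])  # 10
```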
7 changes: 6 additions & 1 deletion provisioning/resources/configs/Caddyfile

```diff
@@ -1,4 +1,5 @@
-localhost:2000 {
+*:80 {
+  tls off
   basicauth "USERNAME_PLACEHOLDER" PASSWORD_PLACEHOLDER {
     /home
     /kibana
@@ -34,3 +35,7 @@
     without /control-plane
   }
 }
+
+*:3000 {
+  root /home/ubuntu/snowplow/ui
+}
```
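The `basicauth` directive above puts `/home` and `/kibana` behind HTTP Basic Auth. As a reminder of what that means on the wire, the credentials travel base64-encoded in an `Authorization` header; the username and password below are made-up examples, not the values that replace the Caddyfile placeholders:

```python
import base64

# Example credentials only; the real ones are written over
# USERNAME_PLACEHOLDER / PASSWORD_PLACEHOLDER by the Control Plane API.
user, password = "admin", "secret"
token = base64.b64encode(f"{user}:{password}".encode()).decode()
print(f"Authorization: Basic {token}")  # Authorization: Basic YWRtaW46c2VjcmV0
```

Base64 is encoding, not encryption, which is why the PR's companion change to write a domain into the Caddyfile for automatic TLS (#118) matters.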
28 changes: 28 additions & 0 deletions provisioning/resources/configs/control-plane-api.toml (new file)

```toml
version_file_path = "/home/ubuntu/snowplow/VERSION"

# for getting IP address of the running EC2 instance
# for more information visit:
# http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
EC2_meta_service_url = "http://169.254.169.254/latest/meta-data/public-ipv4"

[directories]
enrichments = "/home/ubuntu/snowplow/configs/enrichments" # directory which all the enrichment files are in
config = "/home/ubuntu/snowplow/configs" # directory which all the configs are in

[config_file_names]
caddy = "Caddyfile"
iglu_resolver = "iglu-resolver.json"

[init_scripts]
stream_collector = "snowplow_stream_collector"
stream_enrich = "snowplow_stream_enrich"
es_loader_good = "snowplow_elasticsearch_loader_good"
es_loader_bad = "snowplow_elasticsearch_loader_bad"
iglu = "iglu_server_0.2.0"
caddy = "caddy_init"

[PSQL]
user = "snowplow"
password = "snowplow"
database = "iglu"
adddress = "127.0.0.1:5432"
```
107 changes: 0 additions & 107 deletions provisioning/resources/configs/snowplow-elasticsearch-sink-bad.hocon

This file was deleted.
