Add Google Analytics tap #426

Merged
merged 1 commit into from
May 29, 2020
11 changes: 10 additions & 1 deletion docs/connectors/taps.rst
@@ -117,6 +117,15 @@ PipelineWise can replicate data from the following data sources:
:target: taps/zuora.html

:ref:`tap-zuora`

.. container:: tile

.. container:: img-hover-zoom

.. image:: ../img/google-analytics-logo.png
:target: taps/google_analytics.html

:ref:`tap-google-analytics`


Configuring taps
@@ -134,4 +143,4 @@ Configuring taps
taps/zendesk
taps/jira
taps/zuora

taps/google_analytics
97 changes: 97 additions & 0 deletions docs/connectors/taps/google_analytics.rst
@@ -0,0 +1,97 @@

.. _tap-google-analytics:

Tap Google Analytics
--------------------


Configuring what to replicate
'''''''''''''''''''''''''''''

PipelineWise configures every tap with a common structured YAML file format.
A sample YAML for Google Analytics replication can be generated into a project directory by
following the steps in the :ref:`generating_pipelines` section.

Authorization Methods
'''''''''''''''''''''

``tap-google-analytics`` supports two methods of authorization:

- Service account based authorization, where an administrator manually creates a service account with permission to view the account, property, and view you want to fetch data from.
- OAuth ``access_token`` based authorization, where the tap is called with a valid ``access_token`` and ``refresh_token`` produced by an OAuth flow conducted in a different system.

If you are setting up ``tap-google-analytics`` for your own organization and only plan to extract from a handful of views in the same limited set of properties, service account based authorization is the simplest. When you create a service account, Google gives you a JSON file with that service account's credentials, called ``client_secrets.json``. That file is all you need to pass to the tap, and you only have to do it once, so this is the recommended way of configuring ``tap-google-analytics``.

If you are building something where a wide variety of users need to grant access to their Google Analytics data, ``tap-google-analytics`` can use an ``access_token`` granted by those users to authorize its requests to Google. This ``access_token`` is produced by a normal Google OAuth flow, but that flow is outside the scope of ``tap-google-analytics``. This is useful if you are integrating ``tap-google-analytics`` with another system, as Stitch Data might do to let users configure their extracts themselves without manual config setup. The tap expects an ``access_token``, ``refresh_token``, ``client_id`` and ``client_secret`` so that it can authenticate as the user who granted the token and then access their data.

Note
''''

This tap does not currently use any STATE information for incrementally extracting data. This is currently mitigated by allowing chunked runs using ``[start_date, end_date)``, but support for STATE messages should be added.

The difficulty is in dynamically deciding which attributes to use for capturing state for ad-hoc reports that do not include the ``ga:date`` dimension or other combinations of time dimensions.
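
The chunked runs mentioned in the note can be planned outside the tap. The following is a minimal sketch, not part of PipelineWise or the tap: the helper name ``date_chunks`` and the chunk size are assumptions made for illustration. It splits a date range into half-open ``[start_date, end_date)`` windows so that consecutive runs neither overlap nor leave gaps.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_chunks(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [chunk_start, chunk_end) windows covering [start, end).

    Hypothetical helper for illustration; it is not provided by the tap.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    chunk_start = start
    while chunk_start < end:
        # The last window is clipped so it never runs past the overall end date.
        chunk_end = min(chunk_start + timedelta(days=days), end)
        yield chunk_start, chunk_end
        # The next window starts exactly where this one ended (end is exclusive).
        chunk_start = chunk_end


windows = list(date_chunks(date(2020, 1, 1), date(2020, 1, 10), 4))
for chunk_start, chunk_end in windows:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Because each window's end is exclusive and equals the next window's start, every day in the range is requested exactly once.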

Example YAML for ``tap-google-analytics``:

.. code-block:: yaml

    ---

    # ------------------------------------------------------------------------------
    # General Properties
    # ------------------------------------------------------------------------------
    id: "google_analytics_sample"          # Unique identifier of the tap
    name: "Google Analytics"               # Name of the tap
    type: "tap-google-analytics"           # !! THIS SHOULD NOT CHANGE !!
    owner: "somebody@foo.com"              # Data owner to contact


    # ------------------------------------------------------------------------------
    # Source (Tap) - Google Analytics connection details
    # ------------------------------------------------------------------------------
    db_conn:
      view_id: "<view-id>"
      start_date: "2010-01-01"             # Date from which the tap begins pulling data

      # OAuth authentication
      oauth_credentials:
        client_id: "<client-id>"
        client_secret: "<client-secret>"   # Plain string or vault encrypted
        access_token: "<access-token>"     # Plain string or vault encrypted
        refresh_token: "<refresh-token>"   # Plain string or vault encrypted

      # Service account based authorization
      # key_file_location: "full-path-to-client_secrets.json"


    # ------------------------------------------------------------------------------
    # Destination (Target) - Target properties
    # Connection details should be in the relevant target YAML file
    # ------------------------------------------------------------------------------
    target: "snowflake"                        # ID of the target connector where the data will be loaded
    batch_size_rows: 20000                     # Batch size for the stream to optimise load performance
    default_target_schema: "google-analytic"   # Target schema where the data will be loaded
    #default_target_schema_select_permission:  # Optional: Grant SELECT on the schema and tables that are created
    #  - grp_power


    # ------------------------------------------------------------------------------
    # Source to target Schema mapping
    # ------------------------------------------------------------------------------
    schemas:

      - source_schema: "google-analytics"      # This is mandatory, but can be anything in this tap type
        target_schema: "google-analytics"      # Target schema in the destination Data Warehouse
        #target_schema_select_permissions:     # Optional: Grant SELECT on the schema and tables that are created
        #  - grp_stats

        # List of Google Analytics tables to replicate into destination Data Warehouse
        # Tap-Google-Analytics will use the best incremental strategies automatically to replicate data
        tables:

          # Tables replicated incrementally
          - table_name: "website_overview"
          - table_name: "traffic_sources"
          - table_name: "monthly_active_users"

            # OPTIONAL: Load time transformations - you can add it to any table
            #transformations:
            #  - column: "some_column_to_transform" # Column to transform
            #    type: "SET-NULL"                   # Transformation type
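
The two authorization methods map onto the ``db_conn`` block of the YAML: either ``key_file_location`` or ``oauth_credentials``. A minimal sketch of a pre-flight check is given here, assuming the parsed ``db_conn`` is available as a plain dictionary. The function ``auth_method`` is hypothetical, and the rule that only one method may be set is an assumption of this sketch rather than documented behaviour of the tap.

```python
from typing import Any, Dict

# The four values the documentation says the tap expects for OAuth.
OAUTH_KEYS = ("client_id", "client_secret", "access_token", "refresh_token")


def auth_method(db_conn: Dict[str, Any]) -> str:
    """Return 'service_account' or 'oauth' for a parsed db_conn mapping.

    Hypothetical helper: the either/or rule is an assumption of this sketch.
    """
    has_key_file = bool(db_conn.get("key_file_location"))
    oauth = db_conn.get("oauth_credentials")
    if has_key_file and oauth:
        raise ValueError("set either key_file_location or oauth_credentials, not both")
    if has_key_file:
        return "service_account"
    if oauth:
        # Report every missing OAuth value at once rather than one per run.
        missing = [key for key in OAUTH_KEYS if not oauth.get(key)]
        if missing:
            raise ValueError("oauth_credentials is missing: " + ", ".join(missing))
        return "oauth"
    raise ValueError("no authorization method configured in db_conn")


print(auth_method({"view_id": "<view-id>",
                   "key_file_location": "full-path-to-client_secrets.json"}))
```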
Binary file added docs/img/google-analytics-logo.png
1 change: 1 addition & 0 deletions install.sh
@@ -182,6 +182,7 @@ EXTRA_CONNECTORS=(
tap-adwords
tap-oracle
tap-zuora
tap-google-analytics
)

# Install only the default connectors if --connectors argument not passed
63 changes: 63 additions & 0 deletions pipelinewise/cli/samples/tap_google_analytics.yml.sample
@@ -0,0 +1,63 @@
---

# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "google_analytics_sample"          # Unique identifier of the tap
name: "Google Analytics"               # Name of the tap
type: "tap-google-analytics"           # !! THIS SHOULD NOT CHANGE !!
owner: "somebody@foo.com"              # Data owner to contact


# ------------------------------------------------------------------------------
# Source (Tap) - Google Analytics connection details
# ------------------------------------------------------------------------------
db_conn:
  view_id: "<view-id>"
  start_date: "2010-01-01"             # Date from which the tap begins pulling data

  # OAuth authentication
  oauth_credentials:
    client_id: "<client-id>"
    client_secret: "<client-secret>"   # Plain string or vault encrypted
    access_token: "<access-token>"     # Plain string or vault encrypted
    refresh_token: "<refresh-token>"   # Plain string or vault encrypted

  # Service account based authorization
  # key_file_location: "full-path-to-client_secrets.json"


# ------------------------------------------------------------------------------
# Destination (Target) - Target properties
# Connection details should be in the relevant target YAML file
# ------------------------------------------------------------------------------
target: "snowflake"                        # ID of the target connector where the data will be loaded
batch_size_rows: 20000                     # Batch size for the stream to optimise load performance
default_target_schema: "google-analytic"   # Target schema where the data will be loaded
#default_target_schema_select_permission:  # Optional: Grant SELECT on the schema and tables that are created
#  - grp_power


# ------------------------------------------------------------------------------
# Source to target Schema mapping
# ------------------------------------------------------------------------------
schemas:

  - source_schema: "google-analytics"      # This is mandatory, but can be anything in this tap type
    target_schema: "google-analytics"      # Target schema in the destination Data Warehouse
    #target_schema_select_permissions:     # Optional: Grant SELECT on the schema and tables that are created
    #  - grp_stats

    # List of Google Analytics tables to replicate into destination Data Warehouse
    # Tap-Google-Analytics will use the best incremental strategies automatically to replicate data
    tables:

      # Tables replicated incrementally
      - table_name: "website_overview"
      - table_name: "traffic_sources"
      - table_name: "monthly_active_users"

        # OPTIONAL: Load time transformations - you can add it to any table
        #transformations:
        #  - column: "some_column_to_transform" # Column to transform
        #    type: "SET-NULL"                   # Transformation type
3 changes: 2 additions & 1 deletion pipelinewise/cli/schemas/tap.json
@@ -158,7 +158,8 @@
"tap-snowflake",
"tap-salesforce",
"tap-jira",
-"tap-zuora"
+"tap-zuora",
+"tap-google-analytics"
]
},
"db_conn": {
8 changes: 8 additions & 0 deletions pipelinewise/cli/tap_properties.py
@@ -222,6 +222,14 @@ def get_tap_properties(tap=None, temp_dir=None):
            'default_replication_method': 'LOG_BASED',
            'default_data_flattening_max_level': 0
        },
        'tap-google-analytics': {
            'tap_config_extras': {},
            'tap_stream_id_pattern': '{{table_name}}',
            'tap_stream_name_pattern': '{{table_name}}',
            'tap_catalog_argument': '--catalog',
            'default_replication_method': 'INCREMENTAL',
            'default_data_flattening_max_level': 0
        },
        # Default values to use as a fallback method
        'DEFAULT': {
            'tap_config_extras': {},
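
For readers unfamiliar with the `{{table_name}}` placeholders, here is a rough sketch of how such a pattern could be expanded into a stream ID for a table from the sample YAML. The `render_pattern` helper is hypothetical and assumes plain string substitution; PipelineWise's actual rendering code is not shown in this diff.

```python
def render_pattern(pattern: str, **values: str) -> str:
    """Replace each {{name}} placeholder in pattern with the given value.

    Hypothetical helper; assumes simple string substitution.
    """
    for name, value in values.items():
        pattern = pattern.replace("{{" + name + "}}", value)
    return pattern


# Properties copied from the 'tap-google-analytics' entry added in this hunk.
ga_properties = {
    'tap_stream_id_pattern': '{{table_name}}',
    'tap_stream_name_pattern': '{{table_name}}',
}

stream_id = render_pattern(ga_properties['tap_stream_id_pattern'],
                           table_name='website_overview')
print(stream_id)
```

With both patterns set to `{{table_name}}`, the stream ID and stream name are simply the table name from the YAML, with no schema or database prefix.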
1 change: 1 addition & 0 deletions singer-connectors/tap-google-analytics/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
pipelinewise-tap-google-analytics==1.1.0