[DOC] Tap google analytics doc styles #430

Merged 2 commits on May 29, 2020
[DOC] RST styles for tap-google-analytics
koszti committed May 29, 2020
commit deac8d9d102df80bf06d8d413cf39c524bf63689
40 changes: 21 additions & 19 deletions README.md
@@ -39,25 +39,27 @@ Documentation is available at https://transferwise.github.io/pipelinewise/
Tap extracts data from any source and write it to a standard stream in a JSON-based format, and target
consumes data from taps and do something with it, like load it into a file, API or database

| Type | Name | Latest Version | Description |
|-----------|------------|----------------|------------------------------------------------------|
| Tap | **[Postgres](https://github.com/transferwise/pipelinewise-tap-postgres)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-postgres.svg)](https://badge.fury.io/py/pipelinewise-tap-postgres) | Extracts data from PostgreSQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[MySQL](https://github.com/transferwise/pipelinewise-tap-mysql)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql) | Extracts data from MySQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[Oracle](https://github.com/transferwise/pipelinewise-tap-oracle)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-oracle.svg)](https://badge.fury.io/py/pipelinewise-tap-oracle) | Extracts data from Oracle databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[Kafka](https://github.com/transferwise/pipelinewise-tap-kafka)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-kafka.svg)](https://badge.fury.io/py/pipelinewise-tap-kafka) | Extracts data from Kafka topics |
| Tap | **[AdWords](https://github.com/singer-io/tap-adwords)** | [![PyPI version](https://badge.fury.io/py/tap-adwords.svg)](https://badge.fury.io/py/tap-adwords) | Extracts data Google Ads API (former Google Adwords) using OAuth and support incremental loading based on input state |
| Tap | **[S3 CSV](https://github.com/transferwise/pipelinewise-tap-s3-csv)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-s3-csv.svg)](https://badge.fury.io/py/pipelinewise-tap-s3-csv) | Extracts data from S3 csv files (currently a fork of tap-s3-csv because we wanted to use our own auth method) |
| Tap | **[Zendesk](https://github.com/singer-io/tap-zendesk)** | [![PyPI version](https://badge.fury.io/py/tap-zendesk.svg)](https://badge.fury.io/py/tap-zendesk) | Extracts data from Zendesk using OAuth and Key-Based incremental replications |
| Tap | **[Snowflake](https://github.com/transferwise/pipelinewise-tap-snowflake)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-snowflake.svg)](https://badge.fury.io/py/pipelinewise-tap-snowflake) | Extracts data from Snowflake databases. Supporting Key-Based Incremental and Full Table replications |
| Tap | **[Salesforce](https://github.com/singer-io/tap-salesforce)** | [![PyPI version](https://badge.fury.io/py/tap-salesforce.svg)](https://badge.fury.io/py/tap-salesforce) | Extracts data from Salesforce database using BULK and REST extraction API with Key-Based incremental replications |
| Tap | **[Zuora](https://github.com/transferwise/pipelinewise-tap-zuora)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-zuora.svg)](https://badge.fury.io/py/pipelinewise-tap-zuora) | Extracts data from Zuora database using AQAA and REST extraction API with Key-Based incremental replications |
| Tap | **[Jira](https://github.com/singer-io/tap-jira)** | [![PyPI version](https://badge.fury.io/py/tap-jira.svg)](https://badge.fury.io/py/tap-jira) | Extracts data from Atlassian Jira using Base auth or OAuth credentials |
| Target | **[Postgres](https://github.com/transferwise/pipelinewise-target-postgres)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-postgres.svg)](https://badge.fury.io/py/pipelinewise-target-postgres) | Loads data from any tap into PostgreSQL database |
| Target | **[Redshift](https://github.com/transferwise/pipelinewise-target-redshift)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-redshift.svg)](https://badge.fury.io/py/pipelinewise-target-redshift) | Loads data from any tap into Amazon Redshift Data Warehouse |
| Target | **[Snowflake](https://github.com/transferwise/pipelinewise-target-snowflake)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-snowflake.svg)](https://badge.fury.io/py/pipelinewise-target-snowflake) | Loads data from any tap into Snowflake Data Warehouse |
| Target | **[S3 CSV](https://github.com/transferwise/pipelinewise-target-s3-csv)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-s3-csv.svg)](https://badge.fury.io/py/pipelinewise-target-s3-csv) | Uploads data from any tap to S3 in CSV format |
| Transform | **[Field](https://github.com/transferwise/pipelinewise-transform-field)** | [![PyPI version](https://badge.fury.io/py/pipelinewise-transform-field.svg)](https://badge.fury.io/py/pipelinewise-transform-field) | Transforms fields from any tap and sends the results to any target. Recommended for data masking/ obfuscation |

| Type | Name | Extra | Latest Version | Description |
|-----------|------------|-------|----------------|------------------------------------------------------|
| Tap | **[Postgres](https://github.com/transferwise/pipelinewise-tap-postgres)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-postgres.svg)](https://badge.fury.io/py/pipelinewise-tap-postgres) | Extracts data from PostgreSQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[MySQL](https://github.com/transferwise/pipelinewise-tap-mysql)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-mysql.svg)](https://badge.fury.io/py/pipelinewise-tap-mysql) | Extracts data from MySQL databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[Kafka](https://github.com/transferwise/pipelinewise-tap-kafka)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-kafka.svg)](https://badge.fury.io/py/pipelinewise-tap-kafka) | Extracts data from Kafka topics |
| Tap | **[S3 CSV](https://github.com/transferwise/pipelinewise-tap-s3-csv)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-s3-csv.svg)](https://badge.fury.io/py/pipelinewise-tap-s3-csv) | Extracts data from S3 csv files (currently a fork of tap-s3-csv because we wanted to use our own auth method) |
| Tap | **[Zendesk](https://github.com/singer-io/tap-zendesk)** | | [![PyPI version](https://badge.fury.io/py/tap-zendesk.svg)](https://badge.fury.io/py/tap-zendesk) | Extracts data from Zendesk using OAuth and Key-Based incremental replications |
| Tap | **[Snowflake](https://github.com/transferwise/pipelinewise-tap-snowflake)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-snowflake.svg)](https://badge.fury.io/py/pipelinewise-tap-snowflake) | Extracts data from Snowflake databases. Supporting Key-Based Incremental and Full Table replications |
| Tap | **[Salesforce](https://github.com/singer-io/tap-salesforce)** | | [![PyPI version](https://badge.fury.io/py/tap-salesforce.svg)](https://badge.fury.io/py/tap-salesforce) | Extracts data from Salesforce database using BULK and REST extraction API with Key-Based incremental replications |
| Tap | **[Jira](https://github.com/singer-io/tap-jira)** | | [![PyPI version](https://badge.fury.io/py/tap-jira.svg)](https://badge.fury.io/py/tap-jira) | Extracts data from Atlassian Jira using Base auth or OAuth credentials |
| Tap | **[AdWords](https://github.com/singer-io/tap-adwords)** | Extra | [![PyPI version](https://badge.fury.io/py/tap-adwords.svg)](https://badge.fury.io/py/tap-adwords) | Extracts data Google Ads API (former Google Adwords) using OAuth and support incremental loading based on input state |
| Tap | **[Google Analytics](https://github.com/transferwise/pipelinewise-tap-google-analytics)** | Extra | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-google-analytics.svg)](https://badge.fury.io/py/tap-adwords) | Extracts data from Google Analytics |
| Tap | **[Oracle](https://github.com/transferwise/pipelinewise-tap-oracle)** | Extra | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-oracle.svg)](https://badge.fury.io/py/pipelinewise-tap-oracle) | Extracts data from Oracle databases. Supporting Log-Based, Key-Based Incremental and Full Table replications |
| Tap | **[Zuora](https://github.com/transferwise/pipelinewise-tap-zuora)** | Extra | [![PyPI version](https://badge.fury.io/py/pipelinewise-tap-zuora.svg)](https://badge.fury.io/py/pipelinewise-tap-zuora) | Extracts data from Zuora database using AQAA and REST extraction API with Key-Based incremental replications |
| Target | **[Postgres](https://github.com/transferwise/pipelinewise-target-postgres)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-postgres.svg)](https://badge.fury.io/py/pipelinewise-target-postgres) | Loads data from any tap into PostgreSQL database |
| Target | **[Redshift](https://github.com/transferwise/pipelinewise-target-redshift)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-redshift.svg)](https://badge.fury.io/py/pipelinewise-target-redshift) | Loads data from any tap into Amazon Redshift Data Warehouse |
| Target | **[Snowflake](https://github.com/transferwise/pipelinewise-target-snowflake)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-snowflake.svg)](https://badge.fury.io/py/pipelinewise-target-snowflake) | Loads data from any tap into Snowflake Data Warehouse |
| Target | **[S3 CSV](https://github.com/transferwise/pipelinewise-target-s3-csv)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-target-s3-csv.svg)](https://badge.fury.io/py/pipelinewise-target-s3-csv) | Uploads data from any tap to S3 in CSV format |
| Transform | **[Field](https://github.com/transferwise/pipelinewise-transform-field)** | | [![PyPI version](https://badge.fury.io/py/pipelinewise-transform-field.svg)](https://badge.fury.io/py/pipelinewise-transform-field) | Transforms fields from any tap and sends the results to any target. Recommended for data masking/ obfuscation |

**Note**: Extra connectors are experimental connectors and written by community contributors. These connectors are not maintained regularly and not installed by default. To install the extra packages use the `--connectors=all` option when installing PipelineWise.

### Running from docker

2 changes: 1 addition & 1 deletion docs/connectors/taps.rst
@@ -123,7 +123,7 @@ PipelineWise can replicate data from the following data sources:
.. container:: img-hover-zoom

.. image:: ../img/google-analytics-logo.png
:target: taps/google-analytics.html
:target: taps/google_analytics.html

:ref:`tap-google-analytics`

22 changes: 12 additions & 10 deletions docs/connectors/taps/google_analytics.rst
@@ -12,21 +12,23 @@ PipelineWise configures every tap with a common structured YAML file format.
A sample YAML for Google Analytics replication can be generated into a project directory by
following the steps in the :ref:`generating_pipelines` section.

### Authorization Methods
Authorization Methods
'''''''''''''''''''''

`tap-google-analytics` supports two different ways of authorization:
- Service account based authorization, where an administrator manually creates a service account with the appropriate permissions to view the account, property, and view you wish to fetch data from
- OAuth `access_token` based authorization, where this tap gets called with a valid `access_token` and `refresh_token` produced by an OAuth flow conducted in a different system.
``tap-google-analytics`` supports two different ways of authorization:

If you're setting up `tap-google-analytics` for your own organization and only plan to extract from a handful of different views in the same limited set of properties, Service Account based authorization is the simplest. When you create a service account Google gives you a json file with that service account's credentials called the `client_secrets.json`, and that's all you need to pass to this tap, and you only have to do it once, so this is the recommended way of configuring `tap-google-analytics`.
- **Service account based authorization**, where an administrator manually creates a service account with the appropriate permissions to view the account, property, and view you wish to fetch data from
- **OAuth** ``access_token`` **based authorization**, where this tap gets called with a valid ``access_token`` and ``refresh_token`` produced by an OAuth flow conducted in a different system.

If you're building something where a wide variety of users need to be able to give access to their Google Analytics, `tap-google-analytics` can use an `access_token` granted by those users to authorize it's requests to Google. This `access_token` is produced by a normal Google OAuth flow, but this flow is outside the scope of `tap-google-analytics`. This is useful if you're integrating `tap-google-analytics` with another system, like Stitch Data might do to allow users to configure their extracts themselves without manual config setup. This tap expects an `access_token`, `refresh_token`, `client_id` and `client_secret` to be passed to it in order to authenticate as the user who granted the token and then access their data.
If you're setting up ``tap-google-analytics`` for your own organization and only plan to extract from a handful of different views in the same limited set of properties, Service Account based authorization is the simplest. When you create a service account Google gives you a json file with that service account's credentials called the ``client_secrets.json``, and that's all you need to pass to this tap, and you only have to do it once, so this is the recommended way of configuring ``tap-google-analytics``.

### Note
If you're building something where a wide variety of users need to be able to give access to their Google Analytics, ``tap-google-analytics`` can use an ``access_token`` granted by those users to authorize it's requests to Google. This ``access_token`` is produced by a normal Google OAuth flow, but this flow is outside the scope of ``tap-google-analytics``. This is useful if you're integrating ``tap-google-analytics`` with another system, like Stitch Data might do to allow users to configure their extracts themselves without manual config setup. This tap expects an ``access_token``, ``refresh_token``, ``client_id`` and ``client_secret`` to be passed to it in order to authenticate as the user who granted the token and then access their data.
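
The choice between the two authorization methods can be illustrated with a minimal sketch. The config key names ``key_file_location`` and ``oauth_credentials`` and the function ``detect_auth_method`` are assumptions made for illustration; only the four OAuth field names come from the text above, so check the tap's own documentation for the exact config layout.

```python
# The four fields the text says the tap expects for OAuth-based authorization.
OAUTH_FIELDS = ("access_token", "refresh_token", "client_id", "client_secret")


def detect_auth_method(config):
    """Return which authorization method a config dict appears to describe.

    The key names 'key_file_location' and 'oauth_credentials' are
    assumptions for this sketch, not confirmed by the documentation above.
    """
    # Service account: a single path to the credentials JSON file is enough.
    if config.get("key_file_location"):
        return "service_account"

    # OAuth: all four fields must be present and non-empty.
    oauth = config.get("oauth_credentials") or {}
    missing = [field for field in OAUTH_FIELDS if not oauth.get(field)]
    if not missing:
        return "oauth"

    raise ValueError("incomplete OAuth credentials, missing: " + ", ".join(missing))


service_cfg = {"key_file_location": "client_secrets.json"}
oauth_cfg = {"oauth_credentials": dict.fromkeys(OAUTH_FIELDS, "placeholder")}

print(detect_auth_method(service_cfg))
print(detect_auth_method(oauth_cfg))
```

Checking the service account first mirrors the recommendation above that it is the simpler setup when one organization owns all the views.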

- This tap does not currently use any STATE information for incrementally extracting data. This is currently mitigated by allowing for chunked runs using [start_date, end_date), but we should definitely add support for using STATE messages.
.. warning::

The difficulty on that front is on dynamically deciding which attributes to use for capturing state for ad-hoc reports that do not include the `ga:date` dimension or other combinations of Time Dimensions.
This tap does not currently use any ``STATE`` information for incrementally extracting data. This is currently mitigated by allowing for chunked runs using ``[start_date, end_date]``, but we should definitely add support for using ``STATE`` messages.

The difficulty on that front is on dynamically deciding which attributes to use for capturing state for ad-hoc reports that do not include the `ga:date` dimension or other combinations of Time Dimensions.
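
The chunked-run workaround mentioned in the warning can be sketched as splitting a date range into consecutive windows, each of which would be passed to a separate tap run as ``start_date`` and ``end_date``. This is only an illustration: the helper ``date_chunks`` is hypothetical, and the sketch assumes half-open windows where each chunk's end is the next chunk's start. Whether the tap treats ``end_date`` as inclusive should be verified before relying on it.

```python
from datetime import date, timedelta


def date_chunks(start, end, days):
    """Yield (chunk_start, chunk_end) pairs covering [start, end).

    Each chunk spans at most `days` days; the final chunk is
    truncated so that it never runs past `end`.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + timedelta(days=days), end)
        yield cursor, chunk_end
        cursor = chunk_end


chunks = list(date_chunks(date(2020, 1, 1), date(2020, 1, 10), 4))
for chunk_start, chunk_end in chunks:
    # Each pair would become the start_date / end_date of one run.
    print(chunk_start.isoformat(), chunk_end.isoformat())
```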

Example YAML for ``tap-google-analytics``:

@@ -83,7 +85,7 @@ Example YAML for ``tap-google-analytics``:
# - grp_stats

# List of Google Analytics tables to replicate into destination Data Warehouse
# Tap-Google-Analytics will use the best incremental strategies automatically to replicate data
# List of available tables available at https://github.com/transferwise/pipelinewise-tap-google-analytics/blob/master/tap_google_analytics/defaults/default_report_definition.json
tables:

# Tables replicated incrementally