When we turn on Open Data for you, we create a special Google Cloud Platform project that provides secure access to your data stored in Emarsys. All Open Data Google Cloud Platform Project names follow the ems-od-<customer>
naming convention*.
An Open Data project can connect to multiple Emarsys accounts. This way Open Data can be assigned to multiple accounts with one-to-many relation. Open Data postfixes all resources with the corresponding Emarsys account’s ID to make the connections explicit.
Notes:
*Since April 30, 2021, the naming convention for new projects has changed to: sap-od-<customer>
. Please note that the naming convention has not changed for projects started before April 30, 2021.
We are constantly making new data views available in Open Data. To see the views you can currently work with, click here.
How to fix the "Access Denied: Table ems-data-platform:<customer>
: User does not have permission to query table ems-data-platform:<customer>
." issue:
You can resolve this issue by using the ems-od-<customer>
or sap-od-<customer>
project names instead.
Managing access to Open Data
When we enable Open Data for you, we also automatically create a Google group for your organization. The Google account you provided at the beginning has the Manager role of the group by default and allows you to manage user access to your BigQuery datasets.
A group may not be the manager of another group.
Anyone you add to this Open Data access-group will be able to log in to Google Cloud to run queries and export data.
Your users must have a Google account to log in to BigQuery. So make sure to use email addresses associated with a Google account when you add people to your Google group!
This way, you can grant access to your datasets either to people in your organization or to external consultants.
You won’t have access to the Google Cloud Platform Identity and Access Manager pages other than the Service account administrator page because of security reasons.
Granting access
To add people to your Open Data access-group, proceed as follows:
- Sign in to Google Groups.
- Click All groups or My groups if you are a group admin.
Group admin view:
4. Under the name of the Open Data access-group you want to manage, click Members.
If you are responsible for the marketing of several different brands or subsidiaries, you may see more groups associated with your organization on this page.
5. Click Add members in the menu on the left side.
6. Fill the form and click on the Add members button.
Group invitations sometimes fail to be sent out, therefore you should use the Add member functionality only. If you accidentally added a user in a different way, you should go to the Pending members menu, choose the user and click on the Revoke invitation button. You should be able to directly add the member after you removed the pending invite.
All users added to your Open Data access-group will inherit the following permissions:
- bigquery.jobs.create
- iam.serviceAccountKeys.create
- iam.serviceAccountKeys.delete
- iam.serviceAccountKeys.get
- iam.serviceAccountKeys.list
- iam.serviceAccounts.create
- iam.serviceAccounts.delete
- iam.serviceAccounts.get
- iam.serviceAccounts.getIamPolicy
- iam.serviceAccounts.list
- iam.serviceAccounts.setIamPolicy
- iam.serviceAccounts.update
- resourcemanager.projects.get
- resourcemanager.projects.list
- serviceusage.services.list
- storage.buckets.list
- bigquery.readsessions.*
- bigquery.savedqueries.get
- bigquery.savedqueries.list
- pubsub.subscriptions.consume
- pubsub.subscriptions.create
- pubsub.subscriptions.delete
- pubsub.subscriptions.get
- pubsub.subscriptions.list
- pubsub.subscriptions.update
To learn more about Google Cloud Platform roles, visit Google’s support pages.
Revoke access
To remove anyone from your Open Data access-group, proceed as follows:
- Sign in to Google Groups.
- Click My Groups.
- Select Switch organization view to: emarsys.com.
- Under the name of the Open Data access-group you want to manage, click Manage members.
- Choose the member you want to remove.
- Click Actions > Remove from group
To learn more about how you can manage your Google group, visit Google’s support pages.
Granting manager access
In certain cases you might need to provide a group manager role for another user, so they can manage access to your Open Data access-group too. To grant the manager role for someone, proceed as follows:
- Sign in to Google Groups with a Google account that has the Manager role already.
- Click My Groups.
- Select Switch organization view to: emarsys.com.
- Under the name of the Open Data access-group you want to manage, click Manage members.
- Choose the member you want to provide the Manager role.
- Click Actions > Add to role > Manager.
A group may not be the manager of another group.
Revoking manager access
- Sign in to Google Groups with a Google account that has the Manager role already.
- Click My Groups.
- Select Switch organization view to: emarsys.com.
- Under the name of the Open Data access-group you want to manage, click Manage members.
- Choose the member you want to revoke the Manager role from.
Click Actions > Remove from role > Manager
Creating API key
You can access your Open Data project via API by using a Service account we prepare during the Open Data project setup. To create an access key, proceed as follows:
- Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active account in the top right corner.
- Select your Open Data project. The name of your project always follows the
ems-od-<customer>
naming convention. - Open the menu, hover over Identity, and select Service accounts.
You don’t have access to the Google Cloud Identity and Access Manager pages other than the Service account administrator page because of security reasons.
4. Find the Service account that has a name like this: client-service-account@ems-od-<customer>.iam.gserviceaccount.com
.
5. Click Actions > Create key.
6. Choose the desired Key type and click on the Create button.
Creating a Service Account
In case you want to separate the programmatic access to your Google Cloud Platform project, you might want to create multiple Service Accounts. To create a new Service Account, proceed as follows:
- Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active account in the top right corner.
- Select your Open Data project. The name of your project always follows the
ems-od-<customer>
naming convention. - Open the menu, hover over IAM & Admin and select Service accounts.
You don’t have access to the Google Cloud IAM pages other than the Service account administrator page because of security reasons.
4. Click the Create Service Account button.
4. Fill the form and click on the Create button.
5. To provide the necessary access for this freshly created Service Account, you must add it to your Open Data access-group: copy its address and follow the steps in the Granting Access section.
Rotating your Service Account keys
You need to rotate your Service Account keys every 90 days to keep your access as secure as possible.
If your Open Data Service Account credentials need changing, for example they are more than 90 days old, the Notification Center sends you a message. This message includes a link that takes you to the Service Account dashboard.
Follow these steps to update your credentials:
- Click your Service Account.
- In the Keys tab click Add Key.
- Select your format (JSON or p12) and click Create to download the new key.
- Replace the old key with the new one wherever you are using it.
- Test the new key to verify it works correctly.
- Delete the old key from the Keys tab of the Service Account.
The name of the file contains the beginning of the key ID. It makes it easier for you to know when the key was created, and allows you to manage the service key updates without relying on notifications.
Accessing an Open Data project from another Google Cloud Platform project
In case you store data in another Google Cloud Platform project too, you might want to connect the 2 to make it possible to leverage data from both projects. To grant access to your Open Data project from your Google Cloud Platform project, proceed as follows:
- Open the Google Cloud Console and open the project you want to grant access to.
- Navigate to Identity and select Service accounts.
- Copy a Service account’s address.
- Add this account’s address to your Open Data access-group in the way you grant access to a simple user.
Open Data datasets store data in the EU. If your data is stored in a different location, you might need to copy the dataset to a different region. To learn more about BigQuery locations, visit Google’s Support pages.
Interacting with Open Data
Opening my Open Data project
1. Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active Google account in the top right corner.
2. Open the project selector in the top left corner.
Available datastores
We prepare 2 types of datasets and a storage bucket when we prepare your Open Data project. New dataset creation isn’t allowed in your Open Data project.
Emarsys account-level datasets
The datasets provided by Emarsys always follow the emarsys_<customer>_<suite account ID>
naming convention. They provide access to the data coming from an Emarsys account. In case you have multiple Emarsys accounts, we’ll generate a dataset for each, thus you can easily distinguish from which account your data is coming from.
The BigQuery views become visible only if the underlying tables store data.
E.g. If you don’t have a working mobile integration, the push views won’t be visible in your project.
We are constantly making new data views available in Open Data. To see the views you can currently work with, click here.
The views in this dataset cannot be altered and use the Standard SQL dialect only. If you are not familiar with the Standard SQL syntax or if you want to learn more about it, please read the BigQuery Standard SQL Reference documentation.
Legacy SQL queries are not supported in Open Data. A view that uses Standard SQL dialect cannot be queried by a query that uses the Legacy SQL dialect.
Editable dataset
Open Data also supports editable datasets which give users full access to them so that changes can be made, or even additional data can be uploaded. This is useful if you want to enrich your data by adding additional meta-data, or if you need to export large data sets.
Editable datasets are owned and maintained by you, the customer, which means that you are responsible for providing and managing access to this data. As these datasets exist outside of the Emarsys Data Protection features, it also means that you have to own all data related topics including data expiration, deletion, rectification, and exports. Because of this Emarsys is not liable for any incident relating to editable datasets as it is your responsibility to use the Emarsys services in an appropriate manner.
Cloud Storage Bucket
To make large data imports/exports possible from Open Data we prepare a Cloud Storage Bucket during Open Data project setup. These buckets always follow the client_bucket_<customer>
naming convention. New bucket creation isn’t allowed in your Open Data project. To browse your bucket from the UI, proceed as follows:
- Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active account in the top right corner.
- Open your Open Data project. You can learn how to do that if you go to the Opening my Open Data project section
- Open the menu, hover over Storage and select Browser.
- Click on the
client_bucket_<customer>
bucket.
Exporting data
There are two ways to run queries and export data from your Open Data datasets: manually on the BigQuery web UI or automatically using the BigQuery API.
Exporting data manually
To run queries and export data using the BigQuery web UI, proceed as follows:
- Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active account in the top right corner.
- Select your Open data project.
- Open BigQuery from the menu.
- Click the name of the view you want to work with.
- Click the Query view button and compose your query.
- Click the Run button to execute your query.
- Download the results by clicking on the Save results button.
If your query result set has fewer than 16,000 rows and is smaller than 10 MB in size, you can download it as a CSV or newline-delimited JSON file or save it to Google Sheets. Otherwise, you can only save it as a table into the editable_dataset and export it from there. To learn more about how to export large result sets, check out the Exporting large result sets section.
Exporting data via the API
To make programmatically data fetching possible from Open Data we prepared a Service Account. You can generate a new API key for your Service Account by following the steps in Creating API key section.
By default, we provide the possibility to use Big Query API with Open Data to make direct HTTP requests to BigQuery. Google provides client library code for all their Cloud APIs that make it easier to access them from your favorite languages. You can access the detailed REST documentation here or the available client libraries here.
Exporting large result sets
In case you want to export result sets larger than 16000 rows or 10 MB you’ll need to save the data into the editable_dataset
and then export it into a bucket before downloading. To export large result sets, proceed as follows:
- Follow the step 1-6. in the Exporting data manually section.
- Click Save results and choose the BigQuery table.
- Choose
editable_dataset
as target and specify Table name. - Click Save and wait until the export is ready.
- Choose the table on the left pane under
editable_dataset
. - Click Export and choose Export to GCS.
- Configure your export, click on the Export button, and wait until the export is ready.
- To browse your exports follow the steps in the Cloud Storage Bucket section.
To learn more about BigQuery exports, visit Google’s support pages.
Optimizing queries
You can exceed your Open Data limit rapidly if you use ineffective queries. To control your costs, we recommend following Google’s BigQuery best practices for controlling costs. As general advice, it is suggested to always filter on partitiontime and selecting the required fields only. Applying the limit clause in your query has no effect on how much data is scanned.
E.g. You want to query which contacts opened a specific campaign on 2019-04-20:
SELECT
contact_id
FROM
`ems-od-<customer>.emarsys_<customer>_123456.email_opens_123456`
WHERE
DATE(partitiontime) >= "2019-04-19"
AND DATE(partitiontime) <= "2019-04-21"
AND EXTRACT(DATE FROM event_time) = "2019-04-20"
AND campaign_id = <campaign ID>
E.g. You want to query which campaigns were sent today:
SELECT
DISTINCT campaign_id
FROM
`ems-od-<customer>.emarsys_<customer>_123456.email_sends_123456`
WHERE
DATE(partitiontime) = CURRENT_DATE() OR
DATE(partitiontime) = DATE(NULL)
-- Scan data that is currently in the Streaming Buffer.
-- To learn more about the Streaming buffer,
-- visit Google’s documentation.
It is possible to check how much data is processed by a query if you issue a dry run. To learn more about issuing a query dry run, visit Google’s support pages.
Filtering queries for latest results
Open Data tables contain historical data. Most queries result in lengthy list of records. Therefore, its often worth to limit the results already in the query.
Here is an example:
Let's say that we would like to get a list of campaign IDs and names.
If we query the sap-od-[customer_ID].emarsys_[customer_ID].email_campaigns_v2_[customer_ID]
table, then we will get all the names of all the campaigns including current names and previous names (in case the campaign was renamed).
But if we are only interested in the current names, then the following example returns the most recent name of each campaign_id
and does not return any previous names of the campaigns.
WITH
campaigns AS (
SELECT
campaign_id,
name,
ROW_NUMBER() OVER (PARTITION BY campaign_id ORDER BY event_time DESC, loaded_at DESC) AS row_num
FROM
'sap-od-[customer_ID].emarsys_[customer_ID].email_campaigns_v2_[customer_ID]' ),
latest_campaigns AS (
SELECT
campaign_id,
name
FROM
campaigns
WHERE
row_num = 1 )
select * from campaigns;
Importing data
Importing data manually
It is possible to enrich your data stored in Open Data (e.g. by adding additional meta-information about your campaigns etc.) by importing the additional information into the editable_dataset
. To learn more about how to import data into your editable_dataset
, visit Google’s support pages.
Importing data via the API
To make programmatically data uploading possible from Open Data we prepared a Service Account. You can generate a new API key for your Service Account by following the steps in Creating API key section.
While you can use Google Cloud APIs by making direct HTTP requests to BigQuery, Google provides client library code for all their Cloud APIs that make it easier to access them from your favorite languages. You can access the detailed REST documentation here or the available client libraries here.
Importing large datasets
In case you want to load files bigger than 10 MB or containing more than 16,000 rows, you’ll need to import the data into the a Cloud Storage Bucket first and then import into the editable_dataset. To learn more about loading data from Cloud Storage, visit Google’s support pages.
- Type your project name in the search field. All Open Data Google Cloud Platform Project names follow the
ems-od-<customer>
convention. Sometimes the project name isn't visible at first, so we suggest using always the search functionality.
- Click on your project’s name to open it.
To learn more about Google Cloud Platform projects, visit Google’s support pages.
Materialized views
You can create materialized views in Google BigQuery, which are especially useful if there is a certain set of data from table(s) that is read frequently but does not change often.
Sample query:
CREATE MATERIALIZED VIEW `ems-od-supporttest.editable_dataset.just_test_materialized_2` AS
SELECT campaign_id, name, event_time
FROM `ems-od-supporttest.editable_dataset.materialized_test_support`
GROUP BY 1,2,3
Explanation for the sample:
-
ems-od-supporttest.editable_dataset.just_test_materialized_2
Name of the materialized view to create. Do not use an existing table name. -
ems-od-supporttest.editable_dataset.materialized_test_support
The table that is the source of the data for the materialized view.
You cannot create materialzed view from a view, only native tables can be the source.