When using Open data you will be working via the Google Cloud Platform, rather than working within the Emarsys platform. The Open Data setup includes access to the following resources:
- Emarsys datasets
- Editable dataset
- Storage bucket
New dataset creation isn't allowed in your Open Data project, you can only add to the editable dataset.
Emarsys datasets
The datasets provided by Emarsys always follow the emarsys_<customer>_<suite account ID>
naming convention. They provide access to the data coming from an Emarsys account. In case you have multiple Emarsys accounts, we’ll generate a dataset for each, thus you can easily distinguish from which account your data is coming from.
The BigQuery views become visible only if the underlying tables store data. For example, if you don’t have a working mobile integration, the push views won’t be visible in your project.
We are constantly making new data views available in Open Data. To see the views you can currently work with, click here.
The views in this dataset cannot be altered and use the Standard SQL dialect only. If you are not familiar with the Standard SQL syntax or if you want to learn more about it, please read the BigQuery Standard SQL Reference documentation.
Legacy SQL queries are not supported in Open Data. A view that uses Standard SQL dialect cannot be queried by a query that uses the Legacy SQL dialect.
Editable dataset
Open Data also supports editable datasets which give users full access to them so that changes can be made, or even additional data can be uploaded. This is useful if you want to enrich your data by adding additional meta-data, or if you need to export large data sets.
Editable datasets are owned and maintained by you, the customer, which means that you are responsible for providing and managing access to this data. As these datasets exist outside of the Emarsys Data Protection features, it also means that you have to own all data related topics including data expiration, deletion, rectification, and exports. Because of this Emarsys is not liable for any incident relating to editable datasets as it is your responsibility to use the Emarsys services in an appropriate manner.
Cloud Storage Bucket
To make large data imports or exports possible from Open Data we prepare a Cloud Storage Bucket during Open Data project setup. These buckets always follow the client_bucket_<customer>
naming convention. New bucket creation isn’t allowed in your Open Data project. To browse your bucket from the UI, proceed as follows:
- Open Google Cloud Console. Make sure you use an account that was previously added to your Open Data access-group. You can double-check the active account in the top right corner.
- Open your Open Data project.
- Open the menu, hover over Storage and select Browser.
- Click the
client_bucket_<customer>
bucket.
Optimizing queries
Using ineffective, unoptimized queries can very quickly use up your Open Data utilization allocations. To make the most of your Open Data limits, you have to optimize your queries as per the Google BigQuery best practices. You should always filter on partition time and select only the necessary fields for your query. This way you make sure to only scan the relevant data.
The following example shows you how to query which contacts opened a specific campaign on 2019-04-20:
SELECT
contact_id
FROM
`ems-od-<customer>.emarsys_<customer>_123456.email_opens_123456`
WHERE
DATE(partitiontime) >= "2019-04-19"
AND DATE(partitiontime) <= "2019-04-21"
AND EXTRACT(DATE FROM event_time) = "2019-04-20"
AND campaign_id = <campaign ID>
The following example shows you how to query which campaigns were sent today:
SELECT
DISTINCT campaign_id
FROM
`ems-od-<customer>.emarsys_<customer>_123456.email_sends_123456`
WHERE
DATE(partitiontime) = CURRENT_DATE() OR
DATE(partitiontime) = DATE(NULL)
-- Scan data that is currently in the Streaming Buffer.
-- To learn more about the Streaming buffer,
-- visit Google’s documentation.
It is possible to check how much data is processed by a query if you issue a dry run. To learn more about issuing a query dry run, refer to Google’s support pages.
Filtering queries for latest results
Open Data tables contain historical data. Most queries result in lengthy list of records. Therefore, it's often worth to limit the results already in the query.
Here is an example:
Let's say that we would like to get a list of campaign IDs and names.
If we query the sap-od-[customer_ID].emarsys_[customer_ID].email_campaigns_v2_[customer_ID]
table, then we will get all the names of all the campaigns including current names and previous names (in case a campaign was renamed).
But if we are only interested in the current names, then the following example returns the most recent name of each campaign_id
and does not return any previous names of the campaigns.
WITH
campaigns AS (
SELECT
campaign_id,
name,
ROW_NUMBER() OVER (PARTITION BY campaign_id ORDER BY event_time DESC, loaded_at DESC) AS row_num
FROM
'sap-od-[customer_ID].emarsys_[customer_ID].email_campaigns_v2_[customer_ID]' ),
latest_campaigns AS (
SELECT
campaign_id,
name
FROM
campaigns
WHERE
row_num = 1 )
select * from latest_campaigns;
Using materialized views
You can create materialized views in Google BigQuery, which are especially useful if there is a certain set of data from a single or multiple tables that are read frequently but does not change often.
Sample query:
CREATE MATERIALIZED VIEW `ems-od-supporttest.editable_dataset.just_test_materialized_2` AS
SELECT campaign_id, name, event_time
FROM `ems-od-supporttest.editable_dataset.materialized_test_support`
GROUP BY 1,2,3
Explanation for the sample:
-
ems-od-supporttest.editable_dataset.just_test_materialized_2
Name of the materialized view to create. Do not use an existing table name. -
ems-od-supporttest.editable_dataset.materialized_test_support
The table that is the source of the data for the materialized view.