This article provides you information on how Emarsys opens connection to your database in order to generate and run queries to get the personalized content for email campaigns with Relational Data.
When does Emarsys open connection to your database?
Emarsys establishes connection to your database with Relational Data Service (RDS) when you either
- utilize Relational Data features, eg.: launching email campaigns with Relational Data personalization or using an Relational Data segment in an Automation Center program,
- browse the User Interface (UI) of the Emarsys Marketing Platform to collect metadata, or
- import data using the Relational Data API.
Various connection pools
Relational Data uses separate connection pools to avoid interference between segmentation, personalization and data import. This logic ensures that a slow segment running cannot block data import through the Relational Data API.
Open connection time: Emarsys attempts to close the connections as fast as possible, if a connection pool has not been used for a while.
Connection timeout: All connectors have a configured 5 seconds connection timeout.
Exception: Amazon Redshift has 20 seconds timeout.
How to generate and run queries for personalization?
Once the connection has been established to your database Emarsys can generate and run queries to get personalized content.
Use case example
For this query example we use the view called upsell_services.
|rent_a_car||Rent a car||ViaRent||Vienna||10||Euro||hour||yes
|rent_a_car||Rent a car||BerliRent||Berlin||14||Euro
|rent_a_car||Rent a car||KaiRent||Cairo||6||Egyptian Pound||hour
||Rent a car||LDRent||London||8||Pound||hour
||Rent a bike||TwoWheelsBerlin||Berlin||4||Euro||hour
||Rent a bike||TwoWheelsBerlin||Vienna||2||Euro||hour
Setting up reference fields
In the email message we want to use the value of
service_name based on the value of the
city and its
In order to do that, we have to set two reference fields, which are
availability. We use these 2 reference fields both together to identify the values for personalization. (Default value is not needed at this step of this use case example.)
Script to have the name of an available service located in Vienna:
Emarsys runs the following query to have the value of
service_name where the
city = "Vienna" and the
availability = "yes":
select distinct service_name from upsell_services where (city = "Vienna" AND availability = "yes") limit x
Setting up a default value
In case you expect to have some values not represented in your view and you want to make sure that even in that case you have a content there you can set a default value.
We already have the reference fields
availability set up and we added the default value "General" for the
city reference field.
Checking if the default value is needed?
Now, as a default value is set before we run the query to find the personalized content, first we check whether we need to use the default value at all, by running the following query:
select distinct city from upsell_services where city = "Los_Angeles" limit 1
If the default value query’s result is empty, Emarsys needs to use the default value instead of the original value.
select distinct service_name from upsell_services where (city = "General" AND availability = "yes") limit x
Result: Not Empty
If the default value query’s result is not empty, Emarsys runs the query with the original value (not with the default one).
select distinct service_name from upsell_services where (city = "Los_Angeles" AND availability = "yes") limit x
Personalizing multiple contacts
Generally, Emarsys can batch together and run queries per 500 contacts per personalization source (table or view). However, if the message contains the same source multiple times with different configurations (parameters), then the batch size is multiplied by the number of different configurations in the message and they are grouped together. Each query will contain up to 1,000 parameter groups in the
A parameter group contains multiple reference fields for a personalization token.
In this example, the
contact.1234 field contains the destination cities for each contact:
This would be 2x500 parameter group, because even though it is from the same source (view upsell_services) it has 2 different configurations. So, for the 500 contacts Emarsys runs 1 query containing 1000 parameters in the
Query for default handling
select distinct city from upsell_services where (city = "Vienna") OR (city = "Los_Angeles") limit 2
If the default query result is not empty
select distinct service_name from upsell_services where ((city = "Los_Angeles" AND availability = "yes") OR (city = "Vienna" AND availability = "yes")) limit x
Personalizing multiple parameters
If the message contains the same source multiple times with different configurations or parameters, they get merged together into one query.
select distinct service_name, service_type from upsell_services where (city = "Los_Angeles" AND availability = "yes") limit x
If multiple sources are used in the same campaign, they are not merged together to create a single query. Each individual source will have its own individual query.
Limit in personalization queries
limit in every query. When you use the same ESL with different indexes, the limit is increased instead of running the query multiple times.
rds.myconnection.upsell_services("Los_Angeles","yes").service_name rds.myconnection.upsell_services("Los_Angeles","yes").service_name rds.myconnection.upsell_services("Los_Angeles","yes").service_name
select distinct service_name from upsell_services where (city = "Los_Angeles" AND availability = "yes") limit 3
In case of the foreach ESL tag, you can define the limit yourself, however the maximum limit is
100. If undefined, the limit is
100 by default.
If multiple ESL or personalization tokens are present in your campaigns with multiple personalization sources (table or view) and at least one times out, all queries get retried not just the faulty slow one.
For example, let's say you have the following in the campaign:
If one bookings table is slow and fails to return the results within due time, both personalization queries will be retried. As a result, following queries run multiple times in your database :
select distinct service_name from upsell_services where (city = "Los_Angeles" AND availability = "yes") limit x select distinct date_of_arrival from bookings where (user_id = 2 AND destination_city = "Los_Angeles") limit x
Of course, in case the campaign is sent out to multiple contacts, the conditions can be more complex than the ones described above.
The duration of creating Relational Data personalization queries - FAQ
What is the time limit for Relational Data personalization queries?
For optimal performance, the queries should finish running less than 500 ms.
What can I do when a Relational Data personalization query is too slow?
Optimizing the duration of Relational Data personalization queries is very similar to optimizing relational segment template queries. Creating such well-optimized queries requires deep understanding of the type of database you use, the data you store in your database, the use-case you are trying to create as well as general understanding of query optimization.
We strongly recommend consulting an expert if you experience performance issues.
- When creating a personalization query, always consider the following:
- How large will the result set be?
- How large is the dataset the query scans to determine the results?
- After you considered these, you can decide:
- Whether to use indexes: Depending on the type of database you use, this might be different, but generally you should consider using indexes. Most of the time with larger datasets, indexes can help performance. Also, you should index reference fields.
- Whether to use functions in the where clause: Some functions can increase query time significantly because they can result in unintended table scans. Before using them we recommend to research their performance impact.
- What you really need to include in the query: Avoid to use select *. Instead, include the specific columns that you need individually.
- Whether to use join instead of correlated subqueries: Most of the time when operating with larger datasets, joins have better performance than correlated subqueries.
- How to use joins: Joins can be time-consuming as well. Do not join unused tables, and always try to join on indexed fields. Also, fields on you are joining should have the same data type.
- Whether to use wildcards: Using wildcards will definitely slow down your query, especially for huge tables. Try to avoid it if you can.