Data is transferred in and out of Emarsys in comma-separated file format (.csv). In order to ensure that the right values arrive in the right database fields, here is an overview of the conventions that we use.
Since the product data files and sales data files used for Predict and Smart Insight must follow the conventions outlined below, we strongly advise you to do this for your regular contact data imports and exports as well, as these are standard for the industry and will also enable you to take advantage of our import file validation feature.
Introduction
All incoming data files must be UTF-8 Unicode encoded, .csv (comma separated values) files.
If you are using Predict or Smart Insight, you may also want to confirm with Emarsys Support that your data structure matches the requirements of your particular configuration, in particular when relating to custom fields.
The most important rule for uploading data is that the format of all uploaded files must remain consistent. Once a format has been agreed upon, this should not subsequently be changed. If you do, please contact Emarsys Support before the changes take effect.
CSV syntax
Here are our basic rules for .csv files:
- The delimiter is always the comma. (The character separating fields.)
- Values which contain commas can be quoted using double quotes.
Please make sure there are no extra space characters prior to or following the delimiter ,
and the separator "
characters, otherwise data in the CSV file will be skewed.
For example, if one of the values in your CSV file is Last name, First name
, then this is how your values should be separated in your file:
column1,column2,column3
1value1,"Last name, First name",1value3
2value1,2value2,2value3
If the values are separated improperly, then data will be processed incorrectly. For example, if your CSV file contains the following values:
column1,column2,column3
1value1_good,"Last name, First name",1value3
2value1_bad, "Last name, First name",2value3
Then this is how the result looks:
+--------------+-----------------------+--------------+
| column1 | column2 | column3 |
+--------------+-----------------------+--------------+
| 1value1_good | Last name, First name | 1value3 |
| 2value1_bad | "Last name | First name" |
+--------------+-----------------------+--------------+
- No fields may contain new line characters (not even if they are quoted).
- Quotes can be escaped within fields by doubling them:
""
. - The first line must be the header row containing the field names; subsequent lines contain fields in the order defined in the header.
- Redundant (starting and trailing) white spaces are automatically removed unless they are quoted (e.g.
1, 2, 3
is the same as1,2,3
).
Case sensitivity
Take into consideration that values are case sensitive. The identifiers example@address.com
and EXample@address.com
are treated as different.
Encoding
In order to maximize foreign language support and to ensure that content is correctly displayed, Emarsys insists on UTF-8 encoding as a standard.
No UTF-8 4 byte characters are allowed, such as emojis. Including these could prevent importing files in full.
- For floats, you must always use a full stop/period (.) as the decimal separator, even for locales which use a different character such as the comma.
- Imported files must have a newline control character in the first 265 KB. This is needed to avoid the accidental upload of binary files. Uploading files without this character will result in an error notification and the file not being uploaded.
Changing the encoding
Most standard texts editors offer functionality to change the encoding of your content, or will let you select the encoding of a file when you save it (for example as part of the Save As dialog).
URL encoding
URLs (for example, in the link
field of the product catalog) should be properly URL encoded with the following exception: fields used in Predict are encoded automatically, so you don't need to encode these. Characters such as spaces (`) or brackets (
[,
]`) should not be present in these values.
URLs should include the protocol, which can be http://
, https://
or the protocol-relative //
.
Fields with multiple values
Multiple values in the contact data file
When preparing a CSV file for importing contact data, make sure that each value for a multi-choice field has its own column.
The column header must contain both the field name and the value, separated by a colon (:), as in: field_name:field_value
.
For example, if you have a field Interested_in
and the options Sports, Gaming, Film, Music and Food, your file might look like this:
For each of the multiple choice value columns, use either the name of the value (to include this value in the contact's DB record) or use a blank value (to omit this value for the particular contact).
Multiple values in the product data file
Some product catalog fields, such as author
, may have several values, which should be separated with the pipe (|
) character, for example:
item, title, author
BOOK0012, Best essays of 1908, Charles Dickens | Oscar Wilde | Douglas Adams
Multiple values in the sales data file
Smart Insight only supports multiple values in one field: category
(see below).
For all other fields, multiple values separated by pipes will stored by Smart Insight as a single string, for example: Charles Dickens|Oscar Wilde|Douglas Adams
.
Category format
The product category path is a series of increasingly fine-grained sets which describe the product. For example, The Hitchhiker’s Guide to the Galaxy coud be categorized as Literature > Fiction > Sci-Fi
. Note that elements in the category path are separated by the >
character in the product catalog file. In a bookstore, all products would be in the Literature
category. If there is a category which includes all items, we call it the root category. The category
field should not include the root.
Some items may belong to multiple categories. For example, The Hitchhiker’s Guide could also be categorized as Literature > British literature > Contemporary
. Multiple categories should be separated by the pipe, |
.
The correct value for the catalog
field in this example would be:
Fiction > Sci-Fi | British literature > Contemporary
There is no limit to the number of category hierarchies in Predict, but Smart Insight, which also uses the product catalog, accepts only the top five level and ignores the rest. For more details on managing categories in Predict, see Product Categories.
You can find a few examples below. Only the categories marked in bold would be available in Smart Insight (i.e. the top five categories), while the rest of the categories would not be available in Smart Insight.
For example, chocolate flavoured instant coffee would show up under the same category as regular instant coffee in Smart Insight. However, Predict would consider the differences when calculating the recommendations.
Date format
When you are using date fields, the following formats are supported by Emarsys for importing and exporting data:
Date format | Data type |
---|---|
YYYY-MM-DD | Contact, Sales and Product data. |
Timestamp
When you are using a timestamp field, for example to identify individual purchases, you must use the following UTC format:
- YYYY-MM-DDTHH:MM:SSZ
Quoting
If the data itself contains the separator character, then you need to indicate to the import mechanism that this is part of the field value, and it does not use it as an indication to start a new data row. To do this you must surround the entire field value with double quotes. In the example below, the separator ‘,’ appears in the text of the second field value. Therefore when this field comes to be imported, the entire value is enclosed in quotes.
Value in Database | Value in CSV | Comments |
---|---|---|
Notebook computers | …,Notebook computers,… |
No quoting required. |
Laptops, notebooks | …,"Laptops, notebooks",… |
Quotes required. |
Escaping
Similar to the separator character, if the imported text contains the quote character, this must also be indicated to the import mechanism, this time by ‘escaping’ it. Escaping is done by adding another quote character to the existing one and using quotes to surround the entire value. In the first example below, the quote character is used in the text as an abbreviation for the measurement ‘inch’. In the second example, quotes are used not only to escape the ‘inch’ symbol but also to indicate that the value includes the separator ‘,’ as well:
Value in Database | Value in CSV | Comments |
---|---|---|
21″ monitors | …,"21"" monitors",… |
Quote has been escaped. |
21″ monitors, flat screens | …,"21"" monitors, flat screens",… |
Quotes and separators are used. |
Note: Both the separator and the quote characters can be customized, but in each case the character used should be chosen with care.
Empty Values
Fields with no values must be still exported as an empty value without any quoting, so that the structure of the data is not affected.
Example
Value in Database | Value in CSV | Comments |
---|---|---|
NULL | …,,… |
Import recognises an empty field. |
Personalization does not support and cannot process null
values in the Product data files. In case of an occurring null
value, personalization would use the last not-null
value, which is not correct.
As a workaround, you can either use a numeric 0
or an empty string ''
character.
Sample files
Here are some sample files with a selection of typical fields which you can download and adapt for your own use:
Modifying CSV files in use
Modifying the structure of a CSV file used in a working environment is always a complex task that requires much attention. Modifying columns in the contact, product or sales CSV files may easily lead to data corruption or loss.
The best practice and the safest choice is to prepare and perform a complete data reload.
If you attempt to modify the column structure of a CSV file in use without a full reload, keep the following in mind:
- Valid actions to perform are only removing and adding columns.
- Removing a column means data will be lost and adding data to a new column may mean past data will be missing. Both removing and adding can damage data quality.
The following actions will result in data loss:
- changing the name of an existing column
- changing the order of existing columns
- adding a new column before or between the existing columns
- deleting existing columns
You may add columns to your CSV files at the end without a full reload and no data corruption. See Adding columns to a working sales data file.
While it is true that there is no data loss if additional columns are added after the last existing column, the data will not be backfilled in the new columns for the already uploaded past sales data. The new columns will start recording data from the day after their creation.