Compare Datasets#

The Compare Datasets node helps you compare data from two input streams.

Usage#

Decide which fields to compare. In Input A Field, enter the name of the field you want to use from input stream A. In Input B Field, enter the name of the field you want to use from input stream B.
Optional: you can compare by multiple fields. Select Add Fields to Match to set up more comparisons.
Choose how to handle differences between the datasets. In When There Are Differences, select one of the following:
- Use Input A Version
- Use Input B Version
- Use a Mix of Versions
- Include Both Versions
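
The four choices can be pictured with a small sketch. This is illustrative only and not Ensemble's implementation: the function name `resolve_difference`, the mode strings, and the `fields_from_b` parameter are hypothetical. In particular, the "mix" behavior (start from A, take listed fields from B) and the shape returned for "both" are assumptions.

```python
def resolve_difference(item_a, item_b, mode, fields_from_b=()):
    """Pick what to output for a matched pair whose contents differ."""
    if mode == "use_a":
        return item_a
    if mode == "use_b":
        return item_b
    if mode == "mix":
        # Assumption: keep A's version, but overwrite the listed fields with B's values.
        merged = dict(item_a)
        for name in fields_from_b:
            if name in item_b:
                merged[name] = item_b[name]
        return merged
    if mode == "both":
        # Assumption: keep both versions side by side.
        return {"a": item_a, "b": item_b}
    raise ValueError(f"unknown mode: {mode}")


a = {"id": 1, "name": "Stefan", "lang": "de"}
b = {"id": 1, "name": "Sara", "lang": "en"}
print(resolve_difference(a, b, "mix", fields_from_b=["lang"]))
# {'id': 1, 'name': 'Stefan', 'lang': 'en'}
```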

Understand item comparison#

Item comparison is a two-stage process:

Ensemble checks if the values of the fields you selected to compare match across both inputs.
If the fields to compare match, Ensemble then compares all fields within the items, to determine if the items are the same or different.
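
The two stages above can be sketched roughly as follows. This is a minimal illustration, not Ensemble's actual code: `classify_pair` and the `match_fields` list of (Input A Field, Input B Field) pairs are hypothetical names, and items are assumed to be flat dictionaries.

```python
def classify_pair(item_a, item_b, match_fields):
    """Return 'no_match', 'same', or 'different' for one pair of items."""
    # Stage 1: do the fields selected for matching agree across both inputs?
    for field_a, field_b in match_fields:
        if item_a.get(field_a) != item_b.get(field_b):
            return "no_match"
    # Stage 2: the match fields agree, so compare all fields of the items.
    return "same" if item_a == item_b else "different"


a = {"id": 1, "name": "Stefan"}
b = {"id": 1, "name": "Sara"}
print(classify_pair(a, b, [("id", "id")]))        # different
print(classify_pair(a, dict(a), [("id", "id")]))  # same
```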

Options#

You can use additional options to refine your comparison or modify comparison behavior.

Select Add Option, then choose the option you want to use.

Fields to Skip Comparing#

Enter field names that you want to ignore.

For example, if you compare the two datasets below using person.language as the Fields to Match, Ensemble returns them as different. If you add person.name to Fields to Skip Comparing, Ensemble returns them as matching.

	// Input 1
	[
		{
			"person":
			{
				"name":	"Stefan",
				"language":	"de"
			}
		},
		{
			"person":
			{
				"name":	"Jim",
				"language":	"en"
			}
		},
		{
			"person":
			{
				"name":	"Hans",
				"language":	"de"
			}
		}
	]
	// Input 2
	[
		{
			"person":
			{
				"name":	"Sara",
				"language":	"de"
			}
		},
		{
			"person":
			{
				"name":	"Jane",
				"language":	"en"
			}
		},
		{
			"person":
			{
				"name":	"Harriet",
				"language":	"de"
			}
		}
	]
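
To make the effect of Fields to Skip Comparing concrete, the sketch below compares the first item of each dataset with and without `person.name` skipped. It is an illustration under assumptions, not Ensemble's implementation: `drop_field` and `same_ignoring` are hypothetical helpers.

```python
import copy


def drop_field(item, dotted_path):
    """Return a deep copy of item without the field at dotted_path."""
    result = copy.deepcopy(item)
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # the path does not exist, so there is nothing to drop
    node.pop(leaf, None)
    return result


def same_ignoring(item_a, item_b, skip_fields):
    """Compare two items after removing every skipped field from both."""
    for path in skip_fields:
        item_a = drop_field(item_a, path)
        item_b = drop_field(item_b, path)
    return item_a == item_b


stefan = {"person": {"name": "Stefan", "language": "de"}}
sara = {"person": {"name": "Sara", "language": "de"}}
print(same_ignoring(stefan, sara, []))               # False
print(same_ignoring(stefan, sara, ["person.name"]))  # True
```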

Fuzzy Compare#

Whether to tolerate type differences when comparing fields (enabled), or not (disabled, default). For example, when you enable this, Ensemble treats "3" and 3 as the same.
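
As a rough sketch of what tolerating type differences could look like, the hypothetical `values_match` helper below falls back to comparing string forms when fuzzy comparison is on. That fallback is an assumption made for illustration; Ensemble's actual rules may differ.

```python
def values_match(a, b, fuzzy=False):
    """Strict equality by default; optionally tolerate type differences."""
    if a == b:
        return True
    if not fuzzy:
        return False
    # Assumption: fuzzy comparison treats values with equal string forms as the same.
    return str(a) == str(b)


print(values_match("3", 3))              # False
print(values_match("3", 3, fuzzy=True))  # True
```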

Disable Dot Notation#

Whether to disallow referencing child fields using parent.child in the field name (enabled), or allow it (disabled, default).
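
The difference matters when a field name itself contains a dot. The sketch below uses a hypothetical `get_field` helper (not an Ensemble API) to show both readings of the name `person.language`.

```python
def get_field(item, name, disable_dot_notation=False):
    """Look up a field, reading dots as nesting unless dot notation is disabled."""
    if disable_dot_notation:
        # The whole name is a single literal key.
        return item.get(name)
    value = item
    for key in name.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


nested = {"person": {"language": "de"}}
flat = {"person.language": "en"}
print(get_field(nested, "person.language"))                           # de
print(get_field(flat, "person.language", disable_dot_notation=True))  # en
print(get_field(flat, "person.language"))                             # None
```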

Multiple Matches#

Choose how to handle duplicate data. The default is Include All Matches. You can choose Include First Match Only.

For example, given these two datasets:

	// Input 1
	[
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
		{
			"fruit": {
				"type": "banana",
				"color": "yellow"
			}
		}
	]
	// Input 2
	[
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
		{
			"fruit": {
				"type": "banana",
				"color": "yellow"
			}
		}
	]

With the default Include All Matches, Ensemble returns three items in the Same Branch tab. The data is the same in both branches.

If you select Include First Match Only, Ensemble returns two items in the Same Branch tab. The data is the same in both branches, but Ensemble only returns the first occurrence of the matching "apple" items.
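
One way to reproduce the item counts described above is sketched below. It is a simplified model, not Ensemble's matching algorithm: the `same_branch` function, its `key` callable, and the `first_match_only` flag are hypothetical, and the sketch assumes that with Include First Match Only later items sharing an already-seen match value are dropped.

```python
def same_branch(input_a, input_b, key, first_match_only=False):
    """Items from input_a that have an identical item in input_b with the same match key."""
    seen = set()
    result = []
    for item in input_a:
        k = key(item)
        if first_match_only and k in seen:
            continue  # an earlier item already used this match value
        seen.add(k)
        if any(key(other) == k and other == item for other in input_b):
            result.append(item)
    return result


apple = {"fruit": {"type": "apple", "color": "red"}}
banana = {"fruit": {"type": "banana", "color": "yellow"}}
data = [apple, dict(apple), banana]  # the same three items in both inputs


def fruit_type(item):
    return item["fruit"]["type"]


print(len(same_branch(data, data, fruit_type)))                         # 3
print(len(same_branch(data, data, fruit_type, first_match_only=True)))  # 2
```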

Understand the output#

There are four output options:

In A only Branch: data that occurs only in the first input.
Same Branch: data that's the same in both inputs.
Different Branch: data that's different between inputs.
In B only Branch: data that occurs only in the second input.
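
The four branches can be modeled with a short sketch. It is a simplified illustration rather than Ensemble's implementation: `compare_datasets` and the branch keys are hypothetical names, items are flat dictionaries matched on a single field, and the Different Branch is assumed to carry Input A's version of each item.

```python
def compare_datasets(input_a, input_b, field):
    """Sort items into four branches by matching on one field."""
    branches = {"in_a_only": [], "same": [], "different": [], "in_b_only": []}
    matched_b = set()
    for item_a in input_a:
        partners = [
            i for i, item_b in enumerate(input_b)
            if item_b.get(field) == item_a.get(field)
        ]
        if not partners:
            branches["in_a_only"].append(item_a)
            continue
        matched_b.update(partners)
        if all(input_b[i] == item_a for i in partners):
            branches["same"].append(item_a)
        else:
            # Assumption: output Input A's version of a differing item.
            branches["different"].append(item_a)
    branches["in_b_only"] = [
        item for i, item in enumerate(input_b) if i not in matched_b
    ]
    return branches


input_a = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}, {"id": 3, "v": "z"}]
input_b = [{"id": 2, "v": "y"}, {"id": 3, "v": "changed"}, {"id": 4, "v": "w"}]
result = compare_datasets(input_a, input_b, "id")
print(result["in_a_only"])  # [{'id': 1, 'v': 'x'}]
print(result["same"])       # [{'id': 2, 'v': 'y'}]
print(result["different"])  # [{'id': 3, 'v': 'z'}]
print(result["in_b_only"])  # [{'id': 4, 'v': 'w'}]
```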