In five years of working with Experience Analytics, I never had a chance to dive deep into the segment reducing functionality.
A few days ago the time finally came and I had to find out how it works. I decided to describe it in detail in case somebody finds it interesting.

I used version 8.2.7 for the demonstration, though the functionality should be pretty much the same in previous versions and revisions of 8.x.

Demo Data

I created a small example to demonstrate how the reduce manager works.
There is a campaign group named Campaign Group 1, which contains 3 campaigns with the following names:
Campaign1
Campaign2
Campaign3

The report for the Campaign Group 1 group looks as follows:

Experience Analytics uses 5 tables to retrieve the data in Sitecore 8.2: SegmentRecords, Fact_SegmentMetrics, SegmentRecordsReduced, Fact_SegmentMetricsReduced and DimensionKeys.
Here is an approximate query that helps to understand how the records are stored (assuming that the reducer has not been executed yet):

SELECT Date, Visits, Value, DimensionKey
FROM [SegmentRecords] JOIN [Fact_SegmentMetrics]
ON [SegmentRecords].SegmentRecordId = [Fact_SegmentMetrics].SegmentRecordId
JOIN [DimensionKeys] ON [SegmentRecords].DimensionKeyId = [DimensionKeys].DimensionKeyId
WHERE SegmentId = '7A9A483F-195D-4F96-AD88-473CD6854C4F'

The SegmentRecordsReduced and Fact_SegmentMetricsReduced tables are not used yet, since they are populated only when reducing is performed.
The query gives the following results:

The first ID in the DimensionKey is the campaign group ID and the second one is the campaign ID. Thus, we have the same results as in the report:
156e78f3-f4ea-43d1-8607-07c38ccc53a9 - 13 times
4ba98edb-b73c-4951-a5bc-6f4210651a3a - 2 times
f4701287-86e2-4534-8358-5701cdb5c7ee - 1 time.

Reduce functionality

The reducer compresses the data which the system considers insignificant. This leaves fewer records in the database, which keeps query execution fast even for big databases.

All reducing-related configuration can be found in the following file:

App_Config\Include\ExperienceAnalytics\Sitecore.ExperienceAnalytics.Reduce.config

There is a reduceLoader hook, which causes the ReduceAgent to be executed every 30 seconds.

<agent type="Sitecore.ExperienceAnalytics.Reduce.ReduceAgent, Sitecore.ExperienceAnalytics.Reduce" >
<param desc="connectionStringName">reporting</param>
<param desc="triggerHour">1</param>
<param desc="logger" ref="experienceAnalytics/reduce/logger"/>
<param ref="experienceAnalytics/reduce/manager"/>
</agent>

An interesting parameter here is triggerHour, which specifies the only hour during which the ReduceAgent is allowed to do its work (1 AM by default). If the agent wakes up at any other hour, it does nothing.
Apart from that, the reduce agent can do its work only once in 24 hours. The time of the last execution is stored in the Properties table of the reporting database:

SELECT * FROM [Properties]
WHERE [Key] = 'EA_reduce_lastrun'
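
The two conditions described above (the trigger hour and the once-in-24-hours rule) can be illustrated with a minimal Python sketch. This is not the actual ReduceAgent code: the function name should_run_reduce and the exact comparison logic are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_reduce(now: datetime, trigger_hour: int,
                      last_run: Optional[datetime]) -> bool:
    """Hypothetical gate: work only during trigger_hour, at most once per 24 hours."""
    if now.hour != trigger_hour:
        return False  # the agent woke up at another hour, so it does nothing
    if last_run is not None and now - last_run < timedelta(hours=24):
        return False  # the last run (EA_reduce_lastrun) was less than 24 hours ago
    return True


# The agent wakes up at 01:15 and the previous run was a day earlier.
print(should_run_reduce(datetime(2019, 8, 23, 1, 15), 1, datetime(2019, 8, 22, 1, 0)))
```

Since the agent wakes up every 30 seconds, a gate like this would let only the first wake-up inside the trigger hour do real work.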

The agent delegates the execution to ReduceManager:

<manager type="Sitecore.ExperienceAnalytics.Reduce.ReduceManager, Sitecore.ExperienceAnalytics.Reduce">
<param desc="connectionStringName">reporting</param>
<param desc="retentionDays">7</param>
<param desc="logger" ref="experienceAnalytics/reduce/logger"/>
</manager>

The retentionDays parameter defines how many days the aggregated data remains untouched by the reduce manager.
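
As a rough illustration of the retention rule, the Python sketch below checks whether a record is old enough to be reduced. The function name is hypothetical, and the exact boundary handling (whether a record that is exactly retentionDays old already qualifies) is an assumption, not something taken from the Sitecore code.

```python
from datetime import date, timedelta


def is_eligible_for_reduce(record_date: date, today: date,
                           retention_days: int = 7) -> bool:
    """Hypothetical check: a record may be reduced once it is older than retention_days."""
    cutoff = today - timedelta(days=retention_days)
    return record_date < cutoff  # assumption: strictly older than the cutoff


# A record from 2019-08-16 checked on 2019-08-30 with the default retentionDays of 7.
print(is_eligible_for_reduce(date(2019, 8, 16), date(2019, 8, 30)))
```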

ReduceManager has a limited amount of time in which it is allowed to run. By default, it is 1 hour and can be configured using the following setting:

<setting name="ExperienceAnalytics.Reduce.Timeout" value="01:00:00" />

If the execution time is exceeded, the operation is aborted.

The reduce manager is executed for each site from the SiteNames table of the reporting database and for each segment. There are several checks in the code which may prevent a segment from being reduced. Mostly, this happens if re-aggregation is in progress or if there is no data for the particular segment that is older than 7 days. The corresponding messages are written to the log file at Info level.
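
The overall flow (every site, every segment, skip checks, and a time budget) can be sketched in Python as follows. This is only an approximation of the behavior described above: run_reduce, can_reduce and the injected clock are hypothetical names, and the real ReduceManager may check the timeout at different points.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_reduce(sites: Iterable[str], segments: Iterable[str],
               can_reduce: Callable[[str, str], bool],
               timeout_seconds: float,
               clock: Callable[[], float] = time.monotonic
               ) -> Tuple[List[Tuple[str, str]], bool]:
    """Visit every (site, segment) pair until done or until the time budget is spent.

    Returns the pairs that were reduced and a flag telling whether the run was aborted.
    """
    deadline = clock() + timeout_seconds
    segment_list = list(segments)
    reduced: List[Tuple[str, str]] = []
    for site in sites:
        for segment in segment_list:
            if clock() >= deadline:
                return reduced, True  # execution time exceeded: abort the operation
            if not can_reduce(site, segment):
                continue  # e.g. re-aggregation in progress or no data that is old enough
            reduced.append((site, segment))  # the stored procedure call would go here
    return reduced, False
```

Passing the clock as a parameter is just a convenience of this sketch: it makes the timeout path easy to exercise without waiting for an hour.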

The main reducing logic is performed by the ReduceSegmentMetrics stored procedure. First of all, the data to be reduced is retrieved. This is done using a query like the following:

SELECT ROW_NUMBER() OVER(ORDER BY sm.Visits DESC, ABS(sm.Value) DESC) AS 'PredicateOrder',
sr.[SegmentId],
sr.[Date],
sr.[SiteNameId],
sr.[DimensionKeyId],
sm.[SegmentRecordId],
sm.[ContactTransitionType],
sm.[Visits],
sm.[Value],
sm.[Bounces],
sm.[Conversions],
sm.[TimeOnSite],
sm.[Pageviews],
sm.[Count]
FROM SegmentRecords sr
INNER JOIN Fact_SegmentMetrics sm ON sr.SegmentRecordId = sm.SegmentRecordId
INNER JOIN DimensionKeys dk ON dk.DimensionKeyId = sr.DimensionKeyId
WHERE sr.SegmentId = '7A9A483F-195D-4F96-AD88-473CD6854C4F' AND sr.[Date] >= '2019-08-16 00:00:00' AND
sr.[Date] < '2019-08-17 00:00:00'

The resulting table in our case will be:

The records in the table are ordered by number of visits and engagement value, and each row is assigned a PredicateOrder number. If there are too many records, only the first n records remain and the others are reduced. The PredicateOrder determines which records to keep.
By default, only the first 1000 records are significant and the rest are reduced. This can be controlled using the following setting:

<setting name="ExperienceAnalytics.Reduce.DefaultKeepCountThreshold" value="1000" />

In our case, this setting does not affect the behavior since there are only 4 records for the segment.
The resulting set is filtered further using the Visits and Value metrics. By default, only records with Visits > 10 are considered significant. As for the value, ABS(Value) must be > -1 by default (which is always true) to be significant.
The corresponding default values can be changed in the configuration:

<setting name="ExperienceAnalytics.Reduce.DefaultValueThreshold" value="-1" />
<setting name="ExperienceAnalytics.Reduce.DefaultVisitThreshold" value="10" />

The change will affect all segments.
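
To make the selection rules easier to follow, here is a minimal Python sketch that applies them to in-memory rows: order by Visits and ABS(Value), keep the first n rows, then apply the visit and value thresholds. It is not the stored procedure itself. The function name is hypothetical, the sample rows are made up, and combining the two thresholds with AND is an assumption based on the behavior seen in the demo.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def select_significant(rows: List[Row], keep_count: int = 1000,
                       visit_threshold: float = 10,
                       value_threshold: float = -1) -> Tuple[List[Row], List[Row]]:
    """Split rows into (significant, to_be_reduced)."""
    # Same ordering as the PredicateOrder column: Visits DESC, ABS(Value) DESC.
    ranked = sorted(rows, key=lambda r: (-r["Visits"], -abs(r["Value"])))
    significant: List[Row] = []
    reduced: List[Row] = list(ranked[keep_count:])  # everything past the keep count
    for row in ranked[:keep_count]:
        # Assumption: both conditions must hold for a row to stay significant.
        if row["Visits"] > visit_threshold and abs(row["Value"]) > value_threshold:
            significant.append(row)
        else:
            reduced.append(row)
    return significant, reduced


# Made-up sample rows, not the data from the demo database.
sample = [{"Visits": 2, "Value": 0}, {"Visits": 13, "Value": 0},
          {"Visits": 1, "Value": 0}, {"Visits": 12, "Value": 3}]
significant, reduced = select_significant(sample)
print(len(significant), len(reduced))
```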

Each particular segment can also be configured independently. To do that, the dimension which corresponds to the segment must be found first. It can be done as follows:

  1. Find the needed segment in the master database using Content Editor.
  2. The segment parent item is the needed dimension.
  3. Use the dimension ID to find it in the Sitecore.ExperienceAnalytics.Reduce.config file. The visitThreshold and valueThreshold attributes define the required thresholds:
<dimension id="{3E01BA28-2B4D-408A-A4BA-6C51ED9FFB9C}" type="Sitecore.ExperienceAnalytics.Aggregation.Dimensions.ByCampaign, Sitecore.ExperienceAnalytics.Aggregation" visitThreshold="1" valueThreshold="-1"/>

In the configuration above, the records which have more than 1 visit and any value are considered significant.
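
The relationship between the global defaults and a per-dimension override can be pictured with a small Python sketch. The dictionary is a hypothetical in-memory stand-in for the dimension entries in the config file, and the fallback to the defaults for dimensions without an override is an assumption.

```python
from typing import Dict, Tuple

# (visitThreshold, valueThreshold) taken from the default settings.
DEFAULT_THRESHOLDS: Tuple[int, float] = (10, -1.0)

# Hypothetical view of the <dimension> entries: dimension ID -> overridden thresholds.
DIMENSION_OVERRIDES: Dict[str, Tuple[int, float]] = {
    "{3E01BA28-2B4D-408A-A4BA-6C51ED9FFB9C}": (1, -1.0),
}


def thresholds_for(dimension_id: str) -> Tuple[int, float]:
    """Return the override for a dimension, or the defaults if none is configured."""
    return DIMENSION_OVERRIDES.get(dimension_id.upper(), DEFAULT_THRESHOLDS)


print(thresholds_for("{3E01BA28-2B4D-408A-A4BA-6C51ED9FFB9C}"))
```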


In my case (with the default configuration) only the first record is significant. The rest of the records will be reduced: Value, Visits, Bounces, etc. will be summed up and only one reduced record with the [Other] DimensionKey will be stored in the Fact_SegmentMetricsReduced and SegmentRecordsReduced tables (note that initially the data was stored in the Fact_SegmentMetrics and SegmentRecords tables). The reduced data will be purged from the Fact_SegmentMetrics and SegmentRecords tables after the stored procedure finishes its work.
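
The summing step can be sketched in Python like this. The metric names come from the query shown earlier, while the function name and the sample rows are hypothetical. The sketch only shows how several insignificant rows collapse into one [Other] record and says nothing about how the stored procedure actually moves data between tables.

```python
from typing import Any, Dict, List

METRICS = ("Visits", "Value", "Bounces", "Conversions", "TimeOnSite", "Pageviews", "Count")


def collapse_to_other(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Sum every metric of the insignificant rows into a single [Other] record."""
    other: Dict[str, Any] = {"DimensionKey": "[Other]"}
    for metric in METRICS:
        other[metric] = sum(row[metric] for row in rows)
    return other


# Made-up insignificant rows for two campaigns.
insignificant = [
    dict(DimensionKey="campaign-a", Visits=2, Value=5, Bounces=1, Conversions=0,
         TimeOnSite=30, Pageviews=4, Count=2),
    dict(DimensionKey="campaign-b", Visits=1, Value=0, Bounces=1, Conversions=0,
         TimeOnSite=10, Pageviews=1, Count=1),
]
print(collapse_to_other(insignificant))
```
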
Running the query against the Fact_SegmentMetrics and SegmentRecords tables will return no results now. If the same query is executed against the Fact_SegmentMetricsReduced and SegmentRecordsReduced tables, the result will be the following:

SELECT Date, Visits, Value, DimensionKey
FROM [SegmentRecordsReduced] JOIN [Fact_SegmentMetricsReduced]
ON [SegmentRecordsReduced].SegmentRecordId = [Fact_SegmentMetricsReduced].SegmentRecordId
JOIN [DimensionKeys] ON [SegmentRecordsReduced].DimensionKeyId = [DimensionKeys].DimensionKeyId
WHERE SegmentId = '7A9A483F-195D-4F96-AD88-473CD6854C4F'

As can be seen, non-significant data is stored with the [Other] dimension key and thus can’t be used for displaying particular campaigns in the reports. The report shows the following data after the reducer has finished its work:

Summary

Considering the above, we now know that Experience Analytics reports may show less data after some time. Using the configuration settings this behavior can be adjusted to meet the requirements:

  1. The retentionDays value can be increased. In this case, the data will remain unchanged for a longer time.
  2. Reduce functionality can be disabled completely by disabling the Sitecore.ExperienceAnalytics.Reduce.config file. However, please note that this approach may cause the reporting database to grow, and as a result queries will run slowly if the amount of data is too big.
  3. Specific segments can be reconfigured using the visitThreshold and valueThreshold attributes, as described above.
  4. It is also possible to comment out a specific dimension under the experienceAnalytics/reduce/dimensions node, however EA does not treat this as a valid configuration. If a dimension is present under the experienceAnalytics/aggregation/dimensions node, it should be present under the experienceAnalytics/reduce/dimensions node as well. Otherwise, the following error will appear in the log:
ERROR [Experience Analytics]: Error trying to reduce segment: ea364d4d-b85c-4c07-88d1-51edcaa1a160, date: 8/16/2019, site: website

However, the error is not critical and does not seem to break anything.