February 26, 2019 – Alert: Possible Data Integrity Issue with Mirrored Database Catchup with Parallel Dejournaling
InterSystems has corrected a defect that can result in data integrity problems in environments that use InterSystems mirroring in conjunction with parallel dejournaling. This problem exists in currently released Caché and Ensemble versions beginning with 2017.2 and in InterSystems IRIS Data Platform version 2018.1.
Ensuring that you are protected from this issue
Your system is at risk only if you have a mirroring environment that supports parallel dejournaling for mirrored database catchup. Hence, if your environment currently supports this feature, InterSystems recommends that you disable it by setting the following global node in the %SYS namespace:
Here, mirrorname is the name of your environment’s mirror. Setting this node prevents the system from meeting the conditions necessary to trigger this issue; see below for more details. After upgrading to a version with the correction described below, you can safely ‘kill’ this global node.
Determining whether the necessary conditions are present
It is extremely unlikely that this defect would be triggered, even when all necessary conditions are present. If the defect is triggered, database updates are applied out of order – which can result in a data integrity issue.
Because the problem can only occur during mirrored database catchup, the following scenarios cannot trigger it:
- Journal restore for non-mirrored databases
- Dejournaling that occurs during reconnection or restart of a mirror member
For the problem to occur, all of the following conditions must be present:
- Parallel dejournaling must be enabled.
  - For reporting async members, parallel dejournaling must be explicitly enabled in the configuration.
  - For DR async members, it is enabled by default.
  - For failover members, it is always enabled.
- The host system must have sufficient resources to use parallel dejournaling, or one or more globals in the %SYS namespace have been explicitly set to a value greater than 1. These globals are:
- For more information on this feature, see “Configuring Parallel Dejournaling” in the “Mirroring” chapter of the High Availability Guide.
- Catchup must be run on multiple databases as a single operation. To determine whether catchup was ever run on more than one database as a single operation, search the chain of log files written since installing or upgrading to an affected release for messages that contain the string “catchup started for”; any message that contains this string also lists the number of databases being caught up. (The log file is messages.log for InterSystems IRIS and cconsole.log for Caché and Ensemble.) If you have the full chain of log files and no “catchup started for” message reports more than one database, your system has not been affected.
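The log search described in the last condition can be automated. The following Python script is a minimal sketch, not an InterSystems-supplied tool. The only detail taken from this alert is the search string “catchup started for”. The exact layout of the message is an assumption: the sketch treats the first integer after that string as the database count, and it flags any message whose count cannot be parsed so that it can be reviewed by hand. The names find_catchup_messages, needs_review, and scan_files are illustrative.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Search string given in the alert; matched case-insensitively.
PHRASE = "catchup started for"

# ASSUMPTION: the database count is the first integer that follows the phrase.
# The real message layout is not specified in the alert; adjust if it differs.
COUNT_RE = re.compile(re.escape(PHRASE) + r"\D*?(\d+)", re.IGNORECASE)


class CatchupHit(NamedTuple):
    source: str              # file the message came from
    line_no: int             # 1-based line number within that file
    db_count: Optional[int]  # parsed database count, or None if not parseable
    text: str                # the full log line


def find_catchup_messages(lines: Iterable[str], source: str = "<memory>") -> List[CatchupHit]:
    """Return every line that contains the catchup message."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if PHRASE not in line.lower():
            continue
        match = COUNT_RE.search(line)
        count = int(match.group(1)) if match else None
        hits.append(CatchupHit(source, line_no, count, line.rstrip("\n")))
    return hits


def needs_review(hit: CatchupHit) -> bool:
    """True if the message reports more than one database or has no parseable count."""
    return hit.db_count is None or hit.db_count > 1


def scan_files(paths: Iterable[str]) -> List[CatchupHit]:
    """Scan each log file in the chain and collect all catchup messages."""
    hits = []
    for path in paths:
        with open(path, errors="replace") as handle:
            hits.extend(find_catchup_messages(handle, source=str(path)))
    return hits
```

To use it, pass every file in the log chain to scan_files and inspect the hits for which needs_review returns True. An empty result is meaningful only if the chain of log files is complete, as noted above.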
If all of the conditions are present, it is possible that the defect has been triggered. Because the defect applies updates out of order, logical data integrity may be compromised, and this would be difficult to identify. Please contact the Worldwide Response Center if you have determined that your environment might be affected.
You can run DataCheck to verify the consistency of globals across mirror members, but it cannot guarantee that the defect was never experienced. Specifically, if DataCheck reports consistent data, this means the systems are consistent now; however, they may not have been in the past, if affected globals were updated after the defect was encountered. For more information on DataCheck, see the “Data Consistency on Multiple Systems” chapter in the Data Integrity Guide.
Information about the correction
The correction for this defect is identified as HYY2332. This correction will be present in all future released versions of Caché, Ensemble, HealthShare, and InterSystems IRIS Data Platform, except InterSystems IRIS 2019.1, which disables parallel dejournaling.
If you have any questions regarding this alert, please contact the Worldwide Response Center.