InterSystems Documentation 
Caché 5.2.4 Maintenance Kit Release Notes



General Announcements
InterSystems advises all users to recompile their applications as part of the upgrade process so as to take advantage of all the performance improvements as well as other changes that may affect them in this release.
This document describes the changes made since Caché 5.2.3.

InterSystems News, Alerts and Advisories
From time to time, InterSystems publishes items of immediate importance to users of our software. These include alerts, mission critical issues, important updates, fixes, and release announcements.

The most current list can be obtained from the InterSystems Website.

Users should check this list periodically to obtain the latest information on issues that may have an effect on the operation of their site.


Online Documentation
As a convenience to our users, InterSystems provides online access to documentation for recent product versions at the InterSystems Website.

Description of a Change Report
To help you assess the impact of this maintenance kit on your applications, the remaining topics in this document describe each modification in detail. The format for this description is a Change Report.
Each Change Report provides a table with the following information: Category, Platforms, DevKey, Summary, Description, Likelihood, Risk, Ad Hoc, and Enhancement.


Change Reports for This Maintenance Kit


Category: Config Mgr
Platforms: All
DevKey: CFL1468
Summary: Fix slowness in startup (.cpf file import)

Description:

An issue has been corrected where very large configuration (.cpf) files could cause startup to take more than 10 minutes.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: All
DevKey: CFL1470
Summary: Fix STUCNFG to not parse the .cpf file

Description:

An error has been fixed where the configuration (CPF) file was parsed and imported in places other than STU. This could cause unexpected behavior when customers edited the CPF file by hand, because their changes could take effect in Caché before they had imported them.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: All
DevKey: CFL1474
Summary: Fix subscript-level mapping when collation = -1

Description:

An error has been fixed where entering a subscript-level mapping for a global could cause startup to fail.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: All
DevKey: CFL1475
Summary: Fix error in database deletion via ^DATABASE

Description:

An error has been fixed where deleting a database via ^DATABASE would cause an ERROR #642.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: All
DevKey: CFL1478
Summary: Audit changes in ECP properties

Description:

An error has been fixed where changes in ECP network properties were not logged in the Audit database.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: Linux
DevKey: CFL1479
Summary: Fix chmod problem on Linux

Description:

An error has been fixed where a permission error occurred in a chmod call during startup and shutdown.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Config Mgr
Platforms: All
DevKey: CFL1483
Summary: Fix proliferation of backup files

Description:

An error has been fixed where backup CPF files were created when Caché was restarted, even when there were no changes to the configuration.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Installation.UNIX
Platforms: All
DevKey: ALE1021
Summary: Identify 64-bit AIX systems as capable of running both "ppc" and "ppc64" platforms

Description:

UNIX installation now allows users to install both the "ppc" and "ppc64" platforms on 64-bit AIX systems regardless of CPU architecture. Prior to this change, it did not list ppc64 as a valid platform for the POWER3 architecture.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Installation.UNIX
Platforms: All
DevKey: ALE1172
Summary: Clean up old OS runtime and ICU libraries on upgrade on UNIX

Description:

In upgrade installations on UNIX, the installer now removes the following files from <cachesys>/bin before installing the SAX and Basic files: libCstd.so.1, libCrun.so.1, libicudata.so.21, and libicuuc.so.21.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Installation.UNIX
Platforms: All
DevKey: ALE1224
Summary: Force private Apache on UNIX to use IPv4 sockets

Description:

On Caché startup, the web port is passed to Apache in the form "0.0.0.0:<portnumber>" to force it to use only IPv4 sockets. This is a workaround for a problem in the IPv6 implementation on some Solaris versions: IPv6 sockets would sometimes disappear, and Apache would not stop correctly on Caché shutdown.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: MgtPortal
Platforms: All
DevKey: YSD1315
Summary: How to delete journal files

Description:

Journal files are now deleted through the Task Manager.

In the Management Portal, on the Operations side, a new item, "View Task Schedule", has been added. This page shows a table of all the task schedules.

The operator can create a task to purge journal files at a desired time or frequency. However, when Caché is installed, the purge journal task is already defined and set to run periodically. It may be modified or replaced as necessary.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: MgtPortal
Platforms: All
DevKey: YSD1467
Summary: Global mappings preserved by browser after namespace deleted

Description:

When a namespace and global/routine mappings are created through the Management Portal, and then later the namespace is deleted, creating a new namespace with the same name before restarting the browser would result in the previous mappings appearing in the Management Portal. This has been corrected.

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: MgtPortal
Platforms: All
DevKey: YSD1498
Summary: Shadow parameter to keep files from getting purged immediately

Description:

This change adds a shadow parameter to allow the user to specify how old the shadow copy of a source journal file can be before it gets purged.

In the Management Portal, Configuration, Shadow Server Settings page, there is an option to "Add New Shadow or Edit an existing shadow." Selecting "Advanced" now shows an additional prompt:

Days of old copied journals to keep:

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: MgtPortal
Platforms: All
DevKey: YSD1501
Summary: Missing "Null Subscripts Enabled"

Description:

This change adds the "Null Subscripts Enabled" option to the Advanced Configuration/ObjectScript section.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: MgtPortal
Platforms: All
DevKey: YSD1578
Summary: Management Portal can erase global mappings if using same portal window after Caché reboot

Description:

If the Management Portal was used to create global mappings, and Caché is restarted, selecting "New Global Mapping" in the portal may erase the previous mappings because the browser was unaware of the Caché restart. This problem has been fixed.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Networking
Platforms: All
DevKey: GK527
Summary: Fixed UNIX socket delegation access permissions

Description:

The control process may run under a different user ID than the superserver. If the control process is not running as root, it cannot accept the sockets delegated to it by the superserver.

This change modifies socket delegation to give read/write (rw) access to any job the socket is delegated to. (In prior releases, this access was limited by the umask setting of whoever started the Caché instance.)
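The general POSIX technique can be sketched in Python: bind a UNIX domain socket, then set its permissions explicitly so the resulting mode no longer depends on the umask of whoever started the process. This is an illustration under stated assumptions, not Caché's code; the function name, the socket path, and the choice of mode 0o666 are hypothetical.

```python
import os
import socket
import stat
import tempfile


def bind_delegation_socket(path: str) -> socket.socket:
    """Bind a listening UNIX domain socket and make it rw for every user.

    Illustrative only: bind() creates the socket file with a mode filtered
    by the process umask, so a restrictive umask can lock out a peer that
    runs under a different user ID. An explicit chmod removes that dependency.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    os.chmod(path, 0o666)  # rw for owner, group, and others, regardless of umask
    sock.listen(1)
    return sock


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "deleg.sock")
        old_umask = os.umask(0o077)  # simulate a restrictive umask
        try:
            listener = bind_delegation_socket(path)
        finally:
            os.umask(old_umask)
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))
        listener.close()
```

The explicit chmod is what removes the umask dependency: without it, a umask of 077 would leave the socket file accessible only to its owner.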


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Networking.ECP
Platforms: All
DevKey: GK589
Summary: Invalid global names created via ECP

Description:

In rare conditions, ECP application servers with multiple connections may encounter this issue:

An ECP client optimizes duplicate global references sent to the server by caching the last global used across ECP. However, the cached information was not invalidated consistently when multiple connections were involved and the request sequence numbers matched, which could produce invalid global names.

This issue has been corrected.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Networking.ECP
Platforms: All
DevKey: GK634
Summary: Modified the ECP server to ignore the dead sessions unlock jrn-position

Description:

The ECP server guarantees that the associated changes are durable before granting a lock. In rare conditions, this validation did not properly account for dead ECP server sessions, and in the worst case a database server with dead ECP server sessions could end up in an unexpectedly long wait.

This change makes the ECP server ignore the unlock journal positions of dead sessions, which prevents that wait.


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: Networking.ECP
Platforms: All
DevKey: GK635
Summary: Dead job cleanup may not release locks granted across ECP

Description:

Remote unlocks initiated by dead-job cleanup could fail and leave the ECP client in an unexpected state. This has been corrected.

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: Security
Platforms: All
DevKey: STC1072
Summary: Don't change maximum database size on upgrade to 5.1 or later

Description:

This change corrects an error where, during an upgrade, the maximum database size could be altered.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Security
Platforms: All
DevKey: STC1112
Summary: Only rebuild routine index if necessary during ^INSTALL

Description:

In previous versions, whenever Caché was installed on an existing system, all the routine indices were rebuilt for all the user databases. Caché now rebuilds these indices only when it is upgrading from an older version.

For example, upgrading from 5.2 to 2007.1 rebuilds the indices, but installing an ad hoc kit of 2007.1 on top of an existing 2007.1 installation does not. This makes installs of maintenance releases quicker.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Studio
Platforms: All
DevKey: DVU1798
Summary: Open dialog. Save file pattern with '*' and default to it

Description:

This change alters Studio so that it only saves patterns including '*' for its file list, and the last-used pattern becomes the default for the next attempt to locate files.

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: Studio
Platforms: All
DevKey: DVU1812
Summary: Studio calls lock on a document that is already locked

Description:

This fixes an issue where Studio attempted to lock a document twice.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: Studio
Platforms: All
DevKey: MAK1904
Summary: ^ROUTINE lock lost after using template

Description:

When a template was used for the first time in a Studio session, it removed any locks Studio currently had open. This would allow other users to edit routines/classes that this Studio should have maintained exclusive access to. This is now fixed.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: CDS1035
Summary: Release global block during pattern match

Description:

This change causes Caché to release the global block that the process may be retaining from its most recent global operation when a pattern match operation begins. This allows other processes to access the global while the pattern match, which may take a long time, is in progress.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: CDS1118
Summary: Limit pattern recursion

Description:

The system will limit the recursion depth for its pattern matching algorithm to prevent abnormal process termination. If the pattern cannot be resolved within that limit, it will throw a <COMPLEX PATTERN> error.

This means that the system was unable to perform a pattern match because the pattern was too complex to be applied to the given input string. The pattern should be simplified by reducing the number of alternations and indefinite counts, or the input string should be broken into smaller segments for matching.
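Caché's internal pattern-matching algorithm is not described here, so the following sketch uses a simple backtracking wildcard matcher to illustrate the general idea of a recursion limit. It uses '*' for any run of characters and '?' for any single character, which is not ObjectScript pattern syntax. Each recursive step is counted, and once a fixed depth is exceeded the match is abandoned with an error instead of exhausting the process stack. The exception name and the depth limit are hypothetical.

```python
class ComplexPatternError(Exception):
    """Hypothetical stand-in for the <COMPLEX PATTERN> error."""


def wildcard_match(pattern: str, text: str, max_depth: int = 200) -> bool:
    """Backtracking '*'/'?' matcher with a cap on recursion depth."""

    def go(p: int, t: int, depth: int) -> bool:
        if depth > max_depth:
            raise ComplexPatternError("pattern too complex for this input")
        if p == len(pattern):
            return t == len(text)
        if pattern[p] == "*":
            # Either '*' matches nothing more, or it absorbs one more character.
            return go(p + 1, t, depth + 1) or (
                t < len(text) and go(p, t + 1, depth + 1)
            )
        if t < len(text) and pattern[p] in ("?", text[t]):
            return go(p + 1, t + 1, depth + 1)
        return False

    return go(0, 0, 0)


print(wildcard_match("a*b", "axxb"))  # True
try:
    wildcard_match("*****z", "a" * 50, max_depth=20)
except ComplexPatternError as err:
    print("error:", err)
```

The report's advice carries over to this sketch. Each '*' (an indefinite count) multiplies the number of paths to explore, and in this matcher the recursion depth grows with the input length, so simplifying the pattern or matching shorter segments keeps the search within the limit.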


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: CDS1119
Summary: Create more space for error trap after <FRAMESTACK>

Description:

On some 64-bit platforms, the %ETN error trap was not able to complete normally after a <FRAMESTACK> error. This could cause repeated <FRAMESTACK> errors and fill the CacheTemp database. The system now pops more entries from the execution stack before running %ETN, so that a <FRAMESTACK> error is handled properly.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: CDS1154
Summary: Fix local variable $DATA issue after MERGE then KILL

Description:

A MERGE into an existing local variable node, followed by a KILL of that node should leave $DATA() = 0. This change fixes a situation where $DATA() would report 10 and $ORDER() would return a subscript which did not exist.

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System
Platforms: All
DevKey: CDS935
Summary: <FRAMESTACK> error handling improvement

Description:

When a <FRAMESTACK> error occurs, the system attempts to remove some execution levels in order to have enough room to run an error trap. The algorithm that it used was not sufficient to run the entire %ETN error trap, causing incomplete error logging or recursion within error traps. This is now corrected.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: GK518
Summary: Changed UNIX socket delegation to use a different directory

Description:

Socket delegation uses UNIX domain sockets to pass an open socket to the new job. The UNIX domain socket path is limited to 108 characters (104 on some systems). To create a unique UNIX domain socket name, Caché formerly used <mgr dir>/nti/<job num>; if the manager directory path was long, the resulting path exceeded that limit.

The socket exists only for a very short time during job creation, so the algorithm now places it under the /tmp directory instead.
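One quick way to see the limit in practice is to measure a candidate path in bytes before binding. The Python sketch below assumes a hypothetical naming scheme of <base dir>/nti/<job num>, which mirrors the old layout, and a 108-byte sun_path buffer as on Linux. One byte is reserved for the terminating NUL that C callers conventionally include. The function names and the sample directory are illustrative only.

```python
import os

SUN_PATH_MAX = 108  # sizeof(sun_path) on Linux; 104 on many BSD-derived systems


def delegation_socket_path(base_dir: str, job_num: int) -> str:
    """Hypothetical socket name built as <base_dir>/nti/<job_num>."""
    return os.path.join(base_dir, "nti", str(job_num))


def fits_sun_path(path: str, limit: int = SUN_PATH_MAX) -> bool:
    """True if the path, plus a terminating NUL, fits in sun_path."""
    return len(os.fsencode(path)) + 1 <= limit


long_mgr_dir = "/opt/" + "very_long_directory_name/" * 5 + "mgr"
for base in (long_mgr_dir, "/tmp"):
    path = delegation_socket_path(base, 123456)
    print(len(path), fits_sun_path(path), path)
```

A short, fixed base directory such as /tmp keeps the length bounded no matter where the manager directory is installed, which is the point of the change described above.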


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: GK579
Summary: Fixed cluster member write-image journal header corruption

Description:

In rare circumstances, a normal shutdown of the cluster master could corrupt the write-image journal header, which prevented the system from restarting with journaling enabled. This has been corrected.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: HYY1367
Summary: Set the journal sync position (for write daemon) to the right value after opening a new journal file

Description:

This change corrects an issue involving a manual journal switch in a "quiet" Caché system, that is, one where no global SETs or KILLs are being done.

In this situation, "ccontrol stat" may appear to show that the write daemon is unresponsive.


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System
Platforms: All
DevKey: JO2099
Summary: Write daemon defers clearing freeze flag until it completes an entire pass

Description:

Previously, if the system was marked as frozen while the write daemon was writing to the databases, the write daemon unfroze the system when it finished writing the databases. However, the system was then immediately frozen again until the write daemon wrote the write-image journal for the next pass.

This has been changed so that the system will remain frozen until the write daemon completes the subsequent pass to allow it more time to catch up if it has fallen behind.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: JO2109
Summary: Add cstat -K<switch #> to clear switches

Description:

A -K<switch #> option has been added to cstat to clear a switch in the corresponding Caché instance. This is intended for use only in an emergency, for example, if a process sets a switch and then exits. Setting a switch (for example, 13) and exiting may leave Caché in a state where no process can get into the system to clear the switch.

The command

..\bin\cstat -s. -K13
will clear switch 13.
Warning: This function must be used extremely carefully as switches within Caché are generally only set for specific reasons (such as a backup) where activity needs to be suspended.
If the process that set the switch is still running, it will not notice that the switch was cleared, and it may report successful completion when in fact its results are corrupt.

Use of the -K option is recorded in the console log and the user is warned that this is dangerous and asked to confirm the operation before continuing.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: JO2113
Summary: Retry loop in getdatabaselock() did not reload mailbox message

Description:

An error has been fixed which could result in an exception in the write daemon if the journal file is switched twice while the write daemon is busy writing blocks back out to the database files.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: JO2125
Summary: Find correct left ptrblk when gcompact starts in the middle of a global

Description:

A problem with gcompact has been resolved where it would sometimes report a <DATABASE> error, which in turn could freeze the system. This occurred when compaction started in the middle of a global. That can happen internally in the gcompact routine, because when compacting a large global the system code may periodically return to the COS routine to report progress.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: JO2148
Summary: Update write daemon monitor so system doesn't suspend if write daemon is doing any writes

Description:

The criteria for suspending the system when the write daemon takes more than 5 minutes to complete a pass have been changed: as long as the write daemon is writing some blocks to disk, the system is not suspended.

A note will be generated in the console log when this occurs because it is not healthy for a system to take this long to write a set of blocks to disk, especially if it happens on consecutive passes. Users will not be locked out as long as some progress is being made. The system may eventually run out of free buffers; in this case, users will be locked out.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: UNIX, OpenVMS
DevKey: JO2158
Summary: Error handling for multiple write daemons needs to track errors from all slaves

Description:

Under rare circumstances, when Caché was running on UNIX or OpenVMS and multiple write daemons encountered I/O errors, the system was suspended but was then unsuspended as soon as any one of them completed its pending operation.

The correct behavior, implemented by this change, is to keep the system suspended until all of them complete their pending operations successfully.
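The corrected rule can be modeled as the set of write daemons with outstanding I/O errors. The system stays suspended while that set is non-empty, instead of being released when the first daemon recovers. This is a hypothetical Python model of the described behavior, not Caché's implementation; the class name and daemon IDs are illustrative.

```python
class WriteDaemonErrorTracker:
    """Tracks which write daemons currently have unresolved I/O errors."""

    def __init__(self) -> None:
        self.failing = set()

    def report_error(self, daemon_id: int) -> None:
        self.failing.add(daemon_id)

    def report_success(self, daemon_id: int) -> None:
        self.failing.discard(daemon_id)

    @property
    def suspended(self) -> bool:
        # Old behavior: resume as soon as any one daemon succeeded.
        # Corrected behavior: resume only when none are still failing.
        return bool(self.failing)


tracker = WriteDaemonErrorTracker()
tracker.report_error(1)
tracker.report_error(2)
tracker.report_success(1)
print(tracker.suspended)  # True: daemon 2 has not recovered yet
tracker.report_success(2)
print(tracker.suspended)  # False
```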


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System
Platforms: All
DevKey: LRS1031
Summary: Command line routine parameters and class method references

Description:

The command line handling for Caché application mode has been enhanced to permit passing parameter lists consisting of string and/or numeric literals as well as omitted (void) parameters. Routines and procedures can be invoked, and class method calls may also be made.

General syntax forms are:

tag+offset^routine
tag^routine([parameter-list])
##CLASS(package.class).method([parameter-list])
More specific syntax forms are:
^routine
tag^routine
tag+offset^routine
+offset^routine
routine
^routine()
tag^routine()
routine()
^routine(parameter-list)
tag^routine(parameter-list)
routine(parameter-list)
##CLASS(package.class).method()
##CLASS(package.class).method(parameter-list)
##CLASS(package.class).method
A parameter list may appear as:
("string literal",,-+-000123.45600E+07)
where omitted (void) parameters are passed as $Data(parameter)=0 to the target.
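Whitespace and shell metacharacters must be quoted in an operating-system-dependent fashion. (In the Windows command shell, ^ is itself an escape character, so it is written as ^^.) For example: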

Windows:

General^^ADMIN(""""%username%"""",,-12.34,""""a b"""")
UNIX:
"General^ADMIN(\"$USER\",,-12.34,\"a b\")"
OpenVMS:
"General^ADMIN("""+F$USER()+""",,-12.34,""a b"")"

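When the command line is generated by another program, the parameter list can be built mechanically. The Python sketch below renders strings as ObjectScript string literals with embedded quotes doubled, numbers as numeric literals, and None as an omitted parameter. It then applies POSIX shell quoting with shlex.quote. The helper names and the user name "jdoe" are hypothetical, and the sketch does not show how the resulting argument is passed to Caché. Windows and OpenVMS need their own quoting rules, as in the examples above.

```python
import shlex


def cos_literal(value) -> str:
    """Render a Python value as an ObjectScript literal; None means omitted."""
    if value is None:
        return ""  # omitted (void) parameter
    if isinstance(value, bool):
        return "1" if value else "0"  # represent booleans as 1/0
    if isinstance(value, (int, float)):
        return repr(value).replace("e", "E")  # e.g. 1e+20 -> 1E+20
    # ObjectScript escapes a quote inside a string literal by doubling it.
    return '"' + str(value).replace('"', '""') + '"'


def entry_point(target: str, *params) -> str:
    """Build e.g. tag^routine(p1,,p3) from a target and parameter values."""
    return f"{target}({','.join(cos_literal(p) for p in params)})"


arg = entry_point("General^ADMIN", "jdoe", None, -12.34, "a b")
print(arg)               # General^ADMIN("jdoe",,-12.34,"a b")
print(shlex.quote(arg))  # 'General^ADMIN("jdoe",,-12.34,"a b")'
```

Because the user name is substituted in Python rather than by the shell, single-quoting the whole argument is safe here. The UNIX example above instead relies on double quotes so that the shell can expand $USER.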
Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: UNIX
DevKey: LRS1036
Summary: New cstart argument for use by HA failover scripts

Description:

The Caché startup script for UNIX has been enhanced to avoid accidental misuse in high availability failover configurations.

Formerly the customer failover script was responsible for deleting the cache.ids file and starting Caché, but if it were accidentally invoked in other than a bona fide failover situation, there could be severe interference with a running instance of Caché on that node.

With this change, startup does not proceed if Caché is already running, even if a failover startup is requested by accident. To enable this functionality, replace script lines such as:

rm <directory>/mgr/cache.ids
ccontrol start <instance>
with:
ccontrol start <instance> @failover@
A properly configured, secure failover filesystem must permit write access to only a single node at any given moment, and must be switched under the exclusive control of the failover software. If the filesystem allows simultaneous write access to multiple nodes, Caché cannot block an accidental failover startup when the instance is already running on a different node.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement Yes

 



Category: System
Platforms: OpenVMS
DevKey: MAK1836
Summary: Remove dependency of 'GETFILE' on OpenVMS

Description:

On OpenVMS, Caché has no way to determine the record format of a file. Therefore, when doing an XML import, Caché always assumes the file is not in 'UNK' mode and copies it to a new file, so that the record format is known when the file is later imported.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement Yes

 



Category: System
Platforms: All
DevKey: RJF056
Summary: Cstat self-diagnosis for hung systems

Description:

Cstat has been enhanced to allow it to perform self-diagnostics for many types of system hangs.

When the -S option is selected, cstat first attempts to determine whether the system is hung in a way that can be self-diagnosed. If it is, cstat examines the system state in an effort to identify the cause of the hang. Informational messages report the progress of the self-diagnosis and any relevant recommendations to the user. Finally, cstat lists processes which are suspected of causing the hang at that Caché system level.

Each process reported may be in one of several possible states. The states reported and their meanings are as follows:

Note that not all types of hangs necessarily result in having any processes requiring further investigation.

In a Caché cluster, the cstat self-diagnostics should be run on all cluster members. It is possible that one cluster member may appear to be hung before the others. Typically, within 10 minutes the other members become hung because the write daemons time out while trying to synchronize with the other nodes. For this reason, the self-diagnosis may not detect the hang on one or more nodes of the cluster until this timeout has expired.

Additionally, this change adds a new bit, 2048, to the cstat -p option. When the 2048 bit is set, the sfn and block number of each process's retained buffer (pgbdbsav) are displayed.


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP610
Summary: Warn when cache.ids is missing or doesn't match the shared memory segment

Description:

This change introduces a check for a valid cache.ids file in ECP configurations.

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP672
Summary: Fix allocated buffers display for 5GB

Description:

An allocated space of 5GB for buffers was displayed as 948MB (although the full 5GB was actually allocated). This was caused by an overflow during conversion and has been fixed.
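The report does not say which conversion overflowed. The displayed figure is, however, consistent with a byte count wrapping in an unsigned 32-bit integer. The arithmetic below assumes, hypothetically, that the 5GB setting was held as 5000 MB, converted to bytes in 32 bits, and then shown in decimal megabytes. Under the same assumption, a setting of exactly 5120 MB would instead wrap to 1024 MB.

```python
MB = 1024 * 1024          # bytes per megabyte (binary)
UINT32_MOD = 2 ** 32      # an unsigned 32-bit value wraps at 4,294,967,296

configured_mb = 5000                   # hypothetical: "5GB" entered as 5000 MB
true_bytes = configured_mb * MB        # 5,242,880,000 bytes
wrapped = true_bytes % UINT32_MOD      # what a 32-bit counter would hold

print(f"{true_bytes:,} bytes -> {wrapped:,} bytes (~{wrapped / 1e6:.0f} MB)")
```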

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP677
Summary: Home->Globals, failure with 2k db with %globals in it

Description:

If a 2k database has %globals stored in it, an attempt to view them using the Management Portal fails with an error message:
Database, c:\datasets\mdm\, is not available
This does not occur if the 2k database does not have any %globals in it, or if the database is 8k. This has been fixed.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP759
Summary: Low ulimit can cause database errors

Description:

Beginning with Caché 5.1, for all UNIX systems, the hard limit for the maximum file size (RLIMIT_FSIZE) on any system running Caché must be "unlimited".

This correction does two things. First, Caché now checks that the RLIMIT_FSIZE hard limit is unlimited, and halts the Caché startup (or installation) with an error message if not. Second, it sets the soft limit to the hard limit in the daemons.

See the operating system documentation for each platform for a discussion of how to set the system hard limit for RLIMIT_FSIZE.
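On a UNIX system, the same check can be reproduced from a script before Caché is started, for example to validate a host in advance. This Python sketch uses the standard resource module. The function names and the exit message are hypothetical and are not part of Caché.

```python
import resource
import sys


def fsize_hard_limit_is_unlimited() -> bool:
    """True if the hard RLIMIT_FSIZE limit is 'unlimited' (RLIM_INFINITY)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return hard == resource.RLIM_INFINITY


def precheck() -> None:
    """Exit with an error if the hard file-size limit is finite."""
    if not fsize_hard_limit_is_unlimited():
        sys.exit("RLIMIT_FSIZE hard limit must be unlimited; "
                 "see your operating system documentation.")
    print("RLIMIT_FSIZE hard limit is unlimited")
```

Calling precheck() at the top of a startup wrapper stops it with a message when the hard limit is finite, similar to the halt described above.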


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP763
Summary: Set current user limits up to the max hard limit in the daemons

Description:

A Caché instance whose current (soft) file-size limit was lower than the hard limit could still encounter a serious I/O error if the current limit was smaller than the size of the cache.dat or cache.wij files. The current limit is now set to the hard limit for file size, core dump size, and open files.
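Any process can make the same adjustment through the standard setrlimit interface, because raising a soft limit up to the hard limit needs no special privileges. This Python sketch applies it to the three limits named above. The function name is hypothetical. Some platforms reject particular values, for example an unlimited open-files limit, so the sketch reports failures rather than treating them as fatal.

```python
import resource

# The three limits named in the change report.
LIMITS = {
    "file size": resource.RLIMIT_FSIZE,
    "core dump size": resource.RLIMIT_CORE,
    "open files": resource.RLIMIT_NOFILE,
}


def raise_soft_limits_to_hard(limits=LIMITS) -> dict:
    """Set each soft (current) limit equal to its hard limit.

    Returns {name: (soft, hard)} as read back after the attempt.
    """
    result = {}
    for name, res in limits.items():
        soft, hard = resource.getrlimit(res)
        if soft != hard:
            try:
                resource.setrlimit(res, (hard, hard))
            except (ValueError, OSError) as exc:
                print(f"could not raise {name} limit: {exc}")
        result[name] = resource.getrlimit(res)
    return result


for name, (soft, hard) in raise_soft_limits_to_hard().items():
    print(f"{name}: soft={soft} hard={hard}")
```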

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SAP870
Summary: Control process crashes after pausing users

Description:

This change fixes a problem where the control process crashed while validating the 'cache.ids' file. The crash occurred within a few minutes of the system being paused because the write daemon was not making progress.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: UNIX
DevKey: SML787
Summary: Stop Private Web Server when Caché is forced down

Description:

The private Web server is now stopped when Caché is forced down on UNIX platforms.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SML797
Summary: When upgrading, reduce journal life span to 100 if it was greater than 100

Description:

When Caché is upgraded, a journal life span greater than 100 days is now reduced to 100 days.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SML865
Summary: Don't let write daemon sleep if write daemon is suspended during the pass

Description:

This change fixes a condition where a backup could take longer than expected because of a delay in the write daemon.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System
Platforms: All
DevKey: SML882
Summary: Remove 'Com' value in [Com] section for non-Windows

Description:

This change removes the 'Com' value from the [Com] section of the .cpf file on non-Windows platforms. This field applies only to Windows platforms.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Cluster Specific
Platforms: All
DevKey: HYY1291
Summary: Fix journal corruption during ECP cluster failover

Description:

This change corrects issues that could cause journal corruption during ECP cluster failover under certain circumstances.

As a rule of thumb, if the journal file from the failed cluster node contains big string data ("big" meaning 128 or more characters long) or BITSET records, the current journal file on the cluster node performing ECP failover could become corrupted.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.I/O
Platforms: All
DevKey: SAP766
Summary: Orphaned process stuck in CLOSH

Description:

An orphaned process could repeatedly become stuck in CLOSH status, with Caché spinning at the AIX level. This has been corrected.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1094
Summary: Shorten period of retrying failed journal I/O before disabling journaling

Description:

In the scenario where Caché is configured NOT to freeze on journal I/O error, journaling will get disabled when one of the following conditions is met:

  1. the failed I/O has been tried for a certain period of time;
  2. journal buffers are full;
  3. the number of available global buffers falls below a minimum.

Prior to this change, the retry period in condition #1 was 10 minutes, longer than the time the write daemon would wait (5 minutes by default) for journal synchronization to complete before declaring WDSTOP (stop the write daemon). As a result, if neither of the last two conditions was met, the write daemon would declare WDSTOP before the journal daemon disabled journaling, causing the system to hang until the journal daemon disabled journaling 5 minutes later.

With this change, the retry period in condition #1 is set to half of the write daemon's wait time. This guarantees that journaling is always disabled before the write daemon's wait times out, preventing the system from hanging.
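The timing argument can be checked with the numbers given above. This small Python sketch assumes the default 5-minute write daemon wait and only compares the two timeouts. It ignores conditions #2 and #3, and the function name is hypothetical.

```python
WD_WAIT = 5 * 60  # seconds: default write daemon wait for journal sync


def first_event(retry_period: float, wd_wait: float = WD_WAIT) -> str:
    """Which timeout fires first when conditions #2 and #3 are not met."""
    if retry_period < wd_wait:
        return "journaling disabled"
    return "WDSTOP declared"


old_retry = 10 * 60        # before the change: fixed 10-minute retry period
new_retry = WD_WAIT / 2    # after the change: half the write daemon's wait

print(first_event(old_retry))  # WDSTOP declared
print(first_event(new_retry))  # journaling disabled
print((old_retry - WD_WAIT) / 60, "minutes of hang before the fix")
```

Tying the retry period to the write daemon's wait, rather than using a fixed value, keeps the ordering correct even if that wait is configured to something other than the default.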


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1275
Summary: Validate journal file paths properly and allow journal switch to exit as cleanly as possible upon an unexpected error

Description:

This change addresses an issue that could cause the system to hang after a failed journal switch if

It could also cause a buffer overflow with unpredictable results.


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1288
Summary: Add an API for purging all journal files except for those required for transaction rollback and crash recovery

Description:

This change adds an API for purging all journal files except those required for transaction rollbacks or crash recovery. The method is:
##class(%SYS.Journal.File).PurgeAll()
A crash recovery refers to the journal restore performed as part of Caché startup or cluster failover recovery.
Warning: Post-backup journal files are not necessarily preserved. Therefore, if you want to be able to restore databases from backups and subsequent journal files, you should configure the journal purging parameter based on backups and use the regular purging API (PURGE^JRNUTIL). There is also an API for purging journal files based on criteria different from those in the Caché configuration. See the documentation on the Purge method in the class %SYS.Journal.File.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement Yes

 



Category: System.Journaling
Platforms: All
DevKey: HYY1301
Summary: Add options to purge journal files to ^JOURNAL

Description:

^JOURNAL now provides a front-end interface to two options for purging journal files, one of which purges all journal files except those required for transaction rollback or crash recovery.

For example,

%SYS>DO ^JOURNAL

1) Begin Journaling (^JRNSTART)
2) Stop Journaling (^JRNSTOP)
3) Switch Journal File (^JRNSWTCH)
4) Restore Globals From Journal (^JRNRESTO)
5) Display Journal File (^JRNDUMP)
6) Purge Journal Files (PURGE^JOURNAL)
7) Edit Journal Properties (^JRNOPTS)
8) Activate or Deactivate Journal Encryption (ENCRYPT^JOURNAL())
9) Display Journal status (Status^JOURNAL)

Option? 6

1) Purge any journal NOT required for transaction rollback or crash recovery
2) Purge journals based on existing criteria (2 days or 2 backups)

Option? 1

The following files have been purged (listed from latest to oldest):

   1. /scratch1/yang/cache/lx4.72u/mgr/journal/20070222.007

1) Begin Journaling (^JRNSTART)
2) Stop Journaling (^JRNSTOP)
3) Switch Journal File (^JRNSWTCH)
4) Restore Globals From Journal (^JRNRESTO)
5) Display Journal File (^JRNDUMP)
6) Purge Journal Files (PURGE^JOURNAL)
7) Edit Journal Properties (^JRNOPTS)
8) Activate or Deactivate Journal Encryption (ENCRYPT^JOURNAL())
9) Display Journal status (Status^JOURNAL)

Option?

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1325
Summary: Abort startup if journal restore couldn't locate next journal file

Description:

This change has two effects:

  1. Caché startup will be aborted if journal recovery encounters an error locating the next file to restore. Previously, journal recovery would stop without reporting an error, and startup would continue.
  2. Journal recovery will locate the correct next file in the alternate journal directory even if a bad file with the same name exists in the primary journal directory. The bad file could be the result of a journal switch left incomplete by a disk error. (Normally, a journal switch deletes the incomplete file if it fails partway through writing it, but in some circumstances, such as the disk becoming inaccessible, it cannot.)
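The lookup order in item 2 can be sketched as follows. This is a Python illustration, not Caché code. It makes two simplifying assumptions. First, each journal file's header is modeled as a dictionary entry mapping the file's path to the previous-file path recorded in it. Second, the name of the next file is passed in directly. The names locate_next_file and JournalLookupError are hypothetical.

```python
class JournalLookupError(Exception):
    """Hypothetical: no valid next journal file was found, so startup aborts."""


def locate_next_file(current, next_name, primary_dir, alternate_dir, prev_pointer):
    """Return the path of the journal file that follows `current`.

    prev_pointer maps each existing journal file path to the previous-file
    path recorded in its header; a path missing from the map does not exist.
    """
    for directory in (primary_dir, alternate_dir):
        candidate = f"{directory}/{next_name}"
        if candidate not in prev_pointer:
            continue  # no file with that name in this directory
        if prev_pointer[candidate] == current:
            return candidate  # header confirms this file follows `current`
        # Otherwise this is a bad namesake file (for example, one left behind
        # by an incomplete journal switch): skip it and try the next directory.
    # Fail loudly instead of silently ending recovery (item 1 above).
    raise JournalLookupError(f"cannot locate the journal file that follows {current}")
```

Checking the previous-file pointer before accepting a candidate is what lets the lookup pass over a bad file in the primary directory rather than stopping at it.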

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1329
Summary: Freeze journals on <SYSTEM> error in journal switch when requested

Description:

When a Caché system is configured to freeze on journal errors, a subsequent successful journal switch could unfreeze it. This is usually the desired behavior, unless the journal error has caused journal data to be lost. In that case, user intervention is necessary to return the system to an appropriate state, and until then the system should remain frozen despite any journal switch attempts.

This change enforces that behavior when journal data may have been lost because of a fatal error during a journal switch. If such an error occurs, a message is written to the console log.

Prior to this change, a journal switch that aborted with a fatal <SYSTEM> error would leave behind an incomplete file.
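The freeze rule described above can be modeled as a small state machine. The Python class below is a hypothetical illustration of that rule only. It does not reflect how Caché stores or checks its freeze state, and the names JournalFreezeState, on_journal_error and on_successful_switch are invented for the example.

```python
class JournalFreezeState:
    """Hypothetical model of freeze-on-journal-error behavior."""

    def __init__(self):
        self.frozen = False
        self.journal_data_lost = False

    def on_journal_error(self, data_lost):
        # With freeze-on-error configured, any journal error freezes the system.
        self.frozen = True
        # Once journal data may have been lost, that fact is sticky.
        self.journal_data_lost = self.journal_data_lost or data_lost

    def on_successful_switch(self):
        # A successful switch unfreezes the system only if no journal data
        # was lost; otherwise it stays frozen until a user intervenes.
        if not self.journal_data_lost:
            self.frozen = False
```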


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1345
Summary: Disable sorting in journal restore on fixed cachetemp max size

Description:

This change addresses various issues that, in certain circumstances, could cause journal restore to fail or produce incorrect results on Caché systems where the cachetemp maximum size is fixed.

Journal restore at startup or during cluster failover was not affected, and the long string support setting had no bearing on these issues.

As of this change, sorting in journal restore is disabled if the cachetemp maximum size is fixed (as opposed to unlimited, that is, 0).


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1352
Summary: Make external journal switch a simple roll to next journal file if there is no journal property change to activate

Description:

With this change, simple journal rollovers, regardless of who initiates them, are no longer logged in the journal history global (^%SYS("JOURNAL","HISTORY")). This is consistent with the behavior of the journal daemon.

Journal switches that activate changes to journal properties, such as the primary and alternate directories, are still logged in the journal history global to facilitate searching for journal files.

All forms of journal switches are still logged in cconsole.log.


Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1359
Summary: Journal $BITSET with correct old value and bit length

Description:

This change addresses an issue that, in certain circumstances, could cause a $BITSET operation within a transaction to be journaled with an incorrect old value and old bit length. As a result, incorrect values would be restored if the transaction was rolled back.
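The following sketch shows why both the old value and the old bit length must be journaled. Setting a bit past the end of a bit string lengthens it, so an undo record that restored only the old bit value would leave the string longer than it was before the transaction. This is a Python model, not Caché's journal format. Bit strings are represented as lists of 0/1, and positions are 1-based, which is an assumption modeled on ObjectScript's $BIT. The function names are hypothetical.

```python
def bitset(bits, position, value, journal):
    """Set bit `position` (1-based) to `value`, padding with zero bits if
    needed, after appending an undo record to `journal`."""
    old_length = len(bits)
    old_value = bits[position - 1] if position <= old_length else 0
    # Record the old value AND the old length before changing anything.
    journal.append((position, old_value, old_length))
    if position > old_length:
        bits.extend([0] * (position - old_length))
    bits[position - 1] = value


def rollback(bits, journal):
    """Undo journaled bit sets, newest first."""
    while journal:
        position, old_value, old_length = journal.pop()
        bits[position - 1] = old_value
        del bits[old_length:]  # restore the original bit length
```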

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: HYY1363
Summary: "Canonicalize" input journal file path of NEXTJRN0^JRNUTIL on Windows

Description:

This change corrects an issue that could cause shadowing from a Windows source to stall if the source journal file path contains uppercase characters. On the Windows source, errors like the following may appear in cconsole.log (and in the shadow source error log):
NEXTJRN: -98,'D:\CacheSys\Mgr\Journal\20071221.003' appears to be the
next file of 'D:\CacheSys\Mgr\Journal\20071221.002' but contains a
pointer to a different previous file from
'D:\CacheSys\Mgr\Journal\20071221.002' and thus couldn't be the next
file
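Windows file names are case-insensitive, so two spellings of the same journal file path must be canonicalized before they are compared. This change canonicalizes the input path of NEXTJRN0^JRNUTIL for that reason. The general canonicalize-then-compare technique can be sketched in Python with the standard ntpath module. This is not the NEXTJRN0^JRNUTIL code, and the function name same_journal_file is hypothetical.

```python
import ntpath


def same_journal_file(path_a, path_b):
    """Compare two Windows paths as the Windows file system does: ignoring
    letter case and treating / and \\ as the same separator."""
    def canonical(path):
        return ntpath.normcase(ntpath.normpath(path))
    return canonical(path_a) == canonical(path_b)
```

A plain string comparison of 'D:\CacheSys\Mgr\Journal\20071221.002' with 'd:\cachesys\mgr\journal\20071221.002' fails, whereas same_journal_file treats them as the same file.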

Likelihood Low
Risk Low
Ad Hoc No
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: JO2156
Summary: Prohibit journal switch when WDJRNFREEZE is set in WDSTOP

Description:

An error has been corrected where Caché permitted switching journal files after an fsync error had occurred on a system configured to freeze on journal errors.

When an fsync error occurs, journal data has probably been lost, so switching journal files would resume the system with incomplete journal files. If the system is a shadow source, the shadow copy will probably be missing some sets and/or kills. Restarting Caché or stopping journaling is the only way to recover from this situation.

Restarting Caché ensures that the shadow server and its clients remain synchronized, because the server that had the fsync problem will not have written to disk any database blocks whose supporting journal records are missing from the journal file.

Stopping journaling allows the system to continue running with a gap in the journal (that is, if the system is later restarted, some information will be missing).


Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: RJW1278
Summary: Edit journal properties even when journaling is off

Description:

With this change, ^JRNOPTS permits modification of journal options even when journaling is stopped.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: RJW1336
Summary: Re-enable formal support for journal prefixes

Description:

Journal file prefixes are again supported by the configuration utilities. The journal file prefix can be set or modified using the ^JRNOPTS utility (available from ^JOURNAL), or the Management Portal -> System Administration -> Configuration -> Journal Settings page. Any change to the journal file prefix requires a restart of Caché to become active.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: RJW1343
Summary: Journal restore should "multithread" when all directories are selected

Description:

Journal restore will now use multiple processes on multi-processor systems when all directories and globals are selected.

Likelihood Low
Risk Low
Ad Hoc Yes
Enhancement No

 



Category: System.Journaling
Platforms: All
DevKey: RJW1418
Summary: Running ^JRNSWTCH can reduce maximum journal size

Description:

A problem has been corrected where