The Components of Caché

Resiliency

Providing database resiliency is a balancing act among hardware costs, recovery time after system outages, and the number of transactions that might be “lost” in an outage. Caché includes capabilities such as clustering and shadowing that let system architects design the system that best fits the needs and resources of their organization.

Outages happen. Some are planned, such as when a system is brought down for reconfiguration or upgrades. Some are unplanned, perhaps caused by a glitch or someone tripping over a power cord. Some can be classified as disasters, such as a fire in the building or a neighborhood-wide power loss. Resiliency refers to a system’s ability to adapt to and recover from outages. By providing a variety of resiliency features, Caché gives system architects options for balancing cost, system availability, and database state retention.

Clustering for High Availability in the Face of Partial Outages

In a database cluster, two or more servers can access the same database. They are configured so that if one server fails, the other members of the cluster take on the jobs it was performing. Users will notice little or no interruption of operations, depending on the system architecture.

One common (and fairly cost-efficient) architecture is the failover cluster, in which multiple data servers share access to a database, but only one is active at a time. If the active server fails, the standby server automatically takes over. Depending on the operating system, the failover may take as much as a few minutes. Caché’s Enterprise Cache Protocol (ECP) provides a unique advantage in this kind of system because transactions, although they may be delayed, will not be lost.

Failover Cluster
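
The "delayed, but not lost" behavior described above can be illustrated with a toy model. This is a minimal sketch, not Caché's or ECP's actual mechanism: the class `FailoverCluster`, the server names, and the `pending` queue are all hypothetical, and the only assumption is that transactions submitted during an outage are held until a server is active again.

```python
from collections import deque


class FailoverCluster:
    """Toy model: one active server, one standby, one shared database."""

    def __init__(self):
        self.database = []          # shared storage that both servers can reach
        self.active = "server_a"    # hypothetical server names
        self.standby = "server_b"
        self.active_up = True
        self.pending = deque()      # transactions waiting for a live server

    def submit(self, txn):
        """Queue the transaction, then apply everything that can be applied."""
        self.pending.append(txn)
        self._drain()

    def fail_active(self):
        """Simulate an outage of the active server."""
        self.active_up = False

    def fail_over(self):
        """Promote the standby, then apply the delayed transactions."""
        self.active, self.standby = self.standby, self.active
        self.active_up = True
        self._drain()

    def _drain(self):
        # Apply queued transactions in submission order while a server is up.
        while self.active_up and self.pending:
            self.database.append((self.active, self.pending.popleft()))
```

In this sketch, a transaction submitted between `fail_active()` and `fail_over()` stays in the queue and is applied by the newly promoted server, so it arrives late but in its original order.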

Another type of cluster is the shared database cluster, in which all members of the cluster can simultaneously access the database. This architecture keeps the database available if a member fails, but may require additional hardware or administration. Caché currently* supports shared database clusters on the Alpha OpenVMS and Tru64 UNIX platforms.

Shadowing for State Recovery in the Face of Disaster

On those rare occasions when disaster strikes and the database is lost or corrupted, the question becomes one of data restoration. Specifically, how current is the most recent backup copy of the database? Backup copies may be hours, days, or even weeks old, and they may take a long time to reload. For that reason, some systems use shadow servers to maintain a duplicate database in real time. Depending on the system architecture, the latency (the time between when a change occurs in the primary database and when it occurs in the shadow database) typically runs from a fraction of a second to a few seconds. Caché can measure the latency in a shadowed system. It also includes provisions for rolling back transactions that have not been shadowed when a failure occurs.

Shadowing
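
To make the definition of latency concrete, the sketch below computes it per transaction and lists the transactions that had not reached the shadow when a failure occurred. It is an illustration only and does not use any Caché API: `JournalRecord`, its fields, and both functions are hypothetical names, and the timestamps are assumed to come from a shared clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalRecord:
    txn_id: int
    committed_at: float  # seconds; time the change occurred on the primary


def shadow_latencies(records, applied_at):
    """Map each shadowed txn_id to (time applied on shadow - time on primary).

    applied_at: dict of txn_id -> time the shadow applied the change.
    """
    return {
        r.txn_id: applied_at[r.txn_id] - r.committed_at
        for r in records
        if r.txn_id in applied_at
    }


def unshadowed(records, applied_at):
    """Transaction ids that never reached the shadow before the failure."""
    return [r.txn_id for r in records if r.txn_id not in applied_at]
```

The list returned by `unshadowed` corresponds to the transactions that the text describes as candidates for rollback after a failure.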

Shadowing can be used with a clustered system, thus providing both availability and database state retention. In a shadowed cluster, each member of the cluster writes to a separate journal. The journals are transferred to the shadow server. When they are dejournaled, the instructions are applied in order according to the cluster journal sequence number.
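
Applying several per-member journals in cluster sequence order amounts to a merge of sorted streams. The following is a minimal sketch under stated assumptions, not Caché's journal format: each record is a hypothetical tuple `(sequence_number, member, (key, value))`, each member's journal is already sorted by sequence number, and the database is modeled as a plain dictionary.

```python
import heapq
from operator import itemgetter


def merge_cluster_journals(journals):
    """Merge per-member journals into one stream ordered by sequence number.

    journals: list of lists of (sequence_number, member, (key, value)),
    each list already sorted by sequence_number.
    """
    return list(heapq.merge(*journals, key=itemgetter(0)))


def dejournal(journals, database):
    """Apply every journaled update to `database` in cluster sequence order."""
    for _seq, _member, (key, value) in merge_cluster_journals(journals):
        database[key] = value
    return database
```

Because the updates are replayed in sequence order, the last write to a key wins on the shadow just as it did on the cluster, regardless of which member issued it.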

Avoiding Downtime due to Planned Outages

One often-overlooked aspect of resiliency is the ability to avoid planned outages. Caché system management capabilities allow many modifications of the database and system configuration to be made without requiring a reboot or other interruption of operation.

* Support on other platforms is planned. Check with your account representative for availability.