Remote-replication (DR)

The following workflows are intended as a GO-TO guide which outlines the basic steps for initial appliance and grid configuration. More detailed information on each specific area can be found in the Administrators Guide.


Volume & Share Remote-Replication (DR Setup) Overview

QuantaStor supports both Storage Volume (SAN) and Network Share (NAS) remote-replication from appliance to appliance, both on a recurring schedule and as one-time instant replication for data migration purposes. Remote-replication is done asynchronously, which means that changes/deltas to source storage volumes and network shares are replicated to targets as frequently as every few minutes with an interval based schedule, or at specific hours with a calendar based schedule.

Once a given set of volumes and/or network shares has been replicated from one system to another (full copy), the subsequent periodic replication operations send only the changes (incremental copy), and all information sent over the network is compressed and encrypted. The overhead of the compression and encryption is minimal (typically 15-20%) as QuantaStor leverages the AES-NI features of modern CPUs to offload the heavy lifting of encryption and decryption.

Note: Older QuantaStor systems using XFS based storage pools do not have access to the advanced replication features found in the ZFS based Storage Pools which were introduced in QuantaStor v3. XFS based Storage Pools only support basic rsync style replication of data. We recommend migrating older systems to the latest version of QuantaStor so that the newer, more robust ZFS technology and its highly efficient replication mechanisms may be utilized.

Minimum Requirements for Remote-replication Setup

  • 2x QuantaStor storage appliances each with a Storage Pool (ZFS based)
  • Storage pools do not need to be the same size, and the hardware and disk types on the appliances can be asymmetrical (non-matching pool sizes and hardware configurations).
  • Replication may be cascaded across many appliances from pool to pool.
  • Replication may be configured to be N-way, replicating from one-to-many or many-to-one appliance.
  • Replication is incremental/delta based, so only the changes are sent and only for actual data blocks; empty space is not transmitted.
  • Replication is supported for both Storage Volumes and Network Shares.
  • Replication interval may be set to as low as 5 minutes for interval based schedule configurations or scheduled to run at specific hours on specific days.
  • All data is AES 256 encrypted on the wire and leverages AES-NI chipset features to accelerate encryption/decryption performance by roughly 8x. Typical performance overhead due to encryption is 20%.

Setup Process Overview

  1. Select the Remote Replication tab and choose 'Create Storage System Link'. This will exchange keys between the two appliances so that a replication schedule can be created. You can create an unlimited number of links. The link also stores information about the ports to be used for remote-replication traffic.
  2. Select the Volume & Share Replication Schedules section and choose Create in the toolbar to bring up the dialog to create a new remote replication schedule.
    1. Select the replication link that will indicate the direction of replication.
    2. Select the storage pool on the destination system where the replicated shares and volumes will reside.
    3. Select the times of day or interval at which replication will be run.
    4. Select the volumes and shares to be replicated.
    5. Click OK to create the schedule.
  3. Interval based replication schedules start momentarily after creation; alternatively, one may test a schedule by using Trigger Schedule to start it immediately.

Diagram of Completed Configuration

Osn dr workflow.png

Enabling Replication Support between Appliances

The first step in setting up remote-replication is to establish a Storage System Link between two appliances in the storage grid. One must have at least two nodes (storage appliances) configured into a QuantaStor storage grid in order to set up remote replication. QuantaStor's storage grid communication mechanism connects appliances (nodes) together so that they can share information and coordinate activities like remote-replication and high-availability features, while simplifying automation and management operations. After the storage grid has been set up, a Storage System Link may be created. The Storage System Link represents a low-level security key exchange between the two nodes so that they may transmit data between pools, and the link also specifies which network interface should be used for the transmission of data across the link.

Creating a Storage System Link for Replication

Create Storage Link.png

Creation of the Storage System Link may be done through the QuantaStor Manager web interface by selecting the Remote Replication tab, and then choosing the 'Create Storage System Link' button in the toolbar. Select the IP address on each system to be utilized for communication of remote replication network traffic.

Qs sys link created.png

Once the links have been created they'll appear as two separate directional links in the web user interface as shown in the above screenshot.

Configuring Storage System Links for High-Availability Configurations

Storage System Links are bi-directional and deletion of a link in one direction will automatically delete the link in the reverse direction. Remote-replication links, which maintain the replication status information between network shares and storage volumes, are unaffected by the deletion of Storage System Links, but if no valid storage system link is available when a Remote Replication Schedule is activated then an alert notice will be raised. The HA failover system is designed to work in tandem with the DR system, so in the event that a Storage Pool is manually or automatically failed-over to another appliance, the scheduler will automatically select and use the appropriate Storage System Link pair to replicate between the designated pools. Note though that one must establish all the necessary storage system links so that remote replication may continue uninterrupted. For example, if one has appliances A & B configured with an HA pool which has volumes replicating to an HA pool managed by appliances C & D, then four Storage System Link pairs must be set up. Specifically A <--> C, A <--> D, B <--> C, B <--> D, so that no matter how the source and destination pools are moved between node pairs the remote replication schedule will be able to continue replication normally.

Modifying a Storage System Link

Use the modify dialog to adjust the Storage System Link settings as needed. Replication schedules using the link will continue normally per the new settings, activating automatically at the next scheduled activation point.

Qs sys link modify.png

Deleting a Storage System Link

Qs sys link delete.png

Creating a one-time Remote Replica

Once a Storage System Link pair has been established, replication features for volumes and network shares between the appliances become available. Instant replication of volumes and shares is accessible by right-clicking on a given volume or share to be replicated, then choosing 'Create Remote Replica...' from the pop-up menu. Creating a remote replica is much like creating a local clone, only the data is being copied over to a storage pool in a remote storage system.

When selecting replication options, first select the destination storage system to replicate to (only systems which have established and online storage system links will be displayed), then select the destination storage pool within that system which should be utilized to hold the remote replica.

If the volume or share has been previously replicated then an incremental diff replication is optimal, but one can force a complete replication as well; these complete replicas will be numbered at the destination with a suffix like .1_chkpnt, .2_chkpnt and so on. Incremental replication from that point will show the destination check-point with a GMT based timestamp in the name of the check-point.

Creating Remote Replication Schedules (DR Policies)

Remote replication schedules provide a mechanism for replicating the changes to your volumes to a matching checkpoint volume on a remote appliance automatically on a timer or a fixed schedule. To create a schedule navigate to the Remote Replication Schedules section after selecting the Remote Replication tab at the top of the screen. Right-click on the section header and choose 'Create Replication Schedule'.

Qs remote replication sched create.png

Besides selecting the volumes and/or shares to be replicated, you must select the number of snapshot checkpoints to be maintained on the local and remote systems. You can use these snapshots for off-host backup and other data recovery purposes as well, so there is no need to have a Snapshot Schedule which would be redundant with the snapshots created by your replication schedule. If you choose a Max Replicas of 5 then up to 5 snapshot checkpoints will be retained. If, for example, you were replicating nightly at 1am each day of the week from Monday to Friday, then you would have a week's worth of snapshots as data recovery points. If you are replicating 4 times each day and need a week of snapshots then you would need 5x4, or a Max Replicas setting of 20.
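
The Max Replicas setting can be sized with a simple retention calculation. The lines below restate the example above; substitute your own replication frequency and retention period as needed.

  Max Replicas = replications per day x days of retention
               = 4 x 5 (Monday through Friday)
               = 20 snapshot checkpoints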

Interval based Remote Replication Schedules

Replication can be done on a schedule at specific times and days of the week, or can be done continuously on an interval. The interval represents the amount of time between replication activities where no replication is done. The system automatically detects active schedules and will not activate a given schedule again while any volumes or shares are still being replicated. As such, the replication schedule interval is more of a delay interval between replications and can be safely set to a low value. Since only the changes are replicated, it is often better to replicate more frequently throughout the day rather than all at once at the end of the day.

Qs remote replication sched interval.png

Modifying Remote Replication Schedules (DR Policies)

Schedules may be adjusted at any time by right-clicking on the schedule, then choosing Modify Schedule... to make changes.

Qs remote replication sched modify.png

Remote Replication Bandwidth Throttling

WAN links are often limited in bandwidth, typically ranging from 2-60MBytes/sec for on-premises deployments and 20-100MBytes/sec or higher in datacenters, depending on the service provider. QuantaStor does automatic load balancing of replication activities to limit the impact on active workloads and to limit the use of your available WAN or LAN bandwidth. By default QuantaStor comes pre-configured to limit replication bandwidth to 50MB/sec, but you can increase or decrease this to better match the bandwidth and network throughput limits of your environment. This default works well for datacenter deployments, but hybrid cloud deployments where data is replicating to/from an on-premises site should be configured to take up no more than 50% of your available WAN bandwidth so as to not disrupt other activities and workloads.
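
As an illustration of that 50% guideline, the figures below size the limit for a hypothetical 1Gbps WAN link; the link speed and resulting value are example figures, not recommendations, and the limit is applied with the qs-util command covered below.

  1Gbps WAN link          ~= 125MBytes/sec theoretical maximum
  50% of available WAN    ~= 60MBytes/sec replication limit
  sudo qs-util rratelimitset 60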

Here are the CLI commands available for adjusting the replication rate limit. To get the current limit use 'qs-util rratelimitget', and to set the rate limit to a new value (for example, 4MB/sec) use 'qs-util rratelimitset 4'.

  Replication Load Balancing
    qs-util rratelimitget            : Current max bandwidth available for all remote replication streams.
    qs-util rratelimitset NN         : Sets the max bandwidth available in MB/sec across all replication streams.
    qs-util rraterebalance           : Rebalances all active replication streams to evenly share the configured limit.
                                       Example: If the rratelimit (NN) is set to 100 (MB/sec) and there are 5 active
                                       replication streams then each stream will be limited to 20MBytes/sec (100/5)
                                       QuantaStor automatically rebalances replication streams every minute unless
                                       the file /etc/rratelimit.disable is present.

To run the above-mentioned commands you must log in to your storage appliance via SSH or via the console. Here's an example of setting the rate limit to 50MB/sec.

sudo qs-util rratelimitset 50

At any given time you can adjust the rate limit and all active replication jobs will automatically adjust to the new limit within a minute. This means that you can dynamically adjust the rate limit using the 'qs-util rratelimitset NN' command to set different replication rates for different times of day and days of the week using a cron job. If you need that functionality and need help configuring cron to run the 'qs-util rratelimitset NN' command, please contact Customer Support.
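
As a sketch of that approach, the root crontab entries below throttle replication to 10MB/sec during weekday business hours and restore 50MB/sec each evening. The times, rates, and the /usr/bin/qs-util path are example values; adjust them for your appliance and environment.

  # Edit the root crontab with: sudo crontab -e
  # 8am Monday-Friday: throttle replication to 10MB/sec during business hours
  0 8 * * 1-5   /usr/bin/qs-util rratelimitset 10
  # 6pm Monday-Friday: raise the limit back to 50MB/sec for overnight replication
  0 18 * * 1-5  /usr/bin/qs-util rratelimitset 50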


Using Replica Checkpoints for Disaster Recovery

Using Storage Volume Checkpoints

Checkpoint volumes have the suffix _chkpnt, typically followed by a GMT timestamp. At the completion of each replication cycle the parent check-point volume, which has no GMT timestamp suffix, will contain all the latest information from the last transfer. In the event of a failure of a primary node one need only access a given check-point Storage Volume and it will switch to the Active Replica Checkpoint state. This indicates that the check-point volume may have been used or written to by users, so care should be taken to roll that information back to the primary side using Replica Rollback... as part of the DR fail-back process to preserve any changes.

Using Network Share Checkpoints

Network Share check-points also carry the _chkpnt suffix, just as with Storage Volume replica check-points, and this can create some challenges if users are expecting to find their data under a share with the same name. For example, a share named backups will have the name backups_chkpnt at the destination / DR site appliance. There is an easy way to resolve this through the use of Network Share Aliases. Simply right-click on the check-point for the Network Share (e.g. backups_chkpnt) and choose Create Alias/Sub-share..., which allows one to assign alternate names to a share. After the alias is created as backups, the share will be accessible both as backups and backups_chkpnt. If users find the dual-naming confusing then the browsable setting may be adjusted on backups_chkpnt to hide it from users using the Modify Share dialog.
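
To illustrate the result using the backups example above, and assuming a hypothetical DR appliance named dr-appliance, CIFS clients would see both names after the alias is created:

  \\dr-appliance\backups          <- alias created on the replicated check-point share
  \\dr-appliance\backups_chkpnt   <- original check-point share (may be hidden via the browsable setting)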

Clearing the Active Replica Checkpoint status on a Storage Volume

  1. Ensure that all client I/O to the current source Storage Volume or Network Share has been stopped and that one final replication of any data modified since the last replication has been performed using the replication links/schedules.
  2. Remove all Hosts and Host Group Associations from the source Storage Volume.
  3. Right-click on the Replication Schedule associated with the source and destination Storage Volume/Network Share and click 'Delete Schedule'.
  4. Right-click on the Replication Link associated with the source and destination Storage Volume/Network Share and select the 'Delete Replica Association' option, which will open the 'Delete Remote Replication Link' dialog. Use the defaults in this dialog and click 'OK'.

Delete Remote Replication Link.png

At this stage there is no longer a replication link or association between the source and the destination _chkpnt Storage Volume/Network Share. Both the original source and the destination _chkpnt Storage Volume/Network Share can be renamed using the Modify Storage Volume or Modify Network Share dialogs and mapped for client access as required.

Please note: If you are looking to use the same name for the _chkpnt Storage Volume/Network Share as used on the Source system, and the Source QuantaStor appliance is offline/unavailable, you may need to remove it from the grid at this stage as it will not be accessible to perform the rename operation using the Modify Storage Volume or Modify Network Share dialog. In this event, after removal of the offline QuantaStor node from the grid, you can skip directly to Step B below.

Renaming the _chkpnt Storage Volume/Network Share to be the same as the original Source Storage Volume/Network Share.

Step A) Right-click on the original Storage Volume/Network Share and choose the 'Modify Storage Volume' or 'Modify Network Share' option. In the dialog box, rename the Storage Volume or Network Share to add '_bak' or any other unique postfix to the end and click 'OK'. Once you are done with the Promotion/Migration you can remove this backup (_bak) version and its associated snapshots. Our multi-delete feature is useful for this sort of batch deletion process.

Example screenshot below showing the Modify Storage Volume dialog renaming the source Storage Volume to _bak.

Modify Storage Volume rename bak.png

Step B) Right-click on the replicated _chkpnt Storage Volume/Network Share and choose the 'Modify Storage Volume' or 'Modify Network Share' option. In the dialog box, rename the Storage Volume or Network Share as you see fit and click 'OK'.

Example screenshot below showing the Modify Storage Volume dialog renaming the destination _chkpnt Storage Volume to the name originally used by the Source volume.

Modify Storage Volume rename.png

Step C) Map client access to the Promoted Storage Volume / Network Share

For Storage Volumes, map LUN access to your clients using the Host or Host Groups option detailed here: Managing Hosts

For Network Shares, map them out using the CIFS/NFS access permissions as detailed here: Managing Network Shares

Please note: If this procedure was performed for disaster recovery of a failed Primary QuantaStor node, then once the original Primary node is brought back online, the old out-of-date Storage Volume/Network Share will need to be renamed with a '_bak' or your preferred postfix (or removed to free up space) before the node is re-added to the grid. Replication can then be configured from the new Primary Source QuantaStor appliance to the recovered QuantaStor appliance in its role as a Secondary replication destination target.