Remote-replication (DR)

The following workflows are intended as a go-to guide outlining the basic steps for initial appliance and grid configuration.  More detailed information on each specific area can be found in the Administrators Guide.
=== Volume & Share Remote-Replication (DR Setup) Overview ===

QuantaStor supports both Storage Volume (SAN) and Network Share (NAS) remote-replication from appliance to appliance, both on a recurring schedule and as a one-time instant replication for data migration purposes.  Remote-replication is asynchronous: changes/deltas to source storage volumes and network shares are replicated to targets as frequently as every few minutes with an interval based schedule, or at specific hours with a calendar based schedule.

Once a given set of volumes and/or network shares has been replicated from one system to another (full copy), subsequent periodic replication operations send only the changes (incremental copy), and all information sent over the network is compressed and encrypted.  The overhead of the compression and encryption is minimal (typically 15-20%) because QuantaStor leverages the AES-NI features of modern CPUs to offload the heavy lifting of encryption and decryption.

: ''Note: Older QuantaStor systems using XFS based storage pools do not have access to the advanced replication features found in the ZFS based Storage Pools introduced in QuantaStor v3.  XFS based Storage Pools support only basic rsync style replication of data.  We recommend migrating older systems to the latest version of QuantaStor so that the newer, more robust ZFS technology and its highly efficient replication mechanisms may be utilized.''

=== Minimum Requirements for Remote-replication Setup ===

* 2x QuantaStor storage appliances, each with a ''Storage Pool'' (ZFS based)
* Storage pools do not need to be the same size, and the hardware and disk types on the appliances can be asymmetrical (non-matching pool sizes and hardware configurations).
* Replication may be cascaded across many appliances from pool to pool.
* Replication may be configured to be N-way, replicating from one appliance to many or from many to one.
* Replication is incremental/delta based, so only changes to actual data blocks are sent; empty space is not transmitted.
* Replication is supported for both Storage Volumes and Network Shares.
* The replication interval may be set as low as 5 minutes for interval based schedules, or replication may be scheduled to run at specific hours on specific days.
* All data is AES-256 encrypted on the wire, and QuantaStor leverages AES-NI chipset features to accelerate encryption/decryption performance by roughly 8x.  Typical performance overhead due to encryption is 20%.

=== Setup Process Overview ===

# Select the ''Remote Replication'' tab and choose 'Create Storage System Link'.  This exchanges keys between the two appliances so that a replication schedule can be created.  You can create an unlimited number of links.  The link also stores information about the ports to be used for remote-replication traffic.
# Select the ''Volume & Share Replication Schedules'' section and choose ''Create'' in the toolbar to bring up the dialog to [[Remote-replication_/_Disaster_Recovery_Setup#Creating_a_Storage_System_Link_for_Replication | create a new remote replication schedule]].
## Select the replication link, which determines the direction of replication.
## Select the storage pool on the destination system where the replicated shares and volumes will reside.
## Select the times of day or the interval at which replication will run.
## Select the volumes and shares to be replicated.
## Click OK to create the schedule.
# Interval based replication schedules start momentarily after creation; schedules that run at specific times of day will not trigger until that time.  Either way, one may test the schedule by using ''Trigger Schedule'' to start it immediately.

=== Diagram of Completed Configuration ===

[[File:osn_dr_workflow.png|700px]]

=== Enabling Replication Support between Appliances ===

The first step in setting up remote-replication is to establish a ''Storage System Link'' between two appliances in the storage grid.  One must have at least two nodes (storage appliances) configured into a QuantaStor storage grid ([http://wiki.osnexus.com/index.php?title=QuantaStor_Administrators_Guide#Grid_Setup_Procedure link]) in order to set up remote replication.  QuantaStor's storage grid communication mechanism connects appliances (nodes) together so that they can share information and coordinate activities like remote-replication and high-availability, while simplifying automation and management operations.  After the storage grid has been set up, Storage System Links may be created.  A Storage System Link represents a low-level security key exchange between two nodes so that they may transmit data between pools; the link also specifies which network interface should be used for transmission of data across the link.

==== Creating a Storage System Link for Replication ====

[[File:Create Storage Link.png|1100px]]

Creation of the Storage System Link may be done through the QuantaStor Manager web interface by selecting the ''Remote Replication'' tab and then choosing the 'Create Storage System Link' button in the toolbar.  Select the IP address on each system to be utilized for remote replication network traffic.

[[File:qs_sys_link_created.png]]

Once the links have been created they'll appear as two separate directional links in the web user interface as shown in the above screenshot.

==== Configuring Storage System Links for High-Availability Configurations ====

Storage System Links are bi-directional, and deletion of a link in one direction will automatically delete the link in the reverse direction.  Remote-replication links, which maintain the replication status information between network shares and storage volumes, are unaffected by the deletion of Storage System Links, but if no valid storage system link is available when a ''Remote Replication Schedule'' is activated then an alert notice will be raised.  The HA failover system is designed to work in tandem with the DR system: in the event that a ''Storage Pool'' is manually or automatically failed-over to another appliance, the scheduler will automatically select and use the appropriate Storage System Link pair to replicate between the designated pools.  Note, though, that one must establish all the necessary storage system links so that remote replication may continue uninterrupted.  For example, if appliances A & B are configured with an HA pool whose volumes replicate to an HA pool managed by appliances C & D, then four Storage System Link pairs must be set up, specifically A <--> C, A <--> D, B <--> C, and B <--> D, so that no matter how the source and destination pools are moved between node pairs the remote replication schedule can continue replication normally.
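
For HA-to-HA replication in general, the number of required Storage System Link pairs is simply the product of the node counts hosting each pool; a quick worked check of the example above:

 Storage System Link pairs required = (nodes hosting source pool) x (nodes hosting destination pool)
                                    = 2 x 2 = 4   (A <--> C, A <--> D, B <--> C, B <--> D)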

==== Modifying a Storage System Link ====

Use the modify dialog to adjust the replication schedule as needed.  Activation of the schedule by the schedule manager within QuantaStor will continue normally, applying the new settings automatically at the next activation point.

[[File:qs_sys_link_modify.png]]

==== Deleting a Storage System Link ====

To remove a storage system link, simply right-click on the link within the web user interface and choose ''Delete Storage System Link...''.

[[File:qs_sys_link_delete.png]]

Deletion of a storage system replication link will also delete the reverse-direction link, as the links are added and removed in pairs.  If the IP addresses or other configuration settings on a given set of appliances have changed, one may delete the links and recreate them without having to adjust the replication schedules; the schedules automatically select the currently available link for a given replication task.

=== Creating a one-time Remote Replica ===

Once a Storage System Link pair has been established, replication of volumes and network shares between the appliances becomes available.  Instant replication of a volume or share is accessible by right-clicking on the volume or share to be replicated and choosing 'Create Remote Replica...' from the pop-up menu.  Creating a remote replica is much like creating a local clone, except that the data is copied to a storage pool in a remote storage system.

When selecting replication options, first select the destination storage system to replicate to (only systems which have established and online storage system links will be displayed), then select the destination storage pool within that system which should hold the remote replica.

If the volume or share has previously been replicated then an incremental ''diff'' replication is optimal, but one can force a complete replication as well; forced complete replicas are numbered at the destination with a suffix like ''.1_chkpnt'', ''.2_chkpnt'' and so on.  Incremental replication from that point forward will show the destination check-point with a GMT based timestamp in the name of the check-point.

=== Creating Remote Replication Schedules (DR Policies) ===

Remote replication schedules provide a mechanism for replicating the changes to your volumes to a matching checkpoint volume on a remote appliance automatically on a timer or a fixed schedule. To create a schedule navigate to the Remote Replication Schedules section after selecting the Remote Replication tab at the top of the screen. Right-click on the section header and choose 'Create Replication Schedule'.

[[File:qs_remote_replication_sched_create.png|1000px]]

Besides selecting the volumes and/or shares to be replicated, you must select the number of snapshot checkpoints to be maintained on the local and remote systems.  You can use these snapshots for off-host backup and other data recovery purposes as well, so there is no need for a separate Snapshot Schedule, which would be redundant with the snapshots created by your replication schedule.  If you choose a Max Replicas of 5 then up to 5 snapshot checkpoints will be retained.  For example, if you replicate nightly at 1am each day from Monday to Friday, you will have a week's worth of snapshots as data recovery points.  If you replicate 4 times each day and need a week of snapshots, you would need 5x4, or a Max Replicas setting of 20.
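
Retention sizing is a simple product; a quick worked check of the examples above:

 Max Replicas = (replications per day) x (days of recovery points to keep)
              = 1 x 5 = 5    (nightly at 1am, Monday through Friday)
              = 4 x 5 = 20   (4 replications per day, one week of recovery points)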

==== Interval based Remote Replication Schedules ====

Replication can be done on a schedule at specific times and days of the week, or continuously on an interval.  The interval represents the amount of time between replication activities during which no replication is done.  The system automatically detects active schedules and will not activate a given schedule again while any of its volumes or shares are still being replicated.  As such, the replication schedule interval is more of a delay between replications and can safely be set to a low value.  Since only the changes are replicated, it is often better to replicate frequently throughout the day rather than all at once at the end of the day.

[[File:qs_remote_replication_sched_interval.png|1200px]]

=== Modifying Remote Replication Schedules (DR Policies) ===

Schedules may be adjusted at any time by right-clicking on the schedule and choosing ''Modify Schedule...''.

[[File:qs_remote_replication_sched_modify.png|1000px]]

=== Remote Replication Bandwidth Throttling ===

WAN links are often limited in bandwidth, typically in the range of 2-60MBytes/sec for on-premises deployments and 20-100MBytes/sec or higher in datacenters, depending on the service provider.  QuantaStor does automatic load balancing of replication activities to limit the impact to active workloads and to limit the use of your available WAN or LAN bandwidth.  By default QuantaStor comes pre-configured to limit replication bandwidth to 50MB/sec, but you can increase or decrease this to better match the bandwidth and network throughput limits of your environment.  This is a good default for datacenter deployments, but hybrid cloud deployments where data replicates to/from an on-premises site should be configured to take up no more than 50% of the available WAN bandwidth so as to not disrupt other activities and workloads.

Here are the CLI commands available for adjusting the replication rate limit.  To get the current limit use 'qs-util rratelimitget'; to set the rate limit to a new value (for example, 4MB/sec) use 'qs-util rratelimitset 4'.

  Replication Load Balancing
    qs-util rratelimitget            : Current max bandwidth available for all remote replication streams.
    qs-util rratelimitset NN         : Sets the max bandwidth available in MB/sec across all replication streams.
    qs-util rraterebalance           : Rebalances all active replication streams to evenly share the configured limit.
                                       Example: If the rratelimit (NN) is set to 100 (MB/sec) and there are 5 active
                                       replication streams then each stream will be limited to 20MBytes/sec (100/5).
                                       QuantaStor automatically rebalances replication streams every minute unless
                                       the file /etc/rratelimit.disable is present.

To run the above commands you must log in to your storage appliance via SSH or via the console.  Here's an example of setting the rate limit to 50MB/sec.

 sudo qs-util rratelimitset 50
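
To confirm the new limit has taken effect, read it back with the companion command from the listing above:

 sudo qs-util rratelimitget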

At any given time you can adjust the rate limit, and all active replication jobs will automatically adjust to the new limit within a minute.  This means you can dynamically set different replication rates for different times of day and days of the week by running the 'qs-util rratelimitset NN' command from a cron job, as sketched below.  If you need that functionality and need help configuring cron to run the 'qs-util rratelimitset NN' command, please contact Customer Support.
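
As an illustrative sketch only (the schedule times, rate limits, and the /usr/bin/qs-util path are assumptions; adjust them to your environment and WAN), a cron entry on the appliance could throttle replication during business hours and open it up overnight:

 # /etc/cron.d/qs-replication-rate (illustrative sketch; adjust times and limits to your WAN)
 # m  h   dom mon dow user  command
 0    8   *   *   *   root  /usr/bin/qs-util rratelimitset 10    # 8am: throttle to 10MB/sec for business hours
 0    19  *   *   *   root  /usr/bin/qs-util rratelimitset 100   # 7pm: raise to 100MB/sec for overnight replication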

=== Using Storage Volume Checkpoints for Disaster Recovery ===

Checkpoint volumes have the suffix ''_chkpnt'', typically followed by a ''GMT'' timestamp.  At the completion of each replication cycle the parent check-point volume, which has no ''GMT'' timestamp suffix, contains all the latest information from the last transfer.  In the event of a failure of a primary node, one need only access a given check-point ''Storage Volume'' and it will switch to the ''Active Replica Checkpoint'' state.  This state indicates that the check-point volume may have been used or written to by users, so care should be taken to roll that information back to the primary side using ''Replica Rollback...'' as part of the DR fail-back process in order to preserve any changes.

=== Using Network Share Checkpoints for Disaster Recovery ===

Network share check-points also carry the ''_chkpnt'' suffix, just as with ''Storage Volume'' replica check-points, and this can create some challenges if users expect to find their data under a share with the same name.  For example, a share named ''backups'' will have the name ''backups_chkpnt'' at the destination / DR site appliance.  There is an easy way to resolve this through the use of Network Share Aliases: simply right-click on the check-point for the ''Network Share'' (e.g. backups_chkpnt) and choose ''Create Alias/Sub-share...'', which allows one to assign alternate names to a share.  After the alias ''backups'' is created, the share will be accessible both as ''backups'' and as ''backups_chkpnt''.  If users find the dual naming confusing, the ''browsable'' setting may be adjusted on ''backups_chkpnt'' via the ''Modify Share'' dialog to hide it from users.

==== Clearing the ''Active Replica Checkpoint'' status on a Storage Volume ====

Any volume marked as an active replica checkpoint (shown with a green dot on the volume in the web UI) is protected from being overwritten by the automatic replication schedule system.  For the replication schedule to be able to overwrite the volume again, so that replication from source to destination check-point may resume, the following steps must be followed:

# Ensure that all client I/O to the current source Storage Volume or Network Share has been stopped and that one final replication of any data modified since the last replication has occurred using the replication links/schedules.
# Remove all Host and Host Group associations from the source Storage Volume.
# Use the ''Modify Storage Volume'' dialog to clear the ''Active Replica Checkpoint'' status.

Note that if a host logs into a check-point it will establish an iSCSI session and the system will automatically mark the volume as an active checkpoint again.  If the active checkpoint flag keeps getting set automatically, make sure all iSCSI sessions to the volume have been closed or dropped; a client-side sketch follows below.
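
As a hedged, client-side sketch for Linux initiators using the standard open-iscsi tools (other client operating systems differ, and the target IQN shown is a placeholder rather than an actual QuantaStor target name), sessions can be listed and logged out as follows:

 # List active iSCSI sessions to identify any still attached to the check-point volume's target
 sudo iscsiadm -m session
 # Log out of that target so the active checkpoint flag is not re-marked
 sudo iscsiadm -m node --targetname iqn.2009-10.com.example:volume-chkpnt --logout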