= Remote-replication (DR) =

The following workflows are intended as a go-to guide outlining the basic steps for initial appliance and grid configuration. More detailed information on each specific area can be found in the Administrators Guide.

== DR / Remote-replication of SAN/NAS Storage (ZFS based Storage Pools) ==

=== Minimum Hardware Requirements ===

* 2x QuantaStor storage appliances, each with a ZFS based Storage Pool
** Storage pools do not need to be the same size, and the hardware and disk types on the appliances can be asymmetrical (non-matching)
** Replication can be cascaded across many appliances
** Replication can be N-way, replicating from one-to-many or many-to-one appliances
** Replication is incremental, so only the changes are sent
** Replication is supported for both Storage Volumes and Network Shares
** The replication interval can be set as low as 15 minutes for a near-CDP configuration, or scheduled to run at specific hours on specific days
** All data is AES-256 encrypted on the wire

=== Setup Process ===

* Select the Remote Replication tab and choose 'Create Storage System Link'. This exchanges keys between the two appliances so that a replication schedule can be created. You can create an unlimited number of links. The link also stores information about the ports to be used for remote-replication traffic.
* Select the Volume & Share Replication Schedules section and choose Create in the toolbar to bring up the dialog for creating a new remote replication schedule:
** Select the replication link, which indicates the direction of replication
** Select the storage pool on the destination system where the replicated shares and volumes will reside
** Select the times of day or the interval at which replication will run
** Select the volumes and shares to be replicated
** Click OK to create the schedule
* The Remote-Replication/DR Schedule is now created. If you chose an interval-based schedule it will start momentarily; if you chose one that runs at specific times of day it will not trigger until that time.
* You can test the schedule by using 'Trigger Schedule' to start it immediately.

=== Diagram of Completed Configuration ===

[[File:osn_dr_workflow.png|700px]]

== Volume & Share Remote-Replication (Disaster Recovery / DR Setup) ==

Volume and Share Remote-replication within QuantaStor lets you copy a volume or network share from one QuantaStor storage system to another. It is a great tool for migrating volumes and network shares between systems and for using a remote system as a DR site. Remote replication is done asynchronously, which means that changes/deltas to the source volume or share are replicated as often as every hour with calendar-based schedules, and as often as every 15 minutes with timer-based schedules.

Once a given set of volumes and/or network shares has been replicated from one system to another, subsequent periodic replication operations send only the changes; all information sent over the network is compressed to minimize network bandwidth and encrypted for security. ZFS based storage pools use the ZFS send/receive mechanism, which efficiently sends just the changes, so it works well over limited-bandwidth networks. Also, if your storage pool has compression enabled, the changes sent over the network are compressed as well, which further reduces your WAN network load.

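If you are unsure whether compression is enabled on a given pool, you can check from the console with a standard ZFS command (a minimal sketch; 'pool1' is a placeholder for your actual ZFS pool name, and pool properties should normally be managed through QuantaStor itself):

<pre>
# Show the compression setting of the ZFS pool backing the storage pool
sudo zfs get compression pool1
</pre>
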
==== Limits of XFS based Volume/Network Share Replication ====

XFS based storage pools do not have an advanced replication mechanism like ZFS send/receive, so more brute-force techniques are employed for replication. Specifically, when you replicate an XFS based storage volume or network share, QuantaStor uses the Linux rsync utility. rsync does have compression and only sends changes, but it doesn't work well with large files because the entire file must be scanned and in some cases resent over the network. Because of this we highly recommend using ZFS based storage pools for all deployments unless you specifically need the high sequential I/O performance of XFS for a specific application.

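For reference, a delta transfer of the kind described above looks roughly like the following (an illustrative sketch only, not QuantaStor's exact invocation; the paths and hostname are placeholders):

<pre>
# -a preserves file attributes, -z compresses data on the wire, and
# --inplace updates changed regions of large files in place instead of
# rewriting whole temporary copies; rsync must still scan each file for changes
rsync -az --inplace /mnt/share1/ admin@remote-system:/mnt/share1-replica/
</pre>
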
=== Creating a Storage System Link ===

The first step in setting up DR/remote-replication between two systems is to have at least two nodes (storage appliances) configured into a Grid ([http://wiki.osnexus.com/index.php?title=QuantaStor_Administrators_Guide#Grid_Setup_Procedure link]). QuantaStor has a grid communication mechanism that connects appliances (nodes) together so that they can share information, coordinate activities like remote-replication, and simplify management operations. After you create the grid you'll need to set up a Storage System Link between the two or more nodes between which you want to replicate data (volumes and/or shares). The Storage System Link represents a low-level security key exchange between the two nodes so that they can send data to each other. The Storage System Link is created through the QuantaStor Manager web interface by selecting the 'Remote Replication' tab, then pressing the 'Create Storage System Link' button in the toolbar to bring up the dialog.

[[File:Create Storage Link.png|800px]]

Select the IP address on each system to be used for remote-replication network traffic. If both systems are on the same network you can simply select one of the IP addresses from one of the local ports, but if the remote system is in the cloud or at a remote location you will most likely need to specify the external IP address of your QuantaStor system. Note that the two systems communicate over ports 22 and 5151, so you will need to open these ports in your firewall for the QuantaStor systems to link up properly.

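Before creating the link, you can verify that each system can reach the other on the required ports (a minimal sketch; replace 192.0.2.10 with the peer system's address, and adapt the firewall rules to whatever firewall actually sits between the systems):

<pre>
# Check that the peer system is reachable on the replication ports
nc -zv 192.0.2.10 22
nc -zv 192.0.2.10 5151

# Example iptables rules on an intermediate Linux firewall to allow the traffic
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 5151 -j ACCEPT
</pre>
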
=== Creating a Remote Replica ===

Once you have a Storage System Link created between two systems you can replicate volumes and network shares in either direction. Simply log in to the system that you want to replicate volumes from, right-click on the volume to be replicated, then choose 'Create Remote Replica'. Creating a remote replica is much like creating a local clone, only the data is copied over to a storage pool in a remote storage system. As such, when you create a remote replica you must specify which storage system you want to replicate to (only systems with established and online storage system links are displayed) and which storage pool within that system should hold the remote replica. If you have already replicated the specified volume to the remote storage system, you can re-sync the remote volume by selecting the remote-replica association in the web interface and choosing 'resync'. This can also be done via the 'Create Remote Replica' dialog by choosing the option to replicate to an existing target, if available.

=== Creating a Remote Replication Schedule / DR Replication Policy ===

Remote replication schedules provide a mechanism for automatically replicating changes to your volumes to a matching checkpoint volume on a remote appliance, either on a timer or on a fixed schedule. To create a schedule, navigate to the Remote Replication Schedules section after selecting the Remote Replication tab at the top of the screen. Right-click on the section header and choose 'Create Replication Schedule'.

[[File:Drsetup1.png|1000px]]

Besides selecting the volumes and/or shares to be replicated, you must select the number of snapshot checkpoints to be maintained on the local and remote systems. You can use these snapshots for off-host backup and other data recovery purposes as well, so there is no need for a separate Snapshot Schedule, which would be redundant with the snapshots created by your replication schedule. If you choose a Max Replicas setting of 5, up to 5 snapshot checkpoints will be retained. For example, if you replicate nightly at 1am each day from Monday to Friday, you will have a week's worth of snapshots as data recovery points. If you replicate 4 times each day and need a week of snapshots, you would need 5x4, or a Max Replicas setting of 20.

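The retention math above can be summarized as follows (derived from the examples in this section):

<pre>
  Max Replicas = (replications per day) x (days of recovery points needed)

  Example: 4 replications per day x 5 days (Mon-Fri) = Max Replicas of 20
</pre>
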
=== Remote Replication Bandwidth Throttling ===

WAN links are often limited in bandwidth, typically 2-60 MBytes/sec for on-premises deployments and 20-100 MBytes/sec or higher in datacenters, depending on the service provider. QuantaStor does automatic load balancing of replication activities to limit the impact on active workloads and to limit the use of your available WAN or LAN bandwidth. By default QuantaStor comes pre-configured to limit replication bandwidth to 50MB/sec, but you can increase or decrease this to better match the bandwidth and network throughput limits of your environment. This is a good default for datacenter deployments, but hybrid cloud deployments where data replicates to/from one or more on-premises sites should be configured to use no more than 50% of your available WAN bandwidth so as not to disrupt other activities and workloads.

The following CLI commands are available for adjusting the replication rate limit. To get the current limit use 'qs-util rratelimitget'; to set the rate limit to a new value (for example, 4MB/sec) use 'qs-util rratelimitset 4'.

<pre>
  Replication Load Balancing
    qs-util rratelimitget            : Current max bandwidth available for all remote replication streams.
    qs-util rratelimitset NN         : Sets the max bandwidth available in MB/sec across all replication streams.
    qs-util rraterebalance           : Rebalances all active replication streams to evenly share the configured limit.
                                       Example: If the rate limit (NN) is set to 100 (MB/sec) and there are 5 active
                                       replication streams, each stream will be limited to 20 MBytes/sec (100/5).
                                       QuantaStor automatically rebalances replication streams every minute unless
                                       the file /etc/rratelimit.disable is present.
</pre>

To run the above commands you must log in to your storage appliance via SSH or via the console. Here's an example of setting the rate limit to 50MB/sec.

<pre>sudo qs-util rratelimitset 50</pre>

At any given time you can adjust the rate limit, and all active replication jobs will automatically adjust to the new limit within a minute. This means you can dynamically set different replication rates for different times of day and days of the week by running the 'qs-util rratelimitset NN' command from a cron job. If you need that functionality and need help configuring cron to run the 'qs-util rratelimitset NN' command, please contact Customer Support.

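As a sketch of that approach, a cron file along the following lines would lower the limit during business hours and raise it overnight (the schedule, rates, and qs-util path are illustrative assumptions; adjust them to your environment):

<pre>
# /etc/cron.d/qs-replication-rate (illustrative example)
# Limit replication to 10MB/sec during business hours (8am, Mon-Fri)
0 8  * * 1-5   root   /usr/bin/qs-util rratelimitset 10
# Raise the limit back to 50MB/sec overnight (8pm, every day)
0 20 * * *     root   /usr/bin/qs-util rratelimitset 50
</pre>
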
=== Permanently Promoting a Replicated Storage Volume or Network Share ===

The process below details how to promote a _chkpnt Storage Volume/Network Share in the event of a failure of the primary node. The same procedure can be used to permanently migrate data to a Storage Pool on a different QuantaStor appliance using remote replication.

If the replication source system is offline due to a hardware failure of the appliance, you can skip directly to Step 3.

Step 1) Ensure that all client I/O to the current source Storage Volume or Network Share has been stopped, and that one final replication of any data modified since the last replication has completed using the replication links/schedules.

Step 2) Remove all Host and Host Group associations from the source Storage Volume.

Step 3) Right-click on the Replication Schedule associated with the source and destination Storage Volume/Network Share and click 'Delete Schedule'.

Step 4) Right-click on the Replication Link associated with the source and destination Storage Volume/Network Share and select the 'Delete Replica Association' option, which opens the 'Delete Remote Replication Link' dialog. Use the defaults in this dialog and click 'OK'.

[[File:Delete_Remote_Replication_Link.png|800px]]

At this stage there is no longer a replication link or association between the source and destination _chkpnt Storage Volume/Network Share. Both the original source and the destination _chkpnt Storage Volume/Network Share can be renamed using the Modify Storage Volume or Modify Network Share dialogs and mapped for client access as required.

Please note: If you want to reuse the name of the source Storage Volume/Network Share for the _chkpnt copy and the source QuantaStor appliance is offline/unavailable, you may need to remove the offline appliance from the grid at this stage, as it will not be accessible to perform the rename operation using the Modify Storage Volume or Modify Network Share dialog. In that case, after removing the offline QuantaStor node from the grid, you can skip directly to Step B below.

To rename the _chkpnt Storage Volume/Network Share to match the original source Storage Volume/Network Share:

Step A) Right-click on the original Storage Volume/Network Share and choose the 'Modify Storage Volume' or 'Modify Network Share' option. In the dialog box, rename the Storage Volume or Network Share to add '_bak' or any other unique suffix to the end and click 'OK'. Once you are done with the promotion/migration you can remove this backup (_bak) version and its associated snapshots; the multi-delete feature is useful for this sort of batch deletion.

Example screenshot below showing the Modify Storage Volume dialog renaming the source Storage Volume with a _bak suffix:

[[File:Modify_Storage_Volume_rename_bak.png|800px]]

Step B) Right-click on the replicated _chkpnt Storage Volume/Network Share and choose the 'Modify Storage Volume' or 'Modify Network Share' option. In the dialog box, rename the Storage Volume or Network Share as you see fit and click 'OK'.

Example screenshot below showing the Modify Storage Volume dialog renaming the destination _chkpnt Storage Volume to the name originally used by the source volume:

[[File:Modify_Storage_Volume_rename.png|800px]]

Step C) Map client access to the promoted Storage Volume/Network Share.

For Storage Volumes, map LUN access to your clients using the Host or Host Groups option detailed here: [[QuantaStor_Administrators_Guide#Managing_Hosts|Managing Hosts]]

For Network Shares, configure access using the CIFS/NFS access permissions as detailed here: [[QuantaStor_Administrators_Guide#Managing_Network_Shares|Managing Network Shares]]

Please note: If this procedure was performed for disaster recovery of a failed Primary QuantaStor node, then once the original Primary node is brought back online, the old, out-of-date Storage Volume/Network Share will need to be renamed with a '_bak' or your preferred suffix (or removed to free up space), and the node will need to be re-added to the grid. Replication can then be configured from the new Primary source QuantaStor to the recovered QuantaStor appliance in its new role as a Secondary replication destination target.
