+ Admin Guide Overview

From OSNEXUS Online Documentation Site
Revision as of 21:37, 25 January 2014

The QuantaStor Administrators Guide is intended for all administrators and cloud users who plan to manage their storage using QuantaStor Manager as well as for those just looking to get a deeper understanding of how the QuantaStor Storage System Platform (SSP) works.

Contents

Storage System Management Operations

When you initially connect to QuantaStor Manager you'll see a toolbar (aka ribbon-bar) at the top of the screen and a stack view / tree view on the left-hand side of the screen. By selecting different areas of the tree view (Storage Volumes, Hosts, etc.) the ribbon-bar / toolbar will change accordingly to indicate the operations available for that section. The following diagram shows these two sections:

Main Tree View & Ribbon-bar / Toolbar

Note also that you can right-click on the title-bar for each stack item in the tree view to access a pop-up menu, and you can right-click on any object anywhere in the UI to access a context sensitive pop-up menu for that item.

License Management

QuantaStor has two categories of license keys: 'System' licenses and 'Feature' licenses. A 'System' license specifies all the base features and capacity limits for your storage appliance, and most systems have just a single 'System' license. 'Feature' licenses stack on top of an existing 'System' license and allow you to add features and capacity to an existing system. In this way you can start small and add more capacity as you need it. Note also that everything is license-key controlled in QuantaStor, so you do not need to reinstall to go from a Trial Edition license to a Silver/Gold/Platinum license. Simply add your new license key and it will replace the old one automatically.

Recovery Manager

The 'Recovery Manager' is accessible from the ribbon-bar at the top of the screen when you log in to your QuantaStor system, and it allows you to recover all of the system metadata from a prior installation. The system metadata includes user accounts, storage assignments, host entries, storage clouds, custom roles and more. To use the 'Recovery Manager' just select it, then select the database you want to recover and press OK. If you choose the 'network configuration recovery' option it will also recover the network configuration. Be careful with that option, as it will most likely drop your current connection to QuantaStor when the IP address changes, and if something goes wrong you'll need to log in at the console to find out what the new IP addresses are. In the worst-case scenario you may need to manually edit the /etc/network/interfaces file, using the same procedure one would use with any Debian/Ubuntu server.
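For reference, a minimal static-address stanza in /etc/network/interfaces looks like the following. The interface name and addresses here are placeholders; substitute the values for your own network.

```
auto eth0
iface eth0 inet static
    address 192.168.0.10
    netmask 255.255.255.0
    gateway 192.168.0.1
```

After editing the file, restart networking (e.g. `sudo /etc/init.d/networking restart` on Debian/Ubuntu systems of this era) for the change to take effect.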

Recovery manager.png

Upgrade Manager

The Upgrade Manager handles the process of upgrading your system to the next available minor release version. Note that Upgrade Manager will not upgrade QuantaStor from a v2 to a v3 version; that requires a re-installation of the QuantaStor OS and then recovery of metadata using the 'Recovery Manager'. The Upgrade Manager will display the available versions for the four key packages, which include the core services, web manager, web server, and SCSI target drivers. You can upgrade any of the packages at any time and it will not block iSCSI access or NFS access to your appliance. With upgrades to the SCSI target driver package you will need to restart your storage system/appliance for those new drivers to become active. Note also that you should always upgrade both the manager and service packages together; never upgrade just one or the other, as this may cause problems when you try to log in to the QuantaStor web management interface. On occasion we'll see problems with an upgrade, so we've written a troubleshooting section on how to work out those issues here: Troubleshooting Upgrade Issues

System Checklist

The 'System Checklist' aka 'Getting Started' will appear automatically when you login anytime there is no license key assigned to the system. After that you can still bring up the System Checklist by selecting it from the ribbon-bar. As the name implies, it will help you configure your system and in the process help you get acquainted with QuantaStor.

System Hostname & DNS management

To change the name of your system you can simply right-click on the storage system in the tree stack on the left side of the screen and then choose 'Modify Storage System'. This will bring up a screen where you can specify your DNS server(s) and change the hostname for your system as well as control other global network settings like the ARP Filtering policy.

Physical Disk Management

Identifying physical disks in an enclosure

When you right-click on a physical disk you can choose 'Identify' to force the lights on the disk to blink in a pattern, which it accomplishes by reading sector 0 on the drive. This is very helpful when trying to identify which disk is which within the chassis. Note that this technique doesn't work on logical drives exposed by your RAID controller(s), so there is a separate 'Identify' option for the hardware disks attached to your RAID controller, which you'll find in the 'Hardware Controllers & Enclosures' section.

Scanning for physical disks

When new disks have been added to the system you can scan for new disks using the 'Scan for Disks' command. To access this command from the QuantaStor Manager web interface simply right-click where it says 'Physical Disks' and then choose 'Scan for Disks'. Disks are typically named sdb, sdc, sdd, sde, sdf and so on. The 'sd' part just indicates SCSI disk and the letter uniquely identifies the disk within the system. If you've added a new disk or created a new Hardware RAID Unit you'll typically see the new disk arrive and show up automatically, but the rescan operation can explicitly re-execute the disk discovery process.

Hardware Controller & Enclosure Integration

QuantaStor has custom integration modules ('plug-ins') for a number of major RAID controller cards which monitor the health and status of your hardware RAID units, disks, enclosures, and controllers. When a disk failure occurs within a hardware RAID group, QuantaStor detects this and sends you an email through the QuantaStor alert management system. Note that QuantaStor also has software RAID support for RAID levels 1, 5, 6 & 10, so you do not need a hardware RAID card, but hardware RAID can boost performance and offer you additional RAID configuration options. Also, you can use any RAID controller that works with Ubuntu Server, but QuantaStor will only detect alerts and discover the configuration details of those controllers for which there is a QuantaStor hardware controller plug-in. Note that the plug-in discovery logic is triggered every couple of minutes, so in some cases you will find that there is a small delay before the information in the web interface is updated.

QuantaStor has broad support for integrated hardware management including the following controllers:

  • LSI MegaRAID & Nytro MegaRAID (all models)
  • Adaptec 5xxx/6xxx/7xxx/8xxx (all models)
  • IBM ServeRAID (LSI derivative)
  • DELL PERC H7xx/H8xx (LSI derivative)
  • Intel RAID/SSD RAID (LSI derivative)
  • HP SmartArray P4xx/P8xx
  • LSI 3ware 9xxx
  • LSI HBAs
  • Fusion IO PCIe

Adaptec RAID integration

Adaptec controllers are automatically detected and can be managed via the QuantaStor web management interface.

Fusion IO integration

The Fusion IO integration requires that the fio-util and iomemory-vsl packages are installed. Once installed the Fusion IO control and logic devices will automatically show up in the Hardware Enclosures & Controllers view within QuantaStor Manager.

LSI 3ware integration

3ware controllers are automatically discovered and can be managed via the QuantaStor web management interface.

Note that if you arbitrarily remove a disk that was being utilized by a 3ware RAID unit, there are additional steps required before you can re-add it to the appliance. 3ware writes configuration data on the disk in what's called a Disk Control Block (DCB), and this needs to be scrubbed before you can use the disk again as a hot-spare or within another unit. There is a great article written here on how to scrub the DCB on a disk so that you can use it again with your LSI 3ware controller. Formatting the disk in another system will also suffice. You can then add it back into the old system and designate it as a spare, and if you have a unit that is degraded it will automatically adopt the spare and begin rebuilding the unit back to a fully fault-tolerant status. Of course, if you pulled the disk because it was faulty you'll want to RMA it to the manufacturer for a warranty replacement.

LSI MegaRAID / DELL PERC integration

LSI MegaRAID, DELL PERC, IBM ServeRAID and Intel RAID controllers are fully supported by QuantaStor and can be managed via the web management interface. Note also that QuantaStor includes a command line utility called qs-util which assists with some MegaRAID maintenance operations. These include:

    qs-util megawb                   : Set Write-Back cache mode on all LSI MR units w/ BBU
    qs-util megaforcewb              : (W!) Force Write-Back cache mode on all LSI MR units
    qs-util megaclearforeign         : (W!) Clear the foreign config info from all LSI controllers
    qs-util megaccsetup              : Setup MegaRAID consistency check and patrol read settings
    qs-util megalsiget               : Generates a LSIget log report and sends it to support@osnexus.com

Common Configuration Settings

Disable Copyback

The MegaRAID controller will auto-heal a RAID unit using an available hot-spare in case of a drive failure. When the bad drive is pulled and a new drive is inserted and marked as a hot-spare, the location of your hot-spare drive will have changed. In fact it will change every time a bad drive is replaced. Generally speaking there is no impact to performance from having your hot-spare in a new location each time, but over time it leads to a less organized chassis. As such there is a 'Copy Back' feature which copies the data from the hot-spare back to the original location after a replacement drive has been inserted where the failed disk was located. Copy back does add time to the rebuild process, so some prefer to disable it and just deal with the less organized drive placement in the chassis. To disable copy back on all controllers run this command at the QuantaStor console or via ssh as root:

MegaCli -AdpSetProp -copybackdsbl -1 -aall

To enable the CopyBack feature on all controllers run this command:

MegaCli -AdpSetProp -copybackdsbl -0 -aall

Increasing the RAID unit Rebuild Rate

The default rebuild rate is 30%, which can lead to some long rebuilds depending on the size of your RAID unit and the amount of load on it. To increase the rate you can issue the following command at the console or via ssh, for example to raise it to 75% or higher:

MegaCli -AdpSetProp RebuildRate 75 -aall

Disabling the Alarm

If your server is in a datacenter then the alarm is not going to help much in identifying the problematic controller card and will only serve to cause grief. As such, you might want to disable the alarms:

MegaCli -AdpSetProp AlarmDsbl -aall

To just silence the current alarm run this:

MegaCli -AdpSetProp AlarmSilence -aall

The two most common cases for an alarm are that a disk needs to be replaced or the battery backup unit is not functioning properly. You can also silence all alarms using the web management interface.

Auto Import Foreign RAID units

The MegaRAID controllers can be a little troublesome if you're moving disks and/or disk chassis around as the disk drives will appear as 'foreign' to the controller when you move them. Most of the time you'll just want to import these foreign units automatically so that you don't have to press space-bar at boot time to continue the boot process. To avoid this, set the policy on the controllers to automatically import foreign units with this command: sudo MegaCli -AdpSetProp AutoEnhancedImportEnbl -aALL

Here's an example of what that looks like:

qadmin@qs-testing:~$ sudo MegaCli -AdpSetProp AutoEnhancedImportEnbl -aALL
[sudo] password for qadmin:

Adapter 0: Set Auto Enhanced Import to Enable success.
Adapter 1: Set Auto Enhanced Import to Enable success.

Exit Code: 0x00

Installing the MegaRAID CLI on older QuantaStor v2 systems

QuantaStor v3 and newer systems work with LSI MegaRAID controllers with no additional software to be installed. For older v2 systems first login to your QuantaStor system at the console. You'll need to make sure that your system is network connected with internet access as it will be downloading some necessary files and packages. Next, run the following two commands to install:

cd /opt/osnexus/quantastor/raid-tools
sudo ./lsimegaraid-install.sh

It will take a couple of minutes for the QuantaStor service to detect that the MegaRAID CLI is now installed, but then you'll see the hardware configuration show up automatically in the web interface. Note also that this script will have upgraded the megaraid_sas driver included with QuantaStor. As such you must restart the system using the "Restart Storage System" option in the QuantaStor web management interface. Last, new firmware is required to support 3TB and larger drives, so if you have an older 9260 or 9280 controller be sure to download and apply the latest firmware. Here's an example of how to upgrade MegaRAID firmware using MegaCli.

MegaCli -AdpFwFlash -f FW1046E.rom -a0

Adapter 0: PERC H800 Adapter
Vendor ID: 0x1000, Device ID: 0x0079

FW version on the controller: 2.0.03-0772
FW version of the image file: 2.100.03-1046
Download Completed.
Flashing image to adapter...
Adapter 0: Flash Completed.

Exit Code: 0x00

HP SmartArray RAID integration

HP SmartArray controllers are supported out-of-the box with no additional software to be installed. You can manage your HP RAID controller via the QuantaStor web management interface where you can create RAID units, mark hot-spares, replace drives, etc.

Managing Storage Pools

Storage pools combine or aggregate one or more physical disks (SATA, SAS, or SSD) into a single pool of storage from which storage volumes (iSCSI targets) can be created. Storage pools can be created using any of the following software RAID types: RAID0, RAID1, RAID5, RAID6, RAID10, RAID50, or RAID60. Choosing the optimal RAID type depends on the I/O access patterns of your target application, the number of disks you have, and the amount of fault tolerance you require. As a general guideline we recommend using RAID10 for all virtualization solutions and databases, and RAID6 for applications that require high-performance sequential IO. RAID10 performs very well with both sequential and random IO patterns but is a bit more expensive since you get 50% usable space from the raw storage due to mirroring. For archival storage or other similar workloads RAID6 is best and provides higher utilization, with only two drives used for parity/fault tolerance. RAID5 is not recommended for any deployments because it is not fault tolerant after a single disk failure. If you decide to use RAID6 with virtualization or other workloads that can produce a fair amount of random IO, we strongly recommend that you use a RAID controller with at least 1GB of RAM and a super-capacitor so that you can safely enable the write-cache. RAID6 and other parity RAID mechanisms generally do not perform well when you have many workloads (virtual machines) using the storage, due to the serialization of I/O that happens because of parity calculations and updates.

RAID Levels

RAID1 & RAID5 allow you to have one disk fail without it interrupting disk IO. When a disk fails you can remove it, and you should add a spare disk to the 'degraded' storage pool as soon as possible in order to restore it to a fault-tolerant status. You can also assign spare disks to storage pools ahead of time so that the recovery happens automatically. RAID6 allows for up to two disks to fail and will keep running, whereas RAID10 can allow for one disk failure per mirror pair. Finally, RAID0 is not fault tolerant at all, but it is your only choice if you have only one disk and it can be useful in some scenarios where fault tolerance is not required. Here's a breakdown of the various RAID types and their pros & cons.

  • RAID0 layout is also called 'striping' and it writes data across all the disk drives in the storage pool in a round-robin fashion. This has the effect of greatly boosting performance. The drawback of RAID0 is that it is not fault tolerant, meaning that if a single disk in the storage pool fails then all of your data in the storage pool is lost. As such RAID0 is not recommended except in special cases where the potential for data loss is a non-issue.
  • RAID1 is also called 'mirroring' because it achieves fault tolerance by writing the same data to two disk drives so that you always have two copies of the data. If one drive fails, the other has a complete copy and the storage pool continues to run. RAID1 and its variant RAID10 are ideal for databases and other applications which do a lot of small write I/O operations.
  • RAID5 achieves fault tolerance via what's called a parity calculation, where one of the drives contains an XOR calculation of the bits on the other drives. For example, if you have 4 disk drives and you create a RAID5 storage pool, 3 of the disks will store data, and the last disk will contain parity information. This parity information on the 4th drive can be used to recover from any data disk failure. In the event that the parity drive fails, it can be replaced and reconstructed using the data disks. RAID5 (and RAID6) are especially well suited for audio/video streaming, archival, and other applications which do heavy sequential write I/O (such as reading/writing large files), and are not as well suited for database applications which do heavy amounts of small random write I/O operations, or for large file-systems containing lots of small files with a heavy write load.
  • RAID6 improves upon RAID5 in that it can handle two drive failures, but it requires that you have two disk drives dedicated to parity information. For example, if you have a RAID6 storage pool comprised of 5 disks then 3 disks will contain data, and 2 disks will contain parity information. In this example, if the disks are all 1TB disks then you will have 3TB of usable disk space for the creation of volumes. So there's some sacrifice of usable storage space to gain the additional fault tolerance. If you have the disks, we always recommend using RAID6 over RAID5. This is because all hard drives eventually fail, and when one fails in a RAID5 storage pool your data is left vulnerable until a spare disk is utilized to recover your storage pool back to a fault-tolerant status. With RAID6 your storage pool is still fault tolerant after the first drive failure. (Note: Fault-tolerant storage pools (RAID1,5,6,10) that have suffered a single disk drive failure are called degraded because they're still operational but they require a spare disk to recover back to a fully fault-tolerant status.)
  • RAID10 is similar to RAID1 in that it utilizes mirroring, but RAID10 also does striping over the mirrors. This gives you the fault tolerance of RAID1 combined with the performance of RAID0. The drawback is that half the disks are used for fault tolerance, so if you have 8 1TB disks utilized to make a RAID10 storage pool, you will have 4TB of usable space for creation of volumes. RAID10 will perform very well with both small random IO operations as well as sequential operations, and it is highly fault tolerant as multiple disks can fail as long as they're not from the same mirror pairing. If you have the disks and you have a mission-critical application, we highly recommend that you choose the RAID10 layout for your storage pool.
  • RAID60 combines the benefits of RAID6 with some of the benefits of RAID10. It is a good compromise when you need better IOPS performance than RAID6 will deliver and more usable storage than RAID10 delivers (50% of raw).
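As a quick sanity check on the capacity figures in the list above, here is a small sketch (the function name is our own, not a QuantaStor command) that computes usable capacity for a pool of equal-sized drives, ignoring filesystem and metadata overhead:

```shell
#!/bin/sh
# usable_tb RAIDLEVEL NUM_DRIVES DRIVE_TB -> usable capacity in TB
# Sketch only: assumes equal-size drives and ignores formatting overhead.
usable_tb() {
    level=$1; n=$2; tb=$3
    case $level in
        0)    echo $(( n * tb )) ;;        # striping, no redundancy
        1|10) echo $(( n * tb / 2 )) ;;    # mirroring: half the raw space
        5)    echo $(( (n - 1) * tb )) ;;  # one drive's worth of parity
        6)    echo $(( (n - 2) * tb )) ;;  # two drives' worth of parity
    esac
}
usable_tb 6 5 1    # five 1TB drives in RAID6 -> 3, matching the example above
usable_tb 10 8 1   # eight 1TB drives in RAID10 -> 4
```

For instance, `usable_tb 5 4 1` returns 3, matching the 4-disk RAID5 example (3 data disks, 1 parity disk).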

In some cases it can be useful to create more than one storage pool so that you have low cost fault-tolerant storage available in RAID6 for archive and higher IOPS storage in RAID10 for virtual machines, databases, MS Exchange, or similar workloads.

Once you have created a storage pool it will take some time to 'rebuild'. Once the 'rebuild' process has reached 1% you will see the storage pool appear in QuantaStor Manager and you can begin to create new storage volumes.

WARNING: Although you can begin using the pool at 1% rebuild completion, your storage pool is not fault-tolerant until the rebuild process has completed.

Target Port Configuration

Target ports are simply the network ports (NICs) through which your client hosts (initiators) access your storage volumes (aka targets). The terms 'target' and 'initiator' are SCSI terms that are synonymous with 'server' and 'client' respectively. QuantaStor supports both statically assigned IP addresses as well as dynamically assigned (DHCP) addresses. If you selected automatic network configuration when you initially installed QuantaStor then you'll have one port set up with DHCP and the others are likely offline. We recommend that you always use static IP addresses unless you have your DHCP server set up to specifically assign an IP address to your NICs as identified by MAC address. If you don't set the target ports up with static IP addresses you risk the IP address changing and losing access to your storage when the dynamically assigned address expires. To modify the configuration of a target port, first select the tree section named "Storage System" under the "Storage Management" tab on the left-hand side of the screen. After that, select the "Target Ports" tab in the center of the screen to see the list of target ports that were discovered. To modify the configuration of one of the ports, simply right-click on it and choose "Modify Target Port" from the pop-up menu. Alternatively you can press the "Modify" button in the toolbar at the top of the screen in the "Target Ports" section. Once the "Modify Target Port" dialog appears you can select the target port type (static) and enter the IP address, subnet mask, and gateway for the port. You can also set the MTU to 9000 for jumbo packet support, but we recommend that you get your network configuration up and running with standard 1500-byte frames first, as jumbo packet support requires that you custom configure your host-side NICs and network switch with 9K frames as well.
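If you do enable the 9000-byte MTU, one way to verify jumbo frames end-to-end is a non-fragmenting ping from a Linux host. The 8972-byte payload below is the 9000-byte MTU minus the 20-byte IP header and 8-byte ICMP header; the target address in the comment is a placeholder for one of your own target ports:

```shell
#!/bin/sh
# Compute the largest ICMP payload that fits in a 9000-byte MTU frame.
MTU=9000
PAYLOAD=$(( MTU - 20 - 8 ))   # subtract IP (20 bytes) and ICMP (8 bytes) headers
echo $PAYLOAD                 # -> 8972
# Then test against a target port; -M do forbids fragmentation, so the ping
# only succeeds if every hop supports 9K frames (placeholder address):
#   ping -M do -s $PAYLOAD -c 3 192.168.10.20
```

If the ping reports "Frag needed" or times out, some device in the path (NIC, switch, or host) is still configured for 1500-byte frames.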

NIC Bonding / Trunking

QuantaStor supports NIC bonding, also called trunking, which allows you to combine multiple NICs together to improve performance and reliability. If you combine two or more ports together into a virtual port you'll need to make sure that all the bonded ports are connected to the same network switch; there are very few exceptions to this rule. For example, if you have two networks and 4 ports (p1, p2, p3, p4) you'll want to create two separate virtual ports, each bonding two NIC ports together (p1, p2 / p3, p4), with each pair connected to a separate network (p1, p2 -> network A / p3, p4 -> network B). This type of configuration is highly recommended as you have both improved bandwidth and no single point of failure in the network or in the storage system. Of course you'll need your host to have at least 2 NIC ports and they'll each need to connect to the separate networks. For very simple configurations you can just connect everything to one switch, but again, the more redundancy you can work into your SAN the better.

By default, QuantaStor uses Linux bonding mode-0, a round-robin policy. This mode provides load balancing and fault tolerance by transmitting packets in sequential order from the first available interface through the last. QuantaStor also supports LACP 802.3ad dynamic link aggregation. Use the 'Modify Storage System' dialog in the web management interface to change the default bonding mode for your appliance.

10GbE NIC support

QuantaStor works with all the major 10GbE cards from Chelsio, Intel and others. We recommend the Intel 10GbE cards and you can use NIC bonding in conjunction with 10GbE to further increase bandwidth. If you are using 10GbE we recommend that you designate your slower 1GbE ports as iSCSI disabled so that they are only used for management traffic.

Pool Remote-Replication Configuration

Pool remote-replication is only supported with XFS based storage pools. With ZFS based storage pools remote-replication is handled at a storage volume and network share level so you can replicate just the data that you want and not have to replicate the entire pool. The other advantage is that ZFS based storage pools replicate using smart replication where only the changes are sent and the data is compressed.

In both cases (ZFS and XFS replication), you must start by creating a grid. To create the grid you'll need to right-click on the storage system and then choose 'Create Grid..', then give it a name. Once you have a grid you can use 'Add Grid Node..' to add another storage appliance to the grid. Once added you will have a 2-node grid, and you can now set up a remote replication policy for volumes & shares if you're using a ZFS based pool. Or, if you're using an XFS based storage pool, you can set up a pool level replication link.

Setting up a DRBD based Storage Pool replication link (XFS based pools only)

Next you'll need to create one storage pool on each system, with the storage pool on the secondary being at least as large as the source storage pool on the primary. If the primary already exists, great; in that case you'll just need to create an empty storage pool on the target/remote secondary. Once you have that created, right-click on the primary pool and choose 'Create Pool Replication Link..'. It will bring up a dialog where you can choose the source/primary pool, and the designated target/secondary pool at the bottom. Note also that you'll need to select the IP address through which the network traffic will flow between the primary and the secondary. Once you have that selected, be sure to review that the pool, IP, and storage system selections are correct for the primary and the secondary/target and press OK. You've now set up replication, though you'll need to give the system about 2 minutes to get everything properly configured. In the end you'll see that the primary storage pool will have a new object underneath it in the tree view that says 'Primary/Secondary' and the target storage pool will have a new object that says 'Secondary/Primary'. The first part indicates the role of the local pool, and the second part indicates the role of the remote storage pool. When you select this object, which is also called the 'Pool Replication Link' or 'Pool Replication Configuration', you'll also see the progress of the replication activity in the properties page on the right. Once the replication has reached 100% you'll be able to fail over to your secondary / DR site. Note also that the initial replication is a one-time process, but it can take up to a couple of days for larger 16TB storage pools. Note also that it is important to have at least 10MB/sec of bandwidth between your primary system and your secondary system, or else you'll see a big drop in performance under write load; better still is to have 50MB/sec or more of bandwidth between sites. You can test that by doing a simple FTP of a large file or using a performance test tool to check how much bandwidth you have on your network between the two systems.
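If you time a large file transfer between the two systems, the sustained rate is simply bytes transferred divided by elapsed seconds. A sketch of that arithmetic (the helper name and sample numbers are illustrative, not a QuantaStor tool):

```shell
#!/bin/sh
# mb_per_sec BYTES SECONDS -> whole MB/sec (decimal MB, 10^6 bytes)
mb_per_sec() {
    echo $(( $1 / $2 / 1000000 ))
}
# e.g. a 4GB file that took 50 seconds to transfer between the two sites:
mb_per_sec 4000000000 50   # -> 80, comfortably above the 10MB/sec minimum
```

A result below 10 means pool replication will noticeably slow writes on the primary; aim for 50 or more.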

  • Summary review of pool replication setup process:
    • Create a Grid by right-clicking on the storage system. (n.b. Grid support is not available in the Community Edition)
    • Add the target storage system to the grid as a new node by right-clicking on the grid in the tree and choosing 'Add Grid Node..'
    • Now that you have a 2 node grid, you can start replicating storage from the Primary pool on node 'A' to the target Secondary pool on node 'B' in your DR site.
    • You must have two pools that are the same size in order to create a link
    • You must also make sure that the secondary/target storage pool has no volumes/shares in it
    • You must make sure that there is at least 10-20 MB/sec of bandwidth between the source and target storage system, ideally 50MB/sec or more
    • If the storage pool on your target storage system node has not been created you'll need to do that now.
    • Next, right-click on the primary storage pool on node 'A' and choose 'Create Pool Replication Link..'. Once you create the link QuantaStor will do the rest.

After the link is created you must wait, potentially several hours or days depending on the size of the storage pool and the speed of your link. For a storage pool that is 8TB it will take roughly 28 hours to replicate to the secondary storage pool at your target node 'B' over an 80MB/sec link. For 16TB, it will take a little over 2 days. Note that this is a one-time hit and that you will not need to resync after this, because a map is maintained of all writes to both storage pools so that re-sync can be done quickly and efficiently even if the two storage pools have been disconnected for weeks.
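The timing estimates above come straight from pool size divided by link speed. A small sketch of the calculation, assuming decimal units (1TB = 10^12 bytes, 1MB/sec = 10^6 bytes/sec; the function name is ours, not a QuantaStor command):

```shell
#!/bin/sh
# sync_hours POOL_TB LINK_MB_PER_SEC -> whole hours for the initial sync
sync_hours() {
    pool_tb=$1; mbps=$2
    # seconds = (pool_tb * 10^12) / (mbps * 10^6), then convert to hours
    echo $(( pool_tb * 1000000 / mbps / 3600 ))
}
sync_hours 8 80    # 8TB over an 80MB/sec link -> 27 (just over a day)
sync_hours 16 80   # 16TB -> 55 (a bit over two days)
```

Real-world times will run somewhat longer since links rarely sustain their peak rate for the whole transfer.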

Activating DR Fail-over

When the initial replication has completed you'll be able to 'Promote Pool to Replication Primary' at any time. To do this, just right-click on the Pool Replication Link object on your secondary storage pool, then choose 'Promote Pool to Replication Primary'.

Promote Pool

After that you'll be presented with a dialog that looks like this:

Promote Pool Dialog

The pool you see selected will be promoted to Primary status and in doing so all of the volumes and shares in that pool will become available. Note that when a remote storage pool is activated that QuantaStor will automatically rename the device IQNs and ID to make it unique so that it doesn't collide with the device IDs of the original primary. This means that you'll need to setup your host (Windows, VMware, XenServer, etc) to login to these new iSCSI IQNs and not the old IQNs used at the primary site.

Note also that activation of the remote DR site's storage pool doesn't in any way affect the status of the source/original primary. In fact it is a common and necessary scenario to activate the failover site to verify that it works (failover testing) and then just demote the pool when you're done testing; any changed blocks will be resent from the primary pool to bring it back up to date. In such a testing scenario the primary site never goes offline and the workloads continue to work from the primary site, but you have a complete copy that's active on the secondary site for testing or possibly even off-host backup.

In this state you will see both storage pools in the 'Primary/Unknown' state, which means that they're both primary and neither knows the current state of its remote storage pool counterpart, but the driver is keeping track of which blocks have changed for easy/quick resync later.

Deactivating the DR Fail-over pool (Test Scenario)

If you have two sites NY as primary and Seattle as secondary you can promote the Seattle storage pool to Primary status while NY is still active as a primary. This is how you can do failover testing without interrupting any of the workloads that are actively using the storage pool in NY.

  • Summary of this procedure:
    • Promote DR site to 'Primary' status and it will change from 'Secondary/Primary' to 'Primary/Unknown' as outlined in the prior section.
    • With the storage pool active you can now boot/attach workloads to the iSCSI volumes and network shares. The iSCSI volumes will have a suffix of _dr000 to indicate they are DR site replica copies.
    • Now you can test that the workloads/VMs successfully start and are working, but you do not want to change any of the global DNS entries for your workloads as that would be a complete failover and NY would become out-of-date / stale.
    • Once DR site testing is complete in Seattle you will simply 'Demote Pool to Replication Secondary' on the storage pool link in Seattle.

Demoting the storage pool in Seattle will cause it to be overwritten with blocks from NY to bring it back up to date. The replication driver (DRBD) keeps track of all changes made to the Seattle storage pool and all changes made to the NY storage pool while they were disconnected. When the secondary is demoted, this record of which blocks have changed enables it to complete the resynchronization quickly and efficiently in seconds or minutes instead of hours. Note also that once the Seattle site is demoted it will be in the 'Secondary/Primary' state again. You'll also see it synchronizing for a short time as Seattle is brought back to the 'Up To Date' state. If it says 'Inconsistent' then it is either disconnected or still synchronizing.

Reversing the flow / DR Fail-back (Live Scenario)

Let's take as an example that you have two sites: New York is your primary and Seattle is your secondary / DR site. You had a power outage in New York, activated the DR site in Seattle, started up all the VMs/workloads, and ran them out of the Seattle site for, say, one week. At this point Seattle has the most current copy of your data and New York can be overwritten as it is stale. Here's the process to recover your data back to NY and reactivate it as the primary site:

  • First you'll need to demote the NY storage pool to 'Secondary' status. This will start the flow of data from Seattle to New York. Before you demote NY both pools will have a link in 'Primary/Unknown' status. After you demote NY the pool in NY will show 'Secondary/Primary' and the pool in Seattle will show 'Primary/Secondary'.
  • Second, you need to wait. Look at the link object of either side and it will show you the progress of the replication to bring the changes over from Seattle to New York. Once both sides say 'Up To Date' in their status then you're ready to activate New York.
  • Now that the pool in New York is an exact copy of the pool in Seattle, which has been the primary site for the last week, you can orchestrate a fail-back to New York.
  • Fail-back will require some transition time: once you break the link between Seattle and NY you'll need to redirect the DNS entries for your apps/web servers back to NY, and that can take a while. You'll also need to boot the VMs in NY, and if the Seattle site is still online during this time any transactions written to Seattle will be lost. The best approach is to schedule some downtime and move the global IP addresses for your VMs over first. Once they switch over, the Seattle site is offline and won't record any more transactions to your databases/app servers, so you should suspend or stop the VMs in Seattle. Now is the time to promote the storage pool in NY to 'Primary' status, which only takes a few seconds.
  • At this point both storage pools are back in 'Primary/Unknown' status and both have exactly the same data. Now that the NY pool is active and DNS work is done for your workloads you can boot the VMs in NY on the original primary storage pool which is now active.
  • If all the VMs/workloads are started and all has gone well then you've successfully completed the fail-back.
  • We recommend that you wait an hour to make sure everything checks out and then demote the storage pool in Seattle to secondary status so that new changes to the pool in NY will replicate over to Seattle as your DR site again.
  • If there was any problem with the fail-back to NY, you can simply restart the VMs in Seattle and update the DNS entries accordingly. This is why you want to leave Seattle in the 'Primary/Unknown' state until you are absolutely sure all of the workloads have come back online successfully in NY. With Seattle left alone, the worst-case scenario is simply to reactivate Seattle and reschedule the fail-back to NY for another day.

Given the complexity of DR failover we highly recommend testing your DR failover site on a regular basis, and exercising the failover / fail-back process outlined above in a test environment to become familiar with it. Trial Edition keys are available on our main web site and include the DR features, so setting up a couple of QuantaStor Virtual Storage Appliances is an easy way to become an expert without having to dedicate hardware.

DR with Volume & Share Remote-Replication

Volume and Share Remote-replication within QuantaStor allows you to copy a volume or network share from one QuantaStor storage system to another. It is a great tool for migrating volumes and network shares between systems and for using a remote system as a DR site. Remote replication is done asynchronously, which means that changes made to volumes and network shares on the original/source system are replicated to the remote system periodically, as often as every hour.

Once a given set of the volumes and/or network shares have been replicated from one system to another the subsequent periodic replication operations send only the changes and all information sent over the network is compressed to minimize network bandwidth and encrypted for security. ZFS based storage pools use the ZFS send/receive mechanism which efficiently sends just the changes so it works well over limited bandwidth networks. Also, if your storage pool has compression enabled the changes sent over the network are also compressed which further reduces your WAN network load.

XFS based storage pools do not have advanced replication mechanisms like ZFS send/receive, so we employ more brute-force techniques for replication. Specifically, when you replicate an XFS based storage volume or network share, QuantaStor uses the Linux rsync utility. It does have compression and it will only send changes, but it doesn't work well with large files because the entire file must be scanned and in some cases resent over the network. Because of this we highly recommend using ZFS based storage pools for all deployments unless you specifically need the high sequential IO performance of XFS for a specific application.

Creating a Storage System Link

The first step in setting up DR/remote-replication between two systems is to create a Storage System Link between them. This is accomplished through the QuantaStor Manager web interface by selecting the 'Remote Replication' tab and then pressing the 'Create Storage System Link' button in the toolbar to bring up the dialog. To create a storage system link you must provide the IP address of the remote system along with the admin username and password for that system. You must also indicate the local IP address that the remote system will use for communication between the two systems. If both systems are on the same network then you can simply select one of the IP addresses from one of the local ports, but if the remote system is in the cloud or at a remote location then most likely you will need to specify the external IP address for your QuantaStor system. Note that the two systems communicate over ports 22 and 5151, so you will need to open these ports in your firewall in order for the QuantaStor systems to link up properly.
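If there is a host firewall between the appliances you'll need to open those two ports. A sketch assuming the ufw firewall (the commands are echoed here rather than applied; drop the 'echo' and run them as root on each appliance):

```shell
# The two ports QuantaStor systems use to link up.
REPL_PORTS="22 5151"
for p in $REPL_PORTS; do
  echo "ufw allow ${p}/tcp"   # echoed for illustration; remove 'echo' to apply
done
```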

Creating a Remote Replica

Once you have a Storage System Link created between two systems you can replicate volumes and network shares in either direction. Simply log in to the system that you want to replicate volumes from, right-click on the volume to be replicated, then choose 'Create Remote Replica'. Creating a remote replica is much like creating a local clone, only the data is copied to a storage pool in a remote storage system. As such, when you create a remote replica you must specify which storage system to replicate to (only systems with established and online storage system links will be displayed) and which storage pool within that system should hold the remote replica. If you have already replicated the specified volume to the remote storage system then you can re-sync the remote volume by selecting the remote-replica association in the web interface and choosing 'resync'. This can also be done via the 'Create Remote Replica' dialog by choosing the option to replicate to an existing target, if available.

Alert Settings

QuantaStor allows you to thin-provision and over-provision storage, but that feature comes with the associated risk of running out of disk space. As such, you will want to make sure that you configure and test your alert settings in the Alert Manager. The Alert Manager allows you to specify the thresholds at which you want to receive email regarding low disk space alerts for your storage pools. It also lets you specify the SMTP settings for routing email.

Managing Hosts

Hosts represent the client computers that you assign storage volumes to. In SCSI terminology the host computers initiate communication with your storage volumes (target devices), so they are called initiators. Each host entry can have one or more initiators associated with it because an iSCSI initiator (host) can be identified by IP address, IQN, or both at the same time. We recommend always using the IQN (iSCSI Qualified Name), as you can have login problems when you identify a host by IP address, especially when that host has multiple NICs and they're not all specified.

Managing Host Groups

Sometimes you'll have multiple hosts that need to be assigned the same storage volume(s), such as with a VMware or XenServer resource pool. In such cases we recommend making a Host Group object that contains all of the hosts in your cluster/resource pool. With a host group you can assign the volume to the group once and save a lot of time. Also, when you add another host to the host group it automatically gets access to all the volumes assigned to the group, which makes it very easy to add nodes to your cluster and to manage storage from a group perspective rather than host by host, which can be cumbersome for larger clusters.

Managing Snapshot Schedules

Snapshot schedules enable you to have your storage volumes automatically protected on a regular schedule by creating snapshots of them. You can have more than one snapshot schedule, and each schedule can be associated with any storage volumes, even those utilized in other snapshot schedules; in fact, this is something we recommend. For storage volumes containing critical data you should create a snapshot schedule that makes a snapshot of your volumes at least once a day, and we recommend keeping around 10-20 snapshots so that you have a week or two of history to recover from. A second schedule that creates a single snapshot of your critical volumes on the weekend is also recommended; if you set that schedule to retain 10 snapshots it will give you over two months of historical snapshots from which you can recover data.

Near Continuous Data Protection (N-CDP)

What all this boils down to is a feature we in the storage industry refer to as continuous data protection, or CDP. True CDP solutions allow you to recover to any prior point in time at a granularity of seconds, so if you wanted to see what a storage volume looked like at 5:14am on Saturday you could look at a 'point-in-time' view of that storage volume at that exact moment. Storage systems that allow you to create a large number of snapshots, giving you the ability to roll back or recover from a snapshot created perhaps every hour, are referred to as NCDP or "near continuous data protection" solutions, and that's exactly what QuantaStor is. This NCDP capability is achieved through snapshot schedules, so be sure to set one up to protect your critical volumes and network shares.

Managing Sessions

The list of active iSCSI sessions with the storage system can be found by selecting the 'Storage System' tree-tab in QuantaStor Manager then selecting the 'Sessions' tab in the center view. Here's a screenshot of a list of active sessions as shown in QuantaStor Manager.

Session List

Dropping Sessions

To drop an iSCSI session, just right-click on it and choose 'Drop Session' from the menu.

Drop Session Dialog

Keep in mind that some initiators will automatically re-establish a new iSCSI session if one is dropped by the storage system. To prevent this, just unassign the storage volume from the host so that the host cannot re-login.

Managing Network Shares

In QuantaStor you can use either NFSv3 or NFSv4. This can be changed from within the "NFS Services Configuration" dialog. To open this dialog, navigate to the "Network Shares" tab and select "Configure NFS" from the ribbon bar at the top, or right-click the open space under the "Network Shares" section and choose "Configure NFS Services" from the context menu.

NfsServicesConfig.png

Controlling NFS Access

NFS share access is filtered by IP address. To restrict access, right-click on a network share and select "Add Host Access". By default the share is set to public access. The dialog allows you to grant access to a single IP address or to a range of IP addresses.

AddShareClientAccess.png
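Behind the scenes these host-access entries correspond to standard NFS export options; an entry granting a subnet read/write access looks roughly like this (the path and subnet are examples only):

```
/export/share1  192.168.0.0/24(rw,sync)
```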

NFS Custom Options

You can also specify custom options from within the "Modify Network Share Client Access" dialog. To open this dialog, right-click on the share's host access entry (defaults to public) and select "Modify Host Access". Here you can set options such as "Read Only", "Insecure", etc., and you can add custom options such as "no_root_squash" in the space provided below.

ContextMenuNfs.png ModifyShareClientAccess.png

Controlling CIFS Access

CIFS access can be controlled on a per-user basis. When you are not in a domain, the users you can choose from are the users defined within QuantaStor. This can be done during share creation by selecting "CIFS/SMB Advanced Settings", or while modifying a share under the "CIFS User Access" tab. If you are in a domain, you will also be able to select the users/groups present within the domain in the same way, by selecting "AD Users" or "AD Groups". You can set each user's access to "Valid User", "Admin User", or "Invalid User".

UserAccess.png AdUsers.png

Verifying Users Have CIFS Passwords

Before using a QuantaStor user for CIFS/SMB access you must first verify that the user has a CIFS password. To check, go to the "Users & Groups" section, select a user, and look for the "CIFS Ready" property. If the user is ready to be used with CIFS/SMB it will say "Yes". If the property says "Password Change Required" then one more step is needed: right-click the user and select "Set Password". If you are signed in as an administrator, the old password is not required, and you can set the password to the same value it had before. The user should now show up as CIFS ready.

Setting CIFS Options

You can modify some of the share options during share creation, or while modifying the share. Most of the options are set by selecting/unselecting the checkboxes. You can also set the file and directory permissions in the modify share dialog under the "CIFS File Permissions" tab.

FilePermissions.png

Active Directory Configuration

Joining an AD Domain

To join a domain, first navigate to the "Network Shares" section. Now select "Configure CIFS" in the top ribbon bar, or right-click in the "Network Shares" space and select "Configure CIFS Services" from the context menu. Check the box to enable Active Directory and provide the necessary information. The KDC is most likely your domain controller's FQDN (DC.DOMAIN.COM).
Note: Your storage system name must be <= 15 characters long.
If there are any problems joining the domain, verify that you can ping the IP address of the domain controller and that you are also able to ping the domain itself.

ContextMenu.png AddDomain.png

You can now see the QuantaStor system on the domain controller under its Computers entry.

AdComputerEntry.png

Leaving an AD Domain

To leave a domain, first navigate to the "Network Shares" section. Now select "Configure CIFS" in the top ribbon bar, or right-click in the "Network Shares" space and select "Configure CIFS Services" from the context menu. Uncheck the checkbox to disable Active Directory integration. If you would like to remove the computer entry from the domain controller you must also specify the domain administrator username and password. After clicking "OK", QuantaStor will leave the domain.

RemoveDomain.png

Managing Storage Volumes

Each storage volume is a unique iSCSI device or 'LUN' as it is often referred to in the storage industry. The storage volume is essentially a disk drive on the network (the SAN) that you can assign to any host in your environment.

Creating Storage Volumes

Storage volumes can be provisioned 'thick' or 'thin', which indicates whether the storage for the volume should be fully reserved (thick) or not (thin). As an example, a 100GB thin-provisioned storage volume in a 1TB storage pool will use only 4KB of disk space in the pool when it is initially created, leaving essentially the full 1TB free for other volumes and additional provisioning. In contrast, if you choose 'thick' provisioning by unchecking the 'thin provisioning' option, the entire 100GB is reserved up front. The advantage is that a thick volume can never run out of disk space due to low availability in the pool, but since the space is reserved up front you will have only 900GB free in your 1TB storage pool after it has been allocated, so you can use up your available disk space fairly rapidly with thick provisioning.
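Thin provisioning behaves much like a sparse file: the full size is advertised up front, but blocks are only allocated as data is written. You can see the same effect with an ordinary file:

```shell
# Create a "100GB" sparse file: the apparent size is 100GB, but almost
# no blocks are actually allocated until data is written into it.
truncate -s 100G /tmp/thinvol.img
stat -c %s /tmp/thinvol.img      # apparent size in bytes: 107374182400
du -k /tmp/thinvol.img           # actual allocated size in KB: near zero
```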

Deleting Storage Volumes

There are two separate dialogs in QuantaStor Manager for deleting storage volumes. If you press the "Delete Volume(s)" button in the ribbon bar you will be presented with a dialog that allows you to delete multiple volumes at once; you can even search for volumes based on a partial name match, which can save a lot of time when you're deleting many volumes. You can also right-click on a storage volume and choose 'Delete Volume', which brings up a dialog for deleting just that volume. If there are snapshots of the volume you are deleting, they are not deleted; rather, they are promoted. For example, if you have snapshots S1 and S2 of volume A1, the snapshots will become root/primary storage volumes after A1 is deleted. Once a storage volume is deleted all the data is gone, so use extreme caution to make sure you're deleting the right volumes. Technically, storage volumes are internally stored as files on an ext4 or btrfs filesystem, so it is possible that you could use a filesystem file-recovery tool to recover a lost volume, but generally speaking you would need to hire a company that specializes in data recovery to get this data back.

Resizing Storage Volumes

QuantaStor supports increasing the size of storage volumes, but due to the high probability of data loss we do not support shrink. (N.B. all storage volumes are raw files within the storage pool filesystem (usually XFS), so you could theoretically experiment by making a copy of your storage volume file, manually truncating it, renaming the old one, and then renaming the truncated version into place. This is not recommended, but it's an example of the low-level things you could try in a real pinch given the open nature of the platform.)
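For reference, growing is the safe direction precisely because extending a raw file never discards existing data. With an ordinary file the operation looks like this (hypothetical file name, purely illustrative; use the resize dialog for real volumes):

```shell
truncate -s 1G /tmp/demo_vol.img   # original apparent size
truncate -s 2G /tmp/demo_vol.img   # growing extends the file; no data is lost
stat -c %s /tmp/demo_vol.img       # prints 2147483648
```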

Creating Snapshots

QuantaStor snapshots are probably not like any snapshots you've used with any other storage vendor on the market. Some key features of QuantaStor volume snapshots include:

  • massive scalability
    • create hundreds of snapshots in just seconds
  • supports snapshots of snapshots
    • you can create snapshots of snapshots of snapshots, ad infinitum.
  • snapshots are R/W by default, read-only snapshots are also supported
  • snapshots perform extremely well even when large numbers exist
  • snapshots can be converted into primary storage volumes instantly
  • you can delete snapshots at any time and in any order
  • snapshots are 'thin', that is they are a copy of the meta-data associated with the original volume and not a full copy of all the data blocks.

All of these advanced snapshot capabilities make QuantaStor ideally suited for virtual desktop solutions, off-host backup, and near continuous data protection (NCDP). If you're looking to get NCDP functionality, just create a 'snapshot schedule' and snapshots can be created for your storage volumes as frequently as every hour.

To create a snapshot or a batch of snapshots, select the storage volume that you wish to snap, right-click on it, and choose 'Snapshot Storage Volume' from the menu.

If you do not supply a name, QuantaStor will automatically choose one for you by appending a "_snap" suffix to the original volume's name. So if you have a storage volume named 'vol1' and you create a snapshot of it, you'll have a snapshot named 'vol1_snap000'. If you create more snapshots, the system increments the number at the end so that each snapshot has a unique name.
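The naming pattern can be sketched as follows (the three-digit zero-padded counter matches the 'vol1_snap000' example; treat the exact format as an assumption):

```shell
vol="vol1"
for i in 0 1 2; do
  printf '%s_snap%03d\n' "$vol" "$i"   # vol1_snap000, vol1_snap001, vol1_snap002
done
```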

Creating Clones

Clones represent complete copies of the data blocks in the original storage volume, and a clone can be created in any storage pool in your storage system, whereas a snapshot can only be created within the same storage pool as the original. You can create a clone at any time, even while the source volume is in use, because QuantaStor creates a temporary snapshot in the background to facilitate the clone process. The temporary snapshot is automatically deleted once the clone operation completes. Note also that you cannot use a cloned storage volume until the data copy completes; you can monitor the progress of the cloning in the Task bar at the bottom of the QuantaStor Manager screen. In contrast to clones, snapshots are created near instantly and do not involve data movement, so you can use them immediately.

Restoring from Snapshots

If you've accidentally lost some data by inadvertently deleting files in one of your storage volumes, you can recover your data quickly and easily using the 'Restore Storage Volume' operation. To restore your original storage volume to a previous point in time, first select the original, then right-click on it and choose "Restore Storage Volume" from the pop-up menu. When the dialog appears you will be presented with all the snapshots of that original from which you can recover. Just select the snapshot that you want to restore to and press OK. Note that you cannot have any active sessions to the original or the snapshot storage volume when you restore; if you do, you'll get an error. This is to prevent the restore from taking place while the OS has the volume in use or mounted, as that would lead to data corruption.

WARNING: When you restore, the data in the original is replaced with the data in 
the snapshot.  As such, there's a possibility of losing data as everything that 
was written to the original since the time the snapshot was created will be lost.  
Remember, you can always create a snapshot of the original before you restore it 
to a previous point-in-time snapshot.

Converting a Snapshot into a Primary (btrfs only)

A primary volume is simply a storage volume that's not a snapshot of any other storage volume. With QuantaStor you can turn any snapshot into a primary storage volume very easily. Just select the storage volume in QuantaStor Manager, then right-click and choose 'Modify Storage Volume' from the pop-up menu. Once you're in the dialog, just un-check the box marked "Is Snapshot?". If the snapshot itself has snapshots, those snapshots will be connected to the snapshot's previous parent volume. This conversion from snapshot to primary does not involve data movement, so it's near instantaneous. After the snapshot becomes a primary it will still have data blocks in common with the storage volume it was previously a snapshot of, but that relationship is cleared from a management perspective.

IO Tuning

ZFS Performance Tuning

One of the most common tuning tasks for ZFS is setting the size of the ARC cache. If your system has less than 10GB of RAM you should just use the default, but if you have 32GB or more then it is a good idea to increase the size of the ARC cache to make maximum use of the available RAM for your storage appliance. Before you set the tuning parameters, run 'top' to verify how much RAM the system has. Next, run these commands to set the ARC limits as a percentage of available RAM. For example, to set the ARC cache to use a maximum of 80% of the available RAM and a minimum of 50% of the available RAM in the system, run these, then reboot:

qs-util setzfsarcmax 80
qs-util setzfsarcmin 50

Example:

sudo -i
qs-util setzfsarcmax 80
INFO: Updating max ARC cache size to 80% of total RAM 1994 MB in /etc/modprobe.d/zfs.conf to: 1672478720 bytes (1595 MB)
qs-util setzfsarcmin 50
INFO: Updating min ARC cache size to 50% of total RAM 1994 MB in /etc/modprobe.d/zfs.conf to: 1045430272 bytes (997 MB)
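The arithmetic behind those numbers is straightforward; this sketch reproduces it by reading total RAM from /proc/meminfo (zfs_arc_max/zfs_arc_min are the underlying ZFS module parameters that qs-util writes into /etc/modprobe.d/zfs.conf):

```shell
# Compute 80% and 50% of total system RAM in bytes, as qs-util does.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
arc_max=$(( total_kb * 1024 * 80 / 100 ))
arc_min=$(( total_kb * 1024 * 50 / 100 ))
echo "options zfs zfs_arc_max=$arc_max zfs_arc_min=$arc_min"
```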


To see how many cache hits you are getting you can monitor the ARC cache while the system is under load with the qs-iostat command:

qs-iostat -af

ZFS Adaptive Replacement Cache (ARC) / read cache statistics

Name                              Data
---------------------------------------------
hits                              1099360191
misses                            65808011
c_min                             67108864
c_max                             1045925888
size                              26101960
arc_meta_used                     11552968
arc_meta_limit                    261481472
arc_meta_max                      28478856
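A useful derived number from these counters is the ARC hit ratio, hits / (hits + misses); with the figures above:

```shell
# Compute the ARC hit ratio from the qs-iostat counters shown above.
awk -v hits=1099360191 -v misses=65808011 \
    'BEGIN { printf "ARC hit ratio: %.1f%%\n", 100 * hits / (hits + misses) }'
# ARC hit ratio: 94.4%
```

A ratio in the high 90s generally means the working set fits in the ARC; a consistently low ratio under load suggests a larger zfs_arc_max (more RAM for caching) would help.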

ZFS Intent Log (ZIL) / writeback cache statistics

Name                              Data
---------------------------------------------
zil_commit_count                  25858
zil_commit_writer_count           25775
zil_itx_count                     12945

Pool Performance Profiles

Read-ahead and request queue size adjustments can help tune your storage pool for certain workloads. You can also create new storage pool IO profiles by editing the /etc/qs_io_profiles.conf file. The default profile looks like this and you can duplicate it and edit it to customize it.

[default]
name=Default
description=Optimizes for general purpose server application workloads
nr_requests=2048
read_ahead_kb=256
fifo_batch=16
chunk_size_kb=128
scheduler=deadline

If you edit the profiles configuration file be sure to restart the management service with 'service quantastor restart' so that your new profile is discovered and is available in the web interface.
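For example, a duplicated profile tuned for large sequential reads might look like the following (the [streaming] name and the raised read_ahead_kb are illustrative values, written to a temp file here; append the stanza to /etc/qs_io_profiles.conf on a real system and restart the service):

```shell
# A hypothetical profile stanza cloned from [default] with a larger read-ahead.
cat > /tmp/qs_profile_streaming.conf <<'EOF'
[streaming]
name=Streaming
description=Optimizes for large sequential read workloads
nr_requests=2048
read_ahead_kb=1024
fifo_batch=16
chunk_size_kb=128
scheduler=deadline
EOF
grep '^read_ahead_kb' /tmp/qs_profile_streaming.conf   # read_ahead_kb=1024
```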

XFS Tuning Parameters

QuantaStor has a number of tunable parameters in the /etc/quantastor.conf file that can be adjusted to better match the needs of your application. That said, we've spent a considerable amount of time tuning the system to efficiently support a broad set of application types so we do not recommend adjusting these settings unless you are a highly skilled Linux administrator. The default contents of the /etc/quantastor.conf configuration file are as follows:

[device]
nr_requests=2048
scheduler=deadline
read_ahead_kb=512

[mdadm]
chunk_size_kb=256
parity_layout=left-symmetric

There are tunable settings for device parameters, md array chunk-size and parity configuration settings, as well as some settings for btrfs. These configuration settings are read from the configuration file dynamically each time one of the settings is needed so there's no need to restart the quantastor service. Simply edit the file and the changes will be applied to the next operation that utilizes them. For example, if you adjust the chunk_size_kb setting for mdadm then the next time a storage pool is created it will use the new chunk size. Other tunable settings like the device settings will automatically be applied within a minute or so of your changes because the system periodically checks the disk configuration and updates it to match the tunable settings. Also, you can delete the quantastor.conf file and it will automatically use the defaults that you see listed above.

PagerDuty

PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on duty engineer if there's a problem.

QuantaStor can be set up to trigger PagerDuty alerts whenever it encounters an alert of severity "Error", "Warning", or "Critical". Getting set up only requires a few simple steps (internet connection required).
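Under the hood this integration POSTs a 'trigger' event to PagerDuty's generic events API using your Service API Key. A sketch of such a payload (the key is a placeholder and QuantaStor performs the POST for you; the curl line is shown only for manual testing):

```shell
SERVICE_KEY="YOUR-SERVICE-API-KEY"   # placeholder; copied from the PagerDuty service page
cat > /tmp/pd_event.json <<EOF
{"service_key": "$SERVICE_KEY",
 "event_type": "trigger",
 "description": "QuantaStor alert: storage pool low on free space"}
EOF
# Manual equivalent (network required):
#   curl -s -X POST -H 'Content-Type: application/json' -d @/tmp/pd_event.json \
#        https://events.pagerduty.com/generic/2010-04-15/create_event.json
python3 -m json.tool /tmp/pd_event.json > /dev/null && echo "payload is valid JSON"
```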

Adding a New Service in PagerDuty

PagerDutySetup1.png

After logging into your PagerDuty account click on the "Services" tab along the top. From here click on the "Add New Service" button.

This service is what all of the Quantastor alerts will be kept under. This will keep the alerts separate from the other programs that may be sending their alerts to PagerDuty.

PagerDutySetup2.png

For the "Service Name" field we recommend something that describes the box or grid being monitored. Also make sure to select "Generic API System" under service type; QuantaStor uses PagerDuty's API to post alerts. After everything is set, click "Add Service".

PagerDutySetup3.png

Everything on the PagerDuty side should now be set up. Copy the "Service API Key" and set it aside; this key is the input parameter that tells QuantaStor where to post alerts.

Adding PagerDuty to Quantastor

Pagerduty3.png

Open the web interface for the Quantastor system. Right click on the storage system or grid, and select "Alert Manager".

Pagerduty2.png

In the text box titled "PagerDuty.com Service Key" paste the service key from before. Then click on "Apply".

Pagerduty1.png

To test that the integration is working, select "Generate Test Alert". Make sure to select a severity level of "Error", "Warning", or "Critical" and then click OK. If everything is set up correctly a test alert will be generated and sent to PagerDuty.


Example Alerts

When Quantastor sends an alert to PagerDuty it also sends a list of details to make solving the issue easier. These details include:

  • The serial number of the system
  • The startup time of the system
  • The location
  • The title of the alert
  • The version of the Quantastor service
  • The time at which the alert was sent
  • The name of the system
  • The id of the system
  • The current firmware version
  • The severity of the alert


Pagerduty5.png

Pagerduty4.png

Librato Metrics

Librato Metrics takes away the headaches of traditional server-based monitoring solutions that take time to set up, require investments in hardware, and take effort to maintain. Metrics is delivered as a service, so you don't have to worry about storage, reliability, redundancy, or scalability.

Setup for Librato Metrics

MetricsAccount.png

To post data to Librato Metrics you must first have a Librato Metrics account, which can be created through their website at https://metrics.librato.com. Next, go to your account settings page. This is where you will find your username (the email used to create the account) and your API token; this token will be used to post data. From this screen you can also change your password or generate a new API token.

ApiTokenSettings.png

When you create the API token, make sure it is set to "Full Access". This allows QuantaStor to create the different Instruments and Dashboards.

MetricsSettings.png

The next step is to configure QuantaStor to post data to Librato Metrics using that API token. Right-click on the storage system you wish to post data from and select the Librato Metrics settings. In the dialog that appears, set the username to the email you use to log into Librato Metrics and paste the token from the Librato Metrics site into the token field. The post interval controls how often QuantaStor sends data to Librato Metrics; the default is 60 seconds. Click "OK" and QuantaStor will begin posting data.

Viewing the Metrics

To view the data, first sign into your Librato Metrics account. After signing in, click the "Metrics" tab along the top to see a list of all the metrics that have been posted to your account. QuantaStor uses a naming convention of "<storage system/grid name> - <gauge name>".

QuantaStor creates the following gauges:

Metrics

  • CPU Load Average
  • Storage Pool Free Space
  • Storage Pool Reads Per Sec
  • Storage Pool Read kB Per Sec
  • Storage Pool Writes Per Sec
  • Storage Pool Write kB Per Sec

Instruments

  • Storage Pool Read:Write
  • Storage Pool Read:Write kBps

Examples

The picture on the left shows an example of a gauge Metric; this graph is the CPU Load Average Metric. In the top right corner of the graph you can change the window of time currently being viewed.

To the right of that is an example of an Instrument, which is a combination of different Metrics. In this Instrument the Storage Pool Read kBps and Write kBps have been combined into one graph.

MetricsCPU.png MetricsInstrument.png

Nagios Integration / Support

This article has some good detail on setting up Nagios, but the installation requires just a couple of commands:

sudo apt-get update
sudo apt-get install -y nagios3

When installing Nagios for use with QuantaStor, note that you must change the default Apache port to something other than port 80, which conflicts with the QuantaStor web management service. To change the port numbers, edit '/etc/apache2/ports.conf' and change the default port 80 to something like 8001, and 443 to 4431. Finally, restart Apache with 'service apache2 restart'. For more information on changing the Apache port numbers, please see this article, which has more detail.
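As a sketch, the ports.conf change can be scripted. The snippet below edits a throwaway copy of the file so it is safe to run anywhere; the file contents shown are assumed, and on an appliance you would point PORTS_CONF at the real /etc/apache2/ports.conf and run as root.

```shell
# Stand-in copy of /etc/apache2/ports.conf so this sketch runs anywhere;
# on the appliance, point PORTS_CONF at the real file and run as root.
PORTS_CONF="$(mktemp)"
printf 'Listen 80\nListen 443\n' > "$PORTS_CONF"

# Move Apache off ports 80/443 so it no longer conflicts with the
# QuantaStor web management service.
sed -i 's/^Listen 80$/Listen 8001/; s/^Listen 443$/Listen 4431/' "$PORTS_CONF"
cat "$PORTS_CONF"

# On the appliance, apply the change with:
# service apache2 restart
```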

After the port number has been changed you can then access Nagios via your web browser at the new port number like so:

http://your-appliance-ip-address:8001/nagios3/

Zabbix Integration / Support

To enable the Zabbix agent directly within your QuantaStor appliance you'll need to install the agent as per the Zabbix documentation on how to install into Ubuntu Server 12.04 (Precise) which can be found here.

Here is a quick summary of the commands to run as detailed on the Zabbix web site:

sudo -i
wget http://repo.zabbix.com/zabbix/2.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_2.0-1precise_all.deb
dpkg -i zabbix-release_2.0-1precise_all.deb
apt-get update
apt-get install zabbix-server-mysql zabbix-frontend-php

Note that Zabbix uses the apache2 web server for its web management interface. Apache uses port 80 by default, which conflicts with the Tomcat service QuantaStor uses for its web management interface. As such, you must edit the /etc/apache2/ports.conf file to change the default port numbers; for example, change 80 to 8001 and 443 to 4431, then restart Apache with 'service apache2 restart'. This eliminates the port conflict with the QuantaStor Manager web interface. For more information on changing the Apache port numbers, please see this article, which has more detail.

After the port number has been changed you can then access Zabbix via your web browser at the new port number like so:

http://your-appliance-ip-address:8001/zabbix/

Samba v4 / SMB3 Support

QuantaStor versions 3.8.2 and newer support Samba v4, but an additional configuration step is required to upgrade your system from the default Samba server (Samba v3.6.3) to Samba v4. Run the following command as root at the console or via SSH:

sudo samba4-install

It will ask you a few questions about your Active Directory configuration; your answers might look similar to the example below. Note that you must use the default 'dc' mode, as we do not yet support the other modes. Note also that you must provide a strong password for the domain 'Administrator password' or the script will fail and you'll need to retry using the procedure outlined below.

Realm [UNASSIGNED-DOMAIN]: osnexus.net
Domain [osnexus]:
Server Role (dc, member, standalone) [dc]:
DNS backend (SAMBA_INTERNAL, BIND9_FLATFILE, BIND9_DLZ, NONE) [SAMBA_INTERNAL]:
DNS forwarder IP address (write 'none' to disable forwarding) [192.168.0.1]: none
Administrator password:
Retype password:


If you make a mistake and need to reconfigure the AD settings, just re-run the installer and it will prompt you for the AD configuration settings again. In some cases you will have to uninstall samba4, clean up the remnants of the failed install, and then try again like so:

sudo -i
apt-get remove samba4
rm -rf /opt/samba4
samba4-install

As of 12/19/2013 we only support the default 'dc' mode and have not yet completed testing of the other modes, namely 'standalone' and 'member'. After the installation completes, you can run these commands to verify that the samba4 services are running:

service samba4 status
smbstatus -V

Starting in QuantaStor v3.9 the samba4-install script turns off the enforcement of strong passwords, but you can manually adjust the settings to meet your company's security requirements by running the command below. For strong passwords you would want a minimum password length of 10 with the complexity requirement turned 'on' rather than 'off'. Note also that any existing 'local' user accounts will need to have their passwords re-applied when you upgrade to Samba4; this does not apply to AD accounts. If you have strong passwords enabled and a user still has a weak password left over from a prior configuration, their login will be blocked when they attempt to access a share from their Windows host.

samba-tool domain passwordsettings set --min-pwd-length=1 --complexity=off
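Conversely, a sketch of what the strong-password settings recommended above (minimum length 10, complexity on) would look like with the same tool, run as root on the appliance:

```shell
samba-tool domain passwordsettings set --min-pwd-length=10 --complexity=on
```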

If you have any questions please feel free to contact us at support (at) osnexus.com or via the Community Support Forum.

Custom Scripting / Application Extensions

QuantaStor has script call-outs which you can use to extend the functionality of the appliance. For example, you may have an application which needs to be notified before or after a storage pool starts or stops, or you may need to call a script before an automated snapshot policy starts in order to quiesce applications. Note that scripts are called from the root user account, so you must be careful not to allow anyone but the root user to create files under /var/opt/osnexus/custom. Your scripts should also have their file permissions set with 'chmod 700 scriptname.sh' to prevent non-root user accounts from modifying them. Note also that your script must complete within 120 seconds; scripts taking longer are automatically killed.
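As a sketch of the recommended permissions, the snippet below creates a call-out script and locks it down with chmod 700. It uses a temporary directory as a stand-in so it can run anywhere; on an appliance you would write directly to /var/opt/osnexus/custom as root, and the script body here is just a placeholder.

```shell
# Stand-in for /var/opt/osnexus/custom so the sketch runs anywhere.
CUSTOM_DIR="$(mktemp -d)"

# Create a minimal call-out script (the body here is just a placeholder).
cat > "$CUSTOM_DIR/pool-poststart.sh" <<'EOF'
#!/bin/sh
echo "pool event: $*" >> /tmp/qs-custom-events.log
EOF

# Restrict the script to the root user (read/write/execute for owner only)
# so non-root accounts cannot modify what root will later execute.
chmod 700 "$CUSTOM_DIR/pool-poststart.sh"
ls -l "$CUSTOM_DIR/pool-poststart.sh"
```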

Where to put your custom scripts

Custom script call-outs are hard-wired to specific file names and must be placed in the custom scripts directory '/var/opt/osnexus/custom' within your QuantaStor appliance. If you have a grid of appliances you'll need to install your script onto all of the appliances.

Custom Scripts Directory:

/var/opt/osnexus/custom

Storage System Custom Scripts

Scripts related to the startup / shutdown of the appliance.

system-poststart.sh

The system poststart script is called only once when the system boots up. If the management services are restarted, QuantaStor checks the timestamp in /var/opt/osnexus/quantastor/qs_lastreboot and only calls the system-poststart.sh script if it has changed. If you want your poststart script to run every time the management service is restarted, simply delete the qs_lastreboot file from within your script.
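A minimal sketch of such a script, with the marker-file deletion included so it fires on every management-service restart; the log path is just an illustration.

```shell
#!/bin/sh
# Hypothetical /var/opt/osnexus/custom/system-poststart.sh
# Deleting the last-reboot marker makes QuantaStor invoke this script on
# every management-service restart, not just once per boot.
rm -f /var/opt/osnexus/quantastor/qs_lastreboot
echo "custom poststart ran" >> /tmp/qs-poststart.log
```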

system-prestop.sh

Called when the user initiates a shutdown or restart via the web management interface (or CLI). Note that if the admin bypasses the normal shutdown procedure and restarts the appliance at the console using 'reboot', 'shutdown -P now', or a similar command, your script will not be called.
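A sketch of a prestop hook that stops a custom application before shutdown; 'myapp' is a placeholder service name, the log path is just an example, and remember that call-out scripts must finish within 120 seconds.

```shell
#!/bin/sh
# Hypothetical /var/opt/osnexus/custom/system-prestop.sh
# Give a custom application a chance to stop cleanly before the appliance
# shuts down; this must finish within the 120-second call-out limit.
echo "prestop: stopping custom services" >> /tmp/qs-prestop.log
# service myapp stop   # placeholder; substitute your application's service
```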

Storage Pool Custom Scripts

If you have custom applications running within the appliance which need to attach/detach from the pool or specific directories within a given storage pool these scripts may be helpful to you.

pool-poststart.sh

Called just after a storage pool is started. The UUID of the pool is provided as an input argument to the script as '--pool=<POOLUUID>'. You can use 'qs pool-get <POOLUUID> --server=localhost,admin,password --xml' to get more detail about the storage pool from within your script. The --xml flag is optional, and you'll need to provide the correct admin password.
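A sketch of how the --pool argument might be parsed; the helper function name is made up, and the qs CLI call is left commented out so the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical /var/opt/osnexus/custom/pool-poststart.sh
# Extract the pool UUID from a --pool=<POOLUUID> style argument.
parse_pool_uuid() {
    for arg in "$@"; do
        case "$arg" in
            --pool=*) printf '%s\n' "${arg#--pool=}" ;;
        esac
    done
}

POOL_UUID="$(parse_pool_uuid "$@")"
# Look up pool details from within the script (use your real admin password):
# qs pool-get "$POOL_UUID" --server=localhost,admin,password --xml
echo "pool started: $POOL_UUID" >> /tmp/qs-pool-events.log
```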

pool-prestop.sh

Called just before the pool is stopped.