Scale-out NAS Administrators Guide

Scale-out NAS on QuantaStor with GlusterFS

QuantaStor provides scale-out NAS capabilities via standard protocols (NFS/CIFS). Network shares are presented as a single namespace that spans all of the appliances, which work together to deliver each Network Share. Clients can then access the storage by connecting to any or all of the appliances in the configuration.

This is different from scale-up systems like traditional storage filers, which have a separate Namespace for each appliance. Because each appliance has a separate Namespace, it is difficult to scale those systems up to multiple petabytes of storage.

The following diagram shows how QuantaStor appliances can be combined to deliver scale-out NAS storage to client hosts over traditional protocols like NFS/SMB or via the native GlusterFS client.

A scale-out NAS Cluster using QuantaStor

The GlusterFS native client generally provides better performance than the traditional NFS or SMB/CIFS protocols, particularly in cases where the Gluster cluster consists of more than 5 appliances. The GlusterFS client enables client systems to communicate directly with the correct appliance for all read/write activity. By contrast, when accessing scale-out NAS shares via NFS/SMB, the QuantaStor appliance must act as both a client and a server at the same time, which can lead to network bottlenecks if sufficient bandwidth is unavailable.

Use Case Scenarios for Scale-out NAS with GlusterFS

Scale-out NAS using GlusterFS technology is a great fit for unstructured data, archive, media, and many big-data use cases. It does not provide strong small-block or random I/O performance, making it unsuitable for database and virtual machine workloads. GlusterFS read/write performance via CIFS/NFS is moderate (1GB/sec reads, 600MB/sec writes with 3 nodes), and performance scales best with the native GlusterFS client. Note that the native client is only available on Linux-based platforms.

Good Use Cases
  • Large-scale Media Archive
  • Large-scale Unstructured Data Repository
  • Write-once / read-many sequential IO workflows

Poor Use Cases
  • Virtual Machine virtual disk devices
  • Databases / high IOPS workloads

Before Getting Started

Gluster Network Considerations

The Gluster scale-out NAS feature can be configured using IP addresses and hostnames available on Virtual Network Interfaces (e.g. eth0:1, eth1:2, etc.) in addition to normal Physical Network interfaces (eth0, eth1, bond0, etc.). Details on how to configure a Virtual Network Interface are available in WebUI help:Virtual Network Interface

10GbE or better network interfaces are recommended for Gluster inter-node communication. For best stability, performance and access, all Gluster scale-out NAS peers should be configured on the same network subnet.

You have a number of options for tuning the network setup for Gluster. If you plan to use the native GlusterFS client from Linux servers that connect directly to the QuantaStor nodes, you should set up network bonding to bind multiple network ports on each appliance, providing additional bandwidth and automatic fail-over in the event a network cable is pulled. If you plan to use CIFS/NFS as the primary protocols for accessing your storage, you can either use bonding or separate your ports into a front-end network for client access and a back-end network for inter-node communication. When in doubt, start with a simple configuration such as LACP bonded ports, but ideally have an expert review your configuration before it goes into production. Getting the networking set up correctly is important for long-term reliability and optimal performance, so be sure to review your configuration with your reseller to make sure it is ideal for your needs.
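For reference, a conventional Linux LACP bond definition looks roughly like the sketch below. This is illustrative only: the interface names and addresses are examples, and on QuantaStor appliances bonding is configured through the WebUI rather than by editing configuration files by hand.

# Illustrative /etc/network/interfaces style LACP (802.3ad) bond
auto bond0
iface bond0 inet static
    address 10.0.13.101
    netmask 255.255.0.0
    bond-mode 802.3ad      # LACP; requires a matching port-channel on the switch
    bond-miimon 100        # link monitoring interval in milliseconds
    bond-slaves eth0 eth1  # physical ports aggregated into the bond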

DNS / Hostname resolution for Gluster

Client NFS/CIFS network traffic should be separated from the node-to-node Gluster communication traffic using separate Front-End (client access) and Back-End (Gluster cluster) networks. In this configuration, Gluster Peers need to be set up using the back-end network IP addresses, and the front-end network IP addresses of each appliance should be set up in your DNS server so that client communication does not use the back-end network.

Configurations using the GlusterFS native client do not need to worry about network separation, as the native client ensures that clients connect directly to the correct node(s) to read and write data, eliminating extra hops in the I/O path.
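As an illustration of the split (hostnames and addresses below are hypothetical), the appliances resolve each other over the back-end network while clients resolve the same names, via DNS or their own /etc/hosts, over the front-end network:

# On each QuantaStor appliance (back-end / Gluster network)
192.168.113.101 qs-gfs1
192.168.113.102 qs-gfs2
192.168.113.103 qs-gfs3

# On NFS/CIFS clients or in the site DNS (front-end / client network)
10.0.13.101 qs-gfs1
10.0.13.102 qs-gfs2
10.0.13.103 qs-gfs3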

Storage Configuration

QuantaStor should be installed onto a mirrored boot/system device or onto a SATA-DOM which is represented by the upper box in the diagram below. The hardware RAID5 units should be created using 5x disks at a time until all available disks have been assigned, with 1 or more drives reserved as hot-spares.

This is a guideline and other layouts can be used; our testing has identified that this configuration provides the best combination of fault-tolerance and performance for most workloads. Other layouts, such as RAID10 or RAID6, can be chosen depending on the demands of the workload.

Osn gluster storageconfig.png

Setup and Configuration

Minimum Hardware Requirements

Qs gluster minreq.png
  • 3x QuantaStor storage appliances (up to 64x appliances)
  • Per Appliance Minimums
    • 2+ Network Interface Ports
    • 5x HDDs for data storage (Enterprise SATA or SAS)
    • 1x hardware RAID controller with super-capacitor protected NVRAM
    • 2x 100GB or larger SSDs/HDDs (for the QuantaStor mirrored boot/system disk)

A QuantaStor Scale-out NAS solution using GlusterFS requires at least 3 systems. The GlusterFS Cluster can contain up to 64 nodes, but 3 is the minimum number required in order to maintain parity and quorum in the event of a node failure.

RAID5 data storage pools are highly recommended for GlusterFS for redundancy and striping. Additionally, a RAID controller with battery or super-capacitor protected cache is essential for preventing data corruption in the event of a system outage.

System Installation

  • Install QuantaStor on all appliances
  • Login to the QuantaStor WebUI on each appliance and add your license keys (A unique key is required for each appliance)

Initial Host Configuration

Following installation of QuantaStor, perform initial host configuration.

  • Right-click on the storage system, choose 'Modify Storage System..'
    • Ensure a unique hostname for each appliance (default is 'quantastor')
    • Set the DNS Server IP addresses for your local network (e.g. 8.8.8.8)
    • Set the NTP server IP addresses

Qs-ui-storage-management.png

Networking

Ensure each node has a unique hostname and set up static IP addresses (DHCP is the default and should only be used for initial setup).

  • Expand the System Storage drawer and select the Network Ports tab
  • Right click on each of the network interfaces and setup static IP addresses

Note that each Network Interface should be on a separate network. Combining client access (NFS/CIFS) and the Gluster back-end traffic on a single network is not optimal and can cause bottlenecks. This is not a concern if you are deploying with GlusterFS native clients on Linux.

Example:

Node1 - eth0=10.0.13.101 / 255.255.0.0  eth1=192.168.113.101 / 255.255.0.0
Node2 - eth0=10.0.13.102 / 255.255.0.0  eth1=192.168.113.102 / 255.255.0.0

You will need to adjust your browser's URL to the new address after modification.

Create a QuantaStor Appliance Grid

Once the host and network configurations have been confirmed, proceed to creating a QuantaStor grid and adding all nodes that will be part of the GlusterFS cluster. Grid creation takes less than a minute; the buttons to create the grid and to add a second node to the grid are in the ribbon bar. QuantaStor appliances can only be members of a single grid at a time, but up to 64 appliances can be added to a grid. Details on this procedure are available in the QuantaStor Admin Guide Grid Setup Procedure section detailed here.

Create the grid and then proceed to add the remaining nodes until all systems have been added.

Qs-ui-glustergrid-complete.png

Create the Storage Pools

If you have not done so already, create your RAID5 Logical Units now.

  • Expand the Hardware Enclosures & Controllers drawer
  • Right-click on the Controller and use the interface to create your RAID5 LUNs

You may need to initiate a new device scan before your LUNs are discovered. If you do not see your LUNs appear under the Physical Disks drawer, or they are not listed during Pool Creation, initiate a device scan by expanding the Physical Disks drawer and right-clicking in the center panel or using the "Scan" button in the Physical Disks section of the ribbon bar.

  • Create an XFS Storage Pool using the Create Storage Pool dialog for each Physical Disk that comes from the hardware RAID controllers.
  • Note that GlusterFS will not work with ZFS pools. Be sure to select XFS.

Qs-ui-create-storagepool.png

Qs-ui-create-xfs-storagepool.png

Once all the XFS Storage Pools have been created you can proceed to GlusterFS configuration.

Setting Up Gluster

This section covers the steps necessary to establish your Gluster configuration.

About Gluster Peers

Setting up QuantaStor appliances into a grid allows them to intercommunicate, but it does not automatically set up the GlusterFS peer relationships between the appliances. To link the appliances and enable scale-out NAS functionality, select 'Peer Setup' from the toolbar and choose which nodes should be set up as a peer group / Gluster cluster.

Peer Setup configures the /etc/hosts file on each appliance so that each node can address the other nodes in the grid by hostname. This can also be accomplished using DNS, but by appending these entries to the /etc/hosts file QuantaStor ensures node name resolution even if DNS service is interrupted. This is essential because Gluster volumes span appliances, with a Gluster brick placed on each appliance the volume spans.

These Gluster bricks are referenced with a brick path that looks much like a URL for a web page. By setting up the IP-to-hostname mappings, QuantaStor is able to create brick paths using hostnames rather than IP addresses, which makes it much easier to change the IP address of a node in the future.

Finally, the Peer Setup dialog has a check box to set up the Gluster peer relationships. This runs a series of 'gluster peer probe' commands to link the nodes together so that Gluster volumes can be created across the appliances. Once the peers are attached you will see them appear in the Gluster Peers section of the WebUI, and you can begin provisioning Gluster Volumes. Alternatively, you can add the peers one at a time using the Peer Attach dialog.
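For reference, the peer relationships that Peer Setup establishes correspond roughly to the following Gluster CLI commands (hostnames here match the example /etc/hosts entries shown later in this guide); QuantaStor runs these for you, so there is normally no need to issue them manually:

# Run from any one appliance to link the other grid members as Gluster peers
gluster peer probe glusterdemo-02
gluster peer probe glusterdemo-03

# Verify that each peer reports 'Peer in Cluster (Connected)'
gluster peer status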

Setting Up Gluster Peers

  • Select the Scale-out File Storage tab and choose 'Peer Setup' from the toolbar.
  • In the dialog select the Systems to include in the Gluster configuration
  • Select the IP each system should use for Gluster intercommunication
  • Check the box for Autoconfigure Gluster Peer Connections.
It will take a minute for all the connections to appear in the Gluster Peers section.

Qs-ui-scaleout-gluster-ribbon-tab peer pointer.png

Qs-ui-gluster-peer-setup.png

Note that the IPs selected for Gluster communication should all be on the same network. Additionally, if you are deploying the GlusterFS native client in your environment, the clients will use the Gluster IP addresses to communicate as well.

Attaching Additional Peers

Attaching additional peers to expand the cluster later can be done by clicking the Peer Attach button in the Scale-out File Storage tab's ribbon bar.

Qs-ui-scaleout-gluster-ribbon-attach-peer.png

It is recommended that the 'Hostname' option be used when attaching a Gluster peer instead of an IP address. This allows network changes to be made on the Gluster Peers without having to tear down and reconfigure the Gluster peer connections and any Gluster Volumes associated with those peers.

Qs-ui-peer-attach.png

Back-end Network Configuration

As mentioned in the previous section, the Peer Setup process automatically configures the hostname resolution file (/etc/hosts) on each appliance to map the name of each appliance to its selected network IP. This provides local name resolution similar to DNS but is maintained by QuantaStor and automatically updated if one of the selected IP addresses changes.

Example of QuantaStor Gluster Peer Config in /etc/hosts

The following shows an example of the entries in the /etc/hosts file on each QuantaStor appliance after Peer Setup is complete. This is provided for informational purposes, to give deeper insight when configuring DNS entries for QuantaStor appliances. There is no need to log in to view or edit the /etc/hosts configuration files; they are configured and maintained automatically by QuantaStor to facilitate back-end communication by hostname.

## START-QUANTASTOR-HOSTS-CONFIG-SECTION ##
10.0.13.73 glusterdemo-03 # cdb857fe-7c65-bc13-c583-78ee57a879ea, eth0
10.0.13.71 glusterdemo-01 # 49e848f9-0bd7-cb19-9c2c-69e7f8fc43b7, eth0
10.0.13.72 glusterdemo-02 # ced7f5fa-deee-4bdb-af3b-1ca3ffc7ea12, eth0
## END-QUANTASTOR-HOSTS-CONFIG-SECTION ##


Provisioning Gluster Volumes

Now that the peers are connected, scale-out NAS shares can be provisioned using the Create Gluster Volume dialog. Gluster Volumes also appear as Network Shares in the Network Shares section and can be further configured to apply CIFS/NFS-specific settings.

For a Gluster Volume to be made highly available it must be provisioned with a replica count of two (2) or more, or with erasure coding. A Gluster Volume with a single replica (one copy of the data) is fault-tolerant to disk failures when used with hardware RAID, but in the event of an appliance outage some portion of the data will not be available for reads. As such we recommend a replica count of 2 for most deployments; when an appliance is disabled or turned off, another appliance still holds a copy of the files and can serve the necessary read/write requests.

Gluster Volumes are provisioned from the 'Gluster Management' tab in the web user interface. To create a new Gluster Volume, right-click in the Gluster Volumes section or choose Create Gluster Volume from the toolbar.

Qs-ui-scaleout-gluster-ribbon-create volume.png

Qs-ui-dialog create gluster volume.png

About Data Redundancy Options

If you are not sure what your needs are, select Replica 2 (Mirrored).

  • Replica
Replica can be used to make your Gluster Volume highly available with two or three copies of every file, distributed across the peers. With a replica count of two (2) you have full read/write access to your scale-out Network Share even if one of the appliances is turned off.
If you only need fault tolerance in case of a disk failure you can use a replica count of one (1). With a replica count of one (1) you will lose read access to some of your data in the event that one of the appliances becomes unavailable (crash, maintenance, etc.). When the node comes back online it will automatically synchronize with the other nodes to bring itself up to the current state via auto-healing.
  • Disperse
Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. An encoded fragment of the original file is stored on each brick in such a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator at volume creation time.
The trade-off of this configuration is that the underlying brick storage holds encoded fragments rather than whole files, meaning all access to data must occur through the Gluster volume translator. Both layouts are illustrated in the command sketch following this list.
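For reference, the equivalent operations in Gluster's own CLI look roughly like the following. QuantaStor creates the bricks and runs these steps for you through the Create Gluster Volume dialog, so the volume names and brick paths shown here are hypothetical:

# Replica 2: every file is stored on two appliances (brick paths are examples)
gluster volume create gvol06 replica 2 qs-gfs1:/export/gvol06/brick qs-gfs2:/export/gvol06/brick
gluster volume start gvol06

# Disperse (erasure coded): 3 bricks, any 1 of which can be lost without losing data access
gluster volume create gvol07 disperse 3 redundancy 1 qs-gfs1:/export/gvol07/brick qs-gfs2:/export/gvol07/brick qs-gfs3:/export/gvol07/brick
gluster volume start gvol07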

Qs-ui-gluster-vol-created.png

Once the Gluster Volume has been created, move on to setting up client access to the Volume in the next section.


Gluster Volume Client Access

Client Access via SMB/CIFS & NFS

A Network Share will appear in the Network Shares section of the Web Management interface for each Gluster Volume that is created. From there you manage the Gluster Volume as a network share just as you would a standard single-pool network file share. QuantaStor takes care of synchronizing the configuration changes across nodes automatically, providing CIFS/NFS access on every node that a given Gluster Volume spans. For example, if you have a grid with 5 nodes (A, B, C, D, E) and a Gluster Volume which spans nodes A & B, then CIFS/NFS access to that Gluster Volume is only provided via nodes A & B.

Scale-out NAS Network Shares can be accessed via major NAS protocols including SMB2.1, SMB3, NFSv3 and NFSv4. This approach is limited by the requirement that clients connect to a single IP address due to the point-to-point nature of the protocols.

An NFS/CIFS share is created for the Gluster Volume automatically and can be accessed via any of the server IP addresses.
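For example, a Linux client could mount the share over NFS or SMB/CIFS from any node serving the volume. The addresses, export paths, and credentials below are placeholders; check the share's properties in the WebUI for the actual values:

sudo mkdir -p /mnt/gvol06

# NFS mount from one of the appliance IP addresses (export path is an example)
sudo mount -t nfs 10.0.13.101:/export/gvol06 /mnt/gvol06

# ...or an SMB/CIFS mount of the same share (username/password are placeholders)
sudo mount -t cifs //10.0.13.101/gvol06 /mnt/gvol06 -o username=admin,password=secret

# If a highly available Virtual IP has been configured (see the next section),
# use that address instead of an individual node IP.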

Making NFS/CIFS Client Access Highly Available

QuantaStor provides the ability to create a highly available Virtual IP address using Site Clustering, which clients can then use to connect. If the system hosting the Virtual IP address experiences any issues, QuantaStor will fail the Virtual IP over to another peer, automatically restoring access to the data for clients and eliminating the need to reconfigure client connections.

To utilize this feature, first create a Site Cluster and a Virtual Network Interface using the steps below. Once configured, the Virtual IP will automatically move to another appliance serving the Gluster Volume in the event the original appliance is restarted, taken offline, or has a hardware failure. Multiple Virtual Interfaces can be created.

Site Cluster Creation

A Site Cluster and Cluster Heartbeat Ring must be created first. Please see QuantaStor Admin Guide: Site Clusters for a more detailed explanation of Site Cluster setup and configuration.

Qs-ui-ha-ribbon-create-site-cluster.png

  • Select the High-Availability tab in the title bar and then click Create Site Cluster on the ribbon bar

Qs ha create site clus.png

  • Enter a Name, and select the Network Interfaces that will provide your clients with access to the share(s)

After selecting OK, the Site Cluster and its first heartbeat ring will be created. As explained in the QuantaStor Admin Guide: Site Clusters section, configuring a second heartbeat ring is highly recommended.

Create the Virtual Network Interface

Once the Site Cluster has been created, the Gluster High Availability Virtual Interface can be created.

Qs-ui-scaleout-gluster-ribbon-createinterface.png

  • Begin by clicking Add Interface under the Gluster HA VIF section of the ribbon bar, or by right-clicking on the Gluster Volume.

Qs-ui-create-high-availability-virtual-interface.png

  • Select the appropriate Gluster Volume
  • Enter the IP Address and Subnet Mask values
This is the IP Address clients will use to connect to the Gluster Volume Share.
  • Gateway is optional and only required if clients will be routing across networks
  • Set the designated Storage System on which the Virtual Network Interface will initially be brought up
  • Select the Network Port on which the Virtual IP address should be activated
Note that all nodes must have the same network on this port.
For example, if the Virtual IP address is 10.0.13.71/16 (255.255.0.0) and is brought up on eth0 on node 1, QuantaStor will attempt to bring that IP up on eth0 on the failover nodes as well. As a result, the network-to-ethernet-port assignment must be the same on all peers serving the Gluster Volume.

Qs-ui-gluster-vif-created.png

The Virtual Interface will be visible under the expanded Gluster Volume once created.


Client Access via native GlusterFS client

For scenarios where a Linux-based host will be connecting to the Gluster Volume, you can install the native GlusterFS client package on the server and connect to the Gluster Volume using the high-performance native client. The most recent versions of the GlusterFS client can be downloaded from the Gluster community web site here.

For Debian/Ubuntu based systems, the process of installing the client and connecting to the Gluster Volume looks like the following. In this example, qs-gfs1 is the name of one of the servers serving a brick of the volume and gvol06 is the name of the volume.

Gluster7.png

sudo apt-get install glusterfs-client
sudo mkdir -p /gvols/gvol06
sudo mount -t glusterfs qs-gfs1:/gvol06 /gvols/gvol06
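To make the mount persistent across reboots, a hedged /etc/fstab entry might look like the line below. The backup-volfile-servers option (spelled backupvolfile-server on older GlusterFS clients) lets the client fetch the volume configuration from another node if qs-gfs1 is unavailable:

# /etc/fstab entry (illustrative); _netdev delays mounting until networking is up
qs-gfs1:/gvol06  /gvols/gvol06  glusterfs  defaults,_netdev,backup-volfile-servers=qs-gfs2:qs-gfs3  0  0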

Installation procedures vary by platform. An article with excellent examples and details on Gluster client configuration can be found here: http://www.jamescoyle.net/how-to/439-mount-a-glusterfs-volume

See the Gluster Community Documentation for details on installing the client for your Linux system here: http://www.gluster.org/community/documentation/index.php/Getting_started_install

Additional Details

Snapshots & Quotas

At this time we do not provide support for snapshots and quotas of Gluster volumes. That said, when used with ZFS based Storage Pools QuantaStor allocates the Gluster bricks as filesystems so that we can provide more advanced functionality like snapshots & quotas in a future release.

Details Regarding High-Availability for NFS/SMB Access

If you are using the native Gluster client from a Linux server, there are no additional steps required to make a volume highly available: the client retrieves peer status information from the server nodes and then communicates directly with each appliance. However, when accessing Gluster Volumes via NFS and SMB, additional steps are required to make the storage highly available. This is because SMB and NFS clients communicate with a specific IP address associated with a single appliance at a time. If the appliance serving storage to an NFS client is turned off, the IP address used by the client must "float" to another node to ensure continued access to storage on that interface. This is precisely what QuantaStor provides: you can create virtual network interfaces for your Gluster Volumes which float to another node automatically to maintain highly available access to your storage via NFS/SMB.

Gluster Physical front end and back-end with HA VIF example

Qs gluster networking physical.png

In the above diagram an HA Virtual Interface is shown attached to appliance QSTOR-C. In the event that the appliance goes offline, the HA Virtual Interface will be moved to appliance QSTOR-A or QSTOR-B. HA Virtual Interfaces are assigned IP addresses (e.g. 10.30.0.99) just like physical network interfaces such as eth0, except that being virtual they attach to a physical interface (eth0 in this example) and can be moved by a failover event such as a node going offline or being shut down for maintenance. As shown, the QuantaStor nodes define their hosts entries to map the QuantaStor node hostnames to IP addresses on the back-end network, while the Gluster Native Clients map their /etc/hosts entries for the QuantaStor nodes to IP addresses on the front-end network. This configuration splits the back-end Gluster communication from the front-end client network, ensuring that back-end Gluster operations such as self-heal replication or file access do not affect network bandwidth for client access.

Gluster Virtual front-end and Physical back-end with HA VIF example

Qs gluster networking virtual.png

The above diagram deviates from the previous physical example by using Virtual Network Interfaces (eth1:1) for the front-end ports that share the connections to clients. The HA Virtual Interface is shown attached to appliance QSTOR-C as virtual interface eth1:gvNNNN. In the event that the appliance goes offline, the HA Virtual Interface will be moved to QuantaStor appliance QSTOR-A or QSTOR-B. HA Virtual Interfaces are assigned IP addresses (e.g. 10.30.0.99) just like physical network interfaces such as eth1, except that being virtual they attach to a physical interface (eth1 in this example) and can be moved by a failover event such as a node going offline or being shut down for maintenance.

As in the previous example, the QuantaStor nodes map each other's hostnames to IP addresses on the back-end network while the Gluster Native Clients map the same hostnames to front-end IP addresses, keeping back-end Gluster traffic such as self-heal replication off the client network.


Expanding Gluster Volumes

By Adding Bricks onto additional Storage Pools

Volumes can be expanded by adding more bricks to an existing volume. The new bricks should always be placed on separate appliances: if you have a volume with two bricks on qs-gfs1 and qs-gfs3 respectively, you can safely expand the volume by creating additional bricks on storage pools on appliances qs-gfs2 and qs-gfs4.
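Underneath, this expansion maps to Gluster's add-brick and rebalance operations, roughly as sketched below. The brick paths are hypothetical; QuantaStor performs these steps for you through the WebUI:

# Add a new brick pair on qs-gfs2 and qs-gfs4 to the existing replica-2 volume
gluster volume add-brick gvol06 qs-gfs2:/export/gvol06/brick qs-gfs4:/export/gvol06/brick

# Spread existing files onto the new bricks and monitor progress
gluster volume rebalance gvol06 start
gluster volume rebalance gvol06 status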

Gluster8.png

By Expanding Storage Pools

The other way to provide additional storage to your Gluster Volumes is by expanding the Storage Pools where the bricks reside. ZFS based Storage Pools can be expanded online with zero downtime by concatenating additional storage. As an example, a 128TB Storage Pool with 32x 4TB drives can be expanded to 256TB by adding a disk expansion unit with another 128TB. No changes need to be made to the Gluster volumes or bricks for them to use the additional space once the pool has been expanded. Note that all the Storage Pools containing bricks of your Gluster volumes must be expanded evenly.

Gluster File Locking Support

File Locking

The conclusions we reached for file locking come from running the test scripts provided below. Each scenario was tested in both directions. For example, the CIFS and Gluster Native test was done in the following manner: the CIFS session would take the lock on the file, and we would verify that the lock could not be taken by the Gluster Native session. After this was verified, we would release the lock on the CIFS side and confirm that the Gluster Native session took the lock. We would then rerun the test on the CIFS side to verify that it now failed to take the lock. Finally, we would release the lock on the Gluster Native side and make sure that the CIFS session then obtained the lock. This process was repeated for every scenario in the table below.

Gluster Lock Testing

Client A        Client B        Supports Locking
Gluster Native  —               Yes
Gluster Native  Gluster Native  Yes
Gluster Native  NFS             Yes
Gluster Native  CIFS            Yes
NFS             —               Yes
NFS             NFS             Yes
NFS             CIFS            Yes
CIFS            —               Yes
CIFS            CIFS            Yes
  • If you are using CIFS on top of the Gluster volume, make sure you have "oplocks = no" in either your share definition or your global definitions in "/etc/samba/smb.conf" (see the example fragment below)
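A minimal smb.conf fragment with oplocks disabled might look like the following. The share name and path are placeholders; on QuantaStor the share definitions are generated for you, so this only illustrates where the setting lives:

[global]
   oplocks = no

# ...or per share:
[gvol06]
   path = /export/gvol06
   read only = no
   oplocks = no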

Considerations with Gluster and File Locking

Locking is instant, but propagation of file content changes is not. As a result, in some cases Client B can open the file and still see content even though Client A truncated the file before unlocking it (example below).

GlusterPropagation.png

Locking Test Scripts

Our locking tests were done with the following Python scripts:

https://s3.amazonaws.com/qstor-downloads/wiki/lockTest.tgz

After unpacking the archive, make sure to give yourself read and write access to the file "test.txt" included with lockTest.tgz
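As a quick manual alternative to the Python scripts (not part of the original test suite), the standard flock(1) utility can be used from two clients mounting the same volume to confirm that locks are honored. Note that flock(1) uses BSD-style flock() locks, which may behave differently from the fcntl/POSIX locks a test script exercises; paths below are placeholders:

# Client A: take an exclusive lock on the test file and hold it for 60 seconds
flock -x /gvols/gvol06/test.txt -c 'echo "lock held"; sleep 60'

# Client B (while Client A holds the lock): a non-blocking attempt should fail
flock -n -x /gvols/gvol06/test.txt -c 'echo "got lock"' || echo "lock held elsewhere"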

Note for Windows testing:

  • You will also need to install pywin32

http://sourceforge.net/projects/pywin32/