QuantaStor Troubleshooting Guide

From OSNEXUS Online Documentation Site
Revision as of 12:49, 7 March 2013 by Qadmin (Talk | contribs)

Jump to: navigation, search

Login & Upgrade Issues

New packages are not advertised in QuantaStor web management interface's Upgrade Manager

If you've got an old version of QuantaStor and the new packages are not showing up in the Upgrade Manager then often times we find that the network gateway IP address is not set correctly. The easiest way to verify this is to login at the console for your QuantaStor system and then use ping to ping a well known address like www.google.com. If it is not reachable then likely you don't have the gateway configured on any of your network ports or you have the gateway configured on too many network ports. The other thing to check is to ping 8.8.8.8 which is a well known google DNS server. If it is reachable then your gateway is configured properly. If at the same time www.google.com is not reachable then your DNS settings need fixing. If you right-click on your storage system in the Storage Systems tree stack and then choose 'Modify Storage System' and that will let you set the correct DNS server(s) for your environment. The gateway setting is on a per port basis so you'll need to right-click on a network port in order to set that.

Cannot login to the QuantaStor web management interface

If you've seen an error like this, there are a few items to check to get it resolved. “Unable to connect to service, QuantaStor service may be down, or a network issue may be present.”

  • Make sure that you have cleared your browser cache. QuantaStor's web interface is cached by your browser and this can cause problems when you try to login after an upgrade unless you hit the 'Reload' button or 'F5' in your web browser to force it to reload the web management interface. Also in some instances where the internal web server has restarted you may also need to hit reload to clear your browser cache. This is always the best place to start as it is the easiest thing to check and resolves the issue 90% of the time.
  • Make sure that your last upgrade completed successfully. To do this you'll need to login at the console or via SSH and do the upgrade manually. Here are the commands to run:
sudo -i
apt-get -f install
apt-get update
apt-get install qstormanager qstorservice qstortomcat
  • Make sure that the Tomcat web service is running, or restart it. To do this you'll need to login at the console or via SSH.
sudo -i
service tomcat status
service tomcat restart
  • Make sure that the QuantaStor core service is running, or restart it. To do this you'll need to login at the console or via SSH.
sudo -i
service quantastor status
service quantastor restart
  • If the above hasn't resolved the issue try clearing your browser cache, restart your browser, and then hit the reload button after entering the URL of your QuantaStor system.

Missing objects in web management interface

This is due to the web management interface being at a different version than the core service. To resolve this you'll need to login at the console / SSH interface using the 'qadmin' account and then run the following commands:

sudo -i
apt-get -f install
apt-get update
apt-get install qstormanager qstorservice qstortomcat

Be sure to give the service about 30 seconds to start up.

Resetting the admin password

If you forget the admin password you can reset it by logging into the system via the console or via SSH and then run these commands:

sudo -i
cd /opt/osnexus/quantastor/bin
service quantastor stop
./qs_service --reset-password=newpass
service quantastor start

In the above example the new password for the system is set to 'newpass' but you can change that to anything of your choice. After the service starts up you can verify that the password is correct by running this command:

qs sys-get --service=localhost,admin,newpass

You'll need to wait a minute for the service to startup before running the sys-get command but if it succeeds then you've successfully reset the password.

Resetting all the security configuration data

A more brute force way of removing all user accounts and recreating the admin account with the default password ('password') is to do a full --security-reset like so:

sudo -i
cd /opt/osnexus/quantastor/bin
service quantastor stop
./qs_service --reset-security
service quantastor start

CLI login works, Web Manager login fails

Most of the time this is due to the tomcat service being stopped, needing to be restarted, or the web browser cache needing to be flushed (F5). But in some rare instances we've seen the loopback / localhost IP address incorrectly setup. First, type ifconfig and verify that the lo loopback interface exists. If not then you'll need to edit /etc/network/interfaces to put the default lo back in. If it is setup, then you'll need to make sure it has the correct 127.0.0.1 IP address on it. To test this ping localhost and run ifconfig:

ping localhost
ifconfig lo

If you see different IP addresses for lo and localhost then the web servlet in Tomcat won't be able to communicate with the core service. To fix this you'll need to adjust the configuration of /etc/hosts so that there is an entry that reads like so:

127.0.0.1       localhost

Here's the full contents of the hosts file from one of our v3 systems:

127.0.0.1       localhost
127.0.1.1       QSDGRID2-node008

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Once you have the localhost entry set correctly you should be able to login via the web interface again.

Don't use the Ubuntu 'apt-get upgrade' command

With Quantastor you do not want to use the command "apt-get upgrade" or the dist-upgrade. This is because QuantaStor uses a custom kernel based off of a specific version of Ubuntu. Calling apt-get upgrade will result in Ubuntu trying to update to a newer version of the standard kernel which does not include the custom SCSI target drivers.

To keep our software up to date only apt-get update and apt-get install <quantastor packages> need to be called. The four quantastor packages are qstormanager, qstorservice, qstortomcat, and qstortarget. The safest way to do this is through the web interface using the Upgrade Manager but you can also do: apt-get update apt-get install qstormanager qstorservice qstortarget qstortomcat

What to do when you upgrade on accident

The easiest way is to reinstall quantastor. After the install is complete you can go into the Recovery Manager in the web interface. This will recover the metadata and the network configuration, and the pools will auto recover.

Driver Issues

Custom Drivers

The underlying Ubuntu distribution used by QuantaStor v3 is Ubuntu 12.04.1 and we use a 3.5.5 linux kernel. In some cases the drivers included with the kernel we use are not new enough for your hardware and in such cases we recommend downloading and building a the latest driver from the manufacturers web site. If you have questions regarding a particular piece of hardware just send us email to support@osnexus.com, we may have a driver for your hardware already built which you can use to patch your system.

Chelsio T4 (w/ VMware)

We have heard of problems with Jumbo frames with Chelsio T4 controllers with MTU set to 9000 with VMware. If you see this try changing it to MTU 9216 in QuantaStor and MTU 9000 in VMware.

Solarflare NICs

We have seen some problems with these NICs with QuantaStor v2.x which uses the older 2.6 kernel. If you're using a Solarflare NIC be sure to start with QuantaStor v3.x.

Common Issues

Storage pool creation fails at 16%

Many motherboards include onboard RAID support which in some cases can conflict with the software raid mechanism QuantaStor utilizes. There's an easy fix for this, simply remove the driver using these two commands after logging in via the console as 'qadmin':

sudo apt-get remove dmraid
sudo update-initramfs -u

Here are a couple of articles that go into the problem in more detail here and here.

The two commands noted above removes the dmraid driver that linux utilizes to communicates with the RAID chipset in your BIOS. Once removed the devices will no longer be locked down so the software RAID mechanism we utilize (mdadm) is then unable to use the disks.


Created hardware unit but disks are not visible

If you see a little gear icon on the disk then it has been detected as a boot drive which cannot be used for storage pool creation. In the RAID controller boot BIOS you can reconfigure this so that your RAID unit for storage pool creation is no longer tagged as bootable. QuantaStor does this check to ensure that you cannot inadvertently create a storage pool out of your boot drive and which would reformat it.

Create storage pool using an available disk does not work

You've got a disk you select it to create a storage pool but the pool creation fails part way through. Typically this is because the disk has a partition on it or is marked as a LVM physical volume. If it has LVM information on the disk you'll need to use the pvremove command on the disk to clear it. If the disk has a partition on it, you'll need to remove the partition before QuantaStor can use it. QuantaStor does these checks to ensure that you do not inadvertently overwrite data on a disk that is being used for some other purpose.

Deleting Partitions

If you are unable to create a storage pool because the device has prior partitions on it there is a script you can execute. The script can be found at /opt/osnexus/quantastor/bin/qs_dpart.sh and takes in the device as a command line argument.

Here is an example of how to clear the partitions on /dev/sdb:

/opt/osnexus/quantastor/bin/qs_dpart.sh /dev/sdb

Note: make sure you do not run this script on the boot device

Librato Metrics Configuration Troubleshooting

If you've setup Librato metrics and are not seeing information show up in your Librato account the first thing to do is to verify your network settings. To do so you'll need to login to your QuantaStor appliance at the console or via SSH as qadmin. After you login run this command to see if you can ping the Google web site, currently the Metrics web site doesn't respond to ping:

ping www.google.com

If the ping is unsuccessful then the appliance has a bad gateway or DNS configuration. The next step would be to ping Google's DNS server like so:

ping 8.8.8.8

If the ping is successful then the appliance's gateway settings are good but the DNS settings are bad. If the ping is unsuccessful then the appliance's gateway settings are bad and the DNS may also be bad. For more information on setting the DNS settings for your system please see the Storage System Modify documentation here.

Gateway settings are configured per network adapter and we recommend that you only configure the gateway on a single network interface.

If both of those are working fine then please contact OS NEXUS (support@osnexus.com) or Librato support.

XenServer Troubleshooting

Verify Network Configuration

Often times XenServer issues can be traced to network issues so anytime you're having trouble accessing your storage it's best to start by doing a 'ping' test from each of your XenServer hosts to each of your QuantaStor systems. To do this just bring up the console window on each of your Dom0 XenServer hosts and then type 'ping <ipaddress of quantastor box>' for example, if you have a QuantaStor system at 192.168.10.10 you would type 'ping 192.168.10.10'. If it says 'destination host unreachable' then you have a network configuration issue. The issue may be with the VLAN configuration in your switch or it may be that the QuantaStor system and XenServer host are on separate networks. Be sure to review your subnet mask (eg 255.255.255.0) and the IP addresses for both. Once corrected try the ping test again. Once you can successfully ping you're ready for the next level of checking.

Verify Storage Volume Assignment

Another common mistake is to forget to assign all the storage volumes to all the XenServer hosts. If you're having trouble connecting a specific host that should be the first thing to check. To do this, go to the Hosts section within the QuantaStor Manager web interface, select the host that should have the volume and then make sure the volume is in the tree list off the host object. If you don't see it there, right-click on the host and assign the missing volume to the host.

Verify XenServer iSCSI Service Configuration

If you're still not able to successfully 'Repair Storage Repository..' in XenServer the next step is to try some low level iSCSI commands from the XenServer host that's not able to connect. The most basic of these is to do an iSCSI discover:

iscsiadm  -m discovery -t st -p <quantastor-ipaddress>

For example you 'iscsiadm -m discovery -t st -p 192.168.10.10'. If you don't see the target list come back from the storage system and the list is blank then you've got a storage assignment issue. It could be that you have the incorrect iqn assigned to the host or some typo in it. Re-verify the IQN for the XenServer host and make sure that your QuantaStor system has the volume(s) assigned to the correct IQN. If you get an error back like "iscsiadm: Could not scan /sys/class/iscsi_transport" or the iSCSI service is not started then you'll want to try restarting the iSCSI service on the XenServer host.

/etc/init.d/open-iscsi stop
/etc/init.d/open-iscsi start

You'll also want to look at the file '/etc/iscsi/initiatorname.iscsi' If that file is missing, that's a problem. You'll need to create that file to have contents that look something like this:

InitiatorName=iqn.2012-01.com.example:ee46dcfc
InitiatorAlias=osn-prod1

Note you will want to change the part 'ee46dcfc' to have different letters and numbers so that you have a unique IQN for the host. The host may already have one assigned which you can find in the 'General' tab within XenCenter. Take that IQN and replace the one above with it and the InitiatorAlias should be the name of your XenServer host. Once you have that file in place, try stopping and starting the open-iscsi service as noted above. Then try the iscsiadm discovery command again as noted above.

At this point you should be able to see a list of iSCSI targets from your QuantaStor system. If you're working on repairing an existing SR, try repairing it again.

Manual iSCSI Login Test

If the above is working but you still cannot connect to your iSCSI target in the QuantaStor system then you should try using the iSCSI utility to manually login to the target. An example of that is like this:

iscsiadm   --mode node  --targetname "iqn.2009-10.com.osnexus:testvol01"  -p 192.168.10.10:3260 --login

Once connected you can do a session list:

iscsiadm  --mode session -P 1

At this point try repairing or creating the SR again as needed.

CHAP iSCSI Login Test

If you're still having login issues, it may be CHAP related. If you're using CHAP authentication we've seen situations where XenServer gets confused. One way we've seen to resolve this is to manually set the CHAP settings for a given target where 'someuser' and 'secretpassword' are replaced with the CHAP username and password you've assigned to the storage volume:

iscsiadm -m node --targetname "iqn.2009-10.com.osnexus:testvol01" --portal "192.168.10.10:3260" --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm -m node --targetname "iqn.2009-10.com.osnexus:testvol01" --portal "192.168.10.10:3260" --op=update --name node.session.auth.username --value=someuser
iscsiadm -m node --targetname "iqn.2009-10.com.osnexus:testvol01" --portal "192.168.10.10:3260" --op=update --name node.session.auth.password --value=secretpassword 

Once set you can try doing a manual login as noted above or try doing the SR repair again.

iscsiadm   --mode node  --targetname "iqn.2009-10.com.osnexus:testvol01"  -p 192.168.10.10:3260 --login

CHAP iscsid.conf Configuration

If you're editing your /etc/iscsi/iscsid.conf file so that you can have the CHAP settings set automatically you'll want to add these entries:

node.session.auth.authmethod = CHAP
node.session.auth.username = someuser
node.session.auth.password = secretpassword


After you have your iscsid.conf file configured you'll need to restart the initiator service, re-run the discovery, and then login. Re-running the discovery command is important as it seems to clear out stale information about the previous CHAP configuration settings.

service open-iscsi restart
iscsiadm -m discovery -t st -p 192.168.0.116
iscsiadm --mode node --targetname "iqn.2009-10.com.osnexus:58d91bf3-d3525d0558d8e704:asdf" -p 192.168.0.116:3260 --login