Cloudera(tm) Hadoop Integration Guide

From OSNEXUS Online Documentation Site
Revision as of 16:58, 7 October 2013 by Qadmin (Talk | contribs)

This integration guide focuses on configuring your QuantaStor storage grid as a Cloudera® Hadoop™ data cluster for better Hadoop performance with less hardware. Note that you can still use your QuantaStor system normally as a SAN/NAS appliance with the Hadoop services installed.

Setting up Cloudera® Hadoop™ on your QuantaStor® storage appliance is very similar to setting it up on a standard Ubuntu™ Precise server, as this is the Linux distribution upon which QuantaStor v3 is built. There are some important differences, however, and this guide covers them. Also note that if you're following instructions from the Cloudera web site for steps outside of this how-to, be sure to refer to the sections regarding Ubuntu Precise (v12.04).

The Hadoop installation procedure begins by invoking an installation script, which is included with the current version of QuantaStor®. The installation proceeds in stages, as illustrated by the following screen shots. Note that the illustrated example consists of a cluster of four QuantaStor® nodes, one of which is designated the "Hadoop Manager" node.

Note also that on the Hadoop Manager node, the Hadoop management functionality will impose additional resource costs on the system. For example, when testing the installation using QuantaStor® nodes based on Virtual Machines, the Manager node required at least 10 GB of memory.
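Before installing, it can be worth confirming that the intended Manager node has enough memory. The following is a minimal sketch, assuming a standard Linux /proc/meminfo; the function name manager_memory_ok is illustrative and not part of QuantaStor, and the 10 GB default simply reflects the figure observed in the virtual machine testing described above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when MemTotal in a meminfo-style file is at
# least the requested number of GB (default 10, from the VM testing above).
#   $1 = meminfo file (default /proc/meminfo)
#   $2 = minimum memory in GB (default 10)
manager_memory_ok() {
    meminfo="${1:-/proc/meminfo}"
    min_gb="${2:-10}"
    # MemTotal is reported in kB, so convert the GB threshold to kB.
    awk -v min_gb="$min_gb" '
        /^MemTotal:/ { found = 1; ok = ($2 >= min_gb * 1024 * 1024) }
        END { if (found && ok) exit 0; exit 1 }
    ' "$meminfo"
}
```

For example, 'manager_memory_ok || echo "Manager node may be short on memory"' checks the local system. Note that MemTotal is slightly lower than the installed or allocated memory, so a virtual machine given exactly 10 GB will report a little under the threshold; treat the result as a rough guide.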

To get started you'll need to log in to your QuantaStor v3 storage appliance(s) using SSH or via the console. Note that all of the commands shown in the steps below should be run as root, so be sure to run 'sudo -i' to get super-user privileges before you begin.
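Since every command in this guide assumes root, a quick check at the start of a session can save a failed run. This is a small sketch only; the function name require_root is an illustrative choice, not something QuantaStor provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the given numeric UID is 0 (root).
# With no argument it checks the current user via `id -u`.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "Not running as root (uid $uid); run 'sudo -i' first." >&2
        return 1
    fi
    echo "Running as root."
}
```

Calling 'require_root || exit 1' at the top of any wrapper script stops it before it reaches a command that needs super-user privileges.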

Last, if you don't yet have a QuantaStor v3 storage appliance you can get the CD-ROM ISO and a license key here.

Step 1. Manager Installation Script

From the superuser command-line on the manager node (hostname "osn-grid2-mgr1" in this example), simply invoke the "hadoop-install" script. This executable is located in the /bin directory, but it is in the path so it can be run from anywhere.

$ hadoop-install 

This script will take you through a series of screens; simply accept the EULA dialogs and allow it to proceed. This stage can take 10-15 minutes and installs the web server for the Hadoop™ management interface on the Manager node. The script ends with instructions for browsing to that interface to begin the next stage of the installation.

See the following screen shot examples.


Figure 1-01


Figure 1-02


Figure 1-03


Figure 1-04


Figure 1-05


Figure 1-06


Figure 1-07


Step 1.1. Potential error, package dependency

Note that this step installs a Java package using the apt-get utility, so it succeeds only if the target node meets that package's dependency requirements. If this stage fails and the log shows package dependency errors:

- Return to the command line
- Run 'apt-get -f install'
- Respond with 'Y' (must be uppercase) and allow this to complete
- Accept the PAM configuration modification screen, if offered
- Retry the hadoop-install script
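The recovery sequence above can be sketched as a small shell function. This is only an illustration: the function name fix_deps_and_retry and the RUNNER variable are hypothetical conveniences, and the commands remain interactive (you still answer 'Y' and accept the PAM screen yourself).

```shell
#!/bin/sh
# Sketch of the Step 1.1 recovery sequence; run as root on the node where
# the install failed. RUNNER is a hypothetical dry-run hook: leave it unset
# to execute the commands, or set RUNNER=echo to only print them.
fix_deps_and_retry() {
    # Repair broken or missing package dependencies (prompts for 'Y').
    ${RUNNER:-} apt-get -f install || return 1
    # Retry the installer once the dependencies are resolved.
    ${RUNNER:-} hadoop-install
}
```

Running it once with RUNNER=echo shows the two commands that would be issued, without changing the system.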

See the following two screens as examples:


Figure 1-08


Figure 1-09


Step 2. Initial Web-Based Installation

Follow the instructions seen on the last screen of the previous stage, for example:

Point your web browser to http://<hostname or IP>:7180/. Log in to the Cloudera Manager with
the username and password set to 'admin' to continue installation

Log into the interface using the 'admin/admin' user/password. You will see the initial screen asking you which edition you wish to install.
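If the page does not load, it can help to confirm from a command line that the Manager node is answering on port 7180. The sketch below assumes curl is available on the machine you run it from; the function names cm_url and cm_reachable are illustrative only.

```shell
#!/bin/sh
# Hypothetical helpers for checking the Cloudera Manager web interface.
# cm_url prints the management URL for a given hostname or IP address.
cm_url() {
    printf 'http://%s:7180/\n' "$1"
}

# cm_reachable succeeds if the server returns any HTTP response
# within 5 seconds; it does not check the login itself.
cm_reachable() {
    curl --silent --output /dev/null --max-time 5 "$(cm_url "$1")"
}
```

For the example cluster in this guide, 'cm_reachable osn-grid2-mgr1 && echo up' would report whether the web server started by the installation script is responding.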


Figure 2-01


Figure 2-02

For the purposes of this example, the option for a minimal "Standard" edition is shown throughout. On this screen, as on all subsequent screens, hit 'Continue' to move on to the next step.

The next screen allows you to enter the host addresses of the nodes onto which Hadoop is to be installed. In this example, four IP addresses are shown, corresponding to the nodes 'osn-grid2-mgr1', 'osn-grid2a', 'osn-grid2b', and 'osn-grid2c'.

When you hit the 'Search' button, the installation tests those node addresses and returns a status, which should indicate that the nodes are ready and available for installation. When complete, hit 'Continue'.


Figure 2-03


Figure 2-04


Figure 2-05


The next screen, "Cluster Installation Screen 1", offers some installation options. Again, in this example we are installing the bare minimum, so we select the base CDH package only. The screen after that, "Cluster Installation Screen 2", requests login options. In this case, we are using the 'root' user, and all nodes have the same root password.


Figure 2-06


Figure 2-07


The next screen, "Cluster Installation Screen 3", shows the cluster installation progress.
NOTE: If the "Abort" popup dialog appears, do NOT hit "OK"; doing so will abort the installation. Simply dismiss the popup.


Figure 2-08


If all goes well, this should progress as shown in the following example figures, resulting in "Installation completed successfully".


Figure 2-09


Figure 2-10


Figure 2-11


Hit 'Continue' to move to the next phase.

Step 2.1. Potential error, package dependency


As in Step 1.1 above, this step uses the apt-get utility to install a Java package, this time on every node in the cluster other than the manager node. The same package dependency errors can cause the cluster installation to fail on those nodes, as shown in the example figure below.


Figure 2-12


To correct, log into each of the nodes that failed and perform the same steps as shown in Step 1.1. When this is complete, hit the 'Retry Failed Nodes' button on the screen.
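When several nodes fail, the per-node repair can be run in a loop from the manager node. The following is a sketch under stated assumptions: root SSH login to each node is permitted, and the function name repair_nodes and the RUNNER dry-run variable are hypothetical. The host names in the usage note are the example nodes from this guide.

```shell
#!/bin/sh
# Sketch: run the Step 1.1 dependency repair on each failed node in turn.
# Arguments are the host names (or IP addresses) of the nodes that failed.
# RUNNER is a hypothetical dry-run hook; set RUNNER=echo to print the
# ssh commands instead of running them.
repair_nodes() {
    for node in "$@"; do
        echo "== $node =="
        # -t allocates a terminal so apt-get can prompt for 'Y' and show
        # the PAM configuration screen if it is offered.
        ${RUNNER:-} ssh -t "root@$node" 'apt-get -f install' || return 1
    done
}
```

For example, 'repair_nodes osn-grid2a osn-grid2b osn-grid2c' visits each non-manager node; the loop stops at the first node where the repair fails so that you can investigate before returning to the web interface.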

Step 2.2. Potential error, heartbeat detection failure


If installation succeeds past the package installation, the last phase of this step consists of heartbeat tests, as shown in the example figure below.


Figure 2-13


Sometimes these all succeed on the first try; other times some or all of the nodes fail.


Figure 2-14


Our testing showed that hitting "Retry Failed Nodes" (sometimes 2-3 times) resulted in heartbeat detection success.


Figure 2-15


The next screen, "Cluster Installation Screen 4", shows the progress of the "Installing Selected Parcels" stage. This is quite a lengthy phase, typically taking 30 to 60 minutes.


Figure 2-16


Figure 2-17


Hitting 'Continue' at the completion of that phase takes you to "Cluster Installation Screen 5", the "Inspect hosts for correctness" phase. This should result in a comprehensive report, as shown in the following example screens.


Figure 2-18


Figure 2-19

Step 3. Web-Based Installation, CDH4 Services


The next screen allows you to select the services you wish to install. As before, in this example we are showing the installation of only the basic core Hadoop service. Immediately following that screen is the screen for Database Setup. Here we simply accept the default, hit 'Test Connection', and hit 'Continue' after this returns 'Success' as shown.


Figure 3-01


Figure 3-02


Figure 3-03


Figure 3-04




Summary

Those are the basics of getting Hadoop running with QuantaStor. Now you're ready to deploy CDH and install components if you haven't done so already.

We are looking forward to automating some of the above steps and adding deeper integration features to monitor Hadoop this year, and we would appreciate any feedback on what you would most like to see. If you have ideas you'd like to share, or would like to be a beta customer for new Hadoop integration features, write us at support@osnexus.com or write me directly at steve (at) osnexus.com.

Thanks and Happy Hadooping!


Cloudera is a registered trademark of Cloudera Corporation, Hadoop is a registered trademark of the Apache Software Foundation, Ubuntu is a registered trademark of Canonical, and QuantaStor is a registered trademark of OS NEXUS Corporation.