Thin-OSCAR: Diskless Clustering for All
Solving problems that arise with diskless cluster support

Although OSCAR (Open Source Cluster Application Resources) has been designed for clusters with disks since its very first version, diskless and systemless support is a feature that many people have been expecting. The Center for Scientific Computing has built several clusters without disks; we tested OSCAR and were quickly convinced of its quality, especially when compared to our own homemade scripts. We decided to use OSCAR for our diskless cluster and then transfer our diskless expertise to the OSCAR project. The thin-OSCAR workgroup was created specifically to analyze and solve the problems that arise when adding this diskless cluster support.

This article first defines essential notions for a diskless cluster. The actual implementation of thin-OSCAR is explored, along with a roadmap for the development of thin-OSCAR. Interactions with OSCAR are detailed so that features can be discussed, prioritized, and eventually added to the OSCAR framework.

Why Use Diskless Nodes?
Mainstream open source clustering distributions assume that nodes have disks, and they rely on them to put the OS on these disks. However, there are many motivations for people to remove disks from their clusters.

First, disks are useless for calculation, and they don't get you on the Top500.org list. Save your money and buy more nodes with it. In addition, a disk is a mechanical part that is subject to failure; fewer parts mean greater reliability.

Also consider the increased consistency across nodes. It's easier to manage one image than many individual installations. For example, if a package update occurs while a node is down, this node is no longer exactly the same as the others, and homogeneity issues can arise.

As a consequence, nodes with disks are subject to greater entropy than diskless nodes. As soon as a diskless node is rebooted, it is an exact copy of the image that was sent to it. Today this argument is somewhat softened by the existing multicast technology used to install nodes with disks. Still, multicast installation of diskless nodes is faster (the diskless image is smaller) and less error-prone (the only source of errors is the network) than the automatic remote installation of nodes with disks.

Nodes with disks can be considered diskless when, for security or practical reasons, the disk cannot be used. This is the case, for instance, in a Grid environment in which cluster nodes are workstations during the day. It allows nodes that are not used in a cluster to be very quickly integrated into a cluster without any alteration of the main operating system stored on their disks.

Is Diskless Right for Your Application?
There are many diskless techniques, and one technique may fit your needs better than another. This section presents some useful notions as well as several existing technical solutions for diskless support.

Diskless Nodes
Diskless nodes are well named: they have no disk. The main consequence is that the node lacks the functionality to boot without the presence of a network mechanism to initiate the boot process and then provide necessary permanent storage over the network. This kind of node is generally found in dedicated diskless clusters.

There are limitations on the type of computation suited to this kind of node: I/O-intensive applications can run on such nodes, but they will not scale well, and the I/O will slow down each calculation.

Systemless Nodes
Systemless nodes have a file, a partition, or a complete disk dedicated to the cluster; they don't contain a disk bootloader and boot from the network.

Diskfull Nodes
Diskfull nodes have dedicated disks for the cluster. This is the only type of node currently managed by OSCAR without using the thin-OSCAR package. Node management can be done entirely within the OSCAR framework, including installation from scratch and hard-disk formatting.

Diskless and Systemless Techniques
Several solutions exist to boot a node without a disk. We don't discuss the different booting mechanisms (PXE, etherboot, floppy, etc.) here, but cover the solutions that allow nodes to be functional once they have the capacity to boot from the network.

Root-NFS Model
To the best of our knowledge, the Root-NFS model is the oldest solution. The NFS protocol is very robust and can be used, if the kernel supports it, to initialize and run a system directly from the network. It does not solve the first step (i.e., transferring the initial kernel), but over the years this method has proven its viability and reliability.

Its main drawback is a scaling problem that is common to all clusters that share files with the NFS protocol. /home is generally exported and used like a distributed resource among the nodes. As a consequence, if computations cause intensive I/O usage, the network will be exclusively used by the NFS protocol. The cluster can be paralyzed and can even crash the NFS server, depending on the configuration of the NFS server and on the quality of the NFS implementation.

A common solution to this problem is to use a dedicated network exclusively for the transport of information for permanent storage. However, this doesn't solve the inherent NFS problem - the NFS server is central and network load can't be distributed. While this problem wasn't very important in building small clusters, it's very important today as clusters are commonly built with more than 1,000 nodes.

Diskless clusters have the same problem, but it appears at a smaller cluster size because NFS is more heavily used: the complete root file system of each node resides on the NFS server, and only /opt and /usr are common (and read only). As such, diskless nodes are not good candidates for large Root-NFS-based clusters.

Ramdisk Model
With this approach, a ramdisk containing a minimal but functional Linux system is sent to the client, so the node's root device is in RAM. Once the system is up, it can mount NFS partitions or any distributed filesystem partition in order to access programs and user data. This model uses more RAM but reduces the pressure on the network and on the NFS server.

This approach is interesting because the transfer of the initial RAM disk can be multicast, so booting a cluster can be very fast. Another advantage is that, under certain conditions, the connection with the file server can be lost without harm: this kind of diskless node will still be up and running correctly (as long as it doesn't need files from the file server - which is generally the case with scientific computation programs once loaded).
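The speed argument can be made concrete with a little arithmetic. The Python sketch below is an idealized model, not a measurement: it assumes the server's network link is the only bottleneck and ignores protocol overhead and retransmissions. The function name and the sample figures (a 25 MB image, a 100 Mbit/s link, 64 nodes) are hypothetical.

```python
def transfer_seconds(image_mb: float, link_mbit_per_s: float,
                     nodes: int, multicast: bool) -> float:
    """Idealized time for a server to deliver one RAM disk image to every node."""
    seconds_per_copy = image_mb * 8 / link_mbit_per_s
    # With multicast the server sends the image once; with unicast it
    # sends one copy per node over the same link.
    copies = 1 if multicast else nodes
    return seconds_per_copy * copies


if __name__ == "__main__":
    # Hypothetical cluster: 25 MB image, 100 Mbit/s server link, 64 nodes.
    print("unicast  :", transfer_seconds(25, 100, 64, multicast=False), "s")
    print("multicast:", transfer_seconds(25, 100, 64, multicast=True), "s")
```

Under these assumptions the unicast time grows linearly with the number of nodes, while the multicast time stays constant.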

Single System Image Model
The Single System Image model is a whole new class of clustering that doesn't need to be diskless but supports diskless clusters naturally. The idea behind these implementations (e.g., Kerrighed, Scyld, BProc, OpenSSI, openMosix) is to simplify both the administration and usage of clusters. The same idea is behind SMP computers: the whole cluster appears as a single resource with lots of CPU and RAM. Efficient algorithms allow either queuing or load balancing. Several of these clustering distributions are built with diskless nodes in mind and also function very well with disks (for swap, temporary storage, or distributed file systems).

Actual Implementation
The actual implementation of thin-OSCAR is based on the ramdisk model. This section describes the boot process and the principle of operation that thin-OSCAR is based on.

Thin-OSCAR is driven by an interactive Perl script that lets you configure the diskless cluster easily. It performs all the tasks necessary to transform a regular systemimager image into a set of RAM disks for diskless operation and to configure the OSCAR server correctly. Loopback device support has to be available on the system (master node) where thin-OSCAR is executed.

In order to launch the thin-OSCAR wizard, issue (as root) the following command:

/var/lib/oscar/packages/thin-oscar/oscar2thin.pl

The next section examines each of the steps necessary to use thin-OSCAR and create a diskless cluster.

Image Creation in the OSCAR Wizard
Some reduced-size RPM lists are provided for building images in the OSCAR wizard, for example the file oscarsamples/Mandrake-9.2-noX-i386.rpmlist. System Installer uses such a list to build a set of files as if they were installed on a real system. The image is installed in:

/var/lib/systemimager/images

Model Definition
Thin-OSCAR lets the administrator define models for each kind of node in the cluster. This is a way to set parameters to meet specific needs. This is done for the following parameters:

  • The name of the model
  • The interface name and the kernel module needed by this kind of node (SCSI, NFS, network adapter, etc.)
  • The OSCAR image you want these settings applied on
  • The kernel to use on this model

Thin-OSCAR lets you add, delete, and change your models as you wish; the configuration is stored in oscar-package/etc/model.xml of thin-OSCAR.

Linking Nodes to Models
Say you've created some models and there are nodes configured in the OSCAR wizard. It's possible to link a model with some nodes. If you don't have any particular needs, you may want to link all your nodes with the same model. Keep in mind that if you want to try a new kernel, for example, you can define a new model and apply it to a bunch of nodes for testing purposes.

The configuration is written in oscar-package/etc/link.xml.
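The relationship between models and nodes can be pictured as two small lookup tables. The Python sketch below is illustrative only: the class, its field names, and the sample values are hypothetical and do not reflect the actual layout of model.xml or link.xml.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """One model, mirroring the four parameters described in this article."""
    name: str
    interface: str        # network interface name, e.g. "eth0"
    kernel_modules: list  # drivers this kind of node needs
    image: str            # OSCAR image the settings apply to
    kernel: str           # kernel used by this model


def model_for(node: str, links: dict, models: dict) -> Model:
    """Return the model a node is linked to, or raise if it has none."""
    try:
        return models[links[node]]
    except KeyError as exc:
        raise LookupError(f"node {node!r} is not linked to a known model") from exc


if __name__ == "__main__":
    models = {
        "compute": Model("compute", "eth0", ["e100"], "oscarimage", "vmlinuz-stable"),
        "testing": Model("testing", "eth0", ["e100"], "oscarimage", "vmlinuz-new"),
    }
    # Most nodes share one model; one node tries the new kernel.
    links = {"node1": "compute", "node2": "compute", "node3": "testing"}
    print(model_for("node3", links, models).kernel)
```

Keeping the link table separate from the model definitions is what makes the test-kernel scenario cheap: moving a node to another model changes one entry and nothing else.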

Configuring the Details
Thin-OSCAR builds two RAM disks for each model defined. RAM disks are simply traditional ext2 file systems mounted as a loopback device. The first RAM disk is used only at boot time and is configured by thin-OSCAR depending on various parameters, such as the size of RAM disk that the kernel supports and the driver used by the network interface. When the node boots by PXE, it requests an IP address, a kernel, and the first RAM disk image. Thin-OSCAR populates the PXE directory, configures the tftp server that sends the first boot image, and copies the required kernel to /tftpboot.
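To give an idea of what a per-model PXE entry might contain, here is a Python sketch that renders one. It assumes PXELINUX is the network boot loader, which the article does not state; the function name, file names, and RAM disk size are hypothetical. The DEFAULT, LABEL, KERNEL, and APPEND keywords and the initrd=, root=, and ramdisk_size= kernel parameters are standard, but the entry thin-OSCAR really writes may differ.

```python
def pxelinux_entry(label: str, kernel: str, initrd: str, ramdisk_kb: int) -> str:
    """Render a PXELINUX-style boot entry for one model (illustrative only)."""
    lines = [
        f"DEFAULT {label}",
        f"LABEL {label}",
        f"  KERNEL {kernel}",
        # ramdisk_size is given in kilobytes and must be large enough
        # to hold the boot RAM disk image.
        f"  APPEND initrd={initrd} root=/dev/ram0 ramdisk_size={ramdisk_kb}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pxelinux_entry("compute", "vmlinuz-compute", "boot-compute.img", 32768))
```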

The boot image contains the linuxrc script, which builds the RAID array in RAM and puts the run image into it. The script ends with a pivot_root to change the root: the RAID array becomes the new root of the system. After that, the system boots (almost) as if it were a normal disk install.

The run image, which resides on the node after the boot process, is typically about 25 MB. Thin-OSCAR simply copies the relevant directories from the SIS image to this RAM disk and creates empty directories for those that are mounted by NFS.

The relevant directories are copied from their systemimager image location to the image directory: /bin, /boot, /dev, /etc, /home, /lib, /mnt, /proc, /root, /sbin, /var.

The directories /usr, /opt, and /lib/modules are created and will be used as a mounting point for the NFS exported file system.
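The copy-and-stub step described above can be sketched in a few lines of Python. This is a simplified illustration, not thin-OSCAR's own code: the function name is hypothetical, and shutil.copytree handles only regular files, directories, and symlinks, so a real tool would need something that also preserves the device nodes in /dev.

```python
import os
import shutil

# Directories copied into the run image, as listed in this article.
COPIED = ["bin", "boot", "dev", "etc", "home", "lib", "mnt",
          "proc", "root", "sbin", "var"]
# Directories left empty to serve as NFS mount points.
MOUNT_POINTS = ["usr", "opt", "lib/modules"]


def build_run_tree(image_dir: str, run_dir: str) -> None:
    """Copy the local directories and create empty NFS mount points."""
    for name in COPIED:
        src = os.path.join(image_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(run_dir, name), symlinks=True)
    for name in MOUNT_POINTS:
        target = os.path.join(run_dir, name)
        # /lib is copied with its modules, so empty lib/modules again:
        # it will be provided over NFS instead.
        shutil.rmtree(target, ignore_errors=True)
        os.makedirs(target)
```

Emptying lib/modules after copying /lib keeps the kernel modules out of RAM, which is one reason the run image stays small.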

The /etc/fstab file is then generated in the image directory. It contains the raid array in the RAM device (/dev/md0) as its root mount point. /home is mounted via NFS (OSCAR standard), and the /opt, /usr, and /lib/modules directories are NFS mount points from the systemimager image directory on the server (see Figure 1).
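A generated fstab of this kind could look like the output of the following Python sketch. The device /dev/md0 and the NFS-mounted directories come from the description above; the function name, server name, image path, mount options, and the /proc line are assumptions made for illustration.

```python
def render_fstab(server: str, image_path: str) -> str:
    """Render an fstab with a RAM-based root and NFS-mounted shared trees."""
    rows = [
        ("/dev/md0", "/", "ext2", "defaults"),   # RAID array in RAM as root
        ("none", "/proc", "proc", "defaults"),
        (f"{server}:/home", "/home", "nfs", "rw"),
    ]
    # Shared, read-only trees served from the image directory on the server.
    for shared in ("/opt", "/usr", "/lib/modules"):
        rows.append((f"{server}:{image_path}{shared}", shared, "nfs", "ro"))
    return "".join(f"{dev}\t{mnt}\t{fs}\t{opts}\t0 0\n"
                   for dev, mnt, fs, opts in rows)


if __name__ == "__main__":
    print(render_fstab("oscar-server", "/var/lib/systemimager/images/thin"))
```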

 

The network configuration files are then generated, mainly /etc/sysconfig/network-scripts/ifcfg-eth0, which is set up to use DHCP.
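For reference, a DHCP-based interface file in the usual network-scripts format is only a few lines long. The Python sketch below renders a minimal one; the function name is hypothetical, and the file thin-OSCAR generates may contain additional settings.

```python
def render_ifcfg(device: str = "eth0") -> str:
    """Render a minimal ifcfg file that configures the interface via DHCP."""
    settings = {
        "DEVICE": device,
        "BOOTPROTO": "dhcp",  # address obtained from the DHCP server
        "ONBOOT": "yes",      # bring the interface up at boot time
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_ifcfg())
```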

Some information is deleted from the run image, for example the RPM database, because no further RPM operation will occur on the node.

Server Configuration
The /etc/hosts file is copied into the run image; /var/spool/pbs/server_name and /var/spool/pbs/mom_priv/config are generated in the image directory so that PBS will be functional on the nodes.
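The two PBS files are small. The Python sketch below shows plausible minimal contents, assuming the server_name file holds the PBS server's host name and the MOM configuration uses the $clienthost directive to name the host allowed to contact the MOM. The function name and host name are hypothetical, and the exact content thin-OSCAR writes is not given in this article.

```python
def pbs_files(server: str) -> dict:
    """Map image-relative paths to minimal PBS client file contents."""
    return {
        # Tells PBS commands on the node which server to talk to.
        "var/spool/pbs/server_name": f"{server}\n",
        # Lets the named host connect to the MOM running on the node.
        "var/spool/pbs/mom_priv/config": f"$clienthost {server}\n",
    }


if __name__ == "__main__":
    for path, content in pbs_files("oscar-server").items():
        print(path, "->", content.strip())
```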

The /etc/exports file is adjusted so that the systemimager image is exported (read only) to the cluster net with a given subnet mask. /home is exported read-write to the same network.
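The resulting export entries follow the usual exports syntax of a path followed by client(options). The Python sketch below renders the two entries described above; the function name, image path, and network values are hypothetical, and any options beyond ro and rw are deliberately left out.

```python
def render_exports(image_path: str, network: str, netmask: str) -> str:
    """Render export lines for the image (read only) and /home (read-write)."""
    clients = f"{network}/{netmask}"
    return (
        f"{image_path} {clients}(ro)\n"
        f"/home {clients}(rw)\n"
    )


if __name__ == "__main__":
    print(render_exports("/var/lib/systemimager/images/thin",
                         "192.168.0.0", "255.255.255.0"))
```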

Development Roadmap
The main goal of the thin-OSCAR workgroup is to add diskless and systemless support to the OSCAR clustering framework. As a consequence, support for Root-NFS and single system image is expected. This will lead to interactions with some core components of OSCAR.

For example, there is a tool called cpush from the C3 package that puts a specified file on each node. Obviously, if the file is stored in RAM, it will be lost on the next reboot. For diskless nodes, cpush should copy the file into the image directory, rebuild the RAM disk if needed, and reboot the affected nodes. As a result, many commands from the traditional OSCAR environment should behave differently depending on the configuration of the node. A lot of integration work has to be done to seamlessly administer a heterogeneous cluster.
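The node-dependent behavior proposed for cpush can be sketched as a simple dispatch on the node type. The Python code below is a hypothetical illustration of the idea, not part of C3 or thin-OSCAR: the function name, the node type strings, and the returned action labels are all invented for this example.

```python
import os
import shutil


def push_file(src: str, dest: str, node_kind: str, image_dir: str) -> str:
    """Place a file according to the node type and return the follow-up action."""
    if node_kind == "diskless":
        # A copy made in the node's RAM would vanish at reboot, so the
        # file goes into the image directory on the server instead.
        target = os.path.join(image_dir, dest.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(src, target)
        return "rebuild-ramdisk-and-reboot"
    # Nodes with disks keep the file across reboots, so the regular
    # cpush behavior (not shown here) is sufficient.
    return "regular-cpush"
```

Returning the follow-up action rather than performing it keeps the sketch free of side effects on running nodes; a real integration would have to decide when rebuilding and rebooting is acceptable.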

A more detailed version of the roadmap is available in the thin-OSCAR package in oscar-package/ROADMAP.

Summary
This article defined three kinds of nodes: diskless, systemless, and diskfull. We also covered some techniques for diskless and systemless clustering. Finally, we presented the goals, the actual implementation, and the future development of thin-OSCAR.

Acknowledgments
We would like to thank Mehdi Bozzo-Rey and Sean Dague for valuable discussion. All this work was possible thanks to the existence of the Centre de Calcul Scientifique of the Université de Sherbrooke and its enthusiastic acceptance of the open source model.

Resources

  • OSCAR: http://oscar.openclustergroup.org/
  • Centre de Calcul Scientifique, Université de Sherbrooke: http://ccs.usherbrooke.ca/
  • "Development, installation and maintenance of Elix-II, a 180 nodes diskless cluster running thin-OSCAR," M. Barrette, M. Bozzo-Rey, C. Gauthier, F. Giraldeau, B. des Ligneris, J.P. Turcotte, P. Vachon, A. Veilleux. Submitted to HPCS2003.
  • thin-OSCAR workgroup: http://thin-oscar.ccs.usherbrooke.ca/
  • Linux Terminal Server Project: www.ltsp.org/
  • NFS-Root mini-HOWTO: www.tldp.org/HOWTO/mini/NFS-Root.html
  • TFTP standard (RFC 1350): ftp.rfc-editor.org/in-notes/rfc1350.txt
  • System Installation Suite: www.sisuite.org/
  • "Root Raid in RAM How-To," Mehdi Bozzo-Rey, Michel Barrette, Benoit des Ligneris. Submitted to HPCS2003 (2003).
  • Network Block Device project page: http://nbd.sourceforge.net/
  • Scyld Beowulf Scalable Computing: www.scyld.com/
  • Beowulf Distributed Process Space: http://bproc.sourceforge.net/
  • Single System Image Clusters (SSI) for Linux: http://openssi.org
  • openMOSIX: www.openmosix.org/
  • Kerrighed project: www.kerrighed.org
  • The Advantages of Diskless HPC Clusters Using NAS: www1.us.dell.com/content/topics/global.aspx/power/en/ps4q02_guler

About Benoit des Ligneris
Benoit des Ligneris, Ph.D., is a postdoctoral fellow at the Ecole Polytechnique de Montreal and a researcher in the Scientific Computing Center of the Université de Sherbrooke. He is a core member of the OSCAR project (oscar.openclustergroup.org) and the chair of the thin-OSCAR workgroup dedicated to high performance diskless computing.

About Michel Barrette
Michel Barrette, MScA, is a cluster architect in the Scientific Computing Center of the Université de Sherbrooke. He has been playing with Linux since 1996.

About Michel Dagenais
Michel Dagenais is a professor at Ecole Polytechnique de Montreal and cofounder of the Linux-Quebec user group. He specializes in software performance analysis and object oriented distributed systems. During a leave of absence, he was the director of software development at Positron Industries. He also worked on the Linux Trace Toolkit, an open source tracing tool for Carrier Grade Linux.