[ale] HADTOO: Automatic-node-generating Hadoop cluster on Gentoo Linux
- Subject: [ale] HADTOO: Automatic-node-generating Hadoop cluster on Gentoo Linux
- From: dev.null.02 at gmail.com (dev null zero two)
- Date: Fri, 7 Sep 2018 16:52:25 -0400
- In-reply-to: <[email protected]>
- References: <[email protected]>
That is pretty incredible (never thought to use Gentoo for this purpose).
Have you thought about using orchestration tools for this (Kubernetes etc.)?
On Fri, Sep 7, 2018 at 4:46 PM Jeff Hubbs via Ale <ale at ale.org> wrote:
> For the past few months, I've been operating an Apache Hadoop cluster at
> Emory University's Goizueta Business School. That cluster is
> Gentoo-Linux-based and consists of a dual-homed "edge node" and three
> 16-GiB-RAM 16-thread two-disk "worker" nodes. The edge node provides NAT
> for the active cluster nodes and holds a complete mirror of the Gentoo
> package repository that is updated nightly. There is also an auxiliary edge
> node (a one-piece Dell Vostro 320) with xorg and xfce that I mostly use to
> display exported instances of xosview from all of the other nodes so that I
> can keep an eye on the cluster's operation. Each of the worker nodes
> carries a standalone Gentoo Linux instance that was flown in via rsync from
> another node while booted to a liveCD-style distribution (SystemRescueCD,
> which happens to be Gentoo-based).
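>
> ("Flown in via rsync" amounts to something like the sketch below, run on
> the new worker while booted to SystemRescueCD with the target filesystems
> already mounted; the donor hostname and mount point are illustrative, and
> -x keeps rsync on the donor's root filesystem so /proc, /sys, and friends
> come across as empty mount points:
>
>   rsync -aHAXx --numeric-ids donor-node:/ /mnt/gentoo/
>
> followed by a chroot to install GRUB and adjust the hostname and fstab.)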
>
> I have since set up the main edge node to form a "shadow cluster" in
> addition to the one I've been operating. Via iPXE and dnsmasq on the edge
> node, any x86_64 system that is connected to the internal cluster network
> and allowed to PXE-boot will download a stripped-down Gentoo instance via
> HTTP (served up by nginx), boot to this instance in RAM, and execute a bash
> script that finds, partitions, and formats all of that system's disks,
> downloads and writes to those disks a complete Gentoo Linux instance,
> installs and configures the GRUB bootloader, sets a hostname based on the
> system's first NIC's MAC address, and reboots the system into that
> freshly-written instance.
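>
> To give a flavor of the dnsmasq side of that, a minimal sketch (the
> interface name, addresses, and file names here are illustrative, not my
> actual config) is:
>
>   # /etc/dnsmasq.conf on the edge node
>   interface=enp2s0                       # internal cluster NIC
>   dhcp-range=10.0.0.100,10.0.0.200,12h
>   enable-tftp
>   tftp-root=/var/tftp
>   # plain PXE firmware gets the iPXE binary over TFTP; once iPXE is
>   # running (it sets DHCP option 175) it gets an HTTP boot script instead
>   dhcp-match=set:ipxe,175
>   dhcp-boot=tag:!ipxe,undionly.kpxe
>   dhcp-boot=tag:ipxe,http://10.0.0.1/boot.ipxe
>
> In this sketch the iPXE boot script then pulls the kernel, initrd, and
> squashfs from nginx over HTTP.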
>
> At present, there is only one read/write NFS export on the edge node and
> it holds a flat file that Hadoop uses as a list of available worker nodes.
> The list is populated by the aforementioned node setup script after the
> hostname is generated.
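>
> That last step boils down to something like the following sketch (the
> interface name, mount points, and file names are illustrative):
>
>   mac=$(cat /sys/class/net/eth0/address)   # first NIC's MAC
>   myname="worker-${mac//:/}"               # e.g. worker-f0deadbeef01
>   echo "hostname=\"$myname\"" > /mnt/gentoo/etc/conf.d/hostname
>   mount -t nfs edge:/srv/hadoop-shared /mnt/nfs
>   grep -qx "$myname" /mnt/nfs/workers || \
>       echo "$myname" >> /mnt/nfs/workers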
>
> Both the PXE-booted Gentoo Linux instance and the on-disk instance are
> managed within a chroot on the edge node in a manner not unlike how Gentoo
> Linux is conventionally installed on a system. Once set up as desired,
> these instances are compressed into separate squashfs files and placed in
> the nginx doc root. In the case of the PXE-booted instance, there is an
> intermediate step where much of the instance is stripped away just to
> reduce the size of the squashfs file, which is currently 431MiB. The full
> cluster node distribution file is 1.6GiB but I sometimes exclude the kernel
> source tree and local package meta-repository to bring it down to 1.1GiB.
> The on-disk footprint of the complete worker node instance is 5.9GiB.
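>
> The packaging step itself is just a pair of mksquashfs runs against those
> chroots; a sketch (directory names and compressor choice are illustrative)
> looks like:
>
>   docroot=/var/www/localhost/htdocs
>   mksquashfs /srv/chroot/pxe-gentoo  $docroot/pxe-root.squashfs \
>       -comp xz -noappend
>   mksquashfs /srv/chroot/node-gentoo $docroot/node-root.squashfs \
>       -comp xz -noappend \
>       -e usr/src/linux var/db/repos    # the optional trim mentioned above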
>
> The node setup script takes the first drive it finds and GPT-partitions it
> six ways: 1) a 2MiB "spacer" for the bootloader; 2) 256MiB for /boot; 3)
> 32GiB for root; 4) 2xRAM for swap (this is WAY overkill; it's set by ratio
> in the script and a ratio of one or less would suffice); 5) 64GiB for
> /tmp/hadoop-yarn (more about this later); 6) whatever is left for /hdfs1.
> Any remaining disks identified are single-partitioned as /hdfs2, /hdfs3,
> etc. All partitions are formatted btrfs with the exception of /boot, which
> is vfat for UEFI compatibility (a route I went down because I have one old
> laptop I found that was UEFI-only and I expect that will become more the
> case than less over time). A quasi-boolean in the script optionally enables
> compression at mount time for /tmp/hadoop-yarn.
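>
> In sgdisk terms the first-drive layout comes out to roughly the following
> (the script probes for the device and RAM size itself; /dev/sda and the
> numbers here are just for illustration):
>
>   swap_gib=$(( 2 * ram_gib ))    # the overkill 2x ratio mentioned above
>   sgdisk --zap-all /dev/sda
>   sgdisk -n 1:0:+2M           -t 1:ef02 /dev/sda   # bootloader spacer
>   sgdisk -n 2:0:+256M         -t 2:ef00 /dev/sda   # /boot, vfat/ESP
>   sgdisk -n 3:0:+32G          -t 3:8300 /dev/sda   # root, btrfs
>   sgdisk -n 4:0:+${swap_gib}G -t 4:8200 /dev/sda   # swap
>   sgdisk -n 5:0:+64G          -t 5:8300 /dev/sda   # /tmp/hadoop-yarn
>   sgdisk -n 6:0:0             -t 6:8300 /dev/sda   # /hdfs1 gets the rest
>   mkfs.vfat -F 32 /dev/sda2
>   for p in 3 5 6; do mkfs.btrfs -f /dev/sda$p; done
>   mkswap /dev/sda4
>
> With the compression quasi-boolean set, /tmp/hadoop-yarn simply gets a
> compress option added to its fstab line.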
>
> One of Gentoo Linux's strengths is the ability to compile software
> specifically for the CPU but the node instance is set up with the gcc
> option -mtune=generic. Another quasi-boolean setting in the node setup
> script will change that to -march=native, but that change only takes
> effect when packages are built or rebuilt locally (as opposed to in the
> chroot on the edge node, where everything must be built generic). I can
> couple this feature with another feature to optionally rebuild all the
> system's binaries native but that's an operation that would take a fair bit
> of time (that's over 500 packages and only some of them would affect
> cluster operation). Similarly, in the interest of run-what-ya-brung
> flexibility, I'm using Gentoo's genkernel utility to generate a kernel and
> initrd befitting a liveCD-style instance that will boot on basically any
> x86-64 along with whatever NICs and disk controllers it finds.
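>
> Concretely, the knobs involved are just the compiler flags in the image's
> make.conf plus genkernel; a sketch (flag sets and commands illustrative):
>
>   # /etc/portage/make.conf inside the node instance
>   CFLAGS="-O2 -pipe -mtune=generic"   # the quasi-boolean swaps this to
>   CXXFLAGS="${CFLAGS}"                # -march=native, which is only felt
>                                       # when packages rebuild on the node
>
>   # optional full native rebuild on a worker (500+ packages, slow)
>   emerge --emptytree @world
>
>   # liveCD-style kernel and initrd that boot nearly any x86-64 box
>   genkernel all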
>
> I am using the Hadoop binary distribution (currently 3.1.1) as distributed
> directly by Apache (no HortonWorks; no Cloudera). Each cluster node has its
> own Hadoop distribution, and each node's configuration has some settings
> in common with the others and some specific to that node, modified in place
> by the node setup script. For the node-specific settings, the amount of
> available RAM, the number of available CPU threads, and the list of
> available HDFS partitions on a system are flown into the proper local
> config files. Hadoop services run in a Java VM; I am currently using the
> IcedTea 3.8.0 source distribution supplied within Gentoo's packaging
> system. I have also run it under the IcedTea binary distribution and the
> Oracle JVM with equal success.
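>
> The "flying in" amounts to text substitution into Hadoop's XML config
> files. A sketch, where the @PLACEHOLDER@ tokens are my own illustrative
> convention rather than anything Hadoop defines:
>
>   ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
>   vcores=$(nproc)
>   hdfs_dirs=$(ls -d /hdfs* | paste -sd, -)
>   conf=/opt/hadoop/etc/hadoop
>   # yarn.nodemanager.resource.memory-mb / .cpu-vcores in yarn-site.xml
>   sed -i -e "s|@RAM_MB@|$ram_mb|" -e "s|@VCORES@|$vcores|" \
>       $conf/yarn-site.xml
>   # dfs.datanode.data.dir in hdfs-site.xml
>   sed -i "s|@HDFS_DIRS@|$hdfs_dirs|" $conf/hdfs-site.xml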
>
> Hadoop has three primary constructs that make it up. HDFS (Hadoop
> Distributed File System) consists of a NameNode daemon that runs on a
> single machine and controls the filesystem namespace and user access to it;
> DataNode daemons run on each worker node and coordinate between the
> NameNode daemon and the local machine's on-disk filesystem. You access the
> filesystem with shell-like options to the hdfs binary such as -put, -get,
> -ls, and -mkdir, but in the on-disk filesystem underneath
> /hdfs1.../hdfsN, the files you write are cut up into "blocks" (default
> size: 128MiB) and those blocks are replicated (default: three times) among
> all the worker nodes. My initial cluster with standalone workers reported
> 7.2TiB of HDFS available spread across six physical spindles. As you can
> imagine, it's possible to accumulate tens of TiB of HDFS across only a
> handful of nodes but doing so isn't necessarily helpful.
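>
> Day-to-day use really does look like ordinary file manipulation; for
> example (paths are illustrative):
>
>   hdfs dfs -mkdir -p /user/jeff
>   hdfs dfs -put big-input.csv /user/jeff/
>   hdfs dfs -ls /user/jeff
>   hdfs dfs -get /user/jeff/results/part-r-00000 .
>   hdfs dfsadmin -report    # total and per-DataNode capacity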
>
> YARN (Yet Another Resource Negotiator) is the construct that manages the
> execution of work among the nodes. Part of the whole point behind Hadoop is
> to *move the processing to where the data is* and it's YARN that
> coordinates all that. It consists of a ResourceManager daemon that
> communicates with all the worker nodes and NodeManager daemons that run on
> each of the worker nodes. You can run the ResourceManager daemon and HDFS'
> NameNode daemon on the same machines that act as worker nodes but past a
> point you won't want to and past *that* point you'd want to run each of
> NameNode and ResourceManager on two separate machines. In that regime,
> you'd have two machines dedicated to those roles (their names would be
> taken out of the centrally-located workers file) and the rest would run
> both the DataNode and NodeManager daemons, forming the HDFS storage
> subsystem and the YARN execution subsystem.
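>
> Bringing the daemons up is correspondingly short from whichever machine
> carries the NameNode and ResourceManager roles; roughly (this assumes
> passwordless ssh to the workers and the stock sbin scripts in the Apache
> tarball):
>
>   hdfs namenode -format               # first time only
>   $HADOOP_HOME/sbin/start-dfs.sh      # NameNode plus a DataNode per worker
>   $HADOOP_HOME/sbin/start-yarn.sh     # ResourceManager plus NodeManagers
>   yarn node -list                     # confirm every worker registered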
>
> There is another construct, MapReduce, whose architecture I don't fully
> understand yet; it comes into play as a later phase in Hadoop computations
> and there is a JobHistoryServer daemon associated with it.
>
> Another place where the bridge is out with respect to my understanding of
> Hadoop is coding for it - but I'll get there eventually. There are other
> apps like Apache's Spark and Hive that use HDFS and/or YARN that I have
> better mental insight into, and I have successfully gotten Python/Spark
> demo programs to run on YARN in my cluster.
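>
> For instance, something like Spark's bundled Pi estimator can be thrown
> at YARN this way (executor counts and memory sizes here are arbitrary):
>
>   spark-submit --master yarn --deploy-mode cluster \
>       --num-executors 6 --executor-memory 2g \
>       $SPARK_HOME/examples/src/main/python/pi.py 100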
>
> One thing I have learned is that Hadoop clusters do not "genericize" well.
> When I first tried running the Hadoop-supplied teragen/terasort example
> (goal: make a file of 10^10 100-character lines and sort it), it failed for
> want of space available in /tmp/hadoop-yarn but it ran perfectly when the
> file was cut down to 1/100th that size. For my PXE-boot-based cluster, I
> gave my worker nodes a separate partition for /tmp/hadoop-yarn and gave it
> optional transparent compression. There are a lot of parameters I haven't
> messed with yet, like the minimum size and size increment of memory
> containers and the JVM options, but to optimize the cluster for a given
> job one would expect to.
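>
> For reference, the scaled-down run described above boils down to something
> like the following against the stock examples jar (HDFS paths are
> illustrative; teravalidate is the suite's optional correctness check):
>
>   EX=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar
>   hadoop jar $EX teragen 100000000 /teraIn     # 10^8 100-byte rows
>   hadoop jar $EX terasort /teraIn /teraOut
>   hadoop jar $EX teravalidate /teraOut /teraVal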
>
> What I have right now - basically, a single Gentoo Linux instance for
> installation on a dual-homed edge node - is able to generate a working
> Hadoop cluster with an arbitrary number of nodes, limited primarily by
> space, cooling, and electric power (the Dell Optiplex desktops I'm using
> right now max out at about an amp, so you have to be prepared to supply at
> least N amps for N nodes). They can be purpose-built rack-mount servers, a
> lab environment full of thin clients, or wire shelf units full of discarded
> desktops and laptops.
>
> - Jeff
>
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> https://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>
--
Sent from my mobile. Please excuse the brevity, spelling, and punctuation.