Apache Hadoop is an open source distributed computing technology that helps users process large volumes of data with relative ease and extract valuable insights from it. Cloudera, with its open source distribution of Hadoop, has made analytics on Big Data possible and accessible to anyone interested. Tools like Cloudera Impala have also made it possible to perform real-time analytics on large volumes of data.
This book fully prepares you to be a Hadoop administrator, with special emphasis on Cloudera. It provides step-by-step instructions for setting up and managing a robust Hadoop cluster running Cloudera. It will also equip you with an understanding of tools like Cloudera Manager, which many companies use to manage Hadoop clusters with hundreds of nodes. You will also learn how to set up security using Kerberos, and how to monitor and troubleshoot cluster issues.
The book starts with a brief introduction to Apache Hadoop and Cloudera. You will then learn about all the tools and techniques needed to set up and manage a production-standard Hadoop cluster. You will learn all the steps required to add, remove, and rebalance nodes in a cluster, and will also get hands-on experience with the various tools and techniques for backing up cluster information. By the end of this book, you will be ready to take on the responsibility of setting up and managing a production-level Hadoop cluster.
Product dimensions: 7.50 (w) x 9.25 (h) x 0.53 (d) inches
About the Author
Rohit Menon is a senior system analyst living in Denver, CO. He has over seven years of experience in the field of information technology. He currently works for a product-based company specializing in software for large telecom operators.
He graduated with a Master's degree in Computer Applications from Pune University, where he built an autonomous maze-solving robot as his final-year project. He later joined a software consulting company in India, where he worked with C#, SQL Server, C++, and RTOS to provide software solutions to reputed organizations in the USA and Japan. After this, he started working for a product-based company, where most of his time was dedicated to programming the finer details of its products using C++, Oracle, Linux, and Java.
On the Hadoop front, he is a Cloudera Certified Apache Hadoop Developer. He blogs at http://www.rohitmenon.com, mainly on topics related to Apache Hadoop and its components. To share what he has learned, he has also started http://www.hadoopscreencasts.com, a website that teaches Apache Hadoop through simple, short, and easy-to-follow screencasts. He is well versed in a wide variety of tools and techniques, such as MapReduce, Hive, Pig, Sqoop, Oozie, and Talend Open Studio.
Most Helpful Customer Reviews
This book is a rapid and useful introduction to the use of Apache Hadoop to analyse huge data sets. But the central assumption is that you are not necessarily a data scientist per se, but an administrator of a Hadoop system. So the text treads lightly, if at all, on the intricacies of data analysis. Instead, the merits of what Cloudera offers are explained. Top-level material is covered, including building a cluster and installing, and if needed upgrading, Hadoop on it. Very tightly related to this is running MapReduce, which is the analysis engine optimised for a Hadoop cluster. The demands are specialised enough that ancillary processes [daemons] are needed, such as a job tracker program, which gives information about the scheduling of jobs, the status of the hardware in the cluster, and which jobs are currently running. If you have ever been the system administrator of a computer cluster, especially a Unix cluster, you may have seen similar programs, albeit on a smaller scale. A key advantage of this book is a joint education in Hadoop and MapReduce. The point about running a Hadoop cluster is that you often then run MapReduce on it. Where Cloudera comes into play is in simplifying Hadoop administration. But not all the software described in the book is free. Cloudera has a Manager program in two versions: Standard, which is free, and Enterprise, which is not. Standard looks pretty good, actually. But the text encourages you to carefully contemplate splurging on Enterprise, arguing essentially that its extra features are worth the cost. You should read the text slowly to see if you concur. As an inducement, at least when the book was written, the Enterprise version offered 60 days of free use. This could still be valid when you read the book.
This book provides an excellent overview of how to use Cloudera Manager to build and maintain an industrial Hadoop cluster. I think its best audience is either someone who is already familiar with Hadoop and will need to start managing a Cloudera cluster, or someone who will mostly be interacting with the Cloudera Manager interface while a primary system administrator handles the more complicated issues that arise from unpredictable variations. It starts off with a relatively watered-down overview of the concepts behind Hadoop, but around Chapter 5 it really picks up and provides a great description of how to build and manage your cluster. The techniques in this book use Cloudera Manager wherever possible, as this is the preferred method of setting up and maintaining a Cloudera Hadoop cluster. One of its strongest points is that it adds to the body of published knowledge around Hadoop 2, which has been slow to catch up with the release of the technology. There are a few shortcomings that prevent me from giving it a full five stars. Since it focuses on Cloudera Manager, it could leave a fledgling admin in a bad place if things aren't all lined up just right. The target audience is a little narrow: the tone is aimed at informing, not teaching, so it excludes those who are not familiar with system administration in general. In the other direction, it doesn't provide the in-depth, "under the hood" details that heavyweight system administrators enjoy wading through (but which would require a much thicker book).