CDN$ 48.46
  • List Price: CDN$ 64.82
  • You Save: CDN$ 16.36 (25%)

Big Data: Principles and best practices of scalable realtime data systems Paperback – May 10 2015

4.5 out of 5 stars 2 customer reviews

Amazon Price: CDN$ 48.46
New from: CDN$ 33.74
Used from: CDN$ 59.28


Frequently Bought Together

  • Big Data: Principles and best practices of scalable realtime data systems
  • Hadoop: The Definitive Guide
Total price: CDN$ 87.17


Product Details

  • Paperback: 328 pages
  • Publisher: Manning Publications; 1 edition (May 10 2015)
  • Language: English
  • ISBN-10: 1617290343
  • ISBN-13: 978-1617290343
  • Product Dimensions: 18.5 x 1.5 x 23.1 cm
  • Shipping Weight: 794 g
  • Average Customer Review: 4.5 out of 5 stars 2 customer reviews
  • Amazon Bestsellers Rank: #43,306 in Books (See Top 100 in Books)

Product Description

About the Author

Nathan Marz is currently working on a new startup. Previously, he was the lead engineer at BackType before it was acquired by Twitter in 2011. At Twitter, he started the streaming compute team, which provides and develops shared infrastructure to support many critical realtime applications throughout the company. Nathan is the creator of Cascalog and Storm, open-source projects that are relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, and Taobao.

James Warren is an analytics architect at Storm8 with a background in big data processing, machine learning and scientific computing.

Customer Reviews

4.5 out of 5 stars

Top Customer Reviews

Format: Paperback Verified Purchase
It is a good book.
Format: Paperback Verified Purchase
Good book, easy to read.

Most Helpful Customer Reviews: 31 reviews
25 of 26 people found the following review helpful
Other books in this area tend to focus a lot more on the "gee whiz" coolness of data science and machine learning applications ( May 27 2015
By Kirk D. Borne
Format: Paperback
I have rarely seen a thorough discussion of the importance of data modeling, data layers, data processing requirements analysis, and data architecture and storage implementation issues (along with other "traditional" database concepts) in the context of big data. This book delivers a refreshingly comprehensive solution to that deficiency. Other books in this area tend to focus a lot more on the "gee whiz" coolness of data science and machine learning applications (which are aspects of big data that I happen to love, but they are not the whole story). You cannot hope to achieve good, effective, and efficient results from your analytics processes without good data flow, from discovery to access to integration, which is why architecture design, data modeling, and attention to data pipelining are essential. I highly recommend this book for anyone who isn't ashamed to admit that data engineering is at least as important as data science in the big data era (says this data scientist!).
13 of 14 people found the following review helpful
A clear-eyed look at good ways to keep your Big Data system from becoming overwhelmed by complexity and volume May 21 2015
By Si Dunn
Format: Paperback
Here's my bottom line: Get this book, whether you are new to working with Big Data or now an old hand at dealing with Big Data’s seemingly never-ending (and steadily expanding) complexities.

You may not agree with all that the authors offer or contend in this well-written "theory" text. But Nathan Marz’s Lambda Architecture is well worth serious consideration, especially if you are now trying to come up with more reliable and more efficient approaches to processing and mining Big Data. The writers' explanations of some of the power, problems, and possibilities of Big Data systems are among the clearest and best I have read.

"More than 30,000 gigabytes of data are generated every second, and the rate of data creation is only accelerating," Marz and Warren point out.

Thus, previous "solutions" for working with Big Data are now getting overwhelmed, not only by the sheer volume of information pouring in but by greater system complexities and failures of overworked hardware that now plague many outmoded systems.

The authors have structured their book to show "how to approach building a solution to any Big Data problem. The principles you’ll learn hold true regardless of the tooling in the current landscape, and you can use these principles to rigorously choose what tools are appropriate for your application." In other words, they write, you will "learn how to fish, not just how to use a particular fishing rod."

However, a particular Big Data architecture IS featured, as well: Marz's Lambda Architecture. It is, the two authors explain, "an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to Big Data systems that can be built and run by a small team."

The Lambda Architecture has three layers: the batch layer, the serving layer, and the speed layer.

Not surprisingly, the book likewise is divided into three parts, each focusing on one of the layers:

In Part 1, chapters 4 through 9 deal with various aspects of the batch layer, such as building a batch layer from end to end and implementing an example batch layer.

Part 2 has two chapters that zero in on the serving layer. "The serving layer consists of databases that index and serve the results of the batch layer," the writers explain. "Part 2 is short because databases that don’t require random writes are extraordinarily simple."

In Part 3, chapters 12 through 17 explore and explain the Lambda Architecture’s speed layer, which “compensates for the high latency of the batch layer to enable up-to-date results for queries.”

Marz and Warren contend that "[t]he benefits of data systems built using the Lambda Architecture go beyond just scaling. Because your system will be able to handle much larger amounts of data, you’ll be able to collect even more data and get more value out of it. Increasing the amount and types of data you store will lead to more opportunities to mine your data, produce analytics, and build new applications."

This book requires no previous experience with large-scale data analysis, nor with NoSQL tools. However, it helps to be somewhat familiar with traditional databases. Nathan Marz is the creator of Apache Storm and originator of the Lambda Architecture. James Warren is an analytics architect with a background in machine learning and scientific computing.

(My thanks to Manning for providing a review copy of this book.)
12 of 14 people found the following review helpful
Lambda Architecture FTW June 14 2015
By Zambonilli
Format: Paperback Verified Purchase
Great explanation of both the theory and practice of the lambda architecture. While the practice chapters are nice, it's the theory chapters that really shine. The book explains down to the byte level why components are implemented the way they are. For example, there's an immense amount of detail as to why using a db that doesn't support random writes allows for an application to query the batch layer's results without locking.
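As a rough illustration of that last point, here is a minimal Python sketch, not taken from the book. It assumes the batch view is rebuilt in full on each batch run and then published as a read-only snapshot. The `BatchViewStore` class and its method names are hypothetical, and the lock-free swap relies on reference assignment being atomic in CPython.

```python
from types import MappingProxyType


class BatchViewStore:
    """Hypothetical store that serves a read-only batch view.

    There are no random writes: each batch run builds a complete new
    view, and publishing it replaces the old one wholesale.
    """

    def __init__(self):
        self._view = MappingProxyType({})

    def publish(self, new_view):
        # Copy into a fresh dict, wrap it read-only, then swap the
        # reference in one assignment. Readers see either the old
        # snapshot or the new one, never a half-written mix.
        self._view = MappingProxyType(dict(new_view))

    def current(self):
        # Hand out the current snapshot; it never changes afterwards.
        return self._view

    def get(self, key, default=0):
        return self._view.get(key, default)


store = BatchViewStore()
store.publish({"example.com/a": 10})
old_snapshot = store.current()

# A later batch run publishes a complete replacement view.
store.publish({"example.com/a": 12, "example.com/b": 3})
```

A reader still holding `old_snapshot` keeps a consistent view of the earlier batch run, while new reads go to the replacement, so no read lock is needed in this sketch.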

The only downside to the book is that the architecture and ecosystem are so new that there aren't really a lot of pragmatic solutions. For example, the theory describes a query layer that can merge the results of batch and real-time processing for client applications. However, in real life there are no pragmatic solutions for doing this, so you'd have to write your own.

It'll be interesting to see how the lambda architecture matures and to see future editions of this book. Hopefully, future editions will be as well written and have a better ecosystem for practice chapters.
1 of 1 people found the following review helpful
This book gives a good overview of big data in the first few chapters March 28 2016
By Anon
Format: Paperback
This book gives a good overview of big data in the first few chapters. It also stresses that it is the theory that matters and not the specific tool used to illustrate an example. Therefore, I find it a good first book on big data. The downside of its high-level description is that, as a beginner, you don't learn much except for the overall picture. You will need to supplement this book with another to truly learn big data: specifically, a book with more detailed examples of the tool you will use in your day-to-day programming.
If you are looking for a survey of different approaches ... April 10 2016
By Amazon Customer
Format: Paperback Verified Purchase
If you are looking for a survey of different approaches to handling big data, you want to read "ELEMENTS OF SCALE: COMPOSING AND SCALING DATA PLATFORMS". ([...]) This book is dedicated to the Lambda Architecture (one of the approaches surveyed in that article).

The book is very well organized. The introduction in chapter 1 serves as the road map for the whole book. Motivating the discussion with a simple web application based on an RDBMS, the author shows how the usual approaches to scaling it become undesirable. After enumerating a list of desired properties, he proposes the Lambda Architecture, an approach that contrasts with a fully incremental architecture (with an RDBMS).

The Lambda architecture is partitioned into three layers:
1. A batch layer that computes different views on big data.
2. A serving layer that answers user queries using views from the batch layer and speed layer.
3. A speed layer that supplies an approximate answer for the period during which the batch layer is still working on the complete answer.
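
The three layers listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from the book: the page-view events, the `batch_cutoff` timestamp, and the function names `batch_view`, `speed_view`, and `query` are all hypothetical.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over the records
    that the latest batch run has absorbed."""
    return Counter(url for url, _ts in master_dataset)


def speed_view(recent_events):
    """Speed layer: cover only the events the batch run has not yet seen."""
    return Counter(url for url, _ts in recent_events)


def query(url, batch, speed):
    """Serving side: answer a query by merging the two views."""
    return batch[url] + speed[url]


# Hypothetical page-view events as (url, timestamp) pairs.
events = [("/home", 1), ("/home", 2), ("/about", 3), ("/home", 4)]

# Assumption: the last batch run absorbed everything with timestamp < 3.
batch_cutoff = 3
batch = batch_view(e for e in events if e[1] < batch_cutoff)
speed = speed_view(e for e in events if e[1] >= batch_cutoff)

print(query("/home", batch, speed))   # 3
print(query("/about", batch, speed))  # 1
```

Once the next batch run absorbs the later events, the speed view for that period can be discarded, which is how the speed layer only ever needs to cover a short window.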

In the remaining chapters, the author dives deep into the rationale and requirements of all the different pieces of the Lambda Architecture.

To understand the context of the Lambda Architecture, also refer to Wikipedia for criticism.