Data Munging with Perl [Paperback]

David Cross
4.2 out of 5 stars (9 customer reviews)

Book Description

Jan. 1 2001
Techniques for using Perl to recognize, parse, transform, and filter data.

Product Description


" . . . well written, informative, thought provoking . . . will be as relevant five years from now as it is today. . . . buy [one]." -- Dr. Dobb’s Journal

"A very good resource for programmers who want to learn more about data parsing, data filters, and data conversion..." -- ACM Computing Reviews

"I found the sample problems and the author's solutions to be very well done. I especially liked the design tips..." -- Pikes Peak Perl Mongers

"Well worth the price, and a good starting point for more advanced forays." -- Use.Perl.com

"The chapters are concise, the coverage is comprehensive, and the examples are plentiful and relevant." -- Web Techniques Magazine

About the Author

David Cross is the owner and managing director of Magnum Solutions Ltd., an Internet and database consulting firm.

Customer Reviews

4.2 out of 5 stars
Most helpful customer reviews
4.0 out of 5 stars Shuffling the Cards Feb. 23 2013
By John M. Ford TOP 100 REVIEWER
David Cross shows us how to use Perl for "munging" data--"...storing information in databases, extracting it from files, reorganizing rows and columns, converting to and from bizarre formats, summarizing documents, tracking data in real time, creating statistics, doing back-up and recovery, merging and splitting data streams, logging and checkpointing computations." His book is full of techniques for transforming data from dumps into databases.

The book is written for programmers or analysts who transform data as a regular part of their jobs. It assumes a beginning knowledge of Perl programming, such as one might gain from reading Learning Perl. Part I introduces data munging as a recurring necessary evil and points out the aspects of Perl that make it well suited to this task. Part II surveys different types of unstructured and semi-structured data formats and suggests Perl-based strategies for working with them. Part III examines the limitations of simple data formats and discusses parsing strategies and specific techniques for working with HTML, XML and other hierarchical data structures. Part IV draws some useful lessons from the previous chapters and suggests sources for further study. The organization is logical and easy to follow.

Cross has written a well-designed book with helpful examples and insights. The accompanying book web site and author web site provide downloadable code and other resources. This book is of course most useful to those working in Perl. But many general concepts and strategies have transferred well to data munging tasks I have done in TextPipe.

One of Perl's mottos is: "There's more than one way to do it." A variety of ways are illustrated and explained in this book. Note that it is over ten years old and does not include the latest evolutions of the Perl language.
5.0 out of 5 stars Belongs on every sysadmin's desk July 2 2002
This book isn't about arcane corners of Perl theory. It's about how to write Perl programs that perform the "simple" task of converting data from one format to another.
Need to get every headline from an RSS feed? Or report the three users with the most processes running, as listed by `ps`? Or extract the first paragraph from each of a thousand HTML files? Or make a .tsv file from all the "From:" and "Subject:" lines in your mailbox file? If those sorts of tasks sound familiar, then this is the book you've been looking for. It has working code for tasks like these, covering many common data formats.
By tech book standards, this book is short (300 pages), but it's clear and direct and to the point -- no bloat here. Every page tells you something you need to know, with useful examples for every idea that it explains.
5.0 out of 5 stars Valuable for its _clarity_ July 24 2001
After reading this book I rewrote a pretty massive PostScript parsing and munging system that I was having a lot of trouble with, and I felt like I did it the _right_ way. If you follow the author through his examples and actually read the book (which I was able to read almost straight through), I think you will come away with more of a long-view approach. And I think that makes this book valuable. And admit it, every time you read through a regex chapter you get a little more in the old noggin...
4.0 out of 5 stars Good for data-processing *beginners* July 6 2001
It's a guide. David takes you through the different "data munging" tasks (record-oriented data? binary data? fixed-width data? XML?) and shows you how to deal with them properly (or, at least, how to think about them). It's not an encyclopedia of "data munging": the book is 300 pages, and many of them (too many, maybe) are detailed descriptions of useful CPAN modules (which I didn't read as carefully as the rest of the book, since the POD was always enough). So it covers only the usual data-processing tasks and leaves you to go deeper into more advanced topics on your own. After you finish it, far fewer "data sources" will scare you: the solutions and references are inside.
As I said, it may be good for data-processing beginners, but Perl experts are unlikely to find much new information in it.
P.S. I trust him, and so I follow his advice in every script I start to plan (especially his advice about the "UNIX filter model").
1.0 out of 5 stars 7 years ago this would have been good. June 11 2001
By A Customer
I was hoping this book would provide some valuable routines for processing data, but instead it has proved virtually useless in my day-to-day job as a UNIX data center administrator. I work with XML a great deal (as well as relational databases), but the author's coverage of XML is weak (2 pages on the DOM), and there is no coverage of dealing with record sets. This is a good text if you have reams of old-fashioned columnar data that you need to process as text. Hmm, I did that 7 years ago.