
Data Crunching: Solve Everyday Problems Using Java, Python, and More

by Tony Cappellini last modified 2006-11-05 02:21

review by Eric Walstad, June 2005

 
Data Crunching is a short book with great how-to-style code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics is covered in great depth, but each contains enough to whet the reader's appetite for more. The text and examples are thought-provoking, leading the reader to ask the right kinds of questions when detailed information is needed.
 
The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing, and dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. When a sample problem was not exactly the same as a real-life data crunching problem I've had to solve in the past, it was close enough that I could easily apply the principles (and sample code) to my own problem. I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns and works quickly and clearly up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table showing, for a set of patterns and test strings, which patterns match which strings.
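To give a flavor of the pattern-versus-test-string table idea, here is a minimal sketch of my own. It is not the code from the book; the function names `match_table` and `format_table` and the sample patterns are made up for illustration, and it only uses Python's standard `re` module.

```python
import re


def match_table(patterns, strings):
    """Return one (pattern, [bool, ...]) row per pattern.

    Each bool says whether the pattern matches somewhere in the
    corresponding test string (re.search semantics).
    """
    return [
        (pattern, [re.search(pattern, s) is not None for s in strings])
        for pattern in patterns
    ]


def format_table(patterns, strings):
    """Render the match table as plain text, one row per pattern."""
    width = max(len(p) for p in patterns)
    lines = [" " * width + " | " + " | ".join(strings)]
    for pattern, hits in match_table(patterns, strings):
        # Mark a match with "x", centered under its test string.
        cells = [("x" if hit else "").center(len(s))
                 for hit, s in zip(hits, strings)]
        lines.append(pattern.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data, chosen only to show the layout.
    print(format_table([r"^a", r"\d+", r"b$"], ["abc", "a1b", "123"]))
```

A table like this makes it easy to see at a glance what a small change to a pattern does across a whole set of test strings.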

I liked the chapter on XML but noticed that it has no code example showing how to perform an XSLT transformation: there is a good example of an XSLT template, but no code showing how to process it. The chapter on relational databases covers the most common SQL needed for daily use (think of the 10% of SQL that solves 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing. The author of course covers unit testing, but also simple ways of testing when full-blown unit testing is overkill. The last chapter also has sections on encoding, dealing with floating point numbers, and dates and times and how to format them with strftime. I was impressed by the author's ability to cull such important techniques and idioms and organize them into a small, yet incredibly useful text.
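As a rough illustration of the kind of everyday SQL meant here, the sketch below combines a sub-select with negation, and an aggregation wrapped in a view. It is my own example, not taken from the book: the `person` and `purchase` tables, the `totals` view, and the function names are all hypothetical, and it uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_db():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE purchase (person_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cy")])
    conn.executemany("INSERT INTO purchase VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 7.5)])
    # View: aggregation (SUM ... GROUP BY) given a reusable name.
    conn.execute(
        "CREATE VIEW totals AS "
        "SELECT person_id, SUM(amount) AS total "
        "FROM purchase GROUP BY person_id"
    )
    return conn


def people_without_purchases(conn):
    """Negation via a sub-select: people who never bought anything."""
    rows = conn.execute(
        "SELECT name FROM person "
        "WHERE id NOT IN (SELECT person_id FROM purchase) "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


def totals_by_person(conn):
    """Join the base table against the aggregating view."""
    rows = conn.execute(
        "SELECT person.name, totals.total "
        "FROM person JOIN totals ON person.id = totals.person_id "
        "ORDER BY person.name"
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_db()
    print(people_without_purchases(db))
    print(totals_by_person(db))
```

One caveat worth knowing: `NOT IN` against a sub-select behaves unexpectedly if the sub-select can return NULL, so real data may call for `NOT EXISTS` instead.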
 
Data Crunching covers real-life data parsing and manipulation concepts, and it does so without tangential journeys into other areas of programming. Each of the five main topics includes simple code examples, usually in Python, Java or both, that clearly demonstrate the topic. The author does an impressive job of squeezing in almost all the issues that come up in the daily work of data crunching. Readers, especially newbies and occasional script writers, can expect to come away with something of value on each topic covered.
 
