The Data Wrangler's Handbook: Simple Tools for Powerful Results

Author:   Kyle Banerjee
Publisher:   American Library Association
ISBN:   9780838919095
Pages:   176
Publication Date:   30 August 2019
Format:   Paperback
Availability:   In stock
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price:   $179.49

Overview

Data manipulation and analysis are far easier than you might imagine. Using tools that come standard with your desktop computer, you can learn how to extract, manipulate, and analyze data (and metadata) of any size and complexity. In this handbook, data wizard Banerjee will familiarize you with easily digestible but powerful concepts that will enable you to feel confident working with data. With his expert guidance, you’ll learn how to use a single-word command to sort files of any size by any criteria, identify duplicates, and perform numerous other common library tasks; understand data formats, delimited text and CSV files, XML, JSON, scripting, and other key components of data; undertake more sophisticated tasks such as comparing files, converting data from one format to another, reformatting values, combining data from multiple files, and communicating with APIs (Application Programming Interfaces); and save time and stress with simple techniques for transforming text, a guide to symbols that perform important tasks, a Regular Expression cheat sheet, a glossary, and other tools. Library technologists and those involved in maintaining and analyzing data and metadata will find Banerjee’s resource essential.

Full Product Details

Author:   Kyle Banerjee
Publisher:   American Library Association
Imprint:   ALA Editions
Weight:   0.257kg
ISBN:   9780838919095
ISBN 10:   083891909X
Pages:   176
Publication Date:   30 August 2019
Audience:   Professional and scholarly, Professional & Vocational
Format:   Paperback
Publisher's Status:   Active

Table of Contents

List of Figures and Tables
Acknowledgments
Introduction

Chapter 1 Getting Started with the Command Line
    Finding the Command Line
    Mac
    Windows
    Meet the Command Line

Chapter 2 Command Line Concepts
    Two Powerful Symbols
    Direct Output to a File (Greater Than Symbol)
    Direct Output to Another Program (Pipe Symbol)
    Command Substitution
    Regular Expressions: The Swiss Army Knife for Data
    Literal Characters
    Special Characters
    Wildcard Characters
    Logical Operators
    Grouping
    Scripting

Chapter 3 Understanding Formats, by David Forero

Chapter 4 Simplify Complicated Problems
    Isolating Specific Data Elements
    Converting Data into Formats That Are Easier to Work With

Chapter 5 Delimited Text
    CSV (Comma Separated Values)
    Commas and Quotation Marks in CSV Files
    Multiline Fields in CSV Files
    Multivalued Fields in Delimited Files

Chapter 6 XML
    So What Is XML, Really?
    What Makes XML So Useful?
    Why Is XML So Easy?
    DOM (Document Object Model)
    XPath
    XSLT (eXtensible Stylesheet Language Transformations)
    Working with Large XML Files
    Working with Complex XML Files
    XmlStarlet
    Installing XmlStarlet
    Converting XML Documents

Chapter 7 JSON (JavaScript Object Notation)

Chapter 8 Scripting
    Variables
    Arguments
    Conditional Execution
    Loops

Chapter 9 Solving Common Problems
    Viewing Large Files
    Locating Files That Contain Particular Data
    Finding Files with Specific Characteristics
    Working with Internal Metadata
    Working with APIs
    Combining Data from Different Sources
    Other Tasks

Chapter 10 Conclusions

One-Line Wonders
    Locating, Viewing, and Performing Basic File Operations
        Combine Information from Multiple Files into a Single File
        Combine Three Files, Each Consisting of a Single Column into a Three-Column Table
        Extract 1,000 Random Lines or Records from a File
        Find Files with Specific Characteristics
        Find All Lines in All Files in the Current Directory as Well as All Subdirectories Containing a Regular Expression
        Identify All Files in Current Directories and Subdirectories That Contain a Value
        List All Files in Current Directory and Subdirectories over 100 MB in Order of Decreasing Size
        List the Names, Pixel Dimensions, and File Sizes of All Files in the Current Directory and Subdirectories in Tab Delimited Format
        Print Line Number of File That Match Occurred On
        Split Large Files into Smaller Chunks with Each File Breaking on a Line
        View 200 Characters Starting at Position 38562 in a File
        View Lines 4369–4374 of a File
    Retrieving and Sending Information over a Network
        Retrieve a Document from the Web and Send It to a File
        Send an XML Document to an API Requiring HTTP Authentication
    Sorting, Counting, Deduplication, and File Comparison
        Combine Two Files on a Common Field
        Compare Two Sorted Files
        Count Occurrences for Each Entry in a File, Listed in Order of Decreasing Frequency
        Count Records Containing an Expression
        Count Words, Lines, and Characters in Files
        Identify All Unique Entries and Supply a Count of How Many Times Each Occurs
        Sort a File and Remove Duplicates, Show Only Duplicated Entries, or Show Only Unique Entries
    Useful Scripting Operations
        Capture Parameters Passed to a Script
        Divide a Line into Parameters
        Iterate through Every Item in Parameter List
        Perform a Loop
        Perform an Operation Conditionally
        Run a Script on Every Line of a File
        Send the Output of a Command as Arguments to Another Command
        Send the Output of a Command to Another Command
        Send the Output of a Command to a File
        Store the Output of a Command in a Variable
        Use Foreign Character Sets in a Terminal Window
    Transforming Text
        Convert File of Dates to YYYY-MM-DD Format
        Convert to Title Case
        Convert to Upper Case
        Convert List of Names from Direct Order to Indirect Order
        Extract and Manipulate All Lines in a File That Match a Complex Pattern
        Extract and Manipulate All Entries in All Files in an Entire Directory Hierarchy That Match a Pattern
        Remove Lines from a File That Match a Pattern
        Remove Carriage Return Characters Inserted by Windows Programs from a File
        Remove Newline Characters from a File
        Replace Newlines in a File with Character 7 (Bell)
        Replace Search_Expr with Replace_Expr Only on Lines That Contain Condition_Expr
        Replace Search_Expr with Replace_Expr Except on Lines That Contain Condition_Expr
        Replace Smart Quotes with Straight Quotes
    Working with Delimited Files
        Convert Comma Delimited File Where Some Values Are Quoted and Some Values Are Not to Tab Delimited
        Convert Multiline Records to Table
        Extract Individual Fields from Files
        Find the Most Common Values in the Second Field of a File
        Find All Lines in Tab Delimited File Not Containing Six Fields
        Fix Delimited File That Contains Line Breaks in Fields
        Remove Trailing and Leading Whitespace from Tab Delimited Data Fields
        Reorder Fields in a Tab Delimited File
    Working with JSON and XML
        Add an Attribute to an XML Document
        Add an Element to an XML Document
        Apply XSLT Stylesheet to XML Document
        Convert JSON to Tab Delimited Format
        Delete Elements, Attributes, or Values Based on XPath Expressions
        Display Structure of XML File
        Pretty Print JSON Document
        Pretty Print XML Document

Glossary
Symbols That Perform Important Tasks
Useful Commands
Regular Expression Cheat Sheet
Index

Reviews

"I highly recommend The Data Wrangler’s Handbook for anyone who now manipulates data or may need to do so in the future. In Banerjee’s words, 'If these tasks [that require data wrangling] sound intimidating, this book is for you. You will understand everything in this book even if you have no special technical knowledge or programming experience.'" - Technicalities


Author Information

Kyle Banerjee has wrangled data for diverse purposes in academic, government, and nonprofit environments since 1996. He is a firm believer that understanding people is the key to building services of the future from the systems and data of the past, and his professional interests revolve around understanding workflows and identifying opportunities in data previously thought inconsistent or incomplete. He has published several books and numerous articles on a variety of topics related to applying technology in library settings.

Countries Available

All regions