Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

by Wes McKinney


Price: $53.99 (original price $59.99; save 10%)


Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

  • Use the IPython shell and Jupyter notebook for exploratory computing
  • Learn basic and advanced features in NumPy (Numerical Python)
  • Get started with data analysis tools in the pandas library
  • Use flexible tools to load, clean, transform, merge, and reshape data
  • Create informative visualizations with matplotlib
  • Apply the pandas groupby facility to slice, dice, and summarize datasets
  • Analyze and manipulate regular and irregular time series data
  • Learn how to solve real-world data analysis problems with thorough, detailed examples

Product Details

ISBN-13: 9781491957660
Publisher: O'Reilly Media, Incorporated
Publication date: 10/21/2017
Pages: 550
Sales rank: 63,202
Product dimensions: 6.90 in. (w) x 9.20 in. (h) x 1.20 in. (d)

About the Author

Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.

Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.

Table of Contents

Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
Chapter 1: Preliminaries;
1.1 What Is This Book About?;
1.2 Why Python for Data Analysis?;
1.3 Essential Python Libraries;
1.4 Installation and Setup;
1.5 Community and Conferences;
1.6 Navigating This Book;
1.7 Acknowledgements;
Chapter 2: Introductory Examples;
2.1 data from;
2.2 MovieLens 1M Data Set;
2.3 US Baby Names 1880-2010;
2.4 Conclusions and The Path Ahead;
Chapter 3: IPython: An Interactive Computing and Development Environment;
3.1 IPython Basics;
3.2 Using the Command History;
3.3 Interacting with the Operating System;
3.4 Software Development Tools;
3.5 IPython HTML Notebook;
3.6 Tips for Productive Code Development Using IPython;
3.7 Advanced IPython Features;
3.8 Credits;
Chapter 4: NumPy Basics: Arrays and Vectorized Computation;
4.1 The NumPy ndarray: A Multidimensional Array Object;
4.2 Universal Functions: Fast Element-wise Array Functions;
4.3 Data Processing Using Arrays;
4.4 File Input and Output with Arrays;
4.5 Linear Algebra;
4.6 Random Number Generation;
4.7 Example: Random Walks;
Chapter 5: Getting Started with pandas;
5.1 Introduction to pandas Data Structures;
5.2 Essential Functionality;
5.3 Summarizing and Computing Descriptive Statistics;
5.4 Handling Missing Data;
5.5 Hierarchical Indexing;
5.6 Other pandas Topics;
Chapter 6: Data Loading, Storage, and File Formats;
6.1 Reading and Writing Data in Text Format;
6.2 Binary Data Formats;
6.3 Interacting with HTML and Web APIs;
6.4 Interacting with Databases;
Chapter 7: Data Wrangling: Clean, Transform, Merge, Reshape;
7.1 Combining and Merging Data Sets;
7.2 Reshaping and Pivoting;
7.3 Data Transformation;
7.4 String Manipulation;
7.5 Example: USDA Food Database;
Chapter 8: Plotting and Visualization;
8.1 A Brief matplotlib API Primer;
8.2 Plotting Functions in pandas;
8.3 Plotting Maps: Visualizing Haiti Earthquake Crisis Data;
8.4 Python Visualization Tool Ecosystem;
Chapter 9: Data Aggregation and Group Operations;
9.1 GroupBy Mechanics;
9.2 Data Aggregation;
9.3 Group-wise Operations and Transformations;
9.4 Pivot Tables and Cross-Tabulation;
9.5 Example: 2012 Federal Election Commission Database;
Chapter 10: Time Series;
10.1 Date and Time Data Types and Tools;
10.2 Time Series Basics;
10.3 Date Ranges, Frequencies, and Shifting;
10.4 Time Zone Handling;
10.5 Periods and Period Arithmetic;
10.6 Resampling and Frequency Conversion;
10.7 Time Series Plotting;
10.8 Moving Window Functions;
10.9 Performance and Memory Usage Notes;
Chapter 11: Financial and Economic Data Applications;
11.1 Data Munging Topics;
11.2 Group Transforms and Analysis;
11.3 More Example Applications;
Chapter 12: Advanced NumPy;
12.1 ndarray Object Internals;
12.2 Advanced Array Manipulation;
12.3 Broadcasting;
12.4 Advanced ufunc Usage;
12.5 Structured and Record Arrays;
12.6 More About Sorting;
12.7 NumPy Matrix Class;
12.8 Advanced Array Input and Output;
12.9 Performance Tips;
Python Language Essentials;
The Python Interpreter;
The Basics;
Data Structures and Sequences;
Files and the operating system;
