# pandas: Essential Functionality

Ha Khanh Nguyen (hknguyen)

## 1. Reindexing

• reindex: creates a new object with the data conformed to a new index.
• Calling reindex on a Series rearranges the data according to the new index, introducing missing values (NaN) for any index labels that were not already present.
• For ordered data like time series, it may be desirable to interpolate or fill values when reindexing.
• The method option allows us to do this, using a method such as ffill, which forward-fills the values.
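The points above can be sketched as follows. The Series names and values (`obj`, `obj3`) are illustrative assumptions, not data from the original notes.

```python
import pandas as pd

# Illustrative Series; the values and labels are made up for this sketch.
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Conform the data to a new index. "e" was not present, so it becomes NaN.
obj2 = obj.reindex(["a", "b", "c", "d", "e"])
print(obj2)

# Ordered data: forward-fill carries the last valid value into the gaps.
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
filled = obj3.reindex(range(6), method="ffill")
print(filled)
```

Note that `reindex` leaves `obj` and `obj3` unchanged; each call returns a new object.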

## 2. Dropping Entries from an Axis

• The drop() method returns a new object with the indicated value or values deleted from an axis.
• With a DataFrame, index values can be deleted from either axis.
• Calling drop() with a sequence of labels drops values from the row labels (axis 0).
• You can drop values from the columns by passing axis=1 or axis='columns'.
• As you may have noticed, drop() returns a new Series/DataFrame and does not modify the one it is called on.
• To modify the object in place instead of returning a new one, pass inplace=True.
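A minimal sketch of these `drop()` variants, using made-up data (`obj`, `data`) chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative Series and DataFrame (assumed data, not from the notes).
obj = pd.Series(np.arange(5.0), index=["a", "b", "c", "d", "e"])
data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Dropping from a Series returns a new Series; obj itself is unchanged.
new_obj = obj.drop("c")

# A sequence of labels drops rows (axis 0) by default.
rows_dropped = data.drop(["Colorado", "Ohio"])

# axis=1 and axis="columns" are equivalent ways to drop columns.
cols_dropped = data.drop("two", axis=1)
cols_dropped2 = data.drop(["two", "four"], axis="columns")

# inplace=True modifies obj directly and returns None.
result = obj.drop("c", inplace=True)
print(obj)
```

Because `inplace=True` discards the dropped data from the original object, it is worth using with care.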

## 3. Function Application and Mapping

• NumPy ufuncs (element-wise array methods) also work with pandas objects.
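For instance, `np.abs` can be applied directly to a DataFrame. The `frame` below is an assumed example with hand-picked values:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with some negative entries.
frame = pd.DataFrame(
    [[-1.5, 2.0, -0.5], [0.25, -3.0, 1.0]],
    index=["Utah", "Ohio"],
    columns=list("bde"),
)

# The ufunc is applied element-wise; labels are preserved in the result.
abs_frame = np.abs(frame)
print(abs_frame)
```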

### 3.1 apply()

• Another frequent operation is applying a function on one-dimensional arrays to each column or row.
• The DataFrame's apply() method does exactly this.
• For example, given a function f that computes the difference between the maximum and minimum of a Series, frame.apply(f) invokes f once on each column of the DataFrame frame.
• The result is a Series having the column names of frame as its index.
• If you pass axis='columns' or axis=1 to apply(), the function will be invoked once per row instead.
• Many of the common array statistics (like sum and mean) are DataFrame methods, so using apply() is not necessary.
• The function passed to apply() does not need to return a scalar value; it can also return a Series!
• In that case (when applied column-wise), the output is a DataFrame whose index is the index of the Series returned by the function.
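The behaviour described above can be sketched as follows. The data in `frame` and the helper names `f` and `min_max` are assumptions made for this example:

```python
import pandas as pd

# Illustrative DataFrame (assumed values).
frame = pd.DataFrame(
    [[1.0, 5.0, 3.0], [4.0, 2.0, 9.0], [7.0, 8.0, 0.0]],
    index=["Utah", "Ohio", "Texas"],
    columns=list("bde"),
)

def f(x):
    """Range of a Series: maximum minus minimum."""
    return x.max() - x.min()

# Invoked once per column; the result is indexed by column name.
per_column = frame.apply(f)

# Invoked once per row instead.
per_row = frame.apply(f, axis="columns")

def min_max(x):
    """Return a Series rather than a scalar."""
    return pd.Series([x.min(), x.max()], index=["min", "max"])

# The result is a DataFrame whose index is ["min", "max"].
summary = frame.apply(min_max)
print(per_column, per_row, summary, sep="\n\n")
```

For a simple statistic like this, built-in methods such as `frame.max() - frame.min()` give the same result as `frame.apply(f)` without calling `apply()`.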

### 3.2 applymap()

• We can also apply a user-defined function to each element of a DataFrame with applymap(); the function is called once per element (element-wise execution).
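A small sketch of element-wise application, formatting every value to two decimals. The data and the helper `two_decimals` are assumptions for illustration. Since pandas 2.1, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, so the sketch picks whichever method is available:

```python
import pandas as pd

# Illustrative DataFrame (assumed values).
frame = pd.DataFrame(
    [[1.23456, -0.5], [10.0, 3.14159]],
    index=["Utah", "Ohio"],
    columns=["b", "d"],
)

def two_decimals(x):
    """Format a single number as a string with two decimal places."""
    return f"{x:.2f}"

# pandas >= 2.1 provides DataFrame.map; older versions only have applymap.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap

# The function is called once for every element of the DataFrame.
formatted = elementwise(two_decimals)
print(formatted)
```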

### 3.3 map()

• The map() method allows element-wise execution for a Series.
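As a brief illustration, the same kind of formatting function can be applied to a single Series with `map()`. The Series `s` and the helper `two_decimals` are assumed names and values:

```python
import pandas as pd

# Illustrative Series (assumed values).
s = pd.Series([1.23456, -0.5, 10.0], index=["Utah", "Ohio", "Texas"])

def two_decimals(x):
    """Format a single number as a string with two decimal places."""
    return f"{x:.2f}"

# map() calls the function once per element and keeps the index.
formatted = s.map(two_decimals)
print(formatted)
```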

## 4. Sorting and Ranking

• Sorting a dataset by some criterion is another important built-in operation.

### 4.1 Sort by index

• To sort lexicographically by row or column index, use the sort_index() method, which returns a new, sorted object.
• With a DataFrame, you can sort by index on either axis (row or column).
• To sort in descending order, set ascending=False.
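A short sketch of sorting by index, with assumed example objects `obj` and `frame`:

```python
import numpy as np
import pandas as pd

# Illustrative objects with deliberately unsorted labels.
obj = pd.Series(range(4), index=["d", "a", "b", "c"])
frame = pd.DataFrame(
    np.arange(8).reshape((2, 4)),
    index=["three", "one"],
    columns=["d", "a", "b", "c"],
)

# Series: sort by index labels; a new object is returned.
sorted_obj = obj.sort_index()

# DataFrame: sort by row labels (default) or column labels (axis=1).
by_rows = frame.sort_index()
by_cols = frame.sort_index(axis=1)

# Descending order on the column labels.
by_cols_desc = frame.sort_index(axis=1, ascending=False)
print(by_cols_desc)
```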

### 4.2 Sort by values

• To sort a Series by its values, use the sort_values() method.
• When sorting by values with a DataFrame, you can use the data in one or more columns as the sort key by passing the column name(s) to the by argument.
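The following sketch shows both cases on made-up data (`obj`, `frame`):

```python
import pandas as pd

# Illustrative Series and DataFrame (assumed values).
obj = pd.Series([4, 7, -3, 2])
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Series: sort by value, ascending by default.
sorted_obj = obj.sort_values()

# DataFrame: sort rows using a single column as the key.
by_b = frame.sort_values(by="b")

# Multiple keys: sort by "a" first, then by "b" within equal "a".
by_a_then_b = frame.sort_values(by=["a", "b"])
print(by_a_then_b)
```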

### 4.3 Ranking

• Ranking assigns ranks from one through the number of valid data points in an array.
• By default, rank() breaks ties by assigning each tied group the mean rank.
• We can change the tie-breaking rule through the method argument of rank(), for example 'min', 'max', 'first', or 'dense'.
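A minimal sketch of `rank()` with different tie-breaking rules, using an assumed Series `obj` that contains ties:

```python
import pandas as pd

# Illustrative Series with two tied pairs (7, 7) and (4, 4).
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Default: tied values share the mean of the ranks they span.
avg_rank = obj.rank()

# "first": ties are ranked in the order they appear in the data.
first_rank = obj.rank(method="first")

# "min": every member of a tied group gets the lowest rank in the group.
min_rank = obj.rank(method="min")

# Ranks can also be assigned in descending order.
desc_max_rank = obj.rank(ascending=False, method="max")
print(avg_rank)
```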

These lecture notes reference material from Chapter 5 of Wes McKinney's Python for Data Analysis, 2nd Ed.