# How to use Pandas apply()

Pandas, the popular data manipulation library for Python, has become an essential tool for data scientists, engineers, and analysts around the globe. Its intuitive syntax, combined with its powerful functionalities, makes it the go-to library for anyone looking to perform efficient data analysis or manipulation in Python.

Among the many functions offered by Pandas, the `apply()` function holds a special place. It stands out for its versatility in handling a diverse range of tasks, from simple data transformations to more complex row or column-wise operations. In this article, we'll explore the `apply()` function: its capabilities, its use cases, and illustrative examples that showcase its potential.

## Why Use `apply()` in Pandas?

The `apply()` function in Pandas is a powerful tool that offers a unique blend of flexibility and functionality. It's often the go-to method when you need to perform custom operations that aren't directly available through Pandas' built-in functions.

**Benefits of Using `apply()`:**

- **Flexibility**: `apply()` can handle a wide range of tasks, from simple transformations to more complex row or column-wise operations.
- **Custom Operations**: It allows you to define custom functions (including lambda functions) to transform your data.
- **Integration with Built-in Functions**: `apply()` seamlessly works with Python's built-in functions, expanding its potential uses.
- **Row and Column-wise Operations**: By adjusting the `axis` parameter, you can easily switch between applying functions row-wise or column-wise.

**Syntax:**

The general syntax for the `apply()` function is:

```
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
```

- `func`: The function to apply to each column/row.
- `axis`: Axis along which the function is applied. `0` for columns and `1` for rows.
- `raw`: Determines if the function should receive ndarray objects instead of Series. By default, it's `False`.
- `result_type`: Accepts "expand", "reduce", "broadcast", or `None`. It controls the type of output. By default, it's `None`.
- `args`: A tuple that holds positional arguments passed to `func`.
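As a rough illustration of the parameters above, the sketch below passes an extra positional argument through `args` and uses `result_type="expand"` to spread list results into columns. The DataFrame values and the helper names `scale` and `min_max` are made up for this example.

```python
import pandas as pd

# Small illustrative DataFrame (values are assumptions for this sketch)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def scale(col, factor):
    # Receives one column as a Series, plus the extra argument from `args`
    return col * factor

# axis=0 (default): `scale` is called once per column
scaled = df.apply(scale, args=(10,))

def min_max(row):
    # Receives one row as a Series and returns a list-like result
    return [row.min(), row.max()]

# axis=1 with result_type="expand": each returned list becomes a row of new columns
expanded = df.apply(min_max, axis=1, result_type="expand")

print(scaled)
print(expanded)
```

Without `result_type="expand"`, the row-wise call would return a Series whose elements are lists rather than a two-column DataFrame.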

For a more in-depth understanding and additional parameters, one should refer to the official Pandas documentation.

When you're faced with a data transformation challenge that doesn't have a straightforward solution using Pandas' built-in functions, `apply()` becomes an invaluable tool in your data manipulation toolkit.

## Basics of `apply()`

The `apply()` function in Pandas is primarily used to apply a function along the axis (either rows or columns) of a DataFrame or Series. This function's beauty is in its simplicity and flexibility, allowing you to use built-in functions, custom functions, or even lambda functions directly.

**Applying a Function to Each Column**

By default, when you use `apply()` on a DataFrame, it operates column-wise (i.e., `axis=0`). This means the function you provide will be applied to each column as a Series.
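To make the column-wise default concrete, here is a minimal sketch in which the function reduces each column to a single number. The DataFrame values and the name `col_range` are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# The lambda is called once per column; `col` is a Series holding that column
col_range = df.apply(lambda col: col.max() - col.min())

# The result is a Series with one value per column, indexed by column name
print(col_range)
```

Because the function returns a scalar for each column, the result is a Series rather than a DataFrame.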

**Doubling the numbers in a DataFrame**

Let's say we have the following DataFrame:

| A | B |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |

To double each number, we can use:

```
df_doubled = df.apply(lambda x: x*2)
```

After doubling each number, we get

| A | B |
|---|---|
| 2 | 8 |
| 4 | 10 |
| 6 | 12 |
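For readers who want to run the doubling example end to end, a self-contained version might look like the following. It builds the DataFrame from the table above and also checks the result against the plain arithmetic expression `df * 2`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# apply() hands each column to the lambda as a Series
df_doubled = df.apply(lambda x: x * 2)

# For plain arithmetic, the vectorized expression gives the same result
assert df_doubled.equals(df * 2)

print(df_doubled)
```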

By understanding the basic operations of the `apply()` function, you can easily extend its capabilities to fit more complex scenarios, making your data processing tasks more efficient and readable.

## Applying Functions Row-wise with `apply()`

While column-wise operations are the default for the `apply()` function on DataFrames, one can easily switch to row-wise operations by setting the `axis` parameter to 1. When applying functions row-wise, each row is treated as a Series, allowing for operations that consider multiple columns.

**Calculating Aggregate Metrics Across Columns**

Often, we need to calculate some aggregate metric using values from different columns in a DataFrame.

**Example 1: Calculating the average of numbers in each row**

Given the following DataFrame:

| A | B | C |
|---|---|---|
| 1 | 4 | 7 |
| 2 | 5 | 8 |
| 3 | 6 | 9 |

To compute the average for each row, we can use:

```
row_avg = df.apply(lambda x: (x['A'] + x['B'] + x['C']) / 3, axis=1)
```

The average for each row is:

- Row 0: \( \frac{1 + 4 + 7}{3} = 4.0 \)
- Row 1: \( \frac{2 + 5 + 8}{3} = 5.0 \)
- Row 2: \( \frac{3 + 6 + 9}{3} = 6.0 \)

The resulting Series is:

| Row | Average |
|---|---|
| 0 | 4.0 |
| 1 | 5.0 |
| 2 | 6.0 |
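A runnable version of the row-average example is sketched below, using the DataFrame from the table above. The comparison with `df.mean(axis=1)` is an addition for illustration, showing that a built-in method covers this particular case.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# axis=1: the lambda receives each row as a Series indexed by column name
row_avg = df.apply(lambda x: (x["A"] + x["B"] + x["C"]) / 3, axis=1)

# Built-in equivalent that avoids one Python call per row
row_avg_builtin = df.mean(axis=1)

print(row_avg.tolist())
print(row_avg_builtin.tolist())
```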

**Combining Column Values Based on Condition**

In some scenarios, we might want to generate a new value based on conditions across multiple columns.

**Example 2: Categorizing based on column values**

Using the same DataFrame:

| A | B | C |
|---|---|---|
| 1 | 4 | 7 |
| 2 | 5 | 8 |
| 3 | 6 | 9 |

Let's categorize each row based on the following condition: If the average of the three columns is greater than 5, label it as "High", otherwise "Low".

```
row_category = df.apply(lambda x: "High" if (x['A'] + x['B'] + x['C']) / 3 > 5 else "Low", axis=1)
```


The category based on the average value of each row:

- Row 0: Low (Average = 4.0)
- Row 1: Low (Average = 5.0)
- Row 2: High (Average = 6.0)
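When the condition grows beyond a one-liner, a named function is often easier to read than a lambda. The sketch below is one way to write the same categorization; the function name `categorize` and its `threshold` parameter are hypothetical and not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def categorize(row, threshold=5):
    # Hypothetical helper: label a row by comparing its mean to a threshold
    return "High" if row.mean() > threshold else "Low"

# Same rule as the lambda version: average strictly greater than 5
row_category = df.apply(categorize, axis=1)

# Extra keyword arguments given to apply() are forwarded to the function
lower_bar = df.apply(categorize, axis=1, threshold=4)

print(row_category.tolist())
print(lower_bar.tolist())
```

Note that the row with an average of exactly 5.0 stays "Low" because the comparison is strict.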

By understanding how to apply functions row-wise using `apply()`, you can effectively transform, aggregate, or generate new data based on the values across multiple columns in a DataFrame.

## Using `apply()` with Built-in Functions

The `apply()` function in Pandas is not restricted to lambda functions or custom-defined functions. It seamlessly integrates with Python's built-in functions, allowing you to leverage a vast array of functionalities directly on your DataFrame or Series.

**Applying `len` to Calculate String Lengths**

One of the most common built-in functions to use with `apply()` is `len`, especially when dealing with columns of string data.

**Example 1: Calculating the length of strings in a DataFrame**

Given the following DataFrame:

| Names |
|---|
| Alice |
| Bob |
| Charlie |

To compute the length of each name, we can use:

```
name_length = df_str['Names'].apply(len)
```

The length of each name is:

| Names | Length |
|---|---|
| Alice | 5 |
| Bob | 3 |
| Charlie | 7 |
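A self-contained version of the string-length example could look like this. The second computation, using the `.str` accessor, is shown as an alternative for comparison; unlike `len`, it leaves missing values as `NaN` instead of raising an error.

```python
import pandas as pd

df_str = pd.DataFrame({"Names": ["Alice", "Bob", "Charlie"]})

# Built-in len is called once per element of the Series
df_str["Length"] = df_str["Names"].apply(len)

# Vectorized string accessor that computes the same lengths
length_via_str = df_str["Names"].str.len()

print(df_str)
```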

**Using `max` and `min` to Identify Extremes**

When dealing with numeric data, identifying the highest and lowest values in a row or column can be easily achieved using the built-in `max` and `min` functions.

**Example 2: Identifying the maximum value in each row**

Given the DataFrame:

| A | B | C |
|---|---|---|
| 1 | 4 | 7 |
| 2 | 5 | 3 |
| 3 | 6 | 9 |

To find the maximum value for each row, we can use:

```
row_max = df_new.apply(max, axis=1)
```

The maximum value for each row is:

- Row 0: 7
- Row 1: 5
- Row 2: 9
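The row-maximum example can be checked with a short script. This sketch builds `df_new` from the values in the table above and compares the built-in `max` with the DataFrame's own `max(axis=1)` method.

```python
import pandas as pd

# Values taken from the table above
df_new = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 3, 9]})

# Python's built-in max iterates over the values of each row
row_max = df_new.apply(max, axis=1)

# Pandas' own method computes the same thing without a Python call per row
row_max_builtin = df_new.max(axis=1)

print(row_max.tolist())
print(row_max_builtin.tolist())
```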

By integrating Python's built-in functions with Pandas' `apply()`, you can achieve a range of operations without the need for custom logic, making your data manipulation tasks both efficient and readable.

## Advanced Uses: Combining `apply()` with Other Functions

Pandas' `apply()` function is versatile and can be paired with other functions or methods to achieve more complex operations. This combination unlocks the potential for more sophisticated data manipulations.

**Combining `apply()` with `map()` for Value Mapping**

Values can be mapped through a dictionary or another function, either by looking them up inside `apply()` or with the Series `map()` method.

**Example 1: Mapping values based on a condition**

Given the DataFrame:

| Scores |
|---|
| 85 |
| 70 |
| 92 |
| 55 |

Let's categorize each score into "Pass" if it's above 60 and "Fail" otherwise:

```
score_map = {score: 'Pass' if score > 60 else 'Fail' for score in df_scores['Scores']}
df_scores['Result'] = df_scores['Scores'].apply(lambda x: score_map[x])
```

After categorization:

| Scores | Result |
|---|---|
| 85 | Pass |
| 70 | Pass |
| 92 | Pass |
| 55 | Fail |
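The dictionary lookup above can also be expressed with `Series.map()`, which accepts either a dictionary or a function. The following sketch compares the three spellings on the same scores; the variable names `via_apply`, `via_map`, and `via_func` are illustrative.

```python
import pandas as pd

df_scores = pd.DataFrame({"Scores": [85, 70, 92, 55]})

# Lookup table from each score to its label, as in the example above
score_map = {score: "Pass" if score > 60 else "Fail" for score in df_scores["Scores"]}

# 1) Dictionary lookup inside apply()
via_apply = df_scores["Scores"].apply(lambda x: score_map[x])

# 2) Passing the dictionary straight to map()
via_map = df_scores["Scores"].map(score_map)

# 3) Passing the rule itself to map(), with no intermediate dictionary
via_func = df_scores["Scores"].map(lambda s: "Pass" if s > 60 else "Fail")

print(via_map.tolist())
```

One difference worth knowing: `map()` with a dictionary yields `NaN` for keys that are missing, whereas the lookup inside `apply()` raises a `KeyError`.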

**Combining `apply()` with String Functions for Text Manipulation**

Python's string methods can be called inside `apply()` for text data transformations. Pandas also provides a range of vectorized string functions through the `.str` accessor.

**Example 2: Extracting the domain from email addresses**

Given the DataFrame:

| Emails |
|---|
| [email protected] |
| [email protected] |
| [email protected] |

To extract the domain of each email:

```
df_emails['Domain'] = df_emails['Emails'].apply(lambda x: x.split('@')[1])
```

After extracting the domain:

| Emails | Domain |
|---|---|
| [email protected] | example.com |
| [email protected] | mywebsite.net |
| [email protected] | organization.org |
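Here is a minimal runnable sketch of the domain extraction. The addresses are hypothetical placeholders: only the domains come from the table above, and the local parts (`alice`, `bob`, `carol`) are invented for the example.

```python
import pandas as pd

# Hypothetical addresses; only the domains are taken from the table above
df_emails = pd.DataFrame(
    {"Emails": ["alice@example.com", "bob@mywebsite.net", "carol@organization.org"]}
)

# Split each address on "@" and keep the part after it
df_emails["Domain"] = df_emails["Emails"].apply(lambda x: x.split("@")[1])

# Equivalent using the vectorized .str accessor
domain_via_str = df_emails["Emails"].str.split("@").str[1]

print(df_emails)
```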

Combining `apply()` with other functions and methods offers a robust approach to data manipulation in Pandas. Whether you're working with numeric, textual, or mixed data types, these combinations allow for intricate operations with ease.

## Performance Considerations with `apply()`

While the `apply()` function in Pandas is incredibly versatile and can be used for a wide range of tasks, it might not always be the most efficient choice. This is particularly true for large datasets, where vectorized operations or Pandas' built-in functions can offer significant performance boosts.

**Vectorized Operations vs. `apply()`**

Pandas is built on top of NumPy, which supports vectorized operations. These operations are generally faster than using `apply()` as they process data without the Python for-loop overhead.

**Example 1: Adding two columns**

Given the DataFrame:

| A | B |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |

Instead of using `apply()` to add two columns:

```
df['C'] = df.apply(lambda x: x['A'] + x['B'], axis=1)
```

A more efficient, vectorized approach would be:

```
df['C'] = df['A'] + df['B']
```
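To get a feel for the difference on your own machine, a rough timing sketch like the one below can help. The data is synthetic, the row count is an arbitrary choice, and the exact timings will vary with hardware and Pandas version, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size is an arbitrary choice for this sketch
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(0, 100, 10_000),
    "B": rng.integers(0, 100, 10_000),
})

def with_apply():
    # One Python function call per row
    return df.apply(lambda x: x["A"] + x["B"], axis=1)

def vectorized():
    # A single operation over whole columns
    return df["A"] + df["B"]

# Both approaches produce the same values
assert with_apply().tolist() == vectorized().tolist()

t_apply = timeit.timeit(with_apply, number=3)
t_vec = timeit.timeit(vectorized, number=3)
print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
```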

**Using Built-in Functions vs. `apply()`**

Pandas provides built-in methods optimized for specific tasks. These can be more efficient than using `apply()` with custom functions.

**Example 2: Calculating the mean**

Given the DataFrame:

| Values |
|---|
| 10 |
| 20 |
| 30 |
| 40 |

Instead of:

```
mean_value = df_values['Values'].apply(lambda x: x).sum() / len(df_values)
```

You can simply use:

```
mean_value = df_values['Values'].mean()
```

While `apply()` provides flexibility, it's essential to consider performance implications, especially with large datasets. Leveraging vectorized operations or Pandas' built-in methods can lead to more efficient and faster code execution.

## Conclusions

The `apply()` function in Pandas is undeniably a powerful tool in the arsenal of any data enthusiast. Its ability to handle a vast array of tasks, from straightforward data modifications to intricate row or column-wise computations, makes it a favorite among professionals. By leveraging this function, data manipulation tasks that might seem complex at first glance can often be distilled into concise and readable operations.

However, as with any tool, it's essential to understand when to use it. While `apply()` offers flexibility, it's crucial to be aware of its performance implications, especially with larger datasets. Vectorized operations or other built-in Pandas functions might sometimes be a more efficient choice. Nonetheless, by mastering the nuances of `apply()`, users can ensure that they are making the most out of Pandas and handling their data in the most effective manner possible.