Universal Functions In Pandas
First, let’s create a Pandas Series of random integers:
import numpy as np
import pandas as pd
# creating random state
rand = np.random.RandomState(42)
# Creating Pandas Series of random integers
ser1 = pd.Series(rand.randint(10, size=4))
print(ser1)
0 6
1 3
2 7
3 4
dtype: int64
Second, let’s create a Pandas DataFrame of random integers:
# Creating Pandas DataFrame
df1 = pd.DataFrame(rand.randint(10,size=(3,4)),
columns=['a','b','c','d'])
print(df1)
a b c d
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4
Now, if we apply any NumPy ufunc to either of these objects (Series or DataFrame), the result will be another Pandas object with the indices preserved:
# Taking the exponential of every element in the Series, ser1
np.exp(ser1)
0 403.428793
1 20.085537
2 1096.633158
3 54.598150
dtype: float64
# Doing arithmetic on each element of the DataFrame, df1
print(np.multiply(df1,10))
a b c d
0 60 90 20 60
1 70 40 30 70
2 70 20 50 40
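As a further small sketch of index preservation, the example below applies `np.sqrt` to a Series with string labels. The Series `s` and its labels are made up for illustration and do not come from the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the labels and values are illustrative only
s = pd.Series([1.0, 4.0, 9.0], index=["x", "y", "z"])

# Apply a NumPy ufunc directly to the Series
roots = np.sqrt(s)

print(type(roots).__name__)  # the result is still a pandas Series
print(list(roots.index))     # the labels are carried over unchanged
print(roots["y"])            # label-based access works on the result
```

Because the labels survive the ufunc, the result can be combined with other labelled objects without re-attaching an index by hand.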
When we add two Series with non-identical indices, the result is aligned on the union of both indices:
# First, define two Series whose indices are not identical
A = pd.Series([1,2,3], index=[0,1,2]) #index[0,1,2]
B = pd.Series([10,20,30], index=[1,2,3]) #index[1,2,3]
# Second, perform addition of these two series
print(A); print(B)
print(A.add(B))
0 1
1 2
2 3
dtype: int64
1 10
2 20
3 30
dtype: int64
0 NaN
1 12.0
2 23.0
3 NaN
dtype: float64
As the example above shows, the sum keeps the indices of both Series.
- When Pandas doesn’t find a value at the same index in both Series, it returns NaN
- For example, Series A has index 0, but Series B has no value at index 0
- To handle this NaN, we can pass the keyword argument fill_value to the Series.add() method
A.add(B, fill_value=0)
0 1.0
1 12.0
2 23.0
3 30.0
dtype: float64
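One detail that may be worth checking: `fill_value` substitutes for a missing entry only when the other Series has a value at that label. If the entry is missing on both sides, the result stays NaN. A minimal sketch, using made-up Series `P` and `Q` that are not part of the examples above:

```python
import numpy as np
import pandas as pd

# Hypothetical Series: label 1 is NaN in P and NaN in Q
P = pd.Series([1.0, np.nan], index=[0, 1])
Q = pd.Series([np.nan, 30.0], index=[1, 2])

result = P.add(Q, fill_value=0)
print(result)

# Label 0 exists only in P, so Q's missing value is treated as 0
# Label 1 is NaN on both sides, so the result is still NaN
# Label 2 exists only in Q, so P's missing value is treated as 0
```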
When we add two DataFrames with non-identical indices or columns, the result is aligned on the union of both the indices and the columns:
# First, define two DataFrames whose indices and columns are not identical
C = pd.DataFrame(rand.randint(10, size=(2,2)),
columns=['a','b'])
D = pd.DataFrame(rand.randint(10, size=(3,3)),
columns=['a','b','c'])
print(C); print(D)
a b
0 1 7
1 5 1
a b c
0 4 0 9
1 5 8 0
2 9 2 6
# Secondly, we add these two dataframes and see how results are handled
print(C.add(D))
a b c
0 5.0 7.0 NaN
1 10.0 9.0 NaN
2 NaN NaN NaN
- When Pandas doesn’t find a value at the same index and column in both DataFrames, it returns NaN
- For example, DataFrame D has a value at index 0, column ‘c’, but DataFrame C has no column ‘c’
- We can pass the keyword argument fill_value to the DataFrame.add() method to handle the NaN
print(C.add(D, fill_value=0))
a b c
0 5.0 7.0 9.0
1 10.0 9.0 0.0
2 9.0 2.0 6.0
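The fill value does not have to be 0. As one possible variation, the sketch below fills the gaps with the mean of all values in `C`. The values of `C` and `D` are copied from the printed output above rather than regenerated randomly, and the choice of the mean as a fill value is only an illustration.

```python
import pandas as pd

# Values copied from the printed C and D above
C = pd.DataFrame([[1, 7], [5, 1]], columns=["a", "b"])
D = pd.DataFrame([[4, 0, 9], [5, 8, 0], [9, 2, 6]], columns=["a", "b", "c"])

# Mean over every element of C: (1 + 7 + 5 + 1) / 4
fill = C.to_numpy().mean()

# Entries missing from C are treated as `fill` before adding
result = C.add(D, fill_value=fill)
print(fill)
print(result)
```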
The following table lists Python operators and their equivalent Pandas methods:
| Python operator | Pandas method(s) |
|---|---|
| + | add() |
| - | sub(), subtract() |
| * | mul(), multiply() |
| / | div(), divide(), truediv() |
| // | floordiv() |
| % | mod() |
| ** | pow() |
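To check the correspondence in the table, the sketch below compares each operator with one of its method counterparts on a small made-up DataFrame (`df` here is illustrative, not one of the objects defined earlier).

```python
import pandas as pd

# Hypothetical DataFrame used only for this comparison
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# For each operator, test whether the method form gives an identical result
results = {
    "+": (df + 2).equals(df.add(2)),
    "-": (df - 2).equals(df.sub(2)),
    "*": (df * 2).equals(df.mul(2)),
    "/": (df / 2).equals(df.div(2)),
    "//": (df // 2).equals(df.floordiv(2)),
    "%": (df % 2).equals(df.mod(2)),
    "**": (df ** 2).equals(df.pow(2)),
}

for op, same in results.items():
    print(op, same)
```

The method form is mainly useful when extra arguments such as fill_value or axis are needed, since the operators cannot accept them.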
Remember that these methods accept an axis keyword argument, which controls how a Series operand is matched against a DataFrame:
- axis=0 or axis='index' matches the Series against the row index, so the operation runs down all the rows in each column (column-wise)
- axis=1 or axis='columns' matches the Series against the column labels, so the operation runs across all the columns in each row (row-wise)
Let’s subtract the values of the first row of df1 from all rows in df1. In this case, the default value of the keyword argument axis is 1 (or 'columns'):
print(df1)
print(df1.subtract(df1.iloc[0]))
a b c d
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4
a b c d
0 0 0 0 0
1 1 -5 1 1
2 1 -7 3 -2
However, if we would like to apply this arithmetic operation index-wise, we can use axis=0 or axis='index':
print(df1.subtract(df1['a'], axis=0))
a b c d
0 0 3 -4 0
1 0 -3 -4 0
2 0 -5 -2 -3
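A common use of the two axis settings is centering data. The sketch below subtracts column means with the default axis and row means with axis='index'. The DataFrame `scores` and its labels are hypothetical and chosen so that row labels and column labels are clearly different.

```python
import pandas as pd

# Hypothetical data; row and column labels are illustrative only
scores = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]],
                      index=["r1", "r2"], columns=["a", "b", "c"])

col_means = scores.mean(axis=0)  # one value per column, labelled a, b, c
row_means = scores.mean(axis=1)  # one value per row, labelled r1, r2

# Default axis='columns': col_means is matched against the column labels
centered_cols = scores.sub(col_means)

# axis='index': row_means is matched against the row labels
centered_rows = scores.sub(row_means, axis="index")

print(centered_cols)
print(centered_rows)
```

In both cases the Series labels have to match the labels along the chosen axis; the axis argument tells Pandas which set of labels to align against.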
Operations between a DataFrame and a Series object are similar to operations between a two-dimensional and a one-dimensional NumPy array:
# Series
ser11 = pd.Series(rand.randint(12, size=3))
ser11
0 2
1 9
2 11
dtype: int64
# DataFrame
df11 = pd.DataFrame(rand.randint(10,size=(3,4)),
columns=['a','b','c','d'] )
print(df11)
a b c d
0 7 5 7 8
1 3 0 0 9
2 3 6 1 2
Let’s add the Series to the DataFrame with the keyword argument axis=0 (or axis='index'), which matches on the row index. Both ser11 and df11 have an identical index:
print(df11.add(ser11, axis=0))
a b c d
0 9 7 9 10
1 12 9 9 18
2 14 17 12 13
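To make the NumPy analogy concrete, the sketch below reproduces the same addition with plain arrays and broadcasting. The values are copied from the ser11 and df11 output shown above instead of being regenerated randomly, and the variable names `via_pandas` and `via_numpy` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the printed ser11 and df11 above
ser11 = pd.Series([2, 9, 11])
df11 = pd.DataFrame([[7, 5, 7, 8], [3, 0, 0, 9], [3, 6, 1, 2]],
                    columns=["a", "b", "c", "d"])

# Pandas: match the Series against the row index
via_pandas = df11.add(ser11, axis=0).to_numpy()

# NumPy: reshape the 1-D array into a column so it broadcasts across each row
via_numpy = df11.to_numpy() + ser11.to_numpy()[:, np.newaxis]

print(np.array_equal(via_pandas, via_numpy))
```

The difference is that NumPy broadcasts purely by position, while Pandas aligns by label first, so the two only agree when the labels already line up as they do here.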