Boolean Masking in NumPy
We will learn how to apply comparison operators (`<`, `>`, `<=`, `>=`, `==` and `!=`) to a NumPy array. Each comparison returns a boolean array containing `True` for every element that satisfies the comparison and `False` for every element that doesn't.

```python
import numpy as np

# make a 5x5 array of random integers from 0 to 999
# (the upper bound passed to randint is exclusive)
rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))
print(arr)

# which elements are greater than 500?
print(arr > 500)

# which elements are less than 750?
print(arr < 750)
```

```
[[102 435 860 270 106]
 [ 71 700  20 614 121]
 [466 214 330 458  87]
 [372  99 871 663 130]
 [661 308 769 343 491]]
[[False False  True False False]
 [False  True False  True False]
 [False False False False False]
 [False False  True  True False]
 [ True False  True False False]]
[[ True  True False  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True False  True  True]
 [ True  True False  True  True]]
```
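The same operators also compare two arrays of the same shape element by element, and the result always has dtype `bool` and the shape of the operands. A short sketch; the small arrays `x` and `y` are illustrative additions, not part of the original example:

```python
import numpy as np

# the same 5x5 array used throughout this page
rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# a comparison returns a boolean array with the same shape as arr
mask = arr > 500
print(mask.dtype, mask.shape)  # bool (5, 5)

# two small illustrative arrays, compared element by element
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
print(x == y)  # [False False  True False False]
print(x != y)  # [ True  True False  True  True]
print(x <= y)  # [ True  True  True False False]
```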
Each comparison operator has an equivalent ufunc, as listed in the table below:

Operator | ufunc |
---|---|
`>` | `np.greater` |
`<` | `np.less` |
`>=` | `np.greater_equal` |
`<=` | `np.less_equal` |
`==` | `np.equal` |
`!=` | `np.not_equal` |
```python
# which elements are greater than 500?
print(np.greater(arr, 500))

# which elements are less than 750?
print(np.less(arr, 750))
```

```
[[False False  True False False]
 [False  True False  True False]
 [False False False False False]
 [False False  True  True False]
 [ True False  True False False]]
[[ True  True False  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True False  True  True]
 [ True  True False  True  True]]
```
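To check the table, the sketch below pairs each operator with its ufunc and confirms that both give identical boolean arrays on `arr`. The operators are taken from Python's standard `operator` module (for example, `operator.gt(arr, 500)` is the same call as `arr > 500`); using that module is an addition for illustration, not something the original relies on:

```python
import operator

import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# pair each Python comparison with its NumPy ufunc
pairs = [
    (operator.gt, np.greater),
    (operator.lt, np.less),
    (operator.ge, np.greater_equal),
    (operator.le, np.less_equal),
    (operator.eq, np.equal),
    (operator.ne, np.not_equal),
]

# each pair should produce exactly the same boolean array
for op, ufunc in pairs:
    same = np.array_equal(op(arr, 500), ufunc(arr, 500))
    print(f"{op.__name__:>2} vs np.{ufunc.__name__}: {same}")
```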
In this section, we will study some useful functions for working with the boolean arrays we created by applying comparison operators to a NumPy array.

You may be wondering how to count the total number of `True` elements, that is, the elements that pass the condition. The function `np.count_nonzero()` does exactly that, since `True` counts as nonzero and `False` as zero:
```python
# count the elements whose value is > 500
print(np.count_nonzero(arr > 500))

# count the elements whose value is < 750
print(np.count_nonzero(arr < 750))
```

```
7
22
```
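Because `np.count_nonzero` returns a plain count, dividing it by `arr.size` gives the fraction of elements that pass a condition. A small sketch; the variable names `n_pass` and `fraction` are illustrative:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# number of elements that pass the condition
n_pass = np.count_nonzero(arr > 500)

# fraction of all elements that pass it (arr.size is 25 here)
fraction = n_pass / arr.size
print(n_pass, arr.size, fraction)  # 7 25 0.28
```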
We can also use `np.sum` to count the elements that pass the condition, since `True` is summed as 1 and `False` as 0. A major benefit of this function is that it accepts the keyword argument `axis`, so we can sum along a chosen axis:

```python
# total over the whole array
print(np.sum(arr < 750))

# along axis=0 (one count per column)
print(np.sum(arr < 750, axis=0))

# along axis=1 (one count per row)
print(np.sum(arr < 750, axis=1))
```

```
22
[5 5 2 5 5]
[4 5 5 4 4]
```
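To make the `axis` argument concrete, the sketch below recomputes the per-column and per-row counts with plain Python loops and compares them with the `np.sum` results. The loop version exists only for checking; the vectorized call is the one to use in practice:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))
mask = arr < 750

# vectorized counts
per_column = np.sum(mask, axis=0)  # collapses the rows
per_row = np.sum(mask, axis=1)     # collapses the columns

# the same counts, computed with explicit Python loops
n_rows, n_cols = arr.shape
loop_column = [sum(int(arr[i, j] < 750) for i in range(n_rows)) for j in range(n_cols)]
loop_row = [sum(int(arr[i, j] < 750) for j in range(n_cols)) for i in range(n_rows)]

print(per_column.tolist() == loop_column)  # True
print(per_row.tolist() == loop_row)        # True
```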
- `np.any` returns `True` if at least one element in the array satisfies the condition; otherwise it returns `False`.
- `np.all` returns `True` if every element in the array satisfies the condition; otherwise it returns `False`.
- Both accept the optional keyword argument `axis` to apply the test along a chosen axis.
```python
# np.any
print(np.any(arr > 500))

# np.all
print(np.all(arr > 10))

# np.all along axis=1
print(np.all(arr > 100, axis=1))
```

```
True
True
[ True False False False  True]
```
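`np.any` and `np.all` can be read as shortcuts for counting: a boolean array passes `np.any` when it contains at least one `True`, and passes `np.all` when every element is `True`. A sketch that checks this relationship and also applies `np.all` along `axis=0`, i.e. once per column:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))
mask = arr > 100

# np.any: at least one True
print(np.any(mask) == (np.count_nonzero(mask) > 0))           # True
# np.all: every element True
print(np.all(mask) == (np.count_nonzero(mask) == mask.size))  # True

# np.all along axis=0 asks the question once per column
print(np.all(mask, axis=0))  # [False False False  True False]
```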
Until now, we have applied only a single comparison operator to an array. However, we can combine several comparisons with Python's bitwise logic operators (`&`, `|`, `^` and `~`). For example, suppose that for our array `arr` we want to count the elements that are greater than 500 but less than 750:

```python
# using the boolean operator '&' (and)
np.count_nonzero((arr > 500) & (arr < 750))
```

```
4
```
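The parentheses around each comparison are required. In Python, `&`, `|` and `^` bind more tightly than comparison operators, so `arr > 500 & arr < 750` is parsed as the chained comparison `arr > (500 & arr) < 750`. Evaluating a chained comparison needs the truth value of a whole array, which fails. A small sketch of both forms:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# with parentheses: each comparison runs first, then '&' combines them element-wise
count = np.count_nonzero((arr > 500) & (arr < 750))
print(count)  # 4

# without parentheses: parsed as arr > (500 & arr) < 750, which raises ValueError
try:
    arr > 500 & arr < 750
    error = None
except ValueError as e:
    error = e
print(error)
```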
```python
# using the boolean operator '|' (or)
np.count_nonzero((arr < 500) | (arr >= 500))
```

```
25
```
```python
# '~' before a condition negates it:
# select the elements that are NOT greater than 100
# AND are greater than 50
arr[~(arr > 100) & (arr > 50)]
```

```
array([71, 87, 99])
```
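Of the four operators listed earlier, `^` (exclusive or) is the one not used in the examples. It keeps the elements for which exactly one of the two conditions holds. A sketch on the same array:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# True where exactly one of the two conditions is True
only_one = (arr > 500) ^ (arr < 750)
print(np.count_nonzero(only_one))  # 21

# the elements passing both conditions are the ones '^' drops
both = (arr > 500) & (arr < 750)
print(np.count_nonzero(both))      # 4
```

Every element of `arr` satisfies at least one of the two conditions, so the 25 elements split into 4 that pass both and 21 that pass exactly one.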
There is a ufunc equivalent for each of these boolean operators:

Boolean Operator | ufunc |
---|---|
`&` | `np.bitwise_and` |
`\|` | `np.bitwise_or` |
`^` | `np.bitwise_xor` |
`~` | `np.bitwise_not` |
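As with the comparison ufuncs, these equivalences can be checked directly. The sketch below builds two boolean masks `a` and `b` (illustrative names) and compares each operator with its ufunc:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))
a = arr > 500
b = arr < 750

# each operator and its ufunc should produce identical boolean arrays
checks = {
    "&": np.array_equal(a & b, np.bitwise_and(a, b)),
    "|": np.array_equal(a | b, np.bitwise_or(a, b)),
    "^": np.array_equal(a ^ b, np.bitwise_xor(a, b)),
    "~": np.array_equal(~a, np.bitwise_not(a)),
}
print(checks)
```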
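What is the difference between the keywords `and`/`or` and the boolean operators `&`/`|`? The keywords `and` and `or` evaluate the truth (`True` or `False`) of each entire object, while `&` and `|` operate on the bits within each object. For NumPy arrays, `&` and `|` therefore work element by element, whereas `and` needs a single truth value for the whole array, which an array with more than one element doesn't have: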
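```python
# using '&': combines the two boolean arrays element by element
print(np.count_nonzero((arr > 500) & (arr < 750)))

# using 'and': asks for the truth value of each whole array, which fails
try:
    (arr > 500) and (arr < 750)
except ValueError as e:
    print(e)
```

```
4
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```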
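The "whole object versus bits" distinction is easy to see on plain Python integers: `and`/`or` return one of the operands based on its truthiness, while `&`/`|` combine the binary digits. With arrays, the error message points at one way out: reduce each side to a single truth value with `.any()` or `.all()` before using `and`. A sketch:

```python
import numpy as np

# plain integers: 6 is 0b110, 3 is 0b011
print(6 and 3)  # 3 (6 is truthy, so 'and' returns the second operand)
print(6 & 3)    # 2 (0b010: bits set in both)
print(6 or 3)   # 6 (6 is truthy, so 'or' returns it)
print(6 | 3)    # 7 (0b111: bits set in either)

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# 'and' works once each side has been reduced to a single boolean
both_exist = (arr > 500).any() and (arr < 750).any()
print(both_exist)  # True
```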
In the sections above, we applied one or more comparison operators, each of which returns a boolean array with `True` for the elements that pass the condition(s) and `False` for those that don't. In this section, we use such a boolean array to return the actual values from the array. This process is called boolean masking.

Our first example passed the condition `arr > 500` to get a boolean array marking the elements that pass (`True`) and don't pass (`False`) the condition. Now, let's place a condition like this inside square brackets `[]` to return the actual values from `arr` that satisfy it:
```python
# return an array of the elements whose value is < 500
arr[arr < 500]
```

```
array([102, 435, 270, 106,  71,  20, 121, 466, 214, 330, 458,  87, 372,
        99, 130, 308, 343, 491])
```

```python
# the same selection, using the ufunc
arr[np.less(arr, 500)]
```

```
array([102, 435, 270, 106,  71,  20, 121, 466, 214, 330, 458,  87, 372,
        99, 130, 308, 343, 491])
```
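Note that the result is flat: boolean masking walks the array in row-major order and returns a one-dimensional array of the selected values. One way to see this is to compare `arr[arr < 500]` with a plain Python filter over `arr.ravel()`, which also visits the elements row by row:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

masked = arr[arr < 500]
# a plain Python filter over the flattened array, row by row
filtered = [v for v in arr.ravel() if v < 500]

print(masked.ndim)                  # 1
print(masked.tolist() == filtered)  # True
```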
```python
# passing more than one condition, combined with a boolean operator
arr[(arr > 500) & (arr < 750)]
```

```
array([700, 614, 663, 661])
```
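A mask can be built once and reused, for example to count and to select the matching values, and `~mask` selects everything else. A sketch, assuming the same combined condition as above; the names `mask`, `selected` and `rest` are illustrative:

```python
import numpy as np

rand = np.random.RandomState(42)
arr = rand.randint(1000, size=(5, 5))

# build the mask once
mask = (arr > 500) & (arr < 750)

selected = arr[mask]  # values that pass the condition
rest = arr[~mask]     # values that don't

print(selected)                                 # [700 614 663 661]
print(np.count_nonzero(mask) == selected.size)  # True
print(selected.size + rest.size == arr.size)    # True
```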