Booleans in Python: Beware of Numpy

Published Aug 24, 2021

I was doing some backend work for an upcoming finance project when I ran into some trouble with numpy. Thankfully, this isn’t my first rodeo, so I quickly found the issue.

So what was this issue? Well, it turns out that the numpy package uses a custom type for boolean values, bool_. This makes sense: numpy stores its arrays as C data and runs comparisons in compiled C code rather than in Python, so it needs a type of its own to interface between the two. Since the name bool is already taken by Python, they went with bool_.

And therein lay my first issue: if you check whether the variable is a bool, you’ll get False. Thankfully, type-checking a boolean is not common, and the stack trace should point you in the right direction straightaway if you’ve set up a catch.

Working around it means writing a little dispatch function, but that’s no big deal. First, let’s see what the problem looks like:

>> import numpy as np
>> value = np.abs(4) > 0  # simple check, value is of type np.bool_
>> value
True
>> type(value)
<class 'numpy.bool_'>
>> isinstance(value, bool)  # np.bool_ is not the builtin bool
False
>> value = np.bool(True)   # in NumPy < 1.24, np.bool is just an alias for the builtin bool
>> type(value)
<class 'bool'>

There we have it: at the time of writing, numpy exposes both bool_ and bool, with bool_ the custom type and bool an alias for the base Python type. The problem comes from the fact that a numpy comparison returns bool_ instead of bool, which is easy to miss as a developer.
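As a minimal sketch of the kind of check that accepts both types, the helper below tests against the builtin bool and np.bool_ together. The name `is_boolean` is made up for this example and is not part of numpy.

```python
import numpy as np


def is_boolean(value):
    """Return True for both builtin bool and numpy bool_ scalars.

    Hypothetical helper for illustration only.
    """
    return isinstance(value, (bool, np.bool_))


flag = np.abs(4) > 0             # a numpy comparison, so flag is an np.bool_

print(isinstance(flag, bool))    # False: np.bool_ is not a subclass of bool
print(is_boolean(flag))          # True
print(is_boolean(True))          # True
print(is_boolean(1))             # False: a plain int is not a boolean
print(type(bool(flag)))          # <class 'bool'> after an explicit cast
```

If you only need the value and not the type, casting with `bool(...)` at the boundary is usually the simpler option.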

The second issue, which is the one I actually ran into, is less obvious. What happens if you try to add two booleans together? Let’s try.

>> True + True    
2   
>> True + False   
1   
>> True * 4       # What about multiplications?
4                 # Still acts like an integer

Python booleans behave like integers in arithmetic because bool is a subclass of int (False is 0, True is 1). Now let’s see numpy.
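A quick way to convince yourself of the integer behaviour is to inspect the builtin type directly. This short sketch uses only the standard library; the variable name `total` is just for illustration.

```python
# bool is a subclass of int, so True and False take part in integer arithmetic
print(issubclass(bool, int))      # True
print(isinstance(True, int))      # True
print(True == 1, False == 0)      # True True

# Summing builtin booleans therefore counts the True values
total = sum([True, False, True])
print(total)                      # 2
```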

>> trueVar = np.bool_(True)
>> falseVar = np.bool_(False)
>> trueVar + trueVar
True                    # Wait up, that's not an integer!   
>> falseVar + falseVar  
False                   # Looks like our bool_ are acting like actual booleans in this case
>> trueVar + falseVar   # Let's add True and False 
True                    
>> trueVar * 4  
4   
>> falseVar * 4   
0

With all these checks, we get surprising results:   
- Adding bool_ instances returns True if at least one of the operands is bool_(True), so + acts like a logical OR   
- Adding a bool_ and an int or float casts the bool_ to int or float  
- A multiplication also casts the bool_ to int or float depending on the multiplier    

The bool_ type clearly has different properties from the base bool type, so keep that in mind for next time: you might have to cast the result of a numpy comparison to bool (or int) afterwards to get the behaviour you expect.   
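The same distinction shows up with boolean arrays, which is where most numpy comparisons end up. The sketch below uses two made-up arrays, `a` and `b`, to show that elementwise addition stays boolean while an explicit cast restores integer arithmetic.

```python
import numpy as np

# Two hypothetical boolean arrays, e.g. the results of two comparisons
a = np.array([True, True, False])
b = np.array([True, False, False])

# Elementwise + on boolean arrays is a logical OR and keeps the bool dtype
either = a + b
print(either.tolist())            # [True, True, False]
print(either.dtype)               # bool

# Casting to int first gives the per-element count instead
counts = a.astype(int) + b.astype(int)
print(counts.tolist())            # [2, 1, 0]

# np.sum on a boolean array accumulates as integers, so it counts True values
print(int(np.sum(a)))             # 2
```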

This was exactly my issue: I was performing checks on multiple conditions and adding the results of these checks to get a score. I had 5 conditions, so I was expecting a score between 0 and 5 inclusive, but ended up with only True and False values instead. It's not a complex issue, but it's definitely worth knowing about if you're doing quantitative work in Python.
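Here is a rough reconstruction of that situation, not the original project code. The values `price` and `volume`, the five conditions, and the `score` helper are all invented for the example. It assumes the checks come back as np.bool_ scalars.

```python
import numpy as np


def score(checks):
    """Count how many checks passed by casting each one to int first.

    Hypothetical helper for illustration only.
    """
    return sum(int(check) for check in checks)


# Made-up inputs: numpy scalars, so every comparison below returns np.bool_
price = np.float64(105.0)
volume = np.int64(12_000)

checks = [
    price > 100,       # True
    price < 200,       # True
    volume > 10_000,   # True
    volume < 5_000,    # False
    price != volume,   # True
]

# Chaining + directly on np.bool_ values collapses to a single boolean
naive = checks[0] + checks[1] + checks[2] + checks[3] + checks[4]
print(bool(naive))     # True, not a score

# Casting each result to int gives the expected 0-5 score
print(score(checks))   # 4
```

Note that the builtin `sum(checks)` would also return 4 here, because it starts from the integer 0 and so promotes each np.bool_ to an integer as it goes. The explicit `int(...)` cast just makes the intent visible.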

#python #learn #quantitative #finance #preprocessing data #numpy #pandas #typing