Table Of Contents

Previous topic

A sudoku solver

Next topic

Counters in python

This Page

Manipulating bitfields in Python (in most language actually)

Why and when to use a bitfield

In most language, the smallest memory chunk that you can manipulate, that is the smallest variable size available, is a small integer, which is, on most architecture, eight bits. It is not possible to manipulate directly the individual bits, for example, setting bit number 4 to zero leaving the other bits untouched. There is seldom use for such a manipulation.

There might be a day when there is a need to tackle a big computation problem, or when the mobile application under development halves the battery life of the user’ s phone. It might even be the case (like right now actually) that the system needs to spin the vent of the laptop so fast that it wakes up my grand mother having a nap on the couch nearby (true story, this is inconvenient).

That day, one might gets interested in how our fathers (and the fathers of our fathers) have sent a happy few to the moon with processors clocked at a few Hz and with a few bytes on memory stored on punch cards. A technique they most certainly used is the manipulation of bitfields for some data structures because they are light and fast, especially much lighter in terms of memory and processing than the Python dictionaries and lists.

They are not adapted for every use though: they are limited, trickier to get right, and not super easy to debug. You might end putting twice more time into the development that you originally expected. At this point, you might even want to consider rewriting the module in C, because by using choosing bitfields over dicts, sets and list, you are already halfway there !

This introductory section is followed by the detailed explanation of how to manipulate bitfields. The last section shows a real case use: a heavyweight computation will swap the default Python dictionary for a dictionary implemented with a bitfield for better performance.

Manipulating a bitfield

A difficulty with manipulating binary numbers in the Python console, is that the console does not know the input numbers are to be interpreted in binary and the output numbers need to be formattedas binary. The bitfield module helps in that once a number is declared Binary, its representation in the console is the binary representation, not the decimal. Additions and bitshifts uses binary arithmetics too:

>>> from bitfield import Binary
>>> one = Binary(1)
>>> one + 1
10
>>> [one << n for n in range(5)]
[1, 10, 100, 1000, 10000]
>>> a = Binary(1000)
>>> a, a+1, a+1+1, a+10
(1000, 1001, 1010, 1010)

The second operand must be a binary number or an instance of the Binary class. It can not be a decimal number, decimal numbers must first be converted to a binary number with B():

>>> a + 5
Traceback (most recent call last):
ValueError: invalid literal for int() with base 2: '5'
>>> a + B(5)
1101
>>> b = Binary(10)
>>> a+b, a-b
(1010, 110)

Logical operators used with a Binary as first operand also return a Binary:

>>> a|b,  a&b,  a^b,  a|11,  a^11
(1010, 0, 1010, 1011, 1011)

The methods one(), zero() and get() operates on an individual bit of a Binary number, requiring the bit number (counted from zero) as the argument:

>>> a.get(3), a.get(2), a.get(1), a.get(0)
(1, 0, 0, 0)
>>> a, a.one(1)
(1000, 1010)
>>> a.zero(3)
0

The methods of the Binary class often uses the decimal2binary, and binary2decimal module functions. B is a shortcut for the decimal2binary and is typically used by Binary when representing a Binary number :

>>> B(63), B(64)
('111111', '1000000')

D is a shortcut for binary2decimal and converts a binary number or string to a decimal number.

>>> D(111), D(101010)
(7, 42)

The bitfield module provides mostly eye candy, the two functions that are pivotal to the manipulation of binary numbers are binary2decimal and decimal2binary functions, whose source code is presented below:

def decimal2binary(decimal):
    """Turns a decimal number into the string representation of a
    binary"""

    # two special cases to "get" right: 0 and negative number

    sign, decimal = ('-', -decimal) if decimal<0 else ('', decimal) 

    digits = []
    while decimal>0:
        decimal, digit = decimal/2, decimal%2
        digits.append(str(digit))

    return sign + ''.join(digits[::-1]) if digits else '0'
def binary2decimal(binary):
    "Turns a binary number into its decimal representation"
    return int(str(binary),2)

Now that it is possible to manipulate the bitfields with some ease, the following subsections details the get(), one() and zero() functions. The concepts explained here are similar in many programming languages such as C/C++, Java, Ruby.

How to get the nth bit?

Take the number 1111 in binary, and we want to know if the third bit is set to one or zero (counting from zero, this is bit number 2). Here is the method;:

  1. first put the desired bit at the righmost position by shifting the binary word on the right as many times as the position of the desired bit:

    >>> from bitfield import Binary
    >>> b = Binary(1111)
    >>> b >> 2
    11
    
  2. then, sets all bits except the rightmost one to zero and returns the result which is the desired bit and not more.

    >>> (b >> 2) & 1
    1
    

    The third bit of the binary number 1111 is actually set to 1. Here is another example, where bit #2 is requested in number 10000:

    >>> Binary(10000) >> 2 & 1
    0
    

How to set the nth bit of the bitfield?

Take the binary number 10000 and let’s make sure the third bit is set to 1.

  1. First create a number with all the bits equals to zero except for the third bit:

    >>> one = Binary(1)
    >>> one << 2
    100
    
  2. The OR binary operator, available in Python with the | character, merges two bitfields: each bit of the resulting bitfield is one if either bit from one of the two operand is one, zero else:

    >>> 1|1, 1|0, 0|1, 0|0
    (1, 1, 1, 0)
    

    The number to transformed needs to be ORed with the number created in the previous step:

    >>> (one << 2) | 10000
    10100
    

How to unset the nth bit?

Say we have the number 10100 and we want to make sure the third bit is set to 0.

  1. First create a binary whose third bit is one, all other are zero:

    >>> one<<2
    100
    
  2. Invert the bits with the ~ operator: for every bits, a one becomes zero and a zero becomes a one: 100 becomes 011.

    ANDing the input number and the number created at the previous step sets the third bit to zero:

    >>> ~(one<<2) & 10100
    10000
    

Replacing the standard Python set object with a bitfield

The example problem is to implement a set of digits from 1 to 9. This is a data structure which has methods for adding, removing and checking whether elements are contained in the data structure. Our use case is a sudoku resolver: before setting a number at some position on the sudoku chessboard, we must check that the number is not somewhere else in the line, or the column, or in the square. Each column, line or square can be represented by a set, with Python primitives, this is:

>>> line = set([1,2,3,4,5])
>>> 4 in line
True
>>> 9 in line
False
>>> line.add(9)
>>> 9 in line
True

A bitfield of length nine is a lighter solution that the Python set and is adapted to our specific context: when the nth bit is set to 1 then, n is in the set. Conversely, when the nth bit is zero, n is absent from the set. At this point, the set can be implemented with just one integer. A Python set object containing 9 digits would be much bigger in terms of bytes. Also, the operation to set and retrieve the element of a set are much more heavywight processor-wise than bit arithmetic.

>>> class BitFieldSet:
...
...     _num = 0
...
...     _get  = lambda s, n: (s._num >> n) & 1
...     _zero = lambda s, n: s._num & ~(1 << n)
...     _one  = lambda s, n: s._num |  (1 << n)
...
...     def add(self, number):
...         self._num = self._one(number-1)
...
...     def remove(self, number):
...         self._num = self._zero(number-1)
...
...     def __iter__(self):
...         for i in range(0,9):
...             if self._get(i):
...                 yield i+1
...
...     def __repr__(self):
...         return "set(%s)" % [i for i in self]
...

Let’s write a small function which can operate on any data structure which has the methods add and remove. The function can not make a difference between a regular Python set and a BitFieldSet.

>>> def test(line):
...     for i in [1,2,3,4,5]:
...         line.add(i)
...     assert 4 in line
...     assert not 9 in line
...     line.add(9)
...     assert 9 in line
...
>>> test(set())
>>> test(BitFieldSet())
>>> from timeit import Timer
>>> print Timer( lambda : test(set())        ).timeit()     
2.52030396461
>>> print Timer( lambda : test(BitFieldSet())).timeit()     
28.7120981216

The last time the test function was executed, the BitFieldSet version would take rougly ten times longer, showing that the data structure unadapted for an optimized replacement of the Python set. Let’s blame it on a too naive benchmark for now. The article A sudoku solver shows the efficient use of bitfields for implementing a sudoku solver.