Systems Programming

Bit Twiddling

Aug 2013

Goals:

To understand the practical use of the bitwise operators.

Material based in the original lecture notes by Dr. Carlos Prolo.

1 Semantically-independent ways to represent data

From the Introduction to Computer Science module, you have already learned that numeric data can be represented in many numerical bases such as decimal, hexadecimal and binary. However, there is no such thing as a "base-10 number" which is different from a "base-2 number": regardless of the numerical base chosen to represent a value, the stored data is always the same. Nevertheless, there are some numerical bases which have clear advantages when we start to work with low level applications.

First let's review the three main numerical bases:

Decimal: Good for humans, bad for programmers. Why? Try to convert from decimal to any other base and you'll see. For example, decimal to binary or decimal to hexadecimal.

Figure 1.1 Obtaining the value from a decimal number (from [1])
Binary: Allow us to clearly see the internal representation of a value. However, binary is a long representation: just try to picture using a value such as 11110000101011100001010011101100...
Hexadecimal: Offers a good tradeoff between decimal and binary. As base 16 is a power of 2 (2⁴), the conversion between binary and hex is 4 binary digits to 1 hex (right aligned). For example, to convert the previous binary number to hex we just need to read the bits 4 at a time:
```
// 1111 0000 1010 1110 0001 0100 1110 1100
//   F    0    A    E    1    4    E    C
```
Thus the hex value will be F0AE14EC, which is not only shorter but much more readable than the binary one. At the same time, it is equally simple to extract the bits from anywhere within the sequence. For those reasons, hex representation is widely used in low level programming.

Figure 1.2 Obtaining the value from a hex number (from [1])

1.1 Representation of hex numbers in C

The C language allows us to easily use hex numbers in our code: we only need to use the "0x" prefix before the number. For example: 0xF0AE14EC. The following two commands are equivalent in C:

unsigned int var;
var = 254;
var = 0xFE;

1.2 Output of hex values

To print out an hex value, we use the %x (lowercase letters) or %X (uppercase letters) modifiers for printf:

printf("%x\n", hex);  // fe
printf("%X\n", hex);  // FE

2 Bitwise operators

Bitwise operators are operators that work individually on each bit of the operands. They can be used with integer values: int, char, short int, long int, preferably unsigned (e.g, unsigned int var).

There are two types: logical operators and shifting operators. In this section we will describe these operators in detail.

2.1 Logical operators

These operators apply common logical operations in a bitwise manner. They are:

2.1.1 Bitwise AND

In C, this is the & operator. But don't mix that up with the logical AND: &&. The difference, as we already know, is that bitwise AND operates in each bit of the operands. For example:

operand 1	01001010
operand 2	10010010
AND result	00000010

2.1.2 Bitwise OR

Just as the logical OR is ||, bitwise OR is a single bar: |

operand 1	01001010
operand 2	10010010
OR result	11011010

2.1.3 Bitwise NOT

This is different than the logical not (!): we use the ~ symbol.

operand	01001010
NOT result	10110101

2.1.4 Bitwise XOR

XOR is the exclusive-or operator: the end result will be the same as OR, except if both bits are on - in that case, the result will be zero.

operand 1	01001010
operand 2	10010010
XOR result	11011000

2.2 Shifting operators

These operators get this name because they work by shifting the bits either to the right or to the left. The empty spaces are filled with zeroes. They are particularly useful when combined with the logical operators, as we will see later.

2.2.1 Left logical shift

In the C language this is the << operator. The arguments for this operator are the value to shift and the amount of bits to shift to the left. Note that the value itself remains unchanged: if we want to change a variable then we must combine the shifting operator with an assignment.

For example, let's consider the following original value (0x17) in the x variable (top of 2.1↓):

Figure 2.1 Bitwise shifting to the left (from [2])

If we want to shift the value to the left and store the result in an y variable (bottom of 2.1↑), we can do as follows:

unsigned char x = 0x17;   // 23 in decimal
unsigned char y = x << 1; // left shift by 1 bit
// now y will store 46 (0x2E)

Note that you can also shift more bits at a time:

y = x << 2; // y will store 92  (0x5C)
y = x << 3; // y will store 184 (0xB8)

2.2.2 Right logical shift

Likewise, the opposite operation is the right logical shift: >>. Just as the left shift, the first operand is shifted to the right by the amount of bits specified by the second operand. For example, considering the same starting value for x (0x17):

Figure 2.2 Bitwise shifting to the right (from [3])

unsigned char x = 0x17;   // 23 in decimal
unsigned char y = x >> 1; // right shift by 1 bit
// now y will store 11 (0x0B)

And like before, you can also shift more bits at a time:

y = x >> 2; // y will store 5 (0x5)
y = x >> 3; // y will store 2 (0x2)

2.3 The sizeof operator

The sizeof operator can be used with any data type (or a variable, it doesn't matter) and it returns the amount of bytes needed to store values of such type. For example, what will the following commands print? Can you explain the output?

printf("sizeof(int)    = %d\n", sizeof (int));
printf("sizeof(char)   = %d\n", sizeof(char));
printf("sizeof(float)  = %d\n", sizeof(float));
printf("sizeof(double) = %d\n", sizeof(double));

This is a very important operator, as it can be used to know the correct data size for a given type. Note that this varies according to the language implementation and most importantly, it changes depending on the architecture (e.g. 32 bit vs 64 bit).

Exercise:

Using the bitwise logical and shifting operators, try to figure out how to implement the following operations. You must create a function for each one, and a main program to test the operations:

unsigned int clear(unsigned int val): clear all bits of the value
unsigned int setbit (unsigned int x, int bit): where bit must be a number between 0 and 31. The function should turn on the desired bit in x.
unsigned int clearbit (unsigned int x, int bit): stores 0 on the informed bit.
unsigned int invertBit (unsigned int x, int bit): toggles the bit value.
int testBit (unsigned int x, int bit): tests whether the bit is on or off. Actually, this function should just return the bit value as an integer.

2.4 Bit fields (partial words)

Let's suppose we have a struct that represent a colour through its three RGB components (Red, Green, Blue). Each compoent can store a value from 0 to 1023 (1024 unique values). As 1024 is 2¹⁰ we need 10 bits for each compoenent. This won't fit in a char. Also, depending on the application, you may be unable to store each component separately - and maybe, an int (4 bytes) for each component may be too much. So you decide to store the three components in a single unsigned int: Red (bits 0 to 9), Green (bits 10 to 19) and Blue (bits 20 to 29). And we still have two free bits...

Bom, como você faz para armazenar um valor, digamos 500 no Green?

Ou para imprimir o valor do Red ou do Blue?

A solução é extrair os dados desejados através de manipuladores de bits.

Por exemplo, o que fazem os comandos abaixo, assumindo que x tem um valor inteiro entre 0 e 1023, digamos, 500?

cor = cor & ~(0x3FF << 10) | (x << 10); // & tem precedência sobre | e ~, ~ tem precedência sobre ambos
int y = cor & 0x3FF;
int z = (cor >> 20) & 0x3FF;

Existem também construções na linguagem C++ mais específicas para campos de bits. Por exemplo:

struct Cor {
  // Pode especificar bits até o tamanho de um int
  // É transparente para o usuário como serão efetivamente alocados os campos
  unsigned int red: 10;    // 10 bits para red
  unsigned int green: 10;  // 10 bits para green
  unsigned int blue: 10;   // 10 bits para blue
};

...
Cor cor;
cor.green = 100;
cor.red = 900;
cor.blue = 320;