In computers, 32-bit or 64-bit processors are used. Why not 40-bit or some other width?

For example, on a 32-bit processor a word is 4 bytes long. Is it also possible to use a 5-byte word, or some other size?



Solution 1:[1]

Is it also possible to use 5-byte word or others

Yes. You can even use just a few bits instead of a whole byte/word through bit fields. Technically, compilers can support any integer size on any architecture, like a 12-bit, 30-bit or 96-bit int on a 16-bit computer. In fact, Clang has an extension for integers of arbitrary bit width called _ExtInt (since standardized in C23 as _BitInt).
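As a minimal sketch of the bit-field approach, the struct below declares a 12-bit unsigned field. The names sample12, flags and store_sample are hypothetical and chosen only for illustration. How bit-fields are laid out in memory is implementation-defined, so treat this as a demonstration of the value range rather than of a storage format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type: a 12-bit value next to 4 flag bits. */
struct sample12 {
    unsigned int value : 12;   /* holds 0..4095 */
    unsigned int flags : 4;
};

/* Store v into the 12-bit field and return what was actually kept.
   Assigning to an unsigned bit-field reduces the value modulo 2^12. */
static unsigned store_sample(struct sample12 *s, unsigned v)
{
    assert(s != NULL);
    s->value = v;
    return s->value;
}
```

Storing 4095 returns 4095, while storing 4096 + 5 returns 5 because only the low 12 bits survive.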

why not 40-bit or other numbers?

Performance would suffer a lot, because non-native integer sizes need extra instructions. If the size is smaller than a word, the compiler has to emit bitwise instructions to mask out the remaining bits. If it is larger, every operation needs multiple instructions that work on multiple words.
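To make the masking cost concrete, here is a rough sketch of 40-bit arithmetic emulated on a 64-bit machine. The helper names add_u40 and sign_extend_40 are made up for this example. The extra AND and the sign-extension step are exactly the work a native word size would not need.

```c
#include <assert.h>
#include <stdint.h>

#define INT40_BITS  40
#define UINT40_MASK ((UINT64_C(1) << INT40_BITS) - 1)   /* low 40 bits set */

/* Add two unsigned 40-bit values held in 64-bit registers.
   The AND discards the carry out of bit 39, giving 40-bit wraparound. */
static uint64_t add_u40(uint64_t a, uint64_t b)
{
    assert(a <= UINT40_MASK && b <= UINT40_MASK);
    return (a + b) & UINT40_MASK;
}

/* Interpret the low 40 bits of v as two's complement and widen to 64 bits.
   The final unsigned-to-signed conversion assumes a two's-complement
   target, which is the case on all mainstream compilers. */
static int64_t sign_extend_40(uint64_t v)
{
    const uint64_t sign_bit = UINT64_C(1) << (INT40_BITS - 1);
    v &= UINT40_MASK;
    return (int64_t)((v ^ sign_bit) - sign_bit);
}
```

For example, add_u40(UINT40_MASK, 1) wraps to 0, and sign_extend_40(UINT40_MASK) yields -1.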

Another important issue is misalignment. Modern CPUs work more efficiently when a variable's address is a multiple of its size, or at least a multiple of the data bus width/word size. An odd-sized variable in a packed array will regularly cross those boundaries, which makes it awkward to load and store.
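The alignment problem can be illustrated with a little arithmetic. The sketch below (function names are hypothetical) counts how many elements of a tightly packed array cross a word boundary, assuming the array starts at a word-aligned address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the byte range [offset, offset + size) spans two or more words. */
static bool straddles_word(size_t offset, size_t size, size_t word)
{
    assert(size != 0 && word != 0);
    return offset / word != (offset + size - 1) / word;
}

/* Count the elements of a packed array that cross a word boundary. */
static size_t count_straddling(size_t elem_size, size_t count, size_t word)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        if (straddles_word(i * elem_size, elem_size, word))
            n++;
    return n;
}
```

With 8-byte words, count_straddling(5, 8, 8) is 4: half of the first eight 5-byte elements need two memory words each, while count_straddling(8, 8, 8) is 0.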


That said, there are many 32-bit architectures with a 40-bit type, like the TI C6000 or the TI C5500 DSPs.

long is

  • 40 bits or 5 bytes for C6000 COFF

[...] and C5500 has 40-bit long long

C89 Support in TI Compilers

That's because those DSPs have a special 40-bit accumulator: the 8 extra bits allow adding up to 256 32-bit numbers without overflow. But why not just use a 64-bit accumulator? That would need a much bigger ALU, draw more power and run slower (a bigger area means longer distances between hardware components, and operating on wide numbers is slower than on narrow ones), which is unacceptable in a DSP that is designed for performance (and possibly also for low power).
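The figure of 256 additions follows from the 8 guard bits (2^8 = 256). A small sketch, using a 64-bit integer as a stand-in for the accumulator and treating the inputs as unsigned for simplicity (real DSP accumulators are usually signed and often saturate, which this does not model):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACC_BITS 40
#define ACC_MAX  ((UINT64_C(1) << ACC_BITS) - 1)   /* largest 40-bit value */

static_assert(ACC_BITS - 32 == 8, "a 40-bit accumulator has 8 guard bits");

/* Add `value` to a zeroed accumulator n times and report whether the
   total still fits in 40 bits. The 64-bit stand-in cannot itself
   overflow here, since n * value stays below 2^64. */
static bool sum_fits_in_40_bits(uint32_t value, unsigned n)
{
    uint64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += value;
    return acc <= ACC_MAX;
}
```

Even for the largest 32-bit input, sum_fits_in_40_bits(UINT32_MAX, 256) is true, while 257 additions of UINT32_MAX no longer fit.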

... For instance, the Texas Instruments TMS320C6000, a DSP processor, uses 32 bits to represent the type int and 40 bits to represent the type long (this choice is not uncommon). Those processors (usually DSP) that use 24 bits to represent the type int, often use 48 bits to represent the type long. The use of 24/48 bit integer type representations can be driven by application requirements where a 32/64-bit integer type representation are not cost effective.

The New C Standard (Excerpted material): An Economic and Cultural Commentary

In fact, 40-bit integers are very common among DSPs (other examples being Blackfin and SHARC). SHARC even has an 80-bit accumulator, so you can add a lot of 64-bit values without worrying about overflow.


But you don't need a special architecture or special compiler support for that. You can still use a 40-bit variable if you really need to, for example when you work with a huge array where 64-bit integers would make it too large to fit in main memory and fewer items would fit in the cache. The simplest way to do that is to disable alignment with #pragma pack or __attribute__((packed)):

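#include <stdint.h>

struct int40_t {
    int64_t value : 40;          // named, so the field can actually be accessed
} __attribute__((packed));       // GCC/Clang: the struct is typically 5 bytes
struct int40_t myArray[200];     // array of packed structs needs no attribute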

or access the values from a simple char array (based on cmm's solution)

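#include <stdint.h>
#include <string.h>

#define SIZE 1000   // number of 40-bit elements (example value)

// +3 bytes of padding, so an 8-byte load of the last element would stay in bounds
unsigned char _5byteInts[5*SIZE+3];

int64_t get5byteInt(size_t index)
{
    uint64_t v = 0;
    memcpy(&v, &_5byteInts[index*5], 5);        // little endian only
    // Shift as unsigned to avoid signed-overflow UB, then sign-extend from
    // bit 39 (assumes >> on a negative value is an arithmetic shift)
    return (int64_t)(v << 24) >> 24;
}

void set5byteInt(size_t index, int64_t value)
{
    memcpy(&_5byteInts[index*5], &value, 5);    // little endian only
}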

But both approaches perform badly on architectures without unaligned access. You can instead pack four 40-bit ints into a single 20-byte struct to get better alignment:

struct four_int40s {
    uint32_t low[4];
    uint8_t hi[4];
};
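One way the accessors for such a struct might look is sketched below. The struct is repeated so the example is self-contained; get_int40 and set_int40 are hypothetical names, and the split into a low 32-bit half and a high byte is an assumption about how the fields are meant to be used.

```c
#include <assert.h>
#include <stdint.h>

struct four_int40s {
    uint32_t low[4];   /* bits 0..31 of each value  */
    uint8_t  hi[4];    /* bits 32..39 of each value */
};

static_assert(sizeof(struct four_int40s) == 20, "expected no padding");

/* Store the low 40 bits of value in slot i (0..3). */
static void set_int40(struct four_int40s *p, unsigned i, int64_t value)
{
    assert(i < 4);
    p->low[i] = (uint32_t)value;                    /* modular conversion */
    p->hi[i]  = (uint8_t)((uint64_t)value >> 32);
}

/* Read slot i back as a sign-extended 40-bit integer.
   Assumes a two's-complement target for the final conversion. */
static int64_t get_int40(const struct four_int40s *p, unsigned i)
{
    assert(i < 4);
    const uint64_t sign_bit = UINT64_C(1) << 39;
    uint64_t raw = ((uint64_t)p->hi[i] << 32) | p->low[i];
    return (int64_t)((raw ^ sign_bit) - sign_bit);
}
```

Every member is naturally aligned, so this layout needs no unaligned loads, at the cost of two memory accesses per value.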

Solution 2:[2]

Historically, there have been a few computers with word sizes not a power of two, as in this Table of word sizes. However, eventually people discovered that address arithmetic is much easier to implement when the address size is a power of two.

Consider an operation such as "jump forward by 14 words". If the word size is a power of two, say 64, then the circuit only needs to shift the number 14 left by log(64)/log(2) = 6 bits and add the result to ip, which can easily be done in 1 cycle. If, however, the word size is 36, as in the IBM 701, then the number 14 has to be multiplied by 36, and that takes more cycles. Given that multiplying an integer by the word size is a very common operation, the slowdown would be significant.
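The difference can be sketched in C. The function names are hypothetical, and the shift-and-add decomposition of 36 is only one illustrative way a multiplier-free circuit could do it, not a description of what any particular machine did.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of word `index` when the word size is 2^log2_size: a single shift. */
static uint64_t offset_pow2(uint64_t index, unsigned log2_size)
{
    assert(log2_size < 64);
    return index << log2_size;
}

/* Offset of word `index` when the word size is 36.
   36 = 32 + 4, so this takes two shifts plus an add. */
static uint64_t offset_36(uint64_t index)
{
    return (index << 5) + (index << 2);
}
```

For the jump by 14 words, offset_pow2(14, 6) gives 896 with one shift, while offset_36(14) gives 504 and needs an extra shift and an addition.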

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 | Community
Solution 2 | Michael