From the gdalgorithms mailing list:
      > I believe the canonical range-expanding trick is "duplicate the last bits";
      > in the case of nibble,  just duplicate it, x | (x << 4). This still doesn't
      > give you 128, though...  

No, that's not the canonical range-expanding trick. It's a horrible hack that leads to a very irregular distribution of values. The canonical trick is to duplicate the FIRST (most significant) bits. In the case of a nybble, that is the same as multiplying by 17, since x*17 == (x << 4) | x. In fact, for the special case of going nybble->byte, both methods yield the same results, because all four input bits get copied either way, but that's an unfortunate coincidence.

The correct formula to go from Ni bits to No bits of unsigned pixel values (or sound sample values, for that matter), where No > Ni and No <= 2*Ni, is:
out = (in << (No-Ni)) | (in >> (2*Ni-No));
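
For anyone who wants to paste it into code, here is a minimal C sketch of that formula. The function name expand_bits and the use of uint32_t are my own choices for illustration, not anything standard; the asserts just spell out the No > Ni, No <= 2*Ni constraint stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an unsigned ni-bit value to no bits by copying its top bits
   into the newly created low bits. Requires ni < no <= 2*ni. */
static uint32_t expand_bits(uint32_t in, unsigned ni, unsigned no)
{
    assert(ni > 0 && no > ni && no <= 2 * ni && no <= 32);
    assert((in >> ni) == 0);              /* input must fit in ni bits */
    return (in << (no - ni)) | (in >> (2 * ni - no));
}
```

With this, expand_bits(0x8, 4, 8) gives 0x88 (the same as 8 * 17), and expand_bits(16, 6, 8) gives 65.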

If you really want to represent 128, then you can multiply by 16, but then you really should not use the 0 value at all, because it adds "DC offset" to your signal -- the question is whether you treat 255 as "logical one" or 256 as "logical one". Treating 256 as logical one means you can represent zero, but you can't represent one, because 256 does not fit in a byte. Sucks to do fixed-point two's-complement math, doesn't it ;-)

Illustration of why replicating the lower bits is bad:

Suppose we have the 6-bit values 0, 1, 2, 3, 4, 5, ... 63 and want to add two bits of precision (extending the range to 0..255). In the table, "ds" is the step from the previous input's output:

input   lowbits  ds(low)   highbits  ds(high)
  0        0                  0
  1        5       5          4         4
  2       10       5          8         4
  3       15       5         12         4
  4       16       1         16         4
  5       21       5         20         4

 15       63                 60
 16       64       1         65         5
 17       69       5         69         4

 58      234                235
 59      239       5        239         4
 60      240       1        243         4
 61      245       5        247         4
 62      250       5        251         4
 63      255       5        255         4
 

Clearly, the "lowbits" mechanism makes the jump between successive steps highly irregular (it alternates between 5 and 1), whereas the "highbits" mechanism gives you steps that are as uniform as you can get, given the limited precision (always 4 or 5).
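
If you'd rather check the whole 0..63 range than trust the excerpt in the table, a rough C sketch like the following will do it. The names expand_low, expand_high and step_range are made up for this post; the code is hard-wired to the 6-bit to 8-bit case from the table.

```c
#include <assert.h>
#include <stdint.h>

/* 6 -> 8 bits, filling the two new low bits from the input's LOW bits. */
static uint8_t expand_low(uint8_t in)
{
    assert(in < 64);
    return (uint8_t)((in << 2) | (in & 3));
}

/* 6 -> 8 bits, filling the two new low bits from the input's HIGH bits. */
static uint8_t expand_high(uint8_t in)
{
    assert(in < 64);
    return (uint8_t)((in << 2) | (in >> 4));
}

/* Smallest and largest output step between consecutive inputs 0..63. */
static void step_range(uint8_t (*expand)(uint8_t), int *min_step, int *max_step)
{
    *min_step = 255;
    *max_step = 0;
    for (int in = 1; in < 64; ++in) {
        int step = expand((uint8_t)in) - expand((uint8_t)(in - 1));
        if (step < *min_step) *min_step = step;
        if (step > *max_step) *max_step = step;
    }
}
```

Calling step_range with expand_low reports steps ranging from 1 to 5, while expand_high stays within 4 to 5, which is the same pattern the table shows.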