Function for decoding unsigned short value - c++

I have a small problem with a task.
We are conducting a survey. The result of a single survey (obtained from one respondent) provides the following information, to be encoded in a variable of type unsigned short (it can be assumed to be 2 bytes, i.e. 16 bits):
sex - 1 bit - 2 possibilities
marital status - 2 bits - 4 possibilities
Age - 2 bits - 4 possibilities
Education - 2 bits - 4 possibilities
City - 2 bits - 4 possibilities
region - 4 bits - 16 possibilities
answer - 3 bits - 8 possibilities
unsigned short coding(int sex, int marital_status, int age, int edu, int city, int region, int reply){
unsigned short result = 0;
result = result + sex;
result = result + ( marital_status << 1 );
result = result + ( age << 3);
result = result + ( edu << 5 );
result = result + ( city << 6 );
result = result + ( region << 11 );
result = result + ( reply << 13 );
return result;
}
This encodes the results (I hope it's correct), but I have no idea how to write a function that displays the information I have encoded inside unsigned short x.
First I have to encode it:
unsigned short x = coding(0, 3, 2, 3, 0, 12, 6);
Then I need to write another function that decodes the information from unsigned short x into this form:
info(x);
RESULT
sex: 0
marital status: 3
age: 2
education: 3
city: 0
region: 12
reply: 6
I would be grateful for your help, because I have no idea how to even get started or what to look for.
My question: can someone check the unsigned short coding function and help with writing void info(unsigned short x)?

You can use bit fields:
struct survey_data
{
unsigned short sex : 1;
unsigned short marital_status : 2;
unsigned short age : 2;
unsigned short education : 2;
unsigned short city : 2;
unsigned short region : 4;
unsigned short answer : 3;
};
If you need to convert between this struct and a short, you can define a union like this:
union survey
{
struct survey_data detail;
unsigned short s;
};
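To use these types:
struct survey_data sd;
sd.sex = 0;
sd.marital_status = 2;
...
unsigned short s = 0xCAFE;
union survey x;
x.s = s;
printf("Sex: %u, Age: %u", x.detail.sex, x.detail.age);
Keep in mind that the layout of bit fields is implementation-defined; different compilers may allocate them in a different order (MSVC, for example, goes from LSB to MSB). Please refer to your compiler manual and the C/C++ standards for details. Also note that reading a union member other than the one most recently written is allowed in C but formally undefined behavior in C++, although compilers such as GCC document it as a supported extension.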
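If the bit-field layout is a concern, the same packing can be done with explicit shifts and masks, which behave the same on every compiler. The following is a minimal sketch, not the asker's original function: the struct name Survey, the function names encode_survey and decode_survey, and the choice to pack the fields from the least significant bit upward in the listed order are all assumptions made for illustration. Each shift is the sum of the widths of the fields before it (0, 1, 3, 5, 7, 9, 13).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, counted from the least significant bit:
// sex: bit 0, marital status: bits 1-2, age: bits 3-4, education: bits 5-6,
// city: bits 7-8, region: bits 9-12, answer: bits 13-15.
struct Survey {
    unsigned sex, marital_status, age, education, city, region, answer;
};

std::uint16_t encode_survey(const Survey& s) {
    // Every value must fit in its field, otherwise it would spill into a neighbour.
    assert(s.sex < 2 && s.marital_status < 4 && s.age < 4 && s.education < 4);
    assert(s.city < 4 && s.region < 16 && s.answer < 8);
    return static_cast<std::uint16_t>(
        s.sex | (s.marital_status << 1) | (s.age << 3) | (s.education << 5) |
        (s.city << 7) | (s.region << 9) | (s.answer << 13));
}

Survey decode_survey(std::uint16_t x) {
    Survey s;
    // Shift the field down to bit 0, then mask off everything above its width.
    s.sex            =  x        & 0x1u;
    s.marital_status = (x >> 1)  & 0x3u;
    s.age            = (x >> 3)  & 0x3u;
    s.education      = (x >> 5)  & 0x3u;
    s.city           = (x >> 7)  & 0x3u;
    s.region         = (x >> 9)  & 0xFu;
    s.answer         = (x >> 13) & 0x7u;
    return s;
}
```

With the sample values from the question (0, 3, 2, 3, 0, 12, 6), encode_survey yields 55414 (0xD876) under this layout, and decode_survey returns the seven original values, which an info function could then print.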

The solution is straightforward, and it's mostly text work. Transfer your data description
sex - 1 bit - 2 possibilities
marital status - 2 bits - 4 possibilities
Age - 2 bits - 4 possibilities
Education - 2 bits - 4 possibilities
City - 2 bits - 4 possibilities
region - 4 bits - 16 possibilities
answer - 3 bits - 8 possibilities
into this C/C++ structure:
struct Data {
unsigned sex: 1; // 2 possibilities
unsigned marital: 2; // 4 possibilities
unsigned Age: 2; // 4 possibilities
unsigned Education: 2; // 4 possibilities
unsigned City: 2; // 4 possibilities
unsigned region: 4; // 16 possibilities
unsigned answer: 3; // 8 possibilities
};
It's a standard use case for bit fields, which date back to traditional C and are also available in every standard-conforming C++ implementation.
Let's call your 16-bit encoded storage type store_t (of the several definitions in use, we take the one from the C standard header stdint.h):
#include <stdint.h>
typedef uint16_t store_t;
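The example Data structure can be used for encoding:
#include <cstring>
/// create a compact storage type value from data
store_t encodeData(const Data& data) {
store_t code = 0;
// Copy the bytes instead of casting pointers, which would break strict aliasing.
// Assumes the 16 bits of fields sit in the first two bytes of Data (implementation-defined).
std::memcpy(&code, &data, sizeof code);
return code;
}
or decoding your data set:
/// retrieve data from a compact storage type
Data decodeData(const store_t code) {
Data data{};
std::memcpy(&data, &code, sizeof code);
return data;
}
You access the bit-field structure Data like an ordinary structure: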
Data data;
data.sex = 1;
data.marital = 0;

Efficiently store multiple integral numbers in a single variable (and retrieve them)

Let's assume that we have up to 3 IDs (we don't know the number of IDs in advance; it can be 0 to 3), each of which can consist of up to 4 digits, leading to numbers from 1 to 9999. Since these IDs are small enough, we want to store them in a single variable, such as a double, and not in an array (for whatever reasons). Here is an example of what I mean:
We have 3 IDs: ID_1 = 1234, ID_2 = 0987, and ID_3 = 6543. We pair these and obtain the following as a result: 654309871234
As we can see, the numbers are still there and could be obtained from it.
We could accomplish something like this with the following code:
int maxDigits = 4;
double result = 1; // needed at the start, so that log10 works
int id[3] = {1234, 987, 6543}; // 987, not 0987: a leading zero would make it an (invalid) octal literal
for (int i=0; i<3; ++i) {
int pos = log10(result) / maxDigits;
result += pow(10, pos * maxDigits + 1) * id[i];
}
This would have a slightly different outcome: result = 6543098712341, but we can still retrieve the information.
However, I don't think that this is an efficient way of handling this. Maybe one should rather operate in binaries and not decimals? What would be a better (more efficient) approach?
(The above-mentioned ID range need not be the same for other possible solutions.)
The IDs can go up to 9999, which requires at most 14 bits. Three integers of 14 bits each require at most 42 bits. You can easily store that in a 64-bit unsigned integer such as uint64_t (unsigned long is not guaranteed to be 64 bits wide) with some manual bit-shifting, e.g.:
uint16_t get_id(uint64_t ids, uint8_t which)
{
return (ids >> (14 * (which & 3))) & 0x3FFF;
}
void set_id(uint64_t &ids, uint8_t which, uint16_t id)
{
uint64_t shift = 14 * (which & 3);
// widen to 64 bits before shifting; shifting a plain int by 32 or more is undefined
uint64_t mask = uint64_t(0x3FFF) << shift;
ids = (ids & ~mask) | (uint64_t(id & 0x3FFF) << shift);
}
uint64_t ids = 0;
set_id(ids, 0, 1234);
set_id(ids, 1, 987); // not 0987: a leading zero would make it an octal literal
set_id(ids, 2, 6543);
...
id1 = get_id(ids, 0);
id2 = get_id(ids, 1);
id3 = get_id(ids, 2);
Or, you can use bitfields to let the compiler handle the bit-shifting for you, eg:
struct s_ids
{
uint64_t id1: 14;
uint64_t id2: 14;
uint64_t id3: 14;
};
s_ids ids;
ids.id1 = 1234;
ids.id2 = 987; // not 0987: a leading zero would make it an octal literal
ids.id3 = 6543;
...
id1 = ids.id1;
id2 = ids.id2;
id3 = ids.id3;
Or, you could just use a normal struct of normal 16-bit integers and not do any fancy bit-twiddling at all:
struct s_ids
{
uint16_t id1;
uint16_t id2;
uint16_t id3;
};
s_ids ids;
ids.id1 = 1234;
ids.id2 = 987; // not 0987: a leading zero would make it an octal literal
ids.id3 = 6543;
...
id1 = ids.id1;
id2 = ids.id2;
id3 = ids.id3;
You cannot store these values in an easy way in a double: if you remove the sign and the exponent bits from a double, you are left with 52 bits to store your value, and 52 · log10(2) = 15.65..., i.e. fewer than 16 decimal digits. Split across four IDs, that leaves fewer than 4 decimal digits per ID, which is insufficient to store the kind of info you want to store. (Three 4-digit IDs need only 12 digits and would fit, but bit operations are not available on floating-point values.)
Instead use an unsigned integer type that's guaranteed to contain enough bits. 64 bits happen to be sufficient to store 4 ids containing 16 bits each (2^16 = 65536), and you can retrieve the ids efficiently using bit operations:
/**
* \param previous the combined ids before storing the new value
* \param index an index between 0 and 3 (inclusive) for the id to store
* \param value the id to store
* \return the new combined value after replacing the id at index with value
*/
uint64_t storeId(uint64_t previous, uint64_t index, uint64_t value)
{
// clear the old 16 bits with & ~mask (not ^), using a 64-bit mask so the shift cannot overflow
return (previous & ~(uint64_t(0xffff) << (index * 16))) | ((value & 0xffff) << (index * 16));
}
/**
* \return the id stored in value at index index
*/
uint64_t getId(uint64_t value, uint64_t index)
{
return (value >> (index * 16)) & 0xffff;
}

Split and casting address into different integers in Ada

To interface with a certain piece of hardware (in this case a TSS entry of an x86 GDT), it is required to use the following structure in memory:
type UInt32 is mod 2 ** 32;
type UInt16 is mod 2 ** 16;
type UInt8 is mod 2 ** 8;
type TSSEntry is record
Limit : UInt16;
BaseLow16 : UInt16;
BaseMid8 : UInt8;
Flags1 : UInt8;
Flags2 : UInt8;
BaseHigh8 : UInt8;
BaseUpper32 : UInt32;
Reserved : UInt32;
end record;
for TSSEntry use record
Limit at 0 range 0 .. 15;
BaseLow16 at 0 range 16 .. 31;
BaseMid8 at 0 range 32 .. 39;
Flags1 at 0 range 40 .. 47;
Flags2 at 0 range 48 .. 55;
BaseHigh8 at 0 range 56 .. 63;
BaseUpper32 at 0 range 64 .. 95;
Reserved at 0 range 96 .. 127;
end record;
for TSSEntry'Size use 128;
When translating some C code into Ada, I ran into several issues, and I could not find many resources online. The C snippet is:
TSSEntry tss;
void loadTSS(size_t address) {
tss.baseLow16 = (uint16_t)address;
tss.baseMid8 = (uint8_t)(address >> 16);
tss.flags1 = 0b10001001;
tss.flags2 = 0;
tss.baseHigh8 = (uint8_t)(address >> 24);
tss.baseUpper32 = (uint32_t)(address >> 32);
tss.reserved = 0;
}
This is the Ada code I tried to translate it to:
TSS : TSSEntry;
procedure loadTSS (Address : System.Address) is
begin
TSS.BaseLow16 := Address; -- How would I downcast this to fit in the 16 lower bits?
TSS.BaseMid8 := Shift_Right(Address, 16); -- Bitwise ops dont take System.Address + downcast
TSS.Flags1 := 2#10001001#;
TSS.Flags2 := 0;
TSS.BaseHigh8 := Shift_Right(Address, 24); -- Same as above
TSS.BaseUpper32 := Shift_Right(Address, 32); -- Same as above
TSS.Reserved := 0;
end loadTSS;
How would I be able to solve the issues I highlighted in the code? Are there any resources a beginner can use for help in cases like this? Thanks in advance!
Use the To_Integer function in the package System.Storage_Elements to convert the address into an integer, then convert that integer to Interfaces.Unsigned_32 or Unsigned_64 (whichever is appropriate) so that you can use the shift operations to extract bit-fields.
Instead of the shift and mask operations, you can of course use division and "mod" to pick the integer apart, without converting to the Interfaces types.
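The equivalence between shift-and-mask and division-and-modulo is plain integer arithmetic and does not depend on the language. As a rough sketch in C++ rather than Ada (the struct BaseParts and all function names are made up for illustration, and the address is assumed to be 64 bits wide), the two approaches look like this; in Ada, "/" and "mod" on the integer obtained from To_Integer would play the roles of / and % here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four pieces of a 64-bit base address.
struct BaseParts {
    std::uint16_t low16;
    std::uint8_t  mid8;
    std::uint8_t  high8;
    std::uint32_t upper32;
};

// Shift-and-mask version, mirroring the C snippet from the question.
BaseParts split_with_shifts(std::uint64_t address) {
    BaseParts p;
    p.low16   = static_cast<std::uint16_t>(address & 0xFFFFu);
    p.mid8    = static_cast<std::uint8_t>((address >> 16) & 0xFFu);
    p.high8   = static_cast<std::uint8_t>((address >> 24) & 0xFFu);
    p.upper32 = static_cast<std::uint32_t>(address >> 32);
    return p;
}

// Division-and-modulo version: dividing by 2^n drops the low n bits,
// and the remainder modulo 2^m keeps the low m bits.
BaseParts split_with_divmod(std::uint64_t address) {
    const std::uint64_t two8  = 256;          // 2^8
    const std::uint64_t two16 = 65536;        // 2^16
    const std::uint64_t two24 = 16777216;     // 2^24
    const std::uint64_t two32 = 4294967296;   // 2^32
    BaseParts p;
    p.low16   = static_cast<std::uint16_t>(address % two16);
    p.mid8    = static_cast<std::uint8_t>((address / two16) % two8);
    p.high8   = static_cast<std::uint8_t>((address / two24) % two8);
    p.upper32 = static_cast<std::uint32_t>(address / two32);
    return p;
}

// Sanity check that both approaches agree for one address.
void check_same_split(std::uint64_t address) {
    const BaseParts a = split_with_shifts(address);
    const BaseParts b = split_with_divmod(address);
    assert(a.low16 == b.low16 && a.mid8 == b.mid8);
    assert(a.high8 == b.high8 && a.upper32 == b.upper32);
}
```

For the address 0x1122334455667788, both functions produce low16 = 0x7788, mid8 = 0x66, high8 = 0x55 and upper32 = 0x11223344.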

Manually changing a group of bytes in an unsigned int

I'm working with C and I'm trying to figure out how to change a set of bits in a 32-bit unsigned integer.
For example, if I have
int a = 17212403u;
In binary, that becomes 1000001101010001111110011. Now, supposing I labeled these bits, which are arranged in little-endian format, such that the bit utmost right represents the ones, the second to the right is the twos, and so on, how can I manually change a group of bits?
For example, suppose I wanted to change the bits such that the 11th bit to the 15th bit has the decimal value of 17. How would this be possible?
I was thinking of getting that range by doing as such:
unsigned int range = (a << (sizeof(a) * 8) - 14) >> (28)
But I'm not sure where to go from here.
You will (1) first have to clear bits 11..15, and (2) then set those bits according to the value you want. To achieve (1), create a "mask" that has all bits set to 1 except the ones that you want to clear; then use a & bitMask to set those bits to 0. Then use | myValue to set the bits to the desired value.
Use the bit shift operator << to place the mask and the value at the right positions:
int main(int argc, char** argv) {
// Let's assume a range of 5 bits
unsigned int bitRange = 0x1Fu; // is ...00000000011111
// Let's assume to position the range from bit 11 onwards (i.e. move 10 left):
bitRange = bitRange << 10; // something like 000000111110000000000
unsigned int bitMask = ~bitRange; // something like 111111000001111111111
unsigned int valueToSet = (17u << 10); // corresponds to 000000100010000000000
unsigned int a = (17212403u & bitMask) | valueToSet;
return 0;
}
This is the long version to explain what's going on. In brief, you could also write:
unsigned int a = (17212403u & ~(0x1Fu << 10)) | (17u << 10);
Bits 11 through 15 span 5 bits, assuming you meant to include the 15th bit. A run of 5 set bits is the hex value 0x1f.
Then you shift these 5 bits 11 positions to the left (this counts bit positions from 0): 0x1f << 11
Now we have a mask for bits 11 through 15, the bits we want to clear in the original variable. We clear them by inverting the mask and bitwise-ANDing the variable with the inverted mask: a & ~(0x1f << 11)
Next, shift the value 17 up to the 11th bit: 17 << 11
Then we bitwise-OR that into the 5 bits we have cleared:
unsigned int b = (a & ~(0x1f << 11)) | (17 << 11);
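The clear-then-set steps generalize to any field position and width. Here is a minimal sketch in C++, with hypothetical helper names set_field and get_field; it assumes a 32-bit value and counts bit positions from 0 at the least significant bit.

```cpp
#include <cassert>
#include <cstdint>

// Returns `value` with the `width`-bit field that starts at bit `pos`
// (0 = least significant bit) replaced by `field`.
std::uint32_t set_field(std::uint32_t value, unsigned pos, unsigned width,
                        std::uint32_t field) {
    assert(width >= 1 && width < 32 && pos + width <= 32);
    const std::uint32_t low_mask = (std::uint32_t{1} << width) - 1;  // `width` ones
    assert(field <= low_mask);                  // the new value must fit in the field
    const std::uint32_t mask = low_mask << pos; // ones exactly over the field
    return (value & ~mask) | (field << pos);    // clear the field, then set it
}

// Reads the `width`-bit field that starts at bit `pos`.
std::uint32_t get_field(std::uint32_t value, unsigned pos, unsigned width) {
    assert(width >= 1 && width < 32 && pos + width <= 32);
    return (value >> pos) & ((std::uint32_t{1} << width) - 1);
}
```

With the number from the question, set_field(17212403u, 11, 5, 17) gives 0x01068BF3 when positions are counted from 0. If "the 11th bit" is meant 1-based, the field starts at pos = 10 instead and the result is 0x0106C7F3.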
Consider using bit fields. This allows you to name and access sub-sections of the integer as though they were integer members of a struct.
For info on C bitfields see:
https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm
Below is code to do what you want, using bitfields. The "middle5" member of the struct holds bits 11-15. The "lower11" member is a filler for the lower 11 bits, so that the "middle5" member will be in the right place.
#include <stdio.h>
void showBits(unsigned int w)
{
unsigned int bit = 1u << 31; // 1u avoids overflowing a signed int
while (bit > 0)
{
printf("%d", ((bit & w) != 0)? 1 : 0);
bit >>= 1;
}
printf("\n");
}
int main(int argc, char* argv[])
{
struct aBitfield {
unsigned int lower11: 11;
unsigned int middle5: 5;
unsigned int upper16: 16;
};
union uintBits {
unsigned int whole;
struct aBitfield parts;
};
union uintBits b;
b.whole = 17212403u;
printf("Before:\n");
showBits(b.whole);
b.parts.middle5 = 17;
printf("After:\n");
showBits(b.whole);
}
Output of the program:
Before:
00000001000001101010001111110011
After:
00000001000001101000101111110011
Of course, you would want to use more meaningful naming for the various fields.
Be careful though, bitfields may be implemented differently on different platforms - so it may not be completely portable.

Get Integer From Bits Inside `std::vector<char>`

I have a vector<char> and I want to be able to get an unsigned integer from a range of bits within the vector. E.g.
And I can't seem to be able to write the correct operations to get the desired output. My intended algorithm goes like this:
& the first byte with (0xff >> unused bits in byte on the left)
<< the result left the number of output bytes * number of bits in a byte
| this with the final output
For each subsequent byte:
<< left by the (byte width - index) * bits per byte
| this byte with the final output
| the final byte (not shifted) with the final output
>> the final output by the number of unused bits in the byte on the right
And here is my attempt at coding it, which does not give the correct result:
#include <vector>
#include <iostream>
#include <cstdint>
#include <bitset>
template<class byte_type = char>
class BitValues {
private:
std::vector<byte_type> bytes;
public:
static const auto bits_per_byte = 8;
BitValues(std::vector<byte_type> bytes) : bytes(bytes) {
}
template<class return_type>
return_type get_bits(int start, int end) {
auto byte_start = (start - (start % bits_per_byte)) / bits_per_byte;
auto byte_end = (end - (end % bits_per_byte)) / bits_per_byte;
auto byte_width = byte_end - byte_start;
return_type value = 0;
unsigned char first = bytes[byte_start];
first &= (0xff >> start % 8);
return_type first_wide = first;
first_wide <<= byte_width;
value |= first_wide;
for(auto byte_i = byte_start + 1; byte_i <= byte_end; byte_i++) {
auto byte_offset = (byte_width - byte_i) * bits_per_byte;
unsigned char next_thin = bytes[byte_i];
return_type next_byte = next_thin;
next_byte <<= byte_offset;
value |= next_byte;
}
value >>= (((byte_end + 1) * bits_per_byte) - end) % bits_per_byte;
return value;
}
};
int main() {
BitValues<char> bits(std::vector<char>({'\x78', '\xDA', '\x05', '\x5F', '\x8A', '\xF1', '\x0F', '\xA0'}));
std::cout << bits.get_bits<unsigned>(15, 29) << "\n";
return 0;
}
(In action: http://coliru.stacked-crooked.com/a/261d32875fcf2dc0)
I just can't seem to wrap my head around these bit manipulations, and I find debugging very difficult! If anyone can correct the above code, or help me in any way, it would be much appreciated!
Edit:
My bytes are 8 bits long
The integer to return could be 8, 16, 32 or 64 bits wide
The integer is stored in big endian
You made two primary mistakes. The first is here:
first_wide <<= byte_width;
You should be shifting by a bit count, not a byte count. Corrected code is:
first_wide <<= byte_width * bits_per_byte;
The second mistake is here:
auto byte_offset = (byte_width - byte_i) * bits_per_byte;
It should be
auto byte_offset = (byte_end - byte_i) * bits_per_byte;
The value in parentheses needs to be the number of bytes to shift left by, which is also the number of bytes byte_i is away from the end. The value byte_width - byte_i has no semantic meaning (one is a delta, the other is an index).
The rest of the code is fine. However, this algorithm has two issues.
First, when using your result type to accumulate bits, you assume you have room on the left to spare. This isn't the case if there are set bits near the left boundary and the choice of range causes the bits to be shifted out. For example, try running
bits.get_bits<uint16_t>(11, 27);
You'll get the result 42, which corresponds to the bit string 00000000 00101010. The correct result is 53290, with the bit string 11010000 00101010. Notice how the leftmost 4 bits got zeroed out. This is because you start off by overshifting your value variable, causing those four bits to be shifted out of the variable. When shifting back at the end, this results in the bits being zeroed out.
The second problem has to do with the right shift at the end. If the leftmost bit of the value variable happens to be a 1 before the right shift at the end, and the template parameter is a signed type, then the right shift that is done is an 'arithmetic' right shift, which causes bits on the left to be 1-filled, leaving you with an incorrect negative value.
Example, try running:
bits.get_bits<int16_t>(5, 21);
The expected result should be 6976 with the bit string 00011011 01000000, but the current implementation returns -1216 with the bit string 11111011 01000000.
I've put my implementation of this below which builds the bit string from the right to the left, placing bits in their correct positions to start with so that the above two problems are avoided:
template<class ReturnType>
ReturnType get_bits(int start, int end) {
int max_bits = kBitsPerByte * sizeof(ReturnType);
if (end - start > max_bits) {
start = end - max_bits;
}
int inclusive_end = end - 1;
int byte_start = start / kBitsPerByte;
int byte_end = inclusive_end / kBitsPerByte;
// Put in the partial-byte on the right
uint8_t first = bytes_[byte_end];
int bit_offset = (inclusive_end % kBitsPerByte);
first >>= 7 - bit_offset;
bit_offset += 1;
ReturnType ret = 0 | first;
// Add the rest of the bytes
for (int i = byte_end - 1; i >= byte_start; i--) {
ReturnType tmp = (uint8_t) bytes_[i];
tmp <<= bit_offset;
ret |= tmp;
bit_offset += kBitsPerByte;
}
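// Mask out the partial byte on the left
int shift_amt = (end - start);
if (shift_amt < max_bits) {
// shift a ReturnType-wide 1 so that the mask also works for 64-bit results
ReturnType mask = (ReturnType(1) << shift_amt) - 1;
ret &= mask;
}
return ret;
}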
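When debugging an implementation like the one above, it can help to compare it against a slow but obvious reference that reads one bit at a time. The following is only a sketch under stated assumptions: the function name get_bits_reference is hypothetical, the bytes are passed as unsigned char rather than char, bit 0 is taken to be the most significant bit of the first byte, and the range is half-open, [start, end).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads bits [start, end) from `bytes` into an unsigned integer.
// Bit 0 is the most significant bit of bytes[0] (big-endian bit order).
std::uint64_t get_bits_reference(const std::vector<unsigned char>& bytes,
                                 std::size_t start, std::size_t end) {
    assert(start <= end && end - start <= 64 && end <= bytes.size() * 8);
    std::uint64_t result = 0;
    for (std::size_t i = start; i < end; ++i) {
        // i / 8 selects the byte, 7 - i % 8 selects the bit inside it.
        const unsigned bit = (bytes[i / 8] >> (7 - i % 8)) & 1u;
        result = (result << 1) | bit;  // append the bit on the right
    }
    return result;
}
```

For the bytes from the question (0x78, 0xDA, 0x05, 0x5F, ...), the range (11, 27) yields 53290 and the range (5, 21) yields 6976, matching the expected values quoted above.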
There is one thing you certainly missed, I think: the way you index the bits in the vector is different from what you were given in the problem. I.e. with the algorithm you outlined, the order of the bits will be 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 | 23 22 21 .... Frankly, I didn't read through your whole algorithm, but this was missed in the very first step.
Interesting problem. I've done similar, for some systems work.
Your char is 8 bits wide? Or 16? How big is your integer? 32 or 64?
Ignore the vector complexity for a minute.
Think about it as just an array of bits.
How many bits do you have? You have 8*number of chars
You need to calculate a starting char, number of bits to extract, ending char, number of bits there, and number of chars in the middle.
You will need bitwise-and & for the first partial char
you will need bitwise-and & for the last partial char
you will need left-shift << (or right-shift >>), depending upon which order you start from
what is the endian-ness of your Integer?
At some point you will calculate an index into your array that is bitindex/char_bit_width, you gave the value 171 as your bitindex, and 8 as your char_bit_width, so you will end up with these useful values calculated:
171/8 = 21 //location of first byte
171%8 = 3 //bits in first char/byte
8 - 171%8 = 5 //bits in last char/byte
sizeof(integer) = 4
sizeof(integer) + ( (171%8)>0?1:0 ) // how many array positions to examine
Some assembly required...

convert 4 bytes to 3 bytes in C++

I have a requirement, where 3 bytes (24 bits) need to be populated in a binary protocol. The original value is stored in an int (32 bits). One way to achieve this would be as follows:-
Technique1:-
long x = 24;
long y = htonl(x);
long z = y>>8;
memcpy(dest, &z, 3);
Please let me know if the above is the correct way to do it.
The other way, which I don't understand, was implemented as below:
Technique2:-
typedef struct {
char data1;
char data2[3];
} some_data;
typedef union {
long original_data;
some_data data;
} combined_data;
long x = 24;
combined_data somedata;
somedata.original_data = htonl(x);
memcpy(dest, somedata.data.data2, 3);
What I don't understand is how the 3 bytes ended up in somedata.data.data2. Shouldn't the first byte go into somedata.data.data1 and the next 2 bytes into somedata.data.data2?
This is x86_64 platform running 2.6.x linux and gcc.
PARTIALLY SOLVED:-
x86_64 is little-endian: the least significant byte sits at the lowest address (drawn on the right below). So a variable of type long with value 24 will have the following memory representation
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
     0         0         0       0x18
With htonl() performed on the above long, the memory becomes
|--Byte4--|--Byte3--|--Byte2--|--Byte1--|
   0x18        0         0         0
In the struct some_data,
data1 = Byte1
data2[0] = Byte2
data2[1] = Byte3
data2[2] = Byte4
But my question still holds: why not simply right shift by 8, as shown in technique 1?
A byte takes 8 bits :-)
int x = 24;
int y = x<<8;
Shifting by 0 changes nothing. Shifting left by 1 multiplies by 2, by 2 multiplies by 4, and by 8 multiplies by 256.
If we are on a BIG ENDIAN machine, the 4 bytes are put in memory like so: 2143. And such algorithms won't work for numbers greater than 2^15. On the other hand, on a BIG ENDIAN machine you should define what "putting an integer in 3 bytes" means.
Hmm. I think the second proposed algorithm will be OK, but change the order of the bytes:
You have them as 2143. You need 321, I think. But better check it.
Edit: I checked on the wiki: x86 is little-endian, they say, so the algorithms are OK.
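One way to sidestep the byte-order question entirely is to extract the three bytes with shifts and masks, which operate on the value rather than on its memory layout. A minimal sketch in C++ follows; the function names put_u24_be and get_u24_be are hypothetical, and it assumes the protocol wants the most significant byte first (network order), as the use of htonl() in the question suggests.

```cpp
#include <cassert>
#include <cstdint>

// Writes the low 24 bits of `value` to dest[0..2], most significant byte first.
// Works the same on little-endian and big-endian hosts.
void put_u24_be(std::uint32_t value, unsigned char* dest) {
    assert(value <= 0xFFFFFFu);  // must fit in 24 bits
    dest[0] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    dest[1] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    dest[2] = static_cast<unsigned char>(value & 0xFFu);
}

// Reads a 24-bit big-endian value back from src[0..2].
std::uint32_t get_u24_be(const unsigned char* src) {
    return (static_cast<std::uint32_t>(src[0]) << 16) |
           (static_cast<std::uint32_t>(src[1]) << 8) |
            static_cast<std::uint32_t>(src[2]);
}
```

For the value 24 this writes the bytes 0x00, 0x00, 0x18, the same three bytes that both techniques from the question produce on x86_64, but without relying on htonl() or on how a union overlays its members.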