I have a string (or C string) consisting of 24 bits, "000101011110011110101110", that should be represented in hex (0x15e7ae).
As I understand it, the bit string needs to be split into 6 parts of 4 bits each
"0001 0101 1110 0111 1010 1110" and then each part converted to hex
0001 -> 1
0101 -> 5
1110 -> e
0111 -> 7
1010 -> a
1110 -> e
So what are the simplest and most cost-effective ways to convert it to the hex representation 0x15e7ae?
There is also a dilemma for me over which string type is better to use, String or char[]. A String can easily be split using the substring function, but I don't know how to convert that string type to hex.
Conversely, a char[] can easily be converted to hex using the strtoul function, but I haven't found a simple way to split a char string.
Let's try some simple bit shifting.
std::string sample_str = "000101011110011110101110";
uint32_t result = 0;
for (unsigned int i = 0; i < sample_str.length(); ++i)
{
    result = result << 1;                  // make room for the next bit
    result = result | (sample_str[i] & 1); // append the low bit of '0' or '1'
}
There may be faster methods, but you would have to search the web for "bit twiddling string".
Background
This is based on the assumption that the character representation of zero has the least significant bit set to zero. Likewise the character representation of one has the least significant bit set to one.
The algorithm shifts the result left by one to make room for a new bit value.
Taking the character value and ANDing it with 1 yields zero for '0' and one for '1'. That bit is then ORed into the result to produce the correct value.
Try single stepping with a debugger to see how it works.
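Since the question asks for a hex representation, here is a minimal, self-contained sketch of the same loop that also prints the value with std::hex (the range-for is only a cosmetic variation):

#include <cstdint>
#include <iostream>
#include <string>

int main() {
    std::string sample_str = "000101011110011110101110";
    uint32_t result = 0;
    for (char c : sample_str) {
        result = (result << 1) | (c & 1);            // same bit twiddling as above
    }
    std::cout << "0x" << std::hex << result << '\n'; // prints 0x15e7ae
}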
This is quite literally found in this link: StringConstructors
// using an int and a base (hexadecimal):
stringOne = String(45, HEX);
// prints "2d", which is the hexadecimal version of decimal 45:
Serial.println(stringOne);
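If this is in an Arduino context (that String/Serial API), a hedged sketch combining strtoul with the String(value, HEX) constructor might look like this; the setup()/loop() skeleton and the 9600 baud rate are just assumptions:

void setup() {
    Serial.begin(9600);                               // hypothetical baud rate
    const char* binary = "000101011110011110101110";
    unsigned long value = strtoul(binary, NULL, 2);   // parse the bit string as base 2
    String hexStr = String(value, HEX);               // "15e7ae" (no "0x" prefix)
    Serial.println(hexStr);
}

void loop() {}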
const char* binary = "000101011110011110101110";
char hex[9] = "";      // "0x" + 6 hex digits + terminating '\0'
uint32_t integer = 0;

for (int i = 0; binary[i] != '\0'; i++)
{
    integer <<= 1;     // make room for the next bit
    if (binary[i] == '1')
    {
        integer |= 1;  // set the new least significant bit
    }
}

sprintf(hex, "0x%06x", integer);
In C, this is quite simple. Use strtoumax(binary, 0, 2) to convert your binary string to a uintmax_t and then convert that to a hex string with sprintf or fprintf.
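A minimal sketch of that strtoumax approach, written as C++ here so it matches the other snippets; it assumes the whole string is valid binary and fits in uintmax_t:

#include <cinttypes>   // std::strtoumax, std::uintmax_t
#include <iostream>

int main() {
    const char* binary = "000101011110011110101110";
    std::uintmax_t value = std::strtoumax(binary, nullptr, 2);  // parse as base 2
    std::cout << "0x" << std::hex << value << '\n';             // prints 0x15e7ae
}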
I (think I) understand how the maths with different variable types works. For example, if I go over the max limit of an unsigned int variable, it will loop back to 0.
I don't understand the behavior of this code with unsigned char:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << var << '\n';
    }
}
This just outputs 1...9, then some symbols and capital letters, and then it just doesn't print anything. It doesn't loop back to the values 1...9 etc.
On the other hand, if I cast to int before printing:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << (int)var << '\n';
    }
}
It does print from 1...255 and then loops back from 0...255.
Why is that? It seems that the unsigned char variable does loop (as we can see from the int cast).
Is it safe to do maths with unsigned char variables? What is the behavior that I see here?
Why doesn't it print the expected integer value?
The issue is not with the wrap-around of the char. The issue is with the insertion operator for std::ostream objects and 8-bit integer types: the non-member operator<< overloads treat all 8-bit integer types (char, signed char, and unsigned char) as characters, not as numbers.
operator<<(std::basic_ostream)
The canonical way to handle outputting 8-bit integer types is the way you're doing it. I personally prefer this instead:
char foo;
std::cout << +foo;
The unary + operator promotes the char type to an integer type, which then causes the integer printing function to be called.
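As a minimal sketch, here is the loop from the question with the unary + applied, which makes the numeric values (1...255, then wrapping back to 0) appear:

#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << +var << '\n';  // unary + promotes to int, so numbers are printed
    }
}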
Note that integer overflow is only defined for unsigned integer types. If you repeat this with char or signed char, the behavior is undefined by the standard. SOMETHING will happen, for sure, because we live in reality, but that overflow behavior may differ from compiler to compiler.
Why doesn't it repeat the 0..9 characters
I tested this using g++ to compile, and bash on Ubuntu 20.04. My non-printable characters are handled as explicit symbols in some cases, or nothing printed in other cases. The non-repeating behavior must be due to how your shell handles these non-printable characters. We can't answer that without more information.
Unsigned chars aren't treated as numbers in this case. This data type is literally a byte:
1 byte = 8 bits = 0000 0000, which means 0.
What cout prints is the character that the byte represents, and you keep changing that byte by adding 1 to it.
For example:
0 = 0000 0000
1 = 0000 0001
2 = 0000 0010
.
.
.
9 = 0000 1001
Then come other characters that aren't related to the digits.
So, if you cast it to int, it will give you the numeric value of that byte, giving you output in the 0-255 range.
Hope this clarifies!
Edit: Made the explanation more clear.
Here is my code that takes a 4-character string of 1's and 0's and converts it to decimal using the bitset function. It returns correct values for all combinations except those involving 11's and 10's, like {1110, 1010, 1011, 1111}. For those numbers it returns the result with the MSB ignored. That is, for 1010 it gives 2 as the answer.
#include <bits/stdc++.h>
using namespace std;
#define ul unsigned long

int main(int argc, char const *argv[])
{
    int bin1 = 0, bin2 = 0, choice = 0;
    ul x1 = 0, x2 = 0;
    //string binary;
    cin >> bin1;
    x1 = bitset<4>(bin1).to_ulong();
    cin >> bin2;
    x2 = bitset<4>(bin2).to_ulong();
    cout << x1 << " " << x2 << endl;
    return 0;
}
EDIT: here is a snapshot of my results.
Another snapshot of the same program reading another set of input, but this time it gives the correct output. Btw, 1101 and 1001 are the inputs and the next two lines are the output.
cout << bitset<4>(1010).to_ulong() << endl;
Prints 2.
cout << bitset<4>(0b1010).to_ulong() << endl;
Prints 10. (Note: binary literals were introduced in C++14)
From the cppreference documentation of std::bitset:
bitset( unsigned long val );
Constructs a bitset, initializing the first (rightmost, least significant) M bit positions to the corresponding bit values of val, where M is the smaller of the number of bits in an unsigned long long and the number of bits N in the bitset being constructed. [...]
1010's bit representation is not 0b1010, it's 0b1111110010 - which is too large for the bitset. That's why you're seeing unexpected results.
When you input e.g. 1010 then that is the decimal value 1010, which in binary is 1111110010.
This is the value you initialize the bitset with. Since the bitset only contains four bits, the four lowest bits, which are 0010, will be used.
A simple solution is to read the input as strings.
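A minimal sketch of that string-based fix, assuming the inputs are typed as 4-character bit strings such as 1010 and 1100:

#include <bitset>
#include <iostream>
#include <string>

int main() {
    std::string bin1, bin2;
    std::cin >> bin1 >> bin2;                            // e.g. "1010" and "1100"
    unsigned long x1 = std::bitset<4>(bin1).to_ulong();  // characters parsed as bits
    unsigned long x2 = std::bitset<4>(bin2).to_ulong();
    std::cout << x1 << " " << x2 << std::endl;           // prints "10 12"
}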
Bitset changes its input into a binary representation, so 0's and 1's.
Binary representation of:
1010 -> 1111110010
1100 -> 10001001100
You're taking only 4 bits out of 32 (int usually has 4 bytes, 4*8 = 32 bits), so
1010 -> 0010 -> decimal 2
1100 -> 1100 -> decimal 12
You need the string overload of the bitset's constructor.
template<class CharT, class Traits, class Alloc>
explicit bitset( const std::basic_string<CharT,Traits,Alloc>& str,
typename std::basic_string<CharT,Traits,Alloc>::size_type pos = 0,
typename std::basic_string<CharT,Traits,Alloc>::size_type n =
std::basic_string<CharT,Traits,Alloc>::npos,
CharT zero = CharT('0'),
CharT one = CharT('1'));
From your use case, it looks like changing the type of bin1 and bin2 to std::string may just work.
I don't know what you expected, but the code expresses itself clearly enough. I extracted only the minimum required for the discussion:
int bin1=0;
cin>>bin1;
bin1 is int. You read an int from cin and that int is 1010. Which is one thousand and ten. No bits involved here.
In binary, 1010 (one thousand and ten) looks like this: 00000011 11110010
x1=bitset<4>(bin1).to_ulong();
When the bitset<4> is constructed using the value of bin1 only the 4 rightmost (least-significant) bits of bin1 are used. These bits are 0010 and they represent the number 2. The value of x1 is 2.
In a similar way, the value of bin2 read from cin is 1110 (one thousand, one hundred and ten), its binary representation is 00000100 01010110, its rightmost 4 bits are 0110 and they are the binary representation of the integer 6.
The code does what it's supposed to do; your expectations are incorrect.
Read about bitset::bitset(). It contains examples that should help you understand the difference.
So I'm using the following code to put an integer into a char[] or an unsigned char[]
(unsigned???) char test[12];
test[0] = (i >> 24) & 0xFF;
test[1] = (i >> 16) & 0xFF;
test[2] = (i >> 8) & 0xFF;
test[3] = (i >> 0) & 0xFF;
int j = test[3] + (test[2] << 8) + (test[1] << 16) + (test[0] << 24);
printf("Its value is...... %d", j);
When I use type unsigned char and value 1000000000 it prints correctly.
When I use type char (same value) I get 983157248 printed?
So, the question really is can anyone explain what the hell is going on??
Upon examining the binary for the two different numbers I still can't work out what's going on. I thought signed meant the MSB was set to 1 to indicate a negative value (but a negative char? wth?)
I'm explicitly telling the buffer what to insert into it, and how to interpret the contents, so I don't see why this could be happening.
I have included binary/hex below for clarity in what I examined.
11 1010 1001 1001 1100 1010 0000 0000 // Binary for 983157248
11 1011 1001 1010 1100 1010 0000 0000 // Binary for 1000000000
3 A 9 9 C A 0 0 // Hex for 983157248
3 B 9 A C A 0 0 // Hex for 1000000000
In addition to the answer by Kerrek SB please consider the following:
Computers (almost always) use something called two's complement notation for negative numbers, with the high bit functioning as a 'negative' indicator. Ask yourself what happens when you perform shifts on a signed type, considering that the computer will handle the sign bit specially.
You may want to read Why does left shift operation invoke Undefined Behaviour when the left side operand has negative value? right here on StackOverflow for a hint.
When you say i & 0xFF etc., you're creating values in the range [0, 256). But (your) char has a range of [-128, +128), and so you cannot actually store those values sensibly (i.e. the behaviour is implementation defined and tedious to reason about).
Use unsigned char for unsigned values. The clue is in the name.
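For illustration, a minimal sketch of that advice applied to the code from the question; the casts during reassembly are only there to make the widening explicit:

#include <cstdio>

int main() {
    unsigned int i = 1000000000u;
    unsigned char test[4];                 // unsigned: each byte holds 0..255 cleanly
    test[0] = (i >> 24) & 0xFF;
    test[1] = (i >> 16) & 0xFF;
    test[2] = (i >> 8) & 0xFF;
    test[3] = (i >> 0) & 0xFF;

    // Reassemble without any sign extension creeping in.
    unsigned int j = (unsigned int)test[3] | ((unsigned int)test[2] << 8) |
                     ((unsigned int)test[1] << 16) | ((unsigned int)test[0] << 24);
    std::printf("Its value is...... %u\n", j);  // prints 1000000000
    return 0;
}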
This all has to do with internal representation and the way each type interprets that data. In the internal representation of a signed character, the first (most significant) bit of the byte holds the sign and the others the value. When the first bit is 1, the number is negative, and the remaining bits represent the complement of the positive value. For example:
unsigned char c; // whose internal representation we will set to 1100 1011
c = (1 << 7) + (1 << 6) + (1 << 3) + (1 << 1) + (1 << 0);
cout << +c; // will give 203
// inversely:
char d = c; // not unsigned
cout << +d; // will print -53 (on the usual signed-char, two's complement platforms)
// because the first bit is 1, d is negative,
// and its magnitude is the two's complement of the pattern:
// 1100 1011 -> invert -> 0011 0100 -> add 1 -> 0011 0101 = 53, so d is -53
// furthermore:
char e; // whose internal representation we will set to 0011 0101
e = (1 << 5) + (1 << 4) + (1 << 2) + (1 << 0);
cout << +e; // will print 53
How can I create a file that uses 4-bit encoding to represent the integers 0-9, separated by a comma ('1111')? For example:
2,34,99 = 0010 1111 0011 0100 1111 1001 1001, which without spaces becomes
0010111100110100111110011001 = binary.txt
Therefore 0010111100110100111110011001 is what I see when I view the file ('binary.txt') in WinHex in binary view, but I would see 2,34,99 when I view the file (binary.txt) in Notepad.
If not Notepad, is there another decoder that will do '4-bit encoding', or do I have to write a 'decoder program' to view the integers?
How can I do this in C++?
The basic idea of your format (4 bits per decimal digit) is well known and called BCD (Binary Coded Decimal). But I doubt that using 0xF as an encoding for a comma is well established, let alone supported by Notepad.
Writing a program in C++ to do the encoding and decoding would be quite easy. The only difficulty is that standard IO uses the byte, not the bit, as its basic unit, so you'd have to group the bits into bytes yourself.
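For the decoding direction, a minimal sketch under the stated assumptions (two 4-bit codes per byte, high nibble first, 0xF meaning a comma, and the binary.txt file name from the question):

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::ifstream in("binary.txt", std::ios::binary);    // file name taken from the question
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    for (unsigned char byte : bytes) {
        for (int nibble : { byte >> 4, byte & 0x0F }) {  // high nibble first, then low
            if (nibble == 0xF)
                std::cout << ',';                        // 0xF is the comma code
            else if (nibble <= 9)
                std::cout << nibble;                     // 0-9 are plain digits
            // anything else (e.g. 0xFF fill) is simply skipped
        }
    }
    std::cout << '\n';
}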
You can decode the files using od -tx1 if you have that (digits will show up as digits, commas will show up as f). You can also use xxd to go both directions; it comes with Vim. Use xxd -r -p to copy hex characters from stdin to a binary file on stdout, and xxd -p to go the other way. You can use sed or tr to change f back and forth to ,.
This is the simplest C++ 4-bit (BCD) encoding algorithm I could come up with - I wouldn't call it exactly easy, but it's no rocket science either. It extracts one digit at a time by dividing, and then adds the digits to the string:
#include <iostream>

int main() {
    const unsigned int ints = 3;
    unsigned int a[ints] = {2, 34, 99}; // these are the original ints
    unsigned int bytes_per_int = 6;
    char * result = new char[bytes_per_int * ints + 1];
    // enough space for 11 digits per int plus comma, 8-bit chars
    for (int j = 0; j < bytes_per_int * ints; ++j)
    {
        result[j] = 0xFF; // fill with FF
    }
    result[bytes_per_int * ints] = 0; // null terminated string
    unsigned int rpos = bytes_per_int * ints * 2; // result position, start from the end of result
    int i = ints; // start from the end of the array too.
    while (i != 0) {
        --i;
        unsigned int b = a[i];
        while (b != 0) {
            --rpos;
            unsigned int digit = b % 10; // take the lowest decimal digit of b
            if (rpos & 1) {
                // odd rpos means we set the lowest bits of a char
                result[(rpos >> 1)] = digit;
            }
            else {
                // even rpos means we set the highest bits of a char
                result[(rpos >> 1)] |= (digit << 4);
            }
            b /= 10; // make the next digit the new lowest digit
        }
        if (i != 0 || (rpos & 1))
        {
            // add the comma
            --rpos;
            if (rpos & 1) {
                result[(rpos >> 1)] = 0x0F;
            }
            else {
                result[(rpos >> 1)] |= 0xF0;
            }
        }
    }
    std::cout << result;
}
Trimming the bogus data left at the start portion of the result according to rpos will be left as an exercise for the reader.
The subproblem of BCD conversion has also been discussed before: Unsigned Integer to BCD conversion?
If you want a more efficient algorithm, here's a bunch of lecture slides with conversion from 8-bit ints to BCD: http://edda.csie.dyu.edu.tw/course/fpga/Binary2BCD.pdf
I am aware of the 2's complement representation of signed values, but how does the binary '10000000' become -128 in decimal (using %d)?
For +64 the binary rep is '01000000'; for -64 the binary rep is '11000000', which is the 2's complement of '01000000'.
Can someone please explain?
Program:
#include <stdio.h>

int main()
{
    char ch = 1;
    int count = 0;
    while (count != 8)
    {
        printf("Before shift val of ch = %d,count=%d\n", ch, count);
        ch = ch << 1;
        printf("After shift val of ch = %d,count=%d\n", ch, count);
        //printBinPattern(ch);
        printf("*************************************\n");
        count++;
    }
    return 0;
}
Output:
Before shift val of ch = 1, count=0
After shift val of ch = 2, count=0
*************************************
...
... /* Output not shown */
Before shift val of ch = 32, count=5
After shift val of ch = 64, count=5
*************************************
Before shift val of ch = 64, count=6
After shift val of ch = -128, count=6
*************************************
Before shift val of **ch = -128**, count=7
After shift val of ch = 0, count=7
*************************************
Before shift val of ch = 0, count=8
After shift val of ch = 0, count=8
*************************************
Because on your compiler, char means signed char.
Char is just a tiny integer, generally in the range of 0...255 (for unsigned char) or -128...127 (for signed char).
The means of converting a number to its 2's complement negative is to "invert the bits and add 1".
128 = "1000 0000". Inverting the bits gives "0111 1111". Adding 1 yields "1000 0000" again, so that bit pattern is what -128 looks like in 8 bits.
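A small sketch of that "invert the bits and add 1" rule, done in unsigned arithmetic and printed via std::bitset so the resulting patterns are visible:

#include <bitset>
#include <iostream>

int main() {
    unsigned char x = 64;
    unsigned char neg = static_cast<unsigned char>(~x + 1);  // invert the bits and add 1
    std::cout << std::bitset<8>(x)   << '\n';   // 01000000  (+64)
    std::cout << std::bitset<8>(neg) << '\n';   // 11000000  (the pattern of -64)
    std::cout << std::bitset<8>(static_cast<unsigned char>(~128u + 1)) << '\n';  // 10000000 (-128)
}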
I am aware of the 2s complement representation of signed values.
Well, obviously you aren't. A 1 followed by all 0s is always the smallest negative number.
The answer is implementation-defined, because whether a plain char can hold negative values is implementation-defined.
$3.9.1/1
Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types.
$5.8/1 -
"The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."
So when the value of char becomes negative, left shift from thereon has undefined behavior.
That's how it works.
-1 = 1111 1111
-2 = 1111 1110
-3 = 1111 1101
-4 = 1111 1100
...
-126 = 1000 0010
-127 = 1000 0001
-128 = 1000 0000
Two's complement is exactly like unsigned binary representation with one slight change:
The MSB (bit n-1) is redefined to have a value of -2^(n-1) instead of 2^(n-1).
That's why the addition logic is unchanged: because all the other bits still have the same place value.
This also explains the underflow/overflow detection method, which involves checking the carry from bit (n-2) into bit (n-1).
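For example, with n = 8 the pattern 1000 0000 evaluates to just -2^7 = -128, while 1100 1011 evaluates to -128 + 64 + 8 + 2 + 1 = -53, matching the values discussed in the earlier answers.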
There is a pretty simple process for converting a negative two's complement integer value to its positive equivalent.
0000 0001 ; x = 1
1000 0000 ; x <<= 7
The two's complement process takes two steps... first, if the high bit is 1, invert all the bits
0111 1111 ; inverted bits (127)
then add 1
1000 0000 ; magnitude 128, so the pattern represents -128
Supplying a char to a %d format specifier that expects an int is probably unwise.
Whether an unadorned char is signed or unsigned is implementation-defined. In this case not only is it apparently signed, but the char argument has also been pushed onto the stack as an int-sized object and sign-extended, so that the higher-order bits are all set to the same value as the high-order bit of the original char.
I am not sure whether this is defined behaviour or not without looking it up, but personally I'd have cast the char to an int when formatting it with %d. Not least because some compilers and static analysis tools will trap that error and issue a warning. GCC will do so when -Wformat is used for example.
That is the explanation. If you want a solution (i.e. one that prints 128 rather than -128), then you need to cast to unsigned and mask off the sign-extension bits, as well as use a correctly matching format specifier:
printf("%u", (unsigned)ch & 0xff );
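A minimal self-contained sketch of that fix; the ch = ch << 7 line is just a quick way to reproduce the 1000 0000 pattern from the question:

#include <cstdio>

int main() {
    char ch = 1;
    ch = ch << 7;                              // bit pattern 1000 0000; typically -128 when char is signed
    std::printf("%d\n", ch);                   // usually prints -128 after promotion to int
    std::printf("%u\n", (unsigned)ch & 0xff);  // masks off the sign extension and prints 128
    return 0;
}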