Padded bit representation in Java - bit-manipulation

I have an ASCII-encoded string that looks like 3030. Each character in the ASCII string needs to be converted into a 4-bit sequence and concatenated together to form a 16-bit sequence, with 4-bit padding.
For eg: 3030 should be converted into
0011 0000 0011 0000
(Spaces added for readability).
I'm aware that we can cast each character to a byte and do String format operations to get the binary representation as a string. But I want to retain the binary format because I need to do further masking on it.
Is there a way to get this byte output in Java?

byte charToDecimal(char x) {
    if (x >= '0' && x <= '9') {  // was ||, which accepted every character
        return (byte) (x - '0');
    }
    throw new IllegalArgumentException("Not a decimal digit: " + x);
}

byte[] toBcd(String s) {
    int resultLen = (s.length() + 1) / 2;
    byte[] result = new byte[resultLen];
    int i = 0, j = 0;
    if (s.length() % 2 != 0) {
        // Odd length: the first digit goes alone into the low nibble; the high nibble is the padding.
        result[i++] = charToDecimal(s.charAt(j++));
    }
    for (; i < resultLen; i++, j += 2) {
        result[i] = (byte) (charToDecimal(s.charAt(j)) << 4 | charToDecimal(s.charAt(j + 1)));
    }
    return result;
}
Usual caveat: May or may not work, may do something different from what you actually want.

Related

Trouble understanding Caesar decryption steps

The following code decrypts a Caesar-encrypted string given the ciphertext and the key:
#include <iostream>
#include <string>

std::string decrypt(std::string cipher, int key) {
    std::string d = "";
    for (std::string::size_type i = 0; i < cipher.length(); i++) {
        d += ((cipher[i] - 65 - key + 26) % 26) + 65;
    }
    return d;
}

int main()
{
    std::cout << decrypt("WKLVLVJRRG", 3) << std::endl;  // THISISGOOD
    std::cout << decrypt("NBCMCMAIIX", 20) << std::endl; // THISISGOOD
}
I'm having trouble understanding the operations performed to compute the new character's ASCII code on this line:
d += ((cipher[i] - 65 - key + 26) % 26) + 65;
The first subtraction should shift the number range from ASCII codes down to 0..25
Then we subtract the key, as Caesar decryption is defined
We add 26 to deal with negative numbers (?)
The modulo limits the output to the range 0..25, since the alphabet is 26 letters long
We come back to the old range by adding 65 at the end
What am I missing?
If we reorder the expression slightly, like this:
d += (((cipher[i] - 65) + (26 - key)) % 26) + 65;
We get a formula for rotating cipher[i] left by key:
cipher[i] - 65 brings the ASCII range A..Z into the integer range 0..25
(cipher[i] - 65 + 26 - key) % 26 rotates that value left by key (subtracts key modulo 26)
+ 65 shifts the range 0..25 back into the ASCII range A..Z.
e.g. given a key of 2, A becomes Y, B becomes Z, C becomes A, etc.
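To see the rotation concretely, here is a minimal self-contained check of that formula (my own snippet, assuming ASCII):
#include <iostream>

int main() {
    int key = 2;
    for (char c : {'A', 'B', 'C'}) {
        std::cout << c << " -> " << (char)(((c - 65 + 26 - key) % 26) + 65) << '\n';
    }
    // prints A -> Y, B -> Z, C -> A
}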
Let me give you a detailed explanation of the Caesar cipher to help you understand that formula. I will show ultra-simple code examples, but also more advanced one-liners.
The biggest problem is potential overflow past the end of the alphabet. So, we need to deal with that.
Then we need to understand what encryption and decryption mean. If encryption shifts everything one to the right, decryption shifts it back to the left again.
So, with "def" and key=1, the encrypted string will be "efg".
And decryption with key=1 will shift it to the left again. Result: "def"
We can observe that we simply need to shift by -1, i.e. the negative of the key.
So, basically, encryption and decryption can be done with the same routine. We just need to invert the keys.
Let us now look at the overflow problem. For the moment we will work with uppercase characters only. Characters have an associated code. For example, in ASCII, the letter 'A' is encoded as 65, 'B' as 66, and so on. Because we do not want to calculate with such numbers, we normalize them: we simply subtract 'A' from each character. Then
'A' - 'A' = 0
'B' - 'A' = 1
'C' - 'A' = 2
'D' - 'A' = 3
You see the pattern. If we now want to encrypt the letter 'C' with key 3, we can do the following:
'C' - 'A' + 3 = 5. Then we add 'A' again to get back a letter, and we get 5 + 'A' = 'F'
That is the whole magic.
But what do we do with an overflow beyond 'Z'? This can be handled by a simple modulo operation.
Let us look at 'Z' + 1. We do 'Z' - 'A' = 25, then +1 = 26, now modulo 26 = 0, and then plus 'A' gives 'A'.
And so on. The resulting formula is: (c - 'A' + key) % 26 + 'A'
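As a quick self-contained check of that formula, including the wrap-around at 'Z' (my own snippet, assuming ASCII):
#include <iostream>

int main() {
    int key = 3;
    for (char c : {'C', 'Z'}) {
        std::cout << c << " -> " << (char)((c - 'A' + key) % 26 + 'A') << '\n';
    }
    // prints C -> F and Z -> C (wrap-around)
}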
Next, what about negative keys? This is also simple. Assume an 'A' and key=-1.
The result will be a 'Z'. But this is the same as shifting 25 to the right. So, we can simply convert a negative key into a positive shift. The simple statement is:
if (key < 0) key = (26 + (key % 26)) % 26;
And then we can call our transformation function with a simple lambda. One function for encryption and decryption, just with an inverted key.
And with the above formula, there is even no need to check for negative values. It works for positive and negative values alike.
So, key = (26 + (key % 26)) % 26; will always work.
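A quick check that this expression maps any key into 0..25 (my own snippet):
#include <iostream>

int main() {
    for (int key : {-1, -27, 3, 29}) {
        std::cout << key << " -> " << (26 + (key % 26)) % 26 << '\n';
    }
    // prints -1 -> 25, -27 -> 25, 3 -> 3, 29 -> 3
}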
Some extended information if you work with the ASCII character representation. Please have a look at any ASCII table. You will see that any uppercase and lowercase character differ by 32. Or, if you look at the binary:
char dec bin        char dec bin
'A'  65  0100 0001  'a'  97  0110 0001
'B'  66  0100 0010  'b'  98  0110 0010
'C'  67  0100 0011  'c'  99  0110 0011
. . .
So, if you already know that a character is alphabetic, then the only difference between upper- and lowercase is bit number 5. If we want to know whether a char is lowercase, we can get this by masking that bit: c & 0b00100000, which is equal to c & 32 or c & 0x20.
If we want to operate on either uppercase or lowercase characters, then we can mask the "case" away. With c & 0b00011111, or c & 31, or c & 0x1F, we always get the uppercase equivalent, already normalized to start at one:
char dec bin        Masking      char dec bin        Masking
'A'  65  0100 0001  & 0x1F = 1   'a'  97  0110 0001  & 0x1F = 1
'B'  66  0100 0010  & 0x1F = 2   'b'  98  0110 0010  & 0x1F = 2
'C'  67  0100 0011  & 0x1F = 3   'c'  99  0110 0011  & 0x1F = 3
. . .
So, if we take an alpha character, mask it, and subtract 1, then we get 0..25 as a result for any upper- or lowercase character.
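Here is a small demonstration of that masking on one character (my own snippet, assuming ASCII):
#include <iostream>

int main() {
    char c = 'g';
    char caseBit = c & 0x20;             // 0x20 if lowercase, 0 if uppercase
    int alphabetIndex = (c & 0x1F) - 1;  // 0..25 for either case
    std::cout << alphabetIndex << '\n';                            // 6
    std::cout << (char)('A' + alphabetIndex) << '\n';              // G
    std::cout << (char)(('A' + alphabetIndex) | caseBit) << '\n';  // g
}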
Additionally, I would like to repeat the key handling. Positive keys will encrypt a string, negative keys will decrypt a string. But, as said above, negative keys can be transformed into positive ones. Example:
Shifting by -1 is same as shifting by +25
Shifting by -2 is same as shifting by +24
Shifting by -3 is same as shifting by +23
Shifting by -4 is same as shifting by +22
So, it is very obvious that we can calculate an always-positive key by: 26 + key. For negative keys, this gives us the above offsets.
And for positive keys, we would have an overflow beyond 26, which we can eliminate by a modulo-26 division:
'A' --> 0 + 26 = 26, 26 % 26 = 0
'B' --> 1 + 26 = 27, 27 % 26 = 1
'C' --> 2 + 26 = 28, 28 % 26 = 2
'D' --> 3 + 26 = 29, 29 % 26 = 3
--> (c + key) % 26 eliminates overflows and results in the correct new en-/decrypted character.
And if we combine this with the above wisdom for negative keys, we can write ((26 + (key % 26)) % 26), which works for all positive and negative keys.
Combining that with the masking could give us the following program (given a char c known to be alphabetic and an int key):
const char potentialLowerCaseIndicator = c & 0x20;
const char upperOrLower = c & 0x1F;
const char normalized = upperOrLower - 1;
const int  withOffset = normalized + ((26 + (key % 26)) % 26);
const int  withOverflowCompensation = withOffset % 26;
const char newUpperCaseCharacter = (char)withOverflowCompensation + 'A';
const char result = newUpperCaseCharacter | potentialLowerCaseIndicator;
Of course, all of the above statements can be converted into one lambda:
#include <string>
#include <algorithm>
#include <cctype>
#include <iostream>

// Simple function for Caesar encryption/decryption
std::string caesar(const std::string& in, int key) {
    std::string res(in.size(), ' ');
    std::transform(in.begin(), in.end(), res.begin(),
        [&](char c) { return std::isalpha(c) ? (char)((((c & 31) - 1 + ((26 + (key % 26)) % 26)) % 26 + 65) | (c & 32)) : c; });
    return res;
}

int main() {
    std::string test{ "aBcDeF xYzZ" };
    std::cout << caesar(test, 5);
}
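For reference, with key 5 this prints fGhIjK cDeE (hand-checked against the formula above); feeding that result back through caesar with key -5 restores aBcDeF xYzZ.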
The last function can also be made more verbose for easier understanding:
std::string caesar1(const std::string& in, int key) {
    std::string res(in.size(), ' ');
    auto convert = [&](const char c) -> char {
        char result = c;
        if (std::isalpha(c)) {
            // Handling of a negative key (shift to the left). The key will be converted to a positive value
            if (key < 0) {
                // Limit the key to 0, -1, ..., -25
                key = key % 26;
                // The key was negative: now we have something between 0 and 26
                key = 26 + key;
            }
            // Check and remember whether the original character was lowercase
            const bool originalIsLower = std::islower(c);
            // We want to work with uppercase only
            const char upperCaseChar = (char)std::toupper(c);
            // But we want to start at 0, not at 'A' (65)
            const int normalized = upperCaseChar - 'A';
            // Now add the key
            const int shifted = normalized + key;
            // The addition result may be bigger than 25, i.e. overflow. Cap it
            const int capped = shifted % 26;
            // Get back a character
            const char convertedUppercase = (char)capped + 'A';
            // And restore the original case
            result = originalIsLower ? (char)std::tolower(convertedUppercase) : convertedUppercase;
        }
        return result;
    };
    std::transform(in.begin(), in.end(), res.begin(), convert);
    return res;
}
And if you want to see a solution with only the simplest statements, then see below.
#include <iostream>
#include <string>
using namespace std;

string caesar(string in, int key) {
    // Here we will store the resulting encrypted/decrypted string
    string result{};
    // Handling of a negative key (shift to the left). The key will be converted to a positive value
    if (key < 0) {
        // Limit the key to 0, -1, ..., -25
        key = key % 26;
        // The key was negative: now we have something between 0 and 26
        key = 26 + key;
    }
    // Read character by character from the string
    for (unsigned int i = 0; i < in.length(); ++i) {
        char c = in[i];
        // Check for an alpha character
        if ((c >= 'A' and c <= 'Z') or (c >= 'a' and c <= 'z')) {
            // Check and remember whether the original character was lowercase
            bool originalIsLower = (c >= 'a' and c <= 'z');
            // We want to work with uppercase only
            char upperCaseChar = originalIsLower ? c - ('a' - 'A') : c;
            // But we want to start at 0, not at 'A' (65)
            int normalized = upperCaseChar - 'A';
            // Now add the key
            int shifted = normalized + key;
            // The addition result may be bigger than 25, i.e. overflow. Cap it
            int capped = shifted % 26;
            // Get back a character
            char convertedUppercase = (char)capped + 'A';
            // And restore the original case
            result += originalIsLower ? convertedUppercase + ('a' - 'A') : convertedUppercase;
        }
        else
            result += c;
    }
    return result;
}

int main() {
    string test{ "aBcDeF xYzZ" };
    string encrypted = caesar(test, 5);
    string decrypted = caesar(encrypted, -5);
    cout << "Original:  " << test << '\n';
    cout << "Encrypted: " << encrypted << '\n';
    cout << "Decrypted: " << decrypted << '\n';
}

Ascii to Decimal In Memory, Would there be a bitwise shift answer?

My question:
I would like to change the ASCII (hex) in memory to a decimal value by shifting, or any other way you know how.
I would like a function that assigns the memory as follows:
From (input):
Example Memory: 32 35 38 00 (ASCII "258")
Example Pointer: +var 0x0057b730 "258" const char *
To (output) (the ANSWER I am looking for):
Example Memory: 02 01 00 00
Example Pointer: &var 0x0040f9c0 {258} int *
Example Int: var 258 int
This function will NOT produce my answer above:
it treats the digits as hex nibbles, producing the decimal value 600 (i.e. hex 0x258).
int Utility::AsciiToHex_4B(void* buff)
{
    int result = 0, i = 0;
    char cWork = 0;
    if (buff != NULL)
    {
        for (i = 0; i <= 3; i++)
        {
            cWork = *((BYTE*)buff + i);
            if (cWork != '\0')
            {
                if (cWork >= '0' && cWork <= '9')
                    cWork -= '0';
                else if (cWork >= 'a' && cWork <= 'f')
                    cWork = cWork - 'a' + 10;
                else if (cWork >= 'A' && cWork <= 'F')
                    cWork = cWork - 'A' + 10;
                result = (result << 4) | (cWork & 0xF);
            }
        }
        return result; // :) Good
    }
    return result; // :( Bad
}
I've seen a lot of answers and questions about changing ASCII to int, ASCII to hex, or even ASCII to decimal, and none of them answers the question above.
Thanks for any help you may provide.
"I would like to change the Ascii (hex) in memory to a Decimal Value by shifting.."
No, shifting won't help you here.
"...or any other way you know how?"
Yes as you say there are questions already answering that.
(in short you need to replace your shift operation with adding cWork times the correct base ten (1,10,100) and get it right with endianess. But just use an existing answer.)
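For illustration, a minimal sketch of that multiply-by-ten idea (my own code, not from the linked answers; the function name asciiToInt is made up):
#include <iostream>

int asciiToInt(const char* s)
{
    int result = 0;
    for (int i = 0; s[i] >= '0' && s[i] <= '9'; ++i)
        result = result * 10 + (s[i] - '0');  // decimal "shift" instead of << 4
    return result;
}

int main()
{
    int var = asciiToInt("258");
    std::cout << var << '\n';  // prints 258; on a little-endian machine that int sits in memory as 02 01 00 00
}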
First of all, for the computer, decimal and hex make no difference, as the number is stored in binary format anyway and is presented to the user as needed by the different print functions. If I understood your problem correctly, this should simplify your life, since you only need to convert the C string to one internal format. You can then display the number in decimal or hex format as the client desires.
Normally, when I do these things myself, I convert a string to a numeric variable working from the units up to the higher-order digits:
#include <stdint.h>

const char* str = "258";
uint8_t str_len = 3;
uint16_t num = 0;
uint16_t mult = 1; // multiplier: 1, 10, 100, ... this is your decimal shift
for (int8_t i = str_len - 1; i >= 0; --i)
{
    uint16_t val = str[i] - '0'; // convert the digit character to its value
    num += val * mult;           // multiply the value by 1, 10, 100, ...
    mult *= 10;                  // you could use base 16 instead of base 10, but I found it more laborious
}
Please take the above untested code just as a reference for a solution; it can be done in a much better and more compact way.
Once you have the number in binary format, you can manipulate it. You can divide it by 16 (minding the remainders) to obtain a hexadecimal representation of the same quantity.
Finally, you can convert it back to a string as follows:
for (uint16_t i = str_len - 1; num > 0; num = num / 10, --i)
{
    char n = num % 10 + '0'; // converts a decimal number to a decimal string; use base 16 for hex
    char_buffer[i] = n;
}
You could achieve a similar result with atoi and similar functions, which have a lot of side effects in case of a failed conversion. Left/right shifting won't help you much here: shifting left by n multiplies by 2^n, and shifting right by n divides by 2^n (like taking the log2, for a right shift). E.g., uint8_t n = 1 << 3 is like doing 1 * 2^3 = 8. And I don't think the pointer address is relevant for you.
Hope this suggestion can guide you forward.
You pretty much got the correct algorithm. You can left shift by 4 or multiply by 16; same thing. But you need to do this for every byte except the last one. One way to fix that is to set result to 0, then each time in the loop, shift/multiply the result before you add something new.
The parameter should be an array of char, const-qualified since the function should not modify it.
Corrected code:
#include <stdint.h>
#include <stdio.h>

unsigned int AsciiToHex (const char* buf)
{
    unsigned int result = 0;
    for (int i = 0; i < 4 && buf[i] != '\0'; i++)
    {
        result *= 16;
        if (buf[i] >= '0' && buf[i] <= '9')
        {
            result += buf[i] - '0';
        }
        else if (buf[i] >= 'a' && buf[i] <= 'f')
        {
            result += buf[i] - 'a' + 0xA;
        }
        else if (buf[i] >= 'A' && buf[i] <= 'F')
        {
            result += buf[i] - 'A' + 0xA;
        }
        else
        {
            // error handling here
        }
    }
    return result;
}

int main (void)
{
    _Static_assert('Z' - 'A' == 25, "Crap/EBCDIC not supported.");
    printf("%.4X\n", AsciiToHex("1234"));
    printf("%.4X\n", AsciiToHex("007"));
    printf("%.4X\n", AsciiToHex("ABBA"));
    return 0;
}
Functions that could have been useful here are isxdigit and toupper from ctype.h; check them out.
Since the C standard does in theory not guarantee that letters are adjacent in the symbol table, I added a static assert to weed out crap systems.
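To show how those two could slot into the digit conversion, here is a hedged sketch (my own helper, not from the answer above; compiles as C or C++):
#include <ctype.h>

int hexDigitValue(char c)
{
    if (!isxdigit((unsigned char)c))
        return -1; // let the caller handle the error
    c = (char)toupper((unsigned char)c);
    return c <= '9' ? c - '0' : c - 'A' + 0xA;
}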

Best practice to represent bit string in hex (Arduino)

I have a string (or C string) consisting of 24 bits, "000101011110011110101110", that should be represented in hex (0x15e7ae).
As I understand it, the bit string needs to be split into six 4-bit parts,
"0001 0101 1110 0111 1010 1110", and then each part converted to hex:
0001 -> 1
0101 -> 5
1110 -> e
0111 -> 7
1010 -> a
1110 -> e
So what is the simplest and most cost-effective way to convert it to the hex representation 0x15e7ae?
There is also a dilemma for me over which string type is better to use, String or char[]. A String can easily be split using the substring function, but I don't know how to convert the string type to hex.
And conversely, a char[] can easily be converted to hex using the strtoul function, but I didn't find a simple way to split a char string.
Let's try some simple bit shifting.
std::string sample_str = "000101011110011110101110";
uint32_t result = 0;
for (unsigned int i = 0; i < sample_str.length(); ++i)
{
    result = result << 1;
    result = result | (sample_str[i] & 1);
}
There may be faster methods, but you would have to search the web for "bit twiddling string".
Background
This is based on the assumption that the character representation of zero has its least significant bit set to zero. Likewise, the character representation of one has its least significant bit set to one.
The algorithm shifts the result left by one to make room for a new bit value.
Taking the character value and ANDing it with 1 results in a value of zero for '0' and one for '1'. This result is ORed into the result value to produce the correct value.
Try single stepping with a debugger to see how it works.
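That assumption holds in ASCII, where '0' is 0x30 and '1' is 0x31; a one-line check (my own snippet):
#include <iostream>

int main() {
    std::cout << ('0' & 1) << ' ' << ('1' & 1) << '\n'; // prints 0 1
}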
This is quite literally found in this link: StringConstructors
// using an int and a base (hexadecimal):
stringOne = String(45, HEX);
// prints "2d", which is the hexadecimal version of decimal 45:
Serial.println(stringOne);
const char* binary = "000101011110011110101110";
char hex[9] = "";
uint32_t integer = 0;
for (int i = 0; binary[i] != '\0'; i++)
{
    integer <<= 1; // shift first, so the final bit is not dropped
    if (binary[i] == '1')
    {
        integer |= 1;
    }
}
sprintf(hex, "0x%06x", integer); // hex now holds "0x15e7ae"
In C, this is quite simple. Use strtoumax(binary, 0, 2) to convert your binary string to a uintmax_t and then convert that to a hex string with sprintf or fprintf.
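A minimal sketch of that route (my own snippet; shown with the C++ headers here, but plain <inttypes.h>/<stdio.h> works the same way in C):
#include <cinttypes>
#include <cstdio>

int main()
{
    const char* binary = "000101011110011110101110";
    std::uintmax_t value = std::strtoumax(binary, nullptr, 2);
    std::printf("0x%06" PRIxMAX "\n", value); // prints 0x15e7ae
}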

Change bit of hex number with leading zeros in C++ (C)

I have this number in a hex string:
002A05.
I need to set the 7th bit of this number to 1, so after the conversion I will get
022A05
But it has to work with every 6-character hex number.
I tried converting the hex string to an integer via strtol, but that function strips the leading zeros.
Please help me with how I can solve it.
int hex = 0x002A05;
int mask = 0x020000;
printf("%06X", hex | mask);
Hope this helps.
In a 24-bit number, bit #7 (counting from the left, as you did in your example, not from the right, as is done conventionally) is always going to be in the second hex digit from the left. You can solve your problem without converting the entire number to an integer, by taking that second hex digit, converting it to a number 0..15, setting its bit #3 (again counting from the left), and converting the result back to a hex digit.
int fromHex(char c) {
    c = toupper(c);
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else {
        return c - 'A' + 10;
    }
}

char toHexDigit(int n) {
    return n < 10 ? '0' + n : 'A' + n - 10;
}

char myNum[] = "002A05";
myNum[1] = toHexDigit(fromHex(myNum[1]) | 2);
printf("%s\n", myNum);
This prints 022A05.
It sounds to me like you have a string, not a hex constant, that you want to manipulate. You can do it pretty easily by bit-twiddling the ASCII value of the hex character. If you have a char representing a hex character, like char h = '6'; or char h = 'C';, you can set the 3rd-from-the-left (2nd-from-the-right) bit of the number that the character represents using:
h = h > '7' ? h <= '9' ? h + 9 : ((h + 1) | 2) - 1 : h | 2;
So you can do this to the second character in your string (the 7th bit lies 4 + 3 bits in). This works for any hex string with 2 or more characters. Here is your example:
char hex_string[] = "002A05";
// Get the second character from the string
char h = hex_string[1];
// Calculate the new character
h = h > '7' ? h <= '9' ? h + 9 : ((h + 1) | 2) - 1 : h | 2;
// Set the second character in the string to the result
hex_string[1] = h;
printf("%s", hex_string); // 022A05
You asked about strtol specifically, so to answer your question: just add the padding back after you convert the number with strtol:
const char *s = "002A05";
int x = strtol(s, NULL, 16);
x |= (1 << 17);      // bit 17 == 0x020000, the "7th bit from the left" of 24 bits
printf("%.6X\n", x); // the field width re-adds the leading zeros

Create a file that uses 4-bit encoding to represent integers 0-9

How can I create a file that uses a 4-bit encoding to represent the integers 0-9, separated by a comma encoded as '1111'? For example:
2,34,99 = 0010 1111 0011 0100 1111 1001 1001, which without spaces actually becomes
0010111100110100111110011001 = binary.txt
Therefore 0010111100110100111110011001 is what I see when I view the file ('binary.txt') in WinHex in binary view, but I would see 2,34,99 when viewing the file (binary.txt) in Notepad.
If not Notepad, is there another decoder that will do '4-bit encoding', or do I have to write a 'decoder program' to view the integers?
How can I do this in C++?
The basic idea of your format (4 bits per decimal digit) is well known and called BCD (binary-coded decimal). But I doubt that the use of 0xF as an encoding for a comma is well established, let alone supported by Notepad.
Writing a program in C++ to do the encoding and decoding would be quite easy. The only difficulty is that standard I/O uses the byte as its basic unit, not the bit, so you'd have to group the bits into bytes yourself.
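For illustration, a hedged sketch of that byte grouping (my own code and layout choice: two 4-bit codes per byte, 0xF for the comma, one zero nibble as end padding):
#include <fstream>
#include <vector>
#include <cstdint>
#include <cstddef>

int main() {
    // Nibble codes for "2,34,99": digits encode as themselves, ',' as 0xF.
    std::vector<uint8_t> nibbles = {0x2, 0xF, 0x3, 0x4, 0xF, 0x9, 0x9};
    if (nibbles.size() % 2 != 0)
        nibbles.push_back(0x0); // pad the odd nibble out to a whole byte
    std::vector<uint8_t> bytes;
    for (std::size_t i = 0; i < nibbles.size(); i += 2)
        bytes.push_back((uint8_t)(nibbles[i] << 4 | nibbles[i + 1]));
    // bytes is now 2F 34 F9 90; write it out raw.
    std::ofstream out("binary.txt", std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()), (std::streamsize)bytes.size());
}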
You can decode the files using od -tx1 if you have that (digits will show up as digits, commas will show up as f). You can also use xxd to go both directions; it comes with Vim. Use xxd -r -p to copy hex characters from stdin to a binary file on stdout, and xxd -p to go the other way. You can use sed or tr to change f back and forth to ,.
This is the simplest C++ 4-bit (BCD) encoding algorithm I could come up with. I wouldn't call it exactly easy, but it's no rocket science either. It extracts one digit at a time by dividing, and then adds the digits to the string:
#include <iostream>

int main() {
    const unsigned int ints = 3;
    unsigned int a[ints] = {2, 34, 99}; // these are the original ints
    unsigned int bytes_per_int = 6;
    char* result = new char[bytes_per_int * ints + 1];
    // enough space for 11 digits per int plus comma, 8-bit chars
    for (unsigned int j = 0; j < bytes_per_int * ints; ++j)
    {
        result[j] = 0xFF; // fill with FF
    }
    result[bytes_per_int * ints] = 0; // null terminated string
    unsigned int rpos = bytes_per_int * ints * 2; // result nibble position, start from the end of result
    int i = ints; // start from the end of the array too
    while (i != 0) {
        --i;
        unsigned int b = a[i];
        while (b != 0) {
            --rpos;
            unsigned int digit = b % 10; // take the lowest decimal digit of b
            if (rpos & 1) {
                // odd rpos means we set the lowest bits of a char
                result[rpos >> 1] = digit;
            }
            else {
                // even rpos means we set the highest bits of a char
                result[rpos >> 1] |= (digit << 4);
            }
            b /= 10; // make the next digit the new lowest digit
        }
        if (i != 0 || (rpos & 1))
        {
            // add the comma
            --rpos;
            if (rpos & 1) {
                result[rpos >> 1] = 0x0F;
            }
            else {
                result[rpos >> 1] |= 0xF0;
            }
        }
    }
    std::cout << result;
}
Trimming the bogus data left at the start of the result (according to rpos) is left as an exercise for the reader.
The subproblem of BCD conversion has also been discussed before: Unsigned Integer to BCD conversion?
If you want a more efficient algorithm, here's a bunch of lecture slides with conversion from 8-bit ints to BCD: http://edda.csie.dyu.edu.tw/course/fpga/Binary2BCD.pdf