Converting a four character string to a long - c++

I want to convert a four character string (i.e. four characters) into a long (i.e. convert them to ASCII codes and then put them into the long).
As I understand it, this is done by writing the first character to the first byte of the long, the second to the adjacent memory location, and so on. But I don't know how to do this in C++.
Can someone please point me in the right direction?
Thanks in advance.

Here's your set of four characters:
const unsigned char buf[4] = { 'a', '0', '%', 'Q' };
Now we assemble a 32-bit unsigned integer:
const uint32_t n = (buf[0]) | (buf[1] << 8) | (buf[2] << 16) | (buf[3] << 24);
Here I assume that buf[0] is the least significant one; if you want to go the other way round, just swap the indices around.
Let's confirm:
printf("n = 0x%08X\n", n); // we get n = 0x51253061
// Q % 0 a
Important: Make sure your original byte buffer is unsigned, or add explicit casts like (unsigned int)(unsigned char)buf[i]; otherwise the shift operations are not well defined.
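For example, a sketch of the fully cast form, with a hypothetical plain (possibly signed) char buffer:
const char sbuf[4] = { 'a', '0', '%', 'Q' }; // hypothetical signed buffer
const uint32_t m = (unsigned int)(unsigned char)sbuf[0]
| ((unsigned int)(unsigned char)sbuf[1] << 8)
| ((unsigned int)(unsigned char)sbuf[2] << 16)
| ((unsigned int)(unsigned char)sbuf[3] << 24);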
Word of warning: I would strongly prefer this algebraic solution over the possibly tempting const uint32_t n = *(uint32_t*)(buf), which is machine-endianness dependent and will make your compiler angry if you're using strict aliasing assumptions!
As was helpfully pointed out below, you can try to be even more portable by not making assumptions on the bit size of a byte (CHAR_BIT comes from <climits>):
const unsigned long long n = (unsigned long long)buf[0]
| ((unsigned long long)buf[1] << CHAR_BIT)
| ((unsigned long long)buf[2] << (CHAR_BIT * 2))
| ((unsigned long long)buf[3] << (CHAR_BIT * 3));
Feel free to write your own generalizations as needed! (Good luck figuring out the appropriate printf format string ;-) .)
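(Hint: for the fixed-width uint32_t above, the macros from <cinttypes> do the job:
printf("n = 0x%08" PRIX32 "\n", n);
For the unsigned long long generalization, plain %llX works.)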

If your bytes are in the correct order for a long on your machine, then use memcpy, something like this:
#include <cstring>
#include <iostream>
int main()
{
char data[] = {'a', 'b', 'c', 'd'};
long result = 0; // zero-fill so the upper bytes stay defined where long is wider than 4 bytes
std::memcpy(&result, data, 4);
std::cout << result << "\n";
}
Note that the byte ordering within the long is platform dependent, which may or may not be what you need. The 4 is hard coded as the size in bytes of the long for simplicity; you would NOT hard code 4 in a real program, of course. All the compilers I've tried this on optimize out the memcpy when optimization is enabled, so it's likely to be efficient too.
EDIT: Go with the shift and add answer someone else posted unless this meets your specific requirements as it's much more portable and safe!
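If C++20 is available, std::bit_cast from <bit> expresses the same memcpy idea directly; a minimal sketch, assuming a four-byte destination type:
#include <bit>
#include <cstdint>
#include <iostream>
int main()
{
unsigned char data[4] = {'a', 'b', 'c', 'd'};
auto result = std::bit_cast<std::uint32_t>(data); // source and destination sizes must match exactly
std::cout << result << "\n";
}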

#include <string>
#include <iostream>
std::string fourCharCode_toString ( int value )
{
return std::string( reinterpret_cast<const char*>( &( value ) ), sizeof(int) );
}
int fourCharCode_toInt ( const std::string & value )
{
return *( reinterpret_cast<const int*>( value.data() ) );
}
int main()
{
int a = 'DROW'; // multi-character literal; its value is implementation-defined
std::string str = fourCharCode_toString( a );
int b = fourCharCode_toInt( str );
std::cout << str << "\n";
}

Related

Check if pubkey belongs to twisted Edwards25519

I want to check if a pubkey belongs to the twisted Edwards25519 curve (I guess this is what ed25519 uses?). The problem is that I have some in-theory-valid pubkeys like:
hash_hex = "3afe3342f7192e52e25ebc07ec77c22a8f2d1ba4ead93be774f5e4db918d82a0"
or
hash_hex = "fd739be0e59c072096693b83f67fb2a7fd4e4b487e040c5b128ff602504e6c72"
and to check if they are valid I use from libsodium:
auto result = crypto_core_ed25519_is_valid_point(reinterpret_cast<const unsigned char*>(hash_hex.c_str()));
and the thing is that for those pubkeys, which should in theory be valid, I get 0 as the result in both cases, which means the check didn't pass (according to https://doc.libsodium.org/advanced/point-arithmetic#point-validation). So my question is: am I using this function wrong? Should the key be delivered in another form? Or are those keys invalid for some reason (I have them from some coin explorer, so in theory they should be valid)? Is there some online tool where I can check whether those pubkeys belong to that elliptic curve?
You need to convert your hex string into binary format. Internally, the ed25519 functions work on a 256-bit (crypto_core_ed25519_BYTES (32) * 8) unsigned integer. You can compare it with a uint64_t, which consists of 8 octets. The only difference is that there is no standard uint256_t type, so a pointer to an array of 32 unsigned char is used instead. I use std::uint8_t instead of unsigned char below, so if std::uint8_t is not an alias for unsigned char, the program will fail to compile.
Converting the hex string to binary format is done like this.
A nibble is 4 bits, which is what a single hex digit can represent. 0 = 0b0000 and f = 0b1111.
You look up each hex character in a lookup table to easily convert the character into the value 0 - 15 (decimal), 0b0000 - 0b1111 (binary).
Since a uint8_t requires two nibbles, you combine them two and two. The first nibble is left shifted to form the high part of the final uint8_t and the second nibble is just bitwise ORed (|) with that result. Using fe as an example:
f = 0b1111
e = 0b1110
shift f left 4 steps:
f0 = 0b11110000
bitwise OR with e
0b11110000
| 0b00001110
------------
0b11111110 = 254 (dec)
Example:
#include <sodium.h>
#include <cstdint>
#include <iostream>
#include <string_view>
#include <vector>
// a function to convert a hex string into binary format
std::vector<std::uint8_t> str2bin(std::string_view hash_hex) {
static constexpr std::string_view tab = "0123456789abcdef";
std::vector<std::uint8_t> res(crypto_core_ed25519_BYTES);
if(hash_hex.size() == crypto_core_ed25519_BYTES * 2) {
for(size_t i = 0; i < res.size(); ++i) {
// find the first nibble and left shift it 4 steps, then find the
// second nibble and do a bitwise OR to combine them:
res[i] = tab.find(hash_hex[i*2])<<4 | tab.find(hash_hex[i*2+1]);
}
}
return res;
}
int main() {
std::cout << std::boolalpha;
for(auto hash_hex : {
"3afe3342f7192e52e25ebc07ec77c22a8f2d1ba4ead93be774f5e4db918d82a0",
"fd739be0e59c072096693b83f67fb2a7fd4e4b487e040c5b128ff602504e6c72",
"this should fail" })
{
auto bin = str2bin(hash_hex);
bool result = crypto_core_ed25519_is_valid_point(bin.data());
std::cout << "is " << hash_hex << " ok: " << result << '\n';
}
}
Output:
is 3afe3342f7192e52e25ebc07ec77c22a8f2d1ba4ead93be774f5e4db918d82a0 ok: true
is fd739be0e59c072096693b83f67fb2a7fd4e4b487e040c5b128ff602504e6c72 ok: true
is this should fail ok: false
Also note: libsodium comes with helper functions to do this conversion between hex strings and binary format:
char *sodium_bin2hex(char * const hex, const size_t hex_maxlen,
const unsigned char * const bin, const size_t bin_len);
int sodium_hex2bin(unsigned char * const bin, const size_t bin_maxlen,
const char * const hex, const size_t hex_len,
const char * const ignore, size_t * const bin_len,
const char ** const hex_end);
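A minimal usage sketch of sodium_hex2bin in place of the hand-rolled str2bin, assuming hash_hex is a std::string as in the question (the function returns 0 on success):
unsigned char bin[crypto_core_ed25519_BYTES];
size_t bin_len = 0;
if (sodium_hex2bin(bin, sizeof bin,
hash_hex.c_str(), hash_hex.size(),
nullptr, &bin_len, nullptr) == 0
&& bin_len == sizeof bin)
{
bool ok = crypto_core_ed25519_is_valid_point(bin) == 1;
}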

Most efficient way to convert 8 hex chars into a 4-uint8_t array?

I have a const char*, pointing to an array of 8 characters (that may be a part of a larger string), containing a hexadecimal value. I need a function that converts those chars into an array of 4 uint8_t, where the two first characters in the source array will become the first element in the target array, and so on. For example, if I have this
const char* s = "FA0BD6E4";
I want it converted to
uint8_t i[4] = {0xFA, 0x0B, 0xD6, 0xE4};
Currently, I have these functions:
inline constexpr uint8_t HexChar2UInt8(char h) noexcept
{
return static_cast<uint8_t>((h & 0xF) + (((h & 0x40) >> 3) | ((h & 0x40) >> 6)));
}
inline constexpr uint8_t HexChars2UInt8(char h0, char h1) noexcept
{
return (HexChar2UInt8(h0) << 4) | HexChar2UInt8(h1);
}
inline constexpr std::array<uint8_t, 4> HexStr2UInt8(const char* in) noexcept
{
return {{
HexChars2UInt8(in[0], in[1]),
HexChars2UInt8(in[2], in[3]),
HexChars2UInt8(in[4], in[5]),
HexChars2UInt8(in[6], in[7])
}};
}
Here's what it will look like where I call it from:
const char* s = ...; // the source string
std::array<uint8_t, 4> a; // I need to place the resulting value in this array
a = HexStr2UInt8(s); // the function call does not have to look like this
What I'm wondering is: is there any more efficient (and portable) way to do this? For example, is returning a std::array a good thing to do, or should I pass a dst pointer to HexChars2UInt8? Or are there any other ways to improve my function(s)?
The main reason I'm asking this is because I will likely need to optimize this at some point, and it will be problematic if the API (the function prototype) is changed in the future.
You can add parallelism, since the HexChar2UInt8 logic can be applied to 8 characters at the same time. It's probably faster to load a non-aligned 64-bit value once than to load 8 chars one by one (and to call the conversion function for each):
void hexChar2Uints(uint8_t *ptr, uint64_t *result) // make result aligned to qword
{
uint64_t d=*(uint64_t*)ptr;
uint64_t hi = (d>>6) & 0x0101010101010101;
d &= 0x0f0f0f0f0f0f0f0f;
*result = d+(hi*9); // let compiler decide the fastest method
}
The last stage has to be done as the OP suggested, just reading from the modified "string":
for (n=0;n<4;n++) arr[n]=(tmp[2*n]<<4) | tmp[2*n+1];
The chances are slim that this can be considerably sped up. The << 4 operation could be injected into hexChar2Uints, making that parallel too, but I doubt it can be done in fewer than 4 arithmetic operations.
The most efficient, i.e. the fastest, way to do the conversion is probably to set up a table of 65536 values, one for every possible pair of characters, and to store at the valid entries their conversions.
If you store them as unsigned chars you won't be able to catch errors so you'll just have to hope you get valid input. If you store the value type as bigger than unsigned char you'll be able to use some kind of "error" value but checking if you get one will be an overhead. (The extra 65536 bytes probably isn't).
What you have written is probably efficient enough too though. Of course once again you are also not checking for invalid input and will get a result anyway.
If you keep yours I might change:
((h & 0x40) >> 3) | ((h & 0x40) >> 6)
which evaluates to 9 when bit 6 is set ((0x40 >> 3) | (0x40 >> 6) == 8 | 1 == 9), i.e. it is a substitute for
( (h & 0x40) ? 9 : 0 )
I can't see how my expression is less efficient than yours, and it is probably clearer in intention.
There are several approaches possible. The simplest and most portable is to break the characters down into two-character std::strings, use each to initialize a std::istringstream, set up the correct format flags, and read the value from that.
A somewhat more efficient solution would be to create a single string, inserting whitespace to separate the individual values, and use just one std::istringstream, something like:
std::vector<uint8_t>
convert4UChars( std::string const& in )
{
assert( in.size() >= 8 );
std::string tmp( in.begin(), in.begin() + 8 );
int i = tmp.size();
while ( i > 2 ) {
i -= 2;
tmp.insert( i, 1, ' ');
}
std::istringstream s(tmp);
s.setf( std::ios_base::hex, std::ios_base::basefield );
std::vector<int> results( 4 );
s >> results[0] >> results[1] >> results[2] >> results[3];
if ( !s ) {
// error...
}
return std::vector<uint8_t>( results.begin(), results.end() );
}
If you really want to do it by hand, the alternative is to create a 256 entry table, indexed by each character, and use that:
class HexValueTable
{
std::array<uint8_t, 256> myValues;
public:
HexValueTable()
{
std::fill( myValues.begin(), myValues.end(), -1 );
for ( int i = '0'; i <= '9'; ++ i ) {
myValues[ i ] = i - '0';
}
for ( int i = 'a'; i <= 'f'; ++ i ) {
myValues[ i ] = i - 'a' + 10;
}
for ( int i = 'A'; i <= 'F'; ++ i ) {
myValues[ i ] = i - 'A' + 10;
}
}
uint8_t operator[]( char ch ) const
{
uint8_t results = myValues[static_cast<unsigned char>( ch )];
if ( results == static_cast<unsigned char>( -1 ) ) {
// error, throw some exceptions...
}
return results;
}
};
std::array<uint8_t, 4>
convert4UChars( std::string const& in )
{
static HexValueTable const hexValues;
assert( in.size() >= 8 );
std::array<uint8_t, 4> results;
std::string::const_iterator source = in.begin();
for ( int i = 0; i < 4; ++ i ) {
results[i] = (hexValues[*source ++]) << 4;
results[i] |= hexValues[*source ++];
}
return results;
}
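A hypothetical call site, using the question's example input:
std::array<uint8_t, 4> a = convert4UChars( "FA0BD6E4" );
// a == { 0xFA, 0x0B, 0xD6, 0xE4 }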

Convert non-null-terminated char* to int

I am working on some code that reads in a data file. The file frequently contains numeric values of various lengths encoded in ASCII that I need to convert to integers. The problem is that they are not null-terminated, which of course causes problems with atoi. The solution I have been using is to manually append a null to the character sequence, and then convert it.
This is the code that I have been using; it works fine, but it seems very kludgy.
char *append_null(const char *chars, const int size)
{
char *tmp = new char[size + 2];
memcpy(tmp, chars, size);
tmp[size + 1] = '\0';
return tmp;
}
int atoi2(const char *chars, const int size)
{
char *tmp = append_null(chars, size);
int result = atoi(tmp);
delete[] tmp;
return result;
}
int main()
{
char *test = new char[20];
test[0] = '1';
test[1] = '2';
test[2] = '3';
test[3] = '4';
cout << atoi2(test, 4) << endl;
}
I am wondering if there is a better way to approach this problem.
Fixed-format integer conversion is still well within handroll range where the library won't do:
size_t mem_tozd_rjzf(const char *buf, size_t len) // digits only
{
size_t n=0;
while (len--)
n = n*10 + *buf++ - '0';
return n;
}
long mem_told(const char *buf, size_t len) // spaces, sign, digits
{
long n=0, sign=1;
while ( len && isspace(*buf) )
--len, ++buf;
if ( len ) switch(*buf) {
case '-': sign=-1; // fall through
case '+': --len, ++buf;
}
while ( len-- && isdigit(*buf) )
n = n*10 + *buf++ -'0';
return n*sign;
}
In C++11, you can say std::stoi(std::string(chars, size)), all from <string>.
int i = atoi(std::string(chars, size).c_str());
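Applied to the question's buffer, either one-liner looks like this sketch:
const char test[] = { '1', '2', '3', '4' };
int value = std::stoi(std::string(test, 4)); // value == 1234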
Your method will work, although you only need size + 1 bytes, with the null going at position size; as written, tmp[size] is left uninitialized and atoi may read it. If possible, I'd recommend adding the null termination at the point where the data is read in, so that you don't have to worry about catching cases where you hit an exception before you can deallocate the memory (memory which, honestly, may or may not have been allocated if you start catching exceptions).
#include <boost/lexical_cast.hpp>
std::string str = "1234";
boost::lexical_cast<int>(str); // 1234
The problem as formulated requires constructing a string from an array of known size, then converting its text into a numeric value.
To convert text into values, C++ has a unified mechanism: streams.
In your case, you can do the following:
int i = 0;
std::stringstream(std::string(yourbuffer, yoursize)) >> i;
This will completely avoid any plain old C reference.
But since, as you say, all values come from a file... why not just read the file itself as a stream via std::fstream?
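A minimal sketch of that last suggestion, with a hypothetical file name and assuming whitespace-delimited numbers:
std::ifstream file("data.txt");
int value = 0;
while (file >> value) {
// use value
}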
The question says (emph mine):
The file frequently contains numeric values of various lengths encoded
in ASCII that I need to convert to integers. The problem is that they
are not null-terminated, which of course causes problems with atoi.
This does not really pose a problem, as, if we look at the docs for atoi or strtol, they clearly state:
Function discards any whitespace characters until first non-whitespace
character is found. Then it takes as many characters as possible to
form a valid integer number representation and converts them to
integer value.
That means, it doesn't matter at all that the numbers aren't null terminated, as long as they are delimited by something that stops conversion.
And if they are not delimited, then you have to know the size, and when you know the size, I would also recommend a hand-coded solution like in the other answer.
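To illustrate the delimited case: strtol (from <cstdlib>) stops at the first character that cannot be part of the number, so no copy is needed; the field contents here are hypothetical:
const char *field = "1234;5678";
char *end = nullptr;
long v = std::strtol(field, &end, 10); // v == 1234, end points at ';'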
I know this answer does not directly answer the OP's question, but it helps if your source of char* is a char array of known size.
#include <fmt/core.h>
#include <type_traits>
#include <iostream>
#include <string>
// SFINAE fallback
template<typename T, typename =
std::enable_if< std::is_pointer<T>::value >
>
int charArrayToInt(const T arr){ // Fall back for user friendly compiler errors
static_assert(false == std::is_pointer<T>::value, "`charArrayToInt()` doesn't allow conversion from pointer!");
return -1;
}
// Valid for both null or non-null-terminated char array
template<size_t sz>
int charArrayToInt(const char(&arr)[sz]){
// It doesn't matter whether it's null terminated or not
std::string str(arr, sz);
return std::stoi(str);
}
int main() {
char number[2] = {'4','2'};
int ret = charArrayToInt(number);
fmt::print("The answer is {}. ", ret);
return 0;
}

storing and reading int data into char array

I am trying to store two integer values into a char array in C++.
Here is the code:
char data[20];
*data = static_cast <char> (time_delay); //time_delay is of int type
*(data + sizeof(int)) = static_cast<char> (wakeup_code); //wakeup_code is of int type
Now on the other end of the program, I want to reverse this operation. That is, from this char array, I need to obtain the values of time_delay and wakeup_code.
How can I do that??
Thanks,
Nick
P.S: I know this is a stupid way to do this, but trust me its a constraint.
I think when you write static_cast<char>, that value is converted to a 1-byte char, so if it didn't fit in a char to begin with, you'll lose data.
What I'd do is use *((int*)data) and *((int*)(data+sizeof(int))) for both reading and writing ints to the array.
*((int*)(data+sizeof(int))) = wakeup_code;
....
wakeup_code = *((int*)(data+sizeof(int)));
Alternatively, you might also write:
reinterpret_cast<int*>(data)[0]=time_delay;
reinterpret_cast<int*>(data)[1]=wakeup_code;
If you are working on a PC x86 architecture then there are no alignment problems (except for speed) and you can cast a char * to an int * to do the conversions:
char data[20];
*((int *)data) = first_int;
*((int *)(data+sizeof(int))) = second_int;
and the same syntax can be used for reading from data by just swapping sides of =.
Note however that this code is not portable because there are architectures where an unaligned operation may be not just slow but actually illegal (crash).
In those cases probably the nicest approach (that also gives you endianness control in case data is part of a communication protocol between different systems) is to build the integers explicitly in code one char at a time:
first_uint = ((unsigned char)data[0] |
((unsigned char)data[1] << 8) |
((unsigned char)data[2] << 16) |
((unsigned char)data[3] << 24));
data[4] = second_uint & 255;
data[5] = (second_uint >> 8) & 255;
data[6] = (second_uint >> 16) & 255;
data[7] = (second_uint >> 24) & 255;
I haven't tried it, but the following should work:
char data[20];
int value;
memcpy(&value, data, sizeof(int)); // assumes data already holds the bytes of an int
Try the following:
union IntsToChars {
struct {
int time_delay;
int wakeup_value;
} Integers;
char Chars[20];
};
extern char* somebuffer;
void foo()
{
IntsToChars n2c;
n2c.Integers.time_delay = 1;
n2c.Integers.wakeup_value = 2;
memcpy(somebuffer,n2c.Chars,sizeof(n2c)); //an example of using the char array containing the integer data
//...
}
Using such a union should eliminate the alignment problem, unless the data is passed to a machine with a different architecture.
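A sketch of the receiving side, assuming the same architecture on both ends (the same type-punning caveats apply):
IntsToChars c2n;
memcpy(c2n.Chars, somebuffer, sizeof(c2n));
int time_delay = c2n.Integers.time_delay;
int wakeup_value = c2n.Integers.wakeup_value;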
#include <sstream>
#include <string>
#include <cstring> // strcpy
#include <cstdlib> // atoi
int main ( int argc, char **argv) {
char ch[10];
int i = 1234;
std::ostringstream oss;
oss << i;
strcpy(ch, oss.str().c_str());
int j = atoi(ch);
}

Outputting bit data to binary file C++

I am writing a compression program, and need to write bit data to a binary file using c++. If anyone could advise on the write statement, or a website with advice, I would be very grateful.
Apologies if this is a simple or confusing question, I am struggling to find answers on web.
Collect the bits into whole bytes, such as an unsigned char or std::bitset (where the bitset size is a multiple of CHAR_BIT), then write whole bytes at a time. Computers "deal with bits", but the available abstraction – especially for IO – is that you, as a programmer, deal with individual bytes. Bitwise manipulation can be used to toggle specific bits, but you're always handling byte-sized objects.
At the end of the output, if you don't have a whole byte, you'll need to decide how that should be stored. Both iostreams and stdio can write unformatted data using ostream::write and fwrite, respectively.
Instead of a single char or bitset<8> (8 being the most common value for CHAR_BIT), you might consider using a larger block size, such as an array of 4-32, or more, chars or the equivalent sized bitset.
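A minimal sketch of that byte-collecting approach (all names are illustrative):
#include <fstream>
#include <string>
#include <vector>
int main()
{
std::vector<unsigned char> bytes;
unsigned char cur = 0;
int nbits = 0;
for (char bit : std::string("10100000101")) {
cur = (unsigned char)((cur << 1) | (bit == '1')); // shift the next bit in
if (++nbits == 8) { bytes.push_back(cur); cur = 0; nbits = 0; }
}
if (nbits) // partial last byte: pad with zeros (one policy among several)
bytes.push_back((unsigned char)(cur << (8 - nbits)));
std::ofstream out("bits.bin", std::ios::binary);
out.write(reinterpret_cast<const char*>(bytes.data()), bytes.size());
}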
For writing binary, the trick I have found most helpful is to store all the binary as a single array in memory and then move it all over to the hard drive. Doing a bit at a time, or a byte at a time, or an unsigned long long at a time is not as fast as having all the data stored in an array and using one instance of "fwrite()" to store it to the hard drive.
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
Ref: http://www.cplusplus.com/reference/clibrary/cstdio/fwrite/
In English:
fwrite( [array* of stored data], [size in bytes of array OBJECT. For unsigned chars -> 1, for unsigned long longs -> 8], [number of instances in array], [FILE*])
Always check the return value to verify success!
Additionally, an argument can be made that having the object type be as large as possible is the fastest way to go ([unsigned long long] > [char]). While I am not versed in the implementation of "fwrite()", I suspect that converting from the natural object type used in your code to [unsigned long long] will cost more time, combined with the writing, than letting "fwrite()" make do with what you have.
Back when I was learning Huffman Coding, it took me a few hours to realize that there was a difference between [char] and [unsigned char]. Notice for this method that you should always use unsigned variables to store the pure binary.
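A short sketch of a single fwrite over a whole buffer, with the return value checked (the file name is illustrative):
#include <cstdio>
int main()
{
unsigned char buf[1024] = {0}; // imagine this holds your packed bits
FILE* f = std::fopen("out.bin", "wb");
if (!f) return 1;
size_t written = std::fwrite(buf, 1, sizeof buf, f); // element size 1, count = byte count
if (written != sizeof buf) { /* handle the short write */ }
std::fclose(f);
return 0;
}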
With the class below you can write and read bit by bit:
class bitChar{
public:
unsigned char* c; // one-byte accumulator for pending bits
int shift_count; // number of bits currently packed into *c
string BITS; // bits waiting to be written, as '0'/'1' characters
bitChar()
{
shift_count = 0;
c = (unsigned char*)calloc(1, sizeof(char));
}
// read a whole file and return its contents as a string of '0'/'1' characters
string readByBits(ifstream& inf)
{
string s ="";
char buffer[1];
while (inf.read (buffer, 1))
{
s += getBits(*buffer);
}
return s;
}
void setBITS(string X)
{
BITS = X;
}
// pack the bits in BITS into bytes and write them out; returns the number of data bits written
int insertBits(ofstream& outf)
{
int total = 0;
while(BITS.length())
{
if(BITS[0] == '1')
*c |= 1;
*c <<= 1;
++shift_count;
++total;
BITS.erase(0, 1);
if(shift_count == 7 )
{
if(BITS.size()>0)
{
if(BITS[0] == '1')
*c |= 1;
++total;
BITS.erase(0, 1);
}
writeBits(outf);
shift_count = 0;
free(c);
c = (unsigned char*)calloc(1, sizeof(char));
}
}
if(shift_count > 0)
{
*c <<= (7 - shift_count);
writeBits(outf);
free(c);
c = (unsigned char*)calloc(1, sizeof(char));
}
outf.close();
return total;
}
// expand one byte into eight '0'/'1' characters, most significant bit first
string getBits(unsigned char X)
{
stringstream itoa;
for(unsigned s = 7; s > 0 ; s--)
{
itoa << ((X >> s) & 1);
}
itoa << (X&1) ;
return itoa.str();
}
void writeBits(ofstream& outf)
{
outf << *c;
}
~bitChar()
{
if(c)
free(c);
}
};
For example:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <stdlib.h>
using namespace std;
int main()
{
ofstream outf("Sample.dat", ios::binary);
ifstream inf("Sample.dat", ios::binary);
string enCoded = "101000001010101010";
//write to file
cout << enCoded << endl ; //print 101000001010101010
bitChar bchar;
bchar.setBITS(enCoded);
bchar.insertBits(outf);
//read from file
string decoded =bchar.readByBits(inf);
cout << decoded << endl ; //print 101000001010101010000000
return 0;
}