How to serialize structure data in C++?

I was asked in an interview to serialize data (so it could be stored in a buffer and sent over some network). This is what I came up with -
struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

char b[sizeof(struct AMG_ANGLES)];

char* encode(struct AMG_ANGLES *a)
{
    std::memcpy(b, &a, sizeof(struct AMG_ANGLES));
    return b;
}

void decode(char* data)
{
    // check endianess
    AMG_ANGLES *tmp; //Re-make the struct
    std::memcpy(&tmp, data, sizeof(tmp));
}
Is this correct? Can anyone give alternate designs? I did not get a callback, so I'm just trying to learn what I could have improved.

Is this correct?
Most likely, no.
The point of serialization is to convert the data into a form that is completely platform independent - e.g. one that does not rely on things like endianness or whether a float is IEEE 754 or something very different. This requires:
a) strict agreement on the intended format - e.g. whether it's some kind of text (XML, JSON, CSV, ...) or "raw binary" with explicit definitions of the meaning of each individual byte (e.g. "byte 1 is always the lowest 8 bits of the significand")
b) correct conversion to whatever the intended format is (e.g. ensuring that byte 1 really is always the lowest 8 bits of the significand, regardless of any/all platform differences)
However, it is at least technically possible that the code is not supposed to be portable, and that the specification (the "agreement on the intended format") happens to match what you ended up with on the only platform the code is designed for; so it is at least technically possible that the code is correct.
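For example, a minimal sketch of point (b) - assuming the agreed format is "32-bit IEEE 754 bits, least significant byte first" (the function name is mine, purely illustrative):

#include <cstdint>
#include <cstring>

void encode_float_le(unsigned char out[4], float f) {
    static_assert(sizeof(float) == 4, "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // grab the raw IEEE 754 bit pattern
    out[0] = bits & 0xFF;                  // byte 1: the lowest 8 bits of the significand
    out[1] = (bits >> 8) & 0xFF;
    out[2] = (bits >> 16) & 0xFF;
    out[3] = (bits >> 24) & 0xFF;          // sign bit and most of the exponent
}

The receiver rebuilds the uint32_t from the four bytes and memcpys it back into a float, so both sides agree byte for byte no matter what the hardware does natively.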

There could be lots of improvements, but instead of listing all of them I suggest you look into cereal. It is a widely used serialization/deserialization library, so many of these key points have already been thought through.
Some of my thoughts:
Your code depends on the hardware the program is running on, because of alignment and endianness. The serialized data is therefore neither portable nor compiler-independent.
char* encode(struct AMG_ANGLES *a) returns a char*, which is easy to misuse or leak. To prevent that, let std::unique_ptr<T> decide its lifetime or wrap it in a class - but get rid of raw pointers somehow.
Templatize your serialize/deserialize operations; otherwise you will end up writing the same functions for every other type.
template<typename T>
char* encode( T* a ) // I leave the signature as is, just to demonstrate
{
    std::memcpy( b, a, sizeof(T) );
    return b;
}
If the format is up to you, it is often better to prefer human-readable formats such as JSON or XML over raw binary archiving.
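For a rough picture of what that looks like with cereal, here is a sketch based on its documented JSON archive (untested here; the field names come from the question, the rest follows cereal's conventions):

#include <cereal/archives/json.hpp>
#include <sstream>
#include <iostream>

struct AMG_ANGLES {
    float yaw, pitch, roll;

    // cereal finds this member template and serializes the named fields
    template <class Archive>
    void serialize(Archive& archive) {
        archive(CEREAL_NVP(yaw), CEREAL_NVP(pitch), CEREAL_NVP(roll));
    }
};

int main() {
    std::stringstream ss;
    {
        cereal::JSONOutputArchive out(ss); // finishes writing when it goes out of scope
        out(AMG_ANGLES{1.0f, 2.0f, 3.0f});
    }
    std::cout << ss.str() << '\n';         // human-readable JSON text
}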

Can someone give an alternate design in C?
The "standard" way would be to use printf and scanf to create an ASCII representation of the data:
#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <assert.h>
#include <float.h>

struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

// declare a buffer at least this long to be sure encode works properly
#define AMG_ANGLES_BUFSIZE ( \
    3 * ( /* 3 floats */ \
        2 + /* digit and dot */ \
        FLT_DECIMAL_DIG - 1 + /* digits after dot */ \
        4 /* the 'e±dd' part */ \
    ) \
    + 2 /* spaces */ \
    + 1 /* zero terminating character */ \
)

int encode(char *dest, size_t destsize, const struct AMG_ANGLES *a) {
    return snprintf(dest, destsize, "%.*e %.*e %.*e",
        FLT_DECIMAL_DIG - 1, a->yaw,
        FLT_DECIMAL_DIG - 1, a->pitch,
        FLT_DECIMAL_DIG - 1, a->roll);
    // my pedantic self wants to add `assert(snprintf_ret < AMG_ANGLES_BUFSIZE);`
}

int decode(struct AMG_ANGLES *dest, const char *data) {
    return sscanf(data, "%e %e %e", &dest->yaw, &dest->pitch, &dest->roll) == 3 ? 0 : -1;
}

int main() {
    char buf[AMG_ANGLES_BUFSIZE];
    const struct AMG_ANGLES a = { FLT_MIN, FLT_MAX, FLT_MIN };
    encode(buf, sizeof(buf), &a);

    struct AMG_ANGLES b;
    const int decoderet = decode(&b, buf);
    assert(decoderet == 0);
    assert(b.yaw == FLT_MIN);
    assert(b.pitch == FLT_MAX);
    assert(b.roll == FLT_MIN);
}
However, in bare-metal embedded I try not to use scanf - it's a big function with some dependencies. So it's better to call strtof directly, but that needs some thinking:
// needs #include <errno.h> and #include <stdlib.h> in addition to the headers above
int decode2(struct AMG_ANGLES *dest, const char *data) {
    errno = 0;
    char *endptr = NULL;

    dest->yaw = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != ' ') return -1;
    data = endptr + 1;

    dest->pitch = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != ' ') return -1;
    data = endptr + 1;

    dest->roll = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != '\0') return -1;

    return 0;
}
or with the code duplication removed:
int decode2(struct AMG_ANGLES *dest, const char *data) {
    // array of pointers to floats to fill
    float * const dests[] = { &dest->yaw, &dest->pitch, &dest->roll };
    const size_t dests_cnt = sizeof(dests) / sizeof(*dests);
    errno = 0;
    for (size_t i = 0; i < dests_cnt; ++i) {
        char *endptr = NULL;
        *dests[i] = strtof(data, &endptr);
        if (errno != 0 || endptr == data) return -1;
        // a space separates the numbers; the last number is followed by the terminating zero
        const char should_be_char = i != dests_cnt - 1 ? ' ' : '\0';
        if (*endptr != should_be_char) return -1;
        data = endptr + 1;
    }
    return 0;
}
I needed some googling and a re-read of chux's answers to recall how to use FLT_DECIMAL_DIG with printf to print floats - most probably because I rarely work with floats.

Keep in mind that, when using memcpy, different architectures and different compilers will apply padding and endianness differently. To prevent padding of the struct, you can use an attribute provided by GCC:
__attribute__ ((packed))
Nevertheless, this does not protect you from differing endianness.
The code for serializing and deserializing using memcpy might look like this:
#include <memory>
#include <cstring>

struct __attribute__((packed)) AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

// The buffer is expected to be the same size as the T
template<typename T>
int serialize(const T &data, const std::unique_ptr<char[]> &buffer) {
    std::memcpy(buffer.get(), &data, sizeof(T));
    return sizeof(T);
}

// The buffer is expected to be the same size as the ReturnType
template<typename ReturnType>
ReturnType deserialize(const std::unique_ptr<char[]> &buffer) {
    ReturnType tmp;
    std::memcpy(&tmp, buffer.get(), sizeof(ReturnType));
    return tmp;
}

int main()
{
    struct AMG_ANGLES angles = {1.2, 1.3, 1.0};
    std::unique_ptr<char[]> buffer(new char[sizeof(struct AMG_ANGLES)]);
    int size = serialize(angles, buffer);
    struct AMG_ANGLES angles_deserialized = deserialize<AMG_ANGLES>(buffer);
}
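To also address the endianness caveat above, here is a hedged sketch (my own addition, assuming 32-bit IEEE 754 floats) of decoding a float from an agreed little-endian wire format, so the result is the same no matter which machine decodes it:

#include <cstdint>
#include <cstring>

float decode_float_le(const unsigned char in[4]) {
    // assemble the bits in a fixed order, independent of host endianness
    std::uint32_t bits = std::uint32_t(in[0])
                       | std::uint32_t(in[1]) << 8
                       | std::uint32_t(in[2]) << 16
                       | std::uint32_t(in[3]) << 24;
    float f;
    std::memcpy(&f, &bits, sizeof f);   // reinterpret the bits as a float
    return f;
}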

It's better to make some kind of class like std::stringstream.
std::stringstream is not great for saving binary data, but it works the way you want.
So I made an example that works with std::stringstream.
This code implements only serialization, but code for deserialization can be added the same way (see the sketch after the example).
// C++11
#include <sstream>
#include <string>
#include <utility>

template <typename T, decltype(std::declval<T>().to_string())* = nullptr>
std::ostream& operator<< (std::ostream& stream, T&& val)
{
    auto str = val.to_string();
    std::operator<<(stream, str);
    return stream;
}

struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;

    std::string to_string() const
    {
        std::stringstream stream;
        stream << yaw << ' ' << pitch << ' ' << roll; // separators keep the values recoverable
        return stream.str();
    }
};

void Test()
{
    std::stringstream stream;
    stream << 3 << "Hello world" << AMG_ANGLES{1.f, 2.f, 3.f };
}
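To round-trip the data, a from_string() counterpart can be sketched like this (my own illustration, not part of the original answer):

#include <sstream>
#include <string>

// assumes the AMG_ANGLES::to_string() above, with space separators
static AMG_ANGLES from_string(const std::string& s)
{
    AMG_ANGLES a{};
    std::stringstream stream(s);
    stream >> a.yaw >> a.pitch >> a.roll;   // reads the space-separated floats back
    return a;
}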

Related

How to detect UTF16 strings in PE files

I need to extract Unicode strings from a PE file. While extracting, I need to detect them first. For UTF-8 characters I used the following link - How to easily detect utf8 encoding in the string?. Is there any similar way to detect UTF-16 characters? I have tried the following code. Is this right? Please help or provide suggestions. Thanks in advance!!!
BYTE temp1 = buf[offset];
BYTE temp2 = buf[offset+1];
while (!(temp1 == 0x00 && temp2 == 0x00) && offset <= bufSize)
{
    if ((temp1 >= 0x00 && temp1 <= 0xFF) && (temp2 >= 0x00 && temp2 <= 0xFF))
    {
        tmp += 2;
    }
    else
    {
        break;
    }
    offset += 2;
    temp1 = buf[offset];
    temp2 = buf[offset+1];
    if (temp1 == 0x00 && temp2 == 0x00)
    {
        break;
    }
}
I just implemented a function for you, DecodeUtf16Char(). Basically it can do two things: either just check whether the input is valid UTF-16 (when check_only = true), or check it and return the decoded Unicode code point (32-bit). It also supports either big-endian (the default, when big_endian = true) or little-endian (big_endian = false) byte order within each two-byte UTF-16 word. bad_skip is the number of bytes to skip when a character fails to decode (invalid UTF-16); bad_value is the value used to signal that the UTF-16 couldn't be decoded (was invalid), -1 by default.
Examples of usage/tests are included after the function definition. Basically you just pass a starting pointer (ptr) and an ending pointer to this function and check the return value: if it is -1, there was an invalid UTF-16 sequence at the starting pointer; if it is not -1, the returned value contains a valid 32-bit Unicode code point. The function also advances ptr, by the number of decoded bytes for valid UTF-16, or by bad_skip bytes if it was invalid.
The function should be very fast, because it contains only a few ifs (plus a bit of arithmetic when you ask it to actually decode chars). Always place it in a header so that it is inlined into the calling function, producing very fast code. Also pass only compile-time constants for check_only and big_endian; this lets the compiler optimize away the unused decoding code.
If, for example, you just want to detect long runs of UTF-16 bytes, then iterate in a loop calling this function: the first time it returns something other than -1 marks a possible beginning, then iterate further and remember the last not-equal-to -1 position, which will be the end of the text (a sketch of this loop follows after the example output below). It is also important to pass bad_skip = 1 when searching for UTF-16 bytes, because a valid char may start at any byte.
For testing I used different characters - English ASCII, Russian chars (two-byte UTF-16), plus two 4-byte chars (two UTF-16 words each). My test appends the converted line to a test.txt file; this file is UTF-8 encoded so it is easily viewable, e.g. in Notepad. All of the code after the decoding function is just testing code and is not needed for the function to work.
The decoder needs two functions - _DecodeUtf16Char_ReadWord() (a helper) plus DecodeUtf16Char() (the main decoder). I include only one standard header, <cstdint>; if you're not allowed to include anything, just define uint8_t, uint16_t and uint32_t yourself - those are the only type definitions I use from that header.
Also, for reference, see my other post, which implements all types of conversions between UTF-8 <--> UTF-16 <--> UTF-32, both from scratch and using the standard C++ library!
#include <cstdint>

static inline bool _DecodeUtf16Char_ReadWord(
    uint8_t const * & ptrc, uint8_t const * end,
    uint16_t & r, bool const big_endian
) {
    if (ptrc + 1 >= end) {
        // No data left.
        if (ptrc < end)
            ++ptrc;
        return false;
    }
    if (big_endian) {
        r  = uint16_t(*ptrc) << 8; ++ptrc;
        r |= uint16_t(*ptrc)     ; ++ptrc;
    } else {
        r  = uint16_t(*ptrc)     ; ++ptrc;
        r |= uint16_t(*ptrc) << 8; ++ptrc;
    }
    return true;
}

static inline uint32_t DecodeUtf16Char(
    uint8_t const * & ptr, uint8_t const * end,
    bool const check_only = true, bool const big_endian = true,
    uint32_t const bad_skip = 1, uint32_t const bad_value = -1
) {
    auto ptrs = ptr, ptrc = ptr;
    uint32_t c = 0;
    uint16_t v = 0;
    if (!_DecodeUtf16Char_ReadWord(ptrc, end, v, big_endian)) {
        // No data left.
        c = bad_value;
    } else if (v < 0xD800 || v > 0xDFFF) {
        // Correct single-word symbol.
        if (!check_only)
            c = v;
    } else if (v >= 0xDC00) {
        // Disallowed UTF-16 sequence!
        c = bad_value;
    } else { // Possibly a double-word sequence.
        if (!check_only)
            c = (v & 0x3FF) << 10;
        if (!_DecodeUtf16Char_ReadWord(ptrc, end, v, big_endian)) {
            // No data left.
            c = bad_value;
        } else if ((v < 0xDC00) || (v > 0xDFFF)) {
            // Disallowed UTF-16 sequence!
            c = bad_value;
        } else {
            // Correct double-word symbol.
            if (!check_only) {
                c |= v & 0x3FF;
                c += 0x10000;
            }
        }
    }
    if (c == bad_value)
        ptr = ptrs + bad_skip; // Skip bytes.
    else
        ptr = ptrc; // Skip all consumed bytes.
    return c;
}
// --------- The code below is for testing only and is not needed for decoding ------------
#include <iostream>
#include <string>
#include <codecvt>
#include <fstream>
#include <locale>

static std::u32string DecodeUtf16Bytes(uint8_t const * ptr, uint8_t const * end) {
    std::u32string res;
    while (true) {
        if (ptr >= end)
            break;
        uint32_t c = DecodeUtf16Char(ptr, end, false, false, 2);
        if (c != uint32_t(-1))
            res.append(1, c);
    }
    return res;
}

#if (!_DLL) && (_MSC_VER >= 1900 /* VS 2015 */) && (_MSC_VER <= 1914 /* VS 2017 */)
std::locale::id std::codecvt<char16_t, char, _Mbstatet>::id;
std::locale::id std::codecvt<char32_t, char, _Mbstatet>::id;
#endif

template <typename CharT = char>
static std::basic_string<CharT> U32ToU8(std::u32string const & s) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utf_8_32_conv;
    auto res = utf_8_32_conv.to_bytes(s.c_str(), s.c_str() + s.length());
    return res;
}

template <typename WCharT = wchar_t>
static std::basic_string<WCharT> U32ToU16(std::u32string const & s) {
    std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffffUL, std::little_endian>, char32_t> utf_16_32_conv;
    auto res = utf_16_32_conv.to_bytes(s.c_str(), s.c_str() + s.length());
    return std::basic_string<WCharT>((WCharT*)(res.c_str()), (WCharT*)(res.c_str() + res.length()));
}

template <typename StrT>
void OutputString(StrT const & s) {
    std::ofstream f("test.txt", std::ios::binary | std::ios::app);
    f.write((char*)s.c_str(), size_t((uint8_t*)(s.c_str() + s.length()) - (uint8_t*)s.c_str()));
    f.write("\n\x00", sizeof(s.c_str()[0]));
}

int main() {
    std::u16string a = u"привет|мир|hello|𐐷|world|𤭢|again|русский|english";
    *((uint8_t*)(a.data() + 12) + 1) = 0xDD; // Introduce a bad utf-16 byte.
    // Also truncate by 1 byte ("... - 1" in the next line).
    OutputString(U32ToU8(DecodeUtf16Bytes((uint8_t*)a.c_str(), (uint8_t*)(a.c_str() + a.length()) - 1)));
    return 0;
}
Output:
привет|мир|hllo|𐐷|world|𤭢|again|русский|englis
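As promised above, here is a hedged sketch of that run-detection loop (FindUtf16Run is my name, not part of the decoder): it scans a raw byte buffer with check_only = true and bad_skip = 1 and remembers where decoding first and last succeeded:

#include <cstdint>
#include <cstdio>

// assumes DecodeUtf16Char() from above is in scope
static void FindUtf16Run(uint8_t const * begin, uint8_t const * end) {
    uint8_t const * first = nullptr;
    uint8_t const * last  = nullptr;
    for (uint8_t const * p = begin; p < end; ) {
        uint8_t const * at = p;   // where this attempt started
        if (DecodeUtf16Char(p, end, true, false, 1) != uint32_t(-1)) {
            if (!first) first = at;   // possible beginning of the text
            last = p;                 // extend the run past this character
        }
    }
    if (first)
        std::printf("possible UTF-16 run: bytes [%td, %td)\n", first - begin, last - begin);
}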

How to convert the template from C++ to C

I am trying to convert some C++ code to C for a compiler that can't handle C++. I'd like to convert the template below to C. This template converts a decimal integer to hexadecimal and pads the value with 0 in front if the size of the hexadecimal string is smaller than (sizeof(T)*2). The data type T can be unsigned char, char, short, unsigned short, int, unsigned int, long long, or unsigned long long.
template< typename T > std::string hexify(T i)
{
    std::stringbuf buf;
    std::ostream os(&buf);
    os << std::setfill('0') << std::setw(sizeof(T) * 2)
       << std::hex << i;
    std::cout << "sizeof(T) * 2 = " << sizeof(T) * 2
              << " buf.str() = " << buf.str()
              << " buf.str().c_str() = " << buf.str().c_str() << std::endl;
    return buf.str().c_str();
}
Thank you for your help.
Edit 1: I have tried to use the declaration
char * hexify (void data, size_t data_size)
but when I call it with an int value int_value:
char * result = hexify(int_value, sizeof(int))
it doesn't work because of:
incompatible types (void and int).
So in this case, do I have to use a macro? I haven't tried a macro because it's complicated.
C does not have templates. One solution is to pass the value as the maximum-width integer type supported (uintmax_t, Value below) along with the size of the original integer (Size). One routine can use the size to determine the number of digits to print. Another complication is that C does not provide C++'s std::string with its automatic memory management. A typical way to handle this in C is for the called function to allocate a buffer and return it to the caller, who is responsible for freeing it when done.
The code below shows a hexify function that does this, and it also shows a Hexify macro that takes a single parameter and passes both its size and its value to the hexify function.
Note that, in C, character constants such as 'A' have type int, not char, so some care is needed in providing the desired size. The code below includes an example of that.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

char *hexify(size_t Size, uintmax_t Value)
{
    // Allocate space for "0x", 2*Size digits, and a null character.
    size_t BufferSize = 2 + 2*Size + 1;
    char *Buffer = malloc(BufferSize);

    // Ensure a buffer was allocated.
    if (!Buffer)
    {
        fprintf(stderr,
            "Error, unable to allocate buffer of %zu bytes in %s.\n",
            BufferSize, __func__);
        exit(EXIT_FAILURE);
    }

    // Format the value as "0x" followed by 2*Size hexadecimal digits.
    snprintf(Buffer, BufferSize, "0x%0*" PRIxMAX, (int) (2*Size), Value);

    return Buffer;
}

/* Provide a macro that passes both the size and the value of its parameter
   to the hexify function.
*/
#define Hexify(x) (hexify(sizeof (x), (x)))

int main(void)
{
    char *Buffer;

    /* Show two examples of using the hexify function with different integer
       types. (The examples assume ASCII.)
    */
    char x = 'A';
    Buffer = hexify(sizeof x, x);
    printf("Character '%c' = %s.\n", x, Buffer); // Prints "0x41".
    free(Buffer);

    int i = 123;
    Buffer = hexify(sizeof i, i);
    printf("Integer %d = %s.\n", i, Buffer); // Prints "0x0000007b".
    free(Buffer);

    /* Show examples of using the Hexify macro, demonstrating that 'A' is an
       int value, not a char value, so it would need to be cast if a char is
       desired.
    */
    Buffer = Hexify('A');
    printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x00000041".
    free(Buffer);

    Buffer = Hexify((char) 'A');
    printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x41".
    free(Buffer);
}
You don't need templates if you step down to raw bits and bytes.
If performance is important, it is also best to roll the conversion routine out by hand, since the string-handling functions in C and C++ come with lots of slow overhead. A somewhat well-optimized version would look something like this:
char* hexify_data (char*restrict dst, const char*restrict src, size_t size)
{
    const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
    char* d = dst;

    for(size_t i=0; i<size; i++)
    {
        size_t byte = size - i - 1; // assuming little endian
        *d = NIBBLE_LOOKUP[ (src[byte]&0xF0u)>>4 ];
        d++;
        *d = NIBBLE_LOOKUP[ (src[byte]&0x0Fu)>>0 ];
        d++;
    }
    *d = '\0';

    return dst;
}
This breaks down any passed type byte by byte, using a character type - which is fine when using character types specifically. It also uses caller allocation for maximum performance. (It can also be made endianness-independent with an extra check per loop; a sketch of that follows after the example below.)
We can make the call a bit more convenient with a wrapper macro:
#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))
Full example:
#include <string.h>
#include <stdint.h>
#include <stdio.h>
#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))
char* hexify_data (char*restrict dst, const char*restrict src, size_t size)
{
const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
char* d = dst;
for(size_t i=0; i<size; i++)
{
size_t byte = size - i - 1; // assuming little endian
*d = NIBBLE_LOOKUP[ (src[byte]&0xF0u)>>4 ];
d++;
*d = NIBBLE_LOOKUP[ (src[byte]&0x0Fu)>>0 ];
d++;
}
*d = '\0';
return dst;
}
int main (void)
{
char buf[50];
int32_t i32a = 0xABCD;
puts(hexify(buf, i32a));
int32_t i32b = 0xAAAABBBB;
puts(hexify(buf, i32b));
char c = 5;
puts(hexify(buf, c));
uint8_t u8 = 100;
puts(hexify(buf, u8));
}
Output:
0000ABCD
AAAABBBB
05
64
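A sketch of the endianness-independent variant mentioned above (a C++ rendering, names are mine): probe the machine's byte order at run time and pick the byte index accordingly, instead of assuming little endian.

#include <cstdint>
#include <cstdio>

static char* hexify_data_portable(char* dst, const char* src, std::size_t size) {
    static const char NIBBLE_LOOKUP[] = "0123456789ABCDEF";
    const std::uint16_t probe = 1;
    const bool little = *reinterpret_cast<const std::uint8_t*>(&probe) == 1;
    char* d = dst;
    for (std::size_t i = 0; i < size; i++) {
        std::size_t byte = little ? size - i - 1 : i;   // the extra check per loop
        *d++ = NIBBLE_LOOKUP[(src[byte] & 0xF0u) >> 4];
        *d++ = NIBBLE_LOOKUP[src[byte] & 0x0Fu];
    }
    *d = '\0';
    return dst;
}

int main() {
    char buf[2 * sizeof(std::int32_t) + 1];
    std::int32_t v = 0xABCD;
    std::puts(hexify_data_portable(buf, reinterpret_cast<const char*>(&v), sizeof v)); // 0000ABCD
}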
An optional solution is to use a format string, like printf does.
Note that you can't return a pointer to a local variable, but you can take the buffer as an argument (here without bounds checking):
#include <stdio.h>
#include <string.h>

/* The value is passed widened to unsigned long long;
   the format string tells hexify its true width. */
char* hexify(char* result, const char* format, unsigned long long arg)
{
    int size = 0;
    if(0 == strcmp(format,"%d") || 0 == strcmp(format,"%u"))
    {
        size=4;
        sprintf(result,"%08llx", arg & 0xFFFFFFFFull);
    }
    else if(0 == strcmp(format,"%hd") || 0 == strcmp(format,"%hu"))
    {
        size=2;
        sprintf(result,"%04llx", arg & 0xFFFFull);
    }
    else if(0 == strcmp(format,"%hhd") || 0 == strcmp(format,"%hhu"))
    {
        size=1;
        sprintf(result,"%02llx", arg & 0xFFull);
    }
    else if(0 == strcmp(format,"%lld") || 0 == strcmp(format,"%llu"))
    {
        size=8;
        sprintf(result,"%016llx", arg);
    }
    //printf("size=%d", size);
    (void)size;
    return result;
}

int main()
{
    char result[256];
    printf("%s", hexify(result,"%hhu", 1));
    return 0;
}

uintx_t to const char* in freestanding c++ using GNU compiler

So I am trying to convert some integers into character arrays that my terminal can write, so I can see the values of my code's calculations for debugging purposes while it's running.
For example, if count = 57, I want the terminal to write 57;
so the char* would be an array of the characters 5 and 7.
The kicker here, though, is that this is in a freestanding environment, so that means no standard C++ library.
EDIT:
This means no std::string, no c_str, no _tostring; I can't just print integers.
The headers I have access to are iso646, stddef, float, limits, stdint, stdalign, stdarg, stdbool and stdnoreturn.
I've tried a few things, from casting the int as a const char* (which just led to random characters being displayed) to feeding my compiler different headers from the GCC collection, but they just kept needing other headers, which I continued feeding it until I did not know what header the compiler wanted.
So here is where the code needs to be used for printing:
uint8_t count = 0;
while (true)
{
    terminal_setcolor(3);
    terminal_writestring("hello\n");
    count++;
    terminal_writestring((const char*)count);
    terminal_writestring("\n");
}
Any advice on this would be greatly appreciated.
I am using a GNU g++ cross compiler targeting i686-elf, and I guess I am using C++11, since I have access to stdnoreturn.h, but it could be C++14, since I only just built the compiler with the latest GNU software dependencies.
Without the C/C++ Standard Library you have no option except writing a conversion function manually, e.g.:
template <int N>
const char* uint_to_string(
    unsigned int val,
    char (&str)[N],
    unsigned int base = 10)
{
    static_assert(N > 1, "Buffer too small");
    static const char* const digits = "0123456789ABCDEF";
    if (base < 2 || base > 16) return nullptr;
    int i = N - 1;
    str[i] = 0;
    do
    {
        --i;
        str[i] = digits[val % base];
        val /= base;
    }
    while (val != 0 && i > 0);
    return val == 0 ? str + i : nullptr;
}

template <int N>
const char* int_to_string(
    int val,
    char (&str)[N],
    unsigned int base = 10)
{
    // Output as unsigned.
    if (val >= 0) return uint_to_string(val, str, base);
    // Output as binary representation if base is not decimal.
    if (base != 10) return uint_to_string(val, str, base);
    // Output signed decimal representation.
    const char* res = uint_to_string(-val, str, base);
    // The buffer has room for a minus sign.
    if (res > str)
    {
        const auto i = res - str - 1;
        str[i] = '-';
        return str + i;
    }
    else return nullptr;
}
Usage:
char buf[100];
terminal_writestring(int_to_string(42, buf)); // Will print '42'
terminal_writestring(int_to_string(42, buf, 2)); // Will print '101010'
terminal_writestring(int_to_string(42, buf, 8)); // Will print '52'
terminal_writestring(int_to_string(42, buf, 16)); // Will print '2A'
terminal_writestring(int_to_string(-42, buf)); // Will print '-42'
terminal_writestring(int_to_string(-42, buf, 2)); // Will print '11111111111111111111111111010110'
terminal_writestring(int_to_string(-42, buf, 8)); // Will print '37777777726'
terminal_writestring(int_to_string(-42, buf, 16)); // Will print 'FFFFFFD6'
Live example: http://cpp.sh/5ras
You could declare a string and get a pointer to it:
std::string str = std::to_string(count);
str += "\n";
terminal_writestring(str.c_str());

Converting from char string to an array of uint8_t?

I'm reading a string from a file, so it's in the form of a char array. I need to tokenize the string and save each char-array token as a uint8_t hex value in an array.
char* starting = "001122AABBCC";
// ...
uint8_t ending[] = {0x00, 0x11, 0x22, 0xAA, 0xBB, 0xCC};
How can I convert from starting to ending? Thanks.
Here is a complete working program. It is based on Rob I's solution, but fixes several problems and has been tested to work.
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>

const char* starting = "001122AABBCC";

int main()
{
    std::string starting_str = starting;
    std::vector<unsigned char> ending;
    ending.reserve(starting_str.size());

    for (size_t i = 0; i < starting_str.length(); i += 2) {
        std::string pair = starting_str.substr(i, 2);
        ending.push_back(::strtol(pair.c_str(), 0, 16));
    }

    for (size_t i = 0; i < ending.size(); ++i) {
        printf("0x%X\n", ending[i]);
    }
}
strtoul will convert text in any base you choose into bytes. You have to do a little work to chop the input string into individual digits, or you can convert 32 or 64 bits at a time.
P.S. uint8_t ending[] = {0x00,0x11,0x22,0xAA,0xBB,0xCC}
doesn't mean anything by itself - you aren't storing the data in a uint8_t as 'hex', you are storing bytes; it's up to you (or your debugger) how to interpret the binary data.
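To illustrate the "32 bits at a time" remark, a small sketch (my own, using strtoul on up to 8 hex digits per call and then peeling off the bytes):

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const char* starting = "001122AABBCC";
    const std::size_t len = std::strlen(starting);
    for (std::size_t i = 0; i < len; i += 8) {
        char chunk[9] = {0};                           // up to 8 hex digits = 32 bits
        const std::size_t n = (len - i < 8) ? len - i : 8;
        std::memcpy(chunk, starting + i, n);
        unsigned long v = std::strtoul(chunk, nullptr, 16);
        for (std::size_t j = 0; j < n / 2; ++j)        // most significant byte first
            std::printf("0x%02lX\n", (v >> (8 * (n / 2 - 1 - j))) & 0xFF);
    }
}

This prints 0x00, 0x11, 0x22, 0xAA, 0xBB, 0xCC for the example input.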
With C++11, you may use std::stoi for that:
std::vector<uint8_t> convert(const std::string& s)
{
    if (s.size() % 2 != 0) {
        throw std::runtime_error("Bad size argument");
    }
    std::vector<uint8_t> res;
    res.reserve(s.size() / 2);
    for (std::size_t i = 0, size = s.size(); i != size; i += 2) {
        std::size_t pos = 0;
        res.push_back(std::stoi(s.substr(i, 2), &pos, 16));
        if (pos != 2) {
            throw std::runtime_error("bad character in argument");
        }
    }
    return res;
}
I think any canonical answer (w.r.t. the bounty notes) would involve some distinct phases in the solution:
Error checking for valid input
Length check and
Data content check
Element conversion
Output creation
Given the usefulness of such conversions, the solution should probably include some flexibility w.r.t. the types being used and the locale required.
From the outset, given the date of the request for a "more canonical answer" (circa August 2014), liberal use of C++11 will be applied.
An annotated version of the code, with types corresponding to the OP:
std::vector<std::uint8_t> convert(std::string const& src)
{
    // error check on the length
    if ((src.length() % 2) != 0) {
        throw std::invalid_argument("conversion error: input is not even length");
    }

    auto ishex = [] (decltype(*src.begin()) c) {
        return std::isxdigit(c, std::locale()); };

    // error check on the data contents
    if (!std::all_of(std::begin(src), std::end(src), ishex)) {
        throw std::invalid_argument("conversion error: input values are not all xdigits");
    }

    // allocate the result, initialised to 0 and sized to the correct length
    std::vector<std::uint8_t> result(src.length() / 2, 0);

    // run the actual conversion
    auto str = src.begin(); // track the location in the string
    std::for_each(result.begin(), result.end(), [&str](decltype(*result.begin())& element) {
        element = static_cast<std::uint8_t>(std::stoul(std::string(str, str + 2), nullptr, 16));
        std::advance(str, 2); // next two elements
    });

    return result;
}
The template version of the code adds flexibility:
template <typename Int /*= std::uint8_t*/,
        typename Char = char,
        typename Traits = std::char_traits<Char>,
        typename Allocate = std::allocator<Char>,
        typename Locale = std::locale>
std::vector<Int> basic_convert(std::basic_string<Char, Traits, Allocate> const& src, Locale locale = Locale())
{
    using string_type = std::basic_string<Char, Traits, Allocate>;

    auto ishex = [&locale] (decltype(*src.begin()) c) {
        return std::isxdigit(c, locale); };

    if ((src.length() % 2) != 0) {
        throw std::invalid_argument("conversion error: input is not even length");
    }
    if (!std::all_of(std::begin(src), std::end(src), ishex)) {
        throw std::invalid_argument("conversion error: input values are not all xdigits");
    }

    std::vector<Int> result(src.length() / 2, 0);

    auto str = std::begin(src);
    std::for_each(std::begin(result), std::end(result), [&str](decltype(*std::begin(result))& element) {
        element = static_cast<Int>(std::stoul(string_type(str, str + 2), nullptr, 16));
        std::advance(str, 2);
    });

    return result;
}
The convert() function can then be based on the basic_convert() as follows:
std::vector<std::uint8_t> convert(std::string const& src)
{
    return basic_convert<std::uint8_t>(src, std::locale());
}
uint8_t is typically no more than a typedef of unsigned char. If you're reading characters from a file, you should be able to read them into an unsigned char array just as easily as into a signed char array, and an unsigned char array is a uint8_t array.
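For instance, a minimal sketch (the file name and buffer size are illustrative):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint8_t buf[64];
    std::FILE* f = std::fopen("data.bin", "rb");
    if (!f) return 1;
    std::size_t n = std::fread(buf, 1, sizeof buf, f); // fills the uint8_t array exactly like an unsigned char array
    std::fclose(f);
    if (n > 0)
        std::printf("read %zu bytes, first = 0x%02X\n", n, buf[0]);
}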
I'd try something like this:
std::string starting_str = starting;
uint8_t* ending = new uint8_t[starting_str.length()/2];

for (size_t i = 0; i < starting_str.length(); i += 2) {
    std::string pair = starting_str.substr(i, 2);
    ending[i/2] = ::strtol(pair.c_str(), 0, 16);
}
Didn't test it but it looks good to me...
You may add your own conversion from the set of chars { '0','1',...,'E','F' } to uint8_t:
uint8_t ctoa(char c)
{
    if( c >= '0' && c <= '9' ) return c - '0';
    else if( c >= 'a' && c <= 'f' ) return 0xA + c - 'a';
    else if( c >= 'A' && c <= 'F' ) return 0xA + c - 'A';
    else return 0;
}
Then it is easy to convert a string into the array:
uint32_t endingSize = strlen(starting)/2;
uint8_t* ending = new uint8_t[endingSize];
for( uint32_t i=0; i<endingSize; i++ )
{
    ending[i] = ( ctoa( starting[i*2] ) << 4 ) + ctoa( starting[i*2+1] );
}
This simple solution should work for your problem:
char* starting = "001122AABBCC";
uint8_t ending[6];

// This algo will work for any size of starting;
// however, you have to make sure that ending has enough space.
int i = 0;
while (i < strlen(starting))
{
    // take two hex characters at a time
    char str[3] = { starting[i], starting[i+1], '\0' };
    // convert the string to an int in base 16
    ending[i/2] = (uint8_t)strtol(str, NULL, 16);
    i += 2;
}
uint8_t* ending = reinterpret_cast<uint8_t*>(starting);

Find the first occurence of char c in char *s or return -1

For a homework assignment, I need to implement a function which takes a char *s and a char c and returns the index of c if found, and -1 otherwise.
Here's my first try:
int IndexOf(const char *s, char c) {
    for (int i = 0; *s != '\0'; ++i, ++s) {
        if (*s == c) {
            return i;
        }
    }
    return -1;
}
Is that an okay implementation, or are there things to improve?
EDIT: Sorry, I didn't mention that I should only use pointer arithmetic/dereferencing, not something like s[i]. Also, no use of the Standard Library is allowed.
Yes, it's fine, but you could increment only one variable:
int IndexOf(const char *s, char c) {
    for (int i = 0; s[i] != '\0'; ++i) {
        if (s[i] == c) {
            return i;
        }
    }
    return -1;
}
Won't make any serious difference though, mostly a matter of taste.
Looks fine to me, at least given the signature. Just to add to the "many slightly different ways to do it" roadshow:
int IndexOf(const char *s, const char c) {
    for (const char *p = s; *p != 0; ++p) {
        if (*p == c) return p - s;
    }
    return -1;
}
Slight issue - p - s isn't guaranteed to work if the result is sufficiently big, and it certainly goes wrong here if the correct result is bigger than INT_MAX. To fix this:
size_t IndexOf(const char *s, const char c) {
    for (size_t idx = 0; s[idx] != 0; ++idx) {
        if (s[idx] == c) return idx;
    }
    return SIZE_MAX;
}
As sharptooth says, if for some didactic reason you're not supposed to use the s[i] syntax, then *(s+i) is the same.
Note the slightly subtle point that because the input is required to be nul-terminated, the first occurrence of c cannot be at index SIZE_MAX unless c is 0 (and even then we're talking about a rather unusual C implementation). So it's OK to use SIZE_MAX as a magic value.
All the size issues can be avoided by returning a pointer to the found character (or null) instead of an index (or -1):
char *findchr(const char *s, const char c) {
    while (*s) {
        if (*s == c) return (char *)s;
        ++s;
    }
    return 0;
}
Instead you get an issue with const-safety, the same as the issue that the standard function strchr has with const-safety, and that can be fixed by providing const and non-const overloads.
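A sketch of that const-correct pair of overloads (mirroring what C++'s <cstring> does for strchr):

const char* findchr(const char* s, char c) {
    for (; *s; ++s)
        if (*s == c) return s;
    return nullptr;
}

char* findchr(char* s, char c) {
    // the cast is safe here: the caller's string really is non-const
    return const_cast<char*>(findchr(static_cast<const char*>(s), c));
}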
Here's a way to do it without keeping track of the index:
int IndexOf(const char *s, char c) {
    const char *p = s;
    while (*p != '\0') {
        if (*p == c) {
            return p - s;
        }
        ++p;
    }
    return -1;
}
This is not necessarily better than your solution. Just demonstrating another way to use pointer arithmetic.
FWIW, I would define the function to return size_t rather than int. Also, for real-world usage (not homework), you would probably want to consider what the proper behavior should be if s is a NULL pointer.
Yours is perfectly fine, as far as it goes. You should also write a simple test program that tests for the first char, last char, and a missing char.
Piling on to the 'other ways to do it' group, here is one with no break, a single return, and showing off pointer arithmetic. But beware: if I were grading your homework, I would grade yours higher than mine. Yours is clear and maintainable; mine needlessly uses ?: and pointer subtraction.
#include <stdio.h>

int IndexOf(const char *s, const char c)
{
    const char * const p = s;
    while(*s && *s != c) s++;
    return (*s) ? s-p : -1;
}

#ifdef TEST
int main()
{
    printf("hello, h: %d\n", IndexOf("hello", 'h'));
    printf("hello, g: %d\n", IndexOf("hello", 'g'));
    printf("hello, o: %d\n", IndexOf("hello", 'o'));
    printf("hello, 0: %d\n", IndexOf("hello", 0));
}
#endif
The output of this program is:
hello, h: 0
hello, g: -1
hello, o: 4
hello, 0: -1
There's a typo (index instead of i), but otherwise it looks fine. I doubt you'd be able to do much better than this (both in terms of efficiency and code clarity).
Yes, you should return i, not index. I think it's just a typo.
Another variant, as an old school C programmer may write it:
int IndexOf(const char *s, char c) {
    int i = 0;
    while (s[i] && (s[i] != c)) ++i;
    return (s[i] == c) ? i : -1;
}
Benefits: short, only one variable, only one return point, no break (considered harmful by some people).
For clarity I would probably go for the one below:
int IndexOf(const char *s, char c) {
    int result = -1;
    for (int i = 0; s[i] != 0; ++i) {
        if (s[i] == c) {
            result = i;
            break;
        }
    }
    return result;
}
It uses a break, but has only one return point, and is still short.
You may also notice I used a plain 0 instead of '\0', just to remind you that char is a numeric type and that single quotes are just a shorthand for converting letters to their values. Obviously, comparing to 0 can also be replaced by ! in C.
EDIT:
If only pointer arithmetic is allowed, this does not change much... really, s[i] is pointer arithmetic... but you can rewrite it as *(s+i) if you prefer (or even i[s] if you like obfuscation):
int IndexOf(const char *s, char c) {
    int result = -1;
    for (int i = 0; *(s+i) != 0; ++i) {
        if (*(s+i) == c) {
            result = i;
            break;
        }
    }
    return result;
}
For a version that works for most cases on x86 systems, one can use:
#include <stdint.h>

int IndexOf(char *s, char sr)
{
    uint32_t *x = (uint32_t*)s;
    uint32_t msk[] = { 0xff, 0xff00, 0xff0000, 0xff000000 };
    uint32_t f[4] = { (uint32_t)sr, (uint32_t)sr << 8, (uint32_t)sr << 16, (uint32_t)sr << 24 };
    uint32_t c[4], m;

    for (;;) {
        m = *x;
        c[0] = m & msk[0]; if (!c[0]) break; if (c[0] == f[0]) return (char*)x - s;
        c[1] = m & msk[1]; if (!c[1]) break; if (c[1] == f[1]) return (char*)x - s + 1;
        c[2] = m & msk[2]; if (!c[2]) break; if (c[2] == f[2]) return (char*)x - s + 2;
        c[3] = m & msk[3]; if (!c[3]) break; if (c[3] == f[3]) return (char*)x - s + 3;
        x++;
    }
    return -1;
}
Limitations:
It breaks if the string is shorter than four bytes and its address is closer than four bytes to the end of an MMU page.
Also, the mask pattern is little-endian; for big-endian systems the order of the msk[] and f[] arrays has to be reversed.
In addition, if the hardware can't do misaligned multi-byte accesses (x86 can), it will also fail if the string doesn't start at an address that's a multiple of four.
All of these are solvable with more elaborate versions, if you wish...
Why would you ever want to do weird things like that - what's the purpose?
One does so for optimization. A char-by-char check is simple to code and understand, but optimal performance, at least for strings above a certain length, tends to require operations on larger blocks of data. Your standard library code will contain some such "funny" things for that reason. If you compare larger blocks in a single operation (and with e.g. SSE2 instructions one can extend this to 16 bytes at a time), more work gets done in the same time.
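For a flavour of the SSE2 remark, here is a hedged sketch (assumes an x86 target with SSE2 and GCC/Clang for __builtin_ctz) that tests 16 bytes in one shot:

#include <emmintrin.h>
#include <cstdio>

int main() {
    const char buf[16] = "find the x here";                      // 15 chars + '\0'
    __m128i chunk  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf));
    __m128i needle = _mm_set1_epi8('x');                         // broadcast the target byte
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)); // bit i set => buf[i] == 'x'
    if (mask)
        std::printf("first 'x' at index %d\n", __builtin_ctz(mask));
}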