Converting from char string to an array of uint8_t?


I'm reading a string from a file so it's in the form of a char array. I need to tokenize the string and save each char array token as a uint8_t hex value in an array.
const char* starting = "001122AABBCC";
// ...
uint8_t ending[] = {0x00,0x11,0x22,0xAA,0xBB,0xCC};
How can I convert from starting to ending? Thanks.

Here is a complete working program. It is based on Rob I's solution, but fixes several problems and has been tested to work.
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
const char* starting = "001122AABBCC";
int main()
{
std::string starting_str = starting;
std::vector<unsigned char> ending;
ending.reserve( starting_str.size());
for (int i = 0 ; i < starting_str.length() ; i+=2) {
std::string pair = starting_str.substr( i, 2 );
ending.push_back(::strtol( pair.c_str(), 0, 16 ));
}
for(int i=0; i<ending.size(); ++i) {
printf("0x%X\n", ending[i]);
}
}

strtoul will convert text in any base you choose into an integer value. You have to do a little work to chop the input string into two-digit pieces, or you can convert 32 or 64 bits at a time.
P.S. The initializer {0x00,0x11,0x22,0xAA,0xBB,0xCC} doesn't mean anything special: you aren't storing the data in a uint8_t as 'hex', you are storing bytes. It's up to you (or your debugger) how the binary data is interpreted.
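To make the "32 bits at a time" idea concrete, here is a minimal sketch. The helper name hex_to_bytes is made up for illustration, and it assumes the input has an even length and contains only hex digits (no validation is done). It converts up to eight digits per strtoul call and then splits the value into bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: assumes s has even length and only hex digits.
std::vector<std::uint8_t> hex_to_bytes(const std::string& s)
{
    std::vector<std::uint8_t> out;
    out.reserve(s.size() / 2);
    for (std::size_t i = 0; i < s.size(); i += 8) {
        // Take up to 8 hex digits (32 bits) and convert them in one call.
        const std::string chunk = s.substr(i, 8);
        const unsigned long value = std::strtoul(chunk.c_str(), nullptr, 16);
        // Split the value into bytes, most significant first.
        const std::size_t nbytes = chunk.size() / 2;
        for (std::size_t b = 0; b < nbytes; ++b) {
            const std::size_t shift = 8 * (nbytes - 1 - b);
            out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
        }
    }
    return out;
}
```

Calling hex_to_bytes("001122AABBCC") should give six bytes. The fourth one compares equal to both 0xAA and 170, which is the point above: hex is only a way of writing the value, not a storage format.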

With C++11, you may use std::stoi for that:
std::vector<uint8_t> convert(const std::string& s)
{
if (s.size() % 2 != 0) {
throw std::runtime_error("Bad size argument");
}
std::vector<uint8_t> res;
res.reserve(s.size() / 2);
for (std::size_t i = 0, size = s.size(); i != size; i += 2) {
std::size_t pos = 0;
res.push_back(std::stoi(s.substr(i, 2), &pos, 16));
if (pos != 2) {
throw std::runtime_error("bad character in argument");
}
}
return res;
}

I think any canonical answer (w.r.t. the bounty notes) would involve some distinct phases in the solution:
Error checking for valid input
Length check and
Data content check
Element conversion
Output creation
Given the usefulness of such conversions, the solution should probably include some flexibility w.r.t. the types being used and the locale required.
From the outset, given the date of the request for a "more canonical answer" (circa August 2014) liberal use of C++11 will be applied.
An annotated version of the code, with types corresponding to the OP:
std::vector<std::uint8_t> convert(std::string const& src)
{
// error check on the length
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
auto ishex = [] (decltype(*src.begin()) c) {
return std::isxdigit(c, std::locale()); };
// error check on the data contents
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
// allocate the result, initialised to 0 and size it to the correct length
std::vector<std::uint8_t> result(src.length() / 2, 0);
// run the actual conversion
auto str = src.begin(); // track the location in the string
std::for_each(result.begin(), result.end(), [&str](decltype(*result.begin())& element) {
element = static_cast<std::uint8_t>(std::stoul(std::string(str, str + 2), nullptr, 16));
std::advance(str, 2); // next two elements
});
return result;
}
The template version of the code adds flexibility:
template <typename Int /*= std::uint8_t*/,
typename Char = char,
typename Traits = std::char_traits<Char>,
typename Allocate = std::allocator<Char>,
typename Locale = std::locale>
std::vector<Int> basic_convert(std::basic_string<Char, Traits, Allocate> const& src, Locale locale = Locale())
{
using string_type = std::basic_string<Char, Traits, Allocate>;
auto ishex = [&locale] (decltype(*src.begin()) c) {
return std::isxdigit(c, locale); };
if ((src.length() % 2) != 0) {
throw std::invalid_argument("conversion error: input is not even length");
}
if (!std::all_of(std::begin(src), std::end(src), ishex)) {
throw std::invalid_argument("conversion error: input values are not all xdigits");
}
std::vector<Int> result(src.length() / 2, 0);
auto str = std::begin(src);
std::for_each(std::begin(result), std::end(result), [&str](decltype(*std::begin(result))& element) {
element = static_cast<Int>(std::stoul(string_type(str, str + 2), nullptr, 16));
std::advance(str, 2);
});
return result;
}
The convert() function can then be based on the basic_convert() as follows:
std::vector<std::uint8_t> convert(std::string const& src)
{
return basic_convert<std::uint8_t>(src, std::locale());
}

uint8_t is typically no more than a typedef of an unsigned char. If you're reading characters from a file, you should be able to read them into an unsigned char array just as easily as a signed char array, and an unsigned char array is a uint8_t array.
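A small sketch for checking this on a given platform. The function names are made up for illustration, and the typedef relationship is an assumption about mainstream compilers rather than something the standard guarantees.

```cpp
#include <cstdint>
#include <type_traits>

// Reports whether std::uint8_t is the same type as unsigned char on the
// current platform (an assumption that holds on mainstream compilers,
// not a guarantee).
constexpr bool uint8_is_unsigned_char()
{
    return std::is_same<std::uint8_t, unsigned char>::value;
}

// Views a char buffer as raw bytes. unsigned char may alias any object,
// so reading the characters through this pointer is allowed.
inline const unsigned char* as_bytes(const char* p)
{
    return reinterpret_cast<const unsigned char*>(p);
}
```

Note that this only reinterprets the characters that were read. It does not turn the text "AA" into the single byte 0xAA; that still needs one of the conversions shown in the other answers.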

I'd try something like this:
std::string starting_str = starting;
uint8_t* ending = new uint8_t[starting_str.length()/2];
for (std::size_t i = 0 ; i < starting_str.length() ; i+=2) {
// substr takes (position, count), so each pair is 2 characters long
std::string pair = starting_str.substr( i, 2 );
ending[i/2] = ::strtol( pair.c_str(), 0, 16 );
}
Didn't test it but it looks good to me...

You can add your own conversion from the set of chars { '0','1',...,'E','F' } to uint8_t:
uint8_t ctoa(char c)
{
if( c >= '0' && c <= '9' ) return c - '0';
else if( c >= 'a' && c <= 'f' ) return 0xA + c - 'a';
else if( c >= 'A' && c <= 'F' ) return 0xA + c - 'A';
else return 0;
}
Then it will be easy to convert a string into an array:
uint32_t endingSize = strlen(starting)/2;
uint8_t* ending = new uint8_t[endingSize];
for( uint32_t i=0; i<endingSize; i++ )
{
ending[i] = ( ctoa( starting[i*2] ) << 4 ) + ctoa( starting[i*2+1] );
}

This simple solution should work for your problem:
const char* starting = "001122AABBCC";
uint8_t ending[6];
// This works for any even-length starting string.
// However, you have to make sure that ending has room for strlen(starting) / 2 bytes.
size_t i = 0;
while (i + 1 < strlen(starting))
{
// copy two hex digits into a null-terminated string
char str[3] = "";
str[0] = starting[i];
str[1] = starting[i + 1];
// convert the pair from base 16 to a byte
ending[i / 2] = (uint8_t)strtol(str, NULL, 16);
i += 2;
}

const uint8_t* ending = reinterpret_cast<const uint8_t*>(starting);

Related

longest palindromic substring. Error: AddressSanitizer, heap overflow

#include<string>
#include<cstring>
class Solution {
void shift_left(char* c, const short unsigned int bits) {
const unsigned short int size = sizeof(c);
memmove(c, c+bits, size - bits);
memset(c+size-bits, 0, bits);
}
public:
string longestPalindrome(string s) {
char* output = new char[s.length()];
output[0] = s[0];
string res = "";
char* n = output;
auto e = s.begin() + 1;
while(e != s.end()) {
char letter = *e;
char* c = n;
(*++n) = letter;
if((letter != *c) && (c == &output[0] || letter != (*--c)) ) {
++e;
continue;
}
while((++e) != s.end() && c != &output[0]) {
if((letter = *e) != (*--c)) {
const unsigned short int bits = c - output + 1;
shift_left(output, bits);
n -= bits;
break;
}
(*++n) = letter;
}
string temp(output);
res = temp.length() > res.length()? temp : res;
shift_left(output, 1);
--n;
}
return res;
}
};
The input is longestPalindrome("babad");
The program works fine and prints "bab" as the longest palindrome, but there's a heap overflow somewhere. An error like this appears:
Read of size 6 at ...memory address... thread T0
"babad" has size 5, and after going over this for an hour I don't see the point where the iteration ever exceeds 5.
There are 3 pointers here that iterate:
e, the current element of string s;
n, the pointer to the next char of output;
and c, a copy of n that decrements until it reaches the address &output[0].
Maybe it's something to do with memmove or memset, since I've never used them before.
I'm completely lost.

TL;DR: mixing char* and std::string is not a good idea if you don't understand exactly how each works.
If you want the length of a string, you can't do this: const unsigned short int size = sizeof(c); (sizeof returns the size of the pointer, which is commonly 4 on a 32-bit machine and 8 on a 64-bit machine). You must do this instead: const size_t size = strlen(c);
AddressSanitizer is right that you are (indirectly) trying to read memory which doesn't belong to you.
How does the constructor of string from char* work?
Answer: the char* is treated as a C-style string, which means that it must be null ('\0') terminated.
More details: the constructor of string from char* calls a strlen-like function, which looks roughly like this:
https://en.cppreference.com/w/cpp/string/byte/strlen
int strlen(char *begin){
int k = 0;
while (*begin != '\0'){
++k;
++begin;
}
return k;
}
If the C-style char* string does not contain '\0', this reads memory which doesn't belong to you.
How to fix it?
Answer (two options):
don't mix char* and std::string;
replace char* output = new char[s.length()]; with char* output = new char[s.length() + 1]; memset(output, 0, s.length() + 1);
Also, you must delete all memory that you allocated with new, so add delete[] output; before return res;
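The points above can be sketched in isolation. The three helper names below are made up for illustration and none of this is the asker's algorithm. It only shows what sizeof reports for a pointer, how to build a std::string from a buffer with no terminator by passing the length, and the "+ 1 and zero-fill" allocation paired with delete[].

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// sizeof applied to a pointer yields the pointer's size, not the text length.
inline std::size_t pointer_size(const char* p) { return sizeof(p); }

// Builds a std::string from a buffer that has no terminating '\0' by
// passing the length explicitly, so no strlen-style scan takes place.
inline std::string from_unterminated(const char* buf, std::size_t len)
{
    return std::string(buf, len);
}

// Allocates len + 1 zeroed bytes, copies the text in, and relies on the
// extra byte as the terminator; the buffer is released before returning.
inline std::string via_terminated_buffer(const std::string& s)
{
    char* output = new char[s.length() + 1]();   // () zero-initializes
    std::memcpy(output, s.data(), s.length());
    std::string result(output);                  // safe: '\0' is present
    delete[] output;
    return result;
}
```

If the mixture of char* and std::string has to stay, the explicit-length constructor is probably the least intrusive change, because it never reads past the bytes you name.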

Sorting string vector using integer values at the end of the string in C++

I have a directory containing files {"good_6", "good_7", "good_8", ..., "good_660"}. After reading it using readdir and storing the names in a vector, I get {"good_10", "good_100", "good_101", "good_102", ...}.
What I want is to keep the file names as {"good_6", "good_7", "good_8", ..., "good_660"} in the vector and then replace the first name with 1, the second with 2, and so on, such that good_6 will be 1, good_7 will be 2, etc. But right now good_10 corresponds to 1, good_100 to 2, and so on.
I tried std::sort on the vector, but the values are already sorted, just not in the way I want (based on the integer after _). Even if I just take the last integer and sort on that, it will still be sorted as 1, 100, 101...
Any help would be appreciated. Thanks.

You can use a custom function that compares strings with a special case for digits:
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
int natural_string_cmp(const char *sa, const char *sb) {
for (;;) {
int a = (unsigned char)*sa++;
int b = (unsigned char)*sb++;
/* simplistic version with overflow issues */
if (isdigit(a) && isdigit(b)) {
const char *sa1 = sa - 1;
const char *sb1 = sb - 1;
unsigned long na = strtoul(sa1, (char **)&sa, 10);
unsigned long nb = strtoul(sb1, (char **)&sb, 10);
if (na == nb) {
if ((sa - sa1) == (sb - sb1)) {
/* XXX should check for '.' */
continue;
} else {
/* Perform regular strcmp to handle 0 :: 00 */
return strcmp(sa1, sb1);
}
} else {
return (na < nb) ? -1 : +1;
}
} else {
if (a == b) {
if (a != '\0')
continue;
else
return 0;
} else {
return (a < b) ? -1 : 1;
}
}
}
}
Depending on your sorting algorithm, you may need to wrap it with an extra level of indirection:
int natural_string_cmp_ind(const void *p1, const void *p2) {
return natural_string_cmp(*(const char * const *)p1, *(const char * const *)p2);
}
char *array[size];
... // array is initialized with filenames
qsort(array, size, sizeof(*array), natural_string_cmp_ind);

I think you can play around with your data structure. For example, instead of vector<string>, you can convert your data to vector< pair<int, string> >. Then {"good_6", "good_7", "good_8", ..., "good_660"} becomes {(6, "good"), (7, "good"), (8, "good"), ..., (660, "good")}. Sort that, and at the end convert it back and do whatever you want.
Another way is to define your own comparator that does exactly the comparison you want.
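A minimal sketch of the pair idea, under the assumption that every name looks like "<prefix>_<number>" (there is no error checking, and the function name sort_by_suffix is made up). Unlike the description above, it stores the full name next to the number, so nothing has to be glued back together afterwards.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sketch: assumes every name has the form "<prefix>_<number>".
std::vector<std::string> sort_by_suffix(const std::vector<std::string>& names)
{
    std::vector<std::pair<int, std::string>> keyed;
    keyed.reserve(names.size());
    for (const std::string& name : names) {
        const std::size_t pos = name.rfind('_');
        // Key each name by the integer after the last underscore.
        keyed.emplace_back(std::stoi(name.substr(pos + 1)), name);
    }
    // Pairs compare by the number first, then by the name.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (const auto& entry : keyed) {
        sorted.push_back(entry.second);
    }
    return sorted;
}
```

After the call, the position of a name in the returned vector (plus one) is the replacement number the question asks for.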

You can strip the "good_" prefix and use stoi to convert the remaining integral part of the string. (Use substr for this rather than string::replace, because replace modifies the vector element in place.) Let's say the value obtained is x.
Create a std::map and populate it this way: myMap[x] = vec_element.
Then you can traverse from myMap.begin() to myMap.end() to get the sorted order.
Code:
for( size_t i = 0; i < vec.size(); ++i ) {
myMap[ stoi( vec[i].substr(5) ) ] = vec[i];
}
for( MapType::iterator it = myMap.begin(); it != myMap.end(); ++it ) {
sortedVec.push_back( it->second );
}

If I understand your question, you're just having trouble with the sorting and not how you plan to change the names after you sort.
Something like this might work for you:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <random>
#include <tuple>
#include <cstdio>
int main()
{
std::vector<std::string> v;
char buffer[64] = {};
for (int i = 1; i < 10; ++i)
{
std::snprintf(buffer, sizeof buffer, "good_%d", i * 3);
v.push_back(buffer);
std::snprintf(buffer, sizeof buffer, "bad_%d", i * 2);
v.push_back(buffer);
}
// std::random_shuffle was removed in C++17; std::shuffle replaces it
std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});
for (const auto& s : v)
{
std::cout << s << "\n";
}
std::sort(v.begin(), v.end(),
[](const std::string& lhs, const std::string& rhs)
{
//This assumes a lot about the contents of the strings
//and has no error checking just to keep things short.
size_t l_pos = lhs.find('_');
size_t r_pos = rhs.find('_');
std::string l_str = lhs.substr(0, l_pos);
std::string r_str = rhs.substr(0, r_pos);
int l_num = std::stoi(lhs.substr(l_pos + 1));
int r_num = std::stoi(rhs.substr(r_pos + 1));
return std::tie(l_str, l_num) < std::tie(r_str, r_num);
});
std::cout << "-----\n";
for (const auto& s : v)
{
std::cout << s << "\n";
}
return 0;
}

Managed to do it with the following compare function:
bool numericStringCompare(const std::string& s1, const std::string& s2)
{
// extract the digits between the last '_' and the last '.'
size_t foundUnderScore = s1.find_last_of("_");
size_t foundDot = s1.find_last_of(".");
std::string s11 = s1.substr(foundUnderScore+1, foundDot - foundUnderScore - 1);
foundUnderScore = s2.find_last_of("_");
foundDot = s2.find_last_of(".");
std::string s22 = s2.substr(foundUnderScore+1, foundDot-foundUnderScore - 1);
int i1 = std::stoi(s11);
int i2 = std::stoi(s22);
return i1 < i2;
}
The full file name was good_0.png, hence the find_last_of(".").

uintx_t to const char* in freestanding c++ using GNU compiler

I am trying to convert some integers into character arrays that my terminal can write, so I can see the values of my code's calculations for debugging purposes while it's running.
For example, if int_t count = 57, I want the terminal to write 57,
so the char* would be an array of the characters 5 and 7.
The kicker here, though, is that this is in a freestanding environment, which means no standard C++ library.
EDIT:
This means no std::string, no c_str, no _tostring; I can't just print integers.
The headers I have access to are iso646, stddef, float, limits, stdint, stdalign, stdarg, stdbool and stdnoreturn.
I've tried a few things, from casting the int to a const char*, which just led to random characters being displayed, to feeding my compiler different headers from the GCC collection, but they just kept needing other headers, which I continued feeding it until I no longer knew what header the compiler wanted.
Here is where the value needs to be printed:
uint8_t count = 0;
while (true)
{
terminal_setcolor(3);
terminal_writestring("hello\n");
count++;
terminal_writestring((const char*)count);
terminal_writestring("\n");
}
Any advice with this would be greatly appreciated.
I am using a GNU g++ cross compiler targeted at 686-elf, and I guess I am using C++11 since I have access to stdnoreturn.h, but it could be C++14 since I only just built the compiler with the latest GNU software dependencies.

Without the C/C++ standard library you have no option except writing the conversion function manually, e.g.:
template <int N>
const char* uint_to_string(
unsigned int val,
char (&str)[N],
unsigned int base = 10)
{
static_assert(N > 1, "Buffer too small");
static const char* const digits = "0123456789ABCDEF";
if (base < 2 || base > 16) return nullptr;
int i = N - 1;
str[i] = 0;
do
{
--i;
str[i] = digits[val % base];
val /= base;
}
while (val != 0 && i > 0);
return val == 0 ? str + i : nullptr;
}
template <int N>
const char* int_to_string(
int val,
char (&str)[N],
unsigned int base = 10)
{
// Output as unsigned.
if (val >= 0) return uint_to_string(val, str, base);
// Output as binary representation if base is not decimal.
if (base != 10) return uint_to_string(val, str, base);
// Output signed decimal representation.
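// Negate in unsigned arithmetic so that the most negative int does not overflow.
const char* res = uint_to_string(0u - static_cast<unsigned int>(val), str, base);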
// Buffer has place for minus sign
if (res > str)
{
const auto i = res - str - 1;
str[i] = '-';
return str + i;
}
else return nullptr;
}
Usage:
char buf[100];
terminal_writestring(int_to_string(42, buf)); // Will print '42'
terminal_writestring(int_to_string(42, buf, 2)); // Will print '101010'
terminal_writestring(int_to_string(42, buf, 8)); // Will print '52'
terminal_writestring(int_to_string(42, buf, 16)); // Will print '2A'
terminal_writestring(int_to_string(-42, buf)); // Will print '-42'
terminal_writestring(int_to_string(-42, buf, 2)); // Will print '11111111111111111111111111010110'
terminal_writestring(int_to_string(-42, buf, 8)); // Will print '37777777726'
terminal_writestring(int_to_string(-42, buf, 16)); // Will print 'FFFFFFD6'
Live example: http://cpp.sh/5ras

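You could declare a string and get a pointer to its contents (this needs std::string and std::to_string, so it only works where a hosted standard library is available):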
std::string str = std::to_string(count);
str += "\n";
terminal_writestring(str.c_str());

String Formatting using C / C++

Recently I was asked in an interview to convert the string "aabbbccccddddd" to "a2b3c4d5". The goal is to replace each repeated character with a single occurrence and a repeat count. Here 'a' is repeated twice in the input, so we have to write it as 'a2' in the output. Also I need to write a function to reverse the format back to the original one (e.g. from the string "a2b3c4d5" to "aabbbccccddddd"). I was free to use either C or C++. I wrote the below code, but the interviewer seemed to be not very happy with this. He asked me to try a smarter way than this.
In the below code, I used formatstring() to eliminate repeated chars by just adding the repeated count and used reverseformatstring() to convert back to the original string.
void formatstring(char* target, const char* source) {
int charRepeatCount = 1;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// Compare the current char with previous one,
// increment repeat count
if (*source == *(source-1)) {
charRepeatCount++;
source++;
} else {
if (charRepeatCount > 1) {
// Convert repeat count to string, append to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
charRepeatCount = 1; // Reset repeat count
}
*target = *source;
source++; target++;
}
}
}
if (charRepeatCount > 1) {
// Convert repeat count to string, append it to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
}
*target = '\0';
}
void reverseformatstring(char* target, const char* source) {
int charRepeatCount = 0;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// If current char is alpha, add it to the target
if (isalpha(*source)) {
*target = *source;
target++; source++;
} else {
// Get repeat count of previous character
while (isdigit(*source)) {
int currentDigit = (*source) - '0';
charRepeatCount = (charRepeatCount == 0) ?
currentDigit : (charRepeatCount * 10 + currentDigit);
source++;
}
// Decrement repeat count as we have already written
// the first unique char to the target
charRepeatCount--;
// Repeat the last char for this count
while (charRepeatCount > 0) {
*target = *(target - 1);
target++;
charRepeatCount--;
}
}
}
}
*target = '\0';
}
I didn't find any issues with above code. Is there any other better way of doing this?

The approach/algorithm is fine, perhaps you could refine and shrink the code a bit (by doing something simpler, there's no need to solve this in an overly complex way). And choose an indentation style that actually makes sense.
A C solution:
void print_transform(const char *input)
{
for (const char *s = input; *s;) {
char current = *s;
size_t count = 1;
while (*++s == current) {
count++;
}
if (count > 1) {
printf("%c%zu", current, count);
} else {
putc(current, stdout);
}
}
putc('\n', stdout);
}
(This can be easily modified so that it returns the transformed string instead, or writes it to a long enough buffer.)
A C++ solution:
std::string transform(const std::string &input)
{
std::stringstream ss;
std::string::const_iterator it = input.begin();
while (it != input.end()) {
char current = *it;
std::size_t count = 1;
while (++it != input.end() && *it == current) {
count++;
}
if (count > 1) {
ss << current << count;
} else {
ss << current;
}
}
return ss.str();
}

Since several others have suggested very reasonable alternatives, I'd like to offer some opinions on what I think is your underlying question: "He asked me to try a smarter way than this.... Is there any other better way of doing this?"
When I interview a developer, I'm looking for signals that tell me how she approaches a problem:
Most important, as H2CO3 noted, is correctness: will the code work? I'm usually happy to overlook small syntax errors (forgotten semicolons, mismatched parens or braces, and so on) if the algorithm is sensible.
Proper use of the language, especially if the candidate claims expertise or has had extensive experience. Does he understand and use idioms appropriately to write straightforward, uncomplicated code?
Can she explain her train of thought as she formulates her solution? Is it logical and coherent, or is it a shotgun approach? Is she able and willing to communicate well?
Does he account for edge cases? And if so, does the intrinsic algorithm handle them, or is everything a special case? Although I'm happiest if the initial algorithm "just works" for all cases, I think it's perfectly acceptable to start with a verbose approach that covers all cases (or simply to add a "TODO" comment, noting that more work needs to be done), and then simplifying later, when it may be easier to notice patterns or duplicated code.
Does she consider error-handling? Usually, if a candidate starts by asking whether she can assume the input is valid, or with a comment like, "If this were production code, I'd check for x, y, and z problems," I'll ask what she would do, then suggest she focus on a working algorithm for now and (maybe) come back to that later. But I'm disappointed if a candidate doesn't mention it.
Testing, testing, testing! How will the candidate verify his code works? Does he walk through the code and suggest test cases, or do I need to remind him? Are the test cases sensible? Will they cover the edge cases?
Optimization: as a final step, after everything works and has been validated, I'll sometimes ask the candidate if she can improve her code. Bonus points if she suggests it without my prodding; negative points if she spends a lot of effort worrying about it before the code even works.
Applying these ideas to the code you wrote, I'd make these observations:
Using const appropriately is a plus, as it shows familiarity with the language. During an interview I'd probably ask a question or two about why/when to use it.
The proper use of char pointers throughout the code is a good sign. I tend to be pedantic about making the data types explicit within comparisons, particularly during interviews, so I'm happy to see, e.g.
while (*source != '\0') rather than the (common, correct, but IMO less careful) while(*source).
isFirstChar is a bit of a red flag, based on my "edge cases" point. When you declare a boolean to keep track of the code's state, there's often a way of re-framing the problem to handle the condition intrinsically. In this case, you can use charRepeatCount to decide if this is the first character in a possible series, so you won't need to test explicitly for the first character in the string.
By the same token, repeated code can also be a sign that an algorithm can be simplified. One improvement would be to move the conversion of charRepeatCount to a separate function. See below for an even better solution.
It's funny, but I've found that candidates rarely add comments to their code during interviews. Kudos for helpful ones, negative points for those of the ilk "Increment the counter" that add verbosity without information. It's generally accepted that, unless you're doing something weird (in which case you should reconsider what you've written), you should assume the person who reads your code is familiar with the programming language. So comments should explain your thought process, not translate the code back to English.
Excessive levels of nested conditionals or loops can also be a warning. You can eliminate one level of nesting by comparing each character to the next one instead of the previous one. This works even for the last character in the string, because it will be compared to the terminating null character, which won't match and can be treated like any other character.
There are simpler ways to convert charRepeatCount from an int to a string. For example, _snprintf() returns the number of bytes it "prints" to the string, so you can use
target += _snprintf(target, 10, "%i", charRepeatCount);
In the reversing function, you've used the ternary operator perfectly ... but it's not necessary to special-case the zero value: the math is the same regardless of its value. Again, there are also standard utility functions like atoi() that will convert the leading digits of a string into an integer for you.
Experienced developers will often include the increment or decrement operation as part of the condition in a loop, rather than as a separate statement at the bottom: while(charRepeatCount-- > 0). I'd raise an eyebrow but give you a point or two for humor and personality if you wrote this using the slide operator: while (charRepeatCount --> 0). But only if you'd promise not to use it in production.
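To illustrate a few of these observations together, here is a minimal sketch, not a reference solution. It compares each character with the next one, so no isFirstChar flag is needed, and it uses the return value of snprintf to advance the output pointer. It uses the standard std::snprintf rather than _snprintf. The name rle_encode is made up, and it assumes target is large enough and source contains no digits.

```cpp
#include <cstddef>
#include <cstdio>

// Sketch: run-length encodes source into target and returns the encoded
// length. Assumes target is large enough and source contains no digits.
std::size_t rle_encode(char* target, const char* source)
{
    char* out = target;
    int count = 1;
    for (; *source != '\0'; ++source) {
        if (source[1] == source[0]) {   // the next char continues the run;
            ++count;                    // the terminating '\0' never matches
            continue;
        }
        *out++ = *source;               // run ended: emit the character
        if (count > 1) {
            // snprintf returns the number of characters it wrote
            out += std::snprintf(out, 12, "%d", count);
        }
        count = 1;
    }
    *out = '\0';
    return static_cast<std::size_t>(out - target);
}
```

With a sufficiently large buffer, rle_encode(buf, "aabbbccccddddd") leaves "a2b3c4d5" in buf. The last run needs no special handling after the loop because the final character is compared against the terminator.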
Good luck with your interviewing!

I think your code is too complex for the task. Here's my approach (using C):
#include <ctype.h>
#include <stdio.h>
void format_str(char *target, char *source) {
int count;
char last;
while (*source != '\0') {
*target = *source;
last = *target;
target++;
source++;
for (count = 1; *source == last; source++, count++)
; /* Intentionally left blank */
if (count > 1)
target += sprintf(target, "%d", count);
}
*target = '\0';
}
void convert_back(char *target, char *source) {
char last;
int val;
while (*source != '\0') {
if (!isdigit((unsigned char) *source)) {
last = *source;
*target = last;
target++;
source++;
}
else {
for (val = 0; isdigit((unsigned char) *source); val = val*10 + *source - '0', source++)
; /* Intentionally left blank */
while (--val > 0) {
*target = last;
target++;
}
}
}
*target = '\0';
}
format_str compresses the string, and convert_back uncompresses it.

Your code "works", but it doesn't adhere to some common patterns used in C++. You should have:
used std::string instead of plain char* arrays;
passed that string as a const reference to avoid modification, since you write the result somewhere else;
used C++11 features such as range-based for loops and lambdas as well.
I think the interviewer's purpose was to test your ability to deal with the C++11 standard, since the algorithm itself was pretty trivial.
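As a rough sketch of what that could look like (the names rle_pack and rle_unpack are made up, and the code assumes the text to encode contains no digits), using std::string passed by const reference, a range-based for loop, and a small lambda:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch: assumes the text to encode contains no digits.
std::string rle_pack(const std::string& input)
{
    std::string out;
    char current = '\0';
    std::size_t count = 0;
    auto flush = [&] {                 // lambda: append the finished run
        if (count == 0) return;
        out += current;
        if (count > 1) out += std::to_string(count);
    };
    for (char c : input) {             // range-based for loop
        if (count > 0 && c == current) { ++count; continue; }
        flush();
        current = c;
        count = 1;
    }
    flush();
    return out;
}

// Reverses rle_pack: each character is followed by an optional repeat count.
std::string rle_unpack(const std::string& packed)
{
    std::string out;
    std::size_t i = 0;
    while (i < packed.size()) {
        const char c = packed[i++];
        std::size_t count = 0;
        while (i < packed.size() &&
               std::isdigit(static_cast<unsigned char>(packed[i]))) {
            count = count * 10 + static_cast<std::size_t>(packed[i++] - '0');
        }
        out.append(count == 0 ? 1 : count, c);   // no count means one copy
    }
    return out;
}
```

For the interview input, rle_pack("aabbbccccddddd") gives "a2b3c4d5" and rle_unpack turns it back into the original string.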

Perhaps the interviewer wanted to test your knowledge of existing standard library tools. Here's how my take could look in C++:
#include <string>
#include <sstream>
#include <algorithm>
#include <iostream>
typedef std::string::const_iterator Iter;
std::string foo(Iter first, Iter last)
{
Iter it = first;
std::ostringstream result;
while (it != last) {
it = std::find_if(it, last, [=](char c){ return c != *it; });
result << *first << (it - first);
first = it;
}
return result.str();
}
int main()
{
std::string s = "aaabbbbbbccddde";
std::cout << foo(s.begin(), s.end());
}
An extra check is needed for empty input.

try this
std::string str="aabbbccccddddd";
for(int i=0;i<255;i++)
{
int c=0;
for(int j=0;j<str.length();j++)
{
if(str[j] == i)
c++;
}
if(c>0)
printf("%c%d",i,c);
}

My naive approach:
void pack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
int RepeatCount = 1;
while( '\0' != *Src_Ptr ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
Dst_Ptr += sprintf( Dst_Ptr, "%i", RepeatCount );
RepeatCount = 1;
}
}
*Dst_Ptr = '\0';
};
void unpack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
while( '\0' != *Src_Ptr ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
*Dst_Ptr = '\0';
};
But if the interviewer asks for error handling, then the solution turns out to be much more complex (and ugly). My portable approach:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
// for MSVC
#ifdef _WIN32
#define snprintf sprintf_s
#endif
int pack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
int RepeatCount = 1;
// don't forget about buffers intercrossing
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) ) {
return 1;
}
// source string must contain no digits
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 \
&& !isdigit( *Src_Ptr ) && !Err ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
int res = snprintf( Dst_Ptr, DstBuf_End - Dst_Ptr - 1, "%i" \
, RepeatCount );
if( res < 0 ) {
Err = 1;
} else {
Dst_Ptr += res;
RepeatCount = 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int unpack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
// don't forget about buffers intercrossing
// first character of source string must be non-digit
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) || isdigit( SrcStr[0] ) ) {
return 1;
}
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 && !Err ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
if( !repeat_count || repeat_count - 1 > DstBuf_End - Dst_Ptr - 1 ) {
Err = 1;
} else {
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int main() {
char str[] = "aabbbccccddddd";
char buf1[128] = {0};
char buf2[128] = {0};
pack( str, buf1, 128 );
printf( "pack: %s -> %s\n", str, buf1 );
unpack( buf1, buf2, 128 );
printf( "unpack: %s -> %s\n", buf1, buf2 );
return 0;
}
Test: http://ideone.com/Y7FNE3. Also works in MSVC.

Try to make do with less boilerplate:
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;
template<typename in_iter,class ostream>
void torle(in_iter i, ostream &&o)
{
while (char c = *i++) {
size_t n = 1;
while ( *i == c )
++n, ++i;
o<<c<<n;
}
}
template<class istream, typename out_iter>
void fromrle(istream &&i, out_iter o)
{
char c; size_t n;
while (i>>c>>n)
while (n--) *o++=c;
}
int main()
{
typedef ostream_iterator<char> to;
string line; stringstream converted;
while (getline(cin,line)) {
torle(begin(line),converted);
cout<<converted.str()<<'\n';
fromrle(converted,ostream_iterator<char>(cout));
cout<<'\n';
}
}

How do I encode a string to base64 using only boost?

I'm trying to quickly encode a simple ASCII string to base64 (Basic HTTP Authentication using boost::asio) without pasting in any new code or using any libraries beyond boost.
Simple signature would look like:
string Base64Encode(const string& text);
Again I realize the algorithm is easy and there are many libraries/examples doing this but I'm looking for a clean boost example. I found boost serialization but no clear examples there or from Google.
http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/dataflow.html
Is this possible without adding the actual base64 algorithm explicitly to my code?

Here is my solution. It uses the same basic technique as the other solutions on this page, but solves the problem of the padding in what I feel is a more elegant way. This solution also makes use of C++11.
I think that most of the code is self-explanatory. The bit of math in the encode function calculates the number of '=' characters we need to add. val.size() modulo 3 gives the remainder, but what we really want is the difference between val.size() and the next number divisible by three. Since we have the remainder, we can just subtract it from 3, but that leaves 3 in the case where we want 0, so we have to take modulo 3 one more time.
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/algorithm/string.hpp>
#include <string>
std::string decode64(const std::string &val) {
using namespace boost::archive::iterators;
using It = transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>;
return boost::algorithm::trim_right_copy_if(std::string(It(std::begin(val)), It(std::end(val))), [](char c) {
return c == '\0';
});
}
std::string encode64(const std::string &val) {
using namespace boost::archive::iterators;
using It = base64_from_binary<transform_width<std::string::const_iterator, 6, 8>>;
auto tmp = std::string(It(std::begin(val)), It(std::end(val)));
return tmp.append((3 - val.size() % 3) % 3, '=');
}
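
The padding arithmetic described above can be checked in isolation, without boost. This is only a minimal sketch; padding_for is a hypothetical name that does not appear in the answer.

```cpp
#include <cassert>
#include <cstddef>

// Number of '=' characters needed after encoding n input bytes.
// n % 3 is the remainder; 3 - remainder is the distance to the next
// multiple of three, and the outer % 3 maps the "3" case back to 0.
constexpr std::size_t padding_for(std::size_t n) {
    return (3 - n % 3) % 3;
}
```

For input sizes 0, 1, 2, 3 and 4 this yields 0, 2, 1, 0 and 2 padding characters respectively.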

I improved the example in the link you provided a little:
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/archive/iterators/ostream_iterator.hpp>
#include <sstream>
#include <string>
#include <iostream>
int main()
{
using namespace boost::archive::iterators;
std::string test = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ornare ullamcorper ipsum ac gravida.";
std::stringstream os;
typedef
insert_linebreaks< // insert line breaks every 72 characters
base64_from_binary< // convert binary values to base64 characters
transform_width< // retrieve 6 bit integers from a sequence of 8 bit bytes
const char *,
6,
8
>
>
,72
>
base64_text; // compose all the above operations into a new iterator
std::copy(
base64_text(test.c_str()),
base64_text(test.c_str() + test.size()),
ostream_iterator<char>(os)
);
std::cout << os.str();
}
This prints the base64-encoded string, nicely formatted with a line break every 72 characters, onto the console, ready to be put into an email. If you don't like the line breaks, just stay with this:
typedef
base64_from_binary<
transform_width<
const char *,
6,
8
>
>
base64_text;

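You could use Beast's implementation, keeping in mind that it lives in a detail namespace and, as the two variants below show, has changed between releases.
For Boost version 1.71, the functions are: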
boost::beast::detail::base64::encode()
boost::beast::detail::base64::encoded_size()
boost::beast::detail::base64::decode()
boost::beast::detail::base64::decoded_size()
From #include <boost/beast/core/detail/base64.hpp>
For older versions back to beast's inclusion in 1.66, the functions are:
boost::beast::detail::base64_encode()
boost::beast::detail::base64_decode()
From #include <boost/beast/core/detail/base64.hpp>

Another solution using boost base64 encode decode:
const std::string base64_padding[] = {"", "==","="};
std::string base64_encode(const std::string& s) {
namespace bai = boost::archive::iterators;
std::stringstream os;
// base64_from_binary converts 6-bit values to base64 characters;
// transform_width retrieves those 6-bit integers from a sequence of 8-bit bytes
typedef bai::base64_from_binary<bai::transform_width<const char *, 6, 8> > base64_enc;
std::copy(base64_enc(s.c_str()), base64_enc(s.c_str() + s.size()),
std::ostream_iterator<char>(os));
os << base64_padding[s.size() % 3];
return os.str();
}
std::string base64_decode(const std::string& s) {
namespace bai = boost::archive::iterators;
std::stringstream os;
typedef bai::transform_width<bai::binary_from_base64<const char *>, 8, 6> base64_dec;
unsigned int size = s.size();
// Remove the padding characters, cf. https://svn.boost.org/trac/boost/ticket/5629
if (size && s[size - 1] == '=') {
--size;
if (size && s[size - 1] == '=') --size;
}
if (size == 0) return std::string();
std::copy(base64_dec(s.data()), base64_dec(s.data() + size),
std::ostream_iterator<char>(os));
return os.str();
}
And here are the test cases:
const std::size_t TESTSET_SIZE = 9; // number of entries in each array below
std::string t_e[TESTSET_SIZE] = {
""
, "M"
, "Ma"
, "Man"
, "pleasure."
, "leasure."
, "easure."
, "asure."
, "sure."
};
std::string t_d[TESTSET_SIZE] = {
""
, "TQ=="
, "TWE="
, "TWFu"
, "cGxlYXN1cmUu"
, "bGVhc3VyZS4="
, "ZWFzdXJlLg=="
, "YXN1cmUu"
, "c3VyZS4="
};
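
As a quick cross-check of the padding table against the vectors above, the following sketch verifies only the shape of an encoded string (length and trailing '=' count), not its characters. trailing_equals and padding_matches are hypothetical helpers, not part of the answer, and no boost is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same lookup table as in the answer above, indexed by input size modulo 3.
const std::string base64_padding[] = {"", "==", "="};

// Counts the '=' characters at the end of an encoded string.
std::size_t trailing_equals(const std::string& enc) {
    std::size_t n = 0;
    while (n < enc.size() && enc[enc.size() - 1 - n] == '=') ++n;
    return n;
}

// Hypothetical check: does `enc` carry the padding that `raw` calls for?
// It inspects only length and padding, not the encoded characters themselves.
bool padding_matches(const std::string& raw, const std::string& enc) {
    return enc.size() % 4 == 0 &&
           trailing_equals(enc) == base64_padding[raw.size() % 3].size();
}
```

Every pair in the test set passes this check, for example ("M", "TQ==") and ("leasure.", "bGVhc3VyZS4="), while an unpadded "TWE" for "Ma" does not.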
Hope this helps

For anyone coming here from Google, here are my base64 encode/decode functions based on boost. They handle padding correctly, as per DanDan's comment above. The decode function stops when it encounters an illegal character and returns a pointer to that character, which is great if you're parsing base64 in JSON or XML.
///
/// Convert up to len bytes of binary data in src to base64 and store it in dest
///
/// \param dest Destination buffer to hold the base64 data.
/// \param src Source binary data.
/// \param len The number of bytes of src to convert.
///
/// \return The number of characters written to dest.
/// \remarks Does not store a terminating null in dest.
///
uint base64_encode(char* dest, const char* src, uint len)
{
char tail[3] = {0,0,0};
typedef base64_from_binary<transform_width<const char *, 6, 8> > base64_enc;
uint one_third_len = len/3;
uint len_rounded_down = one_third_len*3;
uint j = len_rounded_down + one_third_len;
std::copy(base64_enc(src), base64_enc(src + len_rounded_down), dest);
if (len_rounded_down != len)
{
uint i=0;
for(; i < len - len_rounded_down; ++i)
{
tail[i] = src[len_rounded_down+i];
}
std::copy(base64_enc(tail), base64_enc(tail + 3), dest + j);
for(i=len + one_third_len + 1; i < j+4; ++i)
{
dest[i] = '=';
}
return i;
}
return j;
}
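
The encoder above does not say how large dest has to be. A minimal sketch of that calculation, assuming padded output and no terminating null (base64_encoded_size is a hypothetical name, not something boost provides under this answer's code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: characters needed for `len` input bytes,
// padding included, terminating null excluded.
constexpr std::size_t base64_encoded_size(std::size_t len) {
    return (len + 2) / 3 * 4;  // one 4-character group per started 3-byte block
}
```

A caller could size the destination with base64_encoded_size(len) before calling base64_encode; the result agrees with the value the function returns, for example 4 for one input byte and 8 for four.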
///
/// Convert null-terminated string src from base64 to binary and store it in dest.
///
/// \param dest Destination buffer
/// \param src Source base64 string
/// \param len Pointer to unsigned int representing size of dest buffer. After function returns this is set to the number of characters written to dest.
///
/// \return Pointer to first character in source that could not be converted (the terminating null on success)
///
const char* base64_decode(char* dest, const char* src, uint* len)
{
uint output_len = *len;
typedef transform_width<binary_from_base64<const char*>, 8, 6> base64_dec;
uint i=0;
try
{
base64_dec src_it(src);
for(; i < output_len; ++i)
{
*dest++ = *src_it;
++src_it;
}
}
catch(dataflow_exception&)
{
}
*len = i;
return src + (i+2)/3*4; // bytes in = bytes out / 3 rounded up * 4
}

While the encoding works, the decoder is certainly broken. There is also an open bug report: https://svn.boost.org/trac/boost/ticket/5629.
I have not found a fix for it.

This is another answer:
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <string>
#include <vector>
std::string ToBase64(const std::vector<unsigned char>& binary)
{
using namespace boost::archive::iterators;
using It = base64_from_binary<transform_width<std::vector<unsigned char>::const_iterator, 6, 8>>;
auto base64 = std::string(It(binary.begin()), It(binary.end()));
// Add padding.
return base64.append((3 - binary.size() % 3) % 3, '=');
}
std::vector<unsigned char> FromBase64(const std::string& base64)
{
using namespace boost::archive::iterators;
using It = transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>;
auto binary = std::vector<unsigned char>(It(base64.begin()), It(base64.end()));
// Remove padding.
auto length = base64.size();
if(binary.size() > 2 && base64[length - 1] == '=' && base64[length - 2] == '=')
{
binary.erase(binary.end() - 2, binary.end());
}
else if(binary.size() > 1 && base64[length - 1] == '=')
{
binary.erase(binary.end() - 1, binary.end());
}
return binary;
}
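
The padding removal in FromBase64 relies on each trailing '=' standing for one surplus byte in the decoded output. A minimal sketch of the resulting size arithmetic, assuming well-formed padded input (base64_decoded_size is a hypothetical helper, and treating other lengths as malformed is a choice made here, not by the answer):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes that a padded base64 string decodes to.
// Returns 0 for input whose length is not a multiple of four (assumed malformed).
std::size_t base64_decoded_size(const std::string& b64) {
    if (b64.empty() || b64.size() % 4 != 0) return 0;
    std::size_t pad = 0;
    if (b64[b64.size() - 1] == '=') {
        ++pad;
        if (b64[b64.size() - 2] == '=') ++pad;
    }
    return b64.size() / 4 * 3 - pad;  // three bytes per group, minus one per '='
}
```

For "TQ==", "TWE=" and "TWFu" this gives 1, 2 and 3 bytes, which is the size FromBase64 is left with after erasing the surplus bytes.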

Base64 encode text and data
const std::string base64_padding[] = {"", "==","="};
std::string base64EncodeText(std::string text) {
using namespace boost::archive::iterators;
typedef std::string::const_iterator iterator_type;
typedef base64_from_binary<transform_width<iterator_type, 6, 8> > base64_enc;
std::stringstream ss;
std::copy(base64_enc(text.begin()), base64_enc(text.end()), ostream_iterator<char>(ss));
ss << base64_padding[text.size() % 3];
return ss.str();
}
std::string base64EncodeData(std::vector<uint8_t> data) {
using namespace boost::archive::iterators;
typedef std::vector<uint8_t>::const_iterator iterator_type;
typedef base64_from_binary<transform_width<iterator_type, 6, 8> > base64_enc;
std::stringstream ss;
std::copy(base64_enc(data.begin()), base64_enc(data.end()), ostream_iterator<char>(ss));
ss << base64_padding[data.size() % 3];
return ss.str();
}

I modified Answer 8 because it did not work on my platform.
const std::string base64_padding[] = {"", "==","="};
std::string *m_ArchiveData;
/// \brief To Base64 string
bool Base64Encode(string* output)
{
try
{
UInt32 iPadding_Mask = 0;
typedef boost::archive::iterators::base64_from_binary
<boost::archive::iterators::transform_width<const char *, 6, 8> > Base64EncodeIterator;
UInt32 len = m_ArchiveData->size();
std::stringstream os;
std::copy(Base64EncodeIterator(m_ArchiveData->c_str()),
Base64EncodeIterator(m_ArchiveData->c_str()+len),
std::ostream_iterator<char>(os));
iPadding_Mask = m_ArchiveData->size() % 3;
os << base64_padding[iPadding_Mask];
*output = os.str();
return output->empty() == false;
}
catch (...)
{
PLOG_ERROR_DEV("unknown error happens");
return false;
}
}
/// \brief From Base64 string
bool mcsf_data_header_byte_stream_archive::Base64Decode(const std::string *input)
{
try
{
std::stringstream os;
bool bPaded = false;
typedef boost::archive::iterators::transform_width<boost::archive::iterators::
binary_from_base64<const char *>, 8, 6> Base64DecodeIterator;
UInt32 iLength = input->length();
// Remove the padding characters, cf. https://svn.boost.org/trac/boost/ticket/5629
if (iLength && (*input)[iLength-1] == '=') {
bPaded = true;
--iLength;
if (iLength && (*input)[iLength - 1] == '=')
{
--iLength;
}
}
if (iLength == 0)
{
return false;
}
if(bPaded)
{
iLength --;
}
copy(Base64DecodeIterator(input->c_str()) ,
Base64DecodeIterator(input->c_str()+iLength),
ostream_iterator<char>(os));
*m_ArchiveData = os.str();
return m_ArchiveData->empty() == false;
}
catch (...)
{
PLOG_ERROR_DEV("unknown error happens");
return false;
}
}
