I have noticed an interesting thing, but I am not sure whether it is supposed to happen this way.
I have some code that uses fgetc() to read symbols from a file and store them in an int, say l:
l=fgetc(file);
The file is opened in binary read mode ("rb") using
file=fopen("filename", "rb");
Then, using a string stream, each symbol is converted to hex format, written into a string, and then stored in a char array:
std::stringstream sl;
sl << std::hex << l; sl >> sll;
char as[i]=sll[i];
The problem is that when fgetc() reads a symbol whose ASCII code is 0C in hex (the form feed character, FF), my final char array gets filled with 0's.
In short, if a char[] element contains 0c, the rest of the elements are 0's.
I have no idea why this happens. When I edited my file using a hex editor and replaced 0c with something else, the file was read properly and all symbols were stored in the array as they were written in the file.
If you could tell me how to circumvent this behavior, I would appreciate it.
Ok. Full code:
#include <stdio.h>
#include<iostream>
#include <string.h>
#include "u.c"
#include <wchar.h>
#include <sstream>
int main() {
unsigned long F, K;
std::string k;
char hhh[300];
char hh1[300];
char kk[64];
int lk;
memset(kk, 0, 64);
FILE *diy;
FILE *ydi;
std::cin >> k;
std::cin >> hhh;
std::cin >> hh1;
lk = k.length();
for (int i = 0; i < lk; i++) {
kk[i] = k[i];
}
;
bof(kk, lk);
diy = fopen(hhh,"rb");
ydi = fopen(hh1,"wb");
int mm = 0;
int l;
int r;
char ll[9];
char rr[9];
memset(ll, 0, 9);
memset(rr, 0, 9);
std::string sll;
std::string slr;
char sL[3];
char sR[3];
int i = 0;
while (!feof(diy)) {
l = fgetc(diy);
r = fgetc(diy);
std::stringstream sl;
std::stringstream sr;
sl << std::hex << l;
sl >> sll;
sL[0] = sll[0];
sL[1] = sll[1];
sr << std::hex << r;
sr >> slr;
sR[0] = slr[0];
sR[1] = slr[1];
if (i == 0) {
ll[0] = sL[0];
ll[1] = sL[1];
ll[2] = sR[0];
ll[3] = sR[1];
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
}
;
if (i==1) {
ll[4] = sL[0];
ll[5] = sL[1];
ll[6] = sR[0];
ll[7] = sR[1];
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
}
;
if (i == 2) {
rr[0] = sL[0];
rr[1] = sL[1];
rr[2] = sR[0];
rr[3] = sR[1];
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
}
;
if(i==3){
rr[4] = sL[0];
rr[5] = sL[1];
rr[6] = sR[0];
rr[7] = sR[1];
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
}
;
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
if (i == 3) {
printf(" %s %s \n ", ll, rr); //indicated that my rr array had problems with that 0x0c;
sscanf(ll, "%08lX", &F);
sscanf(rr,"%08lX",&K);
printf(" before %08lx %08lx \n ", F, K);
omg( &F, &K);
printf(" after %20lx %20lx \n ", F, K);
memset(ll, 0, 9);
memset(rr, 0, 9);
char RR[9];
sprintf(RR, "%08lx", F);
char LL[9];
sprintf(LL, "%08lx", K);
printf(" %s %s ", LL, RR);
for (int j = 0; j < 4; j++) {
char ls[3];
ls[0] = LL[j*2];
ls[1] = LL[2*j+1];
int kj;
std::stringstream op;
op << ls;
op >> std::hex >> kj;
fputc(kj, ydi);
}
;
for(int j = 0; j < 4; j++) {
char lr[3];
lr[0] = RR[j*2];
lr[1] = RR[2*j+1];
int kjm;
std::stringstream ip;
ip << lr;
ip >> std::hex >> kjm;
fputc(kjm,ydi);
}
;
memset(LL, 0 ,9);
memset(RR, 0, 9);
}
;
i++;
std::cout << "\n";
if (i == 4) {
i = 0;
}
;
}
;
fclose(diy);
fclose(ydi);
}
;
Since you asked, now you have it.
This code will not compile because you do not have the necessary libraries.
The simplified code is at the beginning of this post.
The libraries that you do not possess have nothing to do with this issue.
The core problem
You assume that
std::stringstream the_stream;
std::string the_string;
the_stream << std::hex << 0x0C;
the_stream >> the_string;
results in the_string containing "0c". However, it will contain "c".
This means that later on, you end up converting the input "\x0c\xfe" to 'c', '\0', 'f', 'e'. If you use this at any point in a C-style string, of course it ends the string after c.
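The difference is easy to reproduce in isolation. The following is a minimal sketch, not code from the question; the function names hex_unpadded and hex_padded are made up for illustration. The first formats a value the way the question does, the second pads it to two digits with std::setw and std::setfill.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a value as the question does: std::hex only, no padding.
// (Hypothetical helper name, for illustration.)
std::string hex_unpadded(int value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// Same conversion, but always at least two digits wide, padded with '0'.
std::string hex_padded(int value) {
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(2) << value;
    return out.str();
}
```

For the byte 0x0c the first function returns a one-character string, so reading index 1 of it picks up the terminator instead of a digit; the second returns two digits for every value from 0x00 to 0xff.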
It was quite hard to debug this program. In the future, please write readable and understandable code. What follows is a non-exhaustive list of the problems I found.
Design problems
while(!feof(file)) is always wrong.
Use variable scoping. If you pull the declaration of sL and sR into the loop, you don't have to reset them. Less code, less potential errors.
You're using a lot of code for something as simple as converting a presumably 8-bit char to its hexadecimal representation. In fact, the only reason you ever use std::stringstream in your code is to do exactly that. Why don't you isolate this functionality to a function?
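As a rough sketch of the last two points, assuming the goal is simply "turn every byte of a file into two hex digits": the helper names byte_to_hex and file_to_hex are invented here, and the loop tests the return value of fgetc instead of calling feof.

```cpp
#include <cstdio>
#include <string>

// Converts one byte to exactly two lowercase hex digits.
// (Hypothetical helper, not part of the original program.)
std::string byte_to_hex(unsigned char byte) {
    static const char digits[] = "0123456789abcdef";
    std::string hex(2, '0');
    hex[0] = digits[byte >> 4];
    hex[1] = digits[byte & 0x0F];
    return hex;
}

// Reads the stream to its end and returns its content as hex text.
std::string file_to_hex(std::FILE* in) {
    std::string hex;
    int c;  // int, not char, so EOF can be told apart from the byte 0xFF
    while ((c = std::fgetc(in)) != EOF) {
        hex += byte_to_hex(static_cast<unsigned char>(c));
    }
    return hex;
}
```

Because the loop condition is the read itself, the body never runs on a failed read, which is the usual problem with while (!feof(file)).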
Irrelevant problems
Because of the poor code formatting, you probably didn't notice the copy-paste errors in the use of sL and sR:
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
Obviously, that last line should read sR[1] = '\0';
Style problems
There are many, many things wrong with your code, but one thing that easily stops people from helping is formatting. The space formatting in particular made your code very hard to read, so I took the liberty to edit the "full" code in your question to have (almost) consistent formatting. A few basic problems become evident:
Use meaningful names for variables and functions. I have no idea what you're trying to do here, which doesn't help in finding the real problem in the code.
Mixing <iostream> and <stdio.h> doesn't help the readability of your code. Choose one or the other. In fact, only ever use <iostream> in C++.
Besides that, use the appropriate header names for C++ (<cstring> and <cwchar> instead of <string.h> and <wchar.h>).
Don't write a semicolon after a compound statement. Instead of
int main(void) {
if (condition) {
one_statement();
another_statement();
};
};
you should write
int main(void) {
if (condition) {
one_statement();
another_statement();
}
}
The ; is part of a separate statement. It also prevents you from using else constructs.
Use initialisers where appropriate. So don't write
char ll[9];
char rr[9];
memset(ll, 0, 9);
memset(rr, 0, 9);
while
char ll[9] = { 0 };
char rr[9] = { 0 };
is more readable.
This 0c problem can be solved by:
changing the char[] array where the value is stored to unsigned char[];
adding these manipulators when the value is written to the string stream:
<< std::setfill('0') << std::setw(2) << std::hex;
Without them, 0c is converted to c. setw() sets the field width for the next output and setfill() pads it with 0's.
I recently needed to read a non-trivially sized file line by line, and to push performance I decided to follow some advice I had been given, which states that fstreams are much slower than C-style I/O. However, despite my best efforts, I have not been able to reproduce the same dramatic difference (I see ~25%, which is large but not insane). I also tried fscanf and found it to be slower by an order of magnitude.
My question is: what is causing the performance difference under the covers, and why is fscanf abysmal?
The following is my code (compiled with TDM GCC 5.1.0):
struct file
{
file(const char* str, const char* mode)
: fp(fopen(str, mode)){}
~file(){fclose(fp);}
FILE* fp;
};
constexpr size_t bufsize = 256;
auto readWord(int pos, char*& word, char* const buf)
{
for(; buf[pos] != '\n'; ++word, ++pos)
{
if(pos == bufsize)
return 0;
*word = buf[pos];
}
*word = '\0';
return pos + 1;
}
void readFileC()
{
file in{"inC.txt", "r"};
char buf[bufsize];
char word[40];
char* pw = word;
int sz = fread(buf, 1, bufsize, in.fp);
for(; sz == bufsize; sz = fread(buf, 1, bufsize, in.fp))
{
for(auto nextPos = readWord(0, pw, buf); (nextPos = readWord(nextPos, pw, buf));)
{
//use word here
pw = word;
}
}
for(auto nextPos = readWord(0, pw, buf); nextPos < sz; nextPos = readWord(nextPos, pw, buf))
{
//use word here
pw = word;
}
}
void readFileCline()
{
file in{"inCline.txt", "r"};
char word[40];
while(fscanf(in.fp, "%s", word) != EOF);
//use word here
}
void readFileCpp()
{
ifstream in{"inCpp.txt"};
string word;
while(getline(in, word));
//use word here
}
int main()
{
static constexpr int runs = 1;
auto countC = 0;
for(int i = 0; i < runs; ++i)
{
auto start = steady_clock::now();
readFileC();
auto dur = steady_clock::now() - start;
countC += duration_cast<milliseconds>(dur).count();
}
cout << "countC: " << countC << endl;
auto countCline = 0;
for(int i = 0; i < runs; ++i)
{
auto start = steady_clock::now();
readFileCline();
auto dur = steady_clock::now() - start;
countCline += duration_cast<milliseconds>(dur).count();
}
cout << "countCline: " << countCline << endl;
auto countCpp = 0;
for(int i = 0; i < runs; ++i)
{
auto start = steady_clock::now();
readFileCpp();
auto dur = steady_clock::now() - start;
countCpp += duration_cast<milliseconds>(dur).count();
}
cout << "countCpp: " << countCpp << endl;
}
Ran with a file of size 1070KB these are the results :
countC: 7
countCline: 61
countCpp: 9
EDIT: the three test cases now read different files and run once. The results are exactly 1/20 of those from reading the same file 20 times. countC consistently outperforms countCpp, even when I flip the order in which they are performed.
fscanf has to parse the format string parameter, looking for all possible % signs and interpreting them, along with width specifiers, escape characters, expressions, etc. It has to walk the format parameter more or less one character at a time, working through a very big set of potential formats. Even if your format is as simple as "%s", there is still a lot of overhead involved relative to the other techniques, which simply grab a bunch of bytes with almost no overhead of interpretation or conversion.
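For comparison, here is a small sketch of a line reader that involves no format string at all. It is not taken from the question; count_lines is a made-up name, and the 256-byte buffer mirrors the question's bufsize. Whether it is actually faster on a given platform would have to be measured.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Counts newline-terminated lines using fgets(), which copies bytes up to
// the next '\n' (or until the buffer is full) without interpreting a format.
// A final line that lacks a trailing '\n' is not counted.
std::size_t count_lines(std::FILE* fp) {
    char buf[256];
    std::size_t lines = 0;
    while (std::fgets(buf, sizeof buf, fp)) {
        if (std::strchr(buf, '\n')) {
            ++lines;  // a long line split over several chunks is counted once
        }
    }
    return lines;
}
```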
This is my fourth attempt at doing base64 encoding. My first attempts worked, but they weren't standard-compliant. They were also extremely slow! I used vectors with push_back and erase a lot.
So I decided to re-write it and this is much much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64 encoding the compressed string. I'm using ZLib. The images are 1366 x 768 so yeah.
I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..
Anyway, my code is as follows below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
inline bool IsBase64(std::uint8_t C)
{
return (isalnum(C) || (C == '+') || (C == '/'));
}
std::string Copy(std::string Str, int FirstChar, int Count)
{
if (FirstChar <= 0)
FirstChar = 0;
else
FirstChar -= 1;
return Str.substr(FirstChar, Count);
}
std::string DecToBinStr(int Num, int Padding)
{
int Bin = 0, Pos = 1;
std::stringstream SS;
while (Num > 0)
{
Bin += (Num % 2) * Pos;
Num /= 2;
Pos *= 10;
}
SS.fill('0');
SS.width(Padding);
SS << Bin;
return SS.str();
}
int DecToBinStr(std::string DecNumber)
{
int Bin = 0, Pos = 1;
int Dec = strtol(DecNumber.c_str(), NULL, 10);
while (Dec > 0)
{
Bin += (Dec % 2) * Pos;
Dec /= 2;
Pos *= 10;
}
return Bin;
}
int BinToDecStr(std::string BinNumber)
{
int Dec = 0;
int Bin = strtol(BinNumber.c_str(), NULL, 10);
for (int I = 0; Bin > 0; ++I)
{
if(Bin % 10 == 1)
{
Dec += (1 << I);
}
Bin /= 10;
}
return Dec;
}
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
int PaddingAmount = ((-Result.size() * 3) & 3);
for (int I = 0; I < PaddingAmount; ++I)
Result += '=';
return Result;
}
std::string DecodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = Data.size(); I > 0; --I)
{
if (Data[I - 1] != '=')
{
std::string Characters = Copy(Data, 0, I);
for (std::size_t J = 0; J < Characters.size(); ++J)
Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
break;
}
}
for (std::size_t I = 0; I < Binary.size(); I += 8)
{
Result += (char)BinToDecStr(Copy(Binary, I, 8));
if (I == 0) ++I;
}
return Result;
}
I've been using the above like this:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); //IMG.677*604
std::cout<<DecodeBase64(Data); //Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); //IMG.1366*768
std::cout<<DecodeBase64(Data); //Prints IMG.1366*768
}
It prints correctly.. I'm not sure what is going on at all or where to begin looking.
Just in-case anyone is curious and want to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l
The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
As for speed, I'd focus primarily on reducing your use of std::string. The way you currently convert the data into a string of 0 and 1 characters is pretty inefficient, considering that the source could be read directly with bitwise operators.
I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
std::stringstream ss;
ss << value;
return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
std::cout << Data << std::endl;
std::cout << DecodeBase64(Data) << std::endl; //Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
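The length rule stated above can be written down directly. This is only a sketch of the arithmetic, with function names chosen for illustration: every group of up to three input bytes produces four output characters, and the pad count depends only on the input length modulo 3.

```cpp
#include <cstddef>

// Number of '=' characters a standard Base-64 encoder appends
// for an input of n bytes: always 0, 1 or 2, never 3.
std::size_t base64_pad_count(std::size_t n) {
    return (3 - n % 3) % 3;
}

// Total encoded length including padding: four characters
// for every started group of three input bytes.
std::size_t base64_encoded_length(std::size_t n) {
    return (n + 2) / 3 * 4;
}
```

For the 11-byte input IMG.677*604 this gives 16 characters with one pad character, which matches the length of the corrected string rather than the 18 characters the original code produced.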
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encoded the null terminating the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
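For readers who prefer std::string over raw buffers, the same bit-manipulation idea might look like the sketch below. It is not the code from my library: base64_encode here is a simplified stand-in that always uses '=' padding and the standard alphabet, and it packs each group into a 24-bit integer before slicing it into 6-bit indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes arbitrary bytes as standard Base-64 with '=' padding.
std::string base64_encode(const std::string& data) {
    static const char map[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byte = [&](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(data[i]));
    };
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full groups: 3 bytes -> 24 bits -> 4 characters.
    for (; i + 3 <= data.size(); i += 3) {
        std::uint32_t n = (byte(i) << 16) | (byte(i + 1) << 8) | byte(i + 2);
        out += map[(n >> 18) & 0x3F];
        out += map[(n >> 12) & 0x3F];
        out += map[(n >> 6) & 0x3F];
        out += map[n & 0x3F];
    }
    // Tail: missing bytes are treated as zero bits, then padded with '='.
    std::size_t rest = data.size() - i;
    if (rest == 1) {
        std::uint32_t n = byte(i) << 16;
        out += map[(n >> 18) & 0x3F];
        out += map[(n >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        std::uint32_t n = (byte(i) << 16) | (byte(i + 1) << 8);
        out += map[(n >> 18) & 0x3F];
        out += map[(n >> 12) & 0x3F];
        out += map[(n >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

Called with "IMG.677*604" it yields SU1HLjY3Nyo2MDQ=, the same string my own routines produce above.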
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
if (Binary.size() % 6)
{
Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
if (Result.size() % 4)
{
Result.resize(Result.size() + 4 - Result.size() % 4, '=');
}
return Result;
}
Anyone know how to convert a char array to a single int?
char hello[5];
hello = "12345";
int myNumber = convert_char_to_int(hello);
Printf("My number is: %d", myNumber);
There are multiple ways of converting a string to an int.
Solution 1: Using Legacy C functionality
#include <cstdio>
#include <cstdlib>
int main()
{
    //char hello[5];
    //hello = "12345"; ---> This won't compile
    char hello[] = "12345";
    printf("My number is: %d", atoi(hello));
    return 0;
}
Solution 2: Using lexical_cast(Most Appropriate & simplest)
int x = boost::lexical_cast<int>("12345");
Solution 3: Using C++ Streams
std::string hello("123");
std::stringstream str(hello);
int x;
str >> x;
if (!str)
{
// The conversion failed.
}
If you are using C++11, you should probably use stoi because it can distinguish between an error and parsing "0".
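try {
    int number = std::stoi("abc1234");
} catch (std::exception const &e) {
    // No leading digits could be parsed, so std::invalid_argument is thrown
    // (std::out_of_range is thrown if the value does not fit in an int).
    // atoi() would return 0, which is less helpful if it could be a valid value.
}
Note that a string with trailing garbage such as "1234abc" does not throw: stoi() parses the leading 1234 and stops. Pass a std::size_t* as the second argument to find out where parsing stopped.
It should also be noted that the string literal is implicitly converted from a const char[] to a std::string before being passed to stoi().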
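If trailing characters should count as an error too, the position reported by stoi() can be compared against the string length. A minimal sketch, where parse_int_strict is a name invented for this example:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true and stores the result only if the whole string is an int.
// (Hypothetical helper; leading whitespace is still accepted by std::stoi.)
bool parse_int_strict(const std::string& text, int& value) {
    try {
        std::size_t used = 0;                 // number of characters consumed
        int parsed = std::stoi(text, &used);
        if (used != text.size()) {
            return false;                     // e.g. "1234abc"
        }
        value = parsed;
        return true;
    } catch (const std::invalid_argument&) {
        return false;                         // no digits at all, e.g. "abc"
    } catch (const std::out_of_range&) {
        return false;                         // does not fit in an int
    }
}
```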
I use :
int convertToInt(char a[1000]){
int i = 0;
int num = 0;
while (a[i] != 0)
{
num = (a[i] - '0') + (num * 10);
i++;
}
return num;
}
Use sscanf
/* sscanf example */
#include <stdio.h>
int main ()
{
char sentence []="Rudolph is 12 years old";
char str [20];
int i;
sscanf (sentence,"%s %*s %d",str,&i);
printf ("%s -> %d\n",str,i);
return 0;
}
I'll just leave this here for people interested in an implementation with no dependencies.
inline int
stringLength (char *String)
{
int Count = 0;
while (*String ++) ++ Count;
return Count;
}
inline int
stringToInt (char *String)
{
int Integer = 0;
int Length = stringLength(String);
for (int Caret = Length - 1, Digit = 1; Caret >= 0; -- Caret, Digit *= 10)
{
if (String[Caret] == '-') return Integer * -1;
Integer += (String[Caret] - '0') * Digit;
}
return Integer;
}
Works with negative values, but can't handle non-numeric characters mixed in between (should be easy to add though). Integers only.
For example, "mcc" is a char array and "mcc_int" is the integer you want to get.
char mcc[] = "1234";
int mcc_int;
sscanf(mcc, "%d", &mcc_int);
With cstring and cmath:
int charsToInt (char* chars) {
int res{ 0 };
int len = strlen(chars);
bool sig = *chars == '-';
if (sig) {
chars++;
len--;
}
for (int i{ 0 }; i < len; i++) {
int dig = *(chars + i) - '0';
res += dig * (pow(10, len - i - 1));
}
res *= sig ? -1 : 1;
return res;
}
Ascii string to integer conversion is done by the atoi() function.
Long story short you have to use atoi()
Edit:
If you are interested in doing this the right way :
char szNos[] = "12345";
char *pNext;
long output;
output = strtol (szNos, &pNext, 10); // input, ptr to next char in szNos (null here), base
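What makes strtol() the "right way" is that its failures can be detected. The sketch below wraps the usual checks; parse_long is a hypothetical name, and treating trailing characters as an error is my assumption about what is wanted.

```cpp
#include <cerrno>
#include <cstdlib>

// Parses a whole string as a base-10 long; returns false on any error.
bool parse_long(const char* text, long& value) {
    char* end = nullptr;
    errno = 0;                              // strtol only sets errno on failure
    long parsed = std::strtol(text, &end, 10);
    if (end == text) return false;          // no digits were consumed
    if (*end != '\0') return false;         // trailing characters remain
    if (errno == ERANGE) return false;      // value out of range for long
    value = parsed;
    return true;
}
```

atoi() offers none of these checks: it returns 0 for "abc" and for "0" alike.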
I have this code:
string get_md5sum(unsigned char* md) {
char buf[MD5_DIGEST_LENGTH + MD5_DIGEST_LENGTH];
char *bptr;
bptr = buf;
for(int i = 0; i < MD5_DIGEST_LENGTH; i++) {
bptr += sprintf(bptr, "%02x", md[i]);
}
bptr += '\0';
string x(buf);
return x;
}
Unfortunately, this is some C combined with some C++. It does compile, but I don't like the printf and char*'s. I always thought this was not necessary in C++, and that there were other functions and classes to realize this. However, I don't completely understand what is going on with this:
bptr += sprintf(bptr, "%02x", md[i]);
And therefore I don't know how to convert it into C++. Can someone help me out with that?
sprintf returns the number of characters written (not counting the terminating null). So this line writes two characters to bptr (the value of md[i] formatted with %02x, i.e. hexadecimal, left-padded with zeroes to two digits) and advances bptr by that count, so it always points to the end of the string in buf.
I don't get the bptr += '\0'; line; IMO it should be *bptr = '\0';
In C++ it could be written like this:
using namespace std;
stringstream buf;
for(int i = 0; i < MD5_DIGEST_LENGTH; i++)
{
buf << hex << setfill('0') << setw(2) << static_cast<int>(static_cast<unsigned char>(md[i]));
}
return buf.str();
EDIT: updated my c++ answer
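If you would rather keep the pointer-advancing idiom, here is a sketch of it with the buffer sized for the terminator and the terminator written through the pointer. hex_digest is a made-up name, and the length is passed in rather than taken from MD5_DIGEST_LENGTH so the example does not depend on OpenSSL.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats len bytes as lowercase hex using the "p += snprintf(...)" idiom.
std::string hex_digest(const unsigned char* md, std::size_t len) {
    std::vector<char> buf(2 * len + 1);  // two digits per byte plus '\0'
    char* p = buf.data();
    for (std::size_t i = 0; i < len; ++i) {
        // Writes two digits and a terminator, returns 2; p then points
        // at that terminator, which the next iteration overwrites.
        p += std::snprintf(p, 3, "%02x", static_cast<unsigned>(md[i]));
    }
    *p = '\0';                           // what "bptr += '\0'" presumably meant
    return std::string(buf.data());
}
```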
bptr += sprintf(bptr, "%02x", md[i]);
This is printing the character in md[i] as 2 hex characters into the buffer and advancing the buffer pointer by 2. Thus the loop prints out the hex form of the MD5.
bptr += '\0';
That line is probably not doing what you want: it's adding 0 to the pointer, giving you the same pointer back.
I'd implement this something like this:
string get_md5sum(unsigned char* md) {
    static const char hexdigits[] = "0123456789ABCDEF";
    char buf[2 * MD5_DIGEST_LENGTH];
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++) {
        buf[2*i + 0] = hexdigits[md[i] / 16];
        buf[2*i + 1] = hexdigits[md[i] % 16];
    }
    // buf is not null-terminated, so the length is passed explicitly
    return string(buf, 2 * MD5_DIGEST_LENGTH);
}
I don't know C++, so without using pointers and strings and stuff, here's a (almost) pseudo-code for you :)
for(int i = 0; i < MD5_DIGEST_LENGTH; i++) {
buf[i*2] = hexdigits[(md[i] & 0xF0) >> 4];
buf[i*2 + 1] = hexdigits[md[i] & 0x0F];
}
does anybody know any commonly used library for C++ that provides methods for encoding and decoding numbers from base 10 to base 32 and viceversa?
Thanks,
Stefano
[Updated] Apparently, the C++ std::setbase() IO manipulator and the normal << and >> IO operators only handle bases 8, 10, and 16, and are therefore useless for handling base 32.
So to solve your issue of converting
strings with base 10/32 representation of numbers read from some input to integers in the program
integers in the program to strings with base 10/32 representations to be output
you will need to resort to other functions.
For converting C style strings containing base 2..36 representations to integers, you can use #include <cstdlib> and use the strtol(3) & Co. set of functions.
As for converting integers to strings with arbitrary base... I cannot find an easy answer. printf(3) style format strings only handle bases 8,10,16 AFAICS, just like std::setbase. Anyone?
Did you mean "base 10 to base 32", rather than integer to base32? The latter seems more likely and more useful; by default standard formatted I/O functions generate base 10 string format when dealing with integers.
For the base 32 to integer conversion the standard library strtol() function will do that. For the reciprocal, you don't need a library for something you can easily implement yourself (not everything is a lego brick).
Here's an example, not necessarily the most efficient, but simple;
#include <cstdlib>
#include <string>
long b32tol( std::string b32 )
{
return strtol( b32.c_str(), 0, 32 ) ;
}
std::string itob32( long i )
{
unsigned long u = *reinterpret_cast<unsigned long*>( &i ) ;
std::string b32 ;
do
{
int d = u % 32 ;
if( d < 10 )
{
b32.insert( 0, 1, '0' + d ) ;
}
else
{
b32.insert( 0, 1, 'a' + d - 10 ) ;
}
u /= 32 ;
} while( u > 0 );
return b32 ;
}
#include <iostream>
int main()
{
long i = 32*32*11 + 32*20 + 5 ; // BK5 in base 32
std::string b32 = itob32( i ) ;
long ii = b32tol( b32 ) ;
std::cout << i << std::endl ; // Original
std::cout << b32 << std::endl ; // Converted to b32
std::cout << ii << std::endl ; // Converted back
return 0 ;
}
In direct answer to the original (and now old) question, I don't know of any common library for encoding byte arrays in base32, or for decoding them again afterward. However, I was presented last week with a need to decode SHA1 hash values represented in base32 into their original byte arrays. Here's some C++ code (with some notable Windows/little endian artifacts) that I wrote to do just that, and to verify the results.
Note that in contrast with Clifford's code above, which, if I'm not mistaken, assumes the "base32hex" alphabet mentioned on RFC 4648, my code assumes the "base32" alphabet ("A-Z" and "2-7").
// This program illustrates how SHA1 hash values in base32 encoded form can be decoded
// and then re-encoded in base16.
#include "stdafx.h"
#include <string>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
unsigned char Base16EncodeNibble( unsigned char value )
{
if( value >= 0 && value <= 9 )
return value + 48;
else if( value >= 10 && value <= 15 )
return (value-10) + 65;
else //assert(false);
{
cout << "Error: trying to convert value: " << value << endl;
}
return 42; // sentinel for error condition
}
void Base32DecodeBase16Encode(const string & input, string & output)
{
// Here's the base32 decoding:
// The "Base 32 Encoding" section of http://tools.ietf.org/html/rfc4648#page-8
// shows that every 8 bytes of base32 encoded data must be translated back into 5 bytes
// of original data during a decoding process. The following code does this.
int input_len = input.length();
assert( input_len == 32 );
const char * input_str = input.c_str();
int output_len = (input_len*5)/8;
assert( output_len == 20 );
// Because input strings are assumed to be SHA1 hash values in base32, it is also assumed
// that they will be 32 characters (and bytes in this case) in length, and so the output
// string should be 20 bytes in length.
unsigned char *output_str = new unsigned char[output_len];
char curr_char, temp_char;
long long temp_buffer = 0; //formerly: __int64 temp_buffer = 0;
for( int i=0; i<input_len; i++ )
{
curr_char = input_str[i];
if( curr_char >= 'A' && curr_char <= 'Z' )
temp_char = curr_char - 'A';
if( curr_char >= '2' && curr_char <= '7' )
temp_char = curr_char - '2' + 26;
if( temp_buffer )
temp_buffer <<= 5; //temp_buffer = (temp_buffer << 5);
temp_buffer |= temp_char;
// if 8 encoded characters have been decoded into the temp location,
// then copy them to the appropriate section of the final decoded location
if( (i>0) && !((i+1) % 8) )
{
unsigned char * source = reinterpret_cast<unsigned char*>(&temp_buffer);
//strncpy(output_str+(5*(((i+1)/8)-1)), source, 5);
int start_index = 5*(((i+1)/8)-1);
int copy_index = 4;
for( int x=start_index; x<(start_index+5); x++, copy_index-- )
output_str[x] = source[copy_index];
temp_buffer = 0;
// I could be mistaken, but I'm guessing that the necessity of copying
// in "reverse" order results from temp_buffer's little endian byte order.
}
}
// Here's the base16 encoding (for human-readable output and the chosen validation tests):
// The "Base 16 Encoding" section of http://tools.ietf.org/html/rfc4648#page-10
// shows that every byte of original data must be encoded as two characters from the
// base16 alphabet - one character for the original byte's high nibble, and one for
// its low nibble.
unsigned char out_temp, chr_temp;
for( int y=0; y<output_len; y++ )
{
out_temp = Base16EncodeNibble( output_str[y] >> 4 ); //encode the high nibble
output.append( 1, static_cast<char>(out_temp) );
out_temp = Base16EncodeNibble( output_str[y] & 0xF ); //encode the low nibble
output.append( 1, static_cast<char>(out_temp) );
}
delete [] output_str;
}
int _tmain(int argc, _TCHAR* argv[])
{
//string input = "J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH";
vector<string> input_b32_strings, output_b16_strings, expected_b16_strings;
input_b32_strings.push_back("J3WEDSJDRMJHE2FUHERUR6YWLGE3USRH");
expected_b16_strings.push_back("4EEC41C9238B127268B4392348FB165989BA4A27");
input_b32_strings.push_back("2HPUCIVW2EVBANIWCXOIQZX6N5NDIUSX");
expected_b16_strings.push_back("D1DF4122B6D12A10351615DC8866FE6F5A345257");
input_b32_strings.push_back("U4BDNCBAQFCPVDBL4FBG3AANGWVESI5J");
expected_b16_strings.push_back("A7023688208144FA8C2BE1426D800D35AA4923A9");
// Use the base conversion tool at http://darkfader.net/toolbox/convert/
// to verify that the above base32/base16 pairs are equivalent.
int num_input_strs = input_b32_strings.size();
for(int i=0; i<num_input_strs; i++)
{
string temp;
Base32DecodeBase16Encode(input_b32_strings[i], temp);
output_b16_strings.push_back(temp);
}
for(int j=0; j<num_input_strs; j++)
{
cout << input_b32_strings[j] << endl;
cout << output_b16_strings[j] << endl;
cout << expected_b16_strings[j] << endl;
if( output_b16_strings[j] != expected_b16_strings[j] )
{
cout << "Error in conversion for string " << j << endl;
}
}
return 0;
}
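The byte-order dependence noted in the comments can be avoided by shifting bits through an accumulator instead of copying from the memory of a 64-bit integer. The following is a sketch of that alternative, not a drop-in replacement for the program above: base32_decode is a name chosen for illustration, it assumes the same "A-Z", "2-7" alphabet, and it stops at the first '=' pad character.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes RFC 4648 "base32" text (alphabet A-Z, 2-7) into raw bytes.
std::vector<unsigned char> base32_decode(const std::string& in) {
    std::vector<unsigned char> out;
    std::uint32_t acc = 0;  // bit accumulator; only its low bits matter
    int bits = 0;           // number of not-yet-emitted bits held in acc
    for (char c : in) {
        std::uint32_t v;
        if (c >= 'A' && c <= 'Z')      v = c - 'A';
        else if (c >= '2' && c <= '7') v = c - '2' + 26;
        else if (c == '=')             break;  // padding: nothing more to read
        else throw std::invalid_argument("not a base32 character");
        acc = (acc << 5) | v;  // append 5 new bits
        bits += 5;
        if (bits >= 8) {       // a full byte is available: emit it
            bits -= 8;
            out.push_back(static_cast<unsigned char>((acc >> bits) & 0xFF));
        }
    }
    return out;
}
```

Decoding the first eight characters of the first test string, J3WEDSJD, gives the bytes 4E EC 41 C9 23, which agrees with the start of the expected base16 string.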
I'm not aware of any commonly-used library devoted to base32 encoding but Crypto++ includes a public domain base32 encoder and decoder.
I don't use cpp, so correct me if I'm wrong. I wrote this code for the sake of translating it from C# to save my acquaintance the trouble. The original source, that which I used to create these methods, is on a different post, here, on stackoverflow:
https://stackoverflow.com/a/10981113/13766753
That being said, here's my solution:
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <string>
class Base32 {
public:
static std::string dict;
static std::string encode(int number) {
std::string result = "";
bool negative = false;
if (number < 0) {
negative = true;
}
number = abs(number);
do {
result = Base32::dict[number % 32] + result;
number /= 32;
} while(number > 0);
if (negative) {
result = "-" + result;
}
return result;
}
static int decode(std::string str) {
int result = 0;
int negative = 1;
if (str.rfind("-", 0) == 0) {
negative = -1;
str = str.substr(1);
}
for(char& letter : str) {
result += Base32::dict.find(letter);
result *= 32;
}
return result / 32 * negative;
}
};
std::string Base32::dict = "0123456789abcdefghijklmnopqrstuvwxyz";
int main() {
std::cout << Base32::encode(0) + "\n" << Base32::decode(Base32::encode(0)) << "\n";
return 0;
}