I recently had the need to read a non-trivially sized file line by line, and to push performance I decided to follow some advice I've received stating that fstreams are much slower than C-style I/O. However, despite my best efforts, I have not been able to reproduce the same dramatic differences (~25%, which is large but not insane). I also tried fscanf and found it to be slower by an order of magnitude.
My question is: what causes the performance difference under the covers, and why is fscanf so abysmal?
The following is my code (compiled with TDM GCC 5.1.0):
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;
using namespace std::chrono;

struct file
{
    file(const char* str, const char* mode)
        : fp(fopen(str, mode)) {}
    ~file() { fclose(fp); }
    FILE* fp;
};

constexpr size_t bufsize = 256;

auto readWord(int pos, char*& word, char* const buf)
{
    for (;; ++word, ++pos)
    {
        if (pos == bufsize)   // check the bound before reading buf[pos]
            return 0;
        if (buf[pos] == '\n')
            break;
        *word = buf[pos];
    }
    *word = '\0';
    return pos + 1;
}

void readFileC()
{
    file in{"inC.txt", "r"};
    char buf[bufsize];
    char word[40];
    char* pw = word;
    int sz = fread(buf, 1, bufsize, in.fp);
    for (; sz == bufsize; sz = fread(buf, 1, bufsize, in.fp))
    {
        for (auto nextPos = readWord(0, pw, buf); (nextPos = readWord(nextPos, pw, buf));)
        {
            // use word here
            pw = word;
        }
    }
    for (auto nextPos = readWord(0, pw, buf); nextPos < sz; nextPos = readWord(nextPos, pw, buf))
    {
        // use word here
        pw = word;
    }
}

void readFileCline()
{
    file in{"inCline.txt", "r"};
    char word[40];
    while (fscanf(in.fp, "%s", word) != EOF);
    // use word here
}

void readFileCpp()
{
    ifstream in{"inCpp.txt"};
    string word;
    while (getline(in, word));
    // use word here
}

int main()
{
    static constexpr int runs = 1;
    auto countC = 0;
    for (int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileC();
        auto dur = steady_clock::now() - start;
        countC += duration_cast<milliseconds>(dur).count();
    }
    cout << "countC: " << countC << endl;
    auto countCline = 0;
    for (int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileCline();
        auto dur = steady_clock::now() - start;
        countCline += duration_cast<milliseconds>(dur).count();
    }
    cout << "countCline: " << countCline << endl;
    auto countCpp = 0;
    for (int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileCpp();
        auto dur = steady_clock::now() - start;
        countCpp += duration_cast<milliseconds>(dur).count();
    }
    cout << "countCpp: " << countCpp << endl;
}
Run with a file of size 1070 KB, these are the results:
countC: 7
countCline: 61
countCpp: 9
EDIT: The three test cases now read different files and run only once. The results are exactly 1/20 of reading the same file 20 times. countC consistently outperforms countCpp even when I flipped the order in which they are run.
fscanf has to parse the format string parameter, looking for all possible % signs and interpreting them, along with width-specifiers, escape characters, expressions, etc. It has to walk the format parameter more or less one character at a time, working through a very big set of potential formats. Even if your format is as simple as "%s", there is still a lot of overhead involved relative to the other techniques, which simply grab a bunch of bytes with almost no interpretation or conversion.
I have noticed an interesting thing, but I am not sure if it is supposed to happen this way.
I have some code that uses fgetc() to read symbols from a file and store them in an int, say l:
l = fgetc(file);
The file is opened in read-binary mode ("rb") using
file = fopen("filename", "rb");
Then, using a string stream, each symbol is converted into hex format, sent into a string, and stored in a char array:
std::stringstream sl;
sl << std::hex << l;
sl >> sll;
as[i] = sll[i];
The problem is that when fgetc() reads a symbol that is represented as 0C in hex (or FF as a char), my final char array gets filled with 0's.
In short, if a char[] element contains 0c, the rest of the elements are 0's.
I have no idea why this happens. When I edited my file with a hex editor and replaced 0c with something else, the file was read properly and all symbols were stored in the array as they were written in the file.
If you could tell me how to circumvent this behavior, I would appreciate it.
Ok. Full code:
#include <stdio.h>
#include <iostream>
#include <string.h>
#include "u.c"
#include <wchar.h>
#include <sstream>

int main() {
    unsigned long F, K;
    std::string k;
    char hhh[300];
    char hh1[300];
    char kk[64];
    int lk;
    memset(kk, 0, 64);
    FILE *diy;
    FILE *ydi;
    std::cin >> k;
    std::cin >> hhh;
    std::cin >> hh1;
    lk = k.length();
    for (int i = 0; i < lk; i++) {
        kk[i] = k[i];
    }
    ;
    bof(kk, lk);
    diy = fopen(hhh, "rb");
    ydi = fopen(hh1, "wb");
    int mm = 0;
    int l;
    int r;
    char ll[9];
    char rr[9];
    memset(ll, 0, 9);
    memset(rr, 0, 9);
    std::string sll;
    std::string slr;
    char sL[3];
    char sR[3];
    int i = 0;
    while (!feof(diy)) {
        l = fgetc(diy);
        r = fgetc(diy);
        std::stringstream sl;
        std::stringstream sr;
        sl << std::hex << l;
        sl >> sll;
        sL[0] = sll[0];
        sL[1] = sll[1];
        sr << std::hex << r;
        sr >> slr;
        sR[0] = slr[0];
        sR[1] = slr[1];
        if (i == 0) {
            ll[0] = sL[0];
            ll[1] = sL[1];
            ll[2] = sR[0];
            ll[3] = sR[1];
            sL[0] = '\0';
            sR[0] = '\0';
            sL[1] = '\0';
            sL[1] = '\0';
        }
        ;
        if (i == 1) {
            ll[4] = sL[0];
            ll[5] = sL[1];
            ll[6] = sR[0];
            ll[7] = sR[1];
            sL[0] = '\0';
            sR[0] = '\0';
            sL[1] = '\0';
            sL[1] = '\0';
        }
        ;
        if (i == 2) {
            rr[0] = sL[0];
            rr[1] = sL[1];
            rr[2] = sR[0];
            rr[3] = sR[1];
            sL[0] = '\0';
            sR[0] = '\0';
            sL[1] = '\0';
            sL[1] = '\0';
        }
        ;
        if (i == 3) {
            rr[4] = sL[0];
            rr[5] = sL[1];
            rr[6] = sR[0];
            rr[7] = sR[1];
            sL[0] = '\0';
            sR[0] = '\0';
            sL[1] = '\0';
            sL[1] = '\0';
        }
        ;
        sL[0] = '\0';
        sR[0] = '\0';
        sL[1] = '\0';
        sL[1] = '\0';
        if (i == 3) {
            printf(" %s %s \n ", ll, rr); // indicated that my rr array had problems with that 0x0c
            sscanf(ll, "%08lX", &F);
            sscanf(rr, "%08lX", &K);
            printf(" before %08lx %08lx \n ", F, K);
            omg(&F, &K);
            printf(" after %20lx %20lx \n ", F, K);
            memset(ll, 0, 9);
            memset(rr, 0, 9);
            char RR[9];
            sprintf(RR, "%08lx", F);
            char LL[9];
            sprintf(LL, "%08lx", K);
            printf(" %s %s ", LL, RR);
            for (int j = 0; j < 4; j++) {
                char ls[3];
                ls[0] = LL[j * 2];
                ls[1] = LL[2 * j + 1];
                int kj;
                std::stringstream op;
                op << ls;
                op >> std::hex >> kj;
                fputc(kj, ydi);
            }
            ;
            for (int j = 0; j < 4; j++) {
                char lr[3];
                lr[0] = RR[j * 2];
                lr[1] = RR[2 * j + 1];
                int kjm;
                std::stringstream ip;
                ip << lr;
                ip >> std::hex >> kjm;
                fputc(kjm, ydi);
            }
            ;
            memset(LL, 0, 9);
            memset(RR, 0, 9);
        }
        ;
        i++;
        std::cout << "\n";
        if (i == 4) {
            i = 0;
        }
        ;
    }
    ;
    fclose(diy);
    fclose(ydi);
}
;
;
Since you asked, now you have it.
This code will not compile because you do not have the necessary libraries.
The simplified code is at the beginning of this post.
Those libraries that you do not possess have nothing to do with this issue.
The core problem
You assume that
std::stringstream the_stream;
std::string the_string;
the_stream << std::hex << 0x0C;
the_stream >> the_string;
results in the_string containing "0c". However, it will contain "c".
This means that later on, you end up converting the input "\x0c\xfe" to 'c', '\0', 'f', 'e'. If you use this at any point in a C-style string, of course it ends the string after c.
It was quite hard to debug this program. In the future, please write readable and understandable code. What follows is a non-exhaustive list of the problems I found.
Design problems
while(!feof(file)) is always wrong.
Use variable scoping. If you pull the declaration of sL and sR into the loop, you don't have to reset them. Less code, less potential errors.
You're using a lot of code for something as simple as converting a presumably 8-bit char to its hexadecimal representation. In fact, the only reason you ever use std::stringstream in your code is to do exactly that. Why don't you isolate this functionality to a function?
Irrelevant problems
Because of the poor code formatting, you probably didn't notice the copy-paste errors in the use of sL and sR:
sL[0] = '\0';
sR[0] = '\0';
sL[1] = '\0';
sL[1] = '\0';
Obviously, that last line should read sR[1] = '\0';
Style problems
There are many, many things wrong with your code, but one thing that easily stops people from helping is formatting. The space formatting in particular made your code very hard to read, so I took the liberty to edit the "full" code in your question to have (almost) consistent formatting. A few basic problems become evident:
Use meaningful names for variables and functions. I have no idea what you're trying to do here, which doesn't help in finding the real problem in the code.
Mixing <iostream> and <stdio.h> doesn't help the readability of your code. Choose one or the other. In fact, only ever use <iostream> in C++.
Besides that, use the appropriate header names for C++ (<cstring> and <cwchar> instead of <string.h> and <wchar.h>).
Don't write a semicolon after a compound statement. Instead of
int main(void) {
    if (condition) {
        one_statement();
        another_statement();
    };
};
you should write
int main(void) {
    if (condition) {
        one_statement();
        another_statement();
    }
}
The ; is part of a separate statement. It also prevents you from using else constructs.
Use initialisers where appropriate. So don't write
char ll[9];
char rr[9];
memset(ll, 0, 9);
memset(rr, 0, 9);
while
char ll[9] = { 0 };
char rr[9] = { 0 };
is more readable.
This 0c problem can be solved by changing the char[] array where the value is stored to unsigned char[].
When the input is read with a string stream, this line is very helpful:
<< std::setfill('0') << std::setw(2) << std::hex;
Where 0c would otherwise be converted to just c, setw() sets the width of the stream and setfill() pads it with 0's.
I need to do something like this in the fastest way possible (O(1) would be perfect):
for (int j = 0; j < V; ++j)
{
    if (!visited[j]) required[j] = 0;
}
I came up with this solution:
for (int j = 0; j < V; ++j)
{
    required[j] = visited[j] & required[j];
}
Which made the program run 3 times faster, but I believe there is an even better way to do this. Am I right?
By the way, required and visited are dynamically allocated arrays:
bool *required;
bool *visited;
required = new bool[V];
visited = new bool[V];
In the case where you're working with a list of simple objects, you are most likely best served by the functionality provided by the C++ Standard Library. Structures like std::valarray and std::vector are recognized and optimized very effectively by all modern compilers.
Much debate exists as to how much you can rely on your compiler, but one guarantee is that your compiler was built alongside the standard library, and relying on it for basic functionality (such as your problem) is generally a safe bet.
Never be afraid to run your own time tests and race your compiler! It's a fun exercise and one that is ever increasingly difficult to achieve.
Construct a valarray (highly optimized in C++11 and later):
std::valarray<bool> valRequired(required, V);
std::valarray<bool> valVisited(visited, V);
valRequired &= valVisited;
Alternatively, you could do it with one line using transform:
std::transform(required, required + V, visited, required, [](bool r, bool v){ return r & v; });
Edit: while fewer lines is not faster, your compiler will likely vectorize this operation.
I also tested their timing:
int main(int argc, const char * argv[]) {
    auto clock = std::chrono::high_resolution_clock{};
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] &= visited[i];
        }
        auto end = clock.now();
        std::cout << "1: " << (end - start).count() << std::endl;
    }
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        auto start = clock.now();
        for (int i = 0; i < 5; ++i) {
            required[i] = visited[i] & required[i];
        }
        auto end = clock.now();
        std::cout << "2: " << (end - start).count() << std::endl;
    }
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        auto start = clock.now();
        std::transform(required, required + 5, visited, required, [](bool r, bool v){ return r & v; });
        auto end = clock.now();
        std::cout << "3: " << (end - start).count() << std::endl;
    }
    {
        bool visited[5] = {1,0,1,0,0};
        bool required[5] = {1,1,1,0,1};
        std::valarray<bool> valVisited(visited, 5);
        std::valarray<bool> valrequired(required, 5);
        auto start = clock.now();
        valrequired &= valVisited;
        auto end = clock.now();
        std::cout << "4: " << (end - start).count() << std::endl;
    }
}
Output:
1: 102
2: 55
3: 47
4: 45
Program ended with exit code: 0
Along the lines of @AlanStokes's suggestion, use packed binary data and combine it with the AVX-512 instruction _mm512_and_epi64, 512 bits at a time. Be prepared for your hair to get messed up.
Goal
My goal is to quickly create a file from a large binary string (a string that contains only 1 and 0).
Straight to the point
I need a function that can achieve my goal. If I am not clear enough, please read on.
Example
Test.exe is running...
.
Inputted binary string:
1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
Done!
File(Test.txt) In Byte(s):
0xFF, 0xAA
.
Test.exe executed successfully!
Explanation
First, Test.exe requested the user to input a binary string.
Then, it converted the inputted binary string to hexadecimal.
Finally, it wrote the converted value to a file called Test.txt.
I've tried
In a failed attempt to achieve my goal, I created this simple (and possibly horrible) function (hey, at least I tried):
void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr )
{
    std::ofstream OutputFile( Destination, std::ofstream::binary );
    for( ::UINT Index1 = 0, Dec = 0;
         // 8-Bit binary.
         Index1 != BinaryStr.length( )/8;
         // Get the next set of binary values.
         // Write the decimal value as unsigned char to file.
         // Reset decimal value to 0.
         ++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
    {
        // Convert the 8-bit binary to hexadecimal using the
        // positional notation method - this is how it's done:
        // http://www.wikihow.com/Convert-from-Binary-to-Decimal
        for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
            if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
    }
    OutputFile.close( );
}
Example of usage
#include "Global.h"

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr );

int main( void )
{
    std::string Bin = "";
    // Create a binary string that encodes 9.53674 MB of data.
    // Note: The creation of this string will take a while.
    // However, I only start to measure the speed of writing
    // and converting after it is done generating the string.
    // This string is just created as an example.
    std::cout << "Generating...\n";
    while( Bin.length( ) != 80000000 )
        Bin += "10101010";
    std::cout << "Writing...\n";
    BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );
    std::cout << "Done!\n";
#ifdef IS_DEBUGGING
    std::cout << "Paused...\n";
    ::getchar( );
#endif
    return( 0 );
}
Problem
Again, that was my failed attempt to achieve my goal. The problem is the speed: it is too slow. It took more than 7 minutes. Is there any method to quickly create a file from a large binary string?
Thanks in advance,
CLearner
I'd suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for each character that you process. Replace this code:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
        Dec += Inc;
by something like:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr[ Index1 * 8 + Index2 ] == '1' )
        Dec += Inc;
The majority of your time is spent here:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
When I comment that out, the file is written in seconds. I think you need to fine-tune your conversion.
I think I'd consider something like this as a starting point:
#include <bitset>
#include <fstream>
#include <iterator>
#include <algorithm>

int main() {
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);
    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong(); });
    return 0;
}
Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you've mentioned in your question, my guess is this is adequate speed, and the simplicity is going to be hard to beat.
Something not entirely unlike this should be significantly faster:
void
text_to_binary_file(const std::string& text, const char *fname)
{
    char wbuf[4096];    // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp;    // dropping down to filebufs may well be faster
                        // for this problem
    fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary);
    memset(wbuf, 0, 4096);
    for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
        if (*p == '1')                  // only '1' characters set a bit
            wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == 4096) {
            if (fp.sputn(wbuf, 4096) != 4096)
                abort();
            memset(wbuf, 0, 4096);
            i = 0;
            j = 0;
        }
    }
    // write the final, possibly partial, chunk
    std::streamsize tail = i + (j != 0);
    if (fp.sputn(wbuf, tail) != tail)
        abort();
    fp.close();
}
Proper error handling left as an exercise.
So instead of converting back and forth between std::strings, why not use a bunch of machine word-sized integers for fast access?
const size_t bufsz = 1000000;
uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);
for (size_t i = 0; i < bufsz; i++) {
    ofile << std::hex << std::setw(8) << std::setfill('0') << buf[i];
    // or if you want raw binary data instead of formatted hex:
    ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}
delete[] buf;
For me, this runs in a fraction of a second.
Even though I'm late, I want to add my example for handling such strings.
Architecture-specific optimizations may use unaligned loads of chars into multiple registers to "squeeze" out the bits in parallel. This untested example code does not validate the chars and avoids alignment and endianness requirements. It assumes the characters of the binary string represent contiguous octets (bytes) with the most significant bit first, not words and double words, etc., whose specific representation in memory (and in that string) would require special treatment for portability.
// THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.
// Set up an ofstream with a 64KiB buffer.
std::vector<char> buffer(65536);
std::ofstream ofs("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);
ofs.rdbuf()->pubsetbuf(&buffer[0], buffer.size());

std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();

// You may treat cases where (bits % 8 != 0) as an error.

// Initialize with the first iteration.
uint8_t byte = uint8_t(*cIt++) - uint8_t('0');
byte <<= 1;
for (std::string::size_type i = 1; i < (bits & (~std::string::size_type(0x7))); ++i, ++cIt)
{
    if (i & 0x7) // bit 7 ... 1
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        byte <<= 1;
    }
    else // bit 0: write and advance to the next most significant bit of an octet
    {
        byte |= uint8_t(*cIt) - uint8_t('0');
        ofs.put(byte);
        // advance
        ++i;
        ++cIt;
        byte = uint8_t(*cIt) - uint8_t('0');
        byte <<= 1;
    }
}
ofs.flush();
This makes a 76.2 MB (80,000,000-byte) file of 10101010...:
#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace std;

int main( void )
{
    char Bin = 0;
    ofstream myfile;
    myfile.open(".\\example.bin", ios::out | ios::app | ios::binary);
    int c = 0;
    Bin = 0xAA;
    while( c != 80000000 ){
        myfile.write(&Bin, 1);
        c++;
    }
    myfile.close();
    cout << "Done!\n";
    return( 0 );
}
This is my fourth attempt at base64 encoding. My first tries worked, but they weren't standard. They were also extremely slow! I used vectors with push_back and erase a lot.
So I decided to rewrite it, and this version is much, much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64-encoding the compressed string. I'm using ZLib. The images are 1366 x 768, so yeah.
I do not want to copy any code I find online because... well, I like to write things myself, and I don't like worrying about copyright issues or having to put a ton of credits from different sources all over my code.
Anyway, my code is below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

inline bool IsBase64(std::uint8_t C)
{
    return (isalnum(C) || (C == '+') || (C == '/'));
}

std::string Copy(std::string Str, int FirstChar, int Count)
{
    if (FirstChar <= 0)
        FirstChar = 0;
    else
        FirstChar -= 1;
    return Str.substr(FirstChar, Count);
}

std::string DecToBinStr(int Num, int Padding)
{
    int Bin = 0, Pos = 1;
    std::stringstream SS;
    while (Num > 0)
    {
        Bin += (Num % 2) * Pos;
        Num /= 2;
        Pos *= 10;
    }
    SS.fill('0');
    SS.width(Padding);
    SS << Bin;
    return SS.str();
}

int DecToBinStr(std::string DecNumber)
{
    int Bin = 0, Pos = 1;
    int Dec = strtol(DecNumber.c_str(), NULL, 10);
    while (Dec > 0)
    {
        Bin += (Dec % 2) * Pos;
        Dec /= 2;
        Pos *= 10;
    }
    return Bin;
}

int BinToDecStr(std::string BinNumber)
{
    int Dec = 0;
    int Bin = strtol(BinNumber.c_str(), NULL, 10);
    for (int I = 0; Bin > 0; ++I)
    {
        if (Bin % 10 == 1)
        {
            Dec += (1 << I);
        }
        Bin /= 10;
    }
    return Dec;
}

std::string EncodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();
    for (std::size_t I = 0; I < Data.size(); ++I)
    {
        Binary += DecToBinStr(Data[I], 8);
    }
    for (std::size_t I = 0; I < Binary.size(); I += 6)
    {
        Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
        if (I == 0) ++I;
    }
    int PaddingAmount = ((-Result.size() * 3) & 3);
    for (int I = 0; I < PaddingAmount; ++I)
        Result += '=';
    return Result;
}

std::string DecodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();
    for (std::size_t I = Data.size(); I > 0; --I)
    {
        if (Data[I - 1] != '=')
        {
            std::string Characters = Copy(Data, 0, I);
            for (std::size_t J = 0; J < Characters.size(); ++J)
                Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
            break;
        }
    }
    for (std::size_t I = 0; I < Binary.size(); I += 8)
    {
        Result += (char)BinToDecStr(Copy(Binary, I, 8));
        if (I == 0) ++I;
    }
    return Result;
}
I've been using the above like this:
int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); // IMG.677*604
    std::cout << DecodeBase64(Data); // Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); // IMG.1366*768
    std::cout << DecodeBase64(Data); // Prints IMG.1366*768
}
It prints correctly. I'm not sure what is going on at all or where to begin looking.
Just in case anyone is curious and wants to see my other (slow) attempts: http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up, or at least figure out what's wrong with my code :l
The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
As far as comments on speed go, I'd focus primarily on reducing your usage of std::string. The way you currently convert the data into a string of 0s and 1s is pretty inefficient, considering that the source could be read directly with bitwise operators.
I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
    std::stringstream ss;
    ss << value;
    return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
    std::cout << Data << std::endl;
    std::cout << DecodeBase64(Data) << std::endl; // Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encoded the null that terminates the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
    quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
    quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
    quad[3] = base_64_map[triplet[2] & 0x3F];
}

/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
    quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
    quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
    quad[3] = pad;
}

/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
    quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
    quad[2] = pad;
    quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
std::string EncodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();
    for (std::size_t I = 0; I < Data.size(); ++I)
    {
        Binary += DecToBinStr(Data[I], 8);
    }
    if (Binary.size() % 6)
    {
        Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
    }
    for (std::size_t I = 0; I < Binary.size(); I += 6)
    {
        Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
        if (I == 0) ++I;
    }
    if (Result.size() % 4)
    {
        Result.resize(Result.size() + 4 - Result.size() % 4, '=');
    }
    return Result;
}
I was writing code to read unsigned integers from a binary file using C/C++ on a 32-bit Linux OS, intended to run on an 8-core x86 system. The application takes an input file which contains unsigned integers in little-endian format, one after another, so the input file size in bytes is a multiple of 4. The file could have a billion integers in it. What is the fastest way to read and add all the integers and return the sum with 64-bit precision?
Below is my implementation. Error checking for corrupt data is not the major concern here, and the input file is assumed to be without any issues in this case.
#include <iostream>
#include <fstream>
#include <pthread.h>
#include <string>
#include <string.h>
using namespace std;

string filepath;
unsigned int READBLOCKSIZE = 1024*1024;
unsigned long long nFileLength = 0;
unsigned long long accumulator = 0; // assuming 32 bit OS running on X86-64
unsigned int seekIndex[8] = {};
unsigned int threadBlockSize = 0;
unsigned long long acc[8] = {};
pthread_t thread[8];
void* threadFunc(void* pThreadNum);
//time_t seconds1;
//time_t seconds2;

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        cout << "Please enter a file path\n";
        return -1;
    }
    //seconds1 = time (NULL);
    //cout << "Start Time in seconds since January 1, 1970 -> " << seconds1 << "\n";
    string path(argv[1]);
    filepath = path;
    ifstream ifsReadFile(filepath.c_str(), ifstream::binary); // Create FileStream for the file to be read
    if(0 == ifsReadFile.is_open())
    {
        cout << "Could not find/open input file\n";
        return -1;
    }
    ifsReadFile.seekg (0, ios::end);
    nFileLength = ifsReadFile.tellg(); // get file size
    ifsReadFile.seekg (0, ios::beg);
    if(nFileLength < 16*READBLOCKSIZE)
    {
        //cout << "Using One Thread\n"; //**
        char* readBuf = new char[READBLOCKSIZE];
        if(0 == readBuf) return -1;
        unsigned int startOffset = 0;
        if(nFileLength > READBLOCKSIZE)
        {
            while(startOffset + READBLOCKSIZE < nFileLength)
            {
                //ifsReadFile.flush();
                ifsReadFile.read(readBuf, READBLOCKSIZE); // At this point ifsReadFile is open
                int* num = reinterpret_cast<int*>(readBuf);
                for(unsigned int i = 0 ; i < (READBLOCKSIZE/4) ; i++)
                {
                    accumulator += *(num + i);
                }
                startOffset += READBLOCKSIZE;
            }
        }
        if(nFileLength - (startOffset) > 0)
        {
            ifsReadFile.read(readBuf, nFileLength - (startOffset));
            int* num = reinterpret_cast<int*>(readBuf);
            for(unsigned int i = 0 ; i < ((nFileLength - startOffset)/4) ; ++i)
            {
                accumulator += *(num + i);
            }
        }
        delete[] readBuf; readBuf = 0;
    }
    else
    {
        //cout << "Using 8 Threads\n"; //**
        unsigned int currthreadnum[8] = {0,1,2,3,4,5,6,7};
        if(nFileLength > 200000000) READBLOCKSIZE *= 16; // read larger blocks
        //cout << "Read Block Size -> " << READBLOCKSIZE << "\n";
        if(nFileLength % 28)
        {
            threadBlockSize = (nFileLength / 28);
            threadBlockSize *= 4;
        }
        else
        {
            threadBlockSize = (nFileLength / 7);
        }
        for(int i = 0; i < 8 ; ++i)
        {
            seekIndex[i] = i*threadBlockSize;
            //cout << seekIndex[i] << "\n";
        }
        pthread_create(&thread[0], NULL, threadFunc, (void*)(currthreadnum + 0));
        pthread_create(&thread[1], NULL, threadFunc, (void*)(currthreadnum + 1));
        pthread_create(&thread[2], NULL, threadFunc, (void*)(currthreadnum + 2));
        pthread_create(&thread[3], NULL, threadFunc, (void*)(currthreadnum + 3));
        pthread_create(&thread[4], NULL, threadFunc, (void*)(currthreadnum + 4));
        pthread_create(&thread[5], NULL, threadFunc, (void*)(currthreadnum + 5));
        pthread_create(&thread[6], NULL, threadFunc, (void*)(currthreadnum + 6));
        pthread_create(&thread[7], NULL, threadFunc, (void*)(currthreadnum + 7));
        pthread_join(thread[0], NULL);
        pthread_join(thread[1], NULL);
        pthread_join(thread[2], NULL);
        pthread_join(thread[3], NULL);
        pthread_join(thread[4], NULL);
        pthread_join(thread[5], NULL);
        pthread_join(thread[6], NULL);
        pthread_join(thread[7], NULL);
        for(int i = 0; i < 8; ++i)
        {
            accumulator += acc[i];
        }
    }
    //seconds2 = time (NULL);
    //cout << "End Time in seconds since January 1, 1970 -> " << seconds2 << "\n";
    //cout << "Total time to add " << nFileLength/4 << " integers -> " << seconds2 - seconds1 << " seconds\n";
    cout << accumulator << "\n";
    return 0;
}
void* threadFunc(void* pThreadNum)
{
unsigned int threadNum = *reinterpret_cast<unsigned int*>(pThreadNum); // pThreadNum points at an unsigned int
char* localReadBuf = new char[READBLOCKSIZE];
unsigned int startOffset = seekIndex[threadNum];
ifstream ifs(filepath.c_str(), ifstream::binary); // Create FileStream for the file to be read
if(0 == ifs.is_open())
{
cout << "Could not find/open input file\n";
return 0;
}
ifs.seekg (startOffset, ios::beg); // Seek to the correct offset for this thread
acc[threadNum] = 0;
unsigned int endOffset = startOffset + threadBlockSize;
if(endOffset > nFileLength) endOffset = nFileLength; // for last thread
//cout << threadNum << "-" << startOffset << "-" << endOffset << "\n";
if((endOffset - startOffset) > READBLOCKSIZE)
{
while(startOffset + READBLOCKSIZE < endOffset)
{
ifs.read(localReadBuf, READBLOCKSIZE); // At this point ifs is open
int* num = reinterpret_cast<int*>(localReadBuf);
for(unsigned int i = 0 ; i < (READBLOCKSIZE/4) ; i++)
{
acc[threadNum] += *(num + i);
}
startOffset += READBLOCKSIZE;
}
}
if(endOffset - startOffset > 0)
{
ifs.read(localReadBuf, endOffset - startOffset);
int* num = reinterpret_cast<int*>(localReadBuf);
for(unsigned int i = 0 ; i < ((endOffset - startOffset)/4) ; ++i)
{
acc[threadNum] += *(num + i);
}
}
//cout << "Thread " << threadNum + 1 << " subsum = " << acc[threadNum] << "\n"; //**
delete[] localReadBuf; localReadBuf = 0;
return 0;
}
I wrote a small C# program to generate the input binary file for testing.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace BinaryNumWriter
{
class Program
{
static UInt64 total = 0;
static void Main(string[] args)
{
BinaryWriter bw = new BinaryWriter(File.Open("test.txt", FileMode.Create));
Random rn = new Random();
for (UInt32 i = 1; i <= 500000000; ++i)
{
UInt32 num = (UInt32)rn.Next(0, 0xffff);
bw.Write(num);
total += num;
}
bw.Flush();
bw.Close();
Console.WriteLine(total); // print the expected sum, to verify the C++ program's output
}
}
}
Running the program on a Core i5 machine @ 3.33 GHz (it's quad-core, but it's what I've got at the moment) with 2 GB RAM and Ubuntu 9.10 32-bit gave the following performance numbers:
100 integers ~ 0 seconds (I would really have to suck otherwise)
100000 integers < 1 second
100000000 integers ~ 7 seconds
500000000 integers ~ 29 seconds (1.86 GB input file)
I am not sure if the HDD is 5400 RPM or 7200 RPM. I tried different buffer sizes for reading and found that reading 16 MB at a time for big input files was kinda the sweet spot.
Are there any better ways to read from the file faster to increase overall performance? Is there a smarter way to add large arrays of integers faster, folding repeatedly? Are there any major roadblocks to performance in the way I have written the code? Am I doing something obviously wrong that's costing a lot of time?
What can I do to make this process of reading and adding data faster?
Thanks.
Chinmay
Accessing a mechanical HDD from multiple threads the way you do is going to take some head movement (read: slow it down). You're almost surely I/O bound (~65 MB/s for the 1.86 GB file).
Try to change your strategy by:
starting the 8 threads - we'll call them CONSUMERS
the 8 threads will wait for data to be made available
in the main thread start to read chunks (say 256KB) of the file thus being a PROVIDER for the CONSUMERS
main thread hits the EOF and signals the workers that there is no more available data
main thread waits for the 8 workers to join.
You'll need quite a bit of synchronization to get it working flawlessly, and I think it would totally max out your HDD / filesystem I/O capabilities by doing sequential file access. YMMV on smallish files, which can be cached and served from the cache at lightning speed.
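A minimal sketch of that synchronization, written with C++11 threads for brevity rather than the pthreads API the question uses (the chunk size, thread count, and function names here are illustrative, not from the original code):

```cpp
#include <atomic>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Shared state: the main thread is the PROVIDER, the 8 workers are CONSUMERS.
static std::mutex mtx;
static std::condition_variable cv;
static std::queue<std::vector<char>> chunks;
static bool done = false;
static std::atomic<unsigned long long> sum{0};

static void consumer()
{
    for (;;)
    {
        std::vector<char> chunk;
        {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, []{ return !chunks.empty() || done; });
            if (chunks.empty()) return;            // EOF signaled and queue drained
            chunk = std::move(chunks.front());
            chunks.pop();
        }
        const int* num = reinterpret_cast<const int*>(chunk.data());
        unsigned long long local = 0;
        for (size_t i = 0; i < chunk.size() / 4; ++i)
            local += num[i];
        sum += local;                              // fold the local subtotal in once
    }
}

unsigned long long sumFile(const char* path)
{
    const size_t chunkSize = 256 * 1024;           // 256 KB chunks, as suggested
    std::thread workers[8];
    for (auto& t : workers) t = std::thread(consumer);
    std::ifstream in(path, std::ifstream::binary);
    std::vector<char> buf(chunkSize);
    while (in.read(buf.data(), chunkSize) || in.gcount() > 0)
    {
        buf.resize(static_cast<size_t>(in.gcount()));
        {
            std::lock_guard<std::mutex> lk(mtx);
            chunks.push(std::move(buf));
        }
        cv.notify_one();
        buf.assign(chunkSize, 0);                  // fresh buffer for the next read
    }
    {
        std::lock_guard<std::mutex> lk(mtx);
        done = true;                               // signal EOF to the workers
    }
    cv.notify_all();
    for (auto& t : workers) t.join();
    return sum;
}
```

Note the globals mean this sketch is single-use; real code would bundle the state into an object. The key point stands: only the main thread touches the file, so the disk sees one sequential reader.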
Another thing you can try is to start only 7 worker threads, leaving one CPU core free for the main thread and the rest of the system.
.. or get an SSD :)
Edit:
For simplicity see how fast you can simply read the file (discarding the buffers) with no processing, single-threaded. That plus epsilon is your theoretical limit to how fast you can get this done.
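Measuring that baseline is a few lines (a hypothetical helper, just timing reads into a discarded buffer):

```cpp
#include <chrono>
#include <fstream>
#include <vector>

// Raw single-threaded read throughput: read the whole file into one reused
// buffer and do nothing with the data.
double rawReadSeconds(const char* path, size_t blockSize)
{
    std::ifstream in(path, std::ifstream::binary);
    std::vector<char> buf(blockSize);
    auto start = std::chrono::steady_clock::now();
    while (in.read(buf.data(), blockSize) || in.gcount() > 0)
    {
        // discard buf; the loop exits once a read returns no bytes
    }
    auto dur = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double>(dur).count();
}
```

Dividing the file size by this time gives the throughput ceiling any multithreaded scheme has to live under.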
If you want to read (or write) a lot of data fast, and you don't want to do much processing with that data, you need to avoid extra copies of the data between buffers. That means you want to avoid fstream or FILE abstractions (as they introduce an extra buffer that needs to be copied through), and avoid read/write type calls that copy stuff between kernel and user buffers.
Instead, on linux, you want to use mmap(2). On a 64-bit OS, just mmap the entire file into memory, use madvise(MADV_SEQUENTIAL) to tell the kernel you're going to be accessing it mostly sequentially, and have at it. For a 32-bit OS, you'll need to mmap in chunks, unmapping the previous chunk each time. Something much like your current structure, with each thread mmapping one fixed-size chunk at a time should work well.