ifstream.read only reads half the file - C++

I'm trying to make a simple image converter (PPM format to a custom one) and I'm having a problem with the ifstream.read method. Despite having this:
int rows, cols, maxV;
char header[100], *ptr;
std::ifstream im;
//open image in binary format
im.open(name.c_str(), std::ios::in | std::ios::binary);
if (!im)
{
    std::cout << "Can't read image!" << std::endl;
    exit(1);
}
//read the header of the image
im.getline(header, 3, '\n');
//test if header is P6
if ((header[0] != 80) || (header[1] != 54))
{
    std::cout << "Image" << name << "is not .ppm format" << std::endl;
}
//get next line for height and width
im.getline(header, 100, '\n');
//don't read the comments
while (header[0] == '#')
    im.getline(header, 100, '\n');
//number of columns, rows
cols = strtol(header, &ptr, 0);
rows = strtol(header, &ptr, 0);
maxV = strtol(header, &ptr, 0);
const int rows1 = rows;
const int cols1 = cols;
Component *tbuffer;
const_cast<Component*>(tbuffer);
tbuffer = new Component[rows1 * cols1 * 3];
im.read((char *)tbuffer, cols * rows * 3);
std::cout << tbuffer[3000000] << std::endl;
im.close();
It only reads 2,700,007 elements out of the 4,320,000 in the image I'm trying to read, so tbuffer[3000000] will "cout" NULL. Am I missing anything?
Edit: About component:
typedef unsigned char Component;
Edit2: The image is 1200*1200 (cols*rows).
2,700,007 is the last index of tbuffer with a value in it; the rest of tbuffer remains empty.

The PPM format that you read does not guarantee that the magic number P6 is followed by a newline, nor that the rest of the header is followed by a newline, nor that length, height and maxV are on the same line.
But the main problem that you have is
cols = strtol(header, &ptr, 0); // you start at the beginning of the header
rows = strtol(header, &ptr, 0); // you start again at the beginning of the header
maxV = strtol(header, &ptr, 0); // and a third time !!
So your rows and maxV might not be the values in the file. Regardless of the other changes mentioned above, you should rather use:
cols = strtol(header, &ptr, 0); // start at the beginning of the header
rows = strtol(ptr, &ptr, 0);    // continue after the first number
maxV = strtol(ptr, &ptr, 0);    // ...
But keep also in mind that you should not assume that the three are on the same line. And that there might be additional comments.
I propose the following utility function to skip whitespace and comments according to the PPM format:
ifstream& skipwcmt(ifstream& im) {
    int c;  // int, not char: the result of get() must be comparable to EOF
    do {
        while ((c = im.get()) != EOF && isspace(c)) ;
        if (isdigit(c))
            im.unget();
        else if (c == '#')
            while ((c = im.get()) != EOF && c != '\n' && c != '\r') ;
    } while (isspace(im.peek()));
    return im;
}
You can use this function for reading the header as here:
// ...
// check magic number
im.read(header, 2);
if ((header[0] != 'P') || (header[1] != '6'))
{
    std::cout << "Image " << name << " is not .ppm format" << std::endl;
    exit(1);
}
skipwcmt(im) >> cols;
skipwcmt(im) >> rows;
skipwcmt(im) >> maxV;
if (!isspace(im.get())) { // followed by exactly one whitespace !
    std::cout << "Image " << name << " has a corrupted header" << std::endl;
    exit(1);
}
// display the header to check the data
cout << "cols=" << cols << ", rows=" << rows << ", maxcol=" << maxV << endl;
Remark: I don't know if the files you have to read are guaranteed to have maxV <= 255. In theory you could have values up to 65535, in which case you'd need to read 2 bytes per color component instead of one.
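For that two-byte case, a minimal sketch might look like this; per the PPM specification, each sample is then stored most significant byte first (the helper name is mine, not from the code above):

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>

// Read one 16-bit color component, as PPM stores them when maxV > 255:
// most significant byte first, regardless of the host's endianness.
std::uint16_t readSample16(std::istream& im) {
    int hi = im.get();  // most significant byte
    int lo = im.get();  // least significant byte
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```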


How to add hex to char array?

I have been programming for a while but I am fairly new to C++. I am writing a program that takes an .exe, gets its hex, and stores it in an unsigned char array. I can take in the .exe and return its hex fine. My problem is that I am having trouble storing the hex in the correct format in the char array.
When I print the array it outputs the hex but I need to add 0x to the front.
Sample output: 04 5F 4B F4 C5 A5
Needed output: 0x04 0x5F 0x4B 0xF4 0xC5 0xA5
I am trying to use hexcode[i] = ("0x%.2X", (unsigned char)c); to store it correctly and it still only seems to return the last two chars without the 0x.
I have also tried hexcode[i] = '0x' + (unsigned char)c; and looked into functions like sprintf.
Can anyone help me get my desired output? Is it even possible?
Full program -
#include <iostream>

unsigned char hexcode[99999] = { 0 };

//Takes the exe's hex and places it into the unsigned char array
int hexcoder(std::string file) {
    FILE *sf; //executable file
    int i, c;
    sf = fopen(file.c_str(), "rb");
    if (sf == NULL) {
        fprintf(stderr, "Could not open file.", file.c_str());
        return 1;
    }
    for (i = 0;; i++) {
        if ((c = fgetc(sf)) == EOF) break;
        hexcode[i] = ("0x%.2X", (unsigned char)c);
        //Print for debug
        std::cout << std::hex << static_cast<int>(hexcode[i]) << ' ';
    }
}

int main()
{
    std::string file = "shuffle.exe"; // test exe to pass to get hex
    hexcoder(file);
    system("pause");
    return 0;
}
int main()
{
std::string file = "shuffle.exe"; // test exe to pass to get hex
hexcoder(file);
system("pause");
return 0;
}
I suppose you want to dump a file in hex format, so maybe the following code is what you are looking for.
Note that hexcode is changed to type char instead of unsigned char so that it can be handled as a string containing printable characters.
int hexcoder(std::string file) {
    FILE *sf; //executable file
    int i, c;
    sf = fopen(file.c_str(), "rb");
    if (sf == NULL) {
        fprintf(stderr, "Could not open file %s.", file.c_str());
        return 1;
    }
    char hexcode[10000];
    char *wptr = hexcode;
    for (i = 0;; i++) {
        if ((c = fgetc(sf)) == EOF) break;
        wptr += sprintf(wptr, "0x%02X ", c);
    }
    *wptr = 0;
    std::cout << hexcode;
    return 0;
}
BTW: for printing out a value in hex format one could as well use...
printf("0x%02X ", c)
or
std::cout << "0x" << std::hex << std::setw(2) << std::setfill('0') << std::uppercase << c << " ";
Note that the latter requires #include <iomanip>.
But - in order to not change the semantics of your code too much - I kept the hexcode string as the target.
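For completeness, here is a sketch of the same formatting done purely with iostreams, collected into a std::string (the helper name is made up for illustration):

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Format each byte as an "0xNN" token, mirroring the sprintf("0x%02X ") loop.
std::string toHexTokens(const std::string& bytes) {
    std::ostringstream out;
    for (unsigned char c : bytes)
        out << "0x" << std::hex << std::uppercase
            << std::setw(2) << std::setfill('0')
            << static_cast<int>(c) << ' ';
    return out.str();
}
```

Note that std::setw applies only to the next insertion, so it has to be restated for every byte, while std::hex, std::uppercase and the fill character are sticky.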

How to speed up counting the occurences of a word in large files?

I need to count the occurrences of the string "<page>" in a 104 GB file, to get the number of articles in a given Wikipedia dump. First, I tried this:
grep -F '<page>' enwiki-20141208-pages-meta-current.xml | uniq -c
However, grep crashes after a while. Therefore, I wrote the following program. However, it only processes 20 MB/s of the input file on my machine, which is about 5% of my HDD's throughput. How can I speed up this code?
#include <iostream>
#include <fstream>
#include <string>

int main()
{
    // Open up file
    std::ifstream in("enwiki-20141208-pages-meta-current.xml");
    if (!in.is_open()) {
        std::cout << "Could not open file." << std::endl;
        return 0;
    }
    // Statistics counters
    size_t chars = 0, pages = 0;
    // Token to look for
    const std::string token = "<page>";
    size_t token_length = token.length();
    // Read one char at a time
    size_t matching = 0;
    while (in.good()) {
        // Read one char at a time
        char current;
        in.read(&current, 1);
        if (in.eof())
            break;
        chars++;
        // Continue matching the token
        if (current == token[matching]) {
            matching++;
            // Reached full token
            if (matching == token_length) {
                pages++;
                matching = 0;
                // Print progress
                if (pages % 1000 == 0) {
                    std::cout << pages << " pages, ";
                    std::cout << (chars / 1024 / 1024) << " mb" << std::endl;
                }
            }
        }
        // Start over again
        else {
            matching = 0;
        }
    }
    // Print result
    std::cout << "Overall pages: " << pages << std::endl;
    // Cleanup
    in.close();
    return 0;
}
Assuming there are no insanely large lines in the file, using something like
for (std::string line; std::getline(in, line); ) {
    // find the number of "<page>" strings in line
}
is bound to be a lot faster! Reading each character as a string of one character is about the worst thing you can possibly do. It is really hard to get any slower. For each character, the stream will do something like this:
Check if there is a tie()'d stream which needs flushing (there isn't, i.e., that check is pointless).
Check if the stream is in good shape (except when it has reached the end, it is, but this check can't be omitted entirely).
Call xsgetn() on the stream's stream buffer.
This function first checks if there is another character in the buffer (that's similar to the EOF check but different; in any case, doing the EOF check only once the buffer is empty removes a lot of the EOF checks).
Transfer the character to the read buffer.
Have the stream check whether it read all (1) characters and set the stream flags as needed.
There is a lot of waste in there!
I can't really imagine why grep would fail unless some line blows massively past the expected maximum line length. Although the use of std::getline() and std::string() is likely to have a much bigger upper bound, it is still inefficient to process huge lines. If the file may contain massive lines, it may be more reasonable to use something along the lines of this:
for (std::istreambuf_iterator<char> it(in), end;
     (it = std::find(it, end, '<')) != end; ) {
    // match "<page>" at the start of the sequence [it, end)
}
For a bad implementation of streams that's still doing too much. Good implementations will do the calls to std::find(...) very efficiently and will probably check multiple characters at once, adding a check and loop only for something like every 16th loop iteration. I'd expect the above code to turn your CPU-bound implementation into an I/O-bound one. A bad implementation may still be CPU-bound, but it should still be a lot better.
In any case, remember to enable optimizations!
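To make that sketch concrete, here is one compilable version reading from any std::istream (the function name is mine; since the iterator is single-pass, this simple matcher is correct for tokens like "<page>" whose first character does not recur inside the token):

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Count non-overlapping occurrences of token by jumping to its first
// character with std::find and then comparing the rest in place.
// Assumes token is non-empty and token[0] does not occur again in token.
std::size_t countToken(std::istream& in, const std::string& token) {
    std::istreambuf_iterator<char> it(in), end;
    std::size_t count = 0;
    while ((it = std::find(it, end, token[0])) != end) {
        ++it;                       // consume the first character
        std::size_t matched = 1;
        while (matched < token.size() && it != end && *it == token[matched]) {
            ++it;
            ++matched;
        }
        if (matched == token.size())
            ++count;
    }
    return count;
}
```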
I'm using this file to test with: http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current1.xml-p000000010p000010000.bz2
It takes roughly 2.4 seconds versus 11.5 using your code. The total character count is slightly different because newlines are not counted, but I assume that's acceptable since it's only used to display progress.
void parseByLine()
{
    // Open up file
    std::ifstream in("enwiki-latest-pages-meta-current1.xml-p000000010p000010000");
    if (!in)
    {
        std::cout << "Could not open file." << std::endl;
        return;
    }
    size_t chars = 0;
    size_t pages = 0;
    const std::string token = "<page>";
    std::string line;
    while (std::getline(in, line))
    {
        chars += line.size();
        size_t pos = 0;
        for (;;)
        {
            pos = line.find(token, pos);
            if (pos == std::string::npos)
            {
                break;
            }
            pos += token.size();
            if (++pages % 1000 == 0)
            {
                std::cout << pages << " pages, ";
                std::cout << (chars / 1024 / 1024) << " mb" << std::endl;
            }
        }
    }
    // Print result
    std::cout << "Overall pages: " << pages << std::endl;
}
Here's an example that adds each line to a buffer and then processes the buffer when it reaches a threshold. It takes 2 seconds versus ~2.4 for the first version. I played with several different thresholds for the buffer size, and also with processing after a fixed number (16, 32, 64, 4096) of lines, and it all seems about the same as long as some batching is going on. Thanks to Dietmar for the idea.
int processBuffer(const std::string& buffer)
{
    static const std::string token = "<page>";
    int pages = 0;
    size_t pos = 0;
    for (;;)
    {
        pos = buffer.find(token, pos);
        if (pos == std::string::npos)
        {
            break;
        }
        pos += token.size();
        ++pages;
    }
    return pages;
}

void parseByMB()
{
    // Open up file
    std::ifstream in("enwiki-latest-pages-meta-current1.xml-p000000010p000010000");
    if (!in)
    {
        std::cout << "Could not open file." << std::endl;
        return;
    }
    const size_t BUFFER_THRESHOLD = 16 * 1024 * 1024;
    std::string buffer;
    buffer.reserve(BUFFER_THRESHOLD);
    size_t pages = 0;
    size_t chars = 0;
    size_t progressCount = 0;
    std::string line;
    while (std::getline(in, line))
    {
        buffer += line;
        if (buffer.size() > BUFFER_THRESHOLD)
        {
            pages += processBuffer(buffer);
            chars += buffer.size();
            buffer.clear();
        }
        if ((pages / 1000) > progressCount)
        {
            ++progressCount;
            std::cout << pages << " pages, ";
            std::cout << (chars / 1024 / 1024) << " mb" << std::endl;
        }
    }
    if (!buffer.empty())
    {
        pages += processBuffer(buffer);
        chars += buffer.size();
        std::cout << pages << " pages, ";
        std::cout << (chars / 1024 / 1024) << " mb" << std::endl;
    }
}
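Going one step further, getline() can be dropped entirely in favor of fixed-size block reads; the only subtlety is carrying over a short tail between reads so a token split across two blocks is still counted. A sketch under the same assumptions (again safe for "<page>", which cannot overlap itself):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Count non-overlapping occurrences of token, reading chunkSize bytes at a
// time and keeping the last token.size()-1 bytes between reads so a match
// straddling two chunks is not lost.
std::size_t countTokenChunked(std::istream& in, const std::string& token,
                              std::size_t chunkSize = 1 << 20)
{
    std::string buffer;
    std::string chunk(chunkSize, '\0');
    std::size_t count = 0;
    while (in.read(&chunk[0], static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0) {
        buffer.append(chunk, 0, static_cast<std::size_t>(in.gcount()));
        std::size_t pos = 0;
        while ((pos = buffer.find(token, pos)) != std::string::npos) {
            ++count;
            pos += token.size();
        }
        // keep a tail shorter than the token so a split match still counts
        if (buffer.size() >= token.size())
            buffer.erase(0, buffer.size() - (token.size() - 1));
    }
    return count;
}
```

The loop condition handles the final short read: when read() fails at EOF, gcount() still reports the bytes that were transferred, and the next iteration's failed read leaves gcount() at zero, ending the loop.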

missing data in popen call

My program compiles without error and appears to run through all of the steps correctly. It is supposed to make a PHP call and return data. tcpdump shows the request going out, so popen is being executed, but the receiving party never updates.
The only discrepancy I can find, is that the command variable appears to be missing data.
# .trol.o
market max price is 0.00638671 at position 0
php coin.php 155 0.006387
0.00638672
the second line in the output is the command I am sending to popen
cout << command << endl; -> php coin.php 155 0.006387
That number is supposed to be the same as the one under it, 0.00638672.
The digits 6 and 2 have been chopped off somehow.
How do I get the correct data into my popen command?
code:
void mngr() {
    //vector defs
    vector<std::string> buydat;
    vector<std::string> markdat;
    vector<std::string> pricedat;
    vector<std::string> purchaseid;
    vector<double> doublePdat;
    vector<double> doubleMdat;
    doublePdat.reserve(pricedat.size());
    doubleMdat.reserve(markdat.size());
    char buybuff[BUFSIZ];
    char command[70];
    char sendbuy[12];
    buydat = getmyData();
    markdat = getmarketbuyData();
    //string match "Buy" and send results to new vector with pricedat.push_back()
    for (int b = 2; b < buydat.size(); b += 7) {
        if (buydat[b] == "Buy") {
            pricedat.push_back(buydat[b+1]);
        }
    }
    transform(pricedat.begin(), pricedat.end(), back_inserter(doublePdat), [](string const& val) {return stod(val);});
    transform(markdat.begin(), markdat.end(), back_inserter(doubleMdat), [](string const& val) {return stod(val);});
    auto biggestMy = std::max_element(std::begin(doublePdat), std::end(doublePdat));
    std::cout << "my max price is " << *biggestMy << " at position " << std::distance(std::begin(doublePdat), biggestMy) << std::endl;
    auto biggestMark = std::max_element(std::begin(doubleMdat), std::end(doubleMdat));
    std::cout << "market max price is " << *biggestMark << " at position " << std::distance(std::begin(doubleMdat), biggestMark) << std::endl;
    if (biggestMy > biggestMark) {
        cout << "Biggest is Mine!" << endl;
    }
    else if (biggestMy < biggestMark) {
        //cout << "Biggest is market!";
        *biggestMark += 0.00000001;
        sprintf(sendbuy, "%f", *biggestMark);
        sprintf(command, "php coin.php 155 %s", sendbuy);
        FILE *markbuy = popen(command, "r");
        if (markbuy == NULL) perror("Error opening file");
        while (fgets(buybuff, sizeof(buybuff), markbuy) != NULL) {
            size_t h = strlen(buybuff);
            //strip the '\n' left by fgets
            if (h && buybuff[h - 1] == '\n') buybuff[h - 1] = '\0';
            if (buybuff[0] != '\0') purchaseid.push_back(buybuff);
        }
        cout << command << endl;
        cout << *biggestMark << endl;
    }
}
I would try using the long float format instead of float, since biggestMark is an iterator over doubles, so *biggestMark should be formatted as a double. That is, try changing sprintf(sendbuy, "%f", *biggestMark); to sprintf(sendbuy, "%lf", *biggestMark);. Hope this helps.
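Independently of the length modifier, note that printf-style "%f" rounds to six digits after the decimal point by default, which is exactly how 0.00638672 becomes 0.006387; an explicit precision keeps the trailing digits (the .8 precision and the helper below are illustrative, not from the original code):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a price with an explicit precision; a plain "%f" would round the
// value to six decimal places before it ever reaches the popen() command.
std::string formatPrice(double v) {
    char buf[32];                          // ample room for sign, digits, NUL
    std::snprintf(buf, sizeof buf, "%.8f", v);
    return buf;
}
```

snprintf is also safer than sprintf here, since it cannot overrun the small sendbuy buffer.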

Binary reader not triggering eof bit when reading exact number of bytes

I am writing images to a binary file using this code:
std::ofstream edgefile("C:\\****\\edge.bin", std::ofstream::binary | std::ofstream::app | std::ofstream::out);
Mat edges;
Canny(bilat, edges, cthr1, cthr2, 3); //Canny sliders
if (writeedge) {
    int rows = edges.rows;
    int cols = edges.cols;
    edgefile.write(reinterpret_cast<const char *>(&rows), sizeof(int));
    edgefile.write(reinterpret_cast<const char *>(&cols), sizeof(int));
    edgefile.write(reinterpret_cast<char*>(edges.data), edges.rows*edges.cols*sizeof(uchar));
    cout << "writen r:" << rows << "C: " << cols << "Bytes: " << edges.rows*edges.cols*sizeof(uchar) << endl;
}
And then reading the same images with this:
std::ifstream infile;

int main(int argc, char* argv[])
{
    int * ptr;
    ptr = new int;
    int rows;
    int cols;
    infile.open("C:\\****\\edge.bin", std::ofstream::binary | std::ofstream::app | std::ofstream::in);
    while (!infile.eof())
    {
        infile.read(reinterpret_cast<char*>(ptr), sizeof(int));
        rows = *ptr;
        infile.read(reinterpret_cast<char*>(ptr), sizeof(int));
        cols = *ptr;
        Mat ed(rows, cols, CV_8UC1, Scalar::all(0));
        infile.read(reinterpret_cast<char*>(ed.data), rows * cols * (sizeof uchar));
        cout << "writen r: " << rows << " C: " << cols << " Bytes: " << rows * cols * (sizeof uchar) << endl;
        imshow("God Knows", ed);
        cvWaitKey();
    }
    infile.close();
    return 0;
}
The images are read accurately; however, the EOF bit is not triggered at the end, so the last ptr value is reused and one extra blank image is read before the cycle ends. How can I check whether the next byte is EOF without resetting the current read position?
(I know that reading one more byte would trigger the EOF bit.)
The EOF bit is set after you try to read past the end of the file; that's just how streams work.
You can easily restructure the main loop to check the stream state after the first read. This works because the return value of read is a reference to the stream, and converting that reference to bool tells you whether the stream is still in a good state (i.e., no EOF).
while (infile.read(reinterpret_cast<char*>(ptr), sizeof(int)))
{
    // ...
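Put together, the whole loop might look like the sketch below; a std::vector stands in for the OpenCV Mat so the example stays self-contained, and the record layout (rows, cols, then rows*cols bytes) matches the writer above:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Read (rows, cols, pixels) records until the first read fails at EOF.
// Using read() as the loop condition stops cleanly after the last record.
int readImages(std::istream& infile) {
    int images = 0;
    int rows = 0, cols = 0;
    while (infile.read(reinterpret_cast<char*>(&rows), sizeof rows)) {
        if (!infile.read(reinterpret_cast<char*>(&cols), sizeof cols))
            break;                                // truncated record
        std::vector<unsigned char> data(static_cast<std::size_t>(rows) * cols);
        if (!infile.read(reinterpret_cast<char*>(data.data()),
                         static_cast<std::streamsize>(data.size())))
            break;                                // truncated pixel block
        ++images;                                 // e.g. hand `data` to imshow here
    }
    return images;
}
```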

string.find function returning odd numbers

I am trying to find the position at which a character was found.
const char* normalize(std::string path)
{
    std::cout << "executed " << path << std::endl;
    //"foo//\\\bar////bar2///../.\bar2" -- foo/bar/bar2
    std::size_t found;
    std::size_t found2;
    std::size_t curchar = 0;
    std::string final;
    std::string buffer;
    bool notdone = true;
    while (notdone) {
        //std::cout << "loop" << std::endl;
        //find the current element
        // can be / or \
        found = path.find("/", curchar);
        found2 = path.find("\\", curchar);
        std::cout << found << std::endl;
        SDL_Delay(2000);
        if (found != std::string::npos && found2 != std::string::npos) {
            if (found < found2) {
                //read from the curchar to the slash
                if (curchar-found > 1) {
                    buffer = path.substr(curchar, found-curchar-1);
                    //add to path
                    final = final + "/" + buffer;
                }
                curchar = found+1;
                //buffer will be the file/component
            } else {
                if (curchar-found2 > 1) {
                    buffer = path.substr(curchar, found2-curchar-1);
                    //add to path
                    final = final + "/" + buffer;
                }
                curchar = found2+1;
            }
        } else if (found != std::string::npos) {
            //std::cout << "loop2" << found == std::string::npos << std::endl;
            //std::cout << "loop2 " << path.substr(curchar, 1) << std::endl;
            if (curchar-found > 1) {//
                buffer = path.substr(curchar, found-curchar-1);
                //add to path
                final = final + "/" + buffer;
            }
            curchar = found+1;
        } else if (found2 != std::string::npos) {
            std::cout << "loop3" << std::endl;
            if (curchar-found2 > 1) {
                buffer = path.substr(curchar, found2-curchar-1);
                //add to path
                final = final + "/" + buffer;
            }
            curchar = found2+1;
        } else {
            std::cout << "finishing" << std::endl;
            final = final + "/" + path.substr(curchar, path.size()-curchar);
            notdone = false;
        }
    }
    return final.c_str();
}

normalize("test/");
normalize("test/");
This code should print out '4', but it instead prints 18, in an infinite loop. However, if I use std::cout << path.find("/", curchar) << std::endl it does print 4. At first I thought it wasn't actually returning std::size_t, but I checked and it was.
The following lines are creating the problem:
//find the current element
// can be / or \
found = path.find("/", curchar);
I ran this on my Linux terminal, and GCC treated the next line as a continuation of the comment on the line above:
basic.cpp:18:9: warning: multi-line comment [-Wcomment]
// can be / or \
^
basic.cpp: In function ‘const char* normalize(std::string)’:
basic.cpp:21:22: warning: ‘found’ may be used uninitialized in this function [-Wmaybe-uninitialized]
std::cout << found << std::endl;
^
Due to this comment style, your next line of code was treated as a comment. Since found was not initialized, it held a garbage value, which screwed up your logic: the code never reached the branch where you reset the notdone flag.
However, GCC (or any other compiler) gives a warning about the use of an uninitialized variable, and reading the warnings carefully would have let us backtrack to the problem.
The solution is to change the comment style:
/* // can be / or \ */
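The hazard is easy to reproduce in isolation: line splicing (translation phase 2) happens before comments are removed, so a backslash at the very end of a // comment swallows the next source line. A minimal demo (the function name is made up):

```cpp
#include <cassert>

// A backslash at the end of a // comment splices the following source line
// into the comment, so the assignment below is never compiled.
int splicedValue() {
    int x = 1;
    // this comment ends with a backslash \
    x = 2;  // spliced into the comment above: never executed
    return x;
}
```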