Processing a large image text file - c++

I'm using C++ and have a 1234 by 1234 text file with values from 0 to 255. I have been trying to speed up my code because it's used in real time with the user. Right now it takes 0.5 seconds to run, with 0.4 seconds spent reading the text file into a vector<vector<int>>. I am using getline and then istringstream. Below is the code I'm currently using. It also drops the first and last 50 columns, and it puts the first chunk of rows into one vector and the second chunk into another, because that's how I need the data for processing.
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
using namespace std;

// Rows [start, split) go to rawArrayTop and rows [split, finish) to rawArrayBottom
// (row 0 is the first line of the file). The first and last 50 columns of each row are dropped.
void readInRawData(const string &fileName, int start, int split, int finish,
                   vector< vector <int> > &rawArrayTop, vector< vector <int> > &rawArrayBottom)
{
    ifstream rawImage(fileName); //open file using fileName
    //if it can't be opened (or is empty) throw an error
    if (!rawImage.is_open() || rawImage.peek() == ifstream::traits_type::eof())
        throw runtime_error("cannot read " + fileName);

    string line;
    int value = 0;
    int length = 0;  //number of columns, counted on the first row
    int counter = 0; //index of the current row
    while (counter < finish && getline(rawImage, line)) //get row
    {
        if (counter == 0)
        {
            istringstream countStream(line);
            while (countStream >> value) //clump into values between spaces
                length++;
        }
        if (counter >= start)
        {
            istringstream ss(line);
            vector<int> rawRow;
            for (int i = 0; i < 50; i++)           //skip the first 50 columns
                ss >> value;
            for (int i = 0; i < length - 100; i++) //keep everything except the last 50
            {
                ss >> value;
                rawRow.push_back(value);
            }
            if (counter < split)
                rawArrayTop.push_back(rawRow);
            else
                rawArrayBottom.push_back(rawRow);
        }
        counter++;
    }
}

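To get a real increase in performance, you'll have to rewrite it completely and parse the characters yourself:
#include <cctype>
#include <cstdio>
#include <cstdlib>
// fileName (a const char *), width and height are assumed to be known
FILE *fp = fopen(fileName, "r");
unsigned char *pixels = (unsigned char *) malloc(width * height); // one byte per value
unsigned char *out = pixels;                                       // write cursor
int ch, sample = 0, onsample = 0;
while((ch = fgetc(fp)) != EOF)
{
    if(isdigit(ch))
    {
        sample = sample * 10 + ch - '0'; // accumulate the current number
        onsample = 1;
    }
    else
    {
        if(onsample)                     // a separator ends the number
        {
            *out++ = sample;
            sample = 0;
            onsample = 0;
        }
    }
}
if(onsample)                             // the last number may not be followed by a separator
    *out++ = sample;
fclose(fp);
out walks through a buffer set up with malloc(width * height); one byte per value is enough because every value is between 0 and 255. Now it should zip through the file almost as fast as it can read it.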
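The same loop can also run over an in-memory buffer. The following is a sketch, assuming C++11: it copies the whole file with a single rdbuf() read (a swap-in for fgetc, not what the answer uses) and then applies the same digit-accumulating parse. readWholeFile and parsePixels are hypothetical names, and every value is assumed to fit in one byte.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: load the entire text file into one string in a single read.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: the same digit-accumulating loop, run over a buffer.
// Any non-digit character acts as a separator; values are assumed to be 0..255.
std::vector<unsigned char> parsePixels(const std::string& text)
{
    std::vector<unsigned char> pixels;
    // upper bound on the value count: each value is a digit plus (except maybe the last) a separator
    pixels.reserve((text.size() + 1) / 2);
    int sample = 0;
    bool onSample = false;
    for (char ch : text)
    {
        if (std::isdigit(static_cast<unsigned char>(ch)))
        {
            sample = sample * 10 + (ch - '0');
            onSample = true;
        }
        else if (onSample)
        {
            pixels.push_back(static_cast<unsigned char>(sample));
            sample = 0;
            onSample = false;
        }
    }
    if (onSample) // last value with no trailing separator
        pixels.push_back(static_cast<unsigned char>(sample));
    return pixels;
}
```

Usage would be along the lines of parsePixels(readWholeFile("image.txt")) (the file name is a placeholder). Pixel (r, c) of a 1234-wide image is then pixels[r * 1234 + c], so dropping the 50-column borders or splitting the rows into top and bottom halves becomes index arithmetic instead of copying. Whether this beats fgetc depends on the platform, so it is worth timing both.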

I will not give you the code, but I will suggest how to proceed here:
Parsing text takes a long time. If real-time performance is important, pre-process the file into a binary format once, since binary data can be loaded directly with read/write functions. You will need to open the binary file as a stream in binary mode (std::ios::binary) and use istream::read.
Try to avoid vector<vector<int>> unless you use scoped allocators, which I assume you are not using. Every row is a separate heap allocation, which is bad for the cache. A single vector with n * m elements is a much better fit.
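If you need two-dimensional access, you can just write a function for that:
using Matrix = vector<int>;
// row-major: element (row, col) is stored at row * cols + col
int & idx(Matrix & mat, size_t cols, size_t row, size_t col) { return mat[row * cols + col]; }
Matrix mat(m * n);      // m rows, n columns, in one contiguous block
idx(mat, n, 2, 3) = 17;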
Another concern is how you load the data into the Matrix. If you want to size the vector up front and read the data straight into it without first initializing every element, that is not possible with the STL vector, but you can use Boost.Container: its vector has a resize overload (and constructors) taking boost::container::default_init_t, which leave the elements default-initialized instead of zeroing them.
If the values are between 0 and 255, use unsigned char, not int (plain char may be signed, so values above 127 would come out negative). You will fit more data in the cache at once.
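A minimal sketch combining these suggestions: one flat vector<unsigned char> indexed row-major, plus a one-off conversion to a binary stream that can later be loaded with a single istream::read. The Image struct, the function names and the header layout (two 32-bit sizes, then the pixels) are assumptions for illustration, not a fixed format. The header is written in the machine's native byte order, so such a file is only portable between machines with the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical image type: one contiguous block, row-major, one byte per pixel.
struct Image
{
    std::uint32_t rows = 0, cols = 0;
    std::vector<unsigned char> pixels;

    unsigned char& at(std::size_t r, std::size_t c) { return pixels[r * cols + c]; }
    unsigned char at(std::size_t r, std::size_t c) const { return pixels[r * cols + c]; }
};

// Hypothetical layout: rows, cols (native-endian uint32), then rows*cols bytes.
void writeBinary(std::ostream& out, const Image& img)
{
    out.write(reinterpret_cast<const char*>(&img.rows), sizeof img.rows);
    out.write(reinterpret_cast<const char*>(&img.cols), sizeof img.cols);
    out.write(reinterpret_cast<const char*>(img.pixels.data()),
              static_cast<std::streamsize>(img.pixels.size()));
}

Image readBinary(std::istream& in)
{
    Image img;
    in.read(reinterpret_cast<char*>(&img.rows), sizeof img.rows);
    in.read(reinterpret_cast<char*>(&img.cols), sizeof img.cols);
    if (!in)
        throw std::runtime_error("truncated header");
    img.pixels.resize(static_cast<std::size_t>(img.rows) * img.cols); // zero-fills once
    in.read(reinterpret_cast<char*>(img.pixels.data()),
            static_cast<std::streamsize>(img.pixels.size()));         // one bulk read
    if (!in)
        throw std::runtime_error("truncated pixel data");
    return img;
}
```

In practice the streams would be std::ofstream / std::ifstream opened with std::ios::binary; a std::stringstream works the same way, which makes the pair easy to test. The text file is parsed once, offline, and the real-time path only does readBinary.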

Related

Reading Column Specific Data in C++ [duplicate]

I have a text file with values that I want to put into a 2D vector.
I can do it with arrays, but I don't know how to do it with vectors.
The vector should be indexed like vector2D[nColumns][nLines], but I don't know its size in advance. At most, I can get the number of columns from the text file, but not the number of lines.
The number of columns can also differ from one .txt file to another.
.txt example:
189.53 -1.6700 58.550 33.780 58.867
190.13 -3.4700 56.970 42.190 75.546
190.73 -1.3000 62.360 34.640 56.456
191.33 -1.7600 54.770 35.250 65.470
191.93 -8.7500 58.410 33.900 63.505
With arrays, I do it like this:
//------ Declares Array for values ------//
const int nCol = countCols; // read from file
float values[nCol][nLin];
// Fill Array with '-1'
for (int c = 0; c < nCol; c++) {
    for (int l = 0; l < nLin; l++) {
        values[c][l] = -1;
    }
}
// reads file to end of *file*, not line
while (!inFile.eof()) {
    for (int y = 0; y < nLin; y++) {
        for (int i = 0; i < nCol; i++) {
            inFile >> values[i][y];
        }
    }
}
Instead of using
float values[nCol][nLin];
use
std::vector<std::vector<float>> v;
You have to #include<vector> for this.
Now you don't need to worry about size.
Adding elements is as simple as
std::vector<float> f; f.push_back(7.5); v.push_back(f);
Also, do not loop on .eof(): the flag is only set after a read has already tried to go past the end of the file, so the loop runs one extra time with a failed read.
while(!inFile.eof())
Should be
while (inFile >> values[i][y]) // true as long as a value was successfully read into values[i][y]
NOTE: Instead of vector, you can also use std::array, which is apparently the best thing since sliced bread, but only when the size is known at compile time.
My suggestion:
const int nCol = countCols; // read from file
std::vector<std::vector<float>> values; // your entire data-set of values
std::vector<float> line(nCol, -1.0f);   // one line of nCol values
// reads file to end of *file*, not line
bool done = false;
while (!done)
{
    int i = 0;
    for (; i < nCol; i++)
    {
        if (!(inFile >> line[i])) { done = true; break; } // stop at end of file
    }
    if (i > 0) // keep the line only if at least one value was read
    {
        std::fill(line.begin() + i, line.end(), -1.0f); // pad a short last line with -1
        values.push_back(line);
    }
}
Now your dataset has:
values.size() // number of lines
and can be addressed with array notation too (besides using iterators):
float v = values[i][j];
Note: a last line with fewer than nCol values is padded with -1, and no extra line is added when the file ends right after a complete line. std::fill needs #include <algorithm>.
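A sketch of a getline-based alternative that needs neither the column count nor the line count in advance: each text line becomes one row with however many values it holds, and a second helper transposes the rows into the values[column][line] layout the question asks for. readRows and toColumns are hypothetical names; padding short rows with -1 mirrors the -1 fill in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one inner vector per non-empty line of the stream.
std::vector<std::vector<float>> readRows(std::istream& in)
{
    std::vector<std::vector<float>> rows;
    std::string line;
    while (std::getline(in, line)) // loop on the read itself, not on eof()
    {
        std::istringstream fields(line);
        std::vector<float> row;
        float value;
        while (fields >> value)
            row.push_back(value);
        if (!row.empty()) // skip blank lines, e.g. a trailing newline
            rows.push_back(row);
    }
    return rows;
}

// Hypothetical helper: turn rows[line][column] into columns[column][line],
// padding the missing cells of short rows with -1.
std::vector<std::vector<float>> toColumns(const std::vector<std::vector<float>>& rows)
{
    std::size_t nCol = 0;
    for (const auto& row : rows)
        nCol = std::max(nCol, row.size());
    std::vector<std::vector<float>> columns(nCol, std::vector<float>(rows.size(), -1.0f));
    for (std::size_t l = 0; l < rows.size(); ++l)
        for (std::size_t c = 0; c < rows[l].size(); ++c)
            columns[c][l] = rows[l][c];
    return columns;
}
```

With an std::ifstream in place of the stream, toColumns(readRows(inFile)) gives a structure where columns.size() is the number of columns and columns[c].size() the number of lines.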

User input to matrix in C++

I have trouble reading input from the user and converting it into a matrix for calculation. For example, with the input = {1 2 3 / 4 5 6}, the program should read in the matrix in the form of
1 2 3
4 5 6
which has 3 cols and 2 rows. What I've got so far does not seem to work:
input.replace(input.begin(), input.end(), '/', ' ');
stringstream ss(input);
string token;
while (getline(ss, token, ' '))
{
for (int i = 0; i < row; i++)
{
for (int j = 0; j < col; j++)
{
int tok = atoi(token.c_str());
(*matrix).setElement(i, j, tok);
}
}
}
So what I'm trying to do is break the input into tokens and store them in the matrix using the setElement function, which takes the row number, the column number and the value the user wants to store. The problem with this code is that tok never seems to change and stays stuck at 0. Assume that row and col are known.
Thanks so much for any help.
Although many simple ways exist to solve the specific problem (and other answers have various good suggestions), let me try to give a more general view of the problem of "formatted input".
There are essentially three kinds of problems here:
at the low level you have to do a string-to-number conversion
at a higher level you have to parse a composite format (recognizing row and line separation)
finally, you also have to work out the size of the compound (how many rows and cols?)
These three things are not fully independent, and the last one is needed to know how to store the elements (how do you size the matrix?).
Finally, there is a fourth problem, spread across the other three: what to do if the input is "wrong".
This kind of problem is typically approached in two opposite ways:
1. Read the data as it comes, check whether it matches the format, and dynamically grow the data structure that has to contain it, or...
2. Read all the data at once as it is (in textual form), then analyze the text to figure out how many elements it has, then isolate the "chunks" and do the conversions.
Point 2 requires good string manipulation, but also the ability to know how long the input is (what happens if one of the separating spaces is a newline? The idea that everything is obtained with a single getline fails in that case).
Point 1 requires a Matrix class that can grow as you read, or a temporary dynamic structure (like a std container) in which you can place what you read before sending it to its final place.
Since I don't know how your matrix works, let me keep a temporary vector and counters to store lines.
#include <vector>
#include <iostream>
#include <cassert>
class readmatrix
{
std::vector<int> data; //storage
size_t rows, cols; //the counted rows and columns
size_t col; //the counting cols in a current row
Matrix& mtx; //refer to the matrix that has to be read
public:
// just keep the reference to the destination
readmatrix(Matrix& m) :data(), rows(), cols(), col(), mtx(m)
{}
// make this class a istream-->istream functor and let it be usable as a stream
// manipulator: input >> readmatrix(yourmatrix)
std::istream& operator()(std::istream& s)
{
if(s) //if we can read
{
char c=0;
s >> c; //trim spaces and get a char.
if(c!='{') //not an open brace
{ s.setstate(s.failbit); return s; } //report the format failure
while(s) //loop on rows (will break when the '}' will be found)
{
col=0;
while(s) //loop on cols (will break when the '/' or '}' will be found)
{
c=0; s >> c;
if(c=='/' || c=='}') //row finished?
{
if(!cols) cols=col; //got first row length
else if(cols != col) //it appears rows have different length
{ s.setstate(s.failbit); return s; } //report the format failure
if(c!='/') s.unget(); //push the char back for later
break; //row finished
}
s.unget(); //push the "not /" char back
int x; s >> x; //get an integer
if(!s) return s; //failed to read an integer!
++col; data.push_back(x); //save the read data
}
++rows; //got an entire row
c=0; s >> c;
if(c == '}') break; //finished the rows
else s.unget(); //push back the char: next row begin
}
}
//now, if read was successful,
// we can dispatch the data into the final destination
if(s)
{
mtx.setsize(rows,cols); // I assume you can set the matrix size this way
auto it = data.begin(); //will scan the inner vector
for(size_t r=0; r<rows; ++r) for(size_t c=0; c<cols; ++c, ++it)
mtx(r,c) = *it; //place the data
assert(it == data.end()); //this must be true if counting have gone right
}
return s;
}
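};
// the standard manipulator overload of operator>> only accepts plain function pointers,
// so this overload is what makes "input >> readmatrix(matrix)" call the functor
std::istream& operator>>(std::istream& s, readmatrix r) { return r(s); }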
Now you can read the matrix as
input >> readmatrix(matrix);
You will notice at this point that there are certain recurring patterns in the code: this is typical in one-pass parsers, and those patterns can be grouped to form sub-parsers. If you do it generically, you will in fact rewrite boost::spirit.
Of course, some adaptation can be done depending on how your matrix works (does it have a fixed size?) or on what to do if row sizes don't match (partial column filling?).
You can even add a formatted input operator like
std::istream& operator>>(std::istream& s, Matrix& m)
{ return s >> readmatrix(m); }
so that you can just do
input >> matrix;
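For comparison with the one-pass reader above, a sketch of the second approach (read all the text first, then split it into chunks and convert). It assumes the whole matrix, braces included, is already in one string, and it stores the result in a flat row-major vector instead of your Matrix class; ParsedMatrix and parseMatrixText are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: row-major values plus the detected size.
struct ParsedMatrix
{
    std::size_t rows = 0, cols = 0;
    std::vector<int> data;
};

// Parses text such as "{1 2 3 / 4 5 6}". Returns false on any format error
// and leaves `out` untouched in that case.
bool parseMatrixText(const std::string& text, ParsedMatrix& out)
{
    std::size_t open = text.find('{');
    std::size_t close = text.rfind('}');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return false;
    std::string body = text.substr(open + 1, close - open - 1);

    ParsedMatrix result;
    std::istringstream rowStream(body);
    std::string rowText;
    while (std::getline(rowStream, rowText, '/')) // one chunk per row
    {
        std::istringstream cells(rowText);
        std::size_t count = 0;
        int x;
        while (cells >> x)
        {
            result.data.push_back(x);
            ++count;
        }
        if (!cells.eof())        // stopped on something that is not an integer
            return false;
        if (result.rows == 0)
            result.cols = count; // the first row fixes the column count
        else if (count != result.cols)
            return false;        // rows of different length
        ++result.rows;
    }
    out = result;
    return true;
}
```

Newlines inside the braces are just whitespace here, so the "getline fails on a newline" issue only moves to the step that collects the text. Once parsing succeeds, result.data can be copied into the real matrix with the same double loop used in readmatrix.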
You are trying to set every cell of the matrix for each token read from the input!
You have to take one token for each cell, not set every cell for each token.
Splitting a string into tokens can be done with the following function.
The setElement call is commented out because your matrix class isn't available here; put it back in your own code.
Try the following:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Splits str at every delimiter; the pieces (possibly empty) are appended to result.
void split(const string& str, char delimiter, vector<string>& result) {
    string::size_type start = 0;
    string::size_type delimOcc = str.find(delimiter);
    while (delimOcc != string::npos) {
        result.push_back(str.substr(start, delimOcc - start));
        start = delimOcc + 1;
        delimOcc = str.find(delimiter, start);
    }
    result.push_back(str.substr(start)); // last piece, or the whole string if no delimiter
}
int main()
{
    std::string input = "1 2 3 / 4 5 6";
    vector<string> rows;
    split(input, '/', rows);
    for (size_t i = 0; i < rows.size(); i++) {
        vector<string> cols;
        split(rows[i], ' ', cols);
        size_t j = 0; // matrix column index, advanced only for real tokens
        for (size_t k = 0; k < cols.size(); k++) {
            if (cols[k].empty()) continue; // skip empty pieces left by extra spaces
            int tok = stoi(cols[k]);
            // (*matrix).setElement(i, j, tok);
            cout << tok << " - " << i << " - " << j << endl;
            j++;
        }
    }
    return 0;
}
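If you know the size of the matrix beforehand, you actually don't need getline; you can read int by int. Note also that input.replace(input.begin(), input.end(), '/', ' ') does not substitute characters: that std::string::replace overload takes a count and a character, so it replaces the whole string with 47 spaces ('/' is 47 in ASCII), every token you read is empty, and atoi keeps returning 0. Use std::replace from <algorithm> instead. (untested code)
std::replace(input.begin(), input.end(), '/', '\n'); // '/' becomes whitespace
stringstream ss(input);
for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        int tok;
        ss >> tok;
        (*matrix).setElement(i, j, tok);
    }
}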

Data parsing from text file

I have run into an issue parsing values from a text file. I need to add up the values of each specific event over all days and find their average. For example, (290+370+346+325+325)/5 and (5+5+5+12)/4, based on the data in the text file.
A sample is listed below
Each line has the form: First event:Second event:Third event...:Total number of events:
Every new line is a new day.
3:290:61:148:2:5:
2:370:50:173:4:5:
5:346:87:131:4:
3:325:60:145:5:5:
3:325:60:145:5:12:13:7:
I have tried to do it myself, but I have only managed to store each column in a string array. Sample code is below. I would appreciate any help, thanks!
void IDS::parseBase() {
string temp = "";
int counting = 0;
int maxEvent = 0;
int noOfLines = 0;
vector<string> baseVector;
ifstream readBaseFile("Base-Data.txt");
ifstream readBaseFileAgain("Base-Data.txt");
while (getline(readBaseFile, temp)) {
baseVector.push_back(temp);
}
readBaseFile.close();
//Fine the no. of lines
noOfLines = baseVector.size();
//Find the no. of events
for (int i=0; i<baseVector.size(); i++)
{
counting = count(baseVector[i].begin(), baseVector[i].end(), ':') - 1;
if (maxEvent < counting)
{
maxEvent = counting;
}
}
//Store individual events into array
string a[maxEvent];
while (getline(readBaseFileAgain, temp)) {
stringstream streamTemp(temp);
for (int i=0; i<maxEvent; i++)
{
getline(streamTemp, temp, ':');
a[i] += temp + "\n";
}
}
}
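I suggest reading the numbers directly instead of strings:
std::vector<int> a(maxEvent); // a plain array would need a compile-time size
char c;                       // to hold the colon
int i = 0;                    // number of values read from this line
while (i < maxEvent && streamTemp >> a[i] >> c) i++; // assumes every value is followed by ':'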
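A sketch of the whole computation, assuming that "each specific event" means each column position and that a day missing a column simply does not count towards that column's average (which is what the (5+5+5+12)/4 example implies). It reads the data once, line by line, so no second pass and no string arrays are needed. columnAverages is a hypothetical name, and it averages every column, including the last one described as the total.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: average of every ':'-separated column, taken over the
// lines (days) that actually have a value in that column.
std::vector<double> columnAverages(std::istream& in)
{
    std::vector<double> sums;
    std::vector<int> counts;
    std::string line, field;
    while (std::getline(in, line)) // one line per day
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();       // tolerate Windows line endings
        std::istringstream fields(line);
        std::size_t column = 0;
        while (std::getline(fields, field, ':'))
        {
            if (!field.empty())    // an empty field still occupies its column
            {
                if (column >= sums.size())
                {
                    sums.resize(column + 1, 0.0);
                    counts.resize(column + 1, 0);
                }
                sums[column] += std::stod(field);
                ++counts[column];
            }
            ++column;
        }
    }
    std::vector<double> averages(sums.size(), 0.0);
    for (std::size_t c = 0; c < sums.size(); ++c)
        if (counts[c] > 0)
            averages[c] = sums[c] / counts[c];
    return averages;
}
```

With std::ifstream file("Base-Data.txt"), columnAverages(file) gives 331.2 at index 1 and 6.75 at index 5 for the sample data, matching the two example averages.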

Bug in selection sort loop

I need to write a program that reads an input file of numbers (integer.txt), one number per line, into a vector, then uses a selection sort algorithm to sort the numbers in descending order and writes them to an output file (sorted.txt). I'm quite sure something is wrong in my selectionSort() function that keeps the loop from getting the right values, because when I test with cout I get wildly wrong output. I'm sure it's a beginning programmer's goof.
vector<string> getNumbers()
{
vector<string> numberList;
ifstream inputFile ("integer.txt");
string pushToVector;
while (inputFile >> pushToVector)
{
numberList.push_back(pushToVector);
}
return numberList;
}
vector<string> selectionSort()
{
vector<string> showNumbers = getNumbers();
int vectorMax = showNumbers.size();
int vectorRange = (showNumbers.size() - 1);
int i, j, iMin;
for (j = 0; j < vectorMax; j++)
{
iMin = j;
for( i = j; i < vectorMax; i++)
{
if(showNumbers[i] < showNumbers[iMin])
{
iMin = i;
}
}
if (iMin != j)
{
showNumbers[j] = showNumbers [iMin];
}
}
return showNumbers;
}
void vectorToFile()
{
vector<string> sortedVector = selectionSort();
int vectorSize = sortedVector.size();
ofstream writeTo;
writeTo.open("sorted.txt");
int i = 0;
while (writeTo.is_open())
{
while (i < vectorSize)
{
writeTo << sortedVector[i] << endl;
i += 1;
}
writeTo.close();
}
return;
}
int main()
{
vectorToFile();
}
vectorRange is defined but not used.
In your selectionSort(), the only command that changes the vector is:
showNumbers[j] = showNumbers [iMin];
Every time control reaches that line, you overwrite an element of the vector.
You must learn to swap two values before you even think about sorting a vector.
Also, your functions are over-coupled. If all you want to fix is selectionSort, you should be able to post just that, plus a main that calls it with some test data and displays the result; instead, your functions all call each other. Learn to decouple.
Also your variable names are awful.
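A sketch of the fix, decoupled as suggested: a function that only sorts the vector it is given. Two assumptions go beyond the answer: the values are stored as int rather than string (comparing std::string values is lexicographic, so "10" sorts before "9"), and the comparison selects the maximum because the assignment asks for descending order, while the posted code selects the minimum. selectionSortDescending is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort into descending order: on each pass, find the largest
// remaining element and swap it into position j.
void selectionSortDescending(std::vector<int>& v)
{
    for (std::size_t j = 0; j + 1 < v.size(); ++j)
    {
        std::size_t iMax = j;
        for (std::size_t i = j + 1; i < v.size(); ++i)
            if (v[i] > v[iMax])
                iMax = i;
        if (iMax != j)
            std::swap(v[j], v[iMax]); // swap, not overwrite, so no value is lost
    }
}
```

Reading the file then becomes int n; while (inputFile >> n) numbers.push_back(n);, and writing the result is a separate loop, so each piece can be tested on its own.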

Sorting a file in-place with Shell sort

I have been asked to sort a file in place using shell sort (and quick sort too, but I think that if I find a way to do one, I will be able to do both). I have been thinking about what could help, but I can't find a way to do it. I have the algorithm for an array, but I can't think of a way to make it work with a file.
Is there any way this can be done?
Edit:
With the help of the code posted by André Puel, I was able to write some code that works for the moment; here it is if you want to check it out:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstdlib>
#include <sstream>
using namespace std;
int toNum(const string &s) {
stringstream ss(s);
int n;
ss >> n;
return n;
}
string toStr(int n) {
stringstream ss;
ss << n;
string s;
ss >> s;
return string(5 - s.size(),' ') + s;
}
int getNum(fstream &f,int pos) {
f.seekg(pos*5);
string s;
for(int i = 0; i < 5; ++i) s += f.get();
return toNum(s);
}
void putNum(fstream &f, int pos,int n) {
f.seekp(pos*5);
f.write(toStr(n).c_str(),5);
}
int main() {
fstream input("entrada1",fstream::in | fstream::out);
string aux;
getline(input,aux);
int n = aux.size() / 5,temp,j;
int gaps[] = {701,301,132,57,23,10,4,1};
int g = sizeof(gaps)/sizeof(gaps[0]);
for(int k = 0; k < g; ++k) {
    int gap = gaps[k]; // use the gap value itself, not its index k
    for(int i = gap; i < n; ++i) {
        temp = getNum(input,i);
        for(j = i; j >= gap and getNum(input,j - gap) > temp; j -= gap) {
            putNum(input,j,getNum(input,j - gap));
        }
        putNum(input,j,temp);
    }
}
input.close();
return 0;
}
When you open a file in C++, you have two positions: the get pointer and the put pointer. They indicate where in the file you are reading and writing. (For std::fstream the two are actually one shared file position, so seek before every read or write, as the code below does.)
Using seekp, you can set where you want to write; using tellp, you can find out where the next write will happen. Every time you write something, the put pointer advances automatically.
The same goes for the get pointer; the functions are seekg and tellg.
Using these operations you can easily simulate an array. Let me show you some code:
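#include <cassert>
#include <cstddef>
#include <fstream>

class FileArray {
public:
    // in|out so the file can be read and overwritten in place
    // (app would force every write to the end of the file)
    FileArray(const char* path)
    : file(path, std::fstream::in | std::fstream::out | std::fstream::binary)
    {
        file.seekg(0, std::fstream::end);
        size = static_cast<std::size_t>(file.tellg()); // file length in bytes = number of chars
    }
    void write(std::size_t pos, char data) {
        assert(pos < size);
        file.seekp(pos); // move to the element before writing
        file.put(data);
    }
    char read(std::size_t pos) {
        assert(pos < size);
        file.seekg(pos); // move to the element before reading
        return file.get();
    }
private:
    std::fstream file;
    std::size_t size;
};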
This is a naive way to deal with a file because it assumes cheap random access. Random access is possible, but it may be slow: file streams work faster when you access data that are near each other (spatial locality).
Still, it is a nice way to start dealing with your problem: you will get experienced with file I/O, and you will end up figuring out ways to improve the performance for your specific problem. Let's take baby steps.
Another thing I want you to note is that when you perform a write, the data goes to the fstream, which writes it to the file. I know that the kernel will try to cache this and optimize the speed, but it would still be better to have some kind of cache layer of your own to avoid going to the disk on every write.
Finally, I assumed that you are dealing with chars (because it is easier), but you can deal with other data types; you just need to be careful about the indexing and the size of the data type. For example, long long is typically 8 bytes: to access the first element of your file-array, you seek to position 8*0 and read 8 bytes. To access the element at index 10, you seek to position 8*10 and again read 8 bytes to construct the long long value.
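Putting the seekg/seekp idea together with the shell sort from the question gives the following sketch, which sorts fixed-size binary int records in place. Assumptions: the records are raw ints written in binary (not the 5-character text fields used in the question's edit), a real file would be opened with in|out|binary, and the gap sequence is Shell's original n/2, n/4, ..., 1 rather than the one in the question. readRecord, writeRecord and shellSortFile are hypothetical names. The functions take any std::iostream, so a std::stringstream can stand in for the file when testing.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical helpers: record pos lives at byte offset pos * sizeof(int).
int readRecord(std::iostream& f, std::size_t pos)
{
    int value = 0;
    f.seekg(static_cast<std::streamoff>(pos * sizeof(int)), std::ios::beg);
    f.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

void writeRecord(std::iostream& f, std::size_t pos, int value)
{
    f.seekp(static_cast<std::streamoff>(pos * sizeof(int)), std::ios::beg);
    f.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Shell sort over n int records, done entirely through seeks:
// each pass is an insertion sort on records that are `gap` positions apart.
void shellSortFile(std::iostream& f, std::size_t n)
{
    for (std::size_t gap = n / 2; gap > 0; gap /= 2)
    {
        for (std::size_t i = gap; i < n; ++i)
        {
            int temp = readRecord(f, i);
            std::size_t j = i;
            while (j >= gap)
            {
                int prev = readRecord(f, j - gap);
                if (prev <= temp)
                    break;
                writeRecord(f, j, prev); // shift the larger record up by one gap
                j -= gap;
            }
            writeRecord(f, j, temp);
        }
    }
}
```

On a real file this would be std::fstream f("data.bin", std::ios::in | std::ios::out | std::ios::binary) (the name is a placeholder) with n equal to the file size divided by sizeof(int). Every access is a seek plus a tiny read or write, which is exactly the slow random-access pattern described above, so a buffering layer is the natural next step.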