I am a beginner with Caffe. Recently I have been learning how to use a pretrained Caffe model to make predictions in my own project,
and now I am trying to predict iteratively: each loop iteration brings new input data that should be used for a new prediction.
I use a memory data layer as my input layer.
Before entering the loop, I make some declarations:
caffe::Datum datum;
datum.set_channels(1);
datum.set_height(1);
datum.set_width(30);
vector<float> mydata;
vector<caffe::Datum> dvector;
boost::shared_ptr<MemoryDataLayer<float> > memory_data_layer;
memory_data_layer = boost::static_pointer_cast<MemoryDataLayer<float>>(net.layer_by_name("datas"));
const boost::shared_ptr<Blob<float>> & blobs = net.blob_by_name("result");
const float* output = blobs->cpu_data();
And in each loop, "mydata" gets some new data, which is used for a new prediction.
Here is what I do in each loop:
("mydata" updated)
datum.clear_data();
for(int i=0;i<30;i++)
datum.add_float_data(mydata[i]);
dvector.clear();
dvector.push_back(datum);
memory_data_layer->AddDatumVector(dvector);
float loss = 0.0;
net.Forward(&loss);
for (int i = 0; i < 10; i++)
{
cout<< output[i] <<endl;
}
For the first loop, the result is correct.
But in the following loops, although "mydata" gets new data, the output remains unchanged; it still shows the same result as the first loop.
Did I skip a necessary step?
How can I fix it?
Thanks.
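Solved by replacing
datum.clear_data();
with
datum.clear_float_data();
I think this is because the float data lives in a separate field (float_data) from the byte data (data), so clear_data() leaves the old floats in place and each add_float_data() call just appends after them.
To get rid of the old float data, I need to clear that field itself.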
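To illustrate the effect, here is a minimal sketch that does not use Caffe at all. FakeDatum is a hypothetical stand-in with only the two payload fields, and refill is an invented helper that mimics the loop body from the question; neither is part of the Caffe API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two payload fields of caffe::Datum.
struct FakeDatum {
    std::string data;               // byte payload
    std::vector<float> float_data;  // float payload, stored separately
    void clear_data() { data.clear(); }
    void clear_float_data() { float_data.clear(); }
    void add_float_data(float v) { float_data.push_back(v); }
};

// Refill the datum the way the loop in the question does.
// clearFloats == false mimics the original bug: only the byte field is cleared.
void refill(FakeDatum& d, const std::vector<float>& values, bool clearFloats)
{
    if (clearFloats) {
        d.clear_float_data();
    } else {
        d.clear_data();
    }
    for (float v : values) {
        d.add_float_data(v);
    }
}
```

With clearFloats set to false, the second refill leaves the first values at the front of float_data, which would explain why the network kept seeing the first input.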
I'm just getting started with HDF5 and would appreciate some advice on the following.
I have a 2-d array: data[][] passed into a method. The method looks like:
void WriteData( int data[48][100], int sizes[48])
The size of the data is not actually 48 x 100 but rather 48 x sizes[i]. I.e. each row could be a different length! In one simple case I'm dealing with, all rows are the same size (but not 100), so you can say that the array is 48 X sizes[0].
How best to write this to HDF5?
I have some working code where I loop through 0 to 48 and create a new dataset for each row.
Something like:
for (int i = 0; i < 48; i++)
{
hsize_t dsSize[2];
dsSize[0] = 48;
dsSize[1] = sizes[0]; // use sizes[i] in most general case
// Create the Data Space
DataSpace dataSpace = DataSpace(2, dsSize);
DataSet dataSet = group.createDataSet(dataSetName, intDataType, dataSpace);
dataSet.write(data[i], intDataType);
}
Is there a way to write the data all at once in one DataSet? Perhaps one solution for the simpler case of all rows the same length, and another for the ragged rows?
I've tried a few things to no avail. I called dataSet.write(data, intDataType), i.e. I threw the whole array at it. I seemed to get garbage in the file, I suspect because the array the data is stored in is actually 48x100 and I only need a small part of that.
It occurred to me that I could maybe use double pointers (int**) or vector<vector<int>>, but I'm stuck on that. As far as I can tell, "write" needs a void* pointer. Also, I'd like the file to "look correct", i.e. one giant row with all rows of data is not desirable. If I must go that route, someone would need to suggest a slick way to store the info that would allow me to read the data back in from the file (perhaps store row lengths as attributes?).
Perhaps my real problem is finding C++ examples of non-trivial use cases.
Any help is much appreciated.
Dave
Here is how you can do it using variable length arrays if your data is a vector of vectors (which seems to make sense for your use case):
void WriteData(const std::vector< std::vector<int> >& data)
{
hsize_t dim(data.size());
H5::DataSpace dspace(1, &dim);
H5::VarLenType dtype(H5::PredType::NATIVE_INT);
H5::DataSet dset(group.createDataSet(dataSetName, dtype, dspace));
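std::vector<hvl_t> vl(data.size()); // a std::vector instead of a variable-length array, which is not standard C++
for (hsize_t i = 0; i < dim; ++i)
{
vl[i].len = data[i].size();
vl[i].p = const_cast<int*>(data[i].data()); // hvl_t::p is a non-const void*; the data is only read when writing
}
dset.write(vl.data(), dtype);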
}
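For the simpler case where every row has the same length, the garbage in the file is probably explained by the row stride: the array is laid out as 48 x 100, so the used part of each row is not contiguous with the next. A minimal sketch of one way around that, assuming the dimensions from the question, is to pack the used part into one contiguous buffer first. The names packRows, kRows and kCapacity are made up for this illustration, and no HDF5 calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kRows = 48;       // number of rows, from the question
const std::size_t kCapacity = 100;  // allocated row width, from the question

// Copy the first `used` ints of every row into one contiguous row-major buffer.
std::vector<int> packRows(const int data[kRows][kCapacity], std::size_t used)
{
    assert(used <= kCapacity);
    std::vector<int> packed;
    packed.reserve(kRows * used);
    for (std::size_t r = 0; r < kRows; ++r) {
        packed.insert(packed.end(), data[r], data[r] + used);
    }
    return packed;
}
```

The packed buffer could then be written with a single dataSet.write(packed.data(), intDataType) to a dataset whose DataSpace is 48 x used.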
I need to export very large matrices of various dimensions (100x10, 1000000x4, 100000x100) into a binary file that also carries the information about how the columns are divided. This file is then loaded into Matlab to plot some results.
Until now, I have been using a simple ASCII export, as follows, in a member function of a class I wrote (the file is opened elsewhere):
void file_io::writeColData(vector< vector<double> >* data){
for(unsigned int i = 0;i < (*data).size();i++){
for(unsigned int j = 0;j < (*data)[i].size();j++){
file << (*data)[i][j] << '\t';
}
file << '\n';
}
}
This prints a nicely formatted file that is simply loaded into Matlab with load('file.txt');. Everything works fine and the matrix is preserved, except that it is unbelievably and painfully slow.
The only thing I managed to get working in binary was an export of a simple vector:
void file_io::writeColDataBin(vector<double>* data){
for(unsigned int i = 0;i < (*data).size();i++){
file.write(reinterpret_cast<const char*>(&(*data)[i]), sizeof((*data)[i]));
}
}
I can add another dimension to the function above, but I cannot manage to format the output: all I get is one long column of data (and I need several columns, the number depending on the situation).
I know that there are Matlab libraries which I can include (I am using the Eclipse IDE), but I think that is overkill for my needs, not to mention that I was not able to get them working after a few painful hours. Another library I know of is MAT IO (matio.h), but I wasn't able to get that working either.
After the export, I need to import it into Matlab, possibly with
fid = fopen('data.dat','r','l');
data = fread(fid,'double');
with (hopefully) the result being the matrix I need.
Is there a way to accomplish this? A really... easy, simplistic way?
Thanks in advance!
Your attempt for binary output of array data looks ok so far. However, you need to also write the array dimensions (rows, columns) before the data. Then, assuming these numbers are written e.g. as uint64_t in C++, you can read the file in matlab as follows:
function matrix = load_2d(filename, data_type)
fid = fopen(filename, 'rb');
rows = fread(fid, 1, 'uint64');
cols = fread(fid, 1, 'uint64');
matrix = fread(fid, [rows cols], data_type);
fclose(fid);
end
where data_type is a string representation of the matlab data type corresponding to your data type in C++.
Since you represent your matrix by a vector of vectors (with each inner vector holding a column), this will only work if all inner vectors are of the same size. Otherwise you'd need to write the size (rows) of each column individually and adjust load_2d accordingly. But if the target is a single 2d matrix, you'd have to truncate somehow.
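On the C++ side, the layout that load_2d expects could be produced along these lines. This is only a sketch under the assumptions stated above (dimensions written as uint64_t, one inner vector per column, all columns the same length, native byte order); writeMatrix is a name made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write a header (rows, cols as uint64_t) followed by the values, column by column.
// columns[j] holds column j; all columns are assumed to have the same length.
bool writeMatrix(const std::string& path,
                 const std::vector<std::vector<double>>& columns)
{
    const std::uint64_t cols = columns.size();
    const std::uint64_t rows = cols ? columns[0].size() : 0;

    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(reinterpret_cast<const char*>(&rows), sizeof rows);
    out.write(reinterpret_cast<const char*>(&cols), sizeof cols);
    for (const auto& col : columns) {
        assert(col.size() == rows);
        // One write per column instead of one write per value.
        out.write(reinterpret_cast<const char*>(col.data()),
                  static_cast<std::streamsize>(col.size() * sizeof(double)));
    }
    return static_cast<bool>(out);
}
```

A file written this way would be read in Matlab with load_2d('data.dat', 'double').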
Similarly, to save back:
function save_2d(filename, matrix, data_type)
fid = fopen(filename, 'wb');
[rows, cols] = size(matrix);
fwrite(fid, rows, 'uint64');
fwrite(fid, cols, 'uint64');
fwrite(fid, matrix, data_type);
fclose(fid);
end
I haven't tested this, though.
So I finally got the output to satisfy my needs. The result is very similar to what iavr wrote (thank you for your quick response!), but I will post my full (working) code so that it can benefit someone else too.
This is the member function of my class file_io that writes the data:
void file_io::writeColDataBin(vector< vector<double> >* data){
double rows = (double)(*data).size();
double cols = (double)(*data)[0].size();
file.write(reinterpret_cast<const char*>(&rows), sizeof(rows));
file.write(reinterpret_cast<const char*>(&cols), sizeof(cols));
for(unsigned int j = 0;j < (*data)[0].size();j++){
for(unsigned int i = 0;i < (*data).size();i++){
file.write(reinterpret_cast<const char*>(&(*data)[i][j]), sizeof((*data)[i][j]));
}
}
}
It writes the number of rows and columns first, then the cells of the matrix column by column: every row of one column is written before moving on to the next column. This is important, because Matlab's fread fills a matrix by columns, not by rows. The row and column counts are also converted to double, so that Matlab can read the whole file in one fread call.
The same class opens the file (file is an ofstream) with:
void file_io::fileOpenBin(const char* fileName){
file.open(fileName, ios::out | ios::binary | ios::trunc);
}
After this, matrix is exported and loaded into Matlab with:
fid = fopen('data.dat','r');
data = fread(fid,'double');
fclose(fid);
rows = data(1);
cols = data(2);
data(1:2) = [];
data = reshape(data,rows,cols);
The row and column counts are read first, then the first two cells are deleted from data and the remainder is reshaped into the required matrix.
I hope this helps someone in the future. It is probably not the quickest way to read binary data, but it is definitely many times faster than reading ASCII.
Some days ago I asked a question here and got some really useful answers. I will summarize it for those of you who didn't read it, and then explain my new doubts and where I have problems now.
Explanation
I have been working on a program that simulates a small database. It first reads information from txt files and stores it in memory; then I can make queries against normal tables and/or transposed tables. The problem is that the performance is not good enough yet: it runs slower than I expect. I have improved it, but I think I should improve it more. There are specific points where my program performs poorly.
Current problem
The first problem I have now (where my program is slower) is loading: a table with 100,000 columns & 100 rows takes less time to load (0.325 min, improved thanks to your help) than one with 100,000 rows & 100 columns (1.61198 min, the same as before). On the other hand, access time to some data is better in the second case (in one particular example, 47 seconds vs. 6079 seconds in the first case). Any idea why?
Explanation
Now let me remind you how my code works (with an attached summary of my code).
First of all, I have a .txt file simulating a database table, with random strings separated by "|". Here is an example table (with 7 rows and 5 columns). I also have the transposed table.
NormalTable.txt
42sKuG^uM|24465\lHXP|2996fQo\kN|293cvByiV|14772cjZ`SN|
28704HxDYjzC|6869xXj\nIe|27530EymcTU|9041ByZM]I|24371fZKbNk|
24085cLKeIW|16945TuuU\Nc|16542M[Uz\|13978qMdbyF|6271ait^h|
13291_rBZS|4032aFqa|13967r^\\`T|27754k]dOTdh|24947]v_uzg|
1656nn_FQf|4042OAegZq|24022nIGz|4735Syi]\|18128klBfynQ|
6618t\SjC|20601S\EEp|11009FqZN|20486rYVPR|7449SqGC|
14799yNvcl|23623MTetGw|6192n]YU\Qe|20329QzNZO_|23845byiP|
TransposedTable.txt (This is new from the previous post)
42sKuG^uM|28704HxDYjzC|24085cLKeIW|13291_rBZS|1656nn_FQf|6618t\SjC|14799yNvcl|
24465\lHXP|6869xXj\nIe|16945TuuU\Nc|4032aFqa|4042OAegZq|20601S\EEp|23623MTetGw|
2996fQo\kN|27530EymcTU|16542M[Uz\|13967r^\\`T|24022nIGz|11009FqZN|6192n]YU\Qe|
293cvByiV|9041ByZM]I|13978qMdbyF|27754k]dOTdh|4735Syi]\|20486rYVPR|20329QzNZO_|
14772cjZ`SN|24371fZKbNk|6271ait^h|24947]v_uzg|18128klBfynQ|7449SqGC|23845byiP|
Explanation
The information in the .txt file is read by my program and stored in memory. Then, when making queries, I access the information stored in memory. Loading the data into memory can be a slow process, but accessing the data later will be faster, which is what really matters to me.
Here is the part of the code that reads this information from a file and stores it in memory.
Code that reads data from the Table.txt file and stores it in memory
int h;
do
{
cout<< "Do you want to query the normal table or the transposed table? (1- Normal table/ 2- Transposed table):" ;
cin>>h;
}while(h!=1 && h!=2);
string ruta_base("C:\\Users\\Raul Velez\\Desktop\\Tables\\");
if(h==1)
{
ruta_base +="NormalTable.txt"; // Folder where my "Table.txt" is found
}
if(h==2)
{
ruta_base +="TransposedTable.txt";
}
string temp; // Variable where every row from the Table.txt file will be firstly stored
vector<string> buffer; // Variable where every different row will be stored after separating the different elements by tokens.
vector<ElementSet> RowsCols; // Variable with a class that I have created, that simulated a vector and every vector element is a row of my table
ifstream ifs(ruta_base.c_str());
while(getline( ifs, temp )) // We will read and store line per line until the end of the ".txt" file.
{
size_t tokenPosition = temp.find("|"); // When we find the simbol "|" we will identify different element. So we separate the string temp into tokens that will be stored in vector<string> buffer
// --- NEW PART ------------------------------------
const char* p = temp.c_str();
char* p1 = strdup(p);
char* pch = strtok(p1, "|");
while(pch)
{
buffer.push_back(string(pch));
pch = strtok(NULL,"|");
}
free(p1);
ElementSet sss(0,buffer);
buffer.clear();
RowsCols.push_back(sss); // We store all the elements of every row (stores as vector<string> buffer) in a different position in "RowsCols"
// --- NEW PART END ------------------------------------
}
Table TablesStorage(RowsCols); // After every loop we will store the information about every .txt file in the vector<Table> TablesDescriptor
vector<Table> TablesDescriptor;
TablesDescriptor.push_back(TablesStorage); // In the vector<Table> TablesDescriptor will be stores all the different tables with all its information
DataBase database(1, TablesDescriptor);
Information already given in the previous post
After this comes the part that accesses the information. Let's suppose that I want to make a query, and I ask for input. Let's say that my query is row "n", the number of consecutive tuples "numTuples", and the columns "y". (The columns are specified by a decimal number "y", which is converted to binary to show which columns are queried. For example, if I ask for columns 54 (00110110 in binary), I am asking for columns 2, 3, 5 and 6.) Then I read the required information from memory and store it in a vector shownVector. Here is that part of the code.
Problem
In the if(h == 2) branch, where data from the transposed tables is accessed, performance is poorer. Why?
Code that access to the required information upon my input
int n, numTuples;
unsigned long long int y;
cout<< "Write the ID of the row you want to get more information: " ;
cin>>n; // We get the row to be represented -> "n"
cout<< "Write the number of followed tuples to be queried: " ;
cin>>numTuples; // We get the number of followed tuples to be queried-> "numTuples"
cout<<"Write the ID of the 'columns' you want to get more information: ";
cin>>y; // We get the "columns" to be represented ' "y"
unsigned int r; // Auxiliar variable for the columns path
int t=0; // Auxiliar variable for the tuples path
int idTable;
vector<int> columnsToBeQueried; // Here we will store the columns to be queried, obtained from the bitset<5000> binaryNumber after comparing with a mask
vector<string> shownVector; // Vector to store the final information from the query
bitset<5000> mask;
mask=0x1;
clock_t t1, t2;
t1=clock(); // Start of the query time
bitset<5000> binaryNumber = Utilities().getDecToBin(y); // We get the columns -> change number from decimal to binary. Max number of columns: 5000
// We see which columns will be queried
for(r=0;r<binaryNumber.size();r++) //
{
if(binaryNumber.test(r) & mask.test(r)) // if both of them are bit "1"
{
columnsToBeQueried.push_back(r);
}
mask=mask<<1;
}
do
{
for(int z=0;z<columnsToBeQueried.size();z++)
{
ElementSet selectedElementSet;
int i;
i=columnsToBeQueried.at(z);
Table& selectedTable = database.getPointer().at(0); // It simmulates a vector with pointers to different tables that compose the database, but our example database only have one table, so don't worry ElementSet selectedElementSet;
if(h == 1)
{
selectedElementSet=selectedTable.getRowsCols().at(n);
shownVector.push_back(selectedElementSet.getElements().at(i)); // We save in the vector shownVector the element "i" of the row "n"
}
if(h == 2)
{
selectedElementSet=selectedTable.getRowsCols().at(i);
shownVector.push_back(selectedElementSet.getElements().at(n)); // We save in the vector shownVector the element "n" of the row "i"
}
n=n+1;
t++;
}
}while(t<numTuples);
t2=clock(); // End of the query time
showVector().finalVector(shownVector);
float diff ((float)t2-(float)t1);
float microseconds = diff / CLOCKS_PER_SEC*1000000;
cout<<"Time: "<<microseconds<<endl;
Class definitions
Here I attached some of the class definitions so that you can compile the code, and understand better how it works:
class ElementSet
{
private:
int id;
vector<string> elements;
public:
ElementSet();
ElementSet(int, vector<string>&);
const int& getId();
void setId(int);
const vector<string>& getElements();
void setElements(vector<string>);
};
class Table
{
private:
vector<ElementSet> RowsCols;
public:
Table();
Table(vector<ElementSet>&);
const vector<ElementSet>& getRowsCols();
void setRowsCols(vector<ElementSet>);
};
class DataBase
{
private:
int id;
vector<Table> pointer;
public:
DataBase();
DataBase(int, vector<Table>&);
const int& getId();
void setId(int);
const vector<Table>& getPointer();
void setPointer(vector<Table>);
};
class Utilities
{
public:
Utilities();
static bitset<5000> getDecToBin(unsigned long long int); // size matches the bitset<5000> used in the query code
};
Summary of my problems
Why does the load time differ depending on the table format?
Why does the access time also depend on the table format (with the performance difference going the opposite way from the load time)?
Thank you very much for all your help!!! :)
One thing I see that may explain both your problems is that you are doing many allocations, a lot of which appear to be temporary. For example, in your loading you:
Allocate a temporary string per row
Allocate a temporary string per column
Copy the row to a temporary ElementSet
Copy that to a RowSet
Copy the RowSet to a Table
Copy the Table to a TableDescriptor
Copy the TableDescriptor to a Database
As far as I can tell, each of these copies is a complete new copy of the object. If you only had a few hundred or a few thousand records that might be fine, but in your case you have 10 million records, so the copies will be time consuming.
Your loading times may differ due to the number of allocations done in the loading loop per row and per column. Memory fragmentation may also contribute at some point (when dealing with a large number of small allocations, the default memory handler sometimes takes a long time to allocate new memory). Even if you removed all your unnecessary allocations, I would still expect the 100 column case to be slightly slower than the 100,000 case due to how you are loading and parsing by line.
Your information access times may be different because you are creating a full copy of a row in selectedElementSet. When you have 100 columns this will be fast, but when you have 100,000 columns it will be slow.
A few specific suggestions for improving your code:
Reduce the number of allocations and copies you make. The ideal case would be to make one allocation for reading the file and then another allocation per record when stored.
If you're going to store the data in a Database then put it there from the beginning. Don't make half-a-dozen complete copies of your data to go from a temporary object to the Database.
Make use of references to the data instead of actual copies when possible.
When profiling, make sure you get times when running a new instance of the program. Memory use and fragmentation may have a significant impact if you test both cases in the same instance, and the order in which you do the tests will matter.
Edit: Code Suggestion
To hopefully improve your speed in the search loop try something like:
for(int z=0;z<columnsToBeQueried.size();z++)
{
int i;
i=columnsToBeQueried.at(z);
const Table& selectedTable = database.getPointer().at(0); // const: getPointer() returns a const reference
if(h == 1)
{
const ElementSet& selectedElementSet = selectedTable.getRowsCols().at(n); // reference, no copy of the row
shownVector.push_back(selectedElementSet.getElements().at(i));
}
else if(h == 2)
{
const ElementSet& selectedElementSet = selectedTable.getRowsCols().at(i); // reference, no copy of the row
shownVector.push_back(selectedElementSet.getElements().at(n));
}
n=n+1;
t++;
}
I've just changed selectedElementSet to be a reference, which should completely eliminate the row copies taking place and, in theory, have a noticeable impact on performance. Since the getters return const references, the references are const as well; for this to compile, the getters themselves need to be const member functions (for example const vector<ElementSet>& getRowsCols() const;). For even more performance gain you can change shownVector to store references/pointers to avoid yet another copy.
Edit: Answer Comment
You asked where you were making copies. The following lines in your original code:
ElementSet selectedElementSet;
selectedElementSet = selectedTable.getRowsCols().at(n);
create a copy of the vector<string> elements member in ElementSet. In the 100,000 column case this will be a vector containing 100,000 strings, so the copy will be relatively expensive time-wise. Since you don't actually need a new copy, changing selectedElementSet to be a reference, as in my example code above, will eliminate this copy.
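The difference between the two access patterns can be shown in isolation. The sketch below uses a simplified, hypothetical Row class in place of ElementSet (it is not the class from the question) with a const-qualified getter; elementViaCopy and elementViaReference are names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the ElementSet class in the question.
class Row {
public:
    explicit Row(std::vector<std::string> elements)
        : elements_(std::move(elements)) {}
    // const-qualified getter returning a reference: calling it makes no copy.
    const std::vector<std::string>& getElements() const { return elements_; }
private:
    std::vector<std::string> elements_;
};

// Copies the whole row (every string in it) just to read one element.
std::string elementViaCopy(const std::vector<Row>& rows, std::size_t n, std::size_t i)
{
    Row copy = rows.at(n);
    return copy.getElements().at(i);
}

// Binds a const reference instead; only the requested string is copied.
std::string elementViaReference(const std::vector<Row>& rows, std::size_t n, std::size_t i)
{
    const Row& row = rows.at(n);
    return row.getElements().at(i);
}
```

Both functions return the same value; the first one pays for a copy proportional to the number of columns on every call, which is the cost that grows in the 100,000 column case.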
I have been working on a program that simulates a small database against which I can make queries. After writing the code I ran it, but the performance is quite bad: it runs really slowly. I have tried to improve it, but I started with C++ on my own a few months ago, so my knowledge is still very limited. I would like to find a way to improve the performance.
Let me explain how my code works. I have attached a summarized example.
First of all, I have a .txt file simulating a database table, with random strings separated by "|". Here is an example table (with 5 rows and 5 columns).
Table.txt
0|42sKuG^uM|24465\lHXP|2996fQo\kN|293cvByiV
1|14772cjZ`SN|28704HxDYjzC|6869xXj\nIe|27530EymcTU
2|9041ByZM]I|24371fZKbNk|24085cLKeIW|16945TuuU\Nc
3|16542M[Uz\|13978qMdbyF|6271ait^h|13291_rBZS
4|4032aFqa|13967r^\\`T|27754k]dOTdh|24947]v_uzg
The information in the .txt file is read by my program and stored in memory. Then, when making queries, I access the information stored in memory. Loading the data into memory can be a slow process, but accessing the data later will be faster, which is what really matters to me.
Here is the part of the code that reads this information from a file and stores it in memory.
Code that reads data from the Table.txt file and stores it in memory
string ruta_base("C:\\a\\Table.txt"); // Folder where my "Table.txt" is found
string temp; // Variable where every row from the Table.txt file will be firstly stored
vector<string> buffer; // Variable where every different row will be stored after separating the different elements by tokens.
vector<ElementSet> RowsCols; // Variable with a class that I have created, that simulated a vector and every vector element is a row of my table
ifstream ifs(ruta_base.c_str());
while(getline( ifs, temp )) // We will read and store line per line until the end of the ".txt" file.
{
size_t tokenPosition = temp.find("|"); // When we find the simbol "|" we will identify different element. So we separate the string temp into tokens that will be stored in vector<string> buffer
while (tokenPosition != string::npos)
{
string element;
tokenPosition = temp.find("|");
element = temp.substr(0, tokenPosition);
buffer.push_back(element);
temp.erase(0, tokenPosition+1);
}
ElementSet ss(0,buffer);
buffer.clear();
RowsCols.push_back(ss); // We store all the elements of every row (stores as vector<string> buffer) in a different position in "RowsCols"
}
vector<Table> TablesDescriptor;
Table TablesStorage(RowsCols);
TablesDescriptor.push_back(TablesStorage);
DataBase database(1, TablesDescriptor);
After this comes the IMPORTANT PART. Let's suppose that I want to make a query, and I ask for input. Let's say that my query is row "n", the number of consecutive tuples "numTuples", and the columns "y". (The columns are specified by a decimal number "y", which is converted to binary to show which columns are queried. For example, if I ask for columns 54 (00110110 in binary), I am asking for columns 2, 3, 5 and 6.) Then I read the required information from memory and store it in a vector shownVector. Here is that part of the code.
Code that access to the required information upon my input
int n, numTuples;
unsigned long long int y;
clock_t t1, t2;
cout<< "Write the ID of the row you want to get more information: " ;
cin>>n; // We get the row to be represented -> "n"
cout<< "Write the number of followed tuples to be queried: " ;
cin>>numTuples; // We get the number of followed tuples to be queried-> "numTuples"
cout<<"Write the ID of the 'columns' you want to get more information: ";
cin>>y; // We get the "columns" to be represented ' "y"
unsigned int r; // Auxiliar variable for the columns path
int t=0; // Auxiliar variable for the tuples path
int idTable;
vector<int> columnsToBeQueried; // Here we will store the columns to be queried get from the bitset<500> binarynumber, after comparing with a mask
vector<string> shownVector; // Vector to store the final information from the query
bitset<500> mask;
mask=0x1;
t1=clock(); // Start of the query time
bitset<500> binaryNumber = Utilities().getDecToBin(y); // We get the columns -> change number from decimal to binary. Max number of columns: 500
// We see which columns will be queried
for(r=0;r<binaryNumber.size();r++) //
{
if(binaryNumber.test(r) & mask.test(r)) // if both of them are bit "1"
{
columnsToBeQueried.push_back(r);
}
mask=mask<<1;
}
do
{
for(int z=0;z<columnsToBeQueried.size();z++)
{
int i;
i=columnsToBeQueried.at(z);
vector<int> colTab;
colTab.push_back(1); // Don't really worry about this
//idTable = colTab.at(i); // We identify in which table (with the id) is column_i
// In this simple example we only have one table, so don't worry about this
const Table& selectedTable = database.getPointer().at(0); // It simmulates a vector with pointers to different tables that compose the database, but our example database only have one table, so don't worry ElementSet selectedElementSet;
ElementSet selectedElementSet;
selectedElementSet=selectedTable.getRowsCols().at(n);
shownVector.push_back(selectedElementSet.getElements().at(i)); // We save in the vector shownVector the element "i" of the row "n"
}
n=n+1;
t++;
}while(t<numTuples);
t2=clock(); // End of the query time
float diff ((float)t2-(float)t1);
float microseconds = diff / CLOCKS_PER_SEC*1000000;
cout<<"The query time is: "<<microseconds<<" microseconds."<<endl;
Class definitions
Here I attached some of the class definitions so that you can compile the code, and understand better how it works:
class ElementSet
{
private:
int id;
vector<string> elements;
public:
ElementSet();
ElementSet(int, vector<string>);
const int& getId();
void setId(int);
const vector<string>& getElements();
void setElements(vector<string>);
};
class Table
{
private:
vector<ElementSet> RowsCols;
public:
Table();
Table(vector<ElementSet>);
const vector<ElementSet>& getRowsCols();
void setRowsCols(vector<ElementSet>);
};
class DataBase
{
private:
int id;
vector<Table> pointer;
public:
DataBase();
DataBase(int, vector<Table>);
const int& getId();
void setId(int);
const vector<Table>& getPointer();
void setPointer(vector<Table>);
};
class Utilities
{
public:
Utilities();
static bitset<500> getDecToBin(unsigned long long int);
};
So the problem is that my query time varies a lot depending on the table size (a table with 100 rows and 100 columns is nothing like a table with 10000 rows and 1000 columns). This means the performance of my code is very poor for big tables, which is what really matters to me... Do you have any idea how I could optimize my code?
Thank you very much for all your help!!! :)
Whenever you have performance problems, the first thing you want to do is to profile your code. Here is a list of free tools that can do that on Windows, and here for Linux. Profile your code, identify the bottlenecks, and then come back and ask a specific question.
Also, like I said in my comment, can't you just use SQLite? It supports in-memory databases, making it suitable for testing, and it is lightweight and fast.
One obvious issue is that your get-functions return vectors by value. Do you need to have a fresh copy each time? Probably not.
If you try to return a const reference instead, you can avoid a lot of copies:
const vector<Table>& getPointer();
and similar for the nested get's.
I have not done the analysis myself, but you could analyse the complexity of your algorithm.
The reference says that accessing an item takes constant time, but when you nest loops, the complexity of your program increases:
for (i=0;i<1000; ++i) // runs i times
for (j=0;j<1000; ++j) // runs j times for each i
myAction(); // Constant in your case
The program complexity is O(i*j), so how big can i and j get?
What if myAction is not constant in time?
No need to reinvent the wheel again, use FirebirdSQL embedded database instead. That combined with IBPP C++ interface gives you a great foundation for any future needs.
http://www.firebirdsql.org/
http://www.ibpp.org/
Though I advise you to please use a profiler to find out which parts of your code are worth optimizing, here is how I would write your program:
Read the entire text file into one string (or better, memory-map the file.) Scan the string once to find all | and \n (newline) characters. The result of this scan is an array of byte offsets into the string.
When the user then queries item M of row N, retrieve it with code something like this:
char* begin = text+offset[N*items+M]+1;
char* end = text+offset[N*items+M+1];
If you know the number of records and fields before the data is read, the array of byte offsets can be a std::vector. If you don't know and must infer from the data, it should be a std::deque. This is to minimize costly memory allocation and deallocation, which I imagine is the bottleneck in such a program.
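A minimal sketch of that idea follows. It is a variation rather than the exact scheme above: instead of delimiter offsets it records where each field starts, plus one sentinel entry, which avoids a special case for the first field. The names indexFields and field are made up for this example, and every row is assumed to have the same number of items.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One pass over the text, recording the offset at which each field starts.
// The last entry is a sentinel: one past the delimiter that ends the last field
// (or one past a virtual delimiter if the text does not end with a newline).
std::vector<std::size_t> indexFields(const std::string& text)
{
    std::vector<std::size_t> starts;
    starts.push_back(0);
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        if (text[pos] == '|' || text[pos] == '\n') {
            starts.push_back(pos + 1);
        }
    }
    if (text.empty() || text[text.size() - 1] != '\n') {
        starts.push_back(text.size() + 1);
    }
    return starts;
}

// Return item `item` of row `row`, assuming `itemsPerRow` fields in every row.
std::string field(const std::string& text, const std::vector<std::size_t>& starts,
                  std::size_t itemsPerRow, std::size_t row, std::size_t item)
{
    const std::size_t k = row * itemsPerRow + item;
    assert(k + 1 < starts.size());
    const std::size_t begin = starts[k];
    const std::size_t end = starts[k + 1] - 1;  // position of the closing delimiter
    return text.substr(begin, end - begin);
}
```

Here field returns a std::string copy for convenience; returning a pointer pair into the text, as in the snippet above, would avoid even that allocation.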
I am thinking of using wxMathPlot for plotting/graphing some data that arrives continuously. I want to draw "Real-time" plot/graph using it. Is that possible?
I.E. I don't want just a static graph of a one-time read of a file - I want the streaming data plotted and continued out to the right of the graph - (and let the left side fall off/scroll out of view)
EDIT
I still have not gotten an answer for this. There is an interesting class in the wxmathPlot library called mpFXYVector but that appears just to draw one plot from a vector of data. What I want is something that can be fed a stream and scroll the graph horizontally (and also resize the scale if needed)
Thanks ravenspoint! I did what you said, and it works flawlessly.
Here is my AddData() function:
void mpFXYVector::AddData(float x, float y, std::vector<double> &xs, std::vector<double> &ys)
{
// Check if the data vectors are of the same size
if (xs.size() != ys.size()) {
wxLogError(_("wxMathPlot error: X and Y vector are not of the same length!"));
return;
}
//Delete the first point if you need a FIFO buffer (I don't need it)
//xs.erase(xs.begin());
//ys.erase(ys.begin());
//Add new Data points at the end
xs.push_back(x);
ys.push_back(y);
// Copy the data:
m_xs = xs;
m_ys = ys;
// Update internal variables for the bounding box.
if (xs.size()>0)
{
m_minX = xs[0];
m_maxX = xs[0];
m_minY = ys[0];
m_maxY = ys[0];
std::vector<double>::const_iterator it;
for (it=xs.begin();it!=xs.end();it++)
{
if (*it<m_minX) m_minX=*it;
if (*it>m_maxX) m_maxX=*it;
}
for (it=ys.begin();it!=ys.end();it++)
{
if (*it<m_minY) m_minY=*it;
if (*it>m_maxY) m_maxY=*it;
}
m_minX-=0.5f;
m_minY-=0.5f;
m_maxX+=0.5f;
m_maxY+=0.5f;
}
else
{
m_minX = -1;
m_maxX = 1;
m_minY = -1;
m_maxY = 1;
}
}
In main() you only have to call:
m_Vector->AddData(xPos,yPos,vectorX, vectorY);
m_plot->Fit();
I think mpFXYVector is the way to go.
The simplest way to deal with this might be to write a wrapper class for mpFXYVector which holds a FIFO buffer of recent data points. Each time a new datapoint arrives, add it to the FIFO buffer, which will drop the oldest point, then load mpFXYVector with the updated buffer. The wxMathPlot class mpWindow will look after the rest of what you need.
A more elegant approach would be a specialization of mpFXYVector which implements the FIFO buffer, using the simple vectors in mpFXYVector. The advantage of this would be that you are holding just one copy of the display data. Unless you are displaying many thousands of points, I doubt the advantage is worth the extra trouble of inheriting from mpFXYVector, rather than simply using the mpFXYVector documented interface.
After looking at the details, the only tricky bit is to replace mpFXYVector::SetData() with a new method Add() to add data points as they arrive. The new method needs to manage the mpFXYVector vectors as FIFO buffers, and to re-implement the code to update the bounding box ( which unfortunately was not written with inheritance in mind ).
The result is that specialization gives a solution with a smaller memory requirement and more flexibility than using a wrapper.
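The FIFO part of the wrapper approach described above could be sketched as follows. ScrollingBuffer is a hypothetical helper that is independent of wxMathPlot; it only keeps the most recent points and hands them out as the std::vector<double> pairs that the plotting layer expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: keeps the most recent `capacity` points.
class ScrollingBuffer {
public:
    explicit ScrollingBuffer(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Append a point; once the buffer is full, the oldest point is dropped first.
    void add(double x, double y)
    {
        if (xs_.size() == capacity_) {
            xs_.pop_front();
            ys_.pop_front();
        }
        xs_.push_back(x);
        ys_.push_back(y);
    }

    // Copies for an API that takes std::vector<double>.
    std::vector<double> xs() const { return std::vector<double>(xs_.begin(), xs_.end()); }
    std::vector<double> ys() const { return std::vector<double>(ys_.begin(), ys_.end()); }

    // Range of the buffered x values; only meaningful when the buffer is not empty.
    double minX() const { return *std::min_element(xs_.begin(), xs_.end()); }
    double maxX() const { return *std::max_element(xs_.begin(), xs_.end()); }

private:
    std::size_t capacity_;
    std::deque<double> xs_;
    std::deque<double> ys_;
};
```

After each add, the two vectors could be passed to SetData() on the mpFXYVector and the plot window refitted. A std::deque is used here because dropping the oldest point from the front of a std::vector shifts every remaining element.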
I know this is an old thread, but I needed to plot a scrolling X axis with wxMathPlot.
I've made a simple modification to jayjo's code to make X axis scrolling work.
I hope this helps.
void mpFXYVector::AddData(float x, float y, std::vector<double> &xs, std::vector<double> &ys)
{
// Check if the data vectors are of the same size
if (xs.size() != ys.size()) {
wxLogError(_("wxMathPlot error: X and Y vector are not of the same length!"));
return;
}
//After a certain number of points implement a FIFO buffer
//As plotting too many points can cause missing data
if (x > 300)
{
xs.erase(xs.begin());
ys.erase(ys.begin());
}
//Add new Data points at the end
xs.push_back(x);
ys.push_back(y);
// Copy the data:
m_xs = xs;
m_ys = ys;
// Update internal variables for the bounding box.
if (xs.size()>0)
{
m_minX = xs[0];
m_maxX = xs[0];
m_minY = ys[0];
m_maxY = ys[0];
std::vector<double>::const_iterator it;
for (it=xs.begin();it!=xs.end();it++)
{
if (*it<m_minX) m_minX=*it;
if (*it>m_maxX) m_maxX=*it;
}
for (it=ys.begin();it!=ys.end();it++)
{
if (*it<m_minY) m_minY=*it;
if (*it>m_maxY) m_maxY=*it;
}
m_minX-=0.5f;
m_minY-=0.5f;
m_maxX+=0.5f;
m_maxY+=0.5f;
}
else
{
m_minX = -1;
m_maxX = 1;
m_minY = -1;
m_maxY = 1;
}
}
I do not have any personal experience with wxMathPlot, but I have been working with wxWidgets for years and highly recommend it for cross-platform GUI programming in C++. That said, according to the wxWiki graphics page, the Numerix Graphics Library can be used for real-time data, so maybe that can help you out. Good luck.
Maybe someone will have the same problem and need this... I needed very fast plotting to show data from an oscilloscope.
I was getting the data in packets. I made a few changes that made the code a lot faster.
The first is to change the if statement in the function SetData from if (xs.size()>0) to if (!xs.empty()).
Then you should first add all the points of your data packet to the vectors:
Vector1_X.push_back(x);
Vector1_Y.push_back(y);
After that, set the data and fit the plot:
Vector1 ->SetData(Vector1_X,Vector1_Y); // add vectors to main vector
MathPlot1-> Fit(); //fit plot to the data
Vector1_X.clear(); //if you want to clear plot after every packet
Vector1_Y.clear(); //you should use it
The code in your main function will be longer, but it will run faster because you add all the data "at once".
We ended up using ChartDirector instead. It has a lot of capability and is fast.