Parsing data fast from text file [closed] - c++

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I read a full file into a string. This is very quick (for example, a 180 MB file takes 2 s).
Then I extract some values from the string using the >> operator, create several arrays from them, insert the arrays into a struct, and add each struct to a vector.
I'm trying to find the bottleneck, because this is very slow (but maybe there is nothing you can do).
Is the >> approach fast?
string str; // gets filled with the file
struct A;
std::vector<A> b; // global variables
// in the function inside the loop
A a;
str >> a.val;
b.push_back(a);
Does the vector take ownership of a, or does it make a copy? Is a still on the stack? I have about 60,000 structs that get inserted into the vector. Is this a fast approach, or is there a better one?

Question is the >> approach fast?
Answer Fast is relative. What do you compare it with?
Question Does the vector take ownership of the a or does it make a copy?
Answer std::vector::push_back() makes a copy of the input object.
Question Is a still on the stack?
Answer Judging solely by the posted code, yes, a is on the stack. The copy that push_back() stores lives in the vector's own dynamically allocated storage.
Question I have about 60,000 structs that get inserted into the vector. Is this a fast approach or is there a better one?
Answer You might gain some performance by creating the b with the required size and reading the data directly into the objects in b.
std::vector<A> b(60000);
for ( std::size_t i = 0; i < b.size(); ++i /* Use whatever looping construct you can */ )
{
    str >> b[i].val;
}
Update
If you are able to, writing and reading the data in binary form will be the fastest. Use std::ostream::write() to write the data and std::istream::read() to read the data.
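A minimal sketch of the binary approach, assuming A is a trivially copyable struct with a single int member val (the question does not show A's real definition). The function names save_binary and load_binary and the "count, then raw bytes" layout are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical layout; the question does not show the real definition of A.
struct A {
    int val;
};
static_assert(std::is_trivially_copyable<A>::value,
              "raw write()/read() is only safe for trivially copyable types");

// Write the element count, then the raw bytes of all elements.
bool save_binary(const std::string& path, const std::vector<A>& items) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::size_t count = items.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(items.data()),
              static_cast<std::streamsize>(count * sizeof(A)));
    return static_cast<bool>(out);
}

// Read the count, size the vector once, then read straight into its storage.
bool load_binary(const std::string& path, std::vector<A>& items) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::size_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    items.resize(count);
    in.read(reinterpret_cast<char*>(items.data()),
            static_cast<std::streamsize>(count * sizeof(A)));
    return static_cast<bool>(in);
}
```

A file written this way is only readable on a machine with the same int size, struct padding and byte order, and the count should not be trusted if the file comes from an unknown source.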

C I/O will often be faster than C++ stream I/O. Try parsing chunks of data with fscanf() (see: http://www.cplusplus.com/reference/cstdio/fscanf/); you may well find the C approach runs faster, but measure both on your own data.
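A rough sketch of the fscanf() route, assuming the file holds whitespace-separated decimal integers (the real file format is not shown in the question). The helper name read_ints is hypothetical.

```cpp
#include <cstdio>
#include <vector>

// Reads whitespace-separated integers until fscanf stops matching,
// which happens at end of file or at the first token that is not an int.
std::vector<int> read_ints(std::FILE* fp) {
    std::vector<int> values;
    int v = 0;
    while (std::fscanf(fp, "%d", &v) == 1) {
        values.push_back(v);
    }
    return values;
}
```

The caller opens the file with std::fopen() and closes it with std::fclose(). Whether this beats the >> operator depends on the standard library and the data, so it is worth timing both.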

Related

Which is better, to define the variable inside the loop or outside, with huge loop times [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I need to use an array in a loop, and the loop runs a very large number of times.
Case 1: define the array outside the for-loop and pass it to fun2
void fun1(){
int temp[16];
for(int i = 0;i <times; i++)
{
fun2(temp);
}
}
void fun2(int (&temp)[16]){
/** do something with temp*/
}
Case 2: define the array in fun2:
void fun1() {
for (int i = 0; i < times; i++)
{
fun2();
}
}
void fun2() {
int temp[16];
/** do something with temp */
}
fun1 will be called very often. In this situation, which is better?
Does Case 2 have some influence on performance?
If you look for an answer to the general case, the answer is, "it depends." If you want an answer to your specific example, the answer is that the second version will be more efficient.
Ask yourself:
Is there a cost to construction / destruction / reuse?
In your example, there is none (except adjusting the stack pointer, which is extremely cheap). But if it was an array of objects, or if you had to initialize the array to a specific value, that changes.
How does the cost of parameterization factor in?
This is very minor but in your first case, you pass a pointer to the function. If the function is not inlined, this means that the array can only be accessed through that pointer. This takes up one register which could be used otherwise. In the second example, the array can be accessed through the stack pointer, which is basically free.
It also affects alias and escape analysis negatively, which can lead to less efficient code. Basically, the compiler has to write values to memory instead of keeping them in registers if it cannot prove that a subsequent memory read does not refer to the same location.
Which version is more robust?
The second version ensures that the array is always properly sized. On the other hand, if you pass an object whose constructor may throw an exception, constructing outside the function may allow you to throw the exception at a more convenient location. This could be significant for exception safety guarantees.
Is there a benefit in deallocating early?
Yes, allocation and deallocation are costly, but early destruction may allow some reuse. I've had cases where deallocating objects early allowed reuse of the memory in other parts of the code which improved use of the CPU cache.
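To make the construction-cost point concrete, here is a small sketch with a hypothetical element type, Tracked, that counts its own constructions. It is not the asker's code; it only shows how the two cases diverge once the array elements are no longer plain ints.

```cpp
#include <cstddef>

// Hypothetical element type that counts how often it is constructed.
struct Tracked {
    static std::size_t constructions;
    int value;
    Tracked() : value(0) { ++constructions; }
};
std::size_t Tracked::constructions = 0;

// Case 1: the array is built once by the caller and reused on every call.
void work_on(Tracked (&temp)[16]) {
    temp[0].value += 1;
}
void case1(int times) {
    Tracked temp[16];
    for (int i = 0; i < times; ++i) work_on(temp);
}

// Case 2: the array is built again inside every call.
void work_local() {
    Tracked temp[16];
    temp[0].value += 1;
}
void case2(int times) {
    for (int i = 0; i < times; ++i) work_local();
}
```

With times set to 100, case1 runs 16 constructors and case2 runs 1600. Note that Case 1 also carries state from one call to the next, so the two versions are only interchangeable if the function overwrites the array before reading it.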
It depends on what you want to achieve. Here I'm assuming you are looking for performance, in which case Case 2 would be the better option: the function creates the array locally instead of first having to locate an array defined elsewhere and then read its values.

C++ Allocate memory for unknown no of objects read from file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
How can I create an unknown number of objects at run time in C++? I am reading data from a text file and don't want to waste any memory, i.e. no extra objects.
Player* g_data()
{
system("cls");
char name[40];int level;
fstream file;
file.open("data.txt",ios::app|ios::in|ios::out);
Player data[40],*ptr[100];
int i=0;
while(!file.eof()&&i<100)
{
file >>name>>level;
strcpy(data[i].name,name);
data[i].level=level;
data[i].id=i;
ptr[i]=&data[i];
cout<<"Address-"<<ptr[i]<<"data"<<ptr[i]->name<<"id"<<ptr[i]->id<<endl;
i++;
}
system("pause");
return ptr[i-1];
}
The thing is, I need access to the memory location after I return the object, and I don't want that memory to fade away. How can I allocate memory and access it throughout the program without wasting any?
If your text file is fixed-width, you can determine how many objects there are by dividing the file size by the size per object.
Modern systems rarely use fixed-width files. If yours is not fixed-width, your only alternative is to allocate memory as you actually read the file: read a line, allocate memory for the object represented by that line, and continue until the file is entirely read. No memory is wasted.
If you are trying to pre-allocate an array, don't. Instead, dynamically allocate memory as you read in lines.
If you must use an array (e.g. homework constraint), you can read the file twice. The first pass counts the number of objects that you need room for; you then allocate an array of the appropriate size and read the file again to populate it. This is wasteful, as it doubles the I/O requirement of your algorithm. Alternatively, you can allocate an array with room for one element (or for the minimum number of elements that you expect), and then reallocate the array for each additional element that you read in. This is rather inefficient, since each new object generally requires a new memory allocation and a copy of the old array data into the new memory.
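If the standard library is allowed, "allocate as you read" could look roughly like the sketch below. It swaps the question's fixed arrays for a std::vector and its char array for a std::string, so the Player layout and the function names here are assumptions, not the asker's types. Returning the vector by value also avoids the question's return of a pointer into a local array, which dangles once g_data() returns.

```cpp
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Hypothetical record type; the question's Player uses a char array for name.
struct Player {
    std::string name;
    int level;
    int id;
};

// Reads "name level" pairs until extraction fails (end of input or bad data).
// Testing the extraction itself avoids the extra pass of while(!file.eof()).
std::vector<Player> load_players(std::istream& in) {
    std::vector<Player> players;
    std::string name;
    int level = 0;
    while (in >> name >> level) {
        players.push_back(Player{name, level, static_cast<int>(players.size())});
    }
    return players;
}

// Convenience wrapper that opens the file; returns an empty vector on failure.
std::vector<Player> load_players_from_file(const std::string& path) {
    std::ifstream file(path);
    if (!file) return {};
    return load_players(file);
}
```

A vector usually keeps some spare capacity after growing. Calling shrink_to_fit() once loading is done asks it to release that slack, although the request is non-binding.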

Normal array vs Array of pointers [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Is there any advantage to using normal arrays over arrays of pointers (and vice versa)?
When should I use array of pointers and when should I avoid it?
In college, everyone seems to be crazy about pointers. I know std::vector is an easy way out, but it is prohibited to use in college. We're asked to solve stuff without STL for now.
I got an answer from SO (link), but the answer went way over my head.
For example: Is using int array[] better or is int* parray[] better?
int array[] is an array of int: it holds a collection of integers. When you define int array[] in C++, you must give it a fixed size, known at compile time, before you use it:
int array[5];
The size goes inside the square brackets []; without it (or an initializer list) the definition won't compile. The disadvantage of a normal array is that you have to know its size up front. What if your estimate differs from what is actually needed? If the estimate is much larger than the real value, you waste a lot of memory.
int *parray[] is an array of pointers to int, and it needs a size in the same way. If what you want is an array whose size is not known until run time, use a plain pointer and allocate the array dynamically:
int *p;
int size;
cout << "How big is the size ?";
cin >> size;
p = new int[size];
That way, you don't need to know the value of size before run time, thus you won't waste memory.
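As a sketch of both ideas without the STL: the first function below allocates a run-time-sized array with new[], and the second walks an array of pointers. The function names are hypothetical, and the caller is assumed to release the memory with delete[].

```cpp
#include <cstddef>

// Allocates an array whose length is only known at run time and fills it
// with 0, 1, 2, ... The caller owns the result and must delete[] it.
int* make_sequence(std::size_t size) {
    int* p = new int[size];
    for (std::size_t i = 0; i < size; ++i) {
        p[i] = static_cast<int>(i);
    }
    return p;
}

// An array of pointers: each slot refers to an int stored somewhere else.
// Here the pointers are followed to sum the values they refer to.
int sum_through_pointers(int* const* slots, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += *slots[i];
    }
    return total;
}
```

Every new[] needs a matching delete[] once the array is no longer used, otherwise the memory leaks. An array of pointers mostly pays off when the elements live elsewhere or are expensive to copy. For plain ints it only adds an indirection.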

Squish A Bunch Of Arrays Together C++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Is there a way to take two pieces of heap allocated memory, and put them together efficiently? Would this be more efficient than the following?
for( unsigned int i = 0; i < LENGTH_0; ++i )
AddToArray( buffer, array0[ i ] );
for( unsigned int i = 0; i < LENGTH_1; ++i )
AddToArray( buffer, array1[ i ] );
For copying memory byte by byte, you can't go wrong with memcpy. That's going to be the fastest way to move memory.
Note that there are several caveats, however. For one, you have to manually ensure that your destination memory is big enough. You have to manually compute sizes of objects (with the sizeof operator). It is only safe for trivially copyable types, so it won't work with objects such as shared_ptr. And it's pretty gross looking in the middle of some otherwise elegant C++11.
Your way works too and should be nearly as fast.
You should strongly consider C++'s copy algorithm (or one of its siblings), and use vectors to resize on the fly. You get to use iterators, which are much nicer. Again, it should be nearly as fast as memcpy, with the added benefit that it is far, far safer than moving bytes around: shared_ptr and its ilk will work as expected.
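A minimal sketch of the std::copy route, assuming the two inputs are raw arrays of a default-constructible element type. The template name concat and its parameters are placeholders for the question's array0/array1 and LENGTH_0/LENGTH_1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Concatenates two raw arrays into one vector with std::copy.
// The vector is sized once up front, so no reallocation happens while copying.
template <typename T>
std::vector<T> concat(const T* first, std::size_t first_len,
                      const T* second, std::size_t second_len) {
    std::vector<T> out(first_len + second_len);
    // std::copy returns the position just past the last element written.
    auto next = std::copy(first, first + first_len, out.begin());
    std::copy(second, second + second_len, next);
    return out;
}
```

For trivially copyable element types, standard library implementations commonly lower std::copy to a memmove-style bulk copy, so the safety does not have to cost speed.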
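I'd do something like this until proven too slow:
// decltype(*array0) is a reference type; std::decay (from <type_traits>) strips it to the element type
vector<std::decay<decltype(*array0)>::type> vec;
vec.reserve(LENGTH_0 + LENGTH_1);
vec.insert(vec.end(),array0,array0 + LENGTH_0);
vec.insert(vec.end(),array1,array1 + LENGTH_1);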
Depending on the data stored in array0 and array1, that might be as fast as or even faster than calling a function for every single element.

Is having a loop in C++ constructor a good Idea? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I was writing code for my homework, and as I finished one of my classes I ran into a question: is having a loop to assign values to an array a good idea?
This is my class. I was thinking of either putting the loop in the constructor or creating a function that assigns the values later, when called manually.
Are these choices different? If yes, which choice is better and why?
class Mule
{
private:
int numofMules;
int MapSize;
MuleNode* field;
MuleNode* mules;
public:
void random_positions();
void motion();
void print_field();
Mule(int nofMules, int mSize)
{
numofMules = nofMules;
MapSize = mSize;
mules = new MuleNode[numofMules];
field = new MuleNode[MapSize*MapSize];
for(int i = 0; i < numofMules; i++)
{
mules[i].ID = i+1;
}
random_positions();
}
};
I edited the code because of a problem with allocating the one-dimensional array at compile time, and re-created the 2-dimensional array as a 1-dimensional one using these formulas:
+---------------+-----------+-----------+
| i = y * w + x | x = i % w | y = i / w | w - width of the 2 dimensional array
+---------------+-----------+-----------+
+---------------+-----------+-----------+
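The flattening in the table can be sketched as three small helpers, taking x as the column, y as the row and w as the grid width. The function names are made up for illustration.

```cpp
// Flattened 2D indexing for a grid that is w cells wide.
// (x, y) -> i: rows are stored one after another, w cells per row.
int to_index(int x, int y, int w) { return y * w + x; }

// i -> x: the position inside the row.
int to_x(int i, int w) { return i % w; }

// i -> y: how many full rows come before index i.
int to_y(int i, int w) { return i / w; }
```

For a grid of width 5, the cell at x = 2, y = 3 lands at index 3 * 5 + 2 = 17, and 17 % 5 and 17 / 5 give back 2 and 3.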
Conclusion: As the question was marked as opinion-based, I guess it means that there is no big difference between using a loop in the constructor and creating a function that assigns the values later.
If there are any facts or opinions about this question worth sharing, please comment or write your answer.
There's not necessarily anything terrible about having a loop in a ctor.
At the same time, it's worth considering whether those items you're initializing couldn't/shouldn't be objects that know how to initialize themselves instead of creating uninitialized instances, then writing values into them.
As you've written it, the code doesn't really seem to make much sense though. The class name is Mule, but based on the ctor, it's really more like a collection of Mules. A Mule should be exactly that: one mule. A collection of N mules should be something like a std::vector<Mule>. A Mule that's really a collection of Mules is a poor idea.
You should also at least consider using std::vector instead of an array (assuming that you end up with a collection of items in the class at all, of course).
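One way to read that advice, as a sketch only: give MuleNode a constructor so that it is never left uninitialized, and hold the nodes in a std::vector. The question does not show MuleNode, so its definition here is an assumption, and MuleHerd is a hypothetical name for the collection class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical MuleNode that initializes itself; the question only shows
// that the real type has an ID member.
struct MuleNode {
    int ID;
    explicit MuleNode(int id) : ID(id) {}
};

// Hypothetical collection class standing in for the question's Mule.
class MuleHerd {
public:
    explicit MuleHerd(int count) {
        if (count > 0) {
            mules_.reserve(static_cast<std::size_t>(count));
        }
        for (int i = 0; i < count; ++i) {
            mules_.emplace_back(i + 1);  // IDs start at 1, as in the question
        }
    }
    std::size_t size() const { return mules_.size(); }
    const MuleNode& at(std::size_t i) const { return mules_.at(i); }

private:
    std::vector<MuleNode> mules_;  // frees its storage automatically
};
```

The constructor still contains a loop, but every node is fully formed the moment it exists, and no manual delete[] is needed.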
In general it is not a good idea, but some constructors require a loop (for example, to initialize a heap-allocated array that the constructor itself allocates). Also, not all constructors are called often (a singleton's, for example, is called only once per process).
In the end, it depends on the class and program/object design.
Your particular class appears like it will be created only once per process. So my take is that it is OK. If that is not the case, then we have to evaluate it on a case-by-case basis.