What is the best way to populate an array "on the fly" while reading file lines?
I found a similar solution here.
But there the programmer first loops through the file to count the lines, and then saves the content into the structure in a second loop.
I'm looking for a way to do it in one loop.
Alternatively, you can use malloc / realloc to grow an array of unknown size on the fly as you read.
Use a hand-rolled singly-linked list or std::list when memory and access time are not critical.
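A minimal sketch of the realloc approach, reading a file of unknown length in a single loop. The file name is hypothetical, and error checks on malloc/realloc are omitted for brevity:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
        std::FILE* f = std::fopen("input.txt", "r");  // hypothetical file
        if (!f) return 1;

        char** lines = nullptr;                // growable array of C strings
        std::size_t count = 0, capacity = 0;
        char buf[1024];

        while (std::fgets(buf, sizeof buf, f)) {
            if (count == capacity) {           // out of room: double capacity
                capacity = capacity ? capacity * 2 : 16;
                lines = static_cast<char**>(
                    std::realloc(lines, capacity * sizeof *lines));
            }
            buf[std::strcspn(buf, "\n")] = '\0';     // strip the newline
            std::size_t len = std::strlen(buf) + 1;
            lines[count] = static_cast<char*>(std::malloc(len));
            std::memcpy(lines[count], buf, len);     // keep a copy of the line
            ++count;
        }
        std::fclose(f);

        for (std::size_t i = 0; i < count; ++i) {    // use, then release
            std::puts(lines[i]);
            std::free(lines[i]);
        }
        std::free(lines);
    }

In modern C++, std::vector<std::string> plus std::getline gives the same single-loop growth without any manual memory management.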
I'm just wondering if say...
ifstream fin(xxx)
then
char c;
fin.get(c)
is any better than putting the entire text into an array and iterating through the array instead of getting characters from the loaded file.
I guess there's the extra step to put the input file into an array.
If the file is 237 GB, then iterating over it is more feasible than copying it to a memory array.
If you iterate, you still want to do the actual disk I/O in page-sized chunks (not go to the device for every byte). But streams usually provide that kind of buffering.
So what you want is a mix of both.
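For illustration, a minimal sketch of that mix: the disk I/O happens in page-sized blocks, while the per-character work iterates over the block already in memory. The file name and chunk size are hypothetical:

    #include <fstream>
    #include <vector>

    int main() {
        std::ifstream fin("huge.txt", std::ios::binary);  // hypothetical file
        std::vector<char> buf(4096);                      // one page per read

        while (fin) {
            fin.read(buf.data(), buf.size());
            std::streamsize got = fin.gcount();           // bytes actually read
            for (std::streamsize i = 0; i < got; ++i) {
                char c = buf[i];
                (void)c;  // process each character here
            }
        }
    }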
I have a CSV file which contains 250 lines, and each line contains 12 items separated by commas. I am going to store this in a 2D array with dimensions [250][12].
My question is: "Is it a bad programming practice to use such a huge array?
I am going to pass this array to a method which takes a 2D array as its argument; it comes with openCV.
Will there be a memory overflow?"
Well, if you don't have to use it, it would be better. For example, read the file line by line and feed each line to the CSV parser (see the sketch below). That way each line is dealt with as it arrives, and you rely on the parser's (hopefully professional and optimized) memory management.
However, if it works, it works. If you don't need this in a production environment, I don't see why you should have to change it, other than as a matter of good practice.
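A minimal sketch of the line-by-line idea, with a simple comma splitter standing in for a real CSV parser (quoting and escaping are not handled, and the file name is hypothetical):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream fin("data.csv");             // hypothetical CSV file
        std::string line;
        while (std::getline(fin, line)) {          // one line in memory at a time
            std::vector<std::string> fields;
            std::istringstream ss(line);
            std::string field;
            while (std::getline(ss, field, ','))   // split the 12 items
                fields.push_back(field);
            // ... hand `fields` to the consumer here ...
        }
    }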
First, you have to be clear about how you'll break a line of text into the 12 fields typed as openCV expects. You may want to make that the centerpiece of the design.
No problem using a static array if the size 250x12 will never change and memory consumption is suitable for the hardware your program is supposed to run on. You face a trade-off between memory usage and complexity of code: if memory is a concern or if you have flexibility in mind then you should process line by line or even token by token, provided openCV implements those modes.
If you know the size of the array is going to be limited to 250*12 then that is not a huge array, assuming you are using a reasonable machine. Even with long double elements, 3,000 entries come to only roughly 36-48 KB depending on the platform's long double size, not megabytes. If, however, your underlying elements are objects with sub-elements, then you may want to re-think your approach, e.g. processing the array row-by-row or element-by-element instead of reading it into memory all at once.
As for passing the array to the function: you will not pass the array by value, you will pass a pointer to it, so there should not be a big overhead.
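A minimal sketch of that point: the array argument decays to a pointer to its rows, so only an address crosses the call boundary. The function sumAll below is a hypothetical stand-in for the openCV method:

    #include <cstddef>

    const std::size_t ROWS = 250, COLS = 12;

    // Receives a pointer to rows of 12 doubles -- the 2D array is not copied.
    double sumAll(const double (*data)[COLS], std::size_t rows) {
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < COLS; ++c)
                sum += data[r][c];
        return sum;
    }

    int main() {
        static double table[ROWS][COLS] = {};  // ~24 KB of doubles, zeroed
        double total = sumAll(table, ROWS);    // passes an address, not 24 KB
        (void)total;
    }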
I am a newbie to C++.
I want to write a program to read values from a file which has data in the format:
text<tab or space>text
text<tab or space>text
...
(... indicates more such lines)
The number of lines in the file varies. Now, I want to read this file and store the text in either one 2D string array or two 1D string arrays.
How do I do it?
Furthermore, I want to run a for loop over this array to process each entry in the file. How can I write that loop?
You're looking for a resizable array. Try std::vector<std::string>. You can find documentation here.
Edit: You could probably also manage to do this by opening the file, looping through it to count the lines, creating your fixed-size array, closing and reopening the file, and then looping through it again to populate the array. However, this is not recommended: it will increase your runtime far more than the slight overhead of managing a vector, and it will make your code much more confusing for anyone who reads it.
(ps - I agree with @matthias-vallentin, you should've been able to find this on the site with minimal work)
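A minimal sketch of the vector approach for the two-column format above, including the processing loop the question asks for. The file name "data.txt" is hypothetical:

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream fin("data.txt");
        std::vector<std::string> left, right;  // two 1D arrays, one per column

        std::string line;
        while (std::getline(fin, line)) {      // single pass: grow as we read
            std::istringstream ss(line);
            std::string a, b;
            if (ss >> a >> b) {                // >> splits on tabs and spaces
                left.push_back(a);
                right.push_back(b);
            }
        }

        // The for loop over the stored entries:
        for (std::size_t i = 0; i < left.size(); ++i)
            std::cout << left[i] << " " << right[i] << '\n';
    }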
I need to erase the file contents from a selected point (C++ fstream). Which function should I use?
I have written objects to the file, and I need to delete some of these objects from the middle of the file.
C++ has no standard mechanism to truncate a file at a given point. You either have to recreate the file (open with ios::trunc and write the contents you want to keep) or use OS-specific API calls (SetEndOfFile on Windows, truncate or ftruncate on Unix).
EDIT: Deleting stuff in the middle of a file is an exceedingly precarious business. Long before considering any other alternatives, I would try to use a server-less database engine like SQLite to store serialised objects. Better still, I would use SQLite as intended by storing the data needed by those objects in a proper schema.
EDIT 2: If the problem statement requires raw file access...
As a general rule, you don't delete data from the middle of a file. If the objects can be serialised to a fixed size on disk, you can work with them as records, and rather than trying to delete data, you use a table that indexes records within the file. E.g., if you write four records in sequence, the table will hold [0, 1, 2, 3]. In order to delete the second record, you simply remove its entry from the table: [0, 2, 3]. There are at least two ways to reuse the holes left behind by the table:
On each insertion, scan for the first unused index and write the object out at the corresponding record location. This will become more expensive, though, as the file grows.
Maintain a free list. Store, as a separate variable, the index of the most recently freed record. In the space occupied by that record, encode the index of the record freed before it, and so on. This maintains a handy linked list of free records while requiring space for only one additional number. It is more complicated to work with, however, and requires an extra disk I/O when deleting and inserting.
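A minimal sketch of the free-list idea, assuming fixed-size records behind a 4-byte header that stores the free-list head. The record layout is hypothetical, the fstream must be opened in binary read/write mode, and error handling is omitted:

    #include <cstdint>
    #include <fstream>

    struct Record {                   // fixed-size on-disk record
        char payload[60];
        std::uint32_t pad;
    };

    const std::uint32_t NIL = 0xFFFFFFFF;             // empty free list
    const std::streamoff HEADER = sizeof(std::uint32_t);

    std::streamoff offsetOf(std::uint32_t index) {
        return HEADER + std::streamoff(index) * sizeof(Record);
    }

    // Delete record `index`: link it into the free list; nothing is shifted.
    void freeRecord(std::fstream& f, std::uint32_t index) {
        std::uint32_t head;
        f.seekg(0);
        f.read(reinterpret_cast<char*>(&head), sizeof head);     // old head
        f.seekp(offsetOf(index));
        f.write(reinterpret_cast<char*>(&head), sizeof head);    // record -> old head
        f.seekp(0);
        f.write(reinterpret_cast<char*>(&index), sizeof index);  // head -> record
    }

    // Allocate a slot: pop the free list, or append if it is empty.
    std::uint32_t allocRecord(std::fstream& f, std::uint32_t recordCount) {
        std::uint32_t head;
        f.seekg(0);
        f.read(reinterpret_cast<char*>(&head), sizeof head);
        if (head == NIL) return recordCount;                     // append at end

        std::uint32_t next;
        f.seekg(offsetOf(head));
        f.read(reinterpret_cast<char*>(&next), sizeof next);     // follow the list
        f.seekp(0);
        f.write(reinterpret_cast<char*>(&next), sizeof next);    // new head
        return head;                                             // reuse this slot
    }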
If the objects can't be serialised to a fixed-length, then this approach becomes much, much harder. Variable-length record management code is very complex.
Finally, if the problem statement requires keeping records in order on disk, then it's a stupid problem statement, because insertion/removal in the middle of a file is ridiculously expensive; no sane design would require this.
The general method is to open the file for read access, open a new file for write access, read the content of the first file and write the data you want retained to the second file. When complete, you delete the first file and rename the second to that of the first.
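A minimal sketch of that method, dropping a hypothetical byte range from a hypothetical objects.dat:

    #include <cstdio>
    #include <fstream>

    int main() {
        std::ifstream in("objects.dat", std::ios::binary);
        std::ofstream out("objects.tmp", std::ios::binary);

        const std::streamoff dropBegin = 1024, dropEnd = 2048;  // bytes to erase

        char c;
        std::streamoff pos = 0;
        while (in.get(c)) {
            if (pos < dropBegin || pos >= dropEnd)  // keep everything else
                out.put(c);
            ++pos;
        }
        in.close();
        out.close();

        std::remove("objects.dat");                 // delete the original,
        std::rename("objects.tmp", "objects.dat");  // then rename the copy
    }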
How can I store a hash table with separate chaining in a file on disk?
Generating the data stored in the hash table at runtime is expensive; it would be faster to just load the HT from disk... if only I could figure out how to do it.
Edit:
The lookups are done with the HT loaded in memory. I need to find a way to store the hash table (in memory) to a file in some binary format, so that the next time the program runs it can just load the HT from disk into RAM.
I am using C++.
What language are you using? The common method is to do some sort of binary serialization.
Ok, I see you have edited to add the language. For C++ there are a few options. I believe the Boost serialization mechanism is pretty good. In addition, the page for Boost's serialization library also describes alternatives. Here is the link:
http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/index.html
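A minimal sketch of the Boost.Serialization pattern, using std::map as a stand-in for the hash table (recent Boost releases also provide a boost/serialization/unordered_map.hpp header for the unordered containers):

    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/map.hpp>
    #include <boost/serialization/string.hpp>
    #include <fstream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> table;
        table["alpha"] = 1;
        table["beta"]  = 2;

        {   // save: the archive serializes the whole container
            std::ofstream ofs("table.bin", std::ios::binary);
            boost::archive::binary_oarchive oa(ofs);
            oa << table;
        }

        std::map<std::string, int> loaded;
        {   // load: read it back the same way
            std::ifstream ifs("table.bin", std::ios::binary);
            boost::archive::binary_iarchive ia(ifs);
            ia >> loaded;
        }
    }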
Assuming C/C++: use array indexes and fixed-size structs instead of pointers and variable-length allocations. You should be able to directly write() the data structures to a file for later read()ing.
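A minimal sketch of that idea, with hypothetical sizes and field names. Because every field is plain data and the chain links are indices into the entry array, the whole table round-trips through write()/read() unchanged:

    #include <cstdint>
    #include <fstream>

    const std::uint32_t NIL = 0xFFFFFFFF;

    struct Entry {             // POD: safe to write and read byte-for-byte
        char key[32];
        std::uint32_t value;
        std::uint32_t next;    // index of the next entry in the chain, or NIL
    };

    struct Table {
        std::uint32_t buckets[1024];  // first entry index per bucket, or NIL
        Entry entries[4096];
        std::uint32_t used;
    };

    void save(const Table& t, const char* path) {
        std::ofstream out(path, std::ios::binary);
        out.write(reinterpret_cast<const char*>(&t), sizeof t);
    }

    bool load(Table& t, const char* path) {
        std::ifstream in(path, std::ios::binary);
        return static_cast<bool>(in.read(reinterpret_cast<char*>(&t), sizeof t));
    }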
For anything higher-level: a lot of higher-level language APIs have serialization facilities. Java and Qt/C++ both have methods that spring immediately to mind, so I know others do as well.
You could just write the entire data structure directly to disk by using serialization (e.g. in Java). However, you might be forced to read the entire object back into memory in order to access its elements. If this is not practical, then you could consider using a random access file to store the elements of the hash table. Instead of using a pointer to represent the next element in the chain, you would just use the byte position in the file.
Ditch the pointers for indices.
This is a bit similar to constructing an on-disk DAWG, which I did a while back. What made that so very sweet was that it could be loaded directly with mmap instead of reading the file. If the hash-space is manageable, say 2^16 or 2^24 entries, then I think I would do something like this:
Keep a list of free indices. (if the table is empty, each chain-index would point at the next index.)
When chaining is needed use the free space in the table.
If you need to put something in an index that's occupied by a squatter (overflow from elsewhere):
record the index (call it N)
swap the new element and the squatter
put the squatter in a new free index (F)
follow the chain from the squatter's hash index and replace N with F
If you completely run out of free indices, you probably need a bigger table, but you can cope a little longer by using mremap to create extra room after the table.
This should allow you to mmap and use the table directly, without modification (scary fast if it's in the OS cache!), but you have to work with indices instead of pointers. It's pretty spooky to have megabytes available within a syscall round-trip, yet have it occupy less physical memory than that, because of paging.
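A minimal sketch of mapping such an index-based table straight into memory with POSIX mmap. The file layout and name are hypothetical, and error checks are omitted:

    #include <cstddef>
    #include <cstdint>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct Entry {
        std::uint64_t key;
        std::uint32_t value;
        std::uint32_t next;   // chain link as an index, not a pointer
    };

    int main() {
        int fd = open("table.bin", O_RDONLY);  // hypothetical table file
        struct stat st;
        fstat(fd, &st);

        // No parsing pass: the file's bytes are used as the table directly.
        Entry* table = static_cast<Entry*>(mmap(
            nullptr, static_cast<std::size_t>(st.st_size),
            PROT_READ, MAP_PRIVATE, fd, 0));

        std::size_t n = st.st_size / sizeof(Entry);
        // ... look up via table[hash % n], following .next indices ...

        munmap(table, static_cast<std::size_t>(st.st_size));
        close(fd);
    }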
Perhaps DBM could be of use to you.
If your hash table implementation is any good, then just store the hash and each object's data: re-inserting an object shouldn't be expensive once you have its hash, and not serialising the table or chains directly lets you vary the exact implementation between save and load.