I am a newbie to C++.
I want to write a program to read values from file which has data in format:
text<tab or space>text
text<tab or space>text
...
(... indicates more such lines)
The number of lines in the file varies. Now, I want to read this file and store the text in either one 2D string array or two 1D string arrays.
How do I do it?
Furthermore, I want to run a for loop over this array to process each entry in the file. How can I write that loop?
You're looking for a resizable array. Try std::vector<std::string>. You can find documentation here.
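For instance, here is a minimal sketch of reading the two whitespace-separated columns into two vectors and then looping over them (the file name is a placeholder):

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::ifstream fin("input.txt");       // hypothetical file name
    std::vector<std::string> first, second;
    std::string a, b;
    while (fin >> a >> b) {               // operator>> skips tabs and spaces alike
        first.push_back(a);
        second.push_back(b);
    }
    for (std::size_t i = 0; i < first.size(); ++i)
        std::cout << first[i] << " " << second[i] << "\n";   // process each entry here
}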
Edit: You could also manage this by opening the file, looping through it to count the lines, allocating your fixed-size array, closing and reopening the file, and then looping through it again to populate the array. However, this is not recommended: it increases your runtime far more than the slight overhead of managing a vector, and it makes your code much more confusing for anyone who reads it.
(PS: I agree with @matthias-vallentin; you should've been able to find this on the site with minimal work.)
I'm just wondering if, say,
ifstream fin(xxx);
followed by
char c;
fin.get(c);
is any better than putting the entire text into an array and iterating through the array instead of getting characters from the loaded file.
I guess there's the extra step of putting the input file into an array.
If the file is 237 GB, then iterating over it is more feasible than copying it into an in-memory array.
If you iterate, you still want to do the actual disk I/O in page-sized chunks (rather than going to the device for every byte). But streams usually provide that kind of buffering.
So what you want is a mix of both.
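As a rough sketch of what that mix looks like: read in fixed-size chunks through the stream and iterate over each chunk in memory (the file name and buffer size here are arbitrary choices):

#include <fstream>
#include <vector>

int main() {
    std::ifstream fin("big.dat", std::ios::binary);   // hypothetical file name
    std::vector<char> chunk(64 * 1024);               // 64 KB buffer, a page multiple
    while (fin.read(chunk.data(), chunk.size()) || fin.gcount() > 0) {
        std::streamsize got = fin.gcount();           // bytes actually read this round
        for (std::streamsize i = 0; i < got; ++i) {
            char c = chunk[i];
            // process c here
            (void)c;
        }
    }
}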
What is the best way to populate an array on the fly while reading a file's lines?
I found a similar solution here, but there the programmer first loops through the file to count the number of lines, and then saves the content into the structure in a second loop. I'm looking for a way to do it in one loop.
Alternatively, you can use malloc / realloc to grow an array of unknown size on the fly.
Use a custom singly linked list or std::list when memory and access time are not critical.
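That said, the usual single-loop approach is std::vector, which grows as you read; a minimal sketch, with a made-up record type and file name:

#include <fstream>
#include <string>
#include <vector>

struct Record { std::string key, value; };    // hypothetical line structure

int main() {
    std::vector<Record> records;
    std::ifstream fin("data.txt");            // hypothetical file name
    std::string k, v;
    while (fin >> k >> v)
        records.push_back({k, v});            // grows as needed; no counting pass
}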
I have a CSV file which contains 250 lines, and each line contains 12 items separated by commas. I am going to store this in a 2D array whose dimensions are [250][12].
My question is: is it bad programming practice to use such a huge array?
I am going to pass this array to a method that takes a 2D array as its argument; it comes with OpenCV.
Will there be a memory overflow?
Well, if you don't have to use it, it would be better not to. For example, read the file line by line and feed each line into the CSV parser. That way each line is dealt with individually, and you rely on the parser's (hopefully professional and optimized) memory management.
However, if it works, it works. If you don't need this in a production environment, I don't see why you should have to change it, other than for the sake of good practice.
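For example, a minimal line-by-line sketch with a naive comma split (no quoted-field handling; the file name is a placeholder):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::ifstream fin("data.csv");            // hypothetical file name
    std::string line;
    while (std::getline(fin, line)) {
        std::istringstream ss(line);
        std::string field;
        std::vector<std::string> fields;
        while (std::getline(ss, field, ','))  // naive split on commas
            fields.push_back(field);
        // hand `fields` (12 items per line) to the next processing stage here
    }
}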
First, you have to be clear about how you'll break a line of text into 12 fields typed as OpenCV expects. You may want that to be the central part of the design.
No problem using a static array if the size 250x12 will never change and the memory consumption is suitable for the hardware your program is supposed to run on. You face a trade-off between memory usage and code complexity: if memory is a concern, or if you have flexibility in mind, then you should process line by line or even token by token, provided OpenCV supports those modes.
If you know the size of the array is going to be limited to 250*12, that is not a huge array on any reasonable machine: 3,000 elements, which even with long double elements is at most about 48 KB. If, however, your underlying elements are objects with sub-elements, then you may want to rethink your approach, e.g., processing the array row by row or element by element instead of reading it into memory all at once.
As for passing the array to the function: you will not pass the array by value, you will pass a pointer to it, so there should not be any significant overhead.
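For illustration (the function name and element type here are made up, not the actual OpenCV interface):

#include <cstdio>

// The array parameter decays to a pointer to its first row, so only a
// single pointer is copied at the call site, not 250*12 elements.
void process(float rows[][12], int row_count) {   // hypothetical signature
    std::printf("first element: %f\n", rows[0][0]);
    (void)row_count;
}

int main() {
    float data[250][12] = {};
    process(data, 250);
}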
Currently I read arrays in C++ with ifstream, read, and reinterpret_cast, looping over the values. Is it possible to load, for example, an unsigned int array from a binary file in one go, without a loop?
Thank you very much
Yes, simply pass the address of the first element of the array (cast to char*, since that is what read() takes) and the size of the array in bytes:
// inFile is a std::ifstream opened in binary mode.
// Allocate, for example, 47 ints.
std::vector<int> numbers(47);
// Read in as many ints as 'numbers' has room for.
inFile.read(reinterpret_cast<char*>(&numbers[0]), numbers.size() * sizeof(numbers[0]));
Note: I almost never use raw arrays. If I need a sequence that looks like an array, I use std::vector. If you must use an array, the syntax is very similar.
The ability to read and write binary images is non-portable. You may not be able to re-read the data on another machine, or even on the same machine with a different compiler. But, you have that problem already, with the solution that you are using now.
I would like to delete parts from a binary file, using C++. The binary file is about 5-10 MB.
What I would like to do:
Search for an ANSI string "something".
Once I have found this string, delete the following n bytes, for example the following 1 MB of data. I would like to delete those characters, not fill them with NULs, thus making the file smaller.
Save the modified data into a new binary file that is the same as the original file, except for the missing n bytes I have deleted.
Can you give me some advice / best practices on how to do this most efficiently? Should I load the file into memory first?
How can I search efficiently for an ANSI string? I mean, possibly I have to skip a few megabytes of data before I find that string. >> I have been told I should ask this in another question, so it's here:
How to look for an ANSI string in a binary file?
How can I delete the n bytes and write the result out to a new file efficiently?
OK, I don't need it to be super efficient; the file will not be bigger than 10 MB, and it's OK if it runs for a few seconds.
There are a number of fast string-search routines that perform much better than testing each and every character. For example, when trying to find "something", in the best case only every 9th character needs to be examined.
Here's an example I wrote for an earlier question: code review: finding </body> tag reverse search on a non-null terminated char str
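If C++17 is available to you, the standard library also ships a Boyer-Moore searcher that you can plug into std::search; a minimal sketch (iterator-based search is unaffected by embedded NUL bytes):

#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Returns the offset of `needle` in `buf`, or buf.size() if not found.
std::size_t find_in_buffer(const std::vector<char>& buf, const std::string& needle)
{
    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    auto it = std::search(buf.begin(), buf.end(), searcher);
    return static_cast<std::size_t>(it - buf.begin());
}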
For a 5-10 MB file I would have a look at writev() if your system supports it. Read the entire file into memory, since it is small enough. Scan for the bytes you want to drop. Pass writev() the list of iovecs (which will just be pointers into your read buffer, plus lengths), and then you can rewrite the entire modified contents in a single system call.
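A minimal sketch of that idea on POSIX, with the offsets taken as given and error handling elided:

#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

// buf holds the whole file; drop n bytes starting at cut, then write
// the two surviving pieces with a single writev() call.
void write_without_gap(char* buf, size_t total, size_t cut, size_t n,
                       const char* out_path)
{
    struct iovec iov[2];
    iov[0].iov_base = buf;                  // bytes before the deleted region
    iov[0].iov_len  = cut;
    iov[1].iov_base = buf + cut + n;        // bytes after the deleted region
    iov[1].iov_len  = total - cut - n;

    int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    writev(fd, iov, 2);                     // one system call for both pieces
    close(fd);
}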
First, if I understand your meaning in your "How can I search efficiently" subsection, you cannot just skip a few megabytes of data in the search if the target string might be in those first few megabytes.
As for loading the file into memory, if you do that, don't forget to make sure you have enough space in memory for the entire file. You will be frustrated if you go to use your utility and find that the 2GB file you want to use it on can't fit in the 1.5GB of memory you have left.
I am going to assume you will load into memory or memory map it for the following.
You did specifically say this is a binary file, so you cannot use the normal C-style string searching/matching (strstr and friends), as the null characters in the file's data will confuse them (ending the search prematurely without a match). You might instead be able to use memchr to find the first occurrence of the first byte of your target, and memcmp to compare the next few bytes with the bytes in the target; keep using memchr/memcmp pairs to scan through the entire thing until it is found. This is not the most efficient way, as there are better pattern-matching algorithms, but it is reasonably efficient, I suppose.
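A minimal sketch of that memchr/memcmp scan, for binary-safe searching:

#include <cstring>

// Find `needle` in `hay` using memchr for candidate first bytes and
// memcmp to verify the rest. Embedded NUL bytes are handled fine.
const char* find_bytes(const char* hay, std::size_t hay_len,
                       const char* needle, std::size_t needle_len)
{
    if (needle_len == 0 || hay_len < needle_len)
        return nullptr;
    const char* p = hay;
    const char* last = hay + hay_len - needle_len;    // last possible match start
    while (p <= last) {
        p = static_cast<const char*>(std::memchr(p, needle[0], last - p + 1));
        if (!p)
            return nullptr;                           // first byte never occurs again
        if (std::memcmp(p, needle, needle_len) == 0)
            return p;                                 // full match
        ++p;
    }
    return nullptr;
}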
To "delete" n bytes you have to actually move the data after those n bytes, copying the entire thing up to the new location.
If you actually copy the data from disk to memory, then it is faster to manipulate it there and write the result to the new file. Otherwise, once you find the spot in the first file where you want to start deleting, you can open a new file for writing, read X bytes from the first file (where X is the file-pointer position of that spot), write them straight into the second file, then seek in the first file to X+n and do the same from there to file1's EOF, appending that to what you've already put into file2.
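A minimal sketch of the in-memory variant, with hypothetical file names; match_pos is wherever your search landed and n is the number of bytes to drop:

#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

// Read the whole file, skip n bytes starting at match_pos, write the rest.
void delete_bytes(const char* in_path, const char* out_path,
                  std::size_t match_pos, std::size_t n)
{
    std::ifstream in(in_path, std::ios::binary);
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    std::ofstream out(out_path, std::ios::binary);
    out.write(buf.data(), match_pos);                   // everything before the cut
    out.write(buf.data() + match_pos + n,               // everything after it
              buf.size() - match_pos - n);
}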