Here is my code, where primes.txt contains some primes (7, 11, 13, 17, 23, ...):
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string file = "primes.txt";
    ifstream fichier;
    fichier.open(file);
    int count = 0, prime;
    int *buffer = (int*) malloc(8080*sizeof(int));
    while (fichier >> prime){
        buffer[count] = prime;
        count++;
    }
    fichier.close();
    free(buffer);
    return 0;
}
I would like to know if there's a way to allocate an array from a file's data without using loops. I saw that you can do it with binary files, but I was wondering if we could do it for files with strings or ints too.
Here is a method that doesn't use an explicit loop:
fichier >> buffer[count++]; // Read and convert to internal format
fichier >> buffer[count++];
fichier >> buffer[count++];
// Repeat for each number in the file
fichier >> buffer[count++];
In the runtime library, a loop is used when reading characters to build the number. Also, the input stream may be buffered, which involves another loop.
You can avoid writing the loop if you use iterators or ranges. Here is an equivalent example using ranges:
#include <algorithm> // std::ranges::copy
#include <iterator>  // std::distance
#include <ranges>    // std::ranges::istream_view (C++20)
auto view = std::ranges::istream_view<int>(fichier);
auto copy_result = std::ranges::copy(view, buffer);
int count = std::distance(buffer, copy_result.out);
The loops are still there, inside the standard library. There's no way to avoid that with input streams.
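For completeness, here is a sketch of the pre-C++20 iterator equivalent. It reads into a std::vector rather than the malloc'd buffer so the container handles the sizing (an illustration, not a drop-in replacement for the code above):
#include <fstream>
#include <iterator>
#include <vector>

std::ifstream fichier("primes.txt");
// The extra parentheses around the first argument avoid the
// "most vexing parse" (being read as a function declaration).
std::vector<int> primes((std::istream_iterator<int>(fichier)),
                        std::istream_iterator<int>());
// primes.size() now plays the role of count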
I saw that you can do it with binary files but I was wondering if we could do it for files with string or int too.
The file stores the numbers as text, so the character strings have to be transformed into integers. Memory mapping doesn't help when the data needs to be transformed. It could work if the integers were written in binary format rather than as text; that does however have the caveat that the file would not be portable to other systems that represent integers differently in memory.
Furthermore, there is no standard way to memory map files in C++. You would need to rely on an API provided by the system.
Even then, there would be loops, but those loops would run inside the operating system kernel; avoiding a system call per loop iteration in user code can potentially improve performance, depending on the usage pattern.
If you are referring to the loop used to store the length in count, then no, you'll always need to iterate through the content of the file.
There may be methods to do this without an explicit loop, but internally they will necessarily contain a loop.
On a C++ project, I have been trying to use an array to store data from a text file that I would later use. I have been having problems initializing the array without a size. Here is a basic sample of what I have been doing:
#include <iostream>
#include <string>
#include <fstream>
#include <new> // needed for std::nothrow
using namespace std;
int main()
{
int i = 0;
ifstream usern;
string data;
string otherdata;
string *users = nullptr;
usern.open("users.txt");
while(usern >> otherdata)
i++;
users = new (nothrow) string[i];
for (int n = 0; usern >> data; n++)
{
users[n] = data;
}
usern.close();
return 0;
}
This is a pretty rough example that I threw together. Basically, I try to read the items from a text file called users.txt and store them in an array. I used pointers in the example I included (which probably wasn't the best idea, considering I don't know too much about pointers). When I run this program, regardless of the data in the file, I get no result when I test the values with cout << *(users + 1); it just leaves a blank line in the window. I am guessing my error is in my use of pointers or in how I am assigning values through them. I was wondering if anybody could point me in the right direction on how to get the correct values into an array. Thanks!
Try reopening usern after
while(usern >> otherdata)
i++;
For example, try putting
usern.close();
ifstream usern2;
usern2.open("users.txt");
right after that.
There may be other issues, but this seems like the most likely one to me. Let me know if you find success with this. To me it appears that usern has already reached eof, and then you try to read from it a second time.
One thing that helps me a lot in finding such issues is to just put a cout << "looping"; or something inside the for loop, so you know that you're at least getting into that for loop.
You can also do the same thing with usern.clear(); followed by usern.seekg(0, ios::beg); (the clear() is needed because the failed extraction that ended the loop leaves eofbit and failbit set, and seeking alone does not reset failbit).
What I think has happened in your code is that you have moved the file's read position to the end. This happened when you counted the number of strings to be read in using the code below.
while(usern >> otherdata)
i++;
This brought the file position to the end of the file, which means that before re-reading the file into the array of strings you allocated (of size i), you need to move the position back to the beginning. This can be achieved by adding usern.seekg(0, ios::beg); after your while loop, as shown below.
while(usern >> otherdata)
i++;
// Returns file pointer to beginning of file.
usern.seekg(0, ios::beg);
// The rest of your code.
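One caveat: the counting loop ends with a failed extraction, which sets the stream's eofbit and failbit, and seekg() by itself does not clear failbit, so the seek will be ignored unless the stream state is reset first. A sketch of the full rewind, using the names from the question:
while (usern >> otherdata)
    i++;
users = new (nothrow) string[i];
usern.clear();            // reset eofbit/failbit left by the failed read
usern.seekg(0, ios::beg); // now the rewind actually takes effect
for (int n = 0; usern >> data; n++)
    users[n] = data;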
Warning: I am unsure how safe it is to dynamically allocate STL containers like this; I have previously run into issues with code similar to yours and would recommend staying away from this in production code.
I have code like this:
#include <iostream.h>
#include <fstream.h>
void main()
{
char dir[25], output[10],temp[10];
cout<<"Enter file: ";
cin.getline(dir,25); //like C:\input.txt
ifstream input(dir,ios::in);
input.getline(output,'\eof');
int num = sizeof(output);
ofstream out("D:\\size.txt",ios::out);
out<<num;
}
I want to print the length of the output, but it always returns the number 10 (the declared length) even if the input file has only 2 letters (like just "ab"). I've also used strlen(output), but nothing changed. How do I get only the used length of the array?
I'm using VS C++ 6.0
The sizeof operator applied to an array gives you the size allocated for the array, which is 10 here.
You need to use strlen() to get the length of the string stored inside the array, but you need to make sure the array is null terminated.
With C++, the better alternative is to simply use std::string instead of a character array. Then you can simply use std::string::size() to get the size.
sizeof always yields the defined size of an object based on its type, never anything like the length of a string.
At least by current standards, your code has some pretty serious problems. It looks like it was written for a 1993 compiler running on MS-DOS, or something on that order. With a current compiler, the C++ headers shouldn't have .h on the end, among other things.
#include <iostream>
#include <fstream>
#include <string>
int main() {
std::string dir, output, temp;
std::cout<<"Enter file: ";
std::getline(std::cin, dir); //like C:\input.txt
std::ifstream input(dir.c_str());
std::getline(input, output);
std::ofstream out("D:\\size.txt");
out<<output.size();
}
The getline that you are using is an unformatted input function so you can retrieve the number of characters extracted with input.gcount().
Note that \e is not a standard escape sequence and the character constant \eof almost certainly doesn't do what you think it does. If you don't want to recognise any delimiter you should use read, not getline, passing the size of your buffer so that you don't overflow it.
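A minimal sketch of that read-plus-gcount approach (using the modern headers; the buffer size and file name are just examples):
#include <fstream>
#include <iostream>

int main() {
    char output[10];
    std::ifstream input("C:\\input.txt");
    input.read(output, sizeof output);     // unformatted read, no delimiter
    std::streamsize used = input.gcount(); // characters actually extracted
    std::cout << used << '\n';
}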
I am writing a program in C++ in which I need to save some .txt files to different locations according to a counter variable in the program. What should the code be? Please help.
I know how to save a file using the full path:
ofstream f;
f.open("c:\\user\\Desktop\\data1\\example.txt");
f.close();
I want "c:\user\Desktop\data[CTR]\filedata.txt", where data1, data2, data3, and so on are the directories I have to access, creating a text file in each. So what is the code? The counter variable ctr is already computed in my program.
You could use snprintf to create a custom string. An example is this:
char filepath[100];
snprintf(filepath, 100, "c:\\user\\Desktop\\data%d\\example.txt", datanum);
Then whatever you want to do with it:
ofstream f;
f.open(filepath);
f.close();
Note: snprintf limits the maximum number of characters that can be written to your buffer (filepath). This is very useful when the arguments of *printf are strings (that is, using %s), to avoid buffer overflows. In the case of this example, where the argument is a number (%d), it is already known that it cannot have more than about 10 characters, so the resulting string's length already has an upper bound and just making the filepath buffer big enough is sufficient. That is, in this special case, sprintf could be used instead of snprintf.
You can use the standard string streams, such as:
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
void f ( int data1 )
{
ostringstream path;
path << "c:\\user\\Desktop\\" << data1 << "\\example.txt";
ofstream file(path.str().c_str());
if (!file.is_open()) {
// handle error.
}
// write contents...
}
Is it a legitimate optimisation to simply create a really HUGE source file which initialises a vector with hundreds of thousands of values manually, rather than parsing a text file with the same values into a vector?
Sorry, that could probably be worded better. The function that parses the text file is very slow due to C++'s stream reading being very slow (it takes about 6 minutes, as opposed to about 6 seconds in the C# version).
Would making a massive array initialisation file be a legitimate solution? It doesn't seem elegant, but if it's faster then I suppose it's better?
This is the file reading code:
//parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
// Read the file line by line.
ifstream myFile(pathTable);
for (unsigned int i = 0; i < nodes.size(); i++)
{
pathLookupVectors.push_back(vector<vector<int>>());
for (unsigned int j = 0; j < nodes.size(); j++)
{
string line;
if (getline(myFile, line)) //enter if a line is read successfully
{
stringstream ss(line);
istream_iterator<int> begin(ss), end;
pathLookupVectors[i].push_back(vector<int>(begin, end));
}
}
}
myFile.close();
}
A sample line from the text file (which contains about half a million lines of similar format but varying length):
0 5 3 12 65 87 n
First, make sure you're compiling with the highest optimization level available, then please add the following lines marked below, then test again. I doubt this will fix the problem, but it may help. Hard to say until I see the results.
//parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
// Read the file line by line.
ifstream myFile(pathTable);
pathLookupVectors.reserve(nodes.size()); // HERE
for (unsigned int i = 0; i < nodes.size(); i++)
{
pathLookupVectors.push_back(vector<vector<int> >());
pathLookupVectors[i].reserve(nodes.size()); // HERE
for (unsigned int j = 0; j < nodes.size(); j++)
{
string line;
if (getline(myFile, line)) //enter if a line is read successfully
{
stringstream ss(line);
istream_iterator<int> begin(ss), end;
pathLookupVectors[i].push_back(vector<int>(begin, end));
}
}
}
myFile.close();
}
6 minutes vs 6 seconds!! There must be something wrong with your C++ code. Optimize it using good old methods before you resort to such an extreme "optimization" as the one mentioned in your post.
Also know that reading from a file allows you to change the vector contents without changing the source code. If you do it the way you mention, you'll have to re-code, compile, and link all over again.
It depends on whether the data changes. If the data can or needs to be changed after compile time, then the only option is to load it from a text file. If not, well, I don't see any harm in compiling it in.
I was able to get the following result with Boost.Spirit 2.5:
$ time ./test input
real 0m6.759s
user 0m6.670s
sys 0m0.090s
'input' is a file containing 500,000 lines, each with 10 random integers between 0 and 65535.
Here's the code:
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/classic_file_iterator.hpp>
using namespace std;
namespace spirit = boost::spirit;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
typedef vector<int> ragged_matrix_row_type;
typedef vector<ragged_matrix_row_type> ragged_matrix_type;
template <class Iterator>
struct ragged_matrix_grammar : qi::grammar<Iterator, ragged_matrix_type()> {
ragged_matrix_grammar() : ragged_matrix_grammar::base_type(ragged_matrix_) {
ragged_matrix_ %= ragged_matrix_row_ % qi::eol;
ragged_matrix_row_ %= qi::int_ % ascii::space;
}
qi::rule<Iterator, ragged_matrix_type()> ragged_matrix_;
qi::rule<Iterator, ragged_matrix_row_type()> ragged_matrix_row_;
};
int main(int argc, char** argv){
typedef spirit::classic::file_iterator<> ragged_matrix_file_iterator;
ragged_matrix_type result;
ragged_matrix_grammar<ragged_matrix_file_iterator> my_grammar;
ragged_matrix_file_iterator input_it(argv[1]);
qi::parse(input_it, input_it.make_end(), my_grammar, result);
return 0;
}
At this point, result contains the ragged matrix, which can be confirmed by printing its contents. In my case the 'ragged matrix' isn't so ragged (it's a 500000 x 10 rectangle), but it won't matter because I'm pretty sure the grammar is correct. I got even better results when I read the entire file into memory before parsing (~4 sec), but the code for that is longer, and it's generally undesirable to copy large files into memory in their entirety.
Note: my test machine has an SSD, so I don't know if you'll get the same numbers I did (unless your test machine has an SSD as well).
HTH!
I wouldn't consider compiling static data into your application to be bad practice. If there is little conceivable need to change your data without a recompilation, parsing the file at compile time not only improves runtime performance (since your data have been pre-parsed by the compiler and are in a usable format at runtime), but also reduces risks (like the data file not being found at runtime or any other parse errors).
Make sure that users won't have need to change the data (or have the means to recompile the program), document your motivation and you should be absolutely fine.
That said, you could make the iostream version a lot faster if necessary; most of the cost is per-token stream overhead, which largely goes away if you read the whole file into memory once and parse from there.
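For instance, a sketch of the slurping step (a hypothetical helper, assuming the file fits comfortably in memory):
#include <fstream>
#include <sstream>
#include <string>

std::string slurp(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf(); // one bulk read instead of one small read per token
    return ss.str();
}
The returned string can then be parsed in place (for example with std::strtol) without touching the stream again.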
Using a huge array in a C++ file is a perfectly valid option, depending on the case.
You must consider whether the data will change and how often.
If you put it in a C++ file, that means you will have to recompile your program each time the data change (and distribute it to your customers each time!). So that wouldn't be a good solution if you have to distribute the program to other people.
Now, if a compilation is allowed for every data change, then you can have the best of both worlds: just use a small script (for example, in Python or Perl) that takes your .txt and generates a C++ file, so the file parsing only has to be done once for each data change. You can even integrate this step into your build process with automatic dependency management. A sketch of such a generator is shown below.
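To illustrate the generator idea in C++ itself (the file names here are hypothetical; adapt them to your build):
#include <fstream>

// Reads whitespace-separated ints from data.txt and writes data_gen.cpp
// containing an initialized array. Run it as a pre-build step.
int main()
{
    std::ifstream in("data.txt");
    std::ofstream out("data_gen.cpp");
    out << "extern const int kData[] = {\n";
    int value;
    bool first = true;
    while (in >> value) {
        if (!first)
            out << ",\n";
        out << "    " << value;
        first = false;
    }
    out << "\n};\n";
}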
Good luck!
Don't use the std input stream, it's extremely slow.
There are better alternatives.
Since people decided to downvote my answer because they are too lazy to use Google, here:
http://accu.org/index.php/journals/1539
I have an array of precomputed integers; it's a fixed size of 15M values. I need to load these values at program start. Currently it takes up to 2 minutes to load, and the file size is ~130MB. Is there any way to speed up loading? I'm free to change the save process as well.
std::array<int, 15000000> keys;
std::string config = "config.dat";
// how array is saved
std::ofstream out(config.c_str());
std::copy(keys.cbegin(), keys.cend(),
std::ostream_iterator<int>(out, "\n"));
// load of array
std::ifstream in(config.c_str());
std::copy(std::istream_iterator<int>(in),
std::istream_iterator<int>(), keys.begin());
in.close();
Thanks in advance.
SOLVED. I used the approach proposed in the accepted answer. Now it takes just a blink.
Thanks all for your insights.
You have two issues regarding the speed of your write and read operations.
First, std::copy cannot do a block copy optimization when writing to an output_iterator because it doesn't have direct access to the underlying target.
Second, you're writing the integers out as ASCII and not binary, so for each iteration of your write the output_iterator creates an ASCII representation of your int, and on read it has to parse the text back into integers. I believe this is the brunt of your performance issue.
The raw storage of your array (assuming a 4 byte int) should only be 60MB, but since each character of an integer in ASCII is 1 byte, any ints with more than 4 characters are going to take more space than their binary storage, hence your 130MB file.
There is no easy way to solve your speed problem portably (so that the file can be read on machines with different endianness or int size) while using std::copy. The easiest way is to just dump the whole array to disk and then read it all back using fstream's write and read; just remember that this is not strictly portable.
To write:
std::fstream out(config.c_str(), std::ios::out | std::ios::binary);
out.write( reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int) );
And to read:
std::fstream in(config.c_str(), std::ios::in | std::ios::binary);
in.read( reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int) );
----Update----
If you are really concerned about portability, you could easily use a portable format (like your initial ASCII version) in your distribution artifacts; then, when the program is first run, it could convert that portable format to a locally optimized version for use during subsequent executions.
Something like this perhaps:
std::array<int, 15000000> keys;
// data.txt are the ascii values and data.bin is the binary version
if(!file_exists("data.bin")) {
std::ifstream in("data.txt");
std::copy(std::istream_iterator<int>(in),
std::istream_iterator<int>(), keys.begin());
in.close();
std::fstream out("data.bin", std::ios::out | std::ios::binary);
out.write( reinterpret_cast<const char*>(keys.data()), keys.size() * sizeof(int) );
} else {
std::fstream in("data.bin", std::ios::in | std::ios::binary);
in.read( reinterpret_cast<char*>(keys.data()), keys.size() * sizeof(int) );
}
If you have an install process this preprocessing could also be done at that time...
Attention. Reality check ahead:
Reading integers from a large text file is an I/O-bound operation unless you're doing something completely wrong (like using C++ streams for this). Loading 15M integers from a text file takes less than 2 seconds on an AMD64 at 3 GHz when the file is already buffered (and only a bit longer if it had to be fetched from a sufficiently fast disk). Here's a quick & dirty routine to prove my point (that's why I do not check for all possible errors in the format of the integers, nor close my files at the end, because I exit() anyway).
$ wc nums.txt
15000000 15000000 156979060 nums.txt
$ head -n 5 nums.txt
730547560
-226810937
607950954
640895092
884005970
$ g++ -O2 read.cc
$ time ./a.out <nums.txt
=>1752547657
real 0m1.781s
user 0m1.651s
sys 0m0.114s
$ cat read.cc
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <vector>
int main()
{
int c; // int, not char, so the comparison with EOF works
int num=0;
int pos=1;
int line=1;
std::vector<int> res;
while(c=getchar(),c!=EOF)
{
if (c>='0' && c<='9')
num=num*10+c-'0';
else if (c=='-')
pos=0;
else if (c=='\n')
{
res.push_back(pos?num:-num);
num=0;
pos=1;
line++;
}
else
{
printf("I've got a problem with this file at line %d\n",line);
exit(1);
}
}
// make sure the optimizer does not throw vector away, also a check.
unsigned sum=0;
for (size_t i=0;i<res.size();i++)
{
sum=sum+(unsigned)res[i];
}
printf("=>%d\n",sum);
}
UPDATE: and here's my result when reading the text file (not binary) using mmap:
$ g++ -O2 mread.cc
$ time ./a.out nums.txt
=>1752547657
real 0m0.559s
user 0m0.478s
sys 0m0.081s
code's on pastebin:
http://pastebin.com/NgqFa11k
What do I suggest
1-2 seconds is a realistic lower bound for a typical desktop machine for loading this data. 2 minutes sounds more like a 60 MHz microcontroller reading from a cheap SD card. So either you have an undetected/unmentioned hardware condition, or your implementation of C++ streams is somehow broken or unusable. I suggest establishing a lower bound for this task on your machine by running my sample code.
If the integers are saved in binary format and you're not concerned with endianness problems, try reading the entire file into memory at once (fread) and casting the pointer to int *.
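A minimal sketch of that suggestion, reusing the keys array and file name from the question (it assumes the file holds raw native-endian ints, e.g. written with fwrite):
#include <array>
#include <cstdio>

std::array<int, 15000000> keys; // at namespace scope so it doesn't blow the stack

int main()
{
    std::FILE* f = std::fopen("config.dat", "rb");
    if (!f) return 1;
    // One bulk read straight into the array; no per-value parsing.
    std::size_t got = std::fread(keys.data(), sizeof(int), keys.size(), f);
    std::fclose(f);
    return got == keys.size() ? 0 : 1;
}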
You could precompile the array into a .o file, which wouldn't need to be recompiled unless the data changes.
thedata.hpp:
static const int NUM_ENTRIES = 5;
extern int thedata[NUM_ENTRIES];
thedata.cpp:
#include "thedata.hpp"
int thedata[NUM_ENTRIES] = {
10
,200
,3000
,40000
,500000
};
To compile this:
# make thedata.o
Then your main application would look something like:
#include "thedata.hpp"
using namespace std;
int main() {
for (int i=0; i<NUM_ENTRIES; i++) {
cout << thedata[i] << endl;
}
}
Assuming the data doesn't change often, and that you can process the data to create thedata.cpp, this gives effectively instant load time. I don't know if the compiler would choke on such a large literal array, though!
Save the file in a binary format.
Write the file by taking a pointer to the start of your int array and converting it to a char pointer. Then write the 15000000*sizeof(int) chars to the file.
When you read the file, do the same in reverse: read the file as a sequence of chars, take a pointer to the beginning of the sequence, and convert it to an int*.
Of course, this assumes that endianness isn't an issue.
For actually reading and writing the file, memory mapping is probably the most sensible approach.
If the numbers never change, preprocess the file into a C++ source and compile it into the application.
If the numbers can change and you thus have to keep them in a separate file that you load on startup, then avoid reading them number by number using C++ IO streams. C++ IO streams are a nice abstraction, but there is too much of it for such a simple task as loading a bunch of numbers fast. In my experience, a huge part of the run time is spent parsing the numbers, and another part accessing the file char by char.
(Assuming your file is more than a single long line.) Read the file line by line using std::getline(), and parse numbers out of each line using not streams but std::strtol(). This avoids a huge part of the overhead. You can get more speed out of the streams by crafting your own variant of std::getline() that reads the input ahead (using istream::read()); the standard std::getline() also reads input char by char. A sketch of the getline-plus-strtol idea follows.
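Here is that sketch (illustrative only; it assumes whitespace-separated ints, one row per line, and parse is a hypothetical helper name):
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::vector<int> > parse(const char* path)
{
    std::vector<std::vector<int> > rows;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::vector<int> row;
        const char* p = line.c_str();
        char* end = 0;
        long v = std::strtol(p, &end, 10);
        while (end != p) { // strtol reports where it stopped via 'end'
            row.push_back(static_cast<int>(v));
            p = end;
            v = std::strtol(p, &end, 10);
        }
        rows.push_back(row);
    }
    return rows;
}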
Use a buffer of 1000 integers (or even all 15M; you can modify this size as you please), not integer after integer. Not using a buffer is clearly the problem, in my opinion.
If the data in the file is binary and you don't have to worry about endianness, and you're on a system that supports it, use the mmap system call. See this article on IBM's website:
High-performance network programming, Part 2: Speed up processing at both the client and server
Also see this SO post:
When should I use mmap for file access?
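For completeness, a minimal POSIX sketch of the mmap approach (not standard C++; it assumes a POSIX system and a file of raw native-endian ints, with error handling kept short):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("data.bin", O_RDONLY); // file name is an example
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) != 0) return 1;
    // Map the whole file; the kernel pages the data in on demand.
    void* p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    const int* values = static_cast<const int*>(p);
    size_t count = st.st_size / sizeof(int);
    // ... use values[0] .. values[count - 1] here ...
    (void)values; (void)count; // not used further in this sketch
    munmap(p, st.st_size);
    close(fd);
    return 0;
}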