I have a bunch of data files I need to read in to some multidimensional container, all of which are of the following form:
a1,a2,a3,...,aN,
b1,b2,b3,...,bN,
c1,c2,c3,...,cN,
................
z1,z2,z3,...,zN,
I know from this previous question that a quick way of counting the total number of lines in a file can be achieved as follows:
std::ifstream is("filename");
int lines = std::count(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>(), '\n');
This tells me z, the total number of data sets to read in, each of which contains N data points. The next challenge is to count the number of data values per line, for which I can do the following:
std::ifstream is("filename");
std::string line;
std::getline(is, line);
std::istringstream line_(line);
int points = std::count(std::istreambuf_iterator<char>(line_), std::istreambuf_iterator<char>(), ',');
I can be confident that each file has the same amount of data values per line. My question is: is there a nicer/faster way of achieving the above without resorting to using getline and dumping a single line to a string? I was wondering if this could be achieved with stream buffers, but having done a bit of searching, it's not quite clear to me.
Any help would be much appreciated, thank you!
If you were required to use
int points = std::count(std::istreambuf_iterator<char>(line_), std::istreambuf_iterator<char>(), ',');
for every line of text, I would advise you to look for a way to make it more efficient.
However, you said:
I can be confident that each file has the same amount of data values per line.
That means you can compute the number of points from the first line and assume it to be valid for the rest of the lines.
I wouldn't sweat it for a one-time call.
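As a rough sketch of that idea: the column count can be taken from the first line by counting commas in the string itself, and the rows can then be parsed with the delimiter form of std::getline. The helper name read_table is made up for illustration, and the sketch assumes the values are plain numbers, that every value is followed by a comma as in the question, and that lines end in '\n' only.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads rows of the form "a1,a2,...,aN," into a table.
// Assumption: the values are numeric; the question does not say what they are.
std::vector<std::vector<double>> read_table(std::istream& is)
{
    std::vector<std::vector<double>> table;
    std::string line;
    std::size_t points = 0;

    while (std::getline(is, line)) {
        if (line.empty()) continue;              // tolerate blank lines
        if (table.empty()) {
            // Every value is followed by a comma, so commas == values.
            // This is computed once, from the first non-empty line only.
            points = static_cast<std::size_t>(
                std::count(line.begin(), line.end(), ','));
        }
        std::vector<double> row;
        row.reserve(points);
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            if (!field.empty()) row.push_back(std::stod(field));
        }
        table.push_back(row);
    }
    return table;
}
```

Counting on the std::string directly avoids wrapping the line in a second stream just to count commas; the number of rows falls out of table.size(), so the separate line-counting pass is not needed either.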
I have a function that reads a text file as input and stores the data in a vector.
It works, as long as the text file doesn't contain any extra new lines or white space.
Here is the code I currently have:
std::ifstream dataStream;
dataStream.open(inputFileName, std::ios_base::in);
std::string pushThis;
while(dataStream >> pushThis){
dataVector.push_back(pushThis);
}
For example:
safe mace
bait mate
The above works as an input text file.
This does not work:
safe mace
bait mate
Is there any way to stop the stream once you reach the final character in the file, while still maintaining separation via white space between words in order to add them to something like a vector, stack, whatever?
i.e. a vector would contain ['safe', 'mace', 'bait', 'mate']
Answer:
The problem came from having two streams, one using !dataStream.eof() and the other using dataStream >> pushThis.
Fixed so that both use dataStream >> pushThis.
For future reference for myself and others who may find this:
Don't use eof() unless you want to grab the ending bit(s) of a file (whitespace inclusive).
I am trying to read only 10 or 100 lines from my file. Is there any way that I can read a certain number of lines like this?
To read a single line from a file, use:
std::string text_from_file;
std::getline(text_file_stream, text_from_file);
In C++, to perform an action many times, we use a loop. So to read 10 lines from a file, we would use a for loop:
for (unsigned int i = 0U; i < 10U; ++i)
{
    if (!std::getline(text_file_stream, text_from_file))
    {
        break; // the file has fewer than 10 lines
    }
}
Another method:
unsigned int lines_read = 0U;
while ((lines_read < 10) && (std::getline(text_file_stream, text_from_file)))
{
++lines_read;
}
To read 100 lines, you would change the constant from 10 to 100.
Skipping Lines
The fundamental issue with skipping lines or seeking to a given line is that a text file has variable-length records. You will have to read each line to figure out where the next one starts.
So the technique for skipping lines is to read a line into a text variable and ignore it, much like the examples above.
There are methods to speed this up, but they involve reading large blocks of data into memory or treating the file as memory (a.k.a. memory mapping). One issue with this technique is handling the case where the text line you want crosses the end of the buffer (it is not fully in the buffer). These techniques can be found in other posts on StackOverflow or on the Internet.
Reading until a delimiter
A delimiter is something that indicates the end of text. The standard delimiter for text files is a newline. You can read text until a comma, period, tab, or other delimiter by using the 3rd parameter of std::getline.
const char delimiter = '.';
std::string text_from_file;
std::getline(text_data_stream, text_from_file, delimiter);
All this is available in good textbooks or a good online reference.
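To illustrate the delimiter form of std::getline, here is a small sketch that splits a string into fields. The function name split is an assumption for the example, not a standard library function.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every occurrence of `delimiter`.
std::vector<std::string> split(const std::string& text, char delimiter)
{
    std::vector<std::string> fields;
    std::istringstream stream(text);
    std::string field;
    // Each call reads up to (and consumes) the next delimiter.
    while (std::getline(stream, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

Two adjacent delimiters produce an empty field, while a single trailing delimiter does not add an extra empty field at the end.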
I am writing a program to read a text file line by line, store the line values in a vector, do some processing then write back to a new text file. This is what the text file typically looks like:
As you can see, there are two columns: one for the frame number and another for the time. What I want is only the second column (i.e. the time). There can be hundreds, if not thousands, of lines in the text file. Previously I have been manually deleting the frame number column, which I'd rather not do. So my question is: is there an easy way to edit my current code so that when I read the file with getline() it skips the first word and only gets the second? Here is the code that I use to read the text file. Thanks
ifstream sysfile(sys_time_dir);
//Store lines in a vector
vector<string> sys_times;
string textline;
while (getline(sysfile, textline))
{
sys_times.push_back(textline);
}
Since you have two numbers in each line, you can read two numbers and ignore the first number.
vector<double> sys_times;
int first;
double second;
while ( sysfile >> first >> second )
{
sys_times.push_back(second);
}
std::string ignore_me;
while (sysfile >> ignore_me, getline(sysfile, textline)) {
...
This utilizes the comma operator, reading in the first word (here defining "word" as a continuous sequence of non-space characters) of the line, but ignoring the result, then using getline to read the rest of the line.
Note that for the specific data format you describe, I would rather choose what RSahu showed in their answer. My answer is more general to the problem of "skipping the first word and reading the rest of the line".
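For the general "skip the first word, keep the rest of the line" case, a line-by-line variant might look like the sketch below. It reads each line first and then splits it, so a line with only one word cannot swallow part of the next line. The function name read_rest_of_lines is made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for every non-blank line, drops the first word and
// keeps whatever follows it (without the separating whitespace).
std::vector<std::string> read_rest_of_lines(std::istream& in)
{
    std::vector<std::string> rest;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first_word, remainder;
        if (!(fields >> first_word)) continue;      // blank line: nothing to keep
        // std::ws skips the whitespace between the first word and the rest.
        std::getline(fields >> std::ws, remainder); // stays empty if nothing follows
        rest.push_back(remainder);
    }
    return rest;
}
```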
I am running C++ code where I need to import data from a text file.
The text file contains 10,000 lines. Each line contains n columns of binary data.
The code has to loop 100,000 times, each time it has to randomly select a line out of the txt file and assign the binary values in the columns to some variables.
What is the most efficient way to write this code? Should I load the file into memory first, or should I open the file and jump to a random line number each time?
How can I implement this in C++?
To randomly access a line in a text file, all lines need to have the same byte length. If you don't have that, you need to loop until you get to the correct line. Since this will be pretty slow for so many accesses, better just load it into a std::vector of std::strings, each entry being one line (this is easily done with std::getline). Or, since you want to assign values from the different columns, you can use a std::vector with your own struct like
struct MyValues{
double d;
int i;
// whatever you have / need
};
std::vector<MyValues> vec;
This might be better than parsing the line every time.
With the std::vector, you get your random access and only have to loop once through the whole file.
10K lines is a pretty small file.
If you have, say, 100 chars per line, it will use the HUGE amount of 1MB of your RAM.
Load it to a vector and access it the way you want.
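Maybe not THE most efficient, but you could try this (it seeks to a random byte offset and reads the next full line, so the first line is never picked):
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <limits>
#include <string>
using namespace std;

int main() {
    //use ifstream to read
    ifstream in("yourfile.txt");
    //string to store the line
    string line;
    //seed the random number generator
    srand(static_cast<unsigned>(time(NULL)));
    for(int i = 0; i < 100000; i++) {
        in.clear(); //reset eof/fail flags left by a previous read
        //seekg takes a byte offset, not a line number; this assumes the file is at least 10000 bytes long
        in.seekg(rand() % 10000);
        //skip the rest of the partial line we landed in
        in.ignore(numeric_limits<streamsize>::max(), '\n');
        if(!getline(in, line)) continue; //landed in the last line, try again
        //do what you want with the line here...
    }
}
I'm too lazy right now, but you need to make sure that you check your ifstream for errors like end-of-file, index-out-of-bounds, etc...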
Since you're taking 100,000 samples from just 10,000 lines, the majority of lines will be sampled. Read the entire file into an array data structure, and then randomly sample the array. This avoids file seeking entirely.
The more common case is to sample only a small subset of the file's data. To do that, assuming the lines are of different lengths, seek to random points in the file, skip to the next newline (for example cin.ignore( numeric_limits< streamsize >::max(), '\n' )), and then parse the subsequent text.
So I was feeling bored and decided I wanted to make a hangman game. I did an assignment like this back in high school when I first took C++. But this was before I even took geometry, so unfortunately I didn't do well in any way, shape, or form in it, and after the semester I trashed everything in a fit of rage.
I'm looking to make a txt document and just throw in a whole bunch of words
(ie:
test
love
hungery
flummuxed
discombobulated
pie
awkward
you
get
the
idea
)
So here's my question:
How do I get C++ to read a random word from the document?
I have a feeling #include<ctime> will be needed, as well as srand(time(0)); to get some kind of pseudorandom choice...but I haven't the foggiest on how to have a random word taken from a file...any suggestions?
Thanks ahead of time!
Here's a rough sketch, assuming that the words are separated by whitespace (space, tab, newline, etc):
vector<string> words;
ifstream in("words.txt");
string word;
// test the extraction itself, so a failed read never pushes an empty word
while(in >> word) {
    words.push_back(word);
}
// assumes at least one word was read
string r=words[rand()%words.size()];
The operator >> used on a string reads one whitespace-separated word from a stream.
So the question is do you want to read the file each time you pick a word or do you want to load the file into memory and then pick up the word from a memory structure. Without more information I can only guess.
Pick a Word from a file:
// Note: an ifstream is also an istream.
std::string pickWordFromAStream(std::istream& s, std::size_t pos)
{
    std::istream_iterator<std::string> iter(s);
    for (; pos; --pos)
    {
        ++iter;
    }
    // This code assumes that pos is smaller than
    // the number of words in the file (pos is zero-based)
    return *iter;
}
Load a file into memory:
// words is taken by reference so the caller sees the loaded words
void loadStreamIntoVector(std::istream& s, std::vector<std::string>& words)
{
    std::copy(std::istream_iterator<std::string>(s),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));
}
Generating a random number should be easy enough, assuming you only want pseudo-random numbers.
I would recommend creating a plain text file (.txt) in Notepad and using the standard C file APIs (fopen(), and fread()) to read from it. You can use fgets() to read each line one at a time.
Once you have your plain text file, just read each line into an array and then randomly choose an entry in the array using the method you've suggested above.