Getting the nth line of a text file in C++ - c++

I need to read the nth line of a text file (e.g. textfile.findline(0) would find the first line of the text file loaded with ifstream textfile). Is this possible?
I don't need to put the contents of the file in an array/vector; I just need to assign a specific line of the text file to a variable (specifically an int).
P.S. I am looking for the simplest solution that would not require me to use any big external library (e.g. Boost)
Thanks in advance.

How about this?
std::string ReadNthLine(const std::string& filename, int N)
{
    std::ifstream in(filename.c_str());

    std::string s;
    // for performance
    s.reserve(some_reasonable_max_line_length);

    // skip N lines
    for (int i = 0; i < N; ++i)
        std::getline(in, s);

    std::getline(in, s);
    return s;
}
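Since the goal in the question is to end up with an int, a minimal usage sketch might look like the following; the file name "textfile.txt" and the index 3 are only example values, and some_reasonable_max_line_length above is a placeholder constant you would define yourself.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // returns the line with zero-based index 3, i.e. the fourth line
    std::string line = ReadNthLine("textfile.txt", 3);
    int value = std::stoi(line); // throws std::invalid_argument if the line isn't numeric
    std::cout << value << '\n';
}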

If you want to read the start of the nth line, you can use std::istream::ignore to skip over the first n-1 lines, then read from the next line to assign to the variable.
#include <istream>
#include <limits>

template<typename T>
void readNthLine(std::istream& in, int n, T& value) {
    // skip the first n-1 lines
    for (int i = 0; i < n - 1; ++i) {
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    in >> value;
}
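For example, to pull an int straight off a given line (here line 5, counted 1-based to match the n-1 skip above; the file name is just an example):
std::ifstream file("textfile.txt");
int value = 0;
readNthLine(file, 5, value); // value now holds the int at the start of line 5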

Armen's solution is the correct answer, but I thought I'd throw out an alternative, based on jweyrich's caching idea. For better or for worse, this reads in the entire file at construction, but only saves the newline positions (doesn't store the entire file, so it plays nice with massive files.) Then you can simply call ReadNthLine, and it will immediately jump to that line, and read in the one line you want. On the other hand, this is only optimal if you want to get only a fraction of the lines at a time, and the line numbers are not known at compile time.
class TextFile {
    std::ifstream file_stream;
    std::vector<std::ifstream::streampos> linebegins;
    TextFile& operator=(TextFile& b) = delete;
public:
    TextFile(std::string filename)
        : file_stream(filename)
    {
        // this chunk stolen from Armen's answer
        std::string s;
        // for performance
        s.reserve(some_reasonable_max_line_length);
        while (file_stream) {
            linebegins.push_back(file_stream.tellg());
            std::getline(file_stream, s);
        }
    }
    TextFile(TextFile&& b)
        : file_stream(std::move(b.file_stream)),
          linebegins(std::move(b.linebegins))
    {}
    TextFile& operator=(TextFile&& b)
    {
        file_stream = std::move(b.file_stream);
        linebegins = std::move(b.linebegins);
        return *this;
    }
    std::string ReadNthLine(int N) {
        if (N >= linebegins.size() - 1)
            throw std::runtime_error("File doesn't have that many lines!");
        std::string s;
        // clear EOF and error flags
        file_stream.clear();
        file_stream.seekg(linebegins[N]);
        std::getline(file_stream, s);
        return s;
    }
};
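A rough usage sketch (the file name and line index are just examples): construction pays the indexing cost once, and every later lookup is a single seek plus one getline.
TextFile tf("textfile.txt");
std::cout << tf.ReadNthLine(2) << '\n'; // third line, zero-based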

It's certainly possible. There are (n-1) '\n' characters preceding the nth line. Read lines until you reach the one you're looking for. You can do this on the fly without storing anything except the current line being considered.
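A minimal sketch of that on-the-fly approach (the file name and target index are illustrative):
std::ifstream in("textfile.txt");
std::string line;
int wanted = 4;   // zero-based index of the line we want
int current = 0;
while (std::getline(in, line)) {
    if (current++ == wanted)
        break;    // `line` now holds the wanted line (if the file was long enough)
}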

Related

Extracting a particular data from a CSV file in c++

I have written a program to read a CSV file, but I'm having some trouble extracting data from that CSV file in C++. I want to count the number of columns starting from the 5th column in the 1st row until the last column of the 1st row of the CSV file. I have written the following code to read a CSV file, but I am not sure how to count the number of columns as described above.
I would appreciate it if anyone could tell me how to go about it.
char* substring(char* source, int startIndex, int endIndex)
{
    int size = endIndex - startIndex + 1;
    char* s = new char[size + 1];
    strncpy(s, source + startIndex, size); // you can read the documentation of strncpy online
    s[size] = '\0';                        // make it null-terminated
    return s;
}

char** readCSV(const char* csvFileName, int& csvLineCount)
{
    ifstream fin(csvFileName);
    if (!fin)
    {
        return nullptr;
    }
    csvLineCount = 0;
    char line[1024];
    while (fin.getline(line, 1024))
    {
        csvLineCount++;
    }
    char** lines = new char*[csvLineCount];
    fin.clear();
    fin.seekg(0, ios::beg);
    for (int i = 0; i < csvLineCount; i++)
    {
        fin.getline(line, 1024);
        lines[i] = new char[strlen(line) + 1];
        strcpy(lines[i], line);
    }
    fin.close();
    return lines;
}
I have attached a few lines from the CSV file:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,
,Afghanistan,33.0,65.0,0,0,0,0,0,0,0,
,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
To answer your question:
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
Yeah, not that difficult - that's how I would do it:
#include <iostream>
#include <string>
#include <fstream>
#include <regex>

#define FILENAME "test.csv" // Your filename as a macro
// (the compiler just sees "test.csv" wherever FILENAME appears)

void read() {
    std::string n;
    // date format pattern %m/%dd/%YY
    std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
    // date format pattern %mm/%dd/%YY
    std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
    std::smatch result1, result2;

    std::ifstream file(FILENAME, std::ios::in);
    if (!file.is_open())
    {
        std::cout << "Could not open file!" << '\n';
        return;
    }

    // read the file field by field, using ',' as the delimiter
    // https://en.cppreference.com/w/cpp/string/basic_string/getline
    while (std::getline(file, n, ','))
    {
        if (std::regex_search(n, result1, pattern1))
            std::cout << result1.str() << " " << n << std::endl;
        if (std::regex_search(n, result2, pattern2))
            std::cout << result2.str() << " " << n << std::endl;
    }
    file.close();
}

int main()
{
    read();
    return 0;
}
The file test.csv contains the following for testing:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0
It actually is pretty simple:
getline takes the open file and stops at a so-called delimiter character, in your case a comma ','.
(That is the best way I have found for reading CSV - you can replace the delimiter with whatever you want, for example ';' or ' ' - I guess you get the idea.)
After this you have all the data nicely separated, one field after another, without the commas.
Now you can "filter" out what you need. I use regex - but use whatever you want.
(Just FYI: for C++-tagged questions you shouldn't use C-style functions like strncpy.)
I gave you one pattern for dates like 1/23/20 (m/dd/yy) and, in case your file contains a November or December date like 12/22/20 (mm/dd/yy), a second one; splitting them over two lines keeps the regex patterns easier to read/understand.
You can/may have to expand the regex patterns if other data in the file happens to match your date format; regex is well documented and not as complicated as it looks.
From that point you can put all the printed values into, for example, a vector (a more convenient kind of array) to handle and/or pass/return the data - that's up to you.
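To get the actual number the question asks for (how many date columns follow Long in the first row), one hedged variation on the same idea is to read only the header line, split it on commas, and count the fields that match a date pattern. The combined pattern below is an assumption covering both m/dd/yy and mm/dd/yy, and this additionally needs <sstream>:
int countDates(const std::string& filename) {
    std::ifstream file(filename);
    std::string header, field;
    std::getline(file, header);                          // only the first row matters here
    std::regex date("\\b\\d{1,2}/\\d{1,2}/\\d{2}\\b");
    std::istringstream fields(header);
    int count = 0;
    while (std::getline(fields, field, ','))             // split the header on commas
        if (std::regex_search(field, date))
            ++count;
    return count;
}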
If you need more explaining I am happy to help you out and/or expand this example, just leave a comment.
You basically want to search for the separator substring within your line (normally it is ';').
If you print out your lines it should look like this:
a;b;c;d;e;f;g;h
There are several ways to achieve what you want; I would look for a strip or split-on-character function. Something along the lines of the example below should work. If you use std::string you can go with str.find instead of a loop.
int rows(char* line, char separator, int count) {
    unsigned length = strlen(line);
    unsigned i = 0;
    // advance to the next separator (or to the end of the line)
    while (i < length && line[i] != separator) {
        i++;
    }
    count++;
    if (i + 1 < length) return rows(substring(line, i + 1, length - 1), separator, count);
    else return count;
}
The recursion can obviously be replaced by one loop ;)
int countSign(char* line, char sign) {
    unsigned l = strlen(line);
    int count = 0;
    for (unsigned i = 0; i < l; i++) {
        if (line[i] == sign) count++;
    }
    return count;
}
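If switching to std::string is an option, the same count is a one-liner with std::count from <algorithm>; the line contents below are just example data:
#include <algorithm>
#include <iostream>
#include <string>

int main() {
    std::string line = "a;b;c;d;e;f;g;h";                         // example data
    auto columns = std::count(line.begin(), line.end(), ';') + 1; // separators + 1
    std::cout << columns << '\n';
}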

Txt to 2 different arrays c++

I have a txt file with a lot of things in it.
The lines have this pattern: 6 spaces then 1 int, 1 space, then a string.
Also, the 1st line has the amount of lines that the txt has.
I want to put the integers in an array of ints and the string on an array of strings.
I can read it and put it into an array, but only if I treat the ints as chars and put everything into one array of strings. When I try to separate things I have no idea how I'd do it. Any ideas?
The code I used for putting everything in an array was this:
int size()
{
    ifstream sizeX;
    int x;
    sizeX.open("cities.txt");
    sizeX >> x;
    return x;
}

int main(void)
{
    int size = size();
    string words[size];
    ifstream file("cities.txt");
    file.ignore(100000, '\n');
    if (file.is_open())
    {
        for (int i = 0; i < size; i++)
        {
            getline(file, words[i]);
        }
    }
}
Just to start I'm going to provide some tips about your code:
int size = size();
Why do you need to open the file, read the first line, and then close it, only to open it again? Everything can be done by opening the file just once.
The code string words[size]; is not legal C++. You cannot declare a variable-length array in C++; that C feature has not been included in the C++ standard (some ref). I suggest you replace it with std::vector, which is more idiomatic C++.
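A minimal sketch of that replacement, reusing the size already read from the first line:
#include <vector>
// ...
std::vector<std::string> words(size); // allocates `size` empty strings; can also grow at runtime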
Here is a snippet of a function which performs what you need.
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

int parse_file(const std::string& filename,
               std::vector<std::string>* out_strings,
               std::vector<int>* out_integers) {
  assert(out_strings != nullptr);
  assert(out_integers != nullptr);

  std::ifstream file;
  file.open(filename, std::ios_base::in);
  if (file.fail()) {
    // handle the error
    return -1;
  }

  // Local variables
  int num_rows;
  std::string line;

  // parse the first line
  std::getline(file, line);
  if (line.size() == 0) {
    // file empty, handle the error
    return -1;
  }
  num_rows = std::stoi(line);

  // reserve memory
  out_strings->clear();
  out_strings->reserve(num_rows);
  out_integers->clear();
  out_integers->reserve(num_rows);

  for (int row = 0; row < num_rows; ++row) {
    // read the line
    std::getline(file, line);
    if (line.size() == 0) {
      // unexpected end of line, handle it
      return -1;
    }
    // get the integer
    out_integers->push_back(
        std::stoi(line.substr(6, line.find(' ', 6) - 6)));
    // get the string
    out_strings->push_back(
        line.substr(line.find(' ', 6) + 1, std::string::npos));
  }

  file.close();
  return 0;
}
You can definitely improve it, but I think it's a good starting point.
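A possible call site, assuming the same cities.txt file from the question:
std::vector<std::string> names;
std::vector<int> numbers;
if (parse_file("cities.txt", &names, &numbers) == 0) {
    // numbers[i] and names[i] now hold the int and the string from line i+1
}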
The last suggestion I can give you: to improve the robustness of your code, you can match each line against a regular expression. That way you can be sure each line is formatted exactly how you need.
For example:
std::regex line_pattern("\\s{6}[0-9]+\\s[^\\n]+");
if (std::regex_match(line, line_pattern) == false) {
    // oops... the line is not formatted how you need
    // this is an error
}

How to read only some previously known lines using ifstream (C++)

By preprocessing the file I found some lines for further processing; now I want to read those lines. Is there any faster solution than reading lines one by one using ifstream::getline(...)?
For example, I know that I want only the lines whose numbers are multiples of 4 (0-4-8-12-16-...), or specific line numbers stored in a vector...
Now I'm doing this :
string line;
int counter = 0;
while( getline(ifstr, line) ){
    if(counter++ % 4 == 0){
        // some code working with line
    }
}
but I want something like this (if faster):
while(getline(ifstr, line)){
    // some code working with line
    while(++counter % 4 != 0){ // or checking against the index vector
        skipline(ifstr)
    }
}
Let me mention again that I have a list of line indexes (sorted, but not as regular as this); I just use this multiples-of-4 example for simplicity.
Edit: and I want to jump to a line near the beginning too; for example, I know that I need to read from line number 2000, so how do I skip 1999 lines quickly?
Thanks all
Because #caps said this left him with the feeling there's nothing in the standard library to help with this kind of task, I felt compelled to demonstrate otherwise :)
Live On Coliru
template <typename It, typename Out, typename Filter = std::vector<int> >
Out retrieve_lines(It begin, It const end, Filter lines, Out out, char const* delim = "\\n") {
    if (lines.empty())
        return out;

    // make sure input is orderly
    assert(std::is_sorted(lines.begin(), lines.end()));
    assert(lines.front() >= 0);

    std::regex re(delim);
    std::regex_token_iterator<It> line(begin, end, re, -1), eof;

    // make lines into incremental offsets
    std::adjacent_difference(lines.begin(), lines.end(), lines.begin());

    // iterate advancing by each offset requested
    auto advanced = [&line, eof](size_t n) { while (line != eof && n--) ++line; return line; };

    for (auto offset = lines.begin(); offset != lines.end() && advanced(*offset) != eof; ++offset) {
        *out++ = *line;
    }

    return out;
}
This is noticeably more generic. The trade-off (for now) is that the tokenizing iterator requires a random-access iterator. I find this a good trade-off because "random access" on files really calls for memory-mapped files anyway.
Live Demo 1: from string to vector<string>
Live On Coliru
int main() {
    std::vector<std::string> output_lines;
    std::string is(" a b c d e\nf g hijklmnop\nqrstuvw\nxyz");
    retrieve_lines(is.begin(), is.end(), {0,3,999}, back_inserter(output_lines));

    // for debug purposes
    for (auto& line : output_lines)
        std::cout << line << "\n";
}
Prints
a b c d e
xyz
Live Demo 2: From file to cout
Live On Coliru
#include <boost/iostreams/device/mapped_file.hpp>

int main() {
    boost::iostreams::mapped_file_source is("/etc/dictionaries-common/words");
    retrieve_lines(is.begin(), is.end(), {13,784,9996}, std::ostream_iterator<std::string>(std::cout, "\n"));
}
Prints e.g.
ASL's
Apennines
Mercer's
The use of boost::iostreams::mapped_file_source can easily be replaced with straight up ::mmap but I found it uglier in the presentation sample.
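For reference, a hedged sketch of what that ::mmap variant could look like (POSIX only, error handling omitted; the dictionary path is the same example file as above):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <iterator>

int main() {
    int fd = ::open("/etc/dictionaries-common/words", O_RDONLY);
    struct stat st;
    ::fstat(fd, &st);
    const char* data = static_cast<const char*>(
        ::mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

    // data .. data + st.st_size is now a contiguous, random-access character range
    retrieve_lines(data, data + st.st_size, {13, 784, 9996},
                   std::ostream_iterator<std::string>(std::cout, "\n"));

    ::munmap(const_cast<char*>(data), st.st_size);
    ::close(fd);
}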
Store std::fstream::streampos instances corresponding to the line beginnings of your file in a std::vector, and then you can access a specific line using the index of this vector. A possible implementation follows:
class file_reader {
public:
    // load file streampos offsets during construction
    explicit file_reader(const std::string& filename)
        : fs(filename) { cache_file_streampos(); }

    std::size_t lines() const noexcept { return line_streampos_vec.size(); }

    // get a std::string representation of a specific line in the file
    std::string read_line(std::size_t n) {
        if (n >= line_streampos_vec.size() - 1)
            throw std::out_of_range("out of bounds");
        navigate_to_line(n);
        std::string rtn_str;
        std::getline(fs, rtn_str);
        return rtn_str;
    }

private:
    std::fstream fs;
    std::vector<std::fstream::streampos> line_streampos_vec;
    const std::size_t max_line_length = 256; // some sensible value (placeholder)

    // store file streampos instances in the vector
    void cache_file_streampos() {
        std::string s;
        s.reserve(max_line_length);
        while (std::getline(fs, s))
            line_streampos_vec.push_back(fs.tellg());
    }

    // go to a specific line in the file stream
    void navigate_to_line(std::size_t n) {
        fs.clear();
        fs.seekg(line_streampos_vec[n]);
    }
};
Then you can read a specific line of your file via:
file_reader fr("filename.ext");
for (int i = 0; i < fr.lines(); ++i) {
    if (!(i % 4))
        std::string line_contents = fr.read_line(i); // then do something with the string
}
ArchbishopOfBanterbury's answer is nice, and I would agree with him that you will get cleaner code and better efficiency by just storing the character positions of the beginning of each line when you do your preprocessing.
But, supposing that is not possible (perhaps the preprocessing is handled by some other API, or is from user input), there is a solution that should do the minimal amount of work necessary to read in only the specified lines.
The fundamental problem is that, given a file with variable line lengths, you cannot know where each line begins and ends, since a line is defined as a sequence of characters that end in '\n'. So, you must parse every character to check and see if it is '\n' or not, and if so, advance your line counter and read in the line if the line counter matches one of your desired inputs.
auto retrieve_lines(std::ifstream& file_to_read, std::vector<int> line_numbers_to_read) -> std::vector<std::string>
{
    auto begin = std::istreambuf_iterator<char>(file_to_read);
    auto end = std::istreambuf_iterator<char>();

    auto current_line = 0;
    auto next_line_num = std::begin(line_numbers_to_read);

    auto output_lines = std::vector<std::string>();
    output_lines.reserve(line_numbers_to_read.size()); // this may be a silly "optimization," since all the strings are still separate unreserved buffers

    // we can bail if we've reached the end of the lines we want to read, even if there are lines remaining in the stream
    // we *must* bail if we've reached the end of the stream, even if there are supposedly lines left to read; that input must have been incorrect
    while (begin != end && next_line_num != std::end(line_numbers_to_read))
    {
        if (current_line == *next_line_num)
        {
            auto matching_line = std::string();
            if (*begin != '\n')
            {
                // potential optimization: reserve matching_line to something that you expect will fit most/all of your input lines
                while (begin != end && *begin != '\n')
                {
                    matching_line.push_back(*begin++);
                }
            }
            output_lines.emplace_back(matching_line);
            ++next_line_num;
        }
        else
        {
            // skip this "line" by finding the next '\n'
            while (begin != end && *begin != '\n')
            {
                ++begin;
            }
        }
        // either code path in the previous if/else leaves us staring at the '\n' at the end of a line,
        // which is not the right state for the next iteration of the loop.
        // So skip this '\n' to get to the beginning of the next line
        if (begin != end && *begin == '\n')
        {
            ++begin;
        }
        ++current_line;
    }
    return output_lines;
}
Here it is live on Coliru, along with the input I tested it with. As you can see, it correctly handles empty lines as well as correctly handling being told to grab more lines than are in the file.

Comparing numbers in strings to numbers in int?

I'm trying to make a program that will open a txt file containing a list of names in this format (ignore the bullets):
3 Mark
4 Ralph
1 Ed
2 Kevin
and will create a file w/ organized names based on the number in front of them:
1 Ed
2 Kevin
3 Mark
4 Ralph
I think I'm experiencing trouble in line 40, where I try to compare the numbers stored in strings with a number stored in an int.
I can't think of any other way to tackle this, any advice would be wonderful!
#include <iostream>
#include <fstream>
#include <vector>
#include <cstdlib>
using namespace std;

int main()
{
    ifstream in;
    ofstream out;
    string line;
    string collection[5];
    vector<string> lines;
    vector<string> newLines;

    in.open("infile.txt");
    if (in.fail())
    {
        cout << "Input file opening failed. \n";
        exit(1);
    }
    out.open("outfile.txt");
    if (out.fail())
    {
        cout << "Output file opening failed. \n";
        exit(1);
    }
    while (!in.eof())
    {
        getline(in, line);
        lines.push_back(line);
    }
    for (int i = 0; i < lines.size(); i++)
    {
        collection[i] = lines[i];
    }
    for (int j = 0; j < lines.size(); j++)
    {
        for (int x = 0; x < lines.size(); x--)
        {
            if (collection[x][0] == j)
                newLines.push_back(collection[x]);
        }
    }
    for (int k = 0; k < newLines.size(); k++)
    {
        out << newLines[k] << endl;
    }
    in.close();
    out.close();
    return 0;
}
Using a debugger would tell you where you went wrong, but let me highlight the mistake:
if (collection[x][0] == j)
You're expecting a string like 3 Mark. The first character of this string is '3', but that has an ASCII value of 51, and that is the numerical value you'll get when trying to work with it this way! This will never equal j unless you've got a lot of lines in your file, and even then your search system will not work at all like you wanted. You need to convert that character into an integer, and then do your comparison.
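A minimal illustration of the fix (subtracting '0' works here because the line numbers in the sample file are single digits; for multi-digit numbers you would parse the prefix with std::stoi instead):
char c = collection[x][0];    // e.g. '3', which is ASCII 51
int digit = c - '0';          // 51 - 48 = 3
if (digit == j) { /* ... */ } // now this compares numbers, not character codes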
C++ offers many ways to process data via streams, including parsing simple data files and converting text to numbers and vice versa. Here's a simple standalone function that will read a data file like yours (only with arbitrary text, including spaces, after the number on each line).
#include <algorithm>
// snip

struct file_entry { int i; std::string text; };

std::vector<file_entry> parse_file(std::istream& in)
{
    std::vector<file_entry> data;
    file_entry e;
    while (in >> e.i)              // read the first number on the line
    {
        in.ignore();               // skip the space between the number and the text
        std::getline(in, e.text);  // read the whole of the rest of the line
        data.push_back(e);
    }
    return data;
}
Because the standard way that >> works involves reading until the next space (or end of line), if you want to read a chunk of text which contains whitespace, it will be much easier to use std::getline to just slurp up the whole of the rest of the current line.
Note: I've made no attempt to handle malformed lines in the textfile, or any number of other possible error conditions. Writing a proper file parser is outside of the scope of this question, but there are plenty of tutorials out there on using C++'s stream functionality appropriately.
Now you have the file in a convenient structure, you can use other standard c++ features to sort it, rather than reinventing the wheel and trying to do it yourself:
bool sort_file_entry(file_entry a, file_entry b)
{
    return a.i < b.i;
}

int main()
{
    // set up all your streams, etc.
    std::vector<file_entry> lines = parse_file(in);
    std::sort(lines.begin(), lines.end(), sort_file_entry);
    // now you can write the sorted vector back out to disk.
}
Again, a full introduction to how iterators and containers work is well outside the scope of this answer, but the internet has no shortage of introductory C++ guides out there. Good luck!

Performance bottleneck with CSV parser

My current parser is given below. Reading a ~10MB CSV into an STL vector takes ~30 seconds, which is too slow for my liking, given that I've got over 100MB which needs to be read in every time the program is run. Can anyone give some advice on how to improve performance? Indeed, would it be faster in plain C?
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// declared here, defined below
std::istream& operator >> (std::istream& ins, std::vector<double>& data);

int main() {
    std::vector<double> data;
    std::ifstream infile("data.csv");
    infile >> data;
    std::cin.get();
    return 0;
}
std::istream& operator >> (std::istream& ins, std::vector<double>& data)
{
    data.clear();

    // Reserve data vector
    std::string line, field;
    std::getline(ins, line);
    std::stringstream ssl(line), ssf;
    std::size_t rows = 1, cols = 0;
    while (std::getline(ssl, field, ',')) cols++;
    while (std::getline(ins, line)) rows++;
    std::cout << rows << " x " << cols << "\n";
    ins.clear(); // clear bad state after eof
    ins.seekg(0);
    data.reserve(rows * cols);

    // Populate data
    double f = 0.0;
    while (std::getline(ins, line)) {
        ssl.str(line);
        ssl.clear();
        while (std::getline(ssl, field, ',')) {
            ssf.str(field);
            ssf.clear();
            ssf >> f;
            data.push_back(f);
        }
    }
    return ins;
}
NB: I also have OpenMP at my disposal, and the contents will eventually be used for GPGPU computation with CUDA.
You could halve the time by reading the file once and not twice.
While presizing the vector is beneficial, it will never dominate runtime, because I/O will always be slower by some magnitude.
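A hedged sketch of that single-pass idea: drop the counting pass entirely and let the vector grow (or reserve a rough estimate from the file size); std::stod also avoids the per-field stringstream used above:
std::istream& operator >> (std::istream& ins, std::vector<double>& data)
{
    data.clear();
    std::string line, field;
    while (std::getline(ins, line)) {              // single pass over the file
        std::stringstream ssl(line);
        while (std::getline(ssl, field, ','))
            data.push_back(std::stod(field));      // no second stringstream per field
    }
    return ins;
}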
Another possible optimization could be reading without a string stream. Something like (untested)
int c = 0;
while (ins >> f) {
    data.push_back(f);
    if (++c < cols) {
        char comma;
        ins >> comma; // skip comma
    } else {
        c = 0; // end of line, start next line
    }
}
If you can omit the ',' and separate the values by whitespace only, it could even be as simple as
while (ins >> f)
    data.push_back(f);
or
std::copy(std::istream_iterator<double>(ins), std::istream_iterator<double>(),
          std::back_inserter(data));
On my machine, your reserve code takes about 1.1 seconds and your populate code takes 8.5 seconds.
Adding std::ios::sync_with_stdio(false); made no difference to my compiler.
The below C code takes 2.3 seconds.
int i = 0;
int j = 0;
while( true ) {
    float x;
    j = fscanf( file, "%f", &x );
    if( j == EOF ) break;
    data[i++] = x;
    // skip ',' or '\n'
    int ch = getc(file);
}
Try calling
std::ios::sync_with_stdio(false);
at the start of your program. This disables the (allegedly quite slow) synchronization between cin/cout and scanf/printf (I have never tried this myself, but have often seen the recommendation, such as here). Note that if you do this, you cannot mix C++-style and C-style IO in your program.
(In addition, Olaf Dietsche is completely right about only reading the file once.)
Apparently, file I/O is a bad idea; just map the whole file into memory and access the CSV file as one contiguous VM block. This incurs only a few syscalls.
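A rough sketch of that approach (POSIX mmap plus strtod walking the mapped buffer; error handling omitted, and it assumes the file ends with a newline so strtod always stops at a byte inside the mapping):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdlib>
#include <vector>

std::vector<double> read_csv_mmap(const char* path) {
    int fd = ::open(path, O_RDONLY);
    struct stat st;
    ::fstat(fd, &st);
    char* buf = static_cast<char*>(::mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

    std::vector<double> data;
    char* p = buf;
    char* const end = buf + st.st_size;
    while (p < end) {
        if (*p == ',' || *p == '\n' || *p == '\r' || *p == ' ') { ++p; continue; } // separators
        char* next;
        double v = std::strtod(p, &next); // parse one number, report where parsing stopped
        if (next == p) { ++p; continue; } // unparseable byte, just step over it
        data.push_back(v);
        p = next;
    }

    ::munmap(buf, st.st_size);
    ::close(fd);
    return data;
}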