How to eliminate this extra element? - c++

I am writing a C++ function to read the nth column of a tab-delimited text file. Here is what I have so far:
typedef unsigned int uint;
inline void fileExists (const std::string& name) {
if ( access( name.c_str(), F_OK ) == -1 ) {
throw std::string("File does not exist!");
}
}
size_t bimNCols(std::string fn) {
try {
fileExists(fn);
std::ifstream in_file(fn);
std::string tmpline;
std::getline(in_file, tmpline);
std::vector<std::string> strs;
strs = boost::split(strs, tmpline, boost::is_any_of("\t"), boost::token_compress_on);
return strs.size();
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
typedef std::vector<std::string> vecStr;
vecStr bimReadCol(std::string fn, uint ncol_select) {
try {
size_t ncols = bimNCols(fn);
if(ncol_select < 1 or ncol_select > ncols) {
throw std::string("Your column selection is out of range!");
}
std::ifstream in_file(fn);
std::string tmpword;
vecStr colsel; // holds the column of strings
while (in_file) {
for(int i=1; i<ncol_select; i++) {
in_file >> tmpword;
}
in_file >> tmpword;
colsel.push_back(tmpword);
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
return colsel;
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
The problem is, in the bimReadCol function, at the last line, after
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
in_file.good() still evaluates to true. So, suppose I have a text file test.txt like this:
a 1 b 2
a 1 b 2
a 1 b 2
bimReadCol("test.txt", 3) would return a vector (b, b, b, b), with an extra element.
Any idea how to fix this?

The usual solution for line oriented input is to read line by
line, then parse each line:
std::string line;
while ( std::getline( in_file, line ) ) {
std::istringstream parser( line );
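// Check the bound first, so the loop stops right after column ncol_select
// has been read instead of extracting one field too many.
for ( int i = 1; i <= ncol_select && parser >> tmpword; ++ i ) {
}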
if ( parser ) {
colsel.push_back( tmpword );
}
// No need for any ignore.
}
The important thing is that you must absolutely test after the
input (be it from in_file or parser) before you use the
value. A test before the value was read doesn't mean anything
(as you've seen).
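A self-contained sketch of this approach might look like the following. The function name readColumn and the choice to take a std::istream& are assumptions made for illustration, not part of the original code; fields are split on any whitespace, as >> does, so empty tab-separated fields are not preserved.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the n-th (1-based) whitespace-separated
// field of every line that has at least n fields.
std::vector<std::string> readColumn(std::istream& in, std::size_t n) {
    std::vector<std::string> column;
    if (n == 0) {
        return column;                        // columns are numbered from 1
    }
    std::string line;
    while (std::getline(in, line)) {          // the read itself is the loop test
        std::istringstream parser(line);
        std::string word;
        std::size_t fieldsRead = 0;
        while (fieldsRead < n && parser >> word) {
            ++fieldsRead;                     // stop right after the n-th field
        }
        if (fieldsRead == n) {                // keep the value only if it was read
            column.push_back(word);
        }
    }
    return column;
}
```

Because every value is checked after the read that produced it, it makes no difference whether the last line ends with a newline.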

Ok, I got it. The last line of the text file does not contain a newline, that's why in_file evaluates
to true at the last line.
I think I should calculate the number of lines of the file, then replace while(in_file) with a
for loop.
If someone has a better idea, please post it and I will accept.
Update
The fix turns out to be rather simple, just check if tmpword is empty:
vecStr bimReadCol(std::string fn, uint ncol_select) {
try {
size_t ncols = bimNCols(fn);
if(ncol_select < 1 or ncol_select > ncols) {
throw std::string("Your column selection is out of range!");
}
std::ifstream in_file(fn);
vecStr colsel; // holds the column of strings
std::string tmpword;
while (in_file) {
tmpword = "";
for(int i=1; i<=ncol_select; i++) {
in_file >> tmpword;
}
if(tmpword != "") {
colsel.push_back(tmpword);
}
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
return colsel;
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
As @James Kanze has pointed out, even if the last line contains a newline, in_file
would still evaluate to true, but since we are at the end of file, the next reading into
tmpword will be empty, so we will be fine as long as we check that.

Related

how to read line by line string in a text file?

This code reads and evaluates only the first input in input.txt and ignores the rest. I have been trying to make it read and evaluate all of the remaining inputs as well.
This is my code; I think there is something wrong with it.
I have tried several looping methods.
int main()
{
string inputLine;
ifstream file ("input.txt");// input file to be read
ofstream file1;
file1.open("output.txt");
freopen("output.txt", "w", stdout);// store all the output to this file
while (std::getline (file, inputLine)) // read the strings in the input file
{
if( strncmp( "----", inputLine.c_str(), 4 ) == 0 )
continue;
//calculating binary and hexadecimal values
char *opr = "^+-/%*=,()";
std::string::iterator end_pos = std::remove(inputLine.begin(),
inputLine.end(), ' ');
inputLine.erase(end_pos, inputLine.end());
string str=inputLine;
string str2="";
int length=str.length();
char t[length];
str.copy(t, length);
t[length] = '\0';
char* tok;
char *cop=new char [length];
str.copy(cop,length);
char *w = strtok_fixed( t, opr );
while (w!=NULL)
{
string w2=w;
std::stringstream tr;
tr << w2;
w2.clear();
tr >> w2;
int x=w2.length();
int y=x-3;
string check= w2.substr(0,3);
string check1=w2.substr(0,x);
if(check.find("0x") != std::string::npos)
{
unsigned int x= strtol(w2.c_str(), NULL, 0);
std::ostringstream s;
s << x;
const std::string ii(s.str());
str2=str2+ ii;
}
else if (check1.find("b")!=std::string::npos)
{
w2.pop_back();
long bin=std::strtol(w2.c_str(),0,2);
std::ostringstream s2;
s2<<bin;
const std::string t2(s2.str());
//inputLine.replace(inputLine.find(w2),(w2.length()+1),t2);
str2=str2+t2;
}
else
{
str2=str2+w2;
}
char a =cop[w-t+strlen(w)];
string s1="";
s1=s1+a;
std::stringstream tr1;
tr1 << s1;
s1.clear();
tr1 >> s1;
str2=str2+s1;
w = strtok_fixed (NULL, opr);
}
//str2 should be taken to the parser for final evaluations
Parser p(str2);
double value = p.Evaluate ();
std::cout<<"----------------------"<<endl;
std::cout << "Result = " << value << std::endl;
std::cout<<"----------------------"<<endl;
return 0;
}
}
The problem is at the end
return 0;
}
}
should be
}
return 0;
}
You are returning from inside your while loop instead of after your while loop finishes.
You should spend the time to indent your code correctly. It will help you spot this kind of error. You should also learn to break up your code into smaller functions. Again this will help you understand your own code a bit better.
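As an illustration of splitting the code into smaller functions, the token conversion could be pulled out into a helper along these lines. This is only a sketch: the name toDecimal is made up here, and it assumes, as the code in the question appears to, that hexadecimal tokens start with 0x and binary tokens end with b.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts one operand token to its decimal text.
// Assumes "0x..." means hexadecimal and a trailing 'b' means binary;
// anything else is returned unchanged.
std::string toDecimal(const std::string& token) {
    if (token.compare(0, 2, "0x") == 0) {
        // strtoul accepts the optional 0x prefix when the base is 16
        return std::to_string(std::strtoul(token.c_str(), nullptr, 16));
    }
    if (!token.empty() && token.back() == 'b') {
        const std::string digits = token.substr(0, token.size() - 1);
        return std::to_string(std::strtol(digits.c_str(), nullptr, 2));
    }
    return token;
}
```

With a helper like this, the body of the inner while loop shrinks to a call plus the operator handling, which makes a misplaced return much easier to see.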

c++ specify dividers for reading words from text file

The following code prints each unique word and its count from a text file (containing >= 30k words). However, it separates words by whitespace only. I had results like so:
How can I modify the code to specify the expected dividers?
template <class KTy, class Ty>
void PrintMap(map<KTy, Ty> map)
{
typedef std::map<KTy, Ty>::iterator iterator;
for (iterator p = map.begin(); p != map.end(); p++)
cout << p->first << ": " << p->second << endl;
}
void UniqueWords(string fileName) {
// Will store the word and count.
map<string, unsigned int> wordsCount;
// Begin reading from file:
ifstream fileStream(fileName);
// Check if we've opened the file (as we should have).
if (fileStream.is_open())
while (fileStream.good())
{
// Store the next word in the file in a local variable.
string word;
fileStream >> word;
//Look if it's already there.
if (wordsCount.find(word) == wordsCount.end()) // Then we've encountered the word for a first time.
wordsCount[word] = 1; // Initialize it to 1.
else // Then we've already seen it before..
wordsCount[word]++; // Just increment it.
}
else // We couldn't open the file. Report the error in the error stream.
{
cerr << "Couldn't open the file." << endl;
}
// Print the words map.
PrintMap(wordsCount);
}
You can use a stream with a std::ctype<char> facet imbue()ed which considers whatever characters you fancy as space. Doing so would look something like this:
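#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
struct myctype_table {
std::ctype_base::mask table[std::ctype<char>::table_size];
myctype_table(char const* spaces) {
// Start from the classic "C" table so that all other character
// classifications are kept.
std::copy(std::ctype<char>::classic_table(),
std::ctype<char>::classic_table() + std::ctype<char>::table_size,
table);
// Mark every requested character as whitespace.
for (; *spaces; ++spaces) {
table[static_cast<unsigned char>(*spaces)] = std::ctype_base::space;
}
}
};
// myctype_table is listed first so that the table is filled in before
// std::ctype<char> receives a pointer to it.
class myctype
: private myctype_table
, public std::ctype<char> {
public:
myctype(char const* spaces)
: myctype_table(spaces)
, std::ctype<char>(table) {
}
};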
int main() {
std::locale myloc(std::locale(), new myctype(" \t\n\r?:.,!"));
std::cin.imbue(myloc);
for (std::string word; std::cin >> word; ) {
// words are separated by the extended list of spaces
}
}
This code isn't tested right now - I'm typing on a mobile device. I probably misused some of the std::ctype<char> interfaces, but something along those lines, after fixing the names etc., should work.
Since you expect the forbidden characters at the end of the word that was read, you can remove them before inserting the word into wordsCount:
if(word[word.length()-1] == ';' || word[word.length()-1] == ',' || ....){
word.erase(word.length()-1);
}
After fileStream >> word;, you can call this function. Take a look and see if it's clear:
string adapt(string word) {
string forbidden = "!?,.[];";
string ret = "";
for(int i = 0; i < word.size(); i++) {
bool ok = true;
for(int j = 0; j < forbidden.size(); j++) {
if(word[i] == forbidden[j]) {
ok = false;
break;
}
}
if(ok)
ret.push_back(word[i]);
}
return ret;
}
Something like this:
fileStream >> word;
word = adapt(word);

Reading Lines after a line in C++ not working

I've spent like 2 hours trying to parse the following bytes from a file :
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
I would like to store each Rosalind_ ID and concatenate all the lines that follow it into a single string.
I tried the following code, but it still doesn't work properly: I always miss the last line.
int main()
{
std::ifstream infile("data_set.txt");
map < int, string > ID;
map < int, string > dataSetMap;
int idNumber= 0;
int idDataSetNumber = 0;
std::string line;
std::vector<string> dataSetString;
std::string seqid;
while (!infile.eof() )
{
while(std::getline(infile, line))
{
if ( line.substr(0,1)== ">")
{
conct = "";
seqid = line.substr(1,line.length() - 1);
ID.insert(make_pair( idNumber++, seqid));
lineNumber = 0;
line.clear();
std::string data= "";
if(dataSetString.size()>0)
{
for (int i = 0; i<dataSetString.size(); i++)
{
data+=dataSetString[i];
}
dataSetMap.insert(make_pair(idDataSetNumber++, data));
}
dataSetString.clear();
}
if(!line.empty() )
{
dataSetString.push_back(line);
}
}
}
I'm trying to practice problem-solving approaches, and this one really gave me a headache.
I'm also looking for a better approach.
This code does what you want:
#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <fstream>
int main()
{
std::istream& infile = std::cin;
std::map < int, std::string > ID;
std::map < int, std::string > dataSetMap;
int idNumber= 0;
int idDataSetNumber = 0;
std::string line;
std::vector<std::string> dataSetString;
std::string seqid;
// The stream's operator bool is explicit since C++11, so convert explicitly.
bool success = static_cast<bool>(std::getline(infile, line));
while(success) {
if( line.substr(0,1) == ">" ) {
seqid = line.substr(1,line.length() - 1);
ID.insert(make_pair( idNumber++, seqid));
std::string data;
while((success = static_cast<bool>(std::getline(infile, line)))) {
if(line.substr(0,1) == ">") break;
data += line;
}
dataSetMap.insert(make_pair(idDataSetNumber++, data));
} else {
std::cout << "Invalid input file. It needs to start with >SOME_ID" << std::endl;
return 1;
}
}
std::cout << "Parsed data ----------------" << std::endl;
for(std::map<int,std::string>::const_iterator it = dataSetMap.begin(); it != dataSetMap.end(); ++it) {
std::cout << "Id: " << ID[it->first] << std::endl;
std::cout << (it->second) << std::endl;
}
}
It first reads a line from the input file and tries to parse it as an ID. If that fails, it returns an error. Then it reads the data until it finds another ID or EOF. It inserts the data and continues to parse the ID it found if it didn't encounter EOF.
Working demo: http://ideone.com/F4mcrc
Note: This fails when the file is empty; you might want to check for an empty string or a string containing only whitespace in the else branch of the ID check and skip it.
EDITED I have corrected my answer and tested it. So no more downvote please!
int main()
{
using namespace std;
ifstream infile("data_set.txt");
map < int, string > ID;
map < int, string > dataSetMap;
int idNumber= 0;
int idDataSetNumber = 0;
string line;
vector<string> dataSetString;
string seqid;
while ( true)
{
bool b=infile.eof();
if(!b)
std::getline(infile, line);
if ( line.substr(0,1)== ">" || b)
{
if(!b)
{
seqid = line.substr(1,line.length() - 1);
ID.insert(make_pair( idNumber++, seqid));
}
line.clear();
string data= "";
if(dataSetString.size()>0)
{
for (unsigned int i = 0; i<dataSetString.size(); i++)
{
data+=dataSetString[i];
}
dataSetMap.insert(make_pair(idDataSetNumber++, data));
}
dataSetString.clear();
if(b)
break;
}
if(!line.empty() )
{
dataSetString.push_back(line);
}
}
return 0;
}

How to get the line number from a file in C++?

What would be the best way to get the line number of the current line in a file that I have opened with an ifstream? I am reading in the data and need to store the line number it is on, so that I can display it later if the data doesn't match the specifications.
If you don't want to limit yourself to std::getline, then you could use a class derived from std::streambuf that keeps track of the current line number:
class CountingStreamBuffer : public std::streambuf { /* see below */ };
// open file
std::ifstream file("somefile.txt");
// "pipe" through counting stream buffer
CountingStreamBuffer cntstreambuf(file.rdbuf());
std::istream is(&cntstreambuf);
// sample usage
is >> x >> y >> z;
std::cout << "At line " << cntstreambuf.lineNumber();
std::getline(is, str);
std::cout << "At line " << cntstreambuf.lineNumber();
Here is a sample implementation of CountingStreamBuffer:
#include <streambuf>
class CountingStreamBuffer : public std::streambuf
{
public:
// constructor
CountingStreamBuffer(std::streambuf* sbuf) :
streamBuf_(sbuf),
lineNumber_(1),
lastLineNumber_(1),
column_(0),
prevColumn_(static_cast<unsigned int>(-1)),
filePos_(0)
{
}
// Get current line number
unsigned int lineNumber() const { return lineNumber_; }
// Get line number of previously read character
unsigned int prevLineNumber() const { return lastLineNumber_; }
// Get current column
unsigned int column() const { return column_; }
// Get file position
std::streamsize filepos() const { return filePos_; }
protected:
CountingStreamBuffer(const CountingStreamBuffer&);
CountingStreamBuffer& operator=(const CountingStreamBuffer&);
// extract next character from stream w/o advancing read pos
std::streambuf::int_type underflow()
{
return streamBuf_->sgetc();
}
// extract next character from stream
std::streambuf::int_type uflow()
{
int_type rc = streamBuf_->sbumpc();
lastLineNumber_ = lineNumber_;
if (traits_type::eq_int_type(rc, traits_type::to_int_type('\n')))
{
++lineNumber_;
prevColumn_ = column_ + 1;
column_ = static_cast<unsigned int>(-1);
}
++column_;
++filePos_;
return rc;
}
// put back last character
std::streambuf::int_type pbackfail(std::streambuf::int_type c)
{
if (traits_type::eq_int_type(c, traits_type::to_int_type('\n')))
{
--lineNumber_;
lastLineNumber_ = lineNumber_;
column_ = prevColumn_;
prevColumn_ = 0;
}
--column_;
--filePos_;
if (c != traits_type::eof())
return streamBuf_->sputbackc(traits_type::to_char_type(c));
else
return streamBuf_->sungetc();
}
// change position by offset, according to way and mode
virtual std::ios::pos_type seekoff(std::ios::off_type pos,
std::ios_base::seekdir dir,
std::ios_base::openmode mode)
{
if (dir == std::ios_base::beg
&& pos == static_cast<std::ios::off_type>(0))
{
lastLineNumber_ = 1;
lineNumber_ = 1;
column_ = 0;
prevColumn_ = static_cast<unsigned int>(-1);
filePos_ = 0;
return streamBuf_->pubseekoff(pos, dir, mode);
}
else
return std::streambuf::seekoff(pos, dir, mode);
}
// change to specified position, according to mode
virtual std::ios::pos_type seekpos(std::ios::pos_type pos,
std::ios_base::openmode mode)
{
if (pos == static_cast<std::ios::pos_type>(0))
{
lastLineNumber_ = 1;
lineNumber_ = 1;
column_ = 0;
prevColumn_ = static_cast<unsigned int>(-1);
filePos_ = 0;
return streamBuf_->pubseekpos(pos, mode);
}
else
return std::streambuf::seekpos(pos, mode);
}
private:
std::streambuf* streamBuf_; // hosted streambuffer
unsigned int lineNumber_; // current line number
unsigned int lastLineNumber_;// line number of last read character
unsigned int column_; // current column
unsigned int prevColumn_; // previous column
std::streamsize filePos_; // file position
};
From an ifstream point of view there is no line number. If you read in the file line by line, then you just have to keep track of it yourself.
Use std::getline to read each line in one by one. Keep an integer indicating the number of lines you have read: initialize it to zero and each time you call std::getline and it succeeds, increment it.
An inefficient but dead simple way is to have a function that, given a stream, counts the newline characters from the beginning of the stream to the current position.
int getCurrentLine(std::istream& is)
{
int lineCount = 1;
is.clear(); // need to clear error bits otherwise tellg returns -1.
auto originalPos = is.tellg();
if (originalPos < 0)
return -1;
is.seekg(0);
char c;
while ((is.tellg() < originalPos) && is.get(c))
{
if (c == '\n') ++lineCount;
}
return lineCount;
}
In some code I am working on, I am only interested to know the line number if invalid input is encountered, in which case import is aborted immediately. Since the function is called only once the inefficiency is not really a problem.
The following is a full example:
#include <iostream>
#include <sstream>
int getCurrentLine(std::istream& is)
{
int lineCount = 1;
is.clear(); // need to clear error bits otherwise tellg returns -1.
auto originalPos = is.tellg();
if (originalPos < 0)
return -1;
is.seekg(0);
char c;
while ((is.tellg() < originalPos) && is.get(c))
{
if (c == '\n') ++lineCount;
}
return lineCount;
}
void ReadDataFromStream(std::istream& s)
{
double x, y, z;
while (!s.fail() && !s.eof())
{
s >> x >> y >> z;
if (!s.fail())
std::cout << x << "," << y << "," << z << "\n";
}
if (s.fail())
std::cout << "Error at line: " << getCurrentLine(s) << "\n";
else
std::cout << "Read until line: " << getCurrentLine(s) << "\n";
}
int main(int argc, char* argv[])
{
std::stringstream s;
s << "0.0 0.0 0.0\n";
s << "1.0 ??? 0.0\n";
s << "0.0 1.0 0.0\n";
ReadDataFromStream(s);
std::stringstream s2;
s2 << "0.0 0.0 0.0\n";
s2 << "1.0 0.0 0.0\n";
s2 << "0.0 1.0 0.0";
ReadDataFromStream(s2);
return 0;
}

How to get the last but not empty line in a txt file

I want to get the last but not empty line in a txt file.
This is my code:
string line1, line2;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( !myfile.eof() )
{
getline(myfile, line1);
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
}
myfile.close();
}
else
cout << "Unable to open file";
The problem is I cannot check the empty line.
Okay, let's start with the obvious part. This: while( !myfile.eof() ) is essentially always wrong, so you're not going to detect the end of the file correctly. Since you're using getline to read the data, you want to check its return value:
while (getline(myfile, line1)) // ...
Likewise, the logic here:
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
...is clearly wrong. I'm guessing you really want && instead of || for this. As it stands, the result is always true, because no matter what value line1 contains, it must be unequal to at least one of those values (i.e., it can't simultaneously contain only a tab and contain only a new-line and contain nothing at all -- but that would be necessary for the result to be false). Testing for both !line1.empty() and line1 != "" appears redundant as well.
Why not read the file backwards? That way you don't have to scan the entire file to accomplish this. Seems like it ought to be possible.
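int main(int argc, char **argv)
{
if(argc < 2) return 1; // expects the file name as the first argument
const std::string fn = argv[1];
std::cout<<"Opening "<<fn<<std::endl;
std::fstream fin(fn.c_str(), std::ios_base::in);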
//go to end
fin.seekg(0, std::ios_base::end);
int currpos = fin.tellg();
//go to 1 before end of file
if(currpos > 0)
{
//collect the chars here...
std::vector<char> chars;
fin.seekg(currpos - 1);
currpos = fin.tellg();
while(currpos > 0)
{
char c = fin.get();
if(!fin.good())
{
break;
}
chars.push_back(c);
currpos -= 1;
fin.seekg(currpos);
}
//do whatever u want with chars...
//this is the reversed order
for(std::vector<char>::size_type i = 0; i < chars.size(); ++i)
{
std::cout<<chars[i];
}
//this is the forward order...
for(std::vector<char>::size_type i = chars.size(); i != 0; --i)
{
std::cout<<chars[i-1];
}
}
return 0;
}
It wouldn't be enough to change your ||'s to &&'s to check if the line is empty. What if there are seven spaces, a tab character, another 3 spaces and finally a newline? You can't list all the ways of getting only whitespace in a line. Instead, check every character in the line to see if it is whitespace.
In this code, is_empty will be false if any non-space character is found in the line.
bool is_empty = true;
for (int i = 0; i < line.size(); i++) {
char ch = line[i];
// isspace requires a value representable as unsigned char
is_empty = is_empty && isspace(static_cast<unsigned char>(ch));
}
Full solution:
#include <iostream>
#include <fstream>
#include <cctype>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
string line;
string last_line;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( getline(myfile, line) ) {
bool is_empty = true;
for (int i = 0; i < line.size(); i++) {
char ch = line[i];
// isspace requires a value representable as unsigned char
is_empty = is_empty && isspace(static_cast<unsigned char>(ch));
}
if (!is_empty) {
last_line = line;
}
}
myfile.close();
cout << "Last line: " << last_line << endl;
}
else {
cout << "Unable to open file";
}
return 0;
}
In addition to what the others said:
You can avoid reading whitespace by doing myfile >> std::ws before you call std::getline(). This will consume all leading whitespace, including blank lines.
Then your condition reduces to !line1.empty(). This would also work when the line contains nothing but several whitespace characters, for which your version fails.
I wasn't able to find an appropriate get_last_line function for my needs, so here's what I came up with. You can even read multiple non-empty last lines by calling get_last_line on the stream again without resetting the read position. It supports a file containing only one character. I added the reset parameter, which can be set to ios_base::end to allow output operations after reading the last line(s).
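// output has no default argument: a non-const reference cannot bind to a
// temporary in standard C++, so the caller supplies the string.
std::string& get_last_line(
std::istream& in_stream,
std::string& output,
std::ios_base::seekdir reset = std::ios_base::cur)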
{
output.clear();
std::streambuf& buf = *in_stream.rdbuf();
bool text_found = false;
while(buf.pubseekoff(-1, std::ios_base::cur) >= 0)
{
char c = buf.sgetc();
if(!isspace(static_cast<unsigned char>(c)))
text_found = true;
if(text_found)
{
if(c == '\n' || c == -1)
break;
output.insert(0, sizeof c, c);
}
}
buf.pubseekoff(0, reset);
return output;
}
// As above, output must be supplied by the caller.
std::string& get_last_line(
const std::string& file_name,
std::string& output)
{
std::ifstream file_in(
file_name.c_str(),
std::ios_base::in | std::ios_base::ate);
if(!file_in.is_open())
{
output.clear();
return output;
}
get_last_line(file_in, output);
file_in.close();
return output;
}