How to read file and save hyphen using STL C++


I have to read a text file, convert it to lower case, and remove non-alphabetic characters, but I also need to keep hyphens without counting a hyphen as a word. Here is my code. It is counting a hyphen as a word in UnknownWords. I want to keep the hyphen, but only count the words on its left and right side in the .txt file.
My output:
110 Known words read
79 Unknown words read //it is because it is counting hyphen as word
Desired output is:
110 Known words read
78 Unknown words read
Code:
void WordStats::ReadTxtFile(){
std::ifstream ifile(Filename);
if(!ifile)
{
std::cerr << "Error Opening file " << Filename << std::endl;
exit(1);
}
for (std::string word; ifile >> word; )
{
transform (word.begin(), word.end(), word.begin(), ::tolower);
word.erase(std::remove_if(word.begin(), word.end(), [](char c)
{
return (c < 'a' || c > 'z') && c != '\'' && c != '-';
}), word.end());
if (Dictionary.count(word))
{
KnownWords[word].push_back(ifile.tellg());
}
else
{
UnknownWords[word].push_back(ifile.tellg());
}
}
// std::string word; ifile >> word;
std::cout << KnownWords.size() << " known words read." << std::endl;
std::cout << UnknownWords.size() << " unknown words read." << std::endl;
}

If you don't want to store a word that is just "-" by itself, check for that before adding it to KnownWords or UnknownWords:
for (std::string word; ifile >> word; )
{
    transform (word.begin(), word.end(), word.begin(), ::tolower);
    word.erase(std::remove_if(word.begin(), word.end(), [](char c)
    {
        return (c < 'a' || c > 'z') && c != '\'' && c != '-';
    }), word.end());
    if (word.find_first_not_of("-") == std::string::npos) { // Ignore a word that is empty or only hyphens
        continue;
    }
    if (Dictionary.count(word))
    {
        KnownWords[word].push_back(ifile.tellg());
    }
    else
    {
        UnknownWords[word].push_back(ifile.tellg());
    }
}
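The same filtering can be pulled out into a small function so that it is easy to check on its own. This is only a sketch: normalize_word is a hypothetical helper name, not part of the WordStats class from the question, and it returns an empty string for tokens that should not be counted.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of the original WordStats class):
// lower-cases a token, keeps only a-z, apostrophes and hyphens,
// and returns an empty string when nothing but hyphens is left.
std::string normalize_word(std::string word)
{
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](char c) {
                                  return (c < 'a' || c > 'z') && c != '\'' && c != '-';
                              }),
               word.end());
    if (word.find_first_not_of('-') == std::string::npos) {
        return std::string(); // hyphen-only (or empty) token: not a word
    }
    return word;
}
```

In the reading loop, a caller would skip the token whenever the returned string is empty and otherwise look it up in Dictionary as before.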

Related

Input string in C++ do while

I'm trying to read a string as an input.
The string can only contain A C G and T letters and the length can't be more than 20000.
If the length is more than 20000 or it contains any other letter than A C G or T print out "error" and read again.
EXAMPLE INPUT: ACCGGTATTTACG
Here's my code and currently it prints error for every input.
int main()
{
string str;
string tmp;
bool hiba;
do{
cout<<"Str: ";cin>>str;
for(int x = 0;x < str.length();x++){
hiba = (cin.fail() || str[x] != 'A' || str[x] != 'C' ||str[x] != 'G' ||str[x] != 'T' || str.length() > 20000);
if(hiba){
cout<<"Error\n";
cin.clear();
getline(cin,tmp);
break;
}
}
}while(hiba);
}

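In your version, the test str[x] != 'A' || str[x] != 'C' || str[x] != 'G' || str[x] != 'T' is true for every character, because no character can equal all four letters at once, so every input is reported as an error. Here is a direct transliteration of your requirements. Note the use of std::string::find_first_not_of to find "bad" characters. Also note that it returns an empty string should std::cin >> sequence fail. You need to handle that corner case.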
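std::string read_dna(std::string::size_type max_length) {
    std::string sequence;
    while (std::cin >> sequence) {
        // reject input that is too long or contains anything but A, C, G, T
        if (sequence.length() > max_length ||
            sequence.find_first_not_of("ACGT") != std::string::npos) {
            std::cerr << "error" << std::endl;
        } else {
            return sequence;
        }
    }
    return std::string{}; // input ended or failed before a valid sequence was read
}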

I need to convert some code so that it works with an input and output file text

I have a program that reverses the letters in a sentence but keeps the words in the same order. I need to change the code from using the iostream library to using the fstream library, so that the user puts a sentence into an input file ("input.txt") and the program writes the reversed text to an output text file.
example of input:
This problem is too easy for me. I am an amazing programmer. Do you agree?
Example of output:
sihT melborp si oot ysae rof em. I ma na gnizama remmargorp. oD uoy eerga?
The code I already have:
int main()
{
int i=0, j=0, k=0, l=0;
char x[14] = "I LOVE CODING";
char y[14] = {'\0'};
for(i=0; i<=14; i++) {
if(x[i]==' ' || x[i]=='\0') {
for(j=i-1; j>=l; j--)
y[k++] = x[j];
y[k++] = ' ';
l=i+1;
}
}
cout << y;
return 0;
}

I would use std::string to store the string, and benefit from std::vector and const_iterator to make better use of C++ features:
#include <iostream>
#include <string>
#include <vector>
int main()
{
    std::string s("This problem is too easy for me. I am an amazing programmer. Do you agree?");
    const char delim = ' ';
    std::vector<std::string> v;
    std::string tmp;
    // split the sentence into words at each space
    for(std::string::const_iterator i = s.begin(); i != s.end(); ++i)
    {
        if(*i != delim)
        {
            tmp += *i;
        }else
        {
            v.push_back(tmp);
            tmp = "";
        }
    }
    v.push_back(tmp); // the last word is not followed by a delimiter
    // print each word reversed
    for(std::vector<std::string>::const_iterator it = v.begin(); it != v.end(); ++it)
    {
        std::string str = *it,b;
        for(int i=str.size()-1;i>=0;i--)
            b+=str[i];
        std::cout << b << " ";
    }
    std::cout << std::endl;
}
Output:
sihT melborp si oot ysae rof .em I ma na gnizama .remmargorp oD uoy ?eerga

The code that you submitted looks much more like C than C++. I am not sure whether you are familiar with std::string and function calls, but as the code you wrote is pretty sophisticated, I will assume that you are.
Here is an example of how to use fstream. I almost always use getline for the input because I find that it gets me into fewer problems.
I then almost always use stringstream for parsing the line because it neatly splits the line at each space.
Finally, I try to figure out a while() or do{}while(); loop that will trigger off of the input from the getline() call.
Note that if the word ends in a punctuation character, reverse_word() has to look for a non-alpha character at the end and set it aside so that it can be appended after the reversed letters. A more general approach would be to reverse only runs of alphas.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
///////////////////
/// return true if ch is alpha
/// return false for digits, punctuation, and all else
bool is_letter(char ch){
if((ch >= 'A' && ch <= 'Z') ||
(ch >= 'a' && ch <= 'z')) {
return true;
} else {
return false;
}
}
////////
// Only reverse the letter portion of each word
//
std::string reverse_word(std::string str)
{
std::string output_str; // Probably have to create a copy for output
output_str.reserve(str.length()); // reserve size equal to input string
// iterate through each letter of the string, backwards,
// and copy the letters to the new string
char save_non_alpha = 0;
for (auto it = str.rbegin(); it != str.rend(); it++) {
/// If the last character is punctuation, then save it to paste on the end
if(it == str.rbegin() && !is_letter(*it)) {
save_non_alpha = *it;
} else {
output_str += *it;
}
}
if(save_non_alpha != 0) {
output_str += save_non_alpha;
}
return output_str; // send string back to caller
}
int main()
{
std::string input_file_name{"input.txt"};
std::string output_file_name{"output.txt"};
std::string input_line;
std::ifstream inFile;
std::ofstream outFile;
inFile.open(input_file_name, std::ios::in);
outFile.open(output_file_name, std::ios::out);
// if the file open failed, then exit
if (!inFile.is_open() || !outFile.is_open()) {
std::cout << "File " << input_file_name
<< " or file " << output_file_name
<< " could not be opened...exiting\n";
return -1;
}
while (std::getline(inFile, input_line)) {
std::string word;
std::string sentence;
std::stringstream stream(input_line);
// I just like stringstreams. Process the input_line
// as a series of words from stringstream. Stringstream
// will split on whitespace. Punctuation will be reversed with the
// word that it is touching
while (stream >> word) {
if(!sentence.empty()) // add a space before all but the first word
sentence += " ";
word = reverse_word(word);
sentence += word;
}
outFile << sentence << std::endl;
}
inFile.close();
outFile.close();
return 0;
}
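The "only reverse runs of alphas" idea mentioned above could be sketched as follows. reverse_letter_runs is a hypothetical name, and it works on a whole line at once, so no splitting into words is needed; this is one possible reading of that suggestion rather than the answer's own implementation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical alternative to reverse_word(): reverses every maximal run of
// letters in place and leaves all other characters where they are.
std::string reverse_letter_runs(std::string text)
{
    auto is_alpha = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    };
    auto it = text.begin();
    while (it != text.end()) {
        // find the start of the next run of letters
        auto run_begin = std::find_if(it, text.end(), is_alpha);
        // find the first non-letter after that run
        auto run_end = std::find_if_not(run_begin, text.end(), is_alpha);
        std::reverse(run_begin, run_end);
        it = run_end;
    }
    return text;
}
```

In the main loop, outFile << reverse_letter_runs(input_line) would replace the stringstream step. One design consequence is that an apostrophe also ends a run, so "don't" is reversed as two separate pieces.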

C++: vector<string> does not match regex

I really don't know what is wrong that my regex is not matching.
I tried two codes:
Code1 (NOT MATCH):
I am pushing from file to vector<string> cn;
//FILE
ifstream file(params.file_name);
string line;
// read each line of the file
while ( getline(file, line) ){
istringstream iss(line);
string token;
unsigned int loopCsv = 0;
while (getline(iss, token, ';')){ //when ";" separate cn;uid;mail
if (loopCsv == 0)
cn.push_back(token); //cn
if (loopCsv == 1)
uid.push_back(token); //uid
if (loopCsv == 2)
mail.push_back(token); //mail
loopCsv++;
if (loopCsv == 3) //after 3 (cn,uid,mail) repeat
loopCsv=0;
}
}
then trying regex:
cout << "There is Neruda Jakub: " << cn[286] << endl;
regex regexX(".*Jakub", std::regex::ECMAScript | std::regex::icase);
bool match = regex_search(cn[286], regexX);
if (match)
cout << "MATCH!" << endl;
I am getting output:
There is Neruda Jakub: Neruda Jakub
but no match. I have also tried printing delimiters around cn[286] to check for stray spaces (|Neruda Jakub|), and there aren't any.
Code2 (MATCH):
vector<string> someVctr;
someVctr.push_back("Neruda Jakub");
regex regexX(".*Jakub", std::regex::ECMAScript | std::regex::icase);
bool match = regex_search(someVctr[0], regexX);
if (match)
cout << "MATCH!" << endl;
without problem, I will get MATCH!
I will be grateful for any help you can provide.

From the comments, it looks like the file was encoded as UTF-16. One fix is to replace the char streams and strings with wide-character ones. The version below uses wchar_t rather than char16_t, because the standard library only provides std::regex_traits for char and wchar_t, and it declares the containers as vectors of strings so that push_back(token) and cn[286] work as in the question. The stream also needs a locale with a UTF-16 conversion facet imbued before reading; that step is not shown here.
vector<wstring> cn;
vector<wstring> uid;
vector<wstring> mail;
//FILE
wifstream file(params.file_name);
wstring line;
// read each line of the file
while ( getline(file, line) ){
    wistringstream iss(line);
    wstring token;
    unsigned int loopCsv = 0;
    while (getline(iss, token, L';')){ //when ";" separate cn;uid;mail
        if (loopCsv == 0)
            cn.push_back(token); //cn
        if (loopCsv == 1)
            uid.push_back(token); //uid
        if (loopCsv == 2)
            mail.push_back(token); //mail
        loopCsv++;
        if (loopCsv == 3) //after 3 (cn,uid,mail) repeat
            loopCsv=0;
    }
}
wcout << L"There is Neruda Jakub: " << cn[286] << endl;
wregex regexX(L".*Jakub", std::regex::ECMAScript | std::regex::icase);
bool match = regex_search(cn[286], regexX);
if (match)
    wcout << L"MATCH!" << endl;
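If the file really is UTF-16, a token read through a char stream would carry extra bytes (a byte-order mark at the start of the file and, for ASCII text, a NUL byte beside every character), which a terminal may not display. A rough way to confirm that suspicion is sketched below; looks_like_utf16 is a hypothetical diagnostic helper and only a heuristic, not a full encoding detector.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic: guesses whether raw bytes read through a char
// stream are really UTF-16, by looking for a byte-order mark or NUL bytes.
bool looks_like_utf16(const std::string& bytes)
{
    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if ((b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF)) {
            return true; // UTF-16 byte-order mark (little or big endian)
        }
    }
    // ASCII text stored as UTF-16 has a NUL byte next to every character.
    return bytes.find('\0') != std::string::npos;
}
```

Calling it on the first line read from the file, or on cn[286], would show whether the string that prints as "Neruda Jakub" actually contains hidden bytes that keep the pattern from matching.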

How can I check a string to see if a carriage return exists in C++?

vector<string> wordstocheck;
in.open("readin.txt");
string line;
string word = "";
int linecount = 0;
while (getline(in, line))
{
//cout << line << endl;
for (int i = 0; i < line.size(); i++)
{
if(isalpha(line[i]))
{
word.push_back(tolower(line[i]));
}
else if (line[i] == ' ' || ispunct(line[i]) || line[i] == '\n')
{
wordstocheck.push_back(word);
word = "";
}
}
linecount++;
}
for (int i = 0; i < wordstocheck.size(); i++)
{
cout << wordstocheck[i] << endl;
}
system("pause");
}
The code above reads in the following from a .txt file:
If debugging is the
process of removing bugs.
Then programming must be the
process of putting them in.
I'm trying to get the program to recognize each word, and save that individual word into a vector, and then print that vector of words out. It does pretty well with the exception of the two 'the's on the first and third lines.
Output:
if
debugging
is
theprocess
of
removing
bugs
then
programming
must
be
theprocess
of
putting
them
in
Press any key to continue . . .
It doesn't split up "theprocess" as I had hoped.

getline won't read the newline. However, in this case it's relatively simple to work around this problem.
Where you currently have linecount++;, add these lines before it:
if (word != "")
{
wordstocheck.push_back(word);
word = "";
}
You may want to use the same if (word != "") check in the first place where you push the word onto wordstocheck: if the text has "A" and "Word" separated by two spaces in a row, you'd add the word "A" followed by an empty word, because the second space also triggers a push.
As an alternative, you could get rid of getline and use int ch = in.get() to read one character at a time from the input. Then use ch instead of line[i] all through the loop, and instead of counting lines at the end of the while() body, add a second if inside the else if section that checks for a newline and increments linecount. This would probably make for shorter code.

I believe the problem is that you're expecting the newline character to be included in the result from getline(), which it isn't. It seems like if you take the two lines you already have in that block:
wordstocheck.push_back(word);
word = "";
And add them alongside the line:
linecount++;
Then it should work as you expect.

If you want to read a word at a time, why use std::getline in the first place?
// read the words into a vector of strings:
std::vector<std::string> words{std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>()};
You can use std::for_each or std::transform to convert everything to lower case, and finally print them out with for (auto const &w : words) std::cout << w << "\n";

As far as I know, getline reads a whole line and discards the line break, so you never see it in the string. One way around that is to read the file character by character.
Here is an example that gives the correct result:
#include <iostream> // std::cin, std::cout
#include <fstream> // std::ifstream
int main ()
{
    char str[256];
    int line = 1;
    int charcount = 0;
    std::cout << "Enter the name of an existing text file: ";
    std::cin.get (str,256);
    std::ifstream is(str);
    if (!is)
    {
        std::cerr << "Error opening file!" << std::endl;
        return -1;
    }
    char c;
    while (is.get(c)) // loop while a character can be extracted from the file
    {
        if (c == 10 || c == 13 || c == 32) // if it is a line feed, carriage return or space
        {
            std::cout << std::endl;
            line++;
        }
        else // everything else
        {
            std::cout << c;
            charcount++;
        }
    }
    is.close(); // close file
    std::cout << std::endl;
    std::cout << line << " lines" << std::endl;
    std::cout << charcount << " chars" << std::endl;
    return 0;
}

Finding Word + X letters after it

I want to start off by saying that I am still learning and some might think that my code looks bad, but here it goes.
So I have this text file we can call example.txt.
A line in example.txt can look like this:
randomstuffhereitem=1234randomstuffhere
I want my program to take in the numbers that are next to the item= and I have started a bit on it using the following code.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
string word;
int main()
{
ifstream readFile("example.txt", ios::app);
ofstream outfile("Found_Words.txt", ios::app);
bool found = false;
long int price;
cout << "Insert a number" << endl;
cout << "number:";
cin >> number;
system("cls");
outfile << "Here I start:";
while( readFile >> word )
{
if(word == "item=")
Here is the problem: first of all, it only searches for "item=", and to find it, it cannot be joined to other letters. It has to be a standalone word.
It won't find:
helloitem=hello
It will find:
hello item= hello
It has to be separated with spaces which is also a problem.
Secondly, I want to find numbers next to the item=. I want it to be able to find item=1234, and please note that 1234 can be any number, like 6723.
And I don't want it to pick up what comes after the number: when the digits stop, it should not take in any more data. For example, item=1234hello has to give item=1234
{
cout <<"The word has been found." << endl;
outfile << word << "/" << number;
//outfile.close();
if(word == "item=")
{
outfile << ",";
}
found = true;
}
}
outfile << "finishes here" ;
outfile.close();
if( found = false){
cout <<"Not found" << endl;
}
system ("pause");
}

You can use code like this (it needs <string> and <cstdlib>):
bool get_price(std::string s, std::string & rest, int & value)
{
    std::string::size_type pos = 0; //To track a position inside a string
    do //loop through "item" entries in the string
    {
        pos = s.find("item", pos); //Get an index in the string where "item" is found
        if (pos == std::string::npos) //No "item" in string
            break;
        pos += 4; //"item" length
        while (pos < s.length() && s[pos] == ' ') ++pos; //Skip spaces between "item" and "="
        if (pos < s.length() && s[pos] == '=') //Next char is "="
        {
            ++pos; //Move forward one char, the "="
            while (pos < s.length() && s[pos] == ' ') ++pos; //Skip spaces between "=" and digits
            const char * value_place = s.c_str() + pos; //The number
            if (*value_place < '0' || *value_place > '9') continue; //we have no number after =
            value = atoi(value_place); //Convert as many digits to a number as possible
            while (pos < s.length() && s[pos] >= '0' && s[pos] <= '9') ++pos; //skip number
            rest = s.substr(pos); //Return the remainder of the string
            return true; //The string matches
        }
    } while (1);
    return false; //We did not find a match
}
Note that you should also change the way you read strings from the file. You can either read up to a newline (std::getline) or up to the end of the stream, as mentioned here: stackoverflow question
