Lexical Analyzer Project - Vector not outputting correctly - c++

I have the following code which is part of a larger project. What this code is supposed to do is go through the line character by character looking for "tokens." The token I am looking for in this code is an ID. Which is defined as a letter followed by zero or more numbers or letters.
When a letter is detected it goes into the inner loop and loops through the next few characters, adding each character or letter to the idstring, until it finds the end of ID character(which is defined in the code) and then adds that idstring to a vector. At the end of the line it should output each element of the vector. Im not getting the output I need. I hope this is enough information to understand what is going on in the code. If someone could help me fix this problem I would be very great full. Thank you!
The output I need: ab : ab
What I get: a : a
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main()
{
std::vector<std::string> id;
std::regex idstart("[a-zA-Z]");
std::regex endID("[^a-z]|[^A-Z]|[^0-9]");
std::string line = "ab ab";
//Loops character by character through the line
//Adding each recognized token to the appropriate vector
for ( int i = 0; i<line.length(); i++ )
{
std::string tempstring(1,line[i]);
//Character is letter
if ( std::regex_match(tempstring,idstart) )
{
std::string tempIDString = tempstring;
int lineInc = 0;
for ( int j = i + 1; j<line.length(); j++)
{
std::string tempstring2(1,line[j]);
//Checks next character for end of potential ID
if ( std::regex_match(tempstring2,endID) )
{
i+=lineInc+1;
break;
}
else
{
tempIDString+=tempstring2;
lineInc++;
}
}
id.push_back(tempIDString);
}
}
std::cout << id.at(0) << " : " << id[1] << std::endl;
return 0;
}

The question is 2.5 year old and now you will maybe laugh seeing it. You break; the inner for when finding the second charcter that matches and so you will never assign tempstring2 to tempstring1.
But let's forget about that code. There is no good design here.
You had a good idea to use std::regex, but you did not know, how it worked.
So lets have a look at the correct implementation:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <regex>
// Our test data (raw string). So, containing also \n and so on
std::string testData(
R"#( :-) IDcorrect1 _wrongID I2DCorrect
3FALSE lowercasecorrect Underscore_not_allowed
i3DCorrect,i4 :-)
}
)#");
std::regex re("(\\b[a-zA-Z][a-zA-Z0-9]*\\b)");
int main(void)
{
// Define the variable id as vector of string and use the range constructor to read the test data and tokenize it
std::vector<std::string> id{ std::sregex_token_iterator(testData.begin(), testData.end(), re, 1), std::sregex_token_iterator() };
// For debug output. Print complete vector to std::cout
std::copy(id.begin(), id.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
This does all the work in the definition of the variable and by calling the range constructor. So, a typical one-liner.
Hope somebody can learn from this code . . .

Related

c++ Show to the user who has the highest score in the file

I have a file named "SCORES.TXT" that contains this:
Player - Score
--------------------
John Miles - 132
Henry - 90
Juliet P - 110
The program must show to the user the person name and the respective score, like:
John Miles has a score of 132
Henry has a score of 90
Juliet P has a score of 110
I have the following code but it is not working properly. The variable nickname only gets the first name, and if I add a string variable to get the second name, the programm will not work in the lines that only have one name.
#include <string>
#include <iostream>
#include <fstream>
#include <sstream>
using namespace std;
int main()
{
string fileName = "SCORES.TXT", line, nickname;
char c;
unsigned int score;
unsigned short int i = 0;
ifstream file(fileName);
while (getline(file, line)) {
if (i < 2) { //
i++; // ignoring the header
} //
else {
stringstream s(line);
s >> nickname >> c >> score;
cout << nickname << " has a score of " << score;
}
}
file.close();
return 0;
}
One solution to scan a complex input line is to use std::regex This is very flexible but maybe not very efficient.
Some hints:
You should not use using namespace std;
file.close() is not explicitly needed, as at end of scope of main the ifstream go out of scope and will be destructed automatically which also closes the file but it is not a failure to manually close it before.
See the following example how to use regex to scan a string for patterns and how to fetch the search results:
Quite clear: This is only one of thousand possible solutions! It is very flexible but less efficient as already mentioned.
#include <string>
#include <iostream>
#include <fstream>
#include <regex>
int main()
{
std::string fileName = "SCORES.TXT";
std::string line;
// we search the input line for the following regular expression:
// first everything which is a charater -> a-z or A-Z and a space and as mutch as we find *
// after that we search for - and also the following whitspaces
// next we pick the number
// the () arround the expression forwards the result of each () in a separate result m[]
std::regex re("([a-zA-Z ]*)[- ]*([0-9]*)", std::regex::extended );
std::ifstream file(fileName);
while (std::getline(file, line))
{
try // if regex did not find correct patterns, we ignore that
{
std::smatch m;
std::regex_search( line, m, re );
// we expect 3 parts as result, first is the full string, second is our nickname, last is the number
if ( m.size() == 3 )
{
// as we have also the trailing whitespaces in the nickname part, we remove them here
std::string nickname = std::regex_replace(std::string(m[1]), std::regex(" +$"), "");
// create an int from found number string
int number = std::stoi(m[2]);
std::cout << nickname << " has a score of " << number << std::endl;
}
} catch(...){}
}
return 0;
}
You can test online regular expressions which might help to get femiliar with the complex syntax. There are a lot sites to help you! Take care that some regular expressions are a bit different or not available in some languages.

Missing last word of string when I split the sentence into word [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 2 years ago.
I am missing the last word of string. this is code I used to store word into array.
string arr[10];
int Add_Count = 0;
string sentence = "I am unable to store last word"
string Words = "";
for (int i = 0; i < sentence.length(); i++)
{
if (Sentence[i] == ' ')
{
arr[Add_Count] = Words;
Words = "";
Add_Count++;
}
else if (isalpha(Sentence[i]))
{
Words = Words + sentence[i];
}
}
Let's print the arr:
for(int i =0; i<10; i++)
{
cout << arr[i] << endl;
}
You are inserting the word found when you see a blank character.
Since the end of the string is not a blank character, the insertion for the last word never happens.
What you can do is:
(1) If the current character is black, skip to the next character.
(2) See the next character of current character.
(2-1) If the next character is blank, insert the accumulated word.
(2-2) If the next character doesn't exist (end of the sentence), insert the accumulated word.
(2-3) If the next character is not blank, accumulate word.
Obviously you lost the last word because when you go to the end the last word is not extracted yet. You can add this line to get the last word
if (Words.length() != 0) {
arr[Add_Count] = Words;
Words = "";
}
Following on from the very good approach by #Casey, but adding the use of std::vector instead of an array, allows you to break a line into as many words as may be included in it. Using the std::stringstream and extracting with >> allows a simple way to tokenize the sentence while ignoring leading, multiple included and trailing whitespace.
For example, you could do:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main (void) {
std::string sentence = " I am unable to store last word ",
word {};
std::stringstream ss (sentence); /* create stringstream from sentence */
std::vector<std::string> words {}; /* vector of strings to hold words */
while (ss >> word) /* read word */
words.push_back(word); /* add word to vector */
/* output original sentence */
std::cout << "sentence: \"" << sentence << "\"\n\n";
for (const auto& w : words) /* output all words in vector */
std::cout << w << '\n';
}
Example Use/Output
$ ./bin/tokenize_sentence_ss
sentence: " I am unable to store last word "
I
am
unable
to
store
last
word
If you need more fine-grained control, you can use std::string::find_first_of and std::string::find_first_not_of with a set of delimiters to work your way through a string finding the first character in a token with std::string::find_first_of and then skipping over delimiters to the start of the next token with std::string::find_first_not_of. That involves a bit more arithmetic, but is a more flexible alternative.
This happens because the last word has no space after it, just add this line after for loop.
arr[Add_Count] = Words;
My version :
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main() {
std::istringstream iss("I am unable to store last word");
std::vector<std::string> v(std::istream_iterator<std::string>(iss), {});
std::copy(v.begin(), v.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Sample Run :
I
am
unable
to
store
last
word
If you know you won't have to worry about punctuation, the easiest way to handle it is to throw the string into a istringstream. You can use the extraction operator overload to extract the "words". The extraction operator defaults to splitting on whitespace and automatically terminates at the end of the stream:
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>
std::string sentence = // ... Get the string from cin, a file, or hard-code it here.
std::istringstream ss(sentence);
std::vector<std::string> arr;
arr.reserve(1 + std::count(std::cbegin(sentence), std::cend(sentence), ' '));
std::string word;
while(ss >> word) {
arr.push_back(word);
}

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.
This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a line unless the entire line has non-white space characters. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.
I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

How do I check for stored "\t" in a string?

Can someone explain to me how to properly search for a "tab" character stored in a string class?
For example:
text.txt contents:
std::cout << "Hello"; // contains one tab space
User enters on prompt: ./a.out < text.txt
main.cpp:
string arrey;
getline(cin, arrey);
int i = 0;
while( i != 10){
if(arrey[i] == "\t") // error here
{
std::cout << "I found a tab!!!!"
}
i++;
}
Since there is only one tab space in the textfile, I am assuming it is stored in index [0], but the problem is that I can't seem to make a comparison and I don't know any other way of searching it. Can someone help explain an alternative?
Error: ISO C++ forbids comparison between pointer and integer
First of all, what is i? And secondly, when you use array-indexing of a std::string object, you get a character (i.e. a char) and not a string.
The char is converted to an int and then the compiler tries to compare that int with the pointer to the string literal, and you can't compare plain integers with pointers.
You can however compare a character with another character, like in
arrey[i] == '\t'
std::string::find() might help.
Try this:
...
if(arrey.find('\t') != string::npos)
{
std::cout << "I found a tab!!!!";
}
More info on std::string::find is available here.
Why not using what C++ library provides? You could do it like this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main()
{
string arrey;
getline(cin, arrey);
if (arrey.find("\t") != std::string::npos) {
std::cout << "found a tab!" << '\n';
}
return 0;
}
The code is based on this answer. Here is the ref for std::find.
About your edit, how are sure that the input is going to be 10 positions? That might be too little or too big! If it is less than the actual size of the input, you won't look all the characters of the string and if it is too big, you are going to overflow!
You could use .size(), which says the size of the string and use a for loop like this:
#include <iostream>
#include <string>
using namespace std;
int main() {
string arrey;
getline(cin, arrey);
for(unsigned int i = 0; i < arrey.size(); ++i) {
if (arrey[i] == '\t') {
std::cout << "I found a tab!!!!";
}
}
return 0;
}

C++ Remove punctuation from String

I got a string and I want to remove all the punctuations from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I cant seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}
Using algorithm remove_copy_if :-
string text,result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), //Store output
std::ptr_fun<int, int>(&std::ispunct)
);
POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't because it has no access to the container itself. Therefore, there's junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that indicates the part of the string that's still needed. This can be used with strings erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.
ispunct takes a char value not a string.
you can do like
for (auto c : string)
if (ispunct(c)) text.erase(text.find_first_of(c));
This will work but it is a slow algorithm.
Pretty good answer by Steve314.
I would like to add a small change :
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding the :: before the function ispunct takes care of overloading .
The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "this. is my string. it's here.";
transform(str.begin(), str.end(), str.begin(), [](char ch)
{
if( ispunct(ch) )
return '\0';
return ch;
});
}
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation's: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}
Another way you could do this would be as follows:
#include <ctype.h> //needed for ispunct()
string onlyLetters(string str){
string retStr = "";
for(int i = 0; i < str.length(); i++){
if(!ispunct(str[i])){
retStr += str[i];
}
}
return retStr;
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than using some of the more complex built in functions.
I tried to apply #Steve314's answer but couldn't get it to work until I came across this note here on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I am able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
std::string result;
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}
Try to use this one, it will remove all the punctuation on the string in the text file oky.
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());
please reply if helpful
i got it.
size_t found = text.find('.');
text.erase(found, 1);