Separator character in string c++ - c++

This is the requirement: read a string and loop over it; whenever a new word is encountered, insert it into a std::list. If the . character has a space, tab, newline or digit on its left and a digit on its right, it is treated as a decimal point and is therefore part of a word. Otherwise it is treated as a full stop and a word separator.
And this is the output I get from the template program:
foo.bar -> 2 words (foo, bar)
f5.5f -> 1 word
.4.5.6.5 -> 1 word
d.4.5f -> 3 words (d, 4, 5f)
.5.6..6.... -> 2 words (.5.6, 6)
It seems very complex to me, as this is my first time dealing with strings in C++. I'm really stuck on implementing the code. Could anyone give me a hint? Thanks.
Here are some rough ideas I have so far:
bool isDecimal(std::string &word) {
bool ok = false;
for (unsigned int i = 0; i < word.size(); i++) {
if (word[i] == '.') {
if ((std::isdigit(word[(int)i - 1]) ||
std::isspace(word[(int)i -1]) ||
(int)(i - 1) == (int)(word.size() - 1)) && std::isdigit(word[i + 1]))
ok = true;
else {
ok = false;
break;
}
}
}
return ok;
}
void checkDecimal(std::string &word) {
if (!isDecimal(word)) {
std::string temp = word;
word.clear();
for (unsigned int i = 0; i < temp.size(); i++) {
if (temp[i] != '.')
word += temp[i];
else {
if (std::isalpha(temp[i + 1]) || std::isdigit(temp[i + 1]))
word += ' ';
}
}
}
trimLeft(word);
}

I think you may be approaching the problem from the wrong direction. It seems much easier if you turn the condition upside down. To give you some pointers in a pseudocode skeleton:
bool isSeparator(const std::string& string, size_t position)
{
// Determine whether the character at <position> in <string> is a word separator
}
void tokenizeString(const std::string& string, std::list<std::string>& wordList)
{
// for every position in string
// if(isSeparator(string, position) || end of string)
// wordList.push_back(substring from last separator to this one)
}

I suggest implementing it using flex and bison with a C++ implementation.

Related

Pangram String Error On My Hackerrank Code

I found this question on Hackerrank where I have to write a method to say whether or not a given string is a pangram. A sentence is a pangram if it contains all 26 letters of the alphabet. The input will only contain characters that are alphabetical (uppercase or lowercase) and spaces.
Here's the code I've gotten so far, where I use a set to keep track of which letters are present in the string. However, the code just keeps running infinitely in the while loop below.
string pangrams(string s) {
set<char> set{};
int i=0;
while (i!=s.length()) {
if(s[i]!='\0') {
set.insert(tolower(s[i]));
}
}
if (set.size() == 27) {
return "pangram";
} else {
return "not pangram";
}
}
Your function needs a slight modification. Firstly, you aren't incrementing i, which sends your function into an infinite loop. The other modifications are explained in the code below:
string pangrams(string s) {
    set<char> set{};
    int i=0;
    while (i!=s.length()) {
        if(s[i]!=' ') { // if the character is a space, ignore it
            set.insert(tolower(s[i]));
        }
        i++; // main cause of the infinite loop: i was never incremented
    }
    if (set.size() == 26) { // 26 and not 27. There may be strings without a space that are pangrams, so we don't add the space to our set.
        return "pangram";
    } else {
        return "not pangram";
    }
}
Also, you don't need to check s[i]!='\0': the loop condition i!=s.length() already stops before the end of the string, so that check never does anything useful.
Hope this clears up your issue!
You're never incrementing i, so your code will run infinitely. I would recommend a for loop for (int i = 0; i < s.length(); i ++) or a for-each loop for (char c : s)
Instead of using a set, you could also try this, where each letter corresponds to an index in a bool[]:
bool exists[26] = {}; // zero-initialized: no letter seen yet
for (char c : s) {
    if ('a' <= c && c <= 'z') {
        exists[c - 'a'] = true;
    } else if ('A' <= c && c <= 'Z') {
        exists[c - 'A'] = true;
    }
    // spaces are ignored: a pangram does not have to contain one
}
for (bool b : exists) {
    if (!b) return false;
}
return true;

makeValidWord(std::string word) not working properly

I'm programming a hash table in C++, but this specific piece of code does not run properly. It should return a string of alpha characters plus ' and -, but I get results like "t" instead of "art" when I pass in "'aRT-*".
isWordChar() returns a bool indicating whether the input is a valid word character, using isAlpha().
// Words cannot contain any digits, or special characters EXCEPT for
// hyphens (-) and apostrophes (') that occur in the middle of a
// valid word (the first and last characters of a word must be an alpha
// character). All upper case characters in the word should be converted
// to lower case.
// For example, "can't" and "good-hearted" are considered valid words.
// "12mOnkEYs-$" will be converted to "monkeys".
// "Pa55ive" will be stripped "paive".
std::string WordCount::makeValidWord(std::string word) {
if (word.size() == 0) {
return word;
}
string r = "";
string in = "";
size_t incr = 0;
size_t decr = word.size() - 1;
while (incr < word.size() && !isWordChar(word.at(incr))) {
incr++;
}
while (0 < decr && !isWordChar(word.at(decr))) {
decr--;
}
if (incr > decr) {
return r;
}
while (incr <= decr) {
if (isWordChar(word.at(incr)) || word.at(incr) == '-' || word.at(incr) == '\'') {
in =+ word.at(incr);
}
incr++;
}
for (size_t i = 0; i < in.size(); i++) {
r += tolower(in.at(i));
}
return r;
}
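Assuming you can use standard algorithms, it's better to rewrite your function using them. This achieves 2 goals:
the code is more readable, since using algorithms shows intent along with the code itself
there is less chance of making an error
So it should be something like this:
// needs <algorithm>, <cctype>, <iterator>; assumes isWordChar is a static member or free function
std::string WordCount::makeValidWord(std::string word) {
    auto first = std::find_if(word.cbegin(), word.cend(), isWordChar);
    if (first == word.cend()) return "";  // no alpha character at all
    // base() turns the reverse iterator into a forward iterator one past the last alpha character
    auto last = std::find_if(word.crbegin(), word.crend(), isWordChar).base();
    std::string i;
    std::copy_if(first, last, std::back_inserter(i), [](char c) {
        return isWordChar(c) || c == '-' || c == '\'';
    });
    std::string r;
    // std::tolower is overloaded, so it is wrapped in a lambda
    std::transform(i.cbegin(), i.cend(), std::back_inserter(r),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return r;
}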
I am going to echo @Someprogrammerdude and say: Learn to use a debugger!
I pasted your code into Visual Studio (changed isWordChar() to isalpha()), and stepped it through with the debugger. Then it was pretty trivial to notice this happening:
First iteration of while (incr <= decr) {: [debugger screenshot not preserved]
Second iteration: [debugger screenshot not preserved]
Ooh, look at that; the variable in does not update correctly - instead of collecting a string of the correct characters it only holds the last one. How can that be?
in =+ word.at(incr); Hey, that is not right, that operator should be +=.
Many errors are that easy and effortless to find and correct if you use a debugger. Pick one up today. :)

How to remove an entire word in a sentence c++

I want to remove a specific word from a sentence. I have tried breaking the sentence down into words and comparing each word, but when I call the erase function the indices shift. I have also tried another method, but it deletes substrings inside other words, which I don't want. Can anyone help me out a bit? What approach should I use?
input
The house whirled around two or three times and rose slowly through the air.
output
The house whirled around two or three times and rose slowly through air.
here is my prototype of the function
int RemoveWordFromLine(string line, string word)
{
// ==========================
string tmp_str="",spacebar=" ";
int start=0,end=-1;
for(int i=0;i<line.length();i++)
{
if(isspace(line[i])||int(line[i])==44||int(line[i])==46)
{
cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
if(tmp_str==word)
{
line.erase(start,end);
}
tmp_str="";
start=i+1;
end=i;
} else
{
tmp_str+=line[i];
end++;
}
}
if(tmp_str==word)
{
line.erase(start,end);
}
cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
cout<<line<<endl;
// ==========================
}
You can write your function in the following way:
void RemoveWordFromLine(std::string &line, const std::string &word)
{
auto n = line.find(word);
if (n != std::string::npos)
{
line.erase(n, word.length());
}
}
And use it like:
std::string line("This is a wrong line");
RemoveWordFromLine(line, "wrong");
printf("line is: '%s'\n", line.c_str());
Prints out (two spaces remain where the word was, because only the word itself is erased):
line is: 'This is a  line'
You are passing the end position instead of the length of the substring to delete: erase takes (position, count).
You just need to replace
if(tmp_str==word)
{
line.erase(start,end);
}
with
if(tmp_str==word)
{
line.erase(start,word.length()+1); //+1 to prevent 2 spaces
}
You could also do line.erase(start-1,word.length()+1); if you would like to keep '.' or ',' after deleted word.
With <regex>, you might do:
std::string RemoveWordFromLine(const std::string& line, std::string word)
{
// \s* : extra optional spaces at the start
// \b : word boundary
// Ideally, we should check that `word` doesn't break regex too:
std::regex reg("\\s*\\b" + word + "\\b");
return std::regex_replace(line, reg, "");
}
This function removes multiple occurrences as well:
void removeWords(string& s, const string& p) {
    const size_t n = p.length();
    if (n == 0) return;
    auto isBoundary = [](char c) { return c == ' ' || c == '.'; };
    size_t i = s.find(p);
    while (i != string::npos) {
        // take care of subwords: both neighbours must be a boundary or an end of the string
        const bool leftOk = (i == 0) || isBoundary(s[i - 1]);
        const bool rightOk = (i + n == s.length()) || isBoundary(s[i + n]);
        if (leftOk && rightOk) {
            s.erase(i, n);
            i = s.find(p, i);
        } else {
            i = s.find(p, i + 1); // skip this match, otherwise the loop never ends
        }
    }
}

ask for text to edit, text formatting

I would like to write a program that asks for text (a paragraph with several words separated by commas).
It should then transform the text by wrapping each word in a tag, for example to format the text as HTML.
Example:
word1, word2, word3
to
<a> word1 </a>, <a> word2 </a>, <a> word3 </a>
So I started writing this code, but I do not know how to continue. How can I scan the text to find the start of each word? With ASCII tests, I imagine?
Maybe with a table that tests every case?
I am not necessarily asking for the complete answer; a direction to follow would already help.
#include <iostream>
#include <iomanip>
#include <string> //For getline()
using namespace std;
// Creating class
class GetText
{
public:
string text;
string line; //Using this as a buffer
void userText()
{
cout << "Please type a message: ";
do
{
getline(cin, line);
text += line;
}
while(line != "");
}
void to_string()
{
cout << "\n" << "User's Text: " << "\n" << text << endl;
}
};
int main() {
GetText test;
test.userText();
test.to_string();
system("pause");
return 0;
}
The next thing you need to do is split your input by a delimiter (in your case ',') into a vector and then recombine everything with prefixes and suffixes. C++ has no built-in split function, so you would have to write your own or search for an existing solution.
If you want to keep it really simple, you can detect word boundaries by checking two characters at a time. Here's a working example.
#include <iostream>
#include <string>
#include <cctype>
using namespace std; // must come after the headers that declare namespace std
typedef enum boundary_type_e {
E_BOUNDARY_TYPE_ERROR = -1,
E_BOUNDARY_TYPE_NONE,
E_BOUNDARY_TYPE_LEFT,
E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;
typedef struct boundary_s {
boundary_type_t type;
int pos;
} boundary_t;
bool is_word_char(int c) {
return ' ' <= c && c <= '~' && !isspace(c) && c != ',';
}
boundary_t maybe_word_boundary(string str, int pos) {
int len = str.length();
// brace initialization is used below: C-style compound literals are not standard C++
if (pos < 0 || pos >= len) {
return boundary_t{E_BOUNDARY_TYPE_ERROR, -1};
} else {
if (pos == 0 && is_word_char(str[pos])) {
// if the first character is word-y, we have a left boundary at the beginning
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
} else if (pos == len - 1 && is_word_char(str[pos])) {
// if the last character is word-y, we have a right boundary left of the null terminator
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
} else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
// if we have a delimiter followed by a word char, we have a left boundary left of the word char
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
} else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
// if we have a word char followed by a delimiter, we have a right boundary right of the word char
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
}
return boundary_t{E_BOUNDARY_TYPE_NONE, -1};
}
}
int main() {
string str;
string ins_left("<tag>");
string ins_right("</tag>");
getline(cin, str);
// can't use length for the loop condition without recalculating it all the time
for (int i = 0; str[i] != '\0'; i++) {
boundary_t boundary = maybe_word_boundary(str, i);
if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
str.insert(boundary.pos, ins_left);
i += ins_left.length();
} else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
str.insert(boundary.pos, ins_right);
i += ins_right.length();
}
}
cout << str << endl; // show the tagged result
}
It would be better to use enum class but I forgot the notation. You can also copy to a buffer instead of generating the new string in-place, I was just trying to keep it simple. Feel free to expand it to a class based C++ style. To get your exact desired output, strip the spaces first and add spaces to ins_left and ins_right.

Loop quitting for no reason

I have a question regarding C++. This is my current function:
string clarifyWord(string str) {
//Remove all spaces before string
unsigned long i = 0;
int currentASCII = 0;
while (i < str.length()) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i++;
continue;
} else {
break;
}
}
//Remove all spaces after string
i = str.length();
while (i > -1) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i--;
continue;
} else {
break;
}
}
return str;
}
Just to get the basic and obvious things out of the way, I have #include <string> and using namespace std; so I do have access to the string functions.
The thing is though that the loop is quitting and sometimes skipping the second loop. I am passing in the str to be " Cheese " and it should remove all the spaces before the string and after the string.
In the main function, I am also assigning a variable to clarifyWord(str) where str is above. It doesn't seem to print that out either using cout << str;.
Is there something I am missing with printing out strings or looping with strings? Also ASCII code 32 is Space.
Okay so the erase function you are calling looks like this:
string& erase ( size_t pos = 0, size_t n = npos );
The n parameter is the number of characters to delete. Its default, npos, means "delete everything up to the end of the string", so str.erase(i) wipes out the rest of the string. Set the second parameter to 1 to remove a single character:
str.erase(i,1)
[EDIT]
You could change the first loop to this:
while (str.length() > 0 && str[0] == ' ')
{
str.erase(0,1);
}
and the second loop to this:
while (str.length() > 0 && str[str.length() - 1] == ' ')
{
str.erase(str.length() - 1, 1);
}
In your second loop, you can't initialize i to str.length().
str[str.length()] is going to be after the end of your string, and so is unlikely to be a space (thus triggering the break out of the second loop).
You're using erase (modifying the string) while you're in a loop checking its size. This is a dangerous way of processing the string. As you return a new string, I would recommend you first to search for the first occurrence in the string of the non-space character, and then the last one, and then returning a substring. Something along the lines of (not tested):
size_t init = str.find_first_not_of(' ');
if (init == std::string::npos)
return "";
size_t fini = str.find_last_not_of(' ');
return str.substr(init, fini - init + 1);
You see, no loops, erases, etc.
unsigned long i ... while (i > -1) Well, that's not right, is it? How would you expect that to work? The compiler will in fact convert both operands to the same type: while (i > static_cast<unsigned long>(-1)). And that's just another way to write ULONG_MAX, i.e. while (i > ULONG_MAX). In other words, while(false).
You're using erase incorrectly. It'll erase from pos to npos.
i.e. string& erase ( size_t pos = 0, size_t n = npos );
See: http://www.cplusplus.com/reference/string/string/erase/
A better way to do this is to note the position of the first non space and where the spaces occur at the end of the string. Then use either substr or erase twice.
You also don't need to go to the trouble of doing this:
currentASCII = int(str[i]);
if (currentASCII == 32) {
Instead do this:
if (str[i] == ' ') {
Which I think you'll agree is a lot easier to read.
So, you can shorten it somewhat with something like this:
string clarifyWord(string str) {
    size_t start = 0, end = str.length();
    while (start < end && str[start] == ' ') ++start;  // skip leading spaces
    while (end > start && str[end - 1] == ' ') --end;  // skip trailing spaces
    return str.substr(start, end - start);             // substr takes (position, length)
}