C++ Remove punctuation from String - c++

I have a string and I want to remove all the punctuation from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I can't seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}

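Using the algorithm std::remove_copy_if (needs <algorithm>, <cctype>, and <iterator>):
std::string text, result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), // store output
[](unsigned char c) { return std::ispunct(c) != 0; } // std::ptr_fun was removed in C++17
);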

POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't, because it has no access to the container itself. Therefore, there are junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that marks the end of the part of the string that's still needed. This can be used with the string's erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
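As a minimal sketch of how the idiom carries over to other containers, the following applies it once to a std::string and once to a std::vector<int>. The helper names strip_punct and drop_zeros are made up for illustration, and the lambda's unsigned char parameter is an added precaution rather than part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: removes punctuation from a copy of the string
// using the erase-remove idiom.
std::string strip_punct(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               text.end());
    return text;
}

// The same idiom on a different container: std::remove packs the kept
// elements to the front, and erase trims the leftover tail.
std::vector<int> drop_zeros(std::vector<int> values) {
    values.erase(std::remove(values.begin(), values.end(), 0), values.end());
    return values;
}
```

Calling strip_punct on the question's string gives the text without its periods and apostrophe; drop_zeros shows that only the predicate and the container type change.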

#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.

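ispunct takes a single char value, not a string.
You can loop over a copy of the string and erase each punctuation character from the original:
for (char c : std::string(text))
if (ispunct(static_cast<unsigned char>(c))) text.erase(text.find(c), 1);
This works, but it is a slow algorithm.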

Pretty good answer by Steve314.
I would like to add a small change :
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding :: before ispunct selects the C library function in the global namespace, which avoids the ambiguity with the overloaded std::ispunct template from <locale>.

The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);

#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "this. is my string. it's here.";
transform(str.begin(), str.end(), str.begin(), [](char ch)
{
if( ispunct(ch) )
return '\0';
return ch;
});
}

#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation's: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}

Another way you could do this would be as follows:
#include <ctype.h> //needed for ispunct()
string onlyLetters(string str){
string retStr = "";
for(int i = 0; i < str.length(); i++){
if(!ispunct(str[i])){
retStr += str[i];
}
}
return retStr;
}
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than some of the more complex built-in functions.

I tried to apply @Steve314's answer but couldn't get it to work until I came across this note on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I am able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
std::string result;
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}

Try this one; it will remove all the punctuation from the string read from the text file.
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());

i got it.
size_t found = text.find('.');
text.erase(found, 1);

Related

Lexical Analyzer Project - Vector not outputting correctly

I have the following code which is part of a larger project. What this code is supposed to do is go through the line character by character looking for "tokens." The token I am looking for in this code is an ID. Which is defined as a letter followed by zero or more numbers or letters.
When a letter is detected it goes into the inner loop and loops through the next few characters, adding each number or letter to the ID string, until it finds the end-of-ID character (which is defined in the code), and then adds that ID string to a vector. At the end of the line it should output each element of the vector. I'm not getting the output I need. I hope this is enough information to understand what is going on in the code. If someone could help me fix this problem I would be very grateful. Thank you!
The output I need: ab : ab
What I get: a : a
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main()
{
std::vector<std::string> id;
std::regex idstart("[a-zA-Z]");
std::regex endID("[^a-z]|[^A-Z]|[^0-9]");
std::string line = "ab ab";
//Loops character by character through the line
//Adding each recognized token to the appropriate vector
for ( int i = 0; i<line.length(); i++ )
{
std::string tempstring(1,line[i]);
//Character is letter
if ( std::regex_match(tempstring,idstart) )
{
std::string tempIDString = tempstring;
int lineInc = 0;
for ( int j = i + 1; j<line.length(); j++)
{
std::string tempstring2(1,line[j]);
//Checks next character for end of potential ID
if ( std::regex_match(tempstring2,endID) )
{
i+=lineInc+1;
break;
}
else
{
tempIDString+=tempstring2;
lineInc++;
}
}
id.push_back(tempIDString);
}
}
std::cout << id.at(0) << " : " << id[1] << std::endl;
return 0;
}
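The question is 2.5 years old, and by now you may laugh when you see it. The endID pattern [^a-z]|[^A-Z]|[^0-9] matches every single character, because any character falls outside at least one of the three classes (a lowercase letter, for example, is not in A-Z). The inner for loop therefore hits break; on the very first character it checks, and tempstring2 is never appended to tempIDString.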
But let's forget about that code. There is no good design here.
You had a good idea in using std::regex, but you did not know how it works.
So let's have a look at a correct implementation:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator> // std::ostream_iterator
#include <vector>
#include <regex>
// Our test data (raw string). So, containing also \n and so on
std::string testData(
R"#( :-) IDcorrect1 _wrongID I2DCorrect
3FALSE lowercasecorrect Underscore_not_allowed
i3DCorrect,i4 :-)
}
)#");
std::regex re("(\\b[a-zA-Z][a-zA-Z0-9]*\\b)");
int main(void)
{
// Define the variable id as vector of string and use the range constructor to read the test data and tokenize it
std::vector<std::string> id{ std::sregex_token_iterator(testData.begin(), testData.end(), re, 1), std::sregex_token_iterator() };
// For debug output. Print complete vector to std::cout
std::copy(id.begin(), id.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
This does all the work in the definition of the variable and by calling the range constructor. So, a typical one-liner.
Hope somebody can learn from this code . . .
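For comparison with the question's endID pattern, here is a small sketch of why its inner loop stopped after one character. The function names are hypothetical, and the single negated class in ends_id_fixed is an assumed correction, not something taken from the answer above.

```cpp
#include <regex>
#include <string>

// Pattern from the question: every character lies outside at least one of
// the three classes, so this matches any one-character string.
bool ends_id_original(char c) {
    static const std::regex endID("[^a-z]|[^A-Z]|[^0-9]");
    return std::regex_match(std::string(1, c), endID);
}

// Assumed fix: one negated class, so only characters that are neither
// letters nor digits end the ID.
bool ends_id_fixed(char c) {
    static const std::regex endID("[^a-zA-Z0-9]");
    return std::regex_match(std::string(1, c), endID);
}
```

With the original pattern, ends_id_original('b') is true, so the scan breaks immediately; with the assumed fix, letters and digits continue the ID and a space ends it.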

How to change each word in a string vector to upper case

I am trying to read a sequence of words and store them in a vector, then change each word in the vector to uppercase and print the output eight words to a line. I think my code is either slow or running infinitely, as I can't seem to get any output.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
string word;
vector<string> text;
while (getline(cin, word)) {
text.push_back(word);
}
for (auto index = text.begin(); index != text.end(); ++index) {
for ( auto it = word.begin(); it != word.end(); ++it)
*it = toupper(*it);
/*cout<< index << " " << endl;*/
}
for (decltype(text.size()) i = 0; i != 8; i++)
cout << text[i] << endl;
return 0;
}
At least as far as I can tell, the idea here is to ignore the existing line structure, and write out 8 words per line, regardless of line breaks in the input data. Assuming that's correct, I'd start by just reading words from the input, paying no attention to the existing line breaks.
From there, it's a matter of capitalizing the words, writing them out, and (if you're at a multiple of 8) writing a new-line.
I would also use standard algorithms for most of the work, instead of writing my own loops to do the parts such as reading and writing the data. Since the pattern is basically just reading a word, modifying it, then writing out the result, it fits nicely with the std::transform algorithm.
Code to do that could look something like this:
#include <string>
#include <iostream>
#include <algorithm>
#include <cctype> // toupper
#include <iterator> // std::istream_iterator, std::ostream_iterator
std::string to_upper(std::string in) {
for (auto &ch : in)
ch = toupper((unsigned char) ch);
return in;
}
int main() {
int count = 0;
std::transform(
std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout),
[&](std::string const &in) {
char sep = (++count % 8 == 0) ? '\n' : ' ';
return to_upper(in) + sep;
});
}
We could implement capitalizing each string as a second lambda, nested inside the first, but IMO, that starts to become a bit unreadable. Likewise, we could use std::transform to implement the upper-case transformation inside of to_upper.
I'll rewrite my answer here:
Your outer for loop defines index to cycle through text, but you never use index inside it. The inner loop uses word, but word is still the last one the user entered. You should change the inner loop so that it uses index instead of word, like this:
for ( auto it = index->begin(); it != index->end(); ++it)
This is effectively an infinite loop:
while (getline(cin, word)) {
text.push_back(word);
}
getline(cin, word) reads a line (ending in '\n') from stdin, and puts it into word. It then returns cin itself (which will evaluate to true if the read was successful). You seem to be using it to get a space-delimited word, rather than a whole line, but that's not what it does. Since you put it in the condition of the while, after you enter a line, it will wait for another line.
This loop only breaks when getline fails. For example, by hitting an End of File character. I expect you're using the console and pressing Enter. In that case, you are never causing getline to fail. (If you're feeding a file into stdin, it should work.)
The typical solution to this is to have some sort of way of indicating a stop (such as an "Enter an empty line to stop" or "Write \"STOP\" to stop", and then checking for that before inserting the line into the vector). For you, the solution is to read in a SINGLE line, and then break it up into words (for example, using the sstream library).
You can detect whether the program is doing actual work (rather than waiting for more input) by viewing your CPU use. In Windows, this is CTRL+SHIFT+ESC -> Performance, and -> Processes to see your program in particular. You will find that the program isn't actually using the CPU (because it's waiting for more input).
You should try inserting print statements into your program to figure out where it gets up to. You will find it never goes past the for-loop.
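The suggestion to read a single line and break it into words with the sstream library could look roughly like the sketch below. The helper names split_words and upper_eight_per_line are invented for illustration, and the eight-words-per-line output follows the question's stated goal.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into whitespace-separated words.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w; )
        words.push_back(w);
    return words;
}

// Hypothetical helper: upper-cases each word and joins them,
// ending the output line after every eighth word.
std::string upper_eight_per_line(const std::string& line) {
    std::string out;
    int count = 0;
    for (std::string w : split_words(line)) {
        for (char& ch : w)
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        out += w;
        out += (++count % 8 == 0) ? '\n' : ' ';
    }
    return out;
}
```

In a program, one call to std::getline(std::cin, line) followed by std::cout << upper_eight_per_line(line) would avoid the loop that waits for more input.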
Short Answer
for (string &str : vec)
{
transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}
Complete working code as example:
#include <iostream>
#include <string>
#include <vector>
#include <cctype>
#include <algorithm>
using namespace std;
int main()
{
vector<string> vec;
string str;
while (cin >> str)
{
vec.push_back(str);
}
for (string &str : vec)
{
transform(str.begin(), str.end(), str.begin(), [](unsigned char c)
{ return static_cast<char>(toupper(c)); });
}
for (auto str : vec)
{
cout << str << endl;
}
return 0;
}

How to upper-case first letter

How can I write a C program that reads your first and last names and then converts them to upper-case and lower-case letters? I know how to convert letters to upper and lower case, but I don't know how to do it for first and last names. Any suggestions?
#include<iostream>
#include<string.h>
using namespace std;
int i;
char s[255];
int main()
{
cin.get(s,255,'\n');
int l=strlen(s);
for(i=0;i<l;i++)
......................................
cout<<s; cin.get();
cin.get();
return 0;
}
You can read the first and last names directly into std::string's. There is no reason to manage the buffers yourself or guess what size they will or should be. This can be done with something like this
std::string first, last;
// Read in the first and last name.
std::cin >> first >> last;
You will want to convert the string to upper/lower case based on your requirements. This can be done with std::toupper and std::tolower which are available in the C++ Standard Library. Just include <cctype> and they are available. There are several ways to do this but one easy way is to convert the entire string to lower case then convert the first character to upper case.
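// set all characters to lowercase (needs <algorithm>; the lambda avoids the ambiguous overloads of std::tolower)
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
// Set the first character to upper case.
str[0] = static_cast<std::string::value_type>(std::toupper(static_cast<unsigned char>(str[0])));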
Putting this all together you get something that looks a little like this
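#include <algorithm> // std::transform
#include <iostream>
#include <string>
#include <cctype>
void capitalize(std::string& str)
{
// only convert if the string is not empty
if (!str.empty())
{
// set all characters to lowercase; the lambda avoids the ambiguous overloads of std::tolower
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
// Set the first character to upper case.
str[0] = static_cast<std::string::value_type>(std::toupper(static_cast<unsigned char>(str[0])));
}
}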
int main()
{
std::string first, last;
// Read in the first and last name.
std::cin >> first >> last;
// let's capitalize them.
capitalize(first);
capitalize(last);
// Send them to the console!
std::cout << first << " " << last << std::endl;
}
Note: Including statements like using namespace std; is considered bad form as it pulls everything from the std namespace into the current scope. Avoid it as much as possible. If your professor/teacher/instructor uses it they should be chastised and forced to watch the movie Hackers until the end of time.
Since you are using C++, you should use std::string instead of a char array, and getline() does exactly what you want.
#include <iostream>
#include <string>
int main()
{
std::string first, last;
while (std::getline(std::cin, first, ' '))
{
std::getline(std::cin, last);
//Convert to upper, lower, whatever
}
}
You can leave out the loop if you only want it to get one set of input per run. The third parameter of getline() is a delimiter, which tells the function to stop reading when it reaches that character. It is \n by default, so you don't need to include it if you want to read the rest of the line.
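To show the delimiter behaviour in isolation, here is a small sketch that reads from a std::istringstream instead of std::cin. The helper name split_name is hypothetical, and treating everything after the first space as the last name is an assumption about the input.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: splits "First Last" at the first space.
std::pair<std::string, std::string> split_name(const std::string& input) {
    std::istringstream in(input);
    std::string first, last;
    std::getline(in, first, ' ');  // stops at the first space and discards it
    std::getline(in, last);        // default delimiter '\n': reads the rest of the line
    return {first, last};
}
```

Because the second call uses the default delimiter, a name such as "Mary Ann Smith" ends up with "Ann Smith" as the second part.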

Anagram solver in C++

I am working on a project for school and I am stuck on what I believe is just a small part, but I can't figure it out.
Here is what I have so far:
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <vector>
#include <algorithm>
#include <set>
using namespace std;
int main(int argc, char* argv[])
{
set<string> setwords;
ifstream infile;
infile.open("words.txt"); //reads file "words.txt"
string word = argv[1]; // input from command line
transform(word.begin(), word.end(), word.begin(), tolower); // transforms word to lower case.
sort(word.begin(), word.end()); // sorts the word
vector<string> str; // vector to hold all variations of the word
do {
str.push_back(word);
}
while (next_permutation(word.begin(), word.end())); // pushes all permutations of "word" to vector str
if (!infile.eof())
{
string items;
infile >> items;
setwords.insert(items); //stores set of words from file
}
system("PAUSE");
return 0;
}
Now I need to compare the words from the file and the permutations stored in vector str and print out the ones that are real words.
I know I need to use the find method of the set class. I am just not sure how to go about that. I was trying something like this with no luck, but my thought process is probably wrong.
for (unsigned int i = 0; i < str.size(); i++)
if (setwords.find(word) == str[i])
cout << str[i] << endl;
If you guys could help or point me in the right direction I would greatly appreciate it.
First, I'd like to say that this is a well-asked question. I appreciate new users that take the time to articulate their problem in detail.
The problem is that the find() method of a std::set<> returns an iterator object pointing to the value that it finds, or the end() of the container if it can't. When you compare it with str[i] (a string) it can't find a suitable overload of operator==() that takes both the iterator and a string.
Instead of making a full-on comparison with the string, you can instead compare the return value with end() to determine if it found the string:
if (setwords.find(str[i]) != setwords.end())
// ^^^^^^ ^^^^^^^^^^^^^^
If the expression evaluates to true, then it successfully found the string inside the set.
There's also another potential problem I'd like to address in your code. Using if (!infile.eof()) is the wrong way to condition your input, and it reads only one word. You should instead make the extraction part of the loop condition, like this:
for (std::string item; infile >> item; )
{
setwords.insert(item);
}
Here's another way, using std::istream_iterator<>:
setwords.insert(std::istream_iterator<std::string>(infile),
std::istream_iterator<std::string>());
You actually are really close to having it right.
The set::find method doesn't return the value if it is found in the set, but rather an iterator object that points to the value. So your if statement is comparing the current string to the returned iterator object instead of the value that the iterator points to.
To get the value that an iterator points to, you just have to dereference it like you would a pointer, by prefixing it with an asterisk. Which means that you probably intended your if statement to look like this:
if (*(setwords.find(word)) == str[i])
This would work for cases where the value was found in the set, but would be problematic for cases where the value was not found. If the value is not found, an iterator that points to the position after the last item in the set is returned - and you shouldn't try to dereference such an iterator (because it doesn't point to a valid object).
The way these checks are usually conducted is by comparing the returned iterator with the iterator that points to the end of the set (e.g., set::end, in this case). If the iterators do not match, that means the item was found.
if (setwords.find(word) != setwords.end())
cout << word << endl;
I think you need to write something like this:
for (unsigned int i = 0; i < str.size(); i++)
if (setwords.find(str[i]) != setwords.end())
cout << str[i] << endl;
But I think you don't need to store all the permutations. You can store the dictionary words keyed by their sorted letters and compare against the sorted input word.
Here is a simpler solution:
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <vector>
#include <algorithm>
#include <map>
using namespace std;
int main(int argc, char* argv[])
{
map<string, string> mapwords;
ifstream infile;
infile.open("words.txt"); //reads file "words.txt"
string word = argv[1]; // input from command line
transform(word.begin(), word.end(), word.begin(), ::tolower); // transforms word to lower case; :: avoids the ambiguous std overloads
sort(word.begin(), word.end()); // sorts the word
for (string item; infile >> item; ) // reads every word in the file, not just the first
{
string sorted_item = item;
sort(sorted_item.begin(), sorted_item.end()); // sorts the letters of the word
mapwords.insert(make_pair(sorted_item, item)); // keyed by sorted letters; a map keeps only the first word per key
}
map<string, string>::iterator i = mapwords.find(word);
if(i != mapwords.end())
cout << i->second << endl;
system("PAUSE");
return 0;
}
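The sorted-letters idea can also be sketched so that several anagrams share one key, which a plain map<string, string> cannot hold. The helper names anagram_key, build_index, and find_anagrams are hypothetical, and passing the dictionary as a vector instead of reading words.txt is an assumption made to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: lower-cases a word and sorts its letters to form a lookup key.
std::string anagram_key(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::sort(word.begin(), word.end());
    return word;
}

// Groups dictionary words by key so that all anagrams share one entry.
std::map<std::string, std::vector<std::string>>
build_index(const std::vector<std::string>& dictionary) {
    std::map<std::string, std::vector<std::string>> index;
    for (const std::string& w : dictionary)
        index[anagram_key(w)].push_back(w);
    return index;
}

// Returns every dictionary word that is an anagram of `word` (empty if none).
std::vector<std::string> find_anagrams(
    const std::map<std::string, std::vector<std::string>>& index,
    const std::string& word) {
    auto it = index.find(anagram_key(word));
    return it == index.end() ? std::vector<std::string>{} : it->second;
}
```

The lookup costs one sort of the input word plus one map search, instead of generating and testing every permutation.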

Create a new string from prefix up to character position in C++

How can I find the position of a character in a string? Ex. If I input "abc*ab" I would like to create a new string with just "abc". Can you help me with my problem?
C++ standard string provides a find method:
s.find(c)
returns the position of the first instance of character c in string s, or std::string::npos if the character is not present at all. You can also pass the starting index for the search; i.e.
s.find(c, x0)
will return the first index of character c but starting the search from position x0.
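As a brief sketch of the two-argument form, the loop below restarts each search one position past the previous hit to collect every occurrence. The helper name find_all is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the index of every occurrence of c in s.
std::vector<std::size_t> find_all(const std::string& s, char c) {
    std::vector<std::size_t> positions;
    for (std::size_t pos = s.find(c); pos != std::string::npos; pos = s.find(c, pos + 1))
        positions.push_back(pos);
    return positions;
}
```

The loop ends when find returns std::string::npos, which also covers the case where the character never appears.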
std::find returns an iterator to the first element it finds that compares equal to what you're looking for (or the second argument if it doesn't find anything, in this case the end iterator.) You can construct a std::string using iterators.
#include <iostream>
#include <string>
#include <algorithm>
int main()
{
std::string s = "abc*ab";
std::string s2(s.begin(), std::find(s.begin(), s.end(), '*'));
std::cout << s2;
return 0;
}
If you are working with std::string type, then it is very easy to find the position of a character, by using std::find algorithm like so:
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
string first_string = "abc*ab";
string truncated_string = string( first_string.cbegin(), find( first_string.cbegin(), first_string.cend(), '*' ) );
cout << truncated_string << endl;
}
Note: if your character is found multiple times in your std::string, then the find algorithm will return the position of the first occurrence.
Elaborating on existing answers, you can use string.find() and string.substr():
#include <iostream>
#include <string>
int main() {
std::string s = "abc*ab";
size_t index = s.find("*");
if (index != std::string::npos) {
std::string prefix = s.substr(0, index);
std::cout << prefix << "\n"; // => abc
}
}