Extracting integers from strings in C++ with arbitrary structure - c++

This seems like a question that should be easy to search for, but any answers out there seem to be drowned out by a sea of questions asking the more common problem of converting a string to an integer.
My question is: what's an easy way to extract integers from std::strings that might look like "abcd451efg" or "hel.lo42-world!" or "hide num134rs here?" I see that I can use isdigit to manually parse the strings myself, but I'm wondering if there is a more standard way in the vein of atoi or stoi, etc.
The outputs above would be 451, 42, and 134. We can also assume there is only one integer in a string (although a general solution wouldn't hurt). So we don't have to worry about strings like "abc123def456".
Java has an easy solution in the form of
Integer.parseInt(str.replaceAll("[\\D]", ""));
does C++ have something as straightforward?

You can use
string::find_first_of("0123456789") to get the position of the first digit, then string::find_last_of("0123456789") to get the position of the last digit, and finally use an atoi on the substring defined by the two positions. I cannot think of anything simpler (without regex).
BTW, this works only when you have a single number inside the string.
Here is an example:
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int main()
{
string s = "testing;lasfkj358kdfj-?gt";
size_t begin = s.find_first_of("0123456789");
size_t end = s.find_last_of("0123456789");
string num = s.substr(begin, end - begin + 1);
int result = atoi(num.c_str());
cout << result << endl;
}
If you have more than 1 number, you can combine string::find_first_of with string::find_first_not_of to get the beginning and the end of each number inside the string.
This code is the general solution:
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int main()
{
string s = "testing;lasfkj358kd46fj-?gt"; // 2 numbers, 358 and 46
size_t begin = 0, end = 0;
while(end != std::string::npos)
{
begin = s.find_first_of("0123456789", end);
if(begin == std::string::npos) // no more digits: stop, otherwise the loop never ends
break;
end = s.find_first_not_of("0123456789", begin);
string num = s.substr(begin, end - begin); // if end is npos, substr takes the rest of the string
int number = atoi(num.c_str());
cout << number << endl;
}
}
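If regex is acceptable, the general case can also be sketched with std::regex from C++11. This is only a rough sketch: extract_integers is a made-up helper name, and it assumes that "integer" means a run of decimal digits with no sign.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every maximal run of decimal digits in s, in order.
std::vector<long long> extract_integers(const std::string& s)
{
    static const std::regex digits("[0-9]+");
    std::vector<long long> out;
    for (std::sregex_iterator it(s.begin(), s.end(), digits), end; it != end; ++it)
        out.push_back(std::stoll(it->str())); // throws std::out_of_range if the run is too long
    return out;
}
```

Calling extract_integers("testing;lasfkj358kd46fj-?gt") should give a vector holding 358 and 46, and a string without digits gives an empty vector.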

atoi can extract numbers from strings even if there are trailing non-digits
int getnum(const char* str)
{
for(; *str != '\0'; ++str)
{
if(*str >= '0' && *str <= '9')
return atoi(str);
}
return YOURFAILURENUMBER;
}
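One caveat with atoi is that it returns 0 when nothing can be converted, so a failure value has to be chosen by hand. A possible variation, sketched here with std::strtol, reports success through the return value instead. The name find_number and the bool/out-parameter interface are assumptions for illustration only.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical variant of getnum: returns true and stores the value in out
// if a run of digits is found and fits in a long, false otherwise.
bool find_number(const char* str, long& out)
{
    for (; *str != '\0'; ++str)
    {
        if (*str >= '0' && *str <= '9')
        {
            errno = 0;
            out = std::strtol(str, nullptr, 10); // stops at the first non-digit
            return errno != ERANGE;              // false if the value overflowed
        }
    }
    return false; // no digit anywhere in the string
}
```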

Here's one way
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
int main(int, char* argv[])
{
std::string input(argv[1]);
input.erase(
std::remove_if(input.begin(), input.end(),
[](char c) { return !isdigit(c, std::locale()); }),
input.end()
);
std::cout << std::stoll(input) << '\n';
}
You could also use the <functional> library to create a predicate
auto notdigit = not1(
std::function<bool(char)>(
bind(std::isdigit<char>, std::placeholders::_1, std::locale())
)
);
input.erase(
std::remove_if(input.begin(), input.end(), notdigit),
input.end()
);
It's worth pointing out that the other two answers so far hard-code the digit check. Using the locale version of isdigit means your program recognizes digits according to the current global locale.

Related

How to delete part of a string c++ [duplicate]

I got a string and I want to remove all the punctuations from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I cant seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}
Using algorithm remove_copy_if :-
string text,result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), //Store output
std::ptr_fun<int, int>(&std::ispunct)
);
POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't, because it has no access to the container itself. Therefore, there are junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that marks the end of the part of the string that's still needed. This can be used with string's erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
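To illustrate that the idiom is not specific to strings, here is a minimal sketch of the same pattern on a std::vector<int>. The helper name drop_negatives and the predicate are made up for the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: removes every negative value from v in place.
void drop_negatives(std::vector<int>& v)
{
    // remove_if packs the kept elements at the front and returns the new logical end;
    // erase then shrinks the container to that size.
    v.erase(std::remove_if(v.begin(), v.end(), [](int x) { return x < 0; }), v.end());
}
```

The relative order of the elements that are kept is preserved.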
#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.
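ispunct takes a single char value, not a string.
You can do something like
const std::string copy = text; // iterate over a copy, because erase changes text
for (char c : copy)
if (ispunct(static_cast<unsigned char>(c))) text.erase(text.find(c), 1); // erase exactly one character
This will work but it is a slow algorithm.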
Pretty good answer by Steve314.
I would like to add a small change :
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding :: before ispunct selects the function in the global namespace, so the compiler does not have to choose between it and the std::ispunct overload from <locale>.
The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "this. is my string. it's here.";
transform(str.begin(), str.end(), str.begin(), [](char ch)
{
if( ispunct(ch) )
return '\0';
return ch;
});
}
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation's: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}
Another way you could do this would be as follows:
#include <cctype> // needed for ispunct()
#include <string>
std::string onlyLetters(const std::string& str){
std::string retStr;
for(std::string::size_type i = 0; i < str.length(); i++){
if(!std::ispunct(static_cast<unsigned char>(str[i]))){
retStr += str[i];
}
}
return retStr;
}
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than using some of the more complex built in functions.
I tried to apply @Steve314's answer but couldn't get it to work until I came across this note on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I am able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
std::string result;
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}
Try this one; it removes all the punctuation from the string.
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());
please reply if helpful
i got it.
size_t found = text.find('.');
text.erase(found, 1);

Splitting a string with isalnum and store into a vector of strings

I'm working with a string and trying to break it up whenever it is non-alphanumeric (not a-z, A-Z, and 0-9). I found isalnum to be a useful function to use.
For example, if I have the string "bob-michael !#mi%#pa hi3llary-tru1mp"
The vector should contain: bob, michael, mi, pa, hi3llary, and tru1mp.
My current code is:
vector<string> result;
string something = "bob-michael !#mi%#pa hi3llary-tru1mp";
stringstream pie(something);
//not sure what to do after this point(I know, not a lot. See below for my current thinking)
My idea was using a loop and while isalnum results in 1 continue forward, if isalnum results in 0 then push whatever I have so far into the vector of strings. Perhaps I could use isalnum as a delim? I'm having a hard time taking my idea and writing this. Could anyone point me in the right direction? Thanks!
Edit: Thanks everyone for the help.
Something along these lines, perhaps:
std::vector<std::string> result;
std::string something = "bob-michael !#mi%#pa hi3llary-tru1mp";
std::regex token("[A-Za-z0-9]+");
std::copy(
std::sregex_token_iterator(something.begin(), something.end(), token),
std::sregex_token_iterator(),
std::back_inserter(result));
Demo
The std::replace_if trick I commented on turned out to not be quite as trivial as I thought it was because std::isalnum doesn't return bool.
#include <iostream>
#include <vector>
#include <string>
#include <cctype>
#include <algorithm>
#include <sstream>
#include <iterator>
int main()
{
std::vector<std::string> result;
std::string something = "bob-michael !#mi%#pa hi3llary-tru1mp";
// I expected replace_if(something.begin(), something.end(), &isalnum, " ");
// would work, but then I did a bit of reading and found is alnum returned int,
// not bool. resolving this by wrapping isalnum in a lambda
std::replace_if(something.begin(),
something.end(),
[](char val)->bool {
return std::isalnum(val) == 0;
},
' ');
std::stringstream pie(something);
// read stream into vector
std::copy(std::istream_iterator<std::string>(pie),
std::istream_iterator<std::string>(),
std::back_inserter<std::vector<std::string>>(result));
// prove it works
for(const std::string & str: result)
{
std::cout << str << std::endl;
}
}
You can also iterate through the string and then check if the current index is a letter or not, then if not break it then store to vector
std::string something = "bob-michael !#mi%#pa hi3llary-tru1mp";
std::vector<std::string> result;
std::string newResult = "";
for ( int a = 0; a < something.size(); a++ )
{
if((something[a] >= 'a' && something[a] <= 'z')||(something[a] >= 'A' && something[a] <= 'Z')
|| (something[a] >= '0' && something[a] <= '9'))
{
newResult += something[a];
}
else
{
if(newResult.size() > 0)
{
result.push_back(newResult);
newResult = "";
}
}
}
if(newResult.size() > 0) // avoid pushing an empty token when the string ends with a separator
result.push_back(newResult);

Remove character from array where spaces and punctuation marks are found [duplicate]

This question already has answers here:
C++ Remove punctuation from String
(12 answers)
Closed 9 years ago.
In my program, I am checking a whole C string; if any spaces or punctuation marks are found, I want to put an empty character at that location, but the compiler gives me an error: empty character constant.
Please help me out. In my loop I am checking like this:
if(ispunct(str1[start])) {
str1[start]=''; // << empty character constant.
}
if(isspace(str1[start])) {
str1[start]=''; // << empty character constant.
}
This is where my errors are; please correct me.
For example, if the word is str,, ing, the output should be string.
There is no such thing as an empty character.
If you mean a space then change '' to ' ' (with a space in it).
If you mean NUL then change it to '\0'.
Edit: the answer is no longer relevant now that the OP has edited the question. Leaving up for posterity's sake.
If you're wanting to add a null character, use '\0'. If you're wanting to use a different character, using the appropriate character for that. You can't assign it nothing. That's meaningless. That's like saying
int myHexInt = 0x;
or
long long myIndeger = L;
The compiler will error. Put in the value you wanted. In the char case, that's a value from 0 to 255.
UPDATE:
From the edit to OP's question, it's apparent that he/she wanted to trim a string of punctuation and space characters.
As detailed in the flagged possible duplicate, one way is to use remove_copy_if:
string test = "THisisa test;;';';';";
string temp, finalresult;
remove_copy_if(test.begin(), test.end(), std::back_inserter(temp), ptr_fun<int, int>(&ispunct));
remove_copy_if(temp.begin(), temp.end(), std::back_inserter(finalresult), ptr_fun<int, int>(&isspace));
ORIGINAL
Examining your question, replacing spaces with spaces is redundant, so you really need to figure out how to replace punctuation characters with spaces. You can do so using a comparison function (by wrapping std::ispunct) in tandem with std::replace_if from the STL:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
using namespace std;
bool is_punct(const char& c) {
return ispunct(c);
}
int main() {
string test = "THisisa test;;';';';";
char test2[] = "THisisa test;;';';'; another";
size_t size = sizeof(test2)/sizeof(test2[0]);
replace_if(test.begin(), test.end(), is_punct, ' ');//for C++ strings
replace_if(&test2[0], &test2[size-1], is_punct, ' ');//for c-strings
cout << test << endl;
cout << test2 << endl;
}
This outputs:
THisisa test
THisisa test another
Try this (as you asked for cstring explicitly):
char str1[100] = "str,, ing";
if(ispunct(str1[start]) || isspace(str1[start])) {
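memmove(str1 + start, str1 + start + 1, strlen(str1) - start); // the regions overlap, so use memmove (from <cstring>) rather than strncpy; the count includes the terminating '\0'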
}
Well, doing this just in pure c language, there are more efficient solutions (have a look at @MichaelPlotke's answer for details).
But as you also explicitly ask for c++, I'd recommend a solution as follows:
Note you can use the standard c++ algorithms for 'plain' c-style character arrays also. You just have to place your predicate conditions for removal into a small helper functor and use it with the std::remove_if() algorithm:
struct is_char_category_in_question {
bool operator()(const char& c) const;
};
And later use it like:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
#include <cstring>
// Best chance to have the predicate elided to be inlined, when writing
// the functor like this:
struct is_char_category_in_question {
bool operator()(const char& c) const {
return std::ispunct(c) || std::isspace(c);
}
};
int main() {
static char str1[100] = "str,, ing";
size_t size = strlen(str1);
// Using std::remove_if() is likely to provide the best balance of performance
// and code size you can expect from your compiler implementation.
std::remove_if(&str1[0], &str1[size + 1], is_char_category_in_question());
// Regarding the end of the range in the statement above: note we have to add 1
// to the size calculated by strlen(), so that the closing `\0` character of the
// c-style string is copied as well and terminates the result!
std::cout << str1 << std::endl; // Prints: string
}
See this compilable and working sample also here.
As I don't like the accepted answer, here's mine:
#include <stdio.h>
#include <string.h>
#include <cctype>
int main() {
char str[100] = "str,, ing";
int bad = 0;
int cur = 0;
while (str[cur] != '\0') {
if (bad < cur && !ispunct(str[cur]) && !isspace(str[cur])) {
str[bad] = str[cur];
}
if (ispunct(str[cur]) || isspace(str[cur])) {
cur++;
}
else {
cur++;
bad++;
}
}
str[bad] = '\0';
fprintf(stdout, "cur = %d; bad = %d; str = %s\n", cur, bad, str);
return 0;
}
Which outputs cur = 9; bad = 6; str = string
This has the advantage of being more efficient and more readable, hm, well, in a style I happen to like better (see comments for a lengthy debate / explanation).

Splitting a string into multiple strings with multiple delimiters without removing?

I use boost framework, so it could be helpful, but I haven't found a necessary function.
For usual fast splitting I can use:
string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("mM"));
but it removes m and M characters.
I also can't simply use regexp because it searches the string for the longest value which meets a defined pattern.
P.S. There are a lot of similar questions, but they describe this implementation in other programming languages only.
Untested, but rather than using vector<string>, you could try a vector<boost::iterator_range<std::string::iterator>> (so you get a pair of iterators into the main string for each token). Then iterate from (start of range - 1) to the end of the range, as long as the start of the range is not begin() of the main string.
EDIT: Here is an example:
#include <iostream>
#include <iterator> // std::prev
#include <string>
#include <vector>
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/range/iterator_range.hpp>
int main(void)
{
std::string str = "FooMBarMSFM";
std::vector<boost::iterator_range<std::string::iterator>> tokens;
boost::split(tokens, str, boost::is_any_of("mM"));
for(auto r : tokens)
{
std::string b(r.begin(), r.end());
std::cout << b << std::endl;
if (r.begin() != str.begin())
{
std::string bm(std::prev(r.begin()), r.end());
std::cout << "With token: [" << bm << "]" << std::endl;
}
}
}
What you need goes beyond what split is designed for. If you want to keep 'm' or 'M', you could write a special split using strstr, strchr, strtok or a find function. You could adapt some existing code to produce a flexible split function.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void split(char *src, const char *separator, char **dest, int *num)
{
char *pNext;
int count = 0;
if (src == NULL || strlen(src) == 0) return;
if (separator == NULL || strlen(separator) == 0) return;
pNext = strtok(src,separator);
while(pNext != NULL)
{
*dest++ = pNext;
++count;
pNext = strtok(NULL,separator);
}
*num = count;
}
Besides, you could try boost::regex.
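As a rough sketch of the "find function" route, a split that keeps each delimiter at the front of the piece it starts could look like the following. The name split_keep and its exact behaviour (a delimiter begins a new piece) are assumptions chosen to match the output the question describes, not an existing library function.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s before every character found in delims,
// so each delimiter stays at the front of the piece it starts.
std::vector<std::string> split_keep(const std::string& s, const std::string& delims)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start < s.size())
    {
        // Search from start + 1 so that a delimiter at 'start' stays in this piece.
        std::string::size_type next = s.find_first_of(delims, start + 1);
        parts.push_back(s.substr(start, next - start)); // next == npos takes the rest
        start = next;
    }
    return parts;
}
```

With the sample string from the other answer, split_keep("FooMBarMSFM", "mM") should yield Foo, MBar, MSF and M.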
My current solution is the following (but it is not universal and looks too complex).
I choose one character which couldn't appear in this string. In my case it is '|'.
string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("m"));
str = boost::join(strs, "|m");
boost::split(strs, str, boost::is_any_of("M"));
str = boost::join(strs, "|M");
if (boost::iequals(str.substr(0, 1), "|")) {
str = str.substr(1);
}
boost::split(strs, str, boost::is_any_of("|"));
I add "|" before each m/M symbol, except at the very first position in the string. Then I split the string into substrings, which removes this extra character.

Parse string to be number or number and percent symbol

In C++, better without Boost library, how to make sure that the std::string str contains either a number or a number followed by '%' sign? If it does not belong to these two cases an error should be issued.
#include <iostream>
#include <string>
#include <algorithm>
#include <ctype.h>
bool is_a_bad_char(char c) {
return !(isdigit(c) || (c=='%'));
}
int main() {
std::string str = "123123%4141219";
if (std::find_if(str.begin(), str.end(), is_a_bad_char) != str.end()) {
std::cout << "error" << std::endl;
return 1;
}
return 0;
}
The easiest solution is probably to convert the string (using strtol
or strtod, depending on what type of number you expect), then look at
the following character. Something like:
(EDITED to correct error handling):
bool
isNumberOrPercent( std::string const& value )
{
char* end; // strtod expects a char**, so end cannot point to const
errno = 0;
strtod( value.c_str(), &end );
return errno == 0
&& (*end == '%' ? end + 1 : end) - value.c_str() == value.size(); // compare with ==, not =
}
find_first_not_of with all the digits and %
If the above returns npos, then check the last character is %.
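A minimal sketch of this idea follows. It assumes that "number" means a non-empty run of decimal digits (no sign, no decimal point), and it searches for the first non-digit only, so that a '%' in the middle of the string is rejected. The function name is made up for the example.

```cpp
#include <string>

// Hypothetical helper: true for "123" or "123%", false otherwise.
// Assumes a number is a non-empty run of decimal digits.
bool digits_then_optional_percent(const std::string& s)
{
    const std::string::size_type pos = s.find_first_not_of("0123456789");
    if (pos == 0)
        return false;                            // no leading digit, e.g. "%" or "x1"
    if (pos == std::string::npos)
        return !s.empty();                       // digits only (the empty string is rejected)
    return s[pos] == '%' && pos + 1 == s.size(); // a single '%' as the last character
}
```

The string from the question, "123123%4141219", is rejected because the '%' is not the last character.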
Not very C++-ish, but something like this would do it:
#include <cstdlib>
#include <cstring>
#include <string>
bool checkformat(const std::string &s) {
const char *begin = s.c_str();
char *end;
double val = std::strtod(begin, &end);
if (end == begin) return false;
if (*end == '%') ++end;
return (end - begin == s.size());
}
Be aware that strtod skips initial whitespace, so if you don't want to accept a string with initial whitespace then you'd need to separately reject that. It also accepts "NAN", "INF", "INFINITY" (all case-insensitive), and each of those things preceded by + or -, and in the case of "NAN" optionally followed by some implementation-defined characters to indicate which NaN value it represents. Arguably "INF" is a number, but by definition "NAN" isn't, so you'd want to return false if val != val and possibly also check for infinities.
[Edit: I think I've fixed the issues James raises below, except that " " and " %" are still in dispute. And then he added overflow to the mix. Between his answer and mine, you should get the idea -- first decide how you want to treat each edge case, then code it.]
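To make the edge cases concrete, here is one possible stricter variant of checkformat. The policy it encodes (reject leading whitespace, NaN and infinities) is just one set of choices, and checkformat_strict is a hypothetical name; pick the rules that fit your input.

```cpp
#include <cctype>
#include <cmath>
#include <cstdlib>
#include <string>

// Hypothetical stricter variant of checkformat. The policy choices
// (no leading whitespace, no NaN, no infinity) are assumptions.
bool checkformat_strict(const std::string& s)
{
    if (s.empty() || std::isspace(static_cast<unsigned char>(s[0])))
        return false;                       // strtod would silently skip leading whitespace
    const char* begin = s.c_str();
    char* end = nullptr;
    const double val = std::strtod(begin, &end);
    if (end == begin) return false;         // nothing was converted
    if (!std::isfinite(val)) return false;  // rejects NaN and infinities
    if (*end == '%') ++end;                 // allow one trailing percent sign
    return static_cast<std::string::size_type>(end - begin) == s.size();
}
```

On typical IEEE implementations an overflowing input makes strtod return HUGE_VAL, which is an infinity, so the isfinite check should reject that case as well.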