How to use wildcard for strings (matching and replacing)? - c++

I want to search a string in C++ for a sequence of letters in which ? stands for any single letter.
Take a word like abcdefgh. I'm looking for an algorithm that, given the input ?c (where ? can be any letter), finds bc, and given ?e?, finds def.
Do you have any ideas?

How about using boost::regex? Or std::regex, if your compiler supports C++11.
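A minimal sketch of that idea with std::regex, assuming ? should match any single character. The hypothetical helper wildcard_to_regex translates ? into the regex wildcard . and escapes every other metacharacter, and the hypothetical find_wildcard returns the first matching substring. Compile with -DDEMO to run the small demo.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Turn a pattern in which '?' means "any single character" into an
// ECMAScript regex: '?' becomes '.', and every other regex metacharacter
// is escaped so that it matches literally.
std::string wildcard_to_regex(const std::string &pat) {
    const std::string special = "\\^$.|?*+()[]{}";
    std::string re;
    for (char c : pat) {
        if (c == '?') {
            re += '.';
        } else {
            if (special.find(c) != std::string::npos)
                re += '\\';
            re += c;
        }
    }
    return re;
}

// Return the first substring of text that matches the wildcard pattern,
// or an empty string if there is no match.
std::string find_wildcard(const std::string &pat, const std::string &text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(wildcard_to_regex(pat))))
        return m.str();
    return "";
}

#ifdef DEMO
int main() {
    assert(find_wildcard("?c", "abcdefgh") == "bc");
    assert(find_wildcard("?e?", "abcdefgh") == "def");
}
#endif
```

The escaping step matters: without it, a pattern containing . or ( would be interpreted as regex syntax rather than as literal text.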

If you just want to support ?, that's pretty easy: when you encounter a ? in the pattern, just skip ahead over one byte of input (or check for isalpha, if you really meant you only want to match letters).
Edit: Assuming the more complex problem (finding a match starting at any position in the input string), you could use code something like this:
#include <string>
// Returns the first index in target at which pat matches ('?' matches any
// single character), or std::string::npos if there is no match.
size_t match(std::string const &pat, std::string const &target) {
    if (pat.size() > target.size())
        return std::string::npos;
    size_t max = target.size()-pat.size()+1;
    for (size_t start = 0; start < max; ++start) {
        size_t pos;
        for (pos = 0; pos < pat.size(); ++pos)
            if (pat[pos] != '?' && pat[pos] != target[start+pos])
                break;
        if (pos == pat.size())
            return start;
    }
    return std::string::npos;
}
#ifdef TEST
#include <iostream>
int main() {
    std::cout << match("??cd?", "aaaacdxyz") << "\n";
    std::cout << match("?bc", "abc") << "\n";
    std::cout << match("ab?", "abc") << "\n";
    std::cout << match("ab?", "xabc") << "\n";
    std::cout << match("?cd?", "cdx") << "\n";
    std::cout << match("??cd?", "aaaacd") << "\n";
    std::cout << match("??????", "abc") << "\n";
    return 0;
}
#endif
If you only want to signal a yes/no based on whether the whole pattern matches the whole input, you do pretty much the same thing, but with the initial test for != instead of >, and then basically remove the outer loop.
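A minimal sketch of that whole-string variant, assuming the stricter reading in which ? must match a letter (checked with std::isalpha); full_match is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Whole-string variant: true only if pat matches all of target.
// '?' is assumed to match exactly one letter (std::isalpha).
bool full_match(const std::string &pat, const std::string &target) {
    if (pat.size() != target.size())   // != instead of >, no outer loop
        return false;
    for (std::string::size_type pos = 0; pos < pat.size(); ++pos) {
        unsigned char t = static_cast<unsigned char>(target[pos]);
        if (pat[pos] == '?') {
            if (!std::isalpha(t))
                return false;
        } else if (pat[pos] != target[pos]) {
            return false;
        }
    }
    return true;
}

#ifdef DEMO
int main() {
    assert(full_match("ab?", "abc"));
    assert(!full_match("ab?", "xabc"));
}
#endif
```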

Or, if you want "wildcards" in the form you show, the term to search for is "glob" (at least on Unix-like systems).
The C-centric API is found in glob.h on Unix-like systems and consists of two calls, glob and globfree, documented in section 3 of the manual. Note that glob expands a pattern against file names on disk rather than matching an arbitrary string.
Switching to full regular expressions lets you use a more C++-style approach, as shown in the other answers.
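For matching an arbitrary string rather than file names, POSIX also provides fnmatch(3) in <fnmatch.h>, which uses the same wildcard syntax. A POSIX-only sketch (not standard C++; wildcard_search is a hypothetical name): because fnmatch matches the whole string, the pattern is wrapped in * so it can match anywhere.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>

// POSIX-only: true if `pattern` matches somewhere inside `text`.
// fnmatch() matches the WHOLE string, so '*' is added on both sides
// to let the match start and end anywhere.
bool wildcard_search(const std::string &pattern, const std::string &text) {
    std::string anywhere = "*" + pattern + "*";
    return fnmatch(anywhere.c_str(), text.c_str(), 0) == 0;
}

#ifdef DEMO
int main() {
    assert(wildcard_search("?e?", "abcdefgh"));
}
#endif
```

Unlike the regex approach, this only answers yes or no; it does not tell you where the match is.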

Related

How can I remove a newline from inside a string in C++?

I am trying to take text input from the user and compare it to a list of values in a text file. The values are these:
That line at the end is the cursor, not a straight line, but it doesn't matter. Anyway, I sort by word, produce the values, and then check them. A semicolon separates the words. All the data is deliberately simple so I can get the code working first. The important thing is that every piece of data has a newline after it. No matter what I try, I can't get rid of the newlines completely. Looking at the ASCII values shows why: my efforts remove only the newline, but not the carriage return. This is fine most of the time, but when comparing values they are never equal, because the one with the carriage return is longer. Here are the important parts of the code:
int pos = 0;
while (pos != std::string::npos)
{
    std::string look = lookContents.substr(pos+1, lookContents.find("\n", pos + 1) - pos);
    //look.erase(std::remove(look.begin(), look.end(), '\n'), look.end());
    //##
    for (int i = 0; i < look.length(); i++)
    {
        std::cout << (int)(look[i]) << " ";
    }
    std::cout << std::endl;
    std::cout << look << ", " << words[1] << std::endl;
    std::cout << look.compare(0,3,words[1]) << std::endl;
    std::cout << pos << std::endl;
    //##
    //std::cout << look << std::endl;
    if (look == words[1])
    {
        std::cout << pos << std::endl;
        break;
    }
    pos = lookContents.find("\n", pos + 1);
}
Everything between the //## markers is just debugging output. Here's what it outputs when I type look b:2
As you can see, the values have ASCII 10 and 13 at the end, which are the characters used to create newlines: 13 is carriage return and 10 is newline. The last value had its 10 removed earlier in the code so the loop doesn't run an extra time on an empty substring. My efforts to remove the newline, including the commented-out erase call, either remove only the 13, or remove both the 10 and the 13 but corrupt later data like this:
Also, you can see that printing look and words[1] with cout in the same statement makes look seem not to exist for some reason. Printing it by itself works fine, though. I realise I could work around this by using that compare function to check all but the last characters, but that feels like a temporary fix. Any solutions?
My efforts remove only the new line, but not the carriage return
Newline and carriage return are both control characters.
To remove all the control characters from the string, you can use std::remove_if along with std::iscntrl:
#include <cctype>
#include <algorithm>
//...
lookContents.erase(std::remove_if(lookContents.begin(), lookContents.end(),
                                  [](char ch)
                                  { return std::iscntrl(static_cast<unsigned char>(ch)); }),
                   lookContents.end());
Once you have all the control characters removed, then you can process the string without having to check for them.
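If you'd rather not remove tabs or other control characters that might legitimately appear in the data, a narrower alternative is to split the text into lines and drop only a trailing '\r' from each one, which is what Windows-style (CRLF) line endings leave behind. A sketch; split_lines is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text into lines, dropping a trailing '\r' from each line so that
// files with Windows (CRLF) line endings compare equal to plain strings.
std::vector<std::string> split_lines(const std::string &text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {   // getline removes the '\n'
        if (!line.empty() && line.back() == '\r')
            line.pop_back();           // remove the leftover '\r'
        lines.push_back(line);
    }
    return lines;
}

#ifdef DEMO
int main() {
    std::vector<std::string> lines = split_lines("a;b\r\nc;d\r\n");
    assert(lines.size() == 2 && lines[1] == "c;d");
}
#endif
```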

c++11 regex : check if a set of characters exist in a string

If, for example, I have the string "asdf{ asdf }",
I want to check whether the string contains any character in the set []{}().
How would I go about doing this?
I'm looking for a general solution that checks whether the string contains any of the characters in the set, so that I can add more characters to the set in the future.
Your question is unclear on whether you only want to detect if any of the characters in the search set are present in the input string, or whether you want to find all matches.
In either case, use std::regex to create the regular expression object. Because all the characters in your search set have special meanings in regular expressions, the simplest safe choice is to escape every one of them.
#include <cstring>
#include <iostream>
#include <regex>
std::regex r{R"([\[\]\{\}\(\)])"};
char const *str = "asdf{ asdf }";
If you want to only detect whether at least one match was found, use std::regex_search.
std::cmatch results;
if (std::regex_search(str, results, r)) {
    std::cout << "match found\n";
}
On the other hand, if you want to find all the matches, use std::regex_iterator.
auto first = std::cregex_iterator(str, str + std::strlen(str), r);
auto last = std::cregex_iterator();
if (first != last) std::cout << "match found\n";
while (first != last) {
    std::cout << (*first++).str() << '\n';
}
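Since you want to keep adding characters to the set, one option is to build the character class from a plain string and escape each character programmatically. A sketch, assuming that escaping every non-alphanumeric character is acceptable in std::regex's default ECMAScript mode (make_char_set_regex is a hypothetical name):

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Build a character class such as [\[\]\{\}\(\)] from an arbitrary set of
// characters. Every non-alphanumeric character is escaped, so the set can
// be extended later without worrying about regex syntax.
std::regex make_char_set_regex(const std::string &chars) {
    std::string pattern = "[";
    for (char c : chars) {
        if (!std::isalnum(static_cast<unsigned char>(c)))
            pattern += '\\';
        pattern += c;
    }
    pattern += ']';
    return std::regex(pattern);
}

#ifdef DEMO
int main() {
    assert(std::regex_search("asdf{ asdf }", make_char_set_regex("[]{}()")));
}
#endif
```

Adding a character to the search set is then just a matter of appending it to the string you pass in.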
I know you are asking about regex, but this specific problem can be solved without it by using std::string::find_first_of(), which finds the position of the first character in the string (s) that is contained in a set (g):
#include <string>
#include <iostream>
int main()
{
    std::string s = "asdf{ asdf }";
    std::string g = "[]{}()";
    // Does the string contain one of the characters?
    if (s.find_first_of(g) != std::string::npos)
        std::cout << s << " contains one of " << g << '\n';
    // Find the position of each occurrence of the characters in the string.
    for (size_t pos = 0; (pos = s.find_first_of(g, pos)) != std::string::npos; ++pos)
        std::cout << s << " contains " << s[pos] << " at " << pos << '\n';
}
OUTPUT:
asdf{ asdf } contains one of []{}()
asdf{ asdf } contains { at 4
asdf{ asdf } contains } at 11

How to make String::Find(is) omit this

If I have a list containing the 4 nodes ("this"; "test example"; "is something of"; "a small") and I want to find every string that contains "is" (there is only 1 positive in this list). This topic has been posted many times, and those posts helped me get this far. However, I can't find anywhere how to exclude "this" from a positive result. I could probably use string::c_str and then search it myself after I've reduced my much larger list. Or is there a way I could use string::find_first_of? It seems there should be a better way. Thanks.
EDIT: I know that I can exclude a particular string, but I'm looking for the bigger picture because my list is quite large (e.g. a poem).
for(it = phrases.begin(); it != phrases.end(); ++it)
{
    found = it->find(look);
    if(found != string::npos)
        cout << i++ << ". " << *it << endl;
    else
    {
        i++;
        insert++;
    }
}
Just to clarify: what are you struggling with?
What you want to do is check whether what you have found is the start of a word (or of the phrase) and also the end of a word (or of the phrase).
i.e. check whether:
found is 0 (the start of the string) OR the character preceding found is a space
AND the character look.length() positions after found is a space OR that position is the end of the string
EDIT: You can access the characters around the match by indexing relative to found (replace X with the length of the string you're searching for, look.length()):
found = it->find(look);
if (found != string::npos)
{
    if ((found == 0 || it->at(found-1) == ' ')
        && (found == it->length()-X || it->at(found+X) == ' '))
    {
        // Actually found it
    }
} else {
    // Do whatever
}
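Note that a single find call only reports the first occurrence, so a phrase like "this is" would be rejected at the "is" inside "this". A sketch that keeps searching after a rejected hit, treating only spaces and the string boundaries as word delimiters (contains_word is a hypothetical name):

```cpp
#include <cassert>
#include <string>

// True if `word` occurs in `text` as a whole word, i.e. bounded on each side
// by a space or by the start/end of the string. Unlike a single find() call,
// this keeps searching after a rejected hit such as the "is" inside "this".
bool contains_word(const std::string &text, const std::string &word) {
    if (word.empty())
        return false;
    for (std::string::size_type found = text.find(word);
         found != std::string::npos;
         found = text.find(word, found + 1)) {
        bool starts_ok = (found == 0 || text[found - 1] == ' ');
        std::string::size_type end = found + word.size();
        bool ends_ok = (end == text.size() || text[end] == ' ');
        if (starts_ok && ends_ok)
            return true;
    }
    return false;
}

#ifdef DEMO
int main() {
    assert(!contains_word("this", "is"));
    assert(contains_word("is something of", "is"));
}
#endif
```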
We can use Boost.Regex to search with regular expressions. Below is some example code. Regular expressions let you build complex search patterns.
#include <boost/regex.hpp>
#include <boost/tokenizer.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    std::string list[4] = {"this", "hi how r u ", "is this fun is", "no"};
    // "^is" matches tokens that start with "is"
    boost::regex ex("^is");
    for (int x = 0; x < 4; ++x)
    {
        string::const_iterator start, end;
        boost::char_separator<char> sep(" ");
        boost::tokenizer<boost::char_separator<char> > token(list[x], sep);
        cout << "Search string: " << list[x] << "\n" << endl;
        int count = 0;  // must not redeclare the loop variable x here
        for (boost::tokenizer<boost::char_separator<char> >::iterator itr = token.begin();
             itr != token.end(); ++itr)
        {
            start = (*itr).begin();
            end = (*itr).end();
            boost::match_results<std::string::const_iterator> what;
            boost::match_flag_type flags = boost::match_default;
            if (boost::regex_search(start, end, what, ex, flags))
            {
                ++count;
                cout << "Found--> " << what.str() << endl;
            }
        }
        cout << "found pattern " << count << " times." << endl << endl;
    }
    return 0;
}
Output:
Search string: this
found pattern 0 times.
Search string: hi how r u
found pattern 0 times.
Search string: is this fun is
Found--> is
Found--> is
found pattern 2 times.
Search string: no
found pattern 0 times.
I didn't realize you only wanted to match "is" as a whole word. You can do this by using a std::istringstream to tokenize each phrase for you:
#include <iostream>
#include <list>
#include <sstream>
#include <string>
//...
std::string term("is");
for (std::list<std::string>::const_iterator it = phrases.begin();
     it != phrases.end(); ++it)
{
    std::istringstream ss(*it);
    std::string token;
    while (ss >> token)
    {
        if (token == term)
            std::cout << "Found " << token << "\n";
    }
}

C++ code for taking substrings

I have a string such as "1.0.0" and I want to extract the "1", "0", and "0". If the last zero is not present, 0 must be stored by default:
verstr.substr(0, verstr.find("."));
The above statement can find the first digit that is "1", however, I am not able to think of a solution for extracting the remainder of the string.
After this, I convert it to a long:
va = atol(verstr.substr(0,verstr.find(".")).c_str());
so I want the 1 in va, the 0 in vb, and so on.
Thanks.
C++11 solution:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main(int, char **) {
    string version("1.2.3");
    match_results<string::const_iterator> m;
    regex re("([0-9]+)\\.([0-9]+)(\\.([0-9]+))?");
    if (regex_match(version, m, re)) {
        int major = stoi(m[1].str()),
            minor = stoi(m[2].str()),
            // group 4 did not participate when the optional ".rev" is absent
            rev = m[4].matched ? stoi(m[4].str()) : 0;
        cout << "major: " << major << endl;
        cout << "minor: " << minor << endl;
        cout << "rev: " << rev << endl;
    } else {
        cout << "no match\n";
    }
}
The regular expression used is ([0-9]+)\.([0-9]+)(\.([0-9]+))? and breaks down as follows:
[0-9]+ matches one or more digits
\. matches a literal dot.
? following the last expression indicates that it is optional
Expressions wrapped in ( and ) are capture groups. Counting group 0, which always refers to the whole match, there are five groups in this expression:
0 - always matches the entire string - we don't use this.
1 - matches the major version number.
2 - matches the minor version number.
3 - matches a dot followed by the revision number - we don't use this but it is necessary because we use the parentheses followed by a ? to make this whole group optional.
4 - matches the revision number.
I'm not sure I understand what you need, but if you want to retrieve the components as strings, with a minimum of x components, you can do something like this:
#include <sstream>
#include <string>
#include <vector>
using namespace std;
vector<string> GetVersion(const string &strInput, size_t iMinSize)
{
    vector<string> vRetValue;
    std::stringstream ss(strInput);
    string strItem;
    // split on '.'
    while (std::getline(ss, strItem, '.'))
        vRetValue.push_back(strItem);
    // pad missing components with "0"
    while (vRetValue.size() < iMinSize)
        vRetValue.push_back("0");
    return vRetValue;
}
int main()
{
    vector<string> vRetValue = GetVersion("1.0", 3);
    return 0;
}
One possibility would be to use std::sscanf(). It is simple to use and provides a level of error checking with relatively few lines of code:
#include <iostream>
#include <string>
#include <cstdio>
int main()
{
    std::string input[] = { "1.0.7", "1.0.", "1.0", "1.", "1" };
    for (size_t i = 0; i < sizeof(input)/sizeof(input[0]); i++)
    {
        std::cout << input[i] << ": ";
        // Init to zero.
        int parts[3] = { 0 };
        // sscanf() returns number of assignments made.
        if (std::sscanf(input[i].c_str(),
                        "%d.%d.%d",
                        &parts[0],
                        &parts[1],
                        &parts[2]) >= 2)
        {
            // OK, the string contained at least two numbers.
            std::cout << parts[0]
                      << ","
                      << parts[1]
                      << ","
                      << parts[2]
                      << "\n";
        }
        else
        {
            std::cout << "bad format\n";
        }
    }
    return 0;
}
Output:
1.0.7: 1,0,7
1.0.: 1,0,0
1.0: 1,0,0
1.: bad format
1: bad format
See online demo: http://ideone.com/0Ox9b .
find and substr are two really nice families of function overloads that are well suited to many simple parsing problems, especially when your syntax checking only needs to be loose.
To extract multiple scalars out of your version vector, store the found index somewhere:
const auto a = verstr.find('.');
const std::string major = verstr.substr(0, a);
Then reuse it with the overload of string::find that takes a starting position, starting the search one past a:
const auto b = verstr.find('.', a+1);
const std::string minor = verstr.substr(a+1, b - (a+1)); // substr takes a length, not an end index
And so forth.
If you need a syntax check, compare the returned indices against string::npos:
const auto a = verstr.find('.');
if (std::string::npos == a) {
    // ... bad syntax ...
}
Pastebin style version of this answer:
#include <string>
#include <stdexcept>
#include <iostream>
struct Version
{
    std::string Major, Minor, Patch;
    Version(std::string const &Major)
        : Major(Major), Minor("0"), Patch("0")
    {}
    Version(std::string const &Major, std::string const &Minor)
        : Major(Major), Minor(Minor), Patch("0")
    {}
    Version(std::string const &Major, std::string const &Minor, std::string const &Patch)
        : Major(Major), Minor(Minor), Patch(Patch)
    {}
};
std::ostream& operator<< (std::ostream &os, Version const &v)
{
    return os << v.Major << '.' << v.Minor << '.' << v.Patch;
}
Version parse (std::string const &verstr) {
    if (verstr.empty()) throw std::invalid_argument("bad syntax");
    const auto first_dot = verstr.find('.');
    if (first_dot == std::string::npos)
        return Version(verstr);
    const auto second_dot = verstr.find('.', first_dot+1);
    if (second_dot == std::string::npos)
        return Version(verstr.substr(0, first_dot),
                       verstr.substr(first_dot+1));
    // substr() takes a length, not an end position
    return Version(verstr.substr(0, first_dot),
                   verstr.substr(first_dot+1, second_dot-first_dot-1),
                   verstr.substr(second_dot+1));
}
and then
int main () {
    std::cout << parse("1.0") << '\n'
              << parse("1.0.4+Patches(55,322)") << '\n'
              << parse("1") << '\n';
    parse(""); // expected to throw
}
Try something like this instead of the solution below the line:
string s = "1.0.0";
string delimiters = ".";
size_t current;
size_t next = std::string::npos;  // so that next + 1 wraps around to 0
do
{
    current = next + 1;
    next = s.find_first_of(delimiters, current);
    string current_substring = s.substr(current, next - current); // here you have the substring
}
while (next != string::npos);
OK, please don't use the solution below unless you really know what you're doing, as discussed with @DavidSchwartz in the comments on this answer.
Take a look at the strtok function: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
#include <cstdio>
#include <cstring>
//...
char str[] = "1.0.0";
char *pch;
pch = strtok(str, ".");
while (pch != NULL)
{
    printf("%s\n", pch);
    pch = strtok(NULL, ".");
}
Take a look at the Boost libraries, specifically String Algo.
Standard library support for string manipulation is somewhat limited in C++. And reinventing the wheel is just plain bad.
Update:
I was asked in comments why I consider all find/substr based solutions bad style.
I'll try my best.
As the question doesn't state otherwise, performance is not a concern here. Maintainability and readability are much more important. All the solutions proposed here tightly couple the semantics of the splitting algorithm with those of a specific version-parsing algorithm. This hurts both.
This hurts maintainability, because when you need to change the version format, you have to change the very same block of code that implements splitting, which makes it more error-prone. The same applies to unit tests.
This hurts readability, because with the semantics mixed I can't immediately guess the intent behind the block of code. For example, when I look at the parsing algorithm to check how a missing third version component is handled, I'd rather not waste time digging through the details of the split implementation.
If the parsing pattern had been slightly more complex, I'd have advised regular expressions. But in this case, splitting a string by a delimiter is a generic and common enough operation to justify having it as a separate function.
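A sketch of that separation using only the standard library (Boost's split would serve the same role). split, Version and parse_version are hypothetical names, and every component present is assumed to be a decimal number.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Generic helper: split `s` on `delim`. Knows nothing about versions.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(s);
    std::string item;
    while (std::getline(in, item, delim))
        parts.push_back(item);
    return parts;
}

// Hypothetical version type used only for this sketch.
struct Version {
    long major = 0, minor = 0, patch = 0;
};

// Version-specific logic: missing trailing components default to 0.
// std::stol throws std::invalid_argument if a component is not a number.
Version parse_version(const std::string &s) {
    std::vector<std::string> parts = split(s, '.');
    Version v;
    if (parts.size() > 0) v.major = std::stol(parts[0]);
    if (parts.size() > 1) v.minor = std::stol(parts[1]);
    if (parts.size() > 2) v.patch = std::stol(parts[2]);
    return v;
}

#ifdef DEMO
int main() {
    Version v = parse_version("1.0");
    assert(v.major == 1 && v.patch == 0);
}
#endif
```

Changing the version format now only touches parse_version, while split can be tested and reused on its own.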
If it's only a simple char comparison in a small string...
a char[] shouldn't be so bad... and C functions should work... (EDIT: for some, it's blasphemy... but a lot of C++ methods use char*, whether const or not).
Why use an object if it offers the same functionality while using more memory and more processing time?
EDIT:
I saw that some answers create a lot of string objects... I don't know if that's really the best way...
A little two-line recursive C-like function can do this without wasting much.
In C++ code I would probably do it with a string object, as the waste is negligible... but it's worth saying.
With a string object, I would use its length to get the last char first (with the [] operator or an appropriate method).
Then you just need to get the two other elements (in a loop, or with two capture groups in a regex, which is less efficient).
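The recursive function mentioned above isn't shown, so here is an iterative, C-style sketch of the same idea instead: it walks the buffer with std::strtol and its end pointer, without creating any string objects. parse_version_c is a hypothetical name, and missing trailing components are left at 0.

```cpp
#include <cassert>
#include <cstdlib>

// Parse up to n dot-separated numbers from s into out, leaving missing
// trailing components at 0. Returns how many components were parsed.
int parse_version_c(const char *s, long *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = 0;
    int count = 0;
    while (count < n) {
        char *end;
        long v = std::strtol(s, &end, 10);
        if (end == s)        // no digits here: stop
            break;
        out[count++] = v;
        if (*end != '.')     // no further component
            break;
        s = end + 1;         // skip the dot
    }
    return count;
}

#ifdef DEMO
int main() {
    long v[3];
    assert(parse_version_c("1.0", v, 3) == 2 && v[2] == 0);
}
#endif
```

std::strtol also accepts leading whitespace and a sign, so this is looser than a strict version parser.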

How to check if a string contains spaces/tabs/new lines (anything that's blank)?

I know there's an isspace function that checks for whitespace, but that would require me to iterate through every character in the string, which could hurt performance since this will be called a lot. Is there a fast way to check whether a std::string contains only spaces?
ex:
function(" ") // returns true
function(" 4 ") // returns false
One solution I've thought of is to use a regex: if it doesn't match, I'll know the string contains only whitespace... but I'm not sure whether this would be more efficient than the isspace function.
regex: [\w\W] //checks for any word character(a,b,c..) and non-word character([,],..)
thanks in advance!
With a regular string s, the best you can do will be of the form:
return s.find_first_not_of("\t\n ") == std::string::npos;
This will be O(n) in the worst case, but without knowing anything else about the string, this is the best you can do.
Any method would, of necessity, need to look at each character of the string. A loop that calls isspace() on each character is pretty efficient. If isspace() is inlined by the compiler, then this would be darn near optimal.
The loop should, of course, abort as soon as a non-space character is seen.
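A sketch of such a loop using std::all_of, which stops at the first character for which the predicate returns false; all_whitespace is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True if every character of s is whitespace (an empty string counts as
// all whitespace). std::all_of stops at the first non-space character.
// The cast to unsigned char avoids undefined behaviour for negative chars.
bool all_whitespace(const std::string &s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    });
}

#ifdef DEMO
int main() {
    assert(all_whitespace(" \t\n") && !all_whitespace(" 4 "));
}
#endif
```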
You are assuming that a regex doesn't iterate over the string. A regex is probably much heavier than a linear search, since it might build an FSM and traverse the string with it.
The only way to speed it up further and make it a near-constant-time operation is to amortize the cost: inspect the string on every update, cache a bool/bit that tracks whether there is a space-like character, return that cached value if nothing has changed since, and update the bit whenever you write to the string. However, this slows down modifying operations in order to speed up your custom has_space().
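A sketch of that caching idea, using a count of non-whitespace characters instead of a single bit so that appends can update it cheaply. BlankTrackingString is a hypothetical class, and only assign() and append() are provided; every other mutating operation would have to keep the count in sync as well.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Every write updates a count of non-whitespace characters, so the
// "only whitespace?" query is O(1); writes pay for it instead.
class BlankTrackingString {
public:
    void assign(const std::string &s) {
        str_ = s;
        non_blank_ = count_non_blank(s);
    }
    void append(const std::string &s) {
        str_ += s;
        non_blank_ += count_non_blank(s);
    }
    bool only_whitespace() const { return non_blank_ == 0; }
    const std::string &str() const { return str_; }

private:
    static std::size_t count_non_blank(const std::string &s) {
        std::size_t n = 0;
        for (unsigned char c : s)
            if (!std::isspace(c))
                ++n;
        return n;
    }
    std::string str_;
    std::size_t non_blank_ = 0;
};

#ifdef DEMO
int main() {
    BlankTrackingString b;
    b.append("  ");
    assert(b.only_whitespace());
}
#endif
```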
For what it's worth, the std::ctype facet of a locale has a member function (scan_is) for things like this:
#include <locale>
#include <iostream>
#include <iomanip>
#include <string>
int main() {
    std::string inputs[] = {
        "all lower",
        "including a space"
    };
    std::locale loc(std::locale::classic());
    std::ctype_base::mask m = std::ctype_base::space;
    for (int i = 0; i < 2; i++) {
        char const *pos;
        // data()/size() avoid dereferencing end(), which is undefined behaviour
        char const *b = inputs[i].data();
        char const *e = b + inputs[i].size();
        std::cout << "Input: " << std::setw(20) << inputs[i] << ":\t";
        if ((pos = std::use_facet<std::ctype<char> >(loc).scan_is(m, b, e)) == e)
            std::cout << "No space character\n";
        else
            std::cout << "First space character at position " << pos - b << "\n";
    }
    return 0;
}
It's probably open to (a lot of) question whether this gives much (if any) real advantage over using isspace in a loop (or using std::find_if).
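Since the question asks whether a string contains only whitespace, the sibling member function scan_not is arguably the closer fit: it returns a pointer to the first character that does not match the mask. A sketch using the classic locale (all_whitespace_facet is a hypothetical name):

```cpp
#include <cassert>
#include <locale>
#include <string>

// True if every character of s is whitespace according to the classic
// ("C") locale. scan_not returns `high` when no character fails the mask.
bool all_whitespace_facet(const std::string &s) {
    const std::ctype<char> &ct =
        std::use_facet<std::ctype<char> >(std::locale::classic());
    const char *low = s.data();
    const char *high = low + s.size();
    return ct.scan_not(std::ctype_base::space, low, high) == high;
}

#ifdef DEMO
int main() {
    assert(all_whitespace_facet(" \t\n") && !all_whitespace_facet(" 4 "));
}
#endif
```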
You can also use find_first_not_of if you want all the characters to be in a given list.
That way you avoid writing the loop yourself.
Here is an example
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string str1 = " ";
    string str2 = " u ";
    // true when every character of the string is in the given list
    bool OnlyBlanks1 = (str1.find_first_not_of("\t\n ") == string::npos);
    bool OnlyBlanks2 = (str2.find_first_not_of("\t\n ") == string::npos);
    bool OnlyListed3 = (str2.find_first_not_of("\t\n u") == string::npos);
    cout << OnlyBlanks1 << endl;
    cout << OnlyBlanks2 << endl;
    cout << OnlyListed3 << endl;
    return 0;
}
Output:
1: because str1 contains only blanks and a tab
0: because u is not in the list "\t\n "
1: because str2 contains only blanks, tabs and a u, all of which are in the list "\t\n u"
Hope it helps
Tell me if you have any questions