QRegexp Missing Digits - regex

We are all stumped on this one:
QRegExp kcc_stationing("(-)?(\\d+)\\.(\\d+)[^a-zA-Z]");
QString str;
if (kcc_stationing.indexIn(description) > -1)
{
str = kcc_stationing.cap(1) + kcc_stationing.cap(2) + "." + kcc_stationing.cap(3);
qDebug() << kcc_stationing.cap(1);
qDebug() << kcc_stationing.cap(2);
qDebug() << kcc_stationing.cap(3);
qDebug() << "Description: " << description;
qDebug() << "Returned Stationing string: " << str;
}
Running this code on "1082.006":
Note the missing "6"
After some blind guessing, we removed [^a-zA-Z] and got the correct answer. We had originally added it to reject any number that has other characters attached directly, without a space.
For example: 10.05D should be rejected.
Can anyone explain why this extra piece was causing us to lose that last "6"?

The [^a-zA-Z] is a character class. A character class always matches exactly one character, so it cannot match at the end of the string, where there is no character left to consume.
To get that result, the engine first matches all the digits with the \\d+, including the last one. It then has to backtrack and give up the final "6" so that the trailing character class has something to match, which is why the "6" is missing from the third capture.
I think you want to allow a zero-width match (specifically at the end of the string). In your case, it would be easiest to use:
(-)?(\\d+)\\.(\\d+)([^a-zA-Z]|$)
Or, if Qt supports non-capturing groups:
(-)?(\\d+)\\.(\\d+)(?:[^a-zA-Z]|$)
Note that I also recommend using [.] instead of \\., since I feel it improves readability.

Related

How can I replace all words in a string except one

So, I would like to change all words in a string except one, that stays in the middle.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join group";
std::string replace = "This is a test about group";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is in finding the words, since they are treated as one single string.
Is there a function that reads all the words, no matter their position, and changes just the ones I want?
EDIT2:
This is the best example of what I want to happen:
const char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
const char* b = "This is line in the void Translate"; // This is what needs to be found in the main line
const char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words in b, preserving the MYYYYYYYYY
// The output is expected to be:
Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
std::string test = "You want to join player group";
static const std::regex find{R"(You want to join (\S+) group)"};
std::smatch search_result;
if (!std::regex_search(test, search_result, find))
{
std::cerr << "Could not match the string\n";
exit(1);
}
else
{
std::string found_group_name = search_result[1];
auto replace = boost::format("This is a test about %1% group") % found_group_name;
std::cout << replace;
}
To match the word "player" I used a pretty simple regular expression, (\S+), which means "match one or more non-whitespace characters (greedily) and put that into a group".
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntax R"(...)". This is called a raw string literal and was also standardized in C++11. It was basically made for describing regular expressions without needing to escape backslashes. If you've used Python, it's the same as r'...'. If you've used C#, it's the same as @"...".
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing group from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join";
std::string replace = "This is a test about";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
Output: This is a test about player group.
In this case, only the beginning of the string needs to be replaced, because the end of the string is already correct. You could add another call to replace_all to change the end if needed.
Some other options:
one is in the other answer.
split the strings into a vector (or array) of words, then insert the desired word (player) at the right spot of the replace vector, then build your output string from it.

How to split a string from a vector [duplicate]

My homework is as follows:
Step Two - Create a file called connections.txt with a format like:
Kelp-SeaUrchins
Kelp-SmallFishes
Read these names in from a file and split each string into two (org1,org2). Test your work just through printing for now. For example:
cout << "pair = " << org1 << " , " << org2 << endl;
I am not sure how to split the string, which is stored in a vector, using the hyphen as the delimiter. I was instructed to either write my own function, something like int ind(vector<string> orgs, string animal) { return index of animal in orgs; }, or use the find function.
Here is one approach...
Open the file:
ifstream file{ "connections.txt", ios_base::in };
if (!file) throw std::runtime_error("failed to open file"); // needs <stdexcept>; std::exception has no standard string constructor
Read all the lines:
vector<string> lines;
for (string line; file >> line;)
lines.push_back(line);
You can use the regex library from C++11:
regex pat{ R"(([A-Za-z0-9]+)-([A-Za-z0-9]+))" };
for (auto& line : lines) {
smatch matches;
if (regex_match(line, matches, pat))
cout << "pair = " << matches[1] << ", " << matches[2] << endl;
}
You will have to come up with the pattern according to your needs.
Here it will try to match for "at least one alphanumeric" then a - then "at least one alphanumeric".
matches[0] will contain the whole matched string.
matches[1] will contain first alphanumeric part, i.e. your org1
matches[2] will contain second alphanumeric part, i.e. your org2
(You can take them in variables org1 and org2 if you want.)
If org1 and org2 don't contain any white spaces, then you can use another trick.
In every line, you can replace - with a blank (std::replace).
Then simply use stringstreams to get your tokens.
Side note: This is just to help you. You should do your homework on your own.

Determining the location of C++11 regular expression matches

How do I efficiently determine the location of a capture group inside a searched string? Getting the location of the entire match is easy, but I see no obvious ways to get at capture groups beyond the first.
This is a simplified example; let's presume "a*" and "b*" are complicated regexes that are expensive to run.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
regex matcher("a*(needle)b*");
smatch findings;
string haystack("aaaaaaaaneedlebbbbbbbbbbbbbb");
if( regex_match(haystack, findings, matcher) )
{
// What do I put here to find the offset of "needle" in the
// string haystack?
// This is the position of the entire match, which is
// always 0 with regex_match (though not necessarily with regex_search).
cout << "smatch::position - " << findings.position() << endl;
// Is this just a string or what? Are there member functions
// that can be called?
cout << "Needle - " << findings[1] << endl;
}
return 0;
}
If it helps I built this question in Coliru: http://coliru.stacked-crooked.com/a/885a6b694d32d9b5
I will not mark this as an answer until 72 hours have passed and no better answers are present.
Before asking this I presumed smatch::position took no arguments I cared about, because when I read the cppreference page the "sub" parameter was not obviously an index into the container of matches. I thought it had something to do with "sub"strings and the offset value of the whole match.
So my answer is:
cout << "Needle Position- " << findings.position(1) << endl;
Any explanation on this design, or other issues my line of thinking may have caused would be appreciated.
According to the documentation, you can access the iterators pointing to the beginning and the end of the captured text via match[n].first and match[n].second. To get the start and end indices, just do iterator arithmetic with haystack.begin().
if (findings[1].matched) {
cout << "[" << findings[1].first - haystack.begin() << "-"
<< findings[1].second - haystack.begin() << "] "
<< findings[1] << endl;
}
Except for the main match (index 0), capturing groups may or may not capture anything. In such cases, first and second will point to the end of the string.
I also demonstrate the matched member of the sub_match object. While it's unnecessary in this case, in general, if you want to print the indices of the capturing groups, you need to check first whether the capturing group matched anything.

regex to find anything except "this String"

I need a regex to find anything except this String
example data is:
this is one line with this string only this should not match
this is another line but has no string in
this String is another
and then this line should match also
I want to find and highlight the entire line, so the line
this is one line with this string is the only one that will be selected.
I tried ^(?!(this String)$) but this only finds a zero-length match, so it is not much help. I tried adding .* in various places, but I don't understand how to do this.
Chances are that you don't need regex for this task at all. Look at these two pieces of code:
std::regex pattern("^((?!this string).)*$");
while (std::getline(infile, chkme)) {
if (std::regex_match(chkme, pattern)) {
std::cout << "Not found! " << chkme << '\n';
}
}
and
std::string pattern = "this string";
while (std::getline(infile, chkme)) {
if (chkme.find(pattern) == std::string::npos) {
std::cout << "Not found! " << chkme << '\n';
}
}

PCRECPP (pcre) extract hostname from url code problem

I have this simple piece of code in C++:
int main(void)
{
string text = "http://www.amazon.com";
string a,b,c,d,e,f;
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?#)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
if(re.PartialMatch(text, &a,&b,&c,&d,&e,&f))
{
std::cout << "match: " << f << "\n";
// should print "www.amazon.com"
}else{
std::cout << "no match. \n";
}
return 0;
}
When I run this it doesn't find a match.
I'm pretty sure that the regex pattern is correct and my code is what's wrong.
If anyone familiar with pcrecpp can take a look at this, I'll be grateful.
EDIT:
Thanks to Dingo, it works great.
Another issue I had is that the result was in the sixth argument, "f".
I edited the code above so you can copy/paste if you wish.
The problem is that your code contains ??( which is a trigraph in C++ for [. You'll either need to disable trigraphs or do something to break them up like:
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?#)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??" "([^#]+)?#?(\\w*)");
Please do
cout << re.pattern() << endl;
to double-check that all your double-slashing is done right (and also post the result).
Looks like
^((\w+):///?)?((\w+):?(\w+)?#)?([^/\?:]+):?(\d+)?(/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)
The hostname isn't going to be returned from the first capture group. Why are you using parentheses around, for example, \w+ when you don't want to capture it?