Filter out url from string

Filter out url from string - c++

Im trying to filter out urls from a string that contains lots of special characters, blank space and urls. I have tried to use regex but it fails, it manages sometimes to line up the url but the output still contains special characters and blank space, so here I am. Best regards P
string str;
std::ifstream in("c:/Users/Petrus/Documents/History", std::ios::binary);
std::stringstream buffer;
if (!in.is_open()){
cout << "Failed to open" << endl;
}
else{
cout << "Opened OK" << endl;
}
buffer << in.rdbuf();
std::string contents(buffer.str());
std::ofstream out("urls.txt");
unsigned counter = 0;
std::regex word_regex(
R"(^(([^:\/?#]+):)?(//([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)",
std::regex::extended
);
auto words_begin = std::sregex_iterator(contents.begin(), contents.end(), word_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = match.str();
for (const auto& res : match) {
counter++;
std::cout << counter++ << ": " << res << std::endl;
}
std::cout << " " << match_str << '\n';
}
system("PAUSE");
return 0;
}

a few steps to simplify (and debug) the regex:
use named groups (?<groupname>regex) to help identify what's what and access results.
for 'grouping only' ()'s, use (?:regex) to "not remember" captures, also helps clarify what's going on
once done, just a few tweaks "fixes" this regex for all your inputs:
(?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?
I changed the negated char classes to not match newlines or spaces: [^#\n\s]
specified that any segment after urlRoot is optional.
added the string "https?" to limit results to valid urls
regex demo output:
and the match groups (truncated but all there):

Related

Regex character class subtraction in C++

I'm writing a C++ program that will need to take regular expressions that are defined in a XML Schema file and use them to validate XML data. The problem is, the flavor of regular expressions used by XML Schemas does not seem to be directly supported in C++.
For example, there are a couple special character classes \i and \c that are not defined by default and also the XML Schema regex language supports something called "character class subtraction" that does not seem to be supported in C++.
Allowing the use of the \i and \c special character classes is pretty simple, I can just look for "\i" or "\c" in the regular expression and replace them with their expanded versions, but getting character class subtraction to work is a much more daunting problem...
For example, this regular expression that is valid in an XML Schema definition throws an exception in C++ saying it has unbalanced square brackets.
#include <iostream>
#include <regex>
int main()
{
try
{
// Match any lowercase letter that is not a vowel
std::regex rx("[a-z-[aeiuo]]");
}
catch (const std::regex_error& ex)
{
std::cout << ex.what() << std::endl;
}
}
How can I get C++ to recognize character class subtraction within a regex? Or even better, is there a way to just use the XML Schema flavor of regular expressions directly within C++?

Character ranges subtraction or intersection is not available in any of the grammars supported by std::regex, so you will have to rewrite the expression into one of the supported ones.
The easiest way is to perform the subtraction yourself and pass the set to std::regex, for instance [bcdfghjklvmnpqrstvwxyz] for your example.
Another solution is to find either a more featureful regular expression engine or a dedicated XML library that supports XML Schema and its regular expression language.

Starting from the cppreference examples
#include <iostream>
#include <regex>
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
// greedy match, repeats [a-z] 4 times
show_matches("abcdefghi", "(?:(?![aeiou])[a-z]){2,4}");
}
You can test and check the the details of the regular expression here.
The choice of using a non capturing group (?: ...) is to prevent it from changing your groups in case you will use it in a bigger regular expression.
(?![aeiou]) will match without consuming the input if finds a character not matching [aeiou], the [a-z] will match letters. Combining these two condition is equivalent to your character class subtraction.
The {2,4} is a quantifier that says from 2 to 4, could also be + for one or more, * for zero or more.
Edit
Reading the comments in the other answer I understand that you want to support XMLSchema.
The next program shows how to use ECMA regular expression to translate the "character class differences" to a ECMA compatible format.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
std::string translated_regex(const std::string &pattern){
// pattern to identify character class subtraction
std::regex class_subtraction_re(
"\\[((?:\\\\[\\[\\]]|[^[\\]])*)-\\[((?:\\\\[\\[\\]]|[^[\\]])*)\\]\\]"
);
// translate the regular expression to ECMA compatible
std::string translated = std::regex_replace(pattern,
class_subtraction_re, "(?:(?![$2])[$1])");
return translated;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
std::string re = translated_regex("[a-z-[aeiou]]{2,4}");
show_matches("abcdefghi", re);
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translated_regex(test) << '\n';
}
return 0;
}
Edit: Recursive and Named character classes
The above approach does not work with recursive character class negation. And there is no way to deal with recursive substitutions using only regular expressions. This rendered the solution far less straight forward.
The solution has the following levels
one function scans the regular expression for a [
when a [ is found there is a function to handle the character classes recursively when '-[` is found.
The pattern \p{xxxxx} is handled separately to identify named character patterns. The named classes are defined in the specialCharClass map, I fill two examples.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
#include <map>
std::map<std::string, std::string> specialCharClass = {
{"IsDigit", "0-9"},
{"IsBasicLatin", "a-zA-Z"}
// Feel free to add the character classes you want
};
const std::string getCharClassByName(const std::string &pattern, size_t &pos){
std::string key;
while(++pos < pattern.size() && pattern[pos] != '}'){
key += pattern[pos];
}
++pos;
return specialCharClass[key];
}
std::string translate_char_class(const std::string &pattern, size_t &pos){
std::string positive;
std::string negative;
if(pattern[pos] != '['){
return "";
}
++pos;
while(pos < pattern.size()){
if(pattern[pos] == ']'){
++pos;
if(negative.size() != 0){
return "(?:(?!" + negative + ")[" + positive + "])";
}else{
return "[" + positive + "]";
}
}else if(pattern[pos] == '\\'){
if(pos + 3 < pattern.size() && pattern[pos+1] == 'p'){
positive += getCharClassByName(pattern, pos += 2);
}else{
positive += pattern[pos++];
positive += pattern[pos++];
}
}else if(pattern[pos] == '-' && pos + 1 < pattern.size() && pattern[pos+1] == '['){
if(negative.size() == 0){
negative = translate_char_class(pattern, ++pos);
}else{
negative += '|';
negative = translate_char_class(pattern, ++pos);
}
}else{
positive += pattern[pos++];
}
}
return '[' + positive; // there is an error pass, forward it
}
std::string translate_regex(const std::string &pattern, size_t pos = 0){
std::string r;
while(pos < pattern.size()){
if(pattern[pos] == '\\'){
r += pattern[pos++];
r += pattern[pos++];
}else if(pattern[pos] == '['){
r += translate_char_class(pattern, pos);
}else{
r += pattern[pos++];
}
}
return r;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"[a]",
"[a-z]d",
"[\\p{IsBasicLatin}-[\\p{IsDigit}-[89]]]",
"[a-z-[aeiou]]{2,4}",
"[a-z-[aeiou-[e]]]",
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translate_regex(test) << '\n';
// Construct a reegx (validate syntax)
std::regex(translate_regex(test));
}
std::string re = translate_regex("[a-z-[aeiou-[e]]]{2,10}");
show_matches("abcdefghi", re);
return 0;
}

Try using a library function from a library with XPath support, like xmlregexp in libxml (is a C library), it can handle the XML regexes and apply them to the XML directly
http://www.xmlsoft.org/html/libxml-xmlregexp.html#xmlRegexp
----> http://web.mit.edu/outland/share/doc/libxml2-2.4.30/html/libxml-xmlregexp.html <----
An alternative could have been PugiXML (C++ library, What XML parser should I use in C++? ) however i think it does not implement the XML regex functionality ...

Okay after going through the other answers I tried out a few different things and ended up using the xmlRegexp functionality from libxml2.
The xmlRegexp related functions are very poorly documented so I figured I would post an example here because others may find it useful:
#include <iostream>
#include <libxml/xmlregexp.h>
int main()
{
LIBXML_TEST_VERSION;
xmlChar* str = xmlCharStrdup("bcdfg");
xmlChar* pattern = xmlCharStrdup("[a-z-[aeiou]]+");
xmlRegexp* regex = xmlRegexpCompile(pattern);
if (xmlRegexpExec(regex, str) == 1)
{
std::cout << "Match!" << std::endl;
}
free(regex);
free(pattern);
free(str);
}
Output:
Match!
I also attempted to use the XMLString::patternMatch from the Xerces-C++ library but it didn't seem to use an XML Schema compliant regex engine underneath. (Honestly I have no clue what regex engine it uses underneath and the documentation for that was pretty abysmal and I couldn't find any examples online so I just gave up on it.)

regex_match giving unexpected results

I'm trying to write a recursive descent parser and am trying to search a match a regex within a string inputted by the user. I am trying to do the following to try to understand the <regex> library offered by C++11, but I'm getting unexpected results.
std::string expression = "2+2+2";
std::regex re("[-+*/()]");
std::smatch m;
std::cout << "My expression is " << expression << std::endl;
if(std::regex_search(expression, re)) {
std::cout << "Found a match!" << std::endl;
}
std::regex_match(expression, m, re);
std::cout << "matches:" << std::endl;
for (auto it = m.begin(); it!=m.end(); ++it) {
std::cout << *it << std::endl;
}
So based on my regular expression, I expect it to output
Found a match!
matches:
+
+
However, the output I get is:
My expression is 2+2+2
Found a match!
matches:
I feel like I'm making a stupid mistake, but I can't seem to figure out why there's a discrepancy between the outputs.
Thanks,
erip

You've got a few issues. First, let's look at some working code:
#include <regex>
#include <iostream>
int main() {
std::string expr = "2+2+2";
std::regex re("[+\\-*/()]");
const auto operators_begin = std::sregex_iterator(expr.begin(), expr.end(), re);
const auto operators_end = std::sregex_iterator();
std::cout << "Count: " << std::distance(operators_begin, operators_end) << "\n";
for (auto i = operators_begin; i != operators_end; ++i) {
std::smatch match = *i;
std::cout << match.str() << "\n";
}
}
Output:
Count: 2
+
+
Issues with your code:
regex_match() returns false.
You don't have any capturing groups in the regular expression. So even if regex_match() returned true, it wouldn't capture anything.
The number of captures in a regex_match can be determined strictly by looking at the regular expression. So my re will capture exactly one group.
But we want to apply this regular expression on our string multiple times, because we want to find all of the matches. The tool for that is regex_iterator.
We also needed to escape the - in the regular expression. The minus has a special meaning within character classes.

I found the regex_iterator, which solved the problem. Here's the working code:
std::regex re("[-+*/()]");
std::smatch m;
std::cout << "My expression is " << expression << std::endl;
if(std::regex_search(expression, re)) {
std::cout << "Found a match!" << std::endl;
}
try {
std::sregex_iterator next(expression.begin(), expression.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

This question already has answers here:
How do you search a std::string for a substring in C++?
(6 answers)
Closed 8 years ago.
I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a char called message. To print out a specific part of them, in this case the numbers part of them, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message "MC123, 456, 789!##". "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this assuming I know that every message ends with "!##".

First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
// 0123456789012345678
// ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
// 0123456789012345678
// ^-----------^ desired text
The others answers are ok. If you wish to use std::string find,
consider rfind and find_first_not_of, as in the following code:
// forward
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
// 012345678901234567
// |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.

The correct approach is to look for existence / location of the "known termination" string, then take the substring up to (but not including) that substring.
Something like
str::string termination = "!#$";
std::size_t position = inputstring.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE you asked in your comment how to use the regular expression. Here is a very simple program. Note - this is using C++11: you need to make sure our compiler supports it.
#include <iostream>
#include <regex>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .+ means "zero or more of 'anything'" - but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing - I just noticed you mentioned that you might have "more stuff" in your string. This is where using regular expressions quickly becomes the best. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (match[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
Note the use of the slightly tricky (?:USR)? expression. This says "This might (that's the ()? ) be followed by the characters USR. If it is, skip them (that's the ?: part) and match what follows.
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.

If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
if (message.size() < 6)
throw error;
if (message.substr(0,3) != "MCX") //the exact numbers do not match in your example, but you get the point...
throw error;
if (message.substr(message.size()-3) != "!##")
throw error;
string data = message.substr(3, message.size()-6);

Just calculate the offset first.
string str = ...;
size_t start = 3;
size_t end = str.find("!##");
assert(end != string::npos);
return str.substr(start, end - start);

You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.

string msg = "MC4,512,541,3123!##";
for (int i = 2; i < msg.length() - 3; i++) {
if (msg[i] != '!' && msg[i + 1] != '#' && msg[i + 2] != '#')
cout << msg[i];
}
or use char[]
char msg[] = "MC4,123,54!##";
sizeof(msg -1 ); //instead of msg.length()
// -1 for the null byte at the end (each char takes 1 byte so the size -1 == number of chars)

Constructing boost regex

I want to match every single number in the following string:
-0.237522264173E+01 0.110011117918E+01 0.563118085683E-01 0.540571836345E-01 -0.237680494785E+01 0.109394729137E+01 -0.237680494785E+01 0.109394729137E+01 0.392277532367E+02 0.478587433035E+02
However, for some reason the following boost::regex doesn't work:
(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)
What's wrong with it?
EDIT: posting relevant code:
std::ifstream plik("chains/peak-summary.txt");
std::string mystr((std::istreambuf_iterator<char>(plik)), std::istreambuf_iterator<char>());
plik.close();
boost::cmatch what;
boost::regex expression("(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)");
std::cout << "String to match against: \"" << mystr << "\"" << std::endl;
if(regex_match(mystr.c_str(), what, expression))
{
std::cout << "Match!";
std::cout << std::endl << what[0] << std::endl << what[1] << std::endl;
} else {
std::cout << "No match." << std::endl;
}
output:
String to match against: " -0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02
"
No match.
Also posting the contents of file read into the string:
[dare2be#schroedinger multinest-peak]$ cat chains/peak-summary.txt
-0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02

The (.*) around your regex match and consume all text at the start and end of the string, so if there are more than ten numbers, the first ones won't be matched.
Also, you're not allowing for negative exponents.
(-?\\d\\.\\d+E[+-]\\d+ *){10,}
should work.
This will match all of the numbers in a single string; if you want to match each number separately, you have to use (-?\\d\\.\\d+E[+-]\\d+) iteratively.

Try with:
(-?[0-9]+\\.[0-9]+E[+-][0-9]+)
Your (.*) in the beggining matches greedy whole string.

Extract IP address from a string using boost regex?

I was wondering if anyone can help me, I've been looking around for regex examples but I still can't get my head over it.
The strings look like this:
"User JaneDoe, IP: 12.34.56.78"
"User JohnDoe, IP: 34.56.78.90"
How would I go about to make an expression that matches the above strings?

The question is how exactly do you want to match these, and what else do you want to exclude?
It's trivial (but rarely useful) to match any incoming string with a simple .*.
To match these more exactly (and add the possibility of extracting things like the user name and/or IP), you could use something like: "User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})". Depending on your input, this might still run into a problem with a name that includes a comma (e.g., "John James, Jr."). If you have to allow for that, it gets quite a bit uglier in a hurry.
Edit: Here's a bit of code to test/demonstrate the regex above. At the moment, this is using the C++0x regex class(es) -- to use Boost, you'll need to change the namespaces a bit (but I believe that should be about all).
#include <regex>
#include <iostream>
void show_match(std::string const &s, std::regex const &r) {
std::smatch match;
if (std::regex_search(s, match, r))
std::cout << "User Name: \"" << match[1]
<< "\", IP Address: \"" << match[2] << "\"\n";
else
std::cerr << s << "did not match\n";
}
int main(){
std::string inputs[] = {
std::string("User JaneDoe, IP: 12.34.56.78"),
std::string("User JohnDoe, IP: 34.56.78.90")
};
std::regex pattern("User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})");
for (int i=0; i<2; i++)
show_match(inputs[i], pattern);
return 0;
}
This prints out the user name and IP address, but in (barely) enough different format to make it clear that it's matching and printing out individual pieces, not just passing entire strings through.

#include <string>
#include <iostream>
#include <boost/regex.hpp>
int main() {
std::string text = "User JohnDoe, IP: 121.1.55.86";
boost::regex expr ("User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})");
boost::smatch matches;
try
{
if (boost::regex_match(text, matches, expr)) {
std::cout << matches.size() << std::endl;
for (int i = 1; i < matches.size(); i++) {
std::string match (matches[i].first, matches[i].second);
std::cout << "matches[" << i << "] = " << match << std::endl;
}
}
else {
std::cout << "\"" << expr << "\" does not match \"" << text << "\". matches size(" << matches.size() << ")" << std::endl;
}
}
catch (boost::regex_error& e)
{
std::cout << "Error: " << e.what() << std::endl;
}
return 0;
}
Edited: Fixed missing comma in string, pointed out by Davka, and changed cmatch to smatch

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Filter out url from string - c++

Related

Regex character class subtraction in C++

regex_match giving unexpected results

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

Constructing boost regex

Extract IP address from a string using boost regex?

Categories

Resources