boost regex match non-whitespace and angle brackets - regex

I may be asking a duplicate question, but I've spent a couple of hours googling this to no avail!
I'm trying to extract a string from some SIP URLs parsed by a program I'm working on. Here's an excerpt of the code. I'm passing in sipUrl, and have all the right includes etc:
static const boost::regex sipRegExp ("(sip:\\S+?#(?=\\S)[^>]+);");
boost::cmatch result;
boost::match_results<string::const_iterator> results;
boost::match_flag_type flags = boost::format_perl;
string newSipUrl;
cout << sipUrl << endl;
bool toggle = boost::regex_search(sipUrl, result, sipRegExp, flags);
if (toggle) {
    cout << result[1].str() << endl;
    newSipUrl = result[1].str();
}
cout << "new url: " << newSipUrl << endl;
I'm basically trying to extract the sip:user#IP part from strings like "\"alex#192.168.1.2\"<sip:alex#192.168.1.2>;tag=fe310852" or "\"bob\"<sip:bob#foo.com>;". However, I can't get it to match! It worked fine before I added the lookahead to try to drop the trailing angle bracket, but ever since then it fails to match.
Posting this just before running out of the door, so it may need more info. If anyone can spot something glaringly obvious, then that'd be a great help! And please feel free to point me at links that I might have missed!

Have you tried something simpler, such as this regex:
`sip:[a-zA-Z]*#[0-9a-zA-Z.]*`
It works in a terminal, but I haven't tried it through Boost yet. If you start off with something simple and then add to it bit by bit to make it more specific, it will be easier to track down which part of the regex isn't working.

You missed the > before the semicolon:
"(sip:\\S+?#(?=\\S)[^>]+)>;"
Although actually you probably don't need the semicolon at all. Something like Scott's answer should be sufficient.

I ended up going with a modification of @David Knipe's comment. The winning regex was:
sip:\\S+#[^\\s>;]+
This matches with or without angle brackets, up to the semicolon. Both answers provided did work, but being able to remove the lookahead was quite nice. I also went with the + quantifiers to make some effort to find a valid URI and not a blank one.
Thanks for the help!
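For reference, here is a minimal sketch of the final pattern in use. It assumes std::regex (C++11) in place of Boost.Regex, and the function name extract_sip_uri is made up for illustration. Note that std::smatch is the match type that pairs with a std::string input, while std::cmatch is meant for const char* input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first "sip:user#host" substring of
// `input`, or an empty string when nothing matches.
std::string extract_sip_uri(const std::string& input) {
    // Same pattern as the winning regex above, written as a raw string
    // literal so the backslashes do not need doubling.
    static const std::regex sip_re(R"(sip:\S+#[^\s>;]+)");
    std::smatch match;  // smatch goes with std::string input
    if (std::regex_search(input, match, sip_re)) {
        return match[0].str();
    }
    return std::string();
}
```

Calling extract_sip_uri on either of the sample strings from the question should give back the URI without the angle brackets or the trailing ;tag=... part.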

Related

Regex Python Problems with MS

I am a newbie to regex and can't find the solution. I have searched for about 3 hours...
I have the text
HELLO MS. I HOPE YOU HAVE NO PROBLEMS.
And i want to get the Result:
HELLO MISTRESS I HOPE YOU HAVE NO PROBLEMS.
But my code also replaces the "MS." at the end of "PROBLEMS.".
re.sub(r'(MS)+[.]', 'MISTRESS', text)
Thanks for your help.
Using Python 3.5.
Well, an immediate fix here would be to place a negative lookbehind before MS. to assert that it is preceded by whitespace or by the start of the string:
import re
text = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS."
output = re.sub(r'(?<!\S)(MS)+[.]', 'MISTRESS', text)
print(output)
However, for a more general solution, we might need to better understand the grammar behind which contexts should be replaced and which should not.
Another way, without regex, is to use a simple replace(). Note that replace() substitutes every occurrence of the key, so it would also change a sentence-final "PROBLEMS.":
dictionary = {"MR.": "MISTER", "MS.": "MISTRESS"}
main_string = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS WITH MR. X."
for key, value in dictionary.items():
    main_string = main_string.replace(key, value)
print(main_string)

Parsing /etc/passwd with regex_*, unstandard behavior C++ [duplicate]

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 7 years ago.
Let's assume I have this line in my /etc/passwd:
xuser01:*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh
I read the file line by line.
From the program's parameters I get the usernames/user IDs that tell me which lines I should store in a variable.
Using both regex_match and regex_search I got no results, while on online regex testers the same pattern works perfectly. Any idea why this is not working?
regExpr = "^(xuser01|xuser02)+:((.*):?)+";
if(regex_search(line, regex(regExpr)))
{
    cout << "Boom I got you!" << endl;
}
line contains the line currently being read. The code loops through the whole file and doesn't find the string. I used regex_match too, with the same results.
I also tried different regular expressions, such as (xuser01|xuser02)+ and similar, designed to be almost certain to match (while still matching what I need). None of them works in my C++ program, but on online regex testers they do.
Any advice?
Thanks in advance!
It looks like the + quantifiers are what prevent C++ from getting your matches. I think they are redundant in your regex anyway, since each line starts with exactly one user name.
This code works all right and gets to the cout line:
string line( "xuser01:*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh" );
regex regExpr("^(xuser01|xuser02):((.*):?)");
if(regex_search(line, regExpr))
{
    cout << "Boom I got you!" << endl;
}
However, you did not indicate what you are looking for. Currently, it captures only 3 groups:
xuser01
*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh
*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh
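As a sketch of how those three groups can be inspected, the function below runs the simplified pattern and collects the captures. The name capture_groups is hypothetical, and the code assumes a standard library with a working std::regex (per the linked duplicate, gcc 4.8 and earlier do not qualify).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture groups 1..n for a matching
// passwd line, or an empty vector when the line does not match.
std::vector<std::string> capture_groups(const std::string& line) {
    // Simplified pattern from the answer above (no + quantifiers).
    static const std::regex re("^(xuser01|xuser02):((.*):?)");
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(line, m, re)) {
        // m[0] is the whole match, so start at 1.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Because (.*) is greedy and the following :? is optional, groups 2 and 3 both end up holding everything after the first colon, which is why the last two lines listed above are identical.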

How to use string::erase (it adds garbage)

I hope someone is able to help me with this.
I have some code with a string variable data. data always contains something like "'401454654": a ' followed by a number. I want to remove the leading '. data may also be an empty string. My current solution looks like this:
string data = /* ... */;
if(!data.empty())
    data.erase(data.begin());
else
    cout << "Error in line ...." << endl;
The interesting thing is that I usually get the correct string with only the number, or an empty string. But sometimes I get some weird characters plus the original '401454654 back. I really do not know what causes this.
Tested with g++ 4.6 and g++ 4.9 (Linaro) on both Windows and Linux, always with the exact same result. I hope someone can give me some advice.
Sorry for the late answer. I solved it myself. I never found the bug in the original implementation, which I uploaded as a gist, but I implemented it a second time in a far better way (I only had a few minutes for the first version). Thanks for the help.
You might be seeing this bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60278
It's interesting though, I was able to get this to compile in gcc 4.8.1:
data.erase(data.begin());
If it is that bug, you can just write out what erase does under the hood:
copy(next(data.begin()), data.end(), data.begin());
data.pop_back();
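A third option, sketched below, is the index-based overload erase(pos, count), which avoids the iterator overload altogether. The function name strip_leading_quote is made up for illustration, and unlike the original snippet this version only removes the first character when it really is a '.

```cpp
#include <string>

// Hypothetical helper: returns `data` without its leading apostrophe.
// Empty strings and strings that do not start with ' come back unchanged.
std::string strip_leading_quote(std::string data) {
    if (!data.empty() && data[0] == '\'') {
        data.erase(0, 1);  // erase one character starting at index 0
    }
    return data;
}
```

Taking the argument by value keeps the caller's string untouched, which makes it easier to compare input and output when hunting for the garbage characters described in the question.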

(Homework) Easiest way to format a table-like list and a logic issue?

So, my teacher gave us a document for an assignment due this Tuesday. Unfortunately it's not very clear. I have a strong grasp of the basics of programming, but not of C++. Here are my questions.
1.) Right now I clear the screen with system("cls"), and I print the menu screen with spaces and \n for formatting. The doc says to look up something called stdlib.h and a clrscr() function and how it can be used for clearing lines i.e. clrscr(4).....
I found nothing on google, do you guys know what he's talking about?
2.) What is the easiest way to format a table-like list in C++? Example of what I am trying to achieve here:
The way it outputs each line is in 3 different cout's, first one with t: x and the 1st number, second one tacking on the 2nd number to the right, and third one tacking on the last number and endl. This will then loop until some parameter is met.
3.) Is my logic above sound? The problem is, I do not understand the assignment doc he provided, and my e-mails remain unanswered. So I've tried to just do it as intuitively as I can, and that's what I came up with. Here is the snippet from the doc that I don't get:
I know it's kind of an intricate issue I'm having, so if you would like some more context for the last screenshot please let me know.
Any help is appreciated, thanks!
1.) Right now I clear the screen with system("cls"), and I print the menu screen with spaces and \n for formatting. The doc says to look up something called stdlib.h and a clrscr() function and how it can be used for clearing lines i.e. clrscr(4).....
Try using cplusplus.com; it is awesome and will answer a lot of your questions.
http://www.cplusplus.com/reference/cstdlib/?kw=stdlib.h
2.) What is the easiest way to format a table-like list in C++?
Well, personally, I think using the following function:
setw()
would be the best way to go about making a chart like that.
I feel this is better than just using "\t" or " ", because it sets an exact field width for each value and so keeps the columns aligned in an organized manner.
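Let's put setw() and "\t" to the test. Say we have the values 8 and 10,000 and want to print each on its own line:
cout << "\t" << "8" << endl;
cout << "\t" << "10000" << endl;
will output (both values start at the same tab stop, so they are left-aligned):
        8
        10000
while if you had (setw() comes from <iomanip>):
cout << setw(8) << "8" << endl;
cout << setw(8) << "10000" << endl;
it would output (each value is right-aligned in a field 8 characters wide):
       8
   10000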
It's just an issue of keeping your code organized and looking nice.
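To connect this with the table from the question, here is a rough sketch of formatting one row with setw(). The function name format_row, the label text, and the column widths (6 and 8) are all assumptions picked for illustration, not something taken from the assignment.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds one table row with a left-aligned label
// column (6 wide) followed by three right-aligned number columns (8 wide).
std::string format_row(const std::string& label, int a, int b, int c) {
    std::ostringstream out;
    out << std::left << std::setw(6) << label  // label column
        << std::right                          // alignment stays set until changed
        << std::setw(8) << a                   // setw applies to the next item only
        << std::setw(8) << b
        << std::setw(8) << c;
    return out.str();
}
```

Printing each row with cout << format_row(...) << endl inside the loop would replace the three separate cout statements described in the question, and every row comes out with the same total width.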

C++ Boost's sregex_token_iterator crash

I'm using the following code to get the image filenames from an HTML file.
The code goes somehow like this:
std::tr1::regex term=(std::tr1::regex)r;
const std::tr1::sregex_token_iterator end;
for (std::tr1::sregex_token_iterator i(s.begin(), s.end(), term); i != end; ++i)
{
    std::cout << *i << std::endl;
}
s is a string that is already declared and contains the full string of the file.
r is a string that contains the regex term to look for.
This code does actually retrieve the values from the file correctly, but after reaching the last one it crashes. It might have to do with the token_iterator i, but I don't have a clue of what is causing it or how to fix it.
I don't know if you already solved the problem, but find my suggestions below:
Did you try to change the ++i to i++?
Did you look at the HTML file to see if the first filename that cout shows is in fact the first one in the file?
I think the first loop on cout will print the second match in the HTML file.
If you already solved it please let me know the code applied, I'm working with boost regex and it would help me on future problems that I may have.
Regards,
Tchesko.
I had completely forgotten about this. I'm pretty sure the cause was an external, linker-related problem, which made it hard to figure out. But the code was fine.
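For anyone comparing against their own loop, below is a minimal sketch of the same iteration written with std::regex (C++11) instead of std::tr1. The pattern and the function name image_sources are assumptions made for illustration; the original regex string r was never shown.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the src attribute of every <img> tag.
// The pattern is a deliberately naive assumption, not a real HTML parser.
std::vector<std::string> image_sources(const std::string& html) {
    static const std::regex img_re(R"re(<img[^>]*src="([^"]*)")re");
    std::vector<std::string> result;
    const std::sregex_token_iterator end;  // default-constructed = end marker
    // The last argument (1) selects capture group 1 for each match.
    for (std::sregex_token_iterator it(html.begin(), html.end(), img_re, 1);
         it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

The string being searched has to outlive the iterators, which is why the function takes it by const reference and finishes the loop before returning.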