Parsing /etc/passwd with regex_*, unstandard behavior C++ [duplicate] - c++

This question already has answers here:
Is gcc 4.8 or earlier buggy about regular expressions?
(3 answers)
Closed 7 years ago.
Let's assume I have this line in my /etc/passwd:
xuser01:*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh
I read the file line by line.
From the program's parameters I get usernames/user IDs that tell me which lines I should store in a variable.
Using both regex_match and regex_search I got no results, although the same pattern works fine in online regex testers. Any idea why this is not working?
regExpr = "^(xuser01|xuser02)+:((.*):?)+";
if(regex_search(line, regex(regExpr)))
{
    cout << "Boom I got you!" << endl;
}
line contains the line currently being read; the loop goes through the whole file and never finds the string. I tried regex_match too, with the same result.
I also tried other regular expressions, such as (xuser01|xuser02)+ and similar, designed to be an almost certain match (while still matching what I need). None of them works in my C++ program, but they all work in online regex testers.
Any advice?
Thanks in advance!

It looks like the + quantifier is what keeps your pattern from matching in C++. It is redundant in your regex anyway, since each line starts with exactly one user name. If you are compiling with GCC 4.8 or earlier, also note that (as the linked duplicate explains) its std::regex implementation is incomplete, so a pattern that works in online testers can still fail there.
This code works fine and reaches the cout line:
string line( "xuser01:*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh" );
regex regExpr("^(xuser01|xuser02):((.*):?)");
if(regex_search(line, regExpr))
{
    cout << "Boom I got you!" << endl;
}
However, you did not indicate what you are looking for. Currently, the pattern yields three capture groups:
xuser01
*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh
*:111000:201:User Name, School Info, Year:/homes/pc/xu/xuser01:/bin/ksh

Related

Regex extract number from a string with a specific pattern in Alteryx [duplicate]

This question already has answers here:
Find numbers after specific text in a string with RegEx
(3 answers)
Closed 3 years ago.
I have a string like this, which looks like a URL:
mainpath/path2/abc/PI 6/j
From the string I need to get the number that follows PI.
The main problem is that the position of the PI part won't always be the same. Sometimes it is at the end, sometimes in the middle.
So how can I extract that number using regex?
I'm really stuck on this.
It's as simple as using the RegEx Tool. A Regular Expression of /PI (\d+) with the Output Method set to "Parse" should do the trick.
If you're using Alteryx, suppose your field name is [s] and you're looking for [f] (in your example the value of [f] is "PI"). You could then use a Formula tool that first finds /PI by creating a new field [tmp] as:
SubString([s],FindString([s],"/"+[f])+1)
and then creates the field you're after, [target]:
SubString([tmp],0,FindString([tmp],"/"))
From there, run [target] through a "Text to Columns" tool to split on the space, which will give you "PI" and "6".

RegEx for matching everything except new lines and a special char [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I was working on a homework problem that involves removing all of the HTML tags "<...>" from the text of an HTML document and then counting all of the tokens in that text.
I wrote a solution that works, but it all comes down to a single line of code that I didn't actually write, and I'm curious to learn more about how this kind of code works.
public static int tagStrip(Scanner in) {
    int count = 0;
    while(in.hasNextLine()) {
        String line = in.nextLine();
        line = line.replaceAll("<[^>\r\n]*>", "");
        Scanner scan = new Scanner(line);
        while(scan.hasNext()) {
            String word = scan.next();
            count++;
        }
    }
    return count;
}
The line that calls replaceAll() is the one I'm curious about. I understand how the replaceAll() method works, but I'm not sure how the String "<[^>\r\n]*>" works. I read a little bit about patterns and experimented with it.
I replaced it with "<[^>]+>" and it still works exactly the same. So I was hoping somebody could explain how these characters work and what they do, especially in this type of program.
RegEx
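If you wish to explore or modify your expression, you can do so on regex101.com.
<[^>]+> may not behave the same way: [^>] also matches line breaks, so the pattern can match across new lines, which seems to be undesired. It also requires at least one character between the angle brackets, so unlike <[^>\r\n]*> it leaves an empty <> in place. Because your code applies the pattern to one line at a time, the line-break difference never shows up there.
RegEx Circuit
You can also visualize your expressions on jex.im.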

[Regex]::Match() behaving differently inside vs outside an If (that also uses [Regex]::Match() ) [duplicate]

This question already has answers here:
Execute "real life" command line from variable in Powershell
(3 answers)
Closed 4 years ago.
Given an uninstallString of "C:\ProgramData\Package Cache\{56e11d69-7cc9-40a5-a4f9-8f6190c4d84d}\VC_redist.x86.exe" /uninstall, I can successfully extract the quoted text with ([Regex]::Match($uninstallString, '^\".*\"').Value). However, if I first test whether the string has the required /uninstall bit and then try to extract the quoted bit, like this...
if ([Regex]::Match($uninstallString, '^\".*\" +/uninstall').Success) {
    ([Regex]::Match($uninstallString, '^\".*\"').Value)
}
...then instead of the full quoted string, the value is only "C:\ProgramData\Package. My understanding is that . matches everything but a line break, so it should be fine with the space. But if I replace the space with an underscore in the string, it works as expected, so it's definitely the space causing the issue.
I am also confused about why it works outside of the If but not inside. I was under the impression that each use of [Regex]::Match() creates an individual object that doesn't interact with the others, but here it seems they do.
Since you want to know whether the quoted string (path) is found AND whether it is followed by the '/uninstall' switch, I'd do something like this:
$uninstallString = '"C:\ProgramData\Package Cache\{56e11d69-7cc9-40a5-a4f9-8f6190c4d84d}\VC_redist.x86.exe"'
if ($uninstallString -match '^(?<path>".*")(?:\s+(?<switch>/uninstall))?') {
    $uninstallPath = $matches['path'] # at least the path (quoted string) is found
    $uninstallSwitch = $matches['switch'] # if '/uninstall' switch is not present, this will result in $null
}

boost regex match non-whitespace and angle brackets

I may be asking a duplicate question, but I've spent a couple of hours googling this to no avail!
I'm trying to extract a string from some SIP URLs parsed by a program I'm working on. Here's an excerpt of the code. I'm passing in sipUrl, and have all the right includes etc:
static const boost::regex sipRegExp ("(sip:\\S+?@(?=\\S)[^>]+);");
boost::cmatch result;
boost::match_results<string::const_iterator> results;
boost::match_flag_type flags = boost::format_perl;
string newSipUrl;
cout << sipUrl << endl;
bool toggle = boost::regex_search(sipUrl, result, sipRegExp, flags);
if (toggle) {
    cout << result[1].str() << endl;
    newSipUrl = result[1].str();
}
cout << "new url: " << newSipUrl << endl;
I'm basically trying to extract the sip:user@IP part from strings like "\"alex@192.168.1.2\"<sip:alex@192.168.1.2>;tag=fe310852" or "\"bob\"<sip:bob@foo.com>;". However, I can't get it to match! It worked fine when I wasn't using a lookahead to try to remove the last angle bracket, but ever since then it fails to match.
I'm posting this just before running out of the door, so it may need more info. If anyone can spot something glaringly obvious, that'd be a great help! And please feel free to point me at links that I might have missed!
Have you tried something simpler, such as this regex:
`sip:[a-zA-Z]*@[0-9a-zA-Z.]*`
It works in a terminal, but I haven't tried it through boost yet. If you start off with something simple and then add to it bit by bit to make it more specific, it will be easier to track down which part of the regex isn't working.
You missed the > before the semicolon:
"(sip:\\S+?@(?=\\S)[^>]+)>;"
Although you probably don't need the semicolon at all. Something like Scott's answer should be sufficient.
I ended up going with a modification of @David Knipe's comment. The winning regex was:
sip:\\S+@[^\\s>;]+
It matches with or without angle brackets, up to the semicolon. Both answers provided did work, but being able to remove the lookahead was quite nice. I also went with the + quantifiers to make some effort to find a valid URI and not a blank one.
Thanks for the help!

Same Command Over Several Lines in C++ [duplicate]

This question already has answers here:
C++ multiline string literal
(10 answers)
Closed 9 years ago.
I want to write the value of a single variable over several lines in C++, more precisely in a WinAPI program.
Something like this (if \ is the character that does it):
str=" This is a sample file trying to write multiple lines. \n but it is not same as line break. \
I am defining the same string over several lines. This is different from using backslash n. \
This is not supposed to print multipline in screen or in write file or on windows display. This\
is for ease of programming.";
The problem with this is that I got "|||" wherever I had used \ in my code. I don't want that to appear.
What should I do?
There are several alternatives. Here are two:
Put the content of the string into a file and read the file content into the string. When you find yourself using lots of long strings, this is probably the “correct” way.
Use the following syntax:
str = "This is a string that is going over several lines "
"but it does not include line breaks and if you print "
"the string you will see that it looks like it was "
"written normally.";
C++ allows you to write several string literals one after another and concatenates them automatically at compile time. That is, "a" "b" is the same as "ab" as far as C++ is concerned.