Boost wregex throwing exception, regex syntax wrong? - regex

I have imported Boost library in to a .dll that I am using. I am trying to parse a string using:
boost::wregex regPlayerAtSeat(L"*Governor: Seat.?[1-9].*");
But all I get is an 'interop service exception. Is the syntax of my regex wrong?
Thanks, R.

The first * doesn't appear to have any characters before it. In regex it acts as a quantifier, not a wildcard like in UNIX command lines and so forth. You probably want something like .* in its place, but that's partly just a guess. The full regex would then look like this:
boost::wregex regPlayerAtSeat(L".*Governor: Seat.?[1-9].*");
.* will match zero or more repetitions of (almost) any character (probably not newlines, but I don't know the inner workings of the boost regex engine). Is that what you were going for at the beginning of your string? Alternatively, since you haven't anchored your regex, you might be able to just use:
boost::wregex regPlayerAtSeat(L"Governor: Seat.?[1-9]");
This will depend on what exactly you're trying to match and what format it is in, however.

Related

Vim S&R to remove number from end of InstallShield file

I've got a practical application for a vim regex where I'd like to remove numbers from the end of file location links. For example, if the developer is sloppy and just adds files and doesn't reuse file locations, you'll end up with something awful like this:
PATH_TO_MY_FILES&gt
PATH_TO_MY_FILES1&gt
...
PATH_TO_MY_FILES22&gt
PATH_TO_MY_FILES_ELSEWHERE&gt
PATH_TO_MY_FILES_ELSEWHERE1&gt
...
So all I want to do is to S&R and replace PATH_TO_MY_FILES*\d+ with PATH_TO_MY_FILES* using regex. Obviously I am not doing it quite right, so I was hoping someone here could not spoon feed the answer necessarily, but throw a regex buzzword my way to get me on track.
Here's what I have tried:
:%s\(PATH_TO_MY_FILES\w*\)\(\d+\)&gt:gc
But this doesn't work, i.e. if I just do a vim search on that, it doesn't find anything. However, if I use this:
:%s\(PATH_TO_MY_FILES\w*\)\(\d\)&gt:gc
It will match the string, but the grouping is off, as expected. For example, the string PATH_TO_MY_FILES22 will be grouped as (PATH_TO_MY_FILES2)(2), presumably because the \d only matches the 2, and the \w match includes the first 2.
Question 1: Why doesn't \d+ work?
If I go ahead and use the second string (which is wrong), Vim appears to find a match (even though the grouping is wrong), but then does the replacement incorrectly.
For example, given that we know the \d will only match the last number in the string, I would expect PATH_TO_MY_FILES22&gt to get replaced with PATH_TO_MY_FILES2&gt. However, instead it replaces it with this:
PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt
So basically, it looks like it finds PATH_TO_MY_FILES22&gt, but then replaces only the & with group 1, which is PATH_TO_MY_FILES2.
I tried another regex at Regexr.com to see how it would interpret my grouping, and it looked correct, but maybe a hack around my lack of regex understanding:
(PATH_TO_\D*)(\d*)&gt
This correctly broke my target string into the PATH part and the entire number, so I was happy. But then when I used this in Vim, it found the match, but still replaced only the &.
Question 2: Why is Vim only replacing the &?
Answer 1:
You need to escape the + or it will be taken literally. For example \d\+ works correctly.
Answer 2:
An unescaped & in the replacement portion of a substitution means "the entire matched text". You need to escape it if you want a literal ampersand.

Greedy and non-greedy regex

I currently have this regex: this\.(.*)?\s[=,]\s, however I have come across a pickle I cannot fix.
I tried the following Regex, which works, but it captures the space as well which I don't want: this\.(.*)?(?<=\s)=|(?<!\s),. What I'm trying to do is match identifier names. An example of what I want and the result is this:
this.""W = blah; which would match ""W. The second regex above does this almost perfectly, however it also captures the space before the = in the first group. Can someone point me in the correct direction to fix this?
EDIT: The reason for not simply using [^\s] in the wildcard group is that sometimes I can get lines like this: this. "$ = blah;
EDIT2: Now I have another issue. Its not matching lines like param1.readBytes(this.=!3,0,param1.readInt()); properly. Instead of matching =!3 its matching =!3,0. Is there a way to fix this? Again, I cannot simply use a [^,] because there could be a name like param1.readBytes(this.,3$,0,param1.readInt()); which should match ,3$.
(.*) will match any character including whitespace.
To force it not to end in whitespace change it to (.*[^\s])
Eg:
this\.(.*[^\s])?\s?[=,]\s
For your second edit, it seems like you are doing a language parser. Even though regular expressions are powerful, they do have limits. You need a grammar parser for that.
Maybe you can tell in your first block to capture non space characters, instead of any.
this\.(\S*)?(?<=\s)=|(?<!\s),

Parse with Regex without trailing characters

How can I successfully parse the text below in that format to parse just
To: User <test#test.com>
and
To: <test#test.com>
When I try to parse the text below with
/To:.*<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>/mi
It grabs
Message-ID <CC2E81A5.6B9%test#test.com>,
which I dont want in my answer.
I have tried using $ and \z and neither work. What am I doing wrong?
Information to parse
To: User <test#test.com> Message-ID <CC2E81A5.6B9%test#test.com>
To:
<test#test.com>
This is my parsing information in Rubular http://rubular.com/r/DQMQC4TQLV
Since you haven't specified exactly what your tool/language is, assumptions must be made.
In general regex pattern matching tends to be aggressive, matching the longest possible pattern. Your pattern starts off with .*, which means that you're going to match the longest possible string that ENDS WITH the remainder of your pattern <[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>, which was matched with <CC2E81A5.6B9%test#test.com> from the Message-ID.
Both Apalala's and nhahtdh's comments give you something to try. Avoid the all-inclusive .* at the start and use something that's a bit more specific: match leading spaces, or match anything EXCEPT the first part of what you're really interested in.
You need to make the wildcard match non greedy by adding a question mark after it:
To:.*?<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>

Regular expression with drools

I have a string with multiline as below.
rawMessage=sysUpTimeInstance-->0:0:00:05.00
snmpTrapOID.0-->linkDown.0.0
In the drools when portion i have written the condition as below.
rawMessage matches "(?i).*linkDown(.|\n|\r)*"
but it is not working.Please provide me some pointers to handle multiline.
Its not clear to me what you want to do/achieve. Your regex looks not wrong (I don't know the drools flavour and what you want to match).
In general (.|\n|\r)* is able to match any character including newlines. In your example there is no newline after "linkDown", so what should it match there?
Maybe you need to double escape (I don't know for drools) like this: (.|\\n|\\r)*.
Another possibility is to use the singleline modifer s (Again, I don't know if drools supports this modifier). This makes the . match also newline characters, could then look something like this
rawMessage matches "(?i)(?s).*linkDown.*"
or if it should only match multiline from "linkdown" on
rawMessage matches "(?i).*linkDown(?s).*"
Drools uses standard java regular expressions. As the previous answer mention, your expression looks wrong. And yes, you need to double escape special chars like you would do in java. Just check the javadoc for the Pattern class in the java API.

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
[a-zA-Z0-9]*[\#]
If your string contains any other special characters you need to add them into the first square bracket escaped.
I don't use C#, but i will assume that it uses pcre... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers
return the entire matched sequence as group(0), the first paren-matched group as group(1),
and so forth.
PCRE is so important i strongly encourage you to search for it on google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-collecting group to test the second half, and if it finds it, returns everything after the pound.
Honestly though, for you situation, it is probably easier to use the Split method provided in String.
More on lookahead and lookbehind
first:
/[^\#]*(?=\#)/ edit: is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
string[] split = exampleString.Split('#');
string firstString = split[0];
string secondString = split[1];