Regex repeat patterns and non-capturing groups - C++

Okay, so I am having an issue getting a repeat to work at all, let alone the way I want it to work.
I will be bringing in a string with the following information:
NETWORK;PASS;1;THIS TEXT|CAN BE|RANDOM|WITH|PIPE|SEPERATORS;\r
What I have so far:
(?:NETWORK;.*;(?:0|1);)([^|]*)
This currently leaves me with the first block matched:
THIS TEXT
What I am trying to do is set it up so I can programmatically specify which block to match. The pipe-separated text will have between 3 and 7 "blocks", and depending on the situation I may need to match any one of them, but only one at a time.
I had thought about just duplicating
([^|]*)
and making all but one of the copies non-capturing, but I can't seem to get it to match anything if I duplicate that group, and I can't get repeat operators to work on the group either.
I am a bit lost, so this may not make complete sense. If clarification is required, I will provide it on request. Any help is appreciated.

Why not just split THIS TEXT|CAN BE|RANDOM|WITH|PIPE|SEPERATORS on the pipe symbol? Much easier than a dynamically-generated regex.
But if you really want to generate a regex:
Start with (?:NETWORK;.*;(?:0|1);)
To get the nth element (indexed from 0), add (?:[^|]+[|]){n} (replace n with the number to skip), followed by ([^|]+)
Example:
(?:NETWORK;.*;(?:0|1);)(?:[^|]+[|]){3}([^|]+)
Matches WITH in your example. Here's a regex101 demo.
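For what it's worth, here is a minimal sketch of both approaches in Python (chosen for brevity; the question is about C++, and only the pattern string is meant to carry over). The helper names block_pattern and nth_block are made up for illustration. The capture group uses [^|;]+ rather than [^|]+ so that the last block does not pick up the trailing ;\r, which is my assumption about what you want.

```python
import re

# Fixed prefix from the answer above.
PREFIX = r"(?:NETWORK;.*;(?:0|1);)"


def block_pattern(n):
    """Build a regex whose group 1 is the n-th pipe-separated block (0-based)."""
    # Skip n blocks (each followed by a pipe), then capture the next one.
    # [^|;]+ is a tweak (assumption): it stops before the trailing ';'.
    return re.compile(PREFIX + r"(?:[^|]+[|]){%d}([^|;]+)" % n)


def nth_block(line, n):
    """Return the n-th block, or None if the line has fewer blocks."""
    m = block_pattern(n).search(line)
    return m.group(1) if m else None


line = "NETWORK;PASS;1;THIS TEXT|CAN BE|RANDOM|WITH|PIPE|SEPERATORS;\r"

print(nth_block(line, 0))  # THIS TEXT
print(nth_block(line, 3))  # WITH
print(nth_block(line, 5))  # SEPERATORS
print(nth_block(line, 6))  # None

# The split approach: field 3 of the ';'-separated line, then block 3.
print(line.split(";")[3].split("|")[3])  # WITH
```

The split version needs no generated pattern at all, which is why it is usually the easier option when the field position is fixed.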

Notepad++ Regex Search XML argument for anything but certain word

I have a well structured XML file with several grouped units, which contain a consistent number of child elements.
I am trying to find a way, through Regex in Notepad++, to search throughout all of these groups for a certain argument that contains a single word. I have found a way of doing this but the problem is I want to find the negation of this word, that means for instance, if the word is "downward" I want to find anything that is NOT "downward".
Here is an example:
<xml:jus id="84" trek="spanned" place="downward">
I came up with <xml:jus id="\d+" trek="[\w]*" place="\<downward"> to find these tags, but I need to find all other matches that do not have "downward" in the place= attribute. I tried <xml:jus id="\d+" trek="[\w]*" place="^\<downward"> but without success.
Any help is appreciated.
If the attributes and the string are always in the same format, you could also use (*SKIP)(*F) to first match what you want to exclude:
<xml:jus id="\d+" trek="\w+" place="downward">(*SKIP)(*F)|<xml:jus id="\d+" trek="\w+" place="[^"]+">
You might be able to use a negative lookahead to exclude downward from being the place:
<[^>]+ place="(?!downward").*?"[^>]*>
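As a rough check of the negative lookahead, here is a small Python sketch. The pattern is the one from the answer above; the sample tags with id 85 and 86 are invented for illustration. Note that the lookahead includes the closing quote, so a value such as "downwards" is not excluded.

```python
import re

# Pattern from the answer above; only the sample tags are invented.
NOT_DOWNWARD = re.compile(r'<[^>]+ place="(?!downward").*?"[^>]*>')

xml = "\n".join([
    '<xml:jus id="84" trek="spanned" place="downward">',
    '<xml:jus id="85" trek="spanned" place="upward">',
    '<xml:jus id="86" trek="spanned" place="downwards">',
])

# Tags whose place attribute is anything other than exactly "downward".
matches = NOT_DOWNWARD.findall(xml)
for tag in matches:
    print(tag)
```

Only the tags with id 85 and 86 are printed; the tag with place="downward" is skipped.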

Can I improve simplicity using negative lookahead to find the last folder in a file path?

I’m trying to find a simpler way to locate the last folder in each path of a file list, skipping paths whose file is of a given type (.doc here), and I must use lookarounds. Can anyone suggest improvements to the regex that follows?
Search text:
c:\this\folder\goes\findme.txt
c:\this\folder\cant\findme.doc
c:\this\folder\surecanfind.txt
c:\\anothertest.rtf
c:\t.txt
RegEx:
(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)
Expected result:
‘goes’
‘folder’
Can the RegEx lookahead be improved and simplified? Thanks for the help.
In your original regex:
(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)
there isn't really much to improve in terms of the use of lookarounds.
The positive lookbehind is necessary to tell the regex where it is allowed to begin a match.
The positive lookahead is necessary to stop the expansion of the +? quantifier.
And the negative lookahead is needed to reject invalid matches.
You might be able to condense both lookaheads into one. But keeping them separate is more efficient, since if one fails, the engine can skip evaluating the second.
However, if you're looking for a more efficient/"normal" regex, I would typically use something like:
^.*\\(.+?)\\[^\\]+\.(?!doc).+$
Instead of using lookarounds to exclude everything except my desired output from a match, I'd include my desired output in a capture group.
This lets me tell the regex engine to check for a match only once per line, instead of after every \ character.
Then, to get my desired output, all I have to do is grab the content of capture group 1 from each match.
Working examples:
original (98,150 steps)
capture groups (66,586 steps)
Hopefully that'll help you out
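To compare the two patterns side by side, here is a minimal Python sketch run against the search text from the question. Python's re module is an assumption on my part (the question does not name an engine), and the step counts quoted above will not carry over to it, but both patterns should pick out the same folders.

```python
import re

# Search text from the question, one path per line.
paths = "\n".join([
    r"c:\this\folder\goes\findme.txt",
    r"c:\this\folder\cant\findme.doc",
    r"c:\this\folder\surecanfind.txt",
    r"c:\\anothertest.rtf",
    r"c:\t.txt",
])

# Original lookaround-only pattern: the match itself is the folder name.
LOOKAROUND = re.compile(r"(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)")

# Capture-group pattern: anchored per line, folder name is group 1.
CAPTURE = re.compile(r"^.*\\(.+?)\\[^\\]+\.(?!doc).+$", re.MULTILINE)

lookaround_hits = LOOKAROUND.findall(paths)
capture_hits = CAPTURE.findall(paths)

print(lookaround_hits)  # ['goes', 'folder']
print(capture_hits)     # ['goes', 'folder']
```

With the capture-group version, findall returns group 1 directly, so no extra step is needed to pull the folder name out of the full-line match.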

Regex - orderless extraction of string

I have 2 strings which are 2 records
string1 = "abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = "/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"
BS is booking airline
DS is Date
I want to use a single regex to extract the booking source and date. Please let me know if it is feasible.
I have tried lookaheads and still couldn't get it to work.
The target is Splunk, not JavaScript.
Whatever language your answer uses, please post it and I'll give it a try in Splunk.
You mentioned that you've tried lookahead, what about lookbehind?
(?<=BS-|DS-)(\w+)
Tested at Regex101
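A quick Python sketch of the lookbehind, for illustration only (Splunk's own syntax is not shown here). Python accepts this lookbehind because both alternatives, BS- and DS-, have the same length. The values come back in the order they appear in the record, so the pattern alone does not tell you which one is the airline and which is the date.

```python
import re

# Lookbehind pattern from the answer above.
AFTER_TAG = re.compile(r"(?<=BS-|DS-)(\w+)")

# Sample records from the question, taken literally as raw strings.
string1 = r"abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = r"/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

values1 = AFTER_TAG.findall(string1)
values2 = AFTER_TAG.findall(string2)

print(values1)  # ['QANTAS', '12JUL15']
print(values2)  # ['10JUN15', 'AIRFRANCE']
```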
Here's a more scalable (and more readable, IMO) alternative to miroxlav's answer:
(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+
I'm assuming the fields you're interested in always start with a slash. That allows me to use [^/]+ to safely consume the junk between/around them.
demo
This is effectively three regexes in one, wrapped in a group, to give each one a chance to match in turn, and applied multiple times. If the first alternative matches, you're looking at a "source airline" field, and the name is captured in the group named "source". If the second alternative matches, you're looking at the date, which is captured in the "date" group.
But, because the fields aren't in a predetermined order, the regex has to match the whole string to be sure of matching both fields (in fact, I should have used start and end anchors--^ and $--to enforce that; I've added them below). The third alternative, [^/]+, allows it to consume the parts that the first two can't, thus making an overall match possible. Here's the updated regex:
^(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+$
...and the updated demo. As noted in the comment, the \v is there only because I'm combining your two examples into one multiline string and doing two matches. You shouldn't need it in real life.
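Here is a minimal Python sketch of the anchored version, applied to each record separately so the \v is dropped, as suggested above. The function name extract is hypothetical, and Python's (?P<name>...) syntax happens to match the one used in the pattern; whether Splunk accepts it unchanged is not something this sketch verifies.

```python
import re

# Anchored pattern from the answer, without \v since each record is matched alone.
FIELDS = re.compile(r"^(?:/BS-(?P<source>\w+)|/DS-(?P<date>\w+)|[^/]+)+$")


def extract(record):
    """Return (source, date) for one record, or None if it does not match."""
    m = FIELDS.match(record)
    if m is None:
        return None
    # A field that never appeared in the record comes back as None.
    return m.group("source"), m.group("date")


# Sample records from the question, taken literally as raw strings.
string1 = r"abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = r"/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

print(extract(string1))  # ('QANTAS', '12JUL15')
print(extract(string2))  # ('AIRFRANCE', '10JUN15')
```

Because each field has its own named group, the order of the fields in the record no longer matters.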
This gives you both strings filled either in match groups airline1+date1 or in airline2+date2:
((BS-(?<airline1>\w+).*DS-(?<date1>[\w]+))|(DS-(?<date2>[\w]+).*BS-(?<airline2>\w+)))
>> view at regex101.com
Since there are only 2 groups, I used simple permutation.
This regex will take the last occurrence of each field if there is more than one. If you need the earliest one (using lookbehind), let me know.

Regex PCRE: Validate string to match first set of string instead of last

I tried quite a few things, but I'm stuck: my regex goes wrong whenever the input meets the criteria two or more times in a row. In that case it treats everything as one match instead of two.
\[ame\=[^\.]+(.+)youtube\.(.+)v\=([^\]\&\"]+)[\]\'\"\&](.+)\[\/ame\]
E.g.
[ame="http://www.youtube.com/watch?v=brfr5CD2qqY"][B][COLOR=yellow]http://www.youtube.com/watch?v=brfrx5D2qqY[/COLOR][/B][/ame][/U]
[B][COLOR=yellow]or[/COLOR][/B] [B][COLOR=yellow]B[/COLOR][/B]
[ame="http://www.youtube.com/watch?v=M9ak3rKIBAU"][B][COLOR=yellow]http://www.youtube.com/watch?v=M9a3arKIBAU[/COLOR][/B][/ame]
[B][COLOR=yellow]or[/COLOR][/B] [B][COLOR=yellow]C[/COLOR][/B]
[ame="http://www.youtube.com/watch?v=7vh--3pyq5U"][COLOR=yellow]http://www.youtube.com/watch?v=7vh--3pyq5U[/COLOR][/ame]
In this case, instead of matching all 3 blocks separately, the regex matches them as one.
Any ideas on how to write an expression that stops at the first "[/ame]"?
The problem is the use of .+ quantifiers: they are "greedy", meaning they will consume as much input as possible while still allowing a match.
Change them to reluctant quantifiers: .+?, which won't skip forward over the end of the first match to reach the end of the last match.
I'm not sure what your objective is (you haven't made that clear yet)
But this will match and capture out the youtube URL for you, ensuring you only match each single instance between [ame= and [/ame]
/\[ame=["'](.*?)["'](.*?)\/ame\]/i
Here's a working example, and a great sandbox to play around in: http://regex101.com/r/jR4lK2
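To see the difference between greedy and reluctant quantifiers, here is a small Python sketch. The input is a shortened, single-line stand-in for the post in the question (two [ame] blocks, with the inner markup replaced by placeholder text), so treat it as an illustration rather than a test against the real data.

```python
import re

# Shortened stand-in for the forum post: two [ame] blocks on one line.
text = (
    '[ame="http://www.youtube.com/watch?v=brfr5CD2qqY"]first[/ame] or '
    '[ame="http://www.youtube.com/watch?v=M9ak3rKIBAU"]second[/ame]'
)

# Greedy version: .* runs to the last quote and the last /ame].
GREEDY = re.compile(r"""\[ame=["'](.*)["'](.*)/ame\]""", re.IGNORECASE)

# Reluctant version from the answer above: .*? stops at the first opportunity.
LAZY = re.compile(r"""\[ame=["'](.*?)["'](.*?)/ame\]""", re.IGNORECASE)

greedy_count = len(GREEDY.findall(text))
lazy_urls = [m.group(1) for m in LAZY.finditer(text)]

print(greedy_count)  # 1
print(lazy_urls)
```

The greedy pattern swallows both blocks in a single match, while the reluctant one yields one URL per block.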

Regex href match a number

Well, here I am back at regex and my poor understanding of it. Spent more time learning it and this is what I came up with:
/(.*)
I basically want the number in this string:
510973
Is my regex almost right? My original was:
"/<a href=\"travis.php?theTaco(.*)\">(.*)<\/a>/";
But sometimes it returned me huge strings. So, I just want to get numbers only.
I searched through other posts but there is such a large amount of unrelated material, please give an example, resource, or a link directing to a very related question.
Thank you.
Try using a HTML parser provided by the language you are using.
Reason why your first regex fails:
[0-9999999] is not what you think. It is the same as [0-9], which matches one digit. To match a number you need [0-9]+. Also, .* is greedy and will try to match as much as it can. You can use .*? to make it non-greedy. Since you are trying to match a number again, use [0-9]+ again instead of .*. And if the two numbers you are capturing will be the same, you can just match the first and use a back reference \1 for the second one.
And there are a few regex meta-characters which you need to escape like ., ?.
Try:
<a href=\"travis\.php\?theTaco=([0-9]+)\">\1<\/a>
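The points above can be checked with a short Python sketch (the question looks like PHP, so Python is only a stand-in here). The sample link is an assumption reconstructed from the pattern above and the number 510973 in the question; it is not quoted from the original post.

```python
import re

# [0-9999999] is a character class: the range 0-9 plus some repeated 9s,
# so it matches exactly one digit, just like [0-9].
single_digit_ok = re.fullmatch(r"[0-9999999]", "5") is not None
whole_number_ok = re.fullmatch(r"[0-9999999]", "510973") is not None
plus_ok = re.fullmatch(r"[0-9]+", "510973") is not None

print(single_digit_ok, whole_number_ok, plus_ok)  # True False True

# Pattern from the answer: \1 requires the link text to repeat the captured number.
LINK = re.compile(r'<a href="travis\.php\?theTaco=([0-9]+)">\1</a>')

# Assumed sample input (reconstructed, not from the original post).
html = '<a href="travis.php?theTaco=510973">510973</a>'
mismatch = '<a href="travis.php?theTaco=510973">999</a>'

m = LINK.search(html)
taco_id = m.group(1) if m else None
print(taco_id)                 # 510973
print(LINK.search(mismatch))   # None
```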
To capture a number, you don't use a range like [0-99999], you capture by digit. Something like [0-9]+ is more like what you want for that section. Also, escaping is important like codaddict said.
Others have already mentioned some issues regarding your regex, so I won't bother repeating them.
There are also issues regarding how you specified what it is you want. You can simply match via
/theTaco=(\d+)/
and take the first capturing group. You have not given us enough information to know whether this suits your needs.