What pattern to use to get a substring using regexp

I have the following two strings. How can I get the numbers in them, i.e. 233100 and 233800?
QA-Ki-233100
QA-Ki-233800-win-vc8-x86-release
This is the pattern I have, but it does not work.
oRegexp.Pattern = "QA-Ki-\--[\Z]"
Thanks for your help.

This should do:
(?<=-)\d+(?=-|$)
or simply (in this case),
\b\d+\b
In (?<=-)\d+(?=-|$), a positive lookbehind and a positive lookahead make sure that the desired substring \d+ (the numbers) is preceded by - and followed by either - or the end of the line ($).
In \b\d+\b, both the - and the end of the line count as a word boundary \b next to a digit, so the regex becomes shorter.
Check: https://regex101.com/r/nL9nR1/1
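Both patterns can be tried against the two strings from the question. The following is a minimal sketch in Python rather than the asker's environment (the oRegexp object in the question is not reproduced here), and the variable names are made up for illustration:

```python
import re

samples = ["QA-Ki-233100", "QA-Ki-233800-win-vc8-x86-release"]

# Digits preceded by "-" and followed by "-" or the end of the string.
lookaround = re.compile(r"(?<=-)\d+(?=-|$)")
# Digits delimited by a word boundary on both sides.
boundary = re.compile(r"\b\d+\b")

lookaround_hits = [lookaround.findall(s) for s in samples]
boundary_hits = [boundary.findall(s) for s in samples]

print(lookaround_hits)
print(boundary_hits)
```

The 8 in vc8 and the 86 in x86 are skipped by both patterns because a letter, not a - or a word boundary, sits directly in front of them.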

Related

Regex pattern matching for contains a character

I'm looking for a regex pattern which can do this exactly.
Should match the length which is 12 characters alphaNumeric
Should also check for the occurrence of hyphen - twice in the word
No spaces are allowed.
I have tried the following regex:
^([a-zA-Z0-9]*-[a-zA-Z0-9]*){2}$
Some sample cases
-1234abcd-ab
abcd12-avc-a
-abcd-abcdacb
ac12-acdsde-
The regex should match for all the above.
And should be wrong for the below
-abcd-abcd--a
abcd-abcdefg
I've been using this regex ^([a-zA-Z0-9]*-[a-zA-Z0-9]*){2}$ for matching the above patterns, but the problem is, it doesn't have a length check of 12. I'm not sure how to add that into the above pattern. Help would be appreciated.
Use this:
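(?=^.{12}$)(?=^[^-]*-[^-]*-[^-]*$)[a-zA-Z0-9-]+$ /gm
The first positive lookahead asserts that the total length is 12.
The second positive lookahead asserts the presence of exactly two hyphens.
The rest matches the allowed characters from the character set up to the end of the line; the trailing $ keeps a line that contains a space or another disallowed character from matching partially.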
Demo
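To check the sample cases, the two lookaheads and the character class can be run through Python. This is only a sketch: the /gm flags are dropped, fullmatch() is used so that the character class has to consume the whole string, and is_valid is a hypothetical helper name.

```python
import re

# Lookaheads: total length 12, exactly two hyphens.
# Character class: only letters, digits and hyphens.
pattern = re.compile(r"(?=^.{12}$)(?=^[^-]*-[^-]*-[^-]*$)[a-zA-Z0-9-]+")

def is_valid(word):
    # fullmatch() requires the character class to cover the entire string.
    return pattern.fullmatch(word) is not None

accepted = ["-1234abcd-ab", "abcd12-avc-a", "ac12-acdsde-"]
rejected = ["-abcd-abcd--a", "abcd-abcdefg", "abc 12-avc-a"]

for word in accepted + rejected:
    print(repr(word), is_valid(word))
```

The last rejected string is 12 characters long and has two hyphens, so it passes both lookaheads; it fails only because the space is not in the character class.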

Regex to ignore Cobol comment line

I'd like to use regex to scan a few Cobol files for a specific word while skipping comment lines. Cobol comments have an asterisk in column 7. The regex I've gotten so far, using a negative lookbehind, looks like this:
^(?<!.{6}\*).+?COPY
It matches both lines:
* COPY
COPY
I would assume that .+? overrides the negative lookbehind somehow, but I'm stuck on how to correct this. What would I need to fix to get a regex that only matches the second line?
You may use a lookahead instead of a lookbehind:
^(?!.{6}\*).+?COPY
See the regex demo.
The lookbehind required some pattern to be absent before the start of the string. Nothing precedes the start of the string, so the check was redundant and always succeeded. A lookahead checks for a pattern to the right of the current location.
So,
^ - matches the start of the string
(?!.{6}\*) - fails the match if there are any 6 chars followed by * from the start of the string (replace . with a space if you need to match just spaces)
.+? - matches any 1+ chars, as few as possible, up to the first
COPY - a literal COPY substring.
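The difference between the two assertions can be reproduced with a short Python sketch. The two sample lines are invented for illustration (six spaces, then either an asterisk or a space in column 7):

```python
import re

lines = [
    "      * COPY FOO.",   # comment: asterisk in column 7
    "        COPY BAR.",   # code line
]

# Original attempt: at the start of the string there is nothing to look
# behind at, so the negative lookbehind always succeeds.
lookbehind = re.compile(r"^(?<!.{6}\*).+?COPY")
# Suggested fix: look ahead from the start of the string instead.
lookahead = re.compile(r"^(?!.{6}\*).+?COPY")

lookbehind_result = [bool(lookbehind.search(line)) for line in lines]
lookahead_result = [bool(lookahead.search(line)) for line in lines]

print(lookbehind_result)
print(lookahead_result)
```

Only the lookahead version rejects the comment line.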
If you want to filter out EVERY comment you could use:
^ {6}(?!\*)
That will match only lines that start with six spaces and do NOT have an '*' in the 7th position.
COBOL can use positions 1-6 for line numbers, so it may be safer to just use:
^.{6}(?!\*).*$
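A small Python sketch shows how the two patterns differ once the first six columns hold line numbers. The sample lines and the code_lines helper are made up for illustration:

```python
import re

spaces_only = re.compile(r"^ {6}(?!\*)")
any_prefix = re.compile(r"^.{6}(?!\*).*$")

lines = [
    "      * COPY IGNORED.",
    "        COPY PLAIN.",
    "000300* COPY IGNORED.",
    "000400  COPY NUMBERED.",
]

def code_lines(pattern):
    # Keep only the lines the pattern accepts as non-comment lines.
    return [line for line in lines if pattern.search(line)]

spaces_only_hits = code_lines(spaces_only)
any_prefix_hits = code_lines(any_prefix)

print(spaces_only_hits)
print(any_prefix_hits)
```

The first pattern drops the numbered code line along with the comments; the second keeps both code lines and skips both comments.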

Regex in middle of text doesn't match

I have a regex to find URLs in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However it fails when it is surrounded by text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.
Possible reasons why the pattern does not work:
^ and $ make it match the entire string
(?!:\/\/) is a negative lookahead that fails the match if, immediately to the right of the current location, there is :// substring. But [a-zA-Z0-9-_]+ means there can't be any ://, so, you most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? - once $ is removed, this matches only 2 chars: {2,11}? is a lazy quantifier, and a lazy quantifier at the very end of a pattern always matches the minimum number of chars, here 2.
Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.
Note in Python regex there is no need to escape /, you may replace (?<!:\/\/) with (?<!://).
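The effect of the anchors and of the lazy quantifier can be seen on a short sentence. This is a sketch in Python; the sample text is invented and deliberately contains no :// so that only those two points are exercised:

```python
import re

text = "see example.org for details"

# Original pattern: ^ and $ require the whole string to be a domain.
anchored = re.compile(
    r"^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$")
# Anchors removed but the lazy {2,11}? kept: stops after 2 letters.
lazy = re.compile(
    r"(?<!://)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?")
# Suggested pattern: no anchors, greedy {2,11}.
fixed = re.compile(
    r"(?<!://)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}")

anchored_hit = anchored.search(text)
lazy_hits = [m.group(0) for m in lazy.finditer(text)]
fixed_hits = [m.group(0) for m in fixed.finditer(text)]

print(anchored_hit, lazy_hits, fixed_hits)
```

finditer() with group(0) is used instead of findall() because the pattern contains a capturing group, and findall() would return that group's content rather than the whole match.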
The spaces are not being matched. Try adding space to the character sets checking for leading or trailing text.

How to find words that contain string with a limited size

I need to find all the words in an inputted text that have (?i:val) in them and are no longer than 5 characters.
So far I got: \b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b
If we take this sample text to look in: In computer science, a value is an expression which cannot be evaluated any further (a normal form). Val is also a match
I get 3 matches (value, evaluated and Val), however evaluated should not match the pattern, as it is too long. What is the right way to get this straight?
Your pattern does not account for the length of the words matched.
Use word boundaries and a lookahead like this:
(?i)\b(?=\w*val)\w{1,5}\b
See regex demo
The regex matches:
\b - a leading word boundary since the next pattern is \w
(?=\w*val) - a lookahead making sure there is a val substring after zero or more word characters
\w{1,5} - matches 1 to 5 word characters
\b - trailing word boundary that stops words longer than 5 characters from matching
You may use an ASCII JS version of the regex:
/\b(?=[a-z]*val)[a-z]{1,5}\b/i
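Both variants can be checked against the sample text from the question. A minimal Python sketch, with the JS-style /i flag expressed as re.IGNORECASE:

```python
import re

text = ("In computer science, a value is an expression which cannot be "
        "evaluated any further (a normal form). Val is also a match")

# \w version with an inline case-insensitive flag.
word_chars = re.compile(r"(?i)\b(?=\w*val)\w{1,5}\b")
# ASCII-letters version; re.IGNORECASE stands in for the /i flag.
ascii_letters = re.compile(r"\b(?=[a-z]*val)[a-z]{1,5}\b", re.IGNORECASE)

word_chars_hits = word_chars.findall(text)
ascii_hits = ascii_letters.findall(text)

print(word_chars_hits)
print(ascii_hits)
```

"evaluated" is rejected because no run of 1 to 5 characters starting at its first letter ends on a word boundary.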
It's important to understand why "evaluated" was matched. Note:
[a-zA-Z]* matches the "e"
(?i:val) matches "val"
[a-zA-Z]* matches "uated"
Actually there's no repetition here! The pattern was matched in only one iteration.
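The single-iteration claim can be checked by comparing the original pattern with a copy that has the {1,4} quantifier removed. A sketch in Python (which accepts the scoped (?i:...) flag from version 3.6 on); the variable names are made up:

```python
import re

text = ("In computer science, a value is an expression which cannot be "
        "evaluated any further (a normal form). Val is also a match")

with_quantifier = re.compile(r"\b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b")
without_quantifier = re.compile(r"\b([a-zA-Z]*(?i:val)[a-zA-Z]*)\b")

hits_with = [m.group(0) for m in with_quantifier.finditer(text)]
hits_without = [m.group(0) for m in without_quantifier.finditer(text)]

# If the group had repeated, its last capture would be shorter than the
# whole match; here the two are identical for every hit.
single_iteration = all(
    m.group(1) == m.group(0) for m in with_quantifier.finditer(text))

print(hits_with, hits_without, single_iteration)
```

Both patterns return the same three words, so {1,4} counts repetitions of the group and places no limit on the word length.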
You can achieve what you want using lookarounds, but I think that regex is not the best tool for this task. I highly recommend using other functions, depending on what you have available.

Regex match any character NOT followed by "? something"

How can I match a path only if there is no "?" plus zero or more characters at the end?
I have the following path:
/something/contentimg/coast03.jpg?itok=ABC
I want the filename, but only if there is no "?something" after the file extension.
I tried:
/^.*\/(.*)(?!\?.*)$/
But it matches anyway. This is the result. What am I doing wrong?
Array
(
[0] => /something/contentimg/coast03.jpg?itok=ABC
[1] => coast03.jpg?itok=ABC
)
Using php.
Use parse_url:
print_r(parse_url('/something/contentimg/coast03.jpg?itok=ABC'));
Array
(
[path] => /something/contentimg/coast03.jpg
[query] => itok=ABC
)
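The same idea, splitting the URL with a parser instead of a regex, can be sketched outside PHP as well. The following uses Python's urllib.parse.urlsplit as a rough analogue of parse_url; filename_without_query is a hypothetical helper, and treating an empty query (a bare trailing ?) as "no query" is an assumption of this sketch:

```python
from posixpath import basename
from urllib.parse import urlsplit

def filename_without_query(url):
    """Return the file name, or None when the URL carries a query string."""
    parts = urlsplit(url)
    if parts.query:
        return None
    return basename(parts.path)

with_query = filename_without_query("/something/contentimg/coast03.jpg?itok=ABC")
plain = filename_without_query("/something/contentimg/coast03.jpg")

print(with_query, plain)
```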
The * quantifier is greedy, so (.*) matches everything up to the end of the input. The negative lookahead is then evaluated at the end of the input, where it cannot find a ? and therefore succeeds. The regex should be done a little differently:
/^.*\/([^?]+)$/
This expression matches one or more non-question-mark characters and then asserts that it has reached the end of the input string, which is what you want to do.
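A quick check of this expression, sketched in Python with the path from the question and the same path without its query string (the extract_filename helper is made up for illustration):

```python
import re

pattern = re.compile(r"^.*\/([^?]+)$")

def extract_filename(path):
    # Group 1 holds the part after the last slash, if no "?" follows it.
    m = pattern.search(path)
    return m.group(1) if m else None

with_query = extract_filename("/something/contentimg/coast03.jpg?itok=ABC")
plain = extract_filename("/something/contentimg/coast03.jpg")

print(with_query, plain)
```

With the query string present, no position of the slash lets [^?]+ reach the end of the string, so the whole match fails.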
^.*\/([^?]+)(?![?].+)$
Working DEMO
Your expression does not work because (.*) matches everything after the last /, so there is nothing left for the negative lookahead to inspect.
This is how it's currently matching:
.* - greedily matches up to before the last / - /something/contentimg
\/ - matches /
(.*) - matches the rest of the string - coast03.jpg?itok=ABC
(?!\?.*) - checks that the characters following don't match \?.*; since we are already at the end of the string, nothing follows, so the check trivially passes.
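The steps above can be confirmed by inspecting the capture group of the original expression. A sketch in Python, with the PHP delimiters removed:

```python
import re

path = "/something/contentimg/coast03.jpg?itok=ABC"
original = re.compile(r"^.*\/(.*)(?!\?.*)$")

m = original.search(path)
whole_match = m.group(0)
captured = m.group(1)
# The capture group ends where the string ends, so the lookahead
# is evaluated with nothing left to inspect.
group_reaches_end = m.end(1) == len(path)

print(whole_match, captured, group_reaches_end)
```

This reproduces the two array entries shown in the question.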
What you should do:
It seems like you can just check if a ? exists in the string, so try:
/^(?!.*\?)/
Or match up to the last /, then check for a ? from there:
/^(?!.*\/.*\?)/
Explanation:
You already know (?!...) is negative look-ahead, you're just not entirely sure how to use it. Wherever you put it, it tries its best to match the given pattern from that position onwards. If it succeeds, the regex doesn't match. So it might be a good idea to put this at the very beginning and try to match the rest of the string.
So the basic format for this example is:
/^(?!...).*$/
where (?!...) contains a pattern for the strings you want to exclude.
The .*$ at the end shouldn't be required, and if you want to check the entire string, remember the $ at the end of the look-ahead.
/^(?!...$)/
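Placing the negative lookahead at the very beginning can be sketched in Python as follows. The third sample path is invented to show where the two suggested patterns differ:

```python
import re

# Reject any string that contains a "?" anywhere.
no_query = re.compile(r"^(?!.*\?)")
# Reject only when a "?" appears somewhere after a "/".
no_query_after_slash = re.compile(r"^(?!.*/.*\?)")

paths = [
    "/something/contentimg/coast03.jpg",
    "/something/contentimg/coast03.jpg?itok=ABC",
    "a?b/c.jpg",
]

no_query_result = [bool(no_query.match(p)) for p in paths]
after_slash_result = [bool(no_query_after_slash.match(p)) for p in paths]

print(no_query_result)
print(after_slash_result)
```

Both patterns match the empty string at position 0 when they succeed, so they work as a yes/no filter; extracting the file name still needs a separate step.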