Select first text between two expressions - regex

I want to return the first "abcd" part of the text below.
00abcd126456\x 00abcd126456\x
I want to select all text between the first " 00" and the first (6 digits + "\x"). Every string starts with " 00".
I've been experimenting with:
^ 00(.*)\d{6}\\x
but it obviously selects the whole string.
Please help.

Use a non-greedy quantifier:
^ 00(.*?)\d{6}\\x
*? matches as few characters as possible while still allowing the overall match to succeed, whereas * matches as many characters as possible.
If you don't want to fiddle around with the capturing group you can also use lookaround:
(?<=^ 00).*?(?=\d{6}\\x)
Quick PowerShell test:
PS> ' 00abcd126456\x 00abcd126456\x' -match '(?<=^ 00).*?(?=\d{6}\\x)'; $Matches
True
Name                           Value
----                           -----
0                              abcd
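To compare the three patterns side by side, here is a minimal sketch in Python rather than PowerShell (the choice of language and the variable names are assumptions made for illustration). Python's re module accepts the same lazy quantifier and lookaround syntax for these particular patterns.

```python
import re

# Sample input from the question; the raw string keeps the backslash literal.
text = r" 00abcd126456\x 00abcd126456\x"

# Greedy: .* runs to the LAST place where six digits + "\x" can still match.
greedy = re.search(r"^ 00(.*)\d{6}\\x", text).group(1)

# Lazy: .*? stops at the FIRST place where six digits + "\x" can match.
lazy = re.search(r"^ 00(.*?)\d{6}\\x", text).group(1)

# Lookaround: the whole match is the wanted text, so no capture group is needed.
lookaround = re.search(r"(?<=^ 00).*?(?=\d{6}\\x)", text).group(0)

print(greedy)      # abcd126456\x 00abcd
print(lazy)        # abcd
print(lookaround)  # abcd
```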


Regex - Match multiple instances of specific character thats not followed by numeric value

I need a regex query to match the following:
hello%20world //Won't match
hello % dog //Will match
hello %20 world //Won't match
hello %% world //Will match, but twice (it won't match "%%" as a whole; it will match the first "%" and then the second "%")
I am using regex to replace every "%" that is not followed by a number. If the string contains "%%", I also want to replace both of them with, let's say, 'A', so that I get "AA" and not "A".
My regex attempt:
%[^0-9]
https://regex101.com/r/Z9N7QJ/1
The issue is that it matches, but it also includes the next character, so in my string "Hello % world" it matches "% ". And my "%%" is matched as a pair, not as two singles.
Thanks.
You may use a negative lookahead here:
/%(?!\d)/
A lookahead is a zero-width assertion: it checks the next character without consuming it. Your regex %[^0-9], by contrast, also consumes the non-numeric character that follows the %.
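The difference between the two patterns can be seen in a small Python sketch (Python is an assumption here, since the question does not name a language; the replacement character 'A' is the one from the question).

```python
import re

samples = ["hello%20world", "hello % dog", "hello %20 world", "hello %% world"]

# Negative lookahead: match "%" only when the next character is not a digit.
# The lookahead consumes nothing, so each "%" in "%%" is matched on its own.
with_lookahead = [re.sub(r"%(?!\d)", "A", s) for s in samples]

# Original attempt: the character class consumes the character after "%".
with_char_class = [re.sub(r"%[^0-9]", "A", s) for s in samples]

print(with_lookahead)
# ['hello%20world', 'hello A dog', 'hello %20 world', 'hello AA world']
print(with_char_class)
# ['hello%20world', 'hello Adog', 'hello %20 world', 'hello A world']
```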

PowerShell Regular Expression match Y or Z

I am trying to match some strings using a regular expression in PowerShell, but I'm running into difficulty because the original strings I'm extracting from come in differing formats. Admittedly, I am not very strong at creating regular expressions.
I need to extract the numbers from each of these strings. They can vary in length, but in both cases they will be preceded by FOO.
PC1-FOO1234567
PC2-FOO1234567/FOO98765
This works for the second example:
'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'
It lets me access the matched strings using $matches[1] and $matches[2] which is great.
It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string but I'm not sure how to do this and end up with my desired match.
Suggestions?
You may use
'FOO(.*?)(?:/FOO(.*))?$'
It matches FOO, captures any zero or more characters, as few as possible, into Group 1, and then optionally matches /FOO followed by any zero or more characters, as many as possible, captured into Group 2. The end of the string must follow.
Details
FOO - literal substring
(.*?) - Group 1: any zero or more chars other than newline, as few as possible
(?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
  /FOO - a literal substring
  (.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
$ - end of string.
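As a quick check of how the optional group behaves on both inputs, here is a minimal Python sketch (Python stands in for PowerShell here as an assumption; the pattern is the one given above). When the optional /FOO part is absent, Group 2 simply does not participate in the match.

```python
import re

pattern = re.compile(r"FOO(.*?)(?:/FOO(.*))?$")

inputs = ["PC1-FOO1234567", "PC2-FOO1234567/FOO98765"]

# groups() returns (Group 1, Group 2); a group that did not take part is None.
results = [pattern.search(s).groups() for s in inputs]

print(results)
# [('1234567', None), ('1234567', '98765')]
```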
[edit - removed the unneeded pipe to Where-Object. thanks to mklement0 for that! [*grin*]]
this is a somewhat different approach. it splits on the foo, then replaces the unwanted / with nothing, and finally filters out any string that contains letters.
the pure regex solutions others offered will likely be faster, but this may be slightly easier to understand - and therefore to maintain. [grin]
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split [environment]::NewLine
$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'
output ...
1234567
1234567
98765
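The same split, strip, and filter chain can be sketched in Python (an assumption for illustration; the variable names are hypothetical). PowerShell's -split and -notmatch are case-insensitive by default, so the sketch passes re.IGNORECASE to mimic that.

```python
import re

lines = ["PC1-FOO1234567", "PC2-FOO1234567/FOO98765"]

# Split each line on "foo" (ignoring case), then drop "/" from every piece.
pieces = [
    piece.replace("/", "")
    for line in lines
    for piece in re.split("foo", line, flags=re.IGNORECASE)
]

# Keep only the pieces that contain no letters.
numbers = [p for p in pieces if not re.search("[a-z]", p, flags=re.IGNORECASE)]

print(pieces)   # ['PC1-', '1234567', 'PC2-', '1234567', '98765']
print(numbers)  # ['1234567', '1234567', '98765']
```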
To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:
PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568 # single match from 1st input string
1234567 # first of 2 matches from 2nd input string
98765
Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.
^PC\d+-|/ matches either PC followed by one or more (+) digits (\d) and a - at the start of the string (^), or (|) a / character. Followed by FOO, this matches both PC2-FOO at the beginning and /FOO.
(?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
-ne '' filters out the empty elements that result from the input strings starting with a separator.
To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.
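Python's re.split treats groups in the same way, so the effect of the non-capturing group can be sketched there (Python is an assumption for illustration; the variable names are hypothetical).

```python
import re

inputs = ["PC1-FOO1234568", "PC2-FOO1234567/FOO98765"]

# Non-capturing separator: only the text between separators is returned.
# Empty strings (from inputs that start with a separator) are filtered out.
separator = r"(?:^PC\d+-|/)FOO"
numbers = [part for s in inputs for part in re.split(separator, s) if part != ""]

# With a capturing group, whatever the group matched is included in the result.
with_capture = re.split(r"(^PC\d+-|/)FOO", "PC2-FOO1234567/FOO98765")

print(numbers)       # ['1234568', '1234567', '98765']
print(with_capture)  # ['', 'PC2-', '1234567', '/', '98765']
```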

Regular Expression to Anonymize Names

I am using Notepad++ and the Find and Replace pattern with regular expressions to alter usernames such that only the first and last character of the screen name is shown, separated by exactly four asterisks (*). For example, "albobz" would become "a****z".
Usernames are listed directly after the cue "screen_name: " and I know I can find all the usernames using the regular expression:
screen_name:\s([^\s]+)
However, this expression won't store the first or last letter and I am not sure how to do it.
Here is a sample line:
February 3, 2018 screen_name: FR33Q location: Europe verified: false lang: en
Method 1
You have to work with \G meta-character. In N++ using \G is kinda tricky.
Regex to find:
(?>(screen_name:\s+\S)|\G(?!^))\S(?=\S)
Breakdown:
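(?>                      Open an atomic (non-capturing) group
  (                      Open capturing group 1
    screen_name:\s+\S    Match the cue, the whitespace, and the first character of the name
  )                      Close capturing group 1
  |                      Or
  \G(?!^)                Continue from where the previous match ended (but not at the start of the text)
)                        Close the atomic group
\S                       Match one non-whitespace character
(?=\S)                   Only if another non-whitespace character follows, so the last character is kept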
Replace with:
\1*
Method 2
The solution above substitutes each inner character with a *, so the length stays intact. If you want exactly four *s regardless of the name's length, search for:
(screen_name:\s+\S)(\S*)(\S)
and replace with: \1****\3
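Outside Notepad++, Method 2 can be tried in a few lines of Python (an assumption for illustration). Python's built-in re module has no \G anchor, so the length-preserving variant below is a stand-in that uses a replacement function instead of the Method 1 pattern; mask_inner is a hypothetical helper name.

```python
import re

line = "February 3, 2018 screen_name: FR33Q location: Europe verified: false lang: en"
pattern = r"(screen_name:\s+\S)(\S*)(\S)"

# Method 2: always exactly four asterisks between the first and last character.
fixed = re.sub(pattern, r"\1****\3", line)

# Length-preserving stand-in for Method 1: one asterisk per inner character.
def mask_inner(match):
    return match.group(1) + "*" * len(match.group(2)) + match.group(3)

preserved = re.sub(pattern, mask_inner, line)

print(fixed)      # ... screen_name: F****Q location: ...
print(preserved)  # ... screen_name: F***Q location: ...
```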

Regex expression for digit followed by dot (.)

I want to find text consisting of a digit followed by a dot and replace it with the same text (the digit and the dot) followed by the string "xyz".
For ex.
1. This is a sample
2. test
3. string
I want to change it to
1.xyz This is a sample
2.xyz test
3.xyz string
I learnt how to find the matching text (\d.), but the challenge is working out the "Replace with" text.
I'm using the Notepad++ editor for this. Can anyone suggest the "Replace with" string?
First of all, you need to escape the dot since it means "match anything (except newline depending if the s modifier is set)": (\d\.).
Second, you need to add a quantifier in case you have a 2 digit number or more: (\d+\.).
Third, we don't need group 1 in this case: \d+\..
In the replacement, it's quite simple: just use $0xyz. $0 will refer to group 0 which is the whole match.
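The three steps can be checked with a short Python sketch (an assumption for illustration; in Python's replacement syntax the whole match is written \g<0> rather than $0). The "10. more" line is a hypothetical addition to exercise the multi-digit case.

```python
import re

text = "1. This is a sample\n2. test\n3. string\n10. more"

# Escaped dot plus a + quantifier; \g<0> re-inserts the whole match.
result = re.sub(r"\d+\.", r"\g<0>xyz", text)

# Why the dot must be escaped: unescaped, it matches any character.
unescaped = re.findall(r"\d.", "12 3.")

print(result)
# 1.xyz This is a sample
# 2.xyz test
# 3.xyz string
# 10.xyz more
print(unescaped)  # ['12', '3.']
```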
For notepad++...
You must escape the period/dot character in the expression - precede it with a backslash:
\.
In my case, I needed to find all instances of "{EnvironmentName}.api.mycompany.com"
(dev.api.mycompany.com, stage.api.mycompany.com, prod.api.mycompany, etc.)
I used this search expression:
.*\.api.mycompany.com
I think the right answer is as follows:
Find: ^(\d)([.])(\s)
Replace: $1$2XYZ$3
That works for "n. " where n is a single digit [0-9]; $3 puts back the whitespace that the pattern consumed. If the input may contain numbers of different lengths (10, 100, 1000, ...), multiple dots after the number, or multiple spaces after the dot, then the answer is:
Find: ^(\d+)([.])([.]*)(\s*)
Replace: $1$2XYZ$4
Input:
1. This is a sample
2. test
3. string
30. string
10..... string
50005... string
Output:
1.XYZ This is a sample
2.XYZ test
3.XYZ string
30.XYZ string
10.XYZ string
50005.XYZ string
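A minimal Python sketch of this anchored variant follows (an assumption for illustration). It deviates from the Notepad++ pattern in two labeled ways: it requires at least one digit, and it captures only spaces and tabs after the dots, so that in multi-line mode the pattern cannot swallow a line break. The captured whitespace is re-inserted so the space after the marker survives.

```python
import re

text = "\n".join([
    "1. This is a sample",
    "2. test",
    "3. string",
    "30. string",
    "10..... string",
    "50005... string",
])

# Group 1: the number, group 2: the first dot, then any extra dots (dropped),
# group 3: the spaces/tabs that follow (kept).
pattern = re.compile(r"^(\d+)([.])[.]*([ \t]*)", flags=re.MULTILINE)
result = pattern.sub(r"\1\2XYZ\3", text)

print(result)
# 1.XYZ This is a sample
# 2.XYZ test
# 3.XYZ string
# 30.XYZ string
# 10.XYZ string
# 50005.XYZ string
```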

Match a number in a string with letters and numbers

I need to write a Perl regex to match numbers in a word with both letters and numbers.
Example: test123. I want to write a regex that matches only the number part and capture it
I am trying this \S*(\d+)\S* and it captures only the 3 but not 123.
Quantified regex atoms are greedy: they will match as much as they can.
Initially, the first \S* matched "test123", but the regex engine had to backtrack to allow \d+ to match. The result is:
 +------------------- Matches "test12"
 |     +-------------- Matches "3"
 |     |     +--------- Matches ""
 |     |     |
---  -----  ---
\S*  (\d+)  \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It'll try to match at position 0, then position 1, ... until it finds a digit, then it will match as many digits as it can.
The * quantifiers in your regex are greedy; that's why they also "eat" digits. Exactly as @Marc said, you don't need them.
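The backtracking described above can be observed directly. This is a minimal sketch in Python rather than Perl (an assumption for illustration; both engines are greedy and backtrack in the same way for these patterns).

```python
import re

word = "test123"

# Capture all three parts to see how the greedy \S* splits the word.
parts = re.search(r"(\S*)(\d+)(\S*)", word).groups()

# The original pattern: the leading \S* leaves only one digit for \d+.
with_prefix = re.search(r"\S*(\d+)\S*", word).group(1)

# Without the surrounding \S*, the search finds the first digit and takes them all.
digits_only = re.search(r"(\d+)", word).group(1)

print(parts)        # ('test12', '3', '')
print(with_prefix)  # 3
print(digits_only)  # 123
```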
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
"something122320" =~ /(\d+)/ will return 122320; this is probably what you're trying to do ;)
\S matches any non-whitespace characters, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Were it a case where a non-digit was required (say before, per your example), you could use the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
(The second one is more in tune with your \S* specification.)
Your expression matches any string that contains one or more digits, and that may be what you want. The digits could even be surrounded by spaces (" 123 "): the zero-or-more non-space part is satisfied by the empty string at the boundary between the leading space and the first digit, and the same is true at the boundary between the '3' and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
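A short Python sketch of the lazy-prefix patterns (Python instead of Perl is an assumption for illustration) shows both the intended result and one caveat: \w and \S also match digits, and +? must consume at least one character, so when the string starts with digits the first digit goes to the prefix. The "123 apples" input is a hypothetical example.

```python
import re

# Lazy prefix: give up characters as early as possible so \d+ gets every digit.
lazy_word = re.search(r"\w+?(\d+)", "test123").group(1)
lazy_nonspace = re.search(r"\S+?(\d+)", "test123").group(1)

# Caveat: the prefix needs one character, and a digit qualifies.
digits_first = re.search(r"\w+?(\d+)", "123 apples").group(1)

print(lazy_word)      # 123
print(lazy_nonspace)  # 123
print(digits_first)   # 23
```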
I think parentheses signify capture groups, which is exactly what you don't want. Remove them. You're looking for /\d+/ or /[0-9]+/