Matching strings with and without escape characters with RegEx

I have various distinguished names (DNs) from Active Directory objects and need to remove escape characters when splitting those DNs into simple names.
I already have a PowerShell -split in place, but it does not remove the escape characters. I've tried a regex with a positive lookbehind, but in this case I seem to need something like an optional positive lookbehind. Maybe I'm just overthinking this.
String examples:
OU=External,OU=T1,OU=\+TE,DC=test,DC=dir
OU=\#External,OU=T1,OU=\+TE,DC=test,DC=dir
OU=\+External,OU=T1,OU=\+TE,DC=test,DC=dir
Because + and # are escaped but are part of the objects' actual names, I need to remove the escape characters.
With the following PowerShell, it is possible to get the name of the object:
($variable -split ',*..=')[1]
Actual Result:
External
\#External
\+External
Expected Result:
External
#External
+External
It is possible to use a regex with $variable -creplace "REGEX", but I can't find one that fits all of these cases.
My attempt was (?<=OU=\\).+?(?=,OU=), but it only matches if the \ is present.
I need this name for the object creation inside Active Directory.

With minimal change, you could just make the backslash optional in your current regex. You already do something similar with the leading comma:
"OU=\#External,OU=T1,OU=\+TE,DC=test,DC=dir" -split ',?..=\\?'
You could take that further if you only need the first section, but that answers your basic question. There are likely other efficiencies to be gained, but they are probably not worth it.
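Applied to all three sample DNs, that split yields the expected names. A minimal PowerShell sketch (the $dns and $names variable names are just for illustration):

```powershell
$dns = 'OU=External,OU=T1,OU=\+TE,DC=test,DC=dir',
       'OU=\#External,OU=T1,OU=\+TE,DC=test,DC=dir',
       'OU=\+External,OU=T1,OU=\+TE,DC=test,DC=dir'

# ',?'  = optional comma, '..=' = any two characters (the attribute type, e.g. OU or DC) plus '=',
# '\\?' = optional literal backslash, so an escaping '\' right after '=' is consumed too.
# Element [0] is the empty string before the first 'OU=', so [1] is the first OU name.
$names = foreach ($dn in $dns) { ($dn -split ',?..=\\?')[1] }
$names
```

Note that this only removes a backslash that directly follows the '='; escapes elsewhere in the value are left alone.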

For extracting the first OU name from a DN while removing an optional leading backslash at the same time, you can use a regular expression like this:
OU=\\?(.*?), *..=.*$
Demonstration:
$dn1 = 'OU=External,OU=T1,OU=\+TE,DC=test,DC=dir'
$dn2 = 'OU=\#External,OU=T1,OU=\+TE,DC=test,DC=dir'
$dn3 = 'OU=\+External,OU=T1,OU=\+TE,DC=test,DC=dir'
$dn1 -replace 'OU=\\?(.*?), *..=.*$', '$1' # output: External
$dn2 -replace 'OU=\\?(.*?), *..=.*$', '$1' # output: #External
$dn3 -replace 'OU=\\?(.*?), *..=.*$', '$1' # output: +External
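Both answers only strip a backslash that directly follows OU=. If escaped characters can also occur elsewhere in the value (for example the escaped comma in CN=Marley\, Bob), a more general approach is to capture the first RDN value up to the first unescaped comma and then drop every escaping backslash. This is a sketch, not an established cmdlet: Get-FirstRdnValue is a hypothetical name, and it assumes only backslash-plus-one-character escapes (hex escapes such as \2C are not decoded).

```powershell
# Hypothetical helper: returns the unescaped value of the first RDN of a DN.
# Assumption: only "\" + single-character escapes occur (e.g. \, \+ \#);
# hex escapes such as \2C are NOT decoded.
function Get-FirstRdnValue {
    param([Parameter(Mandatory)] [string] $DistinguishedName)

    # After the first '=', capture a run of units that are either an escape
    # pair (\ plus any char) or a char that is neither ',' nor '\'.
    # The run therefore ends at the first unescaped comma.
    if ($DistinguishedName -match '^[^=]+=((?:\\.|[^,\\])*)') {
        # Remove the backslash of every escape pair, keeping the escaped char.
        return $Matches[1] -replace '\\(.)', '$1'
    }
}

'OU=External,OU=T1,OU=\+TE,DC=test,DC=dir',
'OU=\#External,OU=T1,OU=\+TE,DC=test,DC=dir',
'OU=\+External,OU=T1,OU=\+TE,DC=test,DC=dir' | ForEach-Object { Get-FirstRdnValue $_ }
```

Because escaped commas are skipped while scanning, the same helper returns Marley, Bob for CN=Marley\, Bob,OU=Users,... instead of cutting the name at the escaped comma.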

Related

How to add a line-break and a back reference in Powershell [duplicate]

I am completely clueless about how to use regex and need some help with the problem above. I need to replace the < and > delimiters with new lines but keep the string between them. So
<'sample text'><'sample text 2'>
becomes
'sample text'
'sample text 2'

\<([^>]*)\>
This regex captures the text between < and > in a capture group, which you can then reference in the replacement string, followed by a newline:
\1\n
EDIT:
In PowerShell:
PS C:\Users\shtabriz> $string = "<'sample text'><'sample text 2'>"
PS C:\Users\shtabriz> $regex = "\<([^>]*)\>"
PS C:\Users\shtabriz> [regex]::Replace($string, $regex, '$1'+"`n")
'sample text'
'sample text 2'

This works for me in Textpad:
Example:
String:
" 1) Navigate to record. 2) Navigate to the tab and select. 3) Click the field. 4) Click on the tab and scroll."
Note: For the search/replace below, do NOT include the quotes; I used them to show the presence of a space in the search term.
Search: "[0-9]+) "
Replace: "\n$0"
Resulting string:
1) Navigate to record.
2) Navigate to the tab and select.
3) Click the field.
4) Click on the tab and scroll.
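For comparison, a PowerShell counterpart of that TextPad search/replace. This is a sketch assuming .NET regex semantics, where $0 in the substitution string is the entire match; TextPad's own syntax is not reproduced here.

```powershell
$text = ' 1) Navigate to record. 2) Navigate to the tab and select. 3) Click the field. 4) Click on the tab and scroll.'

# .NET regex treats an unescaped ')' as closing a group, so it must be escaped as '\)'.
# '$0' (kept literal by the single quotes) is the whole match ("1) ", "2) ", ...),
# so each step number is preserved and preceded by a newline.
$result = $text -replace '\d+\) ', ("`n" + '$0')

# Drop the leading space/newline, then the space left at the end of each line.
$lines = $result.Trim() -split "`n" | ForEach-Object { $_.TrimEnd() }
$lines
```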

To complement Shawn Tabrizi's helpful answer with a more PowerShell-idiomatic solution and some background information:
PowerShell surfaces the functionality of the .NET System.Text.RegularExpressions.Regex.Replace() method ([regex]::Replace(), from PowerShell) via its own -replace operator.
The most concise solution (but see below for potential pitfalls):
# Note the escaped "$" ("`$")
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', "`$1`n"
Output:
'sample text'
'sample text 2'
$1 is a numbered capture-group substitution, referring to what the 1st (and only) capture group inside the regex ((...)) captured, which are the strings between < and > (.*? is a non-greedy expression that matches any run of characters but stops once the next construct, > in this case, is found).
However, inside a double-quoted string ("..."), also known as an expandable string, $1 would be interpreted as a PowerShell variable reference, so the $ character must be escaped in order to be preserved, using the backtick (`), PowerShell's general escape character: "`$1"
Conversely, if you want the .NET API not to interpret a $ character in the substitution string, use $$ (either $$ inside '...', or "`$`$" inside "...") - but note that inside the regex operand a verbatim $ must be escaped as \$.
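A short illustration of those escaping rules, using made-up sample strings (the results in the comments follow from the rules just described):

```powershell
# Substitution operand in single quotes: '$$' -> a literal '$', '$1' -> group 1.
$a = '42 USD' -replace '(\d+) USD', '$$$1'          # '$42'

# Regex operand: a literal '$' must be escaped as '\$'.
$b = 'costs $5' -replace '\$(\d+)', '${1} dollars'  # 'costs 5 dollars'

# Same as $a, but inside "..." every '$' also needs PowerShell's backtick escape,
# so the .NET API again receives the substitution string '$$$1'.
$c = '42 USD' -replace '(\d+) USD', "`$`$`$1"       # '$42'

$a, $b, $c
```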
"`n" is a PowerShell escape sequence that can be used inside expandable strings (only) - see the conceptual about_Special_Characters help topic.
Caveat:
While convenient here, there are pitfalls with respect to using expandable strings as the regexes and substitution operands, as it isn't always obvious what PowerShell expands (interpolates) up front, and what the .NET API ends up seeing as a result.
Therefore, it is generally preferable to use single-quoted strings ('...', also known as verbatim strings) - both for the substitution operand and the regex itself, and - if needed - use an expression ((...)) to build the overall string, which allows you to separate the verbatim (pass-through) parts from interpolated parts.
This is what Shawn did in his answer; translated to a -replace operation:
# Note the expression used to build the substitution string
# from a verbatim ('...') and an interpolated ("...") part.
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ('${1}' + "`n")
Another option, using -f, the format operator:
"<'sample text'><'sample text 2'>" -replace '<(.*?)>', ("{0}`n" -f '${1}')
Note the use of ${1} instead of just $1: Enclosing the number / name of the referenced capture group in {...} disambiguates it from the characters that follow, which avoids another pitfall, as the following example shows (incidentally, PowerShell's own variable references can be disambiguated the same way):
# FAILS and results in 'f$142', because the .NET API sees
# '$142' as the substitution string, and there is no 142nd capture group.
$suffix = '42'; 'foo' -replace '(oo)', ('$1' + $suffix)
# OK, with disambiguation via {...} -> 'foo42'
$suffix = '42'; 'foo' -replace '(oo)', ('${1}' + $suffix)

Regex in PowerShell to get the city name from the Managedby property in Active Directory

Can anyone help me with this? I need to derive a city name from the "managedby" attribute in Active Directory, which looks like this:
CN=Marley\, Bob,OU=Users,OU=PARIS,DC=Domain,DC=com
So I need to strip everything else out and be left with "PARIS".
I really don't know enough about regex, but I assume it's going to involve using -replace in some way. I have tried following some examples on the web, but I just get lost. I can remove all special characters using:
'CN=Marley\, Bob,OU=Users,OU=PARIS,DC=Domain,DC=com' -replace '[\W]', ''
But I have no idea how to clean that up further.
Any help would be greatly appreciated.

Actually, you don't need a regex for that. If the structure of the distinguished name is always the same, you can use nested -split operations, like this:
(('CN=Marley\, Bob,OU=Users,OU=PARIS,DC=Domain,DC=com' -split '=')[3] -split ',')[0]
or this:
(('CN=Marley\, Bob,OU=Users,OU=PARIS,DC=Domain,DC=com' -split ',')[-3] -split '=')[1]
I'd recommend the second version, because counting from the end ([-3]) avoids confusion caused by commas in the CN part of the distinguished name. ;-)
If you'd rather do it with a regex anyway, you can use lookarounds to extract what's between the Users OU and the domain, like this:
'CN=Marley\, Bob,OU=Users,OU=PARIS,DC=Domain,DC=com' -match '(?<=Users,OU=).+(?=,DC=Domain)'
$Matches[0]

The following is a -replace-based solution that assumes that the city name follows the last ,OU= in the input string (though it wouldn't be hard to make the regex more specific).
It also supports city names with escaped , characters (\,), such as PARIS\, Texas.
$str = 'CN=Marley\, Bob,OU=Users,OU=PARIS\, Texas,DC=Domain,DC=com'
# -> 'PARIS, Texas'
$str -replace '.+,OU=(.+?),DC=.+', '$1' -replace '\\,', ','
.+,OU= greedily matches one or more (+) arbitrary characters (.) up to the last ,OU= substring in the input string.
(.+?) matches one or more subsequent characters non-greedily (+?), via a capture group (capturing subexpression, (...)).
,DC=.+ matches the next occurrence of the substring ,DC= followed by whatever is left in the string (.+).
Note that this means that the regex matches the entire string, so that the value of the substitution expression, $1, is the only thing returned:
$1 refers to the value of the 1st capture group, which contains the city name.
The second -replace operation unescapes the \,, i.e. turns it into , - note how the literal \ to replace had to be escaped as \\ in the regex.
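If you'd rather not rely on one regex to locate the city, an alternative is to split the DN only at unescaped commas and then pick the OU components. This sketch keeps the same assumption as above (the city is the last OU); note that the negative lookbehind would misread an escaped backslash sitting directly before a separator comma (\\,), a case this simple version ignores.

```powershell
$dn = 'CN=Marley\, Bob,OU=Users,OU=PARIS\, Texas,DC=Domain,DC=com'

# Split only at commas NOT preceded by a backslash (negative lookbehind),
# so escaped commas such as '\,' stay inside their component.
$rdns = $dn -split '(?<!\\),'

# Keep the OU components, drop the 'OU=' prefix and unescape '\,'.
# @(...) guarantees an array even if there is only one OU.
$ous = @($rdns | Where-Object { $_ -like 'OU=*' } |
         ForEach-Object { ($_ -replace '^OU=', '') -replace '\\,', ',' })

# Assumption carried over from the answer above: the city is the last OU.
$city = $ous[-1]
$city
```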

Replace Wildcard Value in String with PowerShell

I am trying to remove a known string and an unknown number from a string using PowerShell's -replace operator, and can't quite figure out the syntax for a wildcard.
My input string looks like this:
MyCatalog_AB_24.xml
However, the number is dynamic and won't always be 24.
And I need to wind up with:
MyCatalog.xml
So, I need to remove anything between MyCatalog and .xml (essentially the _AB_## part).
Here are the commands I've tried:
$_ -replace 'MyCatalog_AB_*.xml', 'MyCatalog.xml'
$_ -replace 'MyCatalog*.xml', 'MyCatalog.xml'
set num='\d'
$_ -replace 'MyCatalog_AB_%num%.xml', 'MyCatalog.xml'
I know I should be using some sort of regular expression, but I have some working code that someone else wrote that does something similar by just inserting an * where the wildcard data is.
Any help would be appreciated.

You could use:
$_ -replace 'MyCatalog_AB_\d+\.xml', 'MyCatalog.xml'
\d+ matches one or more digits, and \. matches a literal dot.
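The attempts in the question fail because * means something different in a regex than in a wildcard pattern. A minimal PowerShell comparison (variable names are illustrative):

```powershell
$name = 'MyCatalog_AB_24.xml'

# In a regex, '*' is a quantifier ("zero or more of the preceding item"), not a
# wildcard: 'MyCatalog_AB_*.xml' means "MyCatalog_AB", zero or more '_', any one
# character, then "xml". That never matches "_24.xml", so nothing is replaced.
$unchanged = $name -replace 'MyCatalog_AB_*.xml', 'MyCatalog.xml'

# The regex counterpart of the wildcard '*' is '.*' (any run of characters);
# the dot before "xml" is escaped so it only matches a literal '.'.
$anyRun = $name -replace 'MyCatalog_AB_.*\.xml', 'MyCatalog.xml'

# -like does interpret '*' as a wildcard, but it only tests; it never replaces.
$isMatch = $name -like 'MyCatalog_AB_*.xml'

$unchanged, $anyRun, $isMatch
```

The \d+ version from the answer is stricter than .*, since it only accepts digits between _AB_ and .xml.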

How to filter unwanted parts of a PowerShell string with Regex and replace?

I am confused about the workings of PowerShell's -replace operator in regards to its use with regex. I've looked for documentation online but can't find any that goes into more detail than basic use: it looks for a string, and replaces that string with either another string (if defined) or nothing. Great.
I want to do the same thing as the person in this question where the user wants to extract a simple program name from a complex string. Here is the code that I am trying to replicate:
$string = '% O0033(SUB RAD MSD 50R III) G91G1X-6.4Z-2.F500 G3I6.4Z-8.G3I6.4 G3R3.2X6.4F500 G91G0Z5. G91G1X-10.4 G3I10.4 G3R5.2X10.4 G90G0Z2. M99 %'
$program = $string -replace '^%\sO\d{4}\((.+?)\).+$','$1'
$program
SUB RAD MSD 50R III
As you can see, the output string is the string that the user wants, and everything else is filtered out. The only difference for me is that I want a string that is composed of six digits and nothing else. However, when I attempt to do it on a string with my regex, I get this:
$string2 = '1_123456_1'
$program2 = $string2 -replace '(\d{6})','$1'
$program2
1_123456_1
There is no change. Why is this happening? What should my code be instead? Furthermore, what is the $1 used for in the code?

The -replace operator only replaces the part of the string that matches. A capture group matches some subset of the match (or all of it), and the capture group can be referenced in the replace string as you've seen.
Your second example only ever matches that part you want to extract. So you need to ensure that you match the whole string but only capture the part you want to keep, then make the replacement string match your capture:
$string2 = '1_123456_1'
$program2 = $string2 -replace '\d_(\d{6})_\d','$1'
$program2
How you match "the rest of the string" is up to you; it depends on what could be contained in it. So what I did above is just one possible way. Other possible patterns:
1_(\d{6})_1
[^_]*_(\d{6})_[^_]*
^.*?(\d{6}).*?$
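A quick PowerShell check that each of these patterns captures only the six digits, plus the case where nothing matches (the sample strings are illustrative):

```powershell
$string2  = '1_123456_1'
$patterns = '\d_(\d{6})_\d', '1_(\d{6})_1', '[^_]*_(\d{6})_[^_]*', '^.*?(\d{6}).*?$'

# Each pattern matches the WHOLE string but captures only the six digits,
# so replacing the match with '$1' leaves just the captured part.
$results = foreach ($p in $patterns) { $string2 -replace $p, '$1' }
$results

# Caveat: if a pattern does not match at all, -replace returns the input unchanged.
$noMatch = 'no digits here' -replace '^.*?(\d{6}).*?$', '$1'
$noMatch
```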

Capturing groups (pairs of unescaped parentheses) in the pattern are used to allow easy access to parts of a match. When you use -replace on a string, all non-overlapping substrings are matched, and these substrings are replaced/removed.
In your case, -replace '(\d{6})', '$1' means you replace the whole match (that is equal to the first capture, since you enclosed the whole pattern with a capturing group) with itself.
Use -match in cases like yours when you want to get a part of the string:
PS> $string2 = '1_123456_1'
PS> $string2 -match '[0-9]{6}'
True
PS> $Matches[0]
123456
-match gets you the first match, which is just what you want.
Use -replace when you need to get a modified string back (reformatting a string, inserting/removing chars and suchlike).
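-match stops after the first match. If the input could contain several six-digit runs and you need all of them, the [regex]::Matches() method of the same .NET Regex class returns every non-overlapping match. A sketch with a hypothetical input:

```powershell
$s = 'A_111111_B_222222'

# -match stops at the first match and stores it in the automatic $Matches variable.
if ($s -match '\d{6}') { $first = $Matches[0] }

# [regex]::Matches() returns a collection of all non-overlapping matches.
$all = [regex]::Matches($s, '\d{6}') | ForEach-Object { $_.Value }

$first
$all
```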

Regex to find text between second and third slashes

I would like to capture the text that occurs after the second slash and before the third slash in a string. Example:
/ipaddress/databasename/
I need to capture only the database name. The database name might have letters, numbers, and underscores. Thanks.

How you access it depends on your language, but you'll basically just want a capture group for whatever falls between your second and third "/". Assuming your string is always in the same form as your example, this will be:
/.*/(.*)/
If more slashes can follow, but a slash can never appear in the database name, make the first quantifier lazy as well, so that it cannot run past the second slash:
/.*?/(.*?)/

/.*?/(.*?)/
In the event that your lines always have / at the end of the line:
([^/]*)/$
Alternate split method:
split("/")[2]
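The lazy .*? before the capture group is what keeps the match on the second segment when more slashes follow. A minimal PowerShell comparison on a hypothetical path with an extra segment:

```powershell
# Hypothetical path with an extra segment after the database name.
$path = '/ipaddress/databasename/more/'

# Greedy first '.*' runs as far right as it can, so the capture lands on the LAST segment.
if ($path -match '/.*/(.*?)/')  { $greedy = $Matches[1] }

# Lazy first '.*?' stops at the second slash, so the capture is the second segment.
if ($path -match '/.*?/(.*?)/') { $lazy = $Matches[1] }

$greedy, $lazy
```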

The regex would be:
/[^/]*/([^/]*)/
so in Perl, the regex capture statement would be something like:
($database) = $text =~ m!/[^/]*/([^/]*)/!;
Normally the / character is used to delimit regexes, but since slashes are part of the pattern here, another delimiter (! above) can be used. Alternatively, the / characters can be escaped:
($database) = $text =~ /\/[^\/]*\/([^\/]*)\//;

You can shorten the pattern even more like this:
[^/]+/(\w+)
Here, \w matches word characters such as A-Z, a-z, 0-9 and _.
I would suggest preferring a split function wherever it can be used, since in my experience splitting performs better than regex functions.

You can use the explode function in PHP, or split in other languages, to do such an operation.
Anyway, here is a regex pattern:
/[\/]*[^\/]+[\/]([^\/]+)/

I know you specifically asked for regex, but you don't really need regex for this. You simply need to split the string by delimiters (in this case a forward slash), then choose the part you need (in this case, the 3rd field; the first field is empty).
cut example:
cut -d '/' -f 3 <<< "$string"
awk example:
awk -F '/' '{print $3}' <<< "$string"
perl expression, using split function:
(split '/', $string)[2]
etc.
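For completeness, the same two approaches (regex capture and split) in PowerShell, the language of the other questions on this page. This is a sketch assuming the string always starts with a slash:

```powershell
$string = '/ipaddress/databasename/'

# Regex: [^/]* cannot run past a slash, so group 1 is always the second segment.
if ($string -match '^/[^/]*/([^/]*)/') { $fromRegex = $Matches[1] }

# Split: element [0] is the empty string before the leading slash,
# [1] is 'ipaddress', [2] is 'databasename'.
$fromSplit = ($string -split '/')[2]

$fromRegex, $fromSplit
```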

We Keep Coding
