PowerShell - regex to get string between two strings - regex

I'm not very experienced in Regex. Can you tell me how to get a string value from between two strings?
The subject will always be in this format : //subject/some_other_stuff
I need to get the string found between // and /.
For example:
Full String = //Manhattan/Project
Output = Manhattan
Any help will be very much appreciated.

You can use a negated character class and reference capturing group #1 for your match result.
//([^/]+)/
Explanation:
// # '//'
( # group and capture to \1:
[^/]+ # any character except: '/' (1 or more times)
) # end of \1
/ # '/'
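For comparison, here is a minimal Python sketch of the same capture-group idea. The helper name between_slashes is made up for illustration; only the pattern comes from the answer above.

```python
import re

# Same pattern as above: '//' followed by one or more non-'/' characters, then '/'.
PATTERN = re.compile(r"//([^/]+)/")


def between_slashes(text):
    """Return the text between the leading '//' and the next '/', or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(between_slashes("//Manhattan/Project"))         # Manhattan
print(between_slashes("//subject/some_other_stuff"))  # subject
```

Returning None when nothing matches is a design choice of this sketch, so the caller can tell "no match" apart from an empty result.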

You could use the below regex which uses lookarounds.
(?<=\/\/)[^\/]+(?=\/)

Since the strings are always of the same format, you can simply split them on / and then retrieve the element at index 2 (the third element):
PS > $str = "//Manhattan/Project"
PS > $str.split('/')[2]
Manhattan
PS > $str = "//subject/some_other_stuff"
PS > $str.split('/')[2]
subject
PS >

Related

Powershell Regex expression to get part of a string

I would like to take part of a string to use it elsewhere. For example, I have the following strings:
Project XYZ is the project name - 20-12-11
I would like to get the value "XYZ is the project name" from the string. The word "Project" and character "-" before the number will always be there.
I think a lookaround regular expression would work here since "Project" and "-" are always there:
(?<=Project ).+?(?= -)
A lookaround is useful when you need to extract a substring without including its delimiters in the match.
Explanation:
(?<= = positive lookbehind
Project = starting string (including space)
) = closing positive lookbehind
.+? = matches anything in between (lazily, as few characters as possible)
(?= = positive lookahead
 - = ending string (space followed by a hyphen)
) = closing positive lookahead
Example in PowerShell:
Function GetProjectName($InputString) {
$regExResult = $InputString | Select-String -Pattern '(?<=Project ).+?(?= -)'
$regExResult.Matches[0].Value
}
$projectName = GetProjectName -InputString "Project XYZ is the project name - 20-12-11"
Write-Host "Result = '$($projectName)'"
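The same lookaround pattern can be tried outside PowerShell. This is a small Python sketch, assuming the input always contains "Project " and " -" as described in the question; the function name get_project_name is hypothetical.

```python
import re


def get_project_name(text):
    """Return the text between 'Project ' and the following ' -', or None."""
    # The lookbehind and lookahead assert the delimiters without consuming them,
    # so the match itself contains only the project name.
    match = re.search(r"(?<=Project ).+?(?= -)", text)
    return match.group(0) if match else None


print(get_project_name("Project XYZ is the project name - 20-12-11"))
# XYZ is the project name
```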
Here is yet another regex version. [grin] It may be easier to understand since it uses fairly basic regex patterns.
what it does ...
defines the input string
defines the prefix to match on
this will keep only what comes after it.
defines the suffix to match on
this part will keep only what is before it.
trigger the replace
the part in the () is what will be placed into the 1st capture group.
show what was kept
the code ...
$InString = 'Project XYZ is the project name - 20-12-11'
# "^" = start of string
$Prefix = '^project '
# ".+" = one or more of any character
# "$" = end of string
$Suffix = ' - .+$'
# "$1" holds the content of the 1st [and only] capture group
$OutString = $InString -replace "$Prefix(.+)$Suffix", '$1'
$OutString
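A rough Python equivalent of the replace-based approach is sketched below. One assumption is worth calling out: PowerShell's -replace is case-insensitive by default, which is why the lowercase prefix '^project ' matches "Project" above, so the sketch passes re.IGNORECASE to get the same behaviour.

```python
import re

in_string = "Project XYZ is the project name - 20-12-11"

prefix = r"^project "   # "^" anchors the prefix to the start of the string
suffix = r" - .+$"      # " - " plus everything up to the end of the string

# Keep only capture group 1, i.e. whatever sits between prefix and suffix.
out_string = re.sub(f"{prefix}(.+){suffix}", r"\1", in_string, flags=re.IGNORECASE)
print(out_string)  # XYZ is the project name
```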
# define the input string
$str = 'Project XYZ is the project name - 20-12-11'
# use regex (-match) including the .*? regex pattern
# this pattern means (.) any char, (*) any number of times, (?) lazily, i.e. as few chars as possible
# to capture (into brackets) the desired pattern substring
$str -match "Project (.*?) -"
# show result (the first capturing group)
$matches[1]

Pyspark - Regex - Extract value from last brackets

I created the following regular expressions to extract the last element in parentheses. With only one pair of parentheses it works fine, but with two pairs it either extracts the first one (which is wrong) or extracts the value with the parentheses included.
Do you know how to solve it?
from pyspark.sql.functions import col, regexp_extract
tmp = spark.createDataFrame(
[
(1, 'foo (123) oiashdj (hi)'),
(2, 'bar oiashdj (hi)'),
],
['id', 'txt']
)
tmp = tmp.withColumn("old", regexp_extract(col("txt"), r"(?<=\().+?(?=\))", 0))
tmp = tmp.withColumn("new", regexp_extract(col("txt"), r"\(([^)]+)\)?$", 0))
tmp.show()
+---+--------------------+---+----+
| id|                 txt|old| new|  needed
+---+--------------------+---+----+
|  1|foo (123) oiashdj...|123|(hi)|  hi
|  2|    bar oiashdj (hi)| hi|(hi)|  hi
+---+--------------------+---+----+
To extract the substring between parentheses with no other parentheses inside at the end of the string you may use
tmp = tmp.withColumn("new", regexp_extract(col("txt"), r"\(([^()]+)\)$", 1));
Details
\( - matches (
([^()]+) - captures into Group 1 any 1+ chars other than ( and )
\) - a ) char
$ - at the end of the string.
The 1 argument tells the regexp_extract to extract Group 1 value.
See the regex demo online.
NOTE: To allow trailing whitespace, add \s* right before $: r"\(([^()]+)\)\s*$"
NOTE2: To match the last occurrence of such a substring in a longer string, with exactly the same code as above, use
r"(?s).*\(([^()]+)\)"
The .* will grab all the text up to the end, and then backtracking will do the job.
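The two patterns can be checked without a Spark session. The sketch below uses Python's re module rather than regexp_extract, so it only illustrates the regex behaviour; the helper name last_bracketed is hypothetical, and returning an empty string on no match is chosen to mirror how regexp_extract reports a miss.

```python
import re

# Parenthesised text at the very end of the string (trailing whitespace allowed).
END_ANCHORED = re.compile(r"\(([^()]+)\)\s*$")
# Last parenthesised text anywhere: the greedy .* runs to the end, then backtracks.
LAST_ANYWHERE = re.compile(r"(?s).*\(([^()]+)\)")


def last_bracketed(text, pattern=END_ANCHORED):
    """Return Group 1 of the given pattern, or '' when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else ""


print(last_bracketed("foo (123) oiashdj (hi)"))                  # hi
print(last_bracketed("bar oiashdj (hi)"))                        # hi
print(last_bracketed("foo (123) bar (hi) tail", LAST_ANYWHERE))  # hi
```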
This should work. Use it with the single line flag.
\([^\(\)]*?\)(?!.*\([^\(\)]*?\))
https://regex101.com/r/Qrnlf3/1

Replacing STX and ETB characters, in VB6, using Regex

I'm reading COM port results using a vb6 application, and I need to replace some characters, using regex expressions.
The issue is primarily this: I'm getting a lot of unnecessary characters between the "R" and "|" characters, which I'd like to remove. For this, I'm using the replace function and regex expressions, but it's not working.
This is the code I've written in vb6:
objReg.Pattern = "R.*\|"
objReg.Global = True
x$ = objReg.Replace(Text1.Text, "R|")
Input Stream:
RDA
3|4|
which is ("R" + ETB + "DA" + STX + "3|4|")
Expected Result:
R|4|
Any help in this regard would be much appreciated, thanks!
You may use
objReg.Pattern = "R[^|]+\|"
x$ = objReg.Replace(Text1.Text, "R|")
See the regex demo
The regex will match R, then one or more chars other than | (with the [^|]+ pattern) and then a literal | char. The whole match will be replaced with R|.
You may also use capturing groups with backreferences here if you need to make any more additions to the pattern:
objReg.Pattern = "(R)[^|]+(\|)"
x$ = objReg.Replace(Text1.Text, "$1$2")
The (R) group will correspond to the $1 backreference and (\|) will correspond to $2.
See another regex demo.
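To see why the greedy pattern fails and the negated character class works, here is a small Python sketch. It assumes the input is exactly "R" + ETB + "DA" + STX + "3|4|" as described in the question, and it uses Python's re module, which may differ in details from the VBScript RegExp object used in VB6.

```python
import re

ETB, STX = "\x17", "\x02"  # control characters named in the question
raw = "R" + ETB + "DA" + STX + "3|4|"

# Greedy ".*" runs to the LAST "|", so everything after "R" is swallowed.
greedy = re.sub(r"R.*\|", "R|", raw)

# "[^|]+" cannot cross a "|", so the match stops at the FIRST "|".
negated = re.sub(r"R[^|]+\|", "R|", raw)

print(repr(greedy))   # 'R|'
print(repr(negated))  # 'R|4|'
```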

Regex check if a file has any extension

I am looking for a regex to test if a file has any extension. I define it as: a file has an extension if there are no slashes present after the last ".". The slashes are always backslashes.
I started with this regex
.*\..*[^\\]
Which translates to
.* Any char, any number of repetitions
\. Literal .
.* Any char, any number of repetitions
[^\\] Any char that is NOT in a class of [single slash]
This is my test data (excluding the ## parts, which are my comments)
\path\foo.txt ## I only want to capture this line
\pa.th\foo ## But my regex also captures this line <-- PROBLEM HERE
\path\foo ## This line is correctly filtered out
What would be a regex to do this?
Your solution is almost correct. Use this:
^.*\.[^\\]+$
Sample at rubular.
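A quick way to check the corrected pattern against the three test paths from the question is sketched below in Python. The function name has_extension is made up for this example.

```python
import re

# A literal dot followed only by non-backslash characters up to the end of the string.
HAS_EXTENSION = re.compile(r"^.*\.[^\\]+$")


def has_extension(path):
    """True if no backslash appears after the last '.' and the dot is not last."""
    return HAS_EXTENSION.match(path) is not None


for path in (r"\path\foo.txt", r"\pa.th\foo", r"\path\foo"):
    print(path, has_extension(path))
# \path\foo.txt True
# \pa.th\foo False
# \path\foo False
```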
I wouldn't use a regular expression here. I'd split on \ and . instead.
var path = '\\some\\path\\foo\\bar.htm',
hasExtension = path.split('\\').pop().split('.').length > 1;
if (hasExtension) console.log('Weee!');
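Here is a simpler function to check it. It compares the position of the last dot with the position of the last backslash.
const hasExtension = path => {
// A dot counts only if it comes after the last backslash and is not the final character.
const lastDotIndex = path.lastIndexOf('.')
const lastSlashIndex = path.lastIndexOf('\\')
return lastDotIndex > lastSlashIndex && path.length - 1 > lastDotIndex
}
if (hasExtension(path)) console.log('Sweet')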
You can also try an even simpler approach:
(\.[^\\]+)$
Details:
$ = Look from the end of string
[^\\]+ = Any character except path separator one or more time
\. = looks for <dot> character before extension

Regular Expressions: querystring parameters matching

I'm trying to learn something about regular expressions.
Here is what I'm going to match:
/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
My expression should "grab" abc123 and def456.
And now just an example about what I'm not going to match ("question mark" is missing):
/parent/child/firstparam=abc123&secondparam=def456
Well, I built the following expression:
^(?:/parent/child){1}(?:^(?:/\?|\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?
But that doesn't work.
Could you help me to understand what I'm doing wrong?
Thanks in advance.
UPDATE 1
Ok, I made other tests.
I'm trying to fix the previous version with something like this:
/parent/child(?:(?:\?|/\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?$
Let me explain my idea:
Must start with /parent/child:
/parent/child
Following group is optional
(?: ... )?
The previous optional group must start with ? or /?
(?:\?|/\?)+
Optional parameters (I grab values if specified parameters are part of querystring)
(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?
End of line
$
Any advice?
UPDATE 2
My solution must be based just on regular expressions.
Just for example, I previously wrote the following one:
/parent/child(?:[?&/]*(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*))*$
And that works pretty nice.
But it matches the following input too:
/parent/child/firstparam=abc123&secondparam=def456
How could I modify the expression in order to not match the previous string?
You didn't specify a language, so I'll just use Perl. Basically, instead of matching everything, I matched exactly what I thought you needed. Please correct me if I am wrong.
while ($subject =~ m/(?<==)\w+?(?=&|\W|$)/g) {
# matched text = $&
}
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
= # Match the character “=” literally
)
\w # Match a single character that is a “word character” (letters, digits, and underscores)
+? # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
& # Match the character “&” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
\W # Match a single character that is a “non-word character”
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
Output:
This regex will work as long as you know what your parameter names are going to be and you're sure that they won't change.
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?
Whilst regex is not the best solution for this (the above code examples will be far more efficient, as string functions are way faster than regexes) this will work if you need a regex solution with up to 3 parameters. Out of interest, why must the solution use only regex?
In any case, this regex will match the following strings:
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
It will now only match those containing query string parameters, and put them into capture groups for you.
What language are you using to process your matches?
If you are using preg_match with PHP, you can get the whole match as well as capture groups in an array with
preg_match($regex, $string, $matches);
Then you can access the whole match with $matches[0] and the rest with $matches[1], $matches[2], etc.
If you want to add additional parameters you'll also need to add them in the regex too, and add additional parts to get your data. For example, if you had
/parent/child/?secondparam=def456&firstparam=abc123&fourthparam=jkl01112&thirdparam=ghi789
The regex will become
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?
This will become a bit more tedious to maintain as you add more parameters, though.
You can optionally include ^ $ at the start and end if the multi-line flag is enabled. If you also need to match the whole lines without query strings, wrap this whole regex in a non-capture group (including ^ $) and add
|(?:^\/parent\/child\/?\??$)
to the end.
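As an alternative to repeating the parameter group once per position, the two values can be picked up regardless of order with optional lookaheads. The following is only a sketch in Python, not taken from any of the answers here: the names ROUTE and grab_params are hypothetical, and it assumes parameters are separated by & and never repeated.

```python
import re

ROUTE = re.compile(
    r"^/parent/child/?"                                  # path, optional trailing slash
    r"(?:\?"                                             # query part must begin with "?"
    r"(?=(?:(?:[^&]*&)*?firstparam=([^&]*))?)"           # group 1, if firstparam exists
    r"(?=(?:(?:[^&]*&)*?secondparam=([^&]*))?)"          # group 2, if secondparam exists
    r".*)?$"
)


def grab_params(url):
    """Return (firstparam, secondparam), or None if the URL does not match at all."""
    match = ROUTE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(grab_params("/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789"))
# ('abc123', 'def456')
print(grab_params("/parent/child"))
# (None, None)
print(grab_params("/parent/child/firstparam=abc123&secondparam=def456"))
# None
```

Each lookahead skips whole "name=value&" segments before testing the parameter name, so a name such as notfirstparam is not mistaken for firstparam. Because the query part can only start with a literal ?, the string without a question mark is rejected.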
You're not escaping the /s in your regex for starters and using {1} for a single repetition of something is unnecessary; you only use those when you want more than one repetition or a range of repetitions.
And part of what you're trying to do is simply not a good use of a regex. I'll show you an easier way to deal with that: you want to use something like split and put the information into a hash that you can check the contents of later. Because you didn't specify a language, I'm just going to use Perl for my example, but every language I know with regexes also has easy access to hashes and something like split, so this should be easy enough to port:
# I picked an example to show how this works.
my $route = '/parent/child/?first=123&second=345&third=678';
my %params; # I'm going to put those URL parameters in this hash.
# Perl has a way to let me avoid escaping the /s, but I wanted an example that
# works in other languages too.
if ($route =~ m/\/parent\/child\/\?(.*)/) { # Use the regex for this part
print "Matched route.\n";
# But NOT for this part.
my $query = $1; # $1 is a Perl thing. It contains what (.*) matched above.
my @items = split '&', $query; # Each item is something like param=123
foreach my $item (@items) {
my ($param, $value) = split '=', $item;
$params{$param} = $value; # Put the parameters in a hash for easy access.
print "$param set to $value \n";
}
}
# Now you can check the parameter values and do whatever you need to with them.
# And you can add new parameters whenever you want, etc.
if ($params{'first'} eq '123') {
# Do whatever
}
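The split-into-a-hash idea ports to other languages easily. Here is a minimal Python sketch that leans on the standard library's urllib.parse instead of hand-written splitting; the function name route_params and the prefix default are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit


def route_params(route, prefix="/parent/child"):
    """Return the query parameters as a dict, or None if the path does not match."""
    parts = urlsplit(route)
    # Accept the route with or without a trailing slash.
    if parts.path.rstrip("/") != prefix:
        return None
    # parse_qs returns a list per name; keep the first value of each parameter.
    return {name: values[0] for name, values in parse_qs(parts.query).items()}


params = route_params("/parent/child/?first=123&second=345&third=678")
print(params)  # {'first': '123', 'second': '345', 'third': '678'}

if params and params.get("first") == "123":
    pass  # Do whatever
```

Without a ? the whole string is treated as the path, so /parent/child/firstparam=abc123&secondparam=def456 returns None rather than being parsed as a query.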
My solution:
/(?:\w+/)*(?:(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?|\w+|)
Explain:
/(?:\w+/)* match /parent/child/ or /parent/
(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)? match child?firstparam=abc123 or ?firstparam=abc123 or ?
\w+ match text like child
..|) match nothing(empty)
If you need only query string, pattern would reduce such as:
/(?:\w+/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)
If you want to get every parameter from query string, this is a Ruby sample:
re = /\/(?:\w+\/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)/
s = '/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789'
if m = s.match(re)
query_str = m[1] # now, you can 100% trust this string
query_str.scan(/(\w+)=(\w+)/) do |param,value| #grab parameter
printf("%s, %s\n", param, value)
end
end
output
secondparam, def456
firstparam, abc123
thirdparam, ghi789
This script will help you.
First, I check whether the line contains a ?.
Then I drop the first part of the line (everything up to and including the ?).
Next, I split the line on &, and split each resulting piece on =.
use feature 'say'; # needed for say
my $r = q"/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789";
for my $string(split /\n/, $r){
if (index($string,'?')!=-1){
substr($string, 0, index($string,'?')+1,"");
#say "string = ".$string;
if (index($string,'=')!=-1){
my @params = map{$_ = [split /=/, $_];}split/\&/, $string;
$"="\n";
say "$_->[0] === $_->[1]" for (@params);
say "######next########";
}
else{
#print "there is no params!"
}
}
else{
#say "there is no params!";
}
}