Regex That Pulls Certain Bits From a String - regex

I am trying to work with a regular expression. The string I have is
Successfully created package 'C:\Users\mhopper\Documents\CreateNugetPackage\AjaxControlToolkit.3.5.50401.nupkg'
I am trying to write a regular expression that pulls out "Successfully" and "C:\Users\mhopper\Documents\CreateNugetPackage\AjaxControlToolkit.3.5.50401.nupkg"
I haven't used regular expressions much and what I'm doing isn't working. What I have so far is
'.*(Successfully\.*\C\D+).*', '$1'

Regex:
^(\S*)\s.*'(.*)'$
#1 match is the status
#2 match is the path
https://regex101.com/r/eC7vX1/1
Powershell:
$line = "Successfully created package 'C:\Users\mhopper\Documents\CreateNugetPackage\AjaxControlToolkit.3.5.50401.nupkg'"
$values = $line -split "^(\S*)\s.*'(.*)'$"
$status = $values[1]
$path = $values[2]
("status:{0}`npath:{1}" -f $status,$path)
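For comparison, here is a minimal Python sketch of the same pattern, in case it helps to see the two capture groups outside PowerShell. It assumes the line always has the shape shown in the question (one status word, then a single-quoted path at the end); the variable names are just for illustration.

```python
import re

line = ("Successfully created package "
        r"'C:\Users\mhopper\Documents\CreateNugetPackage\AjaxControlToolkit.3.5.50401.nupkg'")

# Group 1: first run of non-whitespace (the status).
# Group 2: everything between the single quotes at the end of the line (the path).
pattern = re.compile(r"^(\S*)\s.*'(.*)'$")

m = pattern.match(line)
status, path = (m.group(1), m.group(2)) if m else (None, None)

print(f"status:{status}\npath:{path}")
```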

You need to be a little more specific about why you need the regex.
Just getting those two values from the string doesn't really need a regex.
$Status,$PackagePath = ($String.Trim().Split(' ',4))[0,3]
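The same split-based idea, sketched in Python for comparison. It assumes the first three words never contain extra spaces, and it additionally strips the surrounding single quotes from the path, which the one-liner above does not do.

```python
line = ("Successfully created package "
        r"'C:\Users\mhopper\Documents\CreateNugetPackage\AjaxControlToolkit.3.5.50401.nupkg'")

# Split on spaces at most 3 times, so the fourth piece keeps any spaces in the path.
parts = line.strip().split(" ", 3)
status = parts[0]
package_path = parts[3].strip("'")  # drop the quotes around the path

print(status)
print(package_path)
```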

Related

Extracting url from a string with regex and Powershell

I'm using powershell and regex. I'm scraping a web page result to a variable, but I can't seem to extract a generated url from that variable.
this is the content (the actual url varies):
"https://api16-something-c-text.sitename.com/aweme/v2/going/?video_id=v12044gd0666c8ohtdbc77u5ov2cqqd0&
$reg = "([^&]*)&;$" always returns false.
I've been trying -match and Select-String with regex but I'm in need of guidance.
I suggest using a -replace operation:
$str = '"https://api16-something-c-text.sitename.com/aweme/v2/going/?video_id=v12044gd0666c8ohtdbc77u5ov2cqqd0&'
$str -replace '^"(.+)&$', '$1'
It really depends on what format the content is in.
(?<=") looks behind for a double quote, (.*?) lazily matches any number of non-newline characters, and (?=&) looks ahead for "&".
Here's a fair start:
$pattern = '(?<=")(.*?)(?=&)'
$someText = '"https://api16-something-c-text.sitename.com/aweme/v2/going/?video_id=v12044gd0666c8ohtdbc77u5ov2cqqd0&'
$newText = [regex]::match($someText, $pattern)
$newText.Value
Returns:
https://api16-something-c-text.sitename.com/aweme/v2/going/?video_id=v12044gd0666c8ohtdbc77u5ov2cqqd0
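The lookbehind/lookahead idea is not PowerShell-specific. A small Python sketch of the same extraction, assuming the content really is a double quote, then the URL, then an ampersand, as in the sample:

```python
import re

some_text = ('"https://api16-something-c-text.sitename.com/aweme/v2/going/'
             '?video_id=v12044gd0666c8ohtdbc77u5ov2cqqd0&')

# (?<=")  the match must be preceded by a double quote
# (.*?)   as few characters as possible
# (?=&)   the match must be followed by an ampersand
pattern = re.compile(r'(?<=")(.*?)(?=&)')

m = pattern.search(some_text)
url = m.group(1) if m else None
print(url)
```

If the real page content uses HTML entities instead of literal characters, the lookbehind and lookahead would need to be adjusted to match those.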

Complex regex - works in Powershell, not in Bash

The below code is a small portion of my code for Solarwinds to parse the output of a Netbackup command. This is fine for our Windows boxes but some of our boxes are RHEL.
I'm trying to convert the below code into something useable on RHEL 4.X but I'm running into a wall with parsing the regex. Obviously the below code has some of the characters escaped for use with Powershell, I have unescaped those characters for use with Shell.
I'm not great with Shell yet, but I will post a portion of my Shell code below the Powershell code.
$output = ./bpdbjobs
$Results = @()
$ColumnName = @()
foreach ($match in $OUTPUT) {
$matches = $null
$match -match "(?<jobID>\d+)?\s+(?<Type>(\b[^\d\W]+\b)|(\b[^\d\W]+\b\s+\b[^\d\W]+\b))?\s+(?<State>(Done)|(Active)|(\w+`-\w+`-\w+))?\s+(?<Status>\d+)?\s+(?<Policy>(\w+)|(\w+`_\w+)|(\w+`_\w+`_\w+))?\s+(?<Schedule>(\b[^\d\W]+\b\-\b[^\d\W]+\b)|(\-)|(\b[^\d\W]+\b))?\s+(?<Client>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Dest_Media_Svr>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Active_PID>\d+)?\s+(?<FATPipe>\b[^\d\W]+\b)?"
$Results+=$matches
}
The below is a small portion of Shell code I've written (which is clearly very wrong, learning as I go here). I'm just using this to test the Regex and see if it functions in Shell - (Spoiler alert) it does not.
#!/bin/bash
#
backups=bpdbjobs
results=()
for results in $backups; do
[[ $results =~ /(?<jobID>\d+)?\s+(?<Type>(\b[^\d\W]+\b)|(\b[^\d\W]+\b\s+\b[^\d\W]+\b))?\s+(?<State>(Done)|(Active)|(\w+\w+\-\w\-+))?\s+(?<Status>\d+)?\s+(?<Policy>(\w+)|(\w+\_\w+)|(\w+\_\w+\_\w+))?\s+(?<Schedule>(\b[^\d\W]+\b\-\b[^\d\W]+\b)|(\-)|(\b[^\d\W]+\b))?\s+(?<Client>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Dest_Media_Svr>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Active_PID>\d+)?/ ]]
done
$results
Below are the errors I get.
./netbackupsolarwinds.sh: line 9: syntax error in conditional expression: unexpected token `('
./netbackupsolarwinds.sh: line 9: syntax error near `/(?'
./netbackupsolarwinds.sh: line 9: ` [[ $results =~ /(?<jobID>\d+)?\s+(?<Type>(\b[^\d\W]+\b)|(\b[^\d\W]+\b\s+\b[^\d\W]+\b))?\s+(?<State>(Done)|(Active)|(\w+\w+\-\w\-+))?\s+(?<Status>\d+)?\s+(?<Policy>(\w+)|(\w+\_\w+)|(\w+\_\w+\_\w+))?\s+(?<Schedule>(\b[^\d\W]+\b\-\b[^\d\W]+\b)|(\-)|(\b[^\d\W]+\b))?\s+(?<Client>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Dest_Media_Svr>(\w+\.\w+\.\w+)|(\w+))?\s+(?<Active_PID>\d+)?/ ]]'
From man bash:
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).
Meaning that the expression is parsed as a POSIX extended regular expression, which AFAIK does not support either named capturing groups ((?<name>...)) or character escapes (\d, \w, \s, ...).
If you want to use [[ $var =~ expr ]] you need to rewrite the regular expression. Otherwise use GNU grep with -P (which enables PCRE):
grep -P '(?<jobID>\d+)?\s+...' <<<$results
Updated answer, after comments exchange.
The quickest way to do your migration is to use grep's --perl-regexp (Perl compatibility) option, as suggested in another answer.
If you still want to perform this operation with pure Bash, you need to rewrite the regular expression accordingly, following the documentation.
Thanks all for the answers. I swapped to grep -P to no avail; it turns out the named capture groups were the problem for grep -P.
I was also unable to figure out a way to use grep to output the capture group matches to individual variables.
This led me to swap over to using perl, as follows, with alterations to my regex.
bpdbjobs | perl -lne 'print "$1" if /(\d+)?\s+((\b[^\d\W]+\b)|(\b[^\d\W]+\b\s+\b[^\d\W]+\b))?\s+((Done)|(Active)|(\w+\w+\-\w\-+))?\s+(\d+)?\s+((\w+)|(\w+\_\w+)|(\w+\_\w+\_\w+))?\s+((\b[^\d\W]+\b\-\b[^\d\W]+\b)|(\-)|(\b[^\d\W]+\b))?\s+((\w+\.\w+\.\w+)|(\w+))?\s+((\w+\.\w+\.\w+)|(\w+))?\s+(\d+)?/g'
Here $<num> refers to the capture group number. I can now list, display and (the important part) count the number of matches within an individual group, corresponding to the data found in each column.
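If a scripting language other than perl is available on the box, the "named groups, then count per column" idea can be sketched like this in Python. The sample lines and the heavily simplified four-column pattern are made up for illustration; they are not real bpdbjobs output and not the full regex from the question.

```python
import re
from collections import Counter

# Made-up lines for illustration only; real bpdbjobs output has more columns.
sample_output = """\
1001  Backup   Done    0
1002  Backup   Active
1003  Restore  Done    1
"""

# Python spells named groups (?P<name>...) rather than (?<name>...).
row = re.compile(
    r"^\s*(?P<job_id>\d+)\s+(?P<type>[A-Za-z]+)\s+"
    r"(?P<state>Done|Active|\w+-\w+-\w+)(?:\s+(?P<status>\d+))?"
)

# One dict per matching line, keyed by group name.
rows = [m.groupdict() for m in map(row.match, sample_output.splitlines()) if m]

# Count values within a single column, and count how many rows have a status.
state_counts = Counter(r["state"] for r in rows)
status_count = sum(r["status"] is not None for r in rows)

print(state_counts)
print(status_count)
```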

Perl regular expression pattern matching the urls

I have some old Perl code that I need to improve by debugging it on an Apache server, but it contains some regular expressions that I can't figure out, as I am new to Perl. Could someone please explain what the following code does?
my $target = " ";
$target = $1 if( $url =~ m|^$shorturl(\/.*)$|);
Here,
url is http://127.0.0.1/test.pl/content/dist/hale_bopp_2.mpg
shorturl is http://127.0.0.1/test.pl
It extracts the "path info" component of the URL, the extra segments of the path after the path to the script.
http://127.0.0.1/test.pl/content/dist/hale_bopp_2.mpg
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(It should really be $target = unescape_uri($1) to handle escaped characters.)
From the language perspective, it matches $url against the regexp enclosed in m| | and, if it matches, puts the first capture (the part of the regex in parentheses) into $target.
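A rough Python equivalent, in case it helps to read the logic in another language. One deliberate difference: the Perl code interpolates $shorturl into the pattern as-is (so its dots act as wildcards), while this sketch escapes it with re.escape. The unquote call stands in for the unescaping step suggested above.

```python
import re
from urllib.parse import unquote

url = "http://127.0.0.1/test.pl/content/dist/hale_bopp_2.mpg"
shorturl = "http://127.0.0.1/test.pl"

target = " "  # default, as in the Perl code

# Anchor at the start, require the script URL, then capture "/..." up to the end.
m = re.match("^" + re.escape(shorturl) + r"(/.*)$", url)
if m:
    target = unquote(m.group(1))  # decode %XX escapes in the path info

print(target)
```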

Powershell to get a DLL name out of its full path

I have a string "..\..\xyz\abc\0.0\abc.def.ghi.jkl.dll" and am trying to get the value "abc.def.ghi.jkl.dll" into a variable using PowerShell.
I am totally new to regex and PS and kind of confused about how to get this done. I read various posts about regex but I am unable to get anything to work.
Here is my code,
$str = "..\..\xyz\abc\0.0\abc.def.ghi.jkl.dll"
$regex = [regex] '(?is)(?<=\b\\b).*?(?=\b.dll\b)'
$result = $regex.Matches($str)
Write-Host $result
I would like to get "abc.def.ghi.jkl.dll" into $result. Could someone please help me out?
You can use the following regex:
(?is)(?<=\\)[^\\]+\.dll\b
See regex demo
And no need to use Matches, just use a -match (or Match).
Explanation:
(?<=\\) - make sure there is a \ right before the current position in string
[^\\]+ - match 1 or more characters other than \
\.dll\b - match a . symbol followed by 3 letters dll that are followed by a trailing word boundary.
Powershell:
$str = "..\..\xyz\abc\0.0\abc.def.ghi.jkl.dll"
[regex]$regex = "(?is)(?<=\\)[^\\]+\.dll\b"
$match = $regex.match($str)
$result = ""
if ($match.Success)
{
$result = $match.Value
Write-Host $result
}
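For readers who want to try the pattern outside PowerShell, here is a small Python sketch using the same regex, plus a non-regex alternative from the standard library for comparison. It assumes the path always uses backslashes as separators.

```python
import re
from pathlib import PureWindowsPath

path = r"..\..\xyz\abc\0.0\abc.def.ghi.jkl.dll"

# Same pattern as above: preceded by a backslash, no backslashes inside, ends in .dll
pattern = re.compile(r"(?is)(?<=\\)[^\\]+\.dll\b")

m = pattern.search(path)
result = m.group(0) if m else ""

# Alternative without a regex: let pathlib parse the Windows-style path.
name = PureWindowsPath(path).name

print(result)
print(name)
```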

perl replacing serialized strings from sql dump

I have to replace FQDNs inside a SQL dump for website migration purposes. I've written a perl filter that's supposed to read STDIN, find the serialized strings containing the domain name to be replaced, replace that name with whatever argument is passed to the script, and write the result to STDOUT.
This is what I have so far:
my $search = $ARGV[0];
my $replace = $ARGV[1];
my $offset_s = length($search);
my $offset_r = length($replace);
my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };
while (<STDIN>) {
my @fs = split(';', $_);
foreach (@fs) {
chomp;
if (m#$regex#g) {
my ( $len, $extra, $str ) = ( $1, $2, $3 );
my $new_len = $len - $offset_s + $offset_r;
$str =~ eval { s/$search/$replace/ };
print 's:' . $new_len . ':' . $extra . $str . '\"'."\n";
}
}
}
The filter gets passed data that may look like this (this is taken from a wordpress dump, but we're also supposed to accommodate drupal dumps):
INSERT INTO `wp_2_options` VALUES (1,'siteurl','http://to.be.replaced.com/wordpress/','yes'),(125,'dashboard_widget_options','
a:2:{
s:25:\"dashboard_recent_comments\";a:1:{
s:5:\"items\";i:5;
}
s:24:\"dashboard_incoming_links\";a:2:{
s:4:\"home\";s:31:\"http://to.be.replaced.com/wordpress\";
s:4:\"link\";s:107:\"http://blogsearch.google.com/blogsearch?scoring=d&partner=wordpress&q=link:http://to.be.replaced.com/wordpress/\";
}
}
','yes'),(148,'theme_175','
a:1:{
s:13:\"courses_image\";s:37:\"http://to.be.replaced.com/files/image.png\";
}
','yes')
The regex works if I don't have any periods in my $search. I've tried escaping the periods, i.e. domain\.to\.be\.replaced, but that didn't work. I'm probably doing this either in a very roundabout way or missing something obvious. Any help would be greatly appreciated.
There is no need to eval your regular expression just because it includes variables. Also, to avoid the special meaning of metacharacters in variables like $search, escape them with the quotemeta() function or by putting the variable between \Q and \E inside the regexp. So instead of:
my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };
Use:
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)(\Q$search\E.*)\\\"};
or
my $quoted_search = quotemeta $search;
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)($quoted_search.*)\\\"};
And the same advice for this line:
$str =~ eval { s/$search/$replace/ };
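To see why the quoting matters, here is a short Python sketch where re.escape plays the role of quotemeta / \Q...\E. The replacement domain and the second URL in the text are made up for the demonstration.

```python
import re

search = "to.be.replaced.com"
replace = "new.example.org"  # hypothetical replacement domain

text = "http://to.be.replaced.com/wordpress and http://toXbeYreplaced.com/"

# Unquoted: each "." in search is a wildcard, so the second URL matches too.
unquoted = re.sub(search, replace, text)

# Quoted: re.escape makes every metacharacter literal, like quotemeta in Perl.
quoted = re.sub(re.escape(search), replace, text)

print(unquoted)
print(quoted)
```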
You have to double the escape character \ in your $search variable so that the interpolated string still contains the escaped periods.
i.e. domain\.to\.be\.replaced -> domain.to.be.replaced (not wanted)
while domain\\.to\\.be\\.replaced -> domain\.to\.be\.replaced (correct).
I'm not sure your perl regex would replace the domain in a serialized string where the old domain occurs several times.
I made a gist with a script using bash, sed and one big perl regex for this same problem. You may give it a try.
The regex I use is something like this (split across lines for readability, and with -7 as the known difference between the lengths of the two domain names):
perl -n -p -i -e '1 while s#
([;|{]s:)
([0-9]+)
:\\"
(((?!\\";).)*?)
(domain\.to\.be\.replaced)
(.*?)
\\";#"$1".($2-7).":\\\"$3new.domain.tld$6\\\";"#ge;' file
It is maybe not the best one, but at least it seems to do the job. The g option handles lines containing several serialized strings to clean up, and the while loop redoes the whole job until no replacement occurs in the serialized strings (for strings containing several occurrences of the domain). I'm not enough of a regex fan to try a recursive one.
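The "replace the domain and adjust the declared length" step can also be sketched in Python. This is only an illustration under some assumptions: strings appear as s:N:\"...\"; exactly as in the dump above, the body never contains \"; itself, and lengths are counted in UTF-8 bytes. The function name is made up.

```python
import re

def replace_in_serialized(dump: str, old: str, new: str) -> str:
    """Swap old for new inside s:N:\\"...\\"; tokens and fix up N."""
    # Byte-length difference caused by one replacement.
    delta = len(new.encode("utf-8")) - len(old.encode("utf-8"))
    token = re.compile(r's:(\d+):\\"(.*?)\\";', re.DOTALL)

    def fix(m):
        declared, body = int(m.group(1)), m.group(2)
        hits = body.count(old)
        if hits == 0:
            return m.group(0)  # leave untouched tokens exactly as they were
        # Every occurrence changes the length, so no outer loop is needed.
        new_len = declared + delta * hits
        return 's:%d:\\"%s\\";' % (new_len, body.replace(old, new))

    return token.sub(fix, dump)

dump = r's:4:\"home\";s:31:\"http://to.be.replaced.com/wordpress\";'
fixed = replace_in_serialized(dump, "to.be.replaced.com", "new.domain.tld")
print(fixed)
```

Because the length is adjusted by the difference rather than recomputed, the declared value stays consistent with whatever the dump claimed before, which is the same approach as the -7 in the one-liner above.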