Extracting specific data from a string with regex and PowerShell

I want to extract a value from this string:
blocked-process-report&#x0Aprocess id="process435d948" taskpriority="0" logused="0" waitresource="RID: 7:1:1132932:0" waittime="3962166" ownerId="4641198" transactionname="SELECT" lasttranstarted="2011-09-13T17:21:54.950" XDES="0x80c5f060" lockMode="S" schedulerid="4" kpid="18444" status="suspended" spid="58" sbid="0" ecid="0"
I want the value of spid, but only the number itself, here 58. The value varies (sometimes 80 or 1000, etc.) but is always greater than 50.
How can I do this using regex and PowerShell?

The quick and dirty way:
$found = $string -match '.*spid="(\d+)".*'
if ($found) {
    $spid = $matches[1]
}
where $string is the string from your question. This matches any string that contains spid="somenumberhere" and captures the number in a group, which you can extract using $matches[1]. The leading and trailing .* are optional, because -match already searches anywhere in the string.

Save that as, say $string.
Then do
$string -match 'spid="(\d+)"'
If there is a match, the value you want will be in $matches[1]
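
For comparison, here is the same capture-group idea as a minimal Python sketch. It is not part of the original answers: the function name extract_spid and the shortened sample string are assumptions made for illustration, and the \b word boundary is an added safeguard against attribute names that merely end in "spid".

```python
import re

# Shortened sample modelled on the string in the question (illustrative only).
report = 'kpid="18444" status="suspended" spid="58" sbid="0" ecid="0"'

def extract_spid(text):
    """Return the spid value as an int, or None if no spid attribute is found."""
    match = re.search(r'\bspid="(\d+)"', text)
    return int(match.group(1)) if match else None

print(extract_spid(report))
```

As in the PowerShell versions, the number is read from the first capture group only after checking that the match succeeded.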

Removing specific words from a text string? [duplicate]

Say you have a variable string like: "Report to Sam.Smith"
What's the best way to remove the words 'Report' and 'to', leaving only Sam.Smith, using PowerShell?
You have to use -replace :
$string = "Report to Sam.Smith"
$string = $string -replace "Report to ",""
$string # Output --> "Sam.Smith"
Or like this :
$string = "Report to Sam.Smith"
$string = $string.replace("Report to ","")
$string # Output --> "Sam.Smith"
But if you need to use a regex because the words in the string can vary, then you have to rethink the problem.
You aren't looking to erase part of the string but to extract something from it.
In your case, I think you're looking for a username in a name.lastname format, which is pretty easy to capture:
$string = "Report to Sam.Smith"
$string -match "\s(\w*\.\w*)"
$Matches[1] # Output --> Sam.Smith
-match returns True or False.
If it returns True, the automatic variable $Matches is populated. Index 0 ($Matches[0]) contains the whole text that matched the regex.
Every index greater than 0 contains the text captured by the corresponding pair of parentheses in the regex (a "capture group").
I would highly recommend using an if statement, because when the match fails $Matches is not updated: it either does not exist or still holds the result of an earlier successful match:
$string = "Report to Sam.Smith"
if ($string -match "\s(\w*\.\w*)") {
    $Matches[1] # Output --> Sam.Smith
}
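
The "check the match before reading the group" advice can be sketched in Python as well. This is an illustrative comparison, not the answer's own code: extract_username is a hypothetical helper, and the pattern uses \w+ rather than \w* so that a lone dot is not accepted as a username.

```python
import re

def extract_username(text):
    """Return the first name.lastname token that follows whitespace, or None."""
    match = re.search(r'\s(\w+\.\w+)', text)
    # Only touch the capture group when the search actually succeeded.
    return match.group(1) if match else None

print(extract_username("Report to Sam.Smith"))
```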

Trim More than 20 Characters

I am working on a script that will generate AD usernames based on a CSV file. Right now I have the following line working.
Select-Object @{n='Username';e={$_.FirstName.ToLower() + $_.LastName.ToLower() -replace "[^a-zA-Z]" }}
This takes the first and last name and combines them into an AD-friendly name. However, I need the name to be shortened to no more than 20 characters. I have tried a few different methods to shorten the username but I haven't had any luck.
Any ideas on how I can get the username shortened?
Probably the most elegant approach is to use a positive lookbehind in your replacement:
... -replace '(?<=^.{20}).*'
This expression matches the remainder of the string only if it is preceded by 20 characters at the beginning of the string (^.{20}).
Another option would be a replacement with a capturing group on the first 20 characters:
... -replace '^(.{20}).*', '$1'
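This captures exactly the first 20 characters of the string and replaces the whole string with just the captured group ($1). Strings shorter than 20 characters do not match and are left unchanged.
Another approach is to index the string with a range and join the resulting characters:
$str[0..19] -join ''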
e.g.
PS C:\> 'ab'[0..19]
ab
PS C:\> 'abcdefghijklmnopqrstuvwxyz'[0..19] -join ''
abcdefghijklmnopqrst
Which I would try in your line as:
Select-Object @{n='Username';e={(($_.FirstName + $_.LastName) -replace "[^a-z]").ToLower()[0..19] -join '' }}
([a-z] is enough because PowerShell regex matches are case-insensitive, and .ToLower() is moved so you only need to call it once).
And if you are using strict mode (Set-StrictMode), then why not check the length to avoid going outside the bounds of the array with the delightful:
$str[0..[math]::Min($str.Length - 1, 19)] -join ''
To truncate a string in PowerShell, you can use the .NET String::Substring method. The following line will return the first $targetLength characters of $str, or the whole string if $str is shorter than that.
if ($str.Length -gt $targetLength) { $str.Substring(0, $targetLength) } else { $str }
If you prefer a regex solution, the following works (thanks to @PetSerAl)
$str -replace "(?<=.{$targetLength}).*"
A quick measurement shows the regex method to be about 70% slower than the substring method (942ms versus 557ms on a 200,000 line logfile)
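
To make the two truncation strategies concrete, here is a small Python sketch of the same pipeline: strip non-letters, lower-case, then cut to 20 characters. It is a comparison under stated assumptions, not the answers' code; make_username and truncate_regex are hypothetical names, and the sample names are invented.

```python
import re

def make_username(first, last, limit=20):
    """Lower-case the combined name, drop non-letters, keep at most `limit` chars."""
    cleaned = re.sub(r'[^a-z]', '', (first + last).lower())
    # Slicing never fails on short strings, so no length check is needed.
    return cleaned[:limit]

def truncate_regex(text, limit=20):
    """Regex variant: keep the first `limit` characters via a capture group."""
    # Strings shorter than `limit` do not match and are returned unchanged.
    return re.sub(r'^(.{%d}).*' % limit, r'\1', text)

print(make_username("Mary-Anne", "O'Shaughnessy-Fitzgerald"))
```

The slice plays the role of Substring or [0..19] -join '', and the capture-group substitution mirrors the -replace '^(.{20}).*', '$1' answer.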

Powershell regex to remove everything except key

Given a string like
'The fingerprint is: ABCDEFGHIJKLMNOPQRSTUVWXYZ12345678910111'
How would you remove all text that isn't a 40 character string consisting of A-Z 0-9 ?
Currently I'm looking for the string 'The fingerprint is: ' and removing it, but I feel it would be safer to look for a 40 character alphanumeric.
$foo = $foo -replace 'The fingerprint is: ',''
I expect something like this to work, but no luck.
$foo = $foo -creplace '^[A-Z0-9]{40}',''
I've also tried just looking for the characters that match
$foo = $foo -match '[A-Z0-9]{40}'
It depends a bit, but if the 40 characters are contiguous and they are the only 40-character run in the string, you could use -replace:
"The fingerprint is: ABCDEFGHIJKLMNOPQRSTUVWXYZ12345678910111" -replace '.*([A-Z0-9]{40}).*', '$1'
Note: The replacement, $1, is a reference to the match group. It is not a PowerShell variable and is deliberately written in single quotes so it will not expand.
To match the 40 character alphanumeric with no replacement, this
$foo = 'The fingerprint is: ABCDEFGHIJKLMNOPQRSTUVWXYZ12345678910111';
$foo -Match '[A-Z0-9]{40}' | Out-Null;
Write-Output $matches[0];
prints
ABCDEFGHIJKLMNOPQRSTUVWXYZ12345678910111
[A-Z0-9] is a bracket expression matching any of the contained characters (- denotes a range of values). Because -match is case-insensitive, it also accepts lowercase letters; use -cmatch to require uppercase.
{40} matches the previous element 40 times
Out-Null suppresses the boolean return value of the -match operator
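
As a cross-check of the "match the key rather than strip the prefix" idea, here is a hedged Python sketch. The lookarounds on either side are an addition of this sketch, not something the answers use: they reject runs longer than 40 characters instead of silently taking 40 characters out of a longer run. The name extract_fingerprint is hypothetical.

```python
import re

# Exactly 40 uppercase letters or digits, not embedded in a longer such run.
FINGERPRINT = re.compile(r'(?<![A-Z0-9])[A-Z0-9]{40}(?![A-Z0-9])')

def extract_fingerprint(text):
    """Return the 40-character key found in text, or None if there is none."""
    match = FINGERPRINT.search(text)
    return match.group(0) if match else None

line = 'The fingerprint is: ABCDEFGHIJKLMNOPQRSTUVWXYZ12345678910111'
print(extract_fingerprint(line))
```

Python's re module is case-sensitive by default, so this behaves like -cmatch rather than -match.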

Extract string after a symbol in Perl

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.
That's because a pattern match in scalar context is a boolean test. If you want the contents of the capture groups, you need list context. A one-element list is enough:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
The difference may be a bit subtle, but basically we are assigning all the captures from the pattern match to a list that happens to have one element. Note that this pattern also captures the colon itself; use /:\s*(.*)$/ if you only want the text after it.
List context is also what lets you do:
my @matches = $string =~ m/(.)/g;
and get multiple hits returned. Without /g you only get the captures from the first match, which makes no difference for your pattern, but it does let you do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
I usually use parentheses to extract part of the text and then refer to the result stored in the $1 variable.
For example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
A similar result can be obtained with this code too:
my $text = "day1: string over here";
my ($rest) = $text =~ /:\s*(.+)$/;
print $rest;
You can also get the desired substring with the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(/:\s*/, $string, 2);
print $substring, "\n";
Output:
string over here
Or you can get it with a capturing group () in a regex:
my $string = "day1: string over here";
if ($string =~ m/(.*)\:\s+(.*)$/) {
    my $substring = $2;
    print $substring, "\n";
}
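
For readers who want to sanity-check the capture pattern outside Perl, here is a minimal Python sketch of "everything after the first colon". It is an illustration only: after_colon is a hypothetical name, and the choice to skip whitespace after the colon and trim trailing whitespace is an assumption about the desired output.

```python
import re

def after_colon(text):
    """Return the text after the first colon, trimmed, or None if no colon."""
    match = re.search(r':\s*(.*)', text)
    # The colon sits outside the parentheses, so it is not part of the result.
    return match.group(1).rstrip() if match else None

print(after_colon("day1: string over here"))
```

Python has no scalar versus list context, so the pitfall from the question does not arise here; the match object is tested explicitly instead.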

Replace only the second occurrence of string in a line in perl regex

I have a string like "ven|ven|vett|vejj|ven|ven", where each "|" delimits a column.
I split the string on "|", save the columns in an array, and read each column into $str.
So I'm trying to do this:
$string =~ s/$str/venky/g if $str =~ /ven/i; # this replaces globally
which does not meet the requirement.
I need to replace the string at a particular occurrence, on demand.
For example, I have a request to change the 2nd occurrence of "ven" to "venky".
How can I meet this requirement simply? Is there something like
$string =~ s/ven/venky/2;
As far as I know we have 'o' for replacing once and 'g' for replacing globally. I'm struggling to find a way to replace a particular occurrence. I would rather not use pos() to get the position, because the string keeps changing and it becomes difficult to track every time.
Please help me with this.
There is no flag that you can add to the regex that will do this.
The easiest way would be to split and loop. However, if you insist to use one regex, it is doable:
/^(?:[^v]|v[^e]|ve[^n])*ven(?:[^v]|v[^e]|ve[^n])*\Kven/
If you want to replace the Nth occurrence instead of the second, you can do:
/^(?:(?:[^v]|v[^e]|ve[^n])*ven){N-1}(?:[^v]|v[^e]|ve[^n])*\Kven/
The general idea:
(?:[^v]|v[^e]|ve[^n])* - matches a run of text that does not contain ven
\K drops everything matched so far from the match, so you can use it much like a variable-length lookbehind
Currently you're replacing every instance of 'ven' with 'venky' if your string contains a match for ven, which of course it does.
What I assume you're trying to do is to substitute 'venky' for 'ven' within your string if it's the second element:
my $string = 'ven|ven|vett|vejj|ven|ven';
my @elements = split(/\|/, $string);
my $count;
foreach (@elements){
    $count++;
    s/$_/venky/g if /ven/i and $count == 2;
}
print join('|', @elements);
print "\n";
Your approach was already pretty good. What you described makes sense, but I think you are having trouble implementing it.
I created a function to do the work. It takes 4 arguments:
$string is the string we want to work on
$n is the nth occurrence you want to replace
$needle is the thing you want to replace (think needle in a haystack)
Note that right now $needle is used as a regular expression, so it may contain metacharacters. To treat it literally you would have to use quotemeta on it or match with /\Q$needle\E/
$replacement is the replacement for the $needle
The idea is to split up the string, then check whether each element matches the pattern ($needle) and keep track of how many have matched. When the nth one is reached, replace it and stop processing. Then put the string back together.
use strict;
use warnings;
use feature 'say';
say replace_nth_occurance("ven|ven|vett|vejj|ven|ven", 2, 'ven', 'venky');
sub replace_nth_occurance {
    my ($string, $n, $needle, $replacement) = @_;
    # take the string apart
    my @elements = split /\|/, $string;
    my $count = 0; # keep track of ...
    foreach my $e (@elements) {
        $count++ if $e =~ m/$needle/; # ... how many matches we've found
        if ($count == $n) {
            $e =~ s/$needle/$replacement/; # replace
            last; # and stop processing
        }
    }
    # put it back into the pipe-separated format
    return join '|', @elements;
}
Output:
ven|venky|vett|vejj|ven|ven
To replace the n'th occurrence of "ven" with "venky":
my $n = 3;
my $test = "seven given ravens";
$test =~ s/ven/--$n == 0 ? "venky" : $&/eg;
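This uses the /e flag, which lets you specify the substitution part as an expression. --$n counts down on every match; only the match that brings the counter to zero is replaced with "venky", and every other match is put back unchanged via $&.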
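
The counting-callback technique carries over to other regex engines. Below is a minimal Python sketch of it, offered as a comparison rather than as the answer's method: replace_nth is a hypothetical helper, and it escapes the needle so that it is treated as literal text (the counterpart of \Q...\E).

```python
import re

def replace_nth(text, needle, replacement, n):
    """Replace only the n-th (1-based) occurrence of needle in text."""
    count = 0

    def swap(match):
        nonlocal count
        count += 1
        # Every match passes through here; only the n-th one is changed.
        return replacement if count == n else match.group(0)

    return re.sub(re.escape(needle), swap, text)

print(replace_nth("ven|ven|vett|vejj|ven|ven", "ven", "venky", 2))
```

If the text has fewer than n occurrences, the string comes back unchanged, which matches the behaviour of the Perl one-liner.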