Removing specific words from a text string? [duplicate] - regex

Say you have a string variable like: "Report to Sam.Smith"
What's the best way to remove the words 'Report' and 'to', leaving only Sam.Smith, using PowerShell?

You can use the -replace operator, which treats its first argument as a regex:
$string = "Report to Sam.Smith"
$string = $string -replace "Report to ",""
$string # Output --> "Sam.Smith"
Or use the .Replace() string method, which does a literal (non-regex) replacement:
$string = "Report to Sam.Smith"
$string = $string.replace("Report to ","")
$string # Output --> "Sam.Smith"
But if you need a regex because the words in the string can vary, then you have to rethink the problem.
You are no longer trying to erase part of the string but to extract something from it.
In your case, I think you're looking for a username in firstname.lastname format, which is pretty easy to capture:
$string = "Report to Sam.Smith"
$string -match "\s(\w*\.\w*)"
$Matches[1] # Output --> Sam.Smith
Using -match returns True or False.
If it returns True, the automatic variable $Matches (a hashtable) is populated. Index 0 ($Matches[0]) contains the whole text that matched the regex.
Every index greater than 0 contains the text captured by the corresponding pair of parentheses in the regex, called a "capture group".
I would highly recommend using an if statement, because if the match fails, $Matches is not updated: it may not exist, or it may still hold the result of an earlier match:
$string = "Report to Sam.Smith"
if($string -match "\s(\w*\.\w*)") {
$Matches[1] # Output --> Sam.Smith
}

Related

Applying Filters in Perl using Regex

I'm trying to extract text and numbers from a string using regex in perl. Here is my code:
$line = "finish=100\n";
($var) = $line =~ /[a-z]+/;
($val) = $line =~ /[0-9]+/;
My expected output is that $var = "finish" and $val = 100. However when I run the code $var = 1 and $val = 1.
Any help would be appreciated!!
Use capturing parentheses inside your regular expressions:
$line = "finish=100\n";
($var) = $line =~ /([a-z]+)/;
($val) = $line =~ /([0-9]+)/;
print "$var $val\n";
Refer to perlre
A regex match in list context (where the regex doesn't use the /g flag) returns:
the empty list if it fails;
a list of captured substrings ($1, $2, ...) if it succeeds and the pattern contains capturing groups;
the list (1) if it succeeds and the pattern doesn't capture anything.
Your regexes match, but they don't contain any capturing groups, so that's why you get 1 in $var and $val.
If you add capturing groups (/([a-z]+)/, /([0-9]+)/), you get the matched substrings instead.
Note that it might be easier to just do it all in one match:
my ($var, $val) = $line =~ /^([a-z]+)=([0-9]+)$/;
This way you also validate that the input string has the expected form and isn't just something like "Cat o' 9 tails", which (with your original regexes) would extract $var = "at" and $val = "9".
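The return cases described above can be checked directly. This is a minimal sketch; the variable names ($no_group, $with_group, $failed) and the uppercase pattern used to force a failure are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

my $line = "finish=100\n";

# No capture groups: a successful match in list context returns the list (1).
my ($no_group) = $line =~ /[a-z]+/;

# With a capture group, the captured substring is returned instead.
my ($with_group) = $line =~ /([a-z]+)/;

# A failed match returns the empty list, so the variable stays undefined.
my ($failed) = $line =~ /([A-Z]+)/;

# One anchored pattern extracts both parts and validates the overall form.
my ($var, $val) = $line =~ /^([a-z]+)=([0-9]+)$/;

print "no_group=$no_group with_group=$with_group\n";
print "failed is ", ( defined $failed ? "defined" : "undefined" ), "\n";
print "var=$var val=$val\n";
```

Checking defined-ness of the first captured variable is a simple way to tell a failed match from a successful one when you assign in list context.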
You can also capture both values into one array, for example:
$line = "finish=100\n";
@matches = $line =~ /(\w+)\W(\d+)/;
print "$matches[0], $matches[1]";

Regex Word Boundary in Perl not yield expected results

I'm having an issue pulling data from a string between two keywords. I understand that in regex I'm supposed to use the \b word-boundary anchor, and I've written the following as a test example. However, it seems to match the whole string instead of just the portion I want.
For example, the string: "here are more string words START OF INFORMATION SECTION some other stuff"
I am gathering text between "START" and "SECTION".
So I'm expecting "START OF INFORMATION SECTION", I believe.
This is the following snippet I have written in Perl specifically, but it doesn't yield the results I expected.
#!/usr/bin/perl
# This is perl 5, version 22, subversion 1 (v5.22.1) built for cygwin-thread-multi
use POSIX;
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
# this should provide me with the specific text between my two boundary words
$text =~ /\bSTART\b(.*?)\bSECTION\b/;
print "New String: $text\n";
Your code simply tests whether the regex pattern matches the string, returning a true or false value to indicate whether there was a match. You discard that indicator.
If there was a match, the strings captured by parentheses in the pattern are assigned to the capture variables $1, $2, etc.
It's unclear what you need to do, but this program prints everything between START and SECTION, in this case " OF INFORMATION " (including the surrounding spaces).
There's no need for use POSIX, but use strict and use warnings 'all' are essential.
#!/usr/bin/perl
use strict;
use warnings 'all';
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
my $section = $1;
print "New String: $section\n";
}
Output:
Original String: here are more string words START OF INFORMATION SECTION some other stuff
New String: OF INFORMATION
To get the keywords as well (START OF INFORMATION SECTION), put a capture group around the whole pattern:
$text =~ /\b(START\b(.*?)\bSECTION)\b/;
print "New String: $1\n";
$1 is the first capture group, which here spans the whole match.
As suggested by borodin, check that the match succeeded before using $1:
if ( $text =~ /\b(START\b(.*?)\bSECTION)\b/ ) {
my $tmp = $1;
print "New String: $tmp\n";
}
The match operator doesn't change the string it matches.
You can use either of the following to inspect the captured string:
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
my $section = $1;
print "New String: $section\n";
}
or
if ( my ($section) = $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
print "New String: $section\n";
}

Extract string after a symbol in Perl

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.
That's because pattern matches in a scalar context are boolean tests. If you want to capture bracket content (capture groups), you need a list context. It's ok if the list is only one element though:
try this:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
The difference may be a bit subtle, but basically we are assigning all the hits from the pattern match to a list that happens to contain one element.
Note that this is also what lets you do:
my @matches = $string =~ m/(.)/g;
and get multiple hits returned. Without /g, as above, you only get the first match, which is irrelevant given your pattern, but you can do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
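To make the context difference concrete, here is a small sketch. It assumes the sample string from the question and uses a slightly different pattern (/:\s*(.*)$/) so that the colon and leading space are left out of the capture; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "day1: string over here";

# Scalar context: the match is only a true/false test (1 on success).
my $flag = $string =~ /:\s*(.*)$/;

# List context: parentheses on the left make the match return its captures.
my ($substring) = $string =~ /:\s*(.*)$/;

# With /g in list context, every match of the group is returned.
my @words = $string =~ /(\w+)/g;

print "flag=$flag\n";
print "substring=$substring\n";
print "words=@words\n";
```

The only difference between the first two assignments is the pair of parentheses around the variable on the left-hand side.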
I usually use parentheses to extract part of the text and then refer to the result stored in the $1 variable.
Look at this example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
A similar result can be obtained with this code too:
my $text = "day1: string over here";
my ($a) = $text =~ /:\s*(.+)$/;
print $a;
You can also get the desired substring with the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(':\s*', $string);
print $substring, "\n";
Output:
string over here
Or you can get this by using capturing group () in regex:
my $string = "day1: string over here";
$string =~ m/(.*)\:\s+(.*)$/;
my $substring = $2;
print $substring, "\n";

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for (@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the following Perl variables to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The result of a regex match is stored in special variables (and in more readable ones such as ${^MATCH} if you add the /p flag to the regex).
For the whole last match you're looking at the $MATCH variable (available with use English; $& is the short form). This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
next unless /$regex|^\s*description/i;
push @matches, $MATCH;    # needs "use English;", otherwise write $&
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regex the string ^\s*owner #.
Assign $sentence the result of matching $regex against the regular expression /^\s*owner #/ (which won't match; if it did, $sentence would be 1, but since it doesn't, it is false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $`, $& and $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
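As a rough illustration of the /p approach, the sketch below reuses the 'abcdefghi' sample from the earlier answer; the variable names $before, $matched and $after are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = 'abcdefghi';

my ( $before, $matched, $after );
if ( $text =~ /def/p ) {
    # Copy the values right away; a later successful match would overwrite them.
    ( $before, $matched, $after ) = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

print "$before:$matched:$after\n";    # prints abc:def:ghi
```

Copying the three values into ordinary lexical variables inside the if block also makes them usable later, outside the block.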

Extracting specific data from a string with regex and Powershell

I want to extract from this string
blocked-process-report&#x0Aprocess id="process435d948" taskpriority="0" logused="0" waitresource="RID: 7:1:1132932:0" waittime="3962166" ownerId="4641198" transactionname="SELECT" lasttranstarted="2011-09-13T17:21:54.950" XDES="0x80c5f060" lockMode="S" schedulerid="4" kpid="18444" status="suspended" spid="58" sbid="0" ecid="0"
the value of spid, but only the value itself, i.e. 58. The value varies, sometimes 80 or 1000, etc., but it is always > 50.
How can I do this using regex and PowerShell?
The quick and dirty:
$found = $string -match '.*spid="(\d+)".*'
if ($found) {
$spid = $matches[1]
}
where $string is your above mentioned string. This would match any string that has spid="somenumberhere", and make the number into a matched group, which you can extract using $matches[1].
Save that text in a variable, say $string.
Then run:
$string -match 'spid="(\d+)"'
If there is a match, the value you want will be in $matches[1]