Removing specific words from a text string? [duplicate]

Say you have a string variable such as "Report to Sam.Smith".
What is the best way to remove the words 'Report' and 'to', leaving only Sam.Smith, using PowerShell?

You can use the -replace operator, which treats the search text as a regex:
$string = "Report to Sam.Smith"
$string = $string -replace "Report to ",""
$string # Output --> "Sam.Smith"
Or use the .Replace() string method, which does a literal, case-sensitive replacement:
$string = "Report to Sam.Smith"
$string = $string.replace("Report to ","")
$string # Output --> "Sam.Smith"
But if you need a regex because the words in the string can vary, you have to rethink the problem.
Instead of erasing part of the string, you want to extract something from it.
In your case, I think you are looking for a username in firstname.lastname format, which is pretty easy to capture:
$string = "Report to Sam.Smith"
$string -match "\s(\w*\.\w*)"
$Matches[1] # Output --> Sam.Smith
-match returns True or False.
If it returns True, the automatic variable $Matches (a hashtable) is populated. Key 0 ($Matches[0]) holds the whole text that matched the regex.
Every key greater than 0 holds the text captured by the corresponding pair of parentheses in the regex, called a "capture group".
I would highly recommend wrapping this in an if statement, because if the match fails, $Matches is not updated and will be empty or hold the result of an earlier match:
$string = "Report to Sam.Smith"
if ($string -match "\s(\w*\.\w*)") {
    $Matches[1] # Output --> Sam.Smith
}

Related

Applying Filters in Perl using Regex

I'm trying to extract text and numbers from a string using a regex in Perl. Here is my code:
$line = "finish=100\n";
($var) = $line =~ /[a-z]+/;
($val) = $line =~ /[0-9]+/;
My expected output is $var = "finish" and $val = 100. However, when I run the code, both $var and $val are 1.
Any help would be appreciated!!

Use capturing parentheses inside your regular expressions:
$line = "finish=100\n";
($var) = $line =~ /([a-z]+)/;
($val) = $line =~ /([0-9]+)/;
print "$var $val\n";
Refer to perlre

A regex match in list context (where the regex doesn't use the /g flag) returns:
the empty list if it fails;
a list of captured substrings ($1, $2, ...) if it succeeds and the pattern contains capturing groups;
the list (1) if it succeeds and the pattern doesn't capture anything.
Your regexes match, but they don't contain any capturing groups, so that's why you get 1 in $var and $val.
If you add capturing groups (/([a-z]+)/, /([0-9]+)/), you get the matched substrings instead.
Note that it might be easier to just do it all in one match:
my ($var, $val) = $line =~ /^([a-z]+)=([0-9]+)$/;
This way you also validate that the input string has the expected form and isn't just something like "Cat o' 9 tails", which (with your original regexes) would extract $var = "at" and $val = "9".
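The three return cases listed above can be checked directly. This is only a minimal sketch: the helper name describe_match and the "FINISH" test string are made up for illustration and are not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: run a match in list context and describe what came back.
sub describe_match {
    my ($line, $re) = @_;
    my @result = $line =~ $re;    # assigning to an array forces list context
    return @result ? join(',', @result) : 'no match';
}

print describe_match("finish=100\n", qr/[a-z]+/), "\n";               # 1 (matched, no groups)
print describe_match("finish=100\n", qr/([a-z]+)/), "\n";             # finish
print describe_match("finish=100\n", qr/^([a-z]+)=([0-9]+)$/), "\n";  # finish,100
print describe_match("FINISH\n", qr/([a-z]+)/), "\n";                 # no match
```

The second and third calls are the ones you normally want: as soon as the pattern contains capturing groups, the list holds the captured text instead of the bare 1.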

You can also capture both values into one array, for example:
$line = "finish=100\n";
@matches = $line =~ /(\w+)\W(\d+)/;
print "$matches[0], $matches[1]";

Regex Word Boundary in Perl not yield expected results

I'm having an issue pulling data from a string between two keywords. I understand that in a regex I'm supposed to use the \b word-boundary anchors, and I've written the following test example; however, I seem to end up with the whole string instead of just the portion I want.
For example, the string: "here are more string words START OF INFORMATION SECTION some other stuff"
I am gathering text between "START" and "SECTION".
So I'm expecting "START OF INFORMATION SECTION", I believe.
This is the snippet I have written in Perl, but it doesn't yield the results I expected.
#!/usr/bin/perl
# This is perl 5, version 22, subversion 1 (v5.22.1) built for cygwin-thread-multi
use POSIX;
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
# this should provide me with the specific text between my two boundary words
$text =~ /\bSTART\b(.*?)\bSECTION\b/;
print "New String: $text\n";

Your code simply tests whether the regex pattern matches the string, returning a true or false value to indicate whether there was a match. You discard that indicator.
If there was a match, the strings captured by parentheses in the regex pattern are assigned to the capture variables $1, $2, etc.
It's unclear what you need to do, but this program prints everything between START and SECTION: in this case, OF INFORMATION.
There's no need for use POSIX, but use strict and use warnings 'all' are essential.
#!/usr/bin/perl
use strict;
use warnings 'all';
my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
print "Original String: $text\n";
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String: $section\n";
}
Output:
Original String: here are more string words START OF INFORMATION SECTION some other stuff
New String: OF INFORMATION

To include the two keywords in the result, wrap the whole pattern in an outer capture group:
$text =~ /\b(START\b(.*?)\bSECTION)\b/;
print "New String: $1\n";
$1 is the first captured group.
As suggested by Borodin, check that the match succeeded before using $1:
if ( $text =~ /\b(START\b(.*?)\bSECTION)\b/ ) {
    my $tmp = $1;
    print "New String: $tmp\n";
}

The match operator doesn't change the string it matches.
You can use either of the following to inspect the captured string:
if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String: $section\n";
}
or
if ( my ($section) = $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    print "New String: $section\n";
}
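Both variants (with and without the keywords themselves) can be wrapped in one small function. This is a sketch under assumptions: the helper name between and its $inclusive flag are invented here, and \Q...\E is used so the keywords are matched literally. It also trims the whitespace around the inner text, which the patterns above do not do.

```perl
use strict;
use warnings;

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";

# Hypothetical helper: return the text between two keywords, or undef if
# they are not found. With $inclusive set, the keywords are returned too.
sub between {
    my ($str, $start, $end, $inclusive) = @_;
    my ($outer, $inner) = $str =~ /\b(\Q$start\E\b\s*(.*?)\s*\b\Q$end\E)\b/
        or return;
    return $inclusive ? $outer : $inner;
}

print between($text, 'START', 'SECTION', 0), "\n";   # OF INFORMATION
print between($text, 'START', 'SECTION', 1), "\n";   # START OF INFORMATION SECTION
print "$text\n";                                     # the original string is unchanged
```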

Extract string after a symbol in Perl

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.

That's because a pattern match in scalar context is a boolean test. If you want the contents of the capture groups, you need list context. It's fine if the list has only one element, though.
Try this:
my ( $substring ) = $string =~ /(\:.*)\s*$/;
The difference may be a bit subtle, but basically we are assigning all the hits from the pattern match to a list that happens to contain one element.
Note that (\:.*) captures the colon as well; use /:\s*(.*)$/ if you only want the text after it.
This is also what lets you do:
my @matches = $string =~ m/(.)/g;
and get multiple hits returned. Without /g you only get the captures from the first match, which is all you need for your pattern, but it also means you can do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
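To see the difference between the contexts side by side, here is a short sketch using the string from the question. The variable names are only illustrative, and the pattern moves the colon outside the parentheses so that only the text after it is captured.

```perl
use strict;
use warnings;

my $string = "day1: string over here";

my $count     = $string =~ /:\s*(.*)$/;   # scalar context: 1 on success
my ($capture) = $string =~ /:\s*(.*)$/;   # list context: the captured text
my @chars     = "abc" =~ /(.)/g;          # /g in list context: every match

print "scalar: $count\n";     # scalar: 1
print "list:   $capture\n";   # list:   string over here
print "all:    @chars\n";     # all:    a b c
```

The only difference between the first two assignments is the pair of parentheses around the variable on the left-hand side.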

I usually use parentheses to extract part of the text and then refer to the result stored in the $1 variable.
For example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
The same result can be obtained with this code too:
my $text = "day1: string over here";
my ($a) = $text =~ /:\s*(.+)$/;
print $a;

You can also get the desired substring with the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(':\s*', $string);
print $substring, "\n";
Output:
string over here
Or you can use a capturing group in a regex:
my $string = "day1: string over here";
$string =~ m/(.*)\:\s+(.*)$/;
my $substring = $2;
print $substring, "\n";
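One thing to watch with the split approach is text that contains a second colon. A hedged variation, assuming you want everything after the first colon, is to pass a limit of 2 as the third argument to split; the helper name after_colon and the second sample string are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first colon.
sub after_colon {
    my ($string) = @_;
    # A limit of 2 stops splitting after the first colon, so any later
    # colons stay inside the second field.
    my (undef, $rest) = split /:\s*/, $string, 2;
    return $rest;
}

print after_colon("day1: string over here"), "\n";   # string over here
print after_colon("day1: starts at 10:30"), "\n";    # starts at 10:30
```

Without the limit, the second call would return only "starts at 10", because split would cut again at the colon in 10:30.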

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
    if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
        my $name = $1;
        print " processing Email for $name \n";
        push @Qmail, $msg->{Body};
    }
}
for (@Qmail) {
    next unless /$regex|^\s*description/i;
    print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.

One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
    $match = $&;
    $what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";

Use the following Perl variables:
$` = the string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have already been exited.
$& = the string matched by the last pattern match.
$' = the string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have already been exited.
For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi

The result of a regex match is stored in special variables (more readable equivalents are available if you add the /p flag to the regex).
For the whole of the last match, you want the $& variable (also available as $MATCH if you use English). This is covered in the manual page perlvar.
So if you wanted to store the matches from your last for loop in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
    next unless /$regex|^\s*description/i;
    push @matches, $&;    # or $MATCH with "use English"
    print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign to $regex the string ^\s*owner #.
Assign to $sentence the result of matching $regex against the regular expression /^s*owner $/ (which won't match; if it did, $sentence would be 1, but since it didn't, it is false).
I think. I'm actually not certain what that line will do or was meant to do.

I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result, which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge.

You can do the following:
use feature 'say';
my $str = "hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for ($hello, $world);
As you can see, $hello contains "hello"; $world is undefined because the second alternative never matched.

If you have an older perl on your system, as I do (perl 5.18 or earlier), and you use $`, $& and $' as in codequestor's answer above, they will slow down every regex match in your program.
Instead, you can add the /p modifier to your regex and then read these three variables for your match results: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH}.
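As a rough sketch of the /p approach, the abc:def:ghi example from the earlier answer can be rewritten with the three ${^...} variables. The helper name split_around is an assumption for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before, inside and after the first
# match of $re in $str, or nothing if there is no match.
sub split_around {
    my ($str, $re) = @_;
    return unless $str =~ /$re/p;    # /p asks perl to keep the three pieces
    return (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print join(':', split_around('abcdefghi', qr/def/)), "\n";   # abc:def:ghi
```

The three values are copied out inside the helper because the next successful match would overwrite them.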

Extracting specific data from a string with regex and Powershell

I want to extract from this string
blocked-process-report&#x0Aprocess id="process435d948" taskpriority="0" logused="0" waitresource="RID: 7:1:1132932:0" waittime=
"3962166" ownerId="4641198" transactionname="SELECT" lasttranstarted="2011-09-13T17:21:54.950" XDES="0x80c5f060" lockMode="S" schedulerid="4" kpid="18444" status="susp
ended" spid="58" sbid="0" ecid="0"
I want only the value of spid, which is 58 here. The value varies (sometimes 80 or 1000, etc.) but is always greater than 50.
How can I do this using regex and PowerShell?

The quick and dirty:
$found = $string -match '.*spid="(\d+)".*'
if ($found) {
    $spid = $matches[1]
}
where $string is your above mentioned string. This would match any string that has spid="somenumberhere", and make the number into a matched group, which you can extract using $matches[1].

Save that string in a variable, say $string.
Then do
$string -match 'spid="(\d+)"'
If there is a match, the value you want will be in $matches[1].
