Powershell REGEX extract last word - regex

Say I have a PowerShell string such as "John Doe Bloggs" or "John Bloggs".
I want to extract the last word after the final space, so in the above examples it would be "Bloggs". What regex would I use? The solution must be a regex. I've googled my mind away and am still no closer.
Any help would be appreciated.

It's really too bad that the answer "must" be a regex (I'm guessing this is some kind of homework assignment?) because it's pretty simple without.
$string = 'John Doe Bloggs';
$string.split(' ')[-1];

Here's a simple example:
$string = 'John Doe Bloggs'
$regex = '.+\s(.+)'
$string -replace $regex,'$1'
Bloggs

This regular expression will find the last word in the input:
(?<word>\w+)[\s\,\.\?\!]*$
The match is in the group named word - the entire expression matches the final word and optional whitespace / (some) punctuation. Any trailing whitespace / punctuation will not be part of the word group.
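As a rough illustration of the named-group pattern above, here is a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<word>...)` rather than the .NET-style `(?<word>...)` that PowerShell uses; the helper name `last_word` is made up for this example.

```python
import re

# Same idea as the pattern above: a run of word characters followed only by
# optional whitespace / punctuation up to the end of the input.
LAST_WORD = re.compile(r"(?P<word>\w+)[\s,.?!]*$")

def last_word(text):
    """Return the last word in text, or None if there is no word at all."""
    match = LAST_WORD.search(text)
    return match.group("word") if match else None

print(last_word("John Doe Bloggs"))  # Bloggs
print(last_word("John Bloggs!  "))   # Bloggs
```

Trailing punctuation and whitespace are consumed by the character class, so they never end up in the `word` group.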

Related

SWITCHING NAMES IN FILES

First of all let me apologize for asking this question again. I tried to find the answer, but drew a blank. I want to switch the order of words in a file such as: "dutch, abe - a blank sheet" to "abe dutch - a blank sheet". I'm using regular expressions and I seem to remember it's something like 1, 3, 2. Anyway, thank you in advance.
If each string is on its own line, you can try this:
Find:
^(\w+),\s*(\w+)(.*)
Replace:
\2 \1\3
Demo:
https://regex101.com/r/SpKOHE/3
Regex:
(\w+) Match one or more word characters and capture them in a group.
\s* Match zero or more whitespace characters.
(.*) Match everything up to the end of the line.
Try the following regex pattern:
^(\w+),?\s+?(\w+)(.*)
The substitution order is:
$2 $1$3
https://regex101.com/r/T3c77x/1
You can use ^([a-zA-Z]+),?\s*?([a-zA-Z]+)(.*)$ and then rebuild your initial string as $2 $1$3 (with no comma, to match the output you asked for).
I don't know which language you are using for the regex, or I would have written the complete code here for you. But the logic will be similar to the above.
demo
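For anyone who wants to try the find/replace outside an editor, here is a minimal Python sketch of the first pattern. It assumes one entry per line; the function name `swap_names` and the second sample line are invented for illustration. Python writes backreferences as `\2 \1\3` in the replacement string.

```python
import re

# ^ anchors at each line start because of re.MULTILINE.
# Group 1: first word, group 2: second word, group 3: rest of the line.
SWAP = re.compile(r"^(\w+),\s*(\w+)(.*)", re.MULTILINE)

def swap_names(text):
    """Turn 'dutch, abe - ...' into 'abe dutch - ...' on every line."""
    return SWAP.sub(r"\2 \1\3", text)

print(swap_names("dutch, abe - a blank sheet"))  # abe dutch - a blank sheet
```

Lines that do not start with `word, word` are left untouched, since the pattern simply fails to match them.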

Regex to get all character to the right of first space?

I am trying to craft a regular expression that will match all characters after (but not including) the first space in a string.
Input text:
foo bar bacon
Desired match:
bar bacon
The closest thing I've found so far is:
\s(.*)
However, this matches the first space in addition to "bar bacon", which is undesirable. Any help is appreciated.
You can use a positive lookbehind:
(?<=\s).*
(demo)
Although it looks like you've already put a capturing group around .* in your current regex, so you could just try grabbing that.
I'd prefer to use [[:blank:]] here, since it doesn't match newlines, just in case we're targeting multi-line input. It also works in engines that don't support \s.
(?<=[[:blank:]]).*
You don't need look behind.
my $str = 'now is the time';
# Non-greedily match up to the first space, and then get everything after in a group.
$str =~ /^.*? +(.+)/;
my $right_of_space = $1; # Keep what is in the group in parens
print "[$right_of_space]\n";
You can also try this
(?s)(?<=\S*\s+).*
or
(?s)\S*\s+(.*)
Here group 1 has your match.
With (?s) . would also match newlines
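To compare the capturing-group and lookbehind approaches side by side, here is a small Python sketch. It assumes Python's `re` module, which only supports fixed-width lookbehinds, so `(?<=\s)` works but the variable-length `(?<=\S*\s+)` form above would not. Both function names are made up for this example.

```python
import re

def after_first_space_group(text):
    """Capture everything after the first whitespace character (group 1)."""
    match = re.search(r"\s(.*)", text)
    return match.group(1) if match else None

def after_first_space_lookbehind(text):
    """Same result, but the whitespace is asserted rather than consumed."""
    match = re.search(r"(?<=\s).*", text)
    return match.group(0) if match else None

print(after_first_space_group("foo bar bacon"))       # bar bacon
print(after_first_space_lookbehind("foo bar bacon"))  # bar bacon
```

With the capturing group, the whole match still includes the space, but group 1 does not, which is usually all that is needed.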

Match pattern with exceptions

I want to match a pattern using regular expressions, but I need some exceptions to the match. For instance, match every occurrence of "John Doe" except for those occurrences where "John Doe" is enclosed by bold tags, i.e. "<b>John Doe</b>".
Match: John Doe
Don't match: <b>John Doe</b>
How can I achieve this with regular expressions?
Clarification: I want to exclude everything between the bold tags. This excluded content may contain a wide variety of characters, line breaks and so on.
If your regex dialect allows lookarounds you may use a negative lookbehind and a negative lookahead to achieve that task:
(?<!<b>)John Doe(?!</b>)
You could use negative look-arounds for this:
(?<!<b>)John Doe(?!</b>)
That wouldn't match <b>John Doe or John Doe</b> either though.
If you only want to not match instances with both the opening and closing tag you could do something like:
John Doe(?!(?<=<b>John Doe)</b>)
Or slightly shorter (but less understandable - 8 is the length of John Doe):
John Doe(?!(?<=<b>.{8})</b>)
Using Perl you can use negative lookbehind:
$ echo "<b>John Doe</b>" | perl -ne 'print if /(?<!<b>)John Doe/'
(above prints nothing - does not match).
$ echo "John Doe" | perl -ne 'print if /(?<!<b>)John Doe/'
John Doe
(above matches).
The construct (?<!<b>) is a negative lookbehind: the string matches only if "John Doe" is not preceded by what's inside it (<b> in this case).
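The difference between the "either tag" and "both tags" variants is easy to miss, so here is a small Python sketch that runs both. It assumes Python's `re` module (fixed-width lookbehinds only, which is enough here); the function names are invented for this example, and each returns the start offsets of the matches it finds.

```python
import re

# Rejects "John Doe" if it is preceded by <b> OR followed by </b>.
STRICT = re.compile(r"(?<!<b>)John Doe(?!</b>)")

# Rejects "John Doe" only if it is wrapped by BOTH <b> and </b>.
BOTH_TAGS = re.compile(r"John Doe(?!(?<=<b>John Doe)</b>)")

def strict_matches(text):
    """Start offsets of matches that touch neither tag."""
    return [m.start() for m in STRICT.finditer(text)]

def both_tags_matches(text):
    """Start offsets of matches that are not fully enclosed in <b>...</b>."""
    return [m.start() for m in BOTH_TAGS.finditer(text)]

print(strict_matches("John Doe"))            # [0]
print(strict_matches("<b>John Doe</b>"))     # []
print(both_tags_matches("<b>John Doe"))      # [3]
print(both_tags_matches("<b>John Doe</b>"))  # []
```

Neither pattern handles the clarified case where arbitrary content sits between the tags and the name; for that, parsing the markup first is likely more reliable than a single regex.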

Regular expression for number search

I need a regular expression that will find any number(s) not inside parentheses.
Example abcd 1 (35) (df)
It would only see the 1.
Is this very complex? I've tried and had no luck.
Thanks for any help
An easy solution is to first remove the unwanted values:
my $string = "abcd 12 (35) (df) 2311,22";
$string =~ s/\(\d+\)//g; # remove numbers within parens
my @numbers = $string =~ /\d+/g; # extract the remaining numbers
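The same two-step idea (delete the parenthesized numbers, then collect what is left) can be sketched in Python as follows. The function name `numbers_outside_parens` is made up for this example, and like the Perl above it only treats a group as "inside parentheses" when the parentheses contain nothing but digits.

```python
import re

def numbers_outside_parens(text):
    """Drop '(digits)' groups, then return the remaining digit runs."""
    stripped = re.sub(r"\(\d+\)", "", text)  # remove numbers within parens
    return re.findall(r"\d+", stripped)      # extract the numbers that remain

print(numbers_outside_parens("abcd 1 (35) (df)"))           # ['1']
print(numbers_outside_parens("abcd 12 (35) (df) 2311,22"))  # ['12', '2311', '22']
```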
This is quite hard but something like this will probably do:
^(?:\()(\d+)(?:[^)])|(?:[^(0-9]|^)(\d+)(?:[^)0-9]|^)|(?:[^(])(\d+)(?:\))$
The problem is to match (123, 123) and also to not match the string 123 as the number 2 between the non-parentheses characters 1 and 3. Also there are probably some edge cases for start of and end of string.
My suggestion is to not use a regex for this. Maybe a regex that matches numbers and then use the capture info to check if the surrounding characters are not parentheses.
The regular expression would be:
^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$
The result is the first (and only) matching group of the regex.
You may want to remove the ^ and $ if the regex should not be limited to matching the content of a whole line. You can also use [a-zA-Z] or [[:alpha:]]. This depends on the regular expression engine you use and, of course, the content you want to match.
Example perl code:
if (m/^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$/) {
print("$1\n");
}
Please note that your question does not contain enough information to make a good answer possible (you did not say anything about the general format of your expression, for example whether you want to match integers or floating-point numbers).
How about
/(?:^|[^\d(])(\d+)(?:[^\d)]|$)/
? This matches a string of digits (\d+) that are
preceded by the beginning of the string, or a character that is not a digit or an open parenthesis ((?:^|[^\d(]))
succeeded by the end of the string, or by a character that is not a digit or a close parenthesis ((?:[^\d)]|$))
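One thing to watch with this pattern is that the non-capturing groups consume the characters around the number, so two numbers separated by a single character cannot both match. A lookaround variant avoids that. The Python sketch below compares the two; the lookaround pattern is my own adaptation rather than part of the answer above, and the function names are invented. Both versions only check the characters directly next to the digits, so they would still pick up the numbers in something like "(123, 123)".

```python
import re

# The pattern from the answer: neighbours are consumed as part of the match.
CONSUMING = re.compile(r"(?:^|[^\d(])(\d+)(?:[^\d)]|$)")

# Assumed variant: neighbours are only asserted, never consumed.
LOOKAROUND = re.compile(r"(?<![\d(])\d+(?![\d)])")

def find_consuming(text):
    """Numbers found by the consuming pattern (group 1 of each match)."""
    return CONSUMING.findall(text)

def find_lookaround(text):
    """Numbers found by the lookaround pattern."""
    return LOOKAROUND.findall(text)

print(find_consuming("abcd 1 (35) (df)"))   # ['1']
print(find_lookaround("abcd 1 (35) (df)"))  # ['1']
print(find_consuming("1 2 (3)"))            # ['1']
print(find_lookaround("1 2 (3)"))           # ['1', '2']
```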

How can I preserve whitespace when I match and replace several words in Perl?

Let's say I have some original text:
here is some text that has a substring that I'm interested in embedded in it.
I need the text to match a part of it, say: "has a substring".
However, the original text and the matching string may have whitespace differences. For example the match text might be:
has a
substring
or
has a substring
and/or the original text might be:
here is some
text that has
a substring that I'm interested in embedded in it.
What I need my program to output is:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
I also need to preserve the whitespace pattern in the original and just add the start and end markers to it.
Any ideas about a way of using Perl regexes to get this to happen? I tried, but ended up getting horribly confused.
Been some time since I've used perl regular expressions, but what about:
$match =~ s/(has\s+a\s+substring)/[$1]/ig
This matches one or more whitespace characters (including newlines) between the words. It will wrap the entire match in brackets while maintaining the original separation. It ain't automatic, but it does work.
You could play games with this, like taking the string "has a substring" and transforming it into "has\s+a\s+substring" to make this a little less painful.
EDIT: Incorporated ysth's comments that the \s metacharacter matches newlines and hobbs corrections to my \s usage.
This pattern will match the string that you're looking to find:
(has\s+a\s+substring)
So, when the user enters a search string, replace any whitespace in the search string with \s+ and you have your pattern. Then, just replace every match with [match starts here]$1[match ends here], where $1 is the matched text.
In regexes, you can use + to mean "one or more." So something like this
/has\s+a\s+substring/
matches has followed by one or more whitespace chars, followed by a followed by one or more whitespace chars, followed by substring.
Putting it together with a substitution operator, you can say:
my $str = "here is some text that has a substring that I'm interested in embedded in it.";
$str =~ s/(has\s+a\s+substring)/\[match starts here]$1\[match ends here]/gs;
print $str;
And the output is:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
As many have suggested, use \s+ to match whitespace. Here is how you do it automatically:
my $original = "here is some text that has a substring that I'm interested in embedded in it.";
my $search = "has a\nsubstring";
my $re = $search;
$re =~ s/\s+/\\s+/g;
$original =~ s/\b$re\b/[match starts here]$&[match ends here]/g;
print $original;
Output:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
You might want to escape any meta-characters in the string. If someone is interested, I could add it.
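Since escaping came up, here is a minimal Python sketch of the same approach with the meta-characters escaped. It is an adaptation, not a translation of the Perl above: it assumes Python's `re.escape`, drops the `\b` anchors (they would stop a search string that starts or ends with punctuation from matching), and the function name `mark_match` and its default markers are chosen for this example.

```python
import re

def mark_match(original, search,
               start="[match starts here]", end="[match ends here]"):
    """Wrap whitespace-insensitive occurrences of search in start/end markers.

    The whitespace inside each match is kept exactly as it appears in original.
    """
    words = search.split()          # split on any run of whitespace
    if not words:
        return original             # nothing to search for
    # Escape each word so regex meta-characters are taken literally,
    # then allow any run of whitespace between consecutive words.
    pattern = r"\s+".join(re.escape(word) for word in words)
    # A function replacement avoids backslash handling in the marker strings.
    return re.sub(pattern, lambda m: start + m.group(0) + end, original)

text = "here is some\ntext that has\na substring that I'm interested in."
print(mark_match(text, "has a\nsubstring"))
```

Because each word is escaped separately and only the gaps become `\s+`, a search string such as "$5 (approx)" is treated as literal text rather than as a pattern.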
This is an example of how you could do that.
#! /opt/perl/bin/perl
use strict;
use warnings;
my $submatch = "has a\nsubstring";
my $str = "
here is some
text that has
a substring that I'm interested in, embedded in it.
";
print substr_match($str, $submatch), "\n";
sub substr_match{
my($string,$match) = @_;
$match =~ s/\s+/\\s+/g;
# This isn't safe the way it is now, you will need to sanitize $match
$string =~ /\b$match\b/;
}
This currently does not do anything to check the $match variable for unsafe characters.