Perl regex match from last, skipping the last delimiter

I am trying to write a regex in Perl for this string:
""Wagner JS, Adson MA, Van Heerden JA et al (1984) The natural history of hepatic metastases from colorectal cancer. A comparison with resective treatment. Ann Surg 199:502–508""\s
to get the last part: "Ann Surg 199:502–508"
So I wrote:
$string =~ m/\.([^\d]*\s\d*\:\d*\–\d*)\"\"\s$/
The match I get in $1 is "A comparison with resective treatment. Ann Surg 199:502–508", but I expect "Ann Surg 199:502–508".
It works in some cases but not in others. I tried searching but did not find a satisfactory answer. Please suggest something.

You only need to add the dot to the negated character class, so that the capture cannot extend back across an earlier full stop:
$string =~ m/\.([^\d.]*\s\d*:\d*–\d*)""\\s$/
But a better way is to split the string with dot as delimiter and take the last part.
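The split approach could be sketched as follows. The helper name last_sentence is hypothetical, and the sketch assumes the wanted part never contains a dot itself (an abbreviation such as "Ann. Surg." would break it) and that the string ends with optional double quotes and whitespace, as in the question.

```perl
use strict;
use warnings;
use utf8;                                   # the sample contains an en dash

# Hypothetical helper: return the text after the last full stop, with
# trailing quotes/whitespace and the leading space removed.
sub last_sentence {
    my ($text) = @_;
    $text =~ s/["\s]+\z//;                  # drop trailing "" and whitespace
    my @parts = split /\./, $text;          # split on literal dots
    return '' unless @parts;
    (my $last = $parts[-1]) =~ s/\A\s+//;   # drop the space after the full stop
    return $last;
}

my $string = '""Wagner JS, Adson MA, Van Heerden JA et al (1984) The natural history of hepatic metastases from colorectal cancer. A comparison with resective treatment. Ann Surg 199:502–508"" ';
binmode STDOUT, ':encoding(UTF-8)';
print last_sentence($string), "\n";         # Ann Surg 199:502–508
```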

If you want the last part of every string, then all you need is
$string =~ /([^.]+)$/
or, to avoid the space after the full stop
$string =~ /([^.\s][^.]+)$/

Please give this a try:
$string =~ m/\.\s*([^\.\d]*\s*\d*\:\d*\–\d*)""\\s$/;

Another option, taking everything after the last period excluding leading spaces:
$string =~ m/(?!\s)([^.]+)$/

Related

How do I search and replace with a regex and preserve blank spaces in Perl?

I'm using this code:
$text =~ s/\s(\w)/\u$1/g;
But "This is an example" becomes "ThisIsAnExample" instead of "This Is An Example".
How can I preserve the blank spaces?
Use lookbehind.
$text =~ s/(?<!\S)(\w)/\u$1/g;
Or use the more efficient \K (Perl 5.10+).
$text =~ s/(?:^|\s)\K(\w)/\u$1/g;
Both of the solutions will make sure the first word is capitalized too. If that's not an issue, the second solution can be simplified to the following:
$text =~ s/\s\K(\w)/\u$1/g;
Your match includes the whitespace, but your replacement doesn't put it back.
$text =~ s/(\s)(\w)/$1\u$2/g;
Since \s contains different types of whitespace characters, if you want to keep it in your replacement, you need to capture it and put it back.
An alternative is to use word boundaries and full "words".
$text =~ s/\b(\w+)\b/\u$1/g;
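As a rough comparison of the lookbehind and word-boundary variants, here is a small sketch. The helper names and sample strings are made up for illustration. The two forms differ on words containing an apostrophe, because \b also matches between ' and the following letter.

```perl
use strict;
use warnings;

# Hypothetical helper: capitalize only letters that start the string or
# follow whitespace. The lookbehind is zero-width, so spaces are untouched.
sub capitalize_after_space {
    my ($text) = @_;
    $text =~ s/(?<!\S)(\w)/\u$1/g;
    return $text;
}

# Hypothetical helper: capitalize every run of word characters.
sub capitalize_word_runs {
    my ($text) = @_;
    $text =~ s/\b(\w+)\b/\u$1/g;
    return $text;
}

print capitalize_after_space("this is an example"), "\n";  # This Is An Example
print capitalize_after_space("don't stop"), "\n";          # Don't Stop
print capitalize_word_runs("don't stop"), "\n";            # Don'T Stop
```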

Perl - Removing all special characters except a few

I came across a Perl regex "term" which allows you to remove all punctuation. Here is the code:
$string =~ s/[[:punct:]]//g;
However, this removes all special characters. Can that regex be modified so that, for example, it removes all special characters except hyphens? As I stated in my previous question about Perl, I am new to the language, so obvious things are not obvious to me. Thanks for all the help.
Change your code as shown below to remove all punctuation except hyphens:
$string =~ s/(?!-)[[:punct:]]//g;
DEMO
use strict;
use warnings;
my $string = "foo;\"-bar'.,...*(){}[]----";
$string =~ s/(?!-)[[:punct:]]//g;
print "$string\n";
Output:
foo-bar----
You may also use a Unicode property:
$string =~ s/[^-\PP]+//g;
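The two substitutions are not quite interchangeable: in the ASCII range, [[:punct:]] also matches characters that Unicode classifies as symbols rather than punctuation (such as $, + and =), while \pP does not. A minimal sketch, with hypothetical helper names and a made-up sample string:

```perl
use strict;
use warnings;

# Hypothetical helper: POSIX class version, keeps only the hyphen.
sub strip_posix_punct {
    my ($text) = @_;
    $text =~ s/(?!-)[[:punct:]]//g;
    return $text;
}

# Hypothetical helper: Unicode property version, keeps the hyphen and
# anything that is not in the Punctuation category (symbols included).
sub strip_unicode_punct {
    my ($text) = @_;
    $text =~ s/[^-\PP]+//g;
    return $text;
}

my $sample = 'a+b=c; x-y, $5!';
print strip_posix_punct($sample), "\n";    # abc x-y 5
print strip_unicode_punct($sample), "\n";  # a+b=c x-y $5
```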

How to substitute whitespace and tabs in a string with _ in Perl?

$string = I am a boy
How can I replace the whitespace between words with underscores?
You need a regular expression and the substitution operator to do that.
my $string = 'I am a boy';
$string =~ s/\s/_/g;
You can learn more about regex in perlre and perlretut. A nice tool to play around with is Rubular.
Also, your code will not compile. You need to quote your string, and you need to put a semicolon at the end.
$string = 'I am a boy';
$string =~ s/ /_/g;
$string =~ tr( \t)(_); # Double underscore not necessary as per Dave's comment
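A small sketch of the tr approach, wrapped in a hypothetical helper. Unlike \s, the search list here covers only spaces and tabs, so newlines are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every space or tab with an underscore.
sub underscore_blanks {
    my ($text) = @_;
    # tr maps characters one-to-one; when the replacement list is shorter
    # than the search list, its last character is repeated to fill it out.
    $text =~ tr/ \t/_/;
    return $text;
}

print underscore_blanks("I am\ta boy"), "\n";   # I_am_a_boy
```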
This is just to show another option in Perl. I think Miguel Prz and imbabque showed smarter ways; personally, I follow the way imbabque showed.
my $str = "This is a test string";
$str =~ s/\p{Space}/_/g;
print $str."\n";
and the output is
This_is_a_test_string

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string, so we anchor with a ^ metacharacter at the beginning.
We want to match anything but digits, so we look at the character classes and find out this is \D.
We want 1 or more of these, so we use the + quantifier, which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $`, and $' variables have well-known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-performance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so you don't match an entire string that has no digits), you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
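To see how the list-context forms behave on success and on failure, here is a minimal sketch. The helper names and the extra test inputs are hypothetical. On a failed match the list assignment leaves the variable undefined, so no stale match variable is ever read.

```perl
use strict;
use warnings;

# Hypothetical helper: leading run of non-digits, or undef if there is none.
sub prefix_before_digit {
    my ($text) = @_;
    my ($prefix) = $text =~ m/\A([^0-9]+)/;
    return $prefix;
}

# Hypothetical helper: same, but a digit must follow the prefix.
sub prefix_requiring_digit {
    my ($text) = @_;
    my ($prefix) = $text =~ m/\A([^0-9]+)(?=[0-9])/;
    return $prefix;
}

for my $input ('AAAA_BBBB_12_13_14', '42abc', 'no digits') {
    my $plain  = prefix_before_digit($input);
    my $strict = prefix_requiring_digit($input);
    printf "%-20s plain=%s strict=%s\n", $input,
        defined $plain  ? "'$plain'"  : 'undef',
        defined $strict ? "'$strict'" : 'undef';
}
```

With these inputs, only the first string yields 'AAAA_BBBB_' from both helpers; '42abc' yields undef from both, and 'no digits' is returned whole by the first helper but rejected by the second.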
$str =~ /(\d)/; print $`;
This prints the part of the string that precedes the match.
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_

What does s-/-- and s-/\Z-- in perl mean?

I am a beginner in perl and I have a query regarding pattern matching.
I came across a line in perl where it was written
$variable =~ s-/\Z--;
Further along in the code, another variable was modified:
$variable1 =~ s-/--;
Can you please tell me what these two lines do?
I want to know what does s-/\Z-- and s-/-- mean.
$variable =~ s-/\Z--;
- is used as a delimiter here. However, best practice suggests that you either use / or {} as delimiters.
It could be re-written as:
$variable =~ s{/\Z}{}; # remove a / at the end of a string
Consider:
$variable1 =~ s-/--;
Again, it could be re-written as:
$variable1 =~ s{/}{}; # remove the first /
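A quick sketch of both substitutions on a made-up path (the helper names and sample values are hypothetical). Note that \Z also matches just before a final newline, so a slash followed by a trailing newline is removed as well.

```perl
use strict;
use warnings;

# Hypothetical helper: same as s-/\Z--, written with braces.
sub strip_trailing_slash {
    my ($path) = @_;
    $path =~ s{/\Z}{};
    return $path;
}

# Hypothetical helper: same as s-/--, removes only the first slash.
sub strip_first_slash {
    my ($path) = @_;
    $path =~ s{/}{};
    return $path;
}

print strip_trailing_slash('/usr/local/lib/'), "\n";  # /usr/local/lib
print strip_first_slash('/usr/local/lib/'), "\n";     # usr/local/lib/
print strip_trailing_slash("dir/\n");                 # dir (newline kept)
```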
The s/// operator in Perl is a substitution operation, which performs a search-and-replace on a string using a special kind of pattern called a regular expression. You can read more about regular expressions and Perl's pattern matching in the man pages that come with Perl:
man perlretut
man perlre
If you don't have these on your system, try searching Google for the same.
Applying a substitution to a variable is done with the =~ operator. So the following replaces the first instance of 'foo' in the variable $var with 'bar' (add the /g modifier to replace all instances).
$var =~ s/foo/bar/;
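A minimal sketch of the difference the /g modifier makes, using hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting s/// with and without /g.
sub replace_first { my ($text) = @_; $text =~ s/foo/bar/;  return $text; }
sub replace_all   { my ($text) = @_; $text =~ s/foo/bar/g; return $text; }

print replace_first('foo foo foo'), "\n";   # bar foo foo
print replace_all('foo foo foo'), "\n";     # bar bar bar
```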
All the Perl operators are documented on the 'perlop' man page.
Even though the most common separator character is a slash (hence s///), you can also use any other punctuation character as a separator. So in this case, the author has decided to use the dash (-) as the separator.
Here's the same line of code above using dash as a separator:
$var =~ s-foo-bar-;
In your case, the dash doesn't seem to add any clarity to the code, so it might be best to update it to use the conventional slashes instead.
The s/// search-and-replace operator in Perl can be used with different delimiters, which is what is done in this case. They have replaced / with the minus sign -, or dash.
The s-/-- removes the first / from the string.
The s-/\Z-- matches and removes a slash at the end of the line. I think this is better written: s{/$}{}.
$variable1 =~ s-/--; could be written as
$variable1 =~ s{/}{}xms;
or this
$variable1 =~ s/ \/ //xms;
It means delete the first / in the string.
Regarding s-/\Z--, it is usually written like this
$variable =~ s{/ \Z}{}xms;
or this
$variable =~ s/ \/ \Z //xms;
It means delete a / if it is at the end of the string, or just before a trailing newline (\Z).