Perl regex extracting a match using braces

I tested the following code:
#! /usr/bin/perl
use strict;
use English;
#this code extracts the current scripts filename
#by removing the path from the filepath
my $Script_Name = $PROGRAM_NAME;
${Script_Name} =~ s/^.*\\//; #windows path
#${Script_Name} =~ s/^.*\///; #Unix based path
print $Script_Name;
I don't understand why these braces extract the match without using the /r modifier. Can anyone explain why and how this works, or point me to some documentation?

You're getting a little confused!
The braces make no difference. ${Script_Name} is identical to $Script_Name.
Your code first copies the entire path of the script file from $PROGRAM_NAME to $Script_Name.
Then the substitution modifies $Script_Name in place, removing everything up to and including the last backslash and leaving just the file name.
The /r modifier (available from Perl 5.14) is for when you want to leave the original string untouched and get the modified copy back as the return value, so you could write your code in one step as
my $Script_Name = $PROGRAM_NAME =~ s/^.*\\//r;
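To see the two forms side by side, here is a minimal sketch. The hard-coded Windows path stands in for $PROGRAM_NAME and is an assumption made for illustration; the /r form needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical path standing in for $PROGRAM_NAME
my $full_path = 'C:\\scripts\\tools\\report.pl';

# In-place form: copy first, then s/// modifies the copy
my $in_place = $full_path;
${in_place} =~ s/^.*\\//;    # braces are optional, same as $in_place

# Non-destructive form: /r returns the modified string,
# and $full_path itself is left untouched
my $returned = $full_path =~ s/^.*\\//r;

print "$in_place\n";     # report.pl
print "$returned\n";     # report.pl
print "$full_path\n";    # still the full path
```

Both forms give the same file name; the only difference is which variable ends up holding it.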

Related

Find sample name and directory name from a path

My script receives a complete path name of a file from another script and I am trying to break this full path name in perl and pass this information to my script.
I am unable to extract it using split, can anyone please suggest on how to approach this -
I need to split a path which could look like this: path = /usr/local/projects/Tool/Work/Section12/Tool.Sample.2.pdf
to extract these values Sample1 and /usr/local/projects/Tool/Work/, so that I can use these to assign values to two variables in my script for example $Sample_id = Sample and $Dir=/usr/local/projects/Tool/Work/
Can anyone please suggest?
Thanks!

You should use the core File::Spec::Functions module so that your code respects any eccentricities of the platform you're working on. splitdir and catdir from that module separate and recombine path components, making your task mostly simple.
I've used splitdir here to put the steps of your $path into array @path. The last element is the file name, which I've copied out using pop, and the second-from-last is Part12, which you don't seem to be interested in, so I've used another pop to get rid of that.
Then all that's left is to rebuild $dir_path from what's left of @path, and extract the part of the file name that you're interested in.
To do the latter there are several options, depending on what you mean. It could be the second field of the file name split on dots, or the third from the end, split the same way. I've gone for the field that starts with sample in either upper or lower case. A regex finds that for me.
use strict;
use warnings 'all';
use File::Spec::Functions qw/ splitdir catdir /;
my $path = '/usr/local/projects/Tool/Work/Part12/Tool.Sample01.2.pdf';
my @path = splitdir $path;
my $file = pop @path; # Copy and remove the file name from the end
my $local_dir = pop @path; # Remove `Part12` per requirement
my $dir_path = catdir @path; # Rebuild what is left of the path
# Pick the first subsequence of the file name that starts with `sample`
#
my ($sample) = grep /^sample/i, split /\./, $file;
print "\$sample = $sample\n";
print "\$dir_path = $dir_path\n";
output
$sample = Sample01
$dir_path = /usr/local/projects/Tool/Work
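The other two options mentioned above (second field from the start, or third field from the end) can be sketched as follows. This is only an illustration using the same example file name; which choice is right depends on how the real file names vary.

```perl
use strict;
use warnings;

my $file   = 'Tool.Sample01.2.pdf';    # file name from the example above
my @fields = split /\./, $file;        # ('Tool', 'Sample01', '2', 'pdf')

my $second_field   = $fields[1];       # second field counting from the start
my $third_from_end = $fields[-3];      # third field counting from the end

print "$second_field\n";      # Sample01
print "$third_from_end\n";    # Sample01
```

The two indexes agree here only because this name has exactly four dot-separated fields; with more or fewer fields they would pick different parts.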

There are two parts to this -- split the full path, and extract particular components of some of its parts. Splitting a file name with the full path into its components is nicely done by a few modules. Here I'll use the core module File::Basename. Then the path and filename can be processed for specific requirements, and here I'll use regex.
use warnings;
use strict;
use File::Basename qw(fileparse);
my $fullname = '/usr/local/projects/Tool/Work/Section12/Tool.Sample.2.pdf';
# Parse it into the path and filename
my ($filename, $path) = fileparse($fullname);
# Extract needed part of the path: all except last directory
my ($dirs) = $path =~ m|(.*)/.*/|; # / stop editor coloring
# Extract needed part of filename: between the first `.` and the next
my ($tag) = $filename =~ /[^.]+\.([^.]+)/;
print "$dirs\n$tag\n";
This prints
/usr/local/projects/Tool/Work
Sample
The regexes for pulling parts out of the path and the filename are both specific to the task. The first one uses the fact that we only need to drop the last component of the path, so the greediness of .* works out right. The second one uses the fact that the wanted part sits between the very first . and the next one.
Note that in the basic invocation above the extension is not extracted and the filename is returned with its extension. Thanks to Borodin for bringing this up in a comment. See the documentation, as should always be done with any suggested modules.
This is by far the most common need when working with full paths. But if you want to get the extension split off as well then pass another argument, which can be a list of extensions to seek or a regex. Then the file-name part will be returned without the extension.
my ($base, $path, $ext) = fileparse($fullname, @suffix_list);
For example, @suffix_list can be qr/\.[^.]*/ and in this case we have
my ($base, $path, $ext) = fileparse($fullname, qr/\.[^.]*/);
print "$path\n$base\n$ext\n";
printing
/usr/local/projects/Tool/Work/Section12/
Tool.Sample.2
.pdf
A note on reliability, from docs:
You are guaranteed that $directories . $filename . $suffix will denote the same location as the original $path.
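That guarantee is easy to check for the example path. The following is a small sketch of such a check, not something from the documentation itself; the variable $rebuilt is introduced here only for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $fullname = '/usr/local/projects/Tool/Work/Section12/Tool.Sample.2.pdf';

# Split into base name, directory part, and suffix
my ($base, $path, $ext) = fileparse($fullname, qr/\.[^.]*/);

# Gluing the three pieces back together should give the original string
my $rebuilt = $path . $base . $ext;
print $rebuilt eq $fullname ? "round trip ok\n" : "mismatch\n";
```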

How to trim the file modification value from SVN log output with PowerShell

I have an SVN log being captured in PowerShell which I am then trying to modify to strip off everything except the file URL. The problem I am having is getting a regex to remove everything before the file URL. My entry is matched as:
M /trunk/project/application/myFile.cs
There are two spaces at the beginning which originally I was trying to replace with a Regex but that did not seem to work, so I use a trim and end up with:
M /trunk/project/application/myFile.cs
Now I want to get rid of the File status indicator so I have a regular expression like:
$entry = $entry.Replace("^[ADMR]\s+","")
Where $entry is the matched file URL but this doesn't seem to do anything, even removing the caret to just look for the value and space did not do anything. I know that $entry is a string, I originally thought Replace was not working as $entry was not a string, but running Get-Member during the script shows I have a string type. Is there something special about the svn file indicator or is the regex somehow off?

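Given your example string, you can split on the space that precedes the path (the lookahead keeps the leading slash in the result):
$entry = 'M /trunk/project/application/myFile.cs'
$fileURL = ($entry -split ' (?=/)')[1]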

Your regex doesn't work because string.Replace just does a literal string replacement and doesn't know about regexes. You'd probably want [Regex]::Replace or just the -replace operator.
But when using SVN with PowerShell, I'd always go with the XML format. Many svn subcommands (log, status, info and list among them) accept an --xml option and will then output XML (albeit invalid if the command dies partway through).
E.g.:
$x = [xml](svn log -l 3 --verbose --xml)
$x.log.logentry|%{$_.paths}|%{$_.path}|%{$_.'#text'}
will give you all paths.
But if you need a regex:
$entry -replace '^.*?\s+'
which will remove everything up to (and including) the first sequence of spaces which has the added benefit that you don't need to remember what characters may appear there, too.

Why is using string substitution to form a regex not working?

I have a regular expression for use with awk to find any of the specified words in a line of a file. It looks like this awk "/word1/||/word2/||/word3/" filename. As an alternative, I have been trying to specify the words like this WORDS="word1 word2 word3" and then use bash string substitution to form the regular expression to pass to awk.
I've tried numerous ways of doing this to no avail. awk simply dumps the contents of the entire file or spits out some complaint about the regex form.
Here's what I have:
#!/bin/bash
FILE="myfile"
WORDS="word1 word2 word3"
# use BASH string substitution to obtain the regex which should look like this:
# "/word1/||/word2/||/word3/"
REGEX=\"/${WORDS// //||/}/\"
awk ${REGEX} $FILE
I'm fairly sure it has to do with quoting and I've tried various methods using echo and back ticks and can get it look right (when echoed) but when actually trying to use it, it fails.

Try to replace:
REGEX=\"/${WORDS// //||/}/\"
with:
REGEX="/${WORDS// //||/}/"
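Note that there is no need to escape the double quotes: they are shell quoting, not part of the awk program. With the backslashes, literal " characters end up inside REGEX, so awk sees a string constant instead of regex patterns. A non-empty string is always true, which is why awk printed the entire file. It is also safer to quote the expansions when calling awk, as in awk "$REGEX" "$FILE".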

How can I remove the text before and after a particular character?

I have been trying to remove the text before and after a particular character in each line of a text. It would be very hard to do manually since it contain 5000 lines and I need to remove text before that keyword in each line. Any software that could do it, would be great or any Perl scripts that could run on Windows. I run Perl scripts in ActivePerl, so scripts that could do this and run on ActivePerl would be helpful.
Thanks

I'd use this:
$text =~ s/ .*? (keyword) .* /$1/gx;

You don't need separate software; you can make this part of your existing script. Use a regex substitution along the lines of s/a(b)c/$1/: the parentheses capture b, and $1 refers back to it in the replacement. Without knowing more about the text you're working with it's hard to guess what the actual pattern would be.

Presuming that you have the following:
text1 text2 keyword text3 text4 text5 keyword text6 text7
and what you want is to keep only the two keywords and the text between them (keyword text3 text4 text5 keyword), use
s/.*?keyword(.*?)keyword.*/keyword$1keyword/;
Otherwise you can just replace the whole line with keyword.
An example of the data would help us give a clearer answer.

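I'd say that if $text contains your whole text, you can do:
$text =~ s/^.*(keyword1|keyword2).*$/$1/mg;
The m modifier makes ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the whole string, and the g modifier repeats the substitution for every line instead of stopping after the first match.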
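The effect of the m modifier can be seen on a small multi-line string. This is a minimal sketch; the two-line input is made up for illustration, and g is included so that every line is processed.

```perl
use strict;
use warnings;

# Hypothetical two-line input
my $text = "aaa keyword1 bbb\nccc keyword2 ddd\n";

# Without /m, ^ and $ anchor only at the ends of the whole string,
# and . does not match a newline, so nothing matches here
(my $no_m = $text) =~ s/^.*(keyword1|keyword2).*$/$1/g;

# With /m, ^ and $ anchor at each line; /g repeats the substitution per line
(my $with_m = $text) =~ s/^.*(keyword1|keyword2).*$/$1/mg;

print $no_m;      # unchanged input
print $with_m;    # keyword1 and keyword2, one per line
```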

Assuming you want to remove all text to the left of keyword1 and all text to the right of keyword2:
while (<>) {
s/.*(keyword1)/$1/;
s/(keyword2).*/$1/;
print;
}
Put this into a perl script and run it like this:
perl fix.pl original.txt > new.txt
Or if you just want to do this in place, perhaps on several files at once:
perl -i.bak -pe 's/.*(keyword1)/$1/; s/(keyword2).*/$1/;' original.txt original2.txt
This edits the files in place, keeping a copy of each original with a .bak extension. The -p option wraps the code in an implicit while loop that prints every line after the substitutions have been applied to it.
On Windows (cmd.exe), put the one-liner in double quotes instead of single quotes.
To be safe, verify it without the -i option first, or at the very least on only one file.

How can I make this Perl one-liner to toggle character in line in a file?

I am attempting to write a one-line Perl script that will toggle a line in a configuration file from "commented" to not and back. I have the following so far:
perl -pi -e 's/^(#?)(\tDefaultServerLayout)/ ... /e' xorg.conf
I am trying to figure out what code to put in the replacement (...) section. I would like the replacement to insert a '#' if one was not matched on, and remove it if it was matched on.
pseudo code:
if ( $1 == '#' ) then
print $2
else
print "#$2"
My Perl is very rusty, and I don't know how to fit that into a s///e replacement.
My reason for this is to create a single script that will change (toggle) my display settings between two layouts. I would prefer to have this done in only one script.
I am open to suggestions for alternate methods, but I would like to keep this a one-liner that I can just include in a shell script that is doing other things I want to happen when I change layouts.

perl -pi -e 's/^(#?)(?=\tDefaultServerLayout)/ ! $1 && "#" /e' foo
Note the addition of ?= to simplify the replacement string by using a look-ahead assertion.
Some might prefer s/.../ $1 ? "" : "#" /e.
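To check the toggle without touching a real file, the same substitution can be wrapped in a small sub and applied twice. This is a sketch only: the sub name toggle and the sample line (including the layout name Layout0) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Apply the toggle to one line and return the result
sub toggle {
    my ($line) = @_;
    # $1 is '#' when the line is commented, empty otherwise;
    # the look-ahead checks for the keyword without consuming it
    $line =~ s/^(#?)(?=\tDefaultServerLayout)/ $1 ? "" : "#" /e;
    return $line;
}

# Hypothetical config line; the layout name is made up
my $active    = "\tDefaultServerLayout Layout0\n";
my $commented = toggle($active);       # gains a leading '#'
my $restored  = toggle($commented);    # '#' removed again

print $commented;
print $restored;
```

Because only the optional # is actually matched, the rest of the line never has to be repeated in the replacement.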
