regular expression url rewrite based on folder - regex

I need to take /calendar/MyCalendar.ics, where MyCalendar.ics could be just about any name with an .ics extension, and rewrite it to /feeds/ics/ics_classic.asp?MyCalendar.ics
Thanks

Regular expressions are meant for searching and matching text. Usually you use a regex to tell a text-manipulation tool what to search for, and then use a tool-specific mechanism to say what the matched text should be replaced with.
Regex syntax uses round brackets (parentheses) to define capture groups inside the overall search pattern. Many search-and-replace tools let you refer to those capture groups to choose which part of the match to keep or replace.
Take the Java Pattern and Matcher classes as an example. To complete your task with a Java Matcher, you can use the following code:
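import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The dot needs a double backslash because the backslash must itself be
// escaped inside a Java string literal; (?i) makes "ics" case-insensitive.
Pattern p = Pattern.compile("/calendar/(.*\\.(?i)ics)");
Matcher m = p.matcher(url);
String rewrittenUrl = "";
if (m.matches()) {
    // group(1) is the file name captured by the parentheses
    rewrittenUrl = "/feeds/ics/ics_classic.asp?" + m.group(1);
}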
This will find the requested pattern but will only take the first regex group for creating the new string.
Here is a link to regex replacement information in (imho) a very good regex information site: http://www.regular-expressions.info/refreplace.html
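For comparison, here is a minimal Python sketch of the same capture-group idea. The helper name rewrite_ics_url is made up for illustration, and returning None for a non-matching path is an assumption about how mismatches should be handled:

```python
import re

# Same pattern as the Java example: everything after /calendar/ that ends
# in .ics (case-insensitive) is captured as group 1.
ICS_PATTERN = re.compile(r"/calendar/(.*\.ics)", re.IGNORECASE)


def rewrite_ics_url(url):
    """Return the rewritten URL, or None when the path does not match."""
    match = ICS_PATTERN.fullmatch(url)  # like Matcher.matches(): whole string
    if match is None:
        return None
    return "/feeds/ics/ics_classic.asp?" + match.group(1)


print(rewrite_ics_url("/calendar/MyCalendar.ics"))
```

fullmatch is used so that, as with matches() in Java, a URL with extra text before or after the pattern is rejected rather than partially rewritten.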

C:\x>perl foo.pl
Before: a=/calendar/MyCalendar.ics
After: a=/feeds/ics/ics_classic.asp?MyCalendar.ics
...or how about this way?
(regex kind of seems like overkill for this problem)
b=/calendar/MyCalendar.ics
index=9
c=MyCalendar.ics (might want to add check for ending with '.ics')
d=/feeds/ics/ics_classic.asp?MyCalendar.ics
Here's the code:
C:\x>type foo.pl
my $a = "/calendar/MyCalendar.ics";
print "Before: a=$a\n";
my $match = (
$a =~ s|^.*/([^/]+)\.ics$|/feeds/ics/ics_classic.asp?$1.ics|i
);
if( ! $match ) {
die "Expected path/filename.ics instead of \"$a\"";
}
print "After: a=$a\n";
print "\n";
print "...or how about this way?\n";
print "(regex kind of seems like overkill for this problem)\n";
my $b = "/calendar/MyCalendar.ics";
my $index = rindex( $b, "/" ); #find last path delim.
my $c = substr( $b, $index+1 );
print "b=$b\n";
print "index=$index\n";
print "c=$c (might want to add check for ending with '.ics')\n";
my $d = "/feeds/ics/ics_classic.asp?" . $c;
print "d=$d\n";
C:\x>
General thoughts:
If you do solve this with a regex, a semi-tricky bit is making sure your capture group (the parens) excludes the path separator.
Some things to consider:
Are your paths separators always forward-slashes?
Regex seems like overkill for this; the simplest thing I can think of is to get the index of your last path separator and do simple string manipulation (2nd part of sample program).
Libraries often have routines for parsing paths. In Java I'd look at the java.io.File object, for example, specifically
getName()
Returns the name of the file or directory denoted by
this abstract pathname. This is just the last name in
the pathname's name sequence
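The same "let a path library do it" idea can be sketched in Python with posixpath.basename. The helper name ics_target and the decision to raise ValueError for non-.ics names are assumptions made for this example only:

```python
import posixpath


def ics_target(path):
    """Build the rewritten URL from the last component of a /-separated path."""
    name = posixpath.basename(path)  # text after the final "/"
    if not name.lower().endswith(".ics"):
        raise ValueError("expected a path ending in .ics, got %r" % path)
    return "/feeds/ics/ics_classic.asp?" + name


print(ics_target("/calendar/MyCalendar.ics"))
```

posixpath is chosen over os.path here because URL paths always use forward slashes, whatever the host operating system.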

Related

How to construct this regular expression?

How can I ignore abc, def and 01 in the expression below using a regex? What I tried doesn't work.
Abc-def-smdp-01
One way to go is to split the original string into prefix, portion of interest, and suffix, and then remove the unwanted characters from the affixes:
$raw = "abc-def-smdp-01";
preg_match ( "/^(.*)(-smdp-)(.*)$/", $raw, $matches ); // separate original string into prefix, trunk, suffix
$matches[1] = preg_replace ( "/[^-]/", "", $matches[1] ); // non-'-' characters deleted in prefix
$matches[3] = preg_replace ( "/[^-]/", "", $matches[3] ); // non-'-' characters deleted in suffix
$result = $matches[1].$matches[2].$matches[3]; // composing target string
echo $result;
Online demo available here.
NB
Problems similar to this one can be tackled easily with some knowledge of the PHP function library, whose documentation in the online PHP manual comes with exact syntax, example code, and user comments.
In this case, look up:
preg_match
preg_replace
Of course, finding suitable candidates for the intended functionality assumes at least a perfunctory grasp of the available facilities, which makes 15 minutes of browsing the hierarchical index a judicious investment of time.
I would use this pattern:
preg_match("/\w+\-\w+\-(\w+)\-\d+/", "Abc-def-smdp-01", $output);
echo $output[1];
http://www.phpliveregex.com/p/ftB
EDIT: or do you need the dashes?
In that case you need to use this pattern: "/\w+(\-)\w+(\-)(\w+)(\-)\d+/"
And then output it as:
echo $output[1].$output[2].$output[3].$output[4];
or loop it and build a string with .=
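A rough Python equivalent of the first pattern, for readers who want to try it outside PHP. The function name middle_token is hypothetical, and the sketch assumes the input always has the shape word-word-word-digits:

```python
import re


def middle_token(raw):
    """Return the third dash-separated token, or None if the shape differs."""
    # \w+ cannot cross a "-", so each group of word characters is one token.
    match = re.fullmatch(r"\w+-\w+-(\w+)-\d+", raw)
    return match.group(1) if match else None


print(middle_token("Abc-def-smdp-01"))
```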

Perl split string based on forward slash

I am new to Perl, so this is a basic question. I have a string as shown below. I am interested in extracting the date from it, so I am thinking of splitting it on the slash.
my $path = "/bla/bla/bla/20160306";
my $date = (split(/\//,$path))[3];#ideally 3 is date position in array after split
print $date;
However, I don't see the expected output, but instead I see 5 getting printed.
Since the path starts with the separator / itself, split returns a list whose first element is an empty string (the text to the left of the first /), so the list has one more element than expected. The posted code is therefore off by one and returns the second-to-last element (a subdirectory) of the path, not the date.
If date is always the last thing in the string you can pick the last element
my $date = (split '/', $path)[-1];
where I've used '' as delimiters so as not to have to escape the /. (This, however, may confuse, since the separator pattern is a regex and // conveys that, while '' may appear to merely quote a string.)
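The leading empty element is not specific to Perl. A short Python sketch with the same path shows the same off-by-one effect (Python is used here only for illustration; the variable names are arbitrary):

```python
path = "/bla/bla/bla/20160306"

parts = path.split("/")
# The text before the first "/" is empty, so it becomes element 0.
print(parts)       # ['', 'bla', 'bla', 'bla', '20160306']
print(parts[3])    # 'bla', one short of the date
print(parts[-1])   # '20160306', counting from the end avoids the problem
```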
This can also be done with regex
my @parts = $path =~ m{([^/]+)}g;
With this there can be no initial empty string. Or, the last part can be picked out of the full list as above, with ($path =~ m{...}g)[-1], but if you indeed only need the last bit then extract it directly
my ($last_part) = $path =~ m{.*/(.*)};
Here the "greedy" .* matches everything in the string up to the last instance of the next subpattern (/ here), thus getting us to the last part of the path, which is then captured. The regex match operator returns its matches only when it is in the list context so parens on the left are needed.
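The greedy-match behaviour described above can be checked with a small Python sketch. The function name last_segment is made up, and returning None when the string contains no slash is an assumption:

```python
import re


def last_segment(path):
    """Capture whatever follows the final "/" using a greedy prefix."""
    # The first .* is greedy: it runs to the end of the string and then backs
    # off only as far as needed to find a "/", which is the last one.
    match = re.match(r".*/(.*)", path)
    return match.group(1) if match else None


path = "/bla/bla/bla/20160306"
print(last_segment(path))
# Matching runs of non-slash characters never yields an empty first item.
print(re.findall(r"[^/]+", path))
```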
Which brings us to the fact that you are parsing a path, and there are libraries dedicated to that.
For splitting a path into its components one tool is splitdir from File::Spec
use File::Spec;
my @parts = File::Spec->splitdir($path);
If the path starts with a / we'll again get an empty string for the first element (by design, see docs). That can then be removed, if present:
shift @parts if $parts[0] eq '';
Again, the last element alone can be had like in the other examples.
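As a point of comparison outside Perl, Python's pathlib offers a similar component split. This is only a sketch of the library-based approach, not a replacement for File::Spec; note that pathlib reports the root as "/" rather than as an empty string:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/bla/bla/bla/20160306")

components = path.parts  # the root "/" comes first, then each directory
date = path.name         # the final component only

print(components)
print(date)
```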
Simply bind it to the end:
(\d+)$
# look for digits at the end of the string
See a demo on regex101.com. The capturing group is only for clarification though not really needed in this case.
In Perl this would be (I am a PHP/Python guy, so bear with me when it is ugly)
my $path = "/bla/bla/bla/20160306";
$path =~ /(\d+)$/;
print $1;
See a demo on ideone.com.
Try this
Use a lookahead to do it. Splitting on a lookahead keeps the / at the start of each piece, so strip the leading slash with a substitution afterwards.
my $path = "/a/b/c/20160306";
my $date = (split(/(?=\/)/,$path))[3];
$date=~s/^\///;
print $date;
Or else use pattern matching with a capture group:
my $path = "/a/b/c/20160306";
my @data = $path =~ m/\/(\w+)/g;
print $data[3];

Perl Regex E-Mail TLD

I have this code:
if ( $Mail =~ /$Tld{$_}/ ) {
$TldFound = 1;
}
The variable $Mail holds, for example, "mail@mail.com". The variable $Tld holds ".com". How can I cut down $Mail so that only the TLD .com remains?
You should use Email::Address to parse email addresses.
To be able to extract a TLD with certainty requires a list of what you consider to be TLDs. For example, do .co.uk, or .com.tr count? Or, do you just want the last string of non-dot characters?
If you restrict your attention to 2 - 3 character TLDs such as .co, .com, .io, .net, .org, .us etc, you can do my ($tld) = ($email =~ /[.] ([a-z]{2,3}) \z/x); and then check with if ($tld and ($tld eq 'com')) { ... } etc, but you really want a good list of acceptable strings that can be TLDs: Net::Domain::TLD, Mozilla::PublicSuffix.
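The restricted two-to-three-letter check can be sketched in Python as follows. The function name short_tld is hypothetical, and the sketch deliberately shares the limitation noted above: it only looks at the final label and knows nothing about real TLD lists:

```python
import re


def short_tld(email):
    """Return the final 2-3 letter label of an address, or None."""
    # \Z anchors at the very end of the string, like \z in the Perl version.
    match = re.search(r"\.([a-z]{2,3})\Z", email)
    return match.group(1) if match else None


print(short_tld("mail@mail.com"))
print(short_tld("user@example.co.uk"))  # only the last label, "uk"
```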
Naive Regex Solutions
The following solutions will solve your problem as posted, but are not intended to address every possible edge case. Parsing email addresses in a comprehensive way is non-trivial, and requires a parser such as Email::Address if you want to handle the full complexity of the RFCs.
Printing Your TLD from a String
Since you already know the string you want to print on success (e.g. ".com"), you don't actually need the result of your regular expression match; you can print the string stored in $Tld when the match is true using a post-statement condition. For example:
$Mail = 'mail@mail.com';
$Tld = '.com';
# \Q...\E quotes the dot so it matches a literal "." rather than any character
print "$Tld\n" if $Mail =~ /\Q$Tld\E$/;
This will correctly print:
.com
Printing the Match
If you really want the full match, there are a number of ways to do it. One way would be to use the special $& variable:
$Mail = 'mail@mail.com';
$Tld = '.com';
# \Q...\E quotes the dot so it matches a literal "." rather than any character
if ($Mail =~ /\Q$Tld\E$/) {
    print "$&\n";
}
This will also correctly print:
.com
Partitioning the String
All of the previous examples will solve your problem as posted, but the best generic solution short of a parser is really to partition the TLD, and treat the last segment of the domain as an unvalidated TLD. Ruby has the super-handy String#rpartition method, but I'm unaware of a similar function in Perl. However, you can use an anchored match to accomplish much the same thing. For example:
$Mail = 'mail@mail.com';
$Mail =~ /(\.[[:alpha:]]+)$/;
print "$1\n";
If you need to validate the TLD against an expected value such as .com, you can compare it to a string or variable. For example:
$Mail = 'mail@mail.com';
$Tld = '.com';
$Mail =~ /(\.[[:alpha:]]+)$/;
print "$1\n" if $1 eq $Tld;
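Python, like Ruby, has a built-in rpartition, so the partitioning idea can be sketched without a regex. The function name unvalidated_tld is made up, and treating a non-alphabetic last segment as "no TLD" is an assumption that mirrors the [[:alpha:]] class above:

```python
def unvalidated_tld(address):
    """Return the last dot-separated segment with its dot, or None."""
    _head, dot, tail = address.rpartition(".")  # split at the LAST "."
    if dot and tail.isalpha():
        return dot + tail
    return None


tld = unvalidated_tld("mail@mail.com")
print(tld)
print(tld == ".com")  # compare against an expected value, as in the text
```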

Parsing of a string with the length specified within the string

Example data:
029Extract this specific string. Do not capture anything else.
In the example above, I would like to capture the first n characters immediately after the 3 digit entry which defines the value of n. I.E. the 29 characters "Extract this specific string."
I can do this within a loop, but it is slow. I would like (if it is possible) to achieve this with a single regex statement instead, using some kind of backreference. Something like:
(\d{3})(.{\1})
With perl, you can do:
my $str = '029Extract this specific string. Do not capture anything else.';
$str =~ s/^(\d+)(.*)$/substr($2,0,$1)/e;
say $str;
output:
Extract this specific string.
You cannot do it with a single regex, but you can use the position where the regex stopped matching as the starting point for substr. For example, in JavaScript you can do something like this: http://jsfiddle.net/75Tm5/
var input = "blahblah 011I want this, and 029Extract this specific string. Do not capture anything else.";
var regex = /(\d{3})/g;
var matches;
while ((matches = regex.exec(input)) != null) {
alert(input.substr(regex.lastIndex, matches[0]));
}
This returns both lines:
I want this
Extract this specific string.
Depending on what you really want, you can modify the regex to match numbers only at the beginning of a line, to stop after the first match, and so on.
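The same "regex finds the length, slicing takes the payload" approach looks like this in Python. The function name extract_prefixed is hypothetical; as in the JavaScript version, any three-digit run is treated as a length prefix, so digits inside a payload would also be picked up:

```python
import re


def extract_prefixed(text):
    """Collect every payload that follows a 3-digit length prefix."""
    results = []
    for match in re.finditer(r"\d{3}", text):
        length = int(match.group())
        start = match.end()  # payload begins right after the digits
        results.append(text[start:start + length])
    return results


data = ("blahblah 011I want this, and 029Extract this specific string. "
        "Do not capture anything else.")
print(extract_prefixed(data))
```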
Are you sure you need a regex?
From https://stackoverflow.com/tags/regex/info:
Fools Rush in Where Angels Fear to Tread
The tremendous power and expressivity of modern regular expressions
can seduce the gullible — or the foolhardy — into trying to use
regular expressions on every string-related task they come across.
This is a bad idea in general, ...
Here's a Python three-liner:
foo = "029Extract this specific string. Do not capture anything else."
substr_len = int(foo[:3])
print(foo[3:substr_len+3])
And here's a PHP three-liner:
$foo = "029Extract this specific string. Do not capture anything else.";
$substr_len = (int) substr($foo,0,3);
echo substr($foo, 3, $substr_len); // third argument is a length, not an end offset

Getting just the file name from full path

I need to get from a full file path just the name of the file. I've tried to use:
$out_fname =~ s/[\/\w+\/]+//;
but it also "eats up" parts of the file name.
example:
for a file:
/bla/bla/folder/file.part.1.file,
it returned:
.part.1.file
You can do:
use File::Basename;
my $path = "/bla/bla/folder/file.part.1.file";
my $filename = basename($path);
Besides File::Basename, there's also Path::Class, which can be handy for more complex operations, particularly when dealing with directories, or cross-platform/filesystem operations. It's probably overkill in this case, but might be worth knowing about.
use Path::Class;
my $file = file( "/bla/bla/folder/file.part.1.file" );
my $filename = $file->basename;
I agree with the other answers, but just wanted to explain the mistake in your pattern. Regex is tricky, but worth it to learn well.
The square brackets define a character class. In your case, the class matches a forward slash, a word character (from the \w), the + character, or a forward slash again (which is redundant). The + after the class then says to match one or more of those characters. Several substrings could match; the regex engine picks the one that starts earliest, so it begins at the first /, and then grabs as much as possible.
This is clearly not what you intended. For example, if one of your directory names contained a ., the match would stop there: /blah.foo/bar/x.y.z would return .foo/bar/x.y.z.
The way to think of this is that you want to match all characters up to and including the final /.
All characters then slash: /.*\//
But to be safer, add a caret at front to make sure it starts there: /^.*\//
And to allow forward and backslashes, make a class for that: /^.*[\/\\]/ (i.e. elusive's answer).
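The difference between the character-class pattern and the "everything up to the last slash" pattern can be seen side by side in a short Python sketch (Python's re behaves the same way as Perl for these two patterns; the variable names are arbitrary):

```python
import re

path = "/blah.foo/bar/x.y.z"

# Character class: consumes slashes, word characters and "+" from the first
# "/" onward, and stops at the first character outside the class (the ".").
class_result = re.sub(r"[/\w+]+", "", path, count=1)

# Greedy prefix: removes everything up to and including the last slash
# (forward or backward).
prefix_result = re.sub(r"^.*[/\\]", "", path)

print(class_result)   # .foo/bar/x.y.z
print(prefix_result)  # x.y.z
```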
A really good reference is Learning Perl. There are about 3 really good regex chapters. They are applicable to non-Perl regex users as well.
Using split on the directory separator is another alternative. This has the same caveats as using a regex (i.e. with filenames it's better to use a module where someone else has already thought about edge cases, portability, different filesystems, etc, and so you don't need matching on both back- and forward-slashes), but useful as another general technique where you have a string with a repeated separator.
my $file = "/bla/bla/folder/file.part.1.file";
my @parts = split /\//, $file;
my $filename = $parts[-1];
This is exactly what I would expect it to retain in the given substitution. You are saying replace the longest string of slashes and word characters with nothing. So it grabs all the characters up until the first character you didn't specify and deletes them.
It's doing what you are asking it to do. I join with others in saying use File::Basename for what you are trying to do.
But here is the quickest way to do the same thing:
my $fname = substr( $out_fname, rindex( $out_fname, '/' ) + 1 );
Here, it says find the last occurrence of '/' in the string and give me the text starting one after that position. I'm not anti-regex by any stretch, but it's a simple expression of what you actually want to do. I've had to do stuff like this for so long, I wrote a last_after sub:
sub last_after {
    my ( $string, $delim ) = @_;
    # Declared outside the condition so it is still in scope for substr below.
    my $ln = length( $delim );
    unless ( length( $string ) and $ln ) {
        return $string // '';
    }
    my $ri = rindex( $string, $delim );
    return $ri == -1 ? $string : substr( $string, $ri + $ln );
}
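For readers more at home in Python, a rough counterpart of last_after using str.rfind might look like the sketch below. It assumes the same conventions as the Perl sub: an empty string or empty delimiter returns the input unchanged (or '' for None), and a missing delimiter returns the whole string:

```python
def last_after(string, delim):
    """Return the text after the last occurrence of delim in string."""
    if not string or not delim:
        return string or ""
    index = string.rfind(delim)  # -1 when delim does not occur
    return string if index == -1 else string[index + len(delim):]


print(last_after("/bla/bla/folder/file.part.1.file", "/"))
```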
I also needed to pull just the last field from a bunch of path names. This worked for me:
grep -o '[^/]*$' inputfile > outputfile
What about this:
$out_fname =~ s/^.*[\/\\]//;
It should remove everything in front of your filename.