Slashes and hashes in Perl and metacharacters - regex

Thanks for the previous assistance, everyone! I have a query regarding regular expressions in Perl.
My issue is this.
I know that when matching you can write m// or // or ## (you must include the m or s if you use the latter). What is confusing me is a book example on escaping characters. I believe most people escape lots of characters as a sure-fire way of making the program work without missing a metacharacter, e.g. \# when looking to match a # in, say, an email address.
Here's my issue and I know what this script does:
$date = "15/12/99";
$date =~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#;  # << why are no forward slashes escaped??
print($date);
Yet a later example I have shows it rewritten as follows (which I also understand, and here they're escaped):
$date =~ s/(\d+)\/(\d+)\/(\d+)/$2\/$1\/$3/;  # <<<< which is escaping the forward slashes.
I know the choice of slashes or hashes is programmer preference. What I don't understand is why the second example escapes the slashes, yet the first doesn't. I have tried and they work both ways. No escaping slashes with hashes? What's even MORE confusing is that yet another book example, earlier than this one and using hashes again, escapes the # symbol.
if ($address =~ m#\##) { print("That's an email address"); } or something similar
So what do you escape, and what don't you, when using hashes or slashes? I know you have to escape metacharacters to match them, but I'm confused.

When you build a regexp, you choose a character as its delimiter, e.g. // or ##.
If you need to use that character inside your regexp, you will need to escape it so that the regexp engine does not see it as the end of the regexp.
If you build your regexp between forward slashes /, you will need to escape the forward slashes contained in your regexp, hence the escaping in your second example.
Of course, the same rule applies to any character you use as a regexp delimiter, not just forward slashes.
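As a rough illustration of the rule above, here is a minimal sketch using the date string from the question. The variable names are made up for the example, and both substitutions swap day and month so that the effect is visible.

```perl
use strict;
use warnings;

# Hypothetical variable names; the date string is the one from the question.
my $with_hashes  = "15/12/99";
my $with_slashes = "15/12/99";

# '#' is the delimiter here, so '/' is an ordinary character.
$with_hashes =~ s#(\d+)/(\d+)/(\d+)#$2/$1/$3#;

# '/' is the delimiter here, so every literal '/' must be escaped.
$with_slashes =~ s/(\d+)\/(\d+)\/(\d+)/$2\/$1\/$3/;

# '#' is the delimiter here, so a literal '#' must be escaped.
my $has_hash = "a#b" =~ m#\## ? 1 : 0;

print "$with_hashes $with_slashes $has_hash\n";
```

Both substitutions should leave the same string behind; only the character that needs a backslash changes with the delimiter.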

The forward slashes are not meta characters in themselves - only the use of them in the second example as expression separators makes them "special".
The format of a substitute expression is:
s<expression separator char><expression to look for><expression separator char><expression to replace with><expression separator char>
In the first example, using a hash as the first character after the =~ s, makes that character the expression separator, so forward slash is not special and does not require any escaping.
In the second example, the expression separator is indeed the forward slash, so it must be escaped within the expressions themselves.

The regex match operator lets you define a custom non-whitespace character as the separator.
In your first example '#' is used as the separator, so in this regex you don't need to escape the '/' because it has no special meaning. In the second regex the separator isn't changed, so the default '/' is used. Now you have to escape every '/' in your pattern; otherwise the parser is confused. :)

If you are not using slashes, the recommended practice is to use curly braces and the /x modifier.
$date=~ s{ (\d+) \/ (\d+) \/ (\d+) }{$1/$2/$3}x;
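Escaping non-alphanumeric characters is also common practice even when they are not metacharacters; in Perl a backslash before a non-alphanumeric character always yields that literal character, so the extra escape is harmless. See perldoc -f quotemeta.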
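A small sketch of both points, assuming the same date string as in the question. The slashes are left unescaped here to show that brace delimiters do not require it, and the input to quotemeta is an arbitrary made-up string.

```perl
use strict;
use warnings;

my $date = "15/12/99";

# Brace delimiters: '/' needs no escaping. /x allows spaces inside the pattern.
$date =~ s{ (\d+) / (\d+) / (\d+) }{$2/$1/$3}x;

# quotemeta backslash-escapes every ASCII character that is not a word character.
my $quoted = quotemeta("a.b/c");

print "$date\n";
print "$quoted\n";
```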

There is another side to this question about escaping forward slashes with the s operator.
In my example, capturing becomes the issue.
$image_name =~ s/((http:\/\/.+\/)\/)/$2/g;
For this to work, the typo (the extra second forward slash) had to be captured together with the text before it.
Also, trying to match just the two slashes on their own did not work. The first slash has to be preceded by more than one character.
Changing "http://world.com/Photos//space_shots/out_of_this_world.jpg"
To: "http://world.com/Photos/space_shots/out_of_this_world.jpg"
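One possible alternative, not the approach used above, is to avoid the capture and instead skip the slashes that follow the colon with a negative lookbehind. This is only a sketch and assumes the only legitimate double slash in the string is the one after the scheme.

```perl
use strict;
use warnings;

my $image_name = "http://world.com/Photos//space_shots/out_of_this_world.jpg";

# Collapse any run of two or more slashes, unless the run starts right after ':'.
# Brace delimiters keep the slashes free of backslashes.
$image_name =~ s{(?<!:)//+}{/}g;

print "$image_name\n";
```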

Related

Perl regex to replace a underscore or forward slash with a dash

While there are several regex examples here showing the many variations, I simply want to use a regex in Perl on two different strings, one starting with an underscore (_) and the other with a forward slash (/), and replace that leading character with a hyphen (-), keeping the rest of the string. I am escaping with a backslash, but the output is incorrect.
Input: Output:
_APPLE -APPLE
/APPLE -APPLE
Here is my code:
$string1 =~ s/\_\/APPLE/-APPLE
$string2 =~ s/\/\/APPLE/-APPLE
The code has an extra (escaped) / and would match strings with _/ (and // in the second case). That is not in your data, which has either _ or /, not both.
Also, there is no need to escape the _, and neither the / if it is not the delimiter.
To match either of a few characters the cleanest and most efficient is the character class
$string =~ s{[_/](\w+)}{-$1};
The alternation also works here
$string =~ s{(?:_|/)(\w+)}{-$1};
but it is more suitable when possibilities to match have more characters (word|another).
There are quite a few assumptions here, given how little is specified in the question. For one, \w also matches digits and _ along with letters. If you clarify the requirements I'll edit as needed.
I assume that the missing closing delimiter, needed for the code to compile, is a typo in posting.
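For illustration, the character-class version can be tried against the two inputs from the question. The loop and the @results array are just scaffolding for the example.

```perl
use strict;
use warnings;

my @results;
for my $input ("_APPLE", "/APPLE") {
    # Copy the input, then replace a leading '_' or '/' with '-'.
    (my $output = $input) =~ s{[_/](\w+)}{-$1};
    push @results, $output;
    print "$input -> $output\n";
}
```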

How can make gvim identify "/" in the search pattern [duplicate]

For instance, if I wanted to do a find and replace on strings containing backward or forward slashes, how would this be accomplished in vim?
Examples
Find & Replace is: :%s/foo/bar/g
What if I wanted to find all occurrences of <dog/> and replace them with <cat\>?
Same way you escape characters most anywhere else in linuxy programs, with a backslash:
:%s/<dog\/>/<cat\\>
But note that you can select a different delimiter instead:
:%s#<dog/>#<cat\\>#
This saves you typing all those time-consuming, confusing backslashes in patterns with a ton of slashes.
From the documentation:
Instead of the / which surrounds the pattern and replacement string, you
can use any other single-byte character, but not an alphanumeric character,
\, " or |. This is useful if you want to include a / in the search
pattern or replacement string.
%s:<dog/>:<cat>
You can replace the / delimiters if they become annoying for certain patterns.
Quote them with a backslash. Also, it often helps to use another delimiter besides slash.
:%s#<dog/>#<cat\\>#
or if you have to use slash as the substitute command delimiter
:%s/<dog\/>/<cat\\>/
I was looking for something similar, to search for register values containing the / character (to record a macro). The solution was to search using the ? token instead of the /.
The syntax is:
:%s/<dog\/>/<cat\\>/g
backslash slash backslash star
/(<- the prompt)\/\*
so after you type it looks like
/\/\*

Backslashes in regexp pattern for PHP

I'm trying to perform a regex operation in my PHP code (preg_replace). Now I'm working with:
|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i
That matches URLs like http://google.com, etc., but now I'm wondering: what if the URL I want to match looks like this one?
http:\/\/asd.domain.com\/path\/of\/url\/something.else
I've tried with 2x backslashes and 4x backslashes and it doesn't seem to work.
Any advice?
Thanks in advance.
Well, if you simply want to match the string, add some backslashes:
^http(s)?:\\?/\\?/[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(\\?/.*)?$
I used a ? quantifier so that it still matches the URLs it could match before that. Since \ is an escaping character, you need two of those, the first to escape the escaping properties of the second \.
See demo (I only escaped forward slashes there because of how the regex tester works -- the delimiters are slashes).
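The thread is about PHP, but the optional-backslash idea can be sketched in Perl, the language used for the other examples here. This is an adaptation rather than the answerer's exact pattern: the dot before each domain part is escaped and the groups are non-capturing, both of which are assumptions about the intent. PHP string-literal quoting rules differ and are not covered by this sketch.

```perl
use strict;
use warnings;

# Inside the regex, \\ matches one literal backslash; \\? makes it optional.
# Brace delimiters mean the forward slashes need no escaping.
my $url_re = qr{^https?:\\?/\\?/[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::[0-9]+)?(?:\\?/.*)?$}i;

# Single-quoted strings keep the backslashes exactly as typed.
my $plain   = 'http://google.com';
my $escaped = 'http:\/\/asd.domain.com\/path\/of\/url\/something.else';

my $plain_ok   = $plain   =~ $url_re ? 1 : 0;
my $escaped_ok = $escaped =~ $url_re ? 1 : 0;

print "plain: $plain_ok, escaped: $escaped_ok\n";
```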

Need to test for a "\\" (backslash) in this Reg Ex

Currently I use this reg ex:
"\bI([ ]{1,2})([a-zA-Z]|\d){2,13}\b"
It was just brought to my attention that the text that I use this against could contain a "\" (backslash). How do I add this to the expression?
Add |\\ inside the group, after the \d for instance.
This expression could be simplified if you're also allowing the underscore character in the second capture register, and you are willing to use metacharacters. That changes this:
([a-zA-Z]|\d){2,13}
into this ...
([\w]{2,13})
and you can also add a test for the backslash character with this ...
([\w\x5c]{2,13})
which makes the regex just a tad easier to eyeball, depending on your personal preference.
"\bI([\x20]{1,2})([\w\x5c]{2,13})\b"
Both @slavy13 and @dreftymac give you the basic solution with pointers, but...
You can use \d inside a character class to mean a digit.
You don't need to put blank into a character class to match it (except, perhaps, for clarity, though that is debatable).
You can use [:alpha:] inside a character class to mean an alpha character, [:digit:] to mean a digit, and [:alnum:] to mean an alphanumeric (specifically not including underscore, unlike \w). Note that these character classes might mean more characters than you expect; think of accented characters and non-arabic digits, especially in Unicode.
If you want to capture the whole of the information after the space, you need the repetition inside the capturing parentheses.
Contrast the behaviour of these two one-liners:
perl -n -e 'print "$2\n" if m/\bI( {1,2})([a-zA-Z\d\\]){2,13}\b/'
perl -n -e 'print "$2\n" if m/\bI( {1,2})([a-zA-Z\d\\]{2,13})\b/'
Given the input line "I a123", the first prints "3" and the second prints "a123". Obviously, if all you wanted was the last character of the second part of the string, then the original expression is fine. However, that is unlikely to be the requirement. (Obviously, if you're only interested in the whole lot, then using '$&' gives you the matched text, but it has negative efficiency implications.)
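The same contrast can be written as a short script, using the input line from the paragraph above. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = "I a123";

# Quantifier outside the parentheses: $2 holds only the last repetition.
my $outside = $line =~ m/\bI( {1,2})([a-zA-Z\d\\]){2,13}\b/ ? $2 : undef;

# Quantifier inside the parentheses: $2 holds the whole run.
my $inside = $line =~ m/\bI( {1,2})([a-zA-Z\d\\]{2,13})\b/ ? $2 : undef;

print "outside: $outside\n";
print "inside:  $inside\n";
```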
I'd probably use this regex as it seems clearest to me:
m/\bI( {1,2})([[:alnum:]\\]{2,13})\b/
Time for the obligatory plug: read Jeff Friedl's "Mastering Regular Expressions".
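As I pointed out in my comment to slavy's post, a \\ followed directly by \b is a problem, because a backslash is not a word character: \b will not match after a trailing backslash unless a word character follows it. So my suggestion is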
/\bI([ ]{1,2})([\p{IsAlnum}\\]{2,13})(?:[^\w\\]|$)/
I assumed that you wanted to capture the whole 2-13 characters, not just the first one that applies, so I adjusted my RE.
You can make the last group a lookahead if the engine supports it and you don't want to consume the trailing character. That would look like:
/\bI([ ]{1,2})([\p{IsAlnum}\\]{2,13})(?=[^\w\\]|$)/
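To see why the trailing \b matters, here is a sketch with a made-up input whose token ends in a backslash. It uses [[:alnum:]] rather than \p{IsAlnum} purely to keep the example simple.

```perl
use strict;
use warnings;

# Hypothetical input: the token after "I " ends with a backslash.
my $text = 'I a1\ more';

# Trailing \b: there is no word boundary between '\' and ' ',
# so the engine backtracks and drops the backslash from the capture.
my $with_boundary = $text =~ /\bI( {1,2})([[:alnum:]\\]{2,13})\b/ ? $2 : undef;

# Lookahead: requires a following character that is neither a word
# character nor a backslash (or the end of the string), without consuming it.
my $with_lookahead = $text =~ /\bI( {1,2})([[:alnum:]\\]{2,13})(?=[^\w\\]|$)/ ? $2 : undef;

print "boundary:  $with_boundary\n";
print "lookahead: $with_lookahead\n";
```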