Display "/" character in Regex

I'd like to extract the name of the month with a regex.
From this input:
Image/video date: December
I want to get this:
December
regex: /[\s\S]*?\s*Image/video date:\s*((?:\S+[^\S\n]?)+?)[^\S\n]*\n[\s\S]*/
So, I want to get the word "December" after "Image/video date:".
The above regex would work perfectly for me. The only problem is that I don't know how to write the "/" character in it so that I can reach that goal.

To match a special character in a regex, there is a way to escape it. In your flavor of regex, you'd want to write \/ to match a forward slash, just as you can do \[ to match a left bracket without starting a character class.

The '/' is not special in a regex. Normally you need to delimit the expression, and '/' is the character commonly chosen to mark its beginning and end. That depends on the language you are using, but if that's the case, you can avoid it being interpreted as the regex delimiter by using the escape character \. So, for example, in vi(1) you can write:
:1,$s/\/dev\/\(.*\)/\1/
to change all paths of the form /dev/(.*) into \1 (that is, just remove the /dev/ part).
NOTE:
The above command could have been written in vi(1), changing the delimiter, as:
:1,$s:/dev/\(.*\):\1:
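For comparison, here is a minimal sketch in Python, where a regex is an ordinary string with no delimiters, so the "/" needs no escaping at all. The sample text and variable names are made up for illustration, and the pattern is a simplified version of the one in the question, not a drop-in replacement for it.

```python
import re

# Hypothetical sample input; only the "Image/video date:" line comes from the question.
text = "Title: holiday\nImage/video date: December\nCamera: unknown\n"

# No delimiters surround the pattern, so "/" is just a literal character here.
# "." does not match a newline, so the capture stops at the end of the line.
match = re.search(r"Image/video date:\s*(.+)", text)
month = match.group(1).strip() if match else None

# An escaped slash is also accepted and matches exactly the same text.
escaped = re.search(r"Image\/video date:\s*(.+)", text)
month_escaped = escaped.group(1).strip() if escaped else None

print(month)
```

In a flavor that does use / as the delimiter, only the second form (with \/) would work unless a different delimiter is chosen.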

Related

How to exclude part of a string using regex and add this part at the end of the string?

I've got a little problem with regex.
I have a few strings in one file that look like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define the correct regex and change those strings to:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
So far I have this regex:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding this in the file
But what should the replacement look like? I need the numbers from the original string at the end of the new string.
.D%%ODATE.SYSCOP.# - this is just a string, not a regex, and it didn't work
Any ideas?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth capturing group (pair of parentheses) in the pattern; thus, you may rearrange the matched value as you need.
Note that . must be escaped to match a literal dot.
Please mind the raw string literal: the r prefix before the string literal helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2'; you may print the string literals and see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is treated as a literal backslash, exactly the one that is used to build regex escape sequences. (Note that r'\n' and '\n' both match a newline: the first is a regex escape sequence matching a newline, and the second is a literal LF character.)
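The substitution and the raw-string remark can be checked with a short, self-contained sketch. The list of names is taken from the question; the variable names are only illustrative.

```python
import re

names = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

# Three capturing groups: ".SYSCOP", the two digits, and ".D%%ODATE".
pattern = re.compile(r"(\.SYSCOP)([0-9]{2})(\.D%%ODATE)")

# Rearrange the groups: date part first, then SYSCOP, then ".#" and the digits.
renamed = [pattern.sub(r"\3\1.#\2", name) for name in names]

# Without the r prefix, \3, \1 and \2 become control characters (octal escapes),
# so the two literals really are different strings.
raw_length = len(r"\3\1.#\2")
plain_length = len("\3\1.#\2")

for name in renamed:
    print(name)
```

The difference in length (8 characters versus 5) is why the non-raw replacement silently produces the wrong output instead of raising an error.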

Which characters must be escaped in a Perl regex pattern

I'm trying to find files that look like this:
access_log-20160101
access_log-20160304
...
With a Perl regex I came up with something like this:
/^access_log-\d{8}$/
But I'm not sure about the "_" and the "-". Are these metacharacters?
What is the expression for this?
I read that "_" in a regex is covered by something like \w, but how do I use that in my expression?
/^access\wlog-\d{8}$/ ?
Underscore (_) is not a metacharacter and does not need to be quoted (though it won't change anything if you quote it).
Hyphen (-) IS a metacharacter that defines the range between two symbols inside a bracketed character class. However, in this particular position, it will be interpreted verbatim and doesn't need quoting since it is not inside [] with a symbol on both sides.
You can use your regexp as is; hyphens (-) might need quoting if your format changes in future.
Your regex pattern is exactly right.
Neither underscore _ nor hyphen - needs to be escaped. Outside a square-bracketed character class, the twelve Perl regex metacharacters are
Brackets ( ) [ {
Quantifiers * + ?
Anchors ^ $
Alternation |
Wild character .
The escape itself \
and only these must be escaped.
If the pattern of your file names doesn't vary from what you have shown then the pattern that you are using
^access_log-\d{8}$
is correct, unless you need to validate the date string
Within a character class like [A-F] you must escape the hyphen if you want it to be interpreted literally. As it stands, that class is the equivalent to [ABCDEF]. If you mean just the three characters A, - or F then [A\-F] will do what you want, but it is usual to put the hyphen at the start or end of the class list to make it unambiguous. [-AF] and [AF-] are the same as [A\-F] and rather more readable
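The points above can be tried out quickly. The sketch below uses Python's re module as a stand-in for Perl (an assumption of this example; the two agree on these particular constructs), and the extra file names are invented test cases.

```python
import re

# Underscore and hyphen are used unescaped; neither is special here.
log_name = re.compile(r"^access_log-\d{8}$")

# Only the first name comes from the question; the rest are made-up negatives.
candidates = [
    "access_log-20160101",
    "access_log-2016",
    "access-log-20160304",
    "access_log-20160304.gz",
]
matching = [name for name in candidates if log_name.search(name)]

# Inside a character class the hyphen's position decides its meaning.
range_class = re.findall(r"[A-F]", "A-BF")      # range A..F, so "-" is skipped
escaped_hyphen = re.findall(r"[A\-F]", "A-BF")  # literal A, "-" or F
leading_hyphen = re.findall(r"[-AF]", "A-BF")   # same three characters

print(matching)
```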

Regular Expression with wildcards to match any character

I am new to regex and I am trying to come up with something that will match a text like below:
ABC: (z) jan 02 1999 \n
Notes:
text will always begin with "ABC:"
there may be zero, one or more spaces between ':' and (z)
variations of (z) are also possible - (zz), (zzzzzz), etc. - but always non-digit characters enclosed in "()"
there may be zero, one or more spaces between (z) and jan
jan could be jan, january, etc.
the date could be in any format and may or may not contain other text as part of it
So I would really like to know if there is a regex I can use to capture anything and everything that is found between '(z)' and '\n'.
Any help is greatly appreciated! Thank you
The following should work:
ABC: *\([a-zA-Z]+\) *(.+)
Explanation:
ABC: # match literal characters 'ABC:'
* # zero or more spaces
\([a-zA-Z]+\) # one or more letters inside of parentheses
* # zero or more spaces
(.+) # capture one or more of any character (except newlines)
To get your desired grouping based on the comments below, you can use the following:
(ABC:) *(\([a-zA-Z]+\).+)
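As a rough check of the first pattern, here is a small Python sketch. The input string is a made-up variation on the example from the question, and a leading ^ has been added on the assumption that the text always starts with "ABC:".

```python
import re

# Hypothetical input: extra spaces, a longer (zz) marker and trailing text.
text = "ABC:   (zz)  january 02 1999 extra text\nnext line"

# Same pattern as above, anchored at the start of the string.
match = re.search(r"^ABC: *\([a-zA-Z]+\) *(.+)", text)

# "." stops at the newline, so only the rest of the first line is captured.
captured = match.group(1) if match else None

print(captured)
```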
Without knowing the exact regex implementation you're using, I can only give general advice. (The syntax I will use is Perl, as that's what I know; some languages will require tweaking.)
Looking at ABC: (z) jan 02 1999 \n
The first thing to match is ABC: so our regex starts as /ABC:/
You say ABC: is always at the start of the string, so /^ABC:/ will ensure that it is at the start of the string.
You can match whitespace with the \s (note the case) directive. With all directives you can match one or more with + (or zero or more with *).
You need to escape ( and ) as they are reserved characters, so \( and \).
You can match any character except a newline with .
You can match anything at all with .* but you need to be careful that you're not too greedy and capture everything.
So in order to capture what you've asked for, I would use /^ABC:\s*\(.+?\)\s*(.+)$/
Which I read as:
Begins with ABC:
May have some spaces
has (
has some characters
has )
may have some spaces
then capture everything until the end of the line (which is $).
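The warning about greediness is easier to see with an input that contains a second pair of parentheses. The sketch below uses Python rather than Perl (the pattern syntax is the same for these constructs), and the input string is invented for the comparison.

```python
import re

# Hypothetical input with a second parenthesised part later in the line.
text = "ABC: (z) jan (02) 1999"

# Lazy ".+?" stops at the first ")" so the capture starts right after "(z)".
lazy = re.search(r"^ABC:\s*\(.+?\)\s*(.+)$", text)
lazy_capture = lazy.group(1) if lazy else None

# Greedy ".+" runs on to the last ")" and swallows "jan (02" along the way.
greedy = re.search(r"^ABC:\s*\(.+\)\s*(.+)$", text)
greedy_capture = greedy.group(1) if greedy else None

print(lazy_capture)
print(greedy_capture)
```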
I highly recommend keeping a copy of the following laying about
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
This should fulfill your requirements.
ABC:\s*(\(\D+\)\s*.*?)\\n
Here it is with some tests http://www.regexplanet.com/cookbook/ahJzfnJlZ2V4cGxhbmV0LWhyZHNyDgsSBlJlY2lwZRiEjiUM/index.html
Further reading on regular expressions: http://www.regular-expressions.info/characters.html

Regex for getting domain name from Referer

I am using the following regex to capture different parts of the Referer URL. I want to capture the protocol and the domain and use them in different scenarios.
Pattern pr=new Pattern("^\w+://|[^\/:]+|[\w\W]*$");
But Eclipse is giving me an error:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )..
I am new to regex. Can anyone help me with this?
You're supplying the regex as a Java string literal, so you need to escape the backslashes. Note also that Pattern has no public constructor; patterns are created with Pattern.compile.
e.g.:
Pattern pr = Pattern.compile("^\\w+://|[^/:]+|[\\w\\W]*$");
Your regexp is probably not complete - you need to "group" the scheme and domain sections with parentheses:
Pattern pr = Pattern.compile("^(\\w+)://([^/:]+)");
I've ignored everything after the next colon or slash - you said you only wanted the scheme and domain.
Regex uses "\" (e.g., \w, \W, \d, \D) as the starting character of its escape sequences. Java string literals also use "\" as their escape character, so to get a single backslash into the regex you write "\\" in your code: the first backslash escapes the second.
Just in case your solution is not what you expected, try testing it on "regexpal.com".
Remember, whenever you expect a single backslash ("\") in the resulting pattern, use a double backslash ("\\") in your code.
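The doubling rule is not specific to Java. As an illustration only (the question itself is about Java), the sketch below shows the same effect in Python, where a normal string literal needs "\\" and a raw string does not. The URL is a made-up example.

```python
import re

# In a normal string literal each regex backslash has to be doubled...
doubled = "^(\\w+)://([^/:]+)"
# ...while a raw string lets you write the regex as it is.
raw = r"^(\w+)://([^/:]+)"
same_pattern = doubled == raw

# Hypothetical Referer value: group 1 is the scheme, group 2 the domain.
referer = "https://example.com:8080/path?x=1"
match = re.match(raw, referer)
scheme, domain = match.groups() if match else (None, None)

print(scheme, domain)
```

Java has no raw string literal for this purpose in the versions the question targets, which is why the doubled form is the one to use there.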

Slashes and hashes in Perl and metacharacters

Thanks for the previous assistance, everyone! I have a query regarding regexps in Perl.
My issue is this:
I know that when matching you can write m// or // or ## (you must include m or s if you use the latter). What is causing me confusion is a book example on escaping characters. I believe most people escape lots of characters as a sure-fire way of making the program work without missing a metacharacter, e.g. \# when looking to match #, say in an email address.
Here's my issue (and I know what this script does):
$date = "15/12/99";
$date =~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#;   # << why are no forward slashes escaped??
print($date);
Yet a later example I have shows it rewritten as follows (which I also understand, and the slashes are escaped):
$date =~ s/(\d+)\/(\d+)\/(\d+)/$2\/$1\/$3/;   # <<<< which is escaping the forward slashes
I know that choosing slashes or hashes is a matter of programmer preference. What I don't understand is why the second example escapes the slashes, yet the first doesn't - I have tried and they work both ways. Is there no need to escape slashes when using hashes? What's even MORE confusing is that yet another, earlier book example, which also uses hashes, escapes the # symbol.
if ($address =~ m#\##) { print("That's an email address"); } or something similar
So what do you escape from what you don't using hashes or slashes? I know you have to escape metacharacters to match them but I'm confused.
When you build a regexp, you define a character as a delimiter for your regexp i.e. doing // or ##.
If you need to use that character inside your regexp, you will need to escape it so that the regexp engine does not see it as the end of the regexp.
If you build your regexp between forward slashes /, you will need to escape the forward slashes contained in your regexp, hence the escaping in your second example.
Of course, the same rule applies to any character you use as a regexp delimiter, not just forward slashes.
The forward slashes are not meta characters in themselves - only the use of them in the second example as expression separators makes them "special".
The format of a substitute expression is:
s<expression separator char><expression to look for><expression separator char><expression to replace with><expression separator char>
In the first example, using a hash as the first character after the =~ s, makes that character the expression separator, so forward slash is not special and does not require any escaping.
In the second example, the expression separator is indeed the forward slash, so it must be escaped within the expressions themselves.
The regex match operator allows you to define a custom non-whitespace character as the separator.
In your first example the '#' is used as the separator, so in this regex you don't need to escape the '/' because it has no special meaning. In the second regex the separator char isn't changed, so the default '/' is used. Now you have to escape every '/' in your pattern; otherwise the parser is confused. :)
If you are not using slashes, the recommended practice is to use curly braces and the /x modifier.
$date=~ s{ (\d+) \/ (\d+) \/ (\d+) }{$1/$2/$3}x;
Escaping non-alphanumeric characters is also standard practice, even if they are not metacharacters. See perldoc -f quotemeta.
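One way to convince yourself that the slash is special only as a delimiter is to run the same substitution in a language that has no regex delimiters at all. The sketch below uses Python for that purpose (an illustration, not Perl); it swaps day and month as in the second example from the question.

```python
import re

date = "15/12/99"

# No delimiters, so the slashes are plain literal characters.
swapped_plain = re.sub(r"(\d+)/(\d+)/(\d+)", r"\2/\1/\3", date)

# Escaping them anyway is harmless and gives the same result.
swapped_escaped = re.sub(r"(\d+)\/(\d+)\/(\d+)", r"\2/\1/\3", date)

print(swapped_plain)
```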
There is another depth to this question about escaping forward slashes with the s operator.
In my example, the capturing becomes the problem.
$image_name =~ s/((http:\/\/.+\/)\/)/$2/g;
For this to work, the typo (the extra second forward slash) had to be captured as part of the match.
Also, trying to match just the two slashes on their own did not work; the first slash has to be preceded by more than one character.
This changes "http://world.com/Photos//space_shots/out_of_this_world.jpg"
to "http://world.com/Photos/space_shots/out_of_this_world.jpg"
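A minimal Python sketch of this clean-up follows. The first substitution is a direct translation of the Perl pattern above. The second is a different technique, a negative lookbehind, added here only as a possible alternative that leaves the "//" after the scheme alone; it is not part of the original answer.

```python
import re

url = "http://world.com/Photos//space_shots/out_of_this_world.jpg"

# Translation of the Perl pattern: keep group 2, drop the extra slash.
fixed_capture = re.sub(r"((http://.+/)/)", r"\2", url)

# Alternative (not from the answer): collapse runs of slashes that do not
# directly follow a colon, so "http://" is left untouched.
fixed_lookbehind = re.sub(r"(?<!:)/{2,}", "/", url)

print(fixed_capture)
print(fixed_lookbehind)
```

Because Python has no regex delimiter, none of the slashes need a backslash here, unlike in the s/// form.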