Decode the regexp string that matches the word in string - regex

I have the following regexp
var value = "hello";
@"(?<start>.*?\W*?)(?<term>" + Regex.Escape(value) + @")(?<end>\W.*?)"
I'm trying to figure out what it means, because it doesn't work against a single word.
For example, it matches "they said hello us", but fails for just "hello".
Can you please help me decode what this regexp string means?
PS: it's a .NET regexp.

It's because of the \W in the last part. \W matches any non-word character, i.e. anything other than a letter, digit, or underscore.
In "they said hello us" there is a space after hello, but in "hello" there is nothing after it, so \W has nothing to match.
If you change it to (?<end>\W*.*?) it may work.
Actually, the regex itself does not make sense to me; it should rather be something like
@"\b" + Regex.Escape(value) + @"\b"
\b is a word boundary.
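
A quick way to see the difference is to run both variants. The sketch below uses Python's `re` module rather than .NET, so the named groups are written `(?P<name>...)` and `re.escape` stands in for `Regex.Escape`; the `matches` helper is made up for illustration.

```python
import re

value = "hello"

# Original pattern: the "end" group demands at least one non-word character.
original = re.compile(
    r"(?P<start>.*?\W*?)(?P<term>" + re.escape(value) + r")(?P<end>\W.*?)"
)
# Suggested change: \W* lets the "end" group match nothing at all.
relaxed = re.compile(
    r"(?P<start>.*?\W*?)(?P<term>" + re.escape(value) + r")(?P<end>\W*.*?)"
)

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

print(matches(original, "they said hello us"))  # True
print(matches(original, "hello"))               # False
print(matches(relaxed, "hello"))                # True
```

The second call fails only because nothing follows "hello", which is exactly the case the mandatory \W cannot handle.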

The regex may be trying to find a pattern comprising whole words, so that your hello example doesn't match, say, Othello. If so, the word boundary regex, \b, is tailor-made for the purpose:
@"\b(" + Regex.Escape(value) + @")\b"
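
As a rough illustration of the whole-word behaviour, here is the same idea in Python (an assumption for demonstration; the original is .NET). The `find_word` helper is hypothetical.

```python
import re

value = "hello"

# \b on both sides means the term must not be glued to other word characters.
whole_word = re.compile(r"\b(" + re.escape(value) + r")\b")

def find_word(text):
    """Return the captured term, or None when it only occurs inside a longer word."""
    m = whole_word.search(text)
    return m.group(1) if m else None

print(find_word("hello"))               # hello
print(find_word("they said hello us"))  # hello
print(find_word("Othello"))             # None
```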

If this is a .NET regex and the Regex.Escape() part is replaced with just 'hello', RegexBuddy says it means:
(?<start>.*?\W*?)(?<term>hello)(?<end>\W.*?)
Options: case insensitive
Match the regular expression below and capture its match into backreference with name “start” «(?<start>.*?\W*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “non-word character” «\W*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference with name “term” «(?<term>hello)»
Match the characters “hello” literally «hello»
Match the regular expression below and capture its match into backreference with name “end” «(?<end>\W.*?)»
Match a single character that is a “non-word character” «\W»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

Related

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
\s+ requires at least one whitespace character before the word, so it will not match when the whole string is just word...
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. What exactly counts depends on your regex flavor, but it usually excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
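
To compare the two suggested patterns side by side, here is a small sketch in Python (the question does not name a flavor, so this is an assumption). The `last_word` helper and the variable names are made up for illustration.

```python
import re

negated_class = re.compile(r"([^\s.]+)\.{3}$")  # word without dots, then exactly "..." at the end
non_space = re.compile(r"(\S+)\.\.\.$")         # any non-space run, then "..." at the end

def last_word(pattern, text):
    """Return the captured word before the trailing dots, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

ends_in_ellipsis = "Do not match this. This sentence ends in the last word..."
ends_in_period = "Do not match this. This sentence ends in the last word."

print(last_word(negated_class, ends_in_ellipsis))  # word
print(last_word(non_space, ends_in_ellipsis))      # word
print(last_word(negated_class, ends_in_period))    # None
print(last_word(non_space, ends_in_period))        # None

# The patterns differ when there are more than three dots:
print(last_word(negated_class, "word...."))  # None
print(last_word(non_space, "word...."))      # word.
```

Because \S also matches a dot, the second pattern accepts four or more trailing dots and keeps the extra ones in the capture; the negated class does not.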

Simple Regex: match everything until the last dot

Just want to match every character up to but not including the last period
dog.jpg -> dog
abc123.jpg.jpg -> abc123.jpg
I have tried
(.+?)\.[^\.]+$
Use lookahead to assert the last dot character:
.*(?=\.)
This will do the trick
(.*)\.
The first captured group contains the name. You can access it as $1 or \1 as per your language
Quantifiers in regular expressions are greedy by default. This means that when a regex pattern is capable of matching more characters, it will match more characters.
This is a good thing, in your case. All you need to do is match characters and then a dot:
.*\.
That is,
. # Match "any" character
* # Do the previous thing (.) zero OR MORE times (any number of times)
\ # Escape the next character - treat it as a plain old character
. # Escaped, just means "a dot".
So: being greedy by default, match any character AS MANY TIMES AS YOU CAN (because greedy) and then a literal dot.
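
The answers above can be tried out with a short Python sketch (the language is an assumption, since the question names none). The function names are hypothetical. Note that the plain `.*\.` match includes the final dot, so a capture group or a lookahead is needed to leave it out.

```python
import re

def stem_with_group(name):
    """Greedy (.*) runs to the last dot; group 1 excludes that dot."""
    m = re.match(r"(.*)\.", name)
    return m.group(1) if m else None

def stem_with_lookahead(name):
    """Greedy .* backs off until a dot follows; the dot itself is not consumed."""
    m = re.match(r".*(?=\.)", name)
    return m.group(0) if m else None

print(stem_with_group("dog.jpg"))             # dog
print(stem_with_group("abc123.jpg.jpg"))      # abc123.jpg
print(stem_with_lookahead("abc123.jpg.jpg"))  # abc123.jpg

# Without a group or lookahead, the whole match still contains the last dot.
print(re.match(r".*\.", "dog.jpg").group(0))  # dog.
```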

regex in Perl to replace content containing double equal signs

I need a regex in Perl to turn this:
(== doc_url html/arbitrary_file_name.html ==)
into this:
(/doc_assets/legacy/html/arbitrary_file_name.html)
I've tried all kinds of things. My current attempt looks like this:
$content =~ s!\=\= doc_url ([\w\W]+?)\=\=!/doc_assets/legacy/$1!gis;
(In this particular attempt, I'm just letting the enclosing parentheses remain, since that doesn't change from the input to the output.)
Anyway, nothing is working for me. I assume it's the == throwing things off. Any help will be greatly appreciated.
I guess you need something like:
s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg
i.e.:
#!/usr/bin/perl
$subject = "(== doc_url html/arbitrary_file_name.html ==)";
$subject =~ s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg;
print $subject;
#(/doc_assets/legacy/html/arbitrary_file_name.html)
Regex Explanation:
.*?doc_url (.*?/.*?) .*
Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ don’t match at line breaks; Numbered capture
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character string “doc_url ” literally (case sensitive) «doc_url »
Match the regex below and capture its match into backreference number 1 «(.*?/.*?)»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “/” literally «/»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “ ” literally « »
Match any single character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
(/doc_assets/legacy/$1)
Insert the character string “(/doc_assets/legacy/” literally «(/doc_assets/legacy/»
Insert the text that was last matched by capturing group number 1 «$1»
Insert the character “)” literally «)»
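
One caveat with the pattern above: with the s flag, its leading `.*?` and trailing `.*` also consume whatever surrounds the tag, which matters if `$content` holds more than one tag. A tighter variant is sketched below in Python rather than Perl (purely for illustration). It assumes the path contains no whitespace and that the tag always has the exact `(== doc_url PATH ==)` shape; the `rewrite_doc_urls` name is hypothetical.

```python
import re

# Match only the tag itself: literal "(== doc_url ", a whitespace-free path, literal " ==)".
DOC_URL_TAG = re.compile(r"\(== doc_url (\S+) ==\)")

def rewrite_doc_urls(content):
    """Replace every doc_url tag with its legacy asset path, leaving other text alone."""
    return DOC_URL_TAG.sub(r"(/doc_assets/legacy/\1)", content)

print(rewrite_doc_urls("(== doc_url html/arbitrary_file_name.html ==)"))
# (/doc_assets/legacy/html/arbitrary_file_name.html)

print(rewrite_doc_urls("See (== doc_url html/a.html ==) and (== doc_url html/b.html ==)."))
# See (/doc_assets/legacy/html/a.html) and (/doc_assets/legacy/html/b.html).
```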

Why does this regular expression match as few characters as possible?

I can match an 'a' followed by at least 2 other characters before another 'a' with the following regular expression.
a.{2,}?a
Interestingly, including the question mark makes the regex match the instance with the fewest number of middle characters possible, so for instance, given the following string,
abbabbbba
the regex will match the leftmost abba instead of the whole string. Why does including the question mark cause the regex to match the instance with the fewest number of middle characters?
The question mark after a quantifier makes the quantifier lazy. It is a basic feature of regex, you need to learn more about it.
a link: regular-expressions.info
(?:or|and) the one in hwnd comment.
? implies a lazy match.
Here are the details of your regex:
/a.{2,}?a/
a matches the character a literally (case sensitive)
. matches any character (except newline)
{2,} Quantifier: Between 2 and unlimited times
? as few times as possible, expanding as needed [lazy]
a matches the character a literally (case sensitive)
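
The effect is easy to check by running the lazy and greedy forms against the string from the question. This is a minimal sketch in Python; the question does not say which flavor is in use, so treat the language as an assumption.

```python
import re

text = "abbabbbba"

# Lazy: {2,}? takes two characters first and only grows if the final "a" cannot match.
lazy = re.search(r"a.{2,}?a", text).group(0)

# Greedy: {2,} takes as much as it can, then gives characters back until "a" matches.
greedy = re.search(r"a.{2,}a", text).group(0)

print(lazy)    # abba
print(greedy)  # abbabbbba
```

Both searches start at the leftmost "a"; the quantifier only decides how far the match extends from there.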

regex extract word in path

I need a regex to get a word from a path.
example:
(update)
/var/log/rsyslog/apache/test1/2014/05/file1.log
/var/log/rsyslog/apache/test2/2014/05/file2.log
/var/log/rsyslog/apache/test3/2014/05/file3.log
the output should be
test1
test2
test3
thank you for your help
I'm not sure which language you're using. In general, this regex captures everything between the first slash and .log:
/\/(.*?)\.log/
Regex Explanation
/(.*?)\.log
Match the character “/” literally «/»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “.” literally «\.»
Match the characters “log” literally «log»
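
Running the pattern above on the sample paths shows that group 1 holds the whole path up to `.log`, not just `test1`. A sketch of one way to get the requested word follows, in Python (the asker's language is unknown). It assumes the word is always the directory directly after `/apache/`, which the question does not confirm; the `after_apache` name is made up.

```python
import re

paths = [
    "/var/log/rsyslog/apache/test1/2014/05/file1.log",
    "/var/log/rsyslog/apache/test2/2014/05/file2.log",
    "/var/log/rsyslog/apache/test3/2014/05/file3.log",
]

# Pattern from the answer: lazy .*? starts at the first "/" and runs until ".log".
answer_pattern = re.compile(r"/(.*?)\.log")
print(answer_pattern.search(paths[0]).group(1))
# var/log/rsyslog/apache/test1/2014/05/file1

# Assumed alternative: capture the single path segment that follows "/apache/".
after_apache = re.compile(r"/apache/([^/]+)/")
words = [after_apache.search(p).group(1) for p in paths]
print(words)  # ['test1', 'test2', 'test3']
```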