regex in Perl to replace content containing double equal signs

I need a regex in Perl to turn this:
(== doc_url html/arbitrary_file_name.html ==)
into this:
(/doc_assets/legacy/html/arbitrary_file_name.html)
I've tried all kinds of things. My current attempt looks like this:
$content =~ s!\=\= doc_url ([\w\W]+?)\=\=!/doc_assets/legacy/$1!gis;
(In this particular attempt, I'm just letting the enclosing parentheses remain, since that doesn't change from the input to the output.)
Anyway, nothing is working for me. I assume it's the == throwing things off. Any help will be greatly appreciated.

I guess you need something like:
s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg
i.e.:
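#!/usr/bin/perl
use strict;
use warnings;

my $subject = "(== doc_url html/arbitrary_file_name.html ==)";
# Capture the path after "doc_url " and wrap it in the new prefix.
$subject =~ s!.*?doc_url (.*?/.*?) .*!(/doc_assets/legacy/$1)!sg;
print $subject;
# prints: (/doc_assets/legacy/html/arbitrary_file_name.html)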
Regex Explanation:
.*?doc_url (.*?/.*?) .*
Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ don’t match at line breaks; Numbered capture
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character string “doc_url ” literally (case sensitive) «doc_url »
Match the regex below and capture its match into backreference number 1 «(.*?/.*?)»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “/” literally «/»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “ ” literally « »
Match any single character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
(/doc_assets/legacy/$1)
Insert the character string “(/doc_assets/legacy/” literally «(/doc_assets/legacy/»
Insert the text that was last matched by capturing group number 1 «$1»
Insert the character “)” literally «)»
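One thing to watch with the pattern above: because it starts with .*? and ends with .* under the s flag, it consumes the whole string, so any text around the marker is replaced as well. If $content holds a full document, a pattern anchored on the literal (== and ==) delimiters keeps the surrounding text. A minimal sketch of that idea, written in Python rather than Perl and assuming the path itself never contains whitespace (the names DOC_URL and rewrite_doc_urls are made up for illustration):

```python
import re

# Assumed marker shape: "(== doc_url <path> ==)", where <path> has no whitespace.
DOC_URL = re.compile(r"\(== doc_url (\S+) ==\)")

def rewrite_doc_urls(content):
    # Replace every marker with the legacy asset path; other text is left alone.
    return DOC_URL.sub(r"(/doc_assets/legacy/\1)", content)

sample = "See (== doc_url html/arbitrary_file_name.html ==) for details."
print(rewrite_doc_urls(sample))
# See (/doc_assets/legacy/html/arbitrary_file_name.html) for details.
```

The same pattern should carry over to Perl as s!\(== doc_url (\S+) ==\)!(/doc_assets/legacy/$1)!g, since = needs no escaping in either engine.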

Related

How would I detect superscript for one word if there's no parentheses, but if there are parentheses, for all the contents of them?

I want to detect the two following circumstances, preferably with one regex:
This is a sentence ^that I wrote today.
And:
This is a sentence ^(that I wrote) today.
So basically, if there are parentheses after the caret, I want to match whatever is inside them. Otherwise, I want to match just the next word.
I'm new to regex. Is this possible without making it too complicated?
\^(\w+|\([\w ]+\))
Options: case insensitive; ^ and $ match at line breaks
Match the character “^” literally «\^»
Match the regular expression below and capture its match into backreference number 1 «(\w+|\([\w ]+\))»
Match either the regular expression below (attempting the next alternative only if this one fails) «\w+»
Match a single character that is a “word character” (letters, digits, etc.) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «\([\w ]+\)»
Match the character “(” literally «\(»
Match a single character present in the list below «[\w ]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A word character (letters, digits, etc.) «\w»
The character “ ” « »
Match the character “)” literally «\)»
Created with RegexBuddy
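Note that with the pattern as written, group 1 still includes the parentheses in the second case. If only the inner text is wanted, one option is to give each alternative its own capturing group. A small Python sketch of that variation (the function name superscripts is just for illustration, and the character class is kept as word characters and spaces, as in the answer):

```python
import re

# Group 1: a bare word after the caret; group 2: the text inside parentheses.
SUPERSCRIPT = re.compile(r"\^(?:(\w+)|\(([\w ]+)\))")

def superscripts(text):
    # Exactly one of the two groups participates in each match.
    return [m.group(1) or m.group(2) for m in SUPERSCRIPT.finditer(text)]

print(superscripts("This is a sentence ^that I wrote today."))    # ['that']
print(superscripts("This is a sentence ^(that I wrote) today."))  # ['that I wrote']
```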

Javascript transformation

Is there any simple way to transform:
"<A[hello|home]>"
to:
"hello|home"
Thanks!
Apart from the clever advice in the comments to simply remove certain characters: if you cannot remove those characters because they also appear elsewhere in the text, and you do want to match that exact format, here is a way to do it with regex:
Search: <\w+\[([^|]*\|[^\]]*)\]>
Replace: \1 or $1 depending on editor or regex engine.
Explanation
<\w+\[([^|]*\|[^\]]*)\]>
Match the character “<” literally <
Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “[” literally \[
Match the regex below and capture its match into backreference number 1 ([^|]*\|[^\]]*)
Match any character that is NOT a “|” [^|]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “|” literally \|
Match any character that is NOT a “]” [^\]]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “]” literally \]
Match the character “>” literally >
\1
Insert the text that was last matched by capturing group number 1 \1
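To see the search and replace working end to end, here is a short sketch. It is in Python rather than JavaScript, with made-up names TAG and unwrap; in JavaScript the same pattern should work with String.prototype.replace and "$1" as the replacement.

```python
import re

# "<", a word, "[", then capture "left|right", then "]>".
TAG = re.compile(r"<\w+\[([^|]*\|[^\]]*)\]>")

def unwrap(text):
    # Keep only what was captured between the square brackets.
    return TAG.sub(r"\1", text)

print(unwrap("<A[hello|home]>"))  # hello|home
```

Text outside the matched format, including stray | or [ characters, is left as it is.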

regex extract word in path

I need a regex to get a word from a path.
example:
(update)
/var/log/rsyslog/apache/test1/2014/05/file1.log
/var/log/rsyslog/apache/test2/2014/05/file2.log
/var/log/rsyslog/apache/test3/2014/05/file3.log
the output should be
test1
test2
test3
thank you for your help
I'm not sure which language you're using, but in general this regex captures everything between the first slash and .log:
/\/(.*?)\.log/
Regex Explanation
/(.*?)\.log
Match the character “/” literally «/»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “.” literally «\.»
Match the characters “log” literally «log»
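Against the updated sample paths, the pattern above captures var/log/rsyslog/apache/test1/2014/05/file1 rather than test1, because the lazy group starts at the first slash and runs up to .log. A sketch that returns the expected output, under the assumption that the wanted word is always the directory directly below /apache/ (that rule is inferred from the three samples, and the names here are illustrative):

```python
import re

# Assumption: the wanted word is the directory directly below ".../apache/".
AFTER_APACHE = re.compile(r"/apache/([^/]+)/")

def site_name(path):
    # Returns the captured directory name, or None if the path has no such part.
    m = AFTER_APACHE.search(path)
    return m.group(1) if m else None

paths = [
    "/var/log/rsyslog/apache/test1/2014/05/file1.log",
    "/var/log/rsyslog/apache/test2/2014/05/file2.log",
    "/var/log/rsyslog/apache/test3/2014/05/file3.log",
]
for p in paths:
    print(site_name(p))
```

If the position is fixed rather than the parent directory name, splitting the path on "/" and taking the fifth component would be an alternative.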

Regex for 2 items but with one exclusion

I am building a RegEx that needs to find lines that have either:
DateTime.Now
or
Date.Now
But cannot have the literal "SystemDateTime" on the same line.
I started with (DateTime\.Now|Date\.Now), but now I am stuck on where to put the "SystemDateTime" exclusion.
Use this, assuming you are not using the /s modifier (DOTALL), which makes the dot (.) match newline characters as well:
(?!.*SystemDateTime)(DateTime\.Now|Date\.Now)
(?!.*SystemDateTime) asserts that SystemDateTime does not occur anywhere between the current position and the end of the line.
You could use negative lookahead like this:
(?!.*SystemDateTime)\bDate(?:Time)?\.Now\b
/(?!.*SystemDateTime)Date(?:Time)?\.Now/
EXPLANATION:
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*SystemDateTime)»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the characters “SystemDateTime” literally «SystemDateTime»
Match the characters “Date” literally «Date»
Match the regular expression below «(?:Time)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the characters “Time” literally «Time»
Match the character “.” literally «\.»
Match the characters “Now” literally «Now»
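One caveat: a lookahead only inspects text to the right of the position where the match is attempted. With the unanchored patterns above, a line that has SystemDateTime before DateTime.Now can still match. Anchoring the lookahead at the start of the line makes it scan the whole line. A sketch in Python (the question does not name a language, so this is only an illustration, and the sample lines are invented):

```python
import re

# The lookahead runs at the line start, so it sees the entire line,
# including any text before the Date.Now / DateTime.Now call.
PATTERN = re.compile(r"^(?!.*SystemDateTime).*\bDate(?:Time)?\.Now\b")

def flagged(lines):
    # Keep lines that use Date.Now or DateTime.Now and never mention SystemDateTime.
    return [ln for ln in lines if PATTERN.search(ln)]

lines = [
    "var a = DateTime.Now;",
    "var b = Date.Now;",
    "var c = SystemDateTime.Now;",
    "var d = SystemDateTime.Get(); var e = DateTime.Now;",
    "var f = DateTime.Now; // not SystemDateTime",
]
print(flagged(lines))
# ['var a = DateTime.Now;', 'var b = Date.Now;']
```

When scanning a multi-line buffer instead of one line at a time, the multiline flag would be needed so that ^ matches at every line start.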

Decode the regexp string that matches the word in string

I have the following regexp
var value = "hello";
@"(?<start>.*?\W*?)(?<term>" + Regex.Escape(value) + @")(?<end>\W.*?)"
I'm trying to figure out what it means, because it doesn't work against a single word.
For example, it matches "they said hello us", but fails for just "hello".
Can you please help me decode what this regexp string means?
PS: it's a .NET regexp.
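It's because of the \W in the last part. \W matches a single non-word character, roughly anything other than a letter, digit, or underscore.
In "they said hello us" there is a space after hello, but in "hello" nothing follows the word, so \W has nothing to match.
If you change it to (?<end>\W*.*?) it may work.
Actually, the regex itself does not make sense to me; it should rather be something like
@"\b" + Regex.Escape(value) + @"\b"
where \b is a word boundary. (The verbatim @ prefix matters: in a regular C# string, "\b" is a backspace character.)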
The regex may be trying to find a pattern comprising whole words, so that your hello example doesn't match, say, Othello. If so, the word boundary regex, \b, is tailor-made for the purpose:
@"\b(" + Regex.Escape(value) + @")\b"
If this is a .NET regex and the Regex.Escape() part is replaced with just hello, RegexBuddy says it means:
(?<start>.*?\W*?)(?<term>hello)(?<end>\W.*?)
Options: case insensitive
Match the regular expression below and capture its match into backreference with name “start” «(?<start>.*?\W*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “non-word character” «\W*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference with name “term” «(?<term>hello)»
Match the characters “hello” literally «hello»
Match the regular expression below and capture its match into backreference with name “end” «(?<end>\W.*?)»
Match a single character that is a “non-word character” «\W»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
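To make the difference concrete, here is a rough Python transliteration of both patterns. Python spells named groups as (?P<name>...) and uses re.escape where .NET uses Regex.Escape, so this illustrates the matching behaviour but is not the original .NET code:

```python
import re

value = "hello"

# Original pattern: the "end" group insists on one non-word character after the term.
original = re.compile(
    r"(?P<start>.*?\W*?)(?P<term>" + re.escape(value) + r")(?P<end>\W.*?)"
)
# Word-boundary alternative suggested in the answers.
bounded = re.compile(r"\b(" + re.escape(value) + r")\b")

for text in ["they said hello us", "hello", "Othello"]:
    print(text, bool(original.search(text)), bool(bounded.search(text)))
# they said hello us True True
# hello False True
# Othello False False
```

The original pattern fails on the bare word because nothing follows it, while \b also succeeds at the very start or end of the string.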