The code below is trying to match a format like
[a=>]b[->c][d:e]
where a=>, ->c, and d:e are optional.
($reg =~ /^
(?:([\w\/]+)=>)? # (optional)
(\w+) # (required)
(?:->(\w+))? # (optional)
(\[\d+\]|\[\d+:\d+\])? # (optional)
.$/x)
or croak("-E Invalid register format");
When I pass sample=>STATUS as the value of $reg, the last S of STATUS gets truncated. Why?
The . just before the $ end-of-line anchor must match exactly one character, which in your case is the last letter S.
Your regex is almost right, but that . has to be satisfied, so the matcher backtracks the required (\w+) by one character to give the . the character it demands. Remove the . and the full name is captured.
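A minimal sketch of that backtracking, ported to Python's re module as an assumption (the question itself is Perl). The names BUGGY and FIXED are made up for illustration; the only difference between the two patterns is the stray . before $.

```python
import re

# Same pattern as in the question, including the stray "." before "$".
BUGGY = re.compile(r"""^
    (?:([\w/]+)=>)?            # optional prefix such as "sample=>"
    (\w+)                      # required register name
    (?:->(\w+))?               # optional "->field"
    (\[\d+\]|\[\d+:\d+\])?     # optional "[n]" or "[n:m]"
    .$""", re.VERBOSE)

# Identical, except the "." has been removed.
FIXED = re.compile(r"""^
    (?:([\w/]+)=>)?
    (\w+)
    (?:->(\w+))?
    (\[\d+\]|\[\d+:\d+\])?
    $""", re.VERBOSE)

buggy_groups = BUGGY.match("sample=>STATUS").groups()
fixed_groups = FIXED.match("sample=>STATUS").groups()

print(buggy_groups)  # ('sample', 'STATU', None, None)
print(fixed_groups)  # ('sample', 'STATUS', None, None)
```

With the . in place, (\w+) has to give back its last character so the overall match can succeed, which is why the S disappears from the capture.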
You need to add, and escape, the square brackets:
my $regex=qr{^
(?:(\[[\w\/]+)=>\])? # (optional)
(\w+) # (required)
(?:\[->(\w+)\])? # (optional)
(?:\[\w+\]|\[\w+:\w+\])? # (optional)
}x;
I have strings that start with a # symbol and would like to append the same # symbol to the end of each one.
A string could contain any mix of lower/upper case letters, numbers, commas, periods, etc. Each string is on its own line.
Here is what I have tried with no success:
Find: (?=#)([\S\s]+) # www.admitme.app
Find: (?=#\B)([\S\s]+) # Carlo Baxter
Find: (?=#\B)([A-Za-z0-9]+) # resumes in 15 minutes
Replace: $1 # # resumes in 15 minutes #
Yes I'm a noob with regex...
Thanks in advance
Hank K
The following pattern is working in your demo:
(?=#\B)(.*)
This works because .* does not match across newlines unless the dot-matches-newline option is enabled. You were using [\s\S]*, which matches across newlines regardless of that option.
You can do the same replacement without lookarounds or capture groups using one of these patterns. The point is to match any character except a newline using .* (and to leave the dot-matches-newline flag unset).
#\B.*
# .*
In the replacement use the matched text followed by a space and #
$0 #
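As a rough illustration of the same replacement outside a text editor, here is a sketch in Python (an assumption; the question is about an editor's find/replace). Python's re module writes the whole match as \g<0> where the editor uses $0, and the sample lines are taken from the question.

```python
import re

text = "# www.admitme.app\n# Carlo Baxter\n# resumes in 15 minutes"

# "." does not match a newline by default, so "# .*" stops at the end of each line.
# "\g<0>" is the whole matched text, followed by a space and "#".
result = re.sub(r"# .*", r"\g<0> #", text)

print(result)
```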
See a regex demo.
Is it possible to perform my Regex on the second occurrence of a specific symbol only?
Regex being used:
#.*
Example data:
Stack#overflow:Stack#overflow
Desired output:
Stack#overflow:Stack
As you can see in the output, everything including and after the second occurrence of an # has been removed, but the text before it stays.
I'm using Notepad++ or any text editor which allows Regex's to be used.
The pattern #.* matches from the first occurrence of # to the end of the line, because the dot matches any character except a newline, including every following # sign.
If : should also be a boundary, you could use a negated character class to match any char except # or :
[^#:]+#[^#:]+:[^#:]+
[^#:]+ Match any char except # or :
# Match literally
[^#:]+ Match any char except # or :
: Match literally
[^#:]+ Match any char except # or :
Regex demo
You can try
^[^#]*#[^#]*
Regex Demo
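A small sketch of how that pattern behaves, written in Python as an assumption (the question targets Notepad++). Here the match itself is the part to keep, rather than the part to delete; keep_up_to_second_hash is a hypothetical helper name.

```python
import re

def keep_up_to_second_hash(line):
    """Return everything before the second '#', or the whole line if there is none."""
    # [^#]* cannot cross a '#', so the match stops right before the second one.
    m = re.match(r"[^#]*#[^#]*", line)
    return m.group(0) if m else line

print(keep_up_to_second_hash("Stack#overflow:Stack#overflow"))  # Stack#overflow:Stack
```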
Ctrl+H
Find what: ^.+\K#.+$
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
# # literally #
.+ # 1 or more any character but newline
$ # end of line
Given:
Stack#overflow:Stack#overflow
Result for given example:
Stack#overflow:Stack
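Python's re module has no \K, so the sketch below (an assumption, not part of the Notepad++ recipe) uses a capture group in its place. It also makes one detail visible: the greedy .+ backs off only as far as the last # on the line, which is the second # only when the line contains exactly two. strip_from_last_hash is a hypothetical name.

```python
import re

def strip_from_last_hash(line):
    # The capture group stands in for "^.+\K": keep group 1, drop "#" and the rest.
    return re.sub(r"^(.+)#.+$", r"\1", line)

print(strip_from_last_hash("Stack#overflow:Stack#overflow"))  # Stack#overflow:Stack
print(strip_from_last_hash("a#b#c#d"))                        # a#b#c
```

A line with a single # is also truncated by this pattern, so it fits best when every line is known to contain exactly two.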
Try this approach:
regex = /(.*?#.*?)#.*/;
or just
(.*?#.*?)#.*
I have a function, translate(), that takes multiple parameters. The first param is the only required one and is a string that I always wrap in single quotes, like this:
translate('hello world');
The other params are optional, but could be included like this:
translate('hello world', true, 1, 'foobar', 'etc');
And the string itself could contain escaped single quotes, like this:
translate('hello\'s world');
To the point, I now want to search through all code files for all instances of this function call, and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. Almost perfect:
grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .
The problem with this though, is that if the call is something like this:
translate('hello \'world\', you\'re great!');
My grep would only return this:
hello \'world\
So I'm looking to modify this so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \
Hopefully I'm making sense. Any suggestions please?
You can use this grep with PCRE regex:
grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .
Here is a regex demo
RegEx Breakup:
\b # word boundary
translate # match literal translate
\( # match a (
\s* # match 0 or more whitespace
\K # reset the matched information
' # match starting single quote
(?: # start non-capturing group
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
) # end non-capturing group
(?: # start non-capturing group
\\\\. # match a backslash followed by char that is "escaped"
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
)* # end non-capturing group
' # match ending single quote
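The same breakup can be tried outside grep. The sketch below ports it to Python as an assumption, using a capture group instead of \K (which Python's re module does not support); TRANSLATE_ARG and first_args are hypothetical names. In a Python raw string a literal backslash is written \\ rather than the \\\\ needed inside shell double quotes.

```python
import re

# translate( + optional whitespace + opening quote, then:
#   [^'\\]*            run of characters that are neither a quote nor a backslash
#   (?:\\.[^'\\]*)*    any number of "backslash + escaped char" followed by another run
# and finally the closing quote.
TRANSLATE_ARG = re.compile(r"\btranslate\(\s*'([^'\\]*(?:\\.[^'\\]*)*)'")

def first_args(source):
    """Return the first string argument of every translate() call in source."""
    return TRANSLATE_ARG.findall(source)

print(first_args(r"translate('hello \'world\', you\'re great!');"))
```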
Here is a version without \K using look-arounds:
grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .
RegEx Demo 2
I think the problem is the .*? part: the ? makes it a non-greedy pattern, meaning it'll take the shortest string that matches the pattern. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, "world\" is followed by a single quote and a comma, so it matches your pattern.
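To see the lazy match stop early, here is the question's pattern run in Python (an assumption; the original uses grep -P). The input is the problematic call from the question.

```python
import re

call = r"translate('hello \'world\', you\'re great!');"

# .*? grows one character at a time and stops at the first position
# followed by "')" or "',", even if that quote is escaped.
lazy = re.search(r"(?<=translate\(').*?(?='\)|',)", call).group(0)

print(lazy)  # hello \'world\
```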
In these cases, I like to use something like the following reasoning:
A string is a quote, zero or more characters, and a quote: '.*'
A character is anything that isn't a quote (because a quote terminates the string): '[^']*'
Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
Put it all together and you get
grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .
I would highly appreciate if somebody could help me understand the following.
=~/(?<![\w.])($val)(?![\w.])/gi)
This is what I picked up, but I don't understand it.
Lookarounds: (?=PAT) is a lookahead, (?!PAT) a negative lookahead, and (?<=PAT) and (?<!PAT) are lookbehinds (positive and negative, respectively).
The regex seems to search for $val (i.e. string that matches the contents of the variable $val) not surrounded by word characters or dots.
Putting $val into parentheses remembers the corresponding matched part in $1.
See perlre for details.
Note that =~ is not part of the regex, it's the "binding operator".
Similarly, gi) is part of something bigger. g means the matching happens globally, which has different effects based on the context the matching occurs in, and i makes the match case insensitive (which could only influence $val here). The whole expression was in parentheses, probably, but we can't see the opening one.
Read (?<!PAT) as "not immediately preceded by text matching PAT".
Read (?!PAT) as "not immediately followed by text matching PAT".
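A short sketch of those two lookarounds at work, in Python as an assumption (the question is Perl). find_standalone is a hypothetical helper, and re.escape is used so that val is treated as literal text, which may differ from the original code if $val was meant to hold a pattern.

```python
import re

def find_standalone(val, text):
    """Find val where it is neither preceded nor followed by a word character or a dot."""
    pattern = re.compile(rf"(?<![\w.])({re.escape(val)})(?![\w.])", re.IGNORECASE)
    return pattern.findall(text)

print(find_standalone("dog", "The quick brown fox jumps over the lazy dog"))  # ['dog']
print(find_standalone("dog", "hotdog dog.txt Dog"))                           # ['Dog']
```

In the second call, hotdog fails the lookbehind (a word character precedes dog), dog.txt fails the lookahead (a dot follows), and only the final Dog qualifies.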
I use these sites to help with testing and learning and decoding regex:
https://regex101.com/: This one dissects and explains the expression the best IMO.
http://www.regexr.com/
Define $val, then watch the regex engine work with rxrx, a command-line REPL and wrapper for Regexp::Debugger.
It shows output like this, but in color:
Matched
|
VVV
/(?<![\w.])(dog)(?![\w.])/
|
V
'The quick brown fox jumps over the lazy dog'
^^^
[Visual of regex at 'rxrx' line 0] [step: 189]
It also gives descriptions like this
(?<! # Match negative lookbehind
[\w.] # Match any of the listed characters
) # The end of negative lookbehind
( # The start of a capturing block ($1)
dog # Match a literal sequence ("dog")
) # The end of $1
(?! # Match negative lookahead
[\w.] # Match any of the listed characters
) # The end of negative lookahead
I am trying to write a regular expression to replace a string in the 1st column of a text file using Perl. I have tried the following:
foreach (@filecontents)
{
$_=~ s/($usersearch)\t|$usersearch\s\w+\t/$userreplace/gi;
}
This works with the data I have tested, but is there a better way to do it?
You can use the ^ anchor (start of the string anchor) and you can shorten the pattern a little:
$_ =~ s/^$usersearch(?:\s\w+)??\t/$userreplace/i;
Instead of using a lazy quantifier ?? you can write:
$_ =~ s/^$usersearch(?:[^\S\t]\w+)?\t/$userreplace/i;
The result can be a little faster with this second version.
Descriptions:
(?:..) # is a non capturing group, it's only used to group elements
# together without capturing
?? # is the lazy version of the ? quantifier (zero or one time)
(?:..)?? # means "match the group only if needed"
# (vs (?:..)? # means "match the group if it is possible")
[^\S\t] # a character class that matches all whitespace characters except the tab:
# the ^ at the beginning negates the class, and \S is everything that
# is not a whitespace character ( \s <=> [^\S] ), so you only need to add \t
# to exclude it.
Note: if your variable $usersearch may contain regex special characters, don't forget to use quotemeta before using it in a pattern.
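For comparison, here is a rough Python sketch of the second version (an assumption; the thread is Perl). replace_first_column is a hypothetical name, re.escape plays the role of quotemeta, and unlike the substitutions above this sketch puts the tab back after the replacement, which is a guess about the desired output.

```python
import re

def replace_first_column(line, search, replacement):
    """Replace search (optionally followed by one more word) in the first tab-separated column."""
    # ^ anchors to the start of the line; [^\S\t] is any whitespace except a tab.
    pattern = rf"^{re.escape(search)}(?:[^\S\t]\w+)?\t"
    # A function is used as the replacement so that backslashes in it stay literal.
    return re.sub(pattern, lambda m: replacement + "\t", line,
                  count=1, flags=re.IGNORECASE)

print(repr(replace_first_column("alice smith\t42\talice", "alice", "bob")))
```

Because of the ^ anchor, an occurrence of the search text in a later column is left alone.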