I am having a lot of trouble interpreting this expression and I am getting really lost trying to read it. Can someone help me?
^[^?](?:htaccess|access_log)(?:[.][^/?])?(?:[~])?(?:[?].*)?$
I know that ^ means to start at the beginning of the line and that [^?] means not matching a "?", I think, but I'm not sure what (?:) does or how to interpret the rest of the line. I'm thinking that htaccess|access_log is an "or", so either htaccess or access_log. [.][^/?] is a . followed by something that is not a "?", but then what would the earlier [^?] mean?
What would be an example of something this matches?
There are plenty of explainers that will break down a regular expression for you.
To be concise: the caret at the start of a character class [^ ] is the negation operator, meaning match any character NOT in the class. The ?: placed right after an opening parenthesis makes a non-capturing group, which groups expressions without capturing what they match, and | is the alternation operator.
I would recommend taking a look at these sites for basic use of regular expressions.
Regular-Expressions.info
Rexegg (Regex Tutorial)
Regular Expression:
^ # the beginning of the string
[^?] # any character except: '?'
(?: # group, but do not capture:
htaccess # 'htaccess'
| # OR
access_log # 'access_log'
) # end of grouping
(?: # group, but do not capture (optional):
[.] # any character of: '.'
[^/?] # any character except: '/', '?'
)? # end of grouping
(?: # group, but do not capture (optional):
[~] # any character of: '~'
)? # end of grouping
(?: # group, but do not capture (optional):
[?] # any character of: '?'
.* # any character except \n (0 or more times)
)? # end of grouping
$ # before an optional \n, and the end of the string
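To answer the "what would this match" part concretely, here is a minimal Python sketch that runs the pattern against a few sample strings. The sample names are made up for illustration, and it assumes Python's re module treats this pattern the same way as the engine in the question (the pattern uses no engine-specific syntax).

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"^[^?](?:htaccess|access_log)(?:[.][^/?])?(?:[~])?(?:[?].*)?$"
)

# Hypothetical sample inputs, chosen to exercise each optional group.
samples = [
    ".htaccess",      # one leading char, then 'htaccess'
    ".htaccess~",     # optional trailing '~'
    ".htaccess.b",    # optional '.' plus exactly one more char
    "/access_log",    # the other alternative
    ".htaccess?x=1",  # optional '?' followed by anything
    "htaccess",       # no char left over for the leading [^?]
    "?htaccess",      # leading char is '?', which [^?] rejects
    ".htaccess.bak",  # more than one char after the '.'
]

results = {s: bool(PATTERN.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r:20} {'match' if ok else 'no match'}")
```

The leading [^?] consumes exactly one character, which is why a bare "htaccess" fails while ".htaccess" succeeds.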
I have this pattern (?<!')(\w*)\((\d+|\w+|.*,*)\) that is meant to match strings like:
c(4)
hello(54, 41)
Following some answers on SO, I added a negative lookbehind so that if the input string is preceded by a ', the string shouldn't match at all. However, it still partially matches.
For example:
'c(4) returns (4) even though it shouldn't match anything because of the negative lookbehind.
How do I make it so if a string is preceded by ' NOTHING matches?
Since nobody came along, I'll throw this out to get you started.
This regex will match things like
aa(a , sd,,,f,)
aa( as , " ()asdf)) " ,, df, , )
asdf()
but not
'ab(s)
This will fix the basic problem: (?<!['\w])\w*
Here (?<!['\w]) will not let the engine skip over a word char just
to satisfy the "not a quote" condition.
Then the optional \w* grabs the whole word.
And if a quote comes before it, as in 'aaa(, then it won't match.
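As a quick check of the lookbehind change, here is a small Python sketch comparing the pattern from the question with a variant that only swaps in the (?<!['\w]) lookbehind. Keeping the rest of the question's pattern as the body of FIXED is an assumption made for illustration, and first_match is a hypothetical helper.

```python
import re

# Pattern from the question: the lookbehind only rejects a quote
# immediately before the match start.
ORIGINAL = re.compile(r"(?<!')(\w*)\((\d+|\w+|.*,*)\)")

# Same body, but the lookbehind also rejects a word character, so the
# engine cannot start the match in the middle of (or after) the name.
FIXED = re.compile(r"(?<!['\w])(\w*)\((\d+|\w+|.*,*)\)")


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


for text in ["c(4)", "hello(54, 41)", "'c(4)"]:
    print(repr(text), first_match(ORIGINAL, text), first_match(FIXED, text))
```

With the original pattern the engine fails at the c (a quote is behind it), then succeeds one position later with an empty function name, which is where the stray (4) comes from.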
This regex here embellishes what I think you are trying to accomplish
in the function body part of your regex.
It might be a little overwhelming to understand at first.
(?s)(?<!['\w])(\w*)\(((?:,*(?&variable)(?:,+(?&variable))*[,\s]*)?)\)(?(DEFINE)(?<variable>(?:\s*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')\s*|[^()"',]+)))
Readable version (via: http://www.regexformat.com)
(?s) # Dot-all modifier
(?<! ['\w] ) # Not a quote, nor word behind
# <- This will force matching a complete function name
# if it exists, thereby blocking a preceding quote '
( \w* ) # (1), Function name (optional)
\(
( # (2 start), Function body
(?: # Parameters (optional)
,* # Comma (optional)
(?&variable) # Function call, get first variable (required)
(?: # More variables (optional)
,+ # Comma (required)
(?&variable) # Variable (required)
)*
[,\s]* # Whitespace or comma (optional)
)? # End parameters (optional)
) # (2 end)
\)
# Function definitions
(?(DEFINE)
(?<variable> # (3 start), Function for a single Variable
(?:
\s*
(?: # Double or single quoted string
"
[^"\\]*
(?: \\ . [^"\\]* )*
"
|
'
[^'\\]*
(?: \\ . [^'\\]* )*
'
)
\s*
| # or,
[^()"',]+ # Not quote, paren, comma (can be whitespace)
)
) # (3 end)
)
Please help me create a regular expression that matches the "|" character everywhere except inside parentheses.
example|example (example(example))|example|example|example(example|example|example(example|example))|example
For this input, the 5 "|" characters that are outside the parentheses should be matched. Note that the contents of the parentheses should remain unchanged, including the "|" characters inside them.
Considering you want to match pipes that are outside any set of parentheses, with nested sets, here's the pattern to achieve what you want:
Regex:
(?x) # Allow comments in regex (ignore whitespace)
(?: # Repeat *
[^(|)]*+ # Match every char except ( ) or |
( # 1. Group 1
\( # Opening paren
(?: # chars inside:
[^()]++ # a. everything inside parens except nested parens
| # or
(?1) # b. nested parens (recurse group 1)
)*+ # repeated as many times as needed
\) # Until closing paren.
)?+ # (end of group 1)
)*+ #
\K # Keep text out of match
\| # Match a pipe
regex101 Demo
One-liner:
(?:[^(|)]*+(\((?:[^()]++|(?1))*+\))?+)*+\K\|
regex101 Demo
This pattern uses some advanced features:
Possessive quantifiers
Recursion
Resetting the match start
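Recursion and \K are not available in every engine (Python's built-in re module, for instance, has neither). Where that is the case, the same result can be approximated without a regex by tracking parenthesis depth. The sketch below is a swapped-in technique, not the pattern above, and top_level_pipes is a hypothetical helper name. It assumes parentheses are balanced; a stray closing parenthesis is simply ignored.

```python
def top_level_pipes(text):
    """Return the indices of every '|' that is not inside parentheses."""
    depth = 0
    positions = []
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Assumption: ignore unbalanced ')' instead of going negative.
            depth = max(depth - 1, 0)
        elif ch == "|" and depth == 0:
            positions.append(i)
    return positions


sample = (
    "example|example (example(example))|example|example|"
    "example(example|example|example(example|example))|example"
)
print(len(top_level_pipes(sample)))
```

The returned indices can then be used to split or replace at those positions while leaving the parenthesized parts untouched.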
I would like to remove some strings from filename.
I want to remove every string in bracket but not if there is a string "remix" or "Remix" or "REMIX"
Now I have got
sed "s/\s*\(\s?[A-z0-9. ]*\)//g"
but how to exclude cases when there is remix in string?
You can use a capture group:
sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
When the "remix branch" doesn't match, the capture group is not defined and the matched part is replaced with an empty string.
When the "remix branch" succeeds, the matched part is replaced by the content of the capture group, so by itself.
Note: if it helps to avoid false positives, you can add word boundaries around "remix": \bremix\b
pattern details:
\( # open the capture group 1
\s* # zero or more white-spaces
( # a literal parenthesis
[^)]* # zero or more characters that are not a closing parenthesis
remix
[^)]*
)
\) # close the capture group 1
\| # OR
# something else between parenthesis
\s* # note that it is essential that the two branches are able to
# start at the same position. If you remove \s* in the first
# branch, the second branch will always win when there's a space
# before the opening parenthesis.
(\s\?[a-z0-9. ]*)
\1 is the reference to the capture group 1
i makes the pattern case-insensitive
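The same "capture what you want to keep" trick can be sketched in Python for anyone renaming files from a script instead of sed. This is an illustrative translation, not part of the sed answer: strip_parens_keep_remix is a hypothetical name, and the replacement is written as a function so that an unmatched group 1 explicitly becomes an empty string.

```python
import re

# Branch 1 (captured): a parenthesized part containing "remix".
# Branch 2 (not captured): any other simple parenthesized part.
PATTERN = re.compile(
    r"(\s*\([^)]*remix[^)]*\))|\s*\(\s?[a-z0-9. ]*\)",
    re.IGNORECASE,
)


def strip_parens_keep_remix(name):
    """Remove parenthesized parts unless they contain 'remix'."""
    # group(1) is None when branch 2 matched, so that part is dropped.
    return PATTERN.sub(lambda m: m.group(1) or "", name)


print(strip_parens_keep_remix("Song (Official Video) (Club Remix).mp3"))
print(strip_parens_keep_remix("Track (2010 Edit).mp3"))
```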
[EDIT]
If you want to do it in a POSIX-compliant way, you must use a different approach, because several GNU features are not available: in particular the alternation \|, but also the i modifier, the \s character class, and the optional quantifier \?.
This other approach consists of matching any characters that are not an opening parenthesis, together with any substrings enclosed in parentheses that contain "remix", followed by optional white-space and an optional substring enclosed in parentheses.
As you can see, everything is optional and the pattern can match an empty string, but that isn't a problem.
Everything before the parenthesized part to remove is captured in group 1.
sed 's/\(\([^(]*([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)[^ \t(]*\([ \t]\{1,\}[^ \t(]\{1,\}\)*\)*\)\([ \t]*([^)]*)\)\{0,1\}/\1/g;'
pattern details:
\( # open the capture group 1
\(
[^(]* # all that is not an opening parenthesis
# substring enclosed in parentheses with "remix" inside
( [^)]* [Rr][Ee][Mm][Ii][Xx] [^)]* )
# Let's reach the next parenthesis without matching the white-spaces
# before it (otherwise the leading white-spaces are not removed)
[^ \t(]* # all that is not a white-space or an opening parenthesis
# optional groups of white-spaces followed by characters that are
# neither white-spaces nor an opening parenthesis
\( [ \t]\{1,\} [^ \t(]\{1,\} \)*
\)*
\) # close the capture group 1
\(
[ \t]* # leading white-spaces
([^)]*) # parenthesis
\)\{0,1\} # makes this part optional (this avoids removing a "remix" part
# alone at the end of the string)
Word boundaries aren't available in this mode either. So the only way to emulate them is to list the four possibilities:
([Rr][Ee][Mm][Ii][Xx]) # poss1
([Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss2
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx]) # poss3
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss4
and to replace ([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*) with:
\(poss1\)\{0,\}\(poss2\)\{0,\}\(poss3\)\{0,\}\(poss4\)\{0,\}
Just skip the lines matching "remix":
sed '/([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)/! s/([^)]*)//g'
where the brackets are square (US usage): []
sed '/remix\|REMIX\|Remix/ !s/\[[^]]*]//g'
where the brackets are round (rest of the world): ()
sed '/remix\|REMIX\|Remix/ !s/([^)]*)//g'
assuming:
- there are no nested brackets
- other spellings of remix (ReMix, ...) are not recognized, so brackets on those lines are still removed
- remix can be anywhere in the title (i love remix) [if needed, specify which ones to keep and which to remove]
I'm trying to create a regex which will match any one of the following -
FVAL(A)
FVAL("A")
FVAL(A,B)
FVAL("A",B)
FVAL("A","B")
FVAL(A,"B")
FVAL(A,B,C)
FVAL("A",B,C)
FVAL("A","B",C)
FVAL("A","B","C")
FVAL("A",B,"C")
FVAL(A,"B","C")
Regex -
FVAL\s*\(\s*["*]\s*\w+\s*["*]\s*,*\s*["*]\s*\w+\s*["*]\s*,*\s*,*\s*["*]\s*\w+\s*["*]\s*\)
This regex is supposed to return any form of the function that is used.
E.g. -
If the input string were - FVAL(A,"B")+5 then the match should be FVAL(A,"B")
P.S. - I'm ignoring white spaces in the input string, but they can be there.
Your expression is way too complicated.
FVAL\("?\w+"?(?:,"?\w+"?){0,2}\)
Breakdown:
FVAL # "FVAL"
\( # "("
"? # an optional double quote
\w+ # at least one word character
"? # an optional double quote
(?: # group
, # a comma
"?\w+"? # quote - word character - quote
){0,2} # end group, repeat 0-2 times
\) # ")"
Insert whitespace \s into the expression where you see fit.
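A short Python sketch of the simplified pattern, plus one possible way of inserting \s* as suggested. The whitespace placement in LOOSE is an assumption about where spaces may appear, not something stated in the answer.

```python
import re

# The pattern from the answer, unchanged.
STRICT = re.compile(r'FVAL\("?\w+"?(?:,"?\w+"?){0,2}\)')

# Assumed variant: optional whitespace around the parenthesis, the
# arguments and the commas.
LOOSE = re.compile(r'FVAL\s*\(\s*"?\w+"?\s*(?:,\s*"?\w+"?\s*){0,2}\)')

forms = [
    'FVAL(A)', 'FVAL("A")', 'FVAL(A,B)', 'FVAL("A",B)',
    'FVAL("A","B")', 'FVAL(A,"B")', 'FVAL(A,B,C)', 'FVAL("A",B,C)',
    'FVAL("A","B",C)', 'FVAL("A","B","C")', 'FVAL("A",B,"C")',
    'FVAL(A,"B","C")',
]

print(all(STRICT.fullmatch(f) for f in forms))
print(STRICT.search('FVAL(A,"B")+5').group(0))
print(LOOSE.search('FVAL ( A , "B" )').group(0))
```

Note that each quote is optional on its own, so the pattern also accepts an unbalanced form such as FVAL("A). If that matters, the quoted and unquoted cases would have to be written as separate alternatives.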
/^([a-z]:)?\//i
I don't quite understand what the ? does in this regex. If I had to explain it from what I understood:
Match at the beginning; "Group 1 is a to z and :"; outside the group, ? (which I don't get); \/ which makes it match /; and the option /i for "case insensitive".
I realize that this will return 0 or 1, but I'm not quite sure why, because of the ?
Is this to match a directory path or something?
If I test it:
$var = 'test' would get 0 while $var ='/test'; would get 1 but $var = 'test/' gets 0
So anything that begins with / will get 1 and anything else 0.
Can someone explain this regex in elementary terms to me?
See YAPE::Regex::Explain:
#!/usr/bin/perl
use strict; use warnings;
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/^([a-z]:)?\//i)->explain;
The regular expression:
(?i-msx:^([a-z]:)?/)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?i-msx: group, but do not capture (case-insensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
----------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
)? end of \1 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
It matches a lower- or upper case letter ([a-z] with the i modifier) positioned at the start of the input string (^) followed by a colon (:) all optionally (?), followed by a forward slash \/.
In short:
^ # match the beginning of the input
( # start capture group 1
[a-z] # match any character from the set {'A'..'Z', 'a'..'z'} (with the i-modifier!)
: # match the character ':'
)? # end capture group 1 and match it once or none at all
\/ # match the character '/'
? will match one or none of the preceding pattern.
? Match 1 or 0 times
See also: perldoc perlre
Explanation:
/.../i # case insensitive
^(...) # match at the beginning of the string
[a-z]: # one character between 'a' and 'z' followed by a colon
(...)? # zero or one time of the group, enclosed in ()
So in English: match anything which begins with a / (slash), or with some letter followed by a colon followed by a /. This looks like it matches absolute pathnames on both Unix and Windows, e.g.
it would match:
/home/user
and
C:/Applications
etc.
It looks like it is looking for a "rooted" path. It will successfully match any string that either starts with a forward slash (/test), or a drive letter followed by a colon, followed by a forward slash (c:/test).
Specifically, the question mark makes something optional. It applies to the part in parentheses, which is a letter followed by a colon.
Things that will match:
C:/
a:/
/
(That last item above is why the ? is there)
Things that will not match:
C:
a:
ab:/
a/
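The match and non-match lists above can be verified with a few lines of Python. This is a sketch that assumes Python's re behaves like Perl for this simple pattern; is_rooted is a hypothetical helper name.

```python
import re

# Same pattern as /^([a-z]:)?\//i, written for Python.
ROOTED = re.compile(r"^([a-z]:)?/", re.IGNORECASE)


def is_rooted(path):
    """True if the path starts with '/' or with '<letter>:/'."""
    return ROOTED.match(path) is not None


for path in ["C:/", "a:/", "/", "/test", "C:", "a:", "ab:/", "a/", "test/"]:
    print(f"{path!r:10} {is_rooted(path)}")

# When the optional group did not take part in the match, it is None.
print(ROOTED.match("/test").group(1))
print(ROOTED.match("C:/Applications").group(1))
```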