What does this regex expression in Qt mean? I can't understand the meaning behind the ?=.
QRegularExpression functionPattern("\\b[A-Za-z_][A-Za-z0-9_]*(?=\\()");
P.S.: This is a regex expression about parsing the C language function name.
First, remember to unescape the double backward slashes \\ into a single backslash \, obtaining the actual regex \b[A-Za-z_][A-Za-z0-9_]*(?=\().
As Nole pointed out, you should unescape the double backslashes into single backslashes. A single backslash followed by certain characters has a special meaning. E.g., \b is a word boundary; it matches a position rather than a character, so it consumes nothing. For example, \bword\b matches in "word" and in "something, word, something else", but not in "password". (?=…) is a positive lookahead. It is a zero-width assertion: it requires that … follows at that position, but it neither consumes nor captures it. In our case, (?=\() means that a ( must come next. Note that the backslash before ( makes it a literal ( rather than the start of a group, which is what an unescaped ( means in a regex.
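The word-boundary behaviour is easy to try outside Qt. The sketch below uses Python's re module purely as an illustration (an assumption; the question itself uses QRegularExpression), where \b has the same meaning.

```python
import re

# \b is zero-width: it asserts a position between a word character
# and a non-word character (or the start/end of the string).
word = re.compile(r"\bword\b")

samples = ["word", "something, word, something else", "password"]
hits = [bool(word.search(s)) for s in samples]
print(hits)  # [True, True, False]
```

In "password" the letters w-o-r-d are preceded by another word character, so there is no boundary in front of them and the pattern fails.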
The whole pattern matches a whole word (not the tail of a longer word, since \b comes first) that starts with a letter or an underscore ([A-Za-z_]) and may be followed by any number of letters, digits or underscores ([A-Za-z0-9_]*; the "may" comes from the *). The word must be immediately followed by (.
Note again that the match is the whole word up to, but not including, the (.
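As a rough check of this reading, here is the same pattern in Python (a sketch only; the C snippet in c_source is made up for illustration). A Python raw string needs no doubled backslashes, so the pattern appears in its unescaped form.

```python
import re

# Same pattern as in the question, in its unescaped form.
function_pattern = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*(?=\()")

# Hypothetical C source line used only for this demonstration.
c_source = "int total = add(x, y); print_result(total); int count2;"

names = function_pattern.findall(c_source)
print(names)  # ['add', 'print_result']
```

The parenthesis itself is never part of a match, because the lookahead only checks for it. Note that the pattern would equally match keywords written directly before a parenthesis, such as if( or while(.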
I'm trying to capture text (any text) that falls between some kind of delimiter with word boundaries on each end, like so:
This is not the text. ##This is the text I want to capture.## This is also not the text. ##But I would like to capture this, too##.
I thought this would be easy with regex like this
\b([#]{2})(.*)(\1)\b
This doesn't produce a match and I can't figure out why.
Note, I would also like to avoid capturing the text between the first '##' and the last '##', capturing both sections with all the text in between.
In other words I don't want one of the matches to be:
##This is the text I want to capture.## This is also not the text. ##But I would like to capture this, too##
georg and Ulugbek Umirov posted the perfect answer to this question as comments. I repeat the expression here with an explanation, mainly to give the question an answer and therefore remove it from the list of unanswered questions.
##\b(.+?)## searches for a string
starting and ending with ## and
with a word character at beginning and
having 1 or more characters between.
Because of the parentheses the string found between ## is marked for backreference.
The question mark ? after the + quantifier changes the matching behavior from greedy to non-greedy. The greedy expression .+ matches everything from the first ## to the last ##, whereas the non-greedy expression .+? matches only from the first ## to the next ##.
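The difference between the greedy and the non-greedy form can be seen on the sample text from the question. This is a sketch in Python (an assumption; the question does not name an engine), and the variable names are only illustrative.

```python
import re

text = ("This is not the text. ##This is the text I want to capture.## "
        "This is also not the text. ##But I would like to capture this, too##.")

# Greedy: .+ runs to the end of the text and backtracks to the LAST ##.
greedy = re.findall(r"##\b(.+)##", text)

# Non-greedy: .+? stops at the NEXT ##, giving one match per delimited section.
lazy = re.findall(r"##\b(.+?)##", text)

print(len(greedy))  # 1
print(lazy)
```

The greedy version yields one long match spanning both sections, which is exactly the result the question wants to avoid.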
\b means word boundary and therefore the first character after ## must be a word character (letter, digit or underscore).
The matching behavior of . depends on a flag. The dot can match any character including line terminating characters, or any character except line terminating characters. Line terminating characters are carriage return (= \r = CR) and line feed (= newline = \n = LF).
If matching everything between two delimiter strings should be independent of the matching behavior of the dot, it is better to use the regular expression ##\b([\w\W]+?)## as Ulugbek Umirov suggested: \w matches any word character and \W matches any non-word character, so the two together in a character class always match any character, including CR and LF.
It would also be possible to use ##\b([\s\S]+?)##, where \s matches any whitespace character and \S matches any non-whitespace character; together in a character class they likewise match any character, including CR and LF.
Further, it would be possible to use ##(\w[\s\S]*?)## or ##\w([\w\W]*?)## or ##(\w.*?)##, all of which match the same text as the expressions above, provided the dot matches any character including CR and LF. (Note that the second of these leaves the first word character outside the capturing group.)
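To illustrate the dependence on the dot flag, here is a small Python sketch (an assumption; in Python the flag is re.DOTALL, other engines name it differently). The two-line sample string is made up for the demonstration.

```python
import re

# Hypothetical delimited text that spans a line break.
multiline = "##first line\nsecond line##"

# Default: the dot does not match \n, so the closing ## is never reached.
dot_default = re.findall(r"##\b(.+?)##", multiline)

# With the flag, the dot matches \n as well.
dot_all = re.findall(r"##\b(.+?)##", multiline, re.DOTALL)

# [\s\S] matches any character regardless of the flag.
class_based = re.findall(r"##\b([\s\S]+?)##", multiline)

print(dot_default)   # []
print(dot_all)       # ['first line\nsecond line']
print(class_based)   # ['first line\nsecond line']
```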
Last, if the regular expression engine in use supports lookbehind and lookahead, it would also be possible to match only the string between the ## delimiters, without matching the delimiters themselves, by using for example the regular expression (?<=##)\b[\w\W]+?(?=##), which makes a capturing group unnecessary. (?<=##) is a positive lookbehind expression and (?=##) is a positive lookahead expression, both for the string ##.
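A sketch of the lookaround variant, again in Python (an assumption; Python's re supports fixed-width lookbehind, which is all this pattern needs). Because there is no capturing group, findall returns the whole matches, which here are just the text between the delimiters.

```python
import re

text = ("This is not the text. ##This is the text I want to capture.## "
        "This is also not the text. ##But I would like to capture this, too##.")

# The delimiters are only asserted, never consumed, so they are not part of the match.
inner = re.findall(r"(?<=##)\b[\w\W]+?(?=##)", text)
print(inner)
```

The \b matters here: the text between the two sections also sits between a ## and a ##, but it starts with a space, so there is no word boundary right after the ## and it is skipped.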
the regex expression is as below:
if ($ftxt =~ m|/([^=]+)="(.+)"|o)
{
.....
}
This regex seems different from many other regexes. What confuses me is the "|"; most regexes use "/" instead of "|". The group ([^=]+) also confuses me. I know [^=] means "the start of the string" or "=", but what does it mean to repeat '^' one or more times? How can this be explained?
You can use different delimiters instead of /. For instance you could use:
m#/([^=]+)="(.+)"#o
Or
m~/([^=]+)="(.+)"~o
The advantage here of using something different than / is that you don't have to escape slashes, because otherwise, you'd have to use:
m/\/([^=]+)="(.+)"/o
(the escaped slash \/ could also be written as [/])
([^=]+) is a capture group, and inside, you have [^=]+. [^=] is a negated class and will match any character which is not a =.
^ behaves differently at the beginning of a character class and is not the same as ^ outside a character class which means 'beginning of line'.
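The two meanings of ^ can be compared side by side. The sketch below uses Python instead of Perl (an assumption for illustration; the caret behaves the same way in both for these cases), with a made-up input string.

```python
import re

sample = "height=30px"  # hypothetical input

# Inside brackets, a leading ^ negates the class:
# [^=]+ is "one or more characters that are not =".
not_equals = re.match(r"[^=]+", sample).group(0)

# Outside brackets, ^ anchors at the start of the string
# (or of each line with re.MULTILINE); there is no = at position 0.
anchored = re.search(r"^=", sample)

print(not_equals)  # height
print(anchored)    # None
```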
As for the last part o, this is a flag which I haven't met so far so a little search brought me to this post, I quote:
The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is.
Before Perl 5.6, Perl would recompile the regex even if the variable had not changed. You don't need to do that anymore. You could use /o to compile the regex once despite further changes to the variable, but as the other answers noted, qr// is better for that.
Some regexp implementations allow you to use other special characters besides / as the delimiter. This is useful if you need to use that special character inside the regular expression itself, since you don't have to escape it. (In and of itself / is not a special character in regexp syntax, but it needs escaping if it's used in the regexp literal syntax of the host language.) The docs on Perl's quote operators mention this.
This is tutorial-level stuff: square brackets ([abc]) denote a character class - it means "any of the characters inside the brackets". (In my example, it means "either a or b or c".) Inside them, the ^ special character has a different meaning: it inverts the character class. So, [^=] means "any character except =", and [^=]+ means "one or more characters that aren't =".
Quoting the docs on Perl's RE syntax:
You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.
It is meant to match equation like expressions, to capture the key and values separately. Imagine you have a statement like height="30px", and you want to capture the height attribute name, as well as its value 30px.
So you have m|/([^=]+)="(.+)"|.
The key is supposed to be everything before the = is encountered, so [^=]+ matches it and the surrounding parentheses capture it. The ^ is a negation metacharacter when used as the first character inside [] brackets, so [^=] matches any character except =, which is what you want. The leading / is a literal slash: because the pattern is delimited with | rather than /, it needs no escaping, and it means the key is only matched when it directly follows a /. If your input has no slash before the key, as in the height="30px" example, drop it and use ([^=]+) on its own.
Next comes the = sign, which you don't care about. Then come the quotes, which contain the value, so you capture it with "(.+)". The .+ greedily matches every character it can, including the final ". The regex then still needs a closing ", so the engine backtracks and gives up the last " that (.+) had taken, which leaves the string within the quotes in the group. Now you are ready to access the key and value through $1 and $2. Cool, isn't it?
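The capture and backtracking behaviour can be checked with a short sketch. It is written in Python rather than Perl (an assumption for illustration; group(1) and group(2) play the role of $1 and $2), and the input strings are made up.

```python
import re

# Same pattern as in the question; the leading / is a literal slash.
pattern = re.compile(r'/([^=]+)="(.+)"')

# Hypothetical inputs for the demonstration.
m = pattern.search('config/height="30px"')
key, value = m.group(1), m.group(2)
print(key, value)  # height 30px

# Greedy .+ backtracks only as far as the LAST quote on the line.
greedy_value = pattern.search('a/k="v1" other="v2"').group(2)
print(greedy_value)  # v1" other="v2

# Without a slash in front of the key there is no match at all.
no_slash = pattern.search('height="30px"')
print(no_slash)  # None
```

If several quoted values can appear on one line, "([^"]+)" or a non-greedy "(.+?)" would stop at the first closing quote instead.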
I'd like to ask what the following emacs regular expression means (if anyone wonders, this is the regexp that erlang-mode uses for matching a single-quoted atom):
'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'
specifically I'm having trouble finding explanations for three things.
First, the question mark, which supposedly should either make the preceding item optional or make the preceding quantifier lazy. But there is no item or quantifier here, only the start of a new group, so what effect does it have here?
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Third, the quadruple escape \\\\.: wouldn't this leave you with an escaped backslash and a \., which would make it an invalid regexp?
Thanks
"[^\\']"
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Firstly, note that in Emacs regexp syntax, \` matches the start of the string, and \' matches the end of the string. In multi-line strings this is different to the more familiar ^ and $, which match the beginning of a line and the end of a line.
However that is not relevant within a character alternative (square brackets), so this sequence is actually matching any character other than a backslash or an apostrophe.
Edit:
So from the comments, this is still causing confusion, so let's break it down:
"'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'"
That code evaluates to this string/regexp:
'\(?:[^\']\|\(?:\\.\)\)*'
' matches an apostrophe
\(?:foo\)* matches zero or more foo
foo\|bar matches either of foo or bar
[^\'] matches any character other than a backslash or an apostrophe
\(?:\\.\) could (in this case, being a non-capturing group which occurs exactly once) be rewritten as simply \\., and matches a backslash followed by any character other than a newline.
' matches an apostrophe
So the whole thing matches a single-quoted string in which:
any other single-quotes must each be preceded by a backslash
any backslash must be paired with another non-newline character (which could also be a backslash)
Which of course sounds like a typical string syntax in which backslashes can be used to escape special characters, including backslashes themselves and any instances of the delimiting quote character.
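The breakdown above can be tried out with a translation of the regexp into Python syntax. This is a sketch under stated assumptions, not the erlang-mode code: Python writes groups and alternation without backslashes, and unlike Emacs it needs the backslash escaped inside the square brackets. The name quoted_atom and the sample strings are made up.

```python
import re

# Emacs:  '\(?:[^\']\|\(?:\\.\)\)*'
# Python equivalent (assumed translation):
quoted_atom = re.compile(r"'(?:[^\\']|\\.)*'")

samples = {
    "'hello'": True,     # plain quoted text
    r"'it\'s'": True,    # inner quote preceded by a backslash
    r"'a\\'": True,      # escaped backslash just before the closing quote
    r"'bad\'": False,    # the backslash swallows the closing quote
    "'it's'": False,     # unescaped inner quote ends the atom early
}

results = {s: quoted_atom.fullmatch(s) is not None for s in samples}
for s, ok in results.items():
    print(s, ok)
```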
First: (?: groups multiple tokens together without creating a capturing group. This allows you to apply quantifiers to the full group.
Second and third: I think those are escaped backslashes. Each pair in the string literal stands for a single \, and the quadruple stands for \\. So it is not escaping the apostrophe at all.
I'm new to regular expressions and I'm having trouble finding out what "\'.-" means.
'/^[A-Z \'.-]{2,20}$/i'
So far from my research, I have found that the regular expression starts (^) and requires two to twenty ({2,20}) alphabetical (A-Z) characters. The expression is also case insensitive (/i).
Any hints about what "\'.-" means?
The character class is the entire expression [A-Z \'.-], meaning any of A-Z, space, single quote, period, or hyphen. The \ is needed to protect the single quote, since it's also being used as the string quote. This charclass must be repeated 2 to 20 times, and because of the leading ^ and trailing $ anchors that must be the entire content of the matching string.
It means to escape the single quote (') that delimits the string containing the regex (so as not to end the string prematurely), and then a . which means a literal . and a - which means a literal -.
Inside of the character class, the . is treated literally, and if the - isn't part of a valid range, e.g. a-z, then it is treated literally as well.
Your regex says Match the characters a-zA-Z '.- between 2 and 20 times as the entire string, with an optional trailing \n.
This regex is in a string. The backslash is there to escape the single quote so the string doesn't end early, in the middle of the regex. The dot and dash are just what they are, a period and a dash.
So, you were nearly right, except it's 2-20 characters that are letters, space, single quote, period, or dash.
It's quoting the quote.
The regular expression is ^[A-Z '.-]{2,20}$.
In the programming language you are using, you write it as a quoted string:
'SOMETHING'
To get a single quote in there, it's been backslashed.
Everything inside the square brackets is part of the character class, and will match a single character listed. In your example, the characters listed are the letters A through Z, a space, a single quote, a period, or a hyphen. (Note that the hyphen is listed last so that it is not read as part of a range, like A-Z.) Your full regular expression will match between 2 and 20 of the listed characters. The backslash before the single quote is needed so the compiler knows you are not ending the string that defines the regular expression.
Some examples of things this will match:
....................
abaca af - .
AAfa- - ..
.z
And so on.
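These examples can be checked mechanically. The sketch below uses Python instead of the original host language (an assumption for illustration): re.IGNORECASE stands in for the /i modifier, and a raw string in double quotes means the single quote needs no backslash. The extra candidates after the first four are made up.

```python
import re

# Same character class and length limits as in the question.
name_pattern = re.compile(r"^[A-Z '.-]{2,20}$", re.IGNORECASE)

candidates = ["." * 20, "abaca af - .", "AAfa- - ..", ".z",
              "O'Neil", "x", "a" * 21, "R2-D2"]
verdicts = [bool(name_pattern.search(c)) for c in candidates]
print(verdicts)  # [True, True, True, True, True, False, False, False]

# As noted above, $ also matches just before a single trailing newline.
trailing_newline_ok = bool(name_pattern.search("ab\n"))
print(trailing_newline_ok)  # True
```

The last three candidates fail because they are too short, too long, and contain digits, respectively.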
When I searched about something, I found an answered question in this site. 2 of the answers contain
/([^.]*)\.(.*)/
in their answers.
The question is located at Find & replace jquery. I'm a newbie in JavaScript, so I wonder: what does it mean? Thanks.
/([^.]*)\.(.*)/
Let us deconstruct it. The beginning and trailing slash are delimiters, and mark the start and end of the regular expression.
Then there is a parenthesized group: ([^.]*). The parentheses group part of the pattern together and capture whatever it matches. The square brackets denote a character class, meaning that any one character listed inside is accepted in its place. However, this class is negated by its first character being ^, which reverses its meaning. Since the only character besides the negation is a period, this matches a single character that is not a period. After the square brackets is a * (asterisk), which means that the class can be matched zero or more times.
Then we get to the \.. This is an escaped period. Periods in regular expressions have special meaning (except when escaped or in a character class). This matches a literal period in the text.
(.*) is a new parenthesized sub-group. This time, the period matches any character, and the asterisk says it can be repeated as many times as needed.
In summary, the expression finds any sequence of characters that are not periods, followed by a single period, followed by any sequence of characters.
Edit: Removed part about shortening, as it defeats the assumed purpose of the regular expression.
It's a regular expression (it matches non-periods, followed by a period followed by anything (think "file.ext")). And you should run, not walk, to learn about them. Explaining how this particular regular expression works isn't going to help you as you need to start simpler. So start with a regex tutorial and pick up Mastering Regular Expressions.
Original: /([^.]*)\.(.*)/
Split this as:
[1] ([^.]*) : It says match all characters except . [ period ]
[2] \. : match a period
[3] (.*) : matches any character
so it becomes
[1]Match all characters which are not . [ period ] [2] till you find a .[ period ] then [3] match all characters.
Anything except a dot, followed by a dot, followed by anything.
You can test regex'es on regexpal
It's a regular expression that roughly searches for a string that doesn't contain a period, followed by a period, and then a string containing any characters.
That is a regular expression. Regular expressions are powerful tools if you use them right.
That particular regex extracts filename and extension from a string that looks like "file.ext".
It's a regular expression that splits a string into two parts: everything before the first period, and then the remainder. Most regex engines (including the Javascript one) allow you to then access those parts of the string separately (using $1 to refer to the first part, and $2 for the second part).
This is a regular expression with some advanced use.
Consider a simpler version: /[^.]*\..*/ which is the same as above without parentheses. This will match just any string with at least one dot. When the parentheses are added, and a match happens, the variables \1 and \2 will contain the matched parts from the parentheses. The first one will have anything before the first dot. The second part will have everything after the first dot.
Examples:
input: foo...bar
\1: foo
\2: ..bar
input: .foobar
\1:
\2: foobar
This regular expression produces two captured groups that can be retrieved.
The two parts are the string before the first dot (which may be empty), and the string after the first dot (which may contain other dots).
The only restriction on the input is that it contain at least one dot. It will match "." contrary to some of the other answers, but the retrieved groups will be empty.
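The group contents for a few inputs can be listed with a short sketch. It uses Python rather than JavaScript (an assumption for illustration; these constructs behave the same in both), and the name split_at_first_dot is made up.

```python
import re

split_at_first_dot = re.compile(r"([^.]*)\.(.*)")

inputs = ["file.ext", "foo...bar", ".foobar", ".", "no dots here"]
groups = {}
for text in inputs:
    m = split_at_first_dot.search(text)
    # groups() returns (before first dot, after first dot); None means no match.
    groups[text] = m.groups() if m else None
    print(repr(text), groups[text])
```

A lone "." does match, with both groups empty, while a string without any dot does not match at all.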
IMO /^.*\..*/g would match the same overall text, although without parentheses it captures no groups.
const senExample = 'I am test. Food is good.';
const result1 = senExample.match(/([^.]*)\.(.*)/g);
console.log(result1); // ["I am test. Food is good."]
const result2 = senExample.match(/^.*\..*/g);
console.log(result2); // ["I am test. Food is good."]
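The overall match is indeed the same for this sentence, but the two patterns stop behaving alike once capture groups are compared. The sketch below is in Python rather than JavaScript (an assumption for illustration) and adds parentheses to the second pattern so that the split point becomes visible.

```python
import re

sentence = "I am test. Food is good."

# [^.]* cannot cross a period, so the split happens at the FIRST dot.
first_dot = re.search(r"([^.]*)\.(.*)", sentence)

# A greedy .* runs to the end and backtracks, so the split happens at the LAST dot.
last_dot = re.search(r"(.*)\.(.*)", sentence)

print(first_dot.group(0) == last_dot.group(0))  # True
print(first_dot.groups())  # ('I am test', ' Food is good.')
print(last_dot.groups())   # ('I am test. Food is good', '')
```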
the . character matches any character except line break characters (\r or \n)
the ^ as the first character inside square brackets negates the class, and inside the brackets the dot is a literal period
the * means "zero or more times"
the parentheses group and capture,
the \ allows you to match a special character (like the dot or the star)
so this ([^.]*) means any character other than a period, repeated zero or more times (this can include line breaks).
this (.*) part means any string of characters zero or more times (except the line breaks)
and the \. means a real dot
so the whole thing would match zero or more non-period characters followed by a dot followed by any number of characters.
For more information and a really great reference on Regular Expressions check out: http://www.regular-expressions.info/reference.html
It's a regular expression, which basically is a pattern of characters that is used to describe another pattern of characters. I once used regexps to find an email address inside a text file, and they can be used to find pretty much any pattern of text within a larger body of text provided you write the regexp properly.