MarkLogic matches function with regular expression - regex

I am new to XQuery. Could you please help me understand what § and §.*$ mean in the MarkLogic XQuery below?
if (matches($cite, '§'))
replace($cite,'§.*$','')
here $cite := "HI CONST Preamble"

In a regular expression, $ is an anchor that matches at the end of the input string.
§ is the section-sign character (in XQuery it can also be written as a numeric character reference such as &#xA7;), . is a wildcard that matches any character (other than a newline, by default), and * is a quantifier meaning zero or more.
The matches() expression tests whether $cite contains the § character. If it does, replace() removes the § and every character after it, up to the end of the input.
For example:
let $cite := "HI CONST Preamble"
return
if (matches($cite, '§'))
then replace($cite,'§.*$','')
else "no match"
returns: "no match" because it doesn't contain the § at all.
However, this:
let $cite := "HI CONST Preamble §foo bar baz."
return
if (matches($cite, '§'))
then replace($cite,'§.*$','')
else "no match"
Returns: "HI CONST Preamble " (note the trailing space, which sits before the § and is therefore kept) because it does contain §, so §foo bar baz. is replaced with "".
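The same check-then-strip logic can be sketched outside XQuery. The following is a Python analogue using the standard re module, not MarkLogic code; the helper name strip_from_section_sign is made up for illustration.

```python
import re

def strip_from_section_sign(cite):
    # Hypothetical helper mirroring the XQuery if/then/else above.
    if re.search("§", cite):
        # Remove the section sign and everything after it, up to the end.
        return re.sub("§.*$", "", cite)
    return "no match"

print(repr(strip_from_section_sign("HI CONST Preamble")))
# 'no match'
print(repr(strip_from_section_sign("HI CONST Preamble §foo bar baz.")))
# 'HI CONST Preamble '  (the space before § is not part of the match)
```

If the trailing space is unwanted, the pattern could be widened to '\s*§.*$' so that whitespace before the § is removed as well.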


How can I use regular expressions to find all lines of source code defining default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
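To see how the first pattern behaves on the lines from the question, here is a small Python check. It is only a sketch: the sample list and the name DEFAULT_ARG are made up for illustration, and the last sample is an extra untyped Python signature that does not appear in the question.

```python
import re

# Pattern from the answer above: "(" ... type, space(s), name, "=" ... ")".
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")

samples = [
    "int sum(int a, int b=10, int c=20);",  # typed default arguments
    "if (x = 10) {",                        # condition clause
    "int x = 5;",                           # plain assignment, no parentheses
    "def total(a, b=10):",                  # untyped Python parameters
]

for line in samples:
    print(bool(DEFAULT_ARG.search(line)), line)
# True  int sum(int a, int b=10, int c=20);
# False if (x = 10) {
# False int x = 5;
# False def total(a, b=10):
```

Because the pattern requires a type word before the parameter name, it skips the if-condition but also misses untyped Python parameters, so a separate pattern would likely be needed for Python sources.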
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.

recursive matching for string delimiter with regular expression

In Verilog, statements are enclosed in begin-end delimiters instead of braces.
always @(*) begin
if (condA) begin
a = c
end
else begin
b = d
end
end
I'd like to parse the outermost begin-end block, together with its statements, to check coding rules in Python. Using a regular expression, I want a result like:
if (condA) begin
a = c
end
else begin
b = d
end
I found a similar answer for brace delimiters:
int funcA() {
if (condA) {
b = a
}
}
regular expression:
/({(?>[^{}]+|(?R))*})/g
However, I don't know how to modify atomic group ([^{}]) for "begin-end"?
/(begin(?>[??????]+|(?R))*end)/g
The point of the [??????]+ part is to match any text that contains no delimiter. With single-character delimiters a negated character class does that, but begin and end are multi-character, so a lookahead is needed instead.
So, in your case, you need to match any char that does not start a begin or end substring:
/begin(?>(?!begin|end).|(?R))*end/gs
The . here will match any char including line break chars due to the s modifier. Note that the actual implementation might need adjustments (e.g. in PHP, the g modifier should not be used as there are specific functions/features for that).
Also, since you recurse the whole pattern, you need no outer parentheses.
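Python's built-in re module does not support (?R) recursion, so the pattern above would need a different engine there. As an alternative that stays within the standard library, the sketch below swaps the recursive regex for a plain depth counter over begin/end tokens. The function name outermost_blocks and the sample text are made up for illustration, and the tokenizer is naive: it does not skip comments or string literals.

```python
import re

def outermost_blocks(source):
    """Return the text of each outermost begin...end block in source."""
    blocks = []
    depth = 0
    start = None
    # \b keeps "end" from matching inside words such as "endmodule".
    for token in re.finditer(r"\b(?:begin|end)\b", source):
        if token.group() == "begin":
            if depth == 0:
                start = token.start()  # an outermost block opens here
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:             # the outermost block closes here
                blocks.append(source[start:token.end()])
    return blocks

VERILOG = """always @(*) begin
  if (condA) begin
    a = c
  end
  else begin
    b = d
  end
end
"""

blocks = outermost_blocks(VERILOG)
# Drop the outer keywords to keep only the enclosed statements.
body = blocks[0][len("begin"):-len("end")].strip()
print(body)
```

The counter handles arbitrary nesting depth, which is the part a non-recursive regex cannot express.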

How to remove only symbols from string in dart

I want to remove all special symbols from a string and keep only the words.
I tried this, but the output is the same as the input:
main() {
String s = "Hello, world! i am 'foo'";
print(s.replaceAll(new RegExp('\W+'),''));
}
output : Hello, world! i am 'foo'
expected : Hello world i am foo
There are two issues:
'\W' is not a valid escape sequence in a regular Dart string literal (the backslash is dropped, so the pattern becomes W+). To put a backslash in the pattern, write \\ or use a raw string literal (r'...').
The \W regex pattern matches any char that is not a word char, including whitespace. To keep the spaces, use a negated character class with the word and whitespace classes, [^\w\s].
Use
void main() {
String s = "Hello, world! i am 'foo'";
print(s.replaceAll(new RegExp(r'[^\w\s]+'),''));
}
Output: Hello world i am foo.
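The difference between the two character classes is easy to check in any regex engine. Here is a quick Python analogue using the standard re module (not Dart code), with the same input string:

```python
import re

s = "Hello, world! i am 'foo'"

# \W+ also removes whitespace, which glues the words together.
print(re.sub(r"\W+", "", s))       # Helloworldiamfoo

# Negated class: drop anything that is neither a word char nor whitespace.
print(re.sub(r"[^\w\s]+", "", s))  # Hello world i am foo
```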
Fully Unicode-aware solution
Based on the post "What's the correct regex range for javascript's regexes to match all the non word characters in any script?", and bearing in mind that \w in a Unicode-aware regex is equal to [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}], you can use the following in Dart:
void main() {
String s = "Hęllo, wórld! i am 'foo'";
String regex = r'[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s]+';
print(s.replaceAll(RegExp(regex, unicode: true),''));
}
// => Hęllo wórld i am foo
The docs for the RegExp class state that you should use raw strings (a string literal prefixed with an r, like r"Hello world") if you're constructing a regular expression that way. This is particularly necessary where you're using escapes.
In addition, your regex is going to catch spaces as well, so you'll need to modify it. You can use RegExp(r"[^\s\w]") instead, which matches any character that's not whitespace or a word character.
I found this question looking for how to remove a symbol from a string. For others who come here wanting to do that:
final myString = 'abc=';
final withoutEquals = myString.replaceAll(RegExp('='), ''); // abc
First solution
s.replaceAll(RegExp(",|!|'"), ""); // The | operator works as OR
Second solution
s.replaceAll(",", "").replaceAll("!", "").replaceAll("'", "");
Removing characters "," from string:
String myString = "s, t, r";
myString = myString.replaceAll(",", ""); // myString is "s t r"

Why does this Swift RegEx match "e1234"?

Here's the regular expression:
let legalStr = "(?:[eE][\\+\\-]?[0-9]{1,3})?$"
Here's the invocation:
if let match = sender.stringValue.rangeOfString(legalStr, options: .RegularExpressionSearch) {
print("\(sender.stringValue) is legal")
}
else {
print( "\(sender.stringValue) is not legal")
}
If I type garbage, like "abcd", it returns illegal string.
If I type something like "e123" it returns legal string.
(note that the empty string is also legal.)
However, if I type "e1234" it still returns "legal". I'd expect it to return "not legal". Am I missing something here? BTW, note the "$" at the end of the regular expression. The three digits should appear at the end of the string.
If it's not immediately clear, the source of the string is a text edit box.
Your pattern is anchored only at the end, and the whole group is optional, so it can match the empty string. Any string at all will therefore match successfully: the pattern simply matches the empty string at the end.
Add a ^ to the front to anchor it on that side, too.
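The effect of the missing anchor can be reproduced in any regex engine. Below is a Python analogue using the standard re module (not Swift code); the names END_ONLY and BOTH_ENDS and the list of test strings are made up for illustration.

```python
import re

END_ONLY = r"(?:[eE][+\-]?[0-9]{1,3})?$"    # pattern from the question
BOTH_ENDS = r"^(?:[eE][+\-]?[0-9]{1,3})?$"  # with ^ added as suggested

for text in ["", "e123", "E-12", "e1234", "abcd"]:
    loose = re.search(END_ONLY, text) is not None
    strict = re.search(BOTH_ENDS, text) is not None
    print(repr(text), loose, strict)
# ''      True True
# 'e123'  True True
# 'E-12'  True True
# 'e1234' True False
# 'abcd'  True False
```

With only the end anchor, every input is accepted because the optional group can match nothing; with both anchors, only the empty string or a complete exponent is accepted.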

Difference between std::regex_match & std::regex_search?

The program below tries to fetch the "Day" information using C++11 std::regex_match and std::regex_search. The first function returns false and the second returns true (as expected). I have read the documentation and the existing SO question "Difference between regex_match and regex_search?", but I still do not understand the difference between these two functions or when to use each of them. Can they be used interchangeably for any common problem?
#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string input{ "Mon Nov 25 20:54:36 2013" };
    // Day: exactly two digits surrounded by whitespace on both sides
    std::regex r{R"(\s\d{2}\s)"};
    //std::regex r{"\\s\\d{2}\\s"};
    std::smatch match;
    if (std::regex_match(input, match, r)) {
        std::cout << "Found" << "\n";
    } else {
        std::cout << "Did Not Found" << "\n";
    }
    if (std::regex_search(input, match, r)) {
        std::cout << "Found" << "\n";
        if (match.ready()) {
            std::string out = match[0];
            std::cout << out << "\n";
        }
    } else {
        std::cout << "Did Not Found" << "\n";
    }
}
Output
Did Not Found
Found
25
Why does the first function return false in this case? The regex seems to be correct, so I expected both to return true. I also changed std::regex_match(input,match,r) to std::regex_match(input,r) and it still returns false.
Could somebody explain the above example and, in general, use cases of these methods?
regex_match only returns true when the entire input sequence has been matched, while regex_search will succeed even if only a sub-sequence matches the regex.
Quoting from N3337,
§28.11.2/2 regex_match [re.alg.match]
Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). ... Returns true if such a match exists, false otherwise.
The above description is for the regex_match overload that takes a pair of iterators to the sequence to be matched. The remaining overloads are defined in terms of this overload.
The corresponding regex_search overload is described as
§28.11.3/2 regex_search [re.alg.search]
Effects: Determines whether there is some sub-sequence within [first,last) that matches the regular expression e. ... Returns true if such a sequence exists, false otherwise.
In your example, if you modify the regex to r{R"(.*?\s\d{2}\s.*)"}; both regex_match and regex_search will succeed (but the match result is not just the day, but the entire date string).
It's very simple. regex_search looks through the string to find if any portion of the string matches the regex. regex_match checks if the whole string is a match for the regex. As a simple example, given the following string:
"one two three four"
If I use regex_search on that string with the expression "three", it will succeed, because "three" can be found in "one two three four".
However, if I use regex_match instead, it will fail, because "three" is not the whole string, but only a part of it.
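The whole-string versus substring distinction is not specific to C++. As a rough analogue (not the C++ API), Python's re.fullmatch and re.search play similar roles to std::regex_match and std::regex_search, so the example from the question can be sketched like this:

```python
import re

text = "Mon Nov 25 20:54:36 2013"
day = r"\s\d{2}\s"

# Whole string must match the pattern: fails, like std::regex_match.
print(re.fullmatch(day, text))              # None

# Any substring may match: succeeds, like std::regex_search.
print(repr(re.search(day, text).group()))   # ' 25 '

# Padding the pattern lets the whole-string match succeed too;
# a capture group then isolates the day.
whole = re.fullmatch(r".*?\s(\d{2})\s.*", text)
print(whole.group(1))                       # 25
```

Use the whole-string form to validate an input and the substring form to find something inside it.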