Based on this example, I'm using $expr and $regexMatch to implement "reverse regex" queries in MongoDB, where the regex is stored in the document and the input string is supplied in the query. For instance, this example works.
However, this only seems to work when the regex is in a top-level field of the document. When the regex is inside an element of an array (as in this other example), I get errors like this:
query failed: (Location51105) Executor error during find command :: caused by :: $regexMatch needs 'regex' to be of type string or regex
Is there any way of supporting this case?
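The regex argument of $regexMatch accepts only a string (or regex). A path into an array such as "$patterns.pattern" resolves to an array, which is why you get the error. You can use the $map operator to loop over the array elements and check the condition:
$map iterates over the patterns.pattern array and evaluates $regexMatch for each element, producing an array of boolean values.
$anyElementTrue returns true if any element of that array is true.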
db.collection.find({
  "$expr": {
    "$anyElementTrue": {
      "$map": {
        "input": "$patterns.pattern",
        "in": {
          "$regexMatch": {
            "input": "Room1",
            "regex": "$$this",
            "options": "i"
          }
        }
      }
    }
  }
})
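To make the logic of that pipeline concrete, here is a rough Go analogue that runs outside MongoDB. For each document's list of stored patterns, it tests the fixed input and reports whether any pattern matches. This is only an illustration: anyPatternMatches and the sample documents are hypothetical, and Go's RE2 syntax differs from the PCRE-style regexes MongoDB uses, so complex patterns may behave differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// anyPatternMatches combines the roles of $map and $anyElementTrue: it tests
// the fixed input against each stored pattern and reports whether at least one
// matches. It stops at the first match, which yields the same boolean result.
// The "(?i)" prefix plays the role of the "i" option.
func anyPatternMatches(input string, patterns []string) (bool, error) {
	for _, p := range patterns {
		re, err := regexp.Compile("(?i)" + p)
		if err != nil {
			return false, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		if re.MatchString(input) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical documents, each storing its own patterns (like patterns.pattern).
	docs := []struct {
		id       string
		patterns []string
	}{
		{"doc1", []string{`^room\d+$`, `^hall`}},
		{"doc2", []string{`^kitchen`}},
	}
	for _, d := range docs {
		ok, err := anyPatternMatches("Room1", d.patterns)
		if err != nil {
			fmt.Println(d.id, "error:", err)
			continue
		}
		fmt.Println(d.id, ok)
	}
}
```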
For a fun exercise, I wondered whether I could tokenize simple arithmetic expressions (containing only positive integers and the four basic operations) using a regular expression, so I came up with the following:
However, the test cases below do not behave as I expected, producing the failures listed at the end (Go Playground):
package main

import (
	"reflect"
	"regexp"
	"testing"
)

func TestParseCalcExpression(t *testing.T) {
	re := regexp.MustCompile(`^(\d+)(?:([*/+-])(\d+))*$`)
	for _, eg := range []struct {
		input    string
		expected [][]string
	}{
		{"1", [][]string{{"1", "1", "", ""}}},
		{"1+1", [][]string{{"1+1", "1", "+", "1"}}},
		{"22/7", [][]string{{"22/7", "22", "/", "7"}}},
		{"1+2+3", [][]string{{"1+2+3", "1", "+", "2", "+", "3"}}},
		{"2*3+5/6", [][]string{{"2*3+5/6", "2", "*", "3", "+", "5", "/", "6"}}},
	} {
		actual := re.FindAllStringSubmatch(eg.input, -1)
		if !reflect.DeepEqual(actual, eg.expected) {
			t.Errorf("expected parse(%q)=%#v, got %#v", eg.input, eg.expected, actual)
		}
	}
}
// === RUN TestParseCalcExpression
// prog.go:24: expected parse("1+2+3")=[][]string{[]string{"1+2+3", "1", "+", "2", "+", "3"}}, got [][]string{[]string{"1+2+3", "1", "+", "3"}}
// prog.go:24: expected parse("2*3+5/6")=[][]string{[]string{"2*3+5/6", "2", "*", "3", "+", "5", "/", "6"}}, got [][]string{[]string{"2*3+5/6", "2", "/", "6"}}
// --- FAIL: TestParseCalcExpression (0.00s)
// FAIL
I was hoping that the zero-or-more repetition of the non-capturing group ((?:...)*), which identifies and groups operators and numbers (([*/+-])(\d+)), would capture every occurrence of that subexpression, but only the last one appears in the result.
On the one hand, this makes sense because the regex literally has only three capturing groups, so any resulting match can contain only three submatches (plus the full match). On the other hand, the zero-or-more repetition makes it seem as if all the "middle" repeated items are missing in the failed tests (e.g. +2 in 1+2+3).
// expected parse("1+2+3")=
// [][]string{[]string{"1+2+3", "1", "+", "2", "+", "3"}},
// got [][]string{[]string{"1+2+3", "1", "+", "3"}}
Is there a way to parse these kinds of arithmetic expressions using Go regular expressions, or is this a fundamental limitation of regular expressions (or of Go/RE2 regexps, or of the general combination of non-capturing and capturing groups)?
(I realize I could just split by word boundaries and scan the tokens to validate the structure but I'm more interested in this limitation of non/capturing groups than the example problem.)
package main

import (
	"reflect"
	"regexp"
	"testing"
)

func TestParseCalcExpression(t *testing.T) {
	re := regexp.MustCompile(`(\d+)([*/+-]?)`)
	for _, eg := range []struct {
		input    string
		expected [][]string
	}{
		{"1", [][]string{{"1", "1", ""}}},
		{"1+1", [][]string{{"1+", "1", "+"}, {"1", "1", ""}}},
		{"22/7", [][]string{{"22/", "22", "/"}, {"7", "7", ""}}},
		{"1+2+3", [][]string{{"1+", "1", "+"}, {"2+", "2", "+"}, {"3", "3", ""}}},
		{"2*3+5/6", [][]string{{"2*", "2", "*"}, {"3+", "3", "+"}, {"5/", "5", "/"}, {"6", "6", ""}}},
	} {
		actual := re.FindAllStringSubmatch(eg.input, -1)
		if !reflect.DeepEqual(actual, eg.expected) {
			t.Errorf("expected parse(%q)=%#v, got %#v", eg.input, eg.expected, actual)
		}
	}
}
As mentioned in this question about Swift (I'm not a Swift or regex expert, so I'm just guessing this applies to Go as well), you can only get one match for each capturing group in your regex. When the group repeats, only its last iteration is captured.
From the Go standard library regexp package documentation:
If 'Submatch' is present, the return value is a slice identifying the successive submatches of the expression. Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression, numbered from left to right in order of opening parenthesis. Submatch 0 is the match of the entire expression, submatch 1 the match of the first parenthesized subexpression, and so on.
Given this convention, returning multiple matches per capturing group would break the numbering, and you wouldn't know which items were associated with which group. A regex engine could in principle return multiple matches per group, but this package couldn't do that without breaking the convention stated in the documentation.
My solution is to make your problem more regular. Instead of treating the entire expression as one match, which limits us to a fixed number of submatches per match, we treat the expression as a series of pairs.
Each pair is composed of a number (\d+), and an optional operator ([*/+-]?).
Then, calling FindAllStringSubmatch on the whole expression extracts a series of these pairs, giving the number and operator for each.
For example:
"1+2+3"
returns
[][]string{{"1+", "1", "+"}, {"2+", "2", "+"}, {"3", "3", ""}}
This only tokenizes the expression; it doesn't validate it. If you need the expression to be validated, then you'll need another initial regex match to verify that the string is indeed an unbroken series of these pairs.
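Following that suggestion, here is a minimal validate-then-tokenize sketch. The validation pattern ^(?:\d+[*/+-])*\d+$ is my own assumption (no spaces, signs, or parentheses): it only checks that the string is an unbroken series of number/operator pairs ending in a number. The helper name tokenize is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// validExpr accepts only an unbroken series of number/operator pairs that
	// ends in a number, e.g. "2*3+5/6". Assumption: no spaces, signs or parens.
	validExpr = regexp.MustCompile(`^(?:\d+[*/+-])*\d+$`)
	// pair is the tokenizing regex from the answer: a number and an optional operator.
	pair = regexp.MustCompile(`(\d+)([*/+-]?)`)
)

// tokenize returns the numbers and operators of expr in order, or an error
// if expr is not well formed under the rules above.
func tokenize(expr string) ([]string, error) {
	if !validExpr.MatchString(expr) {
		return nil, fmt.Errorf("invalid expression %q", expr)
	}
	var tokens []string
	for _, m := range pair.FindAllStringSubmatch(expr, -1) {
		tokens = append(tokens, m[1]) // the number
		if m[2] != "" {
			tokens = append(tokens, m[2]) // the operator, if present
		}
	}
	return tokens, nil
}

func main() {
	for _, in := range []string{"1+2+3", "2*3+5/6", "1++2"} {
		tokens, err := tokenize(in)
		fmt.Println(in, tokens, err)
	}
}
```

Validation and tokenizing stay as two separate regexes because, as discussed above, a single match cannot hand back a variable number of captures.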
In my validation form, I am checking whether the same character is repeated more than two times.
I have tried the expression ([a-zA-Z0-9])\1{2,}, but it doesn't work properly: if I enter aaA it finds a match, and it shouldn't, because "aaA" is permitted. It also doesn't check for repeated special characters.
Here is how I applied my code:
this.form = this.formBuilder.group(
{
newpassword: new FormControl(
'',
Validators.compose([
Validators.required,
CustomValidators.patternValidator(/[(\[a-zA-Z0-9\])\\1{2,}]/, {
hasRepeatedCharacters: true,
}),
])
),
},
{ validators: this.password }
);
Any idea?
If I understand correctly what you are considering to be invalid, you want this:
/(.)\1{2,}/
Use the following regex to detect any character that appears three or more times in a row (one occurrence followed by two or more repeats):
(.)\1{2,}
In order to capture aaA (repeated letters irrespective of their case) as well, you'll need to add the case-insensitive i flag.
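Go's regexp package uses RE2 syntax, which has no backreferences, so (.)\1{2,} cannot be compiled there. As a rough substitute, here is a hand-written run check in Go; the function name hasRun and its foldCase switch (standing in for the i flag via unicode.ToLower) are my own assumptions, not part of the answers above.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasRun reports whether some character occurs at least n times in a row in s.
// It is a hand-written stand-in for /(.)\1{2,}/ (n = 3), since Go's RE2-based
// regexp package does not support backreferences such as \1.
// With foldCase set, characters are compared after unicode.ToLower,
// roughly like adding the i flag.
func hasRun(s string, n int, foldCase bool) bool {
	var prev rune
	count := 0
	for i, r := range []rune(s) {
		if foldCase {
			r = unicode.ToLower(r)
		}
		if i > 0 && r == prev {
			count++ // same character as before: extend the run
		} else {
			count = 1 // a new run starts here
		}
		if count >= n {
			return true
		}
		prev = r
	}
	return false
}

func main() {
	for _, pw := range []string{"aaA", "aaa", "ab!!!", "abab"} {
		fmt.Printf("%q: exact=%v, ignoreCase=%v\n", pw, hasRun(pw, 3, false), hasRun(pw, 3, true))
	}
}
```

Because every rune is compared, special characters such as "!" are covered too, which addresses the second concern in the question.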
You can use /(.)(?=.*\1.*\1)/, assuming the repeated characters may be non-consecutive; it matches if any character occurs at least three times anywhere in the string:
const pat = /(.)(?=.*\1.*\1)/;
[
"a",
"aa",
"aaa",
"zba1a1za",
"aaA",
"aaAA",
"aAaAa",
"aAbbAb",
].forEach(e => console.log(`'${e}' => ${pat.test(e)}`));
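For comparison, the same non-consecutive check can be written in Go without a backreference (which RE2 cannot express) by counting characters. This is a sketch under the assumption that comparison is case-sensitive, as in the JavaScript pattern; occursAtLeast is a hypothetical name.

```go
package main

import "fmt"

// occursAtLeast reports whether any character appears n or more times anywhere
// in s, consecutive or not. For n = 3 this asks the same question as the
// lookahead /(.)(?=.*\1.*\1)/ (case-sensitively), using a count instead of a
// backreference.
func occursAtLeast(s string, n int) bool {
	counts := make(map[rune]int)
	for _, r := range s {
		counts[r]++
		if counts[r] >= n {
			return true
		}
	}
	return false
}

func main() {
	// Same inputs as the JavaScript snippet above.
	for _, e := range []string{"a", "aa", "aaa", "zba1a1za", "aaA", "aaAA", "aAaAa", "aAbbAb"} {
		fmt.Printf("'%s' => %v\n", e, occursAtLeast(e, 3))
	}
}
```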
I have a problem with counting words.
I want to count words in projects.log.subject,
e.g. count [A], [B], [C], ...
I searched for how to use mapReduce, but I don't understand how to use it to get the result I want.
{
"_id": ObjectID("569f3a3e9d2540764d8bde59"),
"A": "book",
"server": "us",
"projects": [
{
"domainArray": [
{
~~~~
}
],
"log": [
{
~~~~~,
"subject": "[A][B]I WANT THIS"
}
],
"before": "234234234"
},
{
"domainArray": [
{
~~~~
}
],
"log": [
{
~~~~~,
"subject": "[B][C]I WANT THIS"
}
],
"before": "234234234"
},....
] //end of projects
}//end of document
The basic principle is to test each regular expression against the source string and emit the number of matches found. In mapReduce terms, you want your "mapper" function to possibly emit multiple values, one per "term" as the key, for every array element present in each document.
So you want a source array of terms to build regular expressions from (likely just a word list), which you iterate over and test against each array member.
Basically something like this:
db.collection.mapReduce(
function() {
var list = ["the", "quick", "brown" ]; // words you want to count
this.projects.forEach(function(project) {
project.log.forEach(function(log) {
list.forEach(function(word) {
var res = log.subject.match(new RegExp("\\b" + word + "\\b","ig"));
if ( res != null )
emit(word,res.length); // returns number of matches for word
});
});
});
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
So the loop processes the array elements in the document and then tests each word with a regular expression. The .match() method returns an array of matches in the string, or null if none was found. Note the i and g options on the regex, which make the search case-insensitive and find all matches rather than just the first. The m (multiline) flag only changes how ^ and $ behave, so it isn't needed for these \b-delimited patterns even if the text contains line breaks.
If null is not returned, then we emit the current word as the "key" and the count as the length of the matched array.
The reducer then takes all output values from those emit calls in the mapper and simply adds up the emitted counts.
The result will be one document per "word/term" provided, keyed by that term, with the total count of occurrences in the inspected field across the collection. For more fields, just add more logic to sum up the results, or similarly just keep "emitting" in the mapper and let the reducer do the work.
Note that "\\b" represents a word boundary; the backslash is doubled because the expression is constructed from a string. You need these boundaries to distinguish "the" from "then", for example, by specifying where the word/term ends.
Also, characters like [ and ] are reserved in regular expressions, so if you are actually looking for strings like that, you need to escape them as well, i.e.:
"\\[A\\]"
But if you are actually doing that, then remove the word boundary characters:
new RegExp( "\\[A\\]", "ig" )
since the brackets already delimit the term (and \b would not match before a "[" that isn't preceded by a word character).
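To check the counting logic outside MongoDB, here is a rough Go analogue of the mapper and reducer above. It is not the mapReduce API: the inner loop plays the mapper (count matches of each term per subject) and the running sum plays the reducer. Terms are escaped with regexp.QuoteMeta so that "[A]" matches literally, and no word boundary is added, following the advice for bracketed terms. countTerms is a hypothetical name; unlike the mapper, which emits nothing for a term with no matches, it records a zero.

```go
package main

import (
	"fmt"
	"regexp"
)

// countTerms returns, for each term, the total number of case-insensitive,
// literal occurrences across all subjects.
func countTerms(subjects, terms []string) map[string]int {
	totals := make(map[string]int)
	for _, term := range terms {
		// QuoteMeta escapes [ and ] (and any other metacharacters), so the
		// term is matched as plain text; (?i) mirrors the "i" option.
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(term))
		for _, s := range subjects {
			// "mapper": count matches in this subject; "reducer": add them up.
			totals[term] += len(re.FindAllStringIndex(s, -1))
		}
	}
	return totals
}

func main() {
	// Subjects taken from the example document's projects.log entries.
	subjects := []string{"[A][B]I WANT THIS", "[B][C]I WANT THIS"}
	fmt.Println(countTerms(subjects, []string{"[A]", "[B]", "[C]"}))
}
```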
I have a map function as follows, which reads from an array of lines generated by a unix command.
my %versions = map {
    if (m/(?|(?:^Patch\s(?(?=description).*?(\w+)\sPATCH).*?(\d+(?:\.\d+)+).*)|(?:^(OPatch)\s(?=version).*?(\d+(\.\d+)+)))/)
    { 'hello' => 'bye'; }
} @dbnode_versions;
print Dumper(\%versions); gives
$VAR1 = {
'' => undef,
'hello' => 'bye',
'bye' => ''
};
which I find extremely odd, as the hello and bye values should only get added if the regex matches. Is anyone able to help me out?
Well, you have to consider what happens when the regex doesn't match, and the if is false. The if will evaluate to some value, although you shouldn't rely on the value of a statement.
In particular, if (cond) { expression } is roughly equivalent to cond and expression. This means that if the regex (our cond) does not match, we get a false value.
use Data::Dump;
dd [map { /foo(bar)/ and (hello => 'bye') } qw/foo foobar bar/];
What is your expected output? You may have thought ["hello", "bye"]. But actually, we get
["", "hello", "bye", ""]
because "" represents the false value returned by the regex match on failure.
If you want to return nothing in failure cases, you should explicitly return an empty list:
map { /foo(bar)/ ? (hello => 'bye') : () } qw/foo foobar bar/
or use grep, which filters a list for those elements that match a condition:
my %hash =
map { hello => 'bye' } # replace each matching element
grep { /foo(bar)/ } # filter for matching elements
qw/foo foobar bar/;
The %hash will then be either () or (hello => 'bye'), as each key can only occur once.
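Go's regexp package supports neither the branch-reset group (?|...) nor the conditional (?(...)...) used in the question, so this rough Go sketch uses a simplified, hypothetical pattern and made-up sample lines. What it illustrates is the fix from the answer: build the map only from lines that match, so a failed match contributes nothing (the counterpart of grep before map, or of returning () on failure) instead of a stray false value.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionLine is a simplified, hypothetical pattern (not the original Perl
// regex): it captures a component name and a dotted version number.
var versionLine = regexp.MustCompile(`^(\w+)\s+version\s+(\d+(?:\.\d+)+)`)

// parseVersions adds an entry only for lines that match; non-matching lines
// are skipped entirely rather than producing an empty key or value.
func parseVersions(lines []string) map[string]string {
	versions := make(map[string]string)
	for _, line := range lines {
		m := versionLine.FindStringSubmatch(line)
		if m == nil {
			continue // no match: add nothing
		}
		versions[m[1]] = m[2]
	}
	return versions
}

func main() {
	// Hypothetical command output.
	lines := []string{
		"OPatch version 1.2.3",
		"some unrelated output",
		"Tool version 4.5",
	}
	fmt.Println(parseVersions(lines))
}
```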