MongoDB: Match multiple values in string field - regex

I have a collection of entities that contain a string field. I'm looking for a way to query the collection with a combined number of values, and get all entities that contain all of these values, with these specifications:
- contain ALL provided query values, not just some of them
- case-insensitive
- regardless of order
- 'word' query values can be part of something bigger (for example, separated by _ or any other character)
So as an example, if I provide these words as the query values:
i am spiderman
(I can separate them by whitespace, give an array, or whatever works..)
I expect these results:
- "i am_spiderMan" // should match
- "AM i spiderman?!" // should match
- "who am I? supermanspiderman" // should match
- "I am superman" // should not match
- "i am spider_man" // should not match
I hope this covers all the cases I tried to describe.
I tried regex, and also did some research with similar questions but could not get it to work.

You can use regular expressions. Split the sentence into words and put them in an array, as shown below. $all requires every regular expression in the array to match, and the i flag makes each one case-insensitive.
db.collection.find({ key: { $all: [ /spiderman/i, /i/i, /am/i ] } })
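To check the logic of the $all query outside the database, here is a minimal Python sketch. It assumes each word should be treated as a literal substring, so it escapes the words with re.escape. The helper names build_all_filter and matches_all are made up for illustration, and the returned dict is only the kind of filter one would hand to a driver such as PyMongo, which accepts compiled patterns as query values.

```python
import re

def build_all_filter(field, sentence):
    """Build a MongoDB-style filter requiring every word to occur in `field`."""
    # re.escape keeps user-supplied words literal; IGNORECASE mirrors the /i flag.
    patterns = [re.compile(re.escape(word), re.IGNORECASE) for word in sentence.split()]
    return {field: {"$all": patterns}}

def matches_all(value, sentence):
    """Local emulation: True when every word occurs somewhere in `value`."""
    patterns = build_all_filter("key", sentence)["key"]["$all"]
    return all(p.search(value) for p in patterns)

samples = [
    "i am_spiderMan",
    "AM i spiderman?!",
    "who am I? supermanspiderman",
    "I am superman",
    "i am spider_man",
]

# Keep only the strings that contain all three words, in any order.
matched = [s for s in samples if matches_all(s, "i am spiderman")]
print(matched)
```

Only the first three sample strings pass, which agrees with the expected results listed in the question.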

Related

Replacing Everything Except specific pattern BigQuery

I would like to use regex to replace everything (except a specific pattern) with an empty string in BigQuery. I have the following values:
AX/88/8888888
AX/99/999999
AX/11/222222 - AX/22/33333 - AX/999/99999
BX/99/9999
1234455121
AX/00/888888 // BX/890/90890
NULL
[XYZ-ASA
BX/890/90890 + AX/10/1010101
AX/99/9999M
AX/111/111,AX-99
AX/11/222222 BX/99/99 AX/22/33333
The pattern always starts with "AX", followed by a slash (/), some digits, another slash (/), and more digits. (The pattern is always AX/\d+/\d+.)
I would like to remove anything (any character, bracket, digit, etc.) that doesn't follow the pattern mentioned above.
In the dataset above, BX/99/9999, 1234455121, NULL and [XYZ-ASA are the only values where the pattern doesn't match at all, meaning that no part of the value matches AX/\d+/\d+. In those cases, I would like to return the original text as the final output.
Where the pattern does match, for example in AX/00/888888 // BX/890/90890 and AX/111/111,AX-99, the remaining parts ([// BX/890/90890] and [,AX-99]) need to be removed, so that only AX/00/888888 and AX/111/111 are returned as the final output.
The expected output for the above example is the following:
AX/88/8888888
AX/99/999999
AX/11/222222 AX/22/33333 AX/999/99999
BX/99/9999
1234455121
AX/00/888888
NULL
[XYZ-ASA
AX/10/1010101
AX/99/9999
AX/111/111
AX/11/222222 AX/22/33333
Later I would like to split all the values by space, so that each AX/xx/xx ends up on its own row. For example, case 3 above would produce 3 rows:
AX/88/8888888
AX/99/999999
AX/11/222222
AX/22/33333
AX/999/99999
BX/99/9999
1234455121
AX/00/888888
NULL
[XYZ-ASA
AX/10/1010101
AX/99/9999
AX/111/111
AX/11/222222
AX/22/33333
Use the query below:
select coalesce(result, col) as col
from your_table
left join unnest(regexp_extract_all(col, r'AX/\d+/\d+')) result
Applied to the sample data in your question, this returns one row per AX/\d+/\d+ match, and the original value for rows that have no match.
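The behaviour of the query can be reproduced locally with a short Python sketch. It is an emulation, not BigQuery itself: re.findall stands in for regexp_extract_all, None stands in for NULL, and the function name extract_rows is hypothetical. Row order is preserved here for readability, whereas a SQL query without ORDER BY guarantees none.

```python
import re

AX_PATTERN = re.compile(r"AX/\d+/\d+")

def extract_rows(values):
    """Return one row per AX match, or the original value when nothing matches."""
    rows = []
    for value in values:
        # None stands in for SQL NULL; it has no matches and is passed through.
        found = AX_PATTERN.findall(value) if value is not None else []
        # Mirrors coalesce(result, col): fall back to the original value.
        rows.extend(found if found else [value])
    return rows

sample = [
    "AX/88/8888888",
    "AX/99/999999",
    "AX/11/222222 - AX/22/33333 - AX/999/99999",
    "BX/99/9999",
    "1234455121",
    "AX/00/888888 // BX/890/90890",
    None,
    "[XYZ-ASA",
    "BX/890/90890 + AX/10/1010101",
    "AX/99/9999M",
    "AX/111/111,AX-99",
    "AX/11/222222 BX/99/99 AX/22/33333",
]

rows = extract_rows(sample)
for row in rows:
    print(row)
```

Because every match becomes its own row, the later "split by space" step from the question is not needed as a separate pass.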

Wrong regexp query for elasticsearch

I have some problems with the regexp query for Elasticsearch. In my index there's a text field with comma-separated numeric values (IDs), e.g.
2,140,3,2495
And I have the following query term:
"regexp" : {
"myIds" : {
"value" : "^2495,|,2495,|,2495$|^2495$",
"boost" : 1
}
}
But my result list is empty.
Let me say that I know that regexp queries are kind of slow, but the index already exists and is filled with millions of documents, so unfortunately restructuring it is not an option. I need a regex solution.
In Elasticsearch regex, patterns are anchored by default, and ^ and $ are treated as literal characters.
What you mean to use is "2495,.*|.*,2495,.*|.*,2495|2495": 2495, at the start of the string, ,2495, in the middle, ,2495 at the end, or a whole string equal to 2495.
Or, you may use a simpler
"(.*,)?2495(,.*)?"
That means
(.*,)? - an optional text (not including line breaks) ending with ,
2495 - your value
(,.*)? - an optional text (not including line breaks) starting with ,
Here is an online demo showing how this expression works (not a proof though).
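For a quick local check of the simpler expression, the sketch below emulates the implicit anchoring in Python by using re.fullmatch, which requires the pattern to cover the whole value. This is only an approximation of the Elasticsearch engine, and the helper name es_like_match is hypothetical.

```python
import re

def es_like_match(pattern, value):
    # Emulation only: re.fullmatch forces the pattern to cover the whole value,
    # which mimics a regex engine that anchors patterns by default.
    return re.fullmatch(pattern, value) is not None

pattern = r"(.*,)?2495(,.*)?"

checks = {
    "2,140,3,2495": es_like_match(pattern, "2,140,3,2495"),  # ID at the end
    "2495,7": es_like_match(pattern, "2495,7"),              # ID at the start
    "1,2495,7": es_like_match(pattern, "1,2495,7"),          # ID in the middle
    "2495": es_like_match(pattern, "2495"),                  # ID alone
    "12495,7": es_like_match(pattern, "12495,7"),            # longer ID, no match
    "2,24950": es_like_match(pattern, "2,24950"),            # longer ID, no match
}
for value, result in checks.items():
    print(value, result)

# A bare "2495" must cover the entire value, so it misses the ID inside a list.
bare_hit = es_like_match("2495", "2,140,3,2495")
```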
OK, I got it to work, but I have run into another problem now. I built the string as follows:
(.*,)?2495(,.*)?|(.*,)?10(,.*)?|(.*,)?898(,.*)?
It works well for a few IDs, but if I have, say, 50 IDs, ES throws an exception saying that the regexp is too complex to process.
Is there a way to simplify the regexp or restructure the query itself?
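One possible simplification, offered as a sketch rather than a verified fix, is to factor the shared prefix and suffix out of the alternation so that they appear once instead of once per ID. The Python code below builds both forms and checks with re.fullmatch that they accept the same strings. The function names are hypothetical, and whether the shorter expression stays under the Elasticsearch complexity limit for 50 IDs is not something this sketch can confirm.

```python
import re

def build_id_pattern(ids):
    # Factor the shared prefix and suffix out of the per-ID alternatives.
    alternatives = "|".join(re.escape(str(i)) for i in ids)
    return "(.*,)?(" + alternatives + ")(,.*)?"

def build_repeated_pattern(ids):
    # The original shape: the full expression repeated once per ID.
    return "|".join("(.*,)?" + re.escape(str(i)) + "(,.*)?" for i in ids)

ids = [2495, 10, 898]
factored = build_id_pattern(ids)
repeated = build_repeated_pattern(ids)
print(factored)

# Both forms should agree on every test value.
values = ["2,140,3,2495", "10", "7,898,1", "100,8980", "", "2,3"]
agreement = [
    bool(re.fullmatch(factored, v)) == bool(re.fullmatch(repeated, v))
    for v in values
]
print(all(agreement))
```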

Translate specific return query into mgo

I have a query which returns all names from a collection's documents which contain a specific text. In the following example, return all names which contain the sequence "oh" case-insensitively; do not return other fields in the document:
find({name:/oh/i}, {name:1, _id:0})
I have tried to translate this query into mgo:
Find([]bson.M{bson.M{"name": "/oh/i"}, bson.M{"name": "1", "_id": "0"}})
but there are always zero results when using mgo. What is the correct syntax for such a query using mgo?
This question is different from the alleged duplicates because none of those questions deal with how to restrict MongoDB to return only a specific field instead of entire documents.
To execute queries that use regexp patterns for filtering, use the bson.RegEx type.
And to exclude fields from the result documents, use the Query.Select() method.
Like in this example:
c.Find(bson.M{"name": bson.RegEx{Pattern: "oh", Options: "i"}}).
Select(bson.M{"name": 1, "_id": 0})
Translation of the regexp:
name:/oh/i
This matches documents where the name field has a value that contains the "oh" substring, case-insensitively. It can be represented using a bson.RegEx, where the RegEx.Pattern field gets the pattern used in the above expression ("oh"), and RegEx.Options may contain options on how to apply / match the pattern. The doc lists the possible values. If the Options field contains the 'i' character, matching is case-insensitive.
If you have a user-entered term such as "[a-c]", you have to quote the regexp metacharacters, so the final pattern you apply should be "\[a-c\]". To do that easily, use the regexp.QuoteMeta() function, e.g.
fmt.Println(regexp.QuoteMeta("[a-c]")) // Prints: \[a-c\]
Try it on the Go Playground.
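The effect of quoting metacharacters can also be illustrated in Python, where re.escape plays the role of Go's regexp.QuoteMeta (the two functions differ slightly in which characters they escape, so the escaped strings are not always identical). The sketch below is an illustration under that assumption, not mgo code: name_filter is a hypothetical helper that builds a MongoDB-style query using the $regex and $options operators, together with the projection that returns only the name field.

```python
import re

def name_filter(term):
    """Build a MongoDB-style filter and projection for a literal, case-insensitive term."""
    # Escape the term so that user input is matched literally.
    query = {"name": {"$regex": re.escape(term), "$options": "i"}}
    # Return only the name field and suppress _id.
    projection = {"name": 1, "_id": 0}
    return query, projection

user_term = "[a-c]"

# Unescaped, the term is a character class and matches a single "b".
unescaped_hits = bool(re.search(user_term, "b"))

# Escaped, it only matches the literal text "[a-c]".
escaped = re.escape(user_term)
escaped_hits_b = bool(re.search(escaped, "b"))
escaped_hits_literal = bool(re.search(escaped, "range [a-c] here"))

query, projection = name_filter("oh")
print(query, projection)
```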

Parse comma-separated values in an argument list that's separated by commas

So I have this regex:
=([0-9A-Za-z_-]+),?
and I have a string like:
foo=bar,pine=apple,tree,bar=bie
or
foo=bar,pine=apple,tree
or
pine=apple,tree
The regex works for cases where a key has only one value.
But a key's list of values can itself contain commas. In that case the regex fails: my code does half of what I want and doesn't get the second value.
How do I fix my regex to capture both values regardless of where they are in the string (alone, between two others, or at the end)?
I tried a few things but couldn't figure it out.
Attempt 1:
=([0-9A-Za-z,_-]+),=?
In this case, it matches the one where it's in the middle but it fails on the others because = does not exist.
Attempt 2:
=[0-9A-Za-z_-]+([,]+[0-9A-Za-z_-]*),?
It also matches bar,pine and tree,bar, for example.
EDIT::
This seems to work maybe....
=('[0-9A-Za-z,_-]+'),*|=([0-9A-Za-z_-]+),*
if i use quotes for multi values..
You can split on variable names - that will leave only the values:
s := regexp.MustCompile("[^,\\s]+=").Split("foo=bar,pine=apple,tree,bar=bie", -1)
fmt.Println(s)
// => [ bar, apple,tree, bie]  (the first element is an empty string, and values keep their trailing comma)
Go Demo
Regex Demo
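The same split can be tried in Python with re.split and the same expression. The sketch below also shows one assumed cleanup step: since the split leaves an empty first piece and a comma at the end of each group, the hypothetical helper split_values drops empty pieces and strips the trailing comma.

```python
import re

def split_values(arg_string):
    # Split on "name=" tokens, then drop the empty leading piece and
    # the comma left at the end of each value group.
    pieces = re.split(r"[^,\s]+=", arg_string)
    return [p.rstrip(",") for p in pieces if p]

# The raw split keeps the empty first piece and the trailing commas.
raw = re.split(r"[^,\s]+=", "foo=bar,pine=apple,tree,bar=bie")
print(raw)

values = split_values("foo=bar,pine=apple,tree,bar=bie")
print(values)
```

Each returned item is the full value list of one key, so a multi-value entry such as apple,tree can be split on commas again if individual values are needed.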

Swift 3: extract regex matches with non matching parts

I want to analyze a string using many different patterns for numbers, dates and other strings, so I have an array of patterns that I want to check in order.
let patterns = [... "\\d{6}", "\\d{4}", "\\d" ] // to be extended :-)
let s = "IMG_123456_2006.10.03-13.52.59 Testfile_2009_5"
Starting with the first item in patterns, I need to search the string s. If matches are found, the string should be split into the matched parts (e.g. "2006" and "2009") and the non-matching parts. The remaining non-matching parts are then searched with the next pattern, and so on. Assuming I had already defined a pattern for the date/time in the middle and placed it first, the split string should look like:
"IMG_", "123456", "_", "2006.10.03-13.52.59", " Testfile_", "2009", "_", "5"
Can I use built-in functionality of regex.matches, or do I have to write everything on my own?
I have already been able to find a match. But then I have to use the ranges to split the string, and do that again and again for the remaining parts until no further matches are found. This needs a lot more calculation than I would expect from using the results in match.numberOfRanges. Is there a compact solution?
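The procedure described in the question can be sketched compactly, shown here in Python rather than Swift to illustrate the logic only. The date/time pattern is an assumption, since the question only says that such a pattern would be placed first, and the function name split_by_patterns is hypothetical. Each pattern is applied only to the parts that no earlier pattern has matched.

```python
import re

def split_by_patterns(text, patterns):
    """Split text into matched and unmatched parts, trying patterns in order."""
    # Each segment is (substring, matched); matched segments are never searched again.
    segments = [(text, False)]
    for pattern in patterns:
        regex = re.compile(pattern)
        next_segments = []
        for part, matched in segments:
            if matched:
                next_segments.append((part, True))
                continue
            pos = 0
            for m in regex.finditer(part):
                if m.start() > pos:
                    # Text before the match stays unmatched for later patterns.
                    next_segments.append((part[pos:m.start()], False))
                next_segments.append((m.group(0), True))
                pos = m.end()
            if pos < len(part):
                next_segments.append((part[pos:], False))
        segments = next_segments
    return [part for part, _ in segments]

# Assumed date/time pattern, followed by the patterns from the question.
patterns = [r"\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2}", r"\d{6}", r"\d{4}", r"\d"]
s = "IMG_123456_2006.10.03-13.52.59 Testfile_2009_5"
parts = split_by_patterns(s, patterns)
print(parts)
```

For the sample string this yields the eight parts listed in the question, and joining the parts reproduces the original string.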