Partial matches using mongo's primitive package - regex

I am using Mongo's primitive package to build a BSON value from what the user submitted. This is what I am currently doing:
school = "Havard"
value = primitive.Regex{Pattern: school, Options: ""}
This will only match BSON values that are Havard. How do I make this regex case insensitive and have it match a partial term, for example hava?
In short, if I search for hava, I should also get Havard.

The expression primitive.Regex{Pattern: school} matches substrings too, but it's not case insensitive. Use the "i" option to make it case insensitive:
value = primitive.Regex{Pattern: school, Options: "i"}
Also note that if the value of school contains special regexp characters, you may get unexpected results or errors. It is best to quote it, e.g. using regexp.QuoteMeta():
value = primitive.Regex{Pattern: regexp.QuoteMeta(school), Options: "i"}
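To try the matching behaviour without a database, here is a minimal sketch that uses only Go's standard regexp package as a stand-in for the server-side regex engine. The helper name partialMatch is made up for this example, and "(?i)" is Go's inline spelling of the "i" option. MongoDB evaluates the pattern with its own engine, so treat this as an approximation of what the query does.

```go
package main

import (
	"fmt"
	"regexp"
)

// partialMatch (hypothetical helper) reports whether term occurs anywhere
// in value, ignoring case. QuoteMeta escapes regexp metacharacters in the
// user-supplied term so it is matched literally.
func partialMatch(term, value string) bool {
	re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(term))
	return re.MatchString(value)
}

func main() {
	fmt.Println(partialMatch("hava", "Havard"))   // true
	fmt.Println(partialMatch("HAVARD", "Havard")) // true
	fmt.Println(partialMatch("yale", "Havard"))   // false
}
```

An unanchored pattern matches substrings, which is why hava finds Havard here.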

For Go users, the filter looks like this:
filter := bson.D{{"column_name", primitive.Regex{Pattern: school, Options: "i"}}}

Related

MongoDB: Match multiple values in string field

I have a collection of entities that contain a string field. I'm looking for a way to query the collection with several values at once and get all entities that contain all of these values, with these specifications:
contain ALL provided query values, not just some of them
case-insensitive
regardless of order
'word' query values can be part of something bigger (for example separated by _ or any other character)
So as an example, if I provide these words as the query values:
i am spiderman
(I can separate them by whitespace, give an array, or whatever works..)
I expect these results:
- "i am_spiderMan" // should match
- "AM i spiderman?!" // should match
- "who am I? supermanspiderman" // should match
- "I am superman" // should not match
- "i am spider_man" // should not match
I hope this covers all the cases I tried to describe.
I tried regex, and also did some research with similar questions but could not get it to work.
You could use regular expressions; this works. Put all the words of the sentence into an array, as shown below, and use $all so that every word has to be found. Each regular expression carries the i flag to make it case insensitive:
db.collection.find ({ key: { $all: [ /spiderman/i, /i/i, /am/i ] } })
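The "every word must be found" logic of that query can be checked locally against the sample strings from the question. This is only a sketch using Go's standard library, not the MongoDB driver; containsAll is a name invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsAll (hypothetical helper) reports whether s contains every
// whitespace-separated word of query, in any order, ignoring case.
// It mirrors the idea of $all over one case-insensitive regex per word.
func containsAll(s, query string) bool {
	for _, w := range strings.Fields(query) {
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(w))
		if !re.MatchString(s) {
			return false
		}
	}
	return true
}

func main() {
	samples := []string{
		"i am_spiderMan",
		"AM i spiderman?!",
		"who am I? supermanspiderman",
		"I am superman",
		"i am spider_man",
	}
	for _, s := range samples {
		fmt.Println(s, "=>", containsAll(s, "i am spiderman"))
	}
}
```

The first three samples pass and the last two fail, which is the outcome the question asks for.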

Wrong regexp query for elasticsearch

I have some problems with the regexp query for Elasticsearch. In my index there's a text field with comma-separated numeric values (IDs), e.g.
2,140,3,2495
And I have the following query term:
"regexp" : {
"myIds" : {
"value" : "^2495,|,2495,|,2495$|^2495$",
"boost" : 1
}
}
But my result list is empty.
Let me say that I know that regexp queries are kind of slow, but the index already exists and is filled with millions of documents, so unfortunately restructuring it is not an option. So I need a regex solution.
In Elasticsearch regex, patterns are anchored by default, and ^ and $ are treated as literal chars.
What you mean to use is "2495,.*|.*,2495,.*|.*,2495|2495": 2495, at the start of the string, ,2495, in the middle, ,2495 at the end, or a whole string equal to 2495.
Or, you may use a simpler
"(.*,)?2495(,.*)?"
That means
(.*,)? - an optional text (not including line breaks) ending with ,
2495 - your value
(,.*)? - an optional text (not including line breaks) starting with ,
Here is an online demo showing how this expression works (not a proof though).
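For a quick local check of the simpler pattern, here is a sketch in Go. Go's regexp package is not the Lucene engine that Elasticsearch uses, so the implicit anchoring is emulated by wrapping the pattern in ^(?:...)$. The helper matchesWhole is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWhole (hypothetical helper) emulates an implicitly anchored
// pattern: the pattern has to cover the entire input string.
func matchesWhole(pattern, s string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(s)
}

func main() {
	const p = `(.*,)?2495(,.*)?`
	fmt.Println(matchesWhole(p, "2,140,3,2495")) // true
	fmt.Println(matchesWhole(p, "2495"))         // true
	fmt.Println(matchesWhole(p, "2495,7"))       // true
	fmt.Println(matchesWhole(p, "12495,7"))      // false
	fmt.Println(matchesWhole(p, "2,24950"))      // false
}
```

The last two inputs show why the commas matter: 2495 only counts when it is a complete list element.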
OK, I got it to work, but now I have run into another problem. I built the string as follows:
(.*,)?2495(,.*)?|(.*,)?10(,.*)?|(.*,)?898(,.*)?
It works well for a few IDs, but if I have, let's say, 50 IDs, then ES throws an exception which says that the regexp is too complex to process.
Is there a way to simplify the regexp or restructure the query itself?
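One way to shorten the expression is to write the shared prefix and suffix once and alternate only the IDs, i.e. (.*,)?(2495|10|898)(,.*)?, which accepts the same strings as the long form. The sketch below builds such a pattern in Go, assuming the IDs are purely numeric so nothing needs escaping. Both helper names are invented, and whether the shorter pattern stays under Elasticsearch's complexity limit for 50 IDs has not been verified here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// idListPattern (hypothetical helper) factors the common prefix and suffix
// out of the alternation. Assumes ids contain digits only.
func idListPattern(ids []string) string {
	return "(.*,)?(" + strings.Join(ids, "|") + ")(,.*)?"
}

// matchesWhole emulates an implicitly anchored pattern in Go.
func matchesWhole(pattern, s string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(s)
}

func main() {
	p := idListPattern([]string{"2495", "10", "898"})
	fmt.Println(p)                          // (.*,)?(2495|10|898)(,.*)?
	fmt.Println(matchesWhole(p, "1,10,3"))  // true
	fmt.Println(matchesWhole(p, "7,898"))   // true
	fmt.Println(matchesWhole(p, "100,3"))   // false
}
```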

Translate specific return query into mgo

I have a query which returns all names from a collection's documents which contain a specific text. In the following example, return all names which contain the sequence "oh" case-insensitively; do not return other fields in the document:
find({name:/oh/i}, {name:1, _id:0})
I have tried to translate this query into mgo:
Find([]bson.M{bson.M{"name": "/oh/i"}, bson.M{"name": "1", "_id": "0"}})
but there are always zero results when using mgo. What is the correct syntax for such a query using mgo?
This question is different from the alleged duplicates because none of those questions deal with how to restrict MongoDB to return only a specific field instead of entire documents.
To execute queries that use regexp patterns for filtering, use the bson.RegEx type.
And to exclude fields from the result documents, use the Query.Select() method.
Like in this example:
c.Find(bson.M{"name": bson.RegEx{Pattern: "oh", Options: "i"}}).
Select(bson.M{"name": 1, "_id": 0})
Translation of the regexp:
name:/oh/i
This means to match documents where the name field has a value that contains the "oh" sub-string, case insensitive. This can be represented using a bson.RegEx, where the RegEx.Pattern field gets the pattern used in the above expression ("oh"). The RegEx.Options field may contain options on how to apply / match the pattern. The doc lists the possible values. If the Options field contains the 'i' character, that means to match case insensitive.
If you have a user-entered term such as "[a-c]", you have to quote regexp meta characters, so the final pattern you apply should be "\[a-c\]". To do that easily, use the regexp.QuoteMeta() function, e.g.
fmt.Println(regexp.QuoteMeta("[a-c]")) // Prints: \[a-c\]
Try it on the Go Playground.
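To see why the quoting matters, this small sketch compares the raw and the quoted term using Go's own regexp engine (not MongoDB's). The function names are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchRaw compiles the term as-is, so "[a-c]" acts as a character class.
func matchRaw(term, s string) bool {
	return regexp.MustCompile(term).MatchString(s)
}

// matchQuoted escapes the term first, so it is matched as literal text.
func matchQuoted(term, s string) bool {
	return regexp.MustCompile(regexp.QuoteMeta(term)).MatchString(s)
}

func main() {
	term := "[a-c]" // hypothetical user input
	fmt.Println(matchRaw(term, "b"))            // true: any of a, b, c
	fmt.Println(matchQuoted(term, "b"))         // false
	fmt.Println(matchQuoted(term, "x[a-c]y"))   // true: literal "[a-c]"
}
```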

PostgreSQL - How do I extract the first occurrence of a substring in a string using a regular expression pattern?

I am trying to extract a substring from a text column using a regular expression, but in some cases, there are multiple instances of that substring in the string.
In those cases, I am finding that the query does not return the first occurrence of the substring. Does anyone know what I am doing wrong?
For example:
If I have this data:
create table data1
(full_text text, name text);
insert into data1 (full_text)
values ('I 56, donkey, moon, I 92')
I am using
UPDATE data1
SET name = substring(full_text from '%#"I ([0-9]{1,3})#"%' for '#')
and I want to get 'I 56' not 'I 92'
You can use regexp_matches() instead:
update data1
set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1];
As no additional flag is passed, regexp_matches() only returns the first match. It returns an array, though, so you need to pick the first (and only) element from the result (that's the [1] part).
It is probably a good idea to limit the update to rows that match the regex in the first place:
update data1
set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}'
Try the following expression. It will return the first occurrence:
SUBSTRING(full_text, 'I [0-9]{1,3}')
You can use regexp_match() in PostgreSQL 10+:
select regexp_match('I 56, donkey, moon, I 92', 'I [0-9]{1,3}');
Quote from documentation:
In most cases regexp_matches() should be used with the g flag, since
if you only want the first match, it's easier and more efficient to
use regexp_match(). However, regexp_match() only exists in PostgreSQL
version 10 and up. When working in older versions, a common trick is
to place a regexp_matches() call in a sub-select...
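The difference between "first match" and a pattern wrapped in wildcards can be shown with Go's regexp package. A greedy leading wildcard is a common reason for ending up with the last occurrence. Whether that is exactly what happens inside the substring(... from ... for ...) call from the question is an assumption here, and the function names are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of the bare pattern.
func firstMatch(s string) string {
	return regexp.MustCompile(`I [0-9]{1,3}`).FindString(s)
}

// greedyCapture wraps the pattern in greedy wildcards; the leading .*
// swallows as much as it can, so the group lands on the last occurrence.
func greedyCapture(s string) string {
	m := regexp.MustCompile(`^.*(I [0-9]{1,3}).*$`).FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	s := "I 56, donkey, moon, I 92"
	fmt.Println(firstMatch(s))    // I 56
	fmt.Println(greedyCapture(s)) // I 92
}
```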

RegEx : Replace parts of dynamic strings

I have a string
IsNull(VSK1_DVal.RuntimeSUM,0),
I need to remove IsNull part, so the result would be
VSK1_DVal.RuntimeSUM,
I'm absolutely new to RegEx. That wouldn't be a problem, except for one thing:
VSK1 is a dynamic part; it can be any combination of A-Z and 0-9, of any length. How do I replace such strings with RegEx? I use MSSQL 2k5; I think it uses the general set of RegEx rules.
EDIT: I forgot to say that I'm doing the replacement in the SSMS Query window's Replace box (^H), not building a RegEx query.
br
marius
Here's a regex that should work:
[^(]+\(([^,]+),[^)]\)
Then use $1 capture group to extract the part that you need.
I did a sanity check in ruby:
orig = "IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /[^(]*\(([^,]+),[^)]\)/
result = orig.sub(regex){$1} # result => VSK1_DVal.RuntimeSUM,
It gets trickier if you have a prefix that you want to retain. Like if you have this:
"somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
In this case, you need some way to identify the start of the pattern. Maybe you can use '=' to identify the start of the pattern? If so, this should work (the '=' and the whitespace after it are captured as well, so they can be put back):
orig = "somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /(=\s*)\w+\(([^,]+),[^)]\)/
result = orig.sub(regex){$1 + $2} # result => somestuff = VSK1_DVal.RuntimeSUM,
But then the case where you don't have an equals sign will fail. Maybe you can use 'IsNull' to identify the start of the pattern? If so, try this (note the '/i' representing case insensitive matching):
orig = "somestuff = isnull(VSK1_DVal.RuntimeSUM,0),"
regex = /IsNull\(([^,]+),[^)]\)/i
result = orig.sub(regex){$1} # result => somestuff = VSK1_DVal.RuntimeSUM,
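The same IsNull-anchored replacement, sketched in Go for comparison. It deviates from the Ruby version in one place: [^)]* instead of [^)], so that a second argument longer than one character is also handled. The name stripIsNull is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// isNullCall matches IsNull(<first>,<second>) case-insensitively and
// captures the first argument.
var isNullCall = regexp.MustCompile(`(?i)IsNull\(([^,]+),[^)]*\)`)

// stripIsNull (hypothetical helper) replaces each IsNull(...) call with
// its first argument and leaves the rest of the string untouched.
func stripIsNull(s string) string {
	return isNullCall.ReplaceAllString(s, "${1}")
}

func main() {
	fmt.Println(stripIsNull("IsNull(VSK1_DVal.RuntimeSUM,0),"))
	// VSK1_DVal.RuntimeSUM,
	fmt.Println(stripIsNull("somestuff = isnull(VSK1_DVal.RuntimeSUM,0),"))
	// somestuff = VSK1_DVal.RuntimeSUM,
}
```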
/IsNULL\(([A-Z0-9_.]+),0\)/i
Then pick group match number 1.
Here's a very useful site: http://www.regexlib.com/RETester.aspx
They have a tester and a cheatsheet that are very useful for quick testing of this sort.
I tested the solution by Dave and it works fine except it also removes the trailing comma you wanted retained. Minor thing to fix.
Try this:
IsNULL\((.*,)0\)
You say in your question:
"I use MSSQL 2k5, i think it uses general set of RegEx rules."
This is not true unless you enable CLR and compile and install an assembly. You can use its native pattern matching syntax and LIKE for this as below.
WITH T(C) AS
(
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,0),' UNION ALL
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,123465),' UNION ALL
SELECT 'No Match'
)
SELECT SUBSTRING(C,8,1+LEN(C)-8-CHARINDEX(',',REVERSE(C),2))
FROM T
WHERE C LIKE 'IsNull(%,_%),'
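The regex-free idea of that query (check the fixed prefix and suffix, then cut at the last comma inside the parentheses) looks roughly like this in Go. The function name firstArg is hypothetical. Unlike LIKE under a case-insensitive collation, this sketch compares the prefix case-sensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// firstArg (hypothetical helper) returns the first argument of a string
// shaped like "IsNull(<arg>,<default>),". The second result is false
// when the input does not have that shape.
func firstArg(s string) (string, bool) {
	const prefix, suffix = "IsNull(", "),"
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return "", false
	}
	body := s[len(prefix) : len(s)-len(suffix)]
	i := strings.LastIndex(body, ",")
	// Like the "_%" part of the LIKE pattern, require at least one
	// character after the comma.
	if i < 0 || i == len(body)-1 {
		return "", false
	}
	return body[:i], true
}

func main() {
	for _, s := range []string{
		"IsNull(VSK1_DVal.RuntimeSUM,0),",
		"IsNull(VSK1_DVal.RuntimeSUM,123465),",
		"No Match",
	} {
		arg, ok := firstArg(s)
		fmt.Println(arg, ok)
	}
}
```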