Using $regex in MongoDB, I want to find the name B&B Hôtel, which contains special characters like & and ô, by typing BB Hotel.
I tried this code:
db.txt.find({ "name": { $regex: query, $options: 'i' } })
where query can be BB Hotel.
You don't want a regex search; you want a diacritic-insensitive text search:
"name":{
$text:
{
$search: "\"B&B Hotel\""
$caseSensitive: false,
$diacriticSensitive: false
}
}
Note that $diacriticSensitive defaults to false, but I never trust defaults. If you are running with older indexes (a version 2 or lower text index), you may not be able to use them for this. The escaped quotes in the $search value make it a phrase search.
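Also note that $text only works if the collection has a text index; a minimal sketch, assuming the field is name as in the question:
// Required before any $text query will run; version 3 text indexes
// (MongoDB 3.2+) are the ones that support diacritic-insensitive search.
db.txt.createIndex({ name: "text" })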
I'm new to Ruby, so please excuse my ignorance. :)
I just learned about eval and read about its dark sides.
What I've read so far:
When is eval in Ruby justified?
Is 'eval' supposed to be nasty?
Ruby Eval and the Execution of Ruby Code
What I have to do is read a file containing lines such as /e/ 3, which, when evaluated, should replace each e with a 3.
Here is what I have so far (working, but...):
def evaluate_lines
  result = "elt"
  IO.foreach("test.txt") do |reg|
    reg = reg.chomp.delete(' ')
    puts reg
    # Only works because the regex part is always exactly 3 characters long
    result = result.gsub(eval(reg[0..2]), reg[3..reg.length])
    p result
  end
end
Contents of the test.txt file:
/e/ 3
/l/ 1
/t/ 7
/$/ !
/$/ !!
This only works because I know the length of the lines in the file.
So if my file contained something like /a-z/ 3, my program would not be able to do what is expected of it.
Note
I tried using Regexp.new reg, and this resulted in /\/e\/3/, which isn't very helpful in this case.
A simple example of the Regexp problem:
str="/e/3"
result="elt"
result=result.gsub(Regexp.new str)
p result #outputs: #<Enumerator: "elt":gsub(/\/e\/3/)>
I already tried stripping off the slashes, but even that won't deliver the desired result, since gsub() takes two parameters, as in gsub(/e/, "3").
On using Regexp, I have already read Convert a string to regular expression ruby.
While you can write something to parse that file, it rapidly gets complicated because you have to parse regular expressions. Consider /\/foo\\/.
There are a number of incomplete solutions. You can split on whitespace, but this will fail on /foo bar/.
re, replace = line.split(/\s+/, 2)
You can use a regex. Here's a first stab.
match = "/3/ 4".match(%r{^/(.*)/\s+(.+)})
This fails on an escaped /; we need something more complex.
match = '/3\// 4'.match(%r{\A / ((?:[^/]|\\/)*) / \s+ (.+)}x)
I'm going to guess it was not your teacher's intent to have you parsing regexes. For the purposes of the assignment, splitting on whitespace is probably fine. You should clarify with your teacher.
This is a poor data format. It is non-standard, difficult to parse, and has limitations on the replacement. Even a tab-delimited file would be better.
There's little reason to use a non-standard format these days. The simplest thing is to use a standard data format for the file. YAML or JSON are the most obvious choices. For such simple data, I'd suggest JSON.
[
  { "re": "e", "replace": "3" },
  { "re": "l", "replace": "1" }
]
Parsing the file is trivial; use the built-in JSON library.
require 'json'

# JSON.load("test.json") would parse the literal string, not the file's contents
specs = JSON.parse(File.read("test.json"))
And then you can use them as a list of hashes.
specs.each do |spec|
  # No eval necessary.
  re = Regexp.new(spec["re"])
  # `gsub!` replaces in place.
  result.gsub!(re, spec["replace"])
end
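Putting it together, a minimal sketch (assuming test.json contains the array above and we start from the question's "elt"):
require 'json'

result = "elt"
specs = JSON.parse(File.read("test.json"))

specs.each do |spec|
  # Build each regex from plain data; no eval involved.
  result.gsub!(Regexp.new(spec["re"]), spec["replace"])
end

p result # => "31t"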
The data file is extensible. For example, if you later want to add regex options:
[
  { "re": "e", "replace": "3" },
  { "re": "l", "replace": "1", "options": ["IGNORECASE"] }
]
While the teacher may have specified a poor format, pushing back on bad requirements is good practice for being a developer.
Here's a really simple example that uses vi notation like s/.../.../ and s/.../.../g:
def rsub(text, spec)
  _, mode, repl, with, flags = spec.match(%r[\A(.)\/((?:[^/]|\\/)*)/((?:[^/]|\\/)*)/(\w*)\z]).to_a

  case (mode)
  when 's'
    if (flags.include?('g'))
      text.gsub(Regexp.new(repl), with)
    else
      text.sub(Regexp.new(repl), with)
    end
  end
end
Note the matcher looks for non-slash characters ([^/]) or a literal-slash combination (\\/) and splits out the two parts accordingly.
Where you can get results like this:
rsub('sandwich', 's/and/or/')
# => "sorwich"
rsub('and/or', 's/\//,/')
# => "and,or"
rsub('stack overflow', 's/o/O/')
# => "stack Overflow"
rsub('stack overflow', 's/o/O/g')
# => "stack OverflOw"
The principle here is that you can use a very simple regular expression to parse out your input regular expression and feed that cleaned-up data into Regexp.new. There is absolutely no need for eval here; if anything, it severely limits what you can do.
With a little work you could alter that regular expression to parse what's in your existing file and make it do what you want.
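For instance, a minimal sketch adapted to the question's /e/ 3 format (assuming a pattern never contains an unescaped /):
def apply_line(text, line)
  # Capture the pattern between the slashes and the replacement after them.
  match = line.match(%r{\A/((?:[^/]|\\/)*)/\s*(.*)\z})
  return text unless match

  pattern, replacement = match.captures
  text.gsub(Regexp.new(pattern), replacement)
end

result = "elt"
IO.foreach("test.txt") do |line|
  result = apply_line(result, line.chomp)
end
p result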
I have a query which returns all names from a collection's documents which contain a specific text. In the following example, return all names which contain the sequence "oh" case-insensitively; do not return other fields in the document:
find({name:/oh/i}, {name:1, _id:0})
I have tried to translate this query into mgo:
Find([]bson.M{bson.M{"name": "/oh/i"}, bson.M{"name": "1", "_id": "0"}})
but there are always zero results when using mgo. What is the correct syntax for such a query using mgo?
This question is different from the alleged duplicates because none of those questions deal with how to restrict MongoDB to return only a specific field instead of entire documents.
To execute queries that use regexp patterns for filtering, use the bson.RegEx type.
And to exclude fields from the result documents, use the Query.Select() method.
Like in this example:
c.Find(bson.M{"name": bson.RegEx{Pattern: "oh", Options: "i"}}).
Select(bson.M{"name": 1, "_id": 0})
Translation of the regexp:
name:/oh/i
This means to match documents where the name field has a value containing the "oh" substring, case-insensitively. This can be represented using a bson.RegEx, where the RegEx.Pattern field gets the pattern used in the above expression ("oh"), and RegEx.Options may contain options for how to apply / match the pattern. The doc lists the possible values. If the Options field contains the 'i' character, matching is case-insensitive.
If you have a user-entered term such as "[a-c]", you have to quote regexp metacharacters, so the final pattern you apply should be "\[a-c\]". To do that easily, use the regexp.QuoteMeta() function, e.g.:
fmt.Println(regexp.QuoteMeta("[a-c]")) // Prints: \[a-c\]
Try it on the Go Playground.
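Putting the two together, a sketch of a complete query (findNames and userInput are made-up names for illustration; c is an *mgo.Collection):
package accounts

import (
	"regexp"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// findNames returns only the name field of documents whose name contains
// the (meta-quoted) user input, matched case-insensitively.
func findNames(c *mgo.Collection, userInput string) ([]bson.M, error) {
	pattern := regexp.QuoteMeta(userInput) // escape regexp metacharacters
	var results []bson.M
	err := c.Find(bson.M{"name": bson.RegEx{Pattern: pattern, Options: "i"}}).
		Select(bson.M{"name": 1, "_id": 0}).
		All(&results)
	return results, err
}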
I use the text editor Sublime Text 3 to edit code, and very often I'll have a string literal wrapped in double quotes that I want to change to single quotes, or vice versa. Right now I scroll to each quotation mark and replace it with the one I want. Is there a faster workflow for this? Say, highlighting the word, or a hotkey, or something? I would find it super useful.
If you have a large number of such strings in a file and you want to convert all of them at once, you could use a regex find/replace operation to find and replace them all. You would use Find > Replace... or Find > Find in files... to search for a matching regex that captures the text in the quotes.
For example you could use \"([^"\n]*)\" as a search term and '\1' as the replacement text to swap all double quoted strings for single quotes.
You can't bind something like that to a key directly because Find/Replace can't be used in a Macro, but you could use the RegReplace package to do this if you want to go that route.
You can potentially speed up the workflow that you're currently using by taking advantage of multiple cursors, if you're not already doing that.
You could, for example, select the first quote, then press Ctrl+D or Cmd+D to select the other one. Now that you have two cursors, press Backspace to delete both quotes and press the new quote character to insert the new ones.
This can't be macro-ized and bound to a key because the find_under_expand command can't be used in a macro, though.
For a full key press solution, as far as I'm aware you would need a plugin of some sort to do this for you. One such example appears to be ChangeQuotes, although I've never personally used it.
It's also possible to write your own small plugin such as the following:
import sublime
import sublime_plugin


class SwapQuotesCommand(sublime_plugin.TextCommand):
    pairs = ["'", '"']

    def run(self, edit):
        # Grow each selection to cover the full quoted string.
        self.view.run_command("expand_selection", {"to": "scope"})
        for sel in self.view.sel():
            self.toggle(edit, sel)

    def toggle(self, edit, region):
        # The characters at the two ends of the expanded selection.
        begin = self.view.substr(region.begin())
        end = self.view.substr(region.end() - 1)
        if begin == end and begin in self.pairs:
            # Pick the other quote character and replace both ends.
            index = self.pairs.index(begin) + 1
            new = self.pairs[index % len(self.pairs)]
            for point in (region.begin(), region.end() - 1):
                self.view.replace(edit, sublime.Region(point, point + 1), new)
This expands the selection in all of the cursors out by the current scope, and then if both ends of the selection are a matching quote, the quote in use is swapped.
In use, you would add a key binding such as the following, which includes a context so the key only triggers while the cursor is inside a string; that way it doesn't disturb your selection in cases where it definitely won't work.
{
    "keys": ["ctrl+shift+'"], "command": "swap_quotes",
    "context": [
        { "key": "selector", "operator": "equal", "operand": "string.quoted", "match_all": true }
    ]
},
I'm having trouble getting a condition to match when creating a Word mail merge. What I want is to match based on the first letter of the field: if the first letter is K-Z, then it should evaluate to True.
I have the following in Word:
{ IF { MERGEFIELD Provider } = "[K-Z]*" "Person1" "Person2" }
which does not work. I've tried escaping the square brackets, but this has also had no success.
I can't find anything useful on a search. Has anyone got any ideas how to make this work?
You can't use regular expressions in Word IF fields - not even the limited regex that you can use in Word's Find and Replace function. In an IF field, all you get is the wildcards ? (to match any single character) and * (to match multiple characters). Even these have their limitations.
So you have to find another way. One is the tedious one where you enumerate all the possibilities - in this case you could use something like
{ SET KtoZ 0
}{ IF "{ MERGEFIELD Provider }" = "K*" "{ SET KtoZ 1}"
}{ IF "{ MERGEFIELD Provider }" = "L*" "{ SET KtoZ 1}"
}{...
}{ IF "{ MERGEFIELD Provider }" = "Z*" "{ SET KtoZ 1}"
}{IF { REF KtoZ } = 1 "Person1" "Person2" }
(with similar IF fields for M..Y where I have put "..."). If you need to deal with upper/lower case, you can add a suitable switch to your MERGEFIELD fields too.
Another way, depending on your situation and on the data source, might be to do the comparison in the data source. That requires either that you can create a view (or, in Access, a query) that performs the comparison and returns, for example, a field called KtoZ, or that you can construct your query in SQL and issue it in a Word VBA OpenDataSource call. In the latter case, your data source must use a SQL dialect that lets you do that, and your query must be shorter than the 255/511-character limit that Word VBA imposes.
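For example, if the data source were an Access database, a hedged sketch of such a query (the Providers table and Provider field names are assumptions based on the question) might be:
SELECT Provider,
       IIF(UCASE(LEFT(Provider, 1)) BETWEEN 'K' AND 'Z', 1, 0) AS KtoZ
FROM Providers;
The merge document then only needs { IF { MERGEFIELD KtoZ } = 1 "Person1" "Person2" }.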
I want to find an account by name (in a MongoDB collection of 50K accounts).
The usual way: find with a string:
db.accounts.find({ name: 'Jon Skeet' }) // indexes help improve performance!
How about with a regular expression? Is it an expensive operation?
db.accounts.find({ name: /Jon Skeet/ }) // worry! how do indexes work with a regex?
Edit:
According to WiredPrairie:
MongoDB uses the prefix of the regex to look up indexes (e.g. /^prefix.*/):
db.accounts.find({ name: /^Jon Skeet/ }) // indexes will help!
MongoDB $regex
Actually, according to the documentation:
If an index exists for the field, then MongoDB matches the regular
expression against the values in the index, which can be faster than a
collection scan. Further optimization can occur if the regular
expression is a “prefix expression”, which means that all potential
matches start with the same string. This allows MongoDB to construct a
“range” from that prefix and only match against those values from the
index that fall within that range.
http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use
In other words:
For the /Jon Skeet/ regex, Mongo will fully scan the keys in the index and then fetch the matched documents; this can be faster than a collection scan.
For the /^Jon Skeet/ regex, Mongo will scan only the range of the index that starts with the prefix, which will be faster still.
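You can verify which of these you are getting with explain(); a sketch, assuming a plain index on name:
db.accounts.createIndex({ name: 1 })
db.accounts.find({ name: /^Jon Skeet/ }).explain("executionStats")
// expect an IXSCAN whose indexBounds cover only the "Jon Skeet" prefix range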
In case anyone still has an issue with search performance, there is a way to optimize regex search even if it searches for a word in a sentence (not necessarily at the beginning ^ or the end $ of the string).
The field should have a text index
db.someCollection.createIndex({ someField: "text" })
and queries should use the regex only after performing a plain text search first:
db.someCollection.find({ $and:
  [
    { $text: { $search: "someWord" } },
    { someField: { $regex: /test/i } },
    { someField: { $regex: /other/i } }
  ]
})
This ensures that the regex will run only for the results of the initial, plain search, which should be quite fast thanks to the index on this field.
It might have a huge impact on search performance, depending on how large the collection is.