Mongo reverse regex [duplicate] - regex

This question already has answers here:
MongoDB reverse regex
(2 answers)
Closed 8 years ago.
In SQL it is possible to run a query similar to
SELECT result FROM table WHERE 'abc-def-ghi' LIKE col1
on a table like this:
col1 | result
abc-% | 1
abc-d% | 2
as% | 3
... | ...
and get a result set with 1 and 2.
Problem: how do I achieve the same effect in MongoDB?
I can run a regex to match against fields, but is there a way to match the fields against supplied data instead?

You can use the $where operator, e.g.:
db.col.find({$where: "\"abc\".match(this.col1)"})
Although MongoDB documentation doesn't recommend using $where, it is the only possibility as far as I know.
(This question is probably a duplicate. See MongoDB reverse regex)
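To make the "reverse" direction concrete, here is a minimal client-side sketch in Python. It is not the $where query itself: it assumes the rows have already been fetched, and it assumes col1 holds SQL LIKE patterns (% and _) as in the question's table. The helper names like_to_regex and reverse_match are hypothetical.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (% and _ wildcards) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def reverse_match(rows, supplied):
    """Return `result` for every row whose stored pattern matches `supplied`."""
    return [row["result"] for row in rows
            if like_to_regex(row["col1"]).fullmatch(supplied)]

# Sample rows copied from the table in the question.
rows = [
    {"col1": "abc-%", "result": 1},
    {"col1": "abc-d%", "result": 2},
    {"col1": "as%", "result": 3},
]
print(reverse_match(rows, "abc-def-ghi"))  # [1, 2]
```

The point of the sketch is only that the stored value plays the role of the pattern and the supplied string plays the role of the subject, which is what the $where expression does on the server for each document.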

db.stuff.find( { col1 : /supplied_data/ }, { result : 1 } );
This returns only the result field (plus _id, which is included by default).

Related

Remove specific characters from string to tidy up URLs [duplicate]

This question already has answers here:
Extracting rootdomains from URL string in Google Sheets
(3 answers)
Closed 2 years ago.
Hi, I have a column of messy URL links in Google Sheets that I'm trying to clean up. I want all website links to be in the same format so that I can run a duplicate check on them.
For example, I have a list of URLs with various prefixes: http, http://, https://, etc. I am trying to use REGEXREPLACE to remove all of these http variants from the column entries, but I cannot get it to work. This is what I have:
Before:
http://www.website1.com/
https://website2.com/
https://www.website3.com/
And I want - After:
website1.com
website2.com
website3.com
It is OK if this takes several formulas, and thus several columns, to reach the end result.
try:
=ARRAYFORMULA(IFERROR(REGEXEXTRACT(INDEX(SPLIT(
REGEXREPLACE(A1:A, "https?://www.|https?://|www.", ), "/"),,1),
"\.(.+\..+)"), INDEX(IFERROR(SPLIT(
REGEXREPLACE(A1:A, "https?://www.|https?://|www.", ), "/")),,1)))
or shorter:
=INDEX(IFERROR(REGEXEXTRACT(A1:A, "^(?:https?:\/\/)?(?:www\.)?([^\/]+)")))
You can try the following formula
=ArrayFormula(regexreplace(LEFT(P1:P3,LEN(P1:P3)-1),"(.*//www.)|(.*//)",""))
Please do adjust ranges as needed.
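For readers who want to check the pattern outside Sheets, here is a small Python sketch that applies the same regular expression as the shorter REGEXEXTRACT formula above. The function name clean_url is hypothetical, and returning an empty string on no match is an assumption that mirrors the IFERROR wrapper.

```python
import re

# Same pattern as the shorter formula: optional scheme, optional "www.",
# then capture everything up to the first slash.
HOST_RE = re.compile(r"^(?:https?://)?(?:www\.)?([^/]+)")

def clean_url(url: str) -> str:
    """Return the host part of `url`, or "" if nothing matches."""
    match = HOST_RE.match(url.strip())
    return match.group(1) if match else ""

# Sample URLs from the question.
urls = [
    "http://www.website1.com/",
    "https://website2.com/",
    "https://www.website3.com/",
]
print([clean_url(u) for u in urls])
# ['website1.com', 'website2.com', 'website3.com']
```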

How to conditionally transform text in a column in power query?

I am building a workbook in Power BI and need to conditionally append text to column A when it meets a certain criterion. Specifically, if column A does not end with ".html", I want to append ".html" to the value.
A sample of the data would look like this:
URL | Visits
site.com/page1.html | 5
site.com/page2.html | 12
site.com/page3 | 15
site.com/page4.html | 8
where the desired output would look like this:
URL | Visits
site.com/page1.html | 5
site.com/page2.html | 12
site.com/page3.html | 15
site.com/page4.html | 8
I have tried using the code:
#"CurrentLine" = Table.TransformColumns(#"PreviousLine", {{"URL", each if Text.EndsWith([URL],".html") = false then _ & ".html" else "URL", type text}})
But that returns an error "cannot apply field access to the type Text".
I can achieve the desired output in a very roundabout way if I use an AddColumn to store the criteria value, and then another AddColumn to store the new appended value, but this seems like an extremely overkill way to approach doing a single transformation to a column. (I am specifically looking to avoid this as I have about 10 or so transformations and don't want to have so many columns to add and cleanup if there is a more succinct way of coding)
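You don't want [URL] inside Text.EndsWith. Within Table.TransformColumns the function receives the cell value itself as _, not the whole row, so [URL] attempts field access on a text value. Try this: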
= Table.TransformColumns(#"PreviousLine",
{{"URL", each if Text.EndsWith(_, ".html") then _ else _ & ".html", type text}}
)
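The same per-value logic, written as a short Python sketch for comparison. It is only an illustration of the transformation, not Power Query code; ensure_suffix is a hypothetical name and the rows are the sample data from the question.

```python
def ensure_suffix(value: str, suffix: str = ".html") -> str:
    """Append `suffix` only when `value` does not already end with it."""
    return value if value.endswith(suffix) else value + suffix

# Sample rows from the question.
rows = [
    {"URL": "site.com/page1.html", "Visits": 5},
    {"URL": "site.com/page2.html", "Visits": 12},
    {"URL": "site.com/page3", "Visits": 15},
    {"URL": "site.com/page4.html", "Visits": 8},
]

# Transform the URL column in place of adding helper columns.
fixed = [{**row, "URL": ensure_suffix(row["URL"])} for row in rows]
print([row["URL"] for row in fixed])
```

As in the M version, the function sees one cell value at a time, so no extra columns are needed.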

How to query in Mongo for a String based on expressions [duplicate]

This question already has answers here:
Matching a Forward Slash with a regex
(9 answers)
Closed 3 years ago.
I have a lot of data in MongoDB and want to query on a string field whose value contains a URL:
"url" : "http://some-host/api/name/service/list/1234/xyz"
I get a record count when I execute the query below:
db.mycollection.find({url:"http://some-host/api/name/service/list/1234/xyz"}).count()
I want to get all the records whose url contains
some-host/api/name/service/list/
I tried the samples below:
db.mycollection.find({url:"some-host/api/name/service/list/"}).count()
Got zero records
db.mycollection.find({url:/.*"some-host/api/name/service/list/".*/}).count()
Got error
db.mycollection.find({"url":/.*"some-host/api/name/service/list/".*/}).count()
Got error
db.mycollection.find({"url":/.*some-host/api/name/service/list/.*/}).count()
Got Error
db.mycollection.find({"url":/.*some-host//api//name//service//list//.*/}).count()
Got ...
...
Then no response
Did you try with something like this:
db.mycollection.find({'url': {'$regex': 'sometext'}})
Please check also here
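In a JavaScript regex literal the / characters are delimiters, so every slash inside the pattern has to be written as \/; passing the pattern as a string to $regex sidesteps that. Below is a minimal Python sketch of building such a filter. The helper contains_filter is hypothetical, and the list of URLs is made-up sample data used only to check the pattern locally; a driver such as PyMongo should accept a filter of this shape, but no database call is made here.

```python
import re

def contains_filter(field: str, fragment: str) -> dict:
    """Build a filter document that matches `fragment` literally anywhere in `field`."""
    # re.escape neutralises regex metacharacters so the fragment is
    # treated as plain text rather than as a pattern.
    return {field: {"$regex": re.escape(fragment)}}

fragment = "some-host/api/name/service/list/"
query = contains_filter("url", fragment)

# Local sanity check of the pattern against hypothetical sample values.
urls = [
    "http://some-host/api/name/service/list/1234/xyz",
    "http://some-host/api/name/other/1",
    "http://other-host/api/name/service/list/9",
]
matches = [u for u in urls if re.search(query["url"]["$regex"], u)]
print(matches)
```

No leading or trailing .* is needed, because an unanchored regex already matches anywhere in the string.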

Hive REGEXP_EXTRACT extract the second occurrence of a pattern [duplicate]

This question already has answers here:
hive regexp_extract weirdness
(2 answers)
Closed 4 years ago.
I am querying data in Hive and extracting a code from a column. I recently discovered that due to data entry/business process issues, users have been overloading the field and entering two separate job codes when there should only be one.
Sample data from the column:
NOV2 WAA UW FOO DISPLAY_W2100008/ SOMETHING DISPLAY W2100106
I've been using REGEXP_EXTRACT(column,'([A-Z]\\d{7})',1) as id, which correctly extracts the first code W2100008, but I am unable to extract the second code W2100106.
I want to use REGEXP_EXTRACT twice and alias id_1 and id_2 so we can analyze the second codes referenced. Is there a way to reference the second time the pattern is matched?
REGEXP_EXTRACT(column,'_([A-Z]\\d{7})',0) returns the first match
REGEXP_EXTRACT(column,'([A-Z]\\d{7})',1) returns the first match
REGEXP_EXTRACT(column,'([A-Z]\\d{7})',2) returns an error
The extracted value will be used to join to another column, so the result needs to return a single value, not an array.
Replace every match of '.*?([A-Z]\\d{7})' with a delimiter (a space) followed by the captured code ([A-Z]\\d{7}). Remove the leading space with trim, then split on ' ' to get an array:
hive> select split(trim(regexp_replace('NOV2 WAA UW FOO DISPLAY_W2100008/SOMETHING DISPLAY W2100106','.*?([A-Z]\\d{7})',' $1')),' ');
OK
["W2100008","W2100106"]
Get first element:
hive> select split(trim(regexp_replace('NOV2 WAA UW FOO DISPLAY_W2100008/ SOMETHING DISPLAY W2100106','.*?([A-Z]\\d{7})',' $1')),' ')[0];
OK
W2100008
Time taken: 0.065 seconds, Fetched: 1 row(s)
And second element is
split(trim(regexp_replace('NOV2 WAA UW FOO DISPLAY_W2100008/ SOMETHING DISPLAY W2100106','.*?([A-Z]\\d{7})',' $1')),' ')[1]
It is better to use a subquery so that the array is parsed only once:
select display_array[0] as id_1 , display_array[1] as id_2
from
(
select split(trim(regexp_replace('NOV2 WAA UW FOO DISPLAY_W2100008/ SOMETHING DISPLAY W2100106','.*?([A-Z]\\d{7})',' $1')),' ') as display_array
)s;
Use explode() if you want one element per row.
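The idea of collecting all matches and then picking one by position can be checked quickly outside Hive. This Python sketch uses the same code pattern on the sample string from the question; nth_code is a hypothetical helper, and returning None when there are fewer than n matches is an assumption made for the sketch.

```python
import re

# Same code pattern as in the Hive query: one capital letter, seven digits.
CODE_RE = re.compile(r"[A-Z]\d{7}")

def nth_code(text: str, n: int):
    """Return the n-th (1-based) job code in `text`, or None if it is absent."""
    codes = CODE_RE.findall(text)
    return codes[n - 1] if len(codes) >= n else None

# Sample value from the question.
sample = "NOV2 WAA UW FOO DISPLAY_W2100008/ SOMETHING DISPLAY W2100106"
print(nth_code(sample, 1))  # W2100008
print(nth_code(sample, 2))  # W2100106
```

Each call returns a single value rather than an array, which is what the join in the question needs.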

MongoDb not registering field in query specifier [duplicate]

This question already has an answer here:
How do I search for a string in a MongoDB document array and project the array value in a find operation?
(1 answer)
Closed 7 years ago.
I'm querying a MongoDB collection and retrieving only a subdocument, like this:
db.Category.find({"Children.Url" : "www.myurl.com"}, {"Children.$" : 1})
But when I try to run a similar query with a regex:
db.Category.find({"Children.Url" : /myurl/i}, {"Children.$" : 1})
I get this error:
error: {
"$err" : "positional operator (Children.$) requires corresponding field in query specifier",
"code" : 16352
}
Why is it not registering that the corresponding field is actually specified when using regex?
This is a known issue, tracked in Jira as SERVER-9028. A workaround is:
db.Category.find({"Children": {$elemMatch: {Url: /myurl/i}}}, {"Children.$": 1})
This seems to be a bug in MongoDB, and it was previously answered here: How do I search for a string in a MongoDB document array and project the array value in a find operation?
Modifying the query like this gives expected results:
db.Category.find({"Children.Url" : {$in : [/myurl/i]}}, {"Children.$" : 1})
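As a reference for what either workaround is expected to return, here is a small Python sketch that mimics the result locally: the first element of Children whose Url matches the regex, case-insensitively. This is an illustration of the intended outcome on made-up data, not how MongoDB evaluates the query; first_matching_child and the sample document are hypothetical.

```python
import re

def first_matching_child(doc: dict, pattern: str):
    """Return the first element of doc["Children"] whose Url matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)  # mirrors the /i flag
    for child in doc.get("Children", []):
        if regex.search(child.get("Url", "")):
            return child
    return None

# Hypothetical document shaped like the Category collection in the question.
doc = {
    "Name": "Shop",
    "Children": [
        {"Url": "www.other.com"},
        {"Url": "www.MyUrl.com"},
        {"Url": "myurl.org"},
    ],
}
print(first_matching_child(doc, "myurl"))  # {'Url': 'www.MyUrl.com'}
```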