Negative Look Around in MongoDB Query - regex

I uploaded some fields and want to make sure they didn't get corrupted (not by Mongo, but by my data generator).
The field of interest should match this regex:
donor_\d{1,2}_\d+
for example:
donor_17_82635294
There is no exception to that rule, so I was wondering if I could use a negative lookaround in the regex to find fields that don't meet it. The problem with the negative lookaround examples on SO is that you seem to need to know what you are looking for, which I don't. I want something like this:
db.collection.find({field:*not*/donor_\d{1,2}_\d+/i})
My other option is just to create a new collection with everything that matches my regex, but this would be much easier.
Thanks
J

Yes, you can negate a regular expression with $not, like this:
db.collection.find({field: { $not: /donor_\d{1,2}_\d+/i } })
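One caveat worth checking: the pattern in that query is unanchored, so $not only flags values that contain no valid id anywhere in them. A value with junk before or after an otherwise valid id would still slip through. The sketch below mimics the filter in plain Python rather than in MongoDB, with made-up sample values and a hypothetical helper name (non_matching), to show how anchoring changes what gets reported.

```python
import re

# Same pattern as in the query, without and with anchors.
UNANCHORED = re.compile(r"donor_\d{1,2}_\d+", re.IGNORECASE)
ANCHORED = re.compile(r"^donor_\d{1,2}_\d+$", re.IGNORECASE)

def non_matching(values, pattern):
    """Return the values the pattern matches nowhere (what $not would keep)."""
    return [v for v in values if pattern.search(v) is None]

# Made-up sample values, for illustration only.
samples = [
    "donor_17_82635294",     # valid
    "donor_123_5",           # three-digit middle group
    "donor_17_",             # missing trailing number
    "xdonor_17_82635294",    # junk prefix
    "donor_17_82635294abc",  # junk suffix
]

print(non_matching(samples, UNANCHORED))  # misses the junk prefix/suffix cases
print(non_matching(samples, ANCHORED))    # reports every malformed value
```

In the shell, the anchored equivalent would be { $not: /^donor_\d{1,2}_\d+$/i }.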


How to exclude the last part of a variable string using regex

I am currently making a bunch of landing pages that use similar URL structure, but each URL varies in number of words.
So it's something like:
http://landingpage.xyz/page-number-five
http://landingpage.xyz/page-number-fifty-four
http://landingpage.xyz/page-for-a-different-topic
For the sent page I just append -sent, as shown below. I am not adding it as /sent because of the way the platform I am using handles URLs.
http://landingpage.xyz/page-number-five-sent
http://landingpage.xyz/page-number-fifty-four-sent
http://landingpage.xyz/page-for-a-different-topic-sent
I found it easy to write a regular expression that identifies all the sent pages, for example:
\/([a-z0-9\-]*)-sent
The thing is that I am not sure how to identify the ones that are not sent. I tried using a similar regular expression using something like this, but it's not working as expected:
\/([a-z0-9\-]*)(?!-sent)
What's the best way to design the regex for this? Or am I approaching it the wrong way?
A lookahead is only useful where there are still characters left to match, so one placed at the end of the regex has nothing to check. In your pattern the greedy [a-z0-9\-]* has already consumed -sent by the time the lookahead runs, so the lookahead succeeds. Since I'm not sure whether your environment supports lookbehinds, this should work as a workaround:
\/(?!.*-sent\b)([a-z0-9\-]*)
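To see the difference between the two patterns, here is a minimal Python sketch (Python is an assumption; the question does not name an environment, and is_unsent is a hypothetical helper). The match is used only as a yes/no signal: because the first slash of http:// already satisfies the workaround pattern, the captured group can be empty.

```python
import re

# The asker's attempt: the lookahead sits after a greedy group.
ATTEMPT = re.compile(r"/([a-z0-9\-]*)(?!-sent)")
# The workaround from the answer: rule out -sent before consuming anything.
WORKAROUND = re.compile(r"/(?!.*-sent\b)([a-z0-9\-]*)")

def is_unsent(url, pattern=WORKAROUND):
    """True when the pattern finds a match anywhere in the URL."""
    return pattern.search(url) is not None

urls = [
    "http://landingpage.xyz/page-number-five",
    "http://landingpage.xyz/page-number-five-sent",
]

for url in urls:
    # Columns: URL, result with the attempt, result with the workaround.
    print(url, is_unsent(url, ATTEMPT), is_unsent(url))
```

The attempt reports both URLs as matching, while the workaround matches only the one without the -sent suffix.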

Extract only the text field needed

I am just starting to learn regex, and I use every opportunity to understand how it works. Currently I am trying to extract dates from a text file (which is in fact a vnt file from my mobile phone). It looks like the following:
BEGIN:VNOTE
VERSION:1.1
BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=
=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=
5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09=
17.11.2017=0A
DCREATED:20171118T095601
X-IRMC-LUID:150
END:VNOTE
I want to extract all dates, so that the final list looks like this:
18.07.
14.08.
15.09.
15.10.
and so on. If a date also has a year, the year should be displayed too.
I almost managed to detect the dates with the following regex:
.+(\d\d\.\d\d\.(2015|2016|2017)?).+
But it only detects very few of the dates. The result is this:
BEGIN:VNOTE
VERSION:1.1
15.10.
04.04.2015
30.08.2015
24.01.2016
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
Then I tried to add a question mark which makes the .+ not greedy, as far as I read in tutorials. Then the regex looks like:
.+?(\d\d\.\d\d\.(2015|2016|2017)?).+?
But the result is still not what I am looking for:
BEGIN:VNOTE
VERSION:1.1
21.03.20.04.18.05.18.06.18.07.14.08.15.09.15.10.
13.11.13.12.12.01.03.02.06.03.04.04.20150A0=
03.06.201503.07.201502.08.201530.08.20150A28.09=
28.10.201525.11.201528.12.201524.01.20160A
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
For someone who is familiar with regex I am pretty sure this is very easy to solve, but I don't get it. It's very confusing when you are new to regex. I tried to find a hint in tutorials and Stack Overflow posts, but all I found is this: Notepad++ how to extract only the text field which is needed?
But it doesn't work for me. I assume it has something to do with the fact that my text file is not a single line.
I have my example on regex101 too.
I would be very thankful if maybe someone can give me a hint what else I can try.
Edit: I would like to detect the dates with the regex and get a list containing only the dates as the result (maybe this is called substitution?).
Edit 2: Sorry for not mentioning it earlier: I just want to use the regex in e.g. Notepad++ or an online regex test website, just to get the dates and save the result in a new txt file. I don't want to use the regex in a programming language. My apologies for not being precise before.
Edit 3: The result should be a list of the dates, with each date on a new line:
18.07.
14.08.
15.09.
15.10.
I suggest this pattern:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)
This makes use of the \G anchor (an anchor, not a flag), which matches at the position where the previous match ended. In this case it lets consecutive matches chain from the very start of the text without leaving any unmatched characters between them, so everything except the wanted dates can be removed.
If you also want to remove the remaining text that contains no dates, add |.* at the end:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)|.*
regex101 demo
In Notepad++, make sure the options underlined are selected and that the cursor is at the beginning of the file. In the picture below, I ran the replacement and then undid it, only to show that the matches were identified (16 replacements).
You can try using the following pattern:
\d{2}\.\d{2}\.(?:\d{4})?
This will match day.month dates of the form 18.07., and it also allows such a date to be followed by a four-digit year, e.g. 18.07.2017. While it would be nice to make the pattern more restrictive to avoid false positives, I do not see anything obvious that can be added to the above pattern. Follow the demo link below to see the pattern in action.
Demo
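The asker wanted a Notepad++ solution, but for anyone who can use a script, here is a small Python sketch of the same pattern. One detail in the sample file is worth noting: the BODY value is quoted-printable, and a trailing = is a soft line break that can split a date across two lines (0= followed by 5.05.2015). The sketch decodes the value first with the standard quopri module; the function name and the shortened sample string are my own.

```python
import quopri
import re

# Pattern from the answer: dd.mm. with an optional four-digit year.
DATE = re.compile(r"\d{2}\.\d{2}\.(?:\d{4})?")

def extract_dates(body):
    """Decode a quoted-printable BODY value, then collect every date in it."""
    decoded = quopri.decodestring(body.encode("ascii")).decode("utf-8")
    return DATE.findall(decoded)

# Shortened excerpt of the BODY value from the question; the date 05.05.2015
# is split by the "=" soft line break.
body = "18.07.=0A14.08.=0A04.04.2015=0A0=\n5.05.2015=0A03.06.2015=0A"

print("\n".join(extract_dates(body)))
```

Running the pattern on the raw text instead of the decoded text would skip the date that is split across the line break.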

Match all cases between string and comma using Regex

I have a large string. Here is a part of it:
{"status":"ok","items":[{"image_versions":[{"url":"http:\/\/distilleryimage8.instagram.com\/11a67042c62311e1bf341231380f8a12_7.jpg","width":612,"type":7,"height":612},{"url":"http:\/\/distilleryimage8.instagram.com\/11a67042c62311e1bf341231380f8a12_6.jpg","width":306,"type":6,"height":306},{"url":"http:\/\/distilleryimage8.instagram.com\/11a67042c62311e1bf341231380f8a12_5.jpg","width":150,"type":5,"height":150}],"code":"MrMBxJo-O8","has_more_comments":true,"taken_at":1341438972.0,"comments":[{"media_id":228329104165036988,"_spam":false,"text":"I live in Oklahoma! :D Shoot them off with me! :D","created_at":1341441914.0,"user":{"username":"heather_all_over","pk":13296276,"profile_pic_url":"http:\/\/images.instagram.com\/profiles\/profile_13296276_75sq_1339538236.jpg","full_name":"Heather\ud83c\udf80","is_private":false},"content_type":"comment","pk":228353791620276525,"type":0},{"media_id":228329104165036988,"_spam":false,"text":"Wish I had that much money to spend.......","created_at":1341441916.0,"user":{"username":"l_mcnair","pk":23775741,"profile_pic_url":"http:\/\/images.instagram.com\/profiles\/profile_23775741_75sq_1339894045.jpg","full_name":"Lauryn","is_private":true},"content_type":"comment","pk":228353803204944174,"type":0},{"media_id":228329104165036988,"_spam":false,"text":"You should video tape you setting them all off","created_at":1341441939.0,"user":{"username":"ahrii_","pk":37732021,"profile_pic_url":"http:\/\/images.instagram.com\/profiles\/profile_37732021_75sq_1340907381.jpg","full_name":"Ahriana;-*","is_private":false},"content_type":"comment","pk":228353997065675057,"type":0},{"media_id":228329104165036988,"_spam":false,"text":"When did skrillex start selling
I am trying to match every number after "pk":. I have been trying lookaheads but can't quite get it right. I don't know much about regex, so if somebody could point me in the right direction that would be great!
This looks like a JSON response. Why not just parse the JSON and pull out the values for all the "pk" keys?
Depending on what language you're using, the regex might look different, but this should work on most languages:
/"pk":(\d+)/g
That basically looks for the string "pk": and then all the digits after it, placing those digits in a capturing group. The g at the end makes it search for all occurrences. Depending on the language you're using, though, you might not be able to retrieve all of the captures.
If you want only the part after a given prefix, you can use a lookbehind:
(?<="pk":)\d+
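The suggestions above can be compared side by side. The question does not name a language, so the following is only a Python sketch, using a made-up and heavily shortened payload in the same shape as the one in the question. collect_pks is a hypothetical helper that walks the parsed JSON; the two regex calls use the patterns from the answers.

```python
import json
import re

def collect_pks(node):
    """Walk parsed JSON and return every value stored under a "pk" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "pk":
                found.append(value)
            found.extend(collect_pks(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_pks(item))
    return found

# Made-up sample payload, for illustration only.
raw = (
    '{"status":"ok","items":[{"comments":['
    '{"user":{"username":"a","pk":13296276},"pk":228353791620276525},'
    '{"user":{"username":"b","pk":23775741},"pk":228353803204944174}]}]}'
)

print(collect_pks(json.loads(raw)))       # parsed values (integers)
print(re.findall(r'"pk":(\d+)', raw))     # capture-group pattern (strings)
print(re.findall(r'(?<="pk":)\d+', raw))  # lookbehind pattern (strings)
```

Parsing is the more robust route: the regex versions assume there is no whitespace between the colon and the number, and they would also match "pk": if it appeared inside a string value such as a comment's text.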

Regex: Finding a value for a "key" inside brackets over multiple lines

I'm currently having some trouble wrapping my head around regular expressions, and I hope some of the regex gurus out there might be able to help.
I'll briefly explain my problem with an example of what I am trying to achieve.
I have an input string, with a key and value I am looking for, which look somewhat like this:
G01::Notice ((The customer already exists))
G01::MyNotice ((The customer already exists, nevermind...))
G02::OrderConfirm ((The order has been comfirmed!
Please inform the customer that his orders will arrive soon.))
In the above examples, I would like to get everything for G01:: which is enclosed within the parentheses. So my pattern is
Looking at the three input strings, I should add a few notices:
I am not sure your question is complete ...
Is this what you want?
G01::[^(]*\(\(([^)]*)
See it here on Regexr. The text within the brackets is in the capture group 1.
Try this regex: G01::\w+ \(\((.*?)\)\)
Not a complete question, but how about this one?
.*\((.*?)\)
Result 1
The customer already exists
Result 2
The customer already exists, nevermind...
Result 3
The order has been comfirmed! Please inform the customer that his orders will arrive soon.
On rubular
G01.*\(\((.*)\)\) seems to work (unless I misunderstood your question).
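Since the question is incomplete, the following is only a Python sketch of one reading of it: collect the text inside the double parentheses for a given code such as G01, including values that span several lines. It uses a lazy variant of the patterns above with DOTALL so that . also matches newlines; values_for is a hypothetical helper name.

```python
import re

def values_for(key, text):
    """Return the text inside ((...)) for every entry whose code equals key."""
    pattern = re.compile(re.escape(key) + r"::\w+\s*\(\((.*?)\)\)", re.DOTALL)
    return pattern.findall(text)

# The three input strings from the question.
text = """G01::Notice ((The customer already exists))
G01::MyNotice ((The customer already exists, nevermind...))
G02::OrderConfirm ((The order has been comfirmed!
Please inform the customer that his orders will arrive soon.))"""

print(values_for("G01", text))
print(values_for("G02", text))
```

The lazy .*? stops at the first )), which keeps each entry separate; a greedy .* combined with DOTALL would run on to the last )) in the whole text.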

Need to create a gmail like search syntax; maybe using regular expressions?

I need to enhance the search functionality on a page listing user accounts. Rather than have multiple search boxes for each possible field, or a drop-down menu where the user can only search against one field, I'd like a single search box with a Gmail-like syntax. That's the best way I can describe it; what I mean by a Gmail-like search syntax is being able to type the following into the input box:
username:bbaggins type:admin "made up plc"
When the form is submitted, the search string should be split into its separate parts, which will allow me to construct a SQL query. So for example, type:admin would form part of the WHERE clause so that it would find any record where the field type is equal to admin, and the same for username. The text in quotes may be a free-text search, but I'm not sure about that yet.
I'm thinking that a regular expression or two would be the best way to do this, but that's something I'm really not good at. Can anyone help to construct a regular expression which could be used for this purpose? I've searched around for some pointers but either I don't know what to search for or it's not out there as I couldn't find anything obvious. Maybe if I understood regular expressions better it would be easier :-)
Cheers,
Adam
No, you would not use regular expressions for this. Just split the string on spaces in whatever language you're using.
You don't necessarily have to use a regex. Regexes are powerful, but in many cases also slow. Regex also does not handle nested parameters very well. It would be easier for you to write a script that uses string manipulation to split the string and extract the keywords and the field names.
If you want to experiment with Regex, try the online REGex tester. Find a tutorial and play around, it's fun, and you should quickly be able to produce useful regexes that find any words before or after a : character, or any sentences between " quotation marks.
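As a sketch of the string-manipulation route, assuming Python (the question does not name a language): a plain split on spaces would break the quoted phrase "made up plc" into three pieces, so this version uses the standard shlex module, which keeps quoted text together. parse_query and its return shape are my own choices, not something from the question.

```python
import shlex

def parse_query(query):
    """Split a search string into field filters and free-text terms."""
    filters = {}
    free_text = []
    for token in shlex.split(query):  # keeps "quoted phrases" together
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field] = value    # e.g. type:admin
        else:
            free_text.append(token)   # e.g. made up plc
    return filters, free_text

filters, free_text = parse_query('username:bbaggins type:admin "made up plc"')
print(filters)
print(free_text)
```

Two things to keep in mind: shlex.split raises ValueError on an unbalanced quote, and any free-text token that contains a colon would be treated as a field filter. When building the SQL, the field names should be checked against a list of known columns and the values passed as bound parameters.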
Thanks for the answers... I did start doing it without regex and just wondered if a regex would be simpler. It sounds like it wouldn't, though, so I'll go back to the way I was doing it and test it again.
Good old Mr Bilbo is my go-to guy for any naming needs :-)
Cheers,
Adam