RegEx if target string is a superset of the regex - regex

[edited for hopefully more clarity.]
I'm probably confused, but I have a Mongodb dataset of simple words:
Items:
Boston Beer
Boston Brewery
Coors Brewing Light
I have an input string:
"Boston Beer Company"
I want to find any item that is contained within the input string. In this case,
'Boston Beer' would be a match.
The trouble is, given any input string, I don't know which words in the string would find a match in a field. (The match is not anchored to the beginning or end.)
In Javascript, I'd just create a loop and test.
inputString.indexOf(currentItem) >= 0
I may have confused myself, but I can't find a way to express a regEx where the RegEx is the target string and I am testing if any individual item (field) is contained within the longer string.
I hope this is somewhat clearer.
Thanks in advance-
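The loop-and-test idea from the question, and the "reversed" regex where the stored items become the pattern and the input string is the text being searched, can be sketched as follows. This is only a client-side illustration in Python, not a MongoDB query; the function names and the sample item list are assumptions made for the example.

```python
import re

def items_contained_in(input_string, items):
    """Return every item that occurs as a substring of input_string."""
    return [item for item in items if item in input_string]

def first_item_match(input_string, items):
    """Build one alternation out of the items and search the input with it."""
    if not items:
        return None
    # Escape each item so it is treated literally; try longer items first.
    ordered = sorted(items, key=len, reverse=True)
    pattern = "|".join(re.escape(item) for item in ordered)
    return re.search(pattern, input_string)

# Hypothetical item list, assumed from the question's data.
items = ["Boston Beer", "Boston Brewery", "Coors Brewing Light"]
found = items_contained_in("Boston Beer Company", items)
match = first_item_match("Boston Beer Company", items)
```

The point of the second function is that the roles are swapped: the fields supply the pattern, and the longer input string is what gets searched.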

Related

Regex to match words in good order

I want to program an alert system by checking if several lists of keywords are present in one address.
These are my two variables in PHP:
$MyAdress = "210, street Cardinal Avenue, Canada"; (for testing)
$SearchAdress = "210 Cardinal Avenue"; (from my list of possible keywords to find)
I want to test whether my $SearchAdress is present in my address and whether the words appear in the right order. How is that possible?
With regex? (It's always been gobbledegook to me.)
For example, "210 Cardinal Avenue" must return TRUE,
but "210 Avenue Cardinal" must return FALSE.
The code in "PHP check if two keywords occur in String" is interesting, but it does not respect the order.
I have also already solved the problem of lowercasing the text and replacing foreign characters in a string.
Just wrap the words into \b word boundaries and concatenate them with .+?
(?i)\b210\b.+?\bcardinal\b.+?\bavenue\b
See this demo at regex101 or a PHP demo at tio.run (used i flag for ignorecase)
This would match the words in sequence with one or more of any characters in between.
To also match 210CardinalAvenue drop word boundaries between and use .*? (demo).
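As a rough sketch of the two variants described in this answer, the pattern can be assembled from a list of words. The helper name ordered_words_pattern and its require_gap parameter are made up for this illustration, and Python is used here instead of PHP.

```python
import re

def ordered_words_pattern(words, require_gap=True):
    """Compile a case-insensitive pattern matching the words in the given order."""
    escaped = [re.escape(word) for word in words]
    if require_gap:
        # Each word is a whole word, with at least one character in between.
        body = ".+?".join(r"\b" + word + r"\b" for word in escaped)
    else:
        # No boundaries between the words, and the gap may be empty.
        body = r"\b" + ".*?".join(escaped) + r"\b"
    return re.compile(body, re.IGNORECASE)

strict = ordered_words_pattern(["210", "Cardinal", "Avenue"])
loose = ordered_words_pattern(["210", "Cardinal", "Avenue"], require_gap=False)

in_order = strict.search("210, street Cardinal Avenue, Canada") is not None
wrong_order = strict.search("210 Avenue Cardinal") is not None
glued = loose.search("210CardinalAvenue") is not None
```

Escaping each word keeps characters such as a dot in a keyword from being read as regex syntax.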

Regex for url route with query string

I am having a hard time learning regex and honestly I have no time at the moment.
I am looking for a regex expression that would match url route with query string
What I need is regex to match population?filter=nation of course where nation can be any string.
Based on my current regex knowledge I have also tried with regex expression /^population\/(?P<filterval>\d+)\/filter$/ to match population/nation/filter but this does not work.
Any suggestion and help is welcome.
This does match only your first query string format:
population\?filter=[\w]+[-_]?[\w]+
Additionally, it allows - or _ as a joiner between words. If you know that your string ends right there, you can also add a $ at the end to mark it so.
If you know that the nation consists only of word characters (letters, digits and underscore), you can use the simplified version:
population\?filter=[\w]+
Demo
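A small sketch of both patterns in Python follows. The anchors and the named group filterval (borrowed from the question's own attempt) are additions for the example, not part of the answer's patterns.

```python
import re

# The answer's first pattern, anchored, with the value captured in a named group.
ROUTE_WITH_JOINER = re.compile(r"^population\?filter=(?P<filterval>[\w]+[-_]?[\w]+)$")
# The simplified version: word characters only.
ROUTE_SIMPLE = re.compile(r"^population\?filter=(?P<filterval>[\w]+)$")

def filter_value(route, pattern=ROUTE_WITH_JOINER):
    """Return the filter value of a matching route, or None."""
    match = pattern.match(route)
    return match.group("filterval") if match else None

plain = filter_value("population?filter=nation")
hyphenated = filter_value("population?filter=new-zealand")
path_style = filter_value("population/nation/filter")
```

Note that the first pattern needs at least two word characters in the value, because it has a [\w]+ on each side of the optional joiner; the simplified pattern accepts a single character.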

Regex to match a string only if a string does not exist

Can anyone suggest a regular expression that matches a string only if a given substring does not occur in it?
Suppose I have a String
Rohan is going to Home
and I don't want the string
Going to
to be present; if it is, nothing should be returned. But if the string does not contain "going to", then the string should be returned.
Valid
Rohan is at Home
Invalid
Rohan is going to Home
I have heard that regex isn't well suited to negating something (except a single character). Regex is intended to match strings, not to "not match" them. Still, if someone has a solution, please suggest it as a regular expression.
I have tried to create a regular expression but have had no success so far.
SELECT 1
FROM DUAL
WHERE REGEXP_LIKE (' Rohan is 12 home'
, '^\s[^going to])$','i');
Try this, which uses a negative look ahead:
^(?!.*going to.*$)
Check out negative lookaheads.
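To illustrate how the negative lookahead behaves, here is a minimal Python sketch. It assumes an engine that supports lookahead, which should be checked for the database in use; the function name is_valid is made up for the example.

```python
import re

# The lookahead fails at the start of the string if "going to" occurs
# anywhere after it; the trailing .*$ then consumes the accepted string.
NO_GOING_TO = re.compile(r"^(?!.*going to).*$", re.IGNORECASE)

def is_valid(text):
    """True when the text does not contain 'going to' (case-insensitive)."""
    return NO_GOING_TO.match(text) is not None

valid = is_valid("Rohan is at Home")
invalid = is_valid("Rohan is going to Home")
```

Where lookahead is not available, the same result can be had by matching the plain substring and negating the outcome in the calling code.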

Expressing basic Access query criteria as regular expressions

I'm familiar with Access's query and filter criteria, but I'm not sure how to express similar statements as regular expression patterns. I'm wondering if someone can help relate them to some easy examples that I understand.
If I were using regular expressions to match fields like Access, how would I express the following statements? Examples are similar to those found on this Access Query and Filter Criteria webpage. As in Access, case is insensitive.
"London"
Strings that match the word London exactly.
"London" or "Paris"
Strings that match either the words London or Paris exactly.
Not "London"
Any string but London.
Like "S*"
Any string beginning with the letter s.
Like "*st"
Any string ending with the letters st.
Like "*the*dog*"
Any strings that contain the words 'the' and 'dog' with any characters before, in between, or at the end.
Like "[A-D]*"
Any strings beginning with the letters A through D, followed by anything else.
Not Like "*London*"
Any strings that do not contain the word London anywhere.
Not Like "L*"
Any strings that don't begin with an L.
Like "L*" And Not Like "London*"
Any strings that begin with the letter L but not the word London.
Regex is much more powerful than any of the patterns you have been used to for creating criteria in Access SQL. If you limit yourself to these types of patterns, you will miss most of the really interesting features of regexes.
For instance, with Access patterns you can't search for things like dates, extract IP addresses, do simple email or URL detection or validation, or basic reference code validation (such as asking whether an Order Reference code follows a mandated coding structure, say something like PO123/C456 for instance), etc.
As @Smandoli mentioned, you'd better forget your preconceptions about pattern matching and dive into the regex language.
I found the book Mastering Regular Expressions to be invaluable, but tools are the best way to experiment freely with regex patterns; I use RegexBuddy, but there are other tools available.
Basic matches
Now, regarding your list, and using fairly standardized regular expression syntax:
"London"
Strings that match the word London exactly.
^London$
"London" or "Paris"
Strings that match either the words London or Paris exactly.
^(London|Paris)$
Not "London"
Any string but London.
You match for ^London$ and invert the result (NOT)
Like "S*"
Any string beginning with the letter s.
^s
Like "*st"
Any string ending with the letters st.
st$
Like "*the*dog*"
Any strings that contain the words 'the' and 'dog' with any characters before, in between, or at the end.
the.*dog
Like "[A-D]*"
Any strings beginning with the letters A through D, followed by anything else.
^[A-D]
Not Like "*London*"
Any strings that do not contain the word London anywhere.
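Reverse the matching result for London (you can also use a negative lookahead such as
^((?!London).)*$, which the VBScript regex engine available to Access does support; note that the lookahead has to come before the dot, otherwise a string that starts with London slips through).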
Not Like "L*"
Any strings that don't begin with an L.
^[^L]
Negative matching for single characters is easier than negative matching for a whole word, as we've seen above.
Like "L*" And Not Like "London*"
Any strings that begin with the letter L but not the word London.
^L(?!ondon).*$
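The patterns above can be tried out quickly in any regex-capable language. Here is a small Python sketch, with case-insensitive matching to mimic Access; the helper name matches and the sample city names are only illustrative.

```python
import re

def matches(pattern, text):
    """Case-insensitive search, mirroring Access's case-insensitive criteria."""
    return re.search(pattern, text, re.IGNORECASE) is not None

checks = {
    "exact": matches(r"^London$", "london"),
    "either": matches(r"^(London|Paris)$", "Paris"),
    "either_extra": matches(r"^(London|Paris)$", "Paris, France"),
    "not_london": not matches(r"^London$", "Madrid"),
    "starts_s": matches(r"^s", "Stockholm"),
    "ends_st": matches(r"st$", "Budapest"),
    "the_dog": matches(r"the.*dog", "Walk the big dog home"),
    "a_to_d": matches(r"^[A-D]", "Dublin"),
    "no_london": not matches(r"London", "Greater London"),
    "not_l": matches(r"^[^L]", "Madrid"),
    "l_not_london": matches(r"^L(?!ondon).*$", "Lisbon"),
    "l_but_london": matches(r"^L(?!ondon).*$", "Londonderry"),
}
```

The "Not" criteria are expressed here by negating the result in code rather than in the pattern, which is the simpler of the two options mentioned above.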
Using Regexes in SQL Criteria
In Access, creating a user-defined function that can be used directly in SQL queries is easy.
To use regex matching in your queries, place this function in a module:
' ----------------------------------------------------------------------'
' Return True if the given string value matches the given Regex pattern '
' ----------------------------------------------------------------------'
Public Function RegexMatch(value As Variant, pattern As String) As Boolean
    If IsNull(value) Then Exit Function
    ' Using a static, we avoid re-creating the same regex object for every call '
    Static regex As Object
    ' Initialise the Regex object '
    If regex Is Nothing Then
        Set regex = CreateObject("vbscript.regexp")
        With regex
            .Global = True
            .IgnoreCase = True
            .MultiLine = True
        End With
    End If
    ' Update the regex pattern if it has changed since last time we were called '
    If regex.pattern <> pattern Then regex.pattern = pattern
    ' Test the value against the pattern '
    RegexMatch = regex.test(value)
End Function
Then you can use it in your query criteria, for instance to find in a PartTable table, all parts that are matching variations of screw 18mm like Pan Head Screw length 18 mm or even SCREW18mm etc.
SELECT PartNumber, Description
FROM PartTable
WHERE RegexMatch(Description, "screw.*?\d+\s*mm")
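For readers who want to test such a pattern outside Access, here is a rough Python analogue of the same idea: a null-safe match function that reuses compiled patterns. The names regex_match and _compiled are made up for this sketch, and the pattern is written with \d+ for the digits.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def _compiled(pattern):
    """Compile each distinct pattern once and reuse it on later calls."""
    return re.compile(pattern, re.IGNORECASE | re.MULTILINE)

def regex_match(value, pattern):
    """Return True if value matches pattern; a missing value never matches."""
    if value is None:
        return False
    return _compiled(pattern).search(str(value)) is not None

SCREW_MM = r"screw.*?\d+\s*mm"
long_form = regex_match("Pan Head Screw length 18 mm", SCREW_MM)
short_form = regex_match("SCREW18mm", SCREW_MM)
missing = regex_match(None, SCREW_MM)
```

Caching the compiled pattern plays the same role as the Static regex object in the VBA function: repeated calls with the same pattern skip the setup cost.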
Caveat
Because the regex matching uses old scripting libraries, the flavour of regex language is a bit more limited than the one found in .NET and available to other programming languages.
It's still fairly powerful as it is more or less the same as the one used by JavaScript.
Read about the VBScript regex engine to check what you can and cannot do.
The worst part, though, is probably that regex matching using this library is fairly slow, and you should be very careful not to overuse it.
That said, it can be very useful sometimes. For instance, I used regexes to sanitize data input from users and detect entries with similar patterns that should have been normalised.
Well used, regexes can enhance data consistency, but use sparingly.
Regex is difficult to break into initially. Honestly, looking for spoon-fed examples is not going to help as much as "getting your hands dirty" with it. Also, MS Access is not a good springboard. Regex doesn't "cognate" well with the SQL query process -- not in application, and not in mental orientation. What you need is some text files to process, using a text editor.
Our solution was to open the Excel file in OpenCalc (part of Apache OpenOffice, https://www.openoffice.org/) which provides what seems like full regular expressions for both the find and replace.
We test the regular expressions at http://regexr.com/

Nested Groups in Regex

I'm constructing a regex that looks for dates. I would like to return the date found and the sentence it was found in. In the code below, the strings on either side of date_string should check for the conditions of a sentence. For your sake, I've omitted the regex for date_string; suffice it to say, it works for picking out dates. While the inside of date_string isn't important, it is grouped as one entire regex.
"((?:[^.|?|!]*)"+date_string+"(?:[^.|?|!]*[.|?|!]\s*))"
The problem is that date_string is only matching the last number of any given date, presumably because the regex in front of date_string is matching too far and overrunning the date regex. For example, if I say "Independence Day is July 4.", I will get the sentence and 4, even though it should match 'July 4'. In case you're wondering, my regex inside date_string are ordered in such a way that 'July 4' should match first. Is there any way to do this all in one regex? Or do I need to split it up somehow (i.e. split up all text into sentences, and then check each sentence)?
There are several things wrong with your regex.
1. There is no alternation in character classes. You want [^.?!], not [^.|?|!].
2. You don't need the non-capturing groups at all.
3. You probably don't need any "outer" grouping, since the entire match is what you look for.
4. Your match part preceding the date is greedy where it should not be (this runs over part of your date).
5. You make assumptions about what resembles a sentence that do not match reality. Your own example proves that, if you try.
Putting that last point aside for the moment, you end up with this version:
[^.?!]*?(July 4)[^.?!]*[.?!]\s*
Where the literal July 4 stands in for your date regex. This matches in your question text:
' For example, if I say "Independence Day is July 4.'
'", I will get the sentence and 4, even though it should match 'July 4'. '
which pretty much proves my point #5.
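The effect of the greedy prefix can be reproduced with a small Python sketch. The date sub-pattern below is a stand-in invented for the example (the question's real date_string is not shown); it tries "July 4"-style dates first and falls back to a bare number.

```python
import re

# Hypothetical stand-in for date_string: month-and-day first, bare number second.
date_string = r"(July \d+|\d+)"

greedy = re.compile(r"[^.?!]*" + date_string + r"[^.?!]*[.?!]\s*")
lazy = re.compile(r"[^.?!]*?" + date_string + r"[^.?!]*[.?!]\s*")

text = "Independence Day is July 4."

# The greedy prefix swallows "July " and leaves only the final digit.
greedy_date = greedy.search(text).group(1)
# The lazy prefix stops at the first position where the date can match.
lazy_date = lazy.search(text).group(1)
```

Both patterns match the whole sentence; only the captured date differs, which is why the problem is easy to miss when looking at the full match alone.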
You can make the repetition operator non-greedy by adding a question mark. In your case it would be
[^.?!]*?
And yes, splitting the text into sentences (preferably excluding the last character) would make it much easier.
(Seems like I didn't look at what was in the character class. Replaced it with tloflin's.)
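The split-first approach could look roughly like the following Python sketch. The sentence splitter is deliberately naive (it breaks after ., ? or ! followed by whitespace, so abbreviations would trip it up), and the date pattern and function name are assumptions made for the example.

```python
import re

# Hypothetical, simplified date pattern for illustration only.
DATE = re.compile(r"July \d+")

def dated_sentences(text):
    """Split text into sentences, then return (sentence, date) pairs."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    results = []
    for sentence in sentences:
        match = DATE.search(sentence)
        if match:
            results.append((sentence, match.group(0)))
    return results

sample = ("Independence Day is July 4. Is Bastille Day on July 14? "
          "Nothing happens today.")
pairs = dated_sentences(sample)
```

Splitting first means the date pattern never has to compete with a sentence-prefix pattern for the same characters.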