Multiple words in any order using regex [duplicate]

This question already has answers here:
Regex to match string containing two names in any order
(9 answers)
Closed 3 years ago.
As the title says, I need to find two specific words in a sentence, but they can appear in any order and in any casing. How do I go about doing this using regex?
For example, I need to extract the words test and long from the following sentence, whether test comes first or long does.
This is a very long sentence used as a test
UPDATE:
What I did not mention in the first part is that it needs to be case insensitive as well.

You can use
(?=.*test)(?=.*long)
Source: MySQL SELECT LIKE or REGEXP to match multiple words in one record
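A minimal sketch of the lookahead approach, assuming Python's `re` module. `re.IGNORECASE` covers the case-insensitivity from the update, and the leading `^` (an addition, not part of the answer) means the lookaheads are only tried from the start of the string. `contains_both` is a hypothetical helper name.

```python
import re

# Each lookahead scans the rest of the line on its own, so the two words
# may appear in either order. ^ makes the check start at position 0.
BOTH_WORDS = re.compile(r"^(?=.*test)(?=.*long)", re.IGNORECASE)


def contains_both(sentence):
    """Return True if the sentence contains both 'test' and 'long'."""
    return BOTH_WORDS.search(sentence) is not None


print(contains_both("This is a very long sentence used as a test"))  # True
print(contains_both("A TEST that is LONG"))                          # True
print(contains_both("This is a very long sentence"))                 # False
```

Since `.` does not cross newlines by default, text spanning several lines would also need `re.DOTALL`.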

Use a capturing group if you want to extract the matches: (test)|(long)
Then depending on the language in use you can refer to the matched group using $1 and $2, for example.
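In Python's `re`, for example, the groups are read with `m.group(1)` and `m.group(2)` rather than `$1`/`$2`. A sketch, assuming you want every occurrence in the order it appears; `extract_words` is a hypothetical name. Without `\b`, a word such as "testing" would also yield "test".

```python
import re

# Two capturing groups: in any single match only one participates,
# and the other is None.
WORDS = re.compile(r"(test)|(long)", re.IGNORECASE)


def extract_words(sentence):
    """Return every 'test' or 'long' in the order they occur."""
    found = []
    for m in WORDS.finditer(sentence):
        found.append(m.group(1) or m.group(2))
    return found


print(extract_words("This is a very long sentence used as a test"))
# ['long', 'test']
```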

I assume (always dangerous) that you want to find whole words, so "test" would match but "testy" would not. Thus the pattern must search for word boundaries, so I use the "\b" word boundary pattern.
/(?i)(\btest\b.*\blong\b|\blong\b.*\btest\b)/
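The same whole-word pattern can be tried in Python, assuming the `re` module (which accepts the inline `(?i)` flag at the start of a pattern). `has_both_whole_words` is a hypothetical name; the last call shows "testy" being rejected by `\b`.

```python
import re

# (?i) enables case-insensitive matching inline; \b limits the match to
# whole words, so "testy" does not count as "test".
WHOLE_WORDS = re.compile(r"(?i)(\btest\b.*\blong\b|\blong\b.*\btest\b)")


def has_both_whole_words(sentence):
    return WHOLE_WORDS.search(sentence) is not None


print(has_both_whole_words("This is a very long sentence used as a test"))  # True
print(has_both_whole_words("Test this LONG line"))                         # True
print(has_both_whole_words("A long and testy sentence"))                   # False
```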

Without knowing what language you're using:
/test.*long/
or
/long.*test/
or
/test/ && /long/
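The `/test/ && /long/` form runs two independent searches and combines the results with a logical AND, which keeps each pattern trivial and extends easily to more words. A Python sketch of that idea, assuming case-insensitive matching is wanted (the flag is an addition); `has_all_words` is a hypothetical name.

```python
import re


def has_all_words(sentence, words=("test", "long")):
    """True if every word appears somewhere in the sentence, in any order."""
    return all(re.search(re.escape(w), sentence, re.IGNORECASE) for w in words)


print(has_all_words("This is a very long sentence used as a test"))  # True
print(has_all_words("a long sentence"))                               # False
```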

Try this:
/(?i)(?:test.*long|long.*test)/
That will match either test and then long, or long and then test. It will ignore case differences.

Vim has a branch operator \& that allows an even terser regex when searching for a line containing any number of words, in any order.
For example,
/.*test\&.*long
will match a line containing test and long, in any order.
See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented on the Regular Expression Wikipedia entry.

I was using libpcre with C, where I could define callouts. They helped me to easily match not just words, but any subexpressions in any order. The regexp looks like:
(?C0)(expr1(?C1)|expr2(?C2)|...|exprn(?Cn)){n}
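The callout function ensures that every subexpression matches at most once, so with the {n} repetition each one matches exactly once:
#include <string.h>
#include <pcre.h>

/* Register the callout before matching: pcre_callout = mycallout; */
int mycallout(pcre_callout_block *b)
{
    static int subexpr[255];

    if (b->callout_number == 0) {
        /* callout (?C0): a new match attempt begins, so reset all counts */
        memset(subexpr, 0, sizeof(subexpr));
        return 0;
    }
    /* Return the previous count: 0 lets the match continue the first time
       subexpression N matches; a positive value makes PCRE fail this path
       and backtrack. Note that counts are not restored on backtracking. */
    return subexpr[b->callout_number - 1]++;
}
Something similar should be possible in Perl as well.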

I don't think you can do it with a single regex. You'll need to do a logical AND of two regexes, one searching for each word.

Related

Regular expression dilemma

I've been trying for a few hours to write a pattern for a matching algorithm, and I can't find anything that handles the following: given the example "my_name_is", I need to extract every word individually as well as the whole expression. The input may be a list of n examples, some of which can be matched and some of which cannot.
"my_name_is" => ["my", "name", "is", "my_name_is"]
How can I do this, and what should the regexp look like? Looking forward to your answers, thank you!

Regular Expressions are patterns used to match a string of characters. We usually use them to validate a string of characters, or to find and replace a specific pattern within text.
Here, it seems the outcome you're looking for is an array of strings that have been split using an underscore. Regex isn't what you're looking for.
Implementation would change based on language, but consider the following code:
function stringToArray($myStr)
{
    // split on underscores, then append the whole string
    $words = explode('_', $myStr);
    return array_merge($words, [$myStr]);
}

Use re.findall with the following regex:
([^_]+)+?
This should match every run of consecutive characters that doesn't contain an underscore. (The outer +? is redundant; [^_]+ returns the same pieces.)
As for the whole thing? You already have it, so there's no reason to run a regex over the whole string.
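Putting both parts together in Python, assuming the `re` module; `split_with_whole` is a hypothetical name. It uses `[^_]+`; with `re.findall`, `([^_]+)+?` returns the same pieces, because the lazy outer `+?` stops after a single repetition.

```python
import re


def split_with_whole(s):
    """Every underscore-free run of characters, plus the original string."""
    return re.findall(r"[^_]+", s) + [s]


print(split_with_whole("my_name_is"))  # ['my', 'name', 'is', 'my_name_is']
```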

Parsing valid parent directories with regex

Given the string a/b/c/d which represents a fully-qualified sub-directory I would like to generate a series of strings for each step up the parent tree, i.e. a/b/c, a/b and a.
With regex I can do a non-greedy /(.*?)\// which will give me matches of a, b and c or a greedy /(.*)\// which will give me a single match of a/b/c. Is there a way I can get the desired results specified above in a single regex or will it inherently be unable to create two matches which eat the same characters (if that makes sense)?
Please let me know if this question is answered elsewhere... I've looked, but found nothing.
Note this question is about whether it's possible with regex. I know there are many ways outside of regex.

One solution, building on an idea from this other question:
Reverse the string to be matched: d/c/b/a. In PHP, for instance, use strrev($string).
Match with (?=(/(?:\w+(?:/|$))+))
This gives you
/c/b/a
/b/a
/a
Then reverse each match with strrev().
This gives you
a/b/c/
a/b/
a/
If you are using .NET rather than PCRE, you could match right to left and probably arrive at the same result.
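A Python sketch of the reverse-and-lookahead idea, assuming the `re` module, with slicing (`[::-1]`) standing in for PHP's `strrev()`. `parent_dirs` is a hypothetical helper name. Because `\w` does not match `.` or `-`, directory names containing those characters would need a broader class such as `[^/]`.

```python
import re


def parent_dirs(path):
    """Parent paths of `path`, nearest first, each with a trailing slash."""
    reversed_path = path[::-1]  # "a/b/c/d" -> "d/c/b/a"
    # The capture sits inside a zero-width lookahead, so the engine can
    # report overlapping captures that share characters.
    captures = re.findall(r"(?=(/(?:\w+(?:/|$))+))", reversed_path)
    # Reverse each capture back: "/c/b/a" -> "a/b/c/"
    return [c[::-1] for c in captures]


print(parent_dirs("a/b/c/d"))  # ['a/b/c/', 'a/b/', 'a/']
```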

Completely different answer without reversing string.
(?<=((?:\w+(?:/|$))+(?=\w)))
This matches
a/
a/b/
a/b/c/
but this requires a flavor that supports variable-length lookbehind, such as .NET (C#).

Yes, it's possible:
/([^\/]*)\//
So basically it replaces your .*? with [^/]*, and it does not have to be non-greedy. Since / is a special character in your case, you will have to escape it, like so: [^\/]*.

Regex - How to search for singular or plural version of word [duplicate]

This question already has answers here:
Regex search and replace with optional plural
(4 answers)
Closed 6 years ago.
I'm trying to do what should be a simple Regular Expression, where all I want to do is match the singular portion of a word whether or not it has an s on the end. So if I have the following words
test
tests
EDIT: More examples; I need this to work for many words, not just those two:
movie
movies
page
pages
time
times
For all of them I need to get the word without the s on the end but I can't find a regular expression that will always grab the first bit without the s on the end and work for both cases.
I've tried the following:
([a-zA-Z]+)([s\b]{0,}) - This returns the full word as the first match in both cases
([a-zA-Z]+?)([s\b]{0,}) - This returns 3 different matching groups for both words
([a-zA-Z]+)([s]?) - This returns the full word as the first match in both cases
([a-zA-Z]+)(s\b) - This works for tests but doesn't match test at all
([a-zA-Z]+)(s\b)? - This returns the full word as the first match in both cases
I've been using http://gskinner.com/RegExr/ for trying out the different regex's.
EDIT: This is for a Sublime Text snippet. For those who don't know, a snippet in Sublime Text is a shortcut: I can type, say, the name of my database table, hit "run snippet", and it turns it into something like:
$movies = $this->ci->db->get_where("movies", "");
if ($movies->num_rows()) {
    foreach ($movies->result() AS $movie) {
    }
}
All I need is to turn "movies" into "movie" and auto-insert it into the foreach loop.
This means I can't just do a find and replace on the text, and I only need to handle 60 to 70 words (it only runs against my own tables, not every word in the English language).
Thanks!
- Tim

Ok I've found a solution:
([a-zA-Z]+?)(s\b|\b)
Works as desired, then you can simply use the first match as the unpluralized version of the word.
Thanks @Jahroy for helping me find it. I added this as an answer for future visitors who just want a solution, but please check out Jahroy's comment for more in-depth information.
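A quick Python check of `([a-zA-Z]+?)(s\b|\b)`, assuming each word is tested on its own with `re.match`; `strip_plural_s` is a hypothetical name. It only strips a trailing s, so a word that genuinely ends in s (like "glass") is cut as well.

```python
import re

# Lazy letters, then either a final "s" at a word boundary or just a boundary.
SINGULAR = re.compile(r"([a-zA-Z]+?)(s\b|\b)")


def strip_plural_s(word):
    m = SINGULAR.match(word)
    return m.group(1) if m else word


words = ["test", "tests", "movie", "movies", "page", "pages", "time", "times"]
print([strip_plural_s(w) for w in words])
# ['test', 'test', 'movie', 'movie', 'page', 'page', 'time', 'time']
print(strip_plural_s("glass"))  # 'glas'
```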

For simple plurals, use this:
test(?=s| |$)
For more complex plurals, you're in trouble using regex. For example, this regex
part(y|i)(?=es | )
will return "party" or "parti", but I'm not sure what you would do with that.
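Both lookahead patterns can be tried in Python, assuming the `re` module. Because `part(y|i)` contains a group, `findall` would return only "y" or "i", so the sketch reads the whole match with `finditer` and `group(0)`. The second call shows a limit of the simple version: punctuation after the word is not in the lookahead.

```python
import re

# "test" only when followed by "s", a space, or the end of the string.
SIMPLE = re.compile(r"test(?=s| |$)")
# "party" or "parti", only when followed by "es " or " ".
COMPLEX = re.compile(r"part(y|i)(?=es | )")

print(SIMPLE.findall("test tests test"))  # ['test', 'test', 'test']
print(SIMPLE.findall("one test, two"))     # []
print([m.group(0) for m in COMPLEX.finditer("a party and many parties today")])
# ['party', 'parti']
```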

Here's how you can do it with vi or sed:
s/\([A-Za-z]*\)[sS]$/\1/
That replaces a run of letters ending in s or S with the same letters minus that final s.
NOTE:
The escape chars (backslashes before the parens) might be different in different contexts.
ALSO:
The \1 (which means the first pattern) may also vary depending on context.
ALSO:
This will only work if your word is the only word on the line.
If your table name is one of many words on the line, you could probably replace the $ (which stands for the end of the line) with a wildcard that represents whitespace or a word boundary (these differ based on context).

RegExp Find skip letter in the word

I want to find a word even when it is written with a skipped letter.
For example, I want to find
references
but I also want to find refrences or refernces, though not refer.
I wrote this regexp:
(\brefe?r?e?n?c?e?s?\b)
And I want to add a check on the length of the matched group: it should be greater than 8.
Can I do this with regexp methods alone?

I don't think regex is a good tool for finding similar words the way you are trying to. What will you do if two letters are swapped, as in "refernece"? Your regex will not find it.
But to show the regex way to check for the length, you could do this by using a lookahead like this
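(\b(?=\w{8,}\b)refe?r?e?n?c?e?s?\b)
The (?=\w{8,}\b) checks that the word starting at the first \b has at least 8 word characters ({8,}) before its closing \b. Using . instead of \w would let the lookahead run past the end of the word to a later boundary, so a short word like "refer" could still pass in running text.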
See it here on Regexr
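A Python sketch contrasting the two lookahead forms, `(?=.{8,}\b)` and `(?=\w{8,}\b)`, on running text, assuming the `re` module. With `.` the lookahead can extend past the current word to a later boundary, so "refer" still passes; with `\w` it stops at the end of the word.

```python
import re

# "." may run past the end of the word to a later \b.
DOT_VERSION = re.compile(r"(\b(?=.{8,}\b)refe?r?e?n?c?e?s?\b)")
# "\w" cannot cross spaces or punctuation.
WORD_VERSION = re.compile(r"\b(?=\w{8,}\b)refe?r?e?n?c?e?s?\b")

text = "See the references, refrences and refernces, but refer to nothing else."
print(DOT_VERSION.findall(text))   # ['references', 'refrences', 'refernces', 'refer']
print(WORD_VERSION.findall(text))  # ['references', 'refrences', 'refernces']
```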

I think using regex is not a good idea here; you need more powerful functions. For example, if you are programming in PHP, you could use a function like similar_text. More details here: http://www.php.net/manual/en/function.similar-text.php

Basically you are asking that (in pseudo code):
input == "references" or (levenshtein("references", input) == 1 and length(input) == length("references") - 1)
Levenshtein distance is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
Since you want to detect only the strings where a char was skipped, you must add the constraint on the string length.
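A runnable Python version of that pseudocode, assuming a plain two-row dynamic-programming Levenshtein implementation (written out here rather than taken from a library). `levenshtein` and `is_target_or_one_skip` are hypothetical names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def is_target_or_one_skip(candidate, target="references"):
    """Exact match, or the target with exactly one character left out."""
    return candidate == target or (
        levenshtein(target, candidate) == 1 and len(candidate) == len(target) - 1
    )


print(is_target_or_one_skip("refrences"))  # True
print(is_target_or_one_skip("refer"))      # False
```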

Regular Expression to exclude set of Keywords

I want an expression that will fail when it encounters words such as "boon.ini" and "http". The goal is to be able to construct this expression for any set of keywords.

^(?:(?!boon\.ini|http).)*$\r?\n?
(taken from RegexBuddy's library) will match any line that does not contain boon.ini and/or http. Is that what you wanted?
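A Python sketch of building that pattern for an arbitrary keyword list, assuming the `re` module; `re.escape` handles characters such as the dot in boon.ini, and `build_excluder` is a hypothetical name. The trailing `\r?\n?` from the RegexBuddy version is left out because the sketch checks one line at a time; for multi-line text you would compile with `re.MULTILINE`.

```python
import re


def build_excluder(keywords):
    """Pattern that matches a whole line only if it contains none of the keywords."""
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"^(?:(?!{alternation}).)*$")


rx = build_excluder(["boon.ini", "http"])
print(bool(rx.match("plain line")))              # True
print(bool(rx.match("see http://example.com")))  # False
print(bool(rx.match("load boon.ini now")))       # False
```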

An alternative expression that could be used:
^(?!.*IgnoreMe).*$
^ = indicates start of line
$ = indicates the end of the line
(?! Expression) = indicates zero width look ahead negative match on the expression
The ^ at the front is needed; otherwise the negative lookahead could be evaluated starting somewhere within or beyond the 'IgnoreMe' text and produce a match where you don't want one.
e.g. If you use the regex:
(?!.*IgnoreMe).*$
With the input "Hello IgnoreMe Please", this will result in something like "gnoreMe Please", because starting from the 'g' the negative lookahead no longer finds the complete string 'IgnoreMe'.
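The difference the anchor makes can be checked directly in Python, assuming the `re` module, whose `re.search` tries every starting position just as described.

```python
import re

text = "Hello IgnoreMe Please"

unanchored = re.search(r"(?!.*IgnoreMe).*$", text)
anchored = re.search(r"^(?!.*IgnoreMe).*$", text)

print(unanchored.group())  # gnoreMe Please
print(anchored)            # None
```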

Rather than negating the result within the expression, you should do it in your code. That way, the expression becomes pretty simple.
\b(boon\.ini|http)\b
Would return true if boon.ini or http was anywhere in your string. It won't match words like httpd or httpxyzzy because of the \b, or word boundaries. If you want, you could just remove them and it will match those too. To add more keywords, just add more pipes.
\b(boon\.ini|http|foo|bar)\b
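Inverting the result in code might look like this in Python, assuming the `re` module; `is_allowed` is a hypothetical name. The last call shows `\b` letting "httpd" through.

```python
import re

KEYWORDS = re.compile(r"\b(boon\.ini|http|foo|bar)\b")


def is_allowed(line):
    """True when none of the keywords appears as a whole word."""
    return not KEYWORDS.search(line)


print(is_allowed("GET /index.html"))           # True
print(is_allowed("fetch http://example.com"))  # False
print(is_allowed("start httpd"))               # True
```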

You might be well served by writing a regex that succeeds when it encounters the words you're looking for, and then inverting the condition.
For instance, in Perl you'd use:
if (!/boon\.ini|http/) {
# the string passed!
}

^[^£]*$
The above expression excludes only the pound symbol (£) from the string; it allows every other character.

Which language/regexp library are you using? I thought your question was about ASP.NET, in which case you can see the "negative lookahead" section of this article:
http://msdn.microsoft.com/en-us/library/ms972966.aspx
Strictly speaking, the complement of a regular language is still a regular language, but very few libraries, languages, or tools let you express that negation directly.
A negative lookahead may serve the same purpose, but the actual syntax depends on what you are using. Tim's answer is an example using (?!...).

I used this (based on Tim Pietzcker's answer) to exclude non-production subdomain URLs in Google Analytics profile filters:
^\w+-*\w*\.(?!(?:alpha(123)*\.|beta(123)*\.|preprod\.)domain\.com).*$
You can see the context here: Regex to Exclude Multiple Words

We Keep Coding
