This question already has answers here:
Regex to match string containing two names in any order
(9 answers)
Closed 2 years ago.
Given a sentence such as:
Scheme is such a bizarre programming language.
Any sentence that contains both "is" and "language" should return true. I found that | means OR, but I couldn't find any symbol that means AND.
Thanks,
You can use the lookahead idiom:
(?=expr)
For example,
(?=.*word1)(?=.*word2)
For more details, please refer to this thread.
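A minimal sketch of the lookahead idiom using Python's `re` module. The question names no language, so Python is an assumption here, though the lookahead syntax is the same in most Perl-style engines. The word1/word2 placeholders are filled with the two words from the question; the `^` anchor, the `\b` boundaries and the `has_both` helper are my additions:

```python
import re

# Lookahead "AND": each (?=...) checks its condition from the start of the
# string without consuming anything, so both conditions must hold.
# \b (my addition) stops "is" inside a word such as "This" from counting.
both_words = re.compile(r"^(?=.*\bis\b)(?=.*\blanguage\b)")

def has_both(text):
    """Hypothetical helper: True if both words occur, in any order."""
    return both_words.search(text) is not None

print(has_both("Scheme is such a bizarre programming language."))  # True
print(has_both("This language is bizarre."))                        # True
print(has_both("This is a bizarre dialect."))                       # False
```

Each extra required word is just one more `(?=.*\bword\b)` group, which is why this idiom scales better than listing every ordering with `|`.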
Try the following regex:
\bis\b.*\blanguage\b
This one matches only if the two words appear in exactly that order. \b (word boundary) ensures that the words match only as standalone words.
Kinda ugly, but it should work (regardless of how 'is' and 'language' are ordered):
(.*is.*language.*|.*language.*is.*)
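A quick check of this pattern with Python's `re` (an assumption; no language was given). Because it has no word boundaries, it also accepts "is" and "language" as parts of longer words. The second input sentence is hypothetical:

```python
import re

pattern = re.compile(r"(.*is.*language.*|.*language.*is.*)")

intended = "Scheme is such a bizarre programming language."
substring_only = "This has many languages."  # hypothetical: no standalone "is"

print(bool(pattern.search(intended)))        # True
print(bool(pattern.search(substring_only)))  # True: "is" is found inside "This"
```

Adding \b around each word, as in the other answers, rules out the second match.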
In C# (I know you didn't ask about C#, but it illustrates how this can be done much more quickly):
string s = "Scheme is such a bizarre programming language.";
if ((s.Contains(" is") || s.Contains("is ")) &&
(s.Contains(" language") || s.Contains("language ")))
{
// found match if you got here
}
Regexes can be slow and hard for someone reading your code to parse. Simple string matches are generally quicker.
EDIT: This doesn't care about the order of the words, and it works for simple whitespace only.
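A rough Python analogue of the same no-regex idea, assuming Python purely for illustration. `substring_check` mirrors the C# `Contains()` tests; `contains_words` is a hypothetical helper that swaps in a whole-word check (my substitution, not what the answer does) to show where the plain substring test gives false positives. The second sentence is a made-up input:

```python
import string

def substring_check(s):
    # Mirrors the C# answer's Contains() tests.
    return (" is" in s or "is " in s) and (" language" in s or "language " in s)

def contains_words(text, *words):
    """Hypothetical helper: True if every word occurs as a whole word in text.

    Splits on whitespace and strips surrounding ASCII punctuation, so, like
    the C# version, it only handles simple sentences.
    """
    tokens = {tok.strip(string.punctuation) for tok in text.split()}
    return all(word in tokens for word in words)

question = "Scheme is such a bizarre programming language."
tricky = "This island has languages."  # hypothetical input

print(substring_check(question), contains_words(question, "is", "language"))  # True True
print(substring_check(tricky), contains_words(tricky, "is", "language"))      # True False
```

The substring version accepts the second sentence because " island" contains " is" and " languages" contains " language".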
Try this one if you don't care about the order of the words in the sentence:
\bis\b.*\blanguage\b|\blanguage\b.*\bis\b
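A small comparison, in Python's `re` (an assumption; the syntax is the same in most Perl-style engines), of the ordered pattern from the earlier answer and this alternation. The second sentence is a hypothetical input with the words reversed:

```python
import re

ordered = re.compile(r"\bis\b.*\blanguage\b")
any_order = re.compile(r"\bis\b.*\blanguage\b|\blanguage\b.*\bis\b")

forward = "Scheme is such a bizarre programming language."
backward = "A language that is bizarre."  # hypothetical input, reversed order

print(bool(ordered.search(forward)), bool(ordered.search(backward)))      # True False
print(bool(any_order.search(forward)), bool(any_order.search(backward)))  # True True
```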
This question already has answers here:
Apply Perl RegExp to Remove Parenthesis and Text at End of String
(1 answer)
Regex for Comma delimited list
(12 answers)
Closed 2 years ago.
I have a bunch of strings such as:
Super Mario Bros. 8 (En,Fr,De,Es,It)
Donald Duck in Whacky Land (En,Fr,De,Es,Sv)
Toadstool Adventures 3D (En)
Chinaland (En,De)
A title which doesn't have any such thing
...
That is, a title of a product followed by (sometimes) a list of one or more language codes in parentheses.
I'm really struggling to come up with a (PCRE) regexp that removes these safely, that is, one that is unlikely to touch the titles.
I know that ([A-Z]{1}[a-z]{1}) must be involved somewhere, to match a single language code such as "It" or "De", but how to handle any number of such codes in a row, separated by commas (or with no comma if there's just one), is beyond my regular expression skills.
I really wish that they had used some kind of unambiguous separator between the title part and the "metadata" part of the filenames... Then I wouldn't need to do all this manual trial-and-error removal. But they didn't.
Something like this would do it:
\([A-Z][a-z](?:,[A-Z][a-z])*\)$
https://regex101.com/r/xxNQ8h/1
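A sketch of applying this pattern to the titles from the question with Python's `re.sub`. The question asks for PCRE; Python is an assumption, and this particular pattern behaves the same in both. The leading `\s*` is my addition so the space before the parenthesis is removed too:

```python
import re

# The answer's pattern, plus a leading \s* (my addition) to drop the space.
LANG_SUFFIX = re.compile(r"\s*\([A-Z][a-z](?:,[A-Z][a-z])*\)$")

titles = [
    "Super Mario Bros. 8 (En,Fr,De,Es,It)",
    "Donald Duck in Whacky Land (En,Fr,De,Es,Sv)",
    "Toadstool Adventures 3D (En)",
    "Chinaland (En,De)",
    "A title which doesn't have any such thing",
]

cleaned = [LANG_SUFFIX.sub("", title) for title in titles]
for title in cleaned:
    print(title)
```

The `$` anchor means only a code list at the very end is removed, and titles without one are left untouched.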
Try it like this:
\(([A-Z][a-z],?)+\).*$
Online Demo
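A comparison of this pattern with the stricter one from the previous answer, using Python's `re` (an assumption). The inputs other than "Chinaland (En,De)" are hypothetical, chosen to show where the two differ: the optional comma accepts "(EnDe)" and "(En,)", and the trailing `.*$` also accepts (and, in a substitution, would remove) text after the parenthesis:

```python
import re

strict = re.compile(r"\([A-Z][a-z](?:,[A-Z][a-z])*\)$")
loose = re.compile(r"\(([A-Z][a-z],?)+\).*$")

# Hypothetical inputs, except the first, which is from the question.
samples = [
    "Chinaland (En,De)",
    "Chinaland (EnDe)",
    "Chinaland (En,)",
    "Chinaland (En) Deluxe",
]

results = {s: (bool(strict.search(s)), bool(loose.search(s))) for s in samples}
for s, (is_strict, is_loose) in results.items():
    print(f"{s!r}: strict={is_strict}, loose={is_loose}")
```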
This question already has answers here:
Regular Expressions: Is there an AND operator?
(14 answers)
Closed 6 years ago.
I'm not very good with regular expressions, so I've come here for some assistance :). I am trying to combine regular expressions with something like AND. For example, if we have a text file with:
abc1-xyz
abc1-ertxyz
abc1xyz
postxyz
abc1
I would like to match everything that starts with "abc1" AND also contains the letters "xyz" somewhere.
I know that I can start with:
/^abc1/
but I am not sure how to combine so it can also match to contain "xyz".
Thank you for your assistance in advance.
You should tell us which language you are coding in; regex engines are not all the same.
There is another ambiguous point: do you need your string to CONTAIN xyz or to END WITH it?
Assuming you are coding in JavaScript:
If you want it to contain xyz, try:
/^abc1.*xyz/
If you want it to end with xyz, try:
/^abc1.*xyz$/
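Both patterns applied to the sample lines, sketched in Python's `re` rather than JavaScript (an assumption for illustration; these constructs behave the same in both). The last line, "abc1-xyz-old", is a hypothetical extra input added to show where "contains" and "ends with" differ:

```python
import re

lines = ["abc1-xyz", "abc1-ertxyz", "abc1xyz", "postxyz", "abc1",
         "abc1-xyz-old"]  # last entry is hypothetical

contains = [line for line in lines if re.search(r"^abc1.*xyz", line)]
ends_with = [line for line in lines if re.search(r"^abc1.*xyz$", line)]

print(contains)   # ['abc1-xyz', 'abc1-ertxyz', 'abc1xyz', 'abc1-xyz-old']
print(ends_with)  # ['abc1-xyz', 'abc1-ertxyz', 'abc1xyz']
```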
This question already has answers here:
Can you make just part of a regex case-insensitive?
(5 answers)
Closed 3 years ago.
Okay, this might not be tricky at all for some, but at the moment it's really messing with my head.
First of all, I don't know which engine I am dealing with, but it doesn't seem to recognize uppercase.
For example, I have this string:
Circuit Ref
Service Type
A End Address
Z End Address
52GD J32SD41 O2AE EVC001
Evolve Internet
I am only trying to extract the string "52GD J32SD41 O2AE EVC001". I have already tried quite a few combinations, such as:
[0-9A-Z]{4}\s[0-9A-Z]+\s[0-9A-Z]+\s[0-9A-Z]+
[A-Z0-9]{4}\s\W+\s\W+\s\W+
[A-Z0-9]{4}\s[A-Z0-9\s]*[A-Z0-9\s]*[A-Z0-9\s]*
Nothing seems to work. I want to keep the expression fairly flexible, as the order of the letters and digits can change, but the pattern is mostly the same. Any nudge in the right direction will be greatly appreciated.
Thanks
This is a wild guess, but please try the following:
add (?-i) in front of the regex (related question, regular-expressions.info, net page about regex)
enclose the regex in (?-i: ... )
enclose the regex in (?I: ... )
BTW, regarding the second pattern you tried: [A-Z0-9]{4}\s\W+\s\W+\s\W+
It seems that you tried to use \W as "uppercase word character", but that is not what it means.
\W matches anything that \w does not match, that is, any non-word character.
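A sketch that reproduces the symptom and the scoped fix in Python's `re` (an assumption: the asker's engine is unknown, and Python stands in for an engine that is case-insensitive by default). It uses the scoped `(?-i: ... )` form, available from Python 3.6; as far as I know Python's `re` does not accept the bare `(?-i)` switch. The pattern is the asker's first attempt:

```python
import re

text = """Circuit Ref
Service Type
A End Address
Z End Address
52GD J32SD41 O2AE EVC001
Evolve Internet"""

pattern = r"[0-9A-Z]{4}\s[0-9A-Z]+\s[0-9A-Z]+\s[0-9A-Z]+"

# Simulate a case-insensitive engine: [A-Z] now also matches lowercase letters.
insensitive = re.search(pattern, text, re.IGNORECASE)
# Scoped (?-i:...) switches case-insensitivity back off inside the group.
scoped = re.search("(?-i:" + pattern + ")", text, re.IGNORECASE)

print(repr(insensitive.group()))  # 'cuit Ref\nService Type'
print(scoped.group())             # 52GD J32SD41 O2AE EVC001

# \W is the complement of \w, so it cannot match "J32SD41" at all.
print(re.fullmatch(r"\W+", "J32SD41"))  # None
```

Note that \s also matches newlines, which is why the case-insensitive run spans two lines.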
This question already has answers here:
Regex: Determine if two regular expressions could match for the same input?
(5 answers)
Closed 8 years ago.
Given two regular expressions, is it possible to detect whether there is any possible string that matches them both?
For example, given regexes A and ., I can see that string "A" matches them both. That's a simple case.
My question is about the broader case: given any two valid regexes, would it be possible to definitively say whether there is any possible string that would match both regexes? Assume that there is no sample set of input strings to test. All I have are the regexes. I don't necessarily need to produce matching strings; I just need to determine that there are possible strings that match both.
I'll accept discussions of any of the common regex flavors: .NET, Java, Perl, sed, grep, etc.
Basically, you want to test whether the intersection of two RegExps is non-empty. Since intersection, like complement, is a potentially expensive operation (complement requires determinizing the NFA, and intersection builds a product of the two automata), it is not implemented in many RegExp implementations. One exception I know of is the BRICS Automaton Library, which allows enabling the intersection operator &.
To test the property in question, you could use the BRICS (Java) library like this. Note that BRICS treats spaces in a regex as literal characters, so the operands of & are written without surrounding spaces:
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

RegExp re = new RegExp("(.)&(a)", RegExp.INTERSECTION); // parse RegExp with & enabled
Automaton a = re.toAutomaton(); // convert RegExp to automaton
if (a.isEmpty()) { // test whether the intersection is empty
    System.out.println("Intersection is empty!");
} else {
    // print the shortest accepted string
    System.out.println("Intersection is non-empty, example: " + a.getShortestExample(true));
}
Yes, it's possible in theory.
But it basically comes down to trying all possible options and seeing which ones match both regexes. It's more of a theoretical computer science question, though: with modern-day regular expressions in programming languages, this would be a problem in NP (http://en.wikipedia.org/wiki/NP_(complexity)).
If you're talking about the formal language theory definition of a regular language, then I would say it should be possible by converting both regexes to DFAs and walking through both simultaneously to see what would match.
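Walking through both DFAs simultaneously is the standard product construction: explore pairs of states reachable from the pair of start states; the intersection is non-empty exactly when some reachable pair is accepting in both automata. A minimal Python sketch, assuming both regexes have already been converted to DFAs by hand. The dict-based DFA format, the two-letter alphabet and the function name are all hypothetical, for illustration only:

```python
from collections import deque

# Hypothetical DFA format: (start_state, accepting_states, transitions),
# where transitions maps (state, symbol) -> next_state and a missing
# entry means the automaton rejects.

def intersection_witness(dfa1, dfa2, alphabet):
    """Return a shortest string accepted by both DFAs, or None if none exists."""
    start1, accept1, delta1 = dfa1
    start2, accept2, delta2 = dfa2
    start = (start1, start2)
    queue = deque([(start, "")])
    seen = {start}
    while queue:  # breadth-first, so the first witness found is a shortest one
        (s1, s2), prefix = queue.popleft()
        if s1 in accept1 and s2 in accept2:
            return prefix
        for ch in alphabet:
            n1 = delta1.get((s1, ch))
            n2 = delta2.get((s2, ch))
            if n1 is None or n2 is None:
                continue  # one automaton rejects, so this pair is dead
            pair = (n1, n2)
            if pair not in seen:
                seen.add(pair)
                queue.append((pair, prefix + ch))
    return None

ALPHABET = "AB"
dfa_a = (0, {1}, {(0, "A"): 1})                  # the regex A
dfa_dot = (0, {1}, {(0, "A"): 1, (0, "B"): 1})   # the regex . over {A, B}
dfa_b = (0, {1}, {(0, "B"): 1})                  # the regex B

print(intersection_witness(dfa_a, dfa_dot, ALPHABET))  # A
print(intersection_witness(dfa_a, dfa_b, ALPHABET))    # None
```

The search visits at most one entry per pair of states, so its cost is bounded by the product of the two DFAs' sizes times the alphabet size.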
This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I have this RegEx:
('.+')
It has to match character literals as in C. For example, given 'a' b 'a', it should match each a together with the quotes around it.
However, it also matches the b (which it should not), probably because the b is, strictly speaking, also between quotes.
Here is a screenshot of how it goes wrong (I use this for syntax highlighting):
I'm fairly new to regular expressions. How can I tell the regex not to match this?
It is being greedy: it matches from the first apostrophe to the last one, including everything in between.
This version only allows non-apostrophe characters between the quotes:
('[^']+')
Another alternative is to try non-greedy matches.
('.+?')
Have you tried a non-greedy version, e.g. ('.+?')?
There are usually two modes of matching (or two sets of quantifiers): maximal (greedy) and minimal (non-greedy). The first produces the longest possible match, the latter the shortest. You can read about it (albeit in a Perl context) in the Perl Cookbook (Section 6.15).
Try:
('[^']+')
The ^ means "match every character except the ones in the square brackets". This way, it won't match all of 'a' b 'a' as one string, because there's a ' in between; instead, it gives both instances of 'a'.
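A quick check of the greedy, non-greedy and negated-class patterns against the example input, using Python's `re`. Python is an assumption here, since the highlighter's engine isn't named, but these constructs behave the same in most Perl-style engines:

```python
import re

source = "'a' b 'a'"

greedy = re.findall(r"('.+')", source)
lazy = re.findall(r"('.+?')", source)
negated = re.findall(r"('[^']+')", source)

print(greedy)   # ["'a' b 'a'"]
print(lazy)     # ["'a'", "'a'"]
print(negated)  # ["'a'", "'a'"]
```

On this input the lazy and negated-class versions agree, but they are not equivalent in general: .+? can still consume a ' when it has to. For the input ''x' the lazy pattern matches all four characters, while the negated class matches only 'x'.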
You need to escape the quotes:
\'[^\']+\'
Edit: Hmm, well, I suppose this answer depends on what language/system you're using.