Regex AND'ing - regex

I have two strings, and I want to match everything that doesn't equal them. The first string can be followed by any number of characters. I tried something like this, negating two ORs and then negating that result:
?!(?!^.*[^Factory]$|?![^AppName])
Any ideas?

Try this regular expression:
(?!.*Factory$|.*AppName)^.*
This matches every string that does not end with Factory and does not contain AppName.
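A minimal sketch of how that pattern behaves, using Python's re module. The helper name is_allowed and the sample strings are made up for illustration; they are not from the question.

```python
import re

# Pattern from the answer above: reject strings that end with "Factory"
# or contain "AppName"; accept everything else.
PATTERN = re.compile(r"(?!.*Factory$|.*AppName)^.*")

def is_allowed(text: str) -> bool:
    """Return True when the pattern matches at the start of the string."""
    return PATTERN.match(text) is not None

# Hypothetical sample inputs.
samples = ["WidgetFactory", "MyAppNameConfig", "FactoryWorker", "Widget"]
results = {s: is_allowed(s) for s in samples}
```

Note that "FactoryWorker" is accepted, because Factory is only rejected at the end of the string.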

dfa's answer is by far the best option. But if you can't use it for some reason, try:
^(?!.*Factory|AppName)
It's very difficult to determine from your question and your regex what you're trying to do; they seem to imply opposite behaviors. The regex I wrote will not match if Factory appears anywhere in the string, or AppName appears at the beginning of it.

What about:
if (!match("(Factory|AppName)")) {
    // your code
}

Would it work if you looked for the existence of those two strings and then negated the regex?
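The "match, then negate in code" idea could look roughly like this in Python. The name is_wanted is hypothetical, and the sketch assumes the two strings may appear anywhere in the input; add anchors if the question's rules are stricter.

```python
import re

# Match the unwanted strings with a plain alternation...
UNWANTED = re.compile(r"Factory|AppName")

def is_wanted(text: str) -> bool:
    """...and negate the result in code instead of inside the regex."""
    return UNWANTED.search(text) is None
```

This keeps the regex trivial and moves the negation into ordinary program logic, which is usually easier to read than nested negative lookaheads.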

Related

Regular expression dilemma

I've been trying for a few hours to write a pattern for a matching algorithm, and I can't find a solution to the following issue: given the example "my_name_is", I need to extract all words individually, as well as the whole expression. Note that the input may be a list of n examples, some of which can be matched and some of which cannot.
"my_name_is" => ["my", "name", "is", "my_name_is"]
How can I do this? What should the regexp look like? Looking forward to your answers, thank you!
Regular Expressions are patterns used to match a string of characters. We usually use them to validate a string of characters, or to find and replace a specific pattern within text.
Here, it seems the outcome you're looking for is an array of strings that have been split using an underscore. Regex isn't what you're looking for.
Implementation would change based on language, but consider the following code:
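function stringToArray($myStr)
{
    // explode() splits on a delimiter; str_split() would split by length instead
    $words = explode('_', $myStr);
    // append the original string after the individual words
    return array_merge($words, [$myStr]);
}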
Use re.findall with the following as your regex:
[^_]+
This matches every run of consecutive characters that doesn't contain an underscore.
As for the whole thing, you already have it, so there's no reason to run a regex over the whole string.
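Put together in Python, the findall approach might look like the sketch below. The function name words_and_whole is made up for illustration.

```python
import re

def words_and_whole(text: str) -> list[str]:
    """Return the underscore-separated words followed by the original string."""
    # Each match is a run of characters that contains no underscore.
    words = re.findall(r"[^_]+", text)
    return words + [text]

parts = words_and_whole("my_name_is")
# parts == ["my", "name", "is", "my_name_is"]
```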

Regex to match a number not in a list

I've been trying to create a regex to match a number that is not in a list, but haven't been able to figure it out. It might not be possible.
For example, given the list "1,3,13,17,21,30" I would like the regex to match "40" (which is not in the list), but not match "3" which is in the list.
Does anyone know how I might be able to achieve this?
If you know the only text that will be in the string to search for is the actual number only, simply anchor to the beginning and end of the string, and exclude the numbers from the list using negative lookahead:
(?!^1$|^3$|^13$|^17$|^21$|^30$)^\d+$
This regex can be reduced a little, but I think in this form it is more obvious that you're excluding matches from a list, which improves readability.
If the string has other text, however, you might try anchoring to a separator character (such as a space, or simply \b) instead.
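As a rough sketch, the anchored negative-lookahead pattern can also be generated from the list rather than typed by hand. The helper build_not_in_list_pattern is hypothetical, and the sketch assumes the input string holds nothing but the number.

```python
import re

def build_not_in_list_pattern(numbers):
    """Build a regex matching any digit string that is not one of `numbers`."""
    alternatives = "|".join(re.escape(str(n)) for n in numbers)
    # The lookahead rejects the string if it is exactly one of the listed numbers.
    return re.compile(rf"^(?!(?:{alternatives})$)\d+$")

pattern = build_not_in_list_pattern([1, 3, 13, 17, 21, 30])

in_list = pattern.match("3")       # None: 3 is in the list
not_in_list = pattern.match("40")  # match object: 40 is not in the list
```

This is equivalent in intent to the hand-written version above, with the anchors factored out of each alternative.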
It seems you're looking to use Negative Lookahead as follows:
\b(?!(?:1[37]?|21|30?)\b)\d+\b
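To see the word-boundary version at work on numbers embedded in longer text, here is a small Python sketch. The sample text is invented for illustration.

```python
import re

# Same pattern as above: a whole number that is not 1, 3, 13, 17, 21 or 30.
NOT_LISTED = re.compile(r"\b(?!(?:1[37]?|21|30?)\b)\d+\b")

# Hypothetical input mixing listed and unlisted numbers.
text = "ids: 1 3 13 17 21 30 40 131 2"
found = NOT_LISTED.findall(text)
# found == ["40", "131", "2"]
```

The \b inside the lookahead is what keeps 131 from being rejected just because it starts with 13.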
Just use two separate regexes in your language of choice, like:
/\d/ && ! /(1|3|13|17|21|30)/
...or something similar, probably you'll need anchors in each depending on the string you're matching against (which, btw, you should provide when asking questions like this on SO).
I don't know in what context you would use this functionality, but if you know the list contains only integers and the value you want to check is an integer too, why not use a set?
In Python (assuming your list is stored as a comma-separated string in l):
allowed = {int(n) for n in l.split(',')}
if int(some_num_txt) in allowed:
    pass  # the number is in the list
else:
    pass  # the number is not in the list
It's way faster and more intuitive than a regex.
edited: should work now
\b(?!(1\b|3\b|13\b|17\b|21\b|30\b))\d+\b

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
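A quick Python sketch of both expressions applied to the example string from the question. The variable names are illustrative only.

```python
import re

url = "topics/install.xml#id_install"

# First expression: everything before the first '#'.
before = re.match(r"^([^#]*)", url).group(1)

# Second expression: everything after the first '#', if there is one.
after_match = re.search(r"#(.*)$", url)
after = after_match.group(1) if after_match else None

# before == "topics/install.xml", after == "id_install"
```

When the string has no '#', the first expression returns the whole string and the second finds nothing, so after is None.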
[a-zA-Z0-9]*[\#]
If your string contains any other special characters you need to add them into the first square bracket escaped.
I don't use C#, but I will assume that its regex syntax is PCRE-like... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers return the entire matched sequence as group(0), the first paren-matched group as group(1), and so forth.
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-collecting group to test the second half, and if it finds it, returns everything after the pound.
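The optional-group variant can be tried out in Python as in the sketch below. The function name split_fragment is hypothetical, and the sketch assumes single-line input.

```python
import re

# Group 1: text before the first '#'. Group 2: text after it, if present.
SPLIT_ON_HASH = re.compile(r"([^#]*)(?:#(.*))?")

def split_fragment(text):
    """Return (before, after); `after` is None when there is no '#'."""
    m = SPLIT_ON_HASH.match(text)  # always matches, possibly the empty string
    return m.group(1), m.group(2)

with_hash = split_fragment("topics/install.xml#id_install")
without_hash = split_fragment("topics/install.xml")
```

Here with_hash is ("topics/install.xml", "id_install") and without_hash is ("topics/install.xml", None).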
Honestly though, for your situation, it is probably easier to use the Split method provided by String.
first:
/[^\#]*(?=\#)/ edit: is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
string[] split = exampleString.Split('#');
string firstString = split[0];
string secondString = split[1];

Single Regex for filtering roman numerals from the text files

I am stuck on a problem where only one pass of a regular expression is allowed (some old hard-coded code). I need a regex for Roman numerals.
I have tried the standard one, i.e. ^(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$, but the problem is that it also matches the empty string ('').
Is there any way around this problem?
To require that at least one character must be present, you can use a lookahead (?=.) at the start of your regular expression:
^(?=.)(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$
Another solution is to separately test that your string is not the empty string.
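A small Python sketch of the lookahead fix. One assumption to flag: recent Python versions reject an inline (?i) that is not at the very start of the pattern, so the sketch passes re.IGNORECASE as a flag instead. The name is_roman is made up.

```python
import re

# Same structure as the pattern above; (?=.) requires at least one character.
ROMAN = re.compile(
    r"^(?=.)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$",
    re.IGNORECASE,  # replaces the inline (?i) used in the original pattern
)

def is_roman(text: str) -> bool:
    """Return True for a non-empty, well-formed Roman numeral."""
    return ROMAN.match(text) is not None
```

With this, is_roman("XIV") and is_roman("xiv") are true, while is_roman("") and is_roman("IIII") are false.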
I like this one:
\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b
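For pulling numerals out of running text, the word-boundary pattern can be used with finditer, as in this rough Python sketch. The sample sentence is invented. The sketch also drops empty matches, which this pattern can produce on a word made only of Roman letters that is not a valid numeral (for example DIM).

```python
import re

ROMAN_IN_TEXT = re.compile(
    r"\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b"
)

# Hypothetical sample text.
text = "Chapter XIV follows chapter IX, printed in MMXXIV."

# group(0) is the whole match; skip zero-length matches.
numerals = [m.group(0) for m in ROMAN_IN_TEXT.finditer(text) if m.group(0)]
# numerals == ["XIV", "IX", "MMXXIV"]
```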

Regular Expression to List accepted words

I need a regular expression that matches a list of accepted version numbers. For example, say I wanted to accept "V1.00" and "V1.01". I've tried "(V1.00)|(V1.01)", which almost works, but if I input "V1.002" (which is likely, given the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
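In Python, the same exact-match requirement can be sketched with re.fullmatch, which behaves as if the pattern were anchored at both ends. The name is_accepted_version is hypothetical.

```python
import re

# Escaped dot, then "0" followed by either "0" or "1".
ACCEPTED = re.compile(r"V1\.0[01]")

def is_accepted_version(text: str) -> bool:
    """Return True only when the whole string is an accepted version."""
    return ACCEPTED.fullmatch(text) is not None
```

Here is_accepted_version("V1.00") is true, while "V1.002" and "V1x00" are rejected: the first because of the extra trailing digit, the second because the dot is escaped.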
Do this:
^(V1\.00|V1\.01)$
(The . needs to be escaped; ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text.)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.