Can not find specific regular expression - regex

I cannot find a regular expression that matches what I'm looking for.
I would like a regular expression that matches 15 consecutive characters (excluding spaces, exclamation points, commas, and periods). So far the expression is [^!\?.\s!,]{20}. But I do not want a match if 10 of those 15 characters are identical.
So match with "jqshjsdfhjsdlfdjqlsmskjm" but not with "thaaaaaaaaaaaaaaank"
thank you

You can achieve something close to that: (([^!\?.\s!,])(?!\1)){15}.
This solution, however, has a drawback: it fails when it finds patterns like 131 or bab. If the solution works for you despite this drawback, then good. If not, this is as far as regex goes, and you'll have to implement that logic programmatically.
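A minimal sketch of the programmatic route in Python, assuming "10 are identical" means that a single character occurs 10 or more times within a 15-character window (not necessarily consecutively). The function name has_varied_window and its parameters are made up for illustration.

```python
import re
from collections import Counter

# Runs of at least 15 characters that are not whitespace, '!', '?', '.' or ','.
ALLOWED_RUN = re.compile(r"[^!?.,\s]{15,}")

def has_varied_window(text, window=15, max_repeat=9):
    """Return True if some window of `window` allowed characters
    contains no single character more than `max_repeat` times."""
    for run in ALLOWED_RUN.finditer(text):
        s = run.group()
        # Slide a fixed-size window over the run and count characters.
        for i in range(len(s) - window + 1):
            counts = Counter(s[i:i + window])
            if max(counts.values()) <= max_repeat:
                return True
    return False

print(has_varied_window("jqshjsdfhjsdlfdjqlsmskjm"))
print(has_varied_window("th" + "a" * 15 + "nk"))
```

The regex only finds candidate runs; the counting, which a regex handles poorly, is done in ordinary code.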

Related

Regular Expression: Filenames

Extremely new to this and have been trying to figure this out on my own, but no luck.
It seems simple. I have files that are named either starting with L or P, followed by 6 numbers. I need to have 2 expressions, one that only reads files starting with L and one that only reads files starting with P.
I have tried using derivatives of ^[K-M], ^\L.*
No luck so far. Hoping someone can offer a suggestion.
Thanks for your time!
Try ^P\d{6} and ^L\d{6}. The ^ says start at the beginning of the string. The \d{6} matches 6 digits.
If at some point you wanted to match both in one go, you could do ^[LP]\d{6}. The [LP] says match one of L or P.
If the above doesn't work, you might be working with a more limited regex implementation. You could try ^P\d\d\d\d\d\d and ^L\d\d\d\d\d\d to get the same results.
If that doesn't work, you could try ^P[0-9][0-9][0-9][0-9][0-9][0-9] and ^L[0-9][0-9][0-9][0-9][0-9][0-9] which should work on all regex implementations. The \d is just shorthand for [0-9] anyway.
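For reference, here is a small sketch of those patterns in Python. The file names are invented for illustration, and other regex implementations may behave slightly differently.

```python
import re

# Hypothetical file names, invented for this example.
names = ["L123456.txt", "P654321.txt", "K111111.txt", "L12345.txt"]

# ^ anchors at the start of the string; \d{6} requires exactly six digits next.
l_files = [n for n in names if re.search(r"^L\d{6}", n)]
p_files = [n for n in names if re.search(r"^P\d{6}", n)]
both = [n for n in names if re.search(r"^[LP]\d{6}", n)]

print(l_files)
print(p_files)
print(both)
```

Note that "L12345.txt" is rejected because only five digits follow the L.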
Seth's answer is correct.
If it doesn't matter what comes after the 'P' or 'L' you could also just use ^P and ^L.
In the future, you should try testing how regexes match your input strings using a regex tester such as RegexPal or Regular Expression Editor.

Regular Expression for input validation

I am trying to learn regular expressions and was hoping someone could help me out. I would appreciate it if someone could help me come up with a regular expression to validate that an input is of the form
Graph: XY5, YZ4, ST7
Each part, such as XY5, represents an edge in the graph, and the number represents the edge weight. There can be any number of such edges.
This is what I have till now. It's probably not correct
"^Graph:\\s{1}[A-ZA-Z\\d,\\s]+"
This might be what you're looking for:
/^Graph: (?:[A-Z]{2}\d(?:$|, ?))+/
See it here in action: http://regexr.com?309av
Here's an explanation of what the regex does (screenshot from RegexBuddy, which is probably the best tool for you if you're trying to learn Regular Expressions):
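A rough breakdown of that pattern, written as a commented Python regex (re.VERBOSE lets the pattern carry inline comments). The sample inputs are assumptions made for illustration.

```python
import re

GRAPH = re.compile(r"""
    ^Graph:[ ]        # the literal prefix, followed by one space
    (?:               # one edge, repeated one or more times
        [A-Z]{2}      #   two uppercase letters naming the endpoints
        \d            #   a single digit for the weight
        (?:$|,[ ]?)   #   then end of input, or a comma and an optional space
    )+
""", re.VERBOSE)

print(bool(GRAPH.search("Graph: XY5, YZ4, ST7")))
print(bool(GRAPH.search("Graph: XY5 YZ4")))
print(bool(GRAPH.search("Graph: XY5, junk")))
```

Because the pattern has no anchor at the very end, an input such as "Graph: XY5, junk" still matches; appending $ after the repeated group would reject it.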
Try this
/^Graph:(\s+[A-Z][A-Z]\d+,?)+$/
You should explain your input format a little better. This might do it, from the single example I have and what you said. It does not allow a graph to be empty, which may or may not be part of your requirements.
"^Graph:(\s\w{2}\d+,?)+"
to explain:
^Graph: will cover the start of the line
(\s\w{2}\d+,?)+
\s is a space
\w{2} matches exactly 2 word characters, i.e. letters, digits, or underscores (hint: you could make this better!)
\d+ matches 1 or more digits, since I am assuming an edge can have a two-digit weight (such as 10)
,? matches an optional comma (hint: you could make this better as well, since it does not require a comma between entries; maybe use an alternation with the end-of-string anchor!)
I purposely left some room for improvement, because if you think of some of it on your own, you will accomplish your goal of becoming better with regular expressions.
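One possible way to act on those hints, sketched in Python: restrict the node names to uppercase letters and require a comma between edges. The exact separator (a comma followed by one space) is an assumption based on the single example in the question.

```python
import re

# One edge: two uppercase letters followed by a weight of one or more digits.
EDGE = r"[A-Z]{2}\d+"

# A first edge, then any number of ", edge" repetitions, anchored at both ends.
GRAPH = re.compile(rf"^Graph: {EDGE}(?:, {EDGE})*$")

for candidate in ["Graph: XY5, YZ4, ST7", "Graph: XY10, AB3",
                  "Graph: XY5 YZ4", "Graph: XY5,", "Graph:"]:
    print(candidate, bool(GRAPH.search(candidate)))
```

Writing the list as "first item, then separator plus item, repeated" avoids both the missing-comma and the trailing-comma cases.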

Inverting a regex as a whole

OK, I read through this thread and it hasn't helped me so far.
I have a regex in TextPad (not sure of its engine) like so:
[[:digit:]]+ \= [[:digit:]]+ \+ [[:digit:]]+
that finds a string such as:
1783635 = 1780296 + 3339
and I want to find everything else. I tried encasing the whole expression with [^ expression ] as the TextPad manual says to do, with no luck. I also tried [^][ expression ], ^( expression ), and [^]( expression ), with no luck.
From the thread above, I tried (?! expression ), again, with no luck.
Thoughts?
It is not really possible to match "the opposite of" with regex. Regular expressions must match to succeed; they cannot fail to match and still succeed.
Depending on your exact situation (and TextPad's regex capabilities), there might be a way around this limitation.
More detail is necessary to say that for sure, though. Please provide a real-world text sample and describe what you want to do with it.
The best way I have found to do this is not with JUST regular expressions, but to use additional functionality from TextPad (bookmarks).
In this case, I needed to identify all lines that did NOT start with PHVS. So I did the following:
Perform a "Match All" search with the regular expression "^PHVS". This marks every line that starts with PHVS.
Go to Edit -> Invert Bookmarks. This marks every line that does NOT start with PHVS.
Create a macro that presses F2 (to go to the next bookmark) and fixes the line as needed.
Run the macro to the end of the file.
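Outside TextPad, the same "everything that does not match" idea can be sketched in a few lines of Python: test each line against the pattern and keep the ones that fail. This is only an illustration, not a TextPad feature. Python's re module has no POSIX classes such as [[:digit:]], so \d is swapped in, and the sample text is made up.

```python
import re

# Equivalent of "[[:digit:]]+ = [[:digit:]]+ + [[:digit:]]+", using \d.
EQUATION = re.compile(r"\d+ = \d+ \+ \d+")

def lines_without_match(text):
    """Return the lines that do NOT contain the pattern."""
    return [line for line in text.splitlines() if not EQUATION.search(line)]

sample = "1783635 = 1780296 + 3339\nsome other line\ntotal: 12\n"
print(lines_without_match(sample))
```

The inversion happens in the surrounding code (the `not`), so the regex itself stays a plain positive match.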

Match a line without number followed by "."

Update: I would like to match a line that starts with (" followed by a number and then anything except ".". For example
("10 Advanced topics 365" "#382")
is a match, while
("10.1 Approximation Algorithms 365" "#382")
is not a match.
My regex is
^\(\"\d+(?!\.).*?$
but it will match both examples above including the second one. So what am I missing here?
Thanks and regards!
While it's possible to write a RE that will match such a thing (see manji's answer) I hate such things; they're very hard to comprehend later on. I find it's easier to write an RE to match the case that you don't want, and then make the rest of the logic of the program conditional on that RE not matching. This is virtually always trivial to do.
EDIT:
Sometimes you can do better. If we're seeking to distinguish between the types of lines you describe, where good lines don't have a period after the first digit and there's always some text at that point:
("10 Advanced topics 365" "#382")
("10.1 Approximation Algorithms 365" "#382")
Then a regular expression of this form will suffice:
^\("\d+[^.\d].*
Potentially you might need more to properly match the remainder of the line more precisely (e.g., detecting whether it ends with the right character sequence) but that's separate.
Via update:
^\("\d[^.]*$
Try this pattern:
(?m)^(?!.*?\d\.).*$
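As for why the pattern in the question matches both lines: \d+ can backtrack. On the second line it gives up the "0", matches only "1", and the lookahead then sees "0" rather than ".", so it succeeds. A small Python sketch of this, along with one possible fix that also forbids a digit in the lookahead (the variable names are arbitrary):

```python
import re

good = '("10 Advanced topics 365" "#382")'
bad = '("10.1 Approximation Algorithms 365" "#382")'

# Pattern from the question: \d+ may shrink to "1" so that (?!\.) sees "0".
original = re.compile(r'^\("\d+(?!\.).*?$')

# Forbidding a digit as well forces \d+ to cover the whole number first.
fixed = re.compile(r'^\("\d+(?![.\d]).*$')

print(bool(original.search(good)), bool(original.search(bad)))
print(bool(fixed.search(good)), bool(fixed.search(bad)))
```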

Is this the right regular expression?

I have the following regular expression, but it's not working properly: it accepts only three characters after the # sign, but I want it to accept any length.
"/^[a-zA-Z0-9_\.\-]+\#([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}$/"
this#thi This is validated
this#this It is not validating this expression
Can you please tell me what's the problem with the expression...
Thanks
If you want your regex to match "any number length" then why are you using {2,4}?
I think a better example of the strings you're trying to match might give others a better idea of what you want, because based on your regex it is a bit confusing what you're looking for.
Try this:
^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,4}$
The main problem is that you didn't escape the dot: \.. In a regular expression, an unescaped dot matches any character (except, by default, a newline), making your regex quite liberal.
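A quick Python sketch of both points raised in the answers: the effect of escaping the dot, and the effect of the {2,4} limit on the last part. The test strings are invented for illustration, and {2,} is just one way to lift the upper bound.

```python
import re

# Dot escaped: a literal "." is required before the final part.
escaped = re.compile(r"^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,4}$")

# Dot not escaped: "." stands for any character, so far more input passes.
unescaped = re.compile(r"^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+.)+[a-zA-Z0-9]{2,4}$")

# Same as `escaped`, but with no upper limit on the final part.
longer = re.compile(r"^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,}$")

print(bool(escaped.search("this#example.com")))
print(bool(escaped.search("this#this")), bool(unescaped.search("this#this")))
print(bool(escaped.search("this#example.museum")),
      bool(longer.search("this#example.museum")))
```

Inside a character class such as [a-zA-Z0-9_.-] the dot is already literal, which is why it needs no backslash there.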