Match a specific string with several constraints using Regex - regex

The requirements for the regex I am looking for have changed, and it is now too complex for me to solve on my own.
I need to search for a specific string with the following requirements:
1. String starts with "fu: and ends with "
2. In between those start and end requirements there can be any other string which has the following requirements:
2.1. Less than 50 characters
2.2. Only lower case
2.3. No trailing spaces
2.4. No space between "fu: and the other string.
The regex should match the cases where requirement no. 1 holds but at least one of requirements 2.1/2.2/2.3/2.4 does not.
At the moment I have the following regex: "fu:([^"]*?[A-Z][^"]*?)",
which finds strings which start with "fu: and end with " with any upper case letter in between, like this one:
"fu:this String is wrong cause the s from string is upper case"
I hope it all makes sense. I tried to get into regex, but this problem seems too complex for someone who is not working with regex every day.
[Edit]
Apparently I was not clear enough. I want to have matches which are "wrong".
I am looking for the complement of this regex: "fu:(?:[a-z][a-z ]{0,47}[a-z]|[a-z]{0,2})"
some examples:
Match: "fu: this is a match"
Match: "fu:This is a match"
Match: "fu:this is a match "
NO Match: "fu:this is no match"
Sorry, it's not easy to explain :)

Try the following:
"fu:([a-z](?:[a-z ]{0,48}[a-z])?)"
This will match any string that begins with "fu: and ends with a " where the text in between contains 1-50 characters, only lower-case letters and spaces, and neither begins nor ends with a space. (The question asks for fewer than 50 characters; change {0,48} to {0,47} to cap the length at 49.)
"fu:                # begins with "fu:
(                   # group to capture
  [a-z]             # starts with one lowercase letter
  (?:               # non-capturing sub-group
    [a-z ]{0,48}    # matches 0-48 lowercase letters or spaces
    [a-z]           # sub-group must end with a lowercase letter
  )?                # sub-group is optional
)
"                   # ends with "
EDIT: In the event that you need an empty-string to match too, i.e. the full string is "fu:", you can add another ? to the end of the matching-group in the regex:
"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?"
I've kept the two regexes separated (one that allows 1-50 characters in the string and one that allows 0-50) to show the minor difference.
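As a quick check, here is a minimal Python sketch that runs both patterns against the examples from the question. The names NON_EMPTY, MAYBE_EMPTY and is_valid are made up for illustration, and the question does not say which language is in use, so Python is only an assumption.

```python
import re

# Patterns copied from the answer: NON_EMPTY allows 1-50 characters,
# MAYBE_EMPTY additionally allows the empty string.
NON_EMPTY = re.compile(r'"fu:([a-z](?:[a-z ]{0,48}[a-z])?)"')
MAYBE_EMPTY = re.compile(r'"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?"')

def is_valid(text, allow_empty=False):
    """Return True when the whole text is a well-formed "fu:..." string."""
    pattern = MAYBE_EMPTY if allow_empty else NON_EMPTY
    return pattern.fullmatch(text) is not None

# Examples taken from the question, plus the empty case.
for sample in ['"fu:this is no match"', '"fu: this is a match"',
               '"fu:This is a match"', '"fu:this is a match "', '"fu:"']:
    print(sample, is_valid(sample), is_valid(sample, allow_empty=True))
```

Only the well-formed string is accepted by both patterns; the empty "fu:" is accepted only when allow_empty is set.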
EDIT #2: To match the inverse of the above, i.e. - to find all strings that do not match the required format, you can use:
^((?!"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?").)*$
This will explicitly match any line that does not match that pattern. This will consequently also match lines that do not contain "fu: - if that matters.
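A small Python sketch of that line-level inverse, assuming each line is tested on its own; the sample lines and the name INVERSE are invented for the example.

```python
import re

# Inverse pattern from above: the line matches only if no position in it
# starts a well-formed "fu:..." string.
INVERSE = re.compile(r'^((?!"fu:([a-z](?:[a-z ]{0,48}[a-z])?)?").)*$')

lines = [
    'x = "fu:this is fine"',   # contains a well-formed string
    'x = "fu:Not fine"',       # uppercase letter breaks the format
    'no fu here at all',       # no "fu: at all, still flagged
    '"fu:"',                   # empty content counts as well-formed here
]

flagged = [line for line in lines if INVERSE.match(line)]
print(flagged)
```

Note how the line without any "fu: is flagged too, which is the side effect mentioned above.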
The only way I can figure out to truly match the opposite of the above and still include the anchors of "fu: and " is to explicitly attempt to match the rules that fail:
"fu:([^a-z].*|[^"]{51,}|[a-z]([^"]*?[A-Z][^"]*?)+|[a-z ]{0,49}[ ])"
This regex will match any string that starts with a character other than a lowercase letter, any string that's longer than 50 characters, any string that contains an uppercase letter, or any string that has trailing whitespace. For each additional rule, you'll need to update the regex to match the opposite of what's needed.
My recommendation is, in whatever language you're using, to match all input strings that actually follow your requirements - and if there are no matches then that string must violate your rules.
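That recommendation could be sketched in Python roughly as follows. The loose CANDIDATE pattern, the find_violations helper and the sample text are assumptions for illustration, and the sketch assumes the quoted strings contain no embedded double quotes.

```python
import re

# Loose pattern: anything that looks like "fu:..." up to the next quote.
CANDIDATE = re.compile(r'"fu:[^"]*"')
# Strict pattern from the answer (the variant that also allows empty content).
VALID = re.compile(r'"fu:(?:[a-z](?:[a-z ]{0,48}[a-z])?)?"')

def find_violations(text):
    """Return every "fu:..." candidate that fails the strict pattern."""
    return [m.group(0) for m in CANDIDATE.finditer(text)
            if not VALID.fullmatch(m.group(0))]

sample = 'a "fu:ok text" b "fu: bad" c "fu:Bad" d "fu:bad " e "fu:"'
print(find_violations(sample))
```

Keeping the strict pattern positive means a new rule only has to be added in one place, instead of writing its negation by hand.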

"fu:([^A-Z" ](?:[^A-Z"]{0,48}[^A-Z" ])?)"
The above regex should match the specified requirements.

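That's probably what you need
"fu:([a-z](?:[a-z ]{0,48}[a-z])?)"
Note that {0,48} is the portable spelling; some regex flavors do not accept {,48} with the lower bound left out.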

Try this:
"fu:(?:[a-z][a-z ]{0,47}[a-z]|[a-z]?)"

Related

Regular expression to match a word that contains ONLY one colon

I am new to regex; basically I'd like to check whether a word has ONLY one colon or not.
If it has two or more colons, it should return nothing.
If it has one colon, it should be returned as it is (the colon must be in the middle of the string, not at the end or beginning).
(1)
a:bc:de #return nothing or error.
a:bc #return a:bc
a.b_c-12/:a.b_c-12/ #return a.b_c-12/:a.b_c-12/
(2)
My thinking is below, but it seems too complicated.
^[^:]*(\:[^:]*){1}$
^[-\w.\/]*:[-\w\/.]* #this will not throw error when there are 2 colons.
Any directions would be helpful, thank you!
This will find such "words" within a larger sentence:
(?<= |^)[^ :]+:[^ :]+(?= |$)
See live demo.
If you just want to test the whole input:
^[^ :]+:[^ :]+$
To restrict to only alphanumeric, underscore, dashes, dots, and slashes:
^[\w./-]+:[\w./-]+$
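Here is a hedged Python sketch of the three patterns. Python's re module only supports fixed-width lookbehind, so (?<= |^) is rewritten as (?<![^ ]) ("not preceded by a non-space"), and the lookahead is rewritten the same way; for single-line input this should behave like the original. The sample sentence and constant names are made up.

```python
import re

# In-sentence search, rewritten for Python's fixed-width lookbehind rule.
IN_SENTENCE = re.compile(r'(?<![^ ])[^ :]+:[^ :]+(?![^ ])')
# Whole-input checks, copied from the answer.
WHOLE_INPUT = re.compile(r'^[^ :]+:[^ :]+$')
# re.ASCII keeps \w limited to [a-zA-Z0-9_].
RESTRICTED = re.compile(r'^[\w./-]+:[\w./-]+$', re.ASCII)

sentence = 'keep a:bc but skip a:bc:de and :x and y:'
found = IN_SENTENCE.findall(sentence)
print(found)

for word in ['a:bc', 'a:bc:de', 'a.b_c-12/:a.b_c-12/']:
    print(word, bool(WHOLE_INPUT.search(word)), bool(RESTRICTED.search(word)))
```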
I saw this as a good opportunity to brush up on my regex skills - so might not be optimal but it is shorter than your last solution.
This is the regex pattern: /^[^:]*:[^:]*$/gm and these are the strings I am testing against: 'oneco:on' (match) and 'one:co:on', 'oneco:on:', ':oneco:on' (these should all not match)
To explain what is going on, the ^ matches the beginning of the string, the $ matches the end of the string.
The [^:] bit says that any character that is not a colon will be matched.
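In summary, ^[^:]* matches any run of non-colon characters at the start of the string (possibly none), : matches the single colon, and [^:]*$ means that any number (*) of characters can follow the colon up to the end of the string as long as they are not a colon. Because * also allows zero characters, a string with a single leading or trailing colon, such as ':abc' or 'abc:', still matches; use + instead of * if the colon must have text on both sides.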
To elaborate, it is because we specify the pattern to look for at the beginning and end of the string, surrounding the single colon we are looking for that only the first string 'oneco:on' is a match.
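A short Python check of this pattern against the test strings, as a sketch only (the /m flag is mirrored with re.MULTILINE, and the STRICT variant with + is an addition for comparison, not part of the answer):

```python
import re

# Pattern from the answer; /m is mirrored with re.MULTILINE.
ONE_COLON = re.compile(r'^[^:]*:[^:]*$', re.MULTILINE)
# Hypothetical stricter variant: + requires text on both sides of the colon.
STRICT = re.compile(r'^[^:]+:[^:]+$', re.MULTILINE)

for text in ['oneco:on', 'one:co:on', 'oneco:on:', ':oneco:on', ':abc', 'abc:']:
    print(repr(text), bool(ONE_COLON.search(text)), bool(STRICT.search(text)))
```

The two patterns differ only on inputs with one colon at the very start or end, such as ':abc'.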

How to overcome multiple matches within same sentence (regex) [duplicate]

I am trying to implement a regex which includes all the strings which have any number of words but cannot be followed by a : and ignore the match if it does. I decided to use a negative look ahead for it.
/([a-zA-Z]+)(?!:)/gm
string: lame:joker
Since I am using a character range, it is matching one character at a time and only ignoring the last character before the :.
How do I ignore the entire match in this case?
Link to regex101: https://regex101.com/r/DlEmC9/1
The issue is related to backtracking: once [a-zA-Z]+ has consumed the letters up to a :, the lookahead fails, so the engine steps back one letter, re-checks the lookahead, and finds a match whenever there are at least two letters before a colon, returning the part that is not immediately followed by :. See your regex demo: c in c:real is not matched as there is no position to backtrack to, and rea in real:c is matched because a is not immediately followed by :.
Adding implicit requirement to the negative lookahead
Since you only need to match a sequence of letters not followed with a colon, you can explicitly add one more condition that is implied: and not followed with another letter:
[A-Za-z]+(?![A-Za-z]|:)
[A-Za-z]+(?![A-Za-z:])
See the regex demo. Since both [A-Za-z] and : match a single character, it makes sense to put them into a single character class, so, [A-Za-z]+(?![A-Za-z:]) is better.
Preventing backtracking into a word-like pattern by using a word boundary
As #scnerd suggests, word boundaries can also help in these situations, but there is always a catch: word boundary meaning is context dependent (see a number of ifs in the word boundary explanation).
[A-Za-z]+\b(?!:)
is a valid solution here, because the input implies the words end with non-word chars (i.e. end of string, or chars other than letter, digits and underscore). See the regex demo.
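The backtracking effect and both fixes can be reproduced with a few lines of Python. This is only a sketch using the strings from the question; the variable names are made up.

```python
import re

text = 'lame:joker'

# Original pattern: the lookahead is satisfied after giving back one letter.
original = re.findall(r'([a-zA-Z]+)(?!:)', text)
# Fix 1: also forbid a following letter, so giving back letters never helps.
with_class = re.findall(r'[A-Za-z]+(?![A-Za-z:])', text)
# Fix 2: a word boundary forces the match to reach the end of the word.
with_boundary = re.findall(r'[A-Za-z]+\b(?!:)', text)

print(original)       # the partial word 'lam' sneaks in
print(with_class)
print(with_boundary)
```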
When does a word boundary fail?
\b will not be the right choice when the main consuming pattern is supposed to match even if glued to other word chars. The most common example is matching numbers:
\d+\b(?!:) matches 12 in "12," but not in "12:", "12c", or "12_"
\d+(?![\d:]) matches 12 in "12,", "12c", and "12_", but not in "12:"
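A small Python sketch that runs both number patterns over the four inputs mentioned above (the constant names are illustrative):

```python
import re

WITH_BOUNDARY = re.compile(r'\d+\b(?!:)')
WITH_LOOKAHEAD = re.compile(r'\d+(?![\d:])')

# c and _ are word characters, so \b cannot sit between 12 and them.
for text in ['12,', '12:', '12c', '12_']:
    print(repr(text), WITH_BOUNDARY.findall(text), WITH_LOOKAHEAD.findall(text))
```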
Do a word boundary check \b after the + to require it to get to the end of the word.
([a-zA-Z]+\b)(?!:)
Here's an example run.

Regex with start and end match

I'm having trouble matching the start and end of a regex on Python.
Essentially I'm confused about when to use word boundaries \b and start/end anchors ^ $
My regex of
^[A-Z]{2}\d{2}
matches 4 characters (two uppercase letters, two digits), which is what I'm after
Matches AJ99, RD22, CP44 etc
However, I also noted that AJAJAJAJAJAJAJAJAJSJHS99 could be matched as well. I've tried using ^ and $ together to match the whole string. This doesn't work
^[A-Z]{2}\d{2}$ # this doesn't work
but
^[A-Z]{2}\d{2} # this is fine
[A-Z]{2}\d{2}$ # this is fine
The string I'm matching against is 4 characters long, but in the first two examples the regex could pick the start and end of a longer string respectively.
s = "NZ43" # 4 characters, match perfect! However....
s = "AM27272727" # matches the first example
s = "HAHSHSHSHDS57" # matches the second example
The position anchors ^ and $ place a restriction on the position of your matched chars:
Analyzing your complete regex:
^[A-Z]{2}\d{2}$
^ matches only at the beginning of the text
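[A-Z]{2} exactly 2 uppercase ASCII alphabetic characters
\d{2} exactly 2 digits (equivalent to [0-9]{2} for ASCII text; in Python 3, \d in a str pattern also accepts other Unicode decimal digits unless re.ASCII is used)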
$ matches only at the end of the text
If you remove one or both of the 2 position anchors (^ or $) you can match a substring starting from the beginning or the end as you stated above.
If you want to match exactly a word without using the start/end of the string use the \b anchor, like this:
\b[A-Z]{2}\d{2}\b
\b matches at the start/end of the text and between a word character (in regex, a word character \w is one of [a-zA-Z0-9_]) and a character that is not a word character (available as \W).
The regex above matches the code (WS24 or NZ43) in all of the following strings:
WS24 alone
before WS24
WS24 after
before WS24 after
NZ43
It doesn't match:
AM27272727 (it would match if it were AM27 272727 or AM27"272727)
HAHSHSHSHDS57 (it would match if it were HAHSHSHSH DS57 or... you get it)
A demo online (the site will be useful to you also to experiment with regex).
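Since the question is about Python, here is a minimal sketch comparing the variants with the re module. The describe helper and its dictionary keys are invented for the example; re.fullmatch is used for the "both anchors" case because it requires the entire string to match.

```python
import re

CODE = r'[A-Z]{2}\d{2}'

def describe(text):
    """Report which variant of the pattern accepts the given text."""
    return {
        'start_only': re.search('^' + CODE, text) is not None,
        'end_only': re.search(CODE + '$', text) is not None,
        # fullmatch behaves like ^...$ applied to the whole string.
        'both': re.fullmatch(CODE, text) is not None,
        # \b...\b finds the code as a standalone word inside longer text.
        'word': [m.group(0) for m in re.finditer(r'\b' + CODE + r'\b', text)],
    }

for s in ['NZ43', 'AM27272727', 'HAHSHSHSHDS57', 'before WS24 after']:
    print(s, describe(s))
```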
Since the behaviour you show is exactly what it is supposed to be, your question suggests that you may not have fully understood how regular expressions work.
As an addition to the very good and informative answer of GsusRecovery, here's a site that guides you through the concepts of regular expressions and tries to teach you the basics with a lesson-based system. To be clear, I do not want to tout this website, as there are plenty of those, but I found this one really useful, so it's the one I'm suggesting.

regex match till a character from a second occurrence of a different character

My question is pretty similar to this question and the answer is almost fine. However, I need a regexp that matches not from one character to another, but from the second occurrence of a character up to another character.
My purpose is to get password from uri, example:
http://mylogin:mypassword#mywebpage.com
So in fact I need the text from the second ":" up to "#".
You could give the following regex a go:
(?<=:)[^:]+?(?=#)
It matches any consecutive string not containing any : character, prefixed by a : and suffixed by a #.
Depending on your flavour of regex you might need something like:
:([^:]+?)#
Which doesn't use lookarounds, this includes the : and # in the match, but the password will be in the first capturing group.
The ? makes the quantifier lazy in case there are any further # characters in the actual URL string; for this input it is optional. Please note that this will match any character other than : between : and #, even newlines and so on.
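Both variants can be tried in Python like this; a sketch only, using the example URI from the question (the variable names are made up):

```python
import re

uri = 'http://mylogin:mypassword#mywebpage.com'

# Lookaround version: the match itself is the password.
with_lookarounds = re.search(r'(?<=:)[^:]+?(?=#)', uri)
# Capture-group version: the match includes : and #, group 1 is the password.
with_group = re.search(r':([^:]+?)#', uri)

print(with_lookarounds.group(0))
print(with_group.group(0), with_group.group(1))
```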
Here's an easy one that does not need look-aheads or look-behinds:
.*:.*:([^#]+)#
Explanation:
.*:.*: matches everything up to (and including) the second colon (:)
([^#]+) matches the longest possible series of non-# characters
# - matches the # character.
If you run this regex, the first capturing group (the expression between parentheses) will contain the password.
Here it is in action: http://regex101.com/r/fT6rI0
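For completeness, a Python sketch of the same idea wrapped in a helper; extract_password is a hypothetical name, and returning None for inputs without two colons is an assumption about the desired behaviour.

```python
import re

GREEDY = re.compile(r'.*:.*:([^#]+)#')

def extract_password(uri):
    """Return the text captured between the second colon and #, or None."""
    m = GREEDY.match(uri)
    return m.group(1) if m else None

print(extract_password('http://mylogin:mypassword#mywebpage.com'))
print(extract_password('http://nologin#mywebpage.com'))
```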

Regular expression for letters, numbers and - _

I'm having trouble checking in PHP whether a value is any combination of the following
letters (upper or lowercase)
numbers (0-9)
underscore (_)
dash (-)
point (.)
no spaces! or other characters
a few examples:
OK: "screen123.css"
OK: "screen-new-file.css"
OK: "screen_new.js"
NOT OK: "screen new file.css"
I guess I need a regex for this, since I need to throw an error when a given string has other characters in it than the ones mentioned above.
The pattern you want is something like (see it on rubular.com):
^[a-zA-Z0-9_.-]*$
Explanation:
^ is the beginning of the line anchor
$ is the end of the line anchor
[...] is a character class definition
* is "zero-or-more" repetition
Note that the literal dash - is the last character in the character class definition, otherwise it has a different meaning (i.e. range). The . also has a different meaning outside character class definitions, but inside, it's just a literal .
References
regular-expressions.info/Anchors, Character Classes and Repetition
In PHP
Here's a snippet to show how you can use this pattern:
<?php
$arr = array(
    'screen123.css',
    'screen-new-file.css',
    'screen_new.js',
    'screen new file.css'
);
foreach ($arr as $s) {
    if (preg_match('/^[\w.-]*$/', $s)) {
        print "$s is a match\n";
    } else {
        print "$s is NO match!!!\n";
    }
}
?>
The above prints (as seen on ideone.com):
screen123.css is a match
screen-new-file.css is a match
screen_new.js is a match
screen new file.css is NO match!!!
Note that the pattern is slightly different, using \w instead. This is the character class for "word character".
API references
preg_match
Note on specification
This seems to follow your specification, but note that this will match things like ....., etc, which may or may not be what you desire. If you can be more specific what pattern you want to match, the regex will be slightly more complicated.
The above regex also matches the empty string. If you need at least one character, then use + (one-or-more) instead of * (zero-or-more) for repetition.
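The difference between * and + is easy to see in a short sketch. It is written in Python rather than PHP purely for illustration; re.fullmatch plays the role of the ^ and $ anchors, and the helper names are made up.

```python
import re

ALLOWED = r'[a-zA-Z0-9_.-]'

def matches_star(s):
    """Zero or more allowed characters: the empty string passes."""
    return re.fullmatch(ALLOWED + '*', s) is not None

def matches_plus(s):
    """One or more allowed characters: the empty string is rejected."""
    return re.fullmatch(ALLOWED + '+', s) is not None

for s in ['screen123.css', 'screen new file.css', '.....', '']:
    print(repr(s), matches_star(s), matches_plus(s))
```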
In any case, you can further clarify your specification (always helps when asking regex question), but hopefully you can also learn how to write the pattern yourself given the above information.
you can use
^[\w.-]+$
the + is to make sure it has at least 1 character. You need the ^ and $ to denote the beginning and end; otherwise, if the string has a match in the middle, such as ####xyz%%%%, then it is still a match.
\w already includes letters (upper and lower case), numbers, and underscore. So the rest, . and -, are just put into the "class" to match. The + means 1 occurrence or more.
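A quick sketch of why the anchors matter, in Python for illustration (re.ASCII keeps \w limited to [a-zA-Z0-9_], which is an assumption about the intended character set):

```python
import re

text = '####xyz%%%%'

# Without anchors the pattern finds the valid part in the middle.
unanchored = re.search(r'[\w.-]+', text, re.ASCII)
# With ^ and $ the whole string has to consist of allowed characters.
anchored = re.search(r'^[\w.-]+$', text, re.ASCII)

print(unanchored.group(0) if unanchored else None)
print(anchored)
```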
P.S. thanks for the note in the comment about preventing - to denote a range.
This is the pattern you are looking for
/^[\w.-]*$/
What this means:
^ Start of string
[...] Match characters inside
\w Any word character, so 0-9 a-z A-Z and _
.- Match . and - (the - goes last so that it cannot be read as a range; \w already covers _)
* Zero or more of pattern or unlimited
$ End of string
If you want to limit the amount of characters:
/^[\w.-]{0,5}$/
{0,5} Means 0-5 characters
To actually cover your pattern, i.e, valid file names according to your rules, I think that you need a little more. Note this doesn't match legal file names from a system perspective. That would be system dependent and more liberal in what it accepts. This is intended to match your acceptable patterns.
^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$
Explanation:
^ Match the start of a string. This (plus the end match) forces the string to conform to the exact expression, not merely contain a substring matching the expression.
([a-zA-Z0-9]+[_-])* Zero or more occurrences of one or more letters or numbers followed by an underscore or dash. This causes all names that contain a dash or underscore to have letters or numbers between them.
[a-zA-Z0-9]+ One or more letters or numbers. This covers all names that do not contain an underscore or a dash.
\. A literal period (dot). Forces the file name to have an extension and, by exclusion from the rest of the pattern, only allow the period to be used between the name and the extension. If you want more than one extension that could be handled as well using the same technique as for the dash/underscore, just at the end.
[a-zA-Z0-9]+ One or more letters or numbers. The extension must be at least one character long and must contain only letters and numbers. This is typical, but if you wanted allow underscores, that could be addressed as well. You could also supply a length range {2,3} instead of the one or more + matcher, if that were more appropriate.
$ Match the end of the string. See the starting character.
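To see how this stricter pattern treats edge cases, here is a hedged Python sketch; the is_acceptable helper and the extra test names are illustrative only, and the regex itself is copied unchanged from above.

```python
import re

FILENAME = re.compile(r'^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$')

def is_acceptable(name):
    """True when the name is alphanumeric chunks joined by _ or -, plus one extension."""
    return FILENAME.fullmatch(name) is not None

for name in ['screen123.css', 'screen-new-file.css', 'screen_new.js',
             'screen new file.css', '.....', 'screen--new.css', 'noextension']:
    print(name, is_acceptable(name))
```

Unlike the plain character-class patterns, this one rejects names such as ..... or screen--new.css.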
Something like this should work
$code = "screen new file.css";
if (!preg_match("/^[-_a-zA-Z0-9.]+$/", $code))
{
    echo "not valid";
}
This will echo "not valid"
[A-Za-z0-9_.-]*
This will also match empty strings; if you do not want that, exchange the last * for a +