combination "+" with "$" in regex - regex

Thanks to everyone who has replied.
I think I have to refine my original question a little.
I'm a bit confused by the definition of the $ sign.
It just asserts that there are between 6 and 10 word chars at the very end of the string.
That's it, right? Then, in my opinion, it should match my test string "123a56A781231231231241", because the string doesn't break the rule: 6-10 word chars at the very beginning of the string, and 6-10 at the very end. Perfect, isn't it?
Also, I want to know the difference between ^(?=\w{6,10}$) and ^(?=\w{6,10})$.
One more thing: Casimir et Hippolyte, you said "The + doesn't change anything, this means only that the quantifier ( {6,10} here) is possessive and doesn't allow backtracks."
Does that mean the + sign disables the $ sign?
Thank you guys in advance.
Before I go any further, I want you to know that I started studying regex only 2 days ago. I'm a total newbie.
First. The pattern is ^(?=\w{6,10}$). Why does the dollar sign have to be inside the ()? I know it's a dumb question, but I'm curious. I tried putting the dollar sign outside the (), but it didn't work as I expected.
Second. I found several tutorial sites, and they say the dollar sign means
"$ may appear at the end of a pattern to require the match to occur at the very end of a line. For example, abc$ matches 123abc but not abc123."
So $ is used to assert that the matched part of the string is at the very end of a line, right?
If that is true, why doesn't the pattern "^(?=\w{6,10}$)" match my test string "123a56A781231231231241"?
As you can see, my test string contains 6~10 word characters at the very beginning of the line and 6~10 word characters at the very end of the line.
Third. As I mentioned earlier, the pattern ^(?=\w{6,10}$) doesn't match my test string "123a56A781231231231241". But if I add a + sign after \w{6,10}, as in ^(?=\w{6,10}+$), it works.
Is it because the + sign is possessive? As far as I know, the + sign tells the engine not to backtrack once a match has been made. So my guess is that the $ sign doesn't do its job because no backtracking happens (I'm not sure about this, of course, as I don't know how the $ sign works internally). Is that right?

If that's your whole regex, you don't need a look-ahead, i.e. these two regexes accept exactly the same inputs (the first matches an empty string, the second consumes the characters):
^(?=\w{6,10}$)
^\w{6,10}$
Why does the $ need to be inside the brackets? Because the (anchored) look-ahead ^(?=\w{6,10}) just asserts that there are between 6 and 10 word chars at the front of the input. It will also succeed if there are more than 10 word chars at the front of the input.
By putting the $ inside the look-ahead, it will only succeed if the whole input consists of 6-10 word chars.
You would only use a look ahead if you also wanted to have another restriction. For example, to match
6-10 word chars, and "a" appears before "b"
you would use the regex:
^(?=\w{6,10}$).*a.*b
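To see the difference that the position of $ makes, here is a small sketch in Python (an assumption on my part; the question does not say which engine is in use, and other engines may differ in details). It runs the patterns from this answer against the test string from the question.

```python
import re

test = "123a56A781231231231241"  # 22 word characters, from the question

# $ inside the lookahead: the 6-10 word chars must reach the end of the input.
inside = re.search(r"^(?=\w{6,10}$)", test)

# No $ at all: only the first 6-10 chars are checked, so longer input passes.
no_dollar = re.search(r"^(?=\w{6,10})", test)

# $ outside the lookahead: ^ and $ must hold at the same position, which
# leaves no room for the 6 chars that the lookahead needs.
outside = re.search(r"^(?=\w{6,10})$", test)

# Lookahead combined with a second restriction: "a" appears before "b".
combined = re.compile(r"^(?=\w{6,10}$).*a.*b")

print(inside, no_dollar, outside)
print(bool(combined.search("xxaxxbxx")), bool(combined.search("xxbxxaxx")))
```

With these inputs, only the pattern without $ should report a match for the 22-character string.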

The (?=..) is a lookahead. It is a zero-width assertion, which means that it is just a check and consumes nothing. In other words, a lookahead means "followed by".
The pattern ^(?=\w{6,10}$) means:
beginning of the string followed by between 6 and 10 word characters until the end of the string.
Note that no character is matched, since everything is inside the lookahead except the ^, which is zero-width too.
A match function can only return an empty string as the match result, but it will return true if the condition is met (otherwise false).
The + doesn't change anything; it only means that the quantifier ({6,10} here) is possessive and doesn't allow backtracking. More information about this feature here: www.regular-expressions.info/possessive.html
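A quick way to see the "zero-width" point, sketched in Python (my choice of language for illustration, not necessarily the asker's): the pattern succeeds on a 7-character input, yet the matched text is empty.

```python
import re

match = re.search(r"^(?=\w{6,10}$)", "abc1234")

matched_text = match.group()   # the text the pattern consumed
matched_span = match.span()    # start and end offsets of the match

print(repr(matched_text), matched_span, bool(match))
```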

I can't help you with this because I don't know what you mean. Are you trying to match against the test string in 2 and 3?
^(?=\w{6,10}$) is trying to match the beginning of the string, followed by 6-10 word characters and the end of the string. Your string is longer than 10 characters, so that won't match.
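When you add the +, some engines read it as a repetition, so the lookahead matches one or more consecutive runs of 6-10 word characters. Your 22-character test string can be split into such runs, which would explain why it matches.
If your engine instead reads the + as making the quantifier possessive, adding it should still not match, because either way you are looking for a string exactly 6-10 chars long, and your test string is longer. Making the quantifier possessive won't change the result in this instance.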
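Whether a + after {6,10} acts as a repetition or as a possessive marker depends on the engine, so the sketch below (Python, chosen only for illustration) avoids \w{6,10}+ altogether and writes the "one or more instances" reading explicitly as (?:\w{6,10})+. Under that reading the 22-character test string does match, because it can be split into runs of 6-10 word characters.

```python
import re

test = "123a56A781231231231241"  # 22 word characters

# Exactly one run of 6-10 word chars up to the end: too short for 22 chars.
single_run = re.search(r"^(?=\w{6,10}$)", test)

# One or more consecutive runs of 6-10 word chars up to the end.
# 22 can be split as 10 + 6 + 6, which the engine finds by backtracking.
repeated_runs = re.search(r"^(?=(?:\w{6,10})+$)", test)

# 11 chars cannot be split into runs of 6-10, so this still fails.
eleven = re.search(r"^(?=(?:\w{6,10})+$)", "abcdefghijk")

print(single_run, repeated_runs, eleven)
```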

Related

Regular expression to match a word that contains ONLY one colon

I am new to regex. Basically, I'd like to check whether a word contains ONLY one colon.
If it has two or more colons, it should return nothing.
If it has one colon, it should be returned as is. (The colon must be in the middle of the string, not at the end or the beginning.)
(1)
a:bc:de #return nothing or error.
a:bc #return a:bc
a.b_c-12/:a.b_c-12/ #return a.b_c-12/:a.b_c-12/
(2)
My attempt so far is below, but it seems too complicated.
^[^:]*(\:[^:]*){1}$
^[-\w.\/]*:[-\w\/.]* #this will not throw error when there are 2 colons.
Any directions would be helpful, thank you!
This will find such "words" within a larger sentence:
(?<= |^)[^ :]+:[^ :]+(?= |$)
If you just want to test the whole input:
^[^ :]+:[^ :]+$
To restrict to only alphanumeric, underscore, dashes, dots, and slashes:
^[\w./-]+:[\w./-]+$
I saw this as a good opportunity to brush up on my regex skills - so might not be optimal but it is shorter than your last solution.
This is the regex pattern: /^[^:]*:[^:]*$/gm and these are the strings I am testing against: 'oneco:on' (match) and 'one:co:on', 'oneco:on:', ':oneco:on' (these should all not match)
To explain what is going on, the ^ matches the beginning of the string, the $ matches the end of the string.
The [^:] bit says that any character that is not a colon will be matched.
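In summary, ^[^:]* means that the string starts with any number of characters (including none) that are not a colon, and the : after it requires a single colon. Lastly, [^:]*$ means that any number (*) of characters can follow the colon up to the end of the string, as long as they are not a colon. Because * also allows zero characters, this pattern accepts a colon at the very start or end of the string; use + instead of * to require at least one character on each side.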
To elaborate, it is because we specify the pattern to look for at the beginning and end of the string, surrounding the single colon we are looking for that only the first string 'oneco:on' is a match.
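The whole-input patterns from the answers above can be compared on the examples from the question. This is a sketch in Python (an assumption, since the question does not name a language); the gm flags are dropped because each string is tested on its own.

```python
import re

star = re.compile(r"^[^:]*:[^:]*$")            # allows an empty side
plus = re.compile(r"^[^ :]+:[^ :]+$")          # needs a char on each side
strict = re.compile(r"^[\w./-]+:[\w./-]+$")    # restricted character set

samples = ["a:bc:de", "a:bc", "a.b_c-12/:a.b_c-12/", ":abc", "abc:"]

for text in samples:
    print(text,
          bool(star.search(text)),
          bool(plus.search(text)),
          bool(strict.search(text)))
```

The last two samples are where the * and + versions should disagree: a colon at the very start or end passes the * version only.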

RegEx more than multiple characters before number

I really don't use RegEx that much. You could say I am a RegEx n00b. I have been working on this issue for half a day.
I am trying to write a pattern that looks backward from a number character. For example:
1. bob1 => bob
2. cat3 => cat
3. Mary34 => Mary
So far I have this (?![A-Z][a-z]{1,})([A-Za-z_])
It only matches individual characters; I want all the characters before the number character. I tried adding ^ and $ to my pattern in an online simulator, but I am unsure where to put them.
NOTE: I am using RegEx for the .NET Framework
You may use a regex like
[\p{L}_]+(?=\d)
or
[\w-[\d]]+(?=\d)
Pattern details
[\p{L}_]+ - any 1 or more letters (both lower- and uppercase) and/or _
OR
[\w-[\d]]+ - 1 or more word chars except digits (the -[] inside a character class is a character class subtraction construct)
(?=\d) - a positive lookahead that requires a digit to appear immediately to the right of the current location
If we break down your RegEx, we see:
(?![A-Z][a-z]{1,}) which says "look ahead and make sure that what follows is NOT one uppercase letter followed by one or more lowercase letters" and ([A-Za-z_]) which says "match one letter or underscore". This ends up matching one character at a time: any lowercase letter or underscore, and any uppercase letter that is not followed by a lowercase letter.
If I understand what you want to achieve, then you want all of the letters before a number. I would write something like that as:
\b([a-zA-Z]+)[0-9]
This will start at a word boundary \b, match one or more letters, and require a digit right after the matched string.
(The syntax I used seems to match this document about .NET RegEx: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions)
In light of Wiktor Stribizew's comment, here is a pure match RegEx:
\b[a-zA-Z_]+(?=[0-9])
This matches the pattern and then looks ahead for the digit. This is better than my first lookahead attempt. (Thank you Wiktor.)
http://www.rexegg.com/regex-lookarounds.html
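The last two patterns are not specific to .NET, so they can be tried in other engines too. Below is a minimal sketch in Python (my choice for illustration; the question itself targets .NET) using the three inputs from the question. The \p{L} class and character class subtraction from the first answer are not supported by Python's re module, so they are left out.

```python
import re

inputs = ["bob1", "cat3", "Mary34"]

# Pure match: letters/underscores that are immediately followed by a digit.
lookahead = re.compile(r"\b[a-zA-Z_]+(?=[0-9])")

# Capture-group variant: the digit is consumed, the letters are in group 1.
captured = re.compile(r"\b([a-zA-Z]+)[0-9]")

names_lookahead = [lookahead.search(s).group() for s in inputs]
names_captured = [captured.search(s).group(1) for s in inputs]

print(names_lookahead, names_captured)
```

The difference between the two shows up in the full match: the capture-group version includes the first digit in the overall match, while the lookahead version does not.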

Regex confusion

I know that there are a lot of topics like this one. I've spent a lot of hours checking expressions to make my code work. I don't really understand how regex work, so I hope you can help me out.
I want to validate these inputs (I hope I am not pushing it)
Only letters (with latin characters too)
Address (including dots, commas, colon, number sign and hyphen)
Telephone (numbers and hyphen)
like:
/[a-zA-ZÑñÁáÉéÍíÓóÚú]+$/ /* Only letters */
/[a-zA-Z0-9\sñáéíóúü .,:#-]+$/ /* Address */
/^[\d-]+$/ /* Telephone */
They work fine when I include a special character at the end of the string, but if I enter that special character between accepted characters, the validation does not work. Here is an example:
For the "Only letters" expression:
ab[(% - Does not pass
a[(%b - It passes, and it shouldn't!
Thanks a lot for your time, any help will be appreciated!
You forgot the ^ start-of-string anchor at the beginning of the first two patterns. For the first one:
^[a-zA-ZÑñÁáÉéÍíÓóÚú]+$
^
Same with the second regex. There, you also have a literal space and \s, so literal space can be removed:
^[a-zA-Z0-9\sñáéíóúü.,:#-]+$
^
And as for your third regex, it is not optimal since it will match ----1123.
Use
/^(?:\d+-)+\d+$/
Here, we match sequences of digits followed by a hyphen (with (?:\d+-)+) and then a final sequence of digits, from beginning to end.
The expression /[..]+$/ says that the test subject must have any of the characters (..) at its end. $ symbolises the end of the string. The beginning of the string does not have to match. If you want to enforce that for the entire string, use the beginning anchor as well:
/^[..]+$/
This now says the string must have any of the characters (..) between its beginning and end, and there's no room for anything else.
You're already doing this for the telephone regex.
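The effect of the missing ^ can be reproduced with the two inputs from the question. This is a sketch in Python (an assumption; the /.../ literals in the question suggest another language). re.search is used because it looks for a match anywhere in the string, so the anchors alone decide what is accepted.

```python
import re

letters_end_only = re.compile(r"[a-zA-ZÑñÁáÉéÍíÓóÚú]+$")   # original pattern
letters_anchored = re.compile(r"^[a-zA-ZÑñÁáÉéÍíÓóÚú]+$")  # with ^ added

phone_loose = re.compile(r"^[\d-]+$")          # original telephone pattern
phone_strict = re.compile(r"^(?:\d+-)+\d+$")   # suggested replacement

for text in ["ab[(%", "a[(%b", "abc"]:
    print(text,
          bool(letters_end_only.search(text)),
          bool(letters_anchored.search(text)))

for text in ["----1123", "555-1234", "5551234"]:
    print(text,
          bool(phone_loose.search(text)),
          bool(phone_strict.search(text)))
```

Note that the stricter telephone pattern requires at least one hyphen, so a number written without hyphens is rejected.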

regex not working as it should

I'm trying to catch up on regex and I have made one as below;
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
and the sample is;
ä;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;
As far as I've read,
. is any character, \d is a digit, and {n} and its variations indicate n repetitions or, depending on the variation, more.
What could be the problem?
A few suggestions/observations:
You can remove all {1}s, they don't do anything.
[A,K] means "A, a comma, or K". If you want to match any letter between A and K, use [A-K].
You should place the capturing group around the repetitions: (\d{7,8}) captures a 7-8 digit number; (\d){7,8} will only capture the last digit.
[ ,\d]{1} fails on your regex because there are two characters (space and 0) at that point in the string.
You might need to remove the space before the final $, unless there actually is a space in your string after the last semicolon.
Here's a version that matches (and captures each element in a separate group):
^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$
See it in action on regex101.com.
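As a rough check, the corrected pattern can be run against the sample line. This sketch uses Python (an assumption; the question does not say which language is in use) and also shows the difference between (\d{4}) and (\d){4}.

```python
import re

# Sample line from the question; \u00e4 is the leading "ä".
sample = "\u00e4;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"

pattern = re.compile(
    r"^(.);(\d{4});(\d{8});([A-K]);(\d{7,8});(\d{8});"
    r"([A-Z ]+);([ ,\d]+);(\d{8});(\d);(\d); *$"
)

match = pattern.match(sample)
fields = match.groups() if match else ()
print(fields)

# Group around the repetition: captures all four digits.
whole = re.match(r"(\d{4})", "1234").group(1)
# Repeated group: only the last iteration is kept.
last = re.match(r"(\d){4}", "1234").group(1)
print(whole, last)
```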
Please, don't abuse regexps for everything.
Your format is a CSV format, just split at ; and then validate the individual parts properly. This is perfectly valid, usually similarly efficient, and easier to debug.
With regexp, make sure you properly escape (i.e. double escape!). In most programming languages, \ is a reserved character in strings, and you will need to use \\ to get the desired effect.
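A minimal sketch of the split-and-validate approach, in Python. The names is_digits and validate_record and the per-field rules are my own illustration, inferred from the regex in the question; adjust them to the real format.

```python
def is_digits(text, min_len, max_len):
    """True if text is only ASCII digits and its length is in range."""
    return text.isascii() and text.isdigit() and min_len <= len(text) <= max_len


def validate_record(line):
    """Validate one ';'-terminated record field by field (hypothetical rules)."""
    parts = line.split(";")
    # The trailing ';' produces an empty (or blank) final element.
    if len(parts) != 12 or parts[11].strip(" ") != "":
        return False
    return (
        len(parts[0]) == 1
        and is_digits(parts[1], 4, 4)
        and is_digits(parts[2], 8, 8)
        and parts[3] in set("ABCDEFGHIJK")
        and is_digits(parts[4], 7, 8)
        and is_digits(parts[5], 8, 8)
        and parts[6] != ""
        and all(c == " " or "A" <= c <= "Z" for c in parts[6])
        and is_digits(parts[7].strip(" "), 1, 1)
        and is_digits(parts[8], 8, 8)
        and is_digits(parts[9], 1, 1)
        and is_digits(parts[10], 1, 1)
    )


sample = "\u00e4;1234;00126434;K;11821111;00000000;SOME TEXT ; 0;00000000;0;0;"
print(validate_record(sample))
```

Each rule is an ordinary expression, so a failing field can be located with a debugger or a print statement, which is the main advantage over one long regex.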
Try this:
^(.){1};(\d){4};(\d){8};[A-K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ \d]{2};(\d){8};(\d){1};(\d){1};$
Here is what was wrong in your regex:
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};[ ,\d]{1};(\d){8};(\d){1};(\d){1}; $
You have an extra space before $ at the end.
To specify a range, use - and not a comma. Your range should be [A-K].
The [ ,\d] character class is restricted to 1 character by {1}; it should be {2}, one for the space and one for the digit.
Additional: You don't need to specify {1}, as a token matches exactly once by default.
If yours does not work, you can try this one :
^(.){1};(\d){4};(\d){8};[A,K]{1};(\d){7,8};(\d){8};[A-Z ]{1,};( \d){1};(\d){8};(\d){1};(\d){1};$

Regex matching beginning AND end strings

This seems like it should be trivial, but I'm not so good with regular expressions, and this doesn't seem to be easy to Google.
I need a regex that starts with the string 'dbo.' and ends with the string '_fn'
So far as I am concerned, I don't care what characters are in between these two strings, so long as the beginning and end are correct.
This is to match functions in a SQL server database.
For example:
dbo.functionName_fn - Match
dbo._fn_functionName - No Match
dbo.functionName_fn_blah - No Match
If you're searching for hits within a larger text, you don't want to use ^ and $ as some other responders have said; those match the beginning and end of the text. Try this instead:
\bdbo\.\w+_fn\b
\b is a word boundary: it matches a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. This regex will find what you're looking for in any of these strings:
dbo.functionName_fn
foo dbo.functionName_fn bar
(dbo.functionName_fn)
...but not in this one:
foodbo.functionName_fnbar
\w+ matches one or more "word characters" (letters, digits, or _). If you need something more inclusive, you can try \S+ (one or more non-whitespace characters) or .+? (one or more of any characters except linefeeds, non-greedily). The non-greedy +? prevents it from accidentally matching something like dbo.func1_fn dbo.func2_fn as if it were just one hit.
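The behaviour described above can be checked with a short sketch in Python (my choice for illustration; the question is about SQL Server object names and does not name a regex engine).

```python
import re

word_bounded = re.compile(r"\bdbo\.\w+_fn\b")

texts = [
    "dbo.functionName_fn",
    "foo dbo.functionName_fn bar",
    "(dbo.functionName_fn)",
    "foodbo.functionName_fnbar",
    "dbo.functionName_fn_blah",
]
hits = {text: word_bounded.findall(text) for text in texts}
for text, found in hits.items():
    print(text, found)

# Greedy vs. non-greedy when two names share a line.
line = "dbo.func1_fn dbo.func2_fn"
greedy = re.findall(r"\bdbo\..+_fn\b", line)
lazy = re.findall(r"\bdbo\..+?_fn\b", line)
print(greedy, lazy)
```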
^dbo\..*_fn$
This should work for you.
Well, the simple regex is this:
/^dbo\..*_fn$/
It would be better, however, to use the string manipulation functionality of whatever programming language you're using to slice off the first four and the last three characters of the string and check whether they're what you want.
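A sketch of the string-function alternative in Python (an assumed language; the helper name is_dbo_function is hypothetical), next to the regex for comparison, using the three examples from the question:

```python
import re


def is_dbo_function(name):
    """Check the prefix and the suffix without a regex."""
    return name.startswith("dbo.") and name.endswith("_fn")


anchored = re.compile(r"^dbo\..*_fn$")

names = ["dbo.functionName_fn", "dbo._fn_functionName", "dbo.functionName_fn_blah"]
for name in names:
    print(name, is_dbo_function(name), bool(anchored.search(name)))
```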
\bdbo\..*fn
I was looking through a ton of Java code for a specific library: car.csclh.server.isr.businesslogic.TypePlatform (although I only knew car and Platform at the time). Unfortunately, none of the other suggestions here worked for me, so I figured I'd post this.
Here's the regex I used to find it:
\bcar\..*Platform
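import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        // Lowercase first, then quote, so regex metacharacters in the input are literal.
        String part = Pattern.quote(scanner.nextLine().toLowerCase());
        String line = scanner.nextLine();
        Pattern pattern = Pattern.compile("\\b" + part + "|" + part + "\\b");
        Matcher matcher = pattern.matcher(line.toLowerCase());
        System.out.println(matcher.find() ? "YES" : "NO");
    }
}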
If you need to determine if any of the words of this text start or end with the sequence, you can use this regex: \bsubstring|substring\b:
anythingsubstring
substringanything
anythingsubstringanything
The simplest thing that you can do is:
dbo.*_fn$
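It looks for dbo, followed by any characters, and requires _fn at the end. Note that it does not require the literal dot after dbo and, without ^, dbo may appear anywhere in the string.
If you know which character follows the final n, for example a space, you can replace $ with that character.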