Regex: literal followed by one or more digits

I would like to search for the literal US followed by one or more digits, followed by any character except a dash. For example, these should match:
US3.
US22?
US134!
while these should not:
US5-
US66-
US789-
I have tried
r'US[0-9]+(?=[^-])'
but it also matches
'US6', 'US78'
How do I modify this?

List the characters that are allowed after the digits in a character class.
Regex: US\d+[?!&%.]
Regex101 Demo
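To see why the original attempt misbehaves, here is a minimal Python sketch. The first_match helper and the sample list are illustrative assumptions, not part of the question. The lookahead in the attempt can still succeed after the engine gives back one digit, whereas the character class requires a real trailing character from the allowed set.

```python
import re

samples = ["US3.", "US22?", "US134!", "US5-", "US66-", "US789-"]

def first_match(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Original attempt: after backtracking by one digit, the lookahead sees a digit,
# which is "not a dash", so US66- yields US6 and US789- yields US78.
attempt = [first_match(r"US[0-9]+(?=[^-])", s) for s in samples]

# Character-class version: the digits must be followed by one of the listed characters.
fixed = [first_match(r"US\d+[?!&%.]", s) for s in samples]

print(attempt)
print(fixed)
```

Note that the character-class version includes the trailing character in the match, while the lookahead version does not.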

I'm not sure this is the best way to do it, but if you know that there is only one of these on each line, and that it sits at the end of the line, you could add a $ to the positive lookahead like so:
US[0-9]+(?=[^-]$)
Regexr.com example
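As a rough check of the end-of-line variant in Python, assuming the candidates arrive one per line in a single string (the text variable is made up for illustration):

```python
import re

# One candidate per line; this layout is an assumption for the sketch.
text = "US3.\nUS22?\nUS134!\nUS5-\nUS66-\nUS789-"

# re.MULTILINE makes $ match at the end of every line, not only at the end of the string.
matches = re.findall(r"US[0-9]+(?=[^-]$)", text, flags=re.MULTILINE)

print(matches)
```

Because the lookahead now demands exactly one non-dash character followed by the end of the line, backtracking by one digit no longer produces a match.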

You have to list the allowed characters in a character class.
I have attached a screenshot of the output so you can verify it.
US[0-9]+[?!&%.]

Related

Regex String Within Negative Lookahead being selected

I am trying to write a pattern that selects the colon and the number in this situation, i.e. ":1"
"phoneNumber":1111111111
but not in a situation where the colon followed by a digit is between a pair of quotes, i.e. it should not match ':0' and ':2'
"lastLogon":"2019-04-17 14:08:25.732576"
I have this expression which selects everything in quote pairs.
((?=["]).+?(?=["])")
Which I tried to do the following with...
:\s?([-\d])(?!((?=["]).+?(?=["])"))
But this selects both of the occurrences above. Does anyone have a workaround? I think I might be misunderstanding how negative lookahead works.
Thanks!
Edit:
Added info on what strings I wanted to match.
Just match a colon followed by 3 or more digits:
:\d{3,}
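A small Python sketch of this suggestion on the two sample lines from the question. It only works under the assumption that unquoted numbers have at least three digits and that no colon inside a quoted value is followed by three or more digits; the variable names are illustrative.

```python
import re

pattern = re.compile(r":\d{3,}")

number_line = '"phoneNumber":1111111111'
date_line = '"lastLogon":"2019-04-17 14:08:25.732576"'

# The unquoted number is matched together with its colon.
number_hits = pattern.findall(number_line)
# Inside the timestamp, each colon is followed by at most two digits, so nothing matches.
date_hits = pattern.findall(date_line)

print(number_hits, date_hits)
```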

Regex, match number with hyphen and without

I have the string [lat:50.000] and need to extract the number from it. Sometimes the number has a hyphen in front of it, because it can be negative.
My current regex is [\-]\d+(\.\d{1,10})?, but it only matches the number when the hyphen is present. I need a regex that matches with and without the hyphen, so that I am left with 50.000 or, in some cases, -2.000.
I hope this makes sense.
You need a quantifier to state that the hyphen is optional:
[\-]?\d+(\.\d{1,10})?
You can also improve the expression a bit and put the hyphen out of the character class (since it's just one character):
-?\d+(\.\d{1,10})?
Use this regex: \-?\d+\.\d{1,10}
A question mark quantifier ? following a character or group indicates that it is optional:
-?\d+(\.\d{1,10})?
This is the equivalent of using the {0,1} quantifier.
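For illustration, here is a minimal Python sketch of the optional-hyphen pattern. The extract helper is hypothetical, and the group is written as non-capturing, which does not change what the pattern matches.

```python
import re

# -? makes the leading hyphen optional; the fractional part is optional as well.
number = re.compile(r"-?\d+(?:\.\d{1,10})?")

def extract(text):
    """Return the first number found in text as a string, or None (hypothetical helper)."""
    m = number.search(text)
    return m.group() if m else None

positive = extract("[lat:50.000]")
negative = extract("[lat:-2.000]")

print(positive, negative)
```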
Yet another one:
(-?\d[\d.,]+)
# - or not (optional)
# followed by at least a digit
# followed by digits, dots and commas
See a demo on regex101.com.
Here is a simple expression
\-?\d*\.?\d*
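One caveat worth checking before using this last expression: every part of it is optional, so it can match the empty string. The Python sketch below (variable names are illustrative) suggests it is safer for validating an already isolated number than for searching inside a longer string.

```python
import re

loose = re.compile(r"\-?\d*\.?\d*")
strict = re.compile(r"-?\d+(?:\.\d{1,10})?")

text = "[lat:-2.000]"

# Searching: the loose pattern succeeds immediately with an empty match at position 0.
loose_hit = loose.search(text).group()
strict_hit = strict.search(text).group()

# Validating an isolated value: the loose pattern accepts the number,
# but it also accepts an empty string.
loose_accepts_number = loose.fullmatch("-2.000") is not None
loose_accepts_empty = loose.fullmatch("") is not None

print(repr(loose_hit), strict_hit, loose_accepts_number, loose_accepts_empty)
```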

How to match digits and dots. It has to start with digits first

For the example hello.1.2.3.4.world I want a match that gives me 1.2.3.4. The number of digits between the dots doesn't matter, as long as it follows the digit.digit pattern.
My partial solution was the regular expression [\d.]+.[^.a-z], which gives me .1.2.3.4 as the result. I then strip the first dot using trim or a similar method.
Can any regex master tell me how to get rid of the first dot with a single regular expression?
How about this: \.(\d(?:\.\d)*)\.\D
EDIT:
(\d+(?:\.\d+)*)
Demo
If you want to keep your current regex, you can put a lookahead at the start and escape the literal dot that is outside the character class: (?=\d)[\d.]+\.[^.a-z]
The lookahead (?=\d) will make sure the first character matched is a digit.
Demo here
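The three suggestions above can be compared side by side in Python. This is only a sketch on the single sample string from the question; the variable names are illustrative.

```python
import re

text = "hello.1.2.3.4.world"

# First suggestion: the wanted part is in capture group 1.
via_group = re.search(r"\.(\d(?:\.\d)*)\.\D", text).group(1)

# Edited suggestion: allows more than one digit between dots.
via_digits = re.search(r"(\d+(?:\.\d+)*)", text).group(1)

# Lookahead variant of the asker's regex: the match must start on a digit.
via_lookahead = re.search(r"(?=\d)[\d.]+\.[^.a-z]", text).group()

print(via_group, via_digits, via_lookahead)
```

The first pattern only allows a single digit between dots, so the second one is the closer fit for "the number of digits between the dots doesn't matter".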

Regex matching zero or none and Or

How do I match a sequence of optional characters or a different character?
For example:
I started with matching the letters "KQkq"
these are in sequence but optional, so "K?Q?k?q?"
however the input is either one of those four letters or "-", so I tried "(K?Q?k?q?|-)"
this works for the letters, but won't match the "-"
If the letters weren't optional I'd use "(KQkq|-)", which works fine.
I've tried a number of different things, like putting the letters in a group "((K?Q?k?q?)|-)" but I can't quite find a way to express what I need.
*** Note: As I stated in the question: I'm matching the letters "KQkq" "in sequence but optional". Sequence means they come one after the other so "KQkq" is valid, "KkQq" is not valid, nor is "kqKQ" or "kkkk" or anything else that doesn't match the sequence KQkq. Optional means that a character may or may not exist. So "KQkq" is valid, as is "K" or "Kk" or "Qkq". Character classes, for those that don't know, will match any of the characters in the class with no sense of sequence. So [KQkq]{1,4} would indeed match "KQkq" and "Qkq" however it would also match "KKKK", "qkQK", "qqqq" none of which are valid.
^(?:(?:K?Q?k?q?)|-)$
Try this. See the demo:
https://regex101.com/r/gQ3kS4/2
Your regex is working fine, in order to capture the dash you just need to anchor the regex:
^(K?Q?k?q?|-)$
Without anchors, the first alternative K?Q?k?q? can match the empty string at any position, including right before the -, so the engine never needs to try the second alternative.
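A short Python sketch of the difference, using the valid and invalid examples from the question's note (the list names are illustrative):

```python
import re

unanchored = re.compile(r"(K?Q?k?q?|-)")
anchored = re.compile(r"^(K?Q?k?q?|-)$")

# Unanchored: the first alternative succeeds with an empty match before the dash.
unanchored_hit = unanchored.search("-").group()

valid = ["KQkq", "K", "Kk", "Qkq", "-"]
invalid = ["KkQq", "kqKQ", "kkkk"]

# Anchored: the whole input has to be consumed, so order and repetition are enforced.
accepted = [s for s in valid + invalid if anchored.match(s)]

# Caveat: since every letter is optional, the empty string is accepted too.
empty_accepted = anchored.match("") is not None

print(repr(unanchored_hit), accepted, empty_accepted)
```

If an empty input must be rejected, that needs an extra check, which the answers above do not cover.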
Have you tried doing ([KQkq]|-) or even ([KQkq]|[-])?
Example: Regex Example
Try using square brackets, like this: /[KQkq]|-/. A character class matches any single one of the characters between the brackets.
I think this will do what you need: (K?Q?k?q?|-)

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the regex, but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
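You need the $ inside the lookahead to make sure the string is at most 254 characters long. Otherwise, the lookahead succeeds even when there are more than 254 characters, because it only requires that 1 to 254 characters can be matched from the start.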
(?=.{1,254}$)
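The effect of the $ can be checked in isolation. In this Python sketch, \S+ is only a stand-in for the real address pattern, and the test strings are made up:

```python
import re

# \S+ is a placeholder body, not the asker's pattern.
without_end = re.compile(r"^(?=.{1,254})\S+$")
with_end = re.compile(r"^(?=.{1,254}$)\S+$")

ok = "a" * 254
too_long = "a" * 255

# Without $, the lookahead is satisfied by the first 254 characters of a longer string.
long_passes_without_end = without_end.match(too_long) is not None
# With $, the lookahead must reach the end of the string within 254 characters.
long_passes_with_end = with_end.match(too_long) is not None
ok_passes_with_end = with_end.match(ok) is not None

print(long_passes_without_end, long_passes_with_end, ok_passes_with_end)
```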
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that the dash must be first (or last) in the character class to be a literal dash, and the caret must not be first.
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
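For anyone who wants to try the simplified pattern outside Regex Hero, here is a Python sketch. The ATOM name and the sample inputs are assumptions for illustration, and the inputs use # as the separator because that is what the pattern as posted expects.

```python
import re

# Character class for one dot-free chunk of the local part, copied from the answer.
ATOM = r"[-\w!#$%&'*+/=`{|}~?^]+"

pattern = re.compile(
    r"^(?=.{1,254}$)"               # whole string must be 1 to 254 characters
    + ATOM + r"(\." + ATOM + r")*"  # chunks separated by single dots
    + r"#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"  # IP or domain name
)

name_ok = pattern.match("john.doe#example.com") is not None
ip_ok = pattern.match("user#192.168.0.1") is not None
leading_dot_ok = pattern.match(".john#example.com") is not None
too_long_ok = pattern.match("a" * 250 + "#example.com") is not None

print(name_ok, ip_ok, leading_dot_ok, too_long_ok)
```

Behavior may differ slightly between regex engines (for example in what \w covers), so treat this as a sanity check rather than a validation of the pattern.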
This depends on which language you are working in. In Python, for example, you can use a regex to split the text into separate strings and then use len() to discard any string that is 255 characters or longer.
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.