RegEx padding numbers surrounded by other characters - regex

I am looking for a RegEx that captures a series of digits surrounded by a string pattern and pads that series of digits with leading zeros up to 4 digits. At the same time all spaces should be removed from the entire string.
Some examples:
"F12b" should capture "12" and return "F0012b"
"AB 214/3" should capture "214" and return "AB0214/3"
"G0124" should capture "0124" and return the original string unchanged
The source string should adhere to the following rules:
- should start with [a-zA-Z]
- the above pattern can be followed by any number of optional spaces
- the numeric sequence can be followed by another string
- the numeric sequence can be any number of digits. Only if there are fewer than 4 digits is the sequence to be padded with leading zeros, otherwise it remains unchanged.
- I am only interested in the first occurrence within a string
I am posting this question here because I don't use RegEx often enough to figure this one out, but I know it's a perfect case for RegEx.
Any help is greatly appreciated, and an explanation of the expression would certainly help me understand it.

To match that and extract the info you want, regex is fine, you can use this:
^([a-zA-Z]+)\s*(\d+)(.*)
See it here on regexr. You see only that the space has been removed in your second example, but all needed information is captured in $1, $2 and $3
Regular expressions are a tool to match patterns. How that pattern is used within a replacement method, and how the replacement string can be built, is completely language dependent and has nothing to do with regex. Without knowing the language, this part cannot be answered.
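Since the replacement step depends on the language, here is a minimal sketch in Python (an assumption, as the question does not name a language). It reuses the three groups of the pattern above; the function name pad_number is made up for illustration, and str.zfill only adds zeros when the digit run is shorter than 4.

```python
import re

# Same pattern as in the answer: letters, optional spaces, digits, rest.
PATTERN = re.compile(r"^([a-zA-Z]+)\s*(\d+)(.*)")


def pad_number(text: str) -> str:
    """Pad the first digit run to 4 digits and drop all whitespace.

    Hypothetical helper: the name and the handling of non-matching
    input (whitespace is still removed) are assumptions.
    """
    match = PATTERN.match(text)
    if match is None:
        return re.sub(r"\s", "", text)
    prefix, digits, rest = match.groups()
    # zfill leaves runs of 4 or more digits untouched.
    return re.sub(r"\s", "", prefix + digits.zfill(4) + rest)


print(pad_number("F12b"))      # F0012b
print(pad_number("AB 214/3"))  # AB0214/3
print(pad_number("G0124"))     # G0124
```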

Related

Regex: grab the string that begins after a certain string and ends when it hits any other character

I am trying to use Regex to grab a substring of a large string.
The overall string has certain text, 'cow/', then any number of characters or spaces that are not digits. The first digit hit is the start of the desired substring I want.
This desired substring consists of only digits and periods, the first character or space seen that is not a digit or period indicates the end of the desired substring.
For example:
'cow/ a12.34 -123'
The desired substring is '12.34'.
So far I have this regex that partially works (I think the '| .' is not entirely correct):
(?<=([A-z]|[0-9])/\s*).?(?=\s[^0-9 |.])
Thanks in advance.
This should be easy to achieve by relying on capturing groups:
cow/[^0-9]*([0-9.]+)
The group will contain the text that you want to extract, in Java group(index), in C# with Groups[index]. Other languages provide similar features.
Don't try to solve everything inside the regular expression, but leverage the power of your runtime :)
Edit after comment on the OP:
Azure Kusto has the extract(regex, captureGroup, text [, typeLiteral]) function to extract groups from regular expression matches:
extract("cow/[^0-9]*([0-9.]+)", 1, "cow/ a12.34 -123") == "12.34";
The argument 1 tells Kusto to extract the first capturing group (the expression inside the parentheses).
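For comparison, the same capturing-group approach can be sketched in Python (chosen here only for illustration; the helper name extract_number is hypothetical):

```python
import re

# 'cow/', then any non-digits, then capture the run of digits and periods.
COW_NUMBER = re.compile(r"cow/[^0-9]*([0-9.]+)")


def extract_number(text):
    """Return the first digits-and-periods run after 'cow/', or None."""
    match = COW_NUMBER.search(text)
    return match.group(1) if match else None


print(extract_number("cow/ a12.34 -123"))  # 12.34
```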

Regex - Keep all digits with length of 10-13 digits

I am searching for a regex that keeps all numbers with a length of 10-13 digits and deletes the rest, in Notepad++.
My regex doesn't work:
[^\d{10,13}]
It finds numbers with commas too :(
Searching for
^(?:.*?(\d{10,13}).*|.*)$
and replacing with
\1
you keep just the 10 to 13 digit long numbers (and empty lines).
Remove the empty lines searching for
^\n
and replacing with nothing.
See it in action: RegEx101.
Addressing @WiktorStribiżew's comments: assuming the sought-after numbers are always surrounded by white space (this has been checked with the OP, though not for the potential case of lines that effectively hold just numbers), the search expression could be adjusted to
^(?:.*\s(\d{10,13})\s.*|.*)$
still replacing with
\1
to correctly handle strings of digits that contain commas: RegEx101
By the way:
[^\d{10,13}]
is a character class, which matches any single character that is not:
a digit, or
one of the characters listed literally in "{10,13}", i.e. a curly brace or a comma (the digits 1, 0 and 3 are already covered by \d).
Please comment if and as this requires adjustment / further detail.
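To see the character-class behaviour in practice, here is a small Python sketch (Python is an assumption; the asker uses Notepad++). The helper name chars_matched is made up. Each match is a single character, and digits, braces and commas are never matched.

```python
import re

# Inside [...], {10,13} is not a quantifier: the braces, digits and comma
# are just more members of the (negated) set.
NEGATED_CLASS = re.compile(r"[^\d{10,13}]")


def chars_matched(text):
    """Return every single character the negated class matches."""
    return NEGATED_CLASS.findall(text)


print(chars_matched("a1{2,3}b"))    # ['a', 'b']
print(chars_matched("1234567890"))  # []
```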
To match numbers that are not 10 to 13 digits long:
\b(\d{1,9}|\d{14,})\b
You can find all standalone numbers of 10 to 13 digits like this:
(?<!\d)\d{10,13}(?!\d)
What you do then is up to you.
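As an illustration of the lookaround version, a minimal Python sketch (the language and the helper name find_long_numbers are assumptions) collects only digit runs that are 10 to 13 digits long and not part of a longer run:

```python
import re

# (?<!\d) and (?!\d) make sure the run is not embedded in a longer number.
STANDALONE = re.compile(r"(?<!\d)\d{10,13}(?!\d)")


def find_long_numbers(text):
    """Return all digit runs whose full length is 10 to 13."""
    return STANDALONE.findall(text)


sample = "a 1234567890 b 123 c 12345678901234 d 1234567890123"
print(find_long_numbers(sample))  # ['1234567890', '1234567890123']
```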
I don't know how Notepad++ works, but I think this is the regex you are looking for: ^([0-9]){10,13}$
A good page to create/test regex: http://regexr.com/

Regex to capture Letters & Spaces

I have a spec that says a particular field will be alpha-text, right-padded with spaces to be 10 characters long, and I want to capture the alpha-part of the match.
This expression captures the entire section:
"([[:alpha:][:s:]]{10})"
However, I only want to capture the alpha-part, and still match (but not capture) the remaining white-space. So if the alpha-part is 3 characters long, the next match needs to be 7 white-spaces.
How can I do this?
I would say your best bet is to use 2 regular expressions. Regex doesn't really have support for what you're trying to do.
The first regular expression would get all strings of length 10 that are right-padded with spaces
([a-zA-Z\s]{10})
After that, just capture the word part. We know each string is only 10 characters at this point.
(\w+)\s*
This regex pattern will match a string starting with (optional) [A-Za-z] characters and ending with up to 10 spaces (the total length of 10 is not enforced yet).
"^([A-Za-z]+)?\\ {0,10}"
Then, I added a positive lookahead to ensure the pattern only matches when the string length is 10.
"^(?=.{10}$)([A-Za-z]+)?\\ {0,10}$"
Edit: Try this using [:alpha:] and [:space:] (POSIX classes only work inside a bracket expression, hence the double brackets)
"^(?=.{10}$)([[:alpha:]]+)?[[:space:]]{0,10}$"
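A minimal sketch of the lookahead approach in Python follows. Python is an assumption here, and since its re module has no POSIX classes such as [:alpha:], the [A-Za-z] form is used. The helper name alpha_part is hypothetical.

```python
import re

# Lookahead checks the total length; the group captures only the letters.
FIELD = re.compile(r"^(?=.{10}$)([A-Za-z]+)? {0,10}$")


def alpha_part(field):
    """Return the letters of a 10-character field, or None if it is invalid."""
    match = FIELD.match(field)
    if match is None:
        return None
    # The group is optional, so an all-space field yields an empty string.
    return match.group(1) or ""


print(alpha_part("abc       "))  # abc
print(alpha_part("abc  "))       # None
```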

Using regex to find arbitrary length consecutive blocks

I have a string containing ones and zeroes. I want to determine if there are substrings of 1 or more characters that are repeated at least 3 consecutive times. For example, the string '000' has a length 1 substring consisting of a single zero character that is repeated 3 times. The string '010010010011' actually has 3 such substrings that each are repeated 3 times ('010', '001', and '100').
Is there a regex expression that can find these repeating patterns without knowing either the specific pattern or the pattern's length? I don't care what the pattern is nor what its length is, only that the string contains a 3-peat pattern.
Here's something that might work. However, it will only tell you if there is a pattern repeated three times, and I don't think it can be extended to tell you if there are others:
/(.+).*?\1.*?\1/
Breaking that out:
(.+) matches any 1 or more characters, starting anywhere in the string
.*? allows any length of interposing other characters (0 or more)
\1 matches whatever was captured by the (.+) parentheses
.*? 0 or more of anything
\1 the original pattern, again
If you want the repetitions to occur immediately adjacent, then instead use
/(.+)\1\1/
… as suggested by @Buh Buh; the \1 vs. $1 notation may vary, depending on your regexp system.
(.+)\1\1
The \ might be a different character depending on your language choice. This means: match any string, then try to match it again twice more.
The \1 means repeat the 1st match.
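The adjacent-repetition pattern can be tried out with a short Python sketch (Python is an assumption; the helper name has_three_peat is made up). It reports whether any block, of any length, occurs three times in a row.

```python
import re

# (.+) captures a block; \1\1 requires two more adjacent copies of it.
THREE_PEAT = re.compile(r"(.+)\1\1")


def has_three_peat(bits):
    """Return True if some substring is repeated 3 times consecutively."""
    return THREE_PEAT.search(bits) is not None


print(has_three_peat("000"))           # True
print(has_three_peat("010010010011"))  # True
print(has_three_peat("0101"))          # False
```

The first match found in '010010010011' captures '010' in group 1, so the repeated block itself is available if it is needed later.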
It looks weird, but this could be the solution:
/000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111/
This lists every possible three-character pattern repeated three times. So your regular expression will match these numbers, for example:
10010010011
00010010011
10110110110
But not for these:
101010101010
001110111110
111000111000
And it doesn't matter where the sequence appears in the whole string.
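The alternation above does not have to be typed by hand. As a sketch in Python (an assumption, with the hypothetical helper build_fixed_length_regex), it can be generated for a given block length. Note that it only detects blocks of exactly that length, so '101010101010' ('10' repeated six times) is not reported, unlike with the backreference pattern (.+)\1\1.

```python
import re
from itertools import product


def build_fixed_length_regex(length=3, repeats=3):
    """Build an alternation of every 0/1 block of `length`, repeated `repeats` times."""
    blocks = ("".join(bits) * repeats for bits in product("01", repeat=length))
    return re.compile("|".join(blocks))


FIXED = build_fixed_length_regex()

print(bool(FIXED.search("10010010011")))   # True
print(bool(FIXED.search("101010101010")))  # False
```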

regex strings separated by commas validating length of each string

I have a set of strings separated by commas,
like: cat,dog,Elephant
What I want to validate is that each of the comma-separated strings
has a length from 3 to 6 (the strings can contain anything, like .&^*#$),
e.g. a9&,bbbb,cc,ddddddd
In the above, cc and ddddddd are invalid since they don't fall into
the length range of 3 to 6.
A valid input would be a9&,bbbb,ccc,a12$%,adsdff
I went through many questions that were posted on Stack Overflow
and got some ideas from them.
^[1-9]\d([,][1-9]\d){0,3}$ is a regex I got from a question posted on Stack Overflow.
It accepts digits alone, but I need alphanumerics.
I tried to change it, but it didn't work:
^1-9a-zA-z{0,3}$
Could you please help me out,
and explain what each symbol means so that I can learn more from
you people?
Thank you for posting answers to my previous questions too.
[^,] will accept everything BUT the comma that you are using as a separator. It isn't clear what your regex should give you: the substrings whose length is not 3-6, the substrings whose length IS 3-6, both mixed, both separated, or something else.
Try this:
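// requires: using System.Text.RegularExpressions;
Regex rx = new Regex("^(?:(?:([^,]{3,6})|(?:[^,]*))(?:,|$))*");
Match match = rx.Match("AA,BB&B,!CC,DDDDDD,EE");
// Group 1 keeps one Capture for every item in which it matched
foreach (Capture capture in match.Groups[1].Captures) {
    string oneCapture = capture.Value; // "BB&B", "!CC", "DDDDDD"
}
The captures will be only the strings that are 3 to 6 characters long.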
I believe what you want is the following;
^([^,]{3,6},)*[^,]{3,6}$
To break this down: the first ^ matches the beginning of a line. [^,]{3,6}, means 3 to 6 characters of anything but a comma, followed by a single comma. The ( )* enclosing that means repeat this 0 or more times. The last part, [^,]{3,6}$, says the string must end with 3 to 6 characters that aren't a comma.
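The breakdown above can be checked with a small Python sketch (Python is an assumption, as the question does not fix a language; the helper name is_valid_list is made up), using the sample inputs from the question:

```python
import re

# Zero or more "item," groups followed by one final item, each 3-6 non-commas.
LIST_PATTERN = re.compile(r"^([^,]{3,6},)*[^,]{3,6}$")


def is_valid_list(text):
    """Return True if every comma-separated item is 3 to 6 characters long."""
    return LIST_PATTERN.match(text) is not None


print(is_valid_list("a9&,bbbb,ccc,a12$%,adsdff"))  # True
print(is_valid_list("a9&,bbbb,cc,ddddddd"))        # False
```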
This should do the trick if the regex you mentioned already works fine for digits.
^.\d([,].\d){0,3}$
For reference I often use the MSDN reference, but it's kept a bit short to begin with; maybe someone else can provide a good tutorial.
There are some tools out there, like Expresso, which help test and develop regexes.
I think the following expression does what you want:
^(?:([^,]{3,6}),?)*$
The [^,]{3,6} part means "any character that is not a comma, 3 to 6 repetitions". That is the core of the expression. The parentheses make a group, which will allow you to retrieve the values that were captured by that group.
The ,? part means "a comma, zero or one times".
These parts are surrounded by a non-capturing group (?: ... ). That means that the contained expression is grouped, but you won't be able to retrieve the values that were captured by it. That group is necessary to apply the repetition character *, which means "repeat the previous group zero or more times".
The anchors ^ and $ mean "beginning of string" and "end of string". They prevent the expression from matching only part of a string. If you were searching for a pattern inside a larger string, you wouldn't want them.
You might want to try Expresso to learn more about regular expressions. The program has an analyzer that describes the various parts of the expression.
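One thing that may be worth checking with ^(?:([^,]{3,6}),?)*$ is the optional comma: two items can then follow each other without a separator, so a 7-character item can be read as a 4-character run plus a 3-character run. The Python sketch below (Python and the names LOOSE, STRICT and accepted_by are assumptions for illustration) compares it with a variant where the comma between items is mandatory.

```python
import re

# Pattern from the answer above: the comma after each item is optional.
LOOSE = re.compile(r"^(?:([^,]{3,6}),?)*$")
# Variant with a mandatory comma between items and none at the end.
STRICT = re.compile(r"^(?:[^,]{3,6},)*[^,]{3,6}$")


def accepted_by(pattern, text):
    """Return True if the compiled pattern matches the whole text."""
    return pattern.match(text) is not None


print(accepted_by(LOOSE, "ddddddd"))   # True: read as 'dddd' + 'ddd'
print(accepted_by(STRICT, "ddddddd"))  # False
```

Whether the looser behaviour matters depends on the input; for strict validation the variant with the mandatory comma is the safer choice.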