Regex for a string that contains anything but consecutive spaces

I think it's easiest to start with an example line:
3432543 name that % might 7 include pretty . much 433 anything 545231 4522
I'm looking for an expression that matches the "name that % might 7 include pretty . much 433 anything" portion, so that it includes everything up to the point where it encounters two spaces, and does not match if it would have to include two or more consecutive spaces.
So, for example, I want this pattern
^\d+ +(pattern) +\d+$
to fail to match, rather than end up including 545231 as part of the name.
Please keep in mind that this is just a simple example to illustrate the problem; it will be included in much more complex expressions matching more complex strings.

You can use this regex to match a substring whose words are separated by single spaces only:
\S+(?:\s\S+)*
The match starts with one or more non-whitespace characters, followed by zero or more such runs, each preceded by exactly one whitespace character.
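A quick way to see what \S+(?:\s\S+)* does on its own is to run it through Python's re.findall. This is only a sketch: the test line is a hypothetical one with two spaces typed explicitly after "anything", since the spacing of the example line above may not have survived. Each match stops where a second consecutive whitespace character appears.

```python
import re

# One or more non-whitespace characters, then any number of
# "exactly one whitespace character + non-whitespace run" pairs.
SINGLE_SPACED = re.compile(r"\S+(?:\s\S+)*")

# Hypothetical line: note the two spaces after "anything".
line = "3432543 name that % might 7 include pretty . much 433 anything  545231 4522"

for run in SINGLE_SPACED.findall(line):
    print(repr(run))
```

This prints two runs: everything up to "anything", and then "545231 4522".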
For your full pattern, use:
^\d+\s+\S+(?:\s\S+)*\s+\d+$
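The anchored version can be checked the same way. The sketch below (Python re, hypothetical test strings with two spaces before the trailing number) also runs a naive (.+) capture for contrast: the naive group swallows 545231 into the name, while the single-space version simply fails to match, which is the behaviour the question asks for.

```python
import re

ANCHORED = re.compile(r"^\d+\s+(\S+(?:\s\S+)*)\s+\d+$")
NAIVE = re.compile(r"^\d+ +(.+) +\d+$")

# Hypothetical inputs: two spaces before the trailing number(s).
good = "3432543 name that % might 7 include pretty . much 433 anything  545231"
bad = "3432543 name that % might 7 include pretty . much 433 anything  545231 4522"

m = ANCHORED.match(good)
print(m.group(1) if m else None)  # the name, without 545231
print(ANCHORED.match(bad))        # None: the name would have to span two spaces
print(NAIVE.match(bad).group(1))  # greedy .+ pulls in "  545231"
```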


How to allow spaces in between words?

EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works; however, it only allows spaces at the end of the word.
For example, it matches:
(stuff )
but not:
(st uff)
My original regex:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried adding \s, but that just broke the whole thing; nothing would match. I saw a post here that said to enclose the whole thing in ^[]*$ with a space in there. That only made the regex match everything.
This is for Google Forms validation, if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parentheses?)
To match text like (st uff) or (st uff some more), write your regex like this:
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - literal opening parenthesis
\w{1,12} - a word of length 1 to 12, as you wanted
(?:\s+\w{1,12})* - one or more whitespace characters followed by another word of length 1 to 12, with this whole group repeated zero or more times
\) - literal closing parenthesis
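A small Python sketch for checking this pattern against a few hypothetical inputs. It uses re.fullmatch so that the whole input has to match; how the form validator itself anchors the pattern is not covered here. Note that the {1,12} limit applies to each word separately, not to the total.

```python
import re

# A 1-12 character word, then zero or more (whitespace + 1-12 character word) pairs.
PAREN_WORDS = re.compile(r"\(\w{1,12}(?:\s+\w{1,12})*\)")

for text in ["(stuff)", "(st uff)", "(st uff some more)", "(stuff )", "()", "(abcdefghijklm)"]:
    # fullmatch: the whole string must match, not just a part of it
    print(f"{text!r}: {bool(PAREN_WORDS.fullmatch(text))}")
```

The first three print True and the rest False; (abcdefghijklm) fails because its single word has 13 characters.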
If you also want to allow optional spaces just after the opening parenthesis and just before the closing parenthesis, add \s* in those two places:
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
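The padded variant can be checked the same way; again a Python sketch with hypothetical inputs and re.fullmatch standing in for whole-string validation.

```python
import re

# Same pattern, with \s* allowed just inside each parenthesis.
PADDED = re.compile(r"\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)")

for text in ["( stuff )", "(stuff )", "( st uff)", "( )"]:
    print(f"{text!r}: {bool(PADDED.fullmatch(text))}")
```

Only "( )" is rejected, because at least one word is still required.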
If you are trying to match up to 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you only want specific characters, such as word characters and spaces, put those in the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want up to 12 word characters with any amount of whitespace anywhere in between:
\(\s*(?:\w\s*){1,12}\)
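To compare the three "12 characters" variants side by side, here is a Python sketch run over a few hypothetical samples (the dictionary keys are just illustrative labels):

```python
import re

PATTERNS = {
    # any character except ")", 1 to 12 of them
    "any_but_paren": re.compile(r"\([^\)]{1,12}\)"),
    # only word characters and literal spaces, 1 to 12 of them
    "word_or_space": re.compile(r"\([\w ]{1,12}\)"),
    # 1 to 12 word characters, whitespace anywhere and not counted
    "12_word_chars": re.compile(r"\(\s*(?:\w\s*){1,12}\)"),
}

SAMPLES = ["(st uff)", "(a-b)", "(abcdef ghijkl)"]

for name, pattern in PATTERNS.items():
    results = {s: bool(pattern.fullmatch(s)) for s in SAMPLES}
    print(name, results)
```

"(abcdef ghijkl)" has 13 characters inside the parentheses, so the first two patterns reject it, while the last one accepts it because only its 12 word characters count toward the limit.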

Regex: How to match everything that is not a trailing number?

I'm not sure if I may ask questions like this here, but I'll try.
I have multiple files. The file name has the following pattern:
Lorem_Ipsum1054.html
The Lorem_Ipsum part isn't fixed in length. Using Better Rename 10, I want to rename the file to 1054.html.
In other words, I need to match everything except the trailing number, which may vary in length, so that Better Rename 10 can replace the match with nothing.
Who can help me?
To match everything up to the last digit sequence, use
.*\D(?=\d)
The .* consumes anything (including digits, spaces, etc.) up to a non-digit that is followed by a digit (\D(?=\d)). Because .* is greedy, the match extends to the last non-digit that comes right before a digit.
Demo: https://regex101.com/r/uOoqXX/1
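The same replacement can be sketched with Python's re.sub. Better Rename 10 has its own regex engine, so treat this only as a way to check the pattern's logic; strip_prefix is a hypothetical helper name.

```python
import re

# Greedy .* runs to the LAST non-digit that is directly followed by a digit.
PREFIX = re.compile(r".*\D(?=\d)")

def strip_prefix(filename):
    """Hypothetical helper: drop everything before the last run of digits."""
    return PREFIX.sub("", filename, count=1)

print(strip_prefix("Lorem_Ipsum1054.html"))    # 1054.html
print(strip_prefix("Lorem 2 Ipsum1054.html"))  # 1054.html
print(strip_prefix("NoDigits.html"))           # unchanged: no digit to stop at
```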
Try ^(\w*?)(?=\d+\.html) and replace the match with an empty string.
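A Python sketch of this pattern on two hypothetical file names. Because \w does not match a space, a name containing a space is left unchanged.

```python
import re

# Lazy \w*? grows only until the digits-plus-".html" lookahead succeeds.
HTML_PREFIX = re.compile(r"^(\w*?)(?=\d+\.html)")

print(HTML_PREFIX.sub("", "Lorem_Ipsum1054.html"))  # 1054.html
print(HTML_PREFIX.sub("", "Lorem Ipsum1054.html"))  # unchanged: \w stops at the space
```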
Since the extension is excluded by default, and the file name may contain digits and spaces, use the following regex.
Regex: ^([\w ]*?)(?=\d+$)
Explanation:
^([\w ]*?) matches as few word characters and spaces as possible before the trailing digits, which the lookahead (?=\d+$) checks for but does not include in the match.
Replace with empty string ''.
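A Python check of this last pattern, assuming (as the answer says) that it is applied to the name without its extension. The inputs are hypothetical; the third one keeps its extension to show that the $ in the lookahead then prevents any match.

```python
import re

# Assumes the extension has already been removed from the name.
TRAILING_DIGITS_PREFIX = re.compile(r"^([\w ]*?)(?=\d+$)")

for stem in ["Lorem_Ipsum1054", "Lorem Ipsum 2 1054", "Lorem_Ipsum1054.html"]:
    print(repr(TRAILING_DIGITS_PREFIX.sub("", stem)))
```

This prints '1054', '1054' and the unchanged 'Lorem_Ipsum1054.html'.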

Regex a decimal number with comma

I'm having trouble finding the right regex for decimal numbers that include a comma as the thousands separator.
I did find a few other questions about this issue in general, but none of the answers really worked when I tested them.
The best I got so far is:
[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?
Two main problems so far:
1) It matches numbers with a space between them, such as "3001 1", as a single match instead of splitting them into two matches, "3001" and "1". I don't see where I allowed a space in the regex.
2) I have a general problem with the beginning and end of the regex.
The regex should match:
3,001
1
32,012,111.2131
But not:
32,012,11.2131
1132,012,111.2131
32,0112,111.2131
32131
In addition, I'd like it to match:
1. (without any digit after it)
1, (without any digit after it)
as 1 (a comma or period at the end of the number should be ignored).
Many Thanks!
Here is a long and convoluted regular expression that meets all your requirements. It works if your regex engine is based on PCRE (hopefully you're using PHP, Delphi, or R).
(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))
The things that make it so long:
Matching multiple numbers on the same line separated by only one character (a space), while not allowing partial matches, requires a lookahead and a lookbehind.
Matching numbers that end in . or , without including the . or , in the match requires another lookahead.
Explanation of (?=[,.](\s|$)):
While writing this explanation, I realised the \s needed to be (\s|$) to match the 1 in "1," at the very end of a string.
This part of the regex matches the 1 in "1," or the 1,000 in "1,000.". Say our number is "1,000." (with the period at the end).
Up to this point, the regex has matched 1,000. It can't find another , to repeat the thousands group, so it moves on to (?=[,.](\s|$)).
(?=...) is a lookahead: from the position matched so far, it looks at what comes next without adding it to the match.
So it checks whether there is a , or a ., and if there is, whether that character is immediately followed by whitespace or the end of input. In this case it is, so the match is left as just 1,000.
Had the lookahead not matched, it would have moved on to trying to match decimal places.
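For readers on Python rather than PCRE, here is a sketch of the same regex. Python's re requires fixed-width lookbehinds, so this adaptation swaps (?<=[^\d,.]|^) for the equivalent negative lookbehind (?<![\d,.]) and makes the groups non-capturing so that findall returns whole matches. The test strings are the ones from the question.

```python
import re

NUMBER = re.compile(
    r"(?<![\d,.])"               # not preceded by a digit, comma or period
    r"\d{1,3}(?:,\d{3})*"        # 1-3 digits, then any number of ,ddd groups
    r"(?:"
    r"(?=[,.](?:\s|$))"          # trailing , or . before space/end: leave it out
    r"|(?:\.\d+)?(?=[^\d,.]|$)"  # otherwise optional decimals, then a boundary
    r")"
)

SAMPLES = ["3,001 1 32,012,111.2131", "1.", "1,", "1,000.",
           "32,012,11.2131", "1132,012,111.2131", "32,0112,111.2131", "32131"]

for text in SAMPLES:
    print(repr(text), NUMBER.findall(text))
```

The first line yields ['3,001', '1', '32,012,111.2131'], "1." and "1," yield ['1'], "1,000." yields ['1,000'], and the last four yield no matches.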
This works for all the cases you listed:
^[0-9]{1,3}(,[0-9]{3})*([.,][0-9]*)?$
In your own regex, . means "any character", including a space, which is how your (.[0-9]+)? part can pull " 1" into a match. To match a literal period, escape it as \. instead.
As far as I know, that's the only thing missing.
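A Python sketch of both points, with the question's groups made non-capturing so findall returns whole matches. The first half shows the unescaped . matching a space; the second half checks the anchored pattern, written with [.,] as the character class, against the question's lists. Being anchored, it validates one whole string at a time rather than finding several numbers on a line, and for "1." the match includes the trailing period.

```python
import re

ASKER = re.compile(r"[0-9]{1,3}(?:,[0-9]{3})*(?:.[0-9]+)?")    # unescaped .
ESCAPED = re.compile(r"[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?")  # literal .

print(ASKER.findall("300 1"))    # ['300 1']: the . matched the space
print(ESCAPED.findall("300 1"))  # ['300', '1']

ANCHORED = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*([.,][0-9]*)?$")

SHOULD_MATCH = ["3,001", "1", "32,012,111.2131", "1.", "1,"]
SHOULD_NOT = ["32,012,11.2131", "1132,012,111.2131", "32,0112,111.2131", "32131"]

for s in SHOULD_MATCH + SHOULD_NOT:
    print(s, bool(ANCHORED.fullmatch(s)))
```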

Regex possible Whitespaces and trailing characters

I have text similar to the following (the whitespace is intentional), on which I run a regex line by line:
Smith-Petersen X1l
Jonas Henry
Foord. 82a 221.
12345 Somewhere
I want the regex to capture, in the first match group, everything before three or more consecutive whitespace characters (which may or may not occur). The allowed characters are:
[a-zA-Z0-9,. '\-AÖÜäöüß]
What I want is: Smith-Petersen, Jonas Henry, Foord. 82a and 12345 Somewhere.
I've tried desperately and hope to find help here. I just can't get it to work, because my expression grabs the blanks and whatever follows them and puts them into the first group as well. Is there a way to reverse how the regex works? Can anyone help me with this?
Assuming by "may or may not occur" you mean the line may end before 3 spaces are encountered:
^\s*([-a-zA-Z0-9,\.'AÖÜäöüß ]+?)(?=\s{3}|\s{0,2}$)
This regex uses a positive lookahead to assert that what follows is either three whitespace characters, or up to two whitespace characters and then the end of input.
Anchoring to the start of input avoids matching the junk at the end of the longer lines.
Your target is in group 1.
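A Python sketch that runs this pattern over hypothetical lines. The runs of spaces in the sample text above may not have survived, so the test lines spell out four spaces before the trailing junk and two trailing spaces on the last line.

```python
import re

# Lazy name group, stopped by 3 whitespace chars or by up to 2 + end of line.
NAME = re.compile(r"^\s*([-a-zA-Z0-9,\.'AÖÜäöüß ]+?)(?=\s{3}|\s{0,2}$)")

lines = [
    "Smith-Petersen    X1l",
    "Jonas Henry",
    "Foord. 82a    221.",
    "12345 Somewhere  ",
]

for line in lines:
    m = NAME.match(line)
    print(m.group(1) if m else None)
```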
Here is my approach.
^ *([a-zA-Z0-9,.'AÖÜäöüß-]+(?: {1,2}[a-zA-Z0-9,.'AÖÜäöüß-]+)*)
What you want is in match group 1. This regex uses only greedy operators and works on all four cases found in your sample text.
Basically, it matches all words at the beginning of a line that are separated from one another by no more than two spaces. As soon as more than two spaces are found, the match ends.
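The greedy version can be checked on the same hypothetical lines (four spaces before the junk, two trailing spaces on the last line) with a short Python sketch:

```python
import re

# Words from the allowed set, joined by runs of at most two spaces.
WORDS = re.compile(r"^ *([a-zA-Z0-9,.'AÖÜäöüß-]+(?: {1,2}[a-zA-Z0-9,.'AÖÜäöüß-]+)*)")

lines = [
    "Smith-Petersen    X1l",
    "Jonas Henry",
    "Foord. 82a    221.",
    "12345 Somewhere  ",
]

for line in lines:
    m = WORDS.match(line)
    print(m.group(1) if m else None)
```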

regex for weird string

I need help writing a regex for the string below. I have tried lots of patterns, but all failed.
I have a string like:
package1[module11,module12,module13],package2[module21,module22,module23,module24,module25],package3[module31]
and I want to split it like this:
package1
module11,module12,module13
package2
module21,module22,module23,module24,module25
package3
module31
I know it is weird to ask for a regex here, but ...
You can match using the pattern:
(\w+)\[(\w+(?:,\w+)*)\]
Example: http://www.rubular.com/r/rPUEWBoU1d
The pattern is pretty simple, really:
(\w+) - capture the first word (package1)
\[ - literal opening bracket
(\w+(?:,\w+)*) - capture a sequence of at least one word (module11), followed by comma-separated words (assuming they are well formed)
\] - literal closing bracket
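Here is a Python sketch of the split using re.findall, which returns one (package, modules) tuple per match because the pattern has two groups; split_packages is a hypothetical helper name.

```python
import re

PACKAGE = re.compile(r"(\w+)\[(\w+(?:,\w+)*)\]")

def split_packages(s):
    """Hypothetical helper: return (package, comma-separated modules) pairs."""
    return PACKAGE.findall(s)

s = ("package1[module11,module12,module13],"
     "package2[module21,module22,module23,module24,module25],"
     "package3[module31]")

for package, modules in split_packages(s):
    print(package)
    print(modules)
```

The printed lines match the desired split in the question.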
In any case, you may want to change \w to your own alphabet (maybe even [^,\[\]], i.e. anything except a comma or a bracket). You may also want to check that the whole string matches, because the pattern above can skip over unwanted parts (for example: a[b]$$$$c[d]).
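One way to do that whole-string check, sketched in Python: the hypothetical WHOLE pattern repeats the entry pattern with commas between entries (this construction is not from the answer), and parse only extracts the pairs when the whole string fits.

```python
import re

ENTRY = r"\w+\[\w+(?:,\w+)*\]"
# Whole string: one entry, then any number of ",entry".
WHOLE = re.compile(rf"{ENTRY}(?:,{ENTRY})*")
PACKAGE = re.compile(r"(\w+)\[(\w+(?:,\w+)*)\]")

def parse(s):
    """Return (package, modules) pairs, or None if s is not well formed."""
    if not WHOLE.fullmatch(s):
        return None
    return PACKAGE.findall(s)

print(PACKAGE.findall("a[b]$$$$c[d]"))  # [('a', 'b'), ('c', 'd')]: junk skipped
print(parse("a[b]$$$$c[d]"))           # None: rejected as a whole
```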