I am looking for a regex that does not allow only spaces (more than one), but does allow a single blank space.
I have something like .*\S.* or .*[^ ].*, but I want to allow a single space, just not a value consisting of more than one space only.
You can use
pattern="\S*(?:\s\S*)?"
A pattern attribute is implicitly anchored, so the pattern gets parsed as ^(?:\S*(?:\s\S*)?)$ and will match:
^ - start of string
(?: - start of a non-capturing group:
\S* - zero or more chars other than whitespace
(?:\s\S*)? - an optional sequence of a whitespace and zero or more non-whitespace chars
) - end of a non-capturing group
$ - end of string.
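To try the pattern outside a browser, here is a small sketch in Python. It assumes that `re.fullmatch` is a fair stand-in for the attribute's implicit `^(?:...)$` wrapping. The sample strings are made up, and Python's Unicode-aware `\s`/`\S` may differ from the browser's regex flavour in edge cases.

```python
import re

# Same pattern as in the pattern attribute; fullmatch anchors it at both ends,
# mirroring the implicit ^(?:...)$ wrapping done by the browser.
PATTERN = re.compile(r"\S*(?:\s\S*)?")

def is_allowed(value: str) -> bool:
    """Return True if value contains at most one whitespace character."""
    return PATTERN.fullmatch(value) is not None

for sample in ["", " ", "  ", "abc", "a b", "a b c"]:
    print(repr(sample), is_allowed(sample))
```

Note that the pattern accepts at most one whitespace character anywhere in the value, so "a b c" is rejected along with "  ".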
I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*): This pattern matches the first occurrence of - and then the rest of the line. It matches too much for the second example, as the .* also matches the second occurrence of -.
(?<= - )(.*): This pattern matches the same as the first one, but without the -, as the lookbehind only asserts that it occurs directly to the left.
(?! - )(.+)| - (.+): This pattern uses a negative lookahead that asserts that what is directly to the right is not " - ". As none of the examples start with " - ", the whole line is matched directly after the negative lookahead because of .+, and the second alternative after | is never evaluated.
If the first group of letters can be of varying length, you could make the match specific by matching 1 or more uppercase characters [A-Z]+, or 1+ word characters \w+.
For a broader match, you could match 1 or more non-whitespace characters using \S+:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non-capturing group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close the non-capturing group and repeat it 0+ times to match more "words" separated by spaces
Edit
To match an optional hyphen and a trailing single character, you could add an optional non-capturing group (?:-\h\S\h*)? followed by $ to assert the end of the string, if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
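Python's built-in `re` module supports neither `\h` nor `\K`, so the following sketch ports both patterns from this answer under two assumptions. `\h` is replaced with `[ \t]`, which is an approximation, since `\h` also covers other horizontal whitespace such as non-breaking spaces. `\K` is replaced with capture group 1 around the part that would otherwise be kept. The samples are the three formats from the question.

```python
import re

# Ported from the PCRE patterns above: \h -> [ \t], and the text after \K
# is wrapped in capture group 1 instead.
BASIC = re.compile(r"^(?:\S+[ \t]-[ \t])?(\S+(?:[ \t](?!-[ \t])\S+)*)")
WITH_SUFFIX = re.compile(
    r"^(?:\S+[ \t]-[ \t])?(\S+(?:[ \t](?!-[ \t])\S+)*[ \t]*(?:-[ \t]\S[ \t]*)?)$"
)

samples = ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]
for s in samples:
    # Left column: first pattern; right column: pattern with optional "- D".
    print(BASIC.match(s).group(1), "|", WITH_SUFFIX.match(s).group(1))
```

Only the second pattern keeps the trailing " - D"; the first stops at the second " - " because of the negative lookahead.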
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to the first space-hyphen-space ( - )
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - a space-hyphen-space, or 0+ spaces up to the end of the string, must immediately follow on the right.
Note that \h matches any horizontal whitespace chars.
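As a rough check of the \h variant, here is a Python sketch. It makes the same assumptions as any port to the standard `re` module: `\h` becomes `[ \t]` (approximate), and the part after `\K` becomes capture group 1.

```python
import re

# Port of ^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$): \h -> [ \t], \K -> group 1.
PATTERN = re.compile(r"^(?:.*?[ \t]-[ \t])?(.*?)(?=[ \t]-[ \t]|[ \t]*$)")

for s in ["AAA - Something Here", "AAA - Something Here - D", "Something Here"]:
    print(PATTERN.match(s).group(1))
```

Because the lazy .*? stops before the next " - ", this version returns "Something Here" for all three lines and leaves out a trailing " - D".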
^(?:[A-Z]+ - \K)?.*\S
Since "Something Here" can be anything, there's no reason to describe the possible trailing letter separately in the pattern; you don't need anything more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why it ends with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
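In Python's standard `re`, which has no `\K`, the same idea can be sketched by capturing everything after the optional prefix in group 1. The last sample, with trailing spaces, is an illustrative input that is not taken from the question.

```python
import re

# Port of ^(?:[A-Z]+ - \K)?.*\S: the part after \K becomes group 1.
PATTERN = re.compile(r"^(?:[A-Z]+ - )?(.*\S)")

samples = [
    "AAA - Something Here",
    "AAA - Something Here - D",
    "Something Here",
    "AAA - Something Here   ",  # trailing spaces are dropped by the final \S
]
for s in samples:
    print(repr(PATTERN.match(s).group(1)))
```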
I cannot figure out how to combine two regexes. I have these requirements:
Letters and space ^[\p{L} ]+$
Cannot be whitespace ^[^\s]+$
How can I write one regex that combines both? Or is there perhaps some other solution?
You may use
^(?! +$)[\p{L} ]+$
^(?!\s+$)[\p{L}\s]+$
^\s*\p{L}[\p{L}\s]*$
Details
^ - start of string
(?!\s+$) - a negative lookahead that fails the match if the string consists of 1 or more whitespaces only, up to its end
[\p{L}\s]+ - 1+ letters or whitespaces
$ - end of string.
The ^\s*\p{L}[\p{L}\s]*$ is a regex that matches any 0+ whitespaces at the start of the string, then requires a letter that it consumes, and then any 0+ letters/whitespaces may follow.
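Python's standard `re` has no `\p{L}`, so this sketch uses `[^\W\d_]` ("word characters minus digits and underscore") as an approximation of "any Unicode letter". The approximation is an assumption and can admit a few non-letter characters. The sketch checks that the lookahead version and the letter-consuming version agree on some made-up inputs.

```python
import re

# Approximation of \p{L}: word characters that are neither digits nor "_".
LETTER = r"[^\W\d_]"

# ^(?!\s+$)[\p{L}\s]+$ : the lookahead rejects whitespace-only input.
LOOKAHEAD = re.compile(rf"(?!\s+\Z)(?:{LETTER}|\s)+")
# ^\s*\p{L}[\p{L}\s]*$ : at least one letter must be consumed.
CONSUMING = re.compile(rf"\s*{LETTER}(?:{LETTER}|\s)*")

def is_valid(text: str) -> bool:
    return LOOKAHEAD.fullmatch(text) is not None

for sample in ["John Smith", "   ", "", "Émile Zola", "abc1", " a "]:
    print(repr(sample), is_valid(sample), CONSUMING.fullmatch(sample) is not None)
```

`fullmatch` and `\Z` are used instead of `$` because Python's `$` also matches just before a trailing newline.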
I want to parse a nested structure like this one in MATLAB:
structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
block NAME_PART_3
subblock NAME_PART_4
Some content++
end NAME_PART_4
end NAME_PART_3
end NAME_PART_1
structure
NAME_PART_5
end NAME_PART_5
First, I would like to extract the content of each structure. It's quite easy, because a structure's content is always between "structure NAME" and "end NAME".
So I would like to use a regex, but I don't know in advance what the structure name will be.
So I wrote my regex like this:
\bstructure\s+([\w.-]*)((?:\s|.)*)\bend\b\s+XXXX
But I don't know what I should replace "XXXX" with in order to "reference" the content of the first capturing group of this regex. Is that even possible?
Try this Regex:
structure\s+([\w.-]+)\s*((?:(?!end\s+\1)[\s\S])*)end\s+\1
Explanation:
structure - matches structure
\s+ - matches 1+ occurrences of a white-space
([\w.-]+) - matches 1+ occurrences of either a word character or a . or a -. This sub-match which contains the structure name is captured in Group 1.
\s* - matches 0+ occurrences of a white-space
((?:(?!end\s+\1)[\s\S])*) - Tempered Greedy Token - matches 0+ occurrences of any character [\s\S] at positions where the sequence end, 1+ white-spaces and the Group 1 contents \1 (i.e., the structure name) does not start. This sub-match, which contains the contents of the structure, is captured in Group 2.
end\s+\1 - matches the word end followed by 1+ white-spaces followed by the structure name contained in Group 1 (\1).
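The backreference works the same way in Python's `re`, so the pattern can be tried unchanged. The sketch below runs it over the question's sample data, reproduced without indentation. It is only an illustration in Python, not a MATLAB `regexp` call.

```python
import re

# Group 1 captures the structure name; \1 then requires "end <same name>".
# (?:(?!end\s+\1)[\s\S])* is the tempered greedy token: it consumes any
# character as long as "end <name>" does not start at that position.
PATTERN = re.compile(r"structure\s+([\w.-]+)\s*((?:(?!end\s+\1)[\s\S])*)end\s+\1")

text = """structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
block NAME_PART_3
subblock NAME_PART_4
Some content++
end NAME_PART_4
end NAME_PART_3
end NAME_PART_1
structure
NAME_PART_5
end NAME_PART_5"""

# With two groups, findall returns (name, content) tuples.
for name, content in PATTERN.findall(text):
    print(name, repr(content))
```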
Apart from using a backreference \1 to refer to what was captured, you might replace the alternation in the capturing group ((?:\s|.)*) with a newline followed by 0+ characters, repeated and captured: ((?:\n.*)+)
You might also omit the word boundary in end\b\s+, since 1+ whitespace characters follow end anyway, and instead add a word boundary at the end so that \1 does not match as part of a longer name.
\bstructure\s+([\w.-]+)((?:\n.*)+)\bend\s+\1\b
Explanation
\bstructure\s+ Match structure followed by 1+ whitespace chars
([\w.-]+) Capture group 1: match 1+ times any of the listed chars (word chars, . or -)
( Capturing group
(?:\n.*)+ Match newline followed by 0+ times any char except a newline
) Close capturing group
\bend Match end
\s+\1\b Match 1+ times a whitespace char followed by a backreference to group 1 and end with a word boundary.
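The following Python sketch shows what the trailing \b buys. The `re` module supports `\b`, `\s` and `\1`, so the pattern is used as-is. The first input is a trimmed version of the question's data. The second input is hypothetical: its name "A" is only a prefix of the word after "end".

```python
import re

WITH_B = re.compile(r"\bstructure\s+([\w.-]+)((?:\n.*)+)\bend\s+\1\b")
WITHOUT_B = re.compile(r"\bstructure\s+([\w.-]+)((?:\n.*)+)\bend\s+\1")

text = """structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
end NAME_PART_1"""

m = WITH_B.search(text)
print(m.group(1), repr(m.group(2)))

tricky = "structure A\nx\nend AB"
print(WITHOUT_B.search(tricky) is not None)  # "end A" matches inside "end AB"
print(WITH_B.search(tricky) is not None)     # the trailing \b rejects it
```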
I have the following regex:
^(\s)*[+-]?\d+$
It fails if the input contains multiple whitespaces before the first non-whitespace character.
Currently it works on the following examples:
- :false
-1 :true
+1 :true
What I want is the same logic if there are 0, 1 or more whitespaces at the beginning:
: true (empty input string)
: true (one or more spaces)
-: false
-1: true
+1: true
235: true
Here I'm matching numbers, but in a more general scheme I would like the same behaviour for decimals, some special words, etc.
So basically, I want my regex to match when there is any number of whitespaces at the beginning (or an empty string), followed by something I want to match (a number, an email, special words...).
You need to make the whole pattern optional with an optional grouping construct and put the \s* before the group:
^\s*(?:[+-]?\d+)?$
Details:
^ - start of a string
\s* - 0+ whitespaces
(?: - start of a non-capturing group (if the engine does not support non-capturing groups, remove ?:) matching....
[+-]? - an optional (1 or 0 occurrences) + or - symbols
\d+ - 1+ digits
)? - .... 1 or 0 times
$ - end of string.
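Here is a quick Python check of the answer's pattern against the question's examples. The extra "   -1" sample is made up. `fullmatch` is used instead of `^...$` because Python's `$` also matches just before a trailing newline.

```python
import re

# Everything after \s* is optional, so empty and whitespace-only input pass.
PATTERN = re.compile(r"\s*(?:[+-]?\d+)?")

def accepts(text: str) -> bool:
    return PATTERN.fullmatch(text) is not None

for sample in ["", "   ", "-", "-1", "+1", "235", "   -1"]:
    print(repr(sample), accepts(sample))
```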
I think you want the asterisk applied directly to the \s:
^\s*[-+]?\d+$
I have a source string that looks like this: mID00231mID00008mID00231mID00054mID00013mID00008mID00065
The pattern I am trying to create, using this example, is: For the last occurrence of "mID00231" in the string, one or more occurrences of each of {mID00054, mID00013, mID00008, mID00065} must follow it (in any order).
Examples of matches:
mID00231mID00008mID00231mID00054mID00013mID00008mID00065
mID00231mID00013mID00054mID00008mID00065mID00008
Example of no match because of missing "mID00065":
mID00231mID00054mID00013mID00008
Example of no match because the last occurrence of "mID00231" is not followed by a "mID00054" and a "mID00008":
mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065
I am fairly new to regex but usually arrive at something that works. This one has been very difficult. I tried this:
(?:mID00231)(?:(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
It works if there is only one occurrence of the first element (mID00231). If the element repeats, the pattern fails. Any help is appreciated.
You need to fail the match with a negative lookahead if the same value occurs again later in the string:
mID00231((?!.*mID00231)(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)
Details:
mID00231 - match a literal mID00231 text
( - start of the capturing group
(?!.*mID00231) - there cannot be mID00231 anywhere after 0+ any chars but a newline
(?=.*mID00054) - there must be mID00054 anywhere after 0+ any chars but a newline
(?=.*mID00013) - there must be mID00013 anywhere after 0+ any chars but a newline
(?=.*mID00008) - there must be mID00008 anywhere after 0+ any chars but a newline
(?=.*mID00065) - there must be mID00065 anywhere after 0+ any chars but a newline
.* - 0+ any chars but a newline
) - end of the capturing group.
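The pattern runs unchanged in Python's `re`. This sketch checks it against the four example strings from the question with `re.search`, which tries each occurrence of mID00231 until the lookaheads succeed.

```python
import re

PATTERN = re.compile(
    r"mID00231((?!.*mID00231)(?=.*mID00054)(?=.*mID00013)(?=.*mID00008)(?=.*mID00065).*)"
)

should_match = [
    "mID00231mID00008mID00231mID00054mID00013mID00008mID00065",
    "mID00231mID00013mID00054mID00008mID00065mID00008",
]
should_not_match = [
    "mID00231mID00054mID00013mID00008",
    "mID00231mID00013mID00065mID00054mID00008mID00231mID00013mID00065",
]

for s in should_match + should_not_match:
    m = PATTERN.search(s)
    # Group 1 holds everything after the last mID00231 when the match succeeds.
    print(s, "->", m.group(1) if m else None)
```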