I have a regex like "^[a-zA-Z]:(\\\\+[^\\/:*?"<>|]+)*([\\\\]+)?$" which is responsible for file path validation.
It successfully validates paths like C:\Users\data and C:\\Users\\data
I want the string which comes after "C:\" to not start with space and not have (^\\/:*?"<>|) characters in it.
You could match the start of the string up to the colon and then use your negated character class so that the character right after the backslashes is not one of the unwanted ones. You could add a space or \s to that character class to exclude whitespace as well.
You might also use a capturing group and a backreference so that the same backslash variant (\\ or \) is used throughout the path.
After that you could use a repeating pattern and specify which characters to allow for the rest of the string.
^[a-zA-Z]:(\\+)(?:[^\\/:*?"<>|\s][\w&]+(?: [\w&]+)*(?:\1[a-zA-Z&]+)*)?$
Regex demo
That will match:
^ Start of the string
[a-zA-Z]: Match a single char a-zA-Z and a colon
(\\+) Capture 1+ backslashes in group 1 so they can be referenced later
(?: Non capturing group
[^\\/:*?"<>|\s] Negated character class, match a single char that is not one of the listed (Added \s but you could also just use a space)
[\w&]+(?: [\w&]+)* Match 1+ times a word char and repeat 0+ times matching a space and 1+ times a word char. Note that you can extend the character class to match what you want.
(?: Non capturing group
\1[a-zA-Z&]+ Match backreference to what is captured in group 1 followed by 1+ times a-zA-Z (You can add to the character class what you would like to match as well)
)* Close non capturing group and repeat it 0+ times
)? Close non capturing group and make it optional
$ End of the string
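As a quick sanity check, the pattern above can be run against a few sample paths. This is only a minimal sketch: the question does not name a language, so Python's re module is an assumption, and the names PATH_RE and is_valid_path as well as the sample paths are made up for illustration.

```python
import re

# The pattern from the answer, unchanged. A raw string keeps the backslashes literal.
PATH_RE = re.compile(
    r'^[a-zA-Z]:(\\+)(?:[^\\/:*?"<>|\s][\w&]+(?: [\w&]+)*(?:\1[a-zA-Z&]+)*)?$'
)

def is_valid_path(path: str) -> bool:
    """Return True if the whole string matches the path pattern."""
    return PATH_RE.match(path) is not None

samples = [
    r"C:\Users\data",    # single backslashes
    r"C:\\Users\\data",  # doubled backslashes, used consistently
    r"C:\ Users",        # first segment starts with a space
    r"C:\Users\\data",   # mixes \ and \\, so the backreference \1 fails
    r"C:\Us*ers",        # contains a forbidden character
]
for s in samples:
    print(f"{s!r}: {is_valid_path(s)}")
```

The first two samples are accepted and the last three are rejected. The mixed-separator case is the one that the capturing group plus backreference is there to catch.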
As said here
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u)
So you can combine it with an if-then-else conditional such as (?(?!your_pattern_in_regex)match_then|match_else), in regex flavors that support lookaround conditions.
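To make the lookahead idea concrete, here is a small sketch in Python (an assumed language, since none is tagged). Python's re module accepts only a group number or name as a condition, not a lookaround, so the second half emulates the if-then-else form with a plain alternation of a negative and a positive lookahead. The sample strings and the \d / [a-z] condition are purely illustrative.

```python
import re

# Negative lookahead: a q that is not immediately followed by a u.
text = "Iraq quit qat"
positions = [m.start() for m in re.finditer(r"q(?!u)", text)]
print(positions)

# Emulating (?(?!\d)[a-z]+|\d{3}) with plain alternation:
# "if the next char is not a digit, match letters, else match three digits".
EMULATED = re.compile(r"^(?:(?!\d)[a-z]+|(?=\d)\d{3})$")
results = [bool(EMULATED.match(s)) for s in ["abc", "123", "12", "1ab"]]
print(results)
```

Only the q in "Iraq" and the q in "qat" are reported, because the q in "quit" is followed by a u.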
I want to capture all the strings from multi-line data. Here is the expected result, and here is my pattern, which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I’m lost with this part. Can anyone help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars
The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYZ/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer non capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match , and 1+ digits and an optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
Regex demo
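The pattern can be tried against the lines from the question. This is a minimal sketch that assumes Python (the question does not say which language is used); the name LINE_RE and the last two sample lines, which are deliberately invalid, are additions for illustration.

```python
import re

# re.MULTILINE makes ^ and $ match at the start and end of every line.
LINE_RE = re.compile(
    r"^XYZ/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$",
    re.MULTILINE,
)

data = """\
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
XYZ/ALLX
XYZ/P
"""

# No capture groups are used, so findall returns the full matched lines.
matches = LINE_RE.findall(data)
for line in matches:
    print(line)
```

The nine lines from the question are returned, while XYZ/ALLX and XYZ/P are skipped because the pattern is anchored on both sides and requires at least one digit after an optional P.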
You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.
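The difference between a character class and an alternation is easy to check. A short sketch, assuming Python and using made-up variable names:

```python
import re

char_class = re.compile(r"[P|ALL]")       # one char: P, |, A or L
alternation = re.compile(r"(?:P|ALL)")    # either the string P or the string ALL

# fullmatch requires the entire string to match the pattern.
class_matches_all = char_class.fullmatch("ALL") is not None
class_matches_pipe = char_class.fullmatch("|") is not None
alt_matches_all = alternation.fullmatch("ALL") is not None

print(class_matches_all, class_matches_pipe, alt_matches_all)
```

The character class rejects "ALL" as a whole but accepts a lone "|", which is usually not what was intended; the non capture group with alternation accepts "ALL".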
Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]
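Your pattern contains .* twice. The first .* matches to the end of the line or string (depending on whether the dot matches a newline) and then backtracks to the last position where Id=(.*\w) can match. The second .* is greedy as well, so the group runs on to the last word character it can reach, which is why anything that follows the Id value ends up in the capture.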
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Regex demo
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Regex demo
Note that you only have to escape the forward slash (\/) when the delimiters of the regex are forward slashes, but there is no language tagged.
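Since no language is tagged, here is a minimal sketch in Python (an assumption) of the capture group variant. The two input strings are shortened versions of the data in the question, and ID_RE and extract_id are hypothetical names.

```python
import re

# Python has no regex delimiters, so the forward slash needs no escaping.
ID_RE = re.compile(r"\bId=(\w+(?:[:/-]\w+)+)")

# Id as the last key, and Id followed by another key.
last = ("OwnerId=123, GroupId=sg-123abc}}, Region=us-east-1, "
        "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]")
middle = ("GroupId=sg-123abc}}, "
          "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]")

def extract_id(text):
    """Return the value of the standalone Id key, or None if it is absent."""
    m = ID_RE.search(text)
    return m.group(1) if m else None

print(extract_id(last))
print(extract_id(middle))
```

The word boundary keeps OwnerId= and GroupId= from matching, and the capture stops at the first character that is not a word char, colon, slash or hyphen, so the placement of Id no longer matters.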
You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*
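The look-behind variant returns the value as the whole match instead of a group. A sketch under the same assumptions as before (Python, shortened sample input, hypothetical names); it relies on the look-behind being fixed-width, which Python's re module requires.

```python
import re

# \b is zero-width, so (?<=\bId=) is a fixed-width look-behind of 3 chars.
LOOKBEHIND_RE = re.compile(r"(?<=\bId=)[^,}\s]*")

text = ("UserId=123, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, "
        "Thing=123abc}]")

m = LOOKBEHIND_RE.search(text)
value = m.group(0) if m else None
print(value)
```

Here UserId= is skipped because there is no word boundary before its Id, and the match ends at the comma.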
I need an expression that allows only the pattern below:
end word(dot)(space)start word [e.g. end. start]
In other words:
no space before a colon, semicolon or dot
one space after a colon, semicolon or dot
All other patterns need to be captured so they can be identified, such as
end.start || end . start || end .start
I used
"([\s{0,}][\.]|[\.][\s{2,}a-z]|[\.][\s{0,}a-z])"
but it is not working as I expected. I need your support, please.
You could match 1+ word characters using \w+, then match either a colon or a semicolon using the character class [;:] with an optional space ( ?) on either side.
After that, match again 1+ word characters.
\w+ ?[;:] ?\w+
Regex demo
For the variant with a dot followed by a single space, you don't need a character class; you can match the dot on its own using \.
\w+\. \w+
Regex demo
Edit
To highlight all the incorrectly spaced punctuation marks:
(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))
Explanation
(?: Non capture group
(space)[.:;] Match a space followed by either . : or ;
| Or
[.:;] {2,} Match one of the listed followed by 2 or more spaces
| Or
(?<=\S)[;:.](?=\S) Match one of the listed surrounded by non whitespace chars
) Close group
Regex demo
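The highlighting pattern can be checked against the variants listed in the question. A minimal sketch, assuming Python; BAD_SPACING and flagged are illustrative names, and the sample with two spaces is an added test case.

```python
import re

# Space before the mark, 2+ spaces after it, or no whitespace on either side.
BAD_SPACING = re.compile(r"(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))")

def flagged(text):
    """Return every badly spaced punctuation match found in the text."""
    return BAD_SPACING.findall(text)

for sample in ["end. start", "end.start", "end . start", "end .start", "a;  b"]:
    print(repr(sample), flagged(sample))
```

The correctly spaced "end. start" yields no match, and each of the other samples yields exactly one.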
I want to parse a nested structure like this one in MATLAB :
structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
block NAME_PART_3
subblock NAME_PART_4
Some content++
end NAME_PART_4
end NAME_PART_3
end NAME_PART_1
structure
NAME_PART_5
end NAME_PART_5
First, I would like to extract the content of each structure. It's quite easy because a structure content is always between "structure NAME" and "end NAME".
So, I would like to use regex. But I don't know in advance what the structure name will be.
So, I wrote my regex like this :
\bstructure\s+([\w.-]*)((?:\s|.)*)\bend\b\s+XXXX
But I don't know what I should replace "XXXX" with in order to reference the content of the first group of this regex. Is that even possible?
Try this Regex:
structure\s+([\w.-]+)\s*((?:(?!end\s+\1)[\s\S])*)end\s+\1
Click for Demo
Explanation:
structure - matches structure
\s+ - matches 1+ occurrences of a white-space
([\w.-]+) - matches 1+ occurrences of either a word character or a . or a -. This sub-match which contains the structure name is captured in Group 1.
\s* - matches 0+ occurrences of a white-space
((?:(?!end\s+\1)[\s\S])*) - Tempered Greedy Token - matches 0+ occurrences of any character [\s\S], as long as that character is not the start of the sequence end followed by 1+ white-spaces and the Group 1 contents \1, i.e. the structure name. This sub-match is captured in Group 2, which contains the contents of the structure.
end\s+\1 - matches the word end followed by 1+ white-spaces followed by Structure Name contained in Group 1 \1.
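The question is about MATLAB, but the behaviour of the pattern can be illustrated in any engine that supports backreferences inside lookaheads. Below is a hedged sketch in Python; the input is a shortened copy of the sample from the question, and STRUCT_RE is a made-up name.

```python
import re

STRUCT_RE = re.compile(r"structure\s+([\w.-]+)\s*((?:(?!end\s+\1)[\s\S])*)end\s+\1")

text = """structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
end NAME_PART_1
structure
NAME_PART_5
end NAME_PART_5
"""

# Group 1 is the structure name, group 2 is everything up to "end <name>".
found = [(m.group(1), m.group(2)) for m in STRUCT_RE.finditer(text)]
for name, content in found:
    print(name, repr(content))
```

The inner "end NAME_PART_2" does not stop the first match, because the lookahead only fails on end followed by the name captured in group 1. The second structure has an empty body, which the * quantifier allows.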
Apart from using a backreference \1 to refer to what is captured, you might replace the alternation in the capturing group ((?:\s|.)*) with a pattern that matches a newline followed by 0+ characters, repeated and captured: ((?:\n.*)+)
You might also omit the word boundary after end in end\b\s+, because 1+ whitespace characters must follow end anyway, and instead add a word boundary at the end of the pattern so that \1 is not part of a longer word.
\bstructure\s+([\w.-]+)((?:\n.*)+)\bend\s+\1\b
Regex demo
Explanation
\bstructure\s+ Match structure followed by 1+ whitespace chars
([\w.-]+) Capture in a group repeating 1+ times any of the listed chars
( Capturing group
(?:\n.*)+ Match newline followed by 0+ times any char except a newline
) Close capturing group
\bend Match end
\s+\1\b Match 1+ times a whitespace char followed by a backreference to group 1 and end with a word boundary.
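A sketch of this second variant, again in Python rather than MATLAB (an assumption made only so the example is runnable); LINES_RE is a hypothetical name and the input is a shortened copy of the sample from the question.

```python
import re

# By default the dot does not match a newline, so (?:\n.*)+ consumes whole lines.
LINES_RE = re.compile(r"\bstructure\s+([\w.-]+)((?:\n.*)+)\bend\s+\1\b")

text = """structure NAME_PART_1
Some content
block NAME_PART_2
Some other content
end NAME_PART_2
end NAME_PART_1
structure
NAME_PART_5
end NAME_PART_5
"""

found = [(m.group(1), m.group(2)) for m in LINES_RE.finditer(text)]
for name, content in found:
    print(name, repr(content))
```

Because (?:\n.*)+ is greedy, the engine first runs to the end of the input and then backtracks to the last "end NAME" with the captured name. That is fine for input like this, where each name is closed once, but it is worth keeping in mind if the same structure name can occur more than once.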
I would like to match space characters ( ) only if they are followed by a hash (#).
This is what ( #) below is trying to do, which is a capture group. (I tried escaping the brackets, otherwise the brackets are not recognised properly within a group set). However, this is not working.
The below regex
/#[a-zA-Z\( #\)]+/g
matches all of the below
#CincoDeMayo #Derby party with UNLIMITED #seafood towers
while I would like to match #CincoDeMayo #Derby and separately #seafood
Is there any way to specify capture groups () within a character set []?
Character classes are meant to match a single character, thus, it is not possible to define a character sequence inside a character class.
I think you want to match specific consecutive hashtags. Use
/#[a-zA-Z]+(?: +#[a-zA-Z]+)*/g
or
/#[a-zA-Z]+(?:\s+#[a-zA-Z]+)*/g
See the regex demo.
Details
#[a-zA-Z]+ - a # followed with 1+ ASCII letters
(?: - start of a non-capturing group...
\s+ - 1+ whitespaces
#[a-zA-Z]+ - a # followed with 1+ ASCII letters
)* - ... that repeats 0 or more times.
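The difference between the original character class and the grouped pattern can be seen side by side. The /.../g syntax in the question suggests JavaScript, but the sketch below uses Python as an assumed stand-in, where re.findall plays the role of the g flag.

```python
import re

text = "#CincoDeMayo #Derby party with UNLIMITED #seafood towers"

# Original attempt: inside [], the parentheses, space and # are just single chars,
# so any run of letters, spaces and hashes is consumed.
original = re.findall(r"#[a-zA-Z\( #\)]+", text)

# Suggested pattern: a hashtag, then 0+ repetitions of whitespace plus a hashtag.
runs = re.findall(r"#[a-zA-Z]+(?:\s+#[a-zA-Z]+)*", text)

print(original)
print(runs)
```

The original pattern returns the whole sentence as one match, while the grouped pattern returns the two runs of consecutive hashtags separately.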