Trying to create a regular expression for a string - regex

I have a string like Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}], and I want to keep only the information before each ":", that is, Taxi or HeavyTruck. Can somebody help me with this?

This will capture a single word if it's followed by :[ allowing spaces before and after :.
[A-Za-z]+(?=\s*:\s*\[)
You'll need to set the regex global flag to capture all occurrences.
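For illustration, here is a minimal Python sketch of this answer. It assumes Python's re module, since the question does not name a language. re.findall returns every non-overlapping match, so it plays the role of the global flag; the variable names are only for the example.

```python
import re

# Sample string from the question.
text = "Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}]"

# One or more letters, only when followed by optional whitespace, ':' and '['.
NAME_BEFORE_COLON = re.compile(r"[A-Za-z]+(?=\s*:\s*\[)")

# findall scans the whole string, so no separate "global" flag is needed.
names = NAME_BEFORE_COLON.findall(text)
print(names)
```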

I think this will do the trick in your case: (?=\s)*\w+(?=\s*:)
Explanation:
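(?=\s)* - Intended to allow 0 or more spaces at the beginning of the word without including them in the selection. Because the lookahead is quantified with *, it is optional and never changes the match, so it can be dropped (some engines reject a quantified lookahead outright).
\w+ - Selects one or more word characters.
(?=\s*:) - Searches for 0 or more whitespace characters after the word followed by a colon, without including them in the selection.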

To match the information in your provided data before the : you could try [A-Za-z]+(?= ?:) which matches upper or lowercase characters one or more times and uses a positive lookahead to assert that what follows is an optional whitespace and a :.
If the pattern after the colon should match too, you could try: [A-Za-z]+(?= ?:\[\(h\d+\){h\d+}])
Explanation
Match one or more upper or lowercase characters [A-Za-z]+
A positive lookahead (?= which asserts that what follows
An optional white space ?
Is a colon with the pattern after the colon using \d+ to match one or more digits (if you want to match a valid time you could update this with a pattern that matches your time format) :\[\(h\d+\){h\d+}]
Close the positive lookahead )
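A short Python sketch of the stricter variant, assuming the re module. The braces and the closing bracket are escaped here for readability, and the "Bus:[oops]" entry is a made-up value added only to show that a name is skipped when the part after the colon does not fit the pattern.

```python
import re

# Letters, only when followed by an optional space, ':' and the full
# [(h<digits>){h<digits>}] block. Braces and ']' are escaped for clarity.
STRICT_NAME = re.compile(r"[A-Za-z]+(?= ?:\[\(h\d+\)\{h\d+\}\])")

# "Bus:[oops]" is hypothetical and not part of the original data.
text = "Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}], Bus:[oops]"

strict_names = STRICT_NAME.findall(text)
print(strict_names)
```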

Related

What is the proper regex for capturing everything after "String" and between two delimiters ('=' and a non-alphanumeric character)?

Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]
Your pattern contains .* twice. The first .* matches to the end of the line or string (depending on whether the dot matches a newline) and then backtracks to the last position where Id=(.*\w) can match. The second .* is greedy as well, so the capture runs on to the last word character in the string instead of stopping at the end of the Id value.
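To make the greedy behaviour concrete, here is a small Python sketch (assuming Python's re module; the data string is a shortened, hypothetical version of the input with a key after Id). The capture swallows the trailing Thing=123abc as well.

```python
import re

# Shortened, hypothetical variant of the data with a key after Id.
data = (
    "Details={Region=us-east-1, "
    "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]}"
)

# The asker's pattern, with the literal '{' escaped.
greedy = re.search(r"Details=\{.*Id=(.*\w)", data)

# The second .* runs to the last word character in the string.
captured = greedy.group(1)
print(captured)
```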
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Note that you only have to escape the forward slash (\/) when the delimiters of the regex are forward slashes; no language is tagged in the question.
You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*
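A sketch of both approaches in Python, assuming the re module (no language is tagged, so this is only one possible setting). The two sample strings are shortened excerpts of the data: one with Id last, one with a key after it.

```python
import re

# Capture-group variant; '/' needs no escaping in a Python pattern.
ID_CAPTURE = re.compile(r"\bId=(\w+(?:[:/-]\w+)+)")

# Lookbehind variant: everything after "Id=" up to a comma, brace or space.
ID_LOOKBEHIND = re.compile(r"(?<=\bId=)[^,}\s]*")

samples = [
    "GroupId=sg-123abc}}, Region=us-east-1, "
    "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]",
    "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]",
]

# \b keeps GroupId, OwnerId and similar keys from matching.
by_capture = [ID_CAPTURE.search(s).group(1) for s in samples]
by_lookbehind = [ID_LOOKBEHIND.search(s).group(0) for s in samples]
print(by_capture)
print(by_lookbehind)
```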

Regex match an optional number of digits

I have a list that could look sort of like
("!Goal 27' Edward Nketiah"),
("!Goal 33' 46' Pierre Emerick-Aubameyang"),
("!Sub Nicolas Pepe"),
("Jordan Pickford"),
and I'm looking to match either !Sub or !Goal 33' 46' or !Goal 27'
Right now I'm using the regex (!\w+\s) which will match !Goal and !Sub, but I want to be able to get the timestamps too. Is there an easy way to do that? There is no limit on the number of timestamps there could be.
As I mentioned in my comment, you can use the following regex to accomplish this:
(!\w+(?:\s\d+')*)
Explanation:
(!\w+(?:\s\d+')*) capture the following
! matches this character literally
\w+ matches one or more word characters
(?:\s\d+')* match the following non-capture group zero or more times
\s match a whitespace character
\d+ matches one or more digits
' match this character literally
Additionally, the first capture group isn't necessary - you can remove it to simply match:
!\w+(?:\s\d+')*
If you need each timestamp, you can use !\w+((?:\s\d+')*) and split capture group 1 on the space character. (A repeated capture group such as (\s\d+')* keeps only its last repetition in most engines.)
If your input always follows the format "bang text blank digits apostrophe blank digits apostrophe etc", then it should be as simple as:
!\w+(?:\s\d+')*
Explanation:
! matches an exclamation mark
\w+ matches 1 or more word characters (letters, digits, underscores)
(?:…) is a non-capturing group
\s matches a single whitespace character
\d+ matches one or more digits
' matches the apostrophe character
* repeatedly matches the group 0 or more times
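As a rough Python sketch of the pattern in use (assuming the re module; parse_event is a hypothetical helper name), the whole match can be split on whitespace to get the tag and each timestamp separately:

```python
import re

EVENT = re.compile(r"!\w+(?:\s\d+')*")

rows = [
    "!Goal 27' Edward Nketiah",
    "!Goal 33' 46' Pierre Emerick-Aubameyang",
    "!Sub Nicolas Pepe",
    "Jordan Pickford",
]

def parse_event(row):
    """Return (tag, [timestamps]) or None when the row has no '!' prefix."""
    match = EVENT.match(row)
    if match is None:
        return None
    # Splitting the full match avoids relying on a repeated capture group.
    tag, *timestamps = match.group(0).split()
    return tag, timestamps

events = [parse_event(row) for row in rows]
print(events)
```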
This pattern:
(!\w+(?:\s\d+')*)
will capture:
"!Goal 27'"
"!Goal 33' 46'"
"!Sub"

RegEx for identifying a date followed by a special pattern

I have a pattern of strings/values occurring at different interval. The Pattern is as follows:
30/09/2016 2,085,669 0 0 UC No
Date>SPACE>Number separated by comma>SPACE> NUMBER> SPACE> NUMBER> SPACE>STRING>SPACE>NUMBER
How do I identify this pattern and extract it from a cell? I have been trying to use regex to solve this problem. Please note the pattern can occur anywhere in a single cell, e.g.:
Somestring(space)(30/09/2016 2,085,669 0 0 UC No)(space) More string
Somemorestring(space)(30/09/2016 2,085,669 0 0 UC No)
Brackets are for illustration only
To identify the date I am using the regex below; it is not the best way, but it does the job.
(^\d{1,2}\/\d{1,2}\/\d{4}$)
How to stitch this with remaining pattern?
You are only matching the date-like part, and the anchors ^ and $ require it to span the whole string.
Note that if you only want to match the value, you can omit the parentheses () that make a capturing group around the expression.
You could extend it to:
^\d{1,2}\/\d{1,2}\/\d{4} \d+(?:,\d+)+ \d+ \d+ [A-Za-z]+ [A-Za-z]+$
Explanation
^ Start of string
\d{1,2}\/\d{1,2}\/\d{4} Match date like pattern
\d+(?:,\d+)+ Match 1+ digits, then repeat 1+ times a comma followed by 1+ digits
\d+ \d+ Match 1+ digits twice, separated by a space
[A-Za-z]+ [A-Za-z]+ Match 1+ letters a-z (either case) twice, separated by a space
$ End of string
If you only wish to extract the date from anywhere in a string, this expression uses two capturing groups before and after the date, and the middle group captures the desired date:
(.*?)(\d{1,2}\/\d{1,2}\/\d{4})(.*)
Leave out the start ^ and end $ anchors so that the expression can match anywhere in the cell.
If you wish to match and capture everything, you might just want to add more boundaries, and match patterns step by step, maybe similar to this expression:
(.*?)(\d{1,2}\/\d{1,2}\/\d{4})\s+([0-9,]+)\s+([0-9]+)\s+([0-9]+)\s+([A-Z]+)\s+(No)(.*)
I have added extra boundaries, just to be safe; you can simplify them.
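A minimal Python sketch of extracting the record from anywhere in a cell, assuming the re module and a pattern without ^ and $ anchors. The helper name extract_record and the "No record here" cell are made up for the example; the other two cells follow the layout in the question.

```python
import re

# Unanchored pattern with one capture group per field.
RECORD = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4})"  # date-like value, e.g. 30/09/2016
    r"\s+(\d+(?:,\d+)*)"        # number with optional comma groups
    r"\s+(\d+)\s+(\d+)"         # two plain numbers
    r"\s+([A-Za-z]+)"           # short string, e.g. UC
    r"\s+([A-Za-z]+)"           # trailing word, e.g. No
)

def extract_record(cell):
    """Return the six fields as a tuple, or None when the cell has no record."""
    match = RECORD.search(cell)
    return match.groups() if match else None

cells = [
    "Somestring 30/09/2016 2,085,669 0 0 UC No More string",
    "Somemorestring 30/09/2016 2,085,669 0 0 UC No",
    "No record here",
]
records = [extract_record(cell) for cell in cells]
print(records)
```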

How to remove some special character from word using regular expression?

I am splitting a file into words. I am able to split it into words, but some words contain a special character sequence like '___'. I want to skip that sequence and also split the word at it.
The file contains data like this:
Yahoo$$$Yahoo OK : ___GET
Gmail$$$Gmail Ok:___GET
google_data$$$Google.com.in___POST
Using ((?!:)[.0-9a-zA-Z\s]\w+)+ gives me
Yahoo
Yahoo OK
___GET
Gmail
Gmail Ok
GET
google_data
Google.com.in___POST
I don't want that '___' and also the following string:
Google.com.in___POST
has to be split in two words, like:
Google.com.in
POST
Can anyone help me with this?
Using \w will also match an underscore. Looking at the example data, you want to match characters a-z or a digit, and in between there can be a space, dot or underscore.
Instead of splitting, you might match the values:
[0-9a-zA-Z]+(?:[._ ][0-9a-zA-Z]+)*
Explanation
[0-9a-zA-Z]+ Match a digit or a-z in lower or uppercase 1+ times
(?: Non-capturing group
[._ ] Match a . _ or space
[0-9a-zA-Z]+ Match a digit or a-z in lower or uppercase 1+ times
)* Close non-capturing group and repeat 0+ times
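Here is a small Python sketch of the matching approach on the sample lines, assuming the re module (the question does not say which language is used). A run of two or more underscores ends a word because each separator must be followed directly by a letter or digit.

```python
import re

# Alphanumeric runs joined by a single '.', '_' or space.
WORD = re.compile(r"[0-9a-zA-Z]+(?:[._ ][0-9a-zA-Z]+)*")

lines = [
    "Yahoo$$$Yahoo OK : ___GET",
    "Gmail$$$Gmail Ok:___GET",
    "google_data$$$Google.com.in___POST",
]

# One list of words per input line.
words_per_line = [WORD.findall(line) for line in lines]
for words in words_per_line:
    print(words)
```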

Regex code , Python-2 alphanumeric [duplicate]

My regex knowledge is pretty limited, but I'm trying to write/find an expression that will capture the following string types in a document:
DO match:
ADY123
AD12ADY
1HGER_2
145-DE-FR2
Bicycle1
2Bicycle
128D
128878P
DON'T match:
BICYCLE
183-329-193
3123123
Is such an expression possible? Basically, it should find any string containing letters AND digits, regardless of whether the string contains a dash or underscore. I can find the first two using the following two regex:
/([A-Z][0-9])\w+/g
/([0-9][A-Z])\w+/g
But searching for possible hyphens and underscores makes it more complicated...
Thanks for any help you can provide! :)
MORE INFO:
I've made slight progress with: ([A-Z|a-z][0-9]+-*_*\w+) but it doesn't capture strings with more than one hyphen.
I had a document with a lot of text strings and number strings, which I don't want to capture. What I do want is any product code, which could be any length string with or without hyphens and underscores but will always include at least one digit and at least one letter.
You can use the following expression with the case-insensitive mode:
\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b
Explanation:
\b # Assert position at a word boundary
( # Beginning of capturing group 1
(?: # Beginning of the non-capturing group
[a-z]+\S*\d+ # Match letters followed by numbers
| # OR
\d\S*[a-z]+ # Match a digit followed by letters
) # End of the group
[a-z\d_-]* # Match letter, digit, '_', or '-' 0 or more times
) # End of capturing group 1
\b # Assert position at a word boundary
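A quick Python check of the expression against the sample strings from the question, assuming the re module with re.IGNORECASE as the case-insensitive mode. The samples are joined into one space-separated string to stand in for a document.

```python
import re

PRODUCT_CODE = re.compile(
    r"\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b",
    re.IGNORECASE,
)

# DO-match samples first, then the three DON'T-match samples.
document = (
    "ADY123 AD12ADY 1HGER_2 145-DE-FR2 Bicycle1 2Bicycle 128D 128878P "
    "BICYCLE 183-329-193 3123123"
)

# With a single capture group, findall returns the group 1 text of each match.
codes = PRODUCT_CODE.findall(document)
print(codes)
```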