I have a pattern of strings/values occurring at different intervals. The pattern is as follows:
30/09/2016 2,085,669 0 0 UC No
Date>SPACE>Number separated by commas>SPACE>NUMBER>SPACE>NUMBER>SPACE>STRING>SPACE>STRING
How do I identify this and extract it from a cell? I have been trying to use regex to solve this problem. Please note that the pattern can occur anywhere in a single cell, for example:
Somestring(space)(30/09/2016 2,085,669 0 0 UC No)(space) More string
Somemorestring(space)(30/09/2016 2,085,669 0 0 UC No)
The brackets are for illustration only.
To identify the date I am using the regex below; it is not the best way, but it does the job.
(^\d{1,2}\/\d{1,2}\/\d{4}$)
How do I stitch this together with the rest of the pattern?
Your pattern matches only the date-like part, and the anchors ^ and $ require it to be the entire string.
If you only want to match the value, you can omit the parentheses (), which merely wrap the expression in a capturing group.
You could extend it to:
^\d{1,2}\/\d{1,2}\/\d{4} \d+(?:,\d+)+ \d+ \d+ [A-Za-z]+ [A-Za-z]+$
Explanation
^ Start of string
\d{1,2}\/\d{1,2}\/\d{4} Match date like pattern
\d+(?:,\d+)+ Match 1+ digits, then repeat 1+ times a comma followed by 1+ digits
\d+ \d+ Match 1+ digits two times, each followed by a space
[A-Za-z]+ [A-Za-z]+ Match 1+ letters (A-Z or a-z) two times, separated by a space
$ End of string
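As a quick check of the pattern above, here is a minimal Python sketch. Python and the helper names `find_record` and `is_exact_record` are assumptions for illustration, since the question does not name a tool. It suggests that the anchored form only matches when the cell contains nothing else, while dropping ^ and $ lets a search find the record inside a longer cell.

```python
import re

# Pattern from the answer above, written without the ^ and $ anchors.
RECORD = r"\d{1,2}/\d{1,2}/\d{4} \d+(?:,\d+)+ \d+ \d+ [A-Za-z]+ [A-Za-z]+"

def find_record(cell):
    """Return the first record found anywhere in the cell, or None."""
    m = re.search(RECORD, cell)
    return m.group(0) if m else None

def is_exact_record(cell):
    """True only if the whole cell is one record (like the ^...$ form)."""
    return re.fullmatch(RECORD, cell) is not None

print(find_record("Somestring 30/09/2016 2,085,669 0 0 UC No More string"))
# prints 30/09/2016 2,085,669 0 0 UC No
print(is_exact_record("Somestring 30/09/2016 2,085,669 0 0 UC No"))
# prints False
```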
If you only wish to extract the date from anywhere in a string, this expression uses three capturing groups: the first and last capture the text before and after the date, and the middle group captures the date itself:
(.*?)(\d{1,2}\/\d{1,2}\/\d{4})(.*)
You can simply leave out the start ^ and end $ anchors and it will work.
If you wish to match and capture everything, you might just want to add more boundaries, and match patterns step by step, maybe similar to this expression:
(.*?)(\d{1,2}\/\d{1,2}\/\d{4})\s+([0-9,]+)\s+([0-9]+)\s+([0-9]+)\s+([A-Z]+)\s+(No)(.*)
I have added extra boundaries just to be safe; you can simplify them.
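To see what the individual groups capture, here is a small Python sketch. Python and the function name `split_record` are assumptions, and the leading (.*?) and trailing (.*) groups are left out because a search does not need them. Note that the last group hard-codes the literal text No, exactly as in the expression above.

```python
import re

# Step-by-step pattern from above, without the surrounding (.*?) and (.*).
FIELDS = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4})\s+([0-9,]+)\s+([0-9]+)\s+([0-9]+)\s+([A-Z]+)\s+(No)"
)

def split_record(cell):
    """Return the six captured fields as a tuple, or None if nothing matches."""
    m = FIELDS.search(cell)
    return m.groups() if m else None

print(split_record("Somemorestring 30/09/2016 2,085,669 0 0 UC No"))
# prints ('30/09/2016', '2,085,669', '0', '0', 'UC', 'No')
```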
I'm working on a regular expression for SSN with the rules below. I have successfully applied all matching rules except #7. Can someone help alter this expression to include the last rule, #7:
^((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$|(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}$)
1. Hyphens should be optional (this is handled above by using two expressions joined with an OR)
2. Cannot begin with 000
3. Cannot begin with 666
4. Cannot begin with 900-999
5. Middle digits cannot be 00
6. Last four digits cannot be 0000
7. Cannot be all the same digit, e.g. 111-11-1111 or 111111111
Add the following negative lookahead, anchored to the start:
^(?!(.)(\1|-)+$)
This captures the first character, then asserts that the rest of the input does not consist solely of that captured character and hyphens.
The whole regex can be shortened to:
^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$
The main trick to avoid repeating the regex both with and without the hyphens is to capture the optional hyphen (as group 3) and then use a backreference \3 to that capture at the next hyphen position, so the hyphens are either both present or both absent.
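The shortened expression can be exercised against each of the seven rules with a short Python sketch. Python, the function name `is_valid_ssn` and the sample numbers are assumptions made for illustration; the question does not say which regex flavor is in use.

```python
import re

# Shortened pattern from above: group 3 is the optional hyphen, reused via \3.
SSN = re.compile(
    r"^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$"
)

def is_valid_ssn(text):
    """True if text passes all seven rules listed in the question."""
    return SSN.match(text) is not None

for sample in ["123-45-6789", "123456789", "123-456789",
               "900-12-3456", "123-00-4567", "111-11-1111"]:
    print(sample, is_valid_ssn(sample))
# Only the first two samples print True.
```

The third sample shows the backreference at work: one hyphen without the other is rejected.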
First, let's shorten the pattern, as it contains two nearly identical alternatives, one matching an SSN with hyphens and the other matching an SSN without hyphens. Instead of a ^(x-y-z$|xyz$) pattern, you can use a ^x(-?)y\1z$ pattern, so your regex can be reduced to ^(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$.
To make a pattern never match a string that contains only identical digits, you may add the following negative lookahead right after ^:
(?!\D*(\d)(?:\D*\1)*\D*$)
It fails the match if there are
\D* - zero or more non-digits
(\d) - a digit (captured in Group 1)
(?:\D*\1)* - zero or more occurrences of zero or more non-digits followed by the same digit as in Group 1, and then
\D*$ - zero or more non-digits till the end of string.
Now, since I suggested shortening the regex to the pattern with backreference(s), you will have to adjust the backreferences after adding this lookahead.
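So, your solution looks like the first pattern below; the second is the same pattern with \d and \D spelled out as [0-9] and [^0-9]: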
^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$
^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$
Note that the \1 from the pattern without the lookahead turned into \2, because (-?) became Group 2.
Note also that in some regex flavors \d is not equal to [0-9].
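Python 3 is one flavor where this difference shows up, so it makes a convenient illustration (the choice of Python and of the sample character is an assumption; the thread does not name an engine). For text patterns, \d matches any Unicode decimal digit unless the re.ASCII flag is set.

```python
import re

# U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
sample = "\u0663"

unicode_digit = re.fullmatch(r"\d", sample) is not None
ascii_class = re.fullmatch(r"[0-9]", sample) is not None
ascii_flag = re.fullmatch(r"\d", sample, re.ASCII) is not None

print(unicode_digit, ascii_class, ascii_flag)
# prints True False False
```

If only ASCII digits should be accepted, the [0-9] variant of the pattern is the safer choice in such flavors.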
In Scala, is it possible to actually insert commas via a regex to separate thousands in numbers where the comma definitely is not there to start with?
For example, I'd like to convert 30000.00 into 30,000.00.
I am not sure this is exactly what you need, but you can use this:
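val formatter = java.text.NumberFormat.getNumberInstance(java.util.Locale.US) // fixed locale so grouping uses commas
formatter.setMinimumFractionDigits(2) // keep the two decimal places
println(formatter.format(30000.00)) // prints 30,000.00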
This answer is not Scala-specific.
You can use the regex \d{1,3}(?=(?:\d{3})+\.) to find the matches and replace each match with itself followed by a comma, i.e. with $0,. Note that the lookahead requires a decimal point after the digits.
Explanation:
\d{1,3} This matches a digit between 1 and 3 times
(?= Positive lookahead starts
(?: This indicates a Non-capturing group
\d{3} matches a digit exactly 3 times
) end of Non-capturing group.
+ matches the previous group one or more times
\. matches the character . literally
) Positive lookahead ends.
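The substitution can be tried out with a few lines of Python. This is only a sketch: Python is an assumption (the question is about Scala), the function name `add_thousands` is hypothetical, and Python writes the whole-match reference as \g<0> where the answer above uses $0.

```python
import re

def add_thousands(text):
    """Insert commas between groups of three digits before a decimal point."""
    # \g<0> is Python's spelling of the whole-match reference ($0 above).
    return re.sub(r"\d{1,3}(?=(?:\d{3})+\.)", r"\g<0>,", text)

print(add_thousands("30000.00"))    # prints 30,000.00
print(add_thousands("1234567.89"))  # prints 1,234,567.89
print(add_thousands("30000"))       # prints 30000 (no decimal point, no match)
```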
I need to extract 1234567 from below URLs
http://www.test.in/some--wonders-1234567---2
http://www.test.in/some--wonders-1234567
I tried .*\-([0-9]+)(?:-{2,}2)? but for the first URL it returned 2, even though that part is inside the non-capturing group.
Please suggest a solution. I have been stuck on this for a long time and have run out of ideas.
Try this one:
.*?\-([0-9]+)(?:-{2,}2|$)
It makes the first .* lazy; you can also remove that part entirely with the same effect:
\-([0-9]+)(?:-{2,}2|$)
If your regex engine supports variable-length negative lookbehind (many do not; .NET does, while Python's built-in re module does not), you can do it this way:
(?<!\d+-+)\d+
It gives you any non-empty digit string that is not preceded by digits followed by hyphens.
The big advantage is that you don't have to use groups here: the match itself is what you want.
You could match a - followed by one or more digits which you could capture in a group ([0-9]+). This group will contain the value you want to extract.
Then add an optional part (?:-{2,}[0-9]+)? that would match ---2, followed by $ to assert the end of the line.
-(\d+)(?:-{2,}\d+)?$
Explanation
- Match a literal hyphen
(\d+) Capture one or more digits in a group
(?: Non capturing group
-{2,} Match 2 or more times -
\d+ Match one or more digits
)? close non capturing group and make it optional
$ Assert position at the end of the line
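Applied to the two URLs from the question, the pattern can be checked with a short Python sketch. Python and the function name `extract_id` are assumptions for illustration.

```python
import re

# Pattern from the answer above; group 1 holds the number to extract.
ID_AT_END = re.compile(r"-(\d+)(?:-{2,}\d+)?$")

def extract_id(url):
    """Return the digits captured in group 1, or None if the URL has no id."""
    m = ID_AT_END.search(url)
    return m.group(1) if m else None

print(extract_id("http://www.test.in/some--wonders-1234567---2"))  # prints 1234567
print(extract_id("http://www.test.in/some--wonders-1234567"))      # prints 1234567
```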
I have a string like Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}] in which I want to keep the information before the ":", that is, Taxi or HeavyTruck. Can somebody help me with this?
This will capture a single word if it's followed by :[ allowing spaces before and after :.
[A-Za-z]+(?=\s*:\s*\[)
You'll need to set the regex global flag to capture all occurrences.
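In Python, for instance, there is no global flag; re.findall already returns every occurrence. A minimal sketch, assuming Python and using the sample string from the question:

```python
import re

text = "Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}]"

# Letters that are followed by optional spaces, a colon, optional spaces and "[".
names = re.findall(r"[A-Za-z]+(?=\s*:\s*\[)", text)

print(names)  # prints ['Taxi', 'HeavyTruck']
```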
I think this will do the trick in your case: (?=\s)*\w+(?=\s*:)
Explanation:
(?=\s)* - Intended to skip spaces before the word. Because a lookahead consumes no characters and * also allows zero repetitions, it has no effect on the match and can be omitted.
\w+ - Selects one or more word characters.
(?=\s*:) - Requires zero or more whitespace characters and then a colon after the word, without including them in the selection.
To match the information in your provided data before the : you could try [A-Za-z]+(?= ?:) which matches upper or lowercase characters one or more times and uses a positive lookahead to assert that what follows is an optional whitespace and a :.
If the pattern after the colon should match too, you could try: [A-Za-z]+(?= ?:\[\(h\d+\){h\d+}])
Explanation
Match one or more upper or lowercase characters [A-Za-z]+
A positive lookahead (?= which asserts that what follows is
An optional white space ?
Is a colon with the pattern after the colon using \d+ to match one or more digits (if you want to match a valid time you could update this with a pattern that matches your time format) :\[\(h\d+\){h\d+}]
Close the positive lookahead )
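A short Python sketch of the stricter variant follows. Python and the function name `vehicle_names` are assumptions, and the braces are escaped here only for readability. It keeps a name only when the full [(h..){h..}] block follows the colon.

```python
import re

# Stricter lookahead from above, with the literal braces escaped.
STRICT = re.compile(r"[A-Za-z]+(?= ?:\[\(h\d+\)\{h\d+\}\])")

def vehicle_names(text):
    """Return every name that is followed by a well-formed [(h..){h..}] block."""
    return STRICT.findall(text)

print(vehicle_names("Taxi:[(h19){h12}], HeavyTruck :[(h19){h12}]"))
# prints ['Taxi', 'HeavyTruck']
print(vehicle_names("Bus:[unknown]"))
# prints []
```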
I am trying to write a regular expression that takes a string and parses it into three different capturing groups:
$3.99 APP DOWNLOAD – 200 11/19 – 1/21 3.99
Group 1: $3.99 APP DOWNLOAD – 200
Group 2: 11/19 – 1/21
Group 3: 3.99
Does anyone have any ideas???
I do not have much experience with capturing groups and do not know how to create them.
For example, I believe this expression would work for identifying the dates?
/(\d{2}\/\d{2})/
Any help would be greatly appreciated!
Regex:
([$]\d+[.]\d{2}.*?)\s*(\d{1,2}/\d{2}.*?\d{1,2}/\d{2})\s(\d+[.]\d{2})
So with this we have 3 capture groups (()) separated by whitespace: \s* (0+ whitespace characters) between the first two and \s (exactly one) before the last. The \s* isn't strictly necessary, but it removes trailing spaces from the first captured group.
The first capture group [$]\d+[.]\d{2}.*? matches a dollar sign, followed by 1+ digits, followed by a period, followed by 2 digits, followed by a lazy match of 0+ characters (.*?). What this lazy match does is match anything up until the next match in our expression (in this case, our next capture group).
Our second capture group \d{1,2}/\d{2}.*?\d{1,2}/\d{2} matches 1-2 digits, a slash, and 2 digits. Then we use another lazy match of any characters followed by another date.
Our final capture group \d+[.]\d{2} looks for 1+ digits, a period, and 2 more digits.
Note: I used ~ as delimiters so that we do not need to escape our / in the dates. Also, I put $ and . in character classes because I think it looks cleaner than escaping them ([$] vs \$)..either works though :)
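To check the three groups against the sample line from the question, here is a minimal Python sketch. Python is an assumption, since the delimiters mentioned above suggest a different language, and Python needs no delimiters around the pattern.

```python
import re

# Pattern from the answer above, unchanged apart from dropping the delimiters.
LINE = re.compile(
    r"([$]\d+[.]\d{2}.*?)\s*(\d{1,2}/\d{2}.*?\d{1,2}/\d{2})\s(\d+[.]\d{2})"
)

m = LINE.search("$3.99 APP DOWNLOAD – 200 11/19 – 1/21 3.99")
groups = m.groups() if m else None

print(groups)
# prints ('$3.99 APP DOWNLOAD – 200', '11/19 – 1/21', '3.99')
```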