Extract USD amount at beginning of string using regex - regex

Need help to extract the usd amount at beginning of string.
Example
String : $50,000.00 NAMED INSURED FOR $100K/$300K LIMITS
Output : 50,000
This is what I tried \$\d.*?000\.00 but I am sure there is a better way.

^\$?([\d,\.]+)\b
Or something of the sort. Broken down:
^ # anchor to start of line
\$? # look for dollar sign (but optional)
( # begin capture group
[\d,\.]+ # match 1 or more of the following: `0-9`, `,` or `.`
) # end capture group
\b # word boundary
Example of the above can be found here: http://regexr.com?372t4
Note: if you want to convert this into a double/float value in some language, you will need a parser that ignores commas, or some simple string manipulation to remove them first.
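A minimal Python sketch of the comma-stripping step mentioned in the note, using the pattern from this answer. The helper name parse_leading_usd is made up for illustration, and Decimal is an assumption on my part (a reasonable choice for money, but float would work the same way).

```python
import re
from decimal import Decimal

# Pattern from the answer above: optional "$", then digits, commas and dots.
LEADING_AMOUNT = re.compile(r"^\$?([\d,\.]+)\b")

def parse_leading_usd(text):
    """Return the leading amount as a Decimal, or None if there is none."""
    match = LEADING_AMOUNT.match(text)
    if match is None:
        return None
    # Strip the thousands separators before handing the string to Decimal.
    return Decimal(match.group(1).replace(",", ""))

print(parse_leading_usd("$50,000.00 NAMED INSURED FOR $100K/$300K LIMITS"))
```

The character class is permissive, so malformed input such as 1.2.3 would still be captured and then rejected by Decimal with an exception.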

Try this regular expression:
\$(\S+)
which will give you 50,000.00 -- the exact amount.
Find the $ sign and then match all non-whitespace characters.

Try:
^\$(\d{1,3}(?:,\d{3})*\.\d{2})

Related

How to find regex for multiple conditions

I am trying to write a regex that finds the matches below, which I would then replace with an empty string. I can create a regex for a few of these conditions individually, but I cannot figure out how to create one regex for all of them.
Strings:
song1 artist (SiteWithMp3Keyword.com).mp3
02.song2 | siteWithdownloadKeyword.in 320 Kbps
song3 [SitewithDjKeyword.in] 128kbps.mp3
Output
song1 artist.mp3
song2
song3.mp3
Criteria for match:
Case Insensitive
Find Strings with particular keyword and remove whole word, even if inside any braces
Find kbps keyword and remove it along with any number before it (128/320)
if string ends in .mp3, keep it as it is.
Remove junk characters (like | ) and replace _ with space.
Remove number if present at start of string, like 001_ 02. etc.
Trim whitespaces before and after remaining string
Example Regex for 2.
\S+(mp3|dj|download)\S+
https://regex101.com/r/nxp4d3/1
Try this regex ....
Find:^[0-9. ]*(song\d+ (\w+ )?).*?(\.mp3 ?)?$
Replace with:$1$3
P.S. If this code doesn't solve your problem, please share a sample of your real data so that someone can better understand what you need.
Thanks...
For the example data, you might use:
^\h*(?:\d+\W*)?(\w+(?:\h+\w+)*).*?(\.mp3)?\h*$
The pattern matches:
^ Start of string
\h* Match optional leading spaces
(?:\d+\W*)? Match 1+ digits followed by optional non word characters
(\w+(?:\h+\w+)*) Capture group 1, match word characters optionally repeated with a space in between
.*? Match any character except a newline, as few as possible
(\.mp3)? Optionally capture .mp3 in group 2
\h* Match optional trailing spaces
$ End of string
Regex demo
Replace with capture group 1 and group 2
$1$2
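A rough Python version of the same replacement, run against the three example strings. Python's re module has no \h, so [ \t] stands in for it here (my substitution, not part of the answer), and the helper name clean_title is hypothetical.

```python
import re

# \h (horizontal whitespace) is PCRE syntax; [ \t] is used as a stand-in.
CLEANUP = re.compile(r"^[ \t]*(?:\d+\W*)?(\w+(?:[ \t]+\w+)*).*?(\.mp3)?[ \t]*$")

def clean_title(name):
    # Keep group 1 (the leading words) and group 2 (the optional .mp3 suffix).
    return CLEANUP.sub(r"\1\2", name)

for raw in [
    "song1 artist (SiteWithMp3Keyword.com).mp3",
    "02.song2 | siteWithdownloadKeyword.in 320 Kbps",
    "song3 [SitewithDjKeyword.in] 128kbps.mp3",
]:
    print(clean_title(raw))
```

This only covers names shaped like the examples: everything after the first run of space-separated words is dropped, whether or not it contains one of the keywords.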

How to write a RegEx pattern that accepts a string with at most one of each letter, but unordered?

I have tried this:
[a]?[b]?[c]?[d]?[e]?[f]?[g]?[h]?[i]?[j]?[k]?[l]?[m]?[n]?[o]?[p]?[q]?[r]?[s]?[t]?[u]?[v]?[w]?[x]?[y]?[z]?
But this RegEx rejects strings where the order is not alphabetical, like these:
"zabc"
"azb"
I want patterns like these two to be accepted too. How could I do that?
EDIT 1
I don't want letter repetitions, i.e., I want the following strings to be rejected:
aazb
ozob
Thanks.
You can use a negative lookahead assertion to make sure no two characters are the same:
^(?!.*(.).*\1)[a-z]*$
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match the following:
.* # any number of characters
(.) # followed by one character (capture that in group 1)
.* # followed by any number of characters
\1 # followed by the same character as the one captured before
) # End of lookahead
[a-z]* # Match any number of ASCII lowercase letters
$ # End of string
Test it live on regex101.com.
Note: This regex needs to brute-force check all possible character pairs, so performance may be a problem with larger strings. If you can use anything besides regex, you're going to be happier. For example, in Python:
import re
if re.search(r"^[a-z]*$", mystring) and len(mystring) == len(set(mystring)):
    pass  # valid string: only lowercase letters, none repeated
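To see the two approaches side by side, here is a small sketch that runs both on the strings from the question. The function names are made up for illustration, and fullmatch is used in both so that a trailing newline is not silently accepted.

```python
import re

# Lookahead pattern from the answer above.
NO_REPEATS = re.compile(r"^(?!.*(.).*\1)[a-z]*$")

def valid_by_regex(s):
    return NO_REPEATS.fullmatch(s) is not None

def valid_by_set(s):
    # Only lowercase ASCII letters, and no letter appears twice.
    return re.fullmatch(r"[a-z]*", s) is not None and len(s) == len(set(s))

for s in ["zabc", "azb", "aazb", "ozob"]:
    print(s, valid_by_regex(s), valid_by_set(s))
```

The set-based check makes a single pass over the string, while the lookahead may have to retry the backreference for many character pairs.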

java regex : getting a substring from a string which can vary

I have a String like - "Bangalore,India=Karnataka". From this String I would like to extract only the substring "Bangalore". In this case the regex can be - (.+),.*=.*. But the problem is that the String can sometimes be just "Bangalore", and in that case the above regex won't work. What regex will get the substring "Bangalore" whatever the String is?
Try this one:
^(.+?)(?:,.*?)?=.*$
Explanation:
^ # beginning of the string
( # beginning of capture group 1
.+? # one or more any char non-greedy
) # end of group 1
(?: # beginning of NON capture group
, # a comma
.*? # 0 or more any char non-greedy
)? # end of non capture group, optional
= # equal sign
.* # 0 or more any char
$ # end of string
Updated:
I thought the OP had to match Bangalore,India=Karnataka or Bangalore=Karnataka, but as far as I understand it is Bangalore,India=Karnataka or Bangalore, so the regex is much simpler:
^([^,]+)
This matches, at the beginning of the string, one or more non-comma characters and captures them in group 1.
matcher.matches()
tries to match against the entire input string. Look at the javadoc for java.util.regex.Matcher. You need to use:
matcher.find()
Are you somehow forced to solve this using one regexp and nothing else? (Stupid interview question? Extremely inflexible external API?) In general, don't try to make regexes do what plain old programming constructs do better. Just use the obvious regex, and if it doesn't match, return the entire string instead.
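A sketch of that fallback idea, written in Python for brevity rather than the Java of the question. The function name first_part is hypothetical; re.fullmatch plays the role of Java's matcher.matches() in that it must match the whole input.

```python
import re

def first_part(text):
    # The "obvious" regex from the question, applied to the whole string.
    match = re.fullmatch(r"(.+),.*=.*", text)
    # No match means there is no ",...=..." tail, so return the input unchanged.
    return match.group(1) if match else text

print(first_part("Bangalore,India=Karnataka"))
print(first_part("Bangalore"))
```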
Try this regex. It grabs any run of characters at the start of the string that is followed by a comma, but not the comma itself.
^.*(?=,)
If you are only interested to check that "Bangalore" is contained in the string then you don't need a regexp for this.
Python:
In [1]: s = 'Bangalorejkdjiefjiojhdu'
In [2]: 'Bangalore' in s
Out[2]: True

regular expression for matching

It is for a normal register name, could be 1-n characters with a-zA-Z and -, like
larry-cai, larrycai, larry-c-cai, l,
but - can't be the first and end character, like
-larry, larry-
my thinking is like
^[a-zA-Z]+[a-zA-Z-]*[a-zA-Z]+$
but with my regex the name has to be at least 2 characters long.
It should be simple, but I don't know how to do it.
It would be nice if you could write it so that it passes http://tools.netshiftmedia.com/regexlibrary/
You didn't specify which regex engine you're using. One way would be (if your engine supports lookaround):
^(?!-)[A-Za-z-]+(?<!-)$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]+ # Match one or more "allowed" characters
(?<!-) # Assert that the previous character isn't a dash...
$ # ...at the end of the string.
If lookbehind is not available (for example in JavaScript):
^(?!-)[A-Za-z-]*[A-Za-z]$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]* # Match zero or more "allowed" characters
[A-Za-z] # Match exactly one "allowed" character except dash
$ # End of string
This should do it:
^[a-zA-Z]+(-[a-zA-Z]+)*$
With this there needs to be one or more alphabetic characters at the beginning (^[a-zA-Z]+). And if a - follows, it needs to be followed by at least one alphabetic character (-[a-zA-Z]+). That pattern can be repeated an arbitrary number of times until the end of the string is reached.
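A quick way to check this pattern against the names from the question, sketched in Python. The helper name is_valid_name is made up for illustration.

```python
import re

# Letters, optionally followed by groups of "-" plus more letters.
NAME = re.compile(r"^[a-zA-Z]+(-[a-zA-Z]+)*$")

def is_valid_name(name):
    return NAME.fullmatch(name) is not None

for name in ["larry-cai", "larrycai", "larry-c-cai", "l", "-larry", "larry-"]:
    print(name, is_valid_name(name))
```

Unlike the lookaround versions, this pattern also rejects consecutive dashes such as larry--cai, which may or may not be what is wanted.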
A simple answer would be:
^(([a-zA-Z])|([a-zA-Z][a-zA-Z-]*[a-zA-Z]))$
This matches either a string with length 1 and characters a-zA-Z or it matches an improved version of your original expression which is fine for strings with length greater than 1.
Credit for the improvement goes to Tim and ridgerunner (see comments).
Try this:
^[a-zA-Z]+([-]*[a-zA-Z])*$
Not sure which lazy group takes precedence..
^[a-zA-Z][a-zA-Z-]*?[a-zA-Z]?$
maybe this?
^[^-]\S*[^-]$|^[^-]{1}$

BEGINNER: REGEX Match numeric sequence except where the word "CODE" exists on a line

I've been able to stumble my way through regular expressions for quite some time, but alas, I cannot help a friend in need.
My "friend" is trying to match all lines in a text file that match the following criteria:
Only a 7 to 10 digit number (0123456 or 0123456789)
Only a 7 to 10 digit number, then a dash, then another two digits (0123456-01 or 0123456789-01)
Match any of the above except where the words Code/code or Passcode/passcode is before the numbers to match (Such as "Access code: 16434629" or "Passcode 5253443-12")
EDIT: Only need the numbers that match, nothing else.
Here is the nastiest regex I have ever seen that "he" gave me:
^(?=.*?[^=/%:]\b\d{7,10}((\d?\d?)|(-\d\d))?\b)((?!Passcode|passcode|Code|code).)*$
...
Question: Is there a way to use a short regex to find all lines that meet the above criteria?
Assume PCRE. My friend thanks you in advance. ;-)
BTW - I have not been able to find any other questions listed in stackoverflow.com or superuser.com which can answer this question accurately.
EDIT: I'm using Kodos Python Regex Debugger to validate and test the regex.
(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?
Commented version:
(?<! # Begin zero-width negative lookbehind. (Makes sure the following pattern can't match before this position)
(?: # Begin non-matching group
[Pp]asscode # Either Passcode or passcode
| # OR
[Cc]ode # Either Code or code
) # End non-matching group
.* # Any characters
) # End lookbehind
[0-9]{7,10} # 7 to 10 digits
(?: # Begin non-matching group
-[0-9]{2} # dash followed by 2 digits
) # End non-matching group
? # Make last group optional
Edit: final version after comment discussion -
/^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)/
(result in first capture buffer)
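A minimal sketch of applying this final pattern line by line in Python (the question mentions a Python regex debugger). The function name extract_numbers and the sample text are made up for illustration.

```python
import re

# Final pattern from the answer above; the number ends up in group 1.
NUMBER = re.compile(
    r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)"
)

def extract_numbers(text):
    found = []
    for line in text.splitlines():
        # match() anchors at the start of each line, like ^ in the pattern.
        match = NUMBER.match(line)
        if match:
            found.append(match.group(1))
    return found

sample = "0123456\n0123456789-01\nAccess code: 16434629\nPasscode 5253443-12"
print(extract_numbers(sample))
```

The lookahead only inspects the non-digit text in front of the first digit on a line, so it assumes the word code, if present, comes before any digit.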
You can get by with a nasty regex you have to get help with ...
... or you can use two simple regexes. One that matches what you want, and one that filters what you don't want. Simpler and more readable.
Which one would you like to read?
$foo =~ /(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?/
or
$foo =~ /\d{7,10}(-\d{2})?/ and $foo !~ /(access |pass)code/i;
Edit: case-insensitivity.
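The same two-regex idea, ported from Perl to Python as a sketch. The function name find_number is hypothetical, and the filter is copied as written in the answer.

```python
import re

# One pattern for what we want, one for what we want to exclude.
WANTED = re.compile(r"\d{7,10}(?:-\d{2})?")
UNWANTED = re.compile(r"(?:access |pass)code", re.IGNORECASE)

def find_number(line):
    # Reject the line outright if the filter hits, then look for the number.
    if UNWANTED.search(line):
        return None
    match = WANTED.search(line)
    return match.group(0) if match else None

for line in ["0123456-01", "Access code: 16434629", "Passcode 5253443-12"]:
    print(repr(line), find_number(line))
```

Because the filter is ported as is, a bare "code" without "access " or "pass" in front of it is not excluded; widening the filter to plain code would cover that case.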