Regex to catch dollar amount catching # symbol too

I'm trying to match simple dollar strings ($34.21). My regex is as follows.
\$\d+.([0-9][0-9])
I'm catching the following string though and I don't know why.
#$23.23
Does the # symbol have some kind of special meaning? I don't see it on my regex cheat sheet and this is bugging me.
Thanks, mj

You should escape the dot (\.), since an unescaped . matches any character, and you probably need to add start (^) and end ($) anchors around the pattern:
^\$\d+\.([0-9][0-9])$
The anchors are used to ensure that no other characters are allowed in the input string before or after the matched string.
Also, depending on the exact language / platform you're using*, this can probably be further simplified to:
^\$\d+\.(\d\d)$
* Some regex engines treat \d as equivalent to [0-9], while on others it will match any Unicode digit, including those from other numeral systems.
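To see why #$23.23 was matched, here is a minimal sketch in Python (the choice of Python's re module and the names LOOSE, STRICT and is_dollar_amount are assumptions made for illustration, not part of the question). The original pattern is unanchored, so a search simply finds $23.23 inside the longer string; the # is never part of the match.

```python
import re

# Original pattern: unanchored, and the dot is unescaped (matches any character).
LOOSE = re.compile(r"\$\d+.([0-9][0-9])")
# Corrected pattern: literal dot, anchored at both ends.
STRICT = re.compile(r"^\$\d+\.([0-9][0-9])$")


def is_dollar_amount(text: str) -> bool:
    """Return True if the whole string looks like $<digits>.<two digits>."""
    return STRICT.search(text) is not None


# The loose pattern finds "$23.23" inside "#$23.23"; the "#" is simply skipped.
found = LOOSE.search("#$23.23")
print(found.group(0))                # $23.23
print(is_dollar_amount("#$23.23"))   # False
print(is_dollar_amount("$34.21"))    # True

# The unescaped dot also lets other separators through.
print(LOOSE.search("$34x21") is not None)   # True
print(is_dollar_amount("$34x21"))           # False

# In Python 3, \d on str patterns matches any Unicode decimal digit
# unless the re.ASCII flag is given (U+0663 is ARABIC-INDIC DIGIT THREE).
print(re.fullmatch(r"\d", "\u0663") is not None)            # True
print(re.fullmatch(r"\d", "\u0663", re.ASCII) is not None)  # False
```

Other engines may differ on the last point, which is why [0-9] is the safer spelling when only ASCII digits should be accepted.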

Use line start and line end anchors to make sure you don't match unwanted input:
^\$\d+\.([0-9][0-9])$
OR
^\$\d+\.\d{1,2}$

Related

Regex to match a word or a dot

This should be a fairly trivial question, but I have spent quite some time on it and I'm unable to do it -
If this is my string -
"this/DT word/NN is/VBZ a/DT dot/NN ./."
I want to extract the immediate neighbors of /, be it a word, comma or a full stop.
(\\w+)/(\\w+) gives the words before and after / but not the full stops etc.
I tried this - "\\.\\/\\.|(\\w+)/(\\w+)" for grabbing the full stops but it doesn't seem to work.
Can someone help please? (I am trying this in R.)
Thanks!
Note that \w only matches letters, digits and an underscore. A dot/period belongs to punctuation and can be captured with Perl-like \p{P} or POSIX class [:punct:]. Thus, theoretically, you could use something like ([\\w[:punct:]]+)/([\\w[:punct:]]+) (or even a more POSIXish ([[:alpha:][:punct:]]+)/([[:alpha:][:punct:]]+)), but I guess matching non-whitespace characters on both sides of / suits your purpose best.
Here is an alternative to the (\\S+)/(\\S+) regex:
([^\\s]+)/([^\\s]+)
See regex demo
The [^\s] means any symbol other than a whitespace. Note that \S likewise means any non-whitespace character.
If either side of / may be empty (no non-whitespace characters at all), I believe
([^\\s]*)/([^\\s]*)
or
(\\S*)/(\\S*)
will work better for you since * will match 0 or more characters.
See another demo
You can use this regex
"(\\S+)/(\\S+)"
i.e. grab each non-space text before and after /.
RegEx Demo
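For comparison, the difference between \w+ and \S+ can be checked quickly outside R. The following is a small Python sketch (the variable names are made up for illustration; note that Python raw strings take a single backslash where R string literals need two):

```python
import re

sentence = "this/DT word/NN is/VBZ a/DT dot/NN ./."

# \w+ only covers letters, digits and underscore, so "./." is skipped.
words_only = re.findall(r"(\w+)/(\w+)", sentence)

# \S+ takes any run of non-whitespace characters on each side of the slash.
all_tokens = re.findall(r"(\S+)/(\S+)", sentence)

print(words_only)
# [('this', 'DT'), ('word', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('dot', 'NN')]
print(all_tokens)
# [('this', 'DT'), ('word', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('dot', 'NN'), ('.', '.')]
```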

Perl Reg Expressions, word anchor and special characters

I am trying to write a script to detect variable types and have been having an issue with my regular expression.
if(/\$\b([a-zA-Z]|_ )(\w)*\b /x && !/ /)
is what I am using to detect a scalar. The problem right now though is that \b \b doesn't seem to be working with the special characters (!##$, etc). For example it would count $var### as a valid name. Any ideas?
The regex you have is correct. All you need to do is anchor it to the start and end of the string:
^\$\b([a-zA-Z]|_ )(\w)*\b$
example http://regex101.com/r/uD1eR7/1
Changes made
^ anchors the regex at the start of the string
$ anchors the regex at the end of the string
Note that you can also move the underscore _ into the character class and remove the word boundaries, since they add nothing once the pattern is anchored:
^\$[a-zA-Z_]\w*$
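The effect of the anchors can be sketched in Python rather than Perl (an assumption made only so the example is easy to run; UNANCHORED, SCALAR_NAME and is_scalar_name are hypothetical names). Without anchors, the pattern is satisfied by the $var prefix of $var###, because \b sits happily between r and #:

```python
import re

# The question's pattern without anchors (whitespace from /x removed).
UNANCHORED = re.compile(r"\$\b([a-zA-Z]|_)(\w)*\b")
# The simplified, anchored pattern from the answer.
SCALAR_NAME = re.compile(r"^\$[a-zA-Z_]\w*$")


def is_scalar_name(token: str) -> bool:
    """Return True if the whole token is $ followed by a valid identifier."""
    return SCALAR_NAME.search(token) is not None


# The unanchored pattern matches only the leading "$var" and ignores "###".
print(UNANCHORED.search("$var###").group(0))  # $var

print(is_scalar_name("$var"))     # True
print(is_scalar_name("$_tmp1"))   # True
print(is_scalar_name("$var###"))  # False
print(is_scalar_name("$1abc"))    # False
```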

Regex to extract only text after string and before space

I want to match text after given string. In this case, the text for lines starting with "BookTitle" but before first space:
BookTitle:HarryPotter JK Rowling
BookTitle:HungerGames Suzanne Collins
Author:StephenieMeyer BookTitle:Twilight
Desired output is:
HarryPotter
HungerGames
I tried: "^BookTitle(.*)" but it's giving me matches where BookTitle: is in middle of line, and also all the stuff after white space. Anyone help?
You can use a positive lookbehind in your pattern:
(?<=BookTitle:).*?(?=\s)
For more info: Lookahead and Lookbehind Zero-Width Assertions
What language is this?
And provide some code, please; with the ^ anchor you should definitely only be matching on strings that begin with BookTitle, so something else is wrong.
If you can guarantee that all whitespace is stripped from the titles, as in your examples, then ^BookTitle:(\S+) should work in many languages.
Explanation:
^ requires the match to start at the beginning of the string, as you know.
\s - lowercase means: match on whitespace (space, tab, etc.)
\S - uppercase means the inverse: match on anything BUT whitespace.
\w is another possibility: match on a word character (alphanumeric plus underscore) - but that will fail you if, for example, there's an apostrophe in the title.
+, as you know, is a quantifier meaning "at least one of".
Hope that helps.
With the 'multi-line' regex option use something like this:
^BookTitle:([^\s]+)
Without multi-line option, this:
(?:^|\n)BookTitle:([^\s]+)
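Since the question does not say which language is in use, here is a hedged comparison in Python (chosen only for illustration; the variable names are invented). It runs the patterns suggested above against the sample lines, with a trailing newline added after the last line as an assumption about the input:

```python
import re

listing = (
    "BookTitle:HarryPotter JK Rowling\n"
    "BookTitle:HungerGames Suzanne Collins\n"
    "Author:StephenieMeyer BookTitle:Twilight\n"
)

# With the multi-line flag, ^ matches at the start of every line.
anchored = re.findall(r"^BookTitle:(\S+)", listing, flags=re.MULTILINE)

# Without the flag, accept either the start of the string or a preceding newline.
no_flag = re.findall(r"(?:^|\n)BookTitle:([^\s]+)", listing)

# The lookbehind pattern is not tied to the start of a line, so it also
# picks up the BookTitle: that appears in the middle of the third line.
lookbehind = re.findall(r"(?<=BookTitle:).*?(?=\s)", listing)

print(anchored)    # ['HarryPotter', 'HungerGames']
print(no_flag)     # ['HarryPotter', 'HungerGames']
print(lookbehind)  # ['HarryPotter', 'HungerGames', 'Twilight']
```

Only the line-anchored variants give the desired output for this input.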

RegEx :- Need to write a regex which should not allow digits

I need to write a regular expression which should not allow any digits. It should allow any other characters except digits. I tried an expression like ~[0-9]+
but it restricts everything. Could you please help me?
It is not clear what flavor of regex you need, but in general, one of the following should work:
^[^0-9]*$
^[^\d]*$
^\D*$
^[[:^digit:]]*$
^\P{IsDigit}*$
The last two forms will work with Unicode digits.
The atom [^0-9] matches anything but a digit; to make sure that in the whole string there are no digits, I added the markers of string start (^) and end ($).
If you want to match any part of a string that contains at least one character that is not a digit, replace the ^...*$ part of the regex by ...+:
[^0-9]+
\D+
etc.
Try [^0-9]+. Note that this will only prevent ASCII digits from appearing, not Unicode ones.
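The ASCII versus Unicode distinction is easy to see in a short Python sketch (Python is an assumption here, as the question names no flavor, and the function names are hypothetical). Python's standard re module does not accept the [[:^digit:]] or \P{IsDigit} spellings, so only the first three forms apply; re.fullmatch stands in for the ^...$ anchors.

```python
import re


def has_no_ascii_digits(text: str) -> bool:
    """True if the string contains none of the characters 0-9."""
    return re.fullmatch(r"[^0-9]*", text) is not None


def has_no_digits(text: str) -> bool:
    """True if the string contains no Unicode decimal digit at all."""
    return re.fullmatch(r"\D*", text) is not None


print(has_no_ascii_digits("hello, world"))  # True
print(has_no_ascii_digits("abc123"))        # False
print(has_no_ascii_digits(""))              # True (* allows the empty string)

# U+0663 is ARABIC-INDIC DIGIT THREE: not in 0-9, but still a digit for \D.
print(has_no_ascii_digits("\u0663"))        # True
print(has_no_digits("\u0663"))              # False

# The unanchored + form finds a digit-free run inside a longer string.
print(re.search(r"\D+", "123abc456").group(0))  # abc
```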

Regular expression explanation required

I came across this regular expression which is used to check for alphabetic strings. Can anyone explain how it works to me?
/^\pL++$/uD
Thanks.
\pL+ (where \pL is sometimes written as \p{L}) matches one or more Unicode letters. I prefer \p{L} to \pL because there are other Unicode properties like \p{Lu} (uppercase letter) that only work with the braces; \pLu would mean "a Unicode letter followed by the letter u".
The additional + makes the quantifier possessive, meaning that it will never relinquish any characters it has matched, even if that means an overall match will fail. In the example regex, this is unnecessary and can be omitted.
^ and $ anchor the match at the start and end of the string, ensuring that the entire string has to consist of letters. Without them, the regex would also match a substring surrounded by non-letters.
The entire regex is delimited by slashes (/). After the trailing slash, PHP regex options follow. u is the Unicode option (necessary to handle the Unicode property). D ensures that the $ only matches at the very end of the string (otherwise it would also match right before the final newline in a string if that string ends in a newline).
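The trailing-newline behaviour that the D option guards against is not specific to PHP. As a rough illustration, here is a Python sketch; Python's standard re module does not support \pL, so the ASCII class [a-z] stands in for it, and \Z is used as the closest analogue of D. Both substitutions are assumptions made for this example.

```python
import re

text = "abc\n"

# A plain $ also matches just before a newline at the very end of the string.
print(re.search(r"^[a-z]+$", text) is not None)    # True

# \Z only matches at the very end, so the trailing newline is rejected.
print(re.search(r"^[a-z]+\Z", text) is not None)   # False

# re.fullmatch requires the whole string to match and behaves like the \Z form.
print(re.fullmatch(r"[a-z]+", text) is not None)   # False
print(re.fullmatch(r"[a-z]+", "abc") is not None)  # True
```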
Looks like PCRE flavor.
According to RegexBuddy:
Assert position at the beginning of the string «^»
A character with the Unicode property “letter” (any kind of letter from any language) «\pL++»
Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
This looks like Unicode processing. I found a neat article that seems to explain \pL; the rest are anchors and repetition characters, which are also explained on the same site:
http://www.regular-expressions.info/unicode.html
Enjoy