regexpr and only matching prices and not other digits - regex

I'm trying to come up with code that will extract only the price from a line of text.
Motivated by RegEx for Prices?, I came up with the following command:
gregexpr('\\d+(\\.\\d{1,2})', '23434 34.232 asdf 3.12 ')
[[1]]
[1] 7 19
attr(,"match.length")
[1] 5 4
attr(,"useBytes")
[1] TRUE
However, in my case, I would only like 3.12 to match and not 34.232. Any suggestions?

I think this should work:
'\\d+\\.\\d{1,2}(?!\\d)'

\\d+\\.\\d{1,2}(?!\\d)
Negative lookahead is only supported in R when perl = TRUE is passed, so here is an alternative that works without it (note that it also consumes the character after the price):
\\d+\\.\\d{1,2}(?:[^\\d]|$)

One or more digits followed by a point, followed by 1 or 2 digits, followed by white space or end of string:
\\d+\\.\\d{1,2}(\\s|$)
Edit: as per comments, R uses double-escape
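As a rough cross-check of the lookahead answer, here is a minimal Python sketch (Python rather than R, so the backslashes are not doubled). The function name find_prices is made up for illustration; the pattern and the sample string are the ones from the question and answer above.

```python
import re

# Same pattern as the lookahead answer, written as a Python raw string.
# In R the equivalent call would need perl = TRUE for the lookahead to work.
PRICE = re.compile(r"\d+\.\d{1,2}(?!\d)")

def find_prices(text):
    """Return decimals with 1 or 2 fractional digits that are not followed by another digit."""
    return PRICE.findall(text)

print(find_prices("23434 34.232 asdf 3.12 "))  # ['3.12']
```

The lookahead rejects 34.232 because every way of taking one or two fractional digits leaves another digit directly behind the match.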

Related

Regex with one open and close bracket within an number

For a few days now I have been fighting with regular expressions without any success.
My first expression should allow:
brackets exactly once, no matter where
optional text or numbers before and after the brackets
only numbers within the brackets
Examples of what is allowed:
[32] text1
text1 [5]
text1 [103] text2
text1
[123]
[some value [33]] (maybe too complicated, and not so important?)
My second expression is similar, but with just numbers before and after the brackets instead of text:
[32] 11
11 [5]
11 [103] 22
11
[123]
no match:
[12] xxx [5] (more than one pair of brackets)
[aa] xxx (no number within brackets)
This is what I tried, but it does not work because I don't know how to require that the brackets appear only once:
^.*\{?[0-9]*\}.*$
In another answer I also found the following, which looks good, but I need it to accept only numbers inside:
^[^\{\}]*\{[^\{\}]*\}[^\{\}]*$
For additional context, in case it matters: later I want to take the number in the brackets and replace it with some other value.
I hope someone can help me. Thanks in advance!
This is what you want:
^([^\]\n]*\[\d+\])?[^[\n]*$
Update: For just numbers:
^[\d ]*(\[\d+\])?[\d ]*$
Explanation:
^ Start of line
[^...] Negated character set --> [^\]] Any character except ]
* Zero or more repetitions of the preceding class or character set
\d 0-9
+ One or more repetitions of the preceding class or character set
(...)? 0 or 1 occurrences of the group
$ End of line
Note: These RegExs can return empty matches.
Thanks to #MMMahdy-PAPION! He improved the answer.
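To check both patterns against the examples from the question, a small Python sketch could look like the following. The names TEXT_FORM, NUM_FORM and is_valid are invented here; the only change to the patterns is that the [ inside the second character set is escaped, which is equivalent.

```python
import re

# Text variant: at most one [digits] group, with optional text around it.
TEXT_FORM = re.compile(r"^([^\]\n]*\[\d+\])?[^\[\n]*$")
# Numbers-only variant: only digits and spaces around the optional group.
NUM_FORM = re.compile(r"^[\d ]*(\[\d+\])?[\d ]*$")

def is_valid(line, pattern=TEXT_FORM):
    """True if the whole line is accepted by the given pattern."""
    return pattern.match(line) is not None

allowed = ["[32] text1", "text1 [5]", "text1 [103] text2",
           "text1", "[123]", "[some value [33]]"]
rejected = ["[12] xxx [5]", "[aa] xxx"]

for line in allowed + rejected:
    print(repr(line), is_valid(line))
```

As the note above says, both patterns also accept an empty line, so a caller that needs a non-empty input would have to check for that separately.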

How can I use Regex to capture a certain set of ages?

I have a set of data, like below;
1
2
3
4
5
6
7
8
9
10
1,1
1,2
1,3
2,12
11,13,15
7,8,12
And so on... I am trying to use a regex to target a certain set of ages between 1 and 7, but I am also getting matches on any multi-digit number that contains one of these digits. My regex is currently as below:
/^(1)|(2)|(3)|(4)|(5)|(6)|(7)|$/g
My current matches include 1,2,3,4,5,6,7 - perfect. However, it matches the line with 11,13,15 and 7,8,12 - not what I wanted.
Any advice would be appreciated on how to resolve? Thanks in advance, I am continuing to try to correct.
You can use word boundaries:
\b[1-7]\b
See a demo on regex101.com.
As pointed out by #Quantic, this matches numbers from 1-7 regardless of where they are.
If you only want lines that consist of a single number between 1 and 7, you'll need to use anchors:
^[1-7]$
Or if you want to capture the number:
^([1-7])$
With this, you'll need the multiline flag; see a demo on regex101.com as well.
(?<!\d)[1-7](?!\d)
This looks for any digit 1-7 that does not have another digit on either side of it. (using negative lookbehind/lookahead)
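The difference between the three approaches can be seen in a short Python sketch. The variable names are illustrative and the data is the sample from the question; the question itself appears to use JavaScript-style regex literals, so this is only a translation.

```python
import re

data = "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n1,1\n1,2\n1,3\n2,12\n11,13,15\n7,8,12"

# Anchored: only lines that consist of exactly one digit from 1 to 7.
whole_lines = re.findall(r"^[1-7]$", data, flags=re.MULTILINE)

# Word boundaries and lookarounds: any standalone 1-7, wherever it sits in a line.
by_boundary = re.findall(r"\b[1-7]\b", "7,8,12")
by_lookaround = re.findall(r"(?<!\d)[1-7](?!\d)", "7,8,12")

print(whole_lines)     # ['1', '2', '3', '4', '5', '6', '7']
print(by_boundary)     # ['7']
print(by_lookaround)   # ['7']
```

Neither the word-boundary nor the lookaround pattern matches anything in 11,13,15, since every digit there has another digit next to it.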

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK,ADK. PADK and ADK will change in character but will always be 4 and 3 characters in length respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})
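The capturing-group version can be tried outside R as well. The sketch below uses Python, whose re module has no \K, so only the first pattern is shown; the function name extract_station is hypothetical.

```python
import re

# 2-letter word, then the wanted text, then a 4-letter word and a 3-letter word.
PATTERN = re.compile(r"\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b")

def extract_station(line):
    """Return the text between the leading 2-letter code and the 4+3 letter codes."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

testing_string = "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
print(extract_station(testing_string))  # ADAK NAS
```

The lazy .+? grows one character at a time until the rest of the pattern fits, which is why it stops at ADAK NAS instead of running on to the end of the line.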

Regex - how to make sure a string contain a word and numbers

I need a little help with Regex.
I want the regex to validate the following sentences:
fdsufgdsugfugh PCL 6
dfdagf PCL 11
fdsfds PCL6
fsfs PCL13
kl;klkPCL6
fdsgfdsPCL13
That is: some characters, then PCL, and then 6 or a greater number.
How can this be done?
I'd go with something like this:
^(.*)(PCL *)([6-9][0-9]*|[1-5][0-9]+)$
Meaning:
(.*) = some chars
(PCL *) = then PCL with optional whitespaces afterwards
([6-9][0-9]*|[1-5][0-9]+) then 6 or a greater number
This one should suit your needs:
^.*PCL\s*(?:[6-9]|\d{2,})$
In bash:
EXPR='^[a-zA-Z]\+ *PCL *\([6-9]\|[0-9]\{2,\}\)'
Translated:
The line begins with at least one letter (upper or lower case)
Any number of spaces, PCL, any number of spaces
Either a digit from 6 to 9, or a number with at least 2 digits
This expression used with something like grep "$EXPR" file.txt will output in stdout the lines that are valid.
This worked well for me, and it reads logically according to the way you described the matching:
/[^PCL]+PCL\s*(?:[6-9]|[1-9]\d+)/
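To see how the first answer's pattern behaves on the sample sentences, here is a minimal Python sketch. The helper pcl_version is invented for illustration; it returns the captured number so that the "6 or greater" rule can be checked numerically as well.

```python
import re

# Pattern from the first answer: some chars, PCL, optional spaces, number >= 6.
PCL = re.compile(r"^(.*)(PCL *)([6-9][0-9]*|[1-5][0-9]+)$")

def pcl_version(line):
    """Return the number after PCL as an int, or None if the line is rejected."""
    m = PCL.match(line)
    return int(m.group(3)) if m else None

samples = ["fdsufgdsugfugh PCL 6", "dfdagf PCL 11", "fdsfds PCL6",
           "fsfs PCL13", "kl;klkPCL6", "fdsgfdsPCL13", "abc PCL 5"]
for s in samples:
    print(repr(s), pcl_version(s))
```

The last sample is an added counterexample, not part of the question; it is rejected because a lone digit below 6 fits neither branch of the alternation.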

price regex help

How can I change the regex below so that it also detects single-digit prices such as £7, and not only prices above 9?
/\d[\d\,\.]+/is
Thanks
to match a single digit, you can change it to
/\d[\d,.]*/
the + means require one or more, so that's why the whole thing won't match just a 7. The * is 0 or more, so an extra digit or , or . becomes optional.
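The effect of swapping + for * can be checked with a short Python sketch (the question does not say which language it uses, so Python is an assumption here, and first_match is a made-up helper).

```python
import re

PLUS = re.compile(r"\d[\d,.]+")  # a digit plus at least one more digit, comma or dot
STAR = re.compile(r"\d[\d,.]*")  # a digit plus zero or more of the same

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(PLUS, "£7"))      # None
print(first_match(STAR, "£7"))      # 7
print(first_match(STAR, "£12.50"))  # 12.50
```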
The longer answer might be more complicated. For example, the book Regular Expressions Cookbook has an excerpt on this (remove the ^ and $ if you want it to match the 2 in apple $2 each), but note that when the number is 1000 or more, the , is needed. For example, the first regex won't match 1000.33
(unsourced image from a book removed)
Your expression would allow 123...3456... I think you might want something like (£|\$|€)\d+([,.]\d{2})?
This requires the source to have a currency symbol, and cents, if present, need a separator followed by exactly two digits.
You might look at a regex more like the following.
/(?:\d+[,.]?\d*)|(?:[,.]\d+)/
Test Set:
5.00
$7.00
6123.58
$1
.75
Result Set:
[0] => 5.00
[1] => 7.00
[2] => 6123.58
[3] => 1
[4] => .75
EDIT: Additional Case added
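The result set above can be reproduced with a few lines of Python. This is only a sketch: the answer does not say which language produced its output, and the names PRICE and found are illustrative.

```python
import re

# Digits with an optional single separator, or a separator followed by digits.
PRICE = re.compile(r"(?:\d+[,.]?\d*)|(?:[,.]\d+)")

samples = ["5.00", "$7.00", "6123.58", "$1", ".75"]
found = [PRICE.search(s).group(0) for s in samples]
print(found)  # ['5.00', '7.00', '6123.58', '1', '.75']
```

One limitation worth knowing: the pattern allows only one separator per match, so an input like 1,000.33 is split after 1,000.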