I have a file with lines like
text text text 3424 text text 3423 50 US text 342 text
What I want to match is 50 US (yes, dollars) and ultimately extract that number.
Everything else changes in different lines, there may be more text or less surrounding, but in each line there is only one "US" anchor that I can match.
So what I want to do is to find a way to match US and get the preceding 3 or 4 characters.
Any ideas? Preferably with sed/awk, but any solution will do.
Perl regexes (or anything that understands non-greedy .*? expressions) are easier than sed for this:
perl -pe 's/^.*?(\d+\.?\d*)\s*US.*$/$1/'
That will handle things like "11.23" as well.
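For anyone who would rather do this in Python than Perl, here is a minimal sketch of the same idea. The function name extract_amount is just an illustration, and it assumes each line contains the "US" anchor at most once, as described in the question. Because re.search already scans for the leftmost match, the leading .*? from the Perl version is not needed.

```python
import re

# A number (optionally with a decimal part), optional whitespace, then "US".
AMOUNT_BEFORE_US = re.compile(r"(\d+\.?\d*)\s*US")


def extract_amount(line):
    """Return the number immediately before "US" as a string, or None."""
    match = AMOUNT_BEFORE_US.search(line)
    return match.group(1) if match else None


print(extract_amount("text text text 3424 text text 3423 50 US text 342 text"))
```

The result is returned as a string; convert it with float() if you need arithmetic.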
\d+ US
This should work given that US is present only once in the string.
Use lookarounds:
\d+(?= US)
This regex matches only the numeric amount. The lookahead (?= US) asserts that " US" follows the digits, but does not include it in the match.
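As a rough illustration of the lookahead, here is how it might look in Python, whose re module supports (?=...). The helper name amount_via_lookahead is made up for this sketch.

```python
import re


def amount_via_lookahead(line):
    """Return the digits directly followed by " US", or None."""
    # The lookahead checks for " US" without making it part of the match,
    # so group(0) holds only the digits.
    match = re.search(r"\d+(?= US)", line)
    return match.group(0) if match else None


print(amount_via_lookahead("text text text 3424 text text 3423 50 US text 342 text"))
```

No capture group is needed here because the whole match is already the number.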
This is what you could use in the VBA regex flavor, which also supports lookaheads:
" (\S+)(?= US)"
It starts with a space.
Next is the capture group, (\S+). I use that instead of \d+ so that values like 5,000 and 11.3 work. In fact, any run of non-space characters works, so if you want the word or number that precedes "US", this is the way to write it. (A greedy .+ in this position would start matching at the first space in the line and swallow everything up to " US".)
Next is the lookahead, which requires the capture group to be immediately followed by " US". When it matches, you get back only the capture group, not the text inside the lookahead.
In my LaTeX work I need to run a regex search with \|(.*?)\| to capture |whatever| and replace it with \somecommand{$1}. But I do not want to match || (that is, when there is nothing between the pipes). How should I refine my regex?
(By the way, what should my title be, so that it is useful for others?)
Change your regex to,
\|[^|]+\|
OR
\|.+\|
Use the second form if you also want to allow pipes inside the matched content.
You have to change the asterisk (which matches zero or more times) to a plus sign, so that the quantifier matches at least one character.
\|(.+?)\|
    ^
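To see the effect of + versus *, here is a small Python sketch of the replacement. The function name wrap_piped and its command parameter are hypothetical, and the replacement is built in a function so the backslash in the LaTeX command does not need extra escaping.

```python
import re


def wrap_piped(text, command="somecommand"):
    """Replace |content| with \\command{content}, leaving an empty || alone."""
    # .+? requires at least one character between the pipes.
    return re.sub(
        r"\|(.+?)\|",
        lambda m: "\\" + command + "{" + m.group(1) + "}",
        text,
    )


print(wrap_piped("see |alpha| and || here"))
```

Be aware that if a line has an empty || followed later by another pipe, .+? can still stretch across them; the [^|]+ form is stricter in that respect.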
I have the following text
Cool Title Here 12345
Other title here 13455
That I want to turn into this using Atom's find and replace
Cool Title Here, 12345
Other title here, 13455
My goal is to select the space between the end of a word and the start of a number. My first instinct is this statement
[A-Za-z][\s][0-9]
However, that also selects the last letter and the first digit, which is no good for this replacement because I would lose data.
How would I match only the space between the two sections using pure regex?
You can capture the letter and the digit, then use backreferences in the replacement to put them back.
So specify the pattern:
([A-Za-z]) ([0-9])
In the replacement:
$1, $2
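The same capture-and-restore idea can be checked outside Atom. Below is a minimal Python sketch; note that Python writes the backreferences as \1 and \2 rather than $1 and $2, and the function name add_comma is only illustrative.

```python
import re


def add_comma(line):
    """Insert a comma between a letter and the digit that follows the space."""
    # Group 1 is the last letter, group 2 the first digit; both are put back.
    return re.sub(r"([A-Za-z]) ([0-9])", r"\1, \2", line)


for title in ["Cool Title Here 12345", "Other title here 13455"]:
    print(add_comma(title))
```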
I am not familiar with the specifics of Atom regular expression processing but some Googling suggests these general regex techniques should work:
You could use \b to identify the word boundary of the preceding word (without capturing it).
You can use (?=\d) to look ahead to the digit without capturing.
so for your example:
\b\s(?=\d)
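I have not verified this in Atom either, but the zero-width version can at least be sanity-checked in Python, where \b and (?=...) behave the same way. The function name comma_before_number is an assumption for this sketch.

```python
import re


def comma_before_number(line):
    """Replace the whitespace that sits between a word and a digit with ', '."""
    # Only the whitespace character is consumed; the lookahead leaves the digit in place.
    return re.sub(r"\b\s(?=\d)", ", ", line)


print(comma_before_number("Cool Title Here 12345"))
```

Since \b also sits after a digit, a line with two space-separated numbers would get a comma between those as well.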
I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex for that. Is there a way to extract something between two spaces, that is, the text or number after the date up to the next whitespace? All I know is that the date will always have the same format, and that there will always be a space after the date and another space after the number I want to extract.
Can someone help me out with this?
Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
                      ^^^
See the regex demo
The 1+ non-whitespace symbols are captured into Group 1.
If - for some reason - you need to only get the value as a whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or - if you are using PCRE - you may leverage the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
                     ^^
See another demo
NOTE: the \K and capture-group approaches allow one or more spaces after the date and are thus more flexible.
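If you happen to be in Python, the standard re module does not support \K, so the capture-group variant is the one to reach for. A minimal sketch follows; the name value_after_date is made up, and the slashes are left unescaped because Python patterns have no delimiter.

```python
import re

# Date-like token, one or more spaces, then the next run of non-whitespace.
AFTER_DATE = re.compile(r"\d{2}/\d{2}/\d{2} +(\S+)")


def value_after_date(line):
    """Return the token that follows the date, or None if there is no date."""
    match = AFTER_DATE.search(line)
    return match.group(1) if match else None


print(value_after_date("SOMEDATA .test 01/45/12 2.50 THIS IS DATA"))
print(value_after_date("SOMEDATA .test 01/45/12 2500 THIS IS DATA"))
```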
I see some people helped you already, but if you would want an alternative working one for some reason, here's what works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
The .+ matches anything up to and including the space before the date.
Then \d+/\d+/\d+ matches the date, followed by a space.
The capturing group is the number. The last part of it, [\.\d]*, is optional, so both decimal values and whole numbers can be matched. Hope this helps!
Proof: https://regex101.com/r/fY3nJ2/1
Just make the fractional part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
Update following clarifications in comments:
To match anything (but space) surrounded by spaces and prepended by date use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
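For what it is worth, the optional-fraction pattern with the lookbehind also runs unchanged in Python, because the lookbehind has a fixed width. This is only a sketch, and the function name number_after_date is an assumption.

```python
import re

# Fixed-width lookbehind for the date, then an integer with an optional fraction.
NUMBER_AFTER_DATE = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\d+(?:\.\d+)?")


def number_after_date(line):
    """Return the number after the date as a string, or None."""
    match = NUMBER_AFTER_DATE.search(line)
    return match.group(0) if match else None


print(number_after_date("SOMEDATA .test 01/45/12 2.50 THIS IS DATA"))
print(number_after_date("SOMEDATA .test 01/45/12 2500 THIS IS DATA"))
```

Unlike the capture-group approach, this only tolerates exactly one space after the date.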
Rather than capture, you can make your entire match be the target text by using a look behind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first series of non-whitespace that follows a "date like" part.
Note also the reduction in the length of the "date like" pattern. You may consider using this part of the regex in whatever solution you use.
I don't know much about regular expressions, and with what I've learned so far I can't solve my whole problem.
I have this string:
04 credits between subjects of block 02
The only thing I'm sure of is that there will be a two-digit number [00-99] at the beginning and at the end.
I want to capture the beginning and the end only IF the middle contains "credits between". The system can receive other formats as input, so I want to be sure the captured fields come from the correct pattern.
This is what I've tried:
(\w\w) ^credits between$.+ (\w\w)
I'm using the RegExr website to see what I'm doing, but without success.
You may use the following regex:
^(\d{2})\b.*credits between.*\b(\d{2})$
See regex demo
It will match and capture 2 digits at the beginning and at the end if the string itself contains credits between. To let the match span newlines, use [\s\S] instead of the dot.
The word boundaries \b just make the engine match the digits only where they are adjacent to a non-word character (you may remove them if that is not the expected behavior). Without the second \b, you would use ^(\d{2})\b.*credits between.*?(\d{2})$ with the lazy .*? before the final group.
If the number of digits in the numbers at both ends can vary, just use
^(\d+).*credits between.*?(\d+)$
See another demo
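Here is a short Python sketch of the variable-length version, in case it helps to see the two groups being pulled out. The function name parse_credits is hypothetical, and converting to int is my own choice, not part of the question.

```python
import re

# Leading number, the required phrase somewhere in the middle, trailing number.
CREDITS = re.compile(r"^(\d+).*credits between.*?(\d+)$")


def parse_credits(line):
    """Return (first, last) as integers, or None if the line has another format."""
    match = CREDITS.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_credits("04 credits between subjects of block 02"))
print(parse_credits("04 some other format 02"))
```

Lines in other input formats simply return None, which covers the "only if the middle has credits between" requirement.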
I want to match and replace a number of four-digit numbers in a CSV file
1,1456,2,3,4,5
2,1455,2,3,4,5
so that all numbers in the 1400s in the second column are mapped to the 200s:
1456 -> 256
1455 -> 255
I have this regex to match the 1400 numbers
',[1][4][0-9][0-9],'
But how can I define the replacement so that it retains the last two digits of the match?
EDIT
Ended up changing the match regex to
,[1][4]([0-9][0-9])
and the match defined as
,2\1
in Notepad++
Replace /14(\d{2})/ with 2\1, where \1 is a backreference to the first capture group. Adapt to your regex flavor of choice.
sed -e 's/,[1][4]\([0-9][0-9]\),/,2\1,/'
Notice how the \( \) syntax captures a part of the matched expression, and \1 is used to say "the first captured data".
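If sed is not available, the same substitution can be sketched in Python. The function name remap_1400 is illustrative; count=1 is used to mirror sed replacing only the first match per line when no g flag is given, and \g<1> is just an unambiguous way of writing \1 next to other digits.

```python
import re


def remap_1400(row):
    """Turn a 14xx field into 2xx, keeping the last two digits."""
    # The parentheses capture the last two digits; \g<1> writes them back.
    return re.sub(r",14([0-9]{2}),", r",2\g<1>,", row, count=1)


for row in ["1,1456,2,3,4,5", "2,1455,2,3,4,5"]:
    print(remap_1400(row))
```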
You need to use a backreference - by surrounding one or more parts of a regex in parentheses, you can later reference them in the output. Here is my final version (works with sed -r).
's/,[1][4]([0-9][0-9])/,2\1/'
You should use a group, i.e. something like
',[1][4]([0-9][0-9]),'
Some regex dialects will let you name groups, e.g. in .NET
',[1][4](?<LastTwoDigits>[0-9][0-9]),'
If you specify which language you are using, it will be easier to help you.
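As one example of a dialect with named groups, Python spells them (?P<name>...) instead of the .NET (?<name>...) form. The sketch below is only an illustration; the group name last_two and both function names are made up.

```python
import re

# Named group holding the last two digits of a 14xx field.
SECOND_COLUMN = re.compile(r",14(?P<last_two>[0-9]{2}),")


def last_two_digits(row):
    """Return the last two digits of the 14xx field, or None."""
    match = SECOND_COLUMN.search(row)
    return match.group("last_two") if match else None


def remap_named(row):
    """Replace the 14xx field with 2xx, referring to the group by name."""
    return SECOND_COLUMN.sub(r",2\g<last_two>,", row, count=1)


print(last_two_digits("1,1456,2,3,4,5"))
print(remap_named("1,1456,2,3,4,5"))
```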