Regex: extract a number between characters

I have a returned string formatted as below:
PR ER
89
>
from which the number can be extracted by using \n(\d+), but sometimes it returns:
23 PR P 10000>
Or, it could be something like:
23
PR P
10000
>
In these scenarios, how can I extract the number 10000 between PR and >?

This might work for you:
\d+(?=\s*>)
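It matches a run of digits only when that run is followed by optional whitespace (newlines included) and a '>'. Because the whitespace and the '>' sit inside a lookahead, they are not part of the match.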
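As a rough illustration, the lookahead pattern can be tried in Java against the three input shapes from the question. The class name NumberBeforeAngle and the helper extract are made up for this sketch, and returning null for "no match" is an arbitrary choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberBeforeAngle {
    // Digits that are followed by optional whitespace (newlines included) and '>'
    private static final Pattern BEFORE_GT = Pattern.compile("\\d+(?=\\s*>)");

    // Hypothetical helper: returns the first such number, or null when there is none
    static String extract(String input) {
        Matcher m = BEFORE_GT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("PR ER\n89\n>"));        // 89
        System.out.println(extract("23 PR P 10000>"));      // 10000
        System.out.println(extract("23\nPR P\n10000\n>"));  // 10000
    }
}
```

The leading 23 is skipped in the last two inputs because it is not followed by optional whitespace and '>'.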

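In Java, if you need it. Note that this pattern reports every number in the string, so 23 is listed as well as 10000:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "23 PR P 10000>";
// Match each run of digits and print it with its start and end offsets
Pattern reg = Pattern.compile("(\\d+)");
Matcher m = reg.matcher(str);
while (m.find()) {
    System.out.println("group : " + m.group() + " - start :" + m.start() + " - end :" + m.end());
}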

I might just answer this myself:
\d+\n>
That worked!
Thanks, all.

Related

Why is string.find_first_of behaving this way?

I am trying to make a (assembly) parser which uses a string as a guide for how to cut the text to get the tokens I want.
string s = "$t4,";
string guide = "$!,$!,$!";
int i = 1;
string test =s.substr(0, s.find_first_of(" ,.\t"+to_string(guide[i+1]) ));
cout << test << "\n";
If s = "$t4", then test = "$t".
What I expect is for test to be "$t4". This works for every other $tX; it fails only for the digit 4, even though 4 should not be in the (" ,.\t"+to_string(guide[i+1])) string.
s.find_first_of(" ,.\t" + std::to_string(guide[i + 1]))
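std::to_string has no overload for char, so guide[i + 1] is promoted to int and converted to its decimal representation. Assuming ASCII, that string will be:
" ,.\t44"
44 is the ASCII value of the ',' in guide[i + 1], so the search set contains '4' instead of ','.
The first of those characters that it finds in "$t4," is the 4 at position 2, and you then create a substring starting at 0 with length 2, that is "$t". To append the character itself, build the set as std::string(" ,.\t") + guide[i + 1].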

Getting the index of a slice

I want to do some processing on a string in Scala. The first stage of that is finding the index of articles such as: "A ", " A ", "a ", " a ". I am trying to do that like this:
"A house is in front of us".indexOfSlice("\\s+[Aa] ")
I think this should return 0, as the substring is first matched in the first position of the string.
However, this returns -1.
Why does it return -1? Is the regex I am using incorrect?
The other answers, as I type this, are missing the point. Your problem is that indexOfSlice doesn't take a regexp, but a sub-sequence to search for in the sequence. So fixing the regexp won't help at all.
Try this:
val pattern = "\\b[Aa]\\b".r.unanchored
for (mo <- pattern.findAllMatchIn("A house is in front of us, a house is in front of us all")) {
  println("pattern starts at " + mo.start)
}
//> pattern starts at 0
//| pattern starts at 27
(with fixed regex, too)
Edit: counter-example for the popular but wrong suggestion of "\\s*[Aa] "
val pattern2 = "\\s*[Aa] ".r.unanchored
for (mo <- pattern2.findAllMatchIn("The agenda is hidden")) {
  println("pattern starts at " + mo.start)
}
//> pattern starts at 9
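The literal-versus-regex distinction made in this answer can also be sketched with java.util.regex, which Scala's own regex support is built on. The class name ArticleIndexes and the helper articleStarts are invented for this example. String.indexOf plays the role of indexOfSlice here: both search for the characters literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleIndexes {
    // Hypothetical helper: start offsets of every standalone "A" or "a"
    static List<Integer> articleStarts(String text) {
        Matcher m = Pattern.compile("\\b[Aa]\\b").matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "A house is in front of us, a house is in front of us all";
        // A literal search looks for the characters \b[Aa]\b themselves: -1
        System.out.println(text.indexOf("\\b[Aa]\\b"));
        // A regex search finds the two articles: [0, 27]
        System.out.println(articleStarts(text));
        // Word boundaries keep the 'a' letters inside "agenda" out: []
        System.out.println(articleStarts("The agenda is hidden"));
    }
}
```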
I see a mistake in your regex. Your regex is searching for:
at least one whitespace character (\s+)
a letter (either A or a)
But the string you are matching doesn't contain a space at the beginning. That's why it returns -1 rather than index 0.
You could write your regex as "^\\s*[Aa] "
Here is an example:
import java.util.regex.Pattern
val text = "A house is in front of us"
val matcher = Pattern.compile("^\\s*[Aa] ").matcher(text)
// Start at -1 so that "no match" is distinguishable from a match at index 0
var idx = -1
if (matcher.find()) {
  idx = matcher.start()
}
println(idx)
It prints 0, as expected.

Regex to match all " - " delimiters in filename except first and last?

I've been trying to write a regex to match all the " - " delimiters in a filename except the first and last, so I can combine all the data in the middle into one group. For example, a filename like:
Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
Has to become:
Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
So basically I'm trying to replace " - " with "- ", but not the first or last instance. The filenames can have 1 to 6 " - " delimiters, but only the ones with 3, 4, 5 or 6 " - " delimiters should be affected.
It's for use in File Renamer; the regex flavor is JavaScript. Thanks.
Is it acceptable not to use a regex? If so:
var s = "Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc";
var p = s.split(' - ');
var r = ''; // result output
p.forEach(function (e, i) {
    switch (i) {
        case 0: r += e; break; // first token: nothing goes before it
        case 1: case p.length - 1: r += ' - ' + e; break; // first and last separators stay " - "
        default: r += '- ' + e; // middle separators become "- "
    }
});
console.log(r);
http://jsfiddle.net/c7zcp8z6/1/
s=Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
r=Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
This assumes that the separator is always " - " (1 space, 1 dash, 1 space). If not, you need to split on "-" only, then trim each token before reconstructing.
Two options:
1 - You'll need to do some processing of your own by iterating through the matches using
( - )
and building a new string (see this post about getting match indices).
You'll have to check that the match count is greater than 2 and skip the first and last matches.
2 - Use
.+ - ((?:.+ - )+).+ - .+
to get the part of the string to be modified, then do a replace on the dashes and build your string (again using the indices from the above regex).
Thanks for the suggestions.
I got it to work this way:
It replaces the first and last " - " with " ! ", so I can then do a simple find and replace of all remaining " - " with "- ", then change all the " ! " back to " - ".
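For comparison, the same "leave the first and last delimiter alone" idea can be sketched without placeholders by locating the two outer delimiters directly. This is written in Java rather than the JavaScript flavor used by File Renamer, and the class name MiddleDelimiters and the method tighten are made up for the example.

```java
public class MiddleDelimiters {
    // Hypothetical helper: turns every " - " except the first and last into "- "
    static String tighten(String name) {
        String sep = " - ";
        int first = name.indexOf(sep);
        int last = name.lastIndexOf(sep);
        // No delimiter, only one, or two that overlap: nothing in the middle to change
        if (first < 0 || last < first + sep.length()) {
            return name;
        }
        String head = name.substring(0, first + sep.length());     // up to and including the first delimiter
        String middle = name.substring(first + sep.length(), last); // everything between the outer delimiters
        String tail = name.substring(last);                         // the last delimiter and the rest
        return head + middle.replace(sep, "- ") + tail;
    }

    public static void main(String[] args) {
        System.out.println(tighten("Ann M Martin - Baby sitters Club - Baby sitters Little Sister"
                + " - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc"));
    }
}
```

Names with one or two delimiters come back unchanged, since there is nothing between the first and last delimiter to rewrite.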

Regular expression to limit number of digits

I am trying to write a regular expression that matches only qtr1, qtr2, qtr3 and qtr4, using the regex [q|qtr|qtrs|quarter]+[1-4]. The problem is that if I ask something like "Ficoscore for Q21 2005", a space is added between Q and 21, i.e. "Ficoscore for Q 21 2005", which is not valid.
String regEx = "([q|qtr|qtrs|quarter]+[1-4])";
Pattern pattern = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(userQuerySentence);
System.out.println(matcher.matches());
while (matcher.find()) {
    String quarterString = matcher.group();
    userQuerySentence = userQuerySentence.replaceAll(quarterString,
            quarterString.substring(0, quarterString.length() - 1) + " "
                    + quarterString.substring(quarterString.length() - 1));
}
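[q|qtr|qtrs|quarter] is a character class: it matches any single one of the characters q, t, r, s, u, a, e or |, not the alternatives you listed. I guess you want (q|qtr|qtrs|quarter). With word boundaries added, Q21 no longer matches: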
String regEx = "(?i)\\b((?:q(?:trs?|uarter)?)[1-4])\\b";
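A small sketch of how this pattern might be used for the space-insertion step in the question. It adapts the regex slightly by capturing the prefix and the digit in separate groups, so that replaceAll can put a space between them. The class name QuarterSpacer and the method addSpace are invented for the example.

```java
import java.util.regex.Pattern;

public class QuarterSpacer {
    // Same alternatives as the regex above; group 1 is the prefix, group 2 the digit
    private static final Pattern QUARTER =
            Pattern.compile("(?i)\\b(q(?:trs?|uarter)?)([1-4])\\b");

    // Hypothetical helper: "Q2" becomes "Q 2", while "Q21" is left untouched
    static String addSpace(String sentence) {
        return QUARTER.matcher(sentence).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(addSpace("Ficoscore for Q2 2005"));   // Ficoscore for Q 2 2005
        System.out.println(addSpace("Ficoscore for Q21 2005"));  // Ficoscore for Q21 2005
        System.out.println(addSpace("Sales in quarter4"));       // Sales in quarter 4
    }
}
```

Q21 is left alone because the trailing \b cannot match between the digits 2 and 1.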

Boost Regex unknown number of var

I have an issue with a regex and need some help. I have some expressions like these in my .txt file:
19 = NAND (1, 19)
Regex: http://rubular.com/r/U8rO09bvTO
With this regex I get separate matches for the numbers.
But now I need a regex that handles an unknown amount of numbers in the brackets.
For example:
19 = NAND (1, 23, 13, 24)
match1: 19, match2: 1, match3: 23, match4: 13, match5: 24
I don't know how many numbers there will be, so I need one expression that covers anything from a minimum of 2 numbers in the brackets up to an unknown count.
By the way, I'm using C++.
@Martjin: your first regex worked very well, thanks.
Here is my code:
boost::cmatch result;
boost::regex matchNand ("([0-9]*) = NAND \\((.*?)\\)");
boost::regex matchNumb ("(\\d+)");
string cstring = "19 = NAND (1, 23, 13, 24)";
boost::regex_search(cstring.c_str(), result, matchNand);
cout << "NAND: " << result[1] << "=" << result[2] << endl;
string str = result[2];
boost::regex_search(str.c_str(), result, matchNumb);
cout << "NUM: " << result[1] << "," << result[2]<< "," << result[3] << "," << result[4] << endl;
My output:
NAND: 19=1, 23, 13, 24
NUM: 1,,,
So my new problem is that I only find the first number.
The result is also the complete opposite of this solution: http://rubular.com/r/nqXDSuBXjc
A simple approach (and maybe clearer than a single regex) is to split this into two regexes.
First run a regex that splits your result from your arguments:
http://rubular.com/r/YkGdkkg4y3
([0-9]*) = NAND \((.*?)\)
Then perform a regex that will match all the numbers in your argument: http://rubular.com/r/2vpSbZvz12
\d+
Assuming you're using Ruby, you can perform a regex that matches multiple times with the function scan as explained here: http://ruby-doc.org/core-1.9.3/String.html#method-i-scan
Of course you could just use the second regex with the scan function to get all the numbers from that line, but I'm guessing you're going to expand it even more, which is when this approach will be a little more structured.
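The two-step approach above, sketched in Java (not Boost or Ruby), where calling find() in a loop plays the role of Ruby's scan. The class name NandParser and the method parse are hypothetical, and [0-9]+ is used instead of [0-9]* so that an empty left-hand side is not accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NandParser {
    // Step 1: split the result number from the argument list
    private static final Pattern LINE = Pattern.compile("([0-9]+) = NAND \\((.*?)\\)");
    // Step 2: match each number inside the argument list
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: the result first, then every argument; empty if the line does not match
    static List<Integer> parse(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher lineMatch = LINE.matcher(line);
        if (!lineMatch.find()) {
            return numbers;
        }
        numbers.add(Integer.parseInt(lineMatch.group(1)));
        Matcher arg = NUMBER.matcher(lineMatch.group(2));
        while (arg.find()) { // repeated find() walks over all matches, however many there are
            numbers.add(Integer.parseInt(arg.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parse("19 = NAND (1, 23, 13, 24)")); // [19, 1, 23, 13, 24]
    }
}
```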