Regular Expressions: querystring parameters matching - regex

I'm trying to learn something about regular expressions.
Here is what I'm going to match:
/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
My expression should "grabs" abc123 and def456.
And now just an example about what I'm not going to match ("question mark" is missing):
/parent/child/firstparam=abc123&secondparam=def456
Well, I built the following expression:
^(?:/parent/child){1}(?:^(?:/\?|\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?
But that doesn't work.
Could you help me to understand what I'm doing wrong?
Thanks in advance.
UPDATE 1
Ok, I made other tests.
I'm trying to fix the previous version with something like this:
/parent/child(?:(?:\?|/\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?$
Let me explain my idea:
Must start with /parent/child:
/parent/child
Following group is optional
(?: ... )?
The previous optional group must starts with ? or /?
(?:\?|/\?)+
Optional parameters (I grab values if specified parameters are part of querystring)
(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?
End of line
$
Any advice?
UPDATE 2
My solution must be based just on regular expressions.
Just for example, I previously wrote the following one:
/parent/child(?:[?&/]*(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*))*$
And that works pretty nice.
But it matches the following input too:
/parent/child/firstparam=abc123&secondparam=def456
How could I modify the expression in order to not match the previous string?

You didn't specify a language so I'll just usre Perl. So basically instead of matching everything, I just matched exactly what I thought you needed. Correct me if I am wrong please.
while ($subject =~ m/(?<==)\w+?(?=&|\W|$)/g) {
# matched text = $&
}
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
= # Match the character “=” literally
)
\\w # Match a single character that is a “word character” (letters, digits, and underscores)
+? # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
& # Match the character “&” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
\\W # Match a single character that is a “non-word character”
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
Output:

This regex will work as long as you know what your parameter names are going to be and you're sure that they won't change.
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?
Whilst regex is not the best solution for this (the above code examples will be far more efficient, as string functions are way faster than regexes) this will work if you need a regex solution with up to 3 parameters. Out of interest, why must the solution use only regex?
In any case, this regex will match the following strings:
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
It will now only match those containing query string parameters, and put them into capture groups for you.
What language are you using to process your matches?
If you are using preg_match with PHP, you can get the whole match as well as capture groups in an array with
preg_match($regex, $string, $matches);
Then you can access the whole match with $matches[0] and the rest with $matches[1], $matches[2], etc.
If you want to add additional parameters you'll also need to add them in the regex too, and add additional parts to get your data. For example, if you had
/parent/child/?secondparam=def456&firstparam=abc123&fourthparam=jkl01112&thirdparam=ghi789
The regex will become
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?
This will become a bit more tedious to maintain as you add more parameters, though.
You can optionally include ^ $ at the start and end if the multi-line flag is enabled. If you also need to match the whole lines without query strings, wrap this whole regex in a non-capture group (including ^ $) and add
|(?:^\/parent\/child\/?\??$)
to the end.

You're not escaping the /s in your regex for starters and using {1} for a single repetition of something is unnecessary; you only use those when you want more than one repetition or a range of repetitions.
And part of what you're trying to do is simply not a good use of a regex. I'll show you an easier way to deal with that: you want to use something like split and put the information into a hash that you can check the contents of later. Because you didn't specify a language, I'm just going to use Perl for my example, but every language I know with regexes also has easy access to hashes and something like split, so this should be easy enough to port:
# I picked an example to show how this works.
my $route = '/parent/child/?first=123&second=345&third=678';
my %params; # I'm going to put those URL parameters in this hash.
# Perl has a way to let me avoid escaping the /s, but I wanted an example that
# works in other languages too.
if ($route =~ m/\/parent\/child\/\?(.*)/) { # Use the regex for this part
print "Matched route.\n";
# But NOT for this part.
my $query = $1; # $1 is a Perl thing. It contains what (.*) matched above.
my #items = split '&', $query; # Each item is something like param=123
foreach my $item (#items) {
my ($param, $value) = split '=', $item;
$params{$param} = $value; # Put the parameters in a hash for easy access.
print "$param set to $value \n";
}
}
# Now you can check the parameter values and do whatever you need to with them.
# And you can add new parameters whenever you want, etc.
if ($params{'first'} eq '123') {
# Do whatever
}

My solution:
/(?:\w+/)*(?:(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?|\w+|)
Explain:
/(?:\w+/)* match /parent/child/ or /parent/
(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)? match child?firstparam=abc123 or ?firstparam=abc123 or ?
\w+ match text like child
..|) match nothing(empty)
If you need only query string, pattern would reduce such as:
/(?:\w+/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)
If you want to get every parameter from query string, this is a Ruby sample:
re = /\/(?:\w+\/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)/
s = '/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789'
if m = s.match(re)
query_str = m[1] # now, you can 100% trust this string
query_str.scan(/(\w+)=(\w+)/) do |param,value| #grab parameter
printf("%s, %s\n", param, value)
end
end
output
secondparam, def456
firstparam, abc123
thirdparam, ghi789

This script will help you.
First, i check, is there any symbol like ?.
Then, i kill first part of line (left from ?).
Next, i split line by &, where each value splitted by =.
my $r = q"/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789";
for my $string(split /\n/, $r){
if (index($string,'?')!=-1){
substr($string, 0, index($string,'?')+1,"");
#say "string = ".$string;
if (index($string,'=')!=-1){
my #params = map{$_ = [split /=/, $_];}split/\&/, $string;
$"="\n";
say "$_->[0] === $_->[1]" for (#params);
say "######next########";
}
else{
#print "there is no params!"
}
}
else{
#say "there is no params!";
}
}

Related

Extract first word after specific word

I'm having difficulty writing a Perl program to extract the word following a certain word.
For example:
Today i'm not going anywhere except to office.
I want the word after anywhere, so the output should be except.
I have tried this
my $words = "Today i'm not going anywhere except to office.";
my $w_after = ( $words =~ /anywhere (\S+)/ );
but it seems this is wrong.
Very close:
my ($w_after) = ($words =~ /anywhere\s+(\S+)/);
^ ^ ^^^
+--------+ |
Note 1 Note 2
Note 1: =~ returns a list of captured items, so the assignment target needs to be a list.
Note 2: allow one or more blanks after anywhere
In Perl v5.22 and later, you can use \b{wb} to get better results for natural language. The pattern could be
/anywhere\b{wb}.+?\b{wb}(.+?\b{wb})/
"wb" stands for word break, and it will account for words that have apostrophes in them, like "I'll", that plain \b doesn't.
.+?\b{wb}
matches the shortest non-empty sequence of characters that don't have a word break in them. The first one matches the span of spaces in your sentence; and the second one matches "except". It is enclosed in parentheses, so upon completion $1 contains "except".
\b{wb} is documented most fully in perlrebackslash
First, you have to write parentheses around left side expression of = operator to force array context for regexp evaluation. See m// and // in perlop documentation.[1] You can write
parentheses also around =~ binding operator to improve readability but it is not necessary because =~ has pretty high priority.
Use POSIX Character Classes word
my ($w_after) = ($words =~ / \b anywhere \W+ (\w+) \b /x);
Note I'm using x so whitespaces in regexp are ignored. Also use \b word boundary to anchor regexp correctly.
[1]: I write my ($w_after) just for convenience because you can write my ($a, $b, $c, #rest) as equivalent of (my $a, my $b, my $c, my #rest) but you can also control scope of your variables like (my $a, our $UGLY_GLOBAL, local $_, #_).
This Regex to be matched:
my ($expect) = ($words=~m/anywhere\s+([^\s]+)\s+/);
^\s+ the word between two spaces
Thanks.
If you want to also take into consideration the punctuation marks, like in:
my $words = "Today i'm not going anywhere; except to office.";
Then try this:
my ($w_after) = ($words =~ /anywhere[[:punct:]|\s]+(\S+)/);

regex to duplicate repeated patterns, substituting part of the pattern

I'd like to duplicate a multiple matches in a line, substituting part of the match, but keeping the runs of matches together (that seems to be the tricky part).
e.g.:
Regex:
(x(\d)(,)?)
Replacement:
X$2,O$2$3
Input:
x1,x2,Z3,x4,Z5,x6
Output: (repeated groups broken apart)
X1,O1,X2,O2,Z3,X4,O4,Z5,X6,O6
Desired output (repeated groups, "X1,X2" kept together):
X1,X2,O1,O2,Z3,X4,O4,Z5,X6,O6
Demo: https://regex101.com/r/gH9tL9/1
Is this possible with regex or do I need to use something else?
Update: Wills answer is what I expected. It occurs to me that it might be possible with multiple passes of regex.
You would have to capture the repeating patterns as one match and write out replacements for the whole repeating pattern at once. your current pattern cannot tell that your first and second matches, x1, and x2, respectively, are adjacent.
Im going to say no, this is not possible with one pure regex.
This is because of two important facts about capture groups and replacing.
Repeated capture groups will return the last capture:
Regex's are able to capture patterns which repeat an arbitrary amount of time by using the form <PATTERN>{1,},<PATTERN>+ or <PATTERN>*. However any capture group within <PATTERN> would only return the captures from the last iteration of the pattern. This would prevent your desired ability to capture matches that arbitrarily repeat.
"Hold on", you might say, "I only want to capture patterns that repeat one or two times, I could use (x(\d)(,)?)(x(\d)(,)?)?", which brings us to point 2.
There is no conditional replacement
Using the above pattern we could get your desired output for the repeated match, but not without mangling the solo match replacement.
See: https://regex101.com/r/gH9tL9/2 Without the ability to turn off sections of the replacement based on the existence of capture groups, we cannot achieve the desired output.
But "No, you can't do that" is a challenge to a hacker, I hope I am shown up by a true regex ninja.
Solution with 2 regexes and some code
There's definitely ways to achieve this goal with some code.
Here's a quick and dirty python hack using two regexes http://pythonfiddle.com/wip-soln-for-so-q/
This makes use of python's re.sub(), which can pass matches to one regex to a function ordered_repl which returns the replacement string. By using your original regex within the ordered_repl we can extract the information we want and get the right order by buffering our lists of Xs and Os.
import re
input_string="x1,x2,Z3,x4,Z5,x6"
re1 = re.compile("(?:x\d,?)+") # captures the general thing you want to match using a repeating non-capturing group
re2 = re.compile("(x(\d)(,)?)") # your actual matcher
def ordered_repl(m): # m is a matchobj
buf1 = []
buf2 = []
cap_iter = re.finditer(re2,m.group(0)) # returns an iterator of MatchObjects for all non-overlapping matches
for cap_group in cap_iter:
capture = cap_group.group(2) # capture the digit
buf1.append("X%s" % capture) # buffer X's of this submatch group
buf2.append("O%s" % capture) # buffer O's of this submatch group
return "%s,%s," % (",".join(buf1),",".join(buf2)) # concatenate the buffers and return
print re.sub(re1,ordered_repl,input_string).rstrip(',') # searches string for matches to re1 and passes them to the ordered_repl function
In my specific case I'm using powershell, so I was able to come up with the following:
(linebreaks added for readability)
("x1,x2,z3,x4,z5,x6"
-split '((?<=x\d),(?!x)|(?<!x\d),(?=x))'
| Foreach-Object {
if ($_ -match 'x') {
$_ + ',' + ($_ -replace 'x','y')
} else {$_}
}
) -join ''
Outputs:
x1,x2,y1,y2,z3,x4,y4,z5,x6,y6
Where:
-split '((?<=x\d),(?!x)|(?<!x\d),(?=x))'
breaks apart the string into these groups:
x1,x2
,
z3
,
x4
,
z5
,
x6
using positive and negative lookahead and lookbehind:
comma with x\d before and without x after:
(?<=x\d),(?!x)
comma without x\d before and with x after:
(?<!x\d),(?=x)

Regular expression help in Perl

I have following text pattern
(2222) First Last (ab-cd/ABC1), <first.last#site.domain.com> 1224: efadsfadsfdsf
(3333) First Last (abcd/ABC12), <first.last#site.domain.com> 1234, 4657: efadsfadsfdsf
I want the number 1224 or 1234, 4657 from the above text after the text >.
I have this
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\#\w*\.domain.com>\s\d+:
which will take the text before : But i want the one after email till :
Is there any easy regular expression to do this? or should I use split and do this
Thanks
Edit: The whole text is returned by a command line tool.
(3333) First Last (abcd/ABC12), <first.last#site.domain.com> 1234, 4657: efadsfadsfdsf
(3333) - Unique ID
First Last - First and last names
<first.last#site.domain.com> - Email address in format FirstName.LastName#sub.domain.com
1234, 4567 - database primary Keys
: xxxx - Headline
What I have to do is process the above and get hte database ID (in ex: 1234, 4567 2 separate ID's) and query the tables
The above is the output (like this I will get many entries) from the tool which I am calling via my Perl script.
My idea was to use a regular expression to get the database id's. Guess I could use regular expression for this
you can fudge the stuff you don't care about to make the expression easier, say just 'glob' the parts between the parentheticals (and the email delimiters) using non-greedy quantifiers:
/(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/ (not tested!)
there's only two captured groups, the (1234), and the (1234, 4657), the second one which I can only assume from your pattern to mean: "a digit string, followed by zero or more comma separated digit strings".
Well, a simple fix is to just allow all the possible characters in a character class. Which is to say change \d to [\d, ] to allow digits, commas and space.
Your regex as it is, though, does not match the first sample line, because it has a dash - in it (ab-cd/ABC1 does not match \w*\/\w+\d*\). Also, it is not a good idea to rely too heavily on the * quantifier, because it does match the empty string (it matches zero or more times), and should only be used for things which are truly optional. Use + otherwise, which matches (1 or more times).
You have a rather strict regex, and with slight variations in your data like this, it will fail. Only you know what your data looks like, and if you actually do need a strict regex. However, if your data is somewhat consistent, you can use a loose regex simply based on the email part:
sub extract_nums {
my $string = shift;
if ($string =~ /<[^>]*> *([\d, ]+):/) {
return $1 =~ /\d+/g; # return the extracted digits in a list
# return $1; # just return the string as-is
} else { return undef }
}
This assumes, of course, that you cannot have <> tags in front of the email part of the line. It will capture any digits, commas and spaces found between a <> tag and a colon, and then return a list of any digits found in the match. You can also just return the string, as shown in the commented line.
There would appear to be something missing from your examples. Is this what they're supposed to look like, with email?
(1234) First Last (ab-cd/ABC1), <foo.bar#domain.com> 1224: efadsfadsfdsf
(1234) First Last (abcd/ABC12), <foo.bar#domain.com> 1234, 4657: efadsfadsfdsf
If so, this should work:
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\#\w*\.domain\.com>\s\d+(?:,\s(\d+))?:
$string =~ /.*>\s*(.+):.+/;
$numbers = $1;
That's it.
Tested.
With number catching:
$string =~ /.*>\s*(?([0-9]|,)+):.+/;
$numbers = $1;
Not tested but you get the idea.

Regular Expression issue with * laziness

Sorry in advance that this might be a little challenging to read...
I'm trying to parse a line (actually a subject line from an IMAP server) that looks like this:
=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?=
It's a little hard to see, but there are two =?/?= pairs in the above line. (There will always be one pair; there can theoretically be many.) In each of those =?/?= pairs, I want the third argument (as defined by a ? delimiter) extracted. (In the first pair, it's "Here is som", and in the second it's "e text.")
Here's the regex I'm using:
=\?(.+)\?.\?(.*?)\?=
I want it to return two matches, one for each =?/?= pair. Instead, it's returning the entire line as a single match. I would have thought that the ? in the (.*?), to make the * operator lazy, would have kept this from happening, but obviously it doesn't.
Any suggestions?
EDIT: Per suggestions below to replace ".?" with "[^(\?=)]?" I'm now trying to do:
=\?(.+)\?.\?([^(\?=)]*?)\?=
...but it's not working, either. (I'm unsure whether [^(\?=)]*? is the proper way to test for exclusion of a two-character sequence like "?=". Is it correct?)
Try this:
\=\?([^?]+)\?.\?(.*?)\?\=
I changed the .+ to [^?]+, which means "everything except ?"
A good practice in my experience is not to use .*? but instead do use the * without the ?, but refine the character class. In this case [^?]* to match a sequence of non-question mark characters.
You can also match more complex endmarkers this way, for instance, in this case your end-limiter is ?=, so you want to match nonquestionmarks, and questionmarks followed by non-equals:
([^?]*\?[^=])*[^?]*
At this point it becomes harder to choose though. I like that this solution is stricter, but readability decreases in this case.
One solution:
=\?(.*?)\?=\s*=\?(.*?)\?=
Explanation:
=\? # Literal characters '=?'
(.*?) # Match each character until find next one in the regular expression. A '?' in this case.
\?= # Literal characters '?='
\s* # Match spaces.
=\? # Literal characters '=?'
(.*?) # Match each character until find next one in the regular expression. A '?' in this case.
\?= # Literal characters '?='
Test in a 'perl' program:
use warnings;
use strict;
while ( <DATA> ) {
printf qq[Group 1 -> %s\nGroup 2 -> %s\n], $1, $2 if m/=\?(.*?)\?=\s*=\?(.*?)\?=/;
}
__DATA__
=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?=
Running:
perl script.pl
Results:
Group 1 -> utf-8?Q?Here is som
Group 2 -> utf-8?Q?e text.
EDIT to comment:
I would use the global modifier /.../g. Regular expression would be:
/=\?(?:[^?]*\?){2}([^?]*)/g
Explanation:
=\? # Literal characters '=?'
(?:[^?]*\?){2} # Any number of characters except '?' with a '?' after them. This process twice to omit the string 'utf-8?Q?'
([^?]*) # Save in a group next characters until found a '?'
/g # Repeat this process multiple times until end of string.
Tested in a Perl script:
use warnings;
use strict;
while ( <DATA> ) {
printf qq[Group -> %s\n], $1 while m/=\?(?:[^?]*\?){2}([^?]*)/g;
}
__DATA__
=?utf-8?Q?Here is som?= =?utf-8?Q?e text.?= =?utf-8?Q?more text?=
Running and results:
Group -> Here is som
Group -> e text.
Group -> more text
Thanks for everyone's answers! The simplest expression that solved my issue was this:
=\?(.*?)\?.\?(.*?)\?=
The only difference between this and my originally-posted expression was the addition of a ? (non-greedy) operator on the first ".*". Critical, and I'd forgotten it.

2-step regular expression matching with a variable in Perl

I am looking to do a 2-step regular expression look-up in Perl, I have text that looks like this:
here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words
I also have a hash with the following values:
%my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );
Here's what I need to do:
Find a number
Look up the text code for the number using my_hash
Check if the text code appears within 50 characters of the identified number, and if true print the result
So the output I'm looking for is:
Found 9337, matches 'AA'
Found 2214, matches 'BB'
Found 1190, no matches
Found 8790, no matches
Here's what I have so far:
while ( $text =~ /(\d+)(.{1,50})/g ) {
$num = $1;
$text_after_num = $2;
$search_for = $my_hash{$num};
if ( $text_after_num =~ /($search_for)/ ) {
print "Found $num, matches $search_for\n";
}
else {
print "Found $num, no matches\n";
}
This sort of works, except that the only correct match is 9337; the code doesn't match 2214. I think the reason is that the regular expression match on 9337 is including 50 characters after the number for the second-step match, and then when the regex engine starts again it is starting from a point after the 2214. Is there an easy way to fix this? I think the \G modifier can help me here, but I don't quite see how.
Any suggestions or help would be great.
You have a problem with greediness. The 1,50 will consume as much as it can. Your regex should be /(\d+)(.+?)(?=($|\d))/
To explain, the question mark will make the multiple match non-greedy (it will stop as soon as the next pattern is matched - the next pattern gets precedence). The ?= is a lookahead operator to say "check if the next element is a digit. If so, match but do not consume." This allows the first digit to get picked up by the beginning of the regex and be put into the next matched pattern.
[EDIT]
I added an optional end value to the lookahead so that it wouldn't die on the last match.
Just use :
/\b\d+\b/g
Why match everything if you don't need to? You should use other functions to determine where the number is :
/(?=9337.{1,50}AA)/
This will fail if AA is further than 50 chars away from the end of 9337. Of course you will have to interpolate your variables to match your hashe's keys and values. This was just an example for your first key/value pair.