How to extract multiple values with a regular expression in JMeter - regex

I am running tests with JMeter and I need to extract the following values with a regular expression:
insertar?sIws2kyXGJJA_01==
insertar?sIws2kyXGJJA_02==
from the following string:
[\"EMBPAGE1_00010001\",\"**insertar?sIws2kyXGJJA_01==**\",1,100,\"%\",300,\"px\",0,\"center\",\"\",\"[\"EMBPAGE1_00010002\",\"**insertar?sIws2kyXGJJA_02==**\",1,100,\"%\",300,\"px\",0,\"center\",\"\",\"

Use the little-known option: a negative Match No. (for example -1), which makes the extractor return every match.
UPD: The g2 suffix appears in my example because I extract two groups from each match.
In each match the "uuid" lands in g1 and the second part lands in g2, and the second part is what I need here.
That is why I use the $2$ template and g2. If each match has only one group, you will most likely use the $1$ template, which places all matches into g1.
If you have one match group, you don't actually need the _gN suffix at all.
To better understand the variables created by the extraction, add a "Debug PostProcessor" and inspect its output in the View Results Tree.
It is nice to know that controllers like "ForEach" understand these variables, can work with a prefix like regexUUID_, and walk through the matches. In most cases that is the next thing you do after extraction.
UPD2: A primitive version of the regexp for the question is (insertar\?sIws2kyXGJJA_\d*)==([^[]*)
with the template $1$$2$.
You will have the first parts in the g1 group and the second parts in g2.
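To see the shape of the variables a negative Match No. produces, here is a rough Python sketch that mimics the naming scheme documented for JMeter's Regular Expression Extractor (refName_matchNr, refName_n, refName_n_gm). It is not JMeter code: the helper jmeter_like_vars, the reference name regexUUID, the shortened input, and the two-group pattern are all made up for illustration, and Python's re engine stands in for JMeter's.

```python
import re

def jmeter_like_vars(ref_name, pattern, template_group, text):
    """Build a dict named the way JMeter names variables for a negative Match No."""
    matches = list(re.finditer(pattern, text))
    variables = {f"{ref_name}_matchNr": str(len(matches))}
    for n, match in enumerate(matches, start=1):
        # refName_n holds the template result; here the template is a single group.
        variables[f"{ref_name}_{n}"] = match.group(template_group)
        # refName_n_gm holds group m of match n (g0 is the whole match).
        for m in range(match.re.groups + 1):
            variables[f"{ref_name}_{n}_g{m}"] = match.group(m)
    return variables

# Shortened, hypothetical stand-in for the response in the question.
text = '"**insertar?sIws2kyXGJJA_01==**",1,100,"**insertar?sIws2kyXGJJA_02==**"'
# Hypothetical pattern with two groups; the matching template would be $2$.
pattern = r"\*\*(insertar\?)(\w+==)\*\*"

variables = jmeter_like_vars("regexUUID", pattern, 2, text)
for name in sorted(variables):
    print(name, "=", variables[name])
```

A ForEach Controller given this prefix would then iterate over regexUUID_1, regexUUID_2, and so on.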

With the answer given by DMC, you need to add the Regular Expression Extractor twice, with different Match No. values (1 and 2), to retrieve both values. That is correct, but here is a better way to achieve the same result.
Another Approach:
1. Capture Both Values:
You can use a template to capture both values at the same time and later refer to them by index.
Please check the following screenshot:
Here we capture both values using two groups and reference them in the template as $1$ and $2$. By default the template values follow the order of the groups in the regular expression. (FYI, you can change the order by reordering the template, e.g. $2$ and then $1$.)
As in the screenshot, we capture two values and store them using the template: $1$ refers to the first group and $2$ refers to the second group.
2. Retrieve Values:
Now refer to these values in your script with the following syntax:
${insert_values_gn} (n is the group number)
e.g.:
${insert_values_g1} - refers to the value captured by the first group
${insert_values_g2} - refers to the value captured by the second group
To keep it simple, you can think of "insert_values" as a list of the strings captured by the groups and use n (1, 2, 3, etc.) as the index to retrieve a value.
Note: with a template, a single Regular Expression Extractor can capture any number of values using multiple groups, and you refer to each of them by index.

I'm sure there is a more efficient way but this worked:
\*\*(.*?)\*\*.*\"\*\*(.*?)\*\*
You can also use only \*\*(.*?)\*\*
It will match both of them anyway, so make sure you set the right 'Match No.' in JMeter if you only need one of the values:
The Match No. should be 1 for the first match and 2 for the second, I believe.
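The two patterns above can be tried outside JMeter. The sketch below uses Python's re module as a stand-in (JMeter uses a different regex engine, but these patterns use only basic syntax), and the response string is a shortened copy of the one in the question, assuming the ** markers are literally present in the response.

```python
import re

# Shortened copy of the string from the question; raw strings keep the backslashes.
response = (
    r'[\"EMBPAGE1_00010001\",\"**insertar?sIws2kyXGJJA_01==**\",1,100,'
    r'\"[\"EMBPAGE1_00010002\",\"**insertar?sIws2kyXGJJA_02==**\",1,100'
)

# One regex, two groups: both values come back from a single match.
both = re.search(r'\*\*(.*?)\*\*.*\"\*\*(.*?)\*\*', response).groups()

# One group, several matches: the list index plays the role of Match No. minus 1.
each = re.findall(r'\*\*(.*?)\*\*', response)

print(both)
print(each[0])  # corresponds to Match No. 1
print(each[1])  # corresponds to Match No. 2
```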

Related

Findall with regular expression in a pandas dataframe returns an incomplete list [duplicate]

I am having trouble with Series.str.findall() in pandas. I am trying to go through reports and retrieve all the electrical components. Each report is either a line or a paragraph and mentions components in a standardized way. I am using this code:
failedCoFromReason = rlist['report'].str.findall(r'([CULJRQF]([\dV]{2,4}))', flags=re.IGNORECASE)
It returns the components, but it also repeats the numeric part, like this: [('r919', '919'), ('r920', '920')]
I would like it to return just [('r919'), ('r920')], but I am struggling to get it to work. I am pretty new to pandas and regex and confused about how to search. I have tried greedy and non-greedy searches, but they didn't work.
See the Series.str.findall reference:
Equivalent to applying re.findall() to all the elements in the Series/Index.
The re.findall reference says that "if one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group."
So, all you need to do is actually remove all capturing parentheses in this case, as all you need is to get the whole match:
rlist['report'].str.findall(r'[CULJRQF][\dV]{2,4}', flags=re.I)
In other cases, when you need to preserve the group (to quantify it, or to use alternatives), you need to change the capturing groups to non-capturing ones:
rlist['report'].str.findall(r'(?:[CULJRQF](?:[\dV]{2,4}))', flags=re.I)
Though, in this case, it is quite redundant.
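The difference is easy to see with plain re.findall, which Series.str.findall applies to each element. The report text below is invented for illustration.

```python
import re

report = "r919 and r920 failed"  # made-up sample report line

# Two capturing groups: findall returns one tuple per match.
with_groups = re.findall(r'([CULJRQF]([\dV]{2,4}))', report, flags=re.I)

# No groups: findall returns the whole match as a string.
no_groups = re.findall(r'[CULJRQF][\dV]{2,4}', report, flags=re.I)

# Non-capturing groups behave like no groups as far as findall is concerned.
non_capturing = re.findall(r'(?:[CULJRQF](?:[\dV]{2,4}))', report, flags=re.I)

print(with_groups)    # [('r919', '919'), ('r920', '920')]
print(no_groups)      # ['r919', 'r920']
print(non_capturing)  # ['r919', 'r920']
```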

repeated, arbitrary capture groups

Given a string, eg.:
static_string.name__john.id__6.foo__bar.final_string
but with an arbitrary number of label__value. components, how can I repeat the capture groups, split them into label & value, and also capture the terminating final_string ?
For the above I'd want [name, john, id, 6, foo, bar, final_string]
Is something like this possible when I don't know the number of label__value. components in advance?
This is for golang / RE2 if that matters.
Update: I don't have the luxury of doing this in a few lines of code, and would need to do this in a single regex. The regex is defined in a config file to an application I don't control, so a code based loop with conditionals etc is unfortunately not possible.
This totally depends on what the thing you are putting this into expects.
This answer focuses on getting you the capture groups in a basic way, trying to avoid any issues with RE2 and with the "thing" you are putting the regex into.
Note: You might find that final_string doesn't get the capture group index you expect with this method, but again that depends on what you are putting the regex into.
A regular expression that matches zero or one key/value pair is the following:
^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$
static_string.final_string
static_string.name__john.final_string
To support one more key/value pair we repeat part of the regular expression:
Part repeated:
(?:\.([^.]+?)__([^.]+))?
So to support 2 key value pairs the regular expression is:
^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$
This now supports the following additional example:
static_string.name__john.foo__bar.final_string
So if I expand that out to support 12 key value pairs the regular expression is:
^[^.]+(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+?)__([^.]+))?(?:\.([^.]+))$
This supports the following additional examples:
static_string.name__john.id__6.foo__bar.final_string
static_string.name2_1b__john.id__6.foo__bar.final_string
static_string.name__john.id__6.foo__bar.name__john.id__6.foo__bar.name__john.id__6.foo__bar.name__john.id__6.foo__bar.final_string
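Writing out the repeated part by hand is error-prone, so the long expression can be generated once and pasted into the config file. The following is a small sketch in Python (used here only as a generator and a quick checker; the names PAIR, TAIL and build_pattern are made up). The pattern uses only syntax that RE2 also accepts, but it is worth testing the result in the target application.

```python
import re

PAIR = r"(?:\.([^.]+?)__([^.]+))?"  # one optional label__value component
TAIL = r"(?:\.([^.]+))$"            # the terminating final_string

def build_pattern(max_pairs):
    """Return the regex supporting up to max_pairs label__value components."""
    return "^[^.]+" + PAIR * max_pairs + TAIL

pattern = re.compile(build_pattern(3))

full = pattern.match("static_string.name__john.id__6.foo__bar.final_string").groups()
short = pattern.match("static_string.name__john.final_string").groups()

print(full)   # ('name', 'john', 'id', '6', 'foo', 'bar', 'final_string')
print(short)  # ('name', 'john', None, None, None, None, 'final_string')
```

The second result shows the caveat from the note above: unused pairs leave empty groups, and final_string always sits in the last group (index 2 * max_pairs + 1), not right after the last pair that matched.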

Regular expression to get value with duplicate data

Hi, I am trying to extract the required string from a given string. The given string looks like this:
1|a1|id11-name11,x|a2|id21-name21,y|a3|id31-name31~id32-name32,y4|a4|id41-name41~id42-name42~id43-name43
Expected output:
a1~name11|a2~name21|a3~name31|a3~name32|a4~name41|a4~name42|a4~name43
Regular Expression:
(^|,)[^|]{0,}\|([^|]{0,})\|(~){0,}[^-]{0,}-([^,~]{0,})
Extracting $2~$4| or \2~\4|
Regular Expression output:
a1~name11|a2~name21|a3~name31|
Is it possible to get a3~name32 along with a3~name31 using a regular expression? Using multiple regular expressions is also fine. The third part, after the second pipe symbol, is not limited to a fixed number of values (e.g. id41-name41~id42-name42~id43-name43). It could be something like id41-name41~id42-name42~id43-name43~id43-name43~id43-name43~id43-name43...
You have two choices. The first is to split the string into parts and pick out what you want.
The second depends on the length of the longest repeated part. In your case the repeated part is idxx-namexx.
If the number of repetitions is limited to a reasonable value, you can repeat that part in your regex so that you capture all of them. For instance, for 2 repetitions you need to add the second part as follows:
([a-zA-Z]\d)\|(id\d+-(name\d+))(~?id\d+-(name\d+))?
______________-------1-------- _---------2--------_________
The groups will be
\1~\3 and
\1~\5
You can check it on the Regex101 site.
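The first choice, splitting the string, handles any number of id-name items without repeating anything in a regex. A minimal sketch in Python, assuming the separators are exactly as in the question (, between records, | between fields, ~ between items, - between id and name); the function name flatten is made up:

```python
def flatten(record_string):
    """Turn 'x|label|id-name~id-name,...' into 'label~name|label~name|...'."""
    pairs = []
    for record in record_string.split(","):
        # Each record has three fields; the first one is not needed.
        _, label, items = record.split("|")
        for item in items.split("~"):
            _id, name = item.split("-", 1)
            pairs.append(f"{label}~{name}")
    return "|".join(pairs)

given = ("1|a1|id11-name11,x|a2|id21-name21,y|a3|id31-name31~id32-name32,"
         "y4|a4|id41-name41~id42-name42~id43-name43")
result = flatten(given)
print(result)
```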

regular expression multiple matches

For reference, this is the regex tester I am using:
http://www.rsyslog.com/regex/
How can I modify this regular expression:
[^;]+
to receive multiple sub-matches for the following test string:
;first;second;third;fourth;fifth and sixth;seventh;
I currently only receive one sub-match:
first
Basically I want each sub-match to consist of the content between ; characters, I am hoping for a sub-match list like this:
first
second
third
fourth
fifth and sixth
seventh
Following information given in the comments, I discovered that the reason I can't get more than one sub-match is that I need to specify the global modifier, and I can't figure out how to do that in the rsyslog regex tester I am using.
However, this did lead me to solve my problem in a slightly different manner. I came up with this regular expression which still only gives one match, but the number near the end acts as the index for the desired match, so for example:
(?:;([^;]+)){5}
matches this from my test string in the question:
fifth and sixth
While this solution allows me to achieve what I wanted - though in a different manner - the true answer to my question is found in HamZa's comments. More specifically:
How can I modify the regular expression to receive multiple
sub-matches?
The answer is, you can't modify the regular expression itself in order to get multiple sub-matches. Setting the global modifier is required in order to do that.
Based on this information I have posted a new question on serverfault targeted specifically to the rsyslog regular expression system.
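Both behaviours can be reproduced outside rsyslog. The sketch below uses Python's re module purely as an illustration (rsyslog's regex engine is a different one): findall plays the role of the global modifier, and the helper nth_part, a made-up name, shows why the repeat count acts as an index.

```python
import re

test_string = ";first;second;third;fourth;fifth and sixth;seventh;"

# "Global" matching: collect every non-overlapping match of the original pattern.
all_parts = re.findall(r"[^;]+", test_string)

def nth_part(text, n):
    """Single match; the repeated group keeps only its last (n-th) iteration."""
    match = re.search(r"(?:;([^;]+)){%d}" % n, text)
    return match.group(1) if match else None

print(all_parts)
print(nth_part(test_string, 5))  # fifth and sixth
```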

Regex capture words inside tags

Given an XML document, I'd like to be able to pick out individual key/value pairs from a particular tag:
<aaa>key0:val0 key1:val1 key2:val2</aaa>
I'd like to get back
key0:val0
key1:val1
key2:val2
So far I have
(?<=<aaa>).*(?=<\/aaa>)
Which will match everything inside, but as one result.
I also have
[^\s][\w]*:[\w]*[^\s] which will also match correctly in groups on this:
key0:val0 key1:val1 key2:val2
But not with the tags. I believe this is an issue with searching for subgroups and I'm not sure how to get around it.
Thanks!
You cannot combine the two expressions in the way you want, because you have to match each occurrence of "key:value".
So in what you came up with - (?<=<abc>)([\w]*:[\w]*[\s]*)+(?=<\/abc>) - there are two matching groups. The bigger one matches everything inside the tags, while the other matches a single "key:value" occurrence. The regex engine cannot give you each individual occurrence, because a repeated group keeps only its last match. So it just gives you the last one.
If you think in Python, on the match object obtained after applying your regex, you will have access to matcher.group(1) and matcher.group(2), because you have two matching ( ) groups in the regex.
But what you want is the n occurrences of "key:value". So it's easier to just run the simpler \w+:\w+ regex on the string inside the tags.
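A short Python sketch of that two-step approach, and of the "last one only" behaviour of a repeated group. The sample document is the one from the question (with val2 as in the expected output); for real XML a proper parser is the safer choice.

```python
import re

doc = "<aaa>key0:val0 key1:val1 key2:val2</aaa>"

# A repeated capturing group keeps only its last iteration.
last_only = re.search(r"<aaa>(?:(\w+:\w+)\s*)+</aaa>", doc).group(1)

# Step 1: isolate the text inside the tags.
inner = re.search(r"<aaa>(.*?)</aaa>", doc).group(1)
# Step 2: run the simple pattern over it to get every occurrence.
pairs = re.findall(r"\w+:\w+", inner)

print(last_only)  # key2:val2
print(pairs)      # ['key0:val0', 'key1:val1', 'key2:val2']
```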
I uploaded this one at parsemarket, and I'm not sure it's what you are looking for, but maybe something like this:
(<aaa>)((\w+:\w+\s)*(\w+:\w+)*)(<\/aaa>)
AFAIK, unless you know how many k:v pairs are in the tags, you can't capture all of them in one regex. So, if there are only three, you could do something like this:
<(?:aaa)>(\w+:\w+\s*)+(\w+:\w+\s*)+(\w+:\w+\s*)+<(?:\/aaa)>
But I would think you would want to do some sort of loop with whatever language you are using. Or, as some of the comments suggest, use the parser classes in the language. I've used BeautifulSoup in Python for HTML.