Extracting part of an href in JMeter - regex

I'm stuck with the following.
I have a page on IBM FileNet containing a list of objects (documents or files), each of which has a specific classId and id in its href. I need JMeter to get all hrefs containing a specific type of ID:
<a href="http://ipaddress/Workplace/Browse.jsp?eventTarget=WcmController&eventName=GetInfo&id={350B278C-DE7D-44DE-9B54-099672152476}&vsId=&classId={F14AC85A-4474-479A-9B4E-BCBA180B7975}&objectStoreName=Nice&majorVersion=&minorVersion=&versionStatus=&mimeType=&mode=&objectType=customobject&isPopup=true" target="_blank">
The classId {F14AC85A-4474-479A-9B4E-BCBA180B7975} is the class ID I need to click on the page (several files share this classId, but that is not a problem). The id, on the other hand, is different for each file.
How can I extract all ids whose link contains this specific classId and make JMeter pass one to the next sampler, so that it clicks on just one of them? What would my regex look like?

As already mentioned in the comment, I do not know JMeter or how to implement this there. A regular expression to match both id and classId within a link would be:
~(classId=|id=)([^&]*)~g
That is, search for the string classId= or id= first. If one of them is found, match any character afterwards except an ampersand (&), as many times as possible (*), and capture it in a group (the parentheses). You may need to adjust the flags after the regex (e.g. g for global).
See this regex101 fiddle for more information.
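To keep only the ids whose link also carries the wanted classId, one option is a lookahead that stays inside the same href. The following is a minimal Python sketch for trying the idea outside JMeter; the hrefs are shortened, the second anchor and its IDs are made up for illustration, and whether the pattern behaves identically in JMeter's Regular Expression Extractor is an assumption to verify there.

```python
import re

TARGET_CLASS_ID = "{F14AC85A-4474-479A-9B4E-BCBA180B7975}"

# Capture the id value only when the same href (no closing double quote
# in between) also contains the wanted classId.
ID_FOR_CLASS = re.compile(
    r'[?&]id=(\{[^}]*\})(?=[^"]*[?&]classId=' + re.escape(TARGET_CLASS_ID) + r')'
)


def ids_for_class(html):
    """Return every id whose link also carries TARGET_CLASS_ID."""
    return ID_FOR_CLASS.findall(html)


# Shortened sample markup; the second link uses invented IDs.
sample = (
    '<a href="http://ipaddress/Workplace/Browse.jsp?eventName=GetInfo'
    '&id={350B278C-DE7D-44DE-9B54-099672152476}&vsId='
    '&classId={F14AC85A-4474-479A-9B4E-BCBA180B7975}&objectStoreName=Nice">'
    '<a href="http://ipaddress/Workplace/Browse.jsp?eventName=GetInfo'
    '&id={11111111-2222-3333-4444-555555555555}&vsId='
    '&classId={AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}&objectStoreName=Nice">'
)

print(ids_for_class(sample))
```

In JMeter's Regular Expression Extractor, a template of $1$ with Match No. set to 0 picks one of the matches at random, which fits the requirement of clicking just one of them.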

Related

Regex to find a specific anchor tag that have href with a specific domain and nofollow

I have a string that contains HTML. I want a regex that gets me the anchor tags that have a specific domain name and nofollow.
I have found this one, which works on the domain name but does not include the nofollow condition:
(<a\s*(?!.\brel=)[^>])(href="https?://)((?stackoverflow)[^"]+)"([^>]*)>
let's say the domain name I want is stackoverflow
Example:
- "click here " this would match
- "<a href="stackoverflow.com"> would not match since it has no nofollow
- "<a href="google.com" rel = "nofollow"> would not match
It's a bit hard to match an HTML tag with specific conditions, but the following regex should do it:
select regexp_match(str, '<a((?:\s+(([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))* href=((''(https?:\/\/)?stackoverflow\.com[^'']*'')|("(https?:\/\/)?stackoverflow\.com[^"]*"))((?: (([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))*\s+rel=("nofollow"|''nofollow'')((?: (([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))*\/?>') from tes;
It's really hard to read, but most of the regex is there to match attributes. The important part for you is stackoverflow\.com, which appears twice (once for an href in single quotes and once for an href in double quotes); replace it with whatever domain you need, and don't forget to escape it properly.
Some notes
I don't know which regexp function you want to use, but the pattern should work with whichever one you need. Another thing is that your example click here won't be matched, because you have spaces between the attribute name and the = sign (I don't know whether this is valid HTML). It will work with this click here . If you need to match tags that might include spaces around the = signs, just leave a comment and I'll try to edit the regex.
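If the HTML can be processed outside the database, a parser avoids most of the attribute-matching machinery. The sketch below is an alternative technique, not the regex from the answer: it uses Python's standard html.parser, and the class name NofollowLinkFinder and the helper find_nofollow_links are hypothetical names chosen for illustration. It assumes that subdomains of the wanted domain should also count.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags on a given domain that carry rel=nofollow."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain.lower()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rel_tokens = (attr.get("rel") or "").lower().split()
        # urlparse only fills netloc when a scheme or a leading // is present,
        # so bare hosts such as "stackoverflow.com" get a // prefix first.
        host = urlparse(href if "//" in href else "//" + href).netloc.lower()
        on_domain = host == self.domain or host.endswith("." + self.domain)
        if on_domain and "nofollow" in rel_tokens:
            self.matches.append(href)


def find_nofollow_links(html, domain):
    finder = NofollowLinkFinder(domain)
    finder.feed(html)
    finder.close()
    return finder.matches


sample = (
    '<a href="https://stackoverflow.com/q/1" rel="nofollow">a</a>'
    '<a href="stackoverflow.com">b</a>'
    '<a href="google.com" rel="nofollow">c</a>'
)
print(find_nofollow_links(sample, "stackoverflow.com"))
```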

Kimonolabs > Text before comma

I'm trying to scrape a piece of text from a website using Kimonolabs. The text is successfully scraped using the advanced setting:
div > div > ul > li.location > span.value
The text being scraped using this CSS selector is:
Cityname, streetname 1
However, I wish to delete everything from the comma onward so that only this remains:
Cityname
I wish to do this with regex, but I'm totally ignorant about it. What I do know is that it has to consist of 3 blocks when using Kimonolabs: https://help.kimonolabs.com/hc/en-us/articles/203043464-Manually-input-regular-expressions
Can anybody help me set up the correct regex? All I have so far is the following, but it's not the correct notation for Kimonolabs (the dashboard doesn't accept it):
^(.+?),
See the docs you referred to:
The regular expression pattern in kimono is defined in three parts. It's important that any custom regular expression you produce retains the three part notation, with the surrounding ( ) for each part. The first part refers to the pattern to the left of the desired content. The middle part refers to the pattern that the desired content must match and the third part refers to the pattern to the right of the desired content.
So, you seem to need:
/^()([^,]+)()/
Or, /(^)([^,]+)(,)/ (it should be equivalent), and the 2nd capture group (the middle part) should capture the Cityname.
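Both three-part patterns can be checked quickly outside Kimono. This is a minimal Python sketch, assuming Kimono's engine treats these simple patterns the same way Python's re module does; the helper name middle_part is hypothetical.

```python
import re

text = "Cityname, streetname 1"


def middle_part(pattern, value):
    """Return the second capture group (the 'desired content' part)."""
    match = re.search(pattern, value)
    return match.group(2) if match else None


for pattern in (r"^()([^,]+)()", r"(^)([^,]+)(,)"):
    print(pattern, "->", middle_part(pattern, text))
```

The two patterns differ only when the text contains no comma: the first still returns the whole text, while the second finds no match.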

Regex for BBCode with optional parameters

I'm currently stuck on a regex. I'm trying to fetch the contents of a BBCode, that has optional params and maybe different notations:
[tag]https://example.com/1[/tag]
[tag='https://example.com/2'][/tag]
[tag="http://another-example.com/whatever"][/tag]
[tag=ftp://an-ftp-host][/tag]
[tag='https://example.com/3',left][/tag]
[tag="https://example.com/4",right][/tag]
[tag=https://example.com/5][/tag]
[tag=https://example.com/i-need-this-one,right]http://example.com/i-dont-need-this-one[/tag]
The 2nd param can only be left or right, and if it is given, I need the URL from the first param. Otherwise, I need the one between the tags.
A URL given as a param can be wrapped in ' or ", or in neither.
My current regular expression is this:
~\[tag(?|=[\'"]?+([^]"\']++)[\'"]?+]([^[]++)|](([^[]++)))\[/tag]~i
However, this one also includes the 2nd param in the match list, along with a lot of other things that I don't want to match.
Any suggestions?
I've made some changes to do what you want. I've included your version here for easy comparison:
Yours: http://regex101.com/r/dE4aE4/1
\[tag(?:=[\'"]?(.*)[\'"]?)?]([^]]*)?\[/tag]
Mine: http://regex101.com/r/dE4aE4/3
\[tag(?:=[\'"]?([^,]*?)(?:,[^]'"]+)?[\'"]?)?]([^\[]+)?\[/tag]
Observe that I've changed it a bit to get the URL without the comma (,): from (.*) to ([^,]*?)(?:,[^]'"]+)?
I've also fixed the content part: from ([^]]*)? to ([^\[]+)?
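For experimenting with the sample inputs from the question, here is a minimal Python sketch. It uses a hypothetical variant of the pattern, not the exact one from this answer: a backreference forces the closing quote to equal the opening one, and the second parameter is restricted to left or right. It also assumes that the URL parameter should win whenever it is present, and that the text between the tags is the fallback.

```python
import re

# Hypothetical variant for illustration, written in verbose mode.
TAG = re.compile(
    r"""\[tag
        (?:=(['"]?)               # optional opening quote (group 1)
            ([^\]'",]+)           # URL parameter (group 2)
            \1                    # closing quote must equal the opening one
            (?:,(?:left|right))?  # optional alignment parameter
        )?
        \]([^\[]*)\[/tag\]        # text between the tags (group 3)
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_url(bbcode):
    """Return the URL parameter if present, else the text between the tags."""
    match = TAG.search(bbcode)
    if match is None:
        return None
    return match.group(2) or match.group(3)


samples = [
    "[tag]https://example.com/1[/tag]",
    "[tag='https://example.com/3',left][/tag]",
    "[tag=https://example.com/i-need-this-one,right]"
    "http://example.com/i-dont-need-this-one[/tag]",
]
for sample in samples:
    print(extract_url(sample))
```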

Searching a number in a specific string with regexp in jmeter

I want to find a specific number from a HTML response.
For example, I want to extract 3 from publicationID3publicationID.
Does someone know a solution with regexp?
Add a Regular Expression Extractor post-processor as a child of the request that returns this string.
Configure it as follows:
Reference Name: publicationID (you can use any variable name here)
Regular Expression: publicationID(\d+)publicationID
Template: $1$
other fields can be left blank.
You can later refer to the publication ID as ${publicationID} or ${__V(publicationID)}.
You can see which matches your regular expression returns using the View Results Tree listener (select RegExp Tester from the dropdown). Another option is the Debug Sampler, again in combination with View Results Tree.
You can use \d to match a digit (and \d+ for a whole number) using regex.
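The expression from the extractor configuration can also be checked outside JMeter. This is a minimal Python sketch, assuming the response contains the marker text exactly as in the question; the function name extract_publication_id and the surrounding HTML are made up for illustration.

```python
import re


def extract_publication_id(response_text):
    """Return the digits between the two publicationID markers, or None."""
    match = re.search(r"publicationID(\d+)publicationID", response_text)
    return match.group(1) if match else None


print(extract_publication_id("<p>publicationID3publicationID</p>"))
```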

Contents within an attribute for both single and multiple ending tags

How can I fetch the contents of the value attribute of the tags below, across multiple files?
<h:graphicImage .... value="*1.png*" ...../>
<h:graphicImage .... value="*2.png*" ....>...</h:graphicImage>
My regular expression search should return:
1.png
2.png
All I could find handles tags with a separate closing tag, but what about self-closing tags?
Use an XML parser instead. Regex cannot truly parse XML properly unless you know the input will always follow a particular form.
However, here is a regex you can use to extract the value attribute of h:graphicImage tags, but read the caveats after:
<h:graphicImage[^>]+value="\*(.*?)\*"
and the 1.png or 2.png will be in the first captured group.
Caveats:
here I have assumed that your 1.png, 2.png etc are always surrounded by asterisks as that is what it seems from your question (that is what the \* is for)
this regex will fail if one of the attributes has a ">" character in it, for example
<h:graphicImage foo=">" value="*1.png*"
This is what I mentioned before about regex never being able to parse XML properly.
You could work around this by adjusting your regex:
<h:graphicImage.+?value="\*(.*?)\*"
But this means that if you had <h:graphicImage /><foo value="*1.png*"> then the 1.png from the foo tag is extracted, when you only want to extract from the graphicImage tag.
Again, regex will always have issues with corner cases for XML, so you need to adjust according to your application (for example, if you know that only the graphicImage tag will ever have a "value" attribute, then the second case may be better than the first).
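The first pattern and its caveat can be tried with a short Python sketch. The sample markup is made up for illustration (the id attributes are not from the question), and it assumes the values are always wrapped in asterisks as described above.

```python
import re

# First pattern from the answer: stays inside one tag by refusing to cross ">".
PATTERN = re.compile(r'<h:graphicImage[^>]+value="\*(.*?)\*"')


def image_values(markup):
    """Return the value attribute contents of every h:graphicImage tag."""
    return PATTERN.findall(markup)


sample = (
    '<h:graphicImage id="a" value="*1.png*" />\n'
    '<h:graphicImage id="b" value="*2.png*">text</h:graphicImage>'
)
print(image_values(sample))

# The caveat: a ">" inside an earlier attribute stops [^>]+ too soon.
tricky = '<h:graphicImage foo=">" value="*1.png*" />'
print(image_values(tricky))
```

Both the self-closing form and the form with a separate closing tag are handled, because the pattern only looks at the opening tag.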