Contents within an attribute for both single and multiple ending tags - regex

How can I fetch the contents of the value attribute of the tags below, across multiple files?
<h:graphicImage .... value="*1.png*" ...../>
<h:graphicImage .... value="*2.png*" ....>...</h:graphicImage>
My regular expression search should return:
1.png
2.png
All I could find was how to match tags that have a separate closing tag, but what about self-closing tags?

Use an XML parser instead. Regex cannot truly parse XML properly unless you know the input will always follow a particular form.
However, here is a regex you can use to extract the value attribute of h:graphicImage tags. Read the caveats after it:
<h:graphicImage[^>]+value="\*(.*?)\*"
and the 1.png or 2.png will be in the first captured group.
Caveats:
Here I have assumed that your 1.png, 2.png, etc. are always surrounded by asterisks, as your question suggests (that is what the \* is for).
This regex will fail if one of the attributes has a ">" character in it, for example:
<h:graphicImage foo=">" value="*1.png*"
This is what I meant earlier about regex not being able to parse XML properly.
You could work around this by adjusting your regex:
<h:graphicImage.+?value="\*(.*?)\*"
But this means that if you had <h:graphicImage /><foo value="*1.png*"> then the 1.png from the foo tag is extracted, when you only want to extract from the graphicImage tag.
Again, regex will always have issues with corner cases for XML, so you need to adjust according to your application (for example, if you know that only the graphicImage tag will ever have a "value" attribute, then the second case may be better than the first).
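As a rough illustration of both approaches, here is a minimal Python sketch (Python is my choice, not something from the question). It runs the first regex above against sample markup and, for comparison, reads the same attribute with the standard-library XML parser. The sample markup, the function names, and the assumption that the input is well-formed XML declaring the h prefix with the JSF namespace URI are all mine.

```python
import re
import xml.etree.ElementTree as ET

# First regex from the answer; the asterisks around the file name are literal.
PATTERN = re.compile(r'<h:graphicImage[^>]+value="\*(.*?)\*"')

# Assumption: the h prefix is bound to this namespace URI in the file.
H_NS = "http://java.sun.com/jsf/html"


def values_by_regex(text):
    """Return the first capture group of every regex match."""
    return PATTERN.findall(text)


def values_by_parser(xml_text):
    """Parse the markup and read value from every h:graphicImage element."""
    root = ET.fromstring(xml_text)
    tag = "{%s}graphicImage" % H_NS
    return [el.get("value").strip("*") for el in root.iter(tag) if el.get("value")]


# Hypothetical sample: a self-closing tag, a paired tag, and the ">" corner case.
sample = (
    '<root xmlns:h="http://java.sun.com/jsf/html">'
    '<h:graphicImage id="a" value="*1.png*" />'
    '<h:graphicImage id="b" value="*2.png*">text</h:graphicImage>'
    '<h:graphicImage foo=">" value="*3.png*" />'
    '</root>'
)

print(values_by_regex(sample))
print(values_by_parser(sample))
```

The regex handles both tag styles but skips the third tag because of the ">" inside foo, while the parser picks up all three.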

Related

kimonolabs >Text before comma

I'm trying to scrape a piece of text from a website using Kimonolabs. The text is successfully scraped using the advanced setting:
div > div > ul > li.location > span.value
The text being scraped using this CSS selector is:
Cityname, streetname 1
However, I wish to delete everything from the comma onward so that only this remains:
Cityname
I wish to do this with regex, but I know nothing about it. What I do know is that it has to consist of 3 blocks when using Kimonolabs: https://help.kimonolabs.com/hc/en-us/articles/203043464-Manually-input-regular-expressions
Can anybody help me set up the correct regex? All I have so far is the following, but it's not the correct format for Kimonolabs (the dashboard doesn't accept it):
^(.+?),
See the docs you referred to:
The regular expression pattern in kimono is defined in three parts. It's important that any custom regular expression you produce retains the three part notation, with the surrounding ( ) for each part. The first part refers to the pattern to the left of the desired content. The middle part refers to the pattern that the desired content must match and the third part refers to the pattern to the right of the desired content.
So, you seem to need:
/^()([^,]+)()/
Or /(^)([^,]+)(,)/, which should be equivalent when the text contains a comma. In both cases the 2nd capture group (the middle part) should capture the Cityname.
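To check the two patterns outside of kimono, here is a small Python sketch. It only tests what the second capture group holds; it does not reproduce how kimono itself applies the three parts, and the helper name middle_part is hypothetical.

```python
import re


def middle_part(pattern, text):
    """Return the second capture group (the 'middle part'), or None if no match."""
    match = re.search(pattern, text)
    return match.group(2) if match else None


scraped = "Cityname, streetname 1"

print(middle_part(r"^()([^,]+)()", scraped))
print(middle_part(r"(^)([^,]+)(,)", scraped))
```

One difference worth knowing: the second pattern requires a comma, so on text without one it does not match at all, while the first pattern returns the whole text.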

extracting a part of an href in JMeter

I'm stuck with the following.
I have a page on IBM FileNet containing a list of objects (documents or files) which have a specific classId and id in their href. I need JMeter to get all hrefs containing a specific type of ID:
<a href="http://ipaddress/Workplace/Browse.jsp?eventTarget=WcmController&eventName=GetInfo&id={350B278C-DE7D-44DE-9B54-099672152476}&vsId=&classId={F14AC85A-4474-479A-9B4E-BCBA180B7975}&objectStoreName=Nice&majorVersion=&minorVersion=&versionStatus=&mimeType=&mode=&objectType=customobject&isPopup=true" target="_blank">
The classId {F14AC85A-4474-479A-9B4E-BCBA180B7975} is the class ID I need to click on the page (several files have this classId, but that is no problem). The id, on the other hand, is different for each file.
How can I extract all ids that go with this specific classId and make JMeter pass them to the next sampler, so it clicks on just one of them? What will my regex look like?
As already mentioned in the comment, I do not know JMeter or how to implement this there. A regular expression to match both id and classId within a link would be:
~(classId=|id=)([^&]*)~g
That is, search for the string classId= or id= first. If one of them is found, match any character except an ampersand (&) as many times as possible (*) and capture it in a group (the parentheses). You may need to adjust the flags after the regex (e.g. g for global).
See this regex101 fiddle for more information.
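The same idea can be tried outside JMeter. The Python sketch below is only an illustration of the regex, not JMeter configuration: it pulls each href out of the page, collects the id and classId pairs with the pattern above, and keeps the id only when the classId is the wanted one. The helper name, the href pattern, and the second link in the sample are hypothetical.

```python
import re

# Pattern from the answer, without the ~...~g delimiters and flag.
PAIR = re.compile(r"(classId=|id=)([^&]*)")
# Assumption: hrefs are double-quoted and contain no escaped quotes.
HREF = re.compile(r'href="([^"]*)"')


def ids_for_class(html, class_id):
    """Return the id of every link whose classId equals class_id."""
    ids = []
    for href in HREF.findall(html):
        # Turn [('id=', '...'), ('classId=', '...')] into {'id': ..., 'classId': ...}.
        params = {key.rstrip("="): value for key, value in PAIR.findall(href)}
        if params.get("classId") == class_id and "id" in params:
            ids.append(params["id"])
    return ids


TARGET = "{F14AC85A-4474-479A-9B4E-BCBA180B7975}"
page = (
    '<a href="http://ipaddress/Workplace/Browse.jsp?eventTarget=WcmController'
    '&eventName=GetInfo&id={350B278C-DE7D-44DE-9B54-099672152476}&vsId='
    '&classId={F14AC85A-4474-479A-9B4E-BCBA180B7975}&objectStoreName=Nice" target="_blank">'
    # Made-up second link with a different classId, to show it is filtered out.
    '<a href="http://ipaddress/Workplace/Browse.jsp?id={00000000-0000-0000-0000-000000000000}'
    '&classId={11111111-1111-1111-1111-111111111111}" target="_blank">'
)

print(ids_for_class(page, TARGET))
```

Because the match is case-sensitive, id= does not fire inside vsId= or classId=.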

RegEX: Matching everything but a specific value

How do I match everything in an HTML response except this piece of text?
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will produce two groups: the text before the match and the text after it.
If the match fails, the text does not contain the string.
If the text contains the string more than once, only the first occurrence is skipped.
If you need to remove all instances of the text, you may as well use a string replace method.
But it is usually a bad idea to use regex on HTML.
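Both options can be sketched in Python as follows. The function names and sample strings are made up for the example, and I have added the DOTALL flag on the assumption that the response spans several lines (without it, . stops at a newline).

```python
import re

# Second regex from the answer; DOTALL lets . cross line breaks.
AROUND = re.compile(r'^(.*?)"signed_request" value="[^"]*"(.*)$', re.DOTALL)
# The same fragment on its own, for removing every occurrence.
FRAGMENT = re.compile(r'"signed_request" value="[^"]*"')


def split_around(text):
    """Return (before, after) around the first occurrence, or None if absent."""
    match = AROUND.match(text)
    return (match.group(1), match.group(2)) if match else None


def remove_all(text):
    """Delete every occurrence of the fragment."""
    return FRAGMENT.sub("", text)


html = '<input name="signed_request" value="abc" />'
print(split_around(html))
print(remove_all('a "signed_request" value="x" b "signed_request" value="y" c'))
```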

Regex to find arguments in text

There's undoubtedly a better way to do this but this is the way my requirements need me to do this.
I'm creating a search form for my web application. I want to use a tag-based search, so I'm using regex to make it work.
So I have a search string: 'c:john customer:15478'
The regex needs to find the tag (c:) and the argument (john), drop the tag, and give me the argument -- and it needs to do so for all of the instances of a tag and their arguments. The regex I have comes close, but it doesn't work correctly. It doesn't grab every argument, or drop the tags in a consistent way. So the question: what's wrong with my regex that needs to be fixed in order to achieve the correct results?
Currently it finds the first tag, grabs its argument, and everything else after it. I need it to stop the match after it finds an argument. i.e. in the case above it will match john customer:15478
Maybe a better question is how do I make VB's regex return everything between the first colon, and the beginning of the next tag (which is followed by another colon) or otherwise stop matching at the beginning of the next tag?
Regex:
(?<=({0}({1})??:)+?)(\S+\s*\S*)(?=\s+?\b\w+:.+?)??
The {0} and {1} are placeholders filled in by a String.Format call using a string, say Customer (but it could be anything), to define the tag: {0} is the first character and {1} is the rest of the characters. This regex will match anything that follows the tag, including another tag and its argument if it exists. So for the string
"c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:""192.168.0.1"""
The matches would be
'5401 4664, john smith, 215-548-5487 d:"192.168.0.1"'
'domain.com p:1548, "192.168.0.1"'
'joam d:domain.com, 1548 c:215-548-5487'
given the tags I have defined. The regex fails to stop its matching at the start of the next tag.
If I understood you correctly, this should solve the problem in general:
/\w+:([^:]+)(?:\s|$)/g
https://regex101.com/r/vN6fH1/1
and with defined tag it would look like this:
/{0}({1})?:([^:]+)(?:\s|$)/g
but this still relies on the colon, not on the tag name
(so it won't match at all if the tag name you pass does not occur in the string)
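Here is how the two patterns behave in Python rather than VB, as a minimal sketch. The function names are hypothetical, and I made the optional tag remainder a non-capturing group so that findall returns only the arguments.

```python
import re

# General pattern from the answer: any tag, then an argument without a colon.
GENERAL = re.compile(r"\w+:([^:]+)(?:\s|$)")


def all_arguments(query):
    """Return the argument of every tag in the query."""
    return GENERAL.findall(query)


def arguments_for(tag, query):
    """Return arguments for one tag, written as its first letter or in full."""
    first = re.escape(tag[0])
    # Equivalent of {0}({1})? with the remainder of the tag made optional.
    rest = "(?:%s)?" % re.escape(tag[1:]) if len(tag) > 1 else ""
    pattern = first + rest + r":([^:]+)(?:\s|$)"
    return re.findall(pattern, query)


print(all_arguments("c:john smith p:joam"))
print(arguments_for("customer", "c:john customer:15478"))
```

The argument stops before the next tag because [^:]+ has to back off to the last whitespace before the following colon. An argument that itself contains a colon would be cut short.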

Regexp for finding tags without nested tags

I'm trying to write a regexp which will help to find untranslated text in HTML code.
Translated text is text that goes through a special tag, <fmt:message>, or through the construction ${...}
Ex. non-translated:
<h1>Hello</h1>
Translated texts are:
<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>
I've written the following expression:
\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>
It finds correct strings like:
<p>text</p>
Correctly skips
<a><fmt:message key="common.delete" /></a>
But also catches:
<li><p><fmt:message key="common.delete" /></p></li>
And I can't figure out how to add an exception for ${...} strings to this expression.
Can anybody help me?
If I understand you properly, you want to ensure the data inside the "tag" doesn't contain fmt:message or ${....}
You might be able to use a negative lookahead in conjunction with a . to assert that the characters captured by the . are not one of those cases:
/<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|<\/\1>).)*<\/\1>/i
If you want to avoid capturing any "tags" inside the tag, you can drop the <fmt:message portion and just use [^<] instead of a . so that only characters other than < are matched:
/<(\w+)[^>]*>(?:(?!\$\{)[^<])*<\/\1>/i
Added from a comment: if you also want to exclude "empty" tags, add another negative lookahead, this time (?!\s*<), to ensure that the stuff inside the tag is not empty or only whitespace:
/<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*<\/\1>/i
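To see the last pattern at work on the examples from the question, here is a short Python sketch. The function name and the way the sample lines are joined are my own, and the /.../i delimiters are replaced by the IGNORECASE flag.

```python
import re

# Last pattern from the answer, with the /i flag expressed as re.IGNORECASE.
UNTRANSLATED = re.compile(
    r"<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*</\1>",
    re.IGNORECASE,
)


def untranslated(html):
    """Return every element whose body is plain text without ${...}."""
    return [match.group(0) for match in UNTRANSLATED.finditer(html)]


sample = "\n".join([
    "<h1>Hello</h1>",
    '<h1><fmt:message key="hello" /></h1>',
    "<button>${expression}</button>",
    '<li><p><fmt:message key="common.delete" /></p></li>',
    "<p>text</p>",
    "<p></p>",
])

print(untranslated(sample))
```

Only the two plain-text elements are reported; the fmt:message, ${...}, nested, and empty cases are all skipped.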
If the format is simple as in your examples you can try this:
<(\w+)>(?:(?!<fmt:message).)+</\1>
Rewritten into a more formal question:
Can you match
aba
but not
aca
without catching
abcba ?
Yes.
FSM:
Start->A->B->A->Terminate
Insert abcba and run it
Start is ready for input.
a -> MATCH, transition to A
b -> MATCH, transition to B
c -> FAIL, return fail.
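The trace above can be reproduced with a few lines of Python. This is only a sketch of the state machine as described, one state per expected character, with a hypothetical function name and no claim about how a regex engine implements it internally.

```python
def run_fsm(text, expected="aba"):
    """Walk Start -> A -> B -> A; fail on the first unexpected character."""
    state = 0  # number of characters matched so far
    for ch in text:
        if state == len(expected) or ch != expected[state]:
            return False  # no transition for this input: FAIL
        state += 1  # MATCH, move to the next state
    # Accept only if the input ended exactly in the terminal state.
    return state == len(expected)


for candidate in ("aba", "aca", "abcba"):
    print(candidate, run_fsm(candidate))
```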
I've used a simple one like this with success:
<([^>]+)[^>]*>([^<]*)</\1>
Of course, if there is any CDATA with '<' in it, this is not going to work so well, but it should do fine for simple XML.
also see
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
for a discussion of using regex to parse html
executive summary: don't