Google Analytics Regular Expressions - regex

I'm fairly new to regular expressions and, for the benefit of learning, wanted to know how to do the following on one line:
page matching regular expression: .pdf/$
and page containing "somestring"
and page excluding "someotherstring"
I can obtain my desired output using the three rules above. My question is: can I combine them all into one line using a regular expression? The first line would then be something like:
page matching reg exp: .pdf/$ somestring+ (then regex for does not contain in GA) someotherstring
Is it possible to put it all into a one-liner?

Lookahead lets you match multiple independent conditions in one expression, and a negative lookahead even lets you require that something does not match. In your case:
/^(?=.*somestring)(?!.*someotherstring).*\.pdf$/
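To try the lookahead expression outside of Google Analytics, here is a minimal Python sketch. The sample page paths are made up for illustration, and it assumes the target regex engine supports lookaheads, which is worth checking for the tool you actually use.

```python
import re

# Pattern from the answer: the path must contain "somestring", must not
# contain "someotherstring", and must end in ".pdf".
PATTERN = re.compile(r"^(?=.*somestring)(?!.*someotherstring).*\.pdf$")


def page_matches(path):
    """Return True if the path satisfies all three conditions at once."""
    return PATTERN.search(path) is not None


# Hypothetical page paths, for illustration only.
for path in [
    "/docs/somestring/report.pdf",
    "/docs/somestring/someotherstring/report.pdf",
    "/docs/report.pdf",
]:
    print(path, page_matches(path))
```

Each lookahead is checked from the start of the string without consuming characters, which is why the conditions can be stacked in any order before the final `.*\.pdf$`.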

Related

How do I extract a specific match from Regex Extractor - Jmeter

I got this response from the highlighted expression, but I need the last result, the one in "Match [ 5 ] 1".
I'm using this config. Is it correct?
I've seen several examples online, but they all use the same config. How can I get the match I'm asking for?
Thanks in advance.
I've tried switching them and putting 5 in both fields, but I keep getting Match [one][one] every time.
If there are 5 matches, the special JMeter variable form_build_id_matchNr will have the value 5, so you can refer to the last match using the __V() function:
${__V(form_build_id_${form_build_id_matchNr},)}
In this case you will need to set "Match No" in the Regular Expression Extractor to -1.
More information: Here’s What to Do to Combine Multiple JMeter Variables
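To make the variable naming concrete, here is a rough Python emulation of what the extractor produces with "Match No" set to -1, and of the lookup that the __V() expression performs. This is only a sketch: the HTML snippet, the regular expression, and the helper names `extract_all` and `last_match` are assumptions for illustration, since the original config is not shown.

```python
import re


def extract_all(ref_name, regex, text):
    """Build refName_matchNr and refName_1..refName_N, as with Match No. -1."""
    matches = re.findall(regex, text)
    variables = {f"{ref_name}_matchNr": str(len(matches))}
    for index, value in enumerate(matches, start=1):
        variables[f"{ref_name}_{index}"] = value
    return variables


def last_match(variables, ref_name):
    """Rough equivalent of ${__V(refName_${refName_matchNr},)}."""
    count = variables[f"{ref_name}_matchNr"]
    # Fall back to an empty string when there is no such variable.
    return variables.get(f"{ref_name}_{count}", "")


# Hypothetical response body and regex, for illustration only.
html = (
    '<input name="form_build_id" value="form-aaa">'
    '<input name="form_build_id" value="form-bbb">'
)
variables = extract_all("form_build_id", r'name="form_build_id" value="(.+?)"', html)
print(last_match(variables, "form_build_id"))  # form-bbb
```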
In general, using regular expressions to parse HTML is not the best idea. You might want to consider using the CSS Selector Extractor and come up with a proper selector that matches only the "interesting" value.

RegEx for matching HTML tags

I am trying to use a regular expression to extract the start tags in lines of a given HTML document. From the following lines I expect to get only 'body' and 'h1' as start tags in the first line, and 'html', 'head' and 'title' as start tags in the second line:
'<body data-modal-target class=\'3\'><h1>Website</h1><br /></body></html>'
'<html><head><title>HTML Parser - II</title></head>'
I have already tried to do this using the following regular expression:
start_tags = re.findall(r'<(\w+)\s*.*?[^\/]>',line)
But my output for the first line is ['body','h1','br'], although I did not expect to catch 'br' because I excluded '/'.
For the second line it is ['html','title'], whereas I expected to catch 'head' too. I would be grateful if you could tell me which part of my code is wrong.
If you wish to do so with regular expressions, you might want to design multiple different expressions, step by step. You may be able to connect them using OR pipes, but it may not be necessary.
RegEx 1 for h1-h6 tags
This expression helps you capture the tags inside the body, excluding body and head:
(<(.*)>(.*)</([^br][A-Za-z0-9]+)>)
You might want to add more boundaries to it. For example, you can replace (.*) with lists of chars [].
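As one possible illustration of replacing `.*` with a character class, here is a sketch applied to the pattern from the question. In the original pattern, `.*?` is free to run past a `>`, so `<br />` ends up matching all the way to `</body>` and `<html>` swallows `<head>`. Restricting the middle to `[^<>]*` keeps each match inside a single tag. The lookbehind `(?<!/)` and the name `find_start_tags` are assumptions of this sketch, not part of the original answer.

```python
import re

# [^<>]* cannot cross a tag boundary; (?<!/) rejects self-closing tags
# such as <br /> because the character before ">" must not be "/".
START_TAG = re.compile(r"<(\w+)(?:\s[^<>]*)?(?<!/)>")


def find_start_tags(line):
    """Return the names of the start tags found in one line of HTML."""
    return START_TAG.findall(line)


lines = [
    "<body data-modal-target class='3'><h1>Website</h1><br /></body></html>",
    "<html><head><title>HTML Parser - II</title></head>",
]
for line in lines:
    print(find_start_tags(line))
# ['body', 'h1']
# ['html', 'head', 'title']
```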
RegEx 2 for head and body
For head and body tags, you might want to match across new lines, for which you might use an expression similar to:
(<head>([\s\S]*)<\/head>)|(<body>([\s\S]*)</body>)
Performance
These expressions are rather expensive. You might want to simplify them, write a script to parse your HTML, or use an HTML parser instead.
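If an HTML parser is an option, a minimal sketch with Python's standard `html.parser` module could look like the following. The class and function names are made up for this example. Self-closing tags such as `<br />` are reported through `handle_startendtag`, so overriding it to do nothing leaves them out of the result.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Collect the names of ordinary start tags, skipping self-closing ones."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for tags like <br />; ignore them on purpose.
        pass


def start_tags(line):
    collector = StartTagCollector()
    collector.feed(line)
    collector.close()
    return collector.tags


print(start_tags("<body data-modal-target class='3'><h1>Website</h1><br /></body></html>"))
print(start_tags("<html><head><title>HTML Parser - II</title></head>"))
```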

Regular expression to get value with duplicate data

Hi, I'm trying to extract a required string from a given string. The given string looks like this:
1|a1|id11-name11,x|a2|id21-name21,y|a3|id31-name31~id32-name32,y4|a4|id41-name41~id42-name42~id43-name43
Expected output:
a1~name11|a2~name21|a3~name31|a3~name32|a4~name41|a4~name42|a4~name43
Regular Expression:
(^|,)[^|]{0,}\|([^|]{0,})\|(~){0,}[^-]{0,}-([^,~]{0,})
Extracting $2~$4| or \2~\4|
Regular Expression output:
a1~name11|a2~name21|a3~name31|
Is it possible to get a3~name32 along with a3~name31 using a regular expression? Using multiple regular expressions is also fine. The number of values in the third part (after the second pipe symbol) is not limited to a fixed count such as id41-name41~id42-name42~id43-name43. It could be something like id41-name41~id42-name42~id43-name43~id43-name43~id43-name43~id43-name43...
You have two choices. The first is to split the string into parts and pick out what you want.
The second depends on the longest repeated part. In your case that is idxx-namexx.
If the number of repetitions is limited to a reasonable value, you can repeat that part in your regex so that you capture all the parts. For instance, for 2 repetitions you need to add the second part as follows:
([a-zA-Z]\d)\|(id\d+-(name\d+))(~?id\d+-(name\d+))?
______________-------1-------- _---------2--------_________
The groups will be
\1~\3 and
\1~\5
You can check it in Regex101 Site
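The first choice, splitting the string, handles any number of id-name pairs. A minimal Python sketch follows, assuming the delimiters are exactly as in the sample (`,` between records, `|` between the three fields, `~` between pairs, `-` between id and name). The function name `flatten` is made up for this example.

```python
def flatten(record_string):
    """Turn 'x|key|id-name~id-name,...' into 'key~name|key~name|...'."""
    pairs = []
    for record in record_string.split(","):
        # Each record has three pipe-separated fields; the first is unused.
        _, key, values = record.split("|")
        for value in values.split("~"):
            _id, name = value.split("-", 1)
            pairs.append(f"{key}~{name}")
    return "|".join(pairs)


sample = (
    "1|a1|id11-name11,x|a2|id21-name21,"
    "y|a3|id31-name31~id32-name32,"
    "y4|a4|id41-name41~id42-name42~id43-name43"
)
print(flatten(sample))
# a1~name11|a2~name21|a3~name31|a3~name32|a4~name41|a4~name42|a4~name43
```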

regular expression multiple matches

For reference, this is the regex tester I am using:
http://www.rsyslog.com/regex/
How can I modify this regular expression:
[^;]+
to receive multiple sub-matches for the following test string:
;first;second;third;fourth;fifth and sixth;seventh;
I currently only receive one sub-match:
first
Basically I want each sub-match to consist of the content between ; characters, I am hoping for a sub-match list like this:
first
second
third
fourth
fifth and sixth
seventh
Following the information given in the comments, I discovered that the reason I can't get more than one sub-match is that I need to specify the global modifier, and I can't figure out how to do that in the rsyslog regex tester I am using.
However, this did lead me to solve my problem in a slightly different manner. I came up with this regular expression which still only gives one match, but the number near the end acts as the index for the desired match, so for example:
(?:;([^;]+)){5}
matches this from my test string in the question:
fifth and sixth
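The index trick works because a repeated capturing group keeps only its last iteration. Here is a small Python sketch of the same idea. Python's `re` module is used purely for illustration and may differ from the rsyslog regex system in other respects, and the helper name `nth_field` is made up.

```python
import re


def nth_field(text, n):
    """Return the n-th ';'-delimited field (1-based), or None if absent."""
    # The group repeats n times; only the last repetition is retained.
    match = re.search(rf"(?:;([^;]+)){{{n}}}", text)
    return match.group(1) if match else None


text = ";first;second;third;fourth;fifth and sixth;seventh;"
print(nth_field(text, 5))  # fifth and sixth
```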
While this solution allows me to achieve what I wanted - though in a different manner - the true answer to my question is found in HamZa's comments. More specifically:
How can I modify the regular expression to receive multiple
sub-matches?
The answer is, you can't modify the regular expression itself in order to get multiple sub-matches. Setting the global modifier is required in order to do that.
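For comparison, in an environment that does offer global matching, the original expression needs no change at all. In Python, for instance, `re.findall` plays the role of the global modifier. This is only an illustration of that point and does not apply to the rsyslog tester; the name `split_fields` is made up.

```python
import re


def split_fields(text):
    """Return every run of non-';' characters, i.e. all matches of [^;]+."""
    return re.findall(r"[^;]+", text)


print(split_fields(";first;second;third;fourth;fifth and sixth;seventh;"))
# ['first', 'second', 'third', 'fourth', 'fifth and sixth', 'seventh']
```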
Based on this information I have posted a new question on serverfault targeted specifically to the rsyslog regular expression system.

Reg Ex for hyperlinks in comments

I am trying to find a solution to extract a hyperlink out of every comment which begins with %. My first idea was to use a regular hyperlink regex:
^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$
and some kind of pattern like:
%.*
so I added them both to:
^%.*(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$
But with this pattern I match everything, including the % character and multiple spaces. How can I get only the hyperlink inside the comment?
EDIT1:
Here is an example what to parse:
% http://www.test.com
It is a regular MATLAB comment and I want to highlight it like a hyperlink to get a more intuitive editor. I am working with Qt 4.7.1 / C++.
Thanks for all the answers!
I guess it depends a little on the language that is executing your regex, but you could try putting the URL part in parentheses:
%.*((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s])
That way you can access it as a group (usually an expression such as $1).
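As a sketch of reading that group, here is the same pattern in Python. The question targets Qt/C++, so this only illustrates the grouping idea; in Qt 4 the captured text would be read through the regex class's own accessor (for example `QRegExp::cap(1)`). The helper name `extract_url` and the second sample line are assumptions.

```python
import re

# Pattern from the answer: everything after "%" is skipped by ".*" and the
# URL itself is wrapped in group 1.
COMMENT_URL = re.compile(
    r"%.*((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}"
    r"(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s])"
)


def extract_url(line):
    """Return the URL found in a %-comment, or None if there is none."""
    match = COMMENT_URL.search(line)
    return match.group(1) if match else None


print(extract_url("% http://www.test.com"))  # http://www.test.com
```

Only group 1 is needed for highlighting; the inner groups (scheme, port, path characters) are still numbered, so the URL is always the first one.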