Using RegEx to extract a string in a URL - regex

I've tried to search for this and I'm sure versions of this question have been asked, but I haven't been able to apply other answers to my case.
I need to use RegEx to extract a random string of characters and symbols that appears in the URL when an advertiser sends traffic to me.
The referring URL looks something like this, with the part I want to extract in bold:
https://adclick.g.doubleclick.net/pcs/click%**long-string-of-characters-and-symbols**https://www.mywebsite.com
That long string of characters and symbols (the hash) contains multiple % signs so I need the entire string after the first % sign, but before my website's URL.
I've been pulling my hair out on this and any help would be appreciated!

You can use:
(?<=%).*(?=https)
How it works:
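(?<=%) Positive lookbehind: the match must start immediately after a % (the engine takes the first one it finds).
.* greedily matches everything until...
(?=https) Positive lookahead: the last https in the string, which is itself not part of the match.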
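A minimal Python sketch of the pattern above, for readers who want to try it. The referrer string and the name extract_hash are made up for illustration; a real hash will look different.

```python
import re

# Lookbehind for "%", greedy body, lookahead for the trailing "https".
HASH_PATTERN = re.compile(r"(?<=%).*(?=https)")


def extract_hash(referrer):
    """Return the text between the first '%' and the last 'https', or None."""
    match = HASH_PATTERN.search(referrer)
    return match.group(0) if match else None


# Hypothetical referrer URL, assumed only for this example.
referrer = (
    "https://adclick.g.doubleclick.net/pcs/click"
    "%3Fxai%3DAKAOjs%26sig%3DCg0Ar"
    "https://www.mywebsite.com"
)
print(extract_hash(referrer))
```

Because .* is greedy, the extracted text keeps every later % sign and stops only at the final https.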

Related

Fluentvalidation 6.4.1.0 support me with Incorrect regex

In my case, I want to validate image URLs, but some valid URLs are rejected.
E.g., the image link "https://fuvitech.online/wpcontent/uploads/2021/02/bta16600brg.jpg" or "https://fuvitech.online/wp-content/uploads/2021/02/bta16-600brg.jpg" produces the response "The image link is not in the correct format".
My code here:
RuleFor(product => product.Images)
.Length(1, 3000).WithMessage(Labels.importProduct_ExceedDescription, p => ImportHelpers.GetColumnName(typeof(ProductEntity).GetProperty(nameof(p.Images))))
.Matches(@"^(http:\/\/|https:\/\/){1}?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$").WithMessage(Labels.importProduct_UrlNotCorrect, p => ImportHelpers.GetColumnName(typeof(ProductEntity).GetProperty(nameof(p.Images))));
Please help me find what is wrong with the regex above. Thank you.
Try this:
NOTE the following regex pattern may trigger false positives and also may ignore valid image URLs, because it is very difficult to validate whether a given URL is valid.
^https?:\/\/(?:(?:[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+|[A-Za-z0-9]{2,})\.)+[A-Za-z]{2,}(?::\d+)?\/(?:(?:[A-Za-z0-9]+(?:(?:-[A-Za-z0-9]+)+)?\/)+|)[\w-]+\.(?:jpg|jpeg|png)$
Explanation
^ the start of a line/string.
https?:\/\/ match http with an optional letter s, followed by ://.
(?:(?:[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+|[A-Za-z0-9]{2,})\.)+ This will match things like foo-foo.bar-bar., foo.bar-bar. and foo.
[A-Za-z]{2,} this will match the TLD part, e.g., com, org, this part with the previous part will match things like foo-foo.bar-bar.com, foo.bar-bar.com or foo.com.
(?::\d+)? optional group of (a colon : followed by one or more digits) for port part.
\/(?:(?:[A-Za-z0-9]+(?:(?:-[A-Za-z0-9]+)+)?\/)+|) this checks for two things: the first is one or more directories, e.g., /uploads/public-images/ or /uploads/images/; the second is a single /.
[\w-]+ this part for the file name, e.g., bta16-600brg.
\.(?:jpg|jpeg|png) you can add here multiple extensions, you can allow uppercase letters by using for example, [Jj][Pp][Gg] for jpg.
$ the end of the line/string.
See regex demo
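As a quick check, the pattern above can be exercised in Python against the two URLs from the question. This is only a sketch: the name is_image_url and the .txt counterexample are assumptions added for illustration.

```python
import re

# The pattern from the answer, split across lines for readability only.
IMAGE_URL = re.compile(
    r"^https?:\/\/(?:(?:[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+|[A-Za-z0-9]{2,})\.)+"
    r"[A-Za-z]{2,}(?::\d+)?\/"
    r"(?:(?:[A-Za-z0-9]+(?:(?:-[A-Za-z0-9]+)+)?\/)+|)"
    r"[\w-]+\.(?:jpg|jpeg|png)$"
)


def is_image_url(url):
    """True if the URL has the host/path/file shape the pattern expects."""
    return IMAGE_URL.match(url) is not None


urls = [
    "https://fuvitech.online/wpcontent/uploads/2021/02/bta16600brg.jpg",
    "https://fuvitech.online/wp-content/uploads/2021/02/bta16-600brg.jpg",
    "https://fuvitech.online/wp-content/uploads/2021/02/readme.txt",  # hypothetical
]
for url in urls:
    print(url, is_image_url(url))
```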
Thanks @SaSkY for answering my question.
I found my mistake.
The part \.[a-z]{2,5} only allows domain extensions of 2 to 5 characters. For example, .com is valid, but in my case .online (6 characters) was not.
I changed it to \.[a-z]{1,10}.
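The effect of the TLD length limit can be reproduced with a small Python sketch. The pattern below is a simplified rewrite of the original one (non-capturing groups, no {1}? quantifier), and build_url_pattern is a hypothetical helper, so treat it as an illustration rather than the exact FluentValidation rule.

```python
import re


def build_url_pattern(tld_quantifier):
    """Build the URL pattern with a configurable TLD length, e.g. '{2,5}'."""
    return re.compile(
        r"^https?://[a-z0-9]+(?:[-.][a-z0-9]+)*"
        r"\.[a-z]" + tld_quantifier +
        r"(?::[0-9]{1,5})?(?:/.*)?$"
    )


strict = build_url_pattern("{2,5}")    # original limit
relaxed = build_url_pattern("{1,10}")  # limit after the fix

url = "https://fuvitech.online/wp-content/uploads/2021/02/bta16-600brg.jpg"
print(bool(strict.match(url)))   # ".online" has 6 letters, so this fails
print(bool(relaxed.match(url)))
```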

Get an exact regex match of an email value from a list of email addresses

I have a text field which stores a list of email addresses, e.g. x@demo.com; a.x@demo.com. I have another text field which stores the exact value matched from the list of emails, i.e. if /x@demo.com/i is in x@demo.com;a.x@demo.com then it should return x@demo.com.
The issue I am having is that if I have /a.x@demo.com/i, I will get x@demo.com instead of a.x@demo.com.
I know of the regex /^x@demo.com$/i, but this means I can only have one email in my list of email addresses, which won't help.
I have tried a couple of other regex expressions with no luck.
Any ideas on how I can achieve this?
You can use this slightly changed regex:
/(^|;)x@demo.com($|;)/i
It will match from either beginning of string or start after a semi colon and end either at end of string or at a semi colon.
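The same delimiter idea can be sketched in Python. Here a capture group isolates the address so the surrounding semicolons are not part of the result; the whitespace tolerance, the use of re.escape, and the name find_exact_email are assumptions added for this example.

```python
import re


def find_exact_email(email, email_list):
    """Return the entry of a ';'-separated list that equals `email`, or None."""
    pattern = (
        r"(?:^|;)\s*("          # start of string or a semicolon, optional spaces
        + re.escape(email)      # the address itself, with '.' treated literally
        + r")\s*(?=;|$)"        # followed by a semicolon or the end of the string
    )
    match = re.search(pattern, email_list, flags=re.IGNORECASE)
    return match.group(1) if match else None


emails = "x@demo.com; a.x@demo.com"
print(find_exact_email("x@demo.com", emails))
print(find_exact_email("a.x@demo.com", emails))
```

Requiring a delimiter on both sides is what stops x@demo.com from matching inside a.x@demo.com.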
Edit:
Small change: this uses lookbehind and lookahead, so you will get only the match you want:
(?<=^|;)x@demo.com(?=$|;)
Edit2:
To allow spaces around the semicolon and at the start and end, use this (@-quoted):
@"(?<=^\s*|;\s*)x@demo.com(?=\s*$|\s*;)"
or use double escaping:
"(?<=^\\s*|;\\s*)x@demo.com(?=\\s*$|\\s*;)"

JMeter extract link using regular expression pass into next request with blank values

This is how I have Test Plan set up:
HTTP Request -> Regular Expression Extractor to extract multiple links - This is extracting correctly -- But some of the links are Blank
RegularExpressionExtractor --- <a href="(.*)" class="product-link">
BeanShell Sampler - to filter blank or null values -- This works fine
BeanShell Sampler
log.info("Enter Beanshell Sampler");
matches = vars.get("url_matchNr");
log.info(matches);
for (int i = 1; i <= Integer.parseInt(matches); i++) // url_1 .. url_N, including the last match
{
String url = vars.get("url_"+i);
//log.info(url1);
if (url != null && url.length() > 0)
{
log.info(i+"->" + url);
//return url;
//vars.put("url2", url);
vars.put("url2", url);
//props.put("url2", url);
log.info("URL2:" + vars.get("url2"));
}
}
ForEach Controller
Test Plan
The problem I am facing is that the ForEach Controller runs through all the values, including blank or null ones. How can I run the loop only for the non-blank values?
You should change your regular expression to exclude empty values.
Instead of matching any value, including an empty one, with the * quantifier:
<a href="(.*)" class="product-link">
Match only non-empty strings with the + quantifier:
<a href="(.+)" class="product-link">
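The difference between the two quantifiers can be seen in a short Python sketch. The HTML snippet is invented for illustration, with one link per line; JMeter's extractor is not involved here.

```python
import re

# Hypothetical response body: the second link has an empty href.
html = (
    '<a href="/product/1" class="product-link">\n'
    '<a href="" class="product-link">\n'
    '<a href="/product/2" class="product-link">'
)

with_star = re.findall(r'<a href="(.*)" class="product-link">', html)
with_plus = re.findall(r'<a href="(.+)" class="product-link">', html)

print(with_star)  # includes the empty string for the blank href
print(with_plus)  # the blank href is skipped
```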
As mentioned earlier, you should change your regex!
You can replace it directly with
<a href="(.+)" class="product-link">
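or by something more constraining, like this (the ^ and $ anchors of the tutorial's pattern are dropped here, because they cannot match in the middle of the tag):
<a href="((https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?)" class="product-link">
The tutorial's original pattern, ^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$, is a regex that matches only URLs.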
https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149
The first capturing group is all option. It allows the URL to begin
with "http://", "https://", or neither of them. I have a question mark
after the s to allow URL's that have http or https. In order to make
this entire group optional, I just added a question mark to the end of
it.
Next is the domain name: one or more numbers, letters, dots, or hyphens
followed by another dot then two to six letters or dots. The following
section is the optional files and directories. Inside the group, we
want to match any number of forward slashes, letters, numbers,
underscores, spaces, dots, or hyphens. Then we say that this group can
be matched as many times as we want. Pretty much this allows multiple
directories to be matched along with a file at the end. I have used
the star instead of the question mark because the star says zero or
more, not zero or one. If a question mark was to be used there, only
one file/directory would be able to be matched.
Then a trailing slash is matched, but it can be optional. Finally we
end with the end of the line.
String that matches:
http://net.tutsplus.com/about
String that doesn't match:
http://google.com/some/file!.html (contains an exclamation point)
Good luck!!!
The ForEach Controller doesn't work with JMeter Properties; you need to change the "Input Variable Prefix" to url_2, and your test should start working as expected.
Also be aware that since JMeter 3.1 it is recommended to use Groovy language for any form of scripting so consider migrating to JSR223 Sampler and Groovy language on next available opportunity.
Groovy has much better performance while Beanshell might become a bottleneck when it comes to immense loads.

RegEx to cut out URL

I am trying to get a URL from a string of the following format:
RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH
I already tried some things, especially lookbehind/lookahead, which I had used successfully on another URL format (starts with https..., ends with .html).
But seems I'm too stupid to figure out the regex for the kind of string mentioned above. I just want the URL part from https.... to the end of the random last name. Is this even possible?
Any Ideas?
If you can guarantee that randomfirstname_randomlastname is all lowercase (plus the underscore) and RANDOMRUBBISH is all uppercase, you can use character classes such as [a-z_] and [A-Z]. The exact syntax depends on the language the regex is for.
This example works in JavaScript:
var str = "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH";
// The underscore is in the class so the match does not stop after the first name.
var match = /https:\/\/www\.my-url\.com\/[a-z_]*/.exec(str);

Regex to match anything after /

I'm basically clueless about regex, but I need a regex that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
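The same example can be run in Python, where group(0) is the complete match and group(1) is the first parenthesized group:

```python
import re

# Same input string and pattern as in the example above.
match = re.match(r"a*(b*)", "aaaaaaaabbbbcccc")

print(match.group(0))  # the complete match
print(match.group(1))  # only the part captured by (b*)
```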
To achieve what you want with the sweets pattern above, you just need to put a group around the end.
Possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last /, you can take another approach:
Possible other solution: /([^/]*)$
The pattern above finds a / followed by a run of characters that are NOT another /, up to the end of the string, and keeps that run in group 1. Without the trailing $, a search would stop at the first / it finds rather than the last. The issue here is that you could match URLs that do not have sweets in them.
Note: if you do not mind the / at the beginning, just remove the ( and ) and you do not have to worry about groups.
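A small Python sketch of both approaches. The function names are hypothetical, and the second pattern is anchored with $ so that it picks the segment after the last / rather than the first.

```python
import re


def after_sweets(url):
    """Return whatever follows '/sweets/' in the URL, or None."""
    match = re.search(r"/sweets/(.*)", url)
    return match.group(1) if match else None


def last_segment(url):
    """Return whatever follows the last '/' in the URL, or None."""
    match = re.search(r"/([^/]*)$", url)
    return match.group(1) if match else None


sweet = "http://localhost/sweettemptations/sweets/sweet-name"
listing = "http://localhost/sweettemptations/available-sweets"

print(after_sweets(sweet), last_segment(sweet))
print(after_sweets(listing), last_segment(listing))
```

The second call illustrates the caveat above: last_segment also matches URLs that have nothing to do with sweets.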
I like to use http://regexpal.com/ to test my regexes. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL and then simply replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not match the "localhost" part literally. I am also assuming that (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression matches an optional http:// part followed by a host (be it localhost, an IP or a host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
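A minimal Python sketch of this match-then-replace idea. The target constant and the name redirect_target are assumptions for illustration; in a real WordPress setup the redirect would more likely live in a rewrite rule or plugin.

```python
import re

# Optional "http://", any host, then the sweets path and anything after it.
SWEETS_PATTERN = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")

# Hypothetical redirect target, taken from the URLs in the question.
TARGET = "http://localhost/sweettemptations/available-sweets"


def redirect_target(url):
    """Return the listing page for any single-sweet URL, else the URL unchanged."""
    if SWEETS_PATTERN.fullmatch(url):
        return TARGET
    return url


print(redirect_target("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect_target("http://localhost/sweettemptations/available-sweets"))
```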
Use this regular expression, which matches everything after the :// of the scheme: (?<=://).+