Regex JSON response Gatling stress tool - regex

I want to capture a variable called scanNumber from an HTTP response that looks like this:
{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}
How can I do this with a regular expression?
The tool I am using is the Gatling stress tool (with the Scala DSL).
I have tried to do it like this:
.check(jsonPath("""${scanNumber}""").saveAs("scanNr")))
But I get the error:
---- Errors --------------------------------------------------------------------
> Check extractor resolution crashed: No attribute named 'scanNumber' is defined      5 (100,0%)

You were close first time.
What you actually want is:
.check(jsonPath("""$..scanNumber""").saveAs("scanNr")))
or possibly:
.check(jsonPath("""$.profile.memberships[0].scanNumber""").saveAs("scanNr")))
Note that this uses jsonPath, not regular expressions. JsonPath should be more reliable than regex for this.
Check out the JsonPath spec for more advanced usage.

Use this regex to match the field anywhere in the JSON:
/"scanNumber":"[^"]+"/
And if you want to match it only within the structure you showed, use:
/\{[^{[]+\{[^{[]+\[\{[^{[]*("scanNumber":"[^"]+")/

Since JSON fields may change their order, you should make your regex more tolerant of those changes:
val j = """{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}"""
val scanNumberRegx = """\{.*"memberships":\[\{.*"scanNumber":"([^"]*)".*""".r
val scanNumberRegx(scanNumber) = j
scanNumber //String = 123-456-123-123
This will work even if the JSON fields appear in a different order (as long as the structure stays the same).
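To compare the two approaches outside Gatling, here is a small sketch in plain Python (not Gatling DSL code). The function names are made up for illustration. The first function walks the parsed document the way $.profile.memberships[0].scanNumber does; the second grabs the first "scanNumber" string value with a regex.

```python
import json
import re

# Regex lookup: the first "scanNumber" string value anywhere in the text.
SCAN_RE = re.compile(r'"scanNumber"\s*:\s*"([^"]*)"')


def scan_number_by_path(text):
    """Structural lookup, mirroring $.profile.memberships[0].scanNumber."""
    return json.loads(text)["profile"]["memberships"][0]["scanNumber"]


def scan_number_by_regex(text):
    """Textual lookup; returns None when the field is absent."""
    match = SCAN_RE.search(text)
    return match.group(1) if match else None


body = (
    '{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,'
    '"profile":{"fullName":"TestFirstName TestMiddleName TestLastName",'
    '"memberships":[{"name":"UA Gold Partner","number":"123-456-123-123",'
    '"scanNumber":"123-456-123-123"}]}}'
)

print(scan_number_by_path(body))
print(scan_number_by_regex(body))
```

The structural lookup does not care about field order or whitespace, which is the main reason to prefer it when the response is known to be JSON.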

Related

Regex for finding the name of a method containing a string

I've got a Node module file containing about 100 exported methods, which looks something like this:
exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
Goal: What I'd like to do is figure out how to grab the name of any method which contains a call to fooMethod, and return the correct method names: methodTwo and methodThree. I wrote a regex which gets kinda close:
exports\.(\w+).*(\n.*?){1,}fooMethod
Problem: using my example code from above, though, it would effectively match methodOne and methodThree because it finds the first instance of export and then the first instance of fooMethod and goes on from there. Here's a regex101 example.
I suspect I could make use of lookaheads or lookbehinds, but I have little experience with those parts of regex, so any guidance would be much appreciated!
Edit: Turns out regex is poorly suited for this type of task. @ctcherry advised using a parser, and using that as a springboard, I was able to learn about Abstract Syntax Trees (ASTs) and the recast tool, which lets you traverse the tree after using various tools (acorn and others) to parse your code into tree form.
With these tools in hand, I successfully built a script to parse and traverse my node app's files, and was able to find all methods containing fooMethod as intended.
Regex isn't the best tool for every part of this problem; ideally we would rely on something higher level: a parser.
One way to do this is to let the javascript parse itself during load and execution. If your node module doesn't include anything that would execute on its own (or at least anything that would conflict with the below), you can put this at the bottom of your module, and then run the module with node mod.js.
console.log(Object.keys(exports).filter(fn => exports[fn].toString().includes("fooMethod(")));
(In the comments below it is revealed that the above isn't possible.)
Another option would be to use a library like https://github.com/acornjs/acorn (there are other options) to write some other javascript that parses your original target javascript, then you would have a tree structure you could use to perform your matching and eventually return the function names you are after. I'm not an expert in that library so unfortunately I don't have sample code for you.
This regex matches (only) the method names that contain a call to fooMethod();
(?<=exports\.)\w+(?=[^{]+\{[^}]+fooMethod\(\)[^}]+};)
See live demo.
Assuming that all methods have their body enclosed within { and }, I would make an approach to get to the final regex like this:
First, find a regex to get the individual methods. This can be done using this regex:
exports\.(\w+)(\s|.)*?\{(\s|.)*?\}
Next, we are interested in those methods that have fooMethod in them before they close. So, look for } or fooMethod.*}, in that order. So, let us name the group searching for fooMethod as FOO and the name of the method calling it as METH. When we iterate the matches, if group FOO is present in a match, we will use the corresponding METH group, else we will reject it.
exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})
Explanation:
exports\.(?<METH>\w+): Till the method name (you have already covered this)
(\s|.)*?\{(\s|.)*?: Some code before { and after, non-greedy so that the subsequent group is given preference
(\}|(?<FOO>fooMethod)(\s|.)*?\}): This has 2 parts:
\}: Match the method close delimiter, OR
(?<FOO>fooMethod)(\s|.)*?\}): The call to fooMethod followed by optional code and method close delimiter.
Here's JavaScript code that demonstrates this:
let p = /exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})/g
let input = `exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};`
let match = p.exec( input );
while( match !== null) {
if( match.groups.FOO !== undefined ) console.log( match.groups.METH );
match = p.exec( input )
}
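A different way to avoid the "first export, first fooMethod" problem is to cut the source into one chunk per export and then test each chunk on its own. Below is a rough sketch in Python, assuming every method starts at the beginning of a line as exports.name = ... (as in the question). The function name methods_calling is made up for illustration, and a mention of fooMethod( inside a comment or string would still count as a call.

```python
import re

# Start of each exported method: "exports.<name> =" at the beginning of a line.
EXPORT_RE = re.compile(r"^exports\.(\w+)\s*=", re.MULTILINE)


def methods_calling(source, callee):
    """Return names of exported methods whose chunk contains a call to callee."""
    call_re = re.compile(r"\b" + re.escape(callee) + r"\s*\(")
    starts = list(EXPORT_RE.finditer(source))
    names = []
    for i, match in enumerate(starts):
        # A chunk runs from this export to the next one (or to end of file).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(source)
        if call_re.search(source, match.end(), end):
            names.append(match.group(1))
    return names


module_source = """exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
"""

print(methods_calling(module_source, "fooMethod"))
```

Because each chunk is searched separately, a match can never span two methods, so no lookarounds are needed.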

How can I fix this error in my go project - undefined: bson.RegEx

I get the following error from my editor:
undefined: bson.RegEx
due to this line of code in my go project:
regex := bson.M{"$regex": bson.RegEx{Pattern: id, Options: "i"}}
Why am I getting this error and how can I resolve it?
I've made sure that I'm importing:
"go.mongodb.org/mongo-driver/bson"
I've also checked inside bson/primitive/primitive.go to see that RegEx does exist
Using version 1.1.0 of mongo-driver.
Managed to work around the problem by removing this:
regex := bson.M{"$regex": bson.RegEx{Pattern: id, Options: "i"}}
and add this instead:
regex := `(?i).*` + name + `.*`
filter = bson.M{"name": bson.M{"$regex": regex}}
Using mongo-go-driver v1+, the type is primitive.Regex (not bson.RegEx), so you need the bson/primitive package. For example:
patternName := `.*` + name + `.*`
filter := bson.M{"name": primitive.Regex{Pattern: patternName, Options:"i"}}
cursor, err := collection.Find(context.TODO(), filter)
This is imported from "go.mongodb.org/mongo-driver/bson/primitive".
In addition, I would suggest reconsidering the search pattern. You can optimise a regex search if the regular expression is a "prefix expression", which means that all potential matches start with the same string. For example, ^name.* will be optimised by matching only against the values from the index that start with name.
Also worth noting that case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilise case-insensitive indexes. Please see $regex index use for more information.
Depending on the use case, consider MongoDB Text Search. For example, you can create a text index:
db.collection.createIndex({"name":"text"});
Which then you can search using:
filter := bson.M{"$text": bson.M{"$search": name}}
cur, err := collection.Find(context.TODO(), filter)
Also worth mentioning: depending on your requirements, there's also the MongoDB Atlas Full Text Search feature for advanced search functionality, e.g. text analysers.
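To illustrate the "prefix expression" point, here is a small sketch in Python that only builds the filter document; it does not talk to a database. The helper name name_prefix_filter is hypothetical. It anchors the pattern with ^ and escapes the user-supplied text so that characters such as . or * are matched literally. The assumption here is that the escaping produced by Python's re.escape is also acceptable to the server's regex engine for ordinary input.

```python
import re


def name_prefix_filter(name, case_insensitive=False):
    """Build a filter matching documents whose name starts with the given text."""
    # "^" anchors the match at the start; re.escape keeps the input literal.
    regex = {"$regex": "^" + re.escape(name)}
    if case_insensitive:
        # Note: case-insensitive regex queries generally cannot use indexes well.
        regex["$options"] = "i"
    return {"name": regex}


print(name_prefix_filter("a.b"))
print(name_prefix_filter("Gold", case_insensitive=True))
```

Leaving the leading .* out matters: a pattern like .*name.* can match anywhere in the value, so it cannot be narrowed down to a range of index entries.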

JMeter Response Assertion: incorporating property into pattern test

I'd like to setup a JMeter test plan to suggest whether a web site (URL) is Drupal-based (based completely on the HTTP response from the site) and compare it with existing data that I have on the environment. (I realize that using an HTTP approach, as opposed to say examining the site's file system, is "iffy" but I'm curious how useful the approach is)
The JMeter command line might look like this:
JMeter -t "DrupalAssertions.jmx" -Jurl=http://my.dot.com -Jdrupal=true
where I provide the URL to test and an additional property "drupal" indicating my best guess on whether the site is Drupal-based.
In my test plan, I add an HTTP Request to return the HTML content of the page for the URL. I'm then able to successfully add a Response Assertion that tests a pattern (say (?i)(drupal) for a sadly lacking pattern) to see if it's contained in the response.
That much works fine, or as expected, but what I'd like to do is to compare the value of the "drupal" property against the result of that pattern test in that same Response Assertion. I know I'm missing something simple here, but I'm not seeing how to do that.
I want to try to use an expression like this:
(?i)(drupal) == ${__P(drupal)}
in a pattern, but that doesn't work. The name of the Compare Assertion looks promising, but I don't see how to incorporate the property into a comparison.
Update: The approach suggested by PMD UBIK-INGENIERIE does work. I used a Regular Expression Extractor like this:
<RegexExtractor guiclass="RegexExtractorGui" testclass="RegexExtractor" testname="Extract Drupal in Response" enabled="true">
<stringProp name="RegexExtractor.useHeaders">false</stringProp>
<stringProp name="RegexExtractor.refname">drupalInResponse</stringProp>
<stringProp name="RegexExtractor.regex">(.*drupal.*)</stringProp>
<stringProp name="RegexExtractor.template">$0$</stringProp>
<stringProp name="RegexExtractor.default">__false__</stringProp>
<stringProp name="RegexExtractor.match_number">1</stringProp>
</RegexExtractor>
followed by this BeanShell Assertion:
// Variable "drupalInResponse" is "__false__" by default
if ( !(vars.get("drupalInResponse").equals("__false__") ) ) {
vars.put("drupalInResponse","true");
}
else {
vars.put("drupalInResponse","false");
}
print("\n\nThe value of property 'drupal' is: " + props.get("drupal") + "\n");
print("\n\nThe value of variable 'drupalInResponse' is: " + vars.get("drupalInResponse") + "\n");
if (vars.get("drupalInResponse").equals( props.get("drupal") ) ) {
print("Site Drupalness is consistent with your beliefs");
}
else {
print("You're wrong about the site's Drupalness");
Failure = true;
FailureMessage = "Incorrect Drupal assumption";
}
In the Regular Expression Extractor, I'd set a default value that I felt wouldn't be matched by my pattern of interest, then did an ugly verbose Java comparison with the "drupal" property in the BeanShell Assertion.
Wish somehow that the assertion could be made in a single component rather than it having two parts, but you can't argue with "working" :)
You can use a Regular Expression Extractor with your first pattern,
then use a BeanShell Assertion that compares the extracted variable to the drupal property.
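Stripped of the JMeter plumbing, the check boils down to "does the pattern occur in the response?" compared against the boolean you passed in. A minimal sketch of that logic in Python (not JMeter code; the function name is made up for illustration):

```python
import re

# Same idea as the (?i)(drupal) pattern from the question.
DRUPAL_RE = re.compile(r"drupal", re.IGNORECASE)


def drupal_guess_is_consistent(response_body, drupal_property):
    """Compare what the response suggests with the -Jdrupal=true/false guess."""
    found = DRUPAL_RE.search(response_body) is not None
    expected = drupal_property.strip().lower() == "true"
    return found == expected


print(drupal_guess_is_consistent('<meta name="Generator" content="Drupal 7">', "true"))
print(drupal_guess_is_consistent("<html></html>", "true"))
```

In the test plan the extractor plays the role of the search and the assertion plays the role of the comparison, which is why two components are needed.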

Regular expression for youtube links

Does someone have a regular expression that gets a link to a Youtube video (not embedded object) from (almost) all the possible ways of linking to Youtube?
I think this is a pretty common problem and I'm sure there are a lot of ways to link that.
A starting point would be:
http://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
http://www.youtube.com/watch?v=t-ZRX8984sc
http://youtu.be/t-ZRX8984sc
... please add more possible links and/or regular expressions to detect them.
So far I got this Regular expression working for the examples I posted, and it gets the ID on the first group:
http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?
You can use this expression below.
(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?\/?.*(?:watch|embed)?(?:.*v=|v\/|\/)([\w\-_]+)\&?
I'm using it, and it covers the most commonly used URLs.
I'll keep updating it on This Gist.
You can test it on this tool.
I like @brunodles's solution the most, but it can still match non-video links like https://www.youtube.com/feed/subscriptions
I went with this solution
(?:https?:\/\/)?(?:www\.)?youtu(?:\.be\/|be.com\/\S*(?:watch|embed)(?:(?:(?=\/[-a-zA-Z0-9_]{11,}(?!\S))\/)|(?:\S*v=|v\/)))([-a-zA-Z0-9_]{11,})
It can also be used to match multiple whitespace separated links.
The video id will be captured in the first group.
Tested with the following urls:
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
https://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
https://www.youtube.com/embed/watch?v=iwGFalTRHDA
https://www.youtube.com/embed/v=iwGFalTRHDA
https://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share
https://m.youtube.com/watch?v=iwGFalTRHDA
// will not match
https://www.youtube.com/feed/subscriptions
https://www.youtube.com/channel/UCgc00bfF_PvO_2AvqJZHXFg
https://www.youtube.com/c/NatGeoEdOrg/videos
https://regex101.com/r/rq2KLv/1
I improved the links posted above with a friend for a script I wrote for IRC to recognize even links without http at all. It worked on all stress tests I got so far, including garbled text with barely recognizable youtube urls, so here it is:
~(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?~
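For anyone who wants to try this pattern quickly, here is a small harness in Python. The ~ delimiters are dropped because Python does not use them; the helper name video_id is made up for illustration. It uses search rather than a full match so that the link can sit inside other text.

```python
import re

# The pattern from above, split over several lines for readability.
YOUTUBE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?youtu"
    r"(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)"
    r"([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?"
)


def video_id(text):
    """Return the first video ID found in text, or None."""
    match = YOUTUBE_RE.search(text)
    return match.group(1) if match else None


samples = [
    "http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related",
    "http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA",
    "check this out: youtu.be/n17B_uFF4cA !",
    "https://www.youtube.com/feed/subscriptions",
]
for sample in samples:
    print(sample, "->", video_id(sample))
```

Note that this pattern only knows the /watch? and youtu.be/ forms, so an /embed/ link such as http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4 is not matched by it.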
I tested all the regular expressions that are shown here and none could cover all the URL types that my client was using.
I built this pretty much through trial and error, but it seems to work with all the patterns that Poppy Deejay posted.
"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+"
Maybe it helps someone who is in a similar situation that I had today ;)
Piggy backing on Fanmade, this covers the below links including the url encoded version of attribution_links:
(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/|watch\%3Fv\%3D)([a-zA-Z0-9_-]{11})+
https://www.youtube.com/attribution_link?a=tolCzpA7CrY&u=%2Fwatch%3Fv%3DMoBL33GT9S8%26feature%3Dshare
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
www.youtu.be/iwGFalTRHDA
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ
I've been having problems lately with the attribution_link URLs, so I tried making my own regex that works for those too.
Here is my regex string:
(https?://)?(www\\.)?(youtu\\.be/|youtube\\.com/)?((.+/)?(watch(\\?v=|.+&v=))?(v=)?)([\\w_-]{11})(&.+)?
and here are some test cases I've tried:
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
www.youtu.be/iwGFalTRHDA
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ
Also remember to check the string you get for your video url, sometimes it may get the percent characters. If so just do this
url = [url stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
and it should fix it.
Remember also that the index of the youtube key is now index 9.
NSRange youtubeKey = [result rangeAtIndex:9]; //the youtube key
NSString * strKey = [url substringWithRange:youtubeKey] ;
It'd be the longest RegEx in the world if you managed to cover all link formats, but here's one to get you started which will cover the first couple of link formats:
http://(www\.)?youtube\.com/watch\?.*v=([a-zA-Z0-9]+).*
The second group will match the video ID if you need to get that out.
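One thing to watch with this starter pattern: the character class [a-zA-Z0-9] does not include - or _, and both appear in the IDs from the question (t-ZRX8984sc, n17B_uFF4cA), so the capture stops early. A quick Python check, with a widened variant for comparison (the WIDE pattern and the helper name are my own additions, not part of the original answer):

```python
import re

# Pattern as posted: the ID class excludes "-" and "_".
NARROW = re.compile(r"http://(www\.)?youtube\.com/watch\?.*v=([a-zA-Z0-9]+).*")
# Same pattern with the ID class widened to word characters plus "-".
WIDE = re.compile(r"http://(www\.)?youtube\.com/watch\?.*v=([\w-]+).*")


def second_group(pattern, url):
    """Return the second capture group, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(2) if match else None


url = "http://www.youtube.com/watch?v=t-ZRX8984sc"
print(second_group(NARROW, url))
print(second_group(WIDE, url))
```

With the posted class the captured ID for that URL is cut off at the first dash; the widened class returns the full ID.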
(?:http?s?:\/\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\.?be)(?:\.com)?(?:(?:\w*.?:\/\/)?\w*.?\w*-?.?\w*\/(?:embed|e|v|watch|.*\/)?\??(?:feature=\w*\.?\w*)?&?(?:v=)?\/?)([\w\d_-]{11})(?:\S+)?
https://regex101.com/r/nJzgG0/3
Detects YouTube and YouTube Music link in any string
I took all variants from here:
https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486#file-youtubeurlformats-txt
And built this regexp (YouTube ID is in group 2):
(\/|%3D|v=|vi=)([0-9A-z-_]{11})[%#?&\s]
Check it here: https://regexr.com/4u4ud
Edit: Works for any single string w/o breaks.
I'm working with that kind of links:
http://www.youtube.com/v/M-faNJWc9T0?fs=1&rel=0
And here's the regEx I'm using to get ID from it:
"(.+?)(\/v/)([a-zA-Z0-9_-]{11})+"
This is iterating on the existing answers and handles edge cases better. (for example http://thisisnotyoutu.be/thing)
/(?:https?:\/\/|www\.|m\.|^)youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/
Here is a complete solution for getting the YouTube video ID in Java or on Android; I haven't found any link that doesn't work with this function.
public static String getValidYoutubeVideoId(String youtubeUrl)
{
if(youtubeUrl == null || youtubeUrl.trim().contentEquals(""))
{
return "";
}
youtubeUrl = youtubeUrl.trim();
String validYoutubeVideoId = "";
String regexPattern = "^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
Pattern regexCompiled = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = regexCompiled.matcher(youtubeUrl);
if(regexMatcher.find())
{
try
{
validYoutubeVideoId = regexMatcher.group(1);
}
catch(Exception ex)
{
}
}
return validYoutubeVideoId;
}
This is my answer for use in Scala. It is useful for extracting the 11-character video ID from a YouTube URL.
"https?://(?:[0-9a-zA-Z-]+.)?(?:www.youtube.com/|youtu.be\S*[^\w-\s])([\w -]{11})(?=[^\w-]|$)(?![?=&+%\w](?:[\'"][^<>]>|))[?=&+%\w-]*"
def getVideoLinkWR: UserDefinedFunction = udf(f = (videoLink: String) => {
val youtubeRgx = """https?://(?:[0-9a-zA-Z-]+\.)?(?:youtu\.be/|youtube\.com\S*[^\w\-\s])([\w \-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:[\'"][^<>]*>|</a>))[?=&+%\w-./]*""".r
videoLink match {
case youtubeRgx(a) => s"$a".toString
case _ => videoLink.toString
}
})
Youtube video URL Change to iframe supported link:
REGEX: https://regex101.com/r/LeZ9WH/2/
http://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
http://www.youtube.com/watch?v=t-ZRX8984sc
http://youtu.be/t-ZRX8984sc
https://youtu.be/2sFlFPmUfNo?t=1
Php function example:
if (!function_exists('clean_youtube_link')) {
/**
* @param $link
* @return string|string[]|null
*/
function clean_youtube_link($link)
{
return preg_replace(
'#(.+?)(\/)(watch\x3Fv=)?(embed\/watch\x3Ffeature\=player_embedded\x26v=)?([a-zA-Z0-9_-]{11})+#',
"https://www.youtube.com/embed/$5",
$link
);
}
}
This should work for almost all youtube links when extracting from a string:
((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]{10}).\b
var isValidYoutubeLink: Bool{
// working for all the youtube url's
NSPredicate(format: "SELF MATCHES %@", "(?:http?s?:\\/\\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\\.?be)(?:\\.com)?(?:(?:\\w*.?:\\/\\/)?\\w*.?\\w*-?.?\\w*\\/(?:embed|e|v|watch|.*\\/)?\\??(?:feature=\\w*\\.?\\w*)?&?(?:v=)?\\/?)([\\w\\d_-]{11})(?:\\S+)?").evaluate(with: self)
}
With this Javascript Regex, the first capture is a video ID :
^(?:https?:)?(?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube(?:\-nocookie)?\.(?:[A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(?:watch|embed\/|vi?\/)*(?:\?[\w=&]*vi?=)?([^#&\?\/]{11}).*$
(?-s)^https?\W+(?:www\.|m\.|music\.)*youtu\.?be(?:\.com|\/watch|\/o?embed|\/shorts|\/attribution_link\?[&\w\-=]*[au]=|\/ytsc\w+|[\?&\/]+[ve]i?\b|\?feature=\w+|-nocookie)*[\/=]([a-z\d\-_]{11})[\?&#% \t ] *.*$
or
(?-s)^(?:(?!https?[:\/]|www\.|m\.yo|music\.yo|youtu\.?be[\/\.]|watch[\/\?]|embed\/)\V)*(?:https?[:\/]+|www\.|m\.|music\.)+youtu\.?be(?:\.com\/|watch|o?embed(?:\/|\?url=\S+?)?|shorts|attribution_link\?[&\w\-=]*[au]=\/?|ytsc\w+|[\?&]*[ve]i?\b|\?feature=\w+|[\?&]time_continue=\d+|-nocookie|%[23][56FD])*(?:[\/=]|%2F|%3D)([a-z\d\-_]{11})[\?&#% \t ]? *.*$
(the part >>#% \t⠀ ]<< should contain continuous space, which is Alt+255, but stackoverflow-com can't print it)
(this string may be replaced to \1, sorted and abbreviated with: )
V█(?-i)^([A-Za-z\d\-_]{11})(?:\v+\1)*$
>█https:\/\/youtu\.be\/\1
(./dot can take up any symbol; \V or [^\r\n] can any except special, emoji and others; this >> [^!-⠀:/‽|\s] << can grab some emoji)
https://youtu.be/x26ANNC3C-8 • ♾ 𝕳𝕰𝕽𝕰𝕿𝕳𝕰𝖄𝕮𝕺𝕸𝕰 - 𝔩𝔢𝔞𝔳𝔢 𝔪𝔢 𝔞𝔩𝔬𝔫𝔢 • 7:15
This regex solved my problem: I can get the YouTube link from watch, embed, or shared links.
(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)
You can check here https://regex101.com/r/Kvk0nB/1

How to use regex in selenium locators

I'm using selenium RC and I would like, for example, to get all the links elements with attribute href that match:
http://[^/]*\d+com
I would like to use:
sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )
which would return a list of the name attribute of all the links that match the regex.
(or something like it)
thanks
The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question, how to use regex in Xpath locators. You need to use the regex matches() function, like this:
xpath=//div[matches(@id,'che.*boxes')]
(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')
Be aware, though, that the matches function is not supported by all native browser implementations of Xpath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).
If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript Xpath interpreter. It'll be slower, but it does seem to work with more Xpath functions, including 'matches' and 'ends-with'. :)
You can use the Selenium command getAllLinks to get an array of the ids of links on the page, which you could then loop through and check the href using getAttribute, which takes the locator followed by an @ and the attribute name. For example in Java this might be:
String[] allLinks = session().getAllLinks();
List<String> matchingLinks = new ArrayList<String>();
for (String linkId : allLinks) {
String linkHref = selenium.getAttribute("id=" + linkId + "@href");
if (linkHref.matches("http://[^/]*\\d+.com")) {
matchingLinks.add(linkId);
}
}
A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. something like the following answer:
selenium: Is it possible to use the regexp in selenium locators
Here's some alternate methods as well for Selenium RC. These aren't pure Selenium solutions, they allow interaction with your programming language data structures and Selenium.
You can also get the HTML page source, then run a regular expression over the source to return a match set of links. Use regex grouping to separate out URLs, link text/ID, etc. and you can then pass them back to Selenium to click on or navigate to.
Another method is to get the HTML page source or innerHTML (via DOM locators) of a parent/root element, then convert the HTML to XML as a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expression or not), and obtain a nodeset of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.
Upon request, I'm providing examples below. It's mixed languages since the post didn't appear to be language specific anyways. I'm just using what I had available to hack together for examples. They aren't fully tested or tested at all, but I've worked with bits of the code before in other projects, so these are proof of concept code examples of how you'd implement the solutions I just mentioned.
//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:
http://jsoup.org/cookbook/extracting-data/dom-navigation
the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
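Along the same lines, here is a rough Python sketch of the "take the page source, parse it, then filter with a regex" idea, using only the standard library's HTMLParser. It assumes you already have the page source as a string from Selenium; the class and function names are made up for illustration. It returns the name attribute of every link whose href matches the pattern, which is what the question asked for.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the name attribute of <a> tags whose href matches a regex."""

    def __init__(self, href_pattern):
        super().__init__()
        self.href_re = re.compile(href_pattern)
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # match() anchors at the start of the href value.
        if self.href_re.match(href):
            self.names.append(attributes.get("name"))


def matching_link_names(page_source, href_pattern):
    collector = LinkCollector(href_pattern)
    collector.feed(page_source)
    return collector.names


page = (
    '<a name="first" href="http://site123.com/x">one</a>'
    '<a name="second" href="http://example.com/">two</a>'
)
print(matching_link_names(page, r"http://[^/]*\d+\.com"))
```

These are attribute values only, not live elements, so you would still build a locator from each result to click on or navigate to the link.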
Selenium's By.Id and By.CssSelector methods do not support Regex and By.XPath only does where XPath 2.0 is enabled. If you want to use Regex, you can do something like this:
void MyCallingMethod(IWebDriver driver)
{
//Search by ID:
string attrName = "id";
//Regex = 'a number that is 1-10 digits long'
string attrRegex= "[0-9]{1,10}";
SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
List<IWebElement> elements = new List<IWebElement>();
//Allows spaces around equal sign. Ex: id = 55
string searchString = attrName +"\\s*=\\s*\"" + attrRegex +"\"";
//Search page source
MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
//iterate over matches
foreach (Match match in matches)
{
//Get exact attribute value
Match innerMatch = Regex.Match(match.Value, attrRegex);
string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";
//Find element by exact attribute value
elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
}
return elements;
}
Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.