Regular expressions and Selenium WebDriver xpath

How can I fix this code to work?
public void check(WebDriver driver) {
driver.findElement(By.xpath("//a[matches(@href,'/staff/transcript/\\d{5}//.pdf')]")).click();
}
I must find a link in which the 5-digit identifier varies.

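Try to get the href attribute,
parse that string to get the 5-digit identifier,
then use that identifier to construct your locator and click.
// Locate the link with XPath 1.0 functions only, then read its href.
String href = driver.findElement(By.xpath("//a[contains(@href,'/staff/transcript/')][contains(@href,'.pdf')]")).getAttribute("href");
// Use lastIndexOf for the dot as well: the host name of an absolute URL also contains dots.
String identifier = href.substring(href.lastIndexOf("/") + 1, href.lastIndexOf("."));
// The identifier is now a literal, so contains() is enough and matches() is not needed.
driver.findElement(By.xpath("//a[contains(@href,'/staff/transcript/" + identifier + ".pdf')]")).click();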
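
For comparison, the pattern the question is after can also be checked outside XPath once the href values are in hand. This is only a minimal Python sketch, assuming the hrefs have already been collected into a list; the sample URLs and the helper name first_transcript_link are made up for illustration and are not part of any Selenium API.

```python
import re
from urllib.parse import urlparse

# The path must be exactly /staff/transcript/<5 digits>.pdf
TRANSCRIPT_PATH = re.compile(r"/staff/transcript/(\d{5})\.pdf")

def first_transcript_link(hrefs):
    """Return (href, identifier) for the first href whose path matches, else None."""
    for href in hrefs:
        match = TRANSCRIPT_PATH.fullmatch(urlparse(href).path)
        if match:
            return href, match.group(1)
    return None

# Hypothetical hrefs, as they might come back from getAttribute("href").
hrefs = [
    "http://example.com/staff/index.html",
    "http://example.com/staff/transcript/123456.pdf",  # six digits: rejected
    "http://example.com/staff/transcript/54321.pdf",
]
print(first_transcript_link(hrefs))
```

Using fullmatch on the parsed path keeps dots in the host name and over-long identifiers from producing false positives.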

One possible solution to your problem:
use JavaScript to iterate through all the <a> tags and return the text of the first one that matches your regex.
public String getLocatorByRegExp() {
JavascriptExecutor js = (JavascriptExecutor) driver;
StringBuilder stringBuilder = new StringBuilder();
// Backslashes must be doubled inside a Java string literal.
stringBuilder.append("var regex = /^\\d{5}$/;");
stringBuilder.append("var x = document.getElementsByTagName('a');");
// text is a property of anchor elements, not a method.
stringBuilder.append("for (var t = 0; t < x.length; t++) { if (regex.test(x[t].text.trim())) return x[t].text.trim(); }");
String res = (String) js.executeScript(stringBuilder.toString());
return res;
}
String properLinkText = getLocatorByRegExp();
driver.findElement(By.xpath("//a[contains(text(),'" + properLinkText + "')]")).click();
This is quite a complicated approach, though, and it seems to me that a simpler solution is possible.
Is the 5-digit identifier unique on the page (I mean, is there only one such element on the page)?
If so, it is easy to find a CSS locator or XPath for this element.
Please provide a piece of your HTML and point out the element you need to click on.

Related

How to extract part of url - dart/flutter

I'm trying to extract part of a URL. To be more specific, I'm trying to extract the value of the page_info parameter in the URL that is followed by rel="next".
String testUrl = "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9>; rel='previous', <https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel='next'";
List<String> splitUrl = testUrl.split("=");
print(splitUrl[5]);
// this is what it prints out
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel
// this is what I'm trying to extract
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0
// value for rel="next"
I tried to split the URL using the split function on String, but that also brings the angle bracket with it. I'm trying to extract only the page_info parameter value for rel="next".
I know this has something to do with regex, but I'm not really good at it! Any help would be really appreciated.
I grabbed that URL from a response header (paginated REST API); it returns two page_info parameters (one for the next page and one for the previous page), and I'm trying to extract the value for the next page. Splitting the URL didn't help me.
Thank you.

An alternative approach is to use Uri.parse to parse the URL:
void main() {
String testUrl = "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9>; rel='previous', <https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel='next'";
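// Extract just the URL that is tagged rel='next' (a plain firstMatch on <...> would return the 'previous' URL).
var match = RegExp(r"<([^>]*)>;\s*rel='next'").firstMatch(testUrl);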
if (match != null) {
var uri = Uri.parse(match.group(1)!);
print(uri.queryParameters['page_info']); // Prints: eyJkaXJlY3Rpb24iOiJ...
}
}
Note that the above wouldn't need any of the RegExp code if testUrl were a proper URL without the angle brackets and rel='next' junk.
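
The same idea, pairing each URL with its rel value and only then reading page_info, can be sketched in Python with the standard library. This is an illustration rather than a full Link-header parser: the function name page_info_for and the shortened tokens are assumptions, and the pattern only covers the `<URL>; rel='...'` shape shown in the question.

```python
import re
from urllib.parse import urlparse, parse_qs

def page_info_for(link_header, rel):
    """Return the page_info value of the link tagged with the given rel, or None."""
    # Each entry looks like: <URL>; rel='next' (single or double quotes accepted).
    entries = re.findall(r"""<([^>]*)>\s*;\s*rel=['"]?([^'",\s]+)""", link_header)
    for url, entry_rel in entries:
        if entry_rel == rel:
            values = parse_qs(urlparse(url).query).get("page_info")
            return values[0] if values else None
    return None

# Shortened, made-up tokens standing in for the real page_info values.
header = (
    "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=PREV_TOKEN>; rel='previous', "
    "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=NEXT_TOKEN>; rel='next'"
)
print(page_info_for(header, "next"))  # NEXT_TOKEN
```

Selecting by rel instead of by position keeps the code working when the header contains only one of the two links.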

The regex pattern page_info=([\w]+)
captures both values, the previous one first and the next one second:
eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0
https://regexr.com/6qj1h

Extract JSON from String using flutter dart

Hello, I want to extract JSON from the input string below.
I have tried the regex below in Java and it works fine:
private static final Pattern shortcode_media = Pattern.compile("\"shortcode_media\":(\\{.+\\})");
I want the equivalent regex for Dart.
Input String
<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script><script type="text/javascript">window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script><script type="text/javascript">
<script type="text/javascript">window.__initialDataLoaded(window._newData);</script><script type="text/javascript">window._newData('/p/B9fphP5gBeG/',{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script><script type="text/javascript">
(function(){
function normalizeError(err) {
var errorInfo = err.error || {};
var getConfigProp = function(propName, defaultValueIfNotTruthy) {
var propValue = window._sharedData && window._sharedData[propName];
return propValue ? propValue : defaultValueIfNotTruthy;
};
return {}
}
)
Expected json
{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}}
Note: There are multiple JSON strings in the input string; I need the JSON for the shortcode_media tag.

Please use:
void main() {

String json = '''
{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}
''';
RegExp regExp = new RegExp(
"\"shortcode_media\":(\\{.+\\})",
caseSensitive: false,
multiLine: false,
);
print(regExp.stringMatch(json).toString());
}
output
"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}
Dartpad

The corresponding Dart RegExp would be:
static final RegExp shortcodeMedia = RegExp(r'"shortcode_media":(\{.+\})');
It does not work, though. JSON is not a regular language, so you can't parse it using regular expressions.
The value of "shortcode_media" in your example JSON ends with several } characters. The RegExp will stop the match at the third of those, even though the second } is the one matching the leading {. If your JSON text contains any further values after the shortcode_media entry, those might be included as well.
Stopping at the first } would also be too short.
If someone reorders the JSON source code to the equivalent
"shortcode_media":{"dimensions":{"height":1326,"width":1080},"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}
(that is, putting the "dimensions" entry first), then you would only capture until the end of the dimensions block.
I would recommend either using a proper JSON parser, or at least improving the RegExp to be able to handle a single nested JSON object - since you seem to already know that it will happen.
Such a RegExp could be:
RegExp(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')
This RegExp will capture the correct number of braces for the example code, but still won't work if there are more nested JSON objects. Only a real parser can handle the general case correctly.
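
To illustrate the "use a real parser" recommendation, here is a minimal Python sketch that finds the key with a plain string search and then lets a JSON decoder find where the object ends. The helper name extract_object_after and the shortened sample text are assumptions made for the example; it also assumes the key's value is an object, as in the question.

```python
import json

def extract_object_after(text, key):
    """Decode the JSON object that follows "key": in text; return None if absent."""
    marker = '"%s":' % key
    start = text.find(marker)
    if start == -1:
        return None
    brace = text.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses one complete JSON value and ignores whatever follows it.
    value, _end = json.JSONDecoder().raw_decode(text, brace)
    return value

# Shortened sample in the shape of the question's input, with a trailing sibling entry.
page = (
    "window.__additionalDataLoaded('/p/B9fphP5gBeG/',"
    '{"graphql":{"shortcode_media":{"__typename":"GraphSidecar",'
    '"dimensions":{"height":1326,"width":1080}},"abc":{"def":"test"}}});</script>'
)
media = extract_object_after(page, "shortcode_media")
print(media["dimensions"]["width"])  # 1080
```

Because the decoder tracks nesting itself, the result does not depend on how deeply the object is nested or on the order of its entries.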

Return part of string using Regex

I'm looking for some regex to retrieve the GUID from the following URL
GetUploadedUserAudioId?friendlyName=eb0c5663-a9c3-4321-8c0e-5ffbfb3139fc
So far I have
GetUploadedUserAudioId\?friendlyName=([A-Fa-f0-9-]*)
but this returns the full URL.
This is the image of where the expressions are to give you an idea of what I am trying to do.

Don't use a regex for this.
Assuming you're using C#/.NET, use the static ParseQueryString() method of the System.Web.HttpUtility class, which returns a NameValueCollection.
Uri myUri = new Uri("http://www.example.com/GetUploadedUserAudioId?friendlyName=eb0c5663-a9c3-4321-8c0e-5ffbfb3139fc");
string friendlyName = HttpUtility.ParseQueryString(myUri.Query).Get("friendlyName");
Check this documentation
EDIT: If you want it as a Guid after that, then parse it into one:
var paramGuid = new Guid(friendlyName);
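
For readers outside .NET, the same "parse the query string, then convert" approach can be sketched in Python with the standard library. The function name friendly_name_guid is made up, and the host is a placeholder, since the question only shows the tail of the URL.

```python
import uuid
from urllib.parse import urlparse, parse_qs

def friendly_name_guid(url):
    """Return the friendlyName query parameter as a uuid.UUID, or None if missing."""
    values = parse_qs(urlparse(url).query).get("friendlyName")
    if not values:
        return None
    # uuid.UUID raises ValueError when the text is not a well-formed GUID.
    return uuid.UUID(values[0])

# The host is a placeholder; only the query string matters here.
url = "http://www.example.com/GetUploadedUserAudioId?friendlyName=eb0c5663-a9c3-4321-8c0e-5ffbfb3139fc"
print(friendly_name_guid(url))
```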

How to use regex in selenium locators

I'm using Selenium RC and I would like, for example, to get all the link elements whose href attribute matches:
http://[^/]*\d+com
I would like to use:
sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )
which would return a list of the name attribute of all the links that match the regex.
(or something like it)
thanks

The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question, how to use regex in Xpath locators. You need to use the regex matches() function, like this:
xpath=//div[matches(@id,'che.*boxes')]
(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')
Be aware, though, that the matches function is not supported by all native browser implementations of Xpath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).
If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript Xpath interpreter. It'll be slower, but it does seem to work with more Xpath functions, including 'matches' and 'ends-with'. :)

You can use the Selenium command getAllLinks to get an array of the ids of the links on the page, which you can then loop through, checking each href with getAttribute, which takes the locator followed by an @ and the attribute name. For example, in Java this might be:
String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();
for (String linkId : allLinks) {
String linkHref = selenium.getAttribute("id=" + linkId + "@href");
if (linkHref.matches("http://[^/]*\\d+.com")) {
matchingLinks.add(linkId);
}
}

A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. something like the following answer:
selenium: Is it possible to use the regexp in selenium locators

Here are some alternative methods for Selenium RC as well. These aren't pure Selenium solutions; they combine Selenium with your programming language's data structures.
You can also get the HTML page source, then run a regular expression over the source to return a match set of links. Use regex grouping to separate out URLs, link text/ID, etc., and you can then pass them back to Selenium to click on or navigate to.
Another method is to get the HTML page source or innerHTML (via DOM locators) of a parent/root element, then convert the HTML to XML as a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not) and obtain a nodeset of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.
Upon request, I'm providing examples below. They are in mixed languages since the post didn't appear to be language-specific anyway. I'm just using what I had available to hack together for examples. They aren't fully tested, or tested at all, but I've worked with bits of the code before in other projects, so these are proof-of-concept examples of how you'd implement the solutions I just mentioned.
//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:
http://jsoup.org/cookbook/extracting-data/dom-navigation
the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/

Selenium's By.Id and By.CssSelector methods do not support Regex and By.XPath only does where XPath 2.0 is enabled. If you want to use Regex, you can do something like this:
void MyCallingMethod(IWebDriver driver)
{
//Search by ID:
string attrName = "id";
//Regex = 'a number that is 1-10 digits long'
string attrRegex= "[0-9]{1,10}";
SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
List<IWebElement> elements = new List<IWebElement>();
//Allows spaces around equal sign. Ex: id = 55
string searchString = attrName +"\\s*=\\s*\"" + attrRegex +"\"";
//Search page source
MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
//iterate over matches
foreach (Match match in matches)
{
//Get exact attribute value
Match innerMatch = Regex.Match(match.Value, attrRegex);
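//Build the selector from the value that was actually matched, not from the regex itself
string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";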
//Find element by exact attribute value
elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
}
return elements;
}
Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.
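
One way to avoid the second regex pass is to let an HTML parser hand over the attribute values directly. The Python sketch below shows only that filtering step on a made-up page source; the class name AttributeRegexCollector is an assumption, and the result is a list of (tag, value) pairs rather than WebElements, so a locator would still have to be built from each value.

```python
import re
from html.parser import HTMLParser

class AttributeRegexCollector(HTMLParser):
    """Collect (tag, value) pairs for elements whose attribute fully matches a regex."""

    def __init__(self, attr_name, attr_regex):
        super().__init__()
        self.attr_name = attr_name
        self.pattern = re.compile(attr_regex)
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value, e.g. <input disabled>.
            if name == self.attr_name and value is not None and self.pattern.fullmatch(value):
                self.found.append((tag, value))

# Made-up page source standing in for the driver's page source.
page_source = '<div id="55">a</div><span id="main">b</span><a id="1234567890" href="#">c</a>'
collector = AttributeRegexCollector("id", r"[0-9]{1,10}")
collector.feed(page_source)
print(collector.found)  # [('div', '55'), ('a', '1234567890')]
```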

Regex to parse querystring values to named groups

I have HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!

In my case I had to parse a URL because the HttpUtility.ParseQueryString utility is not available in WP7. So, I created an extension method like this:
public static class UriExtensions
{
private static readonly Regex queryStringRegex;
static UriExtensions()
{
queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
}
public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
{
if (uri == null)
throw new ArgumentNullException("uri");
var matches = queryStringRegex.Matches(uri.OriginalString);
for (int i = 0; i < matches.Count; i++)
{
var match = matches[i];
yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
}
}
}
Then it's a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning an IEnumerable instead of a dictionary directly because I'm assuming that there might be duplicate parameter names. If there are duplicate names, ToDictionary will throw an exception.

Why use regex to split it out?
You could first extract the query string, split the result on &, and then create a map by splitting each of those pieces on =.
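
The split-based approach described above can be sketched in a few lines of Python. The function name parse_query is made up, the parameter names are taken from the usage example earlier in this thread, and no percent-decoding is attempted (the standard library's urllib.parse.parse_qs would handle that).

```python
def parse_query(url):
    """Split the query string on '&' and each pair on '=' into a dict."""
    _, _, query = url.partition("?")
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params[name] = value  # a repeated name keeps its last value
    return params

print(parse_query("file.aspx?userId=123&section=2"))  # {'userId': '123', 'section': '2'}
print(parse_query("file.aspx?section=5&userId=678"))  # {'section': '5', 'userId': '678'}
```

Since the result is keyed by name, the order of the parameters in the URL no longer matters.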

You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}

Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?

Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]

Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*([?&](userId=(?<user>\d+))|section=(?<section>\d+))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)

Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
@"(?=[^""<>]*?user=(?<user>\w+))" +
@"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
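
As a rough illustration of the lookahead idea in another flavor, here is a Python sketch. Python does not allow two groups with the same name, which rules out the .NET-style alternation, but captures made inside lookaheads are kept. The parameter names userId and section and the sample HTML are assumptions based on the usage shown elsewhere in this thread.

```python
import re

# Each lookahead scans forward from the '?' independently, so parameter order does not matter.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?\buserId=(?P<user>\w+))'
    r'(?=[^"<>]*?\bsection=(?P<section>\w+))'
)

# Hypothetical input with the two parameters in both orders.
html = (
    '<a href="file.aspx?userId=123&section=2">link</a> some text '
    '<a href="file.aspx?section=5&userId=678">link</a>'
)
for m in LINK.finditer(html):
    print(m.group("user"), m.group("section"))
```

Supporting a third parameter would only take one more lookahead line of the same shape.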

You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.

A simple Python implementation that sidesteps the ordering problem:
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]
