How to extract part of url - dart/flutter - regex

I'm trying to extract part of a URL. More specifically, I want the value of the page_info parameter in the URL that is marked rel="next".
String testUrl = "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9>; rel='previous', <https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel='next'";
List<String> splitUrl = testUrl.split("=");
print(splitUrl[5]);
// this is what it prints out
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel
// this is what I'm trying to extract
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0
// value for rel="next"
I tried splitting the string with String.split, but the result also includes the trailing angle bracket and the rel part. I want only the page_info parameter value for rel="next".
I know this probably calls for a regex, but I'm not very good with them. Any help would be appreciated.
I took this string from a response header of a paginated REST API. It contains two page_info parameters, one for the previous page and one for the next page, and I need the value for the next page.
Thank you.

An alternative approach is to use Uri.parse to parse the URL:
void main() {
String testUrl = "<https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9>; rel='previous', <https://demo.myshopify.com/admin/api/2022-01/products.json?limit=10&page_info=eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0>; rel='next'";
// Extract just the URL that is tagged rel='next'.
var match = RegExp(r"<([^>]*)>;\s*rel='next'").firstMatch(testUrl);
if (match != null) {
var uri = Uri.parse(match.group(1)!);
print(uri.queryParameters['page_info']); // Prints: eyJkaXJlY3Rpb24iOiJ...
}
}
Note that the above wouldn't need any of the RegExp code if testUrl were a proper URL without the angle brackets and rel='next' junk.

The regex pattern page_info=([\w]+)
matches both values, in order; the second match is the one for rel='next':
eyJkaXJlY3Rpb24iOiJwcmV2IiwibGFzdF9pZCI6NjczMDU4MDcyMTc1NCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBCcmFjZWxldCJ9
eyJkaXJlY3Rpb24iOiJuZXh0IiwibGFzdF9pZCI6NjczMDIyNzcxMjA5MCwibGFzdF92YWx1ZSI6IjE4SyBHb2xkIFBsYXRlZCBIZWFydCBQZW5kYW50IE5lY2tsYWNlIn0
https://regexr.com/6qj1h
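For comparison, here is a minimal JavaScript sketch that combines both answers: it picks the entry tagged with the wanted rel value and then lets a URL parser read the query string. The function name pageInfoFor is made up for illustration, and the input is assumed to have the same "<url>; rel=..." list format as the string in the question.

```javascript
// Hypothetical helper: returns the page_info value of the entry whose rel
// equals `rel`, or null when no such entry exists.
function pageInfoFor(linkHeader, rel) {
  // One entry looks like: <url>; rel='next' (single, double or no quotes).
  const entry = /<([^>]*)>\s*;\s*rel=["']?([^"',;\s]+)["']?/g;
  for (const m of linkHeader.matchAll(entry)) {
    if (m[2] === rel) {
      // Let the URL parser handle the query string instead of splitting on "=".
      return new URL(m[1]).searchParams.get("page_info");
    }
  }
  return null;
}
```

Calling pageInfoFor(testUrl, "next") on the string from the question should return only the second token, without the angle bracket or the rel part.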

Related

remove "?show=false" using regex [duplicate]

I'm looking for a regular expression to use in my JavaScript code that gives me the last part of a URL without the query parameters, if there are any. Here are two examples, with and without parameters:
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg
In both cases as result I want to get:
14238253_132683573850463_7287992614234853254_n.jpg
Here is the regexp:
.*\/([^?]+)
and the JS code:
let lastUrlPart = url => /.*\/([^?]+)/.exec(url)[1];
// TEST
let t1 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC"
let t2 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg"
console.log(lastUrlPart(t1));
console.log(lastUrlPart(t2));
Are there better alternatives?
You could always try doing it without regex. Split the URL by "/" and then parse out the last part of the URL.
var urlPart = url.split("/");
var img = urlPart[urlPart.length-1].split("?")[0];
That should get everything after the last "/" and before the first "?".
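Another regex-free option, sketched here under the assumption that the input is always an absolute URL, is to let the built-in URL class separate the path from the query string. The function name lastPathSegment is only illustrative.

```javascript
// Returns the text after the last "/" of the path, ignoring query and hash.
// Assumes `url` is absolute; new URL() throws on relative input.
function lastPathSegment(url) {
  const { pathname } = new URL(url);
  return pathname.split("/").pop();
}

const t1 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC";
console.log(lastPathSegment(t1)); // 14238253_132683573850463_7287992614234853254_n.jpg
```

Unlike splitting on "/" first, this is not confused by a "/" that appears inside the query string.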

How to search and replace from a SafeHtml variable in Angular?

I have a very simple question.
I have a sanitized string whose type in Angular is SafeHtml.
What would be the best approach to search and replace some HTML inside this SafeHtml variable?
...
const sanitzedHtml: SafeHtml = this.sanitizer.bypassSecurityTrustHtml(changes.pureHtml.currentValue);
...
My goal is to replace some strings with extra HTML code, so ideally the search would cover only the content of the HTML nodes, not the whole markup.
Is there a faster way than converting the SafeHtml variable back into a string and applying a basic replace with a RegExp?
Thanks
Change the HTML code before sanitizing it
1 - Using regex
You can apply a regex to your HTML string and then sanitize the result.
let html = "<div>myHtml</div>";
const regex = /myRegexPattern/i;
html = html.replace(regex, 'Replacement html part');
2 - Using DocumentFragment
You can also create a fragment from your HTML, modify what you want in it, and serialize it back to a string before calling your sanitize function.
let str: string = "<div id='test'>myHtml</div>";
const sanitzedHtml: SafeHtml = this.sanitizer.bypassSecurityTrustHtml(changeMyHtml(str));
function changeMyHtml(htmlString: string): string {
    // Parse the string into a DocumentFragment
    const fragment = document.createRange().createContextualFragment(htmlString);
    // Do what you need to do here, for example:
    const target = fragment.getElementById('test');
    if (target) {
        target.innerHTML = "myHtmlTest";
    }
    // Then return a string of the modified HTML
    const serializer = new XMLSerializer();
    return serializer.serializeToString(fragment);
}
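If the goal is to replace only in the text between tags, a rough string-based sketch is to split the markup on tags and run the replace on the remaining pieces. This is not an HTML parser: it assumes no ">" inside attribute values and no script, style or comment blocks. The name replaceInTextOnly is made up for this example.

```javascript
// Applies `pattern` -> `replacement` only to the text between tags,
// leaving tag names and attributes untouched.
function replaceInTextOnly(html, pattern, replacement) {
  return html
    .split(/(<[^>]*>)/) // the capturing group keeps the tags in the array
    .map(part => (part.startsWith("<") ? part : part.replace(pattern, replacement)))
    .join("");
}

console.log(replaceInTextOnly('<div class="test">test</div>', /test/g, "<b>test</b>"));
// <div class="test"><b>test</b></div>
```

The result would still be passed to bypassSecurityTrustHtml afterwards, exactly as in the two options above.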

actionscript find and convert text to url [duplicate]

Closed 10 years ago.
Possible Duplicate:
How do I linkify text using ActionScript 3
I have this script that grabs a Twitter feed and displays it in a little widget. What I want to do is scan the text for URLs and convert each URL to a link.
public class Main extends MovieClip
{
private var twitterXML:XML; // This holds the xml data
public function Main()
{
// This is Untold Entertainment's Twitter id. Did you grab yours?
var myTwitterID= "username";
// Fire the loadTwitterXML method, passing it the url to your Twitter info:
loadTwitterXML("http://twitter.com/statuses/user_timeline/" + myTwitterID + ".xml");
}
private function loadTwitterXML(URL:String):void
{
var urlLoader:URLLoader = new URLLoader();
// When all the junk has been pulled in from the url, we'll fire finishedLoadingXML:
urlLoader.addEventListener(Event.COMPLETE, finishLoadingXML);
urlLoader.load(new URLRequest(URL));
}
private function finishLoadingXML(e:Event = null):void
{
// All the junk has been pulled in from the xml! Hooray!
// Remove the eventListener as a bit of housecleaning:
e.target.removeEventListener(Event.COMPLETE, finishLoadingXML);
// Populate the xml object with the xml data:
twitterXML = new XML(e.target.data);
showTwitterStatus();
}
private function addTextToField(text:String,field:TextField):void{
/* Regular expression for the replacement. g: replace all matches, i: case-insensitive.
Finds all strings starting with "http://", "https://", "ftp://" or "file://", followed by
any number of characters that are neither a space nor a newline. */
var reg:RegExp=/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
// Wrap each match in an anchor tag. Note: "$&" stands for the matched string.
text = text.replace(reg, "<a href='$&'>$&</a>");
field.htmlText = text;
}
private function showTwitterStatus():void
{
// Uncomment this line if you want to see all the fun stuff Twitter sends you:
//trace(twitterXML);
// Prep the text field to hold our latest Twitter update:
twitter_txt.wordWrap = true;
twitter_txt.autoSize = TextFieldAutoSize.LEFT;
// Populate the text field with the first element in the status.text nodes:
addTextToField(twitterXML.status.text[0], twitter_txt);
}
}
If this
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
is your regexp for converting text to URLs, then I have some remarks.
First of all, almost all characters inside a character class are treated literally.
So here
[-A-Z0-9+&@#\/%?=~_|!:,.;]
you are asking for any one of these characters, and only the / needs escaping.
A simple regexp for finding URLs would look similar to this:
/\s((https?|ftp|file):\/\/)?([-a-z0-9_.:])+(\?[-a-z0-9%_?&.])?(\s+|$)/ig
I'm not sure it will handle the URL boundaries correctly, but \b matches at any word boundary, including next to a dot, so I think \s (space or line break) suits better.
I'm also not sure about the ending (does ActionScript allow the end-of-string anchor anywhere other than the end of the regexp?).
And, of course, you have to tune it to suit your data.
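To illustrate the replacement step itself, here is a small sketch in JavaScript rather than ActionScript, reusing the character classes from the question. The function name linkify is illustrative, and the sketch does not HTML-escape the URL, which real code would need to do.

```javascript
// Wraps every URL found in `text` in an anchor tag.
// "$&" in the replacement string stands for the whole matched URL.
function linkify(text) {
  const urlPattern =
    /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;
  return text.replace(urlPattern, '<a href="$&">$&</a>');
}

console.log(linkify("See http://example.com/a?b=1, ok"));
// See <a href="http://example.com/a?b=1">http://example.com/a?b=1</a>, ok
```

The second character class leaves out punctuation such as "," and ".", so a comma or full stop right after the URL stays outside the link. Note that replace returns a new string, so the result has to be used or assigned.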

Regex to replace domain within links that are not images

I need to replace the domain name in all links on the page that do not point to images or PDF files.
The input is a full HTML page received through a proxy service.
Example:
test<img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf
test1
Result:
test<img src="http://www.test.com" /><a href="http://www.test.com/test.pdf">pdf
test1
If you are using .NET, I strongly suggest using HTML Agility Pack.
Parsing HTML directly with regex can be very error-prone. This question is also similar to the post below:
What regex should I use to remove links from HTML code in C#?
If the domain is http://www.example.com, the following should do the trick:
/http:\/\/www\.example\.com(?!\S*\.(?:pdf|jpg|png|gif)\b)\S*/
This uses a negative lookahead, placed right after the domain, to ensure that the regex matches a URL only if the rest of it does not contain a .pdf, .jpg, .png or .gif extension.
If none of your pdf urls have query parameters (like a.pdf?asd=12), the following code will work. It replaces only absolute and root-relative urls.
var links = document.getElementsByTagName("a");
var len = links.length;
var newDomain = "http://mydomain.com";
/**
* Match absolute urls (starting with http)
* and root relative urls (starting with a `/`)
* Does not match relative urls like "subfolder/anotherpage.html"
* */
var regex = new RegExp("^(?:https?://[^/]+)?(/.*)$", "i");
//uncomment next line if you want to replace only absolute urls
//regex = new RegExp("^https?://[^/]+(/.*)$", "i");
for(var i = 0; i < len; i++)
{
var link = links.item(i);
var href = link.getAttribute("href");
if(!href) //in case of named anchors
continue;
if(href.match(/\.pdf$/i)) //if pdf
continue;
href = href.replace(regex, newDomain + "$1");
link.setAttribute("href", href);
}
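If some PDF URLs do carry query parameters, one possible variation is to parse each href with the built-in URL class and test only the path. This sketch is an assumption on top of the answer above: the function name replaceDomain and the extension list are illustrative, and it handles absolute http(s) URLs only.

```javascript
// Extensions whose links should keep their original domain (illustrative list).
const SKIPPED_EXTENSIONS = /\.(?:pdf|jpe?g|png|gif)$/i;

// Returns `href` with its origin swapped for `newOrigin`, or `href` unchanged
// when it is relative, not http(s), or points to a skipped file type.
function replaceDomain(href, newOrigin) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return href; // relative or malformed: leave untouched
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return href;
  // Only the path is tested, so "a.pdf?asd=12" is still recognized as a PDF.
  if (SKIPPED_EXTENSIONS.test(url.pathname)) return href;
  return newOrigin + url.pathname + url.search + url.hash;
}
```

Inside the loop above this would replace the two regex steps: link.setAttribute("href", replaceDomain(href, newDomain)).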

How to use regex in selenium locators

I'm using Selenium RC and I would like, for example, to get all the link elements whose href attribute matches:
http://[^/]*\d+.com
I would like to use:
sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )
which would return a list of the name attribute of all the links that match the regex.
(or something like it)
thanks
The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question, how to use regex in Xpath locators. You need to use the regex matches() function, like this:
xpath=//div[matches(@id,'che.*boxes')]
(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')
Be aware, though, that the matches function is not supported by all native browser implementations of Xpath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).
If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript Xpath interpreter. It'll be slower, but it does seem to work with more Xpath functions, including 'matches' and 'ends-with'. :)
You can use the Selenium command getAllLinks to get an array of the ids of the links on the page. You can then loop through them and check each href using getAttribute, which takes the locator followed by an @ and the attribute name. For example, in Java this might be:
String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();
for (String linkId : allLinks) {
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    if (linkHref.matches("http://[^/]*\\d+.com")) {
        matchingLinks.add(linkId);
    }
}
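The filtering step on its own, independent of any Selenium API, can be sketched in JavaScript as follows. It assumes the links have already been collected into plain objects with name and href fields; that shape and the function name namesOfMatchingLinks are hypothetical.

```javascript
// Returns the `name` of every link whose `href` matches `pattern`.
// Pass a pattern without the g flag so that test() stays stateless.
function namesOfMatchingLinks(links, pattern) {
  return links
    .filter(link => pattern.test(link.href))
    .map(link => link.name);
}

const links = [
  { name: "first", href: "http://site123.com/page" },
  { name: "second", href: "http://site.com/" },
];
console.log(namesOfMatchingLinks(links, /^http:\/\/[^\/]*\d+\.com/)); // [ 'first' ]
```

Here the dot before "com" is escaped so that it matches only a literal dot, which is probably what the pattern in the question intends.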
A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. something like the following answer:
selenium: Is it possible to use the regexp in selenium locators
Here are some alternative methods for Selenium RC as well. These aren't pure Selenium solutions; they combine Selenium with the data structures of your programming language.
You can also get the HTML page source and then run a regular expression over it to return a set of matching links. Use regex grouping to separate out URLs, link text/ID, etc., and you can then pass them back to Selenium to click on or navigate to.
Another method is to get the HTML page source, or the innerHTML (via DOM locators) of a parent/root element, and then convert the HTML to XML as a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not) and obtain a node set of only the links of interest. From there, parse out the link text/ID or URL, and you can pass it back to Selenium to click on or navigate to.
Upon request, I'm providing examples below. They are in mixed languages, since the post didn't appear to be language-specific anyway. I'm just using what I had available to hack the examples together. They are not fully tested, or tested at all, but I've worked with bits of the code before in other projects, so treat them as proof-of-concept examples of how you'd implement the solutions I just mentioned.
//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a[^>]+href=\"([^\"]+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using the jsoup HTML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:
http://jsoup.org/cookbook/extracting-data/dom-navigation
the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
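The page-source approach can also be sketched in plain JavaScript, assuming the page source is already available as a string (how it is fetched from Selenium is left out on purpose). The function name extractHrefs is illustrative, and like any regex over HTML it only handles simple, double-quoted href attributes.

```javascript
// Collects the href value of every <a ...> tag in an HTML string.
function extractHrefs(html) {
  // <a\b avoids matching other tags that merely start with "a", e.g. <abbr>.
  const anchor = /<a\b[^>]*\bhref\s*=\s*"([^"]*)"/gi;
  return Array.from(html.matchAll(anchor), m => m[1]);
}

const source = '<a href="http://a1.com">x</a> <p>t</p> <a class="k" href="/b">y</a>';
console.log(extractHrefs(source)); // [ 'http://a1.com', '/b' ]
```

The resulting URLs can then be filtered with a second regular expression before they are handed back to Selenium.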
Selenium's By.Id and By.CssSelector methods do not support Regex and By.XPath only does where XPath 2.0 is enabled. If you want to use Regex, you can do something like this:
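// Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using OpenQA.Selenium;
void MyCallingMethod(IWebDriver driver)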
{
//Search by ID:
string attrName = "id";
//Regex = 'a number that is 1-10 digits long'
string attrRegex= "[0-9]{1,10}";
SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();
    //Allows spaces around equal sign. Ex: id = 55
    string searchString = attrName + "\\s*=\\s*\"" + attrRegex + "\"";
    //Search page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //iterate over matches
    foreach (Match match in matches)
    {
        //Get exact attribute value
        Match innerMatch = Regex.Match(match.Value, attrRegex);
        //Quote the value so that values starting with a digit form a valid selector
        string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";
        //Find element by exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }
    return elements;
}
Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.