Xpath query returns no element found - regex

I'm querying Google Search for the number of results. The XPath query I'm using is //div[@id='resultStats'], which, to my understanding of the page's HTML:
<div id="resultStats">About 1,660,000,000 results<nobr> (0.65 seconds) </nobr></div>
should return the data within the div.
I've tried it with Importhtml(url,xpath) and with http://xpather.com/ (an XPath tester), and I get "Imported content is empty" and "no content found" respectively.
I was initially using importhtml and isolated the issue to the XPath by using xpather as an XPath tester, so I think I've narrowed the issue down a bit. Any help would be appreciated.

try:
=VALUE(REGEXREPLACE(MID(INDEX(IMPORTHTML(
"https://www.google.com/search?q="&A1&"";"table";1);4;2);6;23);"\D+";))

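For comparison, the XPath from the question can be checked offline against the quoted markup. This is a minimal Python sketch, assuming the div looks exactly like the snippet in the question; the sample string and the variable names are illustrative, not part of the original post. It only shows that the expression matches that snippet when the attribute test is written with @, and says nothing about what the live page actually returns.

```python
import re
import xml.etree.ElementTree as ET

# Markup copied from the question, wrapped in a root element (assumption)
sample = (
    '<body><div id="resultStats">About 1,660,000,000 results'
    "<nobr> (0.65 seconds) </nobr></div></body>"
)

root = ET.fromstring(sample)

# ElementTree supports a subset of XPath, including [@attr='value'] predicates
div = root.find(".//div[@id='resultStats']")

# div.text is the text before the first child element (<nobr>)
text = div.text

# Strip every non-digit, the same idea as REGEXREPLACE(...;"\D+";) in the answer
count = int(re.sub(r"\D+", "", text))
print(count)
```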
Related

Removing entire tags containing a specific term using regex

I am altering a database with approximately 500 html pages using phpmyadmin.
Several pages contain a Facebook Pixel or Google Tag that I would like to remove.
The easiest way I thought would be to search via regex the entire tag that contains some expression or term related to Facebook or Google, and replace it with blank.
An example would be
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXX');
</script>
or
<script>
(window, document, 'script', 'https://connect.facebook.net/en_US/fbevents.js');
fbq('init', '9999999999999999');
fbq('track', 'salespage_xxxxxx');
</script>
Although all are unique, some have the same code or another element that makes it possible to identify each one of them.
Before running it in phpMyAdmin, I'm trying to formulate the expression using Sublime Text 3.
This is my first contact with regex and I find it fascinating, but even after following some references I can't get the search to match.
The expression I came up with after some research was
<(.*)>[\s\S]face[\s\S]<\/(.*)>
I thought the expression would select the entire tag containing the word "face", but it doesn't find anything.
I would like some help.
If it works, I would be able to make several other necessary changes the same way.
This regex will match a <script> tag that contains the keyword face:
<(script)>(?:(?!<\/\1>|face)[\s\S])+face(?:(?!<\/\1>)[\s\S])+<\/\1>
See example: https://regex101.com/r/LfRlBV/1
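To try the pattern outside an editor, here is a small Python sketch that applies it with re.sub. The page string is a made-up test document built from the two snippets in the question, and the names SCRIPT_WITH_FACE, page and cleaned are illustrative.

```python
import re

# Pattern from the answer; "/" needs no escaping in a Python raw string
SCRIPT_WITH_FACE = re.compile(
    r"<(script)>(?:(?!</\1>|face)[\s\S])+face(?:(?!</\1>)[\s\S])+</\1>"
)

# Hypothetical page containing one Google tag and one Facebook pixel
page = """<p>keep me</p>
<script>
gtag('js', new Date());
gtag('config', 'G-XXXXXXXX');
</script>
<script>
(window, document, 'script', 'https://connect.facebook.net/en_US/fbevents.js');
fbq('init', '9999999999999999');
</script>
<p>end</p>"""

# Replace every matching <script>...</script> block with nothing
cleaned = SCRIPT_WITH_FACE.sub("", page)
print(cleaned)
```

The negative lookaheads stop each repetition at the first closing tag, so a match cannot start in one script block and end in another. Note that the pattern only matches a bare <script> opening tag with no attributes, and that [\s\S] without a quantifier, as in the attempt from the question, matches exactly one character.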

Specific xPath and Regex - Web Crawling

I'm currently in the process of trying to scrape a website. The problem is the information is placed on google maps in an iframe. Specifically, Latitude and Longitude.
I'm able to get all the other information I currently need except this. Searching around, and working with import.io tech support, I found I need to use a specific XPath and regex to pull this information, but the code I found on the site has me lost. Ideally I'd like to pull latitude and longitude separately. This is the code I have to work with.
What are my options? Thank you.
<div class="padding-listItem--sm">
<iframe width="100%" height="310" frameborder="0" allowfullscreen="" src="https://www.google.com/maps/embed/v1/place?q=33.3929503,-111.908652&key=AIzaSyDK08tC4NRubbIiw-xwDR1WEp-YAXX1Mx8" style="border:0"></iframe>
</div>
1) Get the src attribute of the iframe element.
String srcText = driver.findElement(By.tagName("iframe")).getAttribute("src");
2) Parse the url (found in srcText) for the latitude and longitude values.
Regex to find both numbers:
/([-]?\d+\.\d+)/g
when the url is as you specified:
https://www.google.com/maps/embed/v1/place?q=33.3929503,-111.908652&key=AIzaSyDK08tC4NRubbIiw-xwDR1WEp-YAXX1Mx8
The XPath to obtain the iframe source is:
//div[@class='padding-listItem--sm']/iframe/@src
Then you can apply a regex like this one to obtain latitude and longitude
/q=(-?[\d\.]*),(-?[\d\.]*)/g
Implementation online Here
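The second step can be sketched in Python as follows, assuming the src attribute has already been read and has the form shown in the question. The variable names are illustrative.

```python
import re

# src value as quoted in the question
src = (
    "https://www.google.com/maps/embed/v1/place"
    "?q=33.3929503,-111.908652&key=AIzaSyDK08tC4NRubbIiw-xwDR1WEp-YAXX1Mx8"
)

# Same idea as the regex above: two optionally negative decimals after "q="
match = re.search(r"q=(-?[\d.]*),(-?[\d.]*)", src)

# Group 1 is the latitude, group 2 the longitude
latitude = float(match.group(1))
longitude = float(match.group(2))
print(latitude, longitude)
```

Anchoring on q= keeps the two values in order and avoids picking up any other number that might appear elsewhere in the URL.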

Jmeter Regular Expression Extraction

Could someone help me get the value "1237857346" from the following HTML in JMeter, using a regex or any other way?
<select class="card_account" name="existing__account">
<option value="" disabled="disabled">Card Number</option>
<option value="1237857346" selected="selected">************4567</option>
</select>
A little bit of background: I am using JMeter and trying to extract the value "1237857346" to pass it in the next request.
It is not a very good idea to parse HTML using regular expressions, as evidenced by the famous Stack Overflow answer.
I would suggest switching to XPath Extractor instead. Add the XPath Extractor as a child of HTTP Request sampler which returns that select and configure it as follows:
XML Parsing Options: tick Use Tidy box. It may not be necessary but if your server response is not XML/XHTML compliant you'll get nothing
Reference Name: anything meaningful, i.e. value - it will be the name of the variable holding extracted data
XPath Expression: //select[@class='card_account']/option[@selected='selected']/@value - it will take
select having class = card_account
option with selected = "selected"
value attribute of the above option
and store it to "value" variable. You will be able to refer to it as ${value} where required.
See following material for further reference:
XPath Tutorial
XPath Language Specification
Using the XPath Extractor in JMeter
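Outside JMeter, the same selection logic can be checked with a short Python sketch. It assumes the response is well-formed (which is what Use Tidy is meant to help with) and wraps the select in a made-up <form> root. ElementTree only supports a subset of XPath and cannot return attributes directly, so the trailing /@value step is replaced by a call to get("value").

```python
import xml.etree.ElementTree as ET

# Markup from the question inside a hypothetical <form> root element
response = """<form>
<select class="card_account" name="existing__account">
<option value="" disabled="disabled">Card Number</option>
<option value="1237857346" selected="selected">************4567</option>
</select>
</form>"""

root = ET.fromstring(response)

# select with class = card_account, then the option with selected = "selected"
option = root.find(".//select[@class='card_account']/option[@selected='selected']")

# Equivalent of the final /@value step in the full XPath expression
value = option.get("value")
print(value)
```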
You can use the following regex:
<option[^<]*>Card Number</option>\s*<option[^<]*?value="(\d+)"
The value will be in group 1 ($1$), which is exactly what you need.
See demo
In case there are always 12 asterisks (which can be matched with \*{12}) in the <option> node value, you can use:
<option[^<]*value="(\d+)"[^<]*>\*{12}\d+</option>
See another demo.
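Both patterns can be tried against the sample markup with a few lines of Python. This is only a sketch: the response string is rebuilt from the question, with the masked card number generated as exactly 12 asterisks, and the variable names are illustrative.

```python
import re

# Sample response from the question; the mask is built as exactly 12 asterisks
response = (
    '<select class="card_account" name="existing__account">\n'
    '<option value="" disabled="disabled">Card Number</option>\n'
    '<option value="1237857346" selected="selected">' + "*" * 12 + "4567</option>\n"
    "</select>"
)

# First pattern: the option that follows the "Card Number" placeholder
after_placeholder = re.search(
    r'<option[^<]*>Card Number</option>\s*<option[^<]*?value="(\d+)"', response
).group(1)

# Second pattern: the option whose text is 12 asterisks followed by digits
by_mask = re.search(
    r'<option[^<]*value="(\d+)"[^<]*>\*{12}\d+</option>', response
).group(1)

print(after_placeholder, by_mask)
```

In the JMeter Regular Expression Extractor the captured group is referenced with the $1$ template, as noted above.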

How to properly use xpath & regexp extractor in jmeter?

I have the following text in the HTML response:
<input type="hidden" name="test" value="testValue">
I need to extract the value from the above input tag.
I've tried both regexp and xpath extractor, but neither is working for me:
regexp pattern
input\s*type="hidden"\s*name="test"\s*value="(.+)"\s*>
xpath query
//input[@name="test"]/@value
The above xpath gives an error at the Xpath Assertion Listener .. "No node matched".
I tried a lot and concluded that the XPath works only if I use it as //input[@name].
The moment I add an actual name, it gives the error "No node matched".
Could anyone please suggest how to resolve the above issue?
Please take a look at my previous answer :
https://stackoverflow.com/a/11452267/169277
The relevant part for you would be step 3:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String html = prev.getResponseDataAsString(); // get response from your sampler
Document doc = Jsoup.parse(html);
Element inputElement = doc.select("input[name=test]").first();
String inputValue = inputElement.attr("value");
vars.put("inputTextValue", inputValue);
Update
So that you don't get tangled up in the code, I've created a JMeter post-processor called Html Extractor. Here is the GitHub URL:
https://github.com/c0mrade/Html-Extractor
Since you are using the XPath Extractor to parse an HTML (not XML) response, ensure that the Use Tidy (tolerant parser) option is checked (in the XPath Extractor's control panel).
Your xpath query looks fine, check the option mentioned above and try again.
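The reason a tolerant parser matters here can be illustrated with a small Python sketch: the <input> tag from the question is never closed, so a strict XML parser rejects the document before any XPath is evaluated, while an HTML parser reads it without complaint. The surrounding <form> element and the HiddenInput class are made up for the illustration, and this is not how JMeter or Tidy work internally.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Input tag from the question inside a hypothetical <form>; <input> is not closed
html = '<form><input type="hidden" name="test" value="testValue"></form>'

# A strict XML parser fails on the unclosed <input> element
try:
    ET.fromstring(html)
    strict_ok = True
except ET.ParseError:
    strict_ok = False


class HiddenInput(HTMLParser):
    """Collects the value attribute of the first <input> with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == self.name and self.value is None:
            self.value = attributes.get("value")


# The tolerant HTML parser accepts the same markup
parser = HiddenInput("test")
parser.feed(html)
print(strict_ok, parser.value)
```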

The regular expression for finding the image url in <img> tag in HTML using VB .Net code

I want to extract the image URL from any website. I am reading the page source through WebRequest. I want a regular expression which will fetch the image URL from this content, i.e. the src value in the <img> tag.
I'd recommend using an HTML parser to read the html and pull the image tags out of it, as regexes don't mesh well with data structures like xml and html.
In C#: (from this SO question)
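using System;
using HtmlAgilityPack;

var web = new HtmlWeb();
var doc = web.Load("http://www.stackoverflow.com");
// select every <img> element that has a src attribute
var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
foreach (var node in nodes)
{
    // read the src attribute; "" is the fallback value
    Console.WriteLine(node.GetAttributeValue("src", ""));
}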
/(?:\"|')[^\\x22*<>|\\\\]+?\.(?:jpg|bmp|gif|png)(?:\"|')/i
is a decent one I have used before. This gets any reference to an image file within an html document. I didn't strip " or ' around the match, so you will need to do that.
Try this*:
<img .*?src=["']?([^'">]+)["']?.*?>
Tested here with:
<img class="test" src="/content/img/so/logo.png" alt="logo homepage">
Gives
$1 = /content/img/so/logo.png
The $1 (you have to mouseover the match to see it) corresponds to the part of the regex between (). How you access that value will depend on what implementation of regex you are using.
*If you want to know how this works, leave a comment
EDIT
As nearly always with regexp, there are edge cases:
<img title="src=hack" src="/content/img/so/logo.png" alt="logo homepage">
This would be matched as 'hack'.
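The edge case is easy to reproduce. The Python sketch below runs the regex on both sample tags and, for contrast, reads the src attribute with the standard library HTML parser. The ImgSrcCollector class and the variable names are illustrative and not taken from the answers.

```python
import re
from html.parser import HTMLParser

# Regex from the answer above
IMG_SRC = re.compile(r"""<img .*?src=["']?([^'">]+)["']?.*?>""")

plain = '<img class="test" src="/content/img/so/logo.png" alt="logo homepage">'
tricky = '<img title="src=hack" src="/content/img/so/logo.png" alt="logo homepage">'

# The regex stops at the first "src=" it sees, even inside another attribute
regex_plain = IMG_SRC.search(plain).group(1)
regex_tricky = IMG_SRC.search(tricky).group(1)


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


# A parser looks at real attributes, so the title text does not confuse it
collector = ImgSrcCollector()
collector.feed(tricky)
parsed_tricky = collector.sources
print(regex_plain, regex_tricky, parsed_tricky)
```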