phpQuery finding the type of tag - phpquery

I need to be able to determine the type of tag i have selected with phpQuery.
So, if i have the reference of an element, how can i easily figure out its tag type?
In jquery/js tagName will suffice or prop('tagName')
But in phpQuery i cannot seem to find a straight forward function to do this..
$doc = phpQuery::newDocumentFilePHP($ftp_file['local_path']);
if(!pq('.clasToFind')->length) {
$tagType = pq('.clasToFind')->tagName;
}
Is the best answer regex the answer here?

tagName is a DomNode property. So when you iterate:
foreach(pq('.clasToFind') as $el){
echo $el->tagName;
}

See my answer here: How to find tag name using phpquery?
You need to call get to point to the first element of the collection, even if it has only one element. So, your code would be something like this:
$doc = phpQuery::newDocumentFilePHP($ftp_file['local_path']);
if($doc->find('.clasToFind')->length) {
$tagType = $doc->find('.clasToFind')->get(0)->tagName;
}

Related

Regex for getting content of a html property when another specific property doesn't exist

I struggle to find a solution for what is probably pretty simple, and despite I crawl a lot of questions, I can't manage to make it work.
Here are 2 HTML elements:
Test1
Test2
I want to get ONLY the content of the 1st element's href property (#content1). It must match because the html element contains no "onclick" property.
This regex works for matching the 1st element only:
^<a href="#"((?!onclick).)*$
but I can't figure out how to get the HREF content.
I've tried this:
^<a href="#(.*)"((?!onclick).)*$
but in this case, both elements are matching.
Thanks for your help !
I strongly suggest that you should do that in two steps. For one thing, parsing arbitrary html with a regexp is a notoriously slippery and winding road. For the other: there is no achievement in doing everything with one illegible regex.
And there's more to it: "contains no "onclick" attribute" is not the same as "href attribute is not directly followed by onclick attribute". So, a one-regex-solution would be either very complicated or very fragile (html tags have arbitrary attributes order).
var a = [
'Test1',
'Test2'
];
console.log(
a.filter(i => i.match(/onclick/i) == null)
.map(i => i.match(/href="([^"]+)"/i)[1]
)
This assumes that your href attribute values are valid and do not contain quotes (which is, of course, technically possible).
Regex is not made for this. JavaScript would work better. This code will store an array of the hrefs matching your requirements in the variable hrefArray.
var hrefArray = [];
for (var elem of document.getElementsByTagName('a')) {
if (elem.onclick) hrefArray.push(elem.href)
}
An example with your HTML is in the snippet below:
var hrefArray = [];
for (var elem of document.getElementsByTagName('a')) {
if (elem.onclick) hrefArray.push(elem.href)
}
console.log(hrefArray);
body {
background-color: gray;
}
Test1
Test2

XML Name space issue revisited

XML Name space issue revisited:
I am still not able to find a good solution to the problem that the findnode or findvalue does not work when we have xmlns has some value.
The moment I set manually xmlns="", it starts working. At least in my case. Now I need to automate this.
consider this
< root xmlns="something" >
--
---
< /root>
My recommended solution :
dynamically set the value to xmlns=""
and when the work is done automatically we can reset to the original value xmlns="something"
And this seems to be a working solution for my XMLs only but its stll manual.
I need to automate this:
How to do it 2 options:
using Perl regex, or
using proper LibXML setNamespace etc.
Please put your thought in this context.
You register the namespace. The point of XML is not having to kludge around with regexes!
Besides, it's easier: you create an XML::LibXML::XPathContext, register your namespaces, and use its find* calls with your chosen prefixes.
The following example is verbatim from a script of mine to list references in Visual Studio projects:
(...)
# namespace handling, see the XML::LibXML::Node documentation
my $xpc = new XML::LibXML::XPathContext;
$xpc->registerNs( 'msb',
'http://schemas.microsoft.com/developer/msbuild/2003' );
(...)
my $tree; eval { $tree = $parser->parse_file($projfile) };
(...)
my $root = $tree->getDocumentElement;
(...)
foreach my $attr ( find( '//msb:*/#Include', $root ) )
{
(...)
}
(...)
sub find { $xpc->find(#_)->get_nodelist; }
(...)
That's all it takes!
I only have one xmlns attribuite at the top of the XML once only so this works for me.
All I did was first to remove the namespace part i.e. remove the xmlns from my XML file.
NODE : for my $node ($conn->findnodes("//*[name()='root']")) {
my $att = $node->getAttribute('xmlns');
$node->setAttribute('xmlns', "");
last NODE;
}
using last just to make sure i come of the for loop in time.
And then once I am done with the XML parsing I will replace the
<root>
with
<root xmlns="something">
using simple Perl file operation or sed editor.

error with RegExp inside angular.toJson

I'm trying to do a 'like' query in mongodb. I see that's done with a regexp so I'm trying to set it like this:
$scope.clients = Client.query({
q:angular.toJson({
name: RegExp($routeParams.str)
})
});
The thing is angular.toJson function does not get any regexp:
http://plnkr.co/edit/idXMT1
Is there any other way to do that?
From my understanding of your question, you want your JSON object to contain the regular expression as a string?
In that case, you can manually convert it to a string in the declaration, i.e:
$scope.object = angular.toJson({
"param" : new RegExp(str).toString()
});
See updated plunk:
http://plnkr.co/edit/yLogI8
I finally found a solution.
The problem is using the toJon function as it does not allow any key starting with / or $ so the solution is avoiding the use of this function and just write the object as a string
$scope.clients = Client.query({
q:'{src:{$regex:"' + $routeParams.str + '"}}'
};

Raphael.js : how to unset Element's attr?

I have a link set to Element:
element.attr({href: 'http://google.com'});
Now I want to delete a link. I'm trying:
element.attr({href: false});
element.attr({href: null});
element.attr({href: ''});
But none of them works.
Even
delete element.attrs.href;
doesn't help.
How can I unset element's attribute?
you should be aware that (using Raphaël API,) .attr() does not affect the underlying DOM element, but merely attaches a property to the Raphaël object.
if you wish to address the actual node's href attribute you should either use:
element.node.href = 'http://google.com';
or:
element.node.setAttribute('href', 'http://google.com');
Check out 'Element.node' on the Raphaël's documentation.

How to use regex in selenium locators

I'm using selenium RC and I would like, for example, to get all the links elements with attribute href that match:
http://[^/]*\d+com
I would like to use:
sel.get_attribute( '//a[regx:match(#href, "http://[^/]*\d+.com")]/#name' )
which would return a list of the name attribute of all the links that match the regex.
(or something like it)
thanks
The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question, how to use regex in Xpath locators. You need to use the regex matches() function, like this:
xpath=//div[matches(#id,'che.*boxes')]
(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')
Be aware, though, that the matches function is not supported by all native browser implementations of Xpath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).
If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript Xpath interpreter. It'll be slower, but it does seem to work with more Xpath functions, including 'matches' and 'ends-with'. :)
You can use the Selenium command getAllLinks to get an array of the ids of links on the page, which you could then loop through and check the href using the getAttribute, which takes the locator followed by an # and the attribute name. For example in Java this might be:
String[] allLinks = session().getAllLinks();
List<String> matchingLinks = new ArrayList<String>();
for (String linkId : allLinks) {
String linkHref = selenium.getAttribute("id=" + linkId + "#href");
if (linkHref.matches("http://[^/]*\\d+.com")) {
matchingLinks.add(link);
}
}
A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. something like the following answer:
selenium: Is it possible to use the regexp in selenium locators
Here's some alternate methods as well for Selenium RC. These aren't pure Selenium solutions, they allow interaction with your programming language data structures and Selenium.
You can also get get HTML page source, then regular expression the source to return a match set of links. Use regex grouping to separate out URLs, link text/ID, etc. and you can then pass them back to selenium to click on or navigate to.
Another method is get HTML page source or innerHTML (via DOM locators) of a parent/root element then convert the HTML to XML as DOM object in your programming language. You can then traverse the DOM with desired XPath (with regular expression or not), and obtain a nodeset of only the links of interest. From their parse out the link text/ID or URL and you can pass back to selenium to click on or navigate to.
Upon request, I'm providing examples below. It's mixed languages since the post didn't appear to be language specific anyways. I'm just using what I had available to hack together for examples. They aren't fully tested or tested at all, but I've worked with bits of the code before in other projects, so these are proof of concept code examples of how you'd implement the solutions I just mentioned.
//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:
http://jsoup.org/cookbook/extracting-data/dom-navigation
the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
Selenium's By.Id and By.CssSelector methods do not support Regex and By.XPath only does where XPath 2.0 is enabled. If you want to use Regex, you can do something like this:
void MyCallingMethod(IWebDriver driver)
{
//Search by ID:
string attrName = "id";
//Regex = 'a number that is 1-10 digits long'
string attrRegex= "[0-9]{1,10}";
SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
List<IWebElement> elements = new List<IWebElement>();
//Allows spaces around equal sign. Ex: id = 55
string searchString = attrName +"\\s*=\\s*\"" + attrRegex +"\"";
//Search page source
MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
//iterate over matches
foreach (Match match in matches)
{
//Get exact attribute value
Match innerMatch = Regex.Match(match.Value, attrRegex);
cssSelector = "[" + attrName + "=" + attrRegex + "]";
//Find element by exact attribute value
elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
}
return elements;
}
Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.