jSoup - How to get elements with background style (inline CSS)? - regex

I'm building an app in Railo, which uses the jSoup .jar library. It all works really well in my CFML language.
Anyhow, I can grab every element with a "style" attribute doing:
<cfset variables.mySelection = variables.myDocument.select("*[style]") />
But this returns an array which contains elements that sometimes do not have a "background" or "background-image" style on them. As an example, the HTML might looks like so:
<p style="color: red;">I should not be selected</p>
<p style="background: green">I **should** be selected</p>
<p style="text-align: left;">I should not be selected</p>
<p style="background-image: url("/path/to/image.jpg");">I **should** be selected</p>
So I can get these elements above, but I don't want the 1st and 3rd in my array, as they don't have a background style...do you know how I can only grab and work with these?
Please note, I'm not after a COMPUTATED style, or anything that complicated, I'm just wondering if I can filter based on the properties of an inline CSS style. Perhaps some regex after the fact? I'm open to ideas!
I tried messing with :contains(background) as a key word, but I wasn't sure if that was the correct path?
Many thanks for your help.
Michael.

Try with:
variables.myDocument.select("*[style*='background']")
As *= is the standard selector to match a substring in the attribute content.

Elements els = doc.select(div[style*=dashed]);
Or
Elements elements = doc1.select("span[style*=font-weight:bold]");

Related

Can't read the XML node elements in ColdFusion

I'm trying to read some values from the XML file which I created, but it gives me the following error:
coldfusion.runtime.UndefinedElementException: Element MYXML.UPLOAD is undefined in XMLDOC.
Here is my code
<cffile action="read" file="#expandPath("./config.xml")#" variable="configuration" />
<cfset xmldoc = XmlParse(configuration) />
<div class="row"><cfoutput>#xmldoc.myxml.upload-file.size#</cfoutput></div>
Here is my config.xml
<myxml>
<upload-file>
<size>15</size>
<accepted-format>pdf</accepted-format>
</upload-file>
</myxml>
Can someone help me to figure out what is the error?
When I am printing the entire variable as <div class="row"><cfoutput>#xmldoc#</cfoutput></div> it is showing the values as
15 pdf
The problem is the hyphen - contained in the <upload-file> name within your XML. If you are in control of the XML contents the easiest fix will be to not use hyphens in your field names. If you cannot control the XML contents then you will need to do more to get around this issue.
Ben Nadel has a pretty good blog article in the topic - Accessing XML Nodes Having Names That Contain Dashes In ColdFusion
From that article:
To get ColdFusion to see the dash as part of the node name, we have to "escape" it, for lack of a better term. To do so, we either have to use array notation and define the node name as a quoted string; or, we have to use xmlSearch() where we can deal directly with the underlying document object model.
He goes on to give examples. As he states in that article, you can either quote the node name to access the data. Like...
<div class="row">
<cfoutput>#xmldoc.myxml["upload-file"].size#</cfoutput>
</div>
Or you can use the xmlSearch() function to parse the data for you. Note that this will return an array of the data. Like...
<cfset xmlarray = xmlSearch(xmldoc,"/myxml/upload-file/")>
<div class="row">
<cfoutput>#xmlarray[1].size#</cfoutput>
</div>
Both of these examples will output 15.
I created a gist for you to see these examples as well.

Regex for HTML RESPONSE BODY present under div tag

I need to build a regex for extracting the value present under value field.
i.e "f70a8c3d0a6cbe2e235c7fd1dd27d052df7412ea"
HTML RESPONSE BODY :
Note: I have pasted just a minor part of the response....but formToken key is unique
<div class="hidden">
<input name="formToken type="hidden"
value="f70a8c3d0a6cbe2e235c7fd1dd27d052df7412ea"
/>
</div>
I wrote the below regex but it returned nothing:
regex("formToken" type="hidden" value="([^"]*)"/>).find(0).exists, found nothing
Can you try this?
regex("type="hidden".*value="(.*?)[ \t]*"/>).find(0).exists
Instead of a regex, you could use a css selector check which is probably way easier once you have ids or css classes to search for.
Thank you all....I was able to get formToken using css
.check(css("input[name='formToken']", "value").saveAs("formTokex"))
Works like this for me:
.exec(http("request_1")
.get("<<<<YOUR_URL>>>>>")
.check(css("form[name='signInForm']", "action").saveAs("urlPath"))
and later printing it:
println(session( "urlPath" ).as[String])

How to filter the html markups when render a template with jinja2?

Now I'm biulding a django project with jinja2 dealing with templates. Some page contents are submited by the client with wysiwy editor, and thing's going fine with the detail pages.
But the list pages are wrong with the slice of the contents.
My code:
<div class="summary ">
<div class="content">{{ question.content[:200]|e}}...</div>
</div>
But the output is:
<p>what i want to show here is raw text without markups</p>...
The expected result is that the html markups like <p></p> <section>.... are gone (filtered or eliminated) and only the raw text shows!
So how can I fix it? Thanks in advance!
Use striptags filter:
striptags(value)
Strip SGML/XML tags and replace adjacent whitespace
by one space.
<div class="content">{{ question.content|striptags}}...</div>
Jinja2 striptags filter test will also help you to understand how it works.
Hope that helps.

How to write this Regex

HTML:
<dt>
<a href="#profile-experience" >Past</a>
</dt>
<dd>
<ul class="past">
<li>
President, CEO & Founder <span class="at">at</span> China Connection
</li>
<li>
Professional Speaker and Trainer <span class="at">at</span> Edgemont Enterprises
</li>
<li>
Nurse & Clinic Manager <span class="at">at</span> <span>USAF</span>
</li>
</ul>
</dd>​​​​​
I want match the <li> node.
I write the Regex:
<dt>.+?Past+?</dt>\s+?<dd>\s+?<ul class=""past"">\s+?(?:<li>\s*?([\W\w]+?)+?\s*?</li>)+\s+?</ul>
In fact they do not work.
No not parse HTML using a regex like it's just a big pile of text. Using a DOM parser is a proper way.
Don't use regular expressions to parse HTML...
Don't use a regular expression to match an html document. It is better to parse it as a DOM tree using a simple state machine instead.
I'm assuming you're trying to get html list items. Since you're not specifying what language you use here's a little pseudo code to get you going:
Pseudo code:
while (iterating through the text)
if (<li> matched)
find position to </li>
put the substring between <li> to </li> to a variable
There are of course numerous third-party libraries that do this sort of thing. Depending on your development environment, you might have a function that does this already (e.g. javascript).
Which language do you use?
If you use Python, you should try lxml: http://lxml.de. With lxml, you can search for the node with tag ul and class "past". You then retrieve its children, which are li, and get text of those nodes.
If you are trying to extract from or manipulate this HTML, xPath, xsl, or CSS selectors in jQuery might be easier and more maintainable than a regex. What exactly is your goal and in what framework are you operating?
please learn to use jQuery for this sort of thing

Find ColdFusion Generated ID

Is there a way to find the elements generated by ColdFusion's <CFLayout> and <CFLayoutArea> tags?
These tags:
<cflayout type="tab" name="MyAccount">
<cflayoutarea name="OrderStatus" title="P" source="/o.cfm" />
Generate this code:
<td id="ext-gen31" style="width: 174px;">
<a id="ext-gen28" class="x-tabs-right" href="#">
<span class="x-tabs-left">
<em id="ext-gen29" class="x-tabs-inner" style="width: 154px;">
<span id="ext-gen30" class="x-tabs-text" title="P" unselectable="on" style="width: 154px;">
I want to update the title information in the id of ext-gen30 but don't know what that name is going to be or how to find it.
It doesn't directly answer your question, but if your goal is to get a reference to the DOM objects in order to set their properties (and it sounds like that's the case), then the documentation suggests that you should be able to use the function ColdFusion.Layout.getTabLayout() to get a reference to the Ext layout object, and then manipulate that however you'd like.