I'm using "$Content.LimitWordCountXML" in my search results template as shown here, but it's showing results like "Attendance[file_link,id=214]" when the content is a link. How do I stop this so it just shows the link text and not the code / ID? Thanks
LimitWordCountXML is a function on the StringField class, not the HTMLText class. It acts as if the string variable it is working on is plain text, not HTML. Therefore it does not strip HTML.
We can use the HTMLText Summary function instead, which does strip HTML and accepts a word limit as the first parameter.
I want to find a RegEx that allows me to find a specific text between HTML table tags.
I have: This is a test text <tr><td>text inside table</td></tr> and I want the RegEx to return me just the second 'text' because it is inside the table.
I have tried <tr>(text)<\/tr> but it returns nothing.
It needs to be done with RegEx; it cannot be done with an HTML parser.
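Your <tr>(text)<\/tr> matches only the literal <tr>text</tr>, but in your input there is other markup and text around the word.
So you need <tr>.*?(text).*?<\/tr> instead, which lazily skips whatever sits between the tags and the word.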
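A minimal sketch of that pattern in JavaScript, run against the sample string from the question. The function name findTextInRow is made up for illustration, and the sketch assumes the row sits on a single line, since . does not match newlines by default.

```javascript
// Hypothetical helper: returns the RegExp match for the word "text"
// when it appears between <tr> and </tr>, or null if there is none.
function findTextInRow(input) {
  // .*? lazily skips anything (such as <td>) between the tags and the word
  return /<tr>.*?(text).*?<\/tr>/.exec(input);
}

const sample = 'This is a test text <tr><td>text inside table</td></tr>';
const rowMatch = findTextInRow(sample);

// rowMatch[1] is the captured word; rowMatch.index is where <tr> starts
console.log(rowMatch[1], rowMatch.index);
```

The "text" that appears before the table is ignored because the match can only begin at <tr>.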
I need to wrap numbers inside HTML tags without affecting attributes.
So far, all I could get is selecting what's inside a tag only, digits and non digital characters too :(
Here's the regular expression I'm using :
/([0-9]+(?:\.[0-9]*)?)/g
Here's the code at RegExr!
I'll be using jQuery to parse it. This is the closest I could get jsfiddle.
How to make this regular expression look only for numbers inside html tags?
Thanks for your help.
This matches 123 in <div>123</div> for example:
[0-9]+(?:\.[0-9]*)|(?<=^|>)\d+(?=<|$)
This regex was edited from the link you provided: http://regexr.com/?361gc
This selects only numbers within html tags. It also works on multi line text.
(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)
You can test it here.
But please be advised that <html> and <body> tags will match the pattern you asked for, so when you are running a complete html document through this regex, most or all numbers will be matching.
Testing on your code on jsfiddle I changed it to this:
$('body').each(function() {
    $(this).html(function(i, v) {
        return v.replace(/(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)/gim, '<span>$1</span>');
    });
});
So now it only runs on the elements of the body and not the whole document. Is that giving the expected result?
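If the lookaround-based pattern proves hard to tune, a different technique (not the one used above) is to match whole tags and numbers in a single alternation and only wrap the numbers. This is a rough sketch that assumes no attribute value contains a literal > character; the name wrapNumbers is hypothetical.

```javascript
// Hypothetical helper: wraps numbers found in text content with <span>,
// leaving anything inside a tag (including attribute values) untouched.
function wrapNumbers(html) {
  // Alternative 1 consumes an entire tag; alternative 2 matches a number.
  // Because tags are consumed first, digits inside them are never wrapped.
  return html.replace(/(<[^>]*>)|(\d+(?:\.\d+)?)/g, function (whole, tag, num) {
    return tag ? tag : '<span>' + num + '</span>';
  });
}

const wrapped = wrapNumbers('<div class="c1" data-n="42">Price 12.5 and 3</div>');
console.log(wrapped);
```

With jQuery this could be plugged into the same .html(function(i, v) { ... }) callback by returning wrapNumbers(v).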
I am a new user to JSoup. I want to extract the href value from the html.
For example:
String html = "<p>An <a href='http://exa'mple.com'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String linkHref = link.attr("href");
I am getting the output as "http://exa" , but I need the output as "http://exa'mple.com" (the raw text in href). link.outerHtml() is providing some different text.
I can't alter the HTML. HTML is the user's input.
Try this:
String html = "<p>An <a href='http://exa%27mple.com'><b>example</b></a> link.</p>";
I can't see how this will be possible, given that the jsoup parser will be expecting a ' to close the href argument and that's exactly what it gets. I think your only option is to pre-parse the string provided by the user, but even that will be tricky, as you'll have to come up with a rule to distinguish between "correct" and "incorrect" quote marks.
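One possible rule for such a pre-parse, sketched here in JavaScript (the same regex syntax is available in Java's java.util.regex): treat a single quote as the closing quote only when it is followed by whitespace, > or />. This is a heuristic and an assumption, not something jsoup provides; it will still cut the value short if the user's URL contains a quote followed by a space. The function name is hypothetical.

```javascript
// Hypothetical helper: pulls the raw value out of a single-quoted href,
// tolerating stray single quotes inside the value.
function extractSingleQuotedHref(html) {
  // Lazy match up to the first quote that is followed by whitespace, > or />
  const m = /href='(.*?)'(?=\s|\/?>)/.exec(html);
  return m ? m[1] : null;
}

const rawHref = extractSingleQuotedHref(
  "<p>An <a href='http://exa'mple.com'><b>example</b></a> link.</p>"
);
console.log(rawHref);
```

The extracted value could then be escaped (for example replacing ' with %27) before the HTML is handed to the parser.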
I would like to use YQL to extract textual Ids from particular paragraph divs on a webpage. The id is in the form "ApplicationRef:NNN". The basic query I use is along the lines of:
select content from html where url="..." and content matches "ApplicationRef*"
which returns the whole of the paragraph containing that text. However, I'd like to further process the result so that the query returns just the NNN part of each paragraph instead. Is this possible?
I realise that I can process the result from the YQL query further locally, but it would be neater if I can do all the processing in one go within the YQL query itself.
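For reference, the local post-processing fallback mentioned above could look roughly like this. It assumes the YQL result has already been reduced to an array of paragraph strings and that NNN is a run of digits; both are assumptions, and extractApplicationRefs is a made-up name.

```javascript
// Hypothetical helper: collects the NNN part of every "ApplicationRef:NNN"
// found in an array of paragraph strings.
function extractApplicationRefs(paragraphs) {
  const refs = [];
  for (const text of paragraphs) {
    // Assumption: NNN consists of digits only
    for (const m of text.matchAll(/ApplicationRef:\s*(\d+)/g)) {
      refs.push(m[1]);
    }
  }
  return refs;
}

const refs = extractApplicationRefs([
  'Some text ApplicationRef:123 more text',
  'No reference here',
  'ApplicationRef:45 and ApplicationRef:678'
]);
console.log(refs);
```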
thanks,
I want to match a keyword that is not linked. As the following example shows, I only want to match a google keyword that is neither between <a></a> nor inside a tag's attributes, so only the last google should match:
google is linked, google is not linked.
Do not parse HTML with regular expressions. HTML is not a regular language. Use an HTML parser.
This works for me (javascript):
var matches = str.match(/(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/);
See it in action
Provided you can be sure that your HTML is well behaved (and valid), especially does not contain comments or nested a tags, you can try
google(?!((?!<a[\s>]).)*</a>)
That matches any "google" that is not followed by a closing a tag before the next opening a tag. But you might be better off using an HTML parser instead.
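A short JavaScript sketch of that pattern. The sample markup is an assumption (the original example's link was not preserved), the helper name findUnlinked is hypothetical, and the caveats above still apply: well-formed HTML, no comments, no nested a tags, and each link on a single line.

```javascript
// Hypothetical helper: returns the start index of every "google" that is
// not followed by </a> before the next opening <a> tag.
function findUnlinked(html) {
  const pattern = /google(?!((?!<a[\s>]).)*<\/a>)/g;
  return Array.from(html.matchAll(pattern), function (m) { return m.index; });
}

// Assumed sample markup for illustration
const page = '<a href="http://google.com">google</a> is linked, google is not linked.';
const unlinked = findUnlinked(page);
console.log(unlinked);
```

Both the google inside the href attribute and the one used as link text are rejected, because each is followed by </a> with no new <a> opening in between.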