SilverStripe 3: Search Results template showing unwanted code

I'm using "$Content.LimitWordCountXML" in my search results template as shown here, but it's showing results like "Attendance[file_link,id=214]" when the content is a link. How do I stop it so that it just shows the link text and not the code / ID? Thanks

LimitWordCountXML is a function on the StringField class, not the HTMLText class. It treats the value it is working on as plain text rather than HTML, so it does not strip HTML.
Use the HTMLText Summary function instead, which does strip HTML and accepts a word limit as its first parameter, for example "$Content.Summary(30)" (30 is an arbitrary limit chosen for illustration).

Related

How to find a specific text between HTML table tags with RegEx

I want to find a RegEx that allows me to find a specific text between HTML table tags.
I have: This is a test text <tr><td>text inside table</td></tr> and I want the RegEx to return just the second 'text', because it is inside the table.
I have tried <tr>(text)<\/tr> but it returns nothing.
It needs to be done with RegEx; it cannot be done with an HTML parser.

Your <tr>(text)<\/tr> matches only <tr>text</tr>, but you have other text around.
So you need <tr>.*?(text).*?<\/tr> for that
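To see what that pattern actually captures, here is a minimal JavaScript sketch. The function name findInsideRow is hypothetical, chosen only for illustration, and the sketch assumes the row sits on a single line (a plain "." does not match newlines unless the s flag is added).

```javascript
// Hypothetical helper: returns the captured word if it occurs
// somewhere between <tr> and </tr>, otherwise null.
function findInsideRow(html) {
  // .*? is lazy, so it skips as little as possible before and after "text".
  const pattern = /<tr>.*?(text).*?<\/tr>/;
  const match = html.match(pattern);
  return match ? { captured: match[1], whole: match[0] } : null;
}

const sample = 'This is a test text <tr><td>text inside table</td></tr>';
console.log(findInsideRow(sample));
```

The "text" before the table is ignored because the match can only begin at <tr>.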

Regular expression to find numbers inside HTML tags

I need to wrap numbers inside HTML tags without affecting attributes.
So far, all I could get is a selection of everything inside a tag, digits and non-digit characters alike :(
Here's the regular expression I'm using :
/([0-9]+(?:\.[0-9]*)?)/g
Here's the code at RegExr!
I'll be using jQuery to parse it. This is the closest I could get jsfiddle.
How to make this regular expression look only for numbers inside html tags?
Thanks for your help.

This matches 123 in <div>123</div> for example:
[0-9]+(?:\.[0-9]*)|(?<=^|>)\d+(?=<|$)
This regex was edited from the link you provided: http://regexr.com/?361gc

This selects only numbers within html tags. It also works on multi line text.
(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)
You can test it here.
But please be advised that <html> and <body> tags will match the pattern you asked for, so when you are running a complete html document through this regex, most or all numbers will be matching.
Testing on your code on jsfiddle I changed it to this:
$('body').each(function() {
    $(this).html(function(i, v) {
        return v.replace(/(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)/gim, '<span>$1</span>');
    });
});
So now it only runs on the elements of the body and not the whole document. Is that giving the expected result?
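For comparison, the same lookahead idea can be tried as a plain string function without jQuery. This is only a sketch under stated assumptions: the pattern below is a simplified one written for this example, not the regex from the answers above, and the name wrapNumbers is hypothetical. It assumes the markup contains no literal < or > inside text or attribute values.

```javascript
// Hypothetical helper: wraps numbers that appear in text content,
// leaving numbers inside tags (names, attributes) untouched.
function wrapNumbers(html) {
  // A number counts as "in text" when the next angle bracket after it
  // is "<" (or the string ends). Inside a tag the next bracket is ">".
  const pattern = /\d+(?:\.\d+)?(?=[^<>]*(?:<|$))/g;
  return html.replace(pattern, '<span>$&</span>');
}

const sample = '<div class="c1" id="a2">Price 123 and 4.5</div>';
console.log(wrapNumbers(sample));
```

Walking the DOM and editing text nodes is still the more robust route when that is an option.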

Extract the href value with apostrophe in Java

I am a new user to JSoup. I want to extract the href value from the html.
For example:
String html = "<p>An <a href='http://exa'mple.com'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String linkHref = link.attr("href");
I am getting the output as "http://exa" , but I need the output as "http://exa'mple.com" (the raw text in href). link.outerHtml() is providing some different text.
I can't alter the HTML. HTML is the user's input.

Try this:
String html = "<p>An <a href='http://exa%27mple.com'><b>example</b></a> link.</p>";

I can't see how this will be possible, given that the jsoup parser will be expecting a ' to close the href argument and that's exactly what it gets. I think your only option is to pre-parse the string provided by the user, but even that will be tricky, as you'll have to come up with a rule to distinguish between "correct" and "incorrect" quote marks.

Extract specific matching text (regex?) from YQL query

I would like to use YQL to extract textual Ids from particular paragraph divs on a webpage. The id is in the form "ApplicationRef:NNN". The basic query I use is along the lines of:
select content from html where url="..." and content matches "ApplicationRef*"
which returns the whole of the paragraph containing that text. However, I'd like to further process the result so that the query returns just the NNN part of each paragraph instead. Is this possible?
I realise that I can process the result from the YQL query further locally, but it would be neater if I can do all the processing in one go within the YQL query itself.
thanks,

Regular expression to match word instances not in html attrs or link text

I want to match a keyword that is not linked. In the following example I want to match only a google keyword that is neither between <a></a> nor inside an attribute, i.e. only the last google:
google is linked, google is not linked.

Do not parse HTML with regular expressions. HTML is not a regular language. Use an HTML parser.

This works for me (javascript):
var matches = str.match(/(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/);
See it in action

Provided you can be sure that your HTML is well behaved (and valid), especially does not contain comments or nested a tags, you can try
google(?!((?!<a[\s>]).)*</a>)
That matches any "google" that is not followed by a closing a tag before the next opening a tag. But you might be better off using an HTML parser instead.
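A minimal JavaScript sketch of that negative lookahead in use. The function name findUnlinked and the sample markup are assumptions made for illustration (the original example lost its link markup), and the keyword is assumed to contain no regex metacharacters.

```javascript
// Hypothetical helper: returns the start index of every keyword
// occurrence that is not followed by </a> before the next <a ...>.
function findUnlinked(html, keyword) {
  // Built from a string, so \s needs a double backslash and the
  // slash in </a> needs no escaping.
  const pattern = new RegExp(keyword + '(?!((?!<a[\\s>]).)*</a>)', 'g');
  return Array.from(html.matchAll(pattern), (m) => m.index);
}

const sample = '<a href="http://google.com">google is linked</a>, google is not linked.';
const hits = findUnlinked(sample, 'google');
console.log(hits.map((i) => sample.slice(i)));
```

Both the occurrence in the href attribute and the one in the link text are skipped, since each is followed by </a> with no new <a in between. A keyword inside an attribute of some other tag would still be matched, which is one more reason to prefer a parser.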
