Yesod Hamlet breaks HTML by replacing single quotes with double quotes - yesod

I have some HTML code that I'm using in Hamlet:
<div .modal-card .card data-options='{"valueNames": ["name"]}' data-toggle="lists">
Notice that the single quotes around the data-options value allow double quotes to be used inside the string.
The problem is that when Hamlet renders the page, Hamlet puts " around the ' and so the HTML is broken:
<div class="modal-card card" data-options="'{" valuenames":"="" ["name"]}'="" data-toggle="lists">
Some external JS library plugin code runs, it tries to parse the JSON inside data-options and fails.
How can I tell Hamlet to include a literal string?
I've tried various combinations of:
let theString = "{\"valueNames\": [\"name\"]}"
let theString2 = "data-options='{\"valueNames\": [\"name\"]}'"
etc
And in the hamlet file:
<div .modal-card .card data-options='#{ preEscapedText theString }' data-toggle="lists">
or
<div .modal-card .card #{ preEscapedText theString2 } data-toggle="lists">
But all attempts produce invalid HTML or invalid JSON inside the string.
How can I instruct Hamlet to simply include a literal string in the output HTML?
Update:
Tried more things, no result.
The theString2 example doesn't work because Hamlet seems to think that I'm trying to set id="{", as per https://www.yesodweb.com/book/shakespearean-templates#shakespearean-templates_attributes

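Why not render the JSON escaped (each " becomes &quot;) and handle the quotes later when parsing? The browser decodes the entities when the attribute is read, so getAttribute returns the original JSON text.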
Interpolate in Hamlet:
<div #the-modal .modal-card .card data-options='#{theString}' data-toggle="lists">
Parse the data attribute as JSON:
let json = document.getElementById("the-modal").getAttribute("data-options");
let opts = JSON.parse(json); // At least in Chrome, it works!
As for theString2 alternative, you can also interpolate attributes in Hamlet using a tuple or list of tuples and the star symbol:
let dataOptions = ("data-options", "{\"valueNames\": [\"name\"]}") :: (Text, Text)
...
<div #the-modal .modal-card .card *{dataOptions} data-toggle="lists">
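The escaping idea in this answer can be checked outside Yesod. The following is a minimal Python sketch of the HTML rule only, not of Hamlet itself: it assumes that #{...} interpolation performs the equivalent of html.escape, and the AttrCollector class is a hypothetical helper written for this illustration.

```python
import html
import json
from html.parser import HTMLParser

options = {"valueNames": ["name"]}

# Escape the JSON for use inside a double-quoted attribute (" becomes &quot;).
# Assumption: this mirrors what Hamlet's #{...} interpolation emits.
escaped = html.escape(json.dumps(options), quote=True)
tag = f'<div id="the-modal" data-options="{escaped}" data-toggle="lists"></div>'


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # The parser hands over attribute values with entities already decoded.
        self.attrs.update(attrs)


collector = AttrCollector()
collector.feed(tag)

# The decoded attribute value is valid JSON again.
decoded = json.loads(collector.attrs["data-options"])
print(escaped)
print(decoded)
```

The markup carries only &quot; entities, yet any HTML parser (including the browser's) returns the original JSON string, which is why JSON.parse succeeds in the snippet above.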

Related

Beautifulsoup get the value of hrefs separated by commas

sup2 = soup2.find_all("div", {"class": "xxxxxxx"})
When I use find_all on a div, I get the following result:
<div class="xxxxxxx" data-reactid="37">aa , bb </div>
How do I get the href values of the links on either side of the comma?
Iterate over the Tag elements in sup2 and select the 'href' attribute, eg:
hrefs = [a['href'] for tag in sup2 for a in tag.find_all('a')]
Using css selectors:
hrefs = [tag['href'] for tag in soup2.select("div.xxxxxxx a")]
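If BeautifulSoup is not available, roughly the same selection can be sketched with the standard library's html.parser. This swaps in a different parser from the one used in the answer, and the sample markup is hypothetical: the links in the question's output did not survive, so the two anchors below are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample: the real page's <a> tags are not shown in the question.
page = (
    '<div class="xxxxxxx" data-reactid="37">'
    '<a href="/aa">aa</a> , <a href="/bb">bb</a></div>'
    '<div class="other"><a href="/skip">skip</a></div>'
)


class HrefCollector(HTMLParser):
    """Collects href values of <a> tags nested inside a div with a given class."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0  # div nesting depth inside a matching div (0 = outside)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth or self.div_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


collector = HrefCollector("xxxxxxx")
collector.feed(page)
hrefs = collector.hrefs
print(hrefs)
```

With BeautifulSoup installed, the one-line versions in the answer remain the simpler choice.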

RegExp replace all but selected

So I'm trying to erase everything except the matched text in this 1900-line document with Notepad++ RegExp Find/Replace, so that I am left with only the file names, which should shorten it to under about 1000 lines. I know the expression that selects the text, (?<=/images/item/)(.*)(?=" a), but the problem is that I don't know how to make it erase everything that doesn't match. Here's a portion of the document.
Using Notepad++, it would find and select abyssal-scepter.gif, aegis-of-the-legion.gif, etc.
<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br> <div id="id_77" class="tier-wrapper drag-items health magic-resist health-regen champ-box float-left ajax-tooltip {t:'Item',i:'77'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-advanced filter-bonus-aura filter-category-health filter-category-magic-resist filter-category-health-regen ui-draggable ui-draggable-handle">
<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br> <div id="id_235" class="tier-wrapper drag-items ability-power movement champ-box float-left ajax-tooltip {t:'Item',i:'235'} filter-tier-advanced filter-bonus-unique-passive filter-category-ability-power filter-category-movement ui-draggable ui-draggable-handle">
<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>
<div class="info">
<div class="champ-name">Aether Wisp</div>
<div class="champ-sub">
<img src="/images/gold.png" alt="Item Cost" style="width:16px; vertical-align:middle;"> 850 / 415
</div>
</div>
</div>
<div id="id_21" class="tier-wrapper drag-items ability-power champ-box float-left ajax-tooltip {t:'Item',i:'21'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-basic filter-category-ability-power ui-draggable ui-draggable-handle">
<img src="/images/item/amplifying-tome.gif" alt="LoL Item: Amplifying Tome"><br>
<div class="info">
<div class="champ-name">Amplifying Tome</div>
<div class="champ-sub">
I'm not familiar with RegExp, so to summarize, I need it to look like this at the end of it.
abyssal-scepter.gif
aegis-of-the-legion.gif
aether-wisp.gif
amplifying-tome.gif
Thank you for your time
A Notepad++ solution:
Find what : .*?/images/item/(.*?)"|.*
Replace with : $1\n
Search mode : Regular expression (with ". matches newline" checked)
The result will have an extra linefeed at the end.
But that shouldn't pose a problem I suppose.
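For readers doing the same job in a script rather than in Notepad++, it can be simpler to extract the names than to erase everything else. A minimal Python sketch, assuming the page source is already in a string (the sample is shortened from the question, and the variable names are illustrative):

```python
import re

# Shortened sample of the document from the question.
text = '''<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>
<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br>
<img src="/images/gold.png" alt="Item Cost" style="width:16px; vertical-align:middle;"> 850 / 415
<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>'''

# Capture whatever follows /images/item/ up to the closing double quote.
names = re.findall(r'/images/item/([^"]+)"', text)

# One file name per line, with no trailing linefeed to clean up.
print("\n".join(names))
```

The gold.png image is skipped because its path does not contain /images/item/.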
Maybe this can help, or maybe not, since you dropped the JavaScript tag from your original post.
<script type="text/javascript">
var thestring = "<img src=\"/images/item/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";
var thestring2 = "<img src=\"/images/otherstuff/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";
function ParseIt(incomingstring) {
var pattern = /"\/images\/item\/(.*)" /;
if (pattern.test(incomingstring)) {
return pattern.exec(incomingstring)[1];
}
else {
return "";
}
//return pattern.test(incomingstring) ? pattern.exec(incomingstring)[1] : "";
}
</script>
Calling ParseIt(thestring) returns "aegis-of-the-legion.gif"
Calling ParseIt(thestring2) returns ""
Since you are doing this in Notepad++, here is what works for me. In cases like this, where speed and results matter more than a specific technique, I usually run several regexes in sequence.
First, search for > and replace it with >\n. This puts each tag on its own line for simpler processing.
Then replace ^>*<.*?".*?/?([\w\d\-_]+\.\w{2,4})?".*>.*$ with $1, which extracts the filenames from the tags and removes the unneeded text.
Next, to clear the tags that had no filename in them, replace <.*> with an empty string.
Finally, use Edit > Line Operations > Remove Empty Lines, and you'll have the result you're looking for.
It's not a 100% regex solution, but this is a one-time action that only needs a simple result.

Trying to match src part of HTML <img> tag Regular Expression

I've got a bunch of strings already separated from an HTML file, examples:
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
I am trying to write a regular expression that grabs the src="URL" part of the img tag so that I can replace it later based on a few other conditions. The many quotation marks are giving me the most trouble. I'm still relatively new to regex, so a lot of the tricks are beyond my knowledge.
Thanks in advance
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
Example:
$html = <<<DATA
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
echo $img->getAttribute('src') . "\n";
}
Output
//i.imgur.com/tApg8ebb.jpg
//i.imgur.com/SwmwL4Gb.jpg
//s.imgur.com/images/blog_rss.png
If you would rather store the results in an array, you could do..
foreach ($imgs as $img) {
$sources[] = $img->getAttribute('src');
}
print_r($sources);
Output
Array
(
[0] => //i.imgur.com/tApg8ebb.jpg
[1] => //i.imgur.com/SwmwL4Gb.jpg
[2] => //s.imgur.com/images/blog_rss.png
)
$pattern = '/<img.+src="([\w/\._\-]+)"/';
I'm not sure which language you're using, so quote syntax will vary.

'TypeError: expected string or buffer' when performing re.sub on beautiful soup result set iteration

content_a is a beautiful soup result set (ie the type is <class 'bs4.element.ResultSet'>) that is made up of values whose type is <class 'bs4.element.Tag'>.
If i print 'content_a' i get:
[<div class="class1 class2">Here is the first sentence.
<br/> <br/> Here is some text "and some more text."
<br/> <br/> Here is another sentence.
<br/> Text<br/><span class="class3">Text</span></div>, <div class="class1 class2">Here is the first sentence.
<br/> <br/> Here is some text "and some more text."
<br/> <br/> Here is another sentence.
<br/> Text<br/><span class="class3">Text</span></div>, etc
So it seems to me it should be a simple iterable list of divs.
I am wanting to replace <div class="class1 class2"> with <div class="class1 class2"><p> (my eventual goal being to replace all <br />'s with paragraph tags).
In my test where the source content is a string I have:
import re
blablabla = ['<div class="class1 class2">', '<div class="class1 class2">']
for _ in blablabla:
_ = re.sub('(<div class=\"class1 class2\">)', r"\1<p>",_)
print _
which returns, as required:
<div class="class1 class2"><p>
<div class="class1 class2"><p>
I am trying to perform the same process on each iterable in content_a with:
import re
for _ in content_a:
_ = re.sub('(<div class=\"class1 class2\">)', r"\1<p>",_)
print _
but am getting the error:
...in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
So the only difference that I can see between the two examples is that one is a Beautiful Soup result set and the other is just a plain list.
Can anyone see why this error could be occurring?
Edit:
Someone has pointed out here that sub requires a string as the third argument, whereas the third argument I am passing is the iterated value, which is of type <class 'bs4.element.Tag'>. So perhaps this is the problem. But I need to retain the nature of these values for later modification, so I am not sure how to proceed at the moment.
Update/Workaround:
Just to save someone spending time on an answer: I figured out a workaround. I realised I could adjust the content later in the process, which I did by converting it to a string with read(); I could then perform all the re.sub changes on the required elements in that string.
And the little regex i came up with was:
string = re.sub('([^\r]*)\r', r'\1</p>\n<p>', string)
As suggested, I am posting the workaround I used as the solution.
I realised I could adjust the content later in the process: I converted it to a string with read() and could then perform all the re.sub changes on the required elements in that string.
The little regex I came up with was:
string = re.sub('([^\r]*)\r', r'\1</p>\n<p>', string)
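The TypeError noted in the question's edit can be reproduced without BeautifulSoup. The sketch below uses a hypothetical FakeTag class as a stand-in for bs4.element.Tag (an object that is not a str, but whose str() is its markup), and it is written for Python 3, unlike the Python 2 code in the question. It shows that converting each item with str() just before calling re.sub avoids the error while leaving the original objects untouched.

```python
import re


class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: not a str, but str() yields markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


content_a = [
    FakeTag('<div class="class1 class2">Here is the first sentence.</div>'),
    FakeTag('<div class="class1 class2">Here is another sentence.</div>'),
]
pattern = r'(<div class="class1 class2">)'

# Passing the object itself fails: re.sub only accepts str or bytes-like input.
try:
    re.sub(pattern, r"\1<p>", content_a[0])
    raised = False
except TypeError:
    raised = True

# Converting each item to a string first works; content_a is left unchanged.
converted = [re.sub(pattern, r"\1<p>", str(tag)) for tag in content_a]
print(raised)
print(converted[0])
```

With real BeautifulSoup objects, str(tag) likewise returns the tag's markup, so the same conversion applies.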

Extracting variables from string, regular expression?

My puzzle: as a PHP newbie I am trying to extract some data from a string using a regular expression, but I cannot find the correct syntax.
The string contains the scraped HTML of several images from a website. I want the final output to be 3 separate variables: "$Number1", "$Number2" and "$Status".
An example of the content of the input string $html:
<div id="system">
<img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt=".5" height="35" src="/images/numbers/point5.jpg" style="margin-left: -4px" width="26" /><img alt="system statusA" height="35" src="/images/numbers/statusA.jpg" width="37" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="1" height="35" src="/images/numbers/1.jpg" width="18" /><img alt=".0" height="35" src="/images/numbers/point0.jpg" style="margin-left: -4px" width="26" />
</div>
The possible values which can appear in this string are:
0.jpg
1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
6.jpg
7.jpg
8.jpg
9.jpg
point0.jpg
point5.jpg
statusA.jpg
statusB.jpg
statusC.jpg
statusD.jpg
statusE.jpg
statusF.jpg
The result should be variables:
"Number1" (XX.X) based upon the first two numbers (0-9) and .0 or .5
"Status" (statusX) based upon the status
"Number2" (XX.X) based upon the last two numbers (0-9) and .0 or .5
Code so far:
$regex = '\balt='(.*?)';
preg_match($regex,$html,$match);
var_dump($match);
echo $match[0];
I probably have to do this in multiple steps or use another function. Who can help me?
The very first thing that you should ask yourself is: "in what format is my input data". Since in this case it is clearly a snippet of HTML, you should feed that snippet to an HTML parser, and not to a regular expression engine.
I don't know the exact function names, but your code should look like this:
$htmltext = '<div id="system">[...]</div>';
$htmltree = htmlparser_parse($htmltext);
$images = $htmltree->find_all('img');
foreach ($images as $image) {
echo $image->src;
}
So you need to find an HTML parser that parses a string into a tree of nodes. The nodes should have methods for finding nodes inside them based on CSS classes, element names or node IDs. For Python this library is called BeautifulSoup, for Java it is JSoup, and I'm sure that there is something similar for PHP.
The examples provided with simplehtmldom look promising.
Possibly DOM : http://www.php.net/manual/en/book.dom.php
See Robust and Mature HTML Parser for PHP too
You want just the alt's? Try this xpath example:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach($xpath->query('//img/@alt') as $node){
echo $node->nodeValue."\n";
}
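Once the alt values are collected, the three variables from the question can be assembled by splitting the list at the status image. The following is a minimal sketch in Python rather than PHP, using the standard library parser; the AltCollector class and the variable names are illustrative, the sample is trimmed to the alt and src attributes, and it assumes exactly one status image sits between the two numbers.

```python
from html.parser import HTMLParser

# Sample from the question, trimmed to the alt and src attributes.
html_text = '''<div id="system">
<img alt="2" src="/images/numbers/2.jpg" /><img alt="2" src="/images/numbers/2.jpg" /><img alt=".5" src="/images/numbers/point5.jpg" /><img alt="system statusA" src="/images/numbers/statusA.jpg" /><img alt="2" src="/images/numbers/2.jpg" /><img alt="1" src="/images/numbers/1.jpg" /><img alt=".0" src="/images/numbers/point0.jpg" />
</div>'''


class AltCollector(HTMLParser):
    """Illustrative helper: collects the alt text of every <img> in document order."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt is not None:
                self.alts.append(alt)


collector = AltCollector()
collector.feed(html_text)
alts = collector.alts

# Assumption: exactly one alt contains "status" and it separates the two numbers.
status_index = next(i for i, alt in enumerate(alts) if "status" in alt)
number1 = "".join(alts[:status_index])
status = alts[status_index].split()[-1]
number2 = "".join(alts[status_index + 1:])

print(number1, status, number2)
```

The same split can be done in PHP on the node values returned by the XPath query, for example by pushing them into an array first.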