I'm fairly new to Ember, but I'm on v1.12 and struggling with the following problem.
I'm making a template helper.
The helper takes the bodies of tweets and wraps HTML anchors around the hashtags and usernames.
The paradigm I'm following is:
1. Use Ember.Handlebars.Utils.escapeExpression(value); to escape the input text.
2. Do the logic.
3. Return new Ember.Handlebars.SafeString(value);
However, step 1 seems to escape apostrophes, which means that any sentences I pass through it come out with escaped characters. How can I avoid this while making sure that I'm not introducing potential vulnerabilities?
Edit: Example code
export default Ember.Handlebars.makeBoundHelper(function(value){
// Make sure we're safe kids.
value = Ember.Handlebars.Utils.escapeExpression(value);
value = addUrls(value);
return new Ember.Handlebars.SafeString(value);
});
Where addUrls is a function that uses a regex to find hashtags and usernames and wrap them in anchors. For example, given #emberjs foo it would return the same text with #emberjs wrapped in an <a> element.
The result of the above helper function would be displayed in an Ember (HTMLBars) template.
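A minimal Python sketch of the same escape-then-link order, for illustration only. Everything here is an assumption: format_tweet and add_urls are hypothetical stand-ins for the Ember helper and the real addUrls, and the two URL prefixes are made up. One thing worth checking in a real addUrls: after escaping, an apostrophe becomes &#x27;, which contains a # followed by word characters, so a naive #(\w+) pattern would turn part of the entity into a "hashtag" link. The lookbehind below skips a # that directly follows &.

```python
import re

# Hypothetical link targets; the question does not show the real ones.
HASHTAG_URL = "https://twitter.com/hashtag/"
USER_URL = "https://twitter.com/"

# Same six replacements as Handlebars' escapeExpression.
ESCAPE = {"&": "&amp;", "<": "&lt;", ">": "&gt;",
          '"': "&quot;", "'": "&#x27;", "`": "&#x60;"}

# (?<![&\w]) rejects a #/@ glued to a word or to "&" (as in "&#x27;").
TAG_RE = re.compile(r"(?<![&\w])([#@])(\w+)")


def escape_expression(text):
    """Escape every character that could start markup or break out of an attribute."""
    return "".join(ESCAPE.get(ch, ch) for ch in text)


def add_urls(escaped):
    """Wrap hashtags and @usernames in anchors; input must already be escaped."""
    def link(m):
        prefix = HASHTAG_URL if m.group(1) == "#" else USER_URL
        return f'<a href="{prefix}{m.group(2)}">{m.group(0)}</a>'
    return TAG_RE.sub(link, escaped)


def format_tweet(text):
    # Escape first, then add our own trusted markup (what SafeString marks as safe).
    return add_urls(escape_expression(text))
```

For example, format_tweet("it's @someone") keeps the apostrophe as &#x27; (which the browser displays as ') and links only @someone.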
escapeExpression is designed to convert a string into the representation which, when inserted in the DOM, with escape sequences translated by the browser, will result in the original string. So
"1 < 2"
is converted into
"1 &lt; 2"
which when inserted into the DOM is displayed as
1 < 2
If "1 < 2" were inserted directly into the DOM (e.g. with innerHTML), it would cause quite a bit of trouble, because the browser would interpret < as the beginning of a tag.
So escapeExpression converts ampersands, less-than signs, greater-than signs, straight single quotes, straight double quotes, and backticks. The conversion of quotes is not necessary for text nodes, but could be for attribute values, since those may be enclosed in either single or double quotes while also containing such quotes.
Here's the list used:
var escape = {
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
'"': "&quot;",
"'": "&#x27;",
"`": "&#x60;"
};
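To see why the escaped quotes are harmless in the rendered output, here is a small Python re-creation of that table (a sketch, not Ember's code), with the standard library's html.unescape standing in for the browser decoding entities when it renders the page. The function names are hypothetical.

```python
import html

# Python copy of the escape map above.
ESCAPE = {"&": "&amp;", "<": "&lt;", ">": "&gt;",
          '"': "&quot;", "'": "&#x27;", "`": "&#x60;"}


def escape_expression(text):
    """Replace each special character with its HTML entity."""
    return "".join(ESCAPE.get(ch, ch) for ch in text)


def rendered(text):
    """What a reader sees: escape, then let the 'browser' decode the entities."""
    return html.unescape(escape_expression(text))
```

escape_expression("1 < 2") gives "1 &lt; 2", and rendered(s) == s for any string s, quotes included.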
I don't understand why the escaping of the quotes should be causing you a problem. Presumably you're calling escapeExpression because you want characters such as < to be displayed properly when output into a template using normal double-staches {{}}. Precisely the same thing applies to the quotes: they may be escaped, but when the string is displayed, it should display fine.
Perhaps you can provide some more information about input and desired output, and how you are "printing" the strings and in what contexts you are seeing the escaped quote marks when you don't want to.
I would like to know how I can insert a regular expression into a column of an Oracle table.
insert into rule_master(rule)
values('^[0-how #'ff#'9]+$') where rule_id='7'
...but I am getting a syntax error near where. I tried this with and without single quotes. Please suggest a solution.
Aside from the invalid syntax using where, you also need to escape the single quotes in your string by doubling them up:
A single quotation mark (') within the literal must be preceded by an escape character. To represent one single quotation mark within a literal, enter two single quotation marks.
so with a normal text literal:
insert into rule_master(rule) values('^[0-how #''ff#''9]+$')
(each '' pair is one escaped single quote)
or you can use the alternative quoting mechanism syntax, if you can identify a quote_delimiter character that will never appear in the value (or at least not immediately before a single quote). # won't work for this particular value, because the value contains #' twice and the literal would end at the first one; but { and } never appear in it, so you can use a pattern like:
values(q'{<your actual value>}')
i.e.:
insert into rule_master(rule) values(q'{^[0-how #'ff#'9]+$}')
(q'{ opens the literal and }' closes it; the single quotes in between are taken as-is)
If the where part is supposed to be populating the rule_id column at the same time, then the syntax would be more like:
insert into rule_master(rule_id, rule)
values(7, q'{^[0-how #'ff#'9]+$}')
and if a row with that ID already exists you should be using update rather than insert:
update rule_master
set rule = q'{^[0-how #'ff#'9]+$}'
where rule_id = 7
or perhaps merge if you aren't sure.
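Both quoting rules can be checked mechanically before the SQL is written. A minimal Python sketch, with hypothetical helper names: doubled_literal applies the "enter two single quotation marks" rule quoted above, and q_literal tries a few candidate delimiters and keeps the first whose closing character never appears directly before a quote in the value (bracket-style delimiters close with their partner, as Oracle requires). It only builds the literal text; with a real driver, bind variables avoid the quoting question entirely.

```python
# Bracket-style q-quote delimiters close with their partner character.
_PAIRS = {"[": "]", "{": "}", "<": ">", "(": ")"}


def doubled_literal(value):
    """Standard literal: each single quote inside the value is written twice."""
    return "'" + value.replace("'", "''") + "'"


def q_literal(value, candidates="#!|~{[<("):
    """Alternative quoting: pick a delimiter that cannot end the literal early."""
    for opening in candidates:
        closing = _PAIRS.get(opening, opening)
        # The literal ends at the first closing+"'" pair, so it must not occur inside.
        if closing + "'" not in value:
            return "q'" + opening + value + closing + "'"
    raise ValueError("no safe delimiter among the candidates")
```

For the value in the question, # is rejected (the value contains #'), so the sketch falls back to the next candidate.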
In a .csv file I have lines like the following:
10,"nikhil,khandare","sachin","rahul",viru
I want to split each line on commas (,). However, I don't want to split the text between double quotes (" "). If I split on every comma, I get an array with the following items:
10
nikhil
khandare
sachin
rahul
viru
But I don't want the items between double-quotes to be split by comma. My desired result is:
10
nikhil,khandare
sachin
rahul
viru
Please help me to sort this out.
The character used for separating fields should not be present in the fields themselves. If possible, use ; instead of , to separate the fields in the CSV file; it'll make your life easier. But if you're stuck with , as the separator, you can split each line using this regular expression:
/((?:[^,"]|"[^"]*")+)/
For example, in Python:
import re
s = '10,"nikhil,khandare","sachin","rahul",viru'
re.split(r'((?:[^,"]|"[^"]*")+)', s)[1::2]
=> ['10', '"nikhil,khandare"', '"sachin"', '"rahul"', 'viru']
Now to get the exact result shown in the question, we only need to remove those extra " characters:
[e.strip('" ') for e in re.split(r'((?:[^,"]|"[^"]*")+)', s)[1::2]]
=> ['10', 'nikhil,khandare', 'sachin', 'rahul', 'viru']
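For comparison, and as a different technique from the regex above: Python's standard csv module already understands quoted fields, and also handles doubled "" quotes inside a field. A minimal sketch (split_csv_line is a hypothetical name):

```python
import csv


def split_csv_line(line):
    """Parse one CSV line with the default (Excel-style) dialect."""
    # csv.reader accepts any iterable of lines; a one-element list is enough here.
    return next(csv.reader([line]))
```

To read a whole file, pass the open file object to csv.reader instead of a list.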
If your input always has this simple structure, you can split on "," (yes, including the quotes) after discarding the leading number and comma; the trailing unquoted field still needs separate handling.
If not, you can use a very simple state machine that parses your input from left to right. It has two states: inside quotes and outside. Regular expressions are also a good (and simpler) way if you already know them, as they are basically an equivalent of a state machine, just in another form.
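The two-state parser described above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (no escaped quotes inside fields; the quote characters themselves are dropped), and split_quoted is a hypothetical name:

```python
def split_quoted(line, sep=",", quote='"'):
    """Split on sep, except while inside a quoted section."""
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes      # switch state; the quote is not kept
        elif ch == sep and not in_quotes:
            fields.append("".join(current))  # separator outside quotes ends a field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))          # last field has no trailing separator
    return fields
```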
I need to use a regex to run through a string of text but only return the parts that I need. Let's say, for example, the string is as follows:
1234,Weapon Types,100,Handgun,"This is the text, "and", that is all."""
\d*,Weapon Types,(\d*),(\w+), gets me most of the way, however it is the last part that I am having an issue with. Is there a way for me to capture the rest of the string i.e.
"This is the text, "and", that is all."""
without picking up the quotes? I've tried negating them, however it just stops the string at the quote.
Please keep in mind that the text for this string is unknown so doing literal matches will not work.
You've given us something very difficult to solve. Nested commas inside your string are fine: once we come across a double quote, we can ignore everything until the closing quote, which would gobble up the commas.
But how will your parser know that the next double quote isn't ending the string? How does it know that it's a nested double quote?
If I could slightly modify your input string to make it clear what is a nested quote, then parsing is easy...
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, 'and', that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""([^""]+)""");
MessageBox.Show(m.Groups[3].Value);
But if your input string must have nested quotes like that, then we must come up with some other rule for detecting what is the real end of the string. How about this?
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""(.+)"",");
MessageBox.Show(m.Groups[3].Value);
The result is...
This is the text, "and", that is all.
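The same "a quote followed by a comma ends the text" rule, sketched in Python's re for anyone outside .NET (quoted_description is a hypothetical name). The greedy (.+) first runs to the end of the line and then backs up to the last ", so nested "and", is skipped. The flip side: on the question's original line, which has no comma after the closing quotes, the only ", is the one after and, so the capture stops there.

```python
import re

# Python version of the second C# pattern: the quoted field ends at the
# last double quote that is immediately followed by a comma.
WEAPON_RE = re.compile(r'^\d*,Weapon Types,(\d*),(\w+),"(.+)",')


def quoted_description(line):
    """Return the quoted text (group 3), or None if the line doesn't match."""
    m = WEAPON_RE.match(line)
    return m.group(3) if m else None
```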
Let's say I have:
<cfscript>
arrButtons = [
{
"name" = "Add",
"bclass" = "add",
"onpress" = "addItem"
},
{
"name" = "Edit",
"bclass" = "edit",
"onpress" = "editItem"
},
{
"name" = "Delete",
"bclass" = "delete",
"onpress" = "deleteItem"
}
];
jsButtons = SerializeJSON(arrButtons);
// result :
// [{"onpress":"addItem","name":"Add","bclass":"add"},{"onpress":"editItem","name":"Edit","bclass":"edit"},{"onpress":"deleteItem","name":"Delete","bclass":"delete"}]
</cfscript>
For every onpress item, I need to remove the double quotes from its value to match the JS library's requirement (the onpress value must be a callback function).
How do I remove the double quotes using a regular expression?
The final result must be:
[{"onpress":addItem,"name":"Add","bclass":"add"},{"onpress":editItem,"name":"Edit","bclass":"edit"},{"onpress":deleteItem,"name":"Delete","bclass":"delete"}]
No double quotes surrounding addItem, editItem, and deleteItem.
Edit 2012-07-13
Why do I need this? I created a CFML function whose result is a collection of JS that will be used in many files. The jsButtons object will be used as one of the options available in the JS library. One of that function's arguments is an array of structs (the default is arrButtons), and the supplied argument value can be merged with the default value.
Since we can't (in CFML) write the onpress value without double quotes, I have to add double quotes to that value, convert the (CFML) array of structs to JSON (which is just a string), and then remove the double quotes before placing it in the JS library option.
With Railo, we can declare the struct as a linked struct to guarantee the same key order for looping or conversion (in the example above, onpress would always be the last key in the struct). With a linked struct and a fixed key order, we can remove the double quotes with a simple Replace function, but of course we can't guarantee that every programmer who uses the CFML function will remember to use a linked struct with the same key order as the example above.
I'm not sure this is actually necessary - depending on how/where you're dealing with the JS callbacks, it might be possible to use the string function names to reference the function without needing to remove the quotes (i.e. object[button.onpress]).
However, since you asked, here is a regex solution:
jsButtons = jsButtons.replaceAll('(?<="onpress":)"([^"]+)"','$1');
The regex there is made up of two parts:
(?<="onpress":) -- lookbehind to ensure we are dealing with the text "onpress":
"([^"]+)" -- match the quotes and capture their contents.
The $1 on the replacement side is to replace the matched text (i.e. the entire quoted value) with the first capture group (i.e. the contents of the quotes).
If case-sensitivity of "onpress" might be an issue, you can prefix the regex with (?i) to ignore case.
If there will be multiple different events (not just "onpress") you can update the relevant part of the expression above to be (?<="on(?:press|hover|squeek)":) etc.
Note: All the above relies on the format output from serializeJson not changing - if it's possible that there might be comments, whitespace, single quotes, or anything else in future then a longer expression would be needed to cater for those - which is part of why you should investigate if you even need regex to solve this problem in the first place.
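The same replacement in Python's re, as a sketch for checking the pattern outside ColdFusion (function names are hypothetical). Python writes the backreference as \1 rather than $1. Unlike Java, Python's re requires a fixed-width lookbehind, so the multi-event variant (press, hover and squeek have different lengths) cannot go inside (?<=...); capturing the key and writing it back achieves the same result.

```python
import re

# Same pattern as the replaceAll call above.
ONPRESS_RE = re.compile(r'(?<="onpress":)"([^"]+)"')

# Several event names: capture the key instead of looking behind it,
# because Python's lookbehind must have a fixed width.
EVENTS_RE = re.compile(r'("on(?:press|hover|squeek)":)"([^"]+)"')


def unquote_onpress(json_text):
    """Strip the quotes around each onpress value."""
    return ONPRESS_RE.sub(r"\1", json_text)


def unquote_events(json_text):
    """Strip the quotes around the values of the listed on* keys."""
    return EVENTS_RE.sub(r"\1\2", json_text)
```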
What you're wanting to output is not JSON, so using SerializeJSON is a kludge.
Is there any reason you are putting it into a ColdFusion Array first, instead of writing the Javascript directly?
JSON is purely meant to be a data description language. Per http://www.json.org, it is a "lightweight data-interchange format", not a programming language.
Per http://en.wikipedia.org/wiki/JSON, the "basic types" supported are:
Number (integer, real, or floating point)
String (double-quoted Unicode with backslash escaping)
Boolean (true and false)
Array (an ordered sequence of values, comma-separated and enclosed in square brackets)
Object (collection of key:value pairs, comma-separated and enclosed in curly braces)
null
--Source
I guess in this case you can simply use serialize(). That should do the trick...
Gert
I need to find and delete all the non-standard ASCII characters that are in a string (usually delivered there by MS Word). I'm not entirely sure what these characters are: things like the fancy apostrophe, the directional (curly) quotation marks, and so on. Is that Unicode? I know how to do it the ham-handed way ([a-z etc. etc.]), but I was hoping there was a more elegant way to just exclude anything that isn't on the keyboard.
Probably the best way to handle this is to work with character sets, yes, but for what it's worth, I've had some success with this quick-and-dirty approach, the character class
[\x80-\x9F]
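This works because, for me, the problematic "Word chars" are the ones in that range: Word's windows-1252 encoding puts curly quotes, dashes and the like there, while in Unicode the same range holds only control characters. And I've got no way of sanitising the user input.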
Microsoft apps are notorious for using fancy characters like curly quotes, em-dashes, etc., that require special handling without adding any real value. In some cases, all you have to do is make sure you're using one of their extended character sets to read the text (e.g., windows-1252 instead of ISO-8859-1). But there are several tools out there that replace those fancy characters with their plain-but-universally-supported equivalents. Google for "demoronizer" or "AsciiDammit".
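Both ideas fit in a few lines of Python: decode the bytes as windows-1252 (Python's "cp1252" codec) so the Word characters come out as real curly quotes and dashes, then map them to plain equivalents, as the demoronizer-style tools do. The replacement table and the demoronize name are illustrative assumptions; extend the table with whatever characters you actually see.

```python
# Illustrative mapping; add the characters that show up in your data.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis (single character)
}
TABLE = str.maketrans(REPLACEMENTS)


def demoronize(raw):
    """Decode Word's windows-1252 bytes, then swap fancy characters for ASCII."""
    text = raw.decode("cp1252")   # not ISO-8859-1, which would yield control codes
    return text.translate(TABLE)
```

Decoding the same bytes as ISO-8859-1 instead yields characters in the \x80-\x9F control range, which is the range the character class above targets.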
I usually use a JEdit macro that replaces the most common of them with a more ASCII-friendly version, i.e.:
hyphens and dashes to minus sign;
suspension dots (single char) to multiple dots;
list item dot to asterisk;
etc.
It is easily adaptable to Word/OpenOffice/whatever, and of course can be modified to suit your needs. I wrote an article on this topic:
http://www.megadix.it/node/138
Cheers
What you are probably looking at are Unicode characters in UTF-8 format. If so, just escape them in your regular expression language.
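If the goal is simply to drop everything outside plain ASCII, a single negated character class does it, with no need to list each offending character. A minimal Python sketch (strip_non_ascii is a hypothetical name); note that it deletes rather than replaces, so accented letters disappear too.

```python
import re

# Anything that is not in the 7-bit ASCII range (code points 0-127).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text):
    """Remove every non-ASCII character from text."""
    return NON_ASCII.sub("", text)
```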
My solution to this problem is to write a Perl script that gives me all of the characters that are outside of the ASCII range (0 - 127):
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
for my $character (grep { ord($_) > 127 } split //) {
$seen{$character}++;
}
}
print "saw $_ $seen{$_} times, its ord is ", ord($_), "\n" for keys %seen;
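A Python equivalent of the survey script, as a sketch (function names are hypothetical). Because Python works on decoded text, ord() reports Unicode code points (for example 8217 for the curly apostrophe) rather than the individual UTF-8 bytes the Perl version sees when it reads input without a decoding layer.

```python
from collections import Counter


def non_ascii_counts(text):
    """Count every character whose code point is above 127."""
    return Counter(ch for ch in text if ord(ch) > 127)


def report(text):
    # Most frequent first, in the same wording as the Perl script.
    for ch, n in non_ascii_counts(text).most_common():
        print(f"saw {ch} {n} times, its ord is {ord(ch)}")
```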
I then create a mapping of those characters to what I want them to be and replace them in the file:
#!/usr/bin/perl
use strict;
use warnings;
my %map = (
chr(128) => "foo",
#etc.
);
while (<>) {
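# /g replaces every match on the line; characters missing from %map are kept as-is
s/([\x{80}-\x{FF}])/exists $map{$1} ? $map{$1} : $1/ge;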
print;
}
What I would do is use AutoHotkey, Python's SendKeys, or some sort of Visual Basic script to send all possible keys (with and without Shift applied) to a Word document.
In SendKeys it would be a script of the form
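import SendKeys  # Windows-only Python 2 package
# range() excludes its end point, so add 1 to include 'z' and '9'
chars = ''.join([chr(i) for i in range(ord('a'), ord('z') + 1)])
nums = ''.join([chr(i) for i in range(ord('0'), ord('9') + 1)])
specials = ['-', '=', '\\', '/', ',', '.', '`']
all = chars + nums + ''.join(specials)  # join first: str + list is a TypeError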
SendKeys.SendKeys("""
{LWIN}
{PAUSE .25}
r
winword.exe{ENTER}
{PAUSE 1}
%(all)s
+(%(all)s)
"testQuotationAndDashAutoreplace"{SPACE}-{SPACE}a{SPACE}{BS 3}{LEFT}{BS}
{Alt}{PAUSE .25}{SHIFT}
changeLanguage
%(all)s
+%(all)s
"""%{'all':all})
Then I would save the document as text and use it as a database of all displayable keys in your keyboard layout (you might want to switch the input language more than once to capture absolutely all displayable characters).
If a character is in the resulting text document, it is displayable; otherwise it is not. No need for a regexp. You can of course afterwards embed the character range in a script or a program.
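Embedding that result in a program can be as simple as a set lookup. A minimal Python sketch, assuming you have already read the saved text file into a string (the function names are hypothetical). Any character that never made it into the saved document, including a newline if the file has none, is dropped.

```python
def allowed_chars(saved_text):
    """Every character that appears in the saved document counts as displayable."""
    return set(saved_text)


def keep_displayable(text, allowed):
    """Drop every character that is not in the allowed set."""
    return "".join(ch for ch in text if ch in allowed)
```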