What's an example of some invalid markdown? - unit-testing

I'm writing unit tests for a model with an attribute that's interpreted as markdown. I'd like to test that if the markdown is invalid, then the object is invalid - but it's such a forgiving syntax that everything I've tried so far turns out to be valid markdown! What's an example of some invalid markdown?

I haven't used markdown extensively but i was under the impression that it is impossible to write "invalid" markdown only markdown that wont do what you want it to. As in instead of throwing an error when it doesn't know what to do it just treats it as plain text.
On a different path one could probably write a script to try and identify things that the user probably didn't intend, for example if someone entered **test* they probably intended *test* or **test**

All strings are valid markdown.

If all text is markdown and vice versa, then I suppose one example of invalid markdown would be invalid text in the encoding that you are using, i.e. invalid UTF-8, invalid ASCII or invalid ISO-8859-1.

Related

Parsing JSON style text using sscanf()

["STRING", FLOAT, FLOAT, FLOAT],
I need to parse three values from this string - a STRING and three FLOATS.
sscanf() returns zero, probably I got the format specifiers wrong.
sscanf(current_line.c_str(), "[\"%s[^\"]\",%f,%f,%f],",
&temp_node.id,
&temp_node.pos.x,
&temp_node.pos.y,
&temp_node.pos.z))
Do you know what's wrong?
Please read the manual page on sscanf(3). The %s format does not match using a regular expression, it just scans non-whitespace characters. Even if it worked as you assumed, your regular expression would not be able to handle all JSON strings correctly (which might not be a problem if your input data format is sufficiently restricted, but would be unclean nonetheless).
Use a proper JSON parser. It's not really complicated. I used cJSON for a moderately complex case, you should be able to integrate it within a few hours.
To fix your immediate problem, use this format specifier:
"[\"%[^\"]s\",%f,%f,%f],"
The right syntax for parsing a set is %[...]s instead of %s[...].
That being said, sscanf() is not the right tool for parsing JSON. Even the "fixed" code would fail to parse strings that contain escaped quotes, for instance.
Use a proper JSON parser library.

Convert \xc3\xd8\xe8\xa7\xc3\xb4\xd to human readable format

I am having trouble converting '\xc3\xd8\xe8\xa7\xc3\xb4\xd' (which is a Thai text) to a readable format. I get this value from a smart card, and it basically was working for Windows but not in Linux.
If I print in my Python console, I get:
����ô
I tried to follow some google hints but I am unable to accomplish my goal.
Any suggestion is appreciated.
Your text does not seem to be a Unicode text. Instead, it looks like it is in one of Thai encodings. Hence, you must know the encoding before printing the text.
For example, if we assume your data is encoded in TIS-620 (and the last character is \xd2 instead of \xd) then it will be "รุ่งรดา".
To work with the non-Unicode strings in Python, you may try: myString.decode("tis-620") or even sys.setdefaultencoding("tis-620")

ColdFusion -- Do I need URLDecode with form POSTs? / URLDecode randomly removes one character

I'm using a WYSIWYG to allow users to format text. This is the error-causing text:
<p><span style="line-height: 115%">This text starts with a 'T'</span></p>
The error is that the 'T' in "This", or whatever the first letter happens to be, is randomly removed when using URLDecode and saving to the DB. Removing URLDecode on the server side seems to fix it without any negative side-effects (the DB contains the same information).
The documentation says that
Query strings in HTTP are always URL-encoded.
Is this really the case? If so, why doesn't removing URLDecode seem to mess everything up?
So two questions:
Why is URLDecode causing the first text character to be removed like this (it seems to only happen when the line-height property is present)?
Do I really need (or would I even want) to use URLDecode before putting POSTed data into the database?
Edit: I made a test page to echo back the decoded text, and URLDecode is definitely removing that character, but I have no idea why.
I believe decoding is done automatically when form scope is populated. That's why characters after % (this char is used for encoding) are removed -- you are trying to decode the string second time.
For security reasons you might be interested in stripping script tags, or even cleaning up HTML using white-list. Try to search in CFLib.org for applicable functions.

Can an tinyxml someone explain which characters need to be escaped?

I am using tinyxml to save input from a text ctrl. The user can copy whatever they like into the text box and it gets written to an xml file. I'm finding that the new lines don't get saved and neither do & characters. The weird part is that tinyxml just discards them completely without any warning. If I put a & into the textbox and save, the tag will look like:
<textboxtext></textboxtext>
newlines completely disappear as well. No characters whatsoever are stored. What's going on? Even if I need to escape them with &amp or something, why does it just discard everything? Also, I can't find anything on google regarding this topic. Any help?
EDIT:
I found this topic which suggest the discarding of these characters may be a bug.
TinyXML and preserving HTML Entities
It is, apparently, a bug in TinyXml.
The simple workaround is to escape anything that it might not like:
&, ", ', < and > got their regular xml entities encoding
strange characters (read non-alphanumerical / regular punctuation) are best translated to their unicode codepoint: &#....;
Remember that TinyXml is before all a lightweight xml library, not a full-fledged beast.

Detecting Characters in an XSLT

I have encountered some odd characters that do not display properly in Internet Explorer, such as these: “, –, and ’. I think they're carried over from copy-and-paste Word content.
I am using XSLT to build the page content and it would be great to detect these characters in the XSLT and replace them with valid HTML codes. I already do string replacement in the style sheet, but I'm not sure how detect these encoded characters or whether it's possible.
What about simply changing the encoding for the Stylesheet as well as its output to UTF-8? The characters you mention are “, – and ’. Certainly not invalid or so, given the correct encoding (the characters are at least perfectly valid in Codepage 1252).
Using a good XML editor such as XMLSpy should highlight any errors in formatting your XSLT by validating at development time.
Jeni Tennison's Multiple string replacements may be a good starting point.