C++ Runtime string formatting - c++

Usually I use streams for formatting stuff however in this case ?I don't know the format until runtime.
I want to be able to take something like the following format string:
Hello {0}! Your last login was on {1,date:dd/mm/yy}.
...and feed in the variables "Fire Lancer" and 1247859223, and end up with the following formatted string:
Hello Fire Lancer! Your last login was on 17/07/09.
In other languages I use there is built in support for this kind of thing, eg pythons format string method, however in c++ there doesn't seem to be any such functionality, accept the C print methods which are not very safe.
Also this is for a high performance program, so whatever solution I use needs to parse the format string once and store it (eg mayby a Parse method that returns a FormatString object with a Format(string) method), not reparse the string every time the format method is called...

Your format string looks very much like those used in ICU MessageFormat. Did you consider using it?

Boost Formatting does that for you:
http://www.boost.org/doc/libs/1_39_0/libs/format/doc/format.html
Check out this question and answer for examples of usage:

boost::format will do the positional arguments portion, but not the date formatting...

Related

LLDB summary strings without quotes

Lets say that I have a c++ class that contains two c strings like below.
class PathExample {
char* partA; // Eg: "/some/folder/"
char* partB; // Eg: "SomeFile.txt"
}
I can make an lldb summary string for it:
type summary add PathExample --summary-string "${var.partA}${var.partB}"
However this adds unnecessary and confusing quotes "/some/folder/""SomeFile.txt".
How can I format the type summary string to not use quotes, or at least append the strings before adding quotes? Eg: "/some/folder/SomeFile.txt"
"Remove leading or trailing quote in the summary value" before adding to the output is a not supported by the summary string formatting options. We're trying to keep those options fairly streamlined, and that's a bit too much of a special purpose feature.
The thing that allows us to keep the summary string version fairly restrained is that you can always write a Python summary, which allows you to format up the output in whatever way you like. There's an example that's somewhat like what you want in the section on Python scripting:
https://lldb.llvm.org/use/variable.html#python-scripting
You wouldn't use GetValueAsUnsigned as that example does. The C-string rendering of char * types is actually done by a built-in summary, so you would use "SBValue.GetSummary" to get the string value. That's actually the same thing that's substituted into the summary string so it also has the quotes on it. But in Python it's trivial to strip the leading and trailing quotes before concatenating the two strings.
Note, though it's convenient for playing around with, you don't have to define the Python summary callback inline as shown in the example. You can put a function with the correct signature in a .py file somewhere, use command script import <path to .py file> and then import it using the -F option to type summary add. Remember to use the full name of the function (module_name.func_name) when you specify it. I have a bunch of these in a ~/.lldb directory and command script import them in my ~/.lldbinit.
help type summary add has some more details on how to do this.

Parsing JSON style text using sscanf()

["STRING", FLOAT, FLOAT, FLOAT],
I need to parse three values from this string - a STRING and three FLOATS.
sscanf() returns zero, probably I got the format specifiers wrong.
sscanf(current_line.c_str(), "[\"%s[^\"]\",%f,%f,%f],",
&temp_node.id,
&temp_node.pos.x,
&temp_node.pos.y,
&temp_node.pos.z))
Do you know what's wrong?
Please read the manual page on sscanf(3). The %s format does not match using a regular expression, it just scans non-whitespace characters. Even if it worked as you assumed, your regular expression would not be able to handle all JSON strings correctly (which might not be a problem if your input data format is sufficiently restricted, but would be unclean nonetheless).
Use a proper JSON parser. It's not really complicated. I used cJSON for a moderately complex case, you should be able to integrate it within a few hours.
To fix your immediate problem, use this format specifier:
"[\"%[^\"]s\",%f,%f,%f],"
The right syntax for parsing a set is %[...]s instead of %s[...].
That being said, sscanf() is not the right tool for parsing JSON. Even the "fixed" code would fail to parse strings that contain escaped quotes, for instance.
Use a proper JSON parser library.

Escaping and unescaping HTML

In a function I do not control, data is being returned via
return xmlFormat(rc.content)
I later want to do a
<cfoutput>#resultsofreturn#</cfoutput>
The problem is all the HTML tags are escaped.
I have considered
<cfoutput>#DecodeForHTML(resultsofreturn)#</cfoutput>
But I am not sure these are inverses of each other
Like Adrian concluded, the best option is to implement a system to get to the pre-encoded value.
In the current state, the string your working with is encoded for an xml document. One option is to create an xml document with the text and parse the text back out of the xml document. I'm not sure how efficient this method is, but it will return the text back to it's pre-encoded value.
function xmlDecode(text){
return xmlParse("<t>#text#</t>").t.xmlText;
}
TryCF.com example
As of CF 10, you should be using the newer encodeFor functions. These functions account for high ASCII characters as well as UTF-8 characters.
Old and Busted
XmlFormat()
HTMLEditFormat()
JSStringFormat()
New Hotness
encodeForXML()
encodeForXMLAttribute()
encodeForHTML()
encodeForHTMLAttribute()
encodeForJavaScript()
encodeForCSS()
The output from these functions differs by context.
Then, if you're only getting escaped HTML, you can convert it back using Jsouo or the Jakarta Commons Lang library. There are some examples in a related SO answer.
Obviously, the best solution would be to update the existing function to return either version of the content. Is there a way to copy that function in order to return the unescaped content? Or can you just call it from a new function that uses the Java solution to convert the HTML?

Parse a string for open and close tags

Let's say I have the following strings:
"This [color=RGB]is[\color] a string."
"This [color=RGB][bold]is[\bold][\color] another string."
What I'm looking for is a good way to parse the string in order to extract the tag information and then reconstruct the original string without tags.
The tag informations will be used during text rendering.
Obviously I can achieve the goal by working directly with strings (find/substr/replace and so on), but I'm asking if there is another way cleaner, for example using regular expression.
Note:
There are very few tags I need, but there is the possibility to nest them (only of different type).
Can't use Boost.
There's a very simple answer that might work, depending on the complexity of your strings. (And me understanding you correctly, i.e. you just want to get the cleaned up strings, not actually extract the tags.) Just remove all tags. Replace
\[.*?]
with nothing. Example here
Now, if your string should be able to contain tag-like objects this might not work.
Regards

ICU Custom Currency Formatting (C++)

Is it possible to custom format currency strings using the ICU library similar to the way it lets you format time strings by providing a format string (e.g. "mm/dd/yyy").
So that for a given locale (say USD), if I wanted I could have all currency strings come back "xxx.00 $ USD".
See http://icu-project.org/apiref/icu4c/classDecimalFormat.html,
Specifically: http://icu-project.org/apiref/icu4c/classDecimalFormat.html#aadc21eab2ef6252f25eada5440e3c65
For pattern syntax see: http://icu-project.org/apiref/icu4c/classDecimalFormat.html#_details
I didn't used this but from my knowledge of ICU this is the direction.
However I would suggest to use:
http://icu-project.org/apiref/icu4c/classNumberFormat.html and createCurrencyInstance member and then use setMaximumIngegerDigits or other functions to make what you need -- that would be much more localized. Try not assume anything about any culture. Because "10,000 USD" my be misinterpreted as "$ 10" in some countries where "," used for fraction part separation.
So be careful.
You can create a currency instance, then if it is safe to cast it to a DecimalFormat
if (((const NumberFormat*)fmt)->getDynamicClassID() == DecimalFormat::getStaticClassID())
{ const DecimalFormat* df = (const DecimalFormat*) fmt; ...
… then you can call applyPattern on it. See the information on ¤, ¤¤, ¤¤¤ under 'special pattern chars'
Use the ICU library's createCurrencyInstance().