Trace32 - reading ASCII from memory into macro/variable

I would like to know the proper way of reading a memory chunk that is plain ASCII into a variable (PRACTICE macro). This is apparently possible for simple integer types with Data.Long(). However, converting the result to ASCII would be cumbersome.
I tried finding this in Lauterbach manuals but in this area they are quite misleading, mixing visualization in UI with data processing. I do not need this in the UI. I only need this string in a variable for further processing.

Use the Data.STRing() function to read a zero-terminated string from memory into a PRACTICE macro:
&text=Data.STRing(<address>)

Related

What's the best way to store binary

I've recently implemented Huffman compression in C++. If I were to store the result as text made of '0' and '1' characters, it would take up a lot more space, as each 1 and 0 is a character. Alternatively, I was thinking I could break the binary into groups of 8 bits and write each group as a character in the text file, but that would be somewhat annoying (so hopefully it can be avoided). My question is: what is the best way to store binary in a text file in terms of character efficiency?
[To recap the comments...]
My question here is what is the best way to store binary in a text file in terms of character efficiency?
If you can store the data as-is, then do so (in other words, do not use any encoding; simply save the raw bytes).
If you need to store the data within a text file (for instance as a paragraph or as a quoted string), then you have many ways of doing so. For instance, base64 is a very common one, but there are many others.
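For the text-file case, a minimal base64 encoder might look like the sketch below. The function name base64_encode is hypothetical, and the sketch assumes the standard RFC 4648 alphabet with '=' padding. It emits 4 characters for every 3 input bytes, compared with 8 characters per byte when writing '0' and '1' as text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode raw bytes as standard base64
// (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < data.size(); i += 3) {
        const std::size_t remaining = data.size() - i;
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        std::uint32_t group = static_cast<std::uint32_t>(data[i]) << 16;
        if (remaining > 1) group |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        if (remaining > 2) group |= data[i + 2];
        // Emit four 6-bit symbols; pad with '=' where no input byte existed.
        out.push_back(kAlphabet[(group >> 18) & 0x3F]);
        out.push_back(kAlphabet[(group >> 12) & 0x3F]);
        out.push_back(remaining > 1 ? kAlphabet[(group >> 6) & 0x3F] : '=');
        out.push_back(remaining > 2 ? kAlphabet[group & 0x3F] : '=');
    }
    return out;
}
```

A matching decoder is needed to read the data back; when the file does not have to be text at all, writing the raw bytes in binary mode remains the most compact option.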

scanf on an istream object

NOTE: I've seen the post "What is the cin analougus of scanf formatted input?" before asking this question, and it doesn't solve my problem. That post seeks a C++ way to do it, but as I mentioned, the C++ way is sometimes inconvenient, and I have clear examples of that.
I am trying to read data from an istream object, and sometimes it is inconvenient to use only C++-style mechanisms such as operator>>. For example, the data may be in a special form like 123:456, so you have to imbue a locale that treats ':' as whitespace (which is very hacky compared to %d:%d in scanf), or 00123, which you want to read as a string and convert as decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as the interface is that it can be derived from and is therefore more flexible. For example, we can create in-memory streams, or customized streams that are generated on the fly. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, when it comes to creating customized streams.
So my question is: is there a way to do scanf-like data extraction on an istream object? I think fscanf internally reads character by character from the FILE* using fgetc, and istream provides a similar interface. So it would be possible to copy the code of fscanf and replace the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification was changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
    parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
I do not recommend mixing C++ input/output and C input/output. Not that they are really incompatible, but they can easily interoperate incorrectly.
For example, the Oracle docs recommend against mixing them: http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But nothing stops you from reading data into a buffer and parsing it with standard C functions like sscanf.
...
std::string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult = sscanf(curString.c_str(), "%d:%d", &a, &b);
if (2 != sscanfResult)
    throw "error";
...
But this won't help in situations where your stream is just one long contiguous sequence of symbols (like a string turned into a memory stream).
Making your own fscanf from scratch or porting (?) the original CRT function actually isn't the worst possible idea. Just make sure you test it thoroughly (low-level custom char manipulation has always been a source of pain in C).
I've never really tried Boost.Spirit, and such a parsing infrastructure could be overkill for your project. But Boost libraries are usually well tested and designed, so you could at least try it.
Based on @tmyklebu's comment, I implemented streamScanf, which wraps an istream as a FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

What to use to store Unicode (UTF-16) strings? (C++11)

I ask this question in light of the innovations that C++11 brings, namely char16_t/u16string.
I am writing an application that should have multilingual support. According to my plan, the localization strings will be stored in XML as UTF-16 and retrieved with pugixml. The strings will be used both for the GUI and for generating an HTML report of the computation results. Since I have understood wchar_t/wstring to be deprecated in favour of the new u16string, I've planned to use u16string for storing language strings inside the program.
But since both pugixml and MFC's CString use wchar_t as the underlying storage type for Unicode, should I perhaps forget about u16string for now and simply use wstring instead?
Language-portability is crucial, platform portability doesn't matter.
I use MVS 2013 with Intel compiler.
The encoding used for storing the data outside the program is the only one that matters.
That data is likely to be used from other software. Someone will want to write those strings, and they'll probably use some kind of specialised editor or (gasp) a general-purpose text editor. UTF-8 has much better support from other software than UTF-16, and that is why I would recommend it.
Inside the program, what encoding you use doesn't matter, as long as you do it consistently and don't mix them up in stupid ways.
Obviously, if you use the same encoding inside the program as you do outside of it, you don't need to perform any conversions and the risk of mixing them up and producing mojibake is not there.
The thing with pugixml using wchar_t is that the encoding it uses then depends on the size of wchar_t. If the size is 2, it uses UTF-16; if the size is 4, it uses UTF-32. pugixml can also use UTF-8 with char: that is the interface you get when the PUGIXML_WCHAR_MODE macro is not defined, so you can use that instead.
If you use wchar_t API, stick to wstring. Remember: since we're inside the program, it doesn't matter if it's going to be UTF-16 or UTF-32, as long as we're consistent. If you use the char API, stick to string. You could, I guess, perform conversions from wchar_t to char16_t and use u16strings, but that wouldn't give much benefit.
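If the conversion from wchar_t to char16_t is ever needed, it could look roughly like the sketch below. The function name to_u16 is hypothetical, and the sketch assumes the wstring holds valid UTF-16 when wchar_t is 2 bytes wide and valid UTF-32 code points otherwise.

```cpp
#include <string>

// Hypothetical helper: convert a wstring to a UTF-16 u16string.
// With a 2-byte wchar_t the input is already UTF-16 and is copied unit by unit;
// with a 4-byte wchar_t each code point above U+FFFF becomes a surrogate pair.
std::u16string to_u16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

The extra copy on every API boundary is the cost alluded to above, which is why staying with wstring is usually simpler when the libraries in use already take wchar_t.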
The saving and loading functions in pugixml take an xml_encoding parameter that lets you pick what encoding will be on the data outside the program, and that doesn't have to match what you use internally. Pick whichever you find the most convenient.

String-Conversion: MBCS <-> UNICODE with multiple \0 within

I am trying to convert a std::string buffer, containing data from a bitmap file, to std::wstring.
I am using MultiByteToWideChar, but that does not work, because the function stops when it encounters the first '\0' character. It seems to interpret it as the end of the string.
When I pass the real length of the data in the std::string buffer as the length parameter instead of -1, it messes up the Unicode string with characters that definitely did not appear at that position in the original string...
Do I have to write my own conversion function?
Or should I keep the data as a plain char array, because the special symbols will be converted incorrectly?
With regards
There are many, many things that will fail with this approach. Among other things, extra bytes may be added to your data without your realizing it.
It's odd that your only option takes a std::wstring(). If this is a home-grown library, you should take the trouble to write a new function. If it's not, make sure there's nothing more suitable before writing your own.
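If the binary data really must travel through a std::wstring, one option is to change only the element type and skip code-page conversion entirely. The sketch below assumes that is acceptable to the receiving API; the names widen_bytes and narrow_bytes are hypothetical. Embedded '\0' bytes survive because the lengths are always explicit.

```cpp
#include <string>

// Hypothetical helper: widen each byte to one wchar_t with no code-page
// translation. This is NOT a text conversion; it only changes the element type.
std::wstring widen_bytes(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical inverse: keep the low 8 bits of every element.
std::string narrow_bytes(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        out.push_back(static_cast<char>(static_cast<unsigned char>(wc & 0xFF)));
    }
    return out;
}
```

The result is not meaningful text, so it should never be displayed or passed to functions that interpret it as Unicode; keeping the bitmap in a char or byte container avoids the round trip altogether.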

(Encoded) String handling in C++ - questions / best practices?

What are the best practices for handling strings in C++? I'm wondering especially how to handle the following cases:
File input/output of text and XML files, which may be written in different encodings. What is the recommended way of handling this, and how to retrieve the values? I guess, a XML node may contain UTF-16 text, and then I have to work with it somehow.
How to handle char* strings. After all, this can be unsigned or not, and I wonder how I determine what encoding they use (ANSI?), and how to convert to UTF-8? Is there any recommended reading on this, where the basic guarantees of C/C++ about strings are documented?
String algorithms for UTF-8 etc. strings -- computing the length, parsing, etc. How is this done best?
What character type is really portable? I've learned that wchar_t can be anything from 8-32 bit wide, making it no good choice if I want to be consistent across platforms (especially when moving data between different platforms - this seems to be a problem, as described for example in EASTL, look at item #13)
At the moment, I'm using std::string everywhere, with a small helper utility to convert to UTF-16 when calling Unicode APIs, but I'm pretty sure that this is not really the best way. Using something like Qt's QString or the ICU String class seems to be right, but I wonder whether there is a more lightweight approach (i.e. if my char strings are ANSI encoded, and the subset of ANSI that is used is equal to UTF-8, then I can easily treat the data as UTF-8 and provide converters from/to UTF-8, and I'm done, as I can store it in std::string, unless there are problems with this approach).
For a shorter answer, I would just recommend using UTF-16 for simplicity; Java/C#/Python 3.0 switched to that model exactly for simplicity.
I've always expected wchar_t to be 16 or 32bit wide, and many platforms support that; indeed, APIs like wcrtomb() do not allow an implementation to support a shift state for wchar_t*, but since UTF-8 needs none, it may be used, while other encodings are ruled out.
Then, I answer the question about XML.
File input/output of text and XML files, which may be written in different encodings. What is the recommended way of handling this, and how to retrieve the values? I guess, a XML node may contain UTF-16 text, and then I have to work with it somehow.
I'm not sure, but I don't think so.
Mixing two encodings in the same file is asking for trouble and data corruption.
Encoding a file in UTF-16 is usually a bad choice since most programs rely on using ASCII everywhere.
The issue is: an XML file might use any single encoding, maybe even UTF-16, but then the initial encoding declaration also has to use UTF-16, and so do the tags. The problem I see with UTF-16 is: how should one reliably parse the initial declaration? The answer comes in the specification, § 4.3.3:
In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
When reading that, note that an XML file is also an entity, called the document entity; in general, an entity is a storage unit for the document. From the whole specification, I'd say that only one encoding declaration is allowed for each entity, and I'd convert all entities to UTF-16 when reading them for easier handling.
Webography:
http://www.w3.org/TR/REC-xml/, XML spec.
http://www.xml.com/axml/testaxml.htm, Annotated XML spec.
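The Byte Order Mark check implied by the quoted rule can be sketched as below. The names BomEncoding and detect_bom are hypothetical. The sketch only distinguishes UTF-8 and UTF-16 marks and deliberately ignores UTF-32, whose little-endian mark starts with the same two bytes as UTF-16LE.

```cpp
#include <cstddef>
#include <string>

enum class BomEncoding { None, Utf8, Utf16BigEndian, Utf16LittleEndian };

// Hypothetical helper: inspect the first bytes of an entity for a Byte Order Mark.
// Assumption: UTF-32 marks are not handled in this sketch.
BomEncoding detect_bom(const std::string& bytes) {
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return BomEncoding::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return BomEncoding::Utf16BigEndian;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return BomEncoding::Utf16LittleEndian;
    return BomEncoding::None;
}
```

When the result is None and there is no encoding declaration either, the quoted rule says the entity must be UTF-8. A real XML parser performs this detection itself, so code like this is only needed when handling the raw bytes by hand.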
String algorithms for UTF-8 etc. strings -- computing the length, parsing, etc. How is this done best?
mbrlen gives you the length in bytes of the next multibyte character in a C string, so you can call it in a loop to count characters. I don't think std::string can be used for multibyte strings; you should use wstring for wide ones.
In general, you should probably stick with UTF-16 inside your program and use UTF-8 only on I/O (I don't know the other options well, but they are surely more complex and error-prone).
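For UTF-8 specifically, the character count can also be computed without locale-dependent functions, because every code point has exactly one byte that is not a continuation byte. A minimal sketch, assuming the input is already valid UTF-8 (the name utf8_length is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 byte sequence by skipping
// continuation bytes (bit pattern 10xxxxxx). No validation is performed.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

This counts code points, not user-perceived characters: a letter followed by a combining accent still counts as two.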
How to handle char* strings. After all, this can be unsigned or not, and I wonder how I determine what encoding they use (ANSI?), and how to convert to UTF-8? Is there any recommended reading on this, where the basic guarantees of C/C++ about strings are documented?
Basically, you can use any encoding, and you will typically end up using the native encoding of the system you are running on, as long as it's an 8-bit encoding. C was born for ASCII, and locale handling was an afterthought. For years, each system understood mostly one native encoding, say ISO-8859-x, and files from another encoding could even be non-representable.
Since for UTF-8 strings one byte is not always one character, I guess that the safest bet is to use multibyte strings for them. The C manuals I used described multibyte strings in the abstract, without details on those issues (in particular, on the encoding used). For C, see functions like mbrlen and mbrtowc. On my Linux system, it is noted that their behaviour depends on LC_CTYPE, and this probably means that they use the locale's native multibyte encoding. From the documentation it can be inferred that their API also supports encodings where you can shift from one-byte to two-byte mode and back.
How to handle char* strings. After all, this can be unsigned or not,
If you rely on signedness of char, you're doing it wrong. Signedness of chars only matters if you use char as a numeric type, and then you should always use either unsigned or signed chars; in fact, you should pretend that plain char is neither unsigned nor signed, and that an expression like a > 0 (if a is a char) has undefined semantics. But what would it be useful for, anyway?
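In practice this means casting to unsigned char at the point where a char is used as a number, as in the sketch below. The helper names is_ascii and to_upper_ascii are hypothetical; the cast before std::toupper matters because passing a negative value other than EOF to the <cctype> functions is undefined behaviour.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if every byte is 7-bit ASCII, regardless of
// whether plain char is signed or unsigned on this platform.
bool is_ascii(const std::string& s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) >= 0x80) return false;
    }
    return true;
}

// Hypothetical helper: uppercase via <cctype>, converting each char to
// unsigned char first so the argument is always in the valid range.
std::string to_upper_ascii(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

Both helpers give the same answer whether the compiler treats plain char as signed or unsigned, which is the property the paragraph above asks for.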