Break an input string into a list using Arduino - list

I am working on a very basic REPL for Arduino. To get parameters, I need to split a String into parts, separating using spaces. I do not know how I would store the result. For example, pinmode 1 input would result in a list: "pinmode", 1, "input". The 1 would have to be an int. I have looked at other Stack Overflow answers, but they require a char input.

Don't use String. That's the reason all the other answers use char, commonly called C strings (lower case "s"). String uses "dynamic memory" (i.e., the heap), which is bad on this small microcontroller. It also add 1.6k to your program size.
The simplest thing to do is save each received character into a char array, until you get the newline character, '\n'. Be sure to add the NUL character at the end, and be sure your array is sized appropriately.
Then process the array using the C string library: strcmp, strtoul, strtok, isdigit, etc. Learning about these routines will really pay off, as it keeps your program small and fast. C strings are easy to print out, as well.
Again, stay away from String. It is tempting to beginners, because it is easy to understand. However, it has many subtle, complicated and unpredictable ways to make your embedded program fail. This is not a PC with lots of RAM and a swap file on a hard drive.

Related

What is the advantage of using gets(a) instead of cin.getline(a,20)?

We will have to define an array for storing the string either way.
char[10];
And so suppose I want to store smcck in this array. What is the advantage of using gets(a)? My teacher said that the extra space in the array is wasted when we use cin.getline(a, 20), but that applies for gets(a) too right?
Also just an extra question, what exactly is stored in the empty "boxes"of an array?
gets() is a C function,it does not do bounds checking and is considered dangerous, it has been kept all this years for compatibility and nothing else.
You can check the following link to clear your doubt :
http://www.gidnetwork.com/b-56.html
Don't mix C features with C++, though all the feature of C works in C++ but it is not recommended . If you are working on C++ then you should probably avoid using gets(). use getline() instead.
Well, I don't think gets(a) is bettet because it does not check for the size of the string. If you try to read a long string using it, it may cause an buffer overflow. That means it will use all the 10 spaces you allocated for it and then it will try to use space allocated for another variables or another programs (what is going to make you publication crash).
The cin.getline() receives an int as a parameter with tells it to not read more than the expected number of characters. If you allocate a vector with only 10 positions and read 20 characters it will cause the same problem I told you about gets().
About the strings representation in memory, if you put "smcck" on an array
char v[10];
The word will take the first 5 positions (0 to 4), the position 5 will be taken by a null character (represented by '\0') that will mark the end of the string. Usually, what comes next in the array does not matter and are kept the way it were in the past.the null terminated character is used to mark where the string ends, so you can work it safely.

What are the problems of a zero-terminated string that length-prefixed strings overcome?

What are the problems of a zero-terminated string that length-prefixed strings overcome?
I was reading the book Write Great Code vol. 1 and I had that question in mind.
One problem is that with zero-terminated strings you have to keep finding the end of the string repeatedly. The classic example where this is inefficient is concatenating into a buffer:
char buf[1024] = "first";
strcat(buf, "second");
strcat(buf, "third");
strcat(buf, "fourth");
On every call to strcat the program has to start from the beginning of the string and find the terminator to know where to start appending. This means the function spends more and more time finding the place to append as the string grows longer.
With a length-prefixed string the equivalent of the strcat function would know where the end is immediately, and would just update the length after appending to it.
There are pros and cons to each way of representing strings and whether they cause problems for you depend on what you are doing with strings, and which operations need to be efficient. The problem described above can be overcome by manually keeping track of the end of the string as it grows, so by changing the code you can avoid the performance cost.
One problem is that you can not store null characters (value zero) in a zero terminated string. This makes it impossible to store some character encodings as well as encrypted data.
Length-prefixed strings do not suffer that limitation.
First a clarification: C++ strings (i.e. std::string) aren't weren't required to end with zero until C++11. They always provided access to a zero-terminated C string though.
C-style strings end with a 0 character for historical reasons.
The problems you're referring to are mainly bound to security issues: zero ended strings need to have a zero terminator. If they lack it (for whatever reason), the string's length becomes unreliable and they can lead to buffer overrun problems (which a malicious attacker can exploit by writing arbitrary data in places where it shouldn't be.. DEP helps in mitigating these issues but it's off-topic here).
It is best summarized in The Most Expensive One-byte Mistake by Poul-Henning Kamp.
Performance Costs: It is cheaper to manipulate memory in chunks, which cannot be done if you're always having to look for the NULL character. In other words if you know before hand you have a 129 character string, it would likely be more efficient to manipulate it in sections of 64, 64, and 1 bytes, instead of character by character.
Security: Marco A. already hit this pretty hard. Over and under-running string buffers is still a major route for attacks by hackers.
Compiler Development Costs: Big costs are associated with optimizing compilers for null terminating strings that would have been easier with the address and length format.
Hardware Development Costs: Hardware development costs are also large for string specific instructions associated with null terminating strings.
A few more bonus features that can be implemented with length-prefixed strings:
It's possible to have multiple styles of length prefix, identifiable through one or more bits of the first byte identified by the string pointer/reference. In exchange for a little extra time determining string length, one could e.g. use a single-byte prefix for short strings and longer prefixes for longer strings. If one uses a lot of 1-3 byte strings that could save more than 50% on overall memory consumption for such strings compared with using a fixed four-byte prefix; such a format could also accommodate strings whose length exceeded the range of 32-bit integers.
One may store variable-length strings within bounds-checked buffers at a cost of only one or two bits in the length prefix. The number N combined with the other bits would indicate one of three things:
An N-byte string
(Optional) An N-byte buffer holding a zero-length string
An N-byte buffer which, if its last byte B is less than 248, holds a string of length N-B-1; if the 248 or more, the preceding B-247 bytes would store the difference between the buffer size and the string length. Note that if the length of the string is precisely N-1, the string will be followed by a NUL byte, and if it's less than that the byte following the string will be unused and could be set to NUL.
Using such an approach, one would need to initialize strong buffers before use (to indicate their length), but would then no longer need to pass the length of a string buffer to a routine that was going to store data there.
One may use certain prefix values to indicate various special things. For example, one may have a prefix that indicates that it is not followed by a string, but rather by a string-data pointer and two integers giving buffer size and current length. If methods that operate on strings call a method to get the data pointer, buffer size, and length, one may pass such a method a reference to a portion of a string cheaply provided that the string itself will outlive the method call.
One may extend the above feature with a bit to indicate that the string data is in a region that was generated by malloc and may be resized if needed; additionally, one could safely have methods that sometimes return a dynamically-generated string allocated on the heap, and sometimes return an immutable static string, and have the recipient perform a "free this string if it isn't static".
I don't know if any prefixed-string implementations implement all those bonus features, but they can all be accommodated for very little cost in storage space, relatively little cost in code, and less cost in time than would be required to use NUL-terminated strings whose length was neither known nor short.
What are the problems of a zero-terminated string that length-prefixed strings overcome?
None whatsoever.
It's just eye candy.
Length-prefixed strings have, as part of their structure, information on how long the string is. If you want to do the same with zero-terminated strings you can use a helper variable;
lpstring = "foobar"; // saves '6' somewhere "inside" lpstring
ztstring = "foobar";
ztlength = 6; // saves '6' in a helper variable
Lots of C library functions work with zero-terminated strings and cannot use anything past the '\0' byte. That's an issue with the functions themselves, not the string structure. If you need functions which deal with zero-terminated strings with embedded zeroes, write your own.

C string one character shorter than defined length?

Very new to c++ and I have the following code:
char input[3];
cout << "Enter input: ";
cin.getline(input,sizeof(input));
cout << input;
And entering something like abc will only output ab, cutting it one character short. The defined length is 3 characters so why is it only capturing 2 characters?
Remember that c-strings are null terminated. To store 3 characters you need to allocate space for 4 because of the null terminator.
Also as the #MikeSeymour mentioned in the comments in c++ its best to avoid the issue completely and use std::string.
You can thank your favorite deity that this fail-safe is in, most functions aren't that kind.
In C, strings are null-terminated, which means they take an extra character than the actual data to mark where the string actually ends.
Since you're using C++ anyway, you should avoid bare-bones char arrays. Some reasons:
buffer overflows. You managed to hit this issue on your first try, take a hint!
Unicode awareness. We're living in 2015. Still using 256 characters is unacceptable by any standard.
memory safety. It's way harder to leak a proper string than a plain old array. strings have strong copy semantics that cover pretty much anything you can think of.
ease of use. You have the entire STL algorithm list at your disposal! Use it rather than rolling your own.

String-Conversion: MBCS <-> UNICODE with multiple \0 within

I am trying to convert a std::string Buffer - containing data from a bitmap file - to std::wstring.
I am using MultiByteToWideChar, but that does not work, because the function stops after it encounters the first '\0'-character. Seems like it interprets it as the end of the string.
When i dont pass -1 as the length-parameter, but the real length of the data in the std::string-Buffer, it messes the Unicode-String up with characters that definetly not appeared at that position in the original string...
Do I have to write my own conversion function?
Or maybe shall i keep the data as a casual char-array, because the special-symbols will be converted incorrectly?
With regards
There are many, many things that will fail with this approach. Among other things, extra bytes may be added to your data without your realizing it.
It's odd that your only option takes a std::wstring(). If this is a home-grown library, you should take the trouble to write a new function. If it's not, make sure there's nothing more suitable before writing your own.

Including huge string in our c++ programs?

I am trying to include huge string in my c++ programs, Its size is 20598617 characters , I am using #define to achieve it. I have a header file which contains this statement
#define "<huge string containing 20598617 characterd>"
When I try to compile the program I get error as fatal error C1060: compiler is out of heap space
I tried following command line options with no success
/Zm200
/Zm1000
/Zm2000
How can I make successful compilation of this program?
Platform: Windows 7
You can't, not reliably. Even if it will compile, it's liable to break the runtime library, or the OS assumptions, and so forth.
If you tell us why you're trying to do it, we can offer lots of alternatives. Deciding how to handle arbitrarily large data is a major part of programming.
Edited to add:
Rather than guess, I looked into MSDN:
Prior to adjacent strings being
concatenated, a string cannot be
longer than 16380 single-byte
characters.
A Unicode string of about one half
this length would also generate this
error.
The page concludes:
You may want to store exceptionally
large string literals (32K or more) in
a custom resource or an external file.
What do other compilers say?
Further edited to add:
I created a file like this:
char s[] = {'x','x','x','x'};
I kept doubling the occurrences of 'x', testing each one as an #include file.
An 8388608 byte string succeeded; 16777216 bytes failed, with the "out of heap space" error.
I suspect you are running into a design limit on the size of a character string.
Most people really think that a million characters is long enough :-}
To avoid such design limits, I'd try not to put the whole thing into a single literal string. On the suspicion that #define macro bodies likewise have similar limits, I't try not to put the entire thing in a single #define, either.
Most C compilers will accept pretty big lists of individual characters as initializers. If you write
char c[]={ c1, c2, ... c20598617 };
with the c_i being your individual characters, you may succeed. I've seen GCC2 applications where there were 2 million elements like this (apparantly they were loading some type of ROM image). You might even be able to group the c_i into blocks of K characters for K=100, 1000, 10000 as suits your tastes, and that might actually help the compiler.
You might also consider running your string through a compression algorithm,
putting the compressed result into your C++ file by any of the above methods,
and decompressing after the program was loaded.
I suspect you can get a decompression algorithm into a few thousand bytes.
Store the string to a file and just open and read it...
Its much cleaner/organized that way [i'm assuming that right now you have a file named blargh.h which contains that one #Define...]
Um, store the string in a separate resource of some sort and load it in? Seriously, in embedded land, you would have this as a separate resource and not hold it in RAM. On windows, I believe you can use .dlls or other external resources to handle this for you. Compilers aren't designed to hold this size of resources for you and they will fail.
Increase the compiler heap space.
If your string comes from a large text or binary file, you may have luck with either the xxd -i command (to get everything in an array, per Ira Baxter's answer) or a variant of the bin2obj command (to get everything into a .o file you can link into the program).
Note that the string may not be null terminated in this case.
See answers to the earlier question, "How can I get the contents of a file at build time into my C++ string?"
(Also, as an aside: note the existence of the .xbm format.)
This is a very old question, but since there's no definitive answer yet: C++11's raw string literals seem to do the job.
This compiles nicely on GCC 4.8:
#include <string>
std::string data = R"(
... <1.4 MB of base85-encoded string> ...
)";
As said in other posts in this thread, this is definitely not the preferred way of handling large amounts of data.