Parsing a string into multiple strings - c++

I have to split a string into several ones. In fact, what I need is to parse some input from a file that is in the following format: (i9, i9, i2), for example.
i9 means a decimal number of width 9, such as:
(5 blank spaces)4567
So I need to retrieve these numbers properly. The width is always fixed, so every number must obey it.
A correct instance of that format would be
(5 spaces)4567(6 spaces)453(1 space)2
or
(5 spaces)4567(6 spaces)45322 (where 22 is the argument for i2 in this case)
The white spaces before the numbers are giving me a headache, so I thought I could split every argument into a character array and then convert it to an integer, since the %d specifier ignores all blank space and I don't know how to use the width while also ignoring the spaces. (If this can be done, i.e. parsing everything directly to integers, please say so!)
If not, I would need help parsing every string into substrings. So far I've done this, but it's not working:
while (fgets(buffer, 600, f)) {
sscanf(buffer, "%9[^\n]s %9[^\n]s %2[^\n]s", arg1, arg2, arg3);
....
}
Please, any help would be greatly appreciated!!!

This answer is C. That is why I used the variable name new.
Use strncpy() and terminate the result properly.
#include <string.h>
char part1[10], part2[10], part3[3]; /* 9, 9 and 2 characters, each plus a terminator */
const char *new = "     4567      45322\n"; /* the line read */
strncpy(part1, new, 9); part1[9] = 0;
strncpy(part2, new+9, 9); part2[9] = 0;
strncpy(part3, new+18, 2); part3[2] = 0;
I suggest you do not try to write multi-language source files.

In C++, use substr(), along with the usual string to integer conversions:
#include <string>
std::string s = "     1234      78922";
std::string s1 = s.substr(0, 9);
std::string s2 = s.substr(9, 9);
std::string s3 = s.substr(18); // or substr(18, 2)
int n1 = std::stoi(s1), n2 = std::stoi(s2), n3 = std::stoi(s3);
Apply the usual length checks where appropriate to validate that the input is indeed in the correct format.
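Building on the substr()/stoi() approach above, here is a minimal sketch of a helper for a whole (i9, i9, i2)-style record. The name parse_fixed_width and its widths parameter are my own, not part of any library. It assumes no field is entirely blank (std::stoi throws std::invalid_argument for an all-space field) and only checks the line length with assert().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parses one line of right-aligned integer fields of the given widths,
// e.g. {9, 9, 2} for an (i9, i9, i2) record. std::stoi skips the leading
// blanks of each field, so "     4567" becomes 4567.
std::vector<int> parse_fixed_width(const std::string& line,
                                   const std::vector<std::size_t>& widths) {
    std::vector<int> values;
    std::size_t pos = 0;
    for (std::size_t w : widths) {
        assert(pos + w <= line.size());  // the line must contain every field
        values.push_back(std::stoi(line.substr(pos, w)));
        pos += w;
    }
    return values;
}
```

Because each field is cut out before conversion, "45322" at the end of the second example line is split into 453 and 22 by position, which is exactly what %d with a width cannot do once it has skipped the blanks.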

Related

How to break up a CString that does not have any delimiters within it?

I have a CString that I want to break up into small strings. It is a string consisting of a constant 2 byte header and 2 byte footer, but the rest of the string has no discernible pattern. I need to break them up based on sizes: so the first two bytes become the header, then I need to extract the next 2, then 3 and so on. (these numbers are in no pattern either)
Example:
CString test = "1010eefabbccde1f1f"
I need
CString header = "1010";
CString test1 = "eefa";
CString test2 = "bbccde";
CString footer = "1f1f";
I read about sscanf being used for this purpose, but I have only managed to use it to split strings into ints.
Example:
CString test = "101022223333331010";
int a,b,c,d;
sscanf(test,"%02d%02d%03d%02d",&a,&b,&c,&d);
This works for strings containing only numbers. But when I do the same for strings by changing %d to %s, exceptions get raised.
Is there a better way to do this?
My understanding is that, given an input string test and a vector<size_t> sizes of field sizes, you wish to break the string apart into those sizes, then treat those parts as hex numbers and return them in a vector<int> result.
I'm going to presume that you have already tested test to ensure the correct number of characters exist. And I'm going to assume that the sizes include the header and footer sizes.
After running something like this:
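#include <numeric>
#include <string>
#include <vector>
using namespace std;
int main() {
const auto test = "1010eefabbccde1f1f"s;
const vector<size_t> sizes { 4U, 4U, 6U, 4U };
// accumulator taken by value: C++20 accumulate passes it as an rvalue
const auto result = accumulate(cbegin(sizes), cend(sizes), vector<int>(), [&, pos = size_t{0}](auto a, const auto& b) mutable {
a.push_back(stoi(test.substr(pos, b), nullptr, 16));
pos += b;
return a;
});
}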
result will contain:
4112
61178
12307678
7967
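If you actually want the pieces as strings, as in the question, rather than as hex numbers, the same offset bookkeeping works with substr() alone. A minimal sketch using std::string; the helper name split_by_sizes is hypothetical, and with MFC's CString, Mid(pos, n) plays the role of substr(pos, n).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Cuts s into consecutive pieces whose lengths are listed in sizes
// (header and footer sizes included). The pieces stay strings.
std::vector<std::string> split_by_sizes(const std::string& s,
                                        const std::vector<std::size_t>& sizes) {
    std::vector<std::string> parts;
    std::size_t pos = 0;
    for (std::size_t n : sizes) {
        assert(pos + n <= s.size());  // caller guarantees enough characters
        parts.push_back(s.substr(pos, n));
        pos += n;
    }
    return parts;
}
```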

Trouble Understanding sprintf with 'char * str + int'

I was looking at a project and came across the following code. I am unable to figure out what sprintf is doing in this context and was hoping someone might be able to help me figure it out.
char storage[64];
int loc = 0;
int size = 35;
sprintf(storage+(loc),"A"); //Don't know what this does
loc+=1;
sprintf(storage+(loc),"%i", size); //Don't know what this does
loc+=4;
sprintf(storage+(loc), "%i", start); //Don't know what this does
start += size;
loc += 3;
The code later does the following in another part
string value;
int actVal;
int index = 0;
for(int j = index+1; j < index+4; j++)
{
value += storage[j];
}
istringstream iss;
iss.str(value);
iss >> actVal; //Don't understand how this now contains size
The examples I have seen online regarding sprintf never covered that the above code was possible, but the program executes fine. I just can't figure out how the "+loc" affects storage in this instance and how the values would be saved/stored. Any help would be appreciated.
Ugly code! Regardless, for the first part, storage+(loc) == &storage[loc]. You end up with a string "A35\0<unknown_value>1234\0", assuming start = 1234, or in long form:
sprintf(&storage[0],"A");
sprintf(&storage[1],"%i", size);
sprintf(&storage[5], "%i", start);
For the second part, the loop runs for j = 1, 2 and 3, so assuming we have the "A35\0<unknown_value>1234\0" above, we get:
value += '3';
value += '5';
value += '\0';
So now value holds "35" followed by an embedded '\0'; the loop stops before the <unknown_value> byte.
iss.str(value);
iss >> actVal;
This turns the string into an input stream, reads out the leading characters representing an integer, "35" (extraction stops at the '\0'), and converts them into an integer, giving us basically actVal = atoi(value.c_str());.
Finally, note that storage[4] is never written. According to this page, reading an uninitialised ("indeterminate value" is the official term) array element is undefined behaviour, so code like this should be careful never to read that byte.
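To see the layout concretely, here is a sketch that rebuilds the question's buffer. It assumes start = 1234 as in the answer, swaps sprintf for snprintf so each write is bounded, and zero-initialises the buffer so storage[4] is no longer indeterminate; pack and unpack_size are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

using Buffer = std::array<char, 64>;

// storage.data() + loc is the same pointer as &storage[loc], so each call
// writes its text (plus a '\0') starting at index loc.
Buffer pack(int size, int start) {
    assert(size >= 0 && size < 1000);  // up to 3 digits plus '\0' fit in indices 1..4
    Buffer storage{};                  // zero-initialised: no indeterminate bytes
    std::size_t loc = 0;
    std::snprintf(storage.data() + loc, storage.size() - loc, "A");
    loc += 1;
    std::snprintf(storage.data() + loc, storage.size() - loc, "%i", size);
    loc += 4;
    std::snprintf(storage.data() + loc, storage.size() - loc, "%i", start);
    return storage;
}

// Mirrors the question's loop: copies storage[1], storage[2] and storage[3].
int unpack_size(const Buffer& storage) {
    std::string value;
    for (std::size_t j = 1; j < 4; ++j) value += storage[j];
    std::istringstream iss(value);
    int actVal = 0;
    iss >> actVal;  // stops at the first non-digit, here the embedded '\0'
    return actVal;
}
```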
The function sprintf() works just like printf(), except that the result is not printed to stdout; rather, it is stored in a character buffer. I suggest you read the sprintf() man page carefully:
https://linux.die.net/man/3/sprintf
Even if you are not on Linux, the function behaves much the same across platforms, be it Windows, Mac or other animals. That said, the piece of code you have presented seems unnecessarily complicated.
The first part could be written as:
sprintf(storage,"A %i %i", size, start);
This gives a similar-but-not-equal result, but then again, it all depends on what exactly the original programmer intended this storage area to hold. As Ken pointed out, this code as-is leaves some bytes of the buffer undefined.
From the standard:
int sprintf ( char * str, const char * format, ... );
Write formatted data to string
Composes a string with the same text that would be printed if format was used on printf, but instead of being printed, the content is stored as a C string in the buffer pointed by str.
sprintf(storage+(loc),"A");
writes "A" into a buffer called storage. The storage+(loc) is pointer arithmetic. You're specifying which index of the char array you're writing into. So, storage = "A".
sprintf(storage+(loc),"%i", size);
Here you're writing size into storage[1]. Now storage = "A35\0", loc = 1, and so on.
Your final value of storage = "A35\0<garbage><value of start>\0"
actVal: Don't understand how this now contains size
The for loop goes through storage[1] to storage[3] and builds up value from those bytes, so value contains the three characters "35\0". iss.str(value) hands that string to the stream unchanged.
iss >> actVal
If you have come across std::cin, it's the same concept: the integer at the start of the stream ("35") is parsed and written into actVal, and extraction stops at the embedded '\0'.

C++ Convert char array to int representation

What is the best way to convert a char array (containing bytes from a file) into a decimal representation so that it can be converted back later?
E.g "test" -> 18951210 -> "test".
EDITED
It can't be done without a bignum class, since there are more possible letter combinations than integer values in an unsigned long long (an unsigned long long will hold about 7-8 characters).
If you have some sort of bignum class:
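#include <algorithm>
#include <climits>
#include <string>
biguint string_to_biguint(const std::string& s) {
const unsigned base = UCHAR_MAX + 1; // 256 possible byte values
biguint result(0);
for (std::size_t i = 0; i < s.length(); ++i) {
result *= base;
result += (unsigned char)s[i];
}
return result;
}
std::string biguint_to_string(biguint u) { // taken by value because it is modified below
const unsigned base = UCHAR_MAX + 1;
std::string result;
do {
result += (char)(u % base); // assumes biguint's % result converts to an integer type
u /= base;
} while (u > 0);
std::reverse(result.begin(), result.end()); // bytes were produced least significant first
return result;
}
Note: the string to uint conversion will lose leading NULLs.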
I'm not sure what exactly you mean, but characters are stored in memory as their "representation", so you don't need to convert anything. If you still want to, you have to be more specific.
EDIT: You can
try to read byte by byte, shifting the result 8 bits left and ORing it with the next byte;
try to use mpz_inp_raw.
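The byte-shifting suggestion can be sketched like this for strings of up to 8 bytes, the most a 64-bit integer can hold. The helper names are hypothetical. The first byte ends up most significant, and the original length has to be passed back in, because leading NUL bytes leave no trace in the number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs up to 8 bytes into one integer: shift left 8 bits, OR in the next byte.
std::uint64_t pack_bytes(const std::string& s) {
    assert(s.size() <= 8);  // a 64-bit integer holds at most 8 bytes
    std::uint64_t result = 0;
    for (unsigned char c : s) {
        result = (result << 8) | c;
    }
    return result;
}

// Reverses pack_bytes, filling the string from its last byte backwards.
std::string unpack_bytes(std::uint64_t value, std::size_t length) {
    assert(length <= 8);
    std::string result(length, '\0');
    for (std::size_t i = length; i-- > 0;) {
        result[i] = static_cast<char>(value & 0xFF);
        value >>= 8;
    }
    return result;
}
```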
You can use a tree similar to Huffman compression algorithm, and then represent the path in the tree as numbers.
You'll have to keep the dictionary somewhere, but you can just create a constant dictionary that covers the whole ASCII table, since the compression is not the goal here.
There is no conversion needed. You can just use pointers.
Example:
char array[4 * NUMBER];
int *pointer = (int *)array; /* assumes sizeof(int) == 4 */
Keep in mind that the "length" of pointer is NUMBER.
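Reading the bytes through an int * this way also assumes suitable alignment and breaks C++'s strict-aliasing rules. A minimal sketch of the usual safe alternative, copying the bytes with std::memcpy; bytes_to_ints is a hypothetical name, and the resulting values depend on the machine's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies a byte buffer into properly aligned int objects.
std::vector<int> bytes_to_ints(const char* bytes, std::size_t byte_count) {
    assert(byte_count % sizeof(int) == 0);  // only whole ints
    std::vector<int> out(byte_count / sizeof(int));
    std::memcpy(out.data(), bytes, byte_count);
    return out;
}
```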
As mentioned, character strings are already ranges of bytes (and hence easily rendered as decimal numbers) to start with. Number your bytes from 000 to 255 and string them together and you've got a decimal number, for whatever that is worth. It would help if you explained exactly why you would want to be using decimal numbers, specifically, as hex would be easier.
If you care about compression of the underlying arrays forming these numbers for Unicode Strings, you might be interested in:
http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode
If you want some benefits of compression but still want fast random-access reads and writes within a "packed" number, you might find my "NSTATE" library to be interesting:
http://hostilefork.com/nstate/
For instance, if you just wanted a representation that only accommodated the 26 English letters, you could store "test" in:
NstateArray<26> myString (4);
You could read and write the letters without going through a compression or decompression process, in a smaller range of numbers than a conventional string. Works with any radix.
Assuming you want to store the integers (I'm reading them as ASCII codes) in a string: this adds the leading zeros you will need to get back the original string. A character is a byte with a max value of 255, so it needs three digits in numeric form. It can be done fairly easily without the STL too, but why not use the tools you have?
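#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
char array[] = "test";
int main()
{
stringstream out;
string s = array;
out.fill('0'); // the fill character is sticky
for (size_t i = 0; i < s.size(); ++i)
{
// the width resets after every output, so set it for each number;
// going through unsigned char keeps bytes above 127 in the 128-255 range
out << setw(3) << (int)(unsigned char)s[i];
}
cout << s << " -> " << out.str();
return 0;
}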
output:
test -> 116101115116
Added:
change line to
out << (int)s[i] << ",";
output
test -> 116,101,115,116,
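Converting back is the other half of the question. Because every byte is padded to exactly three digits, the number string can be cut into groups of three without separators. A sketch of both directions; encode_decimal and decode_decimal are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Each byte becomes exactly three decimal digits (000-255).
std::string encode_decimal(const std::string& s) {
    std::ostringstream out;
    out.fill('0');
    for (unsigned char c : s) {
        out << std::setw(3) << static_cast<int>(c);  // setw applies to one output only
    }
    return out.str();
}

// Reads the digits back three at a time.
std::string decode_decimal(const std::string& digits) {
    assert(digits.size() % 3 == 0);  // input must come from encode_decimal
    std::string result;
    for (std::size_t i = 0; i < digits.size(); i += 3) {
        result += static_cast<char>(std::stoi(digits.substr(i, 3)));
    }
    return result;
}
```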

How to read in only a particular number of characters

I have a small query regarding reading a set of characters from a structure. For example: A particular variable contains a value "3242C976*32" (char - type). How can I get only the first 8 bits of this variable. Kindly help.
Thanks.
Edit:
I'm trying to read in a signal:
For Ex: $ASWEER,2,X:3242C976*32
into this structure:
struct pg
{
char command[7]; // saves as $ASWEER,2,X:3242C976*32
char comma1[1]; // saves as ,2,X:3242C976*32
char groupID[1]; // saves as 2,X:3242C976*32
char comma2[1]; // etc
char handle[2]; // this is the problem: I need it to save each part specifically, but it's not
char canID[8];
char checksum[3];
}m_pg;
...
When memcpy-ing the buffer into the structure it works, but because there are no carriage returns it saves the rest of the signal in each char variable. So there is always garbage at the end.
You could
convert your hex value in canID to float (depending on how you want to display it), e.g.
float value1 = HexToFloat(m_pg.canID); // find a conversion script for HexToFloat
CString val;
val.Format("%.3f", value1);
The garbage values aren't actually being stored in the structure; they only show up when it is displayed, because the fields have no terminating '\0'. So format the message however you want and display it using CString val.
If "3242C976*3F" is a c-string or std::string, you can just do:
const char* str = "3242C976*3F";
char first_byte = str[0];
Or with an arbitrary memory block you can do:
SomeStruct memoryBlock;
char firstByte;
memcpy(&firstByte, &memoryBlock, 1);
Both copy the first 8 bits (1 byte) from the string or arbitrary memory block just as well.
After the edit (original answer below)
Just copy by parts. In C, something like this should work (could also work in C++ but may not be idiomatic)
strncpy(m_pg.command, value, 7); // m_pg.command[7] = 0; // oops
strncpy(m_pg.comma1, value+7, 1); // m_pg.comma1[1] = 0; // oops
strncpy(m_pg.groupID, value+8, 1); // m_pg.groupID[1] = 0; // oops
strncpy(m_pg.comma2, value+9, 1); // m_pg.comma2[1] = 0; // oops
// etc
Also, you don't have space for the string terminator in the members of the structure (therefore the oopses above). They are NOT strings. Do not printf them!
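In C++, one way to avoid the terminator problem altogether is to copy each field into a std::string, which stores its own length. A sketch, assuming the fixed widths implied by the question's struct (7, 1, 1, 1, 2, 8 and 3 characters) and the example signal; Pg and parse_pg are hypothetical names.

```cpp
#include <cassert>
#include <string>

// std::string counterpart of struct pg; the two comma fields are skipped.
struct Pg {
    std::string command, groupID, handle, canID, checksum;
};

Pg parse_pg(const std::string& line) {
    assert(line.size() >= 23);  // 7 + 1 + 1 + 1 + 2 + 8 + 3 characters
    Pg pg;
    pg.command  = line.substr(0, 7);   // "$ASWEER"
    pg.groupID  = line.substr(8, 1);   // after the comma at index 7
    pg.handle   = line.substr(10, 2);  // after the comma at index 9
    pg.canID    = line.substr(12, 8);
    pg.checksum = line.substr(20, 3);
    return pg;
}
```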
Don't read more than 8 characters. In C, something like
char value[9]; /* 8 characters and a 0 terminator */
int ch;
scanf("%8s", value);
/* optionally ignore further input */
while (((ch = getchar()) != '\n') && (ch != EOF)) /* void */;
/* input terminated with ch (either '\n' or EOF) */
I believe the above code also "works" in C++, but it may not be idiomatic in that language.
If you have a char pointer, you can just set str[8] = '\0'; Be careful though, because if the buffer is less than 8 (EDIT: 9) bytes, this could cause problems.
(I'm just assuming that the name of the variable that already is holding the string is called str. Substitute the name of your variable.)
It looks to me like you want to split at the comma, and save up to there. This can be done with strtok(), to split the string into tokens based on the comma, or strchr() to find the comma, and strcpy() to copy the string up to the comma.
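The strchr() route can be sketched as a small helper that copies everything before the delimiter into a fixed-size buffer and always terminates it. copy_field is a hypothetical name; it truncates fields that do not fit and returns the position just after the delimiter (or nullptr after the last field), so calls can be chained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the text of src up to (not including) delim into dest, which has
// room for capacity bytes including the terminating '\0'.
const char* copy_field(const char* src, char delim, char* dest, std::size_t capacity) {
    assert(src != nullptr && dest != nullptr && capacity > 0);
    const char* end = std::strchr(src, delim);
    std::size_t len = end ? static_cast<std::size_t>(end - src) : std::strlen(src);
    if (len >= capacity) len = capacity - 1;  // truncate, leaving room for '\0'
    std::memcpy(dest, src, len);
    dest[len] = '\0';
    return end ? end + 1 : nullptr;
}
```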

How to capture length of sscanf'd string?

I'm parsing a string that follows a predictable pattern:
1 character
an integer (one or more digits)
1 colon
a string, whose length came from #2
For example:
s5:stuff
I can see easily how to parse this with PCRE or the like, but I'd rather stick to plain string ops for the sake of speed.
I know I'll need to do it in 2 steps because I can't allocate the destination string until I know its length. My problem is gracefully getting the offset for the start of said string. Some code:
unsigned start = 0;
char type = serialized[start++]; // get the type tag
int len = 0;
char* dest = NULL;
char format[20];
//...
switch (type) {
//...
case 's':
// Figure out the length of the target string...
sscanf(serialized + start, "%d", &len);
// <code type='graceful'>
// increment start by the STRING LENGTH of whatever %d was
// </code>
// Don't forget to skip over the colon...
++start;
// Build a format string which accounts for length...
sprintf(format, "%%%ds", len);
// Finally, grab the target string...
sscanf(serialized + start, format, string);
break;
//...
}
That code is roughly taken from what I have (which isn't complete because of the issue at hand), but it should get the point across. Maybe I'm taking the wrong approach entirely. What's the most graceful way to do this? The solution can be either C or C++ (and I'd actually like to see the competing methods if there are enough responses).
You can use the %n conversion specifier, which doesn't consume any input - instead, it expects an int * parameter, and writes the number of characters consumed from the input into it:
int consumed;
sscanf(serialized + start, "%d%n", &len, &consumed);
start += consumed;
(But don't forget to check that sscanf() returned > 0!)
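Putting %n together with the rest of the format, here is a sketch that parses the whole "s5:stuff" record in one sscanf() call and then copies exactly len bytes, so a payload containing spaces survives (unlike %s). Field and parse_field are hypothetical names, and std::optional needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <optional>
#include <string>

struct Field {
    char type;
    std::string value;
};

// Parses "<type><len>:<payload>"; returns std::nullopt on malformed input.
std::optional<Field> parse_field(const char* serialized) {
    assert(serialized != nullptr);
    char type = 0;
    int len = 0;
    int consumed = 0;
    // %c reads the tag, %d the length, ':' must match literally, and %n stores
    // the number of characters consumed (it does not count toward the return value).
    if (std::sscanf(serialized, "%c%d:%n", &type, &len, &consumed) != 2 || consumed == 0 || len < 0) {
        return std::nullopt;  // missing length, missing ':' or negative length
    }
    const char* payload = serialized + consumed;
    if (std::strlen(payload) < static_cast<std::size_t>(len)) {
        return std::nullopt;  // fewer than len characters left
    }
    return Field{type, std::string(payload, static_cast<std::size_t>(len))};
}
```

If the ':' does not match, sscanf() still returns 2 but never reaches %n, which is why consumed is checked as well.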
Use the %n format specifier to write the number of characters read so far to an integer argument.
Here's a C++ solution, it could be better, and is hard-coded specifically to deal with your example input, but shouldn't require much modification to get working.
std::stringstream ss;
char type;
unsigned length;
char dummy;
std::string value;
ss << "s5:Helloxxxxxxxxxxx";
ss >> type;
ss >> length;
ss >> dummy;
ss.width(length);
ss >> value;
std::cout << value << std::endl;
Disclaimer:
I'm a noob at C++.
You can probably just use atoi which will ignore the colon.
e.g. len = atoi(serialized + start);
The only thing with atoi is that if it returns zero it could mean either the conversion failed, or that the length was truly zero. So it's not always the most appropriate function.
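std::strtol avoids both problems: its end pointer tells you exactly where the digits stopped, so you get the offset of the colon for free, and it lets you tell a failed conversion from a genuine length of 0. A sketch; read_length is a hypothetical name, and it also rejects negative lengths.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Parses the "<digits>:" prefix of text. On success stores the length in len,
// points after_colon just past the ':' and returns true.
bool read_length(const char* text, long& len, const char*& after_colon) {
    assert(text != nullptr);
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text) return false;                   // no digits at all
    if (errno == ERANGE || value < 0) return false;  // out of range or negative
    if (*end != ':') return false;                   // the colon must follow the digits
    len = value;
    after_colon = end + 1;
    return true;
}
```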
If you replace your colon with a space, scanf will stop on it, so you can read the size, malloc that size, then run another scanf to get the rest of the string:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, const char * argv[]) {
char foo[20];
char *test;
scanf("%19s", foo); //"hello world"
printf("foo = %s\n", foo);//prints hello
//get size
test = malloc(sizeof(char) * 10);//replace 10 with your string size (+1 for the '\0')
scanf("%9s", test);
printf("test = %s\n", test);//prints world
free(test);
return 0;
}
Seems like the format is overspecified... (using a variable length field to specify the length of a variable length field).
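If you're using glibc (or another POSIX.1-2008 C library), I'd suggest the m modifier, which makes sscanf allocate the string for you (older GNU code spells it %as, which clashes with C99's %a):
if (sscanf(serialized,"%c%d:%ms",&type,&len,&dest)<3) return -1;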
/* use type, dest; ignore len */
free(dest);
return 0;