Extracting string data from text C++

I'm currently writing a C++ program that needs to extract string and numeric data from a text file. The data has the following format:
3225 C9+ ELECTR C8 C * 1.00E-6 -0.30 0.0
The first entry is an integer, the next 5 entries are strings and the last 3 are floats. No string is ever longer than 7 characters.
I am reading the file line by line and then extracting the data using:
sscanf(ln.c_str(),"%d %s %s %s %s %s %e %e %e",
&rref[numre],&names[numre][0],&names[numre][1],&names[numre][2],&names[numre][3],
&names[numre][4],&nums[numre][0],&nums[numre][1],&nums[numre][2]);
This works fine until I meet a line like:
3098 SIC2H3+ ELECTR SIC2H2 H * 1.50E-7 -0.50 0.0
where one of the entries is the full 7 characters long. In this case I get:
names[3097][0] = "SIC2H3+ELECTR"
and,
names[3097][1] = "ELECTR"
Anybody got any ideas...they will be much appreciated!!

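The most likely problem is in the declaration of names: if you declared each entry as holding seven characters and did not allocate space for the terminating zero, you would get exactly the results you describe. A full 7-character string fills its slot completely, so its terminator lands in the first byte of the next slot; that byte is overwritten when the next field is read, and the two strings run together.
char names[MAX][5][7]
has enough space for strings of length 6 or less; for strings of length 7, you need
char names[MAX][5][8]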
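
A minimal sketch of the fix, assuming five name fields of at most 7 characters each. The Reaction struct and the parseLine function are hypothetical names used only for illustration. The %7s width limits are an extra safeguard so that an unexpectedly long token cannot overrun a slot.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record: one integer, five names (at most 7 chars), three floats.
struct Reaction {
    int   ref;
    char  names[5][8];   // 7 characters plus the terminating '\0'
    float nums[3];
};

// Returns true only if all nine fields were converted.
bool parseLine(const std::string& ln, Reaction& r) {
    int n = std::sscanf(ln.c_str(), "%d %7s %7s %7s %7s %7s %e %e %e",
                        &r.ref, r.names[0], r.names[1], r.names[2],
                        r.names[3], r.names[4],
                        &r.nums[0], &r.nums[1], &r.nums[2]);
    return n == 9;
}
```

Checking the return value of sscanf also catches malformed lines instead of silently leaving fields unset.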

Related

How to change int between -120 and 120 to char (or other simple way to save as one byte)

I am reading a sensor and obtaining values for change in temperature every 10 minutes. At the moment I am saving the change in temperature as an integer.
The range in temperature change should be between -120 and 120. I want to save the temperature change to EEPROM, but I only have 512 bytes spare, and as an integer each value takes up 2 bytes. Therefore I thought I could assign the value to the corresponding char value and save the char to EEPROM, since this will only take one byte (e.g. '4', 's', '$', etc.); however, I can't see an easy way to do this.
I am using the Arduino IDE, which is C++ I believe, and I'm asking here because it's really a software question.
I thought I should be able to use something like
int tempAsInt = -50;
char tempAsChar;
tempAsChar = char(tempAsInt);
or
int tempAsInt = -50;
signed char tempAsChar;
tempAsChar = tempAsInt;
but the first one printed the same characters (an upside-down question mark or a null value) for varied tempAsInt values,
and the second one just printed out the same value as the integer, i.e. if the change was -50, it printed -50, so I'm not sure if it is really a char, though perhaps I'm just printing it wrong.
My printing code is
mySerial.print("\tTempAsInt: ");
mySerial.print(tempAsInt);
mySerial.print("\tTempDiffAsInt: ");
mySerial.print(tempDiffAsInt);
mySerial.print("\tTempDiffAsChar: ");
mySerial.print(tempDiffAsChar);

In C and C++, there are several ways to cast an object to a different type. See http://www.cplusplus.com/doc/tutorial/typecasting/ for a summary.
Without the complete code, it is tough to say for sure. However, your problem seems to be caused by using cout to check the values of the variables like:
cout<<tempAsInt<<endl<<tempAsChar;
cout interprets the tempAsChar variable as a character type and prints the value as per the encoding.
Since -50 is outside the printable range of the ASCII code (see http://www.asciitable.com/) on your system, you will see a filler such as a question mark for an unprintable character, rather than a representation of the value of tempAsChar.
You can confirm the above behavior by setting tempAsChar to a value of, say, 50 to see the character '2'.
To verify that tempAsChar indeed has the correct value, use printf instead:
printf("tempAsChar (int)= %d and tempAsChar (char)= %c",tempAsChar,tempAsChar);
You should see the output:
tempAsChar (int)= 50 and tempAsChar (char)= 2
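
A minimal sketch of the one-byte storage idea, assuming the value is known to lie in -120..120 so that it fits a signed 8-bit integer. The helper names toByte, fromByte and asText are hypothetical. The stored byte always holds the number; it only looks wrong when it is printed as a character, so the sketch converts back to int before producing text.

```cpp
#include <cstdint>
#include <string>

// Narrow a temperature change known to lie in [-120, 120] to one byte.
std::int8_t toByte(int tempChange) {
    return static_cast<std::int8_t>(tempChange);
}

// Widen the stored byte back to int so it is treated as a number.
int fromByte(std::int8_t stored) {
    return static_cast<int>(stored);
}

// Text form of the stored value: convert to int first, otherwise stream
// and print functions may treat the byte as a character code.
std::string asText(std::int8_t stored) {
    return std::to_string(fromByte(stored));
}
```

On an Arduino, the single byte returned by toByte is what would be handed to a byte-oriented store such as EEPROM.write, and fromByte would be applied to the byte read back.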

How to convert an array of ASCII codes to int C++

First of all, I would like to read from plain text. I have read hundreds of web pages about it and I just can't make it work. I want to read every byte of the file, where every two bytes form a number that I want to store.
I want to read: 10 20.
I get: ASCII code of 1, ASCII code of 0, ASCII code of space, etc.
I tried several things, like stream.get or stream.read, and tried to convert with atoi, but then I can't concatenate the two digits. I tried sprintf too, but all of them failed.

Array of ASCII codes:
char ASCII[] = "10 20";
Convert to integer variables:
std::istringstream iss(ASCII);
int x,y;
iss >> x >> y;
Done.
Here's the working sample: http://ideone.com/y8ZRGs

If you want to do this with your own code, there are only two things you need to be able to do.
First, you need to convert from the ASCII code of a digit to the number it represents. This is as simple as subtracting '0'.
Second, you need to convert from the numerical value of each digit of a two digit number to the number that represents. This is simple -- if T is the tens place and U is the units, it's 10T + U.
So, for example:
int twoDigitNumber (char tens, char units)
{
return 10 * (tens - '0') + (units - '0');
}
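
The same two steps (subtract '0', then multiply by ten and add) generalize to numbers of any length. A minimal sketch, assuming the input contains only non-negative decimal numbers separated by non-digit characters; parseNumbers is a hypothetical helper name.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Collect every run of decimal digits in `text` as one integer.
std::vector<int> parseNumbers(const std::string& text) {
    std::vector<int> values;
    int current = 0;
    bool inNumber = false;
    for (unsigned char c : text) {
        if (std::isdigit(c)) {
            current = current * 10 + (c - '0');  // shift one decimal place, add digit
            inNumber = true;
        } else if (inNumber) {
            values.push_back(current);           // a non-digit ends the number
            current = 0;
            inNumber = false;
        }
    }
    if (inNumber) values.push_back(current);     // number at the end of the input
    return values;
}
```

For the input "10 20" this yields the two integers 10 and 20.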

File Size to store an integer

I want to write an integer (for ex - 222222) to a text file in a way that the size of the file is reduced. If I write the integer in the form of a string, it takes 6 Bytes because of the six characters present. If I store the integer in the form of an integer, it again takes 6 Bytes. Why isn't the file size equal to 4 Bytes since an int takes 4 Bytes?
#include <iostream>
#include<stdlib.h>
#include<stdio.h>
using namespace std;
int main()
{
//char* x = "222222.2222";
//double x = 222222.2222;
int x = 222222;
FILE *fp = fopen("now.txt","w");
fprintf(fp,"%d",x);
return 0;
}

Here is the definition of fprintf:
writes the C string pointed by format to the stream.
So whatever you pass to the function is formatted as text, which is why the output file contains the six characters 222222.
If you want to store an integer rather than a string in the file, you could use fwrite:
int x = 222222;
FILE *fp = fopen("now.txt","wb");
fwrite(&x, sizeof(int), 1, fp);
fclose(fp);
Then the file contains the bytes 0E 64 03 00 when viewed in a hex editor (on a little-endian machine with a 4-byte int). It's 4 bytes.
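
A minimal sketch comparing the two approaches side by side. The function names writeAsText and writeAsBinary are hypothetical; each returns the number of bytes it wrote so the sizes can be compared directly.

```cpp
#include <cstdio>

// Write `x` as decimal text; returns the number of bytes written, or -1.
long writeAsText(const char* path, int x) {
    std::FILE* fp = std::fopen(path, "w");
    if (!fp) return -1;
    int n = std::fprintf(fp, "%d", x);        // one byte per digit (and sign)
    std::fclose(fp);
    return n;
}

// Write the raw bytes of `x`; returns the number of bytes written, or -1.
long writeAsBinary(const char* path, int x) {
    std::FILE* fp = std::fopen(path, "wb");   // "b": no newline translation
    if (!fp) return -1;
    std::size_t items = std::fwrite(&x, sizeof x, 1, fp);
    std::fclose(fp);
    return items == 1 ? static_cast<long>(sizeof x) : -1;
}
```

For 222222 the text version writes 6 bytes, while the binary version always writes sizeof(int) bytes regardless of the value. Note that small values such as 7 are actually shorter as text.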

There is a simple reason behind this.
Whenever we write to a text file, the data is stored as characters. So when you write the integer 222222 to the file, it is written character by character, not as an integer.

When you write an integer as an integer, the file becomes a binary file.
When you write and read binary files, you have to take care of padding, byte order, etc.
The alternative is plain text: you read it as strings and convert them to integers with the help of library functions.

Compress 21 Alphanumeric Characters in to 16 Bytes

I'm trying to take 21 bytes of data which uniquely identifies a trade and store it in a 16 byte char array. I'm having trouble coming up with the right algorithm for this.
The trade ID which I'm trying to compress consists of 2 fields:
18 alphanumeric characters
consisting of the ASCII characters
0x20 to 0x7E, Inclusive. (32-126)
A 3-character numeric string "000" to "999"
So a C++ class that would encompass this data looks like this:
class ID
{
public:
char trade_num_[18];
char broker_[3];
};
This data needs to be stored in a 16-char data structure, which looks like this:
class Compressed
{
public:
char sku_[16];
};
I tried to take advantage of the fact that since the characters in trade_num_ are only 0-127 there was 1 unused bit in each character. Similarly, 999 in binary is 1111100111, which is only 10 bits -- 6 bits short of a 2-byte word. But when I work out how much I can squeeze this down, the smallest I can make it is 17 bytes; one byte too big.
Any ideas?
By the way, trade_num_ is a misnomer. It can contain letters and other characters. That's what the spec says.
EDIT: Sorry for the confusion. The trade_num_ field is indeed 18 bytes and not 16. After I posted this thread my internet connection died and I could not get back to this thread until just now.
EDIT2: I think it is safe to make an assumption about the dataset. For the trade_num_ field, we can assume that the non-printable ASCII characters 0-31 will not be present. Nor will ASCII codes 127 or 126 (~). All the others might be present, including upper and lower case letters, numbers and punctuations. This leaves a total of 94 characters in the set that trade_num_ will be comprised of, ASCII codes 32 through 125, inclusive.

If you have 18 characters in the range 0 - 127 and a number in the range 0 - 999 and compact this as much as possible then it will require 17 bytes.
>>> math.log(128**18 * 1000, 256)
16.995723035582763
You may be able to take advantage of the fact that some characters are most likely not used. In particular it is unlikely that there are any characters below value 32, and 127 is also probably not used. If you can find one more unused character, you can first convert the characters into base 94 and then pack them into the bytes as closely as possible.
>>> math.log(94**18 * 1000, 256)
15.993547951857446
This just fits into 16 bytes!
Example code
Here is some example code written in Python (but written in a very imperative style so that it can easily be understood by non-Python programmers). I'm assuming that there are no tildes (~) in the input. If there are you should substitute them with another character before encoding the string.
def encodeChar(c):
    return ord(c) - 32
def encode(s, n):
    t = 0
    for c in s:
        t = t * 94 + encodeChar(c)
    t = t * 1000 + n
    r = []
    for i in range(16):
        r.append(int(t % 256))
        t //= 256
    return r
print(encode(' ' * 18, 0)) # smallest possible value
print(encode('abcdefghijklmnopqr', 123))
print(encode('}}}}}}}}}}}}}}}}}}', 999) # largest possible value
Output:
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[ 59, 118, 192, 166, 108, 50, 131, 135, 174, 93, 87, 215, 177, 56, 170, 172]
[255, 255, 159, 243, 182, 100, 36, 102, 214, 109, 171, 77, 211, 183, 0, 247]
This algorithm uses Python's ability to handle very large numbers. To convert this code to C++ you could use a big integer library.
You will of course need an equivalent decoding function, the principle is the same - the operations are performed in reverse order.
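
The same scheme can be sketched in C++ without a big integer library, assuming the 94-character alphabet ' ' to '}' described above. The 128-bit value is held as a little-endian array of 16 bytes, and only two primitive operations are needed: multiply-and-add by a small number, and divide by a small number. The names Packed, mulAdd, divMod, encode and decode are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <string>

using Packed = std::array<std::uint8_t, 16>;   // little-endian 128-bit value

// value = value * mul + add, computed byte by byte with a carry.
void mulAdd(Packed& value, unsigned mul, unsigned add) {
    unsigned carry = add;
    for (std::uint8_t& byte : value) {
        unsigned t = byte * mul + carry;
        byte = static_cast<std::uint8_t>(t & 0xFF);
        carry = t >> 8;
    }
}

// value = value / div (long division from the top byte); returns the remainder.
unsigned divMod(Packed& value, unsigned div) {
    unsigned rem = 0;
    for (int i = 15; i >= 0; --i) {
        unsigned t = (rem << 8) | value[i];
        value[i] = static_cast<std::uint8_t>(t / div);
        rem = t % div;
    }
    return rem;
}

// tradeNum: exactly 18 characters in ' '..'}'; broker: 0..999.
Packed encode(const std::string& tradeNum, unsigned broker) {
    Packed out{};
    for (char c : tradeNum) mulAdd(out, 94, static_cast<unsigned>(c - 32));
    mulAdd(out, 1000, broker);
    return out;
}

// Reverses encode(): peel off the broker, then the 18 base-94 digits.
void decode(Packed in, std::string& tradeNum, unsigned& broker) {
    broker = divMod(in, 1000);
    tradeNum.assign(18, ' ');
    for (int i = 17; i >= 0; --i)
        tradeNum[i] = static_cast<char>(divMod(in, 94) + 32);
}
```

The sketch does not validate its input; characters outside ' ' to '}' or a broker above 999 would need to be rejected or substituted before encoding.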

That makes (18*7+10)=136 bits, or 17 bytes. You wrote trade_num is alphanumeric? If that means the usual [a-zA-Z0-9_] set of characters, then you'd have only 6 bits per character, needing (18*6+10)=118 bits, which fits in 15 bytes for the whole thing.
Assuming 8 bits = 1 byte.
Or, coming from another direction: you have 128 bits for storage and need about 10 bits for the number part, so there are 118 left for the trade_num. 18 characters means 118/18 = 6.555 bits per character, so you only have the space to encode 2^6.555 = 94 different characters, unless there is a hidden structure in trade_num that we could exploit to save more bits.

This is something that should work, assuming you need only characters from allowedchars, and there are at most 94 characters there. This is Python, but it is written trying not to use fancy shortcuts, so that you'll be able to translate it to your destination language more easily. It assumes, however, that the number variable may contain integers up to 2**128; in C++ you should use some kind of big number class.
allowedchars=' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}'
alphabase = len(allowedchars)
def compress(code):
    alphanumeric = code[0:18]
    number = int(code[18:21])
    for character in alphanumeric:
        # find returns index of character on the allowedchars list
        number = alphabase*number + allowedchars.find(character)
    compressed = ''
    for i in range(16):
        compressed += chr(number % 256)
        number = number//256
    return compressed
def decompress(compressed):
    number = 0
    for byte in reversed(compressed):
        number = 256*number + ord(byte)
    alphanumeric = ''
    for i in range(18):
        alphanumeric = allowedchars[number % alphabase] + alphanumeric
        number = number//alphabase
    # make a string padded with zeros
    number = '%03d' % number
    return alphanumeric + number

You can do this in about 15 bytes (14 bytes and 6 bits).
Each character of trade_num_ is ASCII, so storing it in 7 bits instead of 8 saves 1 bit per character.
That frees 2 bytes and 2 bits; you need to free 5 bytes.
Next, look at the number: each of its characters is one of ten values (0 to 9).
Each digit therefore fits in 4 bits, so the whole number takes 1 byte and 4 bits, half of its original 3 bytes.
Now you have freed 3 bytes and 6 bits; you need 5 bytes.
If you restrict the characters to qwertyuioplkjhgfdsazxcvbnmQWERTYUIOPLKJHGFDSAZXCVBNM1234567890[]
you can store each character in 6 bits, which frees another 2 bytes and 2 bits.
Now you have freed 6 bytes in total, and the ID fits in 15 bytes plus a null terminator = 16 bytes.
And if you store the number as a 10-bit integer instead, everything fits in 14 bytes and 6 bits.

There are 95 characters between the space (0x20) and tilde (0x7e). (The 94 in other answers suffer from off-by-1 error).
Hence the number of distinct IDs is 95^18 × 1000 = 3.97×10^38.
But that compressed structure can only hold (2^8)^16 = 3.40×10^38 distinct values.
Therefore it is impossible to represent all IDs by that structure, unless:
There is 1 unused character in ≥15 digits of trade_num_, or
There are ≥14 unused characters in 1 digit of trade_num_, or
There are only ≤856 brokers, or
You're using a PDP-10, which has a 9-bit char.
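
The counting argument can be checked numerically. A minimal sketch, assuming long double has enough range and precision for values around 10^38 (it only needs a few significant digits here); the function names idCount, capacity16Bytes and fits are hypothetical.

```cpp
#include <cmath>

// Number of distinct IDs for an alphabet of `alphabetSize` symbols:
// alphabetSize^18 trade numbers times 1000 broker codes.
long double idCount(int alphabetSize) {
    return std::pow(static_cast<long double>(alphabetSize), 18) * 1000.0L;
}

// Number of distinct values a 16-byte structure can hold: 2^128.
long double capacity16Bytes() {
    return std::ldexp(1.0L, 128);
}

// True if every ID over the given alphabet can get its own 16-byte code.
bool fits(int alphabetSize) {
    return idCount(alphabetSize) <= capacity16Bytes();
}
```

With 95 symbols the ID count exceeds 2^128, while with 94 symbols it stays just below, which is why excluding a single character decides whether the packing is possible.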

Key questions are:
There appears to be some contradiction in your post whether the trade number is 16 or 18 characters. You need to clear that up. You say the total is 21 consisting of 16+3. :-(
You say the trade num characters are in the range 0x00-0x7f. Can they really be any character in that range, including tab, new line, control-C, etc? Or are they limited to printable characters, or maybe even to alphanumerics?
Does the output 16 bytes have to be printable characters, or is it basically a binary number?
EDIT, after updates to original post:
In that case, if the output can be any character in the character set, it's possible. If it can only be printable characters, it's not.
Demonstration of the mathematical possibility is straightforward enough. There are 94 possible values for each of 18 characters, and 10 possible values for each of 3. Total number of possible combinations = 94^18 * 10^3 ~= 3.28E38. This requires 128 bits: 2^127 ~= 1.70E38, which is too small, while 2^128 ~= 3.40E38, which is big enough. 128 bits is 16 bytes, so it will just barely fit if we can use every possible bit combination.
Given the tight fit, I think the most practical way to generate the value is to think of it as a double-long number, and then run the input through an algorithm to generate a unique integer for every possible input.
Conceptually, then, let's imagine we had a "huge integer" data type that is 16 bytes long. The algorithm would be something like this:
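huge out = 0;
for (int p = 0; p < 18; ++p)
{
    out = out * 94 + (tradenum[p] - 32);
}
for (int p = 0; p < 3; ++p)
{
    out = out * 10 + (broker[p] - '0');
}
// Convert output to char[16], most significant byte first
unsigned char out16[16];
for (int p = 15; p >= 0; --p)
{
    out16[p] = out & 0xff;
    out = out >> 8;
}
return out16; // conceptual: real code would fill a caller-supplied buffer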
Of course we don't have a "huge" data type in C. Are you using pure C or C++? Isn't there some kind of big number class in C++? Sorry, I haven't done C++ in a while. If not, we could easily create a little library to implement a huge.

If it can only contain letters, then you have fewer than 64 possibilities per character (26 upper case, 26 lower case, leaving you 12 for space, terminator, underscore, etc). With 6 bits per character, you should get there: 18*6+10 = 118 bits, which fits in 15 bytes. That assumes you don't support special characters.

Use the first 10 bits for the 3-character numeric string (encode the bits like they represent a number and then pad with zeros as appropriate when decoding).
Okay, this leaves you with 118 bits and 16 alphanumeric characters to store.
0x00 to 0x7F (if you mean inclusive) comprises 128 possible characters to represent. That means that each character can be identified by a combination of 7 bits. Come up with an index mapping each number those 7 bits can represent to the actual character. To represent 16 of your "alphanumeric" characters in this way, you need a total of 112 bits.
We now have 122 bits (or 15.25 bytes) representing our data. Add an easter egg to fill in the remaining unused bits and you have your 16 character array.

sprintf_s problem

I have a funny problem using this function.
I use it as follow:
int nSeq = 1;
char cBuf[8];
int j = sprintf_s(cBuf, sizeof(cBuf), "%08d", nSeq);
And every time I get an exception saying the buffer is too small.
It only goes away when I change the second argument of the function to sizeof(cBuf) + 1.
Why do I need to add one if I only want to copy 8 bytes and I have an array that contains 8 bytes?

Your buffer contains 8 places. Your string contains 8 characters and a null character to close it.

Your string requires 8 bytes of data (00000001) due to %08d, plus the terminating '\0'.
So the buffer has to be sized as 9.

All sprintf functions add a null to terminate the string. So in effect your string is 9 characters long: 8 bytes of text and the ending zero.
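
A minimal sketch of the corrected sizing. It uses the standard std::snprintf rather than sprintf_s (a deliberate swap, since sprintf_s is not available on every toolchain); the rule about the terminator is the same. The function name formatSeq is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Format `seq` as an 8-digit, zero-padded string.
std::string formatSeq(int seq) {
    char buf[9];   // 8 digits plus the terminating '\0'
    int n = std::snprintf(buf, sizeof buf, "%08d", seq);
    // n is the length the full output needs; n >= sizeof buf means truncation.
    if (n < 0 || n >= static_cast<int>(sizeof buf)) return std::string();
    return std::string(buf);
}
```

Passing sizeof(cBuf) + 1 for an 8-byte array, as in the question, only hides the error: the terminator is then written one byte past the end of the array.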
