searching an unsigned char array for characters - c++

I have a binary data file that I am trying to read. The values in the file are 8-bit unsigned integers, with "record" delimiters that are ASCII text ($MSG, $GRP, for example). I read the data as one big chunk, as follows:
unsigned char *inBuff = (unsigned char*)malloc(file_size*sizeof(unsigned char));
result = fread(inBuff, sizeof(unsigned char), file_size, pFile);
I need to search this array to find records that start with $GRP (so I can then read the data that follows), can someone suggest a good way to do this? I have tried several things, and none of them have worked. For example, my most recent attempt was:
std::stringstream str1;
str1 << inBuff;
std::string strTxt = str1.str();
However, when I check the length on this, it is only 5. I looked at the file in Notepad, and noticed that the sixth character is a NULL. So it seems like it is cutting off there because of the NULL. Any ideas?

Assuming the fread does not return a -1, the value in it will tell you how many bytes are available to search.
It is unreasonable to expect to be able to do a string search on binary data, as there my be NUL characters in the binary data which will cause the length function to terminate early.
One possibly way is to to search for the data is to use memcmp on the buffer, with your search key, and length of the search key.

(As per my comment)
C str functions assume zero-terminated strings. Any C string function will stop at the very first binary 0. Use memchr to locate the $ and then use strncmp or memcmp. In particular, do not assume the byte immediately after the 4-byte identifier is a binary 0.
In code (C, not tested):
/* recordId should point to a simple string such as "$GRP" */
unsigned char *find_record (unsigned char *data, size_t max_length, char *recordId)
{
unsigned char *ptr;
size_t remaining_length;
ptr = startOfData;
if (strlen(recordId) > max_length)
return NULL;
remaining_length = max_length;
do
{
/* fast scan for the first character only */
ptr = memchr (ptr, recordId[0], remaining_length);
if (!ptr)
return NULL;
/* first character matches, test entire string */
if (!memcmp (ptr, recordId, strlen(recordId))
return ptr;
/* no match; test onwards from the next possible position */
ptr++;
/* take care not to overrun end of data */
/* It's tempting to test
remaining_length = ptr - startOfData;
but there is a chance this will end up negative, and
size_t does not like to be negative.
*/
if (ptr >= startOfData+max_length)
break;
remaining_length = ptr-startOfData;
} while (1);
return NULL;
}

Related

String is not null terminated error

I'm having a string is not null terminated error, though I'm not entirely sure why. The usage of std::string in the second part of the code is one of my attempt to fix this problem, although it still doesn't work.
My initial codes was just using the buffer and copy everything into client_id[]. The error than occurred. If the error is correct, that means I've got either client_ id OR theBuffer does not have a null terminator. I'm pretty sure client_id is fine, since I can see it in debug mode. Strange thing is buffer also has a null terminator. No idea what is wrong.
char * next_token1 = NULL;
char * theWholeMessage = &(inStream[3]);
theTarget = strtok_s(theWholeMessage, " ",&next_token1);
sendTalkPackets(next_token1, sizeof(next_token1) + 1, id_clientUse, (unsigned int)std::stoi(theTarget));
Inside sendTalkPackets is. I'm getting a string is not null terminated at the last line.
void ServerGame::sendTalkPackets(char * buffer, unsigned int buffersize, unsigned int theSender, unsigned int theReceiver)
{
std::string theMessage(buffer);
theMessage += "0";
const unsigned int packet_size = sizeof(Packet);
char packet_data[packet_size];
Packet packet;
packet.packet_type = TALK;
char client_id[MAX_MESSAGE_SIZE];
char theBuffer[MAX_MESSAGE_SIZE];
strcpy_s(theBuffer, theMessage.c_str());
//Quick hot fix for error "string not null terminated"
const char * test = theMessage.c_str();
sprintf_s(client_id, "User %s whispered: ", Usernames.find(theSender)->second.c_str());
printf("This is it %s ", buffer);
strcat_s(client_id, buffersize , theBuffer);
Methinks that problem lies in this line:
sendTalkPackets(next_token1, sizeof(next_token1) + 1, id_clientUse, (unsigned int)std::stoi(theTarget));
sizeof(next_token1)+1 will always gives 5 (on 32 bit platform) because it return size of pointer not size of char array.
One thing which could be causing this (or other problems): As
buffersize, you pass sizeof(next_token1) + 1. next_token1 is
a pointer, which will have a constant size of (typically) 4 or 8. You
almost certainly want strlen(next_token1) + 1. (Or maybe without the
+ 1; conventions for passing sizes like this generally only include
the '\0' if it is an output buffer. There are a couple of other
places where you're using sizeof, which may have similar problems.
But it would probably be better to redo the whole logic to use
std::string everywhere, rather than all of these C routines. No
worries about buffer sizes and '\0' terminators. (For protocol
buffers, I've also found std::vector<char> or std::vector<unsigned char>
quite useful. This was before the memory in std::string was
guaranteed to be contiguous, but even today, it seems to correspond more
closely to the abstraction I'm dealing with.)
You can't just do
std::string theMessage(buffer);
theMessage += "0";
This fails on two fronts:
The std::string constructor doesn't know where buffer ends, if buffer is not 0-terminated. So theMessage will potentially be garbage and include random stuff until some zero byte was found in the memory beyond the buffer.
Appending string "0" to theMessage doesn't help. What you want is to put a zero byte somewhere, not value 0x30 (which is the ascii code for displaying a zero).
The right way to approach this, is to poke a literal zero byte buffersize slots beyond the start of the buffer. You can't do that in buffer itself, because buffer may not be large enough to accomodate that extra zero byte. A possibility is:
char *newbuffer = malloc(buffersize + 1);
strncpy(newbuffer, buffer, buffersize);
newbuffer[buffersize] = 0; // literal zero value
Or you can construct a std::string, whichever you prefer.

How to read in only a particular number of characters

I have a small query regarding reading a set of characters from a structure. For example: A particular variable contains a value "3242C976*32" (char - type). How can I get only the first 8 bits of this variable. Kindly help.
Thanks.
Edit:
I'm trying to read in a signal:
For Ex: $ASWEER,2,X:3242C976*32
into this structure:
struct pg
{
char command[7]; // saves as $ASWEER,2,X:3242C976*32
char comma1[1]; // saves as ,2,X:3242C976*32
char groupID[1]; // saves as 2,X:3242C976*32
char comma2[1]; // etc
char handle[2]; // this is the problem, need it to save specifically each part, buts its not
char canID[8];
char checksum[3];
}m_pg;
...
When memcopying buffer into a structure, it works but because there is no carriage returns it saves the rest of the signal in each char variable. So, there is always garbage at the end.
you could..
convert your hex value in canID to float(depending on how you want to display it), e.g.
float value1 = HexToFloat(m_pg.canID); // find a conversion script for HexToFloat
CString val;
val.Format("0.3f",value1);
the garbage values aren't actually being stored in the structure, it only displays it as so, as there is no carriage return, so format the message however you want to and display it using the CString val;
If "3242C976*3F" is a c-string or std::string, you can just do:
char* str = "3242C976*3F";
char first_byte = str[0];
Or with an arbitrary memory block you can do:
SomeStruct memoryBlock;
char firstByte;
memcpy(&firstByte, &memoryBlock, 1);
Both copy the first 8bits or 1 byte from the string or arbitrary memory block just as well.
After the edit (original answer below)
Just copy by parts. In C, something like this should work (could also work in C++ but may not be idiomatic)
strncpy(m_pg.command, value, 7); // m.pg_command[7] = 0; // oops
strncpy(m_pg.comma, value+7, 1); // m.pg_comma[1] = 0; // oops
strncpy(m_pg.groupID, value+8, 1); // m.pg_groupID[1] = 0; // oops
strncpy(m_pg.comma2, value+9, 1); // m.pg_comma2[1] = 0; // oops
// etc
Also, you don't have space for the string terminator in the members of the structure (therefore the oopses above). They are NOT strings. Do not printf them!
Don't read more than 8 characters. In C, something like
char value[9]; /* 8 characters and a 0 terminator */
int ch;
scanf("%8s", value);
/* optionally ignore further input */
while (((ch = getchar()) != '\n') && (ch != EOF)) /* void */;
/* input terminated with ch (either '\n' or EOF) */
I believe the above code also "works" in C++, but it may not be idiomatic in that language
If you have a char pointer, you can just set str[8] = '\0'; Be careful though, because if the buffer is less than 8 (EDIT: 9) bytes, this could cause problems.
(I'm just assuming that the name of the variable that already is holding the string is called str. Substitute the name of your variable.)
It looks to me like you want to split at the comma, and save up to there. This can be done with strtok(), to split the string into tokens based on the comma, or strchr() to find the comma, and strcpy() to copy the string up to the comma.

Using istringstream to process a memory block of variable length

I'm trying to use istringstream to recreate an encoded wstring from some memory. The memory is laid out as follows:
1 byte to indicate the start of the wstring encoding. Arbitrarily this is '!'.
n bytes to store the character length of the string in text format, e.g. 0x31, 0x32, 0x33 would be "123", i.e. a 123-character string
1 byte separator (the space character)
n bytes which are the wchars which make up the string, where wchar_t's are 2-bytes each.
For example, the byte sequence:
21 36 20 66 00 6f 00 6f 00
is "!6 f.o.o." (using dots to represent char 0)
All I've got is a char* pointer (let's call it pData) to the start of the memory block with this encoded data in it. What's the 'best' way to consume the data to reconstruct the wstring ("foo"), and also move the pointer to the next byte past the end of the encoded data?
I was toying with using an istringstream to allow me to consume the prefix byte, the length of the string, and the separator. After that I can calculate how many bytes to read and use the stream's read() function to insert into a suitably-resized wstring. The problem is, how do I get this memory into the istringstream in the first place? I could try constructing a string first and then pass that into the istringstream, e.g.
std::string s((const char*)pData);
but that doesn't work because the string is truncated at the first null byte. Or, I could use the string's other constructor to explicitly state how many bytes to use:
std::string s((const char*)pData, len);
which works, but only if I know what len is beforehand. That's tricky given that the data is variable length.
This seems like a really solvable problem. Does my rookie status with strings and streams mean I'm overlooking an easy solution? Or am I barking up the wrong tree with the whole string approach?
Try setting your stringstream's rdbuf:
char* buffer = something;
std::stringbuf *pbuf;
std::stringstream ss;
std::pbuf=ss.rdbuf();
std::pbuf->sputn(buffer, bufferlength);
// use your ss
Edit: I see that this solution will have a similar problem to your string(char*, len) situation. Can you tell us more about your buffer object? If you don't know the length, and it isn't null terminated, it's going to be very hard to deal with.
Is it possible to modify how you encode the length, and make that a fixed size?
unsigned long size = 6; // known string length
char* buffer = new char[1 + sizeof(unsigned long) + 1 + size];
buffer[0] = '!';
memcpy(buffer+1, &size, sizeof(unsigned long));
buffer should hold the start indicator (1 byte), the actual size (size of unsigned long), the delimiter (1 byte) and the text itself (size).
This way, you could get the size "pretty" easy, then set the pointer to point beyond the overhead, and then use the len variable in the string constructor.
unsigned long len;
memcpy(&len, pData+1, sizeof(unsigned long)); // +1 to avoid the start indicator
// len now contains 6
char* actualData = pData + 1 + sizeof(unsigned long) + 1;
std::string s(actualData, len);
It's low level and error prone :) (for instance if you read anything that isn't encoded the way that you expect it to be, the len can get pretty big), but you avoid dynamically reading the length of the string.
It seems like something on this order should work:
std::wstring make_string(char const *input) {
if (*input != '!')
return "";
char length = *++input;
return std::wstring(++input, length);
}
The difficult part is dealing with the variable length of the size. Without something to specify the length it's hard to guess when to stop treating the data as specifying the length of the string.
As for moving the pointer, if you're going to do it inside a function, you'll need to pass a reference to the pointer, but otherwise it's a simple matter of adding the size you found to the pointer you received.
It's tempting to (ab)use the (deprecated but nevertheless standard) std::istrstream here:
// Maximum size to read is
// 1 for the exclamation mark
// Digits for the character count (digits10() + 1)
// 1 for the space
const std::streamsize max_size = 3 + std::numeric_limits<std::size_t>::digits10;
std::istrstream s(buf, max_size);
if (std::istream::traits_type::to_char_type(s.get()) != '!'){
throw "missing exclamation";
}
std::size_t size;
s >> size;
if (std::istream::traits_type::to_char_type(s.get()) != ' '){
throw "missing space";
}
std::wstring(reinterpret_cast<wchar_t*>(s.rdbuf()->str()), size/sizeof(wchar_t));

Very strange char array behaviour

.
unsigned int fname_length = 0;
//fname length equals 30
file.read((char*)&fname_length,sizeof(unsigned int));
//fname contains random data as you would expect
char *fname = new char[fname_length];
//fname contains all the data 30 bytes long as you would expect, plus 18 bytes of random data on the end (intellisense display)
file.read((char*)fname,fname_length);
//m_material_file (std:string) contains all 48 characters
m_material_file = fname;
// count = 48
int count = m_material_file.length();
now when trying this way, intellisense still shows the 18 bytes of data after setting the char array to all ' ' and I get exactly the same results. even without the file read
char name[30];
for(int i = 0; i < 30; ++i)
{
name[i] = ' ';
}
file.read((char*)fname,30);
m_material_file = name;
int count = m_material_file.length();
any idea whats going wrong here, its probably something completely obvious but im stumped!
thanks
Sounds like the string in the file isn't null-terminated, and intellisense is assuming that it is. Or perhaps when you wrote the length of the string (30) into the file, you didn't include the null character in that count. Try adding:
fname[fname_length] = '\0';
after the file.read(). Oh yeah, you'll need to allocate an extra character too:
char * fname = new char[fname_length + 1];
I guess that intellisense is trying to interpret char* as C string and is looking for a '\0' byte.
fname is a char* so both the debugger display and m_material_file = fname will be expecting it to be terminated with a '\0'. You're never explicitly doing that, but it just happens that whatever data follows that memory buffer has a zero byte at some point, so instead of crashing (which is a likely scenario at some point), you get a string that's longer than you expect.
Use
m_material_file.assign(fname, fname + fname_length);
which removes the need for the zero terminator. Also, prefer std::vector to raw arrays.
std::string::operator=(char const*) is expecting a sequence of bytes terminated by a '\0'. You can solve this with any of the following:
extend fname by a character and add the '\0' explicitly as others have suggested or
use m_material_file.assign(&fname[0], &fname[fname_length]); instead or
use repeated calls to file.get(ch) and m_material_file.push_back(ch)
Personally, I would use the last option since it eliminates the explicitly allocated buffer altogether. One fewer explicit new is one fewer chance of leaking memory. The following snippet should do the job:
std::string read_name(std::istream& is) {
unsigned int name_length;
std::string file_name;
if (is.read((char*)&name_length, sizeof(name_length))) {
for (unsigned int i=0; i<name_length; ++i) {
char ch;
if (is.get(ch)) {
file_name.push_back(ch);
} else {
break;
}
}
}
return file_name;
}
Note:
You probably don't want to use sizeof(unsigned int) to determine how many bytes to write to a binary file. The number of bytes read/written is dependent on the compiler and platform. If you have a maximum length, then use it to determine the specific byte size to write out. If the length is guaranteed to fewer than 255 bytes, then only write a single byte for the length. Then your code will not depend on the byte size of intrinsic types.

Howto read chunk of memory as char in c++

Hello I have a chunk of memory (allocated with malloc()) that contains bits (bit literal), I'd like to read it as an array of char, or, better, I'd like to printout the ASCII value of 8 consecutively bits of the memory.
I have allocated he memory as char *, but I've not been able to take characters out in a better way than evaluating each bit, adding the value to a char and shifting left the value of the char, in a loop, but I was looking for a faster solution.
Thank you
What I've wrote for now is this:
for allocation:
char * bits = (char*) malloc(1);
for writing to mem:
ifstream cleartext;
cleartext.open(sometext);
while(cleartext.good())
{
c = cleartext.get();
for(int j = 0; j < 8; j++)
{ //set(index) and reset(index) set or reset the bit at bits[i]
(c & 0x80) ? (set(index)):(reset(index));//(*ptr++ = '1'):(*ptr++='0');
c = c << 1;
}..
}..
and until now I've not been able to get character back, I only get the bits printed out using:
printf("%s\n" bits);
An example of what I'm trying to do is:
input.txt contains the string "AAAB"
My program would have to write "AAAB" as "01000001010000010100000101000010" to memory
(it's the ASCII values in bit of AAAB that are 65656566 in bits)
Then I would like that it have a function to rewrite the content of the memory to a file.
So if memory contains again "01000001010000010100000101000010" it would write to the output file "AAAB".
int numBytes = 512;
char *pChar = (char *)malloc(numBytes);
for( int i = 0; i < numBytes; i++ ){
pChar[i] = '8';
}
Since this is C++, you can also use "new":
int numBytes = 512;
char *pChar = new char[numBytes];
for( int i = 0; i < numBytes; i++ ){
pChar[i] = '8';
}
If you want to visit every bit in the memory chunk, it looks like you need std::bitset.
char* pChunk = malloc( n );
// read in pChunk data
// iterate over all the bits.
for( int i = 0; i != n; ++i ){
std::bitset<8>& bits = *reinterpret_cast< std::bitset<8>* >( pByte );
for( int iBit = 0; iBit != 8; ++iBit ) {
std::cout << bits[i];
}
}
I'd like to printout the ASCII value of 8 consecutively bits of the memory.
The possible value for any bit is either 0 or 1. You probably want at least a byte.
char * bits = (char*) malloc(1);
Allocates 1 byte on the heap. A much more efficient and hassle-free thing would have been to create an object on the stack i.e.:
char bits; // a single character, has CHAR_BIT bits
ifstream cleartext;
cleartext.open(sometext);
The above doesn't write anything to mem. It tries to open a file in input mode.
It has ascii characters and common eof or \n, or things like this, the input would only be a textfile, so I think it should only contain ASCII characters, correct me if I'm wrong.
If your file only has ASCII data you don't have to worry. All you need to do is read in the file contents and write it out. The compiler manages how the data will be stored (i.e. which encoding to use for your characters and how to represent them in binary, the endianness of the system etc). The easiest way to read/write files will be:
// include these on as-needed basis
#include <algorithm>
#include <iostream>
#include <iterator>
#include <fstream>
using namespace std;
// ...
/* read from standard input and write to standard output */
copy((istream_iterator<char>(cin)), (istream_iterator<char>()),
(ostream_iterator<char>(cout)));
/*-------------------------------------------------------------*/
/* read from standard input and write to text file */
copy(istream_iterator<char>(cin), istream_iterator<char>(),
ostream_iterator<char>(ofstream("output.txt"), "\n") );
/*-------------------------------------------------------------*/
/* read from text file and write to text file */
copy(istream_iterator<char>(ifstream("input.txt")), istream_iterator<char>(),
ostream_iterator<char>(ofstream("output.txt"), "\n") );
/*-------------------------------------------------------------*/
The last remaining question is: Do you want to do something with the binary representation? If not, forget about it. Else, update your question one more time.
E.g: Processing the character array to encrypt it using a block cipher
/* a hash calculator */
struct hash_sha1 {
unsigned char operator()(unsigned char x) {
// process
return rc;
}
};
/* store house of characters, could've been a vector as well */
basic_string<unsigned char> line;
/* read from text file and write to a string of unsigned chars */
copy(istream_iterator<unsigned char>(ifstream("input.txt")),
istream_iterator<char>(),
back_inserter(line) );
/* Calculate a SHA-1 hash of the input */
basic_string<unsigned char> hashmsg;
transform(line.begin(), line.end(), back_inserter(hashmsg), hash_sha1());
Something like this?
char *buffer = (char*)malloc(42);
// ... put something into the buffer ...
printf("%c\n", buffer[0]);
But, since you're using C++, I wonder why you bother with malloc and such...
char* ptr = pAddressOfMemoryToRead;
while(ptr < pAddressOfMemoryToRead + blockLength)
{
char tmp = *ptr;
// temp now has the char from this spot in memory
ptr++;
}
Is this what you are trying to achieve:
char* p = (char*)malloc(10 * sizeof(char));
char* p1 = p;
memcpy(p,"abcdefghij", 10);
for(int i = 0; i < 10; ++i)
{
char c = *p1;
cout<<c<<" ";
++p1;
}
cout<<"\n";
free(p);
Can you please explain in more detail, perhaps including code? What you're saying makes no sense unless I'm completely misreading your question. Are you doing something like this?
char * chunk = (char *)malloc(256);
If so, you can access any character's worth of data by treating chunk as an array: chunk[5] gives you the 5th element, etc. Of course, these will be characters, which may be what you want, but I can't quite tell from your question... for instance, if chunk[5] is 65, when you print it like cout << chunk[5];, you'll get a letter 'A'.
However, you may be asking how to print out the actual number 65, in which case you want to do cout << int(chunk[5]);. Casting to int will make it print as an integer value instead of as a character. If you clarify your question, either I or someone else can help you further.
Are you asking how to copy the memory bytes of an arbitrary struct into a char* array? If so this should do the trick
SomeType t = GetSomeType();
char* ptr = malloc(sizeof(SomeType));
if ( !ptr ) {
// Handle no memory. Probably should just crash
}
memcpy(ptr,&t,sizeof(SomeType));
I'm not sure I entirely grok what you're trying to do, but a couple of suggestions:
1) use std::vector instead of malloc/free and new/delete. It's safer and doesn't have much overhead.
2) when processing, try doing chunks rather than bytes. Even though streams are buffered, it's usually more efficient grabbing a chunk at a time.
3) there's a lot of different ways to output bits, but again you don't want a stream output for each character. You might want to try something like the following:
void outputbits(char *dest, char source)
{
dest[8] = 0;
for(int i=0; i<8; ++i)
dest[i] = source & (1<<(7-i)) ? '1':'0';
}
Pass it a char[9] output buffer and a char input, and you get a printable bitstring back. Decent compilers produce OK output code for this... how much speed do you need?