C Code Acting Differently to C++ on Lookup

I have the following code block (NOT written by me), which performs mapping and recodes ASCII characters to EBCDIC.
// Variables.
CodeHeader* tchpLoc = {};
...
memset(tchpLoc->m_ucpEBCDCMap, 0xff, 256);
for (i = 0; i < 256; i++) {
    if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
        ucTmp2 = i;
        asc2ebn(&ucTmp1, &ucTmp2, 1);
        tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
    }
}
The CodeHeader definition is
typedef struct {
    ...
    UCHAR* m_ucpASCIIMap;
    UCHAR* m_ucpEBCDCMap;
} CodeHeader;
and the method that seems to be giving me problems is
void asc2ebn(char* szTo, char* szFrom, int nChrs)
{
    while (nChrs--)
        *szTo++ = ucpAtoe[(*szFrom++) & 0xff];
}
[Note, the unsigned char array ucpAtoe[256] is copied at the end of the question for reference].
Now, I have an old C application and my C++11 conversion running side by side. The two programs write a massive .bin file, and there is a tiny discrepancy which I have traced to the code above. What happens in both programs is that the block
...
if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
    ucTmp2 = i;
    asc2ebn(&ucTmp1, &ucTmp2, 1);
    tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
}
gets entered for i = 32, and asc2ebn returns ucTmp1 as 64 or '#' for both the C and C++ variants, great. The next entry is for i = 48; for this value the C code's asc2ebn returns ucTmp1 as 240 or 'ð', while the C++ code returns ucTmp1 as -16 or 'ð'. My question is: why does this lookup/conversion produce different results for exactly the same input and the same lookup array (copied below)?
In this case the old C code is taken as correct, so I want the C++ to produce the same result for this lookup/conversion. Thanks for your time.
static UCHAR ucpAtoe[256] = {
'\x00','\x01','\x02','\x03','\x37','\x2d','\x2e','\x2f',/*00-07*/
'\x16','\x05','\x25','\x0b','\x0c','\x0d','\x0e','\x0f',/*08-0f*/
'\x10','\x11','\x12','\xff','\x3c','\x3d','\x32','\xff',/*10-17*/
'\x18','\x19','\x3f','\x27','\x22','\x1d','\x35','\x1f',/*18-1f*/
'\x40','\x5a','\x7f','\x7b','\x5b','\x6c','\x50','\xca',/*20-27*/
'\x4d','\x5d','\x5c','\x4e','\x6b','\x60','\x4b','\x61',/*28-2f*/
'\xf0','\xf1','\xf2','\xf3','\xf4','\xf5','\xf6','\xf7',/*30-37*/
'\xf8','\xf9','\x7a','\x5e','\x4c','\x7e','\x6e','\x6f',/*38-3f*/
'\x7c','\xc1','\xc2','\xc3','\xc4','\xc5','\xc6','\xc7',/*40-47*/
'\xc8','\xc9','\xd1','\xd2','\xd3','\xd4','\xd5','\xd6',/*48-4f*/
'\xd7','\xd8','\xd9','\xe2','\xe3','\xe4','\xe5','\xe6',/*50-57*/
'\xe7','\xe8','\xe9','\xad','\xe0','\xbd','\xff','\x6d',/*58-5f*/
'\x79','\x81','\x82','\x83','\x84','\x85','\x86','\x87',/*60-67*/
'\x88','\x89','\x91','\x92','\x93','\x94','\x95','\x96',/*68-6f*/
'\x97','\x98','\x99','\xa2','\xa3','\xa4','\xa5','\xa6',/*70-77*/
'\xa7','\xa8','\xa9','\xc0','\x6a','\xd0','\xa1','\xff',/*78-7f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*80-87*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*88-8f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*90-97*/
'\xff','\xff','\xff','\x4a','\xff','\xff','\xff','\xff',/*98-9f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a0-a7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a8-af*/
'\xff','\xff','\xff','\x4f','\xff','\xff','\xff','\xff',/*b0-b7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*b8-bf*/
'\xff','\xff','\xff','\xff','\xff','\x8f','\xff','\xff',/*c0-c7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*c8-cf*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d0-d7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d8-df*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e0-e7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e8-ef*/
'\xff','\xff','\xff','\x8c','\xff','\xff','\xff','\xff',/*f0-f7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff' };

In both C and C++, the standard doesn't require plain char to be a signed or an unsigned type; it's implementation-defined. Apparently your C compiler decided that char is unsigned char, while your C++ compiler decided that it is signed char.
For GCC, the flag that makes char unsigned is -funsigned-char. For MSVC, it's /J.
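A minimal sketch of the effect (my own illustration, not the question's code): the table entry for '0' (i = 48, i.e. 0x30) is '\xf0', and whether that byte reads back as 240 or -16 through a plain char depends on the implementation-defined signedness of char. Converting through unsigned char yields 240 either way.
#include <iostream>

int main()
{
    // ucpAtoe['0'] in the table above holds the byte 0xf0 (240).
    // asc2ebn writes that byte into a plain char:
    char result = static_cast<char>(0xf0);

    // Implementation-defined result: prints 240 where plain char is
    // unsigned (the old C build) and -16 where it is signed (the C++ build).
    std::cout << static_cast<int>(result) << "\n";

    // Converting through unsigned char recovers 240 regardless of the
    // signedness of plain char.
    std::cout << static_cast<int>(static_cast<unsigned char>(result)) << "\n";
}
If rebuilding with -funsigned-char or /J isn't an option, converting through unsigned char at the point of use, as in the last line, gives the same 240 that the C build produces.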

Related

How do I convert a single char to a uint in a well-defined and cross-platform way?

Let's say I have a single char:
char myChar = 'A';
and I want to populate a uint8_t:
uint8_t myUint8 = 0; // 0 is just a default;
is it well defined to do this:
myUint8 = static_cast<uint8_t>(myChar);
So in this case I would expect
myUint8 == 65
to evaluate to true. Is this well defined? Will this work across platforms? If not, how can I make it well defined and portable?
Update: While I would eventually like to handle determining the encoding scheme and whether uint8_t is available on the target, those issues are beyond the scope of this question, so I would like to refine my example as follows:
char myChar = 'A';
unsigned myUnsigned = 0; // 0 is just a default;
myUnsigned = static_cast<unsigned>(myChar);
char myRoundTripChar = static_cast<char>(myUnsigned);
printf("char is unchanged: %s", myRoundTripChar == 'A' ? "true" : "false" );
the desired output here is true. This alternate example removes all the 'magic numbers' and optionally defined types, and I think it shows the problem in another way.

How to resolve this C6385 code analysis warning: Reading invalid data

I am trying to address a code analysis warning that appears in the following method:
CStringArray* CCreateReportDlg::BuildCustomAssignArray(ROW_DATA_S &rsRowData)
{
    INT_PTR iAssign, iNumAssigns, iUsedAssign;
    CStringArray *pAryStrCustom = nullptr;
    CUSTOM_ASSIGN_S *psAssign = nullptr;
    if (rsRowData.uNumCustomToFill > 0)
    {
        pAryStrCustom = new CStringArray[rsRowData.uNumCustomToFill];
        iNumAssigns = m_aryPtrAssign.GetSize();
        for (iAssign = 0, iUsedAssign = 0; iAssign < iNumAssigns; iAssign++)
        {
            psAssign = (CUSTOM_ASSIGN_S*)m_aryPtrAssign.GetAt(iAssign);
            if (psAssign != nullptr)
            {
                if (!psAssign->bExcluded)
                {
                    pAryStrCustom[iUsedAssign].Copy(psAssign->aryStrBrothersAll);
                    iUsedAssign++;
                }
            }
        }
    }
    return pAryStrCustom;
}
The offending line of code is:
pAryStrCustom[iUsedAssign].Copy(psAssign->aryStrBrothersAll);
I compile this code for both 32 bit and 64 bit. The warning being raised is:
Warning (C6385) Reading invalid data from pAryStrCustom: the readable size is (size_t)*40+8 bytes, but 80 bytes may be read.
I don't know if it is relevant, but the CUSTOM_ASSIGN_S structure is defined as:
typedef struct tagCustomAssignment
{
    int iIndex;
    CString strDescription;
    CString strHeading;
    BOOL bExcluded;
    CStringArray aryStrBrothersAll;
    CStringArray aryStrBrothersWT;
    CStringArray aryStrBrothersSM;
    BOOL bIncludeWT;
    BOOL bIncludeTMS;
    BOOL bFixed;
    int iFixedType;
} CUSTOM_ASSIGN_S;
My code has been functional for years, but is there a coding improvement I can make to address this issue? I have read the linked article and it is not clear to me. I have also seen this question (Reading Invalid Data c6385) along similar lines, but in my code I can't see how that applies.
Warning... the readable size is (size_t)*40+8 bytes, but 80 bytes may be read.
The wording for this warning is not accurate, because size_t is not a number, it's a data type. (size_t)*40+8 doesn't make sense. It's probably meant to be:
Warning... the readable size is '40+8 bytes', but '80 bytes' may be read.
This warning can be roughly reproduced with the following example:
//don't run this code, it's just for viewing the warning
size_t my_size = 1;
char* buf = new char[my_size];
buf[1];
//warning C6385: Reading invalid data from 'buf':
//the readable size is 'my_size*1' bytes, but '2' bytes may be read
The warning is correct and obvious: buf[1] is out of bounds. The analyzer sees that the allocation size for buf is my_size*1 bytes and that index 1 accesses byte '2'. I think the message is just printed incorrectly in your case, but the actual warning is valid.
In any case, just make sure iUsedAssign is within range
if (!psAssign->bExcluded && iUsedAssign < rsRowData.uNumCustomToFill)
{
...
}
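Applied to the loop in the question, the added bound check might look like this (a sketch only; it assumes, as in the question, that uNumCustomToFill is the element count passed to new[]):
for (iAssign = 0, iUsedAssign = 0; iAssign < iNumAssigns; iAssign++)
{
    psAssign = (CUSTOM_ASSIGN_S*)m_aryPtrAssign.GetAt(iAssign);
    if (psAssign != nullptr)
    {
        // The added test lets the analyzer see that iUsedAssign can never
        // index past the uNumCustomToFill elements allocated above.
        if (!psAssign->bExcluded && iUsedAssign < rsRowData.uNumCustomToFill)
        {
            pAryStrCustom[iUsedAssign].Copy(psAssign->aryStrBrothersAll);
            iUsedAssign++;
        }
    }
}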

Go equivalents of C types

What are the right equivalents of unsigned char and unsigned char* in Go? Or am I even doing this right?
I have this C++ class:
class ArcfourPRNG
{
public:
    ArcfourPRNG();
    void SetKey(unsigned char *pucKeyData, int iKeyLen);
    void Reset();
    unsigned char Rand();
private:
    bool m_bInit;
    unsigned char m_aucState0[256];
    unsigned char m_aucState[256];
    unsigned char m_ucI;
    unsigned char m_ucJ;
    unsigned char* m_pucState1;
    unsigned char* m_pucState2;
    unsigned char m_ucTemp;
};
I am trying to rewrite it in Go:
type ArcfourPRNG struct {
    m_bInit bool
    m_aucState0 [256]byte
    m_aucState [256]byte
    m_ucI, m_ucJ []byte
    *m_pucState1 []byte
    *m_pucState2 []byte
    m_ucTemp []byte
}
func (arc4 *ArcfourPRNG) SetKey(pucKeyData []byte, iKeyLen int) {
func (arc4 *ArcfourPRNG) Reset() {
func (arc4 *ArcfourPRNG) Rand() uint {
Well, I just started with go a few hours ago. So this is still confusing me.
A function
for(i=0; i<256; i++)
{
    m_pucState1 = m_aucState0 + i;
    m_ucJ += *m_pucState1 + *(pucKeyData+m_ucI);
    m_pucState2 = m_aucState0 + m_ucJ;
    // Swapping
    m_ucTemp = *m_pucState1;
    *m_pucState1 = *m_pucState2;
    *m_pucState2 = m_ucTemp;
    m_ucI = (m_ucI + 1) % iKeyLen;
}
memcpy(m_aucState, m_aucState0, 256); // copy(aucState[:], aucState0) ?
Hopefully this can clear a few things up for you.
For storing raw sequences of bytes, use a slice []byte. If you know exactly how long the sequence will be, you can use a fixed-size array instead, e.g. [256]byte, but you cannot resize it later.
While Go has pointers, it does not have pointer arithmetic. So you will need to use integers to index into your slices of bytes.
For storing single bytes, byte is sufficient; you don't want a slice of bytes. Where there are pointers in the C++ code used to point to specific locations in the array, you'll simply have an integer index value that selects one element of a slice.
Go strings are not simply sequences of bytes: they conventionally hold UTF-8 text, and iterating over them with range yields runes, which may span more than one byte each. So don't try to use strings for this algorithm.
To reimplement the algorithm shown, you do not need either pointers or pointer arithmetic at all. Instead of keeping pointers into the byte arrays as you would in C++, you'll use int indexes into the slices.
This is kind of hard to follow since it's virtually all pointer arithmetic. I would want to have a description of the algorithm handy while converting this (and since this is probably a well-known algorithm, that should not be hard to find). I'm not going to do the entire conversion for you, but I'll demonstrate with hopefully a simpler example. This prints each character of a string on a separate line.
C++:
const char *data = "Hello World";
const char *ptr = 0;
for (int i = 0; i < std::strlen(data); i++) {
    ptr = i + data;
    std::cout << *ptr << std::endl;
}
}
Go:
data := []byte("Hello World")
for i := 0; i < len(data); i++ {
    // No pointer needed here; just index the slice directly
    fmt.Printf("%c\n", data[i])
}
So, learn about Go slices, and when you do reimplement this algorithm you will likely find the code to be somewhat simpler, or at least easier to understand, than its C++ counterpart.
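As a rough, hypothetical sketch (unverified, trimmed to just the fields the loop touches, and keeping the question's names for comparison), the key-setup loop might translate to Go along these lines:
package main

import "fmt"

// Only the fields the loop above uses: fixed-size byte arrays for the
// state, single bytes for m_ucI and m_ucJ, and no pointer members at all.
type ArcfourPRNG struct {
    m_aucState0  [256]byte
    m_aucState   [256]byte
    m_ucI, m_ucJ byte
}

// SetKey mixes the key into the state; array indexing replaces the
// C++ pointer arithmetic (m_aucState0 + i, m_aucState0 + m_ucJ).
func (arc4 *ArcfourPRNG) SetKey(pucKeyData []byte, iKeyLen int) {
    for i := 0; i < 256; i++ {
        arc4.m_ucJ += arc4.m_aucState0[i] + pucKeyData[arc4.m_ucI]
        // Swapping, without the temporary or the two state pointers.
        arc4.m_aucState0[i], arc4.m_aucState0[arc4.m_ucJ] = arc4.m_aucState0[arc4.m_ucJ], arc4.m_aucState0[i]
        arc4.m_ucI = byte((int(arc4.m_ucI) + 1) % iKeyLen)
    }
    copy(arc4.m_aucState[:], arc4.m_aucState0[:]) // the memcpy at the end
}

func main() {
    var prng ArcfourPRNG
    for i := range prng.m_aucState0 {
        prng.m_aucState0[i] = byte(i)
    }
    key := []byte("example key")
    prng.SetKey(key, len(key))
    fmt.Println(prng.m_aucState[:8])
}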

Function isalnum(): unexpected results

For an assignment, I am using std::isalnum to determine whether the input is a letter or a number. The point of the assignment is to create a "dictionary." It works well on small paragraphs, but does horribly on pages of text. Here is the code snippet I am using.
custom::String string;
std::cin >> string;
custom::String original = string;
size_t size = string.Size();
char j;
size_t i = 0;
size_t beg = 0;
while( i < size)
{
    j = string[i];
    if(!!std::isalnum(static_cast<unsigned char>(j)))
    {
        --size;
    }
    if( std::isalnum( j ) )
    {
        string[i-beg] = tolower(j);
    }
    ++i;
}//end while
string.SetSize(size - beg, '\0');
The code presented, as I write this, does not make sense as a whole.
However, the calls to isalnum, as shown, would only work for plain ASCII, because
the C character classification functions require a non-negative argument, or else EOF as argument, and
in order to work for international characters,
the encoding must be single-byte per character, and
setlocale should have been called prior to using the functions.
Regarding the first of these three points, you can wrap std::isalnum like this:
using Byte = unsigned char;
auto is_alphanumeric( char const ch )
-> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }
where the !! is just to silence a silly warning from Visual C++ (a warning about "performance", of all things).
Disclaimer: code untouched by compiler's hands.
Addendum: if you don't have a C++11 compiler, but only C++03,
typedef unsigned char Byte;
bool is_alphanumeric( char const ch )
{
return !!std::isalnum( static_cast<Byte>( ch ) );
}
As Bjarne remarked, C++11 feels like a whole new language! ;-)
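For illustration, here is a small, hypothetical usage of the wrapper (the sample text, the lowercasing and main are mine, not from the question):
#include <cctype>
#include <iostream>
#include <string>

using Byte = unsigned char;

auto is_alphanumeric( char const ch )
    -> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }

int main()
{
    std::string const text = "Word #42, done!";
    for( char const ch : text )
    {
        if( is_alphanumeric( ch ) )
        {
            std::cout << static_cast<char>( std::tolower( static_cast<Byte>( ch ) ) );
        }
    }
    std::cout << "\n";    // prints "word42done"
}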
I was able to create a solution to the problem. I noticed that isalnum did take care of some non-alphanumerics, but not all the time. Since the code above is part of a function, I called it multiple times, with more refined results each time. I then came up with a do-while loop that stores the string's size, calls the function, stores the new size, and compares the two. If they are not the same, there is a chance the function needs to be called again; if they are the same, the string has been fully cleaned. I am guessing that the reason isalnum was not working well was that I was reading several chapters of a book into the string. Here is my code:
custom::string abc;
std::cin >> abc;
size_t first = 0;
size_t second = 0;
//clean the word
do {
    first = abc.Size();
    Cleanup(abc);
    second = abc.Size();
} while (first != second);

How can I store hexadecimals inside an array? C++ MFC

I have to use an array of hexadecimal values because I'm writing a program to communicate with a video server controller, and it only understands messages in hexadecimal. I can connect the video controller to my server, but when I try to send messages using the send() function, passing an array of unsigned char that contains my information in hexadecimal, it doesn't work.
This is how I am using the array. I don't know if it is correct.
void sendMessage()
{
    int retorno;
    CString TextRetorno;
    unsigned char HEX_bufferMessage[12]; // declaration
    // store info
    HEX_bufferMessage[0] = 0xF0;
    HEX_bufferMessage[1] = 0x15;
    HEX_bufferMessage[2] = 0x31;
    HEX_bufferMessage[3] = 0x02;
    HEX_bufferMessage[4] = 0x03;
    HEX_bufferMessage[5] = 0x00;
    HEX_bufferMessage[6] = 0x00;
    HEX_bufferMessage[7] = 0xD1;
    HEX_bufferMessage[8] = 0xD1;
    HEX_bufferMessage[9] = 0x00;
    HEX_bufferMessage[10] = 0x00;
    HEX_bufferMessage[11] = 0xF7;
    retorno = send(sckSloMo, (const char*) HEX_bufferMessage, sizeof(HEX_bufferMessage), 0);
    TextRetorno.Format("%d", retorno);
    AfxMessageBox(TextRetorno); // value = 12
    if (retorno == SOCKET_ERROR)
    {
        AfxMessageBox("Error Send!! =[ ");
        return;
    }
    return;
}
Pop quiz. What's the difference between:
int n = 0x0F;
and:
int n = 15;
If you said, "nothing," you're correct.
When assigning integral values, writing a 0x prefix for hexadecimal, a leading 0 for octal, or no prefix for decimal makes no difference in what is actually stored. This is a convenience for you, the programmer, only. These are integral variables we're talking about -- they store numeric data only. They don't store or care about radix. In fact, you might be surprised to learn that when you assign a numeric value to an integral variable, what is actually stored isn't decimal or hexadecimal or even octal -- it's binary.
Since you're storing these values as unsigned char, and char (unsigned or otherwise) is really just an integral type, then what you're doing is fine:
HEX_bufferMessage[0] = 0xF0;
HEX_bufferMessage[1] = 0x15;
HEX_bufferMessage[2] = 0x31;
but your question makes no sense:
Anyone knows if using an array of unsigned char is the right way to
store hexadecimals??
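As a small, hypothetical demonstration of the radix point (variable names are mine): the prefix only affects how you write the value in source and how you choose to format it for display, not what is stored.
#include <cstdio>

int main()
{
    unsigned char a = 0xF0; // written in hexadecimal
    unsigned char b = 240;  // the same value, written in decimal

    std::printf("equal: %s\n", a == b ? "yes" : "no"); // prints "equal: yes"
    std::printf("decimal: %d  hex: 0x%02X\n", a, static_cast<unsigned>(a)); // radix matters only for display
}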