const static char assignment with PROGMEM causes error in avr-g++ 5.4.0 - avr-gcc

I have a piece of code that was shipped as part of the XLR8 development platform, which formerly used a bundled version (4.8.1) of the avr-gcc/g++ compiler. I tried to use the latest version of avr-g++ included with my Linux distribution (Ubuntu 22.04), which is 5.4.0.
When running that compiler I get the following error, which actually seems to make sense to me; the error and the chunk of related code are below. In the bundled version of avr-g++ that was provided with the XLR8 board this was not an error, and I'm not sure why, because the code appears to be placing 16-bit words into an array of chars.
A couple of questions:
Can anyone explain the reason this worked with previous avr-gcc releases and was not considered an error?
Because sizeof is used in the snippet below to compute the for loop's terminal count, I think each element of the array was meant to be 16 bits wide. Is that accurate?
If the size of the element was 16 bits, then is the correct fix simply to make that array of type unsigned int rather than char?
/home/rich/.arduino15/packages/alorium/hardware/avr/2.3.0/libraries/XLR8Info/src/XLR8Info.cpp:157:12: error: narrowing conversion of ‘51343u’ from ‘unsigned int’ to ‘char’ inside { } [-Wnarrowing]
0x38BF};
bool XLR8Info::hasICSPVccGndSwap(void) {
  // List of chip IDs from boards that have Vcc and Gnd swapped on the ICSP header
  // Chip ID of affected parts are 0x????6E00. Store the ???? part
  const static char cidTable[] PROGMEM =
    {0xC88F, 0x08B7, 0xA877, 0xF437,
     0x94BF, 0x88D8, 0xB437, 0x94D7, 0x38BF, 0x145F, 0x288F, 0x28CF,
     0x543F, 0x0837, 0xA8B7, 0x748F, 0x8477, 0xACAF, 0x14A4, 0x0C50,
     0x084F, 0x0810, 0x0CC0, 0x540F, 0x1897, 0x48BF, 0x285F, 0x8C77,
     0xE877, 0xE49F, 0x2837, 0xA82F, 0x043F, 0x88BF, 0xF48F, 0x88F7,
     0x1410, 0xCC8F, 0xA84F, 0xB808, 0x8437, 0xF4C0, 0xD48F, 0x5478,
     0x080F, 0x54D7, 0x1490, 0x88AF, 0x2877, 0xA8CF, 0xB83F, 0x1860,
     0x38BF};
  uint32_t chipId = getChipId();
  for (int i = 0; i < sizeof(cidTable)/sizeof(cidTable[0]); i++) {
    uint32_t cidtoTest = (cidTable[i] << 16) + 0x6E00;
    if (chipId == cidtoTest) { return true; }
  }
  return false;
}

As you already pointed out, the array element type char definitely looks wrong. My guess is that this is a bug that may have never surfaced in the field: hasICSPVccGndSwap should always return false, so maybe they never used a chip type that had its pins swapped and got away with it.
Can anyone explain the reason this worked with previous avr-gcc releases and was not considered an error?
Yes, the error/warning behavior was changed with version 5.
As of G++ 5, the behavior is the following: When a later standard is in effect, e.g. when using -std=c++11, narrowing conversions are diagnosed by default, as required by the standard. A narrowing conversion from a constant produces an error, and a narrowing conversion from a non-constant produces a warning, but -Wno-narrowing suppresses the diagnostic.
I would've expected v4.8.1 to throw a warning at least, but maybe that has been ignored.
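For illustration, here is a minimal sketch (not taken from XLR8Info.cpp; the array names are made up) that reproduces the diagnostic under avr-g++ 5 with -std=c++11:
#include <stdint.h>
// error: narrowing conversion of '51343u' from 'unsigned int' to 'char' [-Wnarrowing]
const char bad[] = { 0xC88F };
// fine: the value fits into 16 bits, so there is no narrowing
const uint16_t good[] = { 0xC88F };
Before GCC 5 the same constant narrowing only produced a warning in C++11 mode, and -Wno-narrowing silences it entirely, which is presumably why the bundled 4.8.1 toolchain accepted the code.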
Because of the use of sizeof in the snippet below to control the for loop terminal count, I think the 16 bit size was supposed to be the data type per element of the array. Is that accurate?
Yes, this further supports that the array type should've been uint16_t in the first place.
If the size of the element was 16 bits, then is the correct fix simply to make that array of type unsigned int rather than char?
Yes.

Several bugs here. I am not familiar with that software, but there are at least the following obvious bugs:
The element type of cidTable should be a 16-bit, integral type like uint16_t. This follows from the code and also from the comments.
You cannot read from PROGMEM like that. The code as written reads from RAM, using the flash address as if it were a RAM address. Currently, the only way to read from flash in avr-g++ is inline assembly; to make life easier, you can use the macros from avr/pgmspace.h like pgm_read_word.
cidTable[i] << 16 is undefined behaviour because a 16-bit value is shifted left by 16 bits: the element is an 8-bit type, which is promoted to int, and int is only 16 bits wide on AVR. The same problem remains if the element type is made 16 bits wide.
Taking it all together, in order to make sense in avr-g++, the code would be something like:
#include <avr/pgmspace.h>

bool XLR8Info::hasICSPVccGndSwap()
{
    // List of chip IDs from boards that have Vcc and Gnd swapped on
    // the ICSP header. Chip ID of affected parts are 0x????6E00.
    // Store the ???? part.
    static const uint16_t cidTable[] PROGMEM =
    {
        0xC88F, 0x08B7, 0xA877, 0xF437, ...
    };

    uint32_t chipId = getChipId();

    for (size_t i = 0; i < sizeof(cidTable) / sizeof (*cidTable); ++i)
    {
        uint16_t cid = pgm_read_word (&cidTable[i]);
        uint32_t cidtoTest = ((uint32_t) cid << 16) + 0x6E00;
        if (chipId == cidtoTest)
            return true;
    }

    return false;
}

Related

Why g++ gives: "error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]"

I'm doing a UDP connection between a microcontroller and a computer.
The framework I'm using is C++ based and has a function to send a UDP packet with the following prototype:
bool UdpConnection::send(const char *data, int length)
int length is the number of bytes that data points to.
But I'm doing some input reading using a function that returns a uint16_t type.
I cannot change anything directly in those two functions.
Then I did the following:
UdpConnection udp;
uint16_t dummy = 256;
udp.send(reinterpret_cast<char*>(dummy),2);
But I'm a curious guy so I tried the following:
UdpConnection udp;
uint16_t dummy = 256;
udp.send((char*)dummy,2);
When I compile this last code I get:
error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
In my analysis both snippets do the same thing, so why do I get an error in the last one but not in the first?
EDIT:
The first snippet of code compiles but gives a segmentation fault when the code runs. So neither version works.
EDIT 2:
An efficient and tested solution for the problem, but not an answer to the original question, is this:
union Shifter {
    uint16_t b16;
    char b8[2];
} static shifter;
shifter.b16 = 256;
udp.send(shifter.b8, 2);
This solution is widely used, but it's not portable as it depends on CPU byte ordering, so test it in your application first to be sure.
I would assume this is correct:
udp.send(reinterpret_cast<char*>(&dummy),2);
Note the ampersand. Otherwise you are sending two bytes from address 256, which are probably random (at best). This is a microcontroller, so it might not crash.
The second version should be:
udp.send((char*)&dummy,2);
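If the receiver expects a specific byte order, a small sketch like the following (udp and send taken from the question, big-endian/network order assumed) serializes the value explicitly instead of reinterpreting the object's address:
#include <cstdint>
uint16_t dummy = 256;
char buf[2];
buf[0] = static_cast<char>(dummy >> 8);   // high byte first (assumed network order)
buf[1] = static_cast<char>(dummy & 0xFF); // low byte second
udp.send(buf, 2);
Unlike the union trick above, this produces the same bytes on every host.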

Following code crashes in x64 bit but not x86 build

I've always been using an XOR encryption class for my 32-bit applications, but recently I started working on a 64-bit one and encountered the following crash: https://i.stack.imgur.com/jCBlJ.png
Here's the xor class I'm using:
// xor.h
#pragma once

template <int XORSTART, int BUFLEN, int XREFKILLER>
class XorStr
{
private:
    XorStr();
public:
    char s[BUFLEN];
    XorStr(const char* xs);
    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; i++) s[i] = 0;
    }
};

template <int XORSTART, int BUFLEN, int XREFKILLER>
XorStr<XORSTART, BUFLEN, XREFKILLER>::XorStr(const char* xs)
{
    int xvalue = XORSTART;
    int i = 0;
    for (; i < (BUFLEN - 1); i++)
    {
        s[i] = xs[i - XREFKILLER] ^ xvalue;
        xvalue += 1;
        xvalue %= 256;
    }
    s[BUFLEN - 1] = (2 * 2 - 3) - 1;
}
The crash occurs when I try to use the obfuscated string, but it doesn't necessarily happen 100% of the time (it never happens on 32-bit, however). Here's a small example of a 64-bit app that will crash on the second obfuscated string:
#include <iostream>
#include "xor.h"

int main()
{
    // no crash
    printf(/*123456789*/XorStr<0xDE, 10, 0x017A5298>("\xEF\xED\xD3\xD5\xD7\xD5\xD3\xDD\xDF" + 0x017A5298).s);
    // crash
    printf(/*123456*/XorStr<0xE3, 7, 0x87E64A05>("\xD2\xD6\xD6\xD2\xD2\xDE" + 0x87E64A05).s);
    return 0;
}
The same app will run perfectly fine if built in 32 bit.
Here's the HTML script to generate the obfuscated strings: https://pastebin.com/QsZxRYSH
I need to tweak this class to work on 64-bit because I have a lot of already-encrypted strings that I need to import from a 32-bit project into the 64-bit one I'm working on at the moment. Any help is appreciated!
The access violation occurs because 0x87E64A05 is larger than the largest value a signed 32-bit integer can hold (which is 0x7FFFFFFF).
Because int is likely 32-bit, XREFKILLER cannot hold 0x87E64A05, so its value is implementation-defined.
That value is later used to subtract from xs again, after the passed pointer was artificially advanced by the literal 0x87E64A05; the literal is interpreted as long or long long (depending on whether long is 32-bit or larger) so that its value fits, and therefore does not go through the same implementation-defined narrowing.
The advance and the subtraction therefore no longer cancel, so you are effectively left with some random pointer in xs[i - XREFKILLER], and that gives undefined behaviour, e.g. an access violation.
When compiled for 32-bit x86 it just so happens that int and pointers have the same size, and the implementation-defined over-/underflow and narrowing behaviours are such that the addition and subtraction cancel as expected. If the pointer type is larger than 32 bits, however, this cannot work.
There is no point to XREFKILLER at all. It just does one calculation that is immediately reverted (if there is no over-/underflow).
Note that the fact that the compiler accepts the narrowing in the template argument at all is a bug. Your program is ill-formed and the compiler should give you an error message.
In GCC for example this bug persists until version 8.2, but has been fixed on current trunk (i.e. version 9).
You will have similar problems with XORSTART if char happens to be signed on your platform, because then your provided values won't fit into it. But in that case you will have to enable warnings, because that won't be a conversion making the program ill-formed. Also, the behavior of ^ may not be as you expect if char is signed on your system.
It is not clear what the point of
s[BUFLEN - 1] = (2 * 2 - 3) - 1;
is. It should be:
s[BUFLEN - 1] = '\0';
Passing the resulting string to printf as the first argument will lead to spurious undefined behavior if the result string happens to contain a %, which would be interpreted as the introduction of a format specifier. Use std::cout.
If you want to use printf you need to write std::printf and #include <cstdio> to guarantee that it will be available. However, since this is C++, you should be using std::cout instead anyway.
More fundamentally, your output string may happen to contain a 0 other than the terminating one after your transformation. This would be interpreted as the end of the C-style string. This seems like a major design flaw, and you probably want to use std::string instead for that reason (and because it is better style).
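Putting those points together, here is a hedged sketch of the class without the XREFKILLER round trip and with a plain null terminator; the simplified template parameters and the main below are illustrative, not the original generator's output:
#include <cstdio>

template <int XORSTART, int BUFLEN>
class XorStr
{
public:
    char s[BUFLEN];
    explicit XorStr(const char* xs)
    {
        int xvalue = XORSTART;
        for (int i = 0; i < BUFLEN - 1; ++i)
        {
            // go through unsigned char so a signed char cannot surprise us
            s[i] = static_cast<char>(static_cast<unsigned char>(xs[i]) ^ xvalue);
            xvalue = (xvalue + 1) % 256;
        }
        s[BUFLEN - 1] = '\0';   // plain null terminator
    }
    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; ++i) s[i] = 0;
    }
};

int main()
{
    // "123456789" encoded with start key 0xDE, as in the question's first string
    XorStr<0xDE, 10> str("\xEF\xED\xD3\xD5\xD7\xD5\xD3\xDD\xDF");
    std::puts(str.s);   // puts instead of printf(s) avoids the %-format pitfall
}
This behaves the same in 32-bit and 64-bit builds because no pointer is ever pushed out of range and brought back.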

comparison between signed and unsigned integer expressions and 0x80000000

I have the following code:
#include <iostream>
using namespace std;

int main()
{
    int a = 0x80000000;
    if(a == 0x80000000)
        a = 42;
    cout << "Hello World! :: " << a << endl;
    return 0;
}
The output is
Hello World! :: 42
so the comparison works. But the compiler tells me
g++ -c -pipe -g -Wall -W -fPIE -I../untitled -I. -I../bin/Qt/5.4/gcc_64/mkspecs/linux-g++ -o main.o ../untitled/main.cpp
../untitled/main.cpp: In function 'int main()':
../untitled/main.cpp:8:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(a == 0x80000000)
^
So the question is: Why is 0x80000000 an unsigned int? Can I make it signed somehow to get rid of the warning?
As far as I understand, 0x80000000 would be INT_MIN as it's out of range for a positive integer. But why is the compiler assuming that I want a positive number?
I'm compiling with gcc version 4.8.1 20130909 on Linux.
0x80000000 is an unsigned int because the value is too big to fit in an int and you did not add any L to specify it was a long.
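A quick way to check what type a literal actually gets is decltype; this is a sketch assuming C++11 and a platform with 32-bit int and 64-bit long (such as the x86-64 Linux toolchain from the question):
#include <type_traits>
static_assert(std::is_same<decltype(0x7FFFFFFF), int>::value, "fits in int");
static_assert(std::is_same<decltype(0x80000000), unsigned int>::value, "too big for int, so the first unsigned type that fits");
static_assert(std::is_same<decltype(0x80000000L), long>::value, "an L suffix makes it a (here 64-bit) long");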
The warning is issued because unsigned in C/C++ has quite weird semantics and therefore it's very easy to make mistakes in code by mixing up signed and unsigned integers. This mixing is often a source of bugs, especially because the standard library, by historical accident, chose to use an unsigned value for the size of containers (size_t).
An example I often use to show how subtle the problem is:
// Draw connecting lines between the dots
for (int i=0; i<pts.size()-1; i++) {
    draw_line(pts[i], pts[i+1]);
}
This code seems fine but has a bug. In case the pts vector is empty, pts.size() is 0 but, and here comes the surprising part, pts.size()-1 is a huge nonsense number (often 4294967295 today, but it depends on the platform) and the loop will use invalid indexes (with undefined behavior).
Here changing the variable to size_t i will remove the warning but is not going to help as the very same bug remains...
The core of the problem is that with unsigned values a < b-1 and a+1 < b are not the same thing even for very commonly used values like zero; this is why using unsigned types for non-negative values like container size is a bad idea and a source of bugs.
Also note that your code is not correct portable C++ on platforms where that value doesn't fit in an integer as the behavior around overflow is defined for unsigned types but not for regular integers. C++ code that relies on what happens when an integer gets past the limits has undefined behavior.
Even if you know what happens on a specific hardware platform note that the compiler/optimizer is allowed to assume that signed integer overflow never happens: for example a test like a < a+1 where a is a regular int can be considered always true by a C++ compiler.
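A tiny illustration of that last point (the function name is made up; the folding is something the optimizer is allowed to do, not guaranteed output):
#include <climits>
#include <iostream>

// Because signed overflow is undefined behaviour, the optimizer may assume
// a + 1 never wraps and fold this test to "return true".
bool always_less(int a) { return a < a + 1; }

int main()
{
    // with optimizations on (e.g. -O2) this commonly prints 1 even for INT_MAX
    std::cout << always_less(INT_MAX) << '\n';
}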
It seems you are confusing 2 different issues: The encoding of something and the meaning of something. Here is an example: You see a number 97. This is a decimal encoding. But the meaning of this number is something completely different. It can denote the ASCII 'a' character, a very hot temperature, a geometrical angle in a triangle, etc. You cannot deduce meaning from encoding. Someone must supply a context to you (like the ASCII map, temperature etc).
Back to your question: 0x80000000 is encoding, while INT_MIN is meaning. They are not interchangeable and not comparable. On specific hardware, in some contexts, they might be equal, just like 97 and 'a' are equal in the ASCII context.
The compiler warns you about ambiguity in the meaning, not in the encoding. One way to give meaning to a specific encoding is the casting operator, like (unsigned short)-17 or (student*)ptr.
On a 32-bit system, or a 64-bit one with backward compatibility, int and unsigned int have a 32-bit encoding like 0x80000000, but with a 64-bit int INT_MIN would not be equal to this number.
Anyway - the answer to your question: in order to remove the warning you must give identical context to both left and right expressions of the comparison.
You can do it in many ways. For example:
(unsigned int)a == (unsigned int)0x80000000 or (__int64)a == (__int64)0x80000000 or even a crazy (char *)a == (char *)0x80000000 or any other way as long as you maintain the following rules:
You don't demote the encoding (do not reduce the amount of bits it requires). Like (char)a == (char)0x80000000 is incorrect because you demote 32 bits into 8 bits
You must give both the left side and the right side of the == operator the same context. Like (char *)a == (unsigned short)0x80000000 is incorrect and will yield an error/warning.
I want to give you another example of how crucial the difference between encoding and meaning is. Look at the code:
char a = -7;
bool b = (a==-7) ? true : false;
What is the result of 'b'? The answer may shock you: it depends on the implementation.
Some compilers (typically Microsoft Visual Studio) will compile the program so that b gets true, while with Android NDK compilers b gets false.
The reason is that the Android NDK treats 'char' as 'unsigned char', while Visual Studio treats 'char' as 'signed char'. So on Android phones the encoding of -7 actually has a meaning of 249 and is not equal to the meaning of (int)-7.
The correct way to fix this problem is to specifically define 'a' as signed char:
signed char a = -7;
bool b = (a==-7) ? true : false;
0x80000000 is considered unsigned by default.
You can avoid the warning like this:
if (a == (int)0x80000000)
    a=42;
Edit after a comment:
Another (perhaps better) way would be
if ((unsigned)a == 0x80000000)
    a=42;

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, but it automatically aligns to the nearest 32 bits (on a 32-bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments, the input comes in tuples of 3 and I cannot change that.
I want to convert 3 values to an int for further processing, and this conversion should be as fast as possible.
The solution does not have to be cross platform. I am working with a very specific compiler and processor, so it can be assumed that it is a 32-bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why does d1 have the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use the bitwise shift operators (>> and <<) together with bitwise | and & to absorb char values into an int. Also consider using int32_t in case you build for targets with 16 or 64 bit ints.
There is no way, converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, 4); // assuming sizeof(int) is 4
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];
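For the exact situation described in the update (big-endian target, 3-byte tuples, low byte of the result unused), here is a hedged sketch built the same way; the function name read3 is made up:
#include <cstdint>

// Assembles the 3 bytes starting at offset p into the upper 3 bytes of a 32-bit
// value (0x030405?? style, matching the question); works at any alignment.
inline int32_t read3(const unsigned char* data, int32_t p)
{
    uint32_t v = (uint32_t)data[p]     << 24 |
                 (uint32_t)data[p + 1] << 16 |
                 (uint32_t)data[p + 2] << 8;
    return (int32_t)v;   // low byte is left as 0
}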

Byte array to int in C++

Would the following be the most efficient way to get an int16 (short) value from a byte array?
inline __int16* ReadINT16(unsigned char* ByteArray, __int32 Offset){
    return (__int16*)&ByteArray[Offset];
};
Assume the byte array contains a dump of the bytes in the same endian format as the machine this code is being called on. Alternatives are welcome.
It depends on what you mean by "efficient", but note that on some architectures this method will fail if Offset is odd, since the resulting 16-bit int will be misaligned and you will get an exception when you subsequently try to access it. You should only use this method if you can guarantee that Offset is even, e.g.
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset){
    assert((Offset & 1) == 0); // Offset must be a multiple of 2
    return *(int16_t*)&ByteArray[Offset];
};
Note also that I've changed this slightly so that it returns a 16-bit value directly, since returning a pointer and then subsequently de-referencing it will most likely be less "efficient" than just returning a 16-bit value directly. I've also switched to the standard fixed-width integer types - I recommend you do the same.
I'm surprised no one has suggested this yet for a solution that is both alignment safe and correct across all architectures. (well, any architecture where there are 8 bits to a byte).
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset)
{
    int16_t result;
    memcpy(&result, ByteArray + Offset, sizeof(int16_t));
    return result;
};
And I suppose the overhead of memcpy could be avoided:
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset)
{
    int16_t result;
    uint8_t* ptr1 = (uint8_t*)&result;
    uint8_t* ptr2 = ptr1 + 1;
    *ptr1 = *(ByteArray + Offset);      // apply the offset here as well
    *ptr2 = *(ByteArray + Offset + 1);
    return result;
};
I believe alignment issues don't generate exceptions on x86. And if I recall, Windows (when it ran on DEC Alpha and others) would trap the alignment exception and fix it up (at a modest perf hit). And I do remember learning the hard way that SPARC on SunOS just flat out crashes when you have an alignment issue.
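If the dump's byte order is known rather than assumed equal to the host's, a shift-based variant is another option; this sketch assumes a little-endian dump (the LE suffix is just an illustrative name) and never forms a misaligned pointer:
#include <stdint.h>

inline int16_t ReadINT16LE(const uint8_t* ByteArray, int32_t Offset)
{
    // low byte first; assembled in registers, so alignment never matters
    return (int16_t)((uint16_t)ByteArray[Offset] | (uint16_t)ByteArray[Offset + 1] << 8);
}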
inline __int16* ReadINT16(unsigned char* ByteArray, __int32 Offset)
{
    return (__int16*)&ByteArray[Offset];
};
Unfortunately this has undefined behaviour in C++, because you are accessing storage using two different types, which is not allowed under the strict aliasing rules. You can access the storage of a type using a char*, but not the other way around.
From previous questions I asked, the only safe way really is to use memcpy to copy the bytes into an int and then use that. (Which will likely be optimised to the same code you'd hope for anyway, so it just looks horribly inefficient.)
Your code will probably work, and most people seem to do this... But the point is that you can't go crying to your compiler vendor when one day it generates code that doesn't do what you'd hope.
I see no problem with this, that's exactly what I'd do. As long as the byte array is safe to access and you make sure that the offset is correct (shorts are 2 bytes so you may want to make sure that they can't do odd offsets or something like that)