GCC, G++ how to turn character too long warning into error - c++

I am writing a huge code base for web service backends in C++, and my frontend is in JavaScript, so I often write some frontend code and then go back to C++. More often than I would like, I commit the following crime somewhere: container['myKey'], which usually turns into a crash later. I have hot reload with cached compilation, which means that if I don't catch the compiler's multi-character literal warning the first time it appears, I won't see it in the next build (because that unit will probably not be recompiled).
It is a very stupid error that causes random crashes in my software, and sometimes it takes me hours to find the culprit and fix it. For some reason, the compiler warning for multi-character literals longer than 4 characters cannot be turned into an error in g++ (it doesn't have a warning option, unlike other warnings). That leaves me writing some kind of linter to put in my CI/CD to raise an error when a multi-character literal is found.
So my idea is to write a little bash script that will check all my .cpp files for any multi char literals and return an error if it finds any.
I tried to write a regex to catch those, but I failed. I would be very happy if you, a regex magician, could help me write a regex that catches 'things like this' but not "things 'like this'" (a call to query("SELECT * FROM data WHERE data LIKE ('potato')") should be valid).
Thanks in advance.
Edit: Additional information:
Following the solution proposed in the comments, I tried -Werror=multichar, and I found something rather curious.
It works only in a few cases:
g++ -o obj/src/main.ocpp -c src/main.cpp -fno-trapping-math -Werror=multichar -O3 -std=c++17 -D_FORCE_INLINES -I./src/include -I/usr/include/mysql
src/main.cpp:52:11: error: multi-character character constant [-Werror=multichar]
52 | langAll['test'] = true;
Because the warning changes if there are too many characters inside the single quotes:
g++ -o obj/src/main.ocpp -c src/main.cpp -fno-trapping-math -Werror=multichar -O3 -std=c++17 -D_FORCE_INLINES -I./src/include -I/usr/include/mysql
src/main.cpp:52:11: warning: character constant too long for its type
52 | langAll['testtttttt'] = true;
I found the GCC portion of code that blurts out this warning:
https://github.com/gcc-mirror/gcc/blob/master/libcpp/charset.cc
if (type == CPP_UTF8CHAR)
  max_chars = 1;
if (i > max_chars)
  {
    i = max_chars;
    cpp_error (pfile, type == CPP_UTF8CHAR ? CPP_DL_ERROR : CPP_DL_WARNING,
               "character constant too long for its type");
  }
else if (i > 1 && CPP_OPTION (pfile, warn_multichar))
  cpp_warning (pfile, CPP_W_MULTICHAR, "multi-character character constant");
Not sure how to turn this:
CPP_DL_WARNING,"character constant too long for its type");
into an error though.
After talking to ChatGPT, it suggested -Woverlength-strings -Werror=overlength-strings, but that doesn't turn this warning into an error either.
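Since -Werror=multichar does not cover the "too long" case, the linter idea from the question could also be done with a small scanner instead of a regex. The following is only a minimal sketch under stated assumptions: find_multichar_literals is a hypothetical name, raw string literals and digit separators (1'000) are not handled, and numeric escapes such as '\x41' are consumed greedily as one character.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based line numbers of character literals
// that contain more than one character (for example 'myKey').
// Quotes inside "string literals" and inside comments are ignored.
std::vector<std::size_t> find_multichar_literals(const std::string& src) {
    enum class State { Code, String, Char, LineComment, BlockComment };
    State state = State::Code;
    std::vector<std::size_t> hits;
    std::size_t line = 1;
    std::size_t count = 0;  // characters seen inside the current '...' literal

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        bool escape = false;
        switch (state) {
        case State::Code:
            if (c == '"') state = State::String;
            else if (c == '\'') { state = State::Char; count = 0; }
            else if (c == '/' && next == '/') state = State::LineComment;
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            break;
        case State::String:
            if (c == '\\') escape = true;
            else if (c == '"') state = State::Code;
            break;
        case State::Char:
            if (c == '\\') { escape = true; ++count; }
            else if (c == '\'') {
                if (count > 1) hits.push_back(line);
                state = State::Code;
            }
            else if (c == '\n') state = State::Code;  // stray apostrophe: resync
            else ++count;
            break;
        case State::LineComment:
            if (c == '\n') state = State::Code;
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; }
            break;
        }
        if (escape && i + 1 < src.size()) {
            ++i;  // skip the escaped character
            const char e = src[i];
            if (e == '\n') ++line;  // backslash-newline still ends a source line
            const bool numeric = e == 'x' || e == 'u' || e == 'U' ||
                                 std::isdigit(static_cast<unsigned char>(e));
            if (state == State::Char && numeric) {
                // Treat the remaining digits of a numeric escape as part of it.
                while (i + 1 < src.size() &&
                       std::isxdigit(static_cast<unsigned char>(src[i + 1]))) ++i;
            }
        }
        if (c == '\n') ++line;
    }
    assert(hits.empty() || hits.back() <= line);  // sanity check on line numbers
    return hits;
}
```

A CI step could read each .cpp file into a std::string, call this function, print the reported lines, and exit with a non-zero status when the result is not empty.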

Related

clang AddressSanitizer instructs code improperly, false-positive result

FOREWORD
The current question is pretty damn huge and related to my master's thesis, so I humbly ask for your patience. About half a year ago I encountered the problem explained below. It needed an outside look, because at that point I was really stuck and had nobody to help me. In the end I gave up on the problem, but now I am back in business (a second wind, let us put it that way).
INTRODUCTION
Crucial technologies used in the project:
C++, llvm/clang 13.0.1, ASAN, libFuzzer
The underlying idea behind the project I was writing is:
Write a parser of C-code projects to find functions that are presumed to be vulnerable (in the frames of the current question it does not matter how I decide that they are vulnerable)
When I find the vulnerable function, I start to write fuzzer code with libFuzzer for the function.
At this point I have an IR file with my vulnerable function and an IR file with my fuzzer code, so it is time to compile the two files separately. During compilation I instrument them with ASAN and libFuzzer using the clang compiler.
So the two files are coalesced together and I have an executable called, for example, 'fuzzer'. Theoretically, I can execute this executable and libFuzzer is going to fuzz my vulnerable function.
ACTUAL PROBLEM (PART 1)
ASAN somehow instruments my code badly. It gives me the wrong result.
How do I know that?
I found and took a vulnerable function. This function is from an old version of libcurl and is called sanitize_cookie_path. I reproduced the bug with AFL++ and it gave me what I wanted: if you pass a single quote character (") to the function, it is going to 'blow'. I wanted to do something similar with libFuzzer and ASAN, but as I mentioned earlier, these two did not give me the expected result. Having spent some time on the problem, I can say that something is wrong with ASAN.
PROBLEM REPRODUCTION
I have the code (see below) in the file sanitize_cookie_path.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stddef.h>
static char* sanitize_cookie_path(const char* cookie_path) {
    size_t len;
    char* new_path = strdup(cookie_path);
    if (!new_path) {
        return NULL;
    }
    if (new_path[0] == '\"') {
        memmove((void *)new_path, (const void*)(new_path + 1), strlen(new_path));
    }
    if (new_path[strlen(new_path) - 1] == '\"') {
        new_path[strlen(new_path) - 1] = 0x0;
    }
    if (new_path[0] !='/') {
        free(new_path);
        new_path = strdup("/");
        return new_path;
    }
    len = strlen(new_path);
    if (1 < len && new_path[len - 1] == '/') {
        new_path[len - 1] = 0x0;
    }
    return new_path;
}
int main(int argc, char** argv) {
    if (argc != 2) {
        exit(1);
    }
    sanitize_cookie_path("\"");
    return 0;
}
My C++ code compiles it with the command:
clang -O0 -emit-llvm path/to/sanitize_cookie_path.c -S -o path/to/sanitize_cookie_path.ll > /dev/null 2>&1
At the IR level I remove 'main' from the above code, so only the 'sanitize_cookie_path' function remains.
I generate the simple fuzzer code (see below) for this function:
#include <cstdio>
#include <cstdint>
static char* sanitize_cookie_path(const char* cookie_path) ;
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    (void) sanitize_cookie_path((char*) data);
    return 0;
}
Then I compile it with the command:
clang -O0 -emit-llvm path/to/fuzz_sanitize_cookie_path.cc -S -o path/to/fuzz_sanitize_cookie_path.ll > /dev/null 2>&1
The two IR files are compiled separately. NOTE that before the separate compilation I do some work to make them fit together. For instance, I drop the 'static' keyword and resolve the name mangling between the C++ and C code.
I compile them both together with the command:
clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=address,fuzzer -fsanitize-coverage=trace-cmp,trace-gep,trace-div path/to/sanitize_cookie_path.ll path/to/fuzz_sanitize_cookie_path.ll -o path-to/fuzzer > /dev/null 2>&1
The final 'fuzzer' executable is ready.
ACTUAL PROBLEM (PART 2)
If you execute the fuzzer program, it does not give you the same results as AFL++. My fuzzer crashes in the '__interceptor_strdup' function from some standard library (see error snippet below). The crash report written by libFuzzer is literally empty (0 bytes), but ideally it should have found that the error is triggered by a quote ("). Having done my own research, I concluded that ASAN instrumented the code badly and gives me a false-positive result. Frankly speaking, I can fuzz the 'printf' function from stdio.h and get the same error.
[sanitize_cookie_path]$ ./fuzzer
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1016408680
INFO: Loaded 1 modules (11 inline 8-bit counters): 11 [0x5626d4c64c40, 0x5626d4c64c4b),
INFO: Loaded 1 PC tables (11 PCs): 11 [0x5626d4c64c50,0x5626d4c64d00),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
=================================================================
==2804==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000011 at pc 0x5626d4ba7671 bp 0x7ffe43152df0 sp 0x7ffe431525a0
READ of size 2 at 0x602000000011 thread T0
#0 0x5626d4ba7670 in __interceptor_strdup (/path/to/fuzzer+0xdd670)
#1 0x5626d4c20127 in sanitize_cookie_path (/path/to/fuzzer+0x156127)
#2 0x5626d4c20490 in LLVMFuzzerTestOneInput (/path/to/fuzzer+0x156490)
#3 0x5626d4b18940 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/path/to/fuzzer+0x4e940)
#4 0x5626d4b1bae6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (/path/to/fuzzer+0x51ae6)
#5 0x5626d4b1c052 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (/path/to/fuzzer+0x52052)
#6 0x5626d4b0100b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/path/to/fuzzer+0x3700b)
#7 0x5626d4af0297 in main (/path/to/fuzzer+0x26297)
#8 0x7f8e6442928f (/usr/lib/libc.so.6+0x2928f)
#9 0x7f8e64429349 in __libc_start_main (/usr/lib/libc.so.6+0x29349)
#10 0x5626d4af02e4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
I used gdb to enter into the strdup(cookie_path). gdb shows me that the fuzzer tumbles down on the address 0x0000555555631687.
0x0000555555631684 <+452>: mov %rbp,%rsi
0x0000555555631687 <+455>: addr32 call 0x555555674100 <_ZN6__asan18ReportGenericErrorEmmmmbmjb>
0x000055555563168d <+461>: pop %rax
WHAT I TRIED TO DO
I tried to instrument my sanitize_cookie_path.c and fuzz_sanitize_cookie_path.cc with ASAN right at the beginning, not at the IR level, but whatever I did, nothing worked.
I passed to the 'fuzzer' the so called corpus directory with pre-cooked data to be passed to the fuzzer. I even passed the quote explicitly to the 'fuzzer', but nothing. Example (with the same directory as the fuzzer):
$ mkdir corpus/; echo "\"" > corpus/input; hexdump corpus/input
0000000 0a22
0000002
$ ./fuzzer corpus/
I also googled everything I could about libFuzzer and ASAN, but nothing gave me the results.
Changed compilation command. I got rid of the '-fno-omit-frame-pointer' and '-fsanitize-coverage=trace-cmp,trace-gep,trace-div'.
If there are some uncertainties in the details I have provided, do not hesitate to ask about them and I will iron them out to be more clear for you.
What are some other sites/forums where I can possibly get heard? I would ideally want to contact the developers of ASAN.
I will be more than happy for any help.
UPDATE 04/10/2022
llvm/clang have been upgraded from 13.0.1 to the latest available version in the Arch repository - 14.0.6. The problem still persists.
Opened an issue in the google/sanitizers repository.
Once more I have reread my question and comments, looked again at the code and additionally ran into this thought:
AddressSanitizer is not expected to produce false positives. If you
see one, look again; most likely it is a true positive!
As @Richard Critten and @chi correctly pointed out in the comments section, the strdup function needs a NULL-terminated string, so I changed my solution
from
(void) sanitize_cookie_path((char*) data);
to
char* string_ = new char[size + 1];
memcpy(string_, data, size);
string_[size] = 0x0;
(void) sanitize_cookie_path(string_);
delete[] string_;
The above solution converts the raw byte array data to a NULL-terminated string string_ and passes it to the function. This solution works as expected.
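The same idea can also be written without manual new/delete. This is only a sketch under assumptions: target is a hypothetical stand-in for sanitize_cookie_path, to_c_string is an illustrative helper name, and std::string is used because c_str() always returns a NUL-terminated buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C function under test. Like strdup-based
// code, it assumes its argument is NUL-terminated.
static std::size_t target(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Copies the raw fuzzer bytes into a std::string. The fuzzer buffer itself
// carries no terminator; the string's c_str() does.
std::string to_c_string(const std::uint8_t* data, std::size_t size) {
    if (data == nullptr || size == 0) return std::string();
    return std::string(reinterpret_cast<const char*>(data), size);
}

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    const std::string input = to_c_string(data, size);
    (void)target(input.c_str());  // safe: c_str() is NUL-terminated
    return 0;
}
```

If the function under test modifies its argument in place, a mutable copy (for example a std::vector<char> with an appended '\0') would be needed instead of c_str().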
It was just a stupid mistake that I had overlooked. Thanks again to @Richard Critten and @chi and everyone who tried to help.
Since there is no bug, I am going to retract my false accusations in google/sanitizers.

PX Transform Routine compile issues

I have a transformer routine written in C++ that is set to clear all whitespace and map to a value if the input string is either null or empty. The c++ code compiles and has tested properly, but I am having trouble getting the routine to work in Datastage.
As per instructions, I have copied the exact compiler options that I have in my DS Environment as below.
g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small BlankToValue.cpp
g++ -shared -m64 -o BlankToValue.so BlankToValue.o
When testing the routine in a job however I get the following error.
Sequential_File_36,0: Internal Error: (shbuf): iomgr/iomgr.C: 2649
Is there a different set of options I should be using for compilation?
For reference, the c++ code.
#include <stdlib.h>
#include <stdio.h>
#include <algorithm>
#include <locale.h>
#include <locale>
char * BlankToValue(char *InStr, char *RepStr)
{
    if (InStr[0] == '\0') // Check whether the input string is empty (first character is the null terminator).
    {
        return RepStr; // Return replacement string if true. This is to prevent unnecessary processing.
    } else
    {
        const char* checkstr = InStr; // Read pointer that starts at the beginning of the input string.
        do { // Start outer loop.
            while (isspace(*checkstr)) { // Inner loop while current checkstring byte is whitespace.
                ++checkstr; // Increment to next checkstring byte.
            }
        } while ((*InStr++ = *checkstr++)); // Copy current checkstring byte to inputstring and advance both. Stops after the null terminator has been copied.
        *InStr = '\0'; // Set null terminator for input string at current byte location.
        if (InStr[0] == '\0') // Check whether the character at the current position is the null terminator.
        {
            return RepStr; // Return replacement string if true.
        } else
        {
            return InStr; // Return new input string if false.
        }
    }
}
William,
In your DataStage routine definition that points to this custom function, did you select the routine type as object (a .o file that is compiled into the transformer stage at job run time) or as library (a lib.so file that is loaded at job run time, which has requirements on the library naming convention and must be located in the library path)? Your code above suggests you are creating a *.so file that is not prefixed with lib. Here is an example:
https://www.ibm.com/support/pages/node/403041
Additionally, if the first error in job log was not a library load error but rather was the internal error (shbuf) error, I found a couple of cases where that has occurred in the past with custom routines:
Custom routine involved null handling, as does yours, and began to fail after upgrading to Information Server 8.5 when null handling rules changed in our product. The changes are explained here:
https://www.ibm.com/support/pages/node/433863
You could test if this is the issue by running the job with new job level environment variable: APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING=1
In another case, the shbuf error in a custom routine was the result of the transformer stage receiving a large record (larger than could be handled by the datatype defined in the custom routine). Does the job still fail when using only a single sample input record with small values in all string type fields?
Thanks.
Also, I noticed that the error is coming from the sequential file stage rather than the transformer stage that uses the custom routine. Thus you may also need to consider what the output datatype of your custom routine is, and ensure that it exits with a valid value that is not too large for the datatype and also not larger than the default transport buffer size used between stages (defaults to 128k).
After a day or two of multiple attempts to try different compile and code approaches I found the solution to my problem. The below code was throwing a segmentation fault when passed a null column. Which makes sense in retrospect.
if (InStr[0] == '\0')
It has been corrected to the below and now everything works properly.
if ((InStr == NULL) || (InStr[0] == '\0'))
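For illustration, here is a sketch of the routine with the null check in place. It is not the code that was deployed: BlankToValueSketch and blank_to_value_demo are hypothetical names. It also keeps a separate write pointer, because the posted loop advances InStr itself, so by the final check InStr points past the copied terminator rather than at the start of the cleaned string.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Strips all whitespace from `in` in place and returns it, or returns
// `replacement` when `in` is a null pointer, empty, or whitespace only.
char* BlankToValueSketch(char* in, char* replacement) {
    if (in == nullptr || in[0] == '\0') {
        return replacement;  // null column or empty string
    }
    char* out = in;  // write position; `in` keeps pointing at the start
    for (const char* p = in; *p != '\0'; ++p) {
        if (!std::isspace(static_cast<unsigned char>(*p))) {
            *out++ = *p;  // keep non-whitespace bytes, compacting in place
        }
    }
    *out = '\0';
    return (in[0] == '\0') ? replacement : in;
}

// Small self-check showing the intended behaviour.
void blank_to_value_demo() {
    char text[] = " a b ";
    char fallback[] = "N/A";
    assert(std::strcmp(BlankToValueSketch(text, fallback), "ab") == 0);
    assert(BlankToValueSketch(nullptr, fallback) == fallback);
}
```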

Stack is totally messed up by trying to produce a buffer overflow

After hours of debugging without any success, I hope to find some help here on StackOverflow.
I'm currently on a PTP training, and because I only use Linux, I also want to practice the very first labs on my local machine.
What I have to do is exploit a very simple program via a buffer overflow. Only the sources are given:
goodpwd.cpp:
#include <iostream>
#include <cstring>
int bf_overflow(char *str){
    char buffer[10]; //our buffer
    strcpy(buffer,str); //the vulnerable command
    return 0;
}
int good_password(){ // a function which is never executed
    printf("Valid password supplied\n");
    printf("This is good_password function \n");
    return 0;
}
int main(int argc, char *argv[])
{
    int password=0; // controls whether password is valid or not
    printf("You are in goodpwd.exe now\n");
    bf_overflow(argv[1]); //call the function and pass user input
    if ( password == 1) {
        good_password(); //this should never happen
    }
    else {
        printf("Invalid Password!!!\n");
    }
    printf("Quitting sample1.exe\n");
    return 0;
}
I compiled it to get an executable by using
gcc -fno-stack-protector -z execstack -o goodpwd goodpwd.cpp -ggdb -m32 -lstdc++ -no-pie -O0
(I also already tried it without -no-pie and -O0 but I thought maybe the optimization could be the problem..)
I used gdb to debug the executable:
gdb goodpwd -tui -q
After setting a breakpoint to line 6 (the one with the vulnerable strcpy) I executed the following command:
(gdb) run AAAAAAAAAAAAAABCDE
after pressing n to go to the next line, I had a look into the stack:
(gdb) x/20x $esp
this gave me the following result:
0xffffd6f0: 0xffffd748 0x4141a8b0 0x41414141 0x41414141
0xffffd700: 0x41414141 0x45444342 0xffffd700 0x0804923b
0xffffd710: 0xffffd99c 0xf7fe4bd0 0xffffd800 0x08049209
0xffffd720: 0x00000002 0xffffd7f4 0xffffd800 0x00000000
0xffffd730: 0x0804c000 0x00000002 0x08049080 0xffffd760
I cannot explain to myself why:
there are two A's at 0xffffd6f4
there are no A's at 0xffffd6f6
I got 16 A's starting at 0xffffd6f8
I got EDCB at 0xffffd704 (because of little endian, thank you @1201ProgramAlarm)
$ebp is 0xffffd708 and $eip is 0x80491a7, but after two more steps (leaving the function) $eip is set to 0x804923e, although from all I've learned I'm pretty sure it should be 0x08049209
after those two steps I get this error: main (argc=<error reading variable: Cannot access memory at address 0x4141a8b0>,
argv=<error reading variable: Cannot access memory at address 0x4141a8b4>) at goodpwd.cpp:21
I'd really appreciate if there's someone who's able to help me.
Struggling in module 3 of 43 is not the best feeling I've ever got :D
Edit:
ASLR should be deactivated:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Maybe it was a bit too late yesterday, but today I found out that @1201ProgramAlarm made a very good point.
Because I'm on a little-endian system, the value at 0xffffd704 was right.
My confusion about 0xffffd6f4 and 0xffffd6f6 was irrelevant because they did not influence the result.
The old $EIP value was still at 0xffffd70e, and I never touched it.
I just had to extend the string in the argument, and afterwards I was able to exploit the vulnerability.
It was a lot of fun. Thanks for the advice.
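The little-endian point can be checked with a small sketch (the function names are illustrative only). gdb's x/20x prints 32-bit words, so the bytes B, C, D, E stored at increasing addresses show up as 0x45444342. Likewise, the word 0x4141a8b0 at 0xffffd6f4 holds 0x41 ('A') in its two highest bytes, which live at 0xffffd6f6 and 0xffffd6f7.

```cpp
#include <cassert>
#include <cstdint>

// Combines four bytes, given in memory order, the way a little-endian CPU
// reads them as one 32-bit word: the first byte is the least significant.
std::uint32_t little_endian_word(const unsigned char bytes[4]) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Returns the byte stored `offset` bytes (0..3) after the word's address
// on a little-endian machine.
unsigned char byte_at(std::uint32_t word, unsigned offset) {
    assert(offset < 4);
    return static_cast<unsigned char>((word >> (8 * offset)) & 0xFFu);
}
```

For example, little_endian_word applied to the bytes 'B', 'C', 'D', 'E' yields 0x45444342, and byte_at(0x4141a8b0, 2) yields 0x41.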

C++ OpenSSL EVP_DigestVerify intermittently failing rsa_pk1.c:103 with RSA_R_BAD_PAD_BYTE_COUNT

I have included a full, basic code example which can be compiled with the following command (with boost installed) "g++ -std=c++11 -g test.cpp -lssl -lcrypto -lboost_system"
The code I have pretty closely follows the OpenSSL EVP Signing and Verifying Asymmetric example with a bit of Boost refactoring for memory management.
However, its behaviour is very intermittent and changes with different keys as well as different text. I am pretty sure I am missing something here, but due to time pressure I am about to just make a secure system call to the openssl utility, which I have had no issues with.
The test output below illustrates the issue. "A" and "AAA" are signed and verified successfully, whereas "AA" fails with an RSA_R_BAD_PAD_BYTE_COUNT padding error. To try and correct this, I set the padding to PKCS, but it made no difference.
2048 bit [2] - Text: A - Success
Authentication failure: 67567722, rsa_pk1.c, 103, , 0 257
Error string: error:0407006A:lib(4):func(112):reason(106)
2048 bit [2] - Text: AA - Failure
2048 bit [2] - Text: AAA - Success
Any pointers here would be much appreciated!
The issue has to do with the way you're converting char* into string inside testSignVerify:
string hashString((char*)pHash.get());
should be:
string hashString((char*)pHash.get(), hashLength);
This is because the string constructor that takes only a char* stops at the first '\0' it encounters, which is OK for normal strings, but in a cryptographic hash '\0' is just one of the possible byte values.
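The difference between the two constructors can be seen in a minimal sketch. bytes_to_string is a hypothetical helper name, and the byte values in the usage note are made up for illustration rather than taken from a real digest.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a std::string from binary data using the (pointer, length)
// constructor, so embedded '\0' bytes are preserved.
std::string bytes_to_string(const unsigned char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    if (length == 0) return std::string();
    return std::string(reinterpret_cast<const char*>(data), length);
}
```

With the bytes {0x41, 0x00, 0x42, 0x43}, constructing a std::string from the pointer alone gives a string of size 1, while bytes_to_string with length 4 keeps all four bytes. That would explain why only some inputs fail: only some hashes happen to contain a zero byte.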

Print to locate the error code block

Hi great C++ programmers,
I've been in this situation many times, but I still don't know how to solve it.
I am programming in C++ in RStudio, and my program runs into a fatal error which requires a restart. I want to locate where the mistake is in my code. What I often do is add some detection lines like:
int main()
{
    ...code block;
    std::cout<<"1.1\n";
    ...code block;
    std::cout<<"1.2\n";
    ...code block;
    std::cout<<"1.3\n";
    ...code block;
    std::cout<<"1.4\n";
    return 1;
}
Then I run the code.
The weird thing is, before it runs into the "fatal error", sometimes I get "1.1" printed on the console, sometimes "1.1" and "1.2", sometimes "1.1", "1.2" and "1.3", and sometimes nothing.
I guess it has something to do with the operating system, because it is the operating system that receives the order to print something, which takes some time, while meanwhile the CPU keeps executing the code and hits the fatal error.
Or maybe it's the way the code was compiled that results in such a thing?
Is there any way to solve it? I just want the program to print out everything I asked for before it runs into the "fatal error".
Thanks!
The compiler optimization could reorder the code, making the printouts unreliable. You are coding C++ in the R environment, and the default g++ compiler optimization is set to -O2. Go to your R directory, search for a file named "Makeconf". Open it with a text editor, locate command "CXX11FLAGS = -O2 ...", erase "-O2", save the file and rebuild your program. Other commands like "CXXFLAGS = -O2 ..." may also need to be modified. Code will run in strictly sequential order by then.
You are using std::cout which is not what R uses, and you get caught up in two different buffering schemes.
Easy solution: use Rcpp::Rcout instead which redirects into R's output stream.
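A small helper can make the checkpoints independent of buffering by flushing after every marker. This is a sketch under assumptions: checkpoint and checkpoint_demo are hypothetical names, and the helper takes any std::ostream, so in an Rcpp build one would pass Rcpp::Rcout (which is a std::ostream) instead of std::cout.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Writes a marker line and flushes immediately, so the marker has left the
// stream's buffer before the next statement runs.
std::ostream& checkpoint(std::ostream& out, const std::string& label) {
    out << label << '\n';
    out.flush();
    return out;
}

// Self-check with an in-memory stream: the markers appear in call order.
void checkpoint_demo() {
    std::ostringstream log;
    checkpoint(log, "1.1");
    checkpoint(log, "1.2");
    assert(log.str() == "1.1\n1.2\n");
}
```

In the question's code, each std::cout<<"1.1\n"; line would become checkpoint(Rcpp::Rcout, "1.1"); and so on.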