How to determine why a C program outputs unexpected messages?


I'm trying to program in C in Eclipse. I have installed and configured MinGW, but I have a problem that I don't understand.
I have some simple code:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int num1,num2;
setbuf(stdout,NULL);
printf("enter two numbers");
scanf("%d%d",&num1,&num2);
if(num1>num2){
printf("num1 is greater than num2");
}else{
printf("num2 is greater than num1);
}
return 0;
}
After I compile and run it, the program prints "enter two numbers" and I enter two numbers. After that I see no further output and the keyboard no longer works in the console. There is no error message, but the console shows this strange output:
<terminated>
<terminated>(exit value: -1.073.741.515) CPS:exe

You're not reading in the values correctly:
scanf("%d%d",num1,num2);
The %d format specifier for scanf expects an int *, i.e. a pointer to an int, as its argument. scanf needs the address of a variable so that it can write the converted value into that variable.
You're instead passing the current values of num1 and num2, which are indeterminate because the variables have never been written to.
You instead want:
scanf("%d%d",&num1,&num2);
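It also helps to check how many values were actually converted. The following is a minimal sketch, not the asker's code: the helper name read_two_ints is made up for illustration, and it uses sscanf on a string instead of scanf on stdin so it can be tried without typing input. The pointer rule is the same for both functions.

```cpp
#include <cstdio>

// Hypothetical helper: parses two integers from a text buffer.
// Like scanf, sscanf needs an int* for every %d so it can store the result.
bool read_two_ints(const char* text, int* first, int* second) {
    // The return value is the number of items successfully assigned.
    return std::sscanf(text, "%d%d", first, second) == 2;
}
```

A caller would write read_two_ints("3 7", &num1, &num2) and only compare num1 and num2 when the function returns true.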

I don't know whether the code you posted is exactly what you compiled. When I copy and compile the code as posted, gcc reports an error at this line:
printf("num2 is greater than num1);
The closing quotation mark of the string passed to printf() is missing.
After I fixed this error, it worked well. My environment is
Linux DESKTOP-7VH0PN1 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
With gcc:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
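One further check that may help: the exit value shown in the question, -1.073.741.515, is easier to interpret as an unsigned 32-bit hexadecimal number. The sketch below does that conversion; the helper name exit_code_to_hex is made up for illustration. The result, 0xC0000135, is to my knowledge the Windows status code STATUS_DLL_NOT_FOUND. That would suggest the program could not start because a DLL it needs (with MinGW, possibly one of its runtime DLLs) was not found on the PATH. Treat this diagnosis as an assumption to verify, not a certainty.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: shows a signed 32-bit exit code as unsigned hex,
// the form in which Windows status codes are usually documented.
std::string exit_code_to_hex(std::int32_t code) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%08X",
                  static_cast<unsigned int>(static_cast<std::uint32_t>(code)));
    return buf;
}
```

For the value in the question, exit_code_to_hex(-1073741515) yields "0xC0000135".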

Related

clang AddressSanitizer instruments code improperly, false-positive result

FOREWORD
The current question is rather long and related to my master's thesis, so I humbly ask for your patience. About half a year ago I ran into the problem explained below. It needed an outside look, because at that point I was really stuck and had nobody to help me. In the end I gave up on it, but now I am back in business (a second wind, so to speak).
INTRODUCTION
Crucial technologies used in the project:
C++, llvm/clang 13.0.1, ASAN, libFuzzer
The underlying idea behind the project I was writing is:
Write a parser of C-code projects to find functions that are presumed to be vulnerable (for the purposes of this question it does not matter how I decide that they are vulnerable).
When I find a vulnerable function, I write fuzzer code for it with libFuzzer.
At this point I have an IR file with my vulnerable function and an IR file with my fuzzer code, so it is time to compile the two files separately. During compilation I have clang instrument them with ASAN and libFuzzer.
The two files are then linked together and I have an executable called, for example, 'fuzzer'. Theoretically, I can run this executable and libFuzzer will fuzz my vulnerable function.
ACTUAL PROBLEM (PART 1)
ASAN somehow instruments my code incorrectly. It gives me the wrong result.
How do I know that?
I found and took a vulnerable function. This function is from an old version of libcurl and is called sanitize_cookie_path. I reproduced the bug with AFL++ and it gave me what I wanted: if you pass a lone double-quote character (") to the function, it 'blows up'. I wanted to do something similar with libFuzzer and ASAN, but as I mentioned earlier, these two did not give me the expected result. Having spent some time on the problem, I believe that something is wrong with ASAN.
PROBLEM REPRODUCTION
I have the code (see below) in the file sanitize_cookie_path.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stddef.h>
static char* sanitize_cookie_path(const char* cookie_path) {
size_t len;
char* new_path = strdup(cookie_path);
if (!new_path) {
return NULL;
}
if (new_path[0] == '\"') {
memmove((void *)new_path, (const void*)(new_path + 1), strlen(new_path));
}
if (new_path[strlen(new_path) - 1] == '\"') {
new_path[strlen(new_path) - 1] = 0x0;
}
if (new_path[0] !='/') {
free(new_path);
new_path = strdup("/");
return new_path;
}
len = strlen(new_path);
if (1 < len && new_path[len - 1] == '/') {
new_path[len - 1] = 0x0;
}
return new_path;
}
int main(int argc, char** argv) {
if (argc != 2) {
exit(1);
}
sanitize_cookie_path("\""); /* a lone double quote, passed as a string rather than a char */
return 0;
}
My C++ code compiles it with the command:
clang -O0 -emit-llvm path/to/sanitize_cookie_path.c -S -o path/to/sanitize_cookie_path.ll > /dev/null 2>&1
At the IR level I remove 'main' from the above code, so that only the 'sanitize_cookie_path' function remains.
I generate simple fuzzer code (see below) for this function:
#include <cstdio>
#include <cstdint>
static char* sanitize_cookie_path(const char* cookie_path) ;
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
(void) sanitize_cookie_path((char*) data);
return 0;
}
Then I compile it with the command:
clang -O0 -emit-llvm path/to/fuzz_sanitize_cookie_path.cc -S -o path/to/fuzz_sanitize_cookie_path.ll > /dev/null 2>&1
The two IR files are compiled separately. NOTE that before this step I make some adjustments so that they fit each other. For instance, I drop the 'static' keyword and resolve the name mangling between the C++ and C code.
I compile them both together with the command:
clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=address,fuzzer -fsanitize-coverage=trace-cmp,trace-gep,trace-div path/to/sanitize_cookie_path.ll path/to/fuzz_sanitize_cookie_path.ll -o path-to/fuzzer > /dev/null 2>&1
The final 'fuzzer' executable is ready.
ACTUAL PROBLEM (PART 2)
If you execute the fuzzer program, it does not give you the same results as AFL++. My fuzzer crashes in the '__interceptor_strdup' function, ASAN's interceptor for strdup (see the error snippet below). The crash input reported by libFuzzer is literally empty (0 bytes), but ideally it should have found that the error is triggered by a double quote ("). Having done my own research, I concluded that ASAN instrumented the code incorrectly and gives me a false-positive result. Frankly speaking, I can fuzz the 'printf' function from stdio.h and get the same error.
[sanitize_cookie_path]$ ./fuzzer
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1016408680
INFO: Loaded 1 modules (11 inline 8-bit counters): 11 [0x5626d4c64c40, 0x5626d4c64c4b),
INFO: Loaded 1 PC tables (11 PCs): 11 [0x5626d4c64c50,0x5626d4c64d00),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
=================================================================
==2804==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000011 at pc 0x5626d4ba7671 bp 0x7ffe43152df0 sp 0x7ffe431525a0
READ of size 2 at 0x602000000011 thread T0
#0 0x5626d4ba7670 in __interceptor_strdup (/path/to/fuzzer+0xdd670)
#1 0x5626d4c20127 in sanitize_cookie_path (/path/to/fuzzer+0x156127)
#2 0x5626d4c20490 in LLVMFuzzerTestOneInput (/path/to/fuzzer+0x156490)
#3 0x5626d4b18940 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/path/to/fuzzer+0x4e940)
#4 0x5626d4b1bae6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (/path/to/fuzzer+0x51ae6)
#5 0x5626d4b1c052 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (/path/to/fuzzer+0x52052)
#6 0x5626d4b0100b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/path/to/fuzzer+0x3700b)
#7 0x5626d4af0297 in main (/path/to/fuzzer+0x26297)
#8 0x7f8e6442928f (/usr/lib/libc.so.6+0x2928f)
#9 0x7f8e64429349 in __libc_start_main (/usr/lib/libc.so.6+0x29349)
#10 0x5626d4af02e4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
I used gdb to step into strdup(cookie_path). gdb shows me that the fuzzer crashes at the address 0x0000555555631687.
0x0000555555631684 <+452>: mov %rbp,%rsi
0x0000555555631687 <+455>: addr32 call 0x555555674100 <_ZN6__asan18ReportGenericErrorEmmmmbmjb>
0x000055555563168d <+461>: pop %rax
WHAT I TRIED TO DO
I tried to instrument my sanitize_cookie_path.c and fuzz_sanitize_cookie_path.cc with ASAN right at the beginning rather than at the IR level, but whatever I did, nothing worked.
I passed the 'fuzzer' a so-called corpus directory with prepared input data. I even passed the quote explicitly to the 'fuzzer', but to no avail. Example (in the same directory as the fuzzer):
$ mkdir corpus/; echo "\"" > corpus/input; hexdump corpus/input
0000000 0a22
0000002
$ ./fuzzer corpus/
I also googled everything I could about libFuzzer and ASAN, but nothing helped.
I changed the compilation command: I got rid of '-fno-omit-frame-pointer' and '-fsanitize-coverage=trace-cmp,trace-gep,trace-div'.
If any of the details I have provided are unclear, do not hesitate to ask and I will clarify them.
What are some other sites/forums where I could get heard? Ideally I would like to contact the developers of ASAN.
I will be grateful for any help.
UPDATE 04/10/2022
llvm/clang have been upgraded from 13.0.1 to the latest available version in the Arch repository - 14.0.6. The problem still persists.
Opened an issue in the google/sanitizers repository.

Once more I have reread my question and comments, looked again at the code and additionally ran into this thought:
AddressSanitizer is not expected to produce false positives. If you
see one, look again; most likely it is a true positive!
As @Richard Critten and @chi correctly pointed out in the comments, the strdup function needs a null-terminated string, so I changed my solution
from
(void) sanitize_cookie_path((char*) data);
to
char* string_ = new char[size + 1];
memcpy(string_, data, size);
string_[size] = 0x0;
(void) sanitize_cookie_path(string_);
delete[] string_;
The above solution converts the raw byte array data into a null-terminated string string_ and passes it to the function. This solution works as expected.
It was just a stupid mistake that I had overlooked. Thanks again to @Richard Critten, @chi and everyone who tried to help.
Since there is no bug, I am going to retract my false accusations in google/sanitizers.
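The same null-termination step can be written without manual new/delete. This is only a sketch of an alternative, not the code from the answer: the helper name to_terminated is hypothetical, and it relies on std::string always keeping a '\0' after its last character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copies the fuzzer's raw bytes into a std::string.
// c_str() on the result is always terminated by '\0', which is what
// strdup (and therefore sanitize_cookie_path) requires.
std::string to_terminated(const std::uint8_t* data, std::size_t size) {
    if (size == 0) {
        return std::string();  // avoid touching 'data' when there is no input
    }
    return std::string(reinterpret_cast<const char*>(data), size);
}
```

Inside LLVMFuzzerTestOneInput the call would then be sanitize_cookie_path(to_terminated(data, size).c_str()). Since sanitize_cookie_path returns memory obtained from strdup, passing the returned pointer to free() should also keep the leak checker quiet.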

C++ std::regex segmentation fault

I have various data that I need to parse and get the weight out of it.
I'm using
C++11
std::regex
Debian 9.9
gcc 6.3.0
The problem is that a segmentation fault sometimes occurs, though very rarely.
The input that triggers the error mostly consists of just space and newline characters.
Here is the regex:
(?:\b(?:(kilogram\.*s*\.*|kg\.*s*\.*)(?:[^[:alnum:]])*)(?:\s*weight\s*)*(?:\s*is\s*|\s*are\s*)*)\W*([\d\.,]*\d+\b)|(?:(?:[\s\.]?|^)([\d\.,]*\d+)\W*(kilogram\.*s*\.*|kg\.*s*\.*)\b)
Example regex that works on regex101.com but throws a segmentation fault in C++ on my Debian server: regex101
Here are some more regex101 examples of input, just to quickly give an idea of what the regex is searching for.
Here is an example of C++ code that fails.
And here is the same C++ code that works, but using another online compiler (cpp.sh).
Can someone please help me to solve this segmentation fault problem?
Thank you.

I have the same issue with a simple regex .+ and [a-zA-Z0-9\\+/=]+.
I have tried different compilers: g++, clang++, clang-cl on Windows, and g++, clang++ on Linux (WSL).
On Windows, the application freezes and then ends suddenly. On Ubuntu (WSL), I get a segmentation fault.
The error happens for g++ on Windows with c++11, c++14, c++17 and also c++20.
Limit
In your example, your data (regex101) has 31275 characters which, I suppose, is too many for regex_match.
Here is the program I used to guess the maximal length of the data.
#include <iostream>
#include <regex>
int main(int argc, char **argv) {
int length = argc > 1 ? std::stoi(std::string(argv[1])) : 30000;
std::regex testRegex(".+");
std::string data = "";
for (int i = 0; i < length; ++i) {
data += "a";
}
std::cout << "Match: " << std::regex_match(data, testRegex) << std::endl;
return 0;
}
// Limit before crash (it's a bit random so the limit is not accurate)
// Windows 11
// clang++ Windows : 4999998
// clang-cl Windows : 4999998
// g++ Windows : 6833
// WSL Ubuntu 20.04
// clang++ WSL : 23804
// g++ WSL : 26187
How to solve
According to this test, the input has a size limit, and the application crashes if the limit is exceeded.
What you can do is:
Remove some unnecessary spaces before using regex_match
Split the data in half
On Windows, you can use clang++ to increase the limit to 5M chars
For me, I split my data in half because the regex [a-zA-Z0-9\\+/=]+ doesn't require the entire input.
If anybody knows how we can increase the limit (with some flags or #define), I am interested.
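One way to apply the "split the data" suggestion is to run the regex on one line at a time. Below is a minimal sketch, assuming that a match never needs to span a newline; the helper name search_per_line is hypothetical. As far as I know, the crashes come from the matcher's recursion depth growing with the input length, so shorter inputs keep it within the stack. A single very long line would still be a problem.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: searches each line separately so the regex engine
// never sees the whole (possibly huge) input at once.
// Assumption: no match needs to cross a line boundary.
std::vector<std::string> search_per_line(const std::string& data,
                                         const std::regex& re) {
    std::vector<std::string> hits;
    std::istringstream in(data);
    std::string line;
    while (std::getline(in, line)) {
        // Collect every match found in this line.
        for (auto it = std::sregex_iterator(line.begin(), line.end(), re);
             it != std::sregex_iterator(); ++it) {
            hits.push_back(it->str());
        }
    }
    return hits;
}
```

If the pattern only checks a character class, as with [a-zA-Z0-9\\+/=]+, a plain loop over the characters avoids the regex engine entirely and is another option.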

c++ (on Clion) for loop stops in the middle with no errors (exit code 0) [duplicate]

When using CLion I have found the output sometimes cuts off.
For example when running the code:
main.cpp
#include <stdio.h>
int main() {
int i;
for (i = 0; i < 1000; i++) {
printf("%d\n", i);
}
fflush(stdout); // Shouldn't be needed as each line ends with "\n"
return 0;
}
Expected Output
The expected output is obviously the numbers 0-999, each on a new line
Actual Output
After executing the code multiple times within CLion, the output often changes:
Sometimes it executes perfectly and shows all the numbers 0-999
Sometimes it cuts off at different points (e.g. 0-840)
Sometimes it doesn't output anything
The return code is always 0!
Running the code in a terminal (i.e. not in CLion itself)
However, the code outputs the numbers 0-999 perfectly when compiling and running the code using the terminal.
I have spent so much time on this thinking it was a problem with my code and a memory issue until I finally realised that this was just an issue with CLion.
OS: Ubuntu 14.04 LTS
Version: 2016.1
Build: #CL-145.258
Update
A suitable workaround is to run the code in debug mode (thanks to @olaf).

The consensus is that this is an IDE issue. Therefore, I have reported the bug.
A suitable workaround is to execute the code in debug mode (no breakpoint required).
I will update this question, as soon as this bug is fixed.
Update 1
WARNING: You should not change information in registry unless you have been asked specifically by JetBrains. Registry is not in the main menu for a reason! Use the following solution at your own risk!!!
JetBrains have contacted me and provided a suitable solution:
Go to the Find Action Dialog box (CTRL+SHIFT+A)
Search for "Registry..."
Untick run.processes.with.pty
Should then work fine!
Update 2
The bug has been added here:
https://youtrack.jetbrains.com/issue/CPP-6254
Feel free to upvote it!

Linux gdb 'examine' behavior

I'm using gdb to explore a core file on Linux and I noticed weird behavior when examining memory addresses:
(gdb) x/f 0xbd091a10
0xbd091a10: 0
(gdb) x/g 0xbd091a10
0xbd091a10: 65574
(gdb) x/f 0xbd091a10
0xbd091a10: 65574
These statements were run directly back to back, and I don't understand why examining as float returns inconsistent results. The value 65574 does make sense as it corresponds to the identity of the last loaded item by the process.
Does anyone know the reason for this?
Version details:
Linux mx534vm 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5)

There is no inconsistency in this. 'f' and 'g' are specifiers in different categories, 'f' for format and 'g' for unit size. Each specifier, when used, becomes the default for its own category which holds for all subsequent uses of 'x'. Thus, your two last commands are both equivalent to x/fg 0xbd091a10.

swprintf fails with unicode characters in xcode, but works in visual studio

While trying to convert some existing code to support Unicode characters, this problem popped up. If I try to pass a Unicode character (in this case I'm using the euro symbol) into any of the *wprintf functions, it fails, but seemingly only in Xcode. The same code works fine in Visual Studio, and I was even able to get a friend to test it successfully with gcc on Linux. Here is the offending code:
wchar_t _teststring[10] = L"";
int _iRetVal = swprintf(_teststring, 10, L"A¥€");
wprintf(L"return: %d\n", _iRetVal);
// print values stored in string to check if anything got corrupted
for (int i=0; i<wcslen(_teststring); ++i) {
wprintf(L"%d: (%d)\n", i, _teststring[i]);
}
In Xcode the call to swprintf returns -1, while in Visual Studio it succeeds and goes on to print the correct values for each of the 3 chars (65, 165, 8364).
I have googled long and hard for solutions. One suggestion that has come up a number of times is a call such as:
setlocale(LC_CTYPE, "UTF-8");
I have tried various combinations of arguments with this function with no success. On further investigation, it appears to return null if I try to set the locale to any value other than the default "C".
I'm at a loss as to what else I can try to solve this problem, and the fact that it works with other compilers/platforms just makes it all the more frustrating. Any help would be much appreciated!
EDIT:
Just thought I would add that when the swprintf call fails, it sets an error code (92), which is defined as:
#define EILSEQ 92 /* Illegal byte sequence */

It should work if you fetch the locale from the environment:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void) {
setlocale(LC_ALL, "");
wchar_t _teststring[10] = L"";
int _iRetVal = swprintf(_teststring, 10, L"A¥€");
wprintf(L"return: %d\n", _iRetVal);
// print values stored in string to check if anything got corrupted
for (int i=0; i<wcslen(_teststring); ++i) {
wprintf(L"%d: (%d)\n", i, _teststring[i]);
}
}
On my OS X 10.6, this works as expected with GCC 4.2.1, but when compiled with Clang 1.6, it places the UTF-8 bytes in the result string.
I could also compile this with Xcode (using the standard C++ console application template), but because graphical applications on OS X don't have the required locale environment variables, it doesn't work in Xcode's console. On the other hand, it always works in the Terminal application.
You could also set the locale to en_US.UTF-8 (setlocale(LC_ALL, "en_US.UTF-8")), but that is non-portable. Depending on your goal, there may be better alternatives to swprintf.
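Because both setlocale and swprintf can fail silently, it may be worth checking their results explicitly. A minimal sketch follows; the helper names use_environment_locale and format_wide_checked are made up for illustration and are not part of any library.

```cpp
#include <cerrno>
#include <clocale>
#include <cstdio>
#include <cstring>
#include <cwchar>

// Selects the locale named by the environment (LANG, LC_ALL, ...).
// setlocale returns a null pointer when the requested locale is unavailable.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: formats 'text' into 'out' and reports a failure.
// Returns the swprintf result: the number of wide characters written,
// or a negative value on error.
int format_wide_checked(wchar_t* out, std::size_t count, const wchar_t* text) {
    errno = 0;
    int written = std::swprintf(out, count, L"%ls", text);
    if (written < 0) {
        std::fprintf(stderr, "swprintf failed: %s\n", std::strerror(errno));
    }
    return written;
}
```

Calling use_environment_locale() first and printing a warning when it returns false makes the Xcode console case visible, where the locale environment variables are missing.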

If you are using Xcode 4+, make sure you have set an appropriate encoding for the files that contain your strings. You can find the encoding settings in the right pane under the "Text Settings" group.

Microsoft had planned to be compatible with other compilers starting with VS 2015, but in the end it never happened because of problems with legacy code (see link).
Fortunately, you can still enable ISO C (C99) behavior in VS 2015 by defining the _CRT_STDIO_ISO_WIDE_SPECIFIERS preprocessor macro. This is recommended when writing portable code.

I found that using "%S" (upper case) in the formatting string works.
"%s" is for 8-bit characters, and "%S" is for 16-bit or 32-bit characters.
See: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
I'm using Qt Creator 4.11, which uses Clang 10.
