Setting stack size with setrlimit fails with gcc - c++

I'm using gcc 10.1 on Ubuntu 18.04. I'm getting segfaults when defining a large stack-allocated variable, even though my stack seems to be large enough to accommodate it. Here is a code snippet:
#include <iostream>
#include <array>
#include <cerrno>
#include <cstdint>
#include <sys/resource.h>
using namespace std;

int main() {
    if (struct rlimit rl{1 << 28, 1l << 32}; setrlimit(RLIMIT_STACK, &rl))
        cout << "Cannot set stack size! errno = " << errno << endl;
    else
        cout << "Stack size: " << rl.rlim_cur / (1 << 20) << "MiB to " << rl.rlim_max / (1 << 20) << "MiB\n";
    array<int8_t, 100'000'000> a;
    cout << (int)a[42] << endl;
}
which segfaults when compiled with gcc, but runs fine when compiled with clang 11.0.1 and outputs:
Stack size: 256MiB to 4096MiB
0
EDIT
Clang was eliding the allocation of a. Here is a better example:
#include <iostream>
#include <array>
#include <cerrno>
#include <cstdint>
#include <sys/resource.h>
using namespace std;

void f() {
    array<int8_t, 100'000'000> a;
    cout << (long)&a[0] << endl;
}

int main() {
    if (struct rlimit rl{1 << 28, 1l << 32}; setrlimit(RLIMIT_STACK, &rl))
        cout << "Cannot set stack size! errno = " << errno << endl;
    else
        cout << "Stack size: " << rl.rlim_cur / (1 << 20) << "MiB to " << rl.rlim_max / (1 << 20) << "MiB" << endl;
    array<int8_t, 100'000'000> a; // line 21
    cout << (long)&a[0] << endl;  // line 23
    f();
}
which you can find at: https://wandbox.org/permlink/XMaGFMa7heWfI9G8. It runs fine when the two lines marked "line 21" and "line 23" are commented out, but segfaults otherwise.

Use proc(5), pmap(1), and strace(1) to understand the limitations of your computer.
array<int8_t, 100'000'000> a;
requires about 100 MB of space on the call stack, which is generally limited to a few megabytes (perhaps even by your Linux kernel, but I am not sure).
Try also cat /proc/$$/limits in your terminal. On mine I get:
Limit             Soft Limit   Hard Limit   Units
Max cpu time      unlimited    unlimited    seconds
Max file size     unlimited    unlimited    bytes
Max data size     unlimited    unlimited    bytes
Max stack size    8388608      unlimited    bytes
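If you just need a bigger stack for a single run, you can also raise the soft limit from the shell before launching the program (a hedged aside: ulimit -s is a bash builtin and takes KiB on Linux, so 262144 means 256 MiB):
ulimit -s 262144
./a.out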
The difference in behavior between compilers might be attributed to various optimizations (e.g. ones permitted by the C++ standard, such as n4849). A clever enough compiler is allowed to use just a few words for a inside your function f (e.g. because it could figure out, maybe with abstract-interpretation techniques, that locations a[1024] ... a[99999999] are useless).
If you compile with a recent GCC (e.g. GCC 10), you could invoke it as g++ -O -Wall -Wextra -fanalyzer -Wstack-usage=2048 to get useful warnings. See also this draft report funded by the CHARIOT & DECODER projects.
In practice, use dynamic allocation for huge data (e.g. placement new with mmap(2) and smart pointers), as sketched below.
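For illustration, here is a minimal sketch of that mmap(2)-plus-smart-pointer idea (my own example, not from the question; it assumes Linux, relies on anonymous mappings being zero-filled, and skips placement new since int8_t is trivially constructible; for non-trivial element types you would placement-new the objects into the mapping):

#include <sys/mman.h>
#include <cstdint>
#include <iostream>
#include <memory>

int main() {
    constexpr std::size_t n = 100'000'000;
    // Ask the kernel for anonymous pages outside the call stack.
    void* raw = mmap(nullptr, n, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) { std::cerr << "mmap failed\n"; return 1; }
    // Smart pointer whose deleter unmaps the region on scope exit.
    auto unmap = [n](std::int8_t* p) { munmap(p, n); };
    std::unique_ptr<std::int8_t[], decltype(unmap)> a(
        static_cast<std::int8_t*>(raw), unmap);
    std::cout << (int)a[42] << std::endl; // anonymous pages are zero-filled
}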
For a real application, consider writing your own GCC plugin to get ad-hoc warnings.
Or at least compile your source code foo.cc with g++ -O2 -fverbose-asm -S foo.cc and look inside the generated foo.s, then repeat with clang++: the generated assembler files are different.

Related

Calculating a file's mean value of data bytes

Just for fun, I am trying to calculate a file's mean value of data bytes, essentially replicating a feature available in an already existing tool (ent). Basically, it is simply the result of summing all the bytes of a file and dividing by the file length. If the data are close to random, this should be about 127.5. I am testing 2 methods of computing the mean value, one is a simple for loop which works on an unordered_map and the other is using std::accumulate directly on a string object.
Benchmarking both methods shows that using std::accumulate is much slower than the simple for loop. Also, measured on my system, clang++ is on average about 4 times faster than g++ for the accumulate method.
So here are my questions:
Why does the for loop method produce bad output at around 2.5 GB of input with g++ but not with clang++? My guess is that I am doing something wrong (probably UB) that happens to work with clang++. (solved and code modified accordingly)
Why is the std::accumulate method so much slower on g++ with the same optimization settings?
Thanks!
Compiler info (target is x86_64-pc-linux-gnu):
clang version 11.1.0
gcc version 11.1.0 (GCC)
Build info:
g++ -Wall -Wextra -pedantic -O3 -DNDEBUG -std=gnu++2a main.cpp -o main-g
clang++ -Wall -Wextra -pedantic -O3 -DNDEBUG -std=gnu++20 main.cpp -o main-clang
Sample file (using random data):
dd if=/dev/urandom iflag=fullblock bs=1G count=8 of=test-8g.bin (example for 8GB random data file)
Code:
#include <chrono>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
auto main(int argc, char** argv) -> int {
    using std::cout;
    std::filesystem::path file_path{};
    if (argc == 2) {
        file_path = std::filesystem::path(argv[1]);
    } else {
        return 1;
    }
    std::string input{};
    std::unordered_map<char, int> char_map{};
    std::ifstream istrm(file_path, std::ios::binary);
    if (!istrm.is_open()) {
        throw std::runtime_error("Could not open file");
    }
    const auto file_size = std::filesystem::file_size(file_path);
    input.resize(file_size);
    istrm.read(input.data(), static_cast<std::streamsize>(file_size));
    istrm.close();
    // store frequency of individual chars in unordered_map
    for (const auto& c : input) {
        if (!char_map.contains(c)) {
            char_map.insert(std::pair<char, int>(c, 1));
        } else {
            char_map[c]++;
        }
    }
    double sum_for_loop = 0.0;
    cout << "using for loop\n";
    // start stopwatch
    auto start_timer = std::chrono::steady_clock::now();
    // for loop method
    for (const auto& item : char_map) {
        sum_for_loop += static_cast<unsigned char>(item.first) * static_cast<double>(item.second);
    }
    // stop stopwatch
    cout << std::chrono::duration<double>(std::chrono::steady_clock::now() - start_timer).count() << " s\n";
    auto mean_for_loop = sum_for_loop / static_cast<double>(input.size());
    cout << std::fixed << "sum_for_loop: " << sum_for_loop << " size: " << input.size() << '\n';
    cout << "mean value of data bytes: " << mean_for_loop << '\n';
    cout << "using accumulate()\n";
    // start stopwatch
    start_timer = std::chrono::steady_clock::now();
    // accumulate method, but is slow (much slower in g++)
    auto sum_accum = std::accumulate(
        input.begin(), input.end(), 0.0,
        [](auto current_val, auto each_char) { return current_val + static_cast<unsigned char>(each_char); });
    // stop stopwatch
    cout << std::chrono::duration<double>(std::chrono::steady_clock::now() - start_timer).count() << " s\n";
    auto mean_accum = sum_accum / static_cast<double>(input.size());
    cout << std::fixed << "sum_for_loop: " << sum_accum << " size: " << input.size() << '\n';
    cout << "mean value of data bytes: " << mean_accum << '\n';
}
Sample output from 2GB file (clang++):
using for loop
2.024e-05 s
sum_for_loop: 273805913805 size: 2147483648
mean value of data bytes: 127.500814
using accumulate()
1.317576 s
sum_for_loop: 273805913805.000000 size: 2147483648
mean value of data bytes: 127.500814
Sample output from 2GB file (g++):
using for loop
2.41e-05 s
sum_for_loop: 273805913805 size: 2147483648
mean value of data bytes: 127.500814
using accumulate()
5.269024 s
sum_for_loop: 273805913805.000000 size: 2147483648
mean value of data bytes: 127.500814
Sample output from 8GB file (clang++):
using for loop
1.853e-05 s
sum_for_loop: 1095220441576 size: 8589934592
mean value of data bytes: 127.500440
using accumulate()
5.247585 s
sum_for_loop: 1095220441576.000000 size: 8589934592
mean value of data bytes: 127.500440
Sample output from 8GB file (g++):
using for loop
7.5e-07 s
sum_for_loop: 1095220441576.000000 size: 8589934592
mean value of data bytes: 127.500440
using accumulate()
21.484348 s
sum_for_loop: 1095220441576.000000 size: 8589934592
mean value of data bytes: 127.500440
There are numerous issues with the code. The first - and the one causing your display problem - is that sum_for_loop should be a double, not an unsigned long. The sum overflows what an unsigned long can store, which corrupts the result once that happens.
The timers should be started after the cout, otherwise you're including the output time in the compute time. In addition, the "for loop" elapsed time excludes the time taken to construct char_map.
When building char_map, you don't need the if: if an entry is not found in the map, it is zero-initialized on first access. A better approach (since there are only 256 unique byte values) is an indexed array, remembering to cast the char to unsigned char; see the sketch below.
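A hedged sketch of that indexed approach (the helper names are mine; std::array serves as the fixed-size indexed vector):

#include <array>
#include <cstdint>
#include <string>

// Count byte frequencies with a flat 256-entry table instead of a hash map.
std::array<std::uint64_t, 256> byte_histogram(const std::string& input) {
    std::array<std::uint64_t, 256> counts{}; // value-initialized to zero
    for (unsigned char c : input)            // note the unsigned char conversion
        ++counts[c];
    return counts;
}

// Mean byte value from the histogram; total is the file size in bytes.
double mean_from_histogram(const std::array<std::uint64_t, 256>& counts,
                           std::size_t total) {
    double sum = 0.0;
    for (int b = 0; b < 256; ++b)
        sum += static_cast<double>(b) * static_cast<double>(counts[b]);
    return sum / static_cast<double>(total);
}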

Method 'shrink_to_fit' could not be resolved

I am facing a problem with the shrink_to_fit() function of the C++ standard library. The problem is that when I use it, the compiler gives the error "Method 'shrink_to_fit' could not be resolved" on Eclipse Luna (32 bit) with the MinGW compiler, but the same program works fine in Dev-C++. Eclipse also does not suggest shrink_to_fit() in the completion list after typing the dot (.).
Original code:
#include <iostream>
#include <vector>
using namespace std;
int main(void) {
    vector<int> v(128);
    cout << "Initial capacity = " << v.capacity() << endl;
    v.resize(25);
    cout << "Capacity after resize = " << v.capacity() << endl;
    v.shrink_to_fit();
    cout << "Capacity after shrink_to_fit = " << v.capacity() << endl;
    return 0;
}
Please let me know whether this is my fault or the IDE's.
P.S. I am using C++14.
It works fine for me (with the -std=c++11 flag, and the MinGW distro from https://nuwen.net/mingw.html#install) on
Eclipse IDE for C/C++ Developers,
Version: 2019-09 R (4.13.0)
Build id: 20190917-1200
OS: Windows 10, v.10.0, x86_64 / win32
Java version: 13.0.1
as well as on Linux (with the -std=c++11 flag and the GCC 7.4.0 compiler). It could be a problem with your IDE, your compiler (with the right flag), or your STL implementation. There can't be a fourth reason in my view.
Works fine for me too.
Try compiling it 'by hand' to find out whether the problem is the IDE:
g++ foo.cpp -o foo
./foo

Flatbuffers struct in union not working (C++)

I am trying to get going with Flatbuffers in C++, but I'm already failing to write and read a struct in a union. I have reduced my original problem to an anonymous, minimal example.
Example Schema (favorite.fbs)
// favorite.fbs
struct FavoriteNumbers
{
    first: uint8;
    second: uint8;
    third: uint8;
}

union Favorite
{ FavoriteNumbers }

table Data
{ favorite: Favorite; }

root_type Data;
I compiled the schema using Flatbuffers 1.11.0 downloaded from the release page (I'm on Windows so to be safe I used the precompiled binaries).
flatc --cpp favorite.fbs
This generates the file favorite_generated.h.
Example Code (fav.cpp)
#include <iostream>
#include "favorite_generated.h"
int main(int, char**)
{
    using namespace flatbuffers;
    FlatBufferBuilder builder;
    // prepare favorite numbers and write them to the buffer
    FavoriteNumbers inFavNums(17, 42, 7);
    auto inFav{builder.CreateStruct(&inFavNums)};
    auto inData{CreateData(builder, Favorite_FavoriteNumbers, inFav.Union())};
    builder.Finish(inData);
    // output original numbers from the struct used to write (just to be safe)
    std::cout << "favorite numbers written: "
              << +inFavNums.first() << ", "
              << +inFavNums.second() << ", "
              << +inFavNums.third() << std::endl;
    // output final buffer size
    std::cout << builder.GetSize() << " B written" << std::endl;
    // read from the buffer just created
    auto outData{GetData(builder.GetBufferPointer())};
    auto outFavNums{outData->favorite_as_FavoriteNumbers()};
    // output read numbers
    std::cout << "favorite numbers read: "
              << +outFavNums->first() << ", "
              << +outFavNums->second() << ", "
              << +outFavNums->third() << std::endl;
    return 0;
}
I'm using unary + to force numerical output instead of characters. An answer to another question here on Stack Overflow told me I had to use CreateStruct to achieve what I want. I compiled the code using g++ 9.1.0 (from MSYS2).
g++ -std=c++17 -Ilib/flatbuffers/include fav.cpp -o main.exe
This generates the file main.exe.
Output
favorite numbers written: 17, 42, 7
32 B written
favorite numbers read: 189, 253, 34
Obviously this is not the desired outcome. What am I doing wrong?
Remove the & in front of inFavNums and it will work.
CreateStruct is a template function, which sadly means that in this case it will also accept a pointer without complaining about it. It would be nice to avoid that, but that isn't easy in C++.
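A minimal sketch of the corrected write path (the rest of the example is unchanged):

FavoriteNumbers inFavNums(17, 42, 7);
auto inFav{builder.CreateStruct(inFavNums)}; // pass by reference: no '&'
auto inData{CreateData(builder, Favorite_FavoriteNumbers, inFav.Union())};
builder.Finish(inData);

With the & removed, CreateStruct copies the struct's bytes into the buffer instead of treating the pointer value itself as the struct data.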

boost::filesystem::space() is reporting wrong diskspace

I have 430 GB free on drive C:. But for this program:
#include <iostream>
#include <boost/filesystem.hpp>
int main()
{
    boost::filesystem::path p("C:");
    std::size_t freeSpace = boost::filesystem::space(p).free;
    std::cout << freeSpace << " Bytes" << std::endl;
    std::cout << freeSpace / (1 << 20) << " MB" << std::endl;
    std::size_t availableSpace = boost::filesystem::space(p).available;
    std::cout << availableSpace << " Bytes" << std::endl;
    std::cout << availableSpace / (1 << 20) << " MB" << std::endl;
    std::size_t totalSpace = boost::filesystem::space(p).capacity;
    std::cout << totalSpace << " Bytes" << std::endl;
    std::cout << totalSpace / (1 << 20) << " MB" << std::endl;
    return 0;
}
The output is:
2542768128 Bytes
2424 MB
2542768128 Bytes
2424 MB
2830102528 Bytes
2698 MB
I need to know how much diskspace is available because my application has to download a huge file, and I need to know whether it's viable to download it.
I'm using mingw on Windows:
g++ (i686-posix-dwarf-rev2, Built by MinGW-W64 project) 7.1.0
I also tried using MXE to cross compile from Linux:
i686-w64-mingw32.static-g++ (GCC) 5.5.0
Both are returning the same numbers.
std::size_t is not guaranteed to be the biggest standard unsigned type. Actually, it rarely is.
And boost::filesystem defines space_info thus:
struct space_info  // returned by the space function
{
    uintmax_t capacity;
    uintmax_t free;
    uintmax_t available;  // free space available to a non-privileged process
};
You would have easily avoided the error by using auto, which would be natural here as the exact type is of no importance; nearly always only a mismatch hurts, hence "Almost Always Auto".
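A minimal sketch of that suggestion:

auto freeSpace = boost::filesystem::space(p).free; // deduces uintmax_t, no truncation
std::cout << freeSpace / (1 << 20) << " MiB\n";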
Use the type that boost::filesystem::space(p).free actually has; as the space_info definition above shows, that is uintmax_t, an unsigned integer type of at least 64 bits:
uintmax_t freeSpace = boost::filesystem::space(p).free;
Use of auto is also fine.

Linux getrusage() maxrss maximum resident set size not increasing with allocation (C++)

I am trying to use getrusage() and the maximum resident set size (maxrss) to check for memory leaks. However, when I purposely try to create a leak, maxrss does not change. Maybe I do not understand maxrss deeply enough. Here is the code:
#include <iostream>
#include <sys/time.h>
#include <sys/resource.h>
using namespace std;
int main() {
    struct rusage r_usage;
    getrusage(RUSAGE_SELF, &r_usage);
    cout << r_usage.ru_maxrss << "kb\n";
    cout << "Allocating...\n";
    int a = 100000;                // have tried a range of numbers
    int* memleaktest = new int[a]; // class member
    if (!memleaktest)
        cout << "Allocation failed";
    getrusage(RUSAGE_SELF, &r_usage);
    cout << "after allocation " << r_usage.ru_maxrss << "kb\n";
    return 0;
}
I get the exact same value after allocation (~15000 kb).
On Ubuntu x86.
Allocated memory isn't actually mapped until you access it.
If you initialize the array with values, Linux is forced to actually allocate and map new pages:
#include <iostream>
#include <sys/time.h>
#include <sys/resource.h>
using namespace std;
int main() {
    struct rusage r_usage;
    getrusage(RUSAGE_SELF, &r_usage);
    cout << r_usage.ru_maxrss << "kb\n";
    cout << "Allocating...\n";
    int a = 1000000;                 // sufficiently large
    int* memleaktest = new int[a](); // initialized to zero
    if (!memleaktest)
        cout << "Allocation failed";
    getrusage(RUSAGE_SELF, &r_usage);
    cout << "after allocation " << r_usage.ru_maxrss << "kb\n";
    return 0;
}
On my system, this results in:
4900kb
Allocating...
after allocation 6844kb
Note that compiler optimizations may decide that the array is unused or should be allocated up front, so prefer compiling without them, or rewrite the test case in such a way that it can't be optimized away; see the sketch below.
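A hedged sketch of such a rewrite (the volatile sink is my addition; accumulating reads into a volatile is observable behavior, so the optimizer cannot prove the array unused and drop the allocation):

volatile long sink = 0;
int* p = new int[a]();            // zero-init forces the pages to be touched
for (int i = 0; i < a; i += 1024)
    sink += p[i];                 // keeps the allocation observable
// intentionally not deleted, as in the leak test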
For performance reasons, the operating system (OS) allocates resources in chunks; not every app request results in a new resource. So when a block of memory is released in the app, the OS may still keep the chunk it belongs to reserved.
Why? Consider an app requesting 1 GB worth of distinct 1-byte memory blocks. The OS must track all of them, which means the total cost is 1 GB plus the bookkeeping needed to store the {begin, size} pairs that identify each block.
If you want to detect memory leaks, use the good old Valgrind tool.
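For example (standard Valgrind usage; leak.cpp is a placeholder name for the test program):

g++ -g -O0 leak.cpp -o leak
valgrind --leak-check=full ./leak

Valgrind will report the un-freed new[] block as "definitely lost", together with the allocation stack trace.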