clock_gettime fails in chrooted Debian etch with CLOCK_PROCESS_CPUTIME_ID - c++

I have setup a chrooted Debian Etch (32bit) under Ubuntu 12.04 (64bit), and it appears that clock_gettime() works with CLOCK_MONOTONIC, but fails with both CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID. The errno is set to EINVAL, which according to the man page means that "The clk_id specified is not supported on this system."
All three clocks work fine outside the chrooted Debian and in 64bit chrooted Debian etch.
Can someone explain to me why this is the case and how to fix it?
Much appreciated.

I don't know the cause yet, but I have ideas that won't fit in the comment box.
First, you can make the test program simpler by compiling it as C instead of C++ and not linking it to libpthread. -lrt should be good enough to get clock_gettime. Also, compiling it with -static could make tracing easier since the dynamic linker startup stuff won't be there.
Static linking might even change the behavior of clock_gettime. It's worth trying just to find out whether it works around the bug.
Another thing I'd like to see is the output of this vdso-bypassing test program:
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
int main(void)
{
    struct timespec ts;

    if(syscall(SYS_clock_gettime, CLOCK_PROCESS_CPUTIME_ID, &ts)) {
        perror("clock_gettime");
        return 1;
    }
    printf("CLOCK_PROCESS_CPUTIME_ID: %lu.%09ld\n",
           (unsigned long)ts.tv_sec, ts.tv_nsec);
    return 0;
}
with and without -static, and if it fails, add strace.
Update (actually, skip this; go to the second update)
A couple more simple test ideas:
compile and run a 32-bit test program on the Ubuntu host system by adding -m32 to the gcc command. It's possible that the kernel's 32-bit compatibility mode is causing the error. If that's the case, the 32-bit version will fail no matter which libc it gets linked to.
take the non-static test programs you compiled under Debian, copy them to the Ubuntu host system and try to run them there. A change in behavior will point to libc as the cause.
Then it's time for the hard stuff: looking at disassembled code and maybe single-stepping it in gdb. Instead of having you do that on your own, I'd like to get a copy of the code you're running. Upload a static-compiled failing test program somewhere I can get it. A copy of the 32-bit vdso provided by your kernel might also be interesting. To extract the vdso, run the following program (compiled in the 32-bit chroot), which will create a file called vdso.dump, and upload that too.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int getvseg(const char *which, const char *outfn)
{
    FILE *maps, *outfile;
    char buf[1024];
    void *start, *end;
    size_t sz;
    void *copy;
    int ret;
    char search[strlen(which)+4];

    maps = fopen("/proc/self/maps", "r");
    if(!maps) {
        perror("/proc/self/maps");
        return 1;
    }
    outfile = fopen(outfn, "w");
    if(!outfile) {
        perror(outfn);
        fclose(maps);
        return 1;
    }
    sprintf(search, "[%s]\n", which);
    while(fgets(buf, sizeof buf, maps)) {
        if(strlen(buf)<strlen(search) ||
           strcmp(buf+strlen(buf)-strlen(search),search))
            continue;
        if(sscanf(buf, "%p-%p", &start, &end)!=2) {
            fprintf(stderr, "weird line in /proc/self/maps: %s", buf);
            continue;
        }
        sz = (char *)end - (char *)start;
        /* copy because I got an EFAULT trying to write directly from vsyscall */
        copy = malloc(sz);
        if(!copy) {
            perror("malloc");
            goto fail;
        }
        memcpy(copy, start, sz);
        if(fwrite(copy, 1, sz, outfile)!=sz) {
            if(ferror(outfile))
                perror(outfn);
            else
                fprintf(stderr, "%s: short write", outfn);
            free(copy);
            goto fail;
        }
        free(copy);
        goto success;
    }
    fprintf(stderr, "%s not found\n", which);
fail:
    ret = 1;
    goto out;
success:
    ret = 0;
out:
    fclose(maps);
    fclose(outfile);
    return ret;
}

int main(void)
{
    int ret = 1;

    if(!getvseg("vdso", "vdso.dump")) {
        printf("vdso dumped to vdso.dump\n");
        ret = 0;
    }
    if(!getvseg("vsyscall", "vsyscall.dump")) {
        printf("vsyscall dumped to vsyscall.dump\n");
        ret = 0;
    }
    return ret;
}
Update 2
I reproduced this by downloading an etch libc. It's definitely caused by glibc stupidity. Instead of a simple syscall wrapper for clock_gettime, it has a big wad of preprocessor spaghetti culminating in "you can't use clockids that we didn't pre-approve". You're not going to get it to work with that old glibc. Which brings us to the question I didn't want to ask: why are you trying to use an obsolete version of Debian anyway?
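If you're stuck with that glibc, the raw-syscall trick from the test program above can double as a workaround. A minimal sketch, assuming a kernel new enough to support these clocks; my_clock_gettime is a made-up name, not anything glibc provides:
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical wrapper: bypasses the etch glibc's clockid filtering
   by issuing the raw syscall, exactly like the test program above. */
static int my_clock_gettime(clockid_t clk_id, struct timespec *ts)
{
    return syscall(SYS_clock_gettime, clk_id, ts);
}

int main(void)
{
    struct timespec ts;

    if (my_clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts)) {
        perror("my_clock_gettime");
        return 1;
    }
    printf("CLOCK_THREAD_CPUTIME_ID: %lu.%09ld\n",
           (unsigned long)ts.tv_sec, ts.tv_nsec);
    return 0;
}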

Related

ecCodes (grib reading library) does not free the memory

I am using the ecCodes library in my project, and I have encountered an issue where memory is not freed between reading the files.
The minimal example representing the problem is this (it is basically a combination of these two library API usage examples [1](https://confluence.ecmwf.int/display/ECC/grib_get_keys) [2]):
#include <string>
#include <vector>
#include <iostream>
#include "eccodes.h"
int main() {
    std::string filenames[] = {"../data/era5_model.grib", "../data/era5_model2.grib", "../data/era5_model3.grib",
                               "../data/era5_model4.grib"};
    std::vector<long> vec = {};
    for (auto & filename : filenames) {
        FILE* f = fopen(filename.c_str(), "r");
        int err = 0;
        codes_handle* h;
        while ((h = codes_handle_new_from_file(nullptr, f, PRODUCT_GRIB, &err)) != nullptr) {
            long k1 = 0;
            err = codes_get_long(h, "level", &k1);
            vec.push_back(k1);
        }
        codes_handle_delete(h);
        fclose(f);
    }
    std::cout << vec[52];
    return 0;
}
In the example the program reads 4 identical ERA5 files, each 1.5GB in size. Before opening a new file, the previous one is closed with codes_handle_delete() and fclose().
Therefore, the expected behaviour would be for memory usage to stay at about 1.5GB. However, in reality the memory usage steadily increases to about 6.5GB and is only freed when the program exits.
This particular example was run from CLion with CMake (Release configuration), but the issue occurs with every other configuration, and also in my other Rust project which calls ecCodes through FFI.
The library seems well tested and supported, so it seems unlikely to be a library bug. Is this expected behaviour, or is my code wrong? If the latter, how can I correct it?
I am using Ubuntu 21.04 and ecCodes 2.20.0 installed with apt.
So I contacted the library authors and realized that I had not read this example carefully enough.
For ecCodes to correctly free the memory, codes_handle should be deleted every time it is created (analogously to how you should free memory every time you allocate it). Therefore, in my example codes_handle_delete() should be INSIDE the while loop:
while ((h = codes_handle_new_from_file(nullptr, f, PRODUCT_GRIB, &err)) != nullptr) {
    long k1 = 0;
    err = codes_get_long(h, "level", &k1);
    vec.push_back(k1);
    codes_handle_delete(h);
}
After that change memory usage is almost unnoticeable.

C++ getpid() vs syscall(39)?

I read that syscall(39) returns the current process id (pid).
Then why do these two programs output two different numbers?
#include <unistd.h>
#include <stdio.h>

int main() {
    long r = syscall(39);
    printf("returned %ld\n", r);
    return 0;
}
and:
#include <unistd.h>
#include <stdio.h>

int main() {
    long r = getpid();
    printf("returned %ld\n", r);
    return 0;
}
I am running my program in CLion, and when I change the first line I get a different result, which is really strange.
Running the code in the answers I got (on macOS):
returned getpid()=9390 vs. syscall(39)=8340
which is really strange.
On Ubuntu I got the same PID for both; why is that?
Making system calls by their number is not going to be portable.
Indeed, we see that 39 is getpid on Linux, but getppid ("get parent pid") on macOS.
getpid on macOS is 20.
So that's why you see a different result between getpid() and syscall(39) on macOS.
Note that macOS, whose kernel is a BSD derivative, is unrelated to Linux, so there is no reason for its system call numbers to match Linux's.
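If you do want to call syscall() directly on Linux, it's less fragile to use the SYS_getpid constant from <sys/syscall.h> than a hard-coded 39. A minimal, Linux-only sketch:
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* SYS_getpid expands to the right syscall number for the kernel/ABI
       this is compiled for, so nothing is hard-coded. */
    long pid1 = (long)getpid();
    long pid2 = syscall(SYS_getpid);
    printf("getpid()=%ld vs. syscall(SYS_getpid)=%ld\n", pid1, pid2);
    return 0;
}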
There's one key detail that's missing here: every time you run the program, your OS assigns it a new PID. Running the same program twice in a row will likely report different PIDs, so comparing two separate runs isn't a good way to test the difference between getpid() and syscall(39).
Here's a better comparison that calls both in the same program:
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    long pid1 = getpid();
    long pid2 = syscall(39);
    printf("returned getpid()=%ld vs. syscall(39)=%ld\n", pid1, pid2);
    return 0;
}

Weird OpenCL calls side effect on C++ for loop performance

I'm working on a C++ project using OpenCL. I'm using the CPU as an OpenCL device with the Intel OpenCL runtime.
I noticed a weird side effect in calling OpenCL functions. Here is a simple test:
#include <iostream>
#include <cstdio>
#include <vector>
#include <CL/cl.hpp>
int main(int argc, char* argv[])
{
    /*
    cl_int status;
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    std::vector<cl::Device> devices;
    platforms[1].getDevices(CL_DEVICE_TYPE_CPU, &devices);
    cl::Context context(devices);
    cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
    status = queue.finish();
    printf("Status: %d\n", status);
    */
    int ch;
    int b = 0;
    int sum = 0;
    FILE* f1;
    f1 = fopen(argv[1], "r");
    while((ch = fgetc(f1)) != EOF)
    {
        sum += ch;
        b++;
        if(b % 1000000 == 0)
            printf("Char %d read\n", b);
    }
    printf("Sum: %d\n", sum);
}
It's a simple loop that reads a file char by char and adds the values up so the compiler doesn't optimize the loop away.
My system is a Core i7-4770K with a 2TB HDD and 16GB DDR3, running Ubuntu 14.10. The program above, with a 100MB file as input, takes around 770ms. This is consistent with my HDD speed. So far so good.
If you now invert the comments and run only the OpenCL calls region, it takes around 200ms. Again, so far, so good.
But if you uncomment everything, the program takes more than 2000ms. I would expect 770ms + 200ms, but it is 2000ms. You can even notice an increased delay between the output messages in the loop. The two regions (OpenCL calls and reading chars) are supposed to be independent.
I don't understand why using OpenCL interferes with a simple C++ for loop performance. It's not a simple OpenCL initialization delay.
I'm compiling this example with:
g++ weird.cpp -O2 -lOpenCL -o weird
I also tried Clang++, and the same thing happens.
This was an interesting one. It's because stdio switches getc to its thread-safe (locking) version at the point when the queue is instantiated, so the extra time is the grab-release cycle of the stream locks. I'm not sure why or how this occurs, but that is the decisive factor, at least on the AMD OpenCL SDK with Intel CPUs. I was quite amazed that I got essentially the same times as the OP.
https://software.intel.com/en-us/forums/topic/337984
You can try a remedy for this specific problem by changing the fgetc call to getc_unlocked.
That brought the time back down to 930ms for me; the remaining increase over 750ms is mainly spent in the platform and context creation lines.
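As a rough sketch of that change (the same reading loop as in the question, with the locking fgetc swapped for POSIX getc_unlocked; safe here because only one thread touches the stream):
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;
    FILE* f1 = fopen(argv[1], "r");
    if (!f1) {
        perror(argv[1]);
        return 1;
    }

    int ch;
    int b = 0;
    int sum = 0;
    flockfile(f1);                            /* take the stream lock once */
    while ((ch = getc_unlocked(f1)) != EOF)   /* no per-call locking */
    {
        sum += ch;
        b++;
        if (b % 1000000 == 0)
            printf("Char %d read\n", b);
    }
    funlockfile(f1);
    fclose(f1);
    printf("Sum: %d\n", sum);
    return 0;
}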
I believe that the effect is caused by the OpenCL objects still being in scope, and therefore not being destroyed before the for loop. They may be affecting the other computation while they are still alive. Running the example as you gave it yields the following times on my system (g++ 4.2.1 with -O2 on Mac OS X):
CL: 0.012s
Loop: 14.447s
Both: 14.874s
But putting the OpenCL code into its own anonymous scope, so that the destructors are automatically called before the loop, seems to get rid of the problem. Using this code:
#include <iostream>
#include <cstdio>
#include <vector>
#include "cl.hpp"
int main(int argc, char* argv[])
{
    {
        cl_int status;
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        std::vector<cl::Device> devices;
        platforms[1].getDevices(CL_DEVICE_TYPE_CPU, &devices);
        cl::Context context(devices);
        cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
        status = queue.finish();
        printf("Status: %d\n", status);
    }
    int ch;
    int b = 0;
    int sum = 0;
    FILE* f1;
    f1 = fopen(argv[1], "r");
    while((ch = fgetc(f1)) != EOF)
    {
        sum += ch;
        b++;
        if(b % 1000000 == 0)
            printf("Char %d read\n", b);
    }
    printf("Sum: %d\n", sum);
}
I get the timings:
CL: 0.012s
Loop: 14.635s
Both: 14.648s
These seem to add up linearly. The effect is pretty small compared to other effects on the system, such as CPU load from other processes, but it seems to be gone when the anonymous scope is added. I'll do some profiling and add it as an edit if it produces anything of interest.

setenv, unsetenv, putenv

I am working on a custom shell for a systems programming class. We were instructed to implement the built-in setenv() and unsetenv() commands, with a hint to check the man pages for putenv().
My issue is that setenv(char*, char*, int) and putenv(char*) do not seem to be working at all. My code for executing an entered command is as follows:
//... skipping past stuff for IO redirection
pid = fork();
if(pid == 0){
    //child
    if(!strcmp(_simpleCommands[0]->_arguments[0],"printenv")){
        //check if command is "printenv"
        extern char **environ;
        int i;
        for(i = 0; environ[i] != NULL; i++){
            printf("%s\n",environ[i]);
        }
        exit(0);
    }
    if(!strcmp(_simpleCommands[0]->_arguments[0],"setenv")){
        //if command is "setenv" get parameters char* A, char* B
        char * p = _simpleCommands[0]->_arguments[1];
        char * s = _simpleCommands[0]->_arguments[2];
        //putenv(char* s) needs to be formatted A=B; A is variable B is value
        char param[strlen(p) + strlen(s) + 1];
        strcat(param,p);
        strcat(param,"=");
        strcat(param,s);
        putenv(param);
        //setenv(p,s,1);
        exit(0);
    }
    if(!strcmp(_simpleCommands[0]->_arguments[0],"unsetenv")){
        //remove environment variable
        unsetenv(_simpleCommands[0]->_arguments[0]);
        exit(0);
    }
    //execute command
    execvp(_simpleCommands[0]->_arguments[0],_simpleCommands->_arguments);
    perror("-myshell");
    _exit(1);
}
//omitting restore IO defaults...
If I run printenv it works properly, but if I try to set a new variable using either putenv() or setenv(), my printenv command returns the exact same thing, so the setting does not appear to be working.
As a side note, the problem may not be with the functions or how I called them, because my shell is executing the commands as though it had to format a wildcard (* or ?) which I am not sure should happen.
You appear to be calling fork unconditionally before examining the command line. But some shell built-in commands need to run in the parent process, so that their effect persists. All the built-ins that manipulate the environment fall in this category.
As an aside, I wouldn't try to use the C library's environment manipulation functions if I were writing a shell. I'd use three-argument main, copy envp into a data structure under my full control, and then feed that back into execve. This is partially because I'm a control freak, and partially because it's nigh-impossible to do anything complicated with setenv and/or putenv and not have a memory leak. See this older SO question for gory details.
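To illustrate the first point, here is a minimal sketch (the run_command helper and the command arrays are hypothetical, not the asker's code) in which the environment built-ins run in the parent and only external commands are forked:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Handle environment built-ins in the parent; fork only for external
   commands. argv is a NULL-terminated parsed command line. */
static void run_command(char **argv)
{
    if (strcmp(argv[0], "setenv") == 0 && argv[1] && argv[2]) {
        setenv(argv[1], argv[2], 1);    /* persists in the shell process */
        return;
    }
    if (strcmp(argv[0], "unsetenv") == 0 && argv[1]) {
        unsetenv(argv[1]);
        return;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("-myshell");
        _exit(1);
    }
    waitpid(pid, NULL, 0);
}

int main(void)
{
    char *cmd1[] = {"setenv", "GREETING", "hello", NULL};
    char *cmd2[] = {"printenv", NULL};

    run_command(cmd1);
    run_command(cmd2);   /* the child inherits GREETING=hello */
    return 0;
}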
What makes you think it is not working? I wrote a simple test case below, and it worked as expected.
Make sure your setenv and printenv are called in the same process.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main()
{
    char *s = "stack=overflow";
    int ret = putenv(s);
    assert(ret == 0);

    //print out all the env
    extern char **environ;
    int i;
    for(i = 0; environ[i] != NULL; i++){
        printf("%s\n",environ[i]);
    }
    return 0;
}
pierr#ubuntu:~/workspace/so/c/env$ ./test | grep stack
stack=overflow

Are there compiler flags to get malloc to return pointers above the 4G limit for 64bit testing (various platforms)?

I need to test code ported from 32-bit to 64-bit where pointers are cast around as integer handles, and I have to make sure that correctly sized types are used on 64-bit platforms.
Are there any flags for various compilers, or even runtime flags, that will ensure malloc returns pointer values above the 32-bit limit?
Platforms I'm interested in:
Visual Studio 2008 on Windows XP 64, and other 64 bit windows
AIX using xLC
64bit gcc
64bit HP/UX using aCC
Sample application that reserves the address space below 4GB
So thanks to R Samuel Klatchko's answer, I was able to implement a simple test app that attempts to reserve pages in the first 4GB of address space. Hopefully this is useful to others, and other SO users can give me an idea of how portable/effective it is.
#include <stdlib.h>
#include <stdio.h>

#define UINT_32_MAX 0xFFFFFFFF

#ifdef WIN32
typedef unsigned __int64 Tuint64;
#include <windows.h>
#else
typedef unsigned long long Tuint64;
#include <sys/mman.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#endif

static void* Allocate(void* pSuggested, unsigned int PageSize)
{
#ifdef WIN32
    void* pAllocated = ::VirtualAlloc(pSuggested, PageSize, MEM_RESERVE, PAGE_NOACCESS);
    if (pAllocated)
    {
        return pAllocated;
    }
    return (void*)-1;
#else
    void* pAllocated = ::mmap(pSuggested,
                              PageSize,
                              PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                              -1,
                              0);
    if (pAllocated == MAP_FAILED)
    {
        pAllocated = (void*)-1;
    }
    return pAllocated;
#endif
}

static void Deallocate(void* pRegion, unsigned int PageSize)
{
#ifdef WIN32
    ::VirtualFree(pRegion,0,MEM_RELEASE);
#else
    ::munmap(pRegion,PageSize);
#endif
}

static void Gobble32bitAddressSpace()
{
#ifdef WIN32
    SYSTEM_INFO SysInfo;
    ::GetSystemInfo(&SysInfo);
    unsigned int PageSize = SysInfo.dwAllocationGranularity;
#else
    unsigned int PageSize = ::sysconf(_SC_PAGE_SIZE);
#endif
    unsigned int AllocatedPages = 0;
    unsigned int SkippedPages = 0;
    void *pStart = 0;

    while( ((Tuint64)pStart) < UINT_32_MAX)
    {
        void* pAllocated = Allocate(pStart, PageSize);
        if (pAllocated != (void*)-1)
        {
            if (pAllocated == pStart)
            {
                //Allocated at expected location
                AllocatedPages++;
            }
            else
            {
                //Allocated at a different location
                //unallocate and consider this page unreserved
                SkippedPages++;
                Deallocate(pAllocated,PageSize);
            }
        }
        else
        {
            //could not allocate at all
            SkippedPages++;
        }
        pStart = (char*)pStart + PageSize;
    }
    printf("PageSize : %u\n",PageSize);
    printf("Allocated Pages : %u (%u bytes)\n",AllocatedPages,PageSize*AllocatedPages);
    printf("Skipped Pages : %u (%u bytes)\n",SkippedPages,SkippedPages*PageSize);
}

int main()
{
    Gobble32bitAddressSpace();

    //Try to call malloc now and see if we get an
    //address above 4GB
    void* pFirstMalloc = ::malloc(1024);
    if (((Tuint64)pFirstMalloc) >= UINT_32_MAX)
    {
        printf("OK\n");
    }
    else
    {
        printf("FAIL\n");
    }
    return 0;
}
One technique I have used in the past is to allocate enough memory at startup that all the address space below the 4GB limit is used up. While this technique does rely on malloc first using the lower parts of the address space, this was true on all the platforms I work on (Linux, Solaris and Windows).
Because of how Linux handles overcommit, if you don't touch the allocated space below the 4GB limit, you won't actually consume any memory, since the pages are never committed.
On Windows, you can use VirtualAlloc() with the MEM_RESERVE flag to use up address space without allocating any actual storage.
Not a compiler switch, but a boot-time switch for Windows can do what you want. There is a boot option called "nolowmem" which forces everything to be loaded at addresses above 4GB.
If you are using XP, you should be able to use /nolowmem in boot.ini. See the documentation on OSR.
For Vista/Win7 you can use the nolowmem option. Documentation is here. This can be done as such:
bcdedit /set {current} nolowmem on
Not that you asked specifically, but for others that might be curious, gcc on Mac OS X seems to allocate from the area above 4GB for 64-bit programs by default.
Here's a C program to verify this on whatever compiler/OS combination you might have:
#include <stdlib.h>
#include <stdio.h>
int main() {
    void *p = malloc(1000);
    printf("%p\n", p);
    return 0;
}
You would do well to rewrite your code so that the intptr_t type is used, since it is intended exactly to make such practices safer. Unfortunately it is defined in the C99 header <stdint.h>, and VC++ does not support C99. That would not, however, stop you from creating such a header for that platform.
You might also add asserts where such casts occur, e.g.
assert( sizeof(integer_param) == sizeof(void*) ) ;
or you could cast the value back to the original pointer type, and then compare:
assert( (mytype*)integer_param == original_pointer ) ;
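For illustration, a minimal sketch of that round-trip check (assuming <stdint.h> is available, or a hand-written equivalent on VC++):
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *original_pointer = malloc(sizeof *original_pointer);

    /* intptr_t is wide enough to hold any object pointer, so this
       round-trip is well-defined, unlike casting through a 32-bit int. */
    intptr_t integer_param = (intptr_t)original_pointer;

    assert(sizeof(integer_param) >= sizeof(void *));
    assert((int *)integer_param == original_pointer);

    printf("pointer %p survived the integer round-trip\n",
           (void *)original_pointer);
    free(original_pointer);
    return 0;
}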