How to implement Google Test sharding in C++?

I want to parallelise my Google Test cases in C++.
I have read the documentation on Google Test sharding but have been unable to apply it in my C++ project.
As I'm new to coding, can anyone please explain the documentation in the link below with a code example?
https://github.com/google/googletest/blob/master/googletest/docs/advanced.md
Does Google Test sharding only work across different machines, or can it be used on the same machine using multiple threads?

Sharding isn't done in code; it's controlled through the environment. Each machine sets two environment variables: GTEST_TOTAL_SHARDS, the total number of shards you are splitting the tests across, and GTEST_SHARD_INDEX, a value from 0 to GTEST_TOTAL_SHARDS-1 that is unique to each shard. When Google Test starts up, it runs only the subset of tests belonging to that shard.
If you want to simulate this on one machine, you need to set these environment variables yourself (which can be done in code).
I would probably try something like this (on Windows) in a .bat file:
set GTEST_TOTAL_SHARDS=10
FOR /L %%I in (1,1,10) DO cmd.exe /c "set GTEST_SHARD_INDEX=%%I && start mytest.exe"
And hope that the new cmd instance has its own environment.

Running the following in a command window worked for me (very similar to James Poag's answer, but note change of range from "1,1,10" to "0,1,9", "%%" -> "%" and "set" to "set /A"):
set GTEST_TOTAL_SHARDS=10
FOR /L %I in (0,1,9) DO cmd.exe /c "set /A GTEST_SHARD_INDEX=%I && start mytests.exe"

After further experimentation, it is also possible to do this in C++, but it is not straightforward and I did not find a portable way of doing it. I can't post the code as it was done at work.
Essentially, from main, create n new processes (where n is the number of cores available), capture the results from each shard, then merge them and output to the screen.
To get each process running a different shard, the controller passes the total number of shards and the instance number to each child process.
This is done by retrieving and copying the current environment, and setting in the copy the two environment variables (GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX) as required. GTEST_TOTAL_SHARDS is always the same, but GTEST_SHARD_INDEX will be the instance number of the child.
Merging the results is tedious but straightforward string manipulation. I successfully managed to get a correct total at the end, adding up the results of all the separate shards.
I was using Windows, so used CreateProcessA to create the new processes, passing in the custom environment.
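As I can't share the real code, here is a minimal sketch of the spawning side only, for illustration. It takes a shortcut compared with what I described above: instead of building a custom environment block, it sets the two variables in the parent just before each CreateProcessA call so the children inherit them. The shard count and the test binary name (mytests.exe) are placeholders, and output capture/merging is omitted.
#include <windows.h>
#include <cstdio>
#include <string>
#include <vector>

int main()
{
    const int kShardCount = 4;  // e.g. the number of cores available (placeholder)
    SetEnvironmentVariableA("GTEST_TOTAL_SHARDS", std::to_string(kShardCount).c_str());

    std::vector<PROCESS_INFORMATION> children;
    for (int shard = 0; shard < kShardCount; ++shard)
    {
        // Each child inherits the parent's environment, so setting the index
        // here (just before CreateProcessA) gives every child its own shard.
        SetEnvironmentVariableA("GTEST_SHARD_INDEX", std::to_string(shard).c_str());

        std::string cmd = "mytests.exe";  // placeholder test binary
        STARTUPINFOA si = {}; si.cb = sizeof(si);
        PROCESS_INFORMATION pi = {};
        if (!CreateProcessA(NULL, &cmd[0], NULL, NULL, FALSE, 0,
                            NULL /* inherit environment */, NULL, &si, &pi))
        {
            std::fprintf(stderr, "failed to start shard %d\n", shard);
            continue;
        }
        children.push_back(pi);
    }

    // Wait for every shard to finish; capturing and merging their output
    // (redirected stdout) is left out of this sketch.
    for (PROCESS_INFORMATION &pi : children)
    {
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return 0;
}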
It turned out that creating new processes takes a significant amount of time, but my program was taking about 3 minutes to run, so there were good benefits to be had from parallel running - the time came down to about 30 seconds on my 12 core PC.
Note that if this all seems overkill, there is a Python script which does what I have described here (I think - I haven't used it). That might be more straightforward.

Related

Process spawned by Windows service runs 3 to 4 times more slowly than spawned by GUI

I have written a service application in Borland C++. It works fine. In the ServiceStart(TService *Sender, bool &Started) routine, I call mjwinrun to launch a process which picks up and processes macros. This process has no UI and any errors are logged to a file. It continues to run until the server is restarted, shut down, or the process is terminated using Task Manager. Here is mjwinrun :-
int mjwinrun(AnsiString cmd)
{
    STARTUPINFO mjstupinf;
    PROCESS_INFORMATION mjprcinf;
    memset(&mjstupinf, 0, sizeof(STARTUPINFO));
    mjstupinf.cb = sizeof(STARTUPINFO);

    // Launch the command with the current directory as its working directory.
    if (!CreateProcess(NULL, cmd.c_str(), NULL, NULL, TRUE, 0, NULL,
                       GetCurrentDir().c_str(), &mjstupinf, &mjprcinf))
    {
        LogMessage("Could not launch " + cmd);
        return -1;
    }

    // The handles aren't needed; only the process ID is returned.
    CloseHandle(mjprcinf.hThread);
    CloseHandle(mjprcinf.hProcess);
    return mjprcinf.dwProcessId;
}
cmd is the command line for launching the macro queue processor. I used a macro that is CPU/Memory intensive and got it to write its timings to a file. Here is what I found :-
1) If the macro processor is launched from the command line within a logged on session, no matter what Windows core it is running under, the macro is completed in 6 seconds.
2) If the macro processor is launched from a service starting up on Vista core or earlier (using mjwinrun above), the macro is completed in 6 seconds.
3) If the macro processor is launched from a service starting up on Windows 7 core or later (using mjwinrun above), the macro is completed in more than 18 seconds.
I have tried all the different flags for CreateProcess and none of them make a difference. I have tried all different accounts for the service and that makes no difference. I tried setting all of the various priorities for tasks, I/O and Page, but they all make no difference. It's as if the service's spawned processes are somehow throttled, not in I/O terms, but in CPU/memory usage terms. Any ideas what changed in Windows 7 onwards?
I isolated code to reproduce this, and it eventually boiled down to calls to the database engine to look up a field definition (the TTable methods FindField and FieldByName). These took much longer on a table with a lot of fields when run from a service app than from a GUI app.

I devised my own method to store mappings from field names to field definitions, since I always opened my databases with a central routine. I used an array of strings indexed by the Tag property on each table (common to all BCB objects), where each string was composed of ;fieldname;fieldnumber; pairs, and then did a .Pos of the field name to get the field number. fieldnumber is zero-padded to a width of 4. This only uses a few hundred KB of RAM for the entire app and all of its databases. Once in place, the service app runs at the same speed as the GUI app.

The only thing I can think of that may explain this is that service apps have a fixed heap (I think I read 48 MBytes somewhere by default) for themselves and any process they spawn. With lots of fields, the memory overflowed and had to thrash to VM on the disk. The GUI app had no such limit and was able to do the lookup entirely in real memory. However, I may be completely wrong.

One thing I have learnt is that FieldByName and FindField are expensive TTable functions to call, and I have now supplanted them all with my own mechanism, which seems to work much better and much faster. Here is my lookup routine :-
AnsiString fldsbytag[MXSPRTBLS+100];

TField *fldfromtag(TAdsTable *tbl, AnsiString fld)
{
    // Find ";FIELDNAME;" in the cached string for this table's Tag.
    int fi = fldsbytag[tbl->Tag].Pos(";" + fld.UpperCase() + ";"), gi;
    if (fi == 0) return tbl->FindField(fld);

    // The zero-padded 4-digit field number follows the field name.
    gi = StrToIntDef(fldsbytag[tbl->Tag].SubString(fi + fld.Length() + 2, 4), -1);
    if (gi < 0 || gi >= tbl->Fields->Count) return tbl->FindField(fld);
    return tbl->Fields->Fields[gi];
}
It will be very difficult to give an authoritative answer to this question without a lot more details.
However, a factor to consider is the Windows foreground priority boost described here.
You may want to read Russinovich's book chapter on processes/threads, in particular the stuff on scheduling. You can find PDFs of the book online (there are two that together make up the whole book). I believe the latest (or next to latest) edition covers changes in Win 7.

Creating a C++ daemon and keeping the environment

I am trying to create a C++ daemon that runs on a Red Hat 6.3 platform, and am having trouble understanding the differences between the libc daemon() call, the daemon shell command, startproc, start-stop-daemon and about half a dozen other methods that Google suggests for creating daemons.
I have seen suggestions that two forks are needed, but calling daemon() only does one. Why is the second fork needed?
If I write the init.d script to call the daemon shell command, does the C code still need to call daemon()?
I implemented my application to call the C daemon() function since it seems the simplest solution, but I am running into the problem that my environment variables seem to get discarded. How do I prevent this?
I also need to run the daemon as a particular user, not as root.
What is the simplest way to create a C++ daemon that keeps its environment variables, runs as a specific user, and is started on system boot?
Why is the second fork needed?
Answered in What is the reason for performing a double fork when creating a daemon?
bash daemon shell command
My bash 4.2 does not have a builtin command named daemon. Are you sure yours is from bash? What version, what distribution?
environment variables seem to get discarded.
I can see no indication to that effect in the documentation. Are you sure it is due to daemon? Have you checked whether they are present before, and missing after that call?
run the daemon as a particular user
Read about setresuid and related functions.
What is the simplest way to create a C++ daemon that keeps its environment variables, runs as a specific user, and is started on system boot?
Depends. If you want to keep your code simple, forget about all of this and let the init script do it via e.g. start-stop-daemon. If you want to handle this in your app, daemon combined with setresuid should be a good approach, although not the only one.
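If you do handle it in the app, a minimal sketch of that second approach could look like the following (error handling trimmed, and "daemonuser" is a placeholder for whatever account you need; with g++ the setresuid/setresgid declarations are visible because _GNU_SOURCE is defined by default):
#include <cstdio>
#include <cstdlib>
#include <pwd.h>
#include <unistd.h>

int main()
{
    // daemon() forks once and detaches from the controlling terminal; with
    // these arguments it keeps the current directory and the open std streams.
    // It does not touch the environment, so exported variables survive.
    if (daemon(1, 1) != 0) {
        std::perror("daemon");
        return EXIT_FAILURE;
    }

    // Drop privileges to the target user (must happen while still root).
    struct passwd *pw = getpwnam("daemonuser");  // placeholder user name
    if (pw == NULL ||
        setresgid(pw->pw_gid, pw->pw_gid, pw->pw_gid) != 0 ||
        setresuid(pw->pw_uid, pw->pw_uid, pw->pw_uid) != 0) {
        std::perror("failed to drop privileges");
        return EXIT_FAILURE;
    }

    // ... the daemon's main loop goes here ...
    return EXIT_SUCCESS;
}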

Listen For Process Start and End

I'm new to Windows API programming. I am aware that there are ways to check if a process is already running (via enumeration). However, I was wondering if there was a way to listen for when a process starts and ends (for example, notepad.exe) and then perform some action when the starting or ending of that process has been detected. I assume that one could run a continuous enumeration and check loop for every marginal unit of time, but I was wondering if there was a cleaner solution.
Use WMI, Win32_ProcessStartTrace and Win32_ProcessStopTrace classes. Sample C# code is here.
You'll need to write the equivalent C++ code. Which, erm, isn't quite that compact. It's mostly boilerplate, the survival guide is available here.
If you can run code in kernel, check Detecting Windows NT/2K process execution.
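To give an idea of what the C++ boilerplate Hans mentions looks like, here is a condensed sketch of an event subscription against ROOT\CIMV2 using Win32_ProcessStartTrace (Win32_ProcessStopTrace works the same way). Error checking and cleanup are largely omitted, you need to link against wbemuuid.lib, and the query typically needs to run with administrative rights; treat it as an outline, not a drop-in implementation.
#define _WIN32_DCOM
#include <comdef.h>
#include <wbemidl.h>
#include <cstdio>
#include <cwchar>
#pragma comment(lib, "wbemuuid.lib")

int main()
{
    CoInitializeEx(0, COINIT_MULTITHREADED);
    CoInitializeSecurity(NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_DEFAULT,
                         RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE, NULL);

    IWbemLocator *locator = NULL;
    CoCreateInstance(CLSID_WbemLocator, 0, CLSCTX_INPROC_SERVER,
                     IID_IWbemLocator, (LPVOID *)&locator);

    IWbemServices *services = NULL;
    locator->ConnectServer(_bstr_t(L"ROOT\\CIMV2"), NULL, NULL, 0, 0, 0, 0,
                           &services);
    CoSetProxyBlanket(services, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, NULL,
                      RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE,
                      NULL, EOAC_NONE);

    // Subscribe to process-start events.
    IEnumWbemClassObject *events = NULL;
    services->ExecNotificationQuery(
        _bstr_t("WQL"),
        _bstr_t("SELECT * FROM Win32_ProcessStartTrace"),
        WBEM_FLAG_RETURN_IMMEDIATELY | WBEM_FLAG_FORWARD_ONLY, NULL, &events);

    for (;;)
    {
        IWbemClassObject *obj = NULL;
        ULONG returned = 0;
        events->Next(WBEM_INFINITE, 1, &obj, &returned);
        if (returned == 0) continue;

        VARIANT name;
        obj->Get(L"ProcessName", 0, &name, NULL, NULL);
        wprintf(L"started: %s\n", name.bstrVal);
        VariantClear(&name);
        obj->Release();
    }
    // (Release of services/locator and CoUninitialize omitted.)
}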
Hans Passant has probably given you the best answer, but... It is slow and fairly heavy-weight to write in C or C++.
On versions of Windows less than or equal to Vista, you can get 95ish% coverage with a Windows WH_CBT hook, which can be set with SetWindowsHookEx.
There are a few problems:
1) It misses some service starts/stops, which you can mitigate by keeping a list of running procs and occasionally scanning the list for changes. You do not have to keep procs in this list that have explorer.exe as a parent/grandparent process. Christian Steiber's proc handle idea is good for managing the removal of procs from the table.
2) It misses things executed directly by the kernel. This can be mitigated the same way as #1.
3) There are misbehaved apps that do not follow the hook system rules, which can cause your app to miss notifications. Again, this can be mitigated by keeping a process table.
The positives are it is pretty lightweight and easy to write.
For Windows 7 and up, look at SetWinEventHook. I have not written the code to cover Win7 so I have no comments.
Process handles are actually objects that you can "Wait" for, with something like "WaitForMultipleObjects".
While it doesn't deliver a notification as such, you can do this as part of your event loop by using the MsgWaitForMultipleObjects() version of the call to combine the wait with your message processing.
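For example, once you have a PID (from your enumeration step), waiting for that process to end might look roughly like this; the PID value here is a placeholder:
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD pid = 1234;  // placeholder: PID of the process to watch
    HANDLE proc = OpenProcess(SYNCHRONIZE, FALSE, pid);
    if (proc == NULL) {
        std::fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    // Blocks until the process exits; in a GUI app you would instead pass the
    // handle to MsgWaitForMultipleObjects() inside the message loop.
    WaitForSingleObject(proc, INFINITE);
    std::printf("process %lu has exited\n", pid);
    CloseHandle(proc);
    return 0;
}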
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
You can place a registry key here with your process name, then add a REG_SZ value named 'Debugger' containing your listener application name to relay the process start notification.
Unfortunately there is no such zero-overhead approach to receiving process exit that I know of.
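A rough sketch of setting that value from C++ is below; "notepad.exe" and the listener path are placeholders. Keep in mind that Windows will then launch the listed 'Debugger' in place of the target process, so the listener has to start the real executable itself.
#include <windows.h>
#include <cstdio>
#include <cstring>

int main()
{
    const char *subkey =
        "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
        "Image File Execution Options\\notepad.exe";       // placeholder target
    const char *debugger = "C:\\Tools\\listener.exe";      // placeholder listener

    HKEY key;
    if (RegCreateKeyExA(HKEY_LOCAL_MACHINE, subkey, 0, NULL, 0,
                        KEY_SET_VALUE, NULL, &key, NULL) != ERROR_SUCCESS) {
        std::fprintf(stderr, "could not open/create the IFEO key\n");
        return 1;
    }
    RegSetValueExA(key, "Debugger", 0, REG_SZ,
                   reinterpret_cast<const BYTE *>(debugger),
                   static_cast<DWORD>(std::strlen(debugger) + 1));
    RegCloseKey(key);
    return 0;
}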

Auto Scheduled SAS Queries

I run some SAS queries monthly and they all take a fairly long time to run. I was wondering if there was any way I can schedule these to run on a certain date each month in a certain order?
Thanks for your help!
On Unix you could set the programs up to run in batch mode with a cron job.
One trick you could use would be to set up a master SAS program to run everything.
Make one program that just contains any global variables that need to be changed each month and then call your monthly programs with includes.
something like:
%let globalvar1 = ThisMonth;
%let globalvar2 = LastMonth;
%include '/path/to/sas/program1';
%include '/path/to/sas/program2';
Then you run only this one program in batch...it will run them in the correct order and automatically wait for them to finish executing before moving to the next program (setting up separate cron jobs would require you to overestimate how long each one takes so they wouldn't conflict).
This will dump everything into one log file...which may be good or bad.
Another option would be to use the X statement to call the next program from the OS at the end of each run.
I am not 100% sure of the exact syntax, but this should work if you use the right command line for your OS (this could work on Unix or Windows, so you would only have to schedule one program).
At the end of each program just add:
X "Path/to/sas.exe" -batch -noterminal nextProgram.sas
This will let you chain the programs together so that they start the next one running after they finish. Then you just use task scheduler/cron to start "sas.exe -batch -noterminal firstProgram.sas"
Depending on what system you are working with, methods may be different.
The main idea is that you can store all the queries in a SAS program file and then use the system's scheduler (for example, Task Scheduler on Windows) to run it monthly.
A quick help (for Windows):
http://analytics.ncsu.edu/sesug/2006/CC04_06.PDF

Running out of file descriptors for mmaped files despite high limits in multithreaded web-app

I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files in either lsof or /proc/{pid}/fd is about the expected 2000-5000.
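For reference, a small diagnostic along these lines could be dropped into the C++ side to log descriptor usage against the limits at the moment an open fails (Linux-specific, names are illustrative only):
#include <cstdio>
#include <dirent.h>
#include <sys/resource.h>

void log_fd_usage()
{
    // Count entries in /proc/self/fd: one per descriptor open in this process.
    int count = 0;
    DIR *dir = opendir("/proc/self/fd");
    if (dir != NULL) {
        while (readdir(dir) != NULL) ++count;
        closedir(dir);
    }

    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);
    std::fprintf(stderr, "open fds: %d, soft limit: %llu, hard limit: %llu\n",
                 count, (unsigned long long)rl.rlim_cur,
                 (unsigned long long)rl.rlim_max);
}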
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmaped files were somehow taking up one file descriptor per thread, then the numbers would add up. That said, doing a recursive count of the entries in /proc/{pid}/tasks/*/fd currently lists 215575 fds, so it would seem that it should already be hitting the limits and it isn't, so that seems unlikely.
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. After upgrading to 10.04, one month on, there hasn't been a single instance of this problem. The configuration didn't change, so I'm led to believe that this must have been a kernel bug.
Your setup uses a huge chunk of code that may be guilty of leaking too: the JVM. Maybe you can switch between the Sun and the open-source JVMs as a way to check whether that code is by chance guilty. There are also different garbage collector strategies available for the JVM; using a different one, or different heap sizes, will cause more or fewer garbage collections (which in Java include the closing of descriptors).
I know it's kind of far-fetched, but it seems like you have already followed all the other options ;)