How can I debug the Ray framework? - ray

Thanks for the reply; I have edited my question.
I want to distinguish whether the rpc_address refers to the local host or a remote node, so I added some logging in event_stats.cc:
const auto &rpc_address = CoreWorkerProcess::GetCoreWorker().GetRpcAddress();
RAY_LOG(INFO) << rpc_address.SerializeAsString() << "\n\n";
Then I added core_worker_lib to ray_common's dependencies in BUILD.bazel, but I get the following error:
ERROR: /home/Ray/ray/BUILD.bazel:2028:11: in cc_library rule //:gcs_client_lib: cycle in dependency graph:
//:ray_pkg
//:cp_raylet
//:raylet
//:raylet_lib
.-> //:gcs_client_lib
| //:gcs_service_rpc
| //:pubsub_lib
| //:pubsub_rpc
| //:grpc_common_lib
| //:ray_common
| //:core_worker_lib
`-- //:gcs_client_lib
So, my questions are:
How can I use CoreWorkerProcess in event_stats.cc?
When I change the code of event_stats.cc, I use "pip3 install -e . --verbose" to recompile the project, but it is too slow. Is there another way to recompile quickly after changing Ray's C++ code?
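The only way I can think of to break the cycle (this is just a sketch; SetRpcAddressProvider is a hypothetical hook I would add, not an existing Ray API) is to invert the dependency: ray_common exposes a callback that core_worker_lib registers at startup, so event_stats.cc can log the RPC address without linking against core_worker_lib.
// event_stats.h (stays inside ray_common, no core_worker dependency)
#include <functional>
#include <string>
void SetRpcAddressProvider(std::function<std::string()> provider);

// event_stats.cc (stays inside ray_common)
static std::function<std::string()> rpc_address_provider;
void SetRpcAddressProvider(std::function<std::string()> provider) {
  rpc_address_provider = std::move(provider);
}
// ...wherever the stats are recorded (RAY_LOG comes from ray/util/logging.h):
if (rpc_address_provider) {
  RAY_LOG(INFO) << rpc_address_provider() << "\n\n";
}

// core_worker.cc (core_worker_lib already depends on ray_common)
void RegisterRpcAddressProvider() {  // call once during worker startup
  SetRpcAddressProvider([]() {
    return CoreWorkerProcess::GetCoreWorker().GetRpcAddress().SerializeAsString();
  });
}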

Related

What is the best way to organize headers in C++?

I'm looking for the most portable and most organized way to include headers in C++. I'm making a game, and right now, my project structure looks like this:
game
| util
| | foo.cpp
| | foo.h
| ...
game-client
| main.cpp
| graphics
| | gfx.cpp
| | gfx.h
| ...
game-server
| main.cpp
| ...
Say I want to include foo.h from gfx.cpp. As far as I know, there are 3 ways to do this:
#include "../../game/util/foo.h. This is what I'm currently doing, but it gets messier the deeper into the folder structure I go.
#include "foo.h". My editor (Xcode) compiles fine with just this, but I'm not sure about other compilers.
#include "game/util/foo.h", and adding the base directory to the include path.
Which one is the best? (most portable, most organized, scales the best with many folders, etc.)
I have found the approach below most useful when dealing with a large code base.
Public headers
module_name/include/module_name/public_header.hpp
module_name/include/module_name/my_class.hpp
...
Private headers and source
module_name/src/something_private.cpp
module_name/src/something_private.hpp
module_name/src/my_class.cpp
Notes:
module_name is repeated to ensure that the module name appears whenever a public header from this library/module is included.
This improves readability and also avoids extra time spent finding the location of a header when the same name is used in multiple modules.
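For example, assuming each module's include/ directory is added to the header search path (e.g. with -Imodule_name/include, or the equivalent target_include_directories in CMake), a consumer in another module includes the header like this:
// some_other_module/src/consumer.cpp
// compiled with -Imodule_name/include, so the module name stays visible at the include site
#include "module_name/my_class.hpp"
The relative path on disk never appears, and moving module_name only requires updating the include path, not every #include line.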

So, what exactly is the deal with QSharedMemory on application crash?

When a Qt application that uses QSharedMemory crashes, some memory handles are left stuck in the system.
The "recommended" way to get rid of them is to
if(memory.attach(QSharedMemory::ReadWrite))
memory.detach();
bool created = memory.create(dataSize, QSharedMemory::ReadWrite);
In theory the above code should work like this:
We attach to a left-over piece of sh...ared memory, detach from it, the system detects that we were the last living user, and the segment gracefully goes away.
Except... that is not what happens in a lot of cases. What I actually see happening, a lot, is this:
// fails with memory.error() = SharedMemoryError::NotFound
memory.attach(QSharedMemory::ReadWrite);
// fails with "segment already exists" .. wait, what?! (see above)
bool created = memory.create(dataSize, QSharedMemory::ReadWrite);
The only somewhat-working way I've found to work around this is to write a PID file on application startup containing the PID of the currently running app.
The next time the same app is run, it picks up this file and does:
//QProcess::make sure that PID is not reused by another app at the moment
//the output of the command below should be empty
ps -p $previouspid -o comm=
//QProcess::(runs this script, reads output)
ipcs -m -p | grep $user | grep $previouspid | sed "s/  */ /g" | cut -f1 -d " "
//QProcess::(passes the result of the previous script to clean up stuff)
ipcrm -m $1
Now, I can see the problems with such an approach myself, but it is the only thing that works.
The question is: can someone explain what exactly is going on with this "not found, yet already existing" memory in the first piece of code above, and how to deal with it properly?
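For reference, this is the minimal kind of snippet I use to observe the sequence above (the key name and size are made up); it just prints errorString() after each step:
#include <QCoreApplication>
#include <QSharedMemory>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    const int dataSize = 4096;           // example size
    QSharedMemory memory("my-app-key");  // hypothetical key

    if (!memory.attach(QSharedMemory::ReadWrite))
        qDebug() << "attach failed:" << memory.errorString();  // NotFound here...
    else
        memory.detach();

    if (!memory.create(dataSize, QSharedMemory::ReadWrite))
        qDebug() << "create failed:" << memory.errorString();  // ...yet "segment already exists" here
    return 0;
}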

Isolate crash-prone (SEGV) but speed-critical legacy code into a separate binary

I have a code base (mostly C++) which is well tested and crash free. Mostly. A part of the code -- which is irreplaceable, hard to maintain or improve, and links against a binary-only library* -- causes all crashes. These do not happen often, but when they do, the entire program crashes.
+----------------------+
| Shiny new sane |
| code base |
| |
| +-----------------+ | If the legacy code crashes,
| | | | the entire program does, too.
| | Legacy Code | |
| | * Crash prone * | |
| | int abc(data) | |
| +-----------------+ |
| |
+----------------------+
Is it possible to extract that part of the code into a separate program, start that from the main program, move the data between these programs (on Linux, OS X and, if possible, Windows), tolerate crashes in the child process and restart the child? Something like this:
+----------------+ // start,
| Shiny new sane | ------. // re-start on crash
| code base | | // and
| | v // input data
| | +-----------------+
| return | | |
| results <-------- | Legacy Code |
+----------------+ | * Crash prone * |
| int abc(data) |
(or not results +-----------------+
because abc crashed)
Ideally the communication would be fast enough so that the synchronous call to int abc(char *data) can be replaced transparently with a wrapper (assuming the non-crash case). And because of slight memory leaks, the legacy program should be restarted every hour or so. Crashes are deterministic, so bad input data should not be sent twice.
The code base is C++11 and C, notable external libraries are Qt and boost. It runs on Linux, OSX and Windows.
--
*: some of the crashes/leaks stem from this library which has no source code available.
Well, if I were you, I wouldn't start from here ...
However, you are where you are. Yes, you can do it. You are going to have to serialize your input arguments, send them, deserialize them in the child process, run the function, serialize the outputs, return them, and then deserialize them. Boost will have lots of useful code to help with this (see asio).
Global variables will make life much more "interesting". Does the legacy code use Qt? That probably won't like being split into two processes.
If you were using Windows only, I would say "use DCOM" - it makes this very simple.
Restarting is simple enough if the legacy code is only used from one thread (the code which handles the "return" just checks whether a restart is needed and kills the process). If you have multiple threads, then the shiny code will need to check if a restart is required, block any further threads, wait until all calls have returned, restart the process, and then unblock everything.
Boost::interprocess looks to have everything you need for the communication - it's got shared memory, mutexes, and condition variables. Boost::serialization will do the job for marshalling and unmarshalling.
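To make the shape of this concrete, here is a deliberately simplified POSIX sketch (Linux/OSX only, no Windows, and fork-per-call rather than the long-lived worker described above) that runs the crash-prone function in a child process and survives its crashes; abc() stands in for the legacy entry point:
#include <sys/wait.h>
#include <unistd.h>
#include <string>

int abc(const char *data);  // the legacy, crash-prone function

// Returns true and fills 'result' on success; returns false if the child
// crashed or misbehaved, so the caller can skip (and never resend) that input.
bool abc_isolated(const std::string &data, int &result) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return false; }
    if (pid == 0) {                      // child: call the legacy code
        close(fds[0]);
        int r = abc(data.c_str());
        write(fds[1], &r, sizeof(r));
        _exit(0);
    }
    close(fds[1]);                       // parent: read result, reap child
    int status = 0;
    ssize_t n = read(fds[0], &result, sizeof(result));
    close(fds[0]);
    waitpid(pid, &status, 0);
    return n == sizeof(result) && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
A long-lived worker process talking over boost::interprocess or a socket avoids the per-call fork cost, but the error-handling shape stays the same: detect the dead child, restart it, and do not resend the input that killed it.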

Interposing functions calls from static library to system framework

I have a situation in which I need to interpose function calls made by a third-party static library to an iOS system framework (a shared library). Chances of cost-effective or timely maintenance support from the vendor of the static library are slim to non-existent.
In effect changing the calling sequence from:
+-------------+ +-------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +---------------------+
to:
+-------------+ +-------------+ +------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | Interposer | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +------------+ +---------------------+
The orthodox solutions to this problem are either to persuade the run-time linker to load a different library in place of FrameworkA, or to load libraries with dlopen(). Neither of these is an option on iOS.
One solution that does work is using sed to rename symbols in the symbol table of libVendor.a.
Suppose I want to interpose calls to FrameworkA_Foo() in FrameworkA made by functions in libVendor.a:
sed s/FrameworkA_Foo/InterposeA_Foo/g < libVendor.a > libInterposedVendor.a
And then interpose it with:
void InterposeA_Foo()
{
// do some stuff here
// ....
// then (maybe) forward
FrameworkA_Foo();
}
This works just so long as the length of the symbol names remains the same.
Whilst this approach works, it lacks elegance and feels fairly hacky. Do better solutions exist?
Other approaches considered:
Changing the link order so that libVendor.a's references resolve to the interposing function rather than the framework: Apple's linkers (unlike those on most UNIX platforms) recursively resolve symbols in libraries, so the order they are presented on the command line makes little difference.
Linker scripts: Not supported by lld
Mach-O object-file editing tools: found nothing that worked
[For clarity, this is a C (rather than Objective-C) API and the toolchain is clang/LLVM. Using an Apple-supplied GCC is not an option due to deprecation and lack of C++11 support.]

How to set up replication in BerkeleyDB

I've been struggling for some time now with setting up a "simple" BerkeleyDB replication using the db_replicate utility.
However, I've had no luck making it actually work, and I can't find any concrete example of how things should be set up.
Here is the setup I have so far. The environment is Debian Wheezy with BDB 5.1.29.
Database generation
A simple Python script reads "CSV" files and inserts each line into the BDB file:
from glob import glob
from bsddb.db import DBEnv, DB
from bsddb.db import DB_CREATE, DB_PRIVATE, DB_INIT_MPOOL, DB_BTREE, DB_HASH, DB_INIT_LOCK, DB_INIT_LOG, DB_INIT_TXN, DB_INIT_REP, DB_THREAD
env = DBEnv()
env.set_cachesize(0, 1024 * 1024 * 32)
env.open('./db/', DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_LOG |
DB_INIT_TXN | DB_CREATE | DB_INIT_REP | DB_THREAD)
db = DB(env)
db.open('apd.db', dbname='stuff', flags=DB_CREATE, dbtype=DB_BTREE)
for csvfile in glob('Stuff/*.csv'):
    for line in open(csvfile):
        db.put(line.strip(), None)
db.close()
env.close()
DB Configuration
In the DB_CONFIG file -- this is where I'm missing the most important part, I guess:
repmgr_set_local_site localhost 6000
Actual replication attempt
# Copy the database file to begin with
db5.1_hotbackup -h ./db/ -b ./other-place
# Start replication master
db5.1_replicate -M -h db
# Then try to connect to it
db5.1_replicate -h ./other-place
The only thing I currently get from the replicate tool is:
db5.1_replicate(20648): DB_ENV->open: No such file or directory
Edit: after stracing the process I found out it was trying to access __db.001, so I've copied those files manually. The current output is:
db5.1_replicate(22295): repmgr is already started
db5.1_replicate(22295): repmgr is already started
db5.1_replicate(22295): repmgr_start: Invalid argument
I suppose I'm missing the actual configuration value for the client to connect to the server, but so far no luck, as all the settings I tried yielded "unrecognized name-value pair" errors.
Does anyone know how this setup might be completed? Maybe I'm not even headed in the right direction and this should be something completely different?
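From reading the DB_ENV documentation, my best guess is that both environments need a DB_CONFIG: each side names its own listening address, and the client additionally names the master as a remote site. Something like the following (parameter names as I read them in the 5.1 docs -- please verify against 5.1.29, since this is exactly the part I could not get accepted; ports and priorities are arbitrary):
# ./db/DB_CONFIG (master)
repmgr_set_local_site localhost 6000
rep_set_priority 100
# ./other-place/DB_CONFIG (client)
repmgr_set_local_site localhost 6001
repmgr_add_remote_site localhost 6000
rep_set_priority 50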