How can I profile Bazel's disk usage?

Bazel has some decent built-in profiling for both CPU and RAM use: https://docs.bazel.build/versions/main/skylark/performance.html
But I haven't been able to find anything that would let me profile how much disk space rules use up as they run.
Is there a way? If not, where might I need to modify Bazel in order to collect this information?

Related

Dynamic Linking ~ Limiting a DLL's system access

I know the question might seem a little vague but I will try to explain as clearly as I can.
In C++ there is a way to dynamically link code into your already running program. I am thinking about creating my own plugin system (for learning/research purposes), but I'd like to restrict the plugins to specific system access for security purposes.
I would like to give the plugins limited access to, for example, disk writing, such that they can only call functions from an API I pass in from my application (and write only through my predefined interface). Is there a way to enforce this kind of behaviour from the application side?
If not: are there other languages that support secure dynamically linked modules?
You should think about writing a plugin container (or sandbox) and coordinating everything through it; also make sure to drop any privileges you do not need inside the container process before running the plugin. Because the container is a separate process, you can run it as a dedicated user rather than the one who started it; once you restrict that user, the process is automatically restricted as well. Having a dedicated user per process is the most common and easiest approach, and it is also the only cross-platform way to limit a process; even on Windows you can use this method.
Limiting access to shared resources the OS provides, such as disk, RAM or CPU, depends heavily on the OS, and you have not specified which one. While it is doable on most OSes, Linux is the prime choice because it is written with multi-seat and server use cases in mind. For example, on Linux you can use cgroups to easily limit CPU or RAM per process, and you would only need to apply them to your plugin-container process. There is blkio to control disk access, but you can also use the traditional quota mechanism in Linux to limit the per-user (and therefore per-process) share of disk space.
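To make the privilege-dropping idea concrete, here is a minimal POSIX-only sketch of a container process that caps its own resources and switches to a dedicated unprivileged user before loading the plugin. The uid/gid values, limits and plugin path are placeholders, and the process must start with enough privilege (typically root) to be able to switch users; compile with -ldl.

#include <sys/resource.h>
#include <unistd.h>
#include <dlfcn.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // Placeholder uid/gid of the dedicated, unprivileged "plugin" user.
    const uid_t plugin_uid = 1500;
    const gid_t plugin_gid = 1500;

    // Cap address space (256 MiB) and CPU time (10 s) for this process
    // before any plugin code runs; the limits are inherited by children.
    rlimit mem{256 * 1024 * 1024, 256 * 1024 * 1024};
    rlimit cpu{10, 10};
    if (setrlimit(RLIMIT_AS, &mem) != 0 || setrlimit(RLIMIT_CPU, &cpu) != 0) {
        perror("setrlimit");
        return EXIT_FAILURE;
    }

    // Drop the group first, then the user; after this the process
    // cannot regain its original privileges.
    if (setgid(plugin_gid) != 0 || setuid(plugin_uid) != 0) {
        perror("drop privileges");
        return EXIT_FAILURE;
    }

    // Only now load and run the untrusted plugin.
    void* handle = dlopen("./plugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return EXIT_FAILURE;
    }
    auto entry = reinterpret_cast<void (*)()>(dlsym(handle, "plugin_main"));
    if (entry) entry();
    dlclose(handle);
    return EXIT_SUCCESS;
}

A real container would additionally place itself into the cgroups mentioned above before dropping privileges; the sketch only shows the per-process setrlimit/setuid part.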
Supporting plugins is an involved process, and the best way to start is to read code that already does some of this. Chromium's sandboxing is the best place I can suggest: it is very cleanly written, has nice documentation, and fortunately the code is not very big.
If you prefer less involvement with raw cgroups, there is an even easier mechanism for limiting resources: Docker is fairly new, but it abstracts away the low-level OS constructs so you can easily contain applications without running them in virtual machines.
To block some calls, a first idea would be to hook the forbidden system calls and any other API calls you don't want. You can also hook the dynamic-linking calls to prevent your plugins from loading other DLLs, and hook the disk read/write APIs to block reads and writes.
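On Linux, one concrete way to hook such calls without patching the binary is symbol interposition via LD_PRELOAD. A rough sketch, where the allowed path prefix is just a placeholder policy and only open() is covered (open64(), openat(), fopen() and friends would need the same treatment):

// blockopen.cpp - build as a shared object and inject it, e.g.
//   g++ -shared -fPIC -o blockopen.so blockopen.cpp -ldl
//   LD_PRELOAD=./blockopen.so ./host_with_plugins
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdarg.h>

extern "C" int open(const char* path, int flags, ...) {
    // Resolve the real open() the first time we are called.
    using open_fn = int (*)(const char*, int, ...);
    static open_fn real_open =
        reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));

    // Placeholder policy: only allow writes under /tmp/plugin-scratch/.
    const bool wants_write = (flags & (O_WRONLY | O_RDWR | O_CREAT)) != 0;
    if (wants_write && strncmp(path, "/tmp/plugin-scratch/", 20) != 0) {
        errno = EACCES;
        return -1;
    }

    // Forward the optional mode argument when O_CREAT is set.
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        va_end(ap);
        return real_open(path, flags, mode);
    }
    return real_open(path, flags);
}

Note that this only catches calls that go through the dynamic linker; code that issues raw system calls bypasses it, so treat it as a convenience layer rather than a security boundary.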
Take a look at this; it may give you an idea of how you can forbid function calls.
You can also try to sandbox your plugins; look at some open-source sandboxes and understand how they work. It should help you.
In this case you really have to sandbox the environment in which the DLL runs. Building such a sandbox is not easy at all, and it is probably something you do not want to do yourself. System calls can be hidden in strings or generated through metaprogramming at execution time, so they are hard to detect by just analysing the binary. Luckily, people have already built solutions, for example Google's Native Client project, whose goal is to allow C++ code to be run safely in the browser. And if it is safe enough for a browser, it is probably safe enough for you, and it might work outside of the browser.

How to build an application layer pre-fetching system

I'm working in a C/C++ mixed project that has the following situation.
I need to iterate through very small chunks (and occasionally larger ones) in a file, one by one. Ideally, I should just read them once, consecutively. I think a better solution in this case is to read a big chunk into a buffer and consume it later, rather than read each chunk the instant I need it.
The problem is, how do I balance the cache size? Is there any well-known algorithm/library that I can take advantage of?
UPDATE (I changed the title):
Thanks for your replies, and I understand there are different levels of caching at work in our boxes. But that is not enough in my case.
I think I missed something important here. Actually, I'm building an application on top of an existing framework, in which issuing read requests to the engine frequently costs too much for me. (Yes, I believe the engine does take advantage of OS- and disk-level caches.) What I'm trying to do is indeed to build an application-level pre-fetching system.
Thoughts?
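For concreteness, here is a rough sketch of the kind of read-ahead buffer I have in mind, written against plain C stdio (the real engine API differs, and the 1 MiB block size is just a guess to be tuned):

#include <algorithm>
#include <cstdio>
#include <cstring>
#include <vector>

// Serves many small, consecutive chunk reads out of one large buffered read,
// so the underlying engine/file is hit only once per block.
class PrefetchReader {
public:
    explicit PrefetchReader(std::FILE* f, std::size_t block_size = 1 << 20)
        : file_(f), buf_(block_size) {}

    // Copies `size` bytes into `out`; refills the buffer from the file as needed.
    // Returns the number of bytes actually delivered (short at end of file).
    std::size_t read(void* out, std::size_t size) {
        std::size_t delivered = 0;
        auto* dst = static_cast<char*>(out);
        while (delivered < size) {
            if (pos_ == filled_) {                   // buffer exhausted: one big read
                filled_ = std::fread(buf_.data(), 1, buf_.size(), file_);
                pos_ = 0;
                if (filled_ == 0) break;             // end of file
            }
            std::size_t n = std::min(size - delivered, filled_ - pos_);
            std::memcpy(dst + delivered, buf_.data() + pos_, n);
            pos_ += n;
            delivered += n;
        }
        return delivered;
    }

private:
    std::FILE* file_;
    std::vector<char> buf_;
    std::size_t pos_ = 0, filled_ = 0;
};

A fancier version would fill the next block on a background thread while the current one is being consumed (double buffering), which is the actual "pre-fetching" part.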
In general you should try to use what the OS gives you rather than creating your own cache (because you run the risk of caching twice). For Linux, you can request OS-level read-ahead via readahead(); I don't know what the Windows equivalent would be.
Looking into this some more, there is also a block-level (i.e. disk) read-ahead parameter, set via blockdev --setra. It's probably not a good idea to change that on your system (unless it is dedicated to just this one task), but if the value there (blockdev --getra) is already larger than your typical chunk size, then you may not need to do anything else.
[And just to address the other point mentioned in the question comments: while an OS will cache file data in free memory, I don't believe it will pre-emptively read an otherwise unread file (apart from the read-ahead described above). But if anyone knows otherwise, please post details...]
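A minimal sketch of asking the kernel to start the read-ahead: posix_fadvise(POSIX_FADV_WILLNEED) is the portable spelling, readahead() the Linux-specific one. The 16 MiB length is a placeholder:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // Tell the kernel we will read the first 16 MiB soon; it can start
    // filling the page cache while we do other work.
    off_t offset = 0, length = 16 * 1024 * 1024;
    int err = posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
    if (err != 0)
        std::fprintf(stderr, "posix_fadvise: %s\n", std::strerror(err));

    // ... later, ordinary read() calls over that range should mostly hit the cache ...
    close(fd);
    return 0;
}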
Have you tried mmap()ing the file instead of read()ing from it? In some cases this might be more efficient, in others it might not. However, it is usually best to let the system optimize for you, since it knows more about the hardware than the application does. mmap() lets the system know that you need the whole file, so it may simply be more optimal.
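A rough sketch of that approach, with a madvise() hint added since the access pattern here is consecutive (the byte-sum is just a stand-in for consuming the small chunks):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    struct stat st{};
    if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }

    // Map the whole file; pages are faulted in (and read ahead) on demand.
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Hint that access will be sequential so the kernel can read further ahead.
    madvise(p, st.st_size, MADV_SEQUENTIAL);
    const unsigned char* data = static_cast<const unsigned char*>(p);

    // Consume small chunks directly from the mapping, no explicit buffering.
    unsigned long checksum = 0;
    for (off_t i = 0; i < st.st_size; ++i) checksum += data[i];
    std::printf("bytes=%lld checksum=%lu\n", (long long)st.st_size, checksum);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}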

Cache Hit/Miss for a value in C/C++ Program

This is my requirement: I know that certain algorithms make good use of the cache, some do not, some do more I/O than others on a particular data set, etc. I would like to see and analyze that happening myself.
So I was wondering if there is a way I could know how a certain memory location/variable is read, i.e. whether it came from the cache or there was a cache miss, and further, whether there was a page fault while retrieving this value, etc.
Thanks a lot!
If you really want to know when your caches are hitting/missing, modern processors have performance counters that you can use for exactly this purpose. I have used them extensively for academic research. The easiest way to use them is through perfmon2. Perfmon2 has both a library you can link into your program or a stand-alone program that will monitor an existing program. For example, here's the stand-alone program recording all level 1 data cache read requests and misses:
pfmon -eL1D_CACHE_LD:MESI,L1D_CACHE_LD:I_STATE your_program
For reference, Appendix A of this document (PDF) lists Intel's documentation on what hardware counters are available.
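On newer Linux kernels the same hardware counters are also exposed through the perf_event_open system call, so you can read them directly from your own program. A rough sketch counting L1 data-cache read misses around a region of code (the strided traversal is just a stand-in workload):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Open one hardware cache counter for the calling thread.
static int open_cache_counter(std::uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return static_cast<int>(syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0));
}

int main() {
    // L1D | read | miss, encoded as described in perf_event_open(2).
    std::uint64_t cfg = PERF_COUNT_HW_CACHE_L1D |
                        (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                        (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    int fd = open_cache_counter(cfg);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    std::vector<int> data(1 << 22, 1);          // stand-in workload

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); i += 16) sum += data[i];  // strided reads
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long misses = 0;
    if (read(fd, &misses, sizeof(misses)) != (ssize_t)sizeof(misses)) {
        perror("read");
        return 1;
    }
    std::printf("sum=%lld  L1D read misses=%lld\n", sum, misses);
    close(fd);
    return 0;
}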
I would try using the valgrind cachegrind tool; it can print out annotated source lines with the number of hits/misses in each cache for that line.
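If you want something concrete to feed it, the classic demonstration is row-major versus column-major traversal of a large matrix; cachegrind's per-line D1 miss counts make the difference obvious. The matrix size below is arbitrary:

#include <cstdio>
#include <vector>

// Run under:  valgrind --tool=cachegrind ./a.out
// then:       cg_annotate cachegrind.out.<pid>
// The column-major loop should show far more D1 read misses on its source line.
int main() {
    const std::size_t n = 2048;
    std::vector<int> m(n * n, 1);

    long long row_sum = 0, col_sum = 0;
    for (std::size_t i = 0; i < n; ++i)          // row-major: cache friendly
        for (std::size_t j = 0; j < n; ++j)
            row_sum += m[i * n + j];

    for (std::size_t j = 0; j < n; ++j)          // column-major: roughly one miss per access
        for (std::size_t i = 0; i < n; ++i)
            col_sum += m[i * n + j];

    std::printf("%lld %lld\n", row_sum, col_sum);
    return 0;
}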
I don't know if AMD CodeAnalyst can show that level of granularity but it wouldn't hurt to check.
That depends on the specific compiler, the OS, and the specific model of processor you're running on. Nothing (that I'm aware of) in the C/C++ language gives you access to what's happening at the cache level.
There are various measurement tools, but they would be largely independent of the language.
There are some "rules" for minimizing cache and paging issues, though it would take me some time to think of a reasonably comprehensive list.

ramdisk with a file mirror

I wanted to speed up compilation, so I was thinking I could have my files built on a ramdisk, but also have them flushed to the filesystem automatically, and fall back to the filesystem if there is not enough RAM.
I may need something similar for an app I am writing, where I would like files to be cached in RAM and flushed to the FS. What are my options? Is there something like this that already exists (perhaps FUSE)? The app is a toy app (for now), and I would need to compile C++ code repeatedly. As we know, the longer it takes to compile when there is a specific problem to solve before progressing, the less we can get done.
Ram-disks went the way of the dodo with the advent of the file system cache. The cache can make much better decisions than a static RAM disk, because it is aware of RAM usage by other programs and of the position of the disk write head, and the lazy write-back comes for free.
Compilation is CPU-bound, not disk bound. If you utilize all your CPU cores using the appropriate build flag, you can easily saturate them on typical PCs. Unless you have some sort of supercomputer, I don't think this will speed things up much.
For VS2008 this flag is /MP. It also exists on VS2005.
Not sure it helps answering a question from almost 12 years ago, but just now I was looking for some software to sync file systems or directories to do exactly that.
Both answers are correct, in the sense that the file system cache will attempt to predict what you need and make it available in RAM, and, generally, building software benefits a lot from multithreading, possibly maxing out your CPU. But right now I'm building with Unity and I don't have control over build optimization.
That said, a virtual RAM disk has advantages, because those files are always in RAM, unlike with the file system cache (FSC), which has to deal with resources and applications competing for disk access.
Another difference is that when an application closes a file handle or forces a sync, the FSC will try to get those files to the disk as soon as possible, in order to avoid problems (power failure and so on). I believe you can alter this FSC behaviour on Linux.
A sync to a RAM disk does not write to the physical disk, which may account for the performance difference you mentioned in your comment.
That said, I still need to look for something to auto-sync my two file systems!

Limiting CPU speed for profiling

I'm trying to optimize several bottlenecks in an application which is supposed to run on a really wide range of CPUs and architectures (some of them very close to embedded devices).
The results of my profiler, however, aren't really significant because of the speed of my CPU. Is there any way (preferably under Windows or Mac OS X) to limit the speed of my CPU for profiling purposes?
I've thought about using a virtual machine, but haven't found any with such functionality.
This works well and supports multicore. http://www.cpukiller.com/
It's a common misconception that you need to know how fast your code is to know where your performance problems are. That confuses problem-finding with problem-measurement.
This is the method I use.
If there is some wasteful logic in the program, it will be wasteful no matter what CPU runs it.
What you need to know is where it is. For measurement, you don't need to know how big it is; you only need to know that it is big enough to need to be fixed.
Usually there are a number of problems, of different sizes. You will probably find the biggest ones first, but no matter what order you fix them in, each one you fix will make it easier to find the remaining ones, because they will take a larger percentage.
I'm afraid I don't know any answer other than to start looking around in your area for old hardware. The CPU isn't the only variable that can (usually) affect things. L1/L2 cache size, memory bus speed, memory speed/latency, hard drive speed, etc. are all significant factors in many applications.
There was an app on Downloadsquad.com recently. I don't remember the name of it, but it did some fun stuff with processors and Task Manager. It may have only been for managing which apps run on which CPU, but maybe it would give you this. I will try to look for it this afternoon and respond back if I find it.
Many profilers (for example oprofile, but that's Linux-only) let you set the frequency at which they collect data. See if your profiler supports this, and if not, try a different one that does.
"I've thought about using a virtual machine, but haven't found any with such functionality."
Why do you need a VM that explicitly offers that functionality? Just limit the CPU usage of the VM in the host OS (where it is just a regular process). That should have exactly the same effect.
You can do this e.g. using cpulimit on Linux; similar solutions exist for MS Windows.
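cpulimit itself works by duty-cycling the target process with SIGSTOP/SIGCONT, so if no packaged tool is available for your platform, the idea is small enough to sketch yourself on a POSIX system. The 100 ms period is a placeholder to tune:

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <chrono>

// Crude CPU throttle: alternately resume and stop a process so it only gets
// a fraction of wall-clock time on the CPU. Usage: ./throttle <pid> <percent>
int main(int argc, char** argv) {
    if (argc != 3) { std::fprintf(stderr, "usage: %s <pid> <percent>\n", argv[0]); return 1; }
    pid_t pid = static_cast<pid_t>(std::atoi(argv[1]));
    int percent = std::atoi(argv[2]);                    // e.g. 25 for "quarter speed"

    const auto period   = std::chrono::milliseconds(100);
    const auto run_for  = period * percent / 100;
    const auto stop_for = period - run_for;

    while (true) {
        if (kill(pid, SIGCONT) != 0) { perror("kill"); return 1; }   // let it run
        std::this_thread::sleep_for(run_for);
        if (kill(pid, SIGSTOP) != 0) { perror("kill"); return 1; }   // pause it
        std::this_thread::sleep_for(stop_for);
    }
}

Keep in mind the caveat from the earlier answer: throttling CPU time this way does not emulate a slower memory hierarchy, cache sizes or disk, so it changes the CPU/IO balance rather than reproducing the target hardware.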