How does Bazel record cache hits during a build?

Hi all Bazel users,
I am trying to understand how Bazel logs cache hits and the action execution environment.
Firstly, I have noticed that there is an Action_cache.proto file which provides information about cache hits. Can I assume that it reports the local cache hit count?
Secondly, the ActionResult.java class has a boolean locallyExecuted which returns true if all spawns of the action were executed locally. My understanding was that 1 action = 1 spawn; does this mean that 1 action = x spawns, and that some spawns of one action could be executed locally while others run remotely? That would be confusing, since the runner list in SpawnStats.java is keyed by spawn, yet in the build summary printed to the console the number of runners equals the number of actions.
Lastly, in SpawnMetrics.java we can find the different execution kinds, namely REMOTE, LOCAL, WORKER, and OTHER. I was just wondering whether WORKER belongs to local execution?
Thanks for your help!

The calculation for checking the cache hit rate is described here:
INFO: 7 processes: 3 remote cache hit, 4 linux-sandbox.
This means that out of 7 attempted actions, 3 got a remote cache hit and the other 4 did not have cache hits and were executed locally using the linux-sandbox strategy. Local cache hits are not included in this summary. If you are getting 0 processes (or a number lower than expected), run bazel clean followed by your build/test command.
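As a worked example of that summary line, the hit rate is just cache hits divided by total processes. A minimal sketch in Java (the counts are hard-coded from the line above rather than parsed from a real Bazel log):

// Hypothetical example: computing the remote cache hit rate from the
// counts in "INFO: 7 processes: 3 remote cache hit, 4 linux-sandbox."
public class CacheHitRate {
    public static void main(String[] args) {
        int remoteCacheHits = 3; // "3 remote cache hit"
        int totalProcesses = 7;  // "7 processes"
        double hitRatePercent = 100.0 * remoteCacheHits / totalProcesses;
        System.out.printf("remote cache hit rate: %.1f%%%n", hitRatePercent); // ~42.9%
    }
}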

Related

How do parallel multi-instance loops work in Camunda 7.16.6

I'm using the camunda-engine 7.16.6.
I have a process with a multi-instance loop like this one that repeats in parallel 1000 times.
This loop is executed in parallel. My assumption was that n Camunda executors would now start their work, so executor #1 executes Task 2, then Task 3, then Task 4, and executor #2 and all the others do the same. So after a short while, at least some of the 1000 iterations would have finished all three tasks in the loop.
However, what I have observed so far is that Task 2 gets executed 1000 times, and only when that is finished does Task 3 get executed 1000 times, and so on.
I also noticed that Camunda itself takes a lot of time outside of the tasks.
Is my observation correct, and is this behavior documented somewhere? Can you change that behavior?
I've run some tests and can explain the behavior:
The order of tasks and the overall time to finish are influenced by whether or not there are transaction boundaries (async after, the red bars in the screenshot).
It is described a bit here:
By setting the asyncBefore='true' attribute we introduce an additional save point at which the process state will be persisted and committed to the database. A separate job executor thread will continue the process asynchronously by using a separate database transaction. In case this transaction fails the service task will be retried and eventually marked as failed - in order to be dealt with by a human operator.
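For illustration, here is a minimal sketch of declaring such a save point with the camunda-bpmn-model fluent builder (the process id, task id, and delegate class are made-up placeholders, not taken from the screenshot):

import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

// Hypothetical sketch: a parallel multi-instance service task with
// asyncBefore, i.e. a save point before each job is picked up by the
// job executor.
public class AsyncBeforeModel {
    public static void main(String[] args) {
        BpmnModelInstance model = Bpmn.createExecutableProcess("demoProcess")
            .startEvent()
            .serviceTask("task7")
                .camundaAsyncBefore()                      // the additional save point
                .camundaClass("com.example.Task7Delegate") // placeholder delegate
                .multiInstance()
                    .parallel()
                    .cardinality("1000")
                .multiInstanceDone()
            .endEvent()
            .done();
        System.out.println(Bpmn.convertToString(model));
    }
}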
repeat 1000 times, parallel, no transaction
One job executor rushes through the process; the order is 1, [2,3,4|2,3,4|...], 5. Not really parallel. But this is as documented here:
The Job Executor makes sure that jobs from a single process instance are never executed concurrently.
It can be turned off if you are an expert and know what you are doing (and have understood this section).
Overall this took around 5 seconds.
repeat 1000 times, parallel, with transaction
Here, due to the transactions, there will be 1000 waiting jobs for Task 7, and each finished Task 7 creates another job for Task 8. Since the jobs are executed in their order in the database (see here), the order is 6, [7,7,7...,8,8,8...,9,9,9...], 10.
The transaction handling, which includes maintaining the variables, has a huge impact on the runtime: with transactions in parallel mode it runs for 06:33 minutes.
If you turn off the exclusive flag it takes around 4:30 minutes, but at the cost of thousands of OptimisticLockingExceptions.
AFAIK the recommended approach to gain true parallelism would be to move Task 7, Task 8 and Task 9 to a separate process and spawn 1000 instances of that process.
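A minimal sketch of that idea, using the same fluent builder and imports as the snippet above (process ids are assumed placeholders); the parallel multi-instance call activity spawns one instance of the separate process per iteration:

// Hypothetical: the main process starts 1000 instances of a separate
// process "task789Process" that contains Tasks 7-9.
BpmnModelInstance mainProcess = Bpmn.createExecutableProcess("mainProcess")
    .startEvent()
    .callActivity("spawnWorkers")
        .calledElement("task789Process") // the separate process with Tasks 7-9
        .camundaAsyncBefore()
        .multiInstance()
            .parallel()
            .cardinality("1000")
        .multiInstanceDone()
    .endEvent()
    .done();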
You can influence the order of execution if you tweak the job executor settings & priority (see here), but that seems to require the exclusive flag, too. If you do that, the order will be 6, [7,7,7|8,9,8,9 (in random order),...], 10.
repeat 1000 times, sequential, no transaction
The order is 11, [12,13,14|12,13,14,...], 15.
This takes only 2 seconds.
repeat 1000 times, sequential, with transaction
The order is, as expected, 16, [17,18,19|17,18,19|...], 20.
Due to the transactions this takes 02:45 minutes.
I heard from colleagues that one should use parallel only if the loop involves long-running/blocking tasks like a human task: in sequential mode there would only be one human task at a time, and only after that one is done would another be created; in parallel mode, you have 1000 human tasks at once, which is more likely the desired behavior.
Parallel performance seems to be improved in Camunda 8

Robot Framework: Set Timeout in Robot Framework

I have created a framework in which I have used Set Browser Implicit Wait 30
I have 50 suites that contain a total of 700 test cases. Some of the test cases (200 TCs) have steps to check whether an element is present or not present. My objective is to avoid waiting the full 30 seconds to check whether an element is present or not present. I tried using Wait Until Element Is Visible    ${locator}    timeout=10, expecting to wait only 10 seconds for the element, but it waits for 30 seconds.
Question: Can somebody help with the right approach to deal with such scenarios in my framework? If I accept the 30-second wait, the time taken to complete such test cases will be much higher; I am currently trying to save 20*200 seconds. Please advise.
The simplest solution is to change the implicit wait right before checking that an element does not exist, and then change it back afterwards. You can do this with the keyword Set Selenium Implicit Wait.
For example, your keyword might look something like this:
*** Keywords ***
Verify Element Is Not On Page
    [Arguments]    ${locator}
    ${old_wait}=    Set Selenium Implicit Wait    10
    Run Keyword And Continue On Failure
    ...    Page Should Not Contain Element    ${locator}
    Set Selenium Implicit Wait    ${old_wait}
You can simply add timeout=${Time} next to the keyword you want to execute (e.g., Wait Until Page Contains Element    ${locator}    timeout=50).
The problem you're running into is the issue of implicit waits vs. explicit waits. Searching the internet will provide you with a lot of good explanations of why mixing them is not recommended, but I think Jim Evans (creator of the IE WebDriver) explained it nicely in this Stack Overflow answer.
Improving the performance of your test run is typically done by utilizing one or both of these:
Shorten the duration of each individual test
Run test in parallel.
Shortening the duration of a test typically means being in complete control of the application under test, resulting in the script knowing the moment the application has successfully loaded. This means having a low (or no) implicit wait and working exclusively with fluent waits (waiting for a condition to occur). This will result in your tests running at the speed your application allows.
This may mean investing time understanding the application you test on a technical level. By using a custom locator you can still use all the regular SeleniumLibrary keywords and have a centralized waiting function.
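Since SeleniumLibrary sits on top of Selenium, the underlying pattern looks like this in plain Selenium Java (a minimal sketch; the URL and locator are placeholders, and this is not SeleniumLibrary-specific code):

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ExplicitWaitSketch {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(Duration.ZERO); // no implicit wait
        driver.get("https://example.org/"); // placeholder URL
        // Wait up to 10 seconds for a specific condition instead of relying
        // on a blanket 30-second implicit wait.
        WebElement element = new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.visibilityOfElementLocated(By.id("main"))); // placeholder locator
        System.out.println(element.getTagName());
        driver.quit();
    }
}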
Running tests in parallel starts with having tests that run standalone and have no dependencies on other tests. In Robot Framework this means having Test Suite Files that can run independently of each other. Most of us use Pabot to run our suites in parallel and merge the log file afterwards.
Running several browser application tests in parallel means running more than one browser at the same time. If you test in Chrome, this can be done on a single host, though it's not always recommended. When you run IE, you require multiple boxes/sessions, and then you start to need a Selenium Grid type solution to distribute the execution load across multiple machines.

Elastic Beanstalk high CPU load after a week of running

I am running a single-instance worker on AWS Elastic Beanstalk. It is a single-container Docker environment that runs some processes once every business day. Mostly, the processes sync a large number of small files from S3 and analyze those.
The setup runs fine for about a week, and then CPU load starts growing linearly in time, as in this screenshot.
The CPU load stays at a considerable level, slowing down my scheduled processes. At the same time, the top-resource tracking I run inside the container (with privileged Docker mode enabled to allow it):
echo "%CPU %MEM ARGS $(date)" && ps -e -o pcpu,pmem,args --sort=pcpu | cut -d" " -f1-5 | tail
shows nearly no CPU load (which changes only during the time that my daily process runs, seemingly accurately reflecting system load at those times).
What am I missing here in terms of the origin of this "background" system load? I am wondering if anybody has seen similar behavior, and/or could suggest additional diagnostics from inside the running container.
So far I have been restarting the setup every week to remove the "background" load, but that is sub-optimal, since the first run after each restart has to collect over 1 million small files from S3 (while subsequent daily runs add only a few thousand files per day).
The profile is a bit odd, especially the linear growth: almost as if something is accumulating and taking progressively longer to process.
I don't have enough information to point at a specific issue. A few things that you could check:
Are you collecting files anywhere, whether intentionally or in a cache or transfer folder? It could be that the system is running background processes (AV, index, defrag, dedupe, etc) and the "large number of small files" are accumulating to become something that needs to be paged or handled inefficiently.
Does any part of your process use a weekly naming convention or housekeeping process? Might you be getting conflicts, or accumulating workload, as the week rolls over? I.e., the 2nd week is actually processing both the 1st and 2nd weeks' data but never completing, so that each following day is progressively worse. I saw something similar where an inappropriate bubble-sort process never completed (it never reached its completion condition because the slow but steady inflow of data caused it to constantly reset), and the demand by the process got progressively higher as the array got larger.
Do you have some logging on a weekly rollover cycle?
Are there any other key performance metrics following the trend (network, disk IO, memory, paging, etc.)?
Do consider whether it is a false positive: if it is genuinely high CPU, there should be other metrics mirroring the CPU behaviour (cache use, disk IO, S3 transfer statistics/logging).
RL

Dynamically evaluate load and create threads depending on machine performance

Hi, I have started to work on a project where I use parallel computing to split job loads among multiple machines, for tasks such as hashing and other forms of mathematical calculations. I'm using C++.
It runs on a master/slave (or server/client, if you prefer) model, where every client connects to the server and waits for a job. The server can then take a job and split it depending on the number of clients:
1000 jobs --> 3 clients
e.g.: Client 1 --> calculate(0 to 333)
Client 2 --> calculate(334 to 666)
Client 3 --> calculate(667 to 999)
I want to further enhance the speed by creating multiple threads on every running client. But since the machines are almost certainly not all going to have the same hardware, I cannot arbitrarily decide on a number of threads to run on every client.
I would like to know if anyone knows a way to evaluate the load a thread puts on the CPU and extrapolate the number of threads that can be run concurrently on the machine.
There are two ways I see of doing this:
I start threads one by one, evaluating the CPU load each time, and stop when I reach a certain predefined ceiling (50%, 75%, etc.), but this has the flaw that I'll have to stop and re-split the job every time I start a new thread.
(And this is the more complex option:) run some kind of test thread, calculate its impact on the CPU base load, extrapolate the number of threads that can be run on the machine, and then start threads and split the jobs accordingly.
Any ideas or pointers are welcome, thanks in advance!
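For reference, a common starting point is simply to ask the runtime how many hardware threads are available; in C++ that is std::thread::hardware_concurrency(). Below is a minimal sketch of sizing a worker pool that way and splitting a job range across it, written in Java for brevity (the job count and the work function are made-up placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionJobs {
    public static void main(String[] args) throws InterruptedException {
        // Analogous to std::thread::hardware_concurrency() in C++.
        int threads = Runtime.getRuntime().availableProcessors();
        int totalJobs = 1000; // placeholder job count
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (totalJobs + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            int start = t * chunk;
            int end = Math.min(start + chunk, totalJobs);
            if (start >= end) break;
            pool.submit(() -> {
                for (int i = start; i < end; i++) {
                    // calculate(i) would go here -- placeholder for the real work
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}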

Concurrency problems in netty application

I implemented a simple HTTP server (link), but the result of the test (ab -n 10000 -c 100 http://localhost:8080/status) is very bad (see test.png at the previous link).
I don't understand why it doesn't work correctly with multiple threads.
I believe that, by default, Netty's event-loop thread pool is sized based on the number of cores on the machine (in Netty 4, the default is twice the number of available processors). The idea is to handle requests asynchronously and without blocking (where possible).
Your /status test includes a database transaction, which blocks because of the intrinsic design of database drivers, etc. So your performance, at a high level, is essentially a result of:-
a.) you are running a pretty hefty test of 10,000 requests attempting to run 100 requests in parallel
b.) you are calling into a database for each request, so this will not be quick (relatively speaking, compared to a non-blocking I/O operation)
A couple of questions/considerations for you:-
Machine Spec.?
What is the spec. of the machine you are running your application and test on?
How many cores?
If you only have 8 cores available, then you will only have 8 threads running in parallel at any time. That means those batches of 100 concurrent requests will be queueing up.
Consider what is running on the machine during the test
It sounds like you are running the application AND Apache Bench on the same machine, so be aware that your application and the testing tool will both be contending for those cores (in addition to any background processes also contending for them, such as the OS).
What will the load be?
Predicting load is difficult, right? If you do think you are likely to have 100 requests into the database at any one time, then you may need to think about:-
a. your production environment may need a couple of instances to handle the load
b. try changing the config. of Netty's default thread pool to increase the number of threads
c. think about your application architecture - can you cache any of those results instead of going to the database for each request?
This may be linked to the use of database access (a synchronous task) within one of your handlers (at least in your TrafficShappingHandler)?
You might need to make your database calls async (for instance, by handing them off to other threads in a producer/consumer way)...
If it is something else, I do not have enough information...
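One common way to do that in Netty 4 is to register the blocking handler with its own EventExecutorGroup, so database calls never run on the event loop. A minimal sketch (the handler body and pool size are assumptions, not taken from the linked code):

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class BlockingHandlerSetup extends ChannelInitializer<SocketChannel> {
    // Separate thread pool for blocking work; 16 threads is an arbitrary example.
    private static final EventExecutorGroup DB_EXECUTOR = new DefaultEventExecutorGroup(16);

    @Override
    protected void initChannel(SocketChannel ch) {
        // A handler added with an EventExecutorGroup runs on that group,
        // keeping the event loop free for non-blocking I/O.
        ch.pipeline().addLast(DB_EXECUTOR, "statusHandler", new ChannelInboundHandlerAdapter() {
            @Override
            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                // Placeholder: run the blocking database transaction here,
                // then write the response back.
                ctx.writeAndFlush(msg);
            }
        });
    }
}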