MRTG ThreshProgI runs only once, not every 5 minutes - rrdtool

I have set up MRTG with rrdtool and routers2.cgi, and the setup is working fine; I'm happy as a beginner :)
I have set 'ThreshDir:', 'ThreshMinI' and 'ThreshProgI' in the MRTG cfg files. On the first run, my script in 'ThreshProgI' runs without any issue, but it does not run on the subsequent 5-minute runs.
I see that a file is generated in the 'ThreshDir:' location on the first MRTG run. If I remove that file, my script in 'ThreshProgI' runs on the next MRTG run.
What I have noticed so far is that once the 'ThreshDir:' file has been generated, 'ThreshProgI' stops running in my setup. What could be the reason for this, and how can I make 'ThreshProgI' run every 5 minutes (whenever 'ThreshMinI' fails)?

This is by design.
MRTG only runs the threshold program the FIRST time the threshold is broken, and not on the following runs, until it recovers. The last status is held in the ThreshDir in order to manage this.
There is a separate directive (ThreshProgOKI) for a threshold program to run on recovery.
The only way to trick MRTG into running the threshold program on every run, regardless of the previous status, is to delete the status history file in the ThreshDir on each pass (as you are doing).
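As a minimal sketch of that workaround, a small wrapper could clear the status files and then start MRTG. The function name clear_thresh_state and both paths in the usage comment are assumptions for illustration; the directory must be whatever your own 'ThreshDir:' line points to.

```shell
#!/bin/sh
# Hypothetical helper: remove the threshold status files that MRTG keeps
# in ThreshDir, so the next run treats every breach as a new one.
# Only regular files directly inside the directory are removed.
clear_thresh_state() {
    dir=$1
    # Nothing to do if the directory does not exist yet.
    [ -d "$dir" ] || return 0
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            rm -f -- "$f"
        fi
    done
    return 0
}

# Example cron usage (assumed paths, adjust to your setup):
#   clear_thresh_state /var/lib/mrtg/thresh && mrtg /etc/mrtg/mrtg.cfg
```

Because the previous status is wiped before every run, MRTG presumably can no longer tell a recovery from a normal reading, so a recovery program would not be expected to fire with this wrapper in place.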

Related

Robot Framework: Set Timeout in Robot framework

I have created a framework in which I have used Set Browser Implicit Wait 30
I have 50 suites containing a total of 700 test cases. About 200 of the test cases have steps that check whether an element is present or not present. My objective is to avoid waiting the full 30 seconds for these checks. I tried using Wait Until Element Is Visible ${locator} timeout=10, expecting to wait only 10 seconds for the element, but it still waits for 30 seconds.
Question: Can somebody help with the right approach to deal with such scenarios in my framework? If I accept the 30-second wait, these test cases take much longer to complete. I am currently trying to save 20*200 seconds. Please advise.
The simplest solution is to change the implicit wait right before checking that an element does not exist, and then change it back afterwards. You can do this with the keyword Set Selenium Implicit Wait.
For example, your keyword might look something like this:
*** Keywords ***
verify element is not on page
    [Arguments]    ${locator}
    ${old_wait}=    Set selenium implicit wait    10
    run keyword and continue on failure
    ...    page should not contain element    ${locator}
    set selenium implicit wait    ${old_wait}
You can simply add timeout=${Time} after the keyword you want to execute (e.g., Wait Until Page Contains Element ${locator} timeout=50).
The problem you're running into is the issue of "implicit wait vs. explicit wait". Searching the internet will provide you with a lot of good explanations of why mixing the two is not recommended, but I think Jim Evans (creator of the IE WebDriver) explained it nicely in this Stack Overflow answer.
Improving the performance of your test run is typically done by utilizing one or both of these:
Shorten the duration of each individual test
Run tests in parallel.
Shortening the duration of a test typically means being in complete control of the application under test, so that the script knows the application has successfully loaded the moment it happens. This means having a low implicit wait, or none at all, and working exclusively with fluent waits (waiting for a condition to occur). This will result in your tests running at the speed your application allows.
This may mean investing time understanding the application you test on a technical level. By using a custom locator you can still use all the regular SeleniumLibrary keywords and have a centralized waiting function.
Running tests in parallel starts with having tests that run standalone and have no dependencies on other tests. In Robot Framework this means having Test Suite Files that can run independently of each other. Most of us use Pabot to run our suites in parallel and merge the log file afterwards.
Running several browser application tests in parallel means running more than 1 browser at the same time. If you test in Chrome, this can be done on a single host - though it's not always recommended. When you run IE then you require multiple boxes/sessions. Then you start to require a Selenium Grid type solution to distribute the execution load across multiple machines.

Calabash: Increase wait timeout for predefined step

Then I wait to see "Choose a drink:#0"
# calabash-cucumber-0.18.0/features/step_definitions/calabash_steps.rb:154
execution expired (Calabash::Cucumber::WaitHelpers::WaitError)
The predefined steps work fine locally, but they fail in CI. There are no network calls made. It's just a transition to a new screen.
Any ideas on how to increase the wait timeout for the predefined steps will be appreciated.
The best thing to do would be to write your own step.
If you look at the step you can see that if you set WAIT_TIMEOUT you can increase the wait time.
$ WAIT_TIMEOUT=120 bundle exec cucumber
I think you should write your own step.

C++ executing a bash script which terminates and restarts the current process

So here is the situation, we have a C++ datafeed client program which we run ~30 instances of with different parameters, and there are 3 scripts written to run/stop them: start.sh stop.sh and restart.sh (which runs stop.sh and then start.sh).
When there is a high volume of data the client "falls behind" real time. We test this by comparing the system time to the most recent data entry times listed. If any of the clients falls behind more than 10 minutes or so, I want to call the restart script to start all the binaries fresh so our data is as close to real time as possible.
Normally I call a script using system("script.sh"). However, the restart script looks up and kills the process using kill, BUT calling system() also makes the current program ignore SIGQUIT and SIGINT until system() returns.
On top of this, if there are two concurrent executions with the same arguments, they will conflict and the program will hang (this stems from establishing database connections), so I cannot start the new instance until the old one is killed, and I cannot kill the current one if it ignores SIGQUIT.
Is there any way around this? The current state of the binary and missing some data does not matter at all if it has reached the threshold, I also can not just have the program restart itself, since if one of the instances falls behind, we want to restart all 30 of the instances (so gaps in the data are at uniform times). Is there a clean way to call a script from within C++ which hands over control and allows the script to restart the program from scratch?
FYI we are running on CentOS 6.3
Use exec() instead of system(). It will replace your process with the new one. Note that there is a significant difference in how exec() is called and how it behaves: system() passes its string argument to the system shell to run, whereas the exec() family (execl(), execv() and so on) executes an executable file directly, and you need to supply the arguments to the process one at a time instead of letting the shell parse them apart for you.
Here's my two cents.
Temporary solution: Use SIGKILL.
Long-term solution: Optimize your code or the general logic of your service tree, using other system calls like exec or by rewriting it to use threads.
If you want better answers, maybe you should post some code and/or make the issue more specific.

Is it possible to stagger builds in Hudson/Jenkins?

I have Jenkins set up to build XBMC images for different platforms. My system takes around 6 hours to build each image, so I prefer to run them in parallel, usually 2 or 3 at a time. The problem is that if they have to download updates to modules (like the Linux kernel or something), the 2 or 3 builds running in parallel will download at the same time, corrupting the download (they point to the same folder).
Is it possible in Jenkins/Hudson to specify an offset? (I know you can schedule builds, as well as use a trigger that builds after completion of one project.) Something like:
Build 1: immediately
Build 2: start 20 minutes after build 1
Build 3: start 20 minutes after build 2
I tried looking for a plugin and searching Google, but had no luck. I also know that I could use the cron-like scheduling capabilities in Jenkins, but my build trigger is set up to poll the Git repo for changes; I'm not just scheduling blindly.
One way to do it is to choose the "Quiet Period" option under "Advanced".
Set it to 1200 seconds for Job 2, and 2400 seconds for Job 3.
That means Job 1 will be queued immediately when a change is noticed in git, Job 2 will go into the queue with a 20 minute delay, and Job 3 with a 40 minute delay.
Another way to do this would be to make the job some sort of a build flow (whether with the build flow plugin or by saying that the last task of job A is to run job B). If you can turn the download into its own job, then you can define the "download" job as single-threaded, and the rest as multithreaded.
Doing this serializes only what needs to be serialized. Doing an "every twenty minutes" thing will waste time when it takes fifteen minutes to download, and will fail (possibly in a hard-to-debug way) when there's a slowdown and it takes twenty-five minutes to download.

Kill Bash copy child process to simulate crash

I'm trying to test a Bash script which copies files individually and does some stuff to each file. It is meant to be resumable, so I'd like to make sure to test this properly. What is an elegant solution to kill or otherwise abort the script which does the copying from the test script, making sure it does not have time to copy and process all the files?
I have the PID of the child process, I can change the source code of both scripts, and I can create arbitrarily large files to test on.
Clarification: I start the script in the background with &, get the PID as $!, then I have a loop which checks that there is at least one file in the target directory (the test script copies three files). At that point I run kill -9 $PID, but the process is not interrupted - The files are copied successfully. This happens even if the files are big enough that creating them (with dd and /dev/urandom) takes a couple seconds.
Could it be that the files are only visible to the shell when cp has finished? It would be a bit strange, but it would explain why the kill command is too late.
Also, the idea is not to test resuming the same process, but cutting off the first process (simulate a system crash) and resuming with another invocation.
Send a KILL signal to the child process:
kill -KILL $childpid
You can try to play the timing game by using large files and sleeps. You may have an issue with the repeatability of the test.
You can add throttling code to the script you're testing and then just throttle it all the way down. You can implement throttling by passing in a value such as:
a sleep value for sleeping in the loop
the number of files to process
the number of seconds after which the script will die
a nice value to execute the script at
Some of these may work better or worse from a testing point of view. nice'ing may get you variable results, as will setting up a background process to kill your script after N seconds. You can also try more than one of these at the same time which may give you the control you want. For example, accepting both a sleep value and the kill seconds could give you fine grained throttling control.
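A rough sketch of the first two options combined (a sleep value and a file limit) might look like the following. The function name copy_throttled and its argument order are made up for illustration, and the per-file processing of the real script is left out.

```shell
#!/bin/sh
# Hypothetical throttled copy loop: copy_throttled SRC DST [DELAY] [MAX_FILES]
#   DELAY     seconds to sleep after each file (default 0)
#   MAX_FILES stop after this many files; 0 means no limit (default 0)
# Returns 1 when it stops early, which simulates an interrupted run.
copy_throttled() {
    src=$1
    dst=$2
    delay=${3:-0}
    max=${4:-0}
    count=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            return 1   # simulated abort before all files are copied
        fi
        cp -- "$f" "$dst"/ || return 2
        count=$((count + 1))
        sleep "$delay"
    done
    return 0
}
```

With a large DELAY, the test script gets a wide and repeatable window in which to send the kill; with MAX_FILES set, no kill is needed at all and the second invocation can be checked against a known partial state.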