We have a Linux server with multiple users logged in. If someone runs make -jN it hogs the whole server CPU usage and responsiveness to other users decreases drastically.
Is there any way to decrease the priority of make process run by anyone in Linux?
Make has a '-l' (--load-average) option.
If you specify 'make -l 3', make will not launch additional jobs if there are already jobs running and the load is over 3.
From the manpage:
-l [load], --load-average[=load]
Specifies that no new jobs (commands) should be started if there
are others jobs running and the load average is at least load (a
floating-point number). With no argument, removes a previous load
limit.
It doesn't really decrease the priority of make, but it can avoid causing too much load.
replace make with your own script and add a "nice -n <>" command, so that higher the -jN, more the niceness.
start a super-user process that does ps -u "user name" | grep make, and count the number of processes. use renice on the process ids make them in line, or any other algorithm you want
Related
I have the problem that I capture emails and they arrive in masses, the issue is that every time they arrive in masses the platform crashes, the question is how to make it go running the process 1 at a time, is it possible? because currently I filled the entire procmail server where there were multiple processes at once, plus we add the executives who were working and the server died and we had to reboot and delete data from the procmail to get it working again.
Because once we capture the data it is working and making subprocesses.
This is the code:
SHELL = /bin/sh
LOGFILE = /var/log/procmail.log
LOGABSTRACT = "all"
VERBOSE = "on"
:0c
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
:0:
/var/log/plaform_catchemail
If by "platform" you mean the PHP script, you can serialize access to it by using a lock file.
:0c:.catchemail.lock
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
This means, if the file .catchemail.lock does not exist in your $MAILDIR, go ahead and create it, and hold it for the duration of this recipe.
If it does exist, sleep and try again.
There is a failure scenario if the lock is held for too long; Procmail's default behavior in this case is to bounce the message (i.e. cause the delivering MTA to regard it as undeliverable, and return an error message to the sender). You probably want to avoid that, ideally by telling the MTA to attempt delivery again at a later time. (The precise mechanism will depend on your MTA; but basically, by setting a suitable exit code.) But what's feasible and scalable ultimately depends on how many messages you receive vs how many you can process under this constraint.
Is there a way to limit the number of threads a gsutil -m command spawns? Can I say something like gsutil -m --threads=4 to spawn exactly four threads?
You should set the parallel_thread_count and parallel_process_count values in the boto configuration file to 4
Gsutil Top-Level Command-Line Options
-m flag
Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and
setmeta) to run in parallel. This can significantly improve
performance if you are performing operations on a large number of
files over a reasonably fast network connection.
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads and
processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration file. You
might want to experiment with these values, as the best values can
vary based on a number of factors, including network speed, number of
CPUs, and available memory.
Using the -m option may make your performance worse if you are using a
slower network, such as the typical network speeds offered by
non-business home network plans. It can also make your performance
worse for cases that perform all operations locally (e.g., gsutil
rsync, where both source and destination URLs are on the local disk),
because it can "thrash" your local disk.
If a download or upload operation using parallel transfer fails before
the entire transfer is complete (e.g. failing after 300 of 1000 files
have been transferred), you will need to restart the entire transfer.
Also, although most commands will normally fail upon encountering an
error when the -m flag is disabled, all commands will continue to try
all operations when -m is enabled with multiple threads or processes,
and the number of failed operations (if any) will be reported as an
exception at the end of the command's execution.
I am running a single-instance worker on AWS Beanstalk. It is a single-container Docker that runs some processes once every business day. Mostly, the processes sync a large number of small files from S3 and analyze those.
The setup runs fine for about a week, and then CPU load starts growing linearly in time, as in this screenshot.
The CPU load stays at a considerable level, slowing down my scheduled processes. At the same time, my top-resource tracking running inside the container (privileged Docker mode to enable it):
echo "%CPU %MEM ARGS $(date)" && ps -e -o pcpu,pmem,args --sort=pcpu | cut -d" " -f1-5 | tail
shows nearly no CPU load (which changes only during the time that my daily process runs, seemingly accurately reflecting system load at those times).
What am I missing here in terms of the origin of this "background" system load? Wondering if anybody seen some similar behavior, and/or could suggest additional diagnostics from inside the running container.
So far I have been re-starting the setup every week to remove the "background" load, but that is sub-optimal since the first run after each restart has to collect over 1 million small files from S3 (while subsequent daily runs add only a few thousand files per day).
The profile is a bit odd. Especially that it is a linear growth. Almost like something is accumulating and taking progressively longer to process.
I don't have enough information to point at a specific issue. A few things that you could check:
Are you collecting files anywhere, whether intentionally or in a cache or transfer folder? It could be that the system is running background processes (AV, index, defrag, dedupe, etc) and the "large number of small files" are accumulating to become something that needs to be paged or handled inefficiently.
Does any part of your process use a weekly naming convention or house keeping process. Might you be getting conflicts, or accumulating work load as the week rolls over. i.e. the 2nd week is actually processing both the 1st & 2nd week data, but never completing so that the next day it is progressively worse. I saw something similar where an inappropriate bubble sort process was not completing (never reached the completion condition due to the slow but steady inflow of data causing it to constantly reset) and the demand by the process got progressively higher as the array got larger.
Do you have some logging on a weekly rollover cycle ?
Are there any other key performance metrics following the trend ? (network, disk IO, memory, paging, etc)
Do consider if it is a false positive. if it is high CPU there should be other metrics mirroring the CPU behaviour, cache use, disk IO, S3 transfer statistics/logging.
RL
So here is the situation, we have a C++ datafeed client program which we run ~30 instances of with different parameters, and there are 3 scripts written to run/stop them: start.sh stop.sh and restart.sh (which runs stop.sh and then start.sh).
When there is a high volume of data the client "falls behind" real time. We test this by comparing the system time to the most recent data entry times listed. If any of the clients falls behind more than 10 minutes or so, I want to call the restart script to start all the binaries fresh so our data is as close to real time as possible.
Normally I call a script using System(script.sh), however the restart script looks up and kills the process using kill, BUT calling System() also makes the current program execution ignore SIGQUIT and SIGINT until system() returns.
On top of this if there are two concurrent executions with the same arguments they will conflict and the program will hang (this stems from establishing database connections), so I can not start the new instance until the old one is killed and I can not kill the current one if it ignores SIGQUIT.
Is there any way around this? The current state of the binary and missing some data does not matter at all if it has reached the threshold, I also can not just have the program restart itself, since if one of the instances falls behind, we want to restart all 30 of the instances (so gaps in the data are at uniform times). Is there a clean way to call a script from within C++ which hands over control and allows the script to restart the program from scratch?
FYI we are running on CentOS 6.3
Use exec() instead of system(). It will replace your process with the new one. Note there is a significant different in how exec() is called and how it behaves: system() passes its string argument to the system shell to run. exec() actually executes an executable file, and you need to supply the arguments to the process one at a time, instead of letting the shell parse them apart for you.
Here's my two cents.
Temporary solution: Use SIGKILL.
Long-term solution: Optimize your code or the general logic of your service tree, using other system calls like exec or by rewritting it to use threads.
If you want better answers maybe you should post some code and or degeneralize the issue.
I'm trying to test a Bash script which copies files individually and does some stuff to each file. It is meant to be resumable, so I'd like to make sure to test this properly. What is an elegant solution to kill or otherwise abort the script which does the copying from the test script, making sure it does not have time to copy and process all the files?
I have the PID of the child process, I can change the source code of both scripts, and I can create arbitrarily large files to test on.
Clarification: I start the script in the background with &, get the PID as $!, then I have a loop which checks that there is at least one file in the target directory (the test script copies three files). At that point I run kill -9 $PID, but the process is not interrupted - The files are copied successfully. This happens even if the files are big enough that creating them (with dd and /dev/urandom) takes a couple seconds.
Could it be that the files are only visible to the shell when cp has finished? It would be a bit strange, but it would explain why the kill command is too late.
Also, the idea is not to test resuming the same process, but cutting off the first process (simulate a system crash) and resuming with another invocation.
Send a KILL signal to the child process:
kill -KILL $childpid
You can try an play the timing game by using large files and sleeps. You may have an issue with the repeatability of the test.
You can add throttling code to the script your testing and then just throttle it all the way down. You can do throttling code by passing in a value which is:
a sleep value for sleeping in the loop
the number of files to process
the number of seconds after which the script will die
a nice value to execute the script at
Some of these may work better or worse from a testing point of view. nice'ing may get you variable results, as will setting up a background process to kill your script after N seconds. You can also try more than one of these at the same time which may give you the control you want. For example, accepting both a sleep value and the kill seconds could give you fine grained throttling control.