extract a base directory from the output of ps - regex

I am looking to extract a basedir from the output of ps -ef | grep classpath myprog.jar
root 20925 20886 1 17:41 pts/0 00:01:07 /opt/myprog/java/jre/bin -classpath myprog.jar
java is always a sub-dir under the basedir but the install path can vary from server to server e.g.
/usr/local/myprog/java/jre/bin
/opt/test/testing/myprog/java/jre/bin
So once I have my string, how do I extract everything from the beginning of the path up to, but not including, /java?
That is, /usr/local/myprog or /opt/test/testing/myprog.

Using sed:
$ echo "root 20925 20886 1 17:41 pts/0 00:01:07 /opt/myprog/java/jre/bin -classpath myprog.jar" | sed 's/.*\ \(.*\)\/java.*/\1/'
/opt/myprog

Using grep -P:
ps -ef | grep -oP '\S+(?=/java)'
/opt/myprog
If your grep doesn't support -P then use:
s='root 20925 20886 1 17:41 pts/0 00:01:07 /opt/myprog/java/jre/bin -classpath myprog.jar'
[[ "$s" =~ (/[^[:blank:]]+)/java ]] && echo "${BASH_REMATCH[1]}"
/opt/myprog

echo "root 20925 20886 1 17:41 pts/0 00:01:07 /opt/myprog/java/jre/bin -classpath myprog.jar" | awk '{split($8,a,"/java"); print a[1]}'

Use pgrep to find all of the Java processes instead of using ps -ef | grep .... This way, you don't have to worry about your grep command showing up as one of your items.
Instead of running ps -ef, you can use the -o option to pull up only the desired fields, and the procps version of ps (the usual one on Linux) accepts --no-headers to suppress the header line. This way, your script doesn't have to worry about header lines.
Finally, I am using Shell Parameter Expansion which is sometimes way easier than using sed to change a variable:
$ ps -o pid,args --no-headers $(pgrep -f "java .* myprog.jar") | while read -r pid command arguments
do
    directory=${command%/java*}
    echo "The directory for Process ID $pid is $directory"
done
By the way, several matching processes could be running at once, so I loop over the output of ps.
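One detail worth checking in the expansion above is the difference between % and %%. The following is only a sketch on hard-coded paths, not output from a real system; the second path, which ends in the java binary itself, is an assumed variation on the question's example.

```shell
# % removes the shortest suffix matching the pattern, %% the longest.
cmd1=/opt/myprog/java/jre/bin        # path as shown in the question
cmd2=/opt/myprog/java/jre/bin/java   # assumed variant ending in the binary

echo "${cmd1%/java*}"    # only one "/java" in the path: /opt/myprog
echo "${cmd2%/java*}"    # shortest match strips just the trailing "/java"
echo "${cmd2%%/java*}"   # longest match strips from the first "/java"
```

With %%, a directory earlier in the path whose name starts with java would also be cut off, so which operator fits depends on the install layout.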

ps axo args | awk '/classpath myprog.jar/{print substr($0, 1, index($0, "java")-1)}'
For example:
$ echo '/opt/myprog/java/jre/bin -classpath myprog.jar' \
| awk '/classpath myprog.jar/{print substr($0, 1, index($0, "java")-1)}'
/opt/myprog/
awk strings are indexed from 1; a start position of 0 in substr may be handled differently from one awk implementation to another, so 1 is the safer choice.
You can (and probably should) switch both of the $0's to $1's if you know for sure that your path will not contain spaces. Or add additional fields to the ps -o list using commas (as in -o pid,args) and use $2 rather than $1.

You can match the following regex:
'((\/\w+)+)\/java'
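The first capture group (\1 or $1) will then contain the wanted string. Note that \w does not match characters such as - or ., so directory names containing them will not be matched in full.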
Demo: http://regex101.com/r/zU2vV4

Related

Bash KSH Script Removing Last N Char

I am trying to remove the last 15 characters from a string, and I have to run that Bash script with "ksh". It works very well with "bash", but with "ksh" it does not. Here is my code:
#!/bin/bash
ggate_location="'$(ps -ef|grep mgr)'"
for word in $ggate_location
do
[[ $word =~ mgr\.prm$ ]] && echo ${word::-15}
done
What am I doing wrong?
This is the output of $(ps -ef | grep mgr)
ggate 53158 1 1 Sep04 ? 1-14:53:02 ./mgr PARAMFILE /gecici/GoldenGate/ggs12c/dirprm/mgr.prm REPORTFILE /gecici/GoldenGate/ggs12c/dirrpt/MGR.rpt PRO
ggate 143867 32840 0 16:07 pts/5 00:00:00 grep --color=auto mgr
If performance isn't critical, the following pipeline, which relies on external executables, should work just fine:
ps -o cmd= | grep -Ewo '[^[:space:]]*mgr\.prm' | sed 's/.\{15\}$//'
ps -o cmd= asks ps to display only the command lines (without a header), grep filters the output down to the words ending in mgr.prm, and sed removes the last 15 characters of each such word. (cut -c -15 would instead keep only the first 15 characters.)
Note that the grep --word-regexp (-w) flag isn't POSIX-defined and probably won't work unless you're using GNU grep. In that case I recommend either using grep with the -P (PCRE) flag and adding a word boundary (\b) to the end of the pattern, or adding ( |$) to the end of the pattern.
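A version without the bash-only ${word::-15} syntax can be sketched with the % suffix-removal expansion, which POSIX defines and which ksh and bash both support. This is a minimal sketch that assumes the ps line from the question is already in a variable; strip_last_15 is a hypothetical helper name.

```shell
# Hypothetical helper: drop the last 15 characters of its argument.
# Each ? matches exactly one character, so 15 of them match 15 characters.
strip_last_15() {
    printf '%s\n' "${1%???????????????}"
}

# Sample line copied from the question's ps output
line='ggate 53158 1 1 Sep04 ? 1-14:53:02 ./mgr PARAMFILE /gecici/GoldenGate/ggs12c/dirprm/mgr.prm REPORTFILE /gecici/GoldenGate/ggs12c/dirrpt/MGR.rpt PRO'

set -f                      # disable globbing: the line contains a bare ?
for word in $line; do       # unquoted on purpose, to split into words
    case $word in
        *mgr.prm) strip_last_15 "$word" ;;
    esac
done
set +f
```

The case pattern replaces the [[ ... =~ ... ]] test, so nothing in the loop depends on bash.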

How to get only the process ID of a specific process name in Linux?

How to get only the process ID for a specified process name in Linux?
ps -ef|grep java
test 31372 31265 0 13:41 pts/1 00:00:00 grep java
Based on the process id I will write some logic. So how do I get only the process id for a specific process name.
Sample program:
PIDS=$(ps -ef | grep java)
if [ -z "$PIDS" ]; then
    echo "nothing"
else
    mail test@domain.example
fi
You can pipe your output to awk to print just the PID. For example:
ps -ef | grep nginx | awk '{print $2}'
9439
You can use:
ps -ef | grep '[j]ava'
Or if pgrep is available then better to use:
pgrep -f java
Use this: ps -C <name> -o pid=
This command ignores the grep process and returns just the PID:
ps -ef | grep -v grep | grep java | awk '{print $2}'
Why not just use pidof?
pidof <process_name>
It returns a list of PIDs matching the process name.
https://linux.die.net/man/8/pidof
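To tie the answers above back to the sample program, here is a minimal sketch that extracts PIDs from ps -ef style text and skips the grep process. It runs on canned input so the result is reproducible; pids_matching is a hypothetical helper, and the second sample line (PID 4242) is made up for illustration.

```shell
# Hypothetical helper: read "ps -ef" style lines on stdin and print the PID
# (field 2) of every line whose command contains the substring given in $1.
pids_matching() {
    awk -v name="$1" '
        $8 != "grep" {
            # Rebuild the command from field 8 onward (CMD column of ps -ef)
            cmd = $8
            for (i = 9; i <= NF; i++) cmd = cmd " " $i
            if (index(cmd, name) > 0) print $2
        }'
}

# First line is from the question; the second is an invented java process.
sample='test 31372 31265 0 13:41 pts/1 00:00:00 grep java
test 4242 1 0 13:40 ? 00:01:02 /usr/bin/java -jar app.jar'

PIDS=$(printf '%s\n' "$sample" | pids_matching java)
if [ -z "$PIDS" ]; then
    echo "nothing"
else
    echo "found: $PIDS"
fi
```

On a live system the same function could be fed from ps -ef directly, though pgrep remains the simpler tool where it is available.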

grep with extended regex over multiple lines

I'm trying to get a pattern over multiple lines. I would like to ensure the line I'm looking for ends in \r\n and that there is specific text that comes after it at some point. The two problems I've had are I often get unmatched parenthesis in groupings or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks
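Instead of -E you can use -P for PCRE support in GNU grep, which understands \r and \n as escapes inside a single-quoted pattern:
echo -ne "ab\r\ncd" | grep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | grep -UczP '\r\n.*cd'
1
The -E attempts fail because $'\r\n' puts a literal newline into the pattern, and grep treats a newline inside a pattern as a separator between alternative patterns. The first command therefore supplies two patterns, (\r and )+.*TEST, and the first of these has an unmatched parenthesis. The second command supplies \r and .*TEST, and the lone \r matches the input, which gives a count of 1.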
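The way grep handles a newline inside a pattern can be checked in isolation. The sketch below uses throwaway inputs (foo, bar) that are not from the question, and builds the carriage return with printf so that it does not depend on the $'...' quoting.

```shell
# A newline inside a pattern separates alternative patterns.
patterns='bar
foo'
printf 'foo\n' | grep -c "$patterns"    # "foo" matches the second pattern

# Same effect as in the question: the lone CR becomes a pattern of its own,
# so the first input line matches even though TEST never appears.
cr=$(printf '\r')
printf 'ab\r\ncd\n' | grep -c "$cr
.*TEST"
```

Both commands print 1, the second one only because of the carriage return at the end of the first input line.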

grep: group capturing

I have following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I get just the number? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the key isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This resets the start of the reported match to just after scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regex101 demonstrates that the reset match start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
To avoid using grep's PCRE feature, which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
With -r (--replace), the replacement text can refer to capture group indices (e.g., $5) and names (e.g., $foo).
Another example with Python and json.tool module which can validate and pretty-print:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Building on @potong's answer, which only extracts "scheme_version", you can use these expressions to extract the value of any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
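If no external tool is wanted at all, the same value can be pulled out with two parameter expansions. This is only a sketch for this exact, single-line input, not a JSON parser; jq remains the robust choice.

```shell
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Drop everything up to and including the last "scheme_version":
value=${json##*\"scheme_version\":}
# Drop everything from the first non-digit onward
value=${value%%[!0-9]*}

echo "$value"
```

The first expansion anchors on the trailing colon, so the earlier "scheme_version" that appears as the value of _id is not mistaken for the key.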

help with grep [[:alpha:]]* -o

file.txt contains:
##w##
##wew##
Using Mac OS X 10.6 with the bash shell, the command:
cat file.txt | grep [[:alpha:]]* -o
outputs nothing. I'm trying to extract the text between the hash signs. What am I doing wrong?
(Note that it is better practice in this instance to pass the filename as an argument to grep instead of piping the output of cat to grep: grep PATTERN file instead of cat file | grep PATTERN.)
What shell are you using to execute this command? I suspect that your problem is that the shell is interpreting the asterisk as a wildcard and trying to glob files.
Try quoting your pattern, e.g. grep '[[:alpha:]]*' -o file.txt.
I've noticed that this works fine with the version of grep that's on my Linux machine, but the grep on my Mac requires the command grep -E '[[:alpha:]]+' -o file.txt.
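A likely reason for the difference is that [[:alpha:]]* can also match the empty string, and implementations do not all treat empty matches the same way under -o; that explanation is an assumption, not something confirmed above. Requiring at least one letter with + sidesteps the question. A small sketch with the file's two lines supplied through printf:

```shell
# + requires at least one letter, so no empty matches are possible
letters=$(printf '##w##\n##wew##\n' | grep -oE '[[:alpha:]]+')
echo "$letters"
```

This prints w and wew on separate lines.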
sed 's/#//g' file.txt
/SCRIPTS [31]> cat file.txt
##w##
##wew##
/SCRIPTS [32]> sed 's/#//g' file.txt
w
wew
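If you have bash 3.0 or later (keep the regex in a variable or unquoted; since bash 3.2 a quoted pattern is matched as a literal string):
re='^#+(.*)##+$'
while read -r line
do
    case "$line" in
        *"#"* )
            if [[ $line =~ $re ]]; then
                echo "${BASH_REMATCH[1]}"
            fi
            ;;
    esac
done < "file"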