Bash print word after match [duplicate] - regex

I have a variable that stores the output of a file. Within that output, I would like to print the first word after Database:. I'm fairly new to regex, but this is what I've tried so far:
sed -n -e 's/^.*Database: //p' "$output"
When I try this, I am getting a sed: can't read prints_output: File name too long error.
Does sed only take in a filename? I am running a hive query to desc formatted table and storing the results in output like so:
output=`hive -S -e "desc formatted table"`
output is then set to the result of that:
...
# Detailed Table Information
Database: sample_db
Owner: sample_owner
CreateTime: Thu Feb 26 23:36:43 PDT 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: maprfs:/some/location
Table Type: EXTERNAL_TABLE
Table Parameters:
...

Superficially, you should be using:
hive -S -e "desc formatted table" |
sed -n -e 's/^.*Database: //p'
This prints whatever follows Database: on the matching line. Once that works, you can trim any unwanted trailing material from the line too.
Alternatively, you could use:
echo "$output" |
sed -n -e 's/^.*Database: //p'
Or, again, given that you're using Bash, you could use:
sed -n -e 's/^.*Database: //p' <<< "$output"
I'd use the first unless you need the whole output preserved for rescanning. Then I'd probably capture the output in a file (with tee):
hive -S -e "desc formatted table" |
tee output.log |
sed -n -e 's/^.*Database: //p'
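If you only want the first word after Database: rather than the rest of the line, a capture group can do the trimming in the same sed call. This is a minimal sketch; the sample text assigned to output is a made-up stand-in for the real hive result, and the variable name db is my own choice.

```bash
#!/usr/bin/env bash
# Stand-in for the real hive output (assumption, for illustration only).
output='# Detailed Table Information
Database: sample_db trailing text
Owner: sample_owner'

# Keep only the first whitespace-delimited word after "Database:".
db=$(sed -n -e 's/^.*Database:[[:space:]]*\([^[:space:]]*\).*$/\1/p' <<< "$output")
printf '%s\n' "$db"   # prints: sample_db
```

The here-string (<<<) is what feeds the variable's contents to sed on standard input; passing "$output" as an argument makes sed treat it as a file name, which is where the original error came from.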

Try using egrep (the character class includes _ so that names like sample_db match):
egrep -oh 'Database:[[:blank:]]+[[:alnum:]_]+' <output_file> | awk '{print $2}'

Related

Dynamically substitute pattern with env variable with bash [duplicate]

I have a file file.txt with this content: Hi {YOU}, it's {ME}
I would like to dynamically create a new file file1.txt like this
YOU=John
ME=Leonardo
cat ./file.txt | sed 'SED_COMMAND_HERE' > file1.txt
which content would be: Hi John, it's Leonardo
The sed command I tried so far is like this s#{\([A-Z]*\)}#'"$\1"'#g but the "substitution" part doesn't work correctly, it prints out Hi $YOU, it's $ME
The sed utility can do multiple things to each input line:
$ sed -e "s/{YOU}/$YOU/" -e "s/{ME}/$ME/" inputfile.txt >outputfile.txt
This assumes that {YOU} and {ME} each occur only once on the line; otherwise, just add g ("s/{YOU}/$YOU/g" etc.). The double quotes matter: they let the shell expand $YOU and $ME before sed runs.
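Here is the same idea as a small self-contained sketch, reading the template from a variable instead of a file (the variable names template and result are only for illustration). Note that it would break if a value contained / or &, since sed treats those specially in the replacement.

```bash
#!/usr/bin/env bash
YOU=John
ME=Leonardo
template="Hi {YOU}, it's {ME}"

# Double quotes let the shell expand $YOU and $ME before sed sees the script.
result=$(printf '%s\n' "$template" | sed -e "s/{YOU}/$YOU/g" -e "s/{ME}/$ME/g")
printf '%s\n' "$result"   # prints: Hi John, it's Leonardo
```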
You can use awk with 2 files.
$> cat file.txt
Hi {YOU}, it's {ME}
$> cat repl.txt
YOU=John
ME=Leonardo
$> awk -F= 'FNR==NR{a["{" $1 "}"]=$2; next} {for (i in a) gsub(i,a[i])}1' repl.txt file.txt
Hi John, it's Leonardo
The first awk block goes through the replacement file and stores each key-value pair in an array a, wrapping the keys in { and }.
In the second pass we just replace each key by its value in the actual file.
Update:
To do this without creating repl.txt you can use process substitution:
awk -F= 'FNR==NR{a["{" $1 "}"]=$2; next} {
for (i in a) gsub(i,a[i])} 1' <(( set -o posix ; set ) | grep -E '^(YOU|ME)=') file.txt
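If the placeholder names are not known in advance, another option is to let awk look each name up in the environment through ENVIRON. This is only a rough sketch, assuming the variables are exported and the placeholders are upper-case names in braces. Unset names are replaced by an empty string, and a value that itself contains a {NAME} placeholder would be expanded again.

```bash
#!/usr/bin/env bash
export YOU=John ME=Leonardo

result=$(printf '%s\n' "Hi {YOU}, it's {ME}" | awk '{
  # Replace each {NAME} with the value of the environment variable NAME.
  while (match($0, /[{][A-Z]+[}]/)) {
    name = substr($0, RSTART + 1, RLENGTH - 2)
    $0 = substr($0, 1, RSTART - 1) ENVIRON[name] substr($0, RSTART + RLENGTH)
  }
  print
}')
printf '%s\n' "$result"   # prints: Hi John, it's Leonardo
```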

Escape dollar sign in regexp for sed

I will introduce what my question is about before actually asking - feel free to skip this section!
Some background info about my setup
To update files manually in a software system, I am creating a bash script to remove all files that are not present in the new version, using diff:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g"); do echo "rm -f $i" >> REMOVEOLDFILES.sh; done
This works fine. However, apparently my files often have a dollar sign ($) in the filename, this is due to some permutations of the GWT framework. Here is one example line from the above created bash script:
rm -f var/lib/tomcat7/webapps/ROOT/WEB-INF/classes/ExampleFile$3$1$1$1$2$1$1.class
Executing this script would not remove the wanted files, because bash reads these as argument variables. Hence I have to escape the dollar signs with "\$".
My actual question
I now want to add a sed-Command in the aforementioned pipeline, replacing this dollar sign. As a matter of fact, sed also reads the dollar sign as special character for regular expressions, so obviously I have to escape it as well.
But somehow this doesn't work and I could not find an explanation after googling a lot.
Here are some variations I have tried:
echo "Bla$bla" | sed "s/\$/2/g" # Output: Bla2
echo "Bla$bla" | sed 's/$$/2/g' # Output: Bla
echo "Bla$bla" | sed 's/\\$/2/g' # Output: Bla
echo "Bla$bla" | sed 's/#"\$"/2/g' # Output: Bla
echo "Bla$bla" | sed 's/\\\$/2/g' # Output: Bla
The desired output in this example should be "Bla2bla".
What am I missing?
I am using GNU sed 4.2.2
EDIT
I just realized that the above example is wrong to begin with: the echo command already interprets the $ as a variable, so the following sed never sees it anyway. Here is a proper example:
Create a textfile test with the content bla$bla
cat test gives bla$bla
cat test | sed "s/$/2/g" gives bla$bla2
cat test | sed "s/\$/2/g" gives bla$bla2
cat test | sed "s/\\$/2/g" gives bla2bla
Hence, the last version is the answer. Remember: when testing, first make sure your test is correct, before you question the test object........
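Inside a double-quoted sed script, the correct way to escape a dollar sign in the regular expression is a double backslash: the shell turns \\ into \, so sed sees \$. (Inside single quotes, a single backslash is enough.) Then, for creating the escaped version in the output, we need some additional backslashes: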
cat filenames.txt | sed "s/\\$/\\\\$/g" > escaped-filenames.txt
Yep, that's four backslashes in a row. This creates the required changes: a filename like bla$1$2.class would then change to bla\$1\$2.class.
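To see that the backslash count depends only on the shell quoting, here is a small sketch that runs the same substitution once in double quotes and once in single quotes. The file name and the variable names are invented for the example.

```bash
#!/usr/bin/env bash
name='bla$1$2.class'

# Double quotes: the shell halves the backslashes, so sed receives s/\$/\\$/g
escaped_dq=$(printf '%s\n' "$name" | sed "s/\\$/\\\\$/g")

# Single quotes: sed receives the script exactly as typed.
escaped_sq=$(printf '%s\n' "$name" | sed 's/\$/\\$/g')

printf '%s\n' "$escaped_dq" "$escaped_sq"   # both print: bla\$1\$2.class
```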
This I can then insert into the full pipeline:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g" | sed "s/\\$/\\\\$/g"); do echo "rm -f $i" >> REMOVEOLDFILES.sh; done
Alternative to solve the background problem
chepner posted an alternative to solve the background problem by simply adding single quotes around the filenames in the output. This way, the $ signs are not read as variables by bash when executing the script, and the files are properly removed:
for i in $(diff -r old new 2>/dev/null | grep "Only in old" | cut -d "/" -f 3- | sed "s/: /\//g"); do echo "rm -f '$i'" >> REMOVEOLDFILES.sh; done
(note the changed echo "rm -f '$i'" in that line)
There are other problems with your script, but file names containing $ are not a problem if you properly quote the argument to rm in the resulting script.
echo "rm -f '$i'" >> REMOVEOLDFILES.sh
or using printf, which makes quoting a little nicer and is more portable:
printf "rm -f '%s'\n" "$i" >> REMOVEOLDFILES.sh
(Note that I'm addressing the real problem, not necessarily the question you asked.)
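If the script is generated and run by bash anyway, the builtin printf also has %q, which escapes an argument so that the shell reads it back unchanged; unlike hand-written single quotes, it copes with names that contain a single quote. A minimal sketch, with a made-up file name. Keep in mind that %q is a bash extension, not POSIX.

```bash
#!/usr/bin/env bash
file='classes/ExampleFile$3$1.class'

# %q backslash-escapes characters the shell would otherwise interpret.
line=$(printf 'rm -f -- %q\n' "$file")
printf '%s\n' "$line"   # prints: rm -f -- classes/ExampleFile\$3\$1.class
```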
There is already a nice answer directly in the edited question that helped me a lot - thank you!
I just want to add a bit of curious behavior that I stumbled across: matching against a dollar sign at the end of lines (e.g. when modifying PS1 in your .bashrc file).
As a workaround, I match for additional whitespace.
$ DOLLAR_TERMINATED="123456 $"
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$/END/"
123456END
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$$/END/"
sed: -e expression #1, char 13: Invalid back reference
$ echo "${DOLLAR_TERMINATED}" | sed -e "s/ \\$\s*$/END/"
123456END
Explanation to the above, line by line:
Defining DOLLAR_TERMINATED - I want to replace the dollar sign at the end of DOLLAR_TERMINATED with "END"
It works if I don't check for the line ending
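It won't work if I match for the line ending as well (adding one more $): inside double quotes the shell expands $$ to its own process ID, so sed receives a backslash followed by digits and reports an invalid back reference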
It works if I additionally match for (non-present) whitespace
(My sed version is 4.2.2 from February 2016, bash is version 4.3.48(1)-release (x86_64-pc-linux-gnu), in case that makes any difference)
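The error in the third command comes from the shell rather than from sed: within double quotes, $$ is replaced by the shell's process ID before sed runs. Putting the sed script in single quotes avoids that, so no whitespace workaround is needed. A short sketch (the variable name result is mine):

```bash
#!/usr/bin/env bash
DOLLAR_TERMINATED="123456 $"

# Single quotes: sed sees s/ \$$/END/ (a space, a literal $, then end of line).
result=$(printf '%s\n' "$DOLLAR_TERMINATED" | sed -e 's/ \$$/END/')
printf '%s\n' "$result"   # prints: 123456END
```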

Selectively extract number from file name

I have a list of files in the format as: AA13_11BB, CC290_23DD, EE92_34RR. I need to extract only the numbers after the _ character, not the ones before. For those three file names, I would like to get 11, 23, 34 as output and after each extraction, store the number into a variable.
I'm very new to bash and regex. Currently, from AA13_11BB, I am able to either obtain 13_11:
for imgs in $DIR; do
LEVEL=$(echo $imgs | egrep -o [_0-9]+);
done
or two separate numbers 13 and 11:
LEVEL=$(echo $imgs | egrep -o [0-9]+)
May I please have some advice how to obtain my desired output? Thank you!
Use egrep with sed:
LEVEL=$(echo $imgs | egrep -o '_[0-9]+' | sed 's/_//' )
To complement the existing helpful answers, using the core of hjpotter92's answer:
The following processes all filenames in $DIR in a single command and reads all extracted tokens into an array:
IFS=$'\n' read -d '' -ra levels < \
<(printf '%s\n' "$DIR"/* | egrep -o '_[0-9]+' | sed 's/_//')
IFS=$'\n' read -d '' -ra levels splits the input into lines and stores them as elements of array ${levels[@]}.
<(...) is a process substitution that allows the output from a command to act as an (ephemeral) input file.
printf '%s\n' "$DIR"/* uses pathname expansion to output each filename on its own line.
egrep -o '_[0-9]+' | sed 's/_//' is the same as in hjpotter92's answer - it works equally on multiple input lines, as is the case here.
To process the extracted tokens later, use:
for level in "${levels[@]}"; do
echo "$level" # work with $level
done
You can do it in one sed using the regex .*_([0-9]+).* (escape it properly for sed):
sed "s/.*_\([0-9]\+\).*/\1/" <<< "AA13_11BB"
It replaces the line with the first captured group (the sub-regex inside the ()), outputting:
11
In your script:
LEVEL=$(sed "s/.*_\([0-9]\+\).*/\1/" <<< $imgs)
Update: as suggested by @mklement0, in both BSD sed and GNU sed you can shorten the command using the -E option:
LEVEL=$(sed -E "s/.*_([0-9]+).*/\1/" <<< $imgs)
Using grep with -P flag
for imgs in $DIR
do
LEVEL=$(echo $imgs | grep -Po '(?<=_)[0-9]{2}')
echo $LEVEL
done
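For completeness, bash can also do this without any external command, using the =~ operator and BASH_REMATCH. This is just a sketch with the three names from the question hard-coded; in a real script the loop would run over the actual files.

```bash
#!/usr/bin/env bash
levels=()
for name in AA13_11BB CC290_23DD EE92_34RR; do
  # The parenthesised group captures the digits right after the underscore.
  if [[ $name =~ _([0-9]+) ]]; then
    levels+=("${BASH_REMATCH[1]}")
  fi
done
printf '%s\n' "${levels[@]}"   # prints 11, 23 and 34 on separate lines
```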

using vars as numbers in sed [duplicate]

Having some difficulty in getting a sed | grep pipe to work when using vars as numbers.
In the string below the '3,5p' works fine, but when substituting the numbers for vars I get the error
sed: -e expression #1, char 4: extra characters after command
working=$(sed -n '3,5p' ${myFile} | grep -n "string" |cut -f1 -d: )
notWorking=$(sed -n '${LINESTART},${LINEEND}p' ${myFile} | grep -n "string" |cut -f1 -d: )
I would also be interested in any advice how I could change command so the line number returned is replaced with $string2 in the file myFile
thanks
Art
You need the shell to expand the variables before sed sees the expression. For that, you have to enclose the expression within double quotes:
sed -n "${LINESTART},${LINEEND}p" ${myFile}
       ^                        ^
instead of
sed -n '${LINESTART},${LINEEND}p' ${myFile}
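A quick way to check the quoting is to run it on generated input. In this sketch the six input lines and the variable name selected are made up for the demonstration.

```bash
#!/usr/bin/env bash
LINESTART=3
LINEEND=5

# The shell expands the variables, so sed receives the script 3,5p
selected=$(printf '%s\n' one two three four five six | sed -n "${LINESTART},${LINEEND}p")
printf '%s\n' "$selected"   # prints three, four and five on separate lines
```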
As you are checking for the line number in $myFile where string is found, and that line must lie between $LINESTART and $LINEEND, you can do:
awk 'NR>=start && NR<=end && /string/ {print NR}' start=$LINESTART end=$LINEEND ${myFile}
Suppose you want to replace a string just if it appears in specific lines. You can use this:
sed -i.bak "$LINESTART,$LINEEND s/FIND/REPLACE/" file
-i.bak makes a backup of the file and does an in-place edit: file will contain the modified file, while file.bak will be the backup.
Test
$ cat a
hello
this is
something
i want changed
end
but this is not to be changed
$ sed -i.bak '3,5 s/changed/NEW/' a
$ cat a
hello
this is
something
i want NEW <---- "changed" got replaced
end
but this is not to be changed <---- this "changed" did not
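The same test can be repeated with the line numbers held in shell variables. This is a sketch that works on a temporary file (created with mktemp) so that nothing in the current directory is touched; the variable names tmp and changed are my own.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
printf '%s\n' hello 'this is' something 'i want changed' end \
  'but this is not to be changed' > "$tmp"

LINESTART=3
LINEEND=5
# Double quotes so the shell fills in the range before sed parses the script.
sed -i.bak "${LINESTART},${LINEEND} s/changed/NEW/" "$tmp"

changed=$(cat "$tmp")
printf '%s\n' "$changed"
rm -f "$tmp" "$tmp.bak"
```

Only line 4 falls inside the range, so it becomes "i want NEW" while line 6 keeps its "changed".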

Transform mysql 'INSERT' statement into a CSV line

I need to convert mysql dump file to CSV format before importing to a data warehouse server.
INSERT INTO `temp` VALUES (30686631,1346959848246,1346959850865,1346959998054,'18663196147','18663196147','18668839208','17326812123',3372579,'1866319614700','A',1,'','',0,147,30686632,'KeyAd','1101','38.325.Monitor2.1101#10.40.10.170','10.40.10.40',5060,'10.40.10.46',5060,'100038455383251101_Monitor2#10.40.10.170','<sip:+18668839208#10.40.10.46:5060>;tag=sansay507370834rdb810','\"O\'HALLORAE,AEAN\" <sip:+17326812123#10.40.10.40;isup-oli=00>;tag=sansay507370829rdb1779','200',0,'',0,NULL,'','',3398812,NULL,NULL);
I'm using this command to remove mysql insert statement
sed -e 's/^INSERT INTO `temp` VALUES (//' -e 's/);$//' -e 's/(//;s/);//;s/,/|/g;s|["'\'']||g'
There seems to be an issue with names that contain backslash-escaped quotes (\" and \'), and I can't figure out how to fix it.
From MySQL insert
'\"O\'HALLORAE,AEAN\"
can't figure out how to form the output to
"O'HALLORAN,SEAN"
Desired output:
30686631|1346959848246|1346959850865|1346959998054|18663196147|18663196147|18668839208|17326812123|3372579|1866319614700|A|1|||0|147|30686632|KeyAd|1101|38.325.Monitor2.1101#10.40.10.170|10.40.10.40|5060|10.40.10.46|5060|100038455383251101_Monitor2#10.40.10.170|<sip:+18668839208#10.40.10.46:5060>;tag=sansay507370834rdb810| "O'HALLORAN,SEAN" <sip:+17326812123#10.40.10.40;isup-oli=00>;tag=sansay507370829rdb1779|200|0||0|NULL|||3398812|NULL|NULL
Try this:
$ sed -e 's/INSERT INTO `temp` VALUES (//' -e 's/);$//' -re 's/("[^"]*),([^"]*")/\1\x1\2/g;s/,/|/g;s/\x1/,/g;s/\\([^\])/\1/g' file | sed "s/'|/|/g;s/|'/|/g"
Output:
30686631|1346959848246|1346959850865|1346959998054|18663196147|18663196147|18668839208|17326812123|3372579|1866319614700|A|1|||0|147|30686632|KeyAd|1101|38.325.Monitor2.1101#10.40.10.170|10.40.10.40|5060|10.40.10.46|5060|100038455383251101_Monitor2#10.40.10.170|<sip:+18668839208#10.40.10.46:5060>;tag=sansay507370834rdb810|"O'HALLORAN,SEAN" <sip:+17326812123#10.40.10.40;isup-oli=00>;tag=sansay507370829rdb1779|200|0||0|NULL|||3398812|NULL|NULL
If ruby is an acceptable dependency for you, you can leverage its parser if you can transform the statement into a valid ruby array:
script.sh:
#!/bin/bash
# -r to preserve backslashes
read -r statement
ruby=$(echo -n $statement | sed -e 's/^.*VALUES //' -e 's/;$//' -e 's/^(/[/' -e 's/)$/]/' -e 's/NULL/"NULL"/g' -e 's/\\"/"/g')
echo "$ruby" | ruby -rcsv -e 'puts CSV.generate_line(eval($stdin.read), col_sep: "|")'
Usage:
chmod +x script.sh
echo <your statement> | ./script.sh
30686631|1346959848246|1346959850865|1346959998054|18663196147|18663196147|18668839208|17326812123|3372579|1866319614700|A|1|""|""|0|147|30686632|KeyAd|1101|38.325.Monitor2.1101#10.40.10.170|10.40.10.40|5060|10.40.10.46|5060|100038455383251101_Monitor2#10.40.10.170|<sip:+18668839208#10.40.10.46:5060>;tag=sansay507370834rdb810|"""O'HALLORAE,AEAN"" <sip:+17326812123#10.40.10.40;isup-oli=00>;tag=sansay507370829rdb1779"|200|0|""|0|NULL|""|""|3398812|NULL|NULL
This loads as expected in OpenOffice (after setting the delimiter to "|").