Sed copy pattern between range only once - regex

I am using sed to edit some sql script. I want to copy all the lines from the first "CREATE" pattern until the first "ALTER" pattern. The issue I am having is that sed copies all lines between each set of CREATE and ALTER instead of only the first occurrence (more than once).
sed -n -e '/CREATE/,/ALTER/w createTables.sql' $filename

Perl to the rescue:
perl -ne 'print if /CREATE/ .. /ALTER/ && close ARGV' -- "$filename" > createTables.sql
It closes the input when the ALTER is matched, i.e. it doesn't read any further.

Using sed
sed -n '/CREATE/,/ALTER/{p;/ALTER/q}' file > createTables.sql
or alternatively(note the newline)
sed -n '/CREATE/,/ALTER/{w createTables.sql
/ALTER/q}' file

Related

replace string with underscore and dots using sed or awk

I have a bunch of files with filenames composed of underscore and dots, here is one example:
META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
I want to remove the part that contains .bed.nodup.sortedbed.roadmap.sort.fgwas.gz. so the expected filename output would be META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
I am using these sed commands but neither one works:
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo $stringZ | sed -e 's/\([[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.\)//g'
echo $stringZ | sed -e 's/\[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.[[:lower:]]\.//g'
Any solution is sed or awk would help a lot
Don't use external utilities and regexes for such a simple task! Use parameter expansions instead.
stringZ=META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params
echo "${stringZ/.bed.nodup.sortedbed.roadmap.sort.fgwas.gz}"
To perform the renaming of all the files containing .bed.nodup.sortedbed.roadmap.sort.fgwas.gz, use this:
shopt -s nullglob
substring=.bed.nodup.sortedbed.roadmap.sort.fgwas.gz
for file in *"$substring"*; do
echo mv -- "$file" "${file/"$substring"}"
done
Note. I left echo in front of mv so that nothing is going to be renamed; the commands will only be displayed on your terminal. Remove echo if you're satisfied with what you see.
Your regex doesn't really feel too much more general than the fixed pattern would be, but if you want to make it work, you need to allow for more than one lower case character between each dot. Right now you're looking for exactly one, but you can fix it with \+ after each [[:lower:]] like
printf '%s' "$stringZ" | sed -e 's/\([[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.[[:lower:]]\+\.\)//g'
which with
stringZ="META_ALL_whrAdjBMI_GLOBAL_August2016.bed.nodup.sortedbed.roadmap.sort.fgwas.gz.r0-ADRL.GLND.FET-EnhA.out.params"
give me the output
META_ALL_whrAdjBMI_GLOBAL_August2016.r0-ADRL.GLND.FET-EnhA.out.params
Try this:
#!/bin/bash
for line in $(ls -1 META*);
do
f2=$(echo $line | sed 's/.bed.nodup.sortedbed.roadmap.sort.fgwas.gz//')
mv $line $f2
done

Using sed to find and replace within matched substrings

I'd like to use sed to process a property file such as:
java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace
I'd like to replace the .'s and -'s with _'s but only up to the ='s token. The output would be
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
I've tried various approaches including using addresses but I keep failing. Does anybody know how to do this?
What about...
$ echo foo.bar=/bla/bla-bla | sed -e 's/\([^-.]*\)[-.]\([^-.]*=.*\)/\1_\2/'
foo_bar=/bla/bla-bla
This won't work for the case where you have more than 1 dot or dash one the left, though. I'll have to think about it further.
awk makes life easier in this case:
awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1' file
here you go:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
if you really want to do with sed (gnu sed)
sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge' file
same example:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In this case I would use AWK instead of sed:
awk -F"=" '{gsub("\\.|-","_",$1); print $1"="$2;}' <file.properties>
Output:
java_home/usr/bin/java
groovy_home/usr/lib/groovy
workspace_home/build/me/my-workspace
This might work for you (GNU sed):
sed -r 's/=/\n&/;h;y/-./__/;G;s/\n.*\n//' file
"You wait ages for a bus..."
This works with any number of dots and hyphens in the line and does not require GNU sed:
sed 'h; s/.*=//; x; s/=.*//; s/[.-]/_/g; G; s/\n/=/' < data
Here's how:
h: save a copy of the line in the hold space
s: throw away everything before the equal sign in the pattern space
x: swap the pattern and hold
s: blow away everything after the = in the pattern
s: replaces dots and hyphens with underscores
G: join the pattern and hold with a newline
s: replace that newline with an equal to glue it all back together
Other way using sed
sed -re 's/(.*)([.-])(.*)=(.*)/\1_\3=\4/g' temp.txt
Output
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In case there are more than .- on left hand side then this
sed -re ':a; s/^([^.-]+)([\.-])(.*)=/\1_\3=/1;t a' temp.txt

Sed substitute recursively

echo ddayaynightday | sed 's/day//g'
It ends up daynight
Is there anyway to make it substitute until no more match ?
My preferred form, for this case:
echo ddayaynightday | sed -e ':loop' -e 's/day//g' -e 't loop'
This is the same as everyone else's, except that it uses multiple -e commands to make the three lines and uses the t construct—which means "branch if you did a successful substitution"—to iterate.
This might work for you:
echo ddayaynightday | sed ':a;s/day//g;ta'
night
The g flag deliberately doesn't re-match against the substituted portion of the string. What you'll need to do is a bit different. Try this:
echo ddayaynightday | sed $':begin\n/day/{ s///; bbegin\n}'
Due to BSD Sed's quirkiness the embedded newlines are required. If you're using GNU Sed you may be able to get away with
sed ':begin;/day/{ s///; bbegin }'
with bash:
str=ddayaynightday
while true; do tmp=${str//day/}; [[ $tmp = $str ]] && break; str=$tmp; done
echo $str
The following works:
$ echo ddayaynightday | sed ':loop;/day/{s///g;b loop}'
night
Depending on your system, the ; may not work to separate commands, so you can use the following instead:
echo ddayaynightday | sed -e ':loop' -e '/day/{s///g
b loop}'
Explanation:
:loop # Create the label 'loop'
/day/{ # if the pattern space matches 'day'
s///g # remove all occurrence of 'day' from the pattern space
b loop # go back to the label 'loop'
}
If the b loop portion of the command is not executed, the current contents of the pattern space are printed and the next line is read.
Ok, here they're: while and strlen in bash.
Using them one may implement my idea:
Repeat until its length will stop changing.
There's neither way to set flag nor way to write such regex, to "substitute until no more match".

Filter apache log file using regular expression

I have a big apache log file and I need to filter that and leave only (in a new file) the log from a certain IP: 192.168.1.102
I try using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But "/d" removes those entries, and I needt to leave them.
Thanks.
What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt
sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines
I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.

With sed or awk, how do I match from the end of the current line back to a specified character?

I have a list of file locations in a text file. For example:
/var/lib/mlocate
/var/lib/dpkg/info/mlocate.conffiles
/var/lib/dpkg/info/mlocate.list
/var/lib/dpkg/info/mlocate.md5sums
/var/lib/dpkg/info/mlocate.postinst
/var/lib/dpkg/info/mlocate.postrm
/var/lib/dpkg/info/mlocate.prerm
What I want to do is use sed or awk to read from the end of each line until the first forward slash (i.e., pick the actual file name from each file address).
I'm a bit shakey on syntax for both sed and awk. Can anyone help?
$ sed -e 's!^.*/!!' locations.txt
mlocate
mlocate.conffiles
mlocate.list
mlocate.md5sums
mlocate.postinst
mlocate.postrm
mlocate.prerm
Regular-expression quantifiers are greedy, which means .* matches as much of the input as possible. Read a pattern of the form .*X as "the last X in the string." In this case, we're deleting everything up through the final / in each line.
I used bangs rather than the usual forward-slash delimiters to avoid a need for escaping the literal forward slash we want to match. Otherwise, an equivalent albeit less readable command is
$ sed -e 's/^.*\///' locations.txt
Use command basename
$~hawk] basename /var/lib/mlocate
mlocate
I am for "basename" too, but for the sake of completeness, here is an awk one-liner:
awk -F/ 'NF>0{print $NF}' <file.txt
There's really no need to use sed or awk here, simply us basename
IFS=$'\n'
for file in $(cat filelist); do
basename $file;
done
If you want the directory part instead use dirname.
Pure Bash:
while read -r line
do
[[ ${#line} != 0 ]] && echo "${line##*/}"
done < files.txt
Edit: Excludes blank lines.
Thius would do the trick too if file contains the list of paths
$ xargs -d '\n' -n 1 -a file basename
This is a less-clever, plodding version of gbacon's:
sed -e 's/^.*\/\([^\/]*\)$/\1/'
#OP, you can use awk
awk -F"/" 'NF{ print $NF }' file
NF mean number of fields, $NF means get the value of last field
or with the shell
while read -r line
do
line=${line##*/} # means longest match from the front till the "/"
[ ! -z "$line" ] && echo $line
done <"file"
NB: if you have big files, use awk.