Use bash to remove symbols from text file - regex

I have a bunch of txt-files containing stuff like this:
text_i_need_to_remove{text_i_need_to_retain}
text_i need_to_remove{text_i_need_to_retain}
...
How do I remove the text before the curly braces (and the curly braces themselves) and keep only text_i_need_to_retain?

In Vim, this deletes everything up to { and the } at the end of the line:
:%s/.*{\|}$//g
From the bash shell, you can use text-processing tools like sed and awk. Assume the file is named ip.txt.
1) With sed, the command is very similar to the regex we used inside Vim. The -i flag makes the change in place, i.e. it modifies the input file itself. Note that the \| alternation in a basic regex is a GNU extension.
$ sed -i 's/.*{\|}$//g' ip.txt
2) With awk, one can again use substitution or, in this case, split the line on curly brackets and print only the second field.
$ awk -F'{|}' '{print $2}' ip.txt > tmp && mv tmp ip.txt
If you have GNU awk, there is the -i inplace option for in-place editing:
$ gawk -i inplace -F'{|}' '{print $2}' ip.txt
To make the change to all files in the current directory, use
sed -i 's/.*{\|}$//g' *
Or, if they have a common extension, say .txt, use
sed -i 's/.*{\|}$//g' *.txt
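Before running an in-place edit on real files, it can help to preview the substitution on a throwaway copy. The following is only a minimal sketch: the helper name strip_to_braces and the temporary sample file are made up for illustration, and it uses two separate substitutions so that it does not rely on the \| alternation.

```shell
#!/bin/bash
# Hypothetical helper (not from the answers): print each input line reduced
# to the text between the last "{" and a trailing "}".
# Reads the files given as arguments, or stdin if there are none.
strip_to_braces() {
    sed -e 's/.*{//' -e 's/}$//' "$@"
}

# Throwaway sample file; its name comes from mktemp, not from the question.
sample=$(mktemp)
printf '%s\n' \
    'text_i_need_to_remove{text_i_need_to_retain}' \
    'text_i need_to_remove{text_i_need_to_retain}' > "$sample"

# Preview only: without -i the sample file itself is left unchanged.
result=$(strip_to_braces "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Once the preview looks right, the same expressions can be passed to sed -i as in the commands above.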

:%s/^.*{\(.*\)}$/\1/ or in bash, sed 's/^.*{\(.*\)}$/\1/' foo.txt
\(.*\) is a capture group which feeds into \1 and looks like a lumbering zombie.

You can use this in Vim:
:%s/^.*{// | %s/}$//
You can also use this script. Run it first without -i to check the output; if everything is OK, use sed with the -i option as below:
#!/bin/bash
for item in /dir/where/my/files/are/*
do
sed -i 's/^.*{//;s/}$//' "$item"
done
The -i option makes sed replace the content in place.
Or just use the following:
sed -i 's/^.*{//;s/}$//' /dir/where/my/files/are/*

Perl can be used to do the substitution on all files:
perl -i -pe 's/.*{|}$//g' *.txt

Related

Sed copy pattern between range only once

I am using sed to edit some sql script. I want to copy all the lines from the first "CREATE" pattern until the first "ALTER" pattern. The issue I am having is that sed copies all lines between each set of CREATE and ALTER instead of only the first occurrence (more than once).
sed -n -e '/CREATE/,/ALTER/w createTables.sql' $filename

Perl to the rescue:
perl -ne 'print if /CREATE/ .. /ALTER/ && close ARGV' -- "$filename" > createTables.sql
It closes the input when the ALTER is matched, i.e. it doesn't read any further.

Using sed
sed -n '/CREATE/,/ALTER/{p;/ALTER/q}' file > createTables.sql
or alternatively (note the newline):
sed -n '/CREATE/,/ALTER/{w createTables.sql
/ALTER/q}' file
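To see that the range really stops after the first ALTER, the sed approach can be tried on a few made-up lines. This is a minimal sketch: the helper name first_create_block and the sample SQL are invented for illustration, and it assumes CREATE and ALTER appear on separate lines.

```shell
#!/bin/bash
# Hypothetical helper: print the lines from the first /CREATE/ through the
# first following /ALTER/, then quit so later ranges are never reached.
first_create_block() {
    sed -n '/CREATE/,/ALTER/{p;/ALTER/q;}' "$@"
}

# Made-up sample input containing two CREATE..ALTER ranges.
sql='DROP TABLE a;
CREATE TABLE a (id int);
ALTER TABLE a ADD c int;
CREATE TABLE b (id int);
ALTER TABLE b ADD c int;'

block=$(printf '%s\n' "$sql" | first_create_block)
printf '%s\n' "$block"
```

Only the first CREATE and ALTER pair is printed; the ";" before the closing brace is there because some sed implementations require it.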

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'2', 'total_count': u'3'

It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, and in basic regex mode + is not a quantifier (x+ has to be written as xx*). GNU sed has a -r option to enable extended-regex support so that + becomes a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). Note that grep -o is not required by POSIX, although GNU and BSD grep both support it.
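The question asks for both current_count and total_count on one output line. One way to get that is a single substitution with two capture groups. This is a minimal sketch: the helper name extract_counts is made up, and it assumes current_count always appears before total_count on the line.

```shell
#!/bin/bash
# Hypothetical helper: for lines containing both keys (in this order), print
#   'current_count': u'N', 'total_count': u'M'
# Lines that do not match are suppressed by -n together with the p flag.
extract_counts() {
    sed -n -E "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p" "$@"
}

# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"
counts=$(printf '%s\n' "$line" | extract_counts)
printf '%s\n' "$counts"
```

The -E spelling is used here instead of -r because it is accepted by both GNU and BSD sed.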

To me this looks like some sort of serialized Python data. I would first try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the part of the string before /importance is.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with regex, so I need your help please :)
Sorry my friends, I got my question wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?

I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname $(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")
/importance/lib1/lib2/lib3

Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear as part of another directory name, you could add the surrounding slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file

With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file

pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file

I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute /importance back in:
sed 's/.*importance/\/importance/'
However, I would use Bash's built-in parameter expansion. It avoids starting an external process, so it is faster.
The expansion ${foo##pattern} takes the shell variable foo and removes the longest match of the glob pattern from the left side of its value. Since that also removes /importance itself, put it back in front:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name="/importance${file_name##*/importance}"
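Parameter expansion can also drop the trailing /file that the updated question asks for. This is a minimal sketch, assuming the path contains /importance/ at least once; the variable names are illustrative only.

```shell
#!/bin/bash
path="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"

# ${path##*/importance/} removes the longest prefix ending in "/importance/",
# so the marker has to be put back in front of what remains.
from_importance="/importance/${path##*/importance/}"

# ${var%/*} removes the shortest suffix starting with "/", i.e. the file name.
dir_only="${from_importance%/*}"

printf '%s\n' "$from_importance" "$dir_only"
```

If the path does not contain /importance/ at all, the ## expansion removes nothing, so a real script would want to check for that case first.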

Removing the /file at the end, as you ask:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How can I do this in a Bash shell script?

Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash
while IFS= read -r line; do
printf '"%s"\n' "$line"
done < inputfile
where inputfile is a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).

I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
xargs takes standard input (redirected from your file), passes one line at a time to the {lin} placeholder, and then runs the command that follows, in this case an echo with escaped double quotes.
With GNU xargs you can use the -i option to omit the name of the placeholder (GNU xargs documents -i as deprecated in favour of -I{}), like this:
xargs -i echo \"{}\" < your_filename
In both cases, your IFS must be at its default value, or at least contain '\n'.

This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile

Use sed:
sed -e 's/^\|$/"/g' file
More effort needed if the file contains empty lines.

I think sed and awk are the best solutions, but if you want to use just the shell, here is a small script for you.
#!/bin/bash
chr="\""
file="file.txt"
cp "$file" "${file}_backup"
while IFS= read -r line
do
echo "${chr}${line}${chr}"
done < "$file" > newfile
mv newfile "$file"
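A pure-shell loop is easy to get subtly wrong: without IFS= and -r, read strips leading whitespace and interprets backslashes, and a last line without a trailing newline is silently skipped. The following is a minimal sketch of a loop that avoids these pitfalls; the function name quote_lines is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: wrap every line of stdin in double quotes.
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and the "|| [ -n ... ]" part handles a final line with no newline.
quote_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '"%s"\n' "$line"
    done
}

quoted=$(printf '%s\n' 'abcdefg' 'hijklmn' 'opqrst' | quote_lines)
printf '%s\n' "$quoted"
```

It reads from standard input, so a file can be processed with quote_lines < inputfile > outputfile.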

paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file

This answer worked for me in mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
Note that the quoted, comma-terminated lines are only printed to the terminal; the file itself is unaffected.

I used sed with two expressions to replace start and end of line, since in my particular use case I wanted to place HTML tags around only lines that contained particular words.
So I searched the text file inputfile for the lines containing the word held in the bla variable and replaced the beginning with <P> and the end with </P> (actually I did some longer HTML tagging in the real thing, but this will serve fine as an example).
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$

Filter apache log file using regular expression

I have a big Apache log file and I need to filter it, keeping (in a new file) only the entries from a certain IP: 192.168.1.102
I tried using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But "/d" removes those entries, and I need to keep them.
Thanks.

What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt

sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines

I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.
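Note that an anchored regex such as ^192\.168\.1\.102 also matches longer addresses that merely start with those characters, for example 192.168.1.1020. One way around this is to compare the first field as a plain string. This is a minimal sketch: the helper name filter_by_ip and the sample log lines are made up, and it assumes the client address is the first whitespace-separated field of each line.

```shell
#!/bin/bash
# Hypothetical helper: keep only the lines whose first whitespace-separated
# field is exactly the given IP address (string comparison, no regex).
# Usage: filter_by_ip IP [FILE...]
filter_by_ip() {
    local ip=$1
    shift
    awk -v ip="$ip" '$1 == ip' "$@"
}

# Made-up sample log lines; only the first should survive the filter.
log='192.168.1.102 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
192.168.1.1020 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 10
192x168y1z102 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 200 10'

matches=$(printf '%s\n' "$log" | filter_by_ip 192.168.1.102)
printf '%s\n' "$matches"
```

For the files in the question this would be used as filter_by_ip 192.168.1.102 input.txt > output.txt.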
