Hi, I have to delete some lines in a file:
file 1
1 2 3
4 5 6
file 2
1 2 3 6
5 7 8 7
4 5 6 9
I have to delete every line of file 2 that contains a line of file 1:
output
5 7 8 7
I used sed:
for sample_index in $(seq 1 3)
do
sample=$(awk 'NR=='$sample_index'' file1)
sed "/${sample}/d" file2 > tmp
done
But it doesn't work; it doesn't print anything. Do you have any idea? It gives me the error 'sed: -e expression #1, char 0: precedent regular expression needed'.
This could be a start:
$ grep -vf file1 file2
5 7 8 7
One potential pitfall here is that the output won't change if you put 5 6 9 as the second line of file1, because the patterns match anywhere in the line. I'm not sure if you want that or not. If not, you can try
grep -vf <(sed 's/^/^/' file1) file2
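To see the difference between the two commands, here is a small sketch that rebuilds the sample files in a temporary directory, with 5 6 9 added as the second line of file1. The scratch directory and the patterns file are illustrative and not part of the original answer; the patterns file simply replaces the process substitution so the sketch also runs in a plain POSIX shell.

```shell
# Assumption: scratch copies of the sample data, with "5 6 9" added to file1.
workdir=$(mktemp -d)
printf '1 2 3\n5 6 9\n' > "$workdir/file1"
printf '1 2 3 6\n5 7 8 7\n4 5 6 9\n' > "$workdir/file2"

# Unanchored: "5 6 9" also matches inside "4 5 6 9", so that line is dropped.
unanchored=$(grep -vf "$workdir/file1" "$workdir/file2")

# Anchored: prefix every pattern with ^ so it only matches at the line start.
sed 's/^/^/' "$workdir/file1" > "$workdir/patterns"
anchored=$(grep -vf "$workdir/patterns" "$workdir/file2")

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -r "$workdir"
```

With the anchored patterns, 4 5 6 9 survives because no line of file1 matches at its start.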
This should work if your real data has 3 columns:
awk 'NR==FNR{a[$1$2$3]++;next}!($1$2$3 in a)' file{1,2}
For variable columns:
awk 'NR==FNR{a[$0]++;next}{for(x in a) if(index($0,x)>0) next}1' file{1,2}
And the code for GNU sed:
sed -r 's#(.*)#/\1/d#' file1 | sed -f - file2
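As for why the loop in the question fails, a likely cause (an assumption based on the sample data shown) is that seq 1 3 asks for a third line that file1 does not have. The sample variable is then empty, sed receives //d, and an empty regex with no previous regex to reuse produces that error. In addition, every iteration reads the untouched file2 and overwrites tmp, so earlier deletions are lost. A minimal sketch of a corrected loop, using scratch files named after the ones in the question and anchoring each pattern at the line start:

```shell
# Assumption: scratch copies of the question's sample files.
workdir=$(mktemp -d)
printf '1 2 3\n4 5 6\n' > "$workdir/file1"
printf '1 2 3 6\n5 7 8 7\n4 5 6 9\n' > "$workdir/file2"

# Start from a copy of file2 and keep filtering that same copy.
cp "$workdir/file2" "$workdir/tmp"
while IFS= read -r sample; do
    # Skip empty lines: an empty pattern would make sed fail.
    [ -n "$sample" ] || continue
    # The line is still used as a regex, which is fine for digits and spaces.
    sed "/^${sample}/d" "$workdir/tmp" > "$workdir/tmp.new"
    mv "$workdir/tmp.new" "$workdir/tmp"
done < "$workdir/file1"

result=$(cat "$workdir/tmp")
echo "$result"
rm -r "$workdir"
```

Reading file1 with while read avoids hard-coding the number of lines, and one sed call per line is only reasonable for small pattern files.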
I have to make a regex to match one digit only.
It should match 7 and a7b but not 77.
I made these, but they don't seem to work in sed:
(?<![\d])(?<![\S])[1](?![^\s.,?!])(?!^[\d])
(?<![\d])(?<!^[\a-z])\d(?![^a-z])(?!^[\d])
What am I doing wrong?
Edit:
I need to replace only 1-digit numbers with something like
sed 's/regex/#/g' file   # regex to match "1"
file content
1 2 3 4 5 11 1
agdse1tg1xw
6 97 45 12
Should become
# 2 3 4 5 11 #
agdse#tg#xw
6 97 45 12
Input
a77
a7b
2ab
882
9
abcfg9
9fg
ab9
Script
sed -En '/^[^[:digit:]]*[[:digit:]]{1}[^[:digit:]]*$/p' filename
Output
a7b
2ab
9
abcfg9
9fg
ab9
To do what you show in the example in your question:
$ sed -r 's/(^|[^0-9])1([^0-9]|$)/\1#\2/g' file
# 2 3 4 5 11 #
agdse#tg#xw
6 97 45 12
but that only works because you didn't have 1 1 in your data. If you did you'd need 2 passes:
$ echo '1 1' | sed -r 's/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/g'
# 1
$ echo '1 1' | sed -r 's/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/g; s/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/g'
# #
and if you wanted to do that for any single digit it would be:
$ sed -r 's/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/g; s/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/g' file
# # # # # 11 #
agdse#tg#xw
# 97 45 12
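The second pass is needed because a single s///g never rescans text it has already consumed: in 1 1 the space is used up as the trailing separator of the first match, so it is no longer available as the leading separator of the second. As a sketch of an alternative to repeating the expression, a label and the t command can rerun the substitution until nothing changes. The function name is illustrative, and -E is assumed to be supported by your sed (as it is in GNU and BSD sed).

```shell
# Replace every stand-alone digit with '#', rerunning the substitution
# until no further replacement happens.
# Hypothetical helper name; reads stdin, writes stdout.
replace_single_digits() {
    sed -E -e ':again' \
           -e 's/(^|[^0-9])[0-9]([^0-9]|$)/\1#\2/' \
           -e 't again'
}

printf '1 1\n1 2 3 4 5 11 1\nagdse1tg1xw\n6 97 45 12\n' | replace_single_digits
```

The loop ends because each replacement turns a digit into #, which can never match [0-9] again.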
sed only supports BRE and ERE, but you can enable PCRE with grep -P:
% printf 'a77\na7b\n2ab\n82\n' | grep -P '(?<!\d)\d(?!\d)'
a7b
2ab
As demonstrated, grep prints the matching lines, but it has an option (-o) to print only the match:
% printf 'a77\na7b\n2ab\n82\n' | grep -oP '(?<!\d)\d(?!\d)'
7
2
Deleting the match and two lines before it works:
sed -i.bak -e '/match/,-2d' someCommonName.txt
Deleting the match and two lines after it works:
sed -i.bak -e '/match/,+2d' someCommonName.txt
But deleting the match, two lines after it, and two lines before it does not work:
sed -i.bak -e '/match/-2,+2d' someCommonName.txt
sed: -e expression #1 unknown command: `-'
Why is that?
sed operates on a range of addresses. That means either one or two expressions, not three.
/match/ is an address which matches a regex.
-2 is an address meant to specify two lines before (the GNU sed manual documents only the addr1,+N form, so not every sed accepts this)
+2 is an address which specifies two lines after
Therefore:
/match/,-2 is a range which specifies the line matching match to two lines before.
/match/-2,+2d, on the other hand, includes three addresses, and thus makes no sense.
To delete two lines before and after a pattern, I would recommend something like this (modified from this answer):
sed -n "1N;2N;/\npattern$/{N;N;d};P;N;D"
This keeps 3 lines in the buffer and reads through the file. When the pattern is found in the last line, it reads two more lines and deletes all 5. Note that this will not work if the pattern is in the first two lines of the file, but it is a start.
sed -i .bak '/match/,-2 {/match/!d;};/match/,+2d' YourFile
Try this (I cannot test it here; -2 is not available in my sed version).
I don't have a complete solution but an outline: sed is a pretty simple tool which doesn't do two things at once. My approach would be to run sed once deleting the two lines after the pattern but keeping the pattern itself. The result can then be piped to sed again to remove the pattern and the two lines before.
FWIW this is how I'd really do the job (just change the b and a values to delete different numbers of lines before/after match is found):
$ cat file
1
2
3
4
5 match
6
7
8
9
$ awk -v b=2 -v a=2 'NR==FNR{if (/match/) for (i=(NR-b);i<=(NR+a);i++) skip[i]; next } !(FNR in skip)' file file
1
2
8
9
$ awk -v b=3 -v a=1 'NR==FNR{if (/match/) for (i=(NR-b);i<=(NR+a);i++) skip[i]; next } !(FNR in skip)' file file
1
7
8
9
Note that the above assumes that when 2 "match"s appear within a removal window you want to base the deletions on the original occurrence, not what would happen after the first match being found causes the 2nd match to be deleted:
$ cat file2
1
2
3
4 match
5
6 match
7
8
9
$ awk -v b=2 -v a=2 'NR==FNR{if (/match/) for (i=(NR-b);i<=(NR+a);i++) skip[i]; next } !(FNR in skip)' file2 file2
1
9
as opposed to the output being:
1
7
8
9
since deleting the 2 lines after the first match would delete the 2nd match and so the 2 lines after THAT would not be deleted since they no longer are within 2 lines after a match.
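For readability, the awk one-liner above can be wrapped in a small function with the same logic spelled out. This is only a sketch: delete_around is a made-up name, the pattern is passed as an awk regex, and the input must be a regular file because it is read twice.

```shell
# Hypothetical wrapper around the two-pass awk approach.
# Usage: delete_around PATTERN BEFORE AFTER FILE
delete_around() {
    awk -v pat="$1" -v b="$2" -v a="$3" '
        # First pass (NR == FNR): record the line numbers to drop.
        NR == FNR {
            if ($0 ~ pat)
                for (i = NR - b; i <= NR + a; i++)
                    skip[i]
            next
        }
        # Second pass: print every line whose number was not recorded.
        !(FNR in skip)
    ' "$4" "$4"
}

# Example with the sample data from the answer, written to a scratch file.
sample=$(mktemp)
printf '1\n2\n3\n4\n5 match\n6\n7\n8\n9\n' > "$sample"
delete_around match 2 2 "$sample"
rm "$sample"
```

Because all removal windows are computed from the original line numbers in the first pass, overlapping windows behave as described above.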
Something else to consider:
$ diff --changed-group-format='%<' --unchanged-group-format='' file <(grep -A2 -B2 match file)
1
2
8
9
$ diff --changed-group-format='%<' --unchanged-group-format='' file2 <(grep -A2 -B2 match file2)
1
9
That uses bash and GNU diff 3.2; I don't know whether other shells or diff implementations support those constructs and options.
Say I have this file data.txt:
a=0,b=3,c=5
a=2,b=0,c=4
a=3,b=6,c=7
I want to use grep to extract 2 columns corresponding to the values of a and c:
0 5
2 4
3 7
I know how to extract each column separately:
grep -oP 'a=\K([0-9]+)' data.txt
0
2
3
And:
grep -oP 'c=\K([0-9]+)' data.txt
5
4
7
But I can't figure how to extract the two groups. I tried the following, which didn't work:
grep -oP 'a=\K([0-9]+),.+c=\K([0-9]+)' data.txt
5
4
7
I am also curious about grep being able to do so. \K "removes" the previous content that is stored, so you cannot use it twice in the same expression: it will just show the last group. Hence, it should be done differently.
In the meanwhile, I would use sed:
sed -r 's/^a=([0-9]+).*c=([0-9]+)$/\1 \2/' file
It captures the digits after a= and c=, provided the line starts with a= and contains nothing after the digits that follow c=.
For your input, it returns:
0 5
2 4
3 7
You could try the grep command below. Note, however, that grep prints each match on a separate line, so you won't get the format you mentioned in the question.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file
0
5
2
4
3
7
To get that format, you need to pipe the output of grep to paste or a similar command.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file | paste -d' ' - -
0 5
2 4
3 7
Use this:
awk -F'[=,]' '{print $2" "$6}' data.txt
I am using = and , as the field separators and splitting on them. The separator is quoted so that the shell cannot treat [=,] as a glob pattern.
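The positional fields work as long as a, b, and c always appear in the same order. If that is not guaranteed, here is a sketch that looks the values up by key name instead. The function name extract_a_c is illustrative, and the sketch assumes every field has the form key=value.

```shell
# Hypothetical helper: print the values of keys "a" and "c" for each line.
# Reads the files given as arguments, or stdin when there are none.
extract_a_c() {
    awk -F, '{
        split("", val)                 # clear values from the previous line
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")         # kv[1] is the key, kv[2] the value
            val[kv[1]] = kv[2]
        }
        print val["a"], val["c"]
    }' "$@"
}

printf 'a=0,b=3,c=5\na=2,b=0,c=4\na=3,b=6,c=7\n' | extract_a_c
```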
I have the following script to remove all lines before a line which matches with a word:
str='
1
2
3
banana
4
5
6
banana
8
9
10
'
echo "$str" | awk -v pattern=banana '
print_it {print}
$0 ~ pattern {print_it = 1}
'
It returns:
4
5
6
banana
8
9
10
But I want to include the first match too. This is the desired output:
banana
4
5
6
banana
8
9
10
How could I do this? Do you have any better idea with another command?
I've also tried sed '0,/^banana$/d', but it seems to work only with files, and I want to use it with a variable.
And how could I get all lines before a match using awk?
I mean, with banana as the regex, this would be the output:
1
2
3
This awk should do:
echo "$str" | awk '/banana/ {f=1} f'
banana
4
5
6
banana
8
9
10
sed -n '/^banana$/,$p'
Should do what you want. -n instructs sed to print nothing by default, and the p command specifies that all addressed lines should be printed. This works on a stream. It differs from the awk solution in that it requires the entire line to match 'banana' exactly, whereas your awk solution merely requires 'banana' to appear in the line, but I'm copying your sed example.
I'm not sure what you mean by "use it with a variable". If you mean that you want the string 'banana' to be in a variable, you can use sed -n "/$variable/,\$p" (note the double quotes and the escaped $), or sed -n "/^$variable\$/,\$p", or sed -n "/^$variable"'$/,$p'. You can also run echo "$str" | sed -n '/banana/,$p', just as you do with awk.
Just invert the commands in the awk:
echo "$str" | awk -v pattern=banana '
$0 ~ pattern {print_it = 1}   # if the line matches, activate the flag
print_it {print}              # if the flag is active, print the line
'
The print_it flag is activated when the pattern is found. From that moment on (including that line), lines are printed while the flag is on. Previously the print was done before the check, so the matching line itself was skipped.
cat in.txt | awk "/banana/,0"
In case you don't want to preserve the matched line then you can use
cat in.txt | sed "0,/banana/d"
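For the last part of the question, getting the lines before the first match, one possible sketch is to stop reading as soon as the pattern is seen. The function name is illustrative and the pattern is treated as an awk regex.

```shell
# Hypothetical helper: print the lines that come before the first match.
# $1 is the pattern; input comes from stdin.
lines_before_match() {
    awk -v pattern="$1" '
        $0 ~ pattern { exit }   # stop at the first match, without printing it
        { print }               # everything before that is printed
    '
}

printf '1\n2\n3\nbanana\n4\n5\n6\nbanana\n8\n' | lines_before_match banana
```

With the str variable from the question, echo "$str" | lines_before_match banana would also print the leading empty line, because str starts with a newline.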
I am trying to do something like this in shellscript:
STEP=5
LIST=[1-$STEP]
for i in $LIST
echo $i
done
The output I expect is:
1 2 3 4 5
I probably have seen this usage before ( e.g. [A-Z] ) but I cannot remember the correct syntax. Thank you for your help!
Try this. Note that echo appends a newline (LF) after each value; use echo -n to keep the output on one line, as shown in your expected output.
STEP=5
for i in `seq 1 $STEP`; do
echo $i
done
Assuming this is bash:
$ echo {1..5}
1 2 3 4 5
$ STEP=5
$ echo {1..$STEP}
{1..5}
$ eval echo {1..$STEP}
1 2 3 4 5
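{1..$STEP} is left unexpanded because bash performs brace expansion before variable expansion, which is why eval is needed there. Since eval executes whatever the variable contains, a loop with an explicit counter is a safer sketch when STEP might come from user input. The variable names list and i are illustrative.

```shell
STEP=5
list=""
i=1
# Count from 1 to STEP, joining the numbers with single spaces.
while [ "$i" -le "$STEP" ]; do
    list="${list:+$list }$i"
    i=$((i + 1))
done
echo "$list"
```

This form uses only POSIX shell features, so it does not depend on bash or on seq being available.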