(I have pasted the exact text and commands I ran, so this may look a bit messy.)
I have a .TXT file that looks like this:
11111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111
The output I am looking for is:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
The command I tried is:
sed -i 's/\(.\{14\}\)\(.\{7\}\)\(.\{2\}\)\(.\{1\}\)\(.\{3\}\)\(.\{13\}\)\(.\{1\}\)\(.\{8\}\)\(.\{16\}\)\(.\{3\}\)/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,/' SOME.TXT
The output I got was:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
I have no idea why these 0s suddenly appeared, or why the last ',' does not land where I asked for it, even though the command worked halfway.
Is this a bug in sed?
It prints 0 because sed supports only nine back-references, \1 through \9, so \10 is interpreted as \1 followed by a literal 0.
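A minimal sketch that reproduces this on a short string: ten one-character groups, with \10 in the replacement. The function name demo_backref10 is made up for illustration.

```shell
# Hypothetical helper: ten one-character groups, with "\10" in the replacement.
demo_backref10() {
  sed 's/\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)/[\10]/'
}

# sed reads "\10" as back-reference \1 ("a") followed by a literal "0".
printf 'abcdefghijk\n' | demo_backref10   # prints "[a0]k"
```

If \10 meant the tenth group, the output would have been [j]k instead.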
You can solve it easily with the FIELDWIDTHS feature of GNU awk (the trailing * field, meaning "the rest of the line", needs gawk 4.2 or later):
awk -v OFS=, 'BEGIN { FIELDWIDTHS = "14 7 2 1 3 13 1 8 16 3 *" } {$1 = $1} 1' file
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
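FIELDWIDTHS is specific to GNU awk. If only a plain POSIX awk is available, roughly the same split can be sketched with substr. The helper name split_fixed and its argument layout are assumptions for illustration, not part of the answer above.

```shell
# Hypothetical helper: split_fixed "WIDTHS" reads stdin and inserts a comma
# after each fixed-width column; whatever is left over becomes the last field.
split_fixed() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }   # w[1..n] holds the column widths
    {
      pos = 1
      out = ""
      for (i = 1; i <= n; i++) {
        out = out substr($0, pos, w[i]) ","
        pos += w[i]
      }
      print out substr($0, pos)           # remainder of the line
    }'
}

printf 'abcdefg\n' | split_fixed "2 3"   # prints "ab,cde,fg"
```

For the layout in the question this would be called as split_fixed "14 7 2 1 3 13 1 8 16 3" < SOME.TXT. A line that is exactly as long as the sum of the widths ends with a trailing comma.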
Just as an academic exercise, here is a working sed solution that uses two substitutions:
sed -E 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.+)/\1,\2,\3,\4,\5,\6,\7,\8,\9/; s/(.+,.{16})(.{3})(.*)/\1,\2,\3/' file
sed can't reference capture groups above 9, but Perl can:
perl -i -pe 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.{16})(.{3})/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,/' SOME.TXT
If you insist on using sed, you can do something like this:
sed 's/./&,/68;s/./&,/65;s/./&,/49;s/./&,/41;s/./&,/40;s/./&,/27;s/./&,/24;s/./&,/23;s/./&,/21;s/./&,/14' test.txt
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
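The positions in that command run from right to left (68 first, 14 last). A small sketch of why the order matters: every comma that has been inserted counts as a character for the substitutions that follow, so going left to right shifts the later positions. The function names are made up for the demo.

```shell
# Hypothetical demo: put a comma after the 2nd and 4th characters of "abcdef".
left_to_right() { sed 's/./&,/2;s/./&,/4'; }
right_to_left() { sed 's/./&,/4;s/./&,/2'; }

printf 'abcdef\n' | left_to_right   # prints "ab,c,def" (the first comma shifted position 4)
printf 'abcdef\n' | right_to_left   # prints "ab,cd,ef"
```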
I have a file, my_file.
The contents of the file look like this:
4: something
5: something
7: another thing
I want to print out the following:
4
5
7
Basically, I want to get all the numbers before the : character.
Here is what I tried:
grep -i "^[0-9]+(?=(:)" my_file
This returned nothing. How can I change this command to make it work?
This is a use-case for awk:
$ awk -F":" '{print $1}' < inputfile
because you're using : as a field delimiter.
Try this:
grep -Eo "^[0-9]+" my_file # either -E (extended) or -P (Perl) regular expressions will work
-o prints only the matching part of each line.
We also need to tell grep which regex syntax to use, because in its default (basic) syntax + is an ordinary character rather than a quantifier.
Both of the following will work:
-E extended regular expressions
-P Perl-compatible regular expressions
Breakdown:
^ matches the start of the line
[0-9] matches a digit
+ matches one or more of the preceding [0-9]
Output:
4
5
7
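To see what -E changes, here is a small sketch that runs the same pattern with and without it. The function names are made up for illustration.

```shell
# Hypothetical helpers: the same pattern in extended and in basic syntax.
leading_number() { grep -Eo '^[0-9]+'; }   # "+" means "one or more digits"
bre_plus()       { grep -o  '^[0-9]+'; }   # "+" is a literal plus sign

printf '4: something\n5: something\n7: another thing\n' | leading_number
# prints 4, 5 and 7 on separate lines

printf '4: x\n4+: y\n' | bre_plus
# prints only "4+", because the basic-syntax pattern needs a literal "+"
```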
Using grep
grep -oE '^[0-9]+:' my_file | tr -d ':'
Using sed
sed 's#:.*$##g' my_file
Demo:
$ cat test.txt
4: something
5: something
7: another thing
$ sed 's#:.*$##g' test.txt
4
5
7
$ grep -oE '^[0-9]+:' test.txt | tr -d ':'
4
5
7
Given the contents of test.txt as follows:
Hello 10 love 20 haha 30
Hello Hello 11 love love 21 haha 31
41 Hello Hello 42 love love 43 haha 44
I want some kind of grep expression so that after saying:
$ cat test.txt | grep ???
I get this output:
20
21
42
How to implement this function?
It seems you're trying to get the second number on each line.
grep -oP '^\D*\d+\D*\K\d+' file
Or use sed:
sed 's/^[^[:digit:]]*[[:digit:]]\+[^[:digit:]]*\([[:digit:]]\+\).*/\1/' file
An alternative you might like to consider, using awk:
awk -F'[^[:digit:]]+' '{ print /^[[:digit:]]/ ? $2 : $3 }' file
This sets the field separator to one or more non-digit characters, which means that the field you're interested in is either the second or the third field, depending on whether the line starts with a digit or not.
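A quick way to check that claim is to print every field awk sees with this separator. This is only a sketch; the helper name show_digit_fields is made up.

```shell
# Hypothetical helper: show each field when runs of non-digits separate fields.
show_digit_fields() {
  awk -F'[^0-9]+' '{
    line = ""
    for (i = 1; i <= NF; i++) line = line "[" $i "]"
    print line
  }'
}

# A line that starts with a non-digit gets an empty first field.
printf 'Hello 10 love 20 haha 30\n' | show_digit_fields   # prints "[][10][20][30]"
# A line that starts with a digit does not.
printf '41 Hello 42 love 43\n' | show_digit_fields        # prints "[41][42][43]"
```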
For brevity you may prefer to use the range [0-9] instead of [[:digit:]]:
awk -F'[^0-9]+' '{ print /^[0-9]/ ? $2 : $3 }' file
Or you could use perl to capture the part of the line you're interested in:
perl -lne 'print $1 if /\d\D+(\d+)/' file
\d matches digits and \D matches non-digits, so this captures the second set of digits found on the line. If a second set of digits isn't found, nothing is printed (this differs from the behaviour of the awk script).
I have the following script to remove all lines before a line which matches with a word:
str='
1
2
3
banana
4
5
6
banana
8
9
10
'
echo "$str" | awk -v pattern=banana '
print_it {print}
$0 ~ pattern {print_it = 1}
'
It returns:
4
5
6
banana
8
9
10
But I want to include the first match too. This is the desired output:
banana
4
5
6
banana
8
9
10
How could I do this? Do you have any better idea with another command?
I've also tried sed '0,/^banana$/d', but it seems to work only with files, and I want to use it with a variable.
And how could I get all the lines before a match using awk?
That is, with banana as the regex, this would be the output:
1
2
3
This awk should do it:
echo "$str" | awk '/banana/ {f=1} f'
banana
4
5
6
banana
8
9
10
sed -n '/^banana$/,$p'
That should do what you want. -n tells sed to print nothing by default, and the p command prints every addressed line. This works on a stream. It differs from the awk solution in that it requires the entire line to be exactly 'banana', whereas your awk solution only requires 'banana' to appear somewhere in the line; I am following your sed example here.
I am not sure what you mean by "use it with a variable". If you mean that the string 'banana' should come from a variable, you can use sed -n "/$variable/,\$p" (note the double quotes and the escaped $), or sed -n "/^$variable\$/,\$p", or sed -n "/^$variable"'$/,$p'.
You can also pipe the string in, just as you do with awk: echo "$str" | sed -n '/banana/,$p'
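Putting the stream and the variable together might look like the sketch below. The helper name from_line is made up, and its argument is used as a regex, so characters such as . or / in it would need escaping.

```shell
# Hypothetical helper: from_line WORD prints stdin from the first line that is
# exactly WORD through the end of the input. WORD is treated as a regex.
from_line() {
  sed -n "/^$1\$/,\$p"
}

str='1
banana
4
banana
8'
printf '%s\n' "$str" | from_line banana   # prints banana, 4, banana, 8
```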
Just swap the order of the two rules in the awk script:
echo "$str" | awk -v pattern=banana '
$0 ~ pattern {print_it = 1}  # if the line matches, set the flag
print_it {print}             # if the flag is set, print the line
'
The print_it flag is set when the pattern is found. From that moment on (including that line), lines are printed while the flag is on. Previously, the print happened before the check.
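A side-by-side sketch of the two rule orders, plus one possible way to get only the lines before the first match, which the question also asks about. All three function names are made up for illustration.

```shell
# Hypothetical helpers; each reads stdin and takes the pattern as $1.
after_match() {    # original order: print first, then set the flag
  awk -v pattern="$1" '
    print_it { print }
    $0 ~ pattern { print_it = 1 }'
}
from_match() {     # swapped order: set the flag first, then print
  awk -v pattern="$1" '
    $0 ~ pattern { print_it = 1 }
    print_it { print }'
}
before_match() {   # stop at the first match without printing it
  awk -v pattern="$1" '
    $0 ~ pattern { exit }
    { print }'
}

printf '1\nbanana\n4\nbanana\n8\n' | from_match banana     # prints banana, 4, banana, 8
printf '1\nbanana\n4\nbanana\n8\n' | before_match banana   # prints 1
```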
cat in.txt | awk "/banana/,0"
If you don't want to keep the matched line, you can use the following (the 0,/regex/ address form is a GNU sed extension):
cat in.txt | sed "0,/banana/d"
How do I display data from the beginning of a file until the first occurrence of a regular expression?
For example, if I have a file that contains:
One
Two
Three
Bravo
Four
Five
I want to start displaying the contents of the file starting at line 1 and stopping when I find the string "B*". So the output should look like this:
One
Two
Three
perl -pe 'last if /^B/' source.txt
An explanation: the -p switch adds a loop around the code, turning it into this:
while ( <> ) {
last if /^B/; # The bit we provide
print;
}
The last keyword exits the surrounding loop immediately if the condition holds - in this case, /^B/, which indicates that the line begins with a B.
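For comparison, the same loop can be sketched in plain shell, where break plays the role of last. The function name until_b is made up, and this is an illustration of the control flow rather than a replacement for the Perl one-liner.

```shell
# Hypothetical helper: print stdin up to, but not including, the first line
# that starts with "B".
until_b() {
  while IFS= read -r line; do
    case $line in
      B*) break ;;          # same role as "last if /^B/"
    esac
    printf '%s\n' "$line"   # same role as the print that -p adds
  done
}

printf 'One\nTwo\nThree\nBravo\nFour\nFive\n' | until_b   # prints One, Two, Three
```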
If it's from the start of the file:
awk '/^B/{exit}1' file
If you want to start from a specific line number:
awk '/^B/{exit}NR>=10' file # start from line 10
sed -n '1,/^B/p'
Print from line 1 to /^B/ (inclusive). -n suppresses default echo.
Update: Oops... you didn't want "Bravo", so the reverse action is needed instead ;-)
sed -n '/^B/,$!p'
sed '/^B/,$d'
Read that as follows: delete (d) every line from the first line that starts with a "B" (/^B/) through the last line ($).
Some of the sed commands given by others keep processing the input after the regex is found, which is unnecessary and can be quite slow for large input. This one quits as soon as the regex is found:
sed -n '/^Bravo/q;p'
In Perl:
perl -nle '/B.*/ && last; print; ' source.txt
Just sharing some answers I've received:
Print data starting at the first line, and continue until we find a match to the regex, then stop:
<command> | perl -n -e 'print "$_" if 1 ... /<regex>/;'
Print data starting at the first line, and continue until we find a match to the regex, BUT don't display the line that matches the regular expression:
<command> | perl -pe '/<regex>/ && exit;'
Doing it in sed:
<command> | sed -n '1,/<regex>/p'
Your problem is a variation on an answer in perlfaq6: How can I pull out lines between two patterns that are themselves on different lines?
You can use Perl's somewhat exotic .. operator (documented in perlop):
perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
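The line-based form, print if /START/ .. /END/, has a close counterpart in awk's range pattern. A minimal sketch follows; the helper name between is made up, and both arguments are treated as regexes.

```shell
# Hypothetical helper: between START END prints every line from one matching
# START through the next one matching END, inclusive, for each such range.
between() {
  awk -v s="$1" -v e="$2" '$0 ~ s, $0 ~ e'
}

printf 'a\nSTART\nb\nEND\nc\n' | between START END   # prints START, b, END
```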
Here's another example of using ..:
while (<>) {
$in_header = 1 .. /^$/;
$in_body = /^$/ .. eof;
# now choose between them
} continue {
$. = 0 if eof; # fix $.
}
Here is a perl one-liner:
perl -pe 'last if /B/' file
If Perl is a possibility, you could do something like this:
% perl -0ne 'if (/B.*/) { print $`; last }' INPUT_FILE
A one-liner with basic shell commands (note that it also prints the matching line):
head -`grep -n B file|head -1|cut -f1 -d":"` file
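Because grep -n reports the number of the matching line itself, that head call prints through the match. A sketch of a variant that stops one line earlier and copes with "no match" and "match on line 1" is below. The helper name head_before and its argument order are assumptions for illustration.

```shell
# Hypothetical helper: head_before REGEX FILE prints FILE up to, but not
# including, the first line matching REGEX (the whole file if nothing matches).
head_before() {
  n=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
  if [ -z "$n" ]; then
    cat "$2"                    # no match: print everything
  elif [ "$n" -gt 1 ]; then
    head -n "$((n - 1))" "$2"   # stop just before the matching line
  fi                            # match on line 1: print nothing
}
```

With the example file from the question, head_before '^B' source.txt would print One, Two and Three.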