sed stops fetching the remaining content after a linefeed character is encountered - regex

sed -nE "s/(IMAGE)(.*)/\1\2/p" somefile > sfh.raw
somefile contains random ASCII as well as binary data after the image. The sed command above works as long as the data contains no newline byte. If there is a newline, it outputs only up to that newline and ignores the rest of the file.
Is there a way to make sed's (.*) capture everything, including newlines, and continue to the end of somefile? For example:
IMAGE254656
dsfdfdl;flkdfldsfkdsfkdlsfdfldfkdsfo;dsfkldsfdsfsd

Consider using awk for this:
awk '/^IMAGE/{i=1}i' somefile
awk processes a file line by line and allows you to set variables and check contents and lots of other very fancy stuff for each line.
This script checks each line to see if it starts with IMAGE. If so, it sets variable i to 1. Then it checks to see if i is set. If so, it does its default behavior of printing the line.
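The flag idiom described above can be tried on a small text sample. This is only a sketch: the sample input and the variable name from_image are made up for illustration, and how NUL bytes in real binary data are handled may differ between awk implementations.

```shell
#!/bin/sh
# Hypothetical sample: one junk line, the IMAGE marker, then two more lines.
# i is unset (false) until a line starts with IMAGE; from then on the bare
# pattern "i" is true, so awk's default action (print the line) runs.
from_image=$(printf 'junk before\nIMAGE254656\nline two\nline three\n' |
    awk '/^IMAGE/ { i = 1 } i')

printf '%s\n' "$from_image"
```

The junk line is dropped, and everything from the IMAGE line onward is printed, newlines included.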

Related

a simple sed script displaying only changed lines

How could I make a separate sed script (let's call it script.sed) that would display only the changed lines without having to use the -n option while executing it? (Sorry for my English)
I have a file called data2.txt with digits and I need to change the lines ending with ".5" and print those changed lines out in the console.
I know how to do it with a single command (sed -n 's/.5$//gp' data2.txt), however our university professor requires us to do the same using sed -f script.sed data2.txt command.
Any ideas?
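The following should work for your sed script (the dot is escaped so that it matches a literal "." rather than any character, and the g flag is dropped because a pattern anchored with $ can match only once per line):
s/\.5$//p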
d
The -n option suppresses automatic printing of the line; the other way to do that is to use the d command. From the man page:
d Delete pattern space. Start next cycle.
This works because the automatic printing of the line happens at the end of a cycle, and using the d command means you never reach the end of a cycle so no lines are printed automatically.
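A quick way to check this behaviour is to write the two-command script to a temporary file and run it without -n. This is a sketch with invented sample digits. The dot is escaped here so that a line such as "35" is not treated as ending in ".5".

```shell
#!/bin/sh
# Temporary stand-in for script.sed (the path comes from mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
s/\.5$//p
d
EOF

# No -n on the command line: the p flag prints changed lines, and d ends
# every cycle before the automatic print can happen.
changed=$(printf '1.5\n2.0\n35\n7.5\n' | sed -f "$script")
printf '%s\n' "$changed"

rm -f "$script"
```

Only the two lines that ended in ".5" appear, with the suffix removed.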
This might work for you (GNU sed):
#n
s/\.5$//p
Save this to a file and run as:
sed -f file.sed file.txt

sed: display lines selected for deleting

How can I get verbose output from sed? For example, if I'm deleting some lines with a sed command, I want the deleted lines to be displayed on the screen. Can this also be done through a script?
Thanks in advance
sed doesn't have a verbose flag.
You can write a sed script that separates deleted lines from other lines, though. You can look at the deleted lines later, and decide whether deleting them was a good idea.
Here's an example. I want to delete from test.dat every line that starts with a number.
$ cat test.dat
1 First line
2 Second line
3 Third line
A Keep this one
Here's the sed script that will "do" the deleting. It looks for lines that start with a number, writes them to the file "deleted.dat", and then deletes them from the pattern space.
$ cat code/sed/delete-verbose.sed
/^[0-9]/{
w /home/myusername/deleted.dat
d
}
Here's what happens when you run it.
$ sed -f code/sed/delete-verbose.sed test.dat
A Keep this one
And here's what it wrote to "deleted.dat".
$ cat deleted.dat
1 First line
2 Second line
3 Third line
When you're confident the script is going to do the right thing, redirect output to another file, or edit the file in-place (-i option).
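The whole walkthrough can be reproduced in a scratch directory. This is a sketch that assumes a temporary directory from mktemp -d in place of the home-directory paths used above. The script body is otherwise the same.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '1 First line\n2 Second line\n3 Third line\nA Keep this one\n' > "$workdir/test.dat"

# Unquoted heredoc so that $workdir expands inside the w command's filename.
cat > "$workdir/delete-verbose.sed" <<EOF
/^[0-9]/{
w $workdir/deleted.dat
d
}
EOF

# Lines that survive go to stdout; deleted lines land in deleted.dat.
kept=$(sed -f "$workdir/delete-verbose.sed" "$workdir/test.dat")
deleted=$(cat "$workdir/deleted.dat")

printf 'kept:\n%s\n' "$kept"
printf 'deleted:\n%s\n' "$deleted"

rm -rf "$workdir"
```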
This might work for you (GNU sed):
sed -e '/pattern_to_delete/{w /dev/stderr' -e ';d}' input_file > output_file
There is no verbose flag but by sending the lines to be deleted to stderr the effect you require can be achieved.

awk script to remove ASCII from file type

Here is a simple command
file * | awk '/ASCII text/ {gsub(/:/,"",$1); print $1}' | xargs chmod -x
I am not able to understand the use of awk in the above as showed.
How is it working?
There was a deleted answer which came pretty close to avoiding the problems with whitespace or colons in filenames and the output of file. I've voted to undelete the answer, but I'm going to go ahead and post some improvements to it and add some explanation.
file -0 * | awk -F '\0' '$2 ~ /ASCII text/ {printf "%s\0", $1}' | xargs -0 chmod -x
Since nulls aren't allowed in filenames, it's safe to use them as delimiters. Each step in this pipeline uses nulls. file outputs them, awk accepts them in input and outputs them and xargs accepts them in input. I've also made the match specific to the description field so it won't trigger a false positive in the perhaps unusual case of a file which is named something like "ASCII text" but in fact its contents are not.
As others have said, the AWK command you posted matches lines of output from the file command that include "ASCII text" somewhere in the line. Then every colon is deleted (since gsub() is a global substitution) from field one which is the colon-space-delimited filename. A potential problem occurs if the filename contains either a colon or a space (or both or multiples). The filename will get truncated and the chmod will fail or might even be falsely triggered on a file with a similar name (e.g. "foo bar" and "foo" both exist, "foo" is not an ASCII text file so you don't want it to be touched, but "foo bar" gets truncated to "foo" and oops!). The reason spaces are potential problems is that AWK, by default, does field splitting on spaces and tabs.
Breakdown of the AWK portion of the pipeline you posted:
/ASCII text/ { - for each line that matches the regular expression
gsub(/:/,"",$1); - for each colon (as a regular expression) in the first field, substitute an empty string
print $1} - print the thus modified first field
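The breakdown above, and the truncation that a space in the filename causes, can be reproduced without touching any real files. This sketch feeds the question's awk program two simulated lines of file output. The filenames are invented for illustration.

```shell
#!/bin/sh
# The awk program from the question, reading simulated `file` output on stdin.
extract_name() {
    awk '/ASCII text/ { gsub(/:/, "", $1); print $1 }'
}

simple=$(printf 'foo.txt: ASCII text\n' | extract_name)
spaced=$(printf 'foo bar: ASCII text\n' | extract_name)

printf 'simple name       -> %s\n' "$simple"
printf 'name with a space -> %s\n' "$spaced"
```

For "foo bar", the first whitespace-separated field is just "foo", so that is the name chmod would receive.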
I'm guessing but it looks like it's extracting the part before the : in the output of the file command (i.e. the filename). The gsub part will remove the : in the filename and so something like foo.txt: ASCII text will become foo.txt ASCII text. Then, the print will print the first item in the space separated list (in this case, the filename foo.txt). All these files will be made unexecutable by the chmod.
This looks quite tedious. It's probably easier to just say awk -F: '{print $1}' after grepping instead of the whole substitution trick. Also, this will break if the filename has spaces in it.
It's using file to determine the type (contents) of each file, then selecting the ones that are ASCII text and removing everything from the first colon (which is assumed to be the separator between the filename and file type; this is fragile when file names have colons in them; as Noufel noted, it's also doing it the hard way), then using xargs to batch them up and clear the execute bits. (The usual reason for doing this is files transferred from Windows, which doesn't have execute bits, so often all files end up with execute bits set as seen by Unixes.)
The breakage on spaces is fixable; xargs understands quoting. I would break on the last colon instead of the first, though, since file doesn't usually include colons in its ASCII text type strings.
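Splitting on the last colon could look like the following sed expression. This is a sketch on a made-up line of file output, and it relies on the assumption stated above that the type description itself contains no colon.

```shell
#!/bin/sh
# Remove the last colon and everything after it, keeping earlier colons.
name_before_last_colon() {
    sed 's/:[^:]*$//'
}

colon_name=$(printf 'notes:2024.txt: ASCII text\n' | name_before_last_colon)
printf '%s\n' "$colon_name"
```

The colon inside the filename is preserved because only the final ": description" part is stripped.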

How can I remove text at beginning of a file using a regex?

I have a bunch of files that contain a semi-standard header. That is, the look of it is very similar but the text changes somewhat.
I want to remove this header from all of the files.
From looking at the files, I know that what I want to remove is encapsulated between similar words.
So, for instance, I have:
Foo bar...some text here...
more text
Foo bar...I want to keep everything after this point
I tried this command in perl:
perl -pi -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
But it doesn't work. I'm not a regex expert but hoping someone knows how to basically remove a chunk of text from the beginning of a file based on a text match and not the number of characters...
By default, ARGV (aka <> which is used behind-the-scenes by -p) only reads a single line at a time.
Workarounds:
Unset $/, which tells Perl to read a whole file at a time.
perl -pi -e "BEGIN{undef$/}s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
BEGIN is necessary to have that code run before the first read is done.
Use -0, which sets $/ = "\0".
perl -pi -0 -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
Take advantage of the flip-flop operator.
perl -ni -e 'print unless 1 ... /^Foo.bar/'
This will skip printing starting from line 1 to /^Foo.bar/.
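The same line-range idea can be expressed with sed instead of Perl. This is a swapped-in equivalent rather than the command above, run on a sample modelled on the question with two invented body lines added. As with Perl's three-dot range, sed does not test line 1 itself as the end of the range.

```shell
#!/bin/sh
# Delete from line 1 through the next line that starts with "Foo bar".
body=$(printf 'Foo bar...some text here...\nmore text\nFoo bar...last header line\nbody 1\nbody 2\n' |
    sed '1,/^Foo bar/d')

printf '%s\n' "$body"
```

Note that the closing "Foo bar" line is removed along with the header, so any text that should be kept from that line needs separate handling.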
If your header stretches across more than one line you must tell perl how much to read. If the files are small in comparison to memory you may want to just slurp the whole file into memory:
perl -0777pi.orig -e 's/your regex/your replace/s' file1 file2 file3
The -0777 option sets perl to slurp mode, so $_ will hold each whole file each time through the loop. Also, always remember to set the backup extension. If you don't, you may find that you have wiped out your data accidentally and have no way to get it back. See perldoc perlrun for more information.
Given information from the comments, it looks like you are trying to strip all of the annoying stuff from the front of a Project Gutenberg ebook. If you understand all of the copyright issues involved, you should be able to get rid of the front matter like this:
perl -ni.orig -e 'print unless 1 .. /^\*END/' 00ws110.txt
The Project Gutenberg header ends with
*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*
A safer regex would take into account the *END* at the end of the line as well, but I am lazy.
I might be misinterpreting what you're asking for, but it looks to me as simple as:
perl -ni -e 'print unless 1..($. > 1 && /^Foo bar/)'
Here you go! This replaces the first line of the file:
use Tie::File;
tie my @array, "Tie::File", "path_to_file" or die("can't tie the file");
$array[0] =~ s/text_i_want_to_replace/replacement_text/gi;
untie @array;
You can operate on the array and you will see the modifications in the array. You can delete elements from the array and it will erase the line from the file. Applying substitution on elements will substitute text from the lines.
If you want to delete the first two lines, and keep something from the third, you can do something like this :
# tie the @array before this
shift @array;
shift @array;
$array[0] =~ s/foo bar\.\.\.//gi;
# untie the @array
and this will do exactly what you need!

What Vim command to use to delete all text after a certain character on every line of a file?

Scenario:
I have a text file that has pipe (as in the | character) delimited data.
Each field of data in the pipe delimited fields can be of variable length, so counting characters won't work (or using some sort of substring function... if that even exists in Vim).
Is it possible, using Vim to delete all data from the second pipe to the end of the line for the entire file? There are approx 150,000 lines, so doing this manually would only be appealing to a masochist...
For example, change the following lines from:
1111|random sized text 12345|more random data la la la|1111|abcde
2222|random sized text abcdefghijk|la la la la|2222|defgh
3333|random sized text|more random data|33333|ijklmnop
to:
1111|random sized text 12345
2222|random sized text abcdefghijk
3333|random sized text
I'm sure this can be done somehow... I hope.
UPDATE: I should have mentioned that I'm running this on Windows XP, so I don't have access to some of the mentioned *nix commands (cut is not recognized on Windows).
:%s/^\v([^|]+\|[^|]+)\|.*$/\1/
You can also record a macro:
qq02f|Djq
and then you will be able to play it with 100@q to run the macro on the next 100 lines.
Macro explanation:
qq: starts macro recording;
0: goes to the first character of the line;
2f|: finds the second occurrence of the | character on the line;
D: deletes the text after the current position to the end of the line;
j: goes to the next line;
q: ends macro recording.
If you don't have to use Vim, another alternative would be the unix cut command:
cut -d '|' -f 1-2 file > out.file
Instead of substitution, one can use the :normal command to repeat
a sequence of two Normal mode commands on each line: 2f|, jumping
to the second | character on the line, and then D, deleting
everything up to the end of line.
:%norm!2f|D
Just another Vim way to do the same thing:
%s/^\(.\{-}|\)\{2}\zs.*//
%s/^\(.\{-}\zs|\)\{2}.*// " If you want to remove the second pipe as well.
This time, the regex matches as few characters as possible (\{-}) that are followed by |, and twice (\{2}), they are ignored to replace all following text (\zs) by nothing (//).
You can use :command to make a user command to run the substitution:
:command -range=% YourNameHere <line1>,<line2>s/^\v([^|]+\|[^|]+)\|.*$/\1/
You can also do:
:%s/^\([^\|]\+|[^\|]\+\)\|.*$/\1/g
Use Awk:
awk -F"|" '{$0=$1"|"$2}1' file
I've found that vim isn't great at handling very large files. I'm not sure how large your file is. Maybe cat and sed together would work better.
Here is a sed solution:
sed -e 's/^\([^|]*|[^|]*\).*$/\1/'
This will filter all lines in the buffer (1,$) through cut to do the job:
:1,$!cut -d '|' -f 1-2
To do it only on the current line, try:
:.!cut -d '|' -f 1-2
Why use Vim? Why not just run
cat my_pipe_file | cut -d'|' -f1-2
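The non-Vim alternatives mentioned above (cut, awk, and sed) can be checked against the question's sample data. This is a sketch: the variable names are made up, and the awk line uses an explicit print rather than the $0 reassignment shown earlier.

```shell
#!/bin/sh
sample='1111|random sized text 12345|more random data la la la|1111|abcde
2222|random sized text abcdefghijk|la la la la|2222|defgh
3333|random sized text|more random data|33333|ijklmnop'

# Three ways of keeping only the first two pipe-delimited fields.
with_cut=$(printf '%s\n' "$sample" | cut -d '|' -f 1-2)
with_awk=$(printf '%s\n' "$sample" | awk -F'|' '{ print $1 "|" $2 }')
with_sed=$(printf '%s\n' "$sample" | sed -e 's/^\([^|]*|[^|]*\).*$/\1/')

printf '%s\n' "$with_cut"
```

All three variables should hold the same three shortened lines. On Windows these tools are available only through a Unix-like environment, which is the limitation noted in the question's update.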