how to remove lines from file that don't match regex?

how to remove lines from file that don't match regex? - regex

I have a big file that looks like this:
7f0c41d6-f9c6-47aa-a034-d40bc629c973.csv
159890
159891
24faaed6-62ee-4175-8430-5d73b09911c8.csv
159907
5bad221f-25ef-44fa-9086-fd152e697928.csv
642e4ac3-3d46-4b4c-b5c8-aa2fa54d0b04.csv
d0e145a5-ceb8-4d4b-ae47-11e0c9a6548d.csv
159929
ba678cbd-af57-493b-a69e-e7504b4bc328.csv
7750840f-9bf9-4a68-9f25-a2ba0968d481.csv
159955
159959
And I'm only interesting in *.csv files, can someone point me how to remove files that do not end with .csv.
Thank you.

grep "\.csv$" file
will pull out only those lines ending in .csv
Then if you want to put them in a different file;
grep "\.csv$" file > newfile

sed is your friend:
sed -i.bak '/\.csv$/!d' file
-i.bak : in-place edit. creates backup file with .bak extension

([0-9a-zA-Z-]*.csv$)
This is the regex code that only select the filename ending with .csv extensions.
Hope this will help you.

If you are familiar with the vim text editor (vim or vi is typically installed on many linux boxes), use the following vim Ex mode command to remove lines that don't match a particular pattern:
:v/<pattern>/d
For example, if I wanted to delete all lines that didn't contain "column" I would run:
:v/"column"/d
Hope this helps.

If it is the case that you do not want to have to save the names of files in another file just to remove unwanted files, then this may also be an added solution for your needs (understanding that this is an old question).
This single line for loop using the grep "\.csv" file solution recursively so you don't need to manage multiple files names being saved here or there.
for f in *; do if [ ! "$(echo ${f} | grep -Eo '.csv')" == ".csv" ]; then rm "${f}"; fi; done
As a visual aid to show you that it works as intended (for removing all files except csv files) here is a quick and dirty screenshot showing the results using your sample output.
And here is a slightly shorter version of the single line command:
for f in *; do if [ ! "$(echo ${f} | grep -o '.csv')" ]; then rm "${f}"; fi; done
And here is it's sample output using your sample's csv file names and some randomly generated text files.
The purpose for using such a loop with a conditional is to guarantee you only rid yourself of the files you want gone (the non-csv files) and only in the current working directory without parsing the ls command.
Hopefully this helps you and anyone else that is looking for a similar solution.

Related

How can I insert text after a certain string is found? [duplicate]

I would like to run a find and replace on an HTML file through the command line.
My command looks something like this:
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html > index.html
When I run this and look at the file afterward, it is empty. It deleted the contents of my file.
When I run this after restoring the file again:
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
The stdout is the contents of the file, and the find and replace has been executed.
Why is this happening?

When the shell sees > index.html in the command line it opens the file index.html for writing, wiping off all its previous contents.
To fix this you need to pass the -i option to sed to make the changes inline and create a backup of the original file before it does the changes in-place:
sed -i.bak s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
Without the .bak the command will fail on some platforms, such as Mac OSX.

An alternative, useful, pattern is:
sed -e 'script script' index.html > index.html.tmp && mv index.html.tmp index.html
That has much the same effect, without using the -i option, and additionally means that, if the sed script fails for some reason, the input file isn't clobbered. Further, if the edit is successful, there's no backup file left lying around. This sort of idiom can be useful in Makefiles.
Quite a lot of seds have the -i option, but not all of them; the posix sed is one which doesn't. If you're aiming for portability, therefore, it's best avoided.

sed -i 's/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' index.html
This does a global in-place substitution on the file index.html. Quoting the string prevents problems with whitespace in the query and replacement.

use sed's -i option, e.g.
sed -i bak -e s/STRING_TO_REPLACE/REPLACE_WITH/g index.html

To change multiple files (and saving a backup of each as *.bak):
perl -p -i -e "s/\|/x/g" *
will take all files in directory and replace | with x
this is called a “Perl pie” (easy as a pie)

You should try using the option -i for in-place editing.

Warning: this is a dangerous method! It abuses the i/o buffers in linux and with specific options of buffering it manages to work on small files. It is an interesting curiosity. But don't use it for a real situation!
Besides the -i option of sed
you can use the tee utility.
From man:
tee - read from standard input and write to standard output and files
So, the solution would be:
sed s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html | tee | tee index.html
-- here the tee is repeated to make sure that the pipeline is buffered. Then all commands in the pipeline are blocked until they get some input to work on. Each command in the pipeline starts when the upstream commands have written 1 buffer of bytes (the size is defined somewhere) to the input of the command. So the last command tee index.html, which opens the file for writing and therefore empties it, runs after the upstream pipeline has finished and the output is in the buffer within the pipeline.
Most likely the following won't work:
sed s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html | tee index.html
-- it will run both commands of the pipeline at the same time without any blocking. (Without blocking the pipeline should pass the bytes line by line instead of buffer by buffer. Same as when you run cat | sed s/bar/GGG/. Without blocking it's more interactive and usually pipelines of just 2 commands run without buffering and blocking. Longer pipelines are buffered.) The tee index.html will open the file for writing and it will be emptied. However, if you turn the buffering always on, the second version will work too.

sed -i.bak "s#https.*\.com#$pub_url#g" MyHTMLFile.html
If you have a link to be added, try this. Search for the URL as above (starting with https and ending with.com here) and replace it with a URL string. I have used a variable $pub_url here. s here means search and g means global replacement.
It works !

The problem with the command
sed 'code' file > file
is that file is truncated by the shell before sed actually gets to process it. As a result, you get an empty file.
The sed way to do this is to use -i to edit in place, as other answers suggested. However, this is not always what you want. -i will create a temporary file that will then be used to replace the original file. This is problematic if your original file was a link (the link will be replaced by a regular file). If you need to preserve links, you can use a temporary variable to store the output of sed before writing it back to the file, like this:
tmp=$(sed 'code' file); echo -n "$tmp" > file
Better yet, use printf instead of echo since echo is likely to process \\ as \ in some shells (e.g. dash):
tmp=$(sed 'code' file); printf "%s" "$tmp" > file

And the ed answer:
printf "%s\n" '1,$s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' w q | ed index.html
To reiterate what codaddict answered, the shell handles the redirection first, wiping out the "input.html" file, and then the shell invokes the "sed" command passing it a now empty file.

I was searching for the option where I can define the line range and found the answer. For example I want to change host1 to host2 from line 36-57.
sed '36,57 s/host1/host2/g' myfile.txt > myfile1.txt
You can use gi option as well to ignore the character case.
sed '30,40 s/version/story/gi' myfile.txt > myfile1.txt

With all due respect to the above correct answers, it's always a good idea to "dry run" scripts like that, so that you don't corrupt your file and have to start again from scratch.
Just get your script to spill the output to the command line instead of writing it to the file, for example, like that:
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
OR
less index.html | sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g
This way you can see and check the output of the command without getting your file truncated.

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working on a directory with many .txt files in them and have a file with looong list of regex like "perl -p -i -e 's/\n\n/\n/g' *.xml" they all work if I copy them to terminal. But is there a possibility to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!

Here's a (mostly) equivalent Perl script to the oneliner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended), which you could expand upon by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!#ARGV) { # if no files on command line
#ARGV = glob('*.xml'); # get a default list of files
}
local $^I = ''; # enable inplace editing (like perl -i)
while (<>) { # read each line of each file into $_
s/\n\n/\n/g; # modify $_ with a regex
# more regexes here...
print; # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions, there are lots of Perl modules to do XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line-by-line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as I've showed above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.

It's likely if you got no such file or directory, your problem was you forgot to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script that you wrote.
Of course the normal way to run multiple perl commands is this thing that looks like runme.pl which you write, i.e., a perl script.
That said, yes, everything will work from the terminal, you just need to be careful about escaping that bash performs.

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on windows, but I run this script in gitbash
For some reason, my gitbash doesn't recognize the "rename" command
Some regex patterns (like [0-9]{4} don't seem to work)
I'm really at a lost. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the from: *.log.2016.01.18. They all need to keep their original names. All that should change is the removal of the date.

You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"

batch renaming of files with perl expressions

This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files that have existing name of a code (example: XG453834.fasta.gz). I'd like to name them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands on a one-off script then they work fine, it's just when I try to incorporate variables into a shell script do I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my files look like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here's the first lines of my txt file with single space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single tick of the perl expression, but I have poured over a lot of manuals and I can't find the way to do this.

If you're absolutely sure than the filenames doesn't contain spaces of tabs, you can try the next
xargs -n2 < names.txt echo mv
This is for DRY run (will only print what will do) - if you satisfied with the result, remove the echo ...
If you want check the existence ot the target, use
xargs -n2 < names.txt echo mv -i
if you want NEVER allow overwriting of the target use
xargs -n2 < names.txt echo mv -n
again, remove the echo if youre satisfied.

I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double quoted the variable names as it is generally considered good practice but in this case, a space in either of the filenames will result in read not working as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop aswell. Perhaps that was your problem?

This should rename all files in column1 to column2 of names.txt. Provided they are in the same folder as names.txt
cat names.txt| awk '{print "mv "$1" "$2}'|sh

SED / AWK multi-file replacement?

I've been trying to figure out a command that will search through 13+ files and replace
all matches and variances of forms data and replace them with form data enhancements.
The trick is that there could be [whitespace] - or _ as a separator that I would like
to preserve. I'm running form command line so I believe I could run this script multiple
times and just point it at the file, or if there's a way to capture all files in a directory
(even including directory names) it might just be easier.
I believe its something to the tune of
sed "s/forms_data/form-data-enhancements/g ; s/forms-data/form-data-enhancements/g ; s/forms data/form data enhancements/g" oldfile > newfile
nut I'm not sure.....
variances might be
forms-data
forms_data
forms data
etcetra. Would someone mind sharing a bit of sed awk wisdom? The best I can find is something called an arrary replace but was unable to get any information on how to use it.
Thanks greatly.

Will this work for you -
sed -i 's/\<forms[ -_]data\>/form data enhancements/g' /path/to/files*
-i will do in-file substitution. So first pick a file and run the command without the -i option. If everything looks ok then you can go ahead and use the -i.
Update:
If you would like to retain the separators then you can do something like this -
sed -i 's/\<forms\([ -_]\)data\>/form\1data\1enhancements/' /path/to/files*

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js