Delete folders that have numbers in them but don't delete folders that don't have numbers - regex

So there was a bug in our recent code: instead of creating a bunch of numbered folders inside a root folder, it created a bunch of folders right next to the root folder.
To put it simply, what we wanted was:
customers
|-18
|-158
|-405
|-1238
|-1447
...
|-4797
Unfortunately, what we got was:
customers
customers18
customers158
customers405
customers1238
customers1447
...
customers4797
Now, there are about 1000 folders (with their internal sub-folder structure) that we need to delete. I tried to look up regex and other filtering methods, but it seems like they don't work with the rm command.
What is the command line I need to delete all the "customers[numbers]" folders but NOT the "customers" folder?

Try the following command. It should work:
ls | grep -P "^customers[0-9]+$" | xargs -d "\n" rm -R
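If you'd rather not parse ls output, a find-based sketch does the same job and is safer with odd file names (this assumes GNU find, for -regextype):
find . -maxdepth 1 -type d -regextype posix-extended -regex '\./customers[0-9]+' -exec rm -r {} +
Drop the -exec clause first so find just prints the matches, letting you preview what would be deleted.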

Related

Cannot use regular expression in hadoop in Linux command line

I have a folder that contains a large number of subfolders that are dates from 2018. In my HDFS I have created a folder of just December dates (formatted 2018-12-) and I need to delete specifically days 21 - 25. I copied this folder from my HDFS to my docker container and used the command
rm -r *[21-25]
in the folder it worked as expected. But when I run this same command adapted to hdfs
hdfs dfs -rm -r /home/cloudera/logs/2018-Dec/*[21-25]
it gives me the error
rm: `/home/cloudera/logs/2018-Dec/*[21-25]': No such file or directory
If you need something to be explained in more detail, leave a comment. I'm brand new to all of this and I don't 100% understand how to phrase some of these things.
I figured it out with the help of @Barmer. I was pointing at my local system's base directory, and I also had to change the pattern to 2[1-5]: a bracket expression matches a single character, so [21-25] means "one of 1, 2, or 5", while *2[1-5] matches names ending in 21 through 25. So the command ended up being hdfs dfs -rm -r /user/cloudera/logs/2018-Dec/*2[1-5].
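A quick local demo of the difference between the two patterns (file names invented for illustration):
$ touch 2018-12-15 2018-12-21 2018-12-25
$ echo *[21-25]
2018-12-15 2018-12-21 2018-12-25
$ echo *2[1-5]
2018-12-21 2018-12-25
The first glob matches any name ending in 1, 2, or 5; only the second restricts matches to days 21 through 25.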

Compare Folders, Create Archive of Differences

I have folder A and folder B.
Folder A contains approximately 100 files (text, js, php, bash, etc.). They are stored in the root of the folder and in sub-folders and further sub-folders within folder A.
Folder B is a copy of Folder A, but some of the files have been updated.
Is there any way I can compare A to B and create a tar.gz file containing only the files that have changed in Folder B?
I would need to keep the folder structure intact when the tar.gz is created.
Currently I use WinMerge to check for differences, but I'm happy to look at any Windows or Linux application/command that will help with this.
Thanks
This line excludes files that exist in only one of the folders, but creates the tar.gz file that you want.
diff -rq folderA folderB | grep -v "^Only in" | sed "s/^.*and folderB/folderB/g" | sed "s/ differ//g" | tar czf output.tar.gz -T -
Broken down it goes:
diff -rq folderA folderB
Do a recursive diff between these folders, and be quiet about it - only output the file names.
| grep -v "^Only in"
Exclude output lines that indicate one file is only in one of the folders. I'm assuming from your description this isn't an issue for you, but the two folders I was playing with were a bit dirty.
| sed "s/^.*and folderB/folderB/g"
Discard the first part of the output, up to " and " followed by the name of the second folder. This match consumes the second folder name as well, so the replacement puts it straight back in.
| sed "s/ differ//g"
Discard the end bit of the diff output.
| tar czf output.tar.gz -T -
Tell tar to do the thing. c means create a tar file, z means compress it (gzip), f means the filename comes next: output.tar.gz is your output file. -T means "read the names of the files to archive from the given file", and the final - means "use stdin for that list".
I suggest you build this up yourself in the individual steps so you can see how it is constructed, and what the output of each step is like.
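For example, on a hypothetical pair of folders (file names invented), the intermediate stages might look like this:
$ diff -rq folderA folderB
Files folderA/js/app.js and folderB/js/app.js differ
Only in folderB: notes.txt
$ diff -rq folderA folderB | grep -v "^Only in" | sed "s/^.*and folderB/folderB/g" | sed "s/ differ//g"
folderB/js/app.js
tar then reads folderB/js/app.js from stdin and archives just that file, with its sub-folder path intact.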

Copy all Files in a List to a Unique Directory

I am trying to take a text file that contains a list of files and copy them all to a directory. Within this directory, they will have unique directory names. An example of the text file's structure can be seen below:
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s01_2011_11_01/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s01_2011_11_01/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s02_2011_11_11/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s02_2011_11_11/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s01_2009_02_13/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s02_2010_10_02/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s03_2010_10_02/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_2.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_3.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_4.edf
I need a shell command or an Emacs macro to go through this list and copy each file to a unique directory within the current working directory. The unique directory depends on the file; for example, for the first two files, the directory would be
/001/00000003/s01_2011_11_01/
I have tried doing this using an Emacs macro, but I was not able to get it to work. Either a shell command or an Emacs macro would do.
Something as simple as:
cat list | sed "s/^.*edf\/\(.*\)\/\(.*\)$/mkdir -p root_dir\/\1 \&\& cp \0 root_dir\/\1\/\2/" | sh
If on OSX, install gnu-sed and use gsed instead of sed. Run the command without | sh first to see what it will do. Make sure to tweak root_dir, of course.
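If the sed is hard to read, here is an equivalent sketch as a plain while-read loop (root_dir is the same placeholder as above):
while IFS= read -r path; do
  rel="${path#*/edf/}"                   # strip everything through the first /edf/
  mkdir -p "root_dir/$(dirname "$rel")"  # e.g. root_dir/001/00000003/s01_2011_11_01
  cp "$path" "root_dir/$rel"
done < list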

Shell script to create directories and files from a list of file names

I'm (still) not a shell-wizard, but I'm trying to find a way to create directories and files from a list of file names.
Let's take this source file (source.txt) as an example:
README.md
foo/index.html
foo/bar/README.md
foo/bar/index.html
foo/baz/README.md
I'll use this command to remove empty lines and trim useless spaces:
$ more source.txt | sed '/^$/d;s/^ *//;s/ *$//'
It will give me this list:
README.md
foo/index.html
foo/bar/README.md
foo/bar/index.html
foo/baz/README.md
Now I'm trying to loop over every line and create the related file (if it doesn't already exist), along with its parent directories.
How could I do this?
Ideally, I would put this script in an alias to quickly use it.
As always, posting a question brings me to the end of the problem...
I came to a satisfying solution, using dirname and basename in a for .. in loop:
for i in $(sed '/^$/d;s/^ *//;s/ *$//' source.txt);
do mkdir -p "$(dirname "$i")";
touch "$i";
done
This one-line command will:
read the file names list
create directories
create empty files in their own directory
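A variant that also survives blank lines and file names containing spaces, sketched with while read (the space-trimming sed is omitted for brevity):
while IFS= read -r f; do
  [ -z "$f" ] && continue        # skip empty lines
  mkdir -p "$(dirname "$f")"     # create the parent directories
  touch "$f"                     # create the file itself
done < source.txt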

bz2 files compression question

When we compress a folder, we type the command tar -cjf folder.tar.bz2 folder, which packs the entire folder, top-level directory included, into the archive.
Is there any way to compress everything within the folder so that the folder itself does not appear in the archive?
For example: when opening the archive, the files within the folder appear instead of the enclosing folder.
Use the -C option of tar:
tar -C folder -jcvf folder.tar.bz2 .
I tried this on my PC and it worked ;)
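To check that the folder name is really gone, list the archive contents afterwards (files invented for illustration):
$ tar -tjf folder.tar.bz2
./
./index.html
./assets/
./assets/style.css
The entries start with ./ rather than folder/.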
This should do it:
cd folder; tar -cjf ../folder.tar.bz2 *
The * at the end gets expanded by the shell to the list of all files (except hidden ones) in the current directory. Try echo *.
For hidden files, there are two possible approaches:
Use the ls command with its -A option (list "almost all" files, that is, all except the . and .. entries for the current and parent directory):
cd folder; tar -cjf ../folder.tar.bz2 $(ls -A)
Use wildcard expressions (note that this doesn't work in dash, and, when any of the patterns doesn't match, you'll get it verbatim in the argument list; see the echo preview below):
cd folder; tar -cjf ../folder.tar.bz2 * .[^.]* ..?*
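To preview what those patterns expand to (and to see the verbatim caveat in action), try echo in a made-up directory:
$ cd folder
$ echo * .[^.]* ..?*
index.html assets .gitignore .config ..?*
Here ..?* matched nothing, so the shell passed it through verbatim; tar would then complain that ..?* does not exist.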
I think what you're referring to is simply cd folder; tar -cjf ../folder.tar.bz2 * .[^.]*, but I could be wrong. This places the filenames at the top level of the resulting archive, as opposed to prefixing them with folder/.
tar -jcvf folder.tar.bz2 folder/*