I am compressing a big directory (about 50 GB) containing files and folders over an SSH command line in PuTTY. I am using this command:
tar czspf file.tar.gz directory/
It starts working fine, but after some time it gets terminated with the single-word message "Terminated", and compression stops at around 16 GB of tar archive.
Is there any way to avoid the termination, or to deal with this problem, or any other method to tar the directory without hitting the terminate error? Thanks.
You are probably hitting some kind of file size limit. Not all file systems support very large files. In that case you can pipe the output of tar into a split command like this:
tar czspf - directory/ | split -b 4G - fileprefix-
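To later extract the archive, concatenate the pieces back into tar; a sketch (split's default suffixes sort alphabetically, so the glob reassembles them in the right order):
cat fileprefix-* | tar xzpf -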
This command copies a huge number of files from Google Cloud storage to my local server.
gsutil -m cp -r gs://my-bucket/files/ .
There are 200+ files, each of which is over 5GB in size.
Once all files are downloaded, another process kicks in and starts reading the files one by one, extracting the info needed.
The problem is, even though the gsutil copy process is fast and downloads files in batches of multiple files at a very high speed, I still need to wait till all the files are downloaded before starting to process them.
Ideally I would like to start processing the first file as soon as it is downloaded. But with the multi-file cp mode, there seems to be no way of knowing when an individual file has finished downloading (or is there?).
From Google docs, this can be done in individual file copy mode.
if ! gsutil cp ./local-file gs://your-bucket/your-object; then
<< Code that handles failures >>
fi
That means if I run the cp without -m flag, I can get a boolean on success for that file and I can kick off the file processing.
The problem with this approach is that the overall download will take much longer, as files are now downloaded one by one.
Any insight?
One thing you could do is have a separate process that periodically lists the directory, filters out the files that are incompletely downloaded (they are downloaded to a filename ending with '.gstmp' and then renamed after the download completes), and keeps track of files you haven't yet processed; a rough sketch follows the two complications below. You could terminate the periodic listing process when the gsutil cp process completes, or you could just leave it running so it processes downloads the next time you download all the files.
Two potential complications with doing that are:
If the number of files being downloaded is very large, the periodic directory listings could be slow. How big "very large" is depends on the type of file system you're using. You could experiment by creating a directory with the approximate number of files you expect to download, and seeing how long it takes to list. Another option would be to use the gsutil cp -L option, which builds a manifest showing what files have been downloaded. You could then have a loop reading through the manifest, looking for files that have downloaded successfully.
If the multi-file download fails partway through (e.g., due to a network connection that's dropped for longer than gsutil will retry), you'll end up with a partial set of files. For this case you might consider using gsutil rsync, which can be restarted and will pick up where you left off.
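A rough polling sketch of the approach above (the download directory, the polling interval, and the process_file.sh handler are all assumptions, not part of gsutil):
DOWNLOAD_DIR=./files
DONE_LIST=processed.list
touch "$DONE_LIST"
while true; do
    for f in "$DOWNLOAD_DIR"/*; do
        [ -e "$f" ] || continue                   # directory may still be empty
        case "$f" in *.gstmp) continue ;; esac    # skip in-progress downloads
        if ! grep -qxF "$f" "$DONE_LIST"; then
            ./process_file.sh "$f"                # your processing step
            echo "$f" >> "$DONE_LIST"
        fi
    done
    sleep 10
done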
I have a large ~10 GB zip file that was created using the standard Windows method (right click, select "Send to compressed (zipped) folder"). I am able to unzip it just fine on my MacBook.
I'm trying to unzip it on an EC2 machine. I know the file is a zip file because when I run file file.zip it says:
file.zip: Zip archive data, at least v2.0 to extract
Running unzip returns the following error:
Archive: file.zip
warning [file.zip]: 3082769992 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [file.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
Running tar xvf file.zip returns the following:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Archive contains `<{\204\027\333"D\344\210\321o\331' where numeric off_t value expected
tar: Archive contains `S\354\202},F\3546\276[\265\273' where numeric time_t value expected
tar: Archive contains ``3c\254\372$:e' where numeric uid_t value expected
tar: Archive contains `\265\306\025+ÜĞL\352' where numeric gid_t value expected
...etc
Does anyone know what might be going wrong?
Actually, 7-Zip should handle this well; you can install it with:
sudo apt-get install p7zip-full
Then, you can extract your zip file as follows:
7z e file.zip
If your zip archive has 88,000 files and you are dealing with ~10 GB of content, you will need an unzip program that supports the zip64 extension.
You can check if your unzip program supports zip64 like this
$ unzip -v | grep -i zip64
ZIP64_SUPPORT (archives using Zip64 for large files supported)
If it doesn't have ZIP64_SUPPORT, you are out of luck. I suspect your unzip doesn't support zip64.
Alternatives are to get a version of unzip that does support zip64, or to use an alternative program like 7-Zip.
Most probably your file did not finish zipping and you moved it prematurely. At least that was the issue for me.
I was unable to install 7z on my machine due to no sudo access, but I managed to repair the archive using
zip -FF archive.zip --out archive_repaired.zip -fz
and unzip worked on the repaired archive.
I found the solution via this github issue
I am working on an i.MX6 ARM processor-based hardware platform with embedded Linux. I am using the tar command to compress a directory containing 3 files. When I try to decompress the archive to get back the three files, I run into different issues:
I get one or two files after decompressing.
There is a data mismatch between the original files and the decompressed files.
I am using these commands to create and decompress the tar file:
nohup tar -zcpf /home/root/upload_new_U1/unit1-`(date +%Y%m%d_%H-00)`.tar.gz upload_now_U1
tar -zxpf unit1-`(date +%Y%m%d_%H-00)`.tar.gz
Please help.
I was using VirtualBox on my PC (Windows 7) and I managed to view some files in my .VDI file.
How can I open or view the contents of my .vdi file and retrieve the files from there?
I had a corrupted VDI file (according to countless VDI-viewer programs I've used with cryptic errors like invalid handle, no file selected, please format disk) and I was not able to open the file, even with VirtualBox. I tried to convert it using the VirtualBox command line tools, with no success. I tried mounting it to a new virtual machine, tried mounting it with ImDisk, no dice. I read four Microsoft TechNet articles, downloaded their utilities and tried countless things; no success.
However, when I tried 7Zip (https://www.7-zip.org/download.html) I was able to view all of the files, and extract them selectively. Here's how I did it:
Install 7-Zip (make sure that you also install the context-menu items, if prompted).
Right-click on the VDI file and select "Open archive".
When the window appears, right-click on the largest file in the archive (there should be two entries: one is "Basic Microsoft Data Partition" and the other is a small system partition) and click "Open Inside". The file size is listed to the right of each entry, in bytes.
You should see all of the files inside the archive. You can drag the files you'd like to extract right to your desktop, and you can double-click folders to view what's inside them too.
If 7zip gives you a cryptic error after extracting the files, it means that you closed the folder's window that you are copying files to in Windows Explorer.
If you didn't close the window and you're still getting an error, try extracting each sub-folder individually. Also make sure that you have enough local hard drive space to copy the files to, even if you are copying them just to an external disk, as 7zip copies them first to your local disk. If the files are highly compressible, you might be able to get away with using NTFS compression for the AppData/temp folder so that when 7zip extracts the files locally, it'll compress them so that it can copy them over to your other disk.
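If you prefer the command line, roughly the same can be done with p7zip; a sketch (the name of the embedded partition image varies per VDI, so 0.ntfs below is hypothetical):
7z l file.vdi                        # list the partition image(s) 7-Zip sees inside the VDI
7z x file.vdi -opartitions           # extract the embedded partition image(s)
7z l partitions/0.ntfs               # list the files inside the largest partition image
7z x partitions/0.ntfs -orecovered   # extract them to ./recovered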
You can mount partitions from .vdi images using qemu-nbd:
sudo apt install qemu-utils
sudo modprobe nbd
vdi="/path/to/your.vdi" # <<== Edit this
sudo qemu-nbd -c /dev/nbd0 "$vdi"
# view partitions and select the one you want to mount.
# Using parted here, but you can also use cfdisk, fdisk, etc.
sudo parted /dev/nbd0 print
part=nbd0p2 # <<== partition you want to mount
sudo mkdir /mnt/vdi
sudo mount /dev/$part /mnt/vdi
Some users seem to need to add a parameter to the modprobe command. I didn't with Ubuntu 16.04, but if it doesn't work for you, try adding max_part=16 :
sudo modprobe nbd max_part=16
When done:
sudo umount /dev/$part
sudo qemu-nbd --disconnect /dev/nbd0
Try out VMXray.
You can explore your vmdk image right inside your browser. Select the files that you want to extract and extract them to the desired location. Not just vmdk: you can use VMXRay for looking into and extracting files from RAW, QEMU/KVM QCOW2, VirtualBox VDI, and ISO images. ext2, ext3, FAT and NTFS are the currently supported file systems. You can also use this to recover deleted photos from raw dumps of your camera's SD card, for example.
And, do not worry, no data from your files is ever sent over the network. Data never leaves your machine. VMXRay works completely inside your browser.
As a first approach you can simply try any archive viewer to open the .vdi file.
I tried 7-Zip on an Ubuntu MATE .vdi file and it showed the whole Linux file system.
An easy way is to attach the VDI as a second disk in another Virtual Machine.
The drive does not appear immediately; in Windows go to Disk Manager, bring the disk online and assign it a drive letter.
You can use ImDisk to mount a VDI file as a local drive in Windows; follow this VirtualBox forum thread. You can also convert the VDI to VHD and use the default Windows disk manager to mount the VHD (described here).
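For the VDI-to-VHD conversion, something along these lines should work with a recent VirtualBox (older releases call the same command clonehd); file names are illustrative:
VBoxManage clonemedium disk file.vdi file.vhd --format VHD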
Is there a faster way to remove a directory than simply running
rm -r -f *directory*
? I am asking this because our daily cross-platform builds are really huge (e.g. 4 GB per build), so the hard disks on some of the machines are frequently running out of space.
This is namely the case for our AIX and Solaris platforms.
Maybe there are 'special' commands for directory remove on these platforms?
EDIT (moved my own separate answer into the question):
I am generally wondering why 'rm -r -f' is so slow. Doesn't 'rm' just need to modify the '..' or '.' entries to de-allocate the filesystem entries?
something like
mv *directory* /dev/null
would be nice.
For deleting a directory from a filesystem, rm is your fastest option.
On Linux we sometimes do our builds (a few GB) in a ramdisk, and it has a really impressive delete speed :) You could also try different filesystems, but on AIX/Solaris you may not have many options...
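On Linux, a ramdisk build area can be set up with tmpfs; a sketch (size and mount point are assumptions):
sudo mkdir -p /build
sudo mount -t tmpfs -o size=8G tmpfs /build   # contents vanish on unmount or reboot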
If your goal is to have the directory $dir empty now, you can rename it, and delete it later from a background/cron job:
mv "$dir" "$dir.old"
mkdir "$dir"
# later
rm -r -f "$dir.old"
Another trick is to create a separate filesystem for $dir, and when you want to delete it, you just re-create the filesystem. Something like this:
# initialization
mkfs.something /dev/device
mount /dev/device "$dir"
# when you want to delete it:
umount "$dir"
# re-init
mkfs.something /dev/device
mount /dev/device "$dir"
I forgot the source of this trick but it works:
EMPTYDIR=$(mktemp -d)
rsync -r --delete $EMPTYDIR/ dir_to_be_emptied/
On AIX at least, you should be using LVM, the logical volume manager. All our systems bundle all the physical hard drives into a single volume group and then create one big honkin' file system out of that.
That way, you can add physical devices to your machine at will and increase the size of your file system to whatever you need.
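On AIX that generally looks something like this (the volume group, disk, and filesystem names are illustrative; on older releases chfs takes the size in 512-byte blocks rather than +10G):
extendvg buildvg hdisk2        # add the new physical disk to the volume group
chfs -a size=+10G /build       # grow the filesystem by 10 GB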
One other solution I've seen is to allocate a trash directory on each file system and use a combination of mv and a find cron job to tackle the space problem.
Basically, have a cron job that runs every ten minutes and executes:
rm -rf /trash/*
rm -rf /filesys1/trash/*
rm -rf /filesys2/trash/*
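A corresponding crontab entry might look like this (paths are illustrative; the explicit minute list is used because the classic AIX/Solaris cron does not understand */10):
0,10,20,30,40,50 * * * * rm -rf /trash/* /filesys1/trash/* /filesys2/trash/*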
Then, when you want your specific directory on that file system recycled, use something like:
mv /filesys1/overnight /filesys1/trash/overnight
and, within the next ten minutes your disk space will start being recovered. The filesys1/overnight directory will immediately be available for use even before the trashed version has started being deleted.
It's important that the trash directory be on the same filesystem as the directory you want to get rid of, otherwise you have a massive copy/delete operation on your hands rather than a relatively quick move.
rm -r directory works by recursing depth-first down through directory, deleting files, and deleting the directories on the way back up. It has to, since you cannot delete a directory that is not empty.
Long, boring details: each file system object is represented by an inode, and the file system has a flat, file-system-wide array of inodes.[1] If you just deleted the directory without first deleting its children, the children would remain allocated but without any pointers to them. (fsck checks for that kind of thing when it runs, since it represents file system damage.)
[1] That may not be strictly true for every file system out there, and there may be a file system that works the way you describe. It would possibly require something like a garbage collector. However, all the common ones I know of act like fs objects are owned by inodes, and directories are lists of name/inode number pairs.
If rm -rf is slow, perhaps you are using a "sync" option or similar, which is writing to the disk too often. On Linux ext3 with normal options, rm -rf is very quick.
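You can check the mount options of the filesystem holding the build tree like this (the /build path is just an example):
grep ' /build ' /proc/mounts    # look for "sync" among the listed mount options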
One option for fast removal which would work on Linux and presumably also on various Unixen is to use a loop device, something like:
hole temp.img $[5*1024*1024*1024] # create a 5Gb "hole" file
mkfs.ext3 temp.img
mkdir -p mnt-temp
sudo mount temp.img mnt-temp -o loop
The "hole" program is one I wrote myself to create a large empty file using a "hole" rather than allocated blocks on the disk, which is much faster and doesn't use any disk space until you really need it. http://sam.nipl.net/coding/c-examples/hole.c
I just noticed that GNU coreutils contains a similar program "truncate", so if you have that you can use this to create the image:
truncate --size=$[5*1024*1024*1024] temp.img
Now you can use the mounted image under mnt-temp for temporary storage, for your build. When you are done with it, do this to remove it:
sudo umount mnt-temp
rm temp.img
rmdir mnt-temp
I think you will find that removing a single large file is much quicker than removing lots of little files!
If you don't care to compile my "hole.c" program, you can use dd, but this is much slower:
dd if=/dev/zero of=temp.img bs=1024 count=$[5*1024*1024] # create a 5Gb allocated file
I think there is actually nothing faster than "rm -rf", as you quoted, to delete your directories.
To avoid doing it manually over and over, you can set up a daily cron job with a script that recursively deletes all the build directories under your build root directory if they're "old enough", with something like:
find <buildRootDir>/* -prune -mtime +4 -exec rm -rf {} \;
(here -mtime +4 means "anything older than 4 days")
Another way would be to configure your builder (if it allows such things) to overwrite the previous build with the current one.
I was looking into this as well.
I had a dir with 600,000+ files.
rm * would fail, because there are too many entries.
find . -exec rm {} \; was nicer, deleting about 750 files every 5 seconds (I was checking the rm rate from another shell).
So instead I wrote a short script to rm many files at once, which got to about 1,000 files every 5 seconds. The idea is to put as many files into one rm command as you can, to increase the efficiency.
#!/usr/bin/ksh
# Batch file names from 'filelist' into groups of 40 and delete each
# batch with a single rm call, to cut down on per-process overhead.
string=""
count=0
for i in $(cat filelist); do
    string="$string $i"
    count=$((count + 1))
    if [[ $count -eq 40 ]]; then
        rm $string
        string=""
        count=0
    fi
done
# delete whatever is left over from the last partial batch
[[ -n $string ]] && rm $string
On Solaris, this is the fastest way I have found.
find /dir/to/clean -type f|xargs rm
If you have file names with spaces or other odd characters, xargs can split them incorrectly; a more robust (POSIX) form is to let find invoke rm itself:
find /dir/to/clean -type f -exec rm {} +
Use
perl -e 'for(<*>){((stat)[9]<(unlink))}'
Please refer to the link below:
http://www.slashroot.in/which-is-the-fastest-method-to-delete-files-in-linux
I needed to delete 700 GB from dozens of directories on a 1 TB AWS EBS disk (ext3) before copying the remainder to a new 200 GB XFS volume. Deleting was taking hours and leaving that volume at 100% I/O wait. Since disk I/O and server time are not free, I instead hid each directory by mounting an empty volume over it, which took only a fraction of a second per directory. Here /dev/sdb is an empty volume of any size:
directory_to_delete=/ebs/var/tmp/
mount /dev/sdb $directory_to_delete
nohup rsync -avh /ebs/ /ebs2/
I coded a small Java application, RdPro (Recursive Directory Purge tool), which is faster than rm. It can also remove user-specified target directories under a root. It works on both Linux/Unix and Windows, and has both a command-line version and a GUI version.
https://github.com/mhisoft/rdpro
I had to delete more than 300,000 files on Windows. I had Cygwin installed. Luckily I had all the primary directories in a database, so I created a for loop and, based on each line entry, deleted them using rm -rf.
I just use find ./ -delete in the folder I want to empty, and it deleted 620,000 directories (around 100 GB total) in around 10 minutes.
Source: a comment on this site: https://www.slashroot.in/comment/1286#comment-1286