How can I compress a .RData file which is a large data frame? - compression

Hi everybody!
I have a 50 MB .RData file containing one large data frame. Can I compress it to 20-25 MB somehow? I tried .zip and .gz, but the former achieved almost nothing and the latter actually produced a bigger file.
Also, what's the most space-efficient way to export the workspace in RStudio? Again, using .RData generates a massive file, and I wonder if I can make it smaller.
Does anyone with more experience have advice? Thank you very much!
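One note that may explain those results: save() writes .RData gzip-compressed by default, so zipping or gzipping the file again gains almost nothing. Within R, save(df, file = "data.RData", compress = "xz") usually helps a lot (likewise saveRDS() with compress = "xz" for a single object, and save.image(compress = "xz") for the whole workspace). If re-running R is inconvenient, a default .RData can also be re-wrapped outside R; a minimal Python sketch, assuming the file really is the default gzip variant (file names are placeholders, and verify the result still load()s before deleting the original):

import gzip, lzma

# A default .RData is a gzip stream; unwrap it to get the raw
# R serialization, then re-wrap the same payload in xz, which
# recent R versions auto-detect when load()ing.
with gzip.open("data.RData", "rb") as f:
    payload = f.read()

with lzma.open("data_small.RData", "wb", preset=9) as f:
    f.write(payload)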

Related

Best way to store multiple (more than 100000) binary files in Django

I need to operate on about 100,000 small (5 KB) sound files. I am thinking of using FileField and saving them on disk. But I am in doubt, because there will be a folder with 100,000 files, and I am sure that can seriously hurt performance. What can you advise?
I don't think there will be a performance issue, because FileField only stores the path of the file you have, so retrieving a file is an O(1) lookup by following that path.
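One common precaution, though: while the database lookup is cheap, a single directory with 100,000 entries can still slow down some filesystems and admin tooling, so uploads are often sharded into subdirectories. A minimal sketch, with the SoundClip model name and "sounds/" prefix as illustrative assumptions:

import uuid
from django.db import models

def shard_upload_path(instance, filename):
    # Spread files across 256 subdirectories keyed on a UUID prefix,
    # so no single directory accumulates all 100,000 entries.
    name = uuid.uuid4().hex
    return "sounds/{}/{}_{}".format(name[:2], name, filename)

class SoundClip(models.Model):
    audio = models.FileField(upload_to=shard_upload_path)

Because upload_to accepts a callable taking (instance, filename), the sharding is invisible to the rest of the code; the database row still stores just the (now nested) relative path.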

Why isn't lossless compression automatic on computers?

I was just wondering what the impact would be if, say, Microsoft decided to automatically apply "lossless" compression to every single file saved on a computer.
What are the pros? The cons? Is it feasible?
Speed.
When compressing a file of any kind you are encoding its contents in a more compact form, often using dictionaries and/or prefix codes (for example, Huffman coding). To access the data you have to decompress it, and that costs time and memory: to reach a specific piece of the file you generally have to decompress everything before it, and while decompressing you have to store the result somewhere, and the most appropriate place is RAM.
Of course decompressing the whole file wouldn't be a great problem if all of it needed to be read anyway, nor in the case of a stream reading it; but if a program wanted to write into the compressed file, all the data (or at least a part of it) would have to be compressed again.
As you can see, compressing files in the filesystem would greatly reduce the bandwidth available to applications (to read a single byte you have to read a chunk of the file and decompress it) and would also require more RAM.
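To make the random-access cost concrete, here is a small Python sketch (file name and sizes are arbitrary): gzip has no index, so seeking near the end of the stream forces the reader to decompress and discard everything before the target offset.

import gzip, os, time

payload = os.urandom(20 * 1024 * 1024)   # 20 MB of test data
with gzip.open("blob.gz", "wb") as f:
    f.write(payload)

with gzip.open("blob.gz", "rb") as f:
    t0 = time.perf_counter()
    f.seek(len(payload) - 1)   # decompress-and-discard everything before
    f.read(1)                  # ...just to read this one byte
    print("reading one byte took %.3f s" % (time.perf_counter() - t0))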

Library for extracting zip on the fly

I have a rather large ZIP file which gets downloaded (I cannot change the file). The task now is to unzip the file while it is downloading, instead of having to wait until the end-of-central-directory record is received.
Does such a library exist?
I wrote "pinch" a while back. It's in Objective-C but the method to decode files from a zip might be a way to get it in C++? Yeah, some coding will be necessary.
http://forrst.com/posts/Now_in_ObjC_Pinch_Retrieve_a_file_from_inside-I54
https://github.com/epatel/pinch-objc
I'm not sure such a library exists. Unless you are on a very fast line [or have a very slow processor], it's unlikely to save you a huge amount of time. Decompressing several gigabytes takes only a few seconds if all the data is in RAM [it may then take a while to write the uncompressed data to disk, and loading it from disk may add to the total time].
However, assuming the sending end supports HTTP "range" requests, you could possibly write something that downloads the directory first [by reading the fixed-size end-of-central-directory record at the end of the file, then the central directory it points to, and then downloading the rest of the file from start to finish]. Presumably that's how "pinch", linked in epatel's answer, works.
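The idea is straightforward to sketch without any library. The following Python sketch is not pinch itself, just the approach it presumably uses, and it deliberately ignores ZIP64, archive comments, encryption, and data descriptors:

import struct, urllib.request, zlib

def fetch(url, start, end):
    # Inclusive byte range, per the HTTP Range header.
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    return urllib.request.urlopen(req).read()

def total_size(url):
    req = urllib.request.Request(url, method="HEAD")
    return int(urllib.request.urlopen(req).headers["Content-Length"])

def extract_member(url, wanted):
    size = total_size(url)
    # 1. End-of-central-directory record: 22 fixed bytes at the very end
    #    (assuming no trailing archive comment).
    eocd = fetch(url, size - 22, size - 1)
    assert eocd[:4] == b"PK\x05\x06", "EOCD not found (archive comment?)"
    cd_size, cd_offset = struct.unpack("<II", eocd[12:20])
    # 2. Walk the central directory: 46 fixed bytes per entry plus
    #    variable-length name/extra/comment fields.
    cd = fetch(url, cd_offset, cd_offset + cd_size - 1)
    pos = 0
    while pos < len(cd):
        method, = struct.unpack("<H", cd[pos + 10:pos + 12])
        csize, = struct.unpack("<I", cd[pos + 20:pos + 24])
        nlen, xlen, clen = struct.unpack("<HHH", cd[pos + 28:pos + 34])
        lho, = struct.unpack("<I", cd[pos + 42:pos + 46])
        name = cd[pos + 46:pos + 46 + nlen].decode("utf-8", "replace")
        pos += 46 + nlen + xlen + clen
        if name == wanted:
            break
    else:
        raise KeyError(wanted)
    # 3. The local file header (30 fixed bytes) precedes the member's data.
    lh = fetch(url, lho, lho + 29)
    n2, x2 = struct.unpack("<HH", lh[26:30])
    start = lho + 30 + n2 + x2
    data = fetch(url, start, start + csize - 1)
    # Method 8 is deflate (raw stream, hence the -15 window bits); 0 is stored.
    return zlib.decompress(data, -15) if method == 8 else data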

Compressing KML files

Is there any way to compress a KML file?
My KML files are created once per night and are always named the same, but the biggest file takes up almost 10 MB, which takes about 10 seconds to download even on a computer with high-speed internet. I would like this file to be smaller, or, if there is some other way to minimize the file content without losing any features, I would like to know. :)
I'm using OpenLayers to view the map and load the KML files.
The first thing to try is gzip compression; check out this article: https://developers.google.com/speed/articles/gzip
KMZ files are zipped KML files.
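If you generate the files yourself each night, producing the compressed variant is a few lines of Python; a sketch with placeholder file names (it is worth checking whether your OpenLayers setup reads KMZ directly, or whether you should rely on server-side gzip instead):

import gzip, shutil, zipfile

# KMZ: a zip archive whose main entry is conventionally named doc.kml.
with zipfile.ZipFile("map.kmz", "w", zipfile.ZIP_DEFLATED) as z:
    z.write("doc.kml", arcname="doc.kml")

# Plain gzip: for servers that send the file with Content-Encoding: gzip.
with open("doc.kml", "rb") as src, gzip.open("doc.kml.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

Since KML is verbose XML, either route typically shrinks a 10 MB file dramatically; many web servers can also gzip on the fly, in which case nothing needs to change in the nightly job.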

FSCTL_GET_RETRIEVAL_POINTERS fails for small files

I am using FSCTL_GET_RETRIEVAL_POINTERS to obtain a file's physical offsets (sectors).
The problem I am facing is that I am not able to get the sectors of files whose size is 1 KB or less.
I know that the data of files of 1 KB or less is stored in the MFT itself.
Can someone help me obtain the sectors of such files?
Any kind of help will be appreciated; it would be even better if someone could provide sample code.
Thanks in advance.
You need to parse the $MFT file to retrieve the physical location: for files this small the data is resident, stored directly in the $DATA attribute (type 0x80) of the file's MFT record rather than in separate clusters. You can use WinHex to open the raw disk and view the file records of these small files.
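Before diving into full $MFT parsing, it can help to detect the resident case programmatically: when a file has no allocated extents, FSCTL_GET_RETRIEVAL_POINTERS fails with ERROR_HANDLE_EOF, and the file's MFT record number (the low 48 bits of the file reference number) tells you which record to inspect. A rough, untested Python/ctypes sketch, with the path and buffer size as placeholders:

import ctypes
from ctypes import wintypes

FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073
ERROR_HANDLE_EOF = 38   # expected failure for resident (MFT-stored) files

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE

class BY_HANDLE_FILE_INFORMATION(ctypes.Structure):
    _fields_ = [("dwFileAttributes", wintypes.DWORD),
                ("ftCreationTime", wintypes.FILETIME),
                ("ftLastAccessTime", wintypes.FILETIME),
                ("ftLastWriteTime", wintypes.FILETIME),
                ("dwVolumeSerialNumber", wintypes.DWORD),
                ("nFileSizeHigh", wintypes.DWORD),
                ("nFileSizeLow", wintypes.DWORD),
                ("nNumberOfLinks", wintypes.DWORD),
                ("nFileIndexHigh", wintypes.DWORD),
                ("nFileIndexLow", wintypes.DWORD)]

# GENERIC_READ, FILE_SHARE_READ, OPEN_EXISTING
h = kernel32.CreateFileW("C:\\tiny.txt", 0x80000000, 1, None, 3, 0, None)

vcn = ctypes.c_longlong(0)               # STARTING_VCN_INPUT_BUFFER
out = ctypes.create_string_buffer(4096)  # RETRIEVAL_POINTERS_BUFFER
ret = wintypes.DWORD(0)
ok = kernel32.DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS,
                              ctypes.byref(vcn), 8,
                              out, len(out), ctypes.byref(ret), None)

if not ok and ctypes.get_last_error() == ERROR_HANDLE_EOF:
    # No extents: the data is resident. Find which MFT record holds it.
    info = BY_HANDLE_FILE_INFORMATION()
    kernel32.GetFileInformationByHandle(h, ctypes.byref(info))
    frn = (info.nFileIndexHigh << 32) | info.nFileIndexLow
    print("resident file, MFT record #%d" % (frn & 0xFFFFFFFFFFFF))

kernel32.CloseHandle(h)

From there, reading record #N out of $MFT (and decoding the 0x80 attribute) requires raw volume access and administrator rights, which is what the WinHex suggestion above lets you explore interactively first.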