Is there any way to compress a KML file?
My KML files are created once per night and are always named the same. The biggest file is almost 10 MB, which takes about 10 seconds to download even on a computer with a high-speed connection. I would like this file to be smaller, or, if there are other ways to reduce the file content without losing any features, I would like to know about them :)
I'm using OpenLayers to view the map and load the KML files.
The first thing to try is gzip compression; check out https://developers.google.com/speed/articles/gzip
KMZ files are zipped KML files.
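If configuring gzip on the web server isn't an option, the nightly job could write a KMZ instead; a KMZ is just a ZIP archive whose main document is conventionally named doc.kml. A minimal Python sketch (the file names are placeholders):

```python
import zipfile

# Package the nightly KML as a compressed KMZ ("nightly.kml" and
# "nightly.kmz" are placeholder names).
with zipfile.ZipFile("nightly.kmz", "w", zipfile.ZIP_DEFLATED) as kmz:
    kmz.write("nightly.kml", arcname="doc.kml")
```

The client does have to understand KMZ, though; as far as I know OpenLayers parses plain KML, so transparent server-side gzip (which the browser decompresses automatically) may be the simpler route.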
Is it possible to concatenate 1000 CSV files that each have a header into one file with no duplicated header, directly in Google Cloud Storage? I could easily do this by downloading the files to my local hard drive, but I would prefer to do it natively in Cloud Storage.
They all have the same columns, and each has a header row.
I wrote an article on handling CSV files with BigQuery. To avoid ending up with several files, and if the volume is less than 1 GB, the recommended approach is the following:
Create a temporary table in BigQuery with all your CSVs.
Use the Export API (not the export function).
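A rough sketch of those two steps with the google-cloud-bigquery Python client (the project, dataset, table, and bucket names are placeholders); exporting to a single destination URI without a wildcard produces one file as long as the result stays under the 1 GB single-file export limit:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.tmp_dataset.merged_csv"  # placeholder temporary table

# 1. Load every CSV into one temporary table; skip_leading_rows drops the
#    header row of each input file.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
client.load_table_from_uri(
    "gs://my-bucket/input/*.csv", table_id, job_config=load_config
).result()

# 2. Extract the table to a single object. The destination has no wildcard,
#    so the export must fit in one file (under the 1 GB limit).
client.extract_table(table_id, "gs://my-bucket/output/merged.csv").result()
```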
Let me know if you need more guidance.
The problem with most solutions is that you still end up with a large number of split files from which you then have to strip the headers, join them, and so on.
Any method of avoiding multiple files also tends to require quite a lot of extra work.
It gets to be quite a hassle, especially when BigQuery spits out 3500 split gzipped CSV files.
I needed a simple method for this that could be automated from a batch file.
So I wrote CSV Merge (sorry, Windows only) to solve exactly this problem.
https://github.com/tcwicks/DataUtilities
Download the latest release, unzip it, and use it.
I also wrote an article with scenario and usage examples:
https://medium.com/#TCWicks/merge-multiple-csv-flat-files-exported-from-bigquery-redshift-etc-d10aa0a36826
Hope it is of use to someone.
P.S. I recommend tab-delimited over CSV, as it tends to have fewer data issues.
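Independent of that tool, the strip-headers-and-concatenate step itself is small once the split gzipped exports are on local disk; a hedged Python sketch (directory and file names are placeholders):

```python
import glob
import gzip

# Merge BigQuery's split gzipped CSV exports into one file, keeping only
# the first file's header. Paths are placeholders.
parts = sorted(glob.glob("exports/part-*.csv.gz"))

with open("merged.csv", "wb") as out:
    for i, part in enumerate(parts):
        with gzip.open(part, "rb") as f:
            header = f.readline()
            if i == 0:
                out.write(header)   # keep the header once
            out.write(f.read())     # append the remaining rows
```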
I am facing an issue in Spark while reading data: the input has a huge number of partitions and I am getting a 503 Slow Down error from S3.
After checking with the AWS team, they mentioned this happens while reading the files because the request rate is too high.
One of the solutions they suggested is to combine the small files into bigger ones so we can reduce the number of files. Does anyone know how to merge small files in S3 into bigger files? Is there any utility available for doing this?
Please note that I am not referring to small part files under one partition. Say I have Created_date as the level 1 partition and VIN as the level 2 partition. I have one part file under each VIN, but there are far too many VIN partitions. So I am exploring whether we can merge the part files of several VINs in S3 into a generic CSV to handle this S3 slow down issue.
Your answers are much appreciated!
First off, I'm not familiar with "Spark".
Combining files in S3 is not possible; S3 is just a place to put your files as-is. I think what AWS support is telling you is that you can reduce the number of calls you make simply by having fewer files. So it's up to you, BEFORE you upload your files to S3, to make them bigger (combine them), either by placing more data into each file or by creating a tarball/zip.
You can get similar if not better speeds, plus save on your request limit, by downloading one 100 MB file instead of 100 1 MB files. You can then also start taking advantage of S3's multipart upload/download feature.
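As a concrete starting point for the combining itself, one option is to read the small part files under a prefix and re-upload them as one larger object. A rough boto3 sketch (the bucket, prefixes, and the assumption that each part file carries a CSV header are stand-ins for your actual layout):

```python
import io
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"                              # placeholder bucket
SRC_PREFIX = "data/Created_date=2020-01-01/"      # placeholder partition prefix
DEST_KEY = "merged/Created_date=2020-01-01.csv"   # placeholder output key

merged = io.BytesIO()
header_written = False

# Concatenate every part file under the prefix, keeping a single header.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=SRC_PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        header, _, rest = body.partition(b"\n")
        if not header_written:
            merged.write(header + b"\n")
            header_written = True
        merged.write(rest)

# For outputs too large to buffer in memory, switch this to a multipart upload.
s3.put_object(Bucket=BUCKET, Key=DEST_KEY, Body=merged.getvalue())
```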
I am trying to download some files via mechanize. Files smaller than 1 GB are downloaded without any trouble. However, if a file is bigger than 1 GB the script runs out of memory:
mechanize_response.py raises the out-of-memory error at the following line:
self.__cache.write(self.wrapped.read())
__cache is a cStringIO.StringIO; it seems that it cannot handle more than 1 GB.
How do I download files larger than 1 GB?
Thanks
It sounds like you are trying to download the file into memory, but you don't have enough. Try using the retrieve method with a file name to stream the downloaded file to disk.
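Something along these lines (the URL and file name are placeholders); if I remember right, retrieve mirrors urlretrieve and writes the body straight to the given path instead of caching it:

```python
import mechanize

br = mechanize.Browser()

# retrieve() streams the response to disk rather than holding it in a
# cStringIO buffer; the URL and output name are placeholders.
filename, headers = br.retrieve(
    "http://example.com/huge_file.bin", filename="huge_file.bin"
)
print("saved to", filename)
```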
I finally figured out a workaround.
Instead of browser.retrieve or browser.open, I used mechanize.urlopen, which returns a urllib2-style response handle. This allowed me to download files larger than 1 GB.
I am still interested in figuring out how to make retrieve work for files larger than 1 GB.
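For reference, the workaround looks roughly like this (URL, output name, and chunk size are placeholders): copy the response to disk in chunks so the whole body never has to sit in memory.

```python
import mechanize

response = mechanize.urlopen("http://example.com/huge_file.bin")
with open("huge_file.bin", "wb") as out:
    while True:
        chunk = response.read(1024 * 1024)  # read 1 MB at a time
        if not chunk:
            break
        out.write(chunk)
response.close()
```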
I have a rather large ZIP file which gets downloaded (I cannot change the file). The goal now is to unzip the file while it is downloading, instead of having to wait until the end-of-central-directory record is received.
Does such a library exist?
I wrote "pinch" a while back. It's in Objective-C but the method to decode files from a zip might be a way to get it in C++? Yeah, some coding will be necessary.
http://forrst.com/posts/Now_in_ObjC_Pinch_Retrieve_a_file_from_inside-I54
https://github.com/epatel/pinch-objc
I'm not sure such a library exists. Unless you are on a very fast line [or have a very slow processor], it's unlikely to save you a huge amount of time. Decompressing several gigabytes only takes a few seconds if all the data is in RAM [it may then take a while to write the uncompressed data to disk, and loading it from disk may add to the total time].
However, assuming the sending end supports "range" downloading, you could possibly write something that downloads the directory first [by reading the fixed header first, then reading the directory, and then downloading the rest of the file from start to finish]. Presumably that's how "pinch", linked in epatel's answer, works.
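To make the range idea concrete, here is a small Python sketch of the first step (the same logic ports to C++ with any HTTP client that supports Range requests; the URL is a placeholder and there is no error handling): fetch the tail of the archive, locate the end-of-central-directory record, and read the central directory's size and offset from it.

```python
import struct
import requests

URL = "http://example.com/archive.zip"  # placeholder

# Grab the last 64 KB, which normally contains the end-of-central-
# directory (EOCD) record of the ZIP.
tail = requests.get(URL, headers={"Range": "bytes=-65536"}).content

# Locate the EOCD signature (PK\x05\x06); the central directory size and
# offset are stored 12 bytes into the record.
eocd = tail.rfind(b"PK\x05\x06")
cd_size, cd_offset = struct.unpack("<II", tail[eocd + 12:eocd + 20])
print("central directory: %d bytes at offset %d" % (cd_size, cd_offset))

# With the directory entries in hand, each file's compressed data can be
# fetched with further Range requests and inflated individually.
```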
In Django I'm looking for a way to serve several different files at once. I can't use static archives (.zip, .tar, etc.) because I don't have enough storage to cache these files, and it would take far too long to generate them on the fly (each could be hundreds of megabytes).
Is there a way I can indicate to the browser that several files are coming its way? Perhaps there is a container format that I can announce before streaming the files to the user?
Edit: There could be hundreds of files in each package, so asking the user to download each one individually would be very time-consuming.
Ah, the .tar file format can be streamed. I'll experiment with this for now.
http://docs.python.org/library/tarfile.html
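For the record, a rough sketch of that idea with Django's StreamingHttpResponse and tarfile's non-seekable "w|" mode (the view, file list, and buffer class are made up for illustration; note that each file still passes through memory once while it is being added, so very large individual files may need a more incremental writer):

```python
import os
import tarfile

from django.http import StreamingHttpResponse


class TarBuffer(object):
    """Minimal file-like object that hands written bytes to a generator."""
    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(data)

    def pop(self):
        chunks, self._chunks = self._chunks, []
        return b"".join(chunks)


def tar_stream(paths):
    buf = TarBuffer()
    # "w|" writes an uncompressed tar to a non-seekable stream.
    tar = tarfile.open(mode="w|", fileobj=buf)
    for path in paths:
        tar.add(path, arcname=os.path.basename(path))
        yield buf.pop()
    tar.close()
    yield buf.pop()  # trailing tar blocks written by close()


def download_package(request):
    paths = ["/data/file1.bin", "/data/file2.bin"]  # hypothetical file list
    response = StreamingHttpResponse(
        tar_stream(paths), content_type="application/x-tar"
    )
    response["Content-Disposition"] = 'attachment; filename="package.tar"'
    return response
```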