How can I read data from an .xlsb file in Clojure? I have tried incanter.excel, which uses Apache POI, which doesn't support the binary format. I have seen some examples of reading .xlsb in Java using ODBC, but as far as I understand the JDBC-ODBC bridge was removed in Java 8.
We just added a read-only streaming reader for xlsb files to Apache POI. It will be available in 3.16-beta3.
Related
I could not find anything online.
I'm planning on saving small software data in the file.
I've had a look at clj-exif and exif-processor, but neither seems to return what I need.
You can use Java interop to read/write any type of file. You could use output-stream to write any type of bytes to a file, such as the bytes of a JPEG image.
See also the Clojure Cookbook online for more tips, and this list of Clojure documentation.
I have a subset of the data set called the 'million song dataset', available on the website (http://labrosa.ee.columbia.edu/millionsong/), on which I would like to perform data mining operations in SAS Enterprise Miner (13.2).
The subset I have downloaded contains 10,000 files and they are all in HDF5 format.
How do you convert HDF5 files into a format that is readable by SAS Enterprise Miner (sas7bdat)?
On Windows there is an ODBC driver for HDF5. If you have SAS/ACCESS ODBC then you can use that to read the file.
I don't think it's feasible to do this directly, as hdf5 seems to be a binary file format. You might be able to use another application to convert hdf5 to a plain text format and then write SAS code to import that.
I think some of the other files on this page might be easier to import:
http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
For several days I have been reading a lot about Big Data, and especially about Thrift and HDFS/Hadoop.
I have a great many XML files which I want to store in an HDFS file system (and afterwards compute statistics etc. from the data in these files).
So I would like to serialize my XML files with Thrift (to validate the structure and to make the data durable).
Then, store them in HDFS.
Is it possible (XML => Thrift => HDFS) without using the RPC service?
To test this, I would like to use a Linux VM (for HDFS) and the PHP language (for Thrift).
Thank you.
You can use the serialization part without the RPC part, yes. Look for "serializer" in the Thrift source tree, you should find some examples. If not for PHP, then for sure for some other languages.
You have to do a little work on your own, because there is no such thing as "the" way to convert XML into Thrift structures. The steps are, roughly, as follows:
- define the data structures to hold the XML data as Thrift IDL constructs
- generate the desired code using the Thrift Compiler
- add the serializer code as needed
- put together some code that
  - reads each XML file
  - builds the Thrift structures from it
  - serializes the data and puts them into HDFS
Depending on the layout of your XML data and on the number of XML structures used, this may need some effort. It could be an idea to generate at least the IDL file programmatically by some other tool, maybe even some of the other code needed. Thrift cannot support you with this, although it could be an option - again, depending on your current situation, language and tools available.
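As a rough illustration of the last step only, here is a minimal Python sketch of the read / build / serialize / store loop. Everything in it is an assumption made for illustration: `Record` stands in for a class the Thrift compiler would generate from your IDL, the XML layout (`<item id="..."><title>...</title></item>`) is invented, and `serialize` uses JSON as a placeholder where the Thrift serializer call would go, so the bytes it produces are not Thrift's wire format. The sink is any binary file-like object; with HDFS it would be a stream opened through whatever HDFS client you use.

```python
import json
import struct
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a class generated by the Thrift compiler."""
    id: int
    title: str


def build_record(xml_text):
    """Build a Record from one XML document (assumed layout, see lead-in)."""
    root = ET.fromstring(xml_text)
    return Record(id=int(root.get("id")), title=root.findtext("title", default=""))


def serialize(record):
    """Placeholder for the Thrift serializer: JSON bytes, NOT Thrift's format."""
    return json.dumps(asdict(record)).encode("utf-8")


def store(records, sink):
    """Write each record to a binary sink as a 4-byte big-endian length + payload."""
    for record in records:
        payload = serialize(record)
        sink.write(struct.pack(">I", len(payload)))
        sink.write(payload)
```

The length prefix is one simple way to keep many small records in a single file, which tends to suit HDFS better than one file per record; it is a design choice of this sketch, not something Thrift prescribes.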
I'd like to read the contents of a .csv file from a website into a C++ program. Specifically, it is financial data in the format provided by Google Finance.
http://www.google.com/finance/historical?cid=22144&startdate=Nov+1%2C+2011&enddate=Nov+14%2C+2011
(If you append "&output=csv" to the above link it will download the data as a csv file)
I know that I can use something like libcurl to download the file and then read it in from there, but I wanted to read it directly into the program without having to write it to a file first.
Can I get some suggestions on the best way to do this? I was thinking boost.asio but I have no experience with it (or network programming in general).
If you are trying to download it from a web resource you will need to implement at least some part of the HTTP protocol. libcurl will do this for you.
You don't need to save it as a file. This example will show you how to download and store it in a memory buffer.
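The idea of keeping the download in memory can be sketched as follows. This is Python rather than C++, and it uses the standard library's `urllib` instead of libcurl, so treat it only as an illustration of the pattern: the response body is read into a bytes buffer and parsed from there, with no temporary file. The function names are made up for this sketch. In libcurl the corresponding mechanism is a write callback (set with `CURLOPT_WRITEFUNCTION`) that appends each received chunk to a buffer you own.

```python
import csv
import io
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download the response body into memory; nothing is written to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_csv(data, encoding="utf-8-sig"):
    """Parse CSV bytes held in memory into a list of rows (lists of strings).

    "utf-8-sig" also accepts input that starts with a byte-order mark.
    """
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


# Usage (needs network access; the URL would be the one from the question
# with "&output=csv" appended, assuming the service still answers):
# rows = parse_csv(fetch_bytes(url))
```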
I'm using wxWidgets to write cross-platform applications. In one of the applications I need to be able to load data from Microsoft Excel (.xls) files, but I need this to work on Linux as well, so I assume I cannot use OLE or whatever technology is available on Windows.
I see that there are many open source programs that can read excel files (OpenOffice, KOffice, etc.), so I wonder if there is some library that I could use?
The Excel files it needs to support are very simple: straight tabular data. I don't need to extract any formatting except column/row position and the data itself.
Suggested reference: What is a simple and reliable C library for working with Excel files?
I came across other libraries (chicago on sf.net, xlsLib) but they seem to be outdated.
I can say that I know of a wxWidgets application that reads Excel .xls and .xlsx files on any platform. For the .xlsx files we used an XML parser and a zip stream reader to grab the data we need, which was pretty easy to get going. For the .xls files we used ExcelFormat, which works well, and we found the author to be very generous with his support.
Maybe just some encouragement to give it a go? It was a couple of days' work to get working.
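To show why the .xlsx half is approachable: an .xlsx file is a zip archive of XML parts, so a zip reader plus an XML parser is enough for plain tabular data. Below is a minimal sketch in Python (not C++, and not the code of the application mentioned above). It assumes the first worksheet lives at `xl/worksheets/sheet1.xml`, which is common but not guaranteed; a robust reader would resolve the path through the workbook's relationship parts. It also skips inline strings and ignores formatting.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(archive):
    """Return the shared string table (empty list if the part is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> is one string; its text may be split over several <t> elements.
    return ["".join(t.text or "" for t in si.iter("{%s}t" % NS["m"]))
            for si in root.findall("m:si", NS)]


def read_cells(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Map cell references such as 'A1' to their values as strings."""
    with zipfile.ZipFile(xlsx) as archive:
        strings = read_shared_strings(archive)
        sheet = ET.fromstring(archive.read(sheet_part))
    cells = {}
    for c in sheet.iterfind(".//m:c", NS):
        v = c.find("m:v", NS)
        if v is None or v.text is None:
            continue  # empty cell, or an inline string (not handled here)
        if c.get("t") == "s":
            # Shared string: <v> holds an index into the string table.
            cells[c.get("r")] = strings[int(v.text)]
        else:
            cells[c.get("r")] = v.text
    return cells
```

`read_cells` accepts a file path or any seekable binary file object, and the cell reference in each key gives you the column/row position.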
Maybe http://www.libxl.com/ can help?
I don't think this is easy to do. xls files are quite complex and it is a proprietary format.
Maybe this is a stupid idea, but why don't you upload your document to Google Docs and access it there? There are some APIs for accessing your documents.
Two potential problems:
- Your app needs internet access
- Currently there is no C++ API.
But there are APIs for several languages, including Python; see http://code.google.com/intl/fr/apis/gdata/articles/python_client_lib.html