I'm trying to read data from URLs where the CSVs are gzip-compressed (.gz). I currently work with Excel files and CSVs saved on my local machine.
How can I connect to such URLs and extract the data?
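If you are using pandas, it can read a gzip-compressed CSV straight from a URL; the compression is inferred from the `.gz` extension, or you can force it with `compression="gzip"`. A minimal sketch (the URL in the comment is a placeholder, and the demo builds a gzipped CSV in memory to show the round trip):

```python
import gzip
import io

import pandas as pd

# Reading directly from a URL works the same way, e.g.:
#   df = pd.read_csv("https://example.com/data.csv.gz", compression="gzip")
# (that URL is a placeholder, not a real dataset)

# Demo with an in-memory gzipped CSV instead of a remote file:
raw = b"a,b\n1,2\n3,4\n"
buf = io.BytesIO(gzip.compress(raw))
df = pd.read_csv(buf, compression="gzip")
print(df.shape)  # (2, 2)
```

With `compression="infer"` (the default), passing a URL ending in `.csv.gz` needs no extra arguments at all.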
I have a .db extension file in Azure Data Lake.
I want to fetch data from that .db file into Power BI.
Can you please help me with this?
I am able to import the .db file itself, but I want the data that is present inside it.
A .db file is probably (?) a SQLite database file. To import its data into Power BI, use the ODBC connector. For this to work you also need to install a SQLite ODBC driver on your device, which you can get from here: http://www.ch-werner.de/sqliteodbc.
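Since the .db extension is only a guess, you can confirm the file really is SQLite before setting up the ODBC driver: every SQLite database starts with the 16-byte magic header `b"SQLite format 3\x00"`. A small stdlib-only check (the demo creates a throwaway database; the path is illustrative):

```python
import os
import sqlite3
import tempfile

def looks_like_sqlite(path: str) -> bool:
    """Return True if the file starts with the SQLite 3 magic header."""
    with open(path, "rb") as f:
        return f.read(16) == b"SQLite format 3\x00"

# Demo: create a tiny SQLite database and verify the header.
db_path = os.path.join(tempfile.mkdtemp(), "sample.db")
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()
con.close()
print(looks_like_sqlite(db_path))  # True
```

If the check fails, the .db file is some other database format and the SQLite ODBC driver will not help.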
I have a form that allows users to upload Excel files. How can I read such a file in Power Automate and import it into Dataverse automatically?
I have tried this, but it does not work.
We are able to load uncompressed CSV files and gzipped files completely fine.
However, if we want to load CSV files compressed as .zip, what is the best approach?
Will we need to manually convert the zip to gz, or has BigQuery added support to handle this?
Thanks
BigQuery supports loading gzip files.
The limitation is that with gzip compression BigQuery cannot read the data in parallel, so loading compressed CSV data into BigQuery is slower than loading uncompressed data.
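Since BigQuery accepts .gz but not .zip, one workaround is to repack each zip member as its own gzip file before loading. A minimal stdlib-only sketch (the function name and the in-memory demo archive are illustrative):

```python
import gzip
import io
import zipfile

def zip_member_to_gzip_bytes(zip_bytes: bytes, member: str) -> bytes:
    """Extract `member` from a zip archive and return it gzip-compressed."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        data = zf.read(member)
    return gzip.compress(data)

# Demo: build a small zip in memory, repack one member, verify the round trip.
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
gz = zip_member_to_gzip_bytes(src.getvalue(), "data.csv")
restored = gzip.decompress(gz)
```

The resulting .gz bytes can then be uploaded and loaded with a normal gzip CSV load job.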
You can try 42Layers.io for this. We use it to import zipped CSV files directly from FTP into BQ, and then set it on a schedule to run every day. They also let you do field mapping to your existing tables within BQ. Pretty neat.
In my S3 bucket I have an .xls file (it is a grouped file: the first 20 rows contain an image and some extra details about the client).
So first I want to convert the .xls into .csv, then load the Redshift table through the COPY command, ignoring the first 20 rows.
Note: when I manually save the .xls as .csv and then load the Redshift table through the COPY command, it loads successfully. My problem now is how to convert .xls into .csv through Pentaho jobs.
You can convert Excel to CSV with a transformation containing just two steps:
Microsoft Excel input – reads the rows from your Excel file
Text file output – saves the rows from step 1 to a CSV file
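Outside Pentaho, the same two-step conversion (including the "skip the first 20 rows" requirement) can be sketched with pandas, assuming an Excel engine such as openpyxl is installed. The helper name and file paths are illustrative:

```python
import os
import tempfile

import pandas as pd

def excel_to_csv(xls_path: str, csv_path: str, skip_rows: int = 20) -> None:
    """Read an Excel sheet, drop the first `skip_rows` rows, write the rest as CSV."""
    df = pd.read_excel(xls_path, skiprows=skip_rows, header=None)
    df.to_csv(csv_path, index=False, header=False)

# Demo with a generated 25-row workbook (requires an Excel engine, e.g. openpyxl):
tmpdir = tempfile.mkdtemp()
xls_path = os.path.join(tmpdir, "clients.xlsx")
csv_path = os.path.join(tmpdir, "clients.csv")
pd.DataFrame([[i, i * 2] for i in range(25)]).to_excel(
    xls_path, index=False, header=False
)
excel_to_csv(xls_path, csv_path)
result = pd.read_csv(csv_path, header=None)
print(len(result))  # 5 rows remain after skipping the first 20
```

Alternatively, Redshift's COPY command itself supports an IGNOREHEADER option to skip a fixed number of leading rows, so the skipping could also be left to the load step.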
Using Sqoop I've successfully imported a few rows from a table that has a BLOB column. The part-m-00000 file now contains all the records, with the BLOB field, as CSV.
Questions:
1) As per the docs, knowledge of the Sqoop-specific format can help in reading those BLOB records.
So, what does the Sqoop-specific format mean?
2) The BLOB is a .gz file of a text file containing some float data. These .gz files are stored in Oracle as BLOBs and imported into HDFS using Sqoop. How can I get the float data back out of the HDFS file?
Any sample code would be of great use.
I see these options.
Sqoop import from Oracle directly to a Hive table with a binary data type. This option may limit processing outside Hive (MR, Pig, etc.), i.e. you may need to know how the blob gets stored in Hive as binary. This is the same limitation you described in your question 1.
Sqoop import from Oracle to Avro, sequence, or ORC file formats, which can hold binary. You should be able to read this by creating a Hive external table on top of it, and you can write a Hive UDF to decompress the binary data. This option is more flexible, as the data can easily be processed with MR as well, especially with the Avro and sequence file formats.
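For question 2, once you have the raw blob bytes back (for example inside a UDF or a small reader job), recovering the floats is just gunzip-plus-parse. A stdlib-only sketch, assuming the blob is a gzipped text file with whitespace-separated floats (the function name and sample values are illustrative):

```python
import gzip

def floats_from_gzip_blob(blob: bytes) -> list[float]:
    """Decompress a gzipped text blob and parse whitespace-separated floats."""
    text = gzip.decompress(blob).decode("utf-8")
    return [float(tok) for tok in text.split()]

# Demo: simulate the Oracle blob (gzip of a small float text file).
blob = gzip.compress(b"1.5 2.25\n3.75\n")
print(floats_from_gzip_blob(blob))  # [1.5, 2.25, 3.75]
```

The same decompress-and-parse logic is what a Hive UDF over the binary column would perform.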
Hope this helps. How did you end up resolving it?