Import mainframe files in Sqoop - packed decimal conversion - HDFS

I am trying to pull a few mainframe datasets into HDFS. There is an option in Sqoop that supports mainframe connectivity. The problem I have is that a few mainframe files contain packed decimal (COMP-3) and binary (COMP) fields.
My questions are:
Sqoop handles converting EBCDIC to ASCII using the mainframe plugin. However, does it support conversion of packed decimal fields by default?
If not, how do I get this done and load the data into HDFS? Are there any open-source utilities for this? Suggestions would help.
Is it possible to pass the metadata (copybook) of the mainframe file through the sqoop command?
Appreciate your help!!
Thanks,
Vinoth

No, Sqoop does not convert packed decimal (COMP-3) fields by default.
I haven't tested it, but this looks promising: http://rbheemana.github.io/Cobol-to-Hive/
In order to do that, the copybook has to be visible via the mainframe's FTP.
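If you end up unpacking the COMP-3 bytes yourself (for example, after pulling the dataset as a binary transfer), the decoding itself is simple. Here's a minimal sketch in Python, assuming you already know each field's offset, length, and implied decimal scale from the copybook; the function name and sample bytes are just for illustration:

def unpack_comp3(raw: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits, except the last byte, whose low
    nibble is the sign: 0xC/0xF = positive, 0xD = negative. `scale` is
    the number of implied decimal places from the copybook PIC clause.
    """
    digits, sign = [], 1
    for i, byte in enumerate(raw):
        high, low = byte >> 4, byte & 0x0F
        digits.append(high)
        if i < len(raw) - 1:
            digits.append(low)
        else:
            sign = -1 if low == 0x0D else 1
    value = 0
    for d in digits:
        if d > 9:
            raise ValueError("invalid BCD digit in COMP-3 field")
        value = value * 10 + d
    return sign * (value / 10 ** scale) if scale else sign * value

# Example: bytes 0x12 0x34 0x5C with 2 implied decimals -> 123.45
print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))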

Related

Why is django-dbbackup in .psql.bin format? Can I decode it?

I just installed django-dbbackup. All is working as per the doc (linked).
One thing slightly puzzles me: why does it dump into a binary format which I don't know how to read (.psql.bin)? Is there a Postgres command to de-bin it?
I found by Googling that it's possible to get a text dump by adding this to settings.py:
DBBACKUP_CONNECTOR_MAPPING = {
    'django.db.backends.postgresql': 'dbbackup.db.postgresql.PgDumpConnector',
}
The output is about 4x bigger, but after gzipping the file it's about 0.7x the size of the binary, and after bzip2, about 0.5x.
However, this appears to be undocumented, and I don't like relying on undocumented behaviour for backups! (The same reason I want to be able to look at the file :-)
Why does it dump into a binary format which I don't know how to read? (.psql.bin).
You'll get a .psql.bin when using PgDumpBinaryConnector, which is the default for Postgres databases.
Is there a Postgres command to de-bin it?
The magic difference between PgDumpConnector and PgDumpBinaryConnector is that the latter passes --format=custom to pg_dump, which is documented as (emphasis mine):
Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.
IOW, I don't think there's an off-the-shelf de-binning command for it other than pg_restore-ing and pg_dump-ing back out as regular SQL, because you're not supposed to read it if you're not PostgreSQL.
Of course, the format is de-facto documented in the source for pg_dump...
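That said, if you just want an inspectable copy, pg_restore with --file and no target database rewrites a custom-format archive as a plain-SQL script instead of restoring it. A minimal sketch wrapping the command from Python (the file names are placeholders):

import subprocess

# pg_restore with --file and no --dbname converts a custom-format
# archive into a plain-SQL script rather than restoring it.
subprocess.run(
    ["pg_restore", "--file=backup.sql", "backup.psql.bin"],
    check=True,
)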

While writing records in a flat file using Informatica ETL job, greek characters are coming as boxes

While writing records to a flat file using an Informatica ETL job, Greek characters are coming out as boxes. We can see the original characters in the database. At the session level, we are using UTF-8 encoding. We have a multi-language application and need to process Chinese, Russian, Greek, Polish, Japanese, etc. characters. Please suggest.
Try changing your code page encoding. I also faced this kind of issue. We were using ANSI encoding, so we created a separate Integration Service with a different encoding and the file ran successfully.
There is an easier option. In the session properties, select the target flat file and click Set File Properties. There you can change the code page and choose UTF-8. By default it is ANSI, which is why you are facing this issue.

Possible to use Parquet files and Text (csv) files as input to same M/R Job?

I tried researching this but found no useful information. I have an M/R job that already reads from Parquet (not partitioned, using a Thrift schema). I need to add another set of input files to the process that are not in Parquet format; they're just regular CSV files.
Anyone know if this is possible or how it could be done?
Never mind, I think I found what I needed in another post unrelated to Parquet:
Using multiple InputFormat classes while configuring MapReduce job
Here is the information I took from the answer I linked to and adapted to my own solution:
MultipleInputs.addInputPath(job, new Path("/path/to/parquet"), ParquetInputFormat.class, ParquetMapper.class);
MultipleInputs.addInputPath(job, new Path("/path/to/txt"), TextInputFormat.class, TextMapper.class);

Export MicroStrategy grid data in text format to an FTP server

Can anybody please let me know whether it is possible to export MicroStrategy grid data in text format to an FTP server (the required access will be provided)? If not directly, can we use some kind of Java coding/web services to achieve this? I don't need the full process, just to understand whether this can be achieved or not.
Thanks in Advance!
You can retrieve report results (and build a new report from scratch, at that) via the SDK, and from there you can process the data to your liking, i.e. transform it and upload it to an FTP server.
Possibly easier would be to create a file subscription and store the file in a specific directory, from which you automatically pick it up and deliver it to your FTP server.
There might be other solutions as well, but yes is the answer to the yes/no part of your question.
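The "pick it up and deliver it" step in the second suggestion doesn't need the SDK at all. A minimal sketch using Python's standard library, assuming the subscription drops the grid export as text files in a known directory (the paths and credentials are placeholders):

from ftplib import FTP
from pathlib import Path

# Placeholder locations and credentials -- adjust for your environment.
EXPORT_DIR = Path("/exports/microstrategy")
FTP_HOST, FTP_USER, FTP_PASS = "ftp.example.com", "user", "secret"

with FTP(FTP_HOST) as ftp:
    ftp.login(FTP_USER, FTP_PASS)
    for report in EXPORT_DIR.glob("*.txt"):
        # STOR uploads each export under the same name on the server.
        with report.open("rb") as fh:
            ftp.storbinary(f"STOR {report.name}", fh)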

How do you convert HDF5 files into a format that is readable by SAS Enterprise Miner (sas7bdat)?

I have a subset of the 'Million Song Dataset', available on the website (http://labrosa.ee.columbia.edu/millionsong/), on which I would like to perform data mining operations in SAS Enterprise Miner (13.2).
The subset I have downloaded contains 10,000 files, and they are all in HDF5 format.
How do you convert HDF5 files into a format that is readable by SAS Enterprise Miner (sas7bdat)?
On Windows there is an ODBC driver for HDF5. If you have SAS/ACCESS to ODBC, then you can use that to read the files.
I don't think it's feasible to do this directly, as HDF5 is a binary file format. You might be able to use another application to convert HDF5 to a plain text format and then write SAS code to import that.
I think some of the other files on this page might be easier to import:
http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
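Following the second suggestion above (convert to plain text first, then import in SAS), here is a minimal sketch in Python using h5py and pandas. The internal dataset path ('/metadata/songs') is an assumption about the Million Song file layout; check it with an HDF5 browser first:

import h5py
import pandas as pd

# Placeholder file names -- one Million Song .h5 file in, one CSV out.
IN_FILE, OUT_FILE = "track.h5", "track.csv"

with h5py.File(IN_FILE, "r") as f:
    # Compound (table-like) datasets read out as a NumPy structured
    # array, which converts cleanly to a DataFrame. The internal path
    # below is an assumption and depends on the file's layout.
    table = f["/metadata/songs"][()]
    df = pd.DataFrame(table)

# Write a plain-text CSV that SAS can read with PROC IMPORT.
df.to_csv(OUT_FILE, index=False)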