Informatica - Number Datatype issue

Informatica - Number Datatype issue - informatica

I am loading data from flat-file to oracle table.
In flat-file I have a field, which holds the value like "871685900000027865" and its datatype in SourceQualifier is Decimal.
But in oracle target it is loading as
"8.71685900000003E17"
While running Debugger, I found out that, In the Source Qualifier itself data is changed to exponential form.
Please suggest an easy approach to load data as it is into target.
Client Screenshot For reference

Use "Enable High Precision" session property.
I'd also add that in the Flat File it's a string. Flat files do not have any datatype definititions - these are just flat text files. So once you've specified Decimal in Source Qualifier, it tries to do the conversion for you. And with High Precision not enabled, it will use the exponential form. This is by design.
But again: what you get from DB strictly depends on the table definition and client tool that you're using. Can you share or check the table definition? If the column is decimal, it should not store data in this form.

Related

What are the differences between Object Storages for example S3 and a columnar based Technology

I was thinking about the difference between those two approches.
Imagine you must handle information about pattern calls, which later should be
displayed to the user. A pattern call is a tuple consisting of a unique integer
identifier ("id"), a user defined name (“name"), a project relative path to the so
called pattern file ("patternFile") and a convenience flag, which states whether
the pattern should be called or not called. And the number of tuples are not known before and they won't be modified after initialization.
I thought that in this case a column based approach with big query for example would be better in terms of I/O and performance as well as the evolution of the schema. But actually I can't understand why. I would appreciate any help.

Amazon S3 is like a large key-value store. The Key is the filename (with full path) and the Value is the contents of the file. It's just a blob of data.
A columnar data store organizes data in such a way that specific data can be "jumped to", and only desired values need to be read from disk.
If you are wanting to perform a search on the data, then some form of logic is required on the data. This could be done by storing data in a database (typically a proprietary format) or by using a columnar storage format such as Parquet and ORC plus a query engine that understands this format (eg Amazon Athena).
The difference between S3 and columnar data stores is like the difference between a disk drive and an Oracle database.

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in forms of various charts. The files can contain any number of time based signals (various numeric data types) and users may name the signals wildly.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to this as data tables) for each file uploaded, within the table each column will be one signal (first column being timestamps).
This brings the problem that I can't pre-define the Model for the data tables because the number, name, and datatype of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres specifc model field which can take nested arrays (which I believe fits the 2-D time based signals lists). In theory I could parse the raw uploaded file and construct such an array and basically save all the data in one field. Not knowing the limit of size of data, this could also be a nightmare for the queries later on, since to create the charts it usually takes only a few columns of signals at a time, compared to a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!

If the name, number and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition like Psycopg. You will be programatically creating a table for each combination of user and uploaded file (if they are different) and programtically inserting the records.
Using postgresql might be a good choice, you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB, Prometheus makes more sense.

MySQL check if table has correct schema

I am currently developing server software in C++ with a MySQL data backend. I am using the official MySQL/connector library from Oracle to work with MySQL. The connection itself is working and I'm not having any issues with that.
My problem is that the database and the table schemas tend to change every once in a while because new tables and columns keep getting added. Also exiting column may be changed for the same reason. To make sure I recognize outdated server software quickly I wanted to add a warning when the database has changed.
My first idea was to hardcode how the database (and tables and such) should look and then check whether the current database matches the hardcoded data. But I have no clue how to achive that.
In summary I want to be able to detect whether
A table has been added or removed
A column in a table has been altered
A column in a table has been added or removed
with as little C++ code as possible. Also it should be quite easy to maintain.
Additional information will be added when required.

I would suggest the following approach:
1) fork and execute the mysql command line client. Set up a pair of pipes, to mysql's standard input and output.
2) At this point you should be able to execute simple commands by piping them to mysql via the standard input pipe, and read the output from the standard output pipe.
You will need to make careful notes as to the output format of each mysql command, so that you know when you finished reading its output, and you can send the next command.
3) As the first order of being, execute:
show tables;
The output that comes back will list all tables in the database. Parsing the output into a list of table names is trival. Then execute for each table:
show create table <tablename>;
The resulting output shows all fields in the table, its keys, and constraints. Pretty much all of this table's schema. Lather, rinse, repeat, for every table.
4) In this manner you can capture a basic schema of the entire database, for comparison purposes. If necessary, use the same approach to capture the triggers, and other objects. You'll likely need to do some minor massaging of the data, and exclude a few bits. "show create table", for example, will include the current AUTO_INCREMENT values, which you can ignore.
This general approach, of driving a mysql process via its standard input and output, is bit wobbly, of course. With a little bit of work, you can use mysql's native client library, and execute all of these commands, and capture their results, directly. This should be more reliable.

libpq: get data type

I am coding a cpp project with the database "postgreSQL".
I created a table in my database its type is character varying(40).
Now I need to SELECT these data FROM the table in my cpp project. I knew that I should use the library libpq, this is the interface of "postgreSQL" for c/cpp.
I have succeeded in selecting data from the table. Now I am considering if it's possible to get the data type of this table. For example, here I want to get character varying(40).

You need to use PQftype.
As described here: http://www.idiap.ch/~formaz/doc/postgreSQL/libpq-chapter17861.htm
And just take a look here about decoding return values: http://www.postgresql.org/message-id/da7021e0608040738l3b0880a1q5a76b838937f8c78#mail.gmail.com
You must also use PQfsize to get field size.

What is a good design pattern to implement a dynamic data importer tool?

We are planning to build a dynamic data import tool. Basically taking information on one end in a specified format (access, excel, csv) and upload it into an web service.
The situation is that we do not know the export field names, so the application will need to be able to see the wsdl definition and map to the valid entries in the other end.
In the import section we can define most of the fields, but usually they have a few that are custom. Which I see no problem with that.
I just wonder if there is a design pattern that will fit this type of application or help with the development of it.

I am not sure where the complexity is in your application, so I will just give an example of how I have used patterns for importing data of different formats. I created a factory which takes file format as argument and returns a parser for particular file format. Then I use the builder pattern. The parser is provided with a builder which the parser calls as it is parsing the file to construct desired data objects in application.
// In this example file format describes a house (complex data object)
AbstractReader reader = factory.createReader("name of file format");
AbstractBuilder builder = new HouseBuilder(list_of_houses);
reader.import(text_stream, builder);
// now the list_of_houses should contain an extra house
// as defined in the text_stream

I would say the Adaptor Pattern, as you are "adapting" the data from a file to an object, like the SqlDataDataAdapter does it from a Sql table to a DataTable
have a different Adaptor for each file type/format? example SqlDataAdptor, MySqlDataAdapter, they handle the same commands but different datasources, to achive the same output DataTable
Adaptor pattern
HTH
Bones

Probably Bridge could fit, since you have to deal with different file formats.
And Façade to simplify the usage. Handle my reply with care, I'm just learning design patterns :)

You will probably also need Abstract Factory and Command patterns.
If the data doesn't match the input format you will probably need to transform it somehow.
That's where the command pattern come in. Because the formats are dynamic, you will need to base the commands you generate off of the input. That's where Abstract factory is useful.

Our situation is that we need to import parametric shapes from competitors files. The layout of their screen and data fields are similar but different enough so that there is a conversion process. In addition we have over a half dozen competitor and maintenance would be a nightmare if done through code only. Since most of them use tables to store their parameters for their shapes we wrote a general purpose collection of objects to convert X into Y.
In my CAD/CAM application the file import is a Command. However the conversion magic is done by a Ruleset via the following steps.
Import the data into a table. The field names are pulled in as well depending on the format.
We pass the table to a RuleSet. I will explain the structure the ruleset in a minute.
The Ruleset transform the data into a new set of objects (or tables) which we retrieve
We pass the result to the rest of the software.
A RuleSet is comprise of set of Rules. A Rule can contain another Rule. A rule has a CONDITION that it tests, and a MAP TABLE.
The MAP TABLE maps the incoming field with a field (or property) in the result. There are can be one mapping or a multitude. The mapping doesn't have to involve just poking the input value into a output field. We have a syntax for calculation and string concatenation as well.
This syntax is also used in the Condition and can incorporate multiple files like ([INFIELD1] & "-" & [INFIELD2])="A-B" or [DIM1] + [DIM2] > 10. Anything between the brackets is substituted with a incoming field.
Rules can contain other Rules. The way this works is that in order for a sub Rule mapping to apply both it's condition and those of it's parent (or parents) have to be true. If a subRule has a mapping that conflicts with a parent's mapping then the subRule Mapping applies.
If two Rules on the same level have condition that are true and have conflicting mapping then the rule with the higher index (or lower on the list if you are looking at tree view) will have it's mapping apply.
Nested Rules is equivalent to ANDs while rules on the same level are equivalent of ORs.
The result is a mapping table that is applied to the incoming data to transform it to the needed output.
It is amicable to be being displayed in a UI. Namely a Treeview showing the rules hierarchy and a side panel showing the mapping table and conditions of the rule. Just as importantly you can create wizards that automate common rule structures.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js