How do we dump data into Informatica?

I have to load data from various sources into Informatica. Some sources are manual files dropped onto an SFTP server, some come via APIs, and some via direct DB connections. How do we connect to the files in each case: some kind of connection to the SFTP server, an API endpoint connection, a DB connection via a DB endpoint? And how do we authenticate in each case? I don't want to use a username/password; is there a way to use Active Directory authentication instead?

How does Informatica authenticate whether the source of the files is genuine?
If you mean the source itself, then you need to decide whether the source is genuine before you create a connection to it.
If you mean how to secure the connection, then that is a property of the source and is defined by the owner of the source. Informatica can use almost any industry-standard secure protocol and authentication method.
Any way to scan for malicious files?
Informatica can implement any business rules you want to define to determine whether the data in a file is malicious.
If you are asking whether there is a "magic button" you can press that will tell you if a file is malicious, then the answer is no.
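For illustration only (this is ordinary Python, not anything Informatica ships with), a rule-based pre-check of incoming files might look like the sketch below; the allowed extensions, size limit, and optional checksum are assumptions you would replace with your own rules:

```python
import hashlib
from pathlib import Path
from typing import Optional

# Illustrative business rules only -- replace with whatever your organisation defines.
ALLOWED_EXTENSIONS = {".csv", ".txt"}
MAX_SIZE_BYTES = 500 * 1024 * 1024  # reject anything over 500 MB

def file_passes_rules(path: Path, expected_sha256: Optional[str] = None) -> bool:
    """Apply simple pre-load checks to an incoming file."""
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return False  # unexpected file type
    if path.stat().st_size > MAX_SIZE_BYTES:
        return False  # implausibly large file
    if expected_sha256 is not None:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != expected_sha256:
            return False  # contents don't match what the sender published
    return True
```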
Answer to Question about PocketETL
Once you've identified all the functionality required to implement your overall architecture, you have two basic options for how you satisfy these requirements.

Option 1: identify a single tool that covers as much of the functionality as possible, then fill in the gaps with other tools:
- simplest to implement
- should "just work"
- unlikely to be "best of breed" in all areas
- unlikely to be the cheapest solution

Option 2: implement point solutions for each area of functionality:
- likely to be a better solution, for you, in each area
- may be cheaper
- but you have to get all the components working together, which is unlikely to be trivial
- and you need to know how to implement and configure multiple products, not just one

So you could use Informatica to do everything, or you could use PocketETL for the first piece of data movement and other tools to implement the rest of the data pipeline.

Related

Uploading large files to server

The project I'm working on logs data on distributed devices that needs to be joined in a single database on a remote server.
The logs cannot be streamed as they are recorded (the network may not be available, etc.), so they must be sent occasionally as bulky 0.5-1 GB text-based CSV files.
As far as I understand, this means having a web service receive the data in the form of POST requests is out of the question because of the file sizes.
So far I've come up with this approach: use some file transfer protocol (FTP or similar) to upload files from the device to the server. Devices would have to figure out a unique filename to do this. Have the server periodically check for new files, process them by committing them to the database, and delete them afterwards.
It seems like a very naive way to go about it, but simple to implement.
However, I want to avoid any pitfalls before I implement the specifics. Is this approach scalable (more devices, larger files)? Implementation will be done either on a private/company-owned server or on a cloud service (Azure, for instance) - will it work for different platforms?
You could actually do this through web/HTTP as well, after setting a higher limit for POST requests in the web server (post_max_size and upload_max_filesize for PHP). This will allow devices to interact regardless of platform. It shouldn't be too hard to make a POST request from any device; a simple cURL request could get the job done.
FTP is also possible. Or SCP, to make it safer.
Either way, I think this does need some application on the server to be able to fetch and manage these files using a database. Perhaps a small web application? ;)
As for the unique name, you could use a combination of the device's unique ID/name and the current Unix time. You could even hash this (MD5/SHA-1) afterwards if you like.
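As a sketch of that device-side upload (the endpoint URL, device ID, and use of the third-party requests library are assumptions, not part of the original design):

```python
import hashlib
import time
from pathlib import Path

import requests  # third-party; pip install requests

DEVICE_ID = "device-0042"                      # hypothetical unique device name
UPLOAD_URL = "https://example.com/api/upload"  # hypothetical server endpoint

def unique_name(device_id: str) -> str:
    """Device ID plus current Unix time, hashed so the filename is uniform and unique."""
    raw = f"{device_id}-{int(time.time())}"
    return hashlib.sha1(raw.encode()).hexdigest() + ".csv"

def upload(log_file: Path) -> None:
    # Stream the file in the POST body; the server stores it under the unique name
    # and processes/commits it to the database later, as described above.
    with log_file.open("rb") as fh:
        resp = requests.post(
            UPLOAD_URL,
            params={"filename": unique_name(DEVICE_ID)},
            data=fh,           # streamed, so a 1 GB file is not read into memory
            timeout=600,
        )
    resp.raise_for_status()

if __name__ == "__main__":
    upload(Path("sensor_log.csv"))
```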

C++ runtime API

I want to create an application that, when executed, has runtime functions that are accessible by other applications.
For example, a C++ application that stores values in files and retrieves this information. While this application is running, any other C++ application could access its save and retrieve functionality to save and retrieve data, but it should have no other connection to the system.
Sounds like a simple job for web services, or a remote database, or even an LDAP server.
Store and retrieve are operations common to all of these.
If the goal is to learn some specific technology, then ask a more specific question. Otherwise, don't reinvent any wheels. There are plenty of things out there for store and retrieve.
One of the simplest "store and retrieve" APIs I know of is Berkeley DB or Sleepycat.
We built a giant, clustered, simple key based database for a major telecom company using LDAP on top of Berkeley DB (aka Sleepycat). All open-source software and commodity hardware and it supports mission critical operations for millions of customers.
A more modern rendition of this might use memcached as well.
If you go HTTP-based, you can use something as simple as libcurl against an Apache web server to implement "RESTful" services with GET and PUT commands.
If you run it locally (same server) and access it via localhost (127.0.0.1), there is very little latency in the TCP stack, and it amounts to little more than memcpys at the kernel level.
Simple message passing would also do: say, JSON over ØMQ, or something along the lines of msgpack-rpc, protobuf-remote, or Cap'n Proto RPC.
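To make the message-passing option concrete, here is a minimal sketch of a store/retrieve service as JSON over ØMQ. It is written in Python with pyzmq to keep it short; the same REQ/REP pattern maps directly to C++ (e.g. via cppzmq), and the port and message shapes are assumptions:

```python
import zmq  # pyzmq; the same REQ/REP pattern exists in C++ via cppzmq

def serve(endpoint: str = "tcp://127.0.0.1:5555") -> None:
    """Tiny store/retrieve service; other processes connect and send JSON requests."""
    store = {}                    # in-memory for the sketch; could be files or Berkeley DB
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        req = sock.recv_json()    # e.g. {"op": "save", "key": "k", "value": 42}
        if req.get("op") == "save":
            store[req["key"]] = req["value"]
            sock.send_json({"ok": True})
        elif req.get("op") == "retrieve":
            sock.send_json({"ok": True, "value": store.get(req["key"])})
        else:
            sock.send_json({"ok": False, "error": "unknown op"})

def client_save(key, value, endpoint: str = "tcp://127.0.0.1:5555"):
    """What a separate application does: connect, send a request, read the reply."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send_json({"op": "save", "key": key, "value": value})
    return sock.recv_json()
```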

Does Biztalk Server support data exchange without use of web services

As I have very little knowledge of how ESBs work in tandem with databases, I'm asking how communication can take place between the two, hoping I'll at least be pointed in the right direction to search in!
SITUATION: We have two systems (one of them is the client's) on different networks, each with its own database. We are required to do a regular, real-time exchange of all points present in our database with the other system. We are also required to have a provision to be able to import data into our system. This exchange has to follow SOA principles over the customer-provided BizTalk ESB. We are supposed to provide the exchange by the use of ODBC.
Question: My query is whether it is possible to integrate the databases with the ESB as endpoints, without making any use of web services or extra interfaces, and to send the data over the ESB using a pull-push transfer mechanism.
I have tried searching the net for this situation but have not come up with many straightforward answers. Could someone please point me in the right direction?
The ESB Toolkit in BizTalk is not an ESB! It is just a small additional tool for some special cases.
Let's stop talking about the ESB; we need to solve the technical problem, right?
As I understand it, you have two SQL databases and want to integrate them.
To do so with BizTalk, the easiest way is to use the WCF-SQL ports/adapters.
You start the wizard for this adapter and choose the tables/stored procedures that should provide or consume data; the wizard will generate all the needed XML schemas for you.
Then you use the BizTalk Mapper to create the XSLT maps, which will transform one SQL data format into another.
Then you create a pair of ports. One will consume data from one SQL database, and the second will insert data into the other SQL database. One of these ports will use the XSLT map mentioned above.
If you need more processing, you could create an orchestration to manage additional processing, sophisticated error handling, etc.
I would recommend using MSMQ. There's a fairly detailed description of it here.
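Setting BizTalk aside for a moment, the raw "pull-push over ODBC" that the question describes can be sketched as below (in Python with the third-party pyodbc library). The connection strings, table, and columns are hypothetical placeholders:

```python
import pyodbc  # third-party ODBC bridge; pip install pyodbc

# Hypothetical DSNs/connection strings for the two databases.
SOURCE_DSN = "DSN=SourceDb;UID=svc_reader;PWD=..."
TARGET_DSN = "DSN=TargetDb;UID=svc_writer;PWD=..."

def transfer_points() -> None:
    """Pull un-exported rows from one database and push them into the other."""
    src = pyodbc.connect(SOURCE_DSN)
    dst = pyodbc.connect(TARGET_DSN)
    try:
        src_cur, dst_cur = src.cursor(), dst.cursor()
        src_cur.execute(
            "SELECT point_id, value, recorded_at FROM points WHERE exported = 0"
        )
        rows = src_cur.fetchall()
        if rows:
            dst_cur.executemany(
                "INSERT INTO points (point_id, value, recorded_at) VALUES (?, ?, ?)",
                [tuple(r) for r in rows],
            )
            dst.commit()
            # Mark only the rows we actually copied, so the next run picks up new data.
            src_cur.executemany(
                "UPDATE points SET exported = 1 WHERE point_id = ?",
                [(r.point_id,) for r in rows],
            )
            src.commit()
    finally:
        src.close()
        dst.close()
```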

Efficient way to transfer data from one django application to another

Currently, I'm working on a project where I have a server-client relationship between two Django applications running on separate hosts.
The server has to store and provide a large amount of relational data, e.g. Suppliers, Companies, Products, etc.
The client downloads data on request from the server and adds it to its own database. Clients can also upload data from their station to the server database to expand it.
The previous developer used XML-RPC to transfer the vast (13 MB typical) XML file from server to client. Since all we're really sending are database-agnostic objects to be stored in a database, I wondered if there was a more efficient way of doing it?
Please ask for more details if you need them; I wasn't really sure what you'd need to know.
EDIT: Efficient in terms of networking and server-side processing. Clients can do the heavy lifting.
A shared database design seems more suitable, but of course there may be security, political, or organisational reasons ruling that out. Plus there would be significant re-design required.
To reduce network bandwidth, first check that HTTP gzip compression is enabled.
If it's just a dumb data transfer, JSON would generally be a lot more compact than XML-RPC. Does the data look amenable to a straight translation to JSON? This would still require some server-side processing.
For minimal server-side processing (if the database tables are relatively similar), it may be very efficient to just send the client a dump of the relevant DB query. Of course, unless the tables have the same schema, you would have to do some client-side processing of raw SQL, which is not ideal.
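As a sketch of the JSON route using Django's built-in serializers (the Supplier model, app name, and view below are hypothetical stand-ins for your real models; gzip compression would be handled by the web server as noted above):

```python
# Server side (a Django view): dump a queryset as JSON instead of XML-RPC.
from django.core import serializers
from django.http import HttpResponse

from myapp.models import Supplier  # hypothetical app and model


def export_suppliers(request):
    payload = serializers.serialize("json", Supplier.objects.all())
    return HttpResponse(payload, content_type="application/json")


# Client side (e.g. in a management command): deserialize straight into the local DB,
# letting the client do the heavy lifting.
def import_suppliers(payload: str) -> None:
    for deserialized in serializers.deserialize("json", payload):
        deserialized.save()
```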

sftp versus SOAP call for file transfer

I have to transfer some files to a third party. We can invent the file format, but we want to keep it simple, like CSV. These won't be big files - a few tens of MB at most - and there won't be many: three files per night.
Our preference for the protocol is SFTP. We've done this a lot in the past and we understand it well.
Their preference is to do it via a web service/SOAP/HTTPS call.
The reason they give is reliability, mainly around knowing that they've fully received the file.
I don't buy this as a killer argument. You can easily build something into an SFTP-based transfer process to make sure the transfer has completed, e.g. use headers/footers in the files, move files between directories, etc.
The only other argument I can think of is that over HTTP(S), ports 80/443 will be open, so there might be less firewall work for our infrastructure guys.
Can you think of any other arguments either way on this? Is there a consensus on what would be best practice here?
Thanks in advance.
File completeness is a common issue in "managed file transfer". If you went for a compromise "best practice", you'd end up running either AS2 (a web-service-ish way to transfer files that incorporates non-repudiation via signed integrity checks) or AS3 (the same thing over FTP or FTPS).
One of the problems with file integrity and SFTP is that you can't arbitrarily extend the protocol like you can with FTP and FTPS. In other words, you can't add an XSHA1 command to your SFTP transfer just because you want to.
Yes, there are other workarounds (like transactional files that contain hashes of the files received), but at the end of the day someone's going to have to do some work... it really shouldn't be this hard.
If the third party you're talking to really doesn't have a non-web-service way to accept large files, you might be their guinea pig as they try to navigate a brand new world. (Or they may have just fired all their transmissions folks and are only now realizing that the world doesn't run on SOAP... yet - I've seen that happen too.)
Either way, unless they GIVE you the magic code/utility/whatever to do the file-to-SOAP transaction for them (and that happens too), I'd stick to your SFTP guns until they find the right person on their end to talk bulk data transmissions.
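As a sketch of the "build completion checking into SFTP" idea mentioned above, using the third-party paramiko library: upload under a temporary name, rename only once the transfer is complete, and drop a checksum "receipt" the receiver can verify before processing. The host, key file, and paths are assumptions:

```python
import hashlib
from pathlib import Path

import paramiko  # third-party SSH/SFTP library; pip install paramiko

HOST, PORT = "sftp.example.com", 22      # hypothetical endpoint
USERNAME = "transfer_user"
KEY_FILE = "/home/transfer/.ssh/id_rsa"  # key-based auth instead of a password


def upload_with_receipt(local: Path, remote_dir: str) -> None:
    """Upload to a temp name, then rename and write a .sha256 'receipt' file.

    The receiver only processes files that have a matching receipt, so a
    half-transferred file is never picked up."""
    digest = hashlib.sha256(local.read_bytes()).hexdigest()

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # pin the host key in production
    client.connect(HOST, port=PORT, username=USERNAME, key_filename=KEY_FILE)
    sftp = client.open_sftp()
    try:
        tmp_path = f"{remote_dir}/{local.name}.part"
        final_path = f"{remote_dir}/{local.name}"
        sftp.put(str(local), tmp_path)
        sftp.rename(tmp_path, final_path)    # the "move between directories" trick
        with sftp.open(final_path + ".sha256", "w") as fh:
            fh.write(f"{digest}  {local.name}\n".encode("utf-8"))  # receipt to verify
    finally:
        sftp.close()
        client.close()
```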
SFTP is a protocol for secure file transfer; SOAP is an API protocol, which can be used to send files as attachments (i.e. MIME attachments) or as Base64-encoded data.
SFTP adds potential complexity around separate processes for encrypting/decrypting files (at rest, if they contain sensitive data), file archiving, data latency, coordinating job scheduling, and setting up FTP service accounts.