I have been told that CORBA programming is not modern and that I should
use newer technologies. OK ...
But what I appreciated in the CORBA world was the POA (despite its complexity),
because it was very flexible and gave me the opportunity to choose
appropriate policies for my distributed objects.
Is there anything similar to the POA in the web services world? Or should I code
it myself?
Thanks for your replies!
Batches are a new approach to relational database access, remote procedure calls, and web services.
A Remote Batch Invocation (RBI) statement combines remote and local execution: all the remote code is executed in a single round-trip to the server, and all data sent to the server and all results from the batch are communicated in bulk. RBI supports remote blocks, iteration and conditionals, and local handling of remote exceptions. RBI is efficient even for fine-grained interfaces, eliminating the need for hand-optimized server interfaces.
Batch services also provide a simple and powerful interface to relational databases, with support for arbitrary nested queries and bulk updates. One important property of the system is that a single batch statement always generates a constant number of SQL queries, no matter how many nested loops are used.
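As a rough illustration of only the core batching idea (recording calls locally and shipping them to the server in one round trip), here is a Python sketch. `FakeServer` and `Batch` are hypothetical names, not the RBI API; real RBI also handles remote loops, conditionals and exceptions, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Any


class FakeServer:
    """Hypothetical server that counts how many round trips it receives."""

    def __init__(self) -> None:
        self.round_trips = 0
        self.store = {"a": 1, "b": 2, "c": 3}

    def get(self, key: str) -> int:
        return self.store[key]

    def execute_batch(self, ops: list) -> list:
        # One round trip: all recorded operations arrive together, results go back in bulk.
        self.round_trips += 1
        allowed = {"get": self.get}  # only whitelisted operations may be executed
        return [allowed[name](*args) for name, args in ops]


@dataclass
class Batch:
    server: FakeServer
    ops: list = field(default_factory=list)

    def call(self, name: str, *args: Any) -> int:
        # Record the call locally; the returned index is a handle to its future result.
        self.ops.append((name, args))
        return len(self.ops) - 1

    def run(self) -> list:
        return self.server.execute_batch(self.ops)


server = FakeServer()
batch = Batch(server)
handles = [batch.call("get", key) for key in ["a", "b", "c"]]
results = batch.run()
print([results[h] for h in handles])  # [1, 2, 3]
print(server.round_trips)  # 1
```

Three fine-grained `get` calls cost a single round trip, which is the property that makes fine-grained interfaces affordable.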
I am redesigning a small monolithic ETL application written in Python. I find a microservice architecture suitable, as it will give us the flexibility to use different technologies if needed (Python is not the nicest language for enterprise software, in my opinion). So if we had three microservices (call them Extract, Transform and Load), we could use Java for the Transform microservice in the future.
The problem is that it is not feasible here to pass the result of a service call in an API response (say, over HTTP). The output from Extract is going to be gigabytes of data.
One idea is to call Extract and have it store the results in a database (which is really what that module is doing in the monolith, so easy to implement). In this case, the service will return only a yes/no response (was the process successful or not).
I was wondering if there were a better way to approach this. What would be a better architecture? Is what I'm proposing reasonable?
If your ETL process works on individual records (some parallelizable units of computation), then there are a lot of options you could go with. Here are a few:
Messaging System-based
You could base your processing around a messaging system, like Apache Kafka. It requires careful setup and configuration (depending on the durability, availability and scalability requirements of your specific use cases), but it may be a better fit than a relational DB.
In this case, the ETL steps would work completely independently, simply consuming from some topics and producing into others. Those other topics are then picked up by the next step, and so on. There would be no direct communication (calls) between the E/T/L steps.
It's a clean and easy to understand solution, with independent components.
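A minimal sketch of that topology, assuming in-memory `queue.Queue` objects as stand-ins for Kafka topics (no Kafka client is used here). Each stage only reads from one "topic" and writes to another; no stage calls another directly. The record shapes are made up.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def extract(out_topic: queue.Queue) -> None:
    # Hypothetical source: produce raw records into the "extracted" topic.
    for i in range(5):
        out_topic.put({"id": i, "value": str(i * 10)})
    out_topic.put(SENTINEL)


def transform(in_topic: queue.Queue, out_topic: queue.Queue) -> None:
    # Consume from one topic, produce into another.
    while (record := in_topic.get()) is not SENTINEL:
        out_topic.put({"id": record["id"], "value": int(record["value"])})
    out_topic.put(SENTINEL)


def load(in_topic: queue.Queue, sink: list) -> None:
    # A real Load step would write to the target store instead of a list.
    while (record := in_topic.get()) is not SENTINEL:
        sink.append(record)


extracted, transformed, sink = queue.Queue(), queue.Queue(), []
threads = [
    threading.Thread(target=extract, args=(extracted,)),
    threading.Thread(target=transform, args=(extracted, transformed)),
    threading.Thread(target=load, args=(transformed, sink)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sink[:2])
```

Swapping a stage for a Java implementation would only require it to read and write the same topics and record format.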
Off-the-shelf processing solutions
There are a couple of off-the-shelf (OTS) solutions for data processing, computation and transformation: Apache Flink, Apache Storm and Apache Spark.
Although these solutions would obviously confine you to one particular technology, they may be better than building a similar system from scratch.
Non-persistent
If the actual data is streaming/record-based, and it is not required to persist the results between steps, you could just get away with long-polling the HTTP output of the previous step.
You say it is just too much data, but that data doesn't have to go to the database (if persisting it is not required) and could go straight to the next step instead. If the data is produced continuously (not all in one batch) and on the same local network, I don't think this would be a problem.
This would be technically very easy to do, very simple to validate and monitor.
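An in-process analogue of this non-persistent streaming, sketched with Python generators: each step consumes the previous step's output one record at a time, so the full dataset is never held in memory or written to a database. Across processes, each step would instead read the previous step's streamed HTTP response; the functions and fields below are hypothetical.

```python
from typing import Iterable, Iterator


def extract(n: int) -> Iterator[dict]:
    # Hypothetical source yielding records one at a time instead of one huge payload.
    for i in range(n):
        yield {"id": i, "amount": i * 2}


def transform(records: Iterable[dict]) -> Iterator[dict]:
    for r in records:
        yield {**r, "amount_doubled": r["amount"] * 2}


def load(records: Iterable[dict]) -> int:
    count = 0
    for _ in records:
        count += 1  # a real loader would write each record to its destination here
    return count


print(load(transform(extract(100_000))))  # 100000
```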
I would suggest having a look at Apache Flink. It does something similar to the mappings in large enterprise tools like Informatica, Talend and DataStage, but it processes data at a smaller scale, repeatedly. It helps you compute and transform data on the fly, as it arrives, and then store/load it into a file or database.
The current infrastructure we have with Flink processes close to 28.5 GB every 4 hours, and it just works. In the early days, we ran our daily batch alongside the Flink stream to make sure both produced consistent results; eventually most of the streams were left active and the daily batches were gradually retired.
Hope it helps someone.
Nothing prevents you from storing the results on an SFTP server as CSV files, or in a database. Do whatever makes sense. Using messaging to pass gigabytes of data, or streaming through HTTP, may or may not make sense for your case.
This is an interesting problem. The best solution for this could be Reactive Spring Boot. You can make your Extract service a Reactive Spring Boot app and, instead of sending GBs of data in one response, stream the data to the service that needs it.
Now you might be wondering whether streaming ties up a worker thread. The answer is no. It relies on non-blocking I/O at the OS level, so no request thread is held while the results are streamed. That's the beauty of Reactive Spring Boot.
Go through this and explore:
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1
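The "no thread held per stream" point can be illustrated outside Spring. This sketch swaps in Python's `asyncio` (not Spring WebFlux) and assumes two hypothetical concurrent "requests"; `asyncio.sleep(0)` stands in for awaiting non-blocking I/O. Both streams are served by a single thread.

```python
import asyncio
import threading
from typing import AsyncIterator


async def stream_rows(name: str, n: int) -> AsyncIterator[str]:
    for i in range(n):
        await asyncio.sleep(0)  # hand control back to the event loop, as awaiting non-blocking I/O would
        yield f"{name}-{i}"


async def consume(name: str, n: int, log: list) -> None:
    async for row in stream_rows(name, n):
        log.append((row, threading.get_ident()))


async def main() -> list:
    log: list = []
    # Two "requests" streamed concurrently on one event-loop thread.
    await asyncio.gather(consume("a", 3, log), consume("b", 3, log))
    return log


log = asyncio.run(main())
print([row for row, _ in log])
print(len({tid for _, tid in log}))  # 1: one thread served both streams
```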
I'm creating a web application and decided to use a microservices approach. Could you please tell me the best (or at least a common) approach to organizing database access from all the web services (login, comments, etc.)? Is it a good idea to create a DAO web service and use only it to read/write values in the application's database? Or should each web service have its own DAO layer?
Each microservice should be a full-fledged application with all necessary layers (which doesn't mean there cannot be shared code between microservices, but they have to run in separate processes).
Besides, it is often recommended that each microservice have its own database. See http://microservices.io/patterns/data/database-per-service.html https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/ Therefore, I don't really see the point of a web service that would only act as a data access facade.
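A minimal sketch of database-per-service with a DAO layer inside each service, assuming two hypothetical services (login and comments) that each own a separate in-memory SQLite database. The comments service refers to users only by ID and never queries the login database.

```python
import sqlite3


class LoginDao:
    """DAO owned by a hypothetical login service, backed by its own database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

    def register(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid


class CommentDao:
    """DAO owned by a hypothetical comments service, with a separate database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
        )

    def add(self, user_id: int, body: str) -> int:
        cur = self.conn.execute(
            "INSERT INTO comments (user_id, body) VALUES (?, ?)", (user_id, body)
        )
        self.conn.commit()
        return cur.lastrowid

    def for_user(self, user_id: int) -> list:
        rows = self.conn.execute(
            "SELECT body FROM comments WHERE user_id = ? ORDER BY id", (user_id,)
        )
        return [body for (body,) in rows]


login = LoginDao()
comments = CommentDao()
alice_id = login.register("alice")
comments.add(alice_id, "Nice article!")  # only the user ID crosses the service boundary
print(comments.for_user(alice_id))
```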
Microservices are great, but it is not good to start with too many microservices right away. If you have doubts about how to define the boundaries between microservices in your application, start with a monolith (while always keeping the code clean and properly object-oriented, with well-designed layers and interfaces). When the application reaches a more mature state, you will more easily see the right places to split it into independently deployable services.
The key is to keep together things that should really be coupled. When we try to decouple everything from everything, we end up creating too many layers of interfaces, and this slows us down.
I think it's not a good approach.
DB operations are critical in any process, so they belong in the DAO layer inside the microservice. Why don't you want to implement it inside?
Using a separate service, you lose control, and if you have to change the process logic, you have to change the DAO service (affecting all the services).
In my opinion it is not a good idea.
I think that using services to expose data from a database is ideal due to the flexibility it provides. Developing a REST service that exposes some or all of your data lets the UI consume the data directly via AJAX, or lets other services process the data and generate new information. These consumers do not need to implement a DAO and can be written in any language. While a REST service over your entire database is probably not a microservice, a case could be made for breaking it down into read-only services for Students, Professors and Classes, exposed on the school website(s), with separate services for Create, Update and Delete (CUD) available only to the Registrar's office desktop applications.
For example, building a service that exposes a statistical value computed over the data protects the data from examination by a user or program that only needs that statistical value, without requiring the service to implement an entire DAO for the components of that statistic. Full-featured databases like SQL Server or Oracle provide a lot of functionality that application developers can use, including complex queries (using indexes), statistics, and set operations on data.
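A sketch of the "expose only the statistic" idea, assuming a hypothetical `grades` table in an in-memory SQLite database. The aggregation runs inside the database, and the service operation returns only the resulting value, never the rows behind it.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grades (student TEXT, class TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO grades VALUES (?, ?, ?)",
        [("ann", "math", 90.0), ("bob", "math", 70.0), ("cid", "art", 85.0)],
    )
    return conn


def average_score(conn: sqlite3.Connection, class_name: str) -> Optional[float]:
    """Hypothetical service operation: returns the statistic only, or None if no rows match."""
    (avg,) = conn.execute(
        "SELECT AVG(score) FROM grades WHERE class = ?", (class_name,)
    ).fetchone()
    return avg


conn = make_db()
print(average_score(conn, "math"))  # 80.0
```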
Having a database service is a completely valid pattern. In fact, this is one of the key examples in the book Building Microservices of where to start extracting parts of a monolith into a microservice.
How to organize your code around such an idea is a different issue. Yes, from the DB client programmer's standpoint, having the same DAO layer on each DB client makes a lot of sense.
The DAO pattern may be suitable to bind your DB to one programming language that you use. But then you need to ask yourself why you are exposing your database as a web service if all access to it will be mediated by the same DAO infrastructure. Or are you going to create one DAO pattern for each client programming language binding?
If all database clients are going to be written in the same programming language, then are you sure you really need to wrap your DB as a microservice? After all, the DB is usually already a remote service with a well-defined network protocol optimized to transfer data quickly and reliably. Why add HTTP on top of it? What do you expect to gain from adding such complexity?
Another problem with using the DAO pattern is that the DAO structure does not necessarily follow the evolution of the web service. The web service may evolve in a way that does not make old clients incompatible. You may have different clients using different features of the micro service. In this case you are not sharing the same DAO layer structure on each client.
Make sure you are not using RPC-style programming over web services, which does not make much sense. You will be basically throwing away one of the key advantages of micro services, which is the decoupling between service and client.
I want to create an application that, when executed, has runtime functions that are accessible by other applications.
For example, a C++ application that stores values in files and retrieves this information. While this application is running, any other C++ application could access its save and retrieve functionality to save and retrieve data, but should have no other connection to this system.
Sounds like a simple job for web services, or a remote database, or even an LDAP server.
Store and retrieve are operations common to all of these.
If the goal is to learn some specific technology, then ask a more specific question. Otherwise, don't reinvent any wheels. There are plenty of things out there for store and retrieve.
One of the simplest "store and retrieve" APIs I know of is Berkeley DB (also known as Sleepycat).
We built a giant, clustered, simple key-based database for a major telecom company using LDAP on top of Berkeley DB. It is all open-source software on commodity hardware, and it supports mission-critical operations for millions of customers.
A more modern rendition of this might use memcached as well.
If you go HTTP-based, you can use something as simple as libcurl against an Apache web server to implement "RESTful" services with GET and PUT requests.
If you run it locally (same server) and access it via localhost (127.0.0.1), there is very little latency in the TCP stack, and it amounts to little more than memcpys at the kernel level.
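To show the GET/PUT idea end to end, here is a small sketch that swaps in Python's standard library (`http.server` and `urllib`) for Apache and libcurl. The in-memory `STORE` dict and the `/greeting` key are assumptions; a real service might write to files instead. The server binds to 127.0.0.1 on a port chosen by the OS.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORE: dict = {}  # in-memory storage, keyed by request path


class KeyValueHandler(BaseHTTPRequestHandler):
    def do_PUT(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(204)  # stored, nothing to return
        self.end_headers()

    def do_GET(self) -> None:
        value = STORE.get(self.path)
        if value is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(value)))
        self.end_headers()
        self.wfile.write(value)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), KeyValueHandler)  # port 0: let the OS pick
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any configured proxy

put = urllib.request.Request(f"{base}/greeting", data=b"hello", method="PUT")
opener.open(put).close()
with opener.open(f"{base}/greeting") as resp:
    value = resp.read()
print(value)  # b'hello'
server.shutdown()
server.server_close()
```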
Simple message passing would do: say, JSON over ØMQ, or msgpack-rpc, protobuf-remote or Cap'n Proto RPC.
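A sketch of JSON message passing for the question's save/retrieve operations. It assumes a plain `socket.socketpair()` in place of ØMQ, so it adds its own 4-byte length prefix (ØMQ frames messages itself, so this would not be needed there). The message fields `op`, `key` and `value` are made up for illustration.

```python
import json
import socket


def send_msg(sock: socket.socket, obj: dict) -> None:
    # Length-prefixed framing: 4-byte big-endian size, then the JSON bytes.
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(len(data).to_bytes(4, "big") + data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> dict:
    size = int.from_bytes(recv_exact(sock, 4), "big")
    return json.loads(recv_exact(sock, size))


store: dict = {}


def handle(request: dict) -> dict:
    # Hypothetical "save" and "retrieve" operations.
    if request["op"] == "save":
        store[request["key"]] = request["value"]
        return {"ok": True}
    if request["op"] == "retrieve":
        return {"ok": True, "value": store.get(request["key"])}
    return {"ok": False, "error": "unknown op"}


client, server = socket.socketpair()
send_msg(client, {"op": "save", "key": "k", "value": "v"})
send_msg(server, handle(recv_msg(server)))
print(recv_msg(client))
send_msg(client, {"op": "retrieve", "key": "k"})
send_msg(server, handle(recv_msg(server)))
reply = recv_msg(client)
print(reply)
client.close()
server.close()
```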
My next project involves the creation of a data API within an enterprise framework. The data will be consumed by several applications running on different software platforms. While my colleagues generally favour SOAP, I would like to use a RESTful architecture.
Most of the applications will only need a few objects per call. Other applications, however, will sometimes need to make several sequential calls, each involving thousands of records. I'm concerned about performance: serialization/deserialization and network usage are where I fear a bottleneck. If each request involves a large delay, all of the enterprise's applications will be sluggish.
Are my fears realistic? Will serialization to a voluminous format like XML or JSON be a problem? Are there alternatives?
In the past, we've had to do these large data transfers using a "flatter"/leaner file format such as CSV for performance. How can I hope to achieve the performance I need using a web service?
While I'd prefer replies specific to REST, I'm interested in hearing how SOAP users might deal with this as well.
One advantage of REST is that you are free to use whatever media type you like. Why not continue to use text/csv? You could also enable HTTP compression to further reduce bandwidth consumption.
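A quick way to check the media-type and compression argument on your own data is to compare payload sizes. This sketch uses made-up records; `gzip.compress` approximates what HTTP compression (`Content-Encoding: gzip`) does to a response body. Actual savings depend entirely on the data.

```python
import csv
import gzip
import io
import json

# Hypothetical records standing in for the "thousands of records" in the question.
rows = [{"id": i, "name": f"item{i}", "price": round(i * 1.5, 2)} for i in range(1000)]


def to_csv(records: list) -> bytes:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()  # field names are sent once, not repeated per record as in JSON
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


csv_body = to_csv(rows)                       # would be served as Content-Type: text/csv
json_body = json.dumps(rows).encode("utf-8")  # would be served as application/json
for label, body in [("csv", csv_body), ("json", json_body)]:
    print(label, len(body), len(gzip.compress(body)))
```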
REST services are great for taking advantage of all kinds of data formats. Use whatever format fits your scenario best.
We offer both XML and JSON. The rendering time you mention really can be an issue. On the server side we use JAXB, whose standard Sun implementation is somewhat slow when it comes to marshalling XML. XML has the disadvantage of verbosity, but it is also good for interoperability and has schemas plus explicit versioning.
We compensated the verbosity in several ways (especially limiting the result-set):
In case you have a container with items in it, offer paging in your XML response (both page size and page number, e.g. /items?page=0&size=3). The client can then reduce the response size itself by reducing the page size.
Offer collapsing of elements: for instance, several clients are only interested in one data field of your whole item. Do this with a parameter (e.g. /items?select=name), so that only the nested element 'name' is included inline in your item element. This dramatically decreases size.
Generally, give the clients the power to limit result sets. They will definitely use it, because it speeds up response time on their side too :)
Also use compression; it shrinks verbose XML dramatically (in our case the payload got 10 times smaller). On the client side, request it with the header 'Accept-Encoding: gzip'. If you use Apache, the server configuration is also straightforward.
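The paging and `select` parameters described above could be handled along these lines. This is a sketch with a hypothetical `list_items` handler and made-up `ITEMS`; it returns Python dicts standing in for the XML/JSON body, and leaves URL parsing and serialization to whatever framework you use.

```python
from typing import Any, Optional

# Hypothetical data; "description" stands in for the bulky fields most clients don't need.
ITEMS = [{"id": i, "name": f"item{i}", "description": "long text " * 20} for i in range(10)]


def list_items(page: int = 0, size: int = 3, select: Optional[str] = None) -> list:
    """Sketch of a handler for /items?page=0&size=3&select=name."""
    start = page * size
    chunk = ITEMS[start:start + size]
    if select is None:
        return chunk
    fields = select.split(",")
    # Collapse each item to the requested fields only.
    return [{k: item[k] for k in fields if k in item} for item in chunk]


print(list_items(page=1, size=3, select="name"))
# [{'name': 'item3'}, {'name': 'item4'}, {'name': 'item5'}]
```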
I'd like to offer three guidelines:
one is the observation that there are many SOAP Web services out there (especially built with .NET 2.0 "ASMX" technology) that send down their data transfer objects serialized in XML. There are of course many RESTful services that send down XML or JSON. XML serialization/deserialization is rarely the constraining factor.
one common cause of bottlenecks in Web services is an interface that encourages client applications to get data by making those thousands of sequential calls (there is a term for it: a chatty interface). This is what you should avoid when you design your Web service's interface, regardless of what four-letter acronym you decide to go ahead with.
one thing to remember about REST is that it (partially) stands for a transfer of state, which may be ill-suited to some operations where you don't want to transfer the state of a business object from the server to a client application. In those cases, a SOAP Web service (as suggested by your colleagues) is more appropriate; or perhaps a combination of SOAP and REST services, where the REST services would take care of operations where the state transfer is appropriate, and the SOAP services would implement the rest (pun unintended :-)) of the operations.
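The chatty-interface point from the second guideline can be made concrete by counting round trips. In this sketch, `CountingService` is hypothetical and each public method call is treated as one network round trip; the coarse-grained `get_records` returns the same data in one call.

```python
class CountingService:
    """Hypothetical remote service; each public method call counts as one round trip."""

    def __init__(self, records: dict) -> None:
        self.records = records
        self.round_trips = 0

    def get_record(self, record_id: int) -> str:
        # Chatty: one call per record.
        self.round_trips += 1
        return self.records[record_id]

    def get_records(self, record_ids: list) -> dict:
        # Coarse-grained: one call for the whole set.
        self.round_trips += 1
        return {rid: self.records[rid] for rid in record_ids}


ids = list(range(1000))
service = CountingService({i: f"record {i}" for i in ids})
chatty = [service.get_record(i) for i in ids]
chatty_trips = service.round_trips
service.round_trips = 0
batched = service.get_records(ids)
print(chatty_trips, service.round_trips)  # 1000 1
```

With real network latency per call, the 1000 round trips, not the serialization format, would usually dominate.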
Not having dealt much with creating web-services, either from scratch, or by breaking apart an existing application, where does one start? Should a web-service encapsulate an entity, much like a class does, or should the service have more/less to it?
I realize that much of this is based on a case by case analysis of what the needs are, but are there any general guide-lines or best practices or even small nuggets of information that web-service veterans can impart to a relative newbie?
Our web services are built around functional areas. Sometimes this is just for a single entity, sometimes it's more than that.
For example, if you have a CRM, one of your web services might revolve around managing Contacts: creating, updating, searching for them, etc. If you do some kind of batch processing, a web service might exist to create and submit a job.
As far as best practices go, bear in mind that web services add processing overhead, mainly in serializing/deserializing the data as it goes across the wire. Because of this, the main upside is solely in scalability: you trade increased per-transaction processing time for the ability to run the service across multiple machines.
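A sketch of a service organized around the Contacts functional area rather than around a single class. `Contact` and `ContactService` are hypothetical; in practice these operations would sit behind web service endpoints, with storage in a database rather than a dict.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Contact:
    id: int
    name: str
    email: str


@dataclass
class ContactService:
    """Hypothetical service covering one functional area: contact management."""

    _contacts: dict = field(default_factory=dict)
    _next_id: int = 1

    def create(self, name: str, email: str) -> Contact:
        contact = Contact(self._next_id, name, email)
        self._contacts[contact.id] = contact
        self._next_id += 1
        return contact

    def update(self, contact_id: int, **changes: str) -> Contact:
        updated = replace(self._contacts[contact_id], **changes)  # rejects unknown fields
        self._contacts[contact_id] = updated
        return updated

    def search(self, text: str) -> list:
        text = text.lower()
        return [
            c for c in self._contacts.values()
            if text in c.name.lower() or text in c.email.lower()
        ]


svc = ContactService()
ann = svc.create("Ann Lee", "ann@example.com")
svc.update(ann.id, email="ann@corp.example")
print(svc.search("corp"))
```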
The main parts to pull out into a web service are those areas which are common across multiple applications, or which you intend to expose publicly, or which would benefit from greater load balancing.
Of course, you need to analyze your application to see where any bottlenecks really are. In some cases it doesn't make sense. For example, if you have a single application that isn't sharing its code and/or the bottleneck is primarily database related.
Web services are exactly what they sound like: services for the Web.
A web service should be built as an API for the service layer of your app.
A service usually encapsulates an entity larger than a single class.
To learn more about service layers and refactoring to add a service layer read about DDD.
Good Luck
The number one question is: to what end are you refactoring your application functionality to be consumed as a bunch of web services?