Best approach(es) or technolog(y/ies) for this specific problem? - web-services

I have a web-based interface for handling invoices, customer records and other transaction records, which currently interacts with a database of all the aforementioned stored on the same machine. As you can imagine, this is quite a simple set-up consisting of a web-app (PHP) and a database (MySQL). However, the ideal scenario is to keep the records on the machine they are currently on (easy) and move the web-app to another server within the same network (again, easy) ... but in addition, provide facilities on a public-facing website for customers to manage their accounts and so forth. The problem is this: the public-facing web server is in a completely separate location, as it is a dedicated server provided by a well-known ISP.
What would be the best way to make the records accessible from this other server whilst ensuring that all communications are secure? Speed is not a huge factor, although any outages on either side should be handled gracefully. Initially my thoughts went towards web services (XML-RPC/SOAP/Hessian), but these options seem to present difficulties (security being the main one, overcomplexity as well).
The web-app must remain PHP-based. The public-facing site is likely to be PHP-based as well, although Python (likely using Django) is another option. The introduction of any other technologies (Java etc) is not a problem, although it is preferred if they be Linux-friendly (so .NET would not be the best fit here).
Apologies if this question is somewhat verbose and vague. I am testing the water somewhat in regards to this kind of problem. Any advice or suggestions gratefully received.

I've done something similar. You can expose a web service to the internet that will do the database access, but requests to the service must match a strong hashed and salted password (which will be secured on the ISP's server in the DMZ.)
Either this or some sort of public/private key encryption scheme.
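A minimal sketch of the salted-hash check described above, in Python using only the standard library. It assumes the request already travels over HTTPS and that the internal service stores only a salt and a PBKDF2 hash, never the password itself. The function names and the iteration count are illustrative assumptions, not part of the original set-up.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_secret(secret: str, salt: bytes) -> bytes:
    """Derive a salted hash of the shared secret with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)


def make_credential_record(secret: str) -> tuple[bytes, bytes]:
    """Create the (salt, hash) pair that the internal service stores."""
    salt = os.urandom(16)
    return salt, hash_secret(secret, salt)


def is_authorized(presented_secret: str, salt: bytes, stored_hash: bytes) -> bool:
    """Check a secret presented with a request against the stored record."""
    candidate = hash_secret(presented_secret, salt)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, stored_hash)
```

The public-facing server would send the secret with each request, and the service would reject any request for which `is_authorized` returns False.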

OK, this might seem a bit silly, but what if you just used MySQL replication?
Instead of using all sorts of fancy web services, just have a master SQL server on one machine, then have it replicate to another server that holds the slave SQL server as well as the web app.

Related

C++ runtime API

I want to create an application that, when executed, has runtime functions that are accessible by other applications.
For example, a C++ application that stores values in files and retrieves this information. While this application is running, any other C++ application could access its save and retrieve functionality to save and retrieve data, but it should have no other connection to this system.
Sounds like a simple job for web services, or a remote database, or even an LDAP server.
Store and retrieve are operations common to all of these.
If the goal is to learn some specific technology, then ask a more specific question. Otherwise, don't reinvent any wheels. There are plenty of things out there for store and retrieve.
One of the simplest "store and retrieve" APIs I know of is Berkeley DB (originally developed by Sleepycat Software).
We built a giant, clustered, simple key-based database for a major telecom company using LDAP on top of Berkeley DB. All open-source software and commodity hardware, and it supports mission-critical operations for millions of customers.
A more modern rendition of this might use memcached as well.
If you go HTTP-based, you can use something as simple as libcurl against an Apache web server to implement "RESTful" services with GET and PUT commands.
If you run it locally (same server), and access via localhost (127.0.0.1) then there is very little latency in the TCP stack, and it amounts to little more than memcpys at the kernel level.
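To illustrate the GET/PUT idea, here is a small sketch in Python rather than C++ with libcurl and Apache. It swaps in the standard library's `http.server` and `http.client`, and it keeps values in an in-memory dictionary as a stand-in for the file storage. The handler and helper names are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORE: dict[str, bytes] = {}  # in-memory stand-in for the real file storage


class StoreHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Save the request body under the URL path, e.g. PUT /greeting
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # Retrieve a previously stored value, or answer 404 for an unknown key
        value = STORE.get(self.path)
        if value is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(value)))
        self.end_headers()
        self.wfile.write(value)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server() -> ThreadingHTTPServer:
    """Start the store on 127.0.0.1 with an OS-assigned port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put(port: int, key: str, value: bytes) -> int:
    """Store a value; returns the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("PUT", f"/{key}", body=value)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def get(port: int, key: str) -> tuple[int, bytes]:
    """Retrieve a value; returns (status code, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", f"/{key}")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A client application would call `put(port, "greeting", b"hello")` and later `get(port, "greeting")`. Any other language with an HTTP client could talk to the same service.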
Simple message passing would do: say, JSON over ØMQ, or alternatively msgpack-rpc, protobuf-remote or Cap'n Proto RPC.

How to (programmatically or by other means) encrypt or protect customer data

I am working on a web project and I want to (as far as possible) handle user data in a way that reduces damage to the users' privacy in case someone compromises our servers/databases.
Of course we only have user data that is needed for the website to do its job, but because of the nature of the project we have quite a bit of information on our users (part of the functionality is applying for jobs and sending your CV along).
We thought about encrypting/decrypting sensitive data with a private/public keypair of which the private key is encrypted with the user's password, but found some security and implementation problems with that :P
The question is: how do you implement user privacy and protection against data theft on a centralised web server with browser-compatible protocols, when the functionality requires that users can exchange sensitive data?
To give some additional insight: this project is not yet in the production stage, so there is still time to make things right.
We are already doing some basic stuff, like:
serving HTTPS
enforcing HTTPS for sites that may handle sensitive data
hashing salted passwords
some hardening of our server and the services on it
encrypted hard drives, to prevent someone from reading all client information after stealing our servers / hard drives
But that's about it. Besides the password hashes, there is no mechanism that would stop someone who managed to get into (part of) the server from obtaining all the data on all our users, or at least make it harder. Nor do we see a way to encrypt user data so that we ourselves cannot read it, as we need the data (we wouldn't have collected it otherwise) for some of the functionality the website is meant to provide.
Even if we somehow managed (maybe with some JavaScript) to have all data reach us already encrypted by the client's browser, and served the client his private key encrypted with some passphrase (for example his login password), we could not, for example, scan user-uploaded files for viruses and the like. On the other hand, client-side encryption, at least within the browser/web server concept, would leave some security issues as we imagine it (you are welcome to prove me wrong), and it seems quite like reinventing the wheel. As this project is not primarily about privacy, but rather privacy is a preferable property, we might not want to reinvent the wheel for it.
I strongly believe I am not the first web developer thinking about this, am I? So what have other projects done? What have you done to try to protect your users' data?
If relevant: we are using Django and PostgreSQL for most things and JavaScript for some UI.
The common way to deal with this issue is to split (partition) your data.
Keep minimal data on the Internet-facing web server and pass any sensitive data as quickly as possible to another server that is kept inside a second firewall. Often, data is pulled from the web server by the internal secure server to further increase security. This is how banks and finance houses handle sensitive data from the internet (or at least they should). There is even a set of standards (PCI) that cover the secure handling of credit card transactions that explain all of this in mind-numbing detail.
To further secure the internal server, you can put it on a separate network and secure physical access to it. You can also focus other security tools on it, such as Data Loss Prevention and Intrusion Prevention.
In addition, if you have any data that you don't need to see in the clear, use a client-side encryption library to encrypt it locally. There are still risks of course, since the user's workstation might be compromised by malware, but it removes the risks during data transmission and in server storage. It also puts responsibility onto the user rather than just onto your central servers.
You already seem to be a long way ahead of most web developers in ensuring that your customers are kept safe and secure. One other small change worth considering would be to turn on enforced HTTPS for all transactions with your site. That way, there is very little chance of unexpected data leakage, such as data being unexpectedly cached.
UPDATE:
Client side encryption can help a lot since it puts the encryption responsibility on the user. Check out LastPass for example. Without doing the encryption client-side, you could never trust the service. Similarly with backup services where you set your key locally so that the backups can never be unlocked by someone on the server - they never have the key.
Partitioning is one of the primary methods for enterprises to secure services that have Internet-facing components. As I said, typically the secure server PULLs data from the less secure one, so the less secure server can never have any access to anything more secure, even if fully compromised. Indeed, there will be a firewall that prevents any traffic from the DMZ (where the less secure service is located) getting to the secure network. Only connections from the secure side are allowed through, and they will be tightly controlled by security processes. In a typical bank or other high-security setting, you may well find several layers like this, each having separate security controls, all partitioned from each other, enforcing separation of data and security.
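As a rough illustration of the pull direction, here is a toy Python model. The class names `DmzInbox` and `SecureStore` are invented for this sketch, and in a real deployment the two would run on separate machines, with the pull happening over a connection opened from the secure side.

```python
from collections import deque


class DmzInbox:
    """Holds submissions on the Internet-facing server until they are collected."""

    def __init__(self):
        self._pending = deque()

    def submit(self, record: dict) -> None:
        self._pending.append(record)

    def drain(self) -> list:
        """Hand over everything pending and forget it locally."""
        records = list(self._pending)
        self._pending.clear()
        return records

    def pending_count(self) -> int:
        return len(self._pending)


class SecureStore:
    """Lives behind the second firewall; it initiates every transfer."""

    def __init__(self):
        self.records = []

    def pull_from(self, inbox: DmzInbox) -> int:
        # The inbox never holds a reference to the store, so a compromised
        # front end has nothing it can use to reach the secure side.
        pulled = inbox.drain()
        self.records.extend(pulled)
        return len(pulled)
```

The point of the design is that the dependency runs one way: `SecureStore` knows about `DmzInbox`, never the reverse, and the front end keeps sensitive records only until the next pull.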
Hope that adds some clarity. Continue to ask if not!
UPDATE 2:
Even for simple, low-cost setups, I would still recommend partitioning. For a low-cost version, consider having two virtual servers with the dedicated firewall replaced by careful control of the software firewall on the more secure server. Follow the same principles outlined above for everything else.

How to 'web enable' a legacy C++ application

I am working on a system that splits users by organization. Each user belongs to an organization. Each organization stores its data in its own database which resides on a database server machine. A db server may manage databases for 1 or more organizations.
The existing (legacy) system assumes there is only one organization, however I want to 'scale' the application by running an 'instance' of it (tied to one organization), and run several instances on the server machine (i.e. run multiple instances of the 'single organization' application - one instance for each organization).
I will provide a RESTful API for each instance that is running on the server, so that a thin client can be used to access the services provided by the instance running on the server machine.
Here is a simple schematic that demonstrates the relationships:
Server 1 -> N databases (each organization has one database)
Organization 1 -> N users
My question relates to how to 'direct' RESTful requests from a client, to the appropriate instance that is handling requests from users for that organization.
More specifically, when I receive a RESTful request, it will be from a user (who belongs to an organization), how (or indeed, what is the best way) to 'route' the request to the appropriate application instance running on the server?
From what I can gather, this is essentially a sharding problem. Regardless of how you split the instances at a hardware level (using VMs, multiple servers, all on one powerful server, etc), you need a central registry and brokering layer in your overall architecture that maps given users to the correct destination instance per request.
There are many ways to implement this of course, so just choose one that you know and is fast, and will scale, as all requests will come through it. I would suggest a lightweight stateless web application backed by a simple read only database that does the appropriate client identifier -> instance mapping, which you would load into memory/cache. To add flexibility on hardware and instance location, use (assuming Java) JNDI to store the hardware/port/etc information for each instance, and in your identifier mapping map the client identifier to the appropriate JNDI lookup key.
Letting the public API only specify the user sounds a little fragile to me. I would change the public API so that requests specify organization as well as user, and then have something trivial server-side that maps organizations to instances (eg. organization foo -> instance listening on port 7331).
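A minimal sketch of such a mapping in Python, assuming the public API puts the organization in the URL as `/orgs/<organization>/...`. That URL layout, the `INSTANCES` table and the second port number are assumptions made for illustration; only the "foo on port 7331" pairing comes from the text above.

```python
from urllib.parse import urlsplit

# Hypothetical registry: organization identifier -> (host, port) of its instance.
INSTANCES = {
    "foo": ("127.0.0.1", 7331),
    "bar": ("127.0.0.1", 7332),
}


def route(request_path: str) -> tuple[str, str]:
    """Map /orgs/<org>/<rest> to (instance base URL, path to forward)."""
    split = urlsplit(request_path)
    parts = split.path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "orgs":
        raise ValueError(f"path does not name an organization: {request_path!r}")
    org = parts[1]
    if org not in INSTANCES:
        raise LookupError(f"no instance registered for organization {org!r}")
    host, port = INSTANCES[org]
    # Everything after the organization segment is passed on unchanged.
    rest = "/" + "/".join(parts[2:])
    if split.query:
        rest += "?" + split.query
    return f"http://{host}:{port}", rest
```

A reverse proxy or a thin front controller could call `route` on each incoming request and forward it to the returned base URL. Loading `INSTANCES` from a small read-only database would match the registry idea in the earlier answer.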
That is a very tough question indeed; simply because there are many possible answers, and which one is the best can only be determined by you and your environment.
I would write an apache module in C++ to do that. Using this book, I managed to start writing very efficient modules.
To be able to give you more solutions (maybe just setting up a Squid proxy?), you'll need to specify how you will determine which server to redirect the client to: by IP, through a GET param, through a POST XML param (like SOAP), etc.
As the other answer says, there are many ways to approach this issue. Let's assume that you DON'T have access to the legacy software's source code, which means you cannot modify it to listen on different ports for different instances.
Writing an Apache module seems VERY extreme to solve this issue (and as someone who actually just finished writing a production Apache module, I suggest avoiding it unless you are making serious money).
The approach can be as esoteric as you like. For instance, if your legacy software runs on normal Intel architecture and you have the hardware capacity, there are VM solutions: you should be able to create thin virtual machines, each running a single instance of the software, and a multiplexer to tie them all together.
If, on the other hand, you are running something like HP-UX, well :-) there are other approaches. How about you give a bit more detail?
Ahmed.

Difference between frontend, backend, and middleware in web development

I was wondering if anyone can compare/contrast the differences between frontend, backend, and middleware ("middle-end"?) succinctly.
Are there cases where they overlap?
Are there cases where they MUST overlap, and frontend/backend cannot be separated?
In terms of bottlenecks, which end is associated with which type of bottlenecks?
Here is one breakdown:
Front-end tier -> User Interface layer usually consisting of a mix of HTML, Javascript, CSS, Flash, and various server-side code like ASP.Net, classic ASP, PHP, etc. Think of this as being closest to the user in terms of code.
Middleware, middle-tier -> One tier back, generally referred to as the "plumbing" part of a system. Java and C# are common languages for writing this part that could be viewed as the glue between the UI and the data and can be webservices or WCF components or other SOA components possibly.
Back-end tier -> Databases and other data stores are generally at this level. Oracle, MS-SQL, MySQL, SAP, and various off-the-shelf pieces of software come to mind for this piece of software that is the final processing of the data.
Overlap can exist between any of these as you could have everything poured into one layer like an ASP.Net website that uses the built-in AJAX functionality that generates Javascript while the code behind may contain database commands making the code behind contain both middle and back-end tiers. Alternatively, one could use VBScript to act as all the layers using ADO objects and merging all three tiers into one.
Similarly, middleware can be combined with either the front end or the back end in some cases.
Bottlenecks generally have a few different levels to them:
1) Database or back-end processing -> This can range from payroll to sales to other tasks where the throughput to the database is bogging things down.
2) Middleware bottlenecks -> This would be where some web service may be hitting capacity but the front and back ends have bandwidth to handle more traffic. Alternatively, there may be some server that is part of a system, not quite the UI part or the raw data, that can be a bottleneck, using something like BizTalk or MSMQ.
3) Front-end bottlenecks -> This could be client- or server-side issues. For example, if you took a low-end PC and had it load a web page that consisted of a lot of data being downloaded, the client could be where the bottleneck is. Similarly, the server could be queuing up requests if it is getting hammered with requests, like what Amazon.com or other high-traffic websites may get at times.
Some of this is subject to interpretation, so it isn't perfect by any means and YMMV.
EDIT: Something to consider is that some systems can have multiple front-ends or back-ends. For example, a content management system will likely have a way for site visitors to view the content that is a front-end but what about how content editors are able to change the data on the site? The ability to pull up this data could be seen as front-end since it is a UI component or it could be seen as a back-end since it is used by internal users rather than the general public viewing the site. Thus, there is something to be said for context here.
Generally speaking, people refer to an application's presentation layer as its front end, its persistence layer (database, usually) as the back end, and anything between as middle tier. This set of ideas is often referred to as 3-tier architecture. They let you separate your application into more easily comprehensible (and testable!) chunks; you can also reuse lower-tier code more easily in higher tiers.
Which code is part of which tier is somewhat subjective; graphic designers tend to think of everything that isn't presentation as the back end, database people think of everything in front of the database as the front end, and so on.
Not all applications need to be separated out this way, though. It's certainly more work to have 3 separate sub-projects than it is to just open index.php and get cracking; depending on (1) how long you expect to have to maintain the app (2) how complex you expect the app to get, you may want to forgo the complexity.
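A compact sketch of the three-tier split described above, in Python with an in-memory SQLite database. The invoice domain and every class and function name here are invented for illustration. The point is only that each tier talks to the one directly beneath it.

```python
import sqlite3


# Back end: persistence only.
class InvoiceRepository:
    def __init__(self, connection: sqlite3.Connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS invoices (customer TEXT, amount_cents INTEGER)"
        )

    def add(self, customer: str, amount_cents: int) -> None:
        self._db.execute("INSERT INTO invoices VALUES (?, ?)", (customer, amount_cents))

    def amounts_for(self, customer: str) -> list[int]:
        rows = self._db.execute(
            "SELECT amount_cents FROM invoices WHERE customer = ?", (customer,)
        )
        return [row[0] for row in rows]


# Middle tier: business rules, no SQL and no formatting.
class BillingService:
    def __init__(self, repository: InvoiceRepository):
        self._repository = repository

    def balance_cents(self, customer: str) -> int:
        return sum(self._repository.amounts_for(customer))


# Front end: presentation only.
def render_balance(customer: str, balance_cents: int) -> str:
    return f"{customer} owes {balance_cents / 100:.2f}"
```

Because `BillingService` only depends on the repository's methods, it can be tested with a fake repository. For a small script, collapsing all three into one file of direct queries may be the more sensible trade-off, as noted above.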
There are in fact 3 questions in your question:
Define frontend, middle and back end.
How and when do they overlap?
Their associated usual bottlenecks.
What JB King has described is correct, but it is a particular, simple version, where in fact he mapped front, middle and back to an MVC layer.
He mapped M to the back, V to the front, and C to the middle.
For many people, it is just fine, since they come from the ugly world where even MVC was not applied, and you could have direct DB calls in a view.
However in real, complex web applications, you indeed have two or three different layers, called front, middle and back. Each of them may have an associated database and a controller.
The front-end will be visible to the end user. It should not be confused with the front-office, which is the UI for parameters and administration of the front. The front-end will usually be some kind of CMS or e-commerce platform (Magento, etc.).
The middle-end is not compulsory and is where the business logic is. It will be based on a PIM, an MDM tool, or some kind of custom database where you enrich your products or your articles (for a CMS). It'll also be the place where you code business functions that need to be shared between different frontends (for instance between the PC frontend and the API-based mobile application). Sometimes, an ESB or a tool like ActiveMQ will be your middle-end.
The back-end will be a 3rd layer, surrounding your source database or your ERP. It may be just the API writing to and reading from your ERP. It may be your supplier DB, if you are doing e-commerce. In fact, it really depends on the web project, but it is always a central repository. It'll be accessed either through a DB call, through an API, through a Hibernate layer, or through a full-featured back-end application.
This description means that answering the other 2 questions is not possible in this thread, as bottlenecks really depend on what your 3 ends contain: what JB King wrote remains true for simple MVC architectures.
At the time the question was asked (5 years ago), maybe the MVC pattern was not yet so widely adopted. Now, there is absolutely no reason not to follow the MVC pattern, or to tie a view to DB calls.
If you read the question "Are there cases where they MUST overlap, and frontend/backend cannot be separated?" in a broader sense, with 3 different components, then there are times when the 3-layer architecture is useless, of course. Think of a simple personal blog: you'll not need to pull external data or poll RabbitMQ queues.
Here is a real world example which shows front/mid/back end.
General description:
Frontend is responsible for presenting data to the user. Please note the interesting quirk that you may have two different front ends associated with a single backend.
Backend provides business logic/data persistence.
Middleware (ActiveMQ in the picture) is responsible for system-to-system integration between backends. Usually it is installed as a separate application.
Overlapping:
It is possible to have overlap between frontend and backend. This usually leads to long-term issues with application maintenance and scalability. It is fairly common in legacy applications.
Most modern technology stacks encourage developers to have strict separation. For example, in the picture you can see that the backend of the first system has a REST web service, which is a clear separation line.
Bottlenecks
Most bottlenecks in large systems are caused by the database or the network. Databases are located in the backend. As for network issues, every connection goes through the network, so every connection has the potential to be slow. With good application design these issues are avoidable to a large extent.
In terms of networking and security, the backend is (or should be) by far the most secure node.
The middle-end portion, usually being a web server, will be somewhat in the wild and cut off in many respects from a company's network. The middle-end node is usually placed in the DMZ and segmented from the network with firewall settings. Most of the server-side code parsing of web pages is handled on the middle-end web server.
Getting to the backend means going through the middle-end, which has a carefully crafted set of rules allowing/disallowing access to the vital nummies which are stored on the database (backend) server.
Frontend refers to the client side, whereas backend refers to the server side of the application. Both are crucial to web development, but their roles, responsibilities and the environments they work in are totally different. Frontend is basically what users see, whereas backend is how everything works.
Frontend -> This is the client side of a website, where a user interacts with the server through the user interface. It is generally built using HTML and CSS.
Middleware -> Middleware is the software or service that lets the parts of the system communicate and manage data. It handles the communication between components and input/output.
Backend -> The backend is the server side of an application, which consists of all the functions and operations performed on data. This part is considered to be the most essential part of any application. Only the server admin has access to it. It mainly consists of databases and servers.

Identifying ASP.NET web service references

At my day job we have load balanced web servers which talk to load balanced app servers via web services (and lately WCF). At any given time, we have 4-6 different teams that have the ability to add new web sites or services or consume existing services. We probably have about 20-30 different web applications and corresponding services.
Unfortunately, given that we have no centralized control over this due to competing priorities, org structures, project timelines, financial buckets, etc., it is quite a mess. We have a variety of services that are reused, but a bunch that are specific to a front-end.
Ideally we would have better control over this situation, and we are trying to get control over it, but that is taking a while. One thing we would like to do is find out more about all of the inter-relationships between the web sites and the app servers.
I have used Reflector to find dependencies among assemblies, but would like to be able to see the traffic patterns between services.
What are the options for trying to map out web service relationships? For the most part, we are mainly talking about internal services (web to app, app to app, batch to app, etc.). Off the top of my head, I can think of two ways to approach it:
1) Analyze assemblies for any web references. The drawback here is that not everything is a web reference and I'm not sure how WCF connections are listed. However, this would at least be a start for finding 80% of the connections. Does anyone know of any tools that can do that analysis? Like I said, I've used Reflector for assembly references but can't find anything for web references.
2) Possibly tap into IIS and passively monitor the traffic coming in and out and somehow figure out what is being called and where from. We are looking at enterprise tools that could help, but it would be a while before they are implemented (and they cost a lot). But is there anything out there that could help out quickly and cheaply? One tool in particular (AmberPoint) can tap into IIS on the servers, monitor inbound and outbound traffic, add a little special sauce and begin to build a map of the traffic. Very nice, but it costs a bundle.
I know, I know, how the heck did you get into this mess in the first place? Beats me, just trying to help us get control of it and get out of it.
Thanks,
Matt
The easiest way is to look through the logs, but if they don't include the referrer then you may also want to monitor what is going out from your web server to the app server. You can use tools like Wireshark or Microsoft Network Monitor to see this traffic.
The other "solution" (and I use this loosely) is to bind a specific web server to an app server, then run through a bundle and see what it is hitting on the app server. You could probably do this in a test environment to lessen the effects on the users of the site.
You need a service registry (UDDI??)... If you had a means to catalog these services and their consumers, it would make this job of dependency discovery a lot easier. That is not an easy solution, though. It takes time and documentation to get a catalog in place.
I think the quickest solution would be to query your IIS logs and find source URLs which originate from your own servers. You would at least be able to track down which servers your consumers are coming from.
Also, if you already have some kind of authentication mechanism in place, you could trace who is using a particular service based on login.
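The log-querying idea could be sketched in Python roughly as follows, assuming the IIS logs are in the W3C extended format with a `#Fields:` directive and that the `c-ip` and `cs-uri-stem` fields are enabled. The sample addresses and service paths are made up for illustration.

```python
from collections import Counter


def count_callers(log_lines):
    """Count (client IP, requested URI) pairs in W3C extended log lines."""
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        entry = dict(zip(fields, line.split()))
        if "c-ip" in entry and "cs-uri-stem" in entry:
            counts[(entry["c-ip"], entry["cs-uri-stem"])] += 1
    return counts


# Hypothetical excerpt from an app server's log.
SAMPLE_LOG = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time s-ip cs-method cs-uri-stem c-ip sc-status",
    "2010-01-01 00:00:01 10.0.0.5 POST /Billing/Invoice.asmx 10.0.0.21 200",
    "2010-01-01 00:00:02 10.0.0.5 POST /Billing/Invoice.asmx 10.0.0.21 200",
    "2010-01-01 00:00:03 10.0.0.5 POST /Crm/Customer.svc 10.0.0.22 200",
]
```

Running `count_callers` over the logs of each app server and matching the client IPs to known web servers would give a first, rough map of which front end calls which service.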
You are right about AmberPoint. There are other tools that catalog service traffic and provide reports showing what is happening to your services. Systinet, SOA Software and Actional also have products similar to AmberPoint, but AmberPoint has a freeware version, I believe.