Offline use of semantic web vocabularies

There are some useful vocabularies out there for use in semantic web applications, one of which is the well-known FOAF.
How should I use it in an offline system, meaning a network disconnected from the WWW?
Is it downloadable? Should I use some DNS "trickery" in my network? Is it possible at all?

Based on the comments above and some further reading, the solution seems to be to download the necessary ontologies and save them to the cache of whichever tool you're using.
Whenever an ontology is needed, the cache will be checked first, so no access to the network will be made.
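A minimal, tool-agnostic sketch of that "cache first, never the network" idea, assuming you can fetch the ontology files once (or copy them in by hand) before the machine goes offline. The class `OfflineOntologyCache` and its file-naming scheme are hypothetical; real RDF tools ship their own cache or catalog mechanisms, which you should prefer where available.

```python
import hashlib
from pathlib import Path


class OfflineOntologyCache:
    """Hypothetical local cache mapping an ontology URI to a pre-downloaded file."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, uri):
        # Hash the URI so any IRI maps to a safe, stable file name.
        digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.rdf"

    def store(self, uri, data):
        # Run this while still connected (or when copying files in) to seed the cache.
        self._path_for(uri).write_bytes(data)

    def load(self, uri):
        # Always answer from the cache; there is deliberately no network fallback.
        path = self._path_for(uri)
        if not path.exists():
            raise LookupError(f"{uri} is not in the offline cache")
        return path.read_bytes()
```

For FOAF you would seed it once with something like `cache.store("http://xmlns.com/foaf/0.1/", foaf_bytes)` and from then on resolve that namespace with `cache.load(...)`.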

Related

Is it possible to use Django and Node.js?

I have a Django backend set up for user logins and user management, along with my entire set of templates, which are used to display HTML pages to visitors of the site. However, I am trying to add real-time functionality to my site, and I found a perfect Node.js library that allows two users to type in a text box and have the text appear on both their screens. Is it possible to merge the two backends?
It's absolutely possible (and sometimes extremely useful) to run multiple back-ends for different purposes. However, it opens up a few cans of worms, depending on how much rigour your system is expected to have, who's on your team, etc.:
State. You'll want session state to be shared between the different app servers. The easiest way to do this is to store session state externally in a framework-agnostic format. I'd suggest JSON objects in a key/value store, and you'll probably benefit from JSON Schema.
Domains/routing. You'll need your login cookie to be available to both app servers, which means either a single domain routed by Apache/Nginx or separate subdomains routed via DNS. I'd suggest separate subdomains for the following reason:
WebSockets. I may be out of date, but to my knowledge neither Apache nor Nginx supports proxying WebSockets, which means that if you want to use them you'll sacrifice the flexibility of using an HTTP server as an app proxy and will instead have to expose Node directly via a subdomain.
Non-specified requirements. Things like monitoring, logging, error notification, build systems, testing, continuous integration/deployment, documentation, etc. all need to be extended to support a new type of component.
Skills. You'll have to pay in time or money for the skill-sets required to manage a more complex application architecture.
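The State and Domains/routing points above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the shared key/value store (e.g. Redis), `SESSION_SCHEMA` is a hand-rolled stand-in for a real JSON Schema, and the cookie name `sessionid` and domain `.example.com` are hypothetical. The point is that the session is plain JSON any framework can read, and the cookie is scoped to the parent domain so both subdomains receive it.

```python
import json
import secrets
from http.cookies import SimpleCookie

# Stand-in for a shared key/value store; Django and Node would use the same keys.
KV_STORE = {}

# Stand-in for a JSON Schema: required fields and their expected types.
SESSION_SCHEMA = {"user_id": int, "username": str}


def validate(session):
    for key, expected in SESSION_SCHEMA.items():
        if not isinstance(session.get(key), expected):
            raise ValueError(f"session field {key!r} missing or not {expected.__name__}")


def save_session(session):
    validate(session)
    session_id = secrets.token_hex(16)
    # Plain JSON, so either back-end can deserialize it.
    KV_STORE[f"session:{session_id}"] = json.dumps(session)
    return session_id


def load_session(session_id):
    raw = KV_STORE.get(f"session:{session_id}")
    if raw is None:
        return None
    session = json.loads(raw)
    validate(session)
    return session


def session_cookie(session_id, parent_domain):
    # Scoped to the parent domain, the cookie is sent to every subdomain,
    # e.g. www.example.com (Django) and live.example.com (Node).
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["domain"] = parent_domain
    cookie["sessionid"]["path"] = "/"
    cookie["sessionid"]["httponly"] = True
    return cookie.output(header="Set-Cookie:")
```

The Node side would then read the `sessionid` cookie, fetch `session:<id>` from the same store and parse it with `JSON.parse`.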
So, my advice would be to think very carefully about whether you need this. There can be a lot of time and thought involved.
Update: There are actually companies springing up that specialise in adding real-time features to existing sites. I'm not going to name any names, but if you search for 'real-time' on the add-on marketplaces of hosting platforms (e.g. Heroku), you'll find them.
Update 2: Nginx now supports WebSockets.
You can't merge them. You can, however, send messages from Django to Node.js through a queue system such as Redis.
If you really want to use two backends, you could use a database that is supported by both.
I would not recommend it, though.
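A rough sketch of the message-passing idea, assuming JSON messages on a shared queue. Here Python's in-process `queue.Queue` stands in for Redis and a thread stands in for the Node.js consumer, so it runs without any servers; with the redis-py client the `put` would become something like a list push or a pub/sub `publish`. The event name and payload fields are hypothetical.

```python
import json
import queue
import threading

# Stand-in for a Redis list or channel shared by Django and Node.js.
events = queue.Queue()
STOP = "__stop__"


def publish(event_type, payload):
    # Django side: serialize to JSON so a Node.js consumer can parse it.
    events.put(json.dumps({"type": event_type, "payload": payload}))


def consume(handled):
    # Node.js side, simulated in a thread: block until each message arrives.
    while True:
        raw = events.get()
        if raw == STOP:
            break
        handled.append(json.loads(raw))


handled = []
worker = threading.Thread(target=consume, args=(handled,))
worker.start()
publish("text_changed", {"doc": 7, "text": "hello"})
events.put(STOP)
worker.join()
```

The design choice is that neither back-end calls the other directly; each only agrees on the queue name and the JSON message shape.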

File web service architecture

I need to implement a web service that can provide requested files to other internal applications or components running on different networks. The files are dispersed across different servers in different locations and can be as big as a few gigabytes.
I am thinking of creating a RESTful web service that will locate the file, redirect the HTTP request to another web service at the file's location, and send the file via HTTP.
Is it a good idea to send the file via HTTP, or would it be better for the web service to copy the file to a location where the requesting component can access it?
The biggest problem with distributing large files over HTTP is that you will come across all sorts of limits that prevent it. As a simple example, WCF allows you to configure a maximum payload size, but only up to 2 GB. You will likely run into issues like this in every layer of your stack. I doubt any of them are insurmountable (to work around the above limitation you can stream chunks of the file rather than the entire file, although that introduces its own problems), but you will likely see lots of timeouts and random failures, each fixed by tweaking the configuration of this or that service or client.
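Streaming chunks rather than the whole file can look roughly like this: a generator that never holds more than one chunk in memory. The 1 MiB chunk size is an assumed value to tune for your stack; a WSGI application, for instance, may return an iterable like this as its response body.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB; an assumption, tune per stack


def iter_file_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield a binary file piece by piece so no layer holds the whole payload."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        yield chunk
```

Typical use is `with open(path, "rb") as f:` followed by writing each chunk to the response (the response object depends on your framework).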
Also, when dealing with large files, you have to carefully consider how you deal with the inevitable failures during transfer (e.g. the network drops out). Depending on the specific technologies you use, they may have some "resume" functionality, but you will want to be sure this is reliable before committing to it.
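One common way to get "resume" over plain HTTP is to look at how many bytes already arrived and ask only for the rest with a `Range` header. This is a minimal sketch using the standard library; the function names are hypothetical, and it assumes the server honours range requests (a `206 Partial Content` reply). If the server answers `200` instead, it sent the whole file and the partial copy is overwritten.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request for only the bytes that are still missing locally."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return urllib.request.Request(url, headers=headers), offset


def resume_download(url, partial_path, chunk_size=1024 * 1024):
    req, offset = build_resume_request(url, partial_path)
    # Note: urlopen raises HTTPError (416) if the file was already complete.
    with urllib.request.urlopen(req) as resp:
        # 206: server honoured the Range header, so append to what we have.
        # 200: server sent the full file, so start the local copy over.
        mode = "ab" if offset and resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download` again after a dropped connection picks up from the current size of the partial file.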
One possibility would be to do what Facebook does when distributing large binaries: use BitTorrent. Your web service then serves a torrent of the file, not the file itself. The big advantages of BitTorrent are that it is very robust and scales well. It's worth considering, but it will depend a lot on your environment and specific workload.
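Much of BitTorrent's robustness comes from splitting a file into fixed-size pieces and publishing a SHA-1 hash for each (as in the v1 metainfo format), so a corrupt or truncated piece is detected and simply fetched again, possibly from another peer. The sketch below shows only that verification idea; it is not a torrent implementation (no bencoding, no peer protocol), and it reads the data into memory for brevity.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # an assumed piece size; real torrents vary


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece; the last piece may be shorter."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify_piece(index, piece, hashes):
    # A piece that fails this check is discarded and re-requested.
    return hashlib.sha1(piece).digest() == hashes[index]
```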
If the files you are going to serve rarely or never change, many strategies are open to you, from the one RB suggested to plain HTTP, which supports partial-content (range) requests; see RFC 2616.
Depending on your usage scenario, I would also suggest taking a look at Amazon Web Services S3 (Simple Storage Service), which probably already does what you are trying to do; it's cheap and highly available.
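On the serving side, partial-content support boils down to parsing the `Range` header and replying `206` with a matching `Content-Range`. This is a simplified sketch that handles only a single `bytes=` range (range requests were defined in RFC 2616 and are now specified in RFC 9110); multi-range requests are deliberately ignored, which the spec permits, and the function names are hypothetical.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header, size):
    """Return an inclusive (first, last) byte pair, or None to ignore the header.

    None means: send the whole file with 200. ValueError means: answer 416.
    """
    match = _RANGE_RE.match(header.strip())
    if not match:
        return None  # malformed or multi-range: ignore it
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        suffix = int(last)  # "bytes=-500" means the final 500 bytes
        if suffix == 0 or size == 0:
            raise ValueError("416 Range Not Satisfiable")
        return max(size - suffix, 0), size - 1
    start = int(first)
    end = int(last) if last else size - 1
    if last and end < start:
        return None  # syntactically invalid, so ignore it
    if start >= size:
        raise ValueError("416 Range Not Satisfiable")
    return start, min(end, size - 1)


def content_range(start, end, size):
    # Value for the Content-Range header of a 206 response.
    return f"bytes {start}-{end}/{size}"
```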

Best approach(es) or technolog(y/ies) for this specific problem?

I have a web-based interface for handling invoices, customer records and other transaction records, which currently interacts with a database holding all of the above on the same machine. As you can imagine, this is quite a simple set-up consisting of a web app (PHP) and a database (MySQL). However, the ideal scenario is to keep the records on the machine they are currently on (easy) and move the web app to another server within the same network (again, easy) ... but, in addition, provide facilities on a public-facing website for customers to manage their accounts and so forth. The problem is this - the public-facing web server is in a completely separate location, as it is a dedicated server provided by a well-known ISP.
What would be the best way to make the records accessible from this other server while ensuring that all communications are secure? Speed is not a huge factor, although any outages on either side should be handled gracefully. Initially my thoughts went towards web services (XML-RPC/SOAP/Hessian), but these options seem to present difficulties (security being the main one, over-complexity as well).
The web app must remain PHP-based. The public-facing site is likely to be PHP-based as well, although Python (likely using Django) is another option. Introducing other technologies (Java etc.) is not a problem, although Linux-friendly ones are preferred (so .NET would not be the best fit here).
Apologies if this question is somewhat verbose and vague. I am testing the water with regard to this kind of problem. Any advice or suggestions gratefully received.
I've done something similar. You can expose a web service to the internet that does the database access, but requests to the service must match a strongly hashed and salted password (which is kept secure on the ISP's server in the DMZ).
Either that or some sort of public/private key encryption scheme.
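A minimal sketch of the salted-hash check using only the standard library, assuming the service that receives requests stores a random salt and a PBKDF2 digest rather than the password itself. The iteration count is an assumed work factor, and this covers only credential checking: the connection between the two servers still needs TLS so the password is never sent in the clear.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; choose what your hardware can bear


def hash_password(password, salt=None):
    """Return (salt, digest); store both, never the plain password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, stored_digest)
```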
OK, this might seem a bit silly, but what if you just used MySQL replication?
Instead of using all sorts of fancy web services, just have a master MySQL server on one machine and have it replicate to another server that hosts the slave MySQL server as well as the web app.

Offline web application

I’m thinking about building an offline-enabled web application.
The architecture I’m considering is as follows:
Web server (remote) <--> Web server/cache (local) <--> Browser/Prism
The advantages I envision for this model are:
Deployment is web-based, with all the advantages of this approach
Offline-enabled
UI (html/js) synchronization is a non-issue
Data synchronization can be mostly automated as long as I stay within a RESTful paradigm; I can break this as required, but manual synchronization would remain largely surgical
The local web server is started as a service; I can run arbitrary code, including behind-the-scenes data synchronization
I have complete control of the data (location, no size limit, no possibility of the user deleting it unknowingly)
Prism with an extension could allow me to keep the JavaScript closed source
Any thoughts on this architecture? Why should I / shouldn’t I use it? I'm particularly looking for success/horror stories.
The long version
Notes:
Users are not very computer-literate. For instance, even superficially explaining how Gears works is totally out of the question.
I WILL be held liable if data is lost, even if it’s really the user’s fault (short of him deleting random directories on his machine)
I can require users to install something on their machine. It doesn’t have to be 100% web-based and/or run in a sandbox
The common solutions to this problem don’t feel adequate somehow. Here is a short analysis of each.
Gears/HTML5:
no control over data; it can be deleted by users without any warning
no control over location of data (not uniform across browsers and platforms)
users need to open the application in a browser for synchronization to happen; no automatic, behind-the-scenes synchronization
different browsers are treated differently; no uniform view of data on a single machine
limited disk space available
synchronization is completely manual, and SQL-based storage makes this a pain (it would be less complicated if SQL tables were completely replicated, but that’s not the case for me). This is a very complex problem.
my code would be almost completely open source (HTML/JS)
Adobe AIR:
some of the above
no server-side includes (!)
can run in the background, but not windowless
manual synchronization
web caching seems complicated
feels like a kludge somehow; I’ve had trouble installing it on some machines
My requirements are:
Web-based (must). For a number of reasons, sharing data between users for instance.
Offline (must). The application must be fully usable offline (w/ some rare exceptions).
Quick development (must). I’m a single developer going against players with far more business resources.
Closed source (nice to have). Yes, I understand the open source model. However, at this point I don’t want competitors to copy me too easily. Again, they have more resources, so they could take my hard work and make it better in less time than I could myself. Obviously, they can still copy me by developing their own code; that is fine.
Horror stories from a CRM product:
If your application is heavily used, storing a complete copy of its data on a user's machine is unfeasible.
If your application features data that can be updated by many users, replication is not simple. If three users with local changes sync, who wins?
In reality, this isn't really what users want. They want real-time access to the most current data from anywhere. We had better luck offering a mobile interface to a single source of truth.
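The "who wins?" question can be made concrete with optimistic concurrency: each record carries a version number, and an offline edit is applied only if it was based on the version the server still has. This is a generic technique, not what the CRM product above used, and the names are hypothetical. Note that it does not decide who wins; it only detects that there is a conflict instead of silently letting the last writer overwrite everyone else.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    version: int = 0


def sync_change(server, record_id, new_value, base_version):
    """Apply an offline edit only if nobody else changed the record meanwhile.

    Returns True if applied, False on conflict (someone must then resolve it).
    """
    record = server[record_id]
    if record.version != base_version:
        # Another user synced first; applying blindly would discard their edit.
        return False
    record.value = new_value
    record.version += 1
    return True
```

With three users who all went offline at version 0, the first to sync succeeds and the other two get a conflict to resolve.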
The part about running the local Web server as a service appears unwise. Besides the fact that you are tied to certain operating environments being available on the client, you are also imposing the additional burden of managing the server on the end user. Additionally, the local Web server itself cannot be deployed in a Web-based model.
All in all, I am not too thrilled by the prospect of a real "local Web server". I admit a certain bias here, no doubt because I have proposed embedded Web servers that run inside a Web browser as part of my proposal for seamless offline Web storage. See BITSY 0.5.0 (http://www.oracle.com/technology/tech/feeds/spec/bitsy.html)
I wonder how essential your requirement to prevent data loss at any cost really is. What happens when you are offline and the disk crashes? Or the device is lost? In general, you want the local cache to be as little ahead of the server as possible, but you should be prepared to tolerate data loss to the extent that the server is behind the client. This may involve some amount of contractual negotiation or training. In practice, this may not be a deal-breaker.
The only way to do this reliably is to offer some sort of "check out and lock" at the record level. When a user is going remote, they must check out the records they want to work with. The checkout copies the data to a local DB and prevents the record in the central DB from being modified while it is checked out.
When the roaming user reconnects and checks their locked records back in, the data is updated in the central DB and the records are unlocked.

Dependencies on external (web) services

I'm currently involved in a project where we are developing a large website that relies heavily on an external service (for some functionality) developed by another company. The external service occasionally breaks and doesn't provide us with the data that we need. This is a major problem for us since the requirements on "our" website are very high.
How should we handle this? We are reluctant to cache data from the external site to use as a "backup" since we might then display data that is outdated or wrong. We also feel that we should not try to "patch" problems in an external system by storing local copies of the external data, since that could lead to synchronization problems where the local data is out of date or wrong.
Does anyone have any similar experiences? Any ideas how we solve this (or at least mitigate the problem)?
Caching would be my first choice.
It depends on the site you're developing, but could you not cache all results and inform users that this is what you have done - for example "this was accurate on 2009-10-08 - click here to refresh"?
UPDATE: Without knowing more about what sort of data you are getting from the web service and who the audience of your website is, it's hard to know what to suggest, as the solution will depend a lot on those factors. You need to think about your customers to decide whether or not you can show them potentially out-of-date data.
If requests to the web service tend to be similar, bear in mind that caching may also help you with performance and scaling, as well as with resilience.
You could use caching as a backup mechanism for when the external resource is not available. In this way you are using live data for as long as the service is available and when it goes down is when you dig into your cache. It would be important to somehow mark the data in such a way that the consumer of that data knows how fresh it is.
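A minimal sketch of caching purely as a fallback: every request goes to the live service first, successful responses are remembered with a timestamp, and only when the call fails is the cached copy returned, explicitly marked as stale so the page can say something like "this was accurate on ...". The `fetch` callable and the returned dict shape are assumptions; in practice you would catch only your client's network and timeout errors rather than every exception.

```python
import time


class FallbackCache:
    """Serve live data when possible; fall back to the last good copy, marked stale."""

    def __init__(self, fetch, clock=time.time):
        self.fetch = fetch  # callable(key) -> data; raises when the service is down
        self.clock = clock
        self._store = {}  # key -> (data, fetched_at)

    def get(self, key):
        try:
            data = self.fetch(key)
        except Exception:  # narrow this to network/timeout errors in real code
            if key not in self._store:
                raise  # nothing cached: let the caller show an error
            data, fetched_at = self._store[key]
            return {"data": data, "fetched_at": fetched_at, "stale": True}
        now = self.clock()
        self._store[key] = (data, now)
        return {"data": data, "fetched_at": now, "stale": False}
```

The page template can then show the "accurate as of" note and a refresh link whenever `stale` is true, which keeps live data as the default while the service is up.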
You know what your needs and requirements are, and from what I can read, the freshness of the data is critical, but so is the availability of your service. Given that, if you can cache data I would definitely do so, but only use it as a backup when needed.