Using shared files as database

As a project for my database classes, I built a simple object-oriented database (coded in C++). The DB manages concurrency by using a gateway file, which grants read/write access to the entire DB. To access the same DB across different machines, you use shared folders.
I built a small quizzing application on top of that. Everything works fine on a single system with multiple users, as well as on a three-computer network at home. But when it's run on my university's network, I keep getting intermittent data corruption in the form of bad CRCs (in my database, not the disk), file headers that are inconsistent with the file data, and other odd errors that I'm unable to track down.
The network is problematic: sometimes some nodes become unreachable, and sometimes copying a file across the network takes an inordinate amount of time.
Occasionally I get the error message 'Windows delayed write failed', so I suspect the problems are caused by file sharing across the network. From some analysis it seems that data is being cached, so I don't really know whether a disk write has succeeded.
Does anyone have any experience in using shared files as databases? I want to know whether using shared files is reliable, and whether I should be looking at bugs in my code as the cause of the problems.
Thanks.

No, it's not reliable. This is the reason CVS disabled the mode that used shared files for the repository. The solution is to create a server (e.g. a simple TCP/IP server) and have every client go through it.
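As a rough illustration of the server approach, here is a minimal Python sketch. It assumes a made-up line protocol (`PUT key value` / `GET key`) and an in-memory dictionary standing in for the database file; the names `RecordStore`, `Handler` and `DatabaseServer` are hypothetical. The point is only that one process owns the data and a single lock serializes access, so no client ever touches the file over the network share.

```python
import socketserver
import threading


class RecordStore:
    """In-memory stand-in for the database file; one lock serializes access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def execute(self, line):
        # Hypothetical protocol: "PUT <key> <value>" or "GET <key>".
        parts = line.split(" ", 2)
        with self._lock:
            if parts[0] == "PUT" and len(parts) == 3:
                self._records[parts[1]] = parts[2]
                return "OK"
            if parts[0] == "GET" and len(parts) == 2:
                return self._records.get(parts[1], "NOT_FOUND")
        return "ERROR"


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per line, one reply per line, until the client disconnects.
        for raw in self.rfile:
            reply = self.server.store.execute(raw.decode("utf-8").strip())
            self.wfile.write((reply + "\n").encode("utf-8"))


class DatabaseServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, Handler)
        self.store = RecordStore()
```

A real server would persist the records to a local disk and flush explicitly, which it can do reliably because the file is no longer shared.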

Related

Uploading large files to server

The project I'm working on logs data on distributed devices that needs to be joined in a single database on a remote server.
The logs cannot be streamed as they are recorded (the network may not be available, etc.), so they must occasionally be sent as bulky 0.5-1 GB text-based CSV files.
As far as I understand, this means that having a web service receive the data in the form of POST requests is out of the question because of the file sizes.
So far I've come up with this approach: Use some file transfer protocol (ftp or similar) to upload files from device to server. Devices would have to figure out a unique filename to do this with. Have the server periodically check for new files, process them by committing them to the database and deleting them afterwards.
It seems like a very naive way to go about it, but simple to implement.
However, I want to avoid any pitfalls before I implement any specifics. Is this approach scalable (more devices, larger files)? The implementation will run either on a private/company-owned server or on a cloud service (Azure, for instance). Will it work on different platforms?
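One pitfall with the polling approach is picking up a file that is still being uploaded. A minimal Python sketch of the server side follows, assuming (this convention is not in the question) that devices upload under a temporary `.part` suffix and rename to `.csv` only when the transfer is complete. The function name `process_new_files` and the `commit` callback are hypothetical.

```python
from pathlib import Path


def process_new_files(inbox, commit):
    """Commit every finished upload in `inbox`, then delete it.

    Only *.csv files are considered; anything still named *.part is
    assumed to be an upload in progress and is left alone.
    """
    processed = []
    for path in sorted(Path(inbox).glob("*.csv")):
        commit(path)   # if this raises, the file stays for the next run
        path.unlink()  # delete only after a successful commit
        processed.append(path.name)
    return processed
```

Deleting after the commit means a crash between the two steps leads to a repeated import rather than a lost file, so the commit step should tolerate duplicates.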

You could actually do this through web/HTTP as well, after raising the limit for POST requests in the web server (post_max_size and upload_max_filesize for PHP). This will allow devices to interact regardless of platform. It shouldn't be too hard to make a POST request to the server from any device. A simple cURL request could get this job done.
FTP is also possible. Or SCP, to make it safer.
Either way, I think this does need some application on the server to be able to fetch and manage these files using a database. Perhaps a small web application? ;)
As for the unique name, you could use a combination of the device's unique ID/name and the current Unix time. You could even hash this (MD5/SHA-1) afterwards if you like.
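The naming scheme above could look roughly like this in Python. The function name `upload_name`, the `-` separator and the `.csv` extension are assumptions made for the sketch.

```python
import hashlib
import time


def upload_name(device_id, now=None, hashed=False):
    """Build a file name from the device ID and the current Unix time."""
    timestamp = int(time.time() if now is None else now)
    name = f"{device_id}-{timestamp}"
    if hashed:
        # Optional: hide the ID/time behind a SHA-1 hex digest.
        name = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return name + ".csv"
```

Leaving the name unhashed has the practical advantage that the server can read the device ID and upload time straight from the file name.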

How can I use sqlite3 database on a network share drive?

I am writing a time tracking Windows application in C++ that uses sqlite3 engine to store its data. For my purpose it would be nice to share the database file across the local network (in a Windows network share folder) among several copies of my application, so that multiple users of the software could share data.
Is there a mechanism to do that with SQLite?

"nice to share the database file across the local network" You really don't want to do that. It will end up being more trouble than it's worth. In ideal circumstances it works, although the performance sucks a bit. In non-ideal circumstances, it will block forever without giving you any idea why and what's at fault.
It's much easier to partition your system into a server and a client. They can both run within the same application. When the application starts, it checks if there are any servers on the local network, and if there aren't, it starts one. It then connects to the server.
That's what FileMaker at least used to do 20 years ago, and it worked pretty well. It should be a breeze to implement using modern frameworks today (say Qt or Boost).
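The "connect to a server, or start one if none is running" startup logic can be sketched as follows. This is a simplified Python illustration, not a C++ implementation: discovery is reduced to trying one known host and port, and the server is a trivial echo handler standing in for the part that would own the SQLite file. `connect_or_start` and `EchoHandler` are hypothetical names.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Placeholder for the real request handler that would own the database."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


def connect_or_start(host, port):
    """Return (connection, server); server is None if one was already running."""
    try:
        return socket.create_connection((host, port), timeout=1), None
    except OSError:
        pass  # nobody is listening, so this instance becomes the server
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return socket.create_connection((host, port), timeout=1), server
```

If two instances start at the same moment, one of them will fail to bind the port and raise `OSError`; a fuller version would catch that and retry the connection.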

File web service architecture

I need to implement a web service that can provide requested files to other internal applications or components running on different networks. The files are dispersed across different servers in different locations and can be as big as a few gigabytes.
I am thinking of creating a RESTful web service that would locate the file, redirect the HTTP request to another web service at a different location, and send the file via HTTP.
Is it a good idea to send the file via HTTP or will it be better for the web service to copy the file to the location where requester component could access it?

The biggest problem with distributing large files over HTTP is that you will come across all sorts of limits that prevent it. As a simple example, WCF allows you to configure maximum payload size but you can only configure it up to 2 GB. You will likely run across issues like this in all layers of your stack. I doubt any of them are insurmountable (to work around the above limitation you can stream chunks of the file, rather than the entire file, although that introduces its own problems), but you will likely have lots of timeouts and random failures, which are fixed by tweaking the configuration of this or that service or client.
Also, when dealing with large files, you have to carefully consider how you deal with the inevitable failures during transfer (e.g. the network drops out). Depending on the specific technologies you use, they may have some "resume" functionality, but you will want to be sure this is reliable before committing to it.
One possibility would be to do what Facebook does when distributing large binaries - use BitTorrent. So, your web-service serves a torrent of the file, not the file itself. The big advantages of BitTorrent are it is very robust, and can scale well. It's worth considering, but it will depend a lot on your environment and specific workload.

If the files you are going to serve do not change often, or do not change at all, you could use many strategies, such as the one advised by RB, or use pure HTTP, which supports partial data operations (see RFC 2616).
But depending on your usage scenario, I would also suggest taking a look at Amazon Web Services S3 (Simple Storage Service), which probably already does what you are trying to do; it is cheap and has high availability.
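The partial data operations mentioned above are what make resuming a failed transfer possible: the client asks for the bytes it does not have yet with a `Range` header. A minimal Python sketch using only the standard library follows; the function names are hypothetical, and it assumes the file on the server has not changed between attempts.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def resume_download(url, partial_path, chunk_size=1 << 16):
    """Continue a download into partial_path, appending if the server allows it."""
    request, _ = resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range; 200 means it is
        # sending the whole file again, so start over from scratch.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

If the partial file is already complete, a server will typically answer with 416, which `urlopen` raises as an `HTTPError`; a fuller version would treat that case as success.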

sftp versus SOAP call for file transfer

I have to transfer some files to a third party. We can invent the file format, but want to keep it simple, like CSV. These won't be big files - a few 10s of MB at most and there won't be many - 3 files per night.
Our preference for the protocol is sftp. We've done this lots in the past and we understand it well.
Their preference is to do it via a web service/SOAP/https call.
The reason they give is reliability, mainly around knowing that they've fully received the file.
I don't buy this as a killer argument. You can easily build something into your file transfer process using sftp to make sure the transfer has completed, e.g. use headers/footers in the files, or move file between directories, etc.
The only other argument I can think of is that over http(s), ports 80/443 will be open, so there might be less firewall work for our infrastructure guys.
Can you think of any other arguments either way on this? Is there a consensus on what would be best practice here?
Thanks in advance.

File completeness is a common issue in "managed file transfer". If you went for a compromise "best practice", you'd end up running either AS/2 (a web service-ish way to transfer files that incorporates non-repudiation via signed integrity checks) or AS/3 (same thing over FTP or FTPS).
One of the problems with file integrity and SFTP is that you can't arbitrarily extend the protocol like you can FTP and FTPS. In other words, you can't add an XSHA1 command to your SFTP transfer just because you want to.
Yes, there are other workarounds (like transactional files that contain hashes of files received), but at the end of the day someone's going to have to do some work...but it really shouldn't be this hard.
If the third party you're talking to really doesn't have a non-web service call to accept large files, you might be their guinea pig as they try to navigate a brand new world. (Or, they may have just fired all their transmissions folks and are only now realizing that the world doesn't operate on SOAP...yet - seen that happen too.)
Either way, unless they GIVE you the magic code/utility/whatever to do the file-to-SOAP transaction for them (and that happens too), I'd stick to your sftp guns until they find the right guy on their end to talk bulk data transmissions.
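The hash-file workaround mentioned above can be sketched in a few lines of Python. The assumption here (not from the original) is that the sender uploads the data file first and then a small `<name>.sha256` companion file, and the receiver treats the data file as complete only when the companion exists and matches. SHA-256 and the function names are choices made for the sketch.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(path):
    """Sender side: write <path>.sha256, to be uploaded after the data file."""
    manifest = Path(str(path) + ".sha256")
    manifest.write_text(sha256_of(path) + "\n", encoding="ascii")
    return manifest


def is_complete(path):
    """Receiver side: True only if the manifest exists and matches the file."""
    manifest = Path(str(path) + ".sha256")
    if not manifest.exists():
        return False
    return manifest.read_text(encoding="ascii").strip() == sha256_of(path)
```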

SFTP is a protocol for secure file transfer; SOAP is an API protocol, which can be used for sending files as attachments (i.e. MIME attachments) or as Base64-encoded data.
SFTP adds additional potential complexity around separate processes for encrypting/decrypting files (at-rest, if they contain sensitive data), file archiving, data latency, coordinating job scheduling, and setting-up FTP service accounts.

I’m thinking about building an offline-enabled web application.
The architecture I’m considering is as follows:
Web server (remote) <--> Web server/cache (local) <--> Browser/Prism
The advantages I envision for this model are:
Deployment is web-based, with all the advantages of this approach
Offline-enabled
UI (html/js) synchronization is a non-issue
Data synchronization can be mostly automated, as long as I stay within a RESTful paradigm. I can break this as required, but manual synchronization would largely remain surgical
The local web server is started as a service; I can run arbitrary code, including behind-the-scene data synchronization
I have complete control of the data (location, no size limit, no possibility of user deleting unknowingly)
Prism with an extension could allow me to keep the JavaScript closed source
Any thoughts on this architecture? Why should I / shouldn’t I use it? I'm particularly looking for success/horror stories.
The long version
Notes:
Users are not very computer-literate.
For instance, even superficially explaining how Gears works is totally out of the question.
I WILL be held liable if data is lost, even if it’s really the user's fault (short of him deleting random directories on his machine)
I can require users to install something on their machine. It doesn’t have to be 100% web-based and/or run in a sandbox
The common solutions to this problem don’t feel adequate somehow. Here is a short analysis of each.
Gears/HTML5:
no control over data, can be deleted by users without any warning
no control over location of data (not uniform across browsers and platforms)
users need to open application in browser for synchronization to happen; no automatic, behind-the-scene synchronization
different browsers are treated differently, no uniform view of data on a single machine
limited disk space available
synchronization is completely manual, and SQL-based storage makes this a pain (it would be less complicated if the SQL tables were completely replicated, but that is not so in my case). This is a very complex problem.
my code would be almost completely open sourced (html/js)
Adobe AIR:
some of the above
no server-side includes (!)
can run in the background, but not windowless
manual synchronization
web caching seems complicated
feels like a kludge somehow, I’ve had trouble installing on some machines
My requirements are:
Web-based (must). For a number of reasons, sharing data between users for instance.
Offline (must). The application must be fully usable offline (w/ some rare exceptions).
Quick development (must). I’m a single developer going against players with far more business resources.
Closed source (nice to have). Yes, I understand the open source model. However, at this point I don’t want competitors to copy me too easily. Again, they have more resources, so they could take my hard work and make it better in less time than I could myself. Obviously, they can still copy me by developing their own code -- that is fine.

Horror stories from a CRM product:
If your application is heavily used, storing a complete copy of its data on a user's machine is unfeasible.
If your application features data that can be updated by many users, replication is not simple. If three users with local changes sync, who wins?
In reality, this isn't really what users want. They want real-time access to the most current data from anywhere. We had better luck offering a mobile interface to a single source of truth.

The part about running the local Web server as a service appears unwise. Besides tying you to the operating environments available on the client, it also imposes the additional burden of managing the server on the end user. Additionally, the local Web server itself cannot be deployed in a Web-based model.
All in all, I am not too thrilled by the prospect of a real "local Web server". There is a certain bias to it, no doubt since I have proposed embedded Web servers that run inside a Web browser as part of my proposal for seamless off-line Web storage. See BITSY 0.5.0 (http://www.oracle.com/technology/tech/feeds/spec/bitsy.html)
I wonder how essential your requirement to prevent data loss at any cost is. What happens when you are offline and the disk crashes? Or the device is lost? In general, you want the local cache to be as little ahead of the server as possible, but be prepared to tolerate loss of data to the extent that the server is behind the client. This may involve some amount of contractual negotiation or training. In practice this may not be a deal-breaker.

The only way to do this reliably is to offer some sort of "check out and lock" at the record level. When a user is going remote, they must check out the records they want to work with. This checkout copies the data to a local DB and prevents the record in the central DB from being modified while it is checked out.
When the roaming user reconnects and checks their locked records back in, the data is updated on the central DB and unlocked.
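A minimal Python sketch of record-level check-out and lock, with an in-memory dictionary standing in for the central DB. The class and method names are hypothetical, and a real system would also need lock expiry for users who never come back.

```python
class CentralStore:
    """Central DB stand-in: records plus a table of who holds each lock."""

    def __init__(self, records):
        self._records = dict(records)
        self._locks = {}  # record id -> user holding the checkout

    def check_out(self, record_id, user):
        if record_id in self._locks:
            holder = self._locks[record_id]
            raise PermissionError(f"{record_id} is checked out by {holder}")
        self._locks[record_id] = user
        return self._records[record_id]  # the copy the roaming user takes offline

    def update(self, record_id, value):
        # Ordinary online edits are refused while the record is checked out.
        if record_id in self._locks:
            raise PermissionError(f"{record_id} is locked")
        self._records[record_id] = value

    def check_in(self, record_id, user, value):
        if self._locks.get(record_id) != user:
            raise PermissionError(f"{user} does not hold {record_id}")
        self._records[record_id] = value
        del self._locks[record_id]

    def read(self, record_id):
        return self._records[record_id]
```

Because only the lock holder can write the record back, there is never a conflict to resolve at check-in time; the cost is that everyone else is blocked from editing it in the meantime.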
