Files in domain model - web-services

What are the best practices for dealing with binaries in domain model? I frequently must associate images and other files with business objects and the simple byte[] is not adequate even for the simplest of cases.
The files:
Does not have a fixed size and can be quite large thus:
Have to be streamed or buffered, preferably in asynchronous manner;
Must be cached both on server and client to avoid redundant transfer;
On unreliable connections the data transfer can be easily interrupted and has to be
resumed - therefore the transfer could start not from the beginning of file but from arbitrary position.
Are handled differently than the rest of the data:
In web applications are not part of the page content but are downloaded by browser separately;
Might be a black box that is handled by third-party software;
For performance reasons might not even be stored in the database.
How do we go about expressing such files in domain model (or more specifically, in model classes)? If the rest of the model is transferred via DTOs and WCF web services and persisted with NHibernate in the database, but the files not necessarily so, how to make the file handling transparent, part of the overall transaction where applicable yet support all that is necessary for them to be consumed not only in web applications, but also in ordinary desktop applications.
For WPF and ASP.NET the file object must expose some form of Url property that can be data-bound to WPF controls or used in IMG or HTML tags. Uploading a file is a lot more complicated. Preferably, proper presentation and content practices such as MVVM must be maintained there.
I am really lost here as I am not satisfied with any of my previous solutions. What would you advice?

You have to be careful not to try and shoehorn too much functionality into a single class here, your wording sounds a bit like you want a single "File" object that will do everything, this is not a good idea.
You will need to have a concept of a File representation that can be passed around everywhere as you have identified - but this needs to be little more than an identifier and possibly a name - it is then up to individual components to decide how they treat this, for example the HTML page may use a File json object and infer that jsFile.Id needs to be retrieved using ftp://xxx/uploads/{id} or something, while in order to display additional associated information a WCF service might receive the file id and look up info in a database.
It probably makes sense to have a FileAttributesDTO class or some such just to distinguish it from when you are dealing with the physical file. You need to consider seperation of concerns and nail down as many use cases as you can before you proceed really. For example will you really need additional information or would a simple wrapper around an FTP service get you all you need.

Related

REST vs RPC for a C++ API

I am writing a C++ API which is to be used as a web service. The functions in the API take in images/path_to_images as input parameters, process them, and give a different set of images/paths_to_images as outputs. I was thinking of implementing a REST interface to enable developers to use this API for their projects (independent of whatever language they'd like to work in). But, I understand REST is good only when you have a collection of data that you want to query or manipulate, which is not exactly the case here.
[The collection I have is of different functions that manipulate the supplied data.]
So, is it better for me to implement an RPC interface for this, or can this be done using REST itself?
Like lcfseth, I would also go for REST. REST is indeed resource-based and, in your case, you might consider that there's no resource to deal with. However, that's not exactly true, the image converter in your system is the resource. You POST images to it and it returns new images. So I'd simply create a URL such as:
POST http://example.com/image-converter
You POST images to it and it returns some array with the path to the new images.
Potentially, you could also have:
GET http://example.com/image-converter
which could tell you about the status of the image conversion (assuming it is a time consuming process).
The advantage of doing it like that is that you are re-using HTTP verbs that developers are familiar with, the interface is almost self-documenting (though of course you still need to document the format accepted and returned by the POST call). With RPC, you would have to define new verbs and document them.
REST use common operation GET,POST,DELETE,HEAD,PUT. As you can imagine, this is very data oriented. However there is no restriction on the data type and no restriction on the size of the data (none I'm aware of anyway).
So it's possible to use it in almost every context (including sending binary data). One of the advantages of REST is that web browser understand REST and your user won't need to have a dedicated application to send requests.
RPC presents more possibilities and can also be used. You can define custom operations for example.
Not sure you need that much power given what you intend to do.
Personally I would go with REST.
Here's a link you might wanna read:
http://www.sitepen.com/blog/2008/03/25/rest-and-rpc-relationship/
Compared to RPC, REST's(json style interface) is lightweight, it's easy for API user to use. RPC(soap/xml) seems complex and heavy.
I guess that what you want is HTTP+JSON based API, not the REST API that claimed by the REST author
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven

Can you help clarify some points regarding RESTful services and Code Generation?

I've been struggling with understanding a few points I keep reading regarding RESTful services. I'm hoping someone can help clarify.
1a) There seems to be a general aversion to generated code when talking about RESTful services.
1b) The argument that if you use a WADL to generate a client for a RESTful service, when the service changes - so does your client code.
Why I don't get it: Whether you are referencing a WADL and using generated code or you have manually extracted data from a RESTful response and mapped them to your UI (or whatever you're doing with them) if something changes in the underlying service it seems just as likely that the code will break in both cases. For instance, if the data returned changes from FirstName and LastName to FullName, in both instances you will have to update your code to grab the new field and perhaps handle it differently.
2) The argument that RESTful services don't need a WADL because the return types should be well-known MIME types and you should already know how to handle them.
Why I don't get it: Is the expectation that for every "type" of data a service returns there will be a unique MIME type in existence? If this is the case, does that mean the consumer of the RESTful services is expected to read the RFC to determine the structure of the returned data, how to use each field, etc.?
I've done a lot of reading to try to figure this out for myself so I hope someone can provide concrete examples and real-world scenarios.
REST can be very subtle. I've also done lots of reading on it and every once in a while I went back and read Chapter 5 of Fielding's dissertation, each time finding more insight. It was as clear as mud the first time (all though some things made sense) but only got better once I tried to apply the principles and used the building blocks.
So, based on my current understanding let's give it a go:
Why do RESTafarians not like code generation?
The short answer: If you make use of hypermedia (+links) There is no need.
Context: Explicitly defining a contract (WADL) between client and server does not reduce coupling enough: If you change the server the client breaks and you need to regenerate the code. (IMHO even automating it is just a patch to the underlying coupling issue).
REST helps you to decouple on different levels. Hypermedia discoverability is one of the goods ones to start with. See also the related concept HATEOAS
We let the client “discover” what can be done from the resource we are operating on instead of defining a contract before. We load the resource, check for “named links” and then follow those links or fill in forms (or links to forms) to update the resource. The server acts as a guide to the client via the options it proposes based on state. (Think business process / workflow / behavior). If we use a contract we need to know this "out of band" information and update the contract on change.
If we use hypermedia with links there is no need to have “separate contract”. Everything is included within the hypermedia – why design a separate document? Even URI templates are out of band information but if kept simple can work like Amazon S3.
Yes, we still need a common ground to stand on when transferring representations (hypermedia), so we define your own media types or use widely accepted ones such as Atom or Micro-formats. Thus, with the constraints of basic building blocks (link + forms + data - hypermedia) we reduce coupling by keeping out of band information to a minimum.
As first it seems that going for hypermedia does not change the impact of change :) : But, there are subtle differences. For one, if I have a WADL I need to update another doc and deploy/distribute. Using pure hypermedia there is no impact since it's embedded. (Imagine changes rippling through a complex interweave of systems). As per your example having FirstName + LastName and adding FullName does not really impact the clients, but removing First+Last and replacing with FullName does even in hypermedia.
As a side note: The REST uniform interface (verb constraints - GET, PUT, POST, DELETE + other verbs) decouples implementation from services.
Maybe I'm totally wrong but another possibility might be a “psychological kick back” to code generation: WADL makes one think of the WSDL(contract) part in “traditional web services (WSDL+SOAP)” / RPC which goes against REST. In REST state is transferred via hypermedia and not RPC which are method calls to update state on the server.
Disclaimer: I've not completed the referenced article in detail but I does give some great points.
I have worked on API projects for quite a while.
To answer your first question.
Yes, If the services return values change (Ex: First name and Last name becomes Full Name) your code might break. You will no longer get the first name and last name.
You have to understand that WADL is a Agreement. If it has to change, then the client needs to be notified. To avoid breaking the client code, we release a new version of the API.
The version 1.0 will have First Name and last name without breaking your code. We will release 1.1 version which will have the change to Full name.
So the answer in short, WADL is there to stay. As long as you use that version of the API. Your code will not break. If you want to get full name, then you have to move to the new versions. With lot of code generation plugins in the technology market, generating the code should not be a issue.
To answer your next question of why not WADL and how you get to know the mime types.
WADL is for code generation and serves as a contract. With that you can use JAXB or any mapping framework to convert the JSON string to generated bean objects.
If not WADL, you don't need to inspect every element to determine the type. You can easily do this.
var obj =
jQuery.parseJSON('{"name":"John"}');
alert( obj.name === "John" );
Let me know, If you have any questions.

flash - django communication -- amf, xml, or json?

We are considering to develop a Flash front-end to a web application written using Django. The Flash front-end will send a simple "id" to the server and in response receive a couple of objects. The application will be open only to authenticated users.
To the extend of my current knowledge (which is basic for Flash) we can either use AMF or take an XML or JSON approach. AMF seems to have an upperhand as there are examples out on the internet showing it can cooperate easily with Django's authentication mechanism (most examples feature pyAMF). On the other hand, implementing a XML/JSON based solution may be easier and hassle free.
Guidance will be much appreciated.
We've used pyAMF + Django on many projects here, and it's a breeze to setup and get running. If you need speed, AMF3 is probably your best bet. It's the smallest/fastest way to transfer data, and serialization is taken care of for you.
On the flip side, setting up json with Django isn't much work either, and it would give you the ability to hook other, non-AMF systems into it without any extra work. You just sacrifice a little speed for that benefit.
If you think you'll ever need other systems working with the backend, or if you think you might switch to an HTML-only, or offer some kind of non-Flash version of your app, I'd go JSON, otherwise, I'd use AMF.
first of all, you should design your app in such a way this doesn't matter. the transport layer should be completely encapsulated, leaving the encoding format transparent to the rest of the app.
personally I prefer JSON to AMF because it's human readable (which makes debugging easier) and there are implementations for every platform/language (so you can reuse the server part with JavaScript for example). And I prefer JSON to XML because it's more compact and semantically less unambiguous as well as closer to common object models. Also it can transport numerical and boolean data in a typesafe manner.
JSON will probaby have the least complications and there's a great google code project that has JSON encoders and decoders here: http://code.google.com/p/as3corelib/

Is there a database implementation that has notifications and revisions?

I am looking for a database library that can be used within an editor to replace a custom document format. In my case the document would contain a functional program.
I want application data to be persistent even while editing, so that when the program crashes, no data is lost. I know that all databases offer that.
On top of that, I want to access and edit the document from multiple threads, processes, possibly even multiple computers.
Format: a simple key/value database would totally suffice. SQL usually needs to be wrapped, and if I can avoid pulling in a heavy ORM dependency, that would be splendid.
Revisions: I want to be able to roll back changes up to the first change to the document that has ever been made, not only in one session, but also between sessions/program runs.
I need notifications: each process must be able to be notified of changes to the document so it can update its view accordingly.
I see these requirements as rather basic, a foundation to solve the usual tough problems of an editing application: undo/redo, multiple views on the same data. Thus, the database system should be lightweight and undemanding.
Thank you for your insights in advance :)
Berkeley DB is an undemanding, light-weight key-value database that supports locking and transactions. There are bindings for it in a lot of programming languages, including C++ and python. You'll have to implement revisions and notifications yourself, but that's actually not all that difficult.
It might be a bit more power than what you ask for, but You should definitely look at CouchDB.
It is a document database with "document" being defined as a JSON record.
It stores all the changes to the documents as revisions, so you instantly get revisions.
It has powerful javascript based view engine to aggregate all the data you need from the database.
All the commits to the database are written to the end of the repository file and the writes are atomic, meaning that unsuccessful writes do not corrupt the database.
Another nice bonus You'll get is easy and flexible replication and of your database.
See the full feature list on their homepage
On the minus side (depending on Your point of view) is the fact that it is written in Erlang and (as far as I know) runs as an external process...
I don't know anything about notifications though - it seems that if you are working with replicated databases, the changes are instantly replicated/synchronized between databases. Other than that I suppose you should be able to roll your own notification schema...
Check out ZODB. It doesn't have notifications built in, so you would need a messaging system there (since you may use separate computers). But it has transactions, you can roll back forever (unless you pack the database, which removes earlier revisions), you can access it directly as an integrated part of the application, or it can run as client/server (with multiple clients of course), you can have automatic persistency, there is no ORM, etc.
It's pretty much Python-only though (it's based on Pickles).
http://en.wikipedia.org/wiki/Zope_Object_Database
http://pypi.python.org/pypi/ZODB3
http://wiki.zope.org/ZODB/guide/index.html
http://wiki.zope.org/ZODB/Documentation

Admin interface to manage two related data sources

In the project there are two data sources: one is project's own database, another is (semi-)legacy web service. The problem is that admin part has to keep them in sync and manage both so that user doesn't have to know they're separate (or, do know, but they do not care).
Here's an example: there's list of languages. Both apps - project and legacy - need to use them. However, they both add their own meaning. For example, project may need active/inactive, and legacy will need language code.
But admin part has to manage everything - language name, active/inactive, language code. When loading, data from both systems has to be merged and presented, and when saved, data has to be updated in both systems.
Thus, what's the best way to represent this separated data (to be used in the admin page)? Notice that I use ASP.NET MVC / NHibernate.
How do I manage legacy data?
Do I connect admin part to legacy web service external interface - where it currently only has GetXXX() methods - and add the missed C[R]UD methods?
Or, do I connect directly to legacy database - which is possible since I do control it.
Where do I do split/merge of data - in the controller/service layer, or in the repository/data layer?
In the controller layer I'll do "var viewmodel = new ViewModel { MyData = ..., LegacyData = ... }; The problem - code cluttered with legacy issues.
In the data layer, I'll do "var model = repository.Get(id)" and model will contain data from both worlds, and when I do "repository.Save(entity)" it will update both data sources - in local db only project specific fields will be stored. The problems: a) possible leaky abstraction b) getting data from web service always while it is only need sometimes and usually for admin part only
a modification, use ICombinedRepository<Language> which will provide additional split/merge. Problems: still need either new model or IWithLegacy<Language, LegacyLanguage>...
Have a single "sync" method; this will remove legacy items not present in the project item list, update those that are present, create legacy items that are missed, etc...
Well, to summarize the main issues:
do I develop CRUD interface on web service or connect directly to its database (which is under my complete control, so that I may even later decide to move that web service part into the main app or make it use the main db)?
do I have separate classes for project's and legacy entities, thus managed separately, or have project's entities have all the legacy fields, managed transparently when saved/loaded?
Anyway, are there any useful tips on managing mostly duplicated data from different sources? What are the best practices?
In the non-admin part, I'd like to completely hide the notion of the legacy data. Which is what I do now, behind the repository interfaces. But for admin part it's not that clear or easy...
What you are describing here seems to warrant the need for an Anti-Corruption Layer. You can find solutions related to this topic here: DDD, Anti Corruption layer, how-to?
When you have two conceptual Bounded Contexts, but you're only using DDD for one of them, the Anti-Corruption layer comes into play. When reading from your data source (performing a get operation [R]), the anti-corruption layer will translate your legacy data into usable objects for your project. When writing to your data source (performing a set operation [CUD]), the anti-corruption layer will translate your DDD objects into objects understood by your legacy code.
Whether or not to use the existing Web Service depends on whether or not you're willing to change existing code. Sticking with DRY practices, you don't want to duplicate what you already have. If you want to keep the Web Service, you can add CUD methods inside the anti-corruption layer without impacting your legacy application.
In the anti-corruption layer, you will want to make use of adapters and facades to bring together separate classes for your DDD project and the legacy application.
The anti-corruption layer is exactly where you handle splitting and merging.
Let me know if you have any questions on this, as it can be a somewhat advanced topic. I'll try to answer as best I can.
Good luck!