When creating or editing a model that contains a reference (foreign key) to another object, you have to use the URI of that object. For example, imagine we have two classes: User and Group. Each Group has many Users and each User belongs to exactly one Group.
Then, if we are creating a User, we might send an object that looks like this:
{"name":"John Doe", "group":"/path/to/group/1/"}
instead of
{"name":"John Doe", "group_id":1}
I believe this is related to one of the principles of HATEOAS, but I can't find the rationale for using the resource URI rather than the ID. What are some reasons for using the URI?
(I'm not interested in opinions about which is better, but in any resources that can help me understand this design choice.)
I'll take a stab.
The simplest reason is that surrogate keys like your 1 only mean something within the boundaries of your system. They are meaningless outside of the system.
Expanding on this, you could build your app so that there are no limitations on the URLs that identify groups, only on the conformance of the resources returned from those URLs. Someone could add a user in your system that is in a group in the Facebook system, as long as the two systems could negotiate what a group is. There are standards for concepts like "group", and it's not impossible to do such a thing.
This is how most web apps work. For example, the citation links in a Wikipedia article can point to any other article (until the wiki trolls remove them for not being an appropriate citation resource...).
Having your app work like this gets you closer to RESTful conformance. Whether or not you consider RESTful architecture a good idea is what you asked us not to discuss, so I won't.
Another often-cited benefit is the ability to completely re-key your setup. You may dismiss this at first... but if you really use 1 for IDs, that's probably an int or long, and you may eventually run out of those. Such an ID also means you have to sequence them appropriately. At some point you may wish you had used a GUID for your IDs. Anyone holding on to your old ID scheme would be considered legacy. URLs give you a little abstraction from this: old URLs remain a legacy thing, but it's easier to identify a legacy URL than a legacy ID (granted, not by much... it's pretty easy to know whether you're getting a long or a GUID, but a bit easier still to see a URL as /old/path/group/1 vs /new/path/group/). Generally, using URLs gives you a little more forward compatibility and room to grow.
I also find that providing URLs as identifiers makes it very easy for a client to retrieve information about that thing. The self link is VERY convenient. Suppose I have some reference to group 1... what good is that? How many UIs are going to show a control that says "add group 1"? You'll want to show more. If you pass around URLs as identifiers of selections, then clients can always retrieve more information about what that selection actually is. In some apps you could pass around the whole object (which would include the ID) to deal with this, but it's nice to just save the URL for later retrieval (think bookmarks). Even more importantly, it's always nice to be able to refresh that object regularly to get its latest state. A self link does that very nicely, and I'd argue it's useful enough to always include... and if an always-included self link identifies the resource, why do you need to also provide your surrogate key as a secondary identifier?
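For illustration, a representation that carries its own self link might look something like this (a minimal sketch; the link names are invented for this example, and I've reused the question's own group path):

{
  "name": "Accounting",
  "links": {
    "self": "/path/to/group/1/",
    "members": "/path/to/group/1/members/"
  }
}

Any client holding that representation can bookmark the self link and refresh the group later without knowing anything about the server's internal keys.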
One side note: I try to avoid services that require a URL as a parameter. I'd prefer to create the user, then have the service offer up possible group memberships as links, then have the client choose to request those state transitions from non-membership to membership. If you need to "create the user with groups", I'd go with intermediate states prior to the actual submission/commitment of the new user to the service. I've found that the fewer inputs the client has to provide, the easier the application is to use.
This is more a conceptual than technical question, I guess. Suppose I have a REST API for dealing with a huge rent-a-car fleet.
The API is modelled around the business entities/resources in the very standard and coherent (even if controversial) way, like this:
/cars/1234 - detailed data about a certain car
/clients/5678 - detailed data about a certain client
/cars - a list of cars and their URIs
/clients - a list of clients
However, the fleet is huge and a list of all cars is not that useful. I would rather have it filtered, like:
GET /cars?type=minivan
To use the "type" parameter properly, I should have a list of valid values such as "minivan", "convertible", "station-wagon", "hatchback", "sedan", etc. OK, there are not that many kinds of cars out there, but let's suppose this list is too big for an enum in the API's Swagger definition.
So... What would be the most consistent and natural way for a REST API to offer a list of valid values for a query parameter like that?
As a subordinate resource like /cars/types? This would break the /cars/{id} URL pattern, wouldn't it?
As a separate resource such as /tables/cars/types? This would break the consistency around the main resources of the business model itself, right?
As part of the body of a response to OPTIONS /cars? It looks like the "RESTfullest" way to me, but some of my coworkers disagree, and OPTIONS seems to be rarely used for things like that.
Maybe as part of a response to GET /cars?&metadata=values or something like that? The "values" here would seem more semantically related to the returned data than to the query parameters, wouldn't they?
Anything else?
I have googled and searched on SO for recommendations about this particular subject, but I could not find anything to help me with arguments for such a decision...
Thank you!
Fabricio Rocha
Brasilia, Brasil
"a good REST API is like an ugly website" -- Rickard Öberg
So how would you do it in a website? Well, you'd probably have a link that goes to a form, and the form would have a list control / radio buttons with semantic cues for each option, with the expectation that the user would select a value from the options available, and the user agent would encode that value into the URL of the GET request when the form was submitted.
So in REST, you do the same thing. In the initial response, you include a link to your "form" resource; when a user agent gets the form resource, you return a hypermedia representation of the form with the available options encoded in it; and when the form is submitted, your resources pick the client's choice(s) out of the query part of the identifier.
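As a rough sketch, the "form" resource might be represented like this (the JSON shape here is invented for illustration; a real API would use whatever hypermedia format it has standardized on):

{
  "action": "/cars",
  "method": "GET",
  "fields": [
    {
      "name": "type",
      "options": ["minivan", "convertible", "station-wagon", "hatchback", "sedan"]
    }
  ]
}

The user agent reads the options, lets the user pick one, and encodes the choice into the request, e.g. GET /cars?type=minivan.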
But you probably aren't doing REST: it's a colossal PITA, and the benefits of the REST architecture constraints probably don't pay off in your context. So you are likely just looking for a reasonable spelling of an identifier for a resource that returns a message with a list of options.
As a subordinate resource like /cars/types? This would break the /cars/{id} URL pattern, wouldn't it?
Assuming your routing implementation can handle the ambiguity, that's a fine choice. You might consider whether there's just one list, or different lists for different contexts, and how to handle that.
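For instance, with a framework like Express (just a sketch to show the idea; the handler bodies are placeholders), the ambiguity is handled by registering the literal route before the parameterized one, since Express matches routes in registration order:

const express = require('express');
const app = express();

// Literal route first, so "types" is never captured as an {id}
app.get('/cars/types', (req, res) => {
  res.json(['minivan', 'convertible', 'station-wagon', 'hatchback', 'sedan']);
});

// Parameterized route second
app.get('/cars/:id', (req, res) => {
  res.json({ id: req.params.id });
});

app.listen(3000);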
As a separate resource such as /tables/cars/types? This would break the consistency around the main resources of the business model itself, right?
Remember OO programming and encapsulation? Decoupling the API from the underlying data model is a good thing.
That said, I'm personally not fond of "tables" as an element in your hierarchy. If you wanted to head in that direction, I'd suggest /dimensions -- it's a spelling you might use if you were designing a data warehouse.
As part of the body of a response to OPTIONS /cars? It looks like the "RESTfullest" way to me, but some of my coworkers disagree, and OPTIONS seems to be rarely used for things like that.
Yikes! RFC 7231 suggests that's a very confusing idea.
The OPTIONS method requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary.
(emphasis added). When writing APIs for the web, you should always be keeping in mind that a client request may go through intermediaries that you do not control; your ability to provide a good experience in those circumstances depends on not confusing the intermediaries by deviating from the uniform interface.
Maybe as part of a response to GET /cars?&metadata=values or something like that?
For the most part, machines are pretty comfortable with any spelling. URI design guidelines normally focus on the human audience. I think that particular spelling will confuse your human consumers, especially if /cars?... would otherwise identify the resource that is a search result.
Anything else? I still feel that what people expect to find under /cars is... a bunch of cars (their representations, I mean), not a list of values among them...
So let's change up your question a little bit:
What would be the most consistent and natural way for a REST API to document a list of valid values for a query parameter like that?
If there's one thing the web is really good at, it is documenting things. Pick almost any well-documented web API and pay careful attention to where you read about the endpoints -- that will give you some good ideas.
For instance, you could look at the StackExchange API where
https://api.stackexchange.com/docs/questions
tells you everything you need to know about the family of resources at
https://api.stackexchange.com/2.2/questions
Types, unsurprisingly, are documented like:
https://api.stackexchange.com/docs/types/flag-option
If you wanted to be sexy about it, you could use the Accept header to negotiate a redirect to either human-readable or machine-readable documentation.
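A minimal sketch of that negotiation, again assuming Express (the documentation paths are invented):

const express = require('express');
const app = express();

app.get('/docs/cars/types', (req, res) => {
  // req.accepts() checks the request's Accept header
  if (req.accepts('html')) {
    res.redirect('/docs/human/cars-types.html');    // human-readable docs
  } else {
    res.redirect('/docs/machine/cars-types.json');  // machine-readable docs
  }
});

app.listen(3000);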
I'm in a similar situation, but I have a large number of fields, each of which has a large number of possible values, and in some cases the values come from a hierarchy, so my field is an array of strings. (Borrowing from your example: you might want to record the plant that manufactured the car, but rather than a one-dimensional list, these are organised by continent, country, and state.)
I think I'll implement a /taxonomies resource to provide all the data to users. I see WordPress uses a similar scheme (http://v2.wp-api.org/reference/taxonomies/) although I haven't studied it closely yet.
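For what it's worth, a response from such a /taxonomies resource could look roughly like this (a sketch; the structure is an assumption of mine, not WordPress's actual schema):

{
  "manufacturing-plant": {
    "hierarchical": true,
    "levels": ["continent", "country", "state"],
    "values": "/taxonomies/manufacturing-plant/values"
  },
  "car-type": {
    "hierarchical": false,
    "values": "/taxonomies/car-type/values"
  }
}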
Google Analytics stores a unique user ID in a cookie named _ga. Some self-traffic was already counted, and I was wondering if there's a way to filter it out by providing the _ga cookie value to some exclusion filter.
Any ideas?
Firstly, I'm gonna put it out there that there is no solution for excluding or removing historical data, except to make a filter or segment for your reports, which doesn't remove or prevent that data from showing up; it simply hides it. So if you're looking for something that gets rid of the data that is already there, sorry, not happening. Now on to making sure more data doesn't show up.
GA does not offer a way to exclude traffic by its visitor cookie (or any cookie in general). In order to do this, you will need to read the cookie yourself and expose it to something that GA can exclude by. For example, you can pop a custom variable or override/append the page name.
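For example, here is a sketch assuming Universal Analytics with analytics.js and a custom dimension registered at index 1 (adapt the dimension index to your property):

// Read the _ga cookie value (format: GA1.2.<clientId>)
function getGaCookie() {
  var match = document.cookie.match(/(?:^|;\s*)_ga=([^;]+)/);
  return match ? match[1] : null;
}

// Expose it to GA as a custom dimension so a filter can exclude on it
var gaValue = getGaCookie();
if (gaValue) {
  ga('set', 'dimension1', gaValue);
}
ga('send', 'pageview');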
But this isn't really that convenient for lots of reasons, such as having to burn a custom variable slot, or having to write some server-side or client-side code to read the cookie and act on a value, etc..
And even if you do decide to do this, you're going to have to consider at least two scenarios in which it won't work or will break:
1) This won't work if you go to your site from a different browser, since browsers can't read each other's cookies.
2) It will break as soon as you clear your cookies.
An alternative you should consider is making an exclusion filter on your IP address. This has the benefits that:
it works regardless of which browser you are on
you don't have to write any code, or burn or overwrite any variables
you don't have to care about the cookie
General Notes
I don't presume to know your situation or knowledge, so take this for what it's worth: simply throwing out general advice because nothing goes without saying.
If you were on your own site to QA something and want to remove data generated by development or QA efforts, a better solution in general is to have a separate environment for developing and QAing, for example a separate dev.yoursite.com subdomain that mirrors live. Then you can easily make an exclusion on that subdomain (or have a separate view or property, or any number of other ways to keep that dev/QA traffic out).
Another thing is.. how much traffic are we talking here anyway? You really shouldn't be worrying about a few hits and whatnot that you've personally made on your live site. Again, I don't know your full situation and what your normal numbers look like, but in the grand scheme of things, a few extra hits is a drop of water in a bucket, and in general it's more useful to look at trends in the data over time, not exact numbers at a given point in time.
I've been struggling with understanding a few points I keep reading regarding RESTful services. I'm hoping someone can help clarify.
1a) There seems to be a general aversion to generated code when talking about RESTful services.
1b) The argument that if you use a WADL to generate a client for a RESTful service, then when the service changes, so does your client code.
Why I don't get it: Whether you are referencing a WADL and using generated code or you have manually extracted data from a RESTful response and mapped them to your UI (or whatever you're doing with them) if something changes in the underlying service it seems just as likely that the code will break in both cases. For instance, if the data returned changes from FirstName and LastName to FullName, in both instances you will have to update your code to grab the new field and perhaps handle it differently.
2) The argument that RESTful services don't need a WADL because the return types should be well-known MIME types and you should already know how to handle them.
Why I don't get it: Is the expectation that for every "type" of data a service returns there will be a unique MIME type in existence? If this is the case, does that mean the consumer of the RESTful services is expected to read the RFC to determine the structure of the returned data, how to use each field, etc.?
I've done a lot of reading to try to figure this out for myself so I hope someone can provide concrete examples and real-world scenarios.
REST can be very subtle. I've also done lots of reading on it, and every once in a while I went back and read Chapter 5 of Fielding's dissertation, each time finding more insight. It was as clear as mud the first time (although some things made sense) but only got better once I tried to apply the principles and use the building blocks.
So, based on my current understanding let's give it a go:
Why do RESTafarians not like code generation?
The short answer: if you make use of hypermedia (plus links), there is no need.
Context: explicitly defining a contract (WADL) between client and server does not reduce coupling enough. If you change the server, the client breaks and you need to regenerate the code. (IMHO even automating that is just a patch over the underlying coupling issue.)
REST helps you decouple on different levels. Hypermedia discoverability is one of the good ones to start with. See also the related concept of HATEOAS.
We let the client "discover" what can be done from the resource we are operating on, instead of defining a contract up front. We load the resource, check for "named links", and then follow those links or fill in forms (or links to forms) to update the resource. The server acts as a guide to the client via the options it proposes based on state (think business process / workflow / behavior). If we use a contract, we need to know this "out of band" information and update the contract on change.
If we use hypermedia with links, there is no need for a "separate contract". Everything is included within the hypermedia -- why design a separate document? Even URI templates are out-of-band information, but if kept simple they can work, as Amazon S3 shows.
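To make that concrete, here is a sketch of a response whose links guide the client (the resource and link names are invented for illustration):

{
  "order": { "status": "unpaid", "total": "49.90" },
  "links": {
    "self": "/orders/7/",
    "payment": "/orders/7/payment",
    "cancel": "/orders/7/cancellation"
  }
}

Once the order is paid, the server simply stops offering the "payment" and "cancel" links; the client discovers the allowed transitions from the representation instead of from a contract.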
Yes, we still need common ground to stand on when transferring representations (hypermedia), so we define our own media types or use widely accepted ones such as Atom or microformats. Thus, with the constraints of basic building blocks (links + forms + data - hypermedia), we reduce coupling by keeping out-of-band information to a minimum.
At first it seems that going with hypermedia does not change the impact of change :) But there are subtle differences. For one, if I have a WADL, I need to update another document and deploy/distribute it; using pure hypermedia there is no such impact, since everything is embedded. (Imagine changes rippling through a complex interweave of systems.) As per your example, having FirstName + LastName and adding FullName does not really impact the clients, but removing First + Last and replacing them with FullName does, even in hypermedia.
As a side note: the REST uniform interface (the verb constraints - GET, PUT, POST, DELETE, plus the other verbs) decouples the implementation from the service.
Maybe I'm totally wrong, but another possibility might be a "psychological kickback" against code generation: WADL makes one think of the WSDL (contract) part of "traditional web services" (WSDL + SOAP) / RPC, which goes against REST. In REST, state is transferred via hypermedia, not via RPC-style method calls that update state on the server.
Disclaimer: I've not read the referenced article in detail, but it does give some great points.
I have worked on API projects for quite a while.
To answer your first question:
Yes, if the service's return values change (e.g., First Name and Last Name become Full Name), your code might break. You will no longer get the first name and last name.
You have to understand that a WADL is an agreement. If it has to change, then the client needs to be notified. To avoid breaking client code, we release a new version of the API.
Version 1.0 will keep First Name and Last Name without breaking your code. We will then release version 1.1, which will have the change to Full Name.
So, the short answer: the WADL is there to stay. As long as you use that version of the API, your code will not break. If you want to get the full name, then you have to move to the new version. With the many code generation plugins on the market, generating the code should not be an issue.
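On the wire, that versioning might look like this (the URLs and payloads are invented for illustration):

GET /v1.0/users/42  ->  {"firstName": "John", "lastName": "Doe"}
GET /v1.1/users/42  ->  {"fullName": "John Doe"}

Clients pinned to 1.0 keep working; clients that want the new shape opt in to 1.1.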
To answer your next question, of why not WADL and how you get to know the MIME types:
A WADL is for code generation and serves as a contract. With it, you can use JAXB or any mapping framework to convert the JSON string into generated bean objects.
Without a WADL, you don't need to inspect every element to determine its type. You can easily do this:
// Parse a JSON string into an object, then check a field on it
var obj = jQuery.parseJSON('{"name":"John"}');
alert(obj.name === "John");
Let me know, If you have any questions.
I've been reading up on REST, and I'm trying to figure out what the advantages to using it are. Specifically, what is the advantage to REST-style URLs that make them worth implementing over a more typical GET request with a query string?
Why is this URL:
http://www.parts-depot.com/parts/getPart?id=00345
Considered inferior to this?
http://www.parts-depot.com/parts/00345
In the above examples (taken from here), the second URL is indeed more elegant and concise. But it comes at a cost: the first URL is pretty easy to implement in any web language out of the box, while the second requires additional code and/or server configuration to parse out values, as well as additional documentation and time spent explaining the system to junior programmers and justifying it to peers.
So, my question is, aside from the pleasure of having URLs that look cool, what advantages do RESTful URLs gain for me that would make using them worth the cost of implementation?
The hope is that if you make your URL refer to a noun then there is a better chance that you will implement the HTTP verbs correctly. Beyond that there is absolutely no advantage of one URL versus another.
The reality is that the contents of a URL are completely irrelevant to a RESTful system. It is simply an identifier.
It's not what it looks like, it is what you do with it that is important.
One way of looking at REST:
http://tomayko.com/writings/rest-to-my-wife (which has since been taken down, sadly, but can still be seen on web.archive.org)
So anyway, HTTP—this protocol Fielding and his friends created—is all about applying verbs to nouns. For instance, when you go to a web page, the browser does an HTTP GET on the URL you type in and back comes a web page.

...

Instead, the large majority are busy writing layers of complex specifications for doing this stuff in a different way that isn't nearly as useful or eloquent. Nouns aren't universal and verbs aren't polymorphic. We're throwing out decades of real field usage and proven technique and starting over with something that looks a lot like other systems that have failed in the past. We're using HTTP but only because it helps us talk to our network and security people less. We're trading simplicity for flashy tools and wizards.
One thing that jumps out at me (nice question by the way) is what they describe. The first describes an operation (getPart), the second describes a resource (part 00345).
Also, you couldn't use other HTTP verbs with the first - you'd need a new method like putPart, for example. The second can be reused with different verbs (like PUT, DELETE, POST) to manipulate the resource. I suppose you're also kind of saying GET twice - once with the verb, again in the method name - so the second is more consistent with the intent of the HTTP protocol.
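For example, with fetch (a sketch; the payload is invented), the same resource URL works with every verb:

// Read, replace, and delete the same resource - only the verb changes
fetch('http://www.parts-depot.com/parts/00345');                         // GET
fetch('http://www.parts-depot.com/parts/00345', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'flux capacitor' })
});
fetch('http://www.parts-depot.com/parts/00345', { method: 'DELETE' });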
One thing that I always like as a savvy web user, though it certainly shouldn't be a guiding principle for when to use such a URL scheme, is that those types of URLs are "hackable". In particular, for things like blogs, I can just edit a date or a page number within the URL instead of having to find the "next page" button.
The biggest advantage of REST, IMO, is that it gives you a clean way to use the HTTP verbs (which are the most important part of REST services). Actually, using REST means you are using the HTTP protocol and its verbs.
Using your URLs, and imagining you want to POST a "part" instead of getting it:
In the first case it would look like this - you are using a GET where you should have used a POST:
http://www.parts-depot.com/parts/postPart?param1=lalala&param2=lelele&param3=lilili
While in a REST context, it would be
http://www.parts-depot.com/parts
and in the body, (for example) an XML document like this:
<part>
  <param1>lalala</param1>
  <param2>lelele</param2>
  <param3>lilili</param3>
</part>
URI semantics are defined by RFC 2396 (since superseded by RFC 3986). The extracts particularly pertinent to this question are 3.3, "Path Component":
The path component contains data, specific to the authority (or the scheme if there is no authority component), identifying the resource within the scope of that scheme and authority.
And 3.4 "Query Component":
The query component is a string of information to be interpreted by the resource.
Note that the query component is not part of the resource identifier, it is merely information to be interpreted by the resource.
As such, the resource being identified by your first example is actually just /parts/getPart. If your intention is that the URL should identify one specific part resource then the first example does not do that, whereas the second one (/parts/00345) does.
So the 'advantage' of the second style of URL is that it is semantically correct, whereas the first one is not.
"The second requires additional code
and/or server configuration to parse
out values,"
Really? You chose a poor framework, then. My experience is that the RESTful version is exactly the same amount of code. Maybe I just lucked into a cool framework.
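For what it's worth, in a framework like Express the two styles really are about the same amount of code (a sketch; findPart is a stand-in for whatever lookup you already have):

const express = require('express');
const app = express();

// Stand-in lookup, invented for this sketch
const findPart = (id) => ({ id: id, name: 'widget' });

// Query-string style: /parts/getPart?id=00345
app.get('/parts/getPart', (req, res) => res.json(findPart(req.query.id)));

// RESTful style: /parts/00345
app.get('/parts/:id', (req, res) => res.json(findPart(req.params.id)));

app.listen(3000);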
"as well as additional documentation
and time spent explaining the system
to junior programmers"
Only once. After they get it, you shouldn't have to explain it again.
"and justifying it to peers."
Only once. After they get it, you shouldn't have to explain it again.
Don't use query/search parts in URLs that aren't queries or searches; if you do, then according to the URL spec you are likely implying something about that resource that you don't really want to.
Use query parts for resources that are a subset of some bigger resource - pagination is a good example of where this could be applied.
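For example (values invented), a paginated slice of the collection is still the same resource, just narrowed by the query part:

GET /cars?page=2&per_page=50
GET /cars?type=minivan&page=1&per_page=50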
I'm working on building intelligence around link propagation, and because I need to deal with many short-URL services where a reverse lookup from an exact URL address is required, I need to be able to resolve multiple approximate versions of the same URL.
An example would be a URL like http://www.example.com?ref=affil&hl=en&ct=0
Of course, changing GET params in certain circumstances can refer to a completely different page, especially if the GET params in question refer to a profile or content ID.
But a quick parse of the page would determine how similar the pages are to each other. Using a bit of machine learning, it could become clear which GET params don't affect the content of the pages returned for a given site.
I'm assuming a service where you send a URL and get back a list of very similar URLs could only be offered by the likes of Google or Yahoo (or Twitter), but they don't seem to offer this feature, and I haven't found any other services that do.
If you know of any services that do cluster together groups of almost identical URLs in the aforementioned way, please let me know.
My bounty is a hug.
Every URL is akin to an "address" for a location of data on the internet. The "host" part of the URL (in your example, "www.example.com") is a web server, or a set of web servers, somewhere in the world. If we think of a URL as an "address", then the host could be a "country".
The country itself might keep track of every piece of mail that enters it. Some do, some don't. (I'm talking about web servers! Of course real countries don't make note of every piece of mail you get. :-))
But even if that "country" keeps track of every piece of mail, I really doubt it has any mechanism in place to send that list to you.
As for organizations that might do that harvesting themselves, I think the best bet would be Google, but even there the situation is rather grim. You see, because Google doesn't own every web server ("country") in the world, they cannot know of every URL that accesses that web server.
But they can do the reverse. Since they can index every page they encounter, they can get a pretty good idea of every URL that appears in public HTML pages on the web. Of course, this won't include URLs people send to each other in chats, SMSs, or e-mails. But still, they can get a pretty good idea of what URLs exist.
I guess what I'm trying to say is that what you're looking for doesn't really exist. The only way to get all the URLs used to access a single website is to be the owner of that website.
Sorry, mate.
It sounds like you need to create some sort of discrete similarity rank between pages. This could be done by finding the number of words two pages have in common, normalizing that value to a bounded range, and then mapping portions of the range to different similarity ranks.
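A minimal sketch of that first step (plain word-overlap similarity normalized to [0, 1], with a deliberately naive tokenizer; the rank thresholds are invented):

// Jaccard similarity of two pages' word sets: |A ∩ B| / |A ∪ B|
function pageSimilarity(textA, textB) {
  const words = (t) => new Set(t.toLowerCase().split(/\W+/).filter(Boolean));
  const a = words(textA);
  const b = words(textB);
  let intersection = 0;
  for (const w of a) if (b.has(w)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Map the bounded score to a discrete similarity rank, e.g. 0-3
const rank = (s) => (s > 0.9 ? 3 : s > 0.7 ? 2 : s > 0.4 ? 1 : 0);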
You would also need to know, for each pair you compare, which GET parameters they had in common or how close they were. This information becomes the attributes that define each of your instances (stored alongside the rank mentioned above). After you have amassed a few hundred pairs of comparisons, you could do some feature subset selection to identify the GET parameters that most strongly indicate how similar two pages are.
Of course, this could end up not finding anything useful at all as this dataset is likely to contain a great deal of noise.
If you are interested in this approach, you should look into information gain ("Infogain") and feature subset selection in general. Here is a link to my professor's lecture notes, which may come in handy: http://stuff.ttoy.net/cs591o/FSS.html