Regex in @Path matches only 1 of 2 routes specified, resulting in 404

Here is what Dropwizard logs to the console for the configured resources and their paths:
INFO [07:07:13.741] i.d.jersey.DropwizardResourceConfig: The following paths were found for the configured resources:
DELETE /apps/affiliate/internal/v1/templates/ (aff.affiliate.http.internal.AffiliateURLTemplatesInternalAPIEndpoint)
GET /apps/affiliate/internal/v1/templates/ (aff.affiliate.http.internal.AffiliateURLTemplatesInternalAPIEndpoint)
POST /apps/affiliate/internal/v1/templates/ (aff.affiliate.http.internal.AffiliateURLTemplatesInternalAPIEndpoint)
GET /apps/affiliate/v1/generate-url (aff.affiliate.http.AffiliateEndpoint)
GET /apps/affiliate/v1/redirect-search-url (aff.affiliate.http.AffiliateEndpoint)
GET /openapi.{type:json|yaml} (io.swagger.v3.jaxrs2.integration.resources.OpenApiResource)
GET /{path: apps/affiliate/v1/redirect|api/affiliate/v1/redirect} (aff.affiliate.http.RedirectEndpoint)
The problem is with the last path, specified as a regular expression.
My expectation is that it should trigger for incoming requests to both /apps/affiliate/v1/redirect and /api/affiliate/v1/redirect.
However, visiting /apps/affiliate/v1/redirect results in a 404, but visiting /api/affiliate/v1/redirect results in a 200. How can I get my resource to respond to either of those paths?
The code is hard to provide, but this is essentially the scaffolding (fwiw, all the methods and the API work; I'm just having trouble getting one of the methods to respond to the regex, which is my actual problem).
// AffiliateURLTemplatesInternalAPIEndpoint.kt
@Path("/apps/affiliate/internal/v1/templates")
@Produces(MediaType.APPLICATION_JSON)
public class AffiliateURLTemplatesInternalAPIEndpoint() : DropwizardResource() {
    @GET
    @Path("/")
    public fun methodA()
    @POST
    @Path("/")
    public fun methodB()
    @DELETE
    @Path("/")
    public fun methodC()
}
// AffiliateEndpoint.kt
@Path("/apps/affiliate/v1")
class AffiliateEndpoint() : DropwizardResource() {
    @GET
    @Path("generate-url")
    fun methodA()
    @GET
    @Path("redirect-search-url")
    fun methodB()
}
// RedirectEndpoint.kt
@Path("/{path: apps/affiliate/v1/redirect|api/affiliate/v1/redirect}")
@Produces(MediaType.APPLICATION_JSON)
class RedirectEndpoint() : DropwizardResource() {
    @GET
    fun methodA()
}

The 404 is indeed being correctly returned.
Why? JAX-RS’ URL Matching Algorithm.
It was only after @Paul Samsotha asked me to paste my code that I finally realized the reason for the 404. 🤦
The Dropwizard/Jersey output I was relying on shows all the routes it found, but leaves out critical context about how paths have been structured in the code. Because of the way JAX-RS defines its sorting and precedence rules for route matching, code structure is essential in determining which routes will be triggered. So in this case the helpful output ended up being mostly misleading.
Read Section 3.7 - Matching Requests to Resource Methods of JAX-RS Spec 3.0 if you dare, but the answers are there.
Also, Chapter 4 of Bill Burke's RESTful Java with JAX-RS 2.0 gives great insight into route matching behavior. Unfortunately it doesn't clarify an important distinction (the exact situation I got into), which is that you can't simply combine a resource's path with its methods' paths (as the log output does) when applying the JAX-RS URL matching rules. In fact, I went through a bunch of JAX-RS write-ups and none of them mentioned this distinction.
Instead, you first try to match a root resource class, and only then look at its resource methods. If you don't find a match at either the root or the method level, you must return a 404.
Still, I found Burke's book to be a great resource for shedding light on the spec, and it is much less intimidating than the spec.
Now to the actual explanation of the 404.
Jersey (which implements the JAX-RS spec) first collects all the paths associated with root resources:
/apps/affiliate/internal/v1/templates
/apps/affiliate/v1
/{path: apps/affiliate/v1/redirect|api/affiliate/v1/redirect}
It then applies its sorting and precedence logic according to the spec (paraphrased from Burke's book):
The primary key of the sort is the number of literal characters in the full URI matching pattern. The sort is in descending order.
If two or more patterns rank similar in the first, then the secondary key of the sort is the number of template expressions embedded within the pattern. This sort is in descending order.
Finally, if two or more patterns rank similar in the second, then the tertiary key of the sort is the number of non-default template expressions. A default template expression is one that does not define a regular expression.
When an incoming GET request to /apps/affiliate/v1/redirect arrives, both
/apps/affiliate/v1
/{path: apps/affiliate/v1/redirect|api/affiliate/v1/redirect}
match, but the first pattern takes precedence because it has the greatest number of literal characters that match (18 vs 1).
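The three sort keys can be made concrete with a short Python sketch. Python is used purely for illustration; this is not Jersey's code. It assumes template variables are written as {name} or {name: regex} with no nested braces, and the helper name sort_key is made up for this example.

```python
import re

# Matches {name} and {name: regex}. Nested braces inside the regex part
# are not handled; that is a simplification of this sketch.
TEMPLATE_VAR = re.compile(r"\{\s*\w+\s*(:[^{}]*)?\}")

def sort_key(template):
    """Return (literal chars, template expressions, non-default expressions)."""
    variables = list(TEMPLATE_VAR.finditer(template))
    # Whatever remains after removing the template expressions is literal text.
    literal_chars = len(TEMPLATE_VAR.sub("", template))
    # A non-default expression is one that carries its own regex.
    non_default = sum(1 for m in variables if m.group(1) is not None)
    return (literal_chars, len(variables), non_default)

roots = [
    "/{path: apps/affiliate/v1/redirect|api/affiliate/v1/redirect}",
    "/apps/affiliate/v1",
    "/apps/affiliate/internal/v1/templates",
]

# All three keys sort in descending order, so one reversed tuple sort is enough.
ranked = sorted(roots, key=sort_key, reverse=True)
for template in ranked:
    print(sort_key(template), template)
```

The regex-based root resource ends up last because only its leading "/" counts as a literal character.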
Now that a root resource is selected, Jersey looks at that root resource's methods and compiles a list of the paths/HTTP methods that could match the incoming request. A bit of an extra detail, but for pattern matching purposes at the method level, the root resource's path is concatenated with the resource method's path.
The following patterns are available to select from:
GET /apps/affiliate/v1/generate-url
GET /apps/affiliate/v1/redirect-search-url
Since the request was a GET to /apps/affiliate/v1/redirect neither of the above routes match. Hence my 404 :(.
It makes complete sense, but I got into this rabbit hole because my assumptions about routing rules and precedence from experience working with other routing libraries did not align with the actual JAX-RS specs. I expected the library to have a master list for each and every method available (much like the initial output from Dropwizard/Jersey) and for each request to run through sorting and precedence rules on that master list. Alas, that is not the case.
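The two-stage behaviour can be simulated in a few lines of Python. This is only a sketch of my reading of the rules, not Jersey's implementation: the regexes are written by hand, the root resources are listed already in precedence order, and the method-level match is done on the part of the path left over after the root match (which gives the same result as concatenating the paths). The function name route and the tuple layout are made up for the example.

```python
import re

# (resource class, root regex with a trailing capture group, [(verb, sub-path regex)])
# Listed in precedence order: most literal characters first.
ROOTS = [
    ("AffiliateURLTemplatesInternalAPIEndpoint",
     r"/apps/affiliate/internal/v1/templates(/.*)?",
     [("GET", r"/?"), ("POST", r"/?"), ("DELETE", r"/?")]),
    ("AffiliateEndpoint",
     r"/apps/affiliate/v1(/.*)?",
     [("GET", r"/generate-url/?"), ("GET", r"/redirect-search-url/?")]),
    ("RedirectEndpoint",
     r"/(?:apps/affiliate/v1/redirect|api/affiliate/v1/redirect)(/.*)?",
     [("GET", r"/?")]),
]

def route(verb, path):
    """Return (status, resource class chosen in stage 1)."""
    for name, root_regex, methods in ROOTS:
        match = re.fullmatch(root_regex, path)
        if match is None:
            continue
        # Stage 1 is done: this root resource wins and the choice is final.
        remainder = match.group(1) or ""
        # Stage 2: only this resource's methods are considered.
        for method_verb, sub_regex in methods:
            if method_verb == verb and re.fullmatch(sub_regex, remainder):
                return 200, name
        return 404, name  # no fallback to lower-ranked root resources
    return 404, None

print(route("GET", "/apps/affiliate/v1/redirect"))  # (404, 'AffiliateEndpoint')
print(route("GET", "/api/affiliate/v1/redirect"))   # (200, 'RedirectEndpoint')
```

The first call never reaches RedirectEndpoint, because AffiliateEndpoint has already claimed the request in stage 1.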

Regular Expressions - Parsing Domain Issues

I am trying to find the domain -- everything but the subdomain.
I have this regexp right now:
(?:[-a-zA-Z0-9]+\.)*([-a-zA-Z0-9]+(?:\.[a-zA-Z]{2,3})){1,2}
This works for things like:
domain.tld
subdomain.tld
But it runs into trouble with TLDs like ".com.au" or ".co.uk":
domain.co.uk (finds co.uk, should find domain.co.uk)
subdomain.domain.co.uk (finds co.uk, should find domain.co.uk)
Any ideas?
I'm not sure this problem is "reasonably solvable". Mozilla maintains a list of 'public suffix' domains that is intended to help browser authors accept cookies only for domains under one administrative control (e.g., to prevent someone from setting a cookie valid for *.co.uk or *.union.aero). It obviously isn't perfect: near the end, you'll find a long list of is-a-caterer.com-style domains, so foo.is-a-caterer.com couldn't set a cookie that would be used by bar.is-a-caterer.com, yet is-a-caterer.com is perfectly well a "domain" as you've defined it.
So, if you're prepared to use the list as provided, you could write a quick little parser that would know how to apply the general rules and exceptions to determine where in the given input string your "domain" comes, and return just the portion you're interested in.
I think simpler approaches are doomed to failure: some ccTLDs such as .ca don't use second-level domains, some such as .br use dozens, and some, like lib.or.us are several levels away from the "domain" such as multnomah.lib.or.us. Unless you're using curated lists of which domains are a public suffix, you're doomed to being wrong for some non-trivial set of input strings.
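As a rough illustration of the list-driven approach, here is a minimal Python sketch. The suffix set is a tiny hand-picked sample for the examples in this thread, not the real public suffix list, and the sketch ignores the list's wildcard and exception rules entirely. The names SAMPLE_SUFFIXES and registrable_domain are made up.

```python
# Tiny illustrative sample. The real list is far larger and also contains
# wildcard and exception rules, which this sketch does not implement.
SAMPLE_SUFFIXES = {
    "com", "uk", "co.uk", "au", "com.au", "ca", "us", "or.us", "lib.or.us",
}

def registrable_domain(host, suffixes=SAMPLE_SUFFIXES):
    """Return the public suffix plus one more label, or None if unknown."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "co.uk" is preferred over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched

print(registrable_domain("subdomain.domain.co.uk"))  # domain.co.uk
print(registrable_domain("multnomah.lib.or.us"))     # multnomah.lib.or.us
```

A regex has no way to know that "co.uk" is a suffix while "domain.uk" would not be; the lookup table is what carries that knowledge.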

TMG SF_NOTIFY_POLICY_CHECK_COMPLETED Event

According to http://msdn.microsoft.com/en-us/library/ff823993%28v=VS.85%29.aspx, during this event the web filter can request the GUID of the matching rule. I am assuming that is done by performing a GetServerVariable with a type of SELECTED_RULE_GUID, since I could find no other readily identifiable means of doing so.
My problem comes from the fact that I want to see if the rule is allowing or blocking the request. If it's being blocked then my filter doesn't have to take any action, but if it's being allowed I need to do some work. SF_NOTIFY_POLICY_CHECK_COMPLETED seems to be the best event to watch, since it occurs late enough that authentication and various ms_auth traffic has been handled, but just before the request either gets routed or fetched from cache.
I had thought that perhaps I needed to use COM and the IFPC interfaces (following along with example code for registering Web Filters to TMG) to get details on the rule. However, going down via FPC -> FPCArray -> FPCArrayPolicy -> FPCPolicyRules, the only element-returning function takes either an index or a name.
Which is problematic given that I only have a GUID.
The FPCPolicyRule object (singular) doesn't seem to have any field related to a GUID either, which rules out just iterating over the collection to find it.
So my question boils down to, from the SF_NOTIFY_POLICY_CHECK_COMPLETED event, how would a web filter determine if the request has been allowed or denied?
After more investigation and testing, the GUID is accessible via the PersistentName of the FPCPolicyRule object. Since FPCPolicyRules->Item member only works on either Name or Index, I had to iterate through its items comparing each PersistentName against the GUID.
Apologies if this was obvious, took me a good day to work out :)
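The lookup itself is just a linear search. The Python sketch below only shows the shape of it: the Rule class is a stand-in for FPCPolicyRule with two fields, the GUID values are invented, and the brace/case normalization is a precaution I am assuming might be needed rather than something confirmed. A real filter would do the same loop over the COM collection.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Stand-in for an FPCPolicyRule; only the fields used here."""
    name: str
    persistent_name: str  # mirrors the PersistentName property

def normalize_guid(guid):
    # Assumption: the two sources may differ in braces or letter case.
    return guid.strip().strip("{}").lower()

def find_rule_by_guid(rules, guid):
    """Iterate the collection and compare each PersistentName to the GUID."""
    wanted = normalize_guid(guid)
    for rule in rules:
        if normalize_guid(rule.persistent_name) == wanted:
            return rule
    return None

# Made-up sample data for illustration only.
rules = [
    Rule("Allow web", "{11111111-2222-3333-4444-555555555555}"),
    Rule("Default rule", "{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}"),
]
print(find_rule_by_guid(rules, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee").name)
```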

Match all characters in group except for first and last occurrence

Say I request
parent/child/child/page-name
in my browser. I want to extract the parent, children as well as page name. Here are the regular expressions I am currently using. There should be no limit as to how many children there are in the url request. For the time being, the page name will always be at the end and never be omitted.
^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')
The more I play with this, the more I feel how clunky this solution is. This is for a small CMS I am developing in ASP Classic (:(). It is sort of like MVC routing paths, but instead of calling controllers and functions based on the URL request, I would be travelling down the hierarchy and finding the appropriate page in the database. The database is using the nested set model and is linked by a unique page name for each child.
I have tried using the split function with a / delimiter, however I found I was nesting so many split statements together that it became very unreadable.
All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?
To be honest, I'm not even sure if a regular expression is the best solution to my problem.
Thank you.
You could try using:
^([\w-]+)(/.*/)([\w-]+)$
And then access the three matching groups created using Match.SubMatches. See here for more details.
EDIT
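Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*/)([\w-]+)$ instead and it will handle the no-child case fine by itself as well (the middle group is then just "/"). The trailing slash has to stay inside the middle group: with a bare (.*) the greedy match takes everything up to the last character, leaving the third group with only the final letter of the page name.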
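Here is a quick way to check the three groups, sketched in Python rather than ASP Classic just because it is easy to run. The pattern keeps the trailing slash inside the middle group, i.e. (.*/), so that the greedy match cannot swallow the page name. The function name split_path is made up for the example.

```python
import re

# parent, then everything up to and including the last "/", then the page name
PATH_RE = re.compile(r"^([\w-]+)(.*/)([\w-]+)$")

def split_path(path):
    """Return (parent, [children...], page) or None if the path doesn't fit."""
    match = PATH_RE.match(path)
    if match is None:
        return None
    parent, middle, page = match.groups()
    # middle looks like "/child/child/" (or just "/" when there are no children)
    children = [part for part in middle.split("/") if part]
    return parent, children, page

print(split_path("parent/child/child/page-name"))
# ('parent', ['child', 'child'], 'page-name')
print(split_path("parent/page-name"))
# ('parent', [], 'page-name')
```

A single split on "/" for the middle group is enough, which avoids the nested split calls mentioned in the question.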

How to solve two REST problems: the interface document; loss of privacy in descriptive URLs

Coming from a lot of frustrating times with WSDL/Soap, I very much like the REST paradigm, but am trying to solve two basic problems in our application before moving over to REST. The first problem relates to the lack of an interface document. I think I finally see how to handle this situation: one can query one's way down from a top-level "/resources" resource using various GET, HEAD, and OPTIONS requests to find the one needed resource in the correct hypermedia format. Is this the idea? If so, the client need only be provided with a top-level resource URI: http://www.mywebservicesite.com/mywebservice/resources. The client will then have to do some searching and possibly keep track of what it is discovering, so that it can use the URIs again efficiently in future to do GETs, POSTs, PUTs, and DELETEs. Are there any thoughts on what should happen here?
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber. We do have an implementation of opaque URLs we use in the context of a session, and I'm wondering how opaque URLs might be applied to REST. The general problem is how to keep domain-specific details out of URLs, and still benefit from what REST has to offer.
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber.
I think you've misunderstood the point of opaque URIs. The notion of opaque URIs is with respect to clients: a client shall not decipher a URI to guess anything of semantic meaning from it. So a service may well have URIs like /resources/.../customer/Madonna/phonenumber, and that's quite a good idea. Clients should treat the URIs as opaque: they should not infer from the URI that it represents Madonna's phone number, or that Madonna is a customer of some sort. That knowledge can only be obtained by looking inside the resource itself, or perhaps by remembering where the URI was discovered.
Edit:
A consequence of this is that navigation should happen by links, not by deconstructing the URI. So if you see /resources/customer/Madonna/phonenumber (and it actually represents Customer Madonna's phone number) you should have links in that resource to point to the Madonna resource: e.g.
{
"phone_number" : "01-234-56",
"customer_URI": "/resources/customer/Madonna"
}
That's the only way to navigate from a phone number resource to a customer resource. An important aspect is that the server implementation might or might not have domain-specific information in the URI. The Madonna record might just as well live somewhere else: /resources/customers/byid/81496237. This is why clients should treat URIs as opaque.
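On the client side, "navigate by links" comes down to reading the link out of the representation and resolving it, never editing the URI string. A minimal Python sketch, assuming the JSON shape shown above; the host example.com and the function name customer_uri are made up:

```python
import json
from urllib.parse import urljoin

def customer_uri(phone_resource_uri, body):
    """Follow the customer link found inside the phone number representation."""
    document = json.loads(body)
    # Resolve the link relative to where the representation was fetched from.
    # The client never inspects or rewrites the path segments itself.
    return urljoin(phone_resource_uri, document["customer_URI"])

body = '{"phone_number": "01-234-56", "customer_URI": "/resources/customer/Madonna"}'
print(customer_uri("http://example.com/resources/customer/Madonna/phonenumber", body))
# http://example.com/resources/customer/Madonna
```

If the server later moves the customer record to /resources/customers/byid/81496237, only the link value changes and this client code keeps working.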
Edit 2:
Another question you have (in the comments) is how a client, which is required to have no knowledge of the server's URIs, is supposed to be able to find anything. Clients have the following possibilities to find resources:
1. Provide a search interface. This could be done by providing an OpenSearch description document, which tells clients how to search for items. An OpenSearch template can include several variables, and several endpoints, depending on what you're looking for. So if you have a "customer ID" that's unique, you could have the following template: /customers/byid/{proprietary:customerid}. The customerid element needs to be documented somewhere, inside the proprietary namespace. A client can then know how to use such a template.
2. Provide a custom form. This implies making a custom media type in which you explicitly define how (based on an instance of the document) a URI to a customer can be forged: <customers template="/customers/byid/{id}"/>. The documentation (for the media type) would have to state that the template attribute must be interpreted as a relative URI after substituting an actual customer ID for the string "{id}".
3. Provide links to all resources. Some sets of resources are small enough to enumerate, so you can simply make a link to each and every one of them, optionally including identifying information along with the links. This could also be done in a custom media type: <customer id="12345" href="/customer/byid/12345"/>.
It should be noted that #1 and #2 are two ways of saying the same thing: clients are allowed to create URIs if
they haven't got the URI structure a priori, and
a media type exists whose documentation states how URIs should be created.
This is much the same way as a web browser has no idea of any URI structure on the web, except for the rules laid out in the definition of HTML forms, to add a ? and then all the query parameters separated by &.
In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2. It's more common to actually provide real links between resources, rather than always relying on lookup or search techniques.
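To make the "custom form" option concrete, here is a small Python sketch of a client that builds a customer URI only from the template the server handed it. The XML shape is the one from the example above; the host example.com and the function name customer_uri_from_form are invented, and the percent-encoding of the ID is my own precaution rather than something the media type is known to require.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urljoin

def customer_uri_from_form(document_uri, document_xml, customer_id):
    """Build a customer URI from the template found in the document."""
    root = ET.fromstring(document_xml)
    template = root.attrib["template"]
    # The media type's documentation says: substitute "{id}", then treat
    # the result as a relative URI.
    relative = template.replace("{id}", quote(str(customer_id), safe=""))
    return urljoin(document_uri, relative)

form = '<customers template="/customers/byid/{id}"/>'
print(customer_uri_from_form("http://example.com/index", form, 12345))
# http://example.com/customers/byid/12345
```

The client still has no hard-coded URI structure: if the server changes the template attribute, the same code produces the new URIs.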
I haven't really used web RPC systems (WSDL/Soap), but I think the 'interface document' is there mostly to allow client libraries to create the service API, right? If so, REST shouldn't need it, because the verbs are already defined and don't really need to be documented again.
AFAIUI, the REST way is to document the structure of each resource (usually encoded in XML or JSON). In that document, you'll also have to document the relationship between those resources. In my case, a resource is often a container of other resources (sometimes more than one type), therefore the structure doc specifies what field holds a list of URLs pointing to the contained resources. Ideally, only one unique resource will need a single, fixed (documented) URL. Everything else follows from there.
The URL 'style' is meaningless to the client, since it shouldn't 'construct' a URL. Every URL it needs should already be constructed in a resource field. That lets you change the URL structure without changing the client (that has saved me tons of time). Your URLs can be as opaque or as descriptive as you like. (Personally, I don't like text keys or slugs; my keys are all BIGINTs or UUIDs.)
I am currently building a REST "agent" that addresses the first part of your question. The agent offers a temporary bookmarking service. The client code that is interacting with the agent can request that an URL be bookmarked using some identifier. If the client code needs to retrieve that representation again, it simply asks the agent for the url that corresponds to the saved bookmark and then navigates to that bookmark. Currently those bookmarks are not persisted so they only last for the lifetime of the client application, but I have found it a useful mechanism for accessing commonly used resources. E.g. The root representation provides a login link. I bookmark that link and if the client ever receives a 401 then I can redirect to the "login" bookmark.
To address an issue you mentioned in a comment, the agent also has the ability to store retrieved representations in a dictionary. If it becomes necessary to aggregate and manipulate multiple representations at the same time, then I can simply request that the agent store the current representation in a dictionary under a key and then continue navigating to the next resource. Once the client has accumulated all the necessary representations it can do what it needs to do.
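The bookmarking and storing behaviour described above could look roughly like this. It is only a guess at the shape of such an agent, written in Python; the class, its method names, and the sample URL are all hypothetical, and no HTTP is performed.

```python
class Agent:
    """In-memory sketch of a REST agent with bookmarks and a representation store."""

    def __init__(self):
        self._bookmarks = {}        # identifier -> URL discovered earlier
        self._representations = {}  # key -> representation kept for later

    def bookmark(self, name, url):
        self._bookmarks[name] = url

    def url_for(self, name):
        return self._bookmarks[name]

    def store(self, key, representation):
        self._representations[key] = representation

    def stored(self, key):
        return self._representations[key]

    def next_url(self, status, requested_url):
        # On a 401, go back to the bookmarked login link instead of guessing a URL.
        if status == 401:
            return self.url_for("login")
        return requested_url

agent = Agent()
agent.bookmark("login", "http://example.com/session")  # link found in the root representation
print(agent.next_url(401, "http://example.com/orders"))
```

Nothing is persisted, so the bookmarks last only as long as the Agent object does, which matches the lifetime described above.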

Detail question on REST URLs

This is one of those little detail (and possibly religious) questions. Let's assume we're constructing a REST architecture, and for definiteness let's assume the service needs three parameters, x, y, and z. Reading the various works about REST, it would seem that this should be expressed as a URI like
http://myservice.example.com/service/x/y/z
Having written a lot of CGIs in the past, it seems about as natural to express this
http://myservice.example.com/service?x=val,y=val,z=val
Is there any particular reason to prefer the all-slashes form?
The reason is small but here it is.
Cool URIs Don't Change.
The http://myservice.example.com/resource/x/y/z/ form makes a claim in front of God and everybody that this is the path to a specific resource.
Note that I changed the name. There may be a service involved, but the REST principle is that you're describing a specific web resource, named /x/y/z/.
The http://myservice.example.com/service?x=val,y=val,z=val form doesn't make as strong a claim. It says there's a piece of code named service that will try to do some sort of query. No guarantees.
Query parameters are rarely "cool". Take a look at the Google Chart API. Should that use a /full/path/notation for all of the fields? Would each URL be cool if it did?
Query parameters are useful. Optional fields can be omitted. New keys can be added to support new functionality. Over time, old fields can be deprecated and removed. Doing this is clumsier with a /path/notation.
Quoting from http://www.xml.com/pub/a/2004/08/11/rest.html
URI Opacity [BP]
The creator of a URI decides the encoding of the URI, and users should not derive metadata from the URI itself. URI opacity only applies to the path of a URI. The query string and fragment have special meaning that can be understood by users. There must be a shared vocabulary between a service and its consumers.
This sounds like query strings are what you want.
One downside to query strings is that they are unordered. The GET ending with "?x=1&y=2" is a different URL from the one ending with "?y=2&x=1". This means the browser and any other intermediate systems will cache the two as separate entries, because caching is keyed on the full URL. If this is a concern, then generate the query string in a well-defined order.
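One way to get a well-defined order is to sort the parameters before the URL is sent. A minimal Python sketch using only the standard library; the function name canonical_url is made up, and note that re-encoding can change how individual characters are escaped (for example, a space comes back as "+").

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url):
    """Return the same URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    # Sort by key, then by value, so repeated keys are ordered consistently too.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))

print(canonical_url("http://myservice.example.com/service?y=2&x=1"))
# http://myservice.example.com/service?x=1&y=2
```

Both orderings now produce one URL, so a cache keyed on the full URL sees a single entry.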
While constructing URIs, this is the principle I follow. I don't know whether it is perfectly acceptable in all cases.
Say for instance, that I have to get the details of an employee, then the URI will be of the form:
GET /employees/1/ and not GET /employees?id=1 since I treat every employee as a resource and the whole URI "employees/{id}" is used in identification of the resource.
On the other hand, if I have algorithmic operations that do not identify a specific resource as such, but merely require inputs to the algorithm which in turn identify the resource, then I use query strings.
For instance GET /employees?empname='%Bob%'&maxResults=100 might give me all employees whose names have the word Bob in them, with the maximum results returned by the query limited to 100.
Hope this answers your question
URIs are strictly split into a hierarchical part (the path) and a non-hierarchical part (the query), and both serve to identify the resource.
The URI spec itself (RFC 3986) clearly sets the path and the query portion of a URI as equal.
Section 3.3:
The path component contains data [...] that along with [the] query component
serves to identify a resource.
Section 3.4:
The query component contains [...] data that, along with
[...] the path component serves to identify a resource
So your choice between x/y/z and x=val&y=val&z=val depends mainly on whether x, y and z are hierarchical in nature or not, and whether you expect them to stay that way for the foreseeable future, along with any technical limitations you might have in selecting one over the other.
But to answer your question, as others have noted: Neither is more RESTful than the other, since they both end up identifying a resource.
If the resource is the service, independent of parameters, it should be
http://myservice.example.com/service?x=val&y=val&z=val
This is a GET query. One of the principles behind REST is that you GET to read (but not modify!) the resource; you can POST to modify a resource & get a response; you can PUT to write to a resource; and you can DELETE to remove a resource.
If the resource specified by those parameters is a persistent resource, it needs a name. You could (if you organized your webservice this way) POST to http://myservice.example.com/service?x=val&y=val&z=val to create a particular instance of the service and have it return an ID to name this instance, e.g.
http://myservice.example.com/service/12312549
then use GET/POST/PUT/DELETE to interact with that instance.
First of all, defining URIs as part of your API violates a constraint of the REST architecture. You cannot do that and call your API RESTful.
Secondly, the reason query parameters are bad for non-query resource access is that they are generally not cached. It is also a violation of HTTP standards.
A URL with slashes like /x/y/z/ would impose a hierarchy and is not suited for the exact case of just passing three parameters.
If, like you said, x y z are indeed just parameters and the order is not important, it would be more RESTful to use semicolons:
http://myservice.example.com/service/x;y;z/
If your "service" however is just an algorithm that works the same with different parameters, there would also be nothing unRESTful with using ?x=val format.