Akka Http turn off header parsing - akka

I'm trying to implement a transparent proxy with Akka-Http & Akka-Stream.
However, I'm running into an issue where Akka-Http parses and manipulates the response headers from the upstream server.
For example, when the upstream server sends the following header:
Expires: "0"
Akka will parse this into an Expires header and correct the value to:
Expires: "Wed, 01 Jan 1800 00:00:00 GMT"
Although start of unix time is better than "0", I don't want this proxy to touch any of the headers. I want the proxy to be transparent and not "fix" any of the headers passing through.
Here is the simple proxy:
Http().bind("localhost", 9000).to(Sink.foreach { connection =>
  logger.info("Accepted new connection from " + connection.remoteAddress)
  connection handleWith pipeline
}).run()
The proxy flow:
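// Tag each request with a UUID, send it through the shared pool, keep the response part of the tuple.
Flow[HttpRequest].map(x => (x, UUID.randomUUID().toString())).via(Http().superPool[String]()).map(x => x._1)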
I noticed that the http-server configuration allows me to configure and keep the raw request headers, but there doesn't seem to be one for http-client.
raw-request-uri-header = off
Is there a way I can configure Akka to leave the header values as is when I respond to the client?

This is not possible currently.
I wonder how hard it would be to expose such a mode, and how much complexity we'd have to pay for it; however, I err on the side of this feature not being able to pull its weight.
Feel free to open a ticket for it on http://github.com/akka/akka where we could discuss it further. Some headers are treated specially so we really do want to parse them into the proper model – imagine websocket upgrades, Connection headers etc, so there would have to be a strong case behind this feature request to make it pull its weight IMO.
(I'm currently maintaining Akka HTTP).

Related

Timeout for incomplete HTTP requests

I'm new to Envoy Filters and I want to set the timeout period for incomplete HTTP requests to be equal to or less than 40 seconds.
Envoy provides an HTTP connection manager extension where you can set:
common_http_protocol_options:
  idle_timeout: 40s
I'm not sure whether this is the right solution for what I'm trying to implement, since a lot of these concepts are new to me. Your help is much appreciated.

AWS HTTP API Gateway 503 Service Unavailable

I have an HTTP API Gateway with an HTTP integration backend server on EC2. The API receives many requests during the day, and looking at the logs I realized that it sometimes returns a 503 HTTP status code with the body:
{ "message": "Service Unavailable" }
When I found this out, I tried the API by running the HTTP requests many times in Postman; out of twenty attempts I get at least one 503.
I then thought that the HTTP integration server was busy, but the server is not loaded, and when I go directly to the HTTP integration server I get 200 responses every time.
The timeout parameter is set to 30000 ms and the endpoint's average response time is 200 ms, so the timeout is not the problem. Also, the HTTP 503 does not come 30 seconds after the request but instantly.
Can anyone help me?
Thanks
I solved this issue by editing the keep-alive connection parameters of my internal integration server. The AWS API Gateway needs the keep alive parameters on a standard configuration, so I started tweaking my NGINX server parameters until I solved the issue.
I had the same issue with a self-made Node microservice that was integrated into AWS API Gateway. After reconfiguring the CloudWatch logs I got a further indicator of what was wrong: INTEGRATION_NETWORK_FAILURE
Verify that your problem is alike, i.e. through more detailed log output
In API Gateway - Logging, add more output to "Log format"
Use this or similar content for "Log format":
{"httpMethod":"$context.httpMethod","integrationErrorMessage":"$context.integrationErrorMessage","protocol":"$context.protocol","requestId":"$context.requestId","requestTime":"$context.requestTime","resourcePath":"$context.resourcePath","responseLength":"$context.responseLength","routeKey":"$context.routeKey","sourceIp":"$context.identity.sourceIp","status":"$context.status","errMsg":"$context.error.message","errType":"$context.error.responseType","intError":"$context.integration.error","intIntStatus":"$context.integration.integrationStatus","intLat":"$context.integration.latency","intReqID":"$context.integration.requestId","intStatus":"$context.integration.status"}
After calling the API Gateway endpoint and hitting the failure, consult the logs again; the entry should show the integration error (in my case INTEGRATION_NETWORK_FAILURE).
Solve in NodeJS Microservice (using Express)
Add timeouts for headers and keep-alive to the Express server's socket configuration when it starts listening.
const app = require('express')();
// If not already set, and you need to advertise keep-alive through the HTTP response, you might want to use this:
/*
app.use((req, res, next) => {
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('Keep-Alive', 'timeout=30');
  next();
});
*/
/* ...your main logic... */
const server = app.listen(8080, 'localhost', () => {
  console.warn(`⚡️[server]: Server is running at http://localhost:8080`);
});
server.keepAliveTimeout = 30 * 1000; // <- important line
server.headersTimeout = 35 * 1000; // <- important line (kept longer than keepAliveTimeout)
Reason
Some AWS components seem to demand a connection that is kept alive, even if the server responds otherwise (connection: close). When API Gateway (and possibly AWS ELBs) reuses the connection, the reuse fails because the other side has most likely already closed it, hence the assumed "NETWORK-FAILURE".
This error seems intermittent, since at least API Gateway seems to close unused connections after a while, which gives a clean execution the next time. I can only assume they do that for high performance and do not fall back to anything less.

What HTTP status code should I use for a GET request that may return stale data?

The scenario is: I'm implementing a RESTful web service that will act as a cache for entities stored on a remote system C. One of the web service's requirements is that, when the remote system C is offline, it answers GET requests with the last cached data but flags it as "stale".
The way I was planning to flag the data as stale was returning a HTTP status code other than 200 (OK). I considered using 503 (service unavailable), but I believe that it would make some C#/Java HTTP clients throw exceptions, and that would indirectly force the users to use exceptions for control flow.
Can you suggest a more appropriate status code? Or should I just return 200 and add a staleness flag to the response body? Another option would be defining a separate resource that informs the connectivity state, and let the clients handle that separately.
Simply set the Last-Modified header appropriately, and let the client decide if it's stale. Stale data will have the Last-Modified date farther back than "normal". For fresh data, keep the Last-Modified header current.
I would return 200 OK and an appropriate application-specific response. No other HTTP status code seems appropriate, because the decision if and how to use the response is being passed to the client. I would also advise against using standard HTTP cache control headers for this purpose. I would use them only to control third-party (intermediary and client) caches. Using these headers to communicate application-specific information unnecessarily ties application logic to cache control. While it might not be immediately obvious, there are real long-term benefits in the ability to independently evolve application logic and caching strategy.
If you are serving stale responses RFC-2616 says:
If a stored response is not "fresh enough" by the most
restrictive freshness requirement of both the client and the
origin server, in carefully considered circumstances the cache
MAY still return the response with the appropriate Warning
header (see section 13.1.5 and 14.46), unless such a response
is prohibited (e.g., by a "no-store" cache-directive, or by a
"no-cache" cache-request-directive; see section 14.9).
In other words, serving 200 OK is perfectly fine.
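As a rough illustration of the quoted passage, here is a minimal Python sketch of a cache that answers 200 OK and adds the Warning header when the origin is unreachable. The function name, the `upstream_online` flag, and the JSON content type are assumptions made for the example; the warn-code 110 ("Response is stale") comes from RFC 2616 section 14.46. Newer HTTP caching specifications have since retired the Warning header, so treat this as a sketch of the RFC 2616 behaviour only.

```python
def build_cached_response(body: bytes, upstream_online: bool):
    """Return (status, headers, body) for an entity served from the local cache.

    Hypothetical helper: the status stays 200 whether or not the data is
    stale; staleness is signalled only through the Warning header.
    """
    headers = {"Content-Type": "application/json"}  # assumed payload type
    if not upstream_online:
        # warn-code 110, warn-agent "-" (unnamed), warn-text "Response is Stale"
        headers["Warning"] = '110 - "Response is Stale"'
    return 200, headers, body


if __name__ == "__main__":
    print(build_cached_response(b'{"id": 1}', upstream_online=False))
```

A client that cares about staleness checks for the Warning header; a client that does not can ignore it and still gets a normal 200 response.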
In Mark Nottingham's caching article he says
Under certain circumstances — for example, when it’s disconnected from
a network — a cache can serve stale responses without checking with
the origin server.
In your case, your web service is behaving like an intermediary cache.
A representation is stale when either its Expires time or its max-age lifetime has passed. Therefore if you returned a representation with
Cache-control: Max-age=0
Then you are effectively saying that the representation you are returning is already stale. Assuming that when you retrieve representations from the "System C" that the data can be considered fresh for some non-zero amount of time, your web service can return representations with something like,
Cache-control: Max-age=3600
The client can check cache control header for max-age == 0 to determine if the representation was stale when it was first retrieved or not.

HTTP Request signature without session

I am thinking of a REST web service that ensures, for every request sent to it, that:
The request was generated by the user who claims it;
The request has not been modified by someone else (URI/method/content/date);
For GET requests, it should be possible to generate a URI with enough information in it to check the signature and set an expiration date. That way a user can delegate temporary READ permissions on a resource to a collaborator for a limited time period with a generated URI.
Clients are authenticated with an ID and a content signature based on their password.
There should be no session at all, and therefore no server state! The server and the client share a secret key (a password).
After thinking about it and talking with some really nice folks, it seems no existing REST service does this as simply as it should for my use case. (HTTP Digest and OAuth can do this, but with server state, and they are very chatty.)
So I imagined one, and I'm asking for your comments on how it should be designed (I will release it as open source and hope it can help others).
The service uses a custom "Content-signature" header to store credentials. An authenticated request should contain this header:
Content-signature: <METHOD>-<USERID>-<SIGNATURE>
<METHOD> is the sign method used, in our case SRAS.
<USERID> stands for the user ID mentioned earlier.
<SIGNATURE> = SHA2(SHA2(<PASSWORD>):SHA2(<REQUEST_HASH>));
<REQUEST_HASH> = <HTTP_METHOD>\n
<HTTP_URI>\n
<REQUEST_DATE>\n
<BODY_CONTENT>;
A request is invalidated 10 minutes after it has been created.
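The scheme above can be sketched in Python roughly as follows. This is only one reading of the description: it assumes "SHA2" means SHA-256, that digests are hex-encoded before being joined with ":", and that the request date uses the HTTP date format. The function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def sha2_hex(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string (assumed meaning of SHA2)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def request_hash(method: str, uri: str, date: str, body: str) -> str:
    """<HTTP_METHOD>\\n<HTTP_URI>\\n<REQUEST_DATE>\\n<BODY_CONTENT>"""
    return "\n".join([method, uri, date, body])


def sign(password: str, method: str, uri: str, date: str, body: str) -> str:
    """SHA2(SHA2(<PASSWORD>):SHA2(<REQUEST_HASH>))"""
    inner = sha2_hex(password) + ":" + sha2_hex(request_hash(method, uri, date, body))
    return sha2_hex(inner)


def content_signature(user_id: str, password: str, method: str, uri: str,
                      date: str, body: str, scheme: str = "SRAS") -> str:
    """Value of the Content-signature header: <METHOD>-<USERID>-<SIGNATURE>."""
    return f"{scheme}-{user_id}-{sign(password, method, uri, date, body)}"


def is_fresh(date: str, now: datetime, window: timedelta = timedelta(minutes=10)) -> bool:
    """True while the request is no older than the 10-minute validity window."""
    created = parsedate_to_datetime(date)
    return timedelta(0) <= now - created <= window


def verify(header: str, password: str, method: str, uri: str, date: str,
           body: str, now: datetime) -> bool:
    """Server-side check: recompute the signature and compare in constant time."""
    _scheme, _user_id, signature = header.split("-", 2)
    expected = sign(password, method, uri, date, body)
    return is_fresh(date, now) and hmac.compare_digest(signature, expected)


if __name__ == "__main__":
    date = "Sun, 06 Nov 1994 08:49:37 GMT"
    header = content_signature("62ABCD651FD52614BC42FD", "secret",
                               "POST", "/ressource", date, '{ test: "yes" }')
    now = datetime(1994, 11, 6, 8, 55, tzinfo=timezone.utc)
    print(header, verify(header, "secret", "POST", "/ressource", date,
                         '{ test: "yes" }', now))
```

Note that the server has to look up the password (or its hash) for the given user ID to recompute the signature, so some per-user data is still stored even though no session is kept.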
For example a typical HTTP REQUEST would be :
POST /ressource HTTP/1.1
Host: www.elphia.fr
Date: Sun, 06 Nov 1994 08:49:37 GMT
Content-signature: SRAS-62ABCD651FD52614BC42FD-760FA9826BC654BC42FD
{ test: "yes" }
The server will answer :
401 Unauthorized
OR
200 OK
Variables would be :
<USERID> = 62ABCD651FD52614BC42FD
<REQUEST_HASH> = POST\n
/ressource\n
Sun, 06 Nov 1994 08:49:37 GMT\n
{ test: "yes" }\n
URI Parameters
Some parameters can be added to the URI (they override the header information):
_sras.content-signature=<METHOD>-<USERID>-<SIGNATURE> : Put the credentials in the URI, not in the HTTP header. This allows a user to share a signed request;
_sras.date=Sun, 06 Nov 1994 08:49:37 GMT (request date*) : The date when the request was created.
_sras.expires=Sun, 06 Nov 1994 08:49:37 GMT (expire date*) : Tells the server the request should not expire before the specified date.
*date format : http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.18
Thanks for your comments.
There are several issues that you need to consider when designing a signature protocol. Some of these issues might not apply to your particular service:
1- It is customary to add an "X-Namespace-" prefix to non-standard headers, in your case you could name your header something like: "X-SRAS-Content-Signature".
2- The Date header might not provide enough resolution for the nonce value, I would therefore advise for a timestamp having at least 1 millisecond of resolution.
3- If you do not store at least the last nonce, one could still replay a message in the 10 minutes window, which is probably unacceptable on a POST request (could create multiple instances with same values in your REST web service). This should not be a problem for GET PUT or DELETE verbs.
However, on a PUT, this could be used for a denial of service attack by forcing to update many times the same object within the proposed 10 minutes window. On a GET or DELETE a similar problem exists.
You therefore probably need to store at least the last used nonce associated with each user id and share this state between all your authentication servers in real-time.
4- This method also requires that the client and servers be clock synchronized with less than 10 minutes skew. This can be tricky to debug, or impossible to enforce if you have AJAX clients for which you do not control the clock. This also requires to set all timestamps in UTC.
An alternative is to drop the 10-minute window requirement but verify that timestamps increase monotonically, which again requires storing the last nonce. This is still a problem if the client's clock is set back to a date prior to the last used nonce. Access would be denied until the client's clock passes the last nonce or the server's nonce state is reset.
A monotonically increasing counter is not an option for clients that cannot store state, unless the client can request the last used nonce from the server. This would be done once at the beginning of each session, and the counter would then be incremented at each request.
5- You also need to pay attention to retransmissions due to network errors. You cannot assume that the server has not received the last message just because the client did not receive a TCP Ack for it before the TCP connection dropped. Therefore the nonce needs to be incremented between each retransmission above the TCP level and the signature recalculated with the new nonce. A message number also needs to be added to prevent double execution on the server: a double POST would result in two objects being created.
6- You also need to sign the user ID; otherwise, an attacker might be able to replay the same message for all users whose nonces have not yet reached that of the replayed message.
7- Your method does not guarantee to the client that the server is authentic and has not been DNS-hijacked. Server authentication is usually considered important for secure communications. This could be provided by signing responses from the server, using the same nonce as that of the request.
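The replay concern in points 3 and 4 can be illustrated with a minimal Python sketch that remembers the last accepted nonce per user ID and rejects anything that does not increase. The class name and the use of an integer millisecond timestamp as the nonce are assumptions for the example. State is kept in a local dictionary here, whereas the answer points out that real deployments would have to share it between all authentication servers.

```python
class NonceGuard:
    """Tracks the highest nonce accepted for each user ID (in memory only)."""

    def __init__(self) -> None:
        self._last_nonce: dict[str, int] = {}

    def accept(self, user_id: str, nonce: int) -> bool:
        """Accept the request only if its nonce is strictly greater than the last one."""
        last = self._last_nonce.get(user_id)
        if last is not None and nonce <= last:
            return False  # replayed or out-of-order message
        self._last_nonce[user_id] = nonce
        return True


if __name__ == "__main__":
    guard = NonceGuard()
    print(guard.accept("62ABCD651FD52614BC42FD", 784111777000))  # first use
    print(guard.accept("62ABCD651FD52614BC42FD", 784111777000))  # replay
```

As the answer notes, this rejects a legitimate client whose clock was set backwards until its clock passes the stored value again.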
I would note that you can accomplish this with OAuth, most notably "2-legged OAuth" where client and server share a secret. See https://www.rfc-editor.org/rfc/rfc5849#page-14. In your case, you want to omit the oauth_token parameter and probably use the HMAC-SHA1 signature method. There's nothing particularly chatty about this; you don't need to go through the OAuth token acquisition flows to do things this way. This has the advantage of being able to use any of several existing open source OAuth libraries.
As far as server-side state, you do need to keep track of what secrets go with which clients, as well as which nonces have been used recently (to prevent replay attacks). You can skip the nonce checking / lifetimes if you run things over HTTPS, but if you're going to do that, then HTTPS + Basic Auth gives you everything you described without having to write new software.

What are REST API error handling best practices? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I'm looking for guidance on good practices when it comes to return errors from a REST API. I'm working on a new API so I can take it any direction right now. My content type is XML at the moment, but I plan to support JSON in future.
I am now adding some error cases, for instance when a client attempts to add a new resource but has exceeded his storage quota. I am already handling certain error cases with HTTP status codes (401 for authentication, 403 for authorization and 404 for plain bad request URIs). I looked over the blessed HTTP error codes, but none in the 400-417 range seems right for reporting application-specific errors. So at first I was tempted to return my application error with 200 OK and a specific XML payload (i.e. Pay us more and you'll get the storage you need!) but I stopped to think about it and it seems too soapy (/shrug in horror). Besides, it feels like I'm splitting the error responses into distinct cases, as some are HTTP status code driven and others are content driven.
So what are the industry recommendations? Good practices (please explain why!) and also, from a client point of view, what kind of error handling in the REST API makes life easier for the client code?
A great resource to pick the correct HTTP error code for your API:
http://www.codetinkerer.com/2015/12/04/choosing-an-http-status-code.html
The article is organized under these headings: Where to start, 2XX/3XX, 4XX, and 5XX.
So at first I was tempted to return my application error with 200 OK and a specific XML payload (i.e. Pay us more and you'll get the storage you need!) but I stopped to think about it and it seems too soapy (/shrug in horror).
I wouldn't return a 200 unless there really was nothing wrong with the request. From RFC2616, 200 means "the request has succeeded."
If the client's storage quota has been exceeded (for whatever reason), I'd return a 403 (Forbidden):
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
This tells the client that the request was OK, but that it failed (something a 200 doesn't do). This also gives you the opportunity to explain the problem (and its solution) in the response body.
What other specific error conditions did you have in mind?
The main choice is do you want to treat the HTTP status code as part of your REST API or not.
Both ways work fine. I agree that, strictly speaking, one of the ideas of REST is that you should use the HTTP Status code as a part of your API (return 200 or 201 for a successful operation and a 4xx or 5xx depending on various error cases.) However, there are no REST police. You can do what you want. I have seen far more egregious non-REST APIs being called "RESTful."
At this point (August, 2015) I do recommend that you use the HTTP Status code as part of your API. It is now much easier to see the return code when using frameworks than it was in the past. In particular, it is now easier to see the non-200 return case and the body of non-200 responses than it was in the past.
The HTTP Status code is part of your api
You will need to carefully pick 4xx codes that fit your error conditions. You can include a rest, xml, or plaintext message as the payload that includes a sub-code and a descriptive comment.
The clients will need to use a software framework that enables them to get at the HTTP-level status code. Usually do-able, not always straight-forward.
The clients will have to distinguish between HTTP status codes that indicate a communications error and your own status codes that indicate an application-level issue.
The HTTP Status code is NOT part of your api
The HTTP status code will always be 200 if your app received the request and then responded (both success and error cases)
ALL of your responses should include "envelope" or "header" information. Typically something like:
envelope_ver: 1.0
status: # use any codes you like. Reserve a code for success.
msg: "ok" # A human string that reflects the code. Useful for debugging.
data: ... # The data of the response, if any.
This method can be easier for clients since the status for the response is always in the same place (no sub-codes needed), no limits on the codes, no need to fetch the HTTP-level status-code.
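A minimal Python sketch of such an envelope, serialized as JSON, might look like the following. The field names follow the outline above; the concrete status values (0 for success, 4001 for an exceeded quota) are made up for illustration.

```python
import json

SUCCESS = 0            # hypothetical code reserved for success
QUOTA_EXCEEDED = 4001  # hypothetical application-level error code


def envelope(status: int, msg: str, data=None, version: str = "1.0") -> str:
    """Wrap every response, success or error, in the same envelope."""
    return json.dumps({
        "envelope_ver": version,  # lets the semantics change later
        "status": status,         # application-level status, not the HTTP one
        "msg": msg,               # human-readable text, useful for debugging
        "data": data,             # payload of the response, if any
    })


if __name__ == "__main__":
    print(envelope(SUCCESS, "ok", {"id": 42}))
    print(envelope(QUOTA_EXCEEDED, "storage quota exceeded"))
```

With this approach the client always parses the body first and branches on `status`, regardless of the HTTP-level code.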
Here's a post with a similar idea: http://yuiblog.com/blog/2008/10/15/datatable-260-part-one/
Main issues:
Be sure to include version numbers so you can later change the semantics of the api if needed.
Document...
Remember there are more status codes than those defined in the HTTP/1.1 RFCs, the IANA registry is at http://www.iana.org/assignments/http-status-codes. For the case you mentioned status code 507 sounds right.
As others have pointed out, having a response entity in an error response is perfectly allowable.
Do remember that 5xx errors are server-side, i.e. the client cannot change anything in its request to make the request pass. If the client's quota is exceeded, that's definitely not a server error, so 5xx should be avoided.
I know this is extremely late to the party, but now, in year 2013, we have a few media types to cover error handling in a common distributed (RESTful) fashion. See "vnd.error", application/vnd.error+json (https://github.com/blongden/vnd.error) and "Problem Details for HTTP APIs", application/problem+json (https://datatracker.ietf.org/doc/html/draft-nottingham-http-problem-05).
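To give an idea of what a "Problem Details" body looks like, here is a small Python sketch that builds one for the quota case from the question. The members `type`, `title`, `status` and `detail` come from the Problem Details draft (later published as RFC 7807); the helper name, the example type URI and the choice of 403 are assumptions for illustration.

```python
import json


def problem_response(status: int, type_uri: str, title: str, detail: str):
    """Return (status, headers, body) with an application/problem+json payload."""
    body = {
        "type": type_uri,  # URI identifying the problem type
        "title": title,    # short summary of the problem type
        "status": status,  # mirrors the HTTP status code
        "detail": detail,  # explanation specific to this occurrence
    }
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


if __name__ == "__main__":
    print(problem_response(
        403,
        "https://example.com/probs/quota-exceeded",  # hypothetical type URI
        "Storage quota exceeded",
        "Adding this resource would exceed your storage quota.",
    ))
```

This keeps the HTTP status code meaningful while still giving the client a structured, machine-readable explanation in the body.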
There are two sorts of errors. Application errors and HTTP errors. The HTTP errors are just to let your AJAX handler know that things went fine and should not be used for anything else.
5xx Server Error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
506 Variant Also Negotiates (RFC 2295)
507 Insufficient Storage (WebDAV) (RFC 4918)
509 Bandwidth Limit Exceeded (Apache bw/limited extension)
510 Not Extended (RFC 2774)
2xx Success
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information (since HTTP/1.1)
204 No Content
205 Reset Content
206 Partial Content
207 Multi-Status (WebDAV)
However, how you design your application errors is really up to you. Stack Overflow for example sends out an object with response, data and message properties. The response I believe contains true or false to indicate if the operation was successful (usually for write operations). The data contains the payload (usually for read operations) and the message contains any additional metadata or useful messages (such as error messages when the response is false).
Agreed. The basic philosophy of REST is to use the web infrastructure. The HTTP Status codes are the messaging framework that allows parties to communicate with each other without increasing the HTTP payload. They are already established universal codes conveying the status of response, and therefore, to be truly RESTful, the applications must use this framework to communicate the response status.
Sending an error response in a HTTP 200 envelope is misleading, and forces the client (api consumer) to parse the message, most likely in a non-standard, or proprietary way. This is also not efficient - you will force your clients to parse the HTTP payload every single time to understand the "real" response status. This increases processing, adds latency, and creates an environment for the client to make mistakes.
Modeling your api on existing 'best practices' might be the way to go.
For example, here is how Twitter handles error codes
https://developer.twitter.com/en/docs/basics/response-codes
Please stick to the semantics of the protocol. Use 2xx for successful responses and 4xx or 5xx for error responses, be it your business exceptions or other errors. Had using 2xx for every response been the intended use case in the protocol, it would not have other status codes in the first place.
Don't forget the 5xx errors as well for application errors.
In this case what about 409 (Conflict)? This assumes that the user can fix the problem by deleting stored resources.
Otherwise 507 (not entirely standard) may also work. I wouldn't use 200 unless you use 200 for errors in general.
If the client quota is exceeded, it is not a server error; avoid 5xx in this instance.