Istio Envoy Filter Rate Limiting URL Parsing - istio

What's the best way to parse URLs using EnvoyFilters in Istio? I've seen this article, and it looks like the only option is to write regex parsers to do so?
https://dev.to/tresmonauten/setup-an-ingress-rate-limiter-with-envoy-and-istio-1i9g
Regexing every single request seems expensive. I haven't seen any examples of people actually doing this; the rate limiting I've seen has always used some header to do so. Is the preferred Istio way of rate limiting to require clients to send headers?

Related

Django - How to store all the requests/responses with the least overhead?

I'm working on a Django middleware to store all requests/responses in my main database (Postgres / SQLite).
But it's not hard to guess that the overhead will be crazy, so I'm thinking of using Redis to queue the requests for a while and then send them slowly to my database.
e.g. receiving 100 requests, storing them in the database, waiting for another 100 requests, and doing the same, or something like this.
The model is like this:
url
method
status
user
remote_ip
referer
user_agent
user_ip
metadata # any important piece of data related to request/response e.g. errors or ...
created_at
updated_at
My questions are: is this a good approach? How can we implement it? Do you have any example that does such a thing?
And the other question: is there any better solution?
This doesn't suit the concrete question/answer format particularly well, unfortunately.
"Is this a good approach" is difficult to answer directly with a yes or no response. It will work and your proposed implementation looks sound, but you'll be implementing quite a bit of software and adding quite a bit of complexity to your project.
Whether this is desirable isn't easily answerable without context only you have.
Some things you'll want to answer:
What am I doing with these stored requests? Debugging? Providing an audit trail?
If it's for debugging, what does a database record get us that our web server's request logs do not?
If it's for an audit trail, is every individual HTTP request the best representation of that audit trail? Does an auditor care that someone asked for /favicon.ico? Does it convey the meaning and context they need?
Do we absolutely need every request stored? For how long? How do we handle going over our storage budget? How do we handle edge cases like the client hanging up before getting the response, or the request being processed but the server crashing before sending a response or logging the record?
Does logging a request in band with the request itself present a performance cost we actually can't afford?
Compare the strengths and weaknesses of your approach to some alternatives:
We can rely on the web server's logs, which we're already paying the cost for and are built to handle many of the oddball cases here.
We can write an HTTPLog model in band with the request using a simple middleware function, which solves some complexities like "what if Redis is down but Django and the database aren't?" (see the sketch after this list)
We write an audit logging system by providing any context needed to an out-of-band process (perhaps through signals or redis+celery)
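To make that second alternative concrete, here is a minimal sketch of an in-band logging middleware. It assumes a hypothetical HTTPLog model built from the fields listed in the question; the names are illustrative only.

```python
# middleware.py -- minimal sketch; HTTPLog is the hypothetical model from the question
from .models import HTTPLog

class RequestLogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        user = getattr(request, "user", None)  # set by AuthenticationMiddleware, if enabled
        HTTPLog.objects.create(
            url=request.path,
            method=request.method,
            status=response.status_code,
            user=user if (user and user.is_authenticated) else None,
            remote_ip=request.META.get("REMOTE_ADDR", ""),
            referer=request.META.get("HTTP_REFERER", ""),
            user_agent=request.META.get("HTTP_USER_AGENT", ""),
        )
        return response
```

Register the class in MIDDLEWARE (after AuthenticationMiddleware if you want the user). Everything stays in band, so a slow insert slows that one request, which is exactly the trade-off to weigh against the Redis-queue variant.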
Above all: capture your actual requirements first, implement the simplest solution that works second, and optimize only after you actually see performance issues.
I would not put this functionality in my Django application. There are many tools to do that. One of them is NGINX, a reverse proxy server which you can put in front of Django. Then you can use the access log from NGINX, and you can format those logs according to your needs. For this large amount of data it is usually better not to store it in the database, because it will rarely be used. You can store the logs in an S3 bucket or just in plain files and use a log parser tool to parse them.
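As a rough illustration of the "plain files plus a parser" route, here is a small sketch that parses NGINX's default combined access log format; the regex and the access.log path are assumptions, so adjust them to whatever log_format you actually configure.

```python
import re

# Matches NGINX's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_PATTERN = re.compile(
    r'(?P<remote_ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

with open("access.log") as log_file:
    records = [r for r in (parse_line(line) for line in log_file) if r]
```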

Google Tag Manager clickstream to Amazon

So the question has more to do with which services I should be using to get efficient performance.
Context and goal:
What I am trying to do, exactly, is use a Tag Manager custom HTML tag so that after each Universal Analytics tag (event or pageview) fires, it sends an HTTP request to my own EC2 server with a payload similar to what is sent to Google Analytics.
What I have thought about, planned, and researched so far:
At this moment I have two big options:
Use AWS Kinesis, which seems like a great idea, but the problem is that it only drops the information into one Redshift table and I would like to have at least 4 or 5 so I can differentiate pageviews from events, etc. My solution to this would be to route each request server-side to a separate stream (see the sketch below).
The other option is to use Spark + Kafka. (Here is a detailed explanation)
I know that at some point this means I'm building a parallel Google Analytics, with everything that implies. I still need to decide what information (I'm referring to which parameters, for example source and medium) I should send, how to format it correctly, and how to process it correctly.
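If you end up with the Kinesis option, the server-side splitting described above could look roughly like this sketch. The stream names are made up, and it assumes the payload carries Universal Analytics style parameters (t for hit type, cid for client id), so treat it as an outline rather than a working pipeline.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Assumed stream names -- one per hit type so each can load into its own Redshift table.
STREAMS = {
    "pageview": "analytics-pageviews",
    "event": "analytics-events",
}

def forward_hit(payload: dict):
    hit_type = payload.get("t", "pageview")             # "t" is the UA hit type parameter
    stream = STREAMS.get(hit_type, "analytics-other")   # catch-all stream for anything else
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=payload.get("cid", "anonymous"),   # GA client id as partition key
    )
```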
Questions and debate points:
Which option is more efficient and easier to set up?
Should I send this information directly from the page/app server, or from the user's side, making the browser do the requests as I explained before?
Has anyone done something like this in the past? Any personal recommendations?
You'd definitely benefit from Google Analytics' custom task feature instead of custom HTML. More on this from Simo Ahava. Also, Google BigQuery is quite a popular destination for streaming hit data since it allows many on-the-fly computations, such as sessionization, and there are many ready-to-use cases for BQ.

AWS WAF Rate Limit by Request URL Component

I want to start by saying that I'm a total newcomer to AWS.
I'm investigating using AWS WAF for dynamic rate limiting based on a component of the request URL. The AWS website has a tutorial for doing this by IP address, but I have no idea if it can be modified to do what I need.
So, with that in mind, please tell me what, if any, of the following is actually possible:
Rate limit by a component of the URL (an API key in this case)
Determine limit dynamically (different behaviour for different keys)
Perform some non-blocking action in the first instance of exceeding the limit, then block if the limit is exceeded consistently
Log both of the above actions and do something with the outputted logs (i.e. forward them somewhere)
Again, I'm not looking for detailed how-tos here as they would probably warrant separate questions - I just want to know if this is possible.
API Gateway is probably the right fit for what you are looking to implement. It has throttling implemented out of the box.
Take a look at API Gateway Usage Plans for implementation details for your specific use case.
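As a rough sketch of what per-key throttling looks like there, usage plans can be created and tied to API keys via boto3; the API id, stage, and limits below are placeholders, not values from your setup.

```python
import boto3

apigateway = boto3.client("apigateway")

# Placeholder values -- substitute your real API id, stage name, and limits.
plan = apigateway.create_usage_plan(
    name="bronze-tier",
    apiStages=[{"apiId": "abc123", "stage": "prod"}],
    throttle={"rateLimit": 10.0, "burstLimit": 20},
)

key = apigateway.create_api_key(name="customer-1", enabled=True)

# Requests sent with this key are throttled at the plan's limits.
apigateway.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId=key["id"],
    keyType="API_KEY",
)
```

Different keys can be attached to different plans, which covers the "different behaviour for different keys" requirement.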

REST vs. SOAP. Does REST have better performance?

I read some questions already posted here regarding SOAP and REST, and I didn't find the answer I am looking for.
We have a system which has been built using SOAP web services. The system is not very performant, and it is under discussion to replace all SOAP web services with REST web services.
Somebody has argued that REST has better performance. I don't know if this is true. (This was my first question)
Assuming that this is true, is there any disadvantage to using REST instead of SOAP? (Are we losing something?)
Thanks in advance.
Performance is a broad topic.
If you mean the load on the server, REST has somewhat better performance because it bears minimal overhead on top of HTTP. Usually SOAP brings with it a stack of different (generated) handlers and parsers. Anyway, the performance difference itself is not that big, but a RESTful service is easier to scale up since you don't have any server-side sessions.
If you mean the performance of the network (i.e. bandwidth), REST has much better performance. Basically, it's just HTTP. No overhead. So, if your service runs on top of HTTP anyway, you can't get much leaner than REST. Furthermore if you encode your representations in JSON (as opposed to XML), you'll save many more bytes.
In short, I would say 'yes', you'll be more performant with REST. Also, it (in my opinion) will make your interface easier to consume for your clients. So, not only your server becomes leaner but the client too.
However, a couple of things to consider (since you asked "what will you lose?"):
RESTful interfaces tend to be a bit more "chatty", so depending on your domain and how you design your resources, you may end up doing more HTTP requests.
SOAP has very wide tool support. For example, consultants love it because they can use tools to define the interface and generate the WSDL file, and developers love it because they can use another set of tools to generate all the networking code from that WSDL file. Moreover, XML as a representation has schemas and validators, which in some cases may be a key issue. (JSON and REST have similar things coming, but the tool support is far behind)
SOAP requires an XML message to be parsed and all that <ridiculouslylongnamespace:mylongtagname>extra</ridiculouslylongnamespace:mylongtagname> stuff to be sent and received.
REST usually uses something much more terse and easily parsed like JSON.
However in practice the difference is not that great.
Building a DOM from XML is usually done by a super-fast, super-optimised piece of code like Xerces in C++ or Java, whereas most JSON parsers are in the roll-your-own or interpreted category.
In a fast network environment (LAN or broadband) there is not much difference between sending one or two KB versus 10 to 15 KB.
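To put rough numbers on the verbosity point above, here is a tiny sketch comparing the same payload as a minimal SOAP 1.2 envelope and as JSON; the envelope is illustrative, not taken from a real service.

```python
import json

soap = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    '<soap:Body><m:GetStockPrice xmlns:m="http://example.com/stock">'
    '<m:StockName>ACME</m:StockName></m:GetStockPrice></soap:Body>'
    '</soap:Envelope>'
)
rest = json.dumps({"stockName": "ACME"})

print(len(soap.encode()), "bytes as SOAP")   # a few hundred bytes of envelope and namespaces
print(len(rest.encode()), "bytes as JSON")   # a couple of dozen bytes
```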
You phrase the question as if REST and SOAP were somehow interchangeable in an existing system. They are not.
When you use SOAP (a technology), you usually have a system that is defined in 'methods', since in effect you are dealing with RPC.
When you use REST (an architectural style, not a technology) then you are creating a system that is defined in terms of 'resources' and not at all in methods. There is no 1:1 mapping between SOAP and REST. The system architecture is fundamentally different.
Or are you merely talking about "RPC via URI", which often gets confused with REST?
I'm definitely not an expert when it comes to SOAP vs REST, but the only performance difference I know of is that SOAP has a lot of overhead when sending/receiving packets since it's XML based, requires a SOAP header, etc. REST uses the URL + querystring to make a request, and thus doesn't send that many kB over the wire.
I'm sure there are other people here on SO who can give you better and more detailed answers, but at least I tried ;)
One thing the other answers seem to overlook is REST's support for caching and other benefits of HTTP. While SOAP uses HTTP, it does not take advantage of HTTP's supporting infrastructure. The SOAP 1.1 binding only defines the use of the POST verb. This was fixed in version 1.2 with the introduction of GET bindings; however, this may be an issue if you are using the older version or not using the appropriate bindings.
Security is another key performance concern. REST applications typically use TLS or other session layer security mechanisms. TLS is much faster than using application level security mechanisms such as WS Security (WS Security also suffers from security flaws).
However, I think these are mostly minor issues when comparing SOAP and REST based services. You can find workarounds for either SOAP's or REST's performance issues. My personal opinion is that neither SOAP nor REST (by REST I mean HTTP-based REST services) is appropriate for services requiring high throughput and low latency. For those types of services, you probably want to go with something like Apache Thrift, 0MQ, or the myriad of other binary RPC protocols.
It all depends. REST doesn't really have a (good) answer for the situation where the request data may become large. I feel this point is sometimes overlooked when hyping REST.
Let's imagine a service that allows you to request informational data for thousands of different items.
The SOAP developer would define a method that allows you to retrieve the information for one or as many items as you like ... in a single call.
The REST developer would be concerned that his URI would become too long, so he would define a GET method that takes a single item as a parameter. You would then have to call this multiple times, once for each item, in order to get your data. Clean and easy to understand ... but.
In this case there would be a lot more round-trips required for the REST service to accomplish what can be done with a single call on the SOAP service.
Yes, I know there are workarounds for how to handle large request data in the REST scenario. For example you can pack stuff into the body of your request. But then you will have to define carefully (on both the server and the client side) how this is to be interpreted. In these situations you start to feel the pain that REST is not really a standard (like SOAP) but more of a way of doing things.
For situations where only a relatively limited amount of data is exchanged, REST is a very good choice. At the end of the day this is the majority of use cases.
Just to add a little to wuher's answer.
Http Header bytes when requesting this page using the Chrome web browser: 761
Bytes required for the sample soap message in wikipedia article: 299
My conclusion: It is not the size of bytes on the wire that allows REST to perform well.
It is highly unlikely that simply converting a SOAP service over to REST is going to gain any significant performance benefits. The advantage REST has is that if you follow the constraints then you can take advantage of the mechanisms that HTTP provides for producing scalable systems. Caching and partitioning are the tools in your toolbelt.
In general, a REST-based web service is preferred due to its simplicity, performance, scalability, and support for multiple data formats. SOAP is favored where a service requires comprehensive support for security and transactional reliability.
The answer really depends on the functional and non-functional requirements. Asking the questions listed below will help you choose.
Ref: http://java-success.blogspot.ca/2012/02/java-web-services-interview-questions.html
Does the service expose data or business logic? (REST is a better choice for exposing data, SOAP WS might be a better choice for logic).
Do the consumers and the service providers require a formal contract? (SOAP has a formal contract via WSDL)
Do we need to support multiple data formats?
Do we need to make AJAX calls? (REST can use the XMLHttpRequest)
Is the call synchronous or asynchronous?
Is the call stateful or stateless? (REST is suited for stateless CRUD operations)
What level of security is required? (SOAP WS has better support for security)
What level of transaction support is required? (SOAP WS has better support for transaction management)
Do we have limited bandwidth? (SOAP is more verbose)
What’s best for the developers who will build clients for the service? (REST is easier to implement, test, and maintain)
You don't have to make the choice; modern frameworks allow you to expose data in both formats with minimal changes. Follow your business requirements and load-test the specific implementation to understand the throughput; there is no correct answer to this question without a proper load test of the specific system.

Developing/Testing Twitter apps without slamming the API

I'm currently working on an app that works with Twitter, but while developing/testing (especially those parts that don't rely heavily on real Twitter data), I'd like to avoid constantly hitting the API or publishing junk tweets.
Is there a general strategy people use for taking it easy on the API (caching aside)? I was thinking of rolling my own library that would essentially intercept outgoing requests and return mock responses, but I wanted to make sure I wasn't missing anything obvious first.
I would probably start by mocking the specific parts of the API you need for your application. In fact, this may actually force you to come up with a cleaner design for your app, because it more or less requires you to think about your application in terms of "what" it should do rather than "how" it should do it.
For example, if you are using the Twitter Search API, your application most likely should not care whether or not you are using the JSON or the Atom format option. The ability to search Twitter using a given query and get results back represents the functionality you want, so you should mock the API at that level of abstraction. The output format is just an implementation detail.
By mocking the API in terms of functionality instead of in terms of low-level implementation details, you can ensure that the application does what you expect it to do, before you actually connect to Twitter for real. At that point, you've already verified that the app works as intended, so the only thing left is to write the code to make the REST requests and parse the responses, which should be fairly straightforward, so you probably won't end up hitting Twitter with a lot of junk data at that point.
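A minimal sketch of that kind of functional-level mock, assuming a hypothetical search abstraction in your own code (the names are illustrative and not part of any Twitter library):

```python
class TwitterSearch:
    """What the application actually needs: a query in, a list of tweet texts out."""
    def search(self, query):
        raise NotImplementedError

class MockTwitterSearch(TwitterSearch):
    """Canned results for development and tests -- no network, no rate limits."""
    def __init__(self, canned_results):
        self.canned_results = canned_results

    def search(self, query):
        return self.canned_results.get(query, [])

# The app is written against TwitterSearch; swap in the real HTTP-backed
# implementation only once the rest of the application behaves as expected.
searcher = MockTwitterSearch({"python": ["Tweet one about Python", "Tweet two"]})
assert searcher.search("python") == ["Tweet one about Python", "Tweet two"]
```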
Caching is probably the best solution. Besides that, I believe the API is limited to 100 requests per hour, so maybe make a function that keeps counting each request and, as it gets close to 100, says: OK, every 10 API requests I will pull data. It wouldn't be a hard cutoff - probably a gradient function that backs off as you near the limit.
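One way to sketch that pacing idea: spread whatever remains of the hourly budget over the rest of the hour, so calls slow down automatically if you burn through requests quickly. The 100-per-hour figure comes from the answer above, and the curve itself is just an illustration, not a recommended production throttle.

```python
import time

class ApiBudget:
    """Pace API calls by spreading the remaining hourly budget over the remaining time."""
    def __init__(self, limit=100, window=3600):
        self.limit = limit
        self.window = window
        self.window_start = time.time()
        self.used = 0

    def wait_before_call(self):
        now = time.time()
        if now - self.window_start >= self.window:      # new hour: reset the counter
            self.window_start, self.used = now, 0
        remaining = max(self.limit - self.used, 1)
        time_left = self.window - (now - self.window_start)
        time.sleep(max(time_left / remaining, 0))       # spread remaining calls over the rest of the hour
        self.used += 1
```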
I've used Tweet#; it caches and should do everything you need, since it has 100% of Twitter's API covered and then some...
http://dimebrain.com/2009/01/introducing-tweet-the-complete-fluent-c-library-for-twitter.html
Cache stuff in a database... If the cache is too old then request the latest data via the API.
Also think about getting your application account white-listed; it will allow you a 20,000 API request limit per hour vs. the measly 100 (which is meant for a user, not an application).
http://twitter.com/help/request_whitelisting