Network AAA - concurrent login accounting - concurrency

I am looking for a network AAA (authentication, authorization, accounting) protocol that manages concurrent access to network resources from one account. Say an account is logged in by two users concurrently: how can I distribute the session timeout of the account between the two users?

I am assuming you are not looking for the specific AAA functionality as used by telecommunications companies, but rather, RADIUS on steroids. Perhaps the easiest way to do this is to put something like FreeRADIUS in front of a database and script against the accounting records.
I'll assume your particular NAS device (Wifi hub, packet gateway, etc) supports the following RADIUS records.
Access Request
Access Accept/Reject
Accounting Start
Accounting Stop
Interim Accounting
Session Disconnect
When you get a session start, let FreeRADIUS run some sort of script or log that start into the database. This is your clock start for each user. Even if the user logs in three times, you'll get three start messages. When each of those sessions ends, you'll get a session stop. At a minimum, simply query the database, compute the deltas, and apply the accounting rules to that user. If that user used 10, 20 and 30 minutes in concurrent sessions, you'll get stop records showing 10, 20 and 30 minutes.
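Roughly, the delta computation could look like this minimal sketch, assuming the accounting script writes each start/stop into a sessions table (the schema, column names, and the 60-minute quota here are made up for illustration):
import sqlite3
from datetime import datetime

# Illustrative schema, filled in by the accounting script:
#   sessions(acct_session_id TEXT, username TEXT, start_time TEXT, stop_time TEXT)
def minutes_used(db: sqlite3.Connection, username: str) -> float:
    """Sum the duration of all sessions (finished or still open) for one account."""
    total = 0.0
    now = datetime.utcnow()
    for start, stop in db.execute(
            "SELECT start_time, stop_time FROM sessions WHERE username = ?",
            (username,)):
        stop_dt = datetime.fromisoformat(stop) if stop else now  # open session: count up to now
        total += (stop_dt - datetime.fromisoformat(start)).total_seconds() / 60.0
    return total

def minutes_remaining(db, username, quota_minutes=60):
    # Concurrent sessions all draw from the same per-account allowance.
    return max(0.0, quota_minutes - minutes_used(db, username))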
This works, but it doesn't go quite far enough. First, if the sessions are long, you won't know about the time of those sessions until they terminate. That could be days from now. This is where the accounting records, particularly the interim accounting records, come in. If your NAS supports it, you can tell it to generate an interim accounting record for a session, say, every 30 minutes. Thus, if a session lasts 30 minutes or less, you'll get just the start and stop records. If a session lasts 45 minutes, however, you'll get:
A start record at time 0
An interim accounting update at time 30
A stop record at time 45
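A sketch of how those records could drive a shared per-account quota, assuming the accounting hook hands you a dict of RADIUS attributes. Acct-Session-Id, Acct-Status-Type and Acct-Session-Time are standard attribute names; the quota value and the disconnect_account helper are placeholders:
from collections import defaultdict

ACCOUNT_QUOTA_SECONDS = 3600        # illustrative shared allowance per account
usage = defaultdict(dict)           # username -> {session id: seconds used so far}

def on_accounting_record(record):
    """record: dict of RADIUS attributes handed over by the accounting hook."""
    user, sid = record["User-Name"], record["Acct-Session-Id"]
    status = record["Acct-Status-Type"]
    if status == "Start":
        usage[user][sid] = 0
    elif status in ("Interim-Update", "Stop"):
        # Interim and Stop records carry Acct-Session-Time (elapsed seconds),
        # so long sessions are charged without waiting for them to end.
        usage[user][sid] = int(record.get("Acct-Session-Time", 0))
    if sum(usage[user].values()) >= ACCOUNT_QUOTA_SECONDS:
        disconnect_account(user)    # placeholder, e.g. send a RADIUS Disconnect-Request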
It's not really the AAA you care about -- any RADIUS server likely will do the job -- FreeRADIUS, OpenRADIUS, Microsoft RADIUS server. It's your NAS device. If it can't send the records, you can't process them.


Optimization and load balancing of a microservice-based backend

I have a client with a pretty popular ticket-selling service, to the point that the microservice-based backend is struggling to keep up, and I need to come up with a solution to optimize and load-balance the system. The infrastructure works through a series of interconnected microservices.
When a user enters the sales channels (mobile or web app), the request is directed to an AWS API Gateway, which orchestrates the communication with the microservice in charge of obtaining the requested resources.
These resources are provided by a third-party API.
This third party has physical servers in each venue in charge of synchronizing the information between the POS systems and the digital sales channels.
We have a Redis instance in charge of caching the requests that we make to the third-party API; we cache each endpoint with a TTL relative to how frequently its information is updated.
Here is some background info:
We get traffic mostly from 2 major countries
On a normal day, about 100 thousand users will use the service, with a 70%/30% traffic split between the two countries
On important days, each country has different opening hours (Country A starts sales at 10 am UTC, but Country B starts at 5 pm UTC); on these days the traffic increases by some factor n
We have a main middleware through which all requests made by clients are processed.
We have a Redis cache database that stores GET responses with different TTLs for each endpoint.
We have a middleware that decides whether to serve the request from the cache or to call the third party's API, as the case may be.
And these are the complaints I have gotten that need to be dealt with:
When one country receives a high volume of requests, the country with less traffic is negatively affected: the clients do not respond, or respond only partially, because the computation layer's limit was exceeded, so those users have a bad experience
Every time the above happens, the computation layer must be scaled up manually from the infrastructure side.
Each request has a different response time: stadiums respond in roughly 40 seconds and movie theaters in about 3 seconds. These requests enter a queue and are answered in order of arrival.
The error handling is not clear. The errors are mixed up, and you can't tell which country the errors are coming from or how many there are
The responses from the third-party API are not cached correctly in the cache layer, since error responses are stored for the duration of the TTL
I was thinking of a couple of things that I could suggest:
Adding instrumentation of the requests by using AWS X-Ray
Adding a separate table for errors in the Redis cache layer (old data has to be better than no data for the end user)
Adding AWS Elastic Load Balancing for the main middleware
But I'm not sure how realistic it would be to implement these 3 things, and I'm also not sure whether they would even solve the problem; I personally don't have much experience optimizing this type of backend. I would appreciate any suggestions, recommendations, links, documentation, etc. I'm really desperate for a solution to this problem.
A few thoughts:
When one country receives a high volume of requests, the country with less traffic is negatively affected: the clients do not respond, or respond only partially, because the computation layer's limit was exceeded, so those users have a bad experience
A common approach in AWS is to regionalize the stack - assuming you are using CDK/CloudFormation, creating a regionalized stack should be a straightforward task.
But it is a question whether this will solve the problem. Your system suffers from availability issues; regionalization will only isolate that problem to individual regions. So we should be able to do better (see below).
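If the stack is defined with the CDK, the per-region duplication could be sketched like this; the TicketBackendStack class and the region choices are hypothetical:
import aws_cdk as cdk
from backend_stack import TicketBackendStack   # hypothetical Stack holding the existing resources

app = cdk.App()
# One copy of the same stack per serving region, so a spike from one country
# cannot exhaust the compute capacity the other country depends on.
TicketBackendStack(app, "TicketBackend-CountryA", env=cdk.Environment(region="us-east-1"))
TicketBackendStack(app, "TicketBackend-CountryB", env=cdk.Environment(region="sa-east-1"))
app.synth()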
Every time the above happens, the computation layer must be scaled up manually from the infrastructure side.
AWS has an option to automatically scale capacity up and down based on traffic patterns. This is a neat feature, provided you set limits to make sure you are not overcharged.
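For illustration, if the computation layer ran on ECS, a target-tracking policy with explicit capacity bounds could be a sketch like this (the cluster/service names and the numbers are placeholders):
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "service/ticket-cluster/booking-service"   # placeholder ECS service

autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,     # never fall below a baseline
    MaxCapacity=20,    # hard cap so a traffic spike cannot run up the bill
)
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,   # keep average CPU around 70%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)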
Each request has a different response time: stadiums respond in roughly 40 seconds and movie theaters in about 3 seconds. These requests enter a queue and are answered in order of arrival.
It seems that the large variance is because you have to contact the servers at the venues. I recommend decoupling that activity: calls to the venues should basically be done asynchronously. There are several ways you could do that - queues and consumer push/pull are the usual approaches (please comment if more details are needed, but this is quite a standard problem with lots of material on the internet).
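A rough sketch of the queue variant with SQS: the API layer answers from the cache or enqueues a refresh and returns immediately, while a separate worker makes the slow venue call and fills the cache. The queue URL, key format, TTL, and fetch_from_third_party are placeholders:
import json
import boto3
import redis

sqs = boto3.client("sqs")
cache = redis.Redis()
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/venue-fetch"  # placeholder

def handle_request(venue_id):
    """Called by the API layer: answer from the cache, or enqueue a refresh."""
    cached = cache.get(f"venue:{venue_id}")
    if cached is not None:
        return json.loads(cached)
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"venue_id": venue_id}))
    return {"status": "pending"}   # the client polls again or gets pushed the result later

def worker_loop():
    """Runs separately: drains the queue and talks to the slow venue servers."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            venue_id = json.loads(msg["Body"])["venue_id"]
            data = fetch_from_third_party(venue_id)          # the slow 40-second call
            cache.setex(f"venue:{venue_id}", 300, json.dumps(data))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])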
The error handling is not clear. The errors are mixed up, and you can't tell which country the errors are coming from or how many there are
That's a code fix, assuming you do send data to CloudWatch (do you?). You could attach the country as context to every request, via a filter or something similar, so that when an error is logged that context is logged as well. You probably need the venue id even more than the country, since you can derive the country from the venue id.
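For example, a sketch of stamping every log line with that context so it can be sliced in CloudWatch Logs Insights; the field names and values are illustrative:
import logging

handler = logging.StreamHandler()   # shipped to CloudWatch by the runtime or an agent
handler.setFormatter(logging.Formatter(
    '{"level": "%(levelname)s", "country": "%(country)s", '
    '"venue": "%(venue_id)s", "message": "%(message)s"}'))
logger = logging.getLogger("sales")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Pass the request context explicitly (a logging.Filter could inject it instead).
logger.error("third-party API timed out", extra={"country": "A", "venue_id": "stadium-17"})
# -> {"level": "ERROR", "country": "A", "venue": "stadium-17", "message": "third-party API timed out"}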
The responses from the third-party API are not cached correctly in the cache layer, since error responses are stored for the duration of the TTL
Don't cache errors, and add a circuit breaker pattern.
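A minimal sketch combining both ideas (the thresholds, key names, and call_third_party are placeholders): only successful responses are written with a TTL, and after a few consecutive failures the third party is skipped for a cool-off period.
import json
import time
import redis

cache = redis.Redis()
FAILURE_THRESHOLD = 5     # consecutive failures before the breaker opens
COOL_OFF_SECONDS = 30     # how long to stop calling the third party
_failures, _open_until = 0, 0.0

def get_events(endpoint, ttl=60):
    global _failures, _open_until
    cached = cache.get(endpoint)
    if cached is not None:
        return json.loads(cached)             # cached data is always a good response
    if time.time() < _open_until:
        raise RuntimeError("circuit open: skipping the third-party API for now")
    try:
        data = call_third_party(endpoint)     # placeholder for the real HTTP call
    except Exception:
        _failures += 1
        if _failures >= FAILURE_THRESHOLD:
            _open_until = time.time() + COOL_OFF_SECONDS
        raise                                 # errors are surfaced, never cached
    _failures = 0
    cache.setex(endpoint, ttl, json.dumps(data))   # only successes get the TTL
    return data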

AWS WAF How to rate limit path by IP below the minimum of 2000 requests/minute

I have a path (mysite.com/myapiendpoint for the sake of example) that is both resource-intensive to serve and very prone to bot abuse. I need to rate limit access to that specific path to something like 10 requests per minute per client IP address. How can this be done?
I'm hosting off an EC2 instance with CloudFront and AWS WAF in front. I have the standard "Rate Based Rule" enabled, but its 2,000 requests per minute per IP address minimum is absolutely unusable for my application.
I was considering using API Gateway for this, and have used it in the past, but its rate limiting as I understand it is not based on IP address, so bots would simply use up the limit and legitimate users would constantly be denied usage of the endpoint.
My site does not use sessions of any sort, so I don't think I could do any sort of rate limiting in the server itself. Also please bear in mind my site is a one-man-operation and I'm somewhat new to AWS :)
How can I limit the usage per IP to something like 10 requests per minute, preferably in WAF?
[Edit]
After more research I'm wondering if I could enable header forwarding to the origin (running node/express) and use a rate-limiter package. Is this a viable solution?
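The idea is language-agnostic; here is a sketch in Python of a sliding-window counter keyed on the forwarded client IP (in Express you would reach for a rate-limiter package instead; the limits and header handling are assumptions):
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 10
_hits = defaultdict(deque)   # client IP -> timestamps of recent requests

def allow_request(headers, remote_addr):
    """Return True if this client may hit the expensive endpoint right now."""
    # Behind CloudFront the real client IP is the first entry of X-Forwarded-For;
    # remote_addr on its own would just be the CloudFront edge node.
    forwarded = headers.get("X-Forwarded-For", "")
    client_ip = forwarded.split(",")[0].strip() or remote_addr
    now = time.time()
    window = _hits[client_ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()                 # drop hits that fell out of the window
    if len(window) >= MAX_REQUESTS:
        return False                     # caller should respond with HTTP 429
    window.append(now)
    return True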
I don't know if this is still useful to you - but I just got a tip from AWS support. If you add the rate limit rule multiple times, it effectively reduces the number of requests each time. Basically what happens is each time you add the rule, it counts an extra request for each IP. So say an IP makes a single request. If you have 2 rate limit rules applied, the request is counted twice. So basically, instead of 2000 requests, the IP only has to make 1000 before it gets blocked. If you add 3 rules, it will count each request 3 times - so the IP will be blocked at 667 requests.
The other thing they clarified is that the "window" is 5 minutes, but if the total is breached anywhere in that window, the IP will be blocked. I thought the WAF would only evaluate the requests after a 5-minute period. So, for example, say you have a single rule for 2000 requests in 5 minutes, and an IP makes 2000 requests in the 1st minute, then only 10 requests over the next 4 minutes. I initially understood that the IP would only be blocked after minute 5 (because WAF evaluates a 5-minute window). But apparently, if the IP exceeds the limit anywhere in that window, it will be blocked immediately. So if that IP makes 2000 requests in minute 1, it will actually be blocked from minutes 2, 3, 4 and 5, but then will be allowed again from minute 6 onward.
This clarified a lot for me. Having said that, I haven't tested this yet. I assume the AWS support techie knows what he's talking about - but definitely worth testing first.
AWS has now finally released an update which allows the rate limit to go as low as 100 requests every 5 minutes.
Announcement post: https://aws.amazon.com/about-aws/whats-new/2019/08/lower-threshold-for-aws-waf-rate-based-rules/
Using the rule twice will not work, because WAF rate-based rules count requests on a CloudWatch logs basis; both rules will count the 2000 requests separately, so it would not work for you.
You can use the AWS WAF automation CloudFront template and choose the Lambda/Athena log parser; that way, request counting is performed on an S3 logs basis, and you will also be able to block SQL injection, XSS and bad bot requests.

Google API quota usage conditions

I'm looking for ways to overcome API quotas without resorting to requesting a raise, and would like to know what is classified as .. per user.
For example:
Google People API has this quota: Read Requests per 100 seconds per user
I set up an OAuth client ID: 123-5ampl3.apps.googleusercontent.com
And for whatever reason, my queries are going to exceed "100 seconds per user".
My question/concerns:
Can I create another client ID 123-an0th3r.apps.googleusercontent.com and have both call the same API so that I now essentially have requests per 200 seconds per user?
Or is "per user" not tied to the client IDs, but instead, to the project ID?
Could I create another project and re-route extra API calls to there?
Or must I throttle my querying so it stays within the limit.
Thanks!

Client Error Youtube API python

I have a Python program which queries YouTube to get video details. I use the version 3 API. I have m processes, each running a Python pool of 10 processes.
from multiprocessing import Pool

# getVideo fetches the details for one video via the YouTube Data API v3
songs_pool = Pool(processes=10)
return_pool = songs_pool.map(getVideo, songs_list)
I get some client errors when the value of m is increased to more than 2 and the pool size is increased to more than 5. I get forbidden errors. When I check the number of requests in the Google analytics, it shows that the number of requests is 250 per second. But according to the documentation the limit is 3,000 requests per second. I don't understand why I am getting the client errors. Can you tell me if there is a way to avoid these errors and run the program quicker?
If m = 2 and the pool size is 10, I get no errors, but it takes very long to complete.
But if I increase them, then I get client errors on roughly 5% of the total requests.
The per-user limit is 3,000 requests per second from a single IP address, and as soon as you go above that in a given second you'll start getting the forbidden errors. The analytics you see in the developers console only report your average number of requests over a 5-minute period; therefore, if you had zero requests for 4 minutes and then started running your routine, the console may show only 250 requests per second (as an average), but your app is likely overrunning the limit during shorter stretches of time.
It seems that you're handling it in the best way possible if speed is your concern; you'll want to run it fast enough to get a very small number of errors (so you know you're staying right at your limit). Another option, though, might be to look into using ETags; if you find yourself requesting info on the same videos a lot, you can let ETags tell you whether or not any info has changed (and if the API responds that nothing has changed, it doesn't count against either your quota or your requests/sec).
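A sketch of the ETag idea against the videos endpoint (the API key, the in-memory caches, and the part selection are placeholders); the essential bits are the If-None-Match header and the 304 branch:
import requests

API_KEY = "YOUR_API_KEY"   # placeholder
URL = "https://www.googleapis.com/youtube/v3/videos"
etags, details_cache = {}, {}   # per-video ETag and last full response

def get_video(video_id):
    headers = {}
    if video_id in etags:
        headers["If-None-Match"] = etags[video_id]   # conditional request
    resp = requests.get(URL, headers=headers, params={
        "part": "snippet,statistics", "id": video_id, "key": API_KEY})
    if resp.status_code == 304:          # nothing changed since the last call
        return details_cache[video_id]
    resp.raise_for_status()
    body = resp.json()
    etags[video_id] = body.get("etag")
    details_cache[video_id] = body
    return body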

Redis is taking too long to respond

Experiencing very high response latency with Redis, to the point of not being able to output information when using the info command through redis-cli.
This server handles requests from around 200 concurrent processes but it does not store too much information (at least to our knowledge). When the server is responsive, the info command reports used memory around 20 - 30 MB.
When running top on the server, during periods of high response latency, CPU usage hovers around 95 - 100%.
What are some possible causes for this kind of behavior?
It is difficult to propose an explanation only based on the provided data, but here is my guess. I suppose that you have already checked the obvious latency sources (the ones linked to persistence), that no Redis command is hogging the CPU in the slow log, and that the size of the job data pickled by Python-rq is not huge.
According to the documentation, Python-rq inserts the jobs into Redis as hash objects and lets Redis expire the related keys (500 seconds seems to be the default value) to get rid of the jobs. If you have some serious throughput, at some point you will have many items in Redis waiting to be expired. Their number will be high compared to the pending jobs.
You can check this point by looking at the number of items to be expired in the result of the INFO command.
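For instance, with redis-py the keyspace section of INFO reports how many keys carry an expiration; a large expires count compared to the number of pending jobs would support this theory:
import redis

r = redis.Redis()
# INFO keyspace returns, per database, the total keys and how many have a TTL set,
# e.g. {'db0': {'keys': 120000, 'expires': 118500, 'avg_ttl': 412000}}
for db_name, stats in r.info("keyspace").items():
    print(db_name, "keys:", stats["keys"], "waiting to expire:", stats["expires"])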
Redis expiration is based on a lazy mechanism (applied when a key is accessed) and an active mechanism based on key sampling, which is run in the event loop (in pseudo-background mode, every 100 ms). The point is that while the active expiration mechanism is running, no Redis command can be processed.
To avoid impacting the performance of the client applications too much, only a limited number of keys are processed each time the active mechanism is triggered (by default, 10 keys). However, if more than 25% of the sampled keys are found to be expired, it tries to expire more keys and loops. This is the way this probabilistic algorithm automatically adapts its activity to the number of keys Redis has to expire.
When many keys are to be expired, this adaptive algorithm can impact the performance of Redis significantly though. You can find more information here.
My suggestion would be to try to prevent Python-rq from delegating item cleanup to Redis via key expiration. This is a poor design for a queuing system anyway.
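If the job return values are not needed, a sketch of what that could look like with rq, as I understand its API (result_ttl is its parameter for how long results are kept, 500 seconds being the default; process_order and order_id are placeholders):
from redis import Redis
from rq import Queue

q = Queue(connection=Redis())
# result_ttl=0: the job result is not kept in Redis at all, so there is nothing
# for the expiration cycles to churn through.
q.enqueue(process_order, order_id, result_ttl=0)
# Alternatively, result_ttl=-1 keeps results forever; you then delete them in
# batches yourself instead of letting Redis expire them.
q.enqueue(process_order, order_id, result_ttl=-1)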
I don't think reducing the TTL is the right way to avoid the CPU usage caused by Redis expiring keys.
Didier makes a good point that the current architecture of Python-rq delegates the cleanup of jobs to Redis by using the key-expire feature, and surely, as Didier said, that is not the best way (this is used only when result_ttl is greater than 0).
The problem should then arise when you have a set of keys/jobs with expiration dates close to one another, which can happen when you have bursts of job creation.
But Python-rq sets the expiration when a job has finished in a worker, so this doesn't quite add up: the expirations should be spread out over time, with enough time between them to avoid this situation.