I am working on exposing a Lambda to the public internet, with the Lambda residing behind an already existing VPC (so I can later limit the IP range of incoming requests using a security group).
To test that everything works, I set up a small Lambda that simply prints hello world. I am running into a problem where the connection is extremely slow. The Lambda executes in less than a millisecond, but each curl to the endpoint runs extremely slowly.
Using curl for diagnostics, I found the following:
curl -kso /dev/null my-alb-url -w "==============\n\n
| dnslookup: %{time_namelookup}\n
| connect: %{time_connect}\n
| appconnect: %{time_appconnect}\n
| pretransfer: %{time_pretransfer}\n
| starttransfer: %{time_starttransfer}\n
| total: %{time_total}\n
| size: %{size_download}\n
| HTTPCode=%{http_code}\n\n"
==============
| dnslookup: 0.061576
| connect: 75.256759
| appconnect: 0.000000
| pretransfer: 75.257615
| starttransfer: 75.794737
| total: 75.795154
| size: 28
| HTTPCode=200
The load balancer:
is connected to two availability zones that are both public facing
forwards to a target group containing only my Lambda
is linked to a security group that allows all inbound and outbound traffic
To make things more confusing, this does not happen on every request; it is seemingly random.
What would be the best way to debug this issue?
I managed to resolve it, but I am not entirely sure how.
I triple-checked the subnet setup and then just redid it from scratch. Ten minutes later it now works as intended. My suspicion is that one of the AZs was linked to the wrong subnet.
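If anyone hits the same thing, a quick way to confirm that suspicion is to check which route table each of the ALB's subnets actually uses. A minimal sketch with the AWS CLI, where the load balancer name "my-alb" and the region are placeholders:

#!/bin/bash
# List the subnets attached to the (hypothetical) ALB "my-alb"...
SUBNETS=$(aws elbv2 describe-load-balancers --names my-alb \
  --query 'LoadBalancers[0].AvailabilityZones[].SubnetId' --output text)

# ...and dump the routes of each subnet's explicitly associated route table
for s in $SUBNETS; do
  echo "== $s =="
  aws ec2 describe-route-tables \
    --filters Name=association.subnet-id,Values=$s \
    --query 'RouteTables[].Routes[].[DestinationCidrBlock,GatewayId]' \
    --output text
done

Each subnet behind an internet-facing ALB should show a 0.0.0.0/0 route pointing at an internet gateway (igw-...). Note that subnets without an explicit association fall back to the VPC's main route table and won't match this filter.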
Related
This is a bit of a share-and-compare kind of question.
I have the following stack deployed on AWS:
ELB > ECS Fargate > node/express > RDS
I'm (negatively) surprised by some of the latencies observed for some simple requests, with and without DB queries:
simple requests to /healthcheck average at 150-200ms
simple SELECT queries run directly against my RDS instance through pgAdmin average at 400ms (I only have a few entries in the requested table).
I tried to search for benchmark results but couldn't find anything useful, so I'd be grateful to anyone sharing their experience with a similar stack.
Thanks a lot!
Additional info on the deployment:
both ECS and RDS deployed within the same region (eu-west-1)
requests made from Spain (could that be it?)
ECS sits on 256 cpu units and 512 reserved memory
I'm the only one making requests on a dev environment (is there any "cold start" on ELB?)
RDS sits on a db.t2.micro instance with a PostgreSQL v12.4 engine
Thanks @Maurice, I've added the info to the ticket, but here is a summary:
no utilization issue: single-digit CPU utilization and memory at c. 25%; CPU never goes above 10% with several requests and memory is always stable.
I instantiate the DB connection via Sequelize when creating the app and reuse it for each request. DB pooling is used via Sequelize with 4 max connections.
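To double-check the pool from the database side, something like this could confirm how many connections the app actually holds open (endpoint, user, and database names are placeholders):

# Count connections per state on the (hypothetical) RDS Postgres instance
psql -h mydb.xxxx.eu-west-1.rds.amazonaws.com -U myuser -d mydb -c \
  "select state, count(*) from pg_stat_activity where datname = 'mydb' group by state;"

With a 4-connection Sequelize pool I'd expect at most 4 connections from the app, mostly idle between requests; a steadily growing count would point at a leak instead.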
A typical cURL latency analysis against the ELB DNS name:
❯ curl -kso /dev/null http://be-api-main-elb-uat.wantedtv.com -w "==============\n\n
| dnslookup: %{time_namelookup}\n
| connect: %{time_connect}\n
| appconnect: %{time_appconnect}\n
| pretransfer: %{time_pretransfer}\n
| starttransfer: %{time_starttransfer}\n
| total: %{time_total}\n
| size: %{size_download}\n
| HTTPCode=%{http_code}\n\n"
==============
| dnslookup: 0,003741
| connect: 0,065718
| appconnect: 0,000000
| pretransfer: 0,065813
| starttransfer: 0,155532
| total: 0,155639
| size: 92
| HTTPCode=200
ECS sits on 256 cpu units and 512 reserved memory
It might be worth allocating more resources to see if that has some improvement, especially since there are some weird hidden limitations tied to the different CPU and Memory levels that might not be that apparent at first. Since 0.25 vCPU doesn't even give you a full thread to work with there could be other preempting going on that isn't visible to you.
Outside of that, there are other things you can look for:
Is your application pooling connections to RDS, or creating new ones each time? I know you're intending to use pooling, but it might be worth confirming it is actually working.
Are you exposing your container directly to the load balancer, or using a sidecar container such as NGINX to handle request buffering?
What happens if you hit the containers directly instead of through the load balancer? This can at least help isolate whether the issue is on the load balancer side or the container side (see the sketch after this list).
How does your application handle concurrency?
How much data is being sent in each request? It's possible that large amounts of data may be locking up threads or processes and making other requests slow down as a result.
Are there any other services involved that aren't obvious? I once had a service crash because the logging service we were using broke, causing the messages to queue up and lock down the services.
The basic idea with a lot of this is to try to isolate the various components and identify the one causing the slowdown. I do believe it'll end up being something in the task itself (service container, sidecar, or service), considering you mentioned quick responses from the database server itself.
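For the direct-to-container check mentioned above, a sketch that could work from a host inside the VPC, assuming a Fargate task in awsvpc mode; the cluster name, service name, and container port are placeholders:

# Grab one running task of the (hypothetical) service and extract its private IP
TASK=$(aws ecs list-tasks --cluster my-cluster --service-name my-service \
  --query 'taskArns[0]' --output text)
IP=$(aws ecs describe-tasks --cluster my-cluster --tasks $TASK \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' \
  --output text)

# Time a request against the task directly, bypassing the ELB
curl -so /dev/null -w "total: %{time_total}\n" http://$IP:3000/healthcheck

If the direct request is fast while the ELB path is slow, the load balancer (or its target health checks) is the place to dig.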
Using Amazon RDS with a medium-sized instance (db.t2.medium), which has a max connections limit of around 400, I still get almost the full number of DB connections even when only 2 users are using the app. It is used by mobile APIs only (Android); no calls are made from anywhere else.
What might be the issue? Where are all these connections coming from?
DDoS? Could a DDoS lead to this? But we bought a brand new server.
You're probably not closing connections when you're done with them.
Log into the database as the root user and execute this query:
select HOST, COMMAND, count(*) from INFORMATION_SCHEMA.PROCESSLIST group by 1, 2;
It will give you output that looks like this:
+-----------+---------+----------+
| HOST | COMMAND | count(*) |
+-----------+---------+----------+
| localhost | Query | 1 |
| localhost | Sleep | 1 |
+-----------+---------+----------+
If you have two users with stable IP addresses, you'll probably see four lines of output: two for each user, with a high count for Sleep. This indicates that you're leaving connections open.
If you're running on mobile, however, the IP addresses may not be stable. You'll need to do a second level of analysis to see if they're all from the same ISP(s).
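A rough sketch of that second level of analysis against a MySQL-flavored RDS instance (endpoint and credentials are placeholders): list the distinct client addresses and reverse-resolve each one, since the PTR record often reveals the ISP.

# Pull distinct client IPs from PROCESSLIST and reverse-resolve them
mysql -h mydb.xxxx.rds.amazonaws.com -u root -p -N -e \
  "select distinct substring_index(HOST, ':', 1) from INFORMATION_SCHEMA.PROCESSLIST;" \
| while read ip; do
    echo -n "$ip -> "
    dig +short -x "$ip"
  done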
The only way that a DDoS would fill up your connection pool is if you've leaked the database password. If that's the case, you should consider your database compromised and start over (with more attention to security).
I have a Redshift cluster launched and running on AWS, and inbound queries are authorized by configuring the VPC security group.
Then I try to connect to Redshift with pgAdmin and receive the following errors:
An error has occurred:
ERROR: permission denied to set parameter "client_min_messages" to "notice"
and
An error has occurred:
Column not found in pgSet: "datlastsysoid"
pgAdmin is mainly a Postgres client and is not a supported client for Redshift. Due to this incompatibility, opening a connection always tries to set client_min_messages, but Redshift refuses to accept such a setting. This causes the error you experienced.
Redshift supports only the parameters below, which have to be set at the cluster level:
dev=# show all;
name | setting
---------------------------+----------------------
analyze_threshold_percent | 10
datestyle | ISO, MDY
extra_float_digits | 0
query_group | default
search_path | $user, public, admin
statement_timeout | 0
wlm_query_slot_count | 1
(7 rows)
You can use other clients like psql or SQLWorkbench/J, as pgAdmin has deviations and doesn't support connections to Redshift. You can also refer to this issue reported on GitHub.
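For example, connecting with psql looks like this; the cluster endpoint, user, and database are placeholders, and Redshift listens on port 5439 by default:

# Connect to the (hypothetical) cluster endpoint with the standard Postgres client
psql -h mycluster.abc123xyz.us-east-1.redshift.amazonaws.com -p 5439 -U awsuser -d dev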
I'm trying to put a set of EC2 instances behind a couple of Varnish servers. Our Varnish configuration very seldom changes (once or twice a year) but we are always adding/removing/replacing web backends for all kinds of reasons (updates, problems, load spikes). This creates problems because we always have to update our Varnish configuration, which has led to mistakes and heartbreak.
What I would like to do is manage the set of backend servers simply by adding or removing them from an Elastic Load Balancer. I've tried specifying the ELB endpoint as a backend, but I get this error:
Message from VCC-compiler:
Backend host "XXXXXXXXXXX-123456789.us-east-1.elb.amazonaws.com": resolves to multiple IPv4 addresses.
Only one address is allowed.
Please specify which exact address you want to use, we found these:
123.123.123.1
63.123.23.2
31.13.67.3
('input' Line 2 Pos 17)
.host = "XXXXXXXXXXX-123456789.us-east-1.elb.amazonaws.com";
The only consistent public interface ELB provides is its DNS name. The set of IP addresses this DNS name resolves to changes over time and with load.
In this case I would rather NOT specify one exact address - I would like to round-robin between whatever comes back from the DNS. Is this possible? Or could someone suggest another solution that would accomplish the same thing?
Thanks,
Sam
You could use an NGINX web server to deal with the CNAME resolution problem:
User -> Varnish -> NGINX -> ELB -> EC2 Instances
(Cache Section) (Application Section)
You have a configuration example in this post: http://blog.domenech.org/2013/09/using-varnish-proxy-cache-with-amazon-web-services-elastic-load-balancer-elb.html
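In case that post disappears, the core of it is just a small HTTP proxy in front of the ELB. A minimal sketch, where the config path, port, resolver IP, and ELB name are assumptions; using a variable in proxy_pass forces nginx to re-resolve the name at runtime instead of pinning the IPs at startup:

# Hypothetical /etc/nginx/conf.d/elb-proxy.conf
server {
    listen 8080;
    location / {
        resolver 10.0.0.2 valid=30s;  # VPC DNS resolver; refresh the ELB name every 30s
        set $elb "XXXXXXXXXXX-123456789.us-east-1.elb.amazonaws.com";
        proxy_pass http://$elb;       # variable => runtime DNS resolution
        proxy_set_header Host $host;
    }
}

Varnish then talks to a single stable backend (127.0.0.1:8080) and nginx absorbs the changing ELB addresses.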
Juan
I wouldn't recommend putting an ELB behind Varnish.
The problem lies in the fact that Varnish resolves the name assigned to the ELB and caches the IP addresses until the VCL gets reloaded. Because of the dynamic nature of the ELB, the IPs linked to the CNAME can change at any time, resulting in Varnish routing traffic to an IP which is no longer linked to the correct ELB.
This is an interesting article you might like to read.
Yes, you can.
In your default.vcl, put:
include "/etc/varnish/backends.vcl";
and set the backend to:
set req.backend = default_director;
Then run this script to create backends.vcl:
#!/bin/bash
# Regenerate the Varnish backend list from the ELB's current DNS records
FILE_CURRENT_IPS='/tmp/elb_current_ips'
FILE_OLD_IPS='/tmp/elb_old_ips'
TMP_BACKEND_CONFIG='/tmp/tmp_backends.vcl'
BACKEND_CONFIG='/etc/varnish/backends.vcl'
ELB='XXXXXXXXXXXXXX.us-east-1.elb.amazonaws.com'

IPS=($(dig +short $ELB | sort))

if [ ! -f $FILE_OLD_IPS ]; then
    touch $FILE_OLD_IPS
fi
echo ${IPS[@]} > $FILE_CURRENT_IPS

# Only rewrite the config when the resolved IPs differ from the last run
DIFF=`diff $FILE_CURRENT_IPS $FILE_OLD_IPS | wc -l`
cat /dev/null > $TMP_BACKEND_CONFIG
if [ $DIFF -gt 0 ]; then
    # One backend definition per resolved ELB address
    COUNT=0
    for i in ${IPS[@]}; do
        let COUNT++
        IP=$i
        cat <<EOF >> $TMP_BACKEND_CONFIG
backend app_$COUNT {
    .host = "$IP";
    .port = "80";
    .connect_timeout = 10s;
    .first_byte_timeout = 35s;
    .between_bytes_timeout = 5s;
}
EOF
    done
    # Round-robin director spanning all of the backends above
    COUNT=0
    echo 'director default_director round-robin {' >> $TMP_BACKEND_CONFIG
    for i in ${IPS[@]}; do
        let COUNT++
        cat <<EOF >> $TMP_BACKEND_CONFIG
    { .backend = app_$COUNT; }
EOF
    done
    echo '}' >> $TMP_BACKEND_CONFIG
    echo 'NEW BACKENDS'
    mv -f $TMP_BACKEND_CONFIG $BACKEND_CONFIG
fi
mv $FILE_CURRENT_IPS $FILE_OLD_IPS
I wrote this script to have a way to auto-update the VCL once a new instance comes up or goes down.
It requires that the .vcl has an include for backends.vcl.
This script is just one part of the solution; the tasks should be:
1. get the new server name and IP (auto scaling) - you can use AWS API commands for that, also via bash
2. update the VCL (this script)
3. reload varnish (see the sketch below)
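For step 3, a sketch of a graceful reload via varnishadm; the VCL label is arbitrary:

# Load the regenerated VCL under a fresh name, then switch traffic to it
NOW=$(date +%s)
varnishadm vcl.load reload_$NOW /etc/varnish/default.vcl
varnishadm vcl.use reload_$NOW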
The script is here
http://felipeferreira.net/?p=1358
Other people did it in different ways:
http://blog.cloudreach.co.uk/2013/01/varnish-and-autoscaling-love-story.html
You don't get to 10K requests if you have to resolve an IP on each one. Varnish resolves IPs on start and does not refresh them unless it is restarted or reloaded. Indeed, varnish refuses to start if it finds two IPs for a DNS name in a backend definition, like the IPs returned for multi-AZ ELBs.
So we solved a similar issue by placing varnish in front of nginx: nginx can define an ELB as a backend, so the Varnish backend is a local nginx, and the nginx backend is the ELB.
But I don't feel comfy with this solution.
You could make the ELB internal in your private VPC so that it would have a local IP. This way you don't have to use any kind of DNS CNAMEs, which Varnish does not support as easily.
Using an internal ELB does not help the problem, because it usually has 2 internal IPs!
Backend host "internal-XXX.us-east-1.elb.amazonaws.com": resolves to multiple IPv4 addresses.
Only one address is allowed.
Please specify which exact address you want to use, we found these:
10.30.10.134
10.30.10.46
('input' Line 13 Pos 12)
What I am not sure about is whether these IPs will always remain the same or whether they can change. Anyone?
In my previous answer (more than three years ago) I hadn't solved this issue; my [nginx - varnish - nginx] -> ELB solution worked until the ELB changed IPs.
But for some time now we have been using the same setup, with nginx compiled with the jdomain plugin.
So the idea is to place an nginx on the same host as varnish and configure the upstream like this:
resolver 10.0.0.2; ## IP for the aws resolver on the subnet
upstream backend {
jdomain internal-elb-dns-name port=80;
}
That upstream will automatically re-resolve the IPs if the ELB changes its addresses.
It might not be a pure varnish solution, but it works as expected.
I installed wso2GadgetServer-1.4.2 within our company network. Access to external data sources is available via a proxy. Within /repository/conf/wrapper.conf I added the following:
wrapper.java.additional.11=-Dhttp.proxyHost=<ip of our proxy>
wrapper.java.additional.12=-Dhttp.proxyPort=<port the proxy is listening to>
wrapper.java.additional.13=-Dhttp.nonProxyHosts=127.0.0.1|localhost
I (re-)started the GadgetServer and opened the dashboard again. The (external) content of the (predefined) gadget was not displayed; instead, a timeout message was shown. In /repository/logs/wrapper.log I found corresponding entries like the following:
INFO | jvm 1 | 2012/04/12 08:24:21 | Apr 12, 2012 8:24:20 AM org.apache.shindig.gadgets.servlet.ProxyBase outputError
INFO | jvm 1 | 2012/04/12 08:24:21 | Warnung: Request failed
INFO | jvm 1 | 2012/04/12 08:24:21 | org.apache.shindig.gadgets.GadgetException: org.apache.http.conn.ConnectTimeoutException: Connect to soa-platform.blogspot.com/209.85.148.132:80 timed out
INFO | jvm 1 | 2012/04/12 08:24:21 | at org.apache.shindig.gadgets.http.BasicHttpFetcher.fetch(BasicHttpFetcher.java:351)
[...]
INFO | jvm 1 | 2012/04/12 08:24:21 | Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to soa-platform.blogspot.com/209.85.148.132:80 timed out
Does anybody have an idea how to resolve the problem?
By the way: two people addressed the same problem at http://wso2.org/forum/thread/21081
In the article WSO2 ESB 4.0.3 - Configure forward proxy for client software I found a solution. If the http. prefix is removed from the configuration, the external content of the gadget is displayed:
wrapper.java.additional.11=-DproxyHost=<ip of our proxy>
wrapper.java.additional.12=-DproxyPort=<port the proxy is listening to>
wrapper.java.additional.13=-DnonProxyHosts=127.0.0.1|localhost