Certificate errors happen from time to time, but this looks very fishy to me. A certificate for all those names? What's going on? Did the CDN get hacked?
If so, what is the best thing to do? Remove it until it is fixed? That would be bad for a good part of our user base. A hacked CDN is worse, though, I guess… Maybe someone knows what is really going on?
I am one of the maintainers of polyfill.io, and a Fastly employee. Yesterday we enacted a change to our DNS configuration to enable support for HTTP/2.0. In doing so, a small typo was made in the hostname, resulting in our DNS targeting the wrong endpoint on Fastly's network, and a cert that was not valid for polyfill.io or cdn.polyfill.io. Having realised the error, we corrected the entry and it took around 30 minutes to propagate.
Lessons learnt include not increasing DNS TTL until some time after a change is made, in case the change needs to be rolled back.
The reason there are so many names listed on the cert is that we are sharing a cert with other Fastly customers. This is perfectly normal practice for CDN providers.
More information is available on the relevant GitHub issue:
https://github.com/Financial-Times/polyfill-service/issues/1208
We're very disappointed to suffer this downtime. Generally, polyfill.io has a very good uptime record, and we plan for origin outages. It's hard to mitigate the risks associated with DNS changes to the main public domain, but we are very sorry to everyone impacted.
Polyfill.io uses Pingdom to independently monitor our uptime, and we report that number here: https://polyfill.io/v2/docs/usage (data has up to 24 hrs latency).
Looks like "they" (see below) botched it, I can't see cdn.polyfill.io or *.polyfill.io on that large list, hence the error saying much the same.
(or maybe I overlooked some other problem)
To enlighten you about the names: virtual hosting (the act of hosting multiple websites on the same IP address and the same HTTP port) happens over HTTPS only after the encryption is established. Thus, at the time the server presents a certificate to the browser, it doesn't know exactly which site the user is after; that information is part of the encrypted request.
Thus, it is necessary for the certificate to cover all secure websites operating on that IP address and port combination.
CDN stands for Content Delivery Network; presumably a huge bunch of stuff is being hosted on this "network", probably not even owned by polyfill (I've no idea who they are). Given that the first name on the certificate is "f2.shared.global.fastly.net", you can speculate about the true CDN, who actually messed up the cert, and what else they're hosting there :)
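If you're curious, you can list all of the names on the cert yourself. Here's a rough Python sketch (the hostname is just an example; it assumes the certificate currently validates for that name):

    import socket
    import ssl

    hostname = "cdn.polyfill.io"  # example host; substitute any HTTPS site

    # Open a TLS connection and grab the certificate the server presents
    # (the same one the browser complains about when the names don't match).
    context = ssl.create_default_context()
    with socket.create_connection((hostname, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()

    # On a shared CDN certificate this list can contain dozens of unrelated
    # customer domains, which is exactly what the question is describing.
    for kind, name in cert.get("subjectAltName", ()):
        print(kind, name)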
A resource on my webapp takes nearly a minute to load after a long stall. This happens consistently. As shown below, only 3 requests on this page actually hit the server itself, the rest hit the memory or disk cache. This problem only seems to occur on Chrome, both Safari and Firefox do not exhibit this behavior.
I have implemented the Cache-Control: no-store suggestion from this SO question ("request stalled for a long time occasionally in chrome"), but the problem persists.
Also included below is an example of what the response looks like once it finally does come in.
My app is hosted in AWS behind a Network Load Balancer which proxies to an EC2 instance running nginx and the app itself.
Any ideas what is causing this?
I encountered the exact same problem. We are using Elastic Beanstalk with Network Load Balancer (NLB) with TLS termination at NLB.
The feedback I got from AWS support is:
This problem can occur when a client connects to a TLS listener on a Network Load Balancer and does not send data immediately after completing the TLS handshake. The root cause is an edge case in the handling of new connections. Note that this only occurs if the Target Group for the TLS listener is configured to use the TCP protocol without Proxy Protocol v2 enabled
They are working on a fix for this issue now.
Somehow this problem only shows up when you are using the Chrome browser.
In the meantime, you have these two options as workarounds (there's a boto3 sketch for the first one after this list):
enable Proxy Protocol v2 on the Target Group, or
configure the Target Group to use the TLS protocol for routing traffic to the targets
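If you'd rather apply the first workaround from code than from the console, a boto3 sketch along these lines should do it (the ARN is a placeholder; also note that once Proxy Protocol v2 is enabled, the backend behind the target group, e.g. nginx, has to be configured to parse the proxy protocol header):

    import boto3

    elbv2 = boto3.client("elbv2")

    # Placeholder ARN; substitute your own TCP target group.
    target_group_arn = (
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/my-tcp-targets/0123456789abcdef"
    )

    # Enable Proxy Protocol v2 on the target group behind the NLB's TLS
    # listener (the first workaround suggested by AWS support above).
    elbv2.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=[{"Key": "proxy_protocol_v2.enabled", "Value": "true"}],
    )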
I know it's a late answer, but I'm writing it for anyone still looking for a solution.
TL;DR: In my case, enabling the cross-zone load balancing attribute of the NLB solved the problem.
Investigating with Wireshark, I figured out that Chrome was communicating with two different IPv4 addresses.
Sending packets to one of them always succeeded, and sending to the other always failed.
The two addresses corresponded to two different Availability Zones.
By default, cross-zone load balancing is disabled if you choose an NLB (whereas the same attribute of an ALB is enabled by default).
Let's say there are two AZs: AZ-1 and AZ-2.
When you attach both AZs to an NLB, it has a node in each AZ.
With cross-zone load balancing disabled, the node in AZ-1 only routes traffic to instances that also belong to AZ-1; AZ-2 instances are ignored.
My modest app (hosted on Fargate) has just one app server (ECS task), in AZ-2, so the NLB node in AZ-1 has nowhere to route traffic.
I'm not familiar with the details of TCP/IP or browser implementations, but in my understanding your browser somehow selects the actual IP address after the DNS lookup.
If the AZ-2 node is selected in the above case then everything goes fine, but if the AZ-1 node is selected your browser starts stalling.
Maybe Chrome has a random strategy for selecting the IP while Safari and Firefox have a sticky one, so the problem only appears in Chrome.
After enabling cross-zone load balancing, the ECS task in AZ-2 became visible to the AZ-1 NLB node, and everything works fine with Chrome too.
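If you want to check and fix this yourself, a rough boto3 sketch looks like this (the DNS name and ARN are placeholders, not my real values):

    import socket

    import boto3

    # 1) An NLB's DNS name resolves to one address per attached AZ; this is
    #    how the two IPv4 addresses from the Wireshark capture show up.
    nlb_dns_name = "my-nlb-0123456789abcdef.elb.us-east-1.amazonaws.com"
    addresses = {info[4][0] for info in socket.getaddrinfo(nlb_dns_name, 443)}
    print(addresses)

    # 2) Enable cross-zone load balancing so every NLB node can reach
    #    targets in every attached AZ.
    elbv2 = boto3.client("elbv2")
    elbv2.modify_load_balancer_attributes(
        LoadBalancerArn=(
            "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
            "loadbalancer/net/my-nlb/0123456789abcdef"
        ),
        Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
    )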
(Please feel free to correct my poor English. Thank you!)
I see two things that could be responsible for the delays:
1) Usage of CDNs
If the resources that load slowly are loaded from CDNs (Content Delivery Networks), you should try downloading them to your server and delivering them directly.
Especially if you use HTTP/2 this can be a remarkable gain in speed, but it helps with HTTP/1 too. I've no experience with AWS, so I don't know how things are served there by default.
Your screenshot doesn't clearly show whether the resources are loaded from a CDN, but as they are scripts I think that's a reasonable assumption.
2) Chrome’s resource scheduler
General description: https://blog.chromium.org/2013/04/chrome-27-beta-speedier-web-and-new.html
It's possible or even probable that this scheduler has changed since the article was published, but its effect at least shows in your screenshot.
I think if you optimize the page with the help of https://www.webpagetest.org and the Chrome web tools (https://developers.google.com/web/tools/), you can solve problems with the scheduler as well as other problems concerning speed, and perhaps other issues too.
EDIT
3) Proxy issue
In general it's possible that Chrome has either problems or reasons to delay because of the proxy server. The details can't be known without looking at the log files; you may have to make sure that log files are even produced and that the log level is high enough to tell you about any problems (Warning or even Info).
After examining the Chrome net-export logs, it seems I was running into this issue: https://bugs.chromium.org/p/chromium/issues/detail?id=447463.
I still don't have a solution for how to fix the problem though.
I'm looking for some help and am no Akamai expert. Basically I'm wondering whether it is possible to implement Akamai in the following scenario:
1. Normal operation: requests pass through Akamai, only static assets such as CSS and JS are cached, and everything else is served directly from the server.
2. DDoS attack detected: not sure if this is possible, but the idea would be that Akamai has a feature or API call to check whether a large-scale DDoS attack has started; if so, instead of passing requests to the app servers it reverts to a fully cached version of the site until the attack finishes.
The icing on the cake would be if we could set the threshold for what level of attack triggers the revert, maybe based on a percentage over expected traffic levels?
Appreciate any sort of guidance.
Yes, you can do this, either from the Akamai config or by sending relevant headers from the origin (your site/API) telling Akamai not to cache certain files. Bear in mind that you'll still pay for traffic flowing through Akamai even if they're not caching it, although if you have global users who want good performance you won't mind that, because the performance through their network is excellent.
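To illustrate the header approach, here is a rough origin-side sketch in plain Python WSGI. It only uses the generic Cache-Control header; exactly what the edge honours depends on how your Akamai property is configured, so treat this as an illustration rather than Akamai-specific advice:

    # Static assets get a long cache lifetime at the edge; everything else
    # is marked as not cacheable so it always goes back to the app servers.
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path.startswith("/static/"):
            # Versioned CSS/JS: safe to cache for a long time.
            headers = [("Cache-Control", "public, max-age=31536000, immutable")]
        else:
            # Dynamic pages/API responses: tell intermediaries not to cache.
            headers = [("Cache-Control", "no-store")]
        headers.append(("Content-Type", "text/plain"))
        start_response("200 OK", headers)
        return [b"ok"]

    if __name__ == "__main__":
        from wsgiref.simple_server import make_server
        make_server("", 8000, app).serve_forever()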
They do indeed have a product that will protect you from DDoS. How configurable it is I'm not sure, but basically with Akamai everything is configurable if you've got the money to spend with them! Having said that, they wouldn't be serving anything to DDoS requests, of course; they'd be trying to determine where the attack is coming from and ignoring those requests, so I'm not sure what you're suggesting is what you'd actually want.
I am working on a cross-platform application that runs on iOS, Android and the web. Currently there is an API server which interacts with all the clients. Each request to the API is made via the IP address (e.g. http://1.1.1.1/customers). This prevents me from quickly moving the backend to another cloud VPS whenever I want, as I need to update the iOS and Android versions of the app through a painful migration process.
I thought the solution would be introducing a subdomain (e.g. http://api.example.com/customers). How much would the additional DNS call affect response times?
The thing to remember about DNS queries is that, as long as you have configured your DNS sensibly, clients will only make a full lookup the first time communication is needed; after that the answer is cached.
A full lookup typically involves three queries: one to a root server, one to the .com (etc.) TLD server, and a final one to the example.com nameserver. Each of these takes milliseconds and is performed once, roughly every hour or so, whenever the TTL expires.
The TL;DR is that it's a no-brainer: you get far more advantages from using a domain name than you will ever get from an IP address. The time cost is minimal and the packet size is tiny.
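If you want to convince yourself, a quick sketch like this (substitute your own hostname) shows that the first lookup typically costs only a handful of milliseconds and that repeat lookups are usually answered from a cache somewhere along the way:

    import socket
    import time

    host = "example.com"  # substitute your API hostname

    for attempt in range(3):
        start = time.perf_counter()
        socket.getaddrinfo(host, 443)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"lookup {attempt + 1}: {elapsed_ms:.1f} ms")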
I recently dealt with a botnet running a subdomain brute-force/crawling script. It would run through the alphabet and numbers sequentially, which resulted in a minor nuisance and a small load increase for legitimate traffic.
For example, hitting a.domain, b.domain, .., 9.domain, aa.domain, .., a9.domain. etc.
Obviously, this is quite stupid, and fortunately it only originated from a few IPs at a time, and the website in question was behind multiple auto-scaling load balancers. Attacks were stopped by grabbing the X-Forwarded-For address from Varnish; detection was scripted off the subdomain attempts, and the offending IP was added to a remote blocklist which was regularly refreshed and pulled into a Block.vcl on all Varnish servers. Voila.
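For a rough idea of the detection side, a simplified sketch (the log format, threshold and blocklist plumbing here are illustrative, not the real setup) looks something like this:

    # Scan access-log lines, count distinct "junk" subdomains per client IP
    # (taken from X-Forwarded-For), and print IPs to feed into the remote
    # blocklist that gets pulled into Block.vcl.
    import re
    import sys
    from collections import defaultdict

    LOG_LINE = re.compile(r'^(?P<xff>[\d.]+) .* Host: (?P<host>\S+)')
    JUNK_SUB = re.compile(r'^[a-z0-9]{1,2}\.')   # a.domain, b9.domain, ...
    THRESHOLD = 20                               # distinct junk subdomains per IP

    def suspicious_ips(lines):
        hits = defaultdict(set)
        for line in lines:
            match = LOG_LINE.match(line)
            if match and JUNK_SUB.match(match.group("host")):
                hits[match.group("xff")].add(match.group("host"))
        return [ip for ip, hosts in hits.items() if len(hosts) >= THRESHOLD]

    if __name__ == "__main__":
        for ip in suspicious_ips(sys.stdin):
            print(ip)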
This worked well, detecting and taking care of things within a couple of minutes each time. However, it was noticed that after blocking a brute-forcing IP and applying the block, 99.9% of its traffic would stop, but the occasional request from the blocked IP would still manage to get through. Not enough to cause a fuss, but enough to raise the question: why? I don't understand why a request would still make it through Varnish when it hits the reject-on-IP rule of my Block.vcl.
Is there some inherent limitation that might have come into play here which would allow a small number of requests through? Maybe something based on the available resources, or the sheer number of requests per second hitting Varnish and overwhelming it ever so slightly?
Resource wise the web servers seemed fine so I'm unsure. Any ideas?
We're building a web service which users will subscribe to, and we were thinking of authenticating users based on their IP address.
I understand that this creates some hassle, e.g. if a client's IP changes, but I wanted to know whether this is safe from a security point of view. I'm not sure how hard it is to spoof IP addresses, but my thinking is that even if that happened, we wouldn't end up sending data back to the attacker.
Any thoughts?
Thanks!
I'd say this would be very risky. Hackers use a number of IP spoofing tools to avoid detection, and there are legitimate anonymity uses too. Check out onion routing via the Tor network (used extensively by the WikiLeaks folks, for example): http://www.torproject.org
That said, if your data isn't sensitive AT ALL, like you want to guess their location to show the local weather, you can certainly use IP blocks to roughly locate people. If that kind of thing is all you're after, check out: http://www.hostip.info/dl/index.html
Think about proxies and VPNs.
And what if a user would like to use your site from another PC?
You might want to use browser fingerprints (together with the IP); it's safer, but then they must always use the same browser...
Conclusion: not a good idea.
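To make the fingerprint point concrete, here is a minimal sketch of what an IP-plus-browser-fingerprint check might look like, and why it's brittle (the header choice and hashing scheme are just an illustration, not a recommendation):

    import hashlib

    def weak_fingerprint(client_ip: str, headers: dict) -> str:
        # Combine the client IP with a couple of request headers and hash them.
        parts = [
            client_ip,
            headers.get("User-Agent", ""),
            headers.get("Accept-Language", ""),
        ]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    print(weak_fingerprint("203.0.113.7", {"User-Agent": "Mozilla/5.0"}))

Any change of network (new IP), browser, or even a browser update changes the value, so legitimate users get locked out, while a determined attacker can simply replay the same headers.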