How a website works/ What happens behind the scene

How a website works/ What happens behind the scene - web-services

I am trying to understand what things happen in the background when using a website OR basically what are the things that happen when a user interacts with a browser. I understand that this is a huge list and highly dependent on architecture and user actions etc, I am just trying to get a feel of major things and flush my misunderstandings and also use this to read more about stuff I don't understand.
As an exercise I am trying to note down things that happen in the background with respect to users action in a browser. Here is my attempt at this bit open ended but fun question:
User enters a url => browser checks if
available in browser cache => DNS look
up [root dns lookup => recursive dns
=> get ip ] => establish a tcp connection => send http req => get
the static page from web server=> if
authentication is required that
happens [either read cookies from
browser OR ask user to enter
credentials] => somehow gets the
dynamic elements as well [how ? ,
there is some lazy initilization here
?] => Then user performs some
action[clicks a link or something] =>
check browser cache => if not avail
[take the input parameters and embed
in the url in some manner [may be
encrypt some things if required] =>
hits a load balancer => directed to a
application server [depending on how
the LB selects a host] => application
server cache is checked [memcached or
some kind of caching, not sure if this
"normally" happens here or at some
other level] => application server
tries to understand the request [if
its a service listening on some port,
http port 80 it will get the URL and
parse to perform some operations] =>
database is queried if required to =>
there might again be connection
mgmt/caching/parallel queries etc here
=> database returns back the result to app server => app server creats a
result payload and headers [http] =>
sends it to browser for rendering =>
browser cache is updated => user
reacts to the response.
I have not considered retries/failures and how they are handled, but I would like to get some input there as well in a general sense
Note:
I am looking at things in general, I am sure that few companies might do it in different way etc etc. I will like to hear alternatives as well though!.
This is an effort to try and get more
perspective and read on few things
that will help me in general.
Clearly I have made an honest attempt
I also hope this would help others
looking at the question in general to
learn something new.
I am not asking
for opinions etc, so this aint a
completely open ended question [not
everything is right though there are
many options]
Thanks !

There is no difference between static or dynamic for browser. Browser makes HTTP request and gets HTTP response. If response is an HTML page, then browser renders HTML ,applies styles, and executes JavaScript code that come with page. This page can by dynamic or static - browser don't care! The side is care - is server side. If page is static, than HTTP server will just take page from disk and send it to client as HTTP response. If page is dynamic, than HTTP server will call some application and will ask this application to give requested resource. This application can be an PHP module for Apache(http server), or ASP.net for IIS, or even your C++ code that will generate any content you want.
How exactly page or resource (HTTP response can be also xml, or image etc) will be constructed depends on used application (server side technology).
As example, if you are using PHP - HTTP server will detect that requested resource has extension .php, server will pass this PHP file to PHP module for processing, and result will be sent to HTTP client(browser) as response.
When user perform some action, this is again just usual HTTP request. HTTP method GET and POST (look for article about HTTP on Wikipedia) is used to pass some input from server to client. Page can contain some heavy JS, that will make page look more like desktop application (rich controls, dynamically reacting on user action without request to server, or communicate with server in background), but this is not necessary for web application to be web application (for web site to by dynamic). It can be good old static HTML with HTML forms, and some server side code.
Web application is abstract entity that may consist from many HTTP resources (different URLs for server to response). Web application also is client-side code that communicates with server-side code thru HTTP with help of HTTP client(browser) and HTTP server. Web application is not some stand alone part, that only comes to work when user perform some action.
Web-service may fits this description - as thing that usually don't care about pages, and comes only when some action required. Its special type of web application, that expose some API thru HTTP(usually). You can request some resource, and pass some parameter, and you will get response with some result. It's same web application but without pages. But web-service usually part of big web application with pages, or even other part of same web-application (depending on how you look at this). It can be same server-side technology, and same HTTP server. And it's not necessary to create web-service if you want to make some web-application (dynamic web site).
Server-side part of web application can also communicate with some database, but it's not necessary too.
There can be real database, or just some text files on disk. And browser, client side code and HTTP server also don't care about database or source where server side code takes data.
Cache, load balancer, etc - it's just additional elements that usually are transparent for all this general stuff.
Cookies is passed with every HTTP request to HTTP server, and if requested resource is not static page, that HTTP server will pass them further to server-side code/application(part). And its usually how authentication and authorization works - cookies has contain info about session, and there is some data associated with session contains on server side - it can be ID of user, so server-side code will recognize user on every request.

Related

HTTP 407 Proxy Authentication Required while accessing Amazon S3

I have tried everything but I cant seem to fix this issue that is happening for only one client behind a corporate proxy/firewall. Our Silverlight application connects to Amazon S3 for downloading/Uploading some documents. On one client and one client only it returns a 407 error and after that the application fails to save anything.
Inner Exception:
System.ServiceModel.ProtocolException: [UnexpectedHttpResponseCode]
Arguments: 407,Proxy Authentication Required
We had something similar at a different client but there was more of a CORS issue. to resolve this I used cloud-front to fake a sub-domain that then accesses the S3 bucket and it solved the issue. I was hoping it would fix it with this client as well but it didnt.
I have tried adding this code to web.config as suggested by a lot of answers
<system.net>
<defaultProxy useDefaultCredentials="true" >
</defaultProxy>
</system.net>
I have read articles about passing a proxy headers with basis authentication using username and password but I am not sure how this would help us. The Proxy server is used by client and any authentication it requires is outside our domain.
**Additional Information**
The Silverlight code references 2 services. One is our wcf service that retrieves all the data for the application. One is The Amazon S3 service that uses the amazon Soap api, the endpoint for which is at http://s3.amazonaws.com/doc/2006-03-01/AmazonS3.wsdl?
If I go into our app and only use part of the system that dont make any calls to the Amazon S3 api the application works fine. As soon as I go to a part of the system that makes a call to the S3, the problem starts. funny enough the call to S3 goes fine and I can retrieve the doc fine but then any calls to our wcf service return 407.
Any ideas?
**Update 2**
Based on comments from Elliot Nelson I check the stack we were using for making http requests in our application. Turns out we are using client http for both http and https requests by default. Here is the code we have in the App.xaml constructor
public App()
{
Startup += Application_Startup;
UnhandledException += Application_UnhandledException;
InitializeComponent();
WebRequest.RegisterPrefix("http://", WebRequestCreator.ClientHttp);
WebRequest.RegisterPrefix("https://", WebRequestCreator.ClientHttp);
}
Now, to understand the differences between clienthttp and browserhttp and when to use them. Also, the potential impacts/issues of switching to browserhttp.
**Update 3**
Is there a way to request browsers to run your in-browser Silverlight application in trusted mode and would it help bypass this issue?

(Answer #2)
So, most likely (for corporate environments like this network), almost nothing can be done without whatever custom proxy settings are set in IE, usually pushed by corporate policy. To take advantage of these proxy settings, you want to use WebRequestCreator.BrowserHttp, which automatically uses the browser's default settings when making requests.
There's a table of the differences between these two clients available in the Microsoft docs. I'm guessing you were using something (maybe setting custom headers or reading the raw response body) that wasn't supported in BrowserHttp.
For security reasons, you can't "ask" the browser what its proxy settings are and use them, so this is a tricky situation. You can specify Browser vs Client handling by domain, or even for a specific request (the same page above describes how); you may be able in this case to get away with just using ClientHttp for your service calls and BrowserHttp for your S3 calls, and avoid the problem altogether!
For next steps, I'd try that approach; if it doesn't work, I'd try switching wholesale to BrowserHttp just to see if it bypasses the proxy issue (there's almost no chance the application will actually work, since you're probably using ClientHttp-only options).
Long term, you may want to consider making changes to your services so they are usable by a BrowserHttp-only application (this would require you to be pretty basic in your requests/responses, but using only BrowserHttp would be a guarantee you'd work in pretty much any corp network).

Running in trusted mode is probably a group policy thing which would require their AD admins to approve / whitelist your app.
I think the underlying issue you are facing is that the proxy requires NTLM authentication and for whatever reason the browser declines to provide your app with that context.
One way to prove that it's an NTLM auth issue is to test with curl - get it to make a req through the proxy, then it should be a bit easier to code to. EG the following curl will get you through 99% of Windows corporate proxies (assuming the proxy is at proxy-host.corp:3128):
C:\> curl.exe -v --proxy proxy-host:3128 --proxy-user : --proxy-ntlm https://www.google.com
NOTE The --proxy-user : tells curl to use the current user session to perform the NTLM challenge.
So if you can get the client to run that, you can at least identify that NTLM works, then it's a just a matter of getting the app to perform the NTLM challenge using the default credentials (which may or may not be provided by the browser session)

Since you described this as a silverlight application, I'm going to assume you can't use classic browser-proxy troubleshooting like "move browser to public network" or "try a different browser", to isolate the problem.
You should try to isolate the proxy server, and have the customer use the required proxy-auth.
The application is making request, but it might be intercepted by a transparent proxy, or the result might be coming from what you consider a web server.
In the early days, the 401 error was pretty strictly associated with web-auth, and 407 was for proxy-auth.
Architecturally, the separation is a convenience, a web server can have both web server, proxy, and reverse-proxy behaviors.
What happens is your customer's environment is making a web connection to the destination, but it receives a HTTP 407 status from some host, probably their network, or sometimes the provider. Almost certainly the request is received not forwarded. The HTTP client your application lives in needs to provide the credentials that host requires. Companies have environments that are complex enough where often your customer will say this is the first time they have heard of this (some proxy-auth is also dynamic or destination specific).
Also, in some corporate environments, the operator will allow temporary or permanent white-listing from the proxy-auth service. You should see if they can do this, even temporarily, to confirm there aren't going to be other problems.
In the end, it sounds like your application might not robustly support proxy-auth, or the proxy-auth type they use in their environment.

Understanding CORS

I've been looking on the web regarding CORS, and I wanted to confirm if whatever I made of it is, what it actually is.
Mentioned below is a totally fictional scenario.
I'll take an example of a normal website. Say my html page has a form that takes a text field name. On submitting it, it sends the form data to myPage.php. Now, what happens internally is that, the server sends the request to www.mydomain.com/mydirectory/myPage.php along with the text fields. Now, the server sees that the request was fired off from the same domain/port/protocol
(Question 1. How does server know about all these details. Where does it extract all these details froms?)
Nonetheless, since the request is originated from same domain, it server the php script and returns whatever is required off it.
Now, for the sake of argument, let's say I don't want to manually fill the data in text field, but instead I want to do it programmatically. What I do is, I create a html page with javascript and fire off a POST request along with the parameters (i.e. values of textField). Now since my request is not from any domain as such, the server disregards the service to my request. and I get cross domain error?
Similarly, I could have written a Java program also, that makes use of HTTPClient/Post request and do the same thing.
Question 2 : Is this what the problem is?
Now, what CORS provide us is, that the server will say that 'anyone can access myPage.php'.
From enable cors.org it says that
For simple CORS requests, the server only needs to add the following header to its response:
Access-Control-Allow-Origin: *
Now, what exactly is the client going to do with this header. As in, the client anyway wanted to make call to the resources on server right? It should be upto server to just configure itself with whether it wants to accept or not, and act accordingly.
Question 3 : What's the use of sending a header back to client (who has already made a request to the server)?
And finally, what I don't get is that, say I am building some RESTful services for my android app. Now, say I have one POST service www.mydomain.com/rest/services/myPost. I've got my Tomcat server hosting these services on my local machine.
In my android app, I just call this service, and get the result back (if any). Where exactly did I use CORS in this case. Does this fall under a different category of server calls? If yes, then how exactly.
Furthermore, I checked Enable Cors for Tomcat and it says that I can add a filter in my web.xml of my dynamic web project, and then it will start accepting it.
Question 4 : Is that what is enabling the calls from my android device to my webservices?
Thanks

First of all, the cross domain check is performed by the browser, not the server. When the JavaScript makes an XmlHttpRequest to a server other than its origin, if the browser supports CORS it will initialize a CORS process. Or else, the request will result in an error (unless user has deliberately reduced browser security)
When the server encounters Origin HTTP header, server will decide if it is in the list of allowed domains. If it is not in the list, the request will fail (i.e. server will send an error response).
For number 3 and 4, I think you should ask separate questions. Otherwise this question will become too broad. And I think it will quickly get close if you do not remove it.
For an explanation of CORS, please see this answer from programmers: https://softwareengineering.stackexchange.com/a/253043/139479
NOTE: CORS is more of a convention. It does not guarantee security. You can write a malicious browser that disregards the same domain policy. And it will execute JavaScript fetched from any site. You can also create HTTP headers with arbitrary Origin headers, and get information from any third party server that implements CORS. CORS only works if you trust your browser.

For question 3, you need to understand the relationship between the two sites and the client's browser. As Krumia alluded to in their answer, it's more of a convention between the three participants in the request.
I recently posted an article which goes into a bit more detail about how CORS handshakes are designed to work.

Well I am not a security expert but I hope, I can answer this question in one line.
If CORS is enabled then server will just ask browser if you are calling the request from [xyz.com]? If browser say yes it will show the result and if browser says no it is from [abc.com] it will throw error.
So CORS is dependent on browser. And that's why browsers send a preflight request before actual request.

In my case I just added
.authorizeRequests().antMatchers(HttpMethod.OPTIONS, "/**").permitAll()
to my WebSecurityConfiguration file issue is resolved

The procedure of Opening a website using IE8

I want to know when I'm using IE8 open a website (like www.yahoo.com), which API will be called by IE8? so I can hook these API to capture which website that IE8 opening currently.

When you enter a URL into the browser, the browser (usually) makes an HTTP request to the server identified by the URL. To make the request, the IP address of the server is required, which is obtained by a DNS lookup of the host (domain) name.
Once the response -- usually containing HTML markup -- is received, the browser renders it to display the webpage.
More details available here: what happens when you type in a URL in browser
So, in the general case, no "API" request as such is made. (Technically speaking, you can think of the original HTTP request to the server as an API request). The sort of "API" request you presumably mean, however, is not made in this general case just described. Those requests happens when the JavaScript executing on the page makes an Ajax HTTP request (XmlHttpRequest) to the web server to carry out some operation.
I am not sure about IE8, but the "developer tools" feature of most modern browsers (including IE9 and IE10), would let you see the Ajax HTTP requests that the webpage made as it carried out different operations.
Hope this helps.

IE uses Microsoft's WinSock library API to interact with web servers.
You may want to look for a network monitoring/sniffing API, which you could use to examine HTTP requests, and determine the URLs the browser is using.

Asp Mvc 3 - Restful web service for consuming on multiple platforms

I am wanting to expose a restful web service for posting and retrieving data, this may be consumed by mobile devices or a web site.
Now the actual creation of the service isn't a problem, what does seem to be a problem is communicating from a different domain.
I have made a simple example service deployed on the ASP.NET development server, which just exposes a simple POST action to send a request with JSON content. Then I have created a simple web page using jquery ajax to send some dummy data over, yet I believe I am getting stung with the same origin policy.
Is this a common thing, and how do you get around it? Some places have mentioned having a proxy on the domain that you always request a get to, but then you cannot use it in a restful manner...
So is this a common issue with a simple fix? As there seem to be plenty of restful services out there that allow 3rd parties to use their service...

How exactly are you "getting stung with the same origin policy"? From your description, I don't see how it could be relevant. If yourdomain.com/some-path/defined-request.json returns a certain JSON response, then it will return that response regardless of what is requesting the file, unless you have specifically defined required credentials that are not satisfied.
Here is an example of such a web service. It will return the same JSON object regardless of from where the request is made: http://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=true
Unless I am misunderstanding you (in which case you should clarify your actual problem), the same origin policy doesn't really seem to apply here.
Update Re: Comment
"I make a simple HTML page and load it as file://myhtmlfilelocation/myhtmlfile.html and try to make an ajax request"
The cause of your problem is that you are using the file:// URL scheme, instead of the http:// protocol scheme. You can find information about this scheme in Section 3.10 of RFC 1738. Here is an excerpt:
The file URL scheme is used to designate files accessible on a particular host computer. This scheme, unlike most other URL schemes, does not designate a resource that is universally accessible over the Internet.
You should be able to resolve your issue by using the http:// scheme instead of the file:// scheme when you make your asynchronous HTTP request.

Is request forwarding possible when using CGI?

I'm writing a small content server as a web service. There are 2 units - one authenticates the application requesting content and when authentication succeeds, the request is forwarded to the other unit that serves the content.
[1] If I want to do this using CGI
scripts, is there any equivalent of
jsp:forward in CGI?
[2] Suppose if
forwarding is not possible, the
client application shouldn't be able
to request the second unit directly.
What is the proper way to do this?

Another attempt, since you are not after HTTP redirect...
The short answer is: Yes, it is possible.
However, it is highly dependent on the tools you are using. What web server and CGI scripting language you are using?
CGI scripts can do practically anything they want to do, for example they could execute code from other CGI scripts. Thus, they can provide the behavior you are looking for.
CGI (Common Gateway Interface) just describes how a web server starts a CGI script and gives the script input data via environment variables. CGI also describes how the script returns data to web server. That's all.
So if your authorization script wants to delegate some operation to other some script, it is up to that authorization script to implement it somehow. The CGI protocol does not help here.

The concept you might be looking for is called HTTP redirect, where the server sends a response to browser's request, telling the browser to fetch a new page from another URL.
CGI can do HTTP redirects just fine just like jsp:forward. You need just to output the right HTTP headers.
You need to return a 302 response code in HTTP headers, and provide location URL where browser should go next. Have your CGI script output these kind of headers:
HTTP/1.1 302 Redirect
Location: http://www.example.org/
These headers tell browser to fetch a page from URL http://www.example.org/ .

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js