How Browsers Work of modern web browsers - cookies

What are the components of a browser,Which are the settings that control a browser, how cookies works ,how browser sessions work?

Huge topic here. Rather than break this down and write forever, I will go through a web scenario: User types in address (or clicks a link?) - NOTE: a bit oversimplified
Browser breaks down URI
Browser checks cache to see if site IP address is in cache
If no, browser contacts DNS server to get IP address
Browser creates request for resource at URI, which is a package that has both a header (for routing) and a body (the request). For a page address typed in or click, it will be a GET request. The browser also sends a collection of "capabilities" like I accept cookies, etc.
Server contacted and returns a response.
Browser breaks down the response. It may be a success or failure and there will be a return code either way.
Assuming success, the browser then parses the message and breaks it into the HTML for the page and any collections sent (for example cookies)
For cookies, the browser checks user preferences before storing. It should be noted there is more than one type of cookie today. There are user cookies, which contain user information and can be easily blocked by the user and server cookies, which contain information needed by the application server. The later can also be blocked, if desired, but it is generally not advised as you lose functionality.
The HTML is parsed so the page can be displayed (rendering engine) and all resources required to see the page (like pictures) are requested through a new web request and rendered on the page.
Components? You can derive some here. Request creator, response parser, page renderer, configuration (both standard and user), etc.
Settings? Too numerous to cover. Open a browser and look at settings to see quite a few.
Cookies? Covered the basics already.
Sessions? Handled by server cookies. If you restrict them you can only get one page at a time, unless some information is passed in the URI on each request.

Related

How can I get all 1st and 3rd party cookies from CDP for a given page?

What I need:
I need to get all the currently applicable cookies for a page state. For example, I navigate to a site in the US, the browser has a lot of cookies populated by that page. I then decline all cookies on that site. Afterwards the browser reports much fewer cookies for that page. I need to be able to get that list of applicable cookies for the page as displayed in Chrome's View site information pop-out dialog left of the URL.
What I've tried:
I'm currently using Puppeteer, though I've also tried using Playwright. Both can utilize the Chrome Devtools Protocol (CDP) to gather information about a browser session and other related HTTP data.
I have a use case where I need to see the current state of cookies as it pertains to a web page. CDP provides two methods to accomplish this. Network.getAllCookies and Network.getCookies.
Network.getCookies returns cookies for a given URL, or else it provides all the cookies for the URL of the current page when no URL is provided. The problem with this is it's not a full picture. Network.getCookies does not include 3rd party cookies unless that domain is specifically requested. But there's no way for me to know all the 3rd party cookies for a given page.
If I switch it up and use Network.getAllCookies, that has too much information. It returns with all the cookies currently known to the browser session, regardless of applicability to the current page. For example, if I visit site A, then nav to site B, then I call Network.getAllCookies, I get all of the cookies from both pages since they are all in the same browser context. But I only want the 1st and 3rd party cookies for the current page I'm on.
In a headful browser it is possible to see all the applicable 1st and 3rd party cookies are for a site. If you click on the lock icon (secure site) to the left of the url, there is an option to view cookies. This is the most complete and accurate view of cookies I've come to know. Somehow the browser knows what 3rd party cookies originated from the current page and requests made by the current page and therefore displays them there. But how? And how can I get that list through CDP?
Note that document.cookie is useless as it does not include httpOnly cookies, and to my knowledge also doesn't return detailed cookie information (such as sameSite, secure, expiry, etc.)
I've tried using the CDP session directly attached to both of the above mentioned libraries. I've tried monitoring HTTP requests using Puppeteer and gathering set-cookie data from response headers, but CDP is unreliable when it comes to providing all the response header set-cookie information.
TLDR:
I'm expecting to programmatically retrieve only the relevant cookies for a page visit up to that point. If cookies become irrelevant in that page session, I don't want to receive them any more.
Network.getAllCookies returns too many cookies as it gives the whole browser context and not just the page relevant cookies.
Network.getCookies only returns cookies by a url.
document.cookie only returns application cookies.
How can I get all the 1st and 3rd party cookies that pertain to my current page?
I want this exact list:

Privacy Policy(ies). Does the cookie "collect" browser data or "request" browser data?

I'm working on legal portion of my site, Privacy Policy in particular. I've done the research and found that nearly all the answers to my question (below), is generalized.
Question: Do cookies "collect" data from user browsers, or do cookies "request" then receive data from user browsers?
This seems to be a very important distinction. Do I put into my privacy policy that my site "collects" data from my users or do I "request" data from my users.
My understanding of the core functionality is that cookies request data of user browser or browser activity. Users control how their browser will respond (or handle cookies) in their browser settings. If users have the ultimate control of handling "responses" to cookies is it proper for website privacy policies to state that they use cookies to collect browser data? Isn't it more accurate to state something like: "We use cookies to request data from your browser. Depending on you have your settings, your response to our request my impact your experience." Or something along those lines.
For years the way I understood the phrase "cookies collect browser data" is that we (websites) force code (the cookie), onto your browser that opens a siv for all your activity to flow back to us. But this isn't the case at all. Cookies actually make a "request" (i.e., asks) for the user's permission first, and depending on how the user has set up their browser settings, the cookie request is honored or denied.
I'm trying to stay away from the term "collect" as a general matter. I think it's improperly used and leaves the wrong impression on users.
Has anyone else thought about this? Am I missing something?
Cookies are being stored to the system/computer - or you can say browser. Cookies are used for authentication, preferences, advertisements, performance and analytics, security purposes. Yes, we need to mention that in privacy policy or some organization also add separate cookie policy.
Following should be mentioned for cookies in policies for standard web applications:
The application may use and store cookies to your system/compute which can help better to know your preferences when you visit the website later. Cookies can also be used for authentication/session checks, advertising, performance, analytics and research and security purposes. //Remove whichever is not applied for your site.

When django session is created

I don't really understand when session is created and per what entity it is created (per ip, per browser, per logged in user). I see in documentation that sessions by default is created per visitor - but what is visitor (browser or ip)?
What are HTTP sessions?
To display a webpage your browser sends an HTTP request to the server, the server sends back an HTTP response. Each time you click a link on website a new HTTP transacation takes place, i.e. it is not a connection that is persistant over time (like a phone call). Your communication with a website consists of many monolitic HTTP transactions (tens or hundres of phonecalls, each phonecall being a few words).
So how can the server remember information about a user, for instance that a user is logged in (ip addresses are not reliable)? The first time you visit a website, the server creates a random string, and in the HTTP response it asks the browser to create a so called HTTP cookie with that value. A cookie is really just a name (of the cookie) and a value. If you go to a simple session-enabled Django site, the server will ask your browser to set a cookie named 'sessionid' with such a random generated value.
The subsequent times your browser will make HTTP requests to that domain, it will include the cookie in the HTTP request.
The server saves these session ids (for django the default is to save in the database) and it saves them together with so called session variables. So based on the session id sent along with an HTTP request it can dig out previously set session variables as well as modify or add session variables. If you delete your cookies (ctrl+shift+delete in Firefox), you will realize that no website remembers you anymore (Gmail, Facebook, Django sites, etc.) and you have to log in again. Most browsers will allow you to disable cookies in general or for specific sites (for privacy reasons) but this means that you can not log into those websites.
Per browser, per window, per tab, per ip?
It is not possible to log into different GMail accounts within the same browser, not even from different windows. But it is possible to log in to one account with Firefox and another with Chrome. So the answer is: per browser. However, it is not always that simple. You can use different profiles in Firefox, and each can keep different cookies and thus you can log into different accounts simultaneously. There are also Firefox plugins for keeping multiple sessions, e.g. MultiFox.
The session all depends on which session cookie your browser sends in it's HTTP request.
Play around
To get the full understanding of what is going on, I recommend installing the FireBug and FireCookie plugins for Firefox. The above screenshots are taken from FireBug's net panel. FireCookie will give you an overview of when and which cookies are set when you visit a site, and will let you regulate which cookies are allowed.
If there is a server side error, and you have DEBUG=True, then the Django error message will show you information about the HTTP request, including the cookies sent
It's browser (not IP). A session is basically data stored on your server that is identified by a session id sent as a cookie to the browser. The browser will send the cookie back containing the session id on all subsequent requests either until the browser is closed or the cookie expires (depending on the expires value that is sent with the cookie header, which you can control from Django with set_expiry).
The server can also expire sessions by basically ignoring the (unexpired) cookie that the browser sends and requiring a new session to be started.
There is a great description on how sessions work here.

How do cookies work when browsing websites

On websites where you have to enter a user name and password, I notice that I can browse the site with one browser and it will know who I am no matter where I go on the site. But if I open a different browser it doesn't know who I am in that browser unless I log on in that browser.
After I log in to a website, does it store some kind of cookie in my browser, and every time I navigate to a different page on that site, it checks the cookie for my identity?
What would happen if I logged in, and then before browsing to a different page on the site, deleted the cookie?
This is more of a "teach a man to fish" answer, so I apologise if it's not what you were after. But if you take my advice you will learn lots, so please trust me :)
There's a number of tools that you can use to track exactly what http traffic is going between your browser and the server. One is called Firebug, a plugin for Firefox. The other kind of tool is called a "web debugging proxy". There's charles, which is very powerful, and fiddler, which is free.
What you want to do with any of these tools is use a website, and then look at the raw request. This shows you exactly what your browser is saying to the server. You'll see the cookies for that server are sent along with every request. What's cool about these tools is that you can edit a request just before it's sent, so you can test how the servers respond...
After I log in to a website, does it store some kind of cookie in my browser, and every time I navigate to a different page on that site, it checks the cookie for my identity?
Yes. The cookie is sent with each HTTP request.
What would happen if I logged in, and then before browsing to a different page on the site, deleted the cookie?
The same as if you were to switch browsers.
Every time when you navigate a new page, your browser sends a request to the server and the server sends back you the response. Your request contains the cookies, which the server can parse and use. You if you delete the cookie, your browser can't send it with the next request.
What would happen if I logged in, and then before browsing to a different page on the site, deleted the cookie?
You would no longer be logged in.
After I log in to a website, does it store some kind of cookie in my browser, and every time I navigate to a different page on that site, it checks the cookie for my identity?
Yes. Most likely, you are dealing with a "session-cookie". These cookies do not store any information themselves, but use a long string to identify yourself to a server. I would suggest doing some research on cookies. As for the (I'm guessing assumed) question of "Why cookies work on different pages?" is because cookies are tied to the domain, and not the exact URI.
Cookies contain names, values, and expirations (along with a few others). The most common you'll see are sessions, which use an identifier to load a session-state from the server containing your information. These are the safest cookies as everything is centralized and not as prone to hijacking. The other kind is a regular cookie, which has a limited size and stores information client-side. Anything that has to do with shopping or anything that tracks users most likely uses sessions, while something like a customizable javascript-y page probably uses a normal cookie. The former tracks information server-side for additional security, while the latter poses no security risk, and leaves the information for the client to manage.

How cookies work?

I wanted to know the interactions of a browser (i.e. Firefox ) and a website.
When I submit my user name and password to the login form, what happens?
I think that website sends me some cookies and authorizes me by checking those cookies.
Is there a standard structure for cookies?
Update:
Also, how I can see the cookies of specific URL sent to my browser if I want to use that cookie?
Understanding Cookies
Cookies are given to a browser by the server. The browser reveals the cookies as applicable only to the domain that provided the cookie in the first place.
The data in the cookie allows the server to continue a conversation, so to speak. Without the cookie, the server considers the browser a first-time visitor.
Have a look at these to know about browser cookies
Understanding Browser cookies
http://internet-security.suite101.com/article.cfm/understanding_computer_browser_cookies
http://www.willmaster.com/library/cookies/understanding-cookies.php
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-22_11-6063884.html
Explanation via Pictures
Simple Explanation by Analogy (via a story)
Freddie works at the Government Taxation Office (IRS/HMRC/ATO/CBDT etc). He deals with millions of people who come to see him everyday. And he has a very poor memory.
In a World Without Cookies:
One day a customer walks in to Freddie's customer care desk:
Customer 1: "Good morning Freddie, so did you change my address like I asked you to?"
Freddie: "I'm sorry. I don't remember who you are? Who are you?"
Customer 1: "Dude, I spoke to you last Monday regarding this issue! How could you forget!"
Unfortunately, the HTTP protocol is stateless. There is no way Freddie (the server) can identify different customers (clients) apart from each other. He doesn't remember. He has a very short memory. There is a solution though:
The World WITH Coookies:
The customer walks in to see Freddie (his name is Brian), but this time, the customer gives Freddie his taxation office ID card:
Brian May: "Good morning Freddie, My name is Brian May...so did you change my address like I asked you to?"
Freddie: "ah yes...hmmm......Brian May, Queen, Lead Guitarist, We Will Rock you......very interesting, I have your records here on my back end system.........let me bring up the records pertaining to your address........YES: I did in fact change your address. BTW since you gave me your ID that's all I need, you don't need to tell me your name is Brian May. Just give me your ID and I will be able to see that on my system".
Explanation of Analogy
You can think of a cookie as kinda like an ID card: if you identify yourself to the server, the server will remember who you are and will treat you accordingly:
e.g. it will remember what you've already ordered in your cart so far.
it will remember that you like reading your website in Tamil / Cantonese / Swahili etc.
it can (basically) identify who you are.
In this particular case, it is the Government Taxation Office who issues out the ID cards.
Granted the analogy is a little strained and very simplified but hopefully, it will help you understand and remember the underlying concept.
Usually the cookie contains a session id number. The id number is then connected to session data that is stored on the server. The usual process is then:
Send login form
Server checks username and password
If correct, the username is stored in a session file on the server, along with various other useful information about the user (if it's a site admin, moderator, userid and so on).
The server sends back a cookie containing an id number that identifies the session file
The browser sends the cookie with each request to that server, so the server can open the session file and read the saved data.
Usually the password is not sent more than once (at login in step 1).
It depends, because there are many scenarios and abilities of usage of cookies.
One of scenarios is:
User submits login form.
Website authorizes the user and set cookie visible in website domain with user name, password (i.e. MD5 hashed) and sometimes other information.
Cookie is sent with each request, which allows website to check if request is came from the authorized user.
For more details read Wikipedia article about cookies.
After logging , the request to server is sent. At server side, it checks the visitor's identification against an ID that identifies whether it is a new user or the older one.
If it determines it a new visitor,it then creates a cookie for it and sends it back in its response to browser. Cookie that is generated in response to Server has a name and unique identification is sent back to a user end. AT the user end ,after every visit to the same URL, browser rechecks cookie list and if it has the cookie for the same url , it is sent to server which identifies cookie ID and server shows the related history for this user then .
Cookies are small data packets that the Web Pages load on to the browser for various purposes.
Every time you re-visit a URL, the browser sends back a tiny package of this information back to the server which detects that you've returned to the page.
Cookies are the reasons that keeps you logged into sites so that you don't have to enter ID and password every time you visit the website.
Webmasters can use these cookies for monitoring the activity of Internet users.
Some sites use third-party cookie to track your Web habits for marketing purposes.
I found some information at this site that was really helpful to me and figure it might be of use: Webfundamentals - Cookies. It goes through what a cookie is, how they work, and the headers that are used to send them.
It says in summary that, cookies are pieces of information that are sent in HTTP requests inside the 'Set-Cookie' header from the server to the client/browser, or in the 'cookie' header in the client/browser to the server.
HTTP is stateless, meaning that one request to another has no knowledge of the state of the page you are browsing. Cookies were made to help address this issue, allowing users be 'known' by the site for as long as the cookie is set to be stored. By default cookies are stored until the client is closed, unless specified otherwise.