How does Apache HttpClient's support of cookies work?

Does Apache HttpClient support cookies set by JavaScript in a site's HTML, or only those sent by the server over HTTP?
Edit:
If not, how would you go about finding the JavaScript cookies, using Wireshark or another sniffer?

You don't really give a lot of context, so it's hard to tell what sort of solution is appropriate to your problem.
If I wanted to find the JavaScript cookies set by a site, I'd probably do it from within a browser. As I mentioned in my comments above, reading the cookies set by JavaScript on the client side (in the general case) requires executing the JavaScript. Doing this "correctly" requires the entire environment that's visible to JavaScript, which is a pretty large fraction of a browser.
If a human operator is OK (e.g. if this is for debugging), then you could use something like Firebug or Chrome's Developer Tools to examine the cookies. If you need something more automated, one option might be to write a browser extension.
There are other options that involve more work and/or less precision, but without knowing more about the constraints of your problem it's impossible to know which of those other options would be more appropriate.
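To answer the first part of the question directly: HttpClient only handles cookies delivered by the server in Set-Cookie response headers; it never executes JavaScript, so script-created cookies are invisible to it. A minimal sketch showing where the server-sent cookies end up, assuming HttpClient 4.x and a hypothetical target URL:

    import org.apache.http.client.CookieStore;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.cookie.Cookie;
    import org.apache.http.impl.client.BasicCookieStore;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class ServerCookies {
        public static void main(String[] args) throws Exception {
            CookieStore store = new BasicCookieStore();
            try (CloseableHttpClient client = HttpClients.custom()
                    .setDefaultCookieStore(store).build();
                 CloseableHttpResponse response =
                     client.execute(new HttpGet("https://example.com/"))) {
                // Only cookies delivered via Set-Cookie headers appear here;
                // anything written by client-side JavaScript never will.
                for (Cookie cookie : store.getCookies()) {
                    System.out.println(cookie.getName() + " = " + cookie.getValue());
                }
            }
        }
    }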

Related

How to reduce your fingerprint in browser for privacy and for web scraping

You can disable cookies and change your IP 500 times, but can't anyone still track you through fingerprinting?
You could disable Java and Flash, though that would break the page and make you stand out anyway.
You could use Tor, but I think if you use Tor you get blacklisted from some sites instantly.
What's the workaround? Using Chrome is a big no-no; Internet Explorer maybe, and Firefox perhaps…
Are there any apps that deal with this? Or do you just design a good web scraper, get an IP, and cross your fingers?
I realize the average site is not going to implement all these features, but I am asking how one would work around a site that was extremely vigilant.
There are two types of browser fingerprinting:
1. Static fingerprinting - this can identify browsers (and probably operating systems) based purely on details of their requests: the order and capitalization of HTTP headers, browser-specific headers, and so on.
One small aspect is described here: https://gwillem.gitlab.io/2017/05/02/http-header-order-is-important/
As this can be done without any JavaScript, I guess Scrapy is identifiable this way.
How to get around this?
As mentioned in the article above, you need to exactly emulate a particular browser's fingerprint by reproducing its headers' order and capitalization (and these have to match the user agent, of course).
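Most high-level HTTP clients reorder or normalize headers, so one way to control order and capitalization exactly is to write the request by hand over a raw socket. A minimal Java sketch (plain HTTP for brevity; the header set mimics a Firefox request and is only an illustration):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class HeaderOrderProbe {
        public static void main(String[] args) throws IOException {
            try (Socket s = new Socket("example.com", 80);
                 OutputStream out = s.getOutputStream();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.ISO_8859_1))) {
                // Header order and capitalization are sent exactly as written here;
                // a high-level client would typically rewrite them.
                String request =
                    "GET / HTTP/1.1\r\n" +
                    "Host: example.com\r\n" +
                    "User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0\r\n" +
                    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" +
                    "Accept-Language: en-US,en;q=0.5\r\n" +
                    "Accept-Encoding: gzip, deflate\r\n" +
                    "Connection: keep-alive\r\n" +
                    "\r\n";
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // Print the response status line and headers.
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    System.out.println(line);
                }
            }
        }
    }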
2. Dynamic fingerprinting - this uses JavaScript to collect data on installed plugins, plugin versions and so on. As Granitosaurus wrote, that won't be triggered by Scrapy. But sites that use fingerprinting for scraping protection will block the scraper if it doesn't return any data from their fingerprinting module.
As this type of fingerprinting yields many more dimensions, it can be used to identify particular users with high reliability (over 90%).
You can find a good example of how this is done here: https://github.com/Valve/fingerprintjs2
How to get around this?
use a lot of different real browsers for scraping (for example through Selenium; not PhantomJS, which can be detected)
randomize these browsers' settings and installed plugins (ideally using different versions)
when scraping, rotate these browser instances instead of rotating IPs (each browser instance should keep its IP over its lifetime)
if one of the instances is "burnt", replace it with a new instance that has a fresh IP and a randomized browser fingerprint
... as you'll need many browsers, this has to be done in an automated way, of course (see the sketch below).
Resetting cookies sounds like a good idea at first, but if the fingerprinting system is worth its salt it won't need cookies to identify each of these machines reliably.
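A minimal Java sketch of such an instance pool using Selenium, assuming hypothetical proxy addresses and treating the user agent as a stand-in for the full set of settings you would randomize in practice:

    import org.openqa.selenium.Proxy;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.openqa.selenium.firefox.FirefoxOptions;

    import java.util.List;
    import java.util.Random;

    public class BrowserPool {
        private static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0",
            "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0");
        private static final Random RANDOM = new Random();

        // Each instance keeps the same proxy (and hence the same IP)
        // for its whole lifetime.
        static WebDriver newInstance(String proxyHostPort) {
            FirefoxOptions options = new FirefoxOptions();
            options.addPreference("general.useragent.override",
                USER_AGENTS.get(RANDOM.nextInt(USER_AGENTS.size())));
            Proxy proxy = new Proxy();
            proxy.setHttpProxy(proxyHostPort).setSslProxy(proxyHostPort);
            options.setProxy(proxy);
            return new FirefoxDriver(options);
        }

        public static void main(String[] args) {
            WebDriver driver = newInstance("203.0.113.10:8080"); // hypothetical proxy
            try {
                driver.get("https://example.com/");
                System.out.println(driver.getTitle());
            } finally {
                // When an instance is "burnt", quit it and build a replacement
                // with a fresh proxy and a newly randomized fingerprint.
                driver.quit();
            }
        }
    }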

How can I intercept HTTP requests performed by an Internet Explorer instance I started?

My C++ program launches Internet Explorer (it works with IE6 up to IE10) to display some web page on the Internet; I have no way to modify the web page. The web page references a JavaScript file (using a <script> tag in the HTML markup) - a copy of the swfobject JavaScript library. I'd like the web page to use a custom copy of this file which I provide.
I came up with two possible ways to tackle this:
Write a proxy server which Internet Explorer connects to; the proxy fetches the actual data and then rewrites the HTML so that my own copy of swfobject is referenced (a simplified sketch of this idea is shown below). This is unfortunately quite a bit of work, and probably won't work with HTTPS. I could live without support for HTTPS for now.
Implement an asynchronous pluggable protocol handler for Internet Explorer which intercepts all HTTP requests. I know that the JavaScript file is always retrieved over HTTP, so I could intercept accesses to the swfobject JavaScript file and yield my own file instead. Alas, this seems to be impossible as well; a Microsoft support page explains:
Internet Explorer ignores naive attempts to overwrite HKEY_CURRENT_ROOT\PROTOCOLS\Http with a value other than the CLSID for
This sounds like hooking 'http' with a custom protocol handler won't work; in any case, this approach would also be problematic if an http protocol handler already existed.
Is there a better way to solve this than either of these two?
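For reference, here is roughly what I have in mind for option 1, as a greatly simplified sketch (my real program is C++, but the proxy skeleton is language-agnostic, so Java is used here for brevity). Rather than rewriting the HTML, it simply answers requests for swfobject.js with a local copy, which achieves the same effect; a real implementation would need to handle request headers, POST, keep-alive, content types and so on:

    import java.io.*;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class RewritingProxy {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(8888)) {
                while (true) {
                    try (Socket client = server.accept()) {
                        handle(client);
                    } catch (IOException e) {
                        e.printStackTrace(); // keep serving after a bad request
                    }
                }
            }
        }

        static void handle(Socket client) throws IOException {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    client.getInputStream(), StandardCharsets.ISO_8859_1));
            // Proxy-style request line: "GET http://host/page HTTP/1.1"
            String requestLine = in.readLine();
            if (requestLine == null || !requestLine.startsWith("GET ")) return;
            String url = requestLine.split(" ")[1];

            byte[] body;
            if (url.endsWith("swfobject.js")) {
                // Serve our own copy instead of the site's (hypothetical local path).
                body = Files.readAllBytes(Paths.get("my-swfobject.js"));
            } else {
                try (InputStream upstream = new URL(url).openStream()) {
                    body = upstream.readAllBytes();
                }
            }
            OutputStream out = client.getOutputStream();
            out.write("HTTP/1.0 200 OK\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
        }
    }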
Depending on the complexity of your requirements, Fiddler may be a useful alternative to a custom proxy since it can automatically rewrite both requests and responses and can be a quick way of scripting what you want.
It also works well with HTTPS, so that part is "free".
Want to have Fiddler automatically rewrite requests and responses, add or remove headers, or flag/ignore sessions based on rules you specify? Check out the FiddlerScript Cookbook
Here is a link to the cookbook
If you need to embed it, it can also be embedded as FiddlerCore.
As @MSalters points out below, Fiddler's optional SSL interception is something you should consider the trade-offs of before using it. It's documented here, and I've written up a short summary of how it works in this answer.
Just to throw out an idea: it's possible to hook the WinSock send() and recv() functions in your own process. This is a kind of man-in-the-middle. This solution has a high-complexity drawback, though.
Easy, just translate the URL. Change the swfobject URL to a file:// URL, pointing at your copy.
(You're not actually launching IExplorer.EXE, are you? That's not how you're supposed to open web pages. You either launch a URL with ShellExecute, leaving the browser choice to the user, or you embed MSHTML, IE's core, in your own app. Internet Explorer isn't part of Windows and may be absent, e.g. on Windows N.)

Functional testing with JMeter

I want to check if the value of a cookie changes after each reload of a web page.
I've tried to use Beanshell for this purpose but haven't succeeded yet. Any example or tutorial?
It depends on how the cookie is set. If it's a simple Set-Cookie response header then you can verify this using a standard Response Assertion. But if the cookie is normally set or amended using JavaScript, then this code will not be executed by JMeter (it is not a browser) and you would probably do better looking at a tool more focused on functional testing, like Selenium.
The thing is, JMeter is a tool used to simulate lots of browsers sending requests to a central server, to verify that this machine, and its friends, can support a certain load; it is not really designed to test client-side functionality.
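That said, if the cookie is set server-side, a Beanshell PostProcessor attached to the page request can compare its value against the previous iteration. A minimal sketch, assuming an HTTP Cookie Manager is present in the test plan and a hypothetical cookie name "session":

    // Beanshell PostProcessor sketch: watch a cookie (hypothetical name "session")
    // and compare its value against the one seen on the previous iteration.
    import org.apache.jmeter.protocol.http.control.Cookie;
    import org.apache.jmeter.protocol.http.control.CookieManager;
    import org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase;

    HTTPSamplerBase sampler = (HTTPSamplerBase) ctx.getCurrentSampler();
    CookieManager manager = sampler.getCookieManager(); // null without a Cookie Manager

    if (manager != null) {
        for (int i = 0; i < manager.getCookieCount(); i++) {
            Cookie cookie = manager.get(i);
            if ("session".equals(cookie.getName())) {
                String previous = vars.get("lastSessionCookie");
                if (cookie.getValue().equals(previous)) {
                    log.info("Cookie value did NOT change between reloads");
                }
                vars.put("lastSessionCookie", cookie.getValue());
            }
        }
    }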

Getting a user's web site browsing history data

I would like to list the web site links a user has visited and get all of their history data.
Where can I get that data?
Thanks.
Well, since I'm new, I'll just have to post as broad an answer as I can for your vague question.
If your goal is to get a user's recent browsing history, you should just be able to look up the places where all of the mainstream browsers store their history data. I highly doubt the developers would put such insensitive information under encryption, so this shouldn't be too hard. Browsers that you should take into consideration include Internet Explorer, Firefox, Opera, Chrome, Netscape Navigator, and all of the other Mozilla spinoffs, such as SeaMonkey.
If your goal is to establish a connection to a web server and then download a list of data provided by the server, there is a lot of setup involved. First, you need a server. You can use something like Apache and the HTTP protocol for all data transmission, or if you're feeling brave, you could whip up a server of your own design. Second, you need a way to connect to this server. Since it appears you're using Visual C++, WinSock would be the way to do this. There are plenty of WinSock tutorials online; just Google away.
I hope this helps you, and best of luck in your endeavor.
As your question is tagged "C++", I assume that your program runs on the local computer.
Each browser has its own format of "history storage". You will have to handle different formats if you are targeting the major browsers, e.g. Firefox, Chrome, IE, etc.
For example, Firefox and Chrome store their history in SQLite databases, while IE stores its history in a binary file named "index.dat".
Here are some places to start:
Firefox :
http://kb.mozillazine.org/Places.sqlite
https://developer.mozilla.org/en/The_Places_database
IE :
http://www.forensicswiki.org/wiki/Internet_Explorer_History_File_Format
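To make the Firefox case concrete: places.sqlite can be opened with any SQLite library and queried directly (close Firefox first, or work on a copy, since the live database may be locked). A sketch in Java, assuming the sqlite-jdbc driver is on the classpath and using a hypothetical profile path:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FirefoxHistory {
        public static void main(String[] args) throws Exception {
            // Hypothetical profile path; adjust to the actual profile directory.
            String db = "jdbc:sqlite:/path/to/profile/places.sqlite";
            try (Connection conn = DriverManager.getConnection(db);
                 Statement stmt = conn.createStatement();
                 // moz_places holds one row per visited URL.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT url, title, visit_count FROM moz_places " +
                     "ORDER BY last_visit_date DESC LIMIT 20")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("visit_count") + "  " + rs.getString("url"));
                }
            }
        }
    }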

Can two different browsers share one cookie?

My requirement is pretty interesting: I want to maintain one cookie between two different browsers for the same domain.
So let's say I have created one cookie with the name "mydata" and value "hiscal" from IE; then if I browse the same web site from Firefox and try to read the cookie "mydata", the system should give me the value "hiscal".
But this does not happen in the general case,
so can anyone tell me how I can share a cookie between two different browsers (clients) for the same domain?
Thanks,
Hiscal
You can build a cookie proxy by creating a Flash application and using Shared Objects (SO = Flash cookies) to store data.
Any browser with Flash installed could retrieve the information stored in the SO.
But it's an ugly workaround.
Just don't share cookies... and find another way to build your website/app.
Every browser maintains its own cookies. So in general, no, this is not possible.
With a lot of hard work, you could in theory write an application that sits on the client computer, looks at all the locations where the different browsers store cookies, parses the different cookie formats, synchronizes them, and then writes them out.
That would be error-prone and would break as soon as a browser changed how it works with cookies (not to mention that some browsers secure their cookies, so you won't be able to get to them in the first place).
In my opinion, this is not practical and I wouldn't even try.
Use YUI's storage utility and force it to use the SWF storage engine.
All computers and browsers would still have to have Flash installed, but you wouldn't have to write your own Flash app. You would benefit from using the one maintained by the YUI team.
As others have said, this is not very portable, but in a controlled environment, it might work for you.
Cookies can be shared with other data storage through browser extensions. Maybe with Flash or Google Gears you can maintain a shared DB between browsers, but it needs to be installed in both of them, of course.
Edit:
In Google Gears you can't. Maybe you should write a self-made extension... or some user-login system, where the data will sit on the server.