How to get the source of the page loaded in IWebBrowser2? - c++

From a BHO (Browser Helper Object) in Internet Explorer, how do I get the full source code of the page currently loaded in the web browser when I have its IWebBrowser2 interface?
Do I have to download it again from the URL where it resides, or is there a way to get the copy that Internet Explorer downloaded and used to render the page?
I tried getting the outerHTML of the html element of the current document, but it returns the source code already preprocessed. I need it in the same form you see when you click "View Source" in Internet Explorer.
Thank you for any helpful information!!!

You can query the browser's Document property for IPersistStream or IPersistFile and then call its Save() method. Note, however, that anything you get from the browser is likely to be the processed HTML, which may include DOM changes made by scripts.
To get the original HTML, you should download it yourself directly from the source URL, or at least extract the file from the browser's local cache.
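In outline, the IPersistFile route looks like this (C++/COM pseudocode: HRESULT checks and Release() calls are elided, and the output path is illustrative):

```
IDispatch *pDisp = nullptr;
pWebBrowser2->get_Document(&pDisp);          // the loaded document

IPersistFile *pPersist = nullptr;
pDisp->QueryInterface(IID_IPersistFile, (void **)&pPersist);

// Writes the document's HTML to disk -- but note this is the
// in-memory (processed) markup, not the bytes from the wire.
pPersist->Save(L"C:\\temp\\page.html", FALSE);
```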

Related

How to programmatically change browser URL

In my C++ Windows application I use ShellExecute to open a remote PDF file in the internet browser at a certain PDF Destination (dynamic bookmarks provided by Adobe Acrobat Reader):
ShellExecute(NULL, "open", "https://www.myweb.cloud/guide.pdf#dest_1", NULL , NULL, SW_SHOWNORMAL);
Then if I want to move to another Destination, another call to ShellExecute (with #dest_2 in the URL) simply opens another tab in the browser and downloads the PDF again, opening it at that Destination.
Is there a way to programmatically change the URL (from #dest_1 to #dest_2) without making the browser open a new page and re-download the PDF?
I also use libcurl in my application to retrieve data from remote servers. Can I reach my goal with libcurl? If so, could you please show me a code sample?
Thanks in advance.
External links opened with ShellExecute always open in a new tab by default, and Chrome cannot change this behavior. Early versions of Firefox had an option to open an external link in the currently active tab, but it no longer seems to be available.
You can download files with libcurl; see the url2file example. After a file has been downloaded, you can open it in a particular application with ShellExecute. You just need to find an application that suits your requirements. For example, Adobe Reader does not seem to support reopening in the same window (see 1, 2). As #KJ commented while I was typing my answer, sumatrapdf -reuse-instance seems suitable for you.
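The download-then-reopen pattern described above can be sketched as follows (in Python rather than libcurl/C++ for brevity; the URL is taken from the question, and SumatraPDF with its -reuse-instance and -named-dest flags is assumed to be on PATH):

```python
import subprocess
import urllib.request

def download(url, dest_path):
    """Fetch the remote PDF once and store it locally."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path

def show_destination(pdf_path, dest_name):
    """Reopen the already-downloaded PDF at a named destination.
    -reuse-instance tells SumatraPDF to reuse its existing window
    instead of opening a new one (Adobe Reader has no such option)."""
    subprocess.Popen(["SumatraPDF.exe", "-reuse-instance",
                      "-named-dest", dest_name, pdf_path])

# Usage: download once, then jump between destinations without
# re-downloading:
#   pdf = download("https://www.myweb.cloud/guide.pdf", "guide.pdf")
#   show_destination(pdf, "dest_1")
#   show_destination(pdf, "dest_2")
```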
You can also use the Edge WebView2 component in your app instead of a browser.

Windows update KB5003637 seems to have broken WebBrowser control, does anyone know why?

The recent Windows 10 update for KB5003637 seems to have caused our use of the WebBrowser control to fail. Our applications use a C++ dialog that hosts a web browser control based on the IWebBrowser2 interface and implemented by the COM class 8856f961-340a-11d0-a96b-00c04fd705a2. The control interacts with a bespoke internal 'web server' that is hosted on a localhost port. The web browser is rendering dynamic HTML with a bunch of css and javascript. It's a legacy app that has been working reliably for many years.
Our users on Windows 10 versions 2004, 20H2, and 21H1 are installing KB5003637, and when they do, the web browser control no longer renders the content it did before.
Looking at a trace, I can see that the web browser control requests the page's HTML, which is delivered as it should be. Normally the control would then request the css and javascript files needed to make the page active. Instead, nothing happens.
The KB5003637 update is pretty big, but it does contain fixes for some scripting vulnerabilities described in CVE-2021-31959 which are very much on point. Nothing I've found so far indicates how this was fixed, what effect it has on the WebBrowser control, or what workarounds there might be.
Any help would be appreciated.
It turns out that the Windows update I described did change the behavior of the WebBrowser control. Our bespoke web server was not including Content-Type headers in its responses to the WebBrowser's requests. For the last decade or more, the control either successfully figured out what the content was or defaulted to the correct content type in the cases that mattered. After the update, the WebBrowser defaults to a content type of 'text' for the initial HTML payload. As a result, it did not try to interpret the payload as HTML and therefore took no further action (like requesting the css and js files).
When I changed the code to include a Content-Type header of "text/html" for the initial payload, the application began working again. Content-Type headers are now included with all replies.
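The fix boils down to one header. A minimal stand-in for the bespoke server (sketched in Python with the standard library; the page content is illustrative) shows where it goes:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><head><script src='app.js'></script></head><body>hi</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # The crucial line: without it, the patched WebBrowser control
        # treats the payload as plain text and never parses the HTML
        # (so it never requests the css/js the page depends on).
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the console quiet
```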

Access Webpage With Credentials and Cookies From Command Line

I am trying to access a proprietary website which provides access to a large database. The database is quite large (many billions of entries). Each entry in the database is a link to a webpage that is essentially a flat file containing the information that I need.
I have about 2000 entries from the database along with their corresponding webpages. I have two related issues that I am trying to resolve:
How to get wget (or any other similar program) to read cookie data. I exported my cookies from Google Chrome (using https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg?hl=en), but for some reason the HTML downloaded by wget still cannot be rendered as a webpage. Similarly, I have not been able to get Google Chrome to read the cookies from the command line. These cookies are needed to access the database, since they contain my credentials.
In my context, it would be OK if the webpage were downloaded as a PDF, but I cannot figure out how to download a webpage as a PDF using wget or similar tools. I tried automate-save-page-as (https://github.com/abiyani/automate-save-page-as) but I continuously get an error about the browser not being in my PATH.
I solved both of these issues:
Problem 1: I switched away from wget, curl, and Python's requests to simply using the Selenium webdriver in Python. With Selenium I did not have to deal with issues such as passing cookies, headers, POST and GET, since it actually opens a browser. This had the added benefit that, as I wrote the script, I could inspect the page and watch what it was doing.
Problem 2: Selenium has a page_source property, which returns the HTML of the current page. When I tested it, the HTML rendered correctly.
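For reference, the Selenium flow looks roughly like this (a sketch; it assumes `pip install selenium` and a matching chromedriver on PATH, and note that page_source is a property, not a method):

```python
def fetch_rendered_html(url):
    """Open a real browser, let cookies and scripts do their work,
    and return the page's HTML as the browser rendered it."""
    from selenium import webdriver  # imported lazily: pip install selenium
    driver = webdriver.Chrome()     # uses the chromedriver found on PATH
    try:
        driver.get(url)
        return driver.page_source   # HTML after the browser has rendered it
    finally:
        driver.quit()
```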

Blank applet layout coming in Siebel before OpenUI rendering happens

We are using a Siebel citizen portal application (IP 15.12). After login, we reach a landing page with 3 applets. These applets have some custom fields, but we did no UI customization (i.e. no PM/PR).
Now when we log in, a blank layout of the applet appears for a few seconds (especially on first load or after clearing the cache) and then the actual applet renders with the Open UI layout. Since this is a customer-facing application, our client is very concerned about the blank layout appearing before the fully rendered applet.
Could anyone throw some light on this? Is it Open UI behavior, and if so, is there a workaround?
With IP 15.18, we had a similar issue of list applets not loading data on the initial home-page load after login. Only the list applet titles were displayed, while the form applets loaded their elements, showing the busy cursor forever.
We were getting an "Error downloading file define:siebel/htmltmplmgr" error in the browser console. We debugged and found that localeobject.js, a dependency mentioned in htmltmplmgr.js, was not loading. So we added a preload define parameter at the beginning of htmltmplmgr.js to load localeobject.js before the main function even starts, and then it worked. Try this.
On first load, or after clearing the cache, Open UI needs to load all the (vanilla) JS files and other required files. So this is the default behavior of Open UI.
There is not much you can do about it, other than trimming unnecessary files from the download.
I found this blog very useful; you can check it:
http://www.askmesiebel.com/category/siebel-8-1-1-10-upgrade-issue/
Edit the file htmltmplmgr.js in the folders below:
$SIEBEL_ROOT/siebsrvr/webmaster/siebel_build/scripts/siebel/
$SIEBEL_ROOT/SWEApp/public/enu/23048/scripts/siebel/
Go to the define section and add the dependency files inside the square brackets, changing
define("siebel/htmltmplmgr",[]
to
define("siebel/htmltmplmgr",["siebel/siebelconstants.js","siebel/localeobject.js"]
Note that you have to edit the file in both folders; you can create a backup before making changes. If you replace it only in the public folder, the files will be overwritten from the webmaster/siebel_build folder into the public folder when you restart the Siebel server. Hence replace it in both folders, then restart the Siebel server and the web server.

Django - Upload file without using form

I have a small email client. I would like to be able to upload files without having to submit a form. Whenever I use the file input button on my form, I would like that file to be uploaded without any reload of the page. The goal is to upload multiple files without reloading the page, similar to what happens in Gmail.
Every time you click the file input and choose a file, a small progress bar appears with the upload progress, and the page is not reloaded.
I am guessing some JS/Ajax library might help me achieve this? I am using HTML5.
Thank you
Blueimp has a great jQuery-based throttling file uploader.
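On the server side, the JavaScript uploader just needs a plain Django view to POST each file to (as multipart/form-data), no form class required. A minimal hypothetical view (the name upload_attachment, the field key "file", and the destination path are all illustrative):

```python
def upload_attachment(request):
    """Hypothetical Django view: receives one file from an AJAX POST.
    The JS side sends it as multipart/form-data under the key 'file'."""
    from django.http import JsonResponse  # lazy import, for illustration only

    f = request.FILES["file"]
    with open("/tmp/" + f.name, "wb") as out:  # illustrative destination
        for chunk in f.chunks():               # stream to avoid large buffers
            out.write(chunk)
    return JsonResponse({"name": f.name, "size": f.size})
```

The JS side can then show a per-file progress bar (as Blueimp's uploader does) and call this view once per chosen file, so the page never reloads.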