Using curl in C++ to get a page that changes after some time - c++

I am trying to write a web scraper in C++ (I know I could use another language, but I'm just trying to learn). There's a webpage whose HTML I'm trying to get, but the page changes after a second or two to show the links I want. How do I make the program wait for some time before returning the HTML?
Edit: I want to make one curl call, wait some time, and then make another curl call to the same webpage. (Not open the link again, as that would give the same page.)

You have three options:
1. investigate the site and figure out how the JavaScript code is changing the page, then replicate that in C++ (either by hard-coding a URL or parsing parts of the page),
2. embed a full browser engine that understands JavaScript and click the link after it changes, or
3. abandon C++ and use a dedicated scraping tool like CasperJS or Scrapy or wring or ...
I would inspect the page and see if you can make option 1 work, but option 3 is by far the easiest approach.

Related

Passing Deep Linking "parameters" to a C++ application

I'm currently struggling with a problem regarding Deep Linking for a C++ application: the current goal is to allow the user to take certain action in the app after clicking on a browser link. I didn't have any issues managing to allow the user to open the application using this... The issue comes when I try to make the application do something specific, depending on the URL's "parameters". Here's an example, to better illustrate my current goal:
If I open this link from a browser:
mybeautifulapp://action=banana
Then, the application should go to the "banana" section.
If I open this other link:
mybeautifulapp://action=apple
It should go to the "apple" section.
However, I have no idea where to get the URL's parameters themselves. I.e., everything after "mybeautifulapp://". Currently, I'm able to get the app to open using any of these URLs. However, I'm not sure how to get the app to do something specific, depending on the URL's contents.
I initially thought that the C++ app would receive the URL in its Main function arguments. I tried parsing them, but it turns out that nothing related to the URL itself ever reaches the Main function. If that's the case, where can I receive the URL in order to parse it? I found a lot of information for Android, iOS, and Electron applications, but nothing for C++ apps.

How to take screenshots in python?

I've an idea and want to implement it.
But I'm not sure if it's gonna work. So, wanted to get your inputs.
I would like to take screenshots of a URL.
Say, when I open a website such as www.espncricinfo.com, I would like to take a screenshot of that page and save it locally. The saved image can be converted to a GIF later on.
Can this be achieved with Python? Any suggestions or inputs on how to do it?
Updated
Also, is it possible to capture a screenshot in a headless browser?
Is there any way to launch the browser in headless mode (non-GUI) and then take a screenshot of a particular area of the web page?
To take a screenshot using Python (note that this grabs the desktop as currently displayed, not a rendered URL):
import pyscreenshot as ImageGrab
im = ImageGrab.grab()
im.save('path/to/image/folder/image_name.png')
im.show()
Yes and no. If you send a request with urllib, you will get the HTML in return, which is step one of displaying a webpage. But you then have to build the page from that HTML with a browser engine; otherwise all you will see is a bunch of text.
There are some python libraries that can do this, such as pywebkitgtk, but those are probably not going to give you the best experience and support.
Another thing you could try is to use crod and firefox/chrome/whatever and then use python to automate the process.
Oh, and by the way, I strongly recommend upgrading to Python 3.
I know it is a bit too late now, but I just saw this question, and in case anyone else is looking for a powerful tool that can take screenshots and also drive headless browsers from Python, I recommend looking more closely at https://www.seleniumhq.org/. There are many tutorials available on YouTube. IMHO this is the best-suited framework for the task. When taking screenshots, you can also define the screen resolution etc.
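For anyone who wants a starting point, here is a rough sketch of the Selenium approach. It assumes the `selenium` package plus Chrome and a matching driver are installed; the function name `capture`, the default window size and the `selector` parameter are made up for illustration and are not from the answers above.

```python
def capture(url, out_path, width=1280, height=800, selector=None):
    """Save a PNG screenshot of url; return True on success.

    If selector (a CSS selector, hypothetical parameter) is given, only that
    element is captured instead of the whole browser window.
    """
    # Imported inside the function so this file still loads on machines
    # where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.set_window_size(width, height)  # the "screen resolution"
        driver.get(url)
        if selector is None:
            return driver.save_screenshot(out_path)
        element = driver.find_element(By.CSS_SELECTOR, selector)
        return element.screenshot(out_path)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `capture("https://www.espncricinfo.com", "page.png")` should save the visible window; passing a `selector` restricts the image to one element of the page. Pages that load content late may need an explicit wait before the screenshot is taken.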

Are you able to get hints from what template a message is coming from with dev tools?

I have this website that I'm editing for a friend, and they want to get rid of a message on the checkout screen, but their boss doesn't know who implemented it. It's an error message at the top, in red, that says "If you are having trouble checking out, please contact us at sales#cbobaby.com", and it appears on the checkout page. This is an OpenCart website and I only work with WordPress sites, so I'm having trouble figuring out where the message is coming from. I've dug through some of the template files in the theme and I can't find anything that, when edited or deleted, gets rid of it. My question is whether there is anything in Chrome DevTools that would help me identify the source or the template it lives in. I only use DevTools for adjusting CSS, but I know there's so much more you can do with it. Thanks.
No, DevTools can't relate your front-end code to what generates it for the DOM. For the exact same reason we are unable to persist edits in the DOM to your source.
You need to use grep, or some code editor with "find all" functionality, and look for some part of the string. If that fails, search your database and see if it is coming out of there. You can then either edit the database and hope nothing breaks, or try to back-track through the application logic to find what is calling that part of the DB. That should give you some idea of where to look.
In the Sources tab, you can see the resources that are loaded when you are on a particular page. You can also use the Inspect tool in the Elements tab to find the element that hosts that bit of text, to narrow things down in your search.
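If grep isn't available on the machine, the same "find all" search can be sketched in a few lines of Python. The function name `find_in_files` and the list of file extensions are assumptions for illustration; adjust the extensions to whatever the theme and language files in the install actually use.

```python
from pathlib import Path


def find_in_files(root, needle, suffixes=(".php", ".tpl", ".twig", ".html")):
    """Return (path, line_number, line) for each line under root containing needle.

    The suffixes default is a guess at typical template/language file types,
    not something taken from the site in question.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            # errors="replace" keeps the scan going on oddly encoded files
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Searching for a short, static fragment such as `find_in_files("path/to/site", "having trouble checking out")` is usually more reliable than the full sentence, since the e-mail address or other parts may be filled in from a setting.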
To add to this, if content is generated on the server side, the resources you see will likely be a merge from multiple generated sources, e.g. with templates in your case. You can search your solution for aspects of the DOM elements you see in Chrome Developer Tools, but look for the static parts instead of the dynamic parts. For example, the text itself won't be part of the template file, a placeholder will exist - a CSS class could be useful.

How to use Google Blink/Webkit to render HTML code

Sorry if the title is somewhat ambiguous.
I'm building an app that receives a URL and then returns the final HTML code (and saves it for caching) after Ajax and other JS features have executed (something like PhantomJS).
My language can call C++ code, so I think it would be nice if I could build and use the Blink/WebKit library directly.
The issue is that the documentation for both Blink and WebKit is too large.
UPDATE 1: Which API (Blink has many API layers) or particular class do I need to look at?
Do you know any example or tutorial I should look at?
Or any simpler alternative library?
Thanks
The Chromium project now has a headless API in development, with a very good example that can be built using ninja. More information is available in their project at https://chromium.googlesource.com/chromium/src/+/master/headless/
A video from BlinkOn https://www.youtube.com/watch?v=GivjumRiZ8c&t=838s

Clever way to use 1 html-template for multiple applications in Flex Builder?

The html-template portion of Flex Builder is currently only limited to 1 template. This works fine when you only have 1 application, but I have several applications in one project, each of which takes different flashParams.
I was wondering if anyone thought of a clever way of using 1 template to be used by multiple apps. So far, the only thing I can think of is to just place the embed code for each app in the html-template, and have it hidden by default, with links to enable.
Any other ideas?
Thanks
I ended up just doing what I mentioned in my original post. I simply placed embed code for each app in the html-template, and set them all to hidden by default. To enable one, I simply have javascript links at the header of the page for each app.