Is there any way to read a webpage using C++?

I know that we can retrieve a webpage's content with curl (http://curl.haxx.se/), but is there any native way to retrieve the content of a webpage using C++ without using any library?

You will always need some kind of library in order to establish a network connection (I count OS APIs as libraries). That aside, you would have to:
establish a connection to the server
send an HTTP request
receive and handle the HTTP response
You can implement these steps by hand, but that really is a pain, especially because HTTP is quite a complex protocol (even if you only implement the parts you actually use, plenty remains).
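
To give a feel for the "by hand" route, here is a minimal sketch of the two text-handling steps: building a GET request and reading the status code from the reply. The function names build_get_request and parse_status_code are made up for this illustration, and the actual socket connect/send/recv calls (which are OS-specific) are left out.

```cpp
#include <string>

// Build a minimal HTTP/1.1 GET request. "Connection: close" asks the server
// to close the socket after the response, so a client can simply read to EOF.
std::string build_get_request(const std::string& host, const std::string& path) {
    std::string req;
    req += "GET " + (path.empty() ? std::string("/") : path) + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";
    req += "\r\n";  // blank line ends the header block
    return req;
}

// Extract the numeric status code from a response that starts with a status
// line such as "HTTP/1.1 200 OK". Returns -1 if it does not look like one.
int parse_status_code(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) return -1;
    const std::size_t sp = response.find(' ');
    if (sp == std::string::npos || sp + 4 > response.size()) return -1;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        if (response[i] < '0' || response[i] > '9') return -1;
        code = code * 10 + (response[i] - '0');
    }
    return code;
}
```

Everything a real client also has to cope with (redirects, chunked transfer encoding, compression, TLS) is exactly the part that makes an existing library attractive.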

If you are on Windows, you can use the following WinINet functions:
InternetOpen() - Initializes an application's use of the WinINet functions.
http://msdn.microsoft.com/en-us/library/aa385096(VS.85).aspx
InternetOpenUrl() - Opens a resource specified by a complete FTP, Gopher, or HTTP URL.
http://msdn.microsoft.com/en-us/library/aa385098(VS.85).aspx
InternetReadFile() - Reads data from a handle opened by InternetOpenUrl().
http://msdn.microsoft.com/en-us/library/aa385103(VS.85).aspx
InternetCloseHandle() - Closes a single Internet handle
http://msdn.microsoft.com/en-us/library/aa384350(VS.85).aspx
Hope it helps
PS: or you can use a more convenient function
URLDownloadToFile() - Downloads bits from the Internet and saves them to a file.
http://msdn.microsoft.com/en-us/library/ms775123(v=vs.85).aspx

Related

How to implement secure socket communication in c++ application using winsock?

I am trying to implement secure communication between a server and client in c++. The limitation is that both the client and server must run on windows and have to be in c++. This is for a research project I am working on at my university.
So far I have found that SChannel is the best option, but the documentation is extremely confusing and I can not find any guides/tutorials on how to use it. I have already looked at this link https://learn.microsoft.com/en-us/windows/desktop/secauthn/creating-a-secure-connection-using-schannel but still do not understand how to get it working. Could someone guide me through this if this is the best way?
I also looked into using SslStream via the CLR, to have .NET run inside of a C++ application. However, I cannot use this because the client application is threaded and threads can't be used with CLR.
I already have a dummy client and server set up with communication between the two, I am just trying to secure and encrypt that communication.
Any help is greatly appreciated!

Whichever SSL library you choose to use there are a few things you need to know as a beginner in this field:
The server and client implementations will end up looking quite different in places.
Your server is absolutely going to need a certificate with a private key. During development you clearly don't want to get one from Verisign or something so you need to create a self-signed certificate. You can do this with openssl or other tools.
The certificate consists of a private part and a public part. The public part needs to go to the client, and will be used to validate the connection. When you are using something like SChannel the certificates (private and public) will need to be installed in the certificate stores of the server and client respectively.
SChannel does not send or receive data for you. So the core of your implementation is going to be: when the network has data, read ciphertext from the socket and write it to SChannel, then read clear text from SChannel (if any) and pass it to the application. When the application has data to send, get clear text from the application and pass it to SChannel, then get the resulting ciphertext buffers from SChannel and write them to the socket.
Buffers from the network may be partial, and negotiations and renegotiations mean there is no 1:1 mapping between passing data into SChannel and getting data out.
You therefore can't get away with a naive implementation that calls SChannel once to pass data in and once again to get decrypted or encrypted data out. There may be nothing available, or a whole lot of packets to exchange between the client and the server, before you get any application bytes; i.e., you will need some kind of state machine to keep track of this.
Obviously, don't write both the client and server at the same time: Start with your client against an https server.
That's the general outline of the process - the things that confused me when I first encountered SSL and why none of the samples were nearly as simple as I had hoped them to be.
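
The point about partial buffers can be illustrated without any SChannel calls at all. The sketch below only shows the buffering problem: bytes from recv() are accumulated, and complete TLS records are handed out one at a time. The class name RecordBuffer is invented for this example. In real SChannel code you would normally not parse record headers yourself, since the decrypt and handshake calls report incomplete input (SEC_E_INCOMPLETE_MESSAGE) and leftover bytes; the bookkeeping you end up writing has the same shape, though.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates bytes from the socket and hands out complete TLS records.
// A TLS record starts with a 5-byte header: content type (1 byte),
// protocol version (2 bytes), payload length (2 bytes, big-endian).
class RecordBuffer {
public:
    // Append whatever recv() returned: a fragment, one record, or several.
    void feed(const std::uint8_t* data, std::size_t len) {
        buf_.insert(buf_.end(), data, data + len);
    }

    // Move one complete record (header + payload) into `out`.
    // Returns false when more bytes are needed from the network.
    bool next_record(std::vector<std::uint8_t>& out) {
        const std::size_t kHeader = 5;
        if (buf_.size() < kHeader) return false;
        const std::size_t payload =
            (static_cast<std::size_t>(buf_[3]) << 8) | buf_[4];
        const std::size_t total = kHeader + payload;
        if (buf_.size() < total) return false;
        out.assign(buf_.begin(), buf_.begin() + total);
        buf_.erase(buf_.begin(), buf_.begin() + total);
        return true;
    }

private:
    std::vector<std::uint8_t> buf_;  // bytes received but not yet consumed
};
```

A single read from the socket can therefore yield zero, one, or many records, which is why the calling code has to loop instead of assuming one call in, one result out.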

How to access webpage text data dynamically to system process?

I have a webpage with some form that I can access through a URL http://server... I also have an application written in C++ that I want to get that form data.
How can I access a webpage through a c++ code?
How can I get that form data from webpage to c++ code?
Next I need to dynamically get this value when I edit any data in the webpage and press submit.
How can I execute this approach in C++? Should I choose any thread mechanism inside C++ or any socket?

How can I access a webpage through C++ code?
The scheme of the URL implies that you can access it using the Hypertext Transfer Protocol. The specification of HTTP tells you how to use the protocol, and you can write a client based on that. You may get away with much less work by using an existing HTTP client implementation that has a C++ (or C) API.
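
Whichever route you take, the first step is to take the URL apart so you know which host and port to connect to and which path to request. A rough sketch, assuming a simple scheme://host[:port][/path] URL (UrlParts and split_url are names made up here; userinfo, IPv6 literals and fragments are not handled):

```cpp
#include <string>

struct UrlParts {
    std::string scheme;  // e.g. "http"
    std::string host;    // e.g. "server.example"
    std::string port;    // explicit port, or the default for the scheme
    std::string path;    // always starts with '/'
};

// Split "scheme://host[:port][/path]" into its parts.
// Returns false if "://" is missing or the host is empty.
bool split_url(const std::string& url, UrlParts& out) {
    const std::size_t npos = std::string::npos;
    const std::size_t sep = url.find("://");
    if (sep == npos) return false;
    out.scheme = url.substr(0, sep);

    const std::size_t host_begin = sep + 3;
    const std::size_t path_begin = url.find('/', host_begin);
    const std::string authority = url.substr(
        host_begin, path_begin == npos ? npos : path_begin - host_begin);
    out.path = (path_begin == npos) ? std::string("/") : url.substr(path_begin);

    const std::size_t colon = authority.find(':');
    if (colon == npos) {
        out.host = authority;
        out.port = (out.scheme == "https") ? "443" : "80";
    } else {
        out.host = authority.substr(0, colon);
        out.port = authority.substr(colon + 1);
    }
    return !out.host.empty();
}
```

An existing HTTP library does this (and much more) for you, which is one reason to prefer it over a hand-written client.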
How can I get that form data from the webpage to C++ code?
As it is a webpage, presumably the form is described in HyperText Markup Language. Check the content to be sure. You can write an HTML parser based on the HTML standard specification. You may get away with much less work by using an existing HTML parser that has a C++ (or C) API.
When I edit any data in the webpage and press submit, I need to get this data to C++ code.
The form will need to send the submit to a webserver that executes your C++ program. This is usually achieved through Common Gateway Interface. You may get away with much less work by using an existing webserver that supports CGI, and an existing C++ (or C) CGI library. If you choose to write the webserver yourself, then you could write the submit response as part of the webserver program and you won't need CGI, but that would probably make your server poorly reusable.
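
As a rough idea of what the C++ side of a CGI program has to do: with CGI, the web server passes a GET form's data in the QUERY_STRING environment variable and a POST form's data on standard input, normally encoded as application/x-www-form-urlencoded. The sketch below only decodes such a string; the names url_decode and parse_form are made up for this example, and reading the environment or stdin is left to the caller.

```cpp
#include <cctype>
#include <map>
#include <string>

// Decode one application/x-www-form-urlencoded component:
// '+' becomes a space and "%XX" becomes the byte with hex value XX.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split "name=Alice&city=New+York" into decoded key/value pairs.
// A key without '=' maps to an empty string.
std::map<std::string, std::string> parse_form(const std::string& body) {
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    while (pos <= body.size()) {
        std::size_t amp = body.find('&', pos);
        if (amp == std::string::npos) amp = body.size();
        const std::string pair = body.substr(pos, amp - pos);
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            const std::string key = url_decode(pair.substr(0, eq));
            const std::string val = (eq == std::string::npos)
                                        ? std::string()
                                        : url_decode(pair.substr(eq + 1));
            fields[key] = val;
        }
        pos = amp + 1;
    }
    return fields;
}
```

An existing CGI library would also take care of multipart uploads and of reading exactly as many bytes as the server announces, which this sketch does not attempt.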
should i choose any thread mechanism
Webservers typically use multiple threads to handle requests, and if you write your own server, then I recommend also doing so. It's not absolutely necessary however.
Also, if your client program that reads the form has other things to do than wait for the server to respond, then an asynchronous request allows you to do those other things while waiting.
or any socket
If you choose to implement the client and/or the server yourself, then yes, you'll probably need to use sockets to communicate over network.

Not downloading a file correctly

I'm using the following line to download a file, and when I do that, it's not downloading the most recent file.
HRESULT hr = URLDownloadToFile(NULL, _T("http://example.com/users.txt"), _T("users.txt"), 0, NULL);
On the first run, users.txt has 3 names in it, if you were to remove a name, and run it again it still downloads with 3 names.
I'm using remove("users.txt"); to remove the file prior to download.

It is probably operating system specific, or at least you need a library for HTTP client side.
You need to read a lot more about the HTTP protocol. The formulation of your question makes me believe you don't understand much about it.
On some OSes (notably Linux and POSIX compliant ones), you can use libcurl (which is a good HTTP client free software library)
URLDownloadToFile seems to be a Windows specific thing. Did you carefully read its documentation? It is returning some error code. Do you handle hr correctly?
You can probably only get what the HTTP protocol (the web server's response to an HTTP GET request) gives you: mostly the MIME type of the content at the URL, the content size, and the content bytes (plus details such as the content encoding). The fact that the content has 3 names is your understanding of it.
Try to read more about the HTTP protocol, and understand what is really going on. Are any cookies or sessions involved? Did you try to use something like telnet to manually make the HTTP exchange? Are you able to show it and understand it? What is the HTTP response code?
If you have access to the server (e.g. using ssh) and are able to look into the log files, try to understand what exchanges happened and what HTTP status (i.e. error code) was sent back. Perhaps set up some Linux box locally for initial tests, or set up some HTTP server locally and use http://localhost/ etc.
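
If you capture the raw exchange (for example from a telnet session), a small helper makes it easier to see what the server actually said. This is only a sketch: ResponseHead and parse_response_head are names invented here, and it handles plain HTTP/1.x header blocks without folded lines. Headers such as Cache-Control or Last-Modified may hint at whether you are looking at an older copy of the resource.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

struct ResponseHead {
    int status = 0;                              // e.g. 200, 304, 404
    std::map<std::string, std::string> headers;  // header names lower-cased
};

// Parse the status line and header fields of a raw HTTP/1.x response.
// Returns false if the header block is incomplete or malformed.
bool parse_response_head(const std::string& raw, ResponseHead& out) {
    const std::size_t npos = std::string::npos;
    const std::size_t end = raw.find("\r\n\r\n");  // blank line ends the head
    if (end == npos) return false;

    const std::size_t line_end = raw.find("\r\n");
    const std::string status_line = raw.substr(0, line_end);
    const std::size_t sp = status_line.find(' ');
    if (status_line.compare(0, 5, "HTTP/") != 0 || sp == npos) return false;
    out.status = std::atoi(status_line.c_str() + sp + 1);

    std::size_t pos = line_end + 2;
    while (pos < end) {
        const std::size_t eol = raw.find("\r\n", pos);
        const std::string line = raw.substr(pos, eol - pos);
        const std::size_t colon = line.find(':');
        if (colon != npos) {
            std::string name = line.substr(0, colon);
            std::transform(name.begin(), name.end(), name.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::tolower(c));
                           });
            const std::size_t v = line.find_first_not_of(" \t", colon + 1);
            out.headers[name] = (v == npos) ? std::string() : line.substr(v);
        }
        pos = eol + 2;
    }
    return out.status >= 100 && out.status <= 599;
}
```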

HTML Forwarding

So I've been playing around with some simple HTML forwarding with c++. Haven't accomplished much and I have some questions on the backbone.
First: Do I need to use any special libraries other than socket libraries to simply forward HTML data and connections?
Second: When a client connects to an HTML server, is the TCP connection kept open? Or is it closed once data is sent?
Third: When I forward data, from a client to the server, the packet includes the destination address. I should technically be able to read this address and connect to the server via port 80, keep it open, and send and receive on that newly opened port right? Is there anything I have to do? Any time constraints? If I directly forward every single packet directly between the client and server the website should show up correctly on the client, correct?
I would prefer to keep any external libs to a minimum. But if necessary I can expand the program to include any required libraries.
So far I've gotten data to and from both parties, however the website does not function.
[platform] :: windows.primary && posix_compliant.secondary

First: No, you do not need other special libraries, but not using any that are available would to some extent be reinventing the wheel.
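Second: It depends. HTTP is a stateless protocol, but it runs over a TCP connection. With HTTP/1.0 the connection is closed after each response by default; with HTTP/1.1 it is kept open for further requests unless either side sends "Connection: close".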
Third: An HTTP session begins with a request header, which in your case sounds like a POST. A POST may take more than one packet, during which time the connection remains open. The server may well time you out.
You might look at libcurl even if you do not intend to use it. (The source for that is in C, and is rather monolithic, but it is commonly used.)
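
For a forwarder written on plain sockets, two pieces of the request header matter most: the Host header, which names the server to connect to, and the rules for whether the TCP connection stays open afterwards. A minimal sketch, with made-up helper names (to_lower, header_value, keeps_connection_open) and assuming the complete header block has already been read into a string:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy, used for case-insensitive comparisons.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Return the value of header `name` from a raw header block, or "" if absent.
std::string header_value(const std::string& head, const std::string& name) {
    const std::string lower = to_lower(head);
    const std::string key = "\r\n" + to_lower(name) + ":";
    const std::size_t at = lower.find(key);
    if (at == std::string::npos) return "";
    std::size_t end = head.find("\r\n", at + key.size());
    if (end == std::string::npos) end = head.size();
    const std::size_t begin = head.find_first_not_of(" \t", at + key.size());
    if (begin == std::string::npos || begin > end) return "";
    return head.substr(begin, end - begin);
}

// HTTP/1.1 connections are persistent unless "Connection: close" is sent;
// HTTP/1.0 connections close unless "Connection: keep-alive" is sent.
bool keeps_connection_open(const std::string& head) {
    const std::string first_line = head.substr(0, head.find("\r\n"));
    const bool http11 = first_line.find("HTTP/1.1") != std::string::npos;
    const std::string conn = to_lower(header_value(head, "Connection"));
    if (conn.find("close") != std::string::npos) return false;
    if (conn.find("keep-alive") != std::string::npos) return true;
    return http11;
}
```

A forwarder that ignores these rules, for instance by waiting for the server to close a persistent connection before passing data on, is one plausible way to end up with a page that never finishes loading.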

After doing quite a bit of research, the greatest help I've had in my endeavors has been this website.
This one also helped quite a bit.
libcurl is certainly the way to go. It's kind of dated, and everything is in C, but it's much easier than redoing everything.
quote from second site:
Like most network protocols, HTTP uses the client-server model: An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions).

Advice on web services without HTTP

My company is planning on implementing a remote programming tool to configure embedded devices in the field. I assumed that these devices would have an HTTP client on them, and planned to implement some REST services for them to access. Unfortunately, I found out that they have a TCP stack but no HTTP client. One of my co-workers suggested that we try to send “soap packets” over port 80 without an HTTP client. The devices also don’t have any SOAP client. Is this possible? Would there be implications if there was a web server running on the network the devices are connected to? I’d appreciate any advice or best practices on how to implement something like this.

If your servers are serving simple files, the embedded devices really only need to send an HTTP GET request (possibly with a little extra data identifying the device, so the server can know which firmware version to send).
From there, it's pretty much a simple matter of reading the raw data coming in on the embedded device's socket -- you might need to only disregard the HTTP header on the response, or you could possibly configure your server to not send it for those requests.
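
Disregarding the header comes down to finding the blank line that separates it from the payload. A small sketch in deliberately plain C-style C++, since the target is an embedded device (find_body is a name made up for this example; the buffer is assumed to hold everything received so far):

```cpp
#include <cstddef>
#include <cstring>

// Return a pointer to the first body byte in `buf`, or nullptr if the blank
// line ("\r\n\r\n") that ends the HTTP header has not arrived yet.
// `len` is the number of bytes received so far.
const char* find_body(const char* buf, std::size_t len) {
    static const char kSep[] = "\r\n\r\n";
    for (std::size_t i = 0; i + 4 <= len; ++i) {
        if (std::memcmp(buf + i, kSep, 4) == 0) {
            return buf + i + 4;  // first byte after the separator
        }
    }
    return nullptr;
}
```

If the function returns nullptr, the device simply keeps reading from the socket and tries again with the larger buffer.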

You don't really need an HTTP client per se. HTTP is a very simple text-based protocol that you can implement yourself if you need to.
That said, you probably won't need to implement it yourself. If they have a TCP stack and a standard sockets library, you can probably find a simple C library (such as this one) that wraps up HTTP or SOAP functionality for you. You could then just build that library into your application.

Basic HTTP is not a particularly difficult protocol to implement by hand. It's a text and line based protocol, save for the payload, and the servers work quite well with "primitive, ham fisted" clients, which is all a simple client needs to be.
If you can get by with just a subset, which is likely, then simply write it and be done.

You can implement a trivial HTTP client over sockets (here is an example of how to do it in Ruby: http://www.tutorialspoint.com/ruby/ruby_socket_programming.htm).
It probably depends what technology you have available on your embedded devices - if you can easily consume JSON or XML then a webservice approach using the above may work for you.
