Linux C++ Raw Socket Sniffer - recvfrom() fails when ostringstream introduced - c++

I am writing a packet sniffer in C++ utilizing streams instead of printf() to store and create output. The problem I've run into is that recvfrom() seems to fail and return -1 when I have two or more statements that generate output using a stream.
If I comment one of the two output generating statements, the program runs fine. Through trial and error, I've found that by removing the std::setw() from the std::cout statement, it will work correctly and display both the packet and the "beef" message.
Any ideas or help would be much appreciated as I am at a loss and considering reverting back to using printf() since it never had this problem (and is faster than a stream). I admit, this is really the first time I have ever used ostringstream and I may be using it incorrectly.
My simplified source code:
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <arpa/inet.h>
#include <netinet/if_ether.h>
#include <linux/if_packet.h>
std::string BufferInHex( unsigned char * buffer, int length )
{
std::ostringstream out;
for( int i = 0; i < length; i ++ ) {
if( i % 16 == 0 && i != 0 ) {
out << "\n";
}
else if( i % 8 == 0 && i != 0 ) {
out << " ";
}
out << std::hex;
out << std::setfill('0') << std::setw(2) << static_cast<unsigned>(buffer[i]) << " ";
out << std::dec;
}
return out.str();
}
int main( void )
{
struct sockaddr_ll saddr = {0};
socklen_t saddr_size = sizeof(saddr);
unsigned char packet[1500] = {0};
int sockFd = socket( AF_PACKET, SOCK_RAW, htons(ETH_P_ALL) );
if( sockFd < 0 ) {
std::cerr << "Error creating socket!\n";
return 1;
}
int data_size = recvfrom( sockFd, packet, sizeof( packet ), 0, (struct sockaddr*)&saddr, &saddr_size );
if( data_size == -1 ) {
std::cerr << "Error in recvfrom()\n";
return 2;
}
std::cout << std::hex;
std::cout << std::setw( 8 ) << ntohs( 0xADDE ) << "\n";
std::cout << std::dec;
std::cout << BufferInHex( packet, data_size ) << "\n";
return 0;
}
It is being compiled with g++ on CentOS 6.4 (kernel 2.6.32) using the following command:
g++ sniff.cpp -o sniff -Wall
Thanks for any ideas or help,
Jeremiah

Related

setsockopt throws invalid argument error on MacOS

main.cpp:
#include <sys/socket.h>
#include <cerrno>
#include <cstring>
#include <string>
#include <iostream>
#include <unistd.h>
int main() {
int s;
s = socket( AF_UNIX, SOCK_STREAM, 0 );
int val = 0;
socklen_t size = (socklen_t)sizeof(val);
if ( setsockopt( s, SOL_SOCKET, SO_SNDBUF, &val, size ) == -1 ) {
std::cerr << "Set sock option failed" << std::endl;
std::cerr << "Errno: " << errno << std::endl << "Error Message: " << std::strerror(errno) << std::endl;
return -1;
}
return 0;
}
compiled with:
g++ main.cpp
results in:
Set sock option failed
Errno: 22
Error Message: Invalid argument
This same code runs perfectly on Linux, but gives the above error on macOS (10.15.7, g++ is Apple clang version 12.0.0). Additionally, with other options (SO_DEBUG, for example) it completes without error. What could be causing this?

What should the error statements be for exit 3 and 4 in my c++ simple buffer code?

I need to add error statements above exits three and four and then actually force the code to reach these error statements, so I can have screenshot proof that they are working. However, I can't quite work out what should be in each. My initial thoughts are 'The output file can't be created' for 3 and 'The file you want to read from is empty' for 4, but I can't seem to trigger these errors, so I feel like that's not correct.
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <iostream>
using namespace std;
#define BUF_SIZE 500
#define OUTPUT_MODE 0700
int main(int argc, char *argv[])
{
int in_file, out_file;
int read_size = 1, write_size;
char buf[BUF_SIZE];
if (argc != 3){
cout<<"The command-line input does not contain 3 arguments"<<endl;
exit(1);
}
in_file= open(argv[1], O_RDONLY);
if (in_file < 0) {
cout<<"The file you are trying to copy from doesnt exist"<<endl;
exit(2);
}
out_file = creat(argv[2], OUTPUT_MODE);
if (out_file < 0) {
cout<<"Error statement 3"<<endl;
exit(3);
}
while (read_size > 0) {
read_size = read(in_file, buf, BUF_SIZE);
if (read_size <0){
cout<<"Error statement 4"<<endl;
exit(4);
}
write_size = write(out_file, buf, read_size);
if (write_size<=0){
close(in_file);
close(out_file);
cout<<"Reading and writing from and to files is complete"<<endl;
exit(5);
}
}
}
The way to work out how these functions can (and will) fail is to read their documentation.
https://man7.org/linux/man-pages/man2/read.2.html
https://man7.org/linux/man-pages/man2/open.2.html
(Note: the documentation for creat says it is equivalent to calling open with specific arguments.)
Each page has an ERRORS section that lists the values errno can be set to and why.
creat, for example, will fail with EROFS on a read-only file system.
Most standard library functions will set errno to give you the reason they failed. Use that information, and write your error messages to stderr:
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <iostream>
using namespace std;
#define OUTPUT_MODE 0700
int
main(int argc, char *argv[])
{
int in_file, out_file;
int read_size, write_size;
char buf[BUFSIZ];
if (argc != 3){
cerr << "Missing arguments" <<endl;
exit(EXIT_FAILURE);
}
in_file= open(argv[1], O_RDONLY);
if( in_file < 0 ){
cerr << argv[1] << ": " << strerror(errno) << endl;
exit(EXIT_FAILURE);
}
out_file = creat(argv[2], OUTPUT_MODE);
if( out_file < 0 ){
cerr << argv[2] << ": " << strerror(errno) << endl;
exit(EXIT_FAILURE);
}
while( (read_size = read(in_file, buf, BUFSIZ)) > 0 ){
const char *s = buf;
do {
write_size = write(out_file, s, read_size);
read_size -= write_size;
if( write_size <= 0 ){
cerr << argv[2] << ": " << strerror(errno) << endl;
exit(EXIT_FAILURE);
}
s += write_size;
} while( read_size > 0);
}
if( read_size < 0 ){
cerr << argv[1] << ": " << strerror(errno) << endl;
exit(EXIT_FAILURE);
}
close(in_file);
close(out_file);
}
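To get the screenshot proof for exits 3 and 4, it may help to see which inputs make creat() and read() fail. The sketch below uses two hypothetical helpers, creatError and readError (my names, not from the question), that return the errno value of a failed call. It assumes a Linux-like system, where creating a file inside a missing directory fails with ENOENT and calling read() on a directory fails with EISDIR.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: tries to create `path` and returns 0 on success,
// or the errno value that explains why creat() failed.
int creatError(const char *path) {
    int fd = creat(path, 0700);
    if (fd < 0) return errno;
    close(fd);
    return 0;
}

// Hypothetical helper: opens `path` read-only and attempts a single read().
// Returns 0 on success, or the errno value of whichever call failed.
int readError(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return errno;
    char buf[16];
    int err = 0;
    if (read(fd, buf, sizeof buf) < 0) err = errno;  // save errno before close()
    close(fd);
    return err;
}
```

If those assumptions hold, running the copy program with an output path inside a directory that does not exist should reach exit 3, and passing a directory (for example /) as the input file should reach exit 4. Passing the returned value to strerror() gives the text for the message.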

TCP server message extra characters c++

I think I am missing something, because this doesn't make sense.
I am writing a pretty simple TCP server. Everything works pretty much as expected, but when a message saying "500 LOGIN FAILED" gets sent over the network, it gets interpreted as "$500 LOGIN FAILED".
I am testing my server using telnet on localhost.
Here is a simplified version of my code:
recv(c_sockfd, buf, BUFFSIZE, 0))
inBuffer.push_back(buf);
auto messageToSend = checkResponse(parseBuffer(inBuffer.back()));
//get the second thing in the tuple
outBuffer.push_back(std::get<1>(messageToSend));
bzero(buf, sizeof(buf));
send(c_sockfd, &outBuffer.back(), sizeof( outBuffer.back() ), 0)
In the checkResponse function, I implement the logic that decides what message to send, and somehow when I send an ERROR message an extra character is added at the beginning of the message.
EXAMPLE 1:
Connected to localhost.
Escape character is '^]'.
200 LOGIN
Robot345\r\n
201 PASSWORD
674\r\n
202 OK
INFO iasdijasdjiajsdiajdijasidjiansdjsdvhdf dfvsdfsdf\r\n
&501 SYNTAX ERROR
Notice the "&" character
EXAMPLE 2:
Connected to localhost.
Escape character is '^]'.
200 LOGIN
Robot345\r\n
201 PASSWORD
456\r\n
$500 LOGIN FAILED
Notice the "$" character
Does anyone have any idea where the extra characters could be added to the string?
I didn't want to include the full code, because the requirement was to have it all in one file, which makes it difficult to read. Here it goes, though.
FULL CODE:
#include <iostream>
#include <regex>
#include <iterator>
#include <vector>
#include <sstream>
#include <string>
#include <stdio.h>
#include <string.h>
#include <string.h>
#include <netdb.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <tuple>
#define MIN_PORT 3000
#define MAX_PORT 3999
#define BUFFSIZE 1000
/**
0 - LOGIN SUCCESSFUL, USERNAME IS IN THE BUFFER
1 - PASSWORD CHECK
2 - PASSWORD OK, COMMUNICATING
*/
int state = 0;
std::string username, password;
/**
CHECKS ENTERED PASSWORD BASES ON THE SUM OF ASCII VALUES OF USERNAME
#return: true on success, false otherwise
*/
bool checkPassword(std::string password){
std::istringstream sst;
sst.str(username);
unsigned char byte = '\0';
int value = 0;
// std::cout << "byte first time: " << byte << std::endl;
// std::cout << "byte first time INT: " << (int) byte << std::endl;
while (sst >> byte) {
std::cout << "second time: " << byte << std::endl;
std::cout << "second time INT: " << (int) byte << std::endl;
std::cout << "Running SUM: " << (int) value << std::endl;
value += byte;
}
std::cout << "sum: " << value << std::endl;
// Check the entered password
if (password == std::to_string(value)) {
return true;
}
return false;
}
/**
CHECKS MESSAGE SYNTAX BASED ON THE STATE WE ARE IN
CHECKS PASSWORD
CHECKS CHECK SUM
#param response <string type (if available), string message to parse>
#return TRUE on success, FALSE otherwise
*/
bool checkMessage(std::tuple<std::string,std::string> response){
auto messageToParse = std::get<1>(response);
std::string delimeter = "\r\n";
std::string::size_type pos = messageToParse.find(delimeter);
//INITIAL CHECK
if (pos < 1){
return false;
}
//somehow you have to multiply the length by 2
auto parsedMessage = messageToParse.substr(0,pos - 2*delimeter.length());
std::cout << parsedMessage << " : THIS IS YOUR PARSED MESSAGE";
//USERNAME
if (state == 0) {
username = parsedMessage;
return true;
}
//PASSWORD CHECK
if (state == 1 && checkPassword(parsedMessage)) {
password = parsedMessage;
return true;
}
if (state == 2) {
std::string type = std::get<0>(response);
//INFO
if( type == "I" ){
return true;
}
//PHOTO
if ( type == "F") {
return true;
}
}
return false;
}
/**
THIS FUNC WILL CHECK RESPONSE FROM THE ROBOT, AND DECIDE WHAT TO DO BASED ON THE STATE
#return tuple<bool TRUE if everything is right,std::string MESSAGE to send to the robot>
*/
std::tuple<bool,std::string> checkResponse(std::tuple<std::string, std::string> response){
if (state == 0) {
if (checkMessage(response)) {
std::cout << state << " / / state" << std::endl;
return std::make_tuple(true, "201 PASSWORD\r\n");
}
}
if (state == 1) {
// THERE WILL ALSO BE A CONDITION HERE THAT THE PASSWORD IS CORRECT
if(checkMessage(response)){
std::cout << state << " / / / state" << std::endl;
return std::make_tuple(true, "202 OK\r\n");
}else{
std::cout << state << " / / / / state" << std::endl;
return std::make_tuple(false, "500 LOGIN FAILED\r\n");
}
}
if (state == 2) {
if (checkMessage(response)) {
std::cout << state << " / / / / / state" << std::endl;
return std::make_tuple(true, "202 OK\r\n");
}else{
std::cout << state << " / / / / / / state" << std::endl;
return std::make_tuple(false, "501 SYNTAX ERROR \r\n");
}
}
std::cout << state << " / / / / / / / / state" << std::endl;
return std::make_tuple(false, "unexpected result");
}
/**
This func will parse the incoming buffer
#param buffer incoming buffer
#return tuple <String type of message (U,I,P,F), String actual message>
*/
std::tuple<std::string, std::string> parseBuffer(std::string buffer){
if (state == 0) {
return std::make_tuple("U", buffer);
}
if (state == 1) {
return std::make_tuple("P", buffer);
}else{
std::string delimeter = " ";
std::string::size_type pos = buffer.find(delimeter);
std::string type = buffer.substr(0, pos );
std::string message = buffer.erase(0, pos + delimeter.length());
return std::make_tuple(type, message);
}
}
int main(int argc, char *argv[])
{
char buf[BUFFSIZE];
std::vector<std::string> outBuffer;
std::vector<std::string> inBuffer;
int sockfd, c_sockfd;
sockaddr_in my_addr, rem_addr;
socklen_t rem_addr_length;
int mlen;
const int PORT_NUM = atoi(argv[1]);
if( (PORT_NUM > MAX_PORT) || (PORT_NUM < MIN_PORT)){
perror("Port number is not acceptable");
exit(-1);
}
if ((sockfd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1)
{
perror("Cannot open socket");
exit(-1);
}
bzero(&my_addr, sizeof(my_addr));
my_addr.sin_family = AF_INET;
my_addr.sin_port = htons(PORT_NUM);
std::cout << PORT_NUM << " PORT NUM" << std::endl;
if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(my_addr)) == -1)
{
perror("Error in bind");
close(sockfd); exit(1);
}
if (listen(sockfd, SOMAXCONN) == -1)
{
perror("Cannot listen");
close(sockfd); exit(1);
}
while (1)
{
rem_addr_length=sizeof(rem_addr);
c_sockfd = accept(sockfd, (struct sockaddr*) &rem_addr, &rem_addr_length);
if ( c_sockfd == -1)
{
perror("Cannot accept");
close(sockfd); exit(1);
}
///FIRST MESSAGE
std::string ok = "200 LOGIN\r\n";
send(c_sockfd, &ok, sizeof(std::string), 0);
if ((mlen = recv(c_sockfd, buf, BUFFSIZE, 0)) == -1)
perror("Error while reading");
else{
while (mlen)
{
///---------- MAIN PART--------------
//This is where comunication is happening
inBuffer.push_back(buf);
//Parse the buffer, check the message and
auto messageToSend = checkResponse(parseBuffer(inBuffer.back()));
//get the second thing in the tuple
outBuffer.push_back(std::get<1>(messageToSend));
bzero(buf, sizeof(buf));
///---------- MAIN PART--------------
state++;
std::cout << state << " state num" << std::endl;
if (send(c_sockfd, &outBuffer.back(), sizeof( outBuffer.back() ), 0) == -1)
{
perror("Error while writing");
break;
}else{
}
std::cout << inBuffer.back() << std::endl;
if ((mlen = recv(c_sockfd, buf, BUFFSIZE, 0)) == -1)
{
perror("Error while reading");
break;
}
}
close(c_sockfd);
}
}
}
The problem is with this:
std::vector<std::string> outBuffer;
and this:
send(c_sockfd, &outBuffer.back(), sizeof( outBuffer.back() ), 0)
You can't send std::string objects over the network. You must send the string it contains. Those are two very different things.
For a simple fix, do e.g.
send(c_sockfd, outBuffer.back().c_str(), outBuffer.back().length(), 0)
If you want to send the terminating null then add one to the length to send.
For more details: implementations of std::string are allowed to optimize small strings by storing them inside the object itself, but otherwise a std::string object is really nothing more than a size and a pointer to the actual string (implementations might have other members as well).
A pointer is unique to the currently running process on the host system. You can't transfer a pointer over the network. You can't even save a pointer to a file and then load it again and have it working in a new process (even if it's a process from the same program).
By sending the std::string object, all you're really sending is this pointer (plus whatever else happens to be stored in the object). The receiving side has no idea what you're really sending or how it should treat it.
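As a slightly more complete version of the fix, here is a minimal sketch of a helper that sends the characters a std::string holds rather than the object. The name sendAll is hypothetical (it is not in the question's code). The loop is there because send() on a stream socket may accept fewer bytes than requested.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <string>

// Hypothetical helper: sends every byte of `msg` over `fd`.
// Returns false if send() reports an error before everything was sent.
bool sendAll(int fd, const std::string& msg) {
    std::size_t sent = 0;
    while (sent < msg.size()) {
        // Send the character data the string owns, never the string object.
        ssize_t n = send(fd, msg.data() + sent, msg.size() - sent, 0);
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

In the server loop this would replace the call that passes &outBuffer.back(), i.e. something like sendAll(c_sockfd, outBuffer.back()). The same applies to the initial "200 LOGIN" message, which is also sent as a std::string object.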

posix_spawnp and piping child output to a string

I am struggling with process creation and piping the child process' output into a string of the parent process. I got it working on Windows (using CreatePipe and CreateProcess and ReadFile), but can't seem to get the exact analog on Unix to work. This is my code:
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
int exit_code;
int cout_pipe[2];
int cerr_pipe[2];
posix_spawn_file_actions_t action;
if(pipe(cout_pipe) || pipe(cerr_pipe))
cout << "pipe returned an error.\n";
posix_spawn_file_actions_init(&action);
posix_spawn_file_actions_addclose(&action, cout_pipe[0]);
posix_spawn_file_actions_addclose(&action, cerr_pipe[0]);
posix_spawn_file_actions_adddup2(&action, cout_pipe[1], 1);
posix_spawn_file_actions_adddup2(&action, cerr_pipe[1], 2);
posix_spawn_file_actions_addclose(&action, cout_pipe[1]);
posix_spawn_file_actions_addclose(&action, cerr_pipe[1]);
vector<string> argmem = {"bla"};
vector<char*> args = {&argmem[0][0], nullptr}; // I don't want to call new.
pid_t pid;
if(posix_spawnp(&pid, "echo", &action, NULL, &args[0], NULL) != 0)
cout << "posix_spawnp failed with error: " << strerror(errno) << "\n";
//close(cout_pipe[0]);
//close(cerr_pipe[0]);
close(cout_pipe[1]);
close(cerr_pipe[1]);
waitpid(pid,&exit_code,0);
cout << "exit code: " << exit_code << "\n";
// Read from pipes
const size_t buffer_size = 1024;
string buffer;
buffer.resize(buffer_size);
ssize_t bytes_read = read(cout_pipe[0], &buffer[0], buffer_size);
while ((bytes_read = read(cout_pipe[0], &buffer[0], buffer_size)) > 0)
{
cout << "read " << bytes_read << " bytes from stdout.\n";
cout << buffer.substr(0, static_cast<size_t>(bytes_read)+1) << "\n";
bytes_read = read(cout_pipe[0], &buffer[0], buffer_size);
}
if(bytes_read == -1)
cout << "Failure reading from stdout pipe.\n";
while ((bytes_read = read(cerr_pipe[0], &buffer[0], buffer_size)) > 0)
{
cout << "read " << bytes_read << " bytes from stderr.\n";
cout << buffer.substr(0, static_cast<size_t>(bytes_read)+1) << "\n";
bytes_read = read(cout_pipe[0], &buffer[0], buffer_size);
}
if(bytes_read == -1)
cout << "Failure reading from stderr pipe.\n";
posix_spawn_file_actions_destroy(&action);
}
The output is:
exit code: 0
So I suppose everything is working except the actual piping. What is wrong here? I also wonder if there is a way to read the piped bytes in a waitpid loop, but when I try that, the parent process hangs infinitely.
posix_spawn is interesting and useful, which makes this question worth necromancing -- even if it is no longer relevant to the OP.
There are some significant bugs in the code as posted. I suspect that some of these were the result of hacking in desperation, but I don't know which was the original bug:
The args array does not include the argv[0] that would represent the executable name. This results in the echo program never seeing the intended argv[1] ("bla").
The read() function is called from different places in a way that just doesn't make sense. A correct way to do this would be to only call read as part of the control expression for the while loops.
waitpid() is called before reading from the pipes. This prevents the I/O from completing (in non-trivial cases at least).
A more subtle issue with this code is that it attempts to read all of the child's stdout before reading anything from stderr. In principle, this could cause the child to block while attempting to write to stderr, thus preventing the program from completing. Creating an efficient solution to this is more complicated, as it requires that you can read from whichever pipe has available data. I used poll() for this. Another approach would be to use multiple threads.
Additionally, I have used sh (the command shell, e.g. bash) as the child process. This provides a great deal of additional flexibility, such as running a pipeline instead of a single executable. In particular, though, using sh provides the simple convenience of not having to manage the parsing of the command line.
/*BINFMTCXX: -std=c++11 -Wall -Werror
*/
#include <spawn.h> // see manpages-posix-dev
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
int exit_code;
int cout_pipe[2];
int cerr_pipe[2];
posix_spawn_file_actions_t action;
if(pipe(cout_pipe) || pipe(cerr_pipe))
cout << "pipe returned an error.\n";
posix_spawn_file_actions_init(&action);
posix_spawn_file_actions_addclose(&action, cout_pipe[0]);
posix_spawn_file_actions_addclose(&action, cerr_pipe[0]);
posix_spawn_file_actions_adddup2(&action, cout_pipe[1], 1);
posix_spawn_file_actions_adddup2(&action, cerr_pipe[1], 2);
posix_spawn_file_actions_addclose(&action, cout_pipe[1]);
posix_spawn_file_actions_addclose(&action, cerr_pipe[1]);
//string command = "echo bla"; // example #1
string command = "pgmcrater -width 64 -height 9 |pgmtopbm |pnmtoplainpnm";
string argsmem[] = {"sh","-c"}; // allows non-const access to literals
char * args[] = {&argsmem[0][0],&argsmem[1][0],&command[0],nullptr};
pid_t pid;
// posix_spawnp() returns the error number directly; it does not set errno
int rc = posix_spawnp(&pid, args[0], &action, NULL, &args[0], NULL);
if(rc != 0)
cout << "posix_spawnp failed with error: " << strerror(rc) << "\n";
close(cout_pipe[1]), close(cerr_pipe[1]); // close child-side of pipes
// Read from pipes
string buffer(1024,' ');
std::vector<pollfd> plist = { {cout_pipe[0],POLLIN}, {cerr_pipe[0],POLLIN} };
for ( int rval; (rval=poll(&plist[0],plist.size(),/*timeout*/-1))>0; ) {
if ( plist[0].revents&POLLIN) {
int bytes_read = read(cout_pipe[0], &buffer[0], buffer.length());
cout << "read " << bytes_read << " bytes from stdout.\n";
cout << buffer.substr(0, static_cast<size_t>(bytes_read)) << "\n";
}
else if ( plist[1].revents&POLLIN ) {
int bytes_read = read(cerr_pipe[0], &buffer[0], buffer.length());
cout << "read " << bytes_read << " bytes from stderr.\n";
cout << buffer.substr(0, static_cast<size_t>(bytes_read)) << "\n";
}
else break; // nothing left to read
}
waitpid(pid,&exit_code,0);
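// waitpid() reports a wait status; decode it to get the child's exit code
cout << "exit code: " << (WIFEXITED(exit_code) ? WEXITSTATUS(exit_code) : -1) << "\n";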
posix_spawn_file_actions_destroy(&action);
}
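Since the question asked for the output in a string rather than printed, here is a hedged sketch of the same poll() approach wrapped in a function. The names CaptureResult and runAndCapture are hypothetical, and it assumes sh can be found on PATH. It reads until both pipes report end-of-file and only then calls waitpid().

```cpp
#include <poll.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>

extern char **environ;

// Hypothetical result type: captured streams plus the decoded exit status.
struct CaptureResult {
    std::string out;
    std::string err;
    int exit_status = -1;  // stays -1 if spawning failed or the child did not exit normally
};

// Runs `command` through "sh -c" and collects its stdout and stderr.
CaptureResult runAndCapture(const std::string& command) {
    CaptureResult result;
    int out_pipe[2], err_pipe[2];
    if (pipe(out_pipe) != 0) return result;
    if (pipe(err_pipe) != 0) { close(out_pipe[0]); close(out_pipe[1]); return result; }

    posix_spawn_file_actions_t actions;
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_addclose(&actions, out_pipe[0]);
    posix_spawn_file_actions_addclose(&actions, err_pipe[0]);
    posix_spawn_file_actions_adddup2(&actions, out_pipe[1], STDOUT_FILENO);
    posix_spawn_file_actions_adddup2(&actions, err_pipe[1], STDERR_FILENO);
    posix_spawn_file_actions_addclose(&actions, out_pipe[1]);
    posix_spawn_file_actions_addclose(&actions, err_pipe[1]);

    // argv[0] is the program name; the command itself is argv[2].
    std::string sh = "sh", flag = "-c", cmd = command;
    char* argv[] = {&sh[0], &flag[0], &cmd[0], nullptr};

    pid_t pid;
    int rc = posix_spawnp(&pid, "sh", &actions, nullptr, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(out_pipe[1]);  // parent keeps only the read ends
    close(err_pipe[1]);
    if (rc != 0) { close(out_pipe[0]); close(err_pipe[0]); return result; }

    pollfd fds[2] = {{out_pipe[0], POLLIN, 0}, {err_pipe[0], POLLIN, 0}};
    std::string* sinks[2] = {&result.out, &result.err};
    int open_count = 2;
    char buf[4096];
    while (open_count > 0 && poll(fds, 2, -1) > 0) {
        for (int i = 0; i < 2; ++i) {
            if (fds[i].fd < 0 || fds[i].revents == 0) continue;
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n > 0) {
                sinks[i]->append(buf, static_cast<std::size_t>(n));
            } else {
                close(fds[i].fd);  // end-of-file or error: stop watching this pipe
                fds[i].fd = -1;    // poll() ignores negative descriptors
                --open_count;
            }
        }
    }
    for (pollfd& p : fds) if (p.fd >= 0) close(p.fd);

    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result.exit_status = WEXITSTATUS(status);
    return result;
}
```

For example, runAndCapture("echo bla") should leave the text in the out member. Checking both descriptors on every poll() wakeup, instead of using else-if, means neither pipe is starved while the other is busy.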

How can I extract the domain from a URL?

I'm currently making a few changes in the rTorrent source. I have the following code:
torrent::Object
apply_to_domain(const torrent::Object& rawArgs) {
const char * url = rawArgs.as_string().c_str();
char buffer[50];
snprintf(buffer, 50, "URL: %s.", url);
return std::string(buffer);
}
I need to extract the domain from url. There's a regex.h included in the source but I'm not sure if I can use that or if I need to use a different regex library.
Link to regex.h
The only thing that "regex" implementation handles is the wildcard character, *. (BTW, I'm just assuming it's a wildcard, since it's the only character that's recognised and the comments seem to hint as much, but I haven't actually verified it.)
Use a proper regex library like Boost.Regex.
// This is a hacked up whole string pattern matching. Replace with
// TR1's regex when that becomes widely available. It is intended for
// small strings.
That's not going to work for extracting the domain. Use Boost or VSCRT TR1 instead.
See *get_active_tracker_domain* in command_pyroscope.cc
On Windows:
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <cctype>
#include <locale>
#pragma comment(lib,"ws2_32.lib")
using namespace std;
string website_HTML;
locale local;
//***************************
void get_Website(const char *url );
void extract_URL();
//***************************
int main ()
{
const char *url="www.bbc.com"; // string literals are const
get_Website(url );
extract_URL();
return 0;
}
//***************************
void get_Website(const char *url )
{
WSADATA wsaData;
SOCKET Socket;
SOCKADDR_IN SockAddr;
int lineCount=0;
int rowCount=0;
struct hostent *host;
char *get_http= new char[256];
memset(get_http, 0, 256); // sizeof(get_http) would only give the size of the pointer
strcpy(get_http,"GET / HTTP/1.1\r\nHost: ");
strcat(get_http,url);
strcat(get_http,"\r\nConnection: close\r\n\r\n");
if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0)
{
cout << "WSAStartup failed.\n";
exit(0);
}
Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
host = gethostbyname(url);
if (host == NULL)
{
cout << "Could not resolve host\n";
exit(0);
}
SockAddr.sin_port=htons(80);
SockAddr.sin_family=AF_INET;
SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);
cout << "Connecting to ["<< url<<"]...\n";
if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0)
{
cout << "Could not connect\n";
exit(0);
}
cout << "Connected. (success!)\n";
std::cout << std::flush;
send(Socket,get_http, strlen(get_http),0 );
char buffer[10000];
int nDataLength;
int i = 0;
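while ((nDataLength = recv(Socket,buffer,10000,0)) > 0)
{
// look only at the bytes received by this call, starting from the buffer's beginning
for (int k = 0; k < nDataLength; ++k)
{
if (buffer[k] >= 32 || buffer[k] == '\n' || buffer[k] == '\r')
website_HTML+=buffer[k];
}
i += nDataLength;
}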
cout<<"\n"<<i<<" bytes downloaded \n\n";
closesocket(Socket);
WSACleanup();
delete[] get_http;
}
void extract_URL()
{
for (size_t i=0; i<website_HTML.length(); ++i) website_HTML[i]= tolower(website_HTML[i],local);
std::string to_find = "http:";
std::vector<string> extracted_website_URL;
std::string string_to_split;
char chr_String[1000];
int count = 0;
char seps[] = "\"";
char *token;
cout << "\nExtracting url.. ";
for (int j = 0; j < website_HTML.length() - to_find.length(); j++)
{
if (website_HTML.substr(j, to_find.length()) == to_find)
{
count++;
string_to_split=website_HTML.substr(j, to_find.length()+256);
strcpy(chr_String , string_to_split.c_str() );
token = strtok( chr_String, seps );
extracted_website_URL.push_back(token);
//cout<<website_HTML.substr(j, to_find.length()+30)<<" \n";
}
std::cout << "\b\\" << std::flush;
std::cout << "\b|" << std::flush;
std::cout << "\b/" << std::flush;
std::cout << "\b-" << std::flush;
}
for(size_t j=0;j<extracted_website_URL.size();j++) cout<<extracted_website_URL[j] <<" \n";
cout<<"\n"<<extracted_website_URL.size()<<" URL's extracted ";
cout<<"\n\n";
}
Something basic that may do the job:
#include <regex>
#include <string>
std::string getHostFromUrl(const std::string & url) {
std::regex urlRe("^.*://([^/?:]+)/?.*$");
return std::regex_replace(url, urlRe, "$1");
}
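A variation on the same idea, as a rough sketch: the function below (hostFromUrl is a hypothetical name) uses std::regex_search with a capture group, so that an optional scheme, optional user info and an optional port are skipped, and an empty string is returned when nothing matches. It is not a full URL parser; bracketed IPv6 hosts, for instance, are not handled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the host part of `url`, or "" if none is found.
std::string hostFromUrl(const std::string& url) {
    // optional "scheme://", optional "user:password@", then the host itself,
    // which ends at the first '/', '?', '#' or ':' (the port separator)
    static const std::regex re(
        R"(^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?(?:[^/?#@]*@)?([^/?#:]+))");
    std::smatch m;
    if (std::regex_search(url, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

Unlike the regex_replace version, input with no scheme (such as facebook.com/page) still yields a host here instead of being returned unchanged.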
Try C++11 Regex:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str("The link of this question: https://stackoverflow.com/questions/3073753/how-can-i-extract-the-domain-from-a-url \
Other urls are https://www.google.com, facebook.com. https://my_site.online.com:1234");
std::regex r("https?:\\/\\/(www\\.)?[-a-zA-Z0-9#:%._\\+~#=]{1,256}");
std::smatch sm;
while(regex_search(str, sm, r))
{
std::cout << sm.str() << '\n';
str = sm.suffix();
}
}
In Qt, you can use QUrl:
QString url("https://somedomain.com/index/of/somepage/blah/blah");
QUrl qu(url);
qDebug() << "qu.host " << qu.host();
It will give you: somedomain.com