Escape url parameters for cURL - c++

I have a URL like this:
http://localhost:3000/get_agencies?zipcodecity=&zipcode=30048&city=kraków&
As you can see, the city parameter is kraków. When I pass this URL to curl, it arrives encoded incorrectly:
curl = curl_easy_init();
// Some code here
curl_easy_setopt(curl, CURLOPT_URL, url);
On the server side I get city=kraków. I tried curl_easy_escape(curl, url, strlen(url)); but it encodes the whole URL, separators included. So how can I encode only the parameter values of a query string?

(Sorry, either you significantly edited your original question, or I read it wrong the first time. Let me try again.)
Well, I guess you can sort of repair it by guessing where each parameter name and value starts and ends, based on the = and & characters. It's NOT foolproof: it fails if a & or ? is wrongly encoded, or if you encounter a Unicode character whose encoding contains the same bytes as one of those characters (edit: this last part is fixable by switching to a Unicode-aware string search function). Except for those 2 scenarios, something like this should work:
std::string patchInappropriatelyEncodedURL(CURL *curl, std::string url){
    size_t pos=url.find("?");
    size_t pos2;
    if(pos==url.npos){
        return url;
    }
    std::string ret=url.substr(0,pos+1);
    std::string tmpstr;
    char *escapedstr;
    url=url.substr(pos+1,url.npos);
    std::string type="=";
    do{
        pos=url.find("=");
        pos2=url.find("&");
        if(pos == url.npos && pos2 == url.npos){
            break;
        }
        if(pos<pos2){
            type="=";
        }else{
            type="&";
            pos=pos2;
        }
        tmpstr=url.substr(0,pos);
        url=url.substr(pos+1,url.npos);
        escapedstr=curl_easy_escape(curl,tmpstr.c_str(),tmpstr.length());
        ret.append(escapedstr);
        ret.append(type);
        curl_free(escapedstr);
    }while(true);
    escapedstr=curl_easy_escape(curl,url.c_str(),url.length());
    ret.append(escapedstr);
    curl_free(escapedstr);
    return ret;
}
Note that this function is based on guessing and is by no means foolproof. I suppose the guessing could be improved with a dictionary for your target language or something, but your time would probably be better spent fixing the bug that causes your program to receive malformed URLs in the first place.
I deliberately omitted error checking. curl_easy_escape can fail (for example, when out of memory), and when it does it returns NULL. You should handle that before the code enters production.
C++ has no finally{} block, so the curl_free calls should be tied to a scope guard (for example, a std::unique_ptr with curl_free as its deleter). Otherwise you may leak memory if a string function throws (substr may throw std::bad_alloc, for instance). Again, I left that out.

this is why we have curl_easy_escape.
char *escaped_string=curl_easy_escape(ch,"kraków",0);
(However, when the string is known at compile time, you could hardcode the encoded version instead of encoding it at runtime. In this case the hardcoded version is krak%C3%B3w. Your browser's JavaScript console can be used to figure that out: just write encodeURIComponent("kraków"); to see what the URL-encoded version looks like.)
Gotchas:
When the 3rd parameter is 0, curl uses strlen() to determine the size. This is safe with UTF-8 text, but not with binary data. If you're encoding binary data, make sure to specify the length manually, as strlen() stops at the first null byte. (Other than that, curl_easy_escape and URL-encoded data are binary safe.)
Don't forget to curl_free(escaped_string); when you're done with it, or you'll end up with memory leaks.
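To encode only the parameter values, as the question asks, one option is to build the query string from separate name/value pairs and escape each piece on its own, so the ?, = and & separators are never touched. The sketch below is only an illustration: percent_encode is a hand-written stand-in for curl_easy_escape (so that the example compiles without libcurl), and build_url is a hypothetical helper, not part of any library.

```cpp
#include <cassert>  // only needed for the example checks
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for curl_easy_escape: percent-encodes every byte
// except the RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~).
std::string percent_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: joins name/value pairs into "base?k1=v1&k2=v2",
// escaping names and values but leaving the separators alone.
std::string build_url(const std::string& base,
                      const std::vector<std::pair<std::string, std::string>>& params) {
    std::string url = base;
    char sep = '?';
    for (const auto& kv : params) {
        url += sep;
        url += percent_encode(kv.first);
        url += '=';
        url += percent_encode(kv.second);
        sep = '&';
    }
    return url;
}
```

With real libcurl you would call curl_easy_escape on each value (and curl_free the result) where this sketch calls percent_encode.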


Joining a variable inside a JSON formatted post request C++

So I have this put request which submits to a service running on localhost.
Doing it the way below works just fine, note I have replaced the actual acc name with ACC_NAME and password with ACC_PASSWORD.
curl_easy_setopt(curl_account_login, CURLOPT_POSTFIELDS, "{\r\n \"username\": \"ACC_NAME\",\r\n \"password\": \"ACC_PASSWORD\",\r\n \"persistLogin\": false\r\n}\r\n");
However when I wanted to pass in a variable containing the acc_name and acc_password, it does not work, I get an error response from server.
The below request is using the variable joined inside the JSON string, which gives me the error response.
curl_easy_setopt(curl_account_login, CURLOPT_POSTFIELDS, "{\r\n \"username\": \""+acc_name+"\",\r\n \"password\": \""+acc_password+"\",\r\n \"persistLogin\": false\r\n}\r\n");
I can't figure out what I am doing wrong when I am joining the string variable into the request.
It works just fine if I write the account credentials directly into the request instead of using variables.
Regards
C-style string literals (like "something") are of type const char[N] (a null-terminated character array), which decays into a const char* (a pointer). Pointers cannot be added to each other, so "something" + "another" does not compile. The + operator only concatenates when at least one operand is a std::string, and the result is then a std::string, which curl_easy_setopt cannot take directly. You cannot concatenate C-style strings by simply using + that way.
Assuming you are using C++ (not C), as indicated by your question being tagged c++, I suggest using C++ string objects instead; they allow easy concatenation.
Also, the curl documentation for CURLOPT_POSTFIELDS notes that the data pointed to is not copied, so you must make sure the pointer remains valid until the associated transfer finishes. Because of that, I would prefer CURLOPT_COPYPOSTFIELDS.
To sum up, do something like this:
#include <string>
// ...
std::string postfields
{
    std::string{ "{" }
    + R"("username":")" + ACC_NAME
    + R"(","password":")" + ACC_PASSWORD
    + R"(","persistLogin":false})"
};
curl_easy_setopt(curl_account_login, CURLOPT_COPYPOSTFIELDS, postfields.c_str());
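The same idea can be wrapped in a small function so the body is built from runtime values in one place. This is a minimal sketch under a few assumptions: make_login_body and json_escape are hypothetical helpers, and json_escape only handles double quotes and backslashes, which is not complete JSON escaping.

```cpp
#include <cassert>  // only needed for the example checks
#include <string>

// Hypothetical helper: escapes the characters that would break a JSON string
// literal. Only '"' and '\\' are handled here; control characters are not.
std::string json_escape(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Builds the login body from runtime values. The parameter names mirror the
// question's acc_name and acc_password variables.
std::string make_login_body(const std::string& acc_name,
                            const std::string& acc_password) {
    return std::string("{\"username\":\"") + json_escape(acc_name) +
           "\",\"password\":\"" + json_escape(acc_password) +
           "\",\"persistLogin\":false}";
}
```

The result would then be passed as body.c_str() with CURLOPT_COPYPOSTFIELDS, so libcurl keeps its own copy and the std::string may go out of scope afterwards.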

How do I get warn()'s output into a string?

I'm using the non-standard function warn() (provided by BSD) to output an error message if a file can't be opened, like so:
std::string path = get_path() ;
std::ifstream file(path) ;
if (file.is_open()) { /* do something */ }
else {
warn("%s", path.c_str()) ;
// uses errno to figure out what the error was and outputs it nicely along with the filename
}
That's all very well for outputting it, but what if I want to use the entire string somewhere else, in addition to printing it? The warn() functions don't seem to have a form that writes the error to a string. I've tried rolling my own, but it seems awfully cumbersome in comparison (besides not getting the program's name):
this->foo((boost::format("%s: %s") % path % strerror(errno)).str()) ;
So how do I get warn()'s output as a string?
warn writes its output to standard error, so you would have to create a mechanism that redirects standard error to a location you can read back into a string. The most straightforward way may be to redirect standard error to a file and then read the file back as a string. You could, for instance, try dup2() to accomplish this (as explained in the answer to this question).
However, writing your own version of warn is probably a better choice. You might consider the C vsnprintf() function to implement it. There are answers to this question that address both using boost::format and vsnprintf().
You're right: there's no sprintf analog (that is, no hypothetical swarn function).
Your approach seems viable.
It would appear that your gyrations produce a result similar to:
path + ": " + strerror(errno);
At a guess, the "program's name" that it's including is probably just argv[0], so you could apparently produce a rough equivalent of warn that just returns a std::string, with something along these lines:
std::string warn_s(std::string const &path) {
    // Assumes argv has been saved somewhere this function can reach it.
    char *pname = strrchr(argv[0], '/');
    if (pname == NULL)
        pname = argv[0];
    else
        ++pname; // skip the '/' itself
    // warn() prints "progname: message: strerror(errno)"
    return std::string(pname) + ": " + path + ": " + strerror(errno);
}
The major difficulty here is that argv is local to main, so you'll probably need to either save it into an accessible location in main, or else use some non-standard mechanism to re-retrieve that data in your function.
Unfortunately, the documentation for warn I was able to find was poor enough that a bit of testing/trial and error will probably be needed if you want to duplicate its output precisely.
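For a printf-style variant, the vsnprintf() route mentioned earlier might look like the sketch below. It is an assumption-laden illustration rather than a drop-in replacement: warn_string and vformat are hypothetical names, the program name and the error number are passed in explicitly instead of being read from argv[0] and errno, and the "progname: message: error" layout follows the usual BSD warn() format.

```cpp
#include <cassert>  // only needed for the example checks
#include <cerrno>
#include <cstdarg>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: formats into a std::string using vsnprintf.
std::string vformat(const char* fmt, va_list ap) {
    va_list ap2;
    va_copy(ap2, ap);                                // first pass measures the length
    int n = std::vsnprintf(nullptr, 0, fmt, ap2);
    va_end(ap2);
    if (n < 0) return std::string();                 // encoding error: give up quietly
    std::vector<char> buf(static_cast<std::size_t>(n) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, ap); // second pass writes the text
    return std::string(buf.data(), static_cast<std::size_t>(n));
}

// Hypothetical warn()-like function that returns the message instead of
// printing it: "progname: formatted message: strerror(err)".
std::string warn_string(const std::string& progname, int err, const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::string msg = vformat(fmt, ap);
    va_end(ap);
    return progname + ": " + msg + ": " + std::strerror(err);
}
```

A caller would typically capture errno right after the failing call and pass it in, for example warn_string(saved_name, errno, "%s", path.c_str()), where saved_name is whatever copy of the program name main() stored.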

Curlpp, incomplete data from request

I am using Curlpp to send requests to various webservices to send and receive data.
So far this has worked fine, since I have only used it for sending/receiving JSON data.
Now I have a situation where a webservice returns a zip file in binary form. This is where I encountered a problem: the data received is not complete.
I first had Curl write any data to an ostringstream using the WriteStream option, but this proved not to be the correct approach, since the data contained null characters and thus stopped at the first null char.
After that, instead of WriteStream I used WriteFunction with a callback function.
The problem in this case is that this function is always called 2 or 3 times, regardless of the amount of data.
This results in always having a few chunks of data that don't seem to be the first part of the file, although the data always contains PK as the first 2 characters, indicating a zip file.
I used several tools to verify that the data is entirely being sent to my application so this is not a problem of the webservice.
Here is the code. Note that options like hostname, port, headers and postfields are set elsewhere.
string requestData;
size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
    requestData += ptr;
    int totalSize= size*nmemb;
    return totalSize;
}
const string CurlRequest::Perform()
{
    curlpp::options::WriteFunction wf(WriteStringCallback);
    this->request.setOpt( wf );
    this->request.perform();
    return requestData;
}
I hope someone can help me with this issue, because I've run out of leads on how to fix it, not least because curlpp is poorly documented (even more so since the curlpp website disappeared).
The problem with the code is that the data is put into a std::string even though it is in binary (ZIP) format. I'd recommend putting the data into a stream (or a binary array).
You can also register a callback with curlpp::options::HeaderFunction to retrieve the response headers, and act in the write callback according to the Content-Type.
std::string is not a problem, but the concatenation is:
requestData += ptr;
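Here ptr is treated as a C string, so the append stops at the first zero byte and binary input is truncated. (The buffer libcurl passes is not null-terminated either, so the append may also read past its end.) You should wrap it in a string that knows the length of its data: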
requestData += std::string(ptr, size*nmemb);
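The difference is easy to check in isolation, without curlpp. The sketch below uses a hypothetical append_chunk function whose signature is modelled on the callback in the question; it appends with an explicit length, so embedded zero bytes survive.

```cpp
#include <cassert>  // only needed for the example checks
#include <cstddef>
#include <string>

// Appends one chunk to the accumulated response using an explicit length,
// so embedded zero bytes are kept. The signature is an illustrative
// assumption modelled on the callback in the question.
std::size_t append_chunk(std::string& out, const char* ptr,
                         std::size_t size, std::size_t nmemb) {
    const std::size_t total = size * nmemb;
    out.append(ptr, total);  // length-aware append, no reliance on a terminator
    return total;            // a write callback reports how many bytes it consumed
}
```

Feeding it the five bytes 'P', 'K', 0, 3, 4 leaves a string of size 5, whereas out += ptr would have stopped after "PK".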

Read text file step-by-step

I have a file which has text like this:
#1#14#ADEADE#CAH0F#0#0.....
I need to write code that finds the text following each # symbol, stores it in a variable and then writes it to a file WITHOUT the # symbol, but with a space before it. So from the text above I will get:
1 14 ADEADE CAH0F 0 0......
I first tried to do it in Python, but the files are really big and processing them takes a very long time, so I decided to write this part in C++. However, I know nothing about C++ regex, and I'm looking for help. Could you please recommend an easy regex library (I don't know C++ very well) or a well-documented one? It would be even better if you provided a small example (I know how to write to a file using fstream, but I need help with reading the file as described above).
This looks like a job for std::locale and his trusty sidekick imbue:
#include <locale>
#include <iostream>
#include <string>
struct hash_is_space : std::ctype<char> {
    hash_is_space() : std::ctype<char>(get_table()) {}
    static mask const* get_table()
    {
        // Zero-initialized table, so '#' ends up as the only whitespace character.
        static mask rc[table_size];
        rc['#'] = std::ctype_base::space;
        return &rc[0];
    }
};
int main() {
    using std::string;
    using std::cin;
    using std::locale;
    cin.imbue(locale(cin.getloc(), new hash_is_space));
    string word;
    while(cin >> word) {
        std::cout << word << " ";
    }
    std::cout << "\n";
}
IMO, C++ is not the best choice for your task. But if you have to do it in C++ I would suggest you have a look at Boost.Regex, part of the Boost library.
If you are on Unix, a simple sed 's/#/ /g' <infile >outfile would suffice (the g flag replaces every # on a line, not just the first).
Sed stands for 'stream editor' (and supports regexes! whoo!), so it is well suited to the performance that you are looking for.
Alright, I'm just going to make this an answer instead of a comment. Don't use regex. It's almost certainly overkill for this task. I'm a little rusty with C++, so I'll not post any ugly code, but essentially what you could do is parse the file one character at a time, putting anything that wasn't a # into a buffer, then writing it out to the output file along with a space when you do hit a #. In C# at least two really easy methods for solving this come to mind:
StreamReader fileReader = new StreamReader(new FileStream("myFile.txt",
    FileMode.Open));
string fileContents = fileReader.ReadToEnd();
string outFileContents = fileContents.Replace("#", " ");
StreamWriter outFileWriter = new StreamWriter(new FileStream("outFile.txt",
    FileMode.Create), Encoding.UTF8);
outFileWriter.Write(outFileContents);
outFileWriter.Flush();
Alternatively, you could replace
string outFileContents = fileContents.Replace("#", " ");
With
StringBuilder outFileContents = new StringBuilder();
string[] parts = fileContents.Split('#');
foreach (string part in parts)
{
    outFileContents.Append(part);
    outFileContents.Append(" ");
}
I'm not saying you should do it either of these ways or my suggested method for C++, nor that any of these methods are ideal - I'm just pointing out here that there are many many ways to parse strings. Regex is awesome and powerful and may even save the day in extreme circumstances, but it's not the only way to parse text, and may even destroy the world if used for the wrong thing. Really.
If you insist on using regex (or are forced to, as in for a homework assignment), then I suggest you listen to Chris and use Boost.Regex. Alternatively, I understand Boost has a good string library as well if you'd like to try something else. Just look out for Cthulhu if you do use regex.
You've left out one crucial point: if you have two (or more) consecutive #s in the input, should they turn into one space, or into the same number of spaces as there are #s?
If you want to turn an entire run of #s into a single space, then @Rob's solution should work quite nicely.
If you want each # turned into a space, then it's probably easiest to just write C-style code:
#include <stdio.h>
int main() {
    int ch;
    while (EOF!=(ch=getchar()))
        if (ch == '#')
            putchar(' ');
        else
            putchar(ch);
    return 0;
}
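Since the question asks for C++ and the files are large, the same character-for-character replacement can also be done block by block on iostreams. The following is a minimal sketch, not a tuned implementation: replace_hashes is a hypothetical function, the 64 KiB buffer size is an arbitrary choice, and the collapse_runs flag is an added option that covers the "several #s become one space" case raised above.

```cpp
#include <cassert>  // only needed for the example checks
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the example checks
#include <vector>

// Copies `in` to `out`, turning each '#' into a space. With collapse_runs set,
// a run of consecutive '#' characters produces a single space instead.
void replace_hashes(std::istream& in, std::ostream& out, bool collapse_runs = false) {
    std::vector<char> buf(1 << 16);
    bool prev_hash = false;  // carried across blocks so runs spanning blocks collapse too
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        std::size_t w = 0;   // write position inside the same buffer
        for (std::size_t i = 0; i < n; ++i) {
            const bool is_hash = (buf[i] == '#');
            if (is_hash && collapse_runs && prev_hash) continue;  // skip repeats
            buf[w++] = is_hash ? ' ' : buf[i];
            prev_hash = is_hash;
        }
        out.write(buf.data(), static_cast<std::streamsize>(w));
    }
}
```

For files, the streams would be a std::ifstream and a std::ofstream opened in binary mode; the function itself only relies on the generic stream interfaces.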
So, you want to replace each single '#' character with a single ' ' character, right?
Then it's easy, since you can overwrite any portion of a file with a string of exactly the same length without disturbing the layout of the file.
Repeating such a replacement lets you transform the file chunk by chunk, so you avoid reading the whole file into memory, which is problematic when the file is very big.
Here's the code in Python 2.7.
Maybe chunk-by-chunk replacement will not be enough to make it faster, and you may have a hard time writing the same thing in C++. But in general, when I have proposed this kind of code, it has improved the execution time satisfactorily.
def treat_file(file_path, chunk_size):
    from os import fsync
    from os.path import getsize
    file_size = getsize(file_path)
    with open(file_path,'rb+') as g:
        fd = g.fileno() # file descriptor, it's an integer
        while True:
            x = g.read(chunk_size)
            g.seek(- len(x),1)
            g.write(x.replace('#',' '))
            g.flush()
            fsync(fd)
            if g.tell() >= file_size:
                break
Comments:
open(file_path,'rb+')
the file must be opened in binary mode 'b' to control precisely the position and movement of the file pointer;
mode '+' allows both reading AND writing the file
fd = g.fileno()
file descriptor, it's an integer
x = g.read(chunk_size)
reads a chunk of size chunk_size. Ideally it would match the size of the read buffer, but I don't know how to find that buffer's size. Hence a good idea is to give it a power of 2 value.
g.seek(- len(x),1)
the file pointer is moved back to the position from which the chunk was just read. It must be len(x), not chunk_size, because the last chunk read is in general shorter than chunk_size
g.write(x.replace('#',' '))
overwrites the same span of the file with the modified chunk, which has the same length
g.flush()
fsync(fd)
these two instructions force the write; otherwise the modified chunk could remain in the write buffer and be written at an uncontrolled moment
if g.tell() >= file_size: break
after the last portion of the file has been read, whatever its length (less than or equal to chunk_size), the file pointer is at the maximum position of the file, that is to say file_size, and the program must stop
If you would like to replace a run of several consecutive '###...' with only one space, the code is easily modified to do so, since writing a shortened chunk doesn't erase the still-unread characters further along in the file. It only needs 2 file pointers.

String issue with assert on erase

I am developing a program in C++ that uses std::string to store network data from a socket (this works fine). I receive the data in frames of at most 1452 bytes at a time. The protocol uses a fixed 20-byte header that contains the length of the data portion of the packet. My problem is that the string triggers an unknown debug assertion: it asserts, but I get NO message about the string. Since a single frame can contain more than one packet, I place all received data into the string, reinterpret_cast it to my data struct, calculate the total length of the packet, then copy the data portion of the packet into a string for regex processing. At this point I do a string erase, as in mybuff.erase(totalPackLen); <~ THIS is what triggers the assert, but totalPackLen is less than the string's size.
Is there some convention I am missing here? Or is it that the std::string really is an inappropriate choice here? Ty.
Fixed it on my own. Rolled my own VERY simple buffer with a few C calls :)
int ret = recv(socket,m_buff,0);
if(ret > 0)
{
    BigBuff.append(m_buff,ret);
    while(BigBuff.size() > 16){
        Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
        if(ntohs(hdr->PackLen) <= BigBuff.size() - 20){
            hdr->PackLen = ntohs(hdr->PackLen);
            string lData;
            lData.append(BigBuff.begin() + 20,BigBuff.begin() + 20 + hdr->PackLen);
            Parse(lData); //regex parsing helper function
            BigBuff.erase(hdr->PackLen + 20); //assert here when len is packlen is 235 and string len is 1458;
        }
    }
}
From the code snippet you provided it appears that your packet comprises a fixed-length binary header followed by a variable length ASCII string as a payload. Your first mistake is here:
BigBuff.append(m_buff,ret);
There are at least two problems here:
1. Why the append? You have presumably dispensed with any previous messages. You should be starting with a clean slate.
2. Mixing binary and string data can work, but more often than not it doesn't. It is usually better to keep the binary and ASCII data separate. Don't use std::string for non-string data.
Append adds data to the end of the string. The very next statement after the append is a test for a length of 16, which says to me that you should have started fresh. In the same vein you do that reinterpret cast from BigBuff[0]:
Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
Because of your use of append, you are perpetually dealing with the header from the first packet received rather than the current packet. Finally, there's that erase:
BigBuff.erase(hdr->PackLen + 20);
Many problems here:
- If the packet length and the return value from recv are consistent the very first call will do nothing (the erase is at but not past the end of the string).
- There is something very wrong if the packet length and the return value from recv are not consistent. It might mean, for example, that multiple physical frames are needed to form a single logical frame, and that in turn means you need to go back to square one.
- Suppose the physical and logical frames are one and the same, you're still going about this all wrong. As noted, the first time around you are erasing exactly nothing. That append at the start of the loop is exactly what you don't want to do.
Serialization oftentimes is a low-level concept and is best treated as such.
Your comment doesn't make sense:
BigBuff.erase(hdr->PackLen + 20); //assert here when len is packlen is 235 and string len is 1458;
BigBuff.erase(hdr->PackLen + 20) erases from position hdr->PackLen + 20 onwards to the end of the string; it does not remove the first hdr->PackLen + 20 characters. From the description of the code, it seems to me that you're erasing everything beyond the end of the current packet's data. See the reference for std::string::erase().
Needless to say, std::string is entirely inappropriate here; it should be a std::vector.
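The two erase overloads discussed above are easy to confuse, so here is a small sketch of consuming one packet from the front of a buffer. take_packet is a hypothetical helper, and the header and payload lengths are passed in directly rather than parsed from a Header struct.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies out the payload of the first packet in buf and
// removes that packet (header + payload) from the FRONT of the buffer.
std::string take_packet(std::string& buf, std::size_t header_len,
                        std::size_t payload_len) {
    // The caller is expected to have checked that a full packet is buffered.
    assert(buf.size() >= header_len + payload_len);
    std::string payload = buf.substr(header_len, payload_len);
    // erase(pos, count) removes count characters starting at pos;
    // erase(pos) alone would instead drop everything from pos to the end.
    buf.erase(0, header_len + payload_len);
    return payload;
}
```

With a 20-byte header followed by "abc" and then the bytes of the next packet, the call returns "abc" and leaves only the next packet's bytes in the buffer, which is the behaviour the receive loop in the question appears to want.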