C++ regular expression - c++

I need to capture data from a variable using a regular expression. The data are available on the form: Ip=8.8.8.8&probe=ip/tcp{dst=53}
for example.
To achieve this I`m using:
char *data;
data = getenv("QUERY_STRING");
char ipt[40];
char probe[40];
sscanf(data,"ip=%[0-9a-zA-Z-.]&probe=%[0-9a-zA-Z-.{}/=]",ipt,probe);
The second field will always contain a / but I can't get this and the other special carachters ({}=)
What can I do?
I already tried:
sscanf(data,"ip=%[0-9a-zA-Z-.]&probe=%[(...)]",ipt,probe);
And had no success as well.

Given that you know the probe part ends with } and the IP part end with a &, it's probably easiest to just scan for those:
sscanf(input, "Ip=%[^&]&probe=%[^}]", ipt, probe);
One minor detail: scanf with either a scanset or a %s conversion needs to have the buffer size specified to have any safety at all. without a length, both are pretty much equivalent to gets for lack of safety, so you really want something like:
char ipt[256], probe[256];
sscanf(input, "Ip=%255[^&]&probe=%255[^}]", ipt, probe);
Also note that this will give you the probe part without the trailing }. If you really need it, you can use something like strncat to add it back on afterwards though.
For those looking on: no, scanf (and company) don't support full regular expressions, but they do support scansets, which is what he's using here.

Related

How can I replicate compile time hex string interpretation at run time!? c++

In my code the following line gives me data that performs the task its meant for:
const char *key = "\xf1`\xf8\a\\\x9cT\x82z\x18\x5\xb9\xbc\x80\xca\x15";
The problem is that it gets converted at compile time according to rules that I don't fully understand. How does "\x" work in a String?
What I'd like to do is to get the same result but from a string exactly like that fed in at run time. I have tried a lot of things and looked for answers but none that match closely enough for me to be able to apply.
I understand that \x denotes a hex number. But I don't know in which form that gets 'baked out' by the compiler (gcc).
What does that ` translate into?
Does the "\a" do something similar to "\x"?
This is indeed provided by the compiler, but this part is not member of the standard library. That means that you are left with 3 ways:
dynamically write a C++ source file containing the string, and writing it on its standard output. Compile it and (providing popen is available) execute it from your main program and read its input. Pretty ugly isn't it...
use the source of an existing compiler, or directly its internal libraries. Clang is probably a good starting point because it has been designed to be modular. But it could require a good amount of work to find where that damned specific point is coded and how to use that...
just mimic what the compiler does, and write your own parser by hand. It is not that hard, and will learn you why tests are useful...
If it was not clear until here, I strongly urge you to use the third way ;-)
If you want to translate "escape" codes in strings that you get as input at run-time then you need to do it yourself, explicitly.
One way is to read the input into one string. Then copy the characters from that source string into a new destination string, one by one. If you see a backslash then you discard it, fetch the next character, and if it's an x you can use e.g. std::stoi to convert the next few characters into its corresponding integer value, and append that number to the destination string (either adding it with std::to_string, or using output string streams and the normal "output" operator <<).

Convert string to short in C++

So I've looked around for how to convert a string to a short and found a lot on how to convert a string to an integer. I would leave a question as a comment on those threads, but I don't have enough reputation. So, what I want to do is convert a string to a short, because the number should never go above three or below zero and shorts save memory (as far as I'm aware).
To be clear, I'm not referring to ASCII codes.
Another thing I want to be able to do is to check if the conversion of the string to the short fails, because I'll be using a string which consists of a users input.
I know I can do this with a while loop, but if there's a built in function to do this in C++ that would be just as, or more, efficient than a while loop, I would love to hear about it.
Basically, an std::stos function is missing for unknown reasons, but you can easily roll your own. Use std::stoi to convert to int, check value against short boundaries given by e.g. std::numeric_limits<short>, throw std::range_error if it's not in range, otherwise return that value. There.
If you already have the Boost library installed you might use boost::lexical_cast for convenience, but otherwise I would avoid it (mainly for the verbosity and library dependency, and it's also a little inefficient).
Earlier boost::lexical_cast was known for not being very efficient, I believe because it was based internally on stringstreams, but as reported in comments here the modern version is faster than conversion via stringstream, and for that matter than via scanf.
An efficient way is to use boost::lexical_cast:
short myShort = boost::lexical_cast<short>(myString);
You will need to install boost library and the following include: #include <boost/lexical_cast.hpp>
You should catch bad_lexical_cast in case the cast fails:
try
{
short myShort = boost::lexical_cast<short>(myString);
}
catch(bad_lexical_cast &)
{
// Do something
}
You can also use ssprintf with the %hi format specifier.
Example:
short port;
char szPort[] = "80";
sscanf(szPort, "%hi", &port);
the number should never go above three or below zero
If you really really need to save memory, then this will also fit in a char (regardless whether char is signed or unsigned).
Another 'extreme' trick: if you can trust there are no weird things like "002" then what you have is a single character string. If that is the case, and you really really need performance, try:
char result = (char)( *ptr_c_string - '0' );

How to fix fprintf vulnerability?

In my code I used fprintf. I used flawfinder to check the code for vulnerabilities and I got that:
358: [4] (format) fprintf: If format strings can be influenced by
an attacker, they can be exploited. Use a constant for the format
specification.
Can someone explain to me what Use a constant for the format specification actually means? Is there any safe version of fprintf?
The problem is that fprintf determines how many arguments it should get by examining the format string. If the format string doesn't agree with the actual arguments, you have undefined behavior which can manifest as a security vulnerability.
The problem is particularly bad if the string supplied can be influenced by the user of your program, because he can then specifically design the string to make your program do bad things.
There is no safe version of fprintf in the C standard. C++ streams avoid the problem, at the cost of not having format strings and using a far more verbose syntax for specifying formatting options.
A constant string, as in a string literal.
Like in
fprintf(someFile, "%s", someStringVariable);
and not like
fprintf(someFile, someStringVariable);
It means it wants you to write:
fprintf(out, "foo %s", some_string);
instead of what you have, which I guess is something like:
const char *format = "foo %s";
/* some time later */
fprintf(out, format, some_string);
The reason is that it's worried format might come from user input or something, and a malicious user could supply a format foo %s%s%s in order to provoke undefined behavior that they may be able to exploit.
Obviously if you're choosing between n different format strings, all of which are string literals in your code and all use the same format specifiers, but you choose which one at runtime, then following this advice is a bit awkward and wouldn't make your code any safer. But you could have n functions instead of n strings, and each function calls fprintf with a different string literal.
If you're reading the format string out of a config file (which is one fairly crude way of implementing internationalization from scratch) then you're basically out of luck. The linter doesn't trust your translator to use the right format codes for the arguments supplied to the call. And arguably neither should you :-)

c++ best way to call function with const char* parameter type

what is the best way to call a function with the following declaration
string Extract(const char* pattern,const char* input);
i use
string str=Extract("something","input text");
is there a problem with this usage
should i use the following
char pattern[]="something";
char input[]="input";
//or use pointers with new operator and copy then free?
the both works but i like the first one but i want to know the best practice.
A literal string (e.g. "something") works just fine as a const char* argument to a function call.
The first method, i.e. passing them literally in, is usually preferable.
There are occasions though where you don't want your strings hard-coded into the text. In some ways you can say that, a bit like magic numbers, they are magic words / phrases. So you prefer to use constant identifier to store the values and pass those in instead.
This would happen often when:
1. a word has a special meaning, and is passed in many times in the code to have that meaning.
or
2. the word may be cryptic in some way and a constant identifier may be more descriptive
Unless you plain to have duplicates of the same strings, or alter those strings, I'm a fan of the first way (passing the literals directly), it means less dotting about code to find what the parameters actually are, it also means less work in passing parameters.
Seeing as this is tagged for C++, passing the literals directly allows you to easily switch the function parameters to std::string with little effort.

remove escape characters from a char

I've been working with this for about 2 days now. I'm stuck, with a rather simple annoyance, but I'm not capable of solving it.
My programs basicly recieves a TCP connection from a PHP script. And the message which is send is stored in char buffer[1024];.
Okay this buffer variable contains an unique key, which is being compared to a char key[1024] = "supersecretkey123";
The problem itself is that these two does not equal - no matter what I do.
I've been printing the buffer and key variable out just above eachother and by the look they are 100% identical. However my equalisation test still fails.
if(key == buffer) { // do some thing here etc }
So then I started searching the internet for some information on what could be wrong. I later realized that it might be some escape characters annoying me. But I'm not capable of printing them, removing them or even making sure they are there. So that's why I'm stuck - out of ideas on how to make these equal when the buffer variable matches the key variable.
Well the key does not chance, unless the declaration of the key is modified manually. The program itself is recieving the information and sending back information "correctly".
Thanks.
If you're using null terminated strings use proper api - strcmp and its variants.
Additionally size in declaration char key[1024] = "supersecretkey123"; is not needed - either compiler will reduced it or stack/heap memory will be wasted.
If you are using C++ use std::string instead of char []. You cannot compare two char [] in way you try to do this (they are pointers to memory), but it's possible with std::string.
If it's somehow mandatory to use char[] in your case, use strcmp.
Try with if(!strncmp(key,buffer,1024)). See this reference on strncmp.