How to find in my program a "const char* + int" expression - c++

I'm in the middle of a source code migration, and the converter program did not convert concatenations of string literals with integers. Now I have lots of code with expressions like this:
f("some text" + i);
Since C/C++ will interpret this as pointer arithmetic (in effect an array subscript), f will receive "some text", or "ome text", or "me text"...
My source language treats the concatenation of a string with an int as string concatenation. Now I need to go line by line through the source code and change, by hand, the previous expression to:
f("some text" + std::to_string(i));
The conversion program managed to convert local "String" variables to "std::string", resulting in expressions:
std::string some_str = ...;
int i = ...;
f(some_str + i);
Those were easy to fix because with such expressions the C++ compiler outputs an error.
Is there any tool that finds such expressions in source code automatically?

Easy! Just replace all the + with -&:
find . -name '*.cpp' -print0 | xargs -0 sed -i '' 's/+/-\&/g'
When you try to compile your project you will see, among other errors, something like this:
foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
return f(s -& i);
~ ^~~~
(I'm using clang, but other compilers should issue similar errors)
So you just have to filter the compiler output to keep only those errors:
clang++ foo.cpp 2>&1 | grep -F "error: 'const char *' and 'int *' are not pointers to compatible types"
And you get:
foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
foo.cpp:18:10: error: 'const char *' and 'int *' are not pointers to compatible types

You can try flint, an open-source lint program for C++ developed and used at Facebook. It has a blacklisted token sequences feature (checkBlacklistedSequences). You can add your token sequence to the checkBlacklistedSequences function and flint will report matches.
In the checkBlacklistedSequences function, I added the sequence string_literal + number:
BlacklistEntry([tk!"string_literal", tk!"+", tk!"number"],
"string_literal + number problem!\n",
true),
then compile and test
$ cat -n test.cpp
1 #include <iostream>
2 #include <string>
3
4 using namespace std;
5
6 void f(string str)
7 {
8 cout << str << endl;
9 }
10
11 int main(int argc, char *argv[])
12 {
13 f("Hello World" + 2);
14
15 f("Hello World" + std::to_string(2));
16
17 f("Hello World" + 2);
18
19 return 0;
20 }
$ ./flint test.cpp
test.cpp(13): Warning: string_literal + number problem!
test.cpp(17): Warning: string_literal + number problem!
flint has two versions (an old version developed in C++ and a new version in the D language). I made my changes in the D version.

I'm not familiar with a lot of tools which can do that, but I think grep can be helpful in some measure.
In the root directory of your source code, try:
grep -rn '".\+"\s*+\s*' .
This finds every line that contains something like "xxxxx" +, which should help you locate the lines you need.
If all the integers are constants, you can change the grep expression to:
grep -rn '".\+"\s*+\s*[0-9]\+' .
You can also include the ( before the string constant:
grep -rn '(".\+"\s*+\s*[0-9]\+' .
This may not be the "correct" answer, but I hope it helps.
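If you would rather run the same kind of check from a small C++ program, here is a minimal sketch using std::regex. The function name is made up for illustration, and like the grep version it only catches integer literals, not int variables, and knows nothing about comments or raw strings.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line contains a string literal
// immediately followed by `+` and an integer literal, e.g. "abc" + 2.
bool has_literal_plus_number(const std::string& line) {
    // A double-quoted literal (with escaped characters allowed),
    // optional whitespace, a plus sign, optional whitespace, digits.
    static const std::regex pattern(R"re("(?:[^"\\]|\\.)*"\s*\+\s*[0-9]+)re");
    return std::regex_search(line, pattern);
}
```

Feeding it each line of a file and printing the file name and line number for every hit would give output similar to grep -n.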

You may not need an external tool. Instead, you can take advantage of the C++ rule that an implicit conversion sequence may contain at most one user-defined conversion. Basically, you need to change the parameter type of your f function from const char*/std::string to a type that is implicitly convertible only from either a string literal (const char[size]) or an std::string instance (what you get when you add std::to_string in the expression).
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
struct string_proxy
{
std::string value;
string_proxy(const std::string& value) : value(value) {}
string_proxy(std::string&& value) : value(std::move(value)) {}
template <size_t size>
string_proxy(const char (&str)[size]) : value(str) {}
};
void f(string_proxy proxy)
{
std::cout << proxy.value << std::endl;
}
int main()
{
f("this works"); // const char[size]
f("this works too: " + std::to_string(10)); // std::string
f("compile error!" + 10); // const char*
return 0;
}
Note that this is not going to work on MSVC, at least not in the 2012 version; it's likely a bug, since no warning is emitted either. It works perfectly fine in g++ and clang (you can quickly check it here).

I've found a very simple way to detect this issue. Neither a regular expression nor a lint will match more complex expressions like the following:
f("Hello " + g(i));
What I need is some form of type inference, so I'm letting the compiler do it. Using an std::string instead of a string literal raises an error, so I wrote a simple source code converter that wraps every string literal in std::string, like this:
f(std::string("Hello ") + g(i));
Then, after recompiling the project, I'd see all the errors. The source code is on GitHub, in 48 lines of Python code:
https://gist.github.com/alejolp/3a700e1730e0328c68de
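For readers who want the idea without opening the gist, here is a minimal C++ sketch of the same transformation. It is not the linked Python script, and the function name is invented for illustration. It is deliberately naive: it also rewrites literals in #include lines, comments and other places where a std::string is not wanted, so treat it as a throwaway pass on a copy of the code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps every double-quoted literal on the line
// in std::string(...), so that `literal + int` stops compiling.
std::string wrap_literals(const std::string& line) {
    // Matches a double-quoted literal, allowing escaped characters inside.
    static const std::regex literal(R"re("(?:[^"\\]|\\.)*")re");
    // $& in the replacement stands for the whole matched literal.
    return std::regex_replace(line, literal, "std::string($&)");
}
```

After running this over the sources, every remaining std::string + int shows up as a compile error, which is the list of places to fix.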

If your case is exactly as
"some text in quotations" + a_numeric_variable_or_constant
then Powergrep or similar programs will let you scan all files for
("[^"]+")\s*\+\s*(\w+)
and replace with
\1 + std::to_string(\2)
This will show you the possible matches, but I strongly recommend previewing what you are replacing first, because it will also replace string variables.
Regular expressions cannot understand the semantics of your code, so they cannot tell whether the operands are integers. For that you need a program with a parser, such as CDT or a static code analyzer. Unfortunately I do not know of any that can do that. So, to sum up, I hope regex helps :)
PS: In the worst case, if the variables are not numeric, the compiler will give you an error, because the to_string function doesn't accept anything other than numeric values. You can then fix those manually, and I can only hope there won't be many.
PS 2: Some may think that Powergrep is expensive. You can use the trial for 15 days with full functionality.

You can try the Map-Reduce Clang plugin.
The tool was developed at Google to do just this kind of refactoring, mixing strong type-checking and regexp.
(See the video presentation here.)

You can use a C++ conversion operator and create a new class that overloads operator + as you need. Replace the int with the new class "Integer" and perform the required overloading. This requires no changes to the call sites, as long as f accepts a const char*.
class Integer{
long i;
std::string formatted;
public:
Integer(int value) : i(value) {} // the original `i = i` only assigned the parameter to itself
operator const char*() const {
return formatted.c_str();}
friend Integer operator +(const char* input, Integer t);
};
Integer operator +(const char* input, Integer integer) {
integer.formatted = input + std::to_string(integer.i);
return integer;
}
Integer i = ....
f("test" + i); //executes the overloaded operator

I'm assuming that, for the call f(some_str + i);, your function is defined like this:
void f(std::string value)
{
// do something.
}
If you declare another class, such as AdvString, that implements operator + for integers, and declare your function as in the code below, then the call f(some_str + i); will work:
void f(AdvString value)
{
// do something.
}
A sample implementation is here: https://github.com/prasaathviki/advstring

Related

Why is appending an int to a std::string undefined behavior with no compiler warning in C++?

In my code I use logging statements in order to better see what's going on. Sometimes I write code like the following:
int i = 1337;
// More stuff...
logger->info("i has the following value: " + i);
When compiled and executed in debug mode this does not print out i as expected (this is how it would work in Java/C# for example), it rather prints something garbled. In release mode however this might as well crash the entire application. What does the C++ standard say about appending ints to a std::string like I'm doing here?
Why does the compiler not warn me at all when I compile code invoking obvious undefined behavior like this? Am I missing something? I'm using Visual Studio 2022 (MSVC). The correct way to do the logging statement would be converting the int to a std::string explicitly:
logger->info("i has the following value: " + std::to_string(i));
However this bug easily slips through during development. My warning level is set to Level4 (/W4).
The problem is that in
logger->info("i has the following value: " + i);
you are not working with std::string. You are adding an int to a string literal, i.e. a const char[] array. The const char[] decays into a const char* pointer in certain contexts. In this case, the int advances that pointer forward by 1337 characters, which is way beyond the end of the string literal, and therefore undefined behavior.
You should get a better compiler that warns you about this, e.g.:
foo.cc:7:42: warning: offset ‘1337’ outside bounds of constant string [-Warray-bounds]
7 | foo("i has the following value: " + i);
| ^
You can use a std::string literal like this:
#include <string>
using namespace std::literals;
void foo(std::string);
void bla() {
int i = 1337;
foo("i has the following value: "s + i);
}
and then you get a "nicer" error that "std::string + int" isn't a thing in C++:
foo.cc:8:40: error: no match for ‘operator+’ (operand types are ‘std::__cxx11::basic_string<char>’ and ‘int’)
8 | foo("i has the following value: "s + i);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~
| | |
| std::__cxx11::basic_string<char> int
...
going on for 147 lines
After this, it should be obvious that what you want is this instead:
logger->info("i has the following value: "s + std::to_string(i));
Using std::string literals avoids mistakes like this, because it turns warnings (which your compiler doesn't even give) into hard errors, forcing you to write correct code. So I recommend using the s suffix for all strings.
This line is syntactically valid:
logger->info("i has the following value: " + i);
In the expression
"i has the following value: " + i
pointer arithmetic is used.
For example, if you write
logger->info("i has the following value: " + 6);
then this line has the same effect as writing
logger->info("the following value: ");
That is, the line
logger->info("i has the following value: " + i);
is equivalent to the line
logger->info( &"i has the following value: "[i]);
What does the C++ standard say about appending ints to a std::string
like I'm doing here
There is no object of type std::string in the expression. The string literal has an ordinary array type and is an operand of a pointer-arithmetic expression. In that expression the string literal is implicitly converted to a pointer to its first element, of type const char *.
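To make the difference concrete, here is a small sketch contrasting the two operations. Both helper names are made up for illustration, and the first one is only safe while the offset stays within the literal.

```cpp
#include <string>

// Hypothetical helper: shows what a callee receives for `literal + offset`.
// This is pointer arithmetic; it is only valid while
// 0 <= offset <= length of the literal.
std::string tail_of(const char* literal, int offset) {
    return std::string(literal + offset);
}

// Hypothetical helper: what was most likely intended, i.e. the text
// followed by the decimal representation of the number.
std::string concat(const char* literal, int value) {
    return std::string(literal) + std::to_string(value);
}
```

With an offset of 6, tail_of skips the six characters "i has " of the logging message, which matches the equivalence shown above.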

how adding string and numbers in cpp works using + operator?

I've used C++ for quite a while, and I knew that we cannot add strings and numbers (as the + operator is not overloaded for that). But I saw code like this:
#include <iostream>
using namespace std;
int main() {
string a = "";
a += 97;
cout << a;
}
this outputs 'a' and I also tried this.
string a ="";
a=a+97;
The second code gives a compilation error(as invalid args to + operator, std::string and int).
I don't want to concatenate the string and number.
What is the difference? Why does one work but not the other?
I was expecting that a+=97 is the same as a=a+97 but it appears to be different.
The first snippet works because std::string overloads operator+= to append a character to a string, and the int is implicitly converted to char. 97 is the ASCII code for 'a', so the result is "a".
The second snippet does not work because there is no + operator defined that accepts a std::string and an int, and no converting constructor to make a std::string out of an int or char. There are overloads of the + operator that take a char, but they are function templates whose character type must be deduced consistently from both operands. With an int operand that deduction fails, so no overload matches and an error is reported.
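A short sketch of the two behaviours, with helper names invented for illustration. The cast in the first function spells out the conversion that a += 97 performs implicitly.

```cpp
#include <string>

// Hypothetical helper: what `s += code` does for an int operand.
// The int is converted to char and that single character is appended.
std::string append_as_char(std::string s, int code) {
    s += static_cast<char>(code);
    return s;
}

// Hypothetical helper: appends the decimal digits of the number,
// which is what people coming from Java or C# usually expect.
std::string append_as_number(std::string s, int value) {
    return s + std::to_string(value);
}
```

On an ASCII platform, append_as_char("", 97) gives "a", while append_as_number("", 97) gives "97".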

c++ error: no matching function for call to ‘std::__cxx11::basic_string<char>::append<int>(int, int)’

Trying to run this:
// appending to string
#include <iostream>
#include <string>
int
main()
{
std::string str;
std::string str2 = "Writing ";
std::string str3 = "print 10 and then 5 more";
// used in the same order as described above:
str.append(str2); // "Writing "
str.append(str3, 6, 3); // "10 "
str.append("dots are cool", 5); // "dots "
str.append("here: "); // "here: "
str.append(10u, '.'); // ".........."
str.append(str3.begin() + 8, str3.end()); // " and then 5 more"
str.append<int>(5, 0x2E); // "....."
std::cout << str << '\n';
return 0;
}
But I get an error on str.append<int>(5, 0x2E):
error: no matching function for call to ‘std::__cxx11::basic_string<char>::append<int>(int, int)’
Using VS Code 1.43.1, running on ubuntu 19.10, gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
I've tried to run the code on Code::Blocks 16.01 IDE, and windows, but had same error.
You need to pass the second argument as a char, char(0x2E), and drop the explicit <int>:
str.append(5, char(0x2E));
When having problems with the standard template library, you can always take a look at the c++ reference of the function you're using: http://www.cplusplus.com/reference/string/string/append/
In this case, there isn't any reason to specify the <int> after append. When you do, both arguments get interpreted as int, while you want the second to be interpreted as a char. You can achieve this simply with str.append(5,0x2E);. Your compiler will pick the closest matching function, which is string& append (size_t n, char c);, and implicitly convert the second argument to a char.
You just don't need the <int> part. str.append(5, 0x2E); compiles fine.
There is no variant of std::string::append() that takes two integers - you should make the second parameter a character, as there is a variant that takes an integer and a character.
In addition, the explicit <int> does not turn the arguments into characters. It names the template argument of the iterator-range overload, append(InputIterator first, InputIterator last), and int is not an iterator type, so current standard library implementations reject the call. A std::string is generally defined as std::basic_string<char>, and its character type is not affected by what you write after append.
Since you're wanting to add five more . characters, I'm not sure why you wouldn't just do:
str.append(5, '.');
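As a minimal sketch of the overload these answers recommend, append(size_type n, char c), here are two hypothetical helpers. The second spells the character by its code to show that 0x2E and '.' are the same thing in ASCII.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends `count` dots using append(n, c).
std::string with_dots(std::string text, std::size_t count) {
    text.append(count, '.');
    return text;
}

// Same overload, with the character given by its ASCII code.
std::string with_dots_by_code(std::string text, std::size_t count) {
    text.append(count, static_cast<char>(0x2E));
    return text;
}
```

Neither call needs an explicit template argument; the compiler selects the overload from the argument types.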

Variable in Parentheses C++

I am currently in the process of writing up a C++ program that randomly chooses roles for people using file input/output.
I am almost done, and I build often to make sure my code is working and not pseudocode. I received an error on this snippet of code:
randomPrefs.open ("Preferences/"members[random]"-Preferences");
I am trying to access the text file in Preferences/foo-Preferences, and the variable is made random by some code above it. I have couted the random snippet and it works perfectly, so I need not include it here. The error I get is :
Avalon - Omnipotent.cpp:61:21: error: unable to find string literal operator 'operator""members' with 'const char [13]', 'unsigned int' arguments
And so, I have searched around for this error but have found nothing. I thought of making a mini-parentheses around it, and it resulted in a different error -
Avalon - Omnipotent.cpp:61:51: error: expression cannot be used as a function
Any help would be appreciated.
A little note down here, when not having the parentheses around it, I get a warning about my variable not being used -
Avalon - Omnipotent.cpp:39:21: warning: unused variable 'members' [-Wunused-variable]
However, the second error does not give a warning about the unused variable.
Here is what my variable looks like:
unsigned const char members[22] =
The variable random holds a randomly generated number from 0 to 21, and I use it as the index in members[random]. That part works perfectly; I just need help with these errors.
Help!
To concatenate strings, do the following:
std::string s = std::string("Preferences/") + members[random] + "-Preferences";
randomPrefs.open(s);
If you don't want the intermediate named variable, then:
randomPrefs.open(std::string("Preferences/") + members[random] + "-Preferences");
If members doesn't contain characters like 'A', 'B', 'C', or '4', and instead contains the number 4, 28, or 153, then you can convert the number to the appropriate string by using std::to_string.
std::string s = std::string("Preferences/") + std::to_string(members[random]) + "-Preferences";
The warning about the unused variable isn't useful, and is due to the compiler seeing earlier errors in your code. If you fix the above, that should also go away.
If you're trying to do string concatenation, it is probably best to use itoa() -> std::string(const char*) or to_string() for the number to string, and use operator+() or std::string.append() to do concatenation.
Note that to_string() is C++11:
http://coliru.stacked-crooked.com/a/eb3677d7abffca00
#include <iostream>
#include <string>
int main() {
std::string str1 = "Hello ";
int i = 2;
std::string str2 = " World!";
std::string output = "";
output.append(str1).append(std::to_string(i)).append(str2);
std::cout << output << std::endl;
return 0;
}
#YoBro: HERE! – Bill Lynch 7 mins ago
Modelling a bit of my code after his made it work!
std::string s = std::string("Preferences/") + std::to_string(members[random]) + "-Preferences";
randomPrefs.open(s);
I now use something similar to that.

How to cleanly use: const char* and std::string?

tl:dr
How can I concatenate const char* with std::string, neatly and
elegantly, without multiple function calls. Ideally in one function
call and have the output be a const char*. Is this impossible, what
is an optimum solution?
Initial Problem
The biggest barrier I have experienced with C++ so far is how it handles strings. In my opinion, of all the widely used languages, it handles strings the most poorly. I've seen other questions similar to this that either have an answer saying "use std::string" or simply point out that one of the options is going to be best for your situation.
However, this is useless advice when trying to use strings dynamically the way they are used in other languages. I cannot guarantee that I will always be able to use std::string, and for the times when I have to use const char* I hit the obvious wall of "it's constant, you can't concatenate it".
Every solution to any string manipulation problem I've seen in C++ requires repetitive multiple lines of code that only work well for that format of string.
I want to be able to concatenate any set of characters with the + symbol or make use of a simple format() function just how I can in C# or Python. Why is there no easy option?
Current Situation
Standard Output
I'm writing a DLL and so far I've been outputting text to cout via the << operator. Everything has been going fine so far using simple char arrays in the form:
cout << "Hello world!"
Runtime Strings
Now it comes to the point where I want to construct a string at runtime and store it with a class, this class will hold a string that reports on some errors so that they can be picked up by other classes and maybe sent to cout later, the string will be set by the function SetReport(const char* report). So I really don't want to use more than one line for this so I go ahead and write something like:
SetReport("Failure in " + __FUNCTION__ + ": foobar was " + foobar + "\n"); // __FUNCTION__ gets the name of the current function, foobar is some variable
Immediately of course I get:
expression must have integral or unscoped enum type and...
'+': cannot add two pointers
Ugly Strings
Right. So I'm trying to add two or more const char*s together and this just isn't an option. So I find that the main suggestion here is to use std::string, sort of weird that typing "Hello world!" doesn't just give you one of those in the first place but let's give it a go:
SetReport(std::string("Failure in ") + std::string(__FUNCTION__) + std::string(": foobar was ") + std::to_string(foobar) + std::string("\n"));
Brilliant! It works! But look how ugly that is!! That's some of the ugliest code I've ever seen. We can simplify to this:
SetReport(std::string("Failure in ") + __FUNCTION__ + ": foobar was " + std::to_string(foobar) + "\n");
Still possibly the worst way I've ever encountered of getting a simple one-line string concatenation, but everything should be fine now, right?
Convert Back To Constant
Well no, if you're working on a DLL, something that I tend to do a lot because I like to unit test so I need my C++ code to be imported by the unit test library, you will find that when you try to set that report string to a member variable of a class as a std::string the compiler throws a warning saying:
warning C4251: class 'std::basic_string<_Elem,_Traits,_Alloc>' needs to have dll-interface to be used by clients of class'
The only real solution to this problem that I've found other than "ignore the warning"(bad practice!) is to use const char* for the member variable rather than std::string but this is not really a solution, because now you have to convert your ugly concatenated (but dynamic) string back to the const char array you need. But you can't just tag .c_str() on the end (even though why would you want to because this concatenation is becoming more ridiculous by the second?) you have to make sure that std::string doesn't clean up your newly constructed string and leave you with garbage. So you have to do this inside the function that receives the string:
const std::string constString = (input);
m_constChar = constString.c_str();
Which is insane. Because now I traipsed across several different types of string, made my code ugly, added more lines than should need and all just to stick some characters together. Why is this so hard?
Solution?
So what's the solution? I feel that I should be able to make a function that concatenates const char*s together but also handle other object types such as std::string, int or double, I feel strongly that this should be capable in one line, and yet I'm unable to find any examples of it being achieved. Should I be working with char* rather than the constant variant, even though I've read that you should never change the value of char* so how would this help?
Are there any experienced C++ programmers who have resolved this issue and are now comfortable with C++ strings, what is your solution? Is there no solution? Is it impossible?
The standard way to build a string, formatting non-string types as strings, is a string stream
#include <sstream>
std::ostringstream ss;
ss << "Failure in " << __FUNCTION__ << ": foobar was " << foobar << "\n";
SetReport(ss.str());
If you do this often, you could write a variadic template to do that:
template <typename... Ts> std::string str(Ts&&...);
SetReport(str("Failure in ", __FUNCTION__, ": foobar was ", foobar, '\n'));
The implementation is left as an exercise for the reader.
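For anyone who wants a starting point for that exercise, here is one possible C++11 sketch. It is an assumption about what such a helper could look like, not the answer author's implementation, and stream_all is a made-up name. It simply streams every argument into a std::ostringstream.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Base case of the recursion: nothing left to stream.
inline void stream_all(std::ostringstream&) {}

// Streams the first argument, then recurses on the remaining ones.
template <typename T, typename... Ts>
void stream_all(std::ostringstream& out, T&& first, Ts&&... rest) {
    out << std::forward<T>(first);
    stream_all(out, std::forward<Ts>(rest)...);
}

// Concatenates anything that has an operator<< into one std::string.
template <typename... Ts>
std::string str(Ts&&... parts) {
    std::ostringstream out;
    stream_all(out, std::forward<Ts>(parts)...);
    return out.str();
}
```

Any type that can be written to a std::ostream works as an argument, so ints, doubles, string literals and std::string can be mixed freely in one call.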
In this particular case, string literals (including __FUNCTION__) can be concatenated by simply writing one after the other; and, assuming foobar is a std::string, that can be concatenated with string literals using +:
SetReport("Failure in " __FUNCTION__ ": foobar was " + foobar + "\n");
If foobar is a numeric type, you could use std::to_string(foobar) to convert it.
Plain string literals (e.g. "abc" and __FUNCTION__) and char const* do not support concatenation. These are just plain C-style char const[] and char const*.
Solutions are to use some string formatting facilities or libraries, such as:
std::string and concatenation using +. May involve too many unnecessary allocations, unless operator+ employs expression templates.
std::snprintf. This one does not allocate buffers for you and not type safe, so people end up creating wrappers for it.
std::stringstream. Ubiquitous and standard but its syntax is at best awkward.
boost::format. Type safe but reportedly slow.
cppformat. Reportedly modern and fast.
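To illustrate the kind of wrapper people end up writing around std::snprintf, here is a minimal two-pass sketch. The function name and the fixed message are illustrative assumptions taken from the question's example. It first asks snprintf for the required length, then formats into a buffer of that size.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical wrapper: builds the report message from the question.
std::string format_report(const char* function_name, int foobar) {
    // First pass: a null buffer of size 0 only computes the length needed.
    const int needed = std::snprintf(nullptr, 0,
        "Failure in %s: foobar was %d\n", function_name, foobar);
    if (needed < 0) {
        return std::string();  // formatting failed
    }
    // Second pass: one extra byte for the terminating null character.
    std::string result(static_cast<std::size_t>(needed) + 1, '\0');
    std::snprintf(&result[0], result.size(),
        "Failure in %s: foobar was %d\n", function_name, foobar);
    result.resize(static_cast<std::size_t>(needed));  // drop the terminator
    return result;
}
```

The lack of type safety mentioned above is visible here: nothing but compiler warnings checks that %s and %d match the argument types.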
One of the simplest solutions is to use an empty C++ string. Here I declare an empty string variable named _ and use it at the front of the string concatenation. Make sure you always put it at the front.
#include <cstdio>
#include <string>
using namespace std;
string _ = "";
int main() {
char s[] = "chararray";
string result =
_ + "function name = [" + __FUNCTION__ + "] "
"and s is [" + s + "]\n";
printf( "%s", result.c_str() );
return 0;
}
Output:
function name = [main] and s is [chararray]
Regarding __FUNCTION__, I found that in Visual C++ it is a macro while in GCC it is a variable, so SetReport("Failure in " __FUNCTION__ "; foobar was " + foobar + "\n"); will only work on Visual C++. See: https://msdn.microsoft.com/en-us/library/b0084kay.aspx and https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
The solution using empty string variable above should work on both Visual C++ and GCC.
My Solution
I've continued to experiment with different things and I've got a solution which combines tivn's answer that involves making an empty string to help concatenate long std::string and character arrays together and a function of my own which allows single line copying of that std::string to a const char* which is safe to use when the string object leaves scope.
I would have used Mike Seymour's variadic templates but they don't seem to be supported by the Visual Studio 2012 I'm running and I need this solution to be very general so I can't rely on them.
Here is my solution:
Strings.h
#ifndef _STRINGS_H_
#define _STRINGS_H_
#include <string>
// tivn's empty string in the header file
extern const std::string _;
// My own version of .c_str() which produces a copy of the contents of the string input
const char* ToCString(std::string input);
#endif
Strings.cpp
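#include "Strings.h"
#include <cstring>
// Definition of the empty string declared as `extern const std::string _;` in Strings.h
const std::string _ = "";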
const char* ToCString(std::string input)
{
char* result = new char[input.length()+1];
strcpy_s(result, input.length()+1, input.c_str());
return result;
}
Usage
m_someMemberConstChar = ToCString(_ + "Hello, world! " + someDynamicValue);
I think this is pretty neat and works in most cases. Thank you everyone for helping me with this.
As of C++20, the core of fmtlib has made its way into the ISO standard as std::format but, even with older standards, you can still download and use the library itself.
It gives similar capabilities as Python's str.format()(a), and your "ugly strings" example then becomes a relatively simple:
#include <fmt/format.h>
// Later on, where code is allowed (inside a function for example) ...
SetReport(fmt::format("Failure in {}: foobar was {}\n", __FUNCTION__, foobar));
It's much like the printf() family but with extensibility and type safety built in.
(a) But, unfortunately, not its string interpolation feature (use of f-strings), which has the added advantage of putting the expressions in the string at the place where they're output, something like:
set_report(f"Failure in {__FUNCTION__}: foobar was {foobar}\n");
If fmtlib ever got that capability, I'd probably wet my pants in excitement :-)