Boost xpressive regex results in garbage character - c++

I am trying to write some code that changes a string like "/path/file.extension" to another specified extension. I am trying to use boost::xpressive to do so. But, I am having problems. It appears that a garbage character appears in the output:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
std::string str( "xml.xml.xml.xml");
sregex date = sregex::compile( "(\\.*)(\\.xml)$");
std::string format( "\1.zipxml");
std::string str2 = regex_replace( str, date, format );
std::cout << "str = " << str << "\n";
std::cout << "str2 = " << str2 << "\n";
return 0;
}
Now compile and run it:
[bitdiot#kantpute foodir]$ g++ badregex.cpp
[bitdiot#kantpute foodir]$ ./a.out > output
[bitdiot#kantpute foodir]$ less output
[bitdiot#kantpute foodir]$ cat -vte output
str = xml.xml.xml.xml$
str2 = xml.xml.xml^A.zipxml$
In the above example, I redirect output to a file, and use cat to print out the non-printable character. Notice the ctrl-A in the str2.
Anyways, am I using boost libraries incorrectly? Is this a boost bug? Is there another regular expression I can use that can allow me to string replace the ".tail" with some other string? (It's fix in my example.)
thanks.

At least as I'm reading things, the culprit is right here: std::string format( "\1.zipxml");.
You forgot to escape the backslash, so \1 is giving you a control-A. You almost certainly want \\1.
Alternatively (if your compiler is new enough) you could use a raw string instead, so it would be something like: R"(\1.zipxml)", and you wouldn't have to escape your backslashes. I probably wouldn't bother to mention this, except for the fact that if you're writing REs in C++ strings, raw strings are pretty much your new best friend (IMO, anyway).

As Jerry Coffin pointed out to me. It was a stupid mistake on my part.
The errant code is the following:
std::string format( "\1.zipxml");
This should be replaced with:
std::string format( "$1.zipxml");
Thanks for your help everyone.

Related

C++ Regex always matching entire string

Whenever I use a regex function it matches the entire string for some reason.
#include <iostream>
#include <regex>
int main() {
std::string text = "This (is a) test";
std::regex pattern("\(.+\)");
std::cout << std::regex_replace(text, pattern, "isnt") << std::endl;
return 0;
}
Output: isnt
Your pattern unfortunately is not what it seems to be. Here is the problem.
Imagine for some reason you want to match tabs in with you regex. You might try this.
std::regex my_regex("\t");
This would work, but the string your std::regex class has seen is " ", not "\t". This is because of how C++ threats escaped characters. To pass literal "\t", you had to do the following.
std::regex my_regex("\\t");
So the correct syntax for your regex is.
std::regex pattern("\\(.+\\)");

Error in JSON string creation with / operator using jsoncpp library

I am using JSONcpp library and facing problem to create json which contains / operator (like date : 02/12/2015). Below is my code:
JSONNODE *n = json_new(JSON_NODE);
json_push_back(n, json_new_a("String Node", "02/05/2015"));
json_char *jc = json_write_formatted(n);
printf("%s\n", jc);
json_free(jc);
json_delete(n);
Output :
{
"String Node":"02\/05\/2015"
}
Check here "\/" in date, we want only "/" in date, so my expected output should look like this:
Expected output:
{
"String Node":"02/05/2015"
}
How to get rid of it? We are using inbuilt library function and we can not modify library.
According to this example
The json_new_a method will escape your string values when you go to
write the final string.
The additional backslashes you are facing in the output are the result of adding escape characters. This is actually a good thing as it prevents both accidental and intentional errors.
I suppose the easies option for you would be to replace all \/ occurrences in output string with single /. I wrote a simple program to do so:
#include <iostream>
#include <string>
#include <boost/algorithm/string/replace.hpp>
using namespace std;
int main()
{
//Here's the "json_char *jc" from your code
const char* jc = "02\\/05\\/2015";
//Replacing code
string str(jc);
boost::replace_all(str, "\\/", "/");
//Show result
cout << "Old output: " << jc << endl;
cout << "New output: " << str << endl;
}
Live demo
BTW: libJSON is not really a C++ library - it's much more C-style which involves many low-level operations and is more complicated and obfuscated than more C++ish libraries. If you'd like to try, there is a nice list of C++ JSON libs here.

Regular expressions in c++11

I want to parser cpu info in Linux. I wrote such code:
// Returns full data of the file in a string
std::string filedata = readFile("/proc/cpuinfo");
std::cmath results;
// In file that string looks like: 'model name : Intel ...'
std::regex reg("model name: *");
std::regex_search(filedata.c_str(), results, reg);
std::cout << results[0] << " " << results[1] << std::endl;
But it returns empty string. What's wrong?
Not all compilers support the full C++11 specification yet. Notably, regex_search does not work in GCC (as of version 4.7.1), but it does in VC++ 2010.
You didn't specify any capture in your expression.
Given the structure of /proc/cpuinfo, I'd probably prefer a line
oriented input, using std::getline, rather than trying to do
everything at once. So you'ld end up with something like:
std::string line;
while ( std::getline( input, line ) ) {
static std::regex const procInfo( "model name\\s*: (.*)" );
std::cmatch results;
if ( std::regex_match( line, results, procInfo ) ) {
std::cout << "???" << " " << results[1] << std::endl;
}
}
It's not clear to me what you wanted as output. Probably, you also
have to capture the processor line as well, and output that at the
start of the processor info line.
The important things to note are:
You need to accept varying amounts of white space: use "\\s*" for 0 or more, "\\s+" for one or more whitespace characters.
You need to use parentheses to delimit what you want to capture.
(FWIW: I'm actually basing my statements on boost::regex, since I
don't have access to std::regex. I think that they're pretty similar,
however, and that my statements above apply to both.)
Try std::regex reg("model_name *: *"). In my cpuinfo there are spaces before colon.

formatting a string which contains quotation marks

I am having problem formatting a string which contains quotationmarks.
For example, I got this std::string: server/register?json={"id"="monkey"}
This string needs to have the four quotation marks replaced by \", because it will be used as a c_str() for another function.
How does one do this the best way on this string?
{"id"="monkey"}
EDIT: I need a solution which uses STL libraries only, preferably only with String.h. I have confirmed I need to replace " with \".
EDIT2: Nvm, found the bug in the framework
it is perfectly legal to have the '"' char in a C-string. So the short answer is that you need to do nothing. Escaping the quotes is only required when typing in the source code
std::string str("server/register?json={\"id\"=\"monkey\"}")
my_c_function(str.c_str());// Nothing to do here
However, in general if you want to replace a substring by an other, use boost string algorithms.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
int main(int, char**)
{
std::string str = "Hello world";
boost::algorithm::replace_all(str, "o", "a"); //modifies str
std::string str2 = boost::algorithm::replace_all_copy(str, "ll", "xy"); //doesn't modify str
std::cout << str << " - " << str2 << std::endl;
}
// Displays : Hella warld - Hexya warld
If you std::string contains server/register?json={"id"="monkey"}, there's no need to replace anything, as it will already be correctly formatted.
The only place you would need this is if you hard-coded the string and assigned it manually. But then, you can just replace the quotes manually.

C++ - string.compare issues when output to text file is different to console output?

I'm trying to find out if two strings I have are the same, for the purpose of unit testing. The first is a predefined string, hard-coded into the program. The second is a read in from a text file with an ifstream using std::getline(), and then taken as a substring. Both values are stored as C++ strings.
When I output both of the strings to the console using cout for testing, they both appear to be identical:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
However, the string.compare returns stating they are not equal. When outputting to a text file, the two strings appear as follows:
ThisIsATestStringOutputtedToAFile
T^#h^#i^#s^#I^#s^#A^#T^#e^#s^#t^#S^#t^#r^#i^#n^#g^#O^#u^#t^#p^#u^#t^#
t^#e^#d^#T^#o^#A^#F^#i^#l^#e
I'm guessing this is some kind of encoding problem, and if I was in my native language (good old C#), I wouldn't have too many problems. As it is I'm with C/C++ and Vi, and frankly don't really know where to go from here! I've tried looking at maybe converting to/from ansi/unicode, and also removing the odd characters, but I'm not even sure if they really exist or not..
Thanks in advance for any suggestions.
EDIT
Apologies, this is my first time posting here. The code below is how I'm going through the process:
ifstream myInput;
ofstream myOutput;
myInput.open(fileLocation.c_str());
myOutput.open("test.txt");
TEST_ASSERT(myInput.is_open() == 1);
string compare1 = "ThisIsATestStringOutputtedToAFile";
string fileBuffer;
std::getline(myInput, fileBuffer);
string compare2 = fileBuffer.substr(400,100);
cout << compare1 + "\n";
cout << compare2 + "\n";
myOutput << compare1 + "\n";
myOutput << compare2 + "\n";
cin.get();
myInput.close();
myOutput.close();
TEST_ASSERT(compare1.compare(compare2) == 0);
How did you create the content of myInput? I would guess that this file is created in two-byte encoding. You can use hex-dump to verify this theory, or use a different editor to create this file.
The simpliest way would be to launch cmd.exe and type
echo "ThisIsATestStringOutputtedToAFile" > test.txt
UPDATE:
If you cannot change the encoding of the myInput file, you can try to use wide-chars in your program. I.e. use wstring instead of string, wifstream instead of ifstream, wofstream, wcout, etc.
The following works for me and writes the text pasted below into the file. Note the '\0' character embedded into the string.
#include <iostream>
#include <fstream>
#include <sstream>
int main()
{
std::istringstream myInput("0123456789ThisIsATestStringOutputtedToAFile\x0 12ou 9 21 3r8f8 reohb jfbhv jshdbv coerbgf vibdfjchbv jdfhbv jdfhbvg jhbdfejh vbfjdsb vjdfvb jfvfdhjs jfhbsd jkefhsv gjhvbdfsjh jdsfhb vjhdfbs vjhdsfg kbhjsadlj bckslASB VBAK VKLFB VLHBFDSL VHBDFSLHVGFDJSHBVG LFS1BDV LH1BJDFLV HBDSH VBLDFSHB VGLDFKHB KAPBLKFBSV LFHBV YBlkjb dflkvb sfvbsljbv sldb fvlfs1hbd vljkh1ykcvb skdfbv nkldsbf vsgdb lkjhbsgd lkdcfb vlkbsdc xlkvbxkclbklxcbv");
std::ofstream myOutput("test.txt");
//std::ostringstream myOutput;
std::string str1 = "ThisIsATestStringOutputtedToAFile";
std::string fileBuffer;
std::getline(myInput, fileBuffer);
std::string str2 = fileBuffer.substr(10,100);
std::cout << str1 + "\n";
std::cout << str2 + "\n";
myOutput << str1 + "\n";
myOutput << str2 + "\n";
std::cout << str1.compare(str2) << '\n';
//std::cout << myOutput.str() << '\n';
return 0;
}
Output:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
It turns out that the problem was that the file encoding of myInput was UTF-16, whereas the comparison string was UTF-8. The way to convert them with the OS limitations I had for this project (Linux, C/C++ code), was to use the iconv() functions. To keep the compatibility of the C++ strings I'd been using, I ended up saving the string to a new text file, then running iconv through the system() command.
system("iconv -f UTF-16 -t UTF-8 subStr.txt -o convertedSubStr.txt");
Reading the outputted string back in then gave me the string in the format I needed for the comparison to work properly.
NOTE
I'm aware that this is not the most efficient way to do this. I've I'd had the luxury of a Windows environment and the windows.h libraries, things would have been a lot easier. In this case though, the code was in some rarely used unit tests, and as such didn't need to be highly optimized, hence the creation, destruction and I/O operations of some text files wasn't an issue.