C++ string: How to replace unescaped backslash? [closed] - c++

I am reading some log files. The logs for Windows contain paths like C:\some\path.
When I read with std::getline, I get a string containing unescaped backslashes. How can I replace them with forward slashes?
I tried
std::replace(str.begin(), str.end(), '\\', '/');
but the result looks like C:somepath instead of C:/some/path.
How do I replace the \ with / or \\?
This string is then used to build a JSON object, so not replacing the backslashes results in an invalid JSON object.

The std::replace call that you tried is perfectly valid and should do exactly what you want, so the only reason for the resulting string not to contain any slashes is that there were no backslashes in it to begin with.
I suggest using a debugger to determine what happens to your string through its whole lifetime.

Okay, so this is for conversion to JSON, where backslashes need to be modified somehow or other (apparently conversion to forward slashes is allowable in this case, otherwise you'd need to double the back-slashes to escape them).
Your basic idea should work--simply replacing each \\ with a / should be easy enough.
#include <iostream>
#include <algorithm>
#include <string>
#include <cassert>

int main() {
    std::string in{"a\\b\\c\\d"};
    std::replace(in.begin(), in.end(), '\\', '/');
    assert(in == "a/b/c/d");
    std::cout << in;
}
I'm not sure what problem you encountered--at least for me, this seems to work fine. Of course, this only really makes sense as part of a larger program. If you were going to do this in isolation, tr would be entirely sufficient. If you really needed to make it a program, SNOBOL would do the job considerably more easily than C or C++:
loop: INPUT "\" = '/' . OUTPUT : s(loop)
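If forward slashes are not acceptable and the backslashes have to survive into the JSON, the doubling mentioned above could look roughly like this. It is a minimal sketch: escapeForJson is a made-up helper name, and it only handles backslashes and double quotes, not control characters or non-ASCII text.

```cpp
#include <string>

// Hypothetical helper (not from any library): prefix every backslash and
// double quote with a backslash so the text can sit inside a JSON string.
// Control characters are deliberately not handled in this sketch.
std::string escapeForJson(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\\' || c == '"') {
            out += '\\';  // the escaping backslash
        }
        out += c;         // the original character
    }
    return out;
}
```

For a line read from the log that contains C:\some\path, this returns C:\\some\\path, which a JSON parser turns back into the original path.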

Escape sequences such as '\n' for a newline or '\\' for a backslash are only used in literals, i.e. the constant strings and characters in your code. In string or character variables there is no special handling of backslashes.
That's because backslashes in string or character literal constants are handled by the compiler, at the time of compilation. Nothing is done at run-time.
So the solution to your problems is really to do nothing at all.

Related

Why does it give me an error when opening a txt fiile? [duplicate]

I'm really confused about the escape character " \ " and its relation to the windows file system. In the following example:
const char* fwdslash = "c:/myfolder/myfile.txt";
const char* backslash = "c:\myfolder\myfile.txt";
const char* dblbackslash = "c:\\myfolder\\myfile.txt";
std::ifstream file1(fwdslash); // Works
std::ifstream file2(dblbackslash); // Works
std::ifstream file3(backslash); // Doesn't work
I get what you are doing here is escaping a special character so you can use it in this string. In no way by placing a backslash in a string literal or std::string do you actually change the string ---
---Edit: This is completely wrong, and the source of my confusion---
So it seems that the escape character is only treated by certain classes or things to mean something other than a backslash, like outputting on the console, ie., std::cout << "\hello"; will not print the backslash. In the case of ifstream (or I'm not sure if the same applies with the C fopen() version), it must be that this class or function treats backslashes as escape characters. I'm wondering, since the Windows file system uses backslashes wouldn't it make sense for it to accept the simple string with backslashes, ie., "c:\myfolder\myfile.txt" ? Trying it this way fails.
Also, in my compiler (Visual Studio) when I include headers I can use .\ and ..\ to mean either the current folder, or the parent folder. I'm pretty sure the \ in this isn't related to the escape character, but are these forms specific to Windows, part of the C preprocessor, or part of the C or C++ language? I know that backslashes are a Windows thing, so I can't see any reason another system would expect backslashes even when using .\ and ..\
Thanks.
In no way by placing a backslash in a string literal[...] do you
actually change the string
You do. The compiler modifies the literal you wrote before embedding it into the compiled program. If a backslash is found in a string or character literal while parsing the source code, it is dropped and the next character is treated specially: \n becomes a newline, \t a tab, etc. For escaped characters without a special meaning, the treatment is implementation-defined; usually the character is just kept unchanged.
You cannot just pass "c:\myfolder\file.txt" because that is not the string your program will see. Your program will typically see "c:myfolder", a form feed character (from \f), and "ile.txt" instead. This is why an escaped backslash has a special meaning: it allows embedding backslashes in the actual string your program will see.
The solution is to either escape your backslashes, or use raw string literals (C++11 onwards):
const char* path = R"(c:\myfolder\file.txt)";
Filenames given to #include directive are not string literals, even if they are in form "path\to\header", so substitution rules are not applied to them.
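A quick way to convince yourself that the two spellings give the same string is to compare them. The function names below are made up for the illustration.

```cpp
#include <string>

// Escaped spelling: each \\ in the source becomes one backslash in the string.
std::string escapedPath() { return "c:\\myfolder\\file.txt"; }

// Raw string literal (C++11): everything between R"( and )" is taken verbatim.
std::string rawPath() { return R"(c:\myfolder\file.txt)"; }
```

escapedPath() == rawPath() holds, and both strings contain exactly two backslash characters.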
A single backslash escapes the next character. To get a literal backslash you need to escape it with a second backslash. As for the forward slash, accepting it is probably a compatibility feature that follows the Unix tradition.
The Java world is similar: a single forward slash works as a path separator on both Windows and Unix, and on Windows a double backslash in a string literal works as well.
To make it more clear why single backslash doesn't work, just remember that the following String practically produces a newline, a backslash and a tab:
"\n\\\t"
i.e. in an example like:
"c:\my\next\file.txt"
would actually produce:
"c:my
ext
ile.txt"
(the break before "ext" is the newline from \n, and the break before "ile.txt" is the form feed from \f)
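The effect can be checked in code. This sketch uses "c:\next\file.txt" rather than the path above, on the assumption that you want it to compile cleanly: \m is not a standard escape sequence and compilers may warn about it or reject it. brokenPath is a made-up name.

```cpp
#include <string>

// The compiler consumes "\n" and "\f" as newline and form feed, so neither
// the backslashes nor the letters n and f reach the program.
std::string brokenPath() { return "c:\next\file.txt"; }
```

The returned string holds "c:", a newline, "ext", a form feed, and "ile.txt"; searching it for a backslash finds nothing.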
This happens because, in a C string literal, a backslash escapes the next character to form special characters. This is what lets you write newlines (\n), nulls (\0), carriage returns (\r), etc.
char* backslash = "c:\myfolder \myfile.txt";

Make variable string ignore escape sequences

I'm currently facing an issue with a method passing a string to another method. The problem is that I want to prevent it from interpreting possible escape sequences.
The string I want to parse is not constant so (as far as I know) using the R declaration to make it a raw literal is not applicable here since I have to use variables.
Furthermore, in some cases there is user input included into the string (unconverted), so simply escaping those sequences by replacing a "\" character with "\\" is not an option either, the input can include those sequences too.
To be more precise on the issue:
A string formatted like, e.g., "\x10\x4 \x6(" is converted into a non-human-readable form as soon as it gets passed to the next function. I want to prevent that conversion in order to get the exact same string in the next function, which needs to work with it.
Hope someone can help me since I'm new to c++ programming. Thanks in advance :D
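#include "pch.h" // Visual Studio precompiled header
#include <iostream>
#include <string>

// The class declaration was not shown in the question; reconstructed from usage.
class stringTester
{
public:
    std::string exampleString();
    void stringOutput(std::string test);
};

int main()
{
    stringTester stringtester;
    std::string test = stringtester.exampleString();
    stringtester.stringOutput(test);
}

std::string stringTester::exampleString()
{
    std::string exampleInput = "\x10\x5\x1a\aTestInput\\n \x6(";
    return exampleInput;
}

void stringTester::stringOutput(std::string test)
{
    std::cout << test << std::endl;
}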
The actual output here (copied from the console) is " TestInput\n ( ", whereas the wanted output would be the original string "\x10\x5\x1a\aTestInput\n \x6("
Edit: It seems SO can't show the unprintable characters. There are extra characters before and after the "TestInput\n ("
When you write a string literal in your source code the compiler replaces escape sequences with the character that they represent. That's why the quoted string in your example gets turned into nonsense. The way to fix that is to either replace each backslash with two backslashes or to make it a raw string literal.
When your program reads text input it doesn't do any of those adjustments. So if the code does
std::string input;
std::getline(std::cin, input); // whole line; std::cin >> input would stop at the first space
and the user types the characters \x10\x5\x1a\aTestInput\\n \x6( into the console, input will end up with the characters \x10\x5\x1a\aTestInput\\n \x6(.
Once you've got the string, whether as a string literal or as text from the console, you can do whatever you want with it.
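If the string has already been built from a literal and you want to see it in a readable form again, one option is to re-escape it for display. The sketch below is only that: escapeForDisplay is a hypothetical helper, and it cannot recover the original spelling (for example, \a comes back as \x07 and \x5 as \x05), only an equivalent one.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: show non-printable bytes as \xNN and double any
// backslash, so the result can be read back unambiguously.
std::string escapeForDisplay(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        if (c == '\\') {
            out += "\\\\";                 // one backslash becomes two
        } else if (std::isprint(c)) {
            out += static_cast<char>(c);   // printable: keep as-is
        } else {
            char buf[5];                   // "\xNN" plus terminating NUL
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Passing the result to std::cout instead of the raw string prints text such as \x10\x05 in place of the invisible control characters.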
You have two possibilities for a backslash to remain a backslash in your C/C++ strings (and Java, JavaScript, PHP...)
Double all the Backslashes
Just as you said, you want to double all backslashes. This is fine. If the input was:
\\\\
Then your C/C++ string is going to be:
"\\\\\\\\"
(a mouthful, I know...)
Use the Hex/Octal Character
The other way, if you don't like the double backslash too much (if it scares you, somehow), is to use the character sequence in octal or hex (or Unicode in newer versions):
\ becomes "\134" or "\x5C"
As you may notice, though, this means 4 characters per backslash. So most people will generally just double the backslash (only 2 characters). Plus, the double backslash is well understood. The code point may not be as well known by programmers coming after you.
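The three spellings produce the same one-character string, which is easy to verify. The function names are invented for the example, and the octal and hex values assume an ASCII-compatible execution character set.

```cpp
#include <string>

// Three ways to write a single backslash character in a string literal.
std::string backslashDoubled() { return "\\"; }    // escaped backslash
std::string backslashOctal()   { return "\134"; }  // octal code 134
std::string backslashHex()     { return "\x5C"; }  // hex code 5C
```

All three functions return a string of length 1 whose only character compares equal to '\\'.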
As a side note, if your user can enter any character, then they can also enter the double quote (") character. It is important that you also escape those. You can similarly use the backslash and the double quote character or its code point:
\" or \042 or \x22

Intellij: Regular Expression failed to match - produced stack overflow when matching content of the file [duplicate]

This is my Regex
((?:(?:'[^']*')|[^;])*)[;]
It tokenizes a string on semicolons. For example,
Hello world; I am having a problem; using regex;
Result is three strings
Hello world
I am having a problem
using regex
But when I use a large input string I get this error
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
How is this caused and how can I solve it?
Unfortunately, Java's built-in regex support has problems with regexes containing repeated alternations (that is, (A|B)*). This is compiled into a recursive call, which results in a StackOverflowError when used on a very large string.
A possible solution is to rewrite your regex to not use a repeated alternation, but if your goal is to tokenize a string on semicolons, you don't really need a complex regex at all; just use String.split() with a simple ";" as the argument.
If you really need to use a regex that overflows your stack, you can increase the size of your stack by passing something like -Xss40m to the JVM.
It might help to add a + after the [^;], so that you have fewer repetitions.
Isn't there also some construct that says “if the regular expression matched up to this point, don't backtrack”? Maybe that comes in handy, too. (Update: it is called possessive quantifiers).
A completely different alternative is to write a utility method called splitQuoted(char quote, char separator, CharSequence s) that explicitly iterates over the string and remembers whether it has seen an odd number of quotes. In that method you could also handle the case that the quote character might need to be unescaped when it appears in a quoted string.
'I'm what I am', said the fox; and he disappeared.
'I\'m what I am', said the fox; and he disappeared.
'I''m what I am', said the fox; and he disappeared.

C++ - Escaping or disabling backslash on string

I am writing a C++ program to solve a common problem of message decoding. Part of the problem requires me to get a bunch of random characters, including '\', and map them to a key, one by one.
My program works fine in most cases, except that when I read characters such as '\' from a string, I obviously get a completely different character representation (e.g. '\0' yields a null character, or '\' simply escapes itself when it needs to be treated as a character).
Since I am not supposed to have any control on what character keys are included, I have been desperately trying to find a way to treat special control characters such as the backslash as the character itself.
My questions are basically these:
Is there a way to turn all special characters off within the scope of my program?
Is there a way to override current digraphs definitions of special characters and define them as something else (like digraphs using very rare keys)?
Is there some obscure method on the String class that I missed which can force the actual character on the string to be read instead of the pre-defined constant?
I have been trying to look for a solution for hours now but all possible fixes I've found are for other languages.
Any help is greatly appreciated.
If you read in a string like "\0" from stdin or a file, it will be treated as two separate characters: '\\' and '0'. There is no additional processing that you have to do.
Escaping characters is only used for string/character literals. That is to say, when you want to hard-code something into your source code.
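For the key-mapping problem in the question, that means a backslash read from input can be used as a map key like any other character. Here is a rough sketch, assuming the key is held in a std::map<char, char>; decode is a hypothetical function, not something from the question.

```cpp
#include <map>
#include <string>

// Hypothetical decoder: replace each character that appears in `key` with
// its mapped value and pass every other character through unchanged.
// A backslash in `message` is an ordinary character here.
std::string decode(const std::string& message, const std::map<char, char>& key) {
    std::string out;
    out.reserve(message.size());
    for (char c : message) {
        auto it = key.find(c);
        out += (it != key.end()) ? it->second : c;
    }
    return out;
}
```

With a key that maps the backslash to a and 0 to b, a message consisting of the two characters \0 read from stdin decodes to ab. The escaping only shows up when the key itself is hard-coded, where the backslash has to be written as '\\'.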