How to detect ESC in string using Boost regex - c++

I need to determine if a file is PCL encoded, so I am looking at the first line to see if it begins with an ESC character. If you know a better way, feel free to suggest it. Here is my code:
bool pclFlag = false;
if (containStr(jobLine, "^\\e")) {
pclFlag=true;
}
bool containStr(const string& s, const string& re)
{
static const boost::regex e(re);
return regex_match(s, e);
}
pclFlag does not get set to true.

You've declared boost::regex e to be static, which means it will only get initialized the very first time your function is called. If your search here is not the first call, it will be searching for whatever string was passed in the first call.
regex_match must match the entire string. Try adding ".*" (dot star) to the end of your regex.
Important
Note that the result is true only if the expression matches the whole of the input sequence. If you want to search for an expression somewhere within the sequence then use regex_search. If you want to match a prefix of the character string then use regex_search with the flag match_continuous set.
http://www.boost.org/doc/libs/1_51_0/libs/regex/doc/html/boost_regex/ref/regex_match.html
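Both points can be combined in a small sketch. It uses std::regex from C++11 instead of Boost.Regex, purely so the snippet is self-contained (an assumption on my part; Boost offers regex_search and match_continuous as well). The regex is built on every call rather than cached in a static, and regex_search with match_continuous only requires the pattern to match a prefix. The name startsWithPattern is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `re` matches a prefix of `s`.
// The regex is constructed on every call, so each call uses the pattern it was given.
bool startsWithPattern(const std::string& s, const std::string& re)
{
    const std::regex e(re);
    // match_continuous: the match must begin at the first character of s.
    return std::regex_search(s, e, std::regex_constants::match_continuous);
}
```

With this, startsWithPattern(jobLine, "\033") checks for a leading ESC. Constructing a regex per call is not free, so for a fixed pattern a plain character comparison is still cheaper.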
@JoachimPileborg is right... if (jobLine[0] == 0x1B) {} is much easier.

Boost.Regex seems like overkill if all you want to do is see if a string starts with a certain character.
bool pclFlag = jobLine.length() > 0 && jobLine[0] == '\033';
You could also use Boost string algorithms:
#include <boost/algorithm/string.hpp>
bool pclFlag = boost::starts_with(jobLine, "\033");
If you're looking to see if a string contains an escape anywhere in the string:
bool pclFlag = jobLine.find('\033') != std::string::npos;

Related

Replace single backslash with double in a string c++

I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you can use std::regex:
#include <regex>
#include <iostream>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that makes a substitution, i.e. replaces all occurrences of a given substring by another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.
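If you would rather not pull in Boost, a replace-all helper is short to write with find and replace. This is a minimal sketch; the name replaceAll is hypothetical and not a standard library function.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `from` in `s` with `to` and returns the result.
std::string replaceAll(std::string s, const std::string& from, const std::string& to)
{
    if (from.empty()) return s;  // nothing to search for; avoids an endless loop
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.length(), to);
        pos += to.length();  // skip past the inserted text so it is not rescanned
    }
    return s;
}
```

For the question's case, the call would be replaceAll(str, "\\", "\\\\"), with str itself written as "d:\\test\\text.txt" in the source.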

Why can't I specify regex_constants in assign?

I am using std::regex with VS2015/VS2017 on Windows. I am trying to set up a general routine that performs one of the following 8 search functions, as requested by the user:
Begins with
Does not begin with
Ends with
Does not end with
Contains string
Does not contain string
Contains any of the characters in supplied string
Does not contain any of the characters in supplied string.
The routine is passed the type above, the associated string, the string to search, a boolean variable to state if the search is case sensitive/insensitive and another boolean variable on whether to restrict the search for whole words only (types 1-6 only). I modify the supplied find string if "whole words only" is specified.
e.g.
void Search(std::wstring sRegex, std::wstring wsString, bool bCaseSensitive, bool bWholeWordsOnly);
However, I can't get basic case insensitivity to work as I expected.
This works:
std::wregex reg;
std::wsmatch m;
if (bCaseSensitive) {
reg.assign(sRegex.c_str(), std::regex::ECMAScript);
} else {
reg.assign(sRegex.c_str(), std::regex::ECMAScript | std::regex::icase);
}
std::regex_search(wsString, m, reg) .....
but this doesn't (icase ignored and worse):
std::wregex reg;
std::wsmatch m;
reg.assign(sRegex.c_str(), std::regex::ECMAScript |
(bCaseSensitive ? 0 : std::regex::icase));
std::regex_search(wsString, m, reg) .....
Can anyone explain this please?
PS. I haven't coded more than type 1 yet, as I want to fix this first before continuing. Any suggestions for the regex string for the other types, with and without "whole word only" (where appropriate), would be gratefully received.
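One likely explanation, offered as an assumption rather than a confirmed diagnosis: the conditional (bCaseSensitive ? 0 : std::regex::icase) mixes an int with an enumeration value, so the whole flag expression ends up as a plain int. A call of the form assign(pointer, int) can then select the overload that takes a character count instead of the one that takes flags, which would explain "icase ignored and worse". The sketch below keeps the flags in a variable of type std::regex_constants::syntax_option_type so that no int is involved. The function name containsMatch is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Searches wsString for sRegex, optionally ignoring case.
bool containsMatch(const std::wstring& sRegex, const std::wstring& wsString, bool bCaseSensitive)
{
    // Keep the flags typed as syntax_option_type so they are never converted to int.
    std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
    if (!bCaseSensitive) {
        flags |= std::regex_constants::icase;
    }
    std::wregex reg;
    reg.assign(sRegex, flags);  // overload taking a string and flags
    std::wsmatch m;
    return std::regex_search(wsString, m, reg);
}
```

Writing both branches of the conditional with the enumeration type, for example std::regex::ECMAScript on one side and std::regex::ECMAScript | std::regex::icase on the other, should have the same effect.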

boost regex_match does not match string with special characters [duplicate]

I'm just getting my head around regular expressions, and I'm using the Boost Regex library.
I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.
Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.
Alternatively, is there a list of all characters that would need to be escaped?
. ^ $ | ( ) [ ] { } * + ? \
Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.
const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
const std::string rep("\\\\&");
std::string result = regex_replace(url_to_escape, esc, rep,
boost::match_default | boost::format_sed);
(The flag boost::format_sed selects sed's replacement string format. In sed, an unescaped & outputs whatever was matched by the whole expression.)
Or, if you are not comfortable with sed's replacement string format, just change the flag to boost::format_perl, and you can use the familiar $& to refer to whatever was matched by the whole expression.
const std::string rep("\\\\$&");
std::string result = regex_replace(url_to_escape, esc, rep,
boost::match_default | boost::format_perl);
Using code from Dav (+ a fix from comments), I created ASCII/Unicode function regex_escape():
std::wstring regex_escape(const std::wstring& string_to_escape) {
static const boost::wregex re_boostRegexEscape( _T("[.^$|()\\[\\]{}*+?\\\\]") );
const std::wstring rep( _T("\\\\&") );
std::wstring result = regex_replace(string_to_escape, re_boostRegexEscape, rep, boost::match_default | boost::format_sed);
return result;
}
For ASCII version, use std::string/boost::regex instead of std::wstring/boost::wregex.
Same with boost::xpressive:
const boost::xpressive::sregex re_escape_text = boost::xpressive::sregex::compile("([\\^\\.\\$\\|\\(\\)\\[\\]\\*\\+\\?\\/\\\\])");
std::string regex_escape(std::string text){
text = boost::xpressive::regex_replace( text, re_escape_text, std::string("\\$1") );
return text;
}
In C++11, you can use raw string literals to avoid escaping the regex string:
std::string myRegex = R"(something\.com)";
See http://en.cppreference.com/w/cpp/language/string_literal, item (6).
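The same escaping idea can be written with std::regex and a raw string literal, for readers who are not tied to Boost. This is a sketch, not a port of the Boost code above: it relies on the default ECMAScript replacement format of std::regex_replace, where $& stands for the whole match and a backslash is an ordinary character. The name regexEscape is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Prefixes every regex metacharacter in `text` with a backslash.
std::string regexEscape(const std::string& text)
{
    // Character class listing . ^ $ | ( ) [ ] { } * + ? and the backslash itself.
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // Replacement: a literal backslash followed by the matched character.
    return std::regex_replace(text, special, R"(\$&)");
}
```

The escaped result can then be embedded in a larger pattern, e.g. std::regex(regexEscape(url) + "/.*").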

Regular Expression for removing suffix

What is the regular expression for removing the suffix of file names? For example, if I have a file name in a string such as "vnb.txt", what is the regular expression to remove ".txt"?
Thanks.
Do you really need a regular expression to do this? Why not just look for the last period in the string, and trim the string up to that point? Frankly, there's a lot of overhead for a regular expression, and I don't think you need it in this case.
As suggested by tstenner, you can try one of the following, depending on what kinds of strings you're using:
std::strrchr
std::string::find_last_of
First example:
const char* str = "Directory/file.txt";
size_t index = 0;
const char* pStr = strrchr(str, '.');
if (nullptr != pStr)
{
index = pStr - str;
}
Second example:
size_t index = std::string("Directory/file.txt").find_last_of('.');
If you are using Qt already, you could use QFileInfo, and use the baseName() function to get just the name (if one exists), or the suffix() function to get the extension (if one exists).
If you're looking for a solution that will give you anything except for the suffix, you should use string::find_last_of.
Your code could look like this:
const std::string removesuffix(const std::string& s) {
size_t suffixbegin = s.find_last_of('.');
//This will handle cases like "directory.foo/bar"
size_t dir = s.find_last_of('/');
if(dir != std::string::npos && dir > suffixbegin) return s;
if(suffixbegin == std::string::npos) return s;
else return s.substr(0,suffixbegin);
}
If you're looking for a regular expression, use \.[^.]+$.
You have to escape the first ., otherwise it will match any character, and put a $ at the end, so it will only match at the end of a string.
Different operating systems may allow different characters in filenames. The simplest regex might be (.+)\.txt$; take the first capture group to get the filename without the extension.
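For completeness, here is how the \.[^.]+$ expression from above could be applied with std::regex. This is a minimal sketch, assuming C++11 and the default ECMAScript grammar; the name stripSuffix is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Removes a trailing ".ext" from `name`; names without a suffix are returned unchanged.
std::string stripSuffix(const std::string& name)
{
    static const std::regex suffix(R"(\.[^.]+$)");
    return std::regex_replace(name, suffix, "");
}
```

Note that this pattern knows nothing about directories, so a path like "directory.foo/bar" would be cut at the dot in the directory name. The find_last_of version above handles that case explicitly.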

string contains valid characters

I am writing a method whose signature is
bool isValidString(std::string value)
Inside this method I want to check that every character in value belongs to a set of characters, which is given as a constant string:
const std::string ValidCharacters("abcd")
To perform this check, I take each character from value in turn and search for it in ValidCharacters; if any lookup fails, the string is invalid. Is there an alternative in the standard library for this check?
Use find_first_not_of():
bool isValidString(const std::string& s) {
return std::string::npos == s.find_first_not_of("abcd");
}
You can also use regular expressions for pattern matching.
With the Digital Mars runtime library, include regexp.h:
http://www.digitalmars.com/rtl/regexp.html
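If a C++11 compiler is available, the regular expression approach can also be sketched with std::regex instead of the Digital Mars library (a substitution on my part, chosen so the example is self-contained). The character class [abcd] mirrors the ValidCharacters string from the question.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if `value` consists only of the characters a, b, c and d.
bool isValidString(const std::string& value)
{
    static const std::regex allowed("[abcd]*");
    // regex_match succeeds only if the whole string matches the pattern.
    return std::regex_match(value, allowed);
}
```

Like the find_first_not_of version, this treats an empty string as valid. For a fixed set of characters, find_first_not_of is simpler and avoids the regex overhead.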