How can I terminate a string with regex_replace?

I'm using CreateProcess to run a bash script via Cygwin's bash.exe and redirecting the output (because that's what the customer wants). The only problem still left to solve is that if ReadFile doesn't fill up lpBuffer I end up with a bunch of junk characters at the end of it, which I would like to filter out. Usually, this is something like:
"ÌÌÌÌ...ÌÌÌÌÌuÆì¨õD"
for which the code below will give me:
"uÆì¨õD"
So, I'm at least partially successful =D
However, what I'd really like is to just terminate the string at the first junk character, preferably with a newline also, but I can't seem to find a variation of fmt that works.
void ReadAndHandleOutput(HANDLE hPipeRead) {
char lpBuffer[256];
DWORD nBytesRead;
wstringstream wss;
while(TRUE)
{
if(!ReadFile(hPipeRead, lpBuffer, sizeof(lpBuffer), &nBytesRead, NULL) || !nBytesRead)
{
break;
}
// Filter out the weird non-ascii characters.
std::string buffer(lpBuffer);
std::regex rx("[^[:alnum:][:punct:][:space:]]+");
std::string fmt("\n\0");
std::regex_constants::match_flag_type fonly = std::regex_constants::format_first_only;
std::string result = std::regex_replace(buffer, rx, fmt, fonly);
wss << result.c_str();
}
SetWindowText(GetDlgItem(HwndMain, IDC_OUTPUT), LPCWSTR(wss.str().c_str())); }

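I'm not sure regex is the right fix. The junk appears because ReadFile does not null-terminate lpBuffer, so std::string keeps reading past the data. ReadFile reports the number of bytes it wrote in nBytesRead, so put a \0 at that position (reading at most sizeof(lpBuffer) - 1 bytes so there is room for it), or construct the string from exactly that many bytes.
If you still want to filter, this is the set of printable (non-junk) ASCII characters: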
[ -~]
Which is the set of characters from space to tilde.
So this is the desired pattern:
[^ -~]+
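A minimal sketch of the byte-count approach, under some assumptions: the helper names chunk_to_string, is_clean_byte and truncate_at_junk are made up for illustration, and the truncation uses std::find_if with the same space-to-tilde range as the pattern above (plus newline, carriage return and tab) rather than regex_replace.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: true for the bytes that [ -~] accepts, plus common whitespace.
inline bool is_clean_byte(char c) {
    return (c >= ' ' && c <= '~') || c == '\n' || c == '\r' || c == '\t';
}

// Copy exactly bytes_read bytes, so no terminating '\0' is needed in the buffer.
inline std::string chunk_to_string(const char* buffer, std::size_t bytes_read) {
    return std::string(buffer, bytes_read);
}

// Cut the text at the first junk byte and end it with a newline.
// Text without any junk is returned unchanged.
inline std::string truncate_at_junk(const std::string& text) {
    auto first_junk = std::find_if(text.begin(), text.end(),
                                   [](char c) { return !is_clean_byte(c); });
    if (first_junk == text.end()) {
        return text;
    }
    return std::string(text.begin(), first_junk) + '\n';
}
```

In the loop from the question, this would amount to writing wss << chunk_to_string(lpBuffer, nBytesRead).c_str(); with the byte count in place, the truncation step should rarely be needed.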

Related

c++ Function to add an extra '\' to a filepath?

I have about 3500 full file paths to sort through (ex. "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk"). I just learned that C++ does not recognize the single backslash when reading in a file path, and with that many paths it would be overly tedious to change each one manually.
I have this for loop that finds the single backslash and inserts a double backslash at that index. This:
string line = "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk";
for (unsigned int i = 0; i < filepath.size(); i++) {
if(filepath[i] == '\') {
filepath.insert(i, '\');
}
}
However, this does not compile (I'm using Code::Blocks) because of the backslash character. Is there a way to add in the extra backslash character with a function?
I am reading the filepaths in from a text file, so they are being read into the string filepath variable, this is just a test.
Write the backslash as '\\' in character literals and as "C:\\Users..." in string literals, because a single backslash together with the next character forms an escape sequence.
Also, the string::insert() overload that inserts a character expects the number of characters as its 2nd argument, which is missing in your code.
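With those fixes it compiles. The loop must also step past the backslash it just inserted; otherwise the next iteration sees the original backslash again and keeps inserting forever:
string filepath = "C:\\Users\\Nick\\Documents\\ReadIns\\NC_000852.gbk";
for (string::size_type i = 0; i < filepath.size(); i++) {
    if (filepath[i] == '\\') {
        filepath.insert(i, 1, '\\'); // insert(position, count, character)
        ++i;                         // skip the backslash that was just inserted
    }
}
A more compact version of the same idea, and my preferred way: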
for(auto pos = filepath.find('\\'); pos != string::npos; pos = filepath.find('\\', ++pos))
filepath.insert(++pos, 1, '\\');
If you only need to replace one character with another (for example switching to forward slashes, which Linux uses and which Windows probably supports too), you may also use std::replace() to avoid the hand-written loop, as mentioned in this answer:
std::replace(filepath.begin(), filepath.end(), '\\', '/');
I assumed that, you already have a file created which contains single backslashes and you are using that for parsing.
But from your comments, I notice that you are apparently getting the file paths at run time (i.e. while running the .exe). In that case, as @MSalters has mentioned, you need not worry about such transformations (i.e. changing the backslashes).
The problem that you're seeing is because in C++, string literals are commonly enclosed in "" quotes. This brings up one minor problem: how do you put a quote inside a string literal, when that quote would end the string literal. The solution is escaping it with a \. This can also be used to add a few other characters to a string, such as \n (newline). And since \ now has a special meaning in string literals, it's also used to escape itself. So "\\" is a string containing just one character (and of course a trailing NUL).
This also applies to character literals: char example[4] = {'a', '\\', 'b', 0} is an alternative way to write "a\\b".
Now this is all about compile time, when the compiler needs to separate C++ code and string contents. Once your executable is running, a backslash is just one char. std::cout << "a\\b" prints a single backslash, because there's only one in memory. std::string word; std::cin >> word will read a single word, and if you enter one backslash then word will contain one backslash. The compiler isn't involved in that.
So if you read 3500 filenames from a std::ifstream list_of_filenames and then use that to create a further 3500 std::ifstreams, you only need to worry about backslashes in specifying that very first filename in code. And if you take that filename from argv[1] instead, you don't need to care at all.
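To illustrate the compile-time versus run-time distinction, here is a small sketch. The function name read_paths is hypothetical, and it takes any std::istream so that a std::istringstream can stand in for the real list file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one path per line. The characters arrive exactly as stored in the
// stream; no escape processing happens at run time.
inline std::vector<std::string> read_paths(std::istream& in) {
    std::vector<std::string> paths;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            paths.push_back(line);
        }
    }
    return paths;
}
```

With a std::ifstream opened on the list file, each returned string can be passed straight to another std::ifstream. Only the path of the list file itself, if written as a literal in source code, needs doubled backslashes.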
One way to avoid any special handling of backslashes is to keep all the file names in a separate file on disk and read them with a file stream object such as ifstream.
char szFilename[MAX_PATH] = {0}; // char rather than TCHAR: ifstream::getline expects a char buffer
ifstream ObjInFiles( "E:\\filenames.txt" );
ObjInFiles.getline( szFilename, MAX_PATH );
ObjInFiles.close();
Suppose the first file name stored in filenames.txt is e:\temp\abc.txt. After getline() above, szFilename holds exactly those characters, with single backslashes. That is the same string you would have to write as "e:\\temp\\abc.txt" in C++ source code.

How do I remove only the first character of a string that is not a digit? (MFC, C++)

I want to remove only the first character in a string that is NOT a digit. The first character can be anything from ‘A’ to ‘Z’ or it may be a special character like ‘&’ or ‘#’. This legacy code is written in MFC. I've looked at the CString class but cannot figure out how to make this work.
I have strings that may look like any of the following:
J22008943452GF or 22008943452GF or K33423333333IF or 23000526987IF or #12000895236GF. You get the idea by now.
My dilemma is that I need to remove the character in the first position of the string, but only for strings that do not start with a digit. Strings that begin with a digit must be left alone. Also, none of the other characters in the string should be altered. For example, the ‘G’, ‘I’ or ‘F’ in the later part of the string should not be changed. The length of the string will always be 13 or 14 characters.
Here is what I have so far.
CString GAbsMeterCalibration::TrimMeterSNString (CString meterSN)
{
meterSN.MakeUpper();
CString TrimmedMeterSNString = meterSN;
int strlength = strlen(TrimmedMeterSNString);
if (strlength == 13)
{
// Check the first character anyway, even though it’s
// probably okay. If it is a digit, life’s good.
// Return unaltered TrimmedMeterSNString;
}
if (strlength == 14)
{
//Check the first character, it’s probably going
// to be wrong and is a character, not a digit.
// if I find a char in the first postion of the
// string, delete it and shift everything to the
// left. Make this my new TrimmedMeterSNString
// return altered TrimmedMeterSNString;
}
}
The string lengths are checked and validated before the calls.
From my investigations, I’ve found that MFC does not have a regular expression
class. Nor does it have the substring methods.
How about:
CString GAbsMeterCalibration::TrimMeterSNString (CString meterSN)
{
    meterSN.MakeUpper();
    // If the first character is a digit, life's good and the string is
    // returned unaltered. Otherwise delete it, which shifts the rest left.
    // (std::isdigit needs <cctype>; this assumes an ANSI/MBCS build.)
    if (!meterSN.IsEmpty() && !std::isdigit(static_cast<unsigned char>(meterSN.GetAt(0))))
    {
        meterSN.Delete(0);
    }
    return meterSN;
}
From what I understand, you want to remove the first letter if it is not a digit. So you may make this function simpler:
CString GAbsMeterCalibration::TrimMeterSNString(CString meterSN)
{
meterSN.MakeUpper();
int length = meterSN.GetLength();
// remove the first character unless it is a digit ('0'..'9')
if (length > 0 && unsigned(meterSN[0] - TCHAR('0')) > 9u)
{
return meterSN.Right(length - 1);
}
return meterSN;
}
I am using the unsigned comparison trick instead of the isdigit function because CString uses TCHAR, which can be either char or wchar_t.
The solution is fairly straightforward:
CString GAbsMeterCalibration::TrimMeterSNString(CString meterSN) {
meterSN.MakeUpper();
return _istdigit(meterSN.GetAt(0)) ? meterSN :
meterSN.Mid(1);
}
The implementation can be compiled for both ANSI and Unicode project settings by using _istdigit. This is required since you are using CString, which stores either MBCS or Unicode character strings. The desired substring is extracted using CStringT::Mid.
(Note that CString is a typedef for a specific CStringT template instantiation, depending on your project settings.)
CString test="12355adaddfca";
if((test.GetAt(0)>=48)&&(test.GetAt(0)<=57))
{
//48 and 57 are ascii values of 0&9, hence this is a digit
//do your stuff
//CString::GetBuffer may help here??
}
else
{
//it is not a digit, do your stuff
}
Compare the ASCII value of the first character in the string with those of '0' and '9' and you know whether it's a digit or not.
I don't know if you've tried this, but, it should work.
CString str = _T("#12000895236GF");
// check string to see if it starts with digit.
CString result = str.SpanIncluding(_T("0123456789"));
// if result is empty, string does not start with a number
// and we can remove the first character. Otherwise, string
// remains intact.
if (result.IsEmpty())
str = str.Mid(1);
Seems a little easier than what's been proposed.
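For readers without MFC at hand, the same logic can be sketched portably. This is an illustration only: it uses std::string in place of CString, the name trim_meter_sn is hypothetical, and the MakeUpper() step from the original function is left out.

```cpp
#include <cctype>
#include <string>

// Remove the first character unless it is a digit; everything after it is
// left untouched. An empty string is returned as is.
inline std::string trim_meter_sn(std::string sn) {
    if (!sn.empty() && !std::isdigit(static_cast<unsigned char>(sn[0]))) {
        sn.erase(0, 1);
    }
    return sn;
}
```

Under these assumptions, trim_meter_sn("J22008943452GF") and trim_meter_sn("22008943452GF") both yield 22008943452GF.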

reading a "\n" string and writing to textfile?

I'm struggling with the following: I'm reading from an XML file the following std::stringstream
"sigma=0\nreset"
Which after some copying&processing is written to a text-file. And I was hoping for the following
sigma=0
reset
But sadly I only get
sigma=0\nreset
but when I directly stream
out << "sigma=0\nreset"
I get:
sigma=0
reset
I currently suspect that some qualifier of the "\n" is lost during the "copy&processing"... is this possible? How to track down a "\n" in the stream which isn't a linefeed anymore?
Thank you!
It's because the output functions don't handle escape sequences like '\n'; the compiler does, and only in literals. The compiler knows nothing about the contents of strings built at run time, so it cannot translate the two characters \ and n into a newline when they arrive inside a string.
You have to parse the string itself, and write out newlines when appropriate.
Assuming that the std::stringstream actually contains what is equivalent to the literal "sigma=0\\nreset" (length = 14 characters) and not "sigma=0\nreset" (length = 13 characters), you'll have to replace it yourself. Doing so is not very difficult: either use boost's replace_all (http://www.boost.org/doc/libs/1_53_0/doc/html/boost/algorithm/replace_all.html), or std::string::find and std::string::replace:
std::stringstream inStream;
inStream.str ("sigma=0\\nreset");
std::string content = inStream.str();
size_t index = content.find("\\n",0);
while(index != std::string::npos)
{
content.replace(index, 2, "\n");
index = content.find("\\n",index);
}
std::cout << content << '\n';
Note: you may want to consider cases when the system end-of-line is something other than "\n"
If the std::stringstream actually contains "sigma=0\nreset", then please post the code that does the copying/processing and the writing to the text file.

sscanf for this type of string

I'm not quite sure even after reading the documentation how to do this with sscanf.
Here is what I want to do:
given a string of text:
Read up to the first 64 chars or until space is reached
Then there will be a space, an = and then another space.
Following that I want to extract another string either until the end of the string or if 8192 chars are reached. I would also like it to change any occurrences in the second string of "\n" to the actual newline character.
I have: "%64s = %8192s" but I do not think this is correct.
Thanks
Ex:
element.name = hello\nworld
Would have string 1 with element.name and string2 as
hello
world
I do recommend std::regex for this, but apart from that, you should be fine with a little error checking:
#include <cstdio>
int main(int argc, const char *argv[])
{
char s1[65];
char s2[8193];
if (2!=std::scanf("%64s = %8192s", s1, s2))
puts("oops");
else
std::printf("s1 = '%s', s2 = '%s'\n", s1, s2);
return 0;
}
Your format string looks right to me, with one caveat: %8192s stops at the first whitespace, so a value containing spaces would be cut short. Also, sscanf will not change occurrences of "\n" to anything else. To do that you would need to write a loop over the string afterwards (for example with strstr, or just a simple for loop that examines each character) and swap in whatever character you prefer. You will also need to evaluate the sscanf return value to determine whether the 2 strings were indeed scanned correctly. sscanf returns the number of fields successfully scanned according to your format string.
@sehe shows the correct usage of the format string, including the check for the proper return value.
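Putting both answers together, a sketch might look like the following. The function name parse_assignment is hypothetical, and the second conversion is changed to the scanset %8192[^\n], an assumption made so that the value runs to the end of the line instead of stopping at the first space.

```cpp
#include <cstdio>
#include <string>

// Parse "name = value" from line. Returns false unless both fields are found.
// Each backslash followed by 'n' in the value is turned into a real newline.
inline bool parse_assignment(const char* line, std::string& name, std::string& value) {
    char s1[65];    // 64 characters plus the terminating '\0'
    char s2[8193];  // 8192 characters plus the terminating '\0'
    if (std::sscanf(line, "%64s = %8192[^\n]", s1, s2) != 2) {
        return false;
    }
    name = s1;
    value = s2;
    std::string::size_type pos = 0;
    while ((pos = value.find("\\n", pos)) != std::string::npos) {
        value.replace(pos, 2, "\n");
        ++pos;  // continue after the newline that was just written
    }
    return true;
}
```

Note that %64s reads up to the first whitespace, so this sketch still expects a space on both sides of the = sign, as in the example from the question.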

How to read a word into a string ignoring a certain character

I am reading a text file which contains a word with a punctuation mark on it and I would like to read this word into a string without the punctuation marks.
For example, a word may be " Hello, "
I would like the string to get " Hello " (without the comma). How can I do that in C++ using ifstream libraries only.
Can I use the ignore function to ignore the last character?
Thank you in advance.
Try ifstream::get(Ch* p, streamsize n, Ch term).
An example:
char buffer[64];
std::cin.get(buffer, 64, ',');
// will read up to 64 characters until a ',' is found
// For the string "Hello," it would stream in "Hello"
If you need to be more robust than simply a comma, you'll need to post-process the string. The steps might be:
Read the stream into a string
Use string::find_first_of() to help "chunk" the words
Return the word as appropriate.
If I've misunderstood your question, please feel free to elaborate!
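The post-processing steps above could be sketched roughly as follows. The function name split_words and the default delimiter set are assumptions chosen for illustration; adjust the set to whatever punctuation the file actually contains.

```cpp
#include <string>
#include <vector>

// Split text into words, treating every character in delims as a separator.
// Punctuation therefore never ends up inside a returned word.
inline std::vector<std::string> split_words(const std::string& text,
                                            const std::string& delims = " \t\r\n,.;:!?") {
    std::vector<std::string> words;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find_first_of(delims, start);
        if (end == std::string::npos) {
            end = text.size();
        }
        if (end > start) {
            words.push_back(text.substr(start, end - start));
        }
        start = end + 1;  // step over the separator
    }
    return words;
}
```

To use it with a file, read each line with std::getline into a std::string and pass that string to split_words.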
If you only want to ignore the comma (,), then you can use getline with ',' as the delimiter.
const int MAX_LEN = 128;
ifstream file("data.txt");
char buffer[MAX_LEN];
while(file.getline(buffer,MAX_LEN,','))
{
cout<<buffer;
}
EDIT: This uses std::string and does away with MAX_LEN
ifstream file("data.txt");
string string_buffer;
while(getline(file,string_buffer,','))
{
cout<<string_buffer;
}
One way would be to use the Boost String Algorithms library. There are several "replace" functions that can be used to replace (or remove) specific characters or strings in strings.
You can also use the Boost Tokenizer library for splitting the string into words after you have removed the punctuation marks.