ofstream not translating "\r\n" to new line character - c++

I have written C++ code for converting file formats. Part of the functionality is to add a configured line-end sequence. For one of the file conversions, the required line end is "\r\n", i.e. CR+LF.
My code basically reads the configured value from the DB and appends it to the end of each record. Something along the lines of:
//read DB and store line end char in a string, let's say lineEnd.
//code snippet for file writing
string record = "this is a record";
ofstream outFileStream;
string outputFileName = "myfile.txt";
outFileStream.open (outputFileName.c_str());
outFileStream<<record;
outFileStream<<lineEnd; // here line end contains "\r\n"
But this prints the record followed by \r\n literally; no translation to CR+LF takes place:
this is a record\r\n
The following, however, works (it writes CR+LF to the output file):
outFileStream<<record;
outFileStream<<"\r\n";
this is a record
But I cannot hard-code it. I am facing a similar issue with "\n" as well.
Any suggestions on how to do this?

The translation of \r into the ASCII character CR and of \n into the ASCII character LF is done by the compiler when parsing your source code, and in literals only. That is, the string literal "A\n" will be a 3-character array with values 65 10 0.
The output streams do not interpret escape sequences in any way. If you ask an output stream to write the characters \ and r after each other, it will do so (write characters with ASCII value 92 and 114). If you ask it to write the character CR (ASCII code 13), it will do so.
The reason std::cout << "\r"; writes the CR character is that the string literal already contains the character 13. So if your database contains the string \r\n (4 characters: \, r, \, n, ASCII 92 114 92 110), that is also the string you will get on output. If it contained the string with ASCII 13 10, that's what you'd get.
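A small check makes the byte-level difference visible. This is a sketch assuming an ASCII-based platform; charCodes is a hypothetical helper, not a standard function.

```cpp
#include <string>
#include <vector>

// Returns the character codes of s, to show what a string really contains:
//   "\r\n"   (escapes processed by the compiler)           -> 13 10
//   "\\r\\n" (what a DB column holding the text \r\n has)  -> 92 114 92 110
// Assumes an ASCII-based execution character set.
std::vector<int> charCodes(const std::string &s)
{
    std::vector<int> codes;
    for (unsigned char c : s)   // unsigned char avoids negative values
        codes.push_back(c);
    return codes;
}
```

Writing either string to an ofstream emits exactly these characters; the stream itself never interprets backslashes.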
Of course, if it's impractical for you to store 13 10 in the database, nothing prevents you from storing 92 114 92 110 (the string "\r\n") in there, and translating it at runtime. Something like this:
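#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `str` with `to`.
void translate(std::string &str, const std::string &from, const std::string &to)
{
    std::size_t at = 0;
    for (;;) {
        at = str.find(from, at);
        if (at == std::string::npos)
            break;
        str.replace(at, from.size(), to);
        at += to.size(); // continue after the inserted text
    }
}

// Usage (getFromDatabase() stands for your own database lookup):
std::string lineEnd = getFromDatabase();
translate(lineEnd, "\\r", "\r");
translate(lineEnd, "\\n", "\n");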

Related

How to delimit this text file? strtok

So there's a text file where each line holds 1. a language, 2. the text of a number written in that language, 3. the base of the number, and 4. the number written in digits. Here's a sample:
francais  deux mille quatre cents  10  2400
How I went about it:
struct Nomen{
char langue[21], nomNombre [31], baseC[3], nombreC[21];
int base, nombre;
};
and in the main:
if(myfile.is_open())
{
while(getline(myfile, line))
{
strcpy(Linguo[i].langue, strtok((char *)line.c_str(), " "));
strcpy(Linguo[i].nomNombre, strtok(NULL, " "));
strcpy(Linguo[i].baseC, strtok(NULL, " "));
strcpy(Linguo[i].nombreC, strtok(NULL, "\n"));
i++;
}
}
Difficulty: I'm trying to use two spaces as the delimiter, but strtok() seems to treat it as if it were a single space. Because the number written in words also contains spaces, the tokenization gets messed up. How should I go about this?
strtok treats every single character in the provided string as a delimiter. It does not treat the string itself as one delimiter. So "  " (two spaces) is the same as " " (one space).
strtok also treats multiple consecutive delimiters as a single delimiter. So the input "t1  t2" will be tokenized as two tokens, "t1" and "t2".
As mentioned in the comments, strtok also writes NUL characters into its input to terminate the tokens. So it is an error to pass the result of string::c_str() to it. The fact that you needed to cast away the constness of the string should have been enough to dissuade you from this approach.
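The two strtok rules can be seen directly with a short sketch. It tokenizes a writable copy of the line, as the previous paragraph requires, rather than c_str(); tokenize is a hypothetical helper name.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits `line` with std::strtok. Every character of `delims` acts as a
// delimiter on its own, and runs of delimiters are collapsed, so "  " (two
// spaces) behaves exactly like " ". strtok writes NULs into its input, so
// it is given a writable, NUL-terminated copy instead of line.c_str().
std::vector<std::string> tokenize(const std::string &line, const char *delims)
{
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    std::vector<std::string> tokens;
    for (char *tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.push_back(tok);
    return tokens;
}
```

Calling tokenize("francais  deux mille", "  ") yields three tokens ("francais", "deux", "mille"), which is why the number written in words gets split apart.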
If you want to treat a double space as a delimiter, you will have to scan the string and search for it yourself. Since you are using C APIs, you could consider strstr; in C++, you can use std::string::find.
Here's an algorithm to parse your string manually:
Given an input string input:
language is the substring from the start of input to the first SPC character.
From where language ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
text is the substring from the start of input to the first double SPC sequence.
From where text ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
Parse base, and parse number.
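The steps above could be written with std::string::find roughly as follows. This is a sketch that assumes the sample's layout (the number in words is terminated by a double space, the language by at least one space); Entry and parseLine are hypothetical names, not part of the question's code.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type; fields mirror the question's Nomen struct.
struct Entry {
    std::string language;
    std::string text;
    int base = 0;
    long number = 0;
};

// Returns false if the line does not have the assumed shape.
bool parseLine(const std::string &input, Entry &out)
{
    // language: from the start to the first space
    const std::string::size_type langEnd = input.find(' ');
    if (langEnd == std::string::npos)
        return false;
    out.language = input.substr(0, langEnd);

    // skip the whitespace after the language
    const std::string::size_type textStart = input.find_first_not_of(" \t", langEnd);
    if (textStart == std::string::npos)
        return false;

    // text: up to the first double space
    const std::string::size_type textEnd = input.find("  ", textStart);
    if (textEnd == std::string::npos)
        return false;
    out.text = input.substr(textStart, textEnd - textStart);

    // base and number: operator>> skips the remaining whitespace itself
    std::istringstream rest(input.substr(textEnd));
    return static_cast<bool>(rest >> out.base >> out.number);
}
```

Unlike strtok, this never modifies the input, so it can safely work on the std::string filled by getline.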

How do I mimic a Unicode JS regular expression in Lucee

I am trying to write a regular expression in Lucee to mimic the JS on the front end. Since Lucee's regex doesn't seem to support Unicode, how do I do it?
This is the JS:
function charTest(k){
var regexp = /^[\u00C0-\u00ff\s -\~]+$/;
return regexp.test(k)
}
if(!charTest(thisKey)){
alert("Please Use Latin Characters Only");
return false;
}
This is what I have tried in Lucee:
regexp = '[\u00C0-\u00ff\s -\~]+/';
writeDump(reFind(regexp,"测"));
writeDump(reFind(regexp,"test"));
I have also tried:
regexp = "[\\p{L}]";
but the dump is always 0.
EDIT: Give me one second. I think I interpreted your initial JS regex incorrectly. Fixing it.
EDIT 2: It was more than a second. Your original JS regex was:
"/^[\u00C0-\u00ff\s -\~]+$/". This is:
Basic parts of regex:
"/..../" == signifies the start and stop of the Regex.
"^" == signifies the start of the string
"[...]" == a character class: matches any one character in this group
"+" == signifies one or more of the previous item
"$" == signifies the end of the string
Identifiers in the regex:
"\u00c0-\u00ff" == Unicode character range from Character 192 (À)
to Character 255 (ÿ). This is the upper part of the
Latin-1 Supplement block of Unicode.
"\s" == signifies any whitespace character (space, tab,
line break, etc.)
" -\~" == signifies a range from the space character to the
(escaped) tilde character (~). This is ASCII 32-126, which
covers all printable ASCII characters (everything except the
control characters 0-31 and DEL, 127). This includes all ASCII
letters, digits and punctuation.
I missed the second half of your printable Latin basic character set. I've updated my regex and tests to include it. There are ways to shorthand some of these identifiers, but I wanted it to be explicit.
You can try this:
<cfscript>
//http://www.asciitable.com/
//https://en.wikipedia.org/wiki/List_of_Unicode_characters
//https://en.wikipedia.org/wiki/Latin_script_in_Unicode
function charTest(k) {
return
REfind("[^"
& chr(32) & "-" & chr(126)
& chr(192) & "-" & chr(255)
& "]",arguments.k)
? "Please Use Latin Characters Only"
: ""
;
}
// TESTS
writeDump(charTest("测")); // Not Latin
writeDump(charTest("test")); // All characters between 32 & 126
writeDump(charTest("À")); // Character 192 (in range)
writeDump(charTest("À ")); // Character 192 and Space
writeDump(charTest(" ")); // Space Characters
writeDump(charTest("12345")); // Digits ( character 48-57 )
writeDump(charTest("ð")); // Character 240 (in range)
writeDump(charTest("ℿ")); // Character 8511 (outside range)
writeDump(charTest(chr(199))); // CF Character (in range)
writeDump(charTest(chr(10))); // CF Line Feed Character (outside range)
writeDump(charTest(chr(1000))); // CF Character (outside range)
writeDump(charTest("
")); // CRLF (outside range)
writeDump(charTest(URLDecode("%00", "utf-8"))); // CF Null character (outside range)
//writeDump(asc("测"));
//writeDump(asc("test"));
//writeDump(asc("À"));
//writeDump(asc("ð"));
//writeDump(asc("ℿ"));
</cfscript>
https://trycf.com/gist/05d27baaed2b8fc269f90c7c80a1aa82/lucee5?theme=monokai
All the regex does is look at your input string, and if it finds any character outside the ranges chr(32)-chr(126) and chr(192)-chr(255), it returns your chosen message; otherwise it returns an empty string.
I think you can access the Unicode characters below 255 directly. I'll have to test it.
Do you need this function to trigger an alert, like the JavaScript version? If so, you can just return a 1 or 0 to indicate whether the function found a disallowed character.
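For comparison, the same allowed-character rule can be written without a regex. The sketch below is C++ and assumes the input has already been decoded into Unicode code points (a std::u32string); isAllowed and isLatinOnly are hypothetical names, not Lucee functions.

```cpp
#include <string>

// Same set as the charTest() regex: printable ASCII (32-126)
// plus U+00C0..U+00FF.
bool isAllowed(char32_t c)
{
    return (c >= 0x20 && c <= 0x7E) || (c >= 0xC0 && c <= 0xFF);
}

// True when every code point is allowed. Like the Lucee charTest()
// (but unlike the JS regex, whose + needs at least one character),
// an empty string passes because it contains nothing disallowed.
bool isLatinOnly(const std::u32string &s)
{
    for (char32_t c : s)
        if (!isAllowed(c))
            return false;
    return true;
}
```

Spelling the ranges out as numeric comparisons makes it obvious that tab and line feed (9, 10) are rejected, matching the chr(10) test above.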

std::getline doesn't skip empty lines when reading from ifstream

Consider this code:
#include <fstream>
#include <string>
#include <vector>
using namespace std;

vector<string> parse(char* _config) {
ifstream my_file(_config);
vector<string> my_lines;
string nextLine;
while (std::getline(my_file, nextLine)) {
if (nextLine[0] == '#' || nextLine.empty() || nextLine == "") continue;
my_lines.push_back(nextLine);
}
return my_lines;
}
and this config file:
#Verbal OFF
0
#Highest numeric value
100
#Deck
67D 44D 54D 63D AS 69H 100D 41H 100C 39H 10H 85H 7D 42S 6C 67H 61D 33D 28H 93S QH 5D 91C 40S 50C 74S 8C 98C 96C 71D 82S 75S 23D 40C 29S QC 84C 16C 80D 13H 35S
#Players
P1 1
P2 2
My goal is to parse the config file into a vector of strings, line by line, ignoring empty lines and lines that start with '#'.
When I run this code in Visual Studio, the output is correct. But when I run it on Linux with g++, I still get some empty lines.
Your input file most likely has lines ending with CR LF, i.e. it is a Windows/DOS text file. On Linux, lines are expected to end with LF only, so std::getline() removes only the LF and leaves the CR in place: an "empty" line is read as a string containing a single CR character, which is neither empty nor starts with '#'.
Before the existing code that checks the contents of nextLine, check whether the line is non-empty and ends with a CR character, and if so, remove that character. Then continue with your existing if statement.
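A minimal sketch of that change applied to the question's parse function (the parameter is renamed and made const char*; the emptiness test now comes before nextLine[0]; otherwise the logic is unchanged):

```cpp
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> parse(const char *config)
{
    std::ifstream myFile(config);
    std::vector<std::string> lines;
    std::string nextLine;
    while (std::getline(myFile, nextLine)) {
        // A CRLF file read on Linux leaves a trailing CR on every line.
        if (!nextLine.empty() && nextLine.back() == '\r')
            nextLine.pop_back();
        // Skip blank lines and comment lines.
        if (nextLine.empty() || nextLine[0] == '#')
            continue;
        lines.push_back(nextLine);
    }
    return lines;
}
```

The same code works unchanged on Windows, where text-mode streams already turn CR LF into LF before getline sees it.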

Print a string like "First\nSecond" on two lines

Aim: to read a string in the form First\nSecond from a file and to print it as
First
Second
Problem: if the string is defined in the code, as in line = "First\nSecond";, then it is printed on two lines; if instead I read it from a file, it is printed as
First\nSecond
Short program illustrating the problem:
#include "stdafx.h" // I'm using Visual Studio 2008
#include <fstream>
#include <string>
#include <iostream>
int main() {
std::ifstream ParameterFile( "parameters.par" ) ;
std::string line ;
getline (ParameterFile, line) ;
std::cout << line << std::endl ;
line = "First\nSecond";
std::cout << line << std::endl ;
return 0;
}
The parameters.par file contains only the line
First\nSecond
The Win32 console output is
C:\blabla>SOtest.exe
First\nSecond
First
Second
Any suggestion?
In C/C++ string literals ("...") the backslash marks so-called "escape sequences" for special characters. The compiler translates (replaces) the two characters '\' (ASCII code 92) followed by 'n' (ASCII code 110) with the newline character (ASCII code 10). In a text file one would normally just hit the [RETURN] key to insert a newline character. If you really need to process input containing the two characters '\' and 'n' and want to handle them like a C/C++ compiler, then you must explicitly replace them with the newline character:
replace(line, "\\n", "\n");
where you have to supply a replace function yourself, for example one like those in the Stack Overflow question "Replace part of a string with another string". (std::string itself does not provide such a replace-all function.)
Other escape sequences supported by C/C++ and similar compilers:
\t -> [TAB]
\" -> " (to distinguish from a plain ", which marks the end of a string literal, but is not part of the string itself!)
\\ -> \ (to allow having a backslash in a string literal; a single backslash starts an escape sequence)
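If you need to handle all of these sequences, not just \n, replacing them one after another can go wrong: replacing "\\n" first turns the text \\n (an escaped backslash followed by n) into a backslash plus a newline. A single left-to-right pass avoids that. The sketch below is one way to do it; unescape is a hypothetical name, and it also handles \r.

```cpp
#include <string>

// Undo the escape sequences \n, \r, \t, \" and \\ in text read from a
// file, the way the compiler does for string literals. Unknown escapes
// (e.g. \q) and a lone trailing backslash are copied through unchanged.
std::string unescape(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i];              // ordinary character
            continue;
        }
        const char next = in[++i];     // character after the backslash
        switch (next) {
        case 'n':  out += '\n'; break;
        case 'r':  out += '\r'; break;
        case 't':  out += '\t'; break;
        case '"':  out += '"';  break;
        case '\\': out += '\\'; break;
        default:   out += '\\'; out += next; break;
        }
    }
    return out;
}
```

In the question's program, std::cout << unescape(line) << std::endl; would then print the line read from parameters.par on two lines.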
The character indicated in a string literal by the escape sequence \n is not the same as the sequence of characters that looks like \n!
When you think you're assigning First\nSecond, you're not. In your source code, \n in a string literal is a "shortcut" for the invisible newline character. The string does not contain \n - it contains the newline character. It's automatically converted for you.
Whereas what you're reading from your file is the actual characters \ and n.

Reading characters from a File with fscanf

I have a problem using the fscanf function :(
I need to read a sequence of characters from a file like "a b c d" (the characters are separated by spaces),
but it doesn't work :(
How should I read them?
I tried to print them and the result is incorrect. I think it's because of the spaces, but I really don't know why it doesn't work.
Please tell me, what is wrong with the array access?
From cplusplus.com:
The function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Then if your code is:
while ( fscanf(fin,"%c", &array[i++]) == 1 );
and your file is like this:
h e l l o
Your array will be:
[h][ ][e][ ][l][ ][l][ ][o]
If you change your code into:
while ( fscanf(fin," %c", &array[i++]) == 1 );
with the same file your array will be:
[h][e][l][l][o]
Either way the code works; which version is right depends on what you want.
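The difference between the two formats can be tried without a file by using sscanf on a string; it applies the same format rules as fscanf, and the swap is only to keep the demo self-contained. The %n conversion records how many characters each call consumed so the loop can advance; readChars is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Reads characters from `text` one conversion at a time.
// "%c%n":  take the next character as-is (spaces included).
// " %c%n": the leading space in the format skips any whitespace first.
// %n stores how many characters the call consumed; it is not counted
// in sscanf's return value.
std::string readChars(const char *text, bool skipWhitespace)
{
    const char *fmt = skipWhitespace ? " %c%n" : "%c%n";
    std::string out;
    char c;
    int n = 0;
    for (const char *p = text; std::sscanf(p, fmt, &c, &n) == 1; p += n)
        out += c;
    return out;
}
```

With the file contents h e l l o, readChars(..., false) keeps the spaces and readChars(..., true) yields hello, matching the two arrays shown above.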
Anyway, you should think about starting to use fgets() + sscanf(), for example:
char buff[NUM];
while ( fgets(buff, sizeof buff, fin) ) {
    int n; /* characters consumed by one " %c" */
    for (const char *p = buff; sscanf(p, " %c%n", &array[i], &n) == 1; p += n)
        i++;
}
With a bare fscanf() loop, the lack of buffer management can turn into buffer overflow problems.
Add a space before %c:
while (fscanf(pFile," %c", &alpArr[i++]) == 1);
It should work.