Vim: Inter-String Line Breaking - c++

Consider the three lines shown below.
std::ostringstream ss;
cc::write(ss, "Error parsing first magic byte of header: expected 'P', but got '{0}'.", c);
return io_error{ss.str()};
The second line automatically breaks because it exceeds the text width (&tw), but it does so unsatisfactorily for two reasons:
1. When the line breaks on a string, the procedure is a little more complicated than usual. Vim needs to close the string literal at the end of the broken line, and open a new string literal at the beginning of the newly created line. But it would be awkward for the line to be broken in the middle of a word, so Vim needs to back up until it finds a word boundary such that adding a " character after it would not exceed the text width. If it can find no such word boundary, then the entire string needs to begin on the next line.
2. When the line breaks in the middle of a string, I do not want any indentation to be inserted at the beginning of the following line.
Are there any native features of Vim or plugins that I can use to get behaviors (1) and (2), or do I have to write my own plugin?

To get this special line-breaking behavior both with auto-format and with gq, you have to write a custom 'formatexpr' that takes this into account.
I'm not aware of any existing plugin, but you may find something on vim.org to get you started.
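Before porting the rule to a 'formatexpr', it may help to pin the logic down in isolation. The following is only a sketch in standard C++, not Vim script: break_in_string is a hypothetical helper, and it assumes the caller already knows the index of the opening quote of the literal that crosses the text width. It backs up to the last space that still leaves room for a closing quote, and moves the whole literal to the next line when no such space exists. No indentation is added to the continuation, as requested in (2).

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Vim): split `line` so that the first part,
// including the closing quote added here, fits in `width` columns.
// `open_quote` is the index of the opening '"' of the literal being broken.
// Returns {first line, continuation line}; the continuation is not indented.
std::pair<std::string, std::string>
break_in_string(const std::string& line, std::size_t open_quote, std::size_t width)
{
    if (width == 0 || line.size() <= width) {
        return {line, ""};  // nothing to break
    }
    // Keep at most width - 1 original characters to leave room for the '"'.
    const std::size_t limit = width - 1;
    // Back up to a space inside the literal; the space stays on the first line.
    for (std::size_t i = limit; i > open_quote + 1; --i) {
        if (line[i - 1] == ' ') {
            return {line.substr(0, i) + '"', '"' + line.substr(i)};
        }
    }
    // No usable word boundary: the whole literal starts on the next line.
    return {line.substr(0, open_quote), line.substr(open_quote)};
}
```

Because adjacent string literals are concatenated by the C++ compiler, the two halves still form the original string. In the fallback case the first line may keep a trailing space, which a real implementation would probably trim.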

Related

Regex to remove unnecessary period in Chinese translation

I use a translation tool to translate English into Simplified Chinese.
Now there is an issue with the period.
In English, we end a sentence with a full stop ".".
In Simplified Chinese, it is "。", which looks like a small circle.
The translation tool mistakenly adds this "small circle" full stop to every major subtitle.
Is there a way to use a regex or another method to scan the translated content and remove the Chinese full stop when the line has 20 characters or fewer?
Some test data is shown below:
<h1>这是一个测试。<h1>
这是一个测试,这是一个测试而已,希望去掉不需要的。
测试。
这是一个测试,这是一个测试而已,希望去掉不需要的第二行。
It should turn into:
<h1>这是一个测试<h1>
这是一个测试,这是一个测试而已,希望去掉不需要的。
测试
这是一个测试,这是一个测试而已,希望去掉不需要的第二行。
Difference:
Line 1 has only 10 characters, so its Chinese full stop should be removed.
Line 4 is a subheading with only 4 characters, so its full stop should be removed too.
By the way, I was told 1 Chinese word is two English characters.
Is this possible?
I'm using approach 2 from the question ("Second: maybe this one is more accurate: if there is no comma in this line, it should not have a full stop.") to determine whether a full stop 。 should be removed.
Regex
/^(?=.*。)(?!.*,)([^。]*)。/mg
^ start of a line
(?=.*。) match a line that contains 。
(?!.*,) match a line that doesn't contain ,
([^。]*)。 anything that is not a full stop, followed by a full stop; the part before the full stop is captured in group 1
Substitution
$1
But note that this only removes the first full stop on each matching line.
If you want to remove all the full stops, you can try (?:\G|^)(?=.*。)(?!.*,)(.*?)。 but this only works in regex engines that support \G, such as PCRE.
Also, if you want to combine the two approaches (the line has no comma, and its length is at most 20 characters), you can try ^(?=.{1,20}$)(?=.*。)(?!.*,)([^。]*)。
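If a regex engine with lookaheads is not at hand, the same "no comma, so no full stop" rule can be applied line by line with plain string searches. The sketch below is in standard C++ and assumes UTF-8 input; the function name strip_heading_full_stops is made up for illustration. Treating both the ASCII comma and the fullwidth comma (U+FF0C) as a comma is an assumption, since the test data may use either. Like the first regex, it removes only the first full stop on a matching line.

```cpp
#include <cstddef>
#include <string>

// Illustrative helper: for every line without a comma, remove the first
// Chinese full stop (U+3002). Input is assumed to be UTF-8 with '\n' endings.
std::string strip_heading_full_stops(const std::string& text)
{
    const std::string full_stop = "\xE3\x80\x82";        // U+3002 in UTF-8
    const std::string fullwidth_comma = "\xEF\xBC\x8C";  // U+FF0C (assumption)
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string::npos) eol = text.size();
        std::string line = text.substr(pos, eol - pos);

        const bool has_comma =
            line.find(',') != std::string::npos ||
            line.find(fullwidth_comma) != std::string::npos;
        const std::size_t stop = line.find(full_stop);
        if (!has_comma && stop != std::string::npos) {
            line.erase(stop, full_stop.size());  // drop the full stop only
        }

        out += line;
        if (eol < text.size()) out += '\n';  // keep the original line break
        pos = eol + 1;
    }
    return out;
}
```

Searching for the three-byte UTF-8 sequence is safe here because UTF-8 is self-synchronizing, so the sequence cannot match across character boundaries. A length limit such as the 20-character rule would need real code point counting, which this sketch leaves out.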

Loading a file with the LoadFromFile() function with a newline

I load a .txt file using the LoadFromFile() function. In the middle of one line, the text contains a newline character '\n'.
LoadFromFile() treats this character as a line break and splits the line at that point.
In Windows Notepad the text looks like this: **Ala has ace**
The program that loads this file displays it differently:
plik->LoadFromFile( path, TEncoding::ASCII);
for( short int i = 0; i < plik->Count; ++i )
Memo1->Lines->Add( plik->Strings[i] );
In Memo1 the text looks like this:
**Ala**
**has ace**
Can I remove the '\n' character so that the text stays on a single line, and if so, how?
I answered this same question on the Embarcadero forums earlier today, but I will answer it here, too.
plik is a TStringList (according to the other discussion), so its LoadFrom...() method treats bare-CR, bare-LF, and CRLF line breaks equally when the TStrings::LineBreak property matches the RTL's global sLineBreak constant. If the LineBreak property does not match sLineBreak, then TStrings only splits on line breaks that match its LineBreak property.
Since the RTL's sLineBreak constant is CRLF on Windows, and you don't want to split on bare-LF line breaks, you are going to have to parse the file data manually, not use TStrings::LoadFromFile() at all.
For instance, you could read the whole file into a System::String using the System::Classes::TStreamReader::ReadToEnd() or System::Ioutils::TFile::ReadAllText() method (TStreamReader and TFile both have methods for reading lines, but they both treat all three forms of line break equally), and then parse that String to extract CRLF-delimited substrings while ignoring any bare-LF characters.
Ideally, you would load a file into a TMemo by using its own LoadFromFile() method. But, in this situation, that will not work, either, because TMemo normalizes all three forms of line breaks to CRLF before passing the data to the Win32 API, so that is not useful to you.
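The manual parsing step might look roughly like the sketch below. To keep it self-contained it uses std::string instead of System::String, and the function name split_on_crlf is hypothetical. It splits only on CRLF pairs and drops any bare LF, which is one reading of "ignoring" that matches the request to remove the character; keeping the LF inside the line or replacing it with a space would be an equally small change.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative helper: split `data` on CRLF only. A bare '\n' is not treated
// as a line break; it is simply dropped (an assumption about what is wanted).
std::vector<std::string> split_on_crlf(const std::string& data)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r' && i + 1 < data.size() && data[i + 1] == '\n') {
            lines.push_back(current);  // CRLF ends the current line
            current.clear();
            ++i;                       // skip the '\n' of the pair
        } else if (data[i] == '\n') {
            // Bare LF: skip it so the text stays on one line.
        } else {
            current += data[i];
        }
    }
    if (!current.empty()) {
        lines.push_back(current);      // last line without a trailing CRLF
    }
    return lines;
}
```

Each returned element could then be passed to Memo1->Lines->Add() after converting it to a System::String.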

gvim syntax highlight for different types of lines

I've written several syntax highlighting files for simple custom formats in the past (in effect, even changing the format a bit so that I could write the syntax file with the skills I have).
But this time I feel confused and I would appreciate some help.
The file format is (obviously) a text file where every line contains three distinct elements separated by spaces. Each element is either a "symbol" (a name made of alphanumeric characters and hyphens) or a "string" (a series of any characters, spaces included, but not pipes).
Strings can appear only at the start or end of a line; the middle element can only be a symbol. A string is delimited by a pipe at its end if it is the first element, and by a pipe at its start if it is the last element.
A line can therefore also be all symbols, a string first followed by symbols, or symbols followed by a string last.
Examples:
All symbols
this-is-a-symbol another-one and-another
First string
This is a string potentially containing any char| symbol symbol
Last string
symbol symbol |A string at the end of the line
First and last as strings
This is a string| now-we-have-a-symbol |And here another string
These four examples are the only valid line formats.
All symbols need to be colored differently, a specific color for first element, a specific color for second, and one for third.
But strings will have one unique different color regardless of position.
If the pipe chars can be "dimmed" with a color similar (not precisely the same) to background this will be a big plus. But I think I can manage this myself.
A line in the file that does not match one of the formats shown will have to be highlighted as an error (for example, with a red background).
Some help?
PS: Stack Overflow applies a sort of syntax highlighting to my examples, which can be misleading.
I have found a simpler approach than what I initially thought was necessary in terms of regular expressions. In the end I just need to match the first element and the last; how could I not have thought of that... So this is my solution, and it seems to work well for my needs. The only thing it doesn't do is highlight badly formatted lines. Good enough for now. Thanks for your patience and attention.
" Vim syntax file
" Language: ff .txt
if exists("b:current_syntax")
  finish
endif
setlocal iskeyword+=:
syn match Asymbol /^[a-zA-Z0-9\-]* /
syn match Csymbol / [a-zA-Z0-9\-]*$/
syn match Astring /^.*| /
syn match Cstring / |.*$/
highlight link Asymbol Constant
highlight link Csymbol Statement
highlight link Astring Include
highlight link Cstring Comment
let b:current_syntax = "ff"

How to find a string in a txt file with case insensitivity and still retain part capitalization? In C++

It would be easy to make everything in the file lowercase and find it, but I want to find the string with the original capitalization so I could put it to a pointer and print it later. For example
FIND_WORD ransom.
File Word found. Line added
DISPLAY
rAnSoM nOtE. yOu HaVe TiLl nOon.
Go through the file line by line. For each line, go through the string from beginning to end.
For each starting point in the line, do a case-insensitive compare of the subsequent characters in the string to the characters in the word you're trying to find. If they all match, output that entire line as originally read.
In other words, don't convert anything to lower case. Instead, do a case-insensitive compare.
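A minimal sketch of that comparison in standard C++ follows. The function name contains_ignore_case is made up for illustration, and the comparison assumes single-byte characters handled by std::tolower in the current C locale, so it is not Unicode-aware. The line itself is never modified, so it can be stored or printed with its original capitalization.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative helper: returns true if `word` occurs in `line`, comparing
// characters case-insensitively. Neither string is modified.
bool contains_ignore_case(const std::string& line, const std::string& word)
{
    if (word.empty()) return true;
    // Try every starting point at which the word could still fit.
    for (std::size_t start = 0; start + word.size() <= line.size(); ++start) {
        std::size_t k = 0;
        while (k < word.size() &&
               std::tolower(static_cast<unsigned char>(line[start + k])) ==
               std::tolower(static_cast<unsigned char>(word[k]))) {
            ++k;
        }
        if (k == word.size()) return true;  // all characters matched
    }
    return false;
}
```

A caller would read the file with std::getline and, for each line where contains_ignore_case(line, "ransom") is true, keep the untouched line for later display.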

QString::split() and "\r", "\n" and "\r\n" convention

I understand that QString::split should be used to get a QStringList from a multiline QString. But if I have a file and I don't know if it comes from Mac, Windows or Unix, I'm not sure if QString.split("\n") would work well in all the cases. What is the best way to handle this situation?
If it's acceptable to remove blank lines, you can try:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
This splits the string wherever either newline character (line feed or carriage return) is found. Consecutive line breaks (e.g. \r\n\r\n or \n\n) are treated as multiple delimiters with empty parts between them, and those empty parts are skipped.
Emanuele Bezzi's answer misses a couple of points.
In most cases, a string read from a text file will have been read using a text stream, which automatically translates the OS's end-of-line representation to a single '\n' character. So if you're dealing with native text files, '\n' should be the only delimiter you need to worry about. For example, if your program is running on a Windows system, reading input in text mode, line endings will be marked in memory with single \n characters; you'll never see the "\r\n" pairs that exist in the file.
But sometimes you do need to deal with "foreign" text files.
Ideally, you should probably translate any such files to the local format before reading them, which avoids the issue. Only the translation utility needs to be aware of variant line endings; everything else just deals with text.
But that's not always possible; sometimes you might want your program to handle Windows text files when running on a POSIX system (Linux, UNIX, etc.), or vice versa.
A Windows-format text file on a POSIX system will appear to have an extra '\r' character at the end of each line.
A POSIX-format text file on a Windows system will appear to consist of one very long line with embedded '\n' characters.
The most general approach is to read the file in binary mode and deal with the line endings explicitly.
I'm not familiar with QString.split, but I suspect that this:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
will ignore empty lines, which will appear either as "\n\n" or as "\r\n\r\n", depending on the format. Empty lines are perfectly valid text data; you shouldn't ignore them unless you're certain that it makes sense to do so.
If you need to deal with text input delimited either by "\n", "\r\n", or "\r", then I think something like this:
QString.split(QRegExp("\n|\r\n|\r"));
would do the job. (Thanks to parsley72's comment for helping me with the regular expression syntax.)
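For readers who want to see the same splitting rule without Qt, here is a rough equivalent in standard C++. It is only a sketch: split_lines is a hypothetical name, and it works on a std::string that is assumed to have been read in binary mode. It treats "\r\n", "\n", and "\r" each as one delimiter and keeps empty lines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative helper: split `text` on "\r\n", "\n", or "\r".
// Empty lines are kept, since they are valid text data.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat a "\r\n" pair as a single delimiter.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    lines.push_back(current);  // text after the last delimiter (may be empty)
    return lines;
}
```

For the input "a\r\nb\n\nc\rd" this yields the five elements "a", "b", "", "c", and "d". As with a plain split, a trailing delimiter produces a final empty element.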
Another point: you're not likely to encounter text files that use just '\r' to delimit lines. That's the format used by Mac OS up to version 9. Mac OS X is based on UNIX, and it uses standard UNIX-style '\n' line endings (though it probably tolerates '\r' line endings as well).