Removing new lines from a text file in Notepad++ - regex

I need to replace all the strings that look like this:
<\name>
with a TAB.
name can be anything from 3 to 15 characters long.
I've managed to do that by searching for <.*> and replacing it with \t.
Now I need to replace every new line with a single TAB, i.e. remove the new lines. For some reason UltraEdit doesn't recognise the new line in the search box. I've tried \r and \n, but neither works.
This is an example of the file, after the search and replace:
1
101
54651
150756
282
506
398
2759
59.62
35737
65
I want to get all that in a single line separated by tabs.
Any ideas?

As you're using Notepad++, I'll assume you're on Windows.
This means your text files were probably created on a DOS-type system (including Windows) and therefore terminate lines with \r\n rather than the single \n you would find on a UNIX system.
Try searching for \r\n instead.
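If the replacement ever has to be scripted rather than done in the editor, the same idea can be sketched in Python. This is only a sketch under assumptions: the helper name join_lines_with_tabs is made up for illustration, and the whole text is assumed to fit in memory.

```python
import re


def join_lines_with_tabs(text: str) -> str:
    """Replace every line ending with a single tab (hypothetical helper)."""
    # Drop trailing line endings so the result does not end in a tab.
    text = text.rstrip("\r\n")
    # Try \r\n first so a Windows line ending becomes one tab, not two.
    return re.sub(r"\r\n|\r|\n", "\t", text)


sample = "1\r\n101\r\n54651\r\n150756\r\n"
print(join_lines_with_tabs(sample))
```

When the text comes from a file, opening it with newline="" keeps the original \r\n visible; in Python's default text mode the line endings are already translated to \n before the regex sees them.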

Regex to remove unnecessary period in Chinese translation

I use a translation tool to translate English into Simplified Chinese.
There is an issue with the full stop.
In English, a sentence ends with a full stop ".".
In Simplified Chinese it is "。", which looks like a small circle.
The translation tool mistakenly adds this small circle to every major subtitle.
Is there a way to use regex or another method to scan the translated content and remove the Chinese full stop when the line has 20 characters or fewer?
Some test data:
<h1>这是一个测试。<h1>
这是一个测试,这是一个测试而已,希望去掉不需要的。
测试。
这是一个测试,这是一个测试而已,希望去掉不需要的第二行。
It should turn into:
<h1>这是一个测试<h1>
这是一个测试,这是一个测试而已,希望去掉不需要的。
测试
这是一个测试,这是一个测试而已,希望去掉不需要的第二行。
Difference:
Line 1 has only 10 characters, so its Chinese full stop should be removed.
Line 4 is a subheading with only 4 characters, so its full stop should be removed too.
By the way, I was told that one Chinese word counts as two English characters.
Is this possible?
I'm using approach 2 ("Second: maybe this one is more accurate: if there is no comma in this line, it should not have a full stop.") to determine whether a full stop 。 should be removed.
Regex
/^(?=.*。)(?!.*,)([^。]*)。/mg
^ start of a line
(?=.*。) match a line that contains 。
(?!.*,) match a line that doesn't contain ,
([^。]*)。 anything that is not a full stop, followed by a full stop; the text before the full stop goes in group 1
Substitution
$1
Note that this only removes the first full stop on a line.
If you want to remove all the full stops, you can try (?:\G|^)(?=.*。)(?!.*,)(.*?)。 but this only works in regex engines that support \G, such as PCRE.
Also, if you want to combine the two approaches (the line has no comma and is at most 20 characters long), you can try ^(?=.{1,20}$)(?=.*。)(?!.*,)([^。]*)。
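For anyone who wants to try the first pattern outside an online tester, here is a minimal sketch in Python. The function name strip_heading_stops is hypothetical, and the pattern uses the same comma character as the sample data; if the real text uses the full-width comma ，, the character in the (?!.*,) lookahead would presumably have to be adjusted.

```python
import re

# The pattern from the answer. re.MULTILINE plays the role of the /m flag,
# and re.sub replaces every matching line, like the /g flag.
HEADING_STOP = re.compile(r"^(?=.*。)(?!.*,)([^。]*)。", re.MULTILINE)


def strip_heading_stops(text: str) -> str:
    """Remove the first 。 from each line that contains no comma."""
    # \1 is Python's spelling of the $1 substitution.
    return HEADING_STOP.sub(r"\1", text)


sample = (
    "<h1>这是一个测试。<h1>\n"
    "这是一个测试,这是一个测试而已,希望去掉不需要的。\n"
    "测试。"
)
print(strip_heading_stops(sample))
```

Lines containing a comma are left untouched because the negative lookahead fails at the start of the line, before anything is consumed.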

Execute two commands in one line in vi editor

This command
%s#^#/*
and this command
%s#$#*/
work fine in the vi editor on Ubuntu 14.04 when I execute them separately, one after another.
What I need is to execute them both in one line, like
%s#^#/* <bar> %s#$#*/
I also tried |, ; and CR as separators and always get error E488: Trailing characters.
In my Vim 7.4 (patch 769) this works very well:
:%s/foo/FOO/ | %s/bar/BAR/
It looks like you have omitted the final separator character from your substitutions. Without the terminating # characters, Vim doesn't know where your replacement string ends, so it reads on past the | as if it were still part of the replacement. With a single substitution, Vim treats the end of the command as the terminating separator if none is found. Adding the terminators, as in :%s#^#/*# | %s#$#*/#, should fix the error.
For instance, it still works if you omit the terminator only in the last substitution of the command:
:%s/foo/FOO/ | %s#bar#BAR
And finally, a quick tip regarding your substitutions: you can wrap a line in text with a single substitution, and no capture group is needed. Match the whole line and use & in the replacement to reuse the matched text:
:s#.*#/* & */

On Cygwin (or windows 7), match a word, look backwards, skip a word and print x number of comma separated words

I have a headache from trying to understand squiggly awks and greps, and I haven't gotten far.
I have 100 thousand files from which I'm trying to extract a single line.
A sample set of lines of the file is:
Revenue,876.08,,9361.000,444.000,333.000,222.000,111.00,485.000,"\t\t",178.90,9008.98
EV to Revenue,6.170,0.65,3.600,2.60,1.520,1.7,"\t\t",190.9,9008.98,80.9,87
(there are two tabs between the double quotes. I'm representing them with \t here. They are actual whitespace tabs)
I'm trying to output just this line that starts with Revenue:
Revenue,444.000,333.000,222.000,111.000
This output line contains the first word of the line and the comma (i.e. Revenue,). It then finds the two tabs enclosed in double quotes, looks backwards, skips the first comma-separated number (assume that instead of a number there could be nothing, i.e. just an empty comma-separated field) and then outputs the four comma-separated numbers before it.
Is this doable with a simple grep, awk, cut or tr command on Cygwin that won't be a bear to run on 100K files?
To clarify, there are 100K files that look very similar. Each file contains lots of lines (separated by new line/carriage return). Some lines contain the word Revenue at the start, some in the middle (as in the second sample line I pasted above), etc. I'm only interested in the lines that start with Revenue followed by a comma and then the sequence above. Each file contains that specific line.
To round this task off (because working on 100K files would require this too), what would have to be added to sed to print the name of the file currently being processed?
ie: output like this:
FileName1: Revenue,444.000,333.000,222.000,111.000
[I'll post the answer here if I find it]
Thank you!
Thanks to Sputnick for editing my question so it looks neat and thanks to shellter for responding.
Ed, your solution looks really good. I'm testing it out and will reply back with info plus my understanding of how that regex works. Thank you very much for taking time to write this out!
Since this is just a simple substitution on a single line, it's really most appropriate for sed:
$ sed -n -r 's/(^Revenue)(,[^,]*){3}(.*),[^,]*,"\t\t".*/\1\3/p' file
Revenue,444.000,333.000,222.000,111.00
but you can do the same in awk with gensub() (gawk) or match()/substr() or similar. It will run in the blink of an eye no matter what tool you use.
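For readers who find the sed expression hard to parse, here is a rough Python equivalent that also prefixes the file name, as asked in the question. It is a sketch, not what the answer proposes: the names extract_revenue and scan_files are hypothetical, the files are assumed to be readable as UTF-8, and it assumes one Revenue line per file, as the question states.

```python
import re

# Same idea as the sed expression: keep "Revenue", skip three fields,
# then capture everything up to the field just before the quoted tabs.
REVENUE = re.compile(r'^(Revenue)(?:,[^,]*){3}(.*),[^,]*,"\t\t"')


def extract_revenue(line):
    """Return the shortened Revenue line, or None if the line does not match."""
    match = REVENUE.match(line)
    return match.group(1) + match.group(2) if match else None


def scan_files(paths):
    """Yield 'filename: result' for the first matching line of each file."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                result = extract_revenue(line.rstrip("\r\n"))
                if result is not None:
                    yield f"{path}: {result}"
                    break  # assumption: one Revenue line per file
```

A line such as "EV to Revenue,..." is skipped because the pattern is anchored at the start of the line.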

Sublime Text macro to find and replace file path characters on current line

I use Sublime 2 for developing R and PHP code, although I imagine this shortcut would be useful for other languages.
If I copy the path of a file from Windows Explorer / XYPlorer (or other source) it has backslashes for directories. When entering a path into the source code, it needs forward slashes.
Sublime has some reasonably powerful macro commands, but I cannot think of a combination that would be able to:
take the string of text on the current line
replace all instances of '\' with '/'
Here is the workflow that I envisage:
Locate my filename in Explorer and copy its path
In Sublime, write a line of code and paste in the path
Hit a keyboard shortcut, say Ctrl+Shift+\, and all back slashes are converted to forward slashes
The result:
myPath = "E:\WORK\Code\myFile.csv";
Becomes:
myPath = "E:/WORK/Code/myFile.csv";
Without running the risk of backslashes elsewhere in the file being changed (e.g. \n characters), and without having to use multiple key presses or mouse clicks.
I imagine this would be possible with regex. Sublime macros and regex are two things I am no expert in, so I wonder if anyone else knows the magical commands that would achieve this?
I tried this for about 15 minutes. A few things:
Sublime Text 2 doesn't allow find/replace in macros
Sublime Text 3 doesn't allow 'find in selection'
So I think you are out of luck right now, other than writing a plugin, which would be fairly straightforward.
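A plugin along those lines might look like the sketch below. It relies on Sublime Text's Python plugin API (sublime_plugin.TextCommand, view.sel, view.line, view.substr, view.replace); the class and helper names are made up for illustration, and the code has not been tried inside the editor. The import guard is only there so the helper can be run outside Sublime.

```python
try:
    import sublime_plugin  # only importable inside Sublime Text
except ImportError:
    sublime_plugin = None


def to_forward_slashes(text):
    """Replace every backslash in text with a forward slash."""
    return text.replace("\\", "/")


if sublime_plugin is not None:

    class BackslashesToSlashesCommand(sublime_plugin.TextCommand):
        """Convert backslashes on the line(s) holding the cursor(s)."""

        def run(self, edit):
            for region in self.view.sel():
                line = self.view.line(region)  # full line around the cursor
                fixed = to_forward_slashes(self.view.substr(line))
                self.view.replace(edit, line, fixed)
```

Sublime derives the command name backslashes_to_slashes from the class name, so a key binding such as Ctrl+Shift+\ could be pointed at that command. Only the current line is rewritten, so backslashes elsewhere in the file (e.g. \n) are left alone.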
This works in Sublime Text 3 if the code you are writing is Python:
Type r before the string to tell Python to read the path as a raw string.
This way each backslash is read as a literal backslash instead of as an escape character (the default meaning of \ in Python).
Example
myPath = r"E:\WORK\Code\myFile.csv"
Python now keeps every \ in the path as it is. It does not turn them into /, but Windows accepts the path with backslashes.
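A short sketch of what the r prefix does and does not do: the raw string keeps its backslashes, and an explicit conversion is still needed if forward slashes are wanted. pathlib's PureWindowsPath.as_posix() is one standard-library way to do that; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

raw_path = r"E:\WORK\Code\myFile.csv"

# The raw string still contains real backslashes.
print(raw_path)

# as_posix() returns the same path written with forward slashes.
posix_style = PureWindowsPath(raw_path).as_posix()
print(posix_style)
```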

How do I change whitespace within CDATA to non-whitespace characters, i.e. CRLF to \n

VB2008 and VB2010 on Windows 7.
I am getting XML from an external source that needs to be written to a file. The XML contains a lot of CDATA sections with whitespace characters such as CR LF line breaks (i.e. character codes 13 and 10). The file is written via WriteAllText and then read by another program.
The control characters corrupt the output files that are created.
I have tried to manipulate the contents of the CDATA sections with a simple replace, but the file is so big that it takes about 20 minutes on a 3.6 GHz machine, so it is just not practical.
I need a way to convert the 13s and 10s into \n etc. that does not take a long time.
The 2nd VB2010 program uses
Transform As New XslCompiledTransform .....
Any ideas?