Regular expression: find all LF characters that are not part of CRLF (hexadecimal view)

I am viewing a CSV file that has LF characters in the middle of some fields and CRLF characters to denote the actual line breaks. I am viewing the file in hexadecimal in Sublime Text 3, and I want to do a simple find and replace: search for LF characters but NOT CRLF, and replace them with a space.
To search for LF but NOT CRLF, I tried the regular expression
[^0d]0a. The problem is that it doesn't handle the case where the two bytes are separated by a space, as in XX0d 0aXX, and I don't know how to express this with regular expressions. I would then want to replace each match with '20', which is a space in hexadecimal.

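Use a negative lookbehind that matches 0d followed by optional whitespace. (Your [^0d] is a character class that matches any single character other than 0 or d, not "anything except the byte 0d", so it can't express this.)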
(?<!0d\s*)0a
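However, many regex engines (PCRE and Python's re, for example) only allow fixed-width lookbehinds, so a quantifier like \s* inside one is rejected. Simply moving the whitespace out of the lookbehind, as in (?<!0d)(\s*)0a, doesn't help: the match can start right after the space, where the lookbehind sees "d " instead of 0d. If the hex view puts at most one space between bytes, use two fixed-width lookbehinds instead (note the space inside the second one):
(?<!0d)(?<!0d )0a replace with 20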
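To check a fixed-width version outside the editor, here is a small Python sketch using the standard re module, which (like many engines) rejects variable-width lookbehinds. It assumes a hypothetical dump format of two lowercase hex digits per byte with at most one space between bytes, and it uses the two fixed-width lookbehinds (?<!0d) and (?<!0d ) in place of (?<!0d\s*). The function name replace_lone_lf is made up for illustration. It also assumes every 0a it finds is a whole byte; a dump that runs bytes together can produce an unaligned hit such as the 0a inside 30a1.

```python
import re

# Hypothetical hex-dump format: two lowercase hex digits per byte.
# Two fixed-width negative lookbehinds stand in for (?<!0d\s*):
#   (?<!0d)   rejects "0a" right after "0d"          (e.g. "0d0a")
#   (?<!0d )  rejects "0a" after "0d" plus one space  (e.g. "0d 0a")
LONE_LF = re.compile(r"(?<!0d)(?<!0d )0a")


def replace_lone_lf(hex_text: str) -> str:
    """Replace every 0a that does not follow 0d with 20 (an ASCII space)."""
    return LONE_LF.sub("20", hex_text)


print(replace_lone_lf("41 0a 42 0d 0a 43"))  # 41 20 42 0d 0a 43
print(replace_lone_lf("410a 0d0a"))          # 4120 0d0a

# Python's re refuses the variable-width lookbehind outright.
try:
    re.compile(r"(?<!0d\s*)0a")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
print("variable-width lookbehind rejected:", variable_width_rejected)
```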
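It would probably be easier to do this in text mode instead of hex. Replace
(?<!\r)\n
with a space. This also avoids false matches in hex, where the digits 0a can straddle two bytes (for example, 30 followed by a1 shown as 30a1).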
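The same text-mode replacement can be sketched in Python on raw bytes, so that no newline translation gets in the way. The CSV content below is invented sample data.

```python
import re

# Made-up CSV bytes: an LF inside a quoted field, CRLF at the end of each record.
data = b'id,comment\r\n1,"first line\nsecond line"\r\n2,"ok"\r\n'

# (?<!\r)\n matches an LF whose preceding byte is not CR; each one becomes a space.
fixed = re.sub(rb"(?<!\r)\n", b" ", data)
print(fixed)
```

To run it on a real file, open the file in binary mode ("rb" and "wb"); in text mode, Python's universal-newline handling would turn every \r\n into \n before the regex ever sees it.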

Related

Is there a regex to remove spaces and newlines from an XML input file?

I would like to change XML that is in this format:
<input>My
Input</input>
<input2>My
input2</input2>
to
<input>My Input</input>
<input2>My input2</input2>
The input XML file has more than 10,000 records in the above format, which prevents the software from working properly.
I need a regex to fix it in one go.
I tried ('//n','') but it does not work as expected.
If your regex flavor supports lookbehinds, you can use something like this:
(?<!>)(\s)*[\r\n]+
...and replace with \1.
This will match any number of new-line characters, preceded by zero or more other whitespace characters and not preceded by the > character. Then, it will replace them with a whitespace character (if present) or nothing.
If lookbehind is not supported, you can use:
([^>])(\s)*[\r\n]+
...and replace with \1\2.
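For comparison, here is a Python sketch of a slightly different pattern. It is a variant, not the pattern from the answer: with CRLF line endings, the (\s)* group in (?<!>)(\s)*[\r\n]+ can end up capturing the \r itself, and with LF-only input the line break is replaced by nothing (My\nInput becomes MyInput), whereas the question asks for My Input. The variant joins a broken line with exactly one space, provided the character before the line break is neither > nor whitespace. The name join_broken_text is hypothetical.

```python
import re

# Variant pattern: a line break that follows a character other than ">" or
# whitespace, together with spaces/tabs before it and any whitespace after it,
# is collapsed into a single space. Breaks right after ">" are left alone.
JOIN_BROKEN_TEXT = re.compile(r"(?<=[^>\s])[ \t]*\r?\n\s*")


def join_broken_text(xml: str) -> str:
    return JOIN_BROKEN_TEXT.sub(" ", xml)


crlf = "<input>My\r\nInput</input>\r\n<input2>My\r\ninput2</input2>"
lf = crlf.replace("\r\n", "\n")
print(repr(join_broken_text(crlf)))
print(repr(join_broken_text(lf)))
```

Regexes are fragile for XML in general; this only targets the specific "text split across lines inside an element" shape shown in the question.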

Find and replace using regular expressions - remove double spaces between letters only

I'm trying to do this in the Atom editor (1.39.1 x64, Ubuntu 18.04), though I assume this applies to other text editors that support regular expressions.
Say we have this text:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
Which we would like to change to:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
Using Find with Regex enabled (.*), all occurrences are correctly found using [a-zA-Z]  [a-zA-Z] (two spaces between the classes). But what goes in the Replace field to enforce this logic:
1st letter, single space, 2nd letter?
You can use this
([a-z])\s{2}([a-z])
and replace with $1 $2
If your editor supports lookarounds, you can use
(?<=[a-z])\s{2}(?=[a-z])
and replace with a single space character.
Note: don't forget to use the i flag for case insensitivity, or just change the character class to [a-zA-Z].
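Both patterns can be tried in Python's re module. The sample text below assumes two spaces in the places the question intends (the exact spacing was not preserved above, so these positions are a guess). Python writes replacement backreferences as \1 and \2 rather than $1 and $2.

```python
import re

text = ("This  text has some  double-spaces.  Lets try to remove them.\n"
        "But not after a full-stop or if   three or more spaces.")
expected = ("This text has some double-spaces.  Lets try to remove them.\n"
            "But not after a full-stop or if   three or more spaces.")

# Capturing version: keep both letters, put a single space between them.
capturing = re.sub(r"([a-z])\s{2}([a-z])", r"\1 \2", text, flags=re.IGNORECASE)
# Lookaround version: only the two whitespace characters are consumed.
lookaround = re.sub(r"(?<=[a-z])\s{2}(?=[a-z])", " ", text, flags=re.IGNORECASE)
print(capturing == expected, lookaround == expected)  # True True

# The capturing version consumes "b" in its first match, so the second gap survives.
print(re.sub(r"([a-z])\s{2}([a-z])", r"\1 \2", "a  b  c"))   # a b  c
print(re.sub(r"(?<=[a-z])\s{2}(?=[a-z])", " ", "a  b  c"))  # a b c
```

The last two lines show a practical difference: because the lookaround version does not consume the letters, it also handles single-letter words that sit between two double spaces.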

regex match file with multiple extensions

I have several strings like these:
XYZ_TEST_2017.txt
ASD_TEST_2017.txt.tmp
I need to match only the strings that end with .txt.
So I'm using this regex:
[A-Z]{3}_TEST_[0-9]{4}.txt
However, I still get strings with multiple extensions, like the second one (.txt.tmp).
How can I handle it?
To have your regex match everything up to the end, append an "end-of-text marker" ($) to your pattern like this:
[A-Z]{3}_TEST_[0-9]{4}\.txt$
As you may have noticed, I also escaped the dot; otherwise this filename would match as well:
SOM_TEST_1234Etxt
An unescaped dot (.) matches any character (depending on your flags, even newline and carriage return); in this case, the E before txt.
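A quick Python check of both patterns against the example names, plus the SOM_TEST_1234Etxt case:

```python
import re

names = ["XYZ_TEST_2017.txt", "ASD_TEST_2017.txt.tmp", "SOM_TEST_1234Etxt"]

loose = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}.txt")     # unescaped dot, no end anchor
strict = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}\.txt$")  # escaped dot, anchored at the end

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]
print(loose_hits)   # all three names
print(strict_hits)  # ['XYZ_TEST_2017.txt']
```

In Python, $ also matches just before a trailing newline at the end of the string; re.fullmatch is stricter if that matters.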

Find character, text around and extract it in Notepad++

I'm having trouble finding a character, extending it by a fixed number of characters on both sides, and returning the result.
Example of text:
Contrary to popular belief, (Lorem Ipsum) is not simply random text. It (has) roots in a piece of ...
Expected result:
r belief, (Lorem Ipsu
text. It (has) roots
How it should work:
1. Find the position of "(" minus 10 characters.
2. Find the position of "(" plus 10 characters.
3. Extract the text from the start position in point 1 to the end position in point 2 (and store it in a new row).
Is it possible to do this in Notepad++ or similar software with the Find and Replace function?
I believe this can be done with regex, but I am not able to write it.
Thank you very much!
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (.{10}\(.{10})
Replace With: \r\n\1\r\n
check regular expression
click Replace or Replace All
Depending on your line endings, you may need to change the \r\n to \n in the replacement.
Explanation:
The regular expression is centered on a literal ( (it has to be escaped as \( due to the regex rules).
It captures the 10 characters before and after it with the two .{10} sections.
All 21 characters are captured into \1 (by wrapping the whole regular expression in unescaped parentheses).
The replacement inserts \1 surrounded by line breaks (either \r\n or \n, whichever you need).
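If you only need the snippets themselves rather than an edited file, the same pattern (without the outer capturing group) can be used with Python's re.findall, sketched here on the sample sentence.

```python
import re

text = ("Contrary to popular belief, (Lorem Ipsum) is not simply random text. "
        "It (has) roots in a piece of ...")

# .{10}\(.{10} : ten characters, a literal "(", then ten more (21 in total).
snippets = re.findall(r".{10}\(.{10}", text)
for snippet in snippets:
    print(repr(snippet))
```

re.findall returns non-overlapping matches, so a ( that falls inside the previous snippet's window isn't reported separately, and a ( with fewer than 10 characters on either side (on the same line) isn't matched at all.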

Parameterize block of text using Notepad++

I have the following text in Notepad++
A
B
C
D
I would like to "parameterize" this text and turn it into this using a regex or some other native Notepad++ command(s) or plugin:
'A', 'B', 'C', 'D'
Note that I want the result on one line with no trailing comma, if possible. A related question gets me close, but I am left with a trailing comma and the text is not joined onto one line. Is there any way to accomplish this in Notepad++ without using a macro?
Try this in Regex Search Mode.
Search for (\w)\r\n
Replace with ('\1', )
You will then have to remove the trailing comma and space from the end of the line manually.
You can do it in two steps:
Search for e.g. (\w+) and replace with '$1'
\w+ matches one or more word characters: letters, digits, and the underscore.
Search for (\s+) and replace with a comma followed by a space.
\s+ matches runs of whitespace characters, which here means the line breaks at the end of each row. If your text contains whitespace you want to keep, use [\r\n]+ instead.
This way, if there is no newline after the last letter, there will be no trailing comma.
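The two-step approach is easy to reproduce in Python's re (again with \1 instead of $1 in the replacement). The helper name parameterize is hypothetical, used only for this sketch.

```python
import re


def parameterize(text: str) -> str:
    quoted = re.sub(r"(\w+)", r"'\1'", text)  # step 1: wrap each word in quotes
    return re.sub(r"\s+", ", ", quoted)        # step 2: each whitespace run -> ", "


print(parameterize("A\r\nB\r\nC\r\nD"))          # 'A', 'B', 'C', 'D'
print(repr(parameterize("A\nB\nC\nD\n")))        # "'A', 'B', 'C', 'D', "
```

The second call shows the case from the last paragraph: when the text ends with a newline, that final whitespace run also becomes a trailing comma.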