I have a file containing around 1400 lines. Each line holds some information, and the line after it holds more information that I want to move up onto the previous line (the one with the text).
I tried replacing " for" with "\r |", which was the only thing that came to mind at the time.
For example, here is the structure of my file:
T="topic 1"
for xxx#xxx.com
T="topic 2"
for yyy#yyy.com
I want to turn that into this:
T="topic 1" | for xxx#xxx.com
T="topic 2" | for yyy#yyy.com
You may use
Find what: \n( for)\b
Replace with: |$1
Details
\n - a line break
( for) - Capturing group 1 ($1): a space and for
\b - word boundary.
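For reference, the same substitution can be sketched outside Notepad++ with Python's re module. This is only an illustration: it assumes each continuation line starts with a single space (as the pattern requires) and that a space is wanted before the pipe. The sample text and variable names are made up for the example.

```python
import re

# Sample data; assumes every "for" line starts with one space.
text = 'T="topic 1"\n for xxx#xxx.com\nT="topic 2"\n for yyy#yyy.com'

# \n( for)\b : a line break, then " for" captured as group 1.
# The replacement drops the line break and puts " |" before the group.
joined = re.sub(r"\n( for)\b", r" |\1", text)
print(joined)
```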
Another option, if you don't want to keep for, could be to match:
\n[ \t]+for[ ]
That will match:
\n Match a line break
[ \t]+ Match 1+ times a space or tab (or just a single space if that is the case)
for[ ] Match for followed by a space (the square brackets are for clarity only)
And replace with a space, then a pipe, then a space:
|
I have a file with text like this:
"Title" = "Body"
And I would like to remove both " before the =, to leave it like this:
Title = "Body"
So far I managed to select the first block of text with:
.+(=)
That selects everything up to the =, but I can't figure out how to replace (or delete) both " characters.
Any suggestions?
You could use a capture group in the replacement, and match the double quotes to be removed while asserting an equals sign at the right.
Find what:
"([^"]+)"(?=\h*=)
" Match literally
([^"]+) Capture group 1, match 1+ times any char other than "
" Match literally
(?=\h*=) Positive lookahead, assert an = sign at the right
Replace with:
$1
To match the whole pattern from the start to the end of the string, you might also use 2 capture groups and use those in the replacement.
^"([^"]+)"(\h*=\h*"[^"]+")$
In the replacement use $1$2
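As a rough cross-check, the lookahead version can be tried in Python. This is a sketch rather than a Notepad++ recipe: Python's re module has no \h, so [ \t] is used here as a stand-in for horizontal whitespace, and the variable names are illustrative.

```python
import re

line = '"Title" = "Body"'

# [ \t] stands in for \h, which Python's re does not support.
# The lookahead only checks that an = follows; it consumes nothing,
# so the quoted text after the = is left alone.
unquoted = re.sub(r'"([^"]+)"(?=[ \t]*=)', r"\1", line)
print(unquoted)
```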
You can use
(?:\G(?!^)|^(?=.*=))[^"=\v]*\K"
Replace with an empty string.
Details:
(?:\G(?!^)|^(?=.*=)) - end of the previous successful match (\G(?!^)) or (|) start of a line that contains = somewhere on it (^(?=.*=))
[^"=\v]* - any zero or more chars other than ", = and vertical whitespace
\K - omit the text matched
" - a " char (matched, consumed and removed)
I need to find certain words in a text file with Notepad++ and delete everything else.
I want to use a regex to find variations on thban..... The word always has at most 5 characters after it (see the dots).
My search string matches the last line, but it matches the whole line. I just want the word preserved.
Once this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture those words of interest instead of replacing everything else? In Notepad++ search for pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
See the Online Demo
^ - Start string anchor.
.*\b - Any character other than newline zero or more times upto a word-boundary.
(- Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, match "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End string anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
After this, you may end up with empty lines, which you can swiftly remove using the built-in option. In the English UI this is Edit > Line Operations > Remove Empty Lines.
In the English UI the case option is the "Match case" checkbox. Make sure it is not ticked, so that the search stays case-insensitive.
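The capture-and-replace idea can also be tried in Python as a quick sanity check. This is a sketch under a few assumptions: re.IGNORECASE plays the role of leaving "Match case" unticked, re.MULTILINE makes ^ and $ work per line, and the last sample line is an extra one added here to show a line with nothing to keep.

```python
import re

sample = "\n".join([
    "THBANES900 and C3950 bla bla",
    "THBAN",
    "..THBANES901.. C3850 bla bla",
    "THBANMP900",
    "**..thbanes900..**",
    "nothing to keep here",  # extra line added for this example
])

pattern = re.compile(
    r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+",
    re.IGNORECASE | re.MULTILINE,
)

# Groups that did not take part in the match are substituted as
# empty strings (Python 3.5 and later).
replaced = pattern.sub(r"\1\2", sample)

# Drop the empty lines left behind by lines with nothing to keep.
kept = [line for line in replaced.split("\n") if line]
print(kept)
```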
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
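Python's re module has neither branch reset groups nor conditional replacement strings, so a port has to approximate both. One possible sketch, assuming two ordinary capture groups and a replacement callback (the function name keep_word is made up for the example):

```python
import re

# Two ordinary groups stand in for the branch reset group.
pattern = re.compile(r"\b(thban\S{0,5})|\s(C3\w+)|(?s:.)", re.IGNORECASE)

def keep_word(match: re.Match) -> str:
    # Emulates (?1$1\n:): emit the captured word plus a newline,
    # or nothing when only the catch-all (?s:.) matched.
    word = match.group(1) or match.group(2)
    return word + "\n" if word else ""

sample = "THBANES900 and C3950 bla bla\n..THBANES901.. C3850 bla bla"
result = pattern.sub(keep_word, sample)
print(result)
```

Note that this approach puts every kept word on its own line, which is what the pattern above does as well.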
I'm missing something with this regular expression find/replace attempt. I have the following format:
word | word | word
I would like to first replace every word with "word" to produce
"word" | "word" | "word"
and then subsequently every [space]| with ,, finally producing
"word", "word", "word"
Obviously I could just do this with two simple find(f)/replace(r) commands ( f:([a-z]*\>)r:"$1"; f:[space]|r:,), but is there a way to do all of this at once?
I've tried lots of different ideas, but they all failed. The most successful was finding ([a-z]*\>)(( \|)|\R) and replacing with "$1",, which only ever got me a "word", "word", word format. The solution is probably either much more complicated or much simpler than I'm trying, but I'm stumped. Thanks!
You may use
(\w+)|\s*\|
and replace with (?1"$1":,).
Details
(\w+) - Group 1: one or more word chars
| - or
\s*\| - 0+ whitespaces and then a | char.
(?1"$1":,) - a conditional replacement pattern that replaces with " + Group 1 contents + " if Group 1 matches, else, replaces with ,.
I'm trying to write a regex to match the following at the beginning of a new line
- a number followed by a parenthesis, e.g. 2) or 8)
- a number followed by a period, e.g. 5.
- the character '-'
- the character '*'
the following strings should match
"1. Sorting function. If you have a long checklist it's very difficult."
"5) This is another example"
"-this is yet another one"
"* last item in the list"
I have tried this but it doesn't quite get me what I am looking for.
re.findall(r'(?m)\s*^[-*(\d.)(\d\))]',item)
Try
re.findall(r'^\s*(\d+(\)|\.)|-|\*)', item, re.MULTILINE)
It will match all sequences of numbers followed by a closing parenthesis or period as well as dashes and stars at the beginning of the line.
Example: https://regex101.com/r/cR2lZ5/6
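One thing to be aware of: because the pattern contains two capture groups, re.findall returns a tuple per match rather than the matched text. A small sketch of that behaviour, plus a variant with a non-capturing group that simply tests each string (the variant pattern and names are assumptions, not part of the answer above):

```python
import re

# With two capture groups, findall yields one tuple per match.
tuples = re.findall(r"^\s*(\d+(\)|\.)|-|\*)", "1. a\n- b", re.MULTILINE)

# Variant: non-capturing group, used with match() on separate strings.
marker = re.compile(r"^\s*(?:\d+[.)]|[-*])")

items = [
    "1. Sorting function. If you have a long checklist it's very difficult.",
    "5) This is another example",
    "-this is yet another one",
    "* last item in the list",
    "plain text without a marker",  # extra negative case for the example
]
flags = [bool(marker.match(item)) for item in items]
print(tuples, flags)
```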
Assuming that your quote marks " are not included, and that each line is a separate string,
^\d\.|^\d\)|^\-|^\*
Would be the regular expression. | is OR, \d is a digit, and you escape the special characters ".", ")", "-", and "*" by putting a backslash in front of them.
You can test your regular expressions here. Good luck!
I have a big text file with addresses, and I want to split the data into 3 variables. Example:
NM_LOGRADO
Street BLA BLA BLA 340
Av BLE BLI 318
Road BLI 48 Block 4
I want to transform it into:
NM_LOGRADO
Street(TAB)BLA BLA BLA(TAB)340
Av(TAB)BLE BLI(TAB)318
Road(TAB)BLI(TAB)48 Block 4
Basically, replace the first space, and the last space before the first number, with a tab.
I'm using Notepad++, and for the second replacement I tried replacing ' (?=[0-9])(?<=)' with '(TAB)', but it replaced all spaces before numbers (in the third line I got Road(TAB)BLI(TAB)48 Block(TAB)4). For the first replacement I have no idea :(
Go to Search > Replace menu (shortcut CTRL+H) and do the following:
Find what:
(?:^.+?\K | (?=[0-9]+.+))
Replace:
\t
Select radio button "Regular Expression"
Then press Replace All
You can test it with your example at regex101.
Update1:
Based on your updated sample, try this:
Find:
^([^ ]+) ([^0-9]+) (.+)
Replace:
$1\t$2\t$3
Test it at regex101.
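The three-group pattern from Update1 also works in Python, which makes it easy to check against the sample rows. A minimal sketch, applying the pattern to each row separately so the negated character classes cannot run across line breaks (the list and variable names are illustrative):

```python
import re

rows = [
    "NM_LOGRADO",
    "Street BLA BLA BLA 340",
    "Av BLE BLI 318",
    "Road BLI 48 Block 4",
]

# Group 1: first word, group 2: everything up to the first digit,
# group 3: the rest, starting at the first digit.
pattern = re.compile(r"^([^ ]+) ([^0-9]+) (.+)")

tabbed = [pattern.sub(r"\1\t\2\t\3", row) for row in rows]
print(tabbed)
```

The header row contains no space, so the pattern does not match it and it is left unchanged.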
Update2:
Based on your updated sample, try this:
Find:
(?:^[^ ]+\K |(?<!Block|Ap) (?=[0-9]))
Replace:
\t
Test it at regex101.
I'm assuming that (TAB) refers to a tab character rather than a literal string.
Find what: ^(\w*) ((([A-Z]{3})( )?)+) (\d.*)$
Replace with: \1\t\2\t\6
(If my assumption was incorrect, replace \t with \(TAB\))
The key is the optional space: ( )?. Through backtracking, it leaves the leading and trailing spaces uncaptured, so they are the ones replaced by the tab characters.
Explanation of regular expressions:
^ Beginning of line
(\w*) Any number of alphanumeric characters, i.e. "Street", "Av", "Road"
((([A-Z]{3})( )?)+) 3 uppercase letters, followed by an optional space, once or more, i.e. "BLA BLA BLA", "BLE BLI", "BLI"
(\d.*) A digit, followed by any number of any characters, i.e. "340", "318", "48 Block 4"
$ End of line
\1 First capture group, "(\w*)"
\t Tab character
\2 Second capture group, "((([A-Z]{3})( )?)+)"
\t Tab character
\6 Sixth capture group, "(\d.*)"
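To see which groups end up holding what, the pattern can be run in Python. This is just a sketch on two of the sample rows; it relies on the pattern's own assumption that the middle part consists of 3-letter uppercase words.

```python
import re

pattern = re.compile(r"^(\w*) ((([A-Z]{3})( )?)+) (\d.*)$")

# Inspect groups 1, 2 and 6 for one row.
m = pattern.match("Road BLI 48 Block 4")
groups = (m.group(1), m.group(2), m.group(6))

# ( )? first takes the space after the last 3-letter word, then gives
# it back through backtracking so the literal space before (\d.*)
# can match.
row = pattern.sub(r"\1\t\2\t\6", "Street BLA BLA BLA 340")
print(groups, repr(row))
```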
As you're using Notepad++, the easiest way is not to bother with regex but rather to use a macro. Simply record one and play it until the end of the file. You'll want to:
put your cursor at the first character of the file
Macros > Start Recording
find a space and convert it to tab (this will replace the first space of the row)
press END to go to the end of the line
use "find previous" command to find the last space of the line
replace that space with tab
go to the next line
Macros > Stop Recording
Run your macro till the end of the file
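What the recorded steps do to each line can be sketched in Python as follows. This mirrors the macro literally (first space and last space of the line), which is an assumption about how the steps are meant to be read; the function name is hypothetical.

```python
def first_and_last_space_to_tab(line: str) -> str:
    # Step 1: replace the first space on the line with a tab.
    line = line.replace(" ", "\t", 1)
    # Step 2: replace the last remaining space, if there is one.
    head, sep, tail = line.rpartition(" ")
    return head + "\t" + tail if sep else line

rows = ["NM_LOGRADO", "Street BLA BLA BLA 340", "Road BLI 48 Block 4"]
converted_rows = [first_and_last_space_to_tab(row) for row in rows]
print(converted_rows)
```

For a row such as "Road BLI 48 Block 4" the last space is the one before the final 4, so the result differs from the layout asked for in the question; the macro approach fits rows where the number is the last field.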