How can I remove everything before the first ":" in Notepad++?

I have a file like this in Notepad++:
n1:n1:n1
n1:n1:n2
n1:n1:n3
I want to delete everything before the first ":", including the ":" itself,
so that it looks like this:
n1:n1
n1:n2
n1:n3
Thanks. I hope my explanation of the problem was clear enough.
Ken White:
Thanks, but the problem is that my file has over 10k lines, and the first "n1" changes to "n2" after about 1000 lines,
and then it becomes "o1" instead of "n1".
I want to delete everything before the first ":".

Use Replace with a regular expression that finds any chars at the start of the line that are not a colon, followed by a colon, and replaces them with nothing:
Find what: ^([^:]+:)(.)
Replace with: \2
Search Mode: Regular Expression
This actually answers your question and doesn't assume anything about what is before or after the first colon.
1. The first ^ indicates that the search must start at the beginning of a line
2. Parentheses are groupers and savers. They're not actually needed for this first bit, since you are just deleting the stuff before the colon, but this makes it parallel with Ken White's solution
3. Square brackets [ ] indicate which characters you want to look for
   a. The second ^ right after the first square bracket switches from chars you want to look for to chars you do not want to look for
   b. So [^:] means look for any char other than a colon
4. The plus + means look for 1 or more occurrences of this set of chars
   a. If some lines may start with a colon, and you still want to replace that colon, you'd want to look for 0 or more occurrences of non-colon chars at the start of a line
   b. To do that, replace the + with a *
5. Select the colon (so it will be deleted also)
6. Right Paren ends the first group
7. Left Paren starts the 2nd group
8. Dot . says look for any char. If you don't have this here, then it will delete everything before the first colon and then next set will be at the start of the line, so you'll delete too much. You could technically put a plus or star here, but you don't need it.
9. Right Paren ends the 2nd group
10. In the Replace with box, \2 (that's a backslash or reverse solidus if you prefer) will take the contents of the 2nd group and replace everything it found with those contents
Here is the test input and output:
Input (stuck some tabs and spaces and other stuff in there for good measure)
n1:n1:n1
n1:n1:n2
n1:n1:n3
n2:n1:n3
n4:n7:n5
o1:n1:n1:m1:m1:l1:l7b:l1011
z99:
-- Here's some more data
o1:o2:o3:o4:o5
:o2:o3:o4:o5:o6
o1:o1:o3:x37:n99
n2:o1:o3:o44:z76
n4:n7:n5:u72:j9:
Output
n1:n1
n1:n2
n1:n3
n1:n3
n7:n5
n1:n1:m1:m1:l1:l7b:l1011
z99:
o2:o3:o4:o5
:o2:o3:o4:o5:o6
o1:o3:x37:n99
o1:o3:o44:z76
n7:n5:u72:j9:
Notice it removed the line without a colon, which in some cases may be preferable. That happens because [^:] also matches line breaks, so the match starts on the colon-less line and runs on into the next one. It also missed the two lines I threw in there with a colon at the beginning or end of the line.
If you want to keep the lines without a colon, add \r\n inside the brackets in step 3 above (again, these are backslashes). The pattern then looks for any char that's not a colon or end-of-line (step 3), followed by a colon (step 5), so it only removes chars on a line that contains a colon. Change Find what to this string:
Find what: ^([^:\r\n]+):(.)
To catch the lines starting with a colon or with nothing after the first colon, change the plus to a star and add a question mark after the dot:
Find what: ^([^:\r\n]*):(.?)
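For anyone who wants to check the difference between the first and the last pattern outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module treats these simple constructs the same way as Notepad++'s regex engine, which I have not verified in every detail; the function names and the sample text are made up for illustration.

```python
import re

# First pattern from the answer: [^:] also matches line breaks, so a match
# can start on a colon-less line and run on into the following line.
ORIGINAL = re.compile(r"^([^:]+:)(.)", re.MULTILINE)

# Last pattern from the answer: never crosses a line break, and also handles
# a leading colon or a line with nothing after its first colon.
FIXED = re.compile(r"^([^:\r\n]*):(.?)", re.MULTILINE)


def strip_original(text: str) -> str:
    """Apply the first pattern with replacement \\2 (hypothetical helper)."""
    return ORIGINAL.sub(r"\2", text)


def strip_fixed(text: str) -> str:
    """Apply the last pattern with replacement \\2 (hypothetical helper)."""
    return FIXED.sub(r"\2", text)


if __name__ == "__main__":
    sample = "n1:n1:n3\nz99:\n-- no colon here\no1:o2:o3\n:o2:o3"
    print(strip_original(sample))
    print("---")
    print(strip_fixed(sample))
```

With the first pattern the colon-less line disappears and the lines with a leading or trailing colon are left alone; with the last pattern the colon-less line survives and every line that has a colon loses its prefix.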

Related

Regex in Notepad++ to add something after first character

So I have example lines like these:
JJmartin
It needs to become:
J.Jmartin
I need a regex that inserts a . after the first character in each line.
I tried ^. and replaced it with . but that regex deletes the first character and replaces it with .
I also had the idea of deleting everything after the first character and then putting it together again with a program I have. Since I have a regex that deletes everything but the last character, I tried to tweak it, but it didn't work. That regex is:
.*([A-Za-z\d]) replace with \1
Find ^(.{1}) That is, from the beginning of each line, capture a single character.
Replace \1\. That is, the captured character followed by a dot. The dot only has a special meaning in the Find pattern; in the Replace field it is literal, so the backslash in front of it is harmless but optional.
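The same capture-and-reinsert idea can be tried in Python. This is only a sketch: it uses Python's re module rather than Notepad++, and the function name is made up.

```python
import re


def dot_after_first_char(text: str) -> str:
    """Insert a dot after the first character of every non-empty line."""
    # ^(.) captures the first character of each line (re.MULTILINE makes ^
    # match at every line start); \1. writes it back followed by a literal dot.
    return re.sub(r"^(.)", r"\1.", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(dot_after_first_char("JJmartin"))
```

Empty lines are left untouched because . does not match a line break.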

In lines starting with specific word followed by words separated by semicolon, replace semicolon with a comma and wrap the words in double quotes

I'm trying to change certain lines in my file using Notepad++. I have very little knowledge of regular expressions, hence I'm seeking help.
Any kind of help is appreciated.
Find all the lines that look like See ABC'D EFG;IJKL;FOO;BAR;XXXX and so on.
Lines that start with the word "See"
After that, there are words all in capital letters and separated by semicolons
Words can have special characters
a) space
b) ' (apostrophe)
c) , (comma)
d) - (hyphen)
Ends with a full stop .
And replace those lines with:
See:["ABC'D EFG","IJKL","FOO","BAR",....]
Let's say the number of semicolons is variable. You need to proceed in two passes.
Use Replace All for both passes:
find: ^See \K([A-Z ,;'-]+)\.
replace: ["$1"]
and then:
find: (?:\G(?!^)|^See \["(?=[^"]*"]))[^";]*\K;
replace: ", "
The first pass is easy to understand: it only finds the corresponding lines, removes the final dot and encloses the part made of uppercase letters, commas, spaces, semicolons, apostrophes and hyphens between double quotes and square brackets.
The second pass needs to replace only the semicolons inside quotes and square brackets, on lines that start with See. To do that I used the second branch ^See \["(?=[^"]*"]) to reach the interesting lines and the \G anchor in the first branch to ensure that the next matches are contiguous to the first. Since [^";]* excludes the double quote, once the last semicolon is reached, the first branch can no longer succeed and the contiguity is broken.
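If the file can be processed outside Notepad++, the same transformation can be sketched in Python. Note that this is not the two-pass technique above: Python's re module supports neither \K nor \G, so the sketch swaps in a single pass with a replacement function that splits on semicolons. It also assumes the target format from the question (See: followed by the bracketed list, no space after the commas) and that the full stop ends the line; the names are made up.

```python
import re

# A line that starts with "See ", continues with uppercase words (which may
# contain space, comma, apostrophe, hyphen) separated by semicolons, and
# ends with a full stop.
SEE_LINE = re.compile(r"^See ([A-Z ,;'-]+)\.$", re.MULTILINE)


def convert_see_lines(text: str) -> str:
    """Rewrite matching lines as See:["A","B",...] (hypothetical helper)."""

    def build(match: re.Match) -> str:
        words = match.group(1).split(";")
        quoted = ",".join('"' + word + '"' for word in words)
        return "See:[" + quoted + "]"

    return SEE_LINE.sub(build, text)


if __name__ == "__main__":
    print(convert_see_lines("See ABC'D EFG;IJKL;FOO;BAR."))
```

Lines that do not fit the pattern (for example lowercase words, or a line starting with "Seen") are returned unchanged.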
Use \W which matches any non-word character
Example https://regex101.com/r/lFANF0/4
Find See\s([A-Z' ]+)\W(\w+)\. and Replace See:["$1","$2"]
1st group (\w+\'\w+\s+): \w+ matches any word character (equal to [a-zA-Z0-9_])
+ matches between one and unlimited times
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
2nd group (\w+\W*\w+): \W* matches any non-word character (equal to [^a-zA-Z0-9_])
Find what: See ([A-Z'\-, ]+)\;([A-Z'\-, ]+)\.
Replace with: See:["\1", "\2"]
See https://regex101.com/r/bfJkN6/3
Also tested in my Notepad++; got See:["ABC'D EFG", "IJKL"]
I updated the regex to catch multiple hits (https://regex101.com/r/bfJkN6/5):
See ((([A-Z'\-, ]+)\;)+)([A-Z'\-, ]+)\.

Understanding regex in shell

I came across the single grouping concept in shell scripts.
cat employee.txt
101,John Doe,CEO
I was practising the sed substitute command and came across the example below.
sed 's/\([^,]*\).*/\1/g' employee.txt
It was stated that the above expression matches the string up to the 1st comma.
I am unable to understand how this matches up to the 1st comma.
Below is my understanding:
s - substitute command
/ delimiter
\ escape character for (
( opening braces for grouping
^ beginning of the line - anchor
[^,] - I am confused by this. Is it the negation of a comma, or does it mean something else?
Why are * and then .* used to match the string up to the 1st comma?
^ matches beginning of line outside of a character class []. At the beginning of a character class, it means negation.
So, it says: non-comma ([^,]) repeated zero or more times (*) followed by anything (.*). The matching part of the string is replaced by the part before the comma, so it removes everything from the first comma onward.
I know 'link only' answers are to be avoided - Choroba has correctly pointed out that this is:
non-comma ([^,]) repeated zero or more times (*) followed by anything (.*). The matching part of the string is replaced by the part before the comma, so it removes everything from the first comma onward.
However I'd like to add that for this sort of thing, I find regulex quite a useful tool for visualising what's going on with a regular expression.
The image representation of your regular expression is: [image not included]
Given the string "foo, bar" and s/\([^,]*\).*/\1/g, the part \([^,]*\) means "match any character that is not a comma" (zero or more times). Since "f" is not a comma, it matches "f" and "remembers" it. Because it is "zero or more times", it tries again. The next character is not a comma either (it is o), so the regex engine adds that o to the group as well. The same thing happens for the 2nd o.
The next character is indeed a comma, but [^,] forbids it, as @choroba affirmed. What is in the group now is "foo". Then, the regex uses .* outside the group, which causes zero or more characters to be matched but not remembered.
In the replacement part of the regex, \1 is used to place the contents of the remembered text ("foo"). The rest of the matched text is lost and that is how you remain with only the text up to the first comma.
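The walk-through above can be reproduced with Python's re module. This is only a sketch of the same substitution, not sed itself: the escaped \( \) of sed's basic syntax become plain ( ) in Python, and the function name is made up.

```python
import re


def up_to_first_comma(line: str) -> str:
    """Keep only the text before the first comma of a single line."""
    # ([^,]*) remembers the leading run of non-comma characters, .* swallows
    # the rest of the line, and \1 puts back only the remembered part.
    # count=1 because the pattern already consumes the whole line in one match.
    return re.sub(r"([^,]*).*", r"\1", line, count=1)


if __name__ == "__main__":
    print(up_to_first_comma("101,John Doe,CEO"))
    print(up_to_first_comma("foo, bar"))
```

A line without any comma comes back unchanged, because the group then captures the whole line.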

regex to capture the second space in a line?

I've got the following text:
get close to,be next to,follow in sequence or along designated route/
direction
How would I match the space after the "e" in close and then replace it with a tab?
Although this may be easy to all of you, I've spent 4 hours trying to do this but haven't been successful.
The general rule would be "match only the space after the second word". I've got over 2000 unique lines which is why I need a regex.
Thank you!!
Search for: ^([^ ]+[ ]+[^ ]+)[ ]
Replace with: \1\t
From the beginning of the line, look for a pattern of non-spaces followed by spaces followed by another set of non-spaces. At the end, match a space character.
The replacement is everything leading up to the final space character, followed by a tab.
Demo: http://regex101.com/r/oS8vV7/1
You may not be able to match only the space you're replacing. That would require a lookbehind of variable length which isn't supported.
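A minimal Python sketch of the same capture-and-replace approach, in case the file is processed with a script instead of an editor. The pattern is the one from the answer; the sketch applies it one line at a time, because [^ ] would otherwise also match a line break and let a match start on one line and end on the next. The function name is made up.

```python
import re

# Non-spaces, spaces, non-spaces (captured), then the single space to replace.
SECOND_SPACE = re.compile(r"^([^ ]+[ ]+[^ ]+)[ ]")


def tab_after_second_word(text: str) -> str:
    """Replace the space after the second word of each line with a tab."""
    lines = text.split("\n")
    # Without re.MULTILINE, ^ only matches at the start of each separate line.
    return "\n".join(SECOND_SPACE.sub(r"\1\t", line) for line in lines)


if __name__ == "__main__":
    print(tab_after_second_word("get close to,be next to\ndirection"))
```

Lines with fewer than two spaces are returned as they are.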
In the vi editor you can use this search/replacement:
:s/^\([[:blank:]]*\w\+[[:blank:]]\+\w\+\)[[:blank:]]\+/\1\t/
Or shorter (thanks to @PeterRincker):
:%s/\v(\w+\zs\s+){2}/\t
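s/\s//2
In sed, this searches for a whitespace character (\s) and removes it; the trailing 2 makes the substitution apply only to the second match on each line. To turn that space into a tab instead of removing it, s/\s/\t/2 should work with GNU sed.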

How can I delete everything that is not matched by a pattern

I have a text document in Notepad++ with information separated by line, but want to delete everything but every seventh line. This line is always matched by the pattern (\d{4} :.*?\r\n).
How can I delete everything that does not match this pattern so that I just get every seventh line separated by \r\n?
You could maybe try:
^(?!\d{4} :)[\s\S]*?(?=\r\n\d{4} :)
regex101 demo
[Note, I couldn't put \r in there because I couldn't insert carriage returns in the input box somehow...]
^ is a beginning of line anchor and matches the beginning of a line.
(?!\d{4} :) is a negative lookahead and will make the whole regex match only if there's no \d{4} : at the beginning of the line (the position being indicated by ^).
[\s\S]*? is a character class that will match any and all characters. The quantifier is lazy, which causes matching to stop as soon as possible (this is determined by what follows).
(?=\r\n\d{4} :) is a positive lookahead, and matches only when there's a \r\n\d{4} : ahead.
If I understood your question well, this would be what you're looking for. All lines except every 7th line get deleted, and only one empty line is left behind between each of those 7th lines.
Open the search dialogue and select the Mark tab. In the Find what field enter a search string to find the lines to be kept. Make sure that Bookmark line and Regular expression are selected, then click Mark all. Next visit the menu => Search => Bookmark => Remove unmarked lines.
The question says the lines to be retained match (\d{4} :.*?\r\n). The capture brackets ( and ) are not needed, as the capture is not used. Searches for \r\n may often be rewritten as searches for $, i.e. an end-of-line. Your search pattern is just looking for the first end-of-line after the earlier items, so the search may be reduced to \d{4} :.
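The "keep the marked lines, drop the rest" idea is also easy to sketch as a small Python filter, for cases where the file is handled outside Notepad++. The sketch assumes the four digits sit at the very start of the line, as the lookahead-based answer does; the function name and sample text are made up.

```python
import re

# Lines to keep: four digits, a space and a colon at the start of the line.
KEEP = re.compile(r"\d{4} :")


def keep_matching_lines(text: str) -> str:
    """Return only the lines that start with the pattern, endings preserved."""
    lines = text.splitlines(keepends=True)
    # re.match anchors at the start of each line, so no explicit ^ is needed.
    return "".join(line for line in lines if KEEP.match(line))


if __name__ == "__main__":
    sample = "header\r\n1234 : data\r\nfoo\r\n5678 : more\r\n"
    print(keep_matching_lines(sample), end="")
```

Unlike the lookahead regex, this leaves no empty lines between the kept lines.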