Regex (Notepad++) - Locate and replace second occurrence of a character per line


I would like to change the second forward slash, within each line, to a comma.
I have found various posts and managed to derive a way of doing it from them but it's not doing it how I want.
Initial attempt - I thought I needed to replace between 2 delimiters
1st "Replace 2nd occurrence" - Found this post which seemed easier.
2nd "Replace 2nd occurrence" - Used the regex in here as a base for mine.
What I am doing is:
Find:
^(.*?)\/(.*?)\/
Replace:
$&,
Which results in changing my data from:
042146/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:56,V0700LWHSB
042146/OVERNIGHT/HSSC825571,ended,14/07/2016,00:00:56,
042147/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:58,V0700LWHSB
042147/OVERNIGHT/HSSC825571,ended,14/07/2016,00:00:58,
To:
042146/OVERNIGHT/,HSSC825571,started,14/07/2016,00:00:56,V0700LWHSB
042146/OVERNIGHT/,HSSC825571,ended,14/07/2016,00:00:56,
042147/OVERNIGHT/,HSSC825571,started,14/07/2016,00:00:58,V0700LWHSB
042147/OVERNIGHT/,HSSC825571,ended,14/07/2016,00:00:58,
Is there a way of just replacing the second /?
An example set of my data is:
042146/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:56,V0700LWHSB
042146/OVERNIGHT/HSSC825571,ended,14/07/2016,00:00:56,
042147/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:58,V0700LWHSB
042147/OVERNIGHT/HSSC825571,ended,14/07/2016,00:00:58,
042154/TEMP56/QPADEV000M,started,14/07/2016,00:01:02,V0700LRFIN
042154/TEMP56/QPADEV000M,ended,14/07/2016,00:07:12,
042155/JMALICKA/QPADEV000N,started,14/07/2016,00:01:05,V0700LRFIN
042155/JMALICKA/QPADEV000N,ended,14/07/2016,00:06:53,
042156/DG8SVCPRF/DG8SVC,started,14/07/2016,00:01:15,DATAGATE
042156/DG8SVCPRF/DG8SVC,ended,14/07/2016,00:12:01,
042157/OVERNIGHT/RCPTDISCRP,started,14/07/2016,00:01:42,V0700LBATC
042157/OVERNIGHT/RCPTDISCRP,ended,14/07/2016,00:01:44,
042158/QTCP/QTSMTPCLTP,started,14/07/2016,00:01:53,QSYSWRK
042158/QTCP/QTSMTPCLTP,ended,14/07/2016,01:29:08,
042159/QTCP/QTSMTPCLTP,started,14/07/2016,00:01:53,QSYSWRK
042159/QTCP/QTSMTPCLTP,ended,14/07/2016,00:19:05,

Ctrl+H
Find what: ^([^/]+/[^/]+)/
Replace with: $1,
Replace all
This will replace the second slash of each line by a comma.
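For checking the pattern outside Notepad++, here is a minimal sketch that uses Python's `re` module as a stand-in for the editor's regex engine. The one syntax change is that Python writes the backreference as `\1` where Notepad++ uses `$1`; `re.MULTILINE` is assumed so that `^` matches at every line start, as it does in the editor.

```python
import re

# A few sample lines from the question.
data = """042146/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:56,V0700LWHSB
042154/TEMP56/QPADEV000M,ended,14/07/2016,00:07:12,
042158/QTCP/QTSMTPCLTP,started,14/07/2016,00:01:53,QSYSWRK"""

# [^/]+ cannot cross a slash, so group 1 is exactly "field1/field2"
# and the / that follows it is the second slash on the line.
pattern = re.compile(r"^([^/]+/[^/]+)/", re.MULTILINE)

# Keep group 1, replace the second slash with a comma.
result = pattern.sub(r"\1,", data)
print(result)
```

The date slashes later on each line are untouched, because the match is anchored at the line start and stops after the second slash.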

You were almost there. You only need to change your replace string with the following:
$1/$2,
How it works
Your regex was: ^(.*?)\/(.*?)\/
In Notepad++'s replace string, the dollar sign is used to refer to groups enclosed by parentheses in the regex.
$1 refers to the first group, (.*?) which is at the beginning of the line, as specified with the ^ character.
$2 refers to the second group, also (.*?), but which follows the first /.
Since you don't want to replace the first slash, you need $1/$2 at the beginning of your replace string. But since what follows the second group is another / (the 2nd one on the line), you need to replace it with the ,. That's why the replace string has to be $1/$2,. Notice that all characters that are not enclosed by ()'s need to be re-written in the replace string. Otherwise, they're just omitted (try replace string $1$2 and you'll see what I mean).
In other editors or programming languages, instead of the $ sign, the \ is sometimes used (sometimes doubled) to refer to parenthetic groups. So you could have for instance \\1/\\2, or \1/\2, as a replace string instead of $1/$2,.
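The difference between the question's `$&,` and the corrected `$1/$2,` can be reproduced in a short Python sketch. Python's `re` spells `$&` as `\g<0>` and `$1` as `\1`, so the replacement strings below are translations of the Notepad++ ones, not the originals.

```python
import re

line = "042146/OVERNIGHT/HSSC825571,started,14/07/2016,00:00:56,V0700LWHSB"
pattern = r"^(.*?)/(.*?)/"

# \g<0> (Notepad++ $&) re-emits the whole match, second slash included,
# so the comma is merely appended after it.
whole_match = re.sub(pattern, r"\g<0>,", line)

# Rebuilding the match from its groups lets the second / become a comma.
groups_fixed = re.sub(pattern, r"\1/\2,", line)

# Omitting the literal / between the groups drops the first slash too.
groups_only = re.sub(pattern, r"\1\2", line)

print(whole_match)   # 042146/OVERNIGHT/,HSSC825571,...
print(groups_fixed)  # 042146/OVERNIGHT,HSSC825571,...
print(groups_only)   # 042146OVERNIGHTHSSC825571,...
```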

Related

extracting text between nth and n+1th occurrence of a character using sed

I'd like to know how to take the following string
/text1/text2/text3/wanted_text/text5/text6
and get the wanted text, based solely on its position between the 4th and 5th /?

A substitution command is enough (I've obviously assumed that the interesting part is between the 4th and 5th / as you said):
echo your_text | sed -E 's!(/[^/]+){3}/([^/]+).*!\2!'
where I've used ! as separator for the parts of the substitution command, in order to avoid having to escape every /.
More in detail:
s!…!…! is the search-and-substitute command, where you put the search pattern in the first … and the replacement in the second …;
the search pattern is (/[^/]+){3}/([^/]+).* and matches 3 occurrences of a / followed by 1 or more non-/, then a /, then 1 or more non-/, then the rest of the line; the (…) are for grouping a part of a regex such that you can apply quantifiers (like {3}) to the whole group (just like in (/[^/]+){3}), and for capturing the matching text to allow you to refer to it in the replacement; in this case, the third of the 3 texts matching (/[^/]+){3} is referred to via \1, whereas the text matched by ([^/]+) is referred to via \2;
the replacement is simply \2 (see previous point).
For more details about how the search pattern works, and to understand all of its parts, you can refer to this demo on regex101.
(-E is a non-POSIX-compliant option that makes the script more readable. Without it, you have to prepend \ to each of (, ), {, } and +.)
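For comparison, a rough Python equivalent of the sed command (an assumption: Python's `re` stands in for sed's extended regexes; the pattern and the `\2` replacement carry over unchanged). It also shows that a repeated group such as `(/[^/]+){3}` keeps only its last iteration, which is what `\1` refers to.

```python
import re

text = "/text1/text2/text3/wanted_text/text5/text6"

# Same pattern as the sed script: three "/non-slash-run" chunks, a /,
# the wanted field as group 2, then the rest of the line.
pattern = r"(/[^/]+){3}/([^/]+).*"

# Replacing the whole match with group 2 leaves only the wanted field.
wanted = re.sub(pattern, r"\2", text)

# A quantified group remembers only its final repetition.
last_repeat = re.search(pattern, text).group(1)

print(wanted)       # wanted_text
print(last_repeat)  # /text3
```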

Regular Expression (notepad++) insert, not replace

In a regular expression (Notepad++), I want to search for ( )|(:)|(_)|(\.) and insert \ before each match, i.e. before a blank space, colon, underscore or ".".
Search example: abcd:1234 jiod.8ufd_adfd
Result: abcd\:1234\ jiod\.8ufd\_adfd
Briefly, how can I refer to what was found in the replace expression?
Note that I can't just use \1, \2, \3 or \4 here: I need to include whatever was found, and there is no way to know which group matched, is there?

You can use a single character class (instead of an alternation with capturing groups) to match one of the listed characters.
In the replacement use $& to refer to the matched text and prepend a backslash.
Match
[:\h._]
Replace with
\\$&
The character class matches either a colon, horizontal whitespace char, dot or underscore.
Regex demo

There's no such thing as insert, because if you think about it, inserting is just replacing the original with a new string that contains the old text as well.
Try this instead: search for ([ :_.]) (your original regex is pointlessly complicated) and replace with \\$1 (i.e., a backslash followed by the original text).
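The same escaping can be sketched in Python. This assumes Python's `re` module, which has no `\h` class, so space and tab are listed explicitly; `\g<0>` plays the role of Notepad++'s `$&`, and the leading `\\` in the replacement produces one literal backslash.

```python
import re

text = "abcd:1234 jiod.8ufd_adfd"

# One character class instead of four alternated groups;
# \\ emits a literal backslash, \g<0> re-emits the matched character.
escaped = re.sub(r"[ \t:._]", r"\\\g<0>", text)
print(escaped)  # abcd\:1234\ jiod\.8ufd\_adfd
```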

remove all commas between quotes with a vim regex

I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g

My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend your replacement with \= to get the replacement treated as a Vim expression. From here you can extract just the parts between quotes and then manipulate the matched part separately. This requires two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
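Vim's `\=` expression replacement corresponds roughly to passing a function as the replacement to Python's `re.sub`. A minimal sketch on the question's sample line (the function name `strip_commas` is only illustrative):

```python
import re

line = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200'

def strip_commas(match):
    # Like Vim's submatch(0): the full text of the quoted region.
    return match.group(0).replace(",", "")

# ".*?" is the non-greedy equivalent of Vim's ".\{-}".
cleaned = re.sub(r'".*?"', strip_commas, line)
print(cleaned)  # 57,13,"Bob Bill and Susan",Student,Club,Funded,64,3200
```

Only commas inside a quoted region are touched; the field separators outside the quotes survive.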
As for the other parts of your question: Step 1 is essentially trying to remove all carriage returns, since the file is actually in DOS format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. All you need to do is match the quotes and delete them:
:%s/"//g

:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "this is , an, example"
\("\w*\) matches the opening " and the word characters directly after it; this is group \1 for back reference
\(,\) captures the comma as group \2
\(.*"\) matches every other character up to the last quote on the line; this is group \3 for back reference
:\1\3: keeps only groups 1 and 3, discarding group \2 (the comma) from the result
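A quick check of this single-regex approach, translated to Python syntax (Vim's `:s///g` and Python's `re.sub` both replace non-overlapping matches), suggests two limitations: because `.*"` runs to the last quote on the line, at most one comma per line is removed per run, and that comma must sit directly after the first word inside the quotes. The two-comma line below is a hypothetical variant of the question's data.

```python
import re

# The answer's pattern and replacement, in Python syntax.
pattern = r'("\w*)(,)(.*")'
replacement = r"\1\3"

# From the question: one comma, directly after the first quoted word.
question_line = '57,13,"Bob, Bill and Susan",Student'
# Hypothetical variant with two commas inside the quotes.
two_commas = '57,13,"Bob, Bill, Susan",Student'
# The answer's own example: no comma directly follows a word after the quote.
answer_example = '"this is , an, example"'

print(re.sub(pattern, replacement, question_line))   # 57,13,"Bob Bill and Susan",Student
print(re.sub(pattern, replacement, two_commas))      # 57,13,"Bob Bill, Susan",Student
print(re.sub(pattern, replacement, answer_example))  # unchanged: no match
```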


Regex - Renaming files with a prefix/suffix

Command below is from this answer: https://stackoverflow.com/a/208220/3470343
The command adds the word new. as a prefix. That's understood.
rename 's/(.*)$/new.$1/' original.filename
However, I'd like to ask why the open and close brackets are required here:
(.*)
And also why is $1 the variable which stores the original file name, and why can't I do the same with the following (where I have replaced $1 with $2):
rename 's/(.*)$/new.$2/' original.filename
I'm still relatively new to bash, so help would be greatly appreciated.

First off, (.*)$ is what is known as a regular expression (or regex). Regular expressions are used to match text based on some rules.
For example, .* matches zero or more characters. $ matches the end of the line. Because regular expressions by default are greedy, .*$ matches the whole line (although, precisely because regexes are greedy, $ is superfluous).
However, I'd like to ask why the open and close brackets are required here: (.*)
The round brackets denote a group. Groups are used to "save" the contents of the matched text, so that you can use it later.
And also why is $1 the variable which stores the original file name, and why can't I do the same with the following (where I have replaced $1 with $2): ...
In the case of rename(1), the first group is stored in $1, the second group is stored in $2 and so on. For example, the following regular expression:
(a)(b)(c)
stores a single a into $1, a single b into $2 and so on.
There you have only one group, therefore you must use $1. Tokens like $2, $3, ... will be empty.
Last but not least, you could use a shorter, equivalent command:
rename 's/^/new./' original.filename
Here ^ denotes the start of the string.
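To see the group numbering from this answer in action, here is a small Python analogue. It assumes Python's `re.sub` as a stand-in for the Perl `s///` expression that `rename` evaluates; two differences are noted in the comments: `count=1` is needed to mirror Perl's single substitution, and Python raises an error for a missing group where Perl substitutes an empty string.

```python
import re

name = "original.filename"

# Perl's s/(.*)$/new.$1/ substitutes once; count=1 mirrors that.
# (Without it, Python 3.7+ would also substitute the empty match at the end.)
with_group = re.sub(r"(.*)$", r"new.\1", name, count=1)

# The shorter s/^/new./ needs no group at all.
with_anchor = re.sub(r"^", "new.", name)

# Each pair of parentheses creates the next numbered group.
abc_groups = re.match(r"(a)(b)(c)", "abc").groups()

# Referring to group 2 when only one group exists.
try:
    re.sub(r"(.*)$", r"new.\2", name, count=1)
    missing_group_error = None
except re.error as exc:
    missing_group_error = exc

print(with_group)           # new.original.filename
print(with_anchor)          # new.original.filename
print(abc_groups)           # ('a', 'b', 'c')
print(missing_group_error)  # Python's "invalid group reference" error
```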

Notepad++ Replace all with an exception

I am attempting to edit a csv file, below is a sample line from this file.
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The beginning of the line |MIGRATE| needs to be modified without changing the second MIGRATE so the line would read
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!

Just replace all the ones you don't want changed with another word temporarily, then replace the rest with what you want. I'm not sure what you're asking here, but from what I can guess this might help.

It seems like you could just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
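A sketch of this anchored replacement in Python, applied to a multi-line string. The question's sample line is simply repeated here to stand in for the ~7700-line file, and `re.MULTILINE` makes `^` match at every line start, which is the behaviour this answer relies on in Notepad++.

```python
import re

sample = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"
text = "\n".join([sample] * 3)  # stand-in for the real file

# Escaped pipes match literal | characters; ^ pins the match to the line start,
# so the second |MIGRATE| on each line is never touched.
pattern = re.compile(r"^\|MIGRATE\|", re.MULTILINE)
result = pattern.sub("|MIGRATE|;|MIG_IN|", text)
print(result.splitlines()[0])
```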

You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
regex101 demo
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ Will match only at the beginning of a line.
( ... ) is called a capture group, and will save the contents of the match in a variable you can use (in the first regex, I accessed the variable using $1 in the replacement). The first capture gets stored in $1, the second in $2, etc.
| is a special character meaning 'or' in regex (it matches one character or group of characters or another, e.g. a|b matches a or b). As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ), which is called a positive lookbehind. It makes sure that the part to be matched is preceded by what's inside it. For instance, (?<=a)b matches a b only if it has an a before it, so the b in ab matches, but neither b in bb does.
The website I linked also explains the details of the regex and you can try out some regex yourself!
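Both variants from this answer can be checked with Python's `re`, assumed here as a stand-in for Notepad++ (`$1` becomes `\1`). Python also accepts the fixed-width lookbehind `(?<=^\|MIGRATE\|)`, whose match is empty, so the replacement is inserted without consuming anything.

```python
import re

sample = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"
expected = "|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"

# Capture-group version: put group 1 back, then append the new field.
with_group = re.sub(r"^(\|MIGRATE\|)", r"\1;|MIG_IN|", sample, flags=re.MULTILINE)

# Lookbehind version: matches the empty position right after a leading |MIGRATE|.
with_lookbehind = re.sub(r"(?<=^\|MIGRATE\|)", ";|MIG_IN|", sample, flags=re.MULTILINE)

print(with_group == expected, with_lookbehind == expected)  # True True
```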
