Replace string with modified string - regex

I have a file with thousands of rows that look like this:
"20140611","20:19","C","IT","IT","HDR","HDPDIT","675605","000000135.97"," ..........
I am trying to replace every string that matches this pattern:
a quote, then six digits, then a closing quote (i.e. replace "675605" with "675605#").
Using EditPlus regular-expression search and replace, the search string is:
\"[0-9][0-9][0-9][0-9][0-9][0-9]\"
This finds all the occurrences I need.
However, I'm unable to construct a replacement expression that replaces each match with itself followed by the # sign, e.g. "675605#".

With sed you can have:
sed -r 's|"([0-9]{6})"|"\1#"|g' file
Add -i to modify the file in place.
So my proposed regex - replacement form is:
"([0-9]{6})" - "\1#"
Quoted:
\"([0-9]{6})\" - \"\\1#\"

Regex:
\"([0-9][0-9][0-9][0-9][0-9][0-9])\"
Replacement string:
"\1#
Replacement string would be "\1#" if you want "675605#"

You need to use capturing groups. I don't know if you can use them in EditPlus, but I think it should work:
Find what: \"(\d{6})\"
Replace with: \"\1#\"
where \1 is the number captured by the parentheses.
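For anyone who would rather script this than use an editor, here is a minimal Python sketch of the same capture-group idea. The function name tag_six_digit_fields is made up for illustration, and [0-9] is used instead of \d so that only ASCII digits match.

```python
import re

# A quoted field holding exactly six ASCII digits; the digits are group 1.
SIX_DIGITS = re.compile(r'"([0-9]{6})"')

def tag_six_digit_fields(line: str) -> str:
    """Append # inside every quoted six-digit field, leaving other fields alone."""
    return SIX_DIGITS.sub(r'"\1#"', line)

# Sample row taken from the question (truncated where the question truncates it).
row = '"20140611","20:19","C","IT","IT","HDR","HDPDIT","675605","000000135.97"'
print(tag_six_digit_fields(row))
```

Fields with more or fewer than six digits, such as the eight-digit date, are left untouched because the closing quote must follow the sixth digit immediately.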

Open your file in vim or vi with the command below:
vi "filename"
Then use this command to replace the literal "675605" with "675605#":
:%s/675605/675605#/g
Then save and quit:
Esc :wq
When you reopen the file, every "675605" will have been replaced with "675605#".

Related

Finding the string or substring within nth occurrence in a line

I would like to find the third occurrence of string inside quotes in order to replace this string or part of this string. This is a typical line I have to deal with:
"/activities/absenceactivity/lang/#LANG_ID#/prop_dialog.php";"BA2_PD_SCR";"Opis dogodka";"Event description";"Descrição do Evento";"Опис події";"";"Įvykio aprašymas";"Descripción del evento";"";
I know that "([^"]*)" shows every occurrence of text and quotes but I would like to get just the third one, in this example "Opis dogodka" in order to perform Search & Replace in Sublime Text.
The problem is to find the third occurrence of a string within quotes, replace it entirely or just partially, and make sure that the regex also handles empty "" strings.
Thank you.
I'm sure there are ways to simplify this further, but if you're ok with brute force:
Sublime command:
Find: "[^"]*";"[^"]*";"([^"]*)".*
Replace: $1
NP++:
Find what: "([^"]*)";"([^"]*)";"([^"]*)".*
Replace with: $3
sed:
sed 's/"\([^"]*\)";"\([^"]*\)";"\([^"]*\)".*/\3/'
You can use {} pattern repetition:
/(?:"([^"]*)";){3}/ # first match group will be 'Opis dogodka'
Or, use a global type match and then take the third match. This might require logic such as slicing or looping:
/"([^"]*)";/g
Or, just manually put in the first two patterns to skip:
/^"[^"]*";"[^"]*";("[^"]*";)/
Or combine repetition to skip the first n-1 then capture the nth match:
/^(?:"[^"]*";){2}("[^"]*";)/
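The "global match, then take the third one" idea can be sketched in Python as follows. This is only an illustration: the names nth_quoted and replace_nth_quoted are hypothetical, the sample line is a shortened stand-in for the one in the question, and the pattern assumes fields never contain an escaped quote.

```python
import re

# One quoted field; the text between the quotes (possibly empty) is group 1.
FIELD = re.compile(r'"([^"]*)"')

def nth_quoted(line: str, n: int):
    """Return the contents of the n-th quoted field (1-based), or None if absent."""
    matches = list(FIELD.finditer(line))
    return matches[n - 1].group(1) if len(matches) >= n else None

def replace_nth_quoted(line: str, n: int, new_value: str) -> str:
    """Replace only the n-th quoted field, keeping every other field as it was."""
    count = 0

    def swap(match):
        nonlocal count
        count += 1
        return f'"{new_value}"' if count == n else match.group(0)

    return FIELD.sub(swap, line)

# Shortened sample line (assumption: same layout as the line in the question).
line = '"/activities/prop_dialog.php";"BA2_PD_SCR";"Opis dogodka";"Event description";"";'
print(nth_quoted(line, 3))
print(replace_nth_quoted(line, 3, "Opis"))
```

Because [^"]* also matches nothing, empty "" fields are counted like any other field, which covers the empty-string case raised in the question.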

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, to make it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match: \|[^|]+\|[^|]+(\|[^|]+$)
Replace: $1
Basically, anchor to the end of the line and remove columns [-3] and [-2], keeping the last one. (I assume columns can't be empty; replace + with * if they can.)
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click Replace and you're done.
Regex explanation:
\|\d+ : matches the first field, a | followed by digits
\|\d+ : matches the second field, a | followed by digits
(\|[0-9a-z]+) : matches and captures the field after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the text captured by the parenthesized group (\|[0-9a-z]+)
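Since the question is open to other tools, here is a rough Python sketch that avoids regex entirely by splitting each line from the right. Counting from the end sidesteps the extra | characters in the user column. The function names and the file arguments are placeholders, not anything from the question.

```python
def drop_two_before_last(line: str, sep: str = "|") -> str:
    """Remove the two columns that precede the last column of one record."""
    stripped = line.rstrip("\n")
    # Split at most three times from the right: head, col[-3], col[-2], last.
    parts = stripped.rsplit(sep, 3)
    if len(parts) < 4:
        return stripped  # too few columns; leave the record unchanged
    head, _dropped_a, _dropped_b, last = parts
    return f"{head}{sep}{last}"

def process_file(src: str, dst: str) -> None:
    """Stream src to dst line by line so a multi-million-line file stays cheap."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(drop_two_before_last(line) + "\n")

sample = "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389"
print(drop_two_before_last(sample))
```

Writing to a separate output file rather than editing in place is a deliberate choice here, so the original data survives if the column assumption turns out to be wrong.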

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What I want to do is replace the final part of each match with \.*'
I can't do :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it would replace the whole string, and I only need to replace the end of it, from
.*'
to
\.*'
So the file output looks like:
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*[0-9].*\zs\ze\.\*'$/\\/
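Outside Vim, the same two techniques can be sketched in Python. This is an illustration under assumptions: the function names are invented, [wl] is used in place of the question's [w|l] (which would also accept a literal |), and a lookahead stands in for Vim's zero-width \zs\ze pair.

```python
import re

# Group 1 is the part to keep; group 2 is the trailing .*' that needs escaping.
WITH_GROUPS = re.compile(r"(s[wl].*[0-9].*)(\.\*')$")

def escape_with_groups(line: str) -> str:
    # Rebuild the line as group 1 + one backslash + group 2.
    return WITH_GROUPS.sub(lambda m: m.group(1) + "\\" + m.group(2), line)

def escape_with_lookahead(line: str) -> str:
    # Zero-width match just before a trailing .*' ; insert one backslash there.
    # This variant ignores the s[wl]...[0-9] prefix and applies to any line.
    return re.sub(r"(?=\.\*'$)", lambda m: "\\", line)

print(escape_with_groups("sld0.*'"))
print(escape_with_lookahead("lbc3.*'"))
```

Note that the group-based version leaves a record such as lbc3.*' untouched, because that record does not match the s[wl] prefix from the question's search pattern.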

Regular expression to get filename without extension from full file path

How can I extract the filename without the extension from the following file path:
D:\Projects\Extract\downtown - second.pdf
The following regular expression gives me the filename with the extension: [^\\]*$
e.g. downtown - second.pdf
The following regular expression gives me the path without the extension: (.+)(?=(\.))
e.g. D:\Projects\Extract\downtown - second
I'm struggling to combine the two into one regular expression to give me the results I want: downtown - second
I suspect that your 2nd regex would not give you the output you have shown. It will give you the complete string till the first period (.).
To get just the file name without the extension, you can use this regex:
[^\\]*(?=[.][a-zA-Z]+$)
I have just replaced (.+) in your 2nd regex with the [^\\]* from your first regex, and added a pattern to match the extension (pdf) up to the end of the string.
This pattern matches zero or more characters other than a backslash (\), followed by a . and then one or more letters making up the extension.
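As a quick check of the pattern above, here is a small Python sketch that applies it to the path from the question and compares the result with the standard library's pathlib. The helper name stem_by_regex is made up for this example.

```python
import re
from pathlib import PureWindowsPath

# Run of non-backslash characters that is followed by ".letters" at the end.
STEM = re.compile(r"[^\\]*(?=[.][a-zA-Z]+$)")

def stem_by_regex(path: str):
    """Return the file name without its extension, or None if nothing matches."""
    match = STEM.search(path)
    return match.group(0) if match else None

path = r"D:\Projects\Extract\downtown - second.pdf"
print(stem_by_regex(path))
# PureWindowsPath parses Windows-style paths on any operating system.
print(PureWindowsPath(path).stem)
```

When the environment allows it, a path library is usually the more robust choice; the regex returns None for a name with no extension or with digits in the extension, since the lookahead only accepts letters.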
I came up with this one, which handles most of the possibilities:
/[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$/
/path/to/file
/path/to/file.txt
/path.with/dots.to/file.txt
/path/to/file.with.dots.txt
file.txt
C:\path\to\file.txt
and so on...
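To see how that pattern behaves on the paths listed above, here is a short Python sketch. The slash delimiters are dropped because Python takes the pattern as a string, and base_name is a hypothetical helper name.

```python
import re

# First alternative: name followed by a final ".ext"; second: name with no extension.
NAME = re.compile(r"[^\\/]+(?=\.\w+$)|[^\\/]+$")

def base_name(path: str):
    """Return the last path component without its final extension, if any."""
    match = NAME.search(path)
    return match.group(0) if match else None

paths = [
    "/path/to/file",
    "/path/to/file.txt",
    "/path.with/dots.to/file.txt",
    "/path/to/file.with.dots.txt",
    "file.txt",
    r"C:\path\to\file.txt",
]
for p in paths:
    print(p, "->", base_name(p))
```

Only the final extension is removed, so file.with.dots.txt yields file.with.dots, and dots in directory names are ignored because the lookahead must reach the end of the string.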
I captured file from /path/to/file.pdf by using following regex:
[^/]*(?=\.[^.]+($|\?))
Hope this helps you
I had to use an extra backslash before the first ']' to make this work
[^\\\]*(?=[.][a-zA-Z]+$)
I use this pattern
[^\/]+[.+\.].*$ for / path separator
[^\\]+[.+\.].*$ for \ path separator
which matches the filename at the end of the string regardless of which characters it contains. The one exception is that a folder with a period in its name will break it. Linux hidden directories that start with a ., like .rvm, are unaffected.
Hope this helps.
http://rubular.com/r/LNrI4inMU1

how to find and replace such a case

I have some text files in a predefined directory; the file names end with .edi.
Each of them refers to a data file on one separate line:
data_file file_n.data
I have to convert this line in every .edi file to something like
data_file 'another_directory/file_n.data'
That is, I will add 'another_directory/ at the start and ' at the end,
because the data will be in another directory.
I have 200 such .edi files, which makes it tedious to handle them manually!
Can regular expressions help?
I use the UltraEdit regexp engine, by the way.
a_dir/
file1.edi
file2.edi
file3.edi
...
and each file contains one line like:
data file file_n.data
which becomes:
data_file 'another_directory/file_n.data'
Just do a Replace in Files from the Search menu, using Perl Regular Expression Engine:
Find:
^(\s*data file\s+)(.+)$
Replace:
\1'another_directory/\2'
In Files/Types:
*.edi
Directory:
<whatever the directory with all your edi files is>
(tested in UltraEdit, works for me)
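If UltraEdit is not at hand, the same find-and-replace can be approximated with a short Python script. This is a sketch under assumptions: requote and rewrite_edi_files are invented names, the pattern accepts both "data_file" and "data file" because the question shows both spellings, and running it twice would quote the path twice, so work on a backup copy.

```python
import re
from pathlib import Path

# Group 1: the keyword plus its spacing; group 2: the referenced file name.
# Accepting "_" or " " between the words is an assumption, not from the answer.
DATA_LINE = re.compile(r"^([ \t]*data[ _]file[ \t]+)(.+)$", re.MULTILINE)

def requote(text: str, new_dir: str = "another_directory") -> str:
    """Wrap each referenced file name in quotes and prefix it with new_dir/."""
    return DATA_LINE.sub(lambda m: f"{m.group(1)}'{new_dir}/{m.group(2)}'", text)

def rewrite_edi_files(directory: str, new_dir: str = "another_directory") -> int:
    """Rewrite every *.edi file in directory; return how many files changed."""
    changed = 0
    for path in sorted(Path(directory).glob("*.edi")):
        original = path.read_text()
        updated = requote(original, new_dir)
        if updated != original:
            path.write_text(updated)
            changed += 1
    return changed

print(requote("data_file file_n.data\n"))
```

A call such as rewrite_edi_files("a_dir") would then process every .edi file in that directory.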
Replace the space(s) by space(s) followed by another_directory/ using appropriate metacharacters and escaping for /.
If I were using RegexBuddy's GREP tool, here's what I would do :
Match pattern:
^(data file\s{3})(file_n\.data)$
Options:
^$ match at line breaks.
Replace pattern:
\1'another_directory/\2'
I think UltraEdit should work pretty much the same.
UltraEdit has find and replace in files. If you're on v13+ it also has Perl-compatible regexes.
If you turn on Perl-compatible regular expressions you can use:
For Find What: " (file_\d+\.data)" (without the double quotes)
For Replace With: " 'another_directory/$1'" (without the double quotes)
For old style UltraEdit syntax regular expressions use:
Find What: " ^(file_[0-9]+^.data^)"
Replace With: " 'another_directory/^1'"
(without the quotes but the leading whitespace is significant)