Replace text with regular expressions in a text editor - regex

I need to edit lines in a text file.
The text file contains more than 100 lines of data in the format below.
Cosmos Rh Us (Paperback) $10.99 Shipped:
The Good Earth (Paperback) $6.66 Shipped:
BEST OF D.H. LAWRENCE (Paperback) $7.89 Shipped:
...
These are excerpts from the online book shop I use to buy books.
I have this data in a text editor. How do I edit it with Find/Replace so that the data becomes like this:
$10.99
$6.66
$7.89
or better, without the dollar sign, since that would make it easier to total.
I use Notepad++ as my text editor.

Search for (don't forget to enable regular expressions in the replace box!)
^.*\$(\d+\.\d+).*$
and replace all with
\1

You could simply match full lines and capture the number after the $ sign:
Find what: ^[^$]*\$(\d+\.\d+).*$
Replace with: $1
Make sure that you don't check the ". matches newline" option. And note that this will behave unexpectedly if you have multiple $ signs in a line.
You might need to update to Notepad++ 6. Before that some regex features were not working properly.
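For anyone who would rather script this than use the editor, here is a minimal Python sketch of the same idea. It assumes the three sample lines from the question and reuses the pattern above; the names PRICE_LINE and extract_price are made up for illustration. Decimal is used so that the total stays exact.

```python
import re
from decimal import Decimal

# Sample lines copied from the question.
lines = [
    "Cosmos Rh Us (Paperback) $10.99 Shipped:",
    "The Good Earth (Paperback) $6.66 Shipped:",
    "BEST OF D.H. LAWRENCE (Paperback) $7.89 Shipped:",
]

# Same pattern as in the answer: skip everything up to the first "$",
# capture the decimal number, then consume the rest of the line.
PRICE_LINE = re.compile(r"^[^$]*\$(\d+\.\d+).*$")


def extract_price(line):
    """Return the captured price as a string, or None if no price is found."""
    match = PRICE_LINE.match(line)
    return match.group(1) if match else None


prices = [extract_price(line) for line in lines]
total = sum(Decimal(p) for p in prices if p is not None)

print(prices)
print(total)
```

As in the editor, a line with several $ signs would only yield the number after the first one, and only if that number has a decimal part.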

Find:
((?<=\$)[\d\.]+)
Replace With:
\1 or $1 (whichever Notepad++ uses)
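Since the lookbehind is zero-width, the match here is just the number itself. In a script, that makes the pattern handy for collecting the prices directly rather than replacing anything. A small Python sketch, assuming the sample lines from the question (the variable names are mine):

```python
import re

text = """Cosmos Rh Us (Paperback) $10.99 Shipped:
The Good Earth (Paperback) $6.66 Shipped:
BEST OF D.H. LAWRENCE (Paperback) $7.89 Shipped:"""

# (?<=\$) asserts that a "$" comes right before the match
# without making the "$" part of the match.
prices = re.findall(r"(?<=\$)[\d.]+", text)

print("\n".join(prices))
```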

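Run two replacements, each with an empty "Replace with" field.
The first regex removes everything up to and including the closing parenthesis:
[a-zA-Z0-9].*\)
The second regex removes the trailing "Shipped:" label:
[a-zA-Z]+\: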

Related

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a regex to remove the 4th and 5th columns (the two numbers before the hash)? The numbers in these columns are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command line utility, on either Linux or Windows.
Match: \|[^|]+\|[^|]+(\|[^|]+$)
Replace: $1
Basically, anchor to the end of the line and remove the two columns just before the last one (I assume columns can't be empty; replace + with * if they can).
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
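As a rough idea of what such a script could look like in Python: the sketch below applies the same end-anchored pattern line by line, so a file with millions of lines never has to fit in memory. The function names and the file paths passed to rewrite_file are hypothetical.

```python
import re

# Same idea as the pattern above: anchored at the end of the line,
# match the last three "|field" groups and keep only the final one.
TAIL = re.compile(r"\|[^|]+\|[^|]+(\|[^|]+)$")


def drop_columns(line):
    """Remove the two columns that come just before the last column."""
    return TAIL.sub(r"\1", line)


def rewrite_file(src_path, dst_path):
    """Stream src_path into dst_path one line at a time (paths are hypothetical)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(drop_columns(line.rstrip("\n")) + "\n")


rows = [
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1",
    "3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433",
    "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389",
]
for row in rows:
    print(drop_columns(row))
```

Because the columns are counted from the end, the extra | characters in the third user name do not matter.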
I've captured three groups and given them names. If you use a replace utility like sed or Vim, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
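Python is one such environment: its re module spells named groups as (?P<name>...) rather than (?<name>...). A minimal sketch of the concatenation approach, assuming the first sample row from the question (ROW and drop_middle are illustrative names):

```python
import re

# The same three groups, written with Python's (?P<name>...) syntax.
ROW = re.compile(
    r"^(?P<keep_before>(?:[^|]+\|){3})"
    r"(?P<remove>(?:[^|]+\|){2})"
    r"(?P<keep_after>.*)$"
)


def drop_middle(line):
    """Join keep_before and keep_after; return the line unchanged if it does not match."""
    match = ROW.match(line)
    if match is None:
        return line
    return match.group("keep_before") + match.group("keep_after")


row = "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1"
print(drop_middle(row))
```

Note that this pattern counts columns from the left, so a row whose user name contains | (like the third sample row) would lose the wrong fields.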
In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular expression
Click Replace All and you're done.
Regex explanation:
\|\d+ : matches the first | followed by digits
\|\d+ : matches the second | followed by digits
(\|[0-9a-z]+) : matches and captures the | and the string after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the contents of the first capture group, i.e. whatever (\|[0-9a-z]+) matched

Notepad++ replace text with RegEx search result

I would like to replace a standard string in a file with another that is the result of a regular expression. The standard text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a servicecode, that I can find later in the same file, from this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" and this RegEx to the "Replace with" section, and expected that the ### text will be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to RegEx. Do I need to use a different syntax in the replace section than the one I use to find a string? Can someone give me a hint on how to achieve the required result? The goal is to standardize tons of files with a similar structure, as all service codes are currently hardcoded in several places in each file. Thanks.
You could use a lookahead for capturing the part ahead.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (there is also a checkbox for this in Notepad++)
[^"] matches a character that is not "
The replacement $1 corresponds to the capture of the first parenthesized subpattern.
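To try the lookahead idea outside the editor, here is a small Python sketch. It assumes a file that contains just the two lines from the question; note that Python's re.sub writes the backreference as \1 rather than $1. PLACEHOLDER is an illustrative name.

```python
import re

# The two relevant lines from the question, in file order.
text = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<a href="/Services/xyz" target="_self">\n'
)

# Match "###" only, while the lookahead scans ahead (across newlines,
# thanks to (?s)) and captures the service code into group 1.
PLACEHOLDER = re.compile(r'(?s)###(?=.*/Services/([^"]+)")')

result = PLACEHOLDER.sub(r"\1", text)
print(result)
```

Because .* is greedy, a file with several /Services/ links would pick up the code from the last one.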
I am no expert at RegEx but I think I may be able to help. It looks like you might be going about this the wrong way. The regex search that you are using would normally work like this:
Parentheses () in RegEx allow you to select part of your search and use it in the replace section.
You place (?<=\/Services\/)(.*)(?=\" ) into the "Find what" section in Notepad++.
Then in the "Replace with" section you could use \1 to insert what was matched by (.*). The lookbehind (?<=\/Services\/) and the lookahead (?=\" ) only check the surrounding text and do not capture anything, so this pattern has no \2 or \3.
Depending on the structure of your files, you would need to use a RegEx search that selects both lines of code (and the specific parts you need), then use a combination of \1\2\3 etc. to replace everything exactly how it was, except for the ### which you could replace with the \number associated with xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.

regex help in notepad++ marking embedded values and copying into a new list

I have a large text file with numerous lines containing data like below.
205=1<SOH>55=ES<SOH>48=17875701615154475972<SOH>207=CME<SOH>100=XNYM<SOH>16552=1
205=155=6A48=17875736456456445774207=CME100=XNYM16552=1
I would like to extract all of the values that are after the "48=" and before the ASCII code 01 delimiter AND the same for the value after "55=" and paste them into a new file:
ES|17875701615154475972
6A|17875736456456445774
They aren't all 20 characters in length, so I would need to do a regex search to mark them all. Can you help me with the right regular expression to use and explain how to copy the identified values out of Notepad++?
Do a replacement on the whole file to leave only the targets:
Find: ^.*\b48=(\d+).*
Repl: $1
Then press Ctrl+A, Ctrl+C and paste into a new file.
To answer the question in the comment about capturing "CME" and allowing both "55" and "48" as markers:
Find: ^.*?(?:48|55)=([\w;]+).*?=([A-Z]+).*
Repl: $2|$1
The following will match and create a group for the digits: <SOH>48=(\d*)<SOH>
However, what you probably want is a global search/replace that finds the numbers and rewrites the file. Try
Find: .*<SOH>48=(\d*)<SOH>.*
Replace: \1
Of course remember to check the Regular Expression box or it won't work at all.
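If the editor route gets awkward, the same extraction can be sketched in Python. This assumes the delimiter in the real file is the actual ASCII 01 control character (written "\x01" below) rather than the literal text <SOH>; the helper names tag_value and symbol_and_id are made up for illustration.

```python
import re

SOH = "\x01"  # assumption: the file uses the real ASCII 01 byte as delimiter


def tag_value(record, tag):
    """Return the value of tag=... in a record, or None if the tag is absent."""
    # The tag must sit at the start of the record or right after a delimiter,
    # so that "55" is not found inside a longer tag such as "16552".
    pattern = rf"(?:^|{SOH}){tag}=([^{SOH}]*)"
    match = re.search(pattern, record)
    return match.group(1) if match else None


def symbol_and_id(record):
    """Build the "55-value|48-value" line asked for in the question."""
    return f"{tag_value(record, '55')}|{tag_value(record, '48')}"


# First sample line from the question, rebuilt with the real delimiter.
record = SOH.join(
    ["205=1", "55=ES", "48=17875701615154475972", "207=CME", "100=XNYM", "16552=1"]
)
print(symbol_and_id(record))
```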

replace text with regular expression keeping structure match on sublime text

I've been trying a few options, but I can't figure it out.
This is the text I'm looking for:
php[whatever_is_in_between]myfunction
and I want to change it to this:
=[whatever_is_in_between]myfunction
where [whatever_is_in_between] = \n or \n\t or \n\t\t or \n plus spaces or \n plus spaces and \t, and so on.
I found this regexp that matches the search text:
php[\n]*[\t]*[ ]*myfunction\(
This is the "replace with" text:
=[\n]*[\t]*[ ]*myfunction\(
But the regexp does not work in the replacement; it is inserted as literal text.
Can anybody help me with this?
Thanks.
I think the problem is that you are not using a capturing group ( ). Among other things, a capturing group allows you to take input from the matched text and then inject it into your replacement text.
I'd use a search pattern like this:
php\[([^\]]*)\]([\w\W]*)
It looks complicated, but I've set up a sample on Regex 101 that you can check out. The replacement text should look something like this:
=[\1]\2
Please note that how you insert a capturing group will depend on what programming language you're using. The above should work for php.
I hope that helps,
--Jonathan

Sublime Text 2 - Regular Expression Find and Replace

I am looking for a solution to find and replace the formatting of many prices within one of my documents.
I currently have prices that are formatted like so: $60, and I would like to change the formatting to: 60 $
The following 'Find What' pattern finds the first format, \$\d{0,2}, but I'm not sure what to use for 'Replace With'.
Is there a way to preserve the number?
Thank you.
Try this:
find: \$(\d{0,2})
replace: \1 $
Option+Cmd+F:
Place into the find field:
\$([0-9]{0,2})
Place into the replace field:
\1 \$
The backslash + number indicates which capture group to insert at that position.
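The same capture-and-reinsert step can be tried in Python, which also uses \1 for the first group. This is only a sketch with an invented sample string; in Python's replacement text a plain $ is literal, so it needs no backslash.

```python
import re

text = "Coffee $60, tea $5"

# Capture up to two digits after the "$", then emit them before " $".
result = re.sub(r"\$(\d{0,2})", r"\1 $", text)
print(result)

# \d{0,2} stops after two digits, so a three-digit price gets split;
# \d+ is one way to keep the whole number together.
split_price = re.sub(r"\$(\d{0,2})", r"\1 $", "$100")
whole_price = re.sub(r"\$(\d+)", r"\1 $", "$100")
print(split_price, "|", whole_price)
```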