Dreamweaver help needed with regular expression search and replace

I've got a bit of an issue. I have lots of text documents I need to copy into HTML. To try to speed the process up, I've been looking into regular-expression search and replace to help me add bold tags to various bits of text.
I have lots of text like this:
1. Centrum Multimineral Vitamin x 30 £3.19 was £4.79 (11p per tablet)
I'm trying to write a regular expression search and replace to look for all text between a number with a dot and space after it and the first price. I want the find and replace to do this:
<b>1. Centrum Multimineral Vitamin x 30 £3.19</b> was £4.79 (11p per tablet)
So far I've written this expression which kind of works:
search for:
([0-9]{1}[\.\s][\s\D]?[^<]*)(\£\d\.\d\d[^<])
replace with :
<b>$1$2</b>
Output :
<b>1. Centrum Multimineral Vitamin x 30 £3.19 was £4.79</b> (11p per tablet)
How do I alter this search so that it stops at the 1st £ sign and includes the price?

Try this small modification:
([0-9]{1}[\.\s][\s\D]?[^<]*?)(\£\d\.\d\d[^<])
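I have added ? after [^<]*, which makes the * quantifier reluctant (lazy). A greedy * first consumes everything up to the next < or the end of the text, then backtracks to the last position where \£\d\.\d\d[^<] can match, which is the second price. The lazy *? instead consumes one character at a time and stops at the first position where \£\d\.\d\d[^<] matches, which is the first price.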
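A quick way to see the difference is to run both patterns outside Dreamweaver. The sketch below uses Python's re module as a stand-in for Dreamweaver's engine (an assumption), writes the back-references as \1 and \2 instead of $1 and $2, and drops the backslash before £, which Python does not need. Note that the trailing [^<] also pulls the space after the price inside the bold tag.

```python
import re

# Sample line from the question.
line = "1. Centrum Multimineral Vitamin x 30 £3.19 was £4.79 (11p per tablet)"

# Original (greedy) pattern and the lazy fix; "£" needs no escaping in Python.
greedy = r"([0-9]{1}[\.\s][\s\D]?[^<]*)(£\d\.\d\d[^<])"
lazy = r"([0-9]{1}[\.\s][\s\D]?[^<]*?)(£\d\.\d\d[^<])"

# Python writes back-references as \1 and \2 where Dreamweaver uses $1 and $2.
replacement = r"<b>\1\2</b>"

greedy_out = re.sub(greedy, replacement, line)  # bold runs to the second price
lazy_out = re.sub(lazy, replacement, line)      # bold stops after the first price

print(greedy_out)
print(lazy_out)
```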

Related

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break the name down into groups separated by underscores, which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups. For example, from group 2 I need the first 3 and 8, i.e. "38" (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It pulls PH into a group, but not 38, so I'm lost at this point.
Any help would be great
Try the regex below to match any first 3 letters/digits followed by one digit:
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Or try this one to match any first 3 letters/digits followed by one letter/digit:
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letters are a constant like "PH", try this instead:
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match a group 2 that starts with digits. If that is the case, then you have to change the source string, for example:
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works; check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
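To check that the two patterns split the filename identically, here is a minimal Python sketch (Python is an assumption; the question doesn't name a language). Python's re module does not report step counts, so the 114 vs 42 figures are not something this sketch measures; it only confirms the groups are the same.

```python
import re

filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

# Original pattern: lazy .*? between underscores.
lazy_dot = r"(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)"
# Suggested pattern: [^_]* cannot run past an underscore, so each group
# ends at the next underscore without trial-and-error backtracking.
negated = r"([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"

m1 = re.match(lazy_dot, filename)
m2 = re.match(negated, filename)

print(m2.groups())
```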
The best method might be to split your string on _ and then test the second element to see whether it contains 38. Since you haven't specified a language, I can't show how to do it in yours, but most languages have a contains or indexOf method that tells you whether a substring exists in a string.
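As an illustration only, here is the split-then-test idea in Python (the language choice is an assumption). Python's `in` operator and `str.find` play the roles of contains and indexOf.

```python
filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

parts = filename.split("_")   # split on every underscore
second = parts[1]             # the second element, "PH3843C5"

has_38 = "38" in second       # Python's equivalent of contains()
position = second.find("38")  # Python's equivalent of indexOf(); -1 if absent

print(second, has_38, position)
```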
Using regex alone, however, this can be accomplished using the following regular expression.
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
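Both regexes can be tried in Python as a sketch (an assumed language). The second filename below is hypothetical, with the 38 removed, to show that the first pattern rejects it.

```python
import re

filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

# Ensure "38" appears somewhere in the second underscore-separated part.
ensure = r"([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"
# Capture "38" in its own group, with what comes before and after it in that part.
capture = r"([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"

m = re.match(capture, filename)
print(m.group(2), m.group(3), m.group(4))

# Hypothetical filename whose second part contains no "38": no match.
other = "0296005_PH4443C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."
print(re.match(ensure, other))
```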

Using Regex to clean a csv file in R

This is my first post so I hope it is clear enough.
I am having a problem regarding cleaning my CSV files before I can read them into R and have spent the entire day trying to find a solution.
My data is supposed to be in the form of two columns. The first column is a timestamp consisting of 10 digits and the second an ID consisting of 11 or 12 letters and numbers (the first 6 are always numbers).
For example:
logger10 |
0821164100 | 010300033ADD
0821164523 | 010300033ADD
0821164531 | 010700EDDA0F0831102744
010700EDDA0F|
would become:
0821164100 | 010300033ADD
0821164523 | 010300033ADD
0821164531 | 010700EDDA0F
0831102744 | 010700EDDA0F
(please excuse the lines in the middle, that was my attempt at separating the columns...).
The csv file seems to occasionally be missing a comma which means that sometimes one row will end up like this:
0923120531,010300033ADD0925075301,010700EDD00A
My hardware also adds the word logger10 (or whichever number logger this is) whenever it restarts which gives a similar problem e.g. logger10logger100831102744.
I think I have managed to solve the logger text problem (see code) but I am sure this could be improved. Also, I really don't want to delete any of the data.
My real trouble is making sure there is a line break in the right place after the ID and, if not, I would like to add one. I thought I could use regex for this but I'm having difficulty understanding it.
Any help would be greatly appreciated!
Here is my attempt:
temp <- list.files(pattern = "\\.CSV$") # list of each csv/logger file (pattern is a regex, not a glob)
for (i in temp) {
  # clean each csv
  tmp <- readLines(i)                              # read every line in the file
  tmp <- gsub("logger([0-9]{2})", "", tmp)         # remove logger text
  pattern <- "^[0-9]{10},[0-9]{6}[A-Z0-9]{5,6}$"   # what one clean row should look like
  if (!all(grepl(pattern, tmp))) {                 # TRUE if any line doesn't fit the pattern
    # I have no idea where to start here...
  }
}
Here is some raw data:
logger01
0729131218,020700EE1961
0729131226,020700EE1961
0831103159,0203000316DB
0831103207,0203000316DB0831103253,010700EDE28C
0831103301,010700EDE28C
0831103522,010300029815
0831103636,010300029815
0831103657,020300029815
If you want to do this in a single pass:
(?:logger\d\d )?([\dA-F]{10}),?([\dA-F]{12}) ?
can be replaced with
\1\t\2\n
This optionally looks for any of those rogue logger01 entries (including the space after them). The ? after the group means it can match 0 or 1 times: if the text is there, it is matched; if it's not, the match just keeps going.
Following that, you look for (and capture) exactly 10 hex values (either digits or A-F). The ,? means that if a comma exists, it will match, but it can match 0 or 1 time as well (making it optional).
Following that, it looks for (and captures) exactly 12 hex values. Finally, to get rid of any stray trailing spaces, " ?" (a space character followed by ?) optionally matches a trailing space.
The replacement writes back the first captured group (the 10 hex digits), a tab, the second captured group (the 12 hex digits), and then a newline.
You can see this in use on regex101. You can use the code generator on the left side of that page to get preformatted PHP/JavaScript/Python code that you can drop into a script.
If you're doing this from the command line, you could use perl:
perl -pe 's/(?:logger\d\d )?([\dA-F]{10}),?([\dA-F]{12}) ?/$1\t$2\n/g'
In another language, you may need to adapt it slightly to fit your needs.
EDIT
Re-reading the OP and comments, a slightly more rigid regex could be
(?:logger\d\d\ )?([\dA-F]{10}),?(\d{6}[\dA-F]{5,6})\ ?
I updated the regex101 link with the changes.
This still looks for the first 10 hex values, but now looks for exactly 6 digits, followed by 5-6 hex values, so the total number of characters matched is 11 or 12.
The replacement would be the same.
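Here is a minimal sketch of the stricter regex applied to the question's raw data, in Python rather than R (the answer notes other languages may need adapting). Instead of substituting \1\t\2\n in place, which would keep each original line break in addition to the new one, it collects the (timestamp, ID) pairs with re.findall and joins them; that swap is a choice made for this sketch, not part of the answer. The logger01 line simply produces no pair.

```python
import re

# Raw data from the question.
raw = """logger01
0729131218,020700EE1961
0729131226,020700EE1961
0831103159,0203000316DB
0831103207,0203000316DB0831103253,010700EDE28C
0831103301,010700EDE28C
0831103522,010300029815
0831103636,010300029815
0831103657,020300029815"""

# The stricter regex from the EDIT.
pattern = re.compile(r"(?:logger\d\d\ )?([\dA-F]{10}),?(\d{6}[\dA-F]{5,6})\ ?")

# Collect (timestamp, id) pairs, then write one tab-separated row per pair.
rows = pattern.findall(raw)
cleaned = "\n".join(f"{ts}\t{dev_id}" for ts, dev_id in rows)
print(cleaned)
```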
Paste your regex into https://regex101.com/ to see whether it catches all cases. The 5 or 6 letters or digits could pose an issue, as they may catch the first digit of the next timestamp when the logger misses out a separator. Appending '\n' to the end of the tmp string should then work, provided the regex catches all cases.
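To illustrate that concern, the Python sketch below feeds the stricter regex a hypothetical row (not from the question's data) in which an 11-character ID runs straight into the next timestamp. The greedy {5,6} takes the timestamp's first digit, and the second record then cannot match at all.

```python
import re

pattern = re.compile(r"(?:logger\d\d\ )?([\dA-F]{10}),?(\d{6}[\dA-F]{5,6})\ ?")

# Hypothetical row: an 11-character ID (0203000316D) with the separator
# missing, so the next timestamp (0831103253) follows it directly.
row = "0831103207,0203000316D0831103253,010700EDE28C"

# {5,6} is greedy, so the ID swallows the "0" that starts the next timestamp,
# and the remaining "831103253" is too short to be a 10-digit timestamp.
print(pattern.findall(row))
```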

Regex selecting the last 6 numbers of

I'm new to regex and I've been trying to select 6 numbers from within a file and then replace those 6 numbers with the same numbers plus a comma and a new line (making a CSV, obviously).
Anyway, the sample data is simply nonsense like this:
fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001skfjaklaj093i393593390000002sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438adsfaklgjadklgjag7700001skfjaklaj093i393593390000002ssafksa djlkgtjafglkj000000adsfaklgjadklgjag0000001skfj aklaj093i393593£39.00900002ssafksadjlk gtjafglkj000000adsfaklgjadklgjag0000001skfjaklaj093i3935£933.90000002s
Note that some of the numbers are attached to currency values as well (and some are next to them but have a space beforehand), but the end will always be 6 numbers (consider them random, as I can't see a pattern).
So I basically need to select strings matching numerics that are six digits long or longer, if longer then it just uses the last 6 digits.
Then I will replace it with itself and a comma and new line.
I hope that makes sense; I've tried a few things without success.
Thanks. Edit: the closest I have is:
(\d)\d{6}(?!\d)
In the Find what: text field, type in (\d{6})(\D). In the Replace with: text field, type in $1\r\n$2. Make sure that the regular expression radio button is selected. For your input, that should yield this:
fafksadjlkgtjafglkj210000
adsfaklgjadklgjag3600001
skfjaklaj093i393593390000002
sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438
adsfaklgjadklgjag7700001
skfjaklaj093i393593390000002
ssafksa djlkgtjafglkj000000
adsfaklgjadklgjag0000001
skfj aklaj093i393593
£39.00900002
ssafksadjlk gtjafglkj000000
adsfaklgjadklgjag0000001
skfjaklaj093i3935£933.90000002
s
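The same Find/Replace can be reproduced in Python to check the output above. This is a sketch: Python writes the back-references as \1 and \2, and a plain \n is used instead of \r\n. Like the answer, it inserts a line break after every run of six or more digits but does not add the comma the question asks for; a replacement of r"\1,\n\2" would.

```python
import re

# Sample data from the question, split across literals only for readability.
data = ("fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001"
        "skfjaklaj093i393593390000002sadfljafkjgakjgasafksadjlkgtjafglkj"
        "£94.00 489438adsfaklgjadklgjag7700001"
        "skfjaklaj093i393593390000002ssafksa djlkgtjafglkj000000"
        "adsfaklgjadklgjag0000001skfj aklaj093i393593£39.00900002"
        "ssafksadjlk gtjafglkj000000adsfaklgjadklgjag0000001"
        "skfjaklaj093i3935£933.90000002s")

# Six digits followed by a non-digit; put a line break between them.
result = re.sub(r"(\d{6})(\D)", r"\1\n\2", data)
lines = result.split("\n")

for line in lines:
    print(line)
```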
You want
\d{6}(?=\D*$)
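The $ is an anchor that matches at the end of the input (or of each line, in multiline mode), and the lookahead (?=\D*$) requires that only non-digits follow the six digits up to that point.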
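Whether $ means "end of the text" or "end of each line" depends on the tool's multiline setting. This Python sketch, with a made-up two-line input, shows both behaviours using re.MULTILINE.

```python
import re

pattern = r"\d{6}(?=\D*$)"  # six digits followed only by non-digits up to $

# Hypothetical two-line input.
text = "abc1234567xyz\nfoo987654bar"

print(re.findall(pattern, text))                # $ = end of the whole string
print(re.findall(pattern, text, re.MULTILINE))  # $ = end of each line
```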
"i've been trying to select 6 numbers from within a file and then replace those 6 numbers with the same numbers plus , new line"
So you're basically trying to do this, right?:
Find:
(\d{6})(\D)
Replace:
\1\n\2
How about:
Find what: (\d{6,})(?:\D*)$
Replace with: $1,\n
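Tried on the question's single-line sample in Python (a sketch; Python spells $1 as \1), this pattern makes a single replacement: the $ anchor only lets it match the last run of six or more digits, the whole run is kept rather than just its last six digits, and the trailing non-digits matched by (?:\D*) are not written back.

```python
import re

# Sample data from the question, split across literals only for readability.
data = ("fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001"
        "skfjaklaj093i393593390000002sadfljafkjgakjgasafksadjlkgtjafglkj"
        "£94.00 489438adsfaklgjadklgjag7700001"
        "skfjaklaj093i393593390000002ssafksa djlkgtjafglkj000000"
        "adsfaklgjadklgjag0000001skfj aklaj093i393593£39.00900002"
        "ssafksadjlk gtjafglkj000000adsfaklgjadklgjag0000001"
        "skfjaklaj093i3935£933.90000002s")

result = re.sub(r"(\d{6,})(?:\D*)$", r"\1,\n", data)
print(repr(result[-30:]))
```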

Advanced VS2012 Find and Replace with regex

In VS2012 I want to find and replace a "near-repetitive" string within a few large generated .SQL files:
The format of the search string is:
print 'Processed {d} total records'
where {d} is a number such as 100, 200, 300, etc., all the way up to 70,000.
I will be replacing this with nothing (i.e. deleting it).
Can anybody provide me with a simple regex for the FIND using the new VS2012 syntax?
Regex is like witchcraft and is beyond me
Any questions feel free to ask
Cheers
Kyle
This should do the job:
print 'Processed [0-9]+ total records'
If your number contains thousands separators (like in '70,000') you might want to use
print 'Processed [0-9,]+ total records'
instead.
[] is a bracket expression which matches any of the characters inside it, so [0-9] matches every digit (since 0-9 is a character range). The + in the end means 'any number of matches but at least one'.
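The difference between the two patterns can be checked with Python's re module, which treats these simple bracket expressions the same way as VS2012's Find and Replace (an assumption worth confirming in the editor itself). The SQL text below is made up. Deleting a match leaves an empty line behind.

```python
import re

# Hypothetical fragment of a generated .SQL file.
sql = """SELECT 1;
print 'Processed 100 total records'
SELECT 2;
print 'Processed 70,000 total records'
"""

digits_only = r"print 'Processed [0-9]+ total records'"
with_commas = r"print 'Processed [0-9,]+ total records'"

# Deleting = replacing each match with an empty string.
narrow = re.sub(digits_only, "", sql)  # misses the line containing "70,000"
wide = re.sub(with_commas, "", sql)    # removes both print lines

print(narrow)
print(wide)
```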

How do I write a Regular Expression to match any three digit number value?

I'm working with some pretty funky HTML markup that I inherited, and I need to remove the following attributes from about 72 td elements.
sdval="285"
I know I can do this with find/replace in my code editor, but since the value of each attribute differs in 5-degree increments, I can't match them all without a regular expression. (FYI, I'm using Esspress and it does support regexes in its Find/Replace tool.)
Only trouble is, I really can't figure out how to write a RegEx for this value. I understand the concept of RegExes, but really don't know how to use them.
So how would I write the following with a Regular Expression in place of the digits so that it would match any three digit value?
sdval="285"
/sdval="\d{3}"/
EDIT:
To answer your comment, \d in regular expressions means match any digit, and the {n} construct means repeat the previous item n times.
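Used as a find-and-replace with an empty replacement, the pattern removes the attribute. In the Python sketch below, the surrounding slashes are dropped (they are JavaScript/Perl-style literal delimiters, not part of the pattern), a leading space is added to the pattern so the tag doesn't keep a stray space, and the markup is made up.

```python
import re

# Hypothetical markup in the style of the question.
html = '<tr><td sdval="285">a</td><td class="x" sdval="290">b</td></tr>'

# The leading space is included so no stray space is left inside the tag.
cleaned = re.sub(r' sdval="\d{3}"', "", html)
print(cleaned)
```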
Easiest, most portable: [0-9][0-9][0-9]
More "modern": \d{3}
This should do it (it ignores leading zeros, but note that it also matches one- and two-digit numbers):
[1-9][0-9]{0,2}
import re
data = "719"
data1 = "79"
# This expression matches any one-, two- or three-digit number
expression = r'\d{1,3}'
match = re.search(expression, data)
print("expression :", match.group() if match else None)
# This expression matches only a three-digit number
expression1 = r'\d{3}'
match1 = re.search(expression1, data1)  # None: "79" has only two digits
print("expression1 :", match1.group() if match1 else None)
Output :
expression : 719
expression1 : None
It sounds like you're doing a find/replace of a three-digit number in Visual Studio (given the references to Express and the Find/Replace tool). If that's the case, the regex to find a three-digit number with Visual Studio's older (pre-2012) Find/Replace syntax is the following:
<:d:d:d>
Breakdown
The < and > establish a word boundary to make sure we don't match a number subset.
Each :d entry matches a single digit.
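In Python's (and .NET's) syntax, the usual way to get the same effect as the < and > word anchors here is \b. This sketch, with a made-up input string, shows why the boundaries matter.

```python
import re

text = "1234 285"

print(re.findall(r"\d{3}", text))      # no boundaries: also matches inside 1234
print(re.findall(r"\b\d{3}\b", text))  # whole three-digit numbers only
```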