Regex Notepad++ Finding, Removing and Replacing Quotations - regex

I have some content in a CSV file which I need to format correctly. The content layout looks like this:
###Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,
Field8,
Field8,
Field8, Field8,
Field8, Field8,
Field8,
"Field8""",Field9,
###
As you can see, Field 8 spans multiple lines and contains quotes and commas, which breaks it into new fields. What I need is a regex that, for each record, identifies the 9th comma in from the start of the record (marked by ###) and then works back 2 commas from the closing ###. I then need to reformat everything in that area across all matching records: remove all the quote marks within that area and add one back at the start and one at the end, effectively wrapping the whole of Field 8 in quotes.
The triple # symbol is present in the file and I would need to use this as a reference to find the start of each record.
I have a regex that worked previously for something similar, but it no longer does because the format of the CSV changes from file to file.
^((?:[^,]+,){8})(.+)((?:,[^,]*){2})$ and replace with $1"$2"$3
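For comparison, here is a minimal Python sketch of the same idea, in case scripting is an option alongside Notepad++. It assumes each record sits between ### markers, that the fields before Field 8 contain no commas, and that the record ends with a fixed number of commas after Field 8. The function names and the head_fields / tail_commas parameters are hypothetical and would need adjusting per file, since the layout changes from file to file.

```python
import re


def quote_middle_field(record, head_fields=7, tail_commas=2):
    """Wrap the text between the leading fields and the trailing commas in quotes.

    head_fields and tail_commas are assumptions about the layout, not fixed rules.
    """
    pattern = re.compile(
        r'^((?:[^,]*,){%d})(.*)((?:,[^,]*){%d})$' % (head_fields, tail_commas),
        re.DOTALL,  # let the middle group run across line breaks
    )
    match = pattern.match(record)
    if match is None:
        return record  # leave records that do not fit the layout untouched
    head, middle, tail = match.groups()
    # Drop every quote inside the field, then add one pair around the whole field.
    return head + '"' + middle.replace('"', '') + '"' + tail


def quote_records(text, marker='###'):
    """Apply quote_middle_field to every non-empty chunk between markers."""
    parts = text.split(marker)
    return marker.join(quote_middle_field(p) if p.strip() else p for p in parts)
```

Because the middle group is greedy, the tail is always taken from the last tail_commas commas of the record, which mirrors the "go back 2 commas from the end" requirement.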

Related

Getting Only The Text and Not Any of the Trailing Blank Lines from a Text Widget in Python 2.7

I'm working with Python 2.7 and tkinter.
I have a text widget that I fill with lines of text where each line is terminated with a "\n" from a file. The text in the Text widget may be modified later.
Now I want to get only the text from the Text widget and ignore any trailing blank line that may be present. The get() method will get everything to the end of the Text widget including any trailing blank lines that may be present.
How can I get the text and not the trailing blank lines?
The text widget doesn't have a good way to filter out the results you get from get. You can get all but the last newline that is automatically added by tkinter by using the index end-1c, but if you have multiple blank lines at the end, the easiest way is to strip out the newlines after fetching the data:
data = the_widget.get("1.0", "end-1c").rstrip("\n")
Another way would be to use the search method going backwards from the end, with a regular expression that finds the first non-blank line. You can then get all the text up to the end of that line:
index = text.search('^.+', "end", backwards=True, regexp=True)
data = text.get("1.0", "%s lineend" % index) if index else ""
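Note that rstrip("\n") only removes newline characters, so a trailing line that contains only spaces or tabs would survive. If such lines should count as blank too, a small helper applied to the string returned by get() could handle it. This is only a sketch, and the helper name is made up for illustration:

```python
def strip_trailing_blank_lines(text):
    """Remove trailing lines that are empty or contain only whitespace."""
    lines = text.split("\n")
    # Pop lines from the end until one with visible content is found.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)
```

It would be called as strip_trailing_blank_lines(the_widget.get("1.0", "end-1c")), and it uses nothing beyond plain string methods, so it also runs on Python 2.7.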

VBscript regular expression

There is a txt file containing multiple lines of the form Browser("something").page("something_else").webEdit("some").
I need to retrieve the names of the browser, page and field (the names surrounded by double quotes) and replace each line with "something_somethingelse_some" (concatenating the names of the browser, page and field respectively). Please help.
The names can be anything, so we should go with regex. Note that every line in this format has to be converted, through to the end of the file.
You may try this:
^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$
and replace by:
$1_$2_$3
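To check the pattern outside of VBScript, the same search and replace can be sketched in Python. This is only an illustration: the function name is hypothetical, the dots between the calls are escaped so they match literally, and case-insensitive matching is an assumption in case the file uses Page or WebEdit.

```python
import re

# One Browser(...).page(...).webEdit(...) call per line; the three quoted names are captured.
LINE_PATTERN = re.compile(
    r'^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$',
    re.MULTILINE | re.IGNORECASE,
)


def flatten_object_paths(text):
    """Replace each matching line with browser_page_field; other lines are kept."""
    return LINE_PATTERN.sub(r'\1_\2_\3', text)
```

Lines that do not start with Browser( pass through unchanged, so the whole file can be processed in one call.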

Ultraedit, regular expression help, extracting 2 values, comma separated

I have this file where I only want to extract the email address and first name from our client list.
So a sample from the file:
a#abc.com,www.abc.com,2011-11-15 00:00:00,8.8.8.8,John,Doe,209 Park Rd,See,FL,33870,,,
b#abc.com,cde.com,2011-11-07 00:00:00,4.4.4.4,Erickson,Crast,136 Kua St # 1367,Pearl,HI,96782,,8084568190,
I would like to get back
a#abc.com,John
b#abc.com,Erickson
So basically email address and First Name
I know I can do this in PowerShell, but maybe a find and replace in UltraEdit will be faster.
Note: some fields are not provided and show up as ",,", meaning they were left empty when the user signed up, but the number of commas in each line is always the same: 12.
So basically the fields are separated by ",". Without validating the content (the email, timestamp, etc. each have a certain format that could also be checked), let's just extract the values of the first and fifth fields.
I'd suggest a replace operation where you search for
^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$
and replace it with
\1 # \2
Options: "Regular Expressions: Unix".
(Just inserted the # to have a separator, although the first whitespace would be sufficient. But you'll get the idea, I assume...)
Result:
a#abc.com # John
b#abc.com # Erickson
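The same pattern can be tried in Python before running it against the real file. This is a rough sketch with a hypothetical function name. It joins the two captured fields with a comma rather than " # ", to match the output requested in the question.

```python
import re

# Capture field 1, skip fields 2 to 4, capture field 5, discard the rest of the line.
FIELD_PATTERN = re.compile(
    r'^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$',
    re.MULTILINE,
)


def email_and_first_name(text):
    """Reduce each line to 'field1,field5'."""
    return FIELD_PATTERN.sub(r'\1,\2', text)
```

Using [^,]* rather than [^,]+ means empty fields (the ",," case) do not break the match.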

Trimming text from a multi-line entry in postgres with regex

I have a column "verbatim" where each entry contains multiple lines. Here's an example:
Dummy field1:Text
Tell Us More:Text to capture
Dummy field2:Text
I'd like to capture only the text that follows Tell Us More: on the second line and put that value into the column verbatim_scrubbed. In the example above, Text to capture would be the entry in verbatim_scrubbed.
I'm not that great with postgres and regexp, so I was hoping somebody could help me out here. Was thinking of something similar to the following:
update TABLE
set verbatim_trimmed = array_to_string(regexp_matches(verbatim,'tell us more:(.*)','gi'));
This doesn't work, but I have a feeling something similar may work.
Perhaps there is a direct way to capture the Text to capture value without the carriage return \r and newline \n characters (without using regexp_replace).
Here is what you can do:
select regexp_replace(array_to_string(regexp_matches(verbatim, '^Tell Us More:(.*)$','n'),'',''), E'[\\r\\n]', '' ) from my_table;
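To see what the pattern captures, the same extraction can be reproduced in Python. This is a sketch for experimenting with sample values only, not a replacement for the SQL, and the function name is hypothetical. It shows why the \r needs stripping: with Windows line endings the captured group ends in a carriage return.

```python
import re

# MULTILINE makes ^ and $ match at line boundaries, like the 'n' flag in the query above.
TELL_US_MORE = re.compile(r'^Tell Us More:(.*)$', re.IGNORECASE | re.MULTILINE)


def scrub_verbatim(verbatim):
    """Return the text after 'Tell Us More:' on its line, or None if the line is absent."""
    match = TELL_US_MORE.search(verbatim)
    if match is None:
        return None
    # '.' does not match \n but does match \r, so strip any trailing line-ending debris.
    return match.group(1).strip('\r\n')
```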

Django custom template filter to highlight a column in a block of text

I'm rendering a list in an HTML template using {{ my_list | join:"<\br>"}} , and it appears as...
$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68
$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62
$GPGGA,062513,2816.8176,S,15322.3181,E,1,04,2.6,72.6,M,37.5,M,,*67
$GPGGA,062514,2816.8176,S,15322.3180,E,1,03,2.6,72.6,M,37.5,M,,*66
$GPGGA,062515,2816.8176,S,15322.3180,E,6,03,2.6,72.6,M,37.5,M,,*60
I am attempting to use regular expressions to insert the CSS at the 4th and 5th commas so I can highlight the text in this column, however I'm not able to figure out the expression to do this. Other methods to achieve this also appreciated.
Other info:
1) each line ends with a '\n'. Although this can be removed and the HTML display is unchanged, I've left it in for the regular expression to use if required.
2) The string will not always have a nice header such as '$GPGGA' in this example, although I could add one to help ID the start of the line if required by the regex.
3) The columns may not be a uniform number of characters as indicated in this example.
The filters I'm working on are as follows
@register.filter(is_safe=True)
def highight_start(text):
    return re.sub('regex to find 4th comma in each line', ",<span class='my_highlight'>", text, flags=re.MULTILINE)

@register.filter(is_safe=True)
def highight_end(text):
    return re.sub('regex to find 5th comma in each line', "</span>,", text, flags=re.MULTILINE)
Regards
You can achieve that by replacing the 5th value with the value itself wrapped in your <span> tags.
RegEx: ^((?:[\w\d\.\$]+,){4})([\d\.]+)
Replacement: \1<span class='my_highlight'>\2</span>
Explained demo here: http://regex101.com/r/cX5iA0
Note: I assumed the 5th value will be digits and dots
Thanks @ka, who got me on track with this solution. My working filter uses:
expression = '^((?:[^,]+,){4})([^,]+)'
replace = r'\g<1><span class="my_highlight">\g<2></span>'
#[^,] also allows matching of hidden HTML tags in the text
#To get the groups to insert back into the text and not be overwritten, they need to be referenced as indicated in 'replace'.
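Put together as a plain function, the working expression could look roughly like the sketch below. The function name and the css_class parameter are made up for illustration, and the Django registration is left out so the snippet runs on its own. In a template filter module the function would additionally be decorated with @register.filter.

```python
import re

# Group 1: the first four comma-terminated fields; group 2: the fifth field.
HIGHLIGHT_PATTERN = re.compile(r'^((?:[^,]+,){4})([^,]+)', re.MULTILINE)


def highlight_fifth_column(text, css_class='my_highlight'):
    """Wrap the fifth comma-separated value of every line in a <span>."""
    # \g<1> and \g<2> put the captured groups back around the inserted tags.
    replacement = r'\g<1><span class="%s">\g<2></span>' % css_class
    return HIGHLIGHT_PATTERN.sub(replacement, text)
```

Because [^,]+ requires at least one character, a line whose first five fields include an empty one is left unhighlighted rather than mangled.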