Django custom template filter to highlight a column in a block of text - regex

I'm rendering a list in an HTML template using {{ my_list | join:"<br>" }}, and it appears as:
$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68
$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62
$GPGGA,062513,2816.8176,S,15322.3181,E,1,04,2.6,72.6,M,37.5,M,,*67
$GPGGA,062514,2816.8176,S,15322.3180,E,1,03,2.6,72.6,M,37.5,M,,*66
$GPGGA,062515,2816.8176,S,15322.3180,E,6,03,2.6,72.6,M,37.5,M,,*60
I am attempting to use regular expressions to insert a <span> with a CSS class between the 4th and 5th commas so that I can highlight the text in this column, but I haven't been able to work out the expression. Other ways to achieve this are also appreciated.
Other info:
1) each line ends with a '\n'. Although this can be removed and the HTML display is unchanged, I've left it in for the regular expression to use if required.
2) The string will not always have a nice header such as '$GPGGA' in this example, although I could add one to help ID the start of the line if required by the regex.
3) The columns may not be a uniform number of characters as indicated in this example.
The filters I'm working on are as follows:
@register.filter(is_safe=True)
def highight_start(text):
    return re.sub('regex to find 4th comma in each line', ",<span class='my_highlight'>", text, flags=re.MULTILINE)
@register.filter(is_safe=True)
def highight_end(text):
    return re.sub('regex to find 5th comma in each line', "</span>,", text, flags=re.MULTILINE)
Regards

You can achieve that by replacing the 5th value with the value itself wrapped in your <span> tags.
RegEx: ^((?:[\w\d\.\$]+,){4})([\d\.]+)
Replacement: \1<span class='my_highlight'>\2</span>
Explained demo here: http://regex101.com/r/cX5iA0
Note: I assumed the 5th value will be digits and dots

Thanks @ka, who got me on track with this solution. My working filter uses:
expression = '^((?:[^,]+,){4})([^,]+)'
replace = r'\g<1><span class="my_highlight">\g<2></span>'
# [^,] also allows matching of hidden HTML tags in the text.
# For the groups to be inserted back into the text rather than overwritten, they must be referenced as shown in 'replace'.
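As a rough sketch of how the expression above behaves outside a template, the following standalone Python applies it to two of the sample lines. The function name highlight_column and its column and css_class parameters are hypothetical, and excluding \n from the character class is an added assumption to keep a match from running across lines. In a Django project the function would still need to be registered as a filter.

```python
import re

def highlight_column(text, column=5, css_class="my_highlight"):
    """Wrap the given 1-based comma-separated column of every line in a <span>."""
    # Group 1: the (column - 1) fields before the target, with their commas.
    # Group 2: the target field itself.
    # Assumption: fields never contain a newline, so \n is excluded as well.
    pattern = r"^((?:[^,\n]+,){%d})([^,\n]+)" % (column - 1)
    # \g<1> and \g<2> put the captured text back so it is not overwritten.
    replacement = r'\g<1><span class="%s">\g<2></span>' % css_class
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)

sample = (
    "$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68\n"
    "$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62\n"
)
highlighted = highlight_column(sample)
print(highlighted)
```

Because the pattern uses + rather than *, a line with an empty field before or at the target column is left unchanged.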

Related

Postgres: remove second occurrence of a string

I am trying to fix bad data in a Postgres DB where photo tags were appended twice.
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
As you can see in the string, the photo tags had already been added, but they were appended to the text again. I want to remove the second occurrence of each tag. The first occurrences are in a particular order, and I want to keep them.
I wrote a function that could construct a regex pattern:
CREATE OR REPLACE FUNCTION dd_trip_photo_tags(tagId int) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
SELECT string_agg(concat('<photo=',media_name,'>.*?(<photo=',media_name,'>)'),'|') FROM t_ddtrip_media WHERE tag_id=tagId $$;
This captures the second occurrence of a certain photo tag.
Then I use regexp_replace to replace the second occurrence:
update t_ddtrip_content set content = regexp_replace(content,dd_trip_photo_tags(332761),'') from t_ddtrip_content where tag_id=332761;
However, this removes all matched tags. I have searched online for days but still can't figure out how to fix this. I'd appreciate any help.
This should work.
Regex 1:
<photo=.+?>
See: https://regex101.com/r/thHmlq/1
Regex 2:
<.+?>
See: https://regex101.com/r/thHmlq/2
Input:
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
Output:
<photo=2-1-1601981-7-1.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-7-1.jpg>
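The two regexes above only list the tags. As an alternative sketch, written in Python rather than SQL, repeated tags can be dropped while the first occurrence of each is kept by using a replacement callback. The function name drop_repeated_photo_tags is hypothetical, and the sketch assumes that any tag seen earlier in the same text is an unwanted duplicate.

```python
import re

def drop_repeated_photo_tags(content):
    """Keep the first occurrence of each <photo=...> tag and remove later repeats."""
    seen = set()

    def keep_first(match):
        tag = match.group(0)
        if tag in seen:
            return ""      # a repeat: remove it
        seen.add(tag)
        return tag         # first occurrence: keep it in place

    return re.sub(r"<photo=[^>]+>", keep_first, content)

content = (
    "The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>"
    "We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>"
)
cleaned = drop_repeated_photo_tags(content)
print(cleaned)
```

The order of the first occurrences is untouched because each kept tag is returned at its original position.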

kimonolabs >Text before comma

I'm trying to scrape a piece of text from a website using Kimonolabs. The text is successfully scraped using the advanced setting:
div > div > ul > li.location > span.value
The text being scraped using this CSS selector is:
Cityname, streetname 1
However, I wish to delete everything after the comma so that only the following remains:
Cityname
I wish to do this with regex, but I'm totally ignorant about it. What I do know is that it has to consist of 3 blocks when using Kimonolabs: https://help.kimonolabs.com/hc/en-us/articles/203043464-Manually-input-regular-expressions
Can anybody help me set up the correct regex? All I've got so far is the following, but it's not the correct markup for Kimonolabs (the dashboard doesn't allow it):
^(.+?),
See the docs you referred to:
The regular expression pattern in kimono is defined in three parts. It's important that any custom regular expression you produce retains the three part notation, with the surrounding ( ) for each part. The first part refers to the pattern to the left of the desired content. The middle part refers to the pattern that the desired content must match and the third part refers to the pattern to the right of the desired content.
So, you seem to need:
/^()([^,]+)()/
Or, /(^)([^,]+)(,)/ (it should be equivalent), and the 2nd capture group (the middle part) should capture the Cityname.
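To check what the middle group captures, the second pattern can be tried in Python's re module. This is only a sketch: Kimono's own regex engine is not modelled here, and its behaviour is assumed to match for a pattern this simple.

```python
import re

# Three-part pattern: (left of content)(desired content)(right of content)
pattern = re.compile(r"(^)([^,]+)(,)")

scraped = "Cityname, streetname 1"
match = pattern.search(scraped)

# Group 2 is the middle part, i.e. everything before the first comma.
city = match.group(2) if match else None
print(city)
```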

Ultraedit, regular expression help, extracting 2 values, comma separated

I have this file where I only want to extract the email address and first name from our client list.
So a sample from the file:
a#abc.com,www.abc.com,2011-11-15 00:00:00,8.8.8.8,John,Doe,209 Park Rd,See,FL,33870,,,
b#abc.com,cde.com,2011-11-07 00:00:00,4.4.4.4,Erickson,Crast,136 Kua St # 1367,Pearl,HI,96782,,8084568190,
I would like to get back
a#abc.com,John
b#abc.com,Erickson
So basically email address and First Name
I know I can do this in powershell but maybe a find and replace in ultraedit will be faster
Note: some fields are not provided, so they show up as ",,", meaning those fields were left empty when the user signed up. The number of commas in each line is always the same: 12.
So basically there are fields separated by ",". Without validating the content (e.g. email or timestamp, which have a specific format that could also be checked), let's just extract the values of the first and fifth fields.
so I'd suggest
a Replace-Operation where you search for
^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$
and replace it with
\1 # \2
Options: "Regular Expressions: Unix".
(Just inserted the # to have a separator, although the first whitespace would be sufficient. But you'll get the idea, I assume...)
Result:
a#abc.com # John
b#abc.com # Erickson
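The same search pattern can be tried outside UltraEdit. The Python sketch below applies it to the two sample lines and joins the captured fields with a comma, as the question asked, instead of " # ". The variable names are hypothetical.

```python
import re

# Capture field 1 and field 5; skip fields 2-4 and discard the rest of the line.
pattern = re.compile(r"^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$", re.MULTILINE)

data = (
    "a#abc.com,www.abc.com,2011-11-15 00:00:00,8.8.8.8,John,Doe,209 Park Rd,See,FL,33870,,,\n"
    "b#abc.com,cde.com,2011-11-07 00:00:00,4.4.4.4,Erickson,Crast,136 Kua St # 1367,Pearl,HI,96782,,8084568190,"
)

extracted = pattern.sub(r"\1,\2", data)
print(extracted)
```

For data where fields may contain quoted commas, a CSV parser would be a safer choice than a regex.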

Regex to match only the first listed item in a block of HTML

My CMS allows PHP keyword replacements, and I'm currently building a format to return the first listed item element in a data field which usually contains an HTML unordered list, but can often contain paragraphs, etc.
If possible, I'd like to use a regular expression to match only the first listed item element li in a returned block, and print it.
One severe limitation is that I cannot use the ^ character, as my CMS (annoyingly) uses that character for modification functions.
So far, I've only come up with: replace:<\/li>.*:</li></ul> - but this is only replacing the first listed item's closing tag in a returned block. What I really need is something like:
replace:anything_that's_not_first_li_element:nothing
I appreciate that this question is a very long shot, so thanks in advance for all constructive responses.
You could use this regex with the s flag.
(?<=<ul>).*?<li>.*?<\/li>
Working regex example:
http://regex101.com/r/hL1zF0
PHP:
$list = '<ul>
<li>first</li>
<li>second</li>
<li>third</li>
<li>fourth</li>
</ul>';
preg_match('/(?<=<ul>).*?<li>.*?<\/li>/s', $list, $matches);
echo $matches[0];
Output:
<li>first</li>
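For comparison, the same pattern can be checked in Python, where re.DOTALL plays the role of the s flag. This sketch only mirrors the PHP above and does not model the CMS replacement syntax. Note that the raw match also includes the whitespace between <ul> and the first <li>, so it is stripped here.

```python
import re

html = """<ul>
<li>first</li>
<li>second</li>
<li>third</li>
<li>fourth</li>
</ul>"""

# Lookbehind for <ul>, then lazily match up to the end of the first <li> element.
# The pattern does not need the ^ character.
match = re.search(r"(?<=<ul>).*?<li>.*?</li>", html, re.DOTALL)

# The match starts right after <ul>, so it begins with a newline; strip it.
first_item = match.group(0).strip() if match else None
print(first_item)
```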

Find/Replace regex to remove html tags

Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML, I have data from one of our company websites that we need in excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using Notepad++ to find and replace
This works for me Notepad++ 5.8.6 (UNICODE)
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
I did this using the following regular expression:
Find this : <.*?>|</.*?>
and
replace with : \r\n (this is for a new line)
Using this regular expression (<.*?>|</.*?>), we can easily get the values between the HTML tags, as below:
I have this input:
<otpion value="123">1</option><otpion value="1234">2</option><otpion value="1235">3</option><otpion value="1236">4</option><otpion value="1237">5</option>
I need to find values between options like 1,2,3,4,5
and got below output :
This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++
Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>
String s = "<option value=\"863\">Viticulture and Enology</option>";
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested with scala, therefore the res1:)
With sed, you would use a little different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++, I don't know the details, but "[0-9]+" should mean 'at least one digit', and "[^<]" means anything but an opening less-than sign, here repeated multiple times. Escaping and backreferences may differ.
Regexes are problematic: if the markup spans multiple lines, the regex will not match it, and if it is commented out, the regex will not notice.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.
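For throwaway use of that kind, the option pattern from the answers above can be sketched in Python as follows. re.DOTALL stands in for the ". matches newline" setting, and the second input string is made up for illustration.

```python
import re

# Group 1 captures the text between the opening and closing option tags.
option = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)

html = '<option value="863">Viticulture and Enology</option>'
text = option.sub(r"\1", html)
print(text)

# Hypothetical input with several options on one line.
many = '<option value="1">A</option><option value="2">B</option>'
labels = option.findall(many)
print(labels)
```

findall returns just the captured labels, which is convenient when the goal is a list to paste into a spreadsheet.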