How do I use regular expressions in Jinja2? - regex

I'm new to Jinja2 and so far I've been able to do most of what I want. However, I need to use regular expressions and I can't seem to find anything anywhere in the documentation or on teh Googles.
I'd like to create a macro that mimics the behavior of this in Javascript:
function myFunc(str) {
    return str.replace(/someregexhere/, '').replace(' ', '_');
}
which will remove characters in a string and then replace spaces with underscores. How can I do this with Jinja2?

There is an already existing filter called replace that you can use if you don't actually need a regular expression. Otherwise, you can register a custom filter:
{# Replace method #}
{{my_str|replace("some text", "")|replace(" ", "_")}}
# Custom filter method
import re

def regex_replace(s, find, replace):
    """A non-optimal implementation of a regex filter"""
    return re.sub(find, replace, s)

jinja_environment.filters['regex_replace'] = regex_replace
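As a rough sketch of how the filter function behaves on its own, here is a plain-Python version of the JavaScript from the question. The helper name my_func, the default value for replace, and the sample pattern are illustrative assumptions, not part of Jinja2. Jinja2 passes the piped value as the first argument, so once the function is registered as above, a template call such as {{ my_str|regex_replace("[0-9]+", "") }} would invoke regex_replace(my_str, "[0-9]+", "").

```python
import re


def regex_replace(s, find, replace=""):
    """Return s with every match of the regex `find` replaced by `replace`."""
    return re.sub(find, replace, s)


def my_func(s, pattern):
    # Hypothetical helper mirroring the JavaScript in the question.
    # Unlike the JavaScript version, re.sub and str.replace act on
    # every occurrence, not only the first one.
    return regex_replace(s, pattern).replace(" ", "_")


# The sample pattern strips everything that is not a word character or a space.
print(my_func("hello, big world!", r"[^\w ]"))
```

Inside a template, the same chain would be the custom filter followed by the built-in replace filter.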

Related

How to do a regex search in a GtkSourceBuffer

I am using GtkSourceView with a GtkSourceBuffer.
I need to do a regular expression search on its contents, and I know that GtkSourceBuffer is a subclass of GtkTextBuffer.
I'd like to do something like the Python code below, where search_text is a regular expression.
search_text = 'some regular expression'
source_buffer = source_view.get_buffer()
match_start = source_buffer.get_start_iter()
result = match_start.forward_search(search_text, 0, None)
if result:
    match_start, match_end = result
    source_buffer.select_range(match_start, match_end)
The regex isn't too complex: search_text = '/file_name\S*'. (Basically I want to match all file names in a document that are preceded by a separator character /, start with a common file name, and end with a sequence of non-space characters, including the file extension).
The Gtk.GtkTextIter.forward_search() function only seems to accept these three flags, so I do not see a way of specifying that the search string is a regular expression...
Gtk.TextSearchFlags.VISIBLE_ONLY
Gtk.TextSearchFlags.TEXT_ONLY
Gtk.TextSearchFlags.CASE_INSENSITIVE
How can I achieve a regex search on GtkSourceBuffer or GtkTextBuffer ?
You should take a look at SearchSettings, which allows you to enable regex and set search text.
After that you create a SearchContext and use it to search (forward or backward methods)
Also, GtkTextBuffer can return its text with get_text, but that's not what you are looking for.
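If the SearchContext route is not available, one possible fallback is to run Python's re module over the text returned by get_text. The sketch below covers only the pure-Python part. The function name find_regex_offsets and the sample text are made up for illustration. In a GTK application the text would come from the buffer, and each offset could be turned back into an iterator with the buffer's get_iter_at_offset method, assuming the buffer holds only plain text so that character offsets line up.

```python
import re


def find_regex_offsets(text, pattern):
    """Return (start, end) character offsets of every regex match in text."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


# Sample text standing in for the buffer contents (an assumption for this sketch).
text = "see /file_name_a.txt and /file_name_b.log"

# The pattern is the one from the question.
for start, end in find_regex_offsets(text, r"/file_name\S*"):
    print(start, end, text[start:end])
```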

search results highlighting with custom template tag

I have a custom template tag to highlight the query in search results:
def highlight(text, word):
    return mark_safe(text.replace(word, "<Strong>%s</Strong>" % word))
It works, but the match is case-sensitive. I want to make it case-insensitive using a regular expression, but I have no idea whether mark_safe will support that, and I didn't find any documentation or example for this scenario.
word = search query
text = search result
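Use the sub method of the re module, which finds every match of the pattern in the given string and replaces it with repl.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Code:
import re
from django.utils.safestring import mark_safe

def highlight(text, word):
    # re.escape makes the query match literally;
    # \g<0> re-inserts the matched text with its original case.
    result = re.sub(re.escape(word), r"<Strong>\g<0></Strong>", text, flags=re.IGNORECASE)
    return mark_safe(result)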
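Because mark_safe turns off auto-escaping, it may be worth escaping the text before adding the tags. The following is a minimal sketch of that idea without Django. The name highlight_text is hypothetical, and in a real template tag the returned string would still be passed to mark_safe. One known limitation: a query such as "lt" can match inside an escaped entity like &lt;.

```python
import html
import re


def highlight_text(text, word):
    """Wrap case-insensitive matches of word in <strong> tags, escaping HTML first."""
    escaped_text = html.escape(text)
    if not word:
        # An empty pattern would match between every character.
        return escaped_text
    # Escape the query the same way as the text, then make it regex-literal.
    pattern = re.escape(html.escape(word))
    # \g<0> re-inserts the matched text, so the original case is preserved.
    return re.sub(pattern, r"<strong>\g<0></strong>", escaped_text, flags=re.IGNORECASE)


print(highlight_text("Hello World hello", "hello"))
```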

Group Extraction with Regular Expressions [duplicate]

I want a regular expression to extract the title from a HTML page. Currently I have this:
title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '')
Is there a regular expression to extract just the contents of <title> so I don't have to remove the tags?
Use ( ) in regexp and group(1) in python to retrieve the captured string (re.search will return None if it doesn't find the result, so don't use group() directly):
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
    title = title_search.group(1)
Note that starting in Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to improve a bit on Krzysztof Krasoń's solution by capturing the match result directly within the if condition as a variable and re-use it in the condition's body:
# pattern = '<title>(.*)</title>'
# text = '<title>hello</title>'
if match := re.search(pattern, text, re.IGNORECASE):
    title = match.group(1)
# hello
Try using capturing groups:
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
May I recommend Beautiful Soup? It is a very good library for parsing an entire HTML document.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")
titleName = soup.title.string
Try:
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
re.search('<title>(.*)</title>', s, re.IGNORECASE).group(1)
The provided pieces of code do not cope with Exceptions
May I suggest
getattr(re.search(r"<title>(.*)</title>", s, re.IGNORECASE), 'groups', lambda:[u""])()[0]
This returns the first captured group, or an empty string by default if the pattern has not been found.
I'd think this should suffice:
import re
pattern = re.compile(r'<title>([^<]*)</title>', re.MULTILINE|re.IGNORECASE)
pattern.search(text)
... assuming that your text (HTML) is in a variable named "text."
This also assumes that there are no other HTML tags which can be legally embedded inside of an HTML TITLE tag and there exists no way to legally embed any other < character within such a container/block.
However ...
Don't use regular expressions for HTML parsing in Python. Use an HTML parser! (Unless you're going to write a full parser, which would be a lot of extra, redundant work when various HTML, SGML and XML parsers are already in the standard libraries.)
If you're handling "real world" tag soup HTML (which is frequently non-conforming to any SGML/XML validator) then use the BeautifulSoup package. It isn't in the standard libraries (yet) but is widely recommended for this purpose.
Another option is: lxml ... which is written for properly structured (standards conformant) HTML. But it has an option to fallback to using BeautifulSoup as a parser: ElementSoup.
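For illustration, here is a minimal sketch of the parser approach using only the standard library's html.parser module rather than BeautifulSoup or lxml. The names TitleParser and extract_title are invented for this example, and it returns only the first title it sees.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the first title's text, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><TITLE >Hello</TITLE></head></html>"))
```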
The currently top-voted answer by Krzysztof Krasoń fails with <title>a</title><title>b</title>. Also, it ignores title tags crossing line boundaries, e.g., for line-length reasons. Finally, it fails with <title >a</title> (which is valid HTML: White space inside XML/HTML tags).
I therefore propose the following improvement:
import re
def search_title(html):
    m = re.search(r"<title\s*>(.*?)</title\s*>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else None
Test cases:
print(search_title("<title >with spaces in tags</title >"))
print(search_title("<title\n>with newline in tags</title\n>"))
print(search_title("<title>first of two titles</title><title>second title</title>"))
print(search_title("<title>with newline\n in title</title\n>"))
Output:
with spaces in tags
with newline in tags
first of two titles
with newline
in title
Ultimately, I go along with others recommending an HTML parser - not only, but also to handle non-standard use of HTML tags.
I needed something to match package-0.0.1 (name, version) but want to reject an invalid version such as 0.0.010.
See regex101 example.
import re
RE_IDENTIFIER = re.compile(r'^([a-z]+)-((?:(?:0|[1-9](?:[0-9]+)?)\.){2}(?:0|[1-9](?:[0-9]+)?))$')
example = 'hello-0.0.1'
if match := RE_IDENTIFIER.search(example):
    name, version = match.groups()
    print(f'Name: {name}')
    print(f'Version: {version}')
else:
    raise ValueError(f'Invalid identifier {example}')
Output:
Name: hello
Version: 0.0.1
Is there a particular reason why no one suggested using lookahead and lookbehind? I got here trying to do the exact same thing and (?<=<title>).+(?=<\/title>) works great. It will only match what's between the tags, because the lookarounds themselves are not part of the match, so you don't have to do the whole group thing.
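A small sketch of the lookaround idea in Python follows. The function name title_via_lookaround is made up, and the pattern uses a non-greedy .+? with re.DOTALL, which is a variation on the pattern above rather than the same thing. Python's re module requires a fixed-width lookbehind, so a variant such as <title\s*> would not be accepted inside (?<=...).

```python
import re


def title_via_lookaround(html):
    """Return the text between <title> and </title>, or None if absent."""
    # The lookbehind and lookahead assert the tags without consuming them,
    # so group(0) is just the title text.
    m = re.search(r"(?<=<title>).+?(?=</title>)", html, re.IGNORECASE | re.DOTALL)
    return m.group(0) if m else None


print(title_via_lookaround("<title>hello</title>"))
```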

Remove chars in a string Django python

When I post a value from my page, an extra string is created, and I would like to remove the 'pattern = "' part. Is there a tag or replace function I can use to remove pattern = " and the ending "? Please find the scenario below:
pattern = "apple"
Desired output
apple
I tried using the following, but with no success. Is there another method I could use? I'm a newbie at Django and Python.
{{ pattern|split }}
Write a custom templatetag and use a regular expression (a capture group would do the trick) to replace the desired part.
If the quotation marks are added to your string when you are submitting the form, you could use the .strip() method when accessing the POST data. You can also specify what characters you want to remove. Check it out here: http://docs.python.org/2/library/stdtypes.html
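Both suggestions can be sketched in plain Python. The function names and the exact pattern below are assumptions for illustration. In a Django project the regex version would live in a custom template filter, and the strip version would be applied to the value read from the POST data.

```python
import re


def strip_quotes(value):
    """Remove leading and trailing double quotes, as str.strip allows."""
    return value.strip('"')


def extract_quoted(raw):
    """Return the text inside the first pair of double quotes, or raw unchanged."""
    # The capture group holds everything between the quotes.
    m = re.search(r'"([^"]*)"', raw)
    return m.group(1) if m else raw


print(strip_quotes('"apple"'))
print(extract_quoted('pattern = "apple"'))
```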

Match ${variable} occurrences in a text using regex

I am currently working in an application where I need to find all occurrences of strings like ${[0-9-a-zA-Z]} in a bigger string. Here is my method:
def countVariables(str) {
    def pattern = ~'${sss}'
    def matcher = str =~ pattern
    print matcher.count
}
Now the problem.
When I pass a string like "asidb ${sss} asodniasndin", I get:
groovy.lang.MissingPropertyException: No such property: sss for class: ConsoleScript83
I think that, given that in Groovy ${} are properties, I'm having these conflicts.
In this case, I would have to run the whole text searching for the dollar sign and replacing it for something else? Or is there a simpler way to do this?
Regards!
Are you using single quotes so Groovy doesn't do the expansion and just gives you a string?
I.e.:
countVariables( 'asidb ${sss} asodniasndin' )
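For comparison, here is the same counting idea sketched in Python rather than Groovy. The names VARIABLE_RE and count_variables are illustrative. In the regular expression itself, $, { and } are escaped so that they match literally, and the character class follows the one described in the question.

```python
import re

# \$ and \{ \} are escaped so they are matched as literal characters.
VARIABLE_RE = re.compile(r"\$\{([0-9a-zA-Z-]+)\}")


def count_variables(text):
    """Count ${name} placeholders in text."""
    return len(VARIABLE_RE.findall(text))


print(count_variables("asidb ${sss} asodniasndin"))
```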