Pygments syntax highlighter in python tkinter text widget - python-2.7

I have been trying to incorporate syntax highlighting with the tkinter text widget. However, using the code found on this post, I cannot get it to work. There are no errors, but the text is not highlighted and a line is skipped after each character. If there is a better way to incorporate syntax highlighting with the tkinter text widget, I would be happy to hear it. Here is the smallest code I could find that replicates the issue:
import Tkinter
import ScrolledText
from pygments import lex
from pygments.lexers import PythonLexer
root = Tkinter.Tk(className=" How do I put an end to this behavior?")
textPad = ScrolledText.ScrolledText(root, width=100, height=80)
textPad.tag_configure("Token.Comment", foreground="#b21111")
code = textPad.get("1.0", "end-1c")
# Parse the code and insert into the widget
def syn(event=None):
    for token, content in lex(code, PythonLexer()):
        textPad.insert("end", content, str(token))
textPad.pack()
root.bind("<Key>", syn)
root.mainloop()
So far, I have not found a solution to this problem (otherwise I would not be posting here). Any help regarding syntax highlighting a tkinter text widget would be appreciated.
Note: This is on python 2.7 with Windows 7.

The code in the question you linked to was designed more for highlighting already existing text, whereas it looks like you're trying to highlight it as you type.
I can give some suggestions to get you started, though I've never done this and don't know what the most efficient solution is. The solution in this answer is only a starting point; there's no guarantee it is actually suited to your problem.
The short synopsis is this: don't set up a binding that inserts anything. Instead, just highlight what was inserted by the default bindings.
To do this, the first step is to bind on <KeyRelease> rather than <Key>. The difference is that <KeyRelease> will happen after a character has been inserted whereas <Key> happens before a character is inserted.
Second, you need to get tokens from the lexer and apply tags to the text for each token. To do that you need to keep track of where in the document the lexer is, and then use the length of the token to determine the end of the token.
In the following solution I create a mark ("range_start") to designate the current location in the file where the pygments lexer is, and then compute the mark "range_end" based on the start and the length of the token returned by pygments. I don't know how robust this is in the face of multi-byte characters. For now, let's assume single-byte characters.
def syn(event=None):
    textPad.mark_set("range_start", "1.0")
    data = textPad.get("1.0", "end-1c")
    for token, content in lex(data, PythonLexer()):
        textPad.mark_set("range_end", "range_start + %dc" % len(content))
        textPad.tag_add(str(token), "range_start", "range_end")
        textPad.mark_set("range_start", "range_end")
This is crazy inefficient since it re-applies the highlighting to the whole document on every keypress. There are ways to minimize that, such as only highlighting after each word, or when the GUI goes idle, or some other sort of trigger.
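An alternative to floating marks is to translate the flat character offsets that pygments gives you into the widget's "line.column" indices yourself. A minimal sketch (offset_to_index is a hypothetical helper, not a Tkinter API; it assumes the widget's contents match the string you lexed):

```python
def offset_to_index(text, offset):
    """Convert a flat character offset into a Tkinter-style 'line.column'
    index. Tkinter lines are 1-based, columns are 0-based, so 0 -> '1.0'."""
    before = text[:offset].split("\n")
    return "%d.%d" % (len(before), len(before[-1]))

# A token starting at `start` would then be tagged from
# offset_to_index(data, start) to offset_to_index(data, start + len(content)).
print(offset_to_index("abc\ndef", 5))  # → 2.1
```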

To highlight certain words you can do this:
textarea.tag_remove("tagname", "1.0", tkinter.END)
first = "1.0"
while True:
    first = textarea.search("word_you_are_looking_for", first, nocase=False, stopindex=tkinter.END)
    if not first:
        break
    last = first + "+" + str(len("word_you_are_looking_for")) + "c"
    textarea.tag_add("tagname", first, last)
    first = last
textarea.tag_config("tagname", foreground="#00FF00")
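The same whole-word scan can be prototyped outside the widget with re.finditer, which yields the start/end offsets you would convert into indices for tag_add (a sketch with my own sample data, not tied to any widget):

```python
import re

def find_word_spans(text, word):
    """Return (start, end) character offsets of every whole-word occurrence."""
    return [m.span() for m in re.finditer(r"\b%s\b" % re.escape(word), text)]

print(find_word_spans("spam eggs spam", "spam"))  # → [(0, 4), (10, 14)]
```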

Related

Regex to locate a specific sentence fragment but there may be an underscore some where in the string

I am trying to find a string that may exist in a Windows menu item. As such, a simple text search is complicated by the potential presence of an underscore character, which may be anywhere in the string. For example, I may be looking for "Import File" but the resulting string may be any of the following strings.
These are easy enough:
Import File
_Import File
But these elude simple grepping:
I_mport _File
Im_port File
Impor_t File
I_{0,1}m_{0,1}p_{0,1}o_{0,1}r_{0,1}t _{0,1}F_{0,1}i_{0,1}l_{0,1}e
This works, but it's clunky and prone to error, and it means that every time I need to look for a new menu item I have to completely reconstruct the pattern. Is there an easier way?
This is the regex tester URL: https://regex101.com/r/7ptBHG/2
UPDATED regex tester URL: https://regex101.com/r/7ptBHG/3
I'm using the VS2015 editor's "Use Regular Expressions" feature.
UPDATE -> RESPONSE TO QUESTIONS:
I hadn't considered using ? instead of {0,1}. That does make it a bit less cumbersome.
I_?m_?p_?o_?r_?t _?F_?i_?l_?e
If you provide an answer, I'll accept it. I did ask, "Is there an easier way?"
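One easier way (my own sketch, not from the thread) is to generate the pattern from the plain label by inserting an optional underscore between every pair of characters. This is slightly more permissive than the hand-written pattern (it also allows an underscore just before the space), which is usually what you want:

```python
import re

def menu_pattern(label):
    """Build a regex tolerating one optional accelerator underscore
    between any two characters of the menu label."""
    return "_?".join(re.escape(ch) for ch in label)

pattern = menu_pattern("Import File")
for candidate in ["Import File", "_Import File", "I_mport _File", "Impor_t File"]:
    assert re.search(pattern, candidate)
```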

How to translate multiline string in Django models

I use ugettext_lazy as _, and in a models file my string is represented in this way:
s = _("firstline"
      "secondline"
      "thirdline")
But after running makemessages I found that in the .po file only "firstline" is marked for translation, the rest are absent. I wouldn't like to avoid using multilining, so is there any way to make translation work with this?
UPD:
I should complement my question: I need my multiline strings to be processed by Django's makemessages.
The best solution I can imagine so far, is
s = (str(_("firstline")) +
     str(_("secondline")) +
     str(_("thirdline")))
Edit: Goodguy mentions that makemessages won't do Python parsing, hence won't properly collect those kinds of "multiline" strings.
The first part is actually true and I stand corrected on this (my bad) - BUT xgettext does the same adjacent-string concatenation as Python, as mentioned here:
Some internationalization tools -- notably xgettext -- have already
been special-cased for implicit concatenation,
and here:
Note also that long strings can be split across lines, into multiple
adjacent string tokens. Automatic string concatenation is performed at
compile time according to ISO C and ISO C++; xgettext also supports
this syntax.
and as a matter of fact my co-workers and I have been using this very pattern for years on dozens of projects.
s = _("firstline" "secondline" "thirdline")
xgettext will automatically concatenate literal strings separated only by whitespace (spaces, newlines, etc.), so this is the exact equivalent of
s = _("firstlinesecondlinethirdline")
If you only get the first of those strings in your po file then the problem is elsewhere - either your snippet is NOT what you actually have in your code, or your po file is not correctly updated, or anything else (a broken xgettext version, maybe?).
NB: this:
s = (str(_("firstline")) +
     str(_("secondline")) +
     str(_("thirdline")))
is about the worst possible solution from the translator's point of view (and can even make your message impossible to translate in some languages).
I had a similar issue and solved it using standard Python multi-line but single-string format. For example, for your string:
s = _("firstline\
secondline\
thirdline")
Update: The actual problem is that makemessages does not do Python (and JS, etc.) parsing, so it will not concatenate multiline strings as expected. The solution below will not work either (it won't see computed values).
Unfortunately, you have to find another way to format your message, preferably by splitting it into single-line parts.
Previous answer:
ugettext_lazy can only accept a single argument, so it's up to you how you want your translations to be.
If you are fine with "firstline" "secondline" "thirdline" being exported for translation as a single sentence you can do something like this:
s = _(' '.join(["firstline", "secondline", "thirdline"]))
If you want to keep them as separate translation sentences, then something like this may also work:
s = ' '.join(_(line) for line in ["firstline", "secondline", "thirdline"])
Or just call _ on every line and concatenate them
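For that last option, a minimal sketch using the stdlib gettext in place of Django's ugettext_lazy (with no catalog installed, _ simply returns its argument, so the output below is the untranslated fallback):

```python
from gettext import gettext as _

lines = ["firstline", "secondline", "thirdline"]
# Each line is a separate msgid; the pieces are joined after translation.
s = " ".join(_(line) for line in lines)
print(s)  # → firstline secondline thirdline
```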

Python 2.7: Handling Unicode Objects

I have an application that needs to be able to handle non-ASCII characters of unknown encoding. The program may delete or replace these characters (if they are discovered in a user dictionary file); otherwise they need to pass cleanly through unaltered. What's mind-boggling is, it works one minute, then I make some seemingly trivial change, and now it fails with UnicodeDecode, UnicodeEncode, or kindred errors. Addressing this has led me down the road of cargo-cult programming: making random tweaks that get it working again, but I have no idea why. Is there a general-purpose solution for dealing with this, perhaps even the creation of a class that modifies the normal way Python deals with strings?
I'm not sure what code to include as about five separate modules are involved. Here is what I am doing in abstract terms:
Taking a text from one of two sources: text that the user has pasted directly into a Tkinter toplevel window; text captured from the Win32 clipboard via a hotkey command.
The text is processed, including the removal of whitespace characters, then certain characters/words are replaced or simply deleted based on a customizable user dictionary.
The result is then returned to the Tkinter GUI or the Win32 clipboard, depending on whether or not the keyboard shortcut was used.
Some details that may be relevant:
All modules use
# -*- coding: utf-8 -*-
The user dictionary is saved in UTF-16 LE with BOM (a function removes BOM characters when parsing the file). The file object is instantiated with
self.pf = codecs.open(self.pattern_fn, 'r', 'utf-16')
The entry points for the text are a Tkinter GUI Text widget:
text = self.paste_to_field.get(1.0, Tkinter.END)
Or from the clipboard:
text = win32clipboard.GetClipboardData(win32clipboard.CF_UNICODETEXT)
And example error:
File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u201d' in position
2: character maps to <undefined>
Furthermore, the same text might work when tested on OS X (where I do development work) but cause an error on Windows.
Regular expressions are used, however in this case no non-ASCIIs are included in the pattern. For non-ASCIIs I simply
text = text.replace(old, new)
Another thing to consider: iterating with for c in text is no good, because a non-ASCII character may look like several characters to Python; the normal word/character distinction no longer holds. Also, using bad_letter = repr(non_ASCII) doesn't help, since str(bad_letter) merely returns a string of the escape sequence; it can't restore the original character.
Sorry if this is extremely vague. Please let me know what info I can provide to help clarify. Thanks in advance for reading this.
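One general-purpose discipline that usually ends this whack-a-mole is to decode everything to unicode at the input boundary and encode only at the output boundary. A minimal sketch (the helper and its encoding list are my own assumptions, not from the question; the syntax runs on both Python 2 and 3):

```python
def to_text(data, encodings=("utf-8", "utf-16", "cp1252")):
    """Return unicode text: try a few likely encodings, then fall back
    to replacement characters instead of raising."""
    if isinstance(data, bytes):
        for enc in encodings:
            try:
                return data.decode(enc)
            except UnicodeDecodeError:
                continue
        return data.decode("utf-8", errors="replace")
    return data

print(to_text(b"hello"))  # → hello
```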

ColdFusion -- Do I need URLDecode with form POSTs? / URLDecode randomly removes one character

I'm using a WYSIWYG to allow users to format text. This is the error-causing text:
<p><span style="line-height: 115%">This text starts with a 'T'</span></p>
The error is that the 'T' in "This", or whatever the first letter happens to be, is randomly removed when using URLDecode and saving to the DB. Removing URLDecode on the server side seems to fix it without any negative side-effects (the DB contains the same information).
The documentation says that
Query strings in HTTP are always URL-encoded.
Is this really the case? If so, why doesn't removing URLDecode seem to mess everything up?
So two questions:
Why is URLDecode causing the first text character to be removed like this (it seems to only happen when the line-height property is present)?
Do I really need (or would I even want) to use URLDecode before putting POSTed data into the database?
Edit: I made a test page to echo back the decoded text, and URLDecode is definitely removing that character, but I have no idea why.
I believe decoding is done automatically when the form scope is populated. That's why characters after % (the character used for encoding) are removed: you are trying to decode the string a second time.
For security reasons you might be interested in stripping script tags, or even cleaning up the HTML using a white-list. Try searching CFLib.org for applicable functions.
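The double-decoding pitfall is easy to reproduce outside ColdFusion. In this Python sketch (my own illustration), the first unquote_plus stands in for the automatic decode of the form scope and the second for the redundant URLDecode call:

```python
from urllib.parse import quote_plus, unquote_plus

original = 'style="line-height: 115%">T + more'
wire = quote_plus(original)   # what the browser actually sends
once = unquote_plus(wire)     # the framework's automatic decode
twice = unquote_plus(once)    # the redundant explicit decode

print(once == original)   # → True
print(twice == original)  # → False: the literal '+' became a space
```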

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.
Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.
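A rough Python version of that conversion (tag names and sample row are my own; again assuming no escaped commas):

```python
import re

row = "alice,42,london"
# Commas become closing/opening tag pairs; then wrap the whole row.
xml = "<row><f>" + re.sub(",", "</f><f>", row) + "</f></row>"
print(xml)  # → <row><f>alice</f><f>42</f><f>london</f></row>
```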
Regexes make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe
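In Python's re flavor, the same whole-word replacement looks like this (sample text is my own):

```python
import re

text = "the cat sat; please concatenate these strings"
# \b anchors the match at word boundaries, so 'concatenate' is untouched.
result = re.sub(r"\bcat\b", "dog", text)
print(result)  # → the dog sat; please concatenate these strings
```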
Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z0-9_]+) .*/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that came in handy for me. I actually used it three or four times for similar field-to-method-call conversions.
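The same transformation sketched with re.sub (the pattern is a tidied version of the one above; as noted, the position ints still need fixing by hand afterwards):

```python
import re

ddl = """field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,"""

# Capture the column name, discard the type, emit a setString call per line.
calls = re.sub(r"([a-z0-9_]+) .*", r"pstmt.setString(1, \1);", ddl)
print(calls)
# pstmt.setString(1, field_1);
# pstmt.setString(1, field_2);
# pstmt.setString(1, field_3);
```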
I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.
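A quick Python equivalent of that reformatting (my own sketch of the pattern, not the poster's editor syntax):

```python
import re

decls = """int item1
double item2"""

# Capture type and name, rewrite each declaration as a method stub.
methods = re.sub(r"(\w+) (\w+)", r"public void \2(\1 \2){\n}", decls)
print(methods)
```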
I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200 of them) and I need them in a '0000000444','000000004445' format. It works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.
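That quoting-and-padding job can be sketched in a few lines (the width of 10 digits is my own assumption):

```python
numbers = """444
445
4445"""

# Zero-pad each visit number to 10 digits, quote it, and comma-join.
quoted = ",".join("'%s'" % n.zfill(10) for n in numbers.split())
print(quoted)  # → '0000000444','0000000445','0000004445'
```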
One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make sure a user has entered a correct email address (syntactically speaking) into a web form, this is about the only way of checking it thoroughly.
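A crude tag-stripping regex for previews might look like this (fine for display snippets, NOT a substitute for a real HTML sanitizer, exactly because of the security concerns above):

```python
import re

html = '<p><b>Breaking</b> news: <img src="gone.png"> story</p>'
plain = re.sub(r"<[^>]+>", "", html)
print(plain)  # → Breaking news:  story
```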
I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";
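The same trick in Python's re dialect, with the newline written explicitly instead of a pilcrow (sample text and break widths are my own):

```python
import re

s = ("I recently discussed editors with a co-worker and we "
     "could not agree on anything at all, which was a shame.")

# Break after 20-60 characters, but only at a space, re-quoting each piece.
wrapped = re.sub(r"(.{20,60} )", '\\1"\n + "', s)
print('"%s"' % wrapped)
```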
The first thing I do with any editor is try to figure out its regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.
Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match, using the $1, $2, ... syntax to output the portion of the match you want to maintain.
I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using an anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop recording.
Now all that is needed to modify the next line is to repeat the macro.
I could live without support for regex but could not live without anonymous keyboard macros.