Getting Only The Text and Not Any of the Trailing Blank Lines from a Text Widget in Python 2.7 - python-2.7

I'm working with Python 2.7 and tkinter.
I have a Text widget that I fill with lines of text read from a file, each line terminated with a "\n". The text in the Text widget may be modified later.
Now I want to get only the text from the Text widget and ignore any trailing blank lines that may be present. The get() method returns everything up to the end of the Text widget, including any trailing blank lines.
How can I get the text and not the trailing blank lines?

The text widget doesn't have a good way to filter out the results you get from get. You can get all but the last newline that is automatically added by tkinter by using the index end-1c, but if you have multiple blank lines at the end, the easiest way is to strip out the newlines after fetching the data:
data = the_widget.get("1.0", "end-1c").rstrip("\n")
Another way would be to use the search method going backwards from the end, with a regular expression that finds the first non-blank line. You can then get all the text up to the end of that line:
index = text.search('^.+', "end", backwards=True, regexp=True)
data = text.get("1.0", "%s lineend" % index) if index else ""
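Note that rstrip("\n") only removes newline characters, so a trailing line holding just spaces or tabs would survive it. If that matters, a small helper along these lines might do; this is only a sketch, the name strip_trailing_blank_lines is made up for illustration, and the raw string stands in for whatever the_widget.get("1.0", "end-1c") returns:

```python
def strip_trailing_blank_lines(text):
    """Return text without trailing lines that are empty or whitespace-only."""
    lines = text.split("\n")
    # Drop lines from the end while they contain nothing but whitespace.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


# Stand-in for the string returned by the_widget.get("1.0", "end-1c").
raw = "first line\nsecond line\n\n   \n\t\n"
data = strip_trailing_blank_lines(raw)
print(repr(data))
```

Blank lines in the middle of the text are left alone; only the run at the end is removed.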

Related

Regex - return only first src url match

I'm trying to extract the first jpg from a page of text. There are multiple paragraphs with multiple URLs in each, but I only want the first URL/jpg; matching should stop after the first one is returned.
sample page;
this is some text and a url src="https://www.someurl.jpg" more text, more text, more text.
more text, more text
more text, more text.
this is some text and a url src="https://www.anotherurl.jpg" more text, more text, more text.
more text, more text.
Current Code;
(?<=src=")(.*?)(?=")
This code returns both urls. I need the output to be just the first one it finds and stop there, just return the first.
Output required;
https://www.someurl.jpg
any help appreciated.
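Your regex is quite good; just add the surrounding slashes and the g flag:
/(?<=src=")(.*?)(?=")/g
With the g flag, match() returns an array of every match, so its first element is the first URL; without the g flag, match() stops at the first match.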
console.log(`I'm trying to extract the first jpg from a page of text. multiple paragraphs and multiple urls in each, but i only want the first url/jpg, stop after first is matched/returned.
sample page;
this is some text and a url src="https://www.so23123123123l.jpg" more text, more text, more text. more text, more text
more text, more text. this is some text and a url src="https://www.anotherurl.jpg" more text, more text, more text. more text, more text.`.match(/(?<=src=")(.*?)(?=")/ig));
You can read here about regexp flags.
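For comparison, the same pattern can be tried in Python, where re.search returns only the first match and stops there. This is a sketch, not part of the original answer; the page string is a shortened copy of the sample from the question:

```python
import re

# Shortened copy of the sample page from the question.
page = (
    'this is some text and a url src="https://www.someurl.jpg" more text.\n'
    'more text, more text.\n'
    'this is some text and a url src="https://www.anotherurl.jpg" more text.\n'
)

# re.search scans left to right and returns the first match only (or None).
match = re.search(r'(?<=src=")(.*?)(?=")', page)
first_url = match.group(0) if match else None
print(first_url)
```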

Trying to select all text except for Twitter handle in Regex

I've exhausted everything I could find and just can't seem to get this to work. I have a .txt with rows of Twitter posts and I'm trying to delete everything but the #handles mentioned in the text.
For example:
Row1: This is the text of the tweet #Handle1
Row2: This text is meant for #Handle2 and #Handle3
Would result in:
Row1: #Handle1
Row2: #Handle2 #Handle3
I've come up with a regex expression to select the handles as: #[^\W]*
That works for all the handles in the set even if they have a colon or period immediately after them without a space (happens often).
I tried adding the negative lookahead command to it: (?!(#[^\W]*))
But I don't really know what else to add to make it work?
Thanks!
So you can loop through each row, and scan for the twitter handles.
For example,
str = "This text is meant for #Handle2 and #Handle3"
str.scan(/#\w+/).to_a #=> ["#Handle2", "#Handle3"]
Then you can manipulate the array however you want.
Here \w matches any alphanumeric character or underscore; you can adjust the pattern if handles may contain other characters.
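The same scan-and-keep idea can be sketched in Python with re.findall, joining the handles found in each row back into one line. The rows list is an assumption standing in for the lines of the .txt file, with a colon and a period added after two handles to mimic the cases mentioned in the question:

```python
import re

# Stand-in for the rows read from the .txt file.
rows = [
    "This is the text of the tweet #Handle1",
    "This text is meant for #Handle2: and #Handle3.",
]

# Keep only the handles in each row, separated by single spaces.
handles_per_row = [" ".join(re.findall(r"#\w+", row)) for row in rows]

for line in handles_per_row:
    print(line)
```

Trailing punctuation is dropped because \w does not match it.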

Regex Notepad++ Finding, Removing and Replacing Quotations

I have some content in a CSV file which I need to format correctly, the content layout goes like this :
###Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,
Field8,
Field8,
Field8, Field8,
Field8, Field8,
Field8,
"Field8""",Field9,
###
As you can see, Field 8 spans multiple lines and contains quotes and commas, which seems to break it into new fields. What I need is a regex that will identify the 9th comma in from the start of each record (marked by ###), and then go from the closing ### back 2 commas. I then need to format everything in that area across all matching records: remove all the quote marks just in that area and add them back at the start and end, effectively wrapping quotes around the whole of Field 8.
The triple # symbol is present in the file and I would need to use this as a reference to find the start of each record.
I have a Regex which seemed to work previously doing something similar but now does not as the format of the CSV has changed from file to file.
^((?:[^,]+,){8})(.+)((?:,[^,]*){2})$ and replace with $1"$2"$3

RegEX: Matching everything but a specific value

How do I match everything in an HTML response except this piece of text?
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will generate two groups: the text before the match and the text after it.
If the match fails, the text does not contain that string.
If the text contains the text more than once, it is only the first time that is ignored.
If you need to remove all instances of the text you can just as well use a replace string method.
But usually it is a bad idea to use regex on html.
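As a rough Python illustration of both suggestions (the two capture groups, and a plain replace for removing every occurrence), something like the following could be used. The html string is invented for the example, and re.DOTALL is added on the assumption that a real response spans several lines, since . does not match newlines by default:

```python
import re

# Made-up response fragment for illustration.
html = '<form><input name="signed_request" value="abc123"></form>'

# Group 1 is everything before the unwanted text, group 2 everything after it.
pattern = re.compile(r'^(.*?)"signed_request" value="[^"]*"(.*)$', re.DOTALL)
m = pattern.match(html)
before, after = m.groups() if m else (html, "")

# Alternative: remove every occurrence instead of only the first.
cleaned = re.sub(r'"signed_request" value="[^"]*"', "", html)

print(before + after)
print(cleaned)
```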

Django custom template filter to highlight a column in a block of text

I'm rendering a list in an HTML template using {{ my_list | join:"<\br>"}} , and it appears as...
$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68
$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62
$GPGGA,062513,2816.8176,S,15322.3181,E,1,04,2.6,72.6,M,37.5,M,,*67
$GPGGA,062514,2816.8176,S,15322.3180,E,1,03,2.6,72.6,M,37.5,M,,*66
$GPGGA,062515,2816.8176,S,15322.3180,E,6,03,2.6,72.6,M,37.5,M,,*60
I am attempting to use regular expressions to insert the CSS at the 4th and 5th commas so I can highlight the text in this column, however I'm not able to figure out the expression to do this. Other methods to achieve this also appreciated.
Other info:
1) each line ends with a '\n'. Although this can be removed and the HTML display is unchanged, I've left it in for the regular expression to use if required.
2) The string will not always have a nice header such as '$GPGGA' in this example, although I could add one to help ID the start of the line if required by the regex.
3) The columns may not be a uniform number of characters as indicated in this example.
The filters I'm working on are as follows
@register.filter(is_safe=True)
def highight_start(text):
    return re.sub('regex to find 4th comma in each line', ",<span class='my_highlight'>", text, flags=re.MULTILINE)
@register.filter(is_safe=True)
def highight_end(text):
    return re.sub('regex to find 5th comma in each line', "</span>,", text, flags=re.MULTILINE)
Regards
You can achieve that by replacing the 5th value with the value itself wrapped in your <span> tags.
RegEx: ^((?:[\w\d\.\$]+,){4})([\d\.]+)
Replacement: \1<span class='my_highlight'>\2</span>
Explained demo here: http://regex101.com/r/cX5iA0
Note: I assumed the 5th value will be digits and dots
Thanks @ka, who got me on track with this solution. My working filter uses:
expression = '^((?:[^,]+,){4})([^,]+)'
replace = r'\g<1><span class="my_highlight">\g<2></span>'
# [^,] also allows matching of hidden HTML tags in the text.
# The groups must be referenced as \g<1> and \g<2>, as in 'replace', so that they are inserted back into the text rather than overwritten.
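Put together outside of Django, the expression and replacement above can be tried as a plain function. This is a sketch: the name highlight_fifth_column is invented here, and in a real template filter the function would additionally be registered with the register.filter decorator. It assumes the first five fields of every line are non-empty, because [^,]+ needs at least one character per field:

```python
import re

EXPRESSION = r'^((?:[^,]+,){4})([^,]+)'
REPLACE = r'\g<1><span class="my_highlight">\g<2></span>'


def highlight_fifth_column(text):
    # re.MULTILINE lets ^ match at the start of every line, not just the first.
    return re.sub(EXPRESSION, REPLACE, text, flags=re.MULTILINE)


sample = (
    "$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68\n"
    "$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62\n"
)
highlighted = highlight_fifth_column(sample)
print(highlighted)
```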