I have a program that works as intended:
import os
import hashlib
from pprint import pprint

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

lst = []
for dirpath, dirnames, filenames in os.walk('d:\\python\\exercism.io'):
    d = {dirpath: filenames}
    for filename in filenames:
        d[filename] = [os.stat(dirpath).st_mtime, md5(dirpath + '\\' + filename)]
        # d[filename] = [os.stat(dirpath).st_mtime]
        # d[filename] = [md5(dirpath + '\\' + filename)]
    lst.append(d)
pprint(lst)
My question is this:
If I get rid of this line:
d[filename] = [os.stat(dirpath).st_mtime, md5(dirpath + '\\' + filename)]
and try to use the two commented out lines (with a modification -- see below) it fails.
1) Either commented-out line works by itself: I get a key: value pair in which the value is a list.
I then want to add the value of the second commented out line to the list.
I was trying this:
d[filename][1] = md5(dirpath + '\\' + filename)
but I get an index out of range error. The first element of the list
should be item [0], the second should be [1].
Partial output:
[{'ceasar_cipher.py': [1512494094.5630972, '844e069c90ebdb3e1e5f5dd56da2ac2e'],
'd:\\python\\exercism.io': ['ceasar_cipher.py',
'difference_of_squares.py',
'gigasecond.py',
'grains_in_python.py',
'hamming-compare.py',
'isogram.py',
'leap_year.py',
'rna_transcription.py',
'run_length_encoding.py'],
Note the key: 'ceasar_cipher.py' has a value which is a two element list.
I want to construct the exact same output with the two commented out lines
versus the single line I am using now (just so I can, not that I should). My concern is simply what am I doing wrong.
You need to append to your list:
d[filename] = [os.stat(dirpath).st_mtime]
d[filename].append(md5(dirpath + '\\' + filename))
to get the same effect. There is no item at index 1 yet. You need to create it, for example with append().
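A minimal sketch of the difference, using a made-up timestamp and a placeholder string in place of the real md5() result (both are illustrative values, not taken from the program above). Assigning to an index only replaces an element that already exists, while append() grows the list:

```python
# A one-element list: only index 0 exists.
entry = [1512494094.56]

error = None
try:
    entry[1] = "placeholder-hash"   # no slot at index 1 yet
except IndexError as exc:
    error = str(exc)

# append() creates the new slot at the end of the list.
entry.append("placeholder-hash")
print(error)
print(entry)
```

After the append, entry[1] exists and can be reassigned by index like any other element.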
I am new to Python. I am trying to rstrip whitespace, split each line into words, append the words to a list, and then sort the list in alphabetical order. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y
I have been able to fix my code and get the expected output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)
If you are using Python 2.7, I believe you need to use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: append() is a list method, and it takes the item to add as its argument.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # Here we are adding 'r' as the file is opened as read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines list
y = [l.rstrip() for l in lines]
# Print out the result
print y
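For comparison, here is a minimal Python 3 sketch of what the asker's fixed loop does, assuming the goal is the sorted list of distinct words. The helper name unique_sorted_words is hypothetical, and it accepts any iterable of lines so it can be tried without a file:

```python
def unique_sorted_words(lines):
    """Return the distinct whitespace-separated words in lines, sorted."""
    words = set()
    for line in lines:
        # split() with no argument also discards the trailing newline
        words.update(line.split())
    return sorted(words)

# Illustrative input; an open file handle works the same way.
sample = ["but soft what light\n", "it is the east and light\n"]
result = unique_sorted_words(sample)
print(result)
```

A set removes duplicates without the repeated "if word in wordlist" scan, which matters only for large files.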
First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
So that, if I enter James Gosling Sun Microsystem keyword, it should tell me that what I have entered is 100% correct.
And if I enter J Gosling Sun Microsystem keyword, it should say I am only 66.66% correct.
This is what I have tried so far.
import re

def main():
    print("starting")
    sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
    splited = sentence.split(",")
    number_of_primary_keywords = len(splited)
    #print(number_of_primary_keywords, "primary keywords length")
    number_of_brackets = 0
    inside_quotes = ''
    inside_quotes_1 = ''
    inside_brackets = ''
    for n in range(len(splited)):
        #print(len(re.findall('\w+', splited[n])), "length of splitted")
        inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
        synonyms = inside_brackets.split("/")
        for x in range(len(synonyms)):
            try:
                inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
                print(inside_quotes_1)
            except:
                pass
            try:
                inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
                print(inside_quotes)
            except:
                pass
            #print(synonyms[x])
        number_of_brackets += 1
    print(number_of_brackets)

if __name__ == '__main__':
    main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
So, if you read this far, I hope you can help me in getting the expected output
Unfortunately, your code has a logic issue that I could not pin down; it may be in these lines:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
which, by the way, you can also write with hex escapes for the quote characters:
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words along with their indices. You can extract the words with a basic expression:
(\w+)
Then, you might want to find a simple way to locate the indices, where the words are. Then, associate each word to the desired indices.
Example Test
# -*- coding: UTF-8 -*-
import re

string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match")
else:
    print('Sorry! No matches! Something is not right! Call 911')
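One likely cause of the missing last letters is that str.find returns -1 when the character is absent, so a slice ending at that result stops one character early. The sketch below avoids index arithmetic altogether. It is only one possible reading of the requirement: it assumes each comma-separated item is either a parenthesised group of "/"-separated synonyms or a bare keyword, and that an answer "covers" a group when any synonym appears in it as a case-insensitive substring. The names parse_groups and score are hypothetical.

```python
import re

def parse_groups(spec):
    """Split a spec string into groups of lowercase synonyms."""
    groups = []
    # Each match is either a parenthesised group or a bare keyword.
    for paren, bare in re.findall(r"\(([^)]*)\)|([^,()\s][^,()]*)", spec):
        raw = paren.split("/") if paren else [bare]
        # Drop surrounding spaces and quote characters from every synonym.
        groups.append([s.strip().strip("'\"").lower() for s in raw])
    return groups

def score(spec, answer):
    """Percentage of groups with at least one synonym found in answer."""
    groups = parse_groups(spec)
    text = answer.lower()
    hits = sum(any(s in text for s in group) for group in groups)
    return 100.0 * hits / len(groups)

spec = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
full = score(spec, "James Gosling Sun Microsystem keyword")
partial = score(spec, "J Gosling Sun Microsystem keyword")
print(full, round(partial, 2))
```

Substring matching is deliberately loose; whole-word matching would need a stricter comparison.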
I have some data that I've pulled from a website. This is the code I used to grab it (my actual code is much longer but I think this about sums it up).
lid_restrict_save = []
for t in range(10000,10020):
    address = 'http://www.tspc.oregon.gov/lookup_application/' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    #District Restriction
    dist_restrict = tree.xpath('//tr[11]//text()')
    if u"District Restriction" in dist_restrict:
        lid_restrict_save.append(id2)
I'm trying to export this list:
print lid_restrict_save
[['5656966VP65', '5656966RR68', '56569659965', '56569658964']]
to a text file.
f = open('dis_restrict_no_uniqDOB2.txt', 'r+')
for j in range(0,len(lid_restrict_save)):
    s = ( (unicode(lid_restrict_save[j]).encode('utf-8') + ' \n' ))
    f.write(s)
f.close()
I want the text to come out looking like this:
5656966VP65
5656966RR68
56569659965
56569658964
This code worked but only when I started the range from 0.
f = open('dis_restrict.txt', 'r+')
for j in range(0,len(ldob_restrict)):
    f.write( ldob_restrict[j].encode("utf-8") + ' \n' )
f.close()
When I've tried changing the code I keep getting this error:
"AttributeError: 'list' object has no attribute 'encode'."
I've tried the suggestions from here, here, and here but to no avail.
If anyone has any hints it would be greatly appreciated.
lid_restrict_save is a nested list, so you can't encode its first element: that element is itself a list, not a string.
You could write to the txt file using this:
lid_restrict_save = [['5656966VP65', '5656966RR68', '56569659965', '56569658964']]
lid_restrict_save = lid_restrict_save[0] # remove the outer list
with open('dis_restrict.txt', 'r+') as f:
    for i in lid_restrict_save:
        f.write(str(i) + '\n')
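If the outer list can hold more than one inner list, taking element [0] would drop the rest. A minimal Python 3 sketch that writes every inner item instead is shown here; the helper name write_ids and the temporary output path are assumptions for illustration. It opens the file with "w", which creates the file if it does not exist, whereas "r+" requires it to exist already.

```python
import os
import tempfile

def write_ids(nested_ids, path):
    """Write every string in a list of lists to path, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for group in nested_ids:
            for item in group:
                f.write(item + "\n")

ids = [['5656966VP65', '5656966RR68', '56569659965', '56569658964']]
# Hypothetical location: a fresh temporary directory keeps the example side-effect free.
out_path = os.path.join(tempfile.mkdtemp(), "dis_restrict.txt")
write_ids(ids, out_path)
```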
I'm having some issues with subtitles and need a way to detect specific errors. I think regular expressions would help, but I need help figuring this one out. In this example of an SRT-formatted subtitle, line #13 ends at 00:01:10,130 and line #14 begins at 00:01:10,129.
13
00:01:05,549 --> 00:01:10,130
some text here.
14
00:01:10,129 --> 00:01:14,109
some other text here.
The problem is that the next line can't begin before the current one is over; the embedding algorithm doesn't work when that happens. I need to check my SRT files and correct this, but looking for it manually in about 20 videos, each an hour long, just isn't an option. Especially since I need it 'yesterday' (:
Format for SRT subtitles is very specific:
XX
START --> END
TEXT
EMPTY LINE
[line number (digits)][new line character]
[start and end times in 00:00:00,000 format, separated by _space__minusSign__minusSign__greaterThenSign__space_][new line character]
[text - can be any character - letter, digit, punctuation sign.. pretty much anything][new line character]
[new line character]
I need to check whether the END time is greater than the START time of the following subtitle. Help would be appreciated.
PS. I can work with Notepad++, Eclipse (Aptana), python or javascript...
Regular expressions can be used to achieve what you want; that being said, they can't do it on their own. Regular expressions are for matching patterns, not for comparing numerical ranges.
If I were you, I would do the following:
Parse the file and place the start-end time in one data structure (call it DS_A) and the text in another (call it DS_B).
Sort DS_A in ascending order. This should guarantee that you will not have overlapping ranges. (This previous SO post should point you in the right direction).
Iterate over both and write the following to your file: j DS_A[i] --> DS_A[i + 1] <newline> DS_B[j] where i is a loop counter for DS_A and j is a loop counter for DS_B.
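If only detection is needed, a smaller sketch is enough. It is not the sort-based approach described above: it assumes well-formed blocks separated by blank lines, timestamps in exactly the H:MM:SS,mmm form with nothing after the end time, and it simply reports each pair of consecutive subtitles where the second starts before the first ends. The names to_ms and find_overlaps are hypothetical.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' to integer milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def find_overlaps(srt_text):
    """Return (previous_number, current_number) for each overlapping pair."""
    overlaps = []
    prev_end = prev_number = None
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2 or " --> " not in lines[1]:
            continue  # skip anything that is not a subtitle block
        start, end = (to_ms(t) for t in lines[1].split(" --> "))
        number = lines[0].strip()
        if prev_end is not None and start < prev_end:
            overlaps.append((prev_number, number))
        prev_end, prev_number = end, number
    return overlaps

sample = (
    "13\n00:01:05,549 --> 00:01:10,130\nsome text here.\n\n"
    "14\n00:01:10,129 --> 00:01:14,109\nsome other text here.\n"
)
found = find_overlaps(sample)
print(found)
```

Working in integer milliseconds sidesteps any 24-hour limit that a clock-time type would impose.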
I ended up writing a short script to fix this. Here it is:
# -*- coding: utf-8 -*-
from datetime import datetime
import getopt, re, sys

count = 0

def fix_srt(inputfile):
    global count
    parsed_file, errors_file = '', ''
    try:
        with open( inputfile , 'r') as f:
            srt_file = f.read()
            parsed_file, errors_file = parse_srt(srt_file)
    except:
        pass
    finally:
        outputfile1 = ''.join( inputfile.split('.')[:-1] ) + '_fixed.srt'
        outputfile2 = ''.join( inputfile.split('.')[:-1] ) + '_error.srt'
        with open( outputfile1 , 'w') as f:
            f.write(parsed_file)
        with open( outputfile2 , 'w') as f:
            f.write(errors_file)
        print 'Detected %s errors in "%s". Fixed file saved as "%s" (Errors only as "%s").' % ( count, inputfile, outputfile1, outputfile2 )

previous_end_time = datetime.strptime("00:00:00,000", "%H:%M:%S,%f")

def parse_times(times):
    global previous_end_time
    global count
    _error = False
    _times = []
    for time_code in times:
        t = datetime.strptime(time_code, "%H:%M:%S,%f")
        _times.append(t)
    # A start time earlier than the previous end time is an overlap: clamp it.
    if _times[0] < previous_end_time:
        _times[0] = previous_end_time
        count += 1
        _error = True
    previous_end_time = _times[1]
    _times[0] = _times[0].strftime("%H:%M:%S,%f")[:12]
    _times[1] = _times[1].strftime("%H:%M:%S,%f")[:12]
    return _times, _error

def parse_srt(srt_file):
    parsed_srt = []
    parsed_err = []
    for srt_group in re.sub('\r\n', '\n', srt_file).split('\n\n'):
        lines = srt_group.split('\n')
        if len(lines) >= 3:
            times = lines[1].split(' --> ')
            correct_times, error = parse_times(times)
            if error:
                clean_text = map( lambda x: x.strip(' '), lines[2:] )
                srt_group = lines[0].strip(' ') + '\n' + ' --> '.join( correct_times ) + '\n' + '\n'.join( clean_text )
                parsed_err.append( srt_group )
        parsed_srt.append( srt_group )
    return '\r\n'.join( parsed_srt ), '\r\n'.join( parsed_err )

def main(argv):
    inputfile = None
    try:
        options, arguments = getopt.getopt(argv, "hi:", ["input="])
    except:
        print 'Usage: test.py -i <input file>'
        sys.exit(2)  # options is undefined if parsing failed, so stop here
    for o, a in options:
        if o == '-h':
            print 'Usage: test.py -i <input file>'
            sys.exit()
        elif o in ['-i', '--input']:
            inputfile = a
    fix_srt(inputfile)

if __name__ == '__main__':
    main( sys.argv[1:] )
If someone needs it, save the code as srtfix.py, for example, and use it from the command line:
python srtfix.py -i "my srt subtitle.srt"
I was lazy and used the datetime module to process timecodes, so I'm not sure the script will work for subtitles longer than 24h (: I'm also not sure when milliseconds were added to Python's datetime module; I'm using version 2.7.5, so it's possible the script won't work on earlier versions because of this...
I used the following code to compare two text files
import difflib
with open("D:/Dataset1/data/1/hy/0/Info.txt") as f, open("D:/Dataset1/data/2/hy/0/Info.txt") as g:
    flines= f.readlines()
    glines= g.readlines()
d = difflib.Differ()
diff = d.compare(flines, glines)
print("\n".join(diff))
and I got this result:
- Local Config: HKEY_CURRENT_USER\Software\Microsoft\Uwxa\Kavi
? ^^^ ^^^
+ Local Config: HKEY_CURRENT_USER\Software\Microsoft\Otgad\Hyikqomi
? ^^^ + ^^^^^^^
any idea how to skip the blank lines?
The result of difflib.Differ.compare already contains newlines.
>>> import difflib
>>> list(difflib.Differ().compare(['1\n', '2\n'], ['1\n', '3\n']))
['  1\n', '- 2\n', '+ 3\n']
>>> print ''.join(difflib.Differ().compare(['1\n', '2\n'], ['1\n', '3\n']))
  1
- 2
+ 3
Joining the result with \n adds additional newlines.
Replace following line:
print("\n".join(diff))
with (joining with empty string instead of newline):
print("".join(diff))
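A small Python 3 check of that claim, using two made-up line lists rather than the files from the question. Every string yielded by compare() already ends with a newline when the input lines do, so joining with "\n" produces empty lines between entries while joining with "" does not:

```python
import difflib

# Illustrative inputs; readlines() on a file yields lines of the same shape.
a = ["alpha\n", "beta\n"]
b = ["alpha\n", "gamma\n"]

diff = list(difflib.Differ().compare(a, b))

doubled = "\n".join(diff)   # adds a second newline after every entry
clean = "".join(diff)       # keeps only the newlines already present

print(repr(doubled))
print(repr(clean))
```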
I couldn't do it with the linejunk function of difflib.ndiff (I think that would have been a better solution), but I ended up using the strip() function, which works for me:
diff = difflib.ndiff(file1.readlines(), file2.readlines())
for x in diff:
    if x.strip() == "+" or x.strip() == "-":
        print("Blank Line... Ignore")
    else:
        print("Non Blank")