OK, I've tried all the methods in "Convert a list to a dictionary in Python", but I can't get any of them to work. I'm trying to convert a list that I've built from a .txt file into a dictionary. So far my code is:
import os.path
from tkinter import *
from tkinter.filedialog import askopenfilename
import csv
window = Tk()
window.title("Please Choose a .txt File")
fileName = askopenfilename()
classInfoList = []
classRoster = {}
with open(fileName, newline = '') as listClasses:
    for line in csv.reader(listClasses):
        classInfoList.append(line)
The .txt file is in the format:
professor
class
students
An example would be:
Professor White
Chem 101
Jesse Pinkman, Brandon Walsh, Skinny Pete
The output I desire would be a dictionary with professors as the keys, and then the class and list of students for the values.
OUTPUT:
{"Professor White": ["Chem 101", ["Jesse Pinkman", "Brandon Walsh", "Skinny Pete"]]}
However, when I tried the approaches from that post, I kept getting errors.
What can I do here?
Thanks
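One way to finish the question's own approach is to walk the already-built list three rows at a time. This is a sketch, assuming classInfoList holds exactly what csv.reader produces for the three-line format (one list of fields per line, with an empty list for any blank line); the function name roster_from_rows is hypothetical.

```python
def roster_from_rows(rows):
    """Turn csv.reader rows (professor, class, students, repeated) into a dict."""
    # csv.reader yields [] for blank lines; drop them so the groups of three line up
    rows = [row for row in rows if row]
    roster = {}
    # Step by 3; an incomplete trailing group is ignored
    for i in range(0, len(rows) - 2, 3):
        professor = rows[i][0].strip()
        course = rows[i + 1][0].strip()
        # csv.reader already split the student line on commas; strip the spaces
        students = [name.strip() for name in rows[i + 2]]
        roster[professor] = [course, students]
    return roster


# Hypothetical sample: what csv.reader yields for the example file
classInfoList = [
    ["Professor White"],
    ["Chem 101"],
    ["Jesse Pinkman", " Brandon Walsh", " Skinny Pete"],
]
print(roster_from_rows(classInfoList))
```

Because csv.reader splits on commas, the student names arrive as separate fields already, so only the leading spaces need stripping.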
Since the data for each dictionary entry is spread over three consecutive lines, you have to process three lines at a time. You can call the built-in next() function on the file object to pull in the following lines, like this:
output = {}
with open('file1') as input_file:
    for line in input_file:
        key = line.strip()
        value = [next(input_file).strip()]
        # split the student line on commas and strip the surrounding whitespace
        value.append([name.strip() for name in next(input_file).split(',')])
        output[key] = value
This would give you:
{'Professor White': ['Chem 101',
                     ['Jesse Pinkman', 'Brandon Walsh', 'Skinny Pete']]}
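The same grouping can also be written without calling next() by hand, using the common "grouper" idiom: zip() over one shared iterator yields three consecutive lines per tuple. A sketch, assuming blank lines between records should be skipped; roster_from_file is a hypothetical name, and io.StringIO stands in for open('file1') so the example runs on its own.

```python
import io


def roster_from_file(f):
    """Read professor/class/students triples from an open text file."""
    # Skip blank lines so each group of three is professor, class, students
    lines = (line.strip() for line in f if line.strip())
    # Passing the same iterator three times makes zip() pull three consecutive
    # lines per tuple; an incomplete trailing group is silently dropped
    it = iter(lines)
    roster = {}
    for professor, course, students in zip(it, it, it):
        roster[professor] = [course, [name.strip() for name in students.split(",")]]
    return roster


sample = io.StringIO("Professor White\nChem 101\nJesse Pinkman, Brandon Walsh, Skinny Pete\n")
print(roster_from_file(sample))
```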
I have been searching for an answer to this, but cannot find what I need. I would like a Python script that reads my text file line by line, starting from the top, and writes all the matches to another .txt file. The file contains only four-digit numbers like 1234.
Example:
1234
3214
4567
8963
1532
1234
...and so on.
I would like the output to be something like:
1234 : matches found = 2
I know there must be matches, since the file has almost 10,000 lines. I'd appreciate any help; even a pointer in the right direction would be great. Thank you.
import re
with open("filename", 'r') as file:
    fileContent = file.read()
pattern = "1234"
print(len(re.findall(pattern, fileContent)))
If I were you, I would read the file, split it into a list of numbers, and use Counter from the collections module to count how many times each number occurs, keeping only the duplicates.
from collections import Counter
filepath = 'original_file'
new_filepath = 'new_file'
with open(filepath, 'r') as file:
    # splitlines() avoids the empty string that split('\n') leaves after a trailing newline
    numbers_list = file.read().splitlines()
dupes = ['{} : matches found = {}'.format(item, count)
         for item, count in Counter(numbers_list).items() if count > 1]
with open(new_filepath, 'w') as new_file:
    for line in dupes:
        # write one result per line
        new_file.write(line + '\n')
Thanks to everyone who helped me with this. Thank you to csabinho for the code he provided, and to IanAuld for asking me "Why do you think you need recursion here?". It got me thinking that the solution was a simple one. I just wanted to know which four-digit numbers had duplicates and how many, and which were unique. This is what I came up with, and it worked beautifully!
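import re
# Read the file once, instead of reopening it on every pass through the loop
with open("4digits.txt", 'r') as file:
    fileContent = file.read()
for a in range(1000, 10000):
    pattern = str(a)
    result = len(re.findall(pattern, fileContent))
    if result > 1:
        print(a, "matches", result)
    elif result == 1:
        # exactly one occurrence; numbers that never occur are not printed
        print(a, "This number is unique!")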
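Counting every line once with collections.Counter avoids scanning the whole file 9,000 times, and it only reports numbers that actually occur. A sketch, assuming one number per line; report is a hypothetical helper, and the sample list stands in for the lines of 4digits.txt.

```python
from collections import Counter


def report(lines):
    """Return duplicate counts and the unique values for a sequence of lines."""
    # Count whole lines rather than regex substrings, ignoring blank lines
    counts = Counter(line.strip() for line in lines if line.strip())
    duplicates = {number: n for number, n in counts.items() if n > 1}
    unique = sorted(number for number, n in counts.items() if n == 1)
    return duplicates, unique


sample = ["1234", "3214", "4567", "8963", "1532", "1234"]
duplicates, unique = report(sample)
for number, n in sorted(duplicates.items()):
    print(number, ": matches found =", n)
print("unique:", unique)
```

With the real file, passing the open file object works too, since iterating a file yields its lines: `with open("4digits.txt") as f: duplicates, unique = report(f)`.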
I am new to Python, and I'm just trying to get a feel for the language.
I have a file called lion.txt that has this text:
The lion (Panthera leo) is one of the big cats in the genus Panthera and a member of the family Felidae. The commonly used term African lion collectively denotes the several subspecies in Africa. With some males exceeding=250/12 kg (550 lb) in weight,[4].
What I want my program to do is search for the keyword exceeding and write only the value 250 to another file called searched.txt. Ideally, is it possible to store the value in a variable and then write it to another text file?
This is what I have so far:
import os
import re
os.chdir("C:\Python 2016 Training\lionfolder")
f = open("lion.txt", "r")
w = open("searched.txt", "w")
k = [] #Figured a dictionary would be the best way to deal with this?
for line in f:
    if re.match('(.*)exceeding(.*)', line):
        w.write(k[1] = "line")
Is what I'm asking for even possible with Python?
Thank you in advance
Regards,
Kevin.
Not bad for a first attempt. You're close to a working solution, but missing some critical parts. Try this:
import re
f = open('lion.txt', 'r')
w = open('searched.txt', 'w')
for line in f:
    # raw string so \d reaches the regex engine; capture the digits after "exceeding="
    match = re.search(r'exceeding=(\d+)', line)
    if match:
        w.write(match.group(1))
w.close()
f.close()
There are better ways of doing this, but I have tried to stay as close to your original code as possible, so you don't get lost.
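To cover the "store it in a variable first" part of the question, the search and the writing can be split into two small steps. A sketch, assuming the value always appears as exceeding= followed by digits, as in lion.txt; find_exceeding and save_values are hypothetical helpers, and io.StringIO stands in for searched.txt.

```python
import io
import re


def find_exceeding(text):
    """Return every integer that directly follows 'exceeding=' in text."""
    # Raw string so \d reaches the regex engine unchanged
    return [int(v) for v in re.findall(r"exceeding=(\d+)", text)]


def save_values(values, out):
    """Write one value per line to an open text file."""
    for v in values:
        # write() needs text, so convert the int back with str()
        out.write(str(v) + "\n")


text = "With some males exceeding=250/12 kg (550 lb) in weight,[4]."
values = find_exceeding(text)
weight = values[0] if values else None  # the value now lives in an ordinary variable
out = io.StringIO()  # stands in for open("searched.txt", "w")
save_values(values, out)
print(weight)
```

With the real files, the same two calls would sit inside `with open("lion.txt") as f, open("searched.txt", "w") as w:`, using `find_exceeding(f.read())` and `save_values(values, w)`.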
I'm writing an executable Python script that does the following:
I want to read information from a .csv file into Python as a dictionary. The .csv file contains several columns with headings, and I only want to extract particular columns (those with the specific headings I want) and write those columns out to another .csv file. I am using DictReader and DictWriter.
I am reading the .csv file as a dictionary (with the headings as the keys and the column values as the items), and writing the information out as a dictionary to another .csv file.
After I read it in, I print out the values under the particular headings (so I can double-check what I have read in). I then open a new .csv file and want to write the data I have just read in. I can write the keys (column headings), but for some reason my code doesn't write any of the item values. The headings I want in this case are 'Name' and 'DOB'.
Here is my code:
#!/usr/bin/python
import os
import os.path
import re
import sys
import pdb
import csv
csv_file = csv.DictReader(open(sys.argv[1],'rU'),delimiter = ',')
for line in csv_file:
    print line['Name'] + ',' + line['DOB']
fieldnames = ['Name','DOB']
test_file = open('test2.csv','wr')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
Any ideas where I'm going wrong? I want the item values written under their corresponding column headers in the output file.
I am using python 2.7.11 on a Mac machine. I am also printing values to the terminal.
You've unfortunately been tricked by your own testing, that is, by printing the individual rows. By looping through csv_file the first time, you've exhausted the iterator and it is now at the end. Further iterations, as in the bottom of your code, yield nothing, so the writing loop never runs.
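A small Python 3 sketch of the effect (the question uses Python 2.7, where a DictReader is likewise a one-pass iterator). It also shows a simple alternative to rewinding: read the rows into a list once and loop over the list as often as needed. The column names and data are made up, and io.StringIO stands in for the input file.

```python
import csv
import io

data = io.StringIO("Name,DOB,City\nAda,1815-12-10,London\nAlan,1912-06-23,London\n")
reader = csv.DictReader(data)

first_pass = [row["Name"] for row in reader]   # consumes every row
second_pass = [row["Name"] for row in reader]  # iterator already exhausted: empty
print(first_pass, second_pass)

# Alternative: keep the rows in a list, which can be iterated repeatedly
data.seek(0)
rows = list(csv.DictReader(data))
print([r["Name"] for r in rows], [r["DOB"] for r in rows])
```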
Your question is essentially a duplicate of various other questions, such as how to read from a CSV file repeatedly, although the issue shows up differently here: you didn't realise what the problem was, whereas those questions know the cause but not the solution.
Answers to those questions tell you to simply reset the position of the input file with seek(0). Unfortunately, your current code opens the input file inline and keeps no reference to it, so there is nothing to call seek() on.
Thus, something like this should work:
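infile = open(sys.argv[1], 'rU')
csv_file = csv.DictReader(infile, delimiter=',')
<all other code>
# the DictWriter in <all other code> needs extrasaction='ignore' if the input has columns not in fieldnames
infile.seek(0)
next(csv_file)  # after seek(0) the header line is read again as a data row; skip it
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
infile.close()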
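As an aside, just use the with statement when opening files:
with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'wb') as outfile:
    csv_file = csv.DictReader(infile, delimiter=',')
    for line in csv_file:
        print line['Name'] + ',' + line['DOB']
    fieldnames = ['Name', 'DOB']
    # extrasaction='ignore' drops input columns that are not in fieldnames;
    # without it, writerow() raises ValueError for the extra keys
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames,
                               extrasaction='ignore')
    csvwriter.writeheader()
    infile.seek(0)
    next(csv_file)  # skip the header line, which is read again as a data row
    for row in csv_file:
        csvwriter.writerow(row)
Note: DictWriter does not write the header row by itself; call writeheader() (available since Python 2.7) instead of building the header dict yourself. On Python 2, open the output file in 'wb' mode, as the csv documentation recommends.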
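If you can use Python 3, the whole job fits in a single pass, so the reader never needs to be rewound. A sketch with made-up sample data; select_columns is a hypothetical helper, io.StringIO objects stand in for the files, and with real files you would open both with newline='' as the Python 3 csv documentation recommends.

```python
import csv
import io


def select_columns(infile, outfile, fieldnames):
    """Copy only the named columns from one CSV file object to another."""
    reader = csv.DictReader(infile)
    # extrasaction='ignore' silently drops columns not listed in fieldnames
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        print(",".join(row[f] for f in fieldnames))  # echo what was read
        writer.writerow(row)


src = io.StringIO("Name,Age,DOB\nAda,36,1815-12-10\nAlan,41,1912-06-23\n")
dst = io.StringIO()
select_columns(src, dst, ["Name", "DOB"])
```

csv.writer ends rows with '\r\n' by default, which is why the output file should be opened with newline='' on Python 3.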
I have a list of file names that I want to sort by the timestamp embedded in each file name.
Note: in the file name Hello_Hi_2015-02-20T084521_1424543480.tar.gz, 2015-02-20T084521 is the timestamp in the format "year-month-dayTHHMMSS"; this is what I want to sort on.
Input list:
file_list = ['Hello_Hi_2015-02-20T084521_1424543480.tar.gz',
'Hello_Hi_2015-02-20T095845_1424543481.tar.gz',
'Hello_Hi_2015-02-20T095926_1424543481.tar.gz',
'Hello_Hi_2015-02-20T100025_1424543482.tar.gz',
'Hello_Hi_2015-02-20T111631_1424543483.tar.gz',
'Hello_Hi_2015-02-20T111718_1424543483.tar.gz',
'Hello_Hi_2015-02-20T112502_1424543483.tar.gz',
'Hello_Hi_2015-02-20T112633_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113427_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113456_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113608_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113659_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113809_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113901_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113955_1424543485.tar.gz',
'Hello_Hi_2015-03-20T114122_1424543485.tar.gz',
'Hello_Hi_2015-02-20T114532_1424543486.tar.gz',
'Hello_Hi_2015-02-20T120045_1424543487.tar.gz',
'Hello_Hi_2015-02-20T120146_1424543487.tar.gz',
'Hello_WR_2015-02-20T084709_1424543480.tar.gz',
'Hello_WR_2015-02-20T113016_1424543486.tar.gz']
Output should be:
file_list = ['Hello_Hi_2015-02-20T084521_1424543480.tar.gz',
'Hello_WR_2015-02-20T084709_1424543480.tar.gz',
'Hello_Hi_2015-02-20T095845_1424543481.tar.gz',
'Hello_Hi_2015-02-20T095926_1424543481.tar.gz',
'Hello_Hi_2015-02-20T100025_1424543482.tar.gz',
'Hello_Hi_2015-02-20T111631_1424543483.tar.gz',
'Hello_Hi_2015-02-20T111718_1424543483.tar.gz',
'Hello_Hi_2015-02-20T112502_1424543483.tar.gz',
'Hello_Hi_2015-02-20T112633_1424543484.tar.gz',
'Hello_WR_2015-02-20T113016_1424543486.tar.gz',
'Hello_Hi_2015-02-20T113427_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113456_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113608_1424543484.tar.gz',
'Hello_Hi_2015-02-20T113659_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113809_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113901_1424543485.tar.gz',
'Hello_Hi_2015-02-20T113955_1424543485.tar.gz',
'Hello_Hi_2015-02-20T114532_1424543486.tar.gz',
'Hello_Hi_2015-02-20T120045_1424543487.tar.gz',
'Hello_Hi_2015-02-20T120146_1424543487.tar.gz',
'Hello_Hi_2015-03-20T114122_1424543485.tar.gz']
Below is the code I have tried:
import glob
import os
def sort( dir ):
    os.chdir( dir )
    file_list = glob.glob('Hello_*')
    file_list.sort(key=os.path.getmtime)
    print("\n".join(file_list))
    return 0
Thanks in advance!!
This worked for me; it sorts files by modification time (os.path.getmtime), so it also works for files that don't have a timestamp in the name:
import os
files = [file for file in os.listdir(".") if file.lower().endswith('.gz')]
for file in sorted(files, key=os.path.getmtime):
    print(file)
Would this work?
You could write the list contents to a file line by line, then read the file back and sort on the third underscore-separated field, which is the timestamp:
with open(open_file) as f:
    lines = sorted(f.readlines(), key=lambda line: line.split("_")[2])
You can then print lines.
Your code is trying to sort based on the filesystem-stored modified time, not the filename time.
Since your filename encoding is slightly sane :-) if you want to sort based on filename alone, you may use:
sorted(os.listdir(dir), key=lambda s: s[9:])
That will do, but only because the timestamp encoding in the filename is sane: fixed-length prefix, zero-padded, constant-width numbers, going in sequence from biggest time reference (year) to the lowest one (second).
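If you'd rather compare real date values (and get a ValueError on a malformed name instead of a silently wrong order), one alternative, not part of this answer, is to parse the timestamp with datetime.strptime and sort on that. A sketch, assuming the timestamp is always the third underscore-separated field, as in every name in the question; filename_timestamp is a hypothetical helper.

```python
from datetime import datetime


def filename_timestamp(name):
    """Parse the YYYY-MM-DDTHHMMSS field (third '_'-separated part) of a file name."""
    # Assumes names look like Prefix_Sub_2015-02-20T084521_1424543480.tar.gz
    return datetime.strptime(name.split("_")[2], "%Y-%m-%dT%H%M%S")


file_list = [
    "Hello_Hi_2015-03-20T114122_1424543485.tar.gz",
    "Hello_WR_2015-02-20T113016_1424543486.tar.gz",
    "Hello_Hi_2015-02-20T084521_1424543480.tar.gz",
]
for name in sorted(file_list, key=filename_timestamp):
    print(name)
```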
If your prefix is not fixed, you can try a regular expression like this (it sorts by the part after the second underscore):
import os
import re
pat = re.compile('_.*?(_)')
sorted(os.listdir(dir), key=lambda s: s[pat.search(s).end():])
So, I'm a bit new to Python and I've come across the following problem in one of my scripts:
I have a txt file with the following text:
Jolly 77777
Fargo 88888
Hunt 68548
I want to convert it into a dictionary with BOTH the name and number as keys... I keep getting a traceback error and am not sure what mistake I am making. It's driving me nuts; help?
This is what I have so far:
filename = open("ident.txt","r")
dictionary={}
with open('ident.txt','r') as f:
    for line in f.readlines():
        a,b = line.split()
        dictionary[a] = int(b)
You're close:
dictionary = {}
with open('ident.txt','r') as f:
    for line in f:
        a,b = line.split()
        dictionary[a] = int(b)
That yields a dictionary value of:
{'Fargo': 88888, 'Hunt': 68548, 'Jolly': 77777}
FWIW, the line filename = open("ident.txt","r") isn't going to do you any favors, since filename will end up being an open file, not a filename. And you don't need f.readlines(). Files iterate fine on their own, line by line.
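If "both the name and number as keys" means you want to look up in either direction, one option is to store each pair twice in the same dictionary. A sketch under that interpretation; load_ident is a hypothetical helper and io.StringIO stands in for ident.txt. It also skips blank or malformed lines, since a blank line (for example a trailing one) would make `a,b = line.split()` raise ValueError.

```python
import io


def load_ident(f):
    """Map each name to its number and each number back to its name."""
    lookup = {}
    for line in f:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines instead of failing on unpacking
        name, number = parts[0], int(parts[1])
        lookup[name] = number
        lookup[number] = name
    return lookup


sample = io.StringIO("Jolly 77777\nFargo 88888\n\nHunt 68548\n")
ident = load_ident(sample)
print(ident["Fargo"], ident[68548])
```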