MapReduce reducer for a Udacity course does not add up correctly

As part of a Udacity course I have to write a mapper and a reducer function.
My mapper function looks like this, and I am pretty sure it works correctly:
def mapper():
    for line in sys.stdin:
        data = line.strip().split(",")
        #logging.info("{0}\t{1}".format(data[1],data[6]))
        print "{0}\t{1}".format(data[1], data[6])
mapper()
My reducer function somehow does not add it up correctly:
def reducer():
    old_key = None
    for line in sys.stdin:
        data = line.strip().split("\t")
        #logging.info(data)
        new_key = data[0]
        ENTRIESn_hourly = data[1]
        count = 0
        if new_key and new_key != ENTRIESn_hourly:
            print "{0}\t{1}".format(new_key, count)
        else:
            count += int(ENTRIESn_hourly)
reducer()
What am I missing here?

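You are resetting count to zero on every single input line. count = 0 should only happen when you reach a new key. You also need to compare the new key against the previous key (old_key), not against ENTRIESn_hourly, and emit the last total after the loop. Something like:
def reducer():
    old_key = None
    count = 0
    for line in sys.stdin:
        data = line.strip().split("\t")
        #logging.info(data)
        new_key = data[0]
        ENTRIESn_hourly = data[1]
        # Key changed: emit the finished total and start a new one.
        if old_key and new_key != old_key:
            print "{0}\t{1}".format(old_key, count)
            count = 0
        old_key = new_key
        count += int(ENTRIESn_hourly)
    # Emit the total for the last key.
    if old_key is not None:
        print "{0}\t{1}".format(old_key, count)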
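As an alternative to tracking the previous key by hand, the same reduction can be sketched with itertools.groupby. This is only a sketch in Python 3, not the course's reference solution. It assumes the reducer input is already sorted by key (as Hadoop streaming normally delivers it) and that every line has exactly two tab-separated fields. The function name reduce_lines and the sample keys are hypothetical.

```python
from itertools import groupby


def reduce_lines(lines):
    """Sum the integer values for each run of consecutive identical keys."""
    # Split every non-empty line into a [key, value] pair.
    pairs = (line.strip().split("\t") for line in lines if line.strip())
    totals = []
    # groupby only merges adjacent equal keys, so the input must be sorted by key.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        totals.append((key, sum(int(value) for _, value in group)))
    return totals


# Hypothetical sample input standing in for sys.stdin.
sample = ["R001\t5\n", "R001\t7\n", "R002\t3\n"]
for key, total in reduce_lines(sample):
    print("{0}\t{1}".format(key, total))
```

In a real streaming job you would pass sys.stdin instead of the sample list. Because groupby closes each group when the key changes, there is no separate "flush the last key" step to forget.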

BeautifulSoup: iterating over a table and inserting each row value into a list raises TypeError: list indices must be integers, not Tag

I am iterating over a table in a for loop and I would like to store the values in a list. If I store them in a single variable, I only get the first value back when I return it from my function.
Each iteration of the for loop produces several values, so a single variable is not enough. I need to store them in a list so I can capture all the values.
The error I get is:
list1[div] = div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
TypeError: list indices must be integers, not Tag
My code is:
def extract_testcases_from_report_htmltestrunner():
    filename = (r"C:\temp\selenium_report\ClearCore501_Automated_GUI_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    for div in soup.select("#result_table tr div.testcase"):
        print(div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8'))
    list1 = []
    for div in soup.select("#result_table tr div.testcase"):
        var = div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
        list1[div] = div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
        return var
If I comment out return var, the output from the print statement is:
('test_000001_login_valid_user', 'pass')
('test_000002_select_a_project', 'pass')
('test_000003_verify_Lademo_CRM_DataPreview_is_present', 'pass')
('test_000004_view_data_preview_Lademo_CRM_and_test_scrollpage', 'pass')
('test_000005_sort_data_preview_by_selecting_column_header', 'pass')
# etc. More tests
If I call the function with return var, my output is:
('test_000001_login_valid_user', 'pass')
I would like my function call to return all the test cases. I think I need to return them as a list. I can then call this function, iterate over the list, and print each entry in my email code for the email message.
Thanks, Riaz
I have it returning a list now. When I call the function and print its return value, it prints all the values on one line. I would like each value on a separate line.
def extract_testcases_from_report_htmltestrunner():
    filename = (r"C:\temp\selenium_report\ClearCore501_Automated_GUI_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    list1 = []
    for div in soup.select("#result_table tr div.testcase"):
        #print(div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8'))
        list1.append((div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')))
    return list1
if __name__ == "__main__":
    print extract_testcases_from_report_htmltestrunner()
The output is:
[('test_000001_login_valid_user', 'pass'), ('test_000002_select_a_project', 'pass'), ('test_000003_verify_Lademo_CRM_DataPreview_is_present', 'pass') etc.
I would like the output to be like:
[('test_000001_login_valid_user', 'pass')
('test_000002_select_a_project', 'pass')
('test_000003_verify_Lademo_CRM_DataPreview_is_present', 'pass')
etc.
Thanks, Riaz
You want to yield. You can only return from a function once, so your function ends as soon as it hits the return on the first iteration:
from bs4 import BeautifulSoup
def extract_testcases_from_report_htmltestrunner():
    filename = (r"C:\temp\selenium_report\ClearCore501_Automated_GUI_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    for div in soup.select("#result_table tr div.testcase"):
        yield div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
all_data = list(extract_testcases_from_report_htmltestrunner())
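To get one test case per line, iterate over the result (list or generator) and print or join the entries individually instead of printing the whole list at once. The following is a minimal Python 3 sketch of that idea. It does not parse HTML: iter_testcases, format_testcases and sample_rows are hypothetical stand-ins for the (name, status) pairs the BeautifulSoup loop produces.

```python
def iter_testcases(rows):
    """Yield one (name, status) pair per row instead of returning early."""
    for name, status in rows:
        yield name.strip(), status.strip()


def format_testcases(testcases):
    """Build a string with one test case per line, e.g. for an email body."""
    return "\n".join("{0}: {1}".format(name, status) for name, status in testcases)


# Hypothetical data standing in for the rows selected from the report.
sample_rows = [
    ("test_000001_login_valid_user", "pass"),
    ("test_000002_select_a_project", "pass"),
]
print(format_testcases(iter_testcases(sample_rows)))
```

The original TypeError comes from list1[div]: a list can only be indexed by integers (or slices), so new items have to be added with append rather than assigned under a Tag index.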

Removing duplicate items from a list using 'while' in Python 2.7

# -*- coding: utf-8 -*-
# 8-5 Remove duplicate elements from a list
def findUnique(list):
    k=len(list)
    for a in range(1, k-1):
        i=0
        while i<k:
            if list[i] == list[a]:
                del list[a]
            else:
                i=i+1
    return list
list = raw_input("Enter the list elements: ").split()
findUnique(list)
list = findUnique(list)
print "Updated list:", list
This is the program I wrote. It did not work at all.
Please tell me the solution.
There are two ways to get the unique results from a list.
The hard way:
strings=raw_input("Enter a line: ").split()
def unique(listing):
    check={}
    result=[]
    for word in listing:
        check.setdefault(word,False)
        if check[word]==False:
            result.append(word)
            check[word]=True
    return result
myresult=' '.join(unique(strings))
print "The result is: %s"%(myresult)
The easy way (note that a set does not preserve the original order of the items):
strings=raw_input("Enter a line: ").split()
def unique(listing):
    return list(set(listing))
myresult=' '.join(unique(strings))
print "The result is: %s"%(myresult)
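Since the question asks specifically for a while loop, here is a minimal Python 3 sketch of that approach. The name find_unique is hypothetical. The key differences from the original attempt are that the length is re-checked on every pass (the list shrinks as items are deleted) and that an item is only compared against the items before it, never against itself.

```python
def find_unique(items):
    """Return a copy of items with later duplicates removed, keeping order."""
    result = list(items)  # work on a copy so the caller's list is untouched
    i = 0
    # Re-evaluate len(result) each time, because deletions shorten the list.
    while i < len(result):
        if result[i] in result[:i]:
            # Already seen earlier: delete it and stay at the same index.
            del result[i]
        else:
            i += 1
    return result


print(find_unique("a b a c b".split()))
```

This is quadratic in the list length, which is fine for short input lines. For long lists the dictionary-based version above is the better choice.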

Iterating through a .txt file in an odd way

What I am trying to do is write a program that opens a .txt file with movie reviews where the rating is a number from 0-4 followed by a short review of the movie. The program then prompts the user to open a second text file with words that will be matched against the reviews and given a number value based on the review.
For example, with these two sample reviews how they would appear in the .txt file:
4 A comedy-drama of nearly epic proportions rooted in a sincere performance by the title character undergoing midlife crisis .
2 Massoud 's story is an epic , but also a tragedy , the record of a tenacious , humane fighter who was also the prisoner -LRB- and ultimately the victim -RRB- of history .
So, if I were looking for the word "epic", it would increment the count for that word by 2 (which I already have figured out) since it appears twice, and then append the values 4 and 2 to a list of ratings for that word.
How do I append those ints to a list or dictionary related to that word? Keep in mind that I need to create a new list or dictionary key for every word in a list of words.
Please and thank you. And sorry if this was poorly worded, programming isn't my forte.
All of my code:
def menu_validate(prompt, min_val, max_val):
    """ produces a prompt, gets input, validates the input and returns a value. """
    while True:
        try:
            menu = int(input(prompt))
            if menu >= min_val and menu <= max_val:
                return menu
                break
            elif menu.lower == "quit" or menu.lower == "q":
                quit()
            print("You must enter a number value from {} to {}.".format(min_val, max_val))
        except ValueError:
            print("You must enter a number value from {} to {}.".format(min_val, max_val))
def open_file(prompt):
    """ opens a file """
    while True:
        try:
            file_name = str(input(prompt))
            if ".txt" in file_name:
                input_file = open(file_name, 'r')
                return input_file
            else:
                input_file = open(file_name+".txt", 'r')
                return input_file
        except FileNotFoundError:
            print("You must enter a valid file name. Make sure the file you would like to open is in this programs root folder.")
def make_list(file):
    lst = []
    for line in file:
        lst2 = line.split(' ')
        del lst2[-1]
        lst.append(lst2)
    return lst
def rating_list(lst):
    '''iterates through a list of lists and appends the first value in each list to a second list'''
    rating_list = []
    for list in lst:
        rating_list.append(list[0])
    return rating_list
def word_cnt(lst, word : str):
    cnt = 0
    for list in lst:
        for word in list:
            cnt += 1
    return cnt
def words_list(file):
    lst = []
    for word in file:
        lst.append(word)
    return lst
##def sort(words, occurrences, avg_scores, std_dev):
## '''sorts and prints the output'''
## menu = menu_validate("You must choose one of the valid choices of 1, 2, 3, 4 \n Sort Options\n 1. Sort by Avg Ascending\n 2. Sort by Avg Descending\n 3. Sort by Std Deviation Ascending\n 4. Sort by Std Deviation Descending", 1, 4)
## print ("{}{}{}{}\n{}".format("Word", "Occurence", "Avg. Score", "Std. Dev.", "="*51))
## if menu == 1:
## for i in range (len(word_list)):
## print ("{}{}{}{}".format(cnt_list.sorted[i],)
def make_odict(lst1, lst2):
    '''makes an ordered dictionary of keys/values from 2 lists of equal length'''
    dic = OrderedDict()
    for i in range (len(word_list)):
        dic[lst2[i]] = lst2[i]
    return dic
cnt_list = []
while True:
    menu = menu_validate("1. Get sentiment for all words in a file? \nQ. Quit \n", 1, 1)
    if menu == True:
        ratings_file = open("sample.txt")
        ratings_list = make_list(ratings_file)
        word_file = open_file("Enter the name of the file with words to score \n")
        word_list = words_list(word_file)
        for word in word_list:
            cnt = word_cnt(ratings_list, word)
            cnt_list.append(word_cnt(ratings_list, word))
Sorry, I know it's messy and very incomplete.
I think you mean:
import collections
counts = collections.defaultdict(int)
word = 'epic'
counts[word] += 1
Obviously, you can do more with word than I have, but you aren't showing us any code, so ...
EDIT
Okay, looking at your code, I'd suggest you make the separation between rating and text explicit. Take this:
def make_list(file):
    lst = []
    for line in file:
        lst2 = line.split(' ')
        del lst2[-1]
        lst.append(lst2)
    return lst
And convert it to this:
def parse_ratings(file):
    """
    Given a file of lines, each with a numeric rating at the start,
    parse the lines into score/text tuples, one per line. Return the
    list of parsed tuples.
    """
    ratings = []
    for line in file:
        text = line.strip().split()
        if text:
            score = text[0]
            ratings.append((score,text[1:]))
    return ratings
Then you can compute both values together:
def match_reviews(word, ratings):
    cnt = 0
    scores = []
    for score,text in ratings:
        n = text.count(word)
        if n:
            cnt += n
            scores.append(score)
    return (cnt, scores)
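To get "a list of ratings per word" for a whole word list, one option is a dictionary that maps each word to its list of scores. The sketch below is in Python 3 and makes a few assumptions that are not in the original post: scores are converted to int, a review contributes its rating once even if the word occurs several times in it, and the name score_words and the shortened sample reviews are hypothetical.

```python
from collections import defaultdict


def parse_ratings(lines):
    """Turn 'rating text...' lines into (int_rating, list_of_tokens) tuples."""
    ratings = []
    for line in lines:
        tokens = line.split()
        if tokens:
            ratings.append((int(tokens[0]), tokens[1:]))
    return ratings


def score_words(words, ratings):
    """Map every word to the list of ratings of the reviews that contain it."""
    scores = defaultdict(list)
    for word in words:
        matches = scores[word]  # creates an empty list even if nothing matches
        for rating, tokens in ratings:
            if word in tokens:
                matches.append(rating)
    return scores


# Shortened versions of the two sample reviews from the question.
sample_reviews = [
    "4 A comedy-drama of nearly epic proportions .",
    "2 Massoud 's story is an epic , but also a tragedy .",
]
result = score_words(["epic", "tragedy"], parse_ratings(sample_reviews))
print(dict(result))
```

With the ratings stored as ints, the average and standard deviation per word can be computed directly from each list.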

Raise SystemExit not working

This is a simplified version of a portion of my program:
for i in range(5):
    turn1l = []
    turn1 = raw_input("Enter Value Using the format \'x,y\' : ")
    turn1l.append(turn1)
    def winnerchecker():
        if "1,1" in turn1l and "1,2" in turn1l and "1,3" in turn1l:
            print xplayer, "YOU HAVE WON! GG TO ", name
            raise SystemExit()
    winnerchecker()
For some reason, every time I enter "1,1" then "1,2" then "1,3", it doesn't stop the program; it keeps going. How do I get it to stop? Is there a way I'm not aware of? Thank you!
Don't reinitialize the list turn1l at the beginning of every iteration of the loop:
for i in range(5):
    turn1l = []
Instead, make the list once:
turn1l = []
for i in range(5):
    ...
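Put together, the fix might look like the following Python 3 sketch. It is not the asker's full program: the moves are passed in as a list instead of being read with raw_input so the logic can be tried without typing, and the names WINNING_ROW, has_won and play are hypothetical.

```python
WINNING_ROW = ("1,1", "1,2", "1,3")


def has_won(moves):
    """Return True once every cell of the winning row has been entered."""
    return all(cell in moves for cell in WINNING_ROW)


def play(inputs):
    """Record moves one by one and stop the program as soon as the row is complete."""
    moves = []  # created once, before the loop, so earlier moves are kept
    for turn in inputs:
        moves.append(turn)
        if has_won(moves):
            # SystemExit ends the program unless something catches it.
            raise SystemExit("winner after {0} turns".format(len(moves)))
    return moves
```

Calling play(["1,1", "1,2", "1,3"]) raises SystemExit on the third move, while play(["1,1", "2,2"]) simply returns the recorded moves.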

Cannot construct a class from another file

Here are my two files:
Character.py
def Check_File(fn):
    try:
        fp = open(fn);
    except:
        return None;
    return fp;
class Character:
    ## variables ##
    ## atk ##
    ## def ##
    ## HP ##
    ## empty inv ##
    '''
    init (self, filename),
    RETURN -1 == if the file is not exist
    RETURN 0 == all good
    all files will be save in format of
    "skill: xxx, xxx; xxx, xxx; xxx, xxx;"
    '''
    def __init__(self, fn):
        fp = Check_File(fn);
        if(fp == None):
            print "Error: no such file"
            return None;
        self.stats = {};
        for line in fp:
            nline = line.strip().split(": ");
            if(type(nline) != list):
                continue;
            else:
                self.stats[nline[0]] = nline[1];
                ##print self.stats[nline[0]]
        fp.close();
    '''
    display character
    '''
    def Display_Character(self):
        print "The Character had:";
        ## Parse into the character class ##
        for item in self.stats:
            print item + ": " + self.stats[item];
        print "Finished Stats Displaying";
print Character("Sample.dat").stats
Another one is:
Interface.py
##from Interface_helper import *;
from Character import *;
wind = Character("Sample.dat");
wind.Display_Character();
When I run the code in Character.py, it gives
%run "C:/Users/Desktop/Helper Functions/New Folder/Character.py"
{'item': 'weird stuff', 'hp': '100', 'name': 'wind', 'def': '10', 'atk': '10'}
But when I run Interface.py, I get:
%run "C:/Users/Desktop/Helper Functions/New Folder/Interface.py"
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
E:\canopy\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
195 else:
196 filename = fname
--> 197 exec compile(scripttext, filename, 'exec') in glob, loc
198 else:
199 def execfile(fname, *where):
C:\Users\Desktop\Helper Functions\New Folder\Interface.py in <module>()
13 from Character import *;
14
---> 15 wind = Character("Sample.dat");
16
17
C:\Users\Desktop\Helper Functions\New Folder\Character.py in __init__(self, fn)
48 for line in fp:
49 nline = line.strip().split(": ");
---> 50 if(type(nline) != list):
51 continue;
52 else:
AttributeError: Character instance has no attribute 'stats'
I was wondering what is going on with this piece of code. Did I import the wrong way?
No, there is no issue with your import. Are you sure that you are running from the same location both times? Since your code specifies the filename with no path, your Python session needs to run in the directory where the Sample.dat file is located. I am asking because you define the stats attribute in the middle of your __init__, and the only way for it not to exist is for the return None above it to be executed. That happens only when the file cannot be opened, meaning it does not exist in the directory you are running from.
P.S. In Python:
Semicolons are not needed at the end of lines.
Parentheses are not needed around the condition in an if statement.
Classes should derive from object: class Character(object):
Docstrings (the strings you put in triple quotes above the method names) should go directly below the def line, as the first statement of the method. That will allow IPython and other tools to pick them up and display them when users put a question mark in front of them.
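One way to make the class independent of the current working directory is to resolve the data file relative to the module's own location, and to let a missing file raise an error instead of silently returning from __init__. The following is only a sketch in Python 3 under those assumptions. The base_dir parameter is hypothetical (it is not in the original code), and lines without a ": " separator are skipped.

```python
import os


class Character(object):
    """Load "key: value" lines from a stats file into self.stats."""

    def __init__(self, filename, base_dir=None):
        # base_dir is a hypothetical parameter; by default the file is looked
        # up next to this module rather than in the current working directory.
        if base_dir is None:
            base_dir = os.path.dirname(os.path.abspath(__file__))
        path = os.path.join(base_dir, filename)
        self.stats = {}  # defined before any I/O, so the attribute always exists
        # open() raises an error (IOError/OSError) if the file is missing,
        # which is easier to diagnose than a half-initialized object.
        with open(path) as fp:
            for line in fp:
                key, sep, value = line.strip().partition(": ")
                if sep:  # skip lines that do not contain "key: value"
                    self.stats[key] = value

    def display(self):
        """Print every stat on its own line."""
        for key, value in self.stats.items():
            print("{0}: {1}".format(key, value))
```

With this layout, Character("Sample.dat") behaves the same whether it is created from Character.py or from Interface.py, as long as Sample.dat sits next to Character.py.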