\t Doesn't work in my code, neither does \n - python-2.7

Hey, I am wondering why this doesn't work in my code.
I am led to believe from other forums that putting \t and \n in speech marks should fix the result:
zoo = ("Kangaroo","Leopard","Moose")
print("Tuple:", zoo, "\tLength:", len(zoo))
print(type( zoo))
bag = {'Red','Green','Blue'}
bag.add('Yellow')
print('\nSet:',bag,'\tLength' , len(bag))
print(type(bag))
print('\nIs Green In bag Set?:','Green' in bag)
print('Is orange in bag set?:', 'Orange' in bag)
box = {'Red','Purple','Yellow'}
print('\nSet:',box,'\t\tLength' , len(box))
print('Common to both sets:' , bag.intersection(box))
It just says:
('Tuple:', ('Kangaroo', 'Leopard', 'Moose'), '\tLength:', 3)
<type 'tuple'>
('\nSet:', set(['Blue', 'Green', 'Yellow', 'Red']), '\tLength', 4)
<type 'set'>
('\nIs Green In bag Set?:', True)
('Is orange in bag set?:', False)
('\nSet:', set(['Purple', 'Yellow', 'Red']), '\t\tLength', 3)
('Common to both sets:', set(['Red', 'Yellow']))

print is a statement, not a function, in Python 2.7, so the parentheses are interpreted as enclosing a tuple, and that tuple is what gets printed. The control characters are displayed (instead of taking effect) because you aren't printing the strings directly, but as part of a tuple.
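A minimal sketch of the effect, written so that it runs on both Python 2.7 and Python 3: turning a tuple into text uses the repr() of each element, and the repr() of a string shows escape sequences instead of applying them. The variable names are only illustrative.

```python
# Converting a tuple to text uses repr() on each element, so escape
# sequences inside its strings are displayed rather than applied.
label = "\tLength:"
as_string = str(label)        # the real tab character is kept
as_tuple = str((label, 3))    # the tab shows up as backslash + t

print(as_string)
print(as_tuple)
```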

One way you can do it without changing too much is this:
zoo = ("Kangaroo","Leopard","Moose")
zlength = len(zoo)
print "Tuple: {}, \tLength: {}".format(zoo,zlength)
print type(zoo)
bag = {'Red','Green','Blue'}
bag.add('Yellow')
blength = len(bag)
print '\nSet: {}, \tLength: {}'.format(list(bag), blength)
print type(bag)
print '\nIs Green In bag Set?:','Green' in bag
print 'Is orange in bag set?:', 'Orange' in bag
box = {'Red','Purple','Yellow'}
bolength = len(box)
print '\nSet: {}, \tLength: {}'.format(list(box),bolength)
print 'Common to both sets:' , list(bag.intersection(box))
OUTPUT:
Tuple: ('Kangaroo', 'Leopard', 'Moose'), Length: 3
Set: ['Blue', 'Green', 'Yellow', 'Red'], Length: 4
Is Green In bag Set?: True
Is orange in bag set?: False
Set: ['Purple', 'Red', 'Yellow'], Length: 3
Common to both sets: ['Yellow', 'Red']

In Python 2.7 the name print is recognized as the print statement, not as a built-in function. You can disable the statement and use the print() function by adding the following future statement at the top of your module:
from __future__ import print_function
Thus, for example:
>>> zoo = ("Kangaroo","Leopard","Moose")
>>> print("Tuple:", zoo, "\tLength:", len(zoo))
('Tuple:', ('Kangaroo', 'Leopard', 'Moose'), '\tLength:', 3)
>>> from __future__ import print_function
>>> print("Tuple:", zoo, "\tLength:", len(zoo))
Tuple: ('Kangaroo', 'Leopard', 'Moose') Length: 3
>>>

Related

How to create a column in pandas dataframe using conditions defined in dict

Here's my code:
import pandas as pd
import numpy as np
input = {'name': ['Andy', 'Alex', 'Amy', "Olivia" ],
'rating': ['A', 'A', 'B', "B" ],
'score': [100, 60, 70, 95]}
df = pd.DataFrame(input)
df['valid1']=np.where((df['score']==100) & (df['rating']=='A'),'true','false')
The code above works fine: it sets the new column 'valid1' to 'true' where 'score' is 100 and 'rating' is 'A'.
If the condition instead comes from a dict variable such as
c = {'score':'100', 'rating':'A'}
How can I use the condition defined in c to get the same result in a 'valid2' column? I tried the following code:
for key,value in c.iteritems():
    df['valid2']=np.where((df[key]==value),'true','false')
but got an error:
TypeError: Invalid type comparison
I'd define c as a pd.Series so that when you compare it to a dataframe, it automatically compares against each row while matching columns with series indices. Note that I made sure 100 was an integer and not a string.
c = pd.Series({'score':100, 'rating':'A'})
i = df.columns.intersection(c.index)
df.assign(valid1=df[i].eq(c).all(1))
name rating score valid1
0 Andy A 100 True
1 Alex A 60 False
2 Amy B 70 False
3 Olivia B 95 False
You can use the same series and still use numpy to speed things up:
c = pd.Series({'score':100, 'rating':'A'})
i = df.columns.intersection(c.index)
# column_stack needs a sequence (a list), not a generator
v = np.column_stack([df[col].values for col in i])
df.assign(valid1=(v == c.loc[i].values).all(1))
name rating score valid1
0 Andy A 100 True
1 Alex A 60 False
2 Amy B 70 False
3 Olivia B 95 False
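If the conditions have to stay in a plain dict, one possible sketch is to build one boolean mask per key and combine the masks with np.logical_and.reduce. The names conditions and masks are illustrative, and the dict values are assumed to have the same types as the columns (100 as an integer, not '100').

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Andy', 'Alex', 'Amy', 'Olivia'],
                   'rating': ['A', 'A', 'B', 'B'],
                   'score': [100, 60, 70, 95]})

# Hypothetical condition dict; the value types match the column dtypes.
conditions = {'score': 100, 'rating': 'A'}

# One boolean array per condition, then AND them together row by row.
masks = [(df[key] == value).values for key, value in conditions.items()]
df['valid2'] = np.logical_and.reduce(masks)
print(df)
```

Note that the loop in the question reassigns 'valid2' on every iteration, so only the last condition would survive even without the type error; combining the masks avoids that.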

Is it possible to convert a string to a list index in Python (2.7)?

I am an absolute beginner currently working on a Tic Tac Toe game and in the below example I am stuck on how to convert a string (e.g. "row1[0]") to a list index (e.g. row1[0]). Basically, I am unsure why eval(aa) = a does not work, but, for example, row1[0] = a does work (and yes, I am aware that eval() is usually frowned upon but have been unable to find any alternatives, as dictionaries, exec, and compile have all failed).
Please also note this is not the full code, just one of my attempts at figuring out the above. Would really appreciate your input on this specific step, I've been unable to find an answer so far. Thanks.
row1 = [_,_,_]
row2 = [_,_,_]
row3 = [_,_,_]
a = raw_input("Player 1, choose your marker - X or O: ")
aa = raw_input("Player 1, choose box (row#[box # - 1]): ")
#Attempt at assigning "X" or "O" to a row index.
eval(aa) = a
print row1
print row2
print row3
eval('row1[0]') = 'X' doesn't work for the same reason that 'a' = 'b' would not work: the left-hand side of an assignment cannot be a function call or a literal.
eval('row1[0]') returns the value stored at index 0 of the list row1, not a reference to that slot.
If row1 is ['', '', ''] then eval('row1[0]') will return an empty string (''), and you can't assign the string 'X' to an empty string.
If you would like to use eval, the user will have to input the list name and the index separately:
row1 = ['', '', '']
eval('row1')[0] = 'X'
print row1
# ['X', '', '']
I realize that you are a beginner but my suggestion is to use a class so you will be able to use getattr. getattr is a built-in function in Python that accepts an object and a string and returns the object's attribute with that name.
class Board(object):
    def __init__(self):
        self.row1 = ['', '', '']
        self.row2 = ['', '', '']
        self.row3 = ['', '', '']
board = Board()
shape = raw_input("Player 1, choose your marker - X or O: ")
# the user will input, for example, 1,0
row_and_col = raw_input("Player 1, choose box (row#,col#): ")
# split divides the string it is called on at the character it gets as
# the argument. In this case row_and_col is '1,0', so after this line
# row_number is '1' and col_number is '0'
row_number, col_number = row_and_col.split(',')
relevant_row = getattr(board, 'row' + row_number)
# now relevant_row actually holds the reference to the list,
# not its value as eval would have returned
relevant_row[int(col_number)] = shape
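Another option that avoids both eval and getattr is to keep the board as a single list of rows and parse the player's input into two integers. This is only a sketch: place_marker and the zero-based 'row,col' input format are assumptions made for the example, and raw_input is left out so the snippet runs unattended.

```python
def place_marker(board, choice, marker):
    """Write marker into the board cell named by a 'row,col' string."""
    row_text, col_text = choice.split(',')
    row, col = int(row_text), int(col_text)   # zero-based indices
    if board[row][col] != '':
        raise ValueError('box %s is already taken' % choice)
    board[row][col] = marker


# A 3x3 board stored as a list of rows; '' marks an empty box.
board = [['', '', ''] for _ in range(3)]
place_marker(board, '1,0', 'X')   # second row, first column
for row in board:
    print(row)
```

Because the rows live in one list, no variable name ever has to be built from a string.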

How can I print output with a defined number of characters in each line with python?

I used textwrap.fill (textwrap.fill(text, 6)) to limit each line to 6 characters, but there is a problem: I want the line break to fall after exactly that many characters, even in the middle of a word.
For example, using textwrap.fill('I am a student', 8):
what I want:
(I am a s
tudent)
output:
(I am a
student)
One approach:
>>> text = 'I am a student'
>>> for i in range(0, len(text), 8):
...     print text[i:i+8]
...
I am a s
tudent
for i in range(0, len(text), 8) means "Give me numbers starting at 0, incrementing by 8 at a time, and ending before the length of the text."
EDIT
If you want the value in a single string:
>>> wrapped = "\n".join(text[i:i+8] for i in range(0, len(text), 8))
>>> wrapped
'I am a s\ntudent'
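The same slicing idea can be wrapped in a small helper and compared with textwrap.fill, which only breaks at whitespace. This is a sketch that runs on Python 2.7 and 3; chunk_text is a name made up for the example.

```python
import textwrap


def chunk_text(text, width):
    """Split text into pieces of exactly width characters (the last may be shorter)."""
    return [text[i:i + width] for i in range(0, len(text), width)]


text = 'I am a student'
print(chunk_text(text, 8))       # fixed-width pieces, words may be cut
print(textwrap.fill(text, 8))    # breaks at the space instead
```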

How to filter on pandas dataframe when column data type is a list

I am having some trouble filtering a pandas dataframe on a column (let's call it column_1) whose data type is a list. Specifically, I want to return only the rows where the intersection of column_1 and another predetermined list is not empty. However, when I try to put the logic inside the arguments of the .where function, I always get errors. Below are my attempts, with the errors returned.
Attempting to test whether or not a single element is inside the list:
table[element in table['column_1']]
returns the error ...
KeyError: False
Trying to compare a list to all of the lists in the rows of the dataframe:
table[[349569] == table.column_1]
returns the error
Arrays were different lengths: 23041 vs 1
I'm trying to get these two intermediate steps down before I test the intersection of the two lists.
Thanks for taking the time to read over my problem!
Consider the pd.Series s:
s = pd.Series([[1, 2, 3], list('abcd'), [9, 8, 3], ['a', 4]])
print(s)
0 [1, 2, 3]
1 [a, b, c, d]
2 [9, 8, 3]
3 [a, 4]
dtype: object
And a testing list test
test = ['b', 3, 4]
Apply a lambda function that converts each element of s to a set and intersects it with test:
print(s.apply(lambda x: list(set(x).intersection(test))))
0 [3]
1 [b]
2 [3]
3 [4]
dtype: object
To use it as a mask, use bool instead of list
s.apply(lambda x: bool(set(x).intersection(test)))
0 True
1 True
2 True
3 True
dtype: bool
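Applied to a dataframe, the boolean mask can be used directly for row selection. The sketch below uses a made-up table and a made-up wanted set. As for the first error in the question, element in table['column_1'] checks the index labels of the Series and yields a single True or False, which is then looked up as a column key, hence KeyError: False.

```python
import pandas as pd

# Hypothetical table; column_1 holds a list in every row.
table = pd.DataFrame({'id': [1, 2, 3],
                      'column_1': [[349569, 12], [7, 8], [12, 99]]})
wanted = {12, 349569}

# True where the row's list shares at least one element with wanted.
mask = table['column_1'].apply(lambda items: bool(wanted.intersection(items)))
filtered = table[mask]
print(filtered)
```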
For long-term use you can wrap the whole workflow in functions and apply them wherever you need. Since you did not post an example dataset, I will use one of my own. Suppose I have a table of text posts: first I collect the #tags of each post into a list, then I search for only the #tags I want and filter the data.
import re
import pandas as pd
# find all the tags in the message
def find_hashtags(post_msg):
    combo = r'#\w+'
    rx = re.compile(combo)
    hash_tags = rx.findall(post_msg)
    return hash_tags
# find the required match according to a tag list and return True or False
def match_tags(tag_list, htag_list):
    matched_items = bool(set(tag_list).intersection(htag_list))
    return matched_items
test_data = [{'text': 'Head nipid mõnusateks sõitudeks kitsastel tänavatel. #TipStop'},
{'text': 'Homses Rooli Võimus uus #Peugeot208!\nVaata kindlasti.'},
{'text': 'Soovitame ennast tulevikuks ette valmistada, electric car sest uus #PeugeotE208 on peagi kohal! ⚡️⚡️\n#UnboringTheFuture'},
{'text': "Aeg on täiesti uueks roadtrip'i kogemuseks! \nLase ennast üllatada - #Peugeot5008!"},
{'text': 'Tõeline ikoon, mille stiil avaldab muljet läbi eco car, electric cars generatsioonide #Peugeot504!'}
]
test_df = pd.DataFrame(test_data)
# find all the hashtags
test_df["hashtags"] = test_df["text"].apply(lambda x: find_hashtags(x))
# the only hashtags we are interested in
tag_search = ["#TipStop", "#Peugeot208"]
# match the tags in our list
test_df["tag_exist"] = test_df["hashtags"].apply(lambda x: match_tags(x, tag_search))
# filter the data
main_df = test_df[test_df.tag_exist]

Quick implementation of character n-grams for word

I wrote the following code for computing character bigrams and the output is right below. My question is, how do I get an output that excludes the last character (i.e. t)? And is there a quicker and more efficient method for computing character n-grams?
>>> b='student'
>>> y=[]
>>> for x in range(len(b)):
...     n=b[x:x+2]
...     y.append(n)
...
>>> y
['st', 'tu', 'ud', 'de', 'en', 'nt', 't']
Here is the result I would like to get: ['st','tu','ud','de','en','nt']
Thanks in advance for your suggestions.
To generate bigrams:
In [8]: b='student'
In [9]: [b[i:i+2] for i in range(len(b)-1)]
Out[9]: ['st', 'tu', 'ud', 'de', 'en', 'nt']
To generalize to a different n:
In [10]: n=4
In [11]: [b[i:i+n] for i in range(len(b)-n+1)]
Out[11]: ['stud', 'tude', 'uden', 'dent']
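The list comprehension above can be wrapped in a small function. This is a sketch that assumes n is at least 1; char_ngrams is a name chosen for the example.

```python
def char_ngrams(word, n=2):
    """Return every run of n consecutive characters in word."""
    # range() is empty when n > len(word), so short words give [].
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams('student'))
print(char_ngrams('student', 4))
print(char_ngrams('st', 4))
```

Stopping the range at len(word) - n + 1 is what drops the trailing, too-short 't' from the question's output.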
Try zip:
>>> def word2ngrams(text, n=3, exact=True):
...     """ Convert text into character ngrams. """
...     return ["".join(j) for j in zip(*[text[i:] for i in range(n)])]
...
>>> word2ngrams('foobarbarblacksheep')
['foo', 'oob', 'oba', 'bar', 'arb', 'rba', 'bar', 'arb', 'rbl', 'bla', 'lac', 'ack', 'cks', 'ksh', 'she', 'hee', 'eep']
but do note that it's slower:
import string, random, time
def zip_ngrams(text, n=3, exact=True):
    return ["".join(j) for j in zip(*[text[i:] for i in range(n)])]
def nozip_ngrams(text, n=3):
    return [text[i:i+n] for i in range(len(text)-n+1)]
# Generate 10000 random strings of length 100.
words = [''.join(random.choice(string.ascii_uppercase) for j in range(100)) for i in range(10000)]
start = time.time()
x = [zip_ngrams(w) for w in words]
print time.time() - start
start = time.time()
y = [nozip_ngrams(w) for w in words]
print time.time() - start
print x==y
[out]:
0.314492940903
0.197558879852
True
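A single time.time() measurement can be noisy, so a comparison like this is often repeated with the standard timeit module. The sketch below is one way to do that on Python 2.7 or 3; the sample size and repeat counts are arbitrary choices, and the actual timings will differ from machine to machine.

```python
import random
import string
import timeit


def zip_ngrams(text, n=3):
    return ["".join(chars) for chars in zip(*[text[i:] for i in range(n)])]


def nozip_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# 1000 random strings of length 100 (fewer than above, to keep the run short).
words = [''.join(random.choice(string.ascii_uppercase) for _ in range(100))
         for _ in range(1000)]

for func in (zip_ngrams, nozip_ngrams):
    # Best of 3 repeats, each repeat making 5 passes over all words.
    best = min(timeit.repeat(lambda: [func(w) for w in words],
                             number=5, repeat=3))
    print(func.__name__, best)

# Both implementations should produce identical n-grams.
same = all(zip_ngrams(w) == nozip_ngrams(w) for w in words)
print(same)
```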
Although this is late, NLTK has a built-in function that implements ngrams:
# python 3
from nltk import ngrams
["".join(k1) for k1 in list(ngrams("hello world",n=3))]
['hel', 'ell', 'llo', 'lo ', 'o w', ' wo', 'wor', 'orl', 'rld']
This function gives you the n-grams of every length from 1 to n:
def getNgrams(sentences, n):
    ngrams = []
    for sentence in sentences:
        _ngrams = []
        for _n in range(1,n+1):
            # start at position 0 and stop at the last full window
            for pos in range(len(sentence)-_n+1):
                _ngrams.append([sentence[pos:pos+_n]])
        ngrams.append(_ngrams)
    return ngrams