Extracting elements from list using Python - python-2.7

How can I extract '1', '11' and '111' from this list?
T0 = ['4\t1\t\n', '0.25\t11\t\n', '0.2\t111\t\n']
To extract '4', '0.25' and '0.2' I used this:
def extract(T0):
    T1 = []
    for i in range(0, len(T0)):
        pos = T0[i].index('\t')
        T1.append(T0[i][0:pos])
    return T1
Then I got:
T1 = ['4', '0.25', '0.2']
But I don't know how to extract the rest.
Can you help me, please?

Using your code as a base, it can be done as below. Each value is returned as a string if it is alphabetic, otherwise as an integer.
def extract(T0):
    T1 = []
    for i in range(len(T0)):
        # the second tab-separated field is the one we want
        tmp = T0[i].split('\t')[1]
        if tmp.isalpha():
            T1.append(tmp)
        else:
            T1.append(int(tmp))
    return T1
Alternatively, try the more compact version below, which uses list comprehensions:
def extract(T0):
    # return the value as a string if it is alphabetic, otherwise as an integer
    # change int to float if you want floats instead
    tmp = [i.split('\t')[1] for i in T0]
    return [i if i.isalpha() else int(i) for i in tmp]
Example:
T0 = ['X\tY\tf(x.y)\n', '0\t0\t\n', '0.1\t10\t\n', '0.2\t20\t\n', '0.3\t30\t\n']
extract(T0)  # returns ['Y', 0, 10, 20, 30]

You can accomplish this with the re module and a list comprehension.
import re
# create a regular expression object
regex = re.compile(r'[0-9]{1,}\.{0,1}[0-9]{0,}')
# assign the input list
T0 = ['4\t1\t\n', '0.25\t11\t\n', '0.2\t111\t\n']
# get a list of extractions using the regex
extractions = [regex.findall(e) for e in T0]
print extractions
# => [['4', '1'], ['0.25', '11'], ['0.2', '111']]
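If only the second column is needed, a plain split may be enough. Here is a minimal sketch, assuming every line is tab-separated; second_column is a hypothetical helper name, not something from the answers above. It is written so that it runs on both Python 2.7 and Python 3.

```python
def second_column(lines, sep="\t"):
    """Return the second sep-separated field of each line (hypothetical helper)."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > 1:          # skip lines that have no second field
            result.append(fields[1])
    return result

T0 = ['4\t1\t\n', '0.25\t11\t\n', '0.2\t111\t\n']
print(second_column(T0))  # ['1', '11', '111']
```

The values stay strings here; wrap fields[1] in int() or float() if numbers are wanted.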

Related

Pyparsing sum of optionally repeated expressions

I am working on a simple parser to handle expressions such as:
"""
FOO*1.5
+
BAR*3
"""
The goal is a final numeric result, where FOO and BAR are replaced at runtime by the values returned by external function calls. For example: FOO ---> def foo(): return 2 and BAR ---> def bar(): return 4, which in our example would yield (2*1.5)+(4*3) = 3+12 = 15.
This is what I have so far:
from pyparsing import *
from decimal import Decimal
WEIGHT_OPERATORS = ['*', '/']
NUMERIC_OPERATORS = ['+', '-']
def make_score(input):
    if input[0] == 'FOO':
        return 5
    elif input[0] == 'BAR':
        return 10
    return 1
def make_decimal(input):
    try:
        return Decimal(input[0])
    except ValueError:
        pass
    return 0
SCORE = Word(alphanums + '_').setParseAction(make_score)
WEIGHT_OPERATOR = oneOf(WEIGHT_OPERATORS)
WEIGHT = Word(nums+'.').setParseAction(make_decimal)
INDIVIDUAL_EXPRESSION = SCORE('score') \
+ WEIGHT_OPERATOR('weight_operator') \
+ WEIGHT('weight')
print INDIVIDUAL_EXPRESSION
print INDIVIDUAL_EXPRESSION.parseString(expression).dump()
Up to here, all works well.
What I am missing is the ability to "chain" INDIVIDUAL_EXPRESSIONs so that they can be added to or subtracted from each other, as in the simple example above. I have tried:
GLOBAL_EXPRESSION = infixNotation(
    INDIVIDUAL_EXPRESSION,
    [
        (NUMERIC_OPERATORS, 2, opAssoc.RIGHT,)
        # or (NUMERIC_OPERATORS, 1, opAssoc.LEFT,), etc... :(
    ]
)
print GLOBAL_EXPRESSION
print GLOBAL_EXPRESSION.parseString(expression).dump()
Nope.
And:
INDIVIDUAL_EXPRESSION = SCORE('score') \
+ WEIGHT_OPERATOR('weight_operator') \
+ WEIGHT('weight')
+ ZeroOrMore(NUMERIC_OPERATORS)
I was hoping to get a final list or dict that would be easy to compute, but to no avail. I am doing something wrong, but what?
Try this:
GLOBAL_EXPRESSION = OneOrMore(Group(INDIVIDUAL_EXPRESSION) + Optional(oneOf(NUMERIC_OPERATORS)))
GE_LIST = Group(delimitedList(GLOBAL_EXPRESSION))
print GE_LIST.parseString(expression)
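Once the parse produces groups separated by operators, the sum itself can be computed in plain Python. The sketch below does not call pyparsing at all; it assumes the tokens have already been converted to a flat list of [score, operator, weight] groups with '+' or '-' strings between them. That layout and the evaluate name are assumptions made for illustration, so check them against what your grammar actually returns (for the GE_LIST above that would probably be the first element of the result).

```python
from decimal import Decimal

def evaluate(tokens):
    """Sum weighted scores from a flat list such as
    [[5, '*', Decimal('1.5')], '+', [10, '*', Decimal('3')]].
    The token layout is an assumption, not a guaranteed pyparsing output."""
    total = Decimal(0)
    sign = 1                      # sign that applies to the next group
    for tok in tokens:
        if tok == '+':
            sign = 1
        elif tok == '-':
            sign = -1
        else:
            score, op, weight = tok
            if op == '*':
                value = Decimal(score) * weight
            else:                 # the only other weight operator is '/'
                value = Decimal(score) / weight
            total += sign * value
            sign = 1              # a group without a leading operator is added
    return total

print(evaluate([[5, '*', Decimal('1.5')], '+', [10, '*', Decimal('3')]]))  # 37.5
```

With the scores hard-coded in make_score (5 and 10) the result is 37.5; with the runtime values from the question (2 and 4) the same function gives 15.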

Python 3.6 - Regular Expression with

The following code returns a dictionary of part numbers from a spreadsheet and works as intended.
import openpyxl, os, pprint, re
wb = openpyxl.load_workbook('RiverbedInventory.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
max_row = sheet.max_row
inventory = {}
for row in range(1, max_row+1):
    prodName = sheet['G' + str(row)].value
    inventory.setdefault(prodName, {'count': 0})
    inventory[prodName]['count'] += 1
pprint.pprint(inventory)
I'm trying to filter the results using a regular expression to only return part #s matching specific criteria (part #s that start with VCX in this case). I keep getting "TypeError: expected string or bytes-like object" failure messages. I've googled this quite a bit but can't find an answer. Here's the regular expression code I'm using:
import openpyxl, os, pprint, re
wb = openpyxl.load_workbook('RiverbedInventory.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
max_row = sheet.max_row
steelhead = re.compile(r'VCX-\d+-\w+')
inventory = {}
for row in range(1, max_row+1):
    prodCode = sheet['G' + str(row)].value
    inventory.setdefault(prodCode, {'count': 0})
    inventory[prodCode]['count'] += 1
pprint.pprint(steelhead.findall(inventory))
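In steelhead.findall(inventory), you pass a dictionary instead of a string. The findall method of a compiled pattern expects a string (or bytes) to search, which is exactly what the TypeError is telling you.
You can filter the keys with a dictionary comprehension instead: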
print( {k: inventory[k] for k in inventory if steelhead.search(k)} )
See the Python 3 demo:
import re
inventory = {'UMTS-UNV-E': {'count':59}, 'VCX-020-E': {'count':2}, 'VCX-030-E': {'count':3}}
steelhead = re.compile(r'VCX-\d+-\w+')
print( {k: inventory[k] for k in inventory if steelhead.search(k)} )
Output: {'VCX-030-E': {'count': 3}, 'VCX-020-E': {'count': 2}}
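The counting and the filtering can also be done in one pass. This is only a sketch under assumptions: the codes list is made-up data standing in for the column G values, and the None entry mimics an empty cell, which would raise the same TypeError if it were passed to search.

```python
import re
from collections import Counter

# Hypothetical cell values standing in for the column read from the sheet;
# None mimics an empty cell.
codes = ['VCX-020-E', 'UMTS-UNV-E', 'VCX-030-E', 'VCX-020-E', None]

steelhead = re.compile(r'VCX-\d+-\w+')

# Count only the string values that match the pattern.
counts = Counter(
    code for code in codes
    if isinstance(code, str) and steelhead.search(code)
)
print(dict(counts))
```

Counter replaces the setdefault bookkeeping, and the isinstance check keeps non-string cells away from the regex.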

Python 2.7 dictionary value not taking float as the input

I am working with Python 2.7 and trying to insert a value which is a float to a key. However, all the values are being inserted as 0.0. The polarity value is being inserted as 0.0 and not the actual value.
Code Snippet:
from textblob import TextBlob
import json
with open('new-webmd-answer.json') as data_file:
    data = json.load(data_file, strict=False)
data_new = {}
lst = []
for d in data:
    string = d["answerContent"]
    blob = TextBlob(string)
    #print blob
    #print blob.sentiment
    #print d["questionId"]
    data_new['questionId'] = d["questionId"]
    data_new['answerMemberId'] = d["answerMemberId"]
    string1 = str(blob.sentiment.polarity)
    print string1
    data_new['polarity'] = string1
    #print blob.sentiment.polarity
    lst.append((data_new))
json_data = json.dumps(lst)
#print json_data
with open('polarity.json', 'w') as outfile:
    json.dump(json_data, outfile)
The way your code is currently written, you reuse a single dictionary: you overwrite its values on each iteration and then append that same dictionary to the list every time.
Let's say your dictionary was dict = {"a" : 1} and you append it to a list:
alist.append(dict)
alist
[{'a' : 1}]
Then you change the value with dict["a"] = 0 and append the dictionary to the list again with alist.append(dict):
alist
[{'a' : 0}, {'a' : 0}]
This occurs because dictionaries are mutable and the list holds references to the same object, not copies. For a more complete overview of mutable vs. immutable objects, see the Python documentation.
To achieve your expected output, make a new dictionary on each iteration over data:
lst = []
for d in data:
    data_new = {}  # makes a new dictionary with each iteration
    string = d["answerContent"]
    blob = TextBlob(string)
    # print blob
    # print blob.sentiment
    # print d["questionId"]
    data_new['questionId'] = d["questionId"]
    data_new['answerMemberId'] = d["answerMemberId"]
    string1 = str(blob.sentiment.polarity)
    print string1
    data_new['polarity'] = string1
    # print blob.sentiment.polarity
    lst.append((data_new))
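The effect is easy to reproduce without TextBlob. Here is a small sketch with made-up polarity values (the names shared, aliased and fresh are just for illustration) that contrasts the two loops:

```python
# One dict reused across iterations: every list entry is the same object.
shared = {}
aliased = []
for value in (0.5, -0.2):
    shared['polarity'] = value     # mutates the one shared dict
    aliased.append(shared)         # appends another reference to it

# A new dict per iteration: every list entry keeps its own value.
fresh = []
for value in (0.5, -0.2):
    entry = {'polarity': value}
    fresh.append(entry)

print(aliased)  # both entries show the last value written
print(fresh)    # each entry shows its own value
```

So if the last answer in your file happens to have polarity 0.0, every entry in the first kind of list will show 0.0.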

find all occurrences inside a list

I'm trying to implement a function to find occurrences in a list, here's my code:
def all_numbers():
    num_list = []
    c.execute("SELECT * FROM myTable")
    for row in c:
        num_list.append(row[1])
    return num_list
def compare_results():
    look_up_num = raw_input("Lucky number: ")
    occurrences = [i for i, x in enumerate(all_numbers()) if x == look_up_num]
    return occurrences
I keep getting an empty list instead of the occurrences, even when I enter a number that is in the list.
Your code does the following:
1. It fetches everything from the database. Each row is a sequence.
2. Then, it takes all these results and adds them to a list.
3. It returns this list.
4. Next, your code goes through each item in the list (remember, it's a sequence, like a tuple) and fetches the item and its index (this is what enumerate does).
5. Next, you attempt to compare the sequence with a string, and if it matches, return it as part of a list.
At #5, the script fails because you are comparing a tuple to a string. Here is a simplified example of what you are doing:
>>> def all_numbers():
...     return [(1,5), (2,6)]
...
>>> lucky_number = 5
>>> for i, x in enumerate(all_numbers()):
...     print('{} {}'.format(i, x))
...     if x == lucky_number:
...         print 'Found it!'
...
0 (1, 5)
1 (2, 6)
As you can see, on each iteration x is the whole tuple, so it will never equal 5, even though a matching row exists.
You can have the database do your dirty work for you, by returning only the number of rows that match your lucky number:
def get_number_count(lucky_number):
    """ Returns the number of times the lucky_number
    appears in the database """
    c.execute('SELECT COUNT(*) FROM myTable WHERE number_column = %s', (lucky_number,))
    result = c.fetchone()
    return result[0]
def get_input_number():
    """ Get the number to be searched in the database """
    lookup_num = raw_input('Lucky number: ')
    return get_number_count(lookup_num)
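To try the counting query without a real database, here is a self-contained sketch using an in-memory SQLite table. The table contents are made up, number_column is the placeholder column name from above, and note that sqlite3 uses ? as its parameter marker, whereas the %s above is the style used by drivers such as MySQLdb or psycopg2.

```python
import sqlite3

# In-memory database with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, number_column INTEGER)")
conn.executemany("INSERT INTO myTable (number_column) VALUES (?)", [(5,), (6,), (5,)])

def get_number_count(lucky_number):
    """Return how many rows hold lucky_number, counted by the database."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM myTable WHERE number_column = ?",
        (lucky_number,),
    )
    return cur.fetchone()[0]

print(get_number_count(5))  # 2
```

Passing the value as a parameter, rather than formatting it into the SQL string, also avoids SQL injection from user input.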
raw_input is returning a string. Try converting it to a number.
occurrences = [i for i, x in enumerate(all_numbers()) if x == int(look_up_num)]
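A quick sketch of why the comparison fails and how the conversion fixes it. The stored list is made-up data and find_occurrences is a hypothetical name; the only assumption about your table is that the column holds integers.

```python
def find_occurrences(numbers, user_input):
    """Return the indexes at which int(user_input) appears in numbers."""
    try:
        target = int(user_input)   # user input arrives as text
    except ValueError:
        return []                  # non-numeric input matches nothing
    return [i for i, x in enumerate(numbers) if x == target]

stored = [7, 5, 3, 5]              # hypothetical column values
print(find_occurrences(stored, "5"))                      # [1, 3]
print([i for i, x in enumerate(stored) if x == "5"])      # [] because 5 != "5"
```

Converting once before the loop also avoids calling int() for every element.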

Pythonic way to convert a list of integers into a string of comma-separated ranges

I have a list of integers which I need to parse into a string of ranges.
For example:
[0, 1, 2, 3] -> "0-3"
[0, 1, 2, 4, 8] -> "0-2,4,8"
And so on.
I'm still learning more pythonic ways of handling lists, and this one is a bit difficult for me. My latest thought was to create a list of lists which keeps track of paired numbers:
[ [0, 3], [4, 4], [5, 9], [20, 20] ]
I could then iterate across this structure, printing each sub-list as either a range, or a single value.
I don't like doing this in two iterations, but I can't seem to keep track of each number within each iteration. My thought would be to do something like this:
Here's my most recent attempt. It works, but I'm not fully satisfied; I keep thinking there's a more elegant solution which completely escapes me. The string-handling iteration isn't the nicest, I know -- it's pretty early in the morning for me :)
def createRangeString(zones):
    rangeIdx = 0
    ranges = [[zones[0], zones[0]]]
    for zone in list(zones):
        if ranges[rangeIdx][1] in (zone, zone-1):
            ranges[rangeIdx][1] = zone
        else:
            ranges.append([zone, zone])
            rangeIdx += 1
    rangeStr = ""
    for range in ranges:
        if range[0] != range[1]:
            rangeStr = "%s,%d-%d" % (rangeStr, range[0], range[1])
        else:
            rangeStr = "%s,%d" % (rangeStr, range[0])
    return rangeStr[1:]
Is there a straightforward way I can merge this into a single iteration? What else could I do to make it more Pythonic?
>>> from itertools import count, groupby
>>> L=[1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 19, 20, 22, 23, 40, 44]
>>> G=(list(x) for _,x in groupby(L, lambda x,c=count(): next(c)-x))
>>> print ",".join("-".join(map(str,(g[0],g[-1])[:len(g)])) for g in G)
1-4,6-9,12-13,19-20,22-23,40,44
The idea here is to pair each element with a running counter from count(). The difference between the counter and the value is constant across a run of consecutive values, so groupby() with that difference as the key does the rest of the work.
As Jeff suggests, an alternative to count() is to use enumerate(). This adds some extra cruft that needs to be stripped out in the print statement:
G=(list(x) for _,x in groupby(enumerate(L), lambda (i,x):i-x))
print ",".join("-".join(map(str,(g[0][1],g[-1][1])[:len(g)])) for g in G)
Update: for the sample list given here, the version with enumerate runs about 5% slower than the version using count() on my computer
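The one-liners above rely on Python 2 features (the print statement and tuple-unpacking lambda parameters). For reference, here is a sketch of the same enumerate() plus groupby() idea spelled out so that it also runs on Python 3. It assumes the input is sorted and has no duplicates, and to_range_string is just an illustrative name.

```python
from itertools import groupby

def to_range_string(numbers):
    """Collapse sorted, distinct integers into a string like '0-2,4,8'."""
    parts = []
    # Within a run of consecutive values, value - index stays constant,
    # so it serves as the grouping key.
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        first, last = values[0], values[-1]
        parts.append(str(first) if first == last else "%d-%d" % (first, last))
    return ",".join(parts)

print(to_range_string([0, 1, 2, 4, 8]))  # 0-2,4,8
```

An empty list simply yields an empty string, since groupby() produces no groups.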
Whether this is pythonic is up for debate. But it is very compact. The real meat is in the Rangify() function. There's still room for improvement if you want efficiency or Pythonism.
def CreateRangeString(zones):
    #assuming sorted and distinct
    deltas = [a-b for a, b in zip(zones[1:], zones[:-1])]
    deltas.append(-1)
    def Rangify((b, p), (z, d)):
        if p is not None:
            if d == 1: return (b, p)
            b.append('%d-%d'%(p,z))
            return (b, None)
        else:
            if d == 1: return (b, z)
            b.append(str(z))
            return (b, None)
    return ','.join(reduce(Rangify, zip(zones, deltas), ([], None))[0])
To describe the parameters:
deltas is the distance to the next value (inspired from an answer here on SO)
Rangify() does the reduction on these parameters
b - base or accumulator
p - previous start range
z - zone number
d - delta
To concatenate strings you should use ','.join. This removes the 2nd loop.
def createRangeString(zones):
    rangeIdx = 0
    ranges = [[zones[0], zones[0]]]
    for zone in list(zones):
        if ranges[rangeIdx][1] in (zone, zone-1):
            ranges[rangeIdx][1] = zone
        else:
            ranges.append([zone, zone])
            rangeIdx += 1
    return ','.join(
        map(
            lambda p: '%s-%s'%tuple(p) if p[0] != p[1] else str(p[0]),
            ranges
        )
    )
Although I prefer a more generic approach:
from itertools import groupby
# auxiliary functor to allow groupby to compare by adjacent elements.
class cmp_to_groupby_key(object):
    def __init__(self, f):
        self.f = f
        self.uninitialized = True
    def __call__(self, newv):
        if self.uninitialized or not self.f(self.oldv, newv):
            self.curkey = newv
            self.uninitialized = False
        self.oldv = newv
        return self.curkey
# returns the first and last element of an iterable with O(1) memory.
def first_and_last(iterable):
    first = next(iterable)
    last = first
    for i in iterable:
        last = i
    return (first, last)
# convert groups into list of range strings
def create_range_string_from_groups(groups):
    for _, g in groups:
        first, last = first_and_last(g)
        if first != last:
            yield "{0}-{1}".format(first, last)
        else:
            yield str(first)
def create_range_string(zones):
    groups = groupby(zones, cmp_to_groupby_key(lambda a,b: b-a<=1))
    return ','.join(create_range_string_from_groups(groups))
assert create_range_string([0,1,2,3]) == '0-3'
assert create_range_string([0, 1, 2, 4, 8]) == '0-2,4,8'
assert create_range_string([1,2,3,4,6,7,8,9,12,13,19,20,22,22,22,23,40,44]) == '1-4,6-9,12-13,19-20,22-23,40,44'
This is more verbose, mainly because I have used generic functions that I have and that are minor variations of itertools functions and recipes:
from itertools import tee, izip_longest
def pairwise_longest(iterable):
    "variation of pairwise in http://docs.python.org/library/itertools.html#recipes"
    a, b = tee(iterable)
    next(b, None)
    return izip_longest(a, b)
def takeuntil(predicate, iterable):
    """returns all elements before and including the one for which the predicate is true
    variation of http://docs.python.org/library/itertools.html#itertools.takewhile"""
    for x in iterable:
        yield x
        if predicate(x):
            break
def get_range(it):
    "gets a range from a pairwise iterator"
    rng = list(takeuntil(lambda (a,b): (b is None) or (b-a>1), it))
    if rng:
        b, e = rng[0][0], rng[-1][0]
        return "%d-%d" % (b,e) if b != e else "%d" % b
def create_ranges(zones):
    it = pairwise_longest(zones)
    return ",".join(iter(lambda:get_range(it),None))
k=[0,1,2,4,5,7,9,12,13,14,15]
print create_ranges(k) #0-2,4-5,7,9,12-15
def createRangeString(zones):
    """Create a string with integer ranges in the format of '%d-%d'
    >>> createRangeString([0, 1, 2, 4, 8])
    "0-2,4,8"
    >>> createRangeString([1,2,3,4,6,7,8,9,12,13,19,20,22,22,22,23,40,44])
    "1-4,6-9,12-13,19-20,22-23,40,44"
    """
    buffer = []
    try:
        st = ed = zones[0]
        for i in zones[1:]:
            delta = i - ed
            if delta == 1: ed = i
            elif not (delta == 0):
                buffer.append((st, ed))
                st = ed = i
        else: buffer.append((st, ed))
    except IndexError:
        pass
    return ','.join(
        "%d" % st if st==ed else "%d-%d" % (st, ed)
        for st, ed in buffer)
Here is my solution. You need to keep track of various pieces of information while you iterate through the list and create the result - this screams generator to me. So here goes:
def rangeStr(start, end):
    '''convert two integers into a range start-end, or a single value if they are the same'''
    return str(start) if start == end else "%s-%s" %(start, end)
def makeRange(seq):
    '''take a sequence of ints and return a sequence
    of strings with the ranges
    '''
    # make sure that seq is an iterator
    seq = iter(seq)
    start = seq.next()
    current = start
    for val in seq:
        current += 1
        if val != current:
            yield rangeStr(start, current-1)
            start = current = val
    # make sure the last range is included in the output
    yield rangeStr(start, current)
def stringifyRanges(seq):
    return ','.join(makeRange(seq))
>>> l = [1,2,3, 7,8,9, 11, 20,21,22,23]
>>> l2 = [1,2,3, 7,8,9, 11, 20,21,22,23, 30]
>>> stringifyRanges(l)
'1-3,7-9,11,20-23'
>>> stringifyRanges(l2)
'1-3,7-9,11,20-23,30'
My version will work correctly if given an empty list, which I think some of the others will not.
>>> stringifyRanges( [] )
''
makeRange will work on any iterator that returns integers, and it lazily yields a sequence of strings, so it can be used on infinite sequences.
edit: I have updated the code to handle single numbers that are not part of a range.
edit2: refactored out rangeStr to remove duplication.
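How about this mess...
def rangefy(mylist):
    mylist = mylist + [None]
    # start from the first value rather than 0, so a list that
    # does not begin at 0 is not reported as starting there
    mystr, start = "", mylist[0]
    for i, v in enumerate(mylist[:-1]):
        if mylist[i+1] != v + 1:
            mystr += ["%d,"%v,"%d-%d,"%(start,v)][start!=v]
            start = mylist[i+1]
    return mystr[:-1]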