Getting values with regular expressions from a string


I have a string like this:
Some calculations income:11111.11 com:11.11 outgo:22222.22 com:22.22 cancel:33333.33 com:333.33
The income, outgo and cancel parts can appear in any order.
I need to get the income sum and its commission,
the outgo sum and its commission,
and the canceled sum and its commission.
Right now I do it this way:
first I get the substring income:11111.11 com:11.11 with the expression income:.*?com:(\-){0,1}[0,1,2,3,4,5,6,7,8,9,.]+, then I extract income:11111.11 from it, and then I get the sum,
and so on for the other parts.
I think there must be a better way to do it. Can someone tell me how?

Since you didn't specify which language you are using, here is a way to get the pairs (item/commission) in Python:
Python 3.7.4 (default, Aug 12 2019, 14:45:07)
[GCC 9.1.1 20190605 (Red Hat 9.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> from pprint import pprint
>>> s='Some calculations income:11111 com:11 outgo:22222 com:22 cancel:33333 com:333'
>>> pairs = re.compile(r'(\w+:\d+)\s(com:\d+)')
>>> pprint(pairs.findall(s))
[('income:11111', 'com:11'),
('outgo:22222', 'com:22'),
('cancel:33333', 'com:333')]
>>>
The regex here is just (\w+:\d+)\s(com:\d+).
If you have float numbers on the right side of the key:value pairs, then change the regex to (\w+:[\d.]+)\s(com:[\d.]+):
>>> s='Some calculations income:11111.11 com:11.11 outgo:22222.22 com:22.22 cancel:33333.33 com:333.33'
>>> pairs = re.compile(r'(\w+:[\d.]+)\s(com:[\d.]+)')
>>> pprint(pairs.findall(s))
[('income:11111.11', 'com:11.11'),
('outgo:22222.22', 'com:22.22'),
('cancel:33333.33', 'com:333.33')]
>>>
Calculations depend on what language you are using.
Since you already have a list of the pairs, you just need to split them on :, convert the right value to float and then do the calculations.
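The split-and-convert step described above can be sketched as follows. This is only one possible approach, not the answerer's own code: it captures the key and both numbers directly so that no later splitting on : is needed. The optional minus sign is an assumption taken from the regex in the question, and the name totals is purely illustrative.

```python
import re

s = ('Some calculations income:11111.11 com:11.11 '
     'outgo:22222.22 com:22.22 cancel:33333.33 com:333.33')

# Capture the key, its amount, and the commission that follows it.
# The optional "-" is an assumption based on the question's own regex.
pattern = re.compile(r'(\w+):(-?[\d.]+)\s+com:(-?[\d.]+)')

# Map each key to a (sum, commission) tuple of floats.
totals = {
    key: (float(amount), float(commission))
    for key, amount, commission in pattern.findall(s)
}

for key, (amount, commission) in totals.items():
    print(key, amount, commission)
```

Because the result is keyed by name, the order of the income, outgo and cancel parts in the string does not matter.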

Related

python2.7: Why does printing a list of chinese look like [u'\u4ed6', u'\u6765\u5230', u'\u4e86', u'\u7f51\u6613']?

I used the jieba chinese dictionary for word segmentation.
When I print a list of words, the result is the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import jieba
s1 = "他来到了网易杭研大厦!"
seg_list = jieba.cut(s1)
lst1 = ", ".join(seg_list)
print lst1
m =lst1.split(', ')
print m
punct = set(u''':!),.:;?]}¢'"、。〉》」』】〕〗〞︰︱︳﹐､﹒﹔﹕﹖﹗﹚﹜﹞！），．：；？｜｝︴︶︸︺︼︾﹀﹂﹄﹏､～￠々‖•·ˇˉ―--′’”([{£¥'"‵〈《「『【〔〖（［｛￡￥〝︵︷︹︻︽︿﹁﹃﹙﹛﹝（｛“‘-—_…''')
filterpuntl = list(filter(lambda x: x not in punct, m))
print filterpuntl
The result is the following:
他, 来到, 了, 网易, 杭研, 大厦, !
[u'\u4ed6', u'\u6765\u5230', u'\u4e86', u'\u7f51\u6613', u'\u676d\u7814', u'\u5927\u53a6', u'!']
[u'\u4ed6', u'\u6765\u5230', u'\u4e86', u'\u7f51\u6613', u'\u676d\u7814', u'\u5927\u53a6']
How can I change the [u'\u4ed6', u'\u6765\u5230' ...] to Chinese characters?
When I print a single element of the list, it is Chinese:
print m[2]
print filterpuntl[2]
The result is:
他, 来到, 了, 网易, 杭研, 大厦, !
了
了

u'\u4ed6' is a Chinese character. It's just a different representation, just like you can write 0.1 or 1e-1 for the same number – it's the same thing, just with different looks.
If you want to see the proper glyphs when printing a list etc. (which emits the repr() form of the objects), switch to Python 3:
$ python3
Python 3.5.2 (default, Aug 18 2017, 17:48:00)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(['\u4ed6'])
['他']
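To illustrate the point that the escaped form and the glyph are the same string, here is a small Python 3 sketch. It uses the built-in ascii(), which produces the escaped representation similar to what Python 2 shows for a list of unicode strings (without the u prefix). The variable names are illustrative only.

```python
# The escape sequence and the glyph are two spellings of the same string.
words = ['\u4ed6', '\u6765\u5230', '\u4e86']

# ascii() escapes non-ASCII characters, much like Python 2's repr() did.
escaped = ascii(words)

# Joining the elements prints the text itself instead of a list repr.
joined = ', '.join(words)

print(escaped)
print(joined)
```

Joining the elements before printing also works in Python 2, since print then receives a single string rather than a list.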

pymongo float precision

I have a number stored in mongo as 15000.245263, with 6 digits after the decimal point, but when I use pymongo to get this number I get 15000.24. Does pymongo reduce the precision of floats?

I can't reproduce this. In Python 2.7.13 on my Mac:
>>> from pymongo import MongoClient
>>> c = MongoClient().my_db.my_collection
>>> c.delete_many({}) # Delete all documents
>>> c.insert_one({'x': 15000.245263})
>>> c.find_one()
{u'x': 15000.245263, u'_id': ObjectId('59525d32a08bff0800cc72bd')}
The retrieved value of "x" is printed the same as it was when I entered it.

This can happen when you print a long float value, and I think it is not related to MongoDB.
>>> print 1111.1111
1111.1111
>>> print 1111111111.111
1111111111.11
>>> print 1111111.11111111111
1111111.11111
# for a timestamp
>>> import time
>>> now = time.time()
>>> print now
1527160240.06
In Python 2.7, print uses str(), which formats a float with 12 significant digits (13 characters including the decimal point in the examples above). If you want to display the whole value, use an explicit format instead, like this:
>>> print '%.6f' % 111111111.111111
111111111.111111
This is only a display issue; the value of the variable is not affected.
>>> test = 111111111.111111 * 2
>>> test
222222222.222222
>>> print test
222222222.222
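The 12-significant-digit rounding that Python 2's print applies can be reproduced with an explicit '%.12g' format. The sketch below is written for Python 3 (an assumption, since the session above is Python 2, where the same format strings also work) and shows that only the displayed text changes, not the stored value.

```python
value = 111111111.111111 * 2

# '%.12g' keeps 12 significant digits, mimicking Python 2's print/str().
short_form = '%.12g' % value

# An explicit fixed-point format shows all six decimals.
full_form = '%.6f' % value

print(short_form)
print(full_form)
```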

Python Function Returning Decimal Precision

I have a function which I want to return the decimal number 1.0000 (not a string). How can I do this without using NumPy or SciPy? No matter what I do, it returns 1.0.

It seems you are looking for the decimal module:
>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')

The precision of Python's floats is about 16 digits, no matter how you write them. 1.0 is the same number as 1.0000. The difference is not in number, but in number-to-string conversion involved in display. Thus, the only way to do what you want is to, as you say, convert it to a string explicitly under your terms: "{:.4f}".format(1.0).
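A short sketch of both suggestions side by side: formatting a float to four places yields a string, while a Decimal remembers its trailing zeros and is still a number. The variable names are illustrative.

```python
from decimal import Decimal

# A float has no notion of trailing zeros; formatting produces a string.
formatted = "{:.4f}".format(1.0)

# A Decimal built from a string keeps the zeros it was written with.
exact = Decimal("1.0000")

# quantize() rounds an existing Decimal to a fixed number of places.
quantized = Decimal(1).quantize(Decimal("0.0001"))

print(formatted, exact, quantized)
```

Whether a Decimal fits depends on what the caller does with the result; arithmetic that mixes Decimal and float raises TypeError.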

LPTHW - Ex:25 - values are not returned from a couple of function calls - why?

LPTHW - Ex25 - Code copied from the book
def break_words(stuff):
    """This function will break words for us."""
    words = stuff.split(' ')
    return words
def sort_words(words):
    """Sorts the words."""
    return sorted(words)
def print_first_word(words):
    """Prints the first word after popping it off."""
    word = words.pop(0)
    return word
def print_last_word(words):
    """Prints the last word after popping it off."""
    word = words.pop(-1)
    return word
def sort_sentence(sentence):
    """Takes in a full sentence and returns the sorted words."""
    words = break_words(sentence)
    return sort_words(words)
def print_first_and_last(sentence):
    """Prints the first and last words of the sentence."""
    words = break_words(sentence)
    print_first_word(words)
    print_last_word(words)
def print_first_and_last_sorted(sentence):
    """Sorts the words and then prints the first and last one."""
    words = sort_sentence(sentence)
    print_first_word(words)
    print_last_word(words)
I tried this exercise today in Windows PowerShell.
The last two calls in the session below do not return any values.
PS C:\mystuff> python
Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from ex25 import *
>>> sentence = "All good things come to those who wait."
>>> sentence
'All good things come to those who wait.'
>>> print_first_and_last(sentence)
>>> print_first_and_last_sorted(sentence)
>>>
Could you please help me understand why they are not returning any values?

Nothing is printed because print_first_word and print_last_word only return the word, and print_first_and_last and print_first_and_last_sorted call them but discard the returned values. I've modified the code so that those two functions print the values:
def print_first_and_last(sentence):
    """Prints the first and last words of the sentence."""
    words = break_words(sentence)
    print(print_first_word(words))
    print(print_last_word(words))
def print_first_and_last_sorted(sentence):
    """Sorts the words and then prints the first and last one."""
    words = sort_sentence(sentence)
    print(print_first_word(words))
    print(print_last_word(words))
sentence = "All good things come to those who wait"
print_first_and_last(sentence)
print_first_and_last_sorted(sentence)
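As an alternative to printing inside the functions, the words can be returned to the caller. The sketch below is not from the book: first_and_last and discards_result are hypothetical helpers, used here to contrast a function that returns its result with one that silently drops it.

```python
def break_words(stuff):
    """Split a sentence into words on single spaces."""
    return stuff.split(' ')

def first_and_last(sentence):
    """Hypothetical helper: return the first and last words as a tuple."""
    words = break_words(sentence)
    return words[0], words[-1]

def discards_result(sentence):
    """Calls the helper but drops its value, so the call yields None."""
    first_and_last(sentence)

result = first_and_last("All good things come to those who wait.")
nothing = discards_result("All good things come to those who wait.")
print(result)
print(nothing)
```

At the interactive prompt a call that evaluates to None displays nothing, which matches the empty output in the question.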

Python 3 filter - Bug or Feature?

Okay, I am a complete newbie to Python and to Stack Overflow. I am coming from a ksh and Perl background.
The following is an interactive session with Python 2.7:
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> KEY="REC_PAPER"
>>> VALIDVALUES=filter(lambda x:re.search(r'^' + KEY + '\=', x), [
... "REC_METAL=|YES|NO|",
... "REC_PAPER=|YES|NO|",
... "REC_GLASS=|YES|NO|",
... "REC_PLAST=|YES|NO|",
... "DEBUG_FLAG=|0|1|"
... ]) #End general list.
>>> print(VALIDVALUES)
['REC_PAPER=|YES|NO|']
>>>
Which is what I would expect VALIDVALUES to return. However, Python 3.2's interactive session yields completely different results:
Python 3.2.3 (default, Feb 20 2013, 17:02:41)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> KEY="REC_PAPER"
>>> VALIDVALUES=filter(lambda x:re.search(r'^' + KEY + '\=', x), [
... "REC_METAL=|YES|NO|",
... "REC_PAPER=|YES|NO|",
... "REC_GLASS=|YES|NO|",
... "REC_PLAST=|YES|NO|",
... "DEBUG_FLAG=|0|1|"
... ]) #End general list.
>>> print(VALIDVALUES)
<filter object at 0xb734268c>
>>>
I have seen in several places (including stackoverflow) where Python's equivalent of Perl's grep against a list is to filter the list. That appeared to work in Python 2. However, assuming the above behaviour in Python 3 is "correct," that no longer seems to be the case.
First question: Is the above behaviour a bug or a feature in Python 3?
Second question: Assuming it is a feature, how do I get the output that Python 2 was giving? For reasons I won't go into, I want to stay away from defining a function or subroutine, and do it "inline" like the current code.
Am I missing something obvious (quite possible for a newbie)? Thanks in advance.

As per the documentation, filter in Python 3.x returns an iterator, rather than a list as in version 2.x. This is more memory-efficient than generating the whole list up-front. If you want the list back, you can wrap the iterator in a list() call:
VALIDVALUES = list(filter(...))
Alternatively, and as recommended by What’s New In Python 3.0, you could rewrite it as a list comprehension without a lambda:
VALIDVALUES = [x for x in [...] if re.search(r'^' + KEY + '\=', x)]
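Both options can be tried with the data from the question. This is a sketch rather than the answerer's exact code: it precompiles the pattern and uses re.escape on KEY, which is an added precaution and not something the question requires. The plain '=' replaces '\=' because the equals sign needs no escaping.

```python
import re

KEY = "REC_PAPER"
LINES = [
    "REC_METAL=|YES|NO|",
    "REC_PAPER=|YES|NO|",
    "REC_GLASS=|YES|NO|",
    "REC_PLAST=|YES|NO|",
    "DEBUG_FLAG=|0|1|",
]

# re.escape guards against regex metacharacters in KEY (added precaution).
pattern = re.compile('^' + re.escape(KEY) + '=')

# Option 1: materialize the filter iterator into a list.
from_filter = list(filter(pattern.search, LINES))

# Option 2: a list comprehension with the same condition.
from_comprehension = [x for x in LINES if pattern.search(x)]

print(from_filter)
print(from_comprehension)
```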

Note that you usually don't need a list of values. You can loop over the output directly, like this:
for value in VALIDVALUES:
    do_some_thing(value)
or
for value in filter(...):
    do_some_thing(value)
Sometimes you may need unique or immutable values. In that case, use set, frozenset or tuple instead of list, wrapping the iterator the same way as list() in the other answer.
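One caveat when looping over the iterator directly is that it can be consumed only once. The sketch below uses made-up sample data (values) to show this, along with the frozenset variant mentioned above.

```python
# Sample data is illustrative only.
values = ["a=1", "b=2", "a=3", "a=1"]

matches = filter(lambda x: x.startswith("a="), values)

# The first loop consumes the iterator...
first_pass = [v for v in matches]

# ...so a second loop over the same object finds nothing left.
second_pass = [v for v in matches]

# Wrapping in frozenset gives unique, immutable results.
unique = frozenset(filter(lambda x: x.startswith("a="), values))

print(first_pass, second_pass, sorted(unique))
```

If the results are needed more than once, materializing them with list(), tuple() or frozenset() avoids the empty second pass.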

