Python: how to create new lists from a list of lists, grouped by index

I have a list that stores lists of 5 elements each. I want to create 5 new lists, each holding the elements found at one index. The following code works, but it does not seem like a smart way to do it.
>>> stats
[['1', '0', '36', '36', '3'], ['10', '0', '41', '77', '5'], ['1', '0', '631', '631', '63'], ['1', '0', '98', '98', '9'], ['9', '0', '52', '81', '6'], ['2', '0', '111', '167', '13'], ['1', '0', '98', '98', '9'], ['1', '0', '92', '92', '9'], ['2', '0', '241', '287', '26'], ['1', '0', '210', '210', '21'], ['2', '0', '336', '358', '34'], ['2', '0', '49', '57', '5'], ['5', '0', '52', '148', '7'], ['2', '0', '46', '76', '6'], ['3', '0', '33', '50', '4'], ['7', '0', '47', '70', '6'], ['1', '0', '94', '94', '9'], ['1', '0', '65', '65', '6'], ['1', '0', '66', '66', '6'], ['1', '0', '429', '429', '42'], ['1', '0', '337', '337', '33'], ['12', '0', '49', '126', '6'], ['1', '0', '47', '47', '4'], ['1', '0', '63', '63', '6'], ['1', '0', '79', '79', '7'], ['2', '0', '96', '100', '9'], ['1', '0', '36', '36', '3'], ['1', '0', '69', '69', '6'], ['6', '0', '44', '67', '5'], ['3', '0', '269', '385', '31'], ['2', '0', '78', '115', '9'], ['2', '0', '49', '52', '5'], ['3', '0', '26', '134', '9'], ['2', '0', '255', '561', '40'], ['1', '0', '75', '75', '7'], ['1', '0', '59', '59', '5'], ['2', '0', '59', '64', '6'], ['1', '0', '86', '86', '8'], ['1', '0', '63', '63', '6'], ['2', '0', '79', '100', '8'], ['4', '0', '825', '888', '86'], ['1', '0', '82', '82', '8'], ['3', '0', '65', '94', '7'], ['1', '0', '88', '88', '8'], ['1', '0', '344', '344', '34'], ['1', '0', '286', '286', '28'], ['1', '0', '73', '73', '7'], ['3', '0', '42', '69', '5'], ['1', '0', '151', '151', '15'], ['1', '0', '286', '286', '28'], ['2', '0', '47', '59', '5'], ['9', '0', '15', '41', '2'], ['2', '0', '343', '355', '34'], ['1', '0', '305', '305', '30'], ['1', '0', '238', '238', '23'], ['2', '0', '974', '2101', '153'], ['2', '0', '138', '142', '14'], ['7', '0', '45', '70', '5'], ['1', '0', '39', '39', '3']]
>>>
>>> num_requests,num_failures,min_response_time,max_response_time,avg_response_time = [], [], [], [], []
>>>
>>> for l in stats:
...     num_requests.append(l[0])
...     num_failures.append(l[1])
...     min_response_time.append(l[2])
...     max_response_time.append(l[3])
...     avg_response_time.append(l[4])
...
>>> num_requests
['1', '10', '1', '1', '9', '2', '1', '1', '2', '1', '2', '2', '5', '2', '3', '7', '1', '1', '1', '1', '1', '12', '1', '1', '1', '2', '1', '1', '6', '3', '2', '2', '3', '2', '1', '1', '2', '1', '1', '2', '4', '1', '3', '1', '1', '1', '1', '3', '1', '1', '2', '9', '2', '1', '1', '2', '2', '7', '1']
The result could also be stored in a single list that holds 5 sublists.

Solution
Just use zip with *:
(num_requests, num_failures, min_response_time, max_response_time,
avg_response_time) = zip(*stats)
This gives you tuples. Convert to lists if you need lists:
(num_requests, num_failures, min_response_time, max_response_time,
avg_response_time) = (list(x) for x in zip(*stats))
Details
A shorter example:
>>> data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
>>> a, b, c = zip(*data)
>>> a
(1, 10, 100)
>>> b
(2, 20, 200)
>>> c
(3, 30, 300)
This is equivalent to:
a, b, c = zip(data[0], data[1], data[2])
but works for any number of sublists.
The left side uses tuple unpacking. For example, this:
x, y, z = (10, 20, 30)
assigns 10 to x, 20 to y, and 30 to z.
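The question also mentions keeping the result in a single list of five sublists. A minimal sketch of that variant, using only the first three rows of stats; the names columns and int_columns are made up for illustration:

```python
# First three rows of `stats` from the question.
stats = [['1', '0', '36', '36', '3'],
         ['10', '0', '41', '77', '5'],
         ['1', '0', '631', '631', '63']]

# Transpose: one inner list per original index.
columns = [list(col) for col in zip(*stats)]
# Same transpose, converting the numeric strings to int on the way.
int_columns = [[int(v) for v in col] for col in zip(*stats)]

print(columns[0])      # ['1', '10', '1']
print(int_columns[2])  # [36, 41, 631]
```

Note that zip stops at the shortest row, so a row with fewer than 5 elements would silently shorten the result.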
Performance
Measure how fast it is.
Version with append:
%%timeit
num_requests,num_failures,min_response_time,max_response_time,avg_response_time = [], [], [], [], []
for l in stats:
    num_requests.append(l[0])
    num_failures.append(l[1])
    min_response_time.append(l[2])
    max_response_time.append(l[3])
    avg_response_time.append(l[4])
10000 loops, best of 3: 51 µs per loop
Version with zip:
%%timeit
(num_requests, num_failures, min_response_time, max_response_time,
avg_response_time) = zip(*stats)
100000 loops, best of 3: 8.58 µs per loop
It is about six times faster.
It takes a bit longer when you convert the tuples to lists:
%%timeit
(num_requests, num_failures, min_response_time, max_response_time,
avg_response_time) = (list(x) for x in zip(*stats))
100000 loops, best of 3: 13.3 µs per loop
Still, about four times faster.

Related

Is there a way to test PySpark Regex's?

I'd like to test different inputs to a PySpark regex to see if they fail/succeed before running a build. Is there a way to test this in Foundry before running a full build/checks?
You can downsample your input using the Preview functionality in Authoring, where you can then specify a filter you want to craft your input for testing.
Then, you can run your PySpark code on this custom sample to verify it does what you expect.
After clicking the Preview button, click the gear icon in the view that opens.
Then, you can describe what sample you want.
After you have this, running your regex on your input will be fast and easy to test.
I am also a fan of writing unit tests. Create a small input df, desired output df, and write a simple function that takes the input, applies the regex, and returns the output.
import pytest
from datetime import date
import pandas as pd # noqa
import numpy as np
from myproject.analysis.simple_discount import (
    calc
)
columns = [
    "date",
    "id",
    "other",
    "brand",
    "grp_id",
    "amounth",
    "pct",
    "max_amount",
    "unit",
    "total_units"
]
output_columns = [
    "date",
    "id",
    "other",
    "brand",
    "grp_id",
    "amount",
    "pct",
    "max_amount",
    "qty",
    "total_amount"
]
@pytest.fixture
def input_df(spark_session):
    data = [
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 1],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 1],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 1],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 4],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 2],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 3],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 4],
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 1.08, 1],
        ['3/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 2.6, 1],
        ['6/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 2.6, 1],
        ['6/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 2.6, 1],
        ['6/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 2.6, 1],
        ['6/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 2.6, 1],
    ]
    pdf = pd.DataFrame(data, columns=columns)
    pdf = pdf.replace({np.nan: None})
    return spark_session.createDataFrame(pdf)
@pytest.fixture
def output_df(spark_session):
    data = [
        ['4/1/21', 'a', '1', 'mn', '567', 0.54, 50, 1.08, 27, 14.580000000000002],
        ['3/1/21', 'b', '2', 'mn', '555', 1.3, 50, 2.6, 1, 1.3],
    ]
    # the expected output uses the output column names, not the input ones
    pdf = pd.DataFrame(data, columns=output_columns)
    pdf = pdf.replace({np.nan: None})
    return spark_session.createDataFrame(pdf)
# ======= FIRST RUN CASE
def test_normal_input(input_df, output_df):
    calc_output_df = calc(input_df)
    assert sorted(calc_output_df.collect()) == sorted(output_df.collect())
#
# Folder Structure
#
# transforms-python/
# ├── ...
# └── src/
#     ├── ...
#     ├── myproject/
#     │   ├── ...
#     │   └── analysis/
#     │       ├── ...
#     │       └── simple_discounts.py
#     └── tests/
#         ├── ...
#         └── unit_tests.py

Regex Removing text before the 5th comma and after the 6th comma

I have a lot of lines like this one:
('235753', 'BayEnesYT', '$2y$12$laiU7F7HWJoXuryMTmgb6uKDfiOcxqD/R6Mxjg.KVNn2TK/Ra2Vwq', 'BNnuyZNL', 'zb6WPCvWYDwQmwZJQI7sypkc6oqVjZpSvnlg8gYYztJm6JYmJh', 'm.enesberber2009#gmail.com', '0', '0', '', '', '', '2', '', '0', '', '1560100802', '1560689379', '1560102670', '0', '', '0', '', '', '', '', '', 'all', '', '1', '0', '0', '0', '1', '0', '1', '0', '1', '0', 'linear', '1', '1', '1', '1', '1', '1', '0', '0', '0', '', '', '', '0', '0', '', '', '0', '0', '0', '0', '', '', '', '0', '0', '0', 0x51D69F63, 0x5567522E, '', '1894', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '0', '', '0', '0.00', '1', '0', '0', '0', '', '1', '1', '2', '0', '0', '0', '0', '[]', '1', null, null, null, '1', '', '', '0', '0', '0', '', '', '', '', '', '0', '', '', '', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '', '0', '', '0', '0', '0', '', '', '', '', '', '', '', '', '', '', '0', '', 'mybb_bcrypt', '0', '');
and I need to remove the text before the 5th comma and after the 6th comma,
so that the result is m.enesberber2009#gmail.com only.
I want to use the regex replacement method in Notepad++.
Replace this expression
^\(?(?:'[^']*',\s*){5}('[^']+').+
with the first captured group (\1 in the Notepad++ replace field); see a demo on regex101.com.
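The same expression can be tried outside Notepad++. This is a rough sketch with Python's re module; the sample line is a shortened, hypothetical stand-in for the real data, and the name sixth_field is invented for illustration:

```python
import re

# Pattern from the answer: skip five quoted fields, capture the sixth, drop the rest.
PATTERN = re.compile(r"^\(?(?:'[^']*',\s*){5}('[^']+').+")

# Hypothetical sample line in the same format as the question's data.
line = "('1', 'user', 'hash', 'salt', 'key', 'someone#example.com', '0', '', '');"

sixth_field = PATTERN.sub(r"\1", line)
print(sixth_field)             # 'someone#example.com' (quotes included)
print(sixth_field.strip("'"))  # someone#example.com
```

The captured group keeps the surrounding single quotes. The pattern also assumes that the first five fields are all single-quoted and contain no quote characters of their own.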

How to obtain data from csv files using regex in python

I am trying to filter a CSV file whose single column contains many negative numbers.
I found some source code on a website that worked on small lists, but it does not work on data from a CSV file.
Here is an example of the data I have:
691
609
627
211
-226
921
829
1
972
173
181
-66
-96
573
and here is the code I am using
import pandas as pd
from pandas import DataFrame
import numpy as np
import re
import csv
from re import findall
ful = pd.read_csv(r'/home/aziz/Desktop/testminplus.csv')
ful1 = ful[0:]
#full = ['1', '-3']
full = ful1
regex = re.compile(r'(-\d*)')
# use only one of the following lines, whichever you prefer
#filtered = filter(lambda i: not regex.search(i), full)
filtered = [i for i in full if not regex.search(i)]
print(filtered)
The results are as follows:
[' ', ' ', ' ', ' ', '8', '2', '3', '\n', '0', ' ', ' ', ' ', ' ', '6', '0', '9', '\n', '1', ' ', ' ', ' ', ' ', '6', '2', '7', '\n', '2', ' ', ' ', ' ', ' ', '2', '1', '1', '\n', '3', ' ', ' ', ' ', '2', '2', '6', '\n', '4', ' ', ' ', ' ', ' ', '9', '2', '1', '\n', '5', ' ', ' ', ' ', ' ', '8', '2', '9', '\n', '6', ' ', ' ', ' ', ' ', ' ', ' ', '1', '\n', '7', ' ', ' ', ' ', ' ', '9', '7', '2', '\n', '8', ' ', ' ', ' ', ' ', '1', '7', '3', '\n', '9', ' ', ' ', ' ', ' ', '1', '8', '1', '\n', '1', '0', ' ', ' ', ' ', '6', '6', '\n', '1', '1', ' ', ' ', ' ', '9', '6', '\n', '1', '2', ' ', ' ', ' ', '5', '7', '3', '\n', '1', '3', ' ', ' ', ' ', '8', '9', '5', '\n', '1', '4', ' ', ' ', ' ', '1', '1', '8', '\n', '1', '5', ' ', ' ', ' ', ' ', '7', '\n', '1', '6', ' ', ' ', '6', '9', '8', '\n', '1', '7', ' ', ' ', ' ', '3', '5', '1', '\n', '1', '8', ' ', ' ', ' ', '9', '3', '3', '\n', '1', '9', ' ', ' ', ' ', '9', '3', '2', '\n', '2', '0', ' ', ' ', ' ', '7', '3', '2', '\n', '2', '1', ' ', ' ', '6', '6', '0', '\n', '2', '2', ' ', ' ', '4', '6', '5', '\n', '2', '3', ' ', ' ', ' ', '3', '4', '5', '\n', '2', '4', ' ', ' ', ' ', ' ', '1', '8', '\n', '2', '5', ' ', ' ', ' ', '1', '2', '0', '\n', '2', '6', ' ', ' ', '2', '7', '0', '\n', '2', '7', ' ', ' ', '2', '3', '3', '\n', '2', '8', ' ', ' ', '1', '5', '2', '\n', '2', '9', ' ', ' ', ' ', '1', '8', '6', '\n', '3', '0', ' ', ' ', '3', '9', '6', '\n', '3', '1', ' ', ' ', '5', '3', '5', '\n', '3', '2', ' ', ' ', ' ', '3', '5', '9', '\n', '3', '3', ' ', ' ', ' ', ' ', '1', '\n', '3', '4', ' ', ' ', '5', '3', '3', '\n', '3', '5', ' ', ' ', ' ', '8', '1', '2', '\n', '3', '6', ' ', ' ', ' ', '5', '4', '6']
The desired output is something like the following:
123
213
2
5
Any idea how to solve this problem?
If you've just got a file with one number per line (and not an actual CSV file with multiple fields which doesn't appear to be your case) then you can do:
with open('/home/aziz/Desktop/testminplus.csv') as fin:
    # generator to yield each line as an integer
    data = (int(line) for line in fin)
    # list-comp to only include non-negative numbers...
    positive = [n for n in data if n >= 0]
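To try the one-number-per-line approach without touching the real file, here is a small sketch that uses io.StringIO as a stand-in for the file. The sample values are a subset of the question's data, and the blank-line check is an added assumption, not part of the original answer:

```python
import io

# Stand-in for the file on disk; values taken from the question's sample.
sample = io.StringIO("691\n609\n-226\n921\n\n1\n-66\n-96\n573\n")

# Skip blank lines so int() does not raise ValueError on them.
numbers = [int(line) for line in sample if line.strip()]
non_negative = [n for n in numbers if n >= 0]

print(non_negative)  # [691, 609, 921, 1, 573]
```

int() ignores surrounding whitespace, including the trailing newline, so the lines do not need to be stripped before conversion.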
A pandas solution is probably overkill here, but it also works quite well:
import pandas as pd
# read file
df = pd.read_csv("/home/aziz/Desktop/testminplus.csv",
                 header=None,
                 converters={0: int})  # raises an error if non-numbers are present
# filter positives
df = df[df[0] >= 0]
# write back
df.to_csv("/home/aziz/Desktop/positives_only.csv",
          header=False,
          index=False)

remove specific values from the python list

Here is the list
['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', '00', 'STOP/HOLD ADD', '5', '', '00', 'TOWER INQ', 'T', '', '00', 'ACCT FIELD MNT', '2', '', '00', 'COMB STMT MAINT', 'C', '', '00', 'MONETARY IM80', 'W', '', '00', 'MONETARY-IM201', 'D', '', '00', 'OCF INQ', 'G', '', '00', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', '', '00', 'NAME/ADDR CHG', '4', '', '00', 'MEMO POST', 'Z', '', '00', 'FLOOR LIMITS', '0']
I would like to remove '' and '00' from the list.
The result should look like this:
['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', 'STOP/HOLD ADD', '5', 'TOWER INQ', 'T', 'ACCT FIELD MNT', '2', 'COMB STMT MAINT', 'C', 'MONETARY IM80', 'W', 'MONETARY-IM201', 'D', 'OCF INQ', 'G', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', 'NAME/ADDR CHG', '4', 'MEMO POST', 'Z', 'FLOOR LIMITS', '0']
I tried this
apa= [aa for aa in apa if aa != "''" or aa != "00"]
but I get the same list back.
Here's how I would do it, looping over a copy so that removing items does not skip elements:
for item in lst[:]:
    if item in ('', '00'):
        lst.remove(item)
You can also use remove(), but note that it deletes only the first occurrence:
lst.remove('00')
lst=['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', '00',
'STOP/HOLD ADD', '5', '', '00', 'TOWER INQ', 'T', '', '00',
'ACCT FIELD MNT', '2', '', '00', 'COMB STMT MAINT', 'C', '',
'00', 'MONETARY IM80', 'W', '', '00', 'MONETARY-IM201', 'D',
'', '00', 'OCF INQ', 'G', '', '00', 'ACCESS ALL FUNC', 'NO',
'RATE INQ', 'K', '', '00', 'NAME/ADDR CHG', '4', '', '00',
'MEMO POST', 'Z', '', '00', 'FLOOR LIMITS', '0']
print([x for x in lst if x != '00' and x != ''])
#Output
['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', 'STOP/HOLD ADD',
'5', 'TOWER INQ', 'T', 'ACCT FIELD MNT', '2', 'COMB STMT MAINT',
'C', 'MONETARY IM80', 'W', 'MONETARY-IM201', 'D', 'OCF INQ', 'G',
'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', 'NAME/ADDR CHG', '4',
'MEMO POST', 'Z', 'FLOOR LIMITS', '0']
res=['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', '00', 'STOP/HOLD ADD', '5', '', '00', 'TOWER INQ', 'T', '', '00', 'ACCT FIELD MNT', '2', '', '00', 'COMB STMT MAINT', 'C', '', '00', 'MONETARY IM80', 'W', '', '00', 'MONETARY-IM201', 'D', '', '00', 'OCF INQ', 'G', '', '00', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', '', '00', 'NAME/ADDR CHG', '4', '', '00', 'MEMO POST', 'Z', '', '00', 'FLOOR LIMITS', '0']
res=[x for x in res if x not in ('00','')]
print(res)
['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', 'STOP/HOLD ADD', '5', 'TOWER INQ', 'T', 'ACCT FIELD MNT', '2', 'COMB STMT MAINT', 'C', 'MONETARY IM80', 'W', 'MONETARY-IM201', 'D', 'OCF INQ', 'G', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', 'NAME/ADDR CHG', '4', 'MEMO POST', 'Z', 'FLOOR LIMITS', '0']
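For reference, here is a short sketch of why the attempt in the question returns the list unchanged, next to a membership test that works. The list is shortened, and the names items, unchanged, unwanted and cleaned are made up for illustration:

```python
items = ['ACCT INQ', '3', '', '00', 'STOP/HOLD ADD', '5', '', '00']

# The question's condition: "''" is a two-character string of quote marks,
# and `or` is true for every element, since no value equals both strings at once.
unchanged = [x for x in items if x != "''" or x != "00"]

# Membership test against the values to drop ('' is the empty string).
unwanted = {'', '00'}
cleaned = [x for x in items if x not in unwanted]

print(unchanged == items)  # True
print(cleaned)             # ['ACCT INQ', '3', 'STOP/HOLD ADD', '5']
```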
Use while loop
list1 = ['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', '00', 'STOP/HOLD ADD', '5', '', '00', 'TOWER INQ', 'T', '', '00', 'ACCT FIELD MNT', '2', '', '00', 'COMB STMT MAINT', 'C', '', '00', 'MONETARY IM80', 'W', '', '00', 'MONETARY-IM201', 'D', '', '00', 'OCF INQ', 'G', '', '00', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', '', '00', 'NAME/ADDR CHG', '4', '', '00', 'MEMO POST', 'Z', '', '00', 'FLOOR LIMITS', '0']
while '00' in list1: list1.remove('00')
print(list1)
The output will be
['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', 'STOP/HOLD ADD', '5', '', 'TOWER INQ', 'T', '', 'ACCT FIELD MNT', '2', '', 'COMB STMT MAINT', 'C', '', 'MONETARY IM80', 'W', '', 'MONETARY-IM201', 'D', '', 'OCF INQ', 'G', '', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', '', 'NAME/ADDR CHG', '4', '', 'MEMO POST', 'Z', '', 'FLOOR LIMITS', '0']
With all the '00' terms removed. The empty strings are still there; repeat the loop with '' to remove them as well.
Single liner (in Python 3, wrap the call in list() to get a list rather than a filter object):
filter(lambda a: a!='' and a!='00', ['DEFAULT SECURITY', 'YES', 'ACCT INQ', '3', '', '00', 'STOP/HOLD ADD', '5', '', '00', 'TOWER INQ', 'T', '', '00', 'ACCT FIELD MNT', '2', '', '00', 'COMB STMT MAINT', 'C', '', '00', 'MONETARY IM80', 'W', '', '00', 'MONETARY-IM201', 'D', '', '00', 'OCF INQ', 'G', '', '00', 'ACCESS ALL FUNC', 'NO', 'RATE INQ', 'K', '', '00', 'NAME/ADDR CHG', '4', '', '00', 'MEMO POST', 'Z', '', '00', 'FLOOR LIMITS', '0'])
See https://stackoverflow.com/a/1157160/761963

How do you compare integer values in two separate lists?

This is my current code:
predict_results = []
with open("newTesting.predict") as inputfile:
    for line in inputfile:
        predict_results.append(line.strip())
print(predict_results)
first_list = []
with open("newTesting") as inputfile:
    for line in inputfile:
        first_list.append(line.strip().split()[0])
print(first_list)
if predict_results[0] == first_list[0]:
    print(True)
else:
    print(False)
This is my current output:
['-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '1', '-1', '1', '-1', '1', '-1', '1', '-1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '1', '1', '-1', '-1', '-1', '-1', '-1', '1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '-1']
['1', '-1', '1', '-1', '-1', '-1', '-1', '-1', '-1', '1', '1', '-1', '1', '1', '1', '1', '-1', '1', '-1', '1', '1', '1', '1', '-1', '1', '-1', '1', '1', '-1', '-1', '1', '1', '-1', '1', '1', '-1', '1', '1', '1', '1', '-1', '-1', '1', '1', '1', '1', '1', '1', '1', '-1', '-1', '1', '1', '-1', '1', '-1', '1', '-1', '-1', '1', '1', '-1', '1', '1', '1', '-1', '-1', '-1', '1', '-1', '-1', '1', '1', '1', '-1', '1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '1', '1', '-1', '-1', '-1', '-1', '-1', '1', '1', '-1', '-1', '1', '-1', '1', '1', '1', '-1', '1', '-1', '-1', '-1', '1', '1', '-1', '1', '1', '-1', '-1', '-1', '-1', '-1', '1', '-1', '-1', '-1', '-1', '-1']
False
I can only check index [0], which gives the correct result. How do I check every index in predict_results against first_list?
Thanks
Hope this will help:
>>> from pandas import *
>>> L1 = [1,2,3]
>>> L2 = [2,2,3]
>>> S1 = Series(L1)
>>> S2 = Series(L2)
>>> RES = S1==S2
>>> RES
0    False
1     True
2     True
dtype: bool
For your case:
>>> S1 = Series(predict_results)
>>> S2 = Series(first_list)
>>> RES = S1==S2
>>> RES[0] # check predict_results[0]==first_list[0]
>>> RES[1] # check predict_results[1]==first_list[1]
And you can use a loop:
for x, y in zip(predict_results, first_list):
    if x == y:
        print(True)
    else:
        print(False)
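If a summary is more useful than one line of output per element, the same zip pairing can feed a list comprehension. This is only a sketch: the two short lists are hypothetical stand-ins for predict_results and first_list, and the other names are invented for illustration:

```python
# Hypothetical short stand-ins for predict_results and first_list.
predicted = ['-1', '-1', '1', '-1', '1']
actual = ['1', '-1', '1', '-1', '-1']

# One boolean per position; zip stops at the shorter list.
matches = [p == a for p, a in zip(predicted, actual)]
mismatched_positions = [i for i, same in enumerate(matches) if not same]

print(matches)                      # [False, True, True, True, False]
print(mismatched_positions)         # [0, 4]
print(sum(matches) / len(matches))  # 0.6
```

The values are compared as strings here, which is enough for an equality check; convert them with int() first if a numeric comparison is needed.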