Python: re.sub single item in list with multiple items - regex

I'm new to Python and trying to use re.sub, or another approach, to find individual items in a list and replace each one with multiple items. For example:
import re
list = ['abc', 'def']
tolist = []
for item in list:
    a = re.sub(r'^(.)(.)(.)$', '\\1\\2', '\\2\\3', item)
    tolist.append(a)
print(tolist) # want: ['ab', 'bc', 'de', 'ef']
The '\1\2', '\2\3' part clearly doesn't work; it's just there to illustrate the idea.

You could pair characters without regexes:
lst = ['abc', 'def']
result = [a+b for chars in lst for a, b in zip(chars, chars[1:])]
print(result)
# -> ['ab', 'bc', 'de', 'ef']

Here's a rather generic approach where you have a list of tuples for all the substitutions you want to do with each item:
In [1]: import re
In [2]: subs = [(r'^(.)(.)(.)$', r'\1\2'), (r'^(.)(.)(.)$', r'\2\3')]
In [3]: inlist = ['abc', 'def']
In [4]: [re.sub(*sub, string=s) for s in inlist for sub in subs]
Out[4]: ['ab', 'bc', 'de', 'ef']
The second element in each tuple can also be a function, because re.sub allows it. I renamed your initial list because list is a built-in type name and shouldn't be used for variables.
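To illustrate the point about passing a function as the replacement, here is a minimal sketch. The helper name first_pair_upper and the uppercase transformation are made up for the example; the only library behaviour relied on is that re.sub accepts a callable that receives the match object.

```python
import re

PATTERN = r'^(.)(.)(.)$'

def first_pair_upper(match):
    # Hypothetical helper: build the replacement from the match object
    # by joining groups 1 and 2 and upper-casing them.
    return (match.group(1) + match.group(2)).upper()

subs = [
    (PATTERN, first_pair_upper),  # callable replacement
    (PATTERN, r'\2\3'),           # template-string replacement
]

inlist = ['abc', 'def']
result = [re.sub(pattern, repl, s) for s in inlist for pattern, repl in subs]
print(result)  # ['AB', 'bc', 'DE', 'ef']
```

Unpacking each tuple into pattern and repl keeps the comprehension readable when the list of substitutions grows.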

>>> import re
>>> lst = ['abc', 'def']
>>> res = []
>>> m = re.compile('(..)')
>>> for items in lst:
...     for p in range(len(items)):
...         r = m.search(items[p:])
...         if r is not None:
...             res.append(r.group())
...
Make a regexp that matches two characters and groups them.
The first for loop iterates over the list.
The second for loop iterates over the character indexes in each list item.
At each index, search for a character pair in the rest of the string starting at that offset.
Store anything that's found.
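The offset-by-offset search above can also be expressed with a lookahead, which lets re.findall report overlapping pairs in one pass. This is only a sketch of an alternative; the function name overlapping_pairs is an assumption for the example.

```python
import re

def overlapping_pairs(items):
    # (?=(..)) is a zero-width lookahead: it matches at every position
    # that is followed by two characters and captures them, so
    # consecutive matches may overlap.
    pair = re.compile(r'(?=(..))')
    return [p for item in items for p in pair.findall(item)]

result = overlapping_pairs(['abc', 'def'])
print(result)  # ['ab', 'bc', 'de', 'ef']
```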

Related

python 3, comparing elements of two lists of lists

I'm trying to compare elements of two lists of lists in Python. I want to create a new list (ph) that holds a 1 whenever the elements of a sublist from the first list of lists are found among the elements of the second list of lists.
However, my code seems to compare whole lists rather than individual elements. The code is below. Many thanks for the help! :)
import numpy as np
import pandas as pd
abc = [[1,800000,3],[4,5,6],[100000,7,8]]
l = [[
    [i for i in range(0, 100000)],
    [i for i in range(200000,300000)],
    [i for i in range(400000,500000)],
    [i for i in range(600000,700000)],
    [i for i in range(800000,900000)],
    [i for i in range(1000000,1100000)]
]]
ph = []
for i in abc:
    for j in l:
        if l[0] == abc[0]:
            ph.append(1)
        else:
            ph.append(0)
print(ph)
The goal of your problem is somewhat unclear to me. Correct me if I'm wrong, but what you want is: for each sublist of abc, a boolean telling whether all of its elements appear anywhere in l. Is that it?
If it is indeed the case, here's my answer.
First of all, your second list is not a list of lists but a list of lists of lists. Hence, I removed a nested list in my code.
abc = [[1,800000,3],[4,5,6],[100000,7,8]]
L = [
    [i for i in range(0, 100000)],
    [i for i in range(200000,300000)],
    [i for i in range(400000,500000)],
    [i for i in range(600000,700000)],
    [i for i in range(800000,900000)],
    [i for i in range(1000000,1100000)]
]
flattened_L = sum(L, [])
print(
    list(map(lambda sublist: all(x in flattened_L for x in sublist), abc))
)
# returns [True, True, False]
My code first flattens L so that it becomes easy to check whether any element is in it or not. Then, for each sublist in abc, it checks whether all of its elements are in this flattened list.
Note: my code returns a list of booleans. If you absolutely need integer values (0 and 1), which you shouldn't, you can wrap int around all.
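Since every sublist of L is a contiguous run of integers, the same check can be sketched without building or flattening the big lists at all. This assumes the data really are plain ranges, as in the question; the names ranges and in_any_range are introduced only for this example.

```python
abc = [[1, 800000, 3], [4, 5, 6], [100000, 7, 8]]

# Assumption: the second structure consists only of contiguous integer
# ranges, so range objects can stand in for the materialized lists.
ranges = [
    range(0, 100000),
    range(200000, 300000),
    range(400000, 500000),
    range(600000, 700000),
    range(800000, 900000),
    range(1000000, 1100000),
]

def in_any_range(x):
    # Membership tests on a range of ints do not scan the elements.
    return any(x in r for r in ranges)

flags = [all(in_any_range(x) for x in sublist) for sublist in abc]
ph = [int(flag) for flag in flags]  # wrap in int only if 0/1 is required

print(flags)  # [True, True, False]
print(ph)     # [1, 1, 0]
```

If the real data are arbitrary values rather than ranges, converting the flattened list to a set gives a similar speed-up for the membership test.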

return indexes from list of substrings after matching with string

I have a list of substrings, and I am checking whether any of them is found in another string. any() returns only a boolean.
>>> list=['oh' , 'mn' , 'nz' , 'ne']
>>> name='hstntxne'
>>> any(x in name for x in list)
True
>>> name='hstnzne'
>>> any(x in name for x in list)
True
I want to return the matching indexes instead. For example, the first time it should be 3, and the second time it should be 2 and 3.
Firstly, do not call your list list. list is a python data structure and you do not want to be overriding that name unless you have a specific reason for doing so.
You can easily achieve this with a list comprehension in one line.
substrings = ['oh' , 'mn' , 'nz' , 'ne']
name1='hstntxne'
name2='hstnzne'
[substrings.index(x) for x in substrings if x in name1]
This returns [3]
[substrings.index(x) for x in substrings if x in name2]
This returns [2, 3]
To make this work with any list of substrings and any name, put it in a function:
def getIndex(subs, name):
    return [subs.index(x) for x in subs if x in name]
getIndex(substrings, name2) #example call
You can use the built-in enumerate() function.
def get_index(name, lis=['oh' , 'mn' , 'nz' , 'ne']):
    indx = []
    for index, element in enumerate(lis):
        if element in name:
            indx.append(index)
    return indx
Now get_index(name='hstnzne') will give [2, 3]
and get_index(name='hstntxne') will give [3]
import re
# Try and use regex to see if subpattern exists
l = ['oh', 'mn', 'nz', 'ne']
name='hstnzne'
match_indx = []
for i, sub_str in enumerate(l):
    result = re.split(sub_str, name)
    if (len(result)>1):
        # We could split our string due to match, so add index of substring
        match_indx.append(i)
print(match_indx)
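One caveat with the regex approach above: each substring is interpreted as a regular expression, so characters such as '.' or '+' act as metacharacters. A minimal sketch that escapes the substrings first, assuming literal matching is what is wanted (the function name matching_indexes is made up for the example):

```python
import re

def matching_indexes(substrings, name):
    # re.escape makes every character in the substring match literally.
    return [
        i for i, sub in enumerate(substrings)
        if re.search(re.escape(sub), name)
    ]

print(matching_indexes(['oh', 'mn', 'nz', 'ne'], 'hstnzne'))  # [2, 3]

# Without escaping, 'a.c' would match 'abc' because '.' matches any character.
print(matching_indexes(['a.c', 'b'], 'abc'))  # [1]
```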

Creating list of lists and skipping elements in Python

How can I create a list of lists using [x for x in input] (where input is a list of lists of strings) and skip elements if they satisfy a certain condition? For example, this is the list of lists:
[['abc', 'def', 'ghi'], ['abc', 'd_f', '+hi'], ['_bc', 'def', 'ghi']]
and this should be the output, with elements containing either '_' or '+' skipped:
[['abc', 'def', 'ghi'], ['abc'], ['def', 'ghi']]
Thanks!
You'll need a sub-list comprehension:
[[item for item in sub if not any(char in item for char in '_+')] for sub in input]
which is a simplified version of:
result = []
for sub in input:
    result.append([])
    for item in sub:
        should_add = True
        for char in '_+':
            if char in item:
                should_add = False
                break
        if should_add:
            result[-1].append(item)
Pretty similar to the other answer, except that it tests whether each string contains only alphabetic characters, as opposed to checking specifically for '_' and '+'. It loops through each sublist, then through the strings in each sublist.
filtered = [[s for s in l if s.isalpha()] for l in lists]
print(filtered)
[['abc', 'def', 'ghi'], ['abc'], ['def', 'ghi']]
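Note that str.isalpha() also rejects strings containing digits, which may or may not be wanted. The sketch below compares it with str.isalnum() on a few made-up sample strings; the variable names are assumptions for the example.

```python
samples = ['abc', 'abc1', 'd_f', '+hi']

# (isalpha, isalnum) for each sample string
checks = {s: (s.isalpha(), s.isalnum()) for s in samples}
for s, (alpha, alnum) in checks.items():
    print(s, alpha, alnum)

lists = [['abc', 'def', 'ghi'], ['abc', 'd_f', '+hi'], ['_bc', 'def', 'ghi']]
# isalnum keeps letters and digits, and still drops '_' and '+'.
filtered = [[s for s in sub if s.isalnum()] for sub in lists]
print(filtered)  # [['abc', 'def', 'ghi'], ['abc'], ['def', 'ghi']]
```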
Another short version using a set:
stuff= [['abc', 'def', 'ghi'], ['abc', 'd_f', '+hi'], ['_bc', 'def', 'ghi']]
unwanted = {'+', '_'}
filtered = [[item for item in s if not set(item) & unwanted] for s in stuff]

how to get the list of the lists?

I have a list like this:
list = ['a1',['b1',2],['c1',2,3],['d1',2,3,4]]
I want to get a new list like this:
new_list = ['a1','b1','c1','d1']
This is what I tried:
lst = ['a1',['b1',2],['c1',2,3],['d1',2,3,4]]
for item in lst:
    print(item)
The result is:
a1
['b1', 2]
['c1', 2, 3]
['d1', 2, 3, 4]
But I want the first element of each result.
The best answer is this:
my_list = list()
lst = ['a1',['b1',2],['c1',2,3],['d1',2,3,4]]
for element in lst:
    if isinstance(element, str):
        my_list.append(element)
    else:
        my_list.append(element[0])
print(my_list)
Thank you!
Do it as below:
>>> my_list = list()
>>> lst = ['a1',['b1',2],['c1',2,3],['d1',2,3,4]]
>>> for element in lst:
...     if isinstance(element, str):
...         my_list.append(element)
...     else:
...         my_list.append(element[0])
...
It will produce:
>>> my_list
['a1', 'b1', 'c1', 'd1']
As you can see above, I first created a list (named my_list) and then checked each element of your list. If the element is a string, I add it to my_list; otherwise (i.e. it is a list), I add its first element to my_list.
I would do
res = []
for x in the_list:
    if isinstance(x, list):
        res.append(x[0])
    else:
        res.append(x)
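The same check can be condensed into a list comprehension. This is a sketch rather than a replacement for the loops above; treating tuples like lists is an extra assumption that the question does not require.

```python
lst = ['a1', ['b1', 2], ['c1', 2, 3], ['d1', 2, 3, 4]]

# Take the first element of nested sequences, keep plain items as they are.
new_list = [x[0] if isinstance(x, (list, tuple)) else x for x in lst]
print(new_list)  # ['a1', 'b1', 'c1', 'd1']
```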

Python 3.4: adding value to list if condition exists

I have a scenario like this one:
mainList = [[9,5],[17,3],[23,1],[9,2]]
secondaryList = [9,12,28,23,1,6,95]
myNewList = []
myNewList.append([[a,b] for a,b in mainList if a in secondaryList])
This gives me:
myNewList = [[9,5],[23,1],[9,2]]
but I need only the first occurrence of "a". In other words, I need to obtain:
myNewList = [[9,5],[23,1]]
How can I achieve this?
First of all:
myNewList = []
myNewList.append([[a,b] for a,b in mainList if a in secondaryList])
is better written as the following, which also avoids the extra level of nesting that append introduces:
myNewList = [[a,b] for a,b in mainList if a in secondaryList]
Then:
What you're building is functionally a Python dictionary. The two-element lists in mainList have the same shape as the pairs returned by dict.items()!
So what you'd do is build a dict out of mainList (reversing it first, because a dict keeps the last value seen for each key, not the first occurrence):
mainDict = dict(reversed(mainList))
Then you just make your new list, skipping keys that are not in the dict:
myNewList = [[key, mainDict[key]] for key in secondaryList if key in mainDict]
You can use a set to store the first elements you have already seen, and check it before adding each sub-list:
>>> seen=set()
>>> l=[]
>>> for i,j in mainList:
...     if i in secondaryList and i not in seen:
...         seen.add(i)
...         l.append([i,j])
...
>>> l
[[9, 5], [23, 1]]
Or you can use collections.defaultdict with a deque whose maxlen is 1. Note that you need to loop over your list from end to start if you want the first occurrence of a, because the deque keeps only the last inserted value:
>>> from collections import defaultdict, deque
>>> from functools import partial
>>> d=defaultdict(partial(deque, maxlen=1))
>>> for i,j in mainList[::-1]:
...     if i in secondaryList:
...         d[i].append(j)
...
>>> d
defaultdict(<functools.partial object at 0x7ff672706e68>, {9: deque([5], maxlen=1), 23: deque([1], maxlen=1)})
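A plain dict with setdefault can do the same job in a forward pass, since setdefault only stores a value the first time a key is seen. This is a sketch under the assumption of Python 3.7 or later, where dicts preserve insertion order; the function name first_pairs is made up for the example.

```python
mainList = [[9, 5], [17, 3], [23, 1], [9, 2]]
secondaryList = [9, 12, 28, 23, 1, 6, 95]

def first_pairs(main_list, allowed):
    allowed = set(allowed)  # set membership avoids rescanning the list
    first = {}
    for key, value in main_list:
        if key in allowed:
            # setdefault leaves an existing entry untouched, so only the
            # first occurrence of each key is kept.
            first.setdefault(key, value)
    # Assumes insertion-ordered dicts (Python 3.7+).
    return [[key, value] for key, value in first.items()]

myNewList = first_pairs(mainList, secondaryList)
print(myNewList)  # [[9, 5], [23, 1]]
```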