Generate subsequences - c++

I have a string like "0189" for which I need to generate all subsequences, but the order of the individual characters must be kept, i.e. here 9 should not come before 0, 1 or 8. For example: 0, 018, 01, 09, 0189, 18, 19, 019, etc.
Another example is "10292", for which the subsequences would be: 1, 10, 02, 02, 09, 29, 92, etc. As you might have noticed, '02' appears twice, since '2' occurs twice in the given string. But things like 21, 01 and 91 are invalid, as the order must be maintained.
Any algorithm or pseudocode that could be implemented in C/C++ would be appreciated!

Try a recursive approach:
The set of subsequences can be split into the ones containing the first character and the ones not containing it.
The ones containing the first character are built by putting that character in front of each subsequence that doesn't contain it (plus the subsequence consisting of the first character alone).
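A minimal Python sketch of that recursive split (the function name `subsequences` is illustrative, not from the answer). It returns only the non-empty subsequences and keeps duplicates, such as the two '02' entries for "10292":

```python
def subsequences(s):
    """Return all non-empty subsequences of s, preserving character order.

    Duplicates are kept, so "10292" yields '02' twice.
    """
    if not s:
        return []
    first, rest = s[0], s[1:]
    # Subsequences that do not contain the first character.
    without_first = subsequences(rest)
    # Subsequences that do: the first character alone, or placed in front of
    # each subsequence of the rest (placing it in front keeps the order).
    with_first = [first] + [first + t for t in without_first]
    return with_first + without_first


print(subsequences("0189"))
```

The result always has 2^n - 1 entries, so this is only practical for short strings.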

I'd recommend using the natural correspondence between the power set of a sequence and the set of binary numbers from 0 to 2^n - 1, where n is the length of the sequence.
In your case n is 4, so consider 0 = 0000 .. 15 = 1111; wherever there is a 1 in the binary representation, include the corresponding item from the sequence. To implement this you'll need bit shifts and bitwise operations:
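#include <string>
#include <vector>

// Returns all 2^n subsequences of `sequence`, including the empty one.
std::vector<std::string> subsequences(const std::string& sequence) {
    const int n = static_cast<int>(sequence.size());
    std::vector<std::string> result;
    for (int i = 0; i < (1 << n); ++i) {      // each i encodes one subset of positions
        std::string item;
        for (int j = 0; j < n; ++j) {
            if (i & (1 << j)) {               // bit j set => keep sequence[j]
                item += sequence[j];
            }
        }
        result.push_back(item);
    }
    return result;
}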
Also consider how you'd handle sequences longer than can be covered by an int (hint: consider overflow and arithmetic carry).
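For comparison, the same bitmask idea in Python (a sketch; `subsequences_bitmask` is a hypothetical name). Python integers have arbitrary precision, so `1 << n` cannot overflow the way a fixed-width `int` can in C++; the 2^n growth of the result is still the practical limit:

```python
def subsequences_bitmask(s):
    """Return all 2**n subsequences of s (including ''), one per mask in 0 .. 2**n - 1."""
    n = len(s)
    result = []
    for mask in range(1 << n):  # Python ints are unbounded, so this never overflows
        # Bit j of mask decides whether s[j] is kept; j increases, so order is preserved.
        result.append("".join(s[j] for j in range(n) if (mask >> j) & 1))
    return result


print(subsequences_bitmask("0189"))
```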

In Python:
from itertools import combinations as combs

def subseq(s):
    return ' '.join(' '.join(''.join(x) for x in combs(s, n)) for n in range(1, len(s) + 1))

print(subseq("0189"))
# 0 1 8 9 01 08 09 18 19 89 018 019 089 189 0189
print(subseq("10292"))
# 1 0 2 9 2 10 12 19 12 02 09 02 29 22 92 102 109 102 129 122 192 029 022 092 292 1029 1022 1092 1292 0292 10292

__author__ = 'Robert'
from itertools import combinations

g = combinations(range(4), r=2)
print(list(g))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def solve(string_):
    # Yield every subsequence, grouped by length (1 .. len(string_)).
    for repeat in range(1, len(string_) + 1):
        # combinations() yields index tuples in increasing order, so character order is kept.
        for combo in combinations(range(len(string_)), r=repeat):
            yield "".join(string_[i] for i in combo)

print(list(solve('0189')))  # ['0', '1', '8', '9', '01', '08', '09', '18', '19', '89', '018', '019', '089', '189', '0189']

# using recursion
def solve2(string_, i):
    if i >= len(string_):
        return [""]  # no sub_strings beyond the end of string_
    character_i = string_[i]
    all_sub_strings = solve2(string_, i + 1)
    # The sub_strings that include string_[i] are the later ones with string_[i] in front.
    all_sub_strings += [character_i + sub_string for sub_string in all_sub_strings]
    return all_sub_strings

print(solve2('0189', 0))  # ['', '9', '8', '89', '1', '19', '18', '189', '0', '09', '08', '089', '01', '019', '018', '0189']

Related

divide list to sublists in groovy

I am trying to divide a list of strings into 4 sublists (with all elements of the original list balanced among them). I have tried the following approach, but I am getting the remainder elements in a 5th list. I need only four sublists, and the remainder must be distributed among these four lists.
def sublists = localities.collate(localities.size().intdiv(4))
for (sublist in sublists) {
    println(sublist.join(','))
    println "next"
}
Here localities has around 163 elements, and I am getting 4 lists of 40 and a 5th list of size 3. My localities list is dynamic and can have a variable number of elements (always above 100). I need exactly 4 lists, with the remaining 3 elements distributed among them.
Something like this:
def split( list ){
    // round the chunk size up, so at most 4 chunks are produced
    int size = Math.ceil( list.size() / 4 )
    list.collate size
}
assert split( 1..163 )*.size() == [41, 41, 41, 40]
assert split( 1..157 )*.size() == [40, 40, 40, 37]
assert split( 1..100 )*.size() == [25, 25, 25, 25]
assert split( 1..4 )*.size() == [1, 1, 1, 1]
//edge cases
assert split( 1..3 )*.size() == [1, 1, 1]
assert split( 1..2 )*.size() == [1, 1]
assert split( 1..5 )*.size() == [2, 2, 1]   // 5, 6 and 9 elements give only 3 groups
// for all lengths of 10 and above exactly 4 groups are returned
(10..200).each{
    assert split( 1..it ).size() == 4
}
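If the remainder should be spread out instead, so that every length (including 5, 6 and 9) gives exactly four chunks whose sizes differ by at most one, a `divmod`-based split works. This is a Python sketch of that alternative technique, not a Groovy translation of the answer above; `split_into` is a hypothetical name:

```python
def split_into(items, parts=4):
    """Split items into exactly `parts` consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


print([len(c) for c in split_into(list(range(163)))])
```

For 163 items this gives sizes [41, 41, 41, 40]; for 157 it gives [40, 39, 39, 39] rather than [40, 40, 40, 37]. With fewer than four items, some chunks come back empty.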

RegEx to find all indices for a unique character in 3 character sub-string

I am getting indices of a pattern of two bytes using finditer.
my_val = [0, 1]
[(m.start(0), m.end(0)) for m in re.finditer(my_val, content)]
But now I also need the unique values/locations where the first two bytes are the same as my_val but the 3rd byte is unique, i.e. in a sequence like: 013 234 523 015 68 012 9 015 014 012 013 013 012 012 I need 013, 015, 012 and 014, ignoring duplicate values.
First, the pattern should be a bytes object, not a list.
Then, you can use a dict to store the matches already found:
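import re

content = '0120150160150132468451018'
content = bytes(map(int, content))   # each digit becomes one byte value 0..9
my_val = b'\x00\x01.'                # two fixed bytes followed by any byte
d = dict()
# re.DOTALL lets '.' also match the byte 0x0A (newline)
for m in re.finditer(my_val, content, re.DOTALL):
    k = m.group(0)
    if k not in d:                   # keep only the first location of each distinct match
        d[k] = (m.start(0), m.end(0))
res = d.values()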
Note: to convert a bytes object to a list of ints, and a list of ints to bytes:
>>> list(b'\x00\x03\xa2')
[0, 3, 162]
>>> bytes([0, 3, 162])
b'\x00\x03\xa2'
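Since my_val starts out as a list of ints in the question, the pattern can also be built from it directly. A sketch under these assumptions: the prefix bytes are passed through re.escape so they are matched literally, re.DOTALL lets '.' match any third byte (including 0x0A), and dict.setdefault keeps the first span of each distinct match. `first_occurrences` is a hypothetical helper name; like the code above, finditer only reports non-overlapping matches:

```python
import re


def first_occurrences(content, prefix):
    """Map each distinct (prefix + one byte) match in content to the span of its first occurrence.

    content: a bytes object; prefix: a list of ints such as [0, 1].
    """
    pattern = re.compile(re.escape(bytes(prefix)) + b".", re.DOTALL)
    found = {}
    for m in pattern.finditer(content):
        found.setdefault(m.group(0), m.span())  # later duplicates are ignored
    return found


content = bytes(map(int, '0120150160150132468451018'))
print(first_occurrences(content, [0, 1]))
```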

Create list from pandas dataframe

I have a function that takes all (non-distinct) MatchId rows and the paired xG_Team1/xG_Team2 values and returns an array, which is then summed to give the SSE.
The problem is that the function iterates through every row, so each MatchId is processed several times. I want to stop this.
For each distinct MatchId I need the corresponding home and away goal times as lists, i.e. Home_Goal and Away_Goal, taken from the Home_Goal_Time and Away_Goal_Time columns of the dataframe, to be used in each iteration. The lists below don't seem to work.
MatchId Event_Id EventCode Team1 Team2 Team1_Goals
0 842079 2053 Goal Away Huachipato Cobresal 0
1 842079 2053 Goal Away Huachipato Cobresal 0
2 842080 1029 Goal Home Slovan lava 3
3 842080 1029 Goal Home Slovan lava 3
4 842080 2053 Goal Away Slovan lava 3
5 842080 1029 Goal Home Slovan lava 3
6 842634 2053 Goal Away Rosario Boca Juniors 0
7 842634 2053 Goal Away Rosario Boca Juniors 0
8 842634 2053 Goal Away Rosario Boca Juniors 0
9 842634 2054 Cancel Goal Away Rosario Boca Juniors 0
Team2_Goals xG_Team1 xG_Team2 CurrentPlaytime Home_Goal_Time Away_Goal_Time
0 2 1.79907 1.19893 2616183 0 87
1 2 1.79907 1.19893 3436780 0 115
2 1 1.70662 1.1995 3630545 121 0
3 1 1.70662 1.1995 4769519 159 0
4 1 1.70662 1.1995 5057143 0 169
5 1 1.70662 1.1995 5236213 175 0
6 2 0.82058 1.3465 2102264 0 70
7 2 0.82058 1.3465 4255871 0 142
8 2 0.82058 1.3465 5266652 0 176
9 2 0.82058 1.3465 5273611 0 0
For example, for MatchId = 842079: Home_Goal = [], Away_Goal = [87, 115]
import numpy as np

x1 = [1, 0, 0]
x2 = [0, 1, 0]
x3 = [0, 0, 1]
m = 1  # arbitrary constant used to optimise sse
k = 196
total_timeslot = 196
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1 - (xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)
results
sum(results.sum())
For the three matches above, the desired outcome should look like the following.
If I compute an individual SSE with sum(sum_squared_diff(x1, x2, x3, y)), I get:
MatchId 842079: 3.984053038520635
MatchId 842080: 7.882189570700502
MatchId 842634: 5.929085973050213
Given the size of the original data, realistically I am after the total SSE. For the sample data above, simply adding up these values gives a total SSE of 17.79532858227135. Once I achieve this, I will try to optimise the SSE by updating the arbitrary value m.
Here are the lists I hoped the function would iterate over.
Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series
Then convert it to lists.
Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
list
Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]
Away_Goal
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]
But the function still treats Home_Goal and Away_Goal as empty lists.
If you only want to consider one MatchId at a time, you should .groupby('MatchId') first:
df.groupby('MatchId').apply(...)
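A sketch of that per-match approach, assuming pandas and NumPy and using only the columns the SSE needs from the sample table. It also explains the empty-list symptom: sum_squared_diff reads the global Home_Goal/Away_Goal, and once they are lists of lists, `k in Home_Goal` compares an integer against lists, so it is never true. Here each MatchId group builds its own goal sets, treating a time of 0 as "no goal", which is what the expected figures in the question imply. It iterates over df.groupby('MatchId') directly rather than calling .apply(...), which avoids version-specific groupby-apply behaviour; `match_sse` is a hypothetical name:

```python
import numpy as np
import pandas as pd

TOTAL_TIMESLOT = 196          # number of time slots per match
K = 196                       # normalising constant k from the question
X1 = np.array([1, 0, 0])      # target when no goal is scored in a slot
X2 = np.array([0, 1, 0])      # target for a home-goal slot
X3 = np.array([0, 0, 1])      # target for an away-goal slot


def match_sse(group, m=1.0):
    """SSE over all time slots for one MatchId group; a goal time of 0 means 'no goal'."""
    home = {int(t) for t in group["Home_Goal_Time"] if t != 0}
    away = {int(t) for t in group["Away_Goal_Time"] if t != 0}
    # xG values are identical on every row of a match, so read them once.
    xg1 = group["xG_Team1"].iloc[0]
    xg2 = group["xG_Team2"].iloc[0]
    y = np.array([1 - (xg1 * m + xg2 * m) / K, xg1 * m / K, xg2 * m / K])
    total = 0.0
    for slot in range(TOTAL_TIMESLOT):
        if slot in home:
            target = X2
        elif slot in away:
            target = X3
        else:
            target = X1
        total += float(np.sum((target - y) ** 2))
    return total


# Sample rows from the question (only the columns the SSE needs).
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

per_match = pd.Series({match_id: match_sse(g) for match_id, g in df.groupby("MatchId")})
total_sse = per_match.sum()
print(per_match)
print(total_sse)
```

This should reproduce the per-match values listed in the question (about 3.984, 7.882 and 5.929) and their total of about 17.795. To optimise over m, pass a different m to match_sse.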

Regex to pull out numbers and operators

I am trying to write a regex to parse out seven match objects: four numbers and three operators.
Individual lines in the file look like this:
[ 9] -21 - ( 12) - ( -5) + ( -26) = ______
The number in brackets is the line number, which will be ignored. I want the four integer values (including the '-' if the integer is negative), which in this case are -21, 12, -5 and -26. I also want the operators, which are -, - and +.
I will then take those values (match objects) and actually compute the answer:
-21 - 12 - -5 + -26 = -54
I have this:
[\s+0-9](-?[0-9]+)
In Pythex it grabs the [ 9] but it also then grabs every integer in separate match objects (four additional match objects). I don't know why it does that.
If I add a ? to the end: [\s+0-9](-?[0-9]+)? thinking it will only grab the first integer, it doesn't. I get seventeen matches?
I am trying to say, via the regex: grab the line number and its brackets (that part works), then grab the first integer including its sign, then the operator, then the next integer including its sign, then the next operator, etc.
It appears that I have failed to explain myself clearly.
The file has hundreds of lines. Here is a five line sample:
[ 1] 19 - ( 1) - ( 4) + ( 28) = ______
[ 2] -18 + ( 8) - ( 16) - ( 2) = ______
[ 3] -8 + ( 17) - ( 15) + ( -29) = ______
[ 4] -31 - ( -12) - ( -5) + ( -26) = ______
[ 5] -15 - ( 12) - ( 14) - ( 31) = ______
The operators are only '-' or '+', but any combination of them may appear in the three operator positions of a line. The integers will all be from -99 to 99, but that shouldn't matter if the regex works. The goal (as I see it) is to extract seven match objects, four integers and three operators, then add the numbers exactly as they appear. The number in brackets is just the line number and plays no role in the computation.
Good luck with the regex; if you just need the result:
import re

s = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
s = s[s.find("]")+1:s.find("=")]  # cut away line nr and = ...
# weak attempt to prevent python code injection: only digits, +, -, (, ) and spaces may remain
if not re.sub("[-+0-9() ]*", "", s):
    print(eval(s))
else:
    print("wonky chars inside, only numbers, +, - , space and () allowed.")
Output:
-54
Make sure to read up on eval()
and have a look at:
https://opensourcehacker.com/2014/10/29/safe-evaluation-of-math-expressions-in-pure-python/
https://softwareengineering.stackexchange.com/questions/311507/why-are-eval-like-features-considered-evil-in-contrast-to-other-possibly-harmfu/311510
https://www.kevinlondon.com/2015/07/26/dangerous-python-functions.html
Example for hundreds of lines:
import random
import re

def calcIt(line):
    s = line[line.find("]")+1:line.find("=")]
    if not re.sub("[-+0-9() ]*", "", s):
        return eval(s)
    else:
        print(line + " has wonky chars inside, only numbers, +, - , space and () allowed.")
        return None

random.seed(42)
pattern = "[ {}] -{} - ( {}) - ( -{}) + ( -{}) = "
for n in range(1000):
    nums = [n]
    nums.extend([random.randint(0, 100), random.randint(-100, 100), random.randint(-100, 100),
                 random.randint(-100, 100)])
    c = pattern.format(*nums)
    print(c, calcIt(c))
Ahh... I had a cup of coffee and sat down in front of Pythex again.
I figured out the correct regex:
[\s+0-9]\s+(-?[0-9]+)\s+([-+])\s+\(\s+(-?[0-9]+)\)\s+([-+])\s+\(\s+(-?[0-9]+)\)\s+([-+])\s+\(\s+(-?[0-9]+)\)
Yields:
-21
-
12
-
-5
+
-26
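A sketch that avoids eval() altogether: one regex with seven capture groups (four integers, three operators) anchored on the closing bracket of the line number, using \s* so it does not depend on the exact spacing, followed by a fold over the operators. The names LINE_RE and evaluate are hypothetical:

```python
import re

# Four signed integers and three operators; the "[ n]" line number is skipped by
# starting the match at the closing bracket.
LINE_RE = re.compile(
    r"\]\s*(-?\d+)"                      # first integer, right after "]"
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"     # operator 1, integer 2
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"     # operator 2, integer 3
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"     # operator 3, integer 4
)


def evaluate(line):
    """Return the value of one worksheet line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    groups = match.groups()  # e.g. ('-21', '-', '12', '-', '-5', '+', '-26')
    result = int(groups[0])
    for op, number in zip(groups[1::2], groups[2::2]):
        result = result + int(number) if op == "+" else result - int(number)
    return result


for line in ["[ 9] -21 - ( 12) - ( -5) + ( -26) = ______",
             "[ 1] 19 - ( 1) - ( 4) + ( 28) = ______"]:
    print(line, evaluate(line))
```

Because only digits, signs and the two operators are ever converted, no user-controlled text reaches the interpreter.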

mapping a matrix to list....it's tricky

I have a list of 28 elements and a string of length 7 made up of 0,1,2,3 such as 0012031.
Now I want to read this string and depending on what character I read at a position I want to call a particular element from the list.
Think of the list's elements indexed as:
         0     1     2     3     4     5     6     7     8     9
 0    ['00', '01', '02', '03', '04', '05', '06', '10', '11', '12',
10     '13', '14', '15', '16', '20', '21', '22', '23', '24', '25',
20     '26', '30', '31', '32', '33', '34', '35', '36']
If the string has 0 at position 0, then I want l[0] ('00'). If the string has 3 at position 6, then I want l[27] ('36').
Example:
String: '0012031'
character  position  element from list
0          0         l[0]  - 00
0          1         l[1]  - 01
1          2         l[9]  - 12
2          3         l[17] - 23
0          4         l[4]  - 04
3          5         l[26] - 35
1          6         l[13] - 16
Note: I'm working with a list, not a matrix.
If you just want a way to find the index given the character and position, the following formula should cover it:
listIndex = 7*character + position
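A short Python sketch of that formula applied to the whole string (`lookup` is a hypothetical name; the list is rebuilt from the pattern shown in the question, character digit followed by position digit):

```python
# The 28-element list from the question: '00'..'06', '10'..'16', '20'..'26', '30'..'36'.
l = [f"{c}{p}" for c in range(4) for p in range(7)]


def lookup(code, items=l):
    """For each position in code, pick items[7 * character + position]."""
    return [items[7 * int(ch) + pos] for pos, ch in enumerate(code)]


print(lookup("0012031"))  # ['00', '01', '12', '23', '04', '35', '16']
```

The multiplier 7 is the string length (the number of positions per character value); a string of a different length would need that stride changed.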