I have several, let's say six, different values, for example the numbers 1 to 6.
I want to quickly list all unique combinations of four of them (1-2-3-4, 1-2-3-5 ... 3-4-5-6, and so on), without any number appearing more than once in a combination.
I'd like to do it in LibreOffice Calc or LibreOffice Base, but so far I haven't had much luck searching for a way to do it. I'd be really grateful for any ideas.
There you go:
1234 1235 1236 1243 1245 1246 1253 1254 1256 1263 1264 1265 1324 1325 1326 1342 1345 1346 1352 1354 1356 1362 1364 1365 1423 1425 1426 1432 1435 1436 1452 1453 1456 1462 1463 1465 1523 1524 1526 1532 1534 1536 1542 1543 1546 1562 1563 1564 1623 1624 1625 1632 1634 1635 1642 1643 1645 1652 1653 1654 2134 2135 2136 2143 2145 2146 2153 2154 2156 2163 2164 2165 2314 2315 2316 2341 2345 2346 2351 2354 2356 2361 2364 2365 2413 2415 2416 2431 2435 2436 2451 2453 2456 2461 2463 2465 2513 2514 2516 2531 2534 2536 2541 2543 2546 2561 2563 2564 2613 2614 2615 2631 2634 2635 2641 2643 2645 2651 2653 2654 3124 3125 3126 3142 3145 3146 3152 3154 3156 3162 3164 3165 3214 3215 3216 3241 3245 3246 3251 3254 3256 3261 3264 3265 3412 3415 3416 3421 3425 3426 3451 3452 3456 3461 3462 3465 3512 3514 3516 3521 3524 3526 3541 3542 3546 3561 3562 3564 3612 3614 3615 3621 3624 3625 3641 3642 3645 3651 3652 3654 4123 4125 4126 4132 4135 4136 4152 4153 4156 4162 4163 4165 4213 4215 4216 4231 4235 4236 4251 4253 4256 4261 4263 4265 4312 4315 4316 4321 4325 4326 4351 4352 4356 4361 4362 4365 4512 4513 4516 4521 4523 4526 4531 4532 4536 4561 4562 4563 4612 4613 4615 4621 4623 4625 4631 4632 4635 4651 4652 4653 5123 5124 5126 5132 5134 5136 5142 5143 5146 5162 5163 5164 5213 5214 5216 5231 5234 5236 5241 5243 5246 5261 5263 5264 5312 5314 5316 5321 5324 5326 5341 5342 5346 5361 5362 5364 5412 5413 5416 5421 5423 5426 5431 5432 5436 5461 5462 5463 5612 5613 5614 5621 5623 5624 5631 5632 5634 5641 5642 5643 6123 6124 6125 6132 6134 6135 6142 6143 6145 6152 6153 6154 6213 6214 6215 6231 6234 6235 6241 6243 6245 6251 6253 6254 6312 6314 6315 6321 6324 6325 6341 6342 6345 6351 6352 6354 6412 6413 6415 6421 6423 6425 6431 6432 6435 6451 6452 6453 6512 6513 6514 6521 6523 6524 6531 6532 6534 6541 6542 6543
PS: I don't think there is a way to generate them in LibreOffice, since I'm not aware of any programming language in that program. However, you can compute them online or with your own script.
If you need the script, save this code in a .html file and open it in a browser:
<html>
<body>
<script>
// Advance the index array like an odometer (first position changes fastest).
// Returns false once every position has wrapped back to 0.
function updateIndexes(arr, n) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] < n - 1) {
      arr[i]++;
      return true;
    }
    arr[i] = 0;
  }
  return false;
}
let from = [1,2,3,4,5,6].map((el) => el.toString());
let length = 4;
let separator = '-';
let indexes = Array(length).fill(0);
let results = [];
do {
  // Keep only index tuples without a repeated position,
  // so that no value shows up twice in one sequence.
  if (new Set(indexes).size === length) {
    results.push(indexes.map((index) => from[index]).join(separator));
  }
} while (updateIndexes(indexes, from.length));
// Print one sequence per line.
document.body.innerHTML += results.join('<br>');
</script>
</body>
</html>
What you can customize:
the array in let from = [1,2,3,4,5,6]: the numbers/letters you want to pick from
let length = 4: the length of each sequence you want
let separator = '-': the separator you want (meaning the one between the elements of each generated sequence, so in this case you get 1-2-3-4, for example)
Python has a standard-library module called itertools that does this.
import itertools
l = itertools.permutations(range(1, 7), 4)  # values 1 to 6, taken 4 at a time
for t in l:
    print("{}, ".format("-".join(str(i) for i in t)), end='')
Result:
1-2-3-4, 1-2-3-5, 1-2-3-6, 1-2-4-3, 1-2-4-5, 1-2-4-6, 1-2-5-3, 1-2-5-4, 1-2-5-6, 1-2-6-3, 1-2-6-4, 1-2-6-5, 1-3-2-4, 1-3-2-5, 1-3-2-6, 1-3-4-2, 1-3-4-5, 1-3-4-6, 1-3-5-2, 1-3-5-4, 1-3-5-6, 1-3-6-2, 1-3-6-4, 1-3-6-5, 1-4-2-3, 1-4-2-5, 1-4-2-6, 1-4-3-2, 1-4-3-5, 1-4-3-6, 1-4-5-2, 1-4-5-3, 1-4-5-6, 1-4-6-2, 1-4-6-3, 1-4-6-5, 1-5-2-3, 1-5-2-4, 1-5-2-6, 1-5-3-2, 1-5-3-4, 1-5-3-6, 1-5-4-2, 1-5-4-3, 1-5-4-6, 1-5-6-2, 1-5-6-3, 1-5-6-4, 1-6-2-3, 1-6-2-4, 1-6-2-5, 1-6-3-2, 1-6-3-4, 1-6-3-5, 1-6-4-2, 1-6-4-3, 1-6-4-5, 1-6-5-2, 1-6-5-3, 1-6-5-4, 2-1-3-4, 2-1-3-5, 2-1-3-6, 2-1-4-3, 2-1-4-5, 2-1-4-6, 2-1-5-3, 2-1-5-4, 2-1-5-6, 2-1-6-3, 2-1-6-4, 2-1-6-5, 2-3-1-4, 2-3-1-5, 2-3-1-6, 2-3-4-1, 2-3-4-5, 2-3-4-6, 2-3-5-1, 2-3-5-4, 2-3-5-6, 2-3-6-1, 2-3-6-4, 2-3-6-5, 2-4-1-3, 2-4-1-5, 2-4-1-6, 2-4-3-1, 2-4-3-5, 2-4-3-6, 2-4-5-1, 2-4-5-3, 2-4-5-6, 2-4-6-1, 2-4-6-3, 2-4-6-5, 2-5-1-3, 2-5-1-4, 2-5-1-6, 2-5-3-1, 2-5-3-4, 2-5-3-6, 2-5-4-1, 2-5-4-3, 2-5-4-6, 2-5-6-1, 2-5-6-3, 2-5-6-4, 2-6-1-3, 2-6-1-4, 2-6-1-5, 2-6-3-1, 2-6-3-4, 2-6-3-5, 2-6-4-1, 2-6-4-3, 2-6-4-5, 2-6-5-1, 2-6-5-3, 2-6-5-4, 3-1-2-4, 3-1-2-5, 3-1-2-6, 3-1-4-2, 3-1-4-5, 3-1-4-6, 3-1-5-2, 3-1-5-4, 3-1-5-6, 3-1-6-2, 3-1-6-4, 3-1-6-5, 3-2-1-4, 3-2-1-5, 3-2-1-6, 3-2-4-1, 3-2-4-5, 3-2-4-6, 3-2-5-1, 3-2-5-4, 3-2-5-6, 3-2-6-1, 3-2-6-4, 3-2-6-5, 3-4-1-2, 3-4-1-5, 3-4-1-6, 3-4-2-1, 3-4-2-5, 3-4-2-6, 3-4-5-1, 3-4-5-2, 3-4-5-6, 3-4-6-1, 3-4-6-2, 3-4-6-5, 3-5-1-2, 3-5-1-4, 3-5-1-6, 3-5-2-1, 3-5-2-4, 3-5-2-6, 3-5-4-1, 3-5-4-2, 3-5-4-6, 3-5-6-1, 3-5-6-2, 3-5-6-4, 3-6-1-2, 3-6-1-4, 3-6-1-5, 3-6-2-1, 3-6-2-4, 3-6-2-5, 3-6-4-1, 3-6-4-2, 3-6-4-5, 3-6-5-1, 3-6-5-2, 3-6-5-4, 4-1-2-3, 4-1-2-5, 4-1-2-6, 4-1-3-2, 4-1-3-5, 4-1-3-6, 4-1-5-2, 4-1-5-3, 4-1-5-6, 4-1-6-2, 4-1-6-3, 4-1-6-5, 4-2-1-3, 4-2-1-5, 4-2-1-6, 4-2-3-1, 4-2-3-5, 4-2-3-6, 4-2-5-1, 4-2-5-3, 4-2-5-6, 4-2-6-1, 4-2-6-3, 4-2-6-5, 4-3-1-2, 4-3-1-5, 4-3-1-6, 4-3-2-1, 4-3-2-5, 4-3-2-6, 4-3-5-1, 4-3-5-2, 4-3-5-6, 4-3-6-1, 4-3-6-2, 4-3-6-5, 4-5-1-2, 4-5-1-3, 4-5-1-6, 4-5-2-1, 4-5-2-3, 4-5-2-6, 
4-5-3-1, 4-5-3-2, 4-5-3-6, 4-5-6-1, 4-5-6-2, 4-5-6-3, 4-6-1-2, 4-6-1-3, 4-6-1-5, 4-6-2-1, 4-6-2-3, 4-6-2-5, 4-6-3-1, 4-6-3-2, 4-6-3-5, 4-6-5-1, 4-6-5-2, 4-6-5-3, 5-1-2-3, 5-1-2-4, 5-1-2-6, 5-1-3-2, 5-1-3-4, 5-1-3-6, 5-1-4-2, 5-1-4-3, 5-1-4-6, 5-1-6-2, 5-1-6-3, 5-1-6-4, 5-2-1-3, 5-2-1-4, 5-2-1-6, 5-2-3-1, 5-2-3-4, 5-2-3-6, 5-2-4-1, 5-2-4-3, 5-2-4-6, 5-2-6-1, 5-2-6-3, 5-2-6-4, 5-3-1-2, 5-3-1-4, 5-3-1-6, 5-3-2-1, 5-3-2-4, 5-3-2-6, 5-3-4-1, 5-3-4-2, 5-3-4-6, 5-3-6-1, 5-3-6-2, 5-3-6-4, 5-4-1-2, 5-4-1-3, 5-4-1-6, 5-4-2-1, 5-4-2-3, 5-4-2-6, 5-4-3-1, 5-4-3-2, 5-4-3-6, 5-4-6-1, 5-4-6-2, 5-4-6-3, 5-6-1-2, 5-6-1-3, 5-6-1-4, 5-6-2-1, 5-6-2-3, 5-6-2-4, 5-6-3-1, 5-6-3-2, 5-6-3-4, 5-6-4-1, 5-6-4-2, 5-6-4-3, 6-1-2-3, 6-1-2-4, 6-1-2-5, 6-1-3-2, 6-1-3-4, 6-1-3-5, 6-1-4-2, 6-1-4-3, 6-1-4-5, 6-1-5-2, 6-1-5-3, 6-1-5-4, 6-2-1-3, 6-2-1-4, 6-2-1-5, 6-2-3-1, 6-2-3-4, 6-2-3-5, 6-2-4-1, 6-2-4-3, 6-2-4-5, 6-2-5-1, 6-2-5-3, 6-2-5-4, 6-3-1-2, 6-3-1-4, 6-3-1-5, 6-3-2-1, 6-3-2-4, 6-3-2-5, 6-3-4-1, 6-3-4-2, 6-3-4-5, 6-3-5-1, 6-3-5-2, 6-3-5-4, 6-4-1-2, 6-4-1-3, 6-4-1-5, 6-4-2-1, 6-4-2-3, 6-4-2-5, 6-4-3-1, 6-4-3-2, 6-4-3-5, 6-4-5-1, 6-4-5-2, 6-4-5-3, 6-5-1-2, 6-5-1-3, 6-5-1-4, 6-5-2-1, 6-5-2-3, 6-5-2-4, 6-5-3-1, 6-5-3-2, 6-5-3-4, 6-5-4-1, 6-5-4-2, 6-5-4-3,
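The examples in the question (1-2-3-4, 1-2-3-5 ... 3-4-5-6) are all in ascending order, which suggests that order may not matter. If that reading is right, itertools.combinations is probably the closer fit: it treats 1-2-3-4 and 4-3-2-1 as the same selection and yields 15 results instead of the 360 permutations above. A minimal sketch under that assumption (the names values and combos are only for illustration):

```python
import itertools

values = range(1, 7)  # the six values 1..6

# combinations() ignores order, so each set of four values appears exactly once.
combos = ["-".join(str(i) for i in t) for t in itertools.combinations(values, 4)]

print(len(combos))        # 15
print(", ".join(combos))  # starts with 1-2-3-4 and ends with 3-4-5-6
```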
LibreOffice allows Python scripting, so the code can be added to Calc or Base by including it in a Python-UNO macro.
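As a rough sketch of what such a macro could look like: the function names build_rows and list_permutations are made up for this example, and the macro is assumed to run against an open Calc document. XSCRIPTCONTEXT is provided by LibreOffice at macro run time and does not exist in a plain Python interpreter, so only build_rows can be tried outside LibreOffice.

```python
import itertools


def build_rows(values=(1, 2, 3, 4, 5, 6), length=4):
    """Return every ordered arrangement of `length` distinct values as tuples."""
    return list(itertools.permutations(values, length))


def list_permutations(*args):
    """Macro entry point: write one arrangement per row into the first sheet."""
    # XSCRIPTCONTEXT is injected by LibreOffice when this runs as a macro.
    doc = XSCRIPTCONTEXT.getDocument()
    sheet = doc.Sheets.getByIndex(0)
    for row, combo in enumerate(build_rows()):
        for col, value in enumerate(combo):
            # getCellByPosition takes (column, row), both zero-based.
            sheet.getCellByPosition(col, row).setValue(float(value))


# Only the functions listed here are offered in the macro dialog.
g_exportedScripts = (list_permutations,)
```

Run from Calc, this should fill columns A to D with 360 rows; swapping permutations for combinations in build_rows would give the 15 unordered selections instead.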
I have a dataframe (df1) as follows:
datetime m d 1d 2d 3d
2014-01-01 1 1 2 2 3
2014-01-02 1 2 3 4 3
2014-01-03 1 3 1 2 3
...........
2014-12-01 12 1 2 2 3
2014-12-31 12 31 2 2 3
I also have another dataframe (df2) as follows:
datetime m d
2015-01-02 1 2
2015-01-03 1 3
...........
2015-12-01 12 1
2015-12-31 12 31
I want to merge the values of the 1d, 2d and 3d columns of df1 into df2.
There are two conditions:
(1) Rows are merged only where m and d are the same in both df1 and df2.
(2) If the index of a df2 row satisfies index % 30 == 0, don't merge; the 1d, 2d and 3d values for those rows can be NaN.
That is, I want the new df2 to look like the following:
datetime m d 1d 2d 3d
2015-01-02 1 2 NaN NaN NaN
2015-01-03 1 3 1 2 3
...........
2015-12-01 12 1 2 2 3
2015-12-31 12 31 2 2 3
Thanks in advance!
I think you need to add NaNs with loc and then merge with a left join:
import numpy as np
import pandas as pd
np.random.seed(10)
N = 365
rng = pd.date_range('2015-01-01', periods=N)
df_tr_2014 = pd.DataFrame(np.random.randint(10, size=(N, 3)), index=rng).reset_index()
df_tr_2014.columns = ['datetime','7d','15d','20d']
df_tr_2014.insert(1,'month', df_tr_2014['datetime'].dt.month)
df_tr_2014.insert(2,'day_m', df_tr_2014['datetime'].dt.day)
#print (df_tr_2014.head())
N = 366
rng = pd.date_range('2016-01-01', periods=N)
df_te = pd.DataFrame(index=rng)
df_te['month'] = df_te.index.month
df_te['day_m'] = df_te.index.day
df_te = df_te.reset_index()
#print (df_te.tail())
df2 = df_te.copy()
df1 = df_tr_2014.copy()
# shift df1 forward by one year so its dates line up with df2
df1 = df1.set_index('datetime')
df1.index += pd.offsets.DateOffset(years=1)
#correct 29 February: add the missing day and copy the values of 28 February
y = df1.index[0].year
df1 = df1.reindex(pd.date_range(pd.Timestamp(y, 1, 1), pd.Timestamp(y, 12, 31)))
idx = df1.index[(df1.index.month == 2) & (df1.index.day == 29)]
df1.loc[idx, :] = df1.loc[idx - pd.Timedelta(1, unit='d'), :].values
df1.loc[idx, 'day_m'] = idx.day
df1[['month','day_m']] = df1[['month','day_m']].astype(int)
df1[['7d','15d', '20d']] = df1[['7d','15d', '20d']].astype(float)
# every 30th row is excluded; 0 is written here, use np.nan instead if you want NaN
df1.loc[np.arange(len(df1.index)) % 30 == 0, ['7d','15d','20d']] = 0
df1 = df1.reset_index()
print (df1.iloc[57:62])
index month day_m 7d 15d 20d
57 2016-02-27 2 27 2.0 0.0 1.0
58 2016-02-28 2 28 2.0 3.0 5.0
59 2016-02-29 2 29 2.0 3.0 5.0
60 2016-03-01 3 1 0.0 0.0 0.0
61 2016-03-02 3 2 7.0 6.0 9.0
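For comparison, the two conditions from the question can also be written directly as a left join on (m, d) followed by masking every 30th row. This is only a sketch on tiny made-up frames that use the question's column names; it assumes each (m, d) pair occurs once in df1 and that df2 has a default 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of df1 and df2 from the question.
df1 = pd.DataFrame({
    "datetime": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "m": [1, 1, 1],
    "d": [1, 2, 3],
    "1d": [2, 3, 1],
    "2d": [2, 4, 2],
    "3d": [3, 3, 3],
})
df2 = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "m": [1, 1, 1],
    "d": [1, 2, 3],
})

value_cols = ["1d", "2d", "3d"]

# Condition (1): a left join keeps every df2 row and pulls values by (m, d).
merged = df2.merge(df1[["m", "d"] + value_cols], on=["m", "d"], how="left")

# Condition (2): rows whose position is a multiple of 30 get NaN.
# Cast to float first so the columns can hold NaN.
merged[value_cols] = merged[value_cols].astype(float)
skip = np.arange(len(merged)) % 30 == 0
merged.loc[skip, value_cols] = np.nan

print(merged)
```

With this sample only row 0 is masked; rows 1 and 2 carry the values from the matching day of the previous year.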
Why don't you just remove the rows in df2 that don't match any (m, d) pair in df1?
has_match = df2.set_index(['m', 'd']).index.isin(df1.set_index(['m', 'd']).index)
df_new = df2.drop(df2[~has_match | (df2.index % 30 == 0)].index)
Or something along those lines.
Link to a related answer.
I'm not enormously familiar with Pandas and have not tested the above example.
df_te is df2.
df_tr_2014 is df1.
7d, 15d and 20d correspond to 1d, 2d and 3d in the question. size_df_te is the length of df_te; month and day_m are m and d in df2.
df_te['7d'] = 0.0
df_te['15d'] = 0.0
df_te['20d'] = 0.0
size_df_te = len(df_te)
for i in range(size_df_te):
    # every 30th row is skipped and keeps its initial value
    if i % 30 != 0:
        m = df_te.loc[i, 'month']
        d = df_te.loc[i, 'day_m']
        # 29 February has no counterpart in a non-leap year, so use 28 February
        if (m == 2) and (d == 29):
            d = 28
        # the row of df_tr_2014 with the same month and day
        match = df_tr_2014.loc[(df_tr_2014['month'] == m) & (df_tr_2014['day_m'] == d)]
        df_te.loc[i, '7d'] = float(match['7d'].iloc[0])
        df_te.loc[i, '15d'] = float(match['15d'].iloc[0])
        df_te.loc[i, '20d'] = float(match['20d'].iloc[0])
EDIT:
Sample data:
import numpy as np
import pandas as pd
np.random.seed(10)
N = 365
rng = pd.date_range('2014-01-01', periods=N)
df_tr_2014 = pd.DataFrame(np.random.randint(10, size=(N, 3)), index=rng).reset_index()
df_tr_2014.columns = ['datetime','7d','15d','20d']
df_tr_2014.insert(1,'month', df_tr_2014['datetime'].dt.month)
df_tr_2014.insert(2,'day_m', df_tr_2014['datetime'].dt.day)
#print (df_tr_2014.head())
N = 365
rng = pd.date_range('2015-01-01', periods=N)
df_te = pd.DataFrame(index=rng)
df_te['month'] = df_te.index.month
df_te['day_m'] = df_te.index.day
df_te = df_te.reset_index()
#print (df_te.head())
I am trying to get a proper structured output into a csv.
Input:
00022d9064bc,1073260801,1073260803,819251,440006
00022d9064bc,1073260803,1073260810,819213,439954
00904b4557d3,1073260803,1073261920,817526,439458
00022de73863,1073260804,1073265410,817558,439525
00904b14b494,1073260804,1073262625,817558,439525
00022d1406df,1073260807,1073260809,820428,438735
00022d9064bc,1073260801,1073260803,819251,440006
00022dba8f51,1073260801,1073260803,819251,440006
00022de1c6c1,1073260801,1073260803,819251,440006
003065f30f37,1073260801,1073260803,819251,440006
00904b48a3b6,1073260801,1073260803,819251,440006
00904b83a0ea,1073260803,1073260810,819213,439954
00904b85d3cf,1073260803,1073261920,817526,439458
00904b14b494,1073260804,1073265410,817558,439525
00904b99499c,1073260804,1073262625,817558,439525
00904bb96e83,1073260804,1073265163,817558,439525
00904bf91b75,1073260804,1073263786,817558,439525
Code:
import pandas as pd
from datetime import datetime,time
import numpy as np
fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
# building reporting DF: `r`
freq = '1H' # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # I've slightly simplified the calculations of m and d
    # by getting rid of the division by 2,
    # because it cancels out as a common factor
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
#df.to_csv((r[r.LogCount > 0])'example.csv')
#print(r[r.LogCount > 0]) -- This gives the correct count and unique count but I want to write the output in a structure.
print (r['StartTime'], ['EndTime'], ['Day'], ['LogCount'], ['UniqueIDCount'])
Output: This is the output that I am getting which is not what I am looking for.
(2004-01-05 00:00:00 00:00:00
2004-01-05 01:00:00 01:00:00
2004-01-05 02:00:00 02:00:00
2004-01-05 03:00:00 03:00:00
2004-01-05 04:00:00 04:00:00
2004-01-05 05:00:00 05:00:00
2004-01-05 06:00:00 06:00:00
2004-01-05 07:00:00 07:00:00
2004-01-05 08:00:00 08:00:00
2004-01-05 09:00:00 09:00:00
The expected output headers are:
StartTime, EndTime, Day, Count, UniqueIDCount
How do I structure the write statement so that my output CSV has the columns mentioned above?
Try this:
rout = r[['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
print(rout)
rout.to_csv('results.csv', index=False)
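The print in the question only showed the StartTime column because r['StartTime'], ['EndTime'], ... is a tuple holding one Series and four plain lists; selecting several columns needs a single list inside the brackets, as above. If the header should read Count rather than LogCount, and empty hours should be left out as in the commented-out line of the question, something like the following might do. The small frame r is a made-up stand-in for the real one, and the CSV is written to an in-memory buffer so the result can be inspected.

```python
import io

import pandas as pd

# Hypothetical stand-in for the reporting frame `r` built in the question.
r = pd.DataFrame({
    "start": [1073260800, 1073264400, 1073268000],
    "LogCount": [17, 6, 0],
    "UniqueIDCount": [14, 5, 0],
    "Day": ["Mon", "Mon", "Mon"],
    "StartTime": ["00:00:00", "01:00:00", "02:00:00"],
    "EndTime": ["01:00:00", "02:00:00", "03:00:00"],
})

columns = ["StartTime", "EndTime", "Day", "LogCount", "UniqueIDCount"]

# Keep only hours with at least one log entry, pick the columns in the
# desired order, and rename LogCount to match the expected header.
rout = r.loc[r.LogCount > 0, columns].rename(columns={"LogCount": "Count"})

buffer = io.StringIO()
rout.to_csv(buffer, index=False)  # pass a file name such as 'results.csv' instead to write to disk
csv_text = buffer.getvalue()
print(csv_text)
```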