Regexp split key values into columns in BigQuery - regex

My column data looks like
"dayparts": [{"day": "Saturday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Sunday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Thursday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}]
I would like to have the result like

You can try:
WITH sample AS (
  SELECT "1" AS id, "{\"dayparts\":[{\"day\":\"Saturday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Sunday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Thursday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]}]}" AS msg
)
SELECT id,
  JSON_VALUE(dparts, '$.day') AS day,
  JSON_QUERY(dparts, '$.hours') AS hours
FROM (
  SELECT id,
    JSON_EXTRACT_ARRAY(JSON_QUERY(msg, '$.dayparts')) AS dayparts
  FROM sample) t, UNNEST(t.dayparts) dparts
(I added the enclosing "{" and "}" so that the value is valid JSON and the JSON functions can parse it. If your column does not contain them, concatenate them onto the value first.)
(You can also wrap "JSON_QUERY(dparts, '$.hours')" in "JSON_EXTRACT_ARRAY" if you want an actual array in the result table.)
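To see what the query does without running BigQuery, here is a rough Python equivalent that uses only the standard json module. It is a sketch and not part of the answer above: the function name dayparts_rows and the shortened sample string are made up for illustration. It wraps the fragment in braces, parses it, and returns one (day, hours) pair per array element, which mirrors the rows that UNNEST produces.

```python
import json

# Shortened, hypothetical sample in the same shape as the column data
# (note: no enclosing braces, just like the original value).
raw = '"dayparts": [{"day": "Saturday", "hours": [0, 1, 2]}, {"day": "Sunday", "hours": [3, 4]}]'


def dayparts_rows(fragment):
    """Return one (day, hours) tuple per element of the "dayparts" array."""
    # Add the enclosing braces so the fragment becomes a valid JSON object.
    doc = json.loads("{" + fragment + "}")
    return [(part["day"], part["hours"]) for part in doc["dayparts"]]


rows = dayparts_rows(raw)
for day, hours in rows:
    print(day, hours)
```

Each tuple corresponds to one output row of the SQL query, with the hours kept as a list rather than as a JSON string.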


Django filter - is DateTimeField filled

I added a simple DateTimeField to my model:
expired = models.DateTimeField(default=None)
The value of the field can be either None or a datetime.
I'd like to filter for objects where expired is filled with any value, but I'm struggling to find the right filter.
I think I tried all the combinations of filter / exclude and expired__isnull=True / expired=None, but I never get back the expected count.
What's the right way to filter if the field has a DateTime in it, or not?
Django: 1.11.16
Thanks.
My table has 2122 rows:
Counter(Obj.objects.filter().values_list('expired'))
Counter({(datetime.datetime(2021, 9, 24, 1, 6, 50),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 51),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 32),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 3),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 44),): 1,
(datetime.datetime(2021, 12, 4, 1, 31, 25),): 1,
(datetime.datetime(2021, 12, 4, 1, 37, 49),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 55),): 1,
(None,): 2087,
(datetime.datetime(2021, 12, 4, 1, 37, 52),): 1,
(datetime.datetime(2021, 12, 4, 1, 2, 8),): 4,
(datetime.datetime(2021, 12, 4, 1, 5, 14),): 9,
(datetime.datetime(2021, 9, 28, 0, 43, 51),): 1,
(datetime.datetime(2021, 12, 4, 1, 0, 13),): 7,
(datetime.datetime(2021, 12, 4, 1, 9, 59),): 2,
(datetime.datetime(2021, 12, 3, 17, 25, 46),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 54),): 1,
(datetime.datetime(2021, 9, 24, 1, 14, 30),): 1})
Obj.objects.filter(expired__isnull=False).count() returns all the rows (2122).
Obj.objects.filter(expired=None).count() returns 2087 rows instead of the 35 expected.
Obj.objects.exclude(expired=None).count() returns 2122, so all the rows.
The query is fine; the problem is in the model definition. The field should be declared with blank=True and null=True.
Try changing the field in the model to:
expired = models.DateTimeField(
    auto_now=False,
    null=True,
    blank=True
)

How to select rows by a column value in D with mir.ndslice?

I am browsing through mir.ndslice docs trying to figure out how to do a simple row selection by column.
In numpy I would do:
a = np.random.randint(0, 20, [4, 6])
# array([[ 8, 5, 4, 18, 1, 4],
# [ 2, 18, 15, 7, 18, 19],
# [16, 5, 4, 6, 11, 11],
# [15, 1, 14, 6, 1, 4]])
a[a[:,2] > 10] # select rows where the third column (index 2) is > 10
# array([[ 2, 18, 15, 7, 18, 19],
# [15, 1, 14, 6, 1, 4]])
Using mir library I naively tried:
import std.range;
import std.random;
import mir.ndslice;
auto a = generate!(() => uniform(0, 20)).take(24).array.sliced(4,6);
// [[12, 19, 3, 10, 19, 11],
// [19, 0, 0, 13, 9, 1],
// [ 0, 0, 4, 13, 1, 2],
// [ 6, 19, 14, 18, 14, 18]]
a[a[0..$,2] > 10];
But got
Error: incompatible types for `((ulong __dollar = a.length();) , a.opIndex(a.opSlice(0LU, __dollar), 2)) > (10)`: `Slice!(int*, 1LU, cast(mir_slice_kind)0)` and `int`
dmd failed with exit code 1.
So, I went through the docs and couldn't find anything that would look like np.where or similar. Is it even possible in mir?

Timeline bar graph using python and matplotlib

I am looking to draw a timeline bar graph using matplotlib that shows the things a person did in one day. Below are my code, its output, and the expected output I am looking for. Any library can be used; matplotlib is the closest I could get. Any help would be greatly appreciated.
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = [ (dt.datetime(2018, 7, 17, 0, 15), dt.datetime(2018, 7, 17, 0, 30), 'sleep'),
(dt.datetime(2018, 7, 17, 0, 30), dt.datetime(2018, 7, 17, 0, 45), 'eat'),
(dt.datetime(2018, 7, 17, 0, 45), dt.datetime(2018, 7, 17, 1, 0), 'work'),
(dt.datetime(2018, 7, 17, 1, 0), dt.datetime(2018, 7, 17, 1, 30), 'sleep'),
(dt.datetime(2018, 7, 17, 1, 15), dt.datetime(2018, 7, 17, 1, 30), 'eat'),
(dt.datetime(2018, 7, 17, 1, 30), dt.datetime(2018, 7, 17, 1, 45), 'work')
]
rng = []
for i in range(len(data)):
    rng.append((data[i][0]).strftime('%H:%M'))
index = {}
activity = []
for i in range(len(data)):
    index[(data[i][2])] = []
    activity.append(data[i][2])
keys = list(index.keys())  # list() so the keys can be indexed on Python 3 as well
for i in range(len(index)):
    for j in range(len(activity)):
        if activity[j] == keys[i]:
            index[keys[i]].append(15)
        else:
            index[keys[i]].append(0)
data = list(index.values())
df = pd.DataFrame(data, index=list(index.keys()))
df.plot.barh(stacked=True, sharex=False)
plt.show()
My Output:
Using matplotlib this is what I was getting
Expected Output:
I got this using Google Charts' Timeline chart, but I need it in Python. The data used for the two graphs is not exactly the same, but I hope you get the point.
You may create a PolyCollection of "bars". For this you would need to convert your dates to numbers (matplotlib.dates.date2num).
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import PolyCollection
data = [ (dt.datetime(2018, 7, 17, 0, 15), dt.datetime(2018, 7, 17, 0, 30), 'sleep'),
(dt.datetime(2018, 7, 17, 0, 30), dt.datetime(2018, 7, 17, 0, 45), 'eat'),
(dt.datetime(2018, 7, 17, 0, 45), dt.datetime(2018, 7, 17, 1, 0), 'work'),
(dt.datetime(2018, 7, 17, 1, 0), dt.datetime(2018, 7, 17, 1, 30), 'sleep'),
(dt.datetime(2018, 7, 17, 1, 15), dt.datetime(2018, 7, 17, 1, 30), 'eat'),
(dt.datetime(2018, 7, 17, 1, 30), dt.datetime(2018, 7, 17, 1, 45), 'work')
]
cats = {"sleep" : 1, "eat" : 2, "work" : 3}
colormapping = {"sleep" : "C0", "eat" : "C1", "work" : "C2"}
verts = []
colors = []
for d in data:
    v = [(mdates.date2num(d[0]), cats[d[2]]-.4),
         (mdates.date2num(d[0]), cats[d[2]]+.4),
         (mdates.date2num(d[1]), cats[d[2]]+.4),
         (mdates.date2num(d[1]), cats[d[2]]-.4),
         (mdates.date2num(d[0]), cats[d[2]]-.4)]
    verts.append(v)
    colors.append(colormapping[d[2]])
bars = PolyCollection(verts, facecolors=colors)
fig, ax = plt.subplots()
ax.add_collection(bars)
ax.autoscale()
loc = mdates.MinuteLocator(byminute=[0,15,30,45])
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
ax.set_yticks([1,2,3])
ax.set_yticklabels(["sleep", "eat", "work"])
plt.show()
Note that such plots can equally be generated with a broken bar plot (broken_barh); however, with the (unsorted) data used here, a PolyCollection is a bit easier.
And now you would need to explain to me how you can sleep and eat at the same time - something I can never quite get at, as hard as I try.
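For comparison, the broken_barh alternative mentioned in this answer could look roughly like the sketch below. It is an illustration under assumptions, not the answerer's code: the helper names group_spans and plot_timeline are made up, and matplotlib is imported inside the plotting function so that the grouping step can be tried on its own. The intervals are first grouped by activity, since broken_barh draws all bars of one row in a single call.

```python
import datetime as dt
from collections import defaultdict

data = [(dt.datetime(2018, 7, 17, 0, 15), dt.datetime(2018, 7, 17, 0, 30), 'sleep'),
        (dt.datetime(2018, 7, 17, 0, 30), dt.datetime(2018, 7, 17, 0, 45), 'eat'),
        (dt.datetime(2018, 7, 17, 0, 45), dt.datetime(2018, 7, 17, 1, 0), 'work'),
        (dt.datetime(2018, 7, 17, 1, 0), dt.datetime(2018, 7, 17, 1, 30), 'sleep'),
        (dt.datetime(2018, 7, 17, 1, 15), dt.datetime(2018, 7, 17, 1, 30), 'eat'),
        (dt.datetime(2018, 7, 17, 1, 30), dt.datetime(2018, 7, 17, 1, 45), 'work')]


def group_spans(records):
    """Group (start, end, label) tuples into {label: [(start, end), ...]}."""
    spans = defaultdict(list)
    for start, end, label in records:
        spans[label].append((start, end))
    return dict(spans)


def plot_timeline(records):
    """Draw one broken_barh row per activity (hypothetical helper)."""
    # Imported here so group_spans can be used without matplotlib installed.
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    spans = group_spans(records)
    fig, ax = plt.subplots()
    for row, (label, intervals) in enumerate(spans.items(), start=1):
        # broken_barh expects (xmin, width) pairs, here in matplotlib date numbers.
        xranges = [(mdates.date2num(s), mdates.date2num(e) - mdates.date2num(s))
                   for s, e in intervals]
        ax.broken_barh(xranges, (row - 0.4, 0.8), facecolors="C%d" % (row - 1))
    ax.set_yticks(range(1, len(spans) + 1))
    ax.set_yticklabels(list(spans))
    ax.xaxis_date()
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    plt.show()


spans = group_spans(data)
```

Calling plot_timeline(data) then draws the chart. The grouping step is the extra work that the unsorted input requires here.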
My solution using Altair (example):
import altair as alt
import datetime as dt
import pandas as pd
alt.renderers.enable('jupyterlab')
data = pd.DataFrame()
data['from'] = [dt.datetime(2018, 7, 17, 0, 15),
                dt.datetime(2018, 7, 17, 0, 30),
                dt.datetime(2018, 7, 17, 0, 45),
                dt.datetime(2018, 7, 17, 1, 0),
                dt.datetime(2018, 7, 17, 1, 15),
                dt.datetime(2018, 7, 17, 1, 30)]
data['to'] = [dt.datetime(2018, 7, 17, 0, 30),
              dt.datetime(2018, 7, 17, 0, 45),
              dt.datetime(2018, 7, 17, 1, 0),
              dt.datetime(2018, 7, 17, 1, 15),
              dt.datetime(2018, 7, 17, 1, 30),
              dt.datetime(2018, 7, 17, 1, 45)]
data['activity'] = ['sleep','eat','work','sleep','eat','work']
#data
alt.Chart(data).mark_bar().encode(
    x='from',
    x2='to',
    y='activity',
    color=alt.Color('activity', scale=alt.Scale(scheme='dark2'))
)
Output:

How to split a List into "n" number of sublists in Java? User will input the value of "n"

Say I have the below-mentioned list:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Now, if user wants 4 sub-lists (n=4), then the sub-lists will be
[0,1,2,3,4,5]
[6,7,8,9,10,11]
[12,13,14,15,16,17]
[18,19,20]
Similarly, if user wants 6 sub-lists (n=6), then the sub-lists will be
[0,1,2,3]
[4,5,6,7]
[8,9,10,11]
[12,13,14,15]
[16,17,18,19]
[20]
Please let me know how can I achieve this.
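import math

list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
n = 6
# chunk size is the ceiling of len/n, so evenly divisible lengths split evenly
k = math.ceil(len(list1) / n)
i = 0
for x in range(n - 1):
    i = (x + 1) * k
    print(list1[i - k:i])
# the last sub-list takes whatever remains
print(list1[i:])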
Try this Python code; it reproduces both examples above. The question asks for Java, where the same index arithmetic works with List.subList(from, to).
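The same idea can be wrapped in a function that returns the sub-lists instead of printing them. This is a sketch under assumptions: the name split_into is made up here, and it keeps the behaviour shown in the question, where every sub-list except the last has the same size (the ceiling of len/n).

```python
import math


def split_into(items, n):
    """Split items into n consecutive sub-lists of size ceil(len(items) / n).

    The last sub-lists may be shorter, or empty if there are too few items.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    size = math.ceil(len(items) / n)
    # Slicing past the end of a list is safe and simply yields a shorter list.
    return [items[i * size:(i + 1) * size] for i in range(n)]


numbers = list(range(21))
for part in split_into(numbers, 4):
    print(part)
```

With the list from the question, n=4 gives sub-lists of 6, 6, 6 and 3 elements, and n=6 gives five sub-lists of 4 elements followed by [20].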

Pandas, Update df1 rows from df2

I have df1:
id, colA, colB, colC, name
1, 1, 2, 3, a
2, 2, 3, 4, a
3, 3, 4, 5, b
4, 4, 5, 6, b
and df2:
id, colA, colB, colD, name
2, 10, 20, D1, a
3, 20, 30, D2, a
Is there a way, perhaps using merge or join, to replace the rows in df1 with the rows from df2 that match on both id and name?
So the result would look like:
id, colA, colB, colC, name, colD
1, 1, 2, 3, a, N/A
2, 10, 20, N/A, a, D1
3, 3, 4, 5, b, N/A
4, 4, 5, 6, b, N/A
I was thinking something like: df1.loc[df1.Locident.isin(df2.Locident)] = df2 but that only matches on one column.
You could:
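df = pd.concat([df1, df2]).drop_duplicates(subset=['id', 'name'], keep='last').drop_duplicates(subset='id').sort_values('id')
This combines both DataFrames, keeps the df2 row wherever an (id, name) pair occurs in both, and then drops the remaining df2 rows whose id already exists in df1 under a different name. The final sort_values restores the id order, because the rows taken from df2 otherwise end up at the bottom.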