Pandas, Update df1 rows from df2 - python-2.7

I have df1:
id, colA, colB, colC, name
1, 1, 2, 3, a
2, 2, 3, 4, a
3, 3, 4, 5, b
4, 4, 5, 6, b
and df2:
id, colA, colB, colD, name
2, 10, 20, D1, a
3, 20, 30, D2, a
Is there a way, perhaps using merge or join, to replace the rows in df1 with the rows from df2 that match on both id and name?
So the result would look like:
id, colA, colB, colC, name, colD
1, 1, 2, 3, a, N/A
2, 10, 20, N/A, a, D1
3, 3, 4, 5, b, N/A
4, 4, 5, 6, b, N/A
I was thinking of something like df1.loc[df1.Locident.isin(df2.Locident)] = df2, but that only matches on one column.

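You could use:
df = pd.concat([df1, df2]).drop_duplicates(subset=['id', 'name'], keep='last').drop_duplicates(subset='id')
This combines both DataFrames. The first drop_duplicates keeps, for every duplicated (id, name) pair, the row that stems from df2. The second one gets rid of the remaining df2 rows whose id already exists in df1 under a different name. The replaced rows end up at the bottom of the result, so sort by id afterwards if the order matters.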
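A minimal, self-contained sketch of this approach on the data from the question. The helper name update_rows and its keys/unique parameters are made up for illustration, and the final sort is an addition to restore the original order:

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "colA": [1, 2, 3, 4],
    "colB": [2, 3, 4, 5],
    "colC": [3, 4, 5, 6],
    "name": ["a", "a", "b", "b"],
})
df2 = pd.DataFrame({
    "id": [2, 3],
    "colA": [10, 20],
    "colB": [20, 30],
    "colD": ["D1", "D2"],
    "name": ["a", "a"],
})


def update_rows(left, right, keys=("id", "name"), unique="id"):
    """Hypothetical helper: replace rows of `left` with rows of `right` matching on `keys`."""
    combined = pd.concat([left, right], sort=False)
    # For each duplicated key pair, keep the later row, i.e. the one from `right`.
    combined = combined.drop_duplicates(subset=list(keys), keep="last")
    # Drop rows from `right` whose id already exists in `left` under another name.
    combined = combined.drop_duplicates(subset=unique)
    # Restore the original ordering (an addition to the one-liner above).
    return combined.sort_values(unique).reset_index(drop=True)


result = update_rows(df1, df2)
print(result)
```

Only the row with id 2 and name a is replaced; the df2 row with id 3 is discarded because df1 holds id 3 under name b.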

Django filter - is DateTimeField filled

I added a simple DateTimeField to my model:
expired = models.DateTimeField(default=None)
The value of the field can be either None or a datetime.
I'd like to filter for objects where expired is set to any date, but I'm struggling to find the right filter.
I think I tried all the combinations of filter / exclude and expired__isnull=True / expired=None, but I never get back the expected count.
What's the right way to filter if the field has a DateTime in it, or not?
Django: 1.11.16
Thanks.
My model has 2122 rows:
Counter(Obj.objects.filter().values_list('expired'))
Counter({(datetime.datetime(2021, 9, 24, 1, 6, 50),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 51),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 32),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 3),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 44),): 1,
(datetime.datetime(2021, 12, 4, 1, 31, 25),): 1,
(datetime.datetime(2021, 12, 4, 1, 37, 49),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 55),): 1,
(None,): 2087,
(datetime.datetime(2021, 12, 4, 1, 37, 52),): 1,
(datetime.datetime(2021, 12, 4, 1, 2, 8),): 4,
(datetime.datetime(2021, 12, 4, 1, 5, 14),): 9,
(datetime.datetime(2021, 9, 28, 0, 43, 51),): 1,
(datetime.datetime(2021, 12, 4, 1, 0, 13),): 7,
(datetime.datetime(2021, 12, 4, 1, 9, 59),): 2,
(datetime.datetime(2021, 12, 3, 17, 25, 46),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 54),): 1,
(datetime.datetime(2021, 9, 24, 1, 14, 30),): 1})
Obj.objects.filter(expired__isnull=False).count() returns all the rows (2122).
Obj.objects.filter(expired=None).count() returns 2087 rows instead of the 35 expected.
Obj.objects.exclude(expired=None).count() returns 2122, so all the rows.
The query is fine; the problem is in the model definition. The field should be declared with null=True and blank=True.
Try changing the field in the model to:
expired = models.DateTimeField(
    auto_now=False,
    null=True,
    blank=True
)

Regexp split key values into columns in Bigquery

My column data looks like
"dayparts": [{"day": "Saturday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Sunday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Thursday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}]
I would like to split these key values into separate columns.
You can try:
WITH sample AS (
SELECT "1" AS id, "{\"dayparts\":[{\"day\":\"Saturday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Sunday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Thursday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]}]}" AS msg
)
SELECT id,
JSON_VALUE(dparts, '$.day') AS day,
JSON_QUERY(dparts, '$.hours') AS hours
FROM (
SELECT id,
JSON_EXTRACT_ARRAY(JSON_QUERY(msg, '$.dayparts')) AS dayparts
FROM sample) t, UNNEST(t.dayparts) dparts
(I added the enclosing "{" and "}" so that the string is valid JSON and the JSON functions can be applied; if your data lacks them, concatenate them on first.)
(You can also wrap JSON_QUERY(dparts, '$.hours') in JSON_EXTRACT_ARRAY if you want an actual array in the result table.)
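To make the shape of the result concrete, here is a rough Python equivalent of the unnesting step, run on a shortened copy of the sample message. It only illustrates what the query is meant to return (one row per day) and is not BigQuery code; unnest_dayparts and the shortened hours lists are made up for the example.

```python
import json

# Shortened copy of the sample message, already wrapped in "{" and "}".
msg = (
    '{"dayparts": ['
    '{"day": "Saturday", "hours": [0, 1, 2]}, '
    '{"day": "Sunday", "hours": [0, 1, 2]}'
    ']}'
)


def unnest_dayparts(row_id, raw):
    """Hypothetical helper: return one (id, day, hours) tuple per entry of "dayparts"."""
    doc = json.loads(raw)
    return [(row_id, part["day"], part["hours"]) for part in doc["dayparts"]]


rows = unnest_dayparts("1", msg)
for row in rows:
    print(row)
```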

How to select rows by a column value in D with mir.ndslice?

I am browsing through mir.ndslice docs trying to figure out how to do a simple row selection by column.
In numpy I would do:
a = np.random.randint(0, 20, [4, 6])
# array([[ 8, 5, 4, 18, 1, 4],
# [ 2, 18, 15, 7, 18, 19],
# [16, 5, 4, 6, 11, 11],
# [15, 1, 14, 6, 1, 4]])
a[a[:,2] > 10] # select rows where the third column (index 2) is > 10
# array([[ 2, 18, 15, 7, 18, 19],
# [15, 1, 14, 6, 1, 4]])
Using mir library I naively tried:
import std.range;
import std.random;
import mir.ndslice;
auto a = generate!(() => uniform(0, 20)).take(24).array.sliced(4,6);
// [[12, 19, 3, 10, 19, 11],
// [19, 0, 0, 13, 9, 1],
// [ 0, 0, 4, 13, 1, 2],
// [ 6, 19, 14, 18, 14, 18]]
a[a[0..$,2] > 10];
But got
Error: incompatible types for `((ulong __dollar = a.length();) , a.opIndex(a.opSlice(0LU, __dollar), 2)) > (10)`: `Slice!(int*, 1LU, cast(mir_slice_kind)0)` and `int`
dmd failed with exit code 1.
So, I went through the docs and couldn't find anything that would look like np.where or similar. Is it even possible in mir?
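For reference, this is a sketch of what the NumPy expression does step by step, since an ndslice version would presumably have to reproduce the same steps: build a per-row boolean mask, turn it into row indices, then pick those rows. The variable names mask, rows and selected are only illustrative.

```python
import numpy as np

a = np.array([[ 8,  5,  4, 18,  1,  4],
              [ 2, 18, 15,  7, 18, 19],
              [16,  5,  4,  6, 11, 11],
              [15,  1, 14,  6,  1,  4]])

mask = a[:, 2] > 10         # one boolean per row, from the third column
rows = np.nonzero(mask)[0]  # indices of the rows where the mask is True
selected = a[rows]          # equivalent to a[mask]

print(selected)
```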

Plotting graph of items in list into corresponding category using PyPlot in Python 2.7

I have a list with 10 records, and each record has one or more elements with 3 categories like below:
list = [('0.4', 2, 'doc4.txt'),('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')]
[('0.5', 6, 'doc3.txt'),('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')]
[('0.6', 8, 'doc2.txt')]
[('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')]
[('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'),('0.8', 2, 'doc6.txt'), ('0.34', 5, 'doc6.txt'),('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')]
[('0.3', 7, 'doc9.txt')]
[('0.1', 8, 'doc12.txt')]
[('0.3', 9, 'doc11.txt'),('1.0', 8, 'doc11.txt')]
[('0.9', 7, 'doc22.txt')]
[('0.3', 7, 'doc24.txt')]
You may notice that the third element is the same within each record. There are 10 categories because the list consists of 10 records.
The structure of each element is as follows. For example, in [('0.6', 8, 'doc2.txt')]:
The first element, '0.6', is the X-axis value, in the range [0.1 -> 1.0].
The second element, an integer, is the Y-axis value.
The third element, 'doc2.txt', is the category name.
The list should be plotted as in the image below.
I've tried several approaches but still couldn't figure it out:
>>> plt.scatter(*zip(*list))
>>> plt.xlabel('X-Axis')
>>> plt.ylabel('Y-Axis')
>>> plt.show()
I think you can just keep the list as it is and iterate over it. You'd then produce a scatter plot for each sublist in the outer list, as the items from the sublist should share the same marker, color and legend label.
import matplotlib.pyplot as plt
#don't call a variable "list" or "print" or any other python command's name
liste=[[('0.4', 2, 'doc4.txt'),('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')],
[('0.5', 6, 'doc3.txt'),('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')],
[('0.6', 8, 'doc2.txt')],
[('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')],
[('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'),('0.8', 2, 'doc6.txt'), ('0.34', 5, 'doc6.txt'),('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')],
[('0.3', 7, 'doc9.txt')],
[('0.1', 8, 'doc12.txt')],
[('0.3', 9, 'doc11.txt'),('1.0', 8, 'doc11.txt')],
[('0.9', 7, 'doc22.txt')],
[('0.3', 7, 'doc24.txt')]]
markers=[ur"$\u25A1$", ur"$\u25A0$", ur"$\u25B2$", ur"$\u25E9$"]
colors= ["k", "crimson", "#112b77"]
fig, ax = plt.subplots()
for i, l in enumerate(liste):
    x,y,cat = zip(*l)
    ax.scatter(list(map(float, x)),y, s=64,c=colors[(i//4)%3],
               marker=markers[i%4], label=cat[0])
ax.legend(bbox_to_anchor=(1.01,1), borderaxespad=0)
plt.subplots_adjust(left=0.1,right=0.8)
plt.show()
There are multiple issues. Your assignment to list makes no sense (presumably you forgot some parentheses). Also, you really shouldn't reuse built-in names like "list". You should not represent floats as strings (your x coordinates). You cannot simply unpack a list into plt.scatter and hope that all of these issues magically work themselves out.
Below is some code showing how to pass your data properly (I use plot instead of scatter because you can pass plot proper colour names).
import numpy as np
import matplotlib.pyplot as plt
# 'list' is a bad name for a variable as it overwrites the list() built-in function
# -> rename to data
data = [
[('0.4', 2, 'doc4.txt'),('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')],
[('0.5', 6, 'doc3.txt'),('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')],
[('0.6', 8, 'doc2.txt')],
[('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')],
[('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'),('0.8', 2, 'doc6.txt'), ('0.34', 5, 'doc6.txt'),('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')],
[('0.3', 7, 'doc9.txt')],
[('0.1', 8, 'doc12.txt')],
[('0.3', 9, 'doc11.txt'),('1.0', 8, 'doc11.txt')],
[('0.9', 7, 'doc22.txt')],
[('0.3', 7, 'doc24.txt')]
]
# flatten nested list
flat = [item for sublist in data for item in sublist]
# convert strings to numbers
numeric = [(float(x), y, label) for (x, y, label) in flat]
# create a dictionary that maps a label to a set of x,y coordinates
data = dict()
for (x, y, label) in numeric:
    if label in data:
        data[label].append((x,y))
    else:
        data[label] = [(x,y)]
# initialise figure
fig, ax = plt.subplots(1,1)
colors = ['blue', 'red', 'yellow', 'green', 'orange', 'brown', 'violet', 'magenta', 'white', 'black']
# populate figure
for color, (label, xy) in zip(colors, data.items()):  # items() works on Python 2 and 3
    x, y = np.array(xy).T
    ax.plot(x, y, 'o', label=label, color=color)
ax.set_xlim(0, 1.1)
ax.set_ylim(0, 16)
ax.legend(numpoints=1)
plt.show()
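As a side note, the flatten, convert and group steps can be folded into a single loop with collections.defaultdict. A minimal sketch on a subset of the data; group_by_label is a hypothetical helper name, not part of matplotlib:

```python
from collections import defaultdict

records = [
    [('0.4', 2, 'doc4.txt'), ('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')],
    [('0.6', 8, 'doc2.txt')],
    [('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')],
]


def group_by_label(nested):
    """Hypothetical helper: map each label to its list of (x, y) points, with x as float."""
    grouped = defaultdict(list)
    for sublist in nested:
        for x, y, label in sublist:
            grouped[label].append((float(x), y))
    return dict(grouped)


grouped = group_by_label(records)
for label, points in grouped.items():
    print(label, points)
```

Each value of the resulting dictionary can then be transposed into x and y sequences and passed to one plot call per label.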

Importing text file to Excel with VBA - multiple strings with same delimiter

I'm trying to import a number of text files into Excel using the VBA code below. Whilst the code produces a list of the Transaction Sales Numbers with the corresponding date for each imported file, I can't work out how to get the associated Transaction Sales Numbers into separate columns in each imported file's row. I have tried RegEx but struggled with the differing formats of the Sales Numbers (an example of each is in the sample file). Can anyone help?
Many thanks in advance
Sample text file:
This is sales enquiry response for SER:SS09458GQPBXX201503191300WWPL0933
***********************************************************
Sales record match For SER:SS09458GQPBXX201503191300WWPL0933
**********************Original File**********************
File Data Source POS
Type of Transaction EFT
Date Mar 19 2015 12:00PM
Transaction Sales Number LLRUMOLN120150319FLRPLIS08783
Product Name HAIRDRYER
***************Sales File # 1***************
File Data Source POS
Type of Transaction EFT
Date Apr 23 2015 12:00PM
Transaction Sales Number PLVOLMJBD0960807420300
Product Name HAIRDRYER
***************Sales File # 2***************
File Data Source POS
Type of Transaction EFT
Date May 28 2015 12:00PM
Transaction Sales Number 781266HO3
Product Name HAIRDRYER
***************Sales File # 3***************
File Data Source POS
Type of Transaction EFT
Date May 10 2015 12:00PM
Transaction Sales Number CVFORM05061126581000433
Product Name HAIRDRYER
***************Sales File # 4***************
File Data Source POS
Type of Transaction EFT
Date Jun 28 2015 12:07PM
Transaction Sales Number LLB01L32330772427059291FOLM400P00295
Product Name HAIRDRYER
Option Explicit
Sub Sales_File_Extractor()
Dim fName As String, fPath As String, fPathDone As String
Dim LR As Long, NR As Long
Dim wbData As Workbook, wsMaster As Worksheet
Dim TSN_Start As String, TSN_End As String
Dim Date_Start As String, Date_End As String
Dim textline As String, text As String
'Setup
Application.ScreenUpdating = False 'speed up macro execution
Application.EnableEvents = False 'turn off other macros for now
Application.DisplayAlerts = False 'turn off system messages for now
Set wsMaster = ThisWorkbook.Sheets("SALES") 'sheet report is built into
With wsMaster
NR = .Range("A" & .Rows.Count).End(xlUp).Row + 1 'appends data to existing data
'Path and filename (edit this section to suit)
fPath = "C:\Users\burnsr\desktop\sales"
fPathDone = fPath & "Imported\" 'remember final \ in this string
On Error Resume Next
MkDir fPathDone 'creates the completed folder if missing
On Error GoTo 0
fName = Dir(fPath & "*.txt*") 'listing of desired files, edit filter as desired
Do While Len(fName) > 0
Open (fPath & fName) For Input As #1
Do Until EOF(1)
Line Input #1, textline
text = text & textline 'second loop text is already stored -> see reset text
Loop
Close #1
On Error Resume Next
.Cells(NR, "A").Value = fName
Date_Start = InStr(text, "Date ") 'position of start delimiter
Date_End = InStr(text, "Transaction Sales Number") 'position of end delimiter
.Cells(NR, "C").Value = Mid(text, Date_Start + 34, Date_End - Date_Start - 34) 'position number is length of start string
TSN_Start = InStr(text, "Transaction Sales Number ") 'position of start delimiter
TSN_End = InStr(text, "Product Name") 'position of end delimiter
.Cells(NR, "B").Value = Mid(text, TSN_Start + 34, TSN_End - TSN_Start - 34) 'position number is length of start string
'How to get all other successive values in columns?
text = "" 'reset text
Close #1 'close file
NR = .Range("A" & .Rows.Count).End(xlUp).Row + 1 'next row
Name fPath & fName As fPathDone & fName 'move file to IMPORTED folder
fName = Dir 'ready next filename
Loop
End With
ErrorExit: 'Cleanup
Application.DisplayAlerts = True 'turn system alerts back on
Application.EnableEvents = True 'turn other macros back on
Application.ScreenUpdating = True 'refreshes the screen
MsgBox "Import completed"
End Sub
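The core difficulty here is that the same pair of delimiters occurs several times in one file, so a single InStr call only ever finds the first match. As a rough illustration of the idea (in Python rather than VBA), the sketch below collects every date and sales number with one regular expression. It assumes each sales number is a single token without whitespace sitting between "Transaction Sales Number" and "Product Name", as in the sample file; PAIR and extract_pairs are made-up names.

```python
import re

# Shortened excerpt of the sample file from the question.
sample = (
    "File Data Source POS Type of Transaction EFT Date Mar 19 2015 12:00PM "
    "Transaction Sales Number LLRUMOLN120150319FLRPLIS08783 Product Name HAIRDRYER "
    "***************Sales File # 1*************** "
    "File Data Source POS Type of Transaction EFT Date Apr 23 2015 12:00PM "
    "Transaction Sales Number PLVOLMJBD0960807420300 Product Name HAIRDRYER "
    "***************Sales File # 2*************** "
    "File Data Source POS Type of Transaction EFT Date May 28 2015 12:00PM "
    "Transaction Sales Number 781266HO3 Product Name HAIRDRYER"
)

# One match per sales record: the date, then the sales number token.
PAIR = re.compile(
    r"\bDate\s+(?P<date>.+?)\s+Transaction Sales Number\s+(?P<tsn>\S+)\s+Product Name"
)


def extract_pairs(text):
    """Hypothetical helper: return every (date, sales number) pair found in text."""
    return [(m.group("date"), m.group("tsn")) for m in PAIR.finditer(text)]


pairs = extract_pairs(sample)
for date, tsn in pairs:
    print(date, tsn)
```

Each pair could then be written to the next free pair of columns in the file's row, instead of overwriting columns B and C.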
Rabbie, I have an XLSM file that reads 6 CSV files and adds each one to itself as a new sheet. The text is tab-delimited.
UTF-8 CSV Headers Example:
Customer Number Customer description Cust. Name-Lang 2 Status Phone Number Fax Number E-mail Address Type of Business Cust. Group Code
VBA:
Function IsOpen(File$) As Boolean
Dim FN%
FN = FreeFile
On Error Resume Next
Open File For Random Access Read Write Lock Read Write As #FN
Close #FN
IsOpen = Err
End Function
Public Sub Load_Data()
Application.ScreenUpdating = False
Application.DisplayAlerts = False
allName = Worksheets("START").Cells(6, "B").Value
tmpltName = Worksheets("START").Cells(4, "B").Value
savePath = Worksheets("START").Cells(3, "B").Value
Set currBook = ActiveWorkbook
Set prevsheet = ActiveSheet
'Load all ZOOM files
i = 2
For Each n In Worksheets("START").Range("E2:E8")
On Error Resume Next
currBook.Sheets(n.Text).Select
If Not Err Then
Err.Clear
currBook.Worksheets(n.Text).Delete
End If
Sheets.Add(Before:=Sheets("START")).Name = n.Text
' Checking if file is opened
If Not IsOpen(Worksheets("START").Cells(i, "F").Value) Then
' Load CSV file
LoadCSV Worksheets("START").Cells(i, "F").Value, n.Text
End If
' List of combining fields
' Find column with combining field
With Worksheets(n.Text).Columns("A:DZ")
Set result = .Find(What:=Worksheets("START").Cells(i, "G").Value, LookIn:=xlValues)
If result Then
combFields.Add result.Address, n.Text
End If
End With
i = i + 1
Next n
' Find column with combining field in Peoples
combFieldPeople = combFields.Item("peoples")
' Find column with combining field in Companies
combFieldCompany = combFields.Item("companies")
' Find company names field in "companies"
With Worksheets("companies").Columns("A:DZ")
Set result = .Find(What:=Worksheets("START").Cells(3, "I").Value, LookIn:=xlValues)
If result Then
companyNameField = result.Address
End If
End With
' Find column with "CopyToExcel" checkbox for Peoples
With Worksheets("peoples").Columns("A:DZ")
Set result = .Find(What:=Worksheets("START").Cells(2, "H").Value, LookIn:=xlValues)
If result Then
copyUserField = result.Address
End If
End With
' Find column with "CopyToExcel" checkbox for "Companies"
With Worksheets("companies").Columns("A:DZ")
Set result = .Find(What:=Worksheets("START").Cells(3, "H").Value, LookIn:=xlValues)
If result Then
copyField = result.Address
End If
End With
' Remove unnecessary organizations
startBook.Activate
With Worksheets("companies")
.Activate
.AutoFilterMode = False
fldNum = .Range(copyField).Column
.UsedRange.AutoFilter Field:=fldNum, Criteria1:="Y"
ActiveCell.CurrentRegion.Select ' copy unique values
nRow = Selection.Rows.Count
Selection.Copy
'.UsedRange.AutoFilter
Worksheets.Add.Name = "tmp1"
ActiveSheet.Range("A1").Select
ActiveSheet.Paste
Worksheets("companies").Delete
Worksheets("tmp1").Name = "companies"
End With
Worksheets("START").Activate
Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
Function LoadCSV(fName As String, shName As String)
ActiveWorkbook.Worksheets(shName).Activate
iPath = ThisWorkbook.Path
fullFileName = iPath & "\" & fName
With ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;" + fullFileName, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = fullFileName
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 65001
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
'.TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
' 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 _
' , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
' 1, 1, 1, 1, 1)
.TextFileColumnDataTypes = Array(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Function
It works fine with Hebrew and Zoom/Priority, on MS Office 2010/2013/2016 (32- and 64-bit).