Importing text file to Excel with VBA - multiple strings with same delimiter - regex

I'm trying to import a number of text files into Excel using the VBA code below. Whilst the code produces a list of the Transaction Sales Numbers with the corresponding date for each file imported, I can't work out how to get the associated Transaction Sales Numbers into separate columns in each imported file's row. I have tried regex but struggled with the differing formats of the Sales Numbers (an example of each is in the sample file). Can anyone help?
Many thanks in advance
Sample text file:
This is sales enquiry response for SER:SS09458GQPBXX201503191300WWPL0933
***********************************************************
Sales record match For SER:SS09458GQPBXX201503191300WWPL0933
**********************Original File**********************
File Data Source POS
Type of Transaction EFT
Date Mar 19 2015 12:00PM
Transaction Sales Number LLRUMOLN120150319FLRPLIS08783
Product Name HAIRDRYER
***************Sales File # 1***************
File Data Source POS
Type of Transaction EFT
Date Apr 23 2015 12:00PM
Transaction Sales Number PLVOLMJBD0960807420300
Product Name HAIRDRYER
***************Sales File # 2***************
File Data Source POS
Type of Transaction EFT
Date May 28 2015 12:00PM
Transaction Sales Number 781266HO3
Product Name HAIRDRYER
***************Sales File # 3***************
File Data Source POS
Type of Transaction EFT
Date May 10 2015 12:00PM
Transaction Sales Number CVFORM05061126581000433
Product Name HAIRDRYER
***************Sales File # 4***************
File Data Source POS
Type of Transaction EFT
Date Jun 28 2015 12:07PM
Transaction Sales Number LLB01L32330772427059291FOLM400P00295
Product Name HAIRDRYER
Option Explicit

Sub Sales_File_Extractor()
    Dim fName As String, fPath As String, fPathDone As String
    Dim LR As Long, NR As Long
    Dim wbData As Workbook, wsMaster As Worksheet
    Dim TSN_Start As Long, TSN_End As Long   'InStr returns a position, so Long rather than String
    Dim Date_Start As Long, Date_End As Long
    Dim textline As String, text As String

    'Setup
    Application.ScreenUpdating = False 'speed up macro execution
    Application.EnableEvents = False   'turn off other macros for now
    Application.DisplayAlerts = False  'turn off system messages for now

    Set wsMaster = ThisWorkbook.Sheets("SALES") 'sheet the report is built into

    With wsMaster
        NR = .Range("A" & .Rows.Count).End(xlUp).Row + 1 'append to existing data

        'Path and filename (edit this section to suit)
        fPath = "C:\Users\burnsr\desktop\sales\" 'remember the trailing \ in this string
        fPathDone = fPath & "Imported\"
        On Error Resume Next
        MkDir fPathDone 'create the completed folder if missing
        On Error GoTo 0

        fName = Dir(fPath & "*.txt") 'listing of desired files; edit filter as desired
        Do While Len(fName) > 0
            Open fPath & fName For Input As #1
            Do Until EOF(1)
                Line Input #1, textline
                text = text & textline 'accumulate the whole file into one string
            Loop
            Close #1

            On Error Resume Next
            .Cells(NR, "A").Value = fName
            Date_Start = InStr(text, "Date ")                  'position of start delimiter
            Date_End = InStr(text, "Transaction Sales Number") 'position of end delimiter
            .Cells(NR, "C").Value = Mid(text, Date_Start + 34, Date_End - Date_Start - 34) 'offset is the padded length of the start string
            TSN_Start = InStr(text, "Transaction Sales Number ") 'position of start delimiter
            TSN_End = InStr(text, "Product Name")                'position of end delimiter
            .Cells(NR, "B").Value = Mid(text, TSN_Start + 34, TSN_End - TSN_Start - 34)
            'How to get all other successive values into columns?

            text = "" 'reset for the next file
            NR = .Range("A" & .Rows.Count).End(xlUp).Row + 1 'next row
            Name fPath & fName As fPathDone & fName 'move file to the Imported folder
            fName = Dir 'ready next filename
        Loop
    End With

ErrorExit: 'Cleanup
    Application.DisplayAlerts = True 'turn system alerts back on
    Application.EnableEvents = True  'turn other macros back on
    Application.ScreenUpdating = True 'refresh the screen
    MsgBox "Import completed"
End Sub
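To pick up every successive Transaction Sales Number rather than just the first, one option is to match each occurrence of the delimiter pair in a loop instead of taking a single InStr position. The technique is easiest to show in Python (a VBA version would use InStr in a loop or a VBScript.RegExp object); the strings below are copied from the sample file, and the lazy quantifier is what copes with the differing TSN formats:

```python
import re

# Two records from the sample file, concatenated the same way the VBA does
text = (
    "File Data Source POS Type of Transaction EFT Date Mar 19 2015 12:00PM "
    "Transaction Sales Number LLRUMOLN120150319FLRPLIS08783 Product Name HAIRDRYER "
    "File Data Source POS Type of Transaction EFT Date Apr 23 2015 12:00PM "
    "Transaction Sales Number PLVOLMJBD0960807420300 Product Name HAIRDRYER"
)

# Each date sits between the literal labels "Date" and "Transaction Sales Number",
# each TSN between "Transaction Sales Number" and "Product Name"; findall returns
# one (date, tsn) tuple per record, which can then be written across columns.
pairs = re.findall(
    r"Date\s+(.*?)\s+Transaction Sales Number\s+(\S+)\s+Product Name", text
)
print(pairs)
```

Writing the tuples out row by row (one column per value) then replaces the two fixed `Mid` calls.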

Rabbie, I have an XLSM file that reads 6 CSV files and adds 6 sheets to itself. The text files are TAB-delimited.
UTF-8 CSV Headers Example:
Customer Number Customer description Cust. Name-Lang 2 Status Phone Number Fax Number E-mail Address Type of Business Cust. Group Code
VBA:
Function IsOpen(File$) As Boolean
    Dim FN%
    FN = FreeFile
    On Error Resume Next
    Open File For Random Access Read Write Lock Read Write As #FN
    Close #FN
    IsOpen = Err 'a non-zero Err.Number means the exclusive open failed, i.e. the file is in use
End Function
Public Sub Load_Data()
    Dim allName As String, tmpltName As String, savePath As String
    Dim currBook As Workbook, prevsheet As Worksheet
    Dim n As Range, result As Range
    Dim combFields As New Collection
    Dim combFieldPeople As String, combFieldCompany As String
    Dim companyNameField As String, copyUserField As String, copyField As String
    Dim i As Long, fldNum As Long, nRow As Long

    Application.ScreenUpdating = False
    Application.DisplayAlerts = False

    allName = Worksheets("START").Cells(6, "B").Value
    tmpltName = Worksheets("START").Cells(4, "B").Value
    savePath = Worksheets("START").Cells(3, "B").Value
    Set currBook = ActiveWorkbook
    Set prevsheet = ActiveSheet

    'Load all ZOOM files
    i = 2
    For Each n In Worksheets("START").Range("E2:E8")
        On Error Resume Next
        currBook.Sheets(n.Text).Select
        If Not Err Then
            Err.Clear
            currBook.Worksheets(n.Text).Delete
        End If
        On Error GoTo 0
        Sheets.Add(Before:=Sheets("START")).Name = n.Text

        'Check whether the file is already open elsewhere
        If Not IsOpen(Worksheets("START").Cells(i, "F").Value) Then
            'Load the CSV file
            LoadCSV Worksheets("START").Cells(i, "F").Value, n.Text
        End If

        'Find the column holding the combining field on this sheet
        With Worksheets(n.Text).Columns("A:DZ")
            Set result = .Find(What:=Worksheets("START").Cells(i, "G").Value, LookIn:=xlValues)
            If Not result Is Nothing Then
                combFields.Add result.Address, n.Text
            End If
        End With
        i = i + 1
    Next n

    'Column with the combining field in "peoples"
    combFieldPeople = combFields.Item("peoples")
    'Column with the combining field in "companies"
    combFieldCompany = combFields.Item("companies")

    'Find the company names field in "companies"
    With Worksheets("companies").Columns("A:DZ")
        Set result = .Find(What:=Worksheets("START").Cells(3, "I").Value, LookIn:=xlValues)
        If Not result Is Nothing Then
            companyNameField = result.Address
        End If
    End With

    'Find the column with the "CopyToExcel" checkbox for "peoples"
    With Worksheets("peoples").Columns("A:DZ")
        Set result = .Find(What:=Worksheets("START").Cells(2, "H").Value, LookIn:=xlValues)
        If Not result Is Nothing Then
            copyUserField = result.Address
        End If
    End With

    'Find the column with the "CopyToExcel" checkbox for "companies"
    With Worksheets("companies").Columns("A:DZ")
        Set result = .Find(What:=Worksheets("START").Cells(3, "H").Value, LookIn:=xlValues)
        If Not result Is Nothing Then
            copyField = result.Address
        End If
    End With

    'Remove unnecessary organizations
    currBook.Activate 'was "startBook", which is never assigned
    With Worksheets("companies")
        .Activate
        .AutoFilterMode = False
        fldNum = .Range(copyField).Column
        .UsedRange.AutoFilter Field:=fldNum, Criteria1:="Y"
        ActiveCell.CurrentRegion.Select 'copy the filtered region
        nRow = Selection.Rows.Count
        Selection.Copy
        Worksheets.Add.Name = "tmp1"
        ActiveSheet.Range("A1").Select
        ActiveSheet.Paste
        Worksheets("companies").Delete
        Worksheets("tmp1").Name = "companies"
    End With
    Worksheets("START").Activate
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
End Sub
Function LoadCSV(fName As String, shName As String)
    Dim iPath As String, fullFileName As String
    ActiveWorkbook.Worksheets(shName).Activate
    iPath = ThisWorkbook.Path
    fullFileName = iPath & "\" & fName
    With ActiveSheet.QueryTables.Add(Connection:= _
        "TEXT;" & fullFileName, Destination:=Range("$A$1"))
        '.CommandType = 0
        .Name = fullFileName
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .TextFilePromptOnRefresh = False
        .TextFilePlatform = 65001 '65001 = the UTF-8 code page
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = True
        .TextFileSemicolonDelimiter = False
        .TextFileCommaDelimiter = False
        .TextFileSpaceDelimiter = False
'.TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
' 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 _
' , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _
' 1, 1, 1, 1, 1)
'2 = xlTextFormat: import every column as text
.TextFileColumnDataTypes = Array(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, _
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
        .TextFileTrailingMinusNumbers = True
        .Refresh BackgroundQuery:=False
    End With
End Function
It works fine with Hebrew text and with Zoom/Priority, on MS Office 2010/2013/2016 (32- and 64-bit).
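For comparison, reading the same kind of UTF-8, tab-delimited data outside Excel takes only a few lines with Python's standard csv module; this sketch uses a made-up one-row sample in place of the real files:

```python
import csv
import io

# A small UTF-8, tab-delimited sample standing in for one of the CSV files;
# the header names follow the "Customer Number ..." example above.
data = "Customer Number\tCustomer description\tStatus\n1001\tAcme Ltd\tActive\n"

# DictReader uses the first row as field names, like .FieldNames = True
reader = csv.DictReader(io.StringIO(data), delimiter="\t")
rows = list(reader)
print(rows[0]["Customer Number"])
```

With a real file you would open it with `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` stand-in.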

Related

Django filter - is DateTimeField filled

To my model I added a simple DateTimeField:
expired = models.DateTimeField(default=None)
The value of the field can be either None or a datetime.
I'd like to filter for objects where expired is filled with any datetime, however I'm struggling to find the right filter.
I think I tried all the combinations of filter / exclude and expired__isnull=True / expired=None, but I never get back the exact number.
What's the right way to filter if the field has a DateTime in it, or not?
Django: 1.11.16
Thanks.
In my table there are 2122 rows:
Counter(Obj.objects.filter().values_list('expired'))
Counter({(datetime.datetime(2021, 9, 24, 1, 6, 50),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 51),): 1,
(datetime.datetime(2021, 9, 24, 1, 6, 32),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 3),): 1,
(datetime.datetime(2021, 9, 24, 1, 12, 44),): 1,
(datetime.datetime(2021, 12, 4, 1, 31, 25),): 1,
(datetime.datetime(2021, 12, 4, 1, 37, 49),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 55),): 1,
(None,): 2087,
(datetime.datetime(2021, 12, 4, 1, 37, 52),): 1,
(datetime.datetime(2021, 12, 4, 1, 2, 8),): 4,
(datetime.datetime(2021, 12, 4, 1, 5, 14),): 9,
(datetime.datetime(2021, 9, 28, 0, 43, 51),): 1,
(datetime.datetime(2021, 12, 4, 1, 0, 13),): 7,
(datetime.datetime(2021, 12, 4, 1, 9, 59),): 2,
(datetime.datetime(2021, 12, 3, 17, 25, 46),): 1,
(datetime.datetime(2021, 12, 4, 1, 9, 54),): 1,
(datetime.datetime(2021, 9, 24, 1, 14, 30),): 1})
Obj.objects.filter(expired__isnull=False).count() returns all the rows (2122).
Obj.objects.filter(expired=None).count() returns 2087 rows instead of the 35 expected.
Obj.objects.exclude(expired=None).count() returns 2122, i.e. all the rows.
The query is fine; the problem is in the model definition, which needs null=True (and blank=True if the field is set through forms). Try changing the field in the model:
expired = models.DateTimeField(
    auto_now=False,
    null=True,
    blank=True,
)
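Once null=True is in place (and a migration has been run), expired__isnull should split the table exactly along the None/filled boundary. As a plain-Python illustration of that split, using the figures from the question (2087 empty, 35 filled):

```python
from datetime import datetime

# Toy stand-in for the queryset: 2087 rows with expired=None, 35 with a datetime
rows = [None] * 2087 + [datetime(2021, 12, 4, 1, 2, 8)] * 35

filled = [r for r in rows if r is not None]  # what expired__isnull=False should return
empty = [r for r in rows if r is None]       # what expired__isnull=True / expired=None should return
print(len(rows), len(filled), len(empty))
```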

Regexp split key values into columns in Bigquery

My column data looks like
"dayparts": [{"day": "Saturday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Sunday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}, {"day": "Thursday", "hours": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}]
I would like to have the result like
You can try:
WITH sample AS (
  SELECT "1" AS id,
         "{\"dayparts\":[{\"day\":\"Saturday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Sunday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]},{\"day\":\"Thursday\",\"hours\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13]}]}" AS msg
)
SELECT id,
       JSON_VALUE(dparts, '$.day') AS day,
       JSON_QUERY(dparts, '$.hours') AS hours
FROM (
  SELECT id,
         JSON_EXTRACT_ARRAY(JSON_QUERY(msg, '$.dayparts')) AS dayparts
  FROM sample) t, UNNEST(t.dayparts) dparts
(I added the enclosing "{" and "}" to be able to perform JSON operations; if they are not in your data, just concatenate them.)
(You can also wrap JSON_QUERY(dparts, '$.hours') in JSON_EXTRACT_ARRAY if you want an actual array in the result table.)
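Outside BigQuery, the same flattening is easy to sanity-check in Python with the json module; this sketch uses a shortened version of the sample payload and produces one row per dayparts element, mirroring the UNNEST:

```python
import json

# Shortened version of the sample column value, with the enclosing braces added
msg = '{"dayparts": [{"day": "Saturday", "hours": [0, 1, 2]}, {"day": "Sunday", "hours": [3, 4]}]}'

# One (day, hours) row per array element, like UNNEST(dayparts) in the query
rows = [(part["day"], part["hours"]) for part in json.loads(msg)["dayparts"]]
print(rows)
```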

How to select rows by a column value in D with mir.ndslice?

I am browsing through the mir.ndslice docs trying to figure out how to do a simple row selection by column value.
In numpy I would do:
a = np.random.randint(0, 20, [4, 6])
# array([[ 8, 5, 4, 18, 1, 4],
# [ 2, 18, 15, 7, 18, 19],
# [16, 5, 4, 6, 11, 11],
# [15, 1, 14, 6, 1, 4]])
a[a[:,2] > 10] # select rows where the value in column index 2 is > 10
# array([[ 2, 18, 15, 7, 18, 19],
# [15, 1, 14, 6, 1, 4]])
Using mir library I naively tried:
import std.range;
import std.random;
import mir.ndslice;
auto a = generate!(() => uniform(0, 20)).take(24).array.sliced(4,6);
// [[12, 19, 3, 10, 19, 11],
// [19, 0, 0, 13, 9, 1],
// [ 0, 0, 4, 13, 1, 2],
// [ 6, 19, 14, 18, 14, 18]]
a[a[0..$,2] > 10];
But got
Error: incompatible types for `((ulong __dollar = a.length();) , a.opIndex(a.opSlice(0LU, __dollar), 2)) > (10)`: `Slice!(int*, 1LU, cast(mir_slice_kind)0)` and `int`
dmd failed with exit code 1.
So, I went through the docs and couldn't find anything that would look like np.where or similar. Is it even possible in mir?
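For reference, the selection semantics being asked for, written out in plain Python over the data from the numpy example above (the boolean mask `a[:,2] > 10` is the vectorised form of the same row test):

```python
a = [
    [8, 5, 4, 18, 1, 4],
    [2, 18, 15, 7, 18, 19],
    [16, 5, 4, 6, 11, 11],
    [15, 1, 14, 6, 1, 4],
]

# Keep the rows whose column at index 2 is greater than 10
selected = [row for row in a if row[2] > 10]
print(selected)
```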

how to store many large multidimensional arrays?

I have a deep learning model which generates an output multidimensional array of size 2x2x4096, and there are 40,000 such outputs for each input image.
How can I store all of these in Python?
The HDF5 format seems like an interesting direction.
Can someone point me in the right direction?
I would recommend using HDF5 with PyTables. Putting an array into a file is as easy as this:
import numpy as np
import tables

a = np.arange(100)
h5_file = tables.open_file('my_array.h5', mode='w', title='many large arrays')
h5_file.create_array('/', 'my_array', a)
h5_file.close()
An example with 10 multidimensional arrays:
import numpy as np
import tables

my_arrays = [np.ones((2, 2, 4098)) for x in range(10)]
h5_file = tables.open_file('my_array.h5', mode='w', title='many large arrays')
for n, arr in enumerate(my_arrays):
    h5_file.create_array('/', 'my_array{}'.format(n), arr)
h5_file.close()
Having a look at the file structure with h5ls:
h5ls my_array.h5
my_array0 Dataset {2, 2, 4098}
my_array1 Dataset {2, 2, 4098}
my_array2 Dataset {2, 2, 4098}
my_array3 Dataset {2, 2, 4098}
my_array4 Dataset {2, 2, 4098}
my_array5 Dataset {2, 2, 4098}
my_array6 Dataset {2, 2, 4098}
my_array7 Dataset {2, 2, 4098}
my_array8 Dataset {2, 2, 4098}
my_array9 Dataset {2, 2, 4098}
Reading the data back is easy.
Reading all:
import tables

h5_file = tables.open_file('my_array.h5', mode='r')
for node in h5_file:
    print(node)
Output:
/ (RootGroup) ''
/my_array0 (Array(2, 2, 4098)) ''
/my_array1 (Array(2, 2, 4098)) ''
/my_array2 (Array(2, 2, 4098)) ''
/my_array3 (Array(2, 2, 4098)) ''
/my_array4 (Array(2, 2, 4098)) ''
/my_array5 (Array(2, 2, 4098)) ''
/my_array6 (Array(2, 2, 4098)) ''
/my_array7 (Array(2, 2, 4098)) ''
/my_array8 (Array(2, 2, 4098)) ''
/my_array9 (Array(2, 2, 4098)) ''
or just one by name:
print(h5_file.root.my_array0)
Output:
/my_array0 (Array(2, 2, 4098)) ''
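Before writing 40,000 separate array nodes, it is worth estimating the volume involved. Assuming 4-byte float32 values and the 2x2x4096 shape from the question, each input image's outputs already come to roughly 2.4 GiB, which argues for an appendable, chunked dataset with compression (e.g. a PyTables EArray) rather than one node per output:

```python
# 40,000 outputs of shape 2x2x4096, assuming 4 bytes per float32 element
n_outputs = 40_000
elements_per_output = 2 * 2 * 4096
total_bytes = n_outputs * elements_per_output * 4
print(total_bytes / 1024**3)  # GiB per input image
```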

Pandas, Update df1 rows from df2

I have df1:
id, colA, colB, colC, name
1, 1, 2, 3, a
2, 2, 3, 4, a
3, 3, 4, 5, b
4, 4, 5, 6, b
and df2:
id, colA, colB, colD, name
2, 10, 20, D1, a
3, 20, 30, D2, a
Is there a way, perhaps using merge or join, to replace the rows in df1 with those of df2, matching on id and name?
So the result would look like:
id, colA, colB, colC, name, colD
1, 1, 2, 3, a, N/A
2, 10, 20, N/A, a, D1
3, 3, 4, 5, b, N/A
4, 4, 5, 6, b, N/A
I was thinking of something like df1.loc[df1.Locident.isin(df2.Locident)] = df2, but that only matches on one column.
You could:
df = pd.concat([df1, df2]).drop_duplicates(subset=['id', 'name'], keep='last').drop_duplicates(subset='id')
to combine both DataFrames, preferring the df2 version of a row where both id and name match, and then dropping the leftover df2 rows whose id already exists in df1 under a different name.
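A runnable sketch of that approach with the sample frames from the question; the second drop_duplicates is what removes the df2 row (id 3, name a) whose id already exists in df1 under a different name:

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "colA": [1, 2, 3, 4],
    "colB": [2, 3, 4, 5],
    "colC": [3, 4, 5, 6],
    "name": ["a", "a", "b", "b"],
})
df2 = pd.DataFrame({
    "id": [2, 3],
    "colA": [10, 20],
    "colB": [20, 30],
    "colD": ["D1", "D2"],
    "name": ["a", "a"],
})

# keep='last' prefers the df2 version when both id and name match;
# the second pass keeps the first remaining row per id, dropping the
# df2 row whose id matched but whose name did not
df = (
    pd.concat([df1, df2])
    .drop_duplicates(subset=["id", "name"], keep="last")
    .drop_duplicates(subset="id")
    .sort_values("id")
    .reset_index(drop=True)
)
print(df)
```

Note that colA/colB/colC become float columns after the concat, because the unmatched cells are filled with NaN.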