Connectivity of Neo4j with Hive Table - mapreduce

Hi, I have a Hive table like the one below:
EmpName | ID | dept | sal
A1      | 01 | IT   | 100
B1      | 02 | IT   | 200
C1      | 03 | CS   | 500
I want to create Neo4j nodes directly from the Hive table. Is it possible?
I'd appreciate any ideas in advance.

Check out our CSV import page:
http://neo4j.com/developer/guide-import-csv/
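As a minimal sketch (not from the linked page): assuming you first export the Hive table to a CSV file with a header row, say a hypothetical employees.csv placed in Neo4j's import directory, you could create one node per row with LOAD CSV:
// assumes employees.csv has the header row: EmpName,ID,dept,sal
LOAD CSV WITH HEADERS FROM 'file:///employees.csv' AS row
CREATE (:Employee {
  name: row.EmpName,
  id:   row.ID,
  dept: row.dept,
  sal:  toInteger(row.sal)   // toInt() in older Neo4j versions
});
To produce the CSV, one common approach is exporting from Hive with INSERT OVERWRITE LOCAL DIRECTORY or redirecting a hive -e query to a file, then converting it to comma-separated values as needed.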

Related

Calculate YTD, MTD, WTD, Transaction Count in Power BI

I have 1 table with multiple rows. It looks something like this:
------------------------------------------------
StoreId| PostingDate | SalesAmt
MAIN | 2021-02-04 | 100
WEST | 2021-08-11 | 15
WEST | 2021-09-11 | 36
MAIN | 2021-11-11 | 78
MAIN | 2021-04-11 | 56
------------------------------------------------
And so on and so forth...
Now I want to produce the following in the Power BI as Table:
--------------------------------------------
StoreId| YTD | MTD | WTD | TransactionCount |
WEST |5,447| 800 | 74 | 1,475 |
MAIN |4,500| 421 | 15 | 1,855 |
--------------------------------------------
How can I achieve that? I am very new to this, so I don't know how to do it.
I have been reading about DAX and Power Query, but maybe DAX is more suitable for this?
I assume your data looks like the table below; I've added data for the year 2022.
I'm also assuming there are no future dates, i.e. the observations are transactions that happened in the past.
Table
StoreID | PostingDate | SalesAmt
WEST    | 16/01/2021  | 141
MAIN    | 24/01/2021  | 221
WEST    | 25/01/2021  | 119
MAIN    | 18/04/2021  | 209
MAIN    | 22/04/2021  | 220
MAIN    | 24/04/2021  | 167
WEST    | 16/11/2021  | 224
WEST    | 03/02/2022  | 155
MAIN    | 07/02/2022  | 236
WEST    | 11/02/2022  | 216
WEST    | 23/03/2022  | 135
MAIN    | 28/05/2022  | 153
WEST    | 01/06/2022  | 121
Calendar Table
For the calculations below to work, you need to create a calendar table.
It goes from the first date of Table until today.
If your calendar table is different, the time intelligence functions will not work.
Calendar = CALENDAR(MIN('Table'[PostingDate]),TODAY())
And mark the Calendar table as a Date Table.
Sales Amount
Sales Amount = SUM('Table'[SalesAmt])
WTD
Assumes your week starts on Monday.
WTD =
VAR WeekStart = TODAY() - WEEKDAY(TODAY(), 2) + 1
RETURN
    CALCULATE([Sales Amount], 'Table'[PostingDate] >= WeekStart)
MTD
MTD =
TOTALMTD([Sales Amount],'Calendar'[Date])
YTD
YTD =
TOTALYTD([Sales Amount],'Calendar'[Date])
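Transaction Count
The question also asks for a transaction count. A minimal sketch, assuming every row of 'Table' is one transaction:
Transaction Count =
COUNTROWS('Table')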

Convert Time Stamp while Creating Table in Amazon Athena

I have been using the below query to create a table within Athena,
CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
`converteddate` string,
`userid` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3:XXXX'
TBLPROPERTIES ('has_encrypted_data'='false',"skip.header.line.count"="1")
This returns me:
converteddate | userid
-------------------------------------
2017-11-29T05:00:00 | 00001
2017-11-27T04:00:00 | 00002
2017-11-26T03:00:00 | 00003
2017-11-25T02:00:00 | 00004
2017-11-24T01:00:00 | 00005
I would like to return:
converteddate | userid
-------------------------------------
2017-11-29 05:00:00 | 00001
2017-11-27 04:00:00 | 00002
2017-11-26 03:00:00 | 00003
2017-11-25 02:00:00 | 00004
2017-11-24 01:00:00 | 00005
and have converteddate as a datetime and not a string.
It is not possible to convert the data during table creation, but you can convert it when querying.
You can use the date_parse(string, format) -> timestamp function. More details are mentioned here.
For your use case you can do something like the following:
select date_parse(converteddate, '%Y-%m-%dT%H:%i:%s') as converted_timestamp, userid
from test_table
Note: based on the exact format of your string, you have to choose the proper specifier for year (two or four digits), month (always two digits or not), day, hour (12- or 24-hour format), etc.
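If you want the typed column available without repeating the expression in every query, one option (a sketch; the view name test_table_typed is hypothetical) is to wrap the conversion in a view:
CREATE OR REPLACE VIEW test_table_typed AS
SELECT
  date_parse(converteddate, '%Y-%m-%dT%H:%i:%s') AS converteddate,
  userid
FROM test.test_table;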
(My answer has one premise: you are using OpenCSVSerDe. It doesn't apply to LazySimpleSerDe, for instance.)
If you have the option of changing the format of your input CSV file, you should convert your timestamp to UNIX Epoch Time. That's the format that OpenCSVSerDe is expecting.
For instance, your sample CSV looks like this:
"converteddate","userid"
"2017-11-29T05:00:00","00001"
"2017-11-27T04:00:00","00002"
"2017-11-26T03:00:00","00003"
"2017-11-25T02:00:00","00004"
"2017-11-24T01:00:00","00005"
It should be:
"converteddate","userid"
"1511931600000","00001"
"1511755200000","00002"
"1511665200000","00003"
"1511575200000","00004"
"1511485200000","00005"
Those integers are the number of milliseconds since midnight UTC on January 1, 1970 for each of your original dates.
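For example, the first value can be computed in Athena itself (a sketch; it assumes the timestamps are in UTC and the session time zone is UTC):
SELECT CAST(to_unixtime(timestamp '2017-11-29 05:00:00') AS bigint) * 1000;
-- 1511931600000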
Then you can run a slightly modified version of your CREATE TABLE statement:
CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
converteddate timestamp,
userid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3:XXXX'
TBLPROPERTIES ("skip.header.line.count"="1");
If you query your Athena table with select * from test_table, this will be the result:
converteddate userid
------------------------- --------
2017-11-29 05:00:00.000 00001
2017-11-27 04:00:00.000 00002
2017-11-26 03:00:00.000 00003
2017-11-25 02:00:00.000 00004
2017-11-24 01:00:00.000 00005
As you can see, type TIMESTAMP on Athena includes milliseconds.
I wrote a more comprehensive explanation on using types TIMESTAMP and DATE with OpenCSVSerDe. You can read it here.

Adding a foreign key to an existing kdb table

I need to add a foreign key to a table that I have imported from a CSV file:
table:("SSSSSSSSSFFFFSSSSSFSSSSSSSSSSSSSSS"; enlist ",") 0: `:table.csv
I do not want to have to redefine the whole table. Is there a way to do this?
q)p:([p:`p1`p2`p3`p4`p5`p6]name:`nut`bolt`screw`screw`cam`cog;color:`red`green`blue`red`blue`red;weight:12 17 17 14 12 19;city:`london`paris`rome`london`paris`london)
q)sp:([]s:`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;p:`p$`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;qty:300 200 400 200 100 100 300 400 200 200 300 400)
q)
q)update `p$p from `sp
`sp
q)meta sp
c | t f a
---| -----
s | s
p | s p
qty| j
Defining a foreign key is similar to enumerating/casting, and therefore an overload of $ is used.
Passing the table by name (`sp) means that the table is updated in place.
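Applied to the table loaded from the CSV in the question, the same pattern would look like this (hypothetical names: `dept is an existing keyed table and deptId is a symbol column of table):
/ hypothetical names; the cast fails if any deptId value is missing from the key of dept
update `dept$deptId from `table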

Informatica workflow polling for a file under an FTP location

I want to develop a workflow which runs continuously, looking for a file.
The source file data is like this:
eno
10
20
30
40
Once the file is received in the FTP location, the workflow should automatically pick up the file and load it into the target table.
The output of the target table <EMP_TGT> will be like below:
eno | Received
--- | -------
10 | Y
20 | Y
30 | Y
40 | Y
50 |
60 |
70 |
80 |
The condition to load the target table would be: UPDATE EMP_TGT SET Received='Y' WHERE eno='<flat_file_Eno>'
You could use the EventWait task, but:
You need an exact file name to wait for; a pattern cannot be used.
There is no problem if you can access the path directly, but I'm not sure whether you can monitor an FTP location with EventWait.
The other way to implement it would be to have the workflow scheduled to run e.g. every 10 minutes and try to fetch the file from FTP.

Is there any DAX expression for calculating the SUM of a few rows based on an index?

Suppose I have the following data:
MachineNumber | Duration
01 | 234
01 | 200
01 | 150
02 | 320
02 | 120
02 | 100
I want to know a DAX query which can add 234 + 200 + 150, since those rows belong to machine 01, and give me the sum.
If you want to see the MachineNumber displayed as a row label in a table like this, you have to avoid the automatic sum on your MachineNumber column.
You can also do a transformation in the Power Query editor by specifying that your MachineNumber is a string instead of a number.
To find the total duration machine-wise, I chose the SUM option for the Duration field, and the machine-wise total sum was displayed.
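As a sketch in DAX (assuming the table is named 'Table'), a simple measure returns the per-machine total when MachineNumber is used as a row label in a visual:
Total Duration = SUM('Table'[Duration])
If you instead want the total repeated on every row as a calculated column, one option is:
Machine Total =
CALCULATE(
    SUM('Table'[Duration]),
    ALLEXCEPT('Table', 'Table'[MachineNumber])
)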