Choose random column value between rows with same ID


I'm a beginner with SQL and I have to write a saved query in Informatica Cloud for a connection to a SQL Server database.
I have a table where rows with the same formId have identical values in every column except "possibleSalesman", which is a text column:
formId, email, possibleSalesman
1, email1, user1
1, email1, user2
1, email1, user4
2, email2, user2
3, email3, user3
3, email3, user1
What I need is one row for each formId, with "possibleSalesman" picked at random from the rows that share that id.
For example:
1, email1, user4
2, email2, user2
3, email3, user3
I found a similar question, but its solution doesn't help me because Informatica imposes a few restrictions:
Only SELECT statements
Can't use asterisk to select all columns
Can't use conversion functions
Can't use COUNT function
If someone could help me I would be very grateful!

SELECT
    FormId
    ,Email
    ,possibleSalesMan
FROM
    (
    SELECT
        FormId
        ,Email
        ,possibleSalesMan
        ,ROW_NUMBER() OVER (PARTITION BY FormId ORDER BY NEWID()) AS RowNumber
    FROM
        TableName) t
WHERE
    t.RowNumber = 1
In SQL Server 2008+ you can use the ROW_NUMBER() window function and NEWID() to achieve a random order and then select the result where ROW_NUMBER() = 1.
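The same "number the rows of each group in random order, keep row 1" idea can be tried locally. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions), a hypothetical table name `forms`, and SQLite's RANDOM() standing in for SQL Server's NEWID().

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forms (formId INTEGER, email TEXT, possibleSalesman TEXT)"
)
conn.executemany(
    "INSERT INTO forms VALUES (?, ?, ?)",
    [
        (1, "email1", "user1"),
        (1, "email1", "user2"),
        (1, "email1", "user4"),
        (2, "email2", "user2"),
        (3, "email3", "user3"),
        (3, "email3", "user1"),
    ],
)

# Number the rows of each formId in random order, then keep only row 1.
# RANDOM() plays the role that NEWID() plays in the SQL Server answer.
QUERY = """
SELECT formId, email, possibleSalesman
FROM (
    SELECT formId, email, possibleSalesman,
           ROW_NUMBER() OVER (PARTITION BY formId ORDER BY RANDOM()) AS rn
    FROM forms
) AS t
WHERE rn = 1
ORDER BY formId
"""

picked = conn.execute(QUERY).fetchall()
for row in picked:
    print(row)
```

Each run returns exactly one row per formId, but which salesman is returned for ids 1 and 3 can differ between runs.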

Related

Loop through a ColdFusion query and sorting results

I have two tables:
Users (3 columns): ID, DisplayName, Active
Ticket_Followups (4 columns): id, requested_by, requested_date, ticket_id
I am trying to group all the similar records in the ticket_followup table, first by recordcount and then by displayName.
Here is what I have so far:
<cfquery name="active_users" datasource="#datasource#">
select * from users
where active='1'
</cfquery>
<cfloop query="active_users">
<cfquery name="get_followups" datasource="#datasource#">
select date_of_followup_request, requested_by, ticket_id
from ticket_followup
where requested_by = '#active_users.displayName#'
</cfquery>
<cfoutput>
<tr>
<td>#active_users.displayName#</td>
<td>#get_followups.recordcount#</td>
</tr>
</cfoutput>
</cfloop>
I am able to successfully show the output for the total records by user, but there is no order to the output. I would like to group it so that it shows the DisplayName with the highest recordcount first, descending in order.
How can I do that?

This is a SQL issue; CF just displays the data after it has been gathered.
You need to do this in one query.
You need to associate the ticket follow ups by user ID, not by name (Name could change, but not the ID).
There's a table of tickets I assume, but we'll stick to your two tables.
First, the tables:
Users
----------
id
DisplayName
Active
Ticket_Followups
----------
id
requested_by_id (Users.id)
requested_date
ticket_id
You can technically join by name, but it's a much slower query and I've no idea how much data you have.
This query joins the two tables and gives you a count of ticket follow-ups per user. Depending on your needs, you can add an ORDER BY clause after the GROUP BY (for example, ORDER BY requested_count DESC to list the highest count first).
SELECT
    a.DisplayName
    , count(*) AS requested_count
FROM
    Users AS a
INNER JOIN
    Ticket_Followups b ON b.requested_by_id = a.id
WHERE
    a.active = 1
GROUP BY
    a.id
    , a.DisplayName
If you don't do this in one query, then you're making one query for the user list plus another query for every active user.
10 users, 11 queries
20 users, 21 queries
etc.
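As a rough check of the single-query approach, the sketch below runs the join with a count and a descending sort against an in-memory SQLite database from Python. The user names, dates, and ticket ids are made-up sample data, and the schema follows the `requested_by_id` layout suggested above rather than the asker's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (id INTEGER PRIMARY KEY, DisplayName TEXT, Active INTEGER);
    CREATE TABLE Ticket_Followups (
        id INTEGER PRIMARY KEY,
        requested_by_id INTEGER,
        requested_date TEXT,
        ticket_id INTEGER
    );
    -- Made-up sample data: Cy is inactive and must not appear in the result.
    INSERT INTO Users VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 0);
    INSERT INTO Ticket_Followups (requested_by_id, requested_date, ticket_id) VALUES
        (1, '2022-01-01', 10),
        (2, '2022-01-02', 11),
        (2, '2022-01-03', 12),
        (2, '2022-01-04', 13),
        (3, '2022-01-05', 14);
    """
)

# One query: join, count follow-ups per active user, highest count first.
rows = conn.execute(
    """
    SELECT u.DisplayName, COUNT(*) AS requested_count
    FROM Users AS u
    INNER JOIN Ticket_Followups AS f ON f.requested_by_id = u.id
    WHERE u.Active = 1
    GROUP BY u.id, u.DisplayName
    ORDER BY requested_count DESC
    """
).fetchall()

print(rows)
```

Because the sorting happens in SQL, the display loop only has to iterate over the rows in the order they arrive.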
Updated 2022-02-15
Query using DisplayName with an ORDER BY clause. This should make it clearer that you're counting the tickets per user and not the number of users.
SELECT
    b.DisplayName
    , count(*) AS ticket_count
FROM
    Ticket_Followups AS a
INNER JOIN
    Users AS b ON b.DisplayName = a.requested_by
WHERE
    b.active = 1
GROUP BY
    b.DisplayName
ORDER BY
    ticket_count DESC
Output:
<cfoutput query="queryName">
<li>#queryName.DisplayName# - #queryName.ticket_count#</li>
</cfoutput>

How to get the date diff between consecutive rows: the first row's serviceperiodto minus the second row's serviceperiodfrom

My data looks like this:
userid | completedat | serviceperiodfrom | serviceperiodto
00002cd9-94eb-4c06-a2c4-75253fd541b9 | 2020-11-25T14:20:04.293Z | 2020-11-25T14:20:04.200Z | 2021-02-25T14:20:04.200Z
00002cd9-94eb-4c06-a2c4-75253fd541b9 | 2021-03-21T10:27:34.842Z | 2021-03-21T10:27:34.800Z | 2022-03-21T10:27:34.800Z
00002cd9-94eb-4c06-a2c4-75253fd541b9 | 2020-07-24T11:22:12.410Z | 2020-07-24T11:22:12.300Z | 2020-10-24T11:22:12.300Z
I need the date diff between the serviceperiodto date of the first row and the serviceperiodfrom date of the second row, repeated for as many rows as each userid has.
I tried joining the table to itself using subqueries and tried to create a pivot table, but neither approach worked for me. Please help.

You can use lag/lead to access previous/next item:
WITH dataset
AS (SELECT *
FROM
(
VALUES
(1, from_iso8601_timestamp('2020-11-25T14:20:04.200Z'), from_iso8601_timestamp('2021-02-25T14:20:04.200Z')),
(1, from_iso8601_timestamp('2021-03-21T10:27:34.800Z'), from_iso8601_timestamp('2022-03-21T10:27:34.800Z')),
(1, from_iso8601_timestamp('2020-07-24T11:22:12.300Z'), from_iso8601_timestamp('2020-10-24T11:22:12.300Z'))
) AS t (userid, serviceperiodfrom, serviceperiodto)
)
SELECT date_diff(
'hour',
serviceperiodto,
lead(serviceperiodfrom, 1) OVER (PARTITION BY userid ORDER BY serviceperiodfrom))
FROM dataset
Output:
_col0
770
572
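To see where those two numbers come from, here is a small Python sketch of the same pairing logic: sort each user's periods by start, then subtract each period's end from the next period's start. It is only an illustration with the sample rows hard-coded; the helper names `parse` and `gaps_in_hours` are hypothetical, and unlike lead() it simply omits the last row of each user instead of returning a NULL for it.

```python
from datetime import datetime
from itertools import groupby


def parse(ts):
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# (userid, serviceperiodfrom, serviceperiodto), as in the answer's VALUES list.
rows = [
    (1, "2020-11-25T14:20:04.200Z", "2021-02-25T14:20:04.200Z"),
    (1, "2021-03-21T10:27:34.800Z", "2022-03-21T10:27:34.800Z"),
    (1, "2020-07-24T11:22:12.300Z", "2020-10-24T11:22:12.300Z"),
]


def gaps_in_hours(rows):
    """Whole hours between each period's end and the next period's start."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[1])))
    for userid, group in groupby(ordered, key=lambda r: r[0]):
        periods = list(group)
        # Pair every period with the one that follows it for the same user.
        for current, following in zip(periods, periods[1:]):
            delta = parse(following[1]) - parse(current[2])
            result.append((userid, int(delta.total_seconds() // 3600)))
    return result


print(gaps_in_hours(rows))  # [(1, 770), (1, 572)]
```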

Merge Query Matching on Dates in Multiple Rows

I'm trying to merge 2 queries in Power BI Desktop, matching rows based off a user and date column in one query to a row in the other query, where the user matches and the date in the 2nd query is the closest one before the date in the 1st query.
In other scenarios where I need to match on more than one column, I usually create a composite key to match on, but here it's not a direct match.
Examples of the 2 queries are:
QUERY1
User Activity Activity Date
User 1 Activity 1 2019-01-24
User 1 Activity 2 2019-03-03
User 1 Activity 3 2019-04-17
QUERY2
User Status Status Change Date
User 1 Status 1 2019-02-05
User 1 Status 2 2019-03-06
User 1 Status 3 2019-04-05
And the merged query I'm looking for is:
MERGED QUERY
User Activity Activity Date Status
User 1 Activity 1 2019-01-24
User 1 Activity 2 2019-03-03 Status 1
User 1 Activity 3 2019-04-17 Status 3
Both queries are sourced from a REST API. If it was a SQL source, I'd use a SQL query to create a derived island table of start and stop dates based on Query2 and do a BETWEEN join against Query1 and have that be the source for Power BI.
Within the Power Query Editor, how would I get to the merged query result?

First, you want to do as you suggested and modify the status table to have start and stop dates instead of Status Change Date. You can do this by sorting, indexing, and self-merging as I've previously explained here and here.
Once you have that, you can load a copy of the status table in each row and use the User and Date columns to filter the table and finally return a single value for Status.
let
    Source = <Query1 Source>,
    #"Added Custom" =
        Table.AddColumn(Source, "Status",
            (C) => List.First(
                Table.SelectRows(Status,
                    each [User] = C[User] and
                        [Start] < C[Date] and
                        ([Stop] = null or C[Date] <= [Stop])
                )[Status]
            ),
        type text)
in
    #"Added Custom"
This says we take the Status table and filter it so that based on the current row the User matches and the Date is between Start and Stop. From that filtered table, we select the Status column, which is a list data type, so we pick the first element of the list to get the text value of the only member of the list.
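The matching rule itself (for each activity, take the most recent status change strictly before the activity date) can be sketched outside Power Query. The Python below is an illustration only: the data is the question's sample, and `status_as_of` is a hypothetical helper name, not a Power Query function.

```python
from bisect import bisect_left
from datetime import date

# Sample data from the question.
status_changes = {
    "User 1": [
        (date(2019, 2, 5), "Status 1"),
        (date(2019, 3, 6), "Status 2"),
        (date(2019, 4, 5), "Status 3"),
    ],
}
activities = [
    ("User 1", "Activity 1", date(2019, 1, 24)),
    ("User 1", "Activity 2", date(2019, 3, 3)),
    ("User 1", "Activity 3", date(2019, 4, 17)),
]


def status_as_of(user, when):
    """Return the latest status whose change date is strictly before `when`."""
    changes = sorted(status_changes.get(user, []))
    change_dates = [changed_on for changed_on, _ in changes]
    # bisect_left counts how many change dates are strictly earlier than `when`.
    idx = bisect_left(change_dates, when)
    return changes[idx - 1][1] if idx else None


merged = [
    (user, activity, when, status_as_of(user, when))
    for user, activity, when in activities
]
for row in merged:
    print(row)
```

Activity 1 gets no status because it predates the first status change, which matches the blank cell in the expected merged query.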

How to add current session time to each event in BigQuery?

I've got some data that looks similar to this:
I want to add a column that contains the start time of the session that each event occurred in so that the output looks something like this:
The session_start_time column is based on the session_start event.
I've tried using partitions in analytic functions but to do so I need values that are the same in each row to start with and if I had that I would have solved my problem.
I've also tried FIRST_VALUE with a window function but I haven't managed to pull only the events where the event_name is "session_start" because I can't see a way to filter inside window functions.
How can I achieve this using Standard SQL on BigQuery?
Below is a sample query that includes the sample data:
WITH user_events AS (
SELECT
1 AS user_id,
'session_start' AS event_name,
0 AS event_time
UNION ALL SELECT 1, 'video_play', 2
UNION ALL SELECT 1, 'ecommerce_purchase', 3
UNION ALL SELECT 1, 'session_start', 100
UNION ALL SELECT 1, 'video_play', 105
)
SELECT
user_id,
event_name,
event_time
FROM
user_events
ORDER BY
event_time

#standardSQL
WITH user_events AS (
SELECT 1 AS user_id, 'session_start' AS event_name, 0 AS event_time UNION ALL
SELECT 1, 'video_play', 2 UNION ALL
SELECT 1, 'ecommerce_purchase', 3 UNION ALL
SELECT 1, 'session_start', 100 UNION ALL
SELECT 1, 'video_play', 105
)
SELECT
user_id,
event_name,
event_time,
MIN(event_time) OVER(PARTITION BY user_id, session) AS session_start_time
FROM (
SELECT
user_id,
event_name,
event_time,
COUNTIF(event_name='session_start') OVER(PARTITION BY user_id ORDER BY event_time) AS session
FROM user_events
)
ORDER BY event_time
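The query works by turning a running count of session_start events into a session number and then taking the minimum event_time within each session. A rough Python equivalent of that idea, using the question's sample rows, is sketched below. The function name `add_session_start` is hypothetical, and one behavior differs: events that occur before a user's first session_start get None here, whereas the SQL would give them the earliest event_time of that leading group.

```python
# (user_id, event_name, event_time), as in the question's sample data.
events = [
    (1, "session_start", 0),
    (1, "video_play", 2),
    (1, "ecommerce_purchase", 3),
    (1, "session_start", 100),
    (1, "video_play", 105),
]


def add_session_start(events):
    """Append the start time of the enclosing session to every event."""
    current_start = {}  # user_id -> event_time of the latest session_start
    out = []
    # Walk each user's events in time order, remembering the last session_start.
    for user_id, name, event_time in sorted(events, key=lambda e: (e[0], e[2])):
        if name == "session_start":
            current_start[user_id] = event_time
        out.append((user_id, name, event_time, current_start.get(user_id)))
    return out


for row in add_session_start(events):
    print(row)
```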

How to order a query by values in different columns

I have a recordset named rsProductClass that is returned from a table in the database. It is a very simple SELECT * FROM Table WHERE ProductID = {ID Value Here} and the table is like this:
ProductID | UPPERTIER | LOWERTIER | NATIER | OTHERTIER
1 20 60 10 10
2 10 90 NULL NULL
3 NULL 40 NULL 5
The table may or may not have a value for each of the various tiers.
What I want to do is show to the user which column has the highest value and what the name of that column is. So for example, if you were looking at ProductID 2, then the page should display "This is likely to be a LOWERTIER product"
I need to sort the rsProductClass query in such a way that it returns me a list of columns in that query ordered by the value in each column. I want to treat the NULL values as zeros.
I tried to mess about with doing valuelist() and some ArrayToList() type functions but it crashes on the NULL values. Say I add columns to an array, and then use ArraySort() to get them in some kind of order, I'll get an error saying something like "Position 1 is not numeric" because it has a NULL value.
Is this something that can be done by ColdFusion? I suppose its some sort of pivoting of the recordset which is beyond my ability.

Something like this would work:
<cfquery name="tiers" datasource="...">
    SELECT ProductID, UPPERTIER VALUE, 'UPPERTIER' TIER
    FROM products
    WHERE UPPERTIER IS NOT NULL
    UNION
    SELECT ProductID, LOWERTIER VALUE, 'LOWERTIER' TIER
    FROM products
    WHERE LOWERTIER IS NOT NULL
    UNION
    SELECT ProductID, OTHERTIER VALUE, 'OTHERTIER' TIER
    FROM products
    WHERE OTHERTIER IS NOT NULL
    UNION
    SELECT ProductID, NATIER VALUE, 'NATIER' TIER
    FROM products
    WHERE NATIER IS NOT NULL
    ORDER BY ProductID, VALUE DESC
</cfquery>
<cfset productGroup = StructNew()>
<cfoutput query="tiers" group="ProductID">
<cfset productGroup[ProductID].TIER = TIER>
<cfset productGroup[ProductID].VALUE = VALUE>
</cfoutput>
<cfdump var="#productGroup#">
Starting with ColdFusion 10 you can use <cfloop query="..." group="...">; before that, <cfoutput> must be used.

If you're willing to unpivot your query, you might do something like the following. I used COALESCE() instead of ISNULL() (either one works in this situation, but COALESCE() is the ANSI standard). The column tier_rank will give the rank of the given tier -- that is, the tier with the highest value will have a rank of 1. If there are two tiers that both have the highest value, then both will have a value in tier_rank of 1 (this is why you would use RANK() instead of ROW_NUMBER() -- you could also use DENSE_RANK() if it better fits your requirements):
SELECT p1.product_id, p1.tier_name, p1.tier_value
, RANK() OVER ( PARTITION BY p1.product_id ORDER BY p1.tier_value DESC ) tier_rank
FROM (
SELECT product_id, 'UPPERTIER' AS tier_name
, COALESCE(uppertier, 0) AS tier_value
FROM products
UNION ALL
SELECT product_id, 'LOWERTIER' AS tier_name
, COALESCE(lowertier, 0) AS tier_value
FROM products
UNION ALL
SELECT product_id, 'NATIER' AS tier_name
, COALESCE(natier, 0) AS tier_value
FROM products
UNION ALL
SELECT product_id, 'OTHERTIER' AS tier_name
, COALESCE(othertier, 0) AS tier_value
FROM products
) p1
Please see SQL Fiddle demo here.
It might be possible to re-pivot the above unpivoted query, but I must admit my attempts at doing so failed.
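The underlying logic (treat NULL as zero, then order the tier columns of one row by value) is small enough to sketch in plain Python. This is only an illustration of the ranking idea, not ColdFusion or SQL; the rows are the question's sample table with NULL written as None, and `ranked_tiers` is a hypothetical helper name.

```python
# The question's sample table, with NULL represented as None.
rows = [
    {"ProductID": 1, "UPPERTIER": 20, "LOWERTIER": 60, "NATIER": 10, "OTHERTIER": 10},
    {"ProductID": 2, "UPPERTIER": 10, "LOWERTIER": 90, "NATIER": None, "OTHERTIER": None},
    {"ProductID": 3, "UPPERTIER": None, "LOWERTIER": 40, "NATIER": None, "OTHERTIER": 5},
]

TIERS = ("UPPERTIER", "LOWERTIER", "NATIER", "OTHERTIER")


def ranked_tiers(row):
    """Return (tier_name, value) pairs, highest value first, NULL treated as 0."""
    pairs = [(name, row[name] or 0) for name in TIERS]
    # sorted() is stable, so tiers with equal values keep their TIERS order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for row in rows:
    top_name, _ = ranked_tiers(row)[0]
    print(f"Product {row['ProductID']}: this is likely to be a {top_name} product")
```

Ties are resolved by column order in this sketch; the RANK() approach above instead reports tied tiers with the same rank.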

I had to do something similar to this recently and looked into UNPIVOT in SQL Server. Going with the suggestion to unpivot your query, as David said, you could do something like this. This doesn't add a RANK column, but it does order the values.
SELECT ProductID, Tier, TierValue
FROM
(SELECT ProductID, ISNULL(UpperTier,0) UpperTier, ISNULL(LowerTier,0) LowerTier, ISNULL(NaTier,0) NaTier, ISNULL(OtherTier,0) OtherTier
FROM products) p
UNPIVOT
(TierValue FOR Tier IN
(UpperTier, LowerTier, NaTier, OtherTier)
)AS unpvt
ORDER BY ProductID, TierValue Desc
SQL FIDDLE
