I am trying to write some unit tests to ensure that the data within our ETL pipeline is as expected. I would generally expect unit tests to be standard practice, but currently, at my place of work, the devs are not expected to write any tests (bad, I know).
We are ingesting data from on-prem source tables into Azure. We then pull the data from Azure into Databricks and transform it into what is essentially a new table. The new table combines data from various source tables and is then transferred across into a SaaS product.
As previously mentioned, there are currently no tests to ensure this SQL query does what it should, but I am trying to change that. However, given the nature of the project and how it has been developed, testing at a unit level feels very difficult.
I have made fairly good progress writing tests that check the source data is correctly transformed into the new table, but I keep asking myself whether I need to do more. It's difficult working in a place where no one seems to care about quality, but I do.
As an example, see the query below. It pulls the recently transferred source data and builds it into a new table with a number of joins, and in my opinion it is becoming very messy:
DROP TABLE IF EXISTS platform.student;
CREATE OR REPLACE TABLE platform.student
(
STUDENT_ID string,
ULN string,
DOB string,
ETHNICITY string,
SEXID string,
DIFFLEARN1 string,
DIFFLEARN2 string,
DOMICILE string,
PARENTS_ED string,
SOCIO_EC string,
OVERSEAS string,
APPSHIB_ID string,
VLE_ID string,
HUSID string,
USERNAME string,
LAST_NAME string,
FIRST_NAME string,
ADDRESS_LINE_1 string,
ADDRESS_LINE_2 string,
ADDRESS_LINE_3 string,
ADDRESS_LINE_4 string,
POSTCODE string,
PRIMARY_EMAIL_ADDRESS string,
PERSONAL_EMAIL_ADDRESS string,
HOME_PHONE string,
MOBILE_PHONE string,
PHOTO_URL string,
ENTRY_POSTCODE string,
CARELEAVER string,
ADJUSTMENT_PLAN string,
COURSE_REGISTRATION_DATE string,
PROVIDED_AT string
);
insert into platform.student
(
STUDENT_ID,
ULN,
DOB,
ETHNICITY,
SEXID,
DIFFLEARN1,
DIFFLEARN2,
DOMICILE,
PARENTS_ED,
SOCIO_EC,
OVERSEAS,
APPSHIB_ID,
VLE_ID,
HUSID,
USERNAME,
LAST_NAME,
FIRST_NAME,
ADDRESS_LINE_1,
ADDRESS_LINE_2,
ADDRESS_LINE_3,
ADDRESS_LINE_4,
POSTCODE,
PRIMARY_EMAIL_ADDRESS, --University Email
PERSONAL_EMAIL_ADDRESS,
HOME_PHONE,
MOBILE_PHONE,
PHOTO_URL,
ENTRY_POSTCODE,
CARELEAVER,
ADJUSTMENT_PLAN,
COURSE_REGISTRATION_DATE,
PROVIDED_AT
)
select
STUDENT_ID,
ULN,
DOB,
ETHNICITY,
SEXID,
DIFFLEARN1,
DIFFLEARN2,
DOMICILE,
PARENTS_ED,
SOCIO_EC,
OVERSEAS,
APPSHIB_ID,
VLE_ID,
HUSID,
USERNAME,
LAST_NAME,
FIRST_NAME,
ADDRESS_LINE_1,
ADDRESS_LINE_2,
ADDRESS_LINE_3,
ADDRESS_LINE_4,
POSTCODE,
PRIMARY_EMAIL_ADDRESS, --University Email
PERSONAL_EMAIL_ADDRESS,
HOME_PHONE,
MOBILE_PHONE,
PHOTO_URL,
ENTRY_POSTCODE,
CARELEAVER,
ADJUSTMENT_PLAN,
COURSE_REGISTRATION_DATE,
PROVIDED_AT
from
(
SELECT DISTINCT
spriden_id AS STUDENT_ID,
case when skbspin_uln is null then 'NULL' else skbspin_uln end AS ULN,
DATE_FORMAT(spbpers_birth_date, 'yyyy-MM-dd') AS DOB,
stvethn_desc AS ETHNICITY,
'NULL' AS DIFFLEARN1,
'NULL' AS DIFFLEARN2,
'NULL' AS DOMICILE,
'NULL' AS PARENTS_ED,
'NULL' AS SOCIO_EC,
CASE WHEN sgbstdn_resd_code in ('1', '4', 'H') THEN '1'
WHEN sgbstdn_resd_code in ('5', '9', 'F', '7' ) THEN '3'
ELSE '99' END AS OVERSEAS,
CONCAT(syraccs_username, '#salford.ac.uk') AS APPSHIB_ID,
LOWER(syraccs_username) AS VLE_ID,
case when skbspin_husid is null then 'NULL' else skbspin_husid end AS HUSID,
syraccs_username AS USERNAME,
spriden_last_name AS LAST_NAME,
spriden_first_name AS FIRST_NAME,
tt.spraddr_street_line1 AS ADDRESS_LINE_1,
tt.spraddr_street_line2 AS ADDRESS_LINE_2,
tt.spraddr_street_line3 AS ADDRESS_LINE_3,
tt.spraddr_city AS ADDRESS_LINE_4,
tt.spraddr_zip AS POSTCODE,
goremal.goremal_email_address AS PRIMARY_EMAIL_ADDRESS, --University Email
personal_goremal.goremal_email_address AS PERSONAL_EMAIL_ADDRESS,
pr_sprtele.sprtele_intl_access AS HOME_PHONE,
mob_sprtele.sprtele_intl_access AS MOBILE_PHONE,
CONCAT('<URL>', spriden_id, '.png') AS PHOTO_URL,
'NULL' AS CARELEAVER,
PR_Address.spraddr_zip AS ENTRY_POSTCODE,
CASE
WHEN rap.student_id IS NULL THEN 'No'
ELSE 'Yes'
END AS ADJUSTMENT_PLAN,
spbpers_sex AS SEXID,
sfbetrm_ests_date AS COURSE_REGISTRATION_DATE,
'NULL' AS PROVIDED_AT,
row_number() OVER(PARTITION BY spriden_id ORDER BY skrsain_ucas_app_date desc) AS APPLICANT_ORDER_NUMBER
FROM global_temp.spriden
JOIN global_temp.spbpers ON spbpers_pidm = spriden_pidm
LEFT JOIN global_temp.spraddr tt ON spraddr_pidm = spriden_pidm
AND spraddr_atyp_code = 'TT'
AND spraddr_to_date IS NULL
AND spraddr_status_ind IS NULL
JOIN global_temp.skbspin ON skbspin_pidm = spriden_pidm
JOIN global_temp.skrsain ain1 ON ain1.skrsain_pidm = spriden_pidm
LEFT JOIN
(select distinct spraddr_pidm,spraddr_zip
from
global_temp.spraddr s1
join global_temp.skrsain on spraddr_pidm=skrsain_pidm
where
spraddr_atyp_code='PR'
and
spraddr_to_date is null
and
spraddr_seqno =(select max(spraddr_seqno)
from global_temp.spraddr s2
where s2.spraddr_pidm = s1.spraddr_pidm
and s2.SPRADDR_ATYP_CODE ='PR')) PR_Address
ON PR_Address.spraddr_pidm = spriden_pidm
JOIN(
SELECT MAX (ain2.skrsain_activity_date) AS skrsain_activity_date, ain2.skrsain_pidm
FROM global_temp.skrsain ain2
GROUP BY ain2.skrsain_pidm) ain3
ON ain3.skrsain_pidm = ain1.skrsain_pidm
AND ain3.skrsain_activity_date = ain1.skrsain_activity_date
JOIN global_temp.syraccs on syraccs_pidm = spriden_pidm
LEFT JOIN global_temp.stvethn on stvethn_code = spbpers_ethn_code
JOIN global_temp.sgbstdn stdn1 on sgbstdn_pidm = spriden_pidm
JOIN global_temp.sfbetrm ON sfbetrm_pidm = stdn1.sgbstdn_pidm AND sfbetrm_term_code = stdn1.sgbstdn_term_code_eff
LEFT JOIN global_temp.V_Maximiser_RAP_Status rap ON rap.student_id = spriden_id
JOIN global_temp.shrdgmr dgmr ON dgmr.shrdgmr_pidm = sgbstdn_pidm
AND dgmr.shrdgmr_program = sgbstdn_program_1
LEFT JOIN (SELECT goremal_pidm, goremal_email_address, goremal_emal_code, goremal_status_ind, goremal_activity_date
FROM global_temp.goremal mal
WHERE goremal_emal_code = 1
AND goremal_status_ind = 'A'
AND goremal_activity_date = (SELECT max(goremal_activity_date)
FROM global_temp.goremal g
WHERE goremal_pidm = mal.goremal_pidm
AND goremal_emal_code = 1
AND goremal_status_ind = 'A')) goremal
ON goremal_pidm = stdn1.sgbstdn_pidm
LEFT JOIN (SELECT goremal_pidm, goremal_email_address, goremal_emal_code, goremal_status_ind, goremal_activity_date
FROM global_temp.goremal mal
WHERE goremal_emal_code = 2
AND goremal_status_ind = 'A'
AND goremal_activity_date = (SELECT max(goremal_activity_date)
FROM global_temp.goremal g
WHERE goremal_pidm = mal.goremal_pidm
AND goremal_emal_code = 2
AND goremal_status_ind = 'A')) personal_goremal
ON personal_goremal.goremal_pidm = stdn1.sgbstdn_pidm
LEFT JOIN (SELECT DISTINCT sprtele_pidm,sprtele_intl_access, sprtele_tele_code, sprtele_status_ind,sprtele_seqno
FROM global_temp.sprtele s
WHERE sprtele_tele_code = 'MOB'
AND sprtele_status_ind IS NULL
AND sprtele_seqno = (SELECT MAX(sprtele_seqno)
FROM global_temp.sprtele s2
WHERE s2.sprtele_pidm = s.sprtele_pidm
AND sprtele_tele_code = 'MOB'
AND sprtele_status_ind IS NULL)) mob_sprtele
ON sprtele_pidm = stdn1.sgbstdn_pidm
LEFT JOIN (SELECT DISTINCT sprtele_pidm,sprtele_intl_access, sprtele_tele_code, sprtele_status_ind,sprtele_seqno
FROM global_temp.sprtele s
WHERE sprtele_tele_code = 'PR'
AND sprtele_status_ind IS NULL
AND sprtele_seqno = (SELECT MAX(sprtele_seqno)
FROM global_temp.sprtele s2
WHERE s2.sprtele_pidm = s.sprtele_pidm
AND sprtele_tele_code = 'PR'
AND sprtele_status_ind IS NULL)) pr_sprtele
ON pr_sprtele.sprtele_pidm = stdn1.sgbstdn_pidm
WHERE spriden_change_ind IS NULL
AND spriden_entity_ind = 'P'
--AND sfbetrm_ests_code IN ('RE', 'RS', 'RP', 'WU','EL')
AND (sfbetrm_ests_code IN ('RE', 'RS', 'RP', 'WU')
OR sfbetrm_ests_code = 'EL' AND substr(sgbstdn_blck_code, -1,1) > 1) -- include EL's on previous years except first year
AND stdn1.sgbstdn_program_1 IN (SELECT DISTINCT skrspri_program
FROM global_temp.skrspri
WHERE nvl(skrspri_frnchact,1) <> 3)
AND stdn1.sgbstdn_term_code_eff = (SELECT MAX(stdn2.sgbstdn_term_code_eff)
FROM global_temp.sgbstdn stdn2
WHERE stdn1.sgbstdn_pidm = stdn2.sgbstdn_pidm)
-- Previous 2 years data only
AND SUBSTR(stdn1.sgbstdn_term_code_eff,1,4) >= (
SELECT DISTINCT SUBSTR(stvterm_code,1,4) - 1 stvterm_code
FROM global_temp.stvterm
WHERE CURRENT_DATE BETWEEN stvterm_start_date AND stvterm_end_date)
AND dgmr.shrdgmr_degs_code <> 'AW'
AND dgmr.shrdgmr_seq_no =
(
SELECT MAX(dgmr2.shrdgmr_seq_no)
FROM global_temp.shrdgmr dgmr2
WHERE dgmr.shrdgmr_pidm = dgmr2.shrdgmr_pidm
AND dgmr.shrdgmr_program = dgmr2.shrdgmr_program)
AND EXISTS -- must have student record in last 2 years
(SELECT sgbstdn_pidm
FROM global_temp.sgbstdn stdn2
WHERE substr(stdn2.sgbstdn_term_code_eff,1,4) >= (
SELECT DISTINCT SUBSTR(stvterm_code,1,4) - 1 stvterm_code
FROM global_temp.stvterm
WHERE CURRENT_DATE BETWEEN stvterm_start_date AND stvterm_end_date)
AND stdn1.sgbstdn_pidm = stdn2.sgbstdn_pidm)
) a
where
APPLICANT_ORDER_NUMBER = 1
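Part of why unit-level testing feels so hard is that everything happens inside this one statement. One refactor I have been considering (sketched below with two illustrative stages lifted from the query above; the stage names and cut-down column lists are mine, not our real code) is to split the query into named temp views, so a test can stand up small fixture tables, create a single view, and assert on just that stage:

-- Each logical stage becomes its own view, testable in isolation.
CREATE OR REPLACE TEMP VIEW current_students AS
SELECT spriden_pidm, spriden_id, spriden_first_name, spriden_last_name
FROM global_temp.spriden
WHERE spriden_change_ind IS NULL
  AND spriden_entity_ind = 'P';

-- Latest active university email per student, as in the goremal subquery.
CREATE OR REPLACE TEMP VIEW latest_primary_email AS
SELECT goremal_pidm, goremal_email_address
FROM global_temp.goremal mal
WHERE goremal_emal_code = 1
  AND goremal_status_ind = 'A'
  AND goremal_activity_date = (SELECT MAX(g.goremal_activity_date)
                               FROM global_temp.goremal g
                               WHERE g.goremal_pidm = mal.goremal_pidm
                                 AND g.goremal_emal_code = 1
                                 AND g.goremal_status_ind = 'A');

The final INSERT would then just join the named stages, and each CASE expression or deduplication rule gets a home that can be asserted on independently.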
Here is a test I have written so far (as an example), though I wonder whether these are really data validation checks rather than unit tests:
test("Test - Date of Birth - Banner & Extracts Match") {
var dateOfBirth = spark.sql("""
SELECT distinct spriden.SPRIDEN_ID, spriden.SPRIDEN_PIDM, SPBPERS.SPBPERS_BIRTH_DATE, s.DOB
FROM global_temp.spriden spriden
JOIN global_temp.SPBPERS spbpers on spbpers.SPBPERS_PIDM = spriden.spriden_pidm
JOIN global_temp.V_SAL_EA_STUDENT s on s.STUDENT_ID = spriden.spriden_id
""")
dateOfBirth.as[DateOfBirth].collect().foreach(record => {
if (!dataMatches(s"${record.SPBPERS_BIRTH_DATE}", s"${record.DOB}")) {
println("Error between DOB records for student " + s"${record.SPRIDEN_ID}")
} else {
assert(s"${record.SPBPERS_BIRTH_DATE}" === s"${record.DOB}")
}
})
}
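One variation I have been considering (just a sketch, reusing the same DateOfBirth case class and dataMatches helper as above) is to collect the mismatches and make a single assertion over them, so the test actually fails on bad data rather than only printing:

test("Test - Date of Birth - Banner & Extracts Match (failing variant)") {
  val dateOfBirth = spark.sql("""
    SELECT DISTINCT spriden.SPRIDEN_ID, spriden.SPRIDEN_PIDM, spbpers.SPBPERS_BIRTH_DATE, s.DOB
    FROM global_temp.spriden spriden
    JOIN global_temp.spbpers spbpers ON spbpers.SPBPERS_PIDM = spriden.SPRIDEN_PIDM
    JOIN global_temp.V_SAL_EA_STUDENT s ON s.STUDENT_ID = spriden.SPRIDEN_ID
  """)

  // Keep only the records where Banner and the extract disagree.
  val mismatches = dateOfBirth.as[DateOfBirth].collect().filter { record =>
    !dataMatches(s"${record.SPBPERS_BIRTH_DATE}", s"${record.DOB}")
  }

  // One assertion for the whole data set; the failure message names every offender.
  assert(mismatches.isEmpty,
    s"DOB mismatch for students: ${mismatches.map(_.SPRIDEN_ID).mkString(", ")}")
}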
Has anyone ever unit tested large SQL queries like this? I have always written unit tests against units of code, which are generally written to do one specific job and can therefore be easily identified and tested. I am not really looking for specific answers; I just need hints/tips from someone who has done something similar. None of the articles I have found seem to do what I need.
To unit test this, do I need to mock out all of the ingested source data? Even if I did (which is a huge job), I would not know the best way to run it. If the data were run through methods/functions I could easily create tests, but it is being run against the SQL query, right?
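For what it's worth, the direction I have been exploring for the mocking question is to register small hand-built DataFrames under the same global_temp names the transformation reads, then run the SQL and assert that known inputs produce known outputs. A cut-down sketch (the columns and values here are invented for illustration; the real tables obviously have far more):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("student-transform-test")
  .getOrCreate()
import spark.implicits._

// Hand-built stand-ins for the ingested source tables, registered under
// the same global_temp names the transformation SQL reads from.
Seq(("12345", 1), ("67890", 2))
  .toDF("spriden_id", "spriden_pidm")
  .createOrReplaceGlobalTempView("spriden")

Seq((1, "2001-05-17"), (2, "1999-11-02"))
  .toDF("spbpers_pidm", "spbpers_birth_date")
  .createOrReplaceGlobalTempView("spbpers")

// Run a cut-down version of the transformation against the fakes.
val result = spark.sql("""
  SELECT spriden_id AS STUDENT_ID,
         spbpers_birth_date AS DOB
  FROM global_temp.spriden
  JOIN global_temp.spbpers ON spbpers_pidm = spriden_pidm
""")

// Known input rows should produce known output rows.
assert(result.where("STUDENT_ID = '12345'").select("DOB").head().getString(0) == "2001-05-17")

Even with only a handful of source tables faked per test, each join or CASE branch in the big query can then get its own tiny fixture, which feels much closer to a conventional unit test than validating live data.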
Some articles I have found:
Scenario1
Databricks Unit Testing
Please get back to me if you have any information, and if you need any more detail, let me know :)
Thanks!
How can I convert this query into a Symfony 2 Doctrine query builder?
SELECT artist_id, date, balance, type
FROM transaction t1
WHERE date = (
    SELECT MAX(date)
    FROM transaction
    WHERE artist_id = t1.artist_id
      AND status IN ('partial', 'pending', 'deducted', 'accepted')
      AND type NOT LIKE 'payment'
)
GROUP BY artist_id
ORDER BY artist_id
I tried the following:
$qb = $this->getEntityManager()->createQueryBuilder()
    ->select('t.balance', 'a.id', 't.date')
    ->from('TestMainBundle:Transaction', 't')
    ->join('t.artist', 'a')
    ->where("t.status IN ('partial','pending','deducted','accepted')")
    ->andWhere("t.type NOT LIKE 'payment'")
    ->groupBy('a.id')
    ->orderBy('a.id');
return $qb->getQuery()->getResult();
But I am stuck on including the MAX(date) condition as well. Any help on this is very much appreciated.
Your Doctrine query will look something like this:
$qb1 = $this->getDoctrine()->getManager()->createQueryBuilder();
$select = $qb1->select('MAX(s.date)')
    ->from('YourBundle:Transaction', 's')
    ->where('s.artist_id = t.artist_id') // correlate with the outer alias; adjust the field name to your mapping
    ->andWhere('s.status IN (:statuses)')
    ->andWhere('s.type NOT LIKE :type');

$qb2 = $this->getDoctrine()->getManager()->createQueryBuilder();
$result = $qb2->select('t.artist_id', 't.date', 't.balance', 't.type')
    ->from('YourBundle:Transaction', 't')
    ->where($qb2->expr()->eq('t.date', '(' . $select->getDQL() . ')')) // embed the subquery DQL in parentheses
    ->setParameter('statuses', array('partial', 'pending', 'deducted', 'accepted'))
    ->setParameter('type', 'payment') // possibly '%payment%'
    ->orderBy('t.artist_id')
    ->getQuery()
    ->getResult();
Cheers!!!
I have a table with 10,000 rows and I want to select the first 1000 rows, then select again to get the next set, rows 1001-2000, and so on.
I am using a BETWEEN clause to select the range of values, incrementing the bounds on each pass. Here is my code:
count = cursor.execute("select count(*) from casa4").fetchone()[0]
ctr = 1
ctr1 = 1000
str1 = ''
while ctr1 <= count:
    sql = "SELECT AccountNo FROM ( \
        SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY AccountNo) rownum \
        FROM casa4 ) seq \
        WHERE seq.rownum BETWEEN " + str(ctr) + " AND " + str(ctr1)
    ctr = ctr1 + 1
    ctr1 = ctr1 + 1000
    cursor.execute(sql)
    sleep(2)  # interval in printing of the rows
    for row in cursor:
        str1 = str1 + '|'.join(map(str, row)) + '\n'  # note: str1 is never reset, so it accumulates across batches
    print "Records:" + str1  # var storing the fetched rows from the database
    print sql  # prints the sql statement (str); ctr and ctr1 have incremented correctly, the way I want it
What I want to achieve is to send these rows to another database through a messaging queue (RabbitMQ), and I want to speed up the process; selecting everything at once and sending it to the queue returns an error.
The code returns rows 1-1000 correctly on the first loop, but on the second loop, instead of rows 1001-2000, it returns rows 1-2000, then 1-3000, and so on. It always starts at 1.
I was able to recreate your issue with both pyodbc and pypyodbc. I also tried using
WITH seq (AccountNo, rownum) AS
(
SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum
FROM casa4
)
SELECT AccountNo FROM seq
WHERE rownum BETWEEN 11 AND 20
When I run that in SSMS I just get rows 11 through 20, but when I run it from Python I get all the rows (starting from 1).
The following code does work using pyodbc. It uses a temporary table named #numbered, and might be helpful in your situation since your process looks like it would do all of its work using the same database connection:
import pyodbc
cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS")
crsr = cnxn.cursor()
sql = """\
CREATE TABLE #numbered (rownum INT PRIMARY KEY, AccountNo VARCHAR(10))
"""
crsr.execute(sql)
cnxn.commit()
sql = """\
INSERT INTO #numbered (rownum, AccountNo)
SELECT
ROW_NUMBER() OVER (ORDER BY Accountno) AS rownum,
AccountNo
FROM casa4
"""
crsr.execute(sql)
cnxn.commit()
sql = "SELECT AccountNo FROM #numbered WHERE rownum BETWEEN ? AND ? ORDER BY rownum"
batchsize = 1000
ctr = 1
while True:
crsr.execute(sql, [ctr, ctr + batchsize - 1])
rows = crsr.fetchall()
if len(rows) == 0:
break
print("-----")
for row in rows:
print(row)
ctr += batchsize
cnxn.close()
For cleaning up unused IPC sources I need a repository query that returns the workflow, session, mapping, and the mapping's sources/targets. I started by joining REP_LOAD_SESSIONS and REP_TBL_MAPPING on mapping_id, but only a fraction of the mappings seem to be present in the joined output.
I can't find the right tables to join to get the job done.
Any help will be greatly appreciated!
I was struggling with the same issue. Here is my query; hope it helps:
SELECT SUBJECT_AREA,SESSIONNAME,MPGANDP MAPPINGNAME,SOURCENAMES,TARGET_NAMES,INSTANCE_NAME,LOOKUPTABLENAME,CASE WHEN OBJECTTYPE='Lookup ' THEN CONNECTION ELSE CNX_NAME END CONNECTIONNAME,USER_NAME
FROM
( SELECT * FROM
(SELECT SUBJECT_AREA,SESSION_ID,MPGANDP, MPNGID,OBJECTTYPE,INSTANCE_NAME,MAX(LOOKUPTABLE) LOOKUPTABLENAME, MAX(CONNECTION) CONNECTION
--,LISTAGG(SQLQUERY, '' ) WITHIN GROUP (ORDER BY SQLQUERY) SQLOVERRIRDE
FROM
(
SELECT CASE WHEN MAPPING_NAME=PARENT_MAPPING_NAME THEN MAPPING_NAME ELSE MAPPING_NAME||','||PARENT_MAPPING_NAME END MPGANDP, B.MAPPING_ID MPNGID,
SUBSTR(WIDGET_TYPE_NAME,1,INSTR(WIDGET_TYPE_NAME,' ')) OBJECTTYPE, INSTANCE_NAME, CASE WHEN UPPER(ATTR_NAME) ='CONNECTION INFORMATION' THEN ATTR_VALUE ELSE NULL END CONNECTION,
ATTR_NAME, ATTR_VALUE,SUBJECT_AREA, --A.*,B.*,C.*
--CASE WHEN ATTR_NAME='Sql Query' OR ATTR_NAME='Lookup Sql Override' THEN ATTR_VALUE END SQLQUERY,
CASE WHEN ATTR_NAME='Lookup table name' THEN ATTR_VALUE END LOOKUPTABLE,
CASE WHEN ATTR_NAME='Sql Query' OR ATTR_NAME='Lookup Sql Override' THEN SUBSTR(ATTR_VALUE,INSTR(UPPER(ATTR_VALUE),'FROM'),15) END SQLQUERYV
FROM REP_WIDGET_INST A
INNER JOIN REP_ALL_MAPPINGS B
ON A.MAPPING_ID = B.MAPPING_ID
INNER JOIN REP_WIDGET_ATTR C
ON A.WIDGET_ID = C.WIDGET_ID
WHERE A.WIDGET_TYPE IN (2, 11,3)
--AND MAPPING_NAME<>PARENT_MAPPING_NAME
--AND B.MAPPING_ID=515
--AND PARENT_SUBJECT_AREA='EDW'
AND ATTR_NAME IN ( 'Connection Information','Lookup Sql Override','Lookup table name','Sql Query')
) , OPB_SESSION
WHERE MPNGID=MAPPING_ID
GROUP BY SUBJECT_AREA,MPGANDP, MPNGID,OBJECTTYPE,INSTANCE_NAME,SESSION_ID
) T1
INNER JOIN
(SELECT OPB_TASK_INST.WORKFLOW_ID,OPB_TASK_INST.TASK_ID ,OPB_TASK_INST.INSTANCE_NAME SESSIONNAME
FROM OPB_TASK_INST
WHERE OPB_TASK_INST.TASK_TYPE IN (68) --,70)
START WITH WORKFLOW_ID IN (SELECT TASK_ID FROM OPB_TASK WHERE TASK_TYPE = 71 AND /* **************SPECIFY WORKFLOW NAME HERE*********/ TASK_NAME='wf_TEST')
CONNECT BY PRIOR OPB_TASK_INST.TASK_ID = OPB_TASK_INST.WORKFLOW_ID ) WFSESSCONN
ON TASK_ID=SESSION_ID
INNER JOIN
( SELECT MAPPING_ID MAPID,LISTAGG(SOURCE_NAME,',') WITHIN GROUP (ORDER BY SOURCE_NAME) SOURCENAMES
FROM REP_SRC_MAPPING E
GROUP BY SUBJECT_AREA,MAPPING_NAME,MAPPING_ID ) SOURCENAMES
ON MAPID=MPNGID
LEFT JOIN
(SELECT DISTINCT SUBJECT_AREA SA,TASK_NAME,INSTANCE_NAME INSNAME,CNX_NAME,SESSION_ID SSID
FROM
REP_ALL_TASKS A,
REP_SESS_WIDGET_CNXS B
WHERE
A.TASK_ID = B.SESSION_ID
) T2
ON SESSION_ID=SSID
AND INSNAME=INSTANCE_NAME
AND SUBJECT_AREA=SA
LEFT JOIN
( SELECT SUBJECT_AREA SAT, SESSION_NAME SESSNT, SESSION_ID SSIDT, LISTAGG(WIDGET_NAME,',') WITHIN GROUP (ORDER BY WIDGET_NAME) AS TARGET_NAMES
FROM (SELECT distinct SUBJECT_AREA,SESSION_NAME,SESSION_ID,WIDGET_NAME
FROM REP_SESS_TBL_LOG
WHERE TYPE_NAME='Target Definition' )
GROUP BY SUBJECT_AREA,SESSION_NAME,SESSION_ID
)
ON SESSION_ID=SSIDT
)
LEFT JOIN OPB_CNX
ON TRIM(OBJECT_NAME)=TRIM(CASE WHEN OBJECTTYPE='Lookup ' THEN CONNECTION ELSE CNX_NAME END)
ORDER BY SUBJECT_AREA,SESSIONNAME,MPGANDP,INSTANCE_NAME
My requirement is to write a SQL query to get the sub-region-wise count of (fault) events that occurred for the managed objects. My database is Postgres 8.4. Let me explain using the table structure.
My tables in Django:
Managedobject:
class Managedobject(models.Model):
    name = models.CharField(max_length=200, unique=True)
    iscontainer = models.BooleanField(default=False)
    parentkey = models.ForeignKey('self', null=True)
Event Table:
class Event(models.Model):
    name = models.CharField(max_length=200, verbose_name=_('Name'))
    foid = models.ForeignKey(Managedobject)
Managedobject Records:
NOC
    Chennai
        MO_1
        MO_2
        MO_3
    Mumbai
        MO_4
        MO_5
        MO_6
    Delhi
    Bangalore
IP
    Calcutta
    Cochin
Events Records:
event1 MO_1
event2 MO_2
event3 MO_3
event4 MO_5
event5 MO_6
Now I need to get the event counts for all the sub-regions. For example, for the NOC region:
Chennai - 3
Mumbai - 2
Delhi - 0
Bangalore - 0
So far I am able to get the result with two separate queries.
Get the sub-regions:
select id from managedobject where iscontainer = True and parentkey = 3489
For each region (using a for loop), get the count as follows:
SELECT count(*)
from event ev
WHERE ev.foid
IN (
WITH RECURSIVE q AS (
SELECT h
FROM managedobject h
WHERE parentkey = 3489
UNION ALL
SELECT hi
FROM q
JOIN managedobject hi
ON hi.parentkey = (q.h).id
)
SELECT (q.h).id FROM q
)
Please help me combine these into a single query, which should also return the top 5 regions. Since the query is difficult to express in Django, I am going with a raw SQL query.
I got the query:
WITH RECURSIVE q AS (
SELECT h,
1 AS level,
id AS ckey,
displayname as dname
FROM managedobject h
WHERE parentkey = 3489
and logicalnode=True
UNION ALL
SELECT hi,
q.level + 1 AS level,
ckey,
dname
FROM q
JOIN managedobject hi ON hi.parentkey = (q.h).id
)
SELECT count(ckey) as ccount,
ckey,
dname
FROM q
JOIN event as ev on ev.foid_id = (q.h).id
GROUP BY ckey, dname
ORDER BY ccount DESC
LIMIT 5
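One caveat with the query above: because of the inner JOIN on event, sub-regions with no events at all (Delhi and Bangalore in the example) drop out of the result, even though the expected output lists them with a count of 0. If that matters, a LEFT JOIN variant should keep them (a sketch against the same tables; count(ev.id) ignores the NULL rows produced by the outer join):

WITH RECURSIVE q AS (
    SELECT h,
           id AS ckey,
           displayname AS dname
    FROM managedobject h
    WHERE parentkey = 3489
      AND logicalnode = True
    UNION ALL
    SELECT hi,
           q.ckey,
           q.dname
    FROM q
    JOIN managedobject hi ON hi.parentkey = (q.h).id
)
SELECT count(ev.id) AS ccount,  -- counts only real events; empty regions get 0
       q.ckey,
       q.dname
FROM q
LEFT JOIN event ev ON ev.foid_id = (q.h).id
GROUP BY q.ckey, q.dname
ORDER BY ccount DESC
LIMIT 5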