How to reduce JSON object? - c++

I have a JSON file created from a SQL query on a database. I'm trying to reduce several lines of the same "car_id" into a single line.
Here is an example of my JSON file with several lines for a single car_id:
[{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"1","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"2","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"3","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"4","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"5","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"6","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"}]
I searched Stack Overflow and Google, but I could not work out how to reduce this file to the following:
{"car_id":"1","sca_multiverseid":"430690","car_convertedmanacost":"2","car_coloridentity":"W","ccal_capacity":null,"clel_legality":"1,2,3,4,5,6","csul_supertype":null,"ctyl_type":"7","cstl_subtype":null,"set_block":"2","car_layout":"7","car_power":"0","car_toughness":"0"},
Here, there is only one field with different values to merge (clel_legality), but there can be several such fields (like car_coloridentity, ccal_capacity, csul_supertype ...).
I'm sorry for my English if I made a mistake. Thank you in advance.
Edit:
Here is my SQL query:
SELECT car_id, sca_multiverseid, car_convertedmanacost, car_coloridentity, ccal_capacity, clel_legality, csul_supertype, ctyl_type, cstl_subtype, set_block, car_layout, car_power, car_toughness
FROM mag_card A LEFT JOIN mag_setcard B ON A.car_id = B.sca_card
LEFT JOIN mag_cardcapacityli C ON A.car_id=C.ccal_card
LEFT JOIN mag_cardlegalityli D ON A.car_id=D.clel_card
LEFT JOIN mag_cardsupertypeli E ON A.car_id=E.csul_card
LEFT JOIN mag_cardtypeli F ON A.car_id=F.ctyl_card
LEFT JOIN mag_cardsubtypeli G ON A.car_id=G.cstl_card
LEFT JOIN mag_set H ON B.sca_set=H.set_id

Thanks to Caleb McNevin:
SELECT car_id, sca_multiverseid, car_convertedmanacost, GROUP_CONCAT(DISTINCT car_coloridentity), GROUP_CONCAT(DISTINCT ccal_capacity), GROUP_CONCAT(DISTINCT clel_legality), GROUP_CONCAT(DISTINCT csul_supertype),GROUP_CONCAT(DISTINCT ctyl_type), GROUP_CONCAT(DISTINCT cstl_subtype), set_block, car_layout, car_power, car_toughness
FROM mag_card A LEFT JOIN mag_setcard B ON A.car_id = B.sca_card
LEFT JOIN mag_cardcapacityli C ON A.car_id=C.ccal_card
LEFT JOIN mag_cardlegalityli D ON A.car_id=D.clel_card
LEFT JOIN mag_cardsupertypeli E ON A.car_id=E.csul_card
LEFT JOIN mag_cardtypeli F ON A.car_id=F.ctyl_card
LEFT JOIN mag_cardsubtypeli G ON A.car_id=G.cstl_card
LEFT JOIN mag_set H ON B.sca_set=H.set_id
GROUP BY car_id
Combining GROUP_CONCAT(DISTINCT xxx) with a GROUP BY on the primary key at the end of the query worked!
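If the JSON file already exists and the query cannot be changed, the same merge can be done on the client side. The following is a minimal Python sketch, not the method used above: the function name merge_rows is hypothetical, and it assumes that every row has a "car_id" key and that null values should be skipped, as GROUP_CONCAT does.

```python
import json

def merge_rows(rows, key="car_id"):
    """Merge rows sharing the same key, joining distinct values with commas."""
    merged = {}  # key value -> {field: distinct values in first-seen order}
    for row in rows:
        bucket = merged.setdefault(row[key], {})
        for field, value in row.items():
            values = bucket.setdefault(field, [])
            # Skip nulls and values already seen for this field
            if value is not None and str(value) not in values:
                values.append(str(value))
    # A field that only ever held null stays null in the output
    return [
        {field: ",".join(values) if values else None
         for field, values in bucket.items()}
        for bucket in merged.values()
    ]

rows = json.loads(
    '[{"car_id":"1","clel_legality":"1","ccal_capacity":null},'
    ' {"car_id":"1","clel_legality":"2","ccal_capacity":null},'
    ' {"car_id":"1","clel_legality":"3","ccal_capacity":null}]'
)
merged = merge_rows(rows)
print(json.dumps(merged))
```

Fields that are identical across rows (such as car_id) collapse to a single value, while fields that differ (such as clel_legality) become a comma-separated list.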

Related

How to fix "Query returns an empty value" error in a Google spreadsheet

I have the following formula. I've read many threads on how to handle a query that returns an empty value, but none of the solutions worked in my case.
I've also shared a Google spreadsheet.
What I want to do is search the first tab with a query and, if there is a match, show a green tick; otherwise, show a red cross.
The formula below works for the green tick, but in the red-cross case it shows an error.
How the formula below is expected to operate:
It queries tab1, and if the student number is in column C (with some defined conditions) OR in column D (without any condition), it shows a green tick; otherwise, it must show a red cross.
if(OR(QUERY(LiveAttendanceForm!$A:$C,
"select C
where A >= timestamp '"&TEXT(D$1, "yyyy-MM-dd HH:mm:ss")&"'
and A <= timestamp '"&TEXT(D$2, "yyyy-MM-dd HH:mm:ss")&"'
and C = "&$C5, 0)=$C5,Query(LiveAttendanceForm!$A:$D,"select D where D = "&$C5,0)=$C5),"✅","❌")
I also tried wrapping the query in the IFERROR function, but then every field evaluates to true and all cells show a green tick. I'd be very grateful if someone could help me fix this annoying issue! The link to the shared Google Sheet is below:
https://docs.google.com/spreadsheets/d/1KfkA48OyOnZAPQAbtdEIs9AYaAPRFRFOpvQRM5QyPss/edit?usp=sharing
You will need to wrap your queries in IFERROR:
=IF(OR(IFERROR(QUERY(LiveAttendanceForm!$A:$C,
"select C
where A >= timestamp '"&TEXT(D$1, "yyyy-MM-dd HH:mm:ss")&"'
and A <= timestamp '"&TEXT(D$2, "yyyy-MM-dd HH:mm:ss")&"'
and C = "&$C4, 0))=$C4,
IFERROR(QUERY(LiveAttendanceForm!$A:$D,
"select D
where D = "&$C4, 0))=$C4), "✅", "❌")

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

Initially I tried using pd.read_sql().
Then I tried using sqlalchemy and query objects, but none of these methods helped: the SQL runs for a long time and never finishes.
I also tried using hints.
I guess the problem is the following: pandas creates a cursor object in the background. With cx_Oracle we cannot influence the "arraysize" parameter that is used for it, i.e. the default value of 100 is always used, which is far too small.
CODE:
import pandas as pd
import Configuration.Settings as CS
import DataAccess.Databases as SDB
import sqlalchemy
import cx_Oracle
dfs = []
DBM = SDB.Database(CS.DB_PRM,PrintDebugMessages=False,ClientInfo="Loader")
sql = '''
WITH
l AS
(
SELECT DISTINCT /*+ materialize */
hcz.hcz_lwzv_id AS lwzv_id
FROM
pm_mbt_materialbasictypes mbt
INNER JOIN pm_mpt_materialproducttypes mpt ON mpt.mpt_mbt_id = mbt.mbt_id
INNER JOIN pm_msl_materialsublots msl ON msl.msl_mpt_id = mpt.mpt_id
INNER JOIN pm_historycompattributes hca ON hca.hca_msl_id = msl.msl_id AND hca.hca_ignoreflag = 0
INNER JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_id = hca.hca_tpm_id
inner join pm_tin_testdefinsertions tin on tin.tin_id = tpm.tpm_tin_id
INNER JOIN pm_hcz_history_comp_zones hcz ON hcz.hcz_hcp_id = hca.hca_hcp_id
WHERE
mbt.mbt_name = :input1 and tin.tin_name = 'x1' and
hca.hca_testendday < '2018-5-31' and hca.hca_testendday > '2018-05-30'
),
TPL as
(
select /*+ materialize */
*
from
(
select
ut.ut_id,
ut.ut_basic_type,
ut.ut_insertion,
ut.ut_testprogram_name,
ut.ut_revision
from
pm_updated_testprogram ut
where
ut.ut_basic_type = :input1 and ut.ut_insertion = :input2
order by
ut.ut_revision desc
) where rownum = 1
)
SELECT /*+ FIRST_ROWS */
rcl.rcl_lotidentifier AS LOT,
lwzv.lwzv_wafer_id AS WAFER,
pzd.pzd_zone_name AS ZONE,
tte.tte_tpm_id||'~'||tte.tte_testnumber||'~'||tte.tte_testname AS Test_Identifier,
case when ppd.ppd_measurement_result > 1e15 then NULL else SFROUND(ppd.ppd_measurement_result,6) END AS Test_Results
FROM
TPL
left JOIN pm_pcm_details pcm on pcm.pcm_ut_id = TPL.ut_id
left JOIN pm_tin_testdefinsertions tin ON tin.tin_name = TPL.ut_insertion
left JOIN pm_tpr_testdefprograms tpr ON tpr.tpr_name = TPL.ut_testprogram_name and tpr.tpr_revision = TPL.ut_revision
left JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_tpr_id = tpr.tpr_id and tpm.tpm_tin_id = tin.tin_id
left JOIN pm_tte_testdeftests tte on tte.tte_tpm_id = tpm.tpm_id and tte.tte_testnumber = pcm.pcm_testnumber
cross join l
left JOIN pm_lwzv_info lwzv ON lwzv.lwzv_id = l.lwzv_id
left JOIN pm_rcl_resultschipidlots rcl ON rcl.rcl_id = lwzv.lwzv_rcl_id
left JOIN pm_pcm_zone_def pzd ON pzd.pzd_basic_type = TPL.ut_basic_type and pzd.pzd_pcm_x = lwzv.lwzv_pcm_x and pzd.pzd_pcm_y = lwzv.lwzv_pcm_y
left JOIN pm_pcm_par_data ppd ON ppd.ppd_lwzv_id = l.lwzv_id and ppd.ppd_tte_id = tte.tte_id
'''
# Method 1: using query objects.
Q = DBM.getQueryObject(sql)
Q.execute({"input1": 'xxxx', "input2": 'yyyy'})
while not Q.AtEndOfResultset:
    print(Q)
# Method 2: using sqlalchemy
connectstring = ("oracle+cx_oracle://username:Password@(description="
                 "(address_list=(address=(protocol=tcp)(host=tnsconnect string)"
                 "(port=portnumber)))(connect_data=(sid=xxxx)))")
engine = sqlalchemy.create_engine(connectstring, arraysize=10000)
df_p = pd.read_sql(sql, params={"input1": 'xxxx', "input2": 'yyyy'}, con=engine)
# Method 3: using pd.read_sql_query()
df_p = pd.read_sql_query(sql, params={"input1": 'xxxx', "input2": 'yyyy'},
                         coerce_float=True, con=DBM.Connection)
It would be great if someone could help me out with this. Thanks in advance.
And yet another possibility to adjust the array size without needing to create oraaccess.xml as suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!
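import cx_Oracle
import pandas
import sqlalchemy

class Connection(cx_Oracle.Connection):
    def __init__(self):
        super(Connection, self).__init__("user/pw@dsn")
    def cursor(self):
        # Every cursor handed out by this connection uses a larger fetch array
        c = super(Connection, self).cursor()
        c.arraysize = 5000
        return c

# The URL only selects the dialect; the connections themselves come from creator
engine = sqlalchemy.create_engine("oracle+cx_oracle://", creator=Connection)
pandas.read_sql(sql, engine)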
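To see the pattern in isolation, here is a minimal sketch that runs without an Oracle server. It uses the standard-library sqlite3 module as a stand-in, which is an assumption made purely so the example is runnable: the names BigFetchConnection and fetch_in_batches are hypothetical, and with sqlite3 the arraysize attribute only sets the default batch size of fetchmany(), so no speed-up comparable to cx_Oracle should be expected from it.

```python
import sqlite3

class BigFetchConnection(sqlite3.Connection):
    """Connection whose cursors default to a larger fetch batch (hypothetical name)."""

    def cursor(self, *args, **kwargs):
        c = super().cursor(*args, **kwargs)
        c.arraysize = 5000  # default number of rows returned per fetchmany() call
        return c

def fetch_in_batches(conn, sql):
    """Run sql and return (total rows, number of fetchmany() batches)."""
    cur = conn.cursor()
    cur.execute(sql)
    total = 0
    batches = 0
    while True:
        rows = cur.fetchmany()  # batch size comes from cur.arraysize
        if not rows:
            break
        total += len(rows)
        batches += 1
    cur.close()
    return total, batches

# The factory argument plays the role that creator=Connection plays above
conn = sqlite3.connect(":memory:", factory=BigFetchConnection)
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(12000)])
total, batches = fetch_in_batches(conn, "SELECT n FROM t")
print(total, batches)
```

The point of the subclass is that code which never touches the cursor directly (such as pandas) still receives cursors with the adjusted arraysize.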
Here's another alternative to experiment with.
Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:
<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
schemaLocation="http://xmlns.oracle.com/oci/oraaccess
http://xmlns.oracle.com/oci/oraaccess.xsd">
<default_parameters>
<prefetch>
<rows>1000</rows>
</prefetch>
</default_parameters>
</oraaccess>
If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
cx_Oracle needs to be using Oracle Client 12c, or later, libraries.
Experiment with different sizes.
See OCI Client-Side Deployment Parameters Using oraaccess.xml.

How to use regular expressions properly on SQL files?

I have a lot of undocumented and uncommented SQL queries. I would like to extract some information from the SQL statements. In particular, I'm interested in DB names, table names and, if possible, column names. The queries usually have the following syntax.
SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'
Usually, the statements involve several DBs and tables. I would like to extract only the DBs and tables, without any other information. I wondered whether it is possible to first extract the information that begins after FROM, JOIN, and LEFT JOIN. Here it is usually db.table; letters such as o, t, and s are aliases for tables that are already referenced. I suppose they are difficult to capture. What I tried, without any success, is something like:
gsub(".*FROM \\s*|WHERE|ORDER|GROUP.*", "", vec)
I assumed that each statement ends with WHERE/where or ORDER/order or GROUP... But that doesn't work as expected.
You haven't indicated which database system you are using but virtually all such systems have introspection facilities that would allow you to get this information a lot more easily and reliably than attempting to parse SQL statements. The following code which supposes SQLite can likely be adapted to your situation by getting a list of your databases and then looping over the databases and using dbConnect to connect to each one in turn running code such as this:
library(gsubfn)
library(RSQLite)
con <- dbConnect(SQLite()) # use in memory database for testing
# create two tables for purposes of this test
dbWriteTable(con, "BOD", BOD, row.names = FALSE)
dbWriteTable(con, "iris", iris, row.names = FALSE)
# get all table names and columns
tabinfo <- Map(function(tab) names(fn$dbGetQuery(con, "select * from $tab limit 0")),
dbListTables(con))
dbDisconnect(con)
giving an R list whose names are the table names and whose entries are the column names:
> tabinfo
$BOD
[1] "Time" "demand"
$iris
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
or perhaps long form output is preferred:
setNames(stack(tabinfo), c("column", "table"))
giving:
column table
1 Time BOD
2 demand BOD
3 Sepal.Length iris
4 Sepal.Width iris
5 Petal.Length iris
6 Petal.Width iris
7 Species iris
You could use the stringi package for this.
library(stringi)
# Your string vector
myString <- "SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'"
# Three stringi functions are used:
# stri_extract_all_regex extracts the strings that have FROM or JOIN followed by some text up to the next whitespace
# stri_replace_all_regex replaces each FROM or JOIN followed by a space with an empty string
# stri_unique keeps only the unique strings
t <- stri_unique(stri_replace_all_regex(stri_extract_all_regex(myString, "((FROM|JOIN) [^\\s]+)", simplify = TRUE),
"(FROM|JOIN) ", ""))
> t
[1] "mydb.table1" "mydb.sometable" "otherdb.sometable"
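The same regular-expression idea can be sketched in Python with the standard re module. This is an illustration under the same assumption as above, namely that every table reference directly follows FROM or JOIN and contains no whitespace; the function name extract_tables is hypothetical, and subqueries, quoted identifiers, or comma-separated table lists are not handled.

```python
import re

sql = """SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'"""

# Capture the first whitespace-free token after FROM or JOIN (any letter case)
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+(\S+)", re.IGNORECASE)

def extract_tables(statement):
    """Return the distinct db.table references in order of first appearance."""
    seen = []
    for name in TABLE_PATTERN.findall(statement):
        if name not in seen:
            seen.append(name)
    return seen

tables = extract_tables(sql)
# The database name is the part before the dot, when there is one
databases = sorted({name.split(".")[0] for name in tables if "." in name})
print(tables)
print(databases)
```

Because only the token after the keyword is captured, the aliases (m, o, t, s) are dropped automatically.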

Cursor execution - Python

I am working on a Django project that uses a MySQL database. In my project, one of the methods uses two cursors to execute its queries. Both are closed correctly.
The MySQL database has a table called GeometryTable with two columns of geometry data types: GeomPolygon (polygon values) and GeomPoint (point data). I wrote a MySQL query in my Python project that returns the points and polygons lying within a given polygon. The table (GeometryTable) has 9 million rows.
When I run the query in MySQL Workbench, it takes a few seconds. But in the project, it takes several minutes to return the values. Can anyone please help me optimize the code? Thanks.
The method is :
def GeometryShapes(polygon):
    # Build a WKT polygon string from a sequence of (x, y) pairs
    geom = 'POLYGON((' + ','.join(['%s %s' % v for v in polygon]) + '))'
    query1 = 'SELECT GeomId, AsText(GeomPolygon)' \
             ' FROM GeometryTable' \
             ' WHERE MBRWithin(GeomPolygon, GeomFromText("%s")) AND length(GeomPolygon) < 10000000' % (geom)
    query2 = 'SELECT GeomId, AsText(GeomPoint)' \
             ' FROM GeometryTable' \
             ' WHERE MBRWithin(GeomPoint, GeomFromText("%s")) AND length(GeomPoint) < 10000000' % (geom)
    if query1:
        cursor = connection.cursor()
        cursor.execute(query1)
        .....
        cursor.close()
    if query2:
        cursor = connection.cursor()
        cursor.execute(query2)
        .......
        cursor.close()
Here, the cursor execution (cursor.execute(query1) and cursor.execute(query2)) takes several minutes. I have indexed both columns in the specified table.
Can anyone help me to optimize the code?
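For reference, the WKT string that the method builds from the polygon argument can be checked on its own. The following is a minimal sketch with a hypothetical helper name, polygon_to_wkt; it assumes the input is a sequence of (x, y) pairs and, as an addition not present in the code above, closes the ring when the first and last points differ, since a WKT polygon ring is expected to end on its starting point.

```python
def polygon_to_wkt(points):
    """Build a WKT POLYGON string from (x, y) pairs, closing the ring if needed."""
    pts = [tuple(p) for p in points]
    if len(pts) < 3:
        raise ValueError("a polygon needs at least three points")
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # repeat the first point so the ring is closed
    return "POLYGON((" + ",".join("%s %s" % p for p in pts) + "))"

wkt = polygon_to_wkt([(0, 0), (0, 1), (1, 1)])
print(wkt)
```

Printing the generated string and pasting it into MySQL Workbench is one way to confirm that both environments run the same query.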

SQL Comparison to a value in the next row

I have been a long-time reader of this forum and it has helped me a lot. However, I have a question for which I can't find a solution specific to my requirements, so this is the first time I have had to ask anything.
I have a select statement which returns meter readings sorted by date (the newest readings at the top). In 99.9% of cases the meter readings go up as the date moves on; however, due to system errors, some occasionally go down. I need to identify instances where the reading in the row below (the previous reading) is GREATER than the latest reading (the current row).
I have come across the LEAD function; however, it's only in Oracle or SS-MS-2012, and I'm using SS-MS-2008.
Here is a simplified version of my select statement:
SELECT Devices.SerialNumber,
MeterReadings.ScanDateTime,
MeterReadings.TotalMono,
MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN DBO.Devices AS Devices
ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.serialnumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
AND Meterreadings.Scandatetime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC
This is the code I used in the end:
WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
WHERE m.ScanDateTime > '2012-01-01'
)
SELECT top 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
and p.ScanDateTime < r.ScanDateTime
and p.TotalMono > r.TotalMono
order by r.serialnumber, p.TotalMono desc, r.TotalMono asc
Try something like this.
;WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
AND p.ScanDateTime < r.ScanDateTime
WHERE p.TotalMono > r.TotalMono
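The self-join idea can be tried on a small data set. The sketch below uses Python's standard-library sqlite3 module instead of SQL Server, which is an assumption made only so the example runs anywhere; the sample rows are invented for illustration, and only TotalMono is compared. Note that the join matches any earlier reading that is higher, not just the immediately preceding one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INTEGER PRIMARY KEY, SerialNumber TEXT);
CREATE TABLE MeterReadings (
    DeviceID INTEGER, ScanDateTime TEXT, TotalMono INTEGER, TotalColour INTEGER
);
INSERT INTO Devices VALUES (1, 'SN-A');
-- The third reading drops from 150 to 140: this is the anomaly to detect
INSERT INTO MeterReadings VALUES
    (1, '2012-01-01', 100, 10),
    (1, '2012-01-02', 150, 12),
    (1, '2012-01-03', 140, 15),
    (1, '2012-01-04', 160, 18);
""")

query = """
WITH readings AS (
    SELECT d.SerialNumber, m.TotalMono, m.TotalColour, m.ScanDateTime
    FROM MeterReadings m
    INNER JOIN Devices d ON m.DeviceID = d.DeviceID
)
SELECT r.SerialNumber, r.ScanDateTime, r.TotalMono, p.ScanDateTime, p.TotalMono
FROM readings r
JOIN readings p ON p.SerialNumber = r.SerialNumber
               AND p.ScanDateTime < r.ScanDateTime
WHERE p.TotalMono > r.TotalMono
ORDER BY r.ScanDateTime, p.ScanDateTime
"""
# Each row pairs a reading (r) with an earlier reading (p) that was higher
rows = conn.execute(query).fetchall()
print(rows)
```

Each result row holds the suspicious reading followed by the earlier, higher reading that exposes it.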