Wrong field names on PostgreSQL with ADO (C++)

I'm trying to get data from a PostgreSQL database using ADO and the official PostgreSQL ODBC driver. In some cases I get wrong field names when using NextRecordset(). It looks like a bug in the driver. Is there any workaround for this?
Here is a small example. It prints 'g1' for the last field of the second recordset, but it should print 'g2'.
SQL
Create Table Test ("Field" int);
C++
_bstr_t strCnn("Provider='MSDASQL';Driver=PostgreSQL Unicode;uid=postgres;Server=127.0.0.1;port=5432;database=MyDB;pwd=password;");
_RecordsetPtr pRstCompound = NULL;
TESTHR(pRstCompound.CreateInstance(__uuidof(Recordset)));
auto Statement =
"Select '1' a1, '2' b1, '3' c1, '4' d1, '5' e1, '6' f1, \"Field\" g1 From Test;\n"
"Select '1' a2, '2' b2, '3' c2, '4' d2, '5' e2, \"Field\" f2, '7' g2 From Test;\n";
pRstCompound->Open(Statement, strCnn, adOpenForwardOnly, adLockReadOnly, adCmdText);
int intCount = 1;
while (!(pRstCompound == NULL)) {
    printf("\n\nContents of recordset #%d\n", intCount++);
    auto Fields = pRstCompound->Fields;
    long const nFields = Fields->Count;
    // print the field names of the current recordset, tab-separated
    for (long nField = 0; nField < nFields; ++nField)
        printf("%s%s",
            (LPCSTR)(_bstr_t)Fields->GetItem(nField)->Name,
            nField + 1 == nFields ? "\n" : "\t");
    pRstCompound = pRstCompound->NextRecordset(nullptr);
}
Output:
Contents of recordset #1
a1 b1 c1 d1 e1 f1 g1
Contents of recordset #2
a2 b2 c2 d2 e2 f2 g1
Expected output:
Contents of recordset #1
a1 b1 c1 d1 e1 f1 g1
Contents of recordset #2
a2 b2 c2 d2 e2 f2 g2


Is it possible to use arrayformula to copy the values above if the cell on its left is empty?

I'm trying to populate a column (starting at cell C2) with the value of the cell to its left (B2) if B2 is not empty; if B2 is empty, then C2 should equal C1.
Likewise, if B3 is not empty, C3 equals B3; if B3 is empty, C3 equals C2.
I tried using an array formula, but it returns a circular reference error. And I certainly can't use INDEX and INDIRECT either.
Please help.
The sample file:
https://docs.google.com/spreadsheets/d/1TXz5m5LtTF632bMrwIIDlO-4NMavgs6HA4FE8_BFhRs/edit?usp=sharing
Paste this into cell C3:
=ARRAYFORMULA(IF(D3:D<>"", IF(ROW(B3:B) <= MAX(IF(D3:D<>"", ROW(B3:B))),
TEXT(VLOOKUP(ROW(B3:B), FILTER({ROW(B3:B), B3:B}, LEN(B3:B)), 2),
"dd mmm yyyy"), ), ))
You can use vlookup on the row number:
=ArrayFormula(to_date(if(D3:D="","",vlookup(row(A3:A),if(B3:B<>"",{row(A3:A),B3:B}),2,true))))
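To see why a sorted lookup on the row number fills the gaps, here is a minimal Python sketch of the same idea. It is not part of the original answers; the function name `fill_down` and the sample values are hypothetical, and it ignores the extra check on column D that the formulas perform.

```python
from bisect import bisect_right


def fill_down(values):
    """Replace each empty string with the nearest non-empty value above it."""
    # Row numbers that actually hold a value, in ascending order
    # (the equivalent of the filtered {row, value} lookup table).
    filled_rows = [i for i, v in enumerate(values) if v != ""]
    result = []
    for row in range(len(values)):
        # Approximate match: the last filled row that is <= the current row.
        pos = bisect_right(filled_rows, row)
        result.append(values[filled_rows[pos - 1]] if pos else "")
    return result


filled = fill_down(["a", "", "", "b", ""])
print(filled)
```

Because the lookup table only contains rows with a value, each row resolves to the closest filled row at or above it, with no reference back to the output column and therefore no circular reference.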

Nested IF AND OR in Excel

How do I write the formula for the following:
if B2 = "SF" and D2 = "1"
then H2 = E2 + .75
else if B2 = "SF" and D2 = ".25"
then H2 = E2 + .625
else if B2 = "CW" and D2 = "1"
then H2 = E2 + 1
I want my answers to be in H2, with data being entered into B2, D2 and E2.
=IF(AND(B2="SF",D2=1),E2+0.75,IF(AND(B2="SF",D2=0.25),E2+0.625,IF(AND(B2="CW",D2=1),E2+1,0)))
Regards
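For readers who want to check the branching outside the spreadsheet, the same decision logic can be sketched in Python. This is only an illustration, not part of the answer: the function name `h2` and the lookup-table layout are hypothetical, and the fallback of 0 mirrors the final argument of the formula above.

```python
def h2(b2, d2, e2):
    """Mirror the nested IF: add an increment chosen by the (B2, D2) pair."""
    increments = {
        ("SF", 1): 0.75,
        ("SF", 0.25): 0.625,
        ("CW", 1): 1,
    }
    if (b2, d2) in increments:
        return e2 + increments[(b2, d2)]
    return 0  # no condition matched, same as the last branch of the formula


print(h2("SF", 1, 2), h2("SF", 0.25, 2), h2("CW", 1, 2), h2("CW", 0.25, 2))
```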

Reading csv with several subgroups

I have a csv file that contains "pivot-like" data that I would like to store in a pandas DataFrame. The original data file uses a varying number of leading spaces to indicate the level in the pivot data, like so:
Text that I do not want to include,,
,Text that I do not want to include,Text that I do not want to include
,header A,header B
Total,100,100
A,,2.15
a1,,2.15
B,,0.22
b1,,0.22
" slightly longer name"...,,0.22
b3,,0.22
C,71.08,91.01
c1,57.34,73.31
c2,5.34,6.76
c3,1.33,1.67
x1,0.26,0.33
x2,0.26,0.34
x3,0.48,0.58
x4,0.33,0.42
c4,3.52,4.33
x5,0.27,0.35
x6,0.21,0.27
x7,0.49,0.56
x8,0.44,0.47
x9,0.15,0.19
x10,,0.11
x11,0.18,0.23
x12,0.18,0.23
x13,0.67,0.85
x14,0.24,0.2
x15,0.68,0.87
c5,0.48,0.76
x16,,0.15
x17,0.3,0.38
x18,0.18,0.23
d2,6.75,8.68
d3,0.81,1.06
x19,0.3,0.38
x20,0.51,0.68
Others,24.23,0
N/A,,
"Text that I do not want to include(""at all"") ",,
(It looks awful, but you should be able to paste it into e.g. Notepad to see it a bit more clearly)
Basically, there are only two columns, a and b, but the rows are indented by 0, 3, 6, 9, ... spaces to differentiate between the levels. So, for instance,
zero level, the main group, A has 0 spaces,
first level a1 has 3 spaces,
second level a2 has 6 spaces,
third level a3 has 9 spaces and
the fourth and final level has 12 spaces, with the corresponding values for columns a and b.
I would now like to be able to read and group this data on these levels in order to create a new summarizing DataFrame, with columns corresponding to these different levels, looking like:
Level 4  Diff(a,b)  Level 0  Level 1  Level 2  Level 3
x7       525        C        c1       c2       c3
x5       -0.03      A        a1       a22      NaN
x4       -0.04      A        a1       a22      NaN
x8       -0.08      C        c1       c2       c3
…
Any clue on how to do this?
Thanks
The easiest approach is to split this into separate steps:
read the file
parse the lines
generate the 'tree'
construct the DataFrame
Parse the lines
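def parse_file(file):
    import ast
    import re
    # optional indentation, a label, then the two numeric columns
    pat = re.compile(r'^( *)(\w+),([\d.]+),([\d.]+)$')
    for line in file:
        r = pat.match(line)
        if r:  # lines that do not match (e.g. a missing value) are skipped
            spaces, label, a, b = r.groups()
            diff = ast.literal_eval(a) - ast.literal_eval(b)
            # three spaces of indentation per level
            yield len(spaces)//3, label, diff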
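This reads each line and uses a regular expression to yield the level, the label and the difference a - b. I use ast to convert the strings to int or float. Lines that do not match the pattern, including rows where one of the two values is missing, are skipped.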
Generate the tree
def parse_lines(lines):
    previous_label = list(range(5))
    for level, label, diff in lines:
        previous_label[level] = label
        if level == 4:
            yield tuple(previous_label), diff
This initializes a list of length 5 and then overwrites the entry for the level the current node is on. A full tuple of labels is yielded only for level-4 rows.
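The two generators above can be tried on a small sample without pandas. The following is a minimal, standard-library-only sketch under some assumptions: the sample text is made up to follow the 3-spaces-per-level layout described in the question, `float` is swapped in for `ast.literal_eval`, and the `depth` parameter is a hypothetical addition.

```python
import io
import re

# Optional indentation, a label, then two numeric columns.
PATTERN = re.compile(r'^( *)(\w+),([\d.]+),([\d.]+)$')


def parse_file(file):
    """Yield (level, label, a - b) for every line that has both values."""
    for line in file:
        match = PATTERN.match(line)
        if match:
            spaces, label, a, b = match.groups()
            yield len(spaces) // 3, label, float(a) - float(b)


def parse_lines(lines, depth=5):
    """Yield (labels of all levels, diff) for each row on the deepest level."""
    labels = [None] * depth
    for level, label, diff in lines:
        labels[level] = label
        if level == depth - 1:
            yield tuple(labels), diff


# Hypothetical sample: 0, 3, 6, 9 and 12 leading spaces mark levels 0 to 4.
SAMPLE = (
    "C,71.08,91.01\n"
    "   c1,57.34,73.31\n"
    "      c2,5.34,6.76\n"
    "         c3,1.33,1.67\n"
    "            x1,0.26,0.33\n"
    "            x2,0.26,0.34\n"
    "         c4,3.52,4.33\n"
    "            x5,0.27,0.35\n"
    "            x10,,0.11\n"
)

rows = list(parse_lines(parse_file(io.StringIO(SAMPLE))))
for labels, diff in rows:
    print(labels, round(diff, 2))
```

The `x10` row has no value for the first column, so the pattern does not match and the row is dropped, which is consistent with it being absent from the answer's output.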
Construct the DataFrame
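from io import StringIO
import pandas as pd

# file_content is assumed to hold the text of the csv file
with StringIO(file_content) as file:
    lines = parse_file(file)
    # consume the generators while the file is still open
    index, data = zip(*parse_lines(lines))
idx = pd.MultiIndex.from_tuples(index, names=[f'level_{i}' for i in range(len(index[0]))])
df = pd.DataFrame(data={'Diff(a,b)': list(data)}, index=idx)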
This opens the file, constructs the index and generates the DataFrame with the different levels in the index. If you don't want the levels in the index, you can add a .reset_index() or construct the DataFrame slightly differently.
df
level_0  level_1  level_2  level_3  level_4  Diff(a,b)
A        a1       a2       a3       x1       -0.07
A        a1       a2       a3       x2       -0.08000000000000002
A        a1       a22      a3       x3       -0.04999999999999999
A        a1       a22      a3       x4       -0.04000000000000001
A        a1       a22      a3       x5       -0.03
A        a1       a22      a3       x6       -0.06999999999999998
C        c1       c2       c3       x7       525.0
C        c1       c2       c3       x8       -0.08000000000000002
Alternative for missing levels
def parse_lines(lines):
    labels = [None] * 5
    previous_level = None
    for level, label, diff in lines:
        labels[level] = label
        if level == 4:
            # the parent sits above level 3: blank out the skipped levels
            if previous_level is not None and previous_level < 3:
                labels = labels[:previous_level + 1] + [None] * (4 - previous_level)
                labels[level] = label
            yield tuple(labels), diff
        previous_level = level
The items under a22 don't seem to have a level_3, so the first version copies that label from the previous branch. If this is unwanted, use this alternative parse_lines, which fills the missing levels with None instead.
df
level_0  level_1  level_2  level_3  level_4  Diff(a,b)
C        c1       c2       c3       x1       -0.07
C        c1       c2       c3       x2       -0.08000000000000002
C        c1       c2       c3       x3       -0.09999999999999998
C        c1       c2       c3       x4       -0.08999999999999997
C        c1       c2       c4       x5       -0.07999999999999996
C        c1       c2       c4       x6       -0.060000000000000026
C        c1       c2       c4       x7       -0.07000000000000006
C        c1       c2       c4       x8       -0.02999999999999997
C        c1       c2       c4       x9       -0.04000000000000001
C        c1       c2       c4       x11      -0.05000000000000002
C        c1       c2       c4       x12      -0.05000000000000002
C        c1       c2       c4       x13      -0.17999999999999994
C        c1       c2       c4       x14      0.03999999999999998
C        c1       c2       c4       x15      -0.18999999999999995
C        c1       c2       c5       x17      -0.08000000000000002
C        c1       c2       c5       x18      -0.05000000000000002
C        c1       d2       d3       x19      -0.08000000000000002
C        c1       d2       d3       x20      -0.17000000000000004
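Another way to handle missing levels is to clear every deeper label as soon as a node is seen, so a stale label can never leak into a later row. This is a different technique from the answer's `previous_level` bookkeeping, shown only as a sketch; the name `parse_lines_reset`, the `depth` parameter and the sample tuples are hypothetical.

```python
def parse_lines_reset(lines, depth=5):
    """Yield (labels, diff) for deepest-level rows, with skipped levels as None."""
    labels = [None] * depth
    for level, label, diff in lines:
        labels[level] = label
        # A node at this level starts a new branch: forget everything below it.
        for deeper in range(level + 1, depth):
            labels[deeper] = None
        if level == depth - 1:
            yield tuple(labels), diff


# (level, label, diff) tuples as parse_file would yield them; values are made up.
sample = [
    (0, 'A', 0.0),
    (1, 'a1', 0.0),
    (2, 'a2', 0.0),
    (3, 'a3', 0.0),
    (4, 'x1', -0.07),
    (2, 'a22', 0.0),   # no level-3 node follows under a22
    (4, 'x3', -0.05),
]

result = list(parse_lines_reset(sample))
for labels, diff in result:
    print(labels, diff)
```

One caveat of this sketch: a parent is only noticed if its own row passes through `parse_file`, so a parent row with a missing value would still be skipped by the regular expression.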

Python Pandas Dataframe merge and pick only few columns

I have a basic question on dataframe merges. After I merge two dataframes, is there a way to pick only a few columns in the result?
Taking an example from the documentation:
https://pandas.pydata.org/pandas-docs/stable/merging.html#
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
result = pd.merge(left, right, on=['key1', 'key2'])
The result is:
    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2
None
Is there a way I can choose only column 'C' from the 'right' dataframe? For example, I would like my result to be:
    A   B key1 key2   C
0  A0  B0   K0   K0  C0
1  A2  B2   K1   K0  C1
2  A2  B2   K1   K0  C2
None
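Select only the columns you need from right (the merge keys plus 'C') before merging:
result = pd.merge(left, right[['key1','key2','C']], on=['key1', 'key2'])
Or merge first and then pick the columns from the result:
right.merge(left, on=['key1','key2'])[['A','B','C','key1','key2']]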
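As a quick check of the first approach, here is a runnable sketch using the frames from the question. The helper names `keys` and `wanted` are hypothetical; the only pandas calls used are `DataFrame` and `DataFrame.merge`.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

keys = ['key1', 'key2']   # columns to join on
wanted = ['C']            # columns to keep from the right frame

# Trim the right frame first, so 'D' never enters the merged result.
result = left.merge(right[keys + wanted], on=keys)
print(result)
```

Trimming before the merge keeps the intermediate result small, which may matter when the right frame has many columns; the order of the columns in the printed result can differ between pandas versions.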

Redshift: insert column C1 of table T1 into column C2 of Table T2

I have two tables:
T1 with columns A1, A2, A3, A4,...., A20.
T2 with columns B1, B2, B3,...., B15.
The data type of all columns is varchar.
I want to copy all values of column range A1-A10 to B1-B10. How do I do so in Redshift? I tried:
insert into T2(B1,B2,...,B10) select A1 A2 A3 ... A10 from T1
but it failed, even after I corrected errors such as a missing ) and a . (dot) in a column name.
How can I insert selected columns from one table into another? Is there any other way to do that?
You need to separate the selected columns with commas: insert into T2 (select A1, A2, ..., A10 from T1).
I tested with the following queries and they worked fine for me:
create temp table T1 (a varchar(5), b varchar(5), c varchar(5), d varchar(5), e varchar(5));
insert into T1 values ('t11', 't12', 't13', 't14', 't15');
create temp table T2 (a varchar(5), b varchar(5), c varchar(5));
insert into T2 values ('t21', 't22', 't23');
insert into T2 (select a, b, c from T1);
select * from T2;
The last line correctly printed the following:
t21 t22 t23
t11 t12 t13
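When the target table has more columns than the select list, naming the target columns explicitly makes the mapping unambiguous. The sketch below illustrates the `INSERT INTO ... (columns) SELECT ...` form with Python's built-in sqlite3 module. SQLite is only a stand-in here and the behaviour on Redshift is an assumption that should be verified there; the table layouts are shortened, hypothetical versions of T1 and T2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (A1 TEXT, A2 TEXT, A3 TEXT, A4 TEXT);
    CREATE TABLE T2 (B1 TEXT, B2 TEXT, B3 TEXT);
    INSERT INTO T1 VALUES ('t11', 't12', 't13', 't14');
    -- Copy A1, A2 into B1, B2; note the commas in both column lists.
    INSERT INTO T2 (B1, B2) SELECT A1, A2 FROM T1;
""")

rows = conn.execute("SELECT B1, B2, B3 FROM T2").fetchall()
print(rows)
conn.close()
```

Columns of T2 that are not named in the insert (B3 here) are left at their default, which is NULL in this sketch.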