Compare an Excel/CSV line with SQL table columns - C++

I need some help or leads on a seemingly simple problem that I am quite stuck on (I'm still a beginner in C++/MySQL).
I have to fill a table in my MySQL database by importing data from a .csv file (that part works), but before doing the import I must compare the data on the first line, second, third, and so on until the end of the Excel file (these values correspond to the column names my table must have) with the column names of my MySQL table, in order to verify that the ordering is correct and that the right data goes in the right places.
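For reference, a minimal sketch of the query the C++ program could send to MySQL to fetch the table's column names in their defined order, ready to compare positionally against the CSV header line (the database and table names here are placeholders, not from the original post):

-- Option 1: quick listing, columns come back in table order
SHOW COLUMNS FROM my_table;

-- Option 2: information_schema, with the ordering made explicit
SELECT COLUMN_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'
  AND TABLE_NAME = 'my_table'
ORDER BY ORDINAL_POSITION;

The C++ side can then read the first line of the .csv, split it on the delimiter, and compare each name against this result in order.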

Related

Informatica PowerCenter can't read the file properly: all data appears in one column

I have a problem with Informatica PowerCenter. When I import data from a flat CSV file, all the data always appears in one column. I first have to edit the file and use "define name" in Excel before Informatica can read the data properly. How can I read the data properly in PowerCenter without first defining names in Excel?
Thank you
You need to ensure that:
You're importing the file definition as delimited. The flat file import wizard lets you define it as delimited.
While reading, set it so the column names are taken from the first row.
And then read the data from the second row onwards.
You can check this image:
https://2.bp.blogspot.com/-enDSMKLYyRY/UXADBtNE8WI/AAAAAAAAAu8/oVfr6IsAl8Y/s1600/8.jpg
If you set the properties above, Informatica should be able to read the definition properly and you won't have to set column names or data types manually.

How to properly import tsv to athena

I am following this example:
LazySimpleSerDe for CSV, TSV, and Custom-Delimited Files - TSV example
Summary of the code:
CREATE EXTERNAL TABLE flight_delays_tsv (
yr INT,
quarter INT,
month INT,
...
div5longestgtime INT,
div5wheelsoff STRING,
div5tailnum STRING
)
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://athena-examples-myregion/flight/tsv/';
My questions are:
My TSV does not have column names.
Is it OK if I just list the columns as c1, c2, ... and declare all of them as STRING?
I do not understand this:
PARTITIONED BY (year STRING)
in the example, the column ‘year’ is not listed in any of the columns…
Column names
The column names are defined by the CREATE EXTERNAL TABLE command. I recommend you name them something useful so that it is easier to write queries. The column names do not need to match any names in the actual file. (It does not interpret header rows.)
Partitioning
From Partitioning Data - Amazon Athena:
To create a table with partitions, you must define it during the CREATE TABLE statement. Use PARTITIONED BY to define the keys by which to partition data.
The field used to partition the data is NOT stored in the files themselves, which is why it does not appear in the column list. Rather, the column's value is stored in the name of the directory.
This might seem strange (storing values in a directory name!) but actually makes sense because it avoids situations where an incorrect value is stored in a folder. For example, if there is a year=2018 folder, what happens if a file contains a column where the year is 2017? This is avoided by storing the year in the directory name, such that any files within that directory are assigned the value denoted in the directory name.
Queries can still use WHERE year = 2018 even though it isn't listed as an actual column.
See also: LanguageManual DDL - Apache Hive - Apache Software Foundation
The other neat thing is that data can be updated by simply moving a file to a different directory. In this example, it would change the year value as a result of being in a different directory.
Yes, it's strange, but the trick is to stop thinking of it like a normal database and appreciate the freedom that it offers. For example, appending new data is as simple as dropping a file into a directory. No loading required!
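For example, adding one partition to the table above and then querying it might look like the following (the S3 path and the year value are placeholders):

-- Register a partition whose files live under a year=2018 directory
ALTER TABLE flight_delays_tsv
ADD IF NOT EXISTS PARTITION (year = '2018')
LOCATION 's3://athena-examples-myregion/flight/tsv/year=2018/';

-- year was declared as STRING, so compare against a string literal
SELECT quarter, COUNT(*) AS flights
FROM flight_delays_tsv
WHERE year = '2018'
GROUP BY quarter;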

Power Query - Select Columns from table instead of removing afterwards

The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse? ie Select which columns you want from a table? Preferably without using a Native SQL solution.
M:
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
remove_columns = Table.RemoveColumns(Sales_vDimCustomer,{"Key", "Code","Column1","Column2","Column3","Column4","Column5","Column6","Column7","Column8","Column9","Column10"})
in
remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't find much documentation on the }[Data] value in that step, so I was hoping I could hijack that field to specify which columns to pull from the data.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as watched in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database, it could break my report (additional columns in Excel tables shift structured-reference formulas to the wrong column). This is not a problem with native SQL, as it simply won't select a new column, and it would actually fail if a column were removed, which is something I would want to know about.
Ouch that was actually easy after I had another think and a look at the docs.
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = Table.SelectColumns(
db{[Schema="Sales",Item="vDimCustomer"]}[Data],
{
"Name",
"Representative",
"Status",
"DateLastModified",
"UserLastModified",
"ExtractionDate"
}
)
in
Sales_vDimCustomer
This also loaded much faster than the other way and only generated one SQL request instead of two.

Bad Data in linked table

I pull data into SQL Server from a COBOL database that is connected as a linked server.
We have ended up with bad data in one of our tables, and I am trying to track down the offending record. Specifically, we have a letter entered into a year field; when SQL Server pulls the data over, it attempts to convert that column to a numeric data type.
I believe what I need is a combination of OPENQUERY and CAST to select all columns, with at least that specific column as varchar, so that I can retrieve the offending record and have the department fix the error.
I have tried the following two statements, but both produce an error.
select * from [incode]...ctvehl
where VEH_YEAR like '992D'
select * from openquery (incode, 'select cast(* as nvarchar) from ctvehl')
For clarity:
linked server name = incode
table name = CTVEHL
Specific offending column = VEH_YEAR
Assistance with this would be greatly appreciated.
Thanks
You could just initially insert the data into a work table within SQL Server that has all varchar() columns. You could then validate and parse the work table for possible errors, moving the bad rows to an "error" table for other processing/reporting. Then insert the remaining rows into your actual table.
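For example, a rough sketch of that work-table approach; the work-table name, column sizes, and the non-numeric check are assumptions for illustration, and the remaining columns of ctvehl would need to be added:

-- Work table with every column declared varchar (only the year column shown)
CREATE TABLE dbo.ctvehl_work (
VEH_YEAR varchar(50)
-- add the remaining ctvehl columns here, all as varchar
);

INSERT INTO dbo.ctvehl_work (VEH_YEAR)
SELECT VEH_YEAR
FROM [incode]...ctvehl;

-- Rows whose year contains anything other than digits are the offending records
SELECT *
FROM dbo.ctvehl_work
WHERE VEH_YEAR LIKE '%[^0-9]%';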
You should also look into SQL Server Integration Services; it offers ways to mass-import data and handle bad rows. See: SQL Server Integration Services Dealing with Bad Data

Postgres Copy select rows from CSV table

This is my first post to Stack Overflow. Your forum has been SO very helpful while I've been learning Python and Postgres on the fly for the last 6 months that I haven't needed to post yet. But this task is tripping me up, and I figure I need to start earning reputation points:
I am creating a Python script for backing up data into a SQL database daily. I have a CSV file with an entire month's worth of hourly data, but I only want to select a single day of data from the file and copy those selected rows into my database. Am I able to query the CSV table and append the query results into my database? For example:
sys.stdin = open('file.csv', 'r')
cur.copy_expert("COPY table FROM STDIN
SELECT 'yyyymmddpst LIKE 20140131'
WITH DELIMITER ',' CSV HEADER", sys.stdin)
This code and other variations aren't working out - I keep getting syntax errors. Can anyone help me out with this task? Thanks!!
You need to create a temporary table first:
cur.execute('CREATE TEMPORARY TABLE "temp_table" (LIKE "your_table") WITH OIDS')
Then copy the data from the CSV:
cur.execute("COPY temp_table FROM '/full/path/to/file.csv' WITH CSV HEADER DELIMITER ','")
Insert the necessary records:
cur.execute("INSERT INTO your_table SELECT * FROM temp_table WHERE yyyymmddpst LIKE '20140131'")
And don't forget to do conn.commit().
The temp table will be destroyed after cur.close().
You can COPY (SELECT ...) TO an external file, because PostgreSQL just has to read the rows from the query and send them to the client.
The reverse is not true. You can't COPY (SELECT ...) FROM .... If it were a simple SELECT, PostgreSQL could try to pretend it was a view, but really it doesn't make much sense, and in any case it would apply to the target table, not the source rows. So the code you wrote wouldn't do what you think it does, even if it worked.
In this case you can create an unlogged or temporary table, copy the full CSV to it, and then use SQL to extract just the rows you want, as pointed out by Dmitry.
An alternative is to use the file_fdw to map the CSV file as a table. The CSV isn't copied, it's just read on demand. This lets you skip the temporary table step.
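For example, a minimal file_fdw sketch might look like the following; the foreign table's column list, the file path, and the target table name are assumptions and would need to mirror your real CSV layout:

CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Foreign table that reads the CSV on demand; no data is copied yet
CREATE FOREIGN TABLE monthly_data_csv (
yyyymmddpst text,
reading numeric  -- placeholder for the remaining CSV columns
) SERVER csv_files
OPTIONS (filename '/full/path/to/file.csv', format 'csv', header 'true');

-- Pull only the day you want into the real table
INSERT INTO your_table
SELECT * FROM monthly_data_csv
WHERE yyyymmddpst LIKE '20140131';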
From PostgreSQL 12 you can add a WHERE clause to your COPY statement and you will get only the rows that match the condition.
So your COPY statement could look like:
COPY table
FROM '/full/path/to/file.csv'
WITH( FORMAT CSV, HEADER, DELIMITER ',' )
WHERE yyyymmddpst LIKE '20140131'