Mass data insert into SQL Server? - c++

I've got 8 worksheets within an Excel workbook that I'd like to import into separate tables within a SQL Server DB.
I'd like to import each of the 8 worksheets into a separate table, ideally, with table names coinciding with worksheet tab names, but initially, I just want to get the data into the tables, so arbitrary table names work for the time being too.
The format of the data in each of the worksheets (and, by extension, each table) is identical, so I'm thinking some kind of loop could be used to do this.
Data looks like this:
Universe Date Symbol Shares MktValue Currency
SMALLCAP 6/30/2011 000360206 27763 606361.92 USD
SMALLCAP 6/30/2011 000361105 99643 2699407.52 USD
SMALLCAP 6/30/2011 00081T108 103305 810926.73 USD
SMALLCAP 6/30/2011 000957100 57374 1339094.76 USD
And table format in SQL would/should be consistent with the following:
CREATE TABLE dbo.[market1] (
[Universe_ID] char(20),
[AsOfDate] smalldatetime,
[Symbol] nvarchar(20),
[Shares] decimal(20,0),
[MktValue] decimal(20,2),
[Currency] char(3)
)
I'm open to doing this using either SQL/VBA/C++ or some combination (as these are the languages I know and have access to). Any thoughts on how to best go about this?

You could use SSIS or DTS packages to import them. Here are a couple references to get you going.
Creating a DTS Package - pre 2005
Creating a SSIS Package - 2005 forward

For Excel files (2007 or 2010) with an .xlsx extension, I have renamed them to .zip, extracted their contents into a directory, and used SQL XML Bulk Load to import the sheets and reference tables. When I have all the data in SQL Server, I use basic SQL queries to extract/transform the data needed into designated worksheets. -- This keeps the "digestion" logic in SQL and requires minimal external VBScript or C# development.
Link to SQL Bulk Load of XML data: http://support.microsoft.com/kb/316005

In SQL Management Studio, right-click on a database, then click Tasks, then Import Data. This will take you through some screens and create an SSIS package to import the file. At some point in the process it will ask you if you want to save the package (I would run it a few times as well to make sure it imports your data the way you want it). Save it and then you can schedule the package to be run as a Job via the SQL Server Agent. (The job type will be SQL Server Integration Services.)

You can use the following script:
SELECT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\test\xltest.xls;Extended Properties=Excel 8.0')...[Customers$]
SELECT * INTO XLImport4 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\test\xltest.xls', [Customers$])
SELECT * INTO XLImport5 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\test\xltest.xls', 'SELECT * FROM [Customers$]')
Or you can use the following (OPENQUERY requires a linked server, here named EXCELLINK):
SELECT * INTO XLImport2 FROM OPENQUERY(EXCELLINK,
'SELECT * FROM [Customers$]')
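Since all eight worksheets share the same layout, the OPENROWSET approach above can be wrapped in a loop with dynamic SQL so each sheet lands in a table named after its tab. This is only a sketch: the sheet names, file path, and Jet provider string are assumptions you will need to adjust, and ad hoc distributed queries must be enabled on the server.
DECLARE @sheets TABLE (SheetName sysname);
-- list all eight worksheet tab names here (these are placeholders)
INSERT INTO @sheets VALUES ('SMALLCAP'), ('MIDCAP'), ('LARGECAP');

DECLARE @sheet sysname, @sql nvarchar(max);
DECLARE sheet_cursor CURSOR FOR SELECT SheetName FROM @sheets;
OPEN sheet_cursor;
FETCH NEXT FROM sheet_cursor INTO @sheet;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- SELECT ... INTO creates a table named after the worksheet tab
    SET @sql = N'SELECT * INTO dbo.' + QUOTENAME(@sheet) + N'
        FROM OPENROWSET(''Microsoft.Jet.OLEDB.4.0'',
                        ''Excel 8.0;Database=C:\test\xltest.xls'',
                        ''SELECT * FROM [' + @sheet + N'$]'')';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM sheet_cursor INTO @sheet;
END
CLOSE sheet_cursor;
DEALLOCATE sheet_cursor;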

Related

Loading multiple files from multiple paths to Big Query

I have a file structure such as:
gs://BUCKET/Name/YYYY/MM/DD/Filename.csv
Every day my cloud functions are creating another path with another file in it, corresponding to the date of the day (so for today, the 5th of August, we would have gs://BUCKET/Name/2022/08/05/Filename.csv).
I need to find a way to query this data from BigQuery automatically, so that if I want to query it for 'manual inspection' I can select, for example, data from all 3 months in one query, doing CREATE TABLE with gs://BUCKET/Name/2022/{06,07,08}/*/*.csv
How can I achieve this? I know that BigQuery does not support more than one wildcard, but maybe there is a way around it.
To query data inside GCS from BigQuery you can use an external table.
The problem is that this will fail, because you cannot have a comma (,) as part of the URI list:
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = ['gs://bucket/2022/{1,2,3}/data.csv']
)
You have to specify the 3 CSV file locations like this:
CREATE EXTERNAL TABLE `bigquerydevel201912.foobar`
OPTIONS (
format='CSV',
uris = [
'gs://inigo-test1/2022/1/data.csv',
'gs://inigo-test1/2022/2/data.csv',
'gs://inigo-test1/2022/3/data.csv']
)
Since you're using this sporadically, it probably makes more sense to create a temporary external table.
I found a solution that works, at least for my use case, without using an external table.
During table creation in a BigQuery dataset, use "Create table from: GCS" and, for the URI pattern, use gs://BUCKET/Name/2022/*. As long as the filename is the same in each subfolder and the schema is identical, BQ will load everything, and you can then perform date operations directly in BQ (I have a column with the ingestion date).
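If you prefer to do that same load in SQL rather than through the console, a LOAD DATA statement with the same single trailing wildcard should work; the dataset and table names below are made up for illustration:
LOAD DATA INTO `mydataset.filename_data`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,              -- assumes the CSVs have a header row
  uris = ['gs://BUCKET/Name/2022/*']  -- single trailing wildcard, as above
);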

How to create files having the date in the file name using the BigQuery EXPORT DATA statement

I am using the BigQuery EXPORT DATA statement to create files in Cloud Storage for another team to extract for further reprocessing. I am using the statement below (not pasting the SELECT query as it's huge).
EXPORT DATA OPTIONS(
uri='gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter='|') AS
SELECT
I see the below files getting created in my Cloud Storage bucket:
radhika_sharma_ibm#cloudshell:~ (whr-asia-datalake-nonprod)$ gsutil ls gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000000.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000001.csv
gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_000000000002.csv
I cannot remove the suffix part, as BigQuery creates it, but I am wondering if I can create files with the date in the file name so the other team can identify what date they were created for, something like:
Customer_Master_04022021_000000000000_.csv
I need to have a date in my file name. Any help or inputs, please?
Is there a workaround, or will I have to go with a Dataflow job here to extract the data from the table into a file?
You can build the uri value as:
'gs://bucket/folder/your_filename-'||current_datetime()||'-*.csv'
Either current_date() or current_datetime() can be used.
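Putting that suggestion back into the original statement, the export would look roughly like this; the FORMAT_DATE pattern is only one assumed way to get an MMDDYYYY-style name, and the SELECT is a placeholder for the original (large) query:
EXPORT DATA OPTIONS(
  -- date concatenated into the uri so each day's files are self-describing
  uri = 'gs://whr-asia-datalake-dev-standard/outbound/Adobe/Customer_Master_'
        || FORMAT_DATE('%m%d%Y', CURRENT_DATE()) || '_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = '|'
) AS
SELECT 1 AS placeholder_column;  -- replace with the original (large) SELECT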
Thanks

Combine or Simplify Web Service Queries in Power BI

I was wondering if there was a flexible way to set up multiple web service (API) queries in Power BI. The web service I am using is only capable of getting me one day of data for one location per query, and I need daily data for 10 locations, which means that in a standard 31-day month I would need to set up 310 queries. The data I am interested in are the Final LMPs, and the website I am pulling from is https://webservices.iso-ne.com/docs/v1.1/. An example of a working query in Power BI that grabs Final LMP data for just 02/01/2020 for location 4152 is:
https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/20200201/location/4152.xml
For such a case I would set up the following items:
A data source (e.g. manually via 'Enter Data') with 10 individual locations
A data source with the required date range (e.g. something like that)
A data source that combines the first two (cross join, e.g. an example)
Add a column (to source 3) that calls a 'custom function' that for each combined record (in step 3) makes an API request to retrieve the data. The function should take two parameters (location and the day) as the input
Thanks,
Mau
You'll need to create 10 queries, one for each of the locations; you can then use a variable to drive the date of extraction.
In the Advanced Editor, after the 'let' add:
var_date = Date.ToText(Date.From(DateTime.FixedLocalNow()), "yyyyMMdd")
This returns 20200221 today; when the query runs, it will be evaluated for the right day.
https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/20200201/location/4152.xml
Change it in your source to something like:
"https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/" & var_date & "/location/4152.xml"
If you need to offset the date, try DateTime.FixedLocalNow() -1 to get the previous day.

Building app to upload CSV to Oracle 12c database via Apex

I've been asked to create an app in Oracle Apex that will allow me to drop in a CSV file. The file contains a list of all active physicians and associated info in my area. I do not know where to begin! Requirements:
-after dropping CSV file to apex, remove unnecessary columns
-edit data in each field, i.e. if a phone # is > 7 characters and begins with 1, remove the 1; or remove all special characters from a column.
-The CSV contains physicians of every specialty, I only want to upload specific specialties to the database table.
I have a small amount of SQL experience from Uni, and I know some HTML and CSS, but beyond that I am lost. Please help!
I began a tutorial on Oracle Apex and created an upload wizard in a dev environment. The desired behavior:
User drops a CSV file into Apex
Apex edits columns to remove unnecessary characters
Only uploads specific columns from CSV file
Only adds data when column "Specialties" = specific specialties
Does not add redundant data (if the physician is already in the table, do nothing)
Produces report showing all new physicians added to table
Huh, you're in deep trouble as you have to do some job using a tool you don't know at all, with limited knowledge of SQL language. Yes, it is said that Apex is simple to use, but nonetheless ... you have to know at least something. Otherwise, as you said, you're lost.
See if the following helps.
there's the CSV file
create a table in your database; its description should match the CSV file. Mention all columns it contains. Pay attention to datatypes, column lengths and such
this table will be "temporary" - you'll use it every day to load data from CSV files: first you'll delete all it contains, then load new rows
using Apex "Create page" Wizard, create the "Data loading" process. Follow the instructions (and/or read documentation about it). Once you're done, you'll have 4 new pages in your Apex application
when you run it, you should be able to load the CSV file into that temporary table
That's the first stage - successfully load data into the database. Now, the second stage: fix what's wrong.
create another table in the database; it will be the "target" table and is supposed to contain only data you need (i.e. the subset of the temporary table). If such a table already exists, you don't have to create a new one.
create a stored procedure. It will read data from the temporary table and edit everything you've mentioned (remove special characters, remove leading "1", ...)
as you have to skip physicians that already exist in the target table, use NOT IN or NOT EXISTS
then insert "clean" data into the target table
That stored procedure will be executed after the Apex loading process is done; a simple way to do that is to create a button on the last page which will - when pressed - call the procedure.
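As a rough, hypothetical sketch of that stored procedure (the table names, column names, specialty list, and cleanup rules below are all made up and would need to match your actual data):
CREATE OR REPLACE PROCEDURE load_physicians AS
BEGIN
  INSERT INTO physicians_target (first_name, last_name, phone, specialty)
  SELECT t.first_name,
         t.last_name,
         -- strip non-digit characters, then drop a leading "1" from long numbers
         CASE
           WHEN LENGTH(REGEXP_REPLACE(t.phone, '[^0-9]', '')) > 7
                AND REGEXP_REPLACE(t.phone, '[^0-9]', '') LIKE '1%'
           THEN SUBSTR(REGEXP_REPLACE(t.phone, '[^0-9]', ''), 2)
           ELSE REGEXP_REPLACE(t.phone, '[^0-9]', '')
         END,
         t.specialty
  FROM   physicians_temp t
  WHERE  t.specialty IN ('CARDIOLOGY', 'ONCOLOGY')   -- only the specialties you want
    AND  NOT EXISTS (SELECT 1                        -- skip physicians already in the target
                     FROM   physicians_target p
                     WHERE  p.first_name = t.first_name
                     AND    p.last_name  = t.last_name);
  COMMIT;
END;
/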
The final stage is the report:
as you have to show new physicians, consider adding a column (to the target table) which will be a timestamp (perhaps DATE is enough, if you'll be doing it once a day) or a process_id (all rows inserted in the same process share the same value) so that you can distinguish newly added rows from the old ones (see the sketch after this list)
the report itself would be an Interactive Report. Why? Because it is easy to create and lets you (or end users) adjust it according to their needs (filter data, sort rows in a different manner, ...)
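For illustration only (table and column names are hypothetical), the timestamp column and the "new physicians" report query could be as simple as:
-- add a load timestamp to the target table
ALTER TABLE physicians_target ADD (loaded_at DATE DEFAULT SYSDATE);

-- Interactive Report source: only rows added by today's load
SELECT *
FROM   physicians_target
WHERE  loaded_at >= TRUNC(SYSDATE);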
Good luck! You'll need it.

SAS/ACCESS and data step on external DB

I have the following concern regarding the SAS/ACCESS facility.
Let's imagine that we have an external DB (e.g. Oracle) which we have assigned to a certain libname.
Next, we do a simple operation on one of the tables within this DB, e.g.:
data db.table_new;
set db.table_old(keep=var1 var2 var3);
if var1>0 then new_var1=5;
run;
My question is the following:
Will the whole table table_old be pulled from the external DB to the SAS server in order to process the data?
Will SAS/ACCESS transform the data step into a DBMS operation or SQL so that the whole processing is performed outside SAS?
The documentation is unclear about it. See page 62.
Usually the rule of thumb is: if the SAS functions used in the DATA step can be converted to native DB SQL functions, then SAS will let the DB server do the data processing. In your case, this seems to be the situation.
You can answer this question for any piece of code by setting the following trace options:
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
When you run the data step, check the log. You will see information about whether SAS is able to successfully translate the code or not. If it was unsuccessful, you will see:
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
If this occurs, SAS will usually send out a select * to the server and pull everything before filtering. When you see that message, try doing explicit passthrough, or redesign your query so that it can do everything on the server. It is possible to bring down the SAS server, or severely degrade performance on the Oracle server, if the table is large enough.
Some common functions you'll want to avoid using directly in the query, especially with Oracle:
datepart()
intnx()
intck()
today()
put()
input()
If I have to use any of those functions, I usually play it safe and create a macro variable for the static ones beforehand (e.g. today()), filter the raw data at the lowest level first to get it onto the SAS server, or use explicit SQL passthrough.
In summary, I would say it depends on your method. On the second page of Chapter 1 of the SAS/ACCESS 9.2 document in your link above, there are two methods (alongside the older DBLOAD procedure) of the SAS/ACCESS facility:
LIBNAME reference - assign SAS librefs to DBMS objects such as schemas and databases; you can then work with the table or view as you would with a SAS data set... You can use such SAS procedures as PROC SQL or DATA step programming on any libref that references DBMS data.
SQL Pass-through facility - interact with a data source using its native SQL syntax without leaving your SAS session. SQL statements are passed directly to the data source for processing... The DBMS optimizer can take advantage of indexes on DBMS columns to process a query more quickly and efficiently.
Hence, with the first method SAS handles the processing, and with the second method the DBMS handles the processing. As with most clients (a Java or C# program, a Python script, or a PHP web page) that connect to external RDBMS sources, unless a direct ODBC/OLEDB or other API connection is explicitly employed and the request sent through it, processing is handled on the frontend (i.e., calculating parameters) and the end result is updated to the backend via transactions. All of SAS's libraries live in memory (or on temporary disk) during the session and, depending on the code, SAS either handles the data itself and passes the results to the external source, or passes the data handling entirely to the other source.
Comparative Example: Microsoft Access
One good comparative example would be Microsoft Access, which, like SAS, provides a linked-table connection and pass-through query for any ODBC-compliant RDBMS, including SQL Server, Oracle, MySQL, etc. It is often a misnomer to tag Access as a database when actually it is a GUI program and a collection of objects, one of which is the default Windows JET/ACE engine (a .dll file), not at all restricted to Access but available to all Office programs. Notice the word default, as this can be switched out for any ODBC database source.
Linked tables are essentially Access GUI objects (specifically, special tabledefs), not unlike SAS's libname refs, that are loaded into a JET/ACE table container with the data pointing externally. One can then use a linked table like any other Access local table and use anything in the ACE SQL dialect. This special linked table (much like SAS's libname refs, established via ODBC or another connection type) points to the external source, and the driver translates the query commands for the migration action. Therefore, the exact same query may perform differently against an Access linked table than against the RDBMS directly.
Analogy
I imagine SAS behaves the same way, existing as a frontend with libname refs as local objects holding pointers to the backend. All DATA step handling is processed locally, and only the result sets are imported or extracted by the engine. To use an analogy: a database would be the home, and SAS is the garbage man, home decorator, or move-in helper. SAS (like Java's JDBC, PHP's PDO, Python's cursors, R's libraries) knocks on the door, which the database answers (annoyed by so many requests): "Hey buddy, we need to take out the garbage and here are the exact items... or we need to remodel the basement and here are the exact specs... or we have new furniture in the truck ready for drop off... with credentials signed, please carry out immediately." And in both tools, pass-through methods are requests carried out by the backend engine itself, so SAS just leaves instructions, maybe a note on the door (without exactness), for the homeowner to carry out.