How to make an x-column table in MySQL? - C++

So I want to make a table in MySQL (using C++) with about 128 columns, each one representing an INT.
I don't know the syntax to make a 129-column table (1 column for the id, 128 for the ints).
Kinda like an array: int myArray[128];
CREATE TABLE SIFTFEATURES(ID INT NOT NULL, myArray[128] INT) would be ideal, or something close,
where I don't have to write out each column name.

To have a table with 128 columns defined, you need to "write out" each column name. Each column in the table gets a name, a datatype, and other optional attributes.
In order to retrieve data from a column, it has to be referenced by name; the same is true for inserting into and updating the contents of the column.
(I put "write out" in double quotes because it's a very simple task to generate a text file of 128 lines of a table definition that vary only by name; it's not necessary to type each line.)
, col001 int
, col002 int
, ...
, col128 int
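Since the question uses C++, a minimal sketch of generating that definition might look something like this (the table and column names are only illustrative, and the resulting string can be handed to whatever MySQL connector you use):
// Minimal sketch: build a CREATE TABLE statement with an id column plus
// 128 int columns whose names differ only by a zero-padded number.
#include <cstdio>
#include <string>

int main()
{
    std::string sql = "CREATE TABLE SIFTFEATURES (\n  ID INT NOT NULL";
    char col[32];
    for (int i = 1; i <= 128; ++i) {
        std::snprintf(col, sizeof(col), "\n, col%03d int", i);
        sql += col;
    }
    sql += "\n)";
    std::puts(sql.c_str());   // or pass the string to your MySQL connector
}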
Absent any other information about your use case and what you are trying to accomplish, it's nearly impossible to make any sort of recommendation.

A table should be used for data persistence, not like an array (with obvious exceptions). With that said, I see two scenarios:
1) You want to use it like a temporary data structure, which I strongly do not recommend, unless for medical reasons.
2) You want to keep that table, in which case I'd use a text editor or even MS Excel with a macro to generate the CREATE TABLE statement with your 128 columns.


ColumnsOfType Record not returning any columns

I've got a table full of different data types, including records, and I want to extract all the column names of the record columns to then use in an expand function. I've included a screenshot of a column containing records; however, when I use = Table.ColumnsOfType(#"Expanded fields", {type record}), it returns an empty list.
I've tried looking through the entire column to see if there was anything different, but it's all record types. Any help please.
EDIT:
Error using Table.TransformColumnTypes
Record is not a valid type to search for. And judging by your image, your type is Type.Any, as denoted by the ABC123.
Your best bet is to unpivot all the columns (perhaps those starting with a certain prefix), then expand the new Value column like so:
#"PriorStepNameHere" = .... ,
ExpandList= List.Distinct(List.Combine(List.Transform(Table.Column(#"PriorStepNameHere", "Value"), each if _ is record then Record.FieldNames(_) else {}))),
Expand= Table.ExpandRecordColumn(#"PriorStepNameHere", "Value", ExpandList,ExpandList)
It sounds like the Table.ColumnsOfType function is not properly identifying the columns in your table that contain records. One possible reason for this is that the column's datatype is not properly set as 'record'. Another possible reason could be that the data in the columns is not structured properly and hence is not being identified as a record. You can try to use the Table.TransformColumnTypes function to convert the column's datatype to 'record' and see if that resolves the issue.
If the issue still persists, please share the sample data and the code you are using.

Fastest way to select a lot of rows based on their ID in PostgreSQL?

I am using postgres with libpqxx, and I have a table that we will simplify down to
data_table
{
bytea id PRIMARY KEY,
BigInt size
}
If I have a set of IDs in C++, e.g. std::unordered_set<ObjectId> Ids, what is the best way to get the id and size values out of data_table?
I have so far used a prepared statement:
constexpr const char* preparedStatement = "SELECT size FROM data_table WHERE id = $1";
Then, in a transaction, I have called that prepared statement and retrieved the result for every entry in the set:
pqxx::work transaction(SomeExistingPqxxConnection);
std::unordered_map<ObjectId, uint32_t> result;
for (const auto& id : Ids)
{
    auto transactionResult = transaction.exec_prepared(preparedStatement, ToPqxxBinaryString(id));
    result.emplace(id, transactionResult[0][0].as<uint32_t>());
}
return result;
Because the set can contain tens of thousands of objects, and the table can contain millions of rows, this can take quite some time to process, and I don't think it is a particularly efficient use of Postgres.
I am pretty much brand new to SQL, so I don't really know if what I am doing is the right way to go about this, or if there is a much more efficient way.
E: For what it's worth, the ObjectId class is basically a type wrapper over std::array<uint8_t, 32>, i.e. a 256-bit cryptographic hash.
The task as I understand it:
Get id (PK) and size (bigint) for "tens of thousands of objects" from a table with millions of rows and presumably several more columns ("simplified down").
The fastest way of retrieval is index-only scans. The cheapest way to get that in your particular case would be a "covering index" for your query by "including" the size column in the PK index like this (requires Postgres 11 or later):
CREATE TEMP TABLE data_table (
  id bytea
, size bigint
, PRIMARY KEY (id) INCLUDE (size)  -- !
);
About covering indexes:
Do covering indexes in PostgreSQL help JOIN columns?
Then retrieve all rows in a single query (or few queries) for many IDs at once like:
SELECT id, size
FROM data_table
JOIN (
VALUES ('id1'), ('id2') -- many more
) t(id) USING (id);
Or one of the other methods laid out here:
Query table by indexes from integer array
Or create a temporary table and join to it.
But do not "insert all those IDs one by one into it". Use the much faster COPY (or the meta-command \copy in psql) to fill the temp table. See:
How to update selected rows with values from a CSV file in Postgres?
And you do not need an index on the temporary table, as it will be read in a sequential scan anyway. You only need the covering PK index I outlined.
You may want to ANALYZE the temporary table after filling it, to give Postgres some column statistics to work with. But as long as you get the index-only scans I am aiming for, you can skip that, too. The query plan won't get any better than that.
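In libpqxx terms, building that single query might look something like this rough sketch (the plain std::array alias and ToByteaLiteral are illustrative stand-ins for the question's ObjectId wrapper and helpers, and decoding the returned bytea back into an ObjectId is left out):
// Rough sketch: one round trip for many ids, joining against a VALUES list
// of bytea hex literals instead of issuing one prepared statement per id.
#include <pqxx/pqxx>
#include <array>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using ObjectId = std::array<std::uint8_t, 32>;   // stand-in for the question's wrapper

// Render one id as a VALUES row holding a bytea hex literal, e.g. ('\xdeadbeef...'::bytea)
static std::string ToByteaLiteral(const ObjectId& id)
{
    static const char hex[] = "0123456789abcdef";
    std::string s = "('\\x";
    for (std::uint8_t b : id) {
        s += hex[b >> 4];
        s += hex[b & 0x0f];
    }
    return s + "'::bytea)";
}

std::vector<std::pair<std::string, std::uint64_t>>
FetchSizes(pqxx::connection& conn, const std::vector<ObjectId>& ids)
{
    if (ids.empty()) return {};

    std::string values;
    for (const auto& id : ids) {
        if (!values.empty()) values += ',';
        values += ToByteaLiteral(id);
    }

    pqxx::work tx(conn);
    auto rows = tx.exec(
        "SELECT id, size FROM data_table "
        "JOIN (VALUES " + values + ") t(id) USING (id)");
    tx.commit();

    // id comes back in its text form here; map it back to ObjectId as needed.
    std::vector<std::pair<std::string, std::uint64_t>> result;
    for (const auto& row : rows)
        result.emplace_back(row[0].as<std::string>(), row[1].as<std::uint64_t>());
    return result;
}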
The id is a primary key and so is indexed, so my first concern would be query setup time. A stored procedure is precompiled, for instance. A second tack is to put your set in a temp table, possibly also keyed on the id, so the two tables/indexes can be joined in one SELECT. The indexes for this should be ordered (tree, not hash) so they can be merged.

Crate.io: Order of columns when creating a table

I have CrateDb version 3.2.7 running under Windows Server 2012. I create a table like this:
create table test3 (firstcolumn bigint primary key, secondcolumn int, thirdcolumn timestamp, fourthcolumn double, fifthcolumn double, sixtcolumn smallint, seventhcolumn double, heightcolumn int, ninthcolumn smallint, tenthcolumn smallint) clustered into 12 shards with(number_of_replicas = 0, refresh_interval =0);
So I'm expecting firstcolumn to be first, and so on. But after the creation, when I do a SELECT * FROM test3, I get the following result:
It seems that the first column returned is the "fifth". It looks like the columns are returned in alphabetical order.
Does it mean that CrateDB created the columns in that order? Does it keep the order somewhere? If columns are in alphabetical order, does that mean that if I want to COPY data from another DBMS to CrateDB, I have to export the data based on alphabetical order?
For INSERT, not necessarily; only if the column list is omitted do the values have to be in alphabetical order (see here). The order doesn't seem to be "kept" anywhere per se.
COPY FROM is a different kind of import tactic and not quite what the good old INSERT would do. I would suggest writing a command-line app to import data into CrateDB. COPY FROM doesn't do any type checking, nor does it cast types; it will always import the data as it was in the source file (see here). From your other question I see you may have GPS-related data(?); you will need to manually map it to a GEO_POINT type, just as one example.
Crate offers good performance (whatever that means to you or me) with its bulk endpoint.

How do I join huge csv files (1000's of columns x 1000's rows) efficiently using C/C++?

I have several (1-5) very wide (~50,000 columns) .csv files. The files are 0.5GB-1GB in size (average around 500MB). I need to perform a join on the files on a pre-specified column. Efficiency is, of course, the key. Any solution that can be scaled out to efficiently allow multiple join columns is a bonus, though not currently required. Here are my inputs:
-Primary File
-Secondary File(s)
-Join column of Primary File (name or col. position)
-Join column of Secondary File (name or col. position)
-Left Join or Inner Join?
Output = 1 File with results of the multi-file join
I am looking to solve the problem using a C-based language, but of course an algorithmic solution would also be very helpful.
Assuming that you have a good reason not to use a database (for all I know, the 50,000 columns may constitute such a reason), you probably have no choice but to clench your teeth and build yourself an index for the right file. Read through it sequentially to populate a hash table where each entry contains just the key column and an offset in the file where the entire row begins. The index itself then ought to fit comfortably in memory, and if you have enough address space (i.e. unless you're stuck with 32-bit addressing) you should memory-map the actual file data so you can access and output the appropriate right rows easily as you walk sequentially through the left file.
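A rough sketch of that index-then-lookup idea, using plain std::ifstream seeks instead of memory mapping (file names and join-column positions are illustrative, and CSV quoting is not handled):
// Sketch: index the right file by join key -> byte offset of the row,
// then walk the left file and seek into the right file for matches.
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>

// Extract the n-th comma-separated field of a CSV line (no quoting handled).
static std::string NthField(const std::string& line, std::size_t n)
{
    std::size_t start = 0;
    for (std::size_t i = 0; i < n; ++i) {
        start = line.find(',', start);
        if (start == std::string::npos) return {};
        ++start;
    }
    std::size_t end = line.find(',', start);
    return line.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

int main()
{
    const std::size_t rightKeyCol = 0;   // assumed join-column positions
    const std::size_t leftKeyCol = 0;

    // Pass 1: one sequential read of the right file to build key -> offset.
    std::ifstream right("right.csv");
    std::unordered_map<std::string, std::streamoff> index;
    std::string line;
    std::streamoff offset = right.tellg();
    while (std::getline(right, line)) {
        index.emplace(NthField(line, rightKeyCol), offset);
        offset = right.tellg();
    }
    right.clear();   // reset EOF state so we can seek back into the file

    // Pass 2: walk the left file sequentially and emit joined rows (inner join).
    std::ifstream left("left.csv");
    std::ofstream out("joined.csv");
    std::string rightLine;
    while (std::getline(left, line)) {
        auto it = index.find(NthField(line, leftKeyCol));
        if (it == index.end()) continue;          // unmatched left row: dropped
        right.seekg(it->second);
        std::getline(right, rightLine);
        out << line << ',' << rightLine << '\n';  // right row kept whole, key included
    }
}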
Your best bet by far is something like Sqlite; there are C++ bindings for it and it's tailor-made for lightning-fast inserts and queries.
For the actual reading of the data, you can just go row by row and insert the fields into Sqlite, no need for cache-destroying objects of objects :) As an optimization, you should group up multiple inserts in one statement (insert into table(...) select ... union all select ... union all select ...).
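For the loading step, a pared-down sketch with the SQLite C API might look like this; it wraps one prepared INSERT in a single transaction, which serves the same batching purpose as the UNION ALL trick, and it stores just the join key plus the raw row (file, table and column names are illustrative, error handling omitted):
// Sketch: bulk-load one CSV into SQLite with a prepared INSERT inside
// a single transaction, keyed on an assumed first-column join key.
#include <sqlite3.h>
#include <fstream>
#include <string>

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("join.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db, "CREATE TABLE left_file (key TEXT, row TEXT)", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);

    sqlite3_stmt* insert = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO left_file (key, row) VALUES (?, ?)", -1, &insert, nullptr);

    std::ifstream in("left.csv");
    std::string line;
    while (std::getline(in, line)) {
        std::string key = line.substr(0, line.find(','));   // assumed: join key is the first column
        sqlite3_bind_text(insert, 1, key.c_str(), -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(insert, 2, line.c_str(), -1, SQLITE_TRANSIENT);
        sqlite3_step(insert);
        sqlite3_reset(insert);
    }

    sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
    sqlite3_finalize(insert);
    sqlite3_close(db);
}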
If you need to use C or C++, open the files and load the data directly into a database such as MySQL. The C and C++ languages do not have adequate data table structures or functionality for manipulating the data. A spreadsheet application may be useful, but may not be able to handle the capacities.
That said, I recommend objects for each field (column). Define a record (file specific) as a collection of fields. Read a text line from a file into a string. Let the record load the field data from the string. Store the records in a vector.
Create a new record for the destination file. For each record from the input file(s), load the new record using those fields. Finally, for each record, print the contents of each field with separation characters.
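A bare-bones sketch of that record-of-fields idea (no CSV quoting handled, and the field objects are collapsed into plain strings for brevity):
// Sketch: a record is a collection of field values parsed from one CSV line;
// a whole file becomes a vector of records.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Record {
    std::vector<std::string> fields;

    // Split one CSV line on commas (no quoting handled).
    static Record FromLine(const std::string& line)
    {
        Record r;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ',')) r.fields.push_back(field);
        return r;
    }
};

std::vector<Record> LoadFile(const std::string& path)
{
    std::vector<Record> records;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) records.push_back(Record::FromLine(line));
    return records;
}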
An alternative is to whip up a two-dimensional matrix of strings.
Your performance bottleneck will be I/O. You may want to read huge blocks of data in. The thorn in the side of efficiency is the variable record length of a CSV file.
I still recommend using a database. There are plenty of free ones out there, such as MySQL.
It depends on what you mean by "join". Are the columns in file 1 the same as in file 2? If so, you just need a merge sort. Most likely a solution based on merge sort is "best". But I agree with @Blindy above that you should use an existing tool like Sqlite. Such a solution is probably more future-proof against changes to the column lists.

Multiple strings as arguments in Table Input

I'm trying to use SQL like SELECT column FROM table WHERE column IN (?),
where ? should be a concatenation of strings. I wrote a script that concatenates rows into something like 'string','secondstring' and so on.
I know I should just use more parameters, but until the moment of execution I don't know how many arguments there will be, and there are hundreds of them each time.
I'd like to do it in one SQL statement, so putting every argument in a single row and checking "execute for each row" isn't perfect either.
Any clue how to do this?
You can use loops and variables in Kettle.
For example:
- Create a job that contains:
1) a transformation where you store the concatenation of all input rows in an environment variable
(setVariable("varname", value, "r"); the "r" makes the parameter accessible to the parent job);
2) a transformation which runs the desired query with variable replacement (SELECT column FROM table WHERE column IN (${varname})).
If you need, I can send the example files.