Problem: in C++, using SQLite, I'm looking for a way to insert multiple rows into a table directly from a string containing multiple "rows".
I would like to "reformulate" the following statements for insertion into a table T:
INSERT into T VALUES(row1); // 1st row insertion.
INSERT into T VALUES(row2); // 2nd row insertion.
I am seeking to substitute them with something like
INSERT into T mycsvvals; // std::string mycsvvals contains 2 csv-rows
where mycsvvals is a string read in from a CSV file, "containing" 2 rows of comma-separated values.
I could parse the string first, put the contents into an array, and loop the inserts. However, I'm wondering whether SQLite already provides an efficient bulk import/insert from a string, instead of but similar to the import/insert from a CSV file.
Please, can you provide any links/info that will allow me to achieve this?
Thanks.
The answer provided by #Shawn was not what I was hoping for in terms of brevity; it is not possible to bulk insert directly from a CSV string, so some looping structure is necessary. But in terms of efficiency, it is the recommended way to go. Thanks, #Shawn. The full "prescription" for prepared statements in SQLite is given and exemplified here:
http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/
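For reference, here is a minimal sketch of that prepared-statement approach with the SQLite C API: parse the CSV string row by row, bind each row's values, and reuse one compiled statement inside a single transaction. The table layout (three columns) and the naive comma split are assumptions for illustration.

#include <sqlite3.h>
#include <sstream>
#include <string>

// Insert every CSV row contained in mycsvvals with one reusable prepared
// statement, all inside a single transaction.
bool insert_csv_rows(sqlite3* db, const std::string& mycsvvals)
{
    sqlite3_exec(db, "BEGIN TRANSACTION;", nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    // Assumed: table T has three columns; adjust the placeholders to match.
    if (sqlite3_prepare_v2(db, "INSERT INTO T VALUES(?1, ?2, ?3);", -1, &stmt, nullptr) != SQLITE_OK)
        return false;

    std::istringstream rows(mycsvvals);
    std::string row;
    while (std::getline(rows, row)) {               // one CSV row per line
        std::istringstream fields(row);
        std::string field;
        int col = 1;
        while (std::getline(fields, field, ','))    // naive split: no quoted commas
            sqlite3_bind_text(stmt, col++, field.c_str(), -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);                         // execute this row's insert
        sqlite3_reset(stmt);                        // reuse the compiled statement
        sqlite3_clear_bindings(stmt);
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    return true;
}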
Related
I have stored a set of four numbers in an array which I want to add to a CSV file under the 'Score' column. Could anyone please help me with this.
You can't do that. One solution would be to place Score at the end. The only quick 'solution' I can think of is to 'model' your data like this:
game_id, score_1 {,..., score_n}
So you don't have to care about the scores array's size, since all the values after game_id will be scores.
(Note that this is a quick-fix example for your problem. I don't know your data model, and you will probably have to modify things, but this gives you my approach.)
Also, when your data model becomes more complex, it's not a good idea to work with flat files. Consider storing your data in a database like SQLite or whatever.
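As a minimal sketch of that layout (the file path and integer score type are assumptions), appending one such row could look like this:

#include <fstream>
#include <string>
#include <vector>

// Append one row in the "game_id, score_1, ..., score_n" layout described above.
void append_scores(const std::string& path, int game_id, const std::vector<int>& scores)
{
    std::ofstream out(path, std::ios::app);   // append to the existing CSV
    out << game_id;
    for (int s : scores)
        out << ',' << s;                      // every value after game_id is a score
    out << '\n';
}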
I currently have a package pulling data from an Excel file, but when pulling the data out I get rows I do not want. So I need to extract everything from the 'ID' field that has any sort of letter in it.
I need to be able to run a regex such as "%[a-zA-Z]%" to pull out that data, but with the current limitations of the Conditional Split it's not letting me do that. Any ideas on how this can be done?
At the core of the logic, you would use a Script Transformation as that's the only place you can access the regex.
You could simply add a second column to your data flow, IDCleaned, and that column would only contain cleaned values or NULL. You could then use the Conditional Split to filter good rows vs. bad. See: System.Text.RegularExpressions.Regex.Replace error in C# for SSIS.
If you don't want to add another column, you can set your current ID column to be ReadWrite for the Script and then update in place. Perhaps adding a boolean column might make the Conditional Split logic easier at this point.
I have several (1-5) very wide (~50,000 columns) .csv files. The files are (.5GB-1GB) in size (avg. size around 500MB). I need to perform a join on the files on a pre-specified column. Efficiency is, of course, the key. Any solutions that can be scaled out to efficiently allow multiple join columns is a bonus, though not currently required. Here are my inputs:
-Primary File
-Secondary File(s)
-Join column of Primary File (name or col. position)
-Join column of Secondary File (name or col. position)
-Left Join or Inner Join?
Output = 1 File with results of the multi-file join
I am looking to solve the problem using a C-based language, but of course an algorithmic solution would also be very helpful.
Assuming that you have a good reason not to use a database (for all I know, the 50,000 columns may constitute such a reason), you probably have no choice but to clench your teeth and build yourself an index for the right file. Read through it sequentially to populate a hash table where each entry contains just the key column and an offset in the file where the entire row begins. The index itself then ought to fit comfortably in memory, and if you have enough address space (i.e. unless you're stuck with 32-bit addressing) you should memory-map the actual file data so you can access and output the appropriate right rows easily as you walk sequentially through the left file.
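A rough sketch of that index-building step, using plain byte offsets and seeks as a portable stand-in for the memory map (the key extraction is simplified and the zero-based key column is an assumed parameter):

#include <fstream>
#include <string>
#include <unordered_map>

// Return the n-th (0-based) comma-separated field of a line.
// Sketch only: it does not handle quoted commas.
std::string nth_field(const std::string& line, std::size_t n)
{
    std::size_t start = 0;
    for (std::size_t i = 0; i < n; ++i)
        start = line.find(',', start) + 1;
    return line.substr(start, line.find(',', start) - start);
}

// Map each join key in the "right" file to the byte offset where its row starts,
// so the index fits in memory even though the rows themselves do not.
std::unordered_map<std::string, std::streamoff>
build_index(const std::string& right_path, std::size_t key_col)
{
    std::unordered_map<std::string, std::streamoff> index;
    std::ifstream in(right_path, std::ios::binary);
    std::string line;
    std::streamoff offset = 0;
    while (std::getline(in, line)) {
        index[nth_field(line, key_col)] = offset;                  // key -> start of row
        offset += static_cast<std::streamoff>(line.size()) + 1;    // +1 for the '\n'
    }
    return index;
}

Walking the left file is then a single sequential read; for each left row you look up its key in the index and, on a hit, seek to (or memory-map) the stored offset to emit the matching right row.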
Your best bet by far is something like SQLite; there are C++ bindings for it, and it's tailor-made for lightning-fast inserts and queries.
For the actual reading of the data, you can just go row by row and insert the fields into SQLite; no need for cache-destroying objects of objects :) As an optimization, you should group multiple inserts into one statement (insert into table(...) select ... union all select ... union all select ...).
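A rough sketch of that batching idea (the table and column names are made up, and quoting/escaping of the values is ignored, so treat it as an outline rather than safe SQL):

#include <string>
#include <vector>

// Build one multi-row INSERT of the form:
//   INSERT INTO t(a, b) SELECT 'v1','v2' UNION ALL SELECT 'v3','v4' ...
std::string batch_insert_sql(const std::vector<std::vector<std::string>>& rows)
{
    std::string sql = "INSERT INTO t(a, b) ";
    for (std::size_t r = 0; r < rows.size(); ++r) {
        sql += (r == 0 ? "SELECT " : " UNION ALL SELECT ");
        for (std::size_t c = 0; c < rows[r].size(); ++c) {
            if (c) sql += ", ";
            sql += "'" + rows[r][c] + "'";   // no escaping here: sketch only
        }
    }
    return sql + ";";
}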
If you need to use C or C++, open the file and load the data directly into a database such as MySQL. The C and C++ languages do not have adequate data-table structures or functionality for manipulating the data. A spreadsheet application may be useful, but may not be able to handle files of this size.
That said, I recommend objects for each field (column). Define a record (file-specific) as a collection of fields. Read a text line from the file into a string, let the record load the field data from the string, and store the records in a vector.
Create a new record for the destination file. For each record from the input file(s), load the new record with those fields. Finally, for each record, print the contents of each field with separator characters; a sketch of this record-of-fields idea follows.
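A minimal sketch (the comma split ignores quoted fields, so it is illustrative only):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One record = the fields of one CSV line; one file = a vector of records.
using Record = std::vector<std::string>;

Record parse_record(const std::string& line)
{
    Record fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ','))
        fields.push_back(field);
    return fields;
}

std::vector<Record> load_file(const std::string& path)
{
    std::vector<Record> records;     // the "store records into a vector" step
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        records.push_back(parse_record(line));
    return records;
}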
An alternative is to whip up a 2 dimensional matrix of strings.
Your performance bottleneck will be I/O. You may want to read huge blocks of data in. The thorn in the side of efficiency is the variable record length of a CSV file.
I still recommend using a database. There are plenty of free ones out there, such as MySQL.
It depends on what you mean by "join". Are the columns in file 1 the same as in file 2? If so, you just need a merge sort, and most likely a solution based on merge sort is "best". But I agree with #Blindy above that you should use an existing tool like SQLite; such a solution is probably more future-proof against changes to the column lists.
I am trying to insert a large number of records into a SQLite database. I get the above error if I try to use the sqlite3_exec C-API.
The code looks like this:
ret = sqlite3_exec(db_p, ".import file.txt table", NULL, NULL, NULL);
I know that .import is a command-line feature, but is there any way to do an extremely large insert of records that takes minimal time? I have read through previous bulk-insert code and attempted to make changes, but these are not providing the desired results.
Is there not a way to directly insert the string into the tables without intermediate APIs being called?
.import is most probably not available via the API. However there's one crucial thing to speed up inserts: wrap them in a transaction.
BEGIN;
lots of insert statements here;
COMMIT;
Without this, SQLite needs to write to the file after each insert to preserve its ACID guarantees. The transaction lets it write to the file later, in bulk.
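In terms of the C API, that wrapping is just two extra sqlite3_exec calls around the insert loop; the container of pre-built INSERT strings and the omitted error handling are assumptions of this sketch:

#include <sqlite3.h>
#include <string>
#include <vector>

// Wrap many INSERTs in a single transaction so SQLite syncs to disk once,
// not once per row.
void bulk_insert(sqlite3* db, const std::vector<std::string>& statements)
{
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);
    for (const std::string& sql : statements)
        sqlite3_exec(db, sql.c_str(), nullptr, nullptr, nullptr);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
}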
The answer to the syntax error could well be that your strings are not enclosed in quotes in your SQL statement.
I'm writing a JMeter script and I have a huge CSV file with a bunch of data which I use in my requests. Is it possible to start not from the first entry but from the 5th or nth entry?
Looking at the CSVDataSet, it doesn't seem to directly support skipping to a given row. However, you can emulate the same effect by first executing N loops where you just read from the data set and do nothing with the data. This is then followed by a loop containing your actual tests. It's been a while since I've used JMeter; for this approach to work, you must share the same CSVDataSet between both loops.
If that's not possible, then there is an alternative. In your main test loop, use a Counter and an If Controller. The Counter counts up from 1. The If Controller contains your tests, with the condition ${Counter}>N, where N is the number to skip. ("Counter" in the expression is whatever you set the "reference name" property to in the counter.)
mdma's 2nd idea is a clean way to do it, but here are two other options that are simple, but annoying to do:
Easiest:
Create separate CSV files starting where you want them to start, deleting the rows you don't need. I would create separate CSV Data Set Config elements for each CSV file, and then just disable the ones you don't want to run.
Less Easy:
Create a new column in your CSV file, called "ignore". In the rows you want to skip, enter the value "True". In your test plan, create an If Controller that is the parent of your requests. Make the If condition: "${ignore}"!="True" (include the quotes and note that 'True' is case sensitive). This will skip the requests if the 'ignore' column has a value of 'True'.
Both methods require modifying the CSV file, but the second method has other applications (like excluding a header row) and can be fast if you're using OpenOffice, Excel, etc.