Read CSV using JMeter (starting from x) - unit-testing

I'm writing a JMeter script and I have a huge CSV file with a bunch of data that I use in my requests. Is it possible to start not from the first entry but from the 5th or nth entry?

Looking at the CSVDataSet, it doesn't seem to directly support skipping to a given row. However, you can emulate the same effect by first executing N loops in which you just read from the data set and do nothing with the data, followed by a loop containing your actual tests. It's been a while since I've used JMeter, but for this approach to work you must share the same CSVDataSet between both loops.
If that's not possible, there is an alternative. In your main test loop, use a Counter and an If Controller. The Counter counts up from 1. The If Controller contains your tests, with the condition ${Counter}>N, where N is the number of rows to skip. ("Counter" in the expression is whatever you set the "Reference Name" property to in the Counter.)
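For example, with N = 5 and the Counter's Reference Name set to Counter, the If Controller condition could be written either way below (the second form, with "Interpret Condition as Variable Expression?" checked, avoids the slower default JavaScript evaluation):
${Counter} > 5
${__jexl3(${Counter} > 5)}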

mdma's second idea is a clean way to do it, but here are two other options that are simple, if a bit annoying to set up:
Easiest:
Create a separate CSV file for each starting point you want, deleting the rows you don't need. I would create a separate CSV Data Set Config element for each CSV file, and then just disable the ones you don't want to run.
Less Easy:
Create a new column in your CSV file called "ignore". In the rows you want to skip, enter the value "True". In your test plan, create an If Controller that is the parent of your requests and give it the condition "${ignore}"!="True" (include the quotes, and note that the value is case sensitive). This will skip the requests whenever the 'ignore' column has a value of "True".
Both methods require modifying the CSV file, but the second method has other applications (like excluding a header row) and can be quick to set up if you're using OpenOffice, Excel, etc.

Related

Loop over all the files from a folder and import the one that matches regex criteria in SSIS

I have a particular task that I need to get done, but I find it difficult to find any matching case on the internet.
In the company I work for, we have a VPN into which folders named with the current date are dropped every day.
I need to create an ETL (in SSIS) which will loop over all the files in specific folders, extract the one file that I need, and then populate a table.
The name of the particular file changes every day. It always starts with the same n characters and ends with a date that may be one, two, or three days before the current date.
It's straightforward that I need to use a Foreach Loop Container to loop over all the files in the folder, but how can I select the one file that starts with specific characters?
Essentially, does anybody know how I can use a regular expression in a connection in SSIS?
Thank you,
The SSIS Foreach Loop Container accepts the * wildcard. In my case the files are a1, a2, a3, a4, etc., so the pattern a* picks them up and reads them sequentially. You just need to capture the file name in a variable and use it inside the loop.
You could use a Script Task for this, and use the System.IO namespace to find the file you need. You could use System.IO.Directory.EnumerateFiles to loop over the files, or, since you know what the file name will look like, you could check for the existence of the file with yesterday's date, then the day before, and so on in a loop (going back as far as you wish), and set a variable with the file path once you find one that exists. You could then use this variable as the Connection Manager's connection string, setting it in the Connection Manager's Properties (Expressions).
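A rough sketch of what that Script Task might look like in C# (the package variable names "User::SourceFolder" and "User::FilePath", the file-name prefix, the .csv extension, and the seven-day look-back are all placeholder assumptions; the method goes in the ScriptMain class that SSIS generates):

public void Main()
{
    // Folder to search and the variable that will receive the matched path.
    string folder = Dts.Variables["User::SourceFolder"].Value.ToString();
    string filePath = null;

    // Walk back a few days looking for a file named <prefix><yyyyMMdd>.csv
    for (int daysBack = 1; daysBack <= 7 && filePath == null; daysBack++)
    {
        string candidate = System.IO.Path.Combine(
            folder,
            "MyPrefix_" + DateTime.Today.AddDays(-daysBack).ToString("yyyyMMdd") + ".csv");
        if (System.IO.File.Exists(candidate))
            filePath = candidate;
    }

    if (filePath != null)
    {
        Dts.Variables["User::FilePath"].Value = filePath;
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    else
    {
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}

The variable set here can then drive the flat-file Connection Manager's ConnectionString expression, as described above.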

Is it bad practice to use cfquery inside cfloop?

I have an array of structures. I need to insert all the rows from that array into a table.
So I have simply used cfquery inside cfloop to insert into the database.
Some people suggested that I not use cfquery inside cfloop, because each iteration will make a new connection to the database.
But in my case, is there any way I can do this without using cfquery inside cfloop?
It's not so much about maintaining connections as about hitting the server with 'n' requests to insert or update data on every iteration of the cfloop. This will seem OK when you test with a few records, but when you throw it into production and your client pushes your application to loop over a couple of hundred rows, you're going to hit the database server a couple of hundred times as well.
As Scott suggests, you should look at looping to build a single query rather than making multiple hits to the database. Looping inside the cfquery has the benefit that you can still use cfqueryparam, but if you can trust the data, i.e. it has already been sanitised, you might find it easier to use something like cfsavecontent to build up your query and output the string inside the cfquery at the end.
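A minimal sketch of that cfsavecontent idea, assuming the data really is trusted (the datasource, table, column names, and the Oracle-style dummy select are all made up for illustration):

<cfsavecontent variable="insertSQL">
    <cfoutput>
    insert into my_table (field1, field2)
    select null, null from dual where 1 = 2
    <cfloop array="#myArrayOfStructs#" index="row">
    union
    select '#row.field1#', '#row.field2#' from dual
    </cfloop>
    </cfoutput>
</cfsavecontent>
<cfquery datasource="#myDatasource#">#PreserveSingleQuotes(insertSQL)#</cfquery>

PreserveSingleQuotes() stops ColdFusion from escaping the quotes in the generated SQL when the string is output inside cfquery.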
I have used both the query-inside-loop and the loop-inside-query methods. While having the loop inside the query is theoretically faster, that is not always the case. You have to try each method and see what works best in your situation.
Here is the syntax for a loop inside a query, using Oracle for the sake of picking a database.
<cfquery datasource="#myDatasource#">
insert into my_table
(field1, field2 /* etc. */)
<!--- dummy select that returns no rows, just to anchor the union chain --->
select null, null
from dual
where 1 = 2
<cfloop array="#myArrayOfStructs#" index="row">
union
select <cfqueryparam value="#row.field1#" cfsqltype="cf_sql_varchar">
, <cfqueryparam value="#row.field2#" cfsqltype="cf_sql_varchar">
from dual
</cfloop>
</cfquery>
Depending on the database, convert your array of structures to XML, then pass that as a single parameter to a stored procedure.
In the stored procedure, do an INSERT INTO SELECT, where the SELECT statement selects data from the XML packet. You could insert hundreds or thousands of records with a single INSERT statement this way.
Here's an example.
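A rough sketch of that pattern in SQL Server syntax (the procedure, table, column names, and the XML layout are all made up; other databases have their own XML-shredding functions):

CREATE PROCEDURE dbo.InsertFromXml
    @data XML
AS
BEGIN
    -- Shred the XML packet and insert every row in a single statement
    INSERT INTO my_table (field1, field2)
    SELECT
        r.value('(field1)[1]', 'varchar(50)'),
        r.value('(field2)[1]', 'varchar(50)')
    FROM @data.nodes('/rows/row') AS x(r);
END

On the ColdFusion side you would serialise the array of structures into the matching <rows><row>...</row></rows> packet and pass it as a single cfprocparam.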
There is a limit to how many <cfquery><cfloop>... iterations you can do when using <cfqueryparam>, and the limit is vendor specific, since databases cap the number of bind parameters per statement. If you do not know how many records you will be generating, it is best to remove <cfqueryparam>, if it is safe to do so: make sure your data is coming from trusted sources and is sanitised. This approach can save huge amounts of processing time, because it only makes one call to the database server, unlike an outer loop.

Using RegEx in SSIS

I currently have a package pulling data from an Excel file, but when pulling the data out I get rows I do not want. I need to extract every row whose 'ID' field has any sort of letter in it.
I need to be able to run a RegEx pattern such as "%[a-zA-Z]%" to pull out that data, but with the current limitations of the Conditional Split it's not letting me do that. Any ideas on how this can be done?
At the core of the logic, you would use a Script Transformation, as that's the only place you have access to regular expressions.
You could simply add a second column to your data flow, IDCleaned, and have that column contain only cleaned values or NULL. You could then use the Conditional Split to filter good rows from bad.
If you don't want to add another column, you can set your current ID column to ReadWrite for the Script and update it in place. Adding a boolean column might make the Conditional Split logic easier at that point.
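A rough sketch of the Script Component logic in C#, assuming ID comes through as a string column and that a boolean output column named HasLetters has been added in the component:

using System.Text.RegularExpressions;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Flag rows whose ID contains any letter; a downstream Conditional Split
    // can then route flagged rows away from the main output.
    Row.HasLetters = !Row.ID_IsNull && Regex.IsMatch(Row.ID, "[a-zA-Z]");
}

The Conditional Split condition then just tests HasLetters (or !HasLetters) to separate the two outputs.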

Informatica target file

I have a workflow which writes data from a table into a flat file. It works just fine, but I want to insert a blank line between records. How can this be achieved? Any pointers?
You can create two target instances: one with the proper data, and in the other instance pass a blank line. Set the Merge Type to "Concurrent Merge" in the session properties.
Multiple possibilities:
You can prepare an appropriate data set in a relational table and afterwards dump the data from it into a flat file. While preparing that data set, you can insert the blank rows into the relational target.
Send a blank line to a separate target file (based on some business condition, using a Router or something similar); after that you can use the merge files option (in the session config) to get that data into a single file.

How do I join huge CSV files (1000s of columns x 1000s of rows) efficiently using C/C++?

I have several (1-5) very wide (~50,000 columns) .csv files. The files are 0.5GB-1GB in size (average around 500MB). I need to perform a join on the files on a pre-specified column. Efficiency is, of course, the key. Any solution that can be scaled out to efficiently allow multiple join columns is a bonus, though not currently required. Here are my inputs:
- Primary File
- Secondary File(s)
- Join column of Primary File (name or col. position)
- Join column of Secondary File (name or col. position)
- Left Join or Inner Join?
Output = 1 file with the results of the multi-file join
I am looking to solve the problem using a C-based language, but of course an algorithmic solution would also be very helpful.
Assuming that you have a good reason not to use a database (for all I know, the 50,000 columns may constitute such a reason), you probably have no choice but to clench your teeth and build yourself an index for the right file. Read through it sequentially to populate a hash table where each entry contains just the key column and an offset in the file where the entire row begins. The index itself then ought to fit comfortably in memory, and if you have enough address space (i.e. unless you're stuck with 32-bit addressing) you should memory-map the actual file data so you can access and output the appropriate right rows easily as you walk sequentially through the left file.
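A stripped-down sketch of that indexing approach (it uses seekg() offsets rather than memory-mapping to keep it short, and the file names, join-column positions, and the assumption that the key is unique in the right file are all placeholders):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Naive CSV field extraction: does not handle quoted commas.
static std::string nth_field(const std::string& line, std::size_t n) {
    std::stringstream ss(line);
    std::string field;
    for (std::size_t i = 0; i <= n; ++i) std::getline(ss, field, ',');
    return field;
}

int main() {
    const std::size_t left_key_col = 0, right_key_col = 0;  // assumed join columns

    // Pass 1: index the right file by join key -> byte offset of the row.
    std::ifstream right("right.csv");
    std::unordered_map<std::string, std::streamoff> index;
    std::string line;
    std::streamoff pos = right.tellg();
    while (std::getline(right, line)) {
        index[nth_field(line, right_key_col)] = pos;  // assumes unique keys
        pos = right.tellg();
    }
    right.clear();  // clear EOF so the stream can be reused for seeks

    // Pass 2: stream the left file and emit matching rows (inner join).
    std::ifstream left("left.csv");
    while (std::getline(left, line)) {
        auto it = index.find(nth_field(line, left_key_col));
        if (it == index.end()) continue;  // no match -> drop the row (inner join)
        right.seekg(it->second);
        std::string right_row;
        std::getline(right, right_row);
        std::cout << line << ',' << right_row << '\n';
    }
    return 0;
}

For a left join you would print the left row with empty fields instead of skipping it, and memory-mapping the right file, as described above, avoids the seek per match.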
Your best bet by far is something like SQLite: there are C++ bindings for it and it's tailor-made for lightning fast inserts and queries.
For the actual reading of the data, you can just go row by row and insert the fields into SQLite; there's no need for cache-destroying objects of objects :) As an optimization, you should group multiple inserts into one statement (insert into table(...) select ... union all select ... union all select ...).
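If you go the SQLite route, a bare-bones outline using the C API (which works fine from C++; error handling and the CSV loading are omitted, and the file, table, and column names are made up) might look like:

#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open("join.db", &db);

    // One table per CSV file, plus an index on the right-hand join key.
    sqlite3_exec(db, "CREATE TABLE left_t (key TEXT, payload TEXT);", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "CREATE TABLE right_t(key TEXT, payload TEXT);", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "CREATE INDEX idx_right_key ON right_t(key);", nullptr, nullptr, nullptr);

    // ... load both CSV files here, ideally with prepared statements
    //     (sqlite3_prepare_v2 / sqlite3_bind_text / sqlite3_step)
    //     wrapped in a single BEGIN ... COMMIT transaction ...

    // The join itself then becomes a single query; pass a callback to
    // sqlite3_exec (or step a prepared statement) to write the output file.
    sqlite3_exec(db,
        "SELECT l.key, l.payload, r.payload "
        "FROM left_t l JOIN right_t r ON l.key = r.key;",
        nullptr, nullptr, nullptr);

    sqlite3_close(db);
    return 0;
}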
If you need to use C or C++, open the file and load it directly into a database such as MySQL. The C and C++ languages do not have adequate data table structures or built-in functionality for manipulating the data; a spreadsheet application may be useful, but may not be able to handle these capacities.
That said, I recommend objects for each field (column). Define a record (file specific) as a collection of fields. Read a text line from the file into a string, let the record load the field data from the string, and store the records in a vector.
Create a new record for the destination file. For each record from the input file(s), load the new record using those fields. Finally, for each record, print the contents of each field with separator characters.
An alternative is to whip up a two-dimensional matrix of strings.
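A very rough sketch of that record-of-fields idea (the field splitting is naive and all names are invented):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One record = the fields of one CSV line.
struct Record {
    std::vector<std::string> fields;
    void load(const std::string& line) {
        fields.clear();
        std::stringstream ss(line);
        std::string f;
        while (std::getline(ss, f, ',')) fields.push_back(f);  // ignores quoted commas
    }
};

int main() {
    std::ifstream in("input.csv");
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        Record r;
        r.load(line);
        records.push_back(r);
    }
    // ... build the destination records from the input records and print
    //     each field with separator characters, as described above ...
}

With files this wide, holding every record in memory may be tight, which is part of why the I/O considerations below matter.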
Your performance bottleneck will be I/O. You may want to read huge blocks of data in at a time. The thorn in the side of efficiency is the variable record length of a CSV file.
I still recommend using a database. There are plenty of free ones out there, such as MySQL.
It depends on what you mean by "join". Are the columns in file 1 the same as in file 2? If so, you just need a merge sort, and most likely a solution based on merge sort is "best". But I agree with #Blindy above that you should use an existing tool like SQLite; such a solution is probably also more future-proof against changes to the column lists.