Multiple iterations in the htmlextra report with newman - postman

I am fairly new to using newman and I am trying to figure out how exactly to create multiple iterations within one report.
I cannot find the htmlextra.js file anywhere locally on my laptop (Win 10) to simply change the field mentioned at: https://hub.docker.com/r/dannydainton/htmlextra
Can anyone please help me out on how to add more than 1 iteration to a collection for the reporter?
Thank you very much and sorry to bother you all with this basic question, but I just cannot figure it out.

The iteration count is set through Newman itself, not the htmlextra reporter; you can set it with the -n flag.
newman run collection.json -n 5 -r htmlextra
This will run the collection 5 times.
https://www.npmjs.com/package/newman lists all the Newman-specific flags, and
https://www.npmjs.com/package/newman-reporter-htmlextra lists all the htmlextra-specific flags.
-n, --iteration-count  Specifies the number of times the collection has to be run when used in conjunction with an iteration data file.
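If the iterations come from a data file instead, Newman runs one iteration per row; a minimal sketch, assuming a hypothetical data.csv next to the collection:
newman run collection.json -d data.csv -r htmlextra
The htmlextra report should then list each iteration, just as it does with -n.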

Related

Postman - how to target the final iteration?

I would like to run a Postman test on just the final iteration of a test run - I am building a variable (array) of response time values across all of the iterations and then want to test this variable for extreme values once we've reached the last / final iteration.
I hoped pm.info.iteration would have my answer but didn't see anything relevant.
I'm using a data file - the test runner shows how many iterations are applicable (rows in the CSV) as soon as the file is chosen, so I'm guessing that Postman does know the 'final' iteration? I just haven't worked out how to get at it.
My workaround is to hard-code the number of iterations per test run based on how many rows my CSVs currently have (e.g. if (pm.info.iteration === 70)), but that's not ideal as the data file is likely to grow.
As @DannyDainton mentioned, you can use pm.info.iterationCount.
The iteration index starts from 0, so use:
pm.info.iteration === (pm.info.iterationCount - 1)
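A minimal sketch of a Tests-tab script that only asserts on the final iteration (the responseTimes collection variable is just an illustrative placeholder for the array built up across iterations):
// Only evaluate the check once the last iteration is reached.
if (pm.info.iteration === pm.info.iterationCount - 1) {
    const responseTimes = JSON.parse(pm.collectionVariables.get("responseTimes") || "[]");
    pm.test("no extreme response times", function () {
        responseTimes.forEach(function (t) { pm.expect(t).to.be.below(2000); });
    });
}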

Computing GroupBy once then passing it to multiple transformations in Google DataFlow (Python SDK)

I am using Python SDK for Apache Beam to run a feature extraction pipeline on Google DataFlow. I need to run multiple transformations all of which expect items to be grouped by key.
Based on the answer to this question, Dataflow is unable to automatically spot and reuse repeated transformations like GroupBy, so I hoped to run GroupBy first and then feed the resulting PCollection into the other transformations (see the sample code below).
I wonder whether this is supposed to work efficiently in Dataflow. If not, what is the recommended workaround in the Python SDK? Is there an efficient way to have multiple Map or Write transformations consume the results of the same GroupBy? In my case, I observe Dataflow scale to the maximum number of workers at 5% utilization and make no progress at the steps following the GroupBy, as described in this question.
Sample code. For simplicity, only 2 transformations are shown.
# Group by key once.
items_by_key = raw_items | GroupByKey()
# Write grouped items to a file.
(items_by_key | FlatMap(format_item) | WriteToText(path))
# Run another transformation over the same group.
features = (items_by_key | Map(extract_features))
Feeding the output of a single GroupByKey step into multiple transforms should work fine. But the amount of parallelization you can get depends on the total number of keys available in the original GroupByKey step. If any of the downstream steps is high fan-out, consider adding a Reshuffle step after those steps, which will allow Dataflow to further parallelize execution.
For example,
pipeline | Create([<list of globs>]) | ParDo(ExpandGlobDoFn()) | Reshuffle() | ParDo(MyReadDoFn()) | Reshuffle() | ParDo(MyProcessDoFn())
Here,
ExpandGlobDoFn: expands input globs and generates files
MyReadDoFn: reads a given file
MyProcessDoFn: processes an element read from a file
I used two Reshuffles here (note that Reshuffle has a GroupByKey in it) to allow (1) parallelizing reading of files from a given glob (2) parallelizing processing of elements from a given file.
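Applied to the sample pipeline from the question, a minimal sketch of that advice (raw_items, format_item, extract_features and path are taken from the question; the Reshuffle placement is only an illustration of the suggestion, not a verified fix):
import apache_beam as beam

# Group once, then branch; the Reshuffle after the high fan-out FlatMap lets
# Dataflow redistribute the expanded elements across workers.
items_by_key = raw_items | beam.GroupByKey()
(items_by_key
 | beam.FlatMap(format_item)
 | beam.Reshuffle()
 | beam.io.WriteToText(path))
features = items_by_key | beam.Map(extract_features)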
Based on my experience in troubleshooting this SO question, reusing GroupBy output in more than one transformation can make your pipeline extremely slow. At least this was my experience with Apache Beam SDK 2.11.0 for Python.
Common sense told me that branching out from a single GroupBy in the execution graph should make my pipeline run faster. After 23 hours of running on 120+ workers, the pipeline was not able to make any significant progress. I tried adding Reshuffles, using a combiner where possible, and disabling the experimental shuffling service.
Nothing helped until I split the pipeline into two. The first pipeline computes the GroupBy and stores it in a file (I need to ingest it "as is" into the DB). The second reads the file with the GroupBy output, reads additional inputs and runs further transformations. The result - all transformations finished successfully in under 2 hours. I think that if I had just duplicated the GroupBy in my original pipeline, I would probably have achieved the same results.
I wonder whether this is a bug in the Dataflow execution engine or the Python SDK, or whether it works as intended. If it is by design, it should at least be documented, and a pipeline like this should either be rejected at submission time or produce a warning.
You can spot this issue by looking at the 2 branches coming out of the "Group keywords" step. It looks like the solution is to rerun GroupBy for each branch separately.
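A rough sketch of that workaround, giving each branch its own GroupByKey (again reusing the names from the sample above; untested):
import apache_beam as beam

# Duplicate the grouping so each branch has its own GroupByKey step.
(raw_items
 | "GroupForWrite" >> beam.GroupByKey()
 | beam.FlatMap(format_item)
 | beam.io.WriteToText(path))
features = (raw_items
            | "GroupForFeatures" >> beam.GroupByKey()
            | beam.Map(extract_features))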

rrdtool xport - limit on DEFs

I have a script that generates command-line invocations of rrdtool xport based on input provided in a domain-specific language. This works well until the number of DEFs in the command line exceeds a certain number - it seems to be around 50. At that point the command simply returns without any output or error information.
Is there a limit on the number of DEFs in rrdtool xport? If so, can it be raised or circumvented?
The issue turned out to be the character limit on the command line sent to the shell via Python's os.system call. It can be worked around by writing the command line to a temporary executable script and running that script instead.
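A minimal sketch of that workaround (long_command stands in for the generated rrdtool xport invocation; a POSIX shell is assumed):
import os
import stat
import subprocess
import tempfile

# Write the overlong command to a temporary script, make it executable, run it.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("#!/bin/sh\n")
    f.write(long_command + "\n")
    script_path = f.name
os.chmod(script_path, os.stat(script_path).st_mode | stat.S_IEXEC)
output = subprocess.check_output([script_path])
os.remove(script_path)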

Use a long running database migration script

I'm trialing FluentMigrator as a way of keeping my database schema up to date with minimum effort.
For the release I'm currently building, I need to run a database script to make a simple change to a large number of rows of existing data (around 2% of 21,000,000 rows need to be updated).
There's too much data to update in a single transaction (the transaction log fills up and the script aborts), so I use a WHILE loop to iterate through the table, updating 10,000 rows at a time, each batch in a separate transaction. This works, and takes around 15 minutes to run to completion.
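For reference, the batching pattern described above looks roughly like this (hypothetical table and column names; SQL Server syntax assumed):
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.MyTable
    SET SomeColumn = 'new value'
    WHERE SomeColumn = 'old value';

    IF @@ROWCOUNT = 0 BREAK;
END
Each UPDATE statement runs in its own implicit transaction, which is what keeps the transaction log from filling up.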
Now that the script is complete, I'm trying to integrate it into FluentMigrator.
FluentMigrator seems to run all the migrations for a single batch in one transaction.
How do I get FM to run each migration in a separate transaction?
Can I tell FM to not use a transaction for a specific migration?
This is not possible as of now.
There are ongoing discussions and some work already in progress.
Check it out here: https://github.com/schambers/fluentmigrator/pull/178
But your use case will surely help push things in the right direction.
You are welcome to take part in the discussion!
Maybe someone will find a temporary workaround?

how to ignore list of attributes using the command line while clustering in weka?

I am running a series of clustering analyses in Weka and I have realized that automating it is the way to go if I want to get anywhere. I'll explain a bit how I am working.
I do all the pre-processing manually in R and save it as a CSV file, then import it into Weka and save it again as an arff file.
I use Weka's GUI, and in general I just open my data from the arff file, go directly to the clustering tab and play around. (My experience using the CLI is limited.)
I am trying to reproduce some results I got using the GUI, but now with commands in the CLI. The problem is that I usually ignore a list of attributes when clustering from the GUI, and I cannot find a way to specify a list of attributes to be ignored on the command line.
For example:
java weka.clusterers.XMeans \
-I 10 -M 1000 -J 1000 \
-L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"
My experience with weka is limited so I don't know if I am missing some basic understanding of how it works.
Data preprocessing functions in Weka are called filters.
You need to use a filter together with the clustering algorithm.
See the example below.
java weka.clusterers.FilteredClusterer \
-F "weka.filters.unsupervised.attribute.Remove -V -R 1,5" \
-W weka.clusterers.XMeans \
-t "/home/pedrosaurio/bigtable.arff" \
-- -I 10 -M 1000 -J 1000 -L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10
Here the Remove filter runs before XMeans: -R 1,5 selects attributes 1 and 5 and -V inverts the selection, so all attributes except 1 and 5 are removed (drop -V to remove attributes 1 and 5 instead). The filter specification is quoted so its options stay with the filter, and the options after -- are passed to the base clusterer (XMeans).
To ignore attributes you have to do it through the distance function.
Ignore attributes from code (MATLAB, via Weka's Java API):
COLUMNS = '3-last'; % Attribute indices start at 1; 'first' and 'last' are valid too, e.g. first-3,5,6-last
Df = weka.core.EuclideanDistance(); % Set up the distance function.
Df.setAttributeIndices(COLUMNS);    % Use only these attributes in the distance calculation.
Ignore attributes from the GUI: [screenshots not reproduced here]
I do not understand why, when someone asks how to ignore attributes, all the answers explain how to modify the dataset using a filter in the preprocess section.
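Following that approach, the attribute indices can also be passed to the distance function directly in the original command line, without filtering the data first (the 3-last range is only an illustration; -R here belongs to the distance function, not to a filter):
java weka.clusterers.XMeans \
-I 10 -M 1000 -J 1000 \
-L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.EuclideanDistance -R 3-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"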