How to get the name of the document the pipeline is currently working on?

Let's say a corpus has 1k docs and is processed by a pipeline.
At some point, the pipeline gets stuck, throws an exception, or behaves strangely. All of these are very likely to be document-related.
So it would be nice to know which document is currently being processed in the pipeline, for example by printing the doc name in a JAPE transducer.

To get the name of the document being processed, you can write a simple JAPE rule like:
Phase: DocName
Input: Token
Options: control = once
Rule: DocName
(
  {Token}
)
-->
{
  System.out.println(doc.getName());
}
Put this rule as the first rule in your pipeline. It only fires if the document contains at least one Token.

Related

Jmeter - Run sample multiple times based on regular expression extractor

I need to run an HTTP request multiple times based on a regular expression extractor applied to the same request. I have put the request under a While Controller, but I am stuck writing the correct JavaScript condition for it.
Here is the While Controller code: ${__javaScript("${uidvalue}"=="test")}.
I want the loop to end as soon as a uidvalue from the regular expression extractor is found. Can anybody help me with this?

If I understood your question correctly, you want to keep executing one request until the response contains specific text.
I was able to do that with JMeter's While Controller:
1. Add a While Controller to the Test Plan.
2. Add the request and a Counter (Add -> Config Element -> Counter). See image 1 for the Counter settings.
3. Put the following condition in the While Controller:
${__groovy(("${Status}"!="Success") && ${count} <= 5)}
Always use a counter to cap the repeats so that JMeter cannot go into an infinite loop. Here I am waiting for a 'Success' status and executing the request 5 times at most.
See image 2 for the While Controller settings.
Thank you, I hope this helps.
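
The loop logic behind that condition can be sketched outside JMeter. The Python below is only an illustration of "repeat until the status matches, but never more than N times"; it is not JMeter code, and `repeat_until_success`, `send_request` and the canned responses are hypothetical names made up for the example.

```python
from typing import Callable, Tuple


def repeat_until_success(send_request: Callable[[], str],
                         wanted: str = "Success",
                         max_attempts: int = 5) -> Tuple[str, int]:
    """Call send_request until it returns `wanted` or the cap is reached."""
    status = ""
    attempts = 0
    # Same shape as the While condition: status differs AND counter within the cap.
    while status != wanted and attempts < max_attempts:
        status = send_request()
        attempts += 1
    return status, attempts


# Hypothetical stand-in for the sampler: it succeeds on the third call.
responses = iter(["Pending", "Pending", "Success", "Success"])
result = repeat_until_success(lambda: next(responses))
```

Without the `attempts < max_attempts` part, a request that never returns the wanted status would keep the loop running forever, which is the situation the Counter guards against.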

How does multi-line logging work in Lambda -> CloudWatch

My multi-line log messages all end up as multiple events, one event per line. According to the documentation:
Each call to LambdaLogger.log() results in a CloudWatch Logs event...
but then:
However, note that AWS Lambda treats each line returned by System.out
and System.err as a separate event.
Looking inside LambdaAppender's source code, it seems that it proceeds to log the event to System.out anyway. So does that mean multi-line messages will always be broken down into multiple events?
I have read about configuring the multi_line_start_pattern, but that seems only applicable when you get to deploy a log agent, which isn't accessible in Lambda.
[Edit] LambdaAppender logs to LambdaLogger which logs to System.out.
[Edit] I found some post where a workaround was suggested - use '\r' for the eol when printing the messages. This seems to work for messages that my code produces. Stack traces logged everywhere are still a problem.
[Edit] I have been using two workarounds:
1. Log complex data structures (e.g. sizable maps) as JSON. CloudWatch recognizes JSON strings in log events and pretty-prints them.
2. Replace '\n' with '\r'. For stack traces I created a utility method (this is in Kotlin, but the idea is generic enough):
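import java.io.PrintWriter
import java.io.StringWriter

// Render the stack trace into a string, then swap line feeds for
// carriage returns so the whole trace is written as a single line.
fun formatThrowable(t: Throwable): String {
    val buffer = StringWriter()
    t.printStackTrace(PrintWriter(buffer))
    return buffer.toString().replace("\n", "\r")
}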
I think in the long run a more ideal solution would be an Appender implementation that decorates ConsoleAppender, which would do the \r replacement on all messages passing through.
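
The decorating idea can be sketched with Python's standard `logging` module instead of a Log4j Appender. This is only an analogue under that assumption: `SingleEventFormatter` and `build_logger` are hypothetical names, and whether CloudWatch keeps a '\r'-joined message as one event is the behaviour reported above, not something this code verifies.

```python
import io
import logging


class SingleEventFormatter(logging.Formatter):
    """Formatter that rewrites line feeds as carriage returns."""

    def format(self, record: logging.LogRecord) -> str:
        # The base class appends the formatted traceback (if any) to the message,
        # so one replace covers both the message text and the stack trace.
        text = super().format(record)
        return text.replace("\n", "\r")


def build_logger(stream) -> logging.Logger:
    """Return a logger whose only handler writes to `stream` via the formatter above."""
    logger = logging.getLogger("single_event_demo")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SingleEventFormatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because the replacement happens in the formatter, stack traces logged anywhere through this logger are covered without calling a helper at each call site.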

Best practice is to use JSON in your logs. Instead of sending multi-line output, send formatted JSON (whatever language you are using, you will find a library that already does this for you).
You will be amazed how easy it becomes to browse your logs from there. For instance, AWS CloudWatch Logs Insights automatically detects your fields, so you can parse and query them within seconds.
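
A minimal sketch of that approach in Python, using only the standard library. The helper name `json_log_line` and the field names (`level`, `message`, `stack_trace`, `request_id`) are assumptions chosen for the example, not a required schema.

```python
import json
import traceback
from typing import Optional


def json_log_line(level: str, message: str,
                  exc: Optional[BaseException] = None, **fields) -> str:
    """Serialize one log entry as a single-line JSON string."""
    entry = {"level": level, "message": message, **fields}
    if exc is not None:
        # The multi-line traceback becomes one JSON string value;
        # json.dumps escapes its newlines, so the output stays on one line.
        entry["stack_trace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return json.dumps(entry)
```

Printing the returned string emits a single line, so a stack trace travels inside one log event as a field rather than as separate lines.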

I suggest using the slf4j-simple-lambda project and referring to this blog for more explanation.
Using slf4j with slf4j-simple-lambda solves your problem elegantly, and the solution stays lightweight. The project includes the parameter org.slf4j.simpleLogger.newlineMethod, which exists to solve this problem. By default its value is auto, which should detect automatically whether manual newline handling is needed.
Disclosure: I am a co-author of slf4j-simple-lambda and the author of the blog.

REST - post to get data. How else can this be done?

As I understand it, you should not POST to get data.
However, I'm on a project where we are POSTing to get data.
For example, we send the following:
{
  "zipCOde":"85022",
  "city":"PHOENIX",
  "country":"US",
  "products":[
    {
      "sku":"abc-21",
      "qty":2
    },
    {
      "sku":"def-13",
      "qty":2
    }
  ]
}
Does it make sense to POST here? How could this be done without POSTing? There could be one or more products.

Actually, there is a SEARCH method in HTTP, but sadly it is for WebDAV: https://msdn.microsoft.com/en-us/library/aa143053(v=exchg.65).aspx So if you want to send a request body with the request, you can try that.
POSTing is okay if you have a complex search. "Complex" is relative; to me it means that the query combines different logical operators.
The current search is not that complex, and you can put the non-hierarchical components into the query string of the URI. An example with additional line breaks:
GET /products/?
zipCOde=85022&
city=PHOENIX&
country=US&
filters[0]['sku']=abc-21&
filters[0]['qty']=2&
filters[1]['sku']=def-13&
filters[1]['qty']=2
You can choose a different serialization format and encode it as URI component if you want.
GET /products/?filter={"zipCOde":"85022","city":"PHOENIX","country":"US","products":[{"sku":"abc-21","qty":2},{"sku":"def-13","qty":2}]}
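
Encoding the JSON as a URI component can be sketched as follows in Python. The `filter` parameter name comes from the example above; the helper names `build_search_url` and `read_filter` are hypothetical, and the second function only stands in for what a server would do when reading the parameter back.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

search = {
    "zipCOde": "85022",
    "city": "PHOENIX",
    "country": "US",
    "products": [{"sku": "abc-21", "qty": 2}, {"sku": "def-13", "qty": 2}],
}


def build_search_url(base_path: str, criteria: dict) -> str:
    """Serialize the criteria to compact JSON and percent-encode it as one parameter."""
    compact = json.dumps(criteria, separators=(",", ":"))
    return base_path + "?" + urlencode({"filter": compact})


def read_filter(url: str) -> dict:
    """Decode the `filter` parameter back into a dict."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["filter"][0])


url = build_search_url("/products/", search)
```

The braces, quotes and brackets are percent-encoded in `url`, so the request line stays valid while the nested product list survives the round trip.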

One potential option is to JSON.serialize your object and send it as a query string parameter on the GET.

Load testing with SOAP UI

I have SoapUI 4.5.1 and have made a load test, which is working fine. My problem is that I run the same request every time, and I need to change the values in the SOAP request I am sending.
For example, here is a block of my SOAP request:
<ns:Assessment>
  <ns:Project>
    <ns:ProviderId>SHL</ns:ProviderId>
    <ns:ProjectId>SampleAssessment</ns:ProjectId>
  </ns:Project>
</ns:Assessment>
Provider ID: SHL
Project ID: SampleAssessment
Is there a way to make those values change, taking them from some kind of list?
For example: Provider IDs [SHL, SLH, LHS]
Project IDs [SampleAssessment, TestAssessment, AnotherAssessment]
With a load test I am making three requests, so that for the first request the values look like this:
<ns:Assessment>
  <ns:Project>
    <ns:ProviderId>SHL</ns:ProviderId>
    <ns:ProjectId>SampleAssessment</ns:ProjectId>
  </ns:Project>
</ns:Assessment>
for the second like this:
<ns:Assessment>
  <ns:Project>
    <ns:ProviderId>SLH</ns:ProviderId>
    <ns:ProjectId>TestAssessment</ns:ProjectId>
  </ns:Project>
</ns:Assessment>
and so on...
Is there a way to make this happen with SoapUI?

From my experience, you will need to use a Groovy Script step.
For example, if you have a step before your request that is a script, you can use something like:
context.setProperty("ProviderId", "SHL")
Then in your request, use:
<ns:ProviderId>${ProviderId}</ns:ProviderId>
Of course, this doesn't buy you much by itself. There are a few ways to vary what the context.setProperty("ProviderId", "SHL") line sets. You can create a collection and iterate over it using something like:
def providers = ['ABC', 'DEF', 'GHI', 'JKL']
providers.each() {
    context.setProperty("ProviderId", it)
    testRunner.runTestStepByName( "nameofteststep" )
}
Here "nameofteststep" is the name of the SOAP request test step. This might sound odd, but if you right-click the test step and disable it, the Groovy script can still execute it, yet it will not run as part of the normal sequence. In other words, the script runs it 4 times, and it does not run a fifth time after the script completes just because it comes after the script. Keep in mind that each load test thread then makes four requests. I am pretty sure the SoapUI statistics take this into account, but you might want to keep an eye on it.
Alternatively, you could check the 'threadIndex' and set the context variable based on that. A bit like this here: Log ThreadCount.
You could also use a collection without a loop: increment an index that you save as a test case property, and send the string corresponding to that index.
Personally, I think the first way is the most straightforward, but I can provide an example of the other ones if you like.
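
The index-based alternatives (thread index, or a saved counter) come down to picking list entries by position and wrapping around. Here is a small Python sketch of that selection logic only; it is not SoapUI or Groovy code, the lists are the ones from the question, and `values_for` and `build_request` are hypothetical helper names.

```python
PROVIDER_IDS = ["SHL", "SLH", "LHS"]
PROJECT_IDS = ["SampleAssessment", "TestAssessment", "AnotherAssessment"]

REQUEST_TEMPLATE = (
    "<ns:Assessment><ns:Project>"
    "<ns:ProviderId>{provider}</ns:ProviderId>"
    "<ns:ProjectId>{project}</ns:ProjectId>"
    "</ns:Project></ns:Assessment>"
)


def values_for(index: int) -> tuple:
    """Pick the provider/project pair for a thread index or request counter."""
    # The modulo wraps around, so index 3 reuses the first pair.
    provider = PROVIDER_IDS[index % len(PROVIDER_IDS)]
    project = PROJECT_IDS[index % len(PROJECT_IDS)]
    return provider, project


def build_request(index: int) -> str:
    """Fill the request fragment with the values chosen for this index."""
    provider, project = values_for(index)
    return REQUEST_TEMPLATE.format(provider=provider, project=project)
```

In SoapUI the equivalent would be to compute the values in a script step and store them with context.setProperty, so the request can keep using the ${ProviderId} placeholder.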

There is a simple way of doing this without writing a Groovy script.
After creating a test case, include the test steps below:
1-Data source
2-Request
3-Loop
The data source reads an Excel file (other data source types such as XML, Groovy, JDBC or grid also work, but Excel is the simplest one).
The file should contain the data that you need to change within the request.
Within the test request, right-click and select "get data". Note that your test request should then look like this:
<ns:ProviderId>${ProviderId}</ns:ProviderId>
The last step is the "Loop". It returns to the first step until the data runs out.
I hope this helps.

Create single and multiple resources using restful HTTP

In my API server I have this route defined:
POST /categories
To create one category you do:
POST /categories {"name": "Books"}
I thought that if you want to create multiple categories, then you could do:
POST /categories [{"name": "Books"}, {"name": "Games"}]
I just want to confirm that this is good practice for a RESTful HTTP API.
Or should one have a
POST /bulk
for allowing clients to do any operations at once (creating, reading, updating and deleting)?

In true REST, you should probably POST this in multiple separate calls. The reason is that each one will result in a new representation. How would you expect to get that back otherwise?
Each post should return the resultant resource location:
POST -> New Resource Location
POST -> New Resource Location
...
However, if you need a bulk, then create a bulk. Be dogmatic where possible, but if not, pragmatism gets the job done. If you get too hung up on dogmatism, then you never get anything done.
Here is a similar question
Here is one that suggests HTTP Pipelining to make this more efficient

There's nothing particularly wrong with having a bulk operation that you POST to, to activate (it'll be non-idempotent so POST is the right verb) but there are some caveats:
You're making multiple resources, so you need to respond with multiple URLs. This means you can't use the redirect pattern: you'll have to send a list of URLs back in some form.
You have a problem in that bulk operations are often not very discoverable. Discoverability is one of the most important things about RESTfulness, as it means that someone can come along and figure out how to write a client without lots of help from the server author.
Dealing with partial failures when you've got bulk operations remains problematic. It's a problem with any other paradigm too (I've watched people tie themselves in knots over this when working with extensions to SOAP) so it isn't a surprise, but unless you can guarantee that all the creations will work, you're going to have to work out what happens when you make one resource and fail to make the second. (Also, if the bulk request wanted a third one done, would you go on and try that?)
The simplest approach is just to support one create per request; that's a much easier pattern to get right and is better understood all round.

There's nothing wrong with creating multiple resources at once with POST (just don't try it with PUT). It's not "un-REST-ful", especially if you create a representation for the bulk operation itself. I suggest you create an index resource at the same time you create the individual resources, and return a "303 See Other" to it. That index representation would then contain links to all of the created resources (and possibly error information if any of them failed).
POST /categories/uploads/
[{"name": "Books"}, {"name": "Games"}]
303 See Other
Location: /categories/uploads/321/
(actually, now that I think about it, 201 might be better than 303)
GET /categories/uploads/321/
200 OK
Content-Type: application/json
[{"name": "Books", "link": "/categories/Books/"},
{"name": "Games", "error": "The 'Games' category already exists."}]
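
The index representation in that exchange could be built roughly like this. It is a sketch in Python under stated assumptions: an in-memory dict stands in for the category store, the link format follows the example above, and `bulk_create` is a hypothetical helper rather than part of any framework.

```python
from typing import Dict, List


def bulk_create(existing: Dict[str, dict], items: List[dict]) -> List[dict]:
    """Create each item independently and report a link or an error per item."""
    index = []
    for item in items:
        name = item["name"]
        if name in existing:
            # A failed item does not abort the batch; it is recorded in the index.
            index.append({"name": name,
                          "error": f"The '{name}' category already exists."})
            continue
        existing[name] = item
        index.append({"name": name, "link": f"/categories/{name}/"})
    return index


# Hypothetical starting state: 'Games' already exists.
store = {"Games": {"name": "Games"}}
report = bulk_create(store, [{"name": "Books"}, {"name": "Games"}])
```

Here every item gets its own outcome, which is one way to answer the partial-failure question: the client reads the index and decides what to retry.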

In your case I would also go the /bulk resource way. The pattern I would suggest, and to my understanding the most natural one, is to work with the 202 Accepted status code.
The idea of a bulk request is that the server should not be forced to answer immediately, as that would mean the client has to wait until its bulk request has completed.
Here is the pattern:
POST /bulk [{"name": "Books"}, {"name": "Games"}]
202 Accepted | Location: /bulk/processing/status/resourceId
GET /bulk/processing/status/resourceId
entry = "REST in peace" | completed | 0 errors | /categories/category/resourceId
entry = "Walking dead" | processing | 0 errors ->
So the client POSTs the bulk information to the server. The server just accepts it with a 202, which gives no guarantee about the processing state at the time of the response.
But the server also provides a link to a status resource. There the client can look at each of the created resources and its processing state. When processing has finished, the client can access the resource via the given link.
Error cases can be identified by the client, and erroneous data can be resent with a PUT on the completed resource.
Finally, a piece of advice I usually follow: whenever you hit something in your design that cannot be mapped onto an HTTP feature, it is probably because of a missing resource.
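
A minimal in-memory sketch of the 202 plus status-resource pattern, written in Python without any web framework. The URL shapes follow the example above; the `BulkJobs` class, its method names, and the idea of completing one entry per `process_next` call are assumptions made purely for illustration.

```python
import itertools
from typing import Dict, List, Tuple


class BulkJobs:
    """Toy job store: accept a batch now, expose per-entry progress later."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, List[dict]] = {}

    def submit(self, items: List[dict]) -> Tuple[int, str]:
        """Handle POST /bulk: record the entries and answer 202 with a status URL."""
        job_id = next(self._ids)
        self._jobs[job_id] = [
            {"entry": item["name"], "state": "processing", "link": None}
            for item in items
        ]
        return 202, f"/bulk/processing/status/{job_id}"

    def process_next(self, job_id: int) -> bool:
        """Stand-in for background work: complete the next pending entry."""
        for row in self._jobs[job_id]:
            if row["state"] == "processing":
                row["state"] = "completed"
                row["link"] = f"/categories/category/{row['entry']}"
                return True
        return False

    def status(self, job_id: int) -> Tuple[int, List[dict]]:
        """Handle GET on the status resource: return a copy of each entry's state."""
        return 200, [dict(row) for row in self._jobs[job_id]]
```

A client would call `submit`, remember the returned location, and poll `status` until every entry is completed or reports an error.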

Actually, this is still a hot topic today. To simplify things, I usually say that each practice has a scenario it suits best.
E.g.:
1. If you are receiving likes on a post, you don't need bulk, since there is only one like per comment.
2. If you are receiving favorite comments, bulk can fit well: consider someone reviewing the comments they read, ticking the checkboxes for all of their favorites, and sending them at once.
Again, this is based on my experience working with RESTful APIs. Currently, for the sake of multitasking and other things, my colleague and I find ourselves using bulk all the time in most of the MIS (Management Information System) projects we do. This is because modern web and mobile apps can do a lot of the work and send the final results to the back end, so the back end has little to do as long as the received data doesn't violate the business logic.
