I am using Eclipse Jetty 9.2.2. I am surprised to see that 10-12% of my requests take around 500+ milliseconds each, while the others are processed within 2-3 milliseconds each. This is the portion of the code taking the time:
BufferedReader br = new BufferedReader(new InputStreamReader(request.getInputStream()));
String line = null;
String postData = "";
while ((line = br.readLine()) != null) {
    postData += line;
}
br.close();
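As a side note (not necessarily the cause of the latency outliers), repeated String concatenation in a loop is quadratic, and readLine() silently strips line terminators from the POST body. A minimal sketch of a buffer-based read, assuming the HttpServletRequest named request from the code above and a UTF-8 body:

// Sketch: drain the request body with a char buffer into a StringBuilder.
// Avoids O(n^2) String concatenation and preserves line terminators,
// which readLine() would silently drop. "UTF-8" is an assumption.
StringBuilder postData = new StringBuilder();
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(request.getInputStream(), "UTF-8"))) {
    char[] buf = new char[4096];
    int n;
    while ((n = br.read(buf)) != -1) {
        postData.append(buf, 0, n);
    }
}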
I am working on a C++ library that makes an API request using curl. I'm having an issue processing the cookies returned in the response using curl_slist and curl_easy_getinfo(curl, CURLINFO_COOKIELIST, &cookies);.
It is a linked list, so I have current and next pointers, and I check for next being NULL to break out of the loop. But for some reason it gets stuck in the loop; it seems to process the same cookie over and over again, and I can't see why.
struct curl_slist* cookies;
curl_easy_getinfo(curl, CURLINFO_COOKIELIST, &cookies);
int i = 1;
std::vector<CookieObj> cookieObjects;
struct curl_slist* current = cookies;
if (current != NULL)
{
    string cookieString = string(current->data);
    CookieObj cookieObj = storeCookie(cookieString);
    cookieObjects.push_back(cookieObj);
    while (current->next != NULL)
    {
        string cookieString = string(current->data);
        CookieObj cookieObj = this->storeCookie(cookieString);
        cookieObjects.push_back(cookieObj);
        current = cookies->next;
        i++;
    }
}
The problem is this line:
current = cookies->next;
cookies always points to the same node, so cookies->next will always point to the same node. So, yes, the loop will continue indefinitely (assuming there are at least 2 cookies in the list). You want instead:
current = current->next;
Code simplification
Your current code also processes the first cookie twice, since current isn't advanced until the end of the loop body. One possible rearrangement:
for (current = cookies; current; current = current->next) {
    /* process cookie */
}
Please see the code sample below:
JavaRDD<String> mapRDD = filteredRecords
        .map(new Function<String, String>() {
            @Override
            public String call(String url) throws Exception {
                BufferedReader in = null;
                URL formatURL = new URL((url.replaceAll("\"", "")).trim());
                try {
                    HttpURLConnection con = (HttpURLConnection) formatURL.openConnection();
                    in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                    return in.readLine();
                } finally {
                    if (in != null) {
                        in.close();
                    }
                }
            }
        });
Here url is an HTTP GET request. Example:
http://ip:port/cyb/test?event=movie&id=604568837&name=SID&timestamp_secs=1460494800&timestamp_millis=1461729600000&back_up_id=676700166
This piece of code is very slow. The IP and port are random and the load is distributed, so the IP can take about 20 different values with different ports; I don't see a bottleneck on the server side.
When I comment out
in = new BufferedReader(new InputStreamReader(con.getInputStream()));
return in.readLine();
the code is very fast.
NOTE: The input data to process is 10 GB, read from S3 using Spark.
Is there anything wrong with how I am using BufferedReader or InputStreamReader, or is there an alternative?
I can't use foreach in Spark, as I have to get the response back from the server and need to save the JavaRDD as a text file on HDFS.
If we use mapPartitions, the code looks something like the below:
JavaRDD<String> mapRDD = filteredRecords.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterable<String> call(Iterator<String> tuple) throws Exception {
        final List<String> rddList = new ArrayList<String>();
        Iterable<String> iterable = new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                return rddList.iterator();
            }
        };
        while (tuple.hasNext()) {
            URL formatURL = new URL((tuple.next().replaceAll("\"", "")).trim());
            HttpURLConnection con = (HttpURLConnection) formatURL.openConnection();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
                rddList.add(br.readLine());
            } catch (IOException ex) {
                return rddList;
            }
        }
        return iterable;
    }
});
Here too we are still doing the same thing for each record, aren't we?
Currently you are using the map function, which creates a URL request for each row in the partition.
You can use mapPartitions instead, which will make the code run faster, as it lets you do the expensive connection setup to the server only once per partition instead of once per record.
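A minimal sketch of that idea, assuming Spark 1.x (matching the Iterable-returning FlatMapFunction in the question) and Apache HttpClient 4.x on the classpath; the pooled client is an assumption, any reusable HTTP client would do:

// Assumed imports: org.apache.http.client.methods.HttpGet,
// org.apache.http.client.methods.CloseableHttpResponse,
// org.apache.http.impl.client.CloseableHttpClient,
// org.apache.http.impl.client.HttpClients,
// org.apache.http.util.EntityUtils
JavaRDD<String> mapRDD = filteredRecords.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterable<String> call(Iterator<String> urls) throws Exception {
        // One client (and one connection pool) per partition, created once
        // here instead of once per record as in the map() version.
        CloseableHttpClient client = HttpClients.createDefault();
        List<String> results = new ArrayList<String>();
        try {
            while (urls.hasNext()) {
                String url = urls.next().replaceAll("\"", "").trim();
                CloseableHttpResponse response = client.execute(new HttpGet(url));
                try {
                    BufferedReader br = new BufferedReader(
                            new InputStreamReader(response.getEntity().getContent()));
                    results.add(br.readLine());
                    EntityUtils.consume(response.getEntity()); // drain so the connection can be reused
                } finally {
                    response.close();
                }
            }
        } finally {
            client.close(); // results is a materialized List, so closing here is safe
        }
        return results;
    }
});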
A big cost here is setting up TCP/HTTPS connections. This is exacerbated by the fact that, even if you only read the first (short) line of a large file, modern HTTP clients try to read() to the end of the file rather than abort the connection, in an attempt to re-use HTTP/1.1 connections better. This is a good strategy for small files, but not for files in the MB range.
There is a solution there: bound the content length on the read (that is, request a byte range) so that only a smaller block is read in, reducing the cost of the close(); the connection recycling then reduces HTTPS setup costs. This is what the latest Hadoop/Spark S3A client does if you set fadvise=random on the connection: it requests blocks rather than the entire multi-GB file. Be aware, though, that this design is actually really bad if you are going sequentially byte-by-byte through an entire file...
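For illustration, a sketch of that ranged-read idea with plain HttpURLConnection; the 8 KB window is an arbitrary assumption for the sketch, and the real S3A behaviour lives inside the Hadoop client:

// Sketch: ask the server for only the first 8 KB of the resource, so that
// "reading to the end" before close() is cheap and the connection can be
// recycled. Assumes the server honours Range and replies 206 Partial Content.
URL formatURL = new URL(url); // url as in the question
HttpURLConnection con = (HttpURLConnection) formatURL.openConnection();
con.setRequestProperty("Range", "bytes=0-8191");
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
try {
    String firstLine = in.readLine();
    // ... use firstLine ...
} finally {
    in.close(); // drains at most the 8 KB block, not the whole file
}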
I want to use "GATE" through the web, so I decided to create a SOAP web service in Java with the help of GATE Embedded.
But for the same document and saved pipeline, I get different run-time durations when GATE Embedded runs as a Java web service.
The same code has a constant run-time when it runs as a Java Application project.
In the web service, the run-time increases after each execution until I get a Timeout error.
Does anyone have experience with this kind of problem?
This is my code:
@WebService(serviceName = "GateWS")
public class GateWS {

    @WebMethod(operationName = "gateengineapi")
    public String gateengineapi(@WebParam(name = "PipelineNumber") String PipelineNumber,
                                @WebParam(name = "Documents") String Docs) throws Exception {
        try {
            System.setProperty("gate.home", "C:\\GATE\\");
            System.setProperty("shell.path", "C:\\cygwin2\\bin\\sh.exe");
            Gate.init();
            File GateHome = Gate.getGateHome();
            File FrenchGapp = new File(GateHome, PipelineNumber);
            CorpusController FrenchController;
            FrenchController = (CorpusController) PersistenceManager.loadObjectFromFile(FrenchGapp);
            Corpus corpus = Factory.newCorpus("BatchProcessApp Corpus");
            FrenchController.setCorpus(corpus);
            File docFile = new File(GateHome, Docs);
            Document doc = Factory.newDocument(docFile.toURL(), "utf-8");
            corpus.add(doc);
            FrenchController.execute();
            String docXMLString = null;
            docXMLString = doc.toXml();
            String outputFileName = doc.getName() + ".out.xml";
            File outputFile = new File(docFile.getParentFile(), outputFileName);
            FileOutputStream fos = new FileOutputStream(outputFile);
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            OutputStreamWriter out;
            out = new OutputStreamWriter(bos, "utf-8");
            out.write(docXMLString);
            out.close();
            gate.Factory.deleteResource(doc);
            return outputFileName;
        } catch (Exception ex) {
            return "ERROR: -> " + ex.getMessage();
        }
    }
}
I really appreciate any help you can provide.
The problem is that you're loading a new instance of the pipeline for every request, but then not freeing it again at the end of the request. GATE maintains a list internally of every PR/LR/controller that is loaded, so anything you load with Factory.createResource or PersistenceManager.loadObjectFrom... must be freed using Factory.deleteResource once it is no longer needed, typically using a try-finally:
FrenchController = (CorpusController) PersistenceManager.loadObjectFromFile(FrenchGapp);
try {
    // ...
} finally {
    Factory.deleteResource(FrenchController);
}
But...
Rather than loading a new instance of the pipeline every time, I would strongly recommend you explore a more efficient approach to load a smaller number of instances of the pipeline but keep them in memory to serve multiple requests. There is a fully worked-through example of this technique in the training materials on the GATE wiki, in particular module number 8 (track 2 Thursday).
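As a rough sketch of that pooling approach (not the wiki example verbatim; the pool size and class shape here are illustrative assumptions), using the same GATE classes as the question:

// Hypothetical pipeline pool: load a few copies of the .gapp once at startup
// (after Gate.init() has been called), then borrow/return per request instead
// of loading and freeing a pipeline on every call.
import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.util.persistence.PersistenceManager;

public class PipelinePool {
    private final BlockingQueue<CorpusController> pool;

    public PipelinePool(File gappFile, int size) throws Exception {
        pool = new ArrayBlockingQueue<CorpusController>(size);
        for (int i = 0; i < size; i++) {
            pool.add((CorpusController) PersistenceManager.loadObjectFromFile(gappFile));
        }
    }

    // The caller creates (and later deletes) the Document; the pool only
    // runs it through a borrowed controller.
    public String process(Document doc) throws Exception {
        CorpusController controller = pool.take(); // blocks if all instances are busy
        Corpus corpus = Factory.newCorpus("pool corpus");
        try {
            controller.setCorpus(corpus);
            corpus.add(doc);
            controller.execute();
            return doc.toXml();
        } finally {
            Factory.deleteResource(corpus);
            pool.put(controller); // return the instance for the next request
        }
    }
}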
I am having an issue with a WMI query.
I use a WMI query to search for and resume an instance in BizTalk.
When there are not that many instances (so when the data isn't that large), the query performs pretty well.
But when the data is large (about 3000 instances), the query takes about 6-10 seconds to execute, and this isn't tolerable.
The code is as follows:
string query = "SELECT * FROM MSBTS_ServiceInstance WHERE InstanceID = \"" + OrchestrationId + "\"";
ManagementObjectSearcher searcher = new ManagementObjectSearcher(new ManagementScope(#"root\MicrosoftBizTalkServer"), new WqlObjectQuery(query), null);
int count = searcher.Get().Count;
if (count > 0)
{
string[] strArray = new string[count];
string[] strArray2 = new string[count];
string[] strArray3 = new string[count];
string str2 = string.Empty;
string str3 = string.Empty;
int index = 0;
foreach (ManagementObject obj2 in searcher.Get())
{
if (str2 == string.Empty)
{
str2 = obj2["HostName"].ToString();
}
strArray2[index] = obj2["ServiceClassId"].ToString();
strArray3[index] = obj2["ServiceTypeId"].ToString();
strArray[index] = obj2["InstanceID"].ToString();
str3 = str3 + string.Format(" {0}\n", obj2["InstanceID"].ToString());
index++;
}
new ManagementObject(string.Format("root\\MicrosoftBizTalkServer:MSBTS_HostQueue.HostName=\"{0}\"", str2)).InvokeMethod("ResumeServiceInstancesByID", new object[] { strArray2, strArray3, strArray, 1 });
It's the first query (SELECT * FROM MSBTS_ServiceInstance...) that takes too long when the data gets bigger.
Any ideas how I can improve this?
The platform is Windows Server 2008 Enterprise.
Thx!
It looks like you are getting all service instances for your orchestration, not just the suspended ones.
Try adding the following to your query's where clause, so that only suspended and suspended-not-resumable service instances are returned:
and (ServiceStatus = 4 or ServiceStatus = 16)
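Applied to the query string from the question (with <OrchestrationId> standing in for the interpolated value), the WQL would look like:

SELECT * FROM MSBTS_ServiceInstance
WHERE InstanceID = "<OrchestrationId>"
  AND (ServiceStatus = 4 OR ServiceStatus = 16)

Filtering on ServiceStatus in the WHERE clause lets WMI do the work server-side, so far fewer instances are materialized and enumerated by the searcher.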
Thank you for the replies.
The reason I sometimes get that many suspended instances is by design.
Whenever a message arrives out of sequence, the orchestration is suspended until the previous message has gone through.
I found another way to resume the instances using the BizTalkOperations class that is installed with BizTalk:
BizTalkOperations operations = new BizTalkOperations(dataSource, initialCatalog);
foreach (Guid id in instanceIds)
{
    operations.ResumeInstance(id);
}
This code performs much better than the WMI code (and it's less code ^^) :)
Thanks
Environment: C#, .Net 3.5, Sql Server 2005
I have a method that works in a stand-alone C# console application project. It creates an XMLElement from data in the database and uses a private method to send it to a web service on our local network. When run from VS in this test project, it runs in < 5 seconds.
I copied the class into a CLR project, built it, and installed it in SQL Server (WITH PERMISSION_SET = EXTERNAL_ACCESS). The only difference is the SqlContext.Pipe.Send() calls that I added for debugging.
I am testing it by using an EXECUTE command on the stored procedure (in the CLR) from an SSMS query window. It never returns. When I stop execution of the call after a minute, the last thing displayed is "Calling GetResponse() using http://servername:53694/odata.svc/Customers/". Any ideas as to why the GetResponse() call doesn't return when executing within SQL Server?
private static string SendPost(XElement entry, SqlString url, SqlString entityName)
{
    // Send the HTTP request
    string serviceURL = url.ToString() + entityName.ToString() + "/";
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(serviceURL);
    request.Method = "POST";
    request.Accept = "application/atom+xml,application/xml";
    request.ContentType = "application/atom+xml";
    request.Timeout = 20000;
    request.Proxy = null;
    using (var writer = XmlWriter.Create(request.GetRequestStream()))
    {
        entry.WriteTo(writer);
    }
    try
    {
        SqlContext.Pipe.Send("Calling GetResponse() using " + request.RequestUri);
        WebResponse response = request.GetResponse();
        SqlContext.Pipe.Send("Back from GetResponse()");
        /*
        string feedData = string.Empty;
        Stream stream = response.GetResponseStream();
        using (StreamReader streamReader = new StreamReader(stream))
        {
            feedData = streamReader.ReadToEnd();
        }
        */
        HttpStatusCode StatusCode = ((HttpWebResponse)response).StatusCode;
        response.Close();
        if (StatusCode == HttpStatusCode.Created /* 201 */ )
        {
            return "Created @ Location= " + response.Headers["Location"];
        }
        return "Creation failed; StatusCode=" + StatusCode.ToString();
    }
    catch (WebException ex)
    {
        return ex.Message.ToString();
    }
    finally
    {
        if (request != null)
            request.Abort();
    }
}
The problem turned out to be the creation of the request content from the XML: disposing the XmlWriter does not close the underlying request stream by default (XmlWriterSettings.CloseOutput is false), so the request body was never completed. The original:
using (var writer = XmlWriter.Create(request.GetRequestStream()))
{
    entry.WriteTo(writer);
}
The working replacement:
using (Stream requestStream = request.GetRequestStream())
{
    using (var writer = XmlWriter.Create(requestStream))
    {
        entry.WriteTo(writer);
    }
}
You also need to dispose of the WebResponse. Otherwise, after a few calls the requests start timing out.
You are asking for trouble doing this in the CLR. And you say you are calling this from a trigger? This belongs in the application tier.
Stuff like this is why, when the CLR functionality came out, DBAs were very concerned about how it would be misused.