I am working on a data extraction project that requires building a web scraping program in Python, using Selenium with the PhantomJS headless WebKit browser, to scrape public information such as friend lists on Facebook. The program starts off fairly fast, but after a day of running it gets slower and slower and I cannot figure out why. Can anyone give me an idea of why it keeps slowing down? I am running it on a local machine with pretty good specs: 4 GB of RAM and a quad-core processor. Also, does Facebook provide any API to find friends of friends?
We faced the same issue. We resolved it by closing the browser automatically after a set time interval, clearing the temporary cache, and then opening a new browser instance and continuing the process.
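For example, here is a rough sketch of that pattern in Python with Selenium and PhantomJS, matching the question's setup; the restart interval and the scraping loop itself are illustrative assumptions, not part of the original answer:

# Sketch: restart PhantomJS periodically so cache/memory buildup doesn't slow the scraper down.
import time
from selenium import webdriver

RESTART_EVERY = 60 * 60  # restart the browser roughly every hour (arbitrary choice)

def new_driver():
    # A fresh PhantomJS instance starts with an empty cache and no accumulated state.
    return webdriver.PhantomJS()

def scrape(urls):
    driver = new_driver()
    started = time.time()
    for url in urls:
        if time.time() - started > RESTART_EVERY:
            driver.quit()          # close the old instance and drop its temporary data
            driver = new_driver()  # continue the run with a clean browser
            started = time.time()
        driver.get(url)
        # ... extract the public data you need from driver.page_source ...
    driver.quit()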
I installed ColdFusion 2018 recently, and with the installation less than a month old (and my understanding of the technology even less), my ColdFusion service has stopped working. I have tried a number of things and referred to a number of articles about this kind of error where the service is not accessible; some people were able to get it resolved, but other, more obscure causes of the error remain untouched and unknown.
Whenever I try to restart the service, I get the error shown below:
"Windows could not start the ColdFusion 8 Application Server on Local Computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2."
Without much understanding, I started to Google around. Working through every one of these posts, I tried the following:
Configuring the JRE and relaunching the service after checking the "JAVA_HOME" variable and JVM.config
Running the batch files in every possible combination to see if anything clicked
Checking whether the installed Java version works and is compatible with the installed ColdFusion version
Fiddling with the "SessionStorage" variable in the neo-runtime.xml file, as some suggested
and a few other tricks, coupled with numerous service restart attempts and a few machine reboots as well.
A service that renders ColdFusion pages should not be shutting down abruptly like this. To add to the agony, the CF Admin also depends on the service and hence does not work either.
Any pointers to any potential solutions?
I am writing an ASP.NET web page which calls an API to update my client's property website using XML data. The data from the API is real-time, so I would like to run the page every 10 minutes.
Clearly I don't want to load my page manually to keep my client's property website up to date. There is a lot of help on Stack Overflow and elsewhere on this type of question, but I have become a little overwhelmed by the options. I think that one way to go would be:
Windows Task Scheduler to fire every ten minutes (to trigger a VB.Net Service)
VB.Net Service (to run the web page)
My page runs.
That feels like overkill; I haven't written a Windows service or used the Task Scheduler, and it feels like there should be two steps, not three.
Now, if I do use a VB.Net service, then I think it might be better to give more of the work to the service rather than putting my script in a web page, but I am used to writing web pages!
I can't help feeling that if I just keep the page open in a browser somewhere I can easily use JavaScript to run the page every 10 minutes, but that means ensuring it's open in a browser. Bad solution I think...
What I need is an overview of my options to make an informed decision and if it means learning then fine. Thanks in advance!
You can use JavaScript/jQuery to call a page or web method continuously at a set interval:
var x = 10; // your time interval in minutes; in your case x = 10
setInterval(function() {
    // call your page or webmethod here
}, 1000 * 60 * x);
In my opinion the best approach would be to create a Windows service and have the service call the web page. A Windows service is much more stable than the Task Scheduler, because scheduled tasks can overlap if the previous scheduled run did not finish. Using a Windows service also gives you more control over error handling and logging.
Get started with this link:
http://code.msdn.microsoft.com/windowsdesktop/CSWindowsService-9f2f568e
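The core of such a service is just a timer loop that requests the page and logs failures. A rough sketch of that loop, shown in Python only for brevity (in the asker's case the logic would live inside the VB.Net service, and the URL and interval below are placeholders):

# Sketch: the loop a background service would run to hit the update page every 10 minutes.
import logging
import time
import urllib.request

logging.basicConfig(filename="property-update.log", level=logging.INFO)

URL = "http://example.com/UpdateProperties.aspx"  # placeholder address of the ASP.NET page
INTERVAL = 10 * 60                                # 10 minutes

while True:
    try:
        with urllib.request.urlopen(URL, timeout=60) as response:
            logging.info("Update call returned %s", response.status)
    except Exception:
        # Unlike overlapping scheduled tasks, a failed run here is just logged;
        # the loop waits for the next interval instead of piling up.
        logging.exception("Update call failed")
    time.sleep(INTERVAL)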
I've started using SOAP UI recently to test web services and it's pretty cool, but it's a huge resource hog.
Is there any way to reduce the amount of resources it uses?
It shouldn't be a resource hog, although I've seen it do this before. I leave it running on my PC all week, and a co-worker with a similar machine (dual-core running XP) has to kill it every few hours, otherwise it keeps using CPU. I'd try uninstalling/re-installing. Currently, my instance has been up for 10 days, running a mockservice that I've been hitting very hard (I've sent it thousands of requests). CPU time total (over 10 days) is about an hour and a half, but the "right now" number is about 1%.
There are no popular alternatives, aside from writing your own client in the language of your choice.
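For example, a bare-bones SOAP call needs nothing more than an HTTP library; the endpoint, SOAPAction, and envelope below are hypothetical placeholders:

# Sketch: calling a SOAP service by hand instead of through SOAP UI.
import urllib.request

ENDPOINT = "http://example.com/service.asmx"     # hypothetical endpoint
SOAP_ACTION = "http://example.com/GetSomething"  # hypothetical SOAPAction header value

envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetSomething xmlns="http://example.com/" />
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": SOAP_ACTION},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # raw SOAP response to inspect or assert on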
If you're testing WCF services, you can run wcftestclient from the Visual Studio command line. It works for local or remotely hosted services. It's no good for ASMX-style .NET 2.0 SOAP services, though.
If you want to test using only JSON, you could use one of the lightweight REST clients, e.g. the Mozilla REST plugin.
We test our SOAP APIs manually with SOAP UI and otherwise use jMeter for automated SOAP API testing. While having a GUI seems attractive at first, I find both applications quite user-unfriendly and time-consuming to work with.
As already suggested, you could do it in code using Java or maybe use a dynamic language like Ruby:
Testing SOAP Webservices with RSpec
SOAP web Services testing in RUBY
As user mitchnull mentions in his comment:
Disabling the browser component (-Dsoapui.jxbrowser.disable=true) solved the 100% CPU usage issues for me. (When it was enabled, it periodically went to 100% CPU even when not running any tests/requests.)
I want to build the following back-end service:
For each call to the service, spawn a web browser that loads a webpage (including Flash) and returns a screenshot of the page to the caller at intervals (i.e. every 3 seconds) until the caller disconnects. This needs to scale to many callers (thousands, perhaps), each of which needs its own browser session.
When I decided I needed to build this program, I was surprised that I had basically no idea how I could do it.
On Stack Overflow, I found the following link, which looks promising: http://www.genuitec.com/about/labs.html
Any other ideas?
You can use XULRunner (the Mozilla engine) on your server side. I doubt, though, that this solution is scalable.
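For what it's worth, the per-caller loop itself is simple regardless of engine. Here is a minimal sketch using Selenium with a headless browser instead of XULRunner (my substitution, and note that headless engines may not render Flash); it does nothing to address the scalability concern above:

# Sketch: one caller's session - load the page, then yield a screenshot every few seconds.
import time
from selenium import webdriver

def stream_screenshots(url, interval=3):
    driver = webdriver.PhantomJS()  # one headless browser per caller
    try:
        driver.get(url)
        while True:
            # PNG bytes of the current rendering; the service would push these to the caller.
            yield driver.get_screenshot_as_png()
            time.sleep(interval)
    finally:
        driver.quit()  # runs when the caller disconnects and the generator is closed

# Usage (send_to_caller stands in for whatever transport you use):
# for png in stream_screenshots("http://example.com"): send_to_caller(png)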
While building this web service and the app that calls it, we have noticed that the first call to the web service each day is extremely slow. It will even time out on some days. However, every call after that works great. Can anybody shed light on why this might be and how we can get rid of this pain?
Thanks in advance!
If it's an ASP.NET web service, it may be the CLR initializing, loading, and verifying the assemblies for the first time. You may want to consider pre-compilation.
Agree with the other answers on caching, initialization, etc. As for a workaround, one possibility may be to set up some sort of daily task (a SQL Server job, a Windows service, something else?) to simulate a hit to the service each day, so that your users don't experience this first slow request.
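For example, a tiny warm-up script like the one below could be run by the daily job; the URL and timeout are placeholders:

# Sketch: a daily "warm-up" request so the first real user doesn't pay the startup cost.
import urllib.request

SERVICE_URL = "http://example.com/MyService.asmx"  # placeholder service address

try:
    with urllib.request.urlopen(SERVICE_URL, timeout=300) as response:
        print("Warm-up call returned", response.status)
except Exception as exc:
    # A failure here is worth alerting on, since real users would likely see it too.
    print("Warm-up call failed:", exc)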
If it is an ASP.NET web service, then you might want to check the settings of the application pool the web service is running in, especially the idle timeout which defaults to 20 minutes in IIS7.
Configuring IIS7 idle-timeout
Even if it is not an ASP.NET web service, other web servers will have equivalent configuration settings you have to tweak to keep your web service alive overnight.
Can you duplicate the same behavior on your database? It could just be the db needing to optimise the query for the first run (Maybe the parameter is today's date?).
Are there a lot of static constructors or set up code in the Global.asax class? Because IIS recycles worker processes periodically, the start up code may be running again.
The rule for optimization is: don't guess. Put in profiling to find out exactly what is slow, and then work to make that faster. Everything already posted provides excellent tips on where to start looking for slowness.