I am testing a web application where the HTML is built by JavaScript/jQuery/AJAX using the JSON respones from REST calls.
This being a very huge js file (require.js is also used), I'm finding it very hard to find the function which is sanitizing the JSON responses. Backtracking from the AJAX call or from the functions listed in https://code.google.com/archive/p/domxsswiki/wikis/jQuery.wiki is taking me nowhere.
Is there any way I can track the execution of the JS step by step, before it creates the HTML source for me?
Any add-on which could help? I have tried the 'visual event' add-on from Chrome, but it is not completely helpful as it displays only the JavaScript function attached to an event for DOM elements.
Related
I have just started with Python Web scraping through Requests. This could be a broad question, I will try to make it as brief as possible.
I came through situation where sometimes an entire page source can be downloaded with r.content (where r is a response object of requests's get call)
Sometimes some part of the data is stored in json format... In files that can be accessed by deeply observing the get and post calls made.
However, I even found websites where the entire content is in DOM but part of it is neither in Page source nor in Json files.
I am wondering how many of such places can a website store a data in?
(Just the names, I am not looking for how to get there)
For these last type of websites, I have observed almost every requests call made, but couldn't find where the data is.
So are there any other place except the 2 mentioned above? Or those are the only two indicating I am not doing my job right of observing the requests call?
You may answer it in brief bullet points and I can take my study from there.
Thanks in advance.
Lets assume we are talking only about HTML data. A web server could serve you data in many other formats (JSON/XML, etc.)
Please note that what I have described is generalisation, and like most generalisations, you could find exceptions that do not fit in it.
Broadly we could divide the type of data displayed (for the end user) into two categories
Pre render
Post render
Pre render
The entire HTML page is constructed at server-side and sent across to the client. Here, the JS side is concerned with the user interaction, and not with the structure of the data.
We are slowly moving away from this type of structure, but currently a large majority of all web pages uses this.
Web scraping is relatively easy here, as we can programatically pull the html page, and not bother about the javascript code that accompanies it.
a combination of requests and beautifulsoup should work in almost all of the cases (assuming that you could identify the general structure of the document).
Post render
Here the HTML page that is returned from the server is just a "skeleton" or placeholders for the actual data. The data is rendered by the accompanying JS code.
In such cases, if you fetch the source file via for eg., requests, you will get an empty shell, with no data in it.
for this if you inspect the calls made by a browser while rendering, (chrome's network tab or firefox's inspect tool or the more popular firebug), you will most likely see ajax requests that brings back the actual data from the server)
depending on how the requests are made, you could hit that ajax endpoint, and get the data in JSON.
you could use response.json() function to extract it into python-dicts.
In certain (rare) cases, there would not be an ajax call, but the HTML served from the server will still be a shell. The actual data is part of that file served, but stored as part of the JS code itself. This could be done for a variety of reasons, for example for dynamic data to be sent to static js files, or just to deter simple attempts of scraping the page.
One approach to scraping such pages would be to 'render' the page in a headless browser, which executes the JS code and returns an HTML that could be parsed via parsers like beautifulsoup
beautifulsoup has the ability to work with many parsers, one of which is html5lib, which could solve this issue.
you could also look at selenium or mechanize
or you could try parsing the js code yourself which might be faster.
Arriving at a conclusion as to what to use requires careful inspection of how the page is rendered on a browser. Even if you don't see an ajax request, the html that is served by the server need not be how the browser displays it.
A good way to start is by looking at the bare-html that is being served, by either downloading the page via curl or requests.get or simply rendering it in your browser with javascript disabled.
Good luck.
Here's the deal: I had a excel table that fulfills a MySQL table. I already made a procedure in server side who receives the sheet, read it and put it on the database. Saddly the sheet and data table doesn't have the same structure, so I need to use a php object/script in server side to manipulate it. I have a interface to upload the file (excel file), so the PHP program can read it...
...but my boss job isn't make my life easier, is it? NO! He says that is a lot of work have to upload every excel file by the web interface. So, he asked me to make a button in the sheet that he might click after his "job" is done. That would replace the web interface.
But, the system itself is a interface that would be saled one day (well, it's the plan!). So, I just can't just role out the web interface.
WHAT I'M ASKING IS: There's a way that I could send a file (the sheet itself) in a post method straight from the VBA Macro without using XML files and name each data that I'm sending, like a form post?
So far, I've found some tutorials or even some SO posts that made me get somewhere. But all of them were talking about a XML, and I already have a method that receives a HTTP POST (from a form) and work. I aiming to reuse the same method. From my VBA script I'm already able to make the request (not a big deal) and post it. But, in the server-side script, I'm expecting a POST come out from a form, so it calls a field's name. I don't seen to be able to do that from a VBA post. =/
Here's the answer... the two first functions/methods define how to send a file to a web service. You only need the file path and the URL from service. It has answered even more than I expected. :D
I am using the Django template system. What I want is, when I submit a form, or click to an url link, page does not refreshes, but loads with the data returning from the server. Is it possible?
I recommend a combination of jQuery (easy, powerful, popular javascript library) and dajax/dajaxice (http://www.dajaxproject.com/). Dajax is very easy to set up and use, and jQuery is also easy to set up and use. Dajax is strictly for AJAX communications through Django. jQuery is perfect for taking a simple site and making it more fluid, intuitive, and user-friendly.
You need JavaScript to do that. What you are looking for is called AJAX (Asynchronous JavaScript and XML). Essentially, it means you use JavaScript to send a request to the server as soon as the link/button is clicked. The server returns some data to your Script, which then can be used to manipulate the HTML page, e.g. by inserting the responded data into the DOM. Since you do everything with JavaScript, no reloading of the whole page is required.
To start, read the AJAX tutorial. There are certain JavaScript libraries that make these things more simple for you (e.g. jQuery), but you really should understand how this stuff works first, since else you might get into trubble while trying to debug it.
I want to retrieve server content which is like wsdl link(WCF service URL)using FLEX 4.5.. I haven't worked with webservices on FLEX. I have worked with xml data retrieval using httpservices where I had a local xml datas. Right now, i am trying to retrieve a server content. I have provided with the service link, method name and xml tags. (seems like parameters).. Since this is the first time im trying the server content, I need some help. Your help is highly appreciated.. thanks in advance... Would be better if i can get a sample project on webservices.
This is what I'm trying. The service link is below.
http://mfsapi.blisslogix.net/RSS_FEEDS_SERVICE.svc
When I click on this link, i'm getting the below link.
http://mfsapi.blisslogix.net/RSS_FEEDS_SERVICE.svc?wsdl
where I can see a lot of tags.
I am using HTTPSERVICES and WEBSERVICES to work on this issue and i'm not getting the xml data. I guess I did some mistake on passing the parameters. Please walk me through the steps how can I pass the method and parameters with this link..
First you need to create a WebService tag. Or use ActionScript an object of type WebService.
<mx:WebService id="myWebService"
useProxy="false"
showBusyCursor="true"
load="OnServiceLoad(event)"
fault="OnFault(event)">
<s:operation name="GetInformation" result="onLoad(event)" fault="onFault(event)">
</s:operation>
</mx:WebService>
Then you need to specify the WSDL document location and load it.
myWebService.loadWSDL("http://mfsapi.blisslogix.net/RSS_FEEDS_SERVICE.svc?wsdl");
Then you can simply call the operations specified in the WebService tag.
myWebService.GetInformation();
Here is a link on how to communicate with web services using MXML and AS.
I developed an application in JSPs and Servlets involving drop down menus that kept growing with how many authors per publication their were.
This was done in JavaScript and then in my application iterated through them using a loop. Is this possible using Django? This would be useful in my application.
This link might help you out if you don't want to dive into javascript (too much)
http://www.dajaxproject.com/
Or have a look at this stackoverflow question/awnser:
What is the best AJAX library for Django?
In any case, you need to serialize your array to a JSON string.
Then pass the JSON with an XMLHTTPRequest (ajax) to the server.
Add the javascript tag to your question if you don't mind more JS solutions.
Otherwise look for a Django Ajax framework to do the heavy lifting for you.