Transfer-Encoding: chunked and MP3/LAME

I have a PHP webservice that returns an mp3 HTTP response. It works, but when I turn on Chrome's network throttling in DevTools, it returns only part of the response:
$stream_start1 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start2 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start3 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start4 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start5 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start6 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start7 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start8 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start9 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start10 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start11 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start12 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
$stream_start13 = Psr7\stream_for(fopen('./sounds/ping.mp3', 'r'));
return new Psr7\AppendStream([$stream_start1 , $stream_start2 , $stream_start3 , $stream_start4 , $stream_start5 , $stream_start6 , $stream_start7 , $stream_start8 , $stream_start9 , $stream_start10, $stream_start11, $stream_start12, $stream_start13]);
With the above code, with DevTools throttling on, I get back 7 pings instead of 13.
In reality, the code gets a stream from a third-party service and sandwiches it between two files:
return new Psr7\AppendStream([$file1Stream, $thirdPartyStream, $file2Stream]);
So, I don't necessarily know the length in order to set Content-Length, and so it's using Transfer-Encoding: chunked.
When I look in dev tools at the shortened responses of the requests, I consistently see them ending around what looks like some LAME metadata or added silence:
e¨Deš•†š¹‡ÌC`̹3†'2 CBR9S¨Âöe­i´g¡ÿ–W©^¥ÔzWI}I8¸¶vv,¸¼Ù£J*ÙÙû±W•'X&+X±£#È ·bbÂìýbŸâÿâŸëLAME3.100UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUÿóbÄ”AIÖtUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
What I'm wondering is if it's possible that the browser is confusing some of this LAME info (or something else) for the end of the chunk sequence due to delays in receiving chunks. Is it possible that mixed up in the audio binary is something resembling an "empty chunk" indicating a last chunk? Is there a way to "escape" or encode the data to avoid potential sequences like this? Through compression, maybe?
My other option is to read the 3rd party stream (in the middle of the sandwich) all the way into memory before forwarding it. That way I can calculate the length and set the Content-Length, so that's plan B.
UPDATE:
I have tried setting Content-Length on my little 13-file example above, but I still get 7 pings in the response. I calculated the length with:
$length = strlen($stream_start0->getContents()) * 13
which is 122252. But the file content that comes back is around 65529 bytes long (a hex editor says 65536...?). Either way, it's too short. Also, the Transfer-Encoding header is gone, so I guess it's no longer being chunked?
UPDATE2:
It's also possible the problem is that I am concatenating MP3 files raw. Technically things get shady when you do that, especially if the audio contains ID3 tags and such, though the ping files used above shouldn't have any ID3 tags. However, in our staging environment, where I observe this issue without DevTools throttling, the cutoffs don't always happen on the breaks between files.

So all of the above was a red herring. It turned out the problem was that I had copied JavaScript code from a blog post that called response.body.getReader().read() only once, and thus only read the first chunk of the response.
Apparently it's meant to be called in a loop until the end of the stream is reached, like most stream APIs, but I missed that.
The solution was using response.blob(), which reads to the end of the stream and creates the Blob I need in one go.
I thought I had validated everything I copied, but I guess not. But I learned a few things, so there's that.
Docs on reader.read()
Docs on response.blob()

Related

How to get the number of lines of code of a file in a remote repo using PyGithub / GitHub Search API?

commit = repo.get_commit(sha="0adf369fda5c2d4231881d66e3bc0bd12fb86c9a")
print(commit.stats.total)
i = commit.files[0].filename
I can get the filename, even the file sha, but I can't seem to get the LOC of the file. Any pointers?
So let's look at this line:
commit = repo.get_commit(sha="0adf369fda5c2d4231881d66e3bc0bd12fb86c9a")
Here commit is of type github.Commit.Commit.
Now when you pick a file, it's of type github.File.File.
If you check that, you'll see there is no way of getting the lines of code directly. But there is one important field: raw_url.
This gives you the raw URL of the file, which you can then fetch, for example:
import requests

url = commit.files[0].raw_url
r = requests.get(url)
r.text
This gives you the raw content of the file, which you can use to count the lines of code, as in the sketch below.
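A minimal end-to-end sketch, assuming the file is text and that counting newline-separated lines is an acceptable definition of "lines of code" (the access token and repository name are placeholders):

import requests
from github import Github

g = Github("YOUR_ACCESS_TOKEN")   # placeholder token
repo = g.get_repo("owner/repo")   # placeholder repository
commit = repo.get_commit(sha="0adf369fda5c2d4231881d66e3bc0bd12fb86c9a")

for f in commit.files:
    # Fetch the raw file contents and count the lines.
    raw = requests.get(f.raw_url).text
    print(f.filename, len(raw.splitlines()))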

Flask: stream file upload without form boundary data to file

I want to upload large files using Flask. Rather than try to load the entire file into memory, I use request.stream.read() to stream the file to disk in chunks, as in the following code, which is very similar to the answers given to many similar questions I've found:
import os
import uuid

import flask

app = flask.Flask(__name__)

@app.route("/uploadData", methods=["POST"])
def uploadData():
    filename = uuid.uuid4().hex + '.nc'
    filePath = os.path.join("/tmp", filename)
    with open(filePath, "wb+") as f:
        chunk_size = 4096
        while True:
            chunk = flask.request.stream.read(chunk_size)
            if len(chunk) == 0:
                break
            f.write(chunk)
    return flask.jsonify({'success': True, 'filename': filename})
This works well, except that it "wraps" the file in post data, like the following:
------WebKitFormBoundaryoQ8GPdNkcfUNrKBd
Content-Disposition: form-data; name="inputFile"; filename="some_file_upload.nc"
Content-Type: application/x-netcdf
<Actual File content here>
------WebKitFormBoundaryoQ8GPdNkcfUNrKBd--
How can I stream the file to disk without getting the form boundary stuff?
In theory, I could call flask.request.files or the like to get the file parsed correctly, but since that loads the entire file into memory (or, more likely, a temporary file) and is quite slow compared to the stream approach, I don't like it as a solution.
If it makes a difference, I'm initiating the file upload using the following javascript:
var formData = new FormData($('#fileform')[0]);
$.ajax({
    url: '/uploadData',
    data: formData,
    processData: false,
    contentType: false,
    type: 'POST'
});
EDIT: I've managed to work around the issue by using readline() rather than read(), discarding the first four lines, and then checking for a chunk starting with "---" to discard the last line, which works. However, this feels both kludgy and fragile, so if there is a better solution, I would love to hear it.
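If changing the client isn't an option, one less fragile approach is to feed the raw request chunks into a streaming multipart parser that writes the file part straight to disk. The sketch below assumes the third-party streaming-form-data package is acceptable; the route name is illustrative, and the field name 'inputFile' must match the name attribute used in the form:

import os
import uuid

import flask
from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import FileTarget

@app.route("/uploadDataStreaming", methods=["POST"])
def uploadDataStreaming():
    filename = uuid.uuid4().hex + '.nc'
    file_path = os.path.join("/tmp", filename)

    # Parse the multipart body incrementally instead of buffering it in memory.
    parser = StreamingFormDataParser(
        headers={'Content-Type': flask.request.headers['Content-Type']})
    parser.register('inputFile', FileTarget(file_path))

    while True:
        chunk = flask.request.stream.read(4096)
        if not chunk:
            break
        parser.data_received(chunk)

    return flask.jsonify({'success': True, 'filename': filename})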

How do I parse a file sent with other data from a multipart HTML form?

My server is uWSGI and Python. I send myself an image from a file upload on the web page. How do I parse that file on the server?
I was able to handle a CSV because it's just text and I sent it by itself, but I have no idea how to handle images, or if I send the text file with other data. I'll add sample POST data to clarify when I'm back at my computer.
Part of my problem is the previous developer did some weird things with parsing POST data, so instead of being able to let uWSGI turn it into usable data, I have to do that myself in Python.
I assume you were handling URL-encoded data by doing a read on environ['wsgi.input'], something like this:
from urllib.parse import parse_qs  # urlparse.parse_qs on Python 2

try:
    request_body_size = int(environ.get('CONTENT_LENGTH', 0))
except ValueError:
    request_body_size = 0
request_body = environ['wsgi.input'].read(request_body_size)
dP = parse_qs(request_body)
For multipart/form-data you need to use cgi.FieldStorage:
d = cgi.FieldStorage(environ=environ, fp=environ['wsgi.input'], keep_blank_values=True)
For normal form values you can do:
firstName = d.getvalue("firstName")
For the file, you can get the contents and filename with:
file_data = d['imageFile'].file.read()
filename = d['imageFile'].filename
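Putting the pieces together, a minimal WSGI handler might look like the sketch below. The field names firstName and imageFile and the output directory are assumptions carried over from the snippets above; a real application would also sanitize the client-supplied filename:

import cgi
import os

def application(environ, start_response):
    # cgi.FieldStorage parses the multipart body from wsgi.input for us.
    form = cgi.FieldStorage(environ=environ,
                            fp=environ['wsgi.input'],
                            keep_blank_values=True)

    first_name = form.getvalue("firstName")   # an ordinary text field
    file_item = form['imageFile']             # the uploaded file field

    # Write the raw bytes of the upload to disk; the directory is illustrative.
    out_path = os.path.join("/tmp", os.path.basename(file_item.filename))
    with open(out_path, "wb") as f:
        f.write(file_item.file.read())

    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [("saved %s for %s" % (out_path, first_name)).encode("utf-8")]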

How to read only URLs from a txt file in MATLAB

I have a text file containing multiple URLs along with other information about them. How can I read the txt file and save only the URLs in an array so I can download them? I want to use
C = textscan(fileId, formatspec);
What should I put in formatspec to match the URLs?
This is not a job for textscan; you should use regular expressions for this. In MATLAB, regexes are described here.
For URLs, also refer here or here for examples in other languages.
Here's an example in MATLAB:
% This string is obtained through textscan or something
str = {...
    'pre-URL garbage http://www.example.com/index.php?query=test&otherStuf=info more stuff here'
    'other foolish stuff ftp://localhost/home/ruler_of_the_world/awesomeContent.py 1 2 3 4 misleading://';
};

% find URLs
C = regexpi(str, ...
    ['((http|https|ftp|file)://|www\.|ftp\.)',...
     '[-A-Z0-9+&##/%=~_|$?!:,.]*[A-Z0-9+&##/%=~_|$]'], 'match');
C{:}
Result:
ans =
'http://www.example.com/index.php?query=test&otherStuf=info'
ans =
'ftp://localhost/home/ruler_of_the_world/awesomeContent.py'
Note that this regex requires you to have the protocol included, or have a leading www. or ftp.. Something like example.com/universal_remote.cgi?redirect= is NOT matched.
You could go on and make the regex cover more and more cases. However, eventually you'll stumble upon the most important conclusion (as made here, for example, which is where I got my regex from): given the full definition of what precisely constitutes a valid URL, there is no single regex able to always match every valid URL. That is, there are valid URLs you can dream up that are not captured by any of the regexes shown.
But please keep in mind that this last statement is more theoretical than practical: those non-matchable URLs are valid but not often encountered in practice :) In other words, if your URLs have a pretty standard form, you're pretty much covered with the regex I gave you.
Now, I fooled around a bit with the Java suggestion by pm89. As I suspected, it is an order of magnitude slower than just a regex, since you introduce another "layer of goo" to the code (in my timings it was about 40x slower, excluding the imports). Here's my version:
import java.net.URL;
import java.net.MalformedURLException;

str = {...
    'pre-URL garbage http://www.example.com/index.php?query=test&otherStuf=info more stuff here'
    'pre--URL garbage example.com/index.php?query=test&otherStuf=info more stuff here'
    'other foolish stuff ftp://localhost/home/ruler_of_the_world/awesomeContent.py 1 2 3 4 misleading://';
};

% Attempt to convert each item into an URL.
for ii = 1:numel(str)
    cc = textscan(str{ii}, '%s');
    for jj = 1:numel(cc{1})
        try
            url = java.net.URL(cc{1}{jj})
        catch ME
            % rethrow any non-URL related errors
            if isempty(regexpi(ME.message, 'MalformedURLException'))
                throw(ME);
            end
        end
    end
end
Results:
url =
'http://www.example.com/index.php?query=test&otherStuf=info'
url =
'ftp://localhost/home/ruler_of_the_world/awesomeContent.py'
I'm not too familiar with java.net.URL, but apparently, it is also unable to find URLs without leading protocol or standard domain (e.g., example.com/path/to/page).
This snippet can undoubtedly be improved upon, but I would urge you to consider why you'd want to use this longer, inherently slower and far uglier solution :)
As I suspected you could use java.net.URL according to this answer.
To implement the same code in Matlab:
First read the file into a string, using fileread for example:
str = fileread('Sample.txt');
Then split the text with respect to spaces, using strsplit:
spl_str = strsplit(str);
Finally use java.net.URL to detect the URLs:
for k = 1:length(spl_str)
    try
        url = java.net.URL(spl_str{k})
        % Store or save the URL contents here
    catch e
        % it's not a URL.
    end
end
You can write the URL contents into a file using urlwrite. But first convert the URLs obtained from java.net.URL to char:
url = java.net.URL(spl_str{k});
urlwrite(char(url), 'test.html');
Hope it helps.

Amazon Product Advertising API: Get Average Customer Rating

When using Amazon's web service to get any product's information, is there a direct way to get the Average Customer Rating (1-5 stars)? Here are the parameters I'm using:
Service=AWSECommerceService
Version=2011-08-01
Operation=ItemSearch
SearchIndex=Books
Title=A Game of Thrones
ResponseGroup=Large
I would expect it to have a customer rating of 4.5 and total reviews of 2177. But instead I get the following in the response.
<CustomerReviews><IFrameURL>http://www.amazon.com/reviews/iframe?...</IFrameURL></CustomerReviews>
Is there a way to get the overall customer rating, besides for reading the <IFrameURL/> value, making another HTTP request for that page of reviews, and then screen scraping the HTML? That approach is fragile since Amazon could easily change the reviews page structure which would bust my application.
You can scrape it from here. Just replace the ASIN with the one you need.
http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=B000P0ZSHK
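For reference, a rough Python equivalent of that scraping approach might look like the sketch below. The widget markup is undocumented and can change at any time, so the regular expression is only an illustrative guess; the ASIN is the sample one from the URL above:

import re
import requests

asin = "B000P0ZSHK"  # sample ASIN; replace with your own
url = ("http://www.amazon.com/gp/customer-reviews/widgets/"
       "average-customer-review/popover/ref=dpx_acr_pop_"
       "?contextId=dpx&asin=" + asin)

html = requests.get(url, timeout=10).text

# Guess at the rating text, e.g. "4.5 out of 5 stars"; adjust if the markup changes.
match = re.search(r'(\d+(?:\.\d+)?) out of 5 stars', html)
print(match.group(1) if match else "rating not found")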
As far as I know, Amazon changed its API, so it's no longer possible to get the review rank information. If you check this link, the note says:
As of November 8, 2010, only the iframe URL is returned in the request
content.
However, testing with the params you used to get the iframe, it seems that now even the iframe doesn't work anymore. Thus, even in the latest API Reference, in the chapter "Motivating Customers to Buy", the part "reviews" is completely missing.
However: since I'm also very interested in whether it's still possible somehow to get the review rank information (maybe not even using the Amazon API, but a competitor's API), I'll set up a bounty if anybody can provide something helpful on that. The bounty will be set on this topic in two days.
You can grab the iframe review URL and then use CSS to position it so only the star rating shows. It's not ideal since you're not getting raw data, but it's an easy way to add the rating to your page.
Sample of this in action - http://spamtech.co.uk/positioning-content-inside-an-iframe/
Here is a VBS script that would scrape the rating. Paste the code below to a text file, rename it to Test.vbs and double click to run on Windows.
sAsin = InputBox("What is your ASIN?", "Amazon Standard Identification Number (ASIN)", "B000P0ZSHK")
If sAsin <> "" Then
    sHtml = SendData("http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=" & sAsin)
    sRating = ExtractHtml(sHtml, "<span class=""a-size-base a-color-secondary"">(.*?)<\/span>")
    sReviews = ExtractHtml(sHtml, "<a class=""a-size-small a-link-emphasis"".*?>.*?See all(.*?)<\/a>")
    MsgBox sRating & vbCrLf & sReviews
End If

Function ExtractHtml(sHtml, sPattern)
    Set oRegExp = New RegExp
    oRegExp.Pattern = sPattern
    oRegExp.IgnoreCase = True
    Set oMatch = oRegExp.Execute(sHtml)
    If oMatch.Count = 1 Then
        ExtractHtml = Trim(oMatch.Item(0).SubMatches(0))
    End If
End Function

Function SendData(sUrl)
    Dim oHttp 'As XMLHTTP30
    Set oHttp = CreateObject("Msxml2.XMLHTTP")
    oHttp.open "GET", sUrl, False
    oHttp.send
    SendData = Replace(oHttp.responseText, vbLf, "")
End Function
Amazon has completely removed support for accessing rating/review information from their API. The docs mention a Response Element in the form of customer rating, but that doesn't work either.
Google Shopping uses Viewpoints for some reviews, among other sources.
This is not possible via PA-API. You either need to scrape it yourself, or use other free/cheaper third-party alternatives.
We use the amazon-price API from RapidAPI for this; it supports price/rating/review-count fetching for up to 1000 products in a single request.