I am writing a webserver in C++. I am looking at the POST documentation on w3:
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
I see that a POST is supposed to support the full multi-parts scheme: parts and sub-parts (and obviously, sub-sub-parts...) just like for email attachments.
Is there any browser and/or tool that do that on a normal basis? In other words, is it really important for a server to support parts and sub-parts?
The obvious problem with that is the fact that it could mean that two files are uploaded under the same name. That's quite a problem if you ask me. Also, from what I can see in PHP it is not supported at all in that realm. Am I correct?
Ah! I guess I should have searched a little more and to tell you the truth I had not thought of looking at HTML5 for the answer.
The following paragraph actually includes the answer:
http://www.w3.org/html/wg/drafts/html/master/forms.html#multipart-form-data
Note: In particular, this means that multiple files submitted as
part of a single element will
result in each file having its own field; the "sets of
files" feature ("multipart/mixed") of RFC 2388 is not used.
So it is clear that sub-parts (multipart/mixed) are not to be supported.
Related
I have a set of .groovy files (Java). All of these files have the same comment format.
I developped a tool with wich I'm able to read those files and applying a REGEX to get all the comments in a list. (Finally i just have to copy paste these comments to .html file)
I would like to know if it's a correct practice in order to generate a HTML page with the comment (a kind of documentation). If not, what would you recommend ?
I read about Doxygen and Javadoc but i'm not sure about using them (if they can be really useful in my case since the comments are already written)
If you can suggest a library in order to generate easily a HTML Webpage or any other advice.
Any help is appreciated.
There exists Groovydoc which is roughly the equivalent of Javadoc, just for Groovy.
As your setup is not that (you already have comments, probably not in Groovydoc format, and you have half the tooling), there are still multiple ways open to you. As you already extract the documentation from groovy, if I were you, I would do a minimal post-formatting, if necessary, and output the documentation as markdown (e.g., github markdown) or asciidoc (e.g., asciidoctor). Then you can use any preferred tool to convert the post-formatted documentation into HTML.
To answer the question "How to parse the java comments" – you shouldn't. If possible, especially in a new project, stick with the standard tooling. In the case of Groovy that's Groovydoc. The normal (non Java/Groovy-Doc style) comments themselves you should never need to extract from the source code. They should be so much context-specific, that without the corresponding code they are anyways useless.
These two links reference the ability, in ColdFusion, to get the name of an uploaded file using form.getPartsArray(). However I can not find ColdFusion documentation on it. I would like to use this but not if it has been deprecated or will be. Does anyone have more information on the origin and fate of this function?
ColdFusion: get the name of a file before uploading
http://www.stillnetstudios.com/get-filename-before-calling-cffile/
ColdFusion: get the name of a file before uploading
Ignoring the main question for a moment, can you elaborate on why you want to use it? Reason for asking is the title of the first question might give you a mistaken impression about what that method actually does. Form.getPartsArray() does not provide access to file information before the file is uploaded. The file is already on the server at that point, so in later versions of CF (with additional functionality) it does not necessarily buy you much over just using cffile action=upload.
Does anyone have more information on the origin and fate of this
function?
However, to answer your other question - it is an undocumented feature last I checked. (It was more useful in earlier versions of CF, which lacked some of the newer features relating to form fields and uploads.)
Internally, most form data can be handled using standard request objects, ie HttpServletRequest. However, those do not support multipart requests, ie file uploads. So a special handler is needed. Macromedia/Adobe chose to use the com.oreilly.servlet library for their internal implementation. That is what you are accessing when using FORM.getPartsArray().
The O'Reilly stuff has been bundled with CF since (at least) CF8, which is a good indicator. However, using any internal feature always comes with the risk the implementation will change and break your application. Also, if you move to another engine, the code may not be supported/compatible. So "You pays your money, you takes your choice".
CF8 / Form Scope
Is it possible to embed code to Trac wiki page straight from source code? I mean code blocks, not links pointing to the source. Like
MyCode.java contents
Look at IncludeMacro which is also able to embed from source repository (keyword source:).
Furthermore you can copy source code to wiki and format it with syntax-highlighting, for example:
{{{
#!python
hello = lambda: "world"
}}}
Read more about it here.
You're using 'code blocks' in a way that makes me think of partial citation.
As falkb pointed out, IncludeMacro is the current best way of embedding Trac (and even external) content into a Trac resource, that is rendered with support for Trac's WikiFormatting. But sadly, there is NO such partial citation capability yet.
You may want to at least request it as enhancement for the aforementioned plugin, and could even push it closer to reality by providing valid use case example - or better: some real code to make it happen. Be prepared to test code, if a patch is proposed or - ideally - if the trunk (development) branch receives changes to make partial citation happen.
Does anyone know any more details about google's web-crawler (aka GoogleBot)? I was curious about what it was written in (I've made a few crawlers myself and am about to make another) and if it parses images and such. I'm assuming it does somewhere along the line, b/c the images in images.google.com are all resized. It also wouldn't surprise me if it was all written in Python and if they used all their own libraries for most everything, including html/image/pdf parsing. Maybe they don't though. Maybe it's all written in C/C++. Thanks in advance-
you can find a bit about how googlebot works here:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=158587
for example the "fetch as googlebot" tool lets you see a page as Googlebot sees it.
The crawler is very likely written in C or C++, at least backrub's crawler was written in one of these.
Be aware that the crawler only takes a snapshot of the page, then stores it in a temporary database for later processing. The indexing and other attached algorithms will extract the data, for example the image references.
Officially allowed languages at Google, I think, are Python/C++/Java.
The bot likely uses all 3 for different tasks.
I'm increasingly becoming aware that there must be major differences in the ways that regular expressions will be interpreted by browsers.
As an example, a co-worker had written this regular expression, to validate that a file being uploaded would have a PDF extension:
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$
This works in Internet Explorer, and in Google Chrome, but does NOT work in Firefox. The test always fails, even for an actual PDF. So I decided that the extra stuff was irrelevant and simplified it to:
^.+\.pdf$
and now it works fine in Firefox, as well as continuing to work in IE and Chrome.
Is this a quirk specific to asp:FileUpload and RegularExpressionValidator controls in ASP.NET, or is it simply due to different browsers supporting regex in different ways? Either way, what are some of the latter that you've encountered?
Regarding the actual question: The original regex requires the value to start with a drive letter or UNC device name. It's quite possible that Firefox simply doesn't include that with the filename. Note also that, if you have any intention of being cross-platform, that regex would fail on any non-Windows system, regardless of browser, as they don't use drive letters or UNC paths. Your simplified regex ("accept anything, so long as it ends with .pdf") is about as good of a filename check as you're going to get.
However, Jonathan's comment to the original question cannot be overemphasized. Never, ever, ever trust the filename as an adequate means of determining its contents. Or the MIME type, for that matter. The client software talking to your web server (which might not even be a browser) can lie to you about anything and you'll never know unless you verify it. In this case, that means feeding the received file into some code that understands the PDF format and having that code tell you whether it's a valid PDF or not. Checking the filename may help to prevent people from trying to submit obviously incorrect files, but it is not a sufficient test of the files that are received.
(I realize that you may know about the need for additional validation, but the next person who has a similar situation and finds your question may not.)
As far as I know firefox doesn't let you have the full path of an upload. Interpretation of regular expressions seems irrelevant in this case. I have yet to see any difference between modern browsers in regular expression execution.
If you're using javascript, not enclosing the regex with slashes causes error in Firefox.
Try doing var regex = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.pdf)$/;
As Dave mentioned, Firefox does not give the path, only the file name. Also as he mentioned, it doesn't account for differences between operating systems. I think the best check you could do would be to check if the file name ends with PDF. Also, this doesn't ensure it's a valid PDF, just that the file name ends with PDF. Depending on your needs, you may want to verify that it's actually a PDF by checking the content.
I have not noticed a difference between browsers in regards to the pattern syntax. However, I have noticed a difference between C# and Javascript as C#'s implementation allows back references and Javascript's implementation does not.
I believe JavaScript REs are defined by the ECMA standard, and I doubt there are many differences between JS interpreters. I haven't found any, in my programs, or seen mentioned in an article.
Your message is actually a bit confusing, since you throw ASP stuff in there. I don't see how you conclude it is the browser's fault when you talk about server-side technology or generated code. Actually, we don't even know if you are talking about JS on the browser, validation of upload field (you can no longer do it, at least in a simple way, with FF3) or on the server side (neither FF nor Opera nor Safari upload the full path of the uploaded file. I am surprised to learn that Chrome does like IE...).