Save Word HTML document as "Word Document"

Save Word HTML document as "Word Document" - coldfusion

I have reports which are written in "Word HTML" then served to users as a normal Word document (.doc) via content-type="application/msword". However, when a user attempts to "Save As ..." one of these reports, Word automatically defaults the "Save as type:" to "Web Page". Is there a way to change it so that Word will default to "Word Document" when saving?
I am using ColdFusion running on Ubuntu to serve up the reports.

As you probably know, <cfcontent> simply changes the MIME type of the HTTP response, it does not change the data being returned at all. So given you've sent HTML data, that is how Word will treat it.
If you want Word to treat the doc as a native Word doc, you need to send it a native Word doc.
I've got no experience in doing this, but you might want to look at what Apache POI offers by way of creating native Word docs. Alternatively Microsoft offer an API for doc creation / manipulation (cannae find a link, sorry), which would be something else to look at.

Related

How to generate Django template as MS word?

Is there a way to build a Django template or make it like Microsoft word by making the Django template have tools and the ability to write word documents from a website instead?
if we supposed that there is a domain name called: example.com
so, when I open that website for example.com/word-doc/ it will open a blank page and some tools from Microsoft Word to write in.
Is there any package or API to do so?

You are probably looking for a kind of "wysiwyg" editor for client side (this part has nothing do with Django). The closest thing I could found is this one, however it comes with a price tag.
In Django-side you'll (probably) simply store the data given by editor (client). The editor will then be able to parse it for update/read purposes.

Select all frames at once in Selenium

This could be a stupid questions for some. But its truly important for me.
I know how to switch frames using selenium webdriver.
However, is there a way to download all the page_source of the entire page for all the frames at once.
Instead of switching them again and again?
Could someone please let me know the command if it exists?
If not then please say there is none. And that should answer my question.
Thanks in advance

Webdrivers' getPageSource will return some state in some formatting of the last page the driver was on.
From the (java)docs, but most probably applies to other languages:
getPageSource
java.lang.String getPageSource()
Get the source of the last loaded page. If the page has been modified
after loading (for example, by Javascript) there is no guarantee that
the returned text is that of the modified page. Please consult the
documentation of the particular driver being used to determine whether
the returned text reflects the current state of the page or the text
last sent by the web server. The page source returned is a
representation of the underlying DOM: do not expect it to be formatted
or escaped in the same way as the response sent from the web server.
Think of it as an artist's impression.
Returns:
The source of the current page
http://selenium.googlecode.com/git/docs/api/java/org/openqa/selenium/WebDriver.html#getPageSource%28%29

encoding of query string parameters in IE10

I got a request from a customer that he wants to be able to type the query string of my web service with parameters in the IE10 address bar and get the service results. The parameters include string in Hebrew, like:
http://mywebsite.com/service.asmx/foo?param1=123&param2=מחרוזתבעברית
It seems to me that that IE10 won't encode the query string parameters - every non-ASCII character that goes after the ? mark would be turned to '3f' byte, though it does encode what goes before the ? mark - the url itself.
For example, if i try to reach the url (the parameter is fictional, url is not, and I have no connection with the site)
http://www.shlomo.co.il/pageshe/sales/רכב-למכירה.asp?param=פאראם
and look in wireshark for the bytes I send to the server, it shows me
You can see it does substitute the hebrew part of the URL with urlencoded string, but substitutes the hebrew parameters with ?????, which are '3f's.
The same string in chrome would be encoded in it's entirety:
GET http://www.shlomo.co.il/pageshe/sales/%D7%A8%D7%9B%D7%91-%D7%9C%D7%9E%D7%9B%D7%99%D7%A8%D7%94.asp?param=%D7%A4%D7%90%D7%A8%D7%90%D7%9D HTTP/1.1
I tried it on machines with win7/IE10 and winXPheb/IE8.
My IE settings are (especially checked the "Always show encoded addresses option" to see if it helps and restarted, but made no difference):
I tried to search around for any info about the issue, but didn't find much of it.
My questions are:
Is it indeed like this, or am I missing something?
Is this behavior documented anywhere?
Are there any settings in IE/Win which enable the parameters encoding.
p.s. Sure if I was developing the client/web ui, I would simply urlencode my query, but my request from customer was exactly to paste the query to IE address bar, that's why I'm interested in this specific behavior.
Thanks.

Yes, your observation of the behavior is accurate. Internet Explorer 10 and below follow a complicated algorithm for encoding the URL. This was allegedly updated in Internet Explorer 11, but I've found that the new option doesn't seem to work.
The "Always show encoded addresses option" concerns whether PunyCode is shown for IDN hostnames, and does not impact the query string. Send UTF-8 URLs mostly applies to the encoding of the path, although it can also affect other codepaths
The behavior isn't fully documented anywhere. I'd meant to write a full post on my IEInternals blog about it but ended up moving on from Microsoft before doing so. There's a partial explanation in this blog post.
Yes, there are settings that impact the behavior. The Send UTF-8 URLs checkbox inside Tools > Internet Options > Advanced is one of the variables that determines how URLs are sent, but the option does not blindly do what it implies (it only UTF-8 encodes the path, not the query string). Other variables involved include:
Where the URL was typed (e.g. address bar vs. Start > Run, etc)
What the system's ANSI codepage is (e.g. what locale the OS uses as default)
The charset of the currently loaded page in the browser
As a consequence of these variables, you cannot reliably use URLs which are not properly encoded (e.g. %-escaped UTF8) in Internet Explorer.

Unfortunately this is still true for Internet Explorer 11 (build 11.0.9600.17358, win7-x64)
I saw that you can not unfortunately change the web server. However those who are developing new services may consider changing request parameters into path variables, e.g. from http://myserver.com/page?τεστ into http://myserver.com/τεστ/

If the client is calling the web-service from javascript,
encodeuricomponent can be used. In your case encodeuricomponent("מחרוזתבעברית");
http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Multi language website, how to approach this?

I have a website (Coldfusion) on which I want to offer multi language, but no idea what is the best way to do this.
There 2 plans I have:
1:
Of course all content (text) is in a database.
If a user would want a different language, the user would click on a link/flag, this would put the requested language in a session variable, for example: session.language = "es"
In the database I would have 2 columns (every language has 1 column) and then select the text which belongs to 'es'
Every page would then do a request to the database to get the text beloging to the session.language.
PROS: Relatively simple to implement
CONS: SEO wise I don't think this could be very good. http:// www.domain.com/page.cfm would give an english text or spanish text (or other language). Google will not add duplicate URL's
2:
Do something with http:// www.domain.com/en/page.cfm for english and http:// www.domain.com/es/page.cfm for english.
With a URL rewrite rule the language value in the URL http:// www.domain.com/en/page.cfm would actually be a page http:// www.domain.com/page.cfm?language=en
The url.language variable will then select the correct language from the database.
PROS: Unique URL for each language. Good for SEO and Google indexing.
CONS: A bit more difficult to implement. (I think)
Or does anyone have other / better ideas?
Thanks!!

You should always first check the browser header "Accept-Language" for the default language(s) (the correct standard way to do it), and offer links (the intuitively seemingly right way) only as an alternative.
Doing it in a database doesn't seem very standard. Let's assume you would like to use MVC architecture (model-view-controller). Most software uses keys in the presentation layer (view) (eg. html) and along with the presentation layer, you have language files (in Java, this is typically properties files) which are mapped simply by their filenames, and can be modified by regular users, without any special skills, such as professional translators with no computer skills. Certainly you could put it in a database, but then it is just more work, and moves the information out of the presentation layer.
There are various libraries for doing this. You should find the normal one for your application. Please edit your question to include what you are using to develop the application. (eg. JSP, Tapestry, Wicket, ASP, PHP, etc.) So for example, if you wanted to use JSPs, I would then suggest you use the JSTL tag library's language support. Or if you were using Tapestry, I would point you to http://tapestry.apache.org/localization.html or http://tapestry.apache.org/tapestry4.1/UsersGuide/localization.html
To look it up, you can look for the terms "internationalization" aka "i18n", or "localization". (The terms don't mean the same thing, but few use them correctly, so either works. http://www.w3.org/International/questions/qa-i18n)

I would go for option 2. Every translation should have its own url. Links to your website will already be in the intended translation.
To store translations in a database, I wouldn't put every translation in a seperate column, but rather put them in a seperate table:
Table Posts:
- id
- title_id
- ...
Table Translations:
- label_id
- value
- country_code
- language_code
Where title_id matches label_id
This way you won't have to alter your table structure when a new translation is added. This allows you to have infinite translations for any label or text.

To effectively do a multi-lingual site then you need set a rule for yourself that NO TEXT is ever put in the source as hard coded. It either needs to come from the database and / or a Resource Bundle.
Text from the database
You need to make sure that the column you are storing your data in is unicode otherwise you'll have issues with accented character. Also don't have a column per language as this is not scalable, do what #jan suggests and have a translations table where the items are keyed on a reference as well as a language.
Resource bundles
You are not going to want to get every last little bit of text from the database so for those you can utilise a resource bundle. This is an, admittedly old, link http://www.sustainablegis.com/blog/cfg11n/index.cfm?mode=entry&entry=FD48909C-50FC-543B-1FE177C1B97E8CC1 from Paul Hastings's blog about some solutions to resource bundles. To be honest his blog is an excellent resource on this very subject.
With regards to how you handle the URLs do not do option 1 as you quite rightly identified you will cause issues the SEO rankings of the page and it will mean that users cannot correctly share or return to the page.
Two approaches are having the language code in the URL as you identified in option 1.
Pros
Simpler to configure
Cons
You have one application which means that as you add more languages you add more complexity and weight on the memory of that app
Or you can have a different sub domain or domain per application e.g. es.yourdomain.com or yourdomain.es they can all be the same codebase
Pros
Each language is a standalone application meaning it has it's own memory
Cons
more effort to configure

http://i18n.riaforge.org/ has a download for i18n. It can be used to make sure that all string labels match. That way if some one wants to change "Save" to "Update", it can all be done in one spot.
It is also important to consider the technical background of those that will being doing the translation. It is often easier to get the translation team to edit files in notepad as opposed to updating a db. Text files work well with version control.

The best way i found is to use an XML to hold just that pages language stuff, one xml to cover each page, and you then vary it for language. when the page loads, just load a different XML from the database or files... many ways to do this. all other methods i have tried have their issues, and at least this one allows you to take a language XML, hand it to someone who will copy it, and then change the boxes... you put it in the DB to serve it.
one can also do this for text, and have the DB make the XML for just the text for that page by using a list of items to include in the XML for the page.
once you get the idea, the rest becomes very easy...
and given CF ways of accessing such data with dot notation, easy peasy to us
say you have "Load Images"
in english xml it may be <LoadIMGS>Load Images</LoadIMGS>
in chinese it may be <LoadIMGS>加载图像</LoadIMGS>
or <LoadIMGS>Jiā zǎi túxiàng</LoadIMGS>
regardless, in your CFM code you would just put #variablename.LoadIMGS# in the place... i would also suggest putting in the loadimages tag the size the font should be adjusted to if not normal size. that way, when translations are too large, you can shrink that font there for that... etc.
enjoy!!!

Regex with iframe in Yahoo! Pipes

I'm building a Yahoo! Pipe to pull an RSS feed from Reddit which links to some content in the description. I'm using a regex to match the href attribute of the anchor link in an item.description field. The regex I'm using is:
^.+?href="([^"]+)">\[link\].+?$
As a test, I set the replace to simply:
$1
and I see that the entire description field has been replaced with the URL. So far, so good.
I then put the following in the replace field. The idea being to iframe the content that's linked to:
Content: <iframe src="$1">no iframe support</iframe> End
What I get out however is:
Content: no iframe support End
I've confirmed that this is also coming through in the pipe's output and not just in the Yahoo! Pipes debug console.
I've so far tried replacing my angle brackets with < and > entities. I've tried wrapping the entire thing in a <![CDATA[ ... ]]> block and still, I get nothing. If I break my iframe tag by removing an angle bracket, the broken content comes through fine, but if I have a well-formed iframe element, it vanishes, leaving the "no iframe support" text. Am I doing something wrong here, or is Yahoo! actively preventing me from using iframe tags in my generated pipe? A cursory search on Google isn't turning up anything related to this.
The pipe in question is here:
http://pipes.yahoo.com/pipes/pipe.info?_id=2ba41448cadd2347d86f377efd3d199f

This Pipes FAQ Question "Why does Pipes Strip <object> and <embed> tags... ?" shows that a certain amount of sanitization is performed, by placing content (at least certain content) into an iframe for the safety of RSS consumers - though it does not state it specifically, this probably also removes other iframes in order to avoid nesting and other work-arounds.
Yahoo is big enough I would doubt they have a week sanitizer, but an extremely long shot is that you might be able to fool it by nesting the iframe in a bunch of other tags (again I doubt this will work). Also depending upon which step does the sanitization, perhaps adding part of the tag in one step, then adding another part somewhere else might work (yet again, doubt overwhelms me)
Not sure what else to suggest, other than getting something else to consume and transform your RSS a little bit more (by fixing otherwise broken tags??) - but that's what you're using pipes for to begin with, isn't it? Idunno...
Good luck!

Pipes has an fanatical devotion to the RSS spec and the spec says the description field is plain text only. HTML etc is supposed to go in the content:encoded field, not that I've had much luck getting pipes to do that.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js