Standard/common markup language for user-provided content? - wiki

I suppose this might be similar to this question, but I'm wondering if there's a standard/popular markup language for wikis and similar styles of user-provided content. With the proliferation of different markup syntaxes out there, it seems like one would have emerged as the de facto choice to implement. There appears to have been at least one group that wanted to create a standard (and an RFC for it), but they appear to have fizzled out around mid-2005.
So does anybody know what the standard/most popular one is? If there isn't one, which is easiest for users while offering good flexibility for advanced users?

Markdown is popular in some circles. In addition to the original version in Perl, there's a C version called Discount and several other implementations including one in Lua.
There's a list of lightweight markup languages here.

No standard that I know of.
I'd say the most popular (well, in the projects I tend to follow, at least) seem to be Markdown and Textile these days.
It's often a matter of preference. Mine is Markdown, but there wouldn't be much to argue about if someone preferred another markup.

Related

Is the English language a standard for describing a web service (REST, SOAP, etc.)?

In the project I'm currently working on, we had a discussion about the language (English or Spanish) we should use to describe pathnames, fields, payloads, etc.
I thought this was an easy topic; however, I couldn't find an authoritative source saying that English should be the standard for describing a web service's components.
For example, assume we have a very simple REST API with CRUD operations for users:
English
POST /v1/users
GET /v1/users/{uid}
DELETE /v1/users/{uid}
PUT /v1/users/{uid}
Spanish
POST /v1/usuarios
GET /v1/usuarios/{uid}
DELETE /v1/usuarios/{uid}
PUT /v1/usuarios/{uid}
I think this is not a problem; however, I want to understand whether I need to follow a standard or whether it doesn't matter which language I use for describing a web service's components.
This is probably a primarily opinion-based question; if you think it is, please just tell me in a comment.
Being a native Spanish speaker myself, I see no technical justification to write an API in anything other than English, the de facto technical language.
If the intention is to develop an API that is inaccessible to anyone outside of the Spanish-speaking world, I guess this would be the way to go.
Yet, I see this happens again and again among Spanish-speaking software projects. I think it is detrimental.
OTOH, let us look at Rakuten, one of the most powerful tech companies in Japan. Rakuten decided to enforce English as the working language for all employees, even though the bulk of its business is in Japan.
That tells you something. In the globally connected 21st-century world, aim your products at the widest audience.
I cannot conjure an axiomatic reason to do so (in the way I would say "do not use goto statements.") But writing something in anything other than English is something I would not do.
I don't think there's a norm, but I'd simply say: there are lots of ways your code can end up being used by people who speak other languages. So sticking to English is the best way to keep it understandable and usable by anybody.
I'm a native bilingual Spanish-English speaker living and working in Spain. In many of my IT software development projects I find myself working in international teams, collaborating with people in India, France and the UK. In these scenarios it's sensible to be understood in a language common to all. On the other hand, I've also worked on projects where the client only spoke a regional language and there was no need to use either Spanish or English. It really depends on who your colleagues are and who your end users are.
Using a language other than English always forces you to replace the special characters of your language, like accents and ñ and so on. But that is solvable.
The question is: Who is going to use your service? If the consumers are non-Spanish speakers, they definitely appreciate English.
However, in my experience you often have very domain-specific names that sometimes can't be translated properly into English, or it is hard to know what the right English word would be, as translators often give you odd translations for your domain-specific (Spanish) terms. I have seen many web services that were translated into English where I still had to ask "What does that mean?" because the translation just didn't make sense.
Therefore my opinion is to use Spanish, unless your web service is actually consumed outside the Spanish-speaking world. Because if it is consumed outside the Spanish-speaking world, a good translation will be available too.
I am a German native speaker and my working language is mostly English (I work for a US company), definitely when it comes to coding. I have basically all my professional software set to English as well, and I even have to say that I get confused when I see it in German ;-)
So my point is that, from the perspective of my daily routine, I wouldn't even consider using my native language for coding.
However, on a more general level, I would argue that an API should be created in such a way that the consumer can easily tell from the endpoint naming what it is supposed to do. So, as others have stated above, I think it is worthwhile thinking about who the consumer will eventually be.
If you have a closed target audience, all speaking the same language, and no plans to expose the API externally, then I could well imagine making the endpoint labels in that language.
Two more points worth considering:
- Can you be sure that all (future) developers will also speak that language, so it won't be confusing to them?
- What about providing the API endpoints with labels in both languages? Depending on the programming language / framework you use, I think it could be quite simple to duplicate the endpoints while calling the same logic (see the sketch below). I'm not sure this is good practice, but if there's a benefit to the consumers, and you'd still like to stick to the working language to be on the safe side, I could imagine this.
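For illustration only, here is a framework-agnostic sketch of that aliasing idea: two path labels registered against a single handler. The mini-router and the handler are hypothetical; any real framework's routing API will differ.
#include <functional>
#include <iostream>
#include <map>
#include <string>

int main() {
    // Hypothetical mini-router: maps a path pattern to a handler.
    std::map<std::string, std::function<std::string(const std::string&)>> routes;

    // One piece of logic...
    auto getUser = [](const std::string& uid) {
        return "user record for " + uid;  // placeholder logic
    };

    // ...exposed under both an English and a Spanish label.
    routes["GET /v1/users/{uid}"]    = getUser;
    routes["GET /v1/usuarios/{uid}"] = getUser;

    // Either label reaches the same handler.
    std::cout << routes.at("GET /v1/usuarios/{uid}")("42") << "\n";
}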
I think if your team (and future developers in your project/company) and your client are Spanish speakers, you can use English or Spanish as you like. But if you think a non-Spanish speaker will ever code against or use your API, you should use English.

Best choice for runtime templating engine?

We're designing an app that will generate lots of different types of text output, e.g. email, HTML, SMS, etc. The output will be generated using some kind of template, with data coming from a DB. Our requirements include:
Basic logic / calculated fields within the template, e.g. "if"s and "for" loops, plus some things like adding percentages for tax, etc.
Runtime editing. Our users need to be able to tweak the templates to their needs, such as changing boilerplate text, adding new logic, etc.
Multilingual. We need to choose the correct template for the current culture.
Culture-sensitive. E.g. dates and currencies will be output according to the current UI culture.
Flexibility. We need the templates to be able to handle multiple repeating groups, hierarchies, etc.
Cannot use commercial software as a solution (e.g. InfoPath). We need to be able to modify the source code at any time.
The app is C#/.NET. We are considering using T4, XML + XSLT, or hosting the Razor engine. Given that the syntax can't be too overwhelming for non-techie users, we'd like to get your opinion on which you feel is the right templating engine for us. We're happy to consider ones not already mentioned too.
Thanks.
I'm very hesitant to try and answer this question on a forum, because technology choices depend on far more factors than are conveyed in the question, including things such as attitude to risk, attitude to open source, previous good and bad experiences, politics and leadership on the project etc. The big advantage of XSLT over Razor is that it's a standard and has multiple implementations on multiple platforms (including at least three implementations on .NET!) so there's no lock-in; but that doesn't seem to be a factor in your statement of requirements. And the fact that you're using .NET suggests that supplier lock-in isn't something that worries you anyway.
One thing to bear in mind is that non-programmers often take to XSLT a lot more quickly than programmers do. Its rule-based declarative approach, and its XML syntax, sometimes make programmers uncomfortable (it's not like anything they have seen before) but end-users often take to it like ducks to water.
We've decided to go with Razor hosting. The reason I've posted this as an answer is that I thought it would help others if I included the following article link:
http://www.west-wind.com/weblog/posts/2010/Dec/27/Hosting-the-Razor-Engine-for-Templating-in-NonWeb-Applications
This excellent piece of work by Rick Strahl makes it really easy to host Razor.

JavaScript list manipulation library/framework

I'm looking for a javascript library/framework to manipulate lists. Is there anything like this already out there?
Ideally I'd like something equivalent to .NET's List. One of the main requirements is the ability to remove items from anywhere in a list. Some LINQ-like functionality would be great.
Underscore.js
It's not a complete substitute for a fully functional List replacement, but:
it does get you a long way,
it's well designed,
well documented (and has a very nicely "literate-programming"-style annotated source),
easy to extend,
lightweight.
It has a nice ratio of expressiveness and power to memory footprint.
(Note that it was inspired by the two following libraries, FunctionalJS and Data.js.)
FunctionalJS
It shares most of Underscore.js's attributes, and is definitely more oriented towards functional programming. However:
it is less actively maintained,
it is slightly harder to use if you're not familiar with functional concepts.
Data.js
More than a purely functional programming library like FunctionalJS, Data.js also covers storage aspects, graph-like data-structures and other goodies.
(It is funny to note that Data.js now lists Underscore.js as an influence in its newer iteration, while Underscore.js already lists Data.js as its own influence.)
List.js
List.js is for manipulating HTML lists. It may not be what you want, but I thought of adding it here as well since it does its job very well and fills a nice niche in terms of bridging data and UI management in one (not necessarily a good idea, but it works for some cases).
Others...
Dojo (and many other JS libraries nowadays) supports some of the newer JS APIs or provides substitute implementations if they are missing, with some of them fairly functional by nature and design.
However, they don't push the concept quite as far, and these libraries are more heavyweight, so I wouldn't recommend them if that's all you want out of them.
jLinq, as mentioned by JanusTroelsen in your question's comments, looks very promising as well but I would be more concerned about the maturity of the library and its memory footprint for what it is (but the code seems very "spaced", so a compressed version might be acceptable).
Maybe linq.js is what you're looking for? http://neue.cc/reference.htm
Also, http://microjs.com/ is a good site to find a library corresponding to a specific need :)
You can also try the manipula package, which implements all of C#'s LINQ methods and preserves their syntax:
https://www.npmjs.com/package/manipula

Word language detection in C++

After searching on Google, I couldn't find any standard way or library for detecting which language a particular word belongs to.
Suppose I have a word; how could I find which language it is from: English, Japanese, Italian, German, etc.?
Is there any library available for C++? Any suggestion in this regard will be greatly appreciated!
Simple language recognition from words is easy. You don't need to understand the semantics of the text. You don't need any computationally expensive algorithms, just a fast hash map. The problem is, you need a lot of data. Fortunately, you can probably find dictionaries of words in each language you care about. Define a bit mask for each language, that will allow you to mark words like "the" as recognized in multiple languages. Then, read each language dictionary into your hash map. If the word is already present from a different language, just mark the current language also.
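A minimal sketch of that table, assuming one-word-per-line dictionary files (the filenames and the language set are placeholders):
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>

// One bit per language; a word shared across languages gets several bits set.
enum Lang : std::uint32_t { ENGLISH = 1, FRENCH = 2, GERMAN = 4 };

// Read a word-per-line dictionary, OR-ing this language's bit into each word's mask.
void loadDictionary(std::unordered_map<std::string, std::uint32_t>& dict,
                    const std::string& path, std::uint32_t langBit) {
    std::ifstream in(path);
    std::string word;
    while (in >> word)
        dict[word] |= langBit;  // mark the word as present in this language
}

int main() {
    std::unordered_map<std::string, std::uint32_t> dict;
    loadDictionary(dict, "english.dic", ENGLISH);  // placeholder path
    loadDictionary(dict, "french.dic", FRENCH);    // placeholder path
    // "commercial" now maps to ENGLISH | FRENCH, i.e. 3.
}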
Suppose a given word is in both English and French, with ENGLISH = 1, FRENCH = 2, and so on. Then looking it up, e.g. dict["commercial"], will map to ENGLISH|FRENCH, and you'll find the value 3. If you want to know whether a word is in your language only, you would test:
int langs = dict["commercial"];
if ((langs | mylang) == mylang)  // parentheses matter: == binds tighter than |
    ;  // no other language
Since there will be other languages, probably a more general approach is better.
For each bit set in the looked-up mask, add 1 to the corresponding language's count. Do this for n words. After about n = 10 words in a typical text, you'll have 10 for the dominant language, maybe 2 for a related language (like English/French), and you can determine with high probability that the text is English. Remember, even if a text is in one language, it can still contain a quote in another, so the mere presence of a foreign word doesn't mean the document is in that language. Pick a threshold and it will work quite well (and very, very fast).
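A sketch of that tally, reusing the dict from the earlier sketch (the threshold test is left out for brevity):
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Give each language one vote per word it recognizes, then pick the
// language with the most votes.
std::uint32_t guessLanguage(
        const std::unordered_map<std::string, std::uint32_t>& dict,
        const std::vector<std::string>& words, int numLangs) {
    std::vector<int> votes(numLangs, 0);
    for (const auto& w : words) {
        auto it = dict.find(w);
        if (it == dict.end()) continue;      // unknown word: no vote
        for (int bit = 0; bit < numLangs; ++bit)
            if (it->second & (1u << bit))
                ++votes[bit];
    }
    int best = 0;
    for (int bit = 1; bit < numLangs; ++bit)
        if (votes[bit] > votes[best]) best = bit;
    return 1u << best;                       // mask of the dominant language
}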
Obviously the hardest thing about this is reading in all the dictionaries. This isn't a code problem, it's a data collection problem. Fortunately, that's your problem, not mine.
To make this fast, you will need to preload the hash map, otherwise loading it up initially is going to hurt. If that's an issue, you will have to write store and load methods for the hash map that block-load the entire thing efficiently.
I have found Google's CLD very helpful, it's written in C++, and from their web site:
"CLD (Compact Language Detector) is the library embedded in Google's Chromium browser. The library detects the language from provided UTF8 text (plain text or HTML). It's implemented in C++, with very basic Python bindings."
Well,
Statistically trained language detectors work surprisingly well on single-word inputs, though there are obviously some cases where they can't possibly work, as observed by others here.
In Java, I'd send you to Apache Tika. It has an open-source statistical language detector.
For C++, you could use JNI to call it. Now, time for a disclaimer warning. Since you specifically asked for C++, and since I'm unaware of a free C++ alternative, I will now point you at a product of my employer, which is a statistical language detector, natively in C++.
http://www.basistech.com, the product name is RLI.
This will not work well one word at a time, as many words are shared. For instance, in several languages "the" means "tea."
Language processing libraries tend to be more comprehensive than just this one feature, and as C++ is a "high-performance" language it might be hard to find one for free.
However, the problem might not be too hard to solve yourself. See the Wikipedia article on the problem for ideas. Also a small support vector machine might do the trick quite handily. Just train it with the most common words in the relevant languages, and you might have a very effective "database" in just a kilobyte or so.
I wouldn't hold my breath. It is difficult enough to determine the language of a text automatically. If all you have is a single word, without context, you would need a database of all the words of all the languages in the world... the size of which would be prohibitive.
Basically you need a huge database of all the major languages. To auto-detect the language of a piece of text, pick the language whose dictionary contains the most words from the text. This is not something you would want to have to implement on your laptop.
Spell-check the first 3 words of your text in all languages (the more words you spell-check, the better). The spelling with the fewest errors "wins". With only 3 words it is technically possible to have the same spelling in a few languages, but with each additional word this becomes less probable. It is not a perfect method, but I figure it would work in most cases.
Otherwise, if there is an equal number of errors in all languages, use the default language. Or randomly pick another 3 words until you have a clearer result. Or expand the number of spell-checked words beyond 3 until you get a clearer result.
As for spell-checking libraries, there are many; I personally prefer Hunspell. Nuspell is probably also good. Which one to use is a matter of personal opinion and/or technical capabilities.
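A rough sketch of that heuristic with Hunspell (I'm assuming the 1.7-era C++ API where spell() takes a std::string; the dictionary paths and sample words are placeholders, and ties / the default-language fallback are left out):
#include <hunspell.hxx>

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Candidate languages; the .aff/.dic paths are placeholders for
    // wherever your Hunspell dictionaries live.
    struct Candidate { std::string name, aff, dic; };
    std::vector<Candidate> candidates = {
        {"English", "en_US.aff", "en_US.dic"},
        {"Spanish", "es_ES.aff", "es_ES.dic"},
    };

    std::vector<std::string> words = {"piso", "casa", "coche"};  // first 3 words

    std::string bestLang;
    int fewestErrors = static_cast<int>(words.size()) + 1;
    for (const auto& c : candidates) {
        Hunspell checker(c.aff.c_str(), c.dic.c_str());
        int errors = 0;
        for (const auto& w : words)
            if (!checker.spell(w))  // count words this language misspells
                ++errors;
        if (errors < fewestErrors) {
            fewestErrors = errors;
            bestLang = c.name;
        }
    }
    std::cout << "Guess: " << bestLang << "\n";
}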
I assume that you are working with text, not with speech.
If you are working with Unicode, note that it assigns each script a block (a range of code points).
So you can check whether all the characters of a particular word fall within a single script's block.
For more help about Unicode script blocks, you can look here.
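As an illustration (with the caveat that Unicode blocks identify scripts, not languages, and many languages share the Latin script), a minimal sketch covering a few ranges:
#include <string>

// Map a code point to the Unicode block ("slot") it falls in.
// Only a few illustrative script ranges are listed; a real implementation
// would cover the full Unicode block table.
std::string scriptOf(char32_t cp) {
    if (cp >= 0x0370 && cp <= 0x03FF) return "Greek";
    if (cp >= 0x0400 && cp <= 0x04FF) return "Cyrillic";
    if (cp >= 0x3040 && cp <= 0x309F) return "Hiragana (Japanese)";
    if (cp >= 0x30A0 && cp <= 0x30FF) return "Katakana (Japanese)";
    if (cp >= 0x4E00 && cp <= 0x9FFF) return "CJK ideographs";
    if (cp >= 0xAC00 && cp <= 0xD7AF) return "Hangul (Korean)";
    return "Latin or other";
}

// True when every character of the word falls in the given block, which
// hints at the script (and thus narrows down the language) of the word.
bool allInScript(const std::u32string& word, const std::string& script) {
    for (char32_t cp : word)
        if (scriptOf(cp) != script) return false;
    return true;
}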

isFollowingCamelCaseConventionInCPlusPlus more_important_than_readability?

I'm moving back from Python to C++ for my next project.
I know why I shouldn't and I know why I should. Never mind that debate.
C++ conventionForVariablesIsCamelCaseAndI'mHavingTroubleAcceptingIt as, at_least_for_my_eyes_it's_less_readable_than_the_lower_case_underscored_convention.
Did any of you encounter an article or information claiming programmers should adopt lower_case_underscored convention and abandon the camelCase, even in C++ projects? Or perhaps research that shows that one is indeed scientifically more readable than the other?
If coding in a team, consistency is probably more important than personal preference; team members should not have to context switch between reading Joe's code and reading Jane's code. Equally if coding academic assignments, course style like team style should be adhered to (for right or wrong), simply because the person awarding the marks will be used to reading that, and you need to extract the best possible mark for your work!
I would suggest otherwise that one convention has little advantage over another. CamelCase does provide a certain efficiency of symbol length.
If it's a private project, use whatever naming convention you feel comfortable with and that helps you be productive. Just bear in mind that it is generally a good idea to keep to the overall "style" of the language / usual practice, since any samples / examples etc. will usually use that style, making integration etc. easier and less "jarring".
If it's a public project it's probably better to use the conventions since that's easier for other people to work with.
If it's corporate, do whatever your corporate guidelines mandate. If there aren't any, then I'd do the same as for a public project.
One thing I'd personally say about CamelCase, is not to get completely hung up on it, and apply common sense for the sake of readability. For example, I've often seen abbreviations in camel case names written as part upper / part lower, which I think really hurts readability. In cases like this I'd always go for the more readable option. So, for example I'd always write:
private string targetURL;
rather than
private string targetUrl;
But, this is just personal preference.
There is no agreed-upon naming convention among C++ programmers; however, lower case with underscores is used in both the C++ standard library and in Boost.
As for the coding standards document you linked...
A link to a random company's coding standards that uses "Common practice in the C++ development community" as a justification, yet provides no citation for that statement, smells like a false appeal to authority to justify the preferences of whoever wrote the document.
CamelCase vs underscores: Scientific showdown describes a single scientific study which found:
Considering all four hypotheses together, it becomes evident that the camel case style leads to better all around performance once a subject is trained on this style. Training is required to quickly recognize such an identifier.
But then the page disagrees that their conclusions are valid. :)
It also has two polls of visitors to the site, both of which split roughly 50/50 between the two styles.