Why doesn't std::string define multiplication or literals? [closed] - c++

In the language I was first introduced to, there was a function repeat(), that took a string, and repeated it n times. For example, repeat ("hi", 3) gives a result of "hihihi".
I used this function quite a few times, but to my dismay I've never found anything similar in C++. Yes, I can easily write my own and make it convenient to use, but I'm still surprised it isn't included already.
One place it would fit in really well is in std::string:
std::string operator* (const std::string &text, int repeatCount);
std::string operator* (int repeatCount, const std::string &text);
That would allow for syntax such as:
cout << "Repeating \"Hi\" three times gives us \"" << std::string("Hi") * 3 << "\"."
Now that in itself isn't all too great yet, but it could be better, which brings me to my other part: literals.
Any time we use the string operators, such as operator+, we have to make sure at least one argument is actually a std::string. Why didn't they just define a literal for it, like ""s? Literal suffixes that don't begin with an underscore are reserved for the implementation, so a standard ""s shouldn't conflict, especially since it could have been added before anyone actually started defining their own suffixes.
Coming back to the repeat example, the syntax would simply be:
cout << "123"s * 3 + "456"s;
This would produce:
123123123456
While at it, one for characters could be included as well, to satisfy cout << '1's + '2's;
Why weren't these two features included? They definitely have a clear meaning, and make coding easier, while still using the standard libraries.
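(For reference, here is roughly what I mean by "make my own" - a sketch of free functions of my own devising, not anything from the standard library:)

#include <iostream>
#include <string>

// Hypothetical operator* for string repetition; not part of std::string.
std::string operator*(const std::string &text, int repeatCount)
{
    std::string result;
    if (repeatCount > 0)
        result.reserve(text.size() * repeatCount);
    for (int i = 0; i < repeatCount; ++i)
        result += text;
    return result;
}

std::string operator*(int repeatCount, const std::string &text)
{
    return text * repeatCount;
}

int main()
{
    std::cout << "Repeating \"Hi\" three times gives us \"" << std::string("Hi") * 3 << "\".\n";
}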

Well, as for the multiplication, it's not really in C++'s philosophy: languages like Ruby come "batteries included" and with the "principle of least surprise". They are intended to have lots of these little niceties as an above-and-beyond-the-minimum feature. C++ is a system level language which is intended to be "closer to the metal", which is abundantly clear in that a string isn't even a core data type of the language, but a library-provided addon. Even FORTRAN has a native string type, so you can see how low-level C++ is positioned to be. This is intentional: what if you're programming an embedded chip with 1K storage and have no use for strings? Just don't include 'em.
Anyway, it's not 100% clear what the multiplication operator is supposed to do. In other words, in the core language and libraries of C++, no feature gets in unless it seems that almost everyone would agree on the proposed functionality. It's got to be really universal.
I for one might think that "123" * 3 could give "369" - multiply each digit by 3. I'd argue that this interpretation is "sane" enough that your repeat interpretation isn't the single obvious meaning.
The literal notation is far easier and more clear to answer: std::string is a part of the standard library, which is one level of abstraction above the language syntax itself. In other words, the literal notation would bust through the level of abstraction that separates "language features" and "the library that you can expect to be bundled with your compiler".

std::string has append(count, ch), so repeating a single character is covered. There. For repeating a whole string, though, there is no declarative way to do it. Also note that std::string isn't a language type: a string literal in C++ is an array of const char that decays to a const char* pointer, and std::string simply manages an array of characters internally (char for std::string, wchar_t for std::wstring; they're instantiations of the same template on those character types).
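A quick sketch of the facilities that do exist (and the loop you'd still write yourself for whole-string repetition):

#include <iostream>
#include <string>

int main()
{
    std::string s(3, 'x');   // constructor: repeat a single character -> "xxx"
    s.append(2, '!');        // append(count, ch) -> "xxx!!"

    // Repeating a whole string still takes a loop or a hand-written helper.
    std::string hi;
    for (int i = 0; i < 3; ++i)
        hi += "hi";          // "hihihi"

    std::cout << s << ' ' << hi << '\n';
}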

Related

what does operator"" overloading mean in C++? [duplicate]

C++11 introduces user-defined literals, which allow the introduction of new literal syntax based on the existing literal kinds (integer, hex, string, float) so that any type can have a literal representation.
Examples:
// imaginary numbers
std::complex<long double> operator "" _i(long double d) // cooked form
{
return std::complex<long double>(0, d);
}
auto val = 3.14_i; // val = complex<long double>(0, 3.14)
// binary values
int operator "" _B(const char*); // raw form
int answer = 101010_B; // answer = 42
// std::string
std::string operator "" _s(const char* str, size_t /*length*/)
{
return std::string(str);
}
auto hi = "hello"_s + " world"; // + works, "hello"_s is a string not a pointer
// units
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
At first glance this looks very cool, but I'm wondering how applicable it really is. When I tried to think of having the suffixes _AD and _BC create dates, I found that it's problematic due to operator evaluation order: 1974/01/06_AD would first evaluate 1974/01 (as plain ints) and only later the 06_AD (to say nothing of August and September having to be written without the leading 0, for octal reasons). This can be worked around by making the syntax 1974-1/6_AD so that the operator evaluation order works, but it's clunky.
So what my question boils down to is this, do you feel this feature will justify itself? What other literals would you like to define that will make your C++ code more readable?
Updated the syntax to match the final draft of June 2011.
At first sight, it seems to be simple syntactic sugar.
But when looking deeper, we see it's more than syntactic sugar: it extends the C++ user's options for creating user-defined types that behave exactly like distinct built-in types. In that respect, this little "bonus" is a very interesting C++11 addition.
Do we really need it in C++?
I see few uses in the code I wrote in the past years, but just because I didn't use it in C++ doesn't mean it's not interesting for another C++ developer.
In C++ (and in C, I guess) we already had compiler-defined literal suffixes: to mark integer numbers as short or long, real numbers as float, double or long double, and character strings as narrow or wide.
In C++, we also had the possibility to create our own types (i.e. classes), with potentially no overhead (inlining, etc.). We had the possibility to add operators to those types, to have them behave like similar built-in types, which enables C++ developers to use matrices and complex numbers as naturally as they would if these had been added to the language itself. We can even add cast operators (which is usually a bad idea, but sometimes it's just the right solution).
We were still missing one thing to make user types behave like built-in types: user-defined literals.
So I guess it's a natural evolution for the language, to be as complete as possible: "If you want to create a type, and you want it to behave as much as possible like a built-in type, here are the tools..."
I'd guess it's very similar to .NET's decision to make every primitive a struct, including booleans, integers, etc., and have all structs derive from Object. This decision alone puts .NET far beyond Java's reach when working with primitives, no matter how much boxing/unboxing hacks Java will add to its specification.
Do YOU really need it in C++?
This question is for YOU to answer. Not Bjarne Stroustrup. Not Herb Sutter. Not any member of the C++ standards committee. This is why you have the choice in C++, and why they won't restrict a useful notation to built-in types alone.
If you need it, then it is a welcome addition. If you don't, well... Don't use it. It will cost you nothing.
Welcome to C++, the language where features are optional.
Bloated??? Show me your complexes!!!
There is a difference between bloated and complex (pun intended).
As shown by Niels at What new capabilities do user-defined literals add to C++?, being able to write a complex-number literal is something that was added "recently" to both C and C++:
// C89:
MyComplex z1 = { 1, 2 } ;
// C99: You'll note I is a macro, which can lead
// to very interesting situations...
double complex z1 = 1 + 2*I;
// C++:
std::complex<double> z1(1, 2) ;
// C++11: You'll note that "i" won't ever bother
// you elsewhere
std::complex<double> z1 = 1 + 2_i ;
Now, both C99 "double complex" type and C++ "std::complex" type are able to be multiplied, added, subtracted, etc., using operator overloading.
But in C99, they just added another type as a built-in type, and built-in operator overloading support. And they added another built-in literal feature.
In C++, they just used existing features of the language, saw that the literal feature was a natural evolution of the language, and thus added it.
In C, if you need the same notation enhancement for another type, you're out of luck until your lobbying to add your quantum wave functions (or 3D points, or whatever basic type you're using in your field of work) to the C standard as a built-in type succeeds.
In C++11, you can just do it yourself:
Point p = 25_x + 13_y + 3_z ; // 3D point
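A rough sketch of what those suffixes could look like; the _x/_y/_z names are the answer's, but the Point type and the implementation are my own assumptions:

struct Point { double x, y, z; };

constexpr Point operator"" _x(unsigned long long v) { return { static_cast<double>(v), 0.0, 0.0 }; }
constexpr Point operator"" _y(unsigned long long v) { return { 0.0, static_cast<double>(v), 0.0 }; }
constexpr Point operator"" _z(unsigned long long v) { return { 0.0, 0.0, static_cast<double>(v) }; }

constexpr Point operator+(Point a, Point b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }

int main()
{
    constexpr Point p = 25_x + 13_y + 3_z;  // (25, 13, 3), computed at compile time
    static_assert(p.x == 25.0 && p.y == 13.0 && p.z == 3.0, "unexpected point");
}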
Is it bloated? No, the need is there, as shown by how both C and C++ complexes need a way to represent their literal complex values.
Is it wrongly designed? No, it's designed as every other C++ feature, with extensibility in mind.
Is it for notation purposes only? No, as it can even add type safety to your code.
For example, let's imagine a CSS oriented code:
css::Font::Size p0 = 12_pt ; // Ok
css::Font::Size p1 = 50_percent ; // Ok
css::Font::Size p2 = 15_px ; // Ok
css::Font::Size p3 = 10_em ; // Ok
css::Font::Size p4 = 15 ; // ERROR : Won't compile !
It is then very easy to enforce strong typing on the assignment of values.
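A minimal sketch of how such literals might be declared; the css::Font::Size type, its layout and the suffix semantics here are illustrative assumptions, not any real library:

namespace css { namespace Font {

struct Size
{
    enum class Unit { Pt, Percent, Px, Em };
    double value;
    Unit unit;
};

}} // namespace css::Font

// Integer forms (12_pt, 50_percent, ...); a real library would add long double overloads too.
constexpr css::Font::Size operator"" _pt(unsigned long long v)
{ return { static_cast<double>(v), css::Font::Size::Unit::Pt }; }
constexpr css::Font::Size operator"" _percent(unsigned long long v)
{ return { static_cast<double>(v), css::Font::Size::Unit::Percent }; }
constexpr css::Font::Size operator"" _px(unsigned long long v)
{ return { static_cast<double>(v), css::Font::Size::Unit::Px }; }
constexpr css::Font::Size operator"" _em(unsigned long long v)
{ return { static_cast<double>(v), css::Font::Size::Unit::Em }; }

int main()
{
    css::Font::Size p0 = 12_pt;   // OK
    // css::Font::Size p4 = 15;   // error: no conversion from plain int
    (void)p0;
}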
Is it dangerous?
Good question. Can these functions be namespaced? If yes, then Jackpot!
Anyway, like everything, you can kill yourself if a tool is used improperly. C is powerful, and you can shoot your head off if you misuse the C gun. C++ has the C gun, but also the scalpel, the taser, and whatever other tool you'll find in the toolkit. You can misuse the scalpel and bleed yourself to death. Or you can build very elegant and robust code.
So, like every C++ feature, do you really need it? It is the question you must answer before using it in C++. If you don't, it will cost you nothing. But if you do really need it, at least, the language won't let you down.
The date example?
Your error, it seems to me, is that you are mixing operators:
1974/01/06AD
    ^  ^  ^
This can't be avoided: / is an operator, so the compiler must interpret it as one. And, AFAIK, that is a good thing.
To find a solution for your problem, I would write the literal in some other way. For example:
"1974-01-06"_AD ; // ISO-like notation
"06/01/1974"_AD ; // french-date-like notation
"jan 06 1974"_AD ; // US-date-like notation
19740106_AD ; // integer-date-like notation
Personally, I would choose the integer and the ISO-like notations, but it depends on YOUR needs. Which is the whole point of letting users define their own literal names.
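For completeness, a rough sketch of what a string-based _AD literal could look like; the Date type and the parsing are my own assumptions, and there is no validation:

#include <cstddef>
#include <string>

struct Date { int year, month, day; };

// Expects the ISO-like "YYYY-MM-DD" form shown above.
Date operator"" _AD(const char* str, std::size_t /*len*/)
{
    const std::string s(str);
    return { std::stoi(s.substr(0, 4)),
             std::stoi(s.substr(5, 2)),
             std::stoi(s.substr(8, 2)) };
}

int main()
{
    Date d = "1974-01-06"_AD;
    return d.year == 1974 ? 0 : 1;
}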
Here's a case where there is an advantage to using user-defined literals instead of a constructor call:
#include <bitset>
#include <iostream>

template<char... Bits>
struct checkbits
{
    static const bool valid = false;
};

template<char High, char... Bits>
struct checkbits<High, Bits...>
{
    static const bool valid = (High == '0' || High == '1')
                              && checkbits<Bits...>::valid;
};

template<char High>
struct checkbits<High>
{
    static const bool valid = (High == '0' || High == '1');
};

template<char... Bits>
inline std::bitset<sizeof...(Bits)>
operator"" _bits() noexcept
{
    static_assert(checkbits<Bits...>::valid, "invalid digit in binary string");
    const char str[] = { Bits..., '\0' };   // build a NUL-terminated string from the digits
    return std::bitset<sizeof...(Bits)>(str);
}

int main()
{
    auto bits = 0101010101010101010101010101010101010101010101010101010101010101_bits;
    std::cout << bits << std::endl;
    std::cout << "size = " << bits.size() << std::endl;
    std::cout << "count = " << bits.count() << std::endl;
    std::cout << "value = " << bits.to_ullong() << std::endl;
    // This triggers the static_assert at compile time.
    auto badbits = 2101010101010101010101010101010101010101010101010101010101010101_bits;
    // This throws at run time.
    std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101");
}
The advantage is that a run-time exception is converted to a compile-time error.
You couldn't add the static assert to the bitset ctor taking a string (at least not without string template arguments).
It's very nice for mathematical code. Off the top of my head I can see uses for the following operators:
deg for degrees. That makes writing absolute angles much more intuitive.
double operator ""_deg(long double d)
{
// returns radians
return d*M_PI/180;
}
It can also be used for various fixed point representations (which are still in use in the field of DSP and graphics).
int operator ""_fix(long double d)
{
// returns d as a 1.15.16 fixed point number
return (int)(d*65536.0f);
}
These look like nice examples of how to use it. They help make constants in code more readable. It's another tool for making code unreadable as well, but we already have so many tools to abuse that one more doesn't hurt much.
UDLs are namespaced (and can be imported by using declarations/directives, but you cannot explicitly namespace a literal like 3.14std::i), which means there (hopefully) won't be a ton of clashes.
The fact that they can actually be templated (and constexpr'd) means that you can do some pretty powerful stuff with UDLs. Bigint authors will be really happy, as they can finally have arbitrarily large constants, calculated at compile time (via constexpr or templates).
I'm just sad that we won't see a couple useful literals in the standard (from the looks of it), like s for std::string and i for the imaginary unit.
The amount of coding time that will be saved by UDLs is actually not that high, but the readability will be vastly increased and more and more calculations can be shifted to compile-time for faster execution.
Bjarne Stroustrup talks about UDLs in this C++11 talk, in the first section on type-rich interfaces, around the 20-minute mark.
His basic argument for UDLs takes the form of a syllogism:
"Trivial" types, i.e., built-in primitive types, can only catch trivial type errors. Interfaces with richer types allow the type system to catch more kinds of errors.
The kinds of type errors that richly typed code can catch have impact on real code. (He gives the example of the Mars Climate Orbiter, which infamously failed due to a dimensions error in an important constant).
In real code, units are rarely used. People don't use them, because incurring runtime compute or memory overhead to create rich types is too costly, and using pre-existing C++ templated unit code is so notationally ugly that no one uses it. (Empirically, no one uses it, even though the libraries have been around for a decade).
Therefore, in order to get engineers to use units in real code, we needed a device that (1) incurs no runtime overhead and (2) is notationally acceptable.
Let me add a little bit of context. For our work, user-defined literals are much needed. We work on MDE (Model-Driven Engineering). We want to define models and metamodels in C++. We actually implemented a mapping from Ecore to C++ (EMF4CPP).
The problem comes when being able to define model elements as classes in C++. We are taking the approach of transforming the metamodel (Ecore) to templates with arguments. Arguments of the template are the structural characteristics of types and classes. For example, a class with two int attributes would be something like:
typedef ::ecore::Class< Attribute<int>, Attribute<int> > MyClass;
However, it turns out that every element in a model or metamodel usually has a name. We would like to write:
typedef ::ecore::Class< "MyClass", Attribute< "x", int>, Attribute<"y", int> > MyClass;
However, neither C++ nor C++0x allows this, because string literals cannot be used as template arguments. You can spell the name out character by character, but that is admittedly a mess. With proper user-defined literals, we could write something close. Say we use "_n" to identify model element names (this isn't the exact syntax, just the idea):
typedef ::ecore::Class< MyClass_n, Attribute< x_n, int>, Attribute<y_n, int> > MyClass;
Finally, having those definitions as templates helps us a lot to design algorithms for traversing the model elements, model transformations, etc. that are really efficient, because type information, identification, transformations, etc. are determined by the compiler at compile time.
Supporting compile-time dimension checking is the only justification required.
auto force = 2_N;
auto dx = 2_m;
auto energy = force * dx;
assert(energy == 4_J);
See for example PhysUnits-CT-Cpp11, a small C++11/C++14 header-only library for compile-time dimensional analysis and unit/quantity manipulation and conversion. It is simpler than Boost.Units, supports unit-symbol literals such as m, g, s and metric prefixes such as m, k, M, depends only on the standard C++ library, is SI-only, and handles integral powers of dimensions.
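A rough, self-contained sketch of how such literals and checks might be wired up (this is not the library's actual API, just the idea in miniature):

// Illustrative single-purpose types; a real library tracks full dimensions generically.
struct Newtons { double value; };
struct Meters  { double value; };
struct Joules  { double value; };

constexpr Newtons operator"" _N(unsigned long long v) { return { static_cast<double>(v) }; }
constexpr Meters  operator"" _m(unsigned long long v) { return { static_cast<double>(v) }; }
constexpr Joules  operator"" _J(unsigned long long v) { return { static_cast<double>(v) }; }

// Force times distance yields energy; any other combination simply does not compile.
constexpr Joules operator*(Newtons f, Meters d) { return { f.value * d.value }; }
constexpr bool   operator==(Joules a, Joules b) { return a.value == b.value; }

int main()
{
    constexpr auto force  = 2_N;
    constexpr auto dx     = 2_m;
    constexpr auto energy = force * dx;
    static_assert(energy == 4_J, "dimensional arithmetic went wrong");
}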
Hmm... I have not thought about this feature yet. Your sample was well thought out and is certainly interesting. C++ is very powerful as it is now, but unfortunately the syntax used in pieces of code you read is at times overly complex. Readability is, if not everything, then at least a great deal, and such a feature would be geared toward more readability. If I take your last example
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
... I wonder how you'd express that today. You'd have a KG and a LB class and you'd compare implicitly constructed objects:
assert(KG(1.0f) == LB(2.2f));
And that would do as well. For types with longer names, or types that you have no hope of giving such a nice constructor without writing an adapter, it might be a nice addition for on-the-fly implicit object creation and initialization. On the other hand, you can already create and initialize objects using methods, too.
But I agree with Nils on mathematics. C and C++ trigonometry functions for example require input in radians. I think in degrees though, so a very short implicit conversion like Nils posted is very nice.
Ultimately, it's going to be syntactic sugar, but it will have a slight effect on readability. And it will probably be easier to write some expressions too (sin(180.0_deg) is easier to write than sin(deg(180.0))). And then there will be people who abuse the concept. But then, language-abusive people should use very restrictive languages rather than something as expressive as C++.
Ah, my post says basically nothing except: it's going to be okay, the impact won't be too big. Let's not worry. :-)
I have never needed or wanted this feature (but this could be the Blub effect). My knee jerk reaction is that it's lame, and likely to appeal to the same people who think that it's cool to overload operator+ for any operation which could remotely be construed as adding.
C++ is usually very strict about the syntax used - barring the preprocessor, there is not much you can use to define a custom syntax/grammar. E.g. we can overload existing operators, but we cannot define new ones - IMO this is very much in tune with the spirit of C++.
I don't mind having some ways to customize source code more - but the point chosen seems very isolated to me, which is what confuses me most.
Even intended use may make source code much harder to read: a single letter may have far-reaching effects that can in no way be identified from the context. With symmetry to u, l and f, most developers will choose single letters.
This may also turn scoping into a problem: using single letters in the global namespace will probably be considered bad practice, and the tools that are supposed to make mixing libraries easier (namespaces and descriptive identifiers) would largely defeat the feature's purpose.
I see some merit in combination with auto, and also in combination with a unit library like Boost.Units, but not enough to justify this addition.
I wonder, however, what clever ideas we will come up with.
I used user-defined literals for binary strings like this:
"asd\0\0\0\1"_b
using the std::string(str, n) constructor so that \0 wouldn't cut the string short. (The project does a lot of work with various file formats.)
This was helpful also when I ditched std::string in favor of a wrapper for std::vector.
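A sketch of the literal operator behind that (the _b suffix is the one described above; the surrounding code is my own illustration):

#include <cstddef>
#include <string>

std::string operator"" _b(const char* str, std::size_t n)
{
    // The (pointer, length) constructor keeps embedded '\0' bytes.
    return std::string(str, n);
}

int main()
{
    auto payload = "asd\0\0\0\1"_b;
    return payload.size() == 7 ? 0 : 1;   // 7 bytes, embedded NULs included
}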
The line noise in that thing is huge. It's also horrible to read.
Let me know: did they justify that new syntax addition with any kind of examples? For instance, do they have a couple of programs that already use C++0x?
For me, this part:
auto val = 3.14_i
Does not justify this part:
std::complex<double> operator ""_i(long double d) // cooked form
{
return std::complex<double>(0, d);
}
Not even if you'd use the i-syntax in 1000 other lines as well. If you write that much, you'll probably write 10,000 lines of something else alongside it. Especially when you will still probably write this almost everywhere:
std::complex<double> val = 3.14i
The auto keyword may be justified, though, but only perhaps. Let's take plain C++, because it's better than C++0x in this respect.
std::complex<double> val = std::complex<double>(0, 3.14);
It's like... that simple. Even though all the std:: and angle brackets are lame if you use them just about everywhere. I won't start guessing what syntax C++0x has for shortening std::complex to complex.
complex = std::complex<double>;
That's perhaps something straightforward, but I don't believe it's quite that simple in C++0x (it turns out to be using complex = std::complex<double>;).
typedef std::complex<double> complex;
complex val = complex(0, 3.14);
Perhaps? >:)
Anyway, the point is: writing 3.14i instead of std::complex<double>(0, 3.14) does not save you much time overall, except in a few special cases.

Where is the standard library predefined 'user-defined' literal "m" defined?

I stumbled upon the following example while browsing the C++ Core Guidelines document:
Example
change_speed(double s); // bad: what does s signify?
// ...
change_speed(2.3);
A better approach is to be explicit about the meaning of the double (new speed or delta on old speed?) and the unit used:
change_speed(Speed s); // better: the meaning of s is specified
// ...
change_speed(2.3); // error: no unit
change_speed(23m / 10s); // meters per second
We could have accepted a plain (unit-less) double as a delta, but that would have been error-prone.
With regard to that last line of code, there was no mention on that specific page of what the syntax meant, and it looked positively alien to me.
After several hours wasted (ahem, invested) figuring out that this is probably a standard-library predefined 'user-defined' literal, and learning more about them, I tried to find out where this particular literal is defined; but while 's' is mentioned here along with a handful of other literals, I found no information on 'm'.
There is also this question on SO, but the answers there seem to be completely out of date.
Q: Where is the standard-library predefined user-defined literal "m" defined*?
* Sweet, sweet alliteration.
It is not a standard user defined literal
The list of standard user defined literals can be found here at the bottom: http://en.cppreference.com/w/cpp/language/user_literal
And operator""m is not one of them, as the standard library does not deal with units of length (yet).

Why do C fundamental types have identifiers with multiple keywords

Apart from the obvious answer "because the designers made it that way", why do C and C++ have types whose names consist of multiple keywords, e.g.
long long (int)
short int
signed char
I do have some basic knowledge of parsing and have used flex/bison to build a few parsers, and I think this brings much more complexity to parsing type names. Looking at the C++ grammar in the standard, everything about types really is complicated.
I know that C++ (and C too, I believe) does not specify much about the sizes of the fundamental data types, so types like int_8, uint_8, etc. would not work (although C++11 gave us fixed-width integers).
So why did the developers of the standard agree on multi-word type identifiers when they could have gone with int, uint and similar?
Speaking in terms of C, why did the developers of the standard agree on multi-word identifiers? It's because that was what the language had at the time of standardisation.
The mandate for the original standard was not to create a new language but to codify existing practice. As per the C89 standard itself:
The Committee evaluated many proposals for additions, deletions, and changes to the base documents during its deliberations. A concerted effort was made to codify existing practice wherever unambiguous and consistent practice could be identified. However, where no consistent practice could be identified, the Committee worked to establish clear rules that were consistent with the overall flavor of the language.
And, from the C99 rationale document:
The original X3J11 charter clearly mandated codifying common existing practice, and the C89 Committee held fast to precedent wherever that was clear and unambiguous. The vast majority of the language defined by C89 was precisely the same as defined in Appendix A of the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie, and as was implemented in almost all C translators of the time.
Beyond that, each iteration of the standard has valued backward compatibility highly so that code doesn't break. From that same rationale document:
Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. Every attempt has been made to ensure that the bulk of this code will be acceptable to any implementation conforming to the Standard. The C89 Committee did not want to force most programmers to modify their C programs just to have them accepted by a conforming translator.
So, while later versions of the standard gave us things like stdint.h with its fixed width integral types, taking away the standard ones like int and long would be a gross violation of that guideline.
In terms of C++, it's almost certainly a holdover from the earliest days of that language, when it was put forward as "C with Classes". In fact, the very early cfront C++ compiler was so named because it took C++ source code and turned that into C before giving it to a suitable C compiler (i.e., a front end for C, hence cfront).
This would have allowed the original author Bjarne to minimise the workload in delivering C++ since the bulk of it was already provided by the C compiler itself.
In terms of parsing a language, it's certainly more difficult to have to process unsigned long int x (a) than it is to handle ulong x.
But, given that the compiler already has to handle a large number of optional "modifiers/specifiers" for a variable (e.g., const char * const x), handling a few others is par for the course.
(a) Or int long unsigned x or long unsigned x or any of the other type specifiers that end up becoming the singular unsigned long int type. See here for more details.
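A quick way to check that these orderings all name the same type:

#include <type_traits>

// Every permutation of the specifiers denotes exactly the same type.
static_assert(std::is_same<unsigned long int, long unsigned int>::value, "same type");
static_assert(std::is_same<unsigned long int, int long unsigned>::value, "same type");

int main() {}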
Adding new reserved words to a language will break any code which happens to use such words as identifiers unless those words are of a form which is reserved for future expansion (e.g. contain two leading underscores, or start with an underscore and a capital letter, etc.)
By contrast, if some particular sequence of reserved words has no defined meaning in any existing implementation, there can be no existing code which uses that sequence of reserved words, and thus no danger of breaking existing code by attaching a new meaning to it.

What is the decimal point ('.') in C++ and can I make one?

I am in a C++ class right now so this question will concern itself primarily with that language, though I haven't been able to find any information for any other language either and I have a feeling whatever the answer is it's probably largely cross language.
In C++, unmarked numbers are assumed to be of integral type (4, for example, is an integer);
various bounding marks allow the token to be interpreted differently ('4', for example, is a character, "4" a string).
As far as I know, there is only one kind of unary mark: the decimal point (4. is a double).
I would like to create a new unary mark that designates a constant number in the code to be interpreted as a member of a created datatype. More fundamentally, I would like to know what the '.', the ',', the '"' and the ' actually are (they aren't operators, keywords, or statements, so what are they?) and how the compiler deals with/interprets them.
More information, if you feel it is necessary:
I am trying to make a complex-number header that I can include in any project to do complex math. I am aware of the library, but it is, IMHO, ugly and, if used extensively, slows down coding. Also, I'm mostly trying to code this to improve my programming skills. My goal is to be able to declare a complex variable by doing something of the form cmplx num1 = 3 + 4i; where '3' and '4' are arbitrary and 'i' is a mark, similar to the decimal point, which marks '4' as imaginary.
I would like to create a new unary mark that designates a constant number in the code to be interpreted as a member of a created datatype.
You can use user defined literals that were introduced in C++11. As an example, assuming you have a class type Type and you want to use the num_y syntax, where num is a NumericType, you can do:
Type operator"" _y(NumericType i) {
return Type(i);
}
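For the complex-number goal specifically, a sketch along the same lines (the _i suffix here is my own; the suffix i without an underscore is reserved, and C++14 later added std::literals::complex_literals with exactly that meaning):

#include <complex>
#include <iostream>

std::complex<double> operator"" _i(long double d)        // 4.0_i
{
    return std::complex<double>(0.0, static_cast<double>(d));
}

std::complex<double> operator"" _i(unsigned long long d) // 4_i
{
    return std::complex<double>(0.0, static_cast<double>(d));
}

int main()
{
    auto num1 = 3.0 + 4.0_i;   // std::complex<double>(3, 4)
    std::cout << num1 << '\n'; // prints (3,4)
}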
Things like 4, "4" and 4. are all single tokens, indivisible. There's no way you can add new tokens to the language. In C++11, it is possible to define user-defined literals, but they still consist of several tokens; for complex, a much more natural solution would be to support a constant i, to allow writing things like 4 + 3*i. (But you'd still need the C++11 support for constexpr for it to be a compile-time constant.)
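A sketch of that "constant i" idea (the name i here is this example's own, not a standard constant):

#include <complex>
#include <iostream>

const std::complex<double> i(0.0, 1.0); // constexpr-qualifying it needs library constexpr support

int main()
{
    auto z = 4.0 + 3.0 * i;    // std::complex<double>(4, 3)
    std::cout << z << '\n';    // prints (4,3)
}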

Is it feasible to ascribe pronunciations to distinct source code concepts?

I frequently tutor fellow students in programming, most often in C++ or Java.
It is uniquely aggravating to try to verbally convey the essential syntax of a C++ expression. The speaker must give either an idiomatic translation into English, or a full specification of the code in verbal longhand, using explicit yet slow terms such as "opening parenthesis", "bitwise and", et cetera. Neither of these solutions is optimal.
In C++, there is a finite set of keywords—63—and operators—54, discounting named operators and treating compound assignment operators and prefix versus postfix auto-increment and decrement as distinct. There are just a few types of literal, a similar number of grouping symbols, and the semicolon. Unless I'm utterly mistaken, that's about it.
Would it not then be feasible to ascribe a concise, unique pronunciation to each of these distinct concepts (including one for whitespace, where it is required) and go from there? Programming languages are far more regular than natural languages, so the pronunciation could be standardised.
Instead of creating new "words" to describe them, for things such as "include" you could simply prefix it with "keyword" when saying it aloud. You could use words/phrases commonly known to say other parts as well. As with any new programmer, you have to literally describe everything anyway, so I don't think that requires special attention. I think creating new words is the harder method...
So, for example:
#include <iostream>

int main()
{
    if (1 < 2)
        return 1;
    else
        return 0;
}
Could be read out as:
(keyword) include iostream new-line (keyword) int main no params start block if number 1 (operator) less than number 2 new-line (keyword) return number 1 new-line (keyword) else new-line (keyword) return number 0 end block
Treat words in () as optional descriptive words, most likely to be used in more complex code. You could use the word 'literal' if you want them to actually write the descriptive word. For example
(keyword) if literal number (operator) less than literal keyword
becomes
if (number < keyword)
Other words could be given defined meanings as well, such as 'split-line' when you want them to continue on the next line, without closing any currently open parenthesis, etc.
I personally find this method quite simple to use and easy to teach. YMMV, as always.
Of course, this doesn't solve the internationalisation issue, but at worst, would result in 'new words' being used in the non-English languages, which is no worse than the proposed solution you offered.
As a blind developer, programming since I was 13, I found this question really interesting. First of all, as mentioned by other people, learning a new language to be able to understand code is not a practical solution, as it would probably take longer to learn the spoken utterances than it would to learn the actual programming language.
Reading the question and answers, two further points occurred to me:
Firstly, you'd be surprised how important "thinking time" is. I have previously programmed in C/C++/Java and now use C# as my primary language, and consider myself very competent. But when I did a couple of projects in Python, I found the reduced punctuation robbed me of my "thinking time" - subconsciously, I was using the punctuation to digest what I'd just heard - fascinating... However, the situation is a bit different when it comes to identifiers, as these aren't well known by the listener - I personally find it hard to listen to code with acronym variables (RGXRatio, RGVRatio) as I don't have time to figure out what they mean. On the flip side, Hungarian notation and initial underscores make code hard to listen to, as the length of the variables (in terms of time taken to speak them) is much longer than the more important operations being performed on those variables.
Another thing to consider is that the length of the audio stream is an end result, but not the root cause. The reason the audio is so long is that audio is a one-dimensional medium, whereas reading text is a 2D medium with the ability to jump around and skip past irrelevant/familiar text. That wouldn't work for a face-to-face lecture, but what if there were keyboard commands for controlling the speech? In text documents my screen reader lets me jump to the next line, but what if this were adapted to the semantics of a programming language? Some research, such as that by T V Raman at Google, includes using different voices for syntax highlighting, and audio cues to mark metadata like capitals.
I know the original question specifically related to a lecture given to a class, but if, like me, you have to listen to entire files of source code, I also find the structure of the code makes a huge difference. I personally read code like a story - left to right, top to bottom - so it's very hard to trace through unfamiliar code when it's written bottom-up.
So would it not then be feasible to simply ascribe a concise, unique pronunciation to each of these distinct concepts (including one for whitespace, where it is required) and go from there? Programming languages are far more regular than natural languages, so the pronunciation could be standardised.
Perhaps, but you've lost sight of your goal. The premise was that the person listening did not already know the language. If he does, we can simply say "include iostream" when we mean #include <iostream>, or "vector of int" when we mean std::vector<int>.
Your premise was that the person listening is not familiar enough with the language to understand what you read out loud unless you read out exactly what it says.
Now, inventing a whole new language just to describe the primitives that occur in your source code doesn't solve the problem. Instead, you still have to read out every syntactic token (with simpler, more "standardized" pronunciations, yes, but they still have to be read out loud), and the person listening still won't understand you, because if they don't know C++ well enough to understand "include iostream", they won't understand your standardized pronunciation either. And if you're going to teach them your pronunciation, why bother, when you could've just taught them to understand C++ syntax directly instead?
There's also the root problem that C++ code tends to consist of a lot of syntactic tokens. Take a line as simple as this:
std::vector<int> v;
I count 9 tokens. Not one of them can be omitted. If the person listening does not understand the code and syntax well enough to understand a high-level description such as "declare a vector of int, named v", then you'll have to read out all 9 tokens in some form. Even if you come up with simpler names than "namespace resolution operator" and "less than sign", you still have to list 9 token names. Which is a lot of work.
In short, no, I don't think it'd work. First, it's still too cumbersome, and second, it's presuming prior knowledge on the part of the person listening, when the motivation for this was that the person listening was a student without the prior knowledge that made it possible to understand a high-level description of the code.