Okay, so I need to know which of the flowcharts below is the accepted 'standard' way of representing selection.
AFAIK both are accurate, but which one has the standard representation?
Method 1,
Method 2,
I would say that "Method 2" is a standard way of showing this selection flowchart, as a decision (diamond) doesn't usually have both boolean outputs leading to the same point.
Is there any native (cross-platform) C++ function in any of the standard libraries which returns the actual length of a std::string?
Update:
As we know, std::string::length() returns the number of bytes, not the number of characters.
I already have a custom function which returns the actual one, but I'm looking for a standard one.
codecvt ought to be helpful. The Standard provides implementations for UTF-8; for example, codecvt_utf8<char32_t>() would be appropriate in this case.
Probably something like:
wstring_convert< codecvt_utf8<char32_t>, char32_t >().from_bytes(the_std_string).size()
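Since wstring_convert and codecvt_utf8 were deprecated in C++17, here is a minimal sketch of the same count done by hand. It assumes the input is valid UTF-8, and the function name utf8_code_points is made up for illustration. Note that it counts code points, which is not necessarily what a user perceives as characters.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is assumed to hold valid UTF-8.
// Every code point starts with exactly one byte that is not a continuation
// byte; continuation bytes have the bit pattern 10xxxxxx.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte
        }
    }
    return count;
}
```

For a string such as "héllo" encoded as UTF-8, length() reports 6 bytes while this function reports 5 code points.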
Actual length is the number of bytes. There is very little meaning to counting codepoints. You may though want to count other things like grapheme clusters.
See more about the different kinds of string length at http://utf8everywhere.org
There is no way to do that in C/C++ without third-party libraries.
Even if you convert to char32_t, you will get code points, not characters.
A code point does not match the user's perception of a character, because of things like decomposed forms, ligatures and variation selectors.
The closest available construct to a "user character" is a "grapheme cluster"
(see http://www.unicode.org/reports/tr29/)
Your best cross-platform option is ICU4C (http://site.icu-project.org/)
Specs provides two different means of hierarchically structuring your specifications. One is by defining a "system under specification" and the other is by making sub-examples (one example is one specification/test statement).
Can someone please answer or point to a website what the intended usage of those different mechanisms is in general? I'm also curious about the reusing of specifications/examples.
My Use-Case
In particular, I have a project that contains different algorithms A to compute some output X given specific input examples Y. Should I choose the algorithms A to be the SUS, so that I can reuse a setup like "must compute the correct result for example Y_1; must compute the correct result for example Y_2; ..."? Or should I specify the different examples to be the SUS, so that I get "must be solvable by algorithm A_1; must be solvable by algorithm A_2; ..."?
What shall I turn into SUS and what into sub-examples?
Usually the system under specification (SUS) is the code that you're specifying, not the data.
The main difference between the SUS and normal examples/sub-examples in specs is that a SUS has several additional methods to set the context, such as the ->- method.
What I would actually suggest in your case, if the data is effectively the same for each algorithm, is simply to define a method to create your examples:
def examplesMustPassFor(algo: Algorithm) = {
  "The algo "+algo.name should {
    "pass the data set 1" in { ... }
    "pass the data set 2" in { ... }
    "pass the data set 3" in { ... }
  }
}
examplesMustPassFor(algo1)
examplesMustPassFor(algo2)
examplesMustPassFor(algo3)
Another important point that I want to mention is that the specs project has now been superseded by specs2 so you might want to check this one out if you're just starting writing your specifications.
Of course don't hesitate to ask more specific questions with code samples on the mailing-list if you want.
Eric.
To Dutch speaking people the two characters "ij" are considered to be a single letter that is easily exchanged with "y".
For a project I'm working on I would like to have a variant of the Damerau–Levenshtein distance that calculates the distance between "ij" and "y" as 1 instead of the current value of 2.
I've been trying this myself but failed. My problem is that I do not have a clue on how to handle the fact that both texts are of different lengths.
Does anyone have a suggestion/code fragment on how to solve this?
Thanks.
The Wikipedia article is rather loose with terminology. There are no such things as "strings" in "natural language". There are phonemes in natural language which can be represented by written characters and character-combinations.
Some character-combinations are vestiges of historical conventions which have survived into modern times, as in modern English "rough" where the "gh" can sound like -f- or make no sound at all. It seems to me that in focusing on raw "strings" the algorithm must be agnostic about the historical relationship of language and orthographic convention, which leads to some arbitrary metrics whenever character-combinations correlate to a single phoneme. How would it measure "rough" to "ruf"? Or "through" to "thru"?
Or German o-umlaut to "oe"?
In your case the -y- can be exchanged phonetically and orthographically with -ij-. So what is that according to the algorithm, two deletions followed by an insertion, or a single deletion of the -j- or of the -i- followed by a transposition of the remaining character to -y-? Or is -ij- being coalesced and the coalescence is followed by a transposition?
I would recommend that you substitute a single, otherwise unused character for -ij- before applying the algorithm, perhaps U+00EC, Latin small letter i with grave accent.
How does the algorithm handle multi-codepoint characters?
Well, the D-L distance itself isn't going to handle it for you, due to the way it measures distances.
As there is no code (or language) involved here, I can only leave you with a suggestion to ensure all strings adhere to the same structure.
To clarify the situation, since you're asking in general terms:
bear in mind that the D-L distance compares the strings character by character and knows nothing about what they mean. You will therefore have to parse the strings before comparing them, and cases where "ij" should not be exchanged with "y" will cause other issues.
An idea is to translate each string into some sort of constructed orthographemic representation, where digraphs such as "ij" and the English "gh", "th" and friends are only one character long. The distance metric does not have to be equal for all types of replacements when doing Damerau-Levenshtein, so you can use whatever penalties you want, but the table needs to be filled locally, so you really want each sound to be one cell in the table.
This however breaks when the "ij" was not intended as "ij" but a misspelling or at a word-segmentation border (I don't know if that can happen in Dutch), or in any other situation it is not actually (meant as) a digraph.
Otherwise you will need to do some lookaround, this will complicate things but should not change the growth order of the algorithm (I believe), provided you only look at constant number of cells around. The constant factors will still be much bigger though.
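The digraph-collapsing idea above can be sketched as follows. This is only an illustration under stated assumptions: the placeholder kIJ and the names collapse_ij, osa_distance and dutch_distance are made up, the input is treated as single-byte lowercase text, every "ij" is assumed to be a real digraph, and the distance used is the restricted (optimal string alignment) variant of Damerau-Levenshtein with unit costs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical placeholder standing for the digraph "ij"; any character
// that cannot occur in the input would do.
const char kIJ = '\x01';

// Replaces every "ij" with the single placeholder character.
std::string collapse_ij(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == 'i' && i + 1 < s.size() && s[i + 1] == 'j') {
            out += kIJ;
            ++i;  // skip the 'j'
        } else {
            out += s[i];
        }
    }
    return out;
}

// Restricted Damerau-Levenshtein (optimal string alignment) distance.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,           // deletion
                                d[i][j - 1] + 1,           // insertion
                                d[i - 1][j - 1] + cost});  // substitution
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);  // transposition
            }
        }
    }
    return d[n][m];
}

// Distance after collapsing "ij", so that "ij" versus "y" is one substitution.
std::size_t dutch_distance(const std::string& a, const std::string& b) {
    return osa_distance(collapse_ij(a), collapse_ij(b));
}
```

With this preprocessing, "ij" against "y" costs one substitution instead of two edits. A lower or higher penalty for that particular pair could be added by replacing the constant substitution cost with a lookup.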
tribool strikes me as one of the oddest corners of Boost. I see how it has some conveniences compared to using an enum, but an enum can also easily be expanded to represent more than 3 states.
In what real world ways have you put tribool to use?
While I haven't used C++, and hence boost, I have used three-state variables quite extensively in a network application where I need to store state as true/false/pending.
An extra state in any value type can be extremely valuable. It avoids the use of "magic numbers" or extra flags to determine if the value of a variable is "maybe" or "unknown".
Instead of true or false, the state of a tribool is true, false, or indeterminate.
Let's say you have a database that contains a list of customers and their dateOfBirth. So you write a function along the lines of:
tribool IsCustomerAdult(customerName);
The function returns:
`true` if the customer is 18 or older;
`false` if the customer is less than 18;
`indeterminate` if the customer is not in the database
(or the dateOfBirth value is not present).
Very useful.
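A minimal sketch of that function, written without Boost so that it stands alone. The Tri enum is a hand-rolled stand-in for tribool, and CustomerDb (a map from name to an optional age rather than a date of birth) is a made-up simplification of the database; std::optional requires C++17.

```cpp
#include <map>
#include <optional>
#include <string>

// Hand-rolled stand-in for boost::logic::tribool (an assumption, not Boost's type).
enum class Tri { False, True, Indeterminate };

// Hypothetical in-memory "database": name -> age, where a missing age is
// modelled as std::nullopt.
using CustomerDb = std::map<std::string, std::optional<int>>;

Tri is_customer_adult(const CustomerDb& db, const std::string& name) {
    const auto it = db.find(name);
    if (it == db.end() || !it->second.has_value()) {
        return Tri::Indeterminate;  // unknown customer or unknown age
    }
    return *it->second >= 18 ? Tri::True : Tri::False;
}
```

The caller is forced to handle the third state explicitly instead of guessing what a plain false means.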
I think the extra benefit is not only the 3rd value, but also that you can easily use the 3-valued logic!
For example:
(true && indeterminate) == indeterminate
(true || indeterminate) == true
SQL implements such logic.
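To make the two identities above concrete, here is a small hand-written sketch of the three-valued AND, OR and NOT. The Tri enum and the tri_and, tri_or and tri_not functions are illustrative names, not Boost's implementation; with Boost itself you would use the overloaded && and || operators on tribool.

```cpp
// Hand-rolled three-state value (an assumption, not Boost's tribool).
enum class Tri { False, True, Indeterminate };

// Three-valued AND: false dominates; otherwise any unknown operand
// makes the result unknown.
Tri tri_and(Tri a, Tri b) {
    if (a == Tri::False || b == Tri::False) return Tri::False;
    if (a == Tri::Indeterminate || b == Tri::Indeterminate) return Tri::Indeterminate;
    return Tri::True;
}

// Three-valued OR: true dominates; otherwise any unknown operand
// makes the result unknown.
Tri tri_or(Tri a, Tri b) {
    if (a == Tri::True || b == Tri::True) return Tri::True;
    if (a == Tri::Indeterminate || b == Tri::Indeterminate) return Tri::Indeterminate;
    return Tri::False;
}

// Three-valued NOT: unknown stays unknown.
Tri tri_not(Tri a) {
    if (a == Tri::Indeterminate) return Tri::Indeterminate;
    return a == Tri::True ? Tri::False : Tri::True;
}
```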
I've seen numerous examples of two booleans being used to represent three possible states, explicitly or otherwise, with the fourth state being silently assumed to be impossible. In at least two cases, I've changed such constructions to use tribool since we started using boost.
I am a big fan of the Boost library and started using it at a company that I have since left. After getting exposure to Boost and using it extensively throughout our project, I stumbled on tribool and was considering using it for some "Fuzzy Logic" algorithms that needed improvement.
I left before I had a chance to get into it. Beyond the "Fuzzy Logic" example, other modules in the system had components with this sort of in-between state, so thinking about it now, I would probably have ended up using tribool in a decent amount of code if I were still with the company.
-bn
I think it is very useful for language modelling, such as OCR applications and speech synthesis, because, as you know, human languages are ambiguous and have a lot of intermediate states.
I am looking forward to improving the current technologies using tribool.
I am considering the problem of validating real numbers of various formats, because this is very similar to a problem I am facing in design.
Real numbers may come in different combinations of formats, for example:
1. with/without sign at the front
2. with/without a decimal point (if no decimal point, then perhaps number of decimals can be agreed beforehand)
3. base 10 or base 16
We need to allow for each combination, so there are 2x2x2=8 combinations. You can see that the complexity increases exponentially with each new condition imposed.
In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function. However, with each new condition, you have to double the number of classes required and it soon becomes a nightmare.
In procedural programming, you use 3 flags (i.e. has_sign, has_decimal_point and number_base) to identify the property of the real number you are validating. You have a single function for validation. In there, you would use the flags to control its behaviour.
// This is part of the validation function
if (has_sign)
    check_sign();
for (int i = 0; i < len; i++)
{
    if (has_decimal_point)
    {
        // Check if number[i] is '.' and do something if it is. If not, continue
    }
    if (number_base == BASE10)   // comparison, not assignment
    {
        // number[i] must be between 0-9
    }
    else if (number_base == BASE16)
    {
        // number[i] must be between 0-9, A-F
    }
}
Again, the complexity soon gets out of hand as the function becomes cluttered with if statements and flags.
I am sure that you have come across design problems of this nature before: a number of independent differences which result in differences in behaviour. I would be very interested to hear how you have been able to implement a solution without making the code completely unmaintainable.
Would something like the bridge pattern have helped?
In OO design, you would normally
allocate a class for each number
format (e.g. in this case, we have 8
classes), and each class would have a
separate validation function.
No no no no no. At most, you'd have a type for representing Numeric Input (in case String doesn't make it); another one for Real Number (in most languages you'd pick a built-in type, but anyway); and a Parser class, which has the knowledge to take a Numeric Input and transform it into a Real Number.
To be more general, one difference of behaviour in and by itself doesn't automatically map to one class. It can just be a property inside a class. Most importantly, behaviours should be treated orthogonally.
If (imagining that you write your own parser) you may have a sign or not, a decimal point or not, and hex or not, you have three independent sources of complexity and it would be ok to find three pieces of code, somewhere, that treat one of these issues each; but it would not be ok to find, anywhere, 2^3 = 8 different pieces of code that treat the different combinations in an explicit way.
Imagine that you add a new choice: suddenly, you remember that numbers might have an "e" (such as 2.34e10) and want to be able to support that. With the orthogonal strategy, you'll have one more independent source of complexity, the fourth one. With your strategy, the 8 cases would suddenly become 16! Clearly a no-no.
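As a rough sketch of the orthogonal strategy, each format choice can be one field in an options struct, handled in exactly one place in a single validator. The NumberFormat struct and the function names below are made up for illustration, and the grammar is deliberately simplistic (no exponent, no "0x" prefix).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical option set: one field per independent format choice.
struct NumberFormat {
    bool allow_sign = true;
    bool allow_decimal_point = true;
    int base = 10;  // 10 or 16
};

// Returns true if c is a valid digit in the given base (10 or 16).
bool is_digit_in_base(char c, int base) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return base == 16 ? std::isxdigit(uc) != 0 : std::isdigit(uc) != 0;
}

bool is_valid_number(const std::string& text, const NumberFormat& fmt) {
    std::size_t i = 0;
    // Concern 1: an optional leading sign.
    if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
        if (!fmt.allow_sign) return false;
        ++i;
    }
    // Concerns 2 and 3: at most one decimal point, digits in the chosen base.
    bool seen_point = false;
    std::size_t digits = 0;
    for (; i < text.size(); ++i) {
        if (text[i] == '.') {
            if (!fmt.allow_decimal_point || seen_point) return false;
            seen_point = true;
        } else if (is_digit_in_base(text[i], fmt.base)) {
            ++digits;
        } else {
            return false;
        }
    }
    return digits > 0;  // a sign or a point alone is not a number
}
```

Supporting an exponent would mean one more field and one more branch, not a doubling of the code.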
I don't know why you think that the OO solution would involve a class for each number pattern. My OO solution would be to use a regular expression class. And if I was being procedural, I would probably use the standard library strtod() function.
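For the procedural route, a minimal sketch of validating with strtod might look like this. The wrapper name parses_as_real is made up; strtod itself already accepts an optional sign, a decimal point, an exponent and (since C99) hexadecimal input with a "0x" prefix. Be aware that it also skips leading whitespace and accepts "inf" and "nan", which may or may not be what you want.

```cpp
#include <cstdlib>
#include <string>

// Returns true if the whole string is consumed by strtod.
// On success the parsed value is stored in *value when value is non-null.
bool parses_as_real(const std::string& text, double* value = nullptr) {
    if (text.empty()) return false;
    char* end = nullptr;
    const double parsed = std::strtod(text.c_str(), &end);
    // If end does not point at the terminator, there was trailing junk
    // or no conversion happened at all.
    if (end != text.c_str() + text.size()) return false;
    if (value) *value = parsed;
    return true;
}
```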
You're asking for a parser, use one:
http://www.pcre.org/
http://www.complang.org/ragel/
sscanf
boost::lexical_cast
and plenty of other alternatives...
Also: http://en.wikipedia.org/wiki/Parser_generator
Now, how do I handle complexity for this kind of problem? Well, if I can, I reformulate.
In your case, using a parser generator (or regular expression) means using a DSL (Domain Specific Language), that is, a language more suited to the problem you're dealing with.
Design patterns and OOP are useful, but definitely not the best solution to each and every problem.
Sorry, but since I use VB, what I do is write a base function and then combine it with an evaluator function.
So I'll fake-code it out the way I have done it:
function getrealnumber(number as int){ return getrealnumber(number.tostring) }
function getrealnumber(number as float){ return getrealnumber(number.tostring) }
function getrealnumber(number as double){ return getrealnumber(number.tostring) }
function getrealnumber(number as string){
if ishex(){ return evaluation()}
if issigned(){ return evaluation()}
if isdecimal(){ return evaluation()}
}
And so forth; it's up to you to figure out how to do binary and octal.
You don't kill a fly with a hammer.
I really feel that using an object-oriented solution for your problem is EXTREME overkill. Just because you can design an object-oriented solution doesn't mean you have to force one onto every problem you have.
From my experience, almost every time there is difficulty in finding an OOD solution to a problem, it probably means that OOD is not appropriate. OOD is just a tool; it's not God itself. It should be used to solve large-scale problems, not problems such as the one you presented.
So to give you an actual answer (as someone mentioned above): use a regular expression. Every solution beyond that is just overkill.
If you insist on using an OOD solution... Well, since all the formats you presented are orthogonal to each other, I don't see any need to create a class for every possible combination. I would create a class for each format and pass my input through each; in that case the complexity will grow linearly.