Clustering initialization - weka

How do you set clustering initialization method?
I found that besides random initialization you can select from a couple of more methods, such as k-means++ and farthest first.
I found that you can use the following method for that:
clusterer.setInitializationMethod(new SelectedTag);
Now, I'm really confused by this SelectedTag. What does it represent and how to use it? More specifically, how to specify k-means++ or farthest first as initialization methods?
Thanks

I found the solution, here is what needs to be done:
clusterer.setInitializationMethod(new SelectedTag(SimpleKMeans.KMEANS_PLUS_PLUS, SimpleKMeans.TAGS_SELECTION));
If you look at SimpleKMeans you will see that it has the following static members:
static int CANOPY
static int FARTHEST_FIRST
static int KMEANS_PLUS_PLUS
static int RANDOM
static Tag[] TAGS_SELECTION
And this is how you use them. You can pass any distance identifier you need.
Cheers!

Related

Static variable from a static linked lib used before its construction [duplicate]

When I use static variables in C++, I often end up wanting to initialize one variable passing another to its constructor. In other words, I want to create static instances that depend on each other.
Within a single .cpp or .h file this is not a problem: the instances will be created in the order they are declared. However, when you want to initialize a static instance with an instance in another compilation unit, the order seems impossible to specify. The result is that, depending on the weather, it can happen that the instance that depends on another is constructed, and only afterwards the other instance is constructed. The result is that the first instance is initialized incorrectly.
Does anyone know how to ensure that static objects are created in the correct order? I have searched a long time for a solution, trying all of them (including the Schwarz Counter solution), but I begin to doubt there is one that really works.
One possibility is the trick with the static function member:
Type& globalObject()
{
static Type theOneAndOnlyInstance;
return theOneAndOnlyInstance;
}
Indeed, this does work. Regrettably, you have to write globalObject().MemberFunction(), instead of globalObject.MemberFunction(), resulting in somewhat confusing and inelegant client code.
Update: Thank you for your reactions. Regrettably, it indeed seems like I have answered my own question. I guess I'll have to learn to live with it...
You have answered your own question. Static initialization order is undefined, and the most elegant way around it (while still doing static initialization i.e. not refactoring it away completely) is to wrap the initialization in a function.
Read the C++ FAQ items starting from https://isocpp.org/wiki/faq/ctors#static-init-order
Maybe you should reconsider whether you need so many global static variables. While they can sometimes be useful, often it's much simpler to refactor them to a smaller local scope, especially if you find that some static variables depend on others.
But you're right, there's no way to ensure a particular order of initialization, and so if your heart is set on it, keeping the initialization in a function, like you mentioned, is probably the simplest way.
Most compilers (linkers) actually do support a (non-portable) way of specifying the order. For example, with visual studio you can use the init_seg pragma to arrange the initialization into several different groups. AFAIK there is no way to guarantee order WITHIN each group. Since this is non-portable you may want to consider if you can fix your design to not require it, but the option is out there.
Indeed, this does work. Regrettably, you have to write globalObject().MemberFunction(), instead of globalObject.MemberFunction(), resulting in somewhat confusing and inelegant client code.
But the most important thing is that it works, and that it is failure proof, ie. it is not easy to bypass the correct usage.
Program correctness should be your first priority. Also, IMHO, the () above is purely stylistic - ie. completely unimportant.
Depending on your platform, be careful of too much dynamic initialization. There is a relatively small amount of clean up that can take place for dynamic initializers (see here). You can solve this problem using a global object container that contains members different global objects. You therefore have:
Globals & getGlobals ()
{
static Globals cache;
return cache;
}
There is only one call to ~Globals() in order to clean up for all global objects in your program. In order to access a global you still have something like:
getGlobals().configuration.memberFunction ();
If you really wanted you could wrap this in a macro to save a tiny bit of typing using a macro:
#define GLOBAL(X) getGlobals().#X
GLOBAL(object).memberFunction ();
Although, this is just syntactic sugar on your initial solution.
dispite the age of this thread, I would like to propose the solution I've found.
As many have pointed out before of me, C++ doesn't provide any mechanism for static initialization ordering. What I propose is to encapsule each static member inside a static method of the class that in turn initialize the member and provide an access in an object-oriented fashion.
Let me give you an example, supposing we want to define the class named "Math" which, among the other members, contains "PI":
class Math {
public:
static const float Pi() {
static const float s_PI = 3.14f;
return s_PI;
}
}
s_PI will be initialized the first time Pi() method is invoked (in GCC). Be aware: the local objects with static storage have an implementation dependent lifecyle, for further detail check 6.7.4 in 2.
Static keyword, C++ Standard
Wrapping the static in a method will fix the order problem, but it isn't thread safe as others have pointed out but you can do this to also make it thread if that is a concern.
// File scope static pointer is thread safe and is initialized first.
static Type * theOneAndOnlyInstance = 0;
Type& globalObject()
{
if(theOneAndOnlyInstance == 0)
{
// Put mutex lock here for thread safety
theOneAndOnlyInstance = new Type();
}
return *theOneAndOnlyInstance;
}

Borrowing values from a member variable?

I've sat here for 10 minutes trying to think of the title. If anyone can, feel free to change it.
Anyways, I have two classes campus and building as well as two arrays in campus itself. However those arrays are what I have to use to set the value of ID in building anyways.
In campus, I have loadFile and searchList
loadFile reads from a file, and sets values to three member variables in building and the two arrays mentioned earlier.
However I need to use searchList to print out those values. (It's an assignment. If I had it my way I'd probably do it differently cause I'm frankly bad at this).
Here's where it all goes downhill. I'm confused with how to proceed. The first thing I tried to do was call the constructor for building in loadFile which is part of campus using building::building(ID, name, description) which is supposed to set those values in the constructor itself.
Then in searchList, I thought I could go cout << building::getID() << building::getName() << building::getDescription(); (all of these are member functions in building) but obviously through a series of errors I realized this isn't the way to go.
So I tried instead of using the scope resolution operator, I should make a variable of type building somewhere. So I made building a inside loadFile and set a = building(ID, name, description) after defining those values.
Of course, this way I can't access a.getID in searchList. So I tried making it a global variable but obviously it wouldn't be that simple.
If you need more context, here's the assignment details.
Here's my class header
int const maxPerBuildingType = 7;
class campus
{
friend class building;
//friend building::building(int, std::string, std::string);
private:
//int academic[maxPerBuildingType]; removed
//int recreation[maxPerBuildingType]; removed
building recreation[maxPerBuildingType];
building academic[maxPerBuildingType];
int numberOfAcademic;
int numberOfRecreation;
std::string temp;
protected:
public:
campus();
static int const maxPerBuildingType;
void loadFile();
void searchList();
};
I don't know if you need the definitions since it's wrong anyways but it's here (updated for edit) if you need it. I'll edit it into this post if necessary.
Anyways guys, thanks.
EDIT: Of course, the answer was because I was stupid. Arrays had to be of type building so i could allocate the required number of buildings. I haven't completed the assignment but for now I believe everything's fine. That'll probably change but eh. Thank you guys!
AT the moment you are not storing your buildings anywhere. You put them into a local variable called a then throw them away.
I suspect the intention of the exercise is to store the buildings into the arrays academic and recreation. You currently have them as int arrays, but possibly they should be arrays of the specific building classes (I assume the exercise also wants you to have different building classes for the types of buildings, probably as a subclass of Building).
Once your campus has its own lists of buildings, at lot of your questions will answer themselves naturally.

A case where named constants are not needed over magic numbers

Obviously the point of using named constants over magic numbers is for code clarity and for not having to go through code changing numbers throughout.
However, what do you do if you just have a number used just once in a function? Say you have a short member function that uses an object's velocity (which we'll say won't change) to calculate its motion, but this is the only function that uses that velocity. Would you...
A) Give the class a named static constant to use
B) Put a named constant in the function
C) Use the magic number but comment it
D) Other...
I am kind of leaning towards using a magic number and commenting it if the number is ONLY BEING USED ONCE, but I'd like to hear others' thoughts.
Edit: Does putting a named constant in a function called many times and assigning to it have performance implications? If it does I guess the best approach would be to put the constant in a namespace or make it a class variable, etc.
Just move it up:
void do_something(void)
{
const float InitialVelocity = 5.0f;
something = InitialVelocity;
// etc.
}
Say you have a short member function
that uses an object's velocity
You said it, the constant has a name:
const type object_velocity = ....;
Magic numbers are my enemies :)
I'd use a function-local named constant, at a minimum. Usually I'd use an anonymous namespace named constant to make the value available throughout the source file, assuming that it might be useful later to other functions.
Use Eclipses refactoring functions to move the constant into a named variable of the method.
Use it as a constant inside the function:
const int x = myMagicNumber; //Now document the magic.

C++ static initialization order

When I use static variables in C++, I often end up wanting to initialize one variable passing another to its constructor. In other words, I want to create static instances that depend on each other.
Within a single .cpp or .h file this is not a problem: the instances will be created in the order they are declared. However, when you want to initialize a static instance with an instance in another compilation unit, the order seems impossible to specify. The result is that, depending on the weather, it can happen that the instance that depends on another is constructed, and only afterwards the other instance is constructed. The result is that the first instance is initialized incorrectly.
Does anyone know how to ensure that static objects are created in the correct order? I have searched a long time for a solution, trying all of them (including the Schwarz Counter solution), but I begin to doubt there is one that really works.
One possibility is the trick with the static function member:
Type& globalObject()
{
static Type theOneAndOnlyInstance;
return theOneAndOnlyInstance;
}
Indeed, this does work. Regrettably, you have to write globalObject().MemberFunction(), instead of globalObject.MemberFunction(), resulting in somewhat confusing and inelegant client code.
Update: Thank you for your reactions. Regrettably, it indeed seems like I have answered my own question. I guess I'll have to learn to live with it...
You have answered your own question. Static initialization order is undefined, and the most elegant way around it (while still doing static initialization i.e. not refactoring it away completely) is to wrap the initialization in a function.
Read the C++ FAQ items starting from https://isocpp.org/wiki/faq/ctors#static-init-order
Maybe you should reconsider whether you need so many global static variables. While they can sometimes be useful, often it's much simpler to refactor them to a smaller local scope, especially if you find that some static variables depend on others.
But you're right, there's no way to ensure a particular order of initialization, and so if your heart is set on it, keeping the initialization in a function, like you mentioned, is probably the simplest way.
Most compilers (linkers) actually do support a (non-portable) way of specifying the order. For example, with visual studio you can use the init_seg pragma to arrange the initialization into several different groups. AFAIK there is no way to guarantee order WITHIN each group. Since this is non-portable you may want to consider if you can fix your design to not require it, but the option is out there.
Indeed, this does work. Regrettably, you have to write globalObject().MemberFunction(), instead of globalObject.MemberFunction(), resulting in somewhat confusing and inelegant client code.
But the most important thing is that it works, and that it is failure proof, ie. it is not easy to bypass the correct usage.
Program correctness should be your first priority. Also, IMHO, the () above is purely stylistic - ie. completely unimportant.
Depending on your platform, be careful of too much dynamic initialization. There is a relatively small amount of clean up that can take place for dynamic initializers (see here). You can solve this problem using a global object container that contains members different global objects. You therefore have:
Globals & getGlobals ()
{
static Globals cache;
return cache;
}
There is only one call to ~Globals() in order to clean up for all global objects in your program. In order to access a global you still have something like:
getGlobals().configuration.memberFunction ();
If you really wanted you could wrap this in a macro to save a tiny bit of typing using a macro:
#define GLOBAL(X) getGlobals().#X
GLOBAL(object).memberFunction ();
Although, this is just syntactic sugar on your initial solution.
dispite the age of this thread, I would like to propose the solution I've found.
As many have pointed out before of me, C++ doesn't provide any mechanism for static initialization ordering. What I propose is to encapsule each static member inside a static method of the class that in turn initialize the member and provide an access in an object-oriented fashion.
Let me give you an example, supposing we want to define the class named "Math" which, among the other members, contains "PI":
class Math {
public:
static const float Pi() {
static const float s_PI = 3.14f;
return s_PI;
}
}
s_PI will be initialized the first time Pi() method is invoked (in GCC). Be aware: the local objects with static storage have an implementation dependent lifecyle, for further detail check 6.7.4 in 2.
Static keyword, C++ Standard
Wrapping the static in a method will fix the order problem, but it isn't thread safe as others have pointed out but you can do this to also make it thread if that is a concern.
// File scope static pointer is thread safe and is initialized first.
static Type * theOneAndOnlyInstance = 0;
Type& globalObject()
{
if(theOneAndOnlyInstance == 0)
{
// Put mutex lock here for thread safety
theOneAndOnlyInstance = new Type();
}
return *theOneAndOnlyInstance;
}

Best way to take a snapshot of an object to a file

What's the best way to output the public contents of an object to a human-readable file? I'm looking for a way to do this that would not require me to know of all the members of the class, but rather use the compiler to tell me what members exist, and what their names are. There have to be macros or something like that, right?
Contrived example:
class Container
{
public:
Container::Container() {/*initialize members*/};
int stuff;
int otherStuff;
};
Container myCollection;
I would like to be able to do something to see output along the lines of "myCollection: stuff = value, otherStuff = value".
But then if another member is added to Container,
class Container
{
public:
Container::Container() {/*initialize members*/};
int stuff;
string evenMoreStuff;
int otherStuff;
};
Container myCollection;
This time, the output of this snapshot would be "myCollection: stuff = value, evenMoreStuff=value, otherStuff = value"
Is there a macro that would help me accomplish this? Is this even possible? (Also, I can't modify the Container class.)
Another note: I'm most interested about a potential macros in VS, but other solutions are welcome too.
What you're looking for is "[reflection](http://en.wikipedia.org/wiki/Reflection_(computer_science)#C.2B.2B)".
I found two promising links with a Google search for "C++ reflection":
http://www.garret.ru/cppreflection/docs/reflect.html
http://seal-reflex.web.cern.ch/seal-reflex/index.html
Boost has a serialization library that can serialize into text files. You will, however, not be able to get around with now knowing what members the class contains. You would need reflection, which C++ does not have.
Take a look at this library .
What you need is object serialization or object marshalling. A recurrent thema in stackoverflow.
I'd highly recommend taking a look at Google's Protocol Buffers.
There's unfortunately no macro that can do this for you. What you're looking for is a reflective type library. These can vary from fairly simple to home-rolled monstrosities that have no place in a work environment.
There's no real simple way of doing this, and though you may be tempted to simply dump the memory at an address like so:
char *buffer = new char[sizeof(Container)];
memcpy(buffer, containerInstance, sizeof(Container));
I'd really suggest against it unless all you have are simple types.
If you want something really simple but not complete, I'd suggest writing your own
printOn(ostream &) member method.
XDR is one way to do this in a platform independent way.