train_test_split() function behavior - data-mining

How can the values in X become this in X_train after train_test_split() function? How can I avoid that?

Related

Calling function with variable that is being initialized [duplicate]

I am learning C++ using the books listed here. In particular, I read that using an uninitialized local variable of built-in type is undefined behavior. I then came across the example below, where the user seems to be using the uninitialized variables x and y:
#include <string>
template<typename T>
T num_convert(const char* s, const T) {
    return static_cast<T>(std::stoll(s));
}
int main() {
    //------------------------v---------------> is this UB because x is uninitialized?
    int x = num_convert("55", x);
    //---------------------------------v------> is this UB because y is uninitialized?
    long y = num_convert("5555555555", y);
}
In the example above, I've used arrows to highlight the points where I have doubts. In particular, are the two highlighted statements UB because x and y are not initialized and the user is passing them to the function by value?
Are the two highlighted statements UB because x and y are not initialized and the user is passing them to the function by value?
Yes, your understanding is correct. The variables x and y are uninitialized local variables, and they're being passed by value when calling the function. Thus, the program has undefined behavior: x and y have indeterminate values, and those values are copied into the function's parameter.
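One way to avoid the problem, sketched here on the assumption that the second parameter exists only to select T, is to drop that parameter and name the type explicitly at the call site. This is an illustrative variant, not the original author's code; convert_small and convert_large are hypothetical helpers that only show the call syntax, and long long is used instead of long so the larger value fits on every platform.

```cpp
#include <string>

// Variant of num_convert without the dummy parameter: T is named
// explicitly, so no uninitialized variable is ever read.
template <typename T>
T num_convert(const char* s) {
    return static_cast<T>(std::stoll(s));
}

// Hypothetical helpers showing how a caller would spell the type.
int convert_small() { return num_convert<int>("55"); }
long long convert_large() { return num_convert<long long>("5555555555"); }
```

With this shape, int x = num_convert<int>("55"); initializes x without reading it first.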

Using Bit-wise operators with an Enum and an unsigned char - Results in 0 [duplicate]

I read that in C arguments are passed by value, but what's the difference between passing arguments by value (like in C) or by reference (like C++ - C#)?
What's the difference between a pointer and a reference?
void with_ptr(int *i)
{ *i = 0; }
void with_ref(int &i)
{ i = 0; }
Are both values modified in these cases? If yes, why does C++ allow passing arguments by reference? I think it is not clear inside the function that the i value could be modified.
What's the difference between passing arguments by value or by reference?
If you pass by value, changes to the variable will be local to the function, since the value is copied when calling the function. Modifications to reference arguments will propagate to the original value.
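A minimal sketch of that difference. The function names here are made up for illustration; after_by_value and after_by_ref simply report what the caller observes after each call.

```cpp
// Receives a copy: the assignment only affects the local parameter.
void set_by_value(int i) { i = 0; }

// Receives an alias: the assignment writes to the caller's variable.
void set_by_ref(int& i) { i = 0; }

// Hypothetical helpers returning the caller's value after each call.
int after_by_value() {
    int n = 42;
    set_by_value(n);
    return n;  // unchanged
}

int after_by_ref() {
    int n = 42;
    set_by_ref(n);
    return n;  // overwritten by the callee
}
```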
What's the difference between a pointer and a reference?
The difference is largely syntactic, as you have seen in your code. Furthermore, a pointer can be reassigned to point to something else (unless it’s declared const), while a reference can’t; instead, assigning to a reference is going to assign to the referenced value.
I think it is not clear inside the function that the i value could be modified.
On the contrary, it’s absolutely clear: the function signature tells you so.
There’s actually a case to be made that it’s not clear outside the function. That’s why C#, for instance, mandates that you explicitly annotate any by-reference call with ref (i.e. f(ref x) instead of plain f(x)). This is similar to calling a function in C++ using f(&x) to make it clear that a pointer is passed.
C# 4.0 relaxed this only for COM interop calls, where ref may be omitted at the call site; ordinary calls still require it.
Consider this:
1) Passing by reference provides simpler element access: i instead of *i
2) Generally you cannot pass a null reference to a method, but you can pass a null pointer
3) You can't change what a reference refers to, but you can change it for a pointer (although, as the pointer itself is passed by value, this change will be discarded upon function exit)
Hope this helped a bit
Actually, in the first case, you can't modify the caller's argument. The pointer itself is passed by value, so you can only modify the value it points to.
If yes, why does C++ allow passing arguments by reference?
Because pointers can very easily be misused. References should almost always be preferred. For your case, what if you pass NULL to with_ptr? You'll get undefined behavior, which is not possible if you use with_ref.
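To make the null case concrete, here is a sketch of what a defensive pointer version might look like next to the reference version. with_ptr_checked is a hypothetical name and its bool return value is an assumption added for illustration.

```cpp
// Pointer version: a null argument is possible, so it must be checked
// before dereferencing. Returns false when nothing was written.
bool with_ptr_checked(int* i) {
    if (i == nullptr) {
        return false;
    }
    *i = 0;
    return true;
}

// Reference version: the parameter always names an existing int,
// so no check is needed inside the function.
void with_ref(int& i) { i = 0; }
```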
I think it is not clear inside the function that the i value could be modified.
It is very clear. If you see a function that takes a parameter by non-const reference, you can assume it will be changed.
I think that a method can only change an argument's value if it is passed by reference. If you pass an argument by value to a method, then whatever change you make to its value will not be visible in the calling method.
As far as I know, a reference is safer to use in the sense that it can't be reseated (it always refers to the same thing), and it must be initialized when it is declared. A pointer, however, can be changed to point somewhere else.
int x = 10;
int &y = x;
int *p = &x;
p++; //Legal if you know what's next
y++; // Increases the value of x. Now x = y = 11;
My two cents: I think reference variables are merely alternative names for the same memory location they were initialized with. This also explains it pretty nicely:
http://www.dgp.toronto.edu/~patrick/csc418/wi2004/notes/PointersVsRef.pdf

C++ functions and scope of variables

#include<iostream>
using namespace std;
void fun(int a) {
    int x;
    cout << x << endl;
    x = a;
}
int main() {
    fun(12);
    fun(1);
    return 0;
}
The output of this code is as follows:
178293 //garbage value
12
Why are we getting 12 and not a garbage value instead?
Why are we getting 12 and not a garbage value instead?
In theory, the value of x could be anything. In practice, because the two calls to fun happen one right after the other, the value the first call left in x is likely still sitting in the reused stack frame.
Let's say the stack frame is structured as below:
arguments
return value
local variables
In your case,
Memory used for arguments is equal to sizeof(int).
The compiler may omit using any memory for the return value since the return type is void.
Memory used for local variables is equal to sizeof(int).
When the function call is made the first time, the value in the arguments part is set to 12. The value in the local variables is garbage, as you noticed. However, before you return from the function, you set the value of the local variable to 12.
The second time the function is called, the value of the argument is set to 1. The value in the local variable is still left over from the previous call. Hence, it is still 12. If you call the function a third time, you are likely to see the value 1 in the local variable.
Anyway, that's one plausible explanation. Once again, remember that this is undefined behavior. Don't count on any specific behavior. The compiler could decide to scrub the stack frame before using it. The compiler could decide to scrub the stack frame immediately after it is used. The compiler could do whatever it wants to do with the stack frame after it is used. If there is another call between the calls to fun, you will most likely get completely different values.
You have not initialized x before printing it. Reading from uninitialized memory is UB, i.e., there are no guarantees as to what will happen. It could output a random number, or hit an unlikely combination of bits that does something unexpected.
Reading an uninitialized integer is UNDEFINED BEHAVIOR, which means the program can do literally anything and print anything. It can format your hard drive or collapse the observable universe! I don't know of any compiler implementations that do so, but theoretically they can!
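For comparison, a sketch of the same function with x given a defined starting value, which makes the read well-defined. fun_initialized is a hypothetical name, and the int return value is an assumption added only so the printed value can be checked.

```cpp
#include <iostream>

// Same shape as fun(), but x is initialized before it is read.
// Returns the value that was printed.
int fun_initialized(int a) {
    int x = 0;              // defined value: no indeterminate read
    int printed = x;
    std::cout << printed << '\n';
    x = a;                  // mirrors the original; discarded on return
    (void)x;                // silence unused-value warnings
    return printed;
}
```

Every call now prints the initializer, regardless of what earlier calls left on the stack.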

General question: What to pass as pointer in C/C++?

Hey there,
I wonder if it's worth passing primitive single values like int, float, double or char by pointer? Probably it's not worth!? But if you would simply pass everything by pointer, is this making the program slower?
Should you always just pass arrays as pointer?
Thanks!
I wonder if it's worth passing primitive single values like int, float, double or char by pointer?
What are you trying to accomplish? Do you want to be able to write to the passed in value? Or do you just need to use it? If you want to write to it, the idiomatic way is to pass by reference. If you don't need to write to it, you're best avoiding any risk that you'll write to it accidentally and pass by value. Pass by value will make a copy of the variable for local use. (as an aside, if you don't want to make a copy AND want some level of safety, you can pass by const reference)
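The three options mentioned above might look like this. The function names are invented for illustration only.

```cpp
#include <cstddef>
#include <string>

// By value: works on a private copy, the caller's variable is untouched.
int doubled(int n) {
    n *= 2;
    return n;
}

// By reference: writes through to the caller's variable.
void double_in_place(int& n) { n *= 2; }

// By const reference: no copy is made and accidental writes do not compile.
std::size_t length_of(const std::string& s) { return s.size(); }
```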
But if you would simply pass everything by pointer, is this making the program slower?
Difficult to say. It depends on a lot of things. With both pass by value and pass by reference (or pointer) you're creating a new value of a primitive type. In pass by value, you're making a copy. In pass by reference/pointer you're passing the address of the original. In the latter case, however, you're requiring an extra memory fetch that may or may not be cached. It's very difficult to say for certain without measuring it.
That all being said, I doubt the difference is even noticeable. The compiler may be able to optimize out the copy in many pass-by-value cases, as indicated in this article. (thanks Space C0wb0y).
Should you always just pass arrays as pointer?
From this.
In C++ it is not possible to pass a complete block of memory by value as a parameter to a function, but we are allowed to pass its address.
To pass an array:
int foo(int bar[], unsigned int length)
{
    // do stuff with bar but don't go past length
    return 0; // placeholder result so the function does not fall off the end
}
I'd recommend avoiding arrays and using std::vector, which has more easily understood copy semantics.
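A short sketch of those copy semantics, with made-up function names: unlike a raw array parameter, a std::vector passed by value really is copied, and the signature shows which behavior you get.

```cpp
#include <vector>

// By value: the function receives its own copy of all elements.
void clear_copy(std::vector<int> v) { v.clear(); }

// By reference: the caller's vector is modified.
void clear_original(std::vector<int>& v) { v.clear(); }

// By const reference: read-only access without copying.
int sum_of(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}
```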
It's probably not worth passing primitive values by pointer if your concern is speed -- you then have the overhead of the "indirection" to access the value.
However, pointers often are the "width of the bus", meaning the processor can send the whole value at once, and not "shift" values to send-down-the-bus. So, it is possible pointers are transferred on the bus faster than smaller types (like char). That's why the old Cray computers used to make their char values 32 bits (the width of the bus at that time).
When dealing with large objects (such as classes or arrays) passing pointer is faster than copying the whole object onto the stack. This applies to OOP for example
Look in your favorite C++ textbook for a discussion of "output parameters".
Some advantages of using a pointer for output parameters instead of a reference are:
No surprising behavior, no action at a distance, the semantics are clear at the call site as well as in the callee.
Compatibility with C (which your question title suggests is important)
Usable by other languages, functions exported from a shared library or DLL should not use C++-only features such as references.
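As an illustration of a pointer-style output parameter, here is a sketch of a hypothetical parse_long function (the name and its bool-plus-pointer signature are assumptions, not a standard API). The & at the call site makes it visible that the argument may be written to.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a decimal integer from text.
// Returns true on success and writes the result through out;
// on failure, *out is left untouched.
bool parse_long(const char* text, long* out) {
    if (text == nullptr || out == nullptr) {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    // Reject empty input, trailing junk, and out-of-range values.
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *out = value;
    return true;
}
```

A call such as parse_long("123", &result) reads clearly as "result may be modified here".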
You should rarely have to pass anything by pointer. If you need to modify the value of the parameter, or want to prevent a copy, pass by reference, otherwise pass by value.
Note that preventing a copy can also be done by copy-elision, so you have to be very careful not to fall into the trap of premature optimization. This can actually make your code slower.
There is no real answer to your question, except a few rules that I tend to bear in mind:
A char is 1 byte and a pointer is typically 4 bytes (on a 32-bit platform), so never pass a single char by pointer.
Types like int and float are often the same size as a pointer, but a pointer has to be dereferenced, so that technically takes more time.
if we go to the pentium i386 assembler:
loading the value in a register of a parameter "a" in C which is an int:
movl 8(%ebp),%eax
the same thing but passed as a pointer:
movl 8(%ebp),%eax
movl (%eax),%eax
Having to dereference the pointer takes another memory operation, so theoretically (I'm not sure it matters in real life) passing pointers takes longer...
Then there's the memory issue. If you want to code efficiently, every composite type (class, structure, array...) has to be passed by pointer.
Just imagine a recursive function that takes a 16-byte type by copy: 1000 calls put 16000 bytes on the stack (you don't really want that, do you? :) )
So, to make it short and clear: look at the size of your type; if it's bigger than a pointer, pass it by pointer, else pass it by copy...
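That rule of thumb can be written down as a compile-time check. This is only a sketch: fits_in_pointer and Big are hypothetical names, and the size comparison is the heuristic from this answer rather than a language rule.

```cpp
// Hypothetical helper: true when T is no larger than a pointer,
// i.e. when the rule of thumb above says "pass it by copy".
template <typename T>
constexpr bool fits_in_pointer() {
    return sizeof(T) <= sizeof(void*);
}

// Example of a type that is clearly larger than a pointer.
struct Big {
    char data[64];
};
```

Here fits_in_pointer<char>() is true on any platform, since sizeof(char) is 1 by definition, while fits_in_pointer<Big>() is false wherever pointers are smaller than 64 bytes.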
Pass primitive types by value and objects as const references. Avoid pointers as much as you can. Dereferencing pointers has some overhead and clutters code. Compare the two versions of the factorial function below:
// Which version of factorial is shorter and easier to use?
int factorial_1 (int* number)
{
    if ((*number) <= 1)
        return 1;
    int tmp = (*number) - 1;
    return (*number) * factorial_1 (&tmp);
}
// Usage:
int r = 10;
factorial_1 (&r); // => 3628800
int factorial_2 (int number)
{
    return (number <= 1) ? 1 : (number * factorial_2 (number - 1));
}
// Usage:
// No need for the temporary variable to hold the argument.
factorial_2 (10); // => 3628800
Debugging becomes hard, as you cannot say when and where the value of an object could change:
int a = 10;
// f could modify a, so you cannot guarantee to g that a is still 10.
f (&a);
g (&a);
Prefer the vector class over arrays. It can grow and shrink as needed and keeps track of its size. The way vector elements are accessed is compatible with arrays:
int add_all (const std::vector<int>& vec)
{
    size_t sz = vec.size ();
    int sum = 0;
    for (size_t i = 0; i < sz; ++i)
        sum += vec[i];
    return sum; // the original omitted the return statement
}
NO, the only time you'd pass a non-const reference is if the function requires an output parameter.