HDFS block size and replication

I have a file called employee.txt. Its size is 66 MB.
We know that the default HDFS block size is 64 MB.
So there are going to be two blocks for employee.txt, since it is larger than 64 MB:
employee.txt = block A + block B
Block A is 64 MB, so there is no problem there.
Block B holds only 2 MB, so what happens to the remaining 62 MB of block B? Is it kept empty?
I would like to know what happens to that unoccupied space in block B.

HDFS blocks are logical blocks that sit on top of the physical file system. So when your second block holds just 2 MB, it consumes only that much space on the underlying file system; the rest of the disk space is left free.
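To put concrete numbers on that, here is a rough back-of-envelope sketch in plain C (not Hadoop code; it simply assumes the 64 MB default block size from the question):

    #include <stdio.h>

    int main(void)
    {
        /* Numbers from the question above; 64 MB block size is assumed. */
        const long long file_mb  = 66;   /* employee.txt */
        const long long block_mb = 64;

        long long full_blocks = file_mb / block_mb;              /* 1 */
        long long last_block  = file_mb % block_mb;              /* 2 MB */
        long long blocks      = full_blocks + (last_block ? 1 : 0);

        printf("blocks           : %lld\n", blocks);
        printf("last block holds : %lld MB\n", last_block);
        /* Blocks are logical, so the disk usage is the file size itself,
           not blocks x block size. */
        printf("disk consumed    : %lld MB (not %lld MB)\n",
               file_mb, blocks * block_mb);
        return 0;
    }

Block A is full, block B occupies only about 2 MB on the datanode's local disk; HDFS does not reserve the remaining 62 MB.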

Related

Is the maximum size of an item in a type 19 file configurable?

A WRITEBLK command fails when the item reaches 2 GB in size (the item is truncated to 2147483647 bytes).
Using cat I was able to create an item larger than 2 GB in the same directory, but opening it in UV gave a corrupt (negative) value for STATUS<4> (the number of bytes available to read).
uv 11.1.4
64bit Linux on a VM
64BIT_FILES = 1
You can make UniVerse files 32-bit or 64-bit (regardless of the OS), so you can do a FILEINFO call to see whether the file is actually 64-bit (even if the account is 64-bit).
My guess is that there is a file system limitation on the file size. The Rocket UniVerse documentation (page 927) says:
If the device runs out of disk
space, WRITEBLK takes the ELSE clause and returns –4 to the STATUS
function.
Generally only 32-bit systems have a hard 2 GB limit, but maybe there is some kind of 32-bit process running in your 64-bit virtual machine that is producing the same effect. See here for a few leads: https://unix.stackexchange.com/questions/274380/file-size-limit
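As a side illustration (plain C, not UniVerse code, and only an assumption about how the negative STATUS<4> arises): a byte count that no longer fits in a 32-bit signed field wraps around to a negative number, which would explain the corrupt value you saw.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical item size just over 2 GB, like the one created with cat. */
        long long item_bytes = 2147483648LL;              /* 2 GiB */

        printf("largest 32-bit signed value : %d\n", INT32_MAX);

        /* Converting an out-of-range value to a 32-bit signed integer is
           implementation-defined, but on common platforms it wraps around,
           giving exactly the kind of negative byte count described above. */
        int32_t as_int32 = (int32_t)item_bytes;
        printf("2 GiB stored in 32 bits     : %d\n", as_int32);
        return 0;
    }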

How will HDFS divide the file into blocks?

I know that HDFS divides a file into 128 MB blocks.
Now suppose a 1034 MB file is divided into 8 blocks of 128 MB, with 10 MB left over. Will the remaining 10 MB be allocated a whole 128 MB block, or just a 10 MB block?
If it gets a full block, then the remaining 118 MB is wasted. How can this issue be resolved?
I have just started learning Hadoop, so please pardon me if my question is naive.
Thanks
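The answers elsewhere on this page suggest the last block only holds the leftover bytes. A quick worked calculation with the question's numbers (plain C, just arithmetic):

    #include <stdio.h>

    int main(void)
    {
        const long long file_mb  = 1034;
        const long long block_mb = 128;

        long long full_blocks = file_mb / block_mb;    /* 8 blocks of 128 MB */
        long long last_block  = file_mb % block_mb;    /* 10 MB */

        printf("full blocks      : %lld\n", full_blocks);
        printf("last block holds : %lld MB\n", last_block);
        /* Nothing is wasted: the ninth block occupies only ~10 MB on disk,
           because HDFS blocks are logical. */
        printf("wasted space     : 0 MB\n");
        return 0;
    }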

What happens to the HDFS block size in the case of a smaller file size

I have read about this but am still confused.
What happens if a file's size is less than the block size?
If the file size is 1 MB, will it consume 64 MB or only 1 MB?
It will consume only 1 MB. The remaining space can be used to store blocks of other files.
For example, consider an HDFS datanode with a total size of 128 MB and a block size of 64 MB.
Then HDFS can store two 64 MB blocks,
or 128 blocks of 1 MB each,
or any combination of blocks that adds up to the datanode's 128 MB.
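A tiny sketch of that capacity arithmetic (plain C, using the 128 MB / 64 MB numbers from the example above):

    #include <stdio.h>

    int main(void)
    {
        /* Example numbers: 128 MB of datanode space, 64 MB block size. */
        const long long datanode_mb   = 128;
        const long long block_mb      = 64;
        const long long small_file_mb = 1;

        printf("full %lld MB blocks that fit : %lld\n",
               block_mb, datanode_mb / block_mb);
        /* A 1 MB file occupies only 1 MB on disk, so 128 of them fit. */
        printf("%lld MB files that fit        : %lld\n",
               small_file_mb, datanode_mb / small_file_mb);
        return 0;
    }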

In Hadoop HDFS, how many data nodes does a 1 GB file use for storage?

I have a 1 GB file to be stored on the HDFS file system. I have a cluster of 10 data nodes and a namenode. Is there any calculation the NameNode does to choose a particular number of data nodes for storing the file (not counting replicas)? Or is there any parameter we can configure for a file's storage? If so, what is the default number of datanodes Hadoop uses to store the file if it is not specifically configured?
I want to know whether it uses all the datanodes of the cluster or only a specific number of them.
Let's assume the HDFS block size is 64 MB and that free space exists on all the datanodes.
Thanks in advance.
If the configured block size is 64 MB and you have a 1 GB file, the file size is 1024 MB.
So the blocks needed will be 1024 / 64 = 16 blocks, which means your 1 GB file needs 16 blocks.
Now, let's say you have a 10-node cluster; the default replication factor is 3, which means each block of your 1 GB file will be stored on 3 different nodes. So the total number of block replicas for your 1 GB file is 16 * 3 = 48 blocks.
Since each block is 64 MB, the total space your 1 GB file consumes is
64 * 48 = 3072 MB.
Hope that clears your doubt.
In the second (2nd) generation of Hadoop:
If the configured block size is 128 MB and you have a 1 GB file, the file size is 1024 MB.
So the blocks needed will be 1024 / 128 = 8 blocks, which means your 1 GB file needs 8 blocks.
Now, let's say you have a 10-node cluster; the default replication factor is 3, which means each block of your 1 GB file will be stored on 3 different nodes. So the total number of block replicas for your
1 GB file is 8 * 3 = 24 blocks.
Since each block is 128 MB, the total space your 1 GB file consumes is
128 * 24 = 3072 MB.
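Both calculations can be checked with a few lines of arithmetic (plain C, assuming the replication factor of 3 used above):

    #include <stdio.h>

    static void layout(long long file_mb, long long block_mb, long long replication)
    {
        long long blocks         = (file_mb + block_mb - 1) / block_mb;  /* ceiling division */
        long long block_replicas = blocks * replication;
        /* 1024 MB divides evenly by both block sizes, so every block is full
           and the raw usage is simply file size x replication factor. */
        long long raw_mb         = block_replicas * block_mb;

        printf("block size %3lld MB: %2lld blocks, %2lld block replicas, %lld MB raw\n",
               block_mb, blocks, block_replicas, raw_mb);
    }

    int main(void)
    {
        layout(1024, 64, 3);    /* 16 blocks -> 48 replicas -> 3072 MB */
        layout(1024, 128, 3);   /*  8 blocks -> 24 replicas -> 3072 MB */
        return 0;
    }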

Inconsistencies in values from mallinfo and ps

I am trying to track down huge memory growth in a Linux application which runs around 20-25 threads. From one of those threads I dump the memory stats using the library call mallinfo. It shows the total allocated space (uordblks) as 1005025904. However, the top command shows 8 GB total memory and 7 GB resident memory for the process. Can someone explain this inconsistency?
Following is the full stat from mallinfo:
Total non-mmapped bytes (arena): 1005035520
# of free chunks (ordblks): 2
# of free fastbin blocks (smblks): 0
# of mapped regions (hblks): 43
Bytes in mapped regions (hblkhd): 15769600
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 1005025904
Total free space (fordblks): 9616
Topmost releasable block (keepcost): 9584
The reason is that mallinfo gives the stats of the main arena only. To get details for all arenas you have to use the glibc function malloc_stats.
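A minimal sketch of the difference, assuming glibc on Linux (build with gcc -pthread): allocations made from a second thread normally come out of a non-main arena, so mallinfo() misses them while malloc_stats() prints every arena.

    #include <malloc.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNKS     1024
    #define CHUNK_SIZE (64 * 1024)   /* 64 KB, below the default mmap threshold */

    static pthread_barrier_t ready, done;

    static void *worker(void *arg)
    {
        (void)arg;
        /* These allocations normally land in the worker thread's arena,
           which plain mallinfo() does not report. */
        static char *blocks[CHUNKS];
        for (int i = 0; i < CHUNKS; i++) {
            blocks[i] = malloc(CHUNK_SIZE);
            memset(blocks[i], 1, CHUNK_SIZE);
        }
        pthread_barrier_wait(&ready);   /* let main() take its measurements */
        pthread_barrier_wait(&done);    /* keep the memory alive meanwhile */
        for (int i = 0; i < CHUNKS; i++)
            free(blocks[i]);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_barrier_init(&ready, NULL, 2);
        pthread_barrier_init(&done, NULL, 2);
        pthread_create(&t, NULL, worker, NULL);
        pthread_barrier_wait(&ready);

        /* Main arena only: the worker's ~64 MB is missing from this number.
           (Newer glibc also provides mallinfo2() with wider fields.) */
        struct mallinfo mi = mallinfo();
        printf("mallinfo uordblks (main arena only): %d bytes\n", mi.uordblks);

        /* Prints one "Arena N" section per arena to stderr,
           including the worker thread's arena. */
        malloc_stats();

        pthread_barrier_wait(&done);
        pthread_join(t, NULL);
        return 0;
    }

Note also that mallinfo's fields are plain ints, so for a process approaching the 8 GB you are seeing they can wrap; malloc_stats gives a fuller per-arena picture.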