I am trying to read the Extended Boot Record (EBR). As I understand it, the EBR is located in the first sector of the extended partition, at the partition's absolute start offset.
First I read the MBR (Master Boot Record) to get the partition info: the extended partition's start offset, partition number, and total length.
I want to know how to figure out which partition is the first logical drive within that extended partition. Also, when a partition is an extended partition, is there any metadata stored after the MBR as well?
Is there a way I can get this information?
Thanks
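Edit: here is a minimal sketch of what I had in mind, in Python (assuming a raw disk image or device opened read-only, the classic MBR layout, and 512-byte sectors; the helper names are my own):

import struct

SECTOR = 512

def read_entry(sector, i):
    # The four 16-byte partition entries start at offset 446 (0x1BE).
    off = 446 + 16 * i
    ptype = sector[off + 4]
    start_lba, num_sectors = struct.unpack_from("<II", sector, off + 8)
    return ptype, start_lba, num_sectors

def walk_logical_partitions(path):
    with open(path, "rb") as disk:
        mbr = disk.read(SECTOR)
        # Find the extended partition (type 0x05, or 0x0F for LBA) in the MBR.
        ext_base = None
        for i in range(4):
            ptype, start, _ = read_entry(mbr, i)
            if ptype in (0x05, 0x0F):
                ext_base = start
                break
        if ext_base is None:
            return  # no extended partition on this disk
        rel = 0  # EBR position, relative to the start of the extended partition
        while True:
            disk.seek((ext_base + rel) * SECTOR)
            ebr = disk.read(SECTOR)
            # Entry 0 describes the logical drive, relative to this EBR's sector.
            ptype, rel_start, length = read_entry(ebr, 0)
            print("logical drive: type=0x%02x lba=%d sectors=%d"
                  % (ptype, ext_base + rel + rel_start, length))
            # Entry 1 links to the next EBR, relative to the extended partition.
            link_type, link_start, _ = read_entry(ebr, 1)
            if link_type == 0:
                break  # end of the EBR chain
            rel = link_start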
I am looking for a way to process a ~4 GB file that is the result of an Athena query, and I am trying to find out:
Is there some way to split Athena's query result file into small pieces? As I understand it, this is not possible from the Athena side. It also looks like it is not possible to split it with Lambda: the file is too large, and s3.open(input_file, 'r') does not seem to work in Lambda :(
Are there some other AWS services that can solve this issue? I want to split this CSV file into small pieces (about 3-4 MB) to send them to an external source (via POST requests).
You can use CTAS (CREATE TABLE AS SELECT) with Athena and its built-in partitioning capabilities.
A common way to use Athena is to ETL raw data into a more optimized and enriched format. You can turn every SELECT query that you run into a CREATE TABLE ... AS SELECT (CTAS) statement that will transform the original data into a new set of files in S3 based on your desired transformation logic and output format.
It is usually advisable to write the newly created table in a compressed format such as Parquet; however, you can also define it as CSV ('TEXTFILE').
Lastly, it is advisable to partition a large table into meaningful partitions to reduce the cost of querying the data, especially in Athena, which charges by the amount of data scanned. The meaningful partitioning depends on your use case and the way you want to split your data. The most common approach is to use time partitions, such as yearly, monthly, weekly, or daily. Use the logic by which you would like to split your files as the partition key of the newly created table.
CREATE TABLE random_table_name
WITH (
format = 'TEXTFILE',
external_location = 's3://bucket/folder/',
partitioned_by = ARRAY['year','month'])
AS SELECT ...
When you go to s3://bucket/folder/ you will have a long list of folders and files based on the selected partition.
Note that you might have different sizes of files based on the amount of data in each partition. If this is a problem or you don't have any meaningful partition logic, you can add a random column to the data and partition with it:
substr(to_base64(sha256(to_utf8(some_column_in_your_data))), 1, 1) as partition_char
(to_utf8 is needed for string columns, because Presto's sha256 takes varbinary rather than varchar input.)
Or you can use bucketing and provide how many buckets you want:
CREATE TABLE random_table_name
WITH (
format = 'TEXTFILE',
external_location = 's3://bucket/folder/',
bucketed_by = ARRAY['column_with_high_cardinality'],
bucket_count = 100)
AS SELECT ...
You won't be able to do this with Lambda, as its memory is capped at around 3 GB and its file system storage at 512 MB.
Have you tried just running the split command on the filesystem (if you are using a Unix-based OS)?
If this job is recurring and needs to be automated, and you want to stay "serverless", you could create a Docker image that contains a script to perform this task and then run it via a Fargate task.
As for the specifics of how to use split, this other Stack Overflow question may help:
How to split CSV files as per number of rows specified?
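If you end up scripting it instead of using split, here is a rough Python equivalent (the chunk size and file naming are arbitrary choices of mine):

def split_csv(path, max_bytes=4 * 1024 * 1024):
    # Split a CSV into ~max_bytes pieces, cutting only at line boundaries
    # and repeating the header row in every piece.
    part, size, out = 0, 0, None
    with open(path, "rb") as src:
        header = src.readline()
        for line in src:
            if out is None or size + len(line) > max_bytes:
                if out:
                    out.close()
                out = open("%s.part%04d" % (path, part), "wb")
                out.write(header)
                size = len(header)
                part += 1
            out.write(line)
            size += len(line)
    if out:
        out.close()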
You can ask S3 for a range of the file with the Range option. This is a byte range (inclusive), for example bytes=0-999 to get the first 1000 bytes.
If you want to process the whole file in the same Lambda invocation, you can request a range about as large as you think will fit in memory, process it, and then request the next one. Process only up to the last line break in each chunk, and prepend the remaining partial line to the next chunk. As long as you make sure the previous chunk gets garbage collected and you don't aggregate a huge data structure, you should be fine.
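A minimal sketch of that loop with boto3 (the bucket, key, and chunk size below are placeholders):

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "athena-results/result.csv"  # placeholders
CHUNK = 64 * 1024 * 1024  # whatever you think fits comfortably in memory

def iter_lines(bucket=BUCKET, key=KEY, chunk=CHUNK):
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    offset, leftover = 0, b""
    while offset < size:
        # Ranges are inclusive; S3 clips the end past EOF for us.
        rng = "bytes=%d-%d" % (offset, offset + chunk - 1)
        data = leftover + s3.get_object(Bucket=bucket, Key=key,
                                        Range=rng)["Body"].read()
        cut = data.rfind(b"\n")
        if cut == -1:
            leftover = data            # no line break in this chunk yet
        else:
            yield from data[:cut].split(b"\n")
            leftover = data[cut + 1:]  # partial last line, kept for next round
        offset += chunk
    if leftover:
        yield leftover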
You can also run multiple invocations in parallel, each processing its own chunk. You could have one invocation check the file size and then invoke the processing function as many times as necessary to ensure each gets a chunk it can handle.
Just splitting the file into equal parts won't work, though: you have no way of knowing where lines end, so a chunk may split a line in half. If you know the maximum byte size of a line, you can pad each chunk by that amount (both at the beginning and at the end). When you read a chunk, you skip ahead until you see the last line break in the start padding, and you ignore everything after the first line break inside the end padding, with special handling of the first and last chunk, obviously. A sketch of this trimming follows below.
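The padding idea can be pinned down with a simple ownership rule: each worker keeps exactly the lines whose first byte falls inside its chunk. A sketch under that convention (get_range and MAX_LINE are assumptions of mine; only one byte of leading padding is needed because the code scans forward for the first line start):

MAX_LINE = 64 * 1024  # assumed upper bound on one line, in bytes

def owned_lines(get_range, start, end, size):
    # Return the complete lines whose first byte falls in [start, end).
    # get_range(lo, hi) is assumed to return file bytes [lo, hi), e.g. a
    # ranged S3 GET (S3 ranges are inclusive, so subtract 1 from hi there).
    lo = max(start - 1, 0)          # one byte back, to spot a line start
    hi = min(end + MAX_LINE, size)  # enough to finish the last owned line
    data = get_range(lo, hi)
    if start == 0:
        first = 0
    else:
        nl = data.find(b"\n", start - 1 - lo)
        if nl == -1 or lo + nl + 1 >= end:
            return []               # no line starts inside this chunk
        first = nl + 1
    stop = data.find(b"\n", end - 1 - lo)  # first break at or after end - 1
    body = data[first:] if stop == -1 else data[first:stop + 1]
    return body.splitlines()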
I am trying to load database tables into a VoltDB database using VoltDB's csvloader utility. When I try to load one table of size 5 GB, VoltDB eats RAM so fast that free RAM drops from 55 GB to 200 MB, and then the VoltDB process gets killed by the system.
What can be the reason for this, and what are the recommended settings for VoltDB to avoid it?
Is the table you are loading partitioned? That's the first thing to check, because if you have the default sitesperhost=8 on a single server, and the table is not partitioned, there will be a complete copy of the table in each of the 8 partitions. If the table is partitioned, the data is distributed among the partitions based on the hashing assignment of the values of the partitioning key column.
If it's partitioned and you still can't load all of the data, the next thing to look at would be the schema. There are formulas in the Planning Guide that describe the memory usage for given datatypes and for indexes. The VMC interface also has a sizing worksheet that gives you the mins and maxes based on the schema. You could also post the definition of the table you are trying to load, along with any indexes you have defined on it, and we can explain more about the bytes it would use per row.
I need to get some information (model and serial) of the disk that contains the system volume (usually C:). I'm using this query:
SELECT * FROM Win32_DiskDrive WHERE Index=0
My question is, is the disk with Index=0 always the disk containing the system volume?
Edit: I added an additional query to get the index of the disk containing the boot partition:
SELECT * FROM Win32_DiskPartition WHERE BootPartition=True
Then the original query changes to
SELECT * FROM Win32_DiskDrive WHERE Index={diskIndex}
I figured I'd be pretty safe this way. Suggestions for better solutions are always welcome :)
As stated, add an extra query to get the index of the disk containing the boot partition:
{diskIndex} = SELECT DiskIndex FROM Win32_DiskPartition WHERE BootPartition=True
SELECT * FROM Win32_DiskDrive WHERE Index={diskIndex}
Unfortunately, WMI doesn't seem to support JOINs, which would have made the query a little more efficient.
My question is, is the disk with Index=0 always the disk containing the system volume?
In my case the answer is No. My system disk has index 1.
Also your assumption that the system disk is always bootable is incorrect.
$ wmic os get "SystemDrive"
SystemDrive
C:
$ wmic logicaldisk where 'DeviceID="C:"' assoc /resultclass:Win32_DiskPartition
...\\DZEN\ROOT\CIMV2:Win32_DiskPartition.DeviceID="Disk #1, Partition #0"...
wmic diskdrive where 'Index=1' get "Caption"
Caption
OCZ-VERTEX4 // Yes, this is my system disk.
Also, your assumption about BootPartition usage is incorrect for cases when the boot manager is on another disk, as in my case:
wmic partition where 'DeviceID like "Disk_#1%"' get DeviceID,BootPartition
BootPartition DeviceID
FALSE Disk #1, Partition #0
wmic partition where 'BootPartition="TRUE"' get DeviceID,BootPartition
BootPartition DeviceID
TRUE Disk #4, Partition #0
TRUE Disk #3, Partition #0
As you can see, neither the system disk nor any of the bootable disks has Index=0 in my case. In fact, in my setup Index=0 belongs to a disk that is neither the system disk nor bootable.
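That association chain is also easy to script. A sketch using the WMI COM interface via pywin32 (assuming the pywin32 package is installed; the same chain works with any WMI client):

import win32com.client

wmi = win32com.client.GetObject(r"winmgmts:root\cimv2")
# 1) Which drive letter holds the system volume?
sys_drive = list(wmi.ExecQuery(
    "SELECT SystemDrive FROM Win32_OperatingSystem"))[0].SystemDrive  # "C:"
# 2) Logical disk -> partition ("Disk #1, Partition #0").
part = list(wmi.ExecQuery(
    'ASSOCIATORS OF {Win32_LogicalDisk.DeviceID="%s"} '
    'WHERE AssocClass=Win32_LogicalDiskToPartition' % sys_drive))[0]
# 3) Partition -> physical drive, which carries Index, Model, SerialNumber.
disk = list(wmi.ExecQuery(
    'ASSOCIATORS OF {Win32_DiskPartition.DeviceID="%s"} '
    'WHERE AssocClass=Win32_DiskDriveToDiskPartition' % part.DeviceID))[0]
print(disk.Index, disk.Model, disk.SerialNumber)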
My goal is to associate a pair of "drive and partition number" with the logical drive letter for that volume. For instance, for a spanned volume F: that spans two physical drives, I would expect to get:
Volume F:
PhysicalDrive1-Partition1
PhysicalDrive2-Partition1
So, to obtain the physical drive numbers that the volume spans, I use the IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS control code with the DeviceIoControl API, as described here (they don't allocate memory correctly for the DeviceIoControl call, but that's outside the scope of this question).
So in my example for volume F: I get two DISK_EXTENT structures that I can use to get physical drive numbers from, using DiskNumber member.
My question is how do I get corresponding partition numbers?
PS. The reason I need those partition numbers is so that I can associate volume drive letters with disk partitions in a later call to IOCTL_DISK_GET_DRIVE_LAYOUT_EX using the drive handle that was opened as "\\?\PhysicalDriveX" where X stands for drive number.
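For reference, a sketch of the extents call in Python via ctypes (constants and structure layouts transcribed from winioctl.h; MAX_EXTENTS is my own assumption). Each returned StartingOffset can then be matched against the PARTITION_INFORMATION_EX.StartingOffset entries that IOCTL_DISK_GET_DRIVE_LAYOUT_EX returns for \\.\PhysicalDriveN, which should give you the corresponding PartitionNumber:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.DeviceIoControl.argtypes = [
    wintypes.HANDLE, wintypes.DWORD, wintypes.LPVOID, wintypes.DWORD,
    wintypes.LPVOID, wintypes.DWORD, ctypes.POINTER(wintypes.DWORD),
    wintypes.LPVOID]

FILE_SHARE_READ, FILE_SHARE_WRITE, OPEN_EXISTING = 0x1, 0x2, 3
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
# CTL_CODE(IOCTL_VOLUME_BASE 'V', 0, METHOD_BUFFERED, FILE_ANY_ACCESS)
IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS = 0x00560000

class DISK_EXTENT(ctypes.Structure):
    _fields_ = [("DiskNumber", wintypes.DWORD),
                ("StartingOffset", ctypes.c_longlong),
                ("ExtentLength", ctypes.c_longlong)]

MAX_EXTENTS = 16  # assumed upper bound; retry with a bigger buffer on failure

class VOLUME_DISK_EXTENTS(ctypes.Structure):
    _fields_ = [("NumberOfDiskExtents", wintypes.DWORD),
                ("Extents", DISK_EXTENT * MAX_EXTENTS)]

def volume_extents(letter):
    # Query IOCTLs need no GENERIC_* access bits on the volume handle.
    h = kernel32.CreateFileW(r"\\.\%s:" % letter, 0,
                             FILE_SHARE_READ | FILE_SHARE_WRITE,
                             None, OPEN_EXISTING, 0, None)
    if h == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        out, ret = VOLUME_DISK_EXTENTS(), wintypes.DWORD()
        if not kernel32.DeviceIoControl(
                h, IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS, None, 0,
                ctypes.byref(out), ctypes.sizeof(out), ctypes.byref(ret), None):
            raise ctypes.WinError(ctypes.get_last_error())
        return [(e.DiskNumber, e.StartingOffset, e.ExtentLength)
                for e in out.Extents[:out.NumberOfDiskExtents]]
    finally:
        kernel32.CloseHandle(h)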
I'm working on a project which requires me to operate at a low level on Windows drives, and am doing so primarily using Windows API calls. But before I can operate on the drive, I need to know the types of partitions present on it.
This is fairly simple on a disk formatted by MBR, because
DeviceIoControl(...,IOCTL_DISK_GET_DRIVE_LAYOUT_EX,...);
returns a structure in format DRIVE_LAYOUT_INFORMATION_EX, which contains an array of PARTITION_INFORMATION_EX. On an MBR disk, the PARTITION_INFORMATION_EX.Mbr.PartitionType element contains a unique identifier for the partition type, e.g. for NTFS it is 0x07, for Extended it is 0x05.
However, this isn't so simple on a GPT disk. I know that I can read the identifier off of the beginning of the partition, but I'd prefer to handle this with API calls, such as DeviceIoControl. When I run DeviceIoControl on a GPT disk, the PARTITION_INFORMATION_EX.Mbr.PartitionType contains completely different values than those which would be normally there.
Note that the GUID is useless to me because that only tells me the purpose of the partition, not what type of partition it is. I'm trying to figure out if the drive is NTFS, FAT, etc.
For a GPT partition, the DeviceIoControl() call returns the partition information in a PARTITION_INFORMATION_EX object. If you look at the PARTITION_INFORMATION_EX structure, it contains two separate sub-structures, one for MBR and one for GPT disks. So when you receive a PARTITION_INFORMATION_EX object, you first have to check whether the disk type is GPT or MBR; if it is GPT, you can get the specific partition type by comparing its GUID.
Look at Microsoft's PARTITION_INFORMATION_GPT struct for GPT partitions.
Instead of going through PARTITION_INFORMATION_EX, I found the best way to find the filesystem of a volume is to call GetVolumeInformation. On Vista+, this seems to be just a wrapper for GetVolumeInformationByHandleW. The latter might be the best option for you if you already have a volume handle.
Both work well with either MBR or GPT disks. The result is the filesystem name string instead of a type ID, but should be easy to adapt.
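For example, a small ctypes sketch of the GetVolumeInformation route (the buffer size is arbitrary; only the filesystem name buffer is requested, the other out-parameters are left NULL):

import ctypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

def filesystem_name(root):
    # root must be a volume root path with a trailing backslash, e.g. "C:\\"
    fs = ctypes.create_unicode_buffer(261)
    ok = kernel32.GetVolumeInformationW(root, None, 0, None, None, None,
                                        fs, len(fs))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return fs.value  # e.g. "NTFS", "FAT32", "exFAT"

print(filesystem_name("C:\\"))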