Way to get SCSI disk names in Linux C++ application

Way to get SCSI disk names in Linux C++ application - c++

In my Linux C++ application I want to get names of all SCSI disks which are present on the
system. e.g. /dev/sda, /dev/sdb, ... and so on.
Currently I am getting it from the file /proc/scsi/sg/devices output using below code:
host chan SCSI id lun type opens qdepth busy online
0 0 0 0 0 1 128 0 1
1 0 0 0 0 1 128 0 1
1 0 0 1 0 1 128 0 1
1 0 0 2 0 1 128 0 1
// If SCSI device Id is > 26 then the corresponding device name is like /dev/sdaa or /dev/sdab etc.
if (MAX_ENG_ALPHABETS <= scsiId)
{
// Device name order is: aa, ab, ..., az, ba, bb, ..., bz, ..., zy, zz.
deviceName.append(1, 'a'+ (char)(index / MAX_ENG_ALPHABETS) - 1);
deviceName.append(1, 'a'+ (char)(index % MAX_ENG_ALPHABETS));
}
// If SCSI device Id is < 26 then the corresponding device name is liek /dev/sda or /dev/sdb etc.
else
{
deviceName.append(1, 'a'+ index);
}
But the file /proc/scsi/sg/devices also contains the information about the disk which were previously present on the system. e.g If I detach the disk (LUN) /dev/sdc from the system
the file /proc/scsi/sg/devices still contains info of /dev/sdc which is invalid.
Tell me is there any different way to get the SCSI disk names? like a system call?
Thanks

You can simply read list of all files like /dev/sd* (in C, you would need to use opendir/readdir/closedir) and filter it by sdX (where X is one or two letters).
Also, you can get list of all partitions by reading single file /proc/partitions, and then filter 4th field by sdX:
$ cat /proc/partitions
major minor #blocks name
8 0 52428799 sda
8 1 265041 sda1
8 2 1 sda2
8 5 2096451 sda5
8 6 50066541 sda6
which would give you list of all physical disks together with their capacity (3rd field).

After get disk name list from /proc/scsi/sg/devices, you can verify the existence through code. For example, install sg3-utils, and use sg_inq to query whether the disk is active.

Related

Merge rows with unique ID in stata

I have a dataset where I need unique county FIPS codes that need to be merged. The dataset looks like:
FIPS yr1990 yr2000 yr2010
1001 1 0 1
1002 1 1 0
1003 1 0 0
1004 0 0 0
1005 0 0 1
County boundaries have changed and I need to merge several FIPS codes together. Essentially, I need the dataset to look like:
FIPS yr1990 yr2000 yr2010
1001/1003 1 1 1
1002 1 1 0
1004/1005 0 0 1
Is there a way to select specific FIPS to be merged over rows?

This solution might not scale to very large datasets as writing the replace statements must be done manually. But it keeps the exact format you are using in your example. And a more scalable way might be difficult if there is no system in how the FIPS codes were combined.
* Example generated by -dataex-. For more info, type help dataex
clear
input str4 FIPS byte(yr1990 yr2000 yr2010)
"1001" 1 0 1
"1002" 1 1 0
"1003" 1 0 0
"1004" 0 0 0
"1005" 0 0 1
end
*Combine the FIPS codes
replace FIPS = "1001/1003" if inlist(FIPS,"1001","1003")
replace FIPS = "1004/1005" if inlist(FIPS,"1004","1005")
*Collapse rows by FIPS value, use max value for each var on format yr????
collapse (max) yr???? , by(FIPS)

Amazon QuickSight - Working out size of network

I have a database table with a record for each IOT device connected, each device has a unique device id and a unique network id associated with it.
For example:
device_id
network_id
1
1
2
1
3
1
4
2
5
2
6
3
7
3
8
3
9
3
10
4
I would like to be able visualise the size of each network based on its id. So I would have an output like such based on the above data:
network_id
size
1
3
2
2
3
4
4
1
I'm not currently sure how to do this

I found that using the countOver function worked for this
I made a calculated field called NetworkSize which was defined as:
countOver
(
{device_id}
,[{network_id}]
)
Which gives the right output I was looking for
However I have to include device_id in the visual which is a bit inconvenient

Python Regex for beginning and ending word

I'm trying to write a regex that will find "Vlan20" and the word "up" after line protocol is in the first line. I wrote a regex below that will give me the group that the word "up" and "vlan20" are located in but is this the best way to achieve this? The regex just seems very long. The will use those values in a conditional statememt.
((^Vlan20)(\s\w+)(\s\w+),(\s\w+)(\s\w+)(\s\w+)(\s\w+))
Sample text:
Vlan20 is up, line protocol is up
Hardware is EtherSVI, address is 588d.0939.ffb4 (bia 588d.0939.ffb4)
Description: MATS Network
Internet address is 10.88.5.49/28
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not supported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:04, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
1992293 packets input, 187299894 bytes, 0 no buffer
Received 22809 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
2115535 packets output, 813500880 bytes, 0 underruns
0 output errors, 1 interface resets
89225 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out

Scikit-learn labelencoder: how to preserve mappings between batches?

I have 185 million samples that will be about 3.8 MB per sample. To prepare my dataset, I will need to one-hot encode many of the features after which I end up with over 15,000 features.
But I need to prepare the dataset in batches since the memory footprint exceeds 100 GB for just the features alone when one hot encoding using only 3 million samples.
The question is how to preserve the encodings/mappings/labels between batches?
The batches are not going to have all the levels of a category necessarily. That is, batch #1 may have: Paris, Tokyo, Rome.
Batch #2 may have Paris, London.
But in the end I need to have Paris, Tokyo, Rome, London all mapped to one encoding all at once.
Assuming that I can not determine the levels of my Cities column of 185 million all at once since it won't fit in RAM, what should I do?
If I apply the same Labelencoder instance to different batches will the mappings remain the same?
I also will need to use one hot encoding either with scikitlearn or Keras' np_utilities_to_categorical in batches as well after this. So same question: how to basically use those three methods in batches or apply them at once to a file format stored on disk?

I suggest using Pandas' get_dummies() for this, since sklearn's OneHotEncoder() needs to see all possible categorical values when .fit(), otherwise it will throw an error when it encounters a new one during .transform().
# Create toy dataset and split to batches
data_column = pd.Series(['Paris', 'Tokyo', 'Rome', 'London', 'Chicago', 'Paris'])
batch_1 = data_column[:3]
batch_2 = data_column[3:]
# Convert categorical feature column to matrix of dummy variables
batch_1_encoded = pd.get_dummies(batch_1, prefix='City')
batch_2_encoded = pd.get_dummies(batch_2, prefix='City')
# Row-bind (append) Encoded Data Back Together
final_encoded = pd.concat([batch_1_encoded, batch_2_encoded], axis=0)
# Final wrap-up. Replace nans with 0, and convert flags from float to int
final_encoded = final_encoded.fillna(0)
final_encoded[final_encoded.columns] = final_encoded[final_encoded.columns].astype(int)
final_encoded
output
City_Chicago City_London City_Paris City_Rome City_Tokyo
0 0 0 1 0 0
1 0 0 0 0 1
2 0 0 0 1 0
3 0 1 0 0 0
4 1 0 0 0 0
5 0 0 1 0 0

Tcl: match string only if it is followed by a number [regex]

Given that I have the following output :
Loopback1 is up, line protocol is up
Hardware is Loopback
Description: ** NA4-ISIS-MGMT-LOOPBACK1_MPLS **
Internet address is 84.116.226.27/32
MTU 1514 bytes, BW 8000000 Kbit, DLY 5000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation LOOPBACK, loopback not set
Keepalive set (10 sec)
Last input 12w3d, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
6 packets output, 456 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
How can I match "Loopback1" and not "Loopback" ?
In other words, how can I match the interface name only if there is a number next to it, in Tcl ?

use lookahead
Loopback(?=\d+)
It matches only Loopback in Loopback followed by any number of digits. If you want to match loopback and the number, useLoopback\d+

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Way to get SCSI disk names in Linux C++ application - c++

After get disk name list from /proc/scsi/sg/devices, you can verify the existence through code. For example, install sg3-utils, and use sg_inq to query whether the disk is active.

Related

Merge rows with unique ID in stata

Amazon QuickSight - Working out size of network

Python Regex for beginning and ending word

Scikit-learn labelencoder: how to preserve mappings between batches?

Tcl: match string only if it is followed by a number [regex]

Categories

Resources