How to divide DPDK-ACL into a multi-trie structure?

I'm testing DPDK-ACL library performance with DPDK-20.08. I've added up to thousands of ACL rules into the ACL context, yet the build still produces a single trie. Under what circumstances does the library split the rule set into a multi-trie structure?
PS:
If I set max_size in rte_acl_config to a number smaller than what the build requires, the build simply fails after several retries.
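For reference, this is roughly how I configure the build (a minimal sketch; the field definitions and rule setup are elided, and ipv4_defs is a placeholder name):

#include <string.h>
#include <rte_acl.h>

/* ipv4_defs: rte_acl_field_def array describing the match fields (elided) */
extern struct rte_acl_field_def ipv4_defs[5];

static int
build_acl(struct rte_acl_ctx *acx)
{
    struct rte_acl_config cfg;

    memset(&cfg, 0, sizeof(cfg));
    cfg.num_categories = 1;
    cfg.num_fields = RTE_DIM(ipv4_defs);
    memcpy(cfg.defs, ipv4_defs, sizeof(ipv4_defs));
    /* A non-zero max_size caps the run-time structures; the build then
     * retries with a smaller "node limit for tree split", as in the log
     * below. max_size = 0 means no limit. */
    cfg.max_size = 2 * 1024 * 1024;

    return rte_acl_build(acx, &cfg);
}

The failing build log: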
ACL: Gen phase for ACL ctx "ipv4_acl_0" exceeds max_size limit, bytes required: 38044176, allowed: 2097152
ACL: Build phase for ACL "ipv4_acl_0":
node limit for tree split: 16384
nodes created: 77950
memory consumed: 117440610
ACL: trie 0: number of rules: 4000, indexes: 4
ACL: Gen phase for ACL ctx "ipv4_acl_0" exceeds max_size limit, bytes required: 38044176, allowed: 2097152
ACL: Build phase for ACL "ipv4_acl_0":
node limit for tree split: 8192
nodes created: 77950
memory consumed: 117440610
ACL: trie 0: number of rules: 4000, indexes: 4
ACL: Gen phase for ACL ctx "ipv4_acl_0" exceeds max_size limit, bytes required: 38044176, allowed: 2097152
ACL: Build phase for ACL "ipv4_acl_0":
node limit for tree split: 4096
nodes created: 77950
memory consumed: 117440610
ACL: trie 0: number of rules: 4000, indexes: 4
ACL: Gen phase for ACL ctx "ipv4_acl_0" exceeds max_size limit, bytes required: 38044176, allowed: 2097152
ACL: Build phase for ACL "ipv4_acl_0":
node limit for tree split: 2048
nodes created: 77950
memory consumed: 117440610
ACL: trie 0: number of rules: 4000, indexes: 4
acl context <ipv4_acl_0>#0x10081fec0
socket_id=0
alg=3
max_rules=4000
rule_size=96
num_rules=4000
num_categories=0
num_tries=0

Postgres db index not being used on Heroku

I'm trying to debug a slow query for a model that looks like:
class Employee(TimeStampMixin):
    title = models.TextField(blank=True, db_index=True)
    seniority = models.CharField(blank=True, max_length=128, db_index=True)
The query is:
Employee.objects.exclude(seniority='').filter(title__icontains=title).order_by('seniority').values_list('seniority')
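That compiles to roughly the following SQL (reconstructed from the plans' filter expression, so approximate):

SELECT seniority
FROM companies_employee
WHERE seniority <> ''
  AND UPPER(title) LIKE UPPER('%INFORMATION SPECIALIST%')
ORDER BY seniority;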
When I run it locally it takes ~0.3 seconds (same database size). An explain locally shows:
Limit (cost=1000.58..196218.23 rows=7 width=1) (actual time=299.016..300.366 rows=1 loops=1)
Output: seniority
Buffers: shared hit=2447163 read=23669
-> Gather Merge (cost=1000.58..196218.23 rows=7 width=1) (actual time=299.015..300.364 rows=1 loops=1)
Output: seniority
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2447163 read=23669
-> Parallel Index Only Scan using companies_e_seniori_12ac68_idx on public.companies_employee (cost=0.56..195217.40 rows=3 width=1) (actual time=293.195..293.200 rows=0 loops=3)
Output: seniority
Filter: (((companies_employee.seniority)::text <> ''::text) AND (upper(companies_employee.title) ~~ '%INFORMATION SPECIALIST%'::text))
Rows Removed by Filter: 2697599
Heap Fetches: 2819
Buffers: shared hit=2447163 read=23669
Worker 0: actual time=291.087..291.088 rows=0 loops=1
Buffers: shared hit=820222 read=7926
Worker 1: actual time=291.056..291.056 rows=0 loops=1
Buffers: shared hit=812538 read=7888
Planning Time: 0.209 ms
Execution Time: 300.400 ms
However, when I run the same code on Heroku I get execution times of 3s+, possibly because the former uses an index while the latter does not:
Limit (cost=216982.74..216983.39 rows=6 width=1) (actual time=988.738..1018.964 rows=1 loops=1)
Output: seniority
Buffers: shared hit=199527 dirtied=5
-> Gather Merge (cost=216982.74..216983.39 rows=6 width=1) (actual time=980.932..1011.157 rows=1 loops=1)
Output: seniority
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=199527 dirtied=5
-> Sort (cost=215982.74..215982.74 rows=3 width=1) (actual time=959.233..959.234 rows=0 loops=3)
Output: seniority
Sort Key: companies_employee.seniority
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=199527 dirtied=5
Worker 0: actual time=957.414..957.414 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.179 ms, Inlining 0.000 ms, Optimization 0.879 ms, Emission 9.714 ms, Total 11.771 ms
Buffers: shared hit=54855 dirtied=2
Worker 1: actual time=939.591..939.592 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.741 ms, Inlining 0.000 ms, Optimization 0.654 ms, Emission 6.531 ms, Total 7.926 ms
Buffers: shared hit=87867 dirtied=1
-> Parallel Seq Scan on public.companies_employee (cost=0.00..215982.73 rows=3 width=1) (actual time=705.244..959.146 rows=0 loops=3)
Output: seniority
Filter: (((companies_employee.seniority)::text <> ''::text) AND (upper(companies_employee.title) ~~ '%INFORMATION SPECIALIST%'::text))
Rows Removed by Filter: 2939330
Buffers: shared hit=199449 dirtied=5
Worker 0: actual time=957.262..957.262 rows=0 loops=1
Buffers: shared hit=54816 dirtied=2
Worker 1: actual time=939.491..939.491 rows=0 loops=1
Buffers: shared hit=87828 dirtied=1
Query Identifier: 2827140323627869732
Planning:
Buffers: shared hit=293 read=1 dirtied=1
I/O Timings: read=0.021
Planning Time: 1.078 ms
JIT:
Functions: 13
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.746 ms, Inlining 0.000 ms, Optimization 2.224 ms, Emission 23.189 ms, Total 28.160 ms
Execution Time: 1050.493 ms
I confirmed that the indexes are identical in my local database and on Heroku; this is what they are:
indexname | indexdef
----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------
companies_employee_pkey | CREATE UNIQUE INDEX companies_employee_pkey ON public.companies_employee USING btree (id)
companies_employee_company_id_c24081a8 | CREATE INDEX companies_employee_company_id_c24081a8 ON public.companies_employee USING btree (company_id)
companies_employee_person_id_936e5c6a | CREATE INDEX companies_employee_person_id_936e5c6a ON public.companies_employee USING btree (person_id)
companies_employee_role_8772f722 | CREATE INDEX companies_employee_role_8772f722 ON public.companies_employee USING btree (role)
companies_employee_role_8772f722_like | CREATE INDEX companies_employee_role_8772f722_like ON public.companies_employee USING btree (role text_pattern_ops)
companies_employee_seniority_b10393ff | CREATE INDEX companies_employee_seniority_b10393ff ON public.companies_employee USING btree (seniority)
companies_employee_seniority_b10393ff_like | CREATE INDEX companies_employee_seniority_b10393ff_like ON public.companies_employee USING btree (seniority varchar_pattern_ops)
companies_employee_title_78009330 | CREATE INDEX companies_employee_title_78009330 ON public.companies_employee USING btree (title)
companies_employee_title_78009330_like | CREATE INDEX companies_employee_title_78009330_like ON public.companies_employee USING btree (title text_pattern_ops)
companies_employee_institution_75d6c7e9 | CREATE INDEX companies_employee_institution_75d6c7e9 ON public.companies_employee USING btree (institution)
companies_employee_institution_75d6c7e9_like | CREATE INDEX companies_employee_institution_75d6c7e9_like ON public.companies_employee USING btree (institution text_pattern_ops)
companies_e_seniori_12ac68_idx | CREATE INDEX companies_e_seniori_12ac68_idx ON public.companies_employee USING btree (seniority, title)
title_seniority | CREATE INDEX title_seniority ON public.companies_employee USING btree (upper(title), seniority)
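For what it's worth: title__icontains compiles to a leading-wildcard LIKE (the upper(title) ~~ '%…%' filter in the plans above), which none of these btree indexes can serve; the local plan only uses companies_e_seniori_12ac68_idx as a cheaper-than-heap scan target, so the planner's choice between that and a seq scan comes down to cost estimates that can differ between the two machines. The usual fix for contains-style searches is a trigram index; a hedged sketch (the index name is mine):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX title_upper_trgm ON public.companies_employee
    USING gin (upper(title) gin_trgm_ops);

With such an index in place, the upper(title) ~~ '%INFORMATION SPECIALIST%' filter can be answered by a bitmap index scan in both environments.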

How to search and match pattern to get a value in ansible

My variable info has the value below (the actual case has much more data).
I am trying to search for the specific word XYZ_data_001 and get the size information, which comes after the pattern "physical disk,":
XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
Here is what I tried:
- name: Print size
  ansible.builtin.debug:
    msg: "{{ info | regex_search('XYZ_data_001(.+)') | split('physical disk,') | last }}"
This gives me the output below:
ok: [testhost] => {
"msg": " 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607 "
}
Thanks in advance
You can use
{{ info | regex_search('XYZ_data_001\\b.*physical disk,\\s*(\\d[\\d.]*)', '\\1') }}
Details:
XYZ_data_001 - the literal string XYZ_data_001
\b - a word boundary
.* - any text (zero or more chars other than line break chars, as many as possible)
physical disk, - a literal string
\s* - zero or more whitespaces
(\d[\d.]*) - Group 1 (\1): a digit and then zero or more digits or dots.
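In a task, for example (note that regex_search with a capture-group argument returns a list, hence the first filter to get the bare string):

- name: Print size of XYZ_data_001
  ansible.builtin.debug:
    msg: "{{ info | regex_search('XYZ_data_001\\b.*physical disk,\\s*(\\d[\\d.]*)', '\\1') | first }}"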
There are two filters in the collection Community.General that will help you to create dictionaries from the info.
Split the lines, split and trim the items, and use the filter community.general.dict to create the list of dictionaries
info_dict1: "{{ info.splitlines()|
map('split', ',')|
map('map', 'trim')|
map('zip', ['dev', 'spec', 'dsync', 'dir', 'disk', 'size', 'free'])|
map('map', 'reverse')|
map('community.general.dict') }}"
gives
info_dict1:
  - dev: XYZ_data_001 file system device
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 6 0 8388607'
    size: 16384.00 MB
    spec: special
  - dev: XYZ_data_002 file system device
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 13 0 8388607'
    size: 16384.00 MB
    spec: special
Split the attribute dev and use the filter community.general.dict_kv to create the list of dictionaries with the attribute device
info_dev: "{{ info_dict1|
map(attribute='dev')|
map('split')|
map('first')|
map('community.general.dict_kv', 'device') }}"
gives
info_dev:
  - device: XYZ_data_001
  - device: XYZ_data_002
Combine the dictionaries
info_dict2: "{{ info_dict1|zip(info_dev)|map('combine') }}"
gives
info_dict2:
  - dev: XYZ_data_001 file system device
    device: XYZ_data_001
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 6 0 8388607'
    size: 16384.00 MB
    spec: special
  - dev: XYZ_data_002 file system device
    device: XYZ_data_002
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 13 0 8388607'
    size: 16384.00 MB
    spec: special
This way you can add other attributes if needed.
Q: "Search for specific word XYZ_data_001 and get the size."
A: Create a dictionary device_size
device_size: "{{ info_dict2|items2dict(key_name='device', value_name='size') }}"
gives
device_size:
  XYZ_data_001: 16384.00 MB
  XYZ_data_002: 16384.00 MB
Search the dictionary
- debug:
    msg: "Size of XYZ_data_001 is {{ device_size.XYZ_data_001 }}"
gives
msg: Size of XYZ_data_001 is 16384.00 MB
Example of a complete playbook for testing
- hosts: localhost
  vars:
    info: |
      XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
      XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
    info_dict1: "{{ info.splitlines()|
                   map('split', ',')|
                   map('map', 'trim')|
                   map('zip', ['dev', 'spec', 'dsync', 'dir', 'disk', 'size', 'free'])|
                   map('map', 'reverse')|
                   map('community.general.dict') }}"
    info_dev: "{{ info_dict1|
                  map(attribute='dev')|
                  map('split')|
                  map('first')|
                  map('community.general.dict_kv', 'device') }}"
    info_dict2: "{{ info_dict1|zip(info_dev)|map('combine') }}"
    device_size: "{{ info_dict2|items2dict(key_name='device', value_name='size') }}"
  tasks:
    - debug:
        var: info_dict1
    - debug:
        var: info_dev
    - debug:
        var: info_dict2
    - debug:
        var: device_size
    - debug:
        msg: "Size of XYZ_data_001 is {{ device_size.XYZ_data_001 }}"

Unit test for z-score with Prometheus

I have been experimenting a lot with writing unit tests for alerts as per this: https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/#alerts-yml
I have some simple cases worked out, but now I am tackling less trivial rules, for example this one:
abs(
avg_over_time(my_metrics{service_name="aService"}[1m])
-
avg_over_time(my_metrics{service_name="aService"}[3m])
)
/ stddev_over_time(my_metrics{service_name="aService"}[3m])
> 3
I have one file with the above rule and then this is in my test:
- interval: 1m
  # Series data.
  input_series:
    - series: 'my_metrics{service_name="aService"}'
      values: '0 0 0 0 1 0 0 0 0'
  alert_rule_test:
    - eval_time: 3m
      alertname: myalert
      exp_alerts:
        - exp_labels:
            severity: warning
            service_name: aService
          exp_annotations:
            summary: "some text"
            description: "some other text"
I am not sure what my series should look like in order to test deviation from the mean. Is it even possible to test such a rule?
Thank you
EDIT
I can get a successful test if I set the threshold to > 0 instead of > 3. I have tried to set a series of this sort:
'10+10x2 30+1000x1000'
but I cannot work out the correct setup to have the alert triggered.
This isn't a direct answer, rather a tip from someone who spent quite some time on these tests. Did you know that apart from testing alert expressions, you can unittest PromQL expressions as well? See how it can be useful:
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: test_metric
        values: 1 1 1 10 1 1 1
    promql_expr_test:
      - expr: avg_over_time(test_metric[1m])
        eval_time: 4m
        exp_samples:
          - value: # 5.5
      - expr: avg_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: # 3.25
      - expr: stddev_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: # 3.897114317029974
I've split your alert expression into three separate, simple parts. If you run this unit test, you will see the commented-out values in the error message. From there it is not difficult to join the pieces together and see why the alert is not firing, and you can use that to build a working sequence of values.
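You can also test the combined expression the same way. With the series above, the z-score of the spike works out to |5.5 - 3.25| / 3.897... ≈ 0.577 (values computed from the three tests above), nowhere near the alert threshold of 3, which is why a far more dramatic jump is needed to trigger it:

- expr: abs(avg_over_time(test_metric[1m]) - avg_over_time(test_metric[3m])) / stddev_over_time(test_metric[3m])
  eval_time: 4m
  exp_samples:
    - value: # 0.5773502691896258 (= 2.25 / 3.897114317029974)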

AWS EC2 spot instance availability

I am using the API call request_spot_instances to create spot instances without specifying any availability zone. Normally a random AZ is picked by the API. The spot request sometimes returns a no-capacity status, even though I can successfully request a spot instance through the AWS console in another AZ. What is the proper way to check the availability of a specific instance type as a spot instance before calling request_spot_instances?
There is no public API to check Spot Instance availability. Having said that, you can still achieve what you want by following the steps below (a sketch follows the list):
Use request_spot_fleet instead, and configure it to launch a single instance.
Be flexible with the instance types you use: pick as many as you can and include them in the request. To help you pick them, check the Spot Instance advisor for interruption and saving rates.
In the Spot Fleet request, set AllocationStrategy to capacityOptimized; this allows the fleet to allocate capacity from the most available Spot pools in your instance list and reduces the likelihood of Spot interruptions.
Don't set a max price (SpotPrice); the default Spot instance price will be used. The pricing model for Spot has changed and is no longer based on bidding, so Spot prices are more stable and don't fluctuate.
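A hedged sketch of such a request with boto3 (the fleet role ARN, AMI, and instance types below are placeholders):

import boto3

ec2 = boto3.client('ec2')
response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        # capacityOptimized draws from the deepest capacity pools
        'AllocationStrategy': 'capacityOptimized',
        'IamFleetRole': 'arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role',
        'TargetCapacity': 1,  # a single instance
        'Type': 'request',
        # no SpotPrice key: default to the current Spot price
        'LaunchSpecifications': [
            {'ImageId': 'ami-0123456789abcdef0', 'InstanceType': itype}
            for itype in ['m5.large', 'm5a.large', 'm4.large', 'c5.large']
        ],
    },
)
print(response['SpotFleetRequestId'])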
This may be a bit overkill for what you are looking for, but with parts of the code below you can find the spot price history for the last hour (this can be changed). It gives you the instance type, the AZ, and additional information. From there you can loop through the instance types by AZ; if a spot instance doesn't come up in, say, 30 seconds, try the next AZ.
And to Ahmed's point in his answer, this information can be used in the spot fleet request instead of looping through the AZs. If you pass the wrong AZ or subnet in the spot fleet request, it may pass the dry-run API call but still fail the real call. Just a heads-up if you are using the DryRun parameter.
Here's the output of the code that follows:
In [740]: df_spot_instance_options
Out[740]:
AvailabilityZone InstanceType SpotPrice MemSize vCPUs CurrentGeneration Processor
0 us-east-1d t3.nano 0.002 512 2 True [x86_64]
1 us-east-1b t3.nano 0.002 512 2 True [x86_64]
2 us-east-1a t3.nano 0.002 512 2 True [x86_64]
3 us-east-1c t3.nano 0.002 512 2 True [x86_64]
4 us-east-1d t3a.nano 0.002 512 2 True [x86_64]
.. ... ... ... ... ... ... ...
995 us-east-1a p2.16xlarge 4.320 749568 64 True [x86_64]
996 us-east-1b p2.16xlarge 4.320 749568 64 True [x86_64]
997 us-east-1c p2.16xlarge 4.320 749568 64 True [x86_64]
998 us-east-1d p2.16xlarge 14.400 749568 64 True [x86_64]
999 us-east-1c p3dn.24xlarge 9.540 786432 96 True [x86_64]
[1000 rows x 7 columns]
And here's the code:
from datetime import datetime, timedelta

import boto3
import pandas as pd

ec2c = boto3.client('ec2')
ec2r = boto3.resource('ec2')
#### The rest of this code maps the instance details to spot price in case you are looking for certain memory or cpu
paginator = ec2c.get_paginator('describe_instance_types')
response_iterator = paginator.paginate( )
df_hold_list = []
for page in response_iterator:
    df_hold_list.append(pd.DataFrame(page['InstanceTypes']))
df_instance_specs = pd.concat(df_hold_list, axis=0).reset_index(drop=True)
df_instance_specs['Spot'] = df_instance_specs['SupportedUsageClasses'].apply(lambda x: 1 if 'spot' in x else 0)
df_instance_spot_specs = df_instance_specs.loc[df_instance_specs['Spot']==1].reset_index(drop=True)
# unpack memory and cpu dictionaries
df_instance_spot_specs['MemSize'] = df_instance_spot_specs['MemoryInfo'].apply(lambda x: x.get('SizeInMiB'))
df_instance_spot_specs['vCPUs'] = df_instance_spot_specs['VCpuInfo'].apply(lambda x: x.get('DefaultVCpus'))
df_instance_spot_specs['Processor'] = df_instance_spot_specs['ProcessorInfo'].apply(lambda x: x.get('SupportedArchitectures'))
# collect all spot-capable instance types
instance_list = df_instance_spot_specs['InstanceType'].unique().tolist()
#---------------------------------------------------------------------------------------------------------------------
# You can use this section by itself to get the instance type and availability zone;
# to target one instance type, just set instance_list to the single instance you want information for.
# Look only in us-east-1:
client = boto3.client('ec2', region_name='us-east-1')
prices = client.describe_spot_price_history(
    InstanceTypes=instance_list,
    ProductDescriptions=['Linux/UNIX', 'Linux/UNIX (Amazon VPC)'],
    StartTime=(datetime.now() - timedelta(hours=1)).isoformat(),
    # AvailabilityZone='us-east-1a'
    MaxResults=1000)
df_spot_prices = pd.DataFrame(prices['SpotPriceHistory'])
df_spot_prices['SpotPrice'] = df_spot_prices['SpotPrice'].astype('float')
df_spot_prices.sort_values('SpotPrice', inplace=True)
#---------------------------------------------------------------------------------------------------------------------
# merge memory size and cpu information into this dataframe
df_spot_instance_options = df_spot_prices[['AvailabilityZone', 'InstanceType', 'SpotPrice']].merge(
    df_instance_spot_specs[['InstanceType', 'MemSize', 'vCPUs', 'CurrentGeneration', 'Processor']],
    left_on='InstanceType', right_on='InstanceType')
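And a rough sketch of the try-each-pool loop described above, using the dataframe just built (the AMI is a placeholder; I make no promise 30 seconds is the right wait):

import time

for row in df_spot_instance_options.itertuples(index=False):
    req = client.request_spot_instances(
        InstanceCount=1,
        LaunchSpecification={
            'ImageId': 'ami-0123456789abcdef0',  # placeholder
            'InstanceType': row.InstanceType,
            'Placement': {'AvailabilityZone': row.AvailabilityZone},
        })
    req_id = req['SpotInstanceRequests'][0]['SpotInstanceRequestId']
    time.sleep(30)  # give the request ~30 seconds to be fulfilled
    state = client.describe_spot_instance_requests(
        SpotInstanceRequestIds=[req_id])['SpotInstanceRequests'][0]['State']
    if state == 'active':  # fulfilled
        break
    client.cancel_spot_instance_requests(SpotInstanceRequestIds=[req_id])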

How to match this regular expression using TCL

Kindly give me some input on this. I have the input below for a Tcl regular expression:
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
From the above I want to get two list elements; the regular expression should split on the following markers.
There are two interface outputs here: one is for
10.132.224.74 (Tunnel42)
and the other is for
10.135.0.86 (GigabitEthernet0/1)
If there is no line starting with "Internal tag is " after the "Originating router is " line, it should take everything up to the "Originating router is " line as one list element.
If a line "Internal tag is " is present after the "Originating router is " line, it should take everything up to the "Internal tag is " line as one list element.
I am expecting output like this:
{10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65}
{10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200}
A more generalized approach is to split the input into lines and parse them as needed:
set a { Descriptor Blocks:
10.132.224.74 (Tunnel42), from 10.132.224.74, Send flag is 0x0
Composite metric is (2032896/128256), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55000 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 1
Originating router is 10.128.9.65
10.135.0.86 (GigabitEthernet0/1), from 10.135.0.86, Send flag is 0x0
Composite metric is (2033152/2032896), route is Internal
Vector metric:
Minimum bandwidth is 4096 Kbit
Total delay is 55010 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1380
Hop count is 2
Originating router is 10.128.9.65
Internal tag is 200 }
set tunnelStart 0
set interfaceStart 0
set tunnelInfo {}
set interfaceInfo {}
set result {}
foreach line [split $a \n] {
    if {[regexp {\(Tunnel\d+\)} $line]} {
        # If 'tunnelInfo' was already collected, it won't be empty
        if {$tunnelInfo ne {}} {
            regsub {\n$} $tunnelInfo {} tunnelInfo
            # So, append it to 'result'
            lappend result $tunnelInfo
            # Then reset 'tunnelInfo'
            set tunnelInfo {}
        }
        set tunnelStart 1
        set interfaceStart 0
    } elseif {[regexp {\(GigabitEthernet\d+/\d+\)} $line]} {
        # Same reason as explained above
        if {$interfaceInfo ne {}} {
            regsub {\n$} $interfaceInfo {} interfaceInfo
            lappend result $interfaceInfo
            set interfaceInfo {}
        }
        set interfaceStart 1
        set tunnelStart 0
    }
    if {$tunnelStart} {
        # Append each line along with '\n'
        append tunnelInfo $line\n
    } elseif {$interfaceStart} {
        append interfaceInfo $line\n
    }
}
# Remove the trailing '\n'
regsub {\n$} $tunnelInfo {} tunnelInfo
regsub {\n$} $interfaceInfo {} interfaceInfo
# Finally, if a variable is not empty, append it to 'result'
if {$tunnelInfo ne {}} {
    lappend result $tunnelInfo
}
if {$interfaceInfo ne {}} {
    lappend result $interfaceInfo
}
puts $result
You can put this in a procedure and call it wherever you need to separate such input. If your input has more than one tunnel and interface block, you can rework the code to parse it accordingly.
You can use the textutil module to do this easily:
package require textutil
textutil::split::splitx $a {\n(?=\s*\d)}
This splits the original text into a list of three items: the " Descriptor Blocks:" substring and one item each for the two blocks. It works by finding junctures where a line break and optional whitespace are followed by a digit. The line break is removed, but the leading whitespace and the digit are preserved.
Core-Tcl solution:
The substitution
regsub -all -line {^(?=\s*\d)} $a \n
will split the text into three parts (the first part being the " Descriptor Blocks:" substring) by inserting an extra line break before each block. This solution obviously depends on only the first line in each block starting with a digit optionally preceded by whitespace. The -line option makes ^ anchor after a line break.
Note that this results in a text with three parts, not a list of three elements: if you want that, you will need to break the text up at every double line break. Another way to deal with this is to have regsub instead insert a character that won't occur in the text, and then split on that character, e.g.
split [regsub -all -line {^(?=\s*\d)} $a #] #
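For example, to pull out just the second block (assuming # does not otherwise occur in the text):

set blocks [split [regsub -all -line {^(?=\s*\d)} $a #] #]
# blocks is a 3-element list; elements 1 and 2 are the two descriptor blocks
puts [lindex $blocks 2]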
Documentation: package, regsub, split, textutil package