Apply character set in regex lookbehind - regex

I need to get the subnets count of bgp 7029, using regexp like
(?<=bgp 7029[\s]+\d[\s+])\d
but this doesn't work with positive look behind.
sh ip route vrf vrf-dnoc-mpls-test summary
IP routing table name is vrf-dnoc-mpls-test (0x2)
IP routing table maximum-paths is 32
Route Source Networks Subnets Replicates Overhead Memory (bytes)
static 0 0 0 0 0
connected 0 1 0 60 172
bgp 7029 0 1686 0 101160 289992
External: 0 Internal: 1686 Local: 0
internal 36 73652
Total 36 1687 0 101220 363816

Don't really need a lookbehind, a capture group will work just as well.
bgp[ \t]7029[ \t]+\d+[ \t]+(\d+)
where the subnet is in group 1

Related

Python Regex for beginning and ending word

I'm trying to write a regex that will find "Vlan20" and the word "up" after line protocol is in the first line. I wrote a regex below that will give me the group that the word "up" and "vlan20" are located in but is this the best way to achieve this? The regex just seems very long. The will use those values in a conditional statememt.
((^Vlan20)(\s\w+)(\s\w+),(\s\w+)(\s\w+)(\s\w+)(\s\w+))
Sample text:
Vlan20 is up, line protocol is up
Hardware is EtherSVI, address is 588d.0939.ffb4 (bia 588d.0939.ffb4)
Description: MATS Network
Internet address is 10.88.5.49/28
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not supported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:04, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
1992293 packets input, 187299894 bytes, 0 no buffer
Received 22809 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
2115535 packets output, 813500880 bytes, 0 underruns
0 output errors, 1 interface resets
89225 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out

AWS Elastic Search drop size abruptly

We have a social like app, and we started using the AWS ElasticcSearch Service in production, but we started to have a problem with ES, the ES version is the 2.3.
The cluster configuration is:
Data node: 2
Data node types: m3.medium.elasticsearch
Dedicated master instance count: 3
Dedicated master instance type: t2.small.elasticsearch.
Capacity of each data node: 50GB.
The problem is that in less than thirty minutes one of the node free storage size went from 9 GB to 0 GB, we did not know how this happened.
We have 4 types of documents, where one of them is a dynamic type, lets call it Group type, that is because every document of Group can have N fields that represents the friends of a Group.
Something like
{
13: [1,2,3,4],
5: [1,3,4],
user_ids: [1,2,3,4,6,7],
id: 1
}
This means that the users with ID 13 and 5 are friends with some of the users of the Group with ID 1.
So this document can grows according to the amount of users.
If anyone had or has the same problem, or just fully understand the Elastic Search architecture it would be awesome his help.
Indices info:
curl -XGET 'http://host/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana-4 1 1 5 0 1.9mb 1017.3kb
green open X 1 1 2259502 29575 57.5gb 28.7gb
green open Y 1 1 113156 0 21.7mb 10.8mb
curl -XGET 'http://host/_cat/nodes?v&h=host,id,ip,rp,hp,d,cpu,v,r,m,n
host id ip rp hp d cpu v r m n
x.x.x.x tIgm x.x.x.x 95 5 5.7gb 0 2.3.2 - m Shatter
x.x.x.x puUF x.x.x.x 95 6 5.7gb 0 2.3.2 - m Justice
x.x.x.x 1qZi x.x.x.x 97 54 17.7gb 7 2.3.2 d - Allatou
x.x.x.x lcty x.x.x.x 97 60 17.7gb 8 2.3.2 d - Amergin
x.x.x.x Nq1H x.x.x.x 5 15 5.7gb 0 2.3.2 - * Arkus
Thanks a lot!
I have managed to resolve the problem.
My problem is known as Mapping Explosion
Having variables keys in the mapping, like I had in the Group document type, will result on an evergrowing index.

Way to get SCSI disk names in Linux C++ application

In my Linux C++ application I want to get names of all SCSI disks which are present on the
system. e.g. /dev/sda, /dev/sdb, ... and so on.
Currently I am getting it from the file /proc/scsi/sg/devices output using below code:
host chan SCSI id lun type opens qdepth busy online
0 0 0 0 0 1 128 0 1
1 0 0 0 0 1 128 0 1
1 0 0 1 0 1 128 0 1
1 0 0 2 0 1 128 0 1
// If SCSI device Id is > 26 then the corresponding device name is like /dev/sdaa or /dev/sdab etc.
if (MAX_ENG_ALPHABETS <= scsiId)
{
// Device name order is: aa, ab, ..., az, ba, bb, ..., bz, ..., zy, zz.
deviceName.append(1, 'a'+ (char)(index / MAX_ENG_ALPHABETS) - 1);
deviceName.append(1, 'a'+ (char)(index % MAX_ENG_ALPHABETS));
}
// If SCSI device Id is < 26 then the corresponding device name is liek /dev/sda or /dev/sdb etc.
else
{
deviceName.append(1, 'a'+ index);
}
But the file /proc/scsi/sg/devices also contains the information about the disk which were previously present on the system. e.g If I detach the disk (LUN) /dev/sdc from the system
the file /proc/scsi/sg/devices still contains info of /dev/sdc which is invalid.
Tell me is there any different way to get the SCSI disk names? like a system call?
Thanks
You can simply read list of all files like /dev/sd* (in C, you would need to use opendir/readdir/closedir) and filter it by sdX (where X is one or two letters).
Also, you can get list of all partitions by reading single file /proc/partitions, and then filter 4th field by sdX:
$ cat /proc/partitions
major minor #blocks name
8 0 52428799 sda
8 1 265041 sda1
8 2 1 sda2
8 5 2096451 sda5
8 6 50066541 sda6
which would give you list of all physical disks together with their capacity (3rd field).
After get disk name list from /proc/scsi/sg/devices, you can verify the existence through code. For example, install sg3-utils, and use sg_inq to query whether the disk is active.

Tcl: match string only if it is followed by a number [regex]

Given that I have the following output :
Loopback1 is up, line protocol is up
Hardware is Loopback
Description: ** NA4-ISIS-MGMT-LOOPBACK1_MPLS **
Internet address is 84.116.226.27/32
MTU 1514 bytes, BW 8000000 Kbit, DLY 5000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation LOOPBACK, loopback not set
Keepalive set (10 sec)
Last input 12w3d, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
6 packets output, 456 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
How can I match "Loopback1" and not "Loopback" ?
In other words, how can I match the interface name only if there is a number next to it, in Tcl ?
use lookahead
Loopback(?=\d+)
It matches only Loopback in Loopback followed by any number of digits. If you want to match loopback and the number, useLoopback\d+

Speed up database inserts from ORM

I have a Django view which creates 500-5000 new database INSERTS in a loop. Problem is, it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to use MySQL on lesser hardware (smaller EC2 instance) and never had these types of speed issues.
Details:
Postgres 8.3 on Ubuntu Server 9.04.
Server is a "large" Amazon EC2 with database on EBS (ext3) - 11GB/20GB.
Here is some of my postgresql.conf -- let me know if you need more
shared_buffers = 4000MB
effective_cache_size = 7128MB
My python:
for k in kw:
k = k.lower()
p = ProfileKeyword(profile=self)
logging.debug(k)
p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword':k,})
if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
#checking created is just a small optimization to save some database hits on new keywords
pass #duplicate entry
else:
p.save()
Some output from top:
top - 16:56:22 up 21 days, 20:55, 4 users, load average: 0.99, 1.01, 0.94
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.2%sy, 0.0%ni, 90.5%id, 0.7%wa, 0.0%hi, 0.0%si, 2.8%st
Mem: 15736360k total, 12527788k used, 3208572k free, 332188k buffers
Swap: 0k total, 0k used, 0k free, 11322048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14767 postgres 25 0 4164m 117m 114m S 22 0.8 2:52.00 postgres
1 root 20 0 4024 700 592 S 0 0.0 0:01.09 init
2 root RT 0 0 0 0 S 0 0.0 0:11.76 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0 0.0 0:00.08 events/0
6 root 11 -5 0 0 0 S 0 0.0 0:00.00 khelper
7 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
9 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenbus
18 root RT -5 0 0 0 S 0 0.0 0:11.84 migration/1
19 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/1
Let me know if any other details would be helpful.
One common reason for slow bulk operations like this is each insert happening in its own transaction. If you can get all of them to happen in a single transaction, it could go much faster.
Firstly, ORM operations are always going to be slower than pure SQL. I once wrote an update to a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL the whole thing ran in less than a minute.
Secondly, bear in mind that your code here is doing up to four separate database operations for every row in your data set - the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That's a lot of database access.
Bearing in mind that a maximum of 5000 objects is not huge, you should be able to read the whole dataset into memory at the start. Then you can do a single filter to get all the existing Keyword objects in one go, saving a huge number of queries in the Keyword get_or_create and also avoiding the need to instantiate duplicate ProfileKeywords in the first place.