Trying to extract values from text -

Trying to extract values from text - - regex

I am trying to get acc, accel and rx from below text only if accounting value is true.
dataRx: 21916 drx: 1743625
ota: 191791 orx: 74164489
dataDropped: 14 dropped:1134
id: 65535 waitress BE nginxid: 0 kbps: 0.000
accounting: false
drop : 1
rx : 48392 bytes: 483920
id: 65533 waitress BE nginxid: 1 kbps: 0.000
accounting: false
drop : 4
rx : 122914 bytes: 70081939
id: 4232 nginx BE nginxid: 3 kbps: 0.000
accounting: false
drop : 0
rx : 3084 bytes: 94357
id: 10482 server BE nginxid: 4 kbps: 0.000
accounting: false
drop : 0
rx : 15 bytes: 2477
id: 20344 serve BE nginxid: 10 kbps: 62914.560
accounting: true
drop : 2
rx : 2217 bytes: 309637
accel : 482 bytes: 264318
acc :349 bytes: 225181
Below python code gets accounting and accel values using below regex
accounting:\s*((?P<accounting>\S*)[\S\s]*?accel:[\S\s]*?bytes:\s*(?P<accel>\S*)[\S\s]*?)
for match in re.finditer(re_exp, text):
group = match.groupdict()
print group
Output:
{"accounting": false, "accel": 264318}
But, the expected output should be
{"accounting": true, "accel": 264318}
Need help with regex expression. Any help would be greatly appreciated.
Also, is there a regex way to group all the data fields under id?
Thanks

Try this
content = open("acc.txt",'r')
ar = content.read()
import re
getdata = re.findall(r"accounting: (true).+?accel.+?bytes:\s(\d+)",ar,re.S)
print getdata
Try as the follow for group the all data corresponding to their id
content = open("acc.txt",'r')
ar = content.readlines()
arv = []
flag = 0
m = ""
for j in ar:
if("id:"in j):
arv.append(m)
m = ""
flag = 1
if (flag == 1):
m+=j
for j in (arv):
print j

Iterate through lines one by one:
import re
rx = ""
accel = ""
acc = ""
lines = text.split("\n")
for i in range(len(lines)):
line = lines[i]
if line == "accounting: true":
match = re.search(r"rx\s*:\s+(\d*)", lines[i+2])
if match:
rx = match.group(1)
match = re.search(r"accel\s*:\s+(\d*)", lines[i+3])
if match:
accel = match.group(1)
match = re.search(r"acc\s*:\s*(\d*)", lines[i+4])
if match:
acc = match.group(1)
print("'accounting': true, 'rx': {}, 'accel': {}, 'acc': {}".format(rx, accel, acc))

Related

Find starting and ending index of each unique charcters in a string in python

I have a string with characters repeated. My Job is to find starting Index and ending index of each unique characters in that string. Below is my code.
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
mo = re.search(item,x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
Output :
a 0 1
b 3 4
c 7 8
Here the end index of the characters are not correct. I understand why it's happening but how can I pass the character to be matched dynamically to the regex search function. For instance if I hardcode the character in the search function it provides the desired output
x = 'aabbbbccc'
xs = set(x)
mo = re.search("[b]+",x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
output:
b 2 5
The above function is providing correct result but here I can't pass the characters to be matched dynamically.
It will be really a help if someone can let me know how to achieve this any hint will also do. Thanks in advance

String literal formatting to the rescue:
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
# for patterns better use raw strings - and format the letter into it
mo = re.search(fr"{item}+",x) # fr and rf work both :) its a raw formatted literal
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n) # fix upper limit by n-1
Output:
a 0 3 # you do see that the upper limit is off by 1?
b 3 7 # see above for fix
c 7 9
Your pattern does not need the [] around the letter - you are matching just one anyhow.
Without regex1:
x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder
for idx,ch in enumerate(x[1:],1):
if last_ch == ch:
continue
else:
print(last_ch,start_idx, idx-1)
last_ch = ch
start_idx = idx
print(ch,start_idx,idx)
output:
a 0 2 # not off by 1
b 3 6
c 7 8
1RegEx: And now you have 2 problems...

Looking at the output, I'm guessing that another option would be,
import re
x = "aaabbbbcc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
print(output)
Output
a 0 3
b 3 7
c 7 9
I think it'll be in the Order of N, you can likely benchmark it though, if you like.
import re, time
timer_on = time.time()
for i in range(10000000):
x = "aabbbbccc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
timer_off = time.time()
timer_total = timer_off - timer_on
print(timer_total)

load multiple csv files into Dataframe: columns names issue

I have multiple csv files with the same format (14 rows 4 columns).
I tried to load all of them into a single dataFrame, and use file's name to rename the values of the first column (1-14)
1 500 0 0
2 350 0 1
3 500 1 0
.............
13 600 0 0
14 800 0 0
I tried the following code but I am not getting what I am expecting:
filenames = os.listdir('Threshold/')
Y = pd.DataFrame () #empty df
# file name are in the following foramt "subx_ICA_thre.csv"
# need to get x (subject number to be used later for renaming columns values)
Sub_list=[]
for filename in filenames:
s= int(''.join(filter(str.isdigit, filename)))
Sub_list.append(int(s))
S_Sub_list= sorted(Sub_list)
for x in S_Sub_list: # get the file according to the subject number
temp = pd.read_csv('sub' +str(x)+'_ICA_thre.csv' )
df = pd.concat([Y, temp]) # concat the obtained frame with the empty frame
df.columns = ['id', 'data', 'isEB', 'isEM']
# replace the column values using subject id
for sub in range(1,15):
df['id'].replace(sub, 'sub' +str(x)+'_ICA_'+str(sub) ,inplace=True)
print (df)
output:
id data isEB isEM
0 sub1_ICA_2 200 0 0
1 sub1_ICA_3 275 0 0
2 sub1_ICA_4 500 1 0
................................
11 sub1_ICA_13 275 0 0
12 sub1_ICA_14 300 0 0
id data isEB isEM
0 sub2_ICA_2 275 0 0
1 sub2_ICA_3 500 0 0
2 sub2_ICA_4 400 0 0
.................................
11 sub2_ICA_13 300 0 0
12 sub2_ICA_14 450 0 0
First, it seems that the code makes different dataFrame not a single one.Second, the first row is removed (sub1_ICA_1 is missing, may be replaced with column names).
I couldn't find the problem in the loop that I am using

I think you need create list of DataFrames first, then concat with parameter keys for new values by range in MultiIndex, then modify column id and last remove MultiIndex by reset_index:
Also was added parameter names to read_csv for custom columns names.
Y = []
for x in S_Sub_list:
n = ['id', 'data', 'isEB', 'isEM']
temp = pd.read_csv('sub' + str(x) +'_ICA_thre.csv', names = n)
Y.append(temp)
#list comprehension alternative
#n = ['id', 'data', 'isEB', 'isEM']
#Y = [pd.read_csv('sub' + str(x) +'_ICA_thre.csv', names = n) for x in S_Sub_list]
df = pd.concat(Y, keys=range(1,len(S_Sub_list) + 1))
df['id'] = 'sub' + df.index.get_level_values(0).astype(str) +'_ICA_'+ df['id'].astype(str)
df = df.reset_index(drop=True)

Need help in re module of python

I'm trying to use the re module to parse through a file. I tried three version of the code, first 2 version's are not retrieving any O/P. The third version is retrieving only one line. Can someone please have a look?
Version1:
import re
file = open('sample.txt', 'r')
x = file.readline()
while x:
var = re.findall(r'(?:\*|\*>)\s+(\d+.\d+.\d+.\d+\/\d+\s+)?(\S+)\s+\d+\s+(\d+\s+.+)[ie]',x)
x = file.readline()
print(var)
file.close()
Version2:
import re
file = open('sample.txt', 'r')
x = file.read()
var = re.findall(r'(?:\*|\*>)\s+(\d+.\d+.\d+.\d+\/\d+\s+)?(\S+)\s+\d+\s+(\d+\s+.+)[ie]',x)
print(var)
file.close()
Version3:
import re
file = open('sample.txt', 'r')
x = file.readline()
while x:
var = re.search(r'(?:\*|\*>)\s+(\d+.\d+.\d+.\d+\/\d+\s+)?(\S+)\s+\d+\s+(\d+\s+.+)[ie]',x, re.M)
x = file.readline()
print(var.group(0))
file.close()
The data in sample.txt is as below. The network is blank after first line, and when I'm running individually these statements on python shell the regex is working.
Oregon Exchange BGP Route Viewer
route-views.oregon-ix.net / route-views.routeviews.org
This hardware is part of a grant by the NSF.
Please contact help#routeviews.org if you have questions, or
if you wish to contribute your view.
Network Next Hop Metric LocPrf Weight Path
* 64.48.0.0/16 173.205.57.234 0 53364 3257 2828 i
* 202.232.0.2 0 2497 2828 i
* 93.104.209.174 0 58901 51167 1299 2828 i
* 193.0.0.56 0 3333 2828 i
* 103.197.104.1 0 134708 3491 2828 i
* 132.198.255.253 0 1351 6939 2828 i

I think this will do the trick:
import re
thelist = [
"* 64.48.0.0/16 173.205.57.234 0 53364 3257 2828 i",
"* 93.104.209.174 0 58901 51167 1299 2828 i",
"* 193.0.0.56 0 3333 2828 i",
]
regex = re.compile("\*\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\/\d{1,3})?\s+([\S]+)\s+([^i]+i)")
for text in thelist:
match = re.search(regex, text)
if match:
print ("yuppers")
print (match.group(1))
print (match.group(2))
print (match.group(3))
print ("\n")
else:
print ("nope")
results
yuppers
64.48.0.0/16
173.205.57.234
0 53364 3257 2828 i
yuppers
None
93.104.209.174
0 58901 51167 1299 2828 i
yuppers
None
193.0.0.56
0 3333 2828 i
Depending on what you actually want to do with each, you can just fiddle with the results. The regex makes the network optional which might have been tripping you up. Hopefully this gets you in the right direction!

How to calculate the number of occurrence of a given character in each row of a column of strings?

I have a data.frame in which certain variables contain a text string. I wish to count the number of occurrences of a given character in each individual string.
Example:
q.data<-data.frame(number=1:3, string=c("greatgreat", "magic", "not"))
I wish to create a new column for q.data with the number of occurence of "a" in string (ie. c(2,1,0)).
The only convoluted approach I have managed is:
string.counter<-function(strings, pattern){
counts<-NULL
for(i in 1:length(strings)){
counts[i]<-length(attr(gregexpr(pattern,strings[i])[[1]], "match.length")[attr(gregexpr(pattern,strings[i])[[1]], "match.length")>0])
}
return(counts)
}
string.counter(strings=q.data$string, pattern="a")
number string number.of.a
1 1 greatgreat 2
2 2 magic 1
3 3 not 0

The stringr package provides the str_count function which seems to do what you're interested in
# Load your example data
q.data<-data.frame(number=1:3, string=c("greatgreat", "magic", "not"), stringsAsFactors = F)
library(stringr)
# Count the number of 'a's in each element of string
q.data$number.of.a <- str_count(q.data$string, "a")
q.data
# number string number.of.a
#1 1 greatgreat 2
#2 2 magic 1
#3 3 not 0

If you don't want to leave base R, here's a fairly succinct and expressive possibility:
x <- q.data$string
lengths(regmatches(x, gregexpr("a", x)))
# [1] 2 1 0

nchar(as.character(q.data$string)) -nchar( gsub("a", "", q.data$string))
[1] 2 1 0
Notice that I coerce the factor variable to character, before passing to nchar. The regex functions appear to do that internally.
Here's benchmark results (with a scaled up size of the test to 3000 rows)
q.data<-q.data[rep(1:NROW(q.data), 1000),]
str(q.data)
'data.frame': 3000 obs. of 3 variables:
$ number : int 1 2 3 1 2 3 1 2 3 1 ...
$ string : Factor w/ 3 levels "greatgreat","magic",..: 1 2 3 1 2 3 1 2 3 1 ...
$ number.of.a: int 2 1 0 2 1 0 2 1 0 2 ...
benchmark( Dason = { q.data$number.of.a <- str_count(as.character(q.data$string), "a") },
Tim = {resT <- sapply(as.character(q.data$string), function(x, letter = "a"){
sum(unlist(strsplit(x, split = "")) == letter) }) },
DWin = {resW <- nchar(as.character(q.data$string)) -nchar( gsub("a", "", q.data$string))},
Josh = {x <- sapply(regmatches(q.data$string, gregexpr("g",q.data$string )), length)}, replications=100)
#-----------------------
test replications elapsed relative user.self sys.self user.child sys.child
1 Dason 100 4.173 9.959427 2.985 1.204 0 0
3 DWin 100 0.419 1.000000 0.417 0.003 0 0
4 Josh 100 18.635 44.474940 17.883 0.827 0 0
2 Tim 100 3.705 8.842482 3.646 0.072 0 0

Another good option, using charToRaw:
sum(charToRaw("abc.d.aa") == charToRaw('.'))

The stringi package provides the functions stri_count and stri_count_fixed which are very fast.
stringi::stri_count(q.data$string, fixed = "a")
# [1] 2 1 0
benchmark
Compared to the fastest approach from #42-'s answer and to the equivalent function from the stringr package for a vector with 30.000 elements.
library(microbenchmark)
benchmark <- microbenchmark(
stringi = stringi::stri_count(test.data$string, fixed = "a"),
baseR = nchar(test.data$string) - nchar(gsub("a", "", test.data$string, fixed = TRUE)),
stringr = str_count(test.data$string, "a")
)
autoplot(benchmark)
data
q.data <- data.frame(number=1:3, string=c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
test.data <- q.data[rep(1:NROW(q.data), 10000),]

A variation of https://stackoverflow.com/a/12430764/589165 is
> nchar(gsub("[^a]", "", q.data$string))
[1] 2 1 0

I'm sure someone can do better, but this works:
sapply(as.character(q.data$string), function(x, letter = "a"){
sum(unlist(strsplit(x, split = "")) == letter)
})
greatgreat magic not
2 1 0
or in a function:
countLetter <- function(charvec, letter){
sapply(charvec, function(x, letter){
sum(unlist(strsplit(x, split = "")) == letter)
}, letter = letter)
}
countLetter(as.character(q.data$string),"a")

You could just use string division
require(roperators)
my_strings <- c('apple', banana', 'pear', 'melon')
my_strings %s/% 'a'
Which will give you 1, 3, 1, 0. You can also use string division with regular expressions and whole words.

The question below has been moved here, but it seems this page doesn't directly answer to Farah El's question.
How to find number 1s in 101 in R
So, I'll write an answer here, just in case.
library(magrittr)
n %>% # n is a number you'd like to inspect
as.character() %>%
str_count(pattern = "1")
https://stackoverflow.com/users/8931457/farah-el

Yet another base R option could be:
lengths(lapply(q.data$string, grepRaw, pattern = "a", all = TRUE, fixed = TRUE))
[1] 2 1 0

The next expression does the job and also works for symbols, not only letters.
The expression works as follows:
1: it uses lapply on the columns of the dataframe q.data to iterate over the rows of the column 2 ("lapply(q.data[,2],"),
2: it apply to each row of the column 2 a function "function(x){sum('a' == strsplit(as.character(x), '')[[1]])}".
The function takes each row value of column 2 (x), convert to character (in case it is a factor for example), and it does the split of the string on every character ("strsplit(as.character(x), '')"). As a result we have a a vector with each character of the string value for each row of the column 2.
3: Each vector value of the vector is compared with the desired character to be counted, in this case "a" (" 'a' == "). This operation will return a vector of True and False values "c(True,False,True,....)", being True when the value in the vector matches the desired character to be counted.
4: The total times the character 'a' appears in the row is calculated as the sum of all the 'True' values in the vector "sum(....)".
5: Then it is applied the "unlist" function to unpack the result of the "lapply" function and assign it to a new column in the dataframe ("q.data$number.of.a<-unlist(....")
q.data$number.of.a<-unlist(lapply(q.data[,2],function(x){sum('a' == strsplit(as.character(x), '')[[1]])}))
>q.data
# number string number.of.a
#1 greatgreat 2
#2 magic 1
#3 not 0

Another base R answer, not so good as those by #IRTFM and #Finn (or as those using stringi/stringr), but better than the others:
sapply(strsplit(q.data$string, split=""), function(x) sum(x %in% "a"))
q.data<-data.frame(number=1:3, string=c("greatgreat", "magic", "not"))
q.data<-q.data[rep(1:NROW(q.data), 3000),]
library(rbenchmark)
library(stringr)
library(stringi)
benchmark( Dason = {str_count(q.data$string, "a") },
Tim = {sapply(q.data$string, function(x, letter = "a"){sum(unlist(strsplit(x, split = "")) == letter) }) },
DWin = {nchar(q.data$string) -nchar( gsub("a", "", q.data$string, fixed=TRUE))},
Markus = {stringi::stri_count(q.data$string, fixed = "a")},
Finn={nchar(gsub("[^a]", "", q.data$string))},
tmmfmnk={lengths(lapply(q.data$string, grepRaw, pattern = "a", all = TRUE, fixed = TRUE))},
Josh1 = {sapply(regmatches(q.data$string, gregexpr("g",q.data$string )), length)},
Josh2 = {lengths(regmatches(q.data$string, gregexpr("g",q.data$string )))},
Iago = {sapply(strsplit(q.data$string, split=""), function(x) sum(x %in% "a"))},
replications =100, order = "elapsed")
test replications elapsed relative user.self sys.self user.child sys.child
4 Markus 100 0.076 1.000 0.076 0.000 0 0
3 DWin 100 0.277 3.645 0.277 0.000 0 0
1 Dason 100 0.290 3.816 0.291 0.000 0 0
5 Finn 100 1.057 13.908 1.057 0.000 0 0
9 Iago 100 3.214 42.289 3.215 0.000 0 0
2 Tim 100 6.000 78.947 6.002 0.000 0 0
6 tmmfmnk 100 6.345 83.487 5.760 0.003 0 0
8 Josh2 100 12.542 165.026 12.545 0.000 0 0
7 Josh1 100 13.288 174.842 13.268 0.028 0 0

The easiest and the cleanest way IMHO is :
q.data$number.of.a <- lengths(gregexpr('a', q.data$string))
# number string number.of.a`
#1 1 greatgreat 2`
#2 2 magic 1`
#3 3 not 0`

s <- "aababacababaaathhhhhslsls jsjsjjsaa ghhaalll"
p <- "a"
s2 <- gsub(p,"",s)
numOcc <- nchar(s) - nchar(s2)
May not be the efficient one but solve my purpose.

vbscript reading Cisco switch interfaces

Trying to create a script that will send a 'sh run | b interface' to a Cisco switch. Write the output to an array. Split that array with a vbcr so each line of the config is in a sep elemant of the array.
I have tried to skin the cat many ways and still I am struggling.
Logic in English:
Send command to Cisco device
Capture the output to an array
define expected lines 'This are lines that are required under each 'interface' of the switch
Match the 'interface' name and corresponding number and write it to a file.
Check under that interface for the specific lines in the expected
If it finds it, write the line & ", YES"
If it does not find it, write the line & ", NO"
Keep doing this until you do not find any more '^interface\s[FG][a-z].+'
Output should look like this:
Interface GigabitEthernet 0/2
spanning-tree portfast, YES
This is the sample code that is failing:
'These are the expected line (not being compared in the script below but is my intention to have it compare the matched elements)
Dim vExpectedINT(4)
vExpectedINT(0) = "spanning-tree portfast"
vExpectedINT(1) = "switchport access vlan 17"
vExpectedINT(2) = "switchport mode access"
vExpectedINT(3) = "ip mtu 1400"
'objStream.Write "######################################################### " & vbcrlf
'objStream.Write "# I N T E R F A C E # " & vbcrlf
'objStream.Write "######################################################### " & vbcrlf
nCount = 0
vConfigLines = Split(strResultsINT, vbcr)
Set re = new RegExp
re.Global = False
re.IgnoreCase = True
re.Multiline = False
re.Pattern = "^interface [FG]"
' Regex Ex Definition
Set re2 = new RegExp
re2.Global = False
re2.IgnoreCase = True
re2.Multiline = False
re2.Pattern = "\sspanning-tree\sportfast"
' Regex Ex Definition
Set re3 = new RegExp
re3.Global = False
re3.IgnoreCase = True
re3.Multiline = False
re3.Pattern = "ip\smtu\s1400"
Set re4 = new RegExp
re4.Global = False
re4.IgnoreCase = True
re4.Multiline = False
re4.Pattern = "!"
' Compares the information
x = 1
Do While x <= Ubound(vConfigLines) - 1 do
MsgBox chr(34) & strLine & chr(34)
If re.Test(vConfigLines(x)) Then
' Write data to not expected section
x=x+1
do
If ! re4.Test(vConfigLines(x)) Then
MsgBox vConfigLines(x)
'objStream.Write vConfigLines(x) & vbcr
elseif re2.Test(vConfigLines(x)) Then
MsgBox vConfigLines(x)
elseif re3.Test(vConfigLines(x)) Then
MsgBox vConfigLines(x)
else
exit do
end if
x=x+1
loop
end IF
End If
Loop
This is a sample of the vConfigLines output:
There could be 48+ port per switch.
interface FastEthernet1/0/1
switchport access vlan 127
switchport mode access
switchport voice vlan 210
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 0 3 0 0
priority-queue out
mls qos trust cos
auto qos voip trust
spanning-tree portfast
!
interface FastEthernet1/0/2
switchport access vlan 127
switchport mode access
switchport voice vlan 210
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 0 3 0 0
priority-queue out
mls qos trust cos
auto qos voip trust
spanning-tree portfast
!
interface FastEthernet1/0/3
switchport access vlan 127
switchport mode access
switchport voice vlan 210
srr-queue bandwidth share 10 10 60 20
srr-queue bandwidth shape 0 3 0 0
priority-queue out
mls qos trust cos
auto qos voip trust
spanning-tree portfast

When facing a difficult and complex task, just follow these rules:
Divide the task in independently solvable subproblems
getting the info from Cisco
processing the resulting file
gather interesting info
output
Concentrate on the difficult subtask(s)
processing the resulting file
Solve a simplified but generalized version of (each) subtask using handmade data
for easy testing
You have items and are interested in whether they (don't) have given properties
Data to play with:
Item 0 (both props)
prop_a
prop_b
!
Item 1 (just b)
prop_b
!
Item 2 (a only)
prop_a
!
Item 3 (none)
!
Item 4 (irrelevant prop)
prop_c
!
Item 5 (Richy)
prop_c
prop_b
prop_a
!
Item 6 (Junky)
junk
prop_b
whatever
!
#Item 7 (Nasty)
# prop_a_like_but_not_prop_a
# prop_b
#!
Keep it simple
don't do more than absolutely necessary
don't use variables/components you can do without
So let's start:
You have to deal with a text file (lines). So don't do more than
Dim tsIn : Set tsIn = goFS.OpenTextFile("..\data\TheProblem.txt")
Dim sLine
Do Until tsIn.AtEndOfStream
sLine = Trim(tsIn.ReadLine())
If "" <> sLine Then
End If
Loop
tsIn.Close
90 % of the code using Split on .ReadAll is just fat. Yes, it's Do Until tsIn.AtEndOfStream and not Do While tsIn.AtEndOfStream = False. No Set tsIn = Nothing,
please.
The data is organized in blocks (Item n ... !), so make sure you
recognize the parts and know what to do when finding them:
Dim tsIn : Set tsIn = goFS.OpenTextFile("..\data\TheProblem.txt")
Dim sItem : sItem = "Item"
Dim sEnd : sEnd = "!"
Dim sLine
Do Until tsIn.AtEndOfStream
sLine = Trim(tsIn.ReadLine())
If "" <> sLine Then
Select Case True
Case 1 = Instr(sLine, sItem)
WScript.Echo "Begin, note item (name)"
Case 1 = Instr(sLine, sEnd)
WScript.Echo "End, output info"
WScript.Echo "----------"
Case Else
WScript.Echo "Middle, gather info"
End Select
End If
Loop
tsIn.Close
output:
Begin, note item (name)
Middle, gather info
Middle, gather info
End, output info
----------
Begin, note item (name)
Middle, gather info
End, output info
----------
...
For each item the output should be:
name, property, yes|no
The easiest way to do that is
WScript.Echo Join(aData, ", ")
Joining beats concatenation, especially if you want to set/manipulate the
parts independently and/or to pre-set some of them in the beginning.
Dim aData : aData = Array( _
Array( "Item?", "prop_a", "NO") _
, Array( "Item?", "prop_b", "NO") _
)
Dim sLine, aTmp, nIdx
Do Until tsIn.AtEndOfStream
sLine = Trim(tsIn.ReadLine())
If "" <> sLine Then
Select Case True
Case 1 = Instr(sLine, sItem)
aTmp = aData
For nIdx = 0 To UBound(aTmp)
aTmp(nIdx)(0) = sLine
Next
Case 1 = Instr(sLine, sEnd)
For nIdx = 0 To UBound(aTmp)
WScript.Echo Join(aTmp(nIdx), ", ")
Next
WScript.Echo "----------"
Case Else
WScript.Echo "Middle, gather info"
End Select
End If
Loop
tsIn.Close
The output
...
Item 3 (none), prop_a, NO
Item 3 (none), prop_b, NO
...
shows that by setting sensible defaults (NO), this version of the script
deals correctly with items having none of the interesting properties.
So lets tackle the middle/Case Else part:
Case Else
For nIdx = 0 To UBound(aTmp)
If 1 = Instr(sLine, aTmp(nIdx)(1)) Then
aTmp(nIdx)(2) = "YES"
Exit For
End If
Next
output now:
Item 0 (both props), prop_a, YES
Item 0 (both props), prop_b, YES
----------
Item 1 (just b), prop_a, NO
Item 1 (just b), prop_b, YES
----------
Item 2 (a only), prop_a, YES
Item 2 (a only), prop_b, NO
----------
Item 3 (none), prop_a, NO
Item 3 (none), prop_b, NO
----------
Item 4 (irrelevant prop), prop_a, NO
Item 4 (irrelevant prop), prop_b, NO
----------
Item 5 (Richy), prop_a, YES
Item 5 (Richy), prop_b, YES
----------
Item 6 (Junky), prop_a, NO
Item 6 (Junky), prop_b, YES
----------
But what about Nasty:
#Item 7 (Nasty)
# prop_a_like_but_not_prop_a
# prop_b
#!
The simple Instr() will fail, if one property name is a prefix of
another. To prove that starting simple and add complexity later
is good strategy:
Dim sFSpec : sFSpec = "..\data\TheProblem.txt"
WScript.Echo goFS.OpenTextFile(sFSpec).ReadAll
Dim tsIn : Set tsIn = goFS.OpenTextFile(sFSpec)
Dim sItem : sItem = "Item"
Dim sEnd : sEnd = "!"
Dim aData : aData = Array( _
Array( "Item?", "prop_a", "NO") _
, Array( "Item?", "prop_b", "NO") _
)
Dim aRe : aRe = Array(New RegExp, New RegExp)
Dim nIdx
For nIdx = 0 To UBound(aRe)
aRe(nIdx).Pattern = "^" & aData(nIdx)(1) & "$"
Next
Dim sLine, aTmp
Do Until tsIn.AtEndOfStream
sLine = Trim(tsIn.ReadLine())
If "" <> sLine Then
Select Case True
Case 1 = Instr(sLine, sItem)
aTmp = aData
For nIdx = 0 To UBound(aTmp)
aTmp(nIdx)(0) = sLine
Next
Case 1 = Instr(sLine, sEnd)
For nIdx = 0 To UBound(aTmp)
WScript.Echo Join(aTmp(nIdx), ", ")
Next
WScript.Echo "----------"
Case Else
For nIdx = 0 To UBound(aTmp)
If aRe(nIdx).Test(sLine) Then
aTmp(nIdx)(2) = "YES"
Exit For
End If
Next
End Select
End If
Loop
tsIn.Close
output:
Item 0 (both props)
prop_a
prop_b
!
Item 1 (just b)
prop_b
!
Item 2 (a only)
prop_a
!
Item 3 (none)
!
Item 4 (irrelevant prop)
prop_c
!
Item 5 (Richy)
prop_c
prop_b
prop_a
!
Item 6 (Junky)
junk
prop_b
whatever
!
Item 7 (Nasty)
prop_a_like_but_not_prop_a
prop_b
!
Item 0 (both props), prop_a, YES
Item 0 (both props), prop_b, YES
----------
Item 1 (just b), prop_a, NO
Item 1 (just b), prop_b, YES
----------
Item 2 (a only), prop_a, YES
Item 2 (a only), prop_b, NO
----------
Item 3 (none), prop_a, NO
Item 3 (none), prop_b, NO
----------
Item 4 (irrelevant prop), prop_a, NO
Item 4 (irrelevant prop), prop_b, NO
----------
Item 5 (Richy), prop_a, YES
Item 5 (Richy), prop_b, YES
----------
Item 6 (Junky), prop_a, NO
Item 6 (Junky), prop_b, YES
----------
Item 7 (Nasty), prop_a, NO
Item 7 (Nasty), prop_b, YES
----------

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Trying to extract values from text - - regex

Related

Find starting and ending index of each unique charcters in a string in python

load multiple csv files into Dataframe: columns names issue

Need help in re module of python

How to calculate the number of occurrence of a given character in each row of a column of strings?

vbscript reading Cisco switch interfaces

Categories

Resources