Vim Sublist operations - regex

I'm trying to create a script that counts how often each distinct character occurs in a selection.
For example:
a = 4 (the character "a" occurs 4 times in the selection)
b = 2
e = 10
\ = 2
etc.
To obtain this, I created a list of sublists like this:
[['a', 1], ['b', 1], ['e', 1], ['\', 1]] --> etc
(a = the character // 1 = the number of times the character is found in the text)
What I don't know is:
How do I search in a sublist? E.g., can I check whether there is an "e" or "\" in the list?
When there is a match for "e", how can I add 1 to the number after the "e"?
[['e', 1]] --> [['e', 2]]
And how can I search the sublists with a regex and print the matches with an echo command?
E.g., search for [a-f] and obtain this output:
a = 1
b = 1
e = 2
c, d, and f are not found in the list and have to be skipped.
By the way, does anyone know where I can find good documentation about sublists?
(I can't find much information about sublists in the vim docs).

If I understand your problem correctly, the right data structure is a Dictionary mapping the character to the number of occurrences, not a list.
let occurrences = { 'a': 1, 'b': 1, 'e': 1, '\': 1 }
You can check for containment via has_key(occurrences, 'a'), and increment via let occurrences['a'] += 1. To print the results, use:
for char in keys(occurrences)
    echo char occurrences[char] "times"
endfor
And you can use the powerful map() and filter() functions on the Dictionary. For example, to only include characters a-f:
echo filter(copy(occurrences), 'v:key =~# "[a-f]"')
Read more at :help Dictionary.
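For readers who want to see the same idea outside Vim, here is a minimal sketch in Python (an illustration only, not Vimscript). The sample text and the helper names count_chars and filter_counts are hypothetical; the dictionary plays the role of occurrences above.

```python
import re

def count_chars(text):
    """Map each character in text to the number of times it occurs."""
    occurrences = {}
    for ch in text:
        if ch in occurrences:        # containment check, like has_key() in Vim
            occurrences[ch] += 1     # increment an existing count
        else:
            occurrences[ch] = 1      # first time this character is seen
    return occurrences

def filter_counts(occurrences, pattern):
    """Keep only the entries whose key matches the regex pattern."""
    return {ch: n for ch, n in occurrences.items() if re.fullmatch(pattern, ch)}

sample = r"a\bee"                    # hypothetical selection: a, \, b, e, e
counts = count_chars(sample)
for ch, n in sorted(filter_counts(counts, "[a-f]").items()):
    print(ch, "=", n)
```

Characters that never occur (c, d, f here) simply have no key, so they are skipped without any extra check.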

Related

Regex to find repeating numbers between other numbers

I have the following array and need two Regex filters that I want to use in PowerShell.
000111
010101
220220
123456
Filter 1: strings in which the digit 0 occurs three or more times.
I expect the following values after filtering:
000111
010101
Filter 2: strings in which any digit occurs three or more times.
I should only see these numbers:
000111
010101
220220
With 0{3,} I can only match consecutive zeros, so I get only:
000111
Is it possible to find repeated digits that have other digits between them?
Since you insist on seeing the solution as a regex, look at this: '(\d).*\1.*\1'
I think this is comprehensible without further explanation, isn't it?
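In case the backreference is not obvious: the group captures one digit, and each \1 requires that same digit to appear again later, with anything (or nothing) in between. A small sketch in Python rather than PowerShell, using the four sample strings from the question; the variable names are mine:

```python
import re

strings = ["000111", "010101", "220220", "123456"]

# Filter 2: capture any digit, then require it twice more further along.
any_digit_3x = re.compile(r"(\d).*\1.*\1")
# Filter 1: same shape, but the captured digit is fixed to 0.
zero_3x = re.compile(r"(0).*\1.*\1")

filter1 = [s for s in strings if zero_3x.search(s)]
filter2 = [s for s in strings if any_digit_3x.search(s)]

print(filter1)
print(filter2)
```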
Armali's helpful answer is short and to the point (use '(0).*\1.*\1' for filter 1), and definitely the best solution for the problem at hand, given that you only need to know in the abstract if a given string has 3 or more zeros / same digits.
The solutions below may be of interest if you need to know the specific count of 0s / digits, which, as far as I know, cannot be handled by a regex (alone).
Occurrence-counting variant of filter 1:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $zerosOnly = $_ -replace '[^0]'
  [pscustomobject] @{
    InputString  = $_
    CountOfZeros = $zerosOnly.Length
  }
})
That is, each string in the input array (enumerated via the intrinsic ForEach() method), has all chars. that aren't '0' ([^0]) removed via the regex-based -replace operator. The length of the resulting string is therefore equivalent to the count of zeros.
Output:
InputString CountOfZeros
----------- ------------
000111                 3
010101                 3
220220                 2
123456                 0
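The same strip-and-measure trick carries over to other regex engines. A rough Python equivalent, offered as a sketch only (the name zero_counts is hypothetical):

```python
import re

strings = ["000111", "010101", "220220", "123456"]

# Remove every character that is not '0'; the length of what is left
# is the number of zeros in the original string.
zero_counts = {s: len(re.sub(r"[^0]", "", s)) for s in strings}

for s, n in zero_counts.items():
    print(s, n)
```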
Occurrence-counting variant of filter 2:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $outputObject = [pscustomobject] @{ InputString = $_; DigitCounts = [ordered] @{} }
  ([char[]] $_ | Group-Object).ForEach({
    $outputObject.DigitCounts[$_.Name] = $_.Count
  })
  $outputObject
})
That is, each input string is grouped by its characters using Group-Object, whose output objects reflect the character at hand in the .Name property and the number of members of the group (i.e. the occurrence count for that character) in the .Count property. An ordered hashtable is used to report character-occurrence-count pairs.
Output:
InputString DigitCounts
----------- -----------
000111      {[0, 3], [1, 3]}
010101      {[0, 3], [1, 3]}
220220      {[0, 2], [2, 4]}
123456      {[1, 1], [2, 1], [3, 1], [4, 1]…}
E.g., {[0, 2], [2, 4]} in the output above means that the char. '0' occurs 2 times, and '2' 4 times in input string '220220'.
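For comparison, grouping equal characters and counting each group can be sketched in Python with itertools.groupby. This is an analogue of the Group-Object approach, not a translation of it, and digit_counts is a hypothetical helper name:

```python
from itertools import groupby

def digit_counts(s):
    """Sort the characters so equal ones are adjacent, then count each run."""
    return {ch: len(list(group)) for ch, group in groupby(sorted(s))}

for s in ["000111", "010101", "220220", "123456"]:
    print(s, digit_counts(s))
```

Sorting first matters, because groupby only merges adjacent equal items.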

How to get all numerical values from a list except numbers relating to certain Strings

I want to get all the numbers from my String except for the numbers that are related to the String pattern 'SPN'.
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
    *.trim() // remove white space from all the entries (note *)
    *.dropWhile { it ==~ /[^0-9 ]/ } // drop until you hit a char that isn't a letter or a space in the list
    .findAll { it[0] != 'SPN' } // if a group starts with SPN, drop it
assert splitted == [1, 2, 4]
This doesn't seem to do what I expect it to do; I think I am missing the re-collecting step.
You can use findResults which only collects elements that aren't null, so you can use it to filter AND transform at the same time:
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
    *.trim() // remove white space from all the entries (note *)
    *.split(/\s+/) // Split all the entries on whitespace
    .findResults { it[1] == 'SPN' ? null : it[0] as Integer }
assert splitted == [1, 2, 4]
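The filter-and-transform-in-one-pass idea is not specific to Groovy. As a sketch, here is the same pipeline in Python, where a comprehension with a condition plays the role of findResults; the variable names are mine:

```python
layout_str = "1 ABC, 2 DEF, 3 SPN, 4 GHI"

numbers = [
    int(number)                      # transform: keep the number as an int
    for number, label in (entry.split() for entry in layout_str.split(","))
    if label != "SPN"                # filter: skip entries labelled SPN
]

print(numbers)
```

This assumes every entry has exactly two whitespace-separated fields, as in the sample string.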

Dictionary: Alphabetize the elements of a list and count their occurrences

Hi so I've been trying to count the elements in the list that I have made, and when I do it
The result should be:
a 2
above 2
across 1
etc.
Here's what I've got:
word = []
with open('Lateralus.txt', 'r') as my_file:
    for line in my_file:
        temporary_holder = line.split()
        for i in temporary_holder:
            word.append(i)
for i in range(0,len(word)): word[i] = word[i].lower()
word.sort()
for count in word:
    if count in word:
        word[count] = word[count] + 1
    else:
        word[count] = 1
for (word,many) in word.items():
    print('{:20}{:1}'.format(word,many))
@Kimberly, as I understand from your code, you want to read a text file of alphabetic characters,
ignore case, and count the occurrences of each unique letter in the file.
I suggest using a dictionary for this. I have written sample code for this task that satisfies the following 3 conditions (please comment with inputs and expected outputs if you want a different result, and I will update my code accordingly):
It reads the text file and creates a single line of text by removing any spaces in between.
It converts upper case letters to lower case letters.
Finally, it creates a dictionary containing the unique letters with their frequencies.
» Lateralus.txt
abcdefghijK
ABCDEfgkjHI
IhDcabEfGKJ
mkmkmkmkmoo
pkdpkdpkdAB
A B C D F Q
ab abc ab c
» Code
import json

char_occurences = {}
with open('Lateralus.txt', 'r') as file:
    # Join all lines into one lowercase string with spaces and newlines removed
    all_lines_combined = ''.join([line.replace(' ', '').strip().lower() for line in file.readlines()])

print(all_lines_combined) # abcdefghijkabcdefgkjhiihdcabefgkjmkmkmkmkmoopkdpkdpkdababcdfqababcabc
print(len(all_lines_combined)) # 69 (7 lines of 11 characters, 8 spaces => 77-8 = 69)

# Count the first remaining character, then strip all of its copies,
# and repeat until the string is empty
while all_lines_combined:
    ch = all_lines_combined[0]
    char_occurences[ch] = all_lines_combined.count(ch)
    all_lines_combined = all_lines_combined.replace(ch, '')

# Pretty printing char_occurences dictionary containing occurences of
# alphabetic characters in a text file
print(json.dumps(char_occurences, indent=4))
"""
{
"a": 8,
"c": 6,
"b": 8,
"e": 3,
"d": 7,
"g": 3,
"f": 4,
"i": 3,
"h": 3,
"k": 10,
"j": 3,
"m": 5,
"o": 2,
"q": 1,
"p": 3
}
"""
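The expected output in the question (a 2, above 2, across 1) lists whole words rather than letters. If word-level counts are what is wanted, a minimal sketch could look like the following. It uses a short hypothetical sample string in place of Lateralus.txt, and collections.Counter is my choice here, not something from the question:

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of Lateralus.txt
text = "Above a cloud and across a field above"

# Lowercase everything, split on whitespace, and count each word
word_counts = Counter(text.lower().split())

# Print the words alphabetically with their counts
for word in sorted(word_counts):
    print('{:20}{:1}'.format(word, word_counts[word]))
```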

Can someone tell me what's wrong with my code? [Python 2.7.1]

import string
sentence = raw_input("Enter sentence:")
for i in string.punctuation:
    sentence = sentence.replace(i," ")
word_list = sentence.split()
word_list.sort(key=str.lower)
print word_list
for j in word_list:
    print j,":",word_list.count(j)
    word_list.remove(j)
When I use this code and type in a sample sentence, some of my words are not counted correctly:
Sample sentence: I, are.politics:wodng!"frail A P, Python. Python Python frail
output:
['A', 'are', 'frail', 'frail', 'I', 'P', 'politics', 'Python', 'Python', 'Python', 'wodng']
A : 1
frail : 2
I : 1
politics : 1
Python : 3
wodng : 1
What happened to the words "are" and "P"? I know the problem is happening in the last few lines but I don't know what's causing it.
Thanks!
The problem in your code is that you remove elements from the list while you are iterating over it.
I therefore suggest separating the iteration from the list by converting word_list into a set. You can then iterate over the set word_iter, which contains every word exactly once, and you no longer need to remove anything. The disadvantage is that the result is unordered, as sets are unordered, but you can put the result in a list and sort that afterwards:
import string
sentence = raw_input("Enter sentence:")
for i in string.punctuation:
    sentence = sentence.replace(i," ")
word_list = sentence.split()
word_list.sort(key=str.lower)
print word_list
result = []
word_iter = set(word_list)
for j in word_iter:
    print j, ':', word_list.count(j)
    result.append( (j, word_list.count(j)) )
result:
A : 1
wodng : 1
Python : 3
I : 1
P : 1
are : 1
frail : 2
politics : 1
At the end of your script, your list is not empty.
Each iteration removes a value, so the remaining elements shift left while the loop's position still advances, and one element is skipped each time.
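To make the skipping visible, here is a small sketch in Python 3 syntax (the thread itself uses Python 2) that starts from the sorted word list printed in the question. The names visited, buggy and counts are hypothetical:

```python
words = ['A', 'are', 'frail', 'frail', 'I', 'P', 'politics',
         'Python', 'Python', 'Python', 'wodng']

# Buggy pattern: removing from the list being iterated skips elements.
buggy = list(words)
visited = []
for w in buggy:
    visited.append(w)
    buggy.remove(w)

print(visited)   # the words the loop actually reached
print(buggy)     # the words left over, never fully processed

# Safer pattern: iterate over a separate, de-duplicated, sorted sequence
# and leave the original list untouched.
counts = [(w, words.count(w)) for w in sorted(set(words), key=str.lower)]
for w, n in counts:
    print(w, ":", n)
```

Sorting the set with key=str.lower also restores the case-insensitive ordering that a bare set loses.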

Split line based on regex in Julia

I'm interested in splitting a line using a regular expression in Julia. My input is a corpus in Blei's LDA-C format consisting of docId wordID : wordCNT. For example, a document with five words is represented as follows:
186 0:1 12:1 15:2 3:1 4:1
I'm looking for a way to aggregate words and their counts into separate arrays, i.e. my desired output:
words = [0, 12, 15, 3, 4]
counts = [1, 1, 2, 1, 1]
I've tried using m = match(r"(\d+):(\d+)",line). However, it only finds the first pair 0:1. I'm looking for something similar to Python's re.compile(r'[ :]').split(line). How would I split a line based on regex in Julia?
There's no need to use regex here; Julia's split function allows using multiple characters to define where the splits should occur:
julia> v = split(line, [':',' '])
11-element Array{SubString{String},1}:
"186"
"0"
"1"
"12"
"1"
"15"
"2"
"3"
"1"
"4"
"1"
julia> words = v[2:2:end]
5-element Array{SubString{String},1}:
"0"
"12"
"15"
"3"
"4"
julia> counts = v[3:2:end]
5-element Array{SubString{String},1}:
"1"
"1"
"2"
"1"
"1"
I discovered the eachmatch method that returns an iterator over the regex matches. An alternative solution is to iterate over each match:
words, counts = Int64[], Int64[]
for m in eachmatch(r"(\d+):(\d+)", line)
    wd, cnt = m.captures
    push!(words, parse(Int64, wd))
    push!(counts, parse(Int64, cnt))
end
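Since the question compares against Python, here is the corresponding match-iterator approach there, as a sketch: re.finditer plays the role of eachmatch, and the sample line is the one from the question.

```python
import re

line = "186 0:1 12:1 15:2 3:1 4:1"

words, counts = [], []
# Each match is one "wordID:wordCNT" pair; the leading docId has no colon
# after it, so it is never matched.
for m in re.finditer(r"(\d+):(\d+)", line):
    wd, cnt = m.groups()
    words.append(int(wd))
    counts.append(int(cnt))

print(words)
print(counts)
```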
As Matt B. mentions, there's no need for a Regex here as the Julia lib split() can use an array of chars.
However - when there is a need for Regex - the same split() function just works, similar to what others suggest here:
line = "186 0:1 12:1 15:2 3:1 4:1"
s = split(line, r":| ")
words = s[2:2:end]
counts = s[3:2:end]
I've recently had to do exactly that in some Unicode processing code, where the split characters were "combined characters" and thus not something that fits in Julia's single-quoted character literals. That means building the regex from strings:
split_chars = ["bunch","of","random","delims"]
line = "line_with_these_delims_in_the_middle"
r_split = Regex( join(split_chars, "|") )
split( line, r_split )
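The join-delimiters-into-an-alternation trick works the same way in other regex libraries. A sketch in Python, with hypothetical delimiters and input; the re.escape call is my addition, so that delimiters containing regex metacharacters are matched literally:

```python
import re

split_chars = ["->", "||", "::"]     # hypothetical multi-character delimiters
line = "a->b||c::d"                  # hypothetical input

# Escape each delimiter, then join them into one alternation pattern.
pattern = re.compile("|".join(re.escape(d) for d in split_chars))
parts = pattern.split(line)

print(parts)
```

If one delimiter is a prefix of another, sorting them longest-first before joining keeps the shorter one from matching first.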