Indent spaces to tabs - regex

I really have trouble reading code indented with spaces, so I use the Visual Studio Code editor to convert the indentation from spaces to tabs before I read it.
The problem is that a Rails project has a lot of files, so I have to repeat the same operation for each one. I therefore want to use Dir.glob to iterate over all of them, convert spaces to tabs, and overwrite the files. It is a terrible idea, but still...
Currently my String#spaces_to_tabs() method looks like this:
Code
# A method that works for now...
String.define_method(:spaces_to_tabs) do
  each_line.map do |x|
    match = x.match(/^([^\S\t\n\r]*)/)[0]
    m_len = match.length
    (m_len > 0 && m_len % 2 == 0) ? ?\t * (m_len / 2) + x[m_len .. -1] : x
  end.join
end
Which kind of works
Here's a test:
# Put some content that will get converted to tabs
content = <<~EOF << '# Hello!'
  def x
    'Hello World'
  end
  p x
  module X
    refine Array do
      define_method(:tally2) do
        uniq.reduce({}) { |h, x| h.merge!( x => count(x) ) }
      end
    end
  end
  using X
  [1, 2, 3, 4, 4, 4,?a, ?b, ?a].tally2
  p [1, 2, 3, 4, 4, 4,?a, ?b, ?a].tally2
  \r\r\t\t # Some invalid content
EOF
puts content.spaces_to_tabs
Output:
def x
	'Hello World'
end
p x
module X
	refine Array do
		define_method(:tally2) do
			uniq.reduce({}) { |h, x| h.merge!( x => count(x) ) }
		end
	end
end
using X
[1, 2, 3, 4, 4, 4,?a, ?b, ?a].tally2
p [1, 2, 3, 4, 4, 4,?a, ?b, ?a].tally2
		 # Some invalid content
# Hello!
Currently it does not:
Affect white-spaces (\t, \r, \n) other than spaces.
Affect the output of code, only converts spaces to tabs.
I can't use my editor because:
With Dir.glob (not included in this example), I can iterate over only the .rb, .js, .erb, .html, .css, and .scss files.
This is slow, but that is acceptable: at most I would have 1000 files (with the above extensions) of 1000 lines each, and more typically fewer than 100 files with a couple of hundred lines each. Even if the script takes 10 seconds, that is not a problem, since I only need to run it once per project.
Is there a better way to do it?
Edit
Here's the full code with globbing for converting all major files in rails:
#!/usr/bin/ruby -w
String.define_method(:bold) { "\e[1m#{self}" }
String.define_method(:spaces_to_tabs) do
  each_line.map do |x|
    match = x.match(/^([^\S\t\n\r]*)/)[0]
    m_len = match.length
    (m_len > 0 && m_len % 2 == 0) ? ?\t * (m_len / 2) + x[m_len .. -1] : x
  end.join
end
GREEN = "\e[38;2;85;160;10m".freeze
BLUE = "\e[38;2;0;125;255m".freeze
TURQUOISE = "\e[38;2;60;230;180m".freeze
RESET = "\e[0m".freeze
BLINK = "\e[5m".freeze
dry_test = ARGV.any? { |x| x[/^\-(\-dry\-test|d)$/] }
puts "#{TURQUOISE.bold}:: Info:#{RESET}#{TURQUOISE} Running in Dry Test mode. Files will not be changed.#{RESET}\n\n" if dry_test
Dir.glob("{app,config,db,lib,public}/**/**.{rb,erb,js,css,scss,html}").map do |y|
  if File.file?(y) && File.readable?(y)
    read = IO.read(y)
    converted = read.spaces_to_tabs
    unless read == converted
      puts "#{BLINK}#{BLUE.bold}:: Converting#{RESET}#{GREEN} indentation to tabs of #{y}#{RESET}"
      IO.write(y, converted) unless dry_test
    end
  end
end

If this is just an intellectual exercise about tab indentation algorithms, then fine. If you really have trouble viewing the files, use RuboCop. Its configuration options let you reformat the code and control both the kind of whitespace it indents with and the indentation width. I use it with Atom and atom-beautify, but I'm sure there is a plugin for VS Code too. https://docs.rubocop.org/rubocop/0.86/cops_layout.html#layoutindentationconsistency
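For comparison, the per-line rule from the question (turn every pair of leading spaces into one tab, and leave lines with an odd number of leading spaces alone) can be sketched as a single anchored regex substitution. This is a minimal Python sketch, not the asker's method: the name `spaces_to_tabs` is borrowed from the question, the `width` parameter is an assumption added for illustration, and unlike the Ruby pattern it matches plain spaces only.

```python
import re

# Leading run of plain spaces only; tabs, CR and LF are never matched.
_LEADING_SPACES = re.compile(r"^ +", re.MULTILINE)


def spaces_to_tabs(text, width=2):
    """Replace each `width` leading spaces with one tab; skip uneven runs."""
    def convert(match):
        run = match.group(0)
        if len(run) % width != 0:
            return run  # uneven indentation: leave the line untouched
        return "\t" * (len(run) // width)

    return _LEADING_SPACES.sub(convert, text)


sample = "def x\n  'Hello'\n   odd\n    deep\nend\n"
print(repr(spaces_to_tabs(sample)))
```

Because the pattern is anchored at the start of each line, spaces inside the code itself are never touched.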

Related

How get all numerical values from a list except numbers relating to certain Strings

I want to get all the numbers from my String except for the numbers that are related to the String pattern 'SPN'
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
*.trim() // remove white space from all the entries (note *)
*.dropWhile { it ==~ /[^0-9 ]/ } // drop until you hit a char that isn't a letter or a space in the list
.findAll { it[0] != 'SPN' } // if a group starts with SPN, drop it
assert splitted == [1, 2, 4]
This doesn't seem to do what I expect it to do, I think I am missing the re-collecting step
You can use findResults which only collects elements that aren't null, so you can use it to filter AND transform at the same time:
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
    *.trim() // remove white space from all the entries (note *)
    *.split(/\s+/) // Split all the entries on whitespace
    .findResults { it[1] == 'SPN' ? null : it[0] as Integer }
assert splitted == [1, 2, 4]

VIM padding with appropriate number of ",0" to get CSV file

I have a file containing numbers like
1, 2, 3
4, 5
6, 7, 8, 9,10,11
12,13,14,15,16
...
I want to create a CSV file by padding each line such that there are 6 values separated by 5 commas, so I need to add to each line an appropriate number of ",0". It shall look like
1, 2, 3, 0, 0, 0
4, 5, 0, 0, 0, 0
6, 7, 8, 9,10,11
12,13,14,15,16, 0
...
How would I do this with VIM?
Can I count the number of "," in a line with regular expressions and add the correct number of ",0" to each line with the substitute s command?
You can achieve that by typing this command:
:g/^/ s/^.*$/&,0,0,0,0,0,0/ | normal! 6f,D
You can add six zeros in all lines first, irrespective of how many numbers they have and then, you can delete everything from sixth comma till end in every line.
To append them to the end of every line,
:1,$ normal! A,0,0,0,0,0,0
To delete from sixth comma till end,
:1,$normal! ^6f,D
^ moves to first character in line(which is obviously a number here)
6f, finds comma six times
D delete from cursor to end of line
Example:
Original
1,2
3,6,7,0,0,0
4,5,6
11,12,13
After adding six zeroes,
1,2,0,0,0,0,0,0
3,6,7,0,0,0,0,0,0,0,0,0
4,5,6,0,0,0,0,0,0
11,12,13,0,0,0,0,0,0
After removing from the sixth comma to the end of the line,
1,2,0,0,0,0
3,6,7,0,0,0
4,5,6,0,0,0
11,12,13,0,0,0
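The same "append six zeros, then cut everything from the sixth comma onward" idea can be sketched outside Vim as well. This is a minimal Python illustration under stated assumptions: the function name `pad_row` and its `width` and `filler` parameters are hypothetical, and the input lines are assumed to have no trailing comma.

```python
def pad_row(line, width=6, filler="0"):
    """Append `width` fillers, then keep only the first `width` fields."""
    fields = line.strip().split(",") + [filler] * width
    return ",".join(fields[:width])


rows = ["1,2", "3,6,7,0,0,0", "4,5,6", "11,12,13"]
for row in rows:
    print(pad_row(row))
```

Over-padding and then truncating avoids having to count the commas on each line first, which is the same trade-off the two Vim commands make.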
With perl:
perl -lpe '$_ .= ",0" x (5 - tr/,//)' file.txt
With awk:
awk -v FS=, -v OFS=, '{ for(i = NF+1; i <= 6; i++) $i = 0 } 1' file.txt
With sed:
sed ':b /^\([^,]*,\)\{5\}/ b; { s/$/,0/; b b }' file.txt
As far as how to do this from inside Vim, you can also pipe text through external programs and it will replace the input with the output. That's an easy way to leverage sorting, deduping, grep-based filtering, etc, or some of Sato's suggestions. So, if you have a script called standardize_commas.py, try selecting your block with visual line mode (shift+v then select), and then typing something like :! python /tmp/standardize_commas.py. It should prepend a little bit to that string indicating that the command will run on the currently selected lines.
FYI, this was my /tmp/standardize_commas.py script:
import sys
max_width = 0
rows = []
for line in sys.stdin:
    line = line.strip()
    existing_vals = line.split(",")
    rows.append(existing_vals)
    max_width = max(max_width, len(existing_vals))
for row in rows:
    # pad every row with zeros up to the widest row seen
    zeros_needed = max_width - len(row)
    full_values = row + ["0"] * zeros_needed
    print(",".join(full_values))

vim - code folding by expression

I have some sourcecode with curly brackets code blocks
I want to be able to fold the blocks having some if condition in front, and leave the other code blocks unfolded.
example input:
print "this is a test"
if a == b {
    { x = 1
      y = 2
      z = 3
    }
    k = [1, 2, 3]
}
{ l = 5 }
return "foo"
expected output:
print "this is a test"
if a == b {
+-- 6 lines:
}
{ l = 5 }
return "foo"
I've read this and this, but still no idea how to face the problem.
Any suggestions ?
Assuming that the if closing '}' brace is at the beginning of a line, you can use:
:g/if.*{/+,/^}/-fold
This folds the statements within the {} braces of the if, excluding the braces themselves.
This is achieved through the + and - offsets placed after the patterns that define the range for each line matched by g (there's a comma between the patterns): + moves the start of the range one line down from the first matched pattern (/if.*{/), and - moves the end of the range one line up from the second matched pattern (/^}/).
If you have indented closing '}' braces or for any circumstance where the above command does not apply, you can try to look for other patterns that you can exploit and change the ex command above as needed.

As a beginning Pythoner I don't understand why I get an infinite loop with while?

This code always results in an infinite while loop:
pos1 = 0
pos2 = 0
url_string = '''<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'''
i = int(len(url_string))
#print i # debug
while i > 0:
    pos1 = int(url_string.find('>'))
    #print pos1 # debug
    pos2 = int(url_string.find('<', pos1))
    #print pos2 # debug
    url_string = url_string[pos2:]
    #print url_string # debug
    print int(len(url_string)) # debug
    i = int(len(url_string))
I tried everything without results.
More info:
Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
Ubuntu 13.10
Run in GNOME Terminal 3.6.1 (also tried in Emacs and PyCharm without a solution to the infinite loop problem)
pos1 = int(url_string.find('>'))
pos2 = int(url_string.find('<', pos1))
You're finding the first < that occurs after the first >. There won't always be a < after the first >. When find can't find a <, it'll return -1, and the following:
url_string = url_string[pos2:]
will use url_string[-1:], a slice consisting of the last character of url_string. At that point, Python keeps looping, not finding <, and taking the last character of url_string until you get bored and hit Ctrl+C.
It's not clear what the fix is, as it's not clear what you're even trying to do. You might use while i > 1; or you might switch > and < in the computation of pos1 and pos2, and use url_string = url_string[pos2+1:]; or you might do something else. It depends on the goal you're trying to achieve.
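The stuck state described above can be reproduced in isolation. This is a small sketch written with Python 3 `print()` calls (the question itself uses Python 2), starting from the last piece of the question's string:

```python
s = "</p>"
pos1 = s.find(">")        # 3: the only '>' is the last character
pos2 = s.find("<", pos1)  # -1: there is no '<' at or after index 3
print(pos1, pos2)

tail = s[pos2:]           # s[-1:] is the last character, never an empty string
print(repr(tail))

# Once the string is down to ">", every further pass leaves it unchanged,
# so its length stays at 1 and `while i > 0` never ends.
again = tail[tail.find("<", tail.find(">")):]
print(repr(again))
```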
It looks like you're trying to parse HTML to get data out of elements (e.g. I want the data inside the h1 tags, like 'Daily News '). If this is the case, I recommend using another library called BeautifulSoup4 at this link: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start
That said, since I'm not exactly sure what the program is meant to do, I broke down your code so that it's hopefully easier for you to see what's going on with the variables (and for now, took out the while loop). This will let you see exactly what your code has done without it running into an infinite loop.
# Setup Variables
pos1 = 0
pos2 = 0
url_string = '''<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'''
i = int(len(url_string)) # the url_string length is 60 characters
print "Setting up Variables with string at ", i, " characters"
print "String is: ", url_string
"""string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is
wholly contained in s[start:end]. Return -1 on failure. Defaults for start and
end and interpretation of negative values is the same as for slices.
Source: http://docs.python.org/2/library/string.html
"""
print "Running through program first time"
pos1 = int(url_string.find('>'))
# This finds the first occurrence of '>', which is at position 3
pos2 = int(url_string.find('<', pos1))
# This finds the first occurrence of '<' after position 3 ('>'),
# which is at position 15
print "Pos1 is at:", pos1, " and pos2 is at:", pos2
url_string = url_string[pos2:] # trimming string down?
print "The string is now: ", url_string
# </h1><p>This is the daily news.</p><p>end</p>
print "The string length is now: ", int(len(url_string)) # string length now 45
i = int(len(url_string)) # updating the length var to the new length
As pointed out above by @user2357112, you are never getting past the end of your string.
There are a few solutions, but one simple one (based on not really knowing what you are trying to achieve) would be to include the knowledge of pos1 and pos2 in your loop.
while i > 0 and pos1 >= 0 and pos2 >= 0:
If either of the characters you are looking for isn't found, then the loop will stop.
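A loop built on that idea might look like the following sketch. It assumes the goal is to collect the text between tags, which the question never states; the function name `text_between_tags` is hypothetical, and the code is written for Python 3.

```python
def text_between_tags(html):
    """Collect the text that sits between a '>' and the next '<'."""
    chunks = []
    pos = 0
    while True:
        start = html.find(">", pos)
        if start == -1:
            break  # no further tag ends
        end = html.find("<", start)
        if end == -1:
            break  # nothing opens after this '>': stop instead of looping
        if end > start + 1:
            chunks.append(html[start + 1:end])
        pos = end  # always moves forward, so the loop terminates
    return chunks


url_string = "<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>"
print(text_between_tags(url_string))
```

Tracking a position instead of re-slicing the string means a failed `find` can be detected directly, rather than silently turning into a negative slice index.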
It is just easier to split the string and count the number of letters like so:
map(len, url_string.split('<')) # This equals [0, 14, 4, 25, 3, 5, 3]
That's not quite what you want, though. You want the cumulative sum of this list. Get it like this:
import numpy as np
lens = np.cumsum( map(len, url_string.split('<')) )
Now we are not quite there yet. You also need to account for the '<' characters that were removed when the string was split, so add them back in, like so:
lens = lens + np.arange(len(lens))
This should work for single character splits.
Edit
As pointed out the requirement was to just extract the stuff which is not part of the tags. Then the one liner ...
''.join( map(lambda x: x.split('>')[-1] , url_string.split('<')) )
should do the job. Thanks for pointing that out!

Vim Sublist operations

I'm trying to create a script that counts how many times each character occurs in a selection.
e.g.
a = 4 (the character "a" is 4 times in the selection)
b = 2
e = 10
\ = 2
etc.
To obtain this, I created a list of sublists like this:
[['a', 1], ['b', 1], ['e', 1], ['\', 1]] --> etc
(a = the character // 1 = the number of times the character is found in the text)
What I don't know is:
how do I search in a sublist? e.g. can I check whether there is an "e" or "\" in the list?
when there is a match for "e", how can I add 1 to the number after the "e"?
[['e', 1]] --> [['e', 2]]
and how can I search the sublists with a regex and print the matches with an echo command?
e.g. search for [a-f] and obtain this output:
a = 1
b = 1
e = 2
c, d, f are not found in the list and have to be skipped.
Btw, does anyone know where I can find good documentation about sublists?
(I can't find much information about sublists in the vim docs).
If I understand your problem correctly, the right data structure is a Dictionary mapping the character to the number of occurrences, not a list.
let occurrences = { 'a': 1, 'b': 1, 'e': 1, '\': 1 }
You can check for containment via has_key(occurrences, 'a'), and increment via let occurrences['a'] += 1. To print the results use
for char in keys(occurrences)
    echo char occurrences[char] "times"
endfor
And you can use the powerful map() and filter() functions on the Dictionary. For example, to only include characters a-f:
echo filter(copy(occurrences), 'v:key =~# "[a-f]"')
Read more at :help Dictionary.