Alternating index list pop - list

I have a long list that I can simplify as below, and even using the function "re.sub" I can't remove the empty strings ''.
import os
import re
import pytesseract
from PIL import Image
overall_list = []
directory = '/content/drive/MyDrive/Colab Notebooks/S N'
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # OCR the image into one string
    imagestring = pytesseract.image_to_string(Image.open(f))
    string_lists = re.split('', imagestring, 1)
    print(string_lists)
    for x in string_lists:
        x = re.sub('\x0c', '', x)
        x = re.sub('[\n-\x0c]', ' ', x)
        x = re.sub('', '', x)
        overall_list.append(x)
print(overall_list)
The code above returns the text of each scanned image as an individual list:
['', 'N/S:10229876-5\n\x0c']
['', '192.1638.1 729.200\n\x0c']
['', '192.168.179.103 SPARE\n\x0c']
And "overall_list" is all of the above in one list:
['', 'N/S:10229876-5 ', '', '192.1638.1 729.200 ', '', '192.168.179.103 SPARE ']
But I ran out of ideas for cleaning the '' elements from this list. However, I noticed that they occur in an alternating pattern, so maybe I can use pop inside a for loop to delete one every time it appears.
How do I structure this loop for this particular goal?
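Since '' is the only string that evaluates as false, one possible way to drop these elements without a pop loop is a filtering list comprehension. If the alternating pattern always holds, slicing with a step of 2 gives the same result. A minimal sketch, using the overall_list from the question as sample data (the names cleaned and by_slice are illustrative only):

```python
# Sample data copied from the question
overall_list = ['', 'N/S:10229876-5 ', '', '192.1638.1 729.200 ',
                '', '192.168.179.103 SPARE ']

# Option 1: keep only the non-empty strings
cleaned = [x for x in overall_list if x]

# Option 2: rely on the alternating pattern and keep every second element
by_slice = overall_list[1::2]

print(cleaned)
print(by_slice)
```

Calling pop while looping over the same list shifts the remaining indices, which is why building a new list tends to be simpler here.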

Related

quill inserttext list, what format to use

Basically, I am trying to get only the Marketshare text to show as a bullet list item.
Instead, all the lines above Marketshare also show as a bullet list.
I struggle to understand what the (0,4) means in this.quill.formatText below, and using this.quill.format('list', false) doesn't turn the list off.
I have the same question about setting the size. I would like Marketshare to be bigger than the rest using insertText, but when I use a size of 20px as below, it doesn't work.
this.quill.format('list', false);
this.quill.insertText(0, '\n', '', true)
this.quill.formatText(0,4,'list', true);
this.quill.insertText(0, 'Marketshare (Max ' +
this.globals.MARKETSHAREPOTENTIAL + ' points)', {'size' : '20px'}, true)
this.quill.insertText(0, '\n', '', true)
this.quill.format('list', false);
this.quill.insertText(0, 'this text must be a bullet list', '', true)
this.quill.formatText(0,4,'list', true);
this.quill.insertText(0, 'this text I like to have in different font size or for instance as header 3 or header 4', 'bold', true)

Regex split on string containing hyphenated words

I have the following string:
test_string = '"abc" + "def" + "-xyz - rst"'
I am trying to split this string on the - or + operators only, while excluding hyphenated words from the regex split. I got this far:
In [205]: [n.strip() for n in re.split(r'[ ]{1}[-+]', test_string) if n != '']
Out[205]: ['"abc"', '"def"', '"-xyz', 'rst"']
I am expecting my result to be:
In [205]: [n.strip() for n in re.split(r'[ ]{1}[-+]', test_string) if n != '']
Out[205]: ['"abc"', '"def"', '"-xyz - rst"']
What am I missing? Thanks.
Consider using shlex:
import shlex
test_string = '"abc" + "def" + "-xyz - rst"'
# Parse the string into space-separated elements treating quotes as the shell does
# lone + and - signs will be their own element
arr = shlex.split(test_string)
# remove any element that is either '+' or '-'
final_arr = [x for x in arr if x not in ['+', '-']]
Variables:
>>> print(arr)
['abc', '+', 'def', '+', '-xyz - rst']
>>> print(final_arr)
['abc', 'def', '-xyz - rst']
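Note that shlex strips the surrounding quotes, while the expected output in the question keeps them. If the quotes matter, a regex-only approach is possible. The sketch below shows two variants under the assumption that every operand is wrapped in double quotes with no escaped quotes inside; the names quoted and parts are illustrative only.

```python
import re

test_string = '"abc" + "def" + "-xyz - rst"'

# Variant 1: extract each double-quoted chunk directly
quoted = re.findall(r'"[^"]*"', test_string)

# Variant 2: split on " + " or " - " only when an even number of quotes
# follows, i.e. when the operator sits outside any quoted chunk
parts = re.split(r'\s[-+]\s(?=(?:[^"]*"[^"]*")*[^"]*$)', test_string)

print(quoted)
print(parts)
```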

How to change multiple rows in a column from unicode to timestamp in python

I am a beginner learning Python. I would like to convert column values from unicode time ('1383260400000') to a timestamp (1970-01-01 00:00:01). I have read about this and tried the following, but it's giving me an error.
ti=datetime.datetime.utcfromtimestamp(int(arr[1]).strftime('%Y-%m-%d %H:%M:%S');
It says invalid syntax. I have read about and tried a few other things, but I cannot get it right. Any suggestions?
One more thing: in the same file I have some empty cells that I would like to replace with 0. I tried the following too, and it's also giving me invalid syntax:
smsin=arr[3];
if arr[3]='' :
    smsin='0';
Please help. Thank you a lot.
You seem to have forgotten a closing bracket after (arr[1]).
import datetime
arr = ['23423423', '1163838603', '1263838603', '1463838603']
ti = datetime.datetime.utcfromtimestamp(int(arr[1])).strftime('%Y-%m-%d %H:%M:%S')
print(ti)
# => 2006-11-18 08:30:03
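The sample value in the question ('1383260400000') has 13 digits, which suggests it is in milliseconds rather than seconds. If that assumption holds, it needs to be divided by 1000 before conversion. A minimal sketch (the variable names raw and seconds are illustrative, and a timezone-aware call is used in place of utcfromtimestamp):

```python
from datetime import datetime, timezone

raw = '1383260400000'      # sample value from the question, assumed to be milliseconds
seconds = int(raw) / 1000  # convert milliseconds to seconds

ti = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
print(ti)
# => 2013-10-31 23:00:00
```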
To replace empty strings with '0's in your list you could do:
arr = ['123', '456', '', '789', '']
arr = [x if x else '0' for x in arr]
print(arr)
# => ['123', '456', '0', '789', '0']
Note that the latter only works correctly since the empty string '' is the only string with a truth value of False. If you had other data types within arr (e.g. 0, 0L, 0.0, (), [], ...) and only wanted to replace the empty strings you would have to do:
arr = [x if x != '' else '0' for x in arr]
More efficient yet would be to modify arr in place instead of recreating the whole list.
for index, item in enumerate(arr):
    # compare with ==, a single = is an assignment
    if item == '':
        arr[index] = '0'
But if that is not an issue (e.g. your list is not too large) I would prefer the former (more readable) way.
Also you don't need to put ;s at the end of your code lines as Python does not require them to terminate statements. They can be used to delimit statements if you wish to put multiple statements on the same line but that is not the case in your code.
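The second "invalid syntax" in the question most likely comes from the if statement itself: a single = is assignment, and a comparison needs ==. A minimal sketch of the corrected snippet, using a made-up arr whose fourth element is empty (the sample values are assumptions, not data from the question):

```python
# Hypothetical row; only the empty fourth cell matters for this example
arr = ['id-1', '1383260400000', '5', '']

smsin = arr[3]
if arr[3] == '':   # == compares, = would be a syntax error here
    smsin = '0'

print(smsin)
```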

Replace specific character from the list of string

Gurus,
I have a list which looks like the following:
[u'test1', u'test2', '', '']
I am trying to find a way to remove the character u that appears before 'test1' and 'test2'. After the replacement, the list should look like this:
['test1','test2', '', '']
Initially I had a list like the following:
[u'test1\n', u'test2\r\n', '', '']
I could reduce this using the following:
row_val = [w.replace('\n', '') for w in row_val]
row_val = [w.replace('\r', '') for w in row_val]
Let me know if there is a way to do the same without iterating through each string.
The u is not a character in the string; it tells you that the object is a unicode object rather than a str object.
You can just do:
row_val = [str(w) for w in row_val]
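Under Python 3 every str is already Unicode and the u prefix no longer appears when the list is printed, so no conversion is needed there. The newline cleanup from the question can also be done in a single pass with rstrip. A minimal sketch, assuming Python 3 and that the unwanted characters only occur at the end of each string:

```python
row_val = [u'test1\n', u'test2\r\n', '', '']

# Strip trailing carriage returns and newlines in one comprehension
row_val = [w.rstrip('\r\n') for w in row_val]

print(row_val)
# => ['test1', 'test2', '', '']
```

Unlike replace, rstrip only touches the end of each string, so a newline in the middle of a value would be left in place.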

Combine several list comprehension codes

I have three list comprehensions that trim a given string. Together they remove words that contain '/', remove the words listed in the set called 'remove_set', and combine consecutive single letters into one word.
import itertools
import re
regex = re.compile(r'.*/.*')
parent = ' '.join([p for p in parent.split() if not regex.match(p)])
remove_set = {'hello', 'corp', 'world'}
parent = ' '.join([i for i in parent.split() if i not in remove_set])
parent = ' '.join((' ' if x else '').join(y) for x, y in itertools.groupby(parent.split(), lambda x: len(x) > 1))
For example:
string = "hello C S people in some corp/llc"
changes to
string = "CS people in some"
Can these commands be written as one beautiful command?
Thanks in advance!
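One possible way to combine the three steps is a single filter followed by the groupby join, wrapped in a small function. This is only a sketch: the function name clean and the constant REMOVE_SET are illustrative, and the regex is replaced by a plain '/' membership test, which should be equivalent for whitespace-separated words.

```python
import itertools

REMOVE_SET = {'hello', 'corp', 'world'}

def clean(parent, remove_set=REMOVE_SET):
    # Drop words containing '/' and words listed in remove_set in one pass
    words = [w for w in parent.split()
             if '/' not in w and w not in remove_set]
    # Group consecutive words by whether they are longer than one character
    groups = itertools.groupby(words, key=lambda w: len(w) > 1)
    # Single letters are glued together; longer words stay space-separated
    return ' '.join((' ' if is_long else '').join(group)
                    for is_long, group in groups)

result = clean("hello C S people in some corp/llc")
print(result)
# => CS people in some
```

Keeping the filter and the grouping as two named steps inside one function is usually easier to read than forcing everything into a single expression.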