From SAS to Python : substr - python-2.7

I would like to use the SAS function substr(my_var,1,2) in Python. I tried several functions such as contains() and split(), but they didn't work.
The functions contains() and split() only work on a single string value. I would like to apply this to a Python Series without using a for loop.
Thanks a lot for your help.

A string in Python can be sliced like any list (the variable is named s here so that it does not shadow the built-in str):
>>> s = 'Hello World'
>>> s[1:3]
'el'
>>> s[1:-2]
'ello Wor'
To get substrings of multiple strings, you can use a list comprehension:
>>> strs = ['Hello World', 'Foobar']
>>> [s[1:4] for s in strs]
['ell', 'oob']

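In Python, you can try this:
my_var[1:3]
This returns the substring of my_var from index 1 up to, but not including, index 3. Python indices start at 0 whereas SAS positions start at 1, so the exact equivalent of substr(my_var,1,2) is my_var[0:2].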
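SAS counts positions from 1 and takes a length, while a Python slice counts from 0 and takes an end index. A minimal sketch of a helper that translates between the two is below. The name sas_substr and the sample data are made up for illustration, and it only covers the simple case of a positive position and length.

```python
def sas_substr(value, position, length):
    """Mimic SAS substr(value, position, length) with a Python slice.

    Hypothetical helper: position is 1-based, as in SAS.
    """
    start = position - 1              # SAS position 1 is Python index 0
    return value[start:start + length]


names = ['Hello World', 'Foobar']     # made-up sample data
first_two = [sas_substr(n, 1, 2) for n in names]
print(first_two)                      # ['He', 'Fo']
```

If my_var is a pandas Series of strings, the vectorised string accessor should give the same result without an explicit loop: my_var.str[0:2] (or my_var.str.slice(0, 2)).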

Related

split string using regex python and "re" package

I'm using Python 3 on Windows 10. Consider the following string:
import re
s = ["12345", "67891", "01112"]
I want to split these zips after the third character to get the zip3, but this code throws an error.
re.split("\d{3}", s)
TypeError: cannot use a string pattern on a bytes-like object
I'm not quite sure how to get around it. Help appreciated. Thanks.
To get the first three characters of each, simply slice the strings:
s = ["12345", "67891", "01112"]
first_three = [p[0:3] for p in s]
print(first_three)
Output:
['123', '678', '011'] # slicing
To split all of the text into threes, join it, then chunk it into pieces of 3 characters:
s = ["12345", "67891", "01112"]
k = ''.join(s)
chunks_of_three = [k[i:i+3] for i in range(0, len(k), 3)]
print(chunks_of_three)
Output:
['123', '456', '789', '101', '112'] # join + chunking
See "How do you split a list into evenly sized chunks?" and "Understanding Python's slice notation".
Slicing and chunking work on strings as well; the official documentation covers strings and slicing.
To get the remainder as well:
s = ["12345", "67891", "01112"]
three_and_two = [[p[:3], p[3:]] for p in s]
print(three_and_two) # [['123', '45'], ['678', '91'], ['011', '12']]
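The original re.split call was handed the whole list, but the re functions work on one string at a time. If you would rather stay with re, a sketch along these lines should work. The pattern and the variable names are my own choice, and it assumes every entry starts with at least three digits (otherwise match returns None and the unpacking fails).

```python
import re

zips = ["12345", "67891", "01112"]

# Group 1 captures the first three digits, group 2 whatever digits remain.
pattern = re.compile(r"(\d{3})(\d*)")

# Apply the pattern to each string individually, not to the list itself.
parts = [pattern.match(z).groups() for z in zips]
zip3 = [head for head, _ in parts]

print(parts)  # [('123', '45'), ('678', '91'), ('011', '12')]
print(zip3)   # ['123', '678', '011']
```

For fixed positions like this, plain slicing is simpler; a regex mainly pays off when the split point depends on the content.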

Modify values across all column pyspark

I have a pyspark data frame and I'd like to have a conditional replacement of a string across multiple columns, not just one.
To be more concrete: I'd like to replace the string 'HIGH' with 1, and everything else in the column with 0. [Or at least replace every 'HIGH' with 1.] In pandas I would do:
df[df == 'HIGH'] = 1
Is there a way to do something similar? Or can I do a loop?
I'm new to pyspark so I don't know how to generate example code.
You can use the replace method for this:
>>> df.replace("HIGH", "1")
Keep in mind that you need to replace like for like datatypes, so attempting to replace "HIGH" with 1 will throw an exception.
Edit: You could also use regexp_replace to address both parts of your question, but you'd need to apply it to each column in turn:
>>> from pyspark.sql.functions import regexp_replace
>>> df = df.withColumn("col1", regexp_replace("col1", "^(?!HIGH).*$", "0"))
>>> df = df.withColumn("col1", regexp_replace("col1", "^HIGH$", "1"))

Building a string using symbols

In python, I'm able to build a string using this method:
>>> s = "%s is my name" %("Joe")
>>> s
'Joe is my name'
Is there a similar way to do this in C++? I know C has
printf("%s is my name", "Joe")
But that's for printing to standard out. Ideally I'd like something like the Python example. Thanks!
EDIT: Is there a name for this kind of thing? I couldn't think of what to google!
The sprintf function works like printf, but takes an extra parameter at the front: a character array in which to store the formatted string instead of printing it. The buffer must be large enough for the result; snprintf, which also takes the buffer size, is the safer variant.
char chararray[1000];
sprintf(chararray,"%s is my name","Joe");
http://www.cplusplus.com/reference/cstdio/sprintf/

Better code than a regex sub() python 2.7

I am trying to find out whether there is a better, faster way to clean this returned string, or whether this is already the best way. It works, but more efficient approaches are always welcome.
I have a function that returns the following output:
"("This is your:, House")"
I clean it up before printing with:
a = re.sub(r'^\(|\)|\,|\'', '', a)
print a
>>> This is your: House
I also learn a lot from the different ways people do things.
You don't need a regular expression to do this.
>>> import string
>>> a = '"("This is your:, House")"'
>>> ''.join(x for x in a if x not in string.punctuation)
'This is your House'
>>> tbl = string.maketrans('', '')
>>> a.translate(tbl, string.punctuation)
'This is your House'
>>> s = '"("This is your:, House")"'
>>> s.replace('"', '').replace('(', '').replace(')', '').replace(',', '').replace(':', '')
'This is your House'
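Note that string.maketrans and the two-argument form of translate shown above are Python 2 only. On Python 3 the rough equivalent, as a sketch, is to build a deletion table with str.maketrans. The variable names here are just for illustration.

```python
import string

a = '"("This is your:, House")"'

# The third argument of str.maketrans lists the characters to delete.
drop_all_punct = str.maketrans('', '', string.punctuation)
print(a.translate(drop_all_punct))    # This is your House

# To keep the colon, as in the question's own output, delete only
# the quotes, parentheses and comma.
drop_some = str.maketrans('', '', '"\'(),')
print(a.translate(drop_some))         # This is your: House
```

Like the Python 2 translate version, this makes a single pass over the string instead of one pass per chained replace() call.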

Converting a PyQt string to a normal Python string leaves extra text in the string

When I try to print a PyQT string, it is not converted to a normal string. How can I do it? See the code below.
def _execute_test(self):
    test_in = str(self.buildFlags.inFlags)
    test_out = str(self.buildFlags.exFlags)
    print(str(test_in))
    print("============")
    print(str(test_out))
The output I get is:
>>> [PyQt4.QtCore.QString(u'Documents'), PyQt4.QtCore.QString(u'New folder')]
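Calling str() on the list itself only gives the list's repr, which is why the QString wrappers show up. If you want to print a list of strings from a list of PyQt4.QtCore.QString objects, convert each element instead: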
print([str(x) for x in my_qstring_list])