Python - Dataframe column conversion - regex

I'm new to Python and I'm trying to convert a dataframe column of strings (like 10,000+ or 1,000+) to integers, using a regex to strip the + and , characters.
How can I do that?
I've tried regex functions, but it doesn't work:
convert_installs = re.compile('(?P<amount>\d*).(?P<unit>\d*)')
Is this the right pattern for capturing what I want to keep?

df['columnname'] = df['columnname'].str.replace('([+,])', '', regex = True).astype('int')
This takes the column called columnname, replaces every + or , with an empty string, and then casts the result to integer.
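As a minimal, self-contained sketch of the same approach: the column name installs and the sample values below are assumptions made up for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the strings in the question.
df = pd.DataFrame({"installs": ["10,000+", "1,000+", "500+"]})

# The character class [+,] matches a single '+' or ',' anywhere in the string;
# every match is replaced with the empty string before the cast to int.
df["installs"] = (
    df["installs"]
    .str.replace(r"[+,]", "", regex=True)
    .astype(int)
)

print(df["installs"].tolist())
```

Note that astype(int) raises an error if any value still contains non-digit characters after the replacement, so unexpected formats surface immediately instead of being silently converted.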

Related

PySpark dataframe remove white-spaces from a column of the string

In this picture, the column Values contains string values with spaces in between, so I am unable to convert this column to an integer type.
If you can help me remove the white space from these string values, I can then cast them easily.
I have tried df_cause_death_france.select(regexp_replace(col("Value")," ",""))
It does work, but it removes all the other columns from my Spark dataframe.
Please ignore this question; I was able to solve it.
In case you want to know my solution, here it is.
df_cause_death_france.withColumn('VALUE', regexp_replace('Value', ' ','')).show()
output =
https://i.stack.imgur.com/1bljf.png

How to mix numbers with text using DAX?

I have a card that needs to show the average of a column as text.
This is what I have:
VAR Cnt = [CntPram]
var AllCnt = [CountAllPram]
RETURN
CntPram + " of " +[CountAllPram]
I end up getting this error
Cannot convert value 'of' of type Text to type Number.
How can I convert the numbers to make the card work?
You can use CONCATENATE for this.
Syntax
CONCATENATE(<text1>, <text2>)
This function joins two text strings into one text string. The joined items can be text, numbers or Boolean values represented as text, or a combination of those items. You can also use a column reference if the column contains appropriate values.
DAX does not recognize + as a concatenation operator for text strings. You can use nested CONCATENATE calls or join the strings using &:
Measure :=
[CntPram] & " of " & [CountAllPram]

Partially match integers in PostgreSQL queries

So in my PostgreSQL 10 I have a column of type integer. This column represents a code of products and it should be searched against another code or part of the code. The values of the column are made of three parts, a five-digit part and two two-digit parts. Users can search for only the first part, the first-second or first-second-third.
So, if my column contains, say, 123451233 and the user searches for 12345 (the first part), I want to be able to return 123451233. The same goes if the user searches for 1234512 or 123451233.
Unfortunately I cannot change the type of column or break the one column into three (one for every part). How can I do this? I cannot use LIKE. Maybe something like a regex for integers?
Thanks
Consider using simple arithmetic.
floor(log(value))::int + 1 returns the number of digits in the integer part of value (floor is needed because a plain ::int cast rounds, so log(99999)::int would give 5 instead of 4). Using this,
value/(10^(floor(log(value))::int-floor(log(search_input))::int))::int
returns value truncated to the same number of digits as search_input, so finally
search_input = value/(10^(floor(log(value))::int-floor(log(search_input))::int))::int
will do the trick.
It looks more complex, but it could be more efficient than string manipulation.
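The digit-truncation idea can be sketched outside the database as well. The Python below is only an illustration of the arithmetic, not PostgreSQL code; the function names digit_count and has_prefix are hypothetical, and it assumes positive integers.

```python
def digit_count(n: int) -> int:
    """Count the decimal digits of a positive integer by repeated division."""
    count = 1
    while n >= 10:
        n //= 10
        count += 1
    return count


def has_prefix(value: int, search_input: int) -> bool:
    """Return True if the leading digits of value equal search_input."""
    extra_digits = digit_count(value) - digit_count(search_input)
    if extra_digits < 0:
        # The search term is longer than the stored code, so it cannot match.
        return False
    # Integer division drops the trailing digits that the search term lacks.
    return value // 10**extra_digits == search_input


print(has_prefix(123451233, 12345))
print(has_prefix(123451233, 1234512))
print(has_prefix(123451233, 54321))
```

The explicit check for a negative digit difference covers the case where the search term has more digits than the stored code, which the one-line SQL expression does not handle on its own.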
PS: With an index like create index idx on your_table(cast(your_column as text)); a search like
select * from your_table
where cast(your_column as text) like search_input || '%';
is the best option IMO.
You do not need regex functions. Cast the integer to text and use the function left(), for example:
create table my_table(code int); -- or bigint
insert into my_table values (123451233);
with input_data(input_code) as (
values('1234512')
)
select t.*
from my_table t
cross join input_data
where left(code::text, length(input_code)) = input_code;
code
-----------
123451233
(1 row)

Pandas dataframe replace string in multiple columns by finding substring

I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.
I've found some examples that do this by specifying the column(s) to search, like this:
df = pd.DataFrame([[1,'A'], [2,'(B,D,E)'], [3,'C']],columns=['Question','Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'
But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell using df.replace will not work since I'm only searching for a substring. Thanks for your help!
You can use the data frame replace method with regex=True, and use .*,.* to match strings that contain a comma (you can replace the comma with any other substring you want to detect):
str_cols = ['Answer'] # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question Answer
#0 1 A
#1 2 X
#2 3 C
or if you want to replace all string columns:
str_cols = df.select_dtypes(['object']).columns
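A minimal end-to-end sketch of the all-string-columns variant follows. Two things here are assumptions rather than part of the answer: the extra Note column is hypothetical sample data, and the columns are selected with exclude="number" instead of ['object'] so that the selection does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Sample data; 'Note' is a hypothetical second string column.
df = pd.DataFrame(
    {
        "Question": [1, 2, 3],
        "Answer": ["A", "(B,D,E)", "C"],
        "Note": ["x,y", "plain", "z"],
    }
)

# Pick every non-numeric column instead of listing them by hand.
str_cols = df.select_dtypes(exclude="number").columns

# '.*,.*' matches the whole string when it contains a comma,
# so the full value (not just the comma) is replaced with 'X'.
df[str_cols] = df[str_cols].replace(".*,.*", "X", regex=True)

print(df)
```

Restricting the replace to the selected columns leaves the integer column untouched.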

How to remove the space between the minus sign and the number in Informatica

I have an issue where an amount field has data like
(- 98765.00), i.e. minus, {spaces}, {numbers}. I need to remove the space between the minus sign and the number to get (-98765.00). How do I do this in an expression transformation?
The field datatype is decimal (8,2).
Thanks,
Kiran
output_port: TO_DECIMAL(REPLACECHR(FALSE,input_port,' ',''))
REPLACECHR replaces the blanks with an empty string, essentially removing them. The first argument can be TRUE/FALSE to specify whether matching is case sensitive, but that does not matter in this case.
You can use the REG_REPLACE function to remove the spaces.
To achieve this, follow the steps below:
* Create two variable ports.
* REG_REPLACE requires a string column, so first convert the decimal column to a string using the TO_CHAR function:
First variable port (string) - TO_CHAR(column_name)
* The previous port holds the data as a string; now apply REG_REPLACE and convert the result back to decimal:
Second variable port (decimal) - to_decimal(reg_replace(first_variable_port,'\s+',''))
\s matches whitespace in Informatica regular expressions.
I used the same number you provided, with the same data type and function, and the debugger shows the expected result with the white space removed.
You may have an issue in the other transformations the data passes through. Debug and verify the data.
Hope that helps; feel free to ask if you have any issues.
For more on Informatica, see https://etlinfromatica.wordpress.com/
If my understanding is correct, you need to replace both the spaces and the brackets. Here's the expression:
TO_DECIMAL(
REPLACECHR(0,
REPLACECHR(0, '(- 98765.00)', ' ', '') -- this part does the space replacement
, '()', '') -- this part replaces the brackets
)
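To make the effect of the nested replacement easier to see, here is a rough Python analogue. It is not Informatica code; the function name clean_amount is hypothetical, and str.translate stands in for the two REPLACECHR calls by deleting the space and both brackets in one pass.

```python
from decimal import Decimal


def clean_amount(raw: str) -> Decimal:
    """Strip spaces and parentheses, then parse the remainder as a decimal."""
    # The third argument of str.maketrans lists characters to delete.
    delete_table = str.maketrans("", "", " ()")
    return Decimal(raw.translate(delete_table))


print(clean_amount("(- 98765.00)"))
```

Parsing into Decimal rather than float keeps the two decimal places exactly as they appear in the input.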