I am trying to replace a word that exists in a column in SQLite, but I am having some problems because it uses a special character.
following is the special character ', like "women's", "men's", shows up in SQLite like
Women's
Men's
So I tried the following string I normally use to make replacements under columns but failed
UPDATE SEARCHRESULT
SET CATEGORY = REPLACE(CATEGORY , 'Watches & Jewelry > Glasses > Women's Sunglasses', 'Watches & Jewelry > Glasses > Women's Sunglasses')
I would appreciate if someone can give me a working string for such a case since the string I am using is making the cells empty instead of replacing, normally this string works good with replacing cells that doesn’t have special characters.
I found out that using the following string solved my problem and removed that special charachter
UPDATE SEARCHRESULT
SET CATEGORY = REPLACE(CATEGORY , ''', '')
Related
I am using string replace method to clean-up column names.
df.columns=df.columns.str.replace("#$%./- ","").str.replace(' ', '_').str.replace('.', '_').str.replace('(','').str.replace(')','').str.replace('.','').str.lower()
Though it works, certainly does not look pythonic. Any suggestion?
I need only A-Za-z and underscore _ if required as column names.
Update:
I tried using Regular expression in the first replace method, but I still need to chain the string like this...
terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '')
Update showing test data:
Original string (Tab separated):
[Sr.No. Course Terms Besic of Education Degree Course Course Approving Authority (i.e Medical Council, etc.) Full form of Course 1 year Duration 2nd year 3rd year Duration 4 th year Duration]
Change column names:
terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '').str.lower()
Output:
['srno', 'course', 'terms', 'besic_of_education', 'degree_course',
'course_approving_authority_ie_medical_council_etc',
'full_form_of_course', '1_year_duration', '2nd_year_',
'3rd_year_duration', '4_th_year_duration']
Above output is correct. The question: Is there any way to achive the same other than the way I have used?
You can use a smaller number of .replace operations by replacing non-word strings with an empty string and subsequently removing the whitespace characters with an underscore.
df.columns.str.replace("[^\w\s]+","").str.replace("\s+","_").str.lower()
I hope this helps.
I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as it follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my english)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("®EXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace all the _ with parenthesis to create capture groups, while also excluding the digit string at the end, then surround the whole string with parenthesis.
We then use regex extract to actuall pull the pieces out, the groups automatically push them to their own cells/columns
To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 We use SPLIT(text, delimiter, [split_by_each]), the text in this case is formatted with regex =REGEXREPLACE(A1,"_+\d$","")* to remove 123421, witch will give you a column for each word delimited by ""
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = "=REGEXREPLACE(A1,"_+\d*$","")" //This gives you : *campaign1_attribute1_whatever_yes*
A3 = SPLIT(A2, "_", TRUE) //This gives you: campaign1 attribute1 whatever yes, each in a separate column.
I finally figured it out yesterday in stackoverflow (spanish): https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked to be only in regex and for google sheets was because I need to use it in Google data studio (same regex functions than spreadsheets)
To get each column just use this regex extract function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing that has to be changed in the formula to switch columns is the numer inside {}, (column number - 1).
If you do not have the final number, just don't put the last "_".
Lastly, remember to do all the calculated fields again, because (for example) it gets an error with CPC, CTR and other Adwords metrics that are calculated automatically.
Hope it helps!
I'm trying match elements in character vector that contain specific Arabic phrases.
So Far I have:
#load list of Arabic phrases
list.of.phrases <- read.table("arabic_phrases.txt")
#look for the first phrase
phrase1 <- arabic.text.vector[grepl(list.of.phrases[1],arabic.text.vector)]
Unfortunately, this approach or using raw Arabic text doesn't seem to return anything and I get this message:
Error in `[[<-.data.frame`(`*tmp*`, qname, value = 1) :
replacement has 1 row, data has 0
I know that I can match Arabic words using : [U0627-U06FF]+ as in:
#look for all cells containing arabic
arabic <-arabic.text.vector[grepl("[U0627-U06FF]+",arabic.text.vector)]
...
So far my approach is to convert the Arabic text to its Unicode point values and then use grep; however, I'm having trouble with the conversion.
Am I heading in the right direction or does anybody have another solution/approach?
I know this topic was discussed multiple times, I looked at multiple posts and answers, but could not find exactly what I need to do.
I am trying to search the string, that has multiple values of varchar2 separated by ':', and if the match is found on another string, delete part of the string that matched, and update the table with the rest of the string.
I wrote the code using combination of str and instr functions, but looking for more elegant solution using regexp, or collections.
For example, the input string looks like this: ('abc:defg:klmnp). Need to find for example the piece of the string (could be at any position), and remove it, that result would look like this: (abc:klmnp)?
EDIT - copied from comment:
The input string (abc:defg:klmn:defgb). Let's say I am looking for defg, and only defg will have to be removed, not defgb. Now, like I mentioned before, next time around, I might be looking for the value in position 1, or the last position. So the desired part of the string to be removed might not always be wrapped in ':' from the both sides, but depending where it is in the string, either from the right, or from the left, or from both sides.
You can do this with a combination of LIKE, REPLACE and TRIM functions.
select trim(':' from
replace(':'||your_column||':',':'||search_string||':',':')
) from table_name
where ':'||your_column||':' like '%:'||search_string||':%';
Idea is,
Surround the column with colons and use LIKE function to find the match.
And on such matched rows, use REPLACE to replace the search string along with surrounding colons, with a single colon.
And then use TRIM to remove the surrounding colons.
Demo at sqlfiddle
EDIT (simplified) Perhaps this is what you need:
SELECT REGEXP_REPLACE(REPLACE('abc:defg:klmnop', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('defg:klmnop:abc', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('abc:klmnop:defg', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('abc:klmnop:defgb:defg', ':defg:', ':'), '(^defg:|:defg$)', '')
FROM DUAL
;
which removes defg from start, middle, and end, and ignores defgb, giving:
abc:klmnop
klmnop:abc
abc:klmnop
abc:klmnop:defgb
And to update the table, you could:
UPDATE my_table
SET value = REGEXP_REPLACE(REGEXP_REPLACE(value, ':defg:', ':'), '(^defg:|:defg$)', '')
-- WHERE REGEXP_LIKE(value, '(^|.*:)defg(:.*|$)')
WHERE value LIKE '%defg%'
;
(though that final regex for the where may need to be tweaked to match, hard to test...)
I’d like to select country except ‘,’ from a data field which looks like this
Japan,Singapore,Italy,France
and my Code looks like this REGEXP_EXTRACT(country,'([^,]*)'), unfortunately, it works but only the country at the first was selected. How can I code it to select it all?
I slightly changed the RegEx to ([^,]+) to make the country name at least one digit. Using * creates empty matches so that every other match contains the country name. (Example)
Take a look at the fixed example here.
Important is the /g tag in the end to make the RegEx match globally.
If you are looking to extract all the characters except , then it could be achieved using either of the the REGEXP_REPLACE Calculated Fields below:
1) Replace , with (space)
REGEXP_REPLACE(country, ",", " ")
2) Remove ,
REGEXP_REPLACE(country, ",", "")
Google Data Studio Report and a GIF to elaborate: