I have a table that contains a number of rows with columns containing a URL. The URL is of the form:
http://one.example1.com:9999/dotFile.com
I would like to replace all matches in that column with http://example2.com/dotFile.com while retaining everything after :9999. I have found some documentation on regexp_matches and regexp_replace, but I can't quite wrap my head around it.
To replace a fixed string, use the simple replace() function.
To replace a dynamic string, you can use regexp_replace() like this:
UPDATE
YourTable
SET
TheColumn = regexp_replace(
TheColumn, 'http://[^:\s]+:9999(\S+)', 'http://example2.com\1', 'g'
)
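To check the pattern outside the database, here is a minimal Python sketch of the same substitution. It assumes the URL shape from the question; the function name rewrite_url is made up for illustration, and Python's re module stands in for PostgreSQL's regexp_replace().

```python
import re

# Same pattern as the SQL above: any host up to the colon, the fixed port
# 9999, then the rest of the URL captured in group 1.
PATTERN = re.compile(r'http://[^:\s]+:9999(\S+)')


def rewrite_url(text: str) -> str:
    """Swap host and port for example2.com, keeping everything after :9999."""
    return PATTERN.sub(r'http://example2.com\1', text)


print(rewrite_url('http://one.example1.com:9999/dotFile.com'))
```

Text that does not match the pattern is returned unchanged, which is also what the UPDATE does for rows without a matching URL.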
If you know the URL, you don't need a regex. The replace() function should work for you:
replace(string text, from text, to text)
Replace all occurrences in string of substring from with substring to
Example: replace('abcdefabcdef', 'cd', 'XX') returns abXXefabXXef
You could try:
UPDATE yourtable SET
yourcolumn = replace(yourcolumn, 'one.example1.com:9999','example2.com')
;
I am trying to extract all the URLs from a string field (metainfo.body) using this query:
select split(regexp_replace(metainfo.body,'.*?((http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-]))\\n','$1#'),'#')
It returns the complete field instead of only the URLs. What should I change in this Hive query to get the list of URLs?
eg:
select regexp_replace('hello hi i am arun http://a.com https://b.com','.*?((http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-]))','$1,') as output
output:
hello hi i am arun http://a.com https://b.com
Expected:
http://a.com,https://b.com,
You can try making the match case-insensitive.
Then add optional whitespace, \s* or [ \t\r\n]*, at the end.
Here is your regex rewritten with plain ASCII classes, without the word class \w:
.*?((?:https?|ftp)://[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)+[#%&+-:=?-Z\^_a-z~]*[#%&+\-/-9=?-Z\^_a-z~])\s*
REGEXP_REPLACE should replace every match of the pattern in the string.
I can't test it, but judging from online examples that use split the way you do, this should work:
select split(regexp_replace('hello hi i am arun http://a.com https://b.com',
'.*?((?:https?|ftp)://[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)+[#%&+-:=?-Z\^_a-z~]*[#%&+\-/-9=?-Z\^_a-z~])\s*',
'$1,'), ',');
Here is a test of the regex using PCRE and its replacement
https://regex101.com/r/lIEvCk/1
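As a cross-check outside Hive, here is a minimal Python sketch that applies the URL part of the regex above to the sample string. It is not Hive code: re.findall stands in for the replace-then-split trick, and the function name extract_urls is made up for illustration.

```python
import re

# URL part of the regex from the answer, without the leading .*? and the
# trailing \s*, since findall collects the matches directly.
URL_PATTERN = re.compile(
    r'(?:https?|ftp)://[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)+'
    r'[#%&+-:=?-Z\^_a-z~]*[#%&+\-/-9=?-Z\^_a-z~]'
)


def extract_urls(text: str) -> list:
    """Return every URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


print(extract_urls('hello hi i am arun http://a.com https://b.com'))
```

Collecting matches this way avoids the empty trailing element that splitting 'http://a.com,https://b.com,' on the comma would produce.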
I'm a total noob with regexp. All I want to do is to remove the single and double quotes from a string in BigQuery. I can remove the single and double quotes at the beginning of the string, but not the end:
SELECT regexp_extract(foo, r'\"new_foo\":\"(.*?)\"') AS new_foo
FROM [mybq:Schema.table]
All I get is Null but without regexp_extract I have expected results. Help is appreciated.
Try something like this:
SELECT REGEXP_REPLACE(foo, r'([\'\"])', '') AS new_foo
FROM [mybq:Schema.table]
Your regular expression should be a character class such as ["'], applied to every occurrence.
You are also using a different function from the one you need. Try REGEXP_REPLACE('orig_str', 'reg_exp', 'replace_str').
Something like this:
SELECT REGEXP_REPLACE(word, r'[\'\"]', '') AS new_foo
FROM [mybq:Schema.table]
select replace(word,'"','') as word
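For comparison, here is a minimal Python sketch of the same idea: one character class that matches either quote, replaced with the empty string. The function name strip_quotes is made up for illustration; this shows the regex logic only, not BigQuery syntax.

```python
import re

# One character class covers both quote characters.
QUOTES = re.compile(r'["\']')


def strip_quotes(value: str) -> str:
    """Remove every single and double quote, wherever it appears."""
    return QUOTES.sub('', value)


print(strip_quotes('"new_foo":"it\'s here"'))
```

Because the substitution is applied to every match, quotes at the start, middle, and end of the string are all removed.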
I am new to regular expressions and stackoverflow. Any help would be greatly appreciated.
I am trying to remove unwanted data from a data set. The data is contained in a .csv file column with multiple cells, each cell containing data similar to this:
OSVDB #109124,OSVDB #109125,OSVDB #109126,OSVDB #109127,OSVDB #109128,OSVDB #109129,OSVDB #109130,OSVDB #109131,OSVDB #109132,OSVDB #109133,OSVDB #109134,OSVDB #109135,OSVDB #109136,OSVDB #109137,OSVDB #109138,OSVDB #109139,OSVDB #109140,OSVDB #109141,OSVDB #109142,OSVDB #109143,VMSA #2014-0012,OSVDB #102715,OSVDB #104972,OSVDB #106710,OSVDB #115364,IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
I want to replace the above data with each occurrence of the strings beginning "IAV...". So, the above cell would read:
IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
Below is a snippet of the script that imports the .csv and gets the column containing the data.
My regex, within PowerShell, is:
$reg1 = '$1'
$reg2 = '(IAV[A|B]\s#[0-9]{4}-[A|B]-[0-9]{4}){1,}'
ForEach-Object {$_.IAVM = [regex]::replace($_.IAVM,$reg2,$reg1); $_}
The result is:
The entire cell contents posted above.
From my understanding, {1,} at the end of the regex should return each occurrence of the string pattern, but instead I get back the entire contents of every cell that contains a match.
Maybe instead of trying to pick out your string you just delete the stuff you don't want? Try something like:
$reg1=''
$reg2='((OSVDB|VMSA)\s#[M-S0-9-]{6,9}[,]?)'
You have .* in that regex at the very beginning. This will capture everything up to the last match of the pattern that follows it. In your case I don't think you need that part anyway.
Also note that PowerShell has a handy -replace operator, so there's often no reason to use the static methods on the Regex type.
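The behaviour in the question follows from how replace works: it rewrites only the matched text and leaves the rest of the cell alone, so replacing each match with itself ('$1') returns the cell unchanged. A minimal Python sketch of the alternative, collecting the matches and joining them, is below. The pattern is a tidied version of the one in the question ([AB] instead of [A|B]), and the function name keep_iav is made up for illustration.

```python
import re

# Matches entries such as "IAVA #2014-A-0191" or "IAVB #2015-B-0007".
IAV_PATTERN = re.compile(r'IAV[AB]\s#\d{4}-[AB]-\d{4}')


def keep_iav(cell: str) -> str:
    """Keep only the IAV entries of a cell, comma-separated."""
    return ','.join(IAV_PATTERN.findall(cell))


cell = 'OSVDB #109124,VMSA #2014-0012,IAVA #2014-A-0191,IAVB #2014-B-0160'

# Replacing each match with itself changes nothing, as seen in the question.
unchanged = IAV_PATTERN.sub(lambda m: m.group(0), cell)

print(unchanged == cell)
print(keep_iav(cell))
```

In PowerShell the same idea would presumably use [regex]::Matches() rather than [regex]::Replace(), but that is not tested here.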
I have to split URIs on the second portion:
/directory/this-part/blah
The issue I'm facing is that I have 2 URIs which logically need to be one
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*
(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
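then you should use a replace method. Because I am not sure what language you are using, I will give my example in Java:
import java.util.regex.*;
String regex = "(?<=directory/)(.+?)(?=/)";
String str = "/directory/house-&-home/blah";
Matcher m = Pattern.compile(regex).matcher(str);
// Take the captured segment, then strip "&" and its encoded form "%26"
String result = m.find() ? m.group(1).replaceAll("&|%26", "") : "";
The replaceAll call replaces a pattern (here the & symbol or its encoded form %26) with the empty string "", so both URIs give house--home.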
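Since the language was not stated, here is the same two-step idea as a minimal Python sketch: pull out the segment after /directory/, then drop whatever sits between the hyphens. It assumes the URIs always start with /directory/ as in the question; the function name normalize_segment is made up for illustration.

```python
import re
from typing import Optional

# Segment between "directory/" and the next slash, as in the answer above.
SEGMENT = re.compile(r'(?<=directory/)(.+?)(?=/)')
# A hyphen, anything that is not a hyphen, then another hyphen.
BETWEEN_HYPHENS = re.compile(r'-[^-]*-')


def normalize_segment(uri: str) -> Optional[str]:
    """Return the second path segment with the text between hyphens removed."""
    match = SEGMENT.search(uri)
    if match is None:
        return None
    return BETWEEN_HYPHENS.sub('--', match.group(1))


print(normalize_segment('/directory/house-&-home/blah'))
print(normalize_segment('/directory/house-%26-home/blah'))
```

Removing everything between the hyphens, rather than just & and %26, means other encodings of the same character would collapse to the same key as well. A segment with a single hyphen, such as this-part, is left as it is.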
Can someone assist in creating a Regex for the following situation:
I have about 2000 records on which I need to do a search/replace, substituting a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
You can match the constant parts and use grouping to put them back:
(<li>View Product Information</li>)
Then replace the string with $1your_replacement$2, where $1 is the first matching group and $2 the second (in Python, for instance, you would call Match.group(1) and Match.group(2)).
You would have to escape \ chars if you're using Java instead.
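A minimal Python sketch of the grouping approach follows. The markup in the question lost its link, so the HTML shape used here, an anchor whose href holds FILEPATH/FILE, is an assumption made purely for illustration, as are the function name replace_target and the sample paths.

```python
import re

# ASSUMED markup: <li><a href="FILEPATH/FILE">View Product Information</a></li>
# Group 1 is the constant text before the variable part, group 2 the constant
# text after it; only the href value in between is replaced.
LINK = re.compile(r'(<li><a href=")[^"]*(">View Product Information</a></li>)')


def replace_target(html: str, new_target: str) -> str:
    """Swap the variable part while putting both constant groups back."""
    return LINK.sub(lambda m: m.group(1) + new_target + m.group(2), html)


record = '<li><a href="docs/old.pdf">View Product Information</a></li>'
print(replace_target(record, 'files/new.pdf'))
```

Building the replacement in a function avoids having to escape backslashes or group references that might appear in the new value.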