I have got the following list:
['Not durable\nnice and useful\nall very good and in good shape\nJunk.',
'which may be a result of poor design.',
'This is a poor product.',
'One monitor out of two was defective.']
However, I would like to split on the \n so that each piece becomes a new element in the list (or replace it with ',' so that the pieces end up as separate elements).
How could this be done efficiently?
Thanks in advance.
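One possible approach, sketched here with a nested list comprehension: split every element on "\n" and flatten the pieces into a single list. The names `reviews` and `flat` are made up for illustration.

```python
reviews = [
    'Not durable\nnice and useful\nall very good and in good shape\nJunk.',
    'which may be a result of poor design.',
    'This is a poor product.',
    'One monitor out of two was defective.',
]

# Split each element on the newline character and collect every piece
# into one flat list; elements without "\n" pass through unchanged.
flat = [piece for item in reviews for piece in item.split('\n')]

print(flat)
```

If the elements may also contain "\r\n" line endings, `item.splitlines()` could be used in place of `item.split('\n')`.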
So I'm trying to split a string in several options, but those options are allowed to occur only once. I've figured out how to make it match all options, but when an option occurs twice or more it matches every single option.
Example string: --split1 testsplit 1 --split2 test split 2 --split3 t e s t split 3 --split1 split1 again
Regex: /-{1,2}(split1|split2|split3) [\w|\s]+/g
Right now it is matching all cases and I want it to match --split1, --split2 and --split3 only once (so --split1 split1 again will not be matched).
I'm probably missing something really straightforward, but would anyone care to help out? :)
Edit:
Decided to handle the extra occurrences in a script rather than through RegEx, for easier error handling. Thanks for the help!
EDIT: Somehow I ended up here from the PHP section, hence the PHP code. The same principles apply to any other language, however.
I realise that OP has said they have found a solution, but I am putting this here for future visitors.
function splitter(string $str, int $splits, $split = "--split")
{
    $a = array();
    for ($i = $splits; $i > 0; $i--) {
        if (strpos($str, "$split{$i} ") !== false) {
            $a[] = substr($str, strpos($str, "$split{$i} ") + strlen("$split{$i} "));
            $str = substr($str, 0, strpos($str, "$split{$i} "));
        }
    }
    return array_reverse($a);
}
This function will take the string to be split, as well as how many segments there will be. Use it like so:
$array = splitter($str, 3);
It will split the string around the $split parameter and return the pieces as an array.
The parameters are used as follows:
$str
The string that you want to split. In your instance it is: --split1 testsplit 1 --split2 test split 2 --split3 t e s t split 3 --split1 split1 again.
$splits
This is how many elements of the array you wish to create. In your instance, there are 3 distinct splits.
If a split is not found, then it will be skipped. For instance, if you were to have --split1 and --split3 but no --split2, then the string will only be split twice.
$split
This is the string that will be used as the delimiter. Note that it must follow the format specified in the question. This means that if you want to split using --myNewSplit, then the function will append a number from 1 to $splits to that string.
All elements end with a space since the function looks for $split and you have a space before each split. If you don't want to have the trailing whitespace then you can change the code to this:
$a[] = trim(substr($str, strpos($str, "$split{$i} ") + strlen("$split{$i} ")));
Also, notice that strpos looks for a space after the delimiter. Again, if you don't want the space then remove it from the string.
The reason I have used a function is that it will make it flexible for you in the future if you decide that you want to have four splits or change the delimiter.
Obviously, if you no longer want a numerically changing delimiter then the explode function exists for this purpose.
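Since the same principles apply to other languages, here is a rough Python port of the function above, offered as a sketch rather than a drop-in replacement. The parameter names `text`, `splits` and `split` mirror the PHP version, and each piece is trimmed as in the trim() variant.

```python
def splitter(text, splits, split="--split"):
    """Split text on numbered delimiters such as --split1, --split2, ..."""
    parts = []
    # Walk from the highest-numbered delimiter down, as the PHP loop does.
    for i in range(splits, 0, -1):
        marker = f"{split}{i} "
        pos = text.find(marker)  # first occurrence, or -1 if absent
        if pos != -1:
            # Everything after the marker becomes one piece...
            parts.append(text[pos + len(marker):].strip())
            # ...and the search continues in the text before the marker.
            text = text[:pos]
    # Pieces were collected last-to-first, so reverse them.
    return parts[::-1]


example = ("--split1 testsplit 1 --split2 test split 2 "
           "--split3 t e s t split 3 --split1 split1 again")
result = splitter(example, 3)
print(result)
```

Note that with the example string, the repeated --split1 is not dropped: it stays inside the last piece, because everything after the first --split3 is taken as one segment.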
-{1,2}((split1)|(split2)|(split3)) [\w|\s]+
Something like this? In this case it will create 3 arrays, each holding the matched elements of the same name. Hope this helps.
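Handling the repeats in a script, as the question's edit settles on, might look like the following Python sketch. It matches every occurrence and then keeps only the first value seen per option. The names `pattern` and `first_seen` are illustrative, and the character class is written as [\w\s] because a "|" inside a class is matched literally.

```python
import re

text = ("--split1 testsplit 1 --split2 test split 2 "
        "--split3 t e s t split 3 --split1 split1 again")

# One group for the option name, one for the value that follows it.
pattern = re.compile(r"-{1,2}(split1|split2|split3) ([\w\s]+)")

first_seen = {}
for match in pattern.finditer(text):
    name = match.group(1)
    value = match.group(2).strip()
    # setdefault stores the value only the first time a name appears,
    # so later repeats such as the second --split1 are ignored.
    first_seen.setdefault(name, value)

print(first_seen)
```

Keeping the filtering outside the regex also makes it easy to report an error for a duplicate option instead of silently ignoring it.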
Let's say I have a column which has values like:
foo/bar
chunky/bacon/flavor
/baz/quz/qux/bax
I.e. a variable number of strings separated by /.
In another column I want to get the last element from each of these strings, after they have been split on /. So, that column would have:
bar
flavor
bax
I can't figure this out. I can split on / and get an array, and I can see the function INDEX to get a specific numbered indexed element from the array, but can't find a way to say "the last element" in this function.
Edit:
This one is simpler:
=REGEXEXTRACT(A1,"[^/]+$")
You could use this formula:
=REGEXEXTRACT(A1,"(?:.*/)(.*)$")
It can also be used as an ArrayFormula:
=ARRAYFORMULA(REGEXEXTRACT(A1:A3,"(?:.*/)(.*)$"))
Here's some more info:
the RegExExtract function
Some good examples of syntax
my personal list of Regex Tricks
This formula will do the same:
=INDEX(SPLIT(A1,"/"),LEN(A1)-len(SUBSTITUTE(A1,"/","")))
But it references A1 three times, which is not preferable.
You could do this too:
=index(SPLIT(A1, "/"), COLUMNS(SPLIT(A1, "/"))-1)
Also possible, perhaps best on a copy, with Find:
.+/
(Replace with blank) and Search using regular expressions ticked.
You can try this.
Once you have the array of Strings, you can access the last element using the array's length:
String message = "chunky/bacon/flavor";
String[] outSplited = message.split("/");
System.out.println(outSplited[outSplited.length -1]);
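The same idea outside a spreadsheet, as a small Python sketch: splitting once from the right means only the final segment has to be separated. The sample values come from the question; `paths` and `last_parts` are illustrative names.

```python
paths = ["foo/bar", "chunky/bacon/flavor", "/baz/quz/qux/bax"]

# rsplit with maxsplit=1 splits at the last "/" only; index -1 is the
# part after it (or the whole string when there is no "/").
last_parts = [path.rsplit("/", 1)[-1] for path in paths]

print(last_parts)
```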
I have a column which has "--" pattern in the beginning, middle and end of the string. For example:
-- myString
my -- String
myString --
I want to find these two types of cells
-- myString
myString --
and remove the "--" pattern so they look fine. I am an amateur Excel user, but I can use functions if you suggest some. It should be possible to use Find and pass its results to Replace, but I do not know how to do that.
Please note: the answer should handle all the cells in the column, of which there are hundreds. I need one solution that changes all of them, not a solution for a single cell.
EDIT: Just reread the request, per instruction from Gary'sStudent. This will remove all instances of "--", not only those at the beginning/end.
If the data is in A1, use the following formula:
=SUBSTITUTE(A1,"--","")
With data in A1, in B1 enter:
=IF(LEFT(A1,2)="--",MID(A1,3,9999),IF(RIGHT(A1,2)="--",MID(A1,1,LEN(A1)-2),A1))
OK, I found the answer. The answer from #Dubison helped me to find the right answer.
If the first two characters or the last two characters of the cell are "--", then substitute "--" with ""; otherwise leave the cell unchanged.
=IF(LEFT(A1,2)="--",SUBSTITUTE(A1,"--",""),IF(RIGHT(A1,2)="--",SUBSTITUTE(A1,"--",""), A1))
This is pretty much the same as the previous answers, only using simpler logic. If your string's first or last character is "-", do nothing; otherwise replace "--" with "".
=IF(LEFT(A1,1)="-",A1,IF(RIGHT(A1,1)="-",A1, SUBSTITUTE(A1,"--","")))
UPDATE:
I noticed that I have misread the question. Above code will remove the "--" only if it is in the middle. However original question was to remove "--" only if it is at the beginning or at the end. So formula should be:
=IF(OR(LEFT(A1,2)="--",RIGHT(A1,2)="--"),SUBSTITUTE(A1,"--",""),A1)
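For comparison, the rule "remove "--" only at the beginning or end" can be sketched in Python. This is an illustration of the logic, not an Excel feature, and `strip_edge_dashes` is a hypothetical helper name. Unlike the SUBSTITUTE-based formulas, it never touches a "--" in the middle of the text.

```python
def strip_edge_dashes(cell):
    """Remove a leading and/or trailing "--" and the spaces next to it."""
    text = cell.strip()
    if text.startswith("--"):
        text = text[2:]   # drop the leading "--"
    if text.endswith("--"):
        text = text[:-2]  # drop the trailing "--"
    return text.strip()


cells = ["-- myString", "my -- String", "myString --"]
cleaned = [strip_edge_dashes(cell) for cell in cells]
print(cleaned)
```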
I have a column called user_response, on which I want to do a variety of operations, like taking out words contained in quotes and taking out the part of the string after a colon (:).
One such operation is this:
Let's say for a record
user_response = "My company: 'XYZ Co.' has allowed to use:: the following \n \n kind of product: RealMadridTShirts"
Now, I want to scrape the part of the string after last colon(:). Hence, my output should be RealMadridTShirts
I could achieve this somehow with the following hack:
SELECT reverse(split_part(reverse(user_response), ' :', 1))
However, this is grossly inefficient, especially when I have to do it over 500,000 rows. It's not an operation that I will be doing throughout the day. It is for a once-a-day load, but even then the load is becoming very expensive.
Coming from Oracle, I know I could have used the INSTR and SUBSTR functions to achieve this in a more elegant fashion (without having to reverse the string and all).
Also, what if I had to scrape the text after the second last colon?
Find the string after the last colon, right?
My company: 'XYZ Co.' has allowed to use:: the following \n \n kind of product: RealMadridTShirts
It's trivial with a regular expression:
regress=> SELECT (regexp_matches(
'My company: ''XYZ Co.'' has allowed to use:: the following \n \n kind of product: RealMadridTShirts',
'.*:(.*?)$')
)[1];
regexp_matches
--------------------
RealMadridTShirts
(1 row)
The apparent lack of a function to request the position of a string counting from a particular starting point makes it harder to do without using a regexp, but as a regexp is sure to be the fastest way to solve this I doubt that's an issue.
Your bigger problem is likely to be that you're scanning so much data. That's never going to be fast.
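If the extraction can happen outside the database, for example in a load script, the same logic can be sketched in Python. This is an assumption about the workflow, not something the question states, and the variable names are illustrative. It also covers the "second last colon" case, read here as the text between the second-to-last and the last colon.

```python
user_response = ("My company: 'XYZ Co.' has allowed to use:: the following "
                 "\n \n kind of product: RealMadridTShirts")

# rpartition splits at the LAST colon; element 2 is whatever follows it.
after_last = user_response.rpartition(":")[2].strip()

# rsplit with maxsplit=2 cuts at the last two colons only, giving three
# pieces when the string contains at least two colons.
pieces = user_response.rsplit(":", 2)
between_last_two = pieces[-2].strip() if len(pieces) == 3 else None

print(after_last)
print(between_last_two)
```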
I have Python code that counts word frequencies in a text file. The problem is that it treats full stops as part of the words, which alters the count. For counting words I've used a sorted list of words. I tried to remove the full stops using
words = open(f, 'r').read().lower().split()
uniqueword = sorted(set(words))
uniqueword = uniqueword.replace(".","")
but I get this error:
AttributeError: 'list' object has no attribute 'replace'
Any help would be appreciated :)
You can process the words before you make the set, using a list comprehension:
words = [word.replace(".", "") for word in words]
You could also remove them after (uniquewords = [word.replace...]), but then you will reintroduce duplicates.
Note that if you want to count these words, a Counter may be more useful:
from collections import Counter
counts = Counter(words)
You might be better off with
import re
words = re.findall(r'\w+', open(f, 'r').read().lower())
which will grab all the strings composed of one or more “word characters” and will ignore punctuation and whitespace.
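Putting the two suggestions together, a minimal end-to-end sketch might look like this. An inline sample string stands in for the file contents so that the example is self-contained; `text` is a made-up value.

```python
import re
from collections import Counter

# Sample text standing in for open(f, 'r').read().
text = "The cat sat. The dog sat too."

# \w+ matches runs of word characters, so full stops and spaces are skipped.
words = re.findall(r"\w+", text.lower())

# Counter tallies how many times each word occurs.
counts = Counter(words)

for word in sorted(counts):
    print(word, counts[word])
```

Reading from a real file would only change how `text` is obtained, for example with a `with open(...)` block so that the file is closed afterwards.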