Combine several list comprehension codes - python-2.7

I have three list comprehensions that trim a given string. They remove words that contain '/', remove words that appear in the set called 'remove_set', and merge consecutive single letters into one word.
import itertools
import re

regex = re.compile(r'.*/.*')  # matches any word that contains '/'
parent = ' '.join([p for p in parent.split() if not regex.match(p)])
remove_set = {'hello', 'corp', 'world'}
parent = ' '.join([i for i in parent.split() if i not in remove_set])
parent = ' '.join((' ' if x else '').join(y) for x, y in itertools.groupby(parent.split(), lambda x: len(x) > 1))
For example:
string = "hello C S people in some corp/llc"
changes to
string = "CS people in some"
Can these commands be written as one elegant command?
Thanks in advance!
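
One possible way to fold the three steps into a single pass is sketched below. It assumes that the regex only ever needs to detect a '/' inside a word, so a plain substring test can replace it; the function name `clean` and its `remove_set` default are hypothetical, chosen for illustration.

```python
import itertools

def clean(text, remove_set=frozenset({'hello', 'corp', 'world'})):
    # Steps 1 and 2 in one pass: drop words that contain '/' or are in remove_set.
    words = (w for w in text.split() if '/' not in w and w not in remove_set)
    # Step 3: group words by "longer than one character". Runs of single
    # letters are glued together, runs of longer words stay space-separated.
    groups = itertools.groupby(words, key=lambda w: len(w) > 1)
    return ' '.join((' ' if is_long else '').join(group) for is_long, group in groups)

print(clean("hello C S people in some corp/llc"))  # CS people in some
```

Whether this reads better than three separate statements is a matter of taste; the separate statements are easier to debug step by step.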

Related

scala-regexp: split string into array of two following words

I need to split a string into an array whose elements are pairs of consecutive words, using Scala:
"Hello, it is useless text. Hope you can help me."
The result:
[[it is], [is useless], [useless text], [Hope you], [you can], [can help], [help me]]
One more example:
"This is example 2. Just\nskip it."
Result:
[[This is], [is example], [Just skip], [skip it]]
I tried this regex:
val re = """[a-zA-Z]+\s[a-zA-Z]+""".r
But the output is:
scala> for (m <- re.findAllIn("Hello, it is useless text. Hope you can help me.")) println(m)
it is
useless text
Hope you
can help
So it ignores some cases.
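
The pairs go missing because regex matches do not overlap: once "it is" has been consumed, the next match can only start at "useless". A minimal sketch of one workaround follows, written in Python for illustration rather than Scala (the function name `word_pairs` is hypothetical). It places the pair inside a lookahead so that no characters are consumed.

```python
import re

def word_pairs(text):
    # \b restricts attempts to word boundaries. The lookahead captures two
    # words separated by one whitespace character without consuming them,
    # so the second word of a pair can also start the next pair.
    return re.findall(r"\b(?=([a-zA-Z]+)\s([a-zA-Z]+))", text)

print(word_pairs("Hello, it is useless text. Hope you can help me."))
```

Because a word followed by punctuation is never followed directly by whitespace, pairs that would span a comma or a full stop are skipped.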

First split on the punctuation and digits, then split on the spaces, then slide over the results.
def doubleUp(txt: String): Array[Array[String]] =
  txt.split("[.,;:\\d]+")
    .flatMap(_.trim.split("\\s+").sliding(2))
    .filter(_.length > 1)
Usage:
val txt1 = "Hello, it is useless text. Hope you can help me."
doubleUp(txt1)
//res0: Array[Array[String]] = Array(Array(it, is), Array(is, useless), Array(useless, text), Array(Hope, you), Array(you, can), Array(can, help), Array(help, me))
val txt2 = "This is example 2. Just\nskip it."
doubleUp(txt2)
//res1: Array[Array[String]] = Array(Array(This, is), Array(is, example), Array(Just, skip), Array(skip, it))

First process any escape sequences in the string.
val string = "Hello, it is useless text. Hope you can help me."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String = Hello, it is useless text. Hope you can help me.
or
val string = "This is example 2. Just\nskip it."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String =
//This is example 2. Just
//skip it.
Then filter out the unwanted tokens (empty strings and words that end in punctuation) and apply sliding:
val result = preprocessed.split("\\s").filter(e => !e.isEmpty && !e.matches("(?<=^|\\s)[A-Za-z]+\\p{Punct}(?=\\s|$)") ).sliding(2).toList
//res9: List[Array[String]] = List(Array(it, is), Array(is, useless), Array(useless, Hope), Array(Hope, you), Array(you, can), Array(can, help))
Note that this drops the punctuated words themselves, so pairs such as (useless, text) and (help, me) are lost and (useless, Hope) appears instead.

You need to use split to break the string into words separated by non-word characters, and then sliding to pair up the words the way you want:
val text = "Hello, it is useless text. Hope you can help me."
text.trim.split("\\W+").sliding(2)
You may also want to remove escape characters, as explained in other answers.

Sorry, I only know Python. I heard the two are almost the same. I hope you can follow it.
string = "it is useless text. Hope you can help me."
split = string.split(' ')  # splits on spaces (you can use a regex for this)
result = []
no = 0
count = len(split)
for x in range(count):
    no += 1
    if no < count:
        pair = split[x] + ' ' + split[no]  # joins the current word to the next one
        result.append(pair)
The output will be:
['it is', 'is useless', 'useless text.', 'text. Hope', 'Hope you', 'you can', 'can help', 'help me.']
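
The output above still carries the punctuation and pairs words across sentence boundaries. A rough Python sketch of the split-then-slide idea from the first Scala answer is shown below; the name `double_up` is hypothetical, and the separator class `[.,;:\d]+` is copied from that answer rather than derived independently.

```python
import re

def double_up(text):
    pairs = []
    # Split on runs of punctuation and digits first, so that no pair
    # can span a sentence or clause boundary.
    for chunk in re.split(r"[.,;:\d]+", text):
        words = chunk.split()
        # zip(words, words[1:]) yields each word with its successor,
        # which is a sliding window of size 2.
        pairs.extend([a, b] for a, b in zip(words, words[1:]))
    return pairs

print(double_up("This is example 2. Just\nskip it."))
```

Chunks with fewer than two words contribute nothing, so no extra length filter is needed.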

converting a list to string and printing it out python

I am trying to convert the first letter of each word of a string to uppercase in Python, but I keep getting a generator object at 0x10315b8> as output. No earlier post seems to answer my question.
def capitalize(str):
    newstr = str.split(' ')
    newlist = []
    for word in newstr:
        if word[0][0] == word[0][0].upper():
            newlist.append(word[0][0].upper())
            newlist.append(word[0][1:])
            newlist.append(" ")
    convert_first = (str(w) for w in newlist)
    print(convert_first)

capitalize(input("enter some string"))  # calling the function

Your problem lies in how you are trying to make a string out of a list of strings. The opposite of "splitting" a string into a list is "joining" a list into a string.
def capitalize(str):
    newstr = str.split(' ')
    newlist = []
    for word in newstr:
        newlist.append(word[0].upper() + word[1:])
    convert_first = ' '.join(newlist)
    print(convert_first)

capitalize(input("enter some string"))  # calling the function
Note: I made an attempt to have my code be as close as possible to that in the question.
Also, why is there an if statement in your code? With that in place you're really just capitalizing all the words that are already capitalized and discarding the rest since they never make it into newlist.

There are a few issues with your code:
The output you got comes from printing convert_first, which is a generator, not a string.
newstr is a list of words, so word is a string and word[0] is already its first character. Indexing again is pointless: word[0][0] is that same character, and word[0][1:] is always an empty string.
if word[0][0] == word[0][0].upper(): just filters out all the words whose first character is not uppercase.
So this simpler code does what you described:
def capitalize(str):
    newstr = str.split(' ')
    newlist = []
    for word in newstr:
        newlist.append(word[0].upper())
        newlist.append(word[1:])
        newlist.append(" ")
    convert_first = ''.join(w for w in newlist)
    print(convert_first)

capitalize(input("enter some string"))
Or, for those who favor short code and generator expressions:
def capitalize(str):
    print(' '.join(word[0].upper() + word[1:] for word in str.split(' ')))

capitalize(input("enter some string"))
This also removes the trailing space from the generated string, which may or may not be what you intended.
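
One edge case worth a hedge: with split(' '), two consecutive spaces produce an empty word, and word[0] then raises IndexError. A small sketch that tolerates this, assuming it is acceptable to return the string instead of printing it (the name `capitalize_words` is hypothetical):

```python
def capitalize_words(text):
    # word[:1] is an empty string for an empty word, so no IndexError is raised.
    # The parameter is not called `str`, which keeps the built-in usable.
    return ' '.join(word[:1].upper() + word[1:] for word in text.split(' '))

print(capitalize_words("hello  big world"))
```

Splitting and joining on the same single space preserves the original spacing exactly.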

Split a word with regexp in matlab; startIndex for 'split'?

My aim is to generate the phonetic transcription for any word according to a set of rules.
First, I want to split words into their syllables. For example, I want an algorithm to find 'ch' in a word and then separate it like shown below:
Input: 'aachbutcher'
Output: 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r'
I have come so far:
check=regexp('aachbutcher','ch');
if (isempty(check{1,1})==0) % Returns 0, when 'ch' was found.
[match split startIndex endIndex] = regexp('aachbutcher','ch','match','split')
%Now I split the 'aa', 'but' and 'er' into single characters:
for i = 1:length(split)
SingleLetters{i} = regexp(split{1,i},'.','match');
end
end
My problem is: How do I put the cells together, such that they are formatted like the desired output? I only have the starting indexes for the match parts ('ch') but not for the split parts ('aa', 'but','er').
Any ideas?

You don't need to work with the indices or lengths. The logic is simple: process the first element from split, then the first from match, then the second from split, and so on.
[match,split,startIndex,endIndex] = regexp('aachbutcher','ch','match','split');
%Now I split the 'aa', 'but' and 'er' into single characters:
SingleLetters=regexp(split{1,1},'.','match');
for i = 2:length(split)
SingleLetters=[SingleLetters,match{i-1},regexp(split{1,i},'.','match')];
end
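
The interleaving can also be avoided altogether by letting one pattern match either the multi-letter unit or any single character. The sketch below shows the idea in Python for illustration, not MATLAB; the function name `split_units` is hypothetical. The same alternation, 'ch|.', should carry over to MATLAB's regexp with the 'match' option, but that is an untested assumption here.

```python
import re

def split_units(word, unit="ch"):
    # Alternatives are tried left to right, so the multi-letter unit wins
    # over the single-character fallback '.' wherever both could match.
    pattern = re.escape(unit) + r"|."
    return re.findall(pattern, word)

print(split_units("aachbutcher"))
# ['a', 'a', 'ch', 'b', 'u', 't', 'ch', 'e', 'r']
```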

So, you know the length of 'ch', it's 2. You know where you found it from regex, as those indices are stored in startIndex. I'm assuming (Please, correct me if I'm wrong) that you want to split all other letters of the word into single-letter cells, like in your output above. So, you can just use the startIndex data to construct your output, using conditionals, like this:
check=regexp('aachbutcher','ch');
if (isempty(check{1,1})==0) % Returns 0, when 'ch' was found.
[match split startIndex endIndex] = regexp('aachbutcher','ch','match','split')
%Now I split the 'aa', 'but' and 'er' into single characters:
for i = 1:length(split)
SingleLetters{i} = regexp(split{1,i},'.','match');
end
end
j = 0;
for i = 1 : length('aachbutcher')
if (i ~= startIndex(1)) && (i ~= startIndex(2))
j = j +1;
output{end+1} = SingleLetters{j};
else
i = i + 1;
output{end+1} = 'ch';
end
end
I don't have MATLAB right now, so I can't test it. I hope it works for you! If not, let me know and I'll take another shot at it.

Excel - Extract all occurrences of a String Pattern + the subsequent 4 characters after the pattern match from a cell

I am struggling with a huge Excel sheet where I need to extract from a certain cell (A1),
all occurrences of a string pattern e.g. "TCS" + the following 4 characters after the pattern match e.g. TCS1234 comma-separated into another cell (B1).
Example:
Cell A1 contains the following string:
HRS164, SRS3439(s), SRS3440(s), SRS3441(s), SRS3442(s), SRS3443(s), SRS3444(s), SRS3445(s), SRS3449(s), SRS3450(s), SRS3451(s), SRS3452(s), SYSBASE.SSS300(s), TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234
All TCS-Numbers shall be comma-separated in B1:
TCS3715, TCS3716, TCS3717, TCS4037, TCS1234
It is not necessary to also extract the trailing "(s)".
Could someone please help me (excel rookie) with this challenge?
TIA Erika

Here is what I would use for something like that, a user-defined function:
Function GetTCS(TheString)
For Each TItem In Split(TheString, ", ")
If Left(TItem, 3) = "TCS" Then GetTCS = GetTCS & TItem & " "
Next
GetTCS = Replace(Trim(GetTCS), " ", ", ")
End Function
This returns "TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234" from your string. If you don't know how to create a user-defined function, just ask; it's pretty straightforward and I'd be happy to show you. Hope this helps.

Try the following User Defined Function:
Public Function Xtract(r As Range) As String
Dim s As String, L As Long, U As Long
Dim msg As String, i As Long
s = Replace(r(1).Text, " ", "")
ary = Split(s, ",")
L = LBound(ary)
U = UBound(ary)
Xtract = ""
msg = ""
For i = L To U
If Left(ary(i), 3) = "TCS" Then
If msg = "" Then
msg = Left(ary(i), 7)
Else
msg = msg & "," & Left(ary(i), 7)
End If
End If
Next i
Xtract = msg
End Function

If the TCS-parts are always at the end of the string as in your example, I would use (in B1):
=REPLACE(A1,1,FIND("TCS",A1)-1,"")
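
For comparison outside Excel, the rule "the pattern plus the next four characters" can be expressed as a regular expression. This is only an illustrative Python sketch, not something that runs in a worksheet; the function name `extract_codes` and its parameters are hypothetical.

```python
import re

def extract_codes(cell_text, prefix="TCS", extra=4):
    # Match the literal prefix followed by exactly `extra` more characters.
    pattern = re.escape(prefix) + r".{%d}" % extra
    return ", ".join(re.findall(pattern, cell_text))

a1 = "HRS164, SRS3439(s), SYSBASE.SSS300(s), TCS3715(s), TCS3716(s), TCS4037(s), TCS1234"
print(extract_codes(a1))
```

Unlike the REPLACE formula, this does not depend on the TCS entries sitting at the end of the string, and it drops the "(s)" suffix.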

ViM: how to put string from input dialog in a list

VIM: Does anyone know how to put a string from an input dialog in a list?
For example:
the string "3,5,12,15"
to:
list item[1] = 3
list item[2] = 5
list item[3] = 12
etc.
and how can I know how many list items there are?

From :h E714
:let l = len(list) " number of items in list
:let list = split("a b c") " create list from items in a string
In your case,
let string = "3,5,7,19"
let list = split(string, ",")
echo len(list)

Use split, len and empty functions:
let list=split(string, ',')
let list_length=len(list)
" If all you want is to check whether list is empty:
if empty(list)
throw "You must provide at least one value"
endif
Note that if you want to get a list of numbers out of the string, you will have to use map to transform list elements into numbers:
let list=map(split(string, ','), '+v:val')
Most of the time you can expect strings to be converted to numbers automatically, but sometimes that conversion is not done.
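
For readers more at home in Python than in Vim script, the same split, convert, and count steps look roughly like this. It is a comparison sketch only; the name `parse_numbers` is hypothetical.

```python
def parse_numbers(text, sep=","):
    # Skip empty pieces so that "" or a trailing separator does not raise ValueError.
    return [int(piece) for piece in text.split(sep) if piece.strip()]

items = parse_numbers("3,5,12,15")
print(items, len(items))
```

As with the map call in Vim, the conversion to numbers is an explicit step and does not happen during the split.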
