How do the split and strip functions work together in Python? - python-2.7

headers = table.by_tag('th')
labels = [str(t.content).split('(')[0].strip() for t in headers[3:-1]]
I know what split() and strip() do. But what does split('(')[0] mean? headers holds the header cells of a table.

For example, suppose the HTML was:
<table>
<tr><th>Jerry Brown (D)</th><th>Meg Whitman(D)</th></tr>
<tr><td>1</td><td>4</td></tr>
<tr><td>2</td><td>1</td></tr>
<tr><td>3</td><td>2</td></tr>
</table>
headers may be extracted by BeautifulSoup,
and the result is a list containing the following:
["<th>Jerry Brown (D)</th>", "<th>Meg Whitman(D)</th>"]
so t.content is "Jerry Brown (D)" for the first header and "Meg Whitman(D)" for the second.
"Jerry Brown (D)".split('(') = ["Jerry Brown ", "D)"]
"Meg Whitman(D)".split('(') = ["Meg Whitman", "D)"]
["Jerry Brown ", "D)"][0] = "Jerry Brown "
["Meg Whitman", "D)"][0] = "Meg Whitman"
and strip() removes whitespace from both ends of the string, so...
labels is ["Jerry Brown", "Meg Whitman"] (ignoring the headers[3:-1] slice, which would be empty for this two-column example).
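The same steps can be reproduced without any HTML parsing. This is a minimal sketch, assuming the header text has already been extracted into plain strings; the contents list is a hypothetical stand-in for the t.content values.

```python
# Hypothetical stand-in for the text of each <th> element (t.content).
contents = ["Jerry Brown (D)", "Meg Whitman(D)"]

labels = []
for text in contents:
    parts = text.split('(')      # e.g. ["Jerry Brown ", "D)"]
    name = parts[0]              # everything before the first '('
    labels.append(name.strip())  # drop leading and trailing whitespace

print(labels)
```

If a header contains no '(' at all, split('(') returns a one-element list holding the whole string, so indexing with [0] is still safe.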

Related

Split a line read from a file in C++

How do I access individual elements of a line read from a file?
I used the following to read a line from a file:
getline(infile, data) // where infile is an object of ifstream and data is the variable where the line will be stored
The following line is stored in data : "The quick brown fox jumped over the lazy dog"
How do I access particular elements of the line now? What if I want to play around with the second element ( quick ) of the line or get hold of a certain word in the line? How do I select it?
Any help will be appreciated
Here data is a std::string holding "The quick brown fox jumped over the lazy dog" and the delimiter is " ". You can use std::string::find() to find the position of the delimiter and std::string::substr() to get a token:
std::string data = "The quick brown fox jumped over the lazy dog";
std::string delimiter = " ";
std::string token = data.substr(0, data.find(delimiter)); // token is "The"
Since your text is space separated, you can use std::istringstream to separate the words.
std::vector<std::string> words;
const std::string data = "The quick brown fox jumped over the lazy dog";
std::string w;
std::istringstream text_stream(data);
while (text_stream >> w)
{
words.push_back(w);
std::cout << w << "\n";
}
operator>> skips leading whitespace and then reads characters into the string until the next whitespace character.

regexp, how to match longer strings first?

There must be an option or flag for this that I missed in MATLAB:
I want to use regular expressions to match against a given string, but multiple matches are possible. I want the longer alternatives to be tried first, before the shorter ones.
How can this be achieved?
regexpi('A quick brown fox jumps over the lazy dog.','quick|the|a','match','once')
%returns 'A', would like it to return 'quick'
Maybe you can try the following code
% match all possible key words, don't use argument 'once' in `regexpi()`
v = regexpi('A quick brown fox jumps over the lazy dog.','quick|the|a','match');
% calculate the lengths of matched words
lens = cellfun(@length,v);
% output the longest word
v{lens == max(lens)}
such that
ans = quick
You could do
>> regexpi('A quick brown fox jumps over the lazy dog.','.*?(quick)|.*?(the)|.*?(a)','tokens','once')
ans =
1×1 cell array
{'quick'}
but that's pretty ugly. Another solution, which is a smidge less ugly, is
>> str = "A quick brown fox jumps over the lazy dog.";
>> list = ["quick" "the" "a"];
>> list(find(arrayfun(@(x)contains(str,x), list), 1))
ans =
"quick"
I think I like Thomas' solution the most.

scala-regexp: split string into array of two following words

I need to split a string into an array whose elements are pairs of consecutive words, in Scala:
"Hello, it is useless text. Hope you can help me."
The result:
[[it is], [is useless], [useless text], [Hope you], [you can], [can help], [help me]]
One more example:
"This is example 2. Just\nskip it."
Result:
[[This is], [is example], [Just skip], [skip it]]
I tried this regex:
val re = """[a-zA-Z]+\s[a-zA-Z]+""".r
But the output is:
scala> for (m <- re.findAllIn("Hello, it is useless text. Hope you can help me.")) println(m)
it is
useless text
Hope you
can help
So it ignores some cases.
First split on the punctuation and digits, then split on the spaces, then slide over the results.
def doubleUp(txt :String) :Array[Array[String]] =
txt.split("[.,;:\\d]+")
.flatMap(_.trim.split("\\s+").sliding(2))
.filter(_.length > 1)
usage:
val txt1 = "Hello, it is useless text. Hope you can help me."
doubleUp(txt1)
//res0: Array[Array[String]] = Array(Array(it, is), Array(is, useless), Array(useless, text), Array(Hope, you), Array(you, can), Array(can, help), Array(help, me))
val txt2 = "This is example 2. Just\nskip it."
doubleUp(txt2)
//res1: Array[Array[String]] = Array(Array(This, is), Array(is, example), Array(Just, skip), Array(skip, it))
First process the string by expanding any escape sequences in it.
scala> val string = "Hello, it is useless text. Hope you can help me."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String = Hello, it is useless text. Hope you can help me.
OR
scala> val string = "This is example 2. Just\nskip it."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String =
//This is example 2. Just
//skip it.
Then filter out the unwanted tokens (empty strings and words followed by punctuation) and use the sliding function:
val result = preprocessed.split("\\s").filter(e => !e.isEmpty && !e.matches("(?<=^|\\s)[A-Za-z]+\\p{Punct}(?=\\s|$)") ).sliding(2).toList
//scala> res9: List[Array[String]] = List(Array(it, is), Array(is, useless), Array(useless, Hope), Array(Hope, you), Array(you, can), Array(can, help))
You need to use split to break the string down into words separated by non-word characters, and then sliding to pair up the words in the way that you want:
val text = "Hello, it is useless text. Hope you can help me."
text.trim.split("\\W+").sliding(2)
You may also want to remove escape characters, as explained in other answers.
Sorry, I only know Python. I heard the two are almost the same. Hope you can understand.
string = "it is useless text. Hope you can help me."
split = string.split(' ')  # splits on space (you can use a regex for this)
result = []
no = 0
count = len(split)
for x in range(count):
    no += 1  # index of the word after the current one
    if no < count:
        pair = split[x] + ' ' + split[no]  # joins the current word to the next
        result.append(pair)
The output will be:
['it is', 'is useless', 'useless text.', 'text. Hope', 'Hope you', 'you can', 'can help', 'help me.']
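The output above keeps the punctuation and pairs words across sentence boundaries, which differs from the result asked for in the question. The following is a sketch, still in Python, that assumes chunks end at `.`, `,`, `;`, `:` or digits, as in the first Scala answer above. The name word_pairs is a hypothetical helper, not part of any library.

```python
import re

def word_pairs(text):
    """Return pairs of consecutive words, never pairing across punctuation."""
    pairs = []
    # Split into chunks at punctuation and digits (assumed chunk boundaries).
    for chunk in re.split(r"[.,;:\d]+", text):
        words = chunk.split()  # split on any whitespace, including "\n"
        # Pair each word with the word that follows it in the same chunk.
        pairs.extend(zip(words, words[1:]))
    return [list(p) for p in pairs]

print(word_pairs("Hello, it is useless text. Hope you can help me."))
print(word_pairs("This is example 2. Just\nskip it."))
```

A chunk with a single word, such as "Hello", produces no pairs because zip stops at the shorter of its two inputs.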

Matching a whole string case-insensitively

I'm looking for a function, if one is available, to match a whole word. For example:
std::string str1 = "I'm using firefox browser";
std::string str2 = "The quick brown fox.";
std::string str3 = "The quick brown fox jumps over the lazy dog.";
Only str2 and str3 should match for the word fox. It should still match if there is a symbol such as a period (.) or a comma (,) before or after the word, and the search also has to be case-insensitive.
I've found many ways to search a case insensitive string but I would like to know something for matching a whole word.
I would recommend std::regex from C++11, but it is not working yet with g++ 4.8, so I recommend boost::regex as a replacement.
#include<iostream>
#include<string>
#include<vector>
#include<boost/regex.hpp>
int main()
{
std::vector <std::string> strs = {"I'm using firefox browser",
"The quick brown fox.",
"The quick brown Fox jumps over the lazy dog."};
for( auto s : strs ) {
std::cout << "\n s: " << s << '\n';
if( boost::regex_search( s, boost::regex("\\<fox\\>", boost::regex::icase))) {
std::cout << "\n Match: " << s << '\n';
}
}
return 0;
}
/*
Local Variables:
compile-command: "g++ --std=c++11 test.cc -lboost_regex -o ./test.exe && ./test.exe"
End:
*/
The output is:
s: I'm using firefox browser
s: The quick brown fox.
Match: The quick brown fox.
s: The quick brown Fox jumps over the lazy dog.
Match: The quick brown Fox jumps over the lazy dog.

why is my std::string being cut off?

I initialize a string as follows:
std::string myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
and the myString ends up being cut off like this:
'The quick brown fox jumps over the
lazy dog' is an English-language
pangram (a phrase that contains
Where can I set the size limit?
I tried the following without success:
std::string myString;
myString.resize(300);
myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
Many thanks!
Of course it was just the debugger cutting it off (xcode). I'm just getting started with xcode/c++, so thanks a lot for the quick replies.
Are you sure?
kkekan> ./a.out
'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)
There is no good reason why this should have happened!
Try the following (in debug mode):
assert(!"Congratulations, I am in debug mode! Let's do a test now...");
std::string myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
assert(myString.size() > 120);
Does the (second) assertion fail?
When printing or displaying text, the output machinery buffers the output. You can tell it to flush the buffers (display all remaining text) by using std::endl or calling the flush() method; writing a '\n' also triggers a flush when the stream is line-buffered, as is common for a terminal:
#include <iostream>
#include <string>
using std::cout;
using std::endl;
int main(void)
{
std::string myString =
"'The quick brown fox jumps over the lazy dog'" // Compiler concatenates
" is an English-language pangram (a phrase" // these contiguous text
" that contains all of the letters of the" // literals automatically.
" alphabet)";
// Method 1: use '\n'
// A newline triggers a flush only if the stream is line-buffered (common for terminals).
cout << myString << '\n';
// Method 2: use std::endl;
// std::endl sends '\n' to the output, then flushes the buffer.
cout << myString << endl;
// Method 3: use flush() method
cout << myString;
cout.flush();
return 0;
}
For more information about buffers, search Stack Overflow for "C++ output buffer".