R : regular expression to match pattern in only the first line [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
In my R code, I have the following content of x as a result of lda prediction output.
[1] lamb
Levels: lamb cow chicken
I would like to capture the word "lamb" in the first line and not the second line.
I had the following reg expression which did not work.
if (regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"){
cat("It is a lamb")
}
Instead, I also got the following error :-
Error in regexec(".*?([a-z]+)", x) : invalid 'text' argument
Anyone with help ?
Thanks in advance.
mf

Direct Answer:
It is a variable type error. See ?predict.lda to learn why: The return object of a predict() when used with an object of class lda is a list. You just want the first element of the list, which is a factor for an object of type integer. Factors in R store some characters for every element in their level component, which can be accessed by levels() (Read ?factor as well.). But what you want is to access the explicit value your factor shows, which can be acheived by as.character(). By the way: The second line does not get checked by the regex. It is just standard console output of a factor, see ?print.factor.
Here's an example, based on thepredict.lda() help page:
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
x_lda <- predict(z, test)
# x_lda is a list
typeof(x_lda)
# The first element of the list, called "class", is a factor of type integer.
typeof(x_lda$class)
# Now we create a character vector from the factor:
as.character(x_lda$class)
With an explicit character object, your code works for me:
x <- "lamb"
regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"
[1] TRUE
So you need to coerce your object to character, and then use it as the "text" argument for the regexec function.
Actual Answer:
There are better ways to do this.
You nest and chain a lot of functions in one line. This is barely readable and makes debugging hard.
If you know that the output will always consist of certain elements (especially, since you know the input of your lda prediction and therefore know the different factor levels beforehand), you can simply check them by == and maybe any() (continuing with the example from before):
levels(cl)
[1] "c" "s" "v"
any(as.character(x_lda$class)=="c")
[1] TRUE
See the help file for ?any, if you don't know what it does.
Finally, if you just want to print "It is a lamb" in the end, and your output will always just have one element, you can simply use paste():
paste("It is a", as.character(x))
[1] "It is a lamb"

Related

Converting list of clauses to a query?

let say i have the following facts :
book(65).
own(named('Peter'), 65).
now got the query as a list of clauses :
[what(A), own(named('Peter'), A)]
or
[who(X), book(A), own(X, A)] .
how do I make a rule that accept this list and return the result. Keep in mind that the question could be Why,When,Who...
I went the usual way :
query_lst([]).
%% query_lst([what(Q)|T], Q) :- query_lst(T).
query_lst([H|T]) :- write('?- '),writeln(H),
call(H), query_lst(T).
but this does not allow binding of Q in wh(Q) to the answer which could be in any of the facts that are called by call()
Additional complication I did not forsee is that the query :
(what(A), own(named('Peter'), A).
would fail, because there is no what(X), fact in the DB.
I have to just bind somehow the variable A /that is in what()/ to query_lst(Goals,A) and of course remove what(X) from the list /which i can do with select/3 /
any idea how to bind list-Wh-var to query_lst result ?
my current solution (assumes Q is first element):
query_lst([G|Gs],Res) :- G =.. [Q,Res], member(Q,[what,why,who,when]), lst2conj(Gs,Conj), call(Conj).
Simply convert the list of goals into a conjunction and call it:
list_to_conjunction([], true).
list_to_conjunction([Goal| Goals], Conjunction) :-
list_to_conjunction(Goals, Goal, Conjunction).
list_to_conjunction([], Conjunction, Conjunction).
list_to_conjunction([Next| Goals], Goal, (Goal,Conjunction)) :-
list_to_conjunction(Goals, Next, Conjunction).
Then:
query_list(Goals) :-
list_to_conjunction(Goals, Conjunction),
call(Conjunction).
You got an answer, but it was an answer to your question, not to what you really wanted. Also, you edited your question after you accepted that answer, which isn't very helpful. Typically it's better to open a new question when you have... a new question.
Here is an answer to what you seem to want, which is not exactly what you asked. You have lists of the form [WhPart | Rest] where the WhPart is a wh-word with a variable, and the Rest is a list of goals. You want to execute these goals and get the variable in the wh-term bound.
The good news is that, since the variable in the wh-word also occurs in the goals, it will be bound if you execute them. No extra work is needed. Executing the goals is enough. If the wh-part is really at the start of the list, you can do the whole thing like this:
query([_WhPart | Body]) :-
call_body(Body).
call_body([]).
call_body([Goal | Goals]) :-
call(Goal),
call_body(Goals).
For example:
?- query([who(X), book(A), own(X, A)]).
X = named('Peter'),
A = 65.
?- query([what(A), own(named('Peter'), A)]).
A = 65.
As you can see, there is no need to convert the query to a conjunctive goal: Executing the queries in sequence is exactly the same as executing their conjunction.
Also, it doesn't actually matter which wh-word is used; the only thing that really matters is the variable contained within the term. For this reason the above version does no checking at all, and the _WhPart could be anything. If you want to check that it is a valid term, you can do the following:
query([WhPart | Body]) :-
wh(WhPart),
call_body(Body).
wh(who(_X)).
wh(what(_X)).
wh(when(_X)).
This buys you some "type checking":
?- query([foo(A), own(named('Peter'), A)]).
false.
But not a lot, since you don't know if the wh-word actually fits what is being asked:
?- query([when(A), own(named('Peter'), A)]).
A = 65.

haskell function to take list of integer or string and return a list of integer [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I have been trying to create a function in Haskell that takes a list of integers or strings as input. It then checks the first index and it is numerical value e.g. 0-9 returns [0] if it is string value returns 1. I tried to use the function elem but got this error:
Ambiguous use of operator "elem" with "(==)"
My code is:
I am not looking for a solution which includes importing modules
The code doesn't have to follow this structure; It could be different. Output that I am looking for:
f "abcd efgh ijkl" returns [1]
f [1,2,3,4,5,6] returns [0]
Thanks!
I think you are a bit confused. Haskell doesn't do "I don't know what type that thing is".
However, if you want to write instead a function which can be cast to take a list of integers or cast to take a list of strings (contrast: "takes a list of integers or strings"), you may use a typeclass. For example:
class IsInteger a where isInteger :: proxy a -> Bool
instance IsInteger Char where isInteger _ = False
instance IsInteger Integer where isInteger _ = True
Try it out in ghci:
> isInteger "abcdefg"
False
> isInteger [0,1,2,3]
True

Counting number of item types in a series in Python [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Improve this question
I am getting values from an api call and it returns one json value/key pair as a string at a time. I need to count how many times items with a certain prefix (which encodes the type of the item) occur:
Lets say I am getting 'abc123' as the 1st value
def getType(nodeName):
nodeCount = 0
if "abc" in nodeName:
count = count + 1
return "ABC", count
How do I retain this nodeCount value so that next time an item with prefix 'abc' comes in from the api call, the count can be incremented to 2.
Also, I need to create other counters to keep track of the count of other node types, such as 'xyz777'.
I tried to declare nodeCount as global variable but if I add "global count", that will defeat the purpose of retaining the count value for the next api call/iteration.
I am very new to python, so please let me know if there is any easy way.
Many Thanks!
You may use a collections.Counter like this:
from collections import Counter
def getType(counter, nodeName):
nodetype= nodeName.rstrip('0123456789')
counter[nodetype] += 1
return nodetype.upper(), counter[nodetype]
c= Counter()
for n in ['abc123', 'def789', 'ghijk11', 'def99', 'abc444']:
nodetype, nodecount = getType(counter= c, nodeName= n)
print('type {} \t: {}'.format(nodetype, nodecount))
print('summary:')
print(c)

regex for matching column names on a matrix

I'll have two strings of the form
"Initestimate" or "L#estimate" with # being a 1 or 2 digit number
" Nameestimate" with Name being the name of the actual symbol. In the example below, the name of our symbol is "6JU4"
And I have a matrix containing, among other things, columns containing "InitSymbol" and "L#Symbol". I want to return the column name of the column where the first row holds the substring before "estimate".
I'm using stringr. Right now I have it coded with a bunch of calls to str_sub but its really sloppy and I wanted to clean it up and do it right.
example code:
> examplemat <- matrix(c("RYU4","6JU4","6EU4",1,2,3),ncol=6)
> colnames(examplemat) <- c("InitSymb","L1Symb","L2Symb","RYU4estimate","6JU4estimate","6EU4estimate")
> examplemat
InitSymb L1Symb L2Symb RYU4estimate 6JU4estimate 6EU4estimate
[1,] "RYU4" "6JU4" "6EU4" "1" "2" "3"
> searchStr <- "L1estimate"
So with answer being the answer I'm looking for, I want to be able to input examplemat[,answer] so I can extract the data column (in this case, "2")
I don't really know how to do regex, but I think the answer looks something like
examplemat[,paste0(**some regex function**("[(Init)|(L[:digit:]+)]",searchStr),"estimate")]
what function goes there, and is my regex code right?
May be you can try:
library(stringr)
Extr <- str_extract(searchStr, '^[A-Za-z]\\d+')
Extr
[1] "L1"
#If the searchStr is `Initestimate`
#Extr <- str_extract(searchStr, '^[A-Za-z]{4}')
pat1 <- paste0("(?<=",Extr,").*")
indx1 <-examplemat[,str_detect(colnames(examplemat),perl(pat1))]
pat2 <- paste0("(?<=",indx1,").*")
examplemat[,str_detect(colnames(examplemat), perl(pat2))]
#6JU4estimate
# "2"
#For searchStr using Initestimate;
#examplemat[,str_detect(colnames(examplemat), perl(pat2))]
#RYU4estimate
# "1"
The question is bit confusing so I am quite not sure if my interpretation is correct.
First, you would extract the values in the string "coolSymb" without "Symb"
Second, you can detect if column name contains "cool" and return the location (column index)
by which() statement.
Finally, you can extract the value using simple matrix indexing.
library(stringr)
a = str_split("coolSymb", "Symb")[[1]][1]
b = which(str_detect(colnames(examplemat), a))
examplemat[1, b]
Hope this helps,
won782's use of str_split inspired me to find an answer that works, although I still want to know how to do this by matching the prefix instead of excluding the suffix, so I'll accept an answer that does that.
Here's the step-by-step
> str_split("L1estimate","estimate")[[1]][1]
[1] "L1"
replace the above step with one that gets {L1} instead of getting {not estimate} for bonus points
> paste0(str_split("L1estimate","estimate")[[1]][1],"Symb")
[1] "L1Symb"
> examplemat[1,paste0(str_split("L1estimate","estimate")[[1]][1],"Symb")]
L1Symb
[1,] "6JU4"
> paste0(examplemat[1,paste0(str_split("L1estimate","estimate")[[1]][1],"Symb")],"estimate")
[1] "6JU4estimate"
> examplemat[,paste0(examplemat[1,paste0(str_split("L1estimate","estimate")[[1]][1],"Symb")],"estimate")]
6JU4estimate
[1,] "2"

regex remove seconds and milliseconds [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
This is linked the my previous question, regex to add hypen in dates.
I would now like to be able to remove the seconds and milliseconds/change it to zero using gsub again as well
i.e. something like:
x <- c("20130603 00:00:03.102","20130703 00:01:03.103","20130804 00:03:03.104")
y <- gsub([REGEX PATTERN TO MATCH],[REPLACEMENT PATTERN TO INSERT HYPHEN and REMOVE SECONDS] ,x)
> y
[1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
You can use strptime to parse your objects into POSIXlt objects which, when printed, are exactly in the format you expect:
y <- strptime(x, "%Y%m%d %H:%M:%S")
# [1] "2013-06-03 00:00:03" "2013-07-03 00:01:03" "2013-08-04 00:03:03"
To remove seconds, use trunc:
y <- trunc(y, units = "mins")
# [1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
Having your objects as date/time objects will open a lot of doors, but if you really mean to store the output as a character vector, then just use as.character:
y <- as.character(y)
A lubridate version:
library(lubridate)
dt <- ymd_hms(x)
dt2 <- update(dt, seconds = 0)
You can try this regex, which I added a bit:
gsub("(\\d{4})(\\d{2})(\\d{2}) (\\d{2}:\\d{2}).*", "\\1-\\2-\\3 \\4:00", subject, perl=TRUE);
demo on regex101.