How can I get the cosine distance between two words in Deeplearning4j Word2Vec?

I'm using Deeplearning4j to learn from text data.
I've finished the Word2Vec tutorial on the Deeplearning4j website and successfully
trained word vectors on 100 documents,
but I don't know how to get the cosine distance between two different words, as in the picture below.
As in that picture, if I enter the word 'France',
I want to get
[words similar to France + cosine distance].
I can already get [words similar to France],
but I don't know how to get the cosine distance value.
Any solution?

Oops, sorry, my bad: I missed part of the tutorial.
I found the solution:
double cosSim = vec.similarity("day", "night");
System.out.println(cosSim);
//output: 0.7704452276229858
Note that similarity() returns the cosine similarity; the cosine distance is 1 minus that value.
Sorry, forget about this stupid question.

If you want to do it with just ND4J, you can also use Transforms.cosineSim:
https://github.com/deeplearning4j/nd4j/blob/master/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/ops/transforms/Transforms.java#L53
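For reference, here is a minimal plain-Java sketch of what cosine similarity computes, with no DL4J or ND4J dependency. It also shows how to pair each nearest word with its score, which is what the question asked for. The toy vectors and the helper names (cosineSimilarity, cosineDistance, nearest) are hypothetical. With a trained model, you would take the vectors and neighbour list from the model instead; as far as I know, the DL4J tutorial uses wordsNearest for the neighbour list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CosineSimilarityDemo {
    // Cosine similarity: dot(a, b) / (|a| * |b|). 1 = same direction, 0 = orthogonal.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance is simply 1 - cosine similarity.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    // Scores every other word against `query` and returns the top n, most similar first.
    static List<Map.Entry<String, Double>> nearest(Map<String, double[]> vectors, String query, int n) {
        double[] q = vectors.get(query);
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            if (!e.getKey().equals(query)) {
                scored.add(Map.entry(e.getKey(), cosineSimilarity(q, e.getValue())));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return scored.subList(0, Math.min(n, scored.size()));
    }

    public static void main(String[] args) {
        // Hypothetical 3-dimensional toy vectors; real Word2Vec vectors come from the trained model.
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("france", new double[]{0.9, 0.1, 0.0});
        vectors.put("spain", new double[]{0.8, 0.2, 0.1});
        vectors.put("banana", new double[]{0.0, 0.1, 0.9});
        for (Map.Entry<String, Double> e : nearest(vectors, "france", 2)) {
            System.out.printf("%s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```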


REGEXEXTRACT syntax - extract the text after a decimal number in a string, e.g. "UNDER 12.5-120"

I am trying to write a RegexExtract formula to extract the odds of a bet.
My example text is UNDER 12.5-120.
In this example, I would like to return -120, but my formula needs to be dynamic enough to extract other odds as well.
Other examples would be +120, +1200, +12000, -1000, etc.
The string will always be in this order: OVER or UNDER, then the line of the bet, then the odds of the bet. I have successfully written the regex for the line and for OVER/UNDER but can't figure out the odds portion.
This is what I have so far:
=REGEXEXTRACT('Form Responses 2'!C2,"[\d.,].*") but this returns 12.5-120 and I need only the -120.
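Try this for a range:
=INDEX(IFERROR(REGEXEXTRACT(Q2:Q8,"[-+]\d+")))
(Inside square brackets, | is a literal character, so [-+] is all that is needed to match a plus or minus sign.)
If you want pure numbers, try:
=INDEX(IFERROR(REGEXEXTRACT(Q2:Q8,"[-+]\d+")+0))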
This ended up working for me!
=REGEXEXTRACT('Form Responses 2'!C2,"[-+]\d+")
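The same pattern can be tried outside Sheets. Below is a minimal Java sketch that uses java.util.regex rather than Sheets' RE2 engine; both treat this simple pattern the same way. Only "UNDER 12.5-120" comes from the question. The other sample strings and the extractOdds helper are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OddsExtractor {
    // A sign (+ or -) followed by one or more digits. The line "12.5" has no sign,
    // so the first match is the odds part.
    private static final Pattern ODDS = Pattern.compile("[-+]\\d+");

    static Optional<String> extractOdds(String bet) {
        Matcher m = ODDS.matcher(bet);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String bet : new String[]{"UNDER 12.5-120", "OVER 7.5+1200", "UNDER 12.5"}) {
            System.out.println(bet + " -> " + extractOdds(bet).orElse("(no odds)"));
        }
    }
}
```

Integer.parseInt accepts a leading "+" or "-", so the extracted string can be turned into a number directly, much like the "+0" trick in the Sheets formula.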

How to Keep rows of multi-line cells containing a keyword in google sheets

I'm trying to keep lines that contain the word "NOA" in a column A which has many multi-line cells as can be viewed in this Google Spreadsheet.
If "NOA" is present, I would like to keep the line. The input and output should look like the image, which I have "working" with too many helper cells. Can this be combined into a single formula?
Theoretical Approaches:
I have been thinking about three approaches to solve this:
ARRAYFORMULA(REGEXREPLACE - couldn't get it to work
JOIN(FILTER(REGEXMATCH(TRANSPOSE - showing promise as it works in multiple steps
Using the QUERY function - I'm unfamiliar with it, but I wonder whether it offers a fast solution
Practical attempts:
FIRST APPROACH: I first tried using REGEXREPLACE to strip out everything that did not contain NOA. The regex worked in a demo but didn't work properly in Sheets. I thought this might be a concise way to get the value, perhaps if my regex skills were better?
ARRAYFORMULA(REGEXREPLACE(A1:A7, "^(?:[^N\n]|N(?:[^O\n]|O(?:[^A\n]|$)|$)|$)+",""))
I think the regex became overly complex and didn't work in Google Sheets, or perhaps the formula could be improved. Google's RE2 engine also has limitations (for example, no lookahead or lookbehind) that make certain things harder.
SECOND APPROACH:
Then I came up with an alternative approach that seems to work in two stages (with multiple helper cells), but I would like to do this with one formula.
=TRANSPOSE(split(A2,CHAR(10)))
=TEXTJOIN(CHAR(10),1,FILTER(C2:C7,REGEXMATCH(C2:C7,"NOA")))
Questions:
Can these formulas be combined and applied to the entire column using INDEX or an array formula?
Or perhaps the regex in my first approach can be modified?
Is there a faster solution using QUERY?
The shared Google spreadsheet is here.
Thank you in advance for your help.
Here's one way you can do that:
=index(substitute(substitute(transpose(trim(
query(substitute(transpose(if(regexmatch(split(
filter(A2:A,A2:A<>""),char(10)),"NOA"),split(
filter(A2:A,A2:A<>""),char(10)),))," ","❄️")
,,9^9)))," ",char(10)),"❄️"," "))
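First, we split the data on the newline (CHAR(10)). Then we blank out the lines that don't contain NOA. Finally, we use a "query smush" to join everything back together: QUERY with a header count of 9^9 treats every row as a header and concatenates each column into a single space-separated string. Spaces inside the lines are temporarily replaced with ❄️, so TRIM only collapses the gaps left by the removed lines. The remaining spaces are then turned into CHAR(10), and the ❄️ back into spaces.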
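Per cell, the formula boils down to split, filter, join. Here is a minimal Java sketch of that logic, for comparison only. The keepLinesContaining helper and the sample cell text are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class KeepMatchingLines {
    // Same three steps as the formula: split the cell on newlines (CHAR(10)),
    // keep only the lines containing the keyword, join the survivors with newlines.
    static String keepLinesContaining(String cell, String keyword) {
        return Arrays.stream(cell.split("\n"))
                .filter(line -> line.contains(keyword))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String cell = "NOA 123\nsomething else\nanother NOA line"; // hypothetical cell contents
        System.out.println(keepLinesContaining(cell, "NOA"));
    }
}
```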

How to build regular expressions

I'm dealing with a Google spreadsheet with data, some of which is formatted in a very messy but regular way, so I hope we can figure this out.
I've tried regex builders, but I can't find one that works for Google Sheets, or I misunderstand something.
I would appreciate help with these strings:
1. {"user":{"Czy faktura?":"Y","Nazwa firmy":"Name of the company ","NIP":"113 234 20 57"}}
2. {"user":{"Czy faktura?":"Y","Nazwa firmy":"The longer name of the company","NIP":"2352225961"}}
3. {"user":{"Czy faktura?":"N","Nazwa firmy":"","NIP":""}}
The goal is to extract (using ARRAYFORMULA in Google Sheets):
Y or N
Name of the company
NIP number
Problems:
The name of the company has different lengths, and the NIP number is sometimes with white-spaces.
Do you have any idea how I can do this properly?
I know it's the REGEXEXTRACT formula, of course :)
I just have a problem formulating the regular expression.
=regexreplace(B1, "(^.*Nazwa firmy"":"")(.*)("",""NIP.*$)", "$2")
Well, the support was fantastic :)
In the end, a simple "Y|N" solves the first problem.
I used @ttarchala's solution for the company name, as it seems to work for some reason; I don't know why or how :)
"(^.*Nazwa firmy"":"")(.*)("",""NIP.*$)", "$2"
The NIP is isolated by this one: "NIP\"":\""(.+)\"""),"-|\s","", which also strips out the "-" minus signs and whitespace.
cheers
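All three fields can also be pulled out with a single pattern. Below is a minimal Java sketch (java.util.regex, not RE2). It assumes the keys always appear in this order and that company names contain no escaped quotes. The InvoiceFields class and parse helper are hypothetical. The ([^"]*) groups play the same role as the greedy (.*) in the Sheets formula: the literal text on each side pins down where each value starts and ends. Like the poster's NIP formula, it removes spaces and hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceFields {
    // Literal keys around each ([^"]*) group pin the value between its quotes.
    private static final Pattern FIELDS = Pattern.compile(
            "\"Czy faktura\\?\":\"([YN])\",\"Nazwa firmy\":\"([^\"]*)\",\"NIP\":\"([^\"]*)\"");

    // Returns {Y/N, company name, NIP without spaces or hyphens}, or null if nothing matches.
    static String[] parse(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        String company = m.group(2).trim();
        String nip = m.group(3).replaceAll("[-\\s]", "");
        return new String[]{m.group(1), company, nip};
    }

    public static void main(String[] args) {
        String line = "{\"user\":{\"Czy faktura?\":\"Y\",\"Nazwa firmy\":\"Name of the company \",\"NIP\":\"113 234 20 57\"}}";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```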

Delimiting columns very specifically

I've got a column (with many thousands of rows) that I'd like to split into multiple columns. I have some experience with regular expressions and with delimiters in Excel, but this one is just a tad too hard.
Let me give you three example-lines:
- 23-12-05: For sale for 2000. 2010-09-09: Not found
- 25-11-09: For sale for 3400. Last date found: 2010-07-08
- 18-06-08: For sale for 5500. 21-07-09: Changed from 5500 to 4900. 16-09-09: Jumped from 4900 to 4700. 2010-02-04: Not found
Most other lines follow these structures. How can I create a new column containing just the characters before the first colon, and a second column with the characters between the first colon and the first dot? How can I continue this up to the last entry if the text "Last date found" is not present? Finally, how can I use regex (or another method) to find the text "Not found" and paste the date before it into a new column?
Trust me, I have been at this for quite some time now (sigh). Any help is much appreciated!
You can actually use formulas for this.
Assuming the text is in A1:
B1: =LEFT(A1,FIND(":",A1)-1)
C1: =TRIM(MID(A1,FIND(":",A1)+1,FIND(".",A1,FIND(":",A1))-FIND(":",A1)-1))
D1: =MID(A1,FIND(".",A1,FIND(":",A1))+1,LEN(A1))
E1: =MID(A1,FIND("Not found",A1)-12,10)
(E1 assumes the date before "Not found" is always in the 10-character yyyy-mm-dd format, followed by ": ".)
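For comparison, here is a minimal Java sketch of the same four columns, useful if the data is processed outside Excel. It makes the same assumptions as the formulas: the first colon ends the first date, the first dot after it ends the first event, and the "Not found" date is yyyy-mm-dd. The SplitHistory class and split helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitHistory {
    // yyyy-mm-dd date directly followed by ": Not found" (the same assumption E1 makes).
    private static final Pattern NOT_FOUND_DATE =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2}): Not found");

    // Returns {first date, first event, rest of line, date before "Not found" or ""}.
    static String[] split(String line) {
        int colon = line.indexOf(':');
        int dot = colon < 0 ? -1 : line.indexOf('.', colon);
        if (dot < 0) {
            return new String[]{line, "", "", ""}; // not in the expected shape
        }
        String firstDate = line.substring(0, colon);               // like B1
        String firstEvent = line.substring(colon + 1, dot).trim(); // like C1, without the dot
        String rest = line.substring(dot + 1).trim();              // like D1
        Matcher m = NOT_FOUND_DATE.matcher(line);
        String notFoundDate = m.find() ? m.group(1) : "";          // like E1, blank if absent
        return new String[]{firstDate, firstEvent, rest, notFoundDate};
    }

    public static void main(String[] args) {
        String line = "23-12-05: For sale for 2000. 2010-09-09: Not found";
        System.out.println(String.join(" | ", split(line)));
    }
}
```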
By the way, this also works for me to get the last date in a cell:
=LOOKUP(9999999999999999,FIND("**-**-**",A1,ROW($1:$1024)))
The only problem is that I haven't the slightest clue what exactly I am doing here.
For example, I'd like to use the same approach to find the FIRST occurrence of a date.
Can anyone explain this formula to me? Why am I searching for a very high number? What in this formula makes it find the last occurrence? What does it mean that the start position is ROW($1:$1024)?
Does anybody know?
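This is not an explanation of the LOOKUP formula itself. For comparison, though, here is a regex-based Java sketch that finds both the first and the last date in a line. The first match is the first find(); the last is whatever the final find() in a loop returns. The date pattern (a two-digit first group as in 23-12-05, or a four-digit year as in 2010-09-09) is an assumption based on the example lines. The FirstLastDate class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstLastDate {
    // xx-xx-xx or yyyy-xx-xx as a whole token, so "10-09-09" inside "2010-09-09"
    // is never matched on its own.
    private static final Pattern DATE = Pattern.compile("\\b(?:\\d{4}|\\d{2})-\\d{2}-\\d{2}\\b");

    static String firstDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group() : null;
    }

    static String lastDate(String text) {
        Matcher m = DATE.matcher(text);
        String last = null;
        while (m.find()) { // keep overwriting: whatever matched last wins
            last = m.group();
        }
        return last;
    }

    public static void main(String[] args) {
        String line = "18-06-08: For sale for 5500. 21-07-09: Changed from 5500 to 4900. 2010-02-04: Not found";
        System.out.println(firstDate(line) + " ... " + lastDate(line));
    }
}
```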

I'm going to be teaching a few developers regular expressions - what are some good homework problems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm thinking of presenting questions in the form of "here is your input: [foo], here are the capture groups/results: [bar]" (and maybe writing a small script to test their answers for my results).
What are some good regex questions to ask? I need everything from beginner questions like "validate a 4-digit number" to "extract postal codes from addresses".
A few that I can think of off the top of my head:
Phone numbers in any format e.g. 555-5555, 555 55 55 55, (555) 555-555 etc.
Remove all html tags from text.
Match a social security number (the Finnish one is easy ;)
All IP addresses
IP addresses with shorthand netmask (xx.xx.xx.xx/yy)
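As a sample answer key for the last exercise, here is a minimal Java sketch. It assumes IPv4 only, octets 0-255 without leading zeros, and a prefix length of 0-32. The IpCidr class and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpCidr {
    // One octet 0-255 without leading zeros.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    // Dotted quad, then "/" and a prefix length 0-32.
    private static final Pattern IPV4_CIDR =
            Pattern.compile("^(" + OCTET + "(?:\\." + OCTET + "){3})/(3[0-2]|[12]?\\d)$");

    static boolean isCidr(String s) {
        return IPV4_CIDR.matcher(s).matches();
    }

    // Returns {address, prefix length}, or null if the input doesn't match.
    static String[] parts(String s) {
        Matcher m = IPV4_CIDR.matcher(s);
        return m.matches() ? new String[]{m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        for (String s : new String[]{"192.168.1.0/24", "256.1.1.1/24", "10.0.0.1/33"}) {
            System.out.println(s + " -> " + isCidr(s));
        }
    }
}
```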
There are a bunch of examples of various regular expression techniques at www.regular-expressions.info, everything from simple literal matching to backreferences and lookahead.
To keep things more interesting than the usual email/phone/URL exercises, try looking for more original ones. Avoid boredom.
For example, have a look at Forsyth-Edwards Notation (FEN), which is used to describe a particular board position in a chess game.
Have your students validate and extract all the bits of information from a string like this:
rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
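A possible reference solution for the FEN exercise follows, as a minimal Java sketch. The regex checks the overall shape and captures the six space-separated fields. Checking that each rank covers exactly eight squares is easier in code, so the sketch does that outside the regex. The FenCheck class and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FenCheck {
    private static final String RANK = "[pnbrqkPNBRQK1-8]{1,8}";
    private static final Pattern FEN = Pattern.compile(
            "^(" + RANK + "(?:/" + RANK + "){7})" // 1: piece placement, 8 ranks
            + " ([wb])"                           // 2: side to move
            + " (-|(?=[KQkq])K?Q?k?q?)"           // 3: castling rights, "-" or at least one letter
            + " (-|[a-h][36])"                    // 4: en passant target square
            + " (\\d+)"                           // 5: halfmove clock
            + " (\\d+)$");                        // 6: fullmove number

    // Returns the six FEN fields, or null if the string doesn't have FEN shape.
    static String[] fields(String fen) {
        Matcher m = FEN.matcher(fen);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[6];
        for (int i = 0; i < 6; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    // The regex checks the shape; counting that each rank covers exactly 8 squares is done here.
    static boolean isValidFen(String fen) {
        String[] f = fields(fen);
        if (f == null) {
            return false;
        }
        for (String rank : f[0].split("/")) {
            int squares = 0;
            for (char c : rank.toCharArray()) {
                squares += Character.isDigit(c) ? c - '0' : 1;
            }
            if (squares != 8) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String fen = "rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2";
        System.out.println(isValidFen(fen) + " " + String.join(" | ", fields(fen)));
    }
}
```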
Additionally, have a look at algebraic chess notation, which is used to describe moves. Extract the chess moves from a piece of text (and make them bold).
1. e4 e5 2. Nf3 Black now defends his pawn 2...Nc6 3. Bb5 Black threatens c4
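One way to solve the move-extraction exercise is sketched below in Java. It uses a simplified algebraic-notation pattern covering piece moves, pawn moves and captures with optional promotion, castling, and an optional check/mate sign. It is not a complete grammar and does not check legality. "Making them bold" is shown as wrapping each move in HTML <b> tags. The SanMoves class and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SanMoves {
    // Simplified SAN: piece moves (Nf3, Bxe5, Nbd7, R1a3), pawn moves/captures with
    // optional promotion (e4, exd5, e8=Q), castling, plus an optional check/mate sign.
    private static final Pattern MOVE = Pattern.compile(
            "\\b(?:[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"
            + "|[a-h](?:x[a-h])?[1-8](?:=[QRBN])?"
            + "|O-O(?:-O)?)[+#]?(?!\\w)");

    static List<String> extractMoves(String text) {
        List<String> moves = new ArrayList<>();
        Matcher m = MOVE.matcher(text);
        while (m.find()) {
            moves.add(m.group());
        }
        return moves;
    }

    // $0 is the whole match, so each move is wrapped in <b>...</b>.
    static String bold(String text) {
        return MOVE.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "1. e4 e5 2. Nf3 Black now defends his pawn 2...Nc6 3. Bb5 Black threatens c4";
        System.out.println(extractMoves(text));
        System.out.println(bold(text));
    }
}
```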
Validate phone numbers (extract the area code and the rest of the number with grouping). This assumes US phone numbers; otherwise, generalize for your style.
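A minimal Java sketch of the phone-number exercise follows. It assumes 10-digit US numbers, optional parentheses around the area code, and "-", "." or a space as separators. It deliberately stays simple and also accepts an unbalanced "(555 123-4567", which could itself be a follow-up exercise. The UsPhone class and parse helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsPhone {
    // Optional parentheses around the area code, optional "-", "." or whitespace separators.
    // Note: this sketch also accepts an unbalanced "(555 123-4567".
    private static final Pattern US_PHONE =
            Pattern.compile("^\\(?(\\d{3})\\)?[-.\\s]?(\\d{3})[-.\\s]?(\\d{4})$");

    // Returns {area code, exchange, line number}, or null if the input doesn't match.
    static String[] parse(String s) {
        Matcher m = US_PHONE.matcher(s.trim());
        return m.matches() ? new String[]{m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", parse("(555) 123-4567")));
    }
}
```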
Play around with validating email addresses. You'll probably want to tell the students that a complete email regex is hugely complicated, but for simple cases it is pretty straightforward.
regexplib.com has a good library you can search through for examples.
How about extracting the first name, middle name, last name, and personal suffix (Jr., III, etc.) from a format like:
Smith III, John Paul
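A possible solution for the name exercise, as a minimal Java sketch. It assumes the "Last [Suffix], First [Middle]" layout shown, single-word name parts made of letters, apostrophes, or hyphens, and a short hypothetical suffix list (Jr., Sr., II, III, IV). The NameParts class and parse helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameParts {
    // "Last [Suffix], First [Middle]"; the suffix list is a small, incomplete sample.
    private static final Pattern NAME = Pattern.compile(
            "^([A-Za-z'-]+)(?:\\s+(Jr\\.|Sr\\.|III|II|IV))?,\\s*([A-Za-z'-]+)(?:\\s+([A-Za-z'-]+))?$");

    // Returns {first, middle, last, suffix}; missing parts are null, no match returns null.
    static String[] parse(String s) {
        Matcher m = NAME.matcher(s.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[]{m.group(3), m.group(4), m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", parse("Smith III, John Paul")));
    }
}
```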
How about a regex to remove line breaks and tabs from the input?
I would start with the common ones:
validate email
validate phone number
separate the parts of a URL
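For the URL exercise, RFC 3986 (Appendix B) gives a generic regex that splits any URI reference into scheme, authority, path, query, and fragment without validating them. Here it is wrapped in a minimal Java sketch. The UrlParts class, split helper, and sample URLs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlParts {
    // The generic URI-splitting regex from RFC 3986, Appendix B.
    private static final Pattern URI_PARTS =
            Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment}; absent parts are null.
    static String[] split(String uri) {
        Matcher m = URI_PARTS.matcher(uri);
        if (!m.matches()) {
            return null; // only possible with unusual input, e.g. a line break in the fragment
        }
        return new String[]{m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", split("https://example.com:8080/path/x?q=1#frag")));
    }
}
```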
Be cruel. Tell them to parse HTML.
(See the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".)
Are you teaching them theory of finite automata as well?
Here is a good one: correctly parse the addresses of churches from this badly structured page (copy and paste it as text first):
http://www.churchangel.com/WEBNY/newhart.htm
I'm a fan of parsing date strings. Define a few common date formats, as well as time and date-time formats. These make good exercises because many dates are simple mixes of digits and punctuation, and there is only a limited degree of freedom in how dates are written.
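One such exercise, with a possible solution in Java: a regex for the shape of a yyyy-mm-dd date with an optional "T" or space plus hh:mm[:ss] time. It is followed by java.time to reject impossible values such as February 30, which a regex alone handles poorly. The format choice and the DateShape class are hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateShape {
    // yyyy-mm-dd, optionally followed by "T" or a space and hh:mm[:ss].
    private static final Pattern DATE_TIME = Pattern.compile(
            "^(\\d{4})-(\\d{2})-(\\d{2})(?:[T ](\\d{2}):(\\d{2})(?::(\\d{2}))?)?$");

    // Regex checks the shape; java.time checks that the values form a real date and time.
    static boolean isValid(String s) {
        Matcher m = DATE_TIME.matcher(s);
        if (!m.matches()) {
            return false;
        }
        try {
            LocalDate.of(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)));
            if (m.group(4) != null) {
                int seconds = m.group(6) == null ? 0 : Integer.parseInt(m.group(6));
                LocalTime.of(Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)), seconds);
            }
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[]{"2010-09-09", "2010-09-09T13:45:30", "2010-02-30"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```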
Just to throw them for a loop, why not reword a question or two so that they write a regular expression to generate data fitting a specific pattern, like email addresses or phone numbers? It's the same thing as validating, but it can help them get out of the mindset that regex is just for validation (the data generation tool in Visual Studio, for example, uses regexes to randomly generate data).
Rather than teaching examples based on the data set, I would do examples from the perspective of the rule set to get the basics across. Give them simple examples to solve, each of which leads them to use ONE of several basic constructs. Then have a couple of "compound" regexes at the end.
Simple:
s/abc/def/
Quantifiers and special characters:
s/a\s*b/abc/
Character classes:
s/[abc]/def/
Backreference:
s/ab(c)/def$1/
Anchors:
s/^fred/wilma/
s/rubble$/and betty/
Modifiers:
s/Abcd/def/gi
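The s/PATTERN/REPLACEMENT/ lines are Perl-style substitutions. For students working in Java, here are rough equivalents as a minimal sketch. Without the g flag, Perl replaces only the first match, which corresponds to replaceFirst. The end anchor goes after the text (rubble$). The i flag becomes the inline (?i). The SubstitutionDemo class and its method names are hypothetical.

```java
public class SubstitutionDemo {
    // s/abc/def/       - literal match, first occurrence only
    static String simple(String s) { return s.replaceFirst("abc", "def"); }
    // s/a\s*b/abc/     - \s* means zero or more whitespace characters
    static String quantifier(String s) { return s.replaceFirst("a\\s*b", "abc"); }
    // s/[abc]/def/     - character class: any one of a, b, c
    static String charClass(String s) { return s.replaceFirst("[abc]", "def"); }
    // s/ab(c)/def$1/   - $1 inserts whatever group 1 captured
    static String backreference(String s) { return s.replaceFirst("ab(c)", "def$1"); }
    // s/^fred/wilma/   - only at the start of the string
    static String startAnchor(String s) { return s.replaceFirst("^fred", "wilma"); }
    // s/rubble$/and betty/ - only at the end of the string
    static String endAnchor(String s) { return s.replaceFirst("rubble$", "and betty"); }
    // s/Abcd/def/gi    - g: replace all matches; i: case-insensitive via (?i)
    static String modifiers(String s) { return s.replaceAll("(?i)Abcd", "def"); }

    public static void main(String[] args) {
        System.out.println(simple("xabcx"));
        System.out.println(quantifier("a   b"));
        System.out.println(charClass("cab"));
        System.out.println(backreference("xabc"));
        System.out.println(startAnchor("fred and fred"));
        System.out.println(endAnchor("barney rubble"));
        System.out.println(modifiers("ABCD abcd"));
    }
}
```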
After this, I would give a few examples illustrating the pitfalls of trying to match HTML tags or other strings that shouldn't be parsed with regexes, to show their limitations.
Try to think of tests whose answers can't simply be found with Google.
Asking for an email validator poses no challenge, because the answer is easy to find.
Try something like a divisible-by-5 digit-sum test:
Input: a 5-digit number. The sum of its digits must be divisible by five, e.g. 12345: 1+2+3+4+5 = 15, and 15 / 5 = 3.
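This exercise ties in nicely with finite automata. The condition only needs the running digit sum modulo 5, so a five-state automaton decides it, and the set of matching strings is therefore expressible as a regular expression, though a long one. The Java sketch below walks that automaton directly instead of building the regex. The DigitSumMod5 class is hypothetical.

```java
public class DigitSumMod5 {
    // The automaton's state is the running digit sum modulo 5 (states 0..4);
    // the input is accepted if it is exactly 5 digits and ends in state 0.
    static boolean accepts(String s) {
        if (!s.matches("\\d{5}")) {
            return false;
        }
        int state = 0;
        for (char c : s.toCharArray()) {
            state = (state + (c - '0')) % 5;
        }
        return state == 0;
    }

    public static void main(String[] args) {
        for (String s : new String[]{"12345", "12346", "00000"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```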