Converting dictionary to KeyedVectorFormat - word2vec

I tried to use the code from this question:
Convert Python dictionary to Word2Vec object
The error does not make sense to me. I wrote the file in non-binary (text) format, and the first line is as it should be.
Any idea what might be going wrong?
Or is there another way to achieve the same end result?
/usr/local/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
1496 return _load_word2vec_format(
1497 cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
-> 1498 limit=limit, datatype=datatype)
1499
1500 def get_keras_embedding(self, train_embeddings=False):
/usr/local/lib/python3.7/site-packages/gensim/models/utils_any2vec.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
392 parts = utils.to_unicode(line.rstrip(), encoding=encoding, errors=unicode_errors).split(" ")
393 if len(parts) != vector_size + 1:
--> 394 raise ValueError("invalid vector on line %s (is this really the text format?)" % line_no)
395 word, weights = parts[0], [datatype(x) for x in parts[1:]]
396 add_word(word, weights)
ValueError: invalid vector on line 1 (is this really the text format?)
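The check shown in the traceback splits each line on a single space and requires exactly vector_size + 1 fields. One possible way to narrow the problem down is to write the text file by hand and run the same field-count check before handing the file to gensim. The sketch below assumes the dictionary maps each word to a flat sequence of numbers; the helper names write_word2vec_text and find_bad_lines are hypothetical and not part of gensim.

```python
import os
import tempfile


def write_word2vec_text(vectors, path):
    """Write {word: sequence of numbers} in the word2vec text layout (hypothetical helper)."""
    sizes = {len(vec) for vec in vectors.values()}
    if len(sizes) != 1:
        raise ValueError("all vectors must have the same length, got %s" % sorted(sizes))
    vector_size = sizes.pop()
    with open(path, "w", encoding="utf-8") as out:
        # Header line: "<vocab_size> <vector_size>"
        out.write("%d %d\n" % (len(vectors), vector_size))
        for word, vec in vectors.items():
            if " " in word:
                # A space inside the word would add an extra field on that line.
                raise ValueError("word contains a space: %r" % word)
            # Exactly one space between fields, no trailing space.
            out.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def find_bad_lines(path):
    """Return 1-based file line numbers that fail the field-count check."""
    bad = []
    with open(path, encoding="utf-8") as f:
        vocab_size, vector_size = (int(n) for n in f.readline().split())
        for file_line_no, line in enumerate(f, start=2):
            # Same test as in the traceback: split on a single space.
            parts = line.rstrip().split(" ")
            if len(parts) != vector_size + 1:
                bad.append(file_line_no)
    return bad


demo_path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_word2vec_text({"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}, demo_path)
print(find_bad_lines(demo_path))
```

If find_bad_lines reports lines in your own file, the usual suspects are double spaces between values, a word that itself contains a space, or an array written with str() so that it carries brackets or wraps across lines.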


Pandas Merge Error iterable, not itertools.imap

I'm trying to merge two dataframes using the pandas merge code below. Each dataframe has just three columns. I've done similar merges before without issue. I've provided .info() for each dataframe. I'm getting an error saying the argument must be an iterable, not itertools.imap, and I have no idea what it refers to. Any tips are very much appreciated.
Data:
pio_smp2_sm.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12779 entries, 15 to 68311
Data columns (total 3 columns):
entityId 12779 non-null object
targetEntityId 12779 non-null object
eventTime 12779 non-null object
dtypes: object(3)
memory usage: 399.3+ KB
cm_smp2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 28035 entries, 40 to 698858
Data columns (total 3 columns):
user_id 28035 non-null object
product_id 28035 non-null object
time_stamp 28035 non-null object
dtypes: object(3)
memory usage: 876.1+ KB
Code:
comp_df2=pd.merge(pio_smp2_sm,cm_smp2,how='inner',left_on=['entityId','targetEntityId'],right_on=['user_id','product_id'])
Error:
TypeError Traceback (most recent call last)
<ipython-input-235-6882a22fe6a1> in <module>()
23
24
---> 25 comp_df2=pd.merge(pio_smp2_sm,cm_smp2,how='inner',left_on=['entityId','targetEntityId'],right_on=['user_id','product_id'])
26
27 # print(comp_df2.shape[0])
/data2/user/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
56 copy=copy, indicator=indicator,
57 validate=validate)
---> 58 return op.get_result()
59
60
/data2/user/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in get_result(self)
580 self.left, self.right)
581
--> 582 join_index, left_indexer, right_indexer = self._get_join_info()
583
584 ldata, rdata = self.left._data, self.right._data
/data2/user/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _get_join_info(self)
746 else:
747 (left_indexer,
--> 748 right_indexer) = self._get_join_indexers()
749
750 if self.right_index:
/data2/user/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(self)
725 self.right_join_keys,
726 sort=self.sort,
--> 727 how=self.how)
728
729 def _get_join_info(self):
/data2/user/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
1048
1049 # get left & right join labels and num. of levels at each location
-> 1050 llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
1051
1052 # get flat i8 keys from label lists
TypeError: type object argument after * must be an iterable, not itertools.imap

Reading in TSP file Python

I need to figure out how to read in the data from the file 'berlin52.tsp'.
This is the format of the file:
NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
5 845.0 655.0
6 880.0 660.0
7 25.0 230.0
8 525.0 1000.0
9 580.0 1175.0
10 650.0 1130.0
And this is my current code
# Open input file
infile = open('berlin52.tsp', 'r')
# Read instance header
Name = infile.readline().strip().split()[1] # NAME
FileType = infile.readline().strip().split()[1] # TYPE
Comment = infile.readline().strip().split()[1] # COMMENT
Dimension = infile.readline().strip().split()[1] # DIMENSION
EdgeWeightType = infile.readline().strip().split()[1] # EDGE_WEIGHT_TYPE
infile.readline()
# Read node list
nodelist = []
N = int(intDimension)
for i in range(0, int(intDimension)):
    x,y = infile.readline().strip().split()[1:]
    nodelist.append([int(x), int(y)])
# Close input file
infile.close()
The code should read in the file and output a list of tour nodes with the values "1, 2, 3..." and so on, while the x and y values are stored so that distances can be calculated later. It can collect the headers, at least. The problem arises when creating the list of nodes.
This is the error I get:
ValueError: invalid literal for int() with base 10: '565.0'
What am I doing wrong here?
This is a file in TSPLIB format. To load it in Python, take a look at the Python package tsplib95, available through PyPI or on GitHub.
Documentation is available at https://tsplib95.readthedocs.io/
You can convert the TSPLIB file to a networkx graph and retrieve the necessary information from there.
You are feeding the string "565.0" into nodelist.append([int(x), int(y)]).
int() rejects it because that string is not an integer literal. The .0 at the end makes it a float literal.
So if you change that line to nodelist.append([float(x), float(y)]), as just one possible solution, you'll see that the error goes away.
Alternatively, you can try removing or separating the '.0' from your string input.
There are two problems with the code above. I ran the code and found the following.
The first problem is in this line:
Dimension = infile.readline().strip().split()[1]
It should be:
Dimension = infile.readline().strip().split()[2]
The header line is "DIMENSION : 52", so after split() index 1 is ":" and index 2 is "52".
Both are strings.
The second problem is with this line:
N = int(intDimension)
The name intDimension is never defined, so it should be:
N = int(Dimension)
And lastly, in the line
for i in range(0, int(intDimension)):
simply use
for i in range(0, N):
Now everything should work, I think.
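Putting the fixes from the answers above together, a minimal sketch of a reader might look like this. It assumes the layout shown in the question (header lines of the form "KEY: value" or "KEY : value", then NODE_COORD_SECTION, then one "id x y" line per node); the function name read_tsp is hypothetical. Splitting each header line on the colon avoids having to guess whether the value sits at index 1 or index 2.

```python
def read_tsp(path):
    """Read header fields and node coordinates from a TSPLIB-style file."""
    header = {}
    nodes = []
    with open(path) as infile:
        lines = (line.strip() for line in infile)
        # Header: collect "KEY: value" pairs until the coordinate section starts.
        for line in lines:
            if line == "NODE_COORD_SECTION":
                break
            if ":" in line:
                key, value = line.split(":", 1)
                header[key.strip()] = value.strip()
        # DIMENSION tells us how many coordinate lines follow.
        dimension = int(header["DIMENSION"])
        for _ in range(dimension):
            node_id, x, y = next(lines).split()
            # Coordinates such as "565.0" are float literals, so use float().
            nodes.append((int(node_id), float(x), float(y)))
    return header, nodes
```

Calling header, nodes = read_tsp('berlin52.tsp') would then give the header as a dictionary of strings and the nodes as (id, x, y) tuples, assuming the file really contains as many coordinate lines as DIMENSION states.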
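The failing line is:
nodelist.append([int(x), int(y)])
int() cannot convert the string '565.0' to an integer because of the '.'.
Before converting, add
x=x[:len(x)-2]
y=y[:len(y)-2]
to remove the trailing '.0'. Note that this only works when every coordinate ends in exactly '.0'.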

List index out of range when reading a file

I am opening a file and trying to read the 3rd value on each line. Here is my code
myfile = 'dummy2.pepmasses'
fileObj = open(myfile, 'r')
line = fileObj.readline()
while line:
    line = fileObj.readline()
    linesplit = line.split()
    weight = linesplit[2]
    print(weight)
fileObj.close()
This correctly displays the third value on each line, but there is an IndexError at the end. I'm not sure why, since I'm not specifying a range of values to read, just reading everything. I believe the issue is that an empty list [] is produced at the end of the file, although there are no blank lines in the actual file, so I don't understand what is happening.
Any ideas appreciated, thanks.
The end of my file is
STE50,YCL032W 36 1262.6920 0 0 QQGLHPAIMLR
STE50,YCL032W 37 174.1117 0 0 R
STE50,YCL032W 38 174.1117 0 0 R
STE50,YCL032W 39 2081.8783 0 0 GDFEEVAMMNGSDNVTPGGR
STE50,YCL032W 40 131.0947 0 0 L*
The error generated is
174.1117
2081.8783
131.0947
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/Test/Test.py", line 12, in <module>
weight = linesplit[2]
IndexError: list index out of range
You should check whether linesplit has at least 3 values in it and only print the weight in that case.
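For context on why the check is needed: in the loop as posted, readline() is called again at the top of the loop body, so the first line is never processed, and after the last line readline() returns an empty string. Splitting that empty string gives [], and indexing it with [2] raises the IndexError. A minimal sketch of the suggested check, iterating over the file directly, is below; the function name read_weights is hypothetical.

```python
def read_weights(path):
    """Return the third whitespace-separated field of every line that has one."""
    weights = []
    with open(path) as file_obj:
        # Iterating over the file stops cleanly at end of file,
        # so no empty string is ever split.
        for line in file_obj:
            fields = line.split()
            if len(fields) < 3:
                # Skip blank or short lines instead of raising IndexError.
                continue
            weights.append(fields[2])
    return weights
```

With this shape, for weight in read_weights('dummy2.pepmasses'): print(weight) prints every third field, including the one on the first line, and the with block closes the file automatically.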

Tensorflow variable_scope for adam optimizer?

Versions: Python 2.7.13 and TF 1.2.1
Background: I'm trying to create a single LSTM cell and pass an input of N x M and output N x M+1. I want to pass the output through a softmax layer and then through an Adam optimizer with a loss function of negative log likelihood.
Problem: As stated in the title, when I try to set my training_op = optimizer.minimize(nll) it crashes and asks about a variable scope. What should I do?
Code:
with tf.variable_scope('lstm1', reuse=True):
    LSTM_cell_1 = tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, activation=tf.nn.relu)
    rnn_outputs_1, states_1 = tf.nn.dynamic_rnn(LSTM_cell_1, X_1, dtype=tf.float32)
    rnn_outputs_1 = tf.nn.softmax(rnn_outputs_1)
    stacked_rnn_outputs_1 = tf.reshape(rnn_outputs_1, [-1, n_neurons])
    stacked_outputs_1 = tf.layers.dense(stacked_rnn_outputs_1, n_outputs)
    outputs_1 = tf.reshape(stacked_outputs_1, [-1, n_steps, n_outputs])
mu = tf.Variable(np.float32(1))
sigma = tf.Variable(np.float32(1))
def normal_log(X, mu, sigma, left=-np.inf, right=np.inf):
    val = -tf.log(tf.constant(np.sqrt(2.0 * np.pi), dtype=tf.float32) * sigma) - \
        tf.pow(X - mu, 2) / (tf.constant(2.0, dtype=tf.float32) * tf.pow(sigma, 2))
    return val
nll = -tf.reduce_sum(normal_log(outputs, mu, sigma))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(nll)
Error message:
ValueError Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.pyc in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
323
324 return self.apply_gradients(grads_and_vars, global_step=global_step,
--> 325 name=name)
326
327 def compute_gradients(self, loss, var_list=None,
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.pyc in apply_gradients(self, grads_and_vars, global_step, name)
444 ([str(v) for _, _, v in converted_grads_and_vars],))
445 with ops.control_dependencies(None):
--> 446 self._create_slots([_get_variable_for(v) for v in var_list])
447 update_ops = []
448 with ops.name_scope(name, self._name) as name:
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/adam.pyc in _create_slots(self, var_list)
126 # Create slots for the first and second moments.
127 for v in var_list:
--> 128 self._zeros_slot(v, "m", self._name)
129 self._zeros_slot(v, "v", self._name)
130
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.pyc in _zeros_slot(self, var, slot_name, op_name)
764 named_slots = self._slot_dict(slot_name)
765 if _var_key(var) not in named_slots:
--> 766 named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
767 return named_slots[_var_key(var)]
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.pyc in create_zeros_slot(primary, name, dtype, colocate_with_primary)
172 return create_slot_with_initializer(
173 primary, initializer, slot_shape, dtype, name,
--> 174 colocate_with_primary=colocate_with_primary)
175 else:
176 val = array_ops.zeros(slot_shape, dtype=dtype)
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.pyc in create_slot_with_initializer(primary, initializer, shape, dtype, name, colocate_with_primary)
144 with ops.colocate_with(primary):
145 return _create_slot_var(primary, initializer, "", validate_shape, shape,
--> 146 dtype)
147 else:
148 return _create_slot_var(primary, initializer, "", validate_shape, shape,
/usr/local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.pyc in _create_slot_var(primary, val, scope, validate_shape, shape, dtype)
64 use_resource=_is_resource(primary),
65 shape=shape, dtype=dtype,
---> 66 validate_shape=validate_shape)
67 variable_scope.get_variable_scope().set_partitioner(current_partitioner)
68
/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
960 collections=collections, caching_device=caching_device,
961 partitioner=partitioner, validate_shape=validate_shape,
--> 962 use_resource=use_resource, custom_getter=custom_getter)
963
964 def _get_partitioned_variable(self,
/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
365 reuse=reuse, trainable=trainable, collections=collections,
366 caching_device=caching_device, partitioner=partitioner,
--> 367 validate_shape=validate_shape, use_resource=use_resource)
368
369 def _get_partitioned_variable(
/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource)
350 trainable=trainable, collections=collections,
351 caching_device=caching_device, validate_shape=validate_shape,
--> 352 use_resource=use_resource)
353
354 if custom_getter is not None:
/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource)
662 " Did you mean to set reuse=True in VarScope? "
663 "Originally defined at:\n\n%s" % (
--> 664 name, "".join(traceback.format_list(tb))))
665 found_var = self._vars[name]
666 if not shape.is_compatible_with(found_var.get_shape()):
ValueError: Variable lstm1/dense/kernel/Adam_1/ already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "<ipython-input-107-eed033b85dc0>", line 11, in <module>
training_op = optimizer.minimize(nll)
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
if self.run_code(code, result):
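It turns out I was executing this section over and over again inside a Python notebook, so the variables created by earlier runs were still in the graph. To all TF rookies out there: remember to restart your kernel (or call tf.reset_default_graph()) before rebuilding the graph.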

JasperReports: Converting String into array and populating List with it

What I have is this String: [125, 154, 749, 215, 785, 1556, 3214, 7985]
(The string can contain anything from 1 to 15 IDs. It is a String and not a List because it's being sent through a URL.)
I need to populate a List called campusAndFaculty with it.
I am using iReport 5.0.0
I've tried entering this in the campusAndFaculty default value Expression field
Array.asList(($P{campusAndFacultyString}.substring( 1, ($P{campusAndFacultyString}.length() -2 ))).split("\\s*,\\s*"))
But it does not populate the campusAndFaculty List
Any idea how I can populate the List campusAndFaculty using that String ("campusAndFacultyString")?
======================
UPDATE
I have these variables in iReport (5.0.0)
String campusAndFacultyFromBack = "[111, 125, 126, 4587, 1235, 1259]"
String noBrackets = $P{campusAndFacultyFromBack}.substring(1, ($P{campusAndFacultyFromBack}.length() -1 ))
List campusAndFacultyVar = java.util.Arrays.asList(($V{noBrackets}).split("\\s*,\\s*"))
When I print campusAndFacultyVar, it returns "[111, 125, 126, 4587, 1235, 1259]",
but when I use it in a filter I get the error "Cannot evaluate the following expression: org_organisation.org_be_id in null".
This works for me:
String something = "[125, 154, 749, 215, 785, 1556, 3214, 7985]";
Arrays.asList((something.substring(1, (something.length() -1 ))).split("\\s*,\\s*"));
Which means you can do this in iReport:
java.util.Arrays.asList(($P{campusAndFacultyString}.substring(1, ($P{campusAndFacultyString}.length() -1 ))).split("\\s*,\\s*"))
Differences from your snippet:
The class is Arrays, not Array.
Subtract 1, not 2, from the length, because the end index of substring is exclusive.
The reference to the Arrays class is fully qualified (which may or may not matter, depending on how your iReport is configured).