I'm trying to load around 30k XML files from clinicaltrials.gov into a MySQL database, and I'm handling multiple locations, keywords, etc. in separate models using ManyToManyFields.
The best way I've figured out is to read the data in using a fixture. So my question is: how do I handle the fields where the data is a pointer to another model?
I unfortunately don't know enough about how ManyToManyFields/ForeignKeys work to answer this myself.
Thanks for the help. Sample code below; the underscores represent the ManyToMany fields:
{
    "pk": trial_id,
    "model": "trials.trial",
    "fields": {
        "trial_id": trial_id,
        "brief_title": brief_title,
        "official_title": official_title,
        "brief_summary": brief_summary,
        "detailed_description": detailed_description,
        "overall_status": overall_status,
        "phase": phase,
        "enrollment": enrollment,
        "study_type": study_type,
        "condition": _______________,
        "elligibility": elligibility,
        "criteria": ______________,
        "overall_contact": _______________,
        "location": ___________,
        "lastchanged_date": lastchanged_date,
        "firstreceived_date": firstreceived_date,
        "keyword": __________,
        "condition_mesh": condition_mesh
    }
}
A foreign key is simply the pk of the object you are linking to; a ManyToManyField uses a list of pks. So:
[
    {
        "pk": 1,
        "model": "farm.fruit",
        "fields": {
            "name": "Apple",
            "color": "Green"
        }
    },
    {
        "pk": 2,
        "model": "farm.fruit",
        "fields": {
            "name": "Orange",
            "color": "Orange"
        }
    },
    {
        "pk": 3,
        "model": "person.farmer",
        "fields": {
            "name": "Bill",
            "favorite": 1,
            "likes": [1, 2]
        }
    }
]
You will probably need to write a conversion script to get this done. Fixtures can be very flimsy; it's difficult to get them working, so experiment with a subset before you spend a lot of time converting the 30k records (only to find they might not import).
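As a rough illustration of such a conversion script, here is a minimal sketch. The XML tag names (condition, brief_title, id_info/nct_id) and the pk-assignment scheme are assumptions for illustration, not the real clinicaltrials.gov schema:
import glob
import json
import xml.etree.ElementTree as ET

fixture = []
condition_pks = {}  # condition name -> pk, so repeated conditions share one row

for path in glob.glob("trials/*.xml"):
    root = ET.parse(path).getroot()

    # Build the list of related pks for the ManyToMany field.
    trial_conditions = []
    for cond in root.findall("condition"):  # assumed tag name
        name = (cond.text or "").strip()
        if not name:
            continue
        if name not in condition_pks:
            condition_pks[name] = len(condition_pks) + 1
            fixture.append({
                "model": "trials.condition",
                "pk": condition_pks[name],
                "fields": {"name": name},
            })
        trial_conditions.append(condition_pks[name])

    fixture.append({
        "model": "trials.trial",
        "pk": root.findtext("id_info/nct_id"),  # assumed location of the id
        "fields": {
            "brief_title": root.findtext("brief_title"),
            # ... the other scalar fields ...
            "condition": trial_conditions,  # ManyToMany: a list of pks
        },
    })

with open("initial_data.json", "w") as f:
    json.dump(fixture, f, indent=2)
Running it over a handful of files first and loading the result with manage.py loaddata will surface schema mismatches long before the full 30k conversion.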
Related
I'm working with Django's JSONField and using django-entangled in a form. I need the data in the format below; any suggestions on how to achieve this?
[
    {
        "name": "Test 1",
        "roll": 1,
        "section": "A"
    },
    {
        "name": "Test 2",
        "roll": 2,
        "section": "A"
    }
]
With django-entangled this is not possible, because that library does not offer multiple forms of the same kind.
You can, however, try my newer form library, django-formset, which can handle multiple forms of the same kind.
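For background, "multiple forms of the same kind" is what stock Django calls a formset. Below is only a sketch using plain Django's formset_factory, with field names taken from the JSON above, to illustrate the concept; it is not django-formset's API:
from django import forms
from django.forms import formset_factory

class StudentForm(forms.Form):
    # Field names mirror the keys in the desired JSON.
    name = forms.CharField()
    roll = forms.IntegerField()
    section = forms.CharField()

# Renders and validates several StudentForm instances at once.
StudentFormSet = formset_factory(StudentForm, extra=2)

def rows_from_post(post_data):
    """Return the submitted rows as a list of dicts, like the JSON above."""
    formset = StudentFormSet(post_data)
    return [f.cleaned_data for f in formset] if formset.is_valid() else []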
I am new to MongoDB, and so far it seems like it goes out of its way to make simple things overly complex.
I am trying to run the equivalent of this MySQL query:
SELECT userid, COUNT(*)
FROM userinfo
WHERE userdata LIKE '%PC%' OR userdata LIKE '%wire%'
GROUP BY userid
I have MongoDB version 3.0.4 and I am running MongoChef.
I tried something like the following:
db.userinfo.group({
    "key": {
        "userid": true
    },
    "initial": {
        "countstar": 0
    },
    "reduce": function(obj, prev) {
        prev.countstar++;
    },
    "cond": {
        "$or": [
            { "userdata": /PC/ },
            { "userdata": /wire/ }
        ]
    }
});
but it did not accept the $or.
When I took out the $or, thinking I'd do half at a time and combine the results in Excel, I got the error "group() can't handle more than 20000 unique keys", and the result table should be much bigger than that.
From what I can tell online, I could do this using aggregation pipelines, but I cannot find any clear examples of how to do that.
This seems like a simple thing that should be built into any DB, and it makes no sense to me that it is not.
Any help is much appreciated.
Works "sooo" much better with the .aggregate() method, as .group() is a very outmoded way of approaching this:
db.userinfo.aggregate([
    { "$match": {
        "userdata": { "$in": [/PC/, /wire/] }
    }},
    { "$group": {
        "_id": "$userid",
        "count": { "$sum": 1 }
    }}
])
The $in here is a much shorter way of writing your $or condition.
The aggregation pipeline also runs as native code rather than translated JavaScript, so it executes much faster.
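If you are driving this from Python rather than the shell, the same pipeline looks like the minimal pymongo sketch below; the connection string and database name are assumptions to adjust for your environment:
import re
from pymongo import MongoClient

# Assumed connection details.
db = MongoClient("mongodb://localhost:27017")["mydb"]

pipeline = [
    # $in with two regexes matches either pattern, like the ORed LIKEs in SQL.
    {"$match": {"userdata": {"$in": [re.compile("PC"), re.compile("wire")]}}},
    # Group by userid and count one per matching document.
    {"$group": {"_id": "$userid", "count": {"$sum": 1}}},
]

for row in db.userinfo.aggregate(pipeline):
    print(row["_id"], row["count"])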
Here is an example which counts the number of distinct first_name values for records with a last_name value of "smith":
db.collection.distinct("first_name", {"last_name": "smith"}).length;
output
3
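The same distinct count from pymongo would look like this, reusing the db handle from the sketch above and taking the collection name from the shell example:
# distinct() returns the list of unique values; len() gives the count.
print(len(db["collection"].distinct("first_name", {"last_name": "smith"})))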
I started to look into creating a specific ES mapping for tweets, but quickly realized that an ES mapping of the tweet model would become a nightmare to maintain over time, so I started to think about dynamic templates. I've registered a dynamic template for every possible property according to the Twitter object description. A tweet is a very hierarchical and redundant format, which means that a property, say "created_at", may be present in a number of places; hence the nightmare of maintaining a stable explicit mapping.
In the mapping I've created so far I have no explicit mappings (the "properties" attribute is empty), as I want all the mappings to be controlled by dynamic templates. As an example, my dynamic template for the "created_at" property looks like:
{
    "created_at": {
        "match": "created_at",
        "mapping": {
            "format": "EEE MMM d HH:mm:ss Z YYYY",
            "index": "no"
        }
    }
}
I thought that having this template would take care of the mapping of a "created_at" property wherever it appears in the JSON structure. I know that I may specify "path_match" to target a given property instance explicitly, but I want all "created_at" attributes to be mapped according to the template above.
However, when I start indexing data into ES I get numerous errors looking something like:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [created_at]
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateFieldForString(StringFieldMapper.java:331)
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:277)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:399)
... 13 more
What am I doing wrong here?
You could try the following example to set up a dynamic template:
curl -XPUT localhost:9200/_template/template_for_created_at -d '
{
"template": "*",
"mappings": {
"_default_": {
"dynamic": true,
"dynamic_templates": [
{
"created_at_tmpl": {
"match": "created_at",
"match_mapping_type": "date",
"mapping": {
"type": "date",
"format": "EEE MMM d HH:mm:ss Z YYYY",
"index": "no",
"null_value": null
}
}
}
]
}
}
}'
More details and examples can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-templates.html
I'm sorry I haven't marked this question as solved! I managed to get it working after some investigation. Thanks for the suggestion, though.
Cheers
All new jsfiddle: http://jsfiddle.net/vJxvc/2/
Currently, I query an API that returns JSON like this. The API cannot be changed for now, which is why I need to work around that.
[
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
{"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
]
(could be a lot more lines, of course)
As you can see, each line has a timestamp and then an array of values. My problem is that I would actually like to transpose that. Looking at the first line alone:
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]}
It contains a few measurements taken at the same time. In my Ember project, this would need to become:
{
    "sensor_id": 1, // can be derived from the array index
    "timestamp": 1406111961,
    "value": 1236.181
},
{
    "sensor_id": 2,
    "timestamp": 1406111961,
    "value": 1157.695
},
{
    "sensor_id": 3,
    "timestamp": 1406111961,
    "value": 698.231
}
And those values would have to be pushed into the respective sensor models.
The transformation itself is trivial, but I have no idea where I would put it in Ember and how I could alter many Ember models at the same time.
You could make your model an array and override the normalize method on your adapter. The normalize method is where you do the transformation, and since your JSON is an array, an Ember.Array as a model would work.
I am not an Ember pro, but looking at the manual I would think of something like this:
var a = [
    {"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
    {"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
];
var b = [];
a.forEach(function(item) {
    item.values.forEach(function(value, index) {
        b.push({
            sensor_id: index + 1, // array indexes are 0-based, sensor ids start at 1
            timestamp: item.timestamp,
            value: value
        });
    });
});
console.log(b);
Example http://jsfiddle.net/kRUV4/
Update
Just saw your jsfiddle... You can get the store like this: How to get Ember Data's "store" from anywhere in the application so that I can do store.find()?
I have some models I'd like to provide initial data for. The problem is that there are several models, and I'd like to organize the data.
Currently, I have one big JSON file, initial_data.json, with the data. I was thinking I could use some comments, but JSON has no comments! I really want to use JSON.
So, the file is like:
[
    {
        "model": "app1.Model1",
        "pk": 1,
        "fields": {
            "nombre": "A convenir con el vendedor"
        }
    },
    //many more
    {
        "model": "app2.Model1",
        "pk": 1,
        "fields": {
            "nombre": "A convenir con el vendedor"
        }
    },
    //many more
    {
        "model": "app2.Model1",
        "pk": 1,
        "fields": {
            "nombre": "A convenir con el vendedor"
        }
    }
]
So, I thought I could organize them in different files and load them with an initial script. The idea is not to issue several python manage.py loaddata thisApp.Model commands by hand. But then it would be difficult to separate the files that are not meant to be loaded at initial time.
Here are the files as example:
+app1
+fixtures
model1.json
model2.json
+app2
+fixtures
model1.json
model2.json
+app3
+fixtures
model1.json
model2.json
Do you have any idea how to keep this simple?
Like you said, create several files, and write a script that combines them into initial_data.json and invokes the needed django.core.management command. This is what I do.
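A minimal sketch of such a script, assuming the layout from the tree above; the glob pattern and file naming are assumptions you would adapt to mark which fixtures count as "initial":
import glob
import json

import django
from django.core.management import call_command

django.setup()  # assumes DJANGO_SETTINGS_MODULE is set in the environment

combined = []
# Gather every per-model fixture; tweak the pattern (or keep an explicit
# list) to skip files that are not meant to be loaded at initial time.
for path in sorted(glob.glob("app*/fixtures/model*.json")):
    with open(path) as f:
        combined.extend(json.load(f))

with open("initial_data.json", "w") as f:
    json.dump(combined, f, indent=2)

call_command("loaddata", "initial_data.json")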
Name the files that contain initial data initial_data.json; syncdb will only load those automatically. You can load the others manually with manage.py loaddata.
https://docs.djangoproject.com/en/dev/howto/initial-data/#automatically-loading-initial-data-fixtures