Convert list to tibble in R with NULL values

library(tidyverse)
event <- list(
  reportId = 157250,
  eventId = 4580,
  country = "Moldova",
  disease = "African swine fever",
  subType = NULL
)
event %>% as_tibble()
This raises an error:
! All columns in a tibble must be vectors.
✖ Column subType is NULL.
There are thousands of event lists like this; is there any method to get a proper tibble object?

You can circumvent this by running the following steps:
event %>%
  enframe() %>%                                            # long format with all elements as lists
  unnest("value", keep_empty = TRUE) %>%                   # transform into elementary data
  pivot_wider(names_from = "name", values_from = "value")  # make wide format with all columns

Using map_df can also generate a tibble object; however, the NULL column is dropped:
event %>% map_df(.f = ~.x)
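If you need the NULL columns to survive across thousands of events, one option (a sketch, not from the thread; `events` below is a hypothetical list of such event lists) is to replace each NULL with NA before converting:

# Replace NULL elements with NA so every element is a length-one vector,
# then convert each event to a one-row tibble and row-bind the results.
events <- list(event, event)  # stand-in for the real list of events
result <- events %>%
  map_dfr(~ as_tibble(map(.x, ~ .x %||% NA)))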

I have the same requirement; I have a list obtained from the FlightAware API (converted from JSON) that contains NULLs.
The solution by @manuela-benary did not work straight away because the response includes a nested array of alternatives; here is the specification for the response from the API documentation:
{
  "icao": "string",
  "iata": "string",
  "callsign": "string",
  "name": "string",
  "country": "string",
  "location": "string",
  "phone": "string",
  "shortname": "string",
  "url": "string",
  "wiki_url": "string",
  "alternatives": [
    {
      "icao": "string",
      "iata": "string",
      "callsign": "string",
      "name": "string",
      "country": "string",
      "location": "string",
      "phone": "string",
      "shortname": "string",
      "url": "string",
      "wiki_url": "string"
    }
  ]
}
This simply required running Manuela's code on the alternatives list first, then NULLing that member of the response, and then running the code on the original list. The two sets of results can then be combined with bind_rows().
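A sketch of that flow (assuming `response` holds the parsed JSON; `flatten_list()` is a hypothetical helper wrapping the enframe()/unnest()/pivot_wider() pipeline above):

flatten_list <- function(x) {
  x %>%
    enframe() %>%
    unnest("value", keep_empty = TRUE) %>%
    pivot_wider(names_from = "name", values_from = "value")
}

alternatives <- map_dfr(response$alternatives, flatten_list)  # flatten the nested array first
response$alternatives <- NULL                                 # then NULL that member
main <- flatten_list(response)                                # flatten the remaining fields
result <- bind_rows(main, alternatives)                       # combine both sets of results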

GCP - BigTable to BigQuery

I am trying to query Bigtable data in BigQuery using the external table configuration. I have the following SQL command that I am working with. However, I get an error stating invalid bigtable_options for format CLOUD_BIGTABLE.
The code works when I remove the columns field. For context, the raw data looks like this (running the query without the columns field):

rowkey | aAA.column.name | aAA.column.cell.value
------ | --------------- | ---------------------
4271   | xxx             | 30
       | yyy             | 25

But I would like the table to look like this:

rowkey | xxx
------ | ---
4271   | 30
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": string,
            "qualifierString": string,
            "fieldName": "xxx",
            "type": string,
            "encoding": string,
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
I think you left the default placeholder value for each column attribute: string is the type of the value to provide, not the raw value itself, so a bare string makes no sense in JSON here. Try adding double quotes, like this:
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": "string",
            "qualifierString": "string",
            "fieldName": "xxx",
            "type": "string",
            "encoding": "string",
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
The false is correct because that attribute is a boolean. The encoding "string" will be erroneous (use a real encoding type).
The error here is in this part:
bigtableColumnFamilies: [
It should be:
"columnFamilies": [
To add string columns, you only need:
"columns": [{
  "qualifierString": "name_of_column_from_bt",
  "fieldName": "if_i_want_rename"
}],
fieldName is not required.
However, to access your field value you will still have to use SQL like this:
SELECT
  aAA.xxx.cell.value AS xxx
FROM dev_test.telem_test
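Putting both corrections together, the statement might look like this (a sketch; the "STRING" type and "TEXT" encoding values are assumptions, so check the BigQuery documentation for the encoding your data actually uses):

CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    "columnFamilies": [
      {
        "familyId": "aAA",
        "type": "STRING",
        "encoding": "TEXT",
        "columns": [
          {
            "qualifierString": "xxx",
            "onlyReadLatest": false
          }
        ]
      }
    ],
    "readRowkeyAsString": true
  }
  """
);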

Django: differentiate JSONField values between lists and objects

I am using Django 3.2 with Postgres as the DB.
I have a model with JSONField:
class MyModel(models.Model):
    data = models.JSONField(default=dict, blank=True)
The table has a lot of records, and some data values are JSON objects while others are lists:

{
  "0:00:00 A": "text",
  "0:01:00 B": "text",
  "0:02:00 C": "text"
}

[
  {"time": "0:00:00", "type": "A", "description": "text"},
  {"time": "0:01:00", "type": "B", "description": "text"},
  {"time": "0:02:00", "type": "C", "description": "text"}
]
I need to filter all records which have JSON values as objects.
What I tried is to use has_key with the time frame "0:00:00 A":
result = MyModel.objects.filter(data__has_key="0:00:00 A")
But I can't really use this, because I don't know in advance exactly what the time-frame keys look like.
Any ideas how to filter JSONField values by object structure?
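One possible approach (a sketch, not from the thread; it assumes the PostgreSQL backend, since jsonb_typeof() is a Postgres function) is to annotate each row with the JSON type and filter on it:

from django.db.models.expressions import RawSQL

# jsonb_typeof() returns 'object', 'array', 'string', etc. for a jsonb column
objects_only = MyModel.objects.annotate(
    data_type=RawSQL("jsonb_typeof(data)", [])
).filter(data_type="object")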

Can I create a Django Rest Framework API with Geojson format without having a model

I have a Django app that requests data from an external API, and my goal is to convert that data, which is returned in list/dictionary format, into a new REST API in GeoJSON format.
I came across django-rest-framework-gis, but I don't know if I can use it without having a Model. If so, how?
I think the best way is to use the Python library geojson:
pip install geojson
If you do not have a Model (as in GeoDjango), you have to explicitly describe the geometry from the data you have.
from geojson import Point, Feature, FeatureCollection

data = [
    {
        "id": 1,
        "address": "742 Evergreen Terrace",
        "city": "Springfield",
        "lon": -123.02,
        "lat": 44.04
    },
    {
        "id": 2,
        "address": "111 Spring Terrace",
        "city": "New Mexico",
        "lon": -124.02,
        "lat": 45.04
    }
]

def to_geojson(entries):
    features = []
    for entry in entries:
        point = Point((entry["lon"], entry["lat"]))
        del entry["lon"]
        del entry["lat"]
        feature = Feature(geometry=point, properties=entry)
        features.append(feature)
    return FeatureCollection(features)

if __name__ == '__main__':
    my_geojson = to_geojson(data)
    print(my_geojson)
1. Create the point geometry from lon/lat (this could also be another geometry type).
2. Create a feature with the created geometry and add the dictionary as properties. Note that the lon and lat entries are deleted from the dictionary so they do not show up as properties.
3. Create a feature collection from the multiple features.
Result:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-123.02, 44.04]},
      "properties": {"address": "742 Evergreen Terrace", "city": "Springfield", "id": 1}
    },
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-124.02, 45.04]},
      "properties": {"address": "111 Spring Terrace", "city": "New Mexico", "id": 2}
    }
  ]
}
More info in the geojson library documentation.
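To serve this as an API endpoint without a model, a plain DRF APIView can return the feature collection directly (a sketch; fetch_external_data() is a hypothetical stand-in for the external API call):

from rest_framework.views import APIView
from rest_framework.response import Response

class LocationsGeojson(APIView):
    def get(self, request):
        data = fetch_external_data()       # hypothetical: your external API request
        return Response(to_geojson(data))  # FeatureCollection is dict-like, so DRF can serialize it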

Error when importing CSV file into Amazon Personalize

I am trying to import a CSV file into Amazon Personalize.
My schema looks like this:
{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "AUTHOR",
      "type": "string",
      "categorical": true
    },
    {
      "name": "COUNTRY",
      "type": "string",
      "categorical": true
    },
    {
      "name": "CITY",
      "type": "string",
      "categorical": true
    },
    {
      "name": "STYLES",
      "type": "string",
      "categorical": true
    },
    {
      "name": "CATEGORIES",
      "type": "string",
      "categorical": true
    }
  ],
  "version": "1.0"
}
The first few rows of data look like this:
ITEM_ID,AUTHOR,COUNTRY,CITY,STYLES,CATEGORIES
5b4253a7e12434f55875381e,5acd193f48ed4b9b3add5be6,US,city_us_austin,5ad45bc575eb016f3cdb562b|571aa21888a4fd9934f0fd7b|571aa21888a4fd9934f0fd79|5ad45e8c75eb016f3cdb563f|5b4ea35abaa12285687a1f47,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
5b4253a7e12434f55875381f,5acd193f48ed4b9b3add5be6,US,city_us_jackson,571aa21888a4fd9934f0fd82|57600e419e4959cd069658eb|5ad45c3a75eb016f3cdb5631|571aa21888a4fd9934f0fd7b|57aaa7094a393f531ace43f0|575e6d8e34ca56f742bea1c8|571aa21888a4fd9934f0fd8f,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
I get the error:
Failed to create a data import job for item dataset.
Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
How can I figure out what is wrong with the CSV? It's thousands of lines long, so I have no idea whether it's a general mistake or something wrong on a specific line.
In my experience, as long as the dataset is not more than 250 thousand records, you can still use Excel to check the data using data filters and the corresponding search functions. If it's more than that, look into using Notepad++ and regex. Your problem may be one of the following things (a quick pandas check for (2) and (3) is sketched after this list):
(1) There's a missing comma. This would misalign your data and keep it from being processed.
(2) There's a missing ITEM_ID value. For Items, Personalize requires ITEM_ID and at least one metadata field. It might give this error if there is an instance where ITEM_ID is missing, or ITEM_ID is present but no other metadata field values are.
(3) STYLES and/or CATEGORIES exceeds 256 characters. There is probably a limit on string length, but I can't get a clear answer on this from the developer's guide; my guess is 256 characters. If I were betting money, this would be my guess for your problem.
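A sketch of that check (assuming the file is named items.csv; adjust to your real path):

import pandas as pd

df = pd.read_csv("items.csv")  # a malformed row, as in (1), may already fail here

# (2) rows with a missing ITEM_ID
print(df[df["ITEM_ID"].isna()])

# (3) rows where STYLES or CATEGORIES exceed the suspected 256-character limit
too_long = df[(df["STYLES"].str.len() > 256) | (df["CATEGORIES"].str.len() > 256)]
print(too_long.index.tolist())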
Here is a different approach to solving the problem; maybe it will be useful for other cases. I had the same issue, but when dealing with int columns having null values. Pandas by default converts such columns to the float data type, which the AWS Personalize dataset import job will not accept if you have defined these columns as int or long. Long story short, converting these columns to a nullable integer type solves the problem:
df.column_name = df.column_name.astype(pd.Int32Dtype())

How do I use JSON with U2/Universe

U2/UniVerse JSON documents have the UDOSetProperty function; how would one set the value if it has multiple values? For example, if I have multiple emails.
Example: UDOSetProperty(udoHandle, "to", value)
"to": [
{
"email": "recipientEmail#example.com",
"name": "Recipient Name",
"type": "to"
}
],
I'm not sure if you are trying to add another "to" array element or just a second "email".
So, working with your example:
"to": [
  {
    "email": "recipientEmail@example.com",
    "name": "Recipient Name",
    "type": "to"
  },
  {
    "email": "recipient2Email@example.com",
    "name": "Recipient2 Name",
    "type": "to"
  }
],
If you wanted to create the above JSON from scratch with the UDO commands, the following functions should help you with what you are trying to do (a minimal BASIC sketch is included at the end of this answer):
1. Create the initial/root object: UDOCreate(UDO_OBJECT, udoHandle)
2. Create the array: UDOCreate(UDO_ARRAY, thisArray)
3. Use UDOCreate and UDOSetProperty to create the theEmailObject you want to add to the array, then append it with UDOArrayAppendItem(thisArray, theEmailObject)
4. Add the array to the root object with UDOSetProperty(udoHandle, "to", thisArray)
Note that the important part is that there are several functions specifically for dealing with arrays.
Mike
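A minimal sketch of those steps in UniVerse BASIC, using only the UDO functions named above (variable names are illustrative and error handling is omitted):

* Build { "to": [ { "email": ..., "name": ..., "type": "to" } ] } from scratch
status = UDOCreate(UDO_OBJECT, udoHandle)              ;* root object
status = UDOCreate(UDO_ARRAY, thisArray)               ;* the "to" array
status = UDOCreate(UDO_OBJECT, theEmailObject)         ;* one recipient object
status = UDOSetProperty(theEmailObject, "email", "recipientEmail@example.com")
status = UDOSetProperty(theEmailObject, "name", "Recipient Name")
status = UDOSetProperty(theEmailObject, "type", "to")
status = UDOArrayAppendItem(thisArray, theEmailObject) ;* append the recipient
status = UDOSetProperty(udoHandle, "to", thisArray)    ;* attach the array to the root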
I created a program that builds the JSON with the U2 UDO functions and added it to GitHub:
https://github.com/RocketSoftware/multivalue-lab/blob/master/U2/Demos/UDO/JSON/The-Basics/arrayExample