I need to remove a table from the data frame of a particular MXD file. I found this link: https://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy-mapping/removetableview.htm
import arcpy

mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd")
for df in arcpy.mapping.ListDataFrames(mxd):
    for tbl in arcpy.mapping.ListTableViews(mxd, "", df):
        if tbl.name.lower() == "TableName":
            arcpy.mapping.RemoveTableView(df, tbl)
mxd.saveACopy(r"C:\Project\Project2.mxd")
del mxd
This code runs without an error, but it does not remove the table from the data frame.
Instead of mxd.saveACopy(r"C:\Project\Project2.mxd") I also tried mxd.save(), but still no success.
The table I need to remove during the script is a generated output of the Find Identical tool.
Please note that I do not need to delete the table; it just needs to be removed from the data frame so the MXD is saved without it.
Any ideas what needs to be added or changed?
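One detail that stands out in the snippet above (a guess from the posted code, not a verified fix): tbl.name.lower() is compared against the mixed-case string "TableName", so the condition can never be true and RemoveTableView is never reached. A minimal sketch of a case-insensitive comparison, reusing the question's paths, would be:

import arcpy

mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd")
for df in arcpy.mapping.ListDataFrames(mxd):
    for tbl in arcpy.mapping.ListTableViews(mxd, "", df):
        if tbl.name.lower() == "tablename":  # compare lowercase against lowercase
            arcpy.mapping.RemoveTableView(df, tbl)
mxd.saveACopy(r"C:\Project\Project2.mxd")
del mxd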
I have an API which reads from two main tables, Table A and Table B.
Table A has a column which acts as a foreign key to Table B entries.
Now, inside the API flow, I have a method which runs the logic below.
Raw SQL -> join Table A with some other tables and fetch the entries which have an active status in Table A.
From the result of the previous query, take the values from the Table A column and fetch the related rows from Table B using Django models.
It is like:
query = "Select * from A where status = 1" #Very simplified query just for example
cursor = db.connection.cursor()
cursor.execute(query)
results = cursor.fetchAll()
list_of_values = get_values_for_table_B(results)
b_records = list(B.objects.filter(values__in=list_of_values))
Now there is a background process which inserts or updates data in Table A and Table B. That process does everything via models and wraps the work in:

with transaction.atomic():
    do_update_entries()
However, the update is not just updating the old row. It deletes the old row and the related rows in Table B, and then adds new rows to both tables.
Now the problem: if I run the API and the background job separately, everything is fine, but when both run simultaneously, many API calls find that the second query against Table B returns no data, because the statements execute in this order:
1. The raw SQL for Table A executes and reads the old data.
2. The background job runs in a single transaction, deletes the old data and inserts new data with different foreign key values relating it to Table B.
3. The Table B model query executes, referring to values already deleted by the previous transaction, hence no records.
So, to read everything in a single transaction, I have tried the options below.
with transaction.atomic():
    # Raw SQL for Table A
    # Models query for Table B
This didn't work and I am still getting the same issue.
I tried another way around:

transaction.set_autocommit(False)
# Raw SQL for Table A
# Models query for Table B
transaction.commit()
transaction.set_autocommit(True)

But this didn't work either. How can I read both queries in a single transaction so that the background job's updates do not affect this read?
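For what it's worth, transaction.atomic() on its own does not change the isolation level: under the default READ COMMITTED level each statement sees whatever has been committed by the time it runs, so the background job can still commit between the two reads even inside one atomic block. A minimal sketch of pinning both reads to one snapshot, assuming PostgreSQL (where SET TRANSACTION must come before any query in the transaction) and reusing get_values_for_table_B and the B model from the question:

from django.db import connection, transaction

with transaction.atomic():
    with connection.cursor() as cursor:
        # Must be issued before any other query in this transaction
        cursor.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cursor.execute("SELECT * FROM A WHERE status = 1")
        results = cursor.fetchall()

    # Both reads now see the same snapshot, even if the background job commits in between
    list_of_values = get_values_for_table_B(results)
    b_records = list(B.objects.filter(values__in=list_of_values))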
I am trying to create a text dataset in a Pipeline for text classification, but I believe I am doing it the wrong way, or at least I don't get it. The CSV I am passing only contains two columns, message and label, where label is true or false.
Inside my pipeline I am creating the dataset like this, and I am not very sure how the dataset recognizes that the label column is the target variable.
dataset = gcp_aip.TextDatasetCreateOp(
    project=project,            # my project id
    display_name=display_name,  # reference name
    gcs_source=src_uris,        # path to my data in GCS
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
)
Once the dataset is created, I run the training like this within the Pipeline:
# training
model = gcp_aip.AutoMLTextTrainingJobRunOp(
    project=project,
    display_name=display_name,
    prediction_type="classification",
    multi_label=False,
    dataset=dataset.outputs["dataset"],
)
I am not sure the dataset creation and training are done correctly, since I never specified that label is my label column and that message should be used as the feature.
In Vertex AI the created dataset looks like this:
But in my training section the results from AutoML look like this. I don't know why a label with 0% is there, which makes me doubt how the data was inserted.
When preparing the CSV file, you don't need to specify which column is the feature and which is the label. Vertex AI's AutoML automatically reads the first column as the feature and the second column as the label. You may refer to this documentation for more details on preparing CSV data.
Below is a sample CSV file: all values under the first column (column A) are detected as the feature and all values under the second column (column B) are the labels.
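For illustration only (these rows are invented, not taken from the question's data), the expected two-column layout looks like:

"great product, arrived on time",True
"package never arrived",False
"thanks for the quick response",True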
You might need to check your CSV file and search for the word "label" in your second column, and replace it with either "True" or "False", since based on your given data you only want two labels, "True" and "False". In addition, if you find the word "label" in your second column and its first column has no value, you just need to remove the word "label" (it is likely an imported header row).
In your provided screenshot here, there is a count of 1 for the word "label", which means a "label" value exists in the second column of your CSV data.
I read a Glue catalog table, convert it to a DataFrame, and print the schema using the code below (Spark with Python):

dyf = glueContext.create_dynamic_frame.from_catalog(database='database_name',
                                                    table_name='table_name',
                                                    redshift_tmp_dir=args['TempDir'])
df = dyf.toDF()
df.printSchema()
It works fine when the table has data.
But it doesn't print the schema if the table is empty (it is unable to get the schema of an empty table). As a result, the subsequent joins fail.
Is there a way to make the dynamic frame pick up the table schema from the catalog even for an empty table, or is there any other alternative?
I found a solution. It is not ideal, but it works. If you call apply_mapping() on your DynamicFrame, it will preserve the schema in the DataFrame. For example, if your table has a column last_name, you can do:

dyf = glueContext.create_dynamic_frame.from_catalog(database='database_name',
                                                    table_name='table_name')
df = dyf.apply_mapping([
    ("last_name", "string", "last_name", "string")
]).toDF()
df.printSchema()
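If apply_mapping alone isn't enough, another possible angle (a sketch only, not verified against a live catalog, and the type map below is deliberately tiny and illustrative) is to read the column definitions straight from the Glue Data Catalog with boto3 and build an empty DataFrame that carries that schema:

import boto3
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

# Minimal Glue-to-Spark type mapping; anything unmapped falls back to string (illustrative only)
TYPE_MAP = {'string': StringType(), 'int': IntegerType(), 'double': DoubleType()}

glue = boto3.client('glue')
table = glue.get_table(DatabaseName='database_name', Name='table_name')['Table']
fields = [StructField(col['Name'], TYPE_MAP.get(col['Type'], StringType()), True)
          for col in table['StorageDescriptor']['Columns']]

spark = glueContext.spark_session          # SparkSession that Glue jobs already expose
empty_df = spark.createDataFrame([], StructType(fields))
empty_df.printSchema()                     # schema is available even though the table has no rows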
I have a JSON array of structures in S3, that is successfully Crawled & Cataloged by Glue.
[{"key":"value"}, {"key":"value"}]
I'm using the custom Classifier:
$[*]
When trying to query from Spectrum, however, it returns:
Top level Ion/JSON structure must be an anonymous array if and only if
serde property 'strip.outer.array' is set. Mismatch occured in file...
I set that serde property manually on the Glue catalog table, but nothing changed.
Is it not possible to query an anonymous array via Spectrum?
Naming the array in the JSON file like this:
"values":[{"key":"value"},...}
And updating the classifier:
$.values[*]
Fixes the issue. I'd still be interested to know whether there is a way to query anonymous arrays, though; it seems pretty common to store data like that.
Update:
In the end this solution didn't work, as Spectrum would never actually return any results. There was no error, just no results, and as of now there is still no solution other than using one record per line:
{"key":"value"}
{"key":"value"}
etc.
It does seem to be a Spectrum-specific issue, as Athena still works.
I'd be interested to know if anyone else has been able to get it to work.
I've successfully done this, but without a data classifier. My JSON file looks like:
[
  {
    "col1": "data_from_col1",
    "col2": "data_from_col2",
    "col3": [
      {
        "col4": "data_from_col4",
        ...
      }
    ]
  },
  {
    "col1": "data_from_col1",
    "col2": "data_from_col2",
    "col3": [
      {
        "col4": "data_from_col4",
        ...
      }
    ]
  },
  ...
]
I started with a crawler to get a basic table definition. IMPORTANT: the crawler's configuration options under Output CAN'T be set to Update the table definition..., or else re-running the crawler later will overwrite the manual changes described below. I used Add new columns only.
I had to add the 'strip.outer.array' property AND manually add the topmost columns within my anonymous array. The original schema from the initial crawler run was:
anon_array array<struct<col1:string,col2:string,col3:array<struct<col4...>>>
partition_0 string
I manually updated my schema to:
col1:string
col2:string
col3:array<struct<col4...>>
partition_0 string
(And also add the serde param strip.outer.array.)
Then I had to rerun my crawler, and finally I could query in Spectrum like:
select o.partition_0, o.col1, o.col2, t.col4
from db.tablename o
LEFT JOIN o.col3 t on true;
You can use json_extract_path_text for extracting an element, or json_extract_array_element_text('json string', pos [, null_if_invalid]).
For example, for the element at index 2:
select json_extract_array_element_text('[111,112,113]', 2);
output: 113
If your table's structure is as follows:
CREATE EXTERNAL TABLE spectrum.testjson(
    id varchar(25),
    columnName array<struct<key:varchar(20),value:varchar(20)>>);
you can use the following query to access the array element:
SELECT c.id, o.key, o.value FROM spectrum.testjson c, c.columnName o;
For more information, you can refer to the AWS documentation:
https://docs.aws.amazon.com/redshift/latest/dg/tutorial-query-nested-data-sqlextensions.html
I'm trying to do something like this:
file = sc.textFile('mytextfile')

def myfunction(mystring):
    new_value = mystring
    for i in file.toLocalIterator():
        if i in mystring:
            new_value = i
    return new_value

rdd_row = some_data_frame.map(lambda u: Row(myfunction(u.column_name)))
But I get this error
It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers
The problem is (as is clearly stated in the error message) that you are trying to work with an RDD inside the map. file is an RDD; you can run transformations and actions on it (e.g. the local iterator you are calling), but you cannot use those operations inside another transformation, in this case the map.
UPDATE
If I understand correctly, you have a dataframe df with a column URL. You also have a text file which contains blacklist values.
Let's assume, for the sake of argument, that your blacklist file is a CSV with a column blacklistNames and that the dataframe df's URL column is already parsed, i.e. you just want to check whether URL appears in the blacklistNames column.
What you can do is something like this:
df.join(blackListDF, df["URL"]==blackListDF["blacklistNames"], "left_outer")
This join basically adds a blacklistNames column to your original dataframe, containing the matched name if the URL is in the blacklist and null otherwise. Now all you need to do is filter on whether the new column is null, as in the sketch below.
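A minimal end-to-end sketch of that approach (assuming an existing SparkSession spark and that the blacklist file is a one-column CSV with the header blacklistNames; those names are illustrative, not from the question):

from pyspark.sql import functions as F

# Hypothetical one-column blacklist file with header "blacklistNames"
blackListDF = spark.read.csv('mytextfile', header=True)

# Left outer join keeps every row of df; blacklistNames is null where there is no match
joined = df.join(blackListDF, df["URL"] == blackListDF["blacklistNames"], "left_outer")

# Rows whose URL appears in the blacklist
blacklisted = joined.filter(F.col("blacklistNames").isNotNull())

# Rows that are not blacklisted
clean = joined.filter(F.col("blacklistNames").isNull())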