I want to determine the commit message with the help of the build number format.
What is the format for it in the build number format?
I am working on a Glue job to read data from an Oracle database and write it into Redshift. I have crawled the tables from my Oracle source and Redshift target. When I use the Glue visual job with an Oracle source and a write-to-Redshift component, it completes in around 7 minutes with G.1X and 5 workers. I tried other combinations and concluded this is the best one I can use.
Now I want to optimize this further and am trying to write a PySpark script from scratch. I used a simple JDBC read and write, but it is taking more than 30 minutes to complete. I have 3M records in the source. I have tried numPartitions 10 and fetch size 30000. My questions are:
What are the default configurations used by the Glue visual job, given that it finishes so much faster?
Is the fetch size already configured on the source side when reading over a JDBC connection? If the Glue visual job uses this and its value is higher than what I have specified, could that be the reason for the faster execution?
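For reference, the read I am describing looks roughly like this (a minimal sketch with placeholder connection details, not my exact script):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-to-redshift").getOrCreate()

# Plain JDBC read with the options mentioned above. Note that numPartitions
# only parallelizes the read when partitionColumn/lowerBound/upperBound are
# also supplied; without them the whole table comes through one connection.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # placeholder host/service
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("dbtable", "MY_SCHEMA.MY_TABLE")                # placeholder table
    .option("user", "my_user")
    .option("password", "my_password")
    .option("fetchsize", "30000")
    .option("numPartitions", "10")
    .load()
)
# The write to Redshift follows after this read.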
Please let me know if you need any further details.
I have "YYYY-MM-DD HH:MM:SS.QQ ERROR" in my splunk logs.
Now I want to search for similar date pattern along with Status like "2021-Apr-08 23:08:23.498 ERROR" in my splunk logs and create alert if the ERROR tag comes next to the date.
These date are changeable and are generated at run time.
Can any one suggest me how to check for Date time format along with Status in splunk query.
In the title you mentioned Amazon Web Services. If your events are actual AWS log data, you could install the Splunk Add-on for Amazon Web Services: https://splunkbase.splunk.com/app/1876/
The add-on comes with a lot of field extractions. After installing the add-on, all you need to do is have a look at your events to find out the correct field name for the status text and then search for status=ERROR.
Alternatively, you can create the field extraction yourself. This regular expression should do:
(?<date>\d\d\d\d-\w+-\d\d\s+\d\d:\d\d:\d\d\.\d\d\d)\s+(?<status>\w+)
You can test it here: https://regex101.com/r/pVg1Pm/1
Now use Splunk's rex command to do the field extraction at search time:
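For example, something along these lines (the index name is a placeholder for wherever your events live):

index=your_index
| rex "(?<date>\d\d\d\d-\w+-\d\d\s+\d\d:\d\d:\d\d\.\d\d\d)\s+(?<status>\w+)"
| search status=ERROR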
To have the field extraction done automatically, you can add new field extractions via Settings / Fields / Field extractions.
We are planning to use GCP Pub/Sub to write events to GCS. I have the questions below.
We want to maintain an audit table in BigQuery; we would like to see how many messages came in for a particular time frame, by day or by hour.
How do we validate against Pub/Sub? Let's say we received 10 messages; how do we check that against GCS? How do we check that we didn't drop any messages?
I would really appreciate your feedback.
To validate the number of records written to GCS, you can create a BigQuery external table over the GCS files and query the number of records written. This sanity check needs to be done at regular intervals.
Second solution: you can also check the number of records written to GCS with the following command:
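For example, a minimal sketch of that check (the dataset name, file format, and GCS path are assumptions; BigQuery auto-detects the schema when no columns are declared):

-- Hypothetical dataset/table and bucket path; set format to match your files.
CREATE OR REPLACE EXTERNAL TABLE audit.pubsub_events_ext
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/events/*.csv']
);

-- Compare this count against the number of messages published to Pub/Sub.
SELECT COUNT(*) AS record_count
FROM audit.pubsub_events_ext;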
gsutil cat gs://folder/test.csv | wc -l
I have JSON files in an S3 bucket that may change their schema from time to time. To be able to analyze the data I want to run a Glue crawler periodically on them; the analysis in Athena works in general.
Problem: my timestamp string is not recognized as a timestamp.
The timestamps currently have the following format: 2020-04-06T10:37:38+00:00, but I have also tried others, e.g. 2020-04-06 10:37:38 - I have control over this and can adjust the format.
The suggestion to set the SerDe parameters might not work for my application; I want the schema to be recognized completely and not have to define each field individually. (AWS Glue: Crawler does not recognize Timestamp columns in CSV format)
Manual adjustments in the table are generally not wanted, as I would like to deploy Glue automatically within a CloudFormation stack.
Do you have an idea what else I can try?
This is a very common problem. The way we got around it when reading text/JSON files was to add an extra step in between to cast and set proper data types. The crawler's data types are a bit iffy sometimes, since they are based on the data sample available at that point in time.
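For example, a minimal sketch of that casting step in PySpark (the S3 path and column name are assumptions):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cast-types").getOrCreate()

# Hypothetical raw path; read the JSON as-is, then cast the string column.
df = spark.read.json("s3://my-bucket/raw/")

# The pattern below matches 2020-04-06T10:37:38+00:00; adjust it to whatever
# format you settle on before writing the typed output.
df_typed = df.withColumn(
    "event_ts",
    F.to_timestamp("event_ts", "yyyy-MM-dd'T'HH:mm:ssXXX"),
)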
We want to maintain auditing of tables.
My questions are:
1) Is the commit interval in Informatica stored anywhere in any variable,
so that we can maintain the record count for every commit interval?
2) Is there any method/script to read the stats from the session log and save them in an audit table?
3) If there are multiple targets in my mapping, then after execution the monitor shows the target success count and target reject count as a total across all targets in the mapping.
How do I get the individual success and reject count per target?
You need to use the Informatica metadata tables, which Informatica doesn't recommend (still, I am mentioning it for your reference). So your options are to create an sh/bat script to get this info from the session log, or to create a mapplet that collects this kind of statistics and add that mapplet to every Informatica mapping. To answer your questions:
Yes, the commit interval is stored in the Informatica table opb_task_attr; filter on attr_id = 14 and select attr_value (a query sketch follows below).
No, there is no built-in way; either use an Informatica mapplet to collect such stats or a shell script.
Yes, this is possible. Use the Informatica view rep_sess_tbl_log for this purpose. There you can get each target's statistics for a particular session run.
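A rough sketch of the repository queries (assuming read access to the repository schema; the task_id value is a placeholder):

-- 1) Commit interval for a session task (attr_id = 14, as noted above);
--    replace 1234 with your session's task id.
SELECT attr_value
FROM opb_task_attr
WHERE attr_id = 14
  AND task_id = 1234;

-- 3) Per-target success/reject statistics for session runs; inspect the
--    view's columns and filter on your session/workflow identifiers.
SELECT *
FROM rep_sess_tbl_log;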
Koushik