journalctl: search metadata with regex patterns

Is it possible to search journalctl metadata with patterns? What I am doing right now is searching with journalctl CONTAINER_NAME=cranky.hello --lines=100 -f. What I want is to match everything after that '.', with something like journalctl CONTAINER_NAME=cranky.* --lines=100 -f, so that it also matches CONTAINER_NAME values such as:
cranky.world
cranky.alive
Below are examples of the output when journalctl is executed:
journalctl CONTAINER_NAME=cranky.hello --lines=100 -f
Oct 17 14:33:35 lottery-staging docker[55587]: chdir: /usr/src/app
Oct 17 14:33:35 lottery-staging docker[55587]: daemon: False
Oct 17 14:33:35 lottery-staging docker[55587]: raw_env: []
Oct 17 14:33:35 lottery-staging docker[55587]: pidfile: None
Oct 17 14:33:35 lottery-staging docker[55587]: worker_tmp_dir: None
journalctl CONTAINER_NAME=cranky.hello --lines=100 -f -o json
{ "__CURSOR" : "s=d98b3d664a71409d9a4d6145b0f8ad93;i=731e;b=2f9d75ec91044d52b8c5e5091370bcf7;m=285b067a063;t=55bbf0361352a;x=64b377c33c8fba96", "__REALTIME_TIMESTAMP" : "1508250837136682", "__MONOTONIC_TIMESTAMP" : "2773213487203", "_BOOT_ID" : "2f9d75ec91044d52b8c5e5091370bcf7", "CONTAINER_TAG" : "", "_TRANSPORT" : "journal", "_PID" : "55587", "_UID" : "0", "_GID" : "0", "_COMM" : "docker", "_EXE" : "/usr/bin/docker", "_CMDLINE" : "/usr/bin/docker daemon -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --userland-proxy=false --tlscert /etc/dockercloud/agent/cert.pem --tlskey /etc/dockercloud/agent/key.pem --tlscacert /etc/dockercloud/agent/ca.pem --tlsverify --log-driver journald", "_SYSTEMD_CGROUP" : "/", "_SELINUX_CONTEXT" : [ 117, 110, 99, 111, 110, 102, 105, 110, 101, 100, 10 ], "_MACHINE_ID" : "0a80624bd4c45a792b0a857c59a858d6", "_HOSTNAME" : "lottery-staging", "PRIORITY" : "6", "MESSAGE" : "Running migrations:", "CONTAINER_ID_FULL" : "c8f60546e9d50f034f364259c409760b3390d979d57a773eccd8d852e1c3553f", "CONTAINER_NAME" : "ghost-1.lottery-staging-stack.c6118be4", "CONTAINER_ID" : "c8f60546e9d5", "_SOURCE_REALTIME_TIMESTAMP" : "1508250837135650" }
{ "__CURSOR" : "s=d98b3d664a71409d9a4d6145b0f8ad93;i=731f;b=2f9d75ec91044d52b8c5e5091370bcf7;m=285b067a2a2;t=55bbf0361376a;x=6c87fea4ea155d00", "__REALTIME_TIMESTAMP" : "1508250837137258", "__MONOTONIC_TIMESTAMP" : "2773213487778", "_BOOT_ID" : "2f9d75ec91044d52b8c5e5091370bcf7", "CONTAINER_TAG" : "", "_TRANSPORT" : "journal", "_PID" : "55587", "_UID" : "0", "_GID" : "0", "_COMM" : "docker", "_EXE" : "/usr/bin/docker", "_CMDLINE" : "/usr/bin/docker daemon -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --userland-proxy=false --tlscert /etc/dockercloud/agent/cert.pem --tlskey /etc/dockercloud/agent/key.pem --tlscacert /etc/dockercloud/agent/ca.pem --tlsverify --log-driver journald", "_SYSTEMD_CGROUP" : "/", "_SELINUX_CONTEXT" : [ 117, 110, 99, 111, 110, 102, 105, 110, 101, 100, 10 ], "_MACHINE_ID" : "0a80624bd4c45a792b0a857c59a858d6", "_HOSTNAME" : "lottery-staging", "PRIORITY" : "6", "MESSAGE" : " No migrations to apply.", "CONTAINER_ID_FULL" : "c8f60546e9d50f034f364259c409760b3390d979d57a773eccd8d852e1c3553f", "CONTAINER_NAME" : "ghost-1.lottery-staging-stack.c6118be4", "CONTAINER_ID" : "c8f60546e9d5", "_SOURCE_REALTIME_TIMESTAMP" : "1508250837135667" }

journalctl does not accept patterns for anything other than unit names (in the -u argument). Depending on your needs, you could perform some filtering using JSON output and grep, as in:
journalctl -u docker -o json -n1000 | grep 'CONTAINER_NAME.*cranky\.'
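If jq is available, a similar sketch (my own assumption, not part of the original answer) can match on the parsed CONTAINER_NAME field instead of grepping the raw JSON text; the unit name and prefix are taken from the question and may need adjusting:
# journalctl -o json emits one JSON object per line; jq keeps entries whose
# CONTAINER_NAME starts with "cranky." and prints only the message text.
journalctl -u docker -o json -n 1000 \
  | jq -r 'select(.CONTAINER_NAME // "" | startswith("cranky.")) | .MESSAGE'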

Related

Neptune loader FROM_OR_TO_VERTEX_ARE_MISSING

I tried to follow this example, https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html, to load data into Neptune:
curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
    "source" : "s3://source.csv",
    "format" : "csv",
    "iamRoleArn" : "role",
    "region" : "region",
    "failOnError" : "FALSE",
    "parallelism" : "MEDIUM",
    "updateSingleCardinalityProperties" : "FALSE",
    "queueRequest" : "TRUE"
}'
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb"
    }
}
I get a 200 status, but when I then check whether the data was loaded, I get this:
curl -G 'https://endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://source.csv",
            "runNumber" : 1,
            "retryNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 4,
            "startTime" : 1617653964,
            "totalRecords" : 10500,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 10500
        }
    }
}
I had no idea why I got LOAD_FAILED, so I decided to use the Get-Status API to see what errors caused the load failure, and got this:
curl -X GET 'endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb?details=true&errors=true'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://source.csv",
            "runNumber" : 1,
            "retryNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 4,
            "startTime" : 1617653964,
            "totalRecords" : 10500,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 10500
        },
        "failedFeeds" : [
            {
                "fullUri" : "s3://source.csv",
                "runNumber" : 1,
                "retryNumber" : 1,
                "status" : "LOAD_FAILED",
                "totalTimeSpent" : 1,
                "startTime" : 1617653967,
                "totalRecords" : 10500,
                "totalDuplicates" : 0,
                "parsingErrors" : 0,
                "datatypeMismatchErrors" : 0,
                "insertErrors" : 10500
            }
        ],
        "errors" : {
            "startIndex" : 1,
            "endIndex" : 10,
            "loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb",
            "errorLogs" : [
                {
                    "errorCode" : "FROM_OR_TO_VERTEX_ARE_MISSING",
                    "errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",
                    "fileName" : "s3://source.csv",
                    "recordNum" : 0
                },
What does this error even mean and what is the possible fix?
It looks as if you were trying to load some edges. When an edge is loaded, the two vertices that the edge will be connecting must already have been loaded/created. The message:
"errorMessage" : "Either from vertex, '1414', or to vertex, '70',is not present.",
is letting you know that one (or both) of the vertices with ID values of '1414' and '70' are missing. All vertices referenced by a CSV file containing edges must already exist (have been created or loaded) prior to loading edges that reference them. If the CSV files for vertices and edges are in the same S3 location then the bulk loader can figure out the order to load them in. If you just ask the loader to load a file containing edges but the vertices are not yet loaded, you will get an error like the one you shared.
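As a rough illustration (not from the original answer; the bucket and file names are placeholders), loading a vertex file first and the edge file second would look something like this. In the Gremlin CSV format, vertex files need ~id and ~label columns, and edge files need ~id, ~from, ~to and ~label, where ~from/~to must reference vertex IDs that already exist:
# Hypothetical sketch: load vertices first, wait for LOAD_COMPLETED, then load edges.
curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
    "source" : "s3://my-bucket/vertices.csv",
    "format" : "csv",
    "iamRoleArn" : "role",
    "region" : "region"
}'
# ...once the vertex load reports LOAD_COMPLETED:
curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
    "source" : "s3://my-bucket/edges.csv",
    "format" : "csv",
    "iamRoleArn" : "role",
    "region" : "region"
}'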

Not able to connect to Snowflake from an EMR cluster using PySpark with the Airflow EMR operator

I am trying to connect to Snowflake from an EMR cluster launched by the Airflow EMR operator, but I'm getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling
o147.load. : java.lang.ClassNotFoundException: Failed to find data
source: net.snowflake.spark.snowflake. Please find packages at
http://spark.apache.org/third-party-projects.html
These are the steps I am adding to my EmrAddSteps operator to run the script load_updates.py, and I am passing my Snowflake packages in the "Args":
STEPS = [
    {
        "Name" : "convo_facts",
        "ActionOnFailure" : "TERMINATE_CLUSTER",
        "HadoopJarStep" : {
            "Jar" : "command-runner.jar",
            "Args" : ["spark-submit", "s3://dev-data-lake/spark_files/cf/load_updates.py", \
                      "--packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4", \
                      "INPUT=s3://dev-data-lake/table_exports/public/", \
                      "OUTPUT=s3://dev-data-lake/emr_output/cf/"]
        }
    }
]
JOB_FLOW_OVERRIDES = {
    'Name' : 'cftest',
    'LogUri' : 's3://dev-data-lake/emr_logs/cf/log.txt',
    'ReleaseLabel' : 'emr-5.32.0',
    'Instances' : {
        'InstanceGroups' : [
            {
                'Name' : 'Master nodes',
                'Market' : 'ON_DEMAND',
                'InstanceRole' : 'MASTER',
                'InstanceType' : 'r6g.4xlarge',
                'InstanceCount' : 1,
            },
            {
                'Name' : 'Slave nodes',
                'Market' : 'ON_DEMAND',
                'InstanceRole' : 'CORE',
                'InstanceType' : 'r6g.4xlarge',
                'InstanceCount' : 3,
            }
        ],
        'KeepJobFlowAliveWhenNoSteps' : True,
        'TerminationProtected' : False
    },
    'Applications' : [{
        'Name' : 'Spark'
    }],
    'JobFlowRole' : 'EMR_EC2_DefaultRole',
    'ServiceRole' : 'EMR_DefaultRole'
}
And this is how I am adding the Snowflake credentials in my load_updates.py script to extract the data into a PySpark dataframe.
# Set options below
sfOptions = {
    "sfURL" : "xxxx.us-east-1.snowflakecomputing.com",
    "sfUser" : "user",
    "sfPassword" : "xxxx",
    "sfDatabase" : "",
    "sfSchema" : "PUBLIC",
    "sfWarehouse" : ""
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
query_sql = """select * from cf"""
messages_new = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query", query_sql) \
    .load()
Not sure if I am missing something here or where I am going wrong.
The --packages option should be placed before s3://.../load_updates.py in the spark-submit command. Otherwise, it will be treated as an application argument.
Try this:
STEPS = [
    {
        "Name": "convo_facts",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--packages",
                "net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4",
                "s3://dev-data-lake/spark_files/cf/load_updates.py",
                "INPUT=s3://dev-data-lake/table_exports/public/",
                "OUTPUT=s3://dev-data-lake/emr_output/cf/"
            ]
        }
    }
]
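For reference, the spark-submit invocation this step produces should look roughly like the following (spark-submit options before the application script, application arguments after it), using the same paths and package coordinates as above:
spark-submit \
  --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4 \
  s3://dev-data-lake/spark_files/cf/load_updates.py \
  INPUT=s3://dev-data-lake/table_exports/public/ \
  OUTPUT=s3://dev-data-lake/emr_output/cf/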

AWS managed Elasticsearch restore - node does not match index setting

I'm trying to restore an Elasticsearch snapshot that was taken from AWS managed Elasticsearch (version 5.6, instance type i3.2xlarge).
While restoring it on a VM, the cluster status immediately went red and all the shards became unassigned.
{
"cluster_name" : "es-cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 480,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 0.0
}
When I use the allocation explain API, I get the response below.
{
  "node_id" : "3WEV1tHoRPm6OguKyxp0zg",
  "node_name" : "node-1",
  "transport_address" : "10.0.0.2:9300",
  "node_decision" : "no",
  "deciders" : [
    {
      "decider" : "replica_after_primary_active",
      "decision" : "NO",
      "explanation" : "primary shard for this replica is not yet active"
    },
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node does not match index setting [index.routing.allocation.include] filters [instance_type:\"i2.2xlarge OR i3.2xlarge\"]"
    },
    {
      "decider" : "throttling",
      "decision" : "NO",
      "explanation" : "primary shard for this replica is not yet active"
    }
  ]
},
This is strange, and something I have never faced before. Anyhow, the snapshot itself is fine; how can I ignore this setting while restoring? I tried the request below, but I still get the same issue.
curl -X POST "localhost:9200/_snapshot/restore/awsnap/_restore?pretty" -H 'Content-Type: application/json' -d'
{"ignore_index_settings": [
"index.routing.allocation.include"
]
}'
I found the cause and the solution.
Detailed troubleshooting steps are here: https://thedataguy.in/restore-aws-elasticsearch-snapshot-failed-index-settings/
But I'm leaving this answer here so that others can benefit from it.
This is an AWS-specific setting, so I used the following to solve it:
curl -X POST "localhost:9200/_snapshot/restore/awsnap/_restore?pretty" -H 'Content-Type: application/json' -d'
{"ignore_index_settings": [
"index.routing.allocation.include.instance_type"
]
}
'
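If some indices were already restored with the filter still in place, the same setting can also be cleared afterwards through the index settings API; setting it to null removes it. A hedged sketch, not from the original answer; <index-name> is a placeholder for the affected index (or a wildcard):
curl -X PUT "localhost:9200/<index-name>/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.include.instance_type": null
}'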

grep including new line

I have a log file, and I want to extract content from it using a pattern.
The log file looks like this:
2019-05-15 16:40 +07:00: data { data:
[ { audio_incremental_num: 1,
session_id: 'openrJEe7A_1557912549',
stream_time: 88,
duration: 291,
audio_id: '749f7c75-9fe1-4dbc-b5d8-770aadfe94bc'
version: '1.2' },
{ audio_incremental_num: 1,
session_id: 'openrJEe7A_1557912549',
stream_time: 88,
duration: 291,
audio_id: '749f7c75-9fe1-4dbc-b5d8-770aadfe94bc'
version: '1.2' }] }
2019-05-15 16:50 +07:00: data { data:
[ { audio_incremental_num: 1,
session_id: 'openrJEe7A_1557912549',
stream_time: 88,
duration: 291,
audio_id: '749f7c75-9fe1-4dbc-b5d8-770aadfe94bc'
version: '1.2' },
{ audio_incremental_num: 1,
session_id: 'openrJEe7A_1557912549',
stream_time: 88,
duration: 291,
audio_id: '749f7c75-9fe1-4dbc-b5d8-770aadfe94bc'
version: '1.2' }] }
I have tried these, but with no luck:
grep -zo '2019-05-[0-9][1-9] [0-9][0-9]:[0-9][0-9] +07:00: data { data:[[:space:]]'
grep -P '2019-05-[0-9]{2} [0-9]{2}:[0-9]{2} \+07:00: data { data:(\s.*)*.*'
Note: my log file is actually mixed with other log content, so it's not a 100% JSON log.
Your log file looks like JSON, so you can use jq in bash to parse it; it is very useful. Check this link: Working with JSON in bash using jq.
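Since the question asks specifically about grep and the records are not strictly valid JSON, a rough GNU grep sketch along these lines might also work (this assumes GNU grep with PCRE support and a placeholder file name app.log; treat it as a starting point, not a tested solution):
# -z reads the file as NUL-separated records so the pattern can span newlines,
# -o prints only the match, (?s) lets . match newlines, and .*? stops at the first "}] }".
grep -Pzo '(?s)2019-05-\d{2} \d{2}:\d{2} \+07:00: data \{ data:.*?\}\] \}' app.log | tr '\0' '\n'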

VMware: output the network HealthCheck to a CSV file

I have a script that works perfectly, and everything shows up on my PowerShell console. But I am trying to figure out how to export it to CSV.
Script:
foreach($vds in Get-VDSwitch)
{
    $vds.ExtensionData.Runtime.HostMemberRuntime | %{
        $_.HealthCheckResult | where{$_ -is [VMware.Vim.VMwareDVSVlanHealthCheckResult]} |
        Select @{N='vdSwitch';E={$vds.Name}},
            UplinkPortKey,
            @{N='TrunkedVLAN';E={
                ($_.TrunkedVLAN | %{
                    if($_.Start -eq $_.End){
                        "{0}" -f $_.Start
                    }
                    else{
                        "{0}-{1}" -f $_.Start,$_.End
                    }
                }) -join ','
            }}
    }
}
The output on screen looks like this:
VsanEnabled : False
VsanDiskClaimMode : Manual
HATotalSlots : 3099
HAUsedSlots : 22
HAAvailableSlots : 1527
HASlotCpuMHz : 32
HASlotMemoryMb : 328
HASlotMemoryGB : 0.3203125
HASlotNumVCpus : 1
ParentId : Folder-group-h28
ParentFolder : host
HAEnabled : True
HAAdmissionControlEnabled : True
HAFailoverLevel : 1
HARestartPriority : Medium
HAIsolationResponse : DoNothing
VMSwapfilePolicy : WithVM
DrsEnabled : True
DrsMode : FullyAutomated
DrsAutomationLevel : FullyAutomated
EVCMode : intel-nehalem
Name : mac01dmzp01
CustomFields : {}
ExtensionData : VMware.Vim.ClusterComputeResource
Id : ClusterComputeResource-domain-c12033
Uid : /VIServer=cn\t175726#mac01vcp02.cn.ca:443/Cluster=ClusterComputeResource-domain-c12033/
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 78
TrunkedVLAN : 11-17,396,500
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 79
TrunkedVLAN : 11-17,396,500
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 82
TrunkedVLAN : 11-17,396,500
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 83
TrunkedVLAN : 11-17,396,500
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 358
TrunkedVLAN : 11-17,396,500
vdSwitch : vds-toronto-mac01-2-ports-10Gbe
UplinkPortKey : 359
TrunkedVLAN : 11-17,396,500
a lot more ......
I found that the way to do it is with a function.
#####################################################
# vSphere 6.5
# Get ESX HealthCheck Network Config from VDS
#
# by Gerald Begin (Nov.20 2018)
#################################
##### Set Script Location
Set-Location T:\___Main-Script___\_VDS-vLANs_
##### Add VMWare Module.
Get-Module -Name VMware* -ListAvailable | Import-Module
##### Output Path
$Desti = 'T:\___Main-Script___\_VDS-vLANs_\Output'
Import-Module -Name "T:\__Script_Functions__\Connect2All.ps1" -Force:$true # Function to Connect to ALL vCenters
$Clster = "mac01dmzp01"
#### --------------------------------------
function GetInfo {
###################################################
    foreach($vds in Get-VDSwitch)
    {
        $vds.ExtensionData.Runtime.HostMemberRuntime | %{
            $_.HealthCheckResult | where{$_ -is [VMware.Vim.VMwareDVSVlanHealthCheckResult]} |
            Select @{N='vdSwitch';E={$vds.Name}},
                UplinkPortKey,
                @{N='TrunkedVLAN';E={
                    ($_.TrunkedVLAN | %{
                        if($_.Start -eq $_.End){
                            "{0}" -f $_.Start
                        }
                        else{
                            "{0}-{1}" -f $_.Start,$_.End
                        }
                    }) -join ','
                }}
        }
    }
}
Get-Cluster -Name $Clster | GetInfo | Export-Csv -Path $Desti\Results.csv -NoTypeInformation
Disconnect-VIServer * -Confirm:$false