AppFabric unable to create a DataCache (LMTRepopulationJob FAILS) - sharepoint-2013

Well, first of all, I am learning SharePoint 2013 and have been following a few tutorials. So far I have just set up a farm, and everything seems to be working properly except for this error, which is logged to the Event Viewer every 5 minutes:
The Execute method of job definition
Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID
1e573155-b7f6-441b-919b-53b2f05770f7) threw an exception. More
information is included below.
Unexpected exception in FeedCacheService.BulkLMTUpdate: Unable to
create a DataCache. SPDistributedCache is probably down..
I found out that this is a timer job that is configured to execute every 5 minutes.
As for the assumption that the SPDistributedCache is probably down, I have already verified it and it is actually running.
I also checked the cache host via the SharePoint Management Shell (Get-CacheHost and Get-CacheClusterHealth) and everything still seems fine there; a sketch of these checks follows the list below.
Yet when I execute Get-Cache I am only getting the default cache, and from what I have read there should be other named caches listed, such as:
DistributedAccessCache_XXXXXXXXXXXXXXXXXXXXXXXXX
DistributedBouncerCache_XXXXXXXXXXXXXXXXXXXXXXXX
DistributedSearchCache_XXXXXXXXXXXXXXXXXXXXXXXXX
DistributedServerToAppServerAccessTokenCache_XXXXXXX
DistributedViewStateCache_XXXXXXXXXXXXXXXXXXXXXXX
Among others, which I think should probably include the DataCache.
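For reference, here is roughly what I ran (from an elevated SharePoint 2013 Management Shell on the cache host; you may need to load the DistributedCacheAdministration snap-in first):
Add-PSSnapin DistributedCacheAdministration -ErrorAction SilentlyContinue
Use-CacheCluster                 # point the AppFabric cmdlets at the local cluster configuration
Get-CacheHost                    # the host shows as UP
Get-CacheClusterHealth           # cluster health looks fine
Get-Cache                        # only "default" is listed here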
So far I have tried a few workarounds, but without success:
Restart-Service AppFabricCachingService
Remove-SPDistributedCacheServiceInstance
Add-SPDistributedCacheServiceInstance
Restart-CacheCluster
I even tried this script, which seems to repair the AppFabric Caching Service in many cases:
# Get the cache cluster information for this farm
$SPFarm = Get-SPFarm
$cacheClusterName = "SPDistributedCacheCluster_" + $SPFarm.Id.ToString()
$cacheClusterManager = [Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheClusterInfoManager]::Local
$cacheClusterInfo = $cacheClusterManager.GetSPDistributedCacheClusterInfo($cacheClusterName)
# Remove the Distributed Cache service instance from the local server...
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
$serviceInstance.Delete()
# ...then re-provision it and list the cache hosts registered in the cluster
Add-SPDistributedCacheServiceInstance
$cacheClusterInfo.CacheHostsInfoCollection
If anyone has any suggestions, I would really appreciate it. Thank you in advance!

This is a generic error message, meaning that the real issue isn't known (hence the word "Probably").
I believe that the key to solving this problem, when the "probably" guess is wrong, is to look in the ULS log for the events that occurred just before it. Events of type "Unexpected" do not appear in the Windows event log and are often seen right before a generic error like this one.
In many cases you might see something like "File not Found". This usually means that the noted file is not in the Global Assembly Cache (GAC). Since Distributed Cache relies on AppFabric, which lives outside of SharePoint, the only way for SharePoint to find its assemblies is to look in the GAC. The SharePoint prerequisite installer should have put the files there, but it might have failed, or someone may have uninstalled AppFabric and reinstalled it manually, which would have removed the assemblies from the GAC without putting them back.
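As a sketch of what I mean (assuming the SharePoint 2013 Management Shell; the GAC path below is the standard .NET 4 location and may differ on your server), you could pull the recent "Unexpected" ULS events and check whether the AppFabric caching assemblies are actually in the assembly cache:
# ULS events of level "Unexpected" from the last 30 minutes (run this shortly after the timer job fails)
Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-30) |
    Where-Object { $_.Level -eq "Unexpected" } |
    Select-Object Timestamp, Area, Category, Message |
    Format-List

# Check whether the AppFabric caching client assemblies are present in the GAC
Get-ChildItem "$env:windir\Microsoft.NET\assembly\GAC_MSIL" -Recurse -Filter "Microsoft.ApplicationServer.Caching*"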

Before running Restart-CacheCluster you could specify the connection to your SharePoint cache configuration database (the catalog name may not be the same in your environment):
Use-CacheCluster -ConnectionString "Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True" -ProviderType System.Data.SqlClient
NOTE: This does not persist permanently (it only applies to the current PowerShell session).
NOTE 2: If you don't have a named instance on the DB server, just put the name of your server without the "\".
If you don't have a configuration catalog yet, you could use the following script:
Remove-Cache default
New-Cache SharePointCache
Get-CacheConfig SharePointCache
Set-CacheConfig SharePointCache -NotificationsEnabled True
New-CacheCluster -Provider System.Data.SqlClient -ConnectionString "Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True" -Size Small
Register-CacheHost -Provider System.Data.SqlClient -ConnectionString "Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True" -Account "Domain\spservices_account" -CachePort 22233 -ClusterPort 22234 -ArbitrationPort 22235 -ReplicationPort 22236 -HostName [Name_of_your_server]
Add-CacheHost -Provider System.Data.SqlClient -ConnectionString "Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True" -Account "Domain\spservices_account"
Add-CacheAdmin -Provider System.Data.SqlClient -ConnectionString "Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True"
Use-CacheCluster
You can also check or set the database configuration in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration
Look for the ConnectionString string value and set your connection string, e.g.:
Data Source=(SharePoint DB Server)\(Optional Instance);Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True
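For example, with the standard registry cmdlets ("SPSQL01" below is just a placeholder for your SQL server):
# Read the current AppFabric connection string
Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration" -Name ConnectionString

# Set it (placeholder data source - adjust to your environment)
Set-ItemProperty "HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration" -Name ConnectionString -Value "Data Source=SPSQL01;Initial Catalog=CacheClusterConfigurationDB;Integrated Security=True"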
To query the status of the server you could use:
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status
Get-SPServer | ? {($_.ServiceInstances | % TypeName) -contains "Distributed Cache"} | % Address
Get-AFCache | Format-Table -AutoSize
Get-CacheHost
Additional:
If you need to change the service account, you can use this procedure:
$f = Get-SPFarm
$svc = $f.Services | ? {$_.Name -eq "AppFabricCachingService"}
$acc = Get-SPManagedAccount -Identity "Domain\spservices_account"
$svc.ProcessIdentity.CurrentIdentityType = "SpecificUser"
$svc.ProcessIdentity.ManagedAccount = $acc
$svc.ProcessIdentity.Update()
$svc.ProcessIdentity.Deploy()

Have you changed the Distributed Cache account from the farm account?
What build number are you on?
Is this a single-server farm?
Off the top of my head, the only thing left is this:
Grant-CacheAllowedClientAccount -Account "domain\ProfileserviceWebAppIdentity"
I would do an iisreset and restart the OWSTIMER service after you run this command.
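For example (SPTimerV4 is the default Windows service name for the SharePoint Timer / OWSTIMER service on SharePoint 2013):
# Reset IIS, then restart the SharePoint Timer service (OWSTIMER)
iisreset
Restart-Service SPTimerV4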

Related

GCP Composer - Airflow webserver shuts down constantly

I'm using GCP Composer with the newest image version, composer-1.16.1-airflow-1.10.15.
My webservers are dying from time to time because of some missing cache files:
{cli.py:1050} ERROR - [Errno 2] No such file or directory
Does anybody know how to solve it?
Additional info:
Workers: node count 3, disk size 20 GB, machine type n1-standard-1
Web server: machine type composer-n1-webserver-8 (8 vCPU, 7.6 GB memory)
Configuration overrides:
UPDATE 27.04.2021
I've managed to find the place responsible for killing the web-server
https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1032-L1138
GCP Composer uses the Celery executor underneath, so during the check it tries to read some cache files that have already been removed by the workers?
I've found it! And I'll report the bug to the GCP Composer team.
So if the config webserver.reload_on_plugin_change=True, then the CLI goes into this section:
https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1118-L1138
# if we should check the directory with the plugin,
if self.reload_on_plugin_change:
    # compare the previous and current contents of the directory
    new_state = self._generate_plugin_state()
    # If changed, wait until its content is fully saved.
    if new_state != self._last_plugin_state:
        self.log.debug(
            '[%d / %d] Plugins folder changed. The gunicorn will be restarted the next time the '
            'plugin directory is checked, if there is no change in it.',
            num_ready_workers_running, num_workers_running
        )
        self._restart_on_next_plugin_check = True
        self._last_plugin_state = new_state
    elif self._restart_on_next_plugin_check:
        self.log.debug(
            '[%d / %d] Starts reloading the gunicorn configuration.',
            num_ready_workers_running, num_workers_running
        )
        self._restart_on_next_plugin_check = False
        self._last_refresh_time = time.time()
        self._reload_gunicorn()

def _generate_plugin_state(self):
    """
    Generate dict of filenames and last modification time of all files in settings.PLUGINS_FOLDER
    directory.
    """
    if not settings.PLUGINS_FOLDER:
        return {}
    all_filenames = []
    for (root, _, filenames) in os.walk(settings.PLUGINS_FOLDER):
        all_filenames.extend(os.path.join(root, f) for f in filenames)
    plugin_state = {f: self._get_file_hash(f) for f in sorted(all_filenames)}
    return plugin_state
It generates the list of files to check by calling os.walk(settings.PLUGINS_FOLDER).
At the same time, gcsfuse decides to delete some of these files, and the error happens: the file is not found.
So disabling webserver.reload_on_plugin_change does the trick, but this option is really convenient, so I'll create a bug ticket for Google.

How to execute a command on multiple servers

I have a set of servers (150) to log into and a command (to get disk space). How can I execute this command on each server?
If the script takes 1 minute to get the report for a single server, how can I produce a report for all the servers every 10 minutes?
use strict;
use warnings;
use Net::SSH::Perl;
use Filesys::DiskSpace;

# I have more than 100 servers...
my %hosts = (
    'localhost' => {
        user     => "z",
        password => "qumquat",
    },
    '129.221.63.205' => {
        user     => "z",
        password => "aardvark",
    },
);

# file system /home or /dev/sda5
my $dir = "/home";
my $cmd = "df $dir";

foreach my $host (keys %hosts) {
    my $ssh = Net::SSH::Perl->new($host, port => 22, debug => 1, protocol => '2,1');
    $ssh->login($hosts{$host}{user}, $hosts{$host}{password});
    my ($out) = $ssh->cmd($cmd);
    print "$out\n";
}
It has to print the disk space output for each server.
Is there a reason this needs to be done in Perl? There is an existing tool, dsh, which provides precisely this functionality of using ssh to run a shell command on multiple hosts and report the output from each. It also has the ability, with the -c (concurrent) switch to run the command at the same time on all hosts rather than waiting for each one to complete before going on to the next, which you would need if you want to monitor 150 machines every 10 minutes, but it takes 1 minute to check each host.
To use dsh, first create a file in ~/.dsh/group/ containing a list of your servers. I'll put mine in ~/.dsh/group/test-group with the content:
galera-1
galera-2
galera-3
Then I can run the command
dsh -g test-group -c 'df -h /'
And get back the result:
galera-3: Filesystem Size Used Avail Use% Mounted on
galera-3: /dev/mapper/debian-system 140G 36G 99G 27% /
galera-1: Filesystem Size Used Avail Use% Mounted on
galera-1: /dev/mapper/debian-system 140G 29G 106G 22% /
galera-2: Filesystem Size Used Avail Use% Mounted on
galera-2: /dev/mapper/debian-system 140G 26G 109G 20% /
(They're out-of-order because I used -c, so the command was sent to all three servers at once and the results were printed in the order the responses were received. Without -c, they would appear in the same order the servers are listed in the group file, but then it would wait for each response before connecting to the next server.)
But, really, with the talk of repeating this check every 10 minutes, it sounds like what you really want is a proper monitoring system such as Icinga (a high-performance fork of the better-known Nagios), rather than just a way to run commands remotely on multiple machines (which is what dsh provides). Unfortunately, configuring an Icinga monitoring system is too involved for me to provide an example here, but I can tell you that monitoring disk space is one of the checks that are included and enabled by default when using it.
There is a ready-made tool called Ansible for exactly this purpose. With it you can define your list of servers, group them, and execute commands on all of them.
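For example, a minimal sketch using an ad-hoc command (the group name, host names, and inventory location are made up for illustration):
# inventory file, e.g. /etc/ansible/hosts (hypothetical group and hosts)
[diskcheck]
server01
server02

# run the disk-space command on every host in the group, 20 hosts in parallel
ansible diskcheck -m command -a "df -h /home" -f 20
You could then put that one-liner in a cron entry to run it every 10 minutes.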

Symfony2 unit testing: PDOException: SQLSTATE[00000] [1040] Too many connections

When I run unit tests for my application, the first tests are successful, then at around 100 tests they start to fail due to a PDOException (Too many connections). I have already searched for this problem but was not able to solve it.
My config is as follows:
<phpunit
backupGlobals = "false"
backupStaticAttributes = "false"
colors = "true"
convertErrorsToExceptions = "true"
convertNoticesToExceptions = "true"
convertWarningsToExceptions = "true"
processIsolation = "false"
stopOnFailure = "false"
syntaxCheck = "false"
bootstrap = "bootstrap.php.cache" >
If I change processIsolation to "true", all tests generate an error (E):
Caused by ErrorException: unserialize(): Error at offset 0 of 79 bytes
For that I tried setting "detect_unicode = Off" inside php.ini file.
If I run tests in smaller batches, like with "--group something", all tests are successful.
Can someone help me solve the issue when running all the tests at once? I really want to get rid of the PDOException.
Thanks in advance!
You should increase the maximum number of concurrent connections in your DB server.
If you're using MySQL, edit /etc/mysql/my.cnf and set the max_connections parameter to the number of concurrent connections you need. Then restart the MySQL server.
Keep in mind: In theory, the physical limits are very high. But if your queries cause a high CPU load or memory consumption, your DB server could eat up the resources required for other processes. This means, you could run out of memory, or your system can become overloaded.
For people who are having the same issue, here are more specific steps for configuring the my.cnf file.
If you are sure you are editing the right my.cnf file, put max_connections = 500 (the default is 151) in the [mysqld] section of my.cnf. Don't put it in the [client] section.
To make sure you are editing the right my.cnf, if you have multiple mysqld instances installed (e.g. from Homebrew or XAMPP), find the right mysqld; for XAMPP use /Applications/XAMPP/xamppfiles/sbin/mysqld --verbose --help | grep -A 1 "Default options" and you will get something like this:
Default options are read from the following files in the given order:
/Applications/XAMPP/xamppfiles/etc/xampp/my.cnf /Applications/XAMPP/xamppfiles/etc/my.cnf ~/.my.cnf
Normally it's at /Applications/XAMPP/xamppfiles/etc/my.cnf.

How to debug "could not receive data from client: Connection reset by peer"

I'm running a django-celery application on Ubuntu-12.04.
When I run a celery task from my web interface, I get the following error, taken from the postgresql-9.3 logfile (maximum log level):
2013-11-12 13:57:01 GMT tss_usr 8113 LOG: could not receive data from client: Connection reset by peer
tss_usr is the PostgreSQL user of the Django application database and (in this example) 8113 is, I guess, the pid of the process that killed the connection.
Have you got any idea why this happens, or at least how to debug this issue?
To make things work again I need to restart PostgreSQL, which is extremely inconvenient.
I know this is an older post, but I just found it because I had the same error today in my postgres logs. I narrowed it down to a PDO select statement. I'm using Zend Framework 1.10.3 on Ubuntu Precise.
The following pdo statement generated an error if $opinion is a long text string. The column opinion is type Text in my postgres table. The query succeeds if $opinion is under a certain number of characters. 1000 characters works fine. 2000 characters fails with "could not receive data from client: Connection reset by peer".
$select = $this->db->select()
    ->from( 'datauserstopics' )
    ->where("opinion = ?", trim($opinion))
    ->where("datatopicsid = ?", trim($tid))
    ->where("datausersid = ?", $datausersid);
$stmt = $this->db->query($select);
I circumvented the problem by using:
->where("substr(opinion,1,100) = ?",trim(substr($opinion,1,100)))
This is not a perfect solution, but for my purposes, the select statement using substr() suffices.
Note that I have no problem inserting long strings into the same table/column. The disconnect problem only appears for me on the PDO select with relatively long text strings.
I'm getting it in 2017 with 9.4. I have no text fields and don't know what a PDO is. My select statement is about 50 bytes long; I'm trying to fetch an int4 and a double precision. I suspect the error message can mean multiple things.
I've since found https://dba.stackexchange.com/questions/142350/postgres-could-not-receive-data-from-client-connection-reset-by-peer which indicates it could be a problem with the client configuration. My client is libpq and PQconnectdb() is giving me a CONNECTION_OK return. It works, at least partly.
For me, restarting the hypervisor hosting both Postgres and the application using it helped. I've seen stack traces in dmesg before, though.

How to / Is it possible to monitor remote WMI scripting?

Do you perhaps know a way (via script or program) to find out if, for example, a WMI script runs from a remote PC1 and performs some tasks on another PC2, while I am sitting at a third PC (PC3)?
Assume that all PCs belong to the same network and domain and have Windows XP installed.
The reason for this is that I administer a small network and I think that one student shuts down the PC where another student is working via WMI scripting.
Is there a way to monitor (via script or program) such a thing, without disabling WMI remote access?
Thanks everybody
You can get the credentials used to perform the shutdown by looking at verbose WMI logs.
1) Enable verbose WMI logging
Run 'Wmimgmt.msc' (also available under My Computer > 'Manage' > 'Services and Applications' > 'WMI Control')
Select 'WMI Control (Local)', right click --> select 'Properties'
Select 'Logging' Tab, set 'Logging level' to Verbose
2) Look at the WMI log files (default location: %WINDIR%\system32\wbem\Logs) to see a record of remote access and actions taken. Specifically, look at wbemcore.log
Example: When I logged in remotely I saw the following entry [<domain> and <username> here were the real ones used for the remote connection]:
(Thu Aug 13 <time>) : DCOM connection from <domain>\<username>
at authentiction level Packet, AuthnSvc = 9, AuthzSvc = 1, Capabilities = 0
Then, to execute the WMI method the student would need to GetObject Win32_OperatingSystem, which showed up like this:
(Thu Aug 13 <time>): CALL CWbemNamespace::GetObject
BSTR ObjectPath = win32_operatingsystem
long lFlags = 0
And finally you'd look for executing the Win32Shutdown method, which should log something like this:
(Thu Aug 13 <time>) : CALL CWbemNamespace::ExecMethodAsync
BSTR ObjectPath = Win32_OperatingSystem
BSTR MethodName = Win32Shutdown