Solr/Jetty confusion - how to get persistent service? - django

I'm on Ubuntu 12.04, using jetty (9_M4), solr (4.0.0) through django-haystack (2.0beta) installed in a django 1.4.2 site.
I've had to make a number of jumps through hoops to get this up and running, as there is very little documentation for getting solr 4.0 up and running in Ubuntu with django-haystack. But how hard could it be?
My main confusion is between what Jetty is doing, and what Solr is doing.
So, I installed Jetty via this tutorial making a small adjustment to the init file as I note in the comment on that tutorial. Jetty is now running, I can see it in browser, even after a reboot.
Great.
Move onto installing Solr via this tutorial again with adjustments. Instead of:
cp -R apache-solr-4.0.0/example/solr /opt
I use:
cp -R apache-solr-4.0.0/example/* /opt/solr/
and therefore add the following to /etc/default/jetty:
JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"
I can't exactly remember why I did that, but there was a reason at the time. I stop using that tutorial at that point, as I don't understand the solr concept of core very well, and I'm already flustered at how annoyingly difficult this is.
(For context, when I set up django-haystack 2.0 with solr 3.5 about 6 months ago it was terrifyingly easy and didn't require a separate jetty installation - all up took me about two hours)
Anyway, I go back to my Django installation, create the schema.xml, make the stopwords-en.txt changes, copy it across to /opt/solr/solr/collection1/conf.
I edit /opt/solr/solr/collection1/conf/solrconfig.xml to remove the reference to updateLog since any attempt I made to add version field to schema.xml failed dismally with some sort of character error. See here (lucene -solr-user mailing list) and here (django-haystack github) for more info on this.
Finally, I cd into /opt/solr and run it:
sudo java -jar start.jar
Ba-da-boom! I get some results (when I go to my django site and use the search I've set up). Fantastic. This is really great. Now I just need to make the starting of solr persistent.
I create an /etc/init/solr that looks like this:
description "Solr Search Server"
# Make sure the file system and network devices have started before
# we begin the daemon
start on (filesystem and net-device-up IFACE!=lo)
# Stop the event daemon on system shutdown
stop on shutdown
# Respawn the process on unexpected termination
respawn
# The meat and potatoes
exec /usr/bin/java -jar /opt/solr/start.jar >> /var/log/solr.log 2>&1
I restart the server and nothing - I can see solr running, but I'm not getting any results in my django search.
I remove the init file and try running from the cli again - yep, sweet.
So, my questions are:
What the hell have I done wrong?
How do I get solr to start at boot and respawn if it dies accidentally AND produce results through my Django/haystack interface
Why do I need jetty and solr running simultaneously, and what is the relationship of /opt/jetty/webapps/solr.war to my /opt/solr? Am I creating causing conflicts?
Why was this so easy with solr 3.5 and so difficult now? I ask this honestly - I don't want a list of excuses or explanations from solr developers - I want to know how my understanding can be so limited in the first instance (solr 3.5) and get it running in two hours and why I now need to have a comprehensively deeper understanding of jetty/solr architecture and cli/shell script hacking to get it to run?

I am not promising to get all your things, but (numbers do not match questions):
1) Jetty is a web-server. Solr runs as a (web) application inside that web server, however:
2) Jetty can also run an embedded webserver, which is how Solr download works. When you do java -jar start.jar that runs Jetty with everything preconfigured. In which case you do not need a standalone Jetty. I suggest start with embedded Jetty, then switch to external one. However, if only your local app talks to local Solr server, you may be able to get quite far without needing full Jetty.
3) You don't need all the stuff you find in example directory - it has multiple confugurations and support files and is somewhat nested (which is confusing)
4) To start you need two things: Running solr; your configuration directory
5) The easiest way to get Solr running is to put the whole distrubution directory (I know - large) somewhere (e.g. /opt/solr).
6) Your configuration directory is very simple. All you need is two files to start, three if you are picky about names:
- (wherever, but make sure Solr can read/write there)
-- solr.xml (if you are picking about collection name, otherwise you can skip it)
-- collection1/ (that's default name, you can change that in solr.xml)
-- collection1/conf/ (this is configuration directory, Solr will add data directory on the same level once you start right)
schema.xml
-- collection1/conf/shema.xml
-- collection1/conf/solrconfig.xml
7) Then, you need to be in the example directory and run java -Dsolr.solr.home= start.jar . This will get all the pieces up and running on port :8983 . Solr 4 has a pretty new admin interface, so visit it with your browser, maybe do the tutorial, etc.
If you need help with minimal functioning schema/solrconfig files, ask separately, but you cannot just use ones from the example directory as it has all the other file references in the fieldType analysers (though you could just comment those lines out).

Related

Django Celery with Redis Issues on Digital Ocean App Platform

After quite a bit of trial and error and a step by step attempt to find solutions I thought I share the problems here and answer them myself according to what I've found. There is not too much documentation on this anywhere except small bits and pieces and this will hopefully help others in the future.
Please note that this is specific to Django, Celery, Redis and the Digital Ocean App Platform.
This is mostly about the below errors and further resulting implications:
OSError: [Errno 38] Function not implemented
and
Cannot connect to redis://......
The first error happens when you try run the celery command celery -A your_app worker --beat -l info
or similar on the App Platform. It appears that this is currently not supported on digital ocean. The second error occurs when you make a number of potential mistakes.
PART 1:
While Digital Ocean might remedy this in the future here is an approach that will offer a workaround. The problem is the not supported execution pool. Google "celery execution pools" if you want to know more and how they work. The default one is prefork. But what you need is either gevent or eventlet. I went with the former for my purposes.
Whichever you pick you will have to install it as it doesn't come with celery by default. In my case it was: pip install gevent (and don't forget adding it to your requirements as well).
Once you have that you can re-run the celery command but note that gevent and beat are not supported within a single command (will result in an error). Instead do the following:
celery -A your_app worker --pool=gevent -l info
and then separately (if you want to run beat that is) in another terminal/console
celery -A your_app beat -l info
In the first line you can also specify the concurrency like so: --concurrency=100. This is not required but useful. Read up on it what it does as that goes beyond the solution here.
PART 2:
In my specific case I've tested the above locally (development) first to make sure they work. The next issue was getting this into production. I use Redis as the db/broker.
In my specific setup I have most of my celery configuration in the_main_app/celery/__init__.py file but sometimes people put it directly into the_main_app/celery.py. Whichever it is you do make sure that the REDIS_URL is set correctly. For development it usually looks something like this:
YOUR_VAR_NAME = os.environ.get('REDIS_URL', 'redis://localhost:6379') where YOUR_VAR_NAME is then set to the broker with everything as below:
YOUR_VAR_NAME = os.environ.get('REDIS_URL', 'redis://localhost:6379')
app = Celery('the_main_app')
app.conf.broker_url = YOUR_VAR_NAME
The remaining settings are all documented on the "celery first steps with django" help page but are not relevant for what I am showing here.
PART 3:
When you setup your Redis Database on the App Platform (which is very simple) you will see the connection details as 'public network' and 'VPC network'.
The celery documentation says to use the following URL format for production: redis://:password#hostname:port/db_number. This didn't work. If you are not using a yaml file then you can simply copy paste the entire connection string (select from the dropdown!) from the Redis DB connection details and then setup an App-Level environment variable in your Digital Ocean project named REDIS_URL and paste in that entire string (and also encrypt it!).
The string should look like something like this (redis with 2 s!)
rediss://USER:PASS#URL.db.ondigitialocean.com:PORT.
You are almost done. The last step is to setup the workers. It was fine for me to run the PART 1 commands as console commands on the App Platform to test them but eventually I've setup a small worker (+ Add Component) for each line pasted them into the Run Command.
That is basically the process step by step. Good luck!

Methods to automate ColdFusion Administrator settings

When working with a ColdFusion server you can access the CFIDE/administrator to set config values, which update the cfusion/lib/ xml files (e.g. neo-runtime.xml, neo-mail.xml, etc.)
I'd like to automate a deployment process that includes setting these administrator values so that I don't have to log in and manually set them for each new box that shares settings. I'm unsure of the best way to go about it.
Some thoughts I had are:
Replacing the full files with ones containing my custom settings. I've done this for local development, but it may not be an ideal method due to CF hot-fixes potentially adding/removing/changing attributes.
A script to read the wddx xml file and replace the attribute values. I'm having trouble finding information about how to do this method.
Has anyone done anything like this before? Or does anyone have any recommendations on how to best go about this?
At one company, we checked all the neo-*.xml files into source control, with a set for each environment Devs only had access to the dev settings and we could deploy a local development environment with all the correct settings for new employees quickly.
but it may not be an ideal method due to CF hot-fixes potentially adding/removing/changing attributes.
You have to keep up with those changes and migrate each environment appropriately.
While I was there, we upgraded from 8 to 9, 9 to 11 and from 11 to 2016. Environments would have to be mixed as it took time to verify the applications worked with each new version of CF. Each server got their correct XML files for that environment and scripts would copy updates as needed. We had something like 55 servers in production running 8 instances each, so this scaled well.
There is a very usefull tool developed by Ortus Solutions for this kind of automatizations called cfconfig that can be installed with their commandbox command line utility. This tool isn't only capable of setting configurations of the administrator: It is also capable of exporting/importing settings to a json file (cfconfig.json). It might be what you need.
Here is the link to their docs
https://cfconfig.ortusbooks.com/introduction/getting-started-guide
CFConfig worked perfectly for my needs. I marked #AndreasRu answer as accepted for introducing me to that tool! I'm just adding this response with some additional detail for posterity.
Install CommandBox as part of deployment script
Install CFConfig as part of deployment script
Use CFConfig to export a config.json file from an existing box that will share settings with the new deployment. Store this json file in source control for each type/env of box.
Use CFConfig to import the config.json as part of deployment script
Here's a simple example of what this looks like on debian
# Installs CommandBox
curl -fsSl https://downloads.ortussolutions.com/debs/gpg | apt-key add -
echo "deb https://downloads.ortussolutions.com/debs/noarch /" | tee -a /etc/apt/sources.list.d/commandbox.list
apt-get update && apt-get install apt-transport-https commandbox
# Installs CFConfig module
box install commandbox-cfconfig
# Import config settings
box cfconfig import from=/<path-to-config>/config.json to=/opt/ColdFusion/cfusion/ toFormat=adobe#11.0.19

Non-starting rails 5.2 app - ActiveSupport::MessageEncryptor::InvalidMessage

I have deployed two rails apps to Digital Ocean, Ubuntu 18.04 with Passenger and Nginx.
Both apps were built on rails 5.2.2 with ruby 2.5.1, and the second app has all the same gems at the same versions. While the first app runs fine, the second will not launch.
The last useful line of the Passenger log says:
[ E 2020-08-06 22:41:56.6186 30885/T1i age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/html/AppName_Prod/current: The application encountered the following error: ActiveSupport::MessageEncryptor::InvalidMessage (ActiveSupport::MessageEncryptor::InvalidMessage)
I know this is somethign to do with the master.key file, but that is present and contains the correct key. I'm not using environment vars to store the master keys - they are in the master.key file inside each app's dir structure.
I've read every SO post I could find on this and none have solved my issue.
Any suggestions for getting these two apps (and more) to work on the same droplet?
I'm all out of ideas.
Thank you for any help you can offer.
For anyone who might have the same issue, it was a bit deceptive.
I had tried rails credentials:edit and it didn't fix the issue, but I found that the app's containing folder was owned by user:user, whereas my other app was owned by user:root.
When I changed this, everything started to work.
I hope it helps someone because I didn't find this info anywhere online and it was a lot of trial and error.
Use ls -l to list the current owner of folders in the current working directory, so you can compare them.
For me, this turned out to be somewhat complicated. I had provisioned my server using Ansible, which has a task to copy the Nginx conf. After provisioning the server, I changed RAILS_MASTER_KEY.
It turns out that my Ansible task does not re-write the Nginx conf if it already exists on the server (the file is not compared, I guess). So although I updated RAILS_MASTER_KEY in my Ansible playbook (and it was even getting copied across to the server's environment variables!), it was not accessible to Rails through passenger because it does not pass on the user's environment variables.
Whew!
To fix this (and create a snowflake server in the process...) I manually logged into the server and updated RAILS_MASTER_KEY to my new value in the Nginx passenger_env_var.

Using Selenium WebDriverRemote to start Chrome with command line arguments

I need to capture the Chrome console-debug.log file.
e.g.
chrome.exe --enable-logging
This dumps a console-debug.log file in the UserData folder. I tested this manually and it works.
I can't see to use Selenium to load Chrome remotely with arguments.
I have checked out the docs here and tried to follow
https://sites.google.com/a/chromium.org/chromedriver/capabilities
Now I have two machines, one is a linux machine that hosts my python test code and another is a Windows VM that I run Chrome on for testing against different urls.
The Windows VM has Selenium standalone server 2.45.0 and its' start-up script points to the location of chromedriver.exe (version 2.23.409699). It also starts IEDriverServer but we don't use that right now. I start it using a batch file in the Startup folder that has the line.
java -jar C:\Users\Remote\Desktop\selenium\selenium-server-standalone-2.45.0.jar -Dwebdriver.chrome.driver=C:\Users\Remote\Desktop\selenium\webdrivers\chromedriver.exe -Dwebdriver.ie.driver=C:\Users\Remote\Desktop\selenium\webdrivers\IEDriverServer.exe
On the linux machine I am using a Python implementation, and we use webdriver.Remote to connect to Selenium standalone server. For debug purposes I'm running PyCharm so I can step through the code.
The code I use to setup the connection is:
options = webdriver.ChromeOptions()
options.add_argument("--enable-logging")
caps = options.to_capabilities()
self.selenium = webdriver.Remote(
command_executor='http://%s:4444/wd/hub' % browser_host_ip_address,
desired_capabilities=caps,
proxy=proxy)
If I step through with PyCharm and monitor the caps object just before going in to the webdriver.Remote code I get an object that looks like this:
{'chromeOptions': {'args': ['--enable-logging'], 'extensions': []}, 'javascriptEnabled': True, 'platform': 'ANY', 'browserName': 'chrome', 'version': ''}
This chromeOptions object makes it all the way to the Selenium standalone server and that then outputs to the server console:
10:59:41.109 INFO - Done: [new session: Capabilities [{browserName=chrome, javascriptEnabled=true, chromeOptions={args=[--enable-logging], extensions=[]}, version=, platform=ANY}]]
This then doesn't seem to make it through to chromedriver. I created a simple C program that outputs any arguments passed to it and replaced chromedriver with this and the only arguments passed to it were the --port=xxxxx where x is a number.
So chromeOptions makes it all the way through but fails at the Selenium server. I imagine I am doing something silly but for the life of me I can't work out what.
I have tried the latest version of Selenium standalone server 3.0 beta 2 and 2.53.1, and also 2.39.0 in case something was brought in to stop it. I have also tried multiple chromedriver versions.
I am assuming my original premise that I can use chromeOptions in this way is wrong, but I'm not sure.
I need to use 2 different machines so I assume I would need a webdriver.Remote object as webdriver.Chrome doesn't support remote connections. Is that true?
Any ideas or suggestions would be appreciated?
Adam

appcfg.py not working in command line

I'm just having a bit of trouble understanding why this command:
>appcfg.py -A adept-box-109804 update app.yaml
as given by the Try Google App Engine Now page does not work. I have downloaded the App Engine SDK for Python, and have Path set up to point to the location of appcfg.py, but running appcfg.py in my projects root directory does not work in the command line. I either have to navigate to the folder containing appcfg.py and do
>python appcfg.py help
or do
>python "C:\Program Files (x86)\Google\google_appengine\appcfg.py" help
to get a command to work from anywhere. I used the latter method to deploy my test app, but was just wondering if someone could explain why the command as given by the simple Google tutorial did not do anything. I also checked to make sure that .py files are automatically opened with the Python 2.7 interpreter, such that a file hello.py will be executed in the command line by simply typing
>hello.py
and it will output its print statement. On the other hand, using appcfg.py in a similar manner gives the same output no matter the arguments (please note I truncated the output, but rest assured that they are identical no matter the arguments:
C:\>appcfg.py help backends
Usage: appcfg.py [options] <action>
Action must be one of:
backends: Perform a backend action.
backends configure: Reconfigure a backend without stopping it.
backends delete: Delete a backend.
backends list: List all backends configured for the app.
backends rollback: Roll back an update of a backend.
backends start: Start a backend.
backends stop: Stop a backend.
backends update: Update one or more backends.
create_bulkloader_config: Create a bulkloader.yaml from a running application.
cron_info: Display information about cron jobs.
delete_version: Delete the specified version for an app.
download_app: Download a previously-uploaded app.
download_data: Download entities from datastore.
help: Print help for a specific action.
list_versions: List all uploaded versions for an app.
request_logs: Write request logs in Apache common log format.
resource_limits_info: Get the resource limits.
rollback: Rollback an in-progress update.
set_default_version: Set the default (serving) version.
start_module_version: Start a module version.
stop_module_version: Stop a module version.
update: Create or update an app version.
update_cron: Update application cron definitions.
update_dispatch: Update application dispatch definitions.
update_dos: Update application dos definitions.
update_indexes: Update application indexes.
update_queues: Update application task queue definitions.
upload_data: Upload data records to datastore.
vacuum_indexes: Delete unused indexes from application.
Use 'help <action>' for a detailed description.
C:\>appcfg.py help update
Usage: appcfg.py [options] <action>
Action must be one of:
backends: Perform a backend action.
backends configure: Reconfigure a backend without stopping it.
backends delete: Delete a backend.
backends list: List all backends configured for the app.
backends rollback: Roll back an update of a backend.
backends start: Start a backend.
backends stop: Stop a backend.
backends update: Update one or more backends.
create_bulkloader_config: Create a bulkloader.yaml from a running application.
cron_info: Display information about cron jobs.
delete_version: Delete the specified version for an app.
download_app: Download a previously-uploaded app.
download_data: Download entities from datastore.
help: Print help for a specific action.
list_versions: List all uploaded versions for an app.
request_logs: Write request logs in Apache common log format.
resource_limits_info: Get the resource limits.
rollback: Rollback an in-progress update.
set_default_version: Set the default (serving) version.
start_module_version: Start a module version.
stop_module_version: Stop a module version.
update: Create or update an app version.
update_cron: Update application cron definitions.
update_dispatch: Update application dispatch definitions.
update_dos: Update application dos definitions.
update_indexes: Update application indexes.
update_queues: Update application task queue definitions.
upload_data: Upload data records to datastore.
vacuum_indexes: Delete unused indexes from application.
Use 'help <action>' for a detailed description.
I finally tracked down the real reason, and it wasn't a bug with the AppEngine SDK. Rather it was with my Python interpreter, as I noticed it wasn't accepting arguments for any .py files. It turned out to be a registry error, located at [HKEY_CLASSES_ROOT\Applications\python.exe\shell\open\command] where I had to change the value from "C:\Python27\python.exe" "%1" to "C:\Python27\python.exe" "%1" %*
How this happened, whether it be the Python 2.7 installer, or maybe the AppEngine SDK, I'm not sure though.
Your confusion probably stems from mixing up 2 possible invocations styles:
python appcfg.py ...
appcfg.py ...
The 1st one can't make use of the fact that the location of the appcfg.py is in the path, it is just an argument to the python executable, which can not locate the appcfg.py file unless either:
it finds it in the current directory
the appcfg.py file is specified using a full path or a path relative to the current working directory from which python is invoked
This is the reason for which your 2nd and 3rd commands don't work as you'd expect. Using the 2nd invocation style instead should work if the location of the appcfg.py is in the path - just as your last command invocation does.
Key point to remember: the path configuration applies to the command executable only, not to its arguments (which BTW each executable may process as it wishes, some executables may combine arguments with the path configuration to obtain location of files).
Similarly appcfg.py itself (once successfully invoked using either of the 2 invocation styles) needs to be able to locate your app.yaml file specified as argument. It cannot do so unless either:
it finds it in the current directory
the app.yaml file (or its directory) is specified using a full path or a path relative to the current working directory from which appcfg.py is invoked
I suspect appcfg.py's inability to locate your app.yaml file may be the reason for which the 1st command you mentioned didn't work. If not you should provide details about the failure.
Regarding why the output of your last command is identical regardless of the arguments, I'm not sure, it could be a bug in the windows version of the SDK. In linux the output is different:
> appcfg.py help backends
Usage: appcfg.py [options] backends <directory> <action>
Perform a backend action.
The 'backends' command will perform a backends action.
Options:
-h, --help Show the help message and exit.
-q, --quiet Print errors only.
-v, --verbose Print info level logs.
--noisy Print all logs.
-s SERVER, --server=SERVER
The App Engine server.
-e EMAIL, --email=EMAIL
The username to use. Will prompt if omitted.
-H HOST, --host=HOST Overrides the Host header sent with all RPCs.
--no_cookies Do not save authentication cookies to local disk.
--skip_sdk_update_check
Do not check for SDK updates.
-A APP_ID, --application=APP_ID
Set the application, overriding the application value
from app.yaml file.
-M MODULE, --module=MODULE
Set the module, overriding the module value from
app.yaml.
-V VERSION, --version=VERSION
Set the (major) version, overriding the version value
from app.yaml file.
-r RUNTIME, --runtime=RUNTIME
Override runtime from app.yaml file.
-E NAME:VALUE, --env_variable=NAME:VALUE
Set an environment variable, potentially overriding an
env_variable value from app.yaml file (flag may be
repeated to set multiple variables).
-R, --allow_any_runtime
Do not validate the runtime in app.yaml
--oauth2 Ignored (OAuth2 is the default).
--oauth2_refresh_token=OAUTH2_REFRESH_TOKEN
An existing OAuth2 refresh token to use. Will not
attempt interactive OAuth approval.
--oauth2_access_token=OAUTH2_ACCESS_TOKEN
An existing OAuth2 access token to use. Will not
attempt interactive OAuth approval.
--authenticate_service_account
Authenticate using the default service account for the
Google Compute Engine VM in which appcfg is being
called
--noauth_local_webserver
Do not run a local web server to handle redirects
during OAuth authorization.
I had this problem, and is deepened in local variable python version that different from app engine python version.
So the solution is just to add before the script the current python version location:
C:\Python27\python.exe "C:\Program Files (x86)\Google\google_appengine\appcfg.py"
And it just return to work well.