This looks like it will be very useful information. Thank you.
@maikel and @gnollet I am stuck on the same issue you solved a few weeks ago: trying to get monit to run the /script/delayed_job process. For some reason, possibly unwisely, I changed the app name to ofn_america throughout the deploy script, but I don’t imagine that should make much difference.
The script generated by ansible via the /roles/common/templates/monit.j2 template looks like this:
check process ofn_america_dj_worker_0
with pidfile /home/ubuntu/apps/ofn_america/current/tmp/pids/delayed_job.0.pid
start program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
stop program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 stop'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
if mem is greater than 250.0 MB for 3 cycles then restart
And the process will start (sending emails) if I run the CLI command directly:
/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'
And /home/ubuntu/apps/myapp/current/tmp/pids/delayed_job.0.pid
exists (until I stop the process).
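One thing worth ruling out is whether the pidfile named in the monit config points at a live process. A minimal sketch, assuming a helper called pid_alive (a hypothetical name, not part of monit or delayed_job):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the pidfile is readable and the PID
# stored in it belongs to a process that currently exists.
pid_alive() {
  local pidfile pid
  pidfile=$1
  [ -r "$pidfile" ] || return 1                 # no readable pidfile
  pid=$(tr -d '[:space:]' < "$pidfile")
  case $pid in
    ''|*[!0-9]*) return 1 ;;                    # empty or not a number
  esac
  # Signal 0 sends nothing; it only checks that the process exists.
  # Run this as the owning user or root, otherwise it can fail on permissions.
  kill -0 "$pid" 2>/dev/null
}

# Usage (path copied from the "with pidfile" line of the monit config):
# pid_alive /home/ubuntu/apps/ofn_america/current/tmp/pids/delayed_job.0.pid && echo running
```

If this reports a running process while monit still says otherwise, the path in the monit config and the path the worker actually writes to are worth comparing character by character.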
-rwxr-xr-x 1 ubuntu ubuntu 175 Nov 6 01:35 delayed_job
I notice you reference a file called delayed_job.sh above, which on my server contains:
#!/usr/bin/env bash
export HOME="/home/ubuntu"
export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"
$HOME/apps/ofn_america/current/script/delayed_job $@
I can also start the process from the CLI within the script dir:
sudo bash delayed_job.sh -i 0 start
And again, tmp/pids/delayed_job.0.pid exists and emails are sent.
I restart monit with sudo service monit restart, which prints:
* Restarting daemon monitor monit
I waited a few minutes and even tried raising the timeout to 240, but it’s not nearly that slow when starting delayed jobs from the CLI.
Any thoughts?
Posted here as well.
Hi!
Monit is a bit difficult to debug. Look into /var/log/monit.log to see if monit is trying to start delayed job. It will only show bash -c as the command to start, but it will probably tell you that starting failed every two minutes. Unfortunately, it won’t give you any output of the failing command.
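To narrow the log down to one worker, something like the following could help. It is only a sketch: monit_log_for is a hypothetical helper name, and it assumes the service name from the check process line appears in the log entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent log lines that mention one
# monit service. Arguments: service name, log file, number of lines.
monit_log_for() {
  local service logfile count
  service=$1
  logfile=${2:-/var/log/monit.log}
  count=${3:-20}
  grep -F -- "$service" "$logfile" | tail -n "$count"
}

# Usage (reading the log may require sudo):
# monit_log_for ofn_america_dj_worker_0
```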
I found it very useful to follow this post about setting the environment as Monit does.
sudo su ubuntu
env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh
/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'
That should tell you what goes wrong.
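The same idea can be wrapped in a small function so the stripped-down environment is easy to reuse. This is a sketch under the assumption that a bare PATH is the main difference from an interactive shell; run_like_monit is a hypothetical name, and Monit's real environment may differ in other details.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command string with an almost empty environment.
# env -i drops every inherited variable; only PATH is set again.
run_like_monit() {
  env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh -c "$1"
}

# Usage (run it as the ubuntu user, as in the steps above):
# run_like_monit "RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start"
# echo "exit status: $?"
```

Any error printed here is output that would not show up in the monit log.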
We used delayed_job.sh in the past. It was called by Monit. But the only thing it does is set the PATH environment variable, so we figured that we could simplify the call stack. Maybe we missed something, so I’m looking forward to your findings.
I don’t know, my friend. It’s working now, with the bash script OR going straight to the ruby script. I wish I knew what was different. I had also found my way to that same (great) SO post recommending Bash with a $PATH variable set.
Thank you for the input, though.
Do you know offhand how Unicorn is supposed to be run? Is monit configured to manage that as well? I had to manually restart it via sudo service unicorn_ofn_america restart.
Also I’d love to get an idea of your server resources. I’m right now with:
2 cores
2048 MB RAM
40 GB disk
2000 GB bandwidth
But have already been pushing the limits with very little usage of the app.
Good that it’s working now.
Unicorn is normally just running. We use the Git post-receive hook to deploy new versions. That script stops and starts unicorn (sudo service unicorn_openfoodnetwork stop). A restart works most times but doesn’t pick up newly installed gems.
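A full stop followed by a start, as described in that post, could be sketched like this. It is not the actual post-receive template: full_restart and the SERVICE_CMD override are assumptions added here so the sequence can be dry-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the service, then start it, instead of "restart".
# SERVICE_CMD is an assumption for dry runs (e.g. SERVICE_CMD=echo);
# by default the function calls "sudo service".
full_restart() {
  # The expansion is left unquoted on purpose so "sudo service" splits
  # into two words.
  ${SERVICE_CMD:-sudo service} "$1" stop || true   # an already stopped service should not block the start
  ${SERVICE_CMD:-sudo service} "$1" start
}

# Usage:
# full_restart unicorn_openfoodnetwork
```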
We upgraded our server a couple of times because of memory issues. Currently, we have 4 cores with 8GB memory. We are still running only two workers, but we hope that we can reduce the memory consumption and run 4 workers on that server. I have no idea about the bandwidth. It’s an AWS server.
Very cool. I hadn’t heard of post-receive before. It looks relatively straightforward, but if yours is in a place where I could reference it, I wouldn’t mind a look. Will also experiment with using sudo service unicorn_ofn_america instead of just sudo service unicorn and see how that works.
Your input is much appreciated as the US development team is a little lonely at the moment.
I documented the deployment via the post-receive hook recently: https://github.com/openfoodfoundation/ofn_deployment/wiki/Deployment-with-Git
But it assumes that you provisioned with the latest ofn_deployment code. We updated the post-receive template in there recently. You could put it on the server manually, but you need to replace all the variables in the template then: https://github.com/openfoodfoundation/ofn_deployment/blob/master/roles/app/templates/post-receive.j2
Very cool, man. I just pulled in the latest changes to the deployment script yesterday, and also updated the Ruby setting to ruby: 2.1.5p273 # was 1.9.3-p392. I think that still needs to be updated in the ofn_deployment example script, although I’m not sure whether we want to specify p273, as I had already installed it manually with rbenv install 2.1.5 (patch level unspecified).
Rohan wrote a playbook for updating Ruby on the server as well. Our Gemfile just specifies 2.1.5. There shouldn’t be a reason to specify the patch level.
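When comparing a version string that carries a patch level against one that does not, the suffix can be stripped first. A small sketch, with strip_patch as a hypothetical helper; it only handles the two forms quoted in this thread (2.1.5p273 and 1.9.3-p392).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce "2.1.5p273" or "1.9.3-p392" to "2.1.5" / "1.9.3".
strip_patch() {
  local v
  v=${1%%p*}               # drop everything from the first "p" onwards
  printf '%s\n' "${v%-}"   # drop the "-" left behind by the "-p392" form
}

# Usage:
# [ "$(strip_patch 2.1.5p273)" = "2.1.5" ] && echo "matches the Gemfile version"
```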
Hi,
I’m trying to upgrade to the latest version using ofn_deployment, and I found some issues.
The seeds files are not set up to use the l10n package:
In main.yml under roles/deploy/task, I changed the config as below:
# TODO: Ugly hack until we have better configuration management
- name: symlink into the repo
  file: src={{ item.src }} dest={{ item.dest }} state=link force=yes owner={{ unicorn_user }}
  with_items:
    - { src: "{{ assets_path }}", dest: "{{ build_path }}/public/assets" }
    - { src: "{{ system_path }}", dest: "{{ build_path }}/public/system" }
    - { src: "{{ spree_path }}", dest: "{{ build_path }}/public/spree" }
    - { src: "{{ config_path }}/database.yml", dest: "{{ build_path }}/config/database.yml" }
    - { src: "{{ config_path }}/application.yml", dest: "{{ build_path }}/config/application.yml" }
    # - { src: "{{ config_path }}/seeds.rb", dest: "{{ build_path }}/db/seeds.rb" }  # I commented this line out
    - { src: "{{ l10n_path }}/seeds.rb", dest: "{{ build_path }}/db/seeds.rb" }
    - { src: "{{ l10n_path }}/suburb_seeds.rb", dest: "{{ build_path }}/db/suburb_seeds.rb" }
    - { src: "{{ l10n_path }}/suburbs.csv", dest: "{{ build_path }}/db/suburbs.csv" }
    - { src: "{{ l10n_path }}/states.yml", dest: "{{ build_path }}/db/default/spree/states.yml" }
    - { src: "{{ l10n_path }}/countries.yml", dest: "{{ build_path }}/db/default/spree/countries.yml" }
  tags: symlink
After fixing the seeds, I got this error at the end of the deployment:
NOTIFIED: [mortik.nginx-rails | restart nginx] ********************************
changed: [127.0.0.1]
NOTIFIED: [webserver | restart unicorn] ***************************************
changed: [127.0.0.1]
NOTIFIED: [webserver | restart unicorn step 2] ********************************
failed: [127.0.0.1] => {"failed": true}
msg: unicorn_openfoodnetwork: unrecognized service
unicorn_openfoodnetwork: unrecognized service
FATAL: all hosts have already failed -- aborting
NOTIFIED: [webserver | restart unicorn step 2] ********************************
FATAL: no hosts matched or all hosts have already failed -- aborting
FATAL: all hosts have already failed -- aborting
NOTIFIED: [webserver | restart unicorn step 2] ********************************
FATAL: no hosts matched or all hosts have already failed -- aborting
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
The failing tasks are in the handler roles/webserver/handlers/main.yml but I don’t know how to fix it.
In production.log (but maybe the deploy process is not completed and it’s not important for the moment?):
Completed 500 Internal Server Error in 57.0ms
** [Bugsnag] No API key configured, couldn’t notify
ActionView::Template::Error (darkswarm/all.css isn’t precompiled):
12:
13: = yield :scripts
14: %script{src: "//maps.googleapis.com/maps/api/js?libraries=places,geometry&sensor=false"}
15: = split_stylesheet_link_tag "darkswarm/all"
16: = javascript_include_tag "darkswarm/all"
17:
18:
app/views/layouts/darkswarm.html.haml:15:in `_94a4bac7f0ff8866b37431d489d93af7’
If you have an idea, you are welcome.
Thanks
What happens if you go to the terminal and run:
sudo /etc/init.d/unicorn_openfoodnetwork restart
?
If I try on terminal :
sudo service unicorn_openfoodnetwork status
[sudo] password for openfoodnetwork:
Usage: /etc/init.d/unicorn_openfoodnetwork <start|stop|restart|upgrade|force-stop|reopen-logs>
I have to enter the password for the user.
In the production.log file, I get the same message I posted previously.
Apologies if I’m misunderstanding the issue or misremembering how it works, but does the unicorn_openfoodnetwork service exist in the /etc/init.d/ directory?
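A quick way to answer that from a shell is to test for an executable init script. This is a sketch only: has_init_script is a hypothetical helper, and it assumes the service is a classic init.d script rather than an Upstart or systemd unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an executable init script with the given
# name exists. The directory argument is optional and defaults to /etc/init.d.
has_init_script() {
  local name dir
  name=$1
  dir=${2:-/etc/init.d}
  [ -x "$dir/$name" ]
}

# Usage:
# has_init_script unicorn_openfoodnetwork || echo "no executable init script found"
```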
Yes, the service exists.
I think I found the issue: in vars.yml, there are 2 variables for the user:
user: openfoodnetwork
# User name for the unprivileged user which runs unicorn
unicorn_user: openfoodnetwork
user is used by the playbook user.yml
unicorn_user is used by deploy.yml
I defined the same user for both variables and now that step of the playbook passes!
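A mismatch between the two variables can be caught before running the playbook. The sketch below assumes both are simple top-level "key: value" lines in vars.yml; yaml_value and same_deploy_user are hypothetical helpers, and this is plain text matching, not a real YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first top-level "key: value"
# line for the given key in the given file.
yaml_value() {
  awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: succeeds only if user and unicorn_user are both set
# and identical in the given vars file.
same_deploy_user() {
  local u w
  u=$(yaml_value user "$1")
  w=$(yaml_value unicorn_user "$1")
  [ -n "$u" ] && [ "$u" = "$w" ]
}

# Usage:
# same_deploy_user vars.yml || echo "user and unicorn_user differ"
```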
Now, the playbook is failing on seeds :
TASK: [deploy | seed database] ************************************************
failed: [127.0.0.1] => {"changed": true, "cmd": ["bash", "-lc", "/home/openfoodnetwork/apps/openfoodnetwork/shared/config/seed.sh RAILS_ENV=production"], "delta": "0:00:27.616035", "end": "2015-11-18 21:45:05.331926", "rc": 1, "start": "2015-11-18 21:44:37.715891", "warnings": []}
stderr: Digest::Digest is deprecated; use Digest
rake aborted!
Called id for nil, which would mistakenly be 8 -- if you really wanted the id of nil, use object_id
/home/openfoodnetwork/apps/openfoodnetwork/shared/l10n/suburb_seeds.rb:3:in `seed_suburbs'
/home/openfoodnetwork/apps/openfoodnetwork/current/db/seeds.rb:127:in `<top (required)>'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/activesupport-3.2.21/lib/active_support/dependencies.rb:245:in `load'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/activesupport-3.2.21/lib/active_support/dependencies.rb:245:in `block in load'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/activesupport-3.2.21/lib/active_support/dependencies.rb:236:in `load_dependency'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/activesupport-3.2.21/lib/active_support/dependencies.rb:245:in `load'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/railties-3.2.21/lib/rails/engine.rb:525:in `load_seed'
/home/openfoodnetwork/.gem/ruby/2.1.0/gems/activerecord-3.2.21/lib/active_record/railties/databases.rake:347:in `block (2 levels) in <top (required)>'
Tasks: TOP => db:seed
(See full trace by running task with --trace)
FATAL: all hosts have already failed -- aborting
I uncommented a line in this part:
- name: seed database
  # We run a shell script that passes the default email and password to rake with an EOF block, so we don’t hang on the prompts.
  command: bash -lc "{{ config_path }}/seed.sh RAILS_ENV={{ rails_env }}" chdir="{{ current_path }}"
  when: table_exists.stderr.find('does not exist') != -1   # <-- the line I uncommented
  tags: seed
  notify:
    - precompile assets
    - restart unicorn
I activated the “when” condition and now it’s OK. The website is running.
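The condition seeds only when the stderr of an earlier check contains "does not exist". That table_exists is registered by an earlier task is an assumption, since that task is not shown in the post. The same test, sketched in shell with a hypothetical needs_seed helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (seeding is needed) only if the given error
# text contains "does not exist", mirroring stderr.find(...) != -1.
needs_seed() {
  case $1 in
    *"does not exist"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, with check_stderr holding the stderr captured from the table check:
# if needs_seed "$check_stderr"; then echo "run the seed script"; else echo "skip seeding"; fi
```

With the condition active, a database that already has its tables is left alone, which would explain why the failing seed step no longer runs.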
Great news. Thank you for sharing. Will revisit this. I remember having some issues with those tasks at one point on the AWS Micro instance and might have ended up replacing with some ansible “raw” commands. Am using a slightly more robust (nonAWS) VPS at the moment.
I’m running the website on Azure VM.
The delayed_job.sh is not set up correctly.
I modified the paths but I still can’t send email.
If I try to launch it manually, I get this error message:
./delayed_job.sh
Digest::Digest is deprecated; use Digest
(eval):1: warning: encountered \r in middle of line, treated as a mere space
ERROR: no command given
Usage: delayed_job <command> <options> -- <application options>

* where <command> is one of:
  start         start an instance of the application
  stop          stop all instances of the application
  restart       stop all instances and restart them afterwards
  reload        send a SIGHUP to all instances of the application
  run           start the application and stay on top
  zap           set the application to a stopped state
  status        show status (PID) of application instances

* and where <options> may contain several of the following:
    -t, --ontop                      Stay on top (does not daemonize)
    -f, --force                      Force operation
    -n, --no_wait                    Do not wait for processes to stop

Common options:
    -h, --help                       Show this message
        --version                    Show version
I don’t know what more to check.
Thanks
Are you running it with flags:
./delayed_job.sh -i 0 start
or
sudo bash delayed_job.sh -i 0 start
I imagine you get the same (Ruby?) error when you run:
sudo /bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/openfoodnetwork/current/script/delayed_job -i 0 start'
possibly with RAILS_ENV=production?
Also you have probably seen already that the canada-updates branch of the ofn_deployment repo has some significant improvements.
Also, the warning about the use of Digest::Digest is likely coming from one of the gazillion gems that are used in the app.
Running the command gives me this result:
./delayed_job.sh -i 0 start
Digest::Digest is deprecated; use Digest
(eval):1: warning: encountered \r in middle of line, treated as a mere space
delayed_job.0: process with pid 41919 started.
In delayed_job.log, I can see these messages:
2015-11-20T07:13:13+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:41577)] Job Enterprise#send_confirmation_instructions_without_delay (id=4) RUNNING
2015-11-20T07:13:14+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:41577)] Job Enterprise#send_confirmation_instructions_without_delay (id=4) COMPLETED after 0.2093
2015-11-20T07:13:14+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:41577)] 1 jobs processed at 4.0954 j/s, 0 failed
2015-11-20T07:14:53+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:42165)] Job Enterprise#send_confirmation_instructions_without_delay (id=5) RUNNING
2015-11-20T07:14:55+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:42165)] Job Enterprise#send_confirmation_instructions_without_delay (id=5) COMPLETED after 2.4779
2015-11-20T07:14:55+0000: [Worker(delayed_job.0 host:vmaztstfrofn3 pid:42165)] 1 jobs processed at 0.3847 j/s, 0 failed
So I think delayed_job is running, but I don’t receive the confirmation email.
In development.log, I see this entry:
Delayed::Backend::ActiveRecord::Job Load (2.6ms) UPDATE "delayed_jobs" SET locked_at = '2015-11-20 07:16:56.001319', locked_by = 'delayed_job.0 host:vmaztstfrofn3 pid:42165' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2015-11-20 07:16:56.000093' AND (locked_at IS NULL OR locked_at < '2015-11-20 07:01:56.000144') OR locked_by = 'delayed_job.0 host:vmaztstfrofn3 pid:42165') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
I will try the canada-updates branch tonight.
Thanks
I wish my input was more informed.
I have a question that you may be able to offer some insight on. Some of the images in my installation are not showing up. For instance, on the main admin page there is a broken link in the top left pointing to an image called “missing.png”. I poked around the code a little bit (I’m still somewhat vague on the Rails workflow) and it looks like this image gets pointed to when something’s missing. But so far I haven’t figured out where or how this missing image is supposed to be set. Is there a command line task or a Spree configuration that I’m missing here?
Thanks and good luck!