Need help to fix ERR 500

I don’t think I’m in the same situation as the UK server.
I’m running a Production environment.
Below the setup of mail method :

I can then send a test email, and I receive it.
But when I create an enterprise, I don’t receive the email, so I can’t validate the contact information and can’t complete the enterprise registration.

The log file I linked to is no longer available because I reinstalled the OFN application with the ansible playbook to try the production environment.

Great, sounds like moving to production environment is a step in the right direction. @gnollet, could you send me the current log files (production.log and delayed_job.log) so I can have a hunt for causes of the problem sending emails?

@RohanM, I have published the logs here: https://testfrofn.cloudapp.net/testfrofn2.log.tgz

Hi @gnollet, @marito59,

It looks like this is a tricky problem to track down from a distance. Would you be open to giving @maikel or myself SSH access to the server so we can have a go at tracking down and fixing the problem? I reckon that might be a more efficient way to proceed, and we could write up whatever we found so that the problem and solution are documented for future users. I’ll be away for the next week (until the 29th of October), but Maikel will be available for that time.

I would also be interested to be advised of the cause of the issue and its resolution.

I’m not sure if this is related to your problems at all, but I will document the solution to one problem on the UK staging server. First, to understand the whole architecture of emailing:

  • A mail method is configured in Spree (prod/stage dependent).
  • When a mail should be sent, that job is stored in the database.
  • A separate process (delayed_job) picks up jobs from the database and sends the email.
  • Delayed_job is started by a program called monit.
  • Monit has a system configuration file for delayed_job and checks every two minutes if delayed_job is running, restarting it if needed.
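One quick way to check the delayed_job step of that chain is to test whether its pidfile points at a live process. This is only a minimal sketch: pidfile_alive is a made-up helper name, not part of OFN or delayed_job, and it should be run as the user that owns the worker (or as root), because the check fails if you lack permission to signal the process.

```shell
#!/usr/bin/env bash
# Succeed if the process recorded in a pidfile is still alive.
# Usage: pidfile_alive /path/to/delayed_job.0.pid
pidfile_alive() {
  local pidfile="$1"
  local pid
  # No readable pidfile: the worker never started or was stopped.
  [ -r "$pidfile" ] || return 1
  pid="$(tr -d '[:space:]' < "$pidfile")"
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
  esac
  # Signal 0 sends nothing; it only tests that the process can be signalled.
  kill -0 "$pid" 2>/dev/null
}
```

For example, assuming the default app path: pidfile_alive /home/ubuntu/apps/openfoodnetwork/current/tmp/pids/delayed_job.0.pid && echo "worker running" || echo "worker not running".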

As you can see, there is a lot that can go wrong. For the UK, I checked each step individually to see what was working and what was not. One problem there was that monit was configured but could not start delayed_job. There should be a file /etc/monit/conf.d/openfoodnetwork containing this:

check process openfoodnetwork_dj_worker_0
with pidfile /home/ubuntu/apps/openfoodnetwork/current/tmp/pids/delayed_job.0.pid
start program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/delayed_job.sh -i 0 start'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
stop program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/delayed_job.sh -i 0 stop'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
if mem is greater than 250.0 MB for 3 cycles then restart

It references /home/ubuntu/delayed_job.sh, which I added manually. The original configuration pointed to delayed_job.sh in the repository’s script folder. Unfortunately, that script contains paths specific to the Australian servers, so I needed to copy it and edit the paths. Then restart monit as root (careful: running service monit restart as a normal user doesn’t give a useful error message, but doesn’t reload the system configuration either).

I’m not sure if that is solved in the latest version of the deploy scripts, but it definitely needs attention. Otherwise email just won’t work.

@MikeiLL, you fixed the issue! :smile:
The delayed_job.sh script is set up with the wrong paths; as you mention, the ansible paths are not used in this script.
I changed the paths in the delayed_job.sh script and started it manually, and I now receive the emails I was waiting for.

Thanks a lot

This looks like it will be very useful information. Thank you.

@maikel and @gnollet I have hit the same issue you solved a few weeks ago, trying to get monit to run the /script/delayed_job process. For some reason, possibly unwisely, I changed the app name to ofn_america throughout the deploy script, but I don’t imagine that should make much difference.

The script generated by ansible via the /roles/common/templates/monit.j2 template looks like this:

check process ofn_america_dj_worker_0
with pidfile /home/ubuntu/apps/ofn_america/current/tmp/pids/delayed_job.0.pid
start program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
stop program = "/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 stop'"
as uid ubuntu and gid ubuntu
with timeout 120 seconds
if mem is greater than 250.0 MB for 3 cycles then restart

And the process will start (sending emails) if I run the CLI command directly:

/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'

And /home/ubuntu/apps/myapp/current/tmp/pids/delayed_job.0.pid exists (until I stop the process).

-rwxr-xr-x 1 ubuntu ubuntu  175 Nov  6 01:35 delayed_job

I notice you reference a file called delayed_job.sh above which in my server contains:

#!/usr/bin/env bash

export HOME="/home/ubuntu"
export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"

$HOME/apps/ofn_america/current/script/delayed_job $@

I can also start the process from the CLI, from within the script dir:

sudo bash delayed_job.sh -i 0 start

And again, tmp/pids/delayed_job.0.pid exists and emails are sent.

I restarted monit with sudo service monit restart, which printed: * Restarting daemon monitor monit.

I waited a few minutes and even tried raising the timeout to 240, but it’s not nearly that slow when starting delayed jobs from the CLI.

Any thoughts?

Posted here as well.

Hi!

Monit is a bit difficult to debug. Look into /var/log/monit.log to see if monit is trying to start delayed job. It will only show bash -c as command to start. But probably it will tell you that starting failed every two minutes. Unfortunately, it won’t give you any output of the failing command.
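To pull out just the log lines for one check, something like this could help. It is a sketch rather than anything shipped with OFN: recent_monit_lines is a hypothetical helper, and the default log path is simply the /var/log/monit.log mentioned above.

```shell
#!/usr/bin/env bash
# Print the most recent log lines that mention one Monit check.
# Usage: recent_monit_lines CHECK_NAME [LOGFILE] [COUNT]
recent_monit_lines() {
  local check="$1"
  local logfile="${2:-/var/log/monit.log}"
  local count="${3:-20}"
  # -F treats the check name as a fixed string, not a regular expression.
  grep -F -- "$check" "$logfile" | tail -n "$count"
}
```

For example: recent_monit_lines ofn_america_dj_worker_0. Reading the log may require sudo, depending on its file permissions.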

I found it very useful to follow this post about setting the environment as Monit does.

sudo su - ubuntu
env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh
/bin/bash -c 'RAILS_ENV=staging /home/ubuntu/.rbenv/shims/ruby /home/ubuntu/apps/ofn_america/current/script/delayed_job -i 0 start'

That should tell you what goes wrong.
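If you want to repeat that test for several commands, the stripped-down environment can be wrapped in a small function. This is a minimal sketch: run_like_monit is a name I made up, and the PATH value is the one from the commands above, which only approximates what Monit really passes to the programs it starts.

```shell
#!/usr/bin/env bash
# Run a command string with an emptied environment and a minimal PATH,
# roughly what a program started by Monit gets to see.
# Usage: run_like_monit 'COMMAND STRING'
run_like_monit() {
  # env -i drops every inherited variable (HOME, RAILS_ENV, rbenv settings, ...)
  env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh -c "$1"
}
```

Run as the ubuntu user, run_like_monit 'command -v ruby' shows whether ruby can still be found without the rbenv entries in PATH, which is the kind of difference that makes a command work in a login shell but fail under Monit.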

We used delayed_job.sh in the past. It was called by Monit. But the only thing it does is set the PATH environment variable, so we figured that we could simplify the call stack. Maybe we missed something. So I’m looking forward to your findings. :slight_smile:

I don’t know, my friend. It’s working now, with the bash script OR going straight to the ruby script. I wish I knew what was different. I had also found my way to that same (great) SO post recommending to use Bash with a $PATH variable set.

Thank you for the input, though.

Do you know offhand how Unicorn is supposed to be run? Is monit configured to manage that as well? I had to manually restart it via sudo service unicorn_ofn_america restart.

Also I’d love to get an idea of your server resources. Right now I have:

2 Core
2048MB RAM
40GB Disk
2000GB Bandwidth

But have already been pushing the limits with very little usage of the app.

Good that it’s working now.

Unicorn is normally just running. We use the Git post-receive hook to deploy new versions. That script stops and starts unicorn (sudo service unicorn_openfoodnetwork stop, then start). A restart works most times but doesn’t pick up newly installed gems.

We upgraded our server a couple of times because of memory issues. Currently, we have 4 cores with 8GB memory. We are still running only two workers, but we hope that we can reduce the memory consumption and run 4 workers on that server. I have no idea about the bandwidth. It’s an AWS server.

Very cool. I hadn’t heard of post-receive before. It looks relatively straight-forward, but if yours is in a place where I could reference it, I wouldn’t mind a look. Will also experiment with using the sudo service unicorn_ofn_america instead of just sudo service unicorn and see how that works.

Your input is much appreciated as the US development team is a little lonely at the moment. :blush:

I documented the deployment via the post-receive hook recently: https://github.com/openfoodfoundation/ofn_deployment/wiki/Deployment-with-Git

But it assumes that you provisioned with the latest ofn_deployment code. We updated the post-receive template in there recently. You could put it on the server manually, but you need to replace all the variables in the template then: https://github.com/openfoodfoundation/ofn_deployment/blob/master/roles/app/templates/post-receive.j2

Very cool, man. I just pulled in the latest changes to the deployment script yesterday, as well as updating ruby: 2.1.5p273 # was 1.9.3-p392. I think that still needs to be updated in the ofn_deployment example script, although I’m not sure if we want to specify p273 or not, as I had already installed it manually with rbenv install 2.1.5 (patch level unspecified).

Rohan wrote a playbook for updating Ruby on the server as well. Our Gemfile just specifies 2.1.5. There shouldn’t be a reason to specify the patch level.

Hi,

I’m trying to upgrade to the latest version using ofn_deployment, and I found some issues.
The seed files are not set up to use the l10n package:
In main.yml under roles/deploy/task, I changed the config as below:
# TODO: Ugly hack until we have better configuration management
- name: symlink into the repo
  file: src={{ item.src }} dest={{ item.dest }} state=link force=yes owner={{ unicorn_user }}
  with_items:
    - { src: "{{ assets_path }}", dest: "{{ build_path }}/public/assets" }
    - { src: "{{ system_path }}", dest: "{{ build_path }}/public/system" }
    - { src: "{{ spree_path }}", dest: "{{ build_path }}/public/spree" }
    - { src: "{{ config_path }}/database.yml", dest: "{{ build_path }}/config/database.yml" }
    - { src: "{{ config_path }}/application.yml", dest: "{{ build_path }}/config/application.yml" }
    # - { src: "{{ config_path }}/seeds.rb", dest: "{{ build_path }}/db/seeds.rb" } # I commented out this line
    - { src: "{{ l10n_path }}/seeds.rb", dest: "{{ build_path }}/db/seeds.rb" }
    - { src: "{{ l10n_path }}/suburb_seeds.rb", dest: "{{ build_path }}/db/suburb_seeds.rb" }
    - { src: "{{ l10n_path }}/suburbs.csv", dest: "{{ build_path }}/db/suburbs.csv" }
    - { src: "{{ l10n_path }}/states.yml", dest: "{{ build_path }}/db/default/spree/states.yml" }
    - { src: "{{ l10n_path }}/countries.yml", dest: "{{ build_path }}/db/default/spree/countries.yml" }
  tags: symlink

After fixing the seeds, I got this error at the end of the deployment:
NOTIFIED: [mortik.nginx-rails | restart nginx] ********************************
changed: [127.0.0.1]

NOTIFIED: [webserver | restart unicorn] ***************************************
changed: [127.0.0.1]

NOTIFIED: [webserver | restart unicorn step 2] ********************************
failed: [127.0.0.1] => {“failed”: true}
msg: unicorn_openfoodnetwork: unrecognized service
unicorn_openfoodnetwork: unrecognized service

FATAL: all hosts have already failed – aborting

NOTIFIED: [webserver | restart unicorn step 2] ********************************
FATAL: no hosts matched or all hosts have already failed – aborting

FATAL: all hosts have already failed – aborting

NOTIFIED: [webserver | restart unicorn step 2] ********************************
FATAL: no hosts matched or all hosts have already failed – aborting

FATAL: all hosts have already failed – aborting

PLAY RECAP ********************************************************************

The failing tasks are in the handler roles/webserver/handlers/main.yml but I don’t know how to fix it.

In production.log (though maybe the deploy process did not complete, so this is not important for the moment?):
Completed 500 Internal Server Error in 57.0ms
** [Bugsnag] No API key configured, couldn’t notify

ActionView::Template::Error (darkswarm/all.css isn’t precompiled):
12:
13: = yield :scripts
14: %script{src: “//maps.googleapis.com/maps/api/js?libraries=places,geometry&sensor=false”}
15: = split_stylesheet_link_tag "darkswarm/all"
16: = javascript_include_tag "darkswarm/all"
17:
18:
app/views/layouts/darkswarm.html.haml:15:in `_94a4bac7f0ff8866b37431d489d93af7’

If you have any ideas, they are welcome.
Thanks

What happens if you go to the terminal and run:

sudo /etc/init.d/unicorn_openfoodnetwork restart

?

If I try in the terminal:
sudo service unicorn_openfoodnetwork status
[sudo] password for openfoodnetwork:
Usage: /etc/init.d/unicorn_openfoodnetwork <start|stop|restart|upgrade|force-stop|reopen-logs>

I have to enter the password for the user.
In the production.log file, I get the same message I sent in my previous post.

Apologies if I’m misunderstanding the issue or if I’m misremembering how it works, but does the unicorn_openfoodnetwork service exist in the /etc/init.d/ directory?
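A quick way to check that, as a sketch: init_script_present is a hypothetical helper name, and /etc/init.d is assumed as the default location because that is where the usage message earlier in the thread points.

```shell
#!/usr/bin/env bash
# Succeed if an executable init script with the given name exists.
# Usage: init_script_present NAME [DIR]
init_script_present() {
  local name="$1"
  local dir="${2:-/etc/init.d}"
  # Must be a regular file (or a symlink to one) and executable.
  [ -f "$dir/$name" ] && [ -x "$dir/$name" ]
}
```

For example: init_script_present unicorn_openfoodnetwork || echo "init script missing or not executable". If the app was renamed during provisioning, the script name will differ, as with unicorn_ofn_america above.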