
JRuby, Rails, Rake and Cron for Automation

There are times when you need to automate a periodic process associated with maintaining your application. Jobs like these can often be performed manually, but they are easily forgotten until, a few weeks later, you wonder why your data is out of sync with reality. Take, for example, a process that obtains data (legally) through a third-party vendor API and imports it into an internal database so that recent information can be analyzed by users or processed programmatically in a timely manner. Without this import process in place, the data would go stale too quickly for the system to be considered usable.

Programmers know that running a business on a series of manual processes is not efficient. Running the job by hand is fine while you are first testing that it works, but it soon becomes a tedious or forgettable task. Instead, we should always look for ways to increase our own efficiency and that of the people and systems we support. Most operating systems provide a way to schedule tasks to run at regular intervals. If you are deploying to a *nix environment, you’re in luck, especially if the job needs to run in the background.

Cron is a daemon started automatically from /etc/init.d that executes scheduled commands. It searches its spool area, /var/spool/cron/crontabs, for crontab files named after accounts in /etc/passwd. Those crontabs should not be accessed directly; instead, use crontab -l to list a user’s crontab and crontab -e to edit it. Cron also reads the file /etc/crontab and the files in the directory /etc/cron.d. It wakes up every minute, examines the crontabs, and executes any job that is scheduled for the current minute.

The format of cron entries is defined as the following:

.------------ minute (0-59)
| .---------- hour (0-23)
| | .-------- day of month (1-31)
| | | .------ month (1-12) OR jan,feb,mar,apr ...
| | | | .---- day of week (0-6, Sunday = 0 or 7) OR sun,mon,tue,wed,thu,fri,sat
| | | | |
* * * * * command_to_be_executed

Cron also comes with a small list of special shortcuts:

@reboot   = run once at startup
@yearly   = 0 0 1 1 * = @annually = run once per year
@monthly  = 0 0 1 * * = run once per month
@weekly   = 0 0 * * 0 = run once per week
@daily    = 0 0 * * * = @midnight = run once per day
@hourly   = 0 * * * * = run once per hour
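
To make the entry format and the shortcuts concrete, here is a minimal Ruby sketch that splits a crontab entry into its five schedule fields and its command. The names CRON_SHORTCUTS, CRON_FIELDS and parse_cron_entry are hypothetical helpers invented for illustration; they are not part of cron or of any library, and @reboot is deliberately left out because it has no five-field equivalent.

```ruby
# Hypothetical illustration only: cron does its own parsing internally.
CRON_SHORTCUTS = {
  '@yearly'   => '0 0 1 1 *',
  '@annually' => '0 0 1 1 *',
  '@monthly'  => '0 0 1 * *',
  '@weekly'   => '0 0 * * 0',
  '@daily'    => '0 0 * * *',
  '@midnight' => '0 0 * * *',
  '@hourly'   => '0 * * * *'
}.freeze

CRON_FIELDS = [:minute, :hour, :day_of_month, :month, :day_of_week].freeze

# Returns a hash with the five schedule fields plus :command.
def parse_cron_entry(line)
  first, rest = line.strip.split(/\s+/, 2)
  # Expand a shortcut such as @daily into its five-field form.
  line = "#{CRON_SHORTCUTS[first]} #{rest}" if CRON_SHORTCUTS.key?(first)
  parts = line.strip.split(/\s+/, 6)
  raise ArgumentError, 'expected 5 schedule fields and a command' if parts.size < 6
  entry = Hash[CRON_FIELDS.zip(parts.first(5))]
  entry[:command] = parts[5]
  entry
end

entry = parse_cron_entry('@daily /usr/local/bin/cleanup')
puts entry.inspect
```

Everything after the fifth field is treated as the command, which is why a command containing spaces needs no quoting in a crontab.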

So how can you use cron along with JRuby and Rails?

First, make sure that JRuby is on the PATH of the user whose crontab will define the jobs. An easy way to do this is to set the JRuby and Java paths in that user’s .bash_profile.

$> vi .bash_profile

# :wq => to write the changes out to the file and quit

$> source ~/.bash_profile

$> echo $JRUBY_HOME

$> echo $PATH

Try running the job that you wish to execute once, manually, as the appropriate user to test the environment:

su -l jrubyist -c 'jruby -S vendor_api_data_import start'

If everything runs properly, you can be reasonably confident that the same command will work from the user’s crontab.
Let’s say we want this import task to run every Monday, Wednesday and Friday at 6:45 pm.
You would add the following entry to the user’s crontab, with a comment describing it:

# Automated download/migration process that makes use of the API
# (the day-of-week list is the fifth field; a crontab entry must stay on one line)
45 18 * * mon,wed,fri source /etc/profile && source /home/jrubyist/.bash_profile && jruby -S vendor_api_data_import start
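
If you want to double-check which moments a schedule like this covers, a rough Ruby sketch can help. The helpers cron_field_match? and cron_due? below are hypothetical names invented for this post, not part of cron. The sketch assumes only the simple syntax used here: *, single values, and comma-separated lists of numbers or names. It does not handle ranges (1-5) or steps (*/10).

```ruby
DAY_NAMES   = %w[sun mon tue wed thu fri sat].freeze
MONTH_NAMES = ([nil] + %w[jan feb mar apr may jun jul aug sep oct nov dec]).freeze

# True when one schedule field (e.g. "*", "18", "mon,wed,fri") accepts value.
def cron_field_match?(field, value, names = [])
  return true if field == '*'
  field.downcase.split(',').any? do |item|
    (names.index(item) || Integer(item)) == value
  end
end

# True when the five-field schedule would fire during the minute of `time`.
def cron_due?(schedule, time)
  minute, hour, dom, month, dow = schedule.split(/\s+/)
  dom_ok = cron_field_match?(dom, time.day)
  # Sunday may be written as 0 or 7.
  dow_ok = cron_field_match?(dow, time.wday, DAY_NAMES) ||
           (time.wday == 0 && cron_field_match?(dow, 7, DAY_NAMES))
  # When both day fields are restricted, cron runs the job if either matches.
  day_ok = (dom == '*' || dow == '*') ? (dom_ok && dow_ok) : (dom_ok || dow_ok)

  cron_field_match?(minute, time.min) &&
    cron_field_match?(hour, time.hour) &&
    cron_field_match?(month, time.month, MONTH_NAMES) &&
    day_ok
end

monday_evening = Time.local(2024, 1, 1, 18, 45) # 1 January 2024 was a Monday
puts cron_due?('45 18 * * mon,wed,fri', monday_evening) # prints true
```

The same call with a Tuesday, or with 18:46 on a Monday, returns false, which is a quick way to confirm that the day list sits in the fifth field and not after it.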

Combining Cron, JRuby, Rails and Rake
The example above is all fine and dandy, but what if you want to call a Rake task that needs access to, say, a set of models defined in a JRuby on Rails project? A few days ago, Felipe Coury (@fcoury) posed this question on Twitter: “What gem/lib/etc do you guys use for Ruby daemons that needs to load the Rails env prior to execution?” I love to browse Twitter for #jruby questions so I can help out by finding answers and writing about them. There’s a fairly straightforward approach you can take to achieve this goal, and the boilerplate process is as follows:

1) Upon deploying the JRuby/Rails application, create a symbolic link to the root of the Rails directory.
In the case of JRuby/Rails on JBoss, this means we want a symbolic link to the exploded war file.
2) The Rake task you create should be defined such that it depends on the Rails :environment task.
3) Tell the cron entry to start the JRuby/Rake task given the path to that symbolic link.

#1 – This can be automated with a clever trick that hooks into the initialization of the Rails application.
When your container deploys your Rails app, as in the case of JBoss, $servlet_context will be defined,
so a link to the deployed application directory will be created at “/home/jrubyist/deployed-rails-app”.

# Create config/initializers/symlink-deployment.rb
if defined?($servlet_context) && RAILS_ENV == 'production'
  symlink_file  = "/home/jrubyist/deployed-rails-app"
  deployed_root = File.expand_path(RAILS_ROOT)

  current_link = nil
  if File.exist?(symlink_file) && File.symlink?(symlink_file)
    current_link = File.readlink(symlink_file)
  end

  # Re-point the link only when it is missing or stale.
  if current_link != deployed_root
    # -n replaces an existing link instead of creating a new link inside its target.
    system("ln -sfn #{deployed_root} #{symlink_file}")
  end
end
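
If you would rather not shell out to ln, the same idea can be sketched in plain Ruby with File.symlink. The method name ensure_symlink is a hypothetical helper made up for this example, and the temporary directories stand in for the real deployment paths; treat it as a sketch to adapt, not a drop-in replacement for the initializer.

```ruby
require 'tmpdir'

# Hypothetical helper: make link_path a symlink to target.
# Returns true when the link was created or re-pointed, false when it was already correct.
def ensure_symlink(target, link_path)
  target = File.expand_path(target)
  if File.symlink?(link_path)
    return false if File.readlink(link_path) == target # already up to date
    File.unlink(link_path)                             # removes only the stale link
  elsif File.exist?(link_path)
    raise ArgumentError, "#{link_path} exists and is not a symlink"
  end
  File.symlink(target, link_path)
  true
end

# Usage with throwaway directories in place of real deployment paths.
Dir.mktmpdir do |dir|
  old_release = File.join(dir, 'release-1')
  new_release = File.join(dir, 'release-2')
  Dir.mkdir(old_release)
  Dir.mkdir(new_release)
  link = File.join(dir, 'deployed-rails-app')

  ensure_symlink(old_release, link)
  ensure_symlink(new_release, link) # a redeploy re-points the link
  puts File.readlink(link)
end
```

Refusing to touch a path that exists but is not a symlink is a deliberate choice here: it avoids deleting a real directory if the link location is ever misconfigured.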

#2 – Example Rake Task that depends on your Rails models:

namespace :third_party_vendor do
  namespace :api do
    desc "Uses the 3rd party vendor API to import data into our internal databases."
    task :data_import => :environment do
      # Since we say that we depend on the :environment,
      # we now have access to our Rails model objects.  For example...
      # eligible_401k_employees = Employees.find(:all,
      #               :conditions => ['effective >= ?', 1.year.ago])
    end
  end
end
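
To see what depending on :environment buys you, here is a small standalone sketch that can be run with plain ruby, outside of Rails. The :environment task below is only a stand-in that records when it ran (the real one is defined by Rails and boots the application), and CALL_ORDER is a name made up for the demonstration.

```ruby
require 'rake'
extend Rake::DSL # makes task/namespace available in a plain script

CALL_ORDER = []

# Stand-in for the :environment task that Rails provides.
task :environment do
  CALL_ORDER << :environment
end

namespace :third_party_vendor do
  namespace :api do
    task :data_import => :environment do
      # By the time this body runs, the prerequisite has already finished.
      CALL_ORDER << :data_import
    end
  end
end

Rake::Task['third_party_vendor:api:data_import'].invoke
puts CALL_ORDER.inspect # prints [:environment, :data_import]
```

Rake resolves the :environment prerequisite from inside the nested namespaces by searching outward to the top level, which is why the task can name it without any prefix.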

If you need access to non-Rails frozen gems as well, modify your config/environment.rb to include the following before the Rails::Initializer.run do |config| … block:

# Load non-Rails frozen gems too..
Dir.glob(File.join(RAILS_ROOT, 'vendor', '*', 'lib')) do |path|
  $LOAD_PATH << path
end
Some people have reported that, to get the environment to load correctly for their Rake task, they had to add the following to the top of the task:

require File.join(RAILS_ROOT, 'config', 'environment.rb')

#3 – Modify your cron entry so that it executes the Rake task defined in your Rails app.

# Automated download/migration process that makes use of the API
# (wrapped here for readability; a crontab entry must be a single line)
45 18 * * mon,wed,fri source /etc/profile && 
        source /home/jrubyist/.bash_profile && 
        RAILS_ENV=production rake --rakefile 

Finishing touches…
That should be enough to get you started. Finally, if you don’t want your background processes to affect your production application environment, consider adding “nice” to the command. nice maps to a kernel call of the same name; it changes a process’s priority in the kernel’s scheduler. A niceness of -20 is the highest priority and 19 is the lowest. Run without an adjustment argument, nice raises the niceness of the command by 10. You can read more about nice on Wikipedia.
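
If changing the cron command is not convenient, a task can also lower its own priority from inside Ruby. This is a minimal sketch assuming a Unix-like system where Process.setpriority is available; lower_priority is a hypothetical helper name, and whether this behaves identically on your JRuby version is something to verify rather than take from this post.

```ruby
# Hypothetical helper: raise this process's niceness by `increment`,
# capped at 19 (the lowest priority). Returns the resulting niceness.
def lower_priority(increment = 10)
  current = Process.getpriority(Process::PRIO_PROCESS, 0) # 0 means "this process"
  target  = [current + increment, 19].min
  Process.setpriority(Process::PRIO_PROCESS, 0, target)
  Process.getpriority(Process::PRIO_PROCESS, 0)
end

puts "niceness is now #{lower_priority}"
```

An unprivileged process can make itself nicer but usually cannot undo that afterwards, so call this once at the start of the task.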

Another useful addition is to have the output of your Rake task written to a log file, so that you can go back and check it for errors that occurred during execution. Create a log file that is writable by the cron user, then append a redirect to your cron command: >> appends stdout to the file, and 2>&1 sends stderr to the same place. The finished product is as follows:

# Automated download/migration process that makes use of the API
# (wrapped here for readability; a crontab entry must be a single line)
45 18 * * mon,wed,fri source /etc/profile && 
     source /home/jrubyist/.bash_profile && 
     RAILS_ENV=production nice rake --rakefile 
     --trace >> /home/jrubyist/logs/cron/import.log 2>&1
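
Since everything the task prints ends up in that one file, timestamped lines make the log much easier to read later. Here is a small sketch using Ruby’s standard Logger; build_import_logger is a hypothetical helper name and the line format is just one reasonable choice.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: a logger that writes timestamped lines to `io`
# (stdout by default, which the cron entry redirects into the log file).
def build_import_logger(io = $stdout)
  io.sync = true if io.respond_to?(:sync=) # flush each line immediately
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, message|
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{severity} #{message}\n"
  end
  logger
end

log = build_import_logger
log.info('import started')
log.error('vendor API returned no rows')
```

Inside the Rake task you would create the logger once and call log.info or log.error at each step, so a failed run shows where it stopped.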

This technique is both useful and pragmatic. Never worry again about running a periodic process. Let the system do the work.