Ketrew: Keep Track of Experimental Workflows

Ketrew is:

An OCaml library providing an EDSL API to define complex and convoluted workflows (interdependent steps/programs using a lot of data, with many parameter variations, running on different hosts with various schedulers).
A client-server application to interact with these workflows. The engine at the heart of the server takes care of orchestrating workflows, and keeps track of everything that succeeds, fails, or gets lost.

This is version 3.0.0 of Ketrew. See also the documentation for the master branch.

If you have any questions, you may submit an issue or join the authors on the public “Slack” channel of the Hammer Lab.

Build & Install

Ketrew requires at least OCaml 4.02.2 and should be able to build & work on any Unix platform.

From Opam

If you have opam up and running, install Ketrew together with a database backend:

opam install  (sqlite3 | postgresql) [ssl | tls]  ketrew

You need to choose a database backend, sqlite3 or postgresql (you may install both and choose later in the config-file).
If you want Ketrew to use HTTPS, you need to link it with OpenSSL (package ssl) or nqsb-TLS (package tls, experimental).

This gets you

a ketrew executable that can be used to schedule and run workflows,
an OCaml library also called ketrew that handles the messy orchestration of those tasks and exports the Ketrew.EDSL module used to write workflows.

Remember that at runtime you'll need ssh in your $PATH to execute commands on foreign hosts.

Optional: Ketrew, like any Lwt-based piece of software, will be much faster and more scalable when libev is detected and used as a backend. Use opam install conf-libev to tell opam that libev is installed, which you can ensure with

brew install libev on MacOSX,
apt-get install libev-dev on Debian/Ubuntu,
yum install libev-devel on CentOS (which requires export C_INCLUDE_PATH=/usr/include/libev/ and export LIBRARY_PATH=/usr/lib64/)

before opam install conf-libev.

Using Docker

See the instructions at hub.docker.com: hammerlab/ketrew-server.

Without Opam

See the development documentation to find out how to build Ketrew (and its dependencies) from source.

Getting Started

Ketrew is very flexible and hence may seem difficult to understand at first. Let's get a minimal workflow running.

Before you can use Ketrew, you need to configure it:

$ ketrew init

By default this will write a configuration file and a list of authorized tokens for the Ketrew server in $HOME/.ketrew/:

$ ls $HOME/.ketrew/
authorized_tokens    configuration.ml

You can check how the client or the server is configured (the client configuration is shown by default) with the print-configuration subcommand:

$ ketrew print-configuration
[ketrew]
    Mode: Client
    Connection: "http://127.0.0.1:8756"
    Auth-token: "755nRor8Q5z5nx7W22C6C078HF3YoY5PS29sEgNXxP4="
    UI:
        Colors: with colors
        Get-key: uses `cbreak`
        Explorer:
            Default request: Targets younger than 1.5 days
            Targets-per-page: 6
            Targets-to-prefectch: 6
    Misc:
        Debug-level: 0
        Plugins: None
        Tmp-dir: Not-specified (using /tmp/)

For the server (using pc, a command alias for print-configuration):

$ ketrew pc server
[ketrew] 
    Mode: Server
    Engine:
        Database: "/home/hammerlab/.ketrew/database"
        Unix-failure: does not turn into target failure
        Host-timeout-upper-bound:
        Maximum-successive-attempts: 10
        Concurrent-automaton-steps: 4
        Archival-age-threshold: 10.000000 days
    UI:
        Colors: with colors
        Get-key: uses `cbreak`
        Explorer:
            Default request: Targets younger than 1.5 days
            Targets-per-page: 6
            Targets-to-prefectch: 6
    HTTP-server:
        Authorized tokens:
            Inline (Name: l8Tm7Gv6veO1vYB9Fvc-ZnDwwsXXKbaKE4Vn5zcopOk=,
            Value: "l8Tm7Gv6veO1vYB9Fvc-ZnDwwsXXKbaKE4Vn5zcopOk=")
            Path: "/home/hammerlab/.ketrew/authorized_tokens"
        Daemonize: false
        Command Pipe: Some "/home/hammerlab/.ketrew/command.pipe"
        Log-path: Some "/home/hammerlab/.ketrew/server-log"
        Return-error-messages: true
        Max-blocking-time: 300.
        Listen: HTTP: 8756
  Misc:
      Debug-level: 0
      Plugins: None
      Tmp-dir: Not-specified (using /tmp/)

The daemon configuration profile is a shortcut for starting the server in daemon mode. You may now start a server:

$ ketrew start-server --configuration-profile daemon

Let's open the GUI:

$ ketrew gui

This should open your browser.

Back at the command line you can always check the server's status (using the shorter command line argument -P, instead of --configuration-profile):

$ ketrew status -P daemon
[ketrew] The server appears to be doing well.

The ketrew submit sub-command can create tiny workflows:

$ ketrew submit --wet-run --tag 1st-workflow --tag command-line --daemonize /tmp/KT,"du -sh $HOME"

The job will appear in the web UI, where you can inspect, restart, or kill it.

If you don't like web UIs, you can use the text-based UI:

$ ketrew interact
[ketrew]
    Main menu
    Press a single key:
    * [q]: Quit
    * [v]: Toggle verbose
    * [s]: Display current status
    * [l]: Loop displaying the status
    * [k]: Kill targets
    * [e]: The Target Explorer™

Finally, to stop the server:

$ ketrew stop -P daemon
[ketrew] Server killed.

As you can see, just from the command line, you can use ketrew submit to launch tasks. But to go further we need to use an EDSL.

The EDSL: Defining Workflows

Overview

The EDSL is an OCaml library where functions are used to build a workflow data structure. Ketrew.Client.submit_workflow is used to submit that data structure to the engine.

A workflow is a graph of “workflow-nodes” (sometimes called “targets”).

There are three kinds of links (edges) between nodes:

depends_on: nodes that need to be ensured or satisfied before a node can start,
on_failure_activate: nodes that will be activated if the node fails, and
on_success_activate: nodes that will be activated only after a node succeeds.

See the Ketrew.EDSL.workflow_node function documentation for details. Any OCaml program can use the EDSL (script, compiled, or even inside the toplevel). See the documentation of the EDSL API (Ketrew.EDSL).
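
To make the three edge kinds concrete, here is a toy, self-contained model in plain OCaml. It does not use Ketrew at all: the type node and the functions run_order and activated_after are hypothetical names invented for this illustration, and the real engine's scheduling is more involved than this sketch.

```ocaml
(* Toy model of a workflow node and its three kinds of edges.
   These are NOT Ketrew's types; all names here are hypothetical. *)
type node = {
  name : string;
  depends_on : node list;          (* must be satisfied before this node starts *)
  on_success_activate : node list; (* activated only after this node succeeds *)
  on_failure_activate : node list; (* activated if this node fails *)
}

let node ?(depends_on = []) ?(on_success_activate = [])
    ?(on_failure_activate = []) name =
  { name; depends_on; on_success_activate; on_failure_activate }

(* Names in an order that satisfies dependencies: the dependencies of a
   node (recursively) come before the node itself. A node reachable
   through several paths is listed once per path. *)
let rec run_order n =
  List.concat (List.map run_order n.depends_on) @ [ n.name ]

(* Names of the nodes that get activated once [n] has finished. *)
let activated_after ~success n =
  let next =
    if success then n.on_success_activate else n.on_failure_activate in
  List.map (fun m -> m.name) next

let fetch = node "fetch"

let analyze =
  node "analyze"
    ~depends_on:[ fetch ]
    ~on_success_activate:[ node "email success" ]
    ~on_failure_activate:[ node "email failure" ]
```

In this model, run_order analyze lists "fetch" before "analyze", and activated_after selects one of the two email nodes depending on the outcome.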

Example

The following script extends the previous shell-based example with the capability to send emails upon the success or failure of your command.

#use "topfind"
#thread
#require "ketrew"

let run_command_with_daemonize ~cmd ~email =
  let module KEDSL = Ketrew.EDSL in

  (* Where to run stuff *)
  let host = KEDSL.Host.tmp_on_localhost in

  (* A “program” is a datastructure representing an “extended shell script”. *)
  let program = KEDSL.Program.sh cmd in

  (* A “build process” is a method for making things.

     In this case, `daemonize` creates a datastructure that represents a job
     running our program on the host. *)
  let build_process = KEDSL.daemonize ~host program in
  (* On Mac OSX
  let build_process = KEDSL.daemonize ~using:`Python_daemon ~host program in
  *)

  (* A node that Ketrew will activate after cmd completes *)
  let email_target ~success =
    let sstring = if success then "succeeded" else "failed" in
    let e_program =
      KEDSL.Program.shf "echo \"'%s' %s\" | mail -s \"Status update\" %s"
        cmd sstring
        email
    in
    let e_process =
      KEDSL.daemonize ~using:`Python_daemon ~host e_program in
    KEDSL.workflow_node KEDSL.without_product
      ~name:("email result " ^ sstring)
      ~make:e_process
  in

  (* The function `KEDSL.workflow_node` creates a node in the workflow graph.
     The value `KEDSL.without_product` means this node does not
     “produce” anything, it is like a `.PHONY` target in `make`. *)
  KEDSL.workflow_node KEDSL.without_product
    ~name:"daemonize command"
    ~make:build_process
    ~edges:[
      KEDSL.on_success_activate (email_target true);
      KEDSL.on_failure_activate (email_target false);
    ]

let () =
  (* Grab the command line arguments. *)
  let cmd   = Sys.argv.(1) in
  let email = Sys.argv.(2) in

  (* Create the workflow from the command-line arguments: *)
  let workflow = run_command_with_daemonize ~cmd ~email in

  (* Then, `Client.submit_workflow` is the only function that “does”
     something, it submits the workflow to the engine: *)
  Ketrew.Client.submit_workflow workflow

You can run this script from the shell with

ocaml daemonize_workflow.ml 'du -sh $HOME' myaddress@email.com
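
The script reads Sys.argv.(1) and Sys.argv.(2) directly, so a missing argument raises Invalid_argument. A minimal sketch of a stricter alternative follows; parse_args is a hypothetical helper (not part of Ketrew) that uses only the standard library and returns a usage message instead of raising.

```ocaml
(* Hypothetical helper, not part of Ketrew: read [cmd] and [email] from an
   argv-style array, returning a usage message instead of raising when the
   number of arguments is wrong. *)
let parse_args argv =
  match Array.to_list argv with
  | [ _script; cmd; email ] -> `Ok (cmd, email)
  | script :: _ ->
    `Error (Printf.sprintf "usage: ocaml %s <command> <email>" script)
  | [] -> `Error "usage: ocaml <script> <command> <email>"
```

In the script, the two Sys.argv lookups could then be replaced by a match on parse_args Sys.argv that prints the message and exits with a non-zero status in the `Error case.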

Checking in with the GUI, we'll see a couple of new targets.

To learn more about the EDSL, you can also explore examples of more and more complicated workflows (work-in-progress).

Troubleshooting

Trying to use Sqlite3 on MacOSX and opam fails? These instructions should be helpful.
opam and ssl errors when installing ketrew? Please see this issue.
When reconfiguring Ketrew between versions it may be helpful to delete old configurations:
```
$ rm -fr $HOME/.ketrew/
```
During configuration it is recommended that you pass an authentication token, as opposed to having Ketrew generate one for you:
```
$ ketrew init --with-token my-secret-token
```
If you are trying the example workflow on a system that does not have Python installed, you can use another daemonization method (we use `Python_daemon by default above because setsid is missing on MacOSX):
```
let build_process = KEDSL.daemonize ~using:`Nohup_setsid ~host program in
```
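
If the same workflow script has to run on machines with and without setsid, the daemonization method could be picked at run time. The following is only a sketch: daemonize_method and setsid_available are hypothetical helpers (not part of Ketrew), and the probe assumes a POSIX shell where `command -v` works. It inspects the machine running the script, which is also the job's host only for localhost workflows such as the example above.

```ocaml
(* Hypothetical helper: choose the variant to pass as [~using] to
   [KEDSL.daemonize], given whether [setsid] is available. *)
let daemonize_method ~has_setsid =
  if has_setsid then `Nohup_setsid else `Python_daemon

(* Assumption: [Sys.command] runs a POSIX shell, so [command -v] can be
   used to look for [setsid] in $PATH on the local machine. *)
let setsid_available () =
  Sys.command "command -v setsid > /dev/null 2>&1" = 0

(* Possible use inside the example workflow:
   let using = daemonize_method ~has_setsid:(setsid_available ()) in
   let build_process = KEDSL.daemonize ~using ~host program in ... *)
```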

Where to Go Next

From here:

To write workflows for Ketrew, see src/test/Workflow_Examples.ml for examples and the documentation of the EDSL API.
To configure Ketrew use the configuration file documentation.
If you don't want a server running and listening on HTTP(S), Ketrew can run in a degraded mode called “standalone.”
You may want to “extend” Ketrew with new ways of running “long-running” computations: see the documentation on plugins, and the examples in the library, like Ketrew.Lsf, or in the tests: src/test/dummy_plugin.ml.
You may want to extend Ketrew, or preconfigure it, without configuration files or dynamically loaded libraries: just create your own command-line app.
If you are using Ketrew in server mode, you may want to know about the commands that the server can understand as it listens on a Unix-pipe.
You may want to call out directly to the HTTP API (i.e. without ketrew as a client).
If you want to help, or simply to understand Ketrew, see the development documentation and have a look at modules like Ketrew.Engine.

License

It's Apache 2.0.
