Sosa: Home

Sane OCaml String API

This library is a set of APIs defined with module types, and a set of modules and functors implementing one or more of those interfaces.

The APIs define what a character and a string of characters should be.
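The idea of describing characters and strings through module types can be sketched as follows. This is a minimal illustration using only the standard library: the signatures `MINI_CHARACTER` and `MINI_STRING`, the modules `Char_character` and `String_string`, and the functor `Reverse` are hypothetical names invented for this example and are much smaller than the library's actual module types.

```ocaml
(* Hypothetical, heavily simplified signatures; NOT the library's Api. *)
module type MINI_CHARACTER = sig
  type t
  val of_native_char : char -> t option
  val to_native_string : t -> string
  val compare : t -> t -> int
end

module type MINI_STRING = sig
  type character
  type t
  val of_character_list : character list -> t
  val to_character_list : t -> character list
  val length : t -> int
end

(* One possible implementation of the character signature with [char]. *)
module Char_character : MINI_CHARACTER with type t = char = struct
  type t = char
  let of_native_char c = Some c
  let to_native_string c = String.make 1 c
  let compare = Char.compare
end

(* One possible implementation of the string signature with [string]. *)
module String_string :
  MINI_STRING with type character = char and type t = string = struct
  type character = char
  type t = string
  let of_character_list cs = String.init (List.length cs) (List.nth cs)
  let to_character_list s = List.init (String.length s) (String.get s)
  let length = String.length
end

(* Code written against the signature works with any implementation. *)
module Reverse (S : MINI_STRING) = struct
  let reverse (s : S.t) : S.t =
    S.of_character_list (List.rev (S.to_character_list s))
end

module R = Reverse (String_string)
let reversed = R.reverse "sosa"
```

Because `Reverse` only relies on the signature, the same functor could be applied to any other module that satisfies `MINI_STRING`.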

See the INSTALL file for build instructions, and the documentation website for the API reference.

The library is “packed” in the Sosa toplevel module name.

Module Types (APIs)

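The sub-module Api defines the module types referred to on this page, including BASIC_CHARACTER, BASIC_STRING, UNSAFELY_MUTABLE, and MINIMALISTIC_MUTABLE_STRING.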

Implementations

Native OCaml Characters

The Native_character module implements BASIC_CHARACTER with OCaml's char type.

Native OCaml Strings

The Native_string module implements BASIC_STRING with OCaml's string type, treated as immutable; its characters are therefore those of Native_character.

Native Mutable OCaml Strings (Bytes)

The Native_bytes module implements BASIC_STRING and UNSAFELY_MUTABLE with OCaml's bytes type.
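The distinction between the immutable string type and the mutable bytes type can be seen with the standard library alone. The sketch below does not use the Native_bytes module; it only illustrates the underlying OCaml types, and the value names are made up for the example.

```ocaml
(* Standard library only: [string] is immutable, [bytes] is mutable. *)
let immutable : string = "sosa"

(* Copy the string into a fresh mutable buffer. *)
let buffer : bytes = Bytes.of_string immutable

(* In-place update; the original string is unaffected because of the copy. *)
let () = Bytes.set buffer 0 'S'

(* Copy the buffer back into an immutable string. *)
let updated : string = Bytes.to_string buffer
```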

Lists Of Arbitrary Characters

List_of is a functor: BASIC_CHARACTER → BASIC_STRING, i.e., it creates a string data structure made of a list of characters.
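A functor of this shape could look roughly like the sketch below. It is not the library's List_of: `MINI_CHARACTER`, `List_of_sketch`, and the functions inside are hypothetical and cover only a handful of operations, assuming a character module that can print itself as a native string.

```ocaml
(* Hypothetical minimal character signature for this sketch. *)
module type MINI_CHARACTER = sig
  type t
  val to_native_string : t -> string
end

(* A string is represented as a plain list of characters. *)
module List_of_sketch (C : MINI_CHARACTER) = struct
  type character = C.t
  type t = character list

  let empty : t = []
  let of_character_list (cs : character list) : t = cs
  let length (s : t) : int = List.length s

  (* Total accessor: out-of-bounds indexes give [None]. *)
  let get (s : t) ~index : character option =
    if index < 0 then None else List.nth_opt s index

  let concat (parts : t list) : t = List.concat parts

  let to_native_string (s : t) : string =
    String.concat "" (List.map C.to_native_string s)
end

(* A character module backed by OCaml's [char]. *)
module Char_character = struct
  type t = char
  let to_native_string c = String.make 1 c
end

module Char_list_string = List_of_sketch (Char_character)

let hello = Char_list_string.of_character_list ['h'; 'e'; 'l'; 'l'; 'o']
let hello_native = Char_list_string.to_native_string hello
```

A list representation keeps the implementation short and works for characters of any size, at the cost of linear-time indexing.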

Build From Basic Mutable Data-structures

The functor Of_mutable uses an implementation of MINIMALISTIC_MUTABLE_STRING to build a BASIC_STRING.
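The general technique, wrapping a small mutable structure behind an immutable-looking interface, might be sketched like this. The signature `MINI_MUTABLE` and the functor `Of_mutable_sketch` are hypothetical simplifications written for this page; the real MINIMALISTIC_MUTABLE_STRING and Of_mutable have their own definitions.

```ocaml
(* Hypothetical minimal mutable-buffer signature for this sketch. *)
module type MINI_MUTABLE = sig
  type character
  type t
  val empty : t
  val make : int -> character -> t
  val length : t -> int
  val get : t -> int -> character
  val set : t -> int -> character -> unit
end

(* The result type is abstract, so callers cannot mutate the buffer. *)
module Of_mutable_sketch (M : MINI_MUTABLE) : sig
  type character = M.character
  type t
  val of_character_list : character list -> t
  val to_character_list : t -> character list
  val length : t -> int
  val get : t -> index:int -> character option
  val concat : t -> t -> t
end = struct
  type character = M.character
  type t = M.t

  (* Mutation is only used while a fresh buffer is being filled. *)
  let of_character_list cs =
    match cs with
    | [] -> M.empty
    | first :: _ ->
      let buf = M.make (List.length cs) first in
      List.iteri (fun i c -> M.set buf i c) cs;
      buf

  let to_character_list s = List.init (M.length s) (M.get s)
  let length = M.length

  let get s ~index =
    if index < 0 || index >= M.length s then None else Some (M.get s index)

  (* Concatenation allocates a new buffer and leaves both inputs intact. *)
  let concat a b =
    of_character_list (to_character_list a @ to_character_list b)
end

(* An array of integers used as the mutable backing store. *)
module Int_array = struct
  type character = int
  type t = int array
  let empty : t = [||]
  let make (n : int) (c : int) : t = Array.make n c
  let length (a : t) = Array.length a
  let get (a : t) i = Array.get a i
  let set (a : t) i c = Array.set a i c
end

module Int_string = Of_mutable_sketch (Int_array)

let s = Int_string.of_character_list [0x73; 0x6F]
let t = Int_string.concat s (Int_string.of_character_list [0x73; 0x61])
let chars = Int_string.to_character_list t
```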

Integer UTF-8 Characters

The Int_utf8_character module implements BASIC_CHARACTER with OCaml integers (int) representing UTF-8 characters. It handles values of at most 31 bits, even though RFC 3629 restricts code points to end at U+10FFFF (see also Wikipedia). Note that the function is_whitespace considers only ASCII whitespace, which is useful when writing parsers, for example.
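To make the 31-bit limit concrete, here is a standalone sketch of the extended UTF-8 scheme (up to 6 bytes per character, as defined before RFC 3629 narrowed the range). The functions `utf8_size`, `utf8_encode`, and `is_ascii_whitespace` are hypothetical helpers written for this page, not functions of Int_utf8_character, and the exact set of characters treated as whitespace is an assumption.

```ocaml
(* Number of bytes needed to encode [code], for values of at most 31 bits. *)
let utf8_size (code : int) : int option =
  if code < 0 || code lsr 31 <> 0 then None
  else if code < 0x80 then Some 1
  else if code < 0x800 then Some 2
  else if code < 0x1_0000 then Some 3
  else if code < 0x20_0000 then Some 4
  else if code < 0x400_0000 then Some 5
  else Some 6

(* Encode an integer as a UTF-8 byte sequence (1 to 6 bytes). *)
let utf8_encode (code : int) : string option =
  match utf8_size code with
  | None -> None
  | Some 1 -> Some (String.make 1 (Char.chr code))
  | Some n ->
    let buf = Bytes.create n in
    (* Leading byte: n high bits set, one zero bit, then the top payload bits. *)
    let lead_mark = (0xFF lsl (8 - n)) land 0xFF in
    Bytes.set buf 0 (Char.chr (lead_mark lor (code lsr (6 * (n - 1)))));
    (* Continuation bytes: 10xxxxxx, six payload bits each. *)
    for i = 1 to n - 1 do
      let shift = 6 * (n - 1 - i) in
      Bytes.set buf i (Char.chr (0x80 lor ((code lsr shift) land 0x3F)))
    done;
    Some (Bytes.to_string buf)

(* ASCII-only whitespace test (assumed set: space, \t, \n, \v, \f, \r). *)
let is_ascii_whitespace (code : int) : bool =
  match code with
  | 0x20 | 0x09 | 0x0A | 0x0B | 0x0C | 0x0D -> true
  | _ -> false
```

Under this sketch, a non-ASCII space such as U+00A0 is encoded normally but is not reported as whitespace.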

Examples, Tests, and Benchmarks

See the file test/main.ml for usage examples. The library is tested with:

  • native strings and characters,
  • lists of native characters (List_of(Native_character)),
  • lists of integers representing UTF-8 characters (List_of(Int_utf8_character)),
  • arrays of integers representing UTF-8 characters (Of_mutable(utf8-int array)),
  • bigarrays of 8-bit integers (Of_mutable(int8 Bigarray1.t)).

The tests depend on the Nonstd, unix, and bigarray libraries. To build and run them:

make test
./sosa_tests

and you may add the basic benchmarks to the process with:

./sosa_tests bench