Database Transactions and Management in Haskell

Published 26 March 2020 · Updated 26 March 2020

haskell database data infrastructure sql serialization

Introduction

In Haskell, we have many database libraries. Persistent is a layer atop them, using some of them as backends, that imposes type safety and seamless integration with Haskell code. With persistent-odbc many more DBMSs can be attached. Custom drivers can be written. Whenever Persistent turns out to be insufficient, we can fall back to the raw driver or its Haskell wrapper, and write a Persistent ORM ourselves. The framework is wonderful, but, as with everything, it has its caveats. We will discuss some of them in the course of this series of posts here.

In other words, Persistent is a seamless marshalling or serialization library that removes most boilerplate code1.

This marshalling or object-relational mapping (ORM) imposes type-safety onto the conventional untyped database drivers, which may be written in an arcane dialect of C (macros!), such as the pgsql driver, notorious for its undocumented features. We thereby can employ Haskell’s type system and GHC’s facilities to ensure that we don’t make mistakes in our queries and that the data remains consistent throughout our data transformations. The key point here is that any errors that might occur are pulled back from runtime to compile-time, where we can fix them proactively. In particular, type-safety prevents SQL injection attacks.

There are basically four different approaches to dealing with databases in Haskell:

  1. use the existing drivers provided as backends and create a layer atop that imposes Haskell types on the level of Haskell code --- this is the approach of serialization, marshalling, or ORM taken by Persistent and Esqueleto (more later), and many others listed below;

  2. use the existing drivers via foreign-function interfaces (FFIs) by introducing a thin wrapper around them, and run direct textual queries, hoping for consistency --- this is what sqlite-simple and postgresql-simple among others do (“mid-level client libraries”);

  3. run third-party database drivers as subprocesses --- this is very crude but may turn out to be necessary in some situations;

  4. rewrite the existing driver directly in Haskell, which enables us to introduce category-theoretic abstractions that can simplify code and improve performance --- this is what Nikita Volkov’s hasql library for Postgres does.

Some other persistence and ORM-like libraries:

Limitations in Persistent that can be overcome with Esqueleto

So when do we use which of these two libraries?

  • Persistent provides type-safe serialization of data. It allows for type-safe filtered storage and retrieval of data from a database (remote-resource, local resource, file, or in-memory databases). It does not support type-safe JOINs, but we can fall back to explicit-string queries.
  • Esqueleto: provides an embedded domain-specific language (EDSL) for SQL and superposes type-safe JOINs atop of Persistent SQL backends. As of writing this, Esqueleto supports only type-safe SELECT, UPDATE, INSERT, and DELETE queries. It does not provide for all SQL features. So we can either implement them by hand whenever necessary, which is relatively easy, or switch to the fall-back mode of explicit-string queries.

We can use both libraries in the same module, in which case we would have to import at least one of them qualified, so as to avoid identifier conflicts.2

-- For a module that mostly uses esqueleto.
import Database.Esqueleto
import qualified Database.Persistent as P

or import esqueleto itself qualified:

-- For a module that uses esqueleto just on some queries.
import Database.Persistent
import qualified Database.Esqueleto as E

Our plan is to first learn to use Persistent, and then extend it with Esqueleto.

Fundamental Definitions

Here is the fundamental correspondence3:

SQL DBMS Persistent


Data types PersistValue Column PersistField Table PersistEntity

Also consider this migrations (conversions) table for marshalling4:

HaskellPostgreSQLMySQLMongoDBSQLite
TextVARCHARTEXTStringVARCHAR
ByteStringBYTEABLOBBinDataBLOB
IntINT8BIGINT(20)NumberLongINTEGER
DoubleDOUBLE PRECISIONDOUBLEDoubleREAL
RationalNUMERIC(22,12)DECIMAL(32,20)unsupportedNUMERIC(32,20)
BoolBOOLEANTINYINT(1)BooleanBOOLEAN
DayDATEDATENumberLongDATE
TimeOfDayTIMETIME5unsupportedTIME
UTCTime6TIMESTAMPDATETIME5DateTIMESTAMP

And, in particular, for MySQL and MariaDB:

Haskell typeCompatible MySQL types
BoolTiny
Int8Tiny
Int16Tiny, Short
Int32Tiny, Short, Int24, Long
IntTiny, Short, Int24, Long, LongLong7
Int64Tiny, Short, Int24, Long, LongLong
IntegerTiny, Short, Int24, Long, LongLong
Word8Tiny
Word16Tiny, Short
Word32Tiny, Short, Int24, Long
Word64Tiny, Short, Int24, Long, LongLong
DoubleFloat, Double, Decimal, NewDecimal, Tiny, Short, Int24, Long
Ratio IntegerFloat, Double, Decimal, NewDecimal, Tiny, Short, Int24, Long, LongLong
ByteStringVarChar, TinyBlob, MediumBlob, LongBlob, Blob, VarString, String, Set, Enum
Lazy.ByteStringVarChar, TinyBlob, MediumBlob, LongBlob, Blob, VarString, String, Set, Enum
Encoding.Text8VarChar, TinyBlob, MediumBlob, LongBlob, Blob, VarString, String, Set, Enum
Lazy.TextVarChar, TinyBlob, MediumBlob, LongBlob, Blob, VarString, String, Set, Enum
[Char], StringVarChar, TinyBlob, MediumBlob, LongBlob, Blob, VarString, String, Set, Enum
UTCTimeDateTime,Timestamp
DayYear, Date, NewDate
TimeOfDayTime

As of writing, there is no support for Word, Float, or Scientific yet.

Entity Syntax

Footnotes

  1. Persistent project page on GitHub.

  2. Esqueleto docs.

  3. The chapter on Persistent in Yesod Book.

  4. Persistent docs.

  5. “The default resolution for TIME and DATETIME in MySQL is one second. As of MySQL version 5.6.4, and persistent-mysql-2.6.2, fractional seconds are handled correctly if you declare an explicit precision by using sqltype. For example, appending sqltype=TIME(6) to a TimeOfDay field definition will give microsecond resolution.” 2

  6. “Support for ZonedTime was dropped in persistent 2.0. UTCTime can be used with timestamp without timezone and timestamp with timezone in PostgreSQL. See also the section below about timezone support.”

  7. “When Word size is 64bit.”

  8. “Utf8 only.”