One of the nasty things about OOP is that it enforces a certain discipline about modularizing your code. If your problem domain doesn't fit that discipline, you'll be fighting the object model all the way.
Functional languages, obviously, don't suffer from these nasty drawbacks. Unfortunately, they don't reap the nice stuff either. So when you switch to a functional language like Haskell or Erlang, you have to relearn many of the habits for "programming in the large" that work so well with object-oriented languages.
This is an attempt to catalog some of the usage patterns for modules in real-world, practical programs. The basic atoms will be elements familiar to most functional programmers:
- Types, defined as sets of values
- Functions, mapping values of one type to values of another
- Values - everybody knows what these are. Also includes constants and defaults
- Constructors. For this blog entry, "constructor" refers only to the language-level construct that creates a value of a specific type. For example, Java's "new" operator or Haskell's data tags. Logically, a factory method or creation function is just as much of a constructor as the in-language equivalents, but I'm trying to make useful distinctions as to what should or should not be exported from a module here.
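To make the four atoms concrete, here's a minimal sketch. It's written in Rust only because its `mod`/`pub` rules spell out exactly what a module exports; the `temperature` module and every name in it are made up for illustration.

```rust
// Hypothetical module illustrating the four atoms; none of these names
// come from a real library.
pub mod temperature {
    // Type: a set of values. The constructors `Celsius` and `Fahrenheit`
    // are exported along with the type, since enum variants are public.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Temperature {
        Celsius(f64),
        Fahrenheit(f64),
    }

    // Value: an exported constant.
    pub const FREEZING: Temperature = Temperature::Celsius(0.0);

    // Function: maps a value of one type to a value of another.
    pub fn to_celsius(t: Temperature) -> f64 {
        match t {
            Temperature::Celsius(c) => c,
            Temperature::Fahrenheit(f) => (f - 32.0) * 5.0 / 9.0,
        }
    }
}
```

A client that writes `temperature::Temperature::Fahrenheit(212.0)` is using an exported constructor; a client that only calls `temperature::to_celsius` is using an exported function. The patterns below differ mainly in which of these a module chooses to expose.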
There's one additional open question: are constraints an additional atom exported from modules, or are they modules themselves? By constraint, I mean something like a Java interface, a Haskell typeclass, or an OCaml signature or functor. Java and OCaml treat constraints as modules: a Java interface is a standalone entity that a class must implement, and an OCaml signature is a set of values that a module must export. Haskell treats them as ordinary named entities that are themselves exported from modules. The Haskell approach is more flexible, but leads to many questions about where to put the definition of a typeclass relative to the instances themselves.
Comments, corrections, and additional patterns are welcome. Like the original patterns book, this is supposed to be descriptive, not prescriptive. It's intended to capture common problems that arise in real software engineering and catalog decent solutions in non-OO languages to those problems.
On to the patterns...
Abstract Data Type
An abstract data type (ADT) is a type and a set of related operations, all grouped together as one module. The internal structure of the type is not exposed - instead, functions to create and manipulate the type are part of the public interface of the module.
In terms of the atoms above, an ADT consists of:
- Zero or more imports - whatever you need for implementation
- One exported type
- One or more exported functions
- Zero or more exported values
- Zero exported constructors - the ADT should hide its internal implementation
- One (tautological) constraint - any implementation of the ADT exports precisely the functions and values that it exports
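The list above can be sketched roughly as follows. This is an illustrative Rust version, not a canonical implementation: the `stack` module and its function names are hypothetical, and the operations copy the underlying vector to keep the interface purely functional rather than efficient.

```rust
// A sketch of the ADT pattern; `stack` and its functions are hypothetical.
pub mod stack {
    // One exported type. The field is private, so code outside this
    // module cannot build or inspect a Stack directly: zero exported
    // constructors.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Stack<T> {
        items: Vec<T>,
    }

    // Exported creation function, standing in for the hidden constructor.
    pub fn empty<T>() -> Stack<T> {
        Stack { items: Vec::new() }
    }

    // Exported functions that manipulate the type. Each returns a new
    // Stack instead of mutating its argument.
    pub fn push<T: Clone>(s: &Stack<T>, x: T) -> Stack<T> {
        let mut items = s.items.clone();
        items.push(x);
        Stack { items }
    }

    // Returns the top element and the remaining stack, or None if empty.
    pub fn pop<T: Clone>(s: &Stack<T>) -> Option<(T, Stack<T>)> {
        let mut items = s.items.clone();
        items.pop().map(|top| (top, Stack { items }))
    }

    pub fn is_empty<T>(s: &Stack<T>) -> bool {
        s.items.is_empty()
    }
}
```

Because `items` is private, swapping the `Vec` for a linked list later would not break any client, which is the whole point of hiding the constructor.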
A somewhat related pattern is the GUI Widget. Oftentimes, you'll want a component to display certain information on screen, without worrying about the precise formatting or layout. The widget can be viewed as an ADT with certain functions to set the data and the ability to register callbacks (in functional languages, these would be closures) on relevant actions that the user might perform. Widgets are tricky in purely functional languages because they're inherently side-effecting. Oftentimes, this means you export IO actions instead of functions.
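As a rough illustration of a widget-as-ADT, here's a sketch in Rust with no real GUI toolkit behind it. The `counter_widget` module, its functions, and the simulated `click` are all assumptions made for the example; in Rust the side effects show up as `&mut` parameters, where a pure language would export IO actions instead.

```rust
// Hypothetical widget sketch; no real GUI toolkit is involved.
pub mod counter_widget {
    // Internal state and registered callbacks stay private.
    pub struct CounterWidget {
        label: String,
        count: u32,
        on_click: Vec<Box<dyn FnMut(u32)>>,
    }

    pub fn new(label: &str) -> CounterWidget {
        CounterWidget {
            label: label.to_string(),
            count: 0,
            on_click: Vec::new(),
        }
    }

    // Set the data shown by the widget.
    pub fn set_label(w: &mut CounterWidget, label: &str) {
        w.label = label.to_string();
    }

    // Register a closure to run whenever the widget is clicked.
    pub fn on_click(w: &mut CounterWidget, callback: impl FnMut(u32) + 'static) {
        w.on_click.push(Box::new(callback));
    }

    // Simulate a user click: update state, then notify every callback
    // with the new count.
    pub fn click(w: &mut CounterWidget) {
        w.count += 1;
        let count = w.count;
        for callback in w.on_click.iter_mut() {
            callback(count);
        }
    }

    // The widget decides the formatting; callers only supply data.
    pub fn render(w: &CounterWidget) -> String {
        format!("[{}: {}]", w.label, w.count)
    }
}
```

The caller never sees how the label is laid out or how callbacks are stored; it only sets data and registers closures.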
Another related pattern is the Plugin. Plugins are just like ADTs, except that the constraints are defined in a module far removed from the implementation, usually written by different people.
Examples: Data structures, GUI widgets, plugins, files, sockets
Related GoF patterns: Composite, Facade, State
OOP implementation: A class, or if implementation is omitted, an interface
Pipeline
Imagine you're building a compiler. Usually, compilers are divided into several stages: lexing, parsing, typechecking, flow & data analysis, optimization, and code-generation. The data types between those stages also vary: you have, respectively, a list of tokens, an AST, an AST with type information, an intermediate representation, a simpler intermediate representation, and output code. Each data representation is needed by the surrounding two phases, and each phase needs the surrounding two data representations.
Where do you put the code, and where do you put the data structures?
(There's also an additional problem in that the set of expressions may change frequently, but ignore that for now. Scala has claimed to have solved the expression problem, but I haven't looked at that yet.)
The Pipeline Pattern describes a situation with two main types of modules: code and data structures. Code modules are:
- Two imported data modules - the source and the sink
- Zero exported types
- One exported function (adjoining stages with the same data requirements may be collapsed into one module, giving several exported functions)
- Zero exported values (aside from constants or configuration that may be needed)
- Zero exported constructors
Data modules are:
- No imported code modules
- One exported type
- Zero or more exported functions (whatever's necessary for utilities)
- Zero or more exported values (again, whatever's convenient for defaulting)
- Many exported constructors, giving full access to the code modules.
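A toy version of this layout, sketched in Rust. Everything here is hypothetical and far simpler than a real compiler: the "language" is just sums of integers, the parser is deliberately naive, and the module names are invented for the example.

```rust
// Data modules: one exported type each, with constructors fully public
// so the code modules on either side can build and pattern-match on them.
pub mod tokens {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Token {
        Num(i64),
        Plus,
    }
}

pub mod ast {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
    }
}

// Code modules: each imports its source and sink data modules and
// exports one function. No code module imports another code module.
pub mod lexer {
    use super::tokens::Token;

    pub fn lex(src: &str) -> Result<Vec<Token>, String> {
        src.split_whitespace()
            .map(|w| match w {
                "+" => Ok(Token::Plus),
                _ => w
                    .parse::<i64>()
                    .map(Token::Num)
                    .map_err(|_| format!("bad token: {}", w)),
            })
            .collect()
    }
}

pub mod parser {
    use super::ast::Expr;
    use super::tokens::Token;

    // Deliberately naive: folds every number into one left-nested sum
    // and does not check where the `+` tokens sit.
    pub fn parse(toks: &[Token]) -> Option<Expr> {
        let mut nums = toks.iter().filter_map(|t| match t {
            Token::Num(n) => Some(Expr::Lit(*n)),
            Token::Plus => None,
        });
        let first = nums.next()?;
        Some(nums.fold(first, |acc, e| Expr::Add(Box::new(acc), Box::new(e))))
    }
}

pub mod evaluator {
    use super::ast::Expr;

    pub fn eval(e: &Expr) -> i64 {
        match e {
            Expr::Lit(n) => *n,
            Expr::Add(a, b) => eval(a) + eval(b),
        }
    }
}
```

Only the top-level driver needs to know the order of the stages; it would chain `lexer::lex`, `parser::parse`, and `evaluator::eval`, while each stage knows nothing beyond its own input and output types.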
Examples: Compilers, digital signal processing, UNIX scripting
Related GoF patterns: Chain of responsibility, Decorator
OOP implementation: I don't know of any convenient one. My Compiler Design professor punted on this and gave us skeleton code with public data everywhere. Some other OO applications (more in the stream-processing area than in compilers) make each phase dependent upon the one before it, giving each intermediate representation the responsibility for creating itself from the phase before.
Gatekeeper
Imagine that you're implementing a credit card transaction system. Inside, you have all sorts of modules that perform logging, fee collection, cash-back, over-the-wire transfers, and so on. But you want to provide a single method to clients. And you want to prevent them from directly accessing any of the individual sub-modules, because it wouldn't be good if they could wire themselves money without debiting the customer's account.
In terms of the atoms above, you want to define a small set of types, functions, and values (probably no constructors) to present to the user, and hide all the implementation details that this module imports. The former part is easy; the latter sounds just as easy, yet very few languages give direct support for it. Java provides package access, but often the submodules in question are themselves packages, or even instances of the Gatekeeper pattern. Other than that, few languages let you limit the scope of a whole module to a single other module.
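For what it's worth, nested private modules are one way to get this kind of scoping. The sketch below uses Rust, where a `mod` without `pub` is visible only to its parent and the parent's descendants; the `payments` module, the flat fee, and all function names are invented for illustration and bear no relation to a real payment system.

```rust
pub mod payments {
    // Private submodules: invisible to anything outside `payments`.
    mod ledger {
        // Returns the new balance, or None if the debit is not allowed.
        pub fn debit(balance_cents: i64, amount_cents: i64) -> Option<i64> {
            if amount_cents > 0 && amount_cents <= balance_cents {
                Some(balance_cents - amount_cents)
            } else {
                None
            }
        }
    }

    mod fees {
        // Hypothetical flat fee, in cents.
        pub const FEE_CENTS: i64 = 30;
    }

    // The small public surface: one opaque type plus a few functions.
    #[derive(Debug, PartialEq)]
    pub struct Receipt {
        charged_cents: i64,
        remaining_cents: i64,
    }

    pub fn charged_cents(r: &Receipt) -> i64 {
        r.charged_cents
    }

    pub fn remaining_cents(r: &Receipt) -> i64 {
        r.remaining_cents
    }

    // The single entry point: the fee and the debit always happen together.
    pub fn charge(balance_cents: i64, amount_cents: i64) -> Option<Receipt> {
        let total = amount_cents.checked_add(fees::FEE_CENTS)?;
        let remaining = ledger::debit(balance_cents, total)?;
        Some(Receipt {
            charged_cents: total,
            remaining_cents: remaining,
        })
    }
}
```

From outside, `payments::charge` works, but a call to `payments::ledger::debit` is rejected at compile time because `ledger` is private to `payments`.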
A related pattern is the Handy Toolbox. In this case, the facade exists to simplify access to commonly-used functions, but access to more advanced features is not restricted. This pattern needs no special language support and is both easy and commonly used. Examples include JFreeChart and the Haskell Prelude.
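A Handy Toolbox can be as small as a module of re-exports. Here's a sketch in Rust using `pub use`; the `text`, `numbers`, and `toolbox` modules are hypothetical stand-ins for a larger library.

```rust
// Hypothetical "advanced" modules that stay fully public.
pub mod text {
    pub fn shout(s: &str) -> String {
        s.to_uppercase()
    }

    pub fn whisper(s: &str) -> String {
        s.to_lowercase()
    }
}

pub mod numbers {
    pub fn clamp_percent(n: i32) -> i32 {
        n.clamp(0, 100)
    }

    pub fn is_even(n: i32) -> bool {
        n % 2 == 0
    }
}

// The toolbox: re-exports the commonly used functions under one name,
// without restricting direct access to `text` or `numbers`.
pub mod toolbox {
    pub use super::numbers::clamp_percent;
    pub use super::text::shout;
}
```

Most callers import only `toolbox`; anyone who needs `text::whisper` or `numbers::is_even` can still reach past it.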
Examples: credit-card processing, web services, databases. Almost any large system will have this pattern.
Related GoF patterns: Facade, Mediator
OOP Solution: Many Java programs keep a separate source tree of "internal" classes. Access to these is not restricted by the language, but they are not JavaDocced and their use is heavily discouraged, and sometimes restricted by ClassLoader.
Extension
Sometimes you already have a datatype, and you'd like to add some new operations to it. This is very difficult in most OO languages (unless you can subtype and create all the instances you'll be dealing with), leading to hacks like open classes. It's surprisingly easy in functional languages, leading to this pattern.
The Extension Pattern defines:
- One imported data type (plus whatever else you need to implement the extension)
- Zero exported types
- One or more exported functions
- One or more exported values
- Zero exported constructors
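A small sketch of an extension module, written in Rust and extending the standard string type. The module name `string_ext` and both functions are hypothetical examples of the "one more String method" everybody ends up writing.

```rust
pub mod string_ext {
    // Exported value: a default used by the functions below.
    pub const ELLIPSIS: &str = "...";

    // Exported functions over the imported type (`str` from the
    // standard library). No new types, no constructors.
    pub fn truncate_with_ellipsis(s: &str, max_chars: usize) -> String {
        if s.chars().count() <= max_chars {
            s.to_string()
        } else {
            let kept: String = s.chars().take(max_chars).collect();
            format!("{}{}", kept, ELLIPSIS)
        }
    }

    // Compares letters and digits only, ignoring case.
    pub fn is_palindrome(s: &str) -> bool {
        let chars: Vec<char> = s
            .chars()
            .filter(|c| c.is_alphanumeric())
            .flat_map(|c| c.to_lowercase())
            .collect();
        chars.iter().eq(chars.iter().rev())
    }
}
```

Nothing about the string type itself changed; the new operations are ordinary functions that happen to live in a different module from the data.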
Examples: Every programmer in the world has probably had the experience where they wish they had just one more String method, yet the brain-dead language designers left it out. Extensions are also often needed for many other common data types, such as GUI widgets, collections, dates, and XML.
Related GoF patterns: Decorator, Strategy, Command, Adapter
OOP Solution: In many OO languages, extension methods end up in a generic "Utils" class or package. Virtually every project has one.
Feature Matrix
This example is in almost every introductory OO textbook. Imagine that you have a generic shape class. This class has Triangle, Rectangle, Circle subclasses, each of which is responsible for rendering itself.
Now imagine that you need to print these shapes. Each printer has a completely different rendering mechanism, so you can't factor out commonalities with a Bridge pattern. The total number of code blocks you must write is the cartesian product of (printers x shapes).
This is a hard problem that inherently leads to a mess, since code is one-dimensional. It's somewhat less messy in languages like Haskell and Common Lisp that support multi-parameter typeclasses and multimethods, respectively. The proposed solution:
Have one module that defines a multi-parameter typeclass. This module has:
- Zero imported data types (other than what's necessary for auxiliary functions)
- One exported constraint (the typeclass)
- whatever else can be built upon that typeclass
Then have one module for each data type (each shape and each printer). Each of these has:
- Zero imported data types - don't import the typeclass here...
- One exported type
- Zero or more exported functions - whatever's convenient
- Zero or more exported values
- Many exported constructors - you probably can't avoid exposing the internal representation to the following modules, though if you can, just expose functions.
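Finally, have one module for each (shape, printer) combination, holding the instance. Each of these has:
- One imported typeclass (the first module)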
- Two imported data structures - one each from shapes and printers
- Zero exported types
- One or more exported functions - the implementation of the typeclass
- Zero exported values
- Zero exported constructors
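Here's one way the three kinds of modules might look, sketched in Rust, where a trait with a type parameter stands in for a two-parameter typeclass. All names are hypothetical, the rendering is a toy, and for brevity the four instances share a single `instances` module instead of getting one module each.

```rust
// First module: only the constraint. `Render<P>` plays the role of a
// two-parameter typeclass over (shape, printer).
pub mod render {
    pub trait Render<P> {
        fn render(&self, printer: &P) -> String;
    }
}

// Data modules: they know nothing about the typeclass.
pub mod shapes {
    pub struct Circle {
        pub radius: u32,
    }
    pub struct Rectangle {
        pub width: u32,
        pub height: u32,
    }
}

pub mod printers {
    pub struct Svg;
    pub struct Ascii {
        pub fill: char,
    }
}

// Instance code: one block per (shape, printer) cell of the matrix.
pub mod instances {
    use super::printers::{Ascii, Svg};
    use super::render::Render;
    use super::shapes::{Circle, Rectangle};

    impl Render<Svg> for Circle {
        fn render(&self, _p: &Svg) -> String {
            format!("<circle r=\"{}\"/>", self.radius)
        }
    }

    impl Render<Ascii> for Circle {
        fn render(&self, p: &Ascii) -> String {
            format!("({})", p.fill)
        }
    }

    impl Render<Svg> for Rectangle {
        fn render(&self, _p: &Svg) -> String {
            format!("<rect width=\"{}\" height=\"{}\"/>", self.width, self.height)
        }
    }

    impl Render<Ascii> for Rectangle {
        fn render(&self, p: &Ascii) -> String {
            let row: String = std::iter::repeat(p.fill)
                .take(self.width as usize)
                .collect();
            vec![row; self.height as usize].join("\n")
        }
    }
}
```

Adding a new printer means one new data module and one new instance per shape; nothing in `shapes` or `render` has to change. The cartesian product of code blocks is still there, but at least each cell has an obvious home.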
Examples: The "expression problem", mentioned above, is an instance of this. It also often occurs when combining portability across different backend services with extensibility along different problem dimensions.
Related GoF patterns: Visitor
OOP Solution: Visitor, basically. OOP sucks for this.
Global Service
Almost every project has some features that need to be reachable from everywhere in the program. Logging is one of these; so are configuration and localization.
These almost invariably involve side-effects, which makes them difficult to program in a pure functional language. For this pattern, I'll assume that an IO action (Haskell) or message to a daemon process (Erlang) is good enough. State-management and side-effects are a whole other can of worms.
Define a module that has:
- Zero or more imports (whatever's necessary)
- Zero or one exported data types - often "IO ()" is perfectly fine
- A small set of exported functions
- Zero exported values, except possibly some configuration options - clients should access the service through the functions above
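A bare-bones logging service along these lines, sketched in Rust. The `logger` module is hypothetical, it keeps messages in memory instead of writing to a file or daemon so the example stays self-contained, and the `static` initializer assumes a toolchain where `Mutex::new` is usable in constants (Rust 1.63 or later).

```rust
pub mod logger {
    use std::sync::Mutex;

    // The single piece of shared state, private to the module.
    static LINES: Mutex<Vec<String>> = Mutex::new(Vec::new());

    // Exported value: a configuration default.
    pub const PREFIX: &str = "[app]";

    // Exported function: the side-effecting entry point that any part
    // of the program can call.
    pub fn info(msg: &str) {
        let mut lines = LINES.lock().unwrap();
        lines.push(format!("{} {}", PREFIX, msg));
    }

    // Exported function: a copy of everything logged so far.
    pub fn snapshot() -> Vec<String> {
        LINES.lock().unwrap().clone()
    }
}
```

Clients call `logger::info("started")` from anywhere and never touch `LINES` directly, so the storage could later move to a file or a separate process without changing any call site.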
Related GoF patterns: Singleton
Current OOP solution: Almost always singleton. Some try to dress it up as a configurable service with a "getDefault()" method, but I've never known someone to use anything other than getDefault() as their instance.