## Wolfgang Banzhaf and Lidia Yamamoto

Print publication date: 2015

Print ISBN-13: 9780262029438

Published to MIT Press Scholarship Online: September 2016

DOI: 10.7551/mitpress/9780262029438.001.0001


# Basic Concepts of Artificial Chemistries

Chapter:
Chapter 2 Basic Concepts of Artificial Chemistries
Source:
Artificial Chemistries
Publisher:
The MIT Press
DOI:10.7551/mitpress/9780262029438.003.0002

# Abstract and Keywords

This chapter covers some basic concepts that constitute the pillars of any artificial chemistry. Starting with principles of computational modeling and simulation, we then move on to some notions of real chemistry that often appear in artificial chemistries as well, such as chemical reaction networks, concentration dynamics, equilibrium, energy and thermodynamics. The general structure of an artificial chemistry is then laid out, followed by a few important distinctions between AC approaches, some simple examples of ACs, and some frequently used AC techniques.

Computer models and simulation attempt to achieve the ultimate goals of science and thus represent the ultimate extension of science.

CHARLES BLILIE, THE PROMISE AND LIMITS OF COMPUTER MODELING, 2007

Before we can discuss the general structure of an artificial chemistry, we have to set the stage for this discussion by addressing two topics that together constitute the pillars of any artificial chemistry: the first one is a topic of very broad relevance to all of science and engineering, a topic fraught with misunderstandings: modeling and simulation. The second pillar is the topic of real chemistry, which is used as a source of inspiration for artificial chemistries, or as the target system to be modeled. In the first section of this chapter, we will therefore address the principles behind modeling and simulation, and try to argue for their value in the current world, dominated by computers. In section 2.2 we will introduce some basic concepts of chemistry that are often used in ACs. Note that in contrast to chemistry textbooks, we will present these concepts from a computational perspective. Section 2.3 will then introduce us to the general structure of an artificial chemistry. From the vantage point of modeling and simulation, we will then see that the general structure of an artificial chemistry is nothing but a specialization of a simulation tool, geared toward the interactions possible in (artificial) chemistries. Section 2.4 supplies a few important distinctions between AC approaches. Section 2.5 then introduces a few simple examples of ACs, a discussion that will be beefed up by later chapters. A number of frequently used AC techniques are then addressed in section 2.6, among them the notion of space brought about by the possibility to define a topology among reacting objects. Because we are still in the introductory part of the book, we try to stay as high-level as possible in this chapter.

# 2.1 Modeling and Simulation

As we try to understand the world around us, humankind has adopted a number of successful strategies, some of which have now been formalized. Epistemology is the branch of philosophy that concerns itself with how humans acquire and process knowledge. Empirical strategies of acquiring knowledge center on experience and how it can be used to learn about the world. The emergence of science is closely connected to the recognition that if the acquisition of knowledge

Table 2.1 Empirical Hierarchy: From Data to Understanding

| Level       | From                    | To                          |
|-------------|-------------------------|-----------------------------|
| Elements    | Data                    | Relations between facts     |
| Tools       | *m* experimental trials | *n* hypotheses              |
| Methodology | Experiment              | Theory                      |
| Result      | Observation             | Explanation / Understanding |

is subjected to certain constraints, often termed the scientific method, this will bear fruit not only in the form of an increased understanding of the world around us, but also by allowing us to predict phenomena and to apply them to satisfy human needs.

At the core of the scientific method is the interaction between observation and explanation/understanding. We say that we understand a phenomenon observed either through our senses or by using scientific instruments if we can reproduce the phenomenon using our understanding, and if we further can predict the outcome of hitherto unobserved phenomena. In order to reach this goal, science works with two complementary tools: well-defined experiments and theories, the former to observe phenomena and the latter to explain them in a logically consistent way.

On the experimental side, many independent trials need to be conducted, possibly under different initial or boundary conditions. On the theoretical side, many hypotheses need to be formulated as well, in order to test different aspects of the phenomenon under study and to gradually refine our understanding and achieve a level of explanatory power that merits the term “theory” for a consistent set of hypotheses. In a sense, this is an evolutionary process involving some trial and error, that will ultimately lead to a better reflection of reality in our theories.

Experiments under constrained conditions produce data that need to be analyzed and checked against the predictions of hypotheses. The hypotheses themselves try to uncover relations among these data. This is why correlation analysis plays such a fundamental role for scientists trying to discover simple hypotheses about data: The simplest relations, i.e., linear relations, can be easily detected by this tool.

Where in all this do we locate models and simulations? The short answer is: at the core, right in between theory and experiment. A simulation produces data (and thus stands in for an experiment) based on a model (standing in for a theory) formulated as a computer program.

The longer answer is this: With the advent of computers, the traditional scientific method has been augmented and refined to produce an even more efficient interaction between theory and experiment, provided the theory can be translated into a computer model, which is an algorithm incorporating the relationship and transition rules between factual objects. This computer algorithm can then be executed as a program to run a simulation which could be termed a model in motion [122]. The typical work cycle of a scientist would therefore read: Theory — Computer Model (algorithm/program) — Simulation — Experiment. Simulation sometimes replaces, sometimes only precedes an experiment. It allows science to progress more efficiently in that not all experimental trials actually need to be carried out, potentially saving time, materials, lives of laboratory animals, and so on.

Simulations are built upon a model of the system under study. A model is the next step from a theory or hypothesis toward an experiment, in that it is a theoretical construct attempting to replicate a part of reality. Bunge defines a model to be a general theory plus a special description of an object or a system [144]. The term “model” is often used synonymously with “theory,” but one should avoid this use. As we can see from Bunge’s definition, a theory is more ambitious than a model in explaining phenomena, since it strives to be more general.

Blilie states [122] that computer models and simulation “attempt to achieve the ultimate goals of science and thus represent the ultimate extension of science.”

In the preceding discussion, we have completely glossed over a very important point: in what sense models reflect, or represent, reality. This is a deep question that we cannot address in our short discussion here. Interested readers are referred to the literature on epistemology [671]. However, one point is too important not to be mentioned: Models are always caricatures of the real system they intend to represent. They are abstractions that reduce the complexity of the real system by projecting it into subspaces, e.g., by neglecting certain interactions between the elements of the original system or by insulating it from its environment. Models have a purpose, which dictates the kind of simplification of the original system, or the perspective from which it is considered.

Models come in different flavors: physical, imaginary, and conceptual. The following discussion will only be concerned with conceptual models. Computer models are conceptual models, based on the formalism of algorithms. There are other well-known conceptual models, e.g., based on logic or on mathematics, but algorithms are a broader category, and we believe they include the latter two. Algorithmic conceptual models of systems have the advantage that they can be simulated, i.e., executed on computers. This execution produces behavior that can be observed and compared to the behavior of the original system under study. Depending on the input of the computer model, various types of behavior might be observable, and the range of these behaviors might be similar to or completely different from the range of behaviors the original system exhibits. Based on such comparisons, computer models can be refined. Different inputs produce predictions of behavior that can be confirmed or falsified by observing the real system.

It is an interesting question whether mathematical and algorithmic models are equivalent to each other. Clearly, mathematical descriptions like differential equations have their counterpart in algorithms. However, do algorithmic implementations always have mathematical descriptions? The issue we raised in the previous chapter, whether one could formulate a mathematical framework based on chemistry in which instances of symbols “react” with each other to produce instances of new symbols, is closely connected to this question. If, as we stated, the algorithmic formalism contains the mathematical formalisms but is not equivalent to them, the question is nontrivial. Perhaps one type of formalism can be considered a model of the other, with all the connotations we previously mentioned about models (abstraction, simplification, projection)?

Once a mathematical or algorithmic model of the system under study has been constructed, a computer simulation of the model can then be implemented. Typically, the simulation algorithm will try to mimic the behavior of the real system in time, by a set of transition rules that define how the system moves from an initial state to future states. What happens if we let the simulation algorithm execute? It will show a particular behavior, based on its input and initial conditions. The behavior can be followed by observing its state changes and collecting this information to compare it with different starting conditions and inputs, and the real system it models.
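The simulation loop just described, an initial state advanced by transition rules while an observer records state changes, can be sketched in a few lines. This is a generic illustration of the idea only; the function `simulate` and the toy decay system below are our own, not taken from the book.

```python
# Generic discrete-time simulation loop: state, transition rule, observer.
# A sketch of the idea; all names and the toy system are illustrative.

def simulate(state, transition, observe, steps):
    """Advance `state` with `transition`, recording `observe(state)` each step."""
    trace = [observe(state)]
    for _ in range(steps):
        state = transition(state)
        trace.append(observe(state))
    return trace

# Toy system: a quantity decaying by 10% per discrete time step.
trace = simulate(state=100.0,
                 transition=lambda x: 0.9 * x,
                 observe=lambda x: round(x, 1),
                 steps=3)
print(trace)  # [100.0, 90.0, 81.0, 72.9]
```

Changing the initial state or the transition rule explores different scenarios, exactly the kind of comparison between runs and starting conditions described above.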

However, there is more to it than simply the behavior of known components. In fact, the observation of system behavior needs to be preceded by the determination of which state variables (or collections of state variables) should be observed at all. In principle, a simulation of a system can produce an infinite amount of data. We therefore have to restrict ourselves to those quantities we believe will be relevant. This might not be determined in one go; rather, an iterative process of simulation and observation might be necessary to focus on relevant quantities. This has to do with the constructive nature of a simulation. Rasmussen and Barrett [690] propose to formulate an entire theory of simulation, based on the power of computer systems to execute algorithms and produce phenomena.

Before we can understand this latter aspect, a few words are in order about the goal of simulations. Again, following Blilie [122], we can discern three goals of computer models and their dynamic execution: (1) a unified representation of physical phenomena, (2) a unified representation of human society and history, and (3) construction and presentation of different, seemingly realistic alternate worlds. Blilie further divides the latter into “simulated realities” useful for, e.g., training purposes and “fictional realities” as they appear in gaming and entertainment applications. The playful aspect of the latter is something we need to come back to.

All three goals have in common that the elements of a computer model, when set in motion and interacting with each other in a simulation, will show a coherent dynamic behavior that can only be captured when we observe the system at some higher level, e.g., above the level of simple elements. We shall see an example of this characteristic feature in the next chapter. For the moment, we refer the reader to the famous “Game of Life” simulation [111] as an illustration (see chapter 10). One of the features of this game is to exhibit coherent patterns on a spatially extended grid of cellular automata, such as gliders and blinkers. One might then look at those “blinkers” and “gliders” and other coherent patterns, count their numbers, measure the speed with which they traverse the universe, or observe some other quantities related to them. In this way, computer simulations can and usually do expose higher-level regularities in the underlying simulated system, a source of complex (emergent) behavior from possibly simple objects and interaction rules.

In general, simulations will, by virtue of being virtual, allow us to explore different scenarios and possibilities for a system, depending on the input and initial conditions of the simulation. We are interested here in event-driven simulations, that is, simulations whose natural timing is determined by discrete elementary moments: events like the collision of two atoms. Simulations of this type are called discrete-time simulations, in contrast to continuous-time simulations, in which time is considered a continuous quantity.

According to [690], in order for a simulation to be possible, four systems must be assumed to exist: (1) the real system $\Sigma_R$ to be simulated; (2) a simulation system $\Sigma_S$ with an update rule U; (3) subsystems $\Sigma_{S_i}$ with $S_i \in M$, M being the set of models including their rules of interaction; and finally (4) a formal computing machine $\Sigma_C$ on which the simulation is run. These requirements implicitly assume that we are interested in studying the emergent phenomena that come about by simulating the interaction of the subsystems $S_i$, which is a valid assumption in many contexts.

In practical terms, we are here mainly concerned with (2) and (3), since those are the core of the modeling and simulation process. The setup of a simulation system proceeds through the following steps [122]: system identification; determination of boundary conditions; determination of system components and their relations; determination of component properties and quantifiable state variables; setting up of transition rules between states (of subsystems) and their relations to each other; determination of the simulation management; implementation; verification; and validation. All of this has to happen against the background of a clearly defined modeling purpose.

It is the modeling purpose that dictates the perspective of the modeler, and ultimately the expectations that we have for the results of the simulation. Do we expect to find data that mimic the original system $\Sigma_R$ as closely as possible? Do we want to show that the behavior of both systems, $\Sigma_R$ and $\Sigma_S$, is qualitatively the same? Do we only want to make statistical statements to the effect that the class of behavior in both systems is comparable? Finally, do we hope to find unexpected phenomena in $\Sigma_S$ that we can use to predict the behavior of $\Sigma_R$ under certain circumstances?

Turning our attention now to artificial chemistries, a preliminary comment is in order. A frequently expressed criticism of AC approaches runs as follows: If ACs wish to study real chemistry, they need to become much more sophisticated and realistic than they are at present; they are currently weak representations of real systems. If, however, ACs aim at exploring the realm of possible chemistries, without a necessary connection to reality, what is the point? They look like mere entertainment and idle play. Our answer is that the goals of ACs are various, and cannot be fixed in general. Each AC, by virtue of being a model, looks at real chemistry from a different perspective. As this book will discuss in later chapters, the spectrum is indeed very wide. There are very realistic artificial chemistries, wet ACs, that involve the interaction of matter. Next are ACs that aim at simulating real chemistry. At the other end of the spectrum are playful variants not aimed at current chemistries. In a sense, these make use of the formal nature of the simulation process, which allows them to probe virtual spaces not bound to reality. Some are aimed at visualization and demonstration of AC concepts. The same concern has been raised against mathematics for exploring universes of possible mathematical worlds without being bound by reality. In the case of mathematics, this playful exploration of the realm of the possible has in many cases yielded unexpected insights and connections to reality. It is our strong belief that artificial chemistries, based on firm principles of modeling and simulation, can achieve the same: unexpected bridges into reality and real systems that we do not even dream of now. Without an exploration of the possible, we shall never understand the real.

# 2.2 Chemistry Concepts

Before we can discuss in more detail how to build an artificial chemistry in a computer, we need to understand a few basic concepts of how chemistry can be modeled and simulated. Thus, this section will now introduce some of the notions used to treat chemistry computationally.

In a chemical reaction vessel, a number of objects (molecules) interact via chemical reactions that transform these molecules according to the rules of chemistry. A molecular species is the family of substances with the same molecular composition and shape. Substances may be in a solid, liquid, or gas phase. The liquid phase plays a prominent role in artificial chemistries, since it enables the complex interactions at the origins of life. Some artificial chemistries are also based on gases, and a few consider the combination of multiple phases and phase transitions.

Two types of vessels can be distinguished: well-stirred and spatial. They are depicted in figure 2.1. In a well-stirred (or well-mixed) vessel, the actual position of each molecule in space has no impact on the overall outcome of the reactions taking place inside. In terms of an artificial chemistry, this means that the probability that a group of molecules react is independent of their position. This is in contrast to spatial reactors such as Petri dishes, where chemicals form patterns in space. These patterns depend on the way chemicals move and interact with other nearby chemicals. This second type of reactor will be covered later in this chapter and throughout the book.

For now, we focus on the well-stirred case. In this case, the state of the system is fully described by the concentration of each species present in the reactor. The concentration measures

Figure 2.1 Examples of reaction vessels with molecules inside. Left: Well-stirred tank reactor in which all molecules are equally likely to collide with each other and react. Right: Screenshot of the Organic Builder [413], a 2D spatial artificial chemistry akin to a virtual Petri dish, where nearby molecules can collide and react (in this example: yellow ’a’-type molecules bond together to form a cell-like structure).

the number of (moles of) molecules per unit of volume in a vessel. For a given molecular species S, the concentration of S is denoted by [S] and is measured as:

(2.1)
$$[S] = \frac{n_S}{N_A\,V}$$
where:
• $n_S$ is the number of molecules of species S in the vessel;

• $N_A$ is the Avogadro constant: $N_A = 6.02214 \times 10^{23}$ molecules per mole;

• V is the volume of the vessel.
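Equation 2.1 translates directly into code. The sketch below is our own illustration (the function name and example values are not from the book), with a one-molar sanity check.

```python
# Concentration [S] = n_S / (N_A * V), cf. equation 2.1.
# Illustrative sketch; function name and example values are our own.

N_A = 6.02214e23  # Avogadro constant, molecules per mole

def concentration(n_molecules, volume_litres):
    """Concentration in mol/L of a species, from a molecule count and a volume."""
    return n_molecules / (N_A * volume_litres)

# One mole of molecules dissolved in one litre gives a 1 mol/L solution:
print(concentration(6.02214e23, 1.0))  # 1.0
```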

A chemical reaction is a mass-conserving transformation from a multiset of “input” molecules (educts, or reactants) to a multiset of “output” molecules (products). A multiset is a collection of objects in which elements may occur more than once. More about multisets will appear in section 2.3.3. In the following chemical reaction, one molecule of nitrogen and three molecules of hydrogen react to form two molecules of ammonia:

(2.2)
$$\mathrm{N_2} + 3\,\mathrm{H_2} \xrightarrow{k} 2\,\mathrm{NH_3}$$
The educt multiset is therefore e = {N2, H2, H2, H2}, and the product multiset is p = {NH3, NH3}. The coefficient k is the kinetic coefficient or rate coefficient of the reaction, and it measures the speed of the reaction when one mol of each needed molecule is present (in this case, one mol of N2 and three of H2).
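Multisets of educts and products map naturally onto Python’s `collections.Counter`. The sketch below, with variable names of our own choosing, applies reaction 2.2 once to a small vessel, itself represented as a multiset of molecules.

```python
from collections import Counter

# Educt and product multisets of reaction 2.2 (N2 + 3 H2 -> 2 NH3).
e = Counter({"N2": 1, "H2": 3})
p = Counter({"NH3": 2})

# The vessel contents are themselves a multiset of molecules.
vessel = Counter({"N2": 2, "H2": 4})

# Fire the reaction once if enough educts are present:
# remove the educt multiset, add the product multiset.
if all(vessel[s] >= n for s, n in e.items()):
    vessel = vessel - e + p

print(sorted(vessel.items()))  # [('H2', 1), ('N2', 1), ('NH3', 2)]
```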

The inflow and outflow of substances in and out of a vessel can also be represented as chemical reactions, in this case, with addition or removal of mass. For instance, the following reaction represents the injection of molecules of species A at constant rate k:

(2.3)
$$\emptyset \xrightarrow{k} A$$
The reaction below represents the removal of B:
(2.4)
$$B \xrightarrow{k} \emptyset$$
Removal reactions are also often used to model the degradation of substances into some inert chemical (e.g. a waste product) that is not considered for the system under study.

Reactions are often reversible, with different coefficients on the forward and backward reaction. This is represented as:

(2.5)
$$A + B \underset{k_r}{\overset{k_f}{\rightleftharpoons}} C$$
In the preceding example, the forward reaction consumes A and B to produce C with a rate coefficient of $k_f$, and the reverse reaction produces A and B with coefficient $k_r$.

## 2.2.1 Chemical Reaction Networks

In a chemical reactor, multiple substances react in different ways, leading to a network of chemical reactions: a node in such a network represents either a reaction or a chemical species, while an edge linking two nodes represents the flow of substances participating in the reactions. The graph representing this network is bipartite, since chemicals always connect to reactions and not directly to other chemicals (similarly, reactions connect to chemicals and not directly to other reactions). For example, consider the following reactions:

(2.6)
$Display mathematics$
(2.7)
$Display mathematics$
(2.8)
$Display mathematics$
(2.9)
$Display mathematics$
The corresponding chemical reaction network is shown in figure 2.2, left. Often, chemical reaction networks are depicted with substances as nodes and reactions as edges, as in figure 2.2, right. This is simpler and more intuitive, though in some instances this simplification may be misleading.
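Since equations 2.6–2.9 are not reproduced here, the sketch below uses a made-up pair of reactions (A + B → C and its reverse) to show how such a bipartite reaction graph can be stored: species and reactions are the two node types, and each edge links a species to a reaction it enters or leaves.

```python
# Bipartite reaction graph: species nodes and reaction nodes.
# The reactions are hypothetical stand-ins, not equations 2.6-2.9.

reactions = {
    "r1": ({"A", "B"}, {"C"}),  # A + B -> C   (educts, products)
    "r2": ({"C"}, {"A", "B"}),  # C -> A + B
}

edges = []
for name, (educts, products) in reactions.items():
    for s in sorted(educts):
        edges.append((s, name))   # species -> reaction (flows in)
    for s in sorted(products):
        edges.append((name, s))   # reaction -> species (flows out)

# Bipartiteness check: every edge joins a species to a reaction,
# never a species to a species or a reaction to a reaction.
species = {"A", "B", "C"}
assert all((u in species) != (v in species) for u, v in edges)
print(edges)
```

The simplified species-only graph of figure 2.2 (right) can be derived from this structure by connecting the educts of each reaction directly to its products.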

Reaction networks are of great importance in chemistry and also in biology. The processes taking place in cells form complex reaction networks that systems biologists strive to analyze. Some basic analytical methods will be introduced in this chapter. The three major categories of cellular networks are: metabolic networks, genetic regulatory networks, and cell signaling networks. Artificial chemistries have been extensively used either to model such biological networks or to find alternative representations leading to analogous lifelike phenomena, such as artificial metabolic networks in which the chemical substances are represented by strings of letters from a given alphabet. Several examples of such chemistries will appear throughout this book.

Figure 2.2 Left: Bipartite reaction graph of the reactions symbolized by equations 2.6–2.9; circles are substances, squares are reactions. Right: Reaction graph with edges representing the reactions and nodes representing the participating substances.

## 2.2.2 Law of Mass Action

We have seen that reactions can be accompanied by a kinetic coefficient k linked to the speed of the reaction. The most common way to specify the speed of the reaction given its coefficient k is through the law of mass action. This law governs the way the kinetic coefficient k and the educt concentrations influence the speed v of the reaction: it states that the speed (rate) of a chemical reaction in a well-stirred reactor is proportional to the product of the concentrations of the reactants present (each raised to the power of its stoichiometric coefficient). According to this law, the rate of reaction 2.2 becomes:

(2.10)
$$v = k\,[\mathrm{N_2}]\,[\mathrm{H_2}]^3$$

The law of mass action stems from counting the number of possible collisions between the molecules involved in the reaction; it is only valid for large numbers of molecules and for reactions that proceed in a single step. Many reactions in nature are composed of several intermediate steps. Moreover, in practice, the likelihood of a simultaneous collision of more than two molecules is negligible. Hence, this law has to be taken with care, since it is a simplification of reality. However, it is sufficient and useful in many cases of real and artificial chemistries.

More generally, for a chemical reaction r with n reactants:

(2.11)
$$\alpha_1 S_1 + \alpha_2 S_2 + \dots + \alpha_n S_n \xrightarrow{k} \beta_1 P_1 + \beta_2 P_2 + \dots + \beta_m P_m$$

The speed of the above reaction according to the law of mass action is:

(2.12)
$$v = k \prod_{i=1}^{n} [S_i]^{\alpha_i}$$
where:
• $[S_i]$ is the concentration of the i-th reactant $S_i$, $i = 1 \dots n$;

• $\alpha_i$ and $\beta_j$ are the stoichiometric coefficients of species $S_i$ and $P_j$ in the reaction, respectively.

Stoichiometric coefficients tell how many molecules of $S_i$ (respectively $P_j$) are required (respectively produced) in one occurrence of the chemical reaction r. For example, in reaction 2.2, $\alpha_1 = 1$, $\alpha_2 = 3$, and $\beta_1 = 2$. Note that the speed of the reaction depends only on the reactant quantities, never on the quantity of products.
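Equation 2.12 can be sketched as a small function (ours, with made-up rate constant and concentrations); applied to reaction 2.2 it reproduces $v = k\,[\mathrm{N_2}]\,[\mathrm{H_2}]^3$.

```python
# Law of mass action: v = k * prod_i [S_i]^alpha_i  (cf. equation 2.12).
# Sketch with illustrative numbers; products play no role in the rate.

def mass_action_rate(k, concentrations, alphas):
    """Speed of a one-step reaction, from reactant concentrations
    and their stoichiometric coefficients."""
    v = k
    for c, alpha in zip(concentrations, alphas):
        v *= c ** alpha
    return v

# Reaction 2.2 (N2 + 3 H2 -> 2 NH3): alpha = (1, 3)
v = mass_action_rate(k=2.0, concentrations=[0.5, 1.0], alphas=[1, 3])
print(v)  # 2.0 * 0.5**1 * 1.0**3 = 1.0
```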

The law of mass action is not the only way to characterize the speed of a chemical reaction. Other laws are very common, such as Michaelis-Menten kinetics for enzymes, or Hill kinetics for genetic regulatory networks (see chapter 18), which are mostly derived from mass action. We will now see how this law can be used to understand the dynamics of networks of chemical reactions.

## 2.2.3 Concentration Dynamics

In a well-stirred reactor, a large number of molecules is usually present, frantically colliding and reacting with each other. In this case, the concentration dynamics of the molecules in the vessel can be described by a system of ordinary differential equations (ODE), one equation per molecular species. Each equation quantifies the net rate of production of a given chemical, by adding all reactions that produce the chemical, and subtracting those that consume it.

An example of such reactions would be:

(2.13)
$Display mathematics$
(2.14)
$Display mathematics$
Using the law of mass action, the corresponding differential equations are:
(2.15)
$Display mathematics$
(2.16)
$Display mathematics$
(2.17)
$Display mathematics$

For instance, take equation (2.17): Reaction r1 produces one molecule of C at rate $k_1[A]^2[B]^3$, represented by the first term of equation (2.17). Reaction r2 consumes two molecules of C at rate $k_2[C]$; therefore, the second term multiplies this rate by two. The same procedure is applied to the other substances to obtain the complete ODE system in a straightforward manner.

More generally, for a system of n substances S1,…, Sn and m reactions r1,…, rm the equations read:1

(2.18)
$$r_j:\ \alpha_{1j} S_1 + \alpha_{2j} S_2 + \dots + \alpha_{nj} S_n \xrightarrow{k_j} \beta_{1j} S_1 + \beta_{2j} S_2 + \dots + \beta_{nj} S_n$$

(2.19)
$$v_j = k_j \prod_{i=1}^{n} [S_i]^{\alpha_{ij}}$$

The change in the overall concentration of $S_i$ is obtained by summing over all reactions $r_j$ that possibly contribute, and can then be written as:

(2.20)
$$\frac{d[S_i]}{dt} = \sum_{j=1}^{m} (\beta_{ij} - \alpha_{ij})\, v_j$$
This system of differential equations can be expressed in a more compact form using matrix notation:
(2.21)
$$\frac{d\vec{s}}{dt} = M\,\vec{v}$$
where
where
• $d\vec{s}/dt = [\,ds_1/dt\ \dots\ ds_i/dt\ \dots\ ds_n/dt\,]^\top$ is the vector of differential equations for each of the species $S_i$ in the system, where $s_i = [S_i]$ and $\top$ is the sign for transposition.

• M is the stoichiometric matrix of the system, which quantifies the net production of chemical $S_i$ in each single reaction $r_j$. An element of M is given by:

(2.22)
$$M_{ij} = \beta_{ij} - \alpha_{ij}$$

• $\vec{v} = [\,v_1\ \dots\ v_j\ \dots\ v_m\,]^\top$ is the vector containing the speeds of each reaction $r_j$, as given by the law of mass action:

(2.23)
$$v_j = k_j \prod_{i=1}^{n} [S_i]^{\alpha_{ij}}$$

Such ODE systems are common in the modeling of biological systems and in some forms of biochemical computation (discussed in chapters 17 and 19), in which the concentration of substances represents values to be computed by a chemical reaction network.

The fact that chemical reactions can be translated into a set of deterministic differential equations following precise laws, such as the law of mass action, means that many mathematical methods and tools available for the analysis of dynamical systems can also be applied to chemistry. This is one of the basic premises behind systems chemistry and biology. By extension, an artificial chemistry designed to follow similar precise laws can also benefit from such analytical tools, one of the main advantages of the ODE approach.
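As a minimal numerical illustration of $d\vec{s}/dt = M\vec{v}$, the sketch below integrates the reversible reaction A + B ⇌ C of equation 2.5 with explicit Euler steps. The rate constants, initial concentrations, and step size are our own choices; at equilibrium the ratio $[C]/([A][B])$ should approach $k_f/k_r$.

```python
# Euler integration of ds/dt = M v for A + B <-> C (cf. equations 2.5, 2.21).
# All numerical values here are illustrative.

kf, kr = 1.0, 0.5

# Species order [A, B, C]; columns: forward reaction, reverse reaction.
M = [[-1,  1],   # net change of A
     [-1,  1],   # net change of B
     [ 1, -1]]   # net change of C

def speeds(s):
    a, b, c = s
    return [kf * a * b, kr * c]   # mass-action speeds v_j

s = [1.0, 1.0, 0.0]               # initial concentrations
dt = 0.001
for _ in range(20000):            # integrate up to t = 20
    v = speeds(s)
    s = [s[i] + dt * sum(M[i][j] * v[j] for j in range(2)) for i in range(3)]

a, b, c = s
print(round(c / (a * b), 2))      # 2.0, i.e. the equilibrium constant kf/kr
```

A production implementation would use an adaptive ODE solver rather than fixed Euler steps, but the structure (stoichiometric matrix times rate vector) is the same.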

ODEs have many limitations, however: they are only sufficiently accurate for very large numbers of molecules participating in the reaction. Therefore, they are not appropriate for modeling reactions involving only a few molecules, such as those taking place inside living cells. Moreover, they can only model systems where the set of possible molecular species and reactions is known in advance, which limits their innovation potential. These are important shortcomings from an artificial chemistry perspective. We will come back to this topic in section 2.3.3.

## 2.2.4 Chemical Equilibrium

Although the changes in concentration hold a lot of information about the behavior of a system, it is often interesting to consider such a system when it reaches an equilibrium state. At equilibrium, the concentrations of all substances in the system do not change:

(2.24)
$$\frac{d[S_i]}{dt} = 0, \quad \forall\, i$$

Figure 2.3 Changes in concentrations of A, B, and C = A.B in a reversible dimerization reaction: for kf = kr = 1, starting from initial concentrations A(0) = 2.0, B(0) = 1.4, C(0) = 0, the system approaches a chemical equilibrium where the concentrations no longer change.

Consider, for instance, a reversible dimerization reaction:

(2.25)
$$A + B \underset{k_r}{\overset{k_f}{\rightleftharpoons}} A.B$$

In this reaction, molecules A and B are monomers that combine to form a dimer A.B. This process is reversible, so the dimer can dissociate back into its constituent monomers.

At equilibrium:

(2.26)
$$\frac{d[A.B]}{dt} = k_f\,[A]\,[B] - k_r\,[A.B] = 0$$
(2.27)
$$K = \frac{k_f}{k_r} = \frac{[A.B]}{[A]\,[B]}$$

K is called the equilibrium constant of the reversible reaction. Figure 2.3 depicts what happens with this system as it converges to the equilibrium concentrations.

Throughout this book we will see many examples of chemistries that either aim to settle to an equilibrium point or, on the contrary, strive to stay away from it by a permanent inflow and outflow of substances.

## 2.2.5 Molecular Structure and Chemical Bonds

So far we have not looked much inside the structure of the molecules considered in our examples. However, in both real and artificial chemistries, we will often be very interested in the internal structures of molecules and how these structures affect their function.

In real chemistry, most of the atomic elements (except noble gases such as helium) react with other atoms to form molecules, and some of these molecules may in turn react to form other molecules. Chemical bonds are formed and broken during reactions. There are basically two types of chemical bonds: ionic and covalent. Ionic bonds are formed by electron transfer from an electron donor (typically a metal) to an electron acceptor. The donor atom becomes positively charged (a cation), the acceptor atom becomes negatively charged (an anion), and these two oppositely charged ions attract each other to form an ionic bond. An example is the well-known reaction 2 Na + Cl2 → 2 Na⁺ + 2 Cl⁻ → 2 NaCl that forms sodium chloride, the common table salt. Covalent bonds are formed by sharing electron pairs between two nonmetal atoms. For instance, water (H2O), carbon dioxide (CO2), and a large number of organic compounds are built from covalent bonds.

Since there is no limit to the size of a molecule, there are nearly infinitely many ways to recombine atoms into valid molecular structures, especially in organic chemistry, where macromolecules such as proteins or DNA may contain thousands or even millions of atoms. As discussed in chapter 1, it is precisely this combinatorial nature of chemistry that ACs seek to explore in alternative ways, not necessarily bound by the laws of real chemistry.

In artificial chemistries, bonds are not always explicitly represented. Molecules in ACs are often represented as strings of symbols (such as letters or numbers), where each symbol represents an atom. For instance, in [233] molecules are binary strings, whereas in [638] they are strings made of the 8 letters from a to h. Among the ACs that explicitly represent bonds, we can mention [103, 384]. In [384] molecules are made of digits that represent the number of bonds they can form; for instance, “1-2-3=2” is a valid molecule in this chemistry. In [103] molecules are graphs that are closer to the molecular structures found in real chemistry, and their reactions are a simplified model of real chemical reactions. These and other chemistries will be discussed in more detail throughout the book.

## 2.2.6 Energy and Thermodynamics

Energy plays a key role in the execution of chemical reactions. Within a reactor vessel, molecules collide and react, releasing or absorbing energy in the process.

The first law of thermodynamics is the law of conservation of energy: it states that the internal energy of an isolated system remains constant. In reality, however, most systems are not isolated: They lose energy to the environment, for example in the form of dissipated heat, and must then rely on an external source of energy to keep operating.

The second law of thermodynamics is the law of increasing entropy: the entropy of an isolated system that is not in equilibrium will tend to increase over time, approaching a maximum value at equilibrium.

The total energy E of a molecule in the reactor is the sum of its kinetic and potential energies:

(2.28)
$$ E = E_k + E_p $$

The kinetic energy Ek of a molecule is given by its speed inside the reaction vessel. The potential energy Ep of a molecule is given by its composition, the bonds between its atoms, and its shape (the spatial conformation of its atoms and bonds).

Two colliding molecules may react only if the sum of their kinetic energies is high enough to overcome the activation energy barrier (Ea) for the reaction, which corresponds to the minimum amount of kinetic energy that is needed for the reaction to occur. In this case, the collision is said to be effective. If the kinetic energy of the colliding molecules is lower than Ea, the reaction does not occur, and the collision is called elastic. Effective and elastic collisions are very commonly modeled in artificial chemistries, as we will see in section 2.3.
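To get a feel for how the Ea barrier filters collisions, here is a toy Monte Carlo sketch. It assumes, as a simplification, that collision energies follow an exponential (Boltzmann-weighted) distribution with mean kBT; under that assumption the fraction of effective collisions approaches exp(−Ea/kBT), the same exponential factor that appears in the Arrhenius equation of section 2.2.6:

```python
import math
import random

def effective_fraction(Ea_over_kT, n, seed=0):
    """Fraction of n simulated collisions whose (dimensionless) energy
    E/kT exceeds the barrier Ea/kT, with E/kT drawn from an exponential
    distribution (a simplifying assumption, not the full 3D physics)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.expovariate(1.0) > Ea_over_kT)
    return hits / n

frac = effective_fraction(2.0, 100_000)   # barrier at Ea = 2 kT
# frac is close to exp(-2): most collisions are elastic
```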

The formation of chemical bonds leads to a more stable molecule, with a lower potential energy. Therefore, breaking these bonds requires an input of energy. This energy is provided by the kinetic energy of the movement of the colliding molecules. The collision results in broken bonds, (p.23) sending the molecules to a higher potential energy state that is highly unstable. This intermediate cluster of atoms is called the transition state. As new bonds are formed, the transition state decays almost instantly into a lower energy state of one or more stable molecules, constituting the products of the reaction. The height of the energy barrier Ea corresponds to the difference in potential energy between the transition state and the reactant state.

Figure 2.4 Energy changes during uncatalyzed (black curve) and catalyzed (red dashed curve) chemical reactions. The forward reaction proceeds from left to right, converting reactants X to products Y; conversely, the reverse reaction (Y back to X) can be read from right to left. The vertical axis indicates the Gibbs free energy (G). In both directions, the reaction only occurs when the reactants are able to overcome the activation energy barrier Ea. In this example, Ea is lower for the forward reaction Ea(XY) than for the reverse reaction Ea(YX); therefore the forward direction is favored (ΔG < 0). Arrows pointing upward indicate positive energy values, while those pointing downward indicate negative energy values.

Modified from the original image by Brian Kell, http://en.wikipedia.org/wiki/Activation_energy, available for unrestricted use.

A common way to depict the energy changes during a reaction is through an energy diagram, such as the one shown in figure 2.4. For the moment we ignore the red parts of this figure, which will be the topic of section 2.2.7. Since it is difficult to observe and measure the behavior of individual molecules, energy diagrams usually depict macroscopic quantities at the scale of moles of molecules. Such macroscopic values reflect the average behavior of individual reactions.

The horizontal axis of the plot in figure 2.4 is called the reaction coordinate, and can be read in two directions: from left to right, it shows the progression of the forward reaction from reactants X to products Y; symmetrically, the corresponding reverse reaction (from Y back to X) can be read from right to left. The vertical axis corresponds to the Gibbs free energy (G) of the substances involved, a quantity that takes entropy into account:

(2.29)
$$ G = H - TS $$
(2.30)
$$ H = U + pV $$
H is the enthalpy, T is the temperature, S is the entropy, p is the pressure, V is the volume, and U is the internal energy of the system, which includes the potential energy.

In figure 2.4, the X substances have total free energy Gx, and the Y substances have Gy. The difference in Gibbs free energy before and after the reaction is denoted by ΔG = Gy − Gx. If ΔG > 0 then the reaction is endergonic, i.e., it absorbs energy from its surroundings, while if ΔG < 0 it is exergonic, releasing energy. The reaction of figure 2.4 is exergonic in the forward direction (p.24) (ΔG < 0) and endergonic in the reverse direction (ΔG(YX) > 0). The energy barrier from X to Y is Ea(XY), and from Y to X it is Ea(YX) = Ea(XY) − ΔG. Because it faces a smaller energy barrier, Ea(XY) < Ea(YX), the forward reaction occurs more readily than the reverse reaction; therefore, it is said to be spontaneous. Let us refer to the difference in activation energy before and after the reaction as ΔEa = Ea(YX) − Ea(XY). Note that for any given reversible reaction, the fact that ΔEa > 0 implies that ΔG < 0:

(2.31)
$$ \Delta G = E_a(XY) - E_a(YX) = -\Delta E_a < 0 $$
Hence, reactions with a negative Gibbs free energy change (ΔG < 0) are often referred to as spontaneous, thermodynamically favored, or “downhill” (since they move from a higher to a lower point on the energy landscape). Conversely, because they face a higher Ea barrier than their reverse counterparts, reactions with ΔG > 0 are referred to as nonspontaneous, thermodynamically unfavored, or “uphill.”

All reactions are in principle reversible: in figure 2.4, the reaction can flow from left to right or from right to left, as long as the colliding molecules acquire sufficient kinetic energy to jump over the Ea barrier. The speeds with which the reactions proceed in the forward and reverse directions are determined by the Ea height observed at each side of the barrier: the higher the barrier, the slower the reaction. The relation between Ea and the speed of a chemical reaction is given by the Arrhenius equation [42]:

(2.32)
$$ k = A\, e^{-E_a/(RT)} $$
where
• k is the kinetic coefficient for the reaction (recall that k is proportional to the speed of the reaction);

• A is the so-called preexponential factor of the reaction;

• R is the gas constant: R = kB NA, where kB is the Boltzmann constant (kB = 1.38065 × 10−23 J K−1), and NA is the Avogadro constant (thus R = 8.31451 J K−1 mol−1);

• T is the absolute temperature inside the reactor.
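To make the exponential dependence concrete, here is a small sketch of equation (2.32); the values for A and Ea are illustrative (made up), not taken from any particular reaction:

```python
import math

R = 8.31451  # gas constant, J K^-1 mol^-1

def arrhenius_k(A, Ea, T):
    """Kinetic coefficient from the Arrhenius equation k = A*exp(-Ea/(R*T))."""
    return A * math.exp(-Ea / (R * T))

# Illustrative values: Ea = 50 kJ/mol, preexponential factor A = 1e6
k_cold = arrhenius_k(1e6, 50_000.0, 298.0)
k_warm = arrhenius_k(1e6, 50_000.0, 308.0)
# raising T accelerates the reaction; for these values, roughly
# a doubling of k per 10 K
```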

Irreversible reactions are reactions in which the participating substances fall into a considerably lower energy state, such that the height of the Ea barrier seen from the opposite side is so great that it can no longer be overcome in practice: the substances remain “trapped” on the other side.

Uphill reactions require an input of energy to drive them in the forward direction, such that their equilibrium can be shifted to yield more products. The relationship between ΔG and the equilibrium constant K of a reversible reaction can be derived from the Arrhenius equation:

(2.33)
$$ \Delta G = -RT \ln K, \qquad K = \frac{k_f}{k_r} $$
Hence, for kf > kr (forward reaction favored) we have K > 1, which results in ΔG < 0; conversely, for kf < kr (reverse reaction favored), K < 1 and ΔG > 0. For ΔG = 0, both directions of the reaction occur with equal probability, hence their average speed is the same (kf = kr).
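These sign relations are easy to check numerically; the sketch below assumes T = 298 K purely for illustration:

```python
import math

R, T = 8.31451, 298.0   # gas constant (J K^-1 mol^-1) and temperature (K)

def delta_G(kf, kr):
    """Gibbs free energy change from the equilibrium constant K = kf/kr,
    via equation (2.33): Delta G = -R*T*ln(K)."""
    return -R * T * math.log(kf / kr)

# kf > kr: forward reaction favored, K > 1, Delta G < 0 ("downhill")
# kf < kr: reverse reaction favored, K < 1, Delta G > 0 ("uphill")
# kf = kr: K = 1, Delta G = 0
```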

In summary, ΔG (or equivalently ΔEa) determines the equilibrium properties of a reaction, whereas Ea determines its dynamic (speed-related) properties.

## (p.25) 2.2.7 Catalysis

A catalyst is a substance that participates in a chemical reaction by accelerating it without being consumed in the process. Its effect is to lower the reaction’s activation energy peak, thereby accelerating the reaction, while leaving the initial and final states unchanged. The decrease in Ea is depicted by the red/dashed curve in figure 2.4. Following the Arrhenius equation (Eq. 2.32), the kinetic coefficient k for the reaction increases exponentially with a decrease in Ea. Hence, decreasing Ea has a dramatic effect on k, which is proportional to the speed of the reaction. This explains the role of enzymes in accelerating reactions.

It is important to highlight that catalytic activity does not shift the equilibrium of a reaction: because a catalyst accelerates both the forward and the reverse reactions in the same proportion, ΔG does not change; therefore, according to equation 2.33, the equilibrium constant K also remains the same.
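The invariance of K under catalysis can be checked numerically with the Arrhenius equation: lowering both barriers by the same catalytic amount multiplies kf and kr by the same factor, leaving K = kf/kr untouched. All numeric values below are illustrative assumptions:

```python
import math

R, T = 8.31451, 298.0   # gas constant (J K^-1 mol^-1), temperature (K)

def k_arrhenius(A, Ea):
    return A * math.exp(-Ea / (R * T))

# made-up barriers (J/mol) and a catalytic barrier reduction of 15 kJ/mol
Ea_f, Ea_r, drop = 60_000.0, 70_000.0, 15_000.0
kf,  kr  = k_arrhenius(1e6, Ea_f),        k_arrhenius(1e6, Ea_r)
kfc, krc = k_arrhenius(1e6, Ea_f - drop), k_arrhenius(1e6, Ea_r - drop)

speedup = kfc / kf   # both directions speed up by exp(drop/(R*T)) ...
# ... so the equilibrium constant K = kf/kr is unchanged
```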

Catalysts play a fundamental role in both natural and artificial chemistries. Enzymes are proteins that act as biological catalysts. They are essential to numerous cellular processes, such as metabolism, gene expression, and cell replication. Many artificial chemistries rely on catalysts to emulate phenomena related to life, such as autocatalytic cycles, the formation of organizations, self-replication, and Darwinian evolution.

After this brief excursion into chemistry, we now turn to the general structure of an artificial chemistry.

# 2.3 General Structure of an Artificial Chemistry

This section now introduces the basic concepts of artificial chemistries. Let us start with a broad definition, taken from our previous review of the state of the art of ACs [236]: An artificial chemistry is a man-made system that is similar to a real chemical system. By man-made we mean a model system that can be executed either as a simulation (computer model) or as a physical system, i.e., one in which real physical/chemical processes play roles in combinations that do not appear naturally in our environment. This definition has been kept as general as possible in order not to exclude any relevant work.

Formally, an artificial chemistry can be defined by a triple (S, R, A), where S is the set of all possible molecules, R is a set of collision rules representing the interactions among the molecules, and A is an algorithm describing the reaction vessel or domain and how the rules are applied to the molecules. Both the set of molecules S and the set of reaction rules R can be defined explicitly, by stating which molecules react to produce which other molecules, or implicitly, by giving an algorithm or a mathematical operation that determines what will happen.

If we briefly recall what we stated in the first section of this chapter, we have at hand the central elements of any simulation: the components or subsystems, here the molecules S (previously Si); their possible interactions, here R; and the update rule, here algorithm A (previously U).

## 2.3.1 The Set of Molecules S – Objects

Just as in the general approach to simulation, we start with the identification of components or subsystems of the system to be simulated. Those are the objects or molecules of an artificial chemistry.

(p.26) The set of molecules S = {s1,…, si,…, sn}, with n possibly infinite, describes all different valid molecules that might appear in the AC. A vast variety of molecule definitions can be found in different ACs. A molecule’s representation is often referred to as its structure, which needs to be kept apart from its function, which is given by the reaction rules R. The description of the valid molecules and their structure is usually the first step in the definition of an artificial chemistry. This is analogous to the part of chemistry that describes what kind of atomic configurations form stable molecules and how these molecules appear.

Note that the molecules as components of an AC do not need to be elementary; rather, they could already be subsystems. This is a very important and interesting possibility that we shall later have to elaborate on in detail.

## 2.3.2 The Set of Rules R – Interactions

The set of reaction rules R describes the interactions between molecules si ∈ S. A rule r ∈ R can be written according to the chemical notation of reaction rules in the form (see eq. 2.11):

(2.34)
$$ s_1 + s_2 + \dots + s_n \longrightarrow s'_1 + s'_2 + \dots + s'_m $$
A reaction rule determines the n components (objects, molecules) on the left-hand side that can react and subsequently be replaced by the m components on the right-hand side. In the general case, n is called the order of the reaction, but there are exceptions.

A rule is applicable only if certain conditions are fulfilled. The major condition is that all components written on the left hand side of the rule are available. This condition can be broadened easily to include other parameters such as neighborhood, rate constants, probability for a reaction, or energy consumption. In these cases a reaction rule would need to hold additional information or further parameters. Whether or not these additional predicates are taken into account depends on the objectives of the AC. If it is meant to simulate real chemistry as accurately as possible, it is necessary to integrate these parameters into the simulation system. If the goal is to build an abstract model, many of these parameters might be left out.

## 2.3.3 Reactor Algorithm A – Dynamics

The reactor algorithm A is in charge of simulating the dynamics of a reaction vessel. It determines how the set of rules R is applied to a collection of molecules P, called the reactor, soup, reaction vessel, or population. Note that P cannot be identical to S, since some molecules might be present in many copies, while others might not be present at all. So while we can speak of S as being a set, P is correctly termed a multiset. The index of S indicates a type of molecule, while the index of P indicates a place in the population, which could be occupied by any type of molecule.

Because multisets are a concept used in many artificial chemistries, we will take some time now to define and understand them. A multiset is a generalization of a set that allows identical elements to be members. Thus, in order to characterize a multiset, the identity of each element must be known, as well as how many copies of that element are contained in the multiset. For instance, M = {a, b, b, c, c, c} is a multiset, while S = {a, b, c} would be a set. Multisets are used in computer science, where they are sometimes also called “bags.” The order of elements appearing in a multiset is not important, so that {c, a, b, c, b, c} = M is the same multiset as the one above. The cardinality of a multiset is its total number of elements, i.e., card(M) = 6. The multiplicity of an (p.27) element is the number of times it appears in the multiset. For M, the multiplicity of element a is 1, for b it is 2, and for c it is 3.

From this it can easily be seen that an alternative definition of a multiset is a 2-tuple (S, m), where S is a set and m is a function from S into the natural numbers, m : S → ℕ. m counts the number of occurrences of an element s ∈ S, i.e., its multiplicity. For the above example we have again m(a) = 1, m(b) = 2, m(c) = 3. The multiset M can therefore also be written as the tuple ({a, b, c}, {1, 2, 3}).
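In Python, for example, a multiset maps directly onto `collections.Counter`; the snippet below reproduces the example multiset M:

```python
from collections import Counter

# A multiset ("bag"): each element is mapped to its multiplicity
M = Counter(['a', 'b', 'b', 'c', 'c', 'c'])

card = sum(M.values())      # cardinality card(M) = 6
mult_b = M['b']             # multiplicity of b = 2

# Order does not matter: the same multiset, written differently
same = Counter(['c', 'a', 'b', 'c', 'b', 'c']) == M

# The equivalent (S, m) view: underlying set plus multiplicity function
S = set(M)                  # {'a', 'b', 'c'}
m = dict(M)                 # {'a': 1, 'b': 2, 'c': 3}
```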

As we can see here, the state of a chemistry (in terms of the number of molecules of each species) can be represented as a multiset, where the molecular species correspond to elements of S and the molecule counts correspond to their respective multiplicities. It is, of course, not so interesting to see only a snapshot of a chemistry, but rather its changes over time. As chemical reactions happen, new molecules appear and previously existing molecules are consumed. So it is really the changes in the counts of molecules, and possibly the increase or decrease in the cardinality of the multiset (and the underlying set), that are of interest when looking at chemical reactions.

Two classes of reactor algorithms can be distinguished: microscopic and macroscopic. The former takes into account individual molecular collisions and is inherently stochastic. The latter approximates a system containing a vast number of molecules, and consists mostly of numerical methods for the deterministic integration of systems of differential equations such as equations (2.20) and (2.21). We will briefly describe a few reaction algorithms in this section, while their detailed explanation is reserved for chapter 4.

Although the algorithm A is conceptually independent of the representation of R and P, its performance and algorithmic complexity will often be influenced by the way R and P are represented as data structures in the computer. This is especially true for P, which may represent huge numbers of molecules. An analysis of the algorithmic complexity of a few popular reaction algorithms will be discussed in chapter 4.

#### (1) Stochastic molecular collisions:

In this approach, every molecule is explicitly simulated and the population is represented as a multiset P. A typical algorithm draws a random sample of molecules from the population P, assumes a collision of these molecules, and checks whether any rule r ∈ R can be applied. If the conditions of a rule r are fulfilled, the molecules are replaced by the molecules on the right-hand side of r. In the event that more than one rule applies, a decision must be made as to which rule to employ. If no rule can be applied, a new random set of molecules is drawn. The actual algorithm, though, need not be that simple. Further parameters such as rate constants, energy, spatial information, or temperature can be introduced into the rules to make the chemistry more realistic.

The following example is an algorithm used for an AC with second-order reactions only: (p.28)

```
while ¬terminate() do
    s1 := draw(P);
    P  := remove(P, s1);
    s2 := draw(P);
    if ∃ (s1 + s2 → s′1 + s′2 + … + s′m) ∈ R then
        P := remove(P, s2);
        P := insert(P, s′1, s′2, …, s′m);
    else
        P := insert(P, s1);
    fi
od
```

The function draw returns a randomly chosen molecule from P (without removing it from P). The probability that a specific type is returned is proportional to the concentration of this type in P. If a reaction rule that applies to the selected molecules is found, the corresponding reaction products are injected into P. In this case, we say that the collision between the two molecules was effective. If no applicable reaction rule can be found, the drawing of the two molecules does not produce any change to the reactor vessel. We say that the collision between the two molecules was elastic. The collision is simulated by the drawing of molecules, whereas the reaction is given by the reaction rule. Note that the first molecule (s1) is removed such that it cannot be drawn twice in the same reaction, as in chemistry; if the collision is elastic, s1 is returned to the vessel in order to restore its original state before the collision.
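A direct Python transcription of this loop might look as follows; the rule table and the species names (a + b → c + c) are invented purely for illustration:

```python
import random

def stochastic_collisions(P, rules, steps, rng):
    """Second-order collision loop: draw two molecules, react if a rule
    matches (effective collision), otherwise put them back (elastic)."""
    for _ in range(steps):
        s1 = rng.choice(P)
        P.remove(s1)                       # s1 cannot collide with itself
        s2 = rng.choice(P)
        products = rules.get((s1, s2)) or rules.get((s2, s1))
        if products is not None:           # effective collision
            P.remove(s2)
            P.extend(products)
        else:                              # elastic collision: restore s1
            P.append(s1)
    return P

# Hypothetical rule table: a + b -> c + c (keeps the population size fixed)
rules = {('a', 'b'): ('c', 'c')}
P = stochastic_collisions(['a'] * 50 + ['b'] * 50, rules, 500,
                          random.Random(42))
```

Since the single rule consumes one a and one b per effective collision and produces two c, the population size stays constant and a and b are depleted in equal numbers.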

The previous algorithm simulates a closed system in which no molecules flow into or out of the vessel. Most systems of interest, however, are open systems relying on an inflow of raw material and/or energy, and an outflow of product molecules or elimination of waste. To begin with, a dilution flux of molecules can easily be added to the algorithm. In chemistry, dilution means the addition of a solvent such as water: the overall volume of the vessel increases, and the concentration of each molecular species that it contains decreases accordingly. In an artificial chemistry, the notion of a dilution flux is frequently associated with a vessel that overflows, such that any excess molecules are removed. This is the notion that we are going to use for the next algorithm. In a well-stirred reactor, all molecules have an equal probability of dropping out of the vessel by excess dilution. Therefore, for a given molecular species, the probability of being chosen for removal is proportional to its concentration. So the following lines could be added to the while loop:

```
s := draw(P);
P := remove(P, s);
```

We have now assumed that molecules s1 and s2 are consumed by the reaction, molecules s′1, s′2, …, s′m are newly produced, and one arbitrary molecule s is removed from the reaction vessel.

The simulation of every molecular collision is realistic and circumvents some difficulties generated by the collective descriptions we shall discuss shortly. There are, however, certain disadvantages to an explicit simulation of every collision. If the reaction probabilities of molecular species differ by several orders of magnitude, this low-level simulation is not efficient. Also, if the total number of different molecular species is low, the population is large, or the number of different reaction rules is large, such an explicit simulation will be slow.

#### (p.29) (2) Stochastic simulation of chemical reactions:

The stochastic molecular collision algorithm presented before correctly mimics the stochastic dynamics of reactions triggered by individual molecular collisions. However, it has a number of shortcomings. First, the algorithm is inefficient when most collisions turn out to be elastic. Moreover, it is only applicable to the case of bimolecular reactions. It is difficult to extend it to arbitrary reactions with different kinetic coefficients.

In order to simulate the stochastic behavior of a chemical reaction system in an accurate and efficient way, a number of specialized algorithms have been proposed. Perhaps the best known among these is Gillespie’s stochastic simulation algorithm (SSA) [328]: at every iteration, it decides which reaction will occur and when. Therefore, only effective collisions are computed and elastic collisions play no role in this algorithm, which can save substantially on simulation time. The reaction chosen is drawn from the set R with a probability proportional to its propensity, which is proportional to the rate of the reaction. So faster reactions will occur more often, as expected. The time increment until the reaction occurs is drawn from an exponential distribution with mean inversely proportional to the sum of the propensities of all reactions in R. This means that in a system with a large number of fast reactions, the interval between two reactions is small; conversely, a slow system with few reactions will display a long time interval between reactions, on average, which is again the expected behavior. Gillespie proved that his algorithm accurately simulates the stochastic behavior of chemical reactions in a well-mixed vessel. Gillespie’s algorithm will be described in more detail in chapter 4, along with other algorithms in this category.
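The following is a minimal, unoptimized sketch of the SSA logic just described (the toy reaction system at the bottom, with its rate constant and molecule counts, is invented for illustration); the algorithm proper is covered in chapter 4:

```python
import random

def gillespie(counts, reactions, t_end, rng):
    """Minimal Gillespie SSA sketch. reactions is a list of
    (rate_constant, reactants, products), with reactants/products
    given as dicts mapping species -> stoichiometric coefficient."""
    t, trajectory = 0.0, [(0.0, dict(counts))]
    while t < t_end:
        # propensity a_j = k_j * number of distinct reactant combinations
        props = []
        for k, reac, _ in reactions:
            a = k
            for s, n in reac.items():
                for i in range(n):
                    a *= max(counts.get(s, 0) - i, 0)
            props.append(a)
        a0 = sum(props)
        if a0 == 0:                       # no reaction can fire anymore
            break
        t += rng.expovariate(a0)          # waiting time to next reaction
        r, acc = rng.random() * a0, 0.0   # pick a channel ~ propensity
        for (k, reac, prod), a in zip(reactions, props):
            acc += a
            if r < acc:
                for s, n in reac.items():
                    counts[s] -= n
                for s, n in prod.items():
                    counts[s] = counts.get(s, 0) + n
                break
        trajectory.append((t, dict(counts)))
    return trajectory

# Toy system A + B -> C (illustrative values only)
traj = gillespie({'A': 100, 'B': 100},
                 [(0.01, {'A': 1, 'B': 1}, {'C': 1})],
                 50.0, random.Random(1))
```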

#### (3) Continuous differential or discrete difference equations:

A further step away from individual reaction events is often more appropriate to describe the behavior of a chemical system. A typical example is the simulation of large numbers of molecules, as in real chemistry. In such a scenario, simulating individual molecular collisions would be inefficient; a collective approach based on a deterministic simulation of the macroscopic behavior of the system is sufficiently accurate. For this purpose the dynamics of the chemical system is described by differential rate equations reflecting the development of the concentrations of molecular species, as explained in section 2.2.3. The index carried by a molecule now characterizes its species (Si), not its place in the population.

The method to simulate the chemical reaction system now consists in numerically integrating the corresponding ODE: given the stoichiometric matrix and the rate vectors for the system, and starting from the initial concentrations of all substances Si, a numerical integrator computes the changes in concentration for each substance in time according to equation (2.21), and outputs a set of vectors containing the concentration timeline for each substance. Several numerical methods are readily available for this purpose, in well-known scientific software packages. The simplest one is to compute the concentration changes in discrete time steps Δ‎t, and then add the resulting Δ‎Si value to the absolute concentration Si. This is called the explicit Euler method.

Although simplistic and inaccurate, it can offer sufficiently good approximations provided that the time step Δ‎t can be made small enough, and that the integration process does not last too long (since this method accumulates errors with time).
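A bare-bones explicit Euler integrator for mass-action systems can be written against a stoichiometric matrix (rows are species, columns are reactions), in the spirit of equation (2.21). The sketch below applies it to the reversible dimerization A + B ⇌ C of figure 2.3:

```python
from math import prod

def euler_integrate(conc, stoich, k, orders, dt, steps):
    """Explicit Euler for mass-action kinetics.
    conc:   initial concentrations, one per species
    stoich: stoich[i][j] = net change of species i in reaction j
    k:      kinetic coefficients, one per reaction
    orders: orders[j][i] = exponent of species i in the rate of reaction j"""
    conc = list(conc)
    n_rxn = len(k)
    for _ in range(steps):
        # mass-action rate of each reaction: v_j = k_j * prod_i c_i^order
        v = [k[j] * prod(c ** o for c, o in zip(conc, orders[j]))
             for j in range(n_rxn)]
        conc = [ci + dt * sum(stoich[i][j] * v[j] for j in range(n_rxn))
                for i, ci in enumerate(conc)]
    return conc

# A + B <=> C with kf = kr = 1 (species order: A, B, C)
stoich = [[-1, +1],    # A: consumed forward, produced in reverse
          [-1, +1],    # B
          [+1, -1]]    # C
orders = [[1, 1, 0],   # forward rate kf*[A][B]
          [0, 0, 1]]   # reverse rate kr*[C]
A, B, C = euler_integrate([2.0, 1.4, 0.0], stoich, [1.0, 1.0], orders,
                          0.01, 2000)
# after t = 20 the concentrations sit at the equilibrium of figure 2.3
```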

Rate equations are a continuous model for a discrete situation. While concentrations might approach zero continuously, this is not the case for the real system: If the last molecule of a particular species disappears in the volume under consideration, there is a discrete change in concentration of this species from a finite quantity to zero. This causes a number of anomalies, (p.30) and the modeler should be aware of the limitations of this model. Using differential equations is an approximation for large numbers of molecules of the same species in the volume under consideration.

One of the disadvantages of this approach deserves mentioning: Concentration values are normally represented in a computer by floating point numbers. They may be so small as to be below the threshold equivalent of a single molecule in the reactor. For example, if the assumed population size is 1,000 and [si] is below 0.001, then every further calculation of reactions using such a small concentration value si would cause unnecessary computational effort because species si is effectively no longer present in the reactor. In fact, these calculations would be misleading as they would still pretend that a possibility exists for a reaction in which si might participate, when in reality si has disappeared from the reactor. In order for the differential equation approach to be useful, the number of different molecular species N should be small relative to the total number of molecules, M,

(2.35)
$$ N \ll M $$

One can see that this condition contradicts the condition for the existence of novelty in a system (see eq. 1.6). We conclude that for the treatment of systems with novelty, differential equations do not seem to be an appropriate approach. We shall discuss this in more detail later.

#### (4) Metadynamics:

The metadynamics approach acknowledges the limitations of the differential equation model by assuming that the number of species, and therefore the number of differential equations, may change over time [49, 50]. The equations at a given time only represent dominant species, i.e., species with a concentration value above a certain threshold. All other species are considered to be part of the background (or might not even exist). As concentration values change, the set of dominant species may change as well. Thus the differential equation system is modified by adding and/or removing equations. One can distinguish between deterministic and stochastic metadynamics. In deterministic metadynamics, there are basically no surprises. The set of dominant species follows the development of its differential equations, until the concentration threshold causes a change in the set. The sequence of sets resulting from these dynamics explores the potential reactions embodied in the differential equation system but is purely deterministic. All pathways are internal to the system. As a result, a final state, the metadynamic fixed point or pinned state, is reached.

Stochastic metadynamics assumes the existence of random events that change the background of dominant species. Every so often, a species from the background is allowed to interact as if it were above threshold. This will potentially result in a chain reaction causing the metadynamic fixed point to change. If that happens, a new pinned state can be reached and the system has shown some sort of innovation. There is a second type of stochastic metadynamics where, instead of bringing in a random species from the background, the set of reactions is changed through a random event [425, 426]. This can also result in a change in the fixed point behavior of the underlying system. We will need to discuss this type of approach in more detail in chapter 15 (section 15.2.2).

If applied in the right way, metadynamics can combine the speed of a differential equation approach with the physical accuracy of a stochastic molecular collision simulation.

There is one further variant on this approach: if the differential equation system (2.20) is sufficiently simple, symbolic integration of the equations becomes possible. This way, the steady-state behavior of the system (fixed point, limit cycle, chaotic behavior, …) can be derived analytically. (p.31)

Table 2.2 Comparison of reactor algorithms

| Entities | Formalism | Interaction |
| --- | --- | --- |
| Individual molecules | Stochastic molecular collisions | Collisions and rule-based reactions |
| Molecular species | Stochastic simulation of chemical reactions | Possible reaction channels |
| Molecular species | Concentration/rate differential equations | Coupling between equations |
| Dominant molecular species and background | Metadynamics | Coupling between equations and random changes to couplings |
| Dominant molecular species and single molecules | Explicit simulation of molecules and ODE of dominant species | Coupling between equations and collision-based reactions of a small number of molecules |

This technique can be applied in the metadynamics approach to calculate the dynamical fixed point of the differential equations in a very compact way.

In most cases, however, the differential equation system will contain nonlinear coupling terms and, as a result, will be difficult to solve analytically. Thus, we should keep in mind that the analytical approach is feasible only in a very small subset of ACs. In most cases, numerical integration methods have to be applied.

#### (5) Hybrid approaches:

There are also hybrid approaches in which a small number of molecules (e.g., single macromolecules) are simulated explicitly, while the dominant species are represented by their concentrations (see, for instance, [952, 953]).

Table 2.2 compares the different approaches to modeling ACs through the reactor algorithm, all based on the assumption of a well-stirred chemical reactor.

# 2.4 A Few Important Distinctions

Each of the elements {S, R, A} that characterize an artificial chemistry can be defined in an explicit or implicit way. These differences will have an important impact on the behavior of the chemistry; therefore, we will often use this distinction to classify or compare the various ACs introduced later in the book.

## 2.4.1 Definition of Molecules: Explicit or Implicit

In an artificial chemistry {S, R, A} the set S represents the interacting objects or molecules. S can be defined explicitly or implicitly.

Molecules are explicitly defined if the set S is given as an enumeration of symbols, e.g., S = {r, g, b}. An implicit definition is a description of how to construct a molecule from more basic elements. This description may be a grammar, an algorithm, or a mathematical operation. (p.32) Examples of implicit definitions are S = {1, 2, 3, …}, the set of natural numbers, or S = {0, 1}*, the set of all bit strings.
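In code, the difference amounts to enumerating S versus describing it. A tiny Python illustration, using the two example sets above:

```python
# Explicit definition: S enumerated as a finite set of symbols
S_explicit = {'r', 'g', 'b'}

# Implicit definition: S = {0,1}*, given as a membership predicate --
# the set is infinite, so it can only be described, never enumerated
def in_S_implicit(molecule):
    return isinstance(molecule, str) and all(c in '01' for c in molecule)
```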

To build constructive dynamical systems (see chapter 15) it is convenient to define molecules implicitly. Typical implicitly defined molecules are character sequences (e.g., abbaab), mathematical objects (e.g., numbers), or compound objects which consist of different elements. Compound objects can be represented by data structures [562]. We will refer to the representation of a molecule as its structure.

In some artificial chemistries, the structure of a molecule is not defined a priori. Instead, molecules appear as an emergent phenomenon, and interpreting a structure as a molecule is possible only a posteriori. In these cases it is not possible to define the set of molecules in advance, because they emerge from the process. However, ACs of this type will possess lower-level elements that can be defined.

## 2.4.2 Definition of Interactions/Reaction Laws: Explicit or Implicit

Once we have defined the molecules of our AC, it is time to specify their interactions. These interactions can be defined, again, in two different ways.

#### Explicit:

The definition of the interaction between molecules is independent of the molecular structure. Molecules are represented by abstract, interchangeable symbols, and the total number of possible elements of the AC is fixed. Reaction rules can be enumerated and explicitly given: all possible elements and their behavior in reactions are known a priori, and the interaction rules do not change during the experiment.

#### Implicit:

The definition of the interaction between molecules depends on their structure. Examples of implicit reaction rules are concatenation or cleavage reactions of polymers. From the number-division example discussed in chapter 1, we remember that particular numbers behave differently under division; we can say that this is a result of their structure. An artificial chemistry with an implicit reaction scheme allows the outcome of a collision to be derived from the structure of the participating molecules. Therefore, there is no a priori way of knowing or enumerating the molecules beforehand; the number of possible molecules can even be infinite. Implicitly defined interactions are commonly used for constructive artificial chemistries.

## 2.4.3 Definition of the Algorithm/Dynamics: Explicit or Implicit

Finally, the algorithm A of the artificial chemistry {S, R, A}, which produces the dynamics of the system, needs to be defined. The explicit definition of the system’s dynamics has already been given in section 2.3.3. This definition is necessary from a computational point of view, because in a computer only the actual execution of a series of formally defined interactions causes the system to change and to show any kind of dynamic behavior. An explicit definition of the dynamics is based on the interactions determined through R and may include various parameters, ranging from temperature, pressure, and pH value to field effects of secondary or higher-level molecular structures, resulting in an artificial chemistry that can be used in the field of computational chemistry [182]. It is shown in section 2.5, however, that even a random execution of interactions with no additional parameters causes interesting dynamical phenomena.

(p.33) Sometimes, the definition of the interactions makes an additional definition of the algorithm A unnecessary. Such an implicit definition is used, for example, in a cellular automata chemistry, where the dynamics is caused by the synchronous or asynchronous update of the lattice sites.

# 2.5 Two Examples

Let us now look at two examples of very simple artificial chemistries, in order to get a feeling for how one would go about setting up these kinds of systems.

## 2.5.1 A Nonconstructive Explicit Chemistry

The first example is intended to demonstrate the relation of an explicit simulation of molecules to their differential equation model. We call this a nonconstructive AC, since all molecular objects are known beforehand and can be explicitly taken care of in a simulation model. The example is inspired by a problem posed in the “Puzzled” column of a computer science magazine [922]. The puzzle goes like this:

A colony of chameleons includes 20 red, 18 blue, and 16 green individuals. Whenever two chameleons of different color meet, each changes to the third color. Some time passes during which no chameleons are born or die, nor do any enter or leave the colony. Is it possible that at the end of this period all 54 chameleons are the same color?

The answer is “no” for this particular initial distribution of chameleons; see [921]. Yet there are many other configurations in which it is possible, namely those where the difference between the numbers of chameleons of different colors is a multiple of 3. However, any such outcome is extremely unlikely, because in a world where probabilities reign, encounters are dominated by the most frequent types of chameleons, quite the opposite of the carefully chosen sequence of encounters required by the mathematical problem posed here. Be that as it may, we are not so much interested in the answer to this concrete question (more on it later) as in taking the idea of interacting chameleons and modeling it as an artificial chemistry. In the realm of chemistry, and of ACs as well, encounters are governed by probabilities, i.e., the most frequent molecules bump into each other most frequently.

#### Objects/molecules:

In order to model this problem as an AC, we need to first determine the interacting objects. There are three types of objects here, red r, green g, and blue b chameleons, each in different quantities.

#### Interactions/reactions:

Their interaction is determined by the following rules:

(2.36)
$r + g \longrightarrow b + b$
(2.37)
$g + b \longrightarrow r + r$
(2.38)
$b + r \longrightarrow g + g$
(p.34) Rules obey symmetry in that an encounter of r and g has the same effect as an encounter of g and r.2 All other encounters between chameleons (“collisions”) are elastic, i.e., do not lead to changes in color. These reaction equations can be compactly written as
(2.39)
$i + j \longrightarrow k + k, \quad i \neq j \neq k \neq i,$
where i stands for any chameleon with color i and indices i, j, k run through r, g, b colors.

#### Dynamics/algorithm:

The algorithm is very simple in this case. Since no additional objects are produced, we don’t need a flow term; the total population will always remain the same size. (A flow term would otherwise guarantee a constant population size by flushing out an appropriate number of objects, if that were needed.) The algorithm is a simple adaptation of the algorithm introduced before:

while ¬terminate() do

s1 := draw(P);

s2 := draw(P);

if (s1 ≠ s2)

then

s3 := the color different from s1 and s2;

P := remove(P, s1, s2);

P := insert(P, s3, s3);

fi

od

The only variation we have made here is that the insert operation inserts the new objects into the same places where the reactants were found. One iteration of the while loop simulates one collision of two molecules, and we can count the number of each molecular type over the course of the iterations. In appendix A we have included the code for this example in the Python programming language. There, you will also find a small simulation environment that allows you to set up ACs conveniently. You can download this and later examples from www.artificial-chemistries.org.
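Appendix A contains the full Python code; a minimal self-contained sketch of the loop above (function names are our own) could look like this:

```python
import random

def third_color(a, b, colors="rgb"):
    """The one color different from both a and b."""
    return next(c for c in colors if c not in (a, b))

def simulate(pop, iterations, rng=random):
    """Explicit collision simulation of the chameleon chemistry:
    draw two individuals; if their colors differ, both take on the
    third color (rule 2.39); equal colors collide elastically."""
    for _ in range(iterations):
        i = rng.randrange(len(pop))
        j = rng.randrange(len(pop))
        if i == j:
            continue                 # a chameleon cannot meet itself
        if pop[i] != pop[j]:
            s3 = third_color(pop[i], pop[j])
            pop[i] = pop[j] = s3     # products replace the reactants in place
    return pop

# the puzzle's colony: 20 red, 18 blue, 16 green
population = ["r"] * 20 + ["b"] * 18 + ["g"] * 16
simulate(population, 10_000)
print({c: population.count(c) for c in "rgb"})   # counts always sum to 54
```

Tracking the three counts over the iterations reproduces curves like those in figure 2.5.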

Before we now look at the behavior of this AC, a few notational conventions need to be mentioned. We can write the reaction rules in the form of a reaction table:

| rules | reactants: r | g | b | products: r | g | b |
|-------|---|---|---|---|---|---|
| r1    | 1 | 1 | 0 | 0 | 0 | 2 |
| r2    | 0 | 1 | 1 | 2 | 0 | 0 |
| r3    | 1 | 0 | 1 | 0 | 2 | 0 |

or, in short,

|   | r  | g  | b  |
|---|----|----|----|
| r | Ø  | 2b | 2g |
| g | 2b | Ø  | 2r |
| b | 2g | 2r | Ø  |

where Ø means that no objects are produced by the interaction listed. These tables can then be translated into reactions graphs of the sorts depicted in figure 2.2.
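Such a reaction table maps naturally onto a dictionary; a small sketch (names are ours), with elastic collisions represented by a missing entry:

```python
# collision table of the chameleon chemistry; pairs absent from the
# table (here: equal colors) collide elastically
REACTIONS = {
    ("r", "g"): ("b", "b"),
    ("g", "b"): ("r", "r"),
    ("b", "r"): ("g", "g"),
}

def products(s1, s2):
    """Products of a collision, or None if the collision is elastic.
    Lookup is symmetric, reflecting the symmetry of the rules."""
    return REACTIONS.get((s1, s2)) or REACTIONS.get((s2, s1))

print(products("g", "r"))   # ('b', 'b')
print(products("r", "r"))   # None
```

This is exactly the explicit style of reaction definition discussed in section 2.4.2: every rule is enumerated in a fixed table.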

If we start with approximately the same number of chameleons of each color, we shall not see much development, except that here and there a color changes due to a random encounter. However, if we start with chameleons of only two colors, we will see an initial development toward an equilibrium of colors; see figure 2.5. (p.35)

Figure 2.5 The first two generations (equivalent to 10,800 iterations) of an explicit simulation of encounters between chameleons changing colors. A total of 5,400 chameleons is simulated, starting from a condition where only blue and red chameleons are present, in virtually equal numbers due to the randomized initial condition. Green chameleons quickly pick up in numbers, until only fluctuations remain.

We can see the dynamics at work more clearly if we now formulate the differential equation system for the chameleon example and integrate these equations numerically.

#### The model with differential equations:

As said previously, an alternative to a molecular simulation of each and every encounter between molecules is to write down rate equations for the reaction system. We start by representing the species by a concentration vector $\vec{x}(t) = (x_r(t), x_g(t), x_b(t))$, where $x_r(t)$, $x_g(t)$, and $x_b(t)$ denote the concentrations of chameleon types r, g, b at time t, respectively. The reactions conserve the total number of chameleons, so $0 \leq x_{r,g,b} \leq 1$ and the concentrations sum to the total concentration, $\sum_{i=r,g,b} x_i(t) = 1$. The corresponding differential equations read:

(2.40)
$\frac{dx_r}{dt} = 2 x_g x_b - x_r x_g - x_r x_b$
(2.41)
$\frac{dx_g}{dt} = 2 x_b x_r - x_g x_r - x_g x_b$
(2.42)
$\frac{dx_b}{dt} = 2 x_r x_g - x_b x_r - x_b x_g$
A dilution flux Φ(t) is not necessary here, because each collision creates two objects and removes two.
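Assuming mass-action kinetics with unit rate constants, rate equations 2.40–2.42 can be integrated with a simple explicit Euler scheme; the sketch below (step size and names are our choices) is enough to watch the equilibration:

```python
def chameleon_rates(x):
    """Right-hand sides of eqs. 2.40-2.42 for x = (x_r, x_g, x_b)."""
    xr, xg, xb = x
    return (2*xg*xb - xr*xg - xr*xb,
            2*xb*xr - xg*xr - xg*xb,
            2*xr*xg - xb*xr - xb*xg)

def euler(x, h=0.01, steps=2000):
    """Explicit Euler integration with fixed step size h."""
    for _ in range(steps):
        dx = chameleon_rates(x)
        x = tuple(xi + h*dxi for xi, dxi in zip(x, dx))
    return x

# only red and blue present initially, as in figure 2.5
xr, xg, xb = euler((0.5, 0.0, 0.5))
print(xr, xg, xb)   # all three concentrations approach 1/3
```

Note that the three right-hand sides sum to zero, so the total concentration is conserved by construction, mirroring the absence of a dilution flux.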

#### Comparison of the two descriptions:

Figure 2.6 compares the simulation of an explicit molecular system with a system of differential rate equations. The initial state has been prepared carefully so as to be identical in both approaches. The time scale of the differential equation simulation has to be chosen carefully, such that a given number of single molecular reactions corresponds to one step of the numerical integration. Notably, precautions have to be taken regarding (p.36)

Figure 2.6 A comparison of two generations (equivalent to 10,800 iterations) of an explicit simulation of encounters with an integration of rate equations 2.40–2.42. Again, a total of 5,400 chameleons is simulated. Here, colors are distributed differently, but equilibrate after some time. The differential equation (DE) curves show good agreement with the explicit simulation.

the fact that there are a number of elastic collisions in the explicit simulation model. These elastic collisions correspond to a change in the time scale for the update rule of the rate equations. In our particular example, elastic collisions happen when two chameleons of the same color encounter each other, i.e., with probability $p_e = \sum_{i=r,g,b} x_i^2$. In other words, a reaction takes place only in a fraction $1 - p_e$ of the collisions, and this is the quantity by which the time scale of the integration has to be corrected.
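The correction factor can be computed directly from the concentrations; a small sketch:

```python
def reactive_fraction(x):
    """Fraction 1 - p_e of collisions that cause a reaction, where
    p_e = sum_i x_i^2 is the elastic-collision probability."""
    return 1.0 - sum(xi * xi for xi in x)

# at the color equilibrium, p_e = 3*(1/3)**2 = 1/3, so two thirds
# of all collisions are reactive
print(reactive_fraction((1/3, 1/3, 1/3)))
```

Multiplying the integration time step by this fraction aligns the ODE time axis with the collision count of the explicit simulation.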

As we increase the number of chameleons, the fluctuations in a stochastic explicit simulation average out and the description approaches the rate equation model, which assumes an infinite population of each type of reactants.

## 2.5.2 A Constructive Implicit Chemistry

As an example of a constructive, implicitly defined artificial chemistry, we explain the number-division chemistry examined previously in [72, 113]. In chapter 1 we alluded to the fact that we can imagine numbers reacting with each other to produce new numbers. The arithmetic operations of multiplication and division of natural numbers are particularly fruitful examples for demonstration purposes. Here we shall focus on the division operation as our implicit reaction, which can be defined over the infinite set of natural numbers. We shall, however, exclude the numbers “0” and “1” from our artificial chemistry, since they behave rather exceptionally under division. Otherwise, a potentially infinite number of molecular types is present. So let’s again start with the objects of our system and work our way to the dynamics from there.

#### Molecules:

The set of molecules S are all natural numbers greater than one: S = {2,3,4,…}. This set is infinite, but we do not expect to have an infinite reaction vessel. Thus, only a finite subset of numbers will be realized in our actual reaction system at any given time.

#### (p.37) Reactions:

Mathematical division is used to calculate the reaction product. If two molecules s1, s2 ∈ S collide and s1 divides s2 without remainder, s2 is transformed into s2/s1. Thus, s1 acts as a catalyst. The set of reactions can also be written implicitly as

(2.43)
$s_1 + s_2 \longrightarrow s_1 + s_2 / s_1, \quad \text{if}\ s_2 \bmod s_1 = 0.$
If two molecules collide and there is no corresponding reaction rule in R, the collision is elastic.

#### Dynamics:

The algorithm draws two molecules randomly from the population and replaces the larger number with the quotient, provided it can be divided by the smaller reactant without remainder. In case the two are equal or cannot be divided, the collision is elastic. Written in pseudocode, the algorithm is:

while ¬terminate() do

s1 := draw(P);

s2 := draw(P);

if (s2 > s1) ∧ (s2 mod s1 = 0)

then

s3 := s2/s1;

P := remove(P, s2);

P := insert(P, s3);

else

if (s2 < s1) ∧ (s1 mod s2 = 0)

then

s3 := s1/s2;

P := remove(P, s1);

P := insert(P, s3);

fi

fi

od
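The pseudocode translates almost line by line into Python; a self-contained sketch (names are ours):

```python
import random

def number_chemistry(pop, iterations, rng=random):
    """Number-division AC: when two numbers collide and the larger is
    divisible by the smaller, the larger is replaced by the quotient;
    the smaller acts as a catalyst (eq. 2.43)."""
    for _ in range(iterations):
        i = rng.randrange(len(pop))
        j = rng.randrange(len(pop))
        s1, s2 = pop[i], pop[j]
        if s2 > s1 and s2 % s1 == 0:
            pop[j] = s2 // s1
        elif s1 > s2 and s1 % s2 == 0:
            pop[i] = s1 // s2
        # equal or non-divisible pairs collide elastically
    return pop

population = list(range(2, 102))      # numbers 2..101, so M = 100
number_chemistry(population, 100_000)
print(sorted(set(population)))        # composites are largely gone
```

Because the quotient of two distinct numbers from S is always at least 2, the excluded molecules “0” and “1” can never appear.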

Despite the simplicity of the algorithm, the dynamic behavior of the system is fairly interesting. After repeated application of the reaction rule, prime numbers emerge as stable (nonreactive) elements, and the population as a whole tends to eliminate nonprime numbers (figure 2.7). Elsewhere we have shown that only prime numbers remain after some time. This is surprising at first, because nothing in the experimental setup (neither in S nor in R) specifies any condition regarding primes. Upon further analysis, however, one realizes that the reaction rule precisely determines primes as nonreactive.

We have now seen how the implicit character of the reaction system prohibits the formulation of a reaction matrix, but it remains to be seen in what sense this system is constructive. After all, initializing a population with all numbers between 2 and 100 doesn’t produce new types of numbers.

The previous simulation can quickly be turned into a constructive system by initializing it with 100 numbers drawn from a larger range of integers. For instance, if we initialize in the interval I = [2,1000], we will see new numbers that have not appeared before. If again a population of M = 100 numbers is initialized (this time through a random draw from the interval I), the condition N ≫ M is fulfilled. An analysis of the system under these new conditions must be done carefully, since we don’t know from the outset which number types will appear. Figures 2.8 and 2.9 show one run of the reaction system with new numbers being generated. (p.38)

Figure 2.7 Development of (prime) number count over time in iterations. Population size M = 100. Population initialized with each number from S = {2,3,…,101}. Prime numbers emerge as dominant, since they become stable in the population.

From the initial state of the system, gradually more small numbers and primes are produced. The smaller numbers react further until all numbers have been divided down to primes. The primes cannot be divided any more and thus accumulate in the reactor. The concentration of primes shows a typical sigmoidal growth pattern and finally reaches a maximum at a prime concentration of $c_{primes} = 1.0$. Note, however, that this AC does not produce the prime factorization of the initial set of numbers, because divisors act as catalysts in the reaction system of equation 2.43, i.e., no copies are produced in the division operation. This explains why we do not see more than a limited number of copies of primes in the final state of the reaction vessel.

# 2.6 Frequently Used Techniques in ACs

A number of techniques often used in ACs can be highlighted. For instance, sometimes it is interesting to measure time in terms of the number of collisions that have happened in the system, instead of in an absolute time unit. Pattern matching to emulate molecule recognition and binding (mimicking the way biological enzymes recognize their protein or DNA targets) is another commonly used technique in ACs. A third technique is the use of a spatial topology, as opposed to the well-stirred vessel discussed in section 2.2. An overview of these techniques follows below, and more about them will appear throughout the book as they are discussed in the context of the various ACs covered. (p.39)

Figure 2.8 Development of (prime) number count over time in iterations. Population size M = 100, population initialized with random integers taken from the interval S = {2,3,…,1000}. Some numbers appear and disappear. Again, primes emerge and stay.

## 2.6.1 Measuring Time

One step of the algorithm in section 2.3.3 can be interpreted as a collision of two (or more) molecules. Simulated time is proportional to the number of collisions divided by reactor size M. It is common to measure simulated time in generations, where one generation consists of M collisions, independent of whether a collision causes a reaction or not. Using M collisions (a generation) as the unit of time is appropriate because otherwise an increase in the reactor size M would result in an apparent slowdown of development speed. If, for example, in a reactor with a population size of $10^5$, $10^5$ collisions were executed, this would mean that, statistically speaking, every molecule of the population participated twice in a reaction. If the same number of collisions were executed in a reactor of size $4 \times 10^5$, only half of all molecules would have taken part in a reaction. If time were not scaled by reactor size, the second reactor would appear to develop at a quarter of the speed of the first.

In continuously modeled ACs, the progress of time depends on the integration step size of the numerical ODE solver. For a standard solver, a fixed step length of $h = 10^{-2}$ is a common setting; thus, 100 evaluations of the ODE are needed to progress time by one unit. If one wants to compare a stochastic simulation with a numerical ODE model, care must be taken to adapt the time measurements of both approaches. This has to take into account the probability of elastic collisions, as we exemplified in the chameleon example. (p.40)

Figure 2.9 Closer look at development of (prime) number count over time in iterations, only interval [2,100] shown. New numbers appearing can be clearly identified, as they come and go.

## 2.6.2 Pattern Matching

Pattern matching is a method widely used in artificial chemistries and other artificial life systems. A pattern can be regarded as information about (or as a means to identify) the semantics of a subcomponent or location. It allows one to refer to parts of a system in an associative way, independent of the absolute position of these parts. For instance, Koza used the shape of trees to address subtrees and to select reaction partners [472]. In the field of DNA computing, pattern matching is a central mechanism in the alignment of two DNA strands. There, a pattern is a sequence of nucleotides such as CGATTGAGGGA… In the Tierra artificial life platform [701], a pattern is given by a sequence of NOP0 and NOP1 operations in the genome of an individual and is used to direct jumps of the instruction pointer, which points to the next instruction the program will execute. We shall encounter many other examples throughout the book.

The accuracy of a match can be used to calculate the probability of a reaction or its rate constants, as suggested, e.g., in [49]. In order to compute the accuracy, a distance measure between patterns is needed that computes a “similarity” between two molecules by assigning it a numerical value. One convenient distance measure for binary strings of the same length is the Hamming distance, whereas for variable-length strings one can use the string edit distance. The Hamming distance is the number of bits that differ between two binary strings. The edit distance between two strings is the minimum number of editing operations (insertions, deletions, and replacements of single characters) necessary to convert one of the strings into the other.
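Both distance measures are short to implement; a sketch (the dynamic-programming formulation of the edit distance is standard, names are ours):

```python
def hamming(a, b):
    """Number of differing positions; strings must have equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # replacement
        prev = curr
    return prev[-1]

print(hamming("10011", "11010"))           # 2
print(edit_distance("kitten", "sitting"))  # 3
```

A match probability can then be derived from such a distance, e.g., by letting the reaction rate decay with increasing distance.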

## (p.41) 2.6.3 Spatial Topology

In many artificial chemistries, the reactor is modeled as a well-stirred tank reactor, often with a continuous inflow and outflow of substances. In a well-stirred tank reactor the probability for a molecule si to participate in a reaction r is independent of its position in the reactor.

In a reactor with topology, however, the reaction probability depends on the neighborhood of a molecule si. This neighborhood may be determined by the vicinity of si, for example, as measured by a Euclidean metric; the space will usually have a two- or three-dimensional extension. Alternatively, the neighborhood may be defined as the neighborhood in a cellular automaton (e.g., [530, 881]) or as a self-organizing associative space [232]. In some cases, specific features, like diffusion, can be present in a reaction system [939].

All these different spatial structures have one feature in common: the topology/spatial metric has some influence on the selection of reactants by the algorithm A. While the analysis of ACs becomes more complicated by adopting an additional spatial topology, some properties can be observed only with a spatial structure and will not occur at all otherwise.

## 2.6.4 Limited Resources

Limited resources often determine which models can possibly be simulated and examined and which cannot. Therefore, it is important to consider the resource implications of decisions made in setting up a simulation model. A comparison of computational costs is first among those. Here, we shall compare the two basic approaches, differential rate equation simulations and models using discrete molecular collisions. In particular, we want to make more quantitative statements about when it is useful to apply one of these methods rather than the other.

In order to arrive at those statements, we shall need to estimate the computational costs of computing explicit molecular collisions vs. numerically solving the equivalent ODE system. We shall assume that 〈n〉 is the mean value of the number of molecules participating in each reaction r in R, and N is the number of different species in a population P of size M. First we need to estimate computational costs of performing a single step of both major approaches.

In an explicitly performed simulation, a reaction requires O(〈n〉) draw-operations, followed by O(R) lookups in the reaction rule set R. Thereafter O(〈n〉) insert-operations are performed for an intermediate update. This leads to the estimation of O(〈n〉) + O(R) costs for each reaction, which can be approximated by O(R), because in almost every (artificial) chemistry the total number of different reactions is much larger than the average number of reactants of a single reaction. A full update of the population P thus requires O(R) ·O(M) = O(R · M) operations.3

An integration of the system of ODEs (eq. 2.20) requires N differential equations, each of which is of order 〈n〉. Assuming an integration step size h, an update of the concentration vector $\vec{s}$ with a numerical integration method thus has computational costs of $O(\frac{1}{h} \cdot N \cdot \langle n \rangle)$ function evaluations, because $\frac{1}{h}$ evaluations of the system with N equations of order 〈n〉 are needed to perform a unit-time step. So $O(\frac{1}{h} \cdot N \cdot \langle n \rangle)$ computations have to be performed to achieve the same progress as M reactions would generate in an explicit simulation with a population P of size M. If a very common integration method like the standard Runge-Kutta method is used, (p.42)

Table 2.3 Comparison of the computational costs (in instructions), for fixed step size h = 0.01 and an assumed average reaction order 〈n〉 = 2. The last column indicates the less expensive approach.

| Pop. size M | Species N | Reactions R | Collisions | Integration | Less expensive |
|---|---|---|---|---|---|
| 10³ | 10  | 10  | 10⁴  | 10³  | integration |
| 10³ | 10  | 10² | 10⁵  | 10³  | integration |
| 10³ | 10  | 10⁴ | 10⁷  | 10³  | integration |
| 10³ | 10⁴ | 10  | 10⁴  | 10⁶  | collisions |
| 10³ | 10⁴ | 10² | 10⁵  | 10⁶  | collisions |
| 10³ | 10⁴ | 10⁴ | 10⁷  | 10⁶  | integration |
| 10³ | 10⁸ | 10  | 10⁴  | 10¹⁰ | collisions |
| 10³ | 10⁸ | 10² | 10⁵  | 10¹⁰ | collisions |
| 10³ | 10⁸ | 10⁴ | 10⁷  | 10¹⁰ | collisions |
| 10⁶ | 10  | 10  | 10⁷  | 10³  | integration |
| 10⁶ | 10  | 10² | 10⁸  | 10³  | integration |
| 10⁶ | 10  | 10⁴ | 10¹⁰ | 10³  | integration |
| 10⁶ | 10⁴ | 10  | 10⁷  | 10⁶  | integration |
| 10⁶ | 10⁴ | 10² | 10⁸  | 10⁶  | integration |
| 10⁶ | 10⁴ | 10⁴ | 10¹⁰ | 10⁶  | integration |
| 10⁶ | 10⁸ | 10  | 10⁷  | 10¹⁰ | collisions |
| 10⁶ | 10⁸ | 10² | 10⁸  | 10¹⁰ | collisions |
| 10⁶ | 10⁸ | 10⁴ | 10¹⁰ | 10¹⁰ | equal |
| 10⁹ | 10  | 10  | 10¹⁰ | 10³  | integration |
| 10⁹ | 10  | 10² | 10¹¹ | 10³  | integration |
| 10⁹ | 10  | 10⁴ | 10¹³ | 10³  | integration |
| 10⁹ | 10⁴ | 10  | 10¹⁰ | 10⁶  | integration |
| 10⁹ | 10⁴ | 10² | 10¹¹ | 10⁶  | integration |
| 10⁹ | 10⁴ | 10⁴ | 10¹³ | 10⁶  | integration |
| 10⁹ | 10⁸ | 10  | 10¹⁰ | 10¹⁰ | equal |
| 10⁹ | 10⁸ | 10² | 10¹¹ | 10¹⁰ | integration |
| 10⁹ | 10⁸ | 10⁴ | 10¹³ | 10¹⁰ | integration |

an additional factor of 4 (because of the intermediate steps) needs to be weighed in, so that the overall costs sum up to $O(4 \cdot \frac{1}{h} \cdot N \cdot \langle n \rangle)$, which can be approximated by $O(\frac{1}{h} \cdot N)$. Note that this simple Runge-Kutta method uses a fixed step size. If the solver employs a variable step size, parameter h will change accordingly.4

Though constant factors were omitted in the estimation of runtime behavior expressed in O(…) notation, they may play an important role in the calculation of the actual computational (p.43) effort.5 Results of the comparison of the two approaches are shown in table 2.3. In general, integration is advantageous if the population is large and the number of species is small. In the opposite case, an explicit simulation of molecular collisions is more efficient.
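The entries of table 2.3 can be reproduced with a few lines of code; the sketch below (names are ours) uses the leading terms O(R·M) and O((1/h)·N) from the discussion above:

```python
from math import log10

def collision_cost(M, R):
    """O(R*M): cost of one generation (M reactions) of explicit simulation."""
    return M * R

def integration_cost(N, h=0.01):
    """O((1/h)*N): leading term of the ODE cost for one time unit,
    as used in table 2.3 (constant factors dropped)."""
    return round(1 / h) * N

for M, N, R in [(10**3, 10, 10), (10**6, 10**4, 10**4), (10**9, 10**8, 10**4)]:
    c, i = collision_cost(M, R), integration_cost(N)
    cheaper = "integration" if i < c else ("collisions" if c < i else "equal")
    print(f"M=10^{log10(M):.0f} N=10^{log10(N):.0f} R=10^{log10(R):.0f}: "
          f"collisions 10^{log10(c):.0f}, integration 10^{log10(i):.0f} -> {cheaper}")
```

Varying M, N, and R in this way makes the break-even line between the two approaches easy to explore for one's own parameter settings.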

Even without taking into consideration the different cost of a function evaluation (which is of course much more expensive than accessing an element of the population), it is clear that both approaches have advantages and drawbacks.

Computational costs as defined here form a narrow term that translates mainly into “time of computation.” However, memory is another resource, and its use stems from the parameters that need to be stored; the two methods make different demands on memory as well. For instance, in order to integrate the ODEs, the reaction parameters need to be stored in matrices prior to integration, which requires $O(N^{\langle n \rangle})$ storage locations. In a molecular simulation with explicit reactions, the requirements are similar. In a molecular simulation with implicit reactions, however, reactions can be stored in a lookup table that grows only as reactions that have never been encountered before occur. In this way, a more efficient use of space can be achieved. Keep in mind that, as in other algorithms, one can sometimes trade time for memory or the other way around, so the entire picture needs to take both requirements into account when dealing with resource limitations.

# 2.7 Summary

In this chapter we have discussed general principles of chemistry and how chemical reactions can be modeled mathematically and simulated in a computer. We have also seen how complex chemical reaction networks can be formed from simple reactions. We have learned about the general structure of an artificial chemistry and realized that it reflects the needs of all modeling and simulation approaches. We then looked at two examples of very simple ACs to demonstrate the idea. Finally, we discussed a number of frequently used terms and techniques in ACs. The discussion in this chapter has summarized, on an introductory level, some of the bare essentials necessary to understand the rest of the book. (p.44)

## Notes:

(1) Note that the stoichiometric coefficients αi, j and βi, j are set to zero if a species Si does not participate in the reaction rj (αi, j = 0 if Si is not a reactant and βi, j = 0 if it is not a product of reaction rj).

(2) While this is the natural case in chemistry due to the physical features of space, it is not required for artificial chemistries and must be explicitly considered.

(3) These computational costs will reduce to O(M) if the lookup of a reaction can be done in O(1) which is, e.g., the case for reactions implicitly defined by the structure of a molecule. However, calculation of implicit reactions will then replace the lookup, and might carry even higher costs. So the comparison of costs in table 2.3 assumes O(R) lookup costs.

(4) Variable step size solvers might be necessary if the rate constants of the reactions differ by several orders of magnitude, which in turn makes the system stiff.

(5) This comparison does not consider other effects such as creating the reaction table, or the creation and deletion of new species during the evolution of a system. It also neglects the memory requirements of both approaches, which cause additional computational effort. Nevertheless, this comparison gives a coarse evaluation of the approaches.