## Wolfgang Banzhaf and Lidia Yamamoto

Print publication date: 2015

Print ISBN-13: 9780262029438

Published to MIT Press Scholarship Online: September 2016

DOI: 10.7551/mitpress/9780262029438.001.0001


# The Essence of Life

Chapter: Chapter 6, The Essence of Life

Source: Artificial Chemistries

Publisher: The MIT Press

DOI: 10.7551/mitpress/9780262029438.003.0006

# Abstract and Keywords

This chapter briefly explores the potential mechanisms underlying a transition from inanimate matter to life. First, the structure of a minimal cell is discussed, according to chemoton theory and autopoiesis. A summary of current theories about the origin of life on Earth then follows: starting from the formation of organic compounds in a prebiotic world, various hypotheses have been formulated about how these basic building blocks could have been combined into self-replicating and evolving structures that can be considered alive. Three of these hypotheses are covered: the RNA World, the Iron-Sulfur World, and the Lipid World. The chapter concludes with an overview of Artificial Chemistries intended to study minimal cells and the origins of life, such as autocatalytic sets and autopoietic protocells.

Organisms are different from machines because they are closed to efficient causation.

ROBERT ROSEN, LIFE ITSELF, 1991

The basic atoms of life (carbon, hydrogen, oxygen, and nitrogen) are everywhere. They are among the most common and cheapest chemical elements found on Earth. And yet, mixing them all together in a container does not result in anything near organic. The key to life is the way these atoms can form basic organic building blocks, and then how these building blocks can be combined into highly organized complex structures.

The organization of current life forms is very complex indeed. How does such complexity emerge? Would it be possible to reduce it to the bare essential? Would it be possible to put together a very simple cell out of the basic building blocks of life, one that is much simpler than today’s organisms, but still functional and able to evolve into more complex forms? What is then the essence of life?

Now that we have looked at how the building blocks of life are combined into the complex organisms that we see today, we will look at how they could be combined (hypothetically) to build the simplest possible living cell. Initially, assume that building blocks such as fatty acids, amino acids, and nucleotides are available in abundance. The possible origin of these compounds, and how they could have led to life on Earth, will be discussed in section 6.2.

# 6.1 A Minimal Cell

Recall from the operational definition of life (chapter 5) that a minimal cell requires three basic subsystems: a compartment, an information carrier, and a metabolism. Different origin-of-life theories differ in their hypotheses about which of these subsystems may have emerged first, and how they may have been integrated to form a living cell. This topic will be covered in section 6.2. For now, we will look at how each individual subsystem may be put together using the building blocks of life introduced in section 5.2. We will then look at how the subsystems may be combined into a minimal cell, as an outcome of a natural or artificial process, without any necessary connection to what may have actually happened when life first appeared on the primitive Earth. A brief summary of the various theories about the actual origin of life on Earth will then follow.

## 6.1.1 Membrane Compartment

A compartment defines the boundaries of the cell, protecting its information and metabolic content from the external environment. A compartment must also perform the role of a communication channel, selectively importing nutrients into the cell, exchanging signaling molecules, and expelling waste and toxic substances. The structure of modern cell membranes offers an example of a sophisticated compartment: a number of specialized proteins are embedded in the membrane’s phospholipidic bilayer, forming selective pores that actively facilitate the passage of certain molecules while blocking others. Are there simpler and yet feasible types of compartments?

One of the easiest ways to create simple compartments is by throwing fatty acids into water. As outlined in section 5.2, fatty acids are amphiphiles with a simpler molecular structure than phospholipids, but are also able to form micelles and vesicles similar to those shown in figure 5.3. Micelles are small spherules, while vesicles are larger bubbles. Such structures “self-assemble,” that is, they form spontaneously; hence, they are stable yet dynamic, and easily rebuild when shaken apart. Free fatty acids may be incorporated in the structure, causing it to grow. As it grows, its shape may become irregular, and it may be divided by mechanical forces.

Vesicles have a hollow interior which is also filled with water, making them a suitable container to hold the other subsystems. Vesicles made of fatty acids have also been shown to be permeable to small molecules such as nucleotides, amino acids, and small food molecules [839]. At the same time, bigger molecules such as biopolymers tend to remain inside the vesicle.

Since such vesicles are able to self-maintain, grow, and divide, can we call them alive, even if empty or just full of water inside? The answer is no, because they are not able to carry genetic information in a reliable way, and to transmit this information to the next generation. There is no “program” controlling the maintenance of the cell. The growth and division processes are erratic, subject to external factors, and prone to disruptions that can easily destroy the compartments. However, this answer is not as straightforward as it may seem. It is also possible to build roughly spherical aggregates out of different types of lipids [761]. These aggregates can also grow and divide, and their composition (“compositional genome” or “composome”) can be regarded as information that is transmitted during division, albeit less accurately and with a lower carrying capacity than what can be achieved with linear polymer sequences. The idea of such aggregate-based information carriers gave rise to the Lipid World hypothesis for the origin of life, further discussed in section 6.2.4.

Since it is very easy and cheap to obtain compartments by self-assembly using amphiphiles, this technique is being used to explore alternative designs for artificial protocells [691] (see chapter 19).

## 6.1.2 Genetic Information

The information subsystem is at the core of all forms of life. Organisms use this information to maintain themselves, and they must transmit it from one generation to the next. Offspring resemble their parents. The classical view of information subsystems is anchored upon life on Earth: DNA and RNA molecules carry the genetic information needed to construct the organism, and this information is encoded in their polymer sequence.

If we wished to “design” the simplest possible cell, what kind of information system would we use? Certainly not the complex machineries of today’s cells. We could then ask ourselves whether we would use only DNAs, only RNAs, or perhaps even only proteins. Researchers have debated this topic extensively over the years, and have reached a rough consensus that proteins by themselves would not make a good information storage substrate, since they are unable to replicate in general, and their alphabet of about 20 symbols is overly complicated. Similarly, DNAs are not good candidates either, because they form structures that are too stable and need several enzymes to be replicated. This leaves us with RNA molecules, which form less rigid structures that can be copied by template replication. Furthermore, some RNAs have been shown to have catalytic properties (“ribozymes”) [458, 657, 844]. Hence, if we are looking for a simple way to carry information within a cell, RNA seems to be a good candidate to start with. The idea would be to construct a system made only of RNAs, where some RNAs would store information, others would help to replicate them, and others would help with the various other processes necessary to maintain the cell. This idea gave rise to the RNA World hypothesis for the origin of life [327].

However, storing and transmitting information in RNAs is not as simple as it might appear. The polymerization of RNA strands in water is unfavored; more problematically, side reactions and errors in the replication process hinder the accuracy of information transmission. In modern cells, the replication of genetic material counts on sophisticated error correction processes aided by specialized enzymes. In the absence of such enzymes, replication is unfaithful and tends to accumulate too many errors. This compromises the fidelity of the information transfer and generates offspring that is unable to survive, ultimately leading to the mass extinction of the species. In the 1970s, Eigen [249] showed that there is a threshold in the amount of replication errors, above which the species cannot survive. He also showed that this error threshold imposes a maximum sequence length beyond which the error threshold is crossed, leading to an “error catastrophe” (mass extinction). Therefore, without the aid of error-correcting enzymes, only small sequences could be replicated with sufficient accuracy. But longer sequences are needed to form enzymes, and if the replication process relies on these enzymes, then we are left with a chicken-and-egg problem known as the “Eigen paradox.” Small molecules could survive by nonenzymatic replication, but the evolution of more sophisticated functions would require such primitive replicators to elongate while crossing the error threshold. How to achieve this in a hypothetical RNA World and in synthetic protocells are subjects of current research, further discussed in Sections 6.2.2 and 19.2.
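The length limit can be made concrete with the standard quasispecies form of the error threshold. The sketch below is illustrative and is not Eigen's original derivation. It assumes a per-base copying fidelity q and a selective superiority σ of the fittest sequence (neither symbol appears in the text above), and uses the persistence condition q^L · σ > 1, which gives L_max = ln σ / (−ln q) ≈ ln σ / (1 − q). The error rates in the loop are made-up values chosen only to show the trend.

```python
import math


def max_sequence_length(per_base_fidelity: float, superiority: float) -> float:
    """Largest length L for which per_base_fidelity**L * superiority > 1.

    Both parameters are assumptions of this sketch: `per_base_fidelity` is the
    probability q of copying one monomer correctly, and `superiority` is the
    replication advantage sigma of the fittest sequence over its mutants.
    """
    if not 0.0 < per_base_fidelity < 1.0:
        raise ValueError("per_base_fidelity must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority must exceed 1")
    return math.log(superiority) / -math.log(per_base_fidelity)


if __name__ == "__main__":
    # Illustrative per-base error rates, from sloppy to highly accurate copying.
    for error_rate in (1e-2, 1e-4, 1e-6):
        limit = max_sequence_length(1.0 - error_rate, superiority=10.0)
        print(f"error rate {error_rate:g}: max length about {limit:.0f} monomers")
```

Under these assumptions the tolerable length is roughly inversely proportional to the per-base error rate, which is why accurate copying and long sequences depend on each other.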

## 6.1.3 Metabolism

As explained earlier, all life forms require some source of energy for their survival. A living system can be seen as an open dynamic system kept out of equilibrium by a continuous inflow of energy and outflow of waste. The metabolic processes in today’s organisms take care of harvesting the necessary energy, storing and transforming it along several metabolic pathways such that it can be retrieved when needed and used for the various activities of the cell. Without a metabolism, organisms would have to rely entirely on the prompt delivery of all sorts of needed raw materials from the outside, and would quickly die as soon as these raw materials were depleted. Moreover, the complexification of life forms would be hindered by their inability to manufacture increasingly more sophisticated molecules.

Most of the metabolic reactions in today’s organisms rely on specialized enzymes that accelerate their rates to acceptable levels. Moreover, it is not sufficient to have a serial reaction pathway, made of a chain of reactions occurring one behind the other, transforming nutrients to useful components for the cell. Such a serial construction would easily break down due to a missing intermediate and would be seriously limited by the rate of the slowest reaction. In order to construct robust metabolic networks, one needs redundant metabolic pathways (such that at least one of them remains functional when others are knocked out), cyclic reaction networks (able to regenerate and reuse components), and more especially, autocatalytic reaction networks (able to replicate their constituents).

Figure 6.1 Schematic representation of the formose reaction, a possible base for a minimalistic metabolism. Left: organic compounds reacting in an autocatalytic cycle that takes two molecules of formaldehyde (CH2O) and produces two molecules of glycolaldehyde (C2H4O2), that can then be used to produce other organic compounds such as sugars and lipid components. Note that the component C4H8O4 of the cycle is already a simple sugar (tetrose). Right: abstract representation of the cycle on the left, with Ci labels representing the number i of carbons in the actual molecule.

What would a minimalistic metabolism look like? Researchers in biology and artificial life have considered this question for a long time. One possibility might be something like the formose reaction (figure 6.1). The formose reaction is actually not a single reaction but an autocatalytic cycle of chemical reactions that is able to produce sugars such as tetrose and ribose (a nucleotide component), as well as glycerol (a lipid component, see section 5.2.5) from formaldehyde, a common organic compound in the universe. It does all this without the need for biological enzymes or ATP input. For these reasons the formose reaction is a good candidate constituent of primitive metabolic processes in early life forms, before the appearance of specialized enzymes and energy carriers such as ATP.
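To see why such a cycle counts as autocatalytic, it helps to integrate a toy version of it. The sketch below is a minimal illustration, assuming the abstract Ci notation of figure 6.1 and three hypothetical mass-action steps (C2 + C1 → C3, C3 + C1 → C4, C4 → 2 C2). The single rate constant, the initial concentrations, and the Euler step size are arbitrary choices, not measured values.

```python
def simulate_formose(c1=10.0, c2=0.01, k=1.0, dt=0.001, steps=5000):
    """Euler-integrate an abstract formose cycle with mass-action kinetics.

    Assumed steps (C_i stands for a molecule with i carbon atoms):
        C2 + C1 -> C3,   C3 + C1 -> C4,   C4 -> 2 C2
    Returns the final concentrations (c1, c2, c3, c4).
    """
    c3 = c4 = 0.0
    for _ in range(steps):
        r1 = k * c2 * c1  # glycolaldehyde picks up one formaldehyde
        r2 = k * c3 * c1  # the C3 intermediate picks up another one
        r3 = k * c4       # the tetrose splits into two glycolaldehydes
        c1, c2, c3, c4 = (
            c1 - dt * (r1 + r2),
            c2 + dt * (2.0 * r3 - r1),
            c3 + dt * (r1 - r2),
            c4 + dt * (r2 - r3),
        )
    return c1, c2, c3, c4


def total_carbon(c1, c2, c3, c4):
    """Carbon atoms held in all four species; every step above conserves it."""
    return c1 + 2 * c2 + 3 * c3 + 4 * c4


if __name__ == "__main__":
    final = simulate_formose()
    print("final concentrations (C1, C2, C3, C4):", final)
    print("total carbon:", total_carbon(*final))
```

Starting from a trace of C2, the cycle intermediates grow at the expense of C1 while the total carbon stays constant. That growth is the signature of autocatalysis: each turn of the cycle returns two C2 molecules for every one consumed.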

The next question to ask is, how could metabolisms emerge and evolve? This question is a difficult one with no clear answer so far, and may be asked in various contexts: the first one is within origin-of-life hypotheses, and will be addressed in section 6.2.3 when looking at “metabolism first” theories for the origin of life. Another context is the synthesis of artificial metabolisms in the laboratory for various purposes, and will be discussed in chapter 19. Finally, it is interesting to look at abstract models of metabolic processes, expressed as artificial chemistries. These will be covered in section 6.3.1.

## 6.1.4 The Chemoton

In order to compose a living cell using the three subsystems described earlier, a useful exercise is to imagine an abstract chemistry able to give rise to entities with lifelike properties and to support them afterward. Someone had already thought about this before: Back in the 1950s, Tibor Gánti proposed the Chemoton, one of the first abstract models of a living cell [305, 306]. It was conceived around 1952 and published in Hungary in 1971, but became known in the Western world only much later. Since that time, the progress in molecular and cell biology has been enormous. Synthetic biology has emerged, and numerous alternative designs for synthetic protocells have been proposed (see [692, 787] for an overview). Yet, the basic concepts behind the original Chemoton model remain valid and are very instructive to understand the principles behind the organization of life in a simple and abstract way, without resorting to all the complex details of real chemistry.

Figure 6.2 The Chemoton by Tibor Gánti [305, 306], an early abstract protocell model: a membrane consisting of molecules of type T encloses the cell’s genetic and metabolic subsystems. The genetic material is made of single-stranded (pVi) and double-stranded (pVi Vj) polymers. Molecules Ai run the metabolic subsystem that autocatalytically produces template and membrane molecules from nutrients X, expelling waste Y in the process.

The Chemoton is shown in figure 6.2. One can distinguish a surrounding membrane made of m molecules of type T. Nutrients are taken up from outside in the form of X molecules, that enter an autocatalytic cycle representing the metabolism of the cell.

The metabolic subsystem uses the X molecules to run a cyclic reaction network that transforms molecule Ai into molecule Ai+1 (modulo the size of the cycle, which is 5 in the model). Each turn of the metabolic cycle produces raw materials for the other subsystems: precursors of T molecules for the membrane, and V molecules for the genetic subsystem. It also generates waste molecules Y that are expelled out of the cell. The last step of the metabolic cycle splits one molecule of A5 into two molecules of A1; the cycle is therefore autocatalytic. This cycle is a conceptual simplification of the formose reaction (figure 6.1), already mentioned as a candidate component of a primitive metabolism.
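The bookkeeping of one turn of this cycle can be sketched with simple counters. The text does not give the stoichiometry, so the code below assumes, purely for illustration, that each molecule completing a turn consumes one X and yields one V, one membrane precursor, and one waste Y before splitting in two. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetabolicCycle:
    """Coarse-grained counter model of the Chemoton's metabolic cycle.

    Assumed stoichiometry per completed turn (not taken from the source):
    A1 + X -> ... -> A5 -> 2 A1 + V + membrane precursor + Y
    """
    a: int = 1            # molecules in the cycle, counted at stage A1
    x: int = 0            # nutrient molecules X available inside the cell
    v: int = 0            # monomers V made for the genetic subsystem
    t_precursor: int = 0  # precursors made for membrane molecules T
    y: int = 0            # waste molecules Y expelled

    def turn(self) -> None:
        """Run one full turn A1 -> A2 -> ... -> A5 -> 2 A1 for every fed molecule."""
        completed = min(self.a, self.x)  # molecules without an X stay where they are
        self.x -= completed
        self.v += completed
        self.t_precursor += completed
        self.y += completed
        self.a += completed              # each completed turn ends in A5 -> 2 A1


if __name__ == "__main__":
    cell = MetabolicCycle(a=1, x=100)
    for n in range(1, 4):
        cell.turn()
        print(f"turn {n}: A={cell.a} X={cell.x} V={cell.v} Y={cell.y}")
```

With nutrients in excess, the number of cycle molecules doubles with every turn. Once X runs out, the cycle stalls, which reflects the dependence of the whole cell on nutrient uptake.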

The genetic material takes the form of pVn molecules, where p indicates that the molecule is a polymer made of n monomers of type V. The genetic information is encoded in the pVn molecules, and is copied by a process called template replication. Template replication is a simplified form of what occurs today in DNA replication. First of all, a single-stranded polymer pVn serves as a template for the binding of free monomers to the monomers already in the chain, forming a double-stranded ladder pVnVn. The double strand then splits into two single strands that can again serve as templates, and the cycle repeats, doubling the number of template molecules with each turn. It is therefore another example of an autocatalytic cycle.

In the Chemoton, the template replication cycle also generates R molecules that participate in membrane production. These R molecules react with the precursors T″ produced by the metabolic cycle to produce the actual T molecules to be incorporated in the membrane. In this way, the speed of membrane formation is directly coupled to the speed of replication of the genetic material, and to the metabolic rate with which the “A-wheel” turns.

Starting with a given amount of materials inside the cell, when the genetic material duplicates, the membrane surface doubles. At the same time, the overall amount of substances within the cell also doubles, so the volume of the cell doubles as a consequence, but this is insufficient to compensate for the surface doubling. It follows that the cell can no longer maintain its spherical shape due to osmotic pressure, and it ends up dividing into two daughter cells, each with roughly half of the material of the mother cell. Each daughter cell then goes back to the initial state, taking up X molecules from the environment to produce more membrane and template constituents, again reaching an osmotic tension that ends up splitting the cell in two, and so on. The chemoton is “alive” in the sense that it maintains its three subsystems, and is able to reproduce.
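The mismatch between doubled surface and doubled volume is a matter of sphere geometry, and a few lines of arithmetic make it explicit. This is only a geometric reading of the argument, assuming ideal spheres and arbitrary units. It does not model osmotic pressure or membrane mechanics.

```python
import math


def sphere_volume_for_area(area: float) -> float:
    """Volume enclosed by a sphere whose surface area is `area`."""
    radius = math.sqrt(area / (4.0 * math.pi))
    return (4.0 / 3.0) * math.pi * radius ** 3


area = 1.0                                      # mother cell surface, arbitrary units
volume = sphere_volume_for_area(area)           # mother cell volume

contents_after_doubling = 2.0 * volume          # internal material has doubled
one_big_sphere = sphere_volume_for_area(2.0 * area)    # one sphere, doubled membrane
two_daughters = 2.0 * sphere_volume_for_area(area)     # two spheres, same total membrane

# Fraction of the single large sphere that the doubled contents would fill.
fill_fraction = contents_after_doubling / one_big_sphere

print(f"one sphere with doubled area holds {one_big_sphere / volume:.3f} x the volume")
print(f"doubled contents fill only {fill_fraction:.3f} of it")
print(f"two daughter spheres hold {two_daughters / volume:.3f} x the volume")
```

A single sphere with twice the membrane would enclose 2^(3/2) times the original volume, more than the doubled contents can fill, whereas two spheres of the original size accommodate exactly twice the volume with the same total membrane.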

Gánti’s Chemoton model offers a computational vision of the cell. The word Chemoton stems from “chemical automaton.” Gánti refers to his Chemoton as a “program-directed, self-reproducing fluid automaton,” in contrast to von Neumann’s self-replicating machines [893]. The program in the Chemoton is the set of autocatalytic reactions that drive the survival and replication of the cell. It is fluid, as opposed to solid electric or mechanical devices. Its fluid nature makes it more flexible and easy to replicate, since new cells may easily move to other areas in fluid space, something that is rather difficult to achieve when trying to copy solid parts. Gánti’s vision of the minimum cell is also that of a machine doing useful work: The cell is organized as a set of interconnected chemical cycles, each of which performs useful work by producing molecules that will be used in other parts of the cell.

The Chemoton was ahead of its time, and it is a very elegant model of a simple cell. It has been subject to extensive studies [204, 271] and has been recently shown to exhibit complex dynamics [604], being to some extent robust to noise [880, 949] and environmental changes such as food shortage [273], and able to differentiate into species [159]. However, the Chemoton is sometimes too simplistic and unrealistic. For instance, Gánti acknowledges that at least four different types of template monomers are needed to constitute a functional information system (akin to the four nucleobases in DNA and RNA). These different types are not explicitly modeled in the Chemoton, where all template monomers are just represented as V molecules. Actually, the genes in the Chemoton do not play such a crucial role as the genes in modern cells. The Chemoton relies on the synchronized growth and replication of its internal materials and membrane surface, and this synchronization is achieved via stoichiometric coupling between the different subsystems. This makes the Chemoton fragile in the face of changes in the internal concentration of its compounds.

The Chemoton was designed by a scientist using a thorough knowledge of chemical reaction networks and their mathematical modeling. In contrast, how would a minimum cell emerge spontaneously in some primordial ocean on Earth or elsewhere? Would it look like the Chemoton, or would it be based on an entirely different set of reactions? Would there be a variety of feasible primitive cells, all with potentially different internal architecture, and perhaps competing for resources? How could it be set to evolve to more complex life forms? The Chemoton does not provide direct answers to these questions. For this we must turn to two interrelated and complementary exploration fields. The first of these is an investigation into the possible origin of life on Earth, looking at our past in search of these answers. The second field is the exploration of alternative emergent protocell designs from the bottom up [691], either by computer simulations using artificial chemistries, or by attempting to obtain synthetic cells in vitro, thus seeking these answers by looking toward the future. These different research directions will be surveyed in the remainder of this chapter and in chapter 19.

## 6.1.5 Autopoiesis

Another early conceptual model of living systems was put together by Varela, Maturana, and Uribe in the 1970s [557, 881]. It is based on the notion of autopoiesis, which means “self-production.” In contrast to others, Varela and coworkers emphasize the self-maintenance aspect of living systems, rather than their reproductive capabilities. A system can be alive if it is able to maintain itself autonomously, that is, if it is autopoietic. It could remain alive forever, without ever reproducing or dying. Neither is evolution by natural selection an essential property of minimal life forms.

According to [881] an autopoietic unit must satisfy six conditions. It must have identifiable boundaries (1) with inner constitutive elements or components (2); the interactions and transformations of these components form a mechanistic system (3); the unit’s boundaries are formed through preferential interactions (4) between the elements forming these boundaries; all the components of the unit, including its boundary components (5) and inner components (6) are produced by interactions between the unit’s own components.

A computer simulation of an autopoietic system satisfying these conditions was then designed in [881]. It consists of a two-dimensional grid where particles of three types (substrates, catalysts, and links) move and interact. Link particles are membrane constituents that can bond to each other. A catalyst molecule converts two substrate molecules into a link molecule. As they bond, links may form closed membranes that are nevertheless permeable to substrate molecules. Link molecules also decay spontaneously. Using this simple model, the spontaneous formation of an autopoietic unit was demonstrated in [881], as depicted in figure 6.3. In the same paper, the authors also show that the boundary can be regenerated after the decay of some of its constituent molecules, illustrating the self-maintaining features of the autopoietic system.
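The production and decay rules described above can be tried out on a small lattice. The sketch below is a deliberately reduced illustration and not the algorithm of [881]: it implements only substrate diffusion, the catalytic step that turns two substrates into one link, and spontaneous link decay. Bonding between links, membrane closure, and permeability are left out, links are immobile, and the grid size, fill fraction, and decay probability are arbitrary assumptions. Links are written as `L` here.

```python
import random

EMPTY, SUBSTRATE, CATALYST, LINK = ".", "O", "*", "L"
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def make_grid(size, rng, fill=0.8):
    """Square grid of substrates and holes with one catalyst in the centre."""
    grid = [[SUBSTRATE if rng.random() < fill else EMPTY for _ in range(size)]
            for _ in range(size)]
    grid[size // 2][size // 2] = CATALYST
    return grid


def neighbours_of_kind(grid, r, c, kind):
    """Coordinates of the up to eight neighbours of (r, c) holding `kind`."""
    size = len(grid)
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < size and 0 <= c + dc < size
            and grid[r + dr][c + dc] == kind]


def step(grid, rng, decay_prob=0.05):
    """Visit every cell once in random order and apply its rule."""
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid))]
    rng.shuffle(cells)
    for r, c in cells:
        kind = grid[r][c]
        if kind == CATALYST:
            # Production: two neighbouring substrates become one link and one hole.
            substrates = neighbours_of_kind(grid, r, c, SUBSTRATE)
            if len(substrates) >= 2:
                (r1, c1), (r2, c2) = rng.sample(substrates, 2)
                grid[r1][c1], grid[r2][c2] = LINK, EMPTY
        elif kind == LINK:
            # Decay: a link falls back into two substrates if a hole is free.
            holes = neighbours_of_kind(grid, r, c, EMPTY)
            if holes and rng.random() < decay_prob:
                hr, hc = rng.choice(holes)
                grid[r][c] = grid[hr][hc] = SUBSTRATE
        elif kind == SUBSTRATE:
            # Diffusion: a substrate hops into a randomly chosen neighbouring hole.
            holes = neighbours_of_kind(grid, r, c, EMPTY)
            if holes:
                hr, hc = rng.choice(holes)
                grid[r][c], grid[hr][hc] = EMPTY, SUBSTRATE


def mass(grid):
    """Substrate units on the grid, counting each link as two substrates."""
    flat = [cell for row in grid for cell in row]
    return flat.count(SUBSTRATE) + 2 * flat.count(LINK)


if __name__ == "__main__":
    rng = random.Random(0)
    grid = make_grid(11, rng)
    for _ in range(20):
        step(grid, rng)
    print("\n".join("".join(row) for row in grid))
```

Because production consumes two substrates per link and decay returns them, the quantity computed by `mass` stays constant. Extending the sketch with mobile, bonding links would be the next step toward the closed boundaries shown in figure 6.3.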

Following that pioneering work, Maturana and Varela later published a book on the topic [559], and subsequently attracted the attention of numerous researchers who tried to reproduce their results and improve upon their computer experiments [136, 156, 282, 421, 437, 560, 569, 637, 639–641, 729, 730]. These ideas were also brought into synthetic biology for the creation of minimal cells in the laboratory [531, 533–536, 801].

## 6.1.6 Robert Rosen’s ideas

Robert Rosen was one of the pioneers of the now widespread opinion that biology must be approached from a complex systems perspective, and that reductionism is not sufficient to understand biological phenomena. Rosen argued that biological organisms can be distinguished from simple physical systems by their “organization,” a result of the complex interactions between the organism’s several parts, which cannot be reduced to the parts alone: When we break the system apart in order to study it, we destroy its organization and therefore cannot see how it functions [725]. This basic property of organization is what in systems theory, and in models of complex systems today in particular, is referred to with the saying “the whole is more than the sum of the parts.” From this perspective, Rosen can be considered as one of the founders of systems biology.

Figure 6.3 Computer simulation of autopoiesis: Initially (t = 0) only one catalyst molecule (*) is present, surrounded by substrate molecules (O). Link molecules [O] are formed in a catalytic reaction * + 2O → * + [O]. This can be seen at t = 2. Link molecules tend to bond to each other, eventually leading to the formation of a closed boundary resembling a cell membrane (visible at t = 6).

Rosen’s concept of organization is linked to self-organization and thermodynamics: “a system is organized if it autonomously tends to an organized state” (p. 115 of [725]). We know from the second law of thermodynamics that a closed system tends to a state of thermodynamic equilibrium, associated with a state of maximum entropy or maximum disorder. Therefore, in order to be organized the system must be open and thus out of equilibrium. One can measure the degree of organization of a system by measuring its distance from equilibrium, or equivalently, by measuring the improbability of its state.

Rosen also claimed that an organization is independent from its material support. Therefore, a complex living system could be studied in an abstract way through the modeling of its organizational properties. He then set out to explain how such living organizations could work. For this purpose, he proposed an abstract mathematical model of a living organism called the metabolism-repair (M,R)-system [720, 723, 724]. An (M,R)-system is a formalism intended to capture the minimal functionality of a living organism, without details of its biochemical implementation. In an (M,R)-system, metabolism (M) and repair (R) subsystems interact to keep the organism alive (see figure 6.4). Rosen’s metabolic subsystem is a mathematical abstraction of biological anabolic and catabolic functions. The repair subsystem is an abstraction for a genetic subsystem, containing the information necessary to construct the (M,R)-system, regenerating and replicating it when needed. (M,R)-systems contain no membrane abstraction, consistent with Rosen’s goal of a very minimalistic formalism. Self-replication follows as a consequence of requiring the avoidance of an infinite regress of repair systems.

Figure 6.4 Rosen’s (M,R)-systems are closed to efficient causation. The notion of these systems has developed over the years. A, B, f, Φ are components (material and functional) of the system. A is transformed into B with the help of f; B is transformed into f with the help of Φ; f is transformed into Φ with the help of B. Left: Rosen’s original unsymmetrical conceptualization [725]. Middle: Symmetrical depiction as proposed by Cottam et al. [201]. Right: More conventional use of reaction and substrate notation as proposed in [343]. Line arrows symbolize material cause, broken line arrows efficient cause.
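The phrase “closed to efficient causation” can be checked mechanically on the three transformations listed in the caption of figure 6.4. The encoding below is one possible reading, offered as a sketch: each product is mapped to its material cause and its efficient cause, and the system counts as closed when every efficient cause is itself produced inside the system. The dictionary layout and function names are assumptions of this example, not Rosen's category-theoretic formalism.

```python
# product -> (material cause, efficient cause), read off the caption of figure 6.4
MR_SYSTEM = {
    "B": ("A", "f"),      # A is transformed into B with the help of f
    "f": ("B", "Phi"),    # B is transformed into f with the help of Phi
    "Phi": ("f", "B"),    # f is transformed into Phi with the help of B
}


def efficient_causes(transformations):
    """All components that act as an efficient cause somewhere in the system."""
    return {efficient for _material, efficient in transformations.values()}


def external_inputs(transformations):
    """Material causes that the system does not produce itself."""
    materials = {material for material, _efficient in transformations.values()}
    return materials - set(transformations)


def is_closed_to_efficient_causation(transformations):
    """True if every efficient cause is produced by some transformation."""
    return efficient_causes(transformations) <= set(transformations)


if __name__ == "__main__":
    print("closed:", is_closed_to_efficient_causation(MR_SYSTEM))
    print("external material inputs:", external_inputs(MR_SYSTEM))
```

In this reading only the raw material A has to come from outside, while the three efficient causes f, Φ, and B are all regenerated internally. Removing the transformation that produces Φ would break the closure.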

After an initial graph-theoretical formulation of (M,R)-systems in [720], a more powerful formulation based on category theory was proposed in [723, 724]. About a decade ago, a relation between (M,R)-systems and autopoietic systems was discovered: Both theoretical frameworks abstract away from particular features of their parts/components, and emphasize the circular character of causation in these conceptual models of living systems [509]. The authors conclude that autopoietic systems are a subset of (M,R)-systems. Certain features of (M,R)-systems, one being the controversial noncomputability feature proved by Rosen, would therefore carry over to autopoietic systems and render them noncomputable as well [180, 725].

# 6.2 Origin of Life

For many centuries, people believed in the theory of spontaneous generation: Life (or at least microbes) could arise spontaneously from a mix of nutrients. Rudolf Virchow proposed in 1855 that life could only come from life, which he expressed as “omnis cellula e cellula.” This idea was corroborated by Louis Pasteur, who in 1862 proved beyond any doubt that earlier experiments that seemed to have proven otherwise were flawed due to inadvertent contamination. But if this is so, how did the first life form appear? It was then postulated that if life once arose out of inanimate matter, this must have happened a very long time ago, under conditions that were totally different from the ones found nowadays.

There are various hypotheses about how life might have originated on Earth, based on the early chemical composition and atmospheric conditions of the prebiotic Earth. There are hypotheses focusing on each of the elementary subsystems as the one that could have emerged first: compartment, metabolism or information first. And there are hypotheses combining subsystems. Probably we will never know for sure how exactly life appeared on Earth,¹ but the various theories help us not only to shed light on the possible past history of our planet, but also to learn about the physicochemical processes responsible for producing life from inanimate matter. Such fundamental knowledge can then be applied to various areas of science and technology, and ultimately, it can be used to seek life on other planets.

In this section we give a brief overview of the existing theories on the origin of life, with special attention to those aspects of these theories that have been investigated with the help of artificial chemistries. For more comprehensive recent reviews on the topic, see [48, 140, 449, 534, 710, 763].

## 6.2.1 The Prebiotic World

About four billion years ago, the Earth was cooling down, and water vapor in the atmosphere condensed to form the oceans. It is believed that some basic organic building blocks of life formed “shortly” after that, around 3.8 billion years ago.

In 1952, through a series of famous experiments, Miller and Urey [587, 588] showed that various organic compounds could be formed under conditions similar to those assumed to have occurred on the early Earth. Based upon previous theoretical works by Oparin and Haldane [299, 644], they assumed that the primordial atmosphere of the Earth was composed of hydrogen (H2), ammonia (NH3), methane (CH4), and water vapor (H2O). In their experiments, these components were mixed in a flask, and an electric discharge was applied to them. As a result, amino acids such as glycine and alanine, and a number of other organic compounds were formed.

Subsequently, other scientists showed how to synthesize other building blocks of life, including adenine, guanine, and ribose. Today, plausible chemical pathways have been discovered for synthesizing most building blocks of life from inorganic compounds (see [184] for an overview). However, there are still gaps in this knowledge. For instance, sometimes the yield of a given compound is too low, or too many undesirable side-products are formed. Worse, the conditions for the formation of some building blocks sometimes differ from those of the formation of others, which raises the question of how they could have been put together into early life forms. Even the composition of the prebiotic oceans and atmosphere is still subject to debate [950]. Some claim that hydrogen would have easily escaped to outer space, and that methane and ammonia would have been quickly destroyed. Others maintain that the early atmosphere contained significant amounts of volcanic gases such as carbon dioxide (CO2) and nitrogen (N2), conditions that do not favor the formation of organic compounds. These difficulties make some believe that a large portion of the organic material needed to bootstrap life arrived on Earth from outer space via comets and meteorites (the panspermia theory). Others suggest that the basic building blocks started underwater, near hydrothermal vents (volcanic fissures that release hot water).

Whatever their origin, these organic compounds would have ended up in the oceans and ponds of the primitive Earth, forming a prebiotic “soup” that would later lead to the formation of the first cells. Once the basic building blocks were there, floating in the primordial soup, how did they combine to form the first primitive organisms? Several theories have been proposed to explain the emergence of life, each with its strengths and weaknesses, and no overall consensus has been reached so far.

One of the most famous theories is the RNA World hypothesis, according to which primitive nucleotides polymerized into RNA molecules or some variant thereof. Some of these polymers would exhibit catalytic properties (ribozymes). Eventually some ribozymes became able to copy other ribozymes, and get copied too, giving rise to the first replicators.

Another theory is the Lipid World hypothesis, according to which lipids, or even simply fatty acids, self-assembled into compact aggregates, or into hollow vesicles, that have the natural ability to grow and divide [760].

A third theory is the metabolism-first hypothesis. It holds that life was bootstrapped from a web of chemical reactions forming a primitive metabolism able to use energy to maintain itself. Such a web of reactions would typically contain autocatalytic cycles such as the formose reaction, which could later be accelerated by primitive catalysts. The energy flow within the organism would be ensured by coupled redox reactions.

These theories are often combined to produce hybrid explanations for the origin of life: For instance, a primitive metabolism could have supplied building blocks for the first self-replicating information molecules. Szostak [839] proposes a protocell made of a primitive replicator enclosed in a membrane, without an explicit metabolic network. Shapiro [769] highlights a boundary as the first requirement in a metabolism-first scenario; the boundary is needed to allow the organism to increase its internal order while the external entropy increases.

## 6.2.2 The RNA World

Cells rely on genetic information contained in the DNA to carry out their vital functions. Today, this information is transcribed from DNA to RNA, and then translated from RNA to proteins. At some point in time, a first molecule or group of molecules able to make crude copies of themselves must have emerged on the primitive Earth. DNA is too stable and its replication requires a complex mechanism guided by enzymes. Therefore, various theories on the origin of life generally assume that more primitive molecules played the role of information carriers in early life forms. The RNA World hypothesis says that RNA was such a molecule: A primordial mechanism for RNA replication could have been based on ribozymes, RNA molecules able to act as enzymes, cleaving or ligating other RNA molecules. Such built-in catalytic ability is a crucial first step to explain how RNAs could have replicated by themselves, without the existence of protein-based enzymes that fulfill this role in modern cells.

According to the RNA World hypothesis [327, 433], RNA molecules able to self-replicate originated in the prebiotic Earth and progressively grew in number. Random errors in the replication process introduced occasional mutations, and natural selection favored groups of sequences able to replicate more efficiently, giving rise to evolution. This theory puts information transfer through RNA sequences at the center of the process that gave birth to life. Moreover, it highlights the dual role of RNA as blueprint and construction worker for building life.

Figure 6.5 depicts the emergence of RNAs and ribozymes in a hypothetical RNA World. Once the first nucleotides became available, mineral surfaces such as montmorillonite clays [137, 645] could have catalyzed the formation of short RNA oligomers with random nucleotide sequences (figure 6.5A). Some of these oligomers could have folded into secondary structures that conferred catalytic ability on them, giving rise to the first ribozymes: some ribozymes would be able to ligate two RNA strands together (ligases), others to cleave a strand into two pieces, yet others to cleave portions of themselves (figure 6.5B). The ligase activity is especially important in order to create longer polymers able to perform more complex catalytic functions [137]. Some ligase ribozymes would ligate segments that are base-pair complements of themselves, giving rise to replicase ribozymes [657], able to produce copies of themselves by template replication (figure 6.5C). Different variants of such self-replicators would compete for resources, and selection would favor faster and more accurate replicators, leading to the first evolutionary steps. More complex cross-catalytic replicators and autocatalytic networks of ribozymes could form [458, 878]. These primordial replicators could have become trapped inside lipid vesicles that would offer them protection, forming the first protocells (figure 6.5D). A more detailed overview of the RNA World hypothesis can be found in [137].

Figure 6.5 A schematic picture of the RNA World hypothesis: (A) formation of short RNA oligomers with random nucleotide sequence, possibly catalyzed by minerals such as clays. (B) Emergence of ribozymes (catalytic RNAs). (C) Emergence of self-replicating (replicase) ribozymes. (D) Primordial replicators enclosed in self-assembling vesicles become the first protocells.

Figure 6.6 A model of the minimal self-replicator demonstrated by Von Kiedrowski [890]: The two T molecules are complementary templates (hexamers in the experiments). Each of them base-pairs with units A and B (trimers in the experiments), to form an intermediate molecule M which is almost a double strand: the still missing link between A and B is then catalyzed by T itself, forming the double strand D. D then splits into two single strands T and the cycle repeats.

A crucial point in this theory is to explain exactly how the first RNA replicators could have arisen. There are two possibilities: One is a self-replicating RNA molecule based on template replication; another is an autocatalytic network of ribozymes. Template replication is hard to achieve without the help of enzymes. Von Kiedrowski [890] reported the first nonenzymatic template replication of a nucleotide sequence in the laboratory. In his experiment, a sequence of six nucleotides (the template) catalyzes the ligation of two trimers together, each trimer being paired with half of the template sequence by complementarity, as depicted in figure 6.6. This apparently simple experiment was actually a landmark in the field, as it demonstrated that template replication was possible without enzymes. Subsequently, several other theoretical studies and experiments on template replication were reported.
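The cycle of figure 6.6 can be sketched as a small mass-action model. The Python snippet below is only an illustration and not the kinetic model of [890]: the lumping of the cycle into three steps (binding, ligation, strand separation), the rate constants, the initial concentrations, and the function name are all assumptions made for this example.

```python
def simulate_template_replication(t0, a0=1.0, b0=1.0, k_bind=1.0,
                                  k_ligate=1.0, k_split=1.0,
                                  dt=0.001, steps=20000):
    """Euler integration of a lumped template replication cycle.

    Assumed steps (illustrative only):
      T + A + B -> M   (template binds both building blocks)
      M -> D           (template-directed ligation of A and B)
      D -> 2 T         (double strand separates into two templates)
    """
    t, a, b, m, d = t0, a0, b0, 0.0, 0.0
    for _ in range(steps):
        bind = k_bind * t * a * b      # rate of complex formation
        ligate = k_ligate * m          # rate of ligation inside the complex
        split = k_split * d            # rate of strand separation
        t += dt * (2.0 * split - bind)
        a -= dt * bind
        b -= dt * bind
        m += dt * (bind - ligate)
        d += dt * (ligate - split)
    return {"T": t, "A": a, "B": b, "M": m, "D": d}


if __name__ == "__main__":
    seeded = simulate_template_replication(t0=0.01)
    unseeded = simulate_template_replication(t0=0.0)
    print("with template seed:   ", seeded)
    print("without template seed:", unseeded)
```

In this sketch, building blocks are only consumed when a template is present, so a run started without any template produces nothing, whereas a small seed of template is amplified until the building blocks run out. Strand separation is treated here as an ordinary first-order step; the sketch does not attempt to model how stable the double strand is in practice.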

The experiments in [890] were based on deoxynucleotides. Concerning RNAs, a self-replicating ligase ribozyme was demonstrated in [657]. It was obtained experimentally using in vitro evolution, and is depicted in figure 6.7: RNA strand T is made of two segments A and B. A and B are designed such that they bind by complementary base-pairing to consecutive portions of T, forming the complex ABT. T acts as a ligase ribozyme to help form a covalent bond between A and B, creating a new copy of itself. The two T strands then separate (for instance, by raising the temperature), and each of them is ready to ligate another A with another B, repeating the cycle. This ribozyme was further extended in [458] to provide cross-catalytic function in a system of two templates that catalyze each other’s ligation from smaller segments.

Figure 6.7 The self-replicating ligase ribozyme from [657]: RNA strand T is a ligase ribozyme that catalyzes its own formation from shorter RNA segments A and B.

Once such a simple replicator appeared, how could it have evolved to store more information and to perform more complex functions? This problem remains largely unsolved: more information requires longer strands, but strand elongation is equivalent to a mutation, or error, in the copy process, so it must be kept to a minimum. Fernando and others [275] showed that elongation comes along with a potential “elongation problem” that can lead to an “elongation catastrophe” (analogous to Eigen’s “error threshold” leading to an “error catastrophe” [249]). During template replication, imperfect duplexes often form, leading to an elongated replica. The elongation problem is that the longer the double strand is, the more stable it gets, so it becomes more difficult to split it into single strands able to enter a new replication cycle. The replication of such longer strands can be achieved with the help of enzymes, but these were not available then. Over time, the accumulation of such elongated replicas leads to an “elongation catastrophe,” where information is not faithfully transmitted, and the longer strands are no longer able to separate, leading to a collapse of the replication process due both to too many errors and to a copy speed that grinds to a halt. Recent work [888] emulates template replication in hardware using a group of robots, and points out that the evolvability of self-replicators based on template replication must lie in between the rigidity of fixed-length replicators and the excess freedom leading to the elongation catastrophe.

Instead of template replication, another possibility for RNA replication would be that of an autocatalytic network of reactions, catalyzed by ribozymes [878]. According to this view, one ribozyme would act on some neighboring ribozyme, which would act on another one, until an autocatalytic cycle would eventually form, leading to the replication of the molecules in the cycle. Since one ribozyme could act as a catalyst to several others, and in turn, receive catalysis from various others, a graph of catalytic interactions would form. Such graphs are similar to the autocatalytic sets of proteins by Kauffman [446] (see section 6.3.1), with RNAs replacing proteins. The main objection to the autocatalytic set view is the information loss due to leakage and errors caused by side reactions and stochastic noise. Eigen introduced the hypercycle [249, 252] (see also chapter 14) as a candidate structure to overcome the error threshold. Boerlijst later showed that hypercycles could resist the invasion of parasitic reactions when placed on a surface [125]. However, it was later discovered that hypercycles in space are also vulnerable to parasites [750]. Besides parasite invasion, other problems such as collapse due to member extinction and chaotic dynamics could make it difficult for hypercycles to survive in general. On the other hand, related dynamic phenomena could also help hypercycles to fight parasites and evolve, under some conditions. The dynamics of hypercycles is complex and still a subject of research [742, 743].

More generally, the question is still open regarding whether and how self-replication of single molecules, or collective replication via autocatalytic networks could sustain themselves sufficiently well to constitute a reliable information system for the first life forms.

As a further obstacle to an RNA World, even the feasibility of spontaneous RNA formation under prebiotic conditions is questioned [645]. Recently, the synthesis of RNA bases under prebiotic conditions was achieved [673], especially the pyrimidines (cytosine and uracil), which had been difficult to obtain so far. Such findings help to counteract some objections to the RNA World theory, but other issues still remain. Many say that some other form of primitive molecule is likely to have existed before RNAs [645]; however, evidence of such a molecule has yet to be found.

Szostak [839] points to other types of nucleotides that are able to polymerize spontaneously, forming new templates for replication. New monomers attach to the single-stranded template, forming a double-strand polymer that can unzip into two single strands under higher temperature. Under lower temperature, these single strands could catch new monomers, forming double strands again, and so on, repeating the template replication cycle. According to Szostak, these alternative nucleotides could have been present in the prebiotic world and could have been at the origin of early replicators, which were later replaced by the ones we encounter today. Later, these RNA-like molecules could have become trapped within self-assembling lipid compartments, giving rise to primitive protocells without an explicit metabolism.

## 6.2.3 Metabolism-First Theories

The complications and difficulties around the RNA World theory left several researchers skeptical about it, and led to the formulation of alternative hypotheses for the origin of life. One of these hypotheses is that a set of chemicals reacting together could form a collectively autocatalytic entity able to metabolize energy and materials from the environment in order to replicate itself, leading to the first primitive life forms. Kauffman’s autocatalytic sets of proteins [446] and the autocatalytic metabolisms by Bagley and Farmer [49] both offered enticing theoretical ideas for the origin of metabolisms, but both were based on abstract mathematical and computational modeling, and lacked support from real chemistry.

In 1988, Wächtershäuser [895] surprised the research community with a new theory for the origin of life, comprising a concrete chemical scenario for the emergence of autocatalytic metabolisms. He named his theory “surface metabolism,” and it later also became known as the “Iron-Sulfur World” [898]. According to his theory, the precursors of life were self-maintaining autocatalytic metabolisms running on the surface of pyrite (FeS2). Pyrite is a mineral crystal that could form abundantly in deep sea hot springs, from redox reactions between ferrous ions (Fe2+) and hydrogen sulfide (H2S). Pyrite surfaces have a slight positive charge, which would attract negatively charged molecules that would attach to this surface by ionic bonding. Ionic bonding is strong enough to keep the molecules on the surface with a small propensity for detaching, while weak enough to allow molecules to move slowly on the surface, enabling sufficiently diverse interaction possibilities. Autocatalytic cycles could then form as a result of these interactions, leading to the first autocatalytic metabolisms able to feed on inorganic nutrients from the environment (such as CO, CO2, HCN, and other gases that could emanate from volcanic exhalations) and to grow by colonizing nearby vacant mineral surfaces. Molecules able to bind more strongly to the pyrite surface would have a selective advantage. Occasionally, a different, novel molecule could form. Some of these novelties could lead to new autocatalytic cycles, which could be regarded as a form of mutation. A crude evolutionary process would then arise [895, 896], selecting for longer molecules able to bind to the surface more strongly and at several points. Such a mineral surface would facilitate the synthesis of larger organic molecules from smaller ones, and could then lead to the progressive formation of sugars, lipids, amino acids, peptides, and later of RNA, DNA, and proteins.
Wächtershäuser also sketches possible chemical pathways that could explain the origin of lipid membranes, of RNA and DNA replication, of RNA and protein folding, of the genetic code and its translation mechanism, and of full cellular structures able to emancipate themselves from the pyrite surface and to conquer the vast oceans.

Since then, experimental evidence for Wächtershäuser’s theories has been gathered [409, 897–899]. For instance, the synthesis of some amino acids and related compounds has been achieved from volcanic gases as carbon sources and iron-group metals as catalysts [899], at high temperatures in a volcanic hydrothermal setting that is considered a plausible prebiotic scenario. However, much remains to be done, from the synthesis of more complex compounds up to the demonstration of fully self-sustaining autocatalytic metabolisms and their evolution.

Other authors have also advocated the importance of being able to harvest energy first [216, 616, 769], prior to processing information as required by an RNA World. For instance, ATP and other nucleotides such as GTP, CTP, and UTP could have played an early role in energy transfer (a role that ATP still keeps today) before becoming building blocks for more complex information-carrying molecules such as RNA [216].

## 6.2.4 The Lipid World

One of the hypotheses for the origin of life is that of a “compartment first” world where early amphiphile molecules such as lipids or fatty acids would have spontaneously self-assembled into liposomes, which are simple vesicles akin to soap bubbles. The plain laws of physics would have caused these primitive compartments to grow and divide. Growth could happen naturally due to the incorporation of free amphiphiles into the structure. Division could be triggered by mechanical forces, such as marine currents, osmotic pressure, and surface tension caused by a change in the volume-to-surface ratio as the membrane grows. These “empty shell” membranes would have constituted the first self-replicators in a “lipid world” scenario for the origin of life, illustrated in figure 6.8.

The Lipid World hypothesis [534, 760, 762, 763] states that liposomes could have given rise to the first cell-like structures subject to a rudimentary form of evolution. It is very plausible indeed that these aggregates could have formed on the primitive Earth. This is the main argument in favor of a lipid world: Liposomes form spontaneously in water, whereas the spontaneous emergence of RNA or similar biopolymers is very unlikely, let alone their maintenance in a hostile environment subject to numerous side reactions.

Figure 6.8 Lipid world scenario: (a) floating amphiphiles of various types (indicated by the different colors of their hydrophilic heads) self-assemble into lipidic aggregates (b), that then grow and divide (c)

The first amphiphiles could have come from two sources. They could have formed in prebiotic synthesis reactions using energy from light; they could also have been delivered by the numerous meteorites and comets bombarding the primordial Earth.

Liposomes could be composed of several types of amphiphiles with different properties. Their multiset composition can be regarded as a kind of rudimentary genome, called “compositional genome” [761, 762]. A crude, still error-prone form of inheritance could then be provided by the rough preservation of the composition structure of the aggregate during division. Such inheritance is illustrated in figure 6.9.
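To make the idea of a compositional genome more tangible, the following Python sketch represents an assembly as a multiset of lipid types, splits it at random, and compares parent and daughter compositions. It is an illustration only: the random fission rule (each molecule goes to either daughter with probability 1/2), the use of cosine similarity as the comparison measure, and all names are assumptions of this example rather than definitions taken from [761, 762].

```python
import math
import random
from collections import Counter


def fission(assembly, rng):
    """Split an assembly (Counter: lipid type -> count) into two daughters.

    Assumption: each molecule independently ends up in either daughter
    with probability 1/2.
    """
    d1, d2 = Counter(), Counter()
    for lipid, count in assembly.items():
        k = sum(rng.random() < 0.5 for _ in range(count))
        d1[lipid] += k
        d2[lipid] += count - k
    return d1, d2


def similarity(x, y):
    """Cosine similarity between two compositions given as Counters."""
    keys = set(x) | set(y)
    dot = sum(x[k] * y[k] for k in keys)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    parent = Counter({"L1": 40, "L2": 10})
    daughter_a, daughter_b = fission(parent, rng)
    print(daughter_a, similarity(parent, daughter_a))
    print(daughter_b, similarity(parent, daughter_b))
```

A similarity close to 1 means that a daughter roughly preserves the proportions of its parent, which is the crude form of inheritance described above; small assemblies will show larger random deviations.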

Some of these liposomes could possess catalytic properties, enhancing the rate of reactions that produce precursors for the assembly, or facilitating the incorporation of additional molecules to the assembly, thereby accelerating its growth. Such catalytic liposomes have been named “lipozymes,” in analogy with RNA ribozymes. Since their catalytic properties could facilitate their own growth, such aggregates could become autocatalytic lipozymes, able to self-replicate collectively by a growth and division process. Lipozymes with more efficient autocatalytic properties would grow faster than the others, leading to selection for efficient ensemble autocatalysis. Combined with the rough propagation of compositional information during division, a primitive evolutionary process would arise, in which the units of evolution would be ensemble replicators.

Later, lipozymes could have trapped free-floating RNA molecules, significantly increasing their ability to carry and transmit information to the next generation. A protocell with information-compartment subsystems could then form. Note that in such a protocell structure, the compartment subsystem also plays a metabolic role, since it catalyzes the growth of the membrane.

Figure 6.9 Various degrees of compositional inheritance in a Lipid World scenario: (a) Only homogeneous ensembles are present (formed by the same type of lipid), hence no information is carried. (b) When ensembles are made of too many different types of lipids, their fission results in random aggregates; in this case, no useful information is carried either. Things become more interesting in (c): when ensembles are made of a few types of lipids, in a way that they attract more lipids of their type to the ensemble, fission results in “daughter” ensembles that are similar to their parents (in the picture, a yellowish ensemble gives rise to yellowish offspring, and a blue ensemble to blue offspring); these two types of ensembles could grow at different rates, giving rise to competition and selection that could lead to a primordial form of evolution.

Figure 6.10 Growth and fission of liposomes: (a) growth happens by incorporation of lipids into the aggregate; (b) fission simply splits the aggregate into two parts, which then start to grow again, repeating the cycle.

Figure 6.11 GARD mutual catalysis scenario: each lipid of type $i$ joins the lipid aggregate at a basal rate $k_i$ and leaves it at rate $k_{-i}$. This join/leave process can be accelerated (catalyzed) by other lipids of type $j$ already present in the aggregate. The amount of catalysis that a lipid of type $j$ offers to a lipid of type $i$ is $\beta_{ji}$.

The evolution of liposome compositions has been investigated computationally using the GARD model (graded autocatalysis replication domain) [764, 765]. Figure 6.11 depicts the basic GARD model: at any time, lipids join and leave the compositional assemblies spontaneously, but this happens at a slow rate. This rate can be accelerated with the help of other lipids that are already in the assembly; these lipids then act as catalysts for the incorporation and release of lipids from the aggregate. This model can be expressed by the following reaction:

(6.1)
$Display mathematics$

Reaction 6.1 expresses the GARD mutual catalysis model of figure 6.11 in chemical reaction format: lipid $L_i$ enters aggregate $C_{L_j}$ (representing a lipid ensemble where at least one lipid molecule of type $L_j$ is present) at rate $k_i$ and leaves it at rate $k_{-i}$. The incorporation of lipids into the aggregate occurs in a noncovalent fashion. This reaction is facilitated by the presence of catalyst $L_j$, which accelerates the reaction by a factor $\beta_{ji}$ in both directions. Since catalysis accelerates both forward and reverse reactions (here, respectively, joining and leaving the assembly), the equilibrium state of the system is the same with and without catalysts. Therefore the system must be kept out of equilibrium by an inflow and outflow of lipids. Moreover, the factors $\beta_{ji}$ must be carefully chosen following the receptor affinity distribution model [490, 726], such that the probability of a lipid catalyzing a given reaction is maximal between similar but not identical lipids.

Simulations of the GARD model have been performed using the Gillespie stochastic simulation algorithm (SSA) [764, 765]. They have shown that when autocatalysis is favored, the faster autocatalyst ends up dominating, as expected in a system consisting of competing species that do not interact. These experiments also show that mutually catalytic networks emerge when catalytic activity is assigned at random, showing a behavior similar to that of random catalytic networks [792], albeit extended to the lipid aggregate domain.
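As a rough illustration of how such a stochastic simulation can be set up, the sketch below applies Gillespie's direct method to a single assembly whose lipids join and leave with mutually catalyzed rates. It is not the code used in [764, 765]. The following are assumptions of this example: the enhancement term 1 + Σ_j beta[j][i]·n[j] (without normalization by assembly size), the constant external lipid pool `ext`, the absence of fission, and all function and parameter names.

```python
import math
import random


def gard_ssa(beta, k_join, k_leave, ext, n0, t_end, seed=0):
    """Gillespie direct method for join/leave events in one lipid assembly.

    beta[j][i] : catalysis that a lipid of type j offers to type i (assumed form)
    k_join[i]  : basal join rate of type i
    k_leave[i] : basal leave rate of type i
    ext[i]     : external concentration of type i, held constant
    n0[i]      : initial number of lipids of type i in the assembly
    """
    rng = random.Random(seed)
    n = list(n0)
    t = 0.0
    types = range(len(n))
    while True:
        # Build the propensity of every possible join and leave event.
        events = []
        for i in types:
            enhancement = 1.0 + sum(beta[j][i] * n[j] for j in types)
            events.append((k_join[i] * ext[i] * enhancement, i, +1))
            events.append((k_leave[i] * n[i] * enhancement, i, -1))
        total = sum(p for p, _, _ in events)
        if total <= 0.0:
            break  # nothing can happen anymore
        # Waiting time to the next event is exponentially distributed.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick one event with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        for p, i, delta in events:
            acc += p
            if r < acc:
                n[i] += delta
                break
    return n


if __name__ == "__main__":
    beta = [[0.5, 0.0], [0.0, 0.1]]  # type 0 is the stronger autocatalyst
    print(gard_ssa(beta, [1.0, 1.0], [0.1, 0.1], [1.0, 1.0], [1, 1], t_end=3.0))
```

Because the same enhancement multiplies both the join and the leave propensity of a lipid type, catalysis in this sketch changes how quickly the assembly composition moves, not where it would settle in a closed setting.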

Further computer simulations reveal the emergence of homeostatically stable ensemble replicators or composomes [761, 762], also referred to as quasicompartments, or quasistationary states (QSSs). As mutations accumulate, these composomes can get replaced by new ones, like ESSs being replaced by a fitter mutant (see also section 7.2.8). Unfortunately, however, the low accuracy of replication characteristic of such molecular assemblies compromises their evolvability [884]. This is one of the reasons why these compartments are primarily regarded as shelters for information-carrying molecules, rather than as standalone units of evolution.

In spite of the evolvability issues raised in [884], the Lipid World hypothesis and its simulated GARD model remain interesting ideas worth further exploration, since they open up the possibility of alternative ways to encode and transmit the information necessary for life, different from more traditional sequence-based approaches.

A number of extensions of the GARD model have been proposed in order to address its shortcomings. Polymer GARD [771] is an extension of the basic GARD model in which polymers can form by the combination of basic monomers, therefore supporting the dynamic creation of new molecular species. Another extension of the GARD model called EE-GARD (environment exchange polymer GARD) was proposed in [773] to study the coevolution of compositional protocells with their environment. A spatial version of GARD is presented in [772], where limitations in assembly size give rise to complex dynamics, including a spontaneous symmetry breaking phenomenon akin to the emergence of homochirality.

# 6.3 Artificial Chemistry Contributions to Origin of Life Research

One of the initial motivations for artificial chemistries was exactly to study the origin of life. Since it is difficult or even impossible to actually demonstrate the origin of life in the laboratory, abstract computer models are valuable tools to obtain insights into this complex problem within feasible time frames.

AC research themes related to the origin of life include the exploration of various types of autocatalytic sets, the investigation of how evolution might emerge in chemical reaction networks, and the formation of protocellular structures akin to chemotons. We shall now focus on autocatalytic sets and the formation of protocells with ACs, whereas ACs related to evolution are covered in chapter 7.

## 6.3.1 Autocatalytic Sets

Catalysis is essential for life: without it, chemical reactions would simply be too slow to keep an organism alive. Moreover, catalysis also plays a regulatory role in modern cells, acting as a mechanism to control which reactions should occur and when. Autocatalysis is even more interesting, because it can be regarded as a form of molecular self-replication. As we will see in chapter 7, the combination of replication with some variation, subject to selection pressure, can lead to evolution. Strictly speaking, autocatalytic reactions do not actually exist in chemistry; no molecule is able to replicate itself in a single step. Instead, autocatalysis occurs in cycles of individual reaction steps, such as the formose cycle (figure 6.1), or the template replication cycle shown in figure 6.6. An autocatalytic cycle contains a reaction that duplicates one of the members of the cycle, and as a result, the other members of the cycle are also duplicated. Examples of autocatalytic cycles in biochemistry include the Calvin cycle (part of photosynthesis) and the reductive citric acid cycle (a reverse form of the Krebs cycle that could also have played a role in prebiotic metabolism). Cycles can also be nested, forming hypercycles [249]: the output of one autocatalytic cycle feeds another cycle, and so on, until the last cycle feeds the first one, closing the hypercycle.

More generally, one can have a set of chemicals that together form a collectively autocatalytic reaction network, in which every member of the set is produced by at least one reaction catalyzed by another member of the set. Such a set is called an autocatalytic set [260, 446].
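The definition above translates directly into a membership test. The Python sketch below follows the wording literally: every member must be a product of at least one reaction catalyzed by a different member of the set. Two things are assumptions of this example and not part of the definition as stated: the optional `food` argument, which exempts externally supplied molecules from the requirement, and the representation of a reaction as a `(reactants, products, catalysts)` tuple. The check also does not verify that the reactants themselves are available.

```python
def is_autocatalytic_set(members, reactions, food=frozenset()):
    """Check whether `members` is collectively autocatalytic.

    reactions : iterable of (reactants, products, catalysts) tuples
    food      : molecules assumed to be supplied from outside (exempt)
    """
    members = set(members)
    for molecule in members - set(food):
        produced_with_catalysis = any(
            molecule in products
            and any(c in members and c != molecule for c in catalysts)
            for _reactants, products, catalysts in reactions
        )
        if not produced_with_catalysis:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical toy network: "ab" and "ba" catalyze each other's formation.
    reactions = [
        (("a", "b"), ("ab",), ("ba",)),
        (("b", "a"), ("ba",), ("ab",)),
    ]
    print(is_autocatalytic_set({"a", "b", "ab", "ba"}, reactions, food={"a", "b"}))
    print(is_autocatalytic_set({"a", "b", "ab"}, reactions, food={"a", "b"}))
```

In the toy network, removing "ba" from the set leaves "ab" without any catalyst inside the set, so the second call reports that the reduced set is not collectively autocatalytic.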

### Autocatalytic Sets of Proteins

Motivated by the formation of amino acids in the Miller-Urey experiments, in the 1980s Kauffman [446] investigated the hypothesis that life could have originated from an autocatalytic set of proteins. Proteins fold into 3D shapes that may confer catalytic properties on them. One protein could then catalyze the formation of another protein, and so on, leading to a network of catalytic interactions between proteins. Kauffman then argued that for a sufficiently large number of catalytic interactions, autocatalytic sets would inevitably form.

Proteins are formed in condensation reactions that join two amino acids together, releasing one molecule of water in the process:

(6.2)
$$A + B \;\overset{E}{\rightleftharpoons}\; C + H$$

In this reaction, peptides A and B are concatenated to produce peptide C, and a molecule of water (H) is released. Molecule E is an optional catalyst for this reaction. In an aqueous environment, the reverse reaction (cleavage of the protein, consuming a water molecule) is favored; therefore, the chemical equilibrium tends toward small peptides. One solution to allow the formation of large peptides is to keep the system out of equilibrium by a steady inflow of amino acids or other small peptides, which can be accompanied by the removal of water molecules in order to favor the forward reaction. Another (complementary) solution to favor polymerization is the addition of energy, for instance, in the form of energized molecules.

In order for these reactions to proceed with sufficient speed, they must be catalyzed. Since proteins can have catalytic abilities, the catalysts can be the proteins themselves; therefore, autocatalytic sets containing only proteins could be envisaged. Figure 6.12 shows an example of an autocatalytic set as idealized by Kauffman. In his model, peptides are represented as strings from a two-letter alphabet {a, b}. The graph of figure 6.12 shows the strings as nodes and the reactions between them as arcs, with the action of catalysts represented as a dotted arrow pointing to the reaction that it accelerates.

Starting from a “food set” of monomers and dimers (molecules a, b, aa and bb), progressively longer polymers can form. Moreover, autocatalytic cycles can form, such that the concentrations of the reactants in the cycle get amplified at the expense of other molecules that can quickly get broken down into smaller ones, until they end up feeding the cycles. One such cycle is highlighted in figure 6.12: molecule “abb” catalyzes the formation of “baab” which in turn catalyzes the formation of the long polymer “aabaabbb” on top of the figure. This long string then catalyzes the cleavage of “abab” into two molecules of “ab,” each of them a catalyst for “abb,” closing the cycle.

Using a graph-theoretic analysis, assuming that all proteins had a given probability $P$ of catalyzing the formation of another protein in a condensation-cleavage reaction (equation 6.2), Kauffman calculated the minimum probability of catalysis above which autocatalytic sets would form with high probability. In a subsequent study [260] Farmer and colleagues estimated the probability of catalysis as a function of the size of the food set and the size of the polymer alphabet: they showed that for a polymer alphabet of size $B$ and a firing disk (food set) containing about $N = B^L$ polymers of maximum length $L$, a critical probability of catalysis $P_{\mathrm{crit}}$ occurs around

(6.3)
$$P_{\mathrm{crit}} \approx B^{-2L}$$

For $P > P_{\mathrm{crit}}$, the formation of autocatalytic sets is favored due to the dense connectivity of the reaction graph, which causes its fast expansion from the food set as new polymers are produced.

Figure 6.12 An example of an autocatalytic set: polymers from a binary alphabet are formed in catalyzed reactions, from monomers and dimers that constitute the food set. An example of autocatalytic cycle within this set is highlighted in red.

Adapted from p. 54 of [260] or p. 323 of [447].

For $P < P_{\mathrm{crit}}$, the reaction graph stops growing very soon, because nodes tend to end up isolated in disconnected islands. A large firing disk (due to a large $B$ or a large $L$) greatly reduces $P_{\mathrm{crit}}$, hence a small probability of catalysis is enough to form autocatalytic sets, which is consistent with a prebiotic scenario dominated by inefficient catalysts. Note that, according to this model, since a larger $B$ is more favorable for the formation of autocatalytic sets, proteins (with $B \approx 20$) are more suitable to form such sets than RNA or DNA (with $B = 4$) for the same maximum length $L$ of the food set polymers.
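The qualitative difference between the two regimes can be explored with a toy experiment. The Python sketch below grows a set of strings from a food set using catalyzed condensation only, where each (catalyst, reaction) pair is catalytic with probability P. It is not the graph-theoretic calculation of [260, 446]: cleavage is ignored, and the length cutoff `max_len`, the way catalysis is sampled, and the function names are assumptions made for this illustration.

```python
import itertools
import random


def food_set(alphabet, L):
    """All strings over `alphabet` of length 1..L."""
    return {"".join(p) for n in range(1, L + 1)
            for p in itertools.product(alphabet, repeat=n)}


def grow_reaction_graph(alphabet="ab", L=2, P=0.05, max_len=5, seed=0):
    """Expand the food set by condensation reactions x + y -> xy.

    A reaction only fires if at least one molecule currently present
    catalyzes it; each (catalyst, x, y) triple is catalytic with
    probability P, sampled once and remembered.
    """
    rng = random.Random(seed)
    catalyzes = {}  # memo: (catalyst, x, y) -> bool

    def is_catalyst(c, x, y):
        key = (c, x, y)
        if key not in catalyzes:
            catalyzes[key] = rng.random() < P
        return catalyzes[key]

    present = food_set(alphabet, L)
    changed = True
    while changed:
        changed = False
        current = sorted(present)  # snapshot, sorted for reproducibility
        for x in current:
            for y in current:
                z = x + y
                if len(z) > max_len or z in present:
                    continue
                if any(is_catalyst(c, x, y) for c in current):
                    present.add(z)
                    changed = True
    return present


if __name__ == "__main__":
    for p in (0.0, 0.005, 0.05, 1.0):
        print(p, len(grow_reaction_graph(P=p)))
```

With P = 0 the set never leaves the food set, and with P = 1 every string up to the cutoff is eventually produced; intermediate values show how quickly the reachable set expands as catalysis becomes more likely. Because of the small alphabet and the artificial cutoff, the numbers should not be read as estimates of the critical probability.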

### Autocatalytic Metabolisms

Kauffman [446] has pointed out that autocatalytic sets could also be used to study the emergence of metabolisms: Proteins could be used to catalyze not only the formation of other proteins but also the formation and breakdown of other organic compounds in metabolic reactions. The model of an autocatalytic metabolism would then be essentially the same as an autocatalytic set. More complex compounds are formed from simple building blocks provided in the food set, and autocatalytic sets would emerge with high probability when the connectivity of the reaction graph exceeds a threshold, determined by the probability of catalysis of each reaction. He then conjectured that primitive metabolisms could arise in conjunction with autocatalytic sets of proteins, such that the latter could be used to produce the catalysts needed to maintain both sets of reactions. Accordingly, Farmer’s polymers in [260] are abstract strings that could represent proteins or other organic compounds.

The studies in [260, 446] are mostly based on static, graph-theoretical considerations and do not take dynamics or chemical kinetics into account, although an ODE formulation is presented in [260] together with preliminary results.

In 1992, Bagley and Farmer [49] extended the autocatalytic set model of [260, 446] to a dynamical system model of autocatalytic metabolisms, which still follows the same condensation-cleavage scheme (reaction 6.2), where only a subset of the reactions is catalyzed. To emphasize the importance of dynamics, the authors define an autocatalytic metabolism as an autocatalytic set in which the species concentrations are significantly different from those expected without catalysis. This model highlights the role of catalysis in focusing the mass of the system into a few species that constitute the core of the autocatalytic metabolism. Such catalytic focusing only occurs when the system is kept out of equilibrium by a steady inflow of food set members. This is because catalysis does not change the chemical equilibrium of a reversible reaction: it accelerates both forward and backward reactions by an equal factor, thereby keeping the equilibrium unchanged.
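The last point can be checked numerically on a toy example. The sketch below integrates a single reversible reaction A ⇌ B in a closed vessel with a simple Euler scheme, once without a catalyst and once with both rate constants multiplied by the same factor. The rate constants, the factor of 10, and the function name are illustrative assumptions and are not taken from [49].

```python
def relax(kf, kr, steps, dt=1e-3, a0=1.0, b0=0.0):
    """Euler integration of A <-> B with forward rate kf and backward rate kr.

    Returns the concentrations (a, b) after `steps` time steps of size `dt`.
    """
    a, b = a0, b0
    for _ in range(steps):
        flux = kf * a - kr * b  # net conversion rate from A to B
        a -= flux * dt
        b += flux * dt
    return a, b


KF, KR = 1.0, 0.5      # uncatalyzed rate constants (hypothetical values)
SPEEDUP = 10.0         # catalyst accelerates both directions equally

if __name__ == "__main__":
    for label, factor in (("uncatalyzed", 1.0), ("catalyzed", SPEEDUP)):
        early = relax(KF * factor, KR * factor, steps=200)
        late = relax(KF * factor, KR * factor, steps=20000)
        print(f"{label:12s} early b={early[1]:.3f}  final b/a={late[1] / late[0]:.3f}")
```

Both runs settle at the same ratio b/a = kf/kr; the catalyst only shortens the time needed to get there. A sustained difference in concentrations therefore requires an open system with a continuous inflow of food, as in the Bagley and Farmer model.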

Starting from a “firing disk” of small polymers, Bagley and Farmer [49] simulated the dynamics of their system under a continuous supply of small food molecules. Under various parameter settings, they observed the emergence of autocatalytic networks able to take up the food molecules provided and turn them into a reduced set of core molecules that formed a stable autocatalytic metabolism. In a subsequent paper [50] the authors studied how such a metabolism would react to mutation events and concluded that such metabolisms had the potential to evolve by jumping from one fixed point to a different one.

### Formalizing Autocatalytic Sets

Jain and Krishna [425–427] proposed a network model of autocatalytic sets in which only the catalytic interactions between species are depicted on a graph. This model is a simplification of Kauffman’s graphs in which only the catalytic action (dotted arrows in figure 6.12) is explicitly represented. The dynamics of the system is simulated such that species whose concentration falls below a given threshold are replaced by newly created species. This model will be covered in more detail in chapter 15 in the context of novelty creation in ACs.

Hordijk and Steel [402] devised a formal model of autocatalytic sets based on the concept of a RAF set (reflexively autocatalytic and F-generated). According to the RAF framework, a set of reactions R is reflexively autocatalytic (RA) when every reaction in R is catalyzed by at least one molecule involved in any of the reactions within R. The set R is F-generated (F) if every reactant in R can be constructed from a small set of molecules (the “food set” F) by successively applying reactions from R. A set R is then a RAF when it is both RA and F. In other words, a RAF set is a self-sustaining set that relies on externally supplied food molecules; thus, it is an autocatalytic set satisfying the catalytic closure property.

The RAF formalism was used to devise a polynomial-time algorithm (polynomial in the number of reactions) to detect autocatalytic sets within a given chemical reaction network [402]. The algorithm is then applied to study the phase transitions from parameter settings where random reaction networks remain highly probable, to those favoring the emergence of autocatalytic sets.
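The idea behind such a detection procedure can be sketched in a few lines of Python. The version below is a minimal illustration under stated assumptions, not the implementation of [402]: it repeatedly computes the molecules reachable from the food set and discards every reaction that lacks an available reactant or an available catalyst, until nothing more can be removed. The reaction network, the data layout (reactants, products, catalysts), and the function names are hypothetical; food molecules are also allowed to act as catalysts.

```python
def closure(food, reactions):
    """Molecules that can be built from the food set using the given reactions.

    Catalysis is ignored here: only the availability of reactants matters.
    """
    molecules = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions.values():
            if set(reactants) <= molecules and not set(products) <= molecules:
                molecules |= set(products)
                changed = True
    return molecules


def max_raf(food, reactions):
    """Prune reactions until the remaining set is both RA and F-generated."""
    current = dict(reactions)
    while True:
        available = closure(food, current)
        kept = {
            name: r for name, r in current.items()
            if set(r[0]) <= available      # all reactants can be produced
            and set(r[2]) & available      # at least one catalyst is available
        }
        if len(kept) == len(current):      # nothing removed: fixed point reached
            return kept
        current = kept


# Hypothetical toy network: name -> (reactants, products, catalysts)
FOOD = {"a", "b"}
REACTIONS = {
    "r1": (("a", "b"), ("ab",), ("abb",)),
    "r2": (("ab", "b"), ("abb",), ("ab",)),
    "r3": (("a", "a"), ("aa",), ("bbb",)),         # catalyst is never produced
    "r4": (("abb", "aa"), ("abbaa",), ("abbaa",)),  # depends on product of r3
}

if __name__ == "__main__":
    print(sorted(max_raf(FOOD, REACTIONS)))
```

In this toy network, r3 is discarded first because its catalyst can never be produced, and r4 follows in the next round because one of its reactants is no longer available. Each round removes at least one reaction, so the number of rounds is bounded by the number of reactions.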

The RAF formalism was extended in [599] to take inhibitory effects into account. Using this extended model, the authors show that the problem of finding autocatalytic sets becomes NP-hard when inhibitory catalytic reactions may occur. Hence, the algorithm in [402] cannot be at the same time extended to support inhibition and kept polynomial.

In a review paper [400] about the RAF framework, the authors highlight the important role of their framework in adding formal support to Kauffman’s original claim that autocatalytic sets arise almost inevitably in sufficiently complex reaction networks. The RAF framework quantifies the amount of catalysis needed to observe the emergence of autocatalytic sets: according to computational and analytical results [402, 599], only a linear growth in catalytic activity (with system size) is required for the emergence of autocatalytic sets, in contrast with Kauffman’s original claim that an exponential growth was needed. However, the authors also acknowledge the limitations of the RAF approach: It does not take dynamics into account, does not address compartmentalized chemistries, and does not directly look at heredity and natural selection.

### Side Reactions and Stochastic Effects

In practice, a cyclic reaction network is never perfect. Numerous side reactions take place, draining resources from the main cycle. Autocatalysis can help compensate for losses and errors due to side reactions [565]. In the absence of enzymes, reactions that are not autocatalytic have trouble “surviving” against a background of numerous types of reactions and reactants, leading to a dead “tar” of chemicals or to a combinatorial explosion of reactions that end up in the same dead tar after exhausting all the available resources. In contrast, autocatalysis provides not only a means to “stand out” against the background of various reactions, but also a means to replicate substances; therefore, it can be seen as a first step toward reproduction and Darwinian evolution.

Unfortunately, autocatalytic reaction networks are also plagued by side reactions. Moreover, they seem to have limited capacity to evolve [836]. A vast amount of research literature is available on attempts to find solutions to make primitive autocatalytic metabolisms actually “jump to life.”

Kauffman’s original argument was based on the probability of catalysis in a static graph of molecular interactions. Therefore, it has been widely criticized for overestimating these probabilities under realistic situations where dynamic and stochastic effects cannot be neglected. Bagley and Farmer [49] showed the emergence of autocatalytic sets in a dynamic setup; however, the emergence of such sets under stochastic fluctuations has only recently been extensively studied [277, 278]. As expected, these more recent studies confirm that autocatalytic sets emerge more rarely in stochastic settings than in deterministic ones, and that the sets formed are often unstable and easily disappear due to stochastic fluctuations. The main objections against the autocatalytic set theory thus remain the leaks and errors caused by side reactions. In summary, the spontaneous emergence and maintenance of autocatalytic sets is not so straightforward to observe in practice, and this topic is still a subject of active research.
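The fragility of small autocatalytic populations under stochastic fluctuations can be illustrated with a deliberately simple toy model, which is not the model used in [277, 278]. The sketch below assumes a single autocatalyst A that replicates (A + X → 2A, with food X held constant) and decays (A → Y), and simulates individual reaction events in the style of Gillespie's stochastic simulation algorithm. All rates, thresholds, and function names are hypothetical. In the corresponding deterministic model, A grows whenever the replication rate exceeds the decay rate; in the stochastic version, a population that starts from a few copies may still die out by chance.

```python
import random


def run_once(rng, birth=1.0, death=0.5, copies=1, established=100):
    """Simulate one lineage of autocatalysts event by event.

    Returns (extinct, elapsed_time). The run stops when the population
    either vanishes or reaches `established` copies.
    """
    t = 0.0
    while 0 < copies < established:
        total_rate = (birth + death) * copies
        t += rng.expovariate(total_rate)           # waiting time to next event
        if rng.random() < birth / (birth + death):
            copies += 1                            # replication: A + X -> 2A
        else:
            copies -= 1                            # decay: A -> Y
    return copies == 0, t


def extinction_fraction(trials=2000, seed=0, **params):
    """Fraction of independent runs in which the autocatalyst dies out."""
    rng = random.Random(seed)
    extinct = sum(run_once(rng, **params)[0] for _ in range(trials))
    return extinct / trials


if __name__ == "__main__":
    print("start with  1 copy  :", extinction_fraction(copies=1))
    print("start with 20 copies:", extinction_fraction(copies=20))
```

With these parameters, roughly half of the runs that start from a single copy end in extinction even though replication is twice as fast as decay, whereas runs that start from 20 copies almost never die out. This is consistent with the observation that the deterministic picture is only recovered for sufficiently large populations.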

## 6.3.2 Emergence of Protocells with ACs

The above discussion about autocatalytic sets teaches us that life is very unlikely to emerge from “naked” chemical reactions in a prebiotic “soup.” Given the spontaneous formation of amphiphile vesicles in water, a more plausible scenario for the origin of life could be the emergence of primitive protocellular structures that could harbor the chemical reactions that would then support life. Much attention from the AC community has been devoted to investigating this hypothesis.

Earlier we have already looked at the GARD model as an example of artificial chemistry that aims at studying the origins of life using amphiphilic molecules, in what is termed the Lipid World hypothesis. Related ACs that simulate the self-assembly of amphiphiles into micelles and vesicles include [246, 270].

### Autopoietic Protocells by Ono and Ikegami

Following from the computational models of autopoiesis, Ono and Ikegami devised an artificial chemistry able to spontaneously form two-dimensional protocells in space [637, 639, 641]. In their model, particles move, rotate and interact with each other in a hexagonal grid. Three types of particles are considered: hydrophilic, hydrophobic, and neutral. Hydrophilic and hydrophobic particles repel each other, whereas a neutral particle may establish weak interactions with the other two types. Hydrophobic particles may be isotropic or anisotropic. Isotropic hydrophobic particles repel hydrophilic particles with equal strength in all directions. Anisotropic particles, on the other hand, have a stronger repulsion in one direction. Figure 6.13 illustrates the difference between isotropic and anisotropic repulsion fields.

Figure 6.13 Repulsion field surrounding two types of hydrophobic particles in the two-dimensional protocell model by Ono and Ikegami [641]: (a) isotropic particles Mi form a uniform repulsion field around them; (b) anisotropic particles Ma have stronger repulsion in a given direction (a darker hexagon represents higher repulsion).

From [641], © Springer 2001, reprinted with permission.

Five chemical species, A, M, X, Y, and W, react with each other, forming a minimal metabolic system. Particles of species A are autocatalysts that replicate themselves by consuming food molecules X. A particles may also produce molecules of M, which represent membrane constituents. W is a water molecule. All particles (except W) decay into the “waste” species Y. An external energy source recycles Y molecules into new food particles X. A and W are hydrophilic, M is hydrophobic, whereas X and Y are neutral.
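The reaction scheme just described can be written down compactly as data. The Python sketch below is one possible encoding, not the authors' implementation: it ignores space, movement, and repulsion entirely, and it assumes that a membrane particle M is made from one food particle X with A acting as a catalyst, a detail that the description above leaves open. All identifiers are hypothetical.

```python
from collections import Counter

# Affinity of each species toward water, as stated in the text.
SPECIES = {
    "A": "hydrophilic",
    "W": "hydrophilic",
    "M": "hydrophobic",
    "X": "neutral",
    "Y": "neutral",
}

# (name, reactants, products)
REACTIONS = [
    ("replication", ["A", "X"], ["A", "A"]),
    ("membrane production", ["A", "X"], ["A", "M"]),  # assumption: M is built from X
    ("decay of A", ["A"], ["Y"]),
    ("decay of M", ["M"], ["Y"]),
    ("decay of X", ["X"], ["Y"]),
    ("recycling", ["Y"], ["X"]),                      # driven by external energy
]


def apply_reaction(state, reactants, products):
    """Return a new particle count after one reaction event."""
    new_state = Counter(state)
    new_state.subtract(reactants)
    if any(count < 0 for count in new_state.values()):
        raise ValueError("not enough reactant particles")
    new_state.update(products)
    return new_state


def conserves_particles(reactions):
    """True if every reaction has as many products as reactants."""
    return all(len(r) == len(p) for _name, r, p in reactions)


if __name__ == "__main__":
    state = Counter(A=1, X=3, W=5)
    state = apply_reaction(state, ["A", "X"], ["A", "A"])
    print(dict(state), conserves_particles(REACTIONS))
```

Under this encoding, every reaction preserves the total number of particles, so matter only cycles between species, and the system is kept going solely by the energy-driven recycling of Y into X.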

After some preliminary exploration [639], in [641] the authors show the formation of membrane-like clusters of particles, with initially irregular shapes. Some of these membranes eventually form closed cells that are nevertheless permeable to some small food particles that cross the membrane by osmosis. Some of these protocells may even grow and divide. The presence of anisotropic particles is crucial for protocell formation in this model; when only isotropic particles are present, clusters of particles form but end up shrinking and disappearing due to their inability to sustain an internal metabolism. The advantage of anisotropic over isotropic particles is especially visible under low resource supply conditions, as shown in figure 6.14, which depicts the main results of [641]. Note, however, that the metabolic system inside these cells is extremely simple; there is no explicit genetic material and no evolution mechanism, and therefore this system is expected to have a limited ability to create more complex organisms.

Figure 6.14 Emergence of protocells in the autopoietic model by Ono and Ikegami [641], under low supply of X food molecules. Left (a1, a2, a3): when all M particles are isotropic, clusters resembling cells form, but are unable to sustain themselves and end up dying out. Right (b1, b2, b3): when M particles are anisotropic, irregular membrane filaments form initially, and some of them form closed protocells able to grow, divide and sustain their internal metabolism. Time flows from top to bottom. Blue: Water (W) particles; red: membrane (M) particles; yellow: autocatalysts (A).

From [641], © Springer 2001, reprinted with permission.

A subsequent study [637] goes further in depth into the conditions for protocell emergence in this model. The model has also been extended to three dimensions [538], revealing a variety of self-maintaining cellular structures including the formation of parallel membranes with tubular structures connecting them, and intertwined filaments that organize themselves into globular structures.

# 6.4 Summary

Although considerable progress has been made in understanding the chemical processes that could potentially lead to life, the actual origin of life on Earth remains surrounded by mystery. The experimental results so far have relied on extreme simplifications and only succeed in demonstrating fragments of the puzzle. A full demonstration of an origin-of-life scenario under plausible chemical conditions compatible with a prebiotic Earth is still lacking. We can be sure about one point, though: Many exciting new discoveries still lie ahead of us.

Even at this stage, where there are still more open questions than firm answers, the investigations around the origin of life open up several doors to other connected research domains, including related exploratory efforts such as astrobiology [313, 667], and the construction of synthetic life in the laboratory. More about the latter will appear in chapter 19. Meanwhile, in the next chapter we shall turn our attention to how life evolves once it has emerged.

## Notes:

(1) Some hypotheses propose that life originated elsewhere in the universe, e.g., on Mars, or even farther out.