There are small parts between genes in an operon that do not encode any amino acids. What is the purpose of these parts?

What is the purpose of these parts in the translation process? A picture to demonstrate is below:

The regions between the genes on a prokaryote operon are transcribed into the messenger RNA but are not translated into proteins. Because they are not translated, they are characterized by the general term UnTranslated Regions (UTRs). A typical operon having two genes is shown in this diagram from Wikipedia:

The UTR which occurs between the protein-coding genes contains the sequence for a ribosomal binding site (RBS) in the transcribed mRNA. An RBS is a site where a ribosome can attach to the mRNA so that the Start codon of the following gene is in the correct position to begin translation of the protein. (Of course, the 5' UTR preceding the first gene also has an RBS sequence.) An RBS contains a particular nucleotide sequence to match up with a complementary sequence of the ribosomal RNA of the ribosome; this sequence is known as the Shine-Dalgarno sequence.
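As a rough illustration of how an RBS positions the ribosome, the sketch below scans a toy mRNA for a Shine-Dalgarno-like core followed a few nucleotides later by an AUG start codon. The consensus AGGAGG, the 4 to 10 nucleotide spacing window, the function name, and the toy sequence are all assumptions chosen for illustration; real sites vary in both sequence and spacing.

```python
import re

SD_CORE = "AGGAGG"  # assumed consensus core; real Shine-Dalgarno sites vary

# Hypothetical two-gene mRNA: each gene is preceded by its own SD core.
TOY_MRNA = (
    "GGG" + "AGGAGG" + "UAACAU" + "AUG" + "AAAUUUUAA"   # 5' UTR + gene 1
    + "CC" + "AGGAGG" + "UUUUUU" + "AUG" + "GGGUAG"      # inter-gene UTR + gene 2
)

def find_rbs_start_pairs(mrna, min_gap=4, max_gap=10):
    """Return (sd_index, start_index) pairs where an SD core lies
    min_gap..max_gap nucleotides upstream of an AUG start codon."""
    pairs = []
    # The lookahead lets overlapping occurrences of the core be found.
    for match in re.finditer("(?=%s)" % SD_CORE, mrna):
        sd_end = match.start() + len(SD_CORE)
        for gap in range(min_gap, max_gap + 1):
            start = sd_end + gap
            if mrna[start:start + 3] == "AUG":
                pairs.append((match.start(), start))
                break  # keep only the nearest start codon for this core
    return pairs

pairs = find_rbs_start_pairs(TOY_MRNA)
```

In this toy sequence both the 5' UTR and the inter-gene UTR carry a core, so each gene gets its own ribosome entry point.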

(Note that these inter-gene UTRs are not called introns; introns occur only in eukaryotes and occur within a gene, between the Start and Stop codons.)

I am not sure about their role in the particular case of the lac operon, but in eukaryotic systems such noncoding sequences (intron regions) normally have some regulatory role, either on the same gene or on other genes located elsewhere on the chromosome. For example, intron 3 of gene A may act as an enhancer sequence for gene B located 500 bp downstream of gene A.

In this case the operon produces three different proteins. If there were no gaps, a fusion protein might be produced, which is not desired.

Regulation of Gene Expression

Cellular function is influenced by cellular environment. Adaptation to specific environments is achieved by regulating the expression of genes that encode the enzymes and proteins needed for survival in a particular environment. Factors that influence gene expression include nutrients, temperature, light, toxins, metals, chemicals, and signals from other cells. Malfunctions in the regulation of gene expression can cause various human disorders and diseases.

Regulation in Prokaryotes

Bacteria have a simple general mechanism for coordinating the regulation of genes that encode products involved in a set of related processes. The gene cluster and promoter, plus additional sequences that function together in regulation are called an operon.

The Lactose Operon (lac operon)

The lactose operon of E. coli encodes the enzyme β-galactosidase, which hydrolyzes lactose into galactose and glucose.

The lac operon contains three cistrons or DNA fragments that encode a functional protein. The proteins encoded by cistrons may function alone or as sub-units of larger enzymes or structural proteins.

The Z gene encodes β-galactosidase. The Y gene encodes a permease that facilitates the transport of lactose into the bacterium. The A gene encodes a thiogalactoside transacetylase whose function is not known. All three of these genes are transcribed as a single, polycistronic mRNA. Polycistronic RNA contains multiple genetic messages, each with its own translational initiation and termination signals.
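The idea that each message in a polycistronic mRNA has its own initiation and termination signals can be sketched as a simple scan for AUG-to-stop reading frames. This is a minimal sketch, not a real gene finder: the function name, the toy sequences, and the rule of resuming the scan after each stop codon are assumptions made for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_cistrons(mrna, min_codons=2):
    """Return (start, end) spans of AUG...stop reading frames, left to right.
    'end' is the index just past the stop codon."""
    spans = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] == "AUG":
            j = i + 3
            # Walk codon by codon until an in-frame stop codon or the end.
            while j <= len(mrna) - 3 and mrna[j:j + 3] not in STOP_CODONS:
                j += 3
            found_stop = j <= len(mrna) - 3
            if found_stop and (j - i) // 3 >= min_codons:
                spans.append((i, j + 3))
                i = j + 3  # resume scanning downstream of this message
                continue
        i += 1
    return spans

# Hypothetical mRNA with two messages separated by an untranslated spacer.
two_genes = "GG" + "AUGAAACCCUAA" + "GGAGGUU" + "AUGGGGUUUUGA" + "CC"
# Same coding sequences, but with the first stop codon and the spacer removed.
fused = "GG" + "AUGAAACCC" + "AUGGGGUUUUGA" + "CC"

separate_spans = find_cistrons(two_genes)
fused_spans = find_cistrons(fused)
```

With the stop codon and spacer in place the scan reports two separate reading frames; in the second toy sequence the two coding regions read through as a single frame, which is the fusion-protein scenario.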

Regulation of the lac Operon

The activity of the promoter that controls the expression of the lac operon is regulated by two different proteins. One of the proteins prevents the RNA polymerase from transcribing (negative control), the other enhances the binding of RNA polymerase to the promoter (positive control).

Negative Control of the lac Operon

The protein that inhibits transcription of the lac operon is a tetramer with four identical subunits called lac repressor. The lac repressor is encoded by the lacI gene, located upstream of the lac operon and has its own promoter. Expression of the lacI gene is not regulated and very low levels of the lac repressor are continuously synthesized. Genes whose expression is not regulated are called constitutive genes.

In the absence of lactose, the lac repressor blocks the expression of the lac operon by binding to the DNA at a site called the operator, which is downstream of the promoter and upstream of the transcriptional initiation site. The operator consists of a specific nucleotide sequence that is recognized by the repressor, which binds very tightly and physically blocks the initiation of transcription.

The lac repressor has a high affinity for allolactose, an isomer of lactose that β-galactosidase forms from lactose. When a small amount of lactose is present, the lac repressor binds the allolactose derived from it and dissociates from the DNA operator, freeing the operon for gene expression. Substrates that cause repressors to dissociate from their operators are called inducers, and the genes that are regulated by such repressors are called inducible genes.

Positive Control of the lac Operon

Although lactose can induce the expression of the lac operon, the level of expression is very low. The reason is that the lac operon is subject to catabolite repression, the reduced expression of genes brought on by growth in the presence of glucose. Glucose is very easily metabolized and so is the preferred fuel source over lactose; hence it makes sense to prevent expression of the lac operon when glucose is present.

The strength of a promoter is determined by its ability to bind RNA polymerase and to form an open complex. The promoter for the lac operon is weak and consequently the lac operon is poorly transcribed upon induction. There is a binding site, upstream from the promoter, for a protein called the catabolite activator protein (CAP). When CAP binds, it distorts the DNA so that RNA polymerase can bind more effectively; thus transcription of the lac operon is greatly enhanced. In order to bind DNA, CAP must first bind cyclic AMP (cAMP), a second messenger synthesized from ATP by the enzyme adenylate cyclase.

In the presence of glucose, cAMP levels are very low, and consequently the initiation of transcription from the lac operon is very low. As glucose levels decrease, the concentration of cAMP increases, activating CAP, which in turn binds to the CAP site and stimulates transcription. The cAMP-CAP complex is called a positive regulator.
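The combined negative and positive control described above can be summarized as a small truth table. The sketch below is a deliberately coarse model: the function name and the three output labels are assumptions, and real expression levels are continuous, with some basal leakage even when the repressor is bound.

```python
def lac_operon_expression(lactose_present, glucose_present):
    """Coarse three-level model of lac operon transcription."""
    # Negative control: without lactose the repressor stays on the operator.
    repressor_bound = not lactose_present
    # Positive control: low glucose -> high cAMP -> cAMP-CAP binds the CAP site.
    cap_bound = not glucose_present
    if repressor_bound:
        return "off"
    return "high" if cap_bound else "low"

# Enumerate all four sugar conditions.
table = {
    (lactose, glucose): lac_operon_expression(lactose, glucose)
    for lactose in (False, True)
    for glucose in (False, True)
}
```

Only the combination of lactose present and glucose absent gives strong transcription, which matches the preference for glucose as a fuel source.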

Arabinose is a five-carbon sugar that can serve as an energy and carbon source for E. coli. Arabinose must first be converted into ribulose-5-phosphate before it can be metabolized. The arabinose operon has three genes, araB, araA and araD, that encode three enzymes to carry out this conversion. A fourth gene, araC, which has its own promoter, encodes a regulatory factor called the C protein.

The regulatory sites of the ara operon include four sites that bind the C protein and one CAP binding site. The araO1 and araO2 sites are upstream of the promoter and CAP binding sites. The other two C protein binding sites called araI1 and araI2 are located between the CAP binding site and the promoter.

Negative Control of the ara Operon

In the absence of arabinose, dimers of the C protein bind to araO2, araO1 and araI1. The C proteins bound to araO2 and araI1 associate with one another causing the DNA between them to form a loop effectively blocking transcription of the operon.

Positive Control of the ara Operon

The C protein binds arabinose and undergoes a conformational change that enables it to also bind the araO2 and araI2 sites. This results in the generation of a different DNA loop that is formed by the interaction of C proteins bound to the araO1 and araO2 sites.

The formation of this loop stimulates transcription of the araC gene resulting in additional C protein synthesis, thus the C protein autoregulates its own synthesis. In the absence of glucose, cAMP-CAP is formed which binds to the CAP site. C protein bound at the araI1 and araI2 sites interacts with the bound CAP enabling RNA polymerase to initiate transcription from the ara operon promoter.

The Tryptophan Operon

E. coli can synthesize all 20 of the natural amino acids. Amino acid synthesis consumes a lot of energy, so to avoid wasting energy the operons that encode for amino acid synthesis are tightly regulated. The trp operon consists of five genes, trpE, trpD, trpC, trpB and trpA, that encode for the enzymes required for the synthesis of tryptophan.

The trp operon is regulated by two mechanisms, negative corepression and attenuation. Most of the operons involved in amino acid synthesis are regulated by these two mechanisms.

The trp operon is negatively controlled by the trp repressor, a product of the trpR gene. The trp repressor binds to the operator and blocks transcription of the operon. However, in order to bind to the operator, the repressor must first bind tryptophan (Trp); hence tryptophan is a corepressor. In the absence of Trp, the trp repressor dissociates and transcription of the trp operon is initiated.

Attenuation regulates the termination of transcription as a function of tryptophan concentration. At low levels of tryptophan, full-length mRNA is made; at high levels, transcription of the trp operon is prematurely halted. Attenuation works by coupling transcription to translation. Prokaryotic mRNA does not require processing and, since prokaryotes have no nucleus, translation of mRNA can start before transcription is complete. Consequently, regulation of gene expression via attenuation is unique to prokaryotes.

a. Attenuation is mediated by the formation of one of two possible stem-loop structures in a 5' segment of the trp operon in the mRNA.

b. If tryptophan concentrations are low then translation of the leader peptide is slow and transcription of the trp operon outpaces translation. This results in the formation of a nonterminating stem-loop structure between regions 2 and 3 in the 5' segment of the mRNA. Transcription of the trp operon is then completed.

c. If tryptophan concentrations are high, the ribosome quickly translates the mRNA leader peptide. Because translation is occurring rapidly, the ribosome covers region 2 so that region 2 cannot pair with region 3. Consequently, a stem-loop structure forms between regions 3 and 4 and transcription is terminated.
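The choice between the two stem-loops in steps a to c can be written as a small decision function. This is only a sketch of the logic as stated in the text: the function name and the dictionary keys are hypothetical, and tryptophan level is reduced to a simple high or low flag.

```python
def trp_attenuation(tryptophan_high):
    """Return which leader stem-loop forms and the fate of transcription."""
    if tryptophan_high:
        # Fast translation: the ribosome covers region 2, so region 3
        # pairs with region 4 and forms the terminating stem-loop.
        return {"stem_loop": (3, 4), "transcription": "terminated"}
    # Slow translation of the leader peptide: region 2 stays free and
    # pairs with region 3, forming the nonterminating stem-loop.
    return {"stem_loop": (2, 3), "transcription": "completed"}

low_trp = trp_attenuation(False)
high_trp = trp_attenuation(True)
```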

Regulation of Gene Expression in Eukaryotes

The genetic information of a human cell is a thousandfold greater than that of a prokaryotic cell. Things are further complicated by the number of cell types and the fact that each cell type must express a particular subset of genes at different points in an organism's development. Regulating gene expression so that a particular subset of genes is expressed in a specific tissue at specific points of development is very complicated. This increased complexity in regulation lends itself to malfunctions that cause disease. Three ways that eukaryotes regulate gene expression will be discussed: alteration of gene content or position, transcriptional regulation and alternative RNA processing.

1. Alteration of Gene Content or Position

The copy number of a gene or its location on the chromosome can greatly affect its level of expression. Gene content or location can be altered by gene amplification, diminution or rearrangement.

The expression of a particular gene can be augmented by amplifying its copy number. Histone proteins and rRNA are needed in large quantities by almost all eukaryotic cells; therefore the genes encoding histones and rRNA exist in a permanently amplified state. Gene amplification can present problems with the use of chemotherapeutic drugs. Methotrexate inhibits dihydrofolate reductase, the enzyme responsible for regenerating the folates used in nucleotide synthesis. Tumor cells often become resistant to the drug because the gene encoding dihydrofolate reductase is amplified several hundredfold, resulting in more enzyme production than the drug can handle.

A gene whose expression is only needed at a particular developmental point or in a particular tissue may be shut off by gene diminution. As reticulocytes mature into red blood cells all of their genes are lost as the nucleus is degraded.

Gene rearrangement is used to generate each of the genes encoding the millions of different antibodies that are produced by B cells. Sometimes bad gene rearrangements occur that lead to improper gene regulation. This frequently occurs in cancer cells. Translocation of a segment from chromosome 8 to chromosomes that encode immunoglobulins leads to activation of a gene that transforms healthy B cells into Burkitt's lymphoma cells (unregulated proliferating B cells).

2. Transcriptional Regulation

Through Chromosomal Packaging

Regions of each of the different chromosomes are packaged as either heterochromatin or euchromatin. In heterochromatin the DNA is very tightly condensed and rendered inaccessible to the transcriptional machinery; consequently heterochromatin is transcriptionally inactive. In human females one of the two X chromosomes is completely inactivated by being packaged into heterochromatin to form a Barr body. The cytosine residues in the DNA of heterochromatin are heavily methylated, suggesting that methylation may play a role in the maintenance of heterochromatin. Drugs that interfere with methylation cause activation of previously inactive genes found in heterochromatin.

In euchromatin the DNA is not as condensed and is accessible to the transcription machinery. The regions of a chromosome that are maintained as hetero- and eu- chromatin may vary in a cell specific manner. This may enable the cells of specific tissues to express a particular subset of genes required for tissue function.

Through Individual Genes

Proteins that participate in regulating gene expression are often called trans acting elements. At least 100 different proteins, many specific for the regulation of a particular gene, are known. Others play a more general role in regulating gene expression in a manner analogous to the activation of numerous prokaryotic genes by the CAP-cAMP complex. Trans-acting factors have multiple domains required for activity and may include DNA-binding, transcription-activating and ligand-binding domains.

DNA binding domains recognize specific DNA sequences in the regulatory regions of a gene. The DNA-binding domains of a regulatory protein generally consist of one of three motifs: helix-turn-helix, zinc finger or leucine zipper. DNA-binding proteins possessing these motifs bind with high affinity to their recognition sites and with low affinity to other DNA. A very small portion of the protein makes contact with the DNA through H-bonds and van der Waals interactions between amino acid side chains and the functional groups in the major groove and the phosphate backbone of the DNA. The remainder of the protein is involved in proper positioning of the DNA-binding domain and in making protein-protein contacts with other transcriptional proteins.

Proteins with the helix-turn-helix motif form symmetric dimers that recognize a symmetric palindromic DNA sequence. Each monomer of the dimer contains a region in which two α-helices are held at 90 degrees to each other by a turn of four amino acids. One set of helices makes contact with about five base pairs in the major groove. The other set sits atop the phosphate backbone and helps to properly position the set of helices that fits into the major groove.
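A palindromic DNA sequence, in the sense used here, reads the same on both strands in the 5' to 3' direction, that is, it equals its own reverse complement. The check below is a minimal sketch; the function names are assumptions, and real recognition sites are often imperfect palindromes or have a spacer between the half-sites, which this strict test does not handle.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)

symmetric_site = is_palindromic("GAATTC")
asymmetric_site = is_palindromic("GATTACA")
```

Because each half of such a site presents the same sequence to one monomer of the dimer, a symmetric protein can contact both halves in the same way.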

Proteins possessing the zinc finger motif contain between 2 and 9 repeated domains that are each centered on a tetrahedrally coordinated zinc ion. Each zinc-coordinated domain forms a loop containing an α-helix; this loop is called a zinc finger. There are two types of zinc fingers: the C2H2 finger and the Cx finger.

In the C2H2 type, three fingers interact with the major groove and wrap around the DNA. Many transcription factors have this type of domain.

Proteins with the Cx type bind as dimers to the major groove of the DNA. Many steroid receptors have this type of domain.

Proteins with the leucine zipper motif have an amphipathic α-helix at their carboxyl terminus. One side of the helix consists of hydrophobic groups, usually leucine, that are repeated every seventh position for several turns of the helix. The other face consists of charged and polar groups.

Proteins with this motif bind as dimers to the major groove of the DNA. The two α-helices of each arm enter the major groove and wrap around the double helix. Several oncogenes use this type of motif.
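The every-seventh-position leucine pattern lends itself to a simple scan over a protein sequence. The sketch below is illustrative only: the function name, the threshold of four repeats, and both toy peptides are assumptions, and it accepts only leucine even though other hydrophobic residues can occupy these positions.

```python
def heptad_leucine_runs(protein, min_repeats=4):
    """Return (start_index, repeat_count) for runs of leucine (L)
    spaced exactly seven residues apart."""
    hits = []
    for start in range(len(protein)):
        count = 0
        pos = start
        # Step along the sequence seven residues at a time.
        while pos < len(protein) and protein[pos] == "L":
            count += 1
            pos += 7
        # Report only the head of each run, not its later members.
        is_run_head = start < 7 or protein[start - 7] != "L"
        if count >= min_repeats and is_run_head:
            hits.append((start, count))
    return hits

zipper_like = "LAEKRQS" * 4 + "G"   # hypothetical peptide, L every 7th residue
not_zipper = "LAEKRQSA" * 4         # hypothetical peptide, L every 8th residue

zipper_hits = heptad_leucine_runs(zipper_like)
control_hits = heptad_leucine_runs(not_zipper)
```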

Transcription-Activating Domains

These domains generally act separately and independently of the DNA-binding domains. Transcription-activating domains enhance transcription by physically interacting with other regulatory proteins and/or with RNA polymerase. The actual mechanisms by which these domains activate or enhance transcription are not known.

Steroid hormones, thyroid hormones and retinoic acid are examples of ligands that activate transcription by binding to a specific domain on a receptor protein. Upon binding the receptor undergoes a conformational change that enables it to bind DNA. Once bound to the DNA a receptor protein can activate or repress transcription of the target gene.

Cis-acting elements are DNA sequences that are recognized and bound by the trans-acting elements that regulate transcription. There are two major types of cis-acting elements: promoters and regulatory elements.

Promoters are the sites where RNA polymerase must bind to the DNA in order to initiate transcription (see "RNA Synthesis and Processing" lecture). The rate or efficiency of promoter use by RNA polymerase is affected by the regulatory elements.

Regulatory elements are specific DNA sequences that are recognized and bound by the trans-acting elements that stimulate or inhibit the expression of a particular gene. There are two types: enhancers and response elements.

Enhancers are regulatory elements that increase or repress the rate of gene transcription.

Response Elements are regulatory sequences that facilitate the coordinated regulation of a group of genes. Certain ligands such as steroid hormones and cAMP bind to their receptors which in turn bind to their response element to activate or inhibit transcription.

3. Alternative Processing

Initiating transcription at an alternative start site places a different exon at the 5' end of the transcript. Examples of genes that use alternative start sites as a form of regulation include amylase, myosin and alcohol dehydrogenase.

Alternative Polyadenylation Sites

Immunoglobulin (antibody) heavy chains use an alternative polyadenylation site to affect the length of transcripts. The longer transcript encodes the μm form, which is localized to the cell membranes of lymphocytes; the shorter transcript encodes the secreted form, μs.

Alternative splice sites are used to generate similar proteins with tissue specific functions called isoforms. Many peptide hormones exist as isoforms such as the calcitonin gene which is differentially spliced to produce calcitonin in the thyroid and calcitonin gene-related peptide in the neurons.

Regulation of mRNA Stability

The stability of mRNA is quite variable from gene to gene. These variations in stability govern the length of time that mRNA is available for translation and hence the amount of protein that is synthesized. The half-lives of mRNA vary from 10 hours to minutes. Sequences in the 3' untranslated region of mRNA which serve as signals for rapid degradation have been identified in some mRNAs with very short half-lives. The length of the poly A tail also affects mRNA stability, with mRNAs that have longer tails tending to have longer half-lives.
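To see how much difference the half-life makes, one can assume simple first-order decay, in which the fraction of an mRNA remaining after time t is 0.5 raised to the power t divided by the half-life. The first-order model and the function name are assumptions made for this sketch; the 10-hour and 10-minute values only stand in for the two ends of the range quoted above.

```python
def remaining_fraction(elapsed_hours, half_life_hours):
    """Fraction of an mRNA population left after elapsed_hours,
    assuming first-order (exponential) decay."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_hours / half_life_hours)

# Compare a stable and an unstable message one hour after synthesis stops.
stable = remaining_fraction(1.0, 10.0)          # half-life of 10 hours
unstable = remaining_fraction(1.0, 10.0 / 60)   # half-life of 10 minutes
```

Under these assumptions, most of the stable message is still available for translation after an hour, while the unstable one has gone through six half-lives.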

CH450 and CH451: Biochemistry - Defining Life at the Molecular Level

13.1 Prokaryotic Gene Regulation

13.2 Eukaryotic Gene Regulation

13.3 Protein-DNA Interactions

13.4 Epigenetics and Transgenerational Inheritance

13.5 References

Each nucleated cell in a multicellular organism contains copies of the same DNA. Similarly, all cells in two pure bacterial cultures inoculated from the same starting colony contain the same DNA, with the exception of changes that arise from spontaneous mutations. If each cell in a multicellular organism has the same DNA, then how is it that cells in different parts of the organism’s body exhibit different characteristics? Similarly, how is it that the same bacterial cells within two pure cultures exposed to different environmental conditions can exhibit different phenotypes? In both cases, each genetically identical cell does not turn on, or express, the same set of genes. Only a subset of proteins in a cell at a given time is expressed.

Genomic DNA contains both structural genes, which encode products that serve as cellular structures or enzymes, and regulatory genes, which encode products that regulate gene expression. The expression of a gene is a highly regulated process. Whereas regulating gene expression in multicellular organisms allows for cellular differentiation, in single-celled organisms like prokaryotes, it primarily ensures that a cell’s resources are not wasted making proteins that the cell does not need at that time.

Elucidating the mechanisms controlling gene expression is important to the understanding of human health. Malfunctions in this process in humans lead to the development of cancer and other diseases. Understanding the interaction between the gene expression of a pathogen and that of its human host is important for the understanding of a particular infectious disease. Gene regulation involves a complex web of interactions within a given cell among signals from the cell’s environment, signaling molecules within the cell, and the cell’s DNA. These interactions lead to the expression of some genes and the suppression of others, depending on circumstances.

Prokaryotes and eukaryotes share some similarities in their mechanisms to regulate gene expression; however, gene expression in eukaryotes is more complicated because of the temporal and spatial separation between the processes of transcription and translation. Thus, although most regulation of gene expression occurs through transcriptional control in prokaryotes, regulation of gene expression in eukaryotes occurs at the transcriptional level and post-transcriptionally (after the primary transcript has been made).

13.1 Prokaryotic Gene Regulation

In bacteria and archaea, structural proteins with related functions are usually encoded together within the genome in a block called an operon and are transcribed together under the control of a single promoter, resulting in the formation of a polycistronic transcript (Figure 13.1). In this way, regulation of the transcription of all of the structural genes encoding the enzymes that catalyze the many steps in a single biochemical pathway can be controlled simultaneously, because they will either all be needed at the same time, or none will be needed. For example, in E. coli, all of the structural genes that encode enzymes needed to use lactose as an energy source are encoded next to each other in the lactose (or lac) operon under the control of a single promoter, the lac promoter. French scientists François Jacob (1920–2013) and Jacques Monod at the Pasteur Institute were the first to show the organization of bacterial genes into operons, through their studies on the lac operon of E. coli. For this work, they won the Nobel Prize in Physiology or Medicine in 1965.

Figure 13.1 Schematic Representation of an Operon. In prokaryotes, structural genes of related function are often organized together on the genome and transcribed together under the control of a single promoter. The operon’s regulatory region includes both the promoter and the operator. If a repressor binds to the operator, then the structural genes will not be transcribed. Alternatively, activators may bind to the regulatory region, enhancing transcription.

Each operon includes DNA sequences that influence its own transcription; these are located in a region called the regulatory region. The regulatory region includes the promoter and the region surrounding the promoter, to which transcription factors, proteins encoded by regulatory genes, can bind. Transcription factors influence the binding of RNA polymerase to the promoter and allow its progression to transcribe structural genes. A repressor is a transcription factor that suppresses transcription of a gene in response to an external stimulus by binding to a DNA sequence within the regulatory region called the operator, which is located between the RNA polymerase binding site of the promoter and the transcriptional start site of the first structural gene. Repressor binding physically blocks RNA polymerase from transcribing structural genes. Conversely, an activator is a transcription factor that increases the transcription of a gene in response to an external stimulus by facilitating RNA polymerase binding to the promoter. An inducer, a third type of regulatory molecule, is a small molecule that either activates or represses transcription by interacting with a repressor or an activator.

In prokaryotes, there are examples of operons whose gene products are required rather consistently and whose expression, therefore, is unregulated. Such operons are constitutively expressed, meaning they are transcribed and translated continuously to provide the cell with constant intermediate levels of the protein products. Such genes encode enzymes involved in housekeeping functions required for cellular maintenance, including DNA replication, repair, and expression, as well as enzymes involved in core metabolism. In contrast, there are other prokaryotic operons that are expressed only when needed and are regulated by repressors, activators, and inducers.

Prokaryotic operons are commonly controlled by the binding of repressors to operator regions, thereby preventing the transcription of the structural genes. Such operons are classified as either repressible operons or inducible operons. Repressible operons, like the tryptophan (trp) operon, typically contain genes encoding enzymes required for a biosynthetic pathway. As long as the product of the pathway, like tryptophan, continues to be required by the cell, a repressible operon will continue to be expressed. However, when the product of the biosynthetic pathway begins to accumulate in the cell, removing the need for the cell to continue to make more, the expression of the operon is repressed. Conversely, inducible operons, like the lac operon of E. coli, often contain genes encoding enzymes in a pathway involved in the metabolism of a specific substrate like lactose. These enzymes are only required when that substrate is available, thus expression of the operons is typically induced only in the presence of the substrate.
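The contrast between the two classes comes down to whether the small molecule makes the repressor bind the operator or makes it let go. The sketch below encodes just that distinction; the function name and the string labels are hypothetical, and activators and attenuation are left out on purpose.

```python
def operon_transcribed(kind, small_molecule_present):
    """Return True if the structural genes are transcribed.

    kind: "repressible" (e.g. trp; the end product lets the repressor bind)
          or "inducible" (e.g. lac; the inducer releases the repressor).
    """
    if kind == "repressible":
        repressor_on_operator = small_molecule_present
    elif kind == "inducible":
        repressor_on_operator = not small_molecule_present
    else:
        raise ValueError("kind must be 'repressible' or 'inducible'")
    return not repressor_on_operator

# All four combinations of operon class and small-molecule availability.
outcomes = {
    (kind, present): operon_transcribed(kind, present)
    for kind in ("repressible", "inducible")
    for present in (False, True)
}
```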

The trp Operon: A Repressible Operon

E. coli can synthesize tryptophan using enzymes that are encoded by five structural genes located next to each other in the trp operon (Figure 13.2). When environmental tryptophan is low, the operon is turned on. This means that transcription is initiated, the genes are expressed, and tryptophan is synthesized. However, if tryptophan is present in the environment, the trp operon is turned off. Transcription does not occur and tryptophan is not synthesized.

When tryptophan is not present in the cell, the repressor by itself does not bind to the operator; therefore, the operon is active and tryptophan is synthesized. However, when tryptophan accumulates in the cell, two tryptophan molecules bind to the trp repressor molecule, which changes its shape, allowing it to bind to the trp operator. This binding of the active form of the trp repressor to the operator blocks RNA polymerase from transcribing the structural genes, stopping expression of the operon. Thus, the actual product of the biosynthetic pathway controlled by the operon regulates the expression of the operon.

Figure 13.2 The Trp Operon. The five structural genes needed to synthesize tryptophan in E. coli are located next to each other in the trp operon. When tryptophan is absent, the repressor protein does not bind to the operator, and the genes are transcribed. When tryptophan is plentiful, tryptophan binds the repressor protein at the operator sequence. This physically blocks the RNA polymerase from transcribing the tryptophan biosynthesis genes.

The Lac Operon: An Inducible Operon

The lac operon is an example of an inducible operon that is also subject to activation in the absence of glucose. The lac operon encodes three structural genes, lacZ, lacY, and lacA, necessary to acquire and process the disaccharide lactose from the environment (Fig 13.3A).

Figure 13.3 Biological Activity of the lac Operon. (A) Schematic representation of the lac operon in E. coli. The lac operon has three structural genes, lacZ, lacY, and lacA that encode for β-galactosidase, permease, and galactoside acetyltransferase, respectively. The promoter (p) and operator (o) sequences that control the expression of the operon are shown. Upstream of the lac operon is the lac repressor gene, lacI, controlled by the lacI promoter (p). (B) Shows the lac repressor inhibition of the lac operon gene expression in the absence of lactose. The lac repressor binds with the operator sequence of the operon and prevents the RNA polymerase enzyme which is bound to the promoter (p) from initiating transcription. (C) In the presence of lactose, some of the lactose is converted into allolactose, which binds and inhibits the activity of the lac repressor. The lac repressor-allolactose complex cannot bind with the operator region of the operon, freeing the RNA polymerase and causing the initiation of transcription. Expression of the lac operon genes enables the breakdown and utilization of lactose as a food source within the organism.

The lacZ gene encodes the β-galactosidase (β-gal) enzyme responsible for the hydrolysis of lactose into the simple sugars glucose and galactose (Fig. 13.4A). The β-gal enzyme can also mediate the breakdown of the alternate substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (Xgal) (Fig. 13.4B). The breakdown product, 5-bromo-4-chloro-3-hydroxyindole (1), spontaneously dimerizes to form the intensely blue product, 5,5′-dibromo-4,4′-dichloro-indigo (2). Thus, Xgal has been a valuable research tool, not only in the study of the enzymatic activity of β-gal, but also in the development of the commonly used blue-white DNA cloning system that utilizes the β-gal enzyme as a marker in molecular cloning experiments.

The lac operon contains two more genes, in addition to lacZ (Fig. 13.3A). The lacY gene encodes a permease that increases the uptake of lactose into the cell and lacA encodes a galactoside acetyltransferase (GAT) enzyme. The exact function of GAT during lactose metabolism has not been conclusively elucidated but acetylation is thought to play a role in the transport of the modified sugars.

Figure 13.4 Reactions Controlled by the Expression of the Lac Operon. (A) Expression of the β-galactosidase enzyme enables the breakdown of lactose into the simple sugars, glucose and galactose, for E. coli to use as a food resource. (B) The β-galactosidase enzyme also mediates the breakdown of the non-native substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (Xgal). Breakdown product (1) 5-bromo-4-chloro-3-hydroxyindole quickly dimerizes into the intensely blue product (2) 5,5′-dibromo-4,4′-dichloro-indigo, making it a useful tool for molecular biology. (C) Isopropyl β-D-1-thiogalactopyranoside (IPTG) can serve as a non-native inducer of the lac operon. It mimics the structure of lactose and binds with the Lac Repressor.

For the lac operon to be expressed, lactose must be present. This makes sense for the cell because it would be energetically wasteful to create the enzymes to process lactose if lactose was not available.

The lacI gene is constitutively expressed, producing the lac repressor protein (Fig. 13.3 B). In the absence of lactose, the lac repressor binds to the operator region of the lac operon and physically prevents RNA polymerase from transcribing the structural genes (Fig. 13.3 B). However, when lactose is present, some of the lactose inside the cell is converted to allolactose. Allolactose serves as an inducer molecule, binding to the repressor and changing its shape so that it is no longer able to bind to the operator DNA (Fig. 13.3 C). Removal of the repressor in the presence of lactose allows RNA polymerase to move through the operator region and begin transcription of the lac structural genes. In addition to lactose, laboratory experiments have revealed that the non-natural compound isopropyl β-D-1-thiogalactopyranoside (IPTG) can also bind with the lac repressor and cause the expression of the lac operon (Figure 13.4 C). Similar to Xgal, this compound has also been used as a research tool for molecular cloning.

The Lac Operon: Activation by Catabolite Activator Protein

Bacteria typically have the ability to use a variety of substrates as carbon sources. However, because glucose is usually preferable to other substrates, bacteria have mechanisms to ensure that alternative substrates are only used when glucose has been depleted. Additionally, bacteria have mechanisms to ensure that the genes encoding enzymes for using alternative substrates are expressed only when the alternative substrate is available. In the 1940s, Jacques Monod was the first to demonstrate the preference for certain substrates over others through his studies of E. coli’s growth when cultured in the presence of two different substrates simultaneously. Such studies generated diauxic growth curves, like the one shown in Figure 13.5. While the preferred substrate glucose is being used, E. coli grows quickly and the enzymes for lactose metabolism are absent. However, once glucose levels are depleted, growth rates slow, inducing the expression of the enzymes needed for the metabolism of the second substrate, lactose. Notice how the growth rate in lactose is slower, as indicated by the shallower slope of the growth curve.

Figure 13.5. Utilization of Glucose in E. coli. When grown in the presence of two substrates, E. coli uses the preferred substrate (in this case glucose) until it is depleted. Then, enzymes needed for the metabolism of the second substrate are expressed and growth resumes, although at a slower rate.
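The diauxic pattern described above can be mimicked with a toy simulation. This is only a minimal sketch: the function name, growth rates, starting amounts, and lag time are illustrative assumptions, not measured values, and one unit of substrate is assumed to yield one unit of biomass.

```python
def simulate_diauxic_growth(steps=30, dt=0.5):
    """Toy model of diauxic growth on glucose, then lactose.

    All numbers are illustrative assumptions, not experimental values.
    Returns a list of (time, phase, biomass) tuples.
    """
    biomass, glucose, lactose = 0.05, 1.0, 1.0
    mu_glucose, mu_lactose = 1.0, 0.5  # assumed specific growth rates (per hour)
    lag_remaining = 1.0                # assumed time to induce lac enzymes (hours)
    history = []
    for step in range(steps):
        if glucose > 0:
            # Preferred substrate is used first, at the faster rate.
            growth = min(mu_glucose * biomass * dt, glucose)
            glucose -= growth
            phase = "glucose"
        elif lag_remaining > 0:
            # Glucose is gone; growth pauses while lac enzymes are induced.
            growth = 0.0
            lag_remaining -= dt
            phase = "lag"
        elif lactose > 0:
            # Growth resumes on lactose, at the slower rate.
            growth = min(mu_lactose * biomass * dt, lactose)
            lactose -= growth
            phase = "lactose"
        else:
            growth = 0.0
            phase = "stationary"
        biomass += growth
        history.append((step * dt, phase, biomass))
    return history


history = simulate_diauxic_growth()
for time, phase, biomass in history:
    print(f"t={time:4.1f} h  {phase:<10}  biomass={biomass:.3f}")
```

Plotting biomass against time on a logarithmic axis would show two growth segments separated by a plateau, with the second segment less steep than the first.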

The ability to switch from glucose use to another substrate like lactose is a consequence of the activity of an enzyme called Enzyme IIA (EIIA). When glucose levels drop, cells produce less ATP from catabolism and EIIA becomes phosphorylated. Phosphorylated EIIA activates adenylyl cyclase, an enzyme that converts some of the remaining ATP to cyclic AMP (cAMP), a cyclic derivative of AMP and an important signaling molecule involved in glucose and energy metabolism in E. coli (Fig. 13.6). As a result, cAMP levels begin to rise in the cell. This signals to the cell that overall energy levels are low and that ATP is being depleted.

Figure 13.6. Conversion of ATP to cAMP. When ATP levels decrease due to depletion of glucose, some remaining ATP is converted to cAMP by adenylyl cyclase. Thus, increased cAMP levels signal glucose depletion.

The lac operon also plays a role in this switch from using glucose to using lactose. When glucose is scarce, the accumulating cAMP caused by increased adenylyl cyclase activity binds to catabolite activator protein (CAP), also known as cAMP receptor protein (CRP). The complex binds to the promoter region of the lac operon (Figure 13.7). In the regulatory regions of these operons, a CAP binding site is located upstream of the RNA polymerase binding site in the promoter. Binding of the CAP-cAMP complex to this site increases the binding ability of RNA polymerase to the promoter region to initiate the transcription of the structural genes. Thus, in the case of the lac operon, for transcription to occur, lactose must be present (removing the lac repressor protein) and glucose levels must be depleted (allowing binding of an activating protein). When glucose levels are high, there is catabolite repression of operons encoding enzymes for the metabolism of alternative substrates. Because of low cAMP levels under these conditions, there is an insufficient amount of the CAP-cAMP complex to activate transcription of these operons.
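The two requirements described above (repressor removed, CAP-cAMP bound) behave like a logical AND. The sketch below encodes that reasoning as a small truth table; the function name and the three state labels are hypothetical, and real expression levels are graded rather than discrete.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon transcription state (simplified, two-input model)."""
    # Without lactose there is no allolactose, so the repressor stays on the operator.
    repressor_bound = not lactose_present
    # Glucose depletion raises cAMP, which lets the CAP-cAMP complex bind the promoter.
    cap_camp_bound = not glucose_present

    if repressor_bound:
        return "repressed"
    if cap_camp_bound:
        return "activated"
    # Repressor is off, but CAP-cAMP is missing (catabolite repression).
    return "not activated"


for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose!s:<5} glucose={glucose!s:<5} -> {state}")
```

Only the combination of lactose present and glucose absent returns "activated", matching the condition stated in the text.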

Figure 13.7 Effect of CAP on the Lac Operon. (a) In the presence of cAMP, CAP binds to the promoters of operons, like the lac operon, that encode genes for enzymes for the use of alternate substrates. (b) For the lac operon to be expressed, there must be activation by cAMP-CAP as well as removal of the lac repressor from the operator.

Global Responses of Prokaryotes

In prokaryotes, there are also several higher levels of gene regulation that have the ability to control the transcription of many related operons simultaneously in response to an environmental signal. A group of operons all controlled simultaneously is called a regulon.


When sensing impending stress, prokaryotes alter the expression of a wide variety of operons to respond in coordination. They do this through the production of alarmones, which are small intracellular nucleotide derivatives, such as guanosine pentaphosphate (pppGpp) (Fig. 13.8).

Figure 13.8 Structure of Guanosine Pentaphosphate (pppGpp)

Alarmones change which genes are expressed and stimulate the expression of specific stress-response genes. For example, pppGpp signaling is involved in the stringent response in bacteria, causing the inhibition of RNA synthesis when there is a shortage of amino acids present. This causes translation to decrease and the amino acids present are therefore conserved. Furthermore, pppGpp causes the up-regulation of many other genes involved in stress response such as the genes for amino acid uptake (from surrounding media) and biosynthesis.

The use of alarmones to alter gene expression in response to stress appears to be important in pathogenic bacteria, as well. On encountering host defense mechanisms and other harsh conditions during infection, many operons encoding virulence genes are upregulated in response to alarmone signaling. Knowledge of these responses is key to being able to fully understand the infection process of many pathogens and to the development of therapies to counter this process.

Quorum Sensing

Quorum sensing (QS) is an intercellular communication mechanism that bacteria use to coordinate the activities of individual cells at the population level in response to their surroundings, through the production and perception of diffusible signal molecules such as acyl homoserine lactones or small signaling peptides (Fig. 13.9). The signal synthase, signal receptor, and signal molecules are three essential elements of the basic QS circuit machinery (Fig. 13.9). Genes encoding signal-generating proteins are also included among the QS target genes. This forms an autoinduction feedback loop that modulates the generation of signal molecules. Several bacterial behaviors, including virulence factor expression, secondary metabolite production, biofilm formation, motility, and luminescence, are regulated by QS. Through complex regulatory networks, bacteria are capable of expressing the corresponding genes according to their own population size and of behaving in a coordinated manner.

Figure 13.9 Examples of Quorum Sensing Pathways. (Left panel) Typical Gram-negative quorum sensing mechanism. Acyl homoserine lactone molecules, synthesized by LuxI, passively cross the bacterial cell membrane and, when a sufficient concentration is reached (threshold level), activate the intracellular LuxR, which subsequently activates target gene expression in a coordinated way. Note that a single cell is shown for simplicity. However, acyl homoserine lactones will commonly diffuse and target neighboring cells to mediate a communal or population response within the bacterial colony. (Right panel) Quorum sensing peptides are synthesized by the bacterial ribosomes as pro-peptidic proteins and undergo post-translational modifications during excretion by active transport. The quorum sensing peptides bind membrane-associated receptors, which become autophosphorylated and activate intracellular response regulators via phosphotransfer. These phosphorylated response regulators induce increased target gene expression.

For example, some microbial species, such as Staphylococcus aureus, can encase their community within a self-produced matrix of hydrated extracellular polymeric substances that include polysaccharides, proteins, nucleic acids, and lipid molecules. These encasements are known as biofilms. The formation of the biofilm on solid surfaces is a step-wise process comprising several stages (Fig. 13.10). It starts with the conditioning of the surface through the coating with macromolecules from the aqueous surrounding, which enables initial reversible adhesion of microorganisms. The next step is a formation of stronger, irreversible attachments to the surface, followed by the proliferation and aggregation of microorganisms into multicellular and multilayered clusters, which actively produce extracellular matrix. Some cells in the mature biofilms continuously detach and separate from the aggregates, representing a continuous source of planktonic bacteria that can subsequently spread and form new microcolonies.

Figure 13.10 Schematic drawing of biofilm formation.

Biofilms are a common cause of chronic, nosocomial and medical device-related infections, due to the fact that they can develop either on vital or necrotic tissue as well as on the inert surfaces of different implanted materials. Moreover, biofilms are linked with high-level resistance to antimicrobials, frequent treatment failures, increased morbidity and mortality. As a consequence, biofilm infections and accompanying diseases have become a major health concern and a serious challenge for both modern medicine and pharmacy. The rough estimation shows that more than 60% of hospital-associated infections are attributable to the biofilms formed on indwelling medical devices, which result in more than one million cases of infected patients annually and more than $1 billion of hospitalization costs per year in the USA.

Biofilm infections share some common characteristics: slow development in one or more hot-spots, delayed clinical manifestation, and persistence for months or years, usually with alternating periods of acute exacerbation and absence of clinical symptoms. Even though they are less aggressive than acute infections, they are more challenging to treat. The main reason is that biofilms show up to a 1000-fold decrease in susceptibility to antimicrobial agents and disinfectants, as well as resistance to the host immune response. Thus, ways to reduce or inhibit biofilm formation are highly sought. The majority of the proposed biofilm-control methods focus on: (i) prevention and minimization of biofilm formation by selection and surface modification of anti-adhesive materials; (ii) debridement techniques, including ultrasound and surgical procedures; (iii) disruption of the biofilm QS-signaling system; or (iv) achieving proper drug penetration and delivery to formed biofilms by the use of electromagnetic fields, ultrasound waves, photodynamic activation, or specific drug delivery systems.

Alternate σ Factors

Since the σ subunit of bacterial RNA polymerase confers specificity as to which promoters should be transcribed, altering the σ factor used is another way for bacteria to quickly and globally change what regulons are transcribed at a given time. The σ factor recognizes sequences within a bacterial promoter, so different σ factors will each recognize slightly different promoter sequences. In this way, when the cell senses specific environmental conditions, it may respond by changing which σ factor it expresses, degrading the old one and producing a new one to transcribe the operons encoding genes whose products will be useful under the new environmental condition. For example, in sporulating bacteria of the genera Bacillus and Clostridium (which include many pathogens), a group of σ factors controls the expression of the many genes needed for sporulation in response to sporulation-stimulating signals.

Prokaryotic Attenuation and Riboswitches

Although most gene expression is regulated at the level of transcription initiation in prokaryotes, there are also mechanisms that control the completion of transcription and translation concurrently. Since their discovery, these mechanisms have been shown to control the completion of transcription and translation of many prokaryotic operons. Because these mechanisms link the regulation of transcription and translation directly, they are specific to prokaryotes; in eukaryotes, the two processes are physically separated.

One such regulatory system is attenuation, whereby secondary stem-loop structures formed within the 5’ end of an mRNA being transcribed determine if transcription to complete the synthesis of this mRNA will occur and if this mRNA will be used for translation. Beyond the transcriptional repression mechanism already discussed, attenuation also controls expression of the trp operon in E. coli (Fig. 13.11). The trp operon regulatory region contains a leader sequence called trpL between the operator and the first structural gene, which has four stretches of RNA that can base pair with each other in different combinations. When a terminator stem-loop forms, transcription terminates, releasing RNA polymerase from the mRNA. However, when an antiterminator stem-loop forms, this prevents the formation of the terminator stem-loop, so RNA polymerase can transcribe the structural genes.

Figure 13.11. Attenuation of Transcription and Translation. When tryptophan is plentiful, translation of the short leader peptide encoded by trpL proceeds, the terminator loop between regions 3 and 4 forms, and transcription terminates. When tryptophan levels are depleted, translation of the short leader peptide stalls at region 1, allowing regions 2 and 3 to form an antiterminator loop, and RNA polymerase can transcribe the structural genes of the trp operon.
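The attenuation decision described above can be summarized as a two-branch rule. The following is a minimal sketch that only restates the logic of the text and figure; the function name and the dictionary keys are hypothetical, and it ignores the separate repressor-based control of the trp operon.

```python
def trp_attenuation(tryptophan_plentiful: bool) -> dict:
    """Which stem-loop forms in the trpL leader, and whether transcription continues."""
    if tryptophan_plentiful:
        # The ribosome translates the leader peptide without stalling,
        # so regions 3 and 4 pair into the terminator stem-loop.
        return {
            "ribosome": "translates leader peptide",
            "stem_loop": "terminator (regions 3-4)",
            "structural_genes_transcribed": False,
        }
    # The ribosome stalls at region 1, leaving region 2 free to pair
    # with region 3; this antiterminator blocks the 3-4 terminator.
    return {
        "ribosome": "stalls at region 1",
        "stem_loop": "antiterminator (regions 2-3)",
        "structural_genes_transcribed": True,
    }


for plentiful in (True, False):
    outcome = trp_attenuation(plentiful)
    print(f"tryptophan plentiful={plentiful}: {outcome}")
```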

A related mechanism of concurrent regulation of transcription and translation in prokaryotes is the use of a riboswitch, a small region of noncoding RNA found within the 5’ end of some prokaryotic mRNA molecules (Figure 13.12). A riboswitch may bind to a small intracellular molecule to stabilize certain secondary structures of the mRNA molecule. The binding of the small molecule determines which stem-loop structure forms, thus influencing the completion of mRNA synthesis and protein synthesis.

Figure 13.12. Riboswitch Form and Function. Riboswitches found within prokaryotic mRNA molecules can bind to small intracellular molecules, stabilizing certain RNA structures, influencing either the completion of the synthesis of the mRNA molecule itself (left) or the protein made using that mRNA (right).

13.2 Eukaryotic Gene Regulation

As seen in Chapter 10, the initiation of transcription requires the assembly of a multitude of transcription factors (TFs) localized at the promoter region. Transcription can also utilize far-reaching interactions with enhancers, which lie at a distant DNA site and loop back around to stabilize the RNA polymerase at the promoter. Control of transcriptional initiation is dependent on TF activation, TF binding to specific DNA recognition sequences, and chromatin remodeling.

Transcription Factor (TF) Activation

Many TFs are expressed within cells and held in an inactive conformation until the right environmental stimulus is present within the cell. Cellular signaling pathways can cause post-translational protein modifications leading to TF activation, or small molecules may physically bind and allosterically modify the protein structure to mediate activation. Here we will use examples from the cell cycle signaling cascade and steroid hormone receptor pathways to highlight some mechanisms of TF activation. A key element to take away from this section is that transcription factor activation is often highly pleiotropic and has many cellular effects. Depending on the cell type and the environmental conditions, different combinations of downstream target genes may be activated or inactivated. Teasing apart these intricacies and the physiological effects that they have within an organism is a major goal of ongoing research.

Cell Cycle Regulation by p53

p53 is one of the most studied proteins in science. To date, over 68,000 papers appear in PubMed containing p53 or TP53 in the title and/or abstract. Originally described as an oncogene (since a mutated, functionally altered form of the protein was first characterized), p53 is now recognized as the most frequently inactivated tumor suppressor in human cancers. It is a transcription factor that controls the expression of genes and miRNAs affecting many important cellular processes including proliferation, DNA repair, programmed cell death (apoptosis), autophagy, metabolism, and cell migration (Fig. 13.13). Many of those processes are critical to a variety of human pathologies and conditions extending beyond cancer, including ischemia, neurodegenerative diseases, stem cell renewal, aging, and fertility. Notably, p53 also has non-transcriptional functions, ranging from intrinsic nuclease activity to activation of mitochondrial Bak (Bcl-2 homologous antagonist killer) and caspase-independent apoptosis.

As a transcription factor, p53 responds to various genotoxic insults and cellular stresses (e.g., DNA damage or oncogene activation) by inducing or repressing the expression of over a hundred different genes. p53 transcriptional regulation plays a dominant role in causing the arrest of damaged cells, facilitating their repair and survival, or inducing cell death when DNA is damaged irreparably. p53 can also cause cells to become permanently growth arrested, and there is compelling in vivo evidence that these “senescent” cells secrete factors that enhance their clearance by the immune system, leading to tumor regression. Through these mechanisms, p53 helps maintain genomic stability within an organism, justifying its long-held nickname “guardian of the genome”. Other p53 gene targets are involved in inhibiting tumor cell angiogenesis, migration, metastasis, and other important processes (such as metabolic reprogramming) that normally promote tumor formation and progression.

Figure 13.13. Cellular stress leads to p53 transcriptional activation of downstream targets. Normally, p53 levels are kept low by its major antagonist, Mdm2, an E3 ubiquitin ligase that is itself a transcriptional target of p53. Stress signals, such as DNA damage, oncogene activation and hypoxia, promote p53 stability and activity by inducing post-translational modifications (PTMs) and tetramerization of p53. p53 functions as a transcription factor that binds to specific p53 response elements upstream of its target genes. p53 affects many important cellular processes linked to tumor suppression, including the induction (green) of senescence, apoptosis, and DNA repair as well as inhibition (red) of metabolism, angiogenesis, and cell migration. These functions are largely mediated through transcriptional regulation of its targets (examples given).

p53 protein function is regulated post-translationally by coordinated interaction with signaling proteins including protein kinases, acetyltransferases, methyltransferases, and ubiquitin-like modifying enzymes (Figure 13.14). The majority of the sites of covalent modification occur at intrinsically unstructured linear peptide docking motifs that flank the DNA-binding domain of p53, which play a role in anchoring or in allosterically activating the enzymes that mediate covalent modification of p53. In undamaged cells, p53 protein has a relatively short half-life and is degraded by a ubiquitin-proteasome dependent pathway through the action of E3 ubiquitin ligases, such as MDM2 (Fig 13.13). Following stress, p53 is phosphorylated at multiple residues, thereby modifying its biochemical functions required for increased activity as a transcription factor. Post-translational modifications help to stabilize the tetramer formation of the protein and enhance the translocation of the protein from the cytoplasm into the nucleus. The tetrameric form of p53 is then able to bind to DNA in a sequence-specific manner and either activate or repress transcription, depending on the target sequence. Some post-translational modifications, such as acetylation, are DNA-dependent and can play a role in chromatin remodeling and activation of p53 target gene expression.

Figure 13.14 Sites of Post-Translational Modification on p53. Schematic representation of the 393 amino acid domain structure of human p53 showing the sites of post-translational modification including phosphorylation, acetylation, ubiquitination, methylation, neddylation, and sumoylation. Abbreviations: N-terminal transactivation domain (TAD) proline-rich domain (PRD) tetramerisation domain (TET) C-terminal regulatory domain (REG) arginine (R) lysine (K) serine (S) threonine (T).

It should be noted that single point mutations that modify the ability of the protein to be phosphorylated at one position typically do not decrease the stabilization or activation of the protein following a damage or stress event. Thus, multiple modifications likely allow for redundancy within this pathway and ensure the activation of the protein following a stress event. Furthermore, the environment within the cell can lead to different p53 phenotypes, such as the activation of growth arrest and DNA repair processes (i.e., if damage is limited) or the activation of apoptosis or programmed cell death pathways (i.e., if damage is too extensive to be repaired).

Steroid Hormone Receptors

Steroid hormone receptors (SHRs) belong to the superfamily of nuclear receptors (NRs), which are one of the essential classes of transcription factors. NRs play a critical role in all aspects of human development, metabolism and physiology. Since they generally act as ligand-activated transcription factors, they are an essential component of cell signaling. NRs form an ancient and conserved family that arose early in the metazoan lineage. NR molecular evolution is characterized by major events of gene duplication and gene loss. Phylogenetic analysis revealed a distinct separation of NR ligand binding domains (LBDs) into 4 monophyletic branches: the steroid hormone receptor-like cluster, the thyroid hormone-like receptors cluster, the retinoid X-like and steroidogenic factor-like receptor cluster, and the nerve growth factor-like/HNF4 receptor cluster (Fig. 13.15).

Figure 13.15 Phylogenetic tree of the nuclear receptors’ ligand binding domain. Four distinct monophyletic branches are visible. Those monophyletic branches are divided into subcategories. The phylogenetic trees confidently separate the steroid hormone-like (branch colored green), the retinoid X-like and steroidogenic factor-like receptors cluster (branch colored orange), the thyroid hormone-like receptors cluster (branch colored blue) and the nerve growth factor-like/hepatocyte nuclear factor-4 receptors cluster (branch colored yellow).

Here we will focus on the Steroid Hormone-Like Receptors branch (SHRs). SHRs play a key role in many important physiological processes like organ development, metabolite homeostasis, and response to external stimuli. The estrogen receptor comes in two major forms, ERα and ERβ. Other members of this subgroup include the cortisol-binding glucocorticoid receptor (GR), the aldosterone-binding mineralocorticoid receptor (MR), the progesterone receptor (PR), and the dihydrotestosterone (DHT)-binding androgen receptor (AR) (Fig. 13.16).

Figure 13.16 Overview of Steroid Hormone Receptor Family (SHR). A. Phylogenetic tree of the Steroid Hormone Receptor (SHR) family showing the evolutionary interrelationships and distance between the various receptors. Based on alignments available at The NucleaRDB [Horn et al., 2001]. B. All steroid receptors are composed of a variable N-terminal domain (A/B) containing the AF-1 transactivation region, a highly conserved DNA Binding Domain (DBD), a flexible hinge region (D), and a C-terminal Ligand Binding Domain (LBD, E) containing the AF-2 transactivation region. The estrogen receptor α is unique in that it contains an additional C-terminal F domain. Numbers represent the length of the receptor in amino acids.

The members of the Steroid Hormone Receptor family share a similar, modular architecture, consisting of a number of independent functional domains (Fig. 13.16B). Most conserved is the centrally located DNA binding domain (DBD) containing the characteristic zinc-finger motifs. The DBD is followed by a flexible hinge region and a moderately conserved Ligand Binding Domain (LBD), located at the carboxy-terminal end of the receptor. The estrogen receptor α is unique in that it contains an additional F domain of which the exact function is unclear. The LBD is composed of twelve α-helices (H1-H12) that together fold into a canonical α-helical sandwich. Besides its ligand binding capability, the LBD also plays an important role in nuclear translocation, chaperone binding, receptor dimerization, and coregulator recruitment through its potent ligand-dependent transactivation domain, referred to as AF-2. A second, ligand independent, transactivation domain is located in the more variable N-terminal part of the receptor, designated as AF-1. To date, no crystal structure of a full-length SHR exists, though structures of the DBD and LBD regions of most SHRs are available. These have helped significantly in understanding the molecular aspects of DNA and ligand binding, but have to some extent also led to biased attention to these parts of the receptor only. For example, many coregulator interaction studies are still performed with the LBD only, while numerous studies have demonstrated that the AF-2 domain often tells only part of the story. With the help of biophysical techniques, however, it is feasible to study the full-length receptor in its native environment (Figure 13.16).

Most SHRs remain in the cytoplasm of the cell until they are bound with the appropriate steroid (Fig 13.17). Steroid binding causes the dimerization of SHRs and localization to the cell nucleus, where the SHRs interact with the DNA at sequence specific motifs known as Hormone Response Elements (HREs) (Fig. 13.17, Step 5). Many SHRs can also interact with membrane-bound receptors and affect cellular signaling pathways, in addition to the activation of gene expression (Fig. 13.17, step 6).

Figure 13.17 Steroid Hormone Receptors (SHR) act as hormone dependent nuclear transcription factors. Upon entering the cell by passive diffusion, the hormone (H) binds the receptor, which is subsequently released from heat shock proteins, and translocates to the nucleus. There, the receptor dimerizes, binds specific sequences in the DNA, called Hormone Responsive Elements or HREs, and recruits a number of coregulators that facilitate gene transcription.

Steroid hormones, such as the estrogens, reach their target cells via the blood, where they are bound to carrier proteins. Naturally occurring estrogens include estrone, estradiol, estriol, and estetrol, which differ structurally primarily in their hydroxyl groups (Fig. 13.18). Estradiol is the predominant estrogen during reproductive years, both in terms of absolute serum levels and in terms of estrogenic activity. During menopause, estrone is the predominant circulating estrogen, and during pregnancy estriol is the predominant circulating estrogen in terms of serum levels. Estetrol (E4) is also produced predominantly during pregnancy (Fig 13.18). Estrogens function in many physiological processes, including the regulation of the menstrual cycle and reproduction, maintenance of bone density, brain function, cholesterol mobilization, and maturation of reproductive organs during development, and they play a role in controlling inflammation.

Figure 13.18 Naturally Occurring Estrogens.

Because of their lipophilic nature it is thought that steroid hormones, such as estrogen, pass the cell membrane by simple diffusion, although some evidence exists that they can also be actively taken up by endocytosis of carrier protein bound hormones. For a long time it has been assumed that binding of the ligand resulted in a simple on/off switch of the receptor (Fig. 13.17, step 1). While this is likely the case for typical agonists like estrogen and progesterone, this is not always correct for receptor antagonists, used in drug therapy. These antagonists come in two kinds, so-called partial antagonists (for the estrogen receptors known as SERMs for Selective Estrogen Receptor Modulators) and full antagonists. The partial antagonist can, depending on cell type, act as a SHR agonist or antagonist. In contrast, full antagonists (for ER known as SERDs for Selective Estrogen Receptor Downregulators) always inhibit the receptor, independent of cell type, in part by targeting the receptor for degradation. Binding of either type of antagonist results in major conformational changes within the LBD and in release from heat shock proteins that thus far had protected the unliganded receptor from unfolding and aggregation (Fig. 13.17 step 2).

Transcription Factor (TF) Recognition and Binding to DNA

TFs control gene expression by binding to their target DNA site to recruit, or block, the transcription machinery at the promoter region of the gene of interest. Their function relies on the ability to find their target site quickly and selectively. In living cells TFs are present in nM concentrations and bind the target site with comparable affinity, but they also bind any DNA sequence (nonspecific binding), resulting in millions of low-affinity (i.e., >10⁻⁶ M) competing sites. Nonspecific binding facilitates the search for the target site by three major mechanisms (Fig. 13.19). One of the main scenarios involves a ‘sliding’ mechanism, in which the protein moves from its initial non-specific site to its actual target site by sliding along the DNA (also known as 1-dimensional (1D) sliding) (Fig. 13.19). When the TF starts to move and shift counterions from the phosphate backbone, the same number of counterions binds to the site left free by the protein. The sliding rate is also dependent on the hydrodynamic radius of the protein: the required rotational movement over the DNA backbone is greater for larger proteins, which tend to slide slowly. The second scenario is a ‘hopping’ mechanism, in which a TF might hop from one site to another in 3D space by dissociating from its original site and subsequently binding to the new site. This may happen within the same chain, with re-association occurring adjacent to the site of dissociation. A third search mechanism is described as ‘intersegmental transfer’. In this scenario, the protein moves between two sites via an intermediate ‘loop’ formed by the DNA and subsequently binds at two different DNA sites. This mechanism is applicable to TFs with two DNA-binding sites. Proteins with two DNA-binding sites can occasionally bind non-specifically to two locations situated far apart within the DNA strand that are brought into close contact through the formation of these loops. Such TFs transfer across a point of close contact without dissociating from the DNA.

Figure 13.19 Protein-DNA recognition mechanisms. The three main protein-DNA recognition mechanisms are shown. When the transcription factor (pink ring) moves from one site to another by sliding along the DNA and is transferred from one base pair to another without dissociating from the DNA, the mechanism is called sliding (top). Hopping occurs when the transcription factor moves on the DNA by dissociating from one site and re-associating with another site (center). Intersegmental transfer describes the mechanism by which the transcription factor is transferred through DNA bending or the formation of a DNA loop, resulting in the protein being bound transiently to both sites and subsequently moving from one site to the other (bottom).

Each eukaryotic TF controls tens to hundreds of genes scattered throughout the genome, and expressing each gene requires several TFs to bind their sites simultaneously to form the transcription complex, an extremely rare event in probabilistic terms. As a result, the in vivo site occupancy patterns of eukaryotic TFs are more complex than predicted by their in vitro site-specific binding profiles and do not strongly correlate with the actual levels of gene expression. An interesting feature highlighted by genome analysis is an accumulation of potential TF binding sites in regions flanking eukaryotic genes. Such clusters of degenerate recognition sites are assumed to be key for transcription control, and thus are generally classified as gene regulatory regions (RRs). For example, the affinity of the Drosophila TF Engrailed for the RRs of its target genes is strongly amplified by long tracts of degenerate consensus repeats that are present in such regions.

Histone Modification and Chromatin Remodeling

Regulation of transcription involves dynamic rearrangements of chromatin structure. Recall that eukaryotic DNA is complexed with histone octamers, which are composed of dimers of the core histones H2A, H2B, H3 and H4. 147 bp of DNA are wrapped 1.65 times around each octamer forming nucleosomes, the basic packaging units of chromatin. Nucleosomes, connected by linker DNA of variable length as “beads on a string”, generate the 11 nm linear structure. The linker histone H1 is positioned at the top of the core histone octamer and enables higher organized compaction of DNA into transcriptionally inactive 30 nm fibres.

To understand the role of chromatin for regulation of transcription it is important to know where nucleosomes are positioned and how positioning is achieved. Basically there are four groups of activities which change chromatin structure during transcription: (1) histone modifications, (2) eviction and repositioning of histones, (3) chromatin remodeling and (4) histone variant exchange. Histone modifiers introduce post-translational, covalent modifications to histone tails and thereby change the contact between DNA and histones. These modifications govern access of regulatory factors. Histone chaperones aid eviction and positioning of histones. A third class of chromatin restructuring factors are ATP dependent chromatin remodelers. These multi-subunit complexes utilize energy from ATP hydrolysis for various chromatin remodeling activities including nucleosome sliding, nucleosome displacement and the incorporation and exchange of histone variants.

Post-translational modification (PTM) of histone proteins is a primary mechanism that controls chromatin architecture. Over 20 distinct types of histone PTMs have been described, among which the most abundant are acetylation and methylation of lysine residues. Histone PTMs can be deposited on and removed from chromatin by different enzymes, known as histone PTM ‘writers’ and ‘erasers’. Histone PTMs exert their regulatory effects via two main mechanisms. First, histone PTMs serve as docking sites for various nuclear proteins (histone PTM ‘readers’) that specifically recognize modified histone residues through their modification-binding domains. Recruitment of these proteins to specific genomic loci promotes key chromatin processes, such as transcriptional regulation and DNA damage repair. Second, some histone PTMs, such as acetylation, directly affect chromatin higher-order structure and compaction, thereby controlling chromatin accessibility to protein machineries such as those involved in transcription. Chromatin may adopt one of two major, interconvertible states: heterochromatin and euchromatin. Heterochromatin is a compact form that is resistant to the binding of various proteins, such as the transcriptional machinery. In contrast, euchromatin is a relaxed form of chromatin that is open to modifications and transcriptional processes (Fig. 13.20). Histone methylation promotes the formation of heterochromatin, whereas histone acetylation promotes euchromatin.

Figure 13.20 Schematic drawing of histone methylation and acetylation in relation to chromatin remodeling. Addition of methyl groups to the tails of histone core proteins leads to histone methylation, which in turn leads to the adoption of a condensed state of chromatin called ‘heterochromatin.’ Heterochromatin blocks transcription machinery from binding to DNA and results in transcriptional repression. The addition of acetyl groups to lysine residues in the N-terminal tails of histones causes histone acetylation, which leads to the adoption of a relaxed state of chromatin called ‘euchromatin.’ In this state, transcription factors and other proteins can bind to their DNA binding sites and proceed with active transcription.

Chromatin remodeling can also be an ATP-dependent process and involve histone dimer ejection, full nucleosome ejection, nucleosome sliding, and histone variant exchange (Fig. 13.21). ATP-dependent chromatin remodeling complexes bind to nucleosome cores and the surrounding DNA and, using energy from ATP hydrolysis, disrupt the DNA-histone interactions, slide or eject nucleosomes, alter nucleosome structures, and modulate the access of transcription factors to the DNA (Fig. 13.21). In addition to modulating gene expression, some of the complexes are involved in nucleosome assembly and organization following transcription at locations in which nucleosomes have been ejected, and in the packing of DNA following replication and DNA repair.

Figure 13.21 Overview of the functions of ATP-dependent chromatin remodeling complexes. (a) A subset of ISWI and CHD complexes are involved in nucleosome assembly, maturation, and spacing. (b) SWI/SNF complexes are primarily involved in histone dimer ejection, nucleosome ejection, and nucleosome repositioning through sliding, thus modulating chromatin access. (c) INO80 complexes are involved in histone exchange. It should be noted that the complexes might be involved in other chromatin remodeling functions.

Another level of chromatin regulation is accomplished by a dynamic exchange of canonical histones with specific histone variants. Histone variants are non-allelic isoforms of canonical histones that differ in their primary sequence and functional properties. For example, the histone variant H3.3 has been found to progressively accumulate in various mouse somatic tissues with age, resulting in near complete replacement of the canonical H3.1/2 isoforms by the age of 18 months. Deletion of H3.3 is lethal in mice and causes sterility in the fruit fly, Drosophila. In the nematode C. elegans, loss of H3.3 produces a significant ‘bagging’ phenotype, in which eggs hatch inside the animal's body. Furthermore, in organisms with deficient insulin signaling, loss of H3.3 caused a reduction in lifespan (although this phenotype is not observed in animals with a wild-type insulin signaling pathway) (Fig. 13.22). H3.3 also appears to accumulate with age in humans, and its accumulation is often absent in tumor cells. Overall, histone variant replacement is associated with changes in post-translational modifications (such as methylation) and has multiple effects on overall chromosome structure.

Figure 13.22 The Effects of Histone Variant H3.3 on C. elegans Lifespan. H3.3 expression increases over time in C. elegans during their normal lifespan. In organisms with impaired Insulin/IGF-1 signaling, germline deficiency of H3.3 resulted in significant decreases in lifespan.

13.3 Protein-DNA Interactions

Proteins use a wide range of DNA-binding structural motifs, such as homeodomain (HD), helix-turn-helix (HTH), and high-mobility group box (HMG) to recognize DNA. HTH is the most common binding motif and can be found in several repressor and activator proteins (Fig. 13.23). Despite their structural diversity, these domains participate in a variety of functions that include acting as substrate interaction mediators, enzymes to operate DNA, and transcriptional regulators. Several proteins also contain flexible segments outside the DNA-binding domain to facilitate specific and non-specific interactions. For example, many HD proteins use N-terminal arms and a linker region to interact with DNA. The Encyclopedia of DNA Elements (ENCODE) data suggest that about 99.8% of putative binding motifs of TFs are not bound by their respective TFs in the genome. It is, therefore, clear that the presence of a single binding motif per TF is not adequate for TF binding.

Figure 13.23 Representative figures of the transcription factor binding domains. The figure shows the crystal structures of different types of TF domains (3l1p, 4m9e, 5d5v, 1lbg, 1gt0, and 1nkp). The structures were obtained from the Protein Data Bank (PDB) and redrawn using chimera. The respective domains and important regions have been labeled. HTH stands for helix-turn-helix domain. bHLH stands for basic helix-loop-helix motif. HD and HMG stand for homeodomain and high-mobility group box domain, respectively.

Most of the searching mechanism studies that try to determine how TFs find their binding sites are limited to naked DNA-protein complexes, which do not reflect the actual crowded environment of a cell. Studies with naked DNA and transcription factors have shown that many DNA-binding proteins travel a long distance by 1D diffusion. However, the search process for eukaryotes must occur in the presence of chromatin, which has the ability to hinder protein mobility. In this case, the protein must dissociate from the DNA, enter a 3D mode of diffusion state, and continue the target site searching process.

The sliding and intersegmental transfer mechanisms can be explained through the example of the lac repressor. The lac repressor consists of 4 identical monomers (a dimer of dimers). The binding sequence of each dimer is symmetric or pseudo-symmetric, and each half is recognized by one of the identical monomers. The HTH domain of the lac repressor is the DNA-binding domain that facilitates the interaction with its target site on DNA (Fig. 13.24). As a result of a rapid search (sliding) along the DNA molecule and intersegmental transfer between distant DNA sequences, the lac repressor finds its target sites faster than the diffusion limit. The section of the HTH domain between residues 1–46, characterized by three α-helices, maintains its secondary structure in both specific and non-specific binding (Fig. 13.24). When the repressor binds to a non-specific site, the HTH domain interacts with the DNA backbone and keeps its recognition helix juxtaposed with the major groove. This arrangement facilitates the interaction of the recognition helix with the edges of the DNA bases, enabling the repressor to walk along the DNA and search for its specific site. The C-terminal residues of the DNA-binding domain, residues 47–62, form the hinge region and are normally disordered during non-specific recognition; during specific site recognition, however, residues 50–58 acquire an α-helix configuration (hinge helix) (Fig. 13.24). The disordered hinge region and the flexibility of the HTH domain allow the protein to move freely along the DNA to search for its target site. In specific binding complexes, the hinge helix of each monomer is located at the symmetrical center of the binding site, thereby causing the hinge helices to interact with each other (intersegmental transfer) to allow better stability. Moreover, the DNA bends at the symmetrical center of the specific binding site (37° angle), thereby supporting monomer-monomer interactions (Fig. 13.24).

Figure 13.24. The Helix-Turn-Helix Motif of the Lac Repressor. Lac repressor binds to DNA non-specifically, enabling it to slide rapidly along the DNA double helix until it encounters the lac operator sequence. The DNA-binding domain employs a helix-turn-helix (HTH) motif (alpha helices and turns). During non-specific binding, the hinge region is disordered. The DNA double helix is depicted as straight in the model when the Lac repressor binds non-specifically. Upon recognizing the specific operator sequence, the non-specific binding converts to specific binding. During this conversion, the hinge region changes from disordered loops to alpha helices, which bind to the minor groove of the DNA. This binding stabilizes a kinked (“bent”) DNA double helix conformation.

In addition to the helix-turn-helix structure, the zinc finger motif is also very common, especially in eukaryotic TFs (Fig. 13.25). Proteins that contain zinc fingers (zinc finger proteins) are classified into several different structural families. Unlike many other clearly defined supersecondary structures such as Greek keys or β hairpins, there are a number of types of zinc fingers, each with a unique three-dimensional architecture. A particular zinc finger protein’s class is determined by this three-dimensional structure, but it can also be recognized based on the primary structure of the protein or the identity of the ligands coordinating the zinc ion. In spite of the large variety of these proteins, however, the vast majority typically function as interaction modules that bind DNA, RNA, proteins, or other small, useful molecules, and variations in structure serve primarily to alter the binding specificity of a particular protein. The most common type of zinc finger motif utilizes two Cys and two His residues (CCHH) coordinating the Zn(II) ion to adopt a ββα fold with three hydrophobic residues responsible for the formation of a small hydrophobic core which offers additional stabilization of the zinc finger domain (Fig. 13.25).

Figure 13.25 Sequence alignments of the CCHH zinc fingers and a representative structure. (a) Alignment of the TFIIIA-like zinc finger domains from different organisms. Green color denotes residues that are responsible for the hydrophobic core formation in most CCHH zinc fingers (L17, F11 and L2). Yellow and blue indicate the coordinating Cys and His residues, respectively. (b) The 3D NMR structure of 15-th ZF from zinc finger protein 478 [PDB: 2YRH].

Overall, zinc finger motifs display considerable versatility in binding modes, even between members of the same class (e.g., some bind DNA, others protein), suggesting that they are stable scaffolds that have evolved specialised functions. For example, zinc finger-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organization, epithelial development, cell adhesion, protein folding, chromatin remodeling, and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

The last binding domain that we will consider in detail here is the helix-loop-helix domain found in leucine zipper-containing proteins. Specifically, bZIPs (basic-region leucine zippers) are a class of eukaryotic transcription factors. The bZIP domain is 60 to 80 amino acids in length, with a highly conserved DNA-binding basic region and a more diversified leucine zipper dimerization region. The two regions form α-helical structures that are connected via a looped region. This forms a core helix-loop-helix (HLH) structure within each monomer of the protein. Two monomers then join through the formation of a leucine zipper junction, forming a heterodimeric protein structure. The resulting heterodimer can bind DNA in a sequence-specific manner through the basic α-helices (Fig. 13.26).

Specifically, basic residues, such as lysines and arginines, interact in the major groove of the DNA, forming sequence-specific interactions (Fig. 13.26). Most bZIP proteins show high binding affinity for ACGT motifs. bZIP heterodimers exist in a variety of eukaryotes and are more common in organisms of greater evolutionary complexity.

Figure 13.26 Leucine Zipper Transcription Factors from the bZIP family. The monomer subunits of a heterodimeric bZIP protein contain a helix-loop-helix (HLH) core structure, in which one helix forms the leucine zipper with the other monomer and the basic helix of each monomer interacts with the major groove of the target DNA. The helices are held together by a flexible loop region. (One monomer is shown in blue and the other in green.)

13.4 Epigenetics and Transgenerational Inheritance

Even though all somatic cells of a multicellular organism have the same genome, different cell types have different transcriptomes (set of all expressed RNA molecules), different proteomes (set of all proteins) and, hence, different functions. Cell differentiation during embryonic development requires the activation and repression of specific sets of genes by the action of cell lineage defining transcription factors. Within a cell lineage, gene activity states are often maintained over several rounds of cell divisions (a phenomenon called “cellular memory” or “cellular inheritance”). Since the rediscovery of epigenetics some 30 years ago (it was originally proposed by Conrad Hal Waddington in the early 1940s), cellular inheritance has been attributed to gene regulatory feedback loops, chromatin modifications (DNA methylation and histone modifications) as well as long-lived non-coding RNA molecules, which collectively are called the “epigenome”. Among the different chromatin modifications, DNA methylation and polycomb-mediated silencing are probably the most stable ones and endow genomes with the ability to impose silencing of transcription of specific sequences even in the presence of all of the factors required for their expression.

Defining Transgenerational Epigenetic Inheritance

The metastability of the epigenome explains why development is both plastic and canalized, as originally proposed by Waddington. Although epigenetics deals only with the cellular inheritance of chromatin and gene expression states, it has been proposed that epigenetic features could also be transmitted through the germline and persist in subsequent generations. The widespread interest in “transgenerational epigenetic inheritance” is nourished by the hope that epigenetic mechanisms might provide a basis for the inheritance of acquired traits. Yes, Lamarck has never been dead and every so often raises his head, this time with the help of epigenetics.

Although acquired traits concerning body or brain functions can be written down in the epigenome of a cell, they cannot easily be transmitted from one generation to the next. For this to occur, these epigenetic changes would have to manifest in the germ cells as well, which in mammals are separated from somatic cells by the so-called Weismann barrier. Further, the chromatin is extensively reshaped during germ cell differentiation as well as during the development of totipotent cells after fertilization, even though some loci appear to escape epigenetic reprogramming in the germline. Long-lived RNA molecules appear to be less affected by these barriers and are therefore more likely to carry epigenetic information across generations, although the mechanisms are largely unresolved.

Evidence for Transgenerational Epigenetic Inheritance

In the past 10 years, numerous reports on transgenerational responses to environmental or metabolic factors in mice and rats have been published. The factors include endocrine disruptors, high fat diet, obesity, diabetes, undernourishment as well as trauma. These studies investigated DNA methylation, sperm RNA or both. For example, making male mice prediabetic by treatment with streptozotocin affects DNA methylation patterns in their sperm, as well as in the pancreatic islets of their F1 and F2 offspring. Furthermore, studies have shown that traumatic stress in early life altered behavioral and metabolic processes in the progeny and that injection of sperm RNAs from traumatized males into fertilized wild-type oocytes reproduced the alterations in the resulting offspring.

In humans, epidemiological studies have linked food supply in the grandparental generation to health outcomes in the grandchildren. An indirect study based on DNA methylation and polymorphism analyses has suggested that sporadic imprinting defects in Prader–Willi syndrome are due to the inheritance of a grandmaternal methylation imprint through the male germline. Because of the uniqueness of these human cohorts, these findings still await independent replication. Most cases of segregation of abnormal DNA methylation patterns in families with rare diseases, however, turned out to be caused by an underlying genetic variant. Thus, it is important that studies of this nature rule out traditional genetic inheritance as a cause of the observed phenotypes.

Genetic inheritance alone cannot fully explain why we resemble our parents. In addition to genes, we inherited from our parents the environment and culture, which in parts have been constructed by the previous generations (Fig. 13.27). A specific form of the environment is our mother’s womb, to which we were exposed during the first 9 months of our life. The maternal environment can have long-lasting effects on our health. In the Dutch hunger winter, for example, severe undernourishment affected pregnant women, their unborn offspring and the offspring’s fetal germ cells. The increased incidence of cardiovascular and metabolic disease observed in F1 adults, is not due to the transmission of epigenetic information through the maternal germline, but a direct consequence of the exposure in utero, a phenomenon called “fetal programming” or—if fetal germ cells and F2 offspring are affected—“intergenerational inheritance”.

Figure 13.27. Transgenerational inheritance systems. (a) Offspring inherit from their parents genes (black), the environment (green) and culture (blue). Genes and the environment affect the epigenome (magenta) and the phenotype. Culture also affects the phenotype, but at present there is no evidence for a direct effect of culture on the epigenome (broken blue lines). It is a matter of debate how much epigenetic information is inherited through the germline (broken magenta lines). G, genetic variant; E, epigenetic variant. (b) An epimutation (promoter methylation and silencing of gene B in this example) often results from aberrant read-through transcription from a mutant neighboring gene, either in sense orientation as shown here or in antisense orientation. The presence of such a secondary epimutation in several generations of a family mimics transgenerational epigenetic inheritance, although it in fact represents genetic inheritance. Black arrow, transcription; black vertical bar, transcription termination signal; broken arrow, read-through transcription.


Although this term is also sometimes used interchangeably with exon, it is not the exact same thing: the exon is composed of the coding region as well as the 3' and 5' untranslated regions of the RNA, and so therefore, an exon would be partially made up of coding regions. The 3' and 5' untranslated regions of the RNA, which do not code for protein, are termed non-coding regions and are not discussed on this page. [4]

There is often confusion between coding regions and exomes and there is a clear distinction between these terms. While the exome refers to all exons within a genome, the coding region refers to a singular section of the DNA or RNA which specifically codes for a certain kind of protein.

In 1978, Walter Gilbert published "Why Genes in Pieces" which first began to explore the idea that the gene is a mosaic—that each full nucleic acid strand is not coded continuously but is interrupted by "silent" non-coding regions. This was the first indication that there needed to be a distinction between the parts of the genome that code for protein, now called coding regions, and those that do not. [5]

The evidence suggests that there is a general interdependence between base composition patterns and coding region availability. [7] The coding region is thought to contain a higher GC-content than non-coding regions. Further research found that the longer the coding strand, the higher the GC-content. Short coding strands are comparatively GC-poor, similar to the low GC-content of the translational stop codons TAG, TAA, and TGA. [8]
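GC-content, as used above, is simply the fraction of bases in a sequence that are G or C. The following is a minimal Python sketch of that calculation, applied to the three stop codons mentioned; the function name `gc_content` is an illustrative choice, not taken from any particular library.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    An empty sequence is reported as 0.0 to avoid dividing by zero.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# The three stop codons named in the text are all GC-poor:
# none of them has more than one G or C among its three bases.
for codon in ("TAG", "TAA", "TGA"):
    print(codon, round(gc_content(codon), 2))
```

The same function can be applied to a whole coding sequence and to its flanking non-coding sequence to compare the two fractions directly.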

GC-rich areas are also where the ratio of point mutation types is altered slightly: there are more transitions, which are changes from purine to purine or pyrimidine to pyrimidine, compared to transversions, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and more likely to remain silent mutations (especially if they occur in the third nucleotide of a codon), which is usually beneficial to the organism during translation and protein formation. [9]
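The transition/transversion distinction described above depends only on whether the original and the replacement base belong to the same chemical class. A small Python sketch, with a hypothetical function name `substitution_type`, makes the rule explicit:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so nothing was substituted")
    # Same class (purine<->purine or pyrimidine<->pyrimidine) is a transition;
    # crossing classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "C"))  # purine to pyrimidine
```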

This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to mutation compared to accessory and non-essential regions (gene-poor). [10] However, it is still unclear whether this came about through neutral and random mutation or through a pattern of selection. [11] There is also debate on whether the methods used, such as gene windows, to ascertain the relationship between GC-content and coding region are accurate and unbiased. [12]

In DNA, the coding region is flanked by the promoter sequence on the 5' end of the template strand and the termination sequence on the 3' end. During transcription, the RNA Polymerase (RNAP) binds to the promoter sequence and moves along the template strand to the coding region. RNAP then adds RNA nucleotides complementary to the coding region in order to form the mRNA, substituting uracil in place of thymine. [13] This continues until the RNAP reaches the termination sequence. [13]
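The base-pairing rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the complementarity rule, not a model of RNA polymerase; it assumes the template strand is written in the 3'→5' direction so that the mRNA comes out 5'→3', and the function name `transcribe` is hypothetical.

```python
# Template DNA base -> complementary RNA base (uracil is used where DNA would use thymine).
_TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    template = template_3_to_5.upper()
    if set(template) - set("ACGT"):
        raise ValueError("template must contain only A, C, G, T")
    return template.translate(_TEMPLATE_TO_RNA)


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```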

After transcription and maturation, the mature mRNA formed encompasses multiple parts important for its eventual translation into protein. The coding region in an mRNA is flanked by the 5' untranslated region (5'-UTR) and 3' untranslated region (3'-UTR), [1] the 5' cap, and Poly-A tail. During translation, the ribosome facilitates the attachment of the tRNAs to the coding region, 3 nucleotides at a time (codons). [14] The tRNAs transfer their associated amino acids to the growing polypeptide chain, eventually forming the protein defined in the initial DNA coding region.
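Reading the coding region three nucleotides at a time can be illustrated with a short Python sketch. It builds the standard genetic code from the conventional U, C, A, G ordering, starts at the first AUG it finds, and stops at the first in-frame stop codon. The function name `translate` and the choice to return an empty string when no start codon is present are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional UCAG order
# (first base slowest, third base fastest); '*' marks a stop codon.
_BASES = "UCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA from its first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes; an mRNA without AUG gives "".
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


# The leading GG and trailing GG play the role of untranslated flanking sequence.
print(translate("GGAUGCCGUAAGG"))  # MP
```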

The coding region can be modified in order to regulate gene expression.

Alkylation is one form of regulation of the coding region. [16] The gene that would have been transcribed can be silenced by targeting a specific sequence. The bases in this sequence would be blocked using alkyl groups, which create the silencing effect. [17]

While the regulation of gene expression manages the abundance of RNA or protein made in a cell, the regulation of these mechanisms can be controlled by a regulatory sequence found before the open reading frame begins in a strand of DNA. The regulatory sequence will then determine the location and time that expression will occur for a protein coding region. [18]

RNA splicing ultimately determines what part of the sequence becomes translated and expressed, and this process involves cutting out introns and putting together exons. Where the RNA spliceosome cuts, however, is guided by the recognition of splice sites, in particular the 5' splicing site, which is one of the substrates for the first step in splicing. [19] The coding regions are within the exons, which become covalently joined together to form the mature messenger RNA.

Mutations in the coding region can have very diverse effects on the phenotype of the organism. While some mutations in this region of DNA/RNA can result in advantageous changes, others can be harmful and sometimes even lethal to an organism's survival. In contrast, changes in the coding region may not always result in detectable changes in phenotype.

Mutation types

There are various forms of mutations that can occur in coding regions. One form is silent mutations, in which a change in nucleotides does not result in any change in amino acid after transcription and translation. [21] There also exist nonsense mutations, where base alterations in the coding region code for a premature stop codon, producing a shorter final protein. Point mutations, or single base pair changes in the coding region, that code for different amino acids during translation, are called missense mutations. Other types of mutations include frameshift mutations such as insertions or deletions. [21]
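The silent, nonsense and missense categories above can be told apart by comparing the amino acid encoded before and after a substitution. The Python sketch below does this for a single DNA codon using the standard genetic code; the function name `classify_substitution` is hypothetical, and frameshifts (insertions or deletions) are deliberately out of scope because they change the reading frame rather than a single codon.

```python
from itertools import product

# Standard genetic code for DNA codons in TCAG order; '*' marks a stop codon.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as 'silent', 'nonsense' or 'missense'.

    Simplification: a change that turns a stop codon into an amino acid
    is also reported as 'missense' here.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAA", "TAA"))  # Glu -> stop
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
```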

Formation

Some forms of mutations are hereditary (germline mutations), or passed on from a parent to its offspring. [22] Such mutated coding regions are present in all cells within the organism. Other forms of mutations are acquired (somatic mutations) during an organism's lifetime, and may not be constant cell-to-cell. [22] These changes can be caused by mutagens, carcinogens, or other environmental agents (e.g., UV). Acquired mutations can also be a result of copy errors during DNA replication and are not passed down to offspring. Changes in the coding region can also be de novo (new); such changes are thought to occur shortly after fertilization, resulting in a mutation present in the offspring's DNA while being absent in both the sperm and egg cells. [22]

Prevention

There exist multiple mechanisms, acting during replication and translation, that limit lethality due to deleterious mutations in the coding region. Such measures include proofreading by some DNA polymerases during replication, mismatch repair following replication, [23] and the 'Wobble Hypothesis', which describes the degeneracy of the third base within an mRNA codon. [24]

While it is well known that the genome of one individual can have extensive differences when compared to the genome of another, recent research has found that some coding regions are highly constrained, or resistant to mutation, between individuals of the same species. This is similar to the concept of interspecies constraint in conserved sequences. Researchers termed these highly constrained sequences constrained coding regions (CCRs), and have also discovered that such regions may be involved in high purifying selection. On average, there is approximately 1 protein-altering mutation every 7 coding bases, but some CCRs can have over 100 bases in sequence with no observed protein-altering mutations, some without even synonymous mutations. [25] These patterns of constraint between genomes may provide clues to the sources of rare developmental diseases or potentially even embryonic lethality. Clinically validated variants and de novo mutations in CCRs have been previously linked to disorders such as infantile epileptic encephalopathy, developmental delay and severe heart disease. [25]

While identification of open reading frames within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames to proteins. [26] Currently CDS prediction uses sampling and sequencing of mRNA from cells, although there is still the problem of determining which parts of a given mRNA are actually translated to protein. CDS prediction is a subset of gene prediction, the latter also including prediction of DNA sequences that code not only for protein but also for other functional elements such as RNA genes and regulatory sequences.
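To illustrate why finding open reading frames is the easy part, the Python sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is a deliberately naive example: the function name `find_orfs` and the `min_codons` threshold are assumptions, the reverse strand is not scanned, and nothing here indicates whether a reported ORF is actually translated by the cell.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive,
    so dna[start:end] begins with ATG and ends with a stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATGACC"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])  # 2 11 ATGAAATGA
```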

Gene overlapping occurs in both prokaryotes and eukaryotes, and relatively often in both DNA and RNA viruses, where it offers an evolutionary advantage by reducing genome size while retaining the ability to produce various proteins from the available coding regions. [27] [28] For both DNA and RNA, pairwise alignments can detect overlapping coding regions, including short open reading frames in viruses, but they require a known coding strand against which to compare the potential overlapping coding strand. [29] An alternative method based on single genome sequences does not require multiple genome sequences for comparison, but it needs an overlap of at least 50 nucleotides to be sensitive. [30]


Article Summary:

Messenger ribonucleic acid (mRNA) carries the information for protein production. mRNA is produced from a DNA template by a process known as transcription. The mRNA carries the code required for protein synthesis to the cytoplasm, where proteins are produced with the help of ribosomes. Like DNA, mRNA holds genetic information in a sequence of nucleotides arranged into codons. Each codon consists of three bases and specifies a particular amino acid, except for the stop codons, which terminate protein synthesis. The process requires two other types of RNA: transfer RNA, which recognizes the codon and supplies the corresponding amino acid, and ribosomal RNA, which is the central component of the ribosome. This process of protein synthesis is called translation.
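The codon-by-codon reading described above can be sketched in a few lines. This is a minimal illustration using the standard genetic code; the function name `translate` is hypothetical, and the sketch simply starts at the first AUG it finds, ignoring ribosome binding sites and every other aspect of initiation.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon terminates synthesis
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUAAUU"))  # AUG GCU UAA, one-letter amino acid codes
```

The bases before the AUG and after the stop codon are read by nothing in this sketch, which mirrors the untranslated regions of a real mRNA.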

Structure of Messenger RNA - mRNA:-

Messenger RNA is a single-stranded molecule, although it can fold back on itself to form local base-paired regions. It contains the bases adenine, guanine, cytosine and uracil. Since mRNA is transcribed from DNA, its sequence is complementary to that of the DNA template strand from which it is transcribed. Usually each gene gives rise to its own mRNA, so 1,000 to 10,000 different types of mRNA may be present in a single cell.

The mRNA molecule has the following structural features:
1. Cap: It is present at the 5' end of the mRNA molecule in most eukaryotic cells. The rate of protein synthesis depends on the presence of the cap. Without the cap, mRNA molecules bind very poorly to the ribosome, the cell's protein-producing factory.

2. Noncoding region 1 (NC1): The cap is followed by a region of 10 to 100 nucleotides. This region is rich in adenine and uracil bases and does not code for any protein, hence the name noncoding region.

3. Initiation Codon: AUG is the initiation codon in both prokaryotes and eukaryotes.

4. The coding region: This consists of about 1,500 nucleotides on average and is translated into a functional protein.

Difference between Prokaryotic and Eukaryotic mRNA:

1. The mRNAs of many bacteria and bacteriophages are polycistronic; that is, a single mRNA is transcribed from several structural genes of an operon. It therefore contains several initiation and termination codons, so a single mRNA can code for several different protein molecules.

In contrast, known eukaryotic mRNAs generally have only one site for initiation and one for termination of protein synthesis. Eukaryotic mRNA is therefore monocistronic.
2. In most bacterial cells, translation of the mRNA begins while the mRNA is still being transcribed from the DNA molecule.
In eukaryotes, by contrast, the mRNA produced from the DNA template is first transported into the cytoplasm through nuclear pores, where it forms complexes with ribosomes and the proteins are synthesized. Thus translation begins only after transcription of the mRNA is completed.

3. The life span of prokaryotic mRNA is very short. mRNA molecules are constantly broken down into ribonucleotides by enzymes known as ribonucleases. In E. coli the average half-life of mRNA is only about two minutes. One end of an mRNA may be undergoing degradation while translation is still taking place at the other end. The short life span of mRNA enables prokaryotes to synthesize different proteins or enzymes in response to changes in the external environment.

Eukaryotic mRNAs have a much longer life span than bacterial mRNAs; that is, eukaryotic mRNA is metabolically stable. For example, mammalian reticulocytes continue to synthesize protein for hours or days after losing their nuclei.

4. In prokaryotes, mRNA undergoes very little post-transcriptional change, and there is a very short time interval between transcription and translation. For instance, translation may begin at one end of the mRNA molecule while transcription is still in progress at the other.
In eukaryotes, the transcribed mRNA undergoes major post-transcriptional modifications.

a. Polyadenylation at the 3' end of the mRNA. This poly(A) tail helps stabilize the mRNA molecule.

b. Capping, or formation of a cap at the 5' end by condensation of a guanylate residue.

c. Splicing. The transcripts present in the nucleus before post-transcriptional modification are called heterogeneous nuclear RNA (hnRNA). They contain both intron and exon regions. The splicing mechanism then removes the introns to produce mature mRNA, which consists only of exons. The mature mRNA is therefore only a fraction of the length of the hnRNA molecule.

These are some of the major differences between prokaryotic and eukaryotic mRNA molecules.
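The two-minute half-life quoted above for E. coli mRNA can be turned into a quick estimate of how fast a message disappears. The sketch assumes simple first-order (exponential) decay, which is a modelling assumption rather than something stated in the article, and the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(minutes, half_life_minutes=2.0):
    """Fraction of an mRNA population left after `minutes`.

    Assumes first-order decay: the population halves every half-life.
    """
    return 0.5 ** (minutes / half_life_minutes)


for t in (0, 2, 4, 10):
    print(f"after {t:>2} min: {fraction_remaining(t):.3f} of the mRNA remains")
```

Under this assumption only about 3% of a transcript population survives ten minutes, which illustrates why a bacterium can change its protein output within minutes of an environmental shift.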


Implications of the hypothesis

The evolution of the known, well-established class-I RFs itself holds several unsolved puzzles. Since there is no strong evidence for an evolutionary relationship between bacterial class-I RFs and their counterparts from archaea and eukaryotes, it is unknown how termination was mediated in the last common ancestor. If there was an RNA-based factor similar to tRNAs, was it independently substituted with convergently evolved protein analogs after the kingdoms of life split? It is unknown why there are two class-I RFs in bacteria, while for most organisms from the other kingdoms one factor serves the purpose well. Even among bacteria themselves, there is a small group of Mycoplasma and Ureaplasma species which have lost their RF2 genes (UGA was reassigned to encode Trp). These bacteria rely on a sole RF1 for recognition of their remaining stop codons. Yet these are obligatory pathogens with highly reduced genomes, and no free-living bacterium is known to lack either RF1 or RF2. Presumably, strong selective pressure preserves two class-I RFs in bacteria, although the benefits of having two factors with overlapping specificity are not apparent.

The hypothesis presented here of a third class-I RF does not simplify the situation. On the contrary, it makes it seem even more complicated. Nevertheless, even though experimental investigation of RFH may not give simple answers to above questions, it will help to recreate a more accurate picture of RF evolution. The most provocative aspect of the RFH story is the lack of an apparent need for yet another class-I RF. It is unclear what kind of signals RFH might recognize in mRNA.

Specific and conserved alterations (compared to RF1 and RF2) in those parts of RFH that interact with mRNA suggest that RFH recognizes something different from normal stop codons. Several speculative suggestions can be made regarding what might be a potential RFH signal. We will mention a few of them. If RFH recognizes a combination of standard nucleotides in mRNA other than stop codons (specifically or non-specifically), it will compete with tRNAs. This will result in ambiguous translation of sense codons as stop codons. Under normal conditions, such ambiguous translation is unlikely to be beneficial. However, during starvation for certain amino acids, premature termination on their corresponding codons will release stalled ribosomes. Hence, such a situation might be beneficial if RFH is expressed under starvation conditions for one or more amino acids. This would be useful in dealing with the ribosomes whose A-site is unoccupied in contrast to the RelA mediated stringent response triggered by stalled ribosomes occupied with deacylated tRNAs [39]. Since equilibrium between such ribosomal states is likely, RFH may act with RelA in parallel. If correct the function of RFH would partially overlap with that of tmRNA, but it would not have the tmRNA feature of ensuring the addition of a C-terminal tag, which is the substrate for a specific protease that rapidly degrades the product.

The co-occurrence of RFH and the upstream gene may also represent a toxin/antidote balance. Unwanted premature termination (performed by RFH) would be toxic and should be closely controlled by another protein, here suggested to be the upstream gene product.

Another potential role for RFH could be in recognition of mRNA containing nucleotides that are modified because of damage or for other reasons. The list of potential signals could be continued. Whatever the RFH function is, RFH is dispensable in most modern bacteria, meaning that either its function is also dispensable or it is accomplished by a different parallel system.

We know other examples of organisms with additional RFs. In A. thaliana, there are three highly similar isogenic eRF1s [40]. In some ciliates, e.g. Euplotes, and in certain methanogenic archaea, there are two class-I RFs instead of only one [41,42]. Interestingly, in the genetic codes of ciliates and methanogenic archaea, stop codons have been reassigned to sense codons. In many Euplotes, UGA is reassigned to tryptophan [41], while in methanogenic archaea UAG is translated as pyrrolysine [43]. The corresponding RF1s in these species have multiple substitutions in the area of the NIKS motif that is responsible for stop codon discrimination [42]. Whether the emergence of RFH was a result of a similar codon reassignment event is another interesting question to be answered.



Chapter 18 AP Biology

The role of a metabolite that controls a repressible operon is to A) bind to the promoter region and decrease the affinity of RNA polymerase for the promoter. B) bind to the operator region and block the attachment of RNA polymerase to the promoter. C) increase the production of inactive repressor proteins. D) bind to the repressor protein and inactivate it. E) bind to the repressor protein and activate it.

The tryptophan operon is a repressible operon that is A) permanently turned on. B) turned on only when tryptophan is present in the growth medium. C) turned off only when glucose is present in the growth medium. D) turned on only when glucose is present in the growth medium. E) turned off whenever tryptophan is added to the growth medium.

Which of the following is a protein produced by a regulatory gene? A) operon B) inducer C) promoter D) repressor E) corepressor

A lack of which molecule would result in the cell's inability to "turn off" genes? A) operon B) inducer C) promoter D) ubiquitin E) corepressor

Which of the following, when taken up by the cell, binds to the repressor so that the repressor no longer binds to the operator? A) ubiquitin B) inducer C) promoter D) repressor E) corepressor

Most repressor proteins are allosteric. Which of the following binds with the repressor to alter its conformation? A) inducer B) promoter C) RNA polymerase D) transcription factor E) cAMP

A mutation that inactivates the regulatory gene of a repressible operon in an E. coli cell would result in A) continuous transcription of the structural gene controlled by that regulator. B) complete inhibition of transcription of the structural gene controlled by that regulator. C) irreversible binding of the repressor to the operator. D) inactivation of RNA polymerase by alteration of its active site. E) continuous translation of the mRNA because of alteration of its structure

The lactose operon is likely to be transcribed when A) there is more glucose in the cell than lactose. B) the cyclic AMP levels are low. C) there is glucose but no lactose in the cell. D) the cyclic AMP and lactose levels are both high within the cell. E) the cAMP level is high and the lactose level is low.

Transcription of the structural genes in an inducible operon A) occurs continuously in the cell. B) starts when the pathway's substrate is present. C) starts when the pathway's product is present. D) stops when the pathway's product is present. E) does not result in the production of enzymes.

For a repressible operon to be transcribed, which of the following must occur? A) A corepressor must be present. B) RNA polymerase and the active repressor must be present. C) RNA polymerase must bind to the promoter, and the repressor must be inactive. D) RNA polymerase cannot be present, and the repressor must be inactive. E) RNA polymerase must not occupy the promoter, and the repressor must be inactive.

Altering patterns of gene expression in prokaryotes would most likely serve the organism's survival in which of the following ways? A) organizing gene expression so that genes are expressed in a given order B) allowing each gene to be expressed an equal number of times C) allowing the organism to adjust to changes in environmental conditions D) allowing young organisms to respond differently from more mature organisms E) allowing environmental changes to alter the prokaryote's genome

In response to chemical signals, prokaryotes can do which of the following? A) turn off translation of their mRNA B) alter the level of production of various enzymes C) increase the number and responsiveness of their ribosomes D) inactivate their mRNA molecules E) alter the sequence of amino acids in certain proteins

If glucose is available in the environment of E. coli, the cell responds with a very low concentration of cAMP. When the cAMP increases in concentration, it binds to CAP. Which of the following would you expect to be a measurable effect? A) decreased concentration of the lac enzymes B) increased concentration of the trp enzymes C) decreased binding of the RNA polymerase to sugar metabolism-related promoters D) decreased concentration of alternative sugars in the cell E) increased concentrations of sugars such as arabinose in the cell

In positive control of several sugar-metabolism-related operons, the catabolite activator protein (CAP) binds to DNA to stimulate transcription. What causes an increase in CAP? A) increase in glucose and increase in cAMP B) decrease in glucose and increase in cAMP C) increase in glucose and decrease in cAMP D) decrease in glucose and increase in repressor E) decrease in glucose and decrease in repressor

There is a mutation in the repressor that results in a molecule known as a super-repressor because it represses the lac operon permanently. Which of these would characterize such a mutant? A) It cannot bind to the operator. B) It cannot make a functional repressor. C) It cannot bind to the inducer. D) It makes molecules that bind to one another. E) It makes a repressor that binds CAP.

Which of the following mechanisms is (are) used to coordinate the expression of multiple, related genes in eukaryotic cells? A) Genes are organized into clusters, with local chromatin structures influencing the expression of all the genes at once. B) The genes share a common intragenic sequence, and allow several activators to turn on their transcription, regardless of location. C) The genes are organized into large operons, allowing them to be transcribed as a single unit. D) A single repressor is able to turn off several related genes. E) Environmental signals enter the cell and bind directly to promoters.

If you were to observe the activity of methylated DNA, you would expect it to A) be replicating nearly continuously. B) be unwinding in preparation for protein synthesis. C) have turned off or slowed down the process of transcription. D) be very actively transcribed and translated. E) induce protein synthesis by not allowing repressors to bind to it.

Genomic imprinting, DNA methylation, and histone acetylation are all examples of A) genetic mutation. B) chromosomal rearrangements. C) karyotypes. D) epigenetic phenomena. E) translocation.

When DNA is compacted by histones into 10-nm and 30-nm fibers, the DNA is unable to interact with proteins required for gene expression. Therefore, to allow for these proteins to act, the chromatin must constantly alter its structure. Which processes contribute to this dynamic activity? A) DNA supercoiling at or around H1 B) methylation and phosphorylation of histone tails C) hydrolysis of DNA molecules where they are wrapped around the nucleosome core D) accessibility of heterochromatin to phosphorylating enzymes E) nucleotide excision and reconstruction

Two potential devices that eukaryotic cells use to regulate transcription are A) DNA methylation and histone amplification. B) DNA amplification and histone methylation. C) DNA acetylation and methylation. D) DNA methylation and histone modification. E) histone amplification and DNA acetylation.

During DNA replication, A) all methylation of the DNA is lost at the first round of replication. B) DNA polymerase is blocked by methyl groups, and methylated regions of the genome are therefore left uncopied. C) methylation of the DNA is maintained because methylation enzymes act at DNA sites where one strand is already methylated and thus correctly methylates daughter strands after replication. D) methylation of the DNA is maintained because DNA polymerase directly incorporates methylated nucleotides into the new strand opposite any methylated nucleotides in the template. E) methylated DNA is copied in the cytoplasm, and unmethylated DNA is copied in the nucleus.

In eukaryotes, general transcription factors A) are required for the expression of specific protein-encoding genes. B) bind to other proteins or to a sequence element within the promoter called the TATA box. C) inhibit RNA polymerase binding to the promoter and begin transcribing. D) usually lead to a high level of transcription even without additional specific transcription factors. E) bind to sequences just after the start site of transcription.

Steroid hormones produce their effects in cells by A) activating key enzymes in metabolic pathways. B) activating translation of certain mRNAs. C) promoting the degradation of specific mRNAs. D) binding to intracellular receptors and promoting transcription of specific genes. E) promoting the formation of looped domains in certain regions of DNA.

Transcription factors in eukaryotes usually have DNA binding domains as well as other domains that are also specific for binding. In general, which of the following would you expect many of them to be able to bind? A) repressors B) ATP C) protein-based hormones D) other transcription factors E) tRNA

Gene expression might be altered at the level of post-transcriptional processing in eukaryotes rather than prokaryotes because of which of the following? A) Eukaryotic mRNAs get 5' caps and 3' tails. B) Prokaryotic genes are expressed as mRNA, which is more stable in the cell. C) Eukaryotic exons may be spliced in alternative patterns. D) Prokaryotes use ribosomes of different structure and size. E) Eukaryotic coded polypeptides often require cleaving of signal sequences before localization.

Which of the following experimental procedures is most likely to hasten mRNA degradation in a eukaryotic cell? A) enzymatic shortening of the poly-A tail B) removal of the 5' cap C) methylation of C nucleotides D) methylation of histones E) removal of one or more exons

Which of the following is most likely to have a small protein called ubiquitin attached to it? A) a cyclin that usually acts in G1, now that the cell is in G2 B) a cell surface protein that requires transport from the ER C) an mRNA that is leaving the nucleus to be translated D) a regulatory protein that requires sugar residues to be attached E) an mRNA produced by an egg cell that will be retained until after fertilization

In prophase I of meiosis in female Drosophila, studies have shown that there is phosphorylation of an amino acid in the tails of histones of gametes. A mutation in flies that interferes with this process results in sterility. Which of the following is the most likely hypothesis? A) These oocytes have no histones. B) Any mutation during oogenesis results in sterility. C) All proteins in the cell must be phosphorylated. D) Histone tail phosphorylation prohibits chromosome condensation. E) Histone tails must be removed from the rest of the histones.

The phenomenon in which RNA molecules in a cell are destroyed if they have a sequence complementary to an introduced double-stranded RNA is called A) RNA interference. B) RNA obstruction. C) RNA blocking. D) RNA targeting. E) RNA disposal.

At the beginning of this century there was a general announcement regarding the sequencing of the human genome and the genomes of many other multicellular eukaryotes. There was surprise expressed by many that the number of protein-coding sequences was much smaller than they had expected. Which of the following could account for most of the rest? A) "junk" DNA that serves no possible purpose B) rRNA and tRNA coding sequences C) DNA that is translated directly without being transcribed D) non-protein-coding DNA that is transcribed into several kinds of small RNAs with biological function E) non-protein-coding DNA that is transcribed into several kinds of small RNAs without biological function

Among the newly discovered small noncoding RNAs, one type reestablishes methylation patterns during gamete formation and blocks expression of some transposons. These are known as A) miRNA. B) piRNA. C) snRNA. D) siRNA. E) RNAi.

Which of the following best describes siRNA? A) a short double-stranded RNA, one of whose strands can complement and inactivate a sequence of mRNA B) a single-stranded RNA that can, where it has internal complementary base pairs, fold into cloverleaf patterns C) a double-stranded RNA that is formed by cleavage of hairpin loops in a larger precursor D) a portion of rRNA that allows it to bind to several ribosomal proteins in forming large or small subunits E) a molecule, known as Dicer, that can degrade other mRNA sequences

One way scientists hope to use the recent knowledge gained about noncoding RNAs lies with the possibilities for their use in medicine. Of the following scenarios for future research, which would you expect to gain most from RNAs? A) exploring a way to turn on the expression of pseudogenes B) targeting siRNAs to disable the expression of an allele associated with autosomal recessive disease C) targeting siRNAs to disable the expression of an allele associated with autosomal dominant disease D) creating knock-out organisms that can be useful for pharmaceutical drug design E) looking for a way to prevent viral DNA from causing infection in humans

Which of the following describes the function of an enzyme known as Dicer? A) It degrades single-stranded DNA. B) It degrades single-stranded mRNA. C) It degrades mRNA with no poly-A tail. D) It trims small double-stranded RNAs into molecules that can block translation. E) It chops up single-stranded DNAs from infecting viruses.

In a series of experiments, the enzyme Dicer has been inactivated in cells from various vertebrates so that the centromere is abnormally formed from chromatin. Which of the following is most likely to occur? A) The usual mRNAs transcribed from centromeric DNA will be missing from the cells. B) Tetrads will no longer be able to form during meiosis I. C) Centromeres will be euchromatic rather than heterochromatic and the cells will soon die in culture. D) The cells will no longer be able to resist bacterial contamination. E) The DNA of the centromeres will no longer be able to replicate.

Since Watson and Crick described DNA in 1953, which of the following might best explain why the function of small RNAs is still being explained? A) As RNAs have evolved since that time, they have taken on new functions. B) Watson and Crick described DNA but did not predict any function for RNA. C) The functions of small RNAs could not be approached until the entire human genome was sequenced. D) Ethical considerations prevented scientists from exploring this material until recently. E) Changes in technology as well as our ability to determine how much of the DNA is expressed have now made this possible.

You are given an experimental problem involving control of a gene's expression in the embryo of a particular species. One of your first questions is whether the gene's expression is controlled at the level of transcription or translation. Which of the following might best give you an answer? A) You explore whether there has been alternative splicing by examining amino acid sequences of very similar proteins. B) You measure the quantity of the appropriate pre-mRNA in various cell types and find they are all the same. C) You assess the position and sequence of the promoter and enhancer for this gene. D) An analysis of amino acid production by the cell shows you that there is an increase at this stage of embryonic life. E) You use an antibiotic known to prevent translation.

In humans, the embryonic and fetal forms of hemoglobin have a higher affinity for oxygen than that of adults. This is due to A) nonidentical genes that produce different versions of globins during development. B) identical genes that generate many copies of the ribosomes needed for fetal globin production. C) pseudogenes, which interfere with gene expression in adults. D) the attachment of methyl groups to cytosine following birth, which changes the type of hemoglobin produced. E) histone proteins changing shape during embryonic development.

The fact that plants can be cloned from somatic cells demonstrates that A) differentiated cells retain all the genes of the zygote. B) genes are lost during differentiation. C) the differentiated state is normally very unstable. D) differentiated cells contain masked mRNA. E) differentiation does not occur in plants.


In our two-layer algorithm, the lower layer predicts the initial candidate uber-operons through identifying a set of linker genes using a single reference genome. The higher layer fuses all the uber-operon predictions provided by the lower layer against each set of reference genomes to give the final prediction. The purpose of using multiple reference genomes is to increase the prediction reliability by reducing accidental false prediction or missing linker genes, which may occur by using a single reference genome.

Data preparation

By selecting one complete genome in each genus, we have obtained 115 genomes from 224 complete bacterial genomes at the NCBI website (release of 03/05/2005). Operon prediction results for these genomes were downloaded from, denoted as VIMSS operons ( 7). We have also applied our in-house program, JPOP ( 2, 6), for operon prediction. The average operon size predicted by JPOP is slightly smaller than that of the VIMSS operons, although the two programs have similar prediction accuracy (F. Mao and Y. Xu, unpublished data). The VIMSS operons are used for our study, because their slightly larger operon size should in principle lead to lower false negative rate in linker gene identification. Since VIMSS has operon predictions for only 91 out of the 115 genomes (including Escherichia coli K12), we have removed the remaining 24 genomes from further consideration (see Supplementary Table S1).

Another dataset needed for our uber-operon prediction is the homologous genes in the reference genomes for each gene in our target genome. We have carried out a homologous gene mapping for each of the 91 genomes against the remaining 90 genomes, using BLAST search with an E-value cutoff at 10⁻³. Both the predicted operons and the homologous genes are provided at our Uber-Operon Database:
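As a minimal sketch of the E-value filtering step described above, the code below keeps only BLAST hits at or below the 10⁻³ cutoff. It assumes tab-separated BLAST output in the common 12-column tabular layout (query ID in column 1, subject ID in column 2, E-value in column 11); the paper does not say which output format was used, and the gene identifiers and the function name `filter_hits` are hypothetical.

```python
E_VALUE_CUTOFF = 1e-3  # cutoff quoted in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Map each query gene to its (subject gene, E-value) hits passing the cutoff.

    Assumes 12-column tab-separated BLAST tabular output; this layout is an
    assumption for illustration, not something taken from the paper.
    """
    hits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                           # skip blanks and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff:
            hits.setdefault(query, []).append((subject, evalue))
    return hits


# Hypothetical example rows: one strong hit, one hit that fails the cutoff.
example = [
    "geneA\trefX\t85.0\t300\t45\t0\t1\t300\t1\t300\t2e-50\t190",
    "geneA\trefY\t30.0\t80\t56\t2\t10\t89\t5\t84\t0.5\t25",
]
print(filter_hits(example))
```

Running such a filter for each target genome against every reference genome would give the candidate homolog lists that the later orthology step starts from.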

Uber-operon prediction against one reference genome

We first formulate the problem of uber-operon identification based on one reference genome, and then outline an algorithm for solving the problem. The main and fundamental difference between our algorithm and the algorithm of ( 14) is that we do not assume that the orthologous gene relationship is given; instead, the orthologous gene relationship is detected simultaneously with uber-operon prediction.

Consider a target genome G1 and a reference genome G2. We assume that each gene in G1 has at most one ortholog in G2, and vice versa. Intuitively, a uber-operon is modeled as a maximal group of transcriptionally or functionally related operons that are linked through linker genes and there is no overlap between any two uber-operons (unlike regulons). One challenging issue in identifying uber-operons is to accurately identify orthologous genes between two genomes. Our previous study has demonstrated that existing methods, such as BDBH ( 20), its variations ( 21) and COG ( 22) are not adequate for highly specific and accurate identification of orthologous genes at a large scale, since these algorithms all attempt to predict orthology based mainly on sequence similarity information, and sequence similarity information alone does not imply orthology ( 12). This problem has been partially overcome by a new strategy employed in our recent work on orthologous gene mapping by using both sequence similarity and genomic structure information ( 12, 23). The basic idea is as follows. If a pair of genes g1, g2 are in the same operon of G1 and their homologous genes g1′ and g2′ are also in the same operon in G2, then the probability for g1 and g1′ and g2 and g2′, respectively, to be orthologous is high ( 23). So our uber-operon identification algorithm is to find such mappings in the context of finding uber-operons, which maximizes the overall probability for all the mapped gene pairs to be orthologous.
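The basic idea stated above (homolog pairs that sit in the same operon in both genomes are more likely to be orthologs) can be sketched as a simple filter. This is only an illustration of that one observation, not the authors' algorithm, which optimizes the mapping jointly with uber-operon prediction; the function name `operon_supported_pairs` and all gene and operon identifiers are hypothetical.

```python
def operon_supported_pairs(homologs, operon_g1, operon_g2):
    """Return homolog pairs whose orthology is supported by operon structure.

    A pair (g1, g1') is supported if some other pair (g2, g2') exists with
    g1 and g2 in the same operon of G1 and g1' and g2' in the same operon of G2.
    """
    supported = set()
    for a, a_ref in homologs:
        for b, b_ref in homologs:
            if a == b or a_ref == b_ref:
                continue                       # need two distinct genes on both sides
            same_in_g1 = operon_g1[a] == operon_g1[b]
            same_in_g2 = operon_g2[a_ref] == operon_g2[b_ref]
            if same_in_g1 and same_in_g2:
                supported.add((a, a_ref))
                break
    return supported


# Hypothetical toy data: gene -> operon assignments and homolog pairs.
operon_g1 = {"a1": "op1", "a2": "op1", "a3": "op2"}
operon_g2 = {"x1": "opA", "x2": "opA", "y1": "opB", "z9": "opC"}
homologs = [("a1", "x1"), ("a2", "x2"), ("a3", "y1"), ("a1", "z9")]

print(sorted(operon_supported_pairs(homologs, operon_g1, operon_g2)))
```

In the toy data, a1 has two candidate homologs; only the one whose operon neighbour also has a homolog in the matching reference operon is retained, which is the kind of disambiguation that sequence similarity alone cannot provide.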

Formally, we define a bipartite graph B = (U, V, E) for genomes G1 and G2 as follows. Let


Computer analysis had been used for prediction of bacterial transcription signals for more than 15 years ( 10, 38–14) and on many occasions the results have served as the basis for further experimental work (e.g. 43). Co-evolution of regulons and regulators also was examined ( 45). However, to the best of our knowledge, this study is the first attempt to systematically characterize regulatory sites in two or more genomes by comparing the respective complete gene sets.

This comparative approach involves three main components: (i) prediction of transcription factor binding sites, (ii) delineation of orthologous relationships between genes by comparing their protein products and (iii) comparison and, when necessary, prediction of protein functions. The use of complete genomes facilitates the identification of orthologs and thus increases the reliability of inferences regarding identical or similar cellular roles of proteins. However, in spite of potential uncertainty in terms of orthology, identification of homologous genes in all bacterial species, including those whose genome sequences have not yet been completed, using similarity search in GenBank is a useful supplement to this analysis.

All sites considered in this paper are approximately palindromic. However, we used the sites in the orientation corresponding to the direction of transcription and did not symmetrize the profiles. There were two reasons for this. First, we were interested in designing a general procedure for site recognition, rather than one that is applicable to symmetrical sites only. Second, it is not guaranteed that even the dimeric factors bind their operators in a symmetric manner. This possibility has been raised in the case of TrpR based on the crystallographic data (46) and chemical modification of natural sites (47), and in the case of AraC based on mutational analysis (48). The Lrp binding signal derived from the SELEX data is not symmetrical either (49).
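As an illustration of profile-based site recognition in a fixed orientation, the following sketch builds a simple log-odds profile from aligned sites and scans a sequence with it. The scoring scheme (uniform background, pseudocount of 1) and all names are assumptions chosen for brevity, not the procedure used in the paper; the toy sites are invented. The profile is used as built, without averaging it with its reverse complement, which is what not symmetrizing amounts to here.

```python
import math


def build_profile(sites, pseudocount=1.0):
    """Build a log-odds profile from equal-length aligned sites.

    Assumes a uniform background (0.25 per base); each column maps
    a base to log2(observed frequency / background frequency).
    """
    length = len(sites[0])
    profile = []
    for i in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        profile.append({b: math.log2((counts[b] / total) / 0.25) for b in "ACGT"})
    return profile


def score(profile, candidate):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(profile, candidate))


def best_hit(profile, sequence):
    """Return (score, start) of the best-scoring window on the given strand."""
    width = len(profile)
    windows = range(len(sequence) - width + 1)
    return max((score(profile, sequence[i:i + width]), i) for i in windows)


# Toy training set, given in the orientation of transcription.
training_sites = ["ACGT", "ACGA", "ACGT"]
profile = build_profile(training_sites)
top_score, top_start = best_hit(profile, "TTACGTTT")
```

Because only the given strand is scanned with the unsymmetrized profile, a site that fits well in one half and poorly in the other keeps its asymmetric score rather than being averaged out.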

The comparative analysis of the E.coli and H.influenzae genomes revealed three principal types of differences between operons that are subject to the same mode of regulation. The differences of the first type are limited to the presence or absence of individual genes in otherwise conserved operons. The examples in H.influenzae are operons ycfCpurB (purB in E.coli, Fig. 3b), argH (argCBH in E.coli, Fig. 3c), ydfGtrpBA (trpBA in P.multocida, Fig. 3e) and tyrA (aroFtyrA in E.coli, Fig. 3g).

The second type of change involves breaking of an operon into two parts, both of which retain the regulation. Two E.coli operons, purHD and glyA, both regulated by PurR, correspond, in H.influenzae, to the gene string HI0887-HI0889 with a PUR box upstream of HI0887 (Fig. 3a). Similarly, the tryptophan operon is broken in H.influenzae into two parts, trpEDC and trpBA, both of which have strong TRP boxes in the regulatory regions.

Finally, some operons lose or switch regulation. The most interesting case in this category is the elimination of purR autoregulation in H.influenzae. The loss of ‘regulation of regulators’ appears to be a more general phenomenon: in E.coli, the repressor IlvY regulates both its own gene ilvY and the adjacent ilvC gene, which are transcribed from divergent promoters. By contrast, in H.influenzae, although the overall location of these genes is the same, the distance between them is much larger, and a candidate binding site is close to ilvC, but too distant from ilvY to expect autoregulation (M.Gelfand, unpublished observation). The elimination of this higher level of regulation may be linked to the evolution of the parasitic lifestyle of H.influenzae, which requires much less versatility in the response of the bacterium to environmental changes than that of its free-living relatives, such as E.coli. Another clear case of simplification in regulation is the loss of the TYR box by the H.influenzae mtr operon, which in E.coli is regulated by both TrpR and TyrR. The roadblock mechanism of repression of purB by PurR in E.coli is not conserved in H.influenzae, although the repression itself seems to exist. Finally, it is possible that the gene aroG of H.influenzae has switched its regulation from TyrR to TrpR.

The conservation of a regulatory DNA-binding protein in an uncharacterized bacterial genome seems to be a reliable predictor of the conservation of the binding sites in at least some operons, even if most of the regulon is missing. For example, there are only three known genes in the arginine regulon of H.influenzae, including the repressor ArgR itself (but not counting the transport proteins predicted to belong to the arginine regulon in this work), but the ARG boxes are conserved. The E.coli ARG box recognition matrix seems capable of detecting the relevant signals even in the distantly related Bacillus subtilis genome, which also encodes an ortholog of ArgR (A.A.Mironov and M.S.Gelfand, unpublished observations). Conversely, there are no strong PUR boxes in the Helicobacter pylori genome, which does not encode a PurR ortholog. Similarly, although there is a purine repressor in B.subtilis, it is unrelated to the E.coli PurR, and indeed, the type of regulation (mostly by attenuation) and regulatory sites (in a few genes regulated at the transcription level) of the B.subtilis purine regulon differ from those of E.coli. The P.aeruginosa operon trpBA is regulated by the repressor TrpI, which is unrelated to TrpR of E.coli and H.influenzae, and predictably, there are no TRP boxes in the region upstream of this operon.

This study allowed us to make several predictions that appear to be readily testable by experiment. One group of such predictions includes inferences about changes in regulation patterns, namely the loss of autoregulation in the H.influenzae ortholog of PurR, a different mode of repression of purB, and the apparent change in the regulation of aroG. The second group of predictions extends the purine and arginine regulons both in E.coli and H.influenzae by inclusion of transport proteins (purine and arginine transporters). It is somewhat surprising that these transport systems, especially the large family of H+/purine symporters, have not been identified as part of the purine regulon by genetic analysis. A possible explanation is that all genes from this family that are predicted to be under PurR regulation have close non-regulated paralogs, and thus the effect of mutations in the regulated genes might be manifest only under very specific conditions.

Further research directions will include analysis of global regulatory systems, such as SOS, CRP, Fur and Fnr regulons, and multiple interacting systems, for example the interaction between purine and pyrimidine regulation or the interaction between the regulation by repression and by attenuation in the aromatic amino acid regulon, as well as comparisons between more distant genomes, such as E.coli and B.subtilis. As a more distant goal, we envisage development of techniques for systematic characterization of regulatory pathways in newly sequenced genomes.
