An update of the unceasingly growing and diverse AraC/XylS family of transcriptional activators

Corresponding author: Laboratorio de Genética Microbiana, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Prol. De Carpio y Plan de Ayala S/N Col. Santo Tomás Alc. Miguel Hidalgo CP 11340, Ciudad de México, México. Tel: 52+5557296000 ext. 62482; E-mail: jaig19@gmail.com, jibarrag@ipn.mx


These authors contributed equally to this work.

FEMS Microbiology Reviews, Volume 45, Issue 5, September 2021, fuab020, https://doi.org/10.1093/femsre/fuab020

10 April 2021 30 October 2020 31 March 2021 10 April 2021


Daniel Cortés-Avalos, Noemy Martínez-Pérez, Mario A Ortiz-Moncada, Aylin Juárez-González, Arturo A Baños-Vargas, Paulina Estrada-de los Santos, Ernesto Pérez-Rueda, J Antonio Ibarra, An update of the unceasingly growing and diverse AraC/XylS family of transcriptional activators, FEMS Microbiology Reviews, Volume 45, Issue 5, September 2021, fuab020, https://doi.org/10.1093/femsre/fuab020


ABSTRACT

Transcriptional factors play an important role in gene regulation in all organisms, especially in Bacteria. Here, special emphasis is placed on the AraC/XylS family of transcriptional regulators. This is one of the most abundant families: many predicted members have been identified, and more are added as more bacterial genomes are sequenced. Given how much experimental evidence has accumulated in the past decades, we decided to update the information about this captivating family of proteins. Using bioinformatics tools on all the data available for experimentally characterized members of this family, we found that many members that display a similar functional classification can be clustered together and in some cases they have a similar regulatory scheme. A proposal for grouping these proteins is also discussed. Additionally, an analysis of surveyed proteins in bacterial genomes is presented. Altogether, the current review presents a panoramic view of this family and we hope it helps to stimulate future research in the field.

INTRODUCTION

Transcriptional regulation is one of the definite regulatory steps in gene expression in all domains of life (Browning and Busby 2016; Mejía-Almonte et al. 2020). In general, it is carried out by a class of specific proteins called transcriptional factors (TFs) that usually bind to DNA and exert their activity by either promoting (activators) or disfavoring (repressors) the transcription initiation of a gene. Some TFs have both activities (dual regulators) and activate or repress depending on environmental cues. Activators can also be classified as “positive” activators or “derepressors” (also called “antisilencers”); the former are TFs that promote the binding of the RNA polymerase (RNAP), while the latter are TFs that remove the action of a repressor (Browning and Busby 2016; Busby 2019). Depending on their amino acid sequence, TFs in bacteria have been classified into at least 19 families, and their abundance depends on the phylum in most instances (Perez-Rueda et al. 2018; Flores-Bautista et al. 2020). Among the families with the highest number of members are LysR, TetR/AcrR and AraC/XylS; together these account for 30% of all TFs identified in bacteria. The focus of this review is the AraC/XylS family of transcriptional regulators. This family was originally described in 1990 by Henikoff and colleagues, who applied early bioinformatics tools to the protein sequences available at that time (Henikoff, Wallace and Brown 1990; Henikoff and Henikoff 1992). Later, in a seminal 1997 review led by Juan Ramos and Robert Schleif, the family was expanded and reviewed thoroughly using the available data (Gallegos et al. 1997). Back then, they gathered and analyzed close to 100 TFs, reviewed the state of the art and defined the identity of this family.
Since then, the combination of more experimental evidence, the advent of high-throughput sequencing, the increase in available sequenced genomes and access to more sophisticated bioinformatics tools has led to the discovery of many more members of this family. Here we intend to update the knowledge regarding this growing family of TFs, discuss its classification and structure and propose a method to predict the possible function of new members.

A family trait, the DNA binding domain

The AraC/XylS family is distinguished by a segment of 99 amino acids that contains two helix-turn-helix (HTH) motifs separated by an alpha (α)-helix (Gallegos et al. 1997; Egan 2002; Ibarra et al. 2008). The HTH motifs make specific contacts with a DNA sequence and are contained in the “DNA binding domain” or DBD (Fig. 1). These interactions have been shown experimentally both indirectly, by using transcriptional fusions and mutants, and directly, by using protein–DNA interaction assays (such as electrophoretic mobility shift assays, ChIP-on-chip, co-immunoprecipitation, footprinting and atomic force microscopy) and crystallography. Previous findings showed that the AraC/XylS-like proteins are insoluble at high concentrations and are not easily purified (Egan and Schleif 1994; Fawcett and Wolf 1994; Domínguez-Cuevas et al. 2008), though when individual domains from some of these proteins were purified separately, they were more soluble than the whole protein (Yu and DiRita 2002; Timmes, Rodgers and Schleif 2004; Rodgers and Schleif 2009; Schleif 2010). Thus, only a handful of proteins have been subjected to detailed structural studies, which have confirmed the predictions that the DBD contains seven α-helices (α1–α7; Fig. 1A). These 3D structures have shown that HTH1 includes α2 and α3 and HTH2 includes α5 and α6, while α4 links both HTH motifs and α1 and α7 flank this domain (Rhee et al. 1998; Kwon et al. 2000; Rodgers and Schleif 2009; Lowden et al. 2010; Ni et al. 2013; Yao et al. 2013; Zhao et al. 2016). As such, each HTH could be considered a subdomain; the two subdomains are connected by α4 and have only a few contacts between them, which allows rotation of one HTH with respect to the other (Rodgers and Schleif 2009). This mobility would permit the protein either to bind to curved DNA or to bend the DNA once bound. In some promoters a bending of approximately 35° has been shown, but this is not always the case (Dangi et al. 2004). In this regard, α6 possibly undergoes a conformational change that allows binding to a major groove on the DNA (Rodgers and Schleif 2009; Lowden et al. 2010). A recent study on CdpR from Pseudomonas aeruginosa suggests that HTH1 and HTH2 interact with one another, mainly through amino acid residues that are not conserved in the rest of the family (Zhao et al. 2016). In summary, the characteristic DBD of the AraC-like proteins has a conserved structure formed by seven α-helices, but its internal interactions might differ from protein to protein, and this difference might dictate the interactions with the target DNA.

The distinctive AraC/XylS family DNA binding domain. Structure conformation of MarA (P0ACH5) is shown bound to DNA in (A) as an example of the DNA binding domain (DBD) in AraC-like proteins. The seven alpha (α) helices characteristic of the AraC/XylS family DBD are indicated. This image was generated with information for the MarA structure and the PDBe-KB tool (https://www.ebi.ac.uk/pdbe/pdbe-kb/proteins/P0ACH5). In (B) a simplified graphic of the possible configuration of the DBD (shown in red and the HTH motifs as white boxes) and the companion domain (CD) (shown in blue), which could be located either at the N-terminus or at the C-terminus. As mentioned in the text, some AraC-like proteins contain more than one CD and in some cases the DBD is between two or more CDs.


With the help of bioinformatics, the DBD has been used to identify potential new members of this family. In order to determine how many proteins belong to this family, a search for proteins exhibiting homology with the AraC/XylS family was performed in 5321 prokaryotic genomes as previously described (Perez-Rueda et al. 2018). AraC/XylS-like proteins were found in 80.1% of the genomes (n = 4263; Supplementary file 1). In order to study the probable AraC/XylS-like proteins in more detail, a non-redundant database of genomes was analyzed. In this dataset, repeated genomes were removed to avoid overrepresentation, so that only a few representative genomes from each bacterial group were left (the complete list of bacterial and archaeal genomes is available in Supplementary file 2; Moreno-Hagelsieb et al. 2013; Perez-Rueda et al. 2018). In summary, this dataset was composed of 1245 bacterial genomes and 106 archaeal genomes. With this non-redundant dataset, the percentage of bacteria containing at least one member of the AraC/XylS family (80.6%) was similar to that observed with all the prokaryotic genomes. Continuing with this initial analysis, proteins from this family were found in only three out of 106 archaeal genomes (2.83%), and no AraC-like proteins were found in Eukaryotes, showing that this family is almost restricted to Bacteria. Further analyses of the predicted proteins showed that the phyla with more predicted AraC-like proteins are the Bacteroidetes, Proteobacteria and Spirochaetes (Fig. 2A); although only a few Firmicutes exhibit any predicted proteins, some organisms in this phylum encode up to 200 AraC-like proteins. In comparison, most of the Proteobacteria have between 20 and 30 members. 
This means that there are more AraC/XylS-type proteins in Proteobacteria than in Firmicutes, but the few organisms in the latter phylum that encode AraC-like proteins have up to 10 times more of them than the average bacterium in the Proteobacteria phylum. Among the total of 15 935 proteins in this dataset, the most common lengths range from 250 to 351 amino acid residues, with a mode of 289 residues (Fig. 2B). Lengths ranged from 46 to 1511 amino acid residues. This range suggests that, by using the arbitrary limit of 90 amino acids for the signature DBD in the AraC/XylS family (described above as a segment of 99 residues), there are 73 proteins that would be considered too small to contain the seven characteristic α-helices. However, both HTH motifs were found in these small proteins, representing an interesting variation and a potential matter of study. Thus, a series of experiments should be carried out to define whether they are actually translated, their role in regulating transcription and their ability to bind DNA.
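The proportions quoted in this paragraph follow directly from the reported counts, and the small-protein observation amounts to a length filter. The following is a minimal sketch of both steps; the function names (`percent`, `flag_short`) are illustrative, and the toy lengths are invented for the example rather than taken from the supplementary files.

```python
def percent(part: int, total: int, digits: int = 1) -> float:
    """Return part/total as a percentage rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, digits)


def flag_short(lengths, min_dbd_len=90):
    """Return the protein lengths that fall below the DBD length cutoff."""
    return [n for n in lengths if n < min_dbd_len]


# Counts reported in the text.
genomes_with_family = percent(4263, 5321)   # all prokaryotic genomes
archaea_with_family = percent(3, 106, 2)    # archaea in the non-redundant set

# Toy lengths (hypothetical), spanning the reported 46-1511 residue range.
toy_lengths = [46, 88, 127, 289, 289, 301, 1511]
short = flag_short(toy_lengths)
```

With the reported counts, `genomes_with_family` and `archaea_with_family` reproduce the 80.1% and 2.83% given above; the cutoff of 90 residues is the arbitrary limit mentioned in the text and can be changed through `min_dbd_len`.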


Survey of AraC/XylS proteins in a non-redundant genomes database. The prediction of AraC/XylS-like proteins was done as described previously (Flores-Bautista et al. 2020). In panel (A) the distribution of AraC/XylS proteins by phylum was plotted using data in Supplementary file 2. The x-axis shows the phyla to which the bacterial genomes studied belong (each blue circle represents a genome); the y-axis shows the total number of AraC/XylS proteins. Thick lines in the middle of each box represent the median and whisker caps represent the minimum and maximum values. Points outside the bars represent the outlier genomes. In (B) the lengths of all the predicted regulatory proteins were analyzed to determine their variability (data are deposited in Supplementary file 2). The x-axis designates the length from N-terminus to C-terminus as a percentage of the size of the protein; the y-axis indicates the proportion of proteins. In (C) the histogram shows the predicted location of the DNA-binding domain containing the AraC-motif (blue outlined bars) and the predicted location of the companion domain(s) (red outlined bars) in all the predicted proteins. For this analysis, the center of each DBD was calculated and assigned to a window with respect to the normalized length of each protein. This normalization is shown in sheet 2 of Supplementary file 2. The proteins with the domain centered within each 10% window were then counted and plotted as shown. On the x-axis, “0” is the N-terminus and “100%” is the C-terminus of each protein. The DBD in most of the family members (about 70%) is located in the 80–90% window of the analyzed sequences (i.e. towards the C-termini, blue bars); the additional domains or CDs are preferentially on the left of the graph, meaning that they are located towards the N-termini in those proteins. The graphics in the upper section of panel C illustrate the predicted location of the DBD and CD domains.
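The normalization used for panel (C) can be sketched as follows: the center of a domain is expressed as a percentage of the protein length and assigned to a 10% window, and the windows are then counted. This is only an illustration under stated assumptions; the function names, the 1-based inclusive coordinates and the toy domain coordinates are hypothetical and are not taken from Supplementary file 2.

```python
from collections import Counter


def window_of_center(start: int, end: int, protein_len: int, width: int = 10) -> int:
    """Return the lower bound (in %) of the window that holds the domain center.

    `start` and `end` are assumed to be 1-based, inclusive residue positions.
    """
    if not (1 <= start <= end <= protein_len):
        raise ValueError("domain coordinates must lie within the protein")
    center_pct = 100.0 * ((start + end) / 2.0) / protein_len
    # A center at exactly 100% is placed in the last window.
    return min(int(center_pct // width) * width, 100 - width)


def histogram(domains, width: int = 10) -> Counter:
    """Count domains per window; `domains` holds (start, end, protein_len) tuples."""
    return Counter(window_of_center(s, e, n, width) for s, e, n in domains)


# Toy coordinates (hypothetical): two C-terminal DBDs and two central domains.
toy = [(190, 288, 289), (200, 298, 300), (5, 103, 110), (150, 250, 500)]
counts = histogram(toy)
```

In this toy set the first two domains fall in the window starting at 80% and the last two in the window starting at 40%, which mirrors how a C-terminal DBD would accumulate in the right-hand bars of the histogram.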

Finally, an analysis of the DBDs of a well-characterized cohort of AraC-like TFs (Table S1, Supporting Information) led us to suggest an update of the consensus sequence for this family as follows: (I/L)A-----SL-R-FG-----YI---R---A---L--S--SI-GFSS---FF(R/K)---GP--Y [amino acid residues in bold were originally suggested by Gallegos et al. (1997)]. In this sequence nine new conserved residues were also identified, extending the previous consensus sequence. In summary, the essential signature of the AraC/XylS family is the DBD, which contains two HTH motifs within seven α-helices and seems to be structurally conserved, while its position within the protein architecture varies among the family members.

The “companion” domain

Regarding their conformation, some of the AraC/XylS family members are formed almost exclusively by the DBD, as in Escherichia coli SoxS and MarA (Fig. 1B). In contrast, most of the AraC/XylS-like proteins have at least one other domain in addition to the DBD. Previously, we have called it the “companion” domain (CD; Perez-Rueda et al. 2018) or partner domain (PaDo; Rivera-Gómez et al. 2017) while in many other publications it has been called “effector domain”, “regulatory domain” or even “dimerization domain” (see below; Gallegos et al. 1997; Tobes and Ramos 2002; Schüller et al. 2012). This CD could be located either in the carboxyl terminus (CTD) or in the amino terminus (NTD) of the protein (Fig. 1B). In others the DBD could be located in the middle of the protein, as in the case of the methyl-repairing protein Ada in E. coli that is formed by two domains. In this protein the NTD has a methylphosphotriester-DNA-protein-cysteine methyltransferase activity and contains the DBD, and the CTD has a methylated-DNA-protein-cysteine methyltransferase activity (Sedgwick et al. 1988). Thus, Ada is formed by the DBD and two CDs, one at the NTD and one at the CTD.

As for the roles of the CDs, for many well-characterized proteins it has been shown that they sense distinctive signaling molecules through these domains. Regarding the TFs that give the family its name, XylS is a 321-amino-acid transcriptional activator in Pseudomonas putida that recognizes aromatic compounds (toluene, m-xylene and p-xylene) through the CD located in the NTD. When these compounds are bound, the protein binds to the Pm promoter as a dimer, inducing the transcription of the xylXYZLTEGFJQKIH operon for the degradation of these molecules (Ramos et al. 1990; Gallegos et al. 1997). AraC from E. coli, in turn, is one of the best-characterized members of this family; it has a dual activity as a repressor and activator depending on the presence of arabinose, the inducer molecule (reviewed in Schleif 2010). Interactions have been shown between the CD and the DBD for AraC, and changes in these interactions are caused by arabinose, which likely controls the rigidity of the inter-domain linker (see below) (Frato and Schleif 2009; Cole and Schleif 2012; Brown and Schleif 2019). An analysis of the AraC-like proteins found in the non-redundant dataset showed that 80% of all the proteins composed of two or more domains have the DBD in the C-terminal region, whereas 15% of them have the CD in the N-terminal region (Supplementary File 2; Fig. 2C).

Other members of the family that also use the CD as a sensor domain are: MelR from E. coli, with a dual action similar to that of AraC but detecting melibiose (Kahramanoglou et al. 2006); RhaS and RhaR in E. coli, which both detect rhamnose (Egan and Schleif 1993; Kolin et al. 2008); XylR in E. coli, which detects D-xylose (Ni et al. 2013); Vibrio cholerae ToxT, which detects bile salts, bicarbonate ions and fatty acids (Chatterjee, Dutta and Chowdhury 2007; Abuaita and Withey 2009; Lowden et al. 2010; Li et al. 2016); HilD, Rns and VirF, which also detect fatty acids (Day et al. 2014; Golubeva et al. 2016); RegA in Citrobacter rodentium, which also detects bicarbonate ions (Yang et al. 2008); Streptomyces scabies TxtR, which senses cellobiose (Joshi et al. 2007); UreR in Proteus mirabilis, which detects urea (Gendlina et al. 2002; Yang, Tauschek and Robins-Browne 2011); Bacillus subtilis Btr, which senses the siderophore bacillibactin (Gaballa and Helmann 2007); and AxyR in Paenibacillus sp., which detects xylo-oligosaccharides as a cofactor (Fukuda et al. 2012).

In addition to sensing signal molecules, the CD may undergo irreversible changes that alter the specificity of the transcriptional factor. Such is the case of Ada in E. coli, which automethylates (meAda) by removing methyl groups from alkylated bases in the DNA. meAda has a higher affinity than Ada for its regulatory regions, where it is able to favor transcription by recruiting the RNA polymerase (RNAP; Landini and Volker 2000; Sedgwick and Lindahl 2002). These findings have led to the proposal that the CDs in the AraC/XylS family have a role as sensors of environmental signals, and the terms “sensor domain” or “response domain” have been used in many reports when describing the CDs (Housseini B Issa, Phan and Broutin 2018). However, it is important to note that for the majority of members it is not known whether the CD has the ability to detect molecules or not. It is more likely that those AraC/XylS members involved in the regulation of the expression of metabolic routes detect one or more molecules, and some TFs involved in the expression of virulence genes also detect some signals (see below for the functional classification of the AraC/XylS factors).

Another feature that has been assigned to the CD in this family is the ability to form dimers, as in the case of AraC, XylS and MelR, which dimerize through their CDs (Gallegos et al. 1997; Tobes and Ramos 2002; Ibarra et al. 2008; Schüller et al. 2012). Thus, the perception that dimerization is a common feature of this domain in all the family members is also widespread. Again, this is a misconception, as in order to be certain that these proteins actually form dimers it is important to experimentally demonstrate the protein-protein interactions for each member of the family. For instance, some members that have a CD are not able to form dimers (Ibarra, Villalba and Puente 2003; Romero-González et al. 2020) emphasizing the fact that it is important to avoid generalizations and to experimentally demonstrate this ability for each new AraC-like protein.

An analysis of the CDs in the dataset of AraC/XylS proteins detected in all the sequenced genomes showed that 34.6% (n = 26 540) of the proteins are monodomain (i.e. formed only by the DBD), while 65.4% (n = 50 183) of the proteins are multidomain, with 1–15 CDs. Two-domain proteins, formed by the DBD and one CD, account for 60.4% of the proteins, showing that this is the most common architecture for the members of the family. Proteins with 3 or 4 domains account for 2.5 and 1.8%, respectively, and those with 5–15 CDs are less common (0.001–0.2%; Supplementary File 1). According to their PFAM assignments, the CDs belong to 94 different PFAM families, the most frequent being the arabinose-binding and dimerization domain of the bacterial gene regulatory protein AraC (PF02311), found in 25.2% of the proteins. This suggests that the proteins with this CD might be able to detect a carbohydrate molecule and to dimerize, though this needs to be tested experimentally. The next most recurrent family is PF01965, which is the DJ-1/PfpI family present in proteases and in some transcriptional regulators such as AdpA from Streptomyces griseus (Yamazaki et al. 2004). A list of the PFAM families found in the complete dataset is available in Supplementary file 3. Other families found in the AraC/XylS proteins include: response regulator receiver (PF00072), GyrI-like small molecule binding domain (PF06445), cupin domain (PF12852, PF07883 and PF08007) and tetratricopeptide repeat (PF13432, PF14559 and PF13374). These results show the wide diversity of CDs in the AraC/XylS proteins, as we have previously shown for a smaller dataset (Perez-Rueda et al. 2018). Indeed, some AraC-like proteins are a mixture of regulatory families; that is, some of these TFs have a CD with significant similarity to the histidine kinase domain of the two-component system response regulators family (Lange et al. 1999; Majewski et al. 2020). Another example is XylR from E. coli, whose structure showed that its CD shares a similarity with the LacI/GalR ligand binding domain (Ni et al. 2013). As might be expected, there are more examples, and this only illustrates how these domains behave like modules (Rivera-Gómez et al. 2017). Hence, it is not strange to find such combinations in multiple TF families, especially AraC/XylS, which is among the families with a higher variability than others (Rivera-Gómez et al. 2017).
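The domain-architecture tally described in this paragraph can be sketched as a count of non-DBD PFAM hits per protein. This is a minimal sketch, not the procedure used for the supplementary files: the accessions used here to mark the AraC-type DBD (PF00165 and PF12833) are an assumption that is not stated in the text, and the toy proteins are hypothetical.

```python
from collections import Counter

# Assumption: PFAM accessions taken here to mark the AraC-type DBD.
DBD_ACCESSIONS = frozenset({"PF00165", "PF12833"})


def count_companions(pfam_hits) -> int:
    """Number of domains in a protein that are not the DBD."""
    return sum(1 for acc in pfam_hits if acc not in DBD_ACCESSIONS)


def architecture_table(proteins) -> Counter:
    """Map number of companion domains -> number of proteins."""
    return Counter(count_companions(hits) for hits in proteins.values())


# Toy proteins (hypothetical identifiers), ordered N- to C-terminus.
toy = {
    "protA": ["PF12833"],                        # monodomain
    "protB": ["PF02311", "PF12833"],             # AraC binding domain + DBD
    "protC": ["PF01965", "PF12833"],             # DJ-1/PfpI + DBD
    "protD": ["PF00072", "PF06445", "PF12833"],  # two companions + DBD
}
table = architecture_table(toy)

# Cross-check of the monodomain share from the counts reported in the text.
mono, multi = 26540, 50183
mono_pct = round(100 * mono / (mono + multi), 1)
```

Here `table` records one monodomain protein, two proteins with one CD and one protein with two CDs, and `mono_pct` reproduces the 34.6% reported for monodomain proteins.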

Lastly, another feature of the AraC/XylS proteins with CDs is a proteinaceous region that tethers the DBD and the CD together, the inter-domain linker. This is usually a predicted non-structured region that connects the DBD and the CD (Eustance and Schleif 1996; Gallegos et al. 1997; Brown and Schleif 2019). Although this region tolerates some changes in a few members of the family such as AraC, RhaR, RhaS, Rns and VirF (Eustance and Schleif 1996; Porter and Dorman 2002; Kolin et al. 2007; Mahon, Smyth and Smith 2010), recent studies in AraC have shown that for this protein the linker is relevant not only for connecting both domains but also for their interaction, which changes the ability of this protein to interact with its regulatory region, mainly in repressing conditions (Seedorff and Schleif 2011). Arabinose shifts the mobility of the linker by changing its conformation from an α-helix to a non-helical state (Malaga et al. 2016; Brown and Schleif 2019). While studies with TFs from other families have shown that point mutations in the linker region affect the DNA binding ability (Ekka et al. 2020), more studies in other members of the AraC family are needed to learn more about this previously unappreciated region.

In summary, the DBD has a conserved role in all the members of the AraC/XylS family and is the distinctive attribute of these proteins. The majority of the proteins have a CD that in many members of the family has a role in environment sensing, dimerization or both; however, these characteristics cannot be attributed to all members, and such activities should be experimentally defined for each particular TF.

Regulatory roles

Like other transcriptional regulators, the AraC/XylS family proteins can be classified as activators, repressors or both (see above). In order to learn more about the regulatory roles of members of the family, multiple experimentally characterized proteins were gathered and analyzed (Table S1, Supporting Information). This analysis shows that most of the members in the family are activators (110 out of 126, 87.3%), meaning that they act positively on the genes they regulate (Fig. 3). Here a sub-classification takes place: as mentioned above, the activators can be divided into either (a) anti-repressors (also called derepressors or antisilencers) or (b) positive activators (Browning and Busby 2016), sometimes also referred to as “classical activators” (Haugen, Ross and Gourse 2008). In the anti-repressor proteins, the transcriptional factor removes a repressor to allow the RNAP access to the promoter and initiate transcription (Fig. 3B). For instance, HilD in Salmonella enterica removes the global regulator H-NS from the regulatory region of the cognate genes and allows access to the RNAP (Martínez et al. 2014). A similar scenario has been described for VirF from Shigella (reviewed in Di Martino et al. 2016). In cases like this, the removal of the repressor by mutation or the use of dominant negative variants has shown that the regulated genes are expressed even in the absence of the transcriptional factor.


Transcriptional activation by AraC/XylS proteins. Transcriptional activation by AraC-like regulators can be by either positive activation (A) or by anti-repression (B). In both scenarios the TF might act as either a monomer or a dimer, though in (B) only the monomer is shown. When acting as a positive activator it is possible for it to contact the RNA polymerase (RNAP). The anti-repression occurs when the TF removes the action of a repressor, allowing the entrance of the RNAP.

In the case of positive activators, proteins can be classified in two classes: class I, when a TF is needed to recruit the RNAP; and class II, when the promoter region is modified by the TF to allow for the binding of the RNAP (Browning and Busby 2016). In either case there are contacts with some of the subunits of the RNAP (Fig. 3A), usually with the α or the σ subunit and in some cases with both. AraC/XylS family members that have been shown to contact the RNAP include XylS (Ruiz and Ramos 2001), SoxS (Griffith and Wolf 2002; Shah and Wolf 2004; Zafar, Shah and Wolf 2010), MelR (Grainger et al. 2004), MarA (Gillette, Martin and Rosner 2000; Dangi et al. 2004), Rob (Jair et al. 1996; Taliaferro et al. 2012), RhaR and RhaS (Wickstrum and Egan 2004) and ExsA (Vakulskas, Brutinel and Yahr 2010). Results with MarA suggest possible amino acid residues for these contacts in other members of the family (Dangi et al. 2004), but more evidence from other proteins is needed to corroborate this hypothesis.

As for this interaction with the RNAP, there are two proposed models: the recruitment and the pre-recruitment models. In the first, a TF interacts first with the DNA and then recruits the RNAP, while in the second the TF binds to the RNAP in solution and then directs it to the binding sites (reviewed in Duval and Lister 2013). Some AraC/XylS proteins interact with the RNAP in solution, and for these the pre-recruitment model seems to be favored. For instance, SoxS interacts with the RNAP α subunit and inhibits its interaction with the UP elements, while directing the RNAP to the “soxbox” sites (Shah and Wolf 2004). In such cases, based on a computational model for MarA, it is proposed that the RNAP-TF interaction increases the polymerase kinetics (Wall et al. 2009). To determine whether this is true, further experimental evidence from AraC-like proteins other than MarA and SoxS is necessary.

As mentioned in previous sections in this manuscript, some members of the family have been described as dual regulators. That is, they are able to change from repressor to activator (or vice versa) depending on the presence or absence of an inducer molecule (Grainger et al. 2004; Kahramanoglou et al. 2006). In the analyzed dataset, 13 TFs (10.3%) have a reported dual activity. Examples of proteins in this group include AraC, MelR, Rob, MarA and SoxS (reviewed in Duval and Lister 2013).

Finally, only three out of 126 TFs (2.4%) in the analyzed dataset are repressors. That is, they prevent transcription by blocking the binding of the RNAP to the promoter region. This regulatory property does not seem to be widely distributed in this family of proteins.

In summary, as in other families of TFs, the AraC-like proteins fall into one of three regulatory groups. Given the analyses of reported proteins, any newly identified member of this family is more likely to be an activator than a repressor.
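The shares quoted in this section can be recomputed from the three reported counts. The short sketch below does only that; the function name `role_shares` is illustrative, and the counts are those given in the text for the 126 experimentally characterized TFs.

```python
def role_shares(counts: dict) -> dict:
    """Convert counts per regulatory role into percentages (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no proteins to classify")
    return {role: round(100.0 * n / total, 1) for role, n in counts.items()}


# Counts reported in the text for the 126 characterized TFs.
reported = {"activator": 110, "dual": 13, "repressor": 3}
shares = role_shares(reported)

# The most frequent role is the default expectation for a new member.
most_common = max(reported, key=reported.get)
```

The resulting shares (87.3%, 10.3% and 2.4%) match the values given above. They describe the characterized cohort only, so using them as an expectation for an uncharacterized protein is a rough prior rather than a prediction.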

Regulating the regulator

As mentioned at the beginning of this review, regulation can happen at multiple levels, including the transcriptional, post-transcriptional, translational and post-translational levels. Members of the AraC/XylS family are subjected to all these levels of regulation (Fig. 4). At the transcriptional level, many of these TFs regulate their own expression (Tobin and Schleif 1990; Froehlich et al. 1994; Yahr and Frank 1994; Iwaki et al. 1999; Martínez-Laguna, Calva and Puente 1999; Ellermeier, Ellermeier and Slauch 2005; Schleif 2010). For others, especially the single-domain members such as MarA and SoxS, expression is usually regulated by other proteins (Alekshun and Levy 1997; Duval and Lister 2013).

Regulation of the AraC/XylS at multiple levels. Expression of gene encoding for an AraC-like protein can be subjected to either repression (R), activation (A), or both by a TF from either the same family or by regulators from another family. Transcriptional regulation by sRNA may occur as has been reported for a few members, while at the post-transcriptional level a regulatory thermometer has been shown to repress (for example in VirF). Once translated, sometimes the AraC-like protein auto-regulates its own expression and it can be subjected to post-translational regulation by proteases or by other proteins such as repressors (ANR, Hcp-like, etc.), activators, anti-activators or anti-anti-activators.


Once transcription starts, for some members a post-transcriptional regulation step has been shown, such as that described for HilD in Salmonella (Hung et al. 2019), where the RNA binding protein and global regulator CsrA is involved. For LcrF in Yersinia pseudotuberculosis the presence of an “RNA thermometer” in the mRNA of this TF was shown to play a role in regulating its translation (reviewed in Schwiesow et al. 2015).

At the post-translational level of regulation there are several examples of TFs or parts of the transcriptional machinery that are subjected to it. For instance, some σ factors are regulated by anti-sigma factors, which in turn can sometimes be also regulated by anti-anti-sigma factors (Campbell, Westblade and Darst 2008). This type of regulation has also been shown for members of the AraC/XylS family and involves the interaction with other proteins that can cause either a positive or a negative effect in the function of these regulators (Fig. 4).

Recently, a family of AraC negative regulators (ANR) has been described and it seems to be widely distributed in Bacteria (Santiago et al. 2016). The first member of this family of anti-activators found was Aar from enteroaggregative E. coli, which inhibits the activity of the AraC/XylS-like protein AggR by means of protein–protein interactions that prevent the dimerization of this regulator. Moreover, it appears that Aar not only regulates AggR, but it also regulates the expression of the global regulator H-NS and two other homologues (Santiago et al. 2017). Although several putative members of the ANR family have been found, experimental demonstration in other bacterial systems is needed in order to prove their function.

Even before the ANR family was described, other proteins with similar activity had been discovered. For instance, HilE inhibits the formation of HilD dimers, which in turn blocks HilD binding to DNA (Grenz et al. 2018; Paredes-Amaya et al. 2018). HilE shares homology with the Hcp proteins that are related to the type VI secretion system; thus, it does not belong to the ANR family.

Similarly, ExsD is a negative post-translational regulator of ExsA, a regulator of virulence genes in P. aeruginosa. ExsD acts in a similar fashion to HilE by inhibiting the formation of dimers and, therefore, the DNA binding ability of ExsA. Nonetheless, the ExsA-ExsD regulatory circuit is a little more complicated than that of HilE, because ExsD is inhibited by the anti-anti-activator ExsC, which is in turn blocked by ExsE, an effector of the type III-secretion system regulated by ExsA (reviewed in Thibault et al. 2009; Shrestha et al. 2020). Furthermore, ExsA is also regulated by another anti-activator, PtrA, which exerts its inhibitory activity by binding to ExsA in the presence of copper salts. In this case, the molecular effect of PtrA has not yet been characterized. The role of LcrF (VirFY) as an activator in Yersinia sp. is also inhibited by yet another protein, LcrQ, which displays no similarity to those described above (reviewed in Schwiesow et al. 2015). Once LcrQ is secreted, LcrF is allowed to become active.
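The chain of inhibitions described above for ExsA can be read as a simple logic cascade. The following is a minimal Boolean sketch of that reading, not a quantitative model: the function name and the all-or-nothing treatment of each protein are illustrative assumptions, and PtrA and copper are deliberately left out.

```python
def exsa_is_active(exse_in_cytoplasm: bool) -> bool:
    """Boolean sketch of the ExsE -| ExsC -| ExsD -| ExsA partner-switching chain.

    Assumption: each inhibitor fully blocks its target when it is free,
    which ignores stoichiometry and binding affinities.
    """
    exsc_free = not exse_in_cytoplasm  # ExsE blocks the anti-anti-activator ExsC
    exsd_free = not exsc_free          # free ExsC inhibits the anti-activator ExsD
    exsa_free = not exsd_free          # free ExsD prevents ExsA dimerization and DNA binding
    return exsa_free


for exse_retained in (True, False):
    state = "active" if exsa_is_active(exse_retained) else "inhibited"
    print(f"ExsE retained in the cell: {exse_retained} -> ExsA {state}")
```

Under these assumptions, secretion of ExsE through the type III-secretion system is the event that frees ExsA to activate its target genes.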

Not all protein interactions with AraC-like proteins have a deleterious effect (Fig. 4). InvF is an activator in the last section of the regulatory cascade that drives invasion of epithelial cells in S. enterica (Darwin and Miller 2001). Nevertheless, for InvF to be functional the presence of the chaperone SicA is necessary, hence SicA acts as a co-activator. Although it is known that both proteins interact with one another, the molecular details of the actual role SicA plays in stimulating the InvF activator are not entirely known (Romero-González et al. 2020). Likewise, MxiE in Shigella flexneri is also involved in the expression of virulence genes, but it has been shown to participate in a more intricate cascade. MxiE is bound by OspD1, an anti-activator that prevents the binding of the co-activator chaperone protein IpgC, which seems to be “sequestered” by the effector proteins IpaB and IpaC (in this scheme both act as anti-co-activators) (Fig. 4). OspD1 is stabilized by the chaperone Spa15, which then acts as a co-anti-activator. When the type III-secretion system effectors OspD1, IpaB and IpaC are secreted or translocated, IpgC is released and binds to MxiE to activate the expression of the remaining virulence genes (Mavris et al. 2002; Parsot et al. 2005). Yet, the molecular mechanism exerted by the anti-activator protein still needs to be revealed. Another example described so far is that of BsaN, which also regulates the expression of virulence genes in Burkholderia pseudomallei. Similar to InvF and MxiE, BsaN requires the BicA chaperone to act as a co-activator (Sun et al. 2010). The molecular basis of this activity also needs clarification (Fig. 4). It appears rather evident that regulation by co-activator, anti-activator and anti-anti-activator proteins is biased toward the AraC/XylS TFs that regulate virulence genes (see the following section). Whether this is an evolutionary adaptation for these types of regulators or whether it is also distributed in the other classes needs verification. Support for the former is given by the fact that BsaN, InvF and MxiE form a small cluster (Fig. 5). Alternatively, given that the expression of virulence genes is finely tuned, that is, their expression depends on very specific environmental and location conditions, the need for co-activator, anti-activator and anti-anti-activator proteins might be a consequence of these temporal and spatial tunings in the regulatory process.

Phylogenetic tree of the DBD of well-characterized AraC/XylS TFs. The tree is based on the protein alignment of the DNA binding domain (DBD) of 126 well-characterized AraC/XylS transcription factors (see Table S1 for more details). Functional categories are as follows: general metabolism (M); adaptive and stress responses (S); virulence (V). The construction of the phylogenetic tree was carried out with the maximum likelihood method using the Dayhoff (PAM) substitution model as described before (Ibarra et al. 2008; Sachman-Ruiz et al. 2020). Numbers in the branches represent bootstrap values of 1000 replicates. Bar represents two substitutions per position.


Lastly, these regulators can also be exposed to the action of proteases. Hence, their half-lives and functionality depend on whether they are degraded by these enzymes or not, and the level of expression is a function of both the rate of synthesis and the rate of degradation (Fig. 4). Examples of this regulation by proteases include SoxS, MarA, Rob, HilD, CdpR and ToxT (Griffith et al. 2004; Duval and Lister 2013; De la Cruz et al. 2015; Thomson, Plecha and Withey 2015; Zhao et al. 2016). Interestingly, for SoxS it appears that either binding of this regulator to the “soxbox” DNA binding site or the presence of the RNAP inhibits the degradation of SoxS by the Lon protease. Furthermore, this mechanism might determine whether the TF is removed or protected, depending on whether the organism needs to activate oxidative stress response genes (Shah and Wolf 2006a, b). Whether this model of regulation is distributed among other AraC-like proteins is yet to be uncovered.
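The dependence of the regulator level on synthesis and degradation can be illustrated with a standard first-order balance. This is a generic textbook sketch, not a model taken from the cited studies; the symbols $P$ (regulator concentration), $k_{\mathrm{syn}}$ (synthesis rate) and $k_{\mathrm{deg}}$ (degradation rate constant) are assumptions introduced here for illustration, as are the simplifications of constant synthesis and first-order proteolysis.

```latex
\frac{dP}{dt} = k_{\mathrm{syn}} - k_{\mathrm{deg}}\,P
\qquad\Longrightarrow\qquad
P_{\mathrm{ss}} = \frac{k_{\mathrm{syn}}}{k_{\mathrm{deg}}},
\qquad
t_{1/2} = \frac{\ln 2}{k_{\mathrm{deg}}}
```

Under these assumptions, anything that lowers $k_{\mathrm{deg}}$, such as protection of the regulator from a protease, raises both the steady-state level and the half-life without any change in synthesis.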

In conclusion, TFs in the AraC/XylS family can be regulated at many levels, including regulation by inducing molecules, co-activator or repressing proteins and, in some cases, a combination of these.

Functional classification

Proteins in the AraC/XylS family have been classified by the genes they regulate: those involved in (1) the regulation of carbon metabolism, (2) stress response or (3) virulence (Gallegos et al. 1997; Egan 2002; Ibarra et al. 2008). For this review, we analyzed the dataset of experimentally characterized proteins (Table S1, Supporting Information). In this way, the list of characterized members was manually curated and extended from 56 in a previous report (Ibarra et al. 2008) to 126. In order to update the classification we propose the following classes or groups: Metabolism in general (M), which would include members that regulate the expression of genes involved in any type of metabolism regardless of the source or product; the stress response (S) group, which would include those regulators involved in response to pH changes, DNA damage or antibiotic resistance; and lastly, the virulence group (V), which would include those TFs involved in regulation of virulence factors, including biofilm formation and iron or other metal scavenging (such as the siderophores). With this classification in mind, the largest share of the 126 proteins shown in Table S1 (Supporting Information) belongs to the M class (n = 53), followed by those in the V class (n = 47); the S class represents the lowest number of proteins (n = 26). A cautionary note should be added: despite the urge to make a simple classification, it is important to note that some of these proteins are able to regulate the expression of proteins involved in different activities than those previously suggested. Additionally, it is important to mention that many metabolism genes are needed for an effective virulence process and that the encounter of a pathogen with the immune system involves stress. Thus, this classification needs to be considered carefully. For instance, AggR regulates virulence genes in enteroaggregative E. coli, but it has just recently been shown that it also controls the expression of genes involved in lipid metabolism (Belmont-Monroy et al. 2020). Similarly, other activities besides virulence have been shown for VirF, a member of class V (Di Martino et al. 2016). In addition to these examples, there are others, so one should keep in mind that regulation is more of a grey-scale situation, rather than a black and white one. With such consideration in mind, analysis of these 126 TFs is described in the following sections.
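As a quick check of the class counts reported above, the share of each functional class can be computed directly. This is only an arithmetic sketch; the variable names are illustrative, and the counts are the ones given in the text for Table S1.

```python
# Class counts as stated in the text (M: metabolism, V: virulence, S: stress).
class_counts = {"M": 53, "V": 47, "S": 26}

total = sum(class_counts.values())

# Percentage of the dataset in each class, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in class_counts.items()}

print(f"total characterized TFs: {total}")
for name, pct in shares.items():
    print(f"class {name}: {class_counts[name]} proteins ({pct}%)")
```

The three counts add up to the 126 proteins of the dataset, and the largest class, M, accounts for roughly 42% of the set.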

Given the fact that these TFs are grouped into such classes we previously reported that the DBDs of many of them are clustered, not into three clear branches (one for each group) but into discrete clusters for each of the classification groups (Ibarra et al. 2008). By applying the same logic to the expanded list of characterized proteins, the analysis of the 126 characterized proteins corroborated our previous findings and suggests that there might be a common evolutionary ancestor for some of these proteins (Fig. 5). It also supports the idea that for some members of this family it is possible to predict their role by analyzing the DBD through phylogenetic means. For instance, this approach was recently used to predict the class of a protein with identity with this family identified in V. cholerae that was shown to regulate genes encoding for a siderophore (Sachman-Ruiz et al. 2020). This was reinforced by the genomic context around the gene coding for the putative new AraC/XylS member. Of course this hypothesis should be further tested in many more cases to strengthen the prediction and also it should include more proteins when they are experimentally characterized.

Applications

All the knowledge accumulated on the AraC/XylS family has not only shed light on the molecular basis of their mechanisms of action, but has also shown a further utility (Fig. 6). Once the regulatory gears for AraC regulation were understood, this protein and the araBAD promoter have been used in genetic systems for the production of recombinant proteins (Guzman et al. 1995; Haldimann, Daniels and Wanner 1998; Brautaset, Lale and Valla 2009; Gawin, Valla and Brautaset 2017) or for tuning the expression in other systems (Fig. 6A). A similar approach has been applied for XylS (Damalas et al. 2020). These systems take advantage of the particular abilities of some of these proteins to detect specific molecules. Another application for members of the AraC/XylS family is as whole-cell biosensors (Fernandez-López et al. 2015; Frazão et al. 2018). In these systems, designed by genetic engineering and synthetic biology, the signal sensing TF is coupled with a reporter system and helps in the detection of pollutants, among other compounds. Moreover, there are some examples in which the ability to sense signaling molecules has been modified to detect particular molecules, as in the case of XylS (Galvão and de Lorenzo 2006; Ogawa et al. 2019), XylR (Wei et al. 2020; Tang et al. 2020) and FapR (Kalkreuter et al. 2019). Thus, one can expect that more biosensors will be developed either using the natural ability of the AraC-like proteins or by modifying it.

Biotechnology and medical applications for the AraC/XylS proteins. Members of the family are used in expression systems for recombinant proteins either using wild-type proteins or modified ones (A). (B) Others have been adapted as biosensors either directly or by modifying either the CD (represented by a yellow triangle) or the DBD to detect defined signal molecules (blue square). (C) illustrates the use of synthetic or natural chemical compounds as inhibitors for the activity of AraC-like regulators.


Additionally, the emergence of multidrug resistant bacteria has driven the search for alternatives or complements to antibiotic treatment, and this involves anti-virulence compounds (Lyons and Strynadka 2019). In that sense, the search for molecules with the potential to inhibit activation by AraC-like proteins, especially for those proteins in the S and V categories, is under way. For instance, John Mekalanos’ lab discovered virstatin (4-[N-(1,8-naphthalimide)]-n-butyric acid), which is able to inhibit the expression of the cholera toxin and the toxin coregulated pilus by preventing the dimerization of the AraC-like protein ToxT (Hung et al. 2005; Shakhnovich et al. 2007). Other compounds have been described for other members of the family (Skredenske et al. 2013; Koppolu et al. 2013; Duval and Lister 2013; Emanuele and Garcia 2015; Bosire et al. 2020). These compounds work mainly by modifying the DNA binding ability of the TFs once they bind to the CD. If the results are successful, these compounds could be used to stop the colonization or invasion of animal and human hosts (Fig. 6C).

CONCLUSIONS

Since the foundation of this family of transcriptional regulators, more and more members have joined; similarly, more experimental evidence has accumulated, shedding light on multiple regulatory models for the AraC/XylS-like proteins. For now, it seems that most of these proteins are activators, but this will become clearer as more members are experimentally characterized. We consider that by using the phylogenetic relationships of the DBDs it is possible to propose a role as regulator for M, V or S genes. The ability to detect molecules is distributed mostly between V and M members, but more work is still needed in order to define the role of more companion domains. Definitely, as more techniques, data and informatics tools become available, the AraC/XylS family will be even more fascinating.

ACKNOWLEDGEMENTS

We would like to thank all the members of our research groups for their support and hard work in the lab. Dr Andy Weiss is very much appreciated for his assistance. EP-R appreciates the technical support of Israel Sanchez, Joaquin Morales and Sandra Sauza. We deeply appreciate Martha Thayer for thoroughly proofreading this manuscript.

FUNDING

This work was supported by a grant from Consejo Nacional de Ciencia y Tecnología (CONACYT A1-S-25438; “Proyecto Apoyado por el Fondo Sectorial de Investigación para la Educación”) and partially by Secretaría de Investigación y Posgrado (IPN; SIP-20200728) to JAI. EP-R is supported by “Dirección General de Asuntos del Personal Académico-Universidad Nacional Autónoma de México” (IN-209620) and CYTED (P918PTE0261). DC-A holds a scholarship from CONACYT (935269). MAO-M and AJ-G had a BEIFI-IPN scholarship. JAI and PES also receive support from COFAA-IPN, EDI-IPN and SNI-CONACyT. The funders had no role in the design of the study, the data collection and analysis, the decision to publish or the preparation of the manuscript.