pAgo

Contributors: Daan Swarts

Description

Argonaute proteins comprise a diverse protein family and can be found in both prokaryotes and eukaryotes. Despite low sequence conservation, eukaryotic Argonautes (eAgos) and long prokaryotic Argonautes (pAgos) generally have a conserved domain architecture and share a common mechanism of action: they use a 5'-phosphorylated single-stranded nucleic acid guide (generally 15-22 nt in length) to target complementary nucleic acid sequences. eAgos strictly mediate RNA-guided RNA silencing, while pAgos show higher mechanistic diversification and can make use of guide RNAs and/or single-stranded guide DNAs to target RNA and/or DNA. Depending on the presence of catalytic residues and the degree of complementarity between the guide and target sequences, eAgos and pAgos either cleave the target or recruit and/or activate accessory proteins. This can result in degradation of the target nucleic acid, but might also trigger alternative downstream effects, ranging from poly(A) tail shortening, RNA decapping, or chromatin formation in eukaryotes to abortive infection in prokaryotes.
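The guide-target complementarity described above can be illustrated with a minimal Python sketch. It only checks for perfect Watson-Crick complementarity between a DNA guide and a DNA target strand; the function names and the example sequences are hypothetical, and real pAgo targeting depends on protein-specific rules (mismatch tolerance, guide end chemistry) that are not modelled here.

```python
# Illustrative sketch only: perfect-match search, DNA guide vs. DNA target.
# Function names and sequences are hypothetical, not taken from any pAgo study.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_target_sites(guide: str, target: str) -> list[int]:
    """Return 0-based start positions in `target` fully complementary to `guide`."""
    site = reverse_complement(guide)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


guide = "TGAGGTAGTAGGTTGTATAGT"  # hypothetical 21-nt guide, within the 15-22 nt range
target = "CCC" + reverse_complement(guide) + "GGG"  # hypothetical target strand
print(find_target_sites(guide, target))  # [3]
```

A guide of the typical length is expected to match rarely by chance in a genome-sized target, which is consistent with the sequence-specific targeting described in the text.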

Molecular mechanism

Based on their phylogeny, Agos have been subdivided into various (sub)clades. eAgos are generally subdivided into the AGO and PIWI clades, but these will not be discussed further here. pAgos can be further subdivided into long-A pAgos, long-B pAgos, short pAgos, SiAgo-like pAgos, and PIWI-RE proteins. Below, we briefly outline the general mechanism of pAgos that have a demonstrated role in host defense.

Long-A pAgos

Akin to eAgos, most long-A pAgos characterized to date have an N-L1-PAZ-L2-MID-PIWI domain architecture. In contrast to eAgos, however, certain long-A pAgos use a single-stranded guide DNA to bind and cleave complementary target DNA sequences. Long-A pAgos are preferentially programmed with guide DNAs targeting invading DNA through a poorly understood mechanism, which might involve DNA repair proteins or the pAgo itself. Most long-A pAgos have an intact catalytic site in the PIWI domain, which allows them to cleave their targets. As such, they act as an innate immune system that clears plasmid and phage DNA from the cell.

Within the long-A pAgo clade, various subclades exist that rely on distinct functional mechanisms. For example, various long-A pAgos can (additionally) use guide RNAs and/or cleave RNA targets. Furthermore, CRISPR-associated pAgos use 5'-OH guide RNAs to target DNA, and PliAgo-like pAgos use small DNA guides to target RNA. Certain long-A pAgos genetically co-localize with other putative enzymes including (but not limited to) putative nucleases, helicases, DNA-binding proteins, or PLD-like proteins. The relevance of these associations is currently unknown.

Long-B pAgos

Akin to long-A pAgos, long-B pAgos have an N-L1-PAZ-L2-MID-PIWI domain composition, but most have a shorter PAZ* domain, and in contrast to long-A pAgos, all long-B pAgos are catalytically inactive. Long-B pAgos characterized to date use guide RNAs to bind invading DNA. In the absence of co-encoded proteins, long-B pAgos repress invader activity. In addition, most long-B pAgos are co-encoded with effector proteins including (but not limited to) SIR2, nucleases, membrane proteins, and restriction endonucleases. These effector proteins are activated upon pAgo-mediated invader detection, and generally catalyze reactions that result in cell death. As such, long-B pAgos together with their associated proteins mediate abortive infection.

Short pAgos

Short pAgos are truncated: they only contain the MID and PIWI domains essential for guide-mediated target binding. They are catalytically inactive and are co-encoded with an APAZ domain that is fused to one of various effector domains. In short pAgo systems characterized to date, the short pAgo and the APAZ domain-containing protein form a heterodimeric complex. Within this complex, the short pAgo uses a guide RNA to bind complementary target DNAs. This triggers catalytic activation of the effector domain fused to the APAZ domain, generally resulting in cell death. As such, short pAgo systems mediate abortive infection.

Based on their phylogeny, short pAgos are subdivided into the S1A, S1B, S2A, and S2B clades. In clade S1A and S1B (SPARSA) systems, APAZ is fused to a SIR2 domain. In clade S2A (SPARTA) systems, APAZ is fused to a TIR domain. Both SPARSA and SPARTA systems trigger cell death by depletion of NAD(P)+. In S2B clade systems, APAZ is fused to one or more effector domains, including Mrr-like, DUF4365, RecG/DHS-like and other domains. In all clade S1A SPARSA systems, and in certain systems within other clades, the effector-APAZ is fused to the short pAgo.
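The clade-to-effector relationships above can be summarized as a small lookup table. The following Python sketch is only a restatement of the paragraph in data form; the dictionary layout, key names, and helper functions are assumptions made for illustration, and the S2B effector list is explicitly non-exhaustive, as in the text.

```python
# Summary of the short pAgo clades as described in the text above.
# Key names and helper functions are hypothetical, chosen for illustration.
SHORT_PAGO_CLADES = {
    "S1A": {"system": "SPARSA", "apaz_fused_to": ("SIR2",), "depletes_nad_p": True},
    "S1B": {"system": "SPARSA", "apaz_fused_to": ("SIR2",), "depletes_nad_p": True},
    "S2A": {"system": "SPARTA", "apaz_fused_to": ("TIR",), "depletes_nad_p": True},
    # Non-exhaustive: the text lists these "and other domains"; no NAD(P)+
    # statement is made for S2B, so the value is left as None (unknown).
    "S2B": {
        "system": None,
        "apaz_fused_to": ("Mrr-like", "DUF4365", "RecG/DHS-like"),
        "depletes_nad_p": None,
    },
}


def clades_with_effector(domain: str) -> list[str]:
    """Return the clades in which APAZ is fused to the given effector domain."""
    return sorted(
        clade
        for clade, info in SHORT_PAGO_CLADES.items()
        if domain in info["apaz_fused_to"]
    )


def clades_of_system(system: str) -> list[str]:
    """Return the clades that belong to a named system (SPARSA or SPARTA)."""
    return sorted(
        clade for clade, info in SHORT_PAGO_CLADES.items() if info["system"] == system
    )


print(clades_with_effector("SIR2"))  # ['S1A', 'S1B']
print(clades_of_system("SPARTA"))    # ['S2A']
```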

Pseudo-short pAgos

Akin to short pAgos, pseudo-short pAgos comprise the MID and PIWI domains only. However, they do not phylogenetically cluster with canonical short pAgos and do not colocalize with effector-APAZ proteins. Instead, certain pseudo-short pAgos are found across the long-A and long-B pAgo clades (e.g. Archaeoglobus fulgidus pAgo, a truncated long-B pAgo), while others form a distinct branch in the phylogenetic pAgo tree (see SiAgo-like pAgos below).

SiAgo-like pAgos

SiAgo-like pAgos are pseudo-short pAgos that form a separate branch in the phylogenetic tree of pAgos. They are named after the type system from Sulfolobus islandicus. SiAgo comprises MID and PIWI domains and is co-encoded with the Ago-associated proteins Aga1 and Aga2. SiAgo and Aga1 form a cytoplasmic heterodimeric complex. While it is currently unknown what guide/target types activate the SiAgo/Aga1 complex, the complex is directed toward membrane-localized Aga2 upon viral infection. This triggers Aga2-mediated membrane depolarization and causes cell death.

Example of genomic structure

A total of 6 subsystems have been described for the pAgo system. Here are some examples found in the RefSeq database:

The pAgo_LongA system in Halosimplex pelagicum (GCF_013415905.1, NZ_CP058909) is composed of 1 protein: pAgo_LongA (WP_179918860.1)

The pAgo_LongB system in Serratia fonticola (GCF_019252525.1, NZ_CP072742) is composed of 2 proteins: pAgo_LongB (WP_218520044.1) and EcAgaN (WP_235784821.1)

The pAgo_S1A system in Parabacteroides merdae (GCF_020735605.1, NZ_CP085927) is composed of 2 proteins: pAgo_S1A (WP_227945673.1) and pAgo_S1A (WP_227945674.1)

The pAgo_S1B system in Comamonas flocculans (GCF_007954405.1, NZ_CP042344) is composed of 2 proteins: SIR2APAZ (WP_146914209.1) and pAgo_S1B (WP_146913473.1)

The pAgo_S2B system in Granulicella tundricola (GCF_000178975.2, NC_015064) is composed of 2 proteins: XAPAZ (WP_013581437.1) and pAgo_S2B (WP_013581438.1)

The pAgo_SPARTA system in Roseivivax sp. THAF30 (GCF_009363575.1, NZ_CP045389) is composed of 2 proteins: TIRAPAZ (WP_152461295.1) and pAgo_SPARTA (WP_152461296.1)

Distribution of the system among prokaryotes

Among the 22,803 complete genomes of RefSeq, the pAgo system is detected in 575 genomes (2.52 %).

The system was detected in 464 different species.
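As a quick check of the arithmetic, the reported percentage can be recomputed from the genome counts given above. This is a minimal Python sketch; the function name is hypothetical and only the two counts come from the text.

```python
def detection_rate(genomes_with_system: int, genomes_total: int) -> float:
    """Return the percentage of genomes in which the system is detected."""
    return 100 * genomes_with_system / genomes_total


# Counts taken from the text: 575 of 22,803 complete RefSeq genomes.
print(f"{detection_rate(575, 22803):.2f} %")  # 2.52 %
```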

Proportion of genomes encoding the pAgo system for the 14 phyla with more than 50 genomes in the RefSeq database.

Experimental validation

      
graph LR;
    Kuzmenko_2020["Kuzmenko et al., 2020"] --> Origin_0
    Origin_0["Ago<br/>Clostridium butyricum<br/>WP_045143632.1"] --> Expressed_0[Escherichia coli]
    Expressed_0[Escherichia coli] ----> M13 & P1vir
    Xing_2022["Xing et al., 2022"] --> Origin_1
    Origin_1["Natronobacterium gregoryi<br/>WP_005580376.1"] --> Expressed_1[Escherichia coli]
    Expressed_1[Escherichia coli] ----> T7
    Zaremba_2022["Zaremba et al., 2022"] --> Origin_2
    Origin_2["GsSir2/Ago<br/>Geobacter sulfurreducens<br/>WP_010942012.1, WP_010942011.1"] --> Expressed_2[Escherichia coli]
    Expressed_2[Escherichia coli] ----> LambdaVir & SECphi27
    Zaremba_2022["Zaremba et al., 2022"] --> Origin_3
    Origin_3["CcSir2/Ago<br/>Caballeronia cordobensis<br/>WP_053571900.1, WP_053571899.1"] --> Expressed_3[Escherichia coli]
    Expressed_3[Escherichia coli] ----> LambdaVir
    Zaremba_2022["Zaremba et al., 2022"] --> Origin_4
    Origin_4["PgSir2/Ago<br/>Paraburkholderia graminis<br/>WP_006053074.1"] --> Expressed_4[Escherichia coli]
    Expressed_4[Escherichia coli] ----> LambdaVir & SECphi27
    Lisitskaya_2022["Lisitskaya et al., 2023"] --> Origin_5
    Origin_5["Ago<br/>Exiguobacterium marinum"] --> Expressed_5[Escherichia coli]
    Expressed_5[Escherichia coli] ----> P1vir
    Garb_2022["Garb et al., 2022"] --> Origin_6
    Origin_6["Sir2/Ago<br/>Geobacter sulfurreducens<br/>NP_952413, NP_952414"] --> Expressed_6[Escherichia coli]
    Expressed_6[Escherichia coli] ----> LambdaVir
    Zeng_2021["Zeng et al., 2022"] --> Origin_7
    Origin_7["SiAgo/Aga1/Aga2<br/>Sulfolobus islandicus<br/>WP_012735993.1, WP_012718851.1, WP_012735992.1"] --> Expressed_7[Sulfolobus islandicus]
    Expressed_7[Sulfolobus islandicus] ----> SMV1
    subgraph Title1[Reference]
        Kuzmenko_2020
        Xing_2022
        Zaremba_2022
        Lisitskaya_2022
        Garb_2022
        Zeng_2021
    end
    subgraph Title2[System origin]
        Origin_0
        Origin_1
        Origin_2
        Origin_3
        Origin_4
        Origin_5
        Origin_6
        Origin_7
    end
    subgraph Title3[Expression species]
        Expressed_0
        Expressed_1
        Expressed_2
        Expressed_3
        Expressed_4
        Expressed_5
        Expressed_6
        Expressed_7
    end
    subgraph Title4[Protects against]
        M13
        P1vir
        T7
        LambdaVir
        SECphi27
        SMV1
    end
    style Title1 fill:none,stroke:none,stroke-width:none
    style Title2 fill:none,stroke:none,stroke-width:none
    style Title3 fill:none,stroke:none,stroke-width:none
    style Title4 fill:none,stroke:none,stroke-width:none

    

References

10.1016/j.cell.2022.03.012
10.1016/j.chom.2022.04.015
10.1038/s41564-022-01207-8
10.1038/s41586-020-2605-1
10.1186/1745-6150-4-29
10.1038/s41564-022-01239-0
10.1016/j.mib.2023.102313
10.1038/nsmb.2879
10.1016/J.CELL.2018.03.006
10.1038/s41580-022-00528-0
10.1016/j.tcb.2022.10.005
10.1128/mBio.01935-18
10.1186/1745-6150-8-13
10.1093/nar/gkz306
10.1093/nar/gkz379
10.1093/nar/gkv415
10.1038/nature12971
10.1038/nmicrobiol.2017.35
10.1016/j.molcel.2017.01.033
10.1038/nmicrobiol.2017.34
10.1073/pnas.1321032111
10.1093/nar/gkad290
10.1073/pnas.1524385113
10.1038/s41467-022-32079-5
10.1038/s41598-023-32600-w
10.1016/j.molcel.2013.08.014
10.1038/s41467-023-42793-3
10.1038/s41598-021-83889-4