PhytoSyn — Sample Reports

Rice — Pangenomes

2020–2022 (pre-2023 data) · 3 papers

Review Article

Beyond the Single Reference: Pangenome Construction, Structural Variation, and Functional Gene Discovery in Rice

Abstract

The transition from single-reference to pangenomic frameworks is reshaping rice genetics. This review synthesizes three bioRxiv preprints (2020–2022) that collectively advance rice pangenomics through gap-free genome assembly, pangenome-scale genotyping arrays, and computational pipelines for functional gene identification. Together, these studies establish that structural variants — including presence-absence and copy-number variation — account for substantial phenotypic diversity missed by SNP-centric approaches. Whether these pangenomic resources can accelerate breeding for complex traits in polyploid and orphan cereals remains to be determined.

Introduction

Rice (Oryza sativa) feeds more than half of the global population, and its compact diploid genome (~390 Mb) has made it the reference cereal for comparative genomics since the landmark IRGSP assembly in 2005. However, a single reference genome captures only a fraction of species-level diversity. Population resequencing studies repeatedly demonstrated that 5–15% of genes in any given accession are absent from the Nipponbare reference (Wang et al., 2018; Zhao et al., 2018), suggesting that agronomically important variation hides in the "dispensable" genome.

The pangenome concept — cataloguing all genomic sequences across a species — has gained traction as long-read sequencing costs have declined. In rice, early pangenome efforts used short reads to call presence-absence variants (PAVs), but these approaches struggled with repetitive regions and complex structural rearrangements. The preprints reviewed here represent a new wave: telomere-to-telomere (T2T) assemblies that resolve centromeres and heterochromatin (Song et al., 2021), array-based genotyping platforms that translate pangenomic knowledge into breeding tools (Daware et al., 2022), and computational frameworks that connect structural variants to gene function (Wang et al., 2022). In this review, we synthesize these advances to assess how pangenomics is transforming rice functional genomics and crop improvement.
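To make the core versus dispensable distinction concrete, the following minimal Python sketch classifies genes from a presence-absence matrix and lists genes missing from the reference accession. The gene names, the accession order, and the 100% threshold for calling a gene core are illustrative assumptions, not values taken from the studies reviewed here.

```python
from typing import Dict, List

def classify_genes(pav: Dict[str, List[int]], core_threshold: float = 1.0) -> Dict[str, str]:
    """Label each gene 'core' or 'dispensable' by the fraction of accessions carrying it."""
    labels = {}
    for gene, calls in pav.items():
        frequency = sum(calls) / len(calls)
        labels[gene] = "core" if frequency >= core_threshold else "dispensable"
    return labels

def absent_from_reference(pav: Dict[str, List[int]], ref_index: int = 0) -> List[str]:
    """Genes present in at least one accession but absent from the reference column."""
    return sorted(g for g, calls in pav.items() if calls[ref_index] == 0 and any(calls))

# Hypothetical toy matrix: one row per gene, one column per accession,
# with column 0 standing in for the reference accession.
PAV = {
    "geneA": [1, 1, 1, 1],
    "geneB": [1, 0, 1, 1],
    "geneC": [0, 1, 1, 0],
    "geneD": [0, 0, 1, 0],
}

labels = classify_genes(PAV)
missing = absent_from_reference(PAV)
print(labels)
print(missing)
```

Lowering `core_threshold` (for example to 0.95) gives a "soft core" definition that tolerates occasional assembly or calling errors; which threshold is appropriate depends on the panel.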

Gap-Free Assembly and the Architecture of Rice Centromeres

Song et al. (2021) reported two gap-free reference genomes for xian/indica rice cultivars, achieving complete telomere-to-telomere coverage. Using a combination of PacBio HiFi reads and Hi-C scaffolding, the authors resolved all 12 centromeres — regions that had been collapsed or absent in previous assemblies. The centromeric regions were characterized by tandem arrays of the CentO satellite repeat interspersed with Ty3/gypsy retrotransposons, with centromere size varying from 0.5 to 3.2 Mb across chromosomes. This work demonstrated that centromeric and pericentromeric regions harbor lineage-specific structural variants and active genes that are invisible to short-read resequencing. The validation strategy, combining optical mapping with FISH cytogenetics, provides a high level of confidence in the assembly completeness, though whether the centromere architectures described here are representative of the broader indica gene pool remains to be determined with additional T2T assemblies.

Pangenome-Aware Genotyping for Breeding

Translating pangenomic knowledge into tools accessible to breeding programs was the focus of Daware et al. (2022), who developed the Rice Pan-genome Array (RPGA). This SNP array was designed using variant data from 4,726 diverse accessions spanning indica, japonica, aus, and wild rice groups. The array captures not only single-nucleotide polymorphisms but also tags for PAVs and structural variants identified through pangenome analysis. The RPGA achieved higher resolution of indica–japonica population structure than conventional 50K arrays and detected association signals at known loci for grain quality and disease resistance that were missed by previous GWAS based on SNPs alone. However, the array remains limited by the diversity of accessions used in its design, and rare variants in landraces or wild progenitors may still be underrepresented.
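The idea of a SNP that "tags" a PAV can be illustrated with a generic linkage-disequilibrium calculation. The sketch below picks the SNP with the highest r² to a PAV across accessions. It is not the selection procedure used for the RPGA, which is not described here; it assumes inbred accessions coded as haploid 0/1 calls, and all marker names and genotypes are hypothetical.

```python
from typing import Dict, List, Tuple

def r_squared(x: List[int], y: List[int]) -> float:
    """Squared correlation (r^2) between two binary 0/1 genotype vectors."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic marker carries no tagging information
    d = pxy - px * py
    return d * d / denom

def best_tag(pav_calls: List[int], snps: Dict[str, List[int]]) -> Tuple[str, float]:
    """Return the SNP name and r^2 of the marker most correlated with the PAV."""
    name = max(snps, key=lambda s: r_squared(pav_calls, snps[s]))
    return name, r_squared(pav_calls, snps[name])

# Hypothetical data: 1 = gene present / alternate allele, 0 = absent / reference allele.
pav = [1, 1, 0, 0, 1, 0]
snps = {
    "snp1": [1, 1, 0, 0, 1, 0],  # perfectly tracks the PAV
    "snp2": [1, 0, 1, 0, 1, 0],  # weakly correlated
    "snp3": [1, 1, 1, 1, 1, 1],  # monomorphic
}

tag_name, tag_r2 = best_tag(pav, snps)
print(tag_name, round(tag_r2, 3))
```

A PAV with no SNP above a chosen r² cutoff cannot be captured indirectly on an array of this kind, which is one way rare structural variants end up underrepresented.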

Computational Pipelines Linking Structure to Function

Wang et al. (2022) addressed a persistent bottleneck in pangenomics: connecting structural variants to gene function. Their PSVCP pipeline integrates variant calling, gene-level PAV detection, and expression quantitative trait locus (eQTL) mapping into a unified framework. Applied to a rice diversity panel, PSVCP identified 1,247 genes with significant PAV–expression associations, including several previously unannotated genes in dispensable genome segments. The approach provides a systematic way to prioritize candidate genes from pangenomic data, though validation of individual candidates through reverse genetics will be necessary to confirm functional assignments. The pipeline's reliance on short-read data for variant calling means that complex structural variants — precisely the category most likely to be functionally relevant — may be undercalled.
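A PAV–expression association of the kind described above can be sketched as a two-group comparison: does expression of a gene differ between accessions that carry a segment and those that lack it? The example below uses a simple permutation test from the Python standard library. It is an illustration only and should not be read as the statistics implemented in PSVCP; the expression values, group sizes, and number of permutations are assumptions.

```python
import random
from statistics import mean
from typing import Sequence, Tuple

def pav_expression_test(present: Sequence[float], absent: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in mean expression.

    Returns (observed difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(present) - mean(absent)
    pooled = list(present) + list(absent)
    k = len(present)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign accessions to groups at random
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical expression values (arbitrary units) for one gene.
expr_present = [8.1, 7.9, 8.4, 8.0, 8.3]  # accessions carrying the segment
expr_absent = [0.2, 0.1, 0.0, 0.3, 0.1]   # accessions lacking it

effect, p_value = pav_expression_test(expr_present, expr_absent)
print(round(effect, 2), p_value)
```

In a genome-wide scan this test would be repeated for every PAV–gene pair, so a multiple-testing correction and a control for population structure would be needed before calling any association significant.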

Open Questions and Future Directions

First, how representative are current rice pangenomes of the full species diversity? The T2T assemblies of Song et al. (2021) cover only two indica cultivars; extending this to japonica, aus, and wild Oryza species is essential. Graph-based pangenome references that incorporate hundreds of assemblies are technically feasible with current sequencing costs but require substantial computational infrastructure. Second, can pangenome-aware GWAS improve the resolution of the genetic architecture of complex traits such as yield and drought tolerance? The RPGA platform (Daware et al., 2022) provides a foundation, but large-scale field phenotyping across environments will be needed. Third, the functional validation bottleneck identified by Wang et al. (2022), namely linking PAVs to phenotypes, calls for high-throughput reverse genetics approaches such as CRISPR-based screens targeting dispensable genes.

Authors	Year	Key Finding	System	Approach
Song et al.	2021	Gap-free T2T indica genomes resolve centromere architecture	Rice (xian/indica)	PacBio HiFi + Hi-C + optical mapping
Daware et al.	2022	Pangenome-aware SNP array captures PAV-tagged associations	4,726 rice accessions	Array design + GWAS
Wang et al.	2022	PSVCP pipeline identifies 1,247 genes with PAV–expression associations	Rice diversity panel	Computational pipeline + eQTL

Papers Referenced

Assembly and Validation of Two Gap-free Reference Genomes for Xian/indica Rice Reveals Insights into Plant Centromere Architecture

Song J-M, Xie W-Z, Wang S, Guo Y-X, Koo D-H, Kudrna D, et al. · 2021-01-01

Rice Pan-genome Array (RPGA): an efficient genotyping solution for pan-genome-based accelerated crop improvement in rice

Daware A, Malik A, Srivastava R, Das D, Ellur RK, Singh AK, Tyagi AK, Parida SK · 2022-01-21

A pangenome analysis pipeline (PSVCP) provides insights into rice functional gene identification

Wang J, Wu Y, Zhang S, Hu H, Yuan Y, Dong J, et al. · 2022-06-17

Wheat — CRISPR Gene Editing

2021–2022 (pre-2023 data) · 4 papers

Review Article

Genome Editing in Hexaploid Wheat: Delivery Methods, Multiplexed Targeting, and the Path Toward Transgene-Free Precision Breeding

Abstract

CRISPR-based genome editing in wheat (Triticum aestivum) faces unique challenges arising from its large (~17 Gb), allohexaploid genome with three homeologous copies of most genes. This review synthesizes four bioRxiv preprints (2021–2022) that address critical bottlenecks in wheat genome editing: viral delivery of guide RNAs for multiplexed promoter and gene editing (Wang et al., 2022), biolistic optimization for improved mutation efficiency (Tanaka et al., 2022), in planta ribonucleoprotein delivery to bypass tissue culture (Kumagai et al., 2021), and systematic base-editor optimization across cereals (Gaillochet et al., 2022). Collectively, these studies are converging toward transgene-free, high-efficiency editing pipelines that could transform wheat improvement for grain quality, disease resistance, and climate adaptation.

Introduction

Wheat is the most widely grown crop on Earth, providing approximately 20% of calories and protein consumed globally. Its allohexaploid genome — derived from three ancestral diploid species — means that functional redundancy among homeologs often masks loss-of-function mutations, requiring simultaneous editing of all three copies to achieve a phenotype. This constraint, combined with wheat's notoriously low transformation and regeneration efficiency, has made it one of the most challenging crops for genome editing.

The first successful CRISPR edits in wheat were reported in 2014, targeting simple genes with clear loss-of-function phenotypes. Since then, the field has expanded rapidly, driven by demand for reduced-allergen flour, lower-asparagine grain (to reduce acrylamide in baking), and durable disease resistance. However, three persistent bottlenecks have limited the translation of CRISPR from proof-of-concept to breeding: the inefficiency of Agrobacterium-mediated transformation in elite cultivars, the difficulty of multiplexed editing across homeologs, and regulatory concerns about transgene integration. The preprints reviewed here each tackle one or more of these barriers. In this review, we evaluate the strengths and limitations of these approaches and assess their collective implications for wheat precision breeding.

Virus-Based Guide RNA Delivery for Multiplexed Editing

Wang et al. (2022) developed a system in which guide RNAs are delivered via a barley stripe mosaic virus (BSMV) vector to wheat plants constitutively expressing Cas9. This approach enables rapid testing of multiple sgRNAs without regenerating new transgenic lines for each target. The authors demonstrated simultaneous editing of promoter regions and coding sequences at two independent loci, achieving heritable edits in T1 progeny. The viral delivery system is particularly suited for multiplexed experiments because new guides can be cloned into the BSMV backbone in days rather than the months required for stable transformation. However, the requirement for a pre-existing Cas9-expressing line limits the system's applicability to cultivars where such lines are available. Whether viral delivery can achieve the editing efficiencies needed for simultaneous knockout of all three homeologs at a single locus — a common breeding objective — was not directly tested.
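The tri-homeolog problem raised above has a simple computational side: a single guide can only hit all three copies if its target site is conserved among them. The following sketch searches three sequences for 20-nt protospacers followed by an NGG PAM (the SpCas9 requirement) and keeps those shared by all copies. It scans the forward strand only, ignores off-target and efficiency scoring, and uses invented sequences; it is not the guide-design procedure of Wang et al. (2022).

```python
from typing import Iterable, Set

def protospacers(seq: str, length: int = 20) -> Set[str]:
    """Forward-strand protospacers of the given length followed by an NGG PAM."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - length - 2):
        pam = seq[i + length:i + length + 3]
        if pam[1:] == "GG":  # N can be any base
            found.add(seq[i:i + length])
    return found

def shared_protospacers(homeologs: Iterable[str]) -> Set[str]:
    """Protospacers found, with a PAM, in every homeolog sequence."""
    sets = [protospacers(s) for s in homeologs]
    return set.intersection(*sets)

# Hypothetical sequences standing in for the A, B, and D homeologs of one gene.
CORE = "ACGTTGCAAGCTTGACCATG"  # conserved 20-nt site
homeolog_a = "TTATC" + CORE + "AGG" + "TTCATTCATTCATTCATTCA" + "TGG"
homeolog_b = "CCTAA" + CORE + "TGG" + "ACTA"
homeolog_d = "GATTC" + CORE + "CGG" + "TAAC"

shared = shared_protospacers([homeolog_a, homeolog_b, homeolog_d])
print(shared)
```

Here the A copy carries a second, copy-specific site that is filtered out by the intersection. A fuller search would also scan the reverse complement and tolerate PAM-distal mismatches.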

Optimizing Biolistic and Ribonucleoprotein Delivery

Tanaka et al. (2022) systematically optimized biolistic (gene gun) parameters for CRISPR delivery in wheat, testing gold particle size, DNA coating ratios, bombardment pressure, and target tissue age. Their optimized protocol achieved a 3.4-fold improvement in mutation efficiency compared to standard conditions and demonstrated efficient editing in the cultivar Fielder using immature embryo-derived callus. Kumagai et al. (2021), by contrast, pursued an entirely different strategy: direct delivery of Cas9-sgRNA ribonucleoprotein (RNP) complexes to wheat embryos in planta, bypassing tissue culture entirely. The in planta approach yielded low but reproducible editing rates (1–3% of treated embryos), with the significant advantage that edited plants are by definition transgene-free. The two studies present a trade-off that is central to current wheat editing: biolistic delivery of DNA constructs achieves higher efficiency but carries the risk of transgene integration, while RNP delivery is inherently transgene-free but currently too inefficient for routine use. Whether RNP delivery efficiency can be improved through electroporation or nanoparticle-assisted uptake remains an active area of investigation.
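The practical cost of a 1–3% editing rate can be estimated with a short calculation. If each treated embryo is edited independently with probability p, the number of embryos needed to recover at least one edited plant with a given confidence is n = ceil(ln(1 - confidence) / ln(1 - p)). The sketch below applies this formula; the independence assumption, the 95% confidence level, and especially the 10% fraction of edited embryos assumed to carry edits in all three homeologs are illustrative values, not figures reported by Kumagai et al. (2021).

```python
from math import ceil, log

def embryos_needed(rate: float, confidence: float = 0.95) -> int:
    """Embryos to treat so that P(at least one success) >= confidence.

    Assumes each embryo succeeds independently with probability `rate`.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(1.0 - rate))

low_rate_n = embryos_needed(0.01)   # lower end of the reported 1-3% range
high_rate_n = embryos_needed(0.03)  # upper end of the reported range

# Hypothetical: suppose only 10% of edited embryos carry edits in all three homeologs.
triple_fraction = 0.10
triple_n = embryos_needed(0.01 * triple_fraction)

print(low_rate_n, high_rate_n, triple_n)
```

Under these assumptions, moving from any edit to a simultaneous tri-homeolog edit raises the screening burden from a few hundred embryos to a few thousand, which is the sense in which the current rates are impractical for routine breeding use.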

Base Editing Across Cereals

Gaillochet et al. (2022) addressed the challenge of precision mutations — single-nucleotide changes rather than knockouts — through systematic optimization of Cas12a-linked base editors in wheat and maize. Using their ITER (Iterative Testing and Evolution of Reagents) platform, the authors tested 96 base-editor variants across multiple target sites, identifying configurations that achieved C-to-T conversion rates of up to 40% in wheat protoplasts. In contrast to Cas9-based knockouts, base editing can introduce specific gain-of-function or regulatory mutations, making it a more versatile tool for trait improvement. The study's limitation is that protoplast-to-plant regeneration was not demonstrated for the optimized editors, leaving open the question of whether high protoplast editing rates translate to whole-plant outcomes. The ITER platform itself, however, represents a generalizable approach for rapid reagent optimization that could accelerate editing in other recalcitrant crop species.
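A C-to-T conversion rate such as the 40% figure above is typically a per-position fraction of sequencing reads. The sketch below computes that fraction for every cytosine in a target amplicon from reads that are assumed to be already aligned, gap-free, and the same length as the reference. The sequences are invented and the function is a simplified illustration, not the analysis used in the ITER study.

```python
from typing import Dict, List

def c_to_t_rates(reference: str, reads: List[str]) -> Dict[int, float]:
    """Fraction of reads showing T at each reference cytosine (0-based positions).

    Reads of the wrong length are skipped; bases other than C or T at a
    cytosine position are treated as uninformative.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    rates = {}
    for pos, base in enumerate(reference):
        if base != "C":
            continue
        informative = [r[pos] for r in usable if r[pos] in "CT"]
        if informative:
            rates[pos] = informative.count("T") / len(informative)
    return rates

# Hypothetical amplicon and aligned reads.
reference = "GACCTA"
reads = ["GACCTA", "GATCTA", "GATCTA", "GACCTA", "GATTTA"]

rates = c_to_t_rates(reference, reads)
print(rates)
```

Reporting rates per position rather than per read makes it possible to see which cytosines within a target site are preferentially converted, which matters when only one of several neighbouring changes is desired.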

Open Questions and Future Directions

Several critical questions emerge from this synthesis. First, can transgene-free editing methods — RNP delivery (Kumagai et al., 2021) or transient viral expression (Wang et al., 2022) — be scaled to efficiencies sufficient for breeding programs, or will stable transformation remain the workhorse? The 1–3% efficiency of in planta RNP delivery is currently impractical for targets requiring simultaneous tri-homeolog editing. Second, how should base editing be deployed for complex, quantitative traits in wheat? Gaillochet et al. (2022) demonstrated the feasibility of precise single-nucleotide changes, but identifying which nucleotide changes will improve yield or quality requires extensive prior functional genomics data that is still sparse for wheat. Third, the regulatory landscape for genome-edited crops varies dramatically by jurisdiction; whether transgene-free approaches will receive expedited regulatory approval in major wheat-producing nations is an open policy question with direct implications for technology adoption. Fourth, delivery innovations from outside plant biology — including lipid nanoparticles and cell-penetrating peptides — have shown promise in animal systems and are beginning to be tested in plant protoplasts; whether these can be adapted for in planta delivery in cereals deserves systematic evaluation.

Authors	Year	Key Finding	System	Approach
Wang et al.	2022	BSMV-delivered sgRNAs enable multiplexed editing in Cas9 wheat	Wheat (Cas9 line)	Viral guide delivery
Tanaka et al.	2022	Optimized biolistics achieve 3.4-fold improvement in editing	Wheat cv. Fielder	Biolistic parameter optimization
Kumagai et al.	2021	In planta RNP delivery yields transgene-free edits at 1–3%	Wheat embryos	Ribonucleoprotein in planta
Gaillochet et al.	2022	ITER platform optimizes Cas12a base editors to 40% C-to-T in protoplasts	Wheat + maize protoplasts	Base editor screening (ITER)

Papers Referenced

Multiplexed promoter and gene editing in wheat using the virus-based guide RNA delivery system

Wang W, Yu Z, He F, Bai G, Trick HN, Akhunova A, Akhunov E · 2022-04-06

Improvement of Gene Delivery and Mutation Efficiency in the CRISPR-Cas9 Wheat Genomics System via Biolistics

Tanaka J, Minkenberg B, Poddar S, Staskawicz B, Cho M-J · 2022-04-10

In planta genome editing with CRISPR/Cas9 ribonucleoproteins

Kumagai Y, Liu Y, Hamada H, Luo W, Zhu J, Kuroki M, et al. · 2021-06-24

Systematic optimization of Cas12a base editors in wheat and maize using the ITER platform

Gaillochet C, Pena Fernandez A, Goossens V, D'Halluin K, Jacobs TB, et al. · 2022-05-11

Arabidopsis — Single-Cell Transcriptomics

2020–2022 (pre-2023 data) · 4 papers

Review Article

Cellular Cartography of the Arabidopsis Root: Single-Cell Transcriptomics Reveals Developmental Trajectories, Regulatory Landscapes, and Pathogen Response Programs

Abstract

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology for dissecting the cellular heterogeneity of plant tissues. This review synthesizes four bioRxiv preprints (2020–2022) that apply scRNA-seq to the Arabidopsis root — the best-characterized plant organ at the anatomical and genetic levels. These studies collectively construct a high-resolution cell atlas of wild-type and mutant roots (Shahan et al., 2020), map the cis-regulatory logic underlying cell identity (Dorrity et al., 2020), resolve the transcriptional dynamics of lateral root initiation at single-cell resolution (Gala et al., 2020), and reveal cell-type-specific immune responses to bacterial pathogens (Zhu et al., 2022). Together, they establish a reference framework for plant single-cell biology while exposing the technical and biological limitations that must be addressed as the field matures.

Introduction

The Arabidopsis root is organized in a radially symmetric pattern of concentric cell layers — epidermis, cortex, endodermis, pericycle, and vasculature — each with distinct gene expression programs. This organization, combined with the accessibility of the root to live imaging and the wealth of cell-type-specific marker lines, has made it the primary testbed for single-cell genomics in plants. The first plant scRNA-seq studies appeared in 2019, profiling Arabidopsis root tips with droplet-based platforms (10x Genomics Chromium). By 2020–2022, the field had progressed from proof-of-concept cell-type classification to mechanistic questions: how are cell identities established and maintained? How do individual cells within a tissue respond differently to environmental signals?

The transition from bulk to single-cell transcriptomics has been particularly revelatory in plants, but it comes with a caveat: protoplasting (the enzymatic removal of the cell wall required for droplet capture) introduces transcriptional artifacts that can confound interpretation. Each of the studies reviewed here addresses this challenge differently, and the degree to which protoplasting artifacts have been controlled remains an active debate. In this review, we evaluate how these four preprints collectively advance our understanding of root cell identity, developmental plasticity, and stress signaling at single-cell resolution.

Constructing the Root Cell Atlas

Shahan et al. (2020) generated the most comprehensive single-cell atlas of the Arabidopsis root to date, profiling over 110,000 cells from wild-type roots alongside roots of key cell-identity mutants (scarecrow, shortroot, and wooden leg). By comparing wild-type and mutant trajectories, the authors identified transcription factors whose loss redirected developmental trajectories rather than simply ablating cell types. For instance, scarecrow mutant cells adopted a hybrid cortex-endodermis identity rather than defaulting to an undifferentiated state, suggesting that cell fate specification in the root involves competitive interactions between regulatory programs rather than a simple linear hierarchy. The inclusion of mutant datasets substantially strengthens the atlas's utility as a reference, though the reliance on a single developmental time point (5-day-old seedlings) means that age-dependent changes in cell composition are not captured.
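One simple way to recognize a hybrid identity of the kind described for scarecrow mutant cells is to score each cell against marker-gene sets for both candidate fates and flag cells that score highly on more than one. The sketch below does this with mean marker expression and a fixed threshold. The marker names, expression values, and threshold are hypothetical, and the approach is a deliberately minimal stand-in for the trajectory analyses in Shahan et al. (2020).

```python
from statistics import mean
from typing import Dict, List

def identity_scores(cell: Dict[str, float],
                    marker_sets: Dict[str, List[str]]) -> Dict[str, float]:
    """Mean expression of each marker set in one cell (missing genes count as 0)."""
    return {name: mean(cell.get(g, 0.0) for g in genes)
            for name, genes in marker_sets.items()}

def call_identity(cell: Dict[str, float], marker_sets: Dict[str, List[str]],
                  threshold: float = 1.0) -> str:
    """Assign one identity, a hybrid of several, or 'unassigned'."""
    scores = identity_scores(cell, marker_sets)
    high = sorted(name for name, score in scores.items() if score >= threshold)
    if len(high) > 1:
        return "hybrid:" + "+".join(high)
    if len(high) == 1:
        return high[0]
    return "unassigned"

# Hypothetical marker sets and normalized expression values.
MARKERS = {
    "cortex": ["cortex_m1", "cortex_m2"],
    "endodermis": ["endo_m1", "endo_m2"],
}
wild_type_cell = {"cortex_m1": 2.5, "cortex_m2": 1.9, "endo_m1": 0.1}
mutant_cell = {"cortex_m1": 1.6, "cortex_m2": 1.2, "endo_m1": 1.4, "endo_m2": 1.1}

wt_call = call_identity(wild_type_cell, MARKERS)
mutant_call = call_identity(mutant_cell, MARKERS)
print(wt_call, mutant_call)
```

A fixed threshold is crude; in practice scores would be compared against a background distribution, and doublets (two cells captured in one droplet) would need to be excluded before interpreting a dual signature as a genuine hybrid state.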

Regulatory Landscapes Underlying Cell Identity

Dorrity et al. (2020) complemented the transcriptomic atlas with an analysis of chromatin accessibility at single-cell resolution, using single-cell ATAC-seq (scATAC-seq) in Arabidopsis roots. By integrating scRNA-seq and scATAC-seq data, the authors identified cell-type-specific cis-regulatory elements and predicted transcription factor binding events driving cell differentiation. The analysis revealed that epidermal and vascular cells maintain highly distinct regulatory landscapes, with trichoblasts (root-hair-forming epidermal cells) showing a strikingly open chromatin configuration at loci associated with tip growth and nutrient uptake. However, the correlation between chromatin accessibility and gene expression was imperfect — approximately 30% of differentially accessible regions did not correspond to differentially expressed nearby genes — highlighting the complexity of gene regulatory logic and the limitations of inferring regulation from accessibility data alone. Whether the discrepant regions reflect poised enhancers, long-range regulatory interactions, or technical noise remains to be determined.
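The accessibility–expression mismatch quoted above depends on how regions are paired with genes. A common simplification is to assign each differentially accessible region to the gene with the nearest transcription start site and ask whether that gene is differentially expressed. The sketch below computes the discordant fraction under that rule; the coordinates, gene names, and the nearest-gene assignment itself are assumptions for illustration and may differ from the method used by Dorrity et al. (2020).

```python
from typing import Dict, List, Set

def nearest_gene(region_mid: int, tss: Dict[str, int]) -> str:
    """Gene whose transcription start site is closest to the region midpoint."""
    return min(tss, key=lambda gene: abs(tss[gene] - region_mid))

def discordant_fraction(da_regions: List[int], tss: Dict[str, int],
                        de_genes: Set[str]) -> float:
    """Share of differentially accessible regions whose nearest gene is not
    differentially expressed."""
    discordant = [r for r in da_regions if nearest_gene(r, tss) not in de_genes]
    return len(discordant) / len(da_regions)

# Hypothetical single-chromosome example (positions in bp).
TSS = {"g1": 1000, "g2": 5000, "g3": 9000}
DA_REGION_MIDPOINTS = [900, 1200, 4800, 5100, 8700, 9500, 5300, 1100, 4900, 9100]
DE_GENES = {"g1", "g2"}

fraction = discordant_fraction(DA_REGION_MIDPOINTS, TSS, DE_GENES)
print(fraction)
```

Because a nearest-gene rule cannot see long-range interactions, some apparent discordance may reflect the assignment rule rather than biology, which is one reason the interpretation of the mismatched regions remains open.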

Lateral Root Initiation at Single-Cell Resolution

Gala et al. (2020) applied scRNA-seq to a developmental process — lateral root initiation — that involves the reprogramming of pericycle founder cells into a new meristem. By profiling roots at multiple stages of lateral root development, the authors identified a transient progenitor population that expresses markers of both pericycle identity and stem cell activity. This intermediate state had been hypothesized from lineage tracing experiments but had never been captured transcriptomically. The study further showed that auxin-responsive genes are activated heterogeneously among pericycle cells, with only a subset responding to the lateral root initiation signal. This finding is consistent with a stochastic competence model in which pericycle cells must be in a permissive transcriptional state to respond to auxin. The temporal resolution of the study — profiling at 0, 6, and 24 hours post-induction — provides useful snapshots but may miss rapid transcriptional dynamics occurring in the first minutes to hours of lateral root initiation.

Cell-Type-Specific Immune Responses

Zhu et al. (2022) extended single-cell profiling from development to biotic stress, analyzing the Arabidopsis root response to Pseudomonas syringae infection. The study revealed that immune activation is not uniform across cell types: epidermal and cortical cells mounted the strongest transcriptional response, upregulating pathogen-associated molecular pattern (PAMP) receptor genes and defense-related metabolic pathways, while endodermal and vascular cells showed a delayed and attenuated response. This spatial heterogeneity in immune activation suggests that the outer cell layers serve as the primary immunological barrier in roots, consistent with their physical exposure to soil-borne pathogens. The study was limited to a single pathogen and a single time point post-infection; whether similar spatial patterns hold for fungal pathogens or across the time course of infection remains an open question.
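Differences in response strength between cell types can be summarized as a per-cell-type log2 fold change between infected and mock-treated cells. The sketch below computes this for one defense gene and ranks cell types by induction. The expression values, the pseudocount of 1, and the use of simple group means are illustrative assumptions rather than the analysis performed by Zhu et al. (2022).

```python
from math import log2
from statistics import mean
from typing import Dict, List, Tuple

def log2_fold_change(mock: List[float], infected: List[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression (infected over mock), with a pseudocount."""
    return log2((mean(infected) + pseudocount) / (mean(mock) + pseudocount))

def rank_cell_types(data: Dict[str, Dict[str, List[float]]]) -> List[Tuple[str, float]]:
    """Cell types sorted from strongest to weakest induction."""
    changes = {cell_type: log2_fold_change(v["mock"], v["infected"])
               for cell_type, v in data.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)

# Hypothetical normalized expression of one defense gene, per cell.
EXPRESSION = {
    "epidermis": {"mock": [1.0, 1.0, 1.0], "infected": [7.0, 7.0, 7.0]},
    "cortex": {"mock": [1.0, 1.0], "infected": [3.0, 3.0]},
    "endodermis": {"mock": [1.0, 1.0], "infected": [1.0, 1.0]},
}

ranked = rank_cell_types(EXPRESSION)
print(ranked)
```

With real data, each cell type would need enough cells in both conditions, and replicate-aware statistics, before a weaker fold change could be read as an attenuated response rather than sampling noise.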

Open Questions and Future Directions

The studies reviewed here collectively raise several questions that will shape the next phase of plant single-cell biology. First, how significant are protoplasting artifacts, and can they be systematically corrected? All four studies used enzymatic digestion to generate protoplasts, a process known to induce wound-response genes. Nucleus-based methods (single-nucleus RNA-seq) that avoid protoplasting are beginning to be applied in plants and may resolve this issue, though at the cost of losing cytoplasmic transcripts. Second, can single-cell approaches scale to crop species with larger, more complex genomes? Technical advances in combinatorial indexing and spatial transcriptomics (which avoids dissociation entirely) are promising but have not yet been widely adopted in plant systems. Third, the integration of scRNA-seq with scATAC-seq demonstrated by Dorrity et al. (2020) points toward multi-omic single-cell profiling; extending this to include protein (CyTOF), metabolite, or spatial dimensions would enable a truly systems-level understanding of cell identity. Such extensions are feasible with existing methods but require substantial investment in computational infrastructure and cross-disciplinary expertise.

Authors	Year	Key Finding	System	Approach
Shahan et al.	2020	110K-cell atlas with mutant trajectories reveals competitive fate specification	Arabidopsis root (WT + mutants)	scRNA-seq (10x Chromium)
Dorrity et al.	2020	scATAC-seq maps cell-type-specific cis-regulatory landscapes	Arabidopsis root	scATAC-seq + scRNA-seq integration
Gala et al.	2020	Transient progenitor state captured during lateral root initiation	Arabidopsis root (auxin-induced)	scRNA-seq time course
Zhu et al.	2022	Immune activation is spatially restricted to outer root cell layers	Arabidopsis root + P. syringae	scRNA-seq (pathogen challenge)

Papers Referenced

A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants

Shahan R, Hsu C-W, Nolan TM, Cole BJ, Taylor IW, Vlot AHC, Benfey PN, Ohler U · 2020-06-30

The regulatory landscape of Arabidopsis thaliana roots at single-cell resolution

Dorrity MW, Alexandre C, Hamm M, Vigil A-L, Fields S, Queitsch C, Cuperus J · 2020-07-17

A single cell view of the transcriptome during lateral root initiation in Arabidopsis thaliana

Gala HP, Lanctot AP, Jean-Baptiste K, Guiziou S, Chu JC, Zemke JE, et al. · 2020-10-03

Single-cell profiling of complex plant responses to Pseudomonas syringae infection

Zhu J, Lolle S, Tang A, Guel B, Kvikto B, Cole B, Coaker G · 2022-10-08