Sample Reports
The transition from single-reference to pangenomic frameworks is reshaping rice genetics. This review synthesizes three bioRxiv preprints (2020–2022) that collectively advance rice pangenomics through gap-free genome assembly, pangenome-scale genotyping arrays, and computational pipelines for functional gene identification. Together, these studies establish that structural variants — including presence-absence and copy-number variation — account for substantial phenotypic diversity missed by SNP-centric approaches. Whether these pangenomic resources can accelerate breeding for complex traits in polyploid and orphan cereals remains to be determined.
Rice (Oryza sativa) feeds more than half of the global population, and its compact diploid genome (~390 Mb) has made it the reference cereal for comparative genomics since the landmark IRGSP assembly in 2005. However, a single reference genome captures only a fraction of species-level diversity. Population resequencing studies have repeatedly shown that 5–15% of genes in any given accession are absent from the Nipponbare reference (Wang et al., 2018; Zhao et al., 2018), suggesting that agronomically important variation hides in the "dispensable" genome.
The pangenome concept — cataloguing all genomic sequences across a species — has gained traction as long-read sequencing costs have declined. In rice, early pangenome efforts used short reads to call presence-absence variants (PAVs), but these approaches struggled with repetitive regions and complex structural rearrangements. The preprints reviewed here represent a new wave: telomere-to-telomere (T2T) assemblies that resolve centromeres and heterochromatin (Song et al., 2021), array-based genotyping platforms that translate pangenomic knowledge into breeding tools (Daware et al., 2022), and computational frameworks that connect structural variants to gene function (Wang et al., 2022). In this review, we synthesize these advances to assess how pangenomics is transforming rice functional genomics and crop improvement.
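The distinction between core and dispensable genes can be made concrete with a small presence/absence matrix. The sketch below is illustrative only: the function name `classify_genes`, the `core_threshold` parameter, and the toy gene IDs are assumptions made for this example and are not taken from any of the reviewed preprints.

```python
from typing import Dict, List


def classify_genes(pav: Dict[str, List[int]],
                   core_threshold: float = 1.0) -> Dict[str, str]:
    """Label each gene 'core' or 'dispensable' from presence/absence calls.

    pav maps a gene ID to a list of 0/1 calls, one per accession (1 = present).
    core_threshold is the minimum fraction of accessions that must carry the
    gene for it to count as core (1.0 means present in every accession).
    """
    labels = {}
    for gene, calls in pav.items():
        if not calls:
            raise ValueError(f"no accession calls for {gene}")
        frequency = sum(calls) / len(calls)
        labels[gene] = "core" if frequency >= core_threshold else "dispensable"
    return labels


# Hypothetical toy matrix: four genes scored across four accessions.
toy_pav = {
    "geneA": [1, 1, 1, 1],
    "geneB": [1, 0, 1, 1],
    "geneC": [0, 0, 1, 0],
    "geneD": [1, 1, 1, 1],
}
print(classify_genes(toy_pav))
```

A relaxed threshold (for example `core_threshold=0.95`) is sometimes preferred in practice because a strict 100% rule lets a single missed call in one accession demote a gene to the dispensable class; the appropriate cutoff is a judgment call and is not specified by the source.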
Song et al. (2021) reported two gap-free reference genomes for xian/indica rice cultivars, achieving complete telomere-to-telomere coverage. Using a combination of PacBio HiFi reads and Hi-C scaffolding, the authors resolved all 12 centromeres — regions that had been collapsed or absent in previous assemblies. The centromeric regions were characterized by tandem arrays of the CentO satellite repeat interspersed with Ty3/gypsy retrotransposons, with centromere size varying from 0.5 to 3.2 Mb across chromosomes. This work demonstrated that centromeric and pericentromeric regions harbor lineage-specific structural variants and active genes that are invisible to short-read resequencing. The validation strategy, combining optical mapping with FISH cytogenetics, provides a high level of confidence in the assembly completeness, though whether the centromere architectures described here are representative of the broader indica gene pool remains to be determined with additional T2T assemblies.
Translating pangenomic knowledge into tools accessible to breeding programs was the focus of Daware et al. (2022), who developed the Rice Pan-genome Array (RPGA). This SNP array was designed using variant data from 4,726 diverse accessions spanning indica, japonica, aus, and wild rice groups. The array captures not only single-nucleotide polymorphisms but also tags for PAVs and structural variants identified through pangenome analysis. In the authors' evaluation, the RPGA achieved higher resolution of indica–japonica population structure than conventional 50K arrays and detected association signals at known loci for grain quality and disease resistance that previous SNP-only GWAS had missed. However, the array remains limited by the diversity of accessions used in its design, and rare variants in landraces or wild progenitors may still be underrepresented.
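One common way a SNP can "tag" a PAV is through linkage disequilibrium: a SNP whose genotypes track the presence/absence calls closely can stand in for the PAV on an array. The following is a minimal sketch of that idea, not the RPGA design procedure. It assumes inbred accessions with homozygous calls coded 0/1, and the names `r_squared`, `best_tag_snp`, and the `min_r2` cutoff of 0.8 are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Tuple


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker carries no tagging information
    return cov * cov / (var_x * var_y)


def best_tag_snp(pav: List[int],
                 snps: Dict[str, List[int]],
                 min_r2: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (snp_id, r2) for the SNP in strongest LD with the PAV.

    Returns None when no SNP reaches min_r2.
    """
    best = None
    for snp_id, genotypes in snps.items():
        r2 = r_squared(pav, genotypes)
        if r2 >= min_r2 and (best is None or r2 > best[1]):
            best = (snp_id, r2)
    return best


# Hypothetical calls for six accessions.
pav_calls = [1, 1, 0, 0, 1, 0]
nearby_snps = {
    "snp1": [1, 1, 0, 0, 1, 0],  # tracks the PAV perfectly
    "snp2": [1, 0, 0, 1, 1, 0],  # only weakly correlated
}
print(best_tag_snp(pav_calls, nearby_snps))
```

PAVs with no SNP above the cutoff would be untaggable under this scheme, which is one way to see why array-based genotyping can miss rare or lineage-specific structural variants.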
Wang et al. (2022) addressed a persistent bottleneck in pangenomics: connecting structural variants to gene function. Their PSVCP pipeline integrates variant calling, gene-level PAV detection, and expression quantitative trait locus (eQTL) mapping into a unified framework. Applied to a rice diversity panel, PSVCP identified 1,247 genes with significant PAV–expression associations, including several previously unannotated genes in dispensable genome segments. The approach provides a systematic way to prioritize candidate genes from pangenomic data, though validation of individual candidates through reverse genetics will be necessary to confirm functional assignments. The pipeline's reliance on short-read data for variant calling means that complex structural variants — precisely the category most likely to be functionally relevant — may be undercalled.
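The core statistical question behind a PAV–expression association is whether accessions that carry a segment differ in expression from those that lack it. The sketch below illustrates that question with a simple permutation test on toy data. It is not the PSVCP implementation, whose internals are not described here; the function name `pav_expression_pvalue`, its parameters, and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import List


def pav_expression_pvalue(presence: List[int],
                          expression: List[float],
                          n_permutations: int = 2000,
                          seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in mean expression
    between accessions carrying (1) and lacking (0) a variant."""
    if len(presence) != len(expression):
        raise ValueError("presence and expression must have equal length")
    n_present = sum(presence)
    if n_present == 0 or n_present == len(presence):
        raise ValueError("need at least one carrier and one non-carrier")

    def mean_difference(labels: List[int]) -> float:
        carriers = [e for lab, e in zip(labels, expression) if lab == 1]
        others = [e for lab, e in zip(labels, expression) if lab == 0]
        return mean(carriers) - mean(others)

    observed = abs(mean_difference(presence))
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    shuffled = list(presence)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between PAV and expression
        if abs(mean_difference(shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: ten accessions, five carrying the segment.
presence = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
expression = [9.1, 8.7, 9.4, 8.9, 9.0, 2.1, 1.8, 2.5, 2.2, 1.9]
print(pav_expression_pvalue(presence, expression))
```

A genome-wide scan would repeat such a test across thousands of gene–variant pairs and would therefore need multiple-testing correction and control for population structure, neither of which this sketch attempts.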
Three open questions follow from these studies. First, how representative are current rice pangenomes of the full species diversity? The T2T assemblies of Song et al. (2021) cover only indica subtypes; extending this coverage to japonica, aus, and wild Oryza species is essential. Graph-based pangenome references that incorporate hundreds of assemblies are technically feasible with current sequencing costs but require substantial computational infrastructure. Second, can pangenome-aware GWAS improve the dissection of the genetic architecture of complex traits such as yield and drought tolerance? The RPGA platform (Daware et al., 2022) provides a foundation, but large-scale field phenotyping across environments will be needed. Third, the functional validation bottleneck identified by Wang et al. (2022), namely linking PAVs to phenotypes, calls for high-throughput reverse genetics approaches such as CRISPR-based screens targeting dispensable genes.
| Authors | Year | Key Finding | System | Approach |
|---|---|---|---|---|
| Song et al. | 2021 | Gap-free T2T indica genomes resolve centromere architecture | Rice (xian/indica) | PacBio HiFi + Hi-C + optical mapping |
| Daware et al. | 2022 | Pangenome-aware SNP array captures PAV-tagged associations | 4,726 rice accessions | Array design + GWAS |
| Wang et al. | 2022 | PSVCP pipeline identifies 1,247 genes with PAV–expression associations | Rice diversity panel | Computational pipeline + eQTL |
CRISPR-based genome editing in wheat (Triticum aestivum) faces unique challenges arising from its large (~17 Gb), allohexaploid genome with three homeologous copies of most genes. This review synthesizes four bioRxiv preprints (2021–2022) that address critical bottlenecks in wheat genome editing: viral delivery of guide RNAs for multiplexed promoter and gene editing (Wang et al., 2022), biolistic optimization for improved mutation efficiency (Tanaka et al., 2022), in planta ribonucleoprotein delivery to bypass tissue culture (Kumagai et al., 2021), and systematic base-editor optimization across cereals (Gaillochet et al., 2022). Collectively, these studies are converging toward transgene-free, high-efficiency editing pipelines that could transform wheat improvement for grain quality, disease resistance, and climate adaptation.
Wheat is the most widely grown crop on Earth, providing approximately 20% of calories and protein consumed globally. Its allohexaploid genome — derived from three ancestral diploid species — means that functional redundancy among homeologs often masks loss-of-function mutations, requiring simultaneous editing of all three copies to achieve a phenotype. This constraint, combined with wheat's notoriously low transformation and regeneration efficiency, has made it one of the most challenging crops for genome editing.
The first successful CRISPR edits in wheat were reported in 2014, targeting simple genes with clear loss-of-function phenotypes. Since then, the field has expanded rapidly, driven by demand for reduced-allergen flour, lower-asparagine grain (to reduce acrylamide in baking), and durable disease resistance. However, three persistent bottlenecks have limited the translation of CRISPR from proof-of-concept to breeding: the inefficiency of Agrobacterium-mediated transformation in elite cultivars, the difficulty of multiplexed editing across homeologs, and regulatory concerns about transgene integration. The preprints reviewed here each tackle one or more of these barriers. In this review, we evaluate the strengths and limitations of these approaches and assess their collective implications for wheat precision breeding.
Wang et al. (2022) developed a system in which guide RNAs are delivered via a barley stripe mosaic virus (BSMV) vector to wheat plants constitutively expressing Cas9. This approach enables rapid testing of multiple sgRNAs without regenerating new transgenic lines for each target. The authors demonstrated simultaneous editing of promoter regions and coding sequences at two independent loci, achieving heritable edits in T1 progeny. The viral delivery system is particularly suited for multiplexed experiments because new guides can be cloned into the BSMV backbone in days rather than the months required for stable transformation. However, the requirement for a pre-existing Cas9-expressing line limits the system's applicability to cultivars where such lines are available. Whether viral delivery can achieve the editing efficiencies needed for simultaneous knockout of all three homeologs at a single locus — a common breeding objective — was not directly tested.
Tanaka et al. (2022) systematically optimized biolistic (gene gun) parameters for CRISPR delivery in wheat, testing gold particle size, DNA coating ratios, bombardment pressure, and target tissue age. Their optimized protocol achieved a 3.4-fold improvement in mutation efficiency compared to standard conditions and, critically, demonstrated efficient editing in the recalcitrant cultivar Fielder using immature embryo-derived callus. Kumagai et al. (2021) pursued an entirely different strategy: direct delivery of Cas9-sgRNA ribonucleoprotein (RNP) complexes to wheat embryos in planta, bypassing tissue culture entirely. The in planta approach yielded low but reproducible editing rates (1–3% of treated embryos), with the significant advantage that edited plants are by definition transgene-free. The two studies present a trade-off that is central to current wheat editing: biolistic delivery of DNA constructs achieves higher efficiency but carries the risk of transgene integration, while RNP delivery is inherently transgene-free but currently too inefficient for routine use. Whether RNP delivery efficiency can be improved through electroporation or nanoparticle-assisted uptake remains an active area of investigation.
Gaillochet et al. (2022) addressed the challenge of precision mutations (single-nucleotide changes rather than knockouts) through systematic optimization of Cas12a-linked base editors in wheat and maize. Using their ITER (Iterative Testing of Editing Reagents) platform, the authors tested 96 base-editor variants across multiple target sites, identifying configurations that achieved C-to-T conversion rates of up to 40% in wheat protoplasts. In contrast to Cas9-based knockouts, base editing can introduce specific gain-of-function or regulatory mutations, making it a more versatile tool for trait improvement. The study's limitation is that protoplast-to-plant regeneration was not demonstrated for the optimized editors, leaving open the question of whether high protoplast editing rates translate to whole-plant outcomes. The ITER platform itself, however, represents a generalizable approach for rapid reagent optimization that could accelerate editing in other recalcitrant crop species.
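A C-to-T conversion rate of the kind quoted above is, at its simplest, the fraction of sequencing reads that carry T at a position where the reference has C. The sketch below shows that calculation under strong simplifying assumptions: reads are already aligned to the amplicon, are in the reference orientation, and contain no indels. The function name `c_to_t_rate` and the toy sequences are hypothetical, and this is not a description of how the ITER platform scores editing.

```python
from typing import Iterable


def c_to_t_rate(reads: Iterable[str], reference: str, position: int) -> float:
    """Fraction of informative reads with T at a reference C.

    position is a 0-based index into the reference amplicon. Reads that do
    not cover the position, or that show a base other than C or T there,
    are ignored.
    """
    if reference[position] != "C":
        raise ValueError("reference base at the target position is not C")
    informative = 0
    converted = 0
    for read in reads:
        if len(read) <= position:
            continue  # read does not reach the target base
        base = read[position]
        if base in ("C", "T"):
            informative += 1
            if base == "T":
                converted += 1
    if informative == 0:
        raise ValueError("no informative reads at the target position")
    return converted / informative


# Hypothetical amplicon with the target C at index 2.
reference = "GACCTG"
reads = ["GATCTG", "GACCTG", "GACCTG", "GACCTG"]
print(c_to_t_rate(reads, reference, 2))  # 0.25 (one of four reads converted)
```

Real amplicon analyses also have to handle sequencing error, indels, and bystander edits at neighboring cytosines, so dedicated tools are normally used in place of a per-position count like this one.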
Several critical questions emerge from this synthesis. First, can transgene-free editing methods — RNP delivery (Kumagai et al., 2021) or transient viral expression (Wang et al., 2022) — be scaled to efficiencies sufficient for breeding programs, or will stable transformation remain the workhorse? The 1–3% efficiency of in planta RNP delivery is currently impractical for targets requiring simultaneous tri-homeolog editing. Second, how should base editing be deployed for complex, quantitative traits in wheat? Gaillochet et al. (2022) demonstrated the feasibility of precise single-nucleotide changes, but identifying which nucleotide changes will improve yield or quality requires extensive prior functional genomics data that is still sparse for wheat. Third, the regulatory landscape for genome-edited crops varies dramatically by jurisdiction; whether transgene-free approaches will receive expedited regulatory approval in major wheat-producing nations is an open policy question with direct implications for technology adoption. Fourth, delivery innovations from outside plant biology — including lipid nanoparticles and cell-penetrating peptides — have shown promise in animal systems and are beginning to be tested in plant protoplasts; whether these can be adapted for in planta delivery in cereals deserves systematic evaluation.
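The claim that low per-event editing rates make tri-homeolog editing impractical can be illustrated with back-of-the-envelope arithmetic. The sketch below makes two assumptions that the source does not state and that may not hold: it treats the reported 1–3% as if it were a per-homeolog editing probability, and it treats the three homeologs as being edited independently. The function names are hypothetical.

```python
import math


def triple_edit_probability(per_homeolog_rate: float) -> float:
    """Probability that all three homeologs are edited in one embryo,
    ASSUMING independent editing at an equal rate for each homeolog."""
    return per_homeolog_rate ** 3


def embryos_needed(per_embryo_success: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence, i.e. the number of
    embryos to treat to see at least one success with the given confidence."""
    if not 0 < per_embryo_success < 1:
        raise ValueError("per_embryo_success must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - per_embryo_success))


# Single-target case at the upper end of the reported range.
print(embryos_needed(0.03))  # 99

# Triple-homeolog case under the independence assumption.
print(embryos_needed(triple_edit_probability(0.03)))
```

Under these assumptions, a single edit at a 3% rate needs about a hundred embryos for a 95% chance of at least one success, whereas a simultaneous triple edit needs on the order of 10^5. If editing at the three homeologs is positively correlated within an embryo, as it may be when one RNP dose reaches one cell, the true requirement would be lower than this independence-based estimate.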
| Authors | Year | Key Finding | System | Approach |
|---|---|---|---|---|
| Wang et al. | 2022 | BSMV-delivered sgRNAs enable multiplexed editing in Cas9 wheat | Wheat (Cas9 line) | Viral guide delivery |
| Tanaka et al. | 2022 | Optimized biolistics achieve 3.4-fold improvement in editing | Wheat cv. Fielder | Biolistic parameter optimization |
| Kumagai et al. | 2021 | In planta RNP delivery yields transgene-free edits at 1–3% | Wheat embryos | Ribonucleoprotein in planta |
| Gaillochet et al. | 2022 | ITER platform optimizes Cas12a base editors to up to 40% C-to-T conversion in protoplasts | Wheat + maize protoplasts | Base editor screening (ITER) |
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology for dissecting the cellular heterogeneity of plant tissues. This review synthesizes four bioRxiv preprints (2020–2022) that apply scRNA-seq to the Arabidopsis root — the best-characterized plant organ at the anatomical and genetic levels. These studies collectively construct a high-resolution cell atlas of wild-type and mutant roots (Shahan et al., 2020), map the cis-regulatory logic underlying cell identity (Dorrity et al., 2020), resolve the transcriptional dynamics of lateral root initiation at single-cell resolution (Gala et al., 2020), and reveal cell-type-specific immune responses to bacterial pathogens (Zhu et al., 2022). Together, they establish a reference framework for plant single-cell biology while exposing the technical and biological limitations that must be addressed as the field matures.
The Arabidopsis root is organized in a radially symmetric pattern of concentric cell layers — epidermis, cortex, endodermis, pericycle, and vasculature — each with distinct gene expression programs. This organization, combined with the accessibility of the root to live imaging and the wealth of cell-type-specific marker lines, has made it the primary testbed for single-cell genomics in plants. The first plant scRNA-seq studies appeared in 2019, profiling Arabidopsis root tips with droplet-based platforms (10x Genomics Chromium). By 2020–2022, the field had progressed from proof-of-concept cell-type classification to mechanistic questions: how are cell identities established and maintained? How do individual cells within a tissue respond differently to environmental signals?
The transition from bulk to single-cell transcriptomics has been revelatory but also technically fraught in plants, because protoplasting (the enzymatic removal of the cell wall required for droplet capture) introduces transcriptional artifacts that confound interpretation. Each of the studies reviewed here addresses this challenge differently, and the degree to which protoplasting artifacts have been controlled remains an active debate. In this review, we evaluate how these four preprints collectively advance our understanding of root cell identity, developmental plasticity, and stress signaling at single-cell resolution.
Shahan et al. (2020) generated the most comprehensive single-cell atlas of the Arabidopsis root to date, profiling over 110,000 cells from wild-type roots alongside roots of key cell-identity mutants (scarecrow, shortroot, and wooden leg). By comparing wild-type and mutant trajectories, the authors identified transcription factors whose loss redirected developmental trajectories rather than simply ablating cell types. For instance, scarecrow mutant cells adopted a hybrid cortex-endodermis identity rather than defaulting to an undifferentiated state, suggesting that cell fate specification in the root involves competitive interactions between regulatory programs rather than a simple linear hierarchy. The inclusion of mutant datasets substantially strengthens the atlas's utility as a reference, though the reliance on a single developmental time point (5-day-old seedlings) means that age-dependent changes in cell composition are not captured.
Dorrity et al. (2020) complemented the transcriptomic atlas with an analysis of chromatin accessibility at single-cell resolution, using single-cell ATAC-seq (scATAC-seq) in Arabidopsis roots. By integrating scRNA-seq and scATAC-seq data, the authors identified cell-type-specific cis-regulatory elements and predicted transcription factor binding events driving cell differentiation. The analysis revealed that epidermal and vascular cells maintain highly distinct regulatory landscapes, with trichoblasts (root-hair-forming epidermal cells) showing a strikingly open chromatin configuration at loci associated with tip growth and nutrient uptake. However, the correlation between chromatin accessibility and gene expression was imperfect — approximately 30% of differentially accessible regions did not correspond to differentially expressed nearby genes — highlighting the complexity of gene regulatory logic and the limitations of inferring regulation from accessibility data alone. Whether the discrepant regions reflect poised enhancers, long-range regulatory interactions, or technical noise remains to be determined.
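A discordance figure like the roughly 30% quoted above depends on how accessible regions are paired with genes. The sketch below shows one simple pairing rule, nearest transcription start site within a fixed window, and computes the fraction of differentially accessible regions without a differentially expressed partner. The rule, the 2 kb window, the function names, and the toy coordinates are assumptions for illustration and are not taken from Dorrity et al.

```python
from typing import Dict, Optional, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, position in bp)


def nearest_gene(region: Locus,
                 gene_tss: Dict[str, Locus],
                 max_distance: int) -> Optional[str]:
    """Return the gene whose TSS is closest to the region midpoint on the
    same chromosome, or None if no TSS lies within max_distance."""
    chrom, midpoint = region
    best_gene = None
    best_distance = max_distance + 1
    for gene, (gene_chrom, tss) in gene_tss.items():
        distance = abs(tss - midpoint)
        if gene_chrom == chrom and distance < best_distance:
            best_gene, best_distance = gene, distance
    return best_gene


def discordant_fraction(da_regions: Dict[str, Locus],
                        gene_tss: Dict[str, Locus],
                        de_genes: Set[str],
                        max_distance: int = 2000) -> float:
    """Fraction of differentially accessible regions whose nearest gene is
    missing (outside the window) or not differentially expressed."""
    if not da_regions:
        raise ValueError("no differentially accessible regions supplied")
    discordant = 0
    for region in da_regions.values():
        gene = nearest_gene(region, gene_tss, max_distance)
        if gene is None or gene not in de_genes:
            discordant += 1
    return discordant / len(da_regions)


# Hypothetical coordinates for three genes and four accessible regions.
gene_tss = {"g1": ("Chr1", 1000), "g2": ("Chr1", 9000), "g3": ("Chr2", 5000)}
da_regions = {
    "r1": ("Chr1", 1200),   # near g1, which is differentially expressed
    "r2": ("Chr1", 8500),   # near g2, which is not
    "r3": ("Chr2", 5100),   # near g3, which is differentially expressed
    "r4": ("Chr2", 90000),  # no gene within the window
}
print(discordant_fraction(da_regions, gene_tss, {"g1", "g3"}))  # 0.5
```

Region r4 shows why the pairing rule matters: a distal element counts as discordant here simply because nearest-gene assignment cannot represent long-range regulation, which is one of the explanations the paragraph above raises.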
Gala et al. (2020) applied scRNA-seq to a developmental process — lateral root initiation — that involves the reprogramming of pericycle founder cells into a new meristem. By profiling roots at multiple stages of lateral root development, the authors identified a transient progenitor population that expresses markers of both pericycle identity and stem cell activity. This intermediate state had been hypothesized from lineage tracing experiments but had never been captured transcriptomically. The study further showed that auxin-responsive genes are activated heterogeneously among pericycle cells, with only a subset responding to the lateral root initiation signal. This finding is consistent with a stochastic competence model in which pericycle cells must be in a permissive transcriptional state to respond to auxin. The temporal resolution of the study — profiling at 0, 6, and 24 hours post-induction — provides useful snapshots but may miss rapid transcriptional dynamics occurring in the first minutes to hours of lateral root initiation.
Zhu et al. (2022) extended single-cell profiling from development to biotic stress, analyzing the Arabidopsis root response to Pseudomonas syringae infection. The study revealed that immune activation is not uniform across cell types: epidermal and cortical cells mounted the strongest transcriptional response, upregulating pathogen-associated molecular pattern (PAMP) receptor genes and defense-related metabolic pathways, while endodermal and vascular cells showed a delayed and attenuated response. This spatial heterogeneity in immune activation suggests that the outer cell layers serve as the primary immunological barrier in roots, consistent with their physical exposure to soil-borne pathogens. The study was limited to a single pathogen and a single time point post-infection; whether similar spatial patterns hold for fungal pathogens or across the time course of infection remains an open question.
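Cell-type-specific responses of this kind are typically summarized by comparing treated and control cells within each annotated cell type. The sketch below computes a per-cell-type log2 fold change for a single gene from toy data. It is a simplified illustration and not the analysis used by Zhu et al.; the function name, the `mock`/`infected` labels, the pseudocount, and the expression values are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str, float]  # (cell_type, condition, expression value)


def log2_fold_change_by_cell_type(cells: Iterable[Cell],
                                  pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((mean infected + pseudocount) / (mean mock + pseudocount)) for
    each cell type that has cells in both conditions."""
    grouped = defaultdict(lambda: {"mock": [], "infected": []})
    for cell_type, condition, value in cells:
        if condition not in ("mock", "infected"):
            raise ValueError(f"unknown condition: {condition}")
        grouped[cell_type][condition].append(value)

    result = {}
    for cell_type, groups in grouped.items():
        if not groups["mock"] or not groups["infected"]:
            continue  # cannot compare without both conditions
        numerator = mean(groups["infected"]) + pseudocount
        denominator = mean(groups["mock"]) + pseudocount
        result[cell_type] = math.log2(numerator / denominator)
    return result


# Hypothetical expression of one defense gene in individual cells.
toy_cells = [
    ("epidermis", "mock", 1.0), ("epidermis", "mock", 1.0),
    ("epidermis", "infected", 7.0), ("epidermis", "infected", 7.0),
    ("vasculature", "mock", 3.0), ("vasculature", "mock", 3.0),
    ("vasculature", "infected", 3.0), ("vasculature", "infected", 3.0),
]
print(log2_fold_change_by_cell_type(toy_cells))
```

In this toy example the epidermis shows a fourfold induction (log2 fold change of 2) and the vasculature shows none, mirroring the outer-versus-inner contrast described above. A real analysis would add replicate-aware statistics rather than relying on means of individual cells.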
The studies reviewed here collectively raise several questions that will shape the next phase of plant single-cell biology. First, how significant are protoplasting artifacts, and can they be systematically corrected? All four studies used enzymatic digestion to generate protoplasts, a process known to induce wound-response genes. Nucleus-based methods (single-nucleus RNA-seq) that avoid protoplasting are beginning to be applied in plants and may resolve this issue, though at the cost of losing cytoplasmic transcripts. Second, can single-cell approaches scale to crop species with larger, more complex genomes? Technical advances in combinatorial indexing and spatial transcriptomics (which avoids dissociation entirely) are promising but have not yet been widely adopted in plant systems. Third, the integration of scRNA-seq with scATAC-seq demonstrated by Dorrity et al. (2020) points toward multi-omic single-cell profiling; extending this to include protein (CyTOF), metabolite, or spatial dimensions would enable a truly systems-level understanding of cell identity. These extensions are feasible with existing methods but require substantial investment in computational infrastructure and cross-disciplinary expertise.
| Authors | Year | Key Finding | System | Approach |
|---|---|---|---|---|
| Shahan et al. | 2020 | 110K-cell atlas with mutant trajectories reveals competitive fate specification | Arabidopsis root (WT + mutants) | scRNA-seq (10x Chromium) |
| Dorrity et al. | 2020 | scATAC-seq maps cell-type-specific cis-regulatory landscapes | Arabidopsis root | scATAC-seq + scRNA-seq integration |
| Gala et al. | 2020 | Transient progenitor state captured during lateral root initiation | Arabidopsis root (auxin-induced) | scRNA-seq time course |
| Zhu et al. | 2022 | Immune activation is strongest in outer root cell layers; inner layers respond later and more weakly | Arabidopsis root + P. syringae | scRNA-seq (pathogen challenge) |