Earth BioGenome Project Hong Kong: Symposium 2022 & Workshop 2023

The Earth BioGenome Project: Hong Kong (EBPHK) is a joint effort of eight local universities funded by the Hong Kong University Grants Committee (UGC). Led by Prof. Jerome Hui of CUHK, the project aims to sequence the genomes of animals, plants, and fungi found in the territory with a state-of-the-art genome sequencer and to form a local network of biodiversity genomics research hubs.

The Earth BioGenome Project, which has been described as a moonshot project for biology, aims to sequence, catalogue, and analyse the genomes of all eukaryotes on Earth, including animals, fungi, and plants. Similar initiatives have already started in different parts of the world, including the Darwin Tree of Life Project in the UK, which aims to sequence all eukaryotes in the country in the first phase.

Revealing the genomes of all animals, fungi, and plants in different parts of the world, including Hong Kong, will form an informative base for solving many current issues in human society. The benefits could range from a better understanding of how biodiversity is evolving under climate change, the conservation of endangered species, and the provision of ecosystem services, to the discovery of hidden biological knowledge for new technological inventions and development.

Under this project, a symposium was held on 30 August 2022 to review the progress and roadmap. Following the installation of Asia’s first PacBio Sequel IIe at CUHK, a virtual training workshop on the use of this sequencing technology was conducted on 7 January 2023. The technology provides highly accurate long-read genome sequencing that could not be attained with previous technologies, and it will open up many more possibilities for genomic research.

EBPHK Symposium 2022

Find out more information about the Symposium 2022 here.

EBPHK Workshop 2023

Find student feedback and more information about the Workshop 2023 here.

Sharing from Participants of the Training Workshop

Ms. Yi Fei Yu, PhD student in Biology, CUHK

This workshop provides an introduction to SMRT sequencing (Single Molecule, Real-Time Sequencing). SMRT sequencing has the advantages of long reads, high accuracy, single-molecule resolution, uniform coverage and simultaneous epigenetic detection. The data can be divided into two types: CLR (continuous long read) data, where a consensus sequence is generated from multiple individual reads, and CCS (circular consensus sequencing) data, where a consensus sequence is generated from multiple passes (subreads) of the same DNA molecule. It uses the SMRTbell template structure, so essentially the same basic protocol steps can be applied to generate libraries across all fragment size ranges. The training covers every step of the process, from sample preparation and sequencing to data analysis.

Taking the tutorial on computer operation as an example, the explanation is very detailed and covers different perspectives, from the reagents used and the various pages of the operating interface to the definition of each term. There are examples and annotations for the diagrams that appear in each module. It also suggests some parameters that can be used as references for judging whether the sample meets the required standard, whether the sample has been completely and successfully analyzed, and whether the standard sample is up to standard. For some error conditions, or when the output does not meet the standard, there are relevant graphics showing the low output area. In short, this introduction is straightforward and comprehensive. I am very grateful for this opportunity to understand and become familiar with this sequencing technology.
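
To illustrate the idea of a CCS consensus built from multiple passes of the same molecule, here is a minimal Python sketch. It is only a toy: it assumes the subreads are already aligned and of equal length, and it uses a simple per-position majority vote. The function name `toy_consensus` and the example sequences are hypothetical, and this is not how the actual PacBio software computes consensus reads.

```python
from collections import Counter


def toy_consensus(subreads):
    """Return a per-position majority-vote consensus.

    Assumption: all subreads are pre-aligned and have the same length.
    Real consensus calling also handles insertions, deletions and
    quality values, which this sketch ignores.
    """
    if not subreads:
        raise ValueError("at least one subread is required")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("subreads must all have the same length")

    consensus = []
    # zip(*subreads) yields one tuple of bases per alignment column.
    for column in zip(*subreads):
        # most_common(1) returns the most frequent base in this column;
        # on a tie, the base seen first in the column wins.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule, each with at most
# one random error at a different position.
passes = [
    "ACGTTAGC",
    "ACGTAAGC",
    "ACCTTAGC",
    "ACGTTAGC",
    "ACGTTTGC",
]
print(toy_consensus(passes))
```

Because the errors in each pass fall at different positions, the vote at every column recovers the underlying base, which is the intuition behind why more passes give higher accuracy.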


Mr. Cody Wong, MPhil student in Biology, CUHK

PacBio technology is changing the game in genomics by delivering highly accurate long-read genome sequencing that previous technologies were unable to achieve. This was made possible by the three key technological innovations of PacBio: zero-mode waveguides for sensitive single-molecule detection, phospholinked fluorescent nucleotide analogues for natural polymerase cleavage, and the SMRT Cell for simultaneous sequencing.

This virtual training workshop covered the complete workflow of genome sequencing with PacBio, from an overview of the technology through library preparation and sequencing to data analysis with the SMRT Link software.

The procedure for sample preparation and evaluation is straightforward, with the help of various DNA extraction kits if needed. In particular, PacBio provides not only the recommended protocol on its website, but also a collection of publications by the scientific community describing extraction protocols for high-molecular-weight DNA, which is the prerequisite for successful PacBio sequencing. This documentation comes in handy, especially when dealing with difficult or unknown samples in large-scale species sequencing projects like the Darwin Tree of Life and EBPHK.

Subsequent sample setup, instrument operation, sequencing, de novo assembly, and data analysis can all be performed with the open-source and freely available SMRT Link software. The software comes with a user interface, and many tasks can be automated, for example the on-instrument analysis pipeline and the run QC report and metrics, enabling data analysis even by bench scientists. It is exciting to know that there are already community contributions of analysis pipelines, for example on GitHub, from scientists around the world.

Overall, the workflow of PacBio sequencing was very well documented and well explained in the workshop. This workshop is a good start for any keen biologists and bioinformaticians to learn how to harness the power of this state-of-the-art genome sequencer and build connections with other researchers.