1. Wong B.S.H., Kim J.M., Fung S.H., Xiong Q., Ao K.F.K., Wei J., Wang R., Wang D.M., Zhou J., Feng B., Cheng A.S.L., Yip K.Y.#, Tsui S.K.W.#, and Cao Q.#. Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation. arXiv.
2. Fung S.H., Zhang Z., Wang R., Miao C., Wong B.S.H., Li K.Y., Hong C., Zhou J., Yip K.Y.#, Tsui S.K.W.#, and Cao Q.#. Can Large Language Models “Read” Biological Sequences? A Systematic Evaluation of In-Context Learning for Antibody Characterization. bioRxiv.
3. Li Q., Li K.Y., Nicoletti C., Tsui S.K.W., Puri P.L., Cao Q.#, and Yip K.Y.#. Overcoming Artificial Structures in Resolution-Enhanced Hi-C Data by Signal Decomposition and Multi-Scale Attention. bioRxiv.
4. Chow S.H.C., Shi C.H., Deshpande A., Cao Q., and Yip K.Y. Towards Universal Modeling of Transcript Isoform Expression Levels. bioRxiv.
5. Liao T., Chen S., Wang S., Huang Y., Tsui S.K.W., Stüeken E.E., Cao Q., and Luo H. (2026) Noncanonical genetic markers resolve the pre-GOE emergence of aerobic bacteria in Earth’s history. Proceedings of the National Academy of Sciences 123(4): e2515709123.
6. Miao C.*, Zhang Z.*, Chen J.*, Rebibo D., Wu H., Fung S.H., Cheng A.S.L., Tsui S.K.W., Sinha S., Cao Q.#, and Yip K.Y.#. (2025) Developing foundations for biomedical knowledgebases from literature using large language models – A systematic assessment. Computational and Structural Biotechnology Journal 27(1): 3299-3306.
7. Li K.Y., Cao Q., Chow S.H., Nicoletti C., Puri P.L., Wang H., Leung D., and Yip K.Y. (2025) Regulatory roles of three-dimensional structures of chromatin domains. Genome Biology 26(1): 184.
8. Wang R., Qian Y., Guo X., Song F., Xiong Z., Cai S., Bian X., Wong M.H., Cao Q.#, Cheng L.#, Lu G.#, and Leung K.S.#. (2025) STModule: identifying tissue modules to uncover spatial components and characteristics of transcriptomic landscapes. Genome Medicine 17(1): 18.
9. Chen W.*, Miao C.*, Zhang Z., Fung C.S.H., Wang R., Chen Y., Qian Y., Cheng L., Yip K.Y.#, Tsui S.K.W.#, and Cao Q.#. (2024) Commonly used software tools produce conflicting and overly-optimistic AUPRC values. Genome Biology 25(1): 118.
10. Hong C.*, Cao Q.*#, Zhang Z., Tsui S.K.W., and Yip K.Y.#. (2022) Reusability report: Capturing properties of biological objects and their relationships using graph neural networks. Nature Machine Intelligence 4: 222-226.
11. Qian Y.*, Zhai E.*, Chen S.*, Liu Y., Ma Y., Chen J., Liu J., Qin C., Cao Q.#, Chen J.#, and Cai S.#. (2022) Single-cell RNA-seq dissecting heterogeneity of tumor cells and comprehensive dynamics in tumor microenvironment during lymph nodes metastasis in gastric cancer. International Journal of Cancer 151(8): 1367-1381.
12. Cao Q.*, Zhang Z.*, Fu A.X., Wu Q., Lee T.L., Lo E., Cheng A.S.L., Cheng C., Leung D., and Yip K.Y. (2020) A unified framework for integrative study of heterogeneous gene regulatory mechanisms. Nature Machine Intelligence 2(8): 447-456.
13. Ho E.Y.K.*, Cao Q.*, Gu M., Chan R.W.L., Wu Q., Gerstein M., and Yip K.Y. (2020) Shaping the nebulous enhancer in the era of high-throughput assays and genome editing. Briefings in Bioinformatics 21(3): 836-850.
14. Cao Q., Anyansi C., Hu X., Xu L., Xiong L., Tang W., Mok M.T.S., Cheng C., Fan X., Gerstein M., Cheng A.S.L., and Yip K.Y. (2017) Reconstruction of enhancer–target networks in 935 samples of human primary cells, tissues and cell lines. Nature Genetics 49: 1428-1436.
15. Cao Q., and Yip K.Y. (2016) A survey of the computational methods for enhancers and enhancer-target predictions. Computational biology and bioinformatics: Gene regulation 3-27.
(*: co-first authors; #: co-corresponding authors)