Revolutionizing Pathology Image Annotation: A Paradigm Shift to a Universal Labeling and Annotation Protocol

Codatta
9 min read · Aug 5, 2024


TL;DR:

PathAI (Pathology Artificial Intelligence) is transforming diagnostics by addressing the global pathologist shortage and enhancing diagnostic accuracy. Startups like Paige.AI and others lead this AI revolution, but developing robust PathAI systems requires large volumes of annotated data. Crowdsourcing and blockchain offer solutions by distributing annotation tasks and ensuring data integrity. Public digital pathology datasets face challenges in annotation standards and reliability. A blockchain-based collaborative platform can enhance these datasets, ensuring fair compensation and quality control. Democratizing pathology image annotation is just the beginning, aiming to create a comprehensive and equitable healthcare ecosystem.

Pathology AI: Revolutionizing Diagnostics Amidst Critical Challenges

At the vanguard of medical innovation, pathology artificial intelligence (PathAI) is transforming diagnostic practices, offering a beacon of hope in addressing the global pathologist shortage while enhancing diagnostic accuracy. Startups such as Paige.AI, Aiforia, Ibex, and Owkin are leading this technological surge, with Paige Prostate notably receiving FDA clearance for clinical use, underscoring the potential of AI to reshape pathology and improve patient care (see Figure 1). These advancements position AI not only as a valuable tool for pathologists but also as a pivotal solution in the quest for equitable, high-quality diagnostic services worldwide.

Figure 1: Sensitivity for all three pathologists increased with Paige Prostate Alpha (average sensitivity without Paige Prostate Alpha: 74% ± 11%; with Paige Prostate Alpha: 90% ± 4%). Source: 10.1038/s41379-020-0551-y

The development of robust PathAI systems necessitates large volumes of annotated data due to the need to capture diverse pathological patterns, improve generalization, reduce bias, and enhance overall accuracy (Komura & Ishikawa, 2018). The current annotation process involves expert pathologists manually examining digital scans of pathology slides, delineating regions of interest, labeling specific structures, and grading diseases (Campanella et al., 2019). Despite its critical importance, the field continues to grapple with the trade-off between the quantity of annotated data required for robust AI models and the feasibility of producing such datasets given the limited availability of expert annotators.

Harnessing Crowdsourcing and Blockchain for Enhanced Annotation

To address these challenges, crowdsourcing and collaborative platforms have emerged as promising strategies to distribute the annotation workload and accelerate the creation of large-scale, high-quality annotated datasets (Alialy et al., 2020). These approaches leverage the collective intelligence of a wider pool of contributors, including medical students, hospital residents, and pathologists (see Figure 2). For instance, Grote and colleagues demonstrated that crowdsourcing can be effective for complex tasks such as labeling and delineating histopathological images: medical students, despite lacking specific domain knowledge, provided accurate annotations when given proper instruction and context (Grote et al., 2019). Complementing this, researchers have developed quality control mechanisms and hybrid models that combine crowdsourced annotations with expert validation (Ørting et al., 2020).
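The hybrid crowd-plus-expert pattern can be sketched in a few lines: aggregate redundant crowd labels per region, and escalate only low-consensus regions to an expert reviewer. This is a minimal illustration of the idea, not the method of any cited study; the function name and the 75% agreement threshold are hypothetical choices.

```python
from collections import Counter

def aggregate_labels(crowd_labels, agreement_threshold=0.75):
    """Majority-vote crowd labels for one region of interest.

    Returns (label, needs_expert_review): the winning label, plus a flag
    marking low-consensus regions for expert adjudication.
    """
    counts = Counter(crowd_labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(crowd_labels)
    return label, agreement < agreement_threshold

# Five students annotate the same patch; only 3/5 agree on "tumor",
# below the 75% threshold, so the region is routed to an expert.
label, needs_review = aggregate_labels(
    ["tumor", "tumor", "tumor", "stroma", "stroma"]
)
```

In practice the threshold would be tuned per task, since delineation tasks tolerate less disagreement than coarse region labeling.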

Figure 2: A crowdsourcing study of breast cancer digital pathology images. Participants including senior pathologists and non-pathologists used a web-based interface to annotate a number of tissue regions in each ROI including tumor, stroma, immune infiltration, and others. Source: 10.1038/s41598-021-90821-3

The traditional crowdsourcing model faces notable challenges, including ensuring data integrity, fairly incentivizing contributors, and maintaining traceability of annotations (Holmgren et al., 2020). In contrast, a blockchain-based platform offers not only fair financial incentives but also a tangible sense of societal contribution, allowing individuals to see their annotations directly contribute to clinical settings and patient care, with the promise of expanding to multimodal pathology datasets. By leveraging blockchain technology, these systems can ensure transparent and immutable records of annotation history, effectively resolving issues of version control and data provenance (Kuo et al., 2019). The decentralized nature of blockchain allows for a more democratic and inclusive annotation process, enabling experts from various institutions to contribute their insights securely (Zheng et al., 2019). The following table summarizes the challenges of traditional crowdsourcing and how a blockchain-based platform can help.
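The provenance guarantee rests on a simple mechanism: each annotation event commits to the hash of the previous one, making the full history tamper-evident. Below is a minimal sketch of that idea (field names are hypothetical, and a real platform would anchor these hashes on an actual blockchain rather than an in-memory list):

```python
import hashlib
import json

def record_annotation(chain, annotator_id, slide_id, annotation):
    """Append an annotation event to a hash-linked log.

    Each entry commits to the previous entry's hash, so any retroactive
    edit to the history invalidates every later hash.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {
        "annotator": annotator_id,
        "slide": slide_id,
        "annotation": annotation,
        "prev_hash": prev_hash,
    }
    # Canonical JSON serialization so the hash is reproducible.
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)
    return entry
```

Because each entry names its predecessor, version control and data provenance fall out for free: re-annotations are appended, never overwritten, and any auditor can replay the chain.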

Improving Digital Pathology Datasets with High-Quality Annotations

Nowadays, institutions are actively compiling extensive collections of digital pathology slides, either by incorporating slide scanning into their standard procedures or by converting existing slide archives to digital format. Public initiatives have bolstered this endeavor, amassing sizable collections across various disease types (see Figure 3), with the National Cancer Institute's The Cancer Genome Atlas (TCGA) Program the largest high-quality collection among them. More than 30 open-access datasets, comprising over 100,000 slides, are already available to the computational pathology community.

Figure 3: Large-scale datasets are made available by various institutions, laying the foundations for computational pathology to make a clinical impact. Source: 10.1038/s44222-023-00096-8

These public pathology datasets, while invaluable, often pose significant challenges for direct application in robust PathAI development. Diagnostic errors in subtyping, grading, and other metrics can undermine their reliability. A coherent annotation standard is also lacking: annotations vary from slide-level to region-level, patch-level, and pixel-level, with or without associated reports (most often without). Moreover, there is no systematic way to reflect the latest WHO guidelines or to enrich the datasets through augmentation, and this lack of an update mechanism is exacerbated by the absence of clear incentives for pathologists to invest time in corrections or augmentations. To overcome these issues, developing a blockchain-based collaborative platform for annotating and re-annotating public pathology datasets is both essential and urgent.

For example, the PANDA dataset is a significant public resource with over 11,000 whole-slide images of digitized H&E-stained biopsies from two centers, making it one of the largest public whole-slide image datasets available. Its label noise arises mainly from the use of medical students, rather than expert pathologists, to annotate the training set, which causes a significant performance drop for AI models trained on it (Singhal et al., 2022). Here, a blockchain-based annotation platform can enhance re-annotation of the PANDA dataset through a role-based system in which an expert (a senior uropathologist) verifies both the annotations shipped with the original dataset and new annotations submitted by more experienced contributors, building consensus (see Figure 4).
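One way such role-based consensus could work, shown here as a hypothetical sketch (the role weights and threshold are illustrative assumptions, not PANDA's or any real platform's scheme): weight each submitted grade by contributor seniority, and accept a label only once its cumulative weight matches that of a single senior uropathologist.

```python
# Hypothetical seniority weights; a real platform would calibrate these
# against measured annotator accuracy.
ROLE_WEIGHT = {
    "medical_student": 1,
    "resident": 2,
    "pathologist": 4,
    "uropathologist": 8,
}

def reach_consensus(submissions, min_weight=8):
    """Weight each (role, grade) submission by seniority; accept a grade
    only when its cumulative weight clears the bar set by one senior
    uropathologist, otherwise return None (no consensus yet)."""
    totals = {}
    for role, grade in submissions:
        totals[grade] = totals.get(grade, 0) + ROLE_WEIGHT[role]
    grade, weight = max(totals.items(), key=lambda kv: kv[1])
    return grade if weight >= min_weight else None
```

Under this scheme, three junior votes for a Gleason pattern are not enough on their own, but a single uropathologist's endorsement settles the label, mirroring the expert-verification flow described above.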

Figure 4: Example of a blockchain-based pathology image annotation process.

Conclusion

Democratizing the annotation and labeling of pathology images is just the first step in a broader vision in which annotation reaches beyond slides to the rich landscapes of multimodal data. Pathology image annotation exemplifies the broad spectrum of healthcare topics demanding high-quality annotated data. Such data will be invaluable not only for building AI models (expert LLMs) that match human experts' diagnostic accuracy but also for educating the next generation of human experts. Traditional annotation platforms have failed to provide the incentives and tools needed to attract healthcare experts to label data. Our collaborative efforts, by contrast, will create an expansive, easily navigable, inclusive-access database for AI modeling, medical education, and diagnostics, contributing to an equitable healthcare ecosystem¹.

Blockchain networks provide comprehensive incentive systems that fairly reward contributors of various skill levels, balancing anonymity and trustworthiness through reputation systems. Anonymity allows contributors to express opinions truthfully, without worrying about factors other than the data itself. Combined with privacy-preserving techniques (ZKP, FHE, TEE, etc.), this approach facilitates the development of intelligent systems without jeopardizing privacy, constructing training samples that pair personal data with expert labels. This ecosystem will address current needs and prepare us for the challenges and opportunities of future advances in high-performance pathology AI.
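As a small illustration of how a reputation system could balance anonymity and trustworthiness (the update rule and payout range below are hypothetical assumptions, not a description of any deployed network): a pseudonymous contributor's reputation rises when their labels agree with the final expert-verified consensus, and task payouts scale with that reputation.

```python
def update_reputation(rep, agreed_with_consensus, lr=0.1):
    """Exponential-moving-average reputation in [0, 1]: pseudonymous
    contributors gain standing by agreeing with the expert-verified
    consensus label, and lose it when they disagree."""
    signal = 1.0 if agreed_with_consensus else 0.0
    return (1 - lr) * rep + lr * signal

def reward(base_payout, rep):
    """Scale the per-task payout by reputation so skilled annotators
    earn more (hypothetical 0.5x-1.5x range)."""
    return base_payout * (0.5 + rep)
```

Because reputation attaches to a pseudonym rather than a real identity, contributors can disagree with prevailing opinion without reputational spillover outside the platform, which is exactly the honesty property the anonymity argument relies on.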

Glossary

  1. Equitable Healthcare Ecosystem: a comprehensive and inclusive network that democratizes the annotation and labeling of medical data, extending beyond traditional platforms to encompass multimodal data. This system ensures fair incentives and utilizes advanced technologies like blockchain for transparency, privacy-preserving techniques for data security, and collaborative efforts to create accessible and high-quality annotated databases.

About the Author

Dr. Daniel Wei is currently serving as Chief Scientist and Head of Healthcare at codatta, where he contributes his expertise in pathology AI and bioinformatics to the innovative team. He received his PhD from Carnegie Mellon University in 2015. Dr. Wei was the Chief Technology Officer at CoreOne Pathology Group, where he worked with world-renowned pathologists to develop AI models for tumor detection and subtyping. Before that, he co-founded a next-generation sequencing lab and developed bioinformatics pipelines for tumor analysis.

References:

  1. Albarqouni, S., Baur, C., Achilles, F., Belagiannis, V., Demirci, S., & Navab, N. (2016). AggNet: Deep learning from crowds for mitosis detection in breast cancer histology images. IEEE transactions on medical imaging, 35(5), 1313–1321.
  2. Alialy, R., Tavakkol, S., Tavakkol, E., Ghorbani-Aghbolagh, A., Ghaffari, A., Kim, S. H., & Shahabi, C. (2020). A review on the applications of crowdsourcing in human pathology. Journal of pathology informatics, 11.
  3. Bulten, W., Pinckaers, H., van Boven, H., Vink, R., de Bel, T., van Ginneken, B., … & Litjens, G. (2021). Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. The Lancet Oncology, 22(3), 402–411.
  4. Campanella, G., Hanna, M. G., Geneslaw, L., Miraflor, A., Werneck Krauss Silva, V., Busam, K. J., … & Fuchs, T. J. (2019). Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine, 25(8), 1301–1309.
  5. Funk, E., Riddell, J., Ankel, F., & Cabrera, D. (2018). Blockchain technology: a data framework to improve validity, trust, and accountability of information exchange in health professions education. Academic Medicine, 93(12), 1791–1794.
  6. Grote, A., Schaadt, N. S., Forestier, G., Wemmert, C., & Feuerhake, F. (2019). Crowdsourcing of Histological Image Labeling and Object Delineation by Medical Students. IEEE Transactions on Medical Imaging, 38(5), 1284–1294. DOI: 10.1109/TMI.2018.2883237.
  7. Holmgren, A. J., Apathy, N. C., & Adler-Milstein, J. (2020). Barriers to hospital electronic public health reporting and implications for the COVID-19 pandemic. Journal of the American Medical Informatics Association, 27(8), 1306–1309.
  8. Komura, D., & Ishikawa, S. (2018). Machine learning methods for histopathological image analysis. Computational and structural biotechnology journal, 16, 34–42.
  9. Kuo, T. T., Kim, H. E., & Ohno-Machado, L. (2019). Blockchain distributed ledger technologies for biomedical and health care applications. Journal of the American Medical Informatics Association, 26(5), 462–473.
  10. Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., … & Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical image analysis, 42, 60–88.
  11. Ørting, S. N., Doyle, A., van Hilten, A., Hirth, M., Inel, O., Madan, C. R., … & Cheplygina, V. (2020). A survey of crowdsourcing in medical image analysis. Human Computation, 7(1), 1–26.
  12. Singhal, N., Soni, S., Bonthu, S. et al. A deep learning system for prostate cancer diagnosis and grading in whole slide images of core needle biopsies. Sci Rep, 12, 3383 (2022).
  13. Zheng, X., Sun, S., Mukkamala, R. R., Vatrapu, R., & Ordieres-Meré, J. (2019). Accelerating health data sharing: A solution based on the internet of things and distributed ledger technologies. Journal of medical Internet research, 21(6), e13583.

About codatta

Codatta is a universal annotation and labeling platform that turns your intelligence into AI.

Our mission is to lower the barrier for AI development teams by providing inclusive access to quality data, facilitating AI advancement, and to empower individuals to contribute to AI development and enjoy long-lasting rewards for their critical contributions. We tackle challenges across various verticals, including crypto (account and user annotation), healthcare, and robotics. Our user-contributed data is on the right track to commercialization in areas like web3 ads, AML, and healthcare.

Stay Connected with codatta

Follow us on social media for the latest news, insights, and developments about our innovative projects. Join our growing community below and don’t forget to like, comment, and share our posts to help spread the word!

🌐 Website|🆇 Twitter|💬 Telegram|👾 Discord|📱App
