TRACK: AI Applied To Medical Imaging
Medical images are central to the diagnosis of many diseases, so leveraging AI tools to help clinicians "read" clinical images can dramatically increase both the efficiency and the accuracy of the process. We would like to see how AI can help extract information from medical images that represent a pathological tissue or an organ, for example by leveraging the open-source InnerEye Deep Learning Toolkit or the NVIDIA Clara Imaging toolkit.
Extraction of imaging features from chest CT of Lung Cancer patients receiving immunotherapy (Open Data Set): Cancer patients treated with immune checkpoint inhibitors (e.g. Pembrolizumab) may be divided into early progressors (ca. 55% of treated patients), who display aggravation or death even before radiological restaging, and durable responders (ca. 45% of treated patients), who show long-term disease control.
Being able to identify the two patient populations before the start of treatment would help modulate the therapy, and even exclude patients who may benefit more from other therapeutic regimens.
The aim of this hackathon challenge is to identify and automatically extract from chest CT specific signatures characteristic of the two patient populations, and to correlate these features with a prognostic prediction of the response to therapy. Features such as vascular infiltration, immunogenic infiltration, and tumor volume or density may be relevant; however, further features could be identified.
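As a starting point, two of the candidate features mentioned above (tumor volume and density) can be computed directly from a CT volume and a tumor mask. Below is a minimal sketch in plain NumPy; a full radiomics signature would more likely use a dedicated library such as pyradiomics. The array shapes, HU values and voxel spacing in the example are illustrative assumptions.

```python
import numpy as np

def first_order_features(ct_hu, mask, spacing_mm=(1.0, 1.0, 1.0)):
    """Simple first-order features from a CT volume (Hounsfield units)
    and a binary tumor mask: lesion volume and density statistics."""
    voxel_vol = float(np.prod(spacing_mm))        # mm^3 per voxel
    voxels = ct_hu[mask.astype(bool)]
    return {
        "volume_mm3": voxels.size * voxel_vol,
        "mean_hu": float(voxels.mean()),
        "std_hu": float(voxels.std()),
    }

# Synthetic example: lung parenchyma around -800 HU with a small
# soft-tissue-density "lesion" of 40 HU.
ct = np.full((10, 10, 10), -800.0)
ct[4:6, 4:6, 4:6] = 40.0
mask = np.zeros_like(ct, dtype=bool)
mask[4:6, 4:6, 4:6] = True
feats = first_order_features(ct, mask, spacing_mm=(0.7, 0.7, 1.25))
```

Features like these could then feed a classifier separating early progressors from durable responders.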
Relevant data sets:
SEER cancer incidence (https://seer.cancer.gov/data/)
BROAD Institute Cancer Program Datasets: Cancer Program Legacy Publication Resources (broadinstitute.org)
CT Medical Images (https://www.kaggle.com/kmader/siim-medical-images)
Simulation of transaortic valve flow and gradient using 3D CT cardiac models (Hospital Data Set): severe aortic stenosis is the most frequent valve disease in the elderly. Symptomatic patients with severe stenosis have a poor prognosis, and valve replacement is the only effective treatment. In the last decade, minimally invasive transcatheter aortic valve replacement (TAVR) has been proposed as a therapeutic strategy for patients with prohibitive surgical risk. Promising results led the FDA to approve TAVR for patients with severe AS at intermediate risk in 2016 and for low-risk patients in 2019. Computed tomography angiography is of pivotal importance for TAVR planning, providing information about the aortic annulus size, about the landing zone (distance to coronary ostia, aortic root dimension, angle, outflow tract dimension, calcification) and about the access site (transfemoral, radial, transapical). To correctly assess the valve annulus and its relationship with the surrounding structures, a dynamic acquisition of the entire cardiac cycle (0-90%) is required. This acquisition also allows extraction of information about myocardial volume changes across the cardiac cycle and about myocardial wall contractility and deformation.
The main determinants of patient symptoms are the severity of the reduction of valve leaflet opening (aortic valve area < 1 cm²), the severity of the flow gradient across it (mean gradient > 40 mmHg) and a peak aortic jet velocity > 4.0 m/s. Currently, CT can only provide morphological information, while functional flow information is derived from echocardiography, which in turn is potentially affected by several issues, including an unfavorable acoustic window and limited information about 3D structures with complex morphology.
The objective of the proposed hackathon challenge is to derive a simulation of transaortic valve flow and gradient using the 3D aortic valve, outflow tract and left ventricle chamber deformation across the cardiac cycle.
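For validating such a simulation, the standard hemodynamic formulas used in echocardiography give the target quantities quoted above: the simplified Bernoulli equation for the pressure gradient and the continuity equation for the valve area. A minimal sketch (the numeric inputs in the example are illustrative):

```python
def peak_gradient_mmhg(jet_velocity_ms):
    """Simplified Bernoulli equation: dP = 4 * v^2 (v in m/s, dP in mmHg)."""
    return 4.0 * jet_velocity_ms ** 2

def aortic_valve_area_cm2(lvot_area_cm2, lvot_vti_cm, av_vti_cm):
    """Continuity equation: flow through the LVOT equals flow through
    the valve, so AVA = (LVOT area * LVOT VTI) / AV VTI."""
    return lvot_area_cm2 * lvot_vti_cm / av_vti_cm

# A peak jet velocity of 4.0 m/s corresponds to a 64 mmHg peak gradient,
# i.e. right at the severe-stenosis threshold mentioned above.
gradient = peak_gradient_mmhg(4.0)
ava = aortic_valve_area_cm2(3.5, 20.0, 100.0)   # illustrative inputs
```

A CT-derived flow simulation (e.g. CFD on the segmented 3D geometry) could be checked against these echocardiographic reference quantities.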
Relevant data sets:
Calculation of the Cobb angles of scoliosis in coronal radiographs
The aim is to develop a tool able to estimate the positions of the landmarks of the thoracolumbar vertebrae visible in coronal (anteroposterior) radiographs of the spine of subjects suffering from adolescent idiopathic scoliosis.
A dataset including 481 coronal radiographs will be provided to the participants. Ground truth data, i.e. the coordinates of all landmarks, will be provided as well.
To test the performance of the tools created by the participants, a set of 128 images will be provided to the participants, who will be asked to calculate the coordinates of all landmarks as well as the Cobb angles based on a reference script.
The metric for the evaluation of the results is the average error in the calculated Cobb angles.
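The Cobb angle itself follows from the landmark coordinates by simple geometry: each endplate's inclination is the angle of the line through its two corner landmarks, and the Cobb angle is the largest inclination difference between any two endplates of the curve. A minimal sketch (the coordinate format is an assumption; the official evaluation uses the provided reference script):

```python
import math

def endplate_angle_deg(p_left, p_right):
    """Inclination (degrees) of an endplate given its two corner landmarks."""
    dx, dy = p_right[0] - p_left[0], p_right[1] - p_left[1]
    return math.degrees(math.atan2(dy, dx))

def cobb_angle_deg(endplates):
    """Cobb angle: the largest inclination difference between any two
    endplates, which selects the most tilted vertebrae of the curve.
    `endplates` is a list of ((x_left, y_left), (x_right, y_right)) pairs."""
    angles = [endplate_angle_deg(left, right) for left, right in endplates]
    return max(abs(a - b) for a in angles for b in angles)

# Toy example: one horizontal endplate and one tilted by 45 degrees.
angle = cobb_angle_deg([((0, 0), (1, 0)), ((0, 0), (1, 1))])
```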
Figure caption: The Cobb angle describing the severity of a scoliotic curve.
Relevant data sets:
There are various publicly available datasets for tackling the "Localization of vertebral landmarks in biplanar radiographs of the trunk of patients suffering from spinal deformities" task; here is the dataset which the customer has, without the labels: https://www.dropbox.com/sh/su1n8avktk371r6/AABraUsf65AwjmYZri5QKp1Ha?dl=0
The same material has already been used for another challenge in 2019: https://aasce19.github.io/#results-submission You should mention the source if you use the images without the labels; to access the original data (with labels) you need to register.
Feature extraction from pathological tissue images (Hospital Data Set): the analysis of pathological tissue samples consists of several steps. One of the first is the macroscopic description, in which, among other things, the tissue dimensions are measured. This is done by hand and usually takes a lot of time. The objective of the proposed hackathon challenge is to find an AI-powered way to extract the tissue dimensions from an unlabeled dataset.
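As a baseline before any learned model, the tissue dimensions can be approximated by thresholding the photo and measuring the bounding box of the non-background region, given a known image scale. A minimal NumPy sketch; the threshold value and the pixels-per-millimetre factor (e.g. obtained from a ruler or calibration marker in the frame) are assumptions:

```python
import numpy as np

def tissue_dimensions_mm(gray, px_per_mm, background_thresh=200):
    """Estimate tissue width/height from a grayscale photo: pixels darker
    than the bright background are treated as tissue, and the bounding box
    of that region is converted to millimetres via the known image scale."""
    tissue = gray < background_thresh
    rows = np.any(tissue, axis=1)
    cols = np.any(tissue, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return ((c1 - c0 + 1) / px_per_mm, (r1 - r0 + 1) / px_per_mm)

# Synthetic photo: bright background with a dark rectangular "specimen".
gray = np.full((100, 100), 255, dtype=np.uint8)
gray[10:30, 10:50] = 50
width_mm, height_mm = tissue_dimensions_mm(gray, px_per_mm=2.0)
```

Real specimen photos have uneven lighting and irregular shapes, so a segmentation model would likely replace the fixed threshold, but the bounding-box-plus-scale step stays the same.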
Relevant data sets: available per direct request to Goran Vuksic via Discord
Medical imaging is a central tool for the diagnosis of diseases. In recent years, many AI algorithms have been developed to tackle specific diagnostic questions. While the principle that AI algorithms can detect and segment findings and anatomical structures in medical imaging data with high levels of accuracy is now well established, the integration into clinical workflows remains a true challenge. However, if this challenge is not overcome, AI will not have an impact on patient outcomes. This hackathon challenge is all about exploring creative and efficient ways to produce AI results and feed them into clinical workflows, and not stopping halfway after having demonstrated high levels of accuracy of the AI algorithm. A sample dataset and a simple clinical question (wrist fracture: yes/no) are provided, but you can also make use of additional or other public datasets. And always keep in mind: communication is the key!
"We agreed that it would be nice to have a creative component in it, and it would be boring to come up with another "detect/segment X on Y" challenge only to find out that the accuracy is 91 or 94 or 99%. Basically, the fact that AI can detect and segment on images is known. Much more interesting, in my opinion, is to explore ways to overcome the challenges of communicating AI results to the radiologist/clinician. If this remains unsolved, AI will not have an impact on patient outcome. There are many options (desktop app, push messages to mobile phones, integration into primary systems) and I would really like to see creative prototypes/ideas from the teams. And then work with the winners for three months at USB."
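To make the workflow-integration idea concrete: whatever the channel (desktop app, push message, worklist in a primary system), the raw model output has to become a structured, human-readable payload with a routing decision. A minimal sketch; the field names and the confidence threshold are illustrative assumptions, not a real hospital API:

```python
def fracture_alert(patient_id, probability, threshold=0.8):
    """Turn a raw model probability into a clinician-facing payload:
    confident findings trigger a push notification, the rest are filed
    as low-priority worklist items."""
    urgent = probability >= threshold
    return {
        "patient_id": patient_id,
        "finding": "suspected wrist fracture",
        "confidence": round(probability, 2),
        "channel": "push" if urgent else "worklist",
        "message": (f"AI flagged a suspected wrist fracture "
                    f"(confidence {probability:.0%}). Please review."),
    }

alert = fracture_alert("P-001", 0.93)
```

A prototype could serve such payloads from the FastAPI stack already listed in this document and deliver them over whichever channel the team chooses.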
Relevant data sets: available per direct request to Goran Vuksic via Discord
TRACK: Predictive Medicine
To be able to predict what will happen in the future, to either an individual patient or a population, clinicians need to analyze and combine a huge variety of data, both structured and unstructured. We would like to see, for example, how it is possible to obtain predictive models of disease evolution, epidemiological models, information on the response to the various treatments applied, knowledge of the virus for the creation of a vaccine, and sociodemographic data on the impact of the COVID-19 virus on the population, or to predict the likelihood of a rare disease.
Leveraging Clinical Data to Advance the Clinical Management of COVID-19 (Hospital Data Set): use of open data sets containing anonymized EHR data from thousands of COVID-19 patients to advance our knowledge, prediction, treatment and overall understanding of COVID-19.
The objective of the proposed hackathon challenge is to obtain predictive models of disease evolution, epidemiological models, information on the response to the various treatments applied, knowledge of the virus for the creation of a vaccine, and sociodemographic data on the impact of the virus on the population, by analyzing a combined dataset of various hospitals: integrating data from two different medical providers, quality-checking the data, identifying and correcting biases, etc.
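Integrating the two providers' exports starts with mapping their differing field names onto a shared schema and flagging implausible values before any modeling. A minimal sketch; the field names in the mapping tables are hypothetical placeholders, not the providers' actual column names:

```python
# Hypothetical field mappings -- the real export formats of the two
# providers will differ and must be inspected first.
SANITAS_MAP = {"edad": "age", "sexo": "sex", "temp_c": "temperature_c"}
HM_MAP = {"EDAD": "age", "SEXO": "sex", "TEMP_ING": "temperature_c"}

def harmonize(record, mapping):
    """Rename provider-specific fields to a shared schema; unmapped
    fields are dropped rather than silently mixed in."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def quality_check(record, limits={"age": (0, 120), "temperature_c": (30.0, 45.0)}):
    """Replace physiologically implausible values with None so that
    downstream models can treat them as missing."""
    return {k: (v if k not in limits or limits[k][0] <= v <= limits[k][1] else None)
            for k, v in record.items()}

combined = [quality_check(harmonize(r, SANITAS_MAP))
            for r in [{"edad": 70, "sexo": "M", "centro": "X"}]]
```

The same two steps run over both providers' records yield one consistent table for the predictive models.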
Relevant data sets:
Data request forms:
- Sanitas: request form
- HM Hospitales: https://www.hmhospitales.com/coronavirus/covid-data-save-lives/english-version (When requesting data, please include your personal data and project data. In the gray fields, referring to official ID documents, include your personal IDs.)
Rare disease predictor (publicly available datasets). On average, the time to diagnosis for rare diseases is 5 years, full of questions, specialists, misdiagnoses, unnecessary surgeries and failed treatments. In essence, the rare disease problem is not a good fit for healthcare systems designed for the majority. Many algorithms have been created to predict the correct rare disease based on the patient phenotype (set of symptoms). There is, however, no published algorithm that leverages the advances and representation capabilities of modern AI technology. The most commonly used algorithms rely heavily on the topology of ontological structures to compute similarities between symptoms and extrapolate them to the disease/patient level. Furthermore, they are based on data aggregated at the disease level, which removes vital information about symptom co-occurrence and variability. The main goal of this project is to build a predictive model capable of quickly inferring the likely disease of a new patient, taking symptoms, sex and age into account. The model should take data from individual patients as input and should be able to handle missing data when, for example, the patient's age is not available. The model development will be mainly based on simulated data. The model should have representations of the Human Phenotype Ontology (HPO) terms not directly associated with a rare disease in the current databases, as terms are added to the HPO ontology faster than they are annotated to the disease databases. This goal is a large unmet need in the global rare disease community, especially in those areas where large reference centers are not accessible for families. The secondary goal of this project is that the algorithm should be quick enough to be used in a what-if system, allowing users to quickly evaluate hypothetical scenarios of the patient developing symptom s at age y.
The tertiary goal is to leverage the concept representation in this model to suggest what other symptoms a new patient may have. The suggestions should be the most clinically useful towards predicting the correct diagnosis. Such suggestions would also play a vital role in increasing the granularity and richness of the patient's input phenotype, which should, in turn, increase the predictive power of the disease prediction algorithm.
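For a concrete baseline, a naive-Bayes-style scorer over disease-level symptom frequencies illustrates both the input format (HPO terms plus optional sex and age) and the missing-data handling the challenge asks for; note that this is exactly the kind of aggregated-frequency approach the challenge aims to improve upon. The disease entries and frequencies below are toy values, not taken from any real annotation database:

```python
import math

def disease_log_score(patient_hpo, patient_sex, disease):
    """Log-score of a candidate disease given the observed HPO terms and
    (optionally) sex. Fields that are missing for this patient are simply
    skipped, which is how incomplete records are tolerated."""
    score = math.log(disease["prevalence"])
    for term in patient_hpo:
        # Unseen terms get a small floor probability instead of zero.
        score += math.log(disease["symptom_freq"].get(term, 1e-4))
    if patient_sex is not None and "sex_freq" in disease:
        score += math.log(disease["sex_freq"].get(patient_sex, 0.5))
    return score

def rank_diseases(patient_hpo, patient_sex, diseases):
    return sorted(diseases, reverse=True,
                  key=lambda d: disease_log_score(patient_hpo, patient_sex, d))

# Toy knowledge base (frequencies are made up for illustration).
diseases = [
    {"name": "Disease A", "prevalence": 1e-5,
     "symptom_freq": {"HP:0001250": 0.9, "HP:0001263": 0.7}},
    {"name": "Disease B", "prevalence": 1e-5,
     "symptom_freq": {"HP:0001250": 0.05}},
]
ranked = rank_diseases(["HP:0001250"], None, diseases)
```

Because scoring is a single pass over the candidate diseases, this style of model is also fast enough for the what-if system described above.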
Relevant data sets: https://github.com/foundation29org/RareCrowds
TRACK: Empower Patients
Patients often struggle to understand their medical reports or other medical documents because they don’t have the necessary knowledge to understand the terminology. This can leave patients discouraged and in a state of helplessness, and may add an additional level of desperation on top of an already stressful situation. It may also make it harder for healthcare workers to have meaningful discussions about different treatment options with their patients.
We would like to see how technology can help in simplifying the complexity of medical terminology and turn it into everyday language, for example by taking the SumMed.org community project to the next level.
We also would like to see how technology can help with the topic of "trust", e.g. how can patients assess the trustworthiness of medical information (like research papers, brochures, health magazine articles…) they find on the internet.
Finally, we would like to see contributions that support more accessibility and multi-language support, especially for languages other than English. Many NLP models today are based on English corpora / language models. How can we support, for example, German, Italian or French native patients, both on the input side (e.g. the medical report/text to analyze is in a non-English language) and on the output side (e.g. the results need to be translated into the patient's native language)?
"Translate" or illustrate a medical document from an input source (like an MRI scan document, research article, web page, audio transcription, etc.) into "plain" everyday language understandable by patients.
Some ideas: Apply NLP and other ML techniques, such as automatic summarization (extractive / abstractive), use "visual storytelling", provide additional meaningful and trustworthy references based on the input etc.
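As one of the simplest starting points among the ideas above, extractive summarization can be prototyped in a few lines by scoring sentences with corpus word frequencies. A toy sketch in plain Python; a real system would use the spaCy/transformer tooling listed in the resources below:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "for", "with"}

def summarize(text, n_sentences=2):
    """Tiny extractive summarizer: score each sentence by the average
    corpus-level frequency of its non-stopword words and keep the
    top-scoring sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

report = "MRI scan shows a lesion. The lesion is benign. Weather is nice."
summary = summarize(report, n_sentences=1)
```

Swapping this scorer for an abstractive model, or pairing it with a jargon-to-plain-language dictionary, would be natural next steps toward the SumMed.org prototype.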
Think about how we can provide a great user experience for patients, and how to make it more accessible for people with different reading abilities, non-English speakers and other patient groups with special requirements.
Everything that helps patients and others to better and faster "understand" a medical document and its implications will be considered for this challenge. Ideally, we would like to see contributions that could be integrated into the SumMed.org Open Source project to help us progress this initiative. Please have a look at the current state of the prototype, to avoid simply re-inventing what has already been done: https://vimeo.com/476680206
Existing SumMed.org Open Source prototype
- SumMed.org Homepage: https://summed.org
- GitHub Repository - API: https://github.com/whatchacallit/medjargonbuster-api
- GitHub Repository - Web frontend: https://github.com/whatchacallit/medjargonbuster-webapp
Tech stack is React / Typescript + ANT Design for the web frontend, and Python/FastAPI/SpaCy 2.x/… for the API backend.
Example data sets
Some documents for testing:
- Breast Cancer Screening Recommendations: African American Women Are at a Disadvantage (web article or PDF) https://academic.oup.com/jbi/article/2/5/416/5901429?guestAccessKey=289c5c3e-d63c-4a04-8963-c9ac392d1705
- Clinical Guideline (english, pdf): https://github.com/whatchacallit/medjargonbuster-api/raw/main/test-documents/research_papers/simple.pdf
- German surgery report (anonymized, scan/photo): https://github.com/whatchacallit/medjargonbuster-api/raw/main/test-documents/clinical_reports/de-OP-Bericht-001.jpeg
- German MRI report (anonymized, Word document): https://github.com/whatchacallit/medjargonbuster-api/blob/main/test-documents/clinical_reports/de-report01.docx?raw=true
- Research paper (english, PDF): https://www.ajronline.org/doi/pdf/10.2214/AJR.10.6157
Potentially interesting research
- TLDR: Extreme Summarization Of Scientific Documents https://github.com/whatchacallit/medjargonbuster-api/raw/main/test-documents/research_papers/2004.15011.pdf
- Explainable Automated Fact-Checking for Public Health Claims https://www.aclweb.org/anthology/2020.emnlp-main.623
- Med7: a transferable clinical natural language processing model for electronic health records https://arxiv.org/abs/2003.01271
- MedSpacy: https://github.com/medspacy/medspacy
Medical APIs and databases
Publication / pre-print DBs
Medical dictionary (Merriam-Webster)
Some useful Open Source libraries and resources
- spaCy: https://spacy.io/
- Apache Tika: https://tika.apache.org/
- FastAPI: https://fastapi.tiangolo.com/
- Overview of different Text Summarization approaches and code examples: https://www.machinelearningplus.com/nlp/text-summarization-approaches-nlp-example/
Some useful Microsoft Azure services
- Azure Text Analytics for Health: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health?tabs=ner (contact us for demo access, it's still in preview)
- Azure Computer Vision OCR: https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text
- Azure Immersive Reader (client side component for text reading and comprehension, text-to-speech and language translation): https://docs.microsoft.com/en-us/azure/cognitive-services/immersive-reader/
- Azure Web App for Containers (run containers on Azure): https://azure.microsoft.com/en-us/services/app-service/containers/#overview
- Azure Functions (Serverless): https://docs.microsoft.com/en-us/azure/azure-functions/
- Bing custom search: https://www.customsearch.ai/