I have been exploring the potential of machine learning for identifying COVID-19 cases from X-rays. Code and results are available on GitHub.

Radiologists currently do not recommend CT scans for COVID-19 detection, as it is hard to tell COVID-19 apart from common lung infections such as bacterial or viral pneumonia [1][2]. AI could help us pick up these fine distinctions in a matter of seconds. This machine learning model achieves 93.5% ± 5% accuracy in identifying COVID-19 and pneumonia cases from X-rays.

While the results are promising, this is merely a preliminary experiment and is not suitable for clinical diagnostics because of the limited, crowdsourced dataset. We will need to concentrate our efforts on sourcing a quality dataset before such tools can be considered ready for clinical trials. I believe such a tool will be a valuable supplement to understaffed radiology departments.

This experiment was inspired by a study in China that used CT scans to train a machine learning model to screen patients for COVID-19. CT scans are probably a better basis for a dataset, since COVID-19 signs such as pulmonary nodules are easier to see in CT scans than in X-rays. Unfortunately, the publicly available CT data is even more limited. One study even suggests that CT scans are better for diagnosis than RT-PCR tests.

Automating the evaluation of CT scans and X-rays for clinical diagnosis (or even just helping radiologists evaluate them faster) would enable mass screening and help us get out of this crisis.

I am looking for collaborators interested in building a clinical dataset or improving the model's performance and safety. Do reach out to me.

Dataset

  • We're only using anteroposterior X-rays
  • X-rays of COVID-19 patients are crowdsourced
  • Normal and pneumonia X-rays are selected from cohorts of pediatric patients from Guangzhou Women and Children’s Medical Center (available on Kaggle)
  • Since the available COVID-19 dataset is small, I made sure the class distribution is balanced, i.e. an equal number of X-rays for normal, pneumonia and COVID-19 cases
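
One plausible way to do this is to keep only AP images and then downsample the larger normal and pneumonia sets to the size of the smallest (COVID-19) set. This sketch assumes a hypothetical record format of (filename, label, view) tuples, with the projection stored as the string "AP". That format does not reflect the actual metadata of either source dataset, and random undersampling is not necessarily the method used here.

```python
import random
from collections import defaultdict

# Hypothetical record format: (filename, label, view), where label is
# "normal", "pneumonia" or "covid" and view is the projection, e.g. "AP".
Record = tuple[str, str, str]


def select_ap(records: list[Record]) -> list[Record]:
    """Keep only anteroposterior (AP) images."""
    return [r for r in records if r[2] == "AP"]


def balance_by_undersampling(records: list[Record], seed: int = 0) -> list[Record]:
    """Randomly downsample every class to the size of the smallest class."""
    by_label: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        by_label[rec[1]].append(rec)
    if not by_label:
        return []
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    balanced: list[Record] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    return balanced


# Toy example: COVID-19 is the smallest class, and one lateral image is dropped.
records = (
    [(f"covid_{i}.png", "covid", "AP") for i in range(3)]
    + [(f"normal_{i}.png", "normal", "AP") for i in range(10)]
    + [(f"pneu_{i}.png", "pneumonia", "AP") for i in range(8)]
    + [("pneu_lateral.png", "pneumonia", "L")]
)
balanced = balance_by_undersampling(select_ap(records))
print(len(balanced))
```

Undersampling leaves most of the larger pediatric sets unused. Oversampling or class weighting are alternatives that keep every image, at the cost of repeating or up-weighting the scarce COVID-19 class.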

Future work

  • Experiment with EfficientNet and DenseNet
  • Integrate image segmentation
  • Fine-tune the model and improve dataset quality