Integrating a 2D medical imaging X-ray classifier that was trained on JPEGs with a script that receives DCM files, so that it can diagnose DICOM files

I will present my problem in the following order:

  • First, I will show the .py script that I use to run the web app on localhost (a Flask app). The web app is a classifier that reports whether a person has Viral Pneumonia, Bacterial Pneumonia, or is Normal. There are thus three classes (Viral, Bacterial, or Normal), predicted from chest X-rays in JPEG format.
  • Second, I will show a different .py script, for binary classification of Pneumonia, which takes in raw DICOM files and converts them into numpy arrays before they are diagnosed.

To achieve a diagnosis, I am trying to integrate my app.py script, which takes in JPEGs, with the binary Pneumonia classification script, which takes in DICOM files. The aim is to reuse the DICOM processing function of the second script while keeping all of the information and weights of my Viral and Bacterial classifier, so that it can be used in a clinical setup. Clinical setups use DICOM files, not JPEGs, which is why I am trying to combine these two scripts.

Below is my app.py script for Viral and Bacterial Pneumonia classification, which takes in JPEGs. This is the script I am trying to integrate with the other script attached further below:

#::: Import modules and packages :::
# Flask utils
from flask import Flask, redirect, url_for, request, render_template
from werkzeug.utils import secure_filename
from gevent.pywsgi import WSGIServer

# Import Keras dependencies
from tensorflow.keras.models import model_from_json
from tensorflow.python.framework import ops
ops.reset_default_graph()
from keras.preprocessing import image

# Import other dependencies
import numpy as np
import h5py
from PIL import Image
import PIL
import os

#::: Flask App Engine :::
# Define a Flask app
app = Flask(__name__)

# ::: Prepare Keras Model :::
# Model files
MODEL_ARCHITECTURE = './model/model_adam.json'
MODEL_WEIGHTS = './model/model_100_eopchs_adam_20190807.h5'

# Load the model from external files
json_file = open(MODEL_ARCHITECTURE)
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)

# Get weights into the model
model.load_weights(MODEL_WEIGHTS)
print('Model loaded. Check http://127.0.0.1:5000/')


# ::: MODEL FUNCTIONS :::
def model_predict(img_path, model):
    '''
        Args:
            -- img_path : an URL path where a given image is stored.
            -- model : a given Keras CNN model.
    '''

    IMG = image.load_img(img_path).convert('L')
    print(type(IMG))

    # Pre-processing the image
    IMG_ = IMG.resize((257, 342))
    print(type(IMG_))
    IMG_ = np.asarray(IMG_)
    print(IMG_.shape)
    IMG_ = np.true_divide(IMG_, 255)
    IMG_ = IMG_.reshape(1, 342, 257, 1)
    print(type(IMG_), IMG_.shape)

    print(model)

    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='rmsprop')
    predict_x = model.predict(IMG_)
    print(predict_x)
    prediction = np.argmax(predict_x,axis=1)
    print(prediction)

    return prediction


# ::: FLASK ROUTES
@app.route('/', methods=['GET'])
def index():
    # Main Page
    return render_template('index.html')

@app.route('/predict', methods=['GET', 'POST'])
def upload():

    # Constants:
    classes = {'TRAIN': ['BACTERIA', 'NORMAL', 'VIRUS'],
               'VALIDATION': ['BACTERIA', 'NORMAL'],
               'TEST': ['BACTERIA', 'NORMAL', 'VIRUS']}

    if request.method == 'POST':

        # Get the file from post request
        f = request.files['file']

        # Save the file to ./uploads
        basepath = os.path.dirname(__file__)
        file_path = os.path.join(
            basepath, 'uploads', secure_filename(f.filename))
        f.save(file_path)

        # Make a prediction
        prediction = model_predict(file_path, model)

        predicted_class = classes['TRAIN'][prediction[0]]
        print('We think that is {}.'.format(predicted_class.lower()))

        return str(predicted_class).lower()

if __name__ == '__main__':
    app.run(debug = True)

Below is the already functioning Pneumonia binary classification script, which takes in DICOM files. I am trying to integrate it with the weights and preprocessing information of the Viral and Bacterial classifier that I want to use:

## Loading standard modules and libraries 

import numpy as np
import pandas as pd
import pydicom
%matplotlib inline
import matplotlib.pyplot as plt

import keras 
from keras.models import Sequential
from keras.layers import Dense
from keras.models import model_from_json
from skimage.transform import resize


# This function reads in a .dcm file, checks the important fields for our device, and returns a numpy array
# of just the imaging data

def check_dicom(filename): 
    
    print('Loading file {} ...'.format(filename))
    ds = pydicom.dcmread(filename)   
    
    if (ds.BodyPartExamined !='CHEST') | (ds.Modality !='DX') | (ds.PatientPosition not in ['PA', 'AP']):
        print('The image is not valid because the image position, the image type or the body part is not as per standards')
        return
    else:
        print('ID:', ds.PatientID, 
              'Age:', ds.PatientAge, 
              'Modality:', ds.Modality,
              'Postion: ', ds.PatientPosition, 
              'Body Part: ', ds.BodyPartExamined, 
              'Study Desc: ', ds.StudyDescription)
    
    img = ds.pixel_array
    return img
# This function takes the numpy array output by check_dicom and
# runs the appropriate pre-processing needed for our model input

def preprocess_image(img,img_mean,img_std,img_size): 
    # todo
    
    img = resize(img, (224,224))   
    img = img / 255.0  
    grey_img = (img - img_mean) / img_std 
    
    proc_img = np.zeros((224,224,3))
    proc_img[:, :, 0] = grey_img
    proc_img[:, :, 1] = grey_img
    proc_img[:, :, 2] = grey_img
    
    proc_img = np.resize(proc_img, img_size)
    
    return proc_img

# This function loads in our trained model w/ weights and compiles it 

def load_model(model_path, weight_path):
    # todo
    
    json_file = open(model_path, 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    model = model_from_json(loaded_model_json)
    model.load_weights(weight_path)
    
    return model
# This function uses our device's threshold parameters to predict whether or not
# the image shows the presence of pneumonia using our trained model

def predict_image(model, img, thresh): 
    # todo    
    
    result = model.predict(img)  
    print('Predicted value:', result)
    
    predict=result[0]
    prediction = "Negative"
    if(predict > thresh):
        prediction = "Positive"
    
    return prediction 
test_dicoms = ['test1.dcm','test2.dcm','test3.dcm','test4.dcm','test5.dcm','test6.dcm']

model_path = "my_model2.json" #path to saved model
weight_path = "xray_class_my_model2.best.hdf5" #path to saved best weights

IMG_SIZE=(1,224,224,3) # This might be different if you did not use vgg16
img_mean = 0.49262813   # mean image value from Build and train model line 22
img_std = 0.24496286 # loads the std dev from Build and train model line 22

my_model = load_model(model_path, weight_path) #loads model
thresh = 0.62786263 #threshold value for New Model2 from Build and train model line 66 at 80% Precision 


# use the .dcm files to test your prediction
for i in test_dicoms:
    
    img = np.array([])
    img = check_dicom(i)
    
    if img is None:
        continue
        
    img_proc = preprocess_image(img,img_mean,img_std,IMG_SIZE)
    pred = predict_image(my_model,img_proc,thresh)
    print('Model Classification:', pred , 'for Pneumonia' )
    print('--------------------------------------------------------------------------------------------------------')

Output of above script:

Loading file test1.dcm ...
ID: 2 Age: 81 Modality: DX Postion:  PA Body Part:  CHEST Study Desc:  No Finding
Predicted value: [[0.4775539]]
Model Classification: Negative for Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test2.dcm ...
ID: 1 Age: 58 Modality: DX Postion:  AP Body Part:  CHEST Study Desc:  Cardiomegaly
Predicted value: [[0.47687072]]
Model Classification: Negative for Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test3.dcm ...
ID: 61 Age: 77 Modality: DX Postion:  AP Body Part:  CHEST Study Desc:  Effusion
Predicted value: [[0.47764364]]
Model Classification: Negative for Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test4.dcm ...
The image is not valid because the image position, the image type or the body part is not as per standards
Loading file test5.dcm ...
The image is not valid because the image position, the image type or the body part is not as per standards
Loading file test6.dcm ...
The image is not valid because the image position, the image type or the body part is not as per standards

Threshold of 0.62786263 is considered at 80% Precision

Below is what I have tried so far, but the diagnosis I get is always Viral for every DICOM sample:

## Loading standard modules and libraries

import numpy as np
import pandas as pd
import pydicom
from PIL import Image
#%matplotlib inline
import matplotlib.pyplot as plt

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.models import model_from_json
from keras.preprocessing import image
from skimage.transform import resize

# This function reads in a .dcm file, checks the important fields for our device, and returns a numpy array
# of just the imaging data

def check_dicom(filename):

    print('Loading file {} ...'.format(filename))
    ds = pydicom.dcmread(filename)

    if (ds.BodyPartExamined !='CHEST'): #| (ds.Modality !='DX'): #| (ds.PatientPosition not in ['PA', 'AP']):
        print('The image is not valid because the image position, the image type or the body part is not as per standards')
        return
    else:
        print('ID:', ds.PatientID, 
              'Age:', ds.PatientAge, 
              'Modality:', ds.Modality,
              'Postion: ', ds.PatientPosition, 
              'Body Part: ', ds.BodyPartExamined, 
              'Study Desc: ', ds.StudyDescription)
              
    img = ds.pixel_array
   
    return img


# This function takes the numpy array output by check_dicom and
# runs the appropriate pre-processing needed for our model input

def preprocess_image(img):
    # todo

    #im = np.reshape(img, (342,257 ))
    #im = np.arange(257)
    #img = Image.fromarray(im)
    #img = image.load_img(img).convert('L')
    img = resize(img, (342,257))
    grey_img = img / 255.0
    #grey_img = (img - img_mean) / img_std

    proc_img = np.zeros((1,342,257,1))
    proc_img[:, :, :, 0] = grey_img
    #proc_img[:, :, :, 1] = grey_img
    #proc_img[:, :, :, 2] = grey_img
    proc_img = proc_img.reshape(1, 342, 257, 1)
    
    return proc_img

# This function loads in our trained model w/ weights and compiles it

def load_model(model_path, weight_path):
    # todo
    
    json_file = open(model_path, 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    model = model_from_json(loaded_model_json)
    model.load_weights(weight_path)
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='rmsprop')
    
    return model

# This function uses our device's threshold parameters to predict whether or not
# the image shows the presence of pneumonia using our trained model


def predict_image(model, img):
    # todo
    
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='rmsprop')
    #x = np.expand_dims(img, axis=0)
    predict_x= model.predict(img)
    print(predict_x)
    prediction = np.argmax(predict_x,axis=1)
    print(prediction)
  
    return prediction

test_dicoms = ['test3.dcm','test2.dcm','test1.dcm','test4.dcm','test5.dcm','test6.dcm']

model_path = "model_adam.json" #path to saved model
weight_path = "model.h5" #path to saved best weights

#IMG_SIZE=(1,342,257,1) # This might be different if you did not use vgg16
#img_mean = 0.49262813   # mean image value from Build and train model line 22
#img_std = 0.24496286 # loads the std dev from Build and train model line 22

#my_model = load_model(model_path, weight_path) #loads model
#thresh = 0.62786263 #threshold value for New Model2 from Build and train model line 66 at 80% Precision


# use the .dcm files to test your prediction

for i in test_dicoms:

    img = np.array([])
    img = check_dicom(i)
    
    

    if img is None:
        continue
           
    classes = {'TRAIN': ['BACTERIAL', 'NORMAL', 'VIRAL'],
               'VALIDATION': ['BACTERIA', 'NORMAL'],
               'TEST': ['BACTERIA', 'NORMAL', 'VIRUS']}
    img_proc = preprocess_image(img)
    prediction = predict_image(load_model(model_path, weight_path),img_proc)
    predicted_class = classes['TRAIN'][int(prediction[0])]
    print('Model Classification:', predicted_class, 'Pneumonia' )
    print('--------------------------------------------------------------------------------------------------------')

Below is the output:

2022-01-02 10:50:00.817561: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-02 10:50:00.817601: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Loading file test3.dcm ...
ID: 61 Age: 77 Modality: DX Postion:  AP Body Part:  CHEST Study Desc:  Effusion
2022-01-02 10:50:02.652828: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-01-02 10:50:02.652859: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-02 10:50:02.652899: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (Wisdom-HP-250-G3-Notebook-PC): /proc/driver/nvidia/version does not exist
2022-01-02 10:50:02.653123: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[[0.01132523 0.00254696 0.98612785]]
[2]
Model Classification: VIRAL Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test2.dcm ...
ID: 1 Age: 58 Modality: DX Postion:  AP Body Part:  CHEST Study Desc:  Cardiomegaly
[[0.01112939 0.00251635 0.9863543 ]]
[2]
Model Classification: VIRAL Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test1.dcm ...
ID: 2 Age: 81 Modality: DX Postion:  PA Body Part:  CHEST Study Desc:  No Finding
[[0.01128576 0.00255111 0.9861631 ]]
[2]
Model Classification: VIRAL Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test4.dcm ...
The image is not valid because the image position, the image type or the body part is not as per standards
Loading file test5.dcm ...
ID: 2 Age: 81 Modality: CT Postion:  PA Body Part:  CHEST Study Desc:  No Finding
[[0.01128576 0.00255111 0.9861631 ]]
[2]
Model Classification: VIRAL Pneumonia
--------------------------------------------------------------------------------------------------------
Loading file test6.dcm ...
ID: 2 Age: 81 Modality: DX Postion:  XX Body Part:  CHEST Study Desc:  No Finding
WARNING:tensorflow:5 out of the last 5 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7fba38ed19d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
[[0.01128576 0.00255111 0.9861631 ]]
[2]
Model Classification: VIRAL Pneumonia
---------------------------------------

My suspicion is that I got the image preprocessing steps wrong when I integrated these two scripts (remember: the goal is to take advantage of the DICOM reading function of the second script). As a result, the model is receiving and predicting on wrong input altogether, because the arrays are arranged incorrectly during preprocessing. If you need information on parameters from the Jupyter training notebook of the model, kindly let me know.



Solution 1:[1]

When a classifier works fine in train/test but not at inference time in production, a very common reason is that the training data was processed differently from the production data. The fix is to make sure both are processed the same way, ideally by the same piece of code.
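As a rough illustration of "the same piece of code", here is a minimal sketch in which both entry points funnel into one function that repeats the steps from the question's app.py (greyscale, resize to 257 x 342, divide by 255, reshape). The names to_model_input, from_jpeg and from_pixel_array are hypothetical, Image.open stands in for Keras' image.load_img, and from_pixel_array assumes the DICOM pixel data is already an 8-bit 2D array, which may not hold for your files.

```python
import numpy as np
from PIL import Image

# Same (width, height) that app.py passes to IMG.resize()
TARGET_W, TARGET_H = 257, 342


def to_model_input(pil_img):
    """Apply the app.py preprocessing steps to any PIL image."""
    grey = pil_img.convert('L')                    # 8-bit greyscale, as in app.py
    grey = grey.resize((TARGET_W, TARGET_H))       # PIL expects (width, height)
    arr = np.asarray(grey, dtype=np.float32) / 255.0
    return arr.reshape(1, TARGET_H, TARGET_W, 1)   # (batch, rows, cols, channels)


def from_jpeg(path):
    """JPEG entry point (hypothetical helper)."""
    return to_model_input(Image.open(path))


def from_pixel_array(pixels):
    """DICOM entry point (hypothetical helper).

    Assumes pixels is a 2D uint8 array, e.g. ds.pixel_array of an 8-bit image.
    """
    return to_model_input(Image.fromarray(pixels))
```

Because every input goes through to_model_input, a change to the preprocessing cannot silently apply to one path and not the other.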

  1. How were the JPEGs that the classifier was trained on processed? Did they originally come from DICOMs? If yes, what was the exact code for the conversion?

  2. How were the JPEGs loaded during training? Pay special attention to steps that modify the data rather than merely copy it, such as grey_img = (img - img_mean) / img_std and the other commented-out lines in your code (maybe they were not commented out during training).

  3. If you copy the DICOM-to-JPEG conversion from 1 and the JPEG loading from 2, you will probably have a working prediction.
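One concrete instance of point 2, offered as a possibility rather than a confirmed diagnosis: skimage.transform.resize, with its default preserve_range=False, converts uint8 input to floats in [0, 1]. If the DICOM pixel data in the question is 8-bit (an assumption), dividing the resized image by 255 again would squeeze every pixel into [0, 1/255], and inputs that are all nearly zero tend to produce nearly identical predictions. The sketch below uses simulated pixels and a hypothetical helper, describe_range, to show how comparing value ranges exposes this kind of mismatch.

```python
import numpy as np


def describe_range(batch, name):
    """Print and return (min, max, mean) of an array about to be fed to the model."""
    stats = (float(batch.min()), float(batch.max()), float(batch.mean()))
    print('{}: min={:.5f} max={:.5f} mean={:.5f}'.format(name, *stats))
    return stats


# Simulated 8-bit image standing in for real pixel data (not from the question)
pixels = np.random.default_rng(0).integers(0, 256, size=(342, 257)).astype(np.uint8)

# What app.py does: one division by 255, giving values in [0, 1]
scaled_once = pixels / 255.0

# What happens if data that is already in [0, 1] is divided by 255 again
scaled_twice = scaled_once / 255.0

describe_range(scaled_once, 'scaled once ')
describe_range(scaled_twice, 'scaled twice')
```

Running the same check on the array produced by the JPEG path and on the one produced by the DICOM path, just before model.predict, shows quickly whether the two paths agree.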

Solution 2:[2]

The DICOM-to-JPEG conversion function below did the job for me:

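from pydicom import dcmread
from PIL.Image import fromarray

def take_dicom(dicomname):
    # Read the DICOM file and turn its pixel data into a PIL image
    ds = dcmread('Dicom_files/' + dicomname)
    im = fromarray(ds.pixel_array)
    # Image.save() returns None, so its result is not stored
    im.save('./Jpeg/' + dicomname + '.jpg')
    pure_jpg = dicomname + '.jpg'

    return pure_jpg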

I then just had to use os.path to point my preprocessing function at the folder where it should pick up these JPEGs before they are preprocessed and diagnosed:

def preprocess_image(pure_jpg):
    '''
        Args:
            -- pure_jpg : file name of a JPEG stored in ./Jpeg/, as returned by take_dicom.
    '''

    # Build the full path from the file name that was passed in
    basepath = os.path.dirname('./Jpeg/')
    file_path = os.path.join(basepath, pure_jpg)

    IMG = image.load_img(file_path).convert('L')

    # Pre-processing the image
    IMG_ = IMG.resize((257, 342))
    IMG_ = np.asarray(IMG_)
    IMG_ = np.true_divide(IMG_, 255)
    IMG_ = IMG_.reshape(1, 342, 257, 1)

    return IMG_

However, it only works for the following two DICOM imaging modalities:

  1. DX (Digital X-Ray)
  2. CT (Computed Tomography)

CR (Computed Radiography) DICOM images fail to convert.
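The exact error is not shown here, so the following is only a guess: if the failing files store pixel data at more than 8 bits per pixel, the array cannot be written directly as a JPEG. A minimal sketch of one workaround is to rescale the array to uint8 before handing it to fromarray. The helper name to_uint8 and its invert flag are hypothetical. Min-max rescaling is just one simple choice and is not a validated clinical windowing, and it changes intensities relative to whatever the model was trained on.

```python
import numpy as np


def to_uint8(pixels, invert=False):
    """Min-max rescale a 2D array of any numeric dtype to uint8 in [0, 255].

    invert=True flips intensities; this may be needed when a DICOM's
    PhotometricInterpretation is MONOCHROME1 (assumption: check your files).
    """
    arr = np.asarray(pixels, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant image: avoid division by zero and return all zeros
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo)      # now in [0, 1]
    if invert:
        scaled = 1.0 - scaled
    return np.round(scaled * 255.0).astype(np.uint8)
```

In take_dicom this would be used as fromarray(to_uint8(ds.pixel_array)) in place of fromarray(ds.pixel_array).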

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Alex I
Solution 2   Dharman