Reinforcement learning is a subdomain of artificial intelligence in which an agent learns to fulfil a given goal by maximizing a numerical reward signal. It grew out of three main threads. The first is the concept of learning by trial and error, which originated in research on animal learning in psychology and neuroscience. The second is the problem of optimal control, developed in the 1950s, which uses a discrete stochastic formulation of the environment known as a Markov decision process, adopts the concepts of a dynamical system’s state and an optimal return function (reward), and defines the “Bellman equation” to optimize the agent’s behaviour over time (dynamic programming). The third concerns temporal-difference methods, which became the mainstream approach and were boosted by the actor-critic architecture.
Deep learning is a machine learning subdomain based on artificial neural networks, which process data in a way loosely inspired by the human brain and learn patterns that are used in decision making. Deep learning allows automatic feature engineering and end-to-end learning through gradient descent and back-propagation. There are different types of deep learning networks, and the choice among them depends on the nature of the problem and on the application. For sequential data, as in speech recognition and natural language processing, recurrent neural networks are used.
For extracting visual features, as in image classification and object detection, convolutional neural networks are used. For general pattern recognition in data, such as classification and segmentation, feed-forward networks are used, and for some complex tasks, such as video processing, object tracking and image captioning, combinations of these networks are used.
The link between reinforcement learning and deep learning was made while artificial intelligence researchers were seeking to implement a single agent that can think and act autonomously in the real world without any hand-engineered features. In 2015, Google DeepMind succeeded in combining reinforcement learning, a decision-making framework, with deep learning, a representation learning framework that allows visual feature extraction, to create the first end-to-end artificial agent that achieves human-level performance across several diverse domains. This technology, named deep reinforcement learning, is now used not only to play Atari games but also in the design of the next generation of intelligent self-driving cars by companies such as Google (with Waymo), Uber and Tesla.
Deep reinforcement learning (DRL) is a relatively new machine learning technique that became popular in 2015. It combines deep learning architectures with reinforcement learning algorithms to create efficient algorithms that can be applied to problems in areas such as robotics, healthcare, video gaming and finance. DRL algorithms use neural networks such as CNNs to learn deep features and reinforcement learning algorithms such as Q-learning and actor-critic to solve previously unsolvable problems. Traditional reinforcement learning uses a lookup table to store a value for each state or state-action pair. This is slow, since the value of each state is learned individually, and memory consuming, especially for problems with large or infinite state spaces, because of what Richard Bellman called the curse of dimensionality. By leveraging deep learning, especially convolutional neural networks, it became possible for an RL algorithm to be fully autonomous and learn both to see and to act. There are three main types of deep reinforcement learning (DRL) algorithms:
Value optimization: The algorithm optimizes the value function V, the action-value function Q, or the advantage function A.
Policy optimization: The algorithm directly optimizes the policy function π(θ), which is represented by the neural network.
Actor-critic: This approach incorporates the advantages of both of the above by learning a value function together with an explicit policy. It includes a policy gradient component, the “actor”, which calculates policy gradients, and a value function component, the “critic”, which evaluates the performance of the agent.
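The lookup-table form of value optimization described before the list above can be illustrated with a minimal sketch of tabular Q-learning. The five-state chain environment, the hyperparameters (`alpha`, `gamma`, `epsilon`) and all function names are assumptions made for illustration; they do not come from any of the cited works. The table holds one entry per state-action pair, which is what becomes impractical once the state space is large, for example when states are raw images.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, goal at state 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic environment (an assumption for this sketch)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train_q_table(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The lookup table: one value per (state, action) pair.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward the bootstrapped target.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_q_table()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In a deep Q-network the table `q` is replaced by a neural network that maps a state to the Q-values of all actions, so that similar states share what has been learned.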
Deep learning is a branch of machine learning based on deep (more than two hidden layers) and wide (many input and hidden neurons) neural networks that model high-level abstractions in data, using an architecture composed of multiple non-linear layers of neurons. Each neuron in a hidden layer computes a linear combination of its inputs and applies a non-linear function (ReLU, softmax, sigmoid or tanh) to the result. This allows neurons in the next layer to separate classes with a curved decision boundary (a hypersurface) rather than a simple straight line, so the hidden layers learn hierarchical features. The main types of neural networks include:
Convolutional network: A convolutional network assumes a particular spatial structure in its input. In particular, it assumes that inputs that are close to each other in the original input are semantically related. This assumption makes most sense for images, which is one reason convolutional layers have found wide use in deep architectures for image processing.
Recurrent neural network (RNN): Recurrent neural network layers are primitives that allow neural networks to learn from sequences of inputs. Such a layer assumes that the input evolves from one sequence step to the next following a defined update rule that can be learned from data. This update rule yields a prediction of the next state in the sequence given all the states that have come before.
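As a minimal sketch of the recurrent update rule described above, the following pure-Python example applies the common form h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a short sequence. Each hidden unit computes a linear combination of its inputs followed by a non-linear function, as described for hidden-layer neurons earlier. The weight values, the dimensions and the function names are assumptions chosen for illustration; in practice the weights would be learned from data.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(w_x[i][j] * x_t[j] for j in range(len(x_t)))        # input term
        pre += sum(w_h[i][k] * h_prev[k] for k in range(len(h_prev)))  # recurrent term
        h_t.append(math.tanh(pre))                                     # non-linearity
    return h_t

def run_rnn(sequence, w_x, w_h, b):
    """Apply the same update rule at every step, carrying the hidden state forward."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Illustrative (not learned) parameters: 2 inputs, 2 hidden units.
w_x = [[0.5, -0.3], [0.1, 0.8]]
w_h = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.1]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = run_rnn(sequence, w_x, w_h, b)
```

Because the hidden state is passed from one step to the next, the state at any step depends on all earlier inputs in the sequence.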
II. Analysis of Complex Images
The analysis of complex images such as histopathological, MRI and X-ray images is a daunting and time-consuming task in the domain of image processing and interpretation. These images generally contain hard-to-distinguish parts that must be separated to determine the severity of a disease for further medical diagnosis. Technically, the system processes these images in digital form as grey-scale levels, which makes such complex images difficult to analyse. The following difficulties occur in complex image analysis:
Complex images such as histopathological images show very minute or indistinguishable differences in the important information they carry.
Traditionally, important information was extracted in the form of handcrafted features based on edge and corner knowledge. Extracting this information from the images plays a vital role in making accurate predictions.
Regions of grey-scale histopathological images are difficult to distinguish, as relevant and irrelevant regions may be non-separable. The images need to be segmented properly to handle overlapping cells or scenes before the model is trained.
The images also need to be denoised, enhanced and restored to improve their quality before the model is trained.
Image registration is difficult for histopathological images because of morphological distortion and staining variations.
III. Literature Review and Related Work
This section reviews the literature on applications of deep learning and deep reinforcement learning techniques published in 2018 and 2019.
Gustavo Carneiro et al. proposed a deep reinforcement learning based method for the automatic detection of breast cancer. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) volumes were used in the past for the study of the brain, but their use has now broadened to pathologies of the heart, cerebral afflictions, stroke and so on, with the aim of early detection and treatment. Current approaches to lesion detection mostly depend on handcrafted features and require exhaustive search procedures to accommodate variations in lesion shape, location and size. These approaches are also computationally complex and lack accuracy in lesion detection. Caicedo and Lazebnik proposed the use of a deep Q-network (DQN) to detect objects efficiently when the amount of available data is limited. Ghesu et al. used a DQN to extract consistent patterns from visual classes using fixed small-size regions of medical images to aid anatomical landmark detection. To overcome these obstacles, the authors propose a new DQN-inspired algorithm for lesion detection from DCE-MRI. They replace the fixed small-size regions with a bounding box whose size changes at each successive iteration. The reinforcement learning agent learns a policy that shifts the focus of attention, using scaling and translation, from an initially large bounding box to a small bounding box enclosing a lesion, if one exists. The DQN decides the next action, i.e. whether to scale or translate the current bounding box or to end the search process. A dataset of 117 patients, divided into 57 training and 59 test cases, was used to evaluate the proposed methodology, and the results show detection accuracy similar to that of existing methods but with a large reduction in run time.
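The search procedure described above, in which an agent repeatedly scales or translates a bounding box until it decides to stop, can be sketched as follows. This is an illustrative simplification, not the authors’ implementation: it uses a 2-D box rather than a 3-D volume, and the action names, the step fractions (`move_frac`, `scale_frac`) and the placeholder policy passed as `choose_action` are all assumptions. In the reviewed method the action would be selected by the trained DQN.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    cx: float  # centre x
    cy: float  # centre y
    w: float   # width
    h: float   # height

# Hypothetical action set: four translations, two scalings and a stop action.
ACTIONS = ("left", "right", "up", "down", "shrink", "grow", "trigger")

def apply_action(box, action, move_frac=0.1, scale_frac=0.2):
    """Return the box obtained by applying one action (step sizes are assumptions)."""
    dx, dy = move_frac * box.w, move_frac * box.h
    if action == "left":
        return Box(box.cx - dx, box.cy, box.w, box.h)
    if action == "right":
        return Box(box.cx + dx, box.cy, box.w, box.h)
    if action == "up":
        return Box(box.cx, box.cy - dy, box.w, box.h)
    if action == "down":
        return Box(box.cx, box.cy + dy, box.w, box.h)
    if action == "shrink":
        return Box(box.cx, box.cy, box.w * (1 - scale_frac), box.h * (1 - scale_frac))
    if action == "grow":
        return Box(box.cx, box.cy, box.w * (1 + scale_frac), box.h * (1 + scale_frac))
    if action == "trigger":
        return box  # stop action: the current box is reported as the detection
    raise ValueError(f"unknown action: {action}")

def search(start, choose_action, max_steps=20):
    """Run the search loop; choose_action stands in for the trained DQN policy."""
    box = start
    for _ in range(max_steps):
        action = choose_action(box)
        if action == "trigger":
            break
        box = apply_action(box, action)
    return box

# Placeholder policy: shrink until the box is at most 30 units wide, then stop.
final_box = search(Box(50, 50, 100, 100), lambda b: "shrink" if b.w > 30 else "trigger")
```

Because each step only transforms the current box, the number of regions examined grows with the number of steps taken rather than with the number of candidate windows, which is consistent with the reduction in run time reported above.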
A recent paper titled “Deep recurrent attention models for histopathological image analysis” by Alexandre Momeni et al. analyses histopathological images using deep recurrent models. The authors proposed a Deep Recurrent Attention Model (DRAM) (Mnih et al.) to mimic the pathologist’s process. The model is inspired by the way humans perform visual recognition tasks, attending to the most informative regions of the image. Like a CNN it is translation invariant, but it is independent of the input image size. The model is trained with reinforcement learning to attend to the most informative regions of large image patches. It is tested on histological and molecular subtype classification of gliomas (a type of adult brain tumour) using data from The Cancer Genome Atlas (TCGA). The results show that DRAM performs comparably to existing CNN methods despite processing only a selected number of image patches.
Issa Ali et al. performed lung nodule detection using deep reinforcement learning. The authors’ objective was to develop and validate a deep reinforcement learning model based on deep artificial neural networks for the early detection of lung nodules in thoracic CT images. The model, inspired by the AlphaGo system, takes a raw CT image as input, views it as a collection of states, and outputs a classification of whether a nodule is present or not. The LIDC/IDRI database hosted by the Lung Nodule Analysis (LUNA) challenge was used to train the model. 888 CT scans with annotations based on agreement from at least three out of four radiologists were considered for experimentation. There were 590 individuals with one or more nodules and 298 with none. Training yielded an overall accuracy of 99.1% (sensitivity 99.2%, specificity 99.1%, positive predictive value (PPV) 99.1%, and negative predictive value (NPV) 99.2%). Testing yielded an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%). These early results show promise toward solving the major issue of false positives in CT screening of lung nodules and could help avoid unnecessary follow-up tests and expenditures.
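For reference, the five measures quoted above are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Made-up counts for illustration only (not the study's data).
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
```

A low specificity or PPV corresponds to many false positives, which is the issue in CT screening that the paragraph above refers to.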
C. Martinez et al. proposed a deep reinforcement learning approach for the early classification of time series. Classifying time series data early enables many real-life applications, ranging from predictive maintenance to personalized medicine. The task is addressed with a novel approach that introduces an early classifier agent, an end-to-end reinforcement learning agent (a deep Q-network, DQN), to perform early classification efficiently. The early classification problem is formulated in a reinforcement learning framework by introducing a suitable set of states and actions and by defining a specific reward function that aims at a compromise between earliness and classification accuracy. Although many solutions already existed, they do not take time into account when making a final decision. The proposed solution allows the user to set this trade-off in a more flexible way. The experiments were performed on datasets from the UCR time series archive and showed the agent’s ability to continuously adapt its behaviour without explicit human intervention and to gradually learn to balance accurate and fast predictions.
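One way such a trade-off between earliness and accuracy could be encoded in a reward function is sketched below. This is a hypothetical reward written for illustration, not the one defined in the cited paper: the agent either waits for more data, at a small cost that grows with time, or commits to a class label and receives +1 or -1. The names `WAIT`, `lam` and `horizon` are assumptions.

```python
WAIT = "wait"  # hypothetical action meaning "observe one more time step"

def early_classification_reward(action, true_label, t, horizon, lam=0.1):
    """Hypothetical reward: waiting costs more as time passes, predicting is +/-1."""
    if action == WAIT:
        return -lam * t / horizon        # delay penalty, scaled by lam
    return 1.0 if action == true_label else -1.0

def episode_return(actions, true_label, horizon, lam=0.1):
    """Sum the rewards of one episode, which ends at the first prediction."""
    total = 0.0
    for t, action in enumerate(actions, start=1):
        total += early_classification_reward(action, true_label, t, horizon, lam)
        if action != WAIT:
            break
    return total

# Waiting two steps and then predicting correctly on a series of length 10.
example_return = episode_return([WAIT, WAIT, "A"], true_label="A", horizon=10)
```

In this sketch a larger `lam` pushes the agent toward earlier decisions, while a smaller `lam` favours waiting for more evidence, which is the kind of user-set trade-off described above.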
Adriana Dinis et al. presented a self-developing system for medical data analysis. In this paper the authors present a concept project for a self-developing, agent-based system built for a hospital. The system monitors patients during hospitalization and after release, with the aim of understanding patterns and predicting future problems. Because of the system’s complexity and dynamism, the agents must be generated automatically, and they need to cooperate and compete with each other to obtain good results. By combining meta-heuristic algorithms with reinforcement and clustering techniques, the authors targeted a large degree of autonomy in decision making.
Zi Wang et al. employed reinforcement learning to study cell movements in the early stage of Caenorhabditis elegans embryogenesis. The proposed work captures the complexity of cell movement patterns in the embryo and overcomes the local optimization problem encountered by traditional rule-based, agent-based modelling that uses greedy algorithms. The data was collected using 3-D time-lapse microscopy images to explore cell migration paths during embryogenesis. The framework models each individual cell as an agent that stores information about its size, fate, division time and group. The results indicate that the deep reinforcement learning based agent system achieved remarkable success in modelling the regulatory mechanisms of cell movements.
Leo Celi et al. proposed a deep reinforcement learning model for the treatment of sepsis. Sepsis is a life-threatening illness triggered by infection in the body and is a leading cause of mortality among patients in Europe. The authors use continuous state-space models and deep reinforcement learning to deduce a treatment policy for sepsis. The data was collected from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC-III v1.4) database. The results give a learned policy that could aid clinicians in making medical decisions and improve the likelihood of patient survival.
Similarly, Ning Liu et al. designed a deep reinforcement learning framework to estimate optimal dynamic treatment regimens from observational medical data. The Centre for International Bone Marrow Transplant Research (CIBMTR) registry database was used to construct a dataset. The DRL-based dynamic treatment regimens showed promising accuracy in predicting human experts’ decisions, along with a high expected reward, on medical registry data.
Alexander Rakhlin et al. used deep convolutional neural networks for breast cancer histology image analysis. The work proposes a computational approach based on deep convolutional neural networks for breast cancer histology image classification. The dataset of hematoxylin and eosin stained breast cancer histology images was taken from the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. The experimental results comprise 87.2% accuracy for the 4-class classification task and 93.8% accuracy for the 2-class classification task of detecting carcinoma. The approach outperforms other common methods in automated histopathological image classification.
Riku Turkki et al. applied deep learning to breast cancer outcome prediction from tumour tissue images. Tissue microarray samples from 1299 breast cancer patients were collected nationwide. The findings demonstrate the feasibility of learning prognostic signals in tumour tissue images without domain knowledge. Although further validation is needed, the study suggests that machine learning algorithms can extract prognostically relevant information from tumour histology, complementing the prognostic factors currently used in breast cancer.
Dmitrii Bychkov et al. proposed deep learning based tissue analysis to predict outcome in colorectal cancer. A novel approach combining convolutional and recurrent architectures was used to predict patient outcome directly from digitized haematoxylin-eosin-stained tumour tissue microarray (TMA) samples from 420 cancer patients. The experimental results show that deep learning based outcome prediction using only a portion of the tissue area as input outperforms visual histological assessment performed by human experts.
Parampal S. Grewal et al. reviewed the use of deep learning in ophthalmology. Ophthalmology is the branch of medicine concerned with the screening, diagnosis and management of eye disease. Deep learning is an emerging technology with many potential applications in ophthalmology. Deep learning tools have been applied to different diagnostic modalities, including digital photographs, optical coherence tomography and visual fields. These tools help in the assessment of disease processes such as cataract, diabetic retinopathy, glaucoma and age-related macular degeneration. Deep learning techniques are evolving rapidly and are being integrated into ophthalmic care. The paper discusses not only the current evidence for deep learning in ophthalmology but also future applications and drawbacks.
Azam Hamidinekoo et al. reviewed deep learning in mammography and breast histology. Recent advances in biomedical image analysis using deep learning have enhanced the performance of computer-aided diagnosis (CAD) systems. The authors present an overview of the state of the art in deep learning based applications for breast mammography and histopathology image analysis. The study also examines the relationship between mammography and histopathology phenotypes from a biological perspective.
The following table lists the papers reviewed above that use deep reinforcement learning and deep learning, and compares the pros and cons of the techniques or algorithms used.
Table 1. Authors, year, methods, and pros and cons of the methods used for model training with DRL and DL.
|References|Algorithm/Methods for training machine learning model|Pros and Cons of the algorithm|
|Gustavo Carneiro et al. (2019); Alexandre Momeni et al. (2019); Issa Ali et al. (2018); C. Martinez et al. (2018); Adriana Dinis et al. (2018); Zi Wang et al. (2018); Leo Celi et al. (2017); Ning Liu et al. (2017)|Deep Q-learning: Deep Q-learning is a deep reinforcement learning algorithm that combines a deep neural network architecture with Q-learning. It was developed by Google DeepMind in 2015 and tested on the Atari gaming platform: given only the pixels and the game score as input, the deep Q-network agent outperformed all previously known algorithms and reached a level comparable to that of a human games tester across a set of 49 games. In deep Q-learning, a neural network approximates the Q-value function: the state is given as input, and the Q-values of all possible actions are generated as output.|Pros: The deep Q algorithm is powerful, combining the feature extraction capability of CNNs with the reward maximization of reinforcement learning to achieve a desired objective. A deep Q-learning agent enables autonomous decision making with high accuracy and precision. Cons: DQN requires a large amount of time to train a reinforcement learning agent. The rewards and the environment must be defined effectively.|
|Alexander Rakhlin et al. (2018); Riku Turkki et al. (2019); Dmitrii Bychkov et al. (2018); Parampal S. Grewal et al. (2018); Azam Hamidinekoo et al. (2018)|CNN (convolutional neural network): A CNN is a combination of different layers and is an important algorithm for image classification. The four important layers are the input layer, the convolutional layer, the activation layer and the pooling layer. 1. Input layer: holds the raw input image, for example with width 32, height 32 and depth 3. 2. Convolution layer: computes the output volume by taking the dot product between each filter and each image patch. With 12 filters in this layer (and stride 1 with padding that preserves the spatial size), the output volume has dimension 32 x 32 x 12. 3. Activation function layer: applies an element-wise activation function to the output of the convolution layer. 4. Pooling layer: inserted periodically in ConvNets; its main function is to reduce the size of the volume, which speeds up computation, reduces memory use and helps limit overfitting.|Pros: CNN is a very powerful algorithm that is widely used for image classification and object detection. Its hierarchical structure and powerful feature extraction from images make it robust for various image and object recognition tasks. Cons: CNN requires a large amount of data to train the model. CNNs are sensitive to adversarial effects in the input representation, i.e. small changes to the input can change the output.|
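The layer dimensions quoted in the CNN row of Table 1 can be checked with the standard output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below is a worked example under stated assumptions: the table gives only the 32 x 32 x 3 input and the 12 filters, so the 3 x 3 kernel, stride 1, padding 1 and the 2 x 2 pooling window are illustrative choices, and the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def layer_output_shape(height, width, depth_out, kernel, stride=1, padding=0):
    """Output volume (height, width, depth) of a convolution or pooling layer."""
    return (
        conv_output_size(height, kernel, stride, padding),
        conv_output_size(width, kernel, stride, padding),
        depth_out,
    )

def conv_param_count(in_depth, n_filters, kernel):
    """Learnable parameters: one kernel x kernel x in_depth filter plus a bias each."""
    return n_filters * (kernel * kernel * in_depth + 1)

# Input 32 x 32 x 3, 12 filters; kernel 3, stride 1, padding 1 are assumptions.
conv_shape = layer_output_shape(32, 32, depth_out=12, kernel=3, stride=1, padding=1)
# Assumed 2 x 2 pooling with stride 2 halves the spatial size and keeps the depth.
pool_shape = layer_output_shape(conv_shape[0], conv_shape[1], conv_shape[2], kernel=2, stride=2)
n_params = conv_param_count(in_depth=3, n_filters=12, kernel=3)
```

Under these assumptions the convolution keeps the 32 x 32 spatial size and produces a depth of 12, matching the 32 x 32 x 12 volume stated in the table, and the pooling layer then reduces it to 16 x 16 x 12.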
IV. Conclusion and Future Work
The survey of the literature in the section above shows that deep learning has been used mainly and extensively for classification and feature extraction from complex images using deep layers of neural networks. Deep learning algorithms such as CNNs and RNNs are the most popular and successful. Deep reinforcement learning is an emerging and niche technique that combines the properties of deep learning with policy optimization driven by reward maximization. Deep Q-networks, actor-critic, SARSA and temporal-difference learning are the popular algorithms in the domain of deep reinforcement learning. The two techniques differ in the manner in which they can be used for image segmentation. Applications of image segmentation have a significant influence on medical diagnosis, image and tissue volume analysis, computer-guided surgery, pathological localization, tomography, object detection and so on. Image segmentation is the separation of an image into disjoint partitions that are homogeneous with respect to certain features such as colour, intensity or texture.
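The definition of image segmentation given above can be made concrete with the simplest possible case, intensity thresholding, which splits a grey-scale image into two disjoint regions. This is only an illustrative sketch: the tiny image, the threshold value and the function names are assumptions, and none of the reviewed methods is reduced to this rule.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 if its intensity is at least the threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

def region_sizes(labels):
    """Count pixels per label; every pixel belongs to exactly one region."""
    counts = {}
    for row in labels:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts

# Hypothetical 3 x 4 grey-scale image with a dark and a bright area.
image = [
    [12, 15, 200, 210],
    [10, 18, 205, 220],
    [11, 14, 16, 215],
]
labels = threshold_segment(image, threshold=128)
sizes = region_sizes(labels)
```

A single global threshold works here because the two areas have clearly different grey levels; when the grey levels of neighbouring regions resemble one another, as in many medical images, such a rule is no longer sufficient.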
The competency of image segmentation methods is a debated problem, and a lot of research has been carried out in this direction. Solutions to image segmentation are generally problem specific, and because the grey levels of the regions of interest in medical images resemble one another, image segmentation can produce significant errors. Another disadvantage is the lack of sufficient training examples and the need for expert intervention in preparing these samples. With these limitations in mind, reinforcement learning based methods of image segmentation have been proposed. Deep learning techniques require a large amount of data for model training and suffer from overfitting, whereas deep reinforcement learning algorithms can be applied to train a model with limited data without causing much overfitting. The indistinguishable regions of grey-scale histopathological or other medical images can be classified efficiently with deep reinforcement learning techniques.
Deep reinforcement learning is a new and emerging technique that was first designed by Google DeepMind and tested on Atari games. Volodymyr Mnih et al. applied deep reinforcement learning with recurrent visual attention to complex images such as the handwritten digits of the MNIST dataset. Future work can address image analysis, which may be static for an image dataset or dynamic for video formats. This requires building robust machine learning models that can be trained with limited data and efficiently reinforced to achieve a desired goal.