Gokce Iymen, Gizem Tanriver, and Onur Ergen*
Graduate School of Sciences and Engineering, Turkey
*Corresponding author: Onur Ergen, Graduate School of Sciences and Engineering, Turkey
Submission: July 08, 2020; Published: August 06, 2020
ISSN: 2832-4463 Volume 1 Issue 2
Generative adversarial networks (GANs) have become increasingly popular since they were first introduced in 2014. Many GAN variants have been developed over the years and employed in a range of applications, from computer vision to audio generation and medical imaging. As their applications in computer vision have been widely explored by the artificial intelligence community, we focus here on two more specific applications of GANs, namely audio generation and medical image synthesis. Even in the age of big data, these two fields still struggle with the scarcity of labelled data and hence benefit greatly from the capabilities of GANs.
Keywords: Audio Generation; Generative Adversarial Networks; Generative Models; Medical Image Synthesis
Generative models that produce synthetic but realistic data are one of the most exciting research topics in the field of artificial intelligence. Generative adversarial networks (GANs), which were introduced in 2014 by Goodfellow [1], are a type of generative model in which two neural networks, namely a generator and a discriminator, are trained adversarially. The main difference between GANs and other generative models is their simplicity. Figure 1 illustrates the structure of a typical GAN, where the generator is trained so that the discriminator cannot distinguish synthetic data from real data.
Figure 1: Structure of GAN consisting of a generator and a discriminator [1].
When generating new instances, a generative model aims to draw samples from the probability distribution underlying a pre-existing dataset. The task can be extremely challenging since the parameters, and even the existence, of this probability distribution are not fully known. Furthermore, the probability distribution of high-dimensional data is generally very complex; therefore, neural networks are commonly used to learn and mimic the unknown probability distribution from which the original data are sampled [1-3].
During training, GANs do not try to explicitly approximate the parameters of a probability distribution, an approach that requires complex computations, as in [4-6]. Instead, they attempt to produce data samples from the target probability distribution by forcing the generated samples to be as similar as possible to samples from the original data.
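For reference, this adversarial training is commonly written as the minimax objective introduced in [1]. The notation below is an assumption for illustration and does not appear elsewhere in this review: G is the generator, D is the discriminator, p_data is the data distribution, and p_z is the noise prior from which the generator's input z is drawn.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator increases V by assigning high probability to real samples and low probability to generated ones, while the generator tries to decrease it. When the generator's distribution matches the data distribution, the best the discriminator can do is output 1/2 for every input, which gives V = -log 4.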
Currently, many different versions of GANs, such as Conditional GANs [7], CycleGAN [8], DCGANs [9], DiscoGAN [10], LSGAN [11], and MelGAN [12], can be found in the literature, each proposed for a different application area. In this review, we focus on selected applications of GANs that have attracted great attention in recent years, namely audio generation and medical imaging.
Audio generation
Although GANs are best known for their use in image generation, they have also been used successfully for generating sequential data, as in [13-15]. Sequential data such as audio, natural language, and time series can be generated by GANs with high performance in terms of both generation speed and output quality compared with other generative models. Audio generation is applicable in specific domains such as speech synthesis and music generation. For speech synthesis, concatenative and parametric approaches were used before the advent of generative models. Autoregressive generative models such as WaveNet [16], Fast WaveNet [17], and SampleRNN [18] have since been developed, yet they work extremely slowly due to their sample-level nature (i.e. they produce one sample at a time). In contrast, GAN-based models [19-21] for speech synthesis have been shown to work much more efficiently. Owing to their rapid sampling, GANs hold great potential for data augmentation in speech recognition models. Moreover, the methods used in speech synthesis can be generalized to any form of audio. Music generation is another application area, in which GANs are used to produce new music, as in [13,20].
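The speed difference between sample-level and GAN-based generation can be illustrated with a toy contrast. The sketch below is not WaveNet, SampleRNN, or any published GAN architecture; the function names, coefficients, and random weights are hypothetical and chosen only to show the dependency structure. The autoregressive loop needs one sequential step per audio sample, whereas the GAN-style generator maps a latent vector to all samples in a single forward pass.

```python
import math
import random


def autoregressive_generate(n_samples, coeffs, noise_std=0.1, seed=0):
    """Toy sample-level model: each new sample depends on earlier outputs.

    Returns the samples and the number of sequential model evaluations.
    """
    rng = random.Random(seed)
    audio = []
    sequential_steps = 0
    for _ in range(n_samples):
        # Most recent samples first, so coeffs[0] weights the latest sample.
        context = audio[-len(coeffs):][::-1]
        prediction = sum(c * x for c, x in zip(coeffs, context))
        audio.append(prediction + rng.gauss(0.0, noise_std))
        sequential_steps += 1  # step t+1 cannot start before step t is done
    return audio, sequential_steps


def single_pass_generate(latent, weights):
    """Toy GAN-style generator: one forward pass yields every sample.

    Each output depends only on the latent vector, so the rows could be
    evaluated in parallel; the whole pass counts as one sequential step.
    """
    audio = [math.tanh(sum(w * z for w, z in zip(row, latent))) for row in weights]
    return audio, 1


n_samples, latent_dim = 16, 4
rng = random.Random(1)
latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]

ar_audio, ar_steps = autoregressive_generate(n_samples, coeffs=[0.6, -0.2])
gan_audio, gan_steps = single_pass_generate(latent, weights)
print(f"autoregressive: {len(ar_audio)} samples in {ar_steps} sequential steps")
print(f"single pass:    {len(gan_audio)} samples in {gan_steps} sequential step")
```

For one second of audio at a typical sampling rate, the sequential step count of the first approach grows to tens of thousands, while the second remains a single pass whose internal work can be parallelized.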
Different representations of sound may be preferable in different applications of audio generation. In [20], WaveGAN and SpecGAN, which operate on the raw audio waveform and on the spectrogram respectively, are presented, and the authors obtained promising results with both representations. However, in [13], it was shown that using spectrograms instead of waveforms yields more coherent output. Although spectrograms are inherently non-invertible, which may be a disadvantage in certain conditions, it is possible to convert them back to waveforms approximately. Since human perception is sensitive to coherence in speech and music, it is important to convert generated spectrograms into waveforms successfully so as not to lose fidelity.
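The non-invertibility of spectrograms comes from discarding phase: in its usual magnitude-only form, a spectrogram keeps only the magnitudes of short-time Fourier coefficients. The following minimal sketch assumes a single analysis frame and a naive discrete Fourier transform; the helper name dft_magnitude and the sample values are hypothetical. It shows two different waveform frames that share the same magnitude spectrum, which is why conversion back to a waveform has to estimate the missing phase and can only be approximate.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude of the discrete Fourier transform of one audio frame."""
    n = len(frame)
    magnitudes = []
    for k in range(n):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(coeff))  # the phase of coeff is thrown away here
    return magnitudes


# A short hypothetical frame and a circularly shifted copy of it.
frame = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.0, 0.2]
shifted = frame[3:] + frame[:3]

mag_original = dft_magnitude(frame)
mag_shifted = dft_magnitude(shifted)

# A circular shift changes only the phase of each coefficient.
same_magnitude = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(mag_original, mag_shifted)
)
print("waveforms differ:", frame != shifted)
print("magnitude spectra match:", same_magnitude)
```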
Medical image synthesis
Deep learning algorithms are routinely used in medical imaging tasks such as classification and segmentation, whose performance relies heavily on the availability of large amounts of labelled data [22]. However, the medical field suffers from the scarcity of labelled data more than any other, primarily due to the labor-intensive annotation process for medical images [22]. According to Hou [23], manual segmentation of nuclei in a small dataset of 50 tissue image patches (each 600 × 600 pixels) takes about 225 hours of a pathologist’s time. Considering the data-hungry nature of deep learning algorithms, annotating sufficient amounts of medical data demands an unrealistic amount of time and effort from medical experts. Data augmentation techniques are often used to increase the size of a training set, yet they generate augmented images that are too similar to the original ones and thus provide very limited performance improvement. Other challenges also exist in medical imaging, such as high class imbalance (i.e. underrepresentation of diagnostically less common conditions in a dataset) and a continuous spectrum of features (i.e. classes are not inherently distinct due to the progressive nature of diseases) [24].
Various GAN architectures have been proposed to address these challenges in a range of medical applications [24-29], with promising results in generating realistic-looking synthetic medical images while improving model performance. One study comparing traditional data augmentation with DCGAN-based synthetic data augmentation for a liver lesion classification task demonstrated that the use of synthetic samples significantly improves classification performance even on a small dataset consisting of computed tomography images of 182 liver lesions [29]. While unconditional GAN architectures such as DCGAN address the instability problem of GANs, they typically do not work well beyond relatively low resolutions [22]. Baur [30] exploited progressive growing of GANs (PGGAN) to synthesize skin lesion images at high resolution, producing highly realistic synthetic images that expert dermatologists had difficulty distinguishing from real ones. As data scarcity remains a major obstacle for medical imaging, GAN-based synthesis is likely to become standard practice for filling this gap.
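To illustrate why traditional augmentation adds limited variety, the sketch below applies two common geometric transforms, a horizontal flip and a 90-degree rotation, to a tiny hypothetical image patch. The patch values and function names are assumptions for illustration and are not taken from the cited studies. Both augmented patches contain exactly the same pixel intensities as the original, only rearranged, whereas a GAN draws new samples from a learned distribution.

```python
def horizontal_flip(image):
    """Mirror each row of a 2-D image stored as a list of lists."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate a 2-D image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def intensity_histogram(image):
    """Count how often each pixel intensity occurs in the image."""
    counts = {}
    for row in image:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical 3 x 3 grayscale patch with one bright "lesion" pixel.
patch = [
    [0, 10, 20],
    [30, 200, 40],
    [50, 60, 70],
]
flipped = horizontal_flip(patch)
rotated = rotate_90_clockwise(patch)

# Geometric transforms only move pixels around; no new intensities appear.
same_content = (
    intensity_histogram(flipped) == intensity_histogram(patch)
    and intensity_histogram(rotated) == intensity_histogram(patch)
)
print("augmented patches reuse the original intensities:", same_content)
```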
GANs are used in a plethora of applications, and their success has greatly excited the deep learning community. Although the trustworthiness of generated data and the lack of established evaluation metrics for GAN-based methods remain major limitations to their wider adoption, GANs have proven to be powerful even in specific domains such as audio generation and medical image synthesis. We hope that this mini review gives readers a sense of how GANs open up new possibilities in these two domains.
© 2020 Onur Ergen. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.