Learning representations is a key aspect of artificial intelligence because the final performance of any AI model depends heavily on how the input data is represented. While using expert knowledge to design representations may yield good performance, it is not generic and relies too heavily on manual work. This is where deep learning has an edge over hand-crafted features: it is a class of learning algorithms able, through a composition of layers, to learn a hierarchical representation directly from the data and then use this representation to solve an objective. The International Conference on Learning Representations (ICLR) is focused on faster and better ways to learn and use deep representations through supervised, semi-supervised, or unsupervised approaches.
So what was new in artificial intelligence at ICLR?
How can NLP help cure cancer? Invited talk by Regina Barzilay.
Dr. Barzilay gave a very interesting presentation on using natural language processing to automate information extraction from medical records and then using this information for medical analysis. She explained how to get around the lack of annotated medical data when training an information extraction algorithm, using transfer learning and a set of rules as a supervision proxy. She also raised the issue of interpretability: models become less interpretable as they grow more complex. The proposed solution was a modular neural framework that, in addition to producing the correct prediction, also produces a small summary of the input text as a justification of the prediction.
The ability to accurately measure the similarity between two texts has multiple applications in natural language processing, such as sentence classification, paraphrase detection, or linking news headlines to social media posts.
This paper proposes a fixed-length vector representation of sentences and then uses these representations to compute the distance between sentences. The method was shown to outperform existing methods like LSTMs despite being much simpler and using little to no supervision, which can be extremely helpful when training data is scarce.
The proposed method, called SIF for smooth inverse frequency, is a theoretically derived weighting scheme for dense word vector representations like word2vec. It was evaluated experimentally on benchmark datasets such as the SemEval semantic textual similarity task, an annual challenge where teams build systems that measure the similarity between sentences. SIF achieved state-of-the-art results using semi-supervised learning, beating supervised methods.
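In essence, SIF computes a weighted average of word vectors, down-weighting frequent words with the factor a/(a + p(w)), then removes each sentence's projection on the embeddings' first singular vector. Here is a minimal sketch with toy inputs (the function name and data layout are illustrative, not from the paper's released code):

```python
import numpy as np

def sif_embedding(sentences, word_vecs, word_freq, a=1e-3):
    """Sketch of SIF sentence embeddings.

    sentences: list of token lists
    word_vecs: dict token -> word vector (np.array)
    word_freq: dict token -> unigram probability p(w)
    """
    embs = []
    for sent in sentences:
        # Weighted average: rare words get weights close to 1,
        # very frequent words are strongly down-weighted.
        vecs = [word_vecs[w] * (a / (a + word_freq[w]))
                for w in sent if w in word_vecs]
        embs.append(np.mean(vecs, axis=0))
    X = np.stack(embs)
    # Remove the common component: projection on the first
    # right singular vector of the embedding matrix.
    u = np.linalg.svd(X, full_matrices=False)[2][0]
    return X - np.outer(X @ u, u)
```

Sentence similarity is then simply the cosine similarity between the resulting rows.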
Generative adversarial networks (GANs) were introduced by Ian Goodfellow et al. in 2014 and have been widely popular since, as they offer a novel approach to unsupervised learning: they approximate the probability distribution of the input data and then use the learned distribution to generate realistic-looking synthetic samples. They do so by making a generative model play against a discriminative model in a zero-sum game, where the generative model tries to create realistic synthetic samples and the discriminative model tries to tell the real examples from the generated ones. However, GANs are known for having poorly understood optimization stability issues. This work from Arjovsky et al. aims at improving the theoretical understanding of GANs while also modifying the learning algorithm to achieve better stability.
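Concretely, the zero-sum game described above is the minimax objective from Goodfellow et al.'s original paper:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

where $G$ maps random noise $z$ to synthetic samples and $D$ outputs the probability that a sample is real. The stability issues studied by Arjovsky et al. arise when this objective is optimized with alternating gradient steps on $G$ and $D$.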
This paper introduces a new neural network architecture for sequence modelling in natural language processing that not only uses a dense word representation but also combines it with a character-level representation. It does so through a gating mechanism that learns, depending on the type of input, which information to focus on and which to ignore.
Dense word vector representations are usually fed as input to neural network based language models and fine-tuned during training to adapt them to the task at hand. The downside of relying solely on word-level features is that morphologically similar words can have very different representations and out-of-vocabulary words are poorly represented. This is why methods using character-level features were introduced.
In this paper the character-level features are derived from a recurrent neural network applied to the character sequence of each word. Usually this character-level representation is simply concatenated with the word-level representation (GloVe, word2vec, etc.) and then fed into a language model such as a bidirectional LSTM to perform a task like sentiment analysis or semantic role labelling. The issue with this naive approach is that both representations are used for every word, even though it would make more sense to rely on the word-level representation in some situations (a very frequent word with an accurate word vector) and on the character-level representation in others (less frequent or out-of-vocabulary words). This is why the authors use a gating mechanism that decides which representation to favour based on how frequent the word is, its part of speech, what type of named entity it is, and the representation of the word itself. All this information is used to learn a gating function that gives more weight to either the word representation or the character representation.
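The core idea can be sketched in a few lines: a learned sigmoid gate mixes the two representations dimension by dimension. The names, the feature set, and the exact parameterization below are illustrative, not the paper's precise formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_word_rep(word_vec, char_vec, features, W, b):
    """Sketch of fine-grained gating between word- and char-level vectors.

    features: word-level signals (e.g. frequency, POS and NER indicators,
              the word vector itself) concatenated into one vector.
    W, b:     learned gate parameters, trained jointly with the model.
    """
    g = sigmoid(W @ features + b)          # gate in (0, 1), one value per dim
    # g close to 1 -> trust the word-level representation;
    # g close to 0 -> trust the character-level representation.
    return g * word_vec + (1.0 - g) * char_vec
```

In training, the gradient flows through the gate, so the model learns for which kinds of words each representation is more reliable.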
This gating mechanism was tested empirically on multiple tasks: it improved over the baseline by around 2% on social media tag prediction and by around 1.7% on a question answering task (the WDW dataset, with 600k questions for training and 10k for testing).
This type of system is particularly useful for our work at Fortia Financial Solutions: when we build a system for semantic role labelling on legal documents, we want it to generalize well even on unseen or infrequent words. This ensures the best possible performance and an improved generalization capability.
Original image from Wikipedia
Finding the right deep network architecture to achieve the best possible performance is a challenging problem. Deep nets have a huge space of hyper-parameters that need to be optimized before the model can converge to an acceptable solution. This is usually done through careful manual engineering followed by an automated random search over a selected number of parameters. This paper uses reinforcement learning to create an agent that leverages the performance feedback from previously trained networks to make better and better design decisions. The authors showed that their agent was able to create architectures on par with state-of-the-art human-made ones.
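The outer loop of such a search can be illustrated with a toy REINFORCE-style sketch: the agent keeps a softmax policy over a discrete design choice, samples an architecture, observes its validation score, and nudges the policy toward choices that beat a running baseline. Everything here (the single design decision, the interface of `reward_fn`, the update rule details) is a simplification for illustration, not the paper's actual controller:

```python
import numpy as np

def architecture_search(reward_fn, choices, n_iters=500, lr=0.1, seed=0):
    """Toy sketch of an RL-driven architecture search loop.

    reward_fn: stand-in for "train the sampled network and return its
               validation accuracy" (the expensive step in practice).
    choices:   candidate values for one design decision, e.g. depth.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(choices))
    baseline = 0.0
    for _ in range(n_iters):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        i = rng.choice(len(choices), p=probs)   # sample an architecture
        r = reward_fn(choices[i])               # "train" and evaluate it
        baseline = 0.9 * baseline + 0.1 * r     # moving-average baseline
        grad = -probs
        grad[i] += 1.0                          # grad of log pi(choice i)
        logits += lr * (r - baseline) * grad    # REINFORCE update
    return choices[int(np.argmax(logits))]
```

The real systems make a sequence of such decisions (filter sizes, strides, layer counts) with a recurrent controller, but the feedback loop is the same shape.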
The authors applied the system to convolutional neural networks for image classification on the CIFAR-10 dataset and to recurrent neural networks on the Penn Treebank dataset, achieving state-of-the-art or near state-of-the-art performance on both.
Chatbots can have many useful applications, from retail to personal assistance. However, they suffer from the ambiguity and incompleteness of human language. The authors introduce a neural network with memory and soft attention and show empirically that their architecture makes better use of a memory to read useful information, reason, and ask clarification questions in order to answer a challenge question.
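A soft-attention read over memory, the building block such architectures rely on, is simple to sketch: score each memory slot against a query, turn the scores into softmax weights, and return the weighted sum, so the whole read stays differentiable. This generic sketch uses dot-product scoring; the paper's exact scoring function may differ:

```python
import numpy as np

def soft_attention(query, memory):
    """Differentiable soft-attention read over a memory matrix.

    query:  (d,) vector summarizing what the model is looking for.
    memory: (n, d) matrix, one stored fact per row.
    Returns (read_vector, attention_weights).
    """
    scores = memory @ query                  # one score per memory slot
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ memory, weights
```

Because the weights are soft rather than a hard argmax, gradients flow into both the query and the memory, which is what lets the network learn what to attend to end to end.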
Aside from these examples, which are particularly useful to our mission at Fortia Financial Solutions, there were many more papers on other aspects of representation learning, such as optimization methods for deep nets, GANs, reinforcement learning, and transfer learning.
The AI field is advancing so fast that many research papers published just a few years ago are already obsolete. This is why following conferences like ICLR is vital for building better machine learning models and adapting them to meet industry needs.