In recent years, there has been an explosion of interest in neural networks designed for graph-structured data. Graph-structured data is any set of nodes connected by edges. We denote a graph as G = (V, E) with node set V and edge set E. These graphs can also have node and edge features, denoted by matrices X ∈ ℝ^(|V|×d) and Xᵉ ∈ ℝ^(|E|×c), where d and c are the dimensions of the node and edge features. Neural network models that operate on such graphs are generally called Graph Neural Networks (GNNs). Applications of GNNs include social networks, biological networks, and more, with tasks such as node classification, graph classification, and link prediction. For a more comprehensive introduction to graph neural networks, please refer to the survey paper or the following Medium article. In this article I want to give a quick summary of how graph deep learning is being applied to problems in natural language processing.
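To make the notation concrete, here is a minimal sketch of graph-structured data in plain Python. The node names, edges, and feature values are made up for illustration; real GNN libraries use tensor representations instead.

```python
# A minimal sketch of graph-structured data (all values are illustrative).

# Edge set E as pairs of nodes; the node set V is implied by the edges.
edges = [("a", "b"), ("b", "c"), ("a", "c")]

# Node feature matrix X: one d-dimensional feature vector per node (d = 2).
node_features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.5, 0.5],
}

# Build an adjacency list for an undirected graph.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

print(sorted(adjacency["a"]))  # neighbors of node "a"
```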
Text Classification
Text classification is a supervised learning task that involves assigning categories to text or documents. It can be used to organize and structure text. Given an input corpus of documents, we would like to classify each document into pre-defined categories.
Traditionally, text classification was achieved through feature engineering followed by classification algorithms such as a Naive Bayes or Support Vector Machine classifier applied to those features. More recently, deep learning methods have emerged that use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to learn informative word representations that are later fed to a classifier.
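As a toy sketch of the traditional pipeline, here is bag-of-words feature extraction plus a multinomial Naive Bayes classifier in plain Python. The corpus, labels, and category names are made up for illustration.

```python
# Toy traditional text classification: bag-of-words features + Naive Bayes.
# The training corpus and labels below are made up for illustration.
from collections import Counter, defaultdict
import math

train = [
    ("the game was a great win", "sports"),
    ("the team lost the match", "sports"),
    ("stocks fell as markets closed", "finance"),
    ("the bank raised interest rates", "finance"),
]

# Count word occurrences per class (the feature-engineering step).
class_docs = defaultdict(int)
word_counts = defaultdict(Counter)
for text, label in train:
    class_docs[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    best, best_score = None, -math.inf
    total_docs = sum(class_docs.values())
    for label in class_docs:
        # log prior + log likelihood with add-one (Laplace) smoothing
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

print(predict("the match was a great game"))  # -> "sports"
```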
The work Graph Convolutional Networks for Text Classification by Yao et al. introduces the construction of a text graph for a corpus in which the nodes are words and documents. Weighted edges are built according to word co-occurrence for word-word edges and TF-IDF for word-document edges. The intuition behind this graph construction is that it captures document-word relations as well as global word-word relations. As a result, this work does not require any specific feature engineering or any RNN or CNN for learning word representations. Instead, the model utilizes the graph as constructed above with the node feature matrix X = I, the identity matrix. In other words, each node feature vector is a one-hot representation. By utilizing a Graph Convolutional Network (GCN) on this graph, the authors were able to achieve document and text representations that outperform CNN and LSTM baselines. For more details about this work and GCNs, please check out this blog post.
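The graph construction above can be sketched in a few lines of plain Python. The tiny corpus is made up; note that the paper weights word-word edges by pointwise mutual information over a sliding window, whereas this sketch uses raw per-document co-occurrence counts to stay short.

```python
# Sketch of a TextGCN-style graph: word-document edges weighted by TF-IDF
# and word-word edges weighted by co-occurrence counts (the paper uses PMI
# over a sliding window; raw counts here keep the sketch short).
import math
from collections import Counter
from itertools import combinations

docs = ["graph neural network", "neural network model", "graph model"]

# --- word-document edges: TF-IDF ---
n_docs = len(docs)
df = Counter()                        # document frequency per word
for d in docs:
    df.update(set(d.split()))

edges = {}                            # (node_u, node_v) -> weight
for i, d in enumerate(docs):
    tf = Counter(d.split())
    for w, c in tf.items():
        idf = math.log(n_docs / df[w])
        edges[(f"doc{i}", w)] = (c / len(d.split())) * idf

# --- word-word edges: co-occurrence within a document ---
for d in docs:
    for u, v in combinations(sorted(set(d.split())), 2):
        edges[(u, v)] = edges.get((u, v), 0) + 1

print(edges[("graph", "neural")])     # co-occurrence count
```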
Multi-Document Question Answering
In the NLP community, there has been considerable progress on the question answering task. In this task, we are given a text question and a corresponding passage, and the goal is to output the correct answer, which spans part of the passage. Such tasks test a model's ability to reason from text. In a similar vein, there has been a focus on reasoning from multiple documents of text. One example is a dataset containing passages from Wikipedia. Here, given a query and relation (analogous to the question), a set of supporting documents, and a set of candidate answers, the goal is to select from the candidates the correct answer that fulfills the (query, relation, answer) triplet.
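A single instance of this kind of task might look like the following. The field names and contents are illustrative, not the actual schema of any dataset.

```python
# Hedged sketch of one multi-document QA instance of the kind described
# above (field names and contents are made up for illustration).
instance = {
    "query": ("located_in", "Hanging Gardens"),  # (relation, subject)
    "supports": [
        "The Hanging Gardens were described by ancient writers ...",
        "Babylon was a city of ancient Mesopotamia ...",
    ],
    "candidates": ["Babylon", "Athens", "Rome"],
    "answer": "Babylon",
}

# The task: select the candidate that completes the
# (subject, relation, answer) triplet using the supporting documents.
print(instance["answer"] in instance["candidates"])
```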
Accordingly, there have been several works that aim to tackle the multi-document question answering task using graph neural networks. I won't go into the details of each paper, but I think it is worthwhile to discuss the graph construction. In the work by Song et al., they use Named Entity Recognition (NER) and coreference resolution systems to find entities and all mentions of those entities in the supporting documents. The algorithms for NER and coreference resolution are out of the scope of this article. In short, NER aims to find named entities (a person, place, or thing that has a name) and coreference resolution aims to find expressions that refer to the same entity. These entities and mentions form the nodes of the graph. The edges are constructed as follows:
- an edge between two mentions of the same entity if they appear in different documents, or if they appear in the same document at a distance larger than some threshold t
- an edge between each entity and every one of its coreferent mentions
- an edge between two mentions of different entities in the same passage within the threshold t
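The construction above can be sketched as follows. The mention tuples are made up; a real system would obtain them from NER and coreference resolution, and this sketch collapses the rules into two edge labels for brevity.

```python
# Sketch of the entity-graph construction described above. Each mention is
# (mention_id, entity, doc_id, position); all values are illustrative.
mentions = [
    ("m0", "Obama", 0, 3),
    ("m1", "Obama", 1, 10),
    ("m2", "Hawaii", 0, 5),
    ("m3", "Hawaii", 0, 40),
]
t = 20  # distance threshold (made-up value)

edges = set()
for i, (id_a, ent_a, doc_a, pos_a) in enumerate(mentions):
    for id_b, ent_b, doc_b, pos_b in mentions[i + 1:]:
        if ent_a == ent_b:
            # same entity: link across documents, or far apart within one
            if doc_a != doc_b or abs(pos_a - pos_b) > t:
                edges.add((id_a, id_b, "same_entity"))
        elif doc_a == doc_b and abs(pos_a - pos_b) <= t:
            # different entities mentioned close together in one passage
            edges.add((id_a, id_b, "same_passage"))

print(sorted(edges))
```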
Similarly, the work by De Cao et al. finds mentions of all candidate answers and the query subject via exact matching (i.e., the character strings are identical), and additionally uses coreference resolution to find further mentions. Each entity mention is a node in the graph. Likewise, edges are constructed as follows (each bullet is its own edge type):
- an edge if the two nodes co-occur in the same document
- an edge if the two nodes refer to the same entity (possibly across documents) via exact matching
- an edge if the two nodes refer to the same entity as returned by the coreference resolution solver
- an edge between any two nodes that do not share any of the above edge types (to make sure the graph is connected)
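A simplified sketch of these edge types on a toy mention graph follows. Each mention is (id, entity, doc); the values are made up, the coreference pairs would come from a solver, and this sketch assigns one type per node pair for brevity, whereas the paper allows a pair to carry several edge types.

```python
# Simplified sketch of the four edge types above (one type per pair here;
# the paper allows multiple). All mentions and values are illustrative.
mentions = [("m0", "Mars", 0), ("m1", "Mars", 1), ("m2", "rover", 1)]
coref_pairs = set()  # e.g. {("m0", "m2")} if a coreference solver linked them

edge_type = {}
for i, (a, ent_a, doc_a) in enumerate(mentions):
    for b, ent_b, doc_b in mentions[i + 1:]:
        if doc_a == doc_b:
            edge_type[(a, b)] = "DOC"         # co-occur in one document
        elif ent_a == ent_b:
            edge_type[(a, b)] = "MATCH"       # exact match, across documents
        elif (a, b) in coref_pairs:
            edge_type[(a, b)] = "COREF"       # linked by coreference
        else:
            edge_type[(a, b)] = "COMPLEMENT"  # fallback keeps graph connected

print(edge_type)
```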
Finally, inspired by the two works above, Tu et al. build a graph with a node for each document, each candidate answer, and each exact match of a candidate or the subject in the documents. In this case there are three node types, namely candidate nodes, document nodes, and entity nodes (the exact matches). They also use multiple edge types, as follows (each bullet is its own edge type):
- an edge between a document node and a candidate node if the candidate appears in the document
- an edge between a document node and an entity node if the entity comes from the document
- an edge between a candidate node and an entity node if the entity is a mention of the candidate
- an edge between two entity nodes if they are from the same document
- an edge between two entity nodes if they are mentions of the same candidate or subject but appear in different documents
- edges connecting all candidate nodes to one another
- edges connecting all entity nodes that have not been connected by any of the above
In all of these papers, a Graph Convolutional Network, or a slight variant of it (to allow for multiple edge types), is used. For more information about the node features, please refer to each individual paper. Overall, each paper aims to define some useful relational structure between entities and/or documents such that a GNN can reason over the graph.
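The "variant with multiple edge types" idea (as in relational GCNs) can be sketched as a single layer in plain Python on a toy graph. Real implementations use weight matrices and matrix libraries; here features are one-dimensional and each edge type gets a scalar weight, all with made-up values.

```python
# Minimal sketch of one graph-convolution layer with per-edge-type weights
# (the relational-GCN idea). Toy graph; all numbers are made up.

# edges_by_type[r] lists (src, dst) pairs for edge type r
edges_by_type = {
    "same_doc": [(0, 1), (1, 0)],
    "match":    [(1, 2), (2, 1)],
}
h = [1.0, 2.0, 3.0]                    # one scalar feature per node
w = {"same_doc": 0.5, "match": 0.25}   # one scalar weight per edge type
w_self = 1.0                           # self-loop weight

def layer(h):
    out = [w_self * x for x in h]      # self-loop term
    for r, edges in edges_by_type.items():
        for src, dst in edges:
            out[dst] += w[r] * h[src]  # aggregate neighbors per edge type
    return [max(0.0, x) for x in out]  # ReLU nonlinearity

print(layer(h))  # -> [2.0, 3.25, 3.5]
```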
Neural Machine Translation
Neural machine translation involves models that aim to translate text from one language into another. The conventional deep learning approach is an encoder-decoder framework in which the source-language text is encoded into an intermediate representation for each word of the input before being fed into a decoder that predicts the output in the target language. The encoder is often modeled with an RNN or CNN, with attention between the encoded representations and the decoder's hidden representation. Details of RNNs and CNNs are also out of the scope of this article.
Marcheggiani et al. build upon the conventional encoder-decoder structure by incorporating semantic-role representations. More specifically, the semantic arguments of the predicates in a sentence are marked and categorized according to their semantic roles. The predicate of a sentence is the part that says something about the subject. In the example from the paper, "gave" is the predicate and it has three arguments: John (semantic role A0, 'the giver'), wife (A2, 'the entity given to'), and present (A1, 'the thing given'). This structure is argued to aid machine translation because the relations are preserved even if the sentence structure differs, i.e., even if the sentence becomes "John gave a nice present to his wonderful wife". The paper uses this graph structure on top of the RNN/CNN encoder; in particular, the node features come directly from the RNN/CNN. This paper also uses a GCN on the semantic-role graph structure.
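The semantic-role structure from the example can be encoded as labeled edges from the predicate to its arguments, which is the graph the GCN then operates on (with node features coming from the encoder states). The encoding below is a sketch, not the paper's exact data format.

```python
# Sketch of the semantic-role graph for the example sentence, encoded as
# labeled predicate -> argument edges (a sketch, not the paper's format).
sentence = "John gave his wonderful wife a nice present".split()

# (predicate_index, argument_index, role) triples
srl_edges = [
    (1, 0, "A0"),  # gave -> John    (the giver)
    (1, 4, "A2"),  # gave -> wife    (the entity given to)
    (1, 7, "A1"),  # gave -> present (the thing given)
]

for pred, arg, role in srl_edges:
    print(f"{sentence[pred]} -{role}-> {sentence[arg]}")
```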
Overall, there are a lot of exciting applications of GNNs in natural language processing. In all these cases, researchers were able to find or construct useful graph-structured data from text data. The key is to find an insightful graph representation for the given application. From there, it is a relatively straightforward application of any competitive graph neural network.