Introduction to Graph Neural Networks on AWS
Recent advances in neural networks have made them one of the most successful technologies for pattern recognition and data mining. Problems such as object detection, machine translation, and speech recognition are now handled well by deep learning, because the underlying data (images, text, and audio) is Euclidean.
Quick Trivia: Euclidean data is data that can be plotted in n-dimensional linear space. But what about applications that generate data from non-Euclidean domains, such as graphs with complex relationships between objects? Conventional deep learning networks struggle with this type of data. This is where Graph Neural Networks (GNNs) come to the rescue: they are designed to solve problems on non-Euclidean datasets.
What is a Graph?
Before understanding what a graph neural network is, it is necessary to understand what a graph is and how it looks. Graphs are all around us. In simple terms, any set of objects (nodes) and the connections between them (edges) can be represented as a graph.
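As a minimal sketch of this idea, a graph can be held in Python as a mapping from each object to the objects it is connected to. The people and friendships below are invented purely for illustration.

```python
# A tiny undirected graph: nodes are people, edges are friendships.
# All names are hypothetical examples.
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice"), ("Carol", "Dave")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edge_list:
        # Undirected graph: record the connection in both directions.
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


graph = build_adjacency(edges)

# The degree of a node is the number of connections it has.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```

Here `graph["Carol"]` holds Carol's three neighbors, while `degrees["Dave"]` is 1 because Dave has a single connection.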
What are Graph Neural Networks (GNN)?
A graph neural network (GNN) is a type of neural network that operates directly on graph data and provides a straightforward way to do node-level, edge-level, and graph-level prediction tasks. The primary goal of a GNN architecture is to learn, for each node, an embedding that contains information about its neighborhood. An embedding places each node at a location in a vector space.
Simply put, graph neural networks are a subclass of deep learning techniques built specifically to work with graph data and make inferences from it.
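To make the idea of a neighborhood-aware embedding concrete, here is a simplified sketch of one round of message passing, in which each node's new vector is the mean of its own features and its neighbors' features. This is an illustration only: real GNN layers also apply learned weights and a non-linearity, which are left out here, and the graph and feature values are made up.

```python
def mean_aggregate(adjacency, features):
    """One simplified message-passing round.

    Each node's new embedding is the element-wise mean of its own
    feature vector and the feature vectors of its neighbors.
    """
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        updated[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(dim)
        ]
    return updated


# Hypothetical star-shaped graph: A is connected to B and C.
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 1.0]}

embeddings = mean_aggregate(adjacency, features)
```

After one round, the embedding of `B` is a blend of its own features and those of `A`, so it already carries information about its neighborhood. Stacking more rounds would spread information from nodes further away.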
GNNs can help detect fraud and abuse, power customer recommendations, and target marketing campaigns, for example by deciding who should receive discounts or by identifying influencers. In each case, the GNN analyzes the connections between entities rather than treating each entity in isolation.
Why are Graph Neural Networks (GNN) important?
Traditional deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) do not work well on graph data, for the following reasons:
- Graphs have arbitrary size and complex topology.
- This means there is no fixed spatial locality of the kind CNNs exploit when they slide a kernel over an image.
- Graphs have no fixed node ordering.
- If the nodes are labeled A, B, C, D the first time and B, D, A, C the second time, the input matrix changes even though the graph is the same.
- The graph itself is invariant to node ordering.
- So the model must produce the same result regardless of how the nodes are ordered.
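The node-ordering problem can be seen in a small sketch. The same four-node path graph is written as an adjacency matrix under the two orderings mentioned above; the matrices differ, yet a summary computed by summing over all nodes does not. The `sum_readout` function is a hypothetical stand-in for the order-independent aggregation a GNN relies on, not an actual GNN layer.

```python
def adjacency_matrix(order, edges):
    """Build an adjacency matrix with rows and columns in the given node order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def sum_readout(matrix):
    """Order-independent summary: sum of squared node degrees."""
    return sum(sum(row) ** 2 for row in matrix)


# The same graph (a path A - B - C - D) under two node orderings.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
first = adjacency_matrix(["A", "B", "C", "D"], edges)
second = adjacency_matrix(["B", "D", "A", "C"], edges)

matrices_differ = first != second
readouts_match = sum_readout(first) == sum_readout(second)
```

A model that consumed the raw matrix row by row would see two different inputs, while an aggregation that sums over nodes gives the same answer for both orderings.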
Graph Neural Networks on AWS
Amazon Web Services (AWS) is a cloud platform that provides more than 200 services for different problems.
- AWS provides Amazon Neptune, a fully managed graph database service.
- Neptune ML uses GNNs to create models for various machine learning tasks.
- The workflow starts by loading the graph data from the Neptune database, then preprocessing it and preparing training and test sets.
- In the next step, GNN models are built and trained with Amazon SageMaker using the Deep Graph Library (DGL).
- These models can be used for tasks such as node classification, link prediction, and graph classification.
Amazon Neptune ML uses graph neural networks (GNNs) to make easy, fast, and more accurate predictions using graph data. Making predictions on graphs with billions of relationships can be difficult and time-consuming; DGL makes it easier to apply deep learning to graph data.
Neptune ML automates the task of selecting and training the best ML models for graph data, and it lets users run machine learning directly on graph data using Neptune APIs and queries. As a result, you can create, train, and apply ML on Amazon Neptune in hours instead of weeks.
Architecture Diagram
- Data: Graph data is stored in the Neptune DB.
- Data export and configuration: Data is exported to an Amazon S3 bucket using the Neptune export service.
- Data preprocessing: The exported dataset is preprocessed to prepare it for model training. At the end of this step, a DGL graph is generated.
- Model training: The DGL graph is fed into SageMaker training jobs for model training.
- Inference endpoint: The trained model artifacts are deployed to a SageMaker endpoint instance.
- Query the ML model: Predictions are queried from the inference endpoint using Gremlin.
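As a rough illustration of the last step, the sketch below assembles a Gremlin query string for a node-classification prediction. The `Neptune#ml.endpoint` and `Neptune#ml.classification` predicates follow the pattern used in the Neptune ML documentation as best understood here and should be checked against the current docs. The helper function, the endpoint name, the vertex ID, and the property name are all hypothetical.

```python
def node_classification_query(endpoint_name, vertex_id, property_name):
    """Build a Gremlin query string asking Neptune ML to predict a vertex property.

    Assumption: the query follows the Neptune ML inference pattern of
    attaching the SageMaker endpoint with `with(...)` and marking the
    property step as a classification request. Values are interpolated
    directly, so only trusted inputs should be passed in.
    """
    return (
        f'g.with("Neptune#ml.endpoint", "{endpoint_name}")'
        f'.V("{vertex_id}")'
        f'.properties("{property_name}")'
        '.with("Neptune#ml.classification")'
    )


# Hypothetical values for illustration only.
query = node_classification_query("my-gnn-endpoint", "movie_1", "genre")
print(query)
```

The resulting string would then be submitted to the Neptune Gremlin endpoint with whatever client the application already uses.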
Benefits of Enabling Graph Neural Networks on AWS
- Neptune ML creates, trains, and applies ML models automatically.
- It uses DGL to select and train the best ML model for the workload.
- It enables us to make predictions in hours instead of weeks.
- It can make accurate predictions even on graphs with billions of relationships.
- It can improve the accuracy of most predictions by more than 50% compared with non-graph methods, according to research from Stanford University.
How to Implement Graph Neural Networks on AWS?
This section briefly explains how to implement a GNN with Amazon Neptune.
- Creating a graph database: Create a database in Amazon Neptune. The simplest way is to use an AWS CloudFormation template.
- Picking a dataset: Pick any available graph dataset.
- Formatting the dataset: Neptune accepts several file formats: the property-graph load format (csv) and RDF formats (nquads, rdfxml, ntriples).
- Loading the dataset into Neptune: Neptune requires a VPC endpoint for S3 in the VPC. A curl call initiates data loading and takes parameters such as the cluster endpoint, source, format, region, and more.
- Exploring the graph data with Gremlin: Gremlin is a graph traversal language that can be used to query and inspect the loaded data.
- Training a GNN with the Deep Graph Library (DGL): DGL is an open-source project that helps in training GNNs. This step includes building the graph, defining the GNN, and training it.
- Deploying the model: The model can be deployed to a SageMaker endpoint.
- Predictions: Predictions are made with Gremlin queries that combine the endpoint with graph data from the Neptune DB.
- Monitoring: Monitoring can be done with services such as Amazon CloudWatch, audit log files, and AWS CloudTrail.
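The formatting and loading steps above can be sketched in Python. The first helper writes vertices in the property-graph CSV layout, with `~id` and `~label` system columns followed by typed property columns. The second builds the URL and JSON body for a bulk-load request, the same information a curl call would send. The cluster endpoint, bucket, role ARN, and sample vertices are hypothetical placeholders, port 8182 is assumed as the default, and nothing is actually sent.

```python
import csv
import io
import json


def vertices_to_csv(vertices):
    """Serialize vertices in the property-graph CSV layout (~id, ~label, properties)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["~id", "~label", "name:String"])
    for vertex in vertices:
        writer.writerow([vertex["id"], vertex["label"], vertex["name"]])
    return buffer.getvalue()


def loader_request(cluster_endpoint, source, region, iam_role_arn, data_format="csv"):
    """Return the (url, json_body) pair for a Neptune bulk-load request.

    Assumption: the loader listens on port 8182 at the /loader path.
    """
    url = f"https://{cluster_endpoint}:8182/loader"
    payload = {
        "source": source,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
    return url, json.dumps(payload)


# Hypothetical values for illustration only.
csv_text = vertices_to_csv([{"id": "u1", "label": "user", "name": "Alice"}])
url, body = loader_request(
    cluster_endpoint="my-cluster.example.neptune.amazonaws.com",
    source="s3://my-example-bucket/graph-data/",
    region="us-east-1",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
)
```

The `body` string is what would be posted to `url` with a `Content-Type: application/json` header from a machine inside the same VPC as the cluster.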
Use Cases of Graph Neural Networks
Some notable use cases of graph neural networks are described below:
- Enhancing recommendations at e-commerce stores: GNNs can recommend products that are more relevant to the individual customer, where a previous approach simply showed the top-selling products from past weeks.
- Improving travel time predictions: Google uses graph neural networks to reduce inaccurate travel time forecasts in Google Maps. It divides the road network into supersegments.
- Social media suggestions: GNNs work in the background when you see suggestions about pages and people to follow on social sites.
Conclusion
Traditional neural networks struggle with graph data, which sometimes contains billions of relationships, because they were developed for Euclidean data such as images, text, and tables. AWS provides Amazon Neptune, a fully managed graph database service, and Amazon SageMaker, which uses the Deep Graph Library to train graph neural networks. Graph neural networks are gaining popularity because of the way they capture the relationships in graph data. A wide range of future applications can therefore use GNNs to overcome the limitations that conventional deep learning networks face on such data.