N-gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang
University of Wisconsin-Madison
Page 1

N-gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang

University of Wisconsin-Madison

Page 2

Machine Learning Progress

• Significant progress in Machine Learning

[Examples: computer vision, machine translation, game playing, medical imaging]

Page 3

ML for Molecules?

Page 4

ML for Molecules?

• Molecule property prediction

[Diagram: molecule → Machine Learning Model → Toxic / Not Toxic]

Page 5

Challenge: Representations

• Input to traditional ML models: vectors

• How to represent molecules as vectors?
  • Fingerprints: Morgan fingerprints, etc.
  • Graph kernels: Weisfeiler-Lehman kernel, etc.
  • Graph Neural Networks (GNNs): Graph CNN, Weave, etc.

• Fingerprints/kernels: unsupervised, fast to compute
• GNNs: supervised end-to-end, more expensive, but powerful

Page 6

Our method: N-gram Graphs

• Unsupervised
• Relatively fast to compute
• Strong prediction performance
  • Overall better than traditional fingerprints/kernels and popular GNNs
• Inspired by the N-gram approach in Natural Language Processing

Page 7

N-gram Approach in NLP

โ€ข ๐‘›-gram is a consecutive sequence of ๐‘› words in a sentence

โ€ข Example: โ€œthis molecule looks beautifulโ€

โ€ข Its 2-grams: โ€œthis moleculeโ€, โ€œmolecule looksโ€, โ€œlooks beautifulโ€

Page 8

N-gram Approach in NLP

โ€ข ๐‘›-gram is a consecutive sequence of ๐‘› words in a sentence

โ€ข Example: โ€œthis molecule looks beautifulโ€

โ€ข Its 2-grams: โ€œthis moleculeโ€, โ€œmolecule looksโ€, โ€œlooks beautifulโ€

โ€ข N-gram count vector ๐‘(๐‘›) is a numeric representation vector

โ€ข coordinates correspond to all ๐‘›-grams

โ€ข coordinate value is the number of times the corresponding ๐‘›-gram shows up in the sentence

โ€ข Example: ๐‘(1) is just the histogram of the words in the sentence
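A minimal Python sketch of this count vector, using a dictionary as a sparse stand-in for the full vector with one coordinate per possible n-gram (the sentence is the slide's example):

```python
from collections import Counter

def ngram_counts(sentence: str, n: int) -> Counter:
    """Sparse c(n): how often each n-gram appears in the sentence."""
    words = sentence.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

print(ngram_counts("this molecule looks beautiful", 2))
# Counter({('this', 'molecule'): 1, ('molecule', 'looks'): 1, ('looks', 'beautiful'): 1})
```

The dense c(n) has one coordinate per possible n-gram, which motivates the dimension reduction on the next slide.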

Page 9

Dimension Reduction by Embeddings

• The n-gram vector c(n) has high dimension: V^n for vocabulary size V
• Dimension reduction by word embeddings: f(1) = W c(1)

Page 10

Dimension Reduction by Embeddings

• The n-gram vector c(n) has high dimension: V^n for vocabulary size V
• Dimension reduction by word embeddings: f(1) = W c(1)

[Diagram: f(1) = W c(1), where the i-th column of W is the embedding vector for the i-th word in the vocabulary]

• f(1) is just the sum of the word vectors in the sentence! (checked in code below)
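A toy numpy check of this identity; the five-word vocabulary and the random W are illustrative stand-ins for real word embeddings:

```python
import numpy as np

vocab = ["this", "molecule", "looks", "beautiful", "toxic"]  # toy vocabulary
rng = np.random.default_rng(0)
W = rng.standard_normal((4, len(vocab)))  # column i: embedding of word i (r = 4)

words = "this molecule looks beautiful".split()
c1 = np.zeros(len(vocab))
for w in words:                           # c(1): histogram of the words
    c1[vocab.index(w)] += 1

f1 = W @ c1                               # f(1) = W c(1) ...
assert np.allclose(f1, sum(W[:, vocab.index(w)] for w in words))
# ... equals the sum of the word vectors in the sentence
```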

Page 11

Dimension Reduction by Embeddings

• The n-gram vector c(n) has high dimension: V^n for vocabulary size V
• Dimension reduction by word embeddings: f(1) = W c(1)

For general n (see the sketch below):
• Embedding of an n-gram: entrywise product of its word vectors
• f(n): sum of the embeddings of the n-grams in the sentence

Page 12

N-gram Graphs

• Sentence: a linear graph on words
• Molecule: a graph on atoms with attributes

Analogy:
• Atoms with different attributes: different words
• Walks of length n: n-grams

Page 13

N-gram Graphs

• Sentence: a linear graph on words
• Molecule: a graph on atoms with attributes

Analogy:
• Atoms with different attributes: different words
• Walks of length n: n-grams

[Figure: a molecular graph and its 2-grams]

Page 14

N-gram Graph Algorithm

• Sentence: a linear graph on words
• Molecule: a graph on atoms with attributes

Given the embeddings for the atoms (vertex vectors):
• Enumerate all n-grams (walks of length n)
• Embedding of an n-gram: entrywise product of its vertex vectors
• f(n): sum of the embeddings of the n-grams
• Final N-gram Graph embedding f_G: concatenation of f(1), …, f(T)

Page 15

N-gram Graph Algorithm

• Sentence: a linear graph on words
• Molecule: a graph on atoms with attributes

Given the embeddings for the atoms (vertex vectors):
• Enumerate all n-grams (walks of length n)
• Embedding of an n-gram: entrywise product of its vertex vectors
• f(n): sum of the embeddings of the n-grams
• Final N-gram Graph embedding f_G: concatenation of f(1), …, f(T) (sketched below)

• Vertex vectors: trained by an algorithm similar to node2vec
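A brute-force sketch of the whole pipeline. Assumptions to note: random vertex vectors stand in for the node2vec-style trained embeddings, an n-gram is taken as any directed walk on n vertices (consecutive vertices adjacent), and r = 100, T = 6 mirror the settings quoted in the experiments:

```python
import numpy as np

def walks(A, n):
    """All directed walks on n vertices of the graph with adjacency matrix A."""
    ws = [[v] for v in range(A.shape[0])]
    for _ in range(n - 1):
        ws = [w + [u] for w in ws for u in range(A.shape[0]) if A[w[-1], u]]
    return ws

def ngram_graph_embedding(A, F, T):
    """f_G: concatenation of f(1), ..., f(T)."""
    parts = []
    for n in range(1, T + 1):
        fn = np.zeros(F.shape[0])
        for w in walks(A, n):
            prod = np.ones(F.shape[0])
            for v in w:
                prod *= F[:, v]              # entrywise product of vertex vectors
            fn += prod                       # f(n): sum over the n-grams
        parts.append(fn)
    return np.concatenate(parts)

A = np.array([[0, 1, 0],                     # toy 3-atom chain
              [1, 0, 1],
              [0, 1, 0]])
F = np.random.default_rng(1).standard_normal((100, 3))  # vertex vectors, r = 100
f_G = ngram_graph_embedding(A, F, T=6)
```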

Page 16

N-gram Graphs as Simple GNNs

• Efficient dynamic-programming version of the algorithm (sketched below)
• Given vectors f_i for vertices i, and the graph adjacency matrix A
• Equivalent to a simple GNN without parameters!
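A sketch of that dynamic program, consistent with the slide's description: each step aggregates neighbor states through A and gates them entrywise with the vertex's own vector, with no trainable parameters. It agrees with the brute-force enumeration above, since both sum over directed walks:

```python
import numpy as np

def ngram_graph_dp(A, F, T):
    """DP version: column i of F_n holds the summed embeddings of all
    directed walks on n vertices that end at vertex i."""
    F_n = F.copy()                    # base case: F_1 = F
    parts = [F_n.sum(axis=1)]         # f(1): sum of the vertex vectors
    for _ in range(2, T + 1):
        F_n = F * (F_n @ A)           # one message-passing step
        parts.append(F_n.sum(axis=1)) # f(n)
    return np.concatenate(parts)
```

Each iteration is a single message-passing round, which is the sense in which the method reads as a simple GNN without parameters.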

Page 17

Experimental Results

• 60 tasks on 10 datasets from [1]
• Methods:
  • Weisfeiler-Lehman kernel + SVM
  • Morgan fingerprints + Random Forest (RF) or XGBoost (XGB)
  • GNNs: Graph CNN (GCNN), Weave Neural Network (Weave), Graph Isomorphism Network (GIN)
  • N-gram Graphs + Random Forest (RF) or XGBoost (XGB)
• Vertex embedding dimension r = 100, T = 6

[1] Wu, Zhenqin, et al. "MoleculeNet: a benchmark for molecular machine learning." Chemical Science 9.2 (2018): 513-530.

Page 18

Experimental Results

• N-gram+XGB: top-1 on 21 of the 60 tasks, and top-3 on 48
• Overall better than the other methods

Page 19

Runtime

• Relatively fast

Page 20

Theoretical Analysis

• Recall f(1) = W c(1)
  • W is the vertex embedding matrix
  • c(1) is the count vector
• With sparse c(1) and random W, c(1) can be recovered from f(1)
  • Well-known in compressed sensing

Page 21

Theoretical Analysis

• Recall f(1) = W c(1)
  • W is the vertex embedding matrix
  • c(1) is the count vector
• With sparse c(1) and random W, c(1) can be recovered from f(1)
  • Well-known in compressed sensing
• In general, f(n) = T(n) c(n), for some linear mapping T(n) depending on W
• With sparse c(n) and random W, c(n) can be recovered from f(n)

Page 22

Theoretical Analysis

• Recall f(1) = W c(1)
  • W is the vertex embedding matrix
  • c(1) is the count vector
• With sparse c(1) and random W, c(1) can be recovered from f(1)
  • Well-known in compressed sensing (a toy demo follows below)
• In general, f(n) = T(n) c(n), for some linear mapping T(n) depending on W
• With sparse c(n) and random W, c(n) can be recovered from f(n)
• So f(n) preserves the information in c(n)
• Furthermore, we can prove that a regularized linear classifier on f(n) is competitive with the best linear classifier on c(n)
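A toy illustration of the recovery claim for n = 1, with orthogonal matching pursuit from scikit-learn as a stand-in recovery procedure (the sizes and the Gaussian W are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
V, r, k = 1000, 100, 5                        # vocab size, embedding dim, sparsity
W = rng.standard_normal((r, V)) / np.sqrt(r)  # random embedding matrix

c1 = np.zeros(V)                              # sparse count vector c(1)
c1[rng.choice(V, size=k, replace=False)] = rng.integers(1, 4, size=k)

f1 = W @ c1                                   # compressed representation f(1) = W c(1)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(W, f1)                                # rows of W act as the measurements
print(np.allclose(omp.coef_, c1))             # True: recovery succeeds w.h.p.
```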

Page 23

THANK YOU!

