Introduction
Figure: Structural and 3D geometric graph representation of Zopiclone.
This page contains source code and data sets for the International Conference on Data Mining (ICDM) 2008 publication Frequent Subgraph Retrieval in Geometric Graph Databases. The materials are made available under an open-source license to foster the discussion and adoption of this and related approaches.
Abstract.
Discovery of knowledge from geometric graph databases is of particular
importance in chemistry and biology, because chemical compounds and proteins
are represented as graphs with 3D geometric coordinates.
In such applications, scientists are not interested in the statistics of
the whole database. Instead they need information about a novel drug candidate
or protein at hand, represented as a query graph. We propose a
polynomial-delay algorithm for geometric frequent subgraph retrieval.
It enumerates all subgraphs of a single given query graph which are
frequent geometric epsilon-subgraphs under the entire class of rigid
geometric transformations in a database. By using geometric
epsilon-subgraphs, we achieve tolerance against variations in geometry.
We compare the proposed algorithm to gSpan on chemical compound data, and we
show that for a given minimum support the total number of frequent patterns is
substantially limited by requiring geometric matching. Although the
computation time per pattern is larger than for non-geometric graph mining,
the total time is within a reasonable level even for small minimum support.
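To make the idea of geometric tolerance and minimum support concrete, here is a minimal sketch in Python. It is not the FreqGeo implementation and does not use the paper's definition of an epsilon-subgraph. As a stand-in, it compares pairwise distances between corresponding points, which are unchanged by rotation and translation (and also by reflection, so this test is looser than a strict rigid-motion test). The function names, the fixed point-to-point correspondence, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def matches_within_epsilon(pattern, candidate, eps):
    """Return True if all corresponding pairwise distances differ by at most eps.

    Assumption: pattern[i] corresponds to candidate[i]. A real subgraph
    matcher would also have to search over vertex mappings and check edges.
    """
    if len(pattern) != len(candidate):
        return False
    return all(
        abs(dist(pattern[i], pattern[j]) - dist(candidate[i], candidate[j])) <= eps
        for i, j in combinations(range(len(pattern)), 2)
    )


def support(pattern, database, eps):
    """Count the database entries that match the pattern within eps."""
    return sum(matches_within_epsilon(pattern, entry, eps) for entry in database)


# Hypothetical toy data: a three-point pattern, a translated copy with a
# small perturbation, and a copy stretched along one axis.
pattern = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.05), (5.0, 6.0, 5.0)]
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
database = [shifted, stretched]

print(support(pattern, database, eps=0.1))
```

With a tolerance of 0.1 the translated, slightly perturbed copy still counts toward the support, while the stretched copy does not. A pattern would be called frequent if this count reached the chosen minimum support.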
Features
The distributed source code can perform the following functions:
- Frequent Subgraph Retrieval for Geometric Graphs (3D coordinates)
- Matlab interface: freqgeo, isgeosubgraph
All of the code is written in C++ for GNU/Linux. (It is probably easy to port to Windows, because no special platform-dependent libraries are used.)
Download
Distribution: source code, precompiled binary and demo file
- freqgeo-1.0.tar.bz2 (6.3 MB)
License: The FreqGeo software is licensed under the OSI-approved GNU Affero General Public License (GNU AGPLv3). A copy of the license document is included in the distribution.
Installation: See the src/Makefile file and adjust the MATLAB path. On GNU/Linux it should compile on both 32-bit and 64-bit systems.
Demo: The distribution includes all three experiments and four datasets from the paper, as well as some small test cases; see the exp and test Matlab scripts.
Publications
- Frequent Subgraph Retrieval in Geometric Graph Databases, ICDM 2008, Sebastian Nowozin and Koji Tsuda.
- Frequent Subgraph Retrieval in Geometric Graph Databases, extended technical report version of the ICDM 2008 paper, Sebastian Nowozin and Koji Tsuda.
- gBoost: A Mathematical Programming Approach to Graph Classification and Regression, Machine Learning, Hiroto Saigo, Sebastian Nowozin, Tadashi Kadowaki, Taku Kudo and Koji Tsuda.
Contact
- sebastian.nowozin@tuebingen.mpg.de, corresponding author
If you have comments or questions, please feel free to contact me. Thanks!