In my previous post I started a discussion of graph analytics in which connections and links among different types of entities can be analyzed to find patterns that lead to actionable intelligence. But before we can explore the details of the types of analyses to be performed, we must first get grounded in how connectivity networks are modeled and managed.
Graph analysis is driven by a model that can capture the links between entities, and such a model of entities and connections can be elegantly described using a mathematical abstraction known as a graph. A graph is defined as a collection of vertices (each representing an entity) connected by labeled edges. Each connection in a graph can be described using three pieces of information: the two entities and the type of the edge that connects them. As a simple example, if John Smith works for International Widget Corporation, then John Smith and International Widget Corporation would each be represented as a vertex, and a single edge labeled “works for” would connect the two vertices.
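Because each connection reduces to three pieces of information, a graph can be sketched as a set of (entity, edge type, entity) triples. The following is a minimal Python sketch, assuming plain strings are enough to identify entities; the `Edge` type and the `neighbors` helper are hypothetical names chosen for illustration, not part of any particular graph library.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One connection: two entities plus the type of edge linking them."""
    source: str
    label: str
    target: str


# The example from the text: two vertices joined by a single "works for" edge.
edges = {
    Edge("John Smith", "works for", "International Widget Corporation"),
}

# Vertices are implied by the endpoints of the edges.
vertices = {v for e in edges for v in (e.source, e.target)}


def neighbors(vertex):
    """Return every vertex that shares an edge with `vertex`, in either direction."""
    return ({e.target for e in edges if e.source == vertex}
            | {e.source for e in edges if e.target == vertex})


print(sorted(vertices))
print(neighbors("John Smith"))
```

Storing triples in a set keeps the sketch close to the three-part description above; a real system would index edges by endpoint rather than scanning them all.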
The graph model abstraction is straightforward, yet it provides broad flexibility. Adding attributes to the vertices and to the edges linking them provides context about what the relationships represented within the graph actually mean. For example:
- Labeling vertices describes the types of entities in the graph, such as a “husband” or “employee.”
- Edges can be labeled with the nature of the relationship, such as “is married to” or “works for.”
- Edges can be directed to indicate the “flow” of the relationship, such as when one person is a child of another person.
- Magnitudes (often called weights) can be assigned to the relationships represented by the edges, such as the strength or frequency of a connection.
- Additional properties (such as durations) can be attributed to both edges and vertices.
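The attributes listed above (vertex labels, edge labels, direction, magnitudes and additional properties) together describe what is often called a property graph. The sketch below shows one way to hold them in Python. The `PropertyGraph` class and its `add_vertex`, `add_edge` and `outgoing` methods are hypothetical, as are the child vertex “Sam Smith”, the choice of hours per week as the edge weight, and the `years` duration property.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Vertex:
    vertex_id: str
    label: str                       # entity type, e.g. "employee"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                      # edges are directed: from source...
    target: str                      # ...to target
    label: str                       # nature of the relationship
    weight: float = 1.0              # optional magnitude
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices = {}           # vertex_id -> Vertex
        self.edges = []              # list of Edge objects

    def add_vertex(self, vertex_id: str, label: str, **props: Any) -> None:
        self.vertices[vertex_id] = Vertex(vertex_id, label, props)

    def add_edge(self, source: str, target: str, label: str,
                 weight: float = 1.0, **props: Any) -> None:
        # Refuse to link entities that have not been added as vertices.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("add both endpoints as vertices before linking them")
        self.edges.append(Edge(source, target, label, weight, props))

    def outgoing(self, vertex_id: str, label: Optional[str] = None) -> list:
        """Edges leaving vertex_id, optionally filtered by relationship label."""
        return [e for e in self.edges
                if e.source == vertex_id and (label is None or e.label == label)]


g = PropertyGraph()
g.add_vertex("john", "employee", name="John Smith")
g.add_vertex("iwc", "company", name="International Widget Corporation")
g.add_vertex("sam", "person", name="Sam Smith")  # hypothetical child
# Weight = hours per week and years = duration: both illustrative assumptions.
g.add_edge("john", "iwc", "works for", weight=40.0, years=3)
# Direction matters here: Sam is the child of John, not the reverse.
g.add_edge("sam", "john", "is child of")

print([(e.label, e.target, e.weight, e.properties) for e in g.outgoing("john")])
```

Because edges are directed, `g.outgoing("john")` returns only the “works for” edge; the “is child of” edge leaves from Sam’s vertex instead.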
Two entities may share more than one edge representing multiple relationships, such as when a party is a customer of a company and is also an employee of that company.
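Supporting more than one edge between the same pair of entities simply means the model must not assume one relationship per pair. A minimal sketch, assuming a plain dictionary keyed by (source, target) is enough; the `relate` helper and the party “Pat Jones” are hypothetical.

```python
from collections import defaultdict

# (source, target) -> list of relationship labels. Using a list rather than a
# single value lets the same two entities be linked by several edges.
relationships = defaultdict(list)


def relate(source, target, label):
    relationships[(source, target)].append(label)


# A hypothetical party that is both a customer and an employee of a company.
relate("Pat Jones", "International Widget Corporation", "is a customer of")
relate("Pat Jones", "International Widget Corporation", "is an employee of")

print(relationships[("Pat Jones", "International Widget Corporation")])
# ['is a customer of', 'is an employee of']
```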
This abstraction can be represented in practice in several ways. Although a relational database can be used to represent a graph, emerging NoSQL technologies support both dynamic programmatic representations and persistent representations. These graph databases are designed to support the types of queries and analyses one might want to perform on a network. More on this next time.
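To make the relational option concrete, here is a sketch using Python’s built-in `sqlite3` module with one table of vertices and one table of edges. The table and column names are assumptions for illustration, and this is only one of several possible schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        label TEXT NOT NULL          -- entity type
    );
    CREATE TABLE edge (
        source INTEGER NOT NULL REFERENCES vertex(id),
        target INTEGER NOT NULL REFERENCES vertex(id),
        label  TEXT NOT NULL         -- relationship type
    );
""")
conn.executemany("INSERT INTO vertex (id, name, label) VALUES (?, ?, ?)", [
    (1, "John Smith", "employee"),
    (2, "International Widget Corporation", "company"),
])
conn.execute("INSERT INTO edge (source, target, label) VALUES (?, ?, ?)",
             (1, 2, "works for"))

# Following a single edge already needs a join back to the vertex table for
# each endpoint; multi-hop traversals need additional self-joins or a
# recursive common table expression.
rows = conn.execute("""
    SELECT s.name, e.label, t.name
    FROM edge AS e
    JOIN vertex AS s ON s.id = e.source
    JOIN vertex AS t ON t.id = e.target
""").fetchall()
print(rows)
# [('John Smith', 'works for', 'International Widget Corporation')]
```

The growing number of joins for longer paths is one reason traversal-heavy workloads are often a better fit for a dedicated graph store.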