Definition
It is a non-linear data structure consisting of vertices and edges. It is useful in fields such as social network analysis, recommendation systems, and computer networks. In the field of sports data science, the graph data structure can be used to analyze and understand the dynamics of team performance and player interactions on the field.

A graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a graph is composed of a set of vertices (V) and a set of edges (E). The graph is denoted by G(V, E).
Imagine a game of football as a web of connections, where players are the nodes and their interactions on the field are the edges. This web of connections is exactly what a graph data structure represents, and it’s the key to unlocking insights into team performance and player dynamics in sports.
Components of Graph Data Structure
- Vertices: Vertices are the fundamental units of the graph. Sometimes, vertices are also known as nodes. Every node/vertex can be labeled or unlabeled.
- Edges: Edges are used to connect two nodes of the graph. They can be ordered pairs of nodes in a directed graph. Edges can connect any two nodes in various ways. Sometimes, edges are also known as arcs. Every edge can be labeled or unlabeled.
Types of Graphs in Data Structures and Algorithms
1. Null Graph A graph is known as a null graph if there are no edges in the graph.
2. Trivial Graph A graph having only a single vertex. It is also the smallest graph possible.

3. Undirected Graph A graph in which edges do not have any direction. That is, the nodes are unordered pairs in the definition of every edge.
4. Directed Graph A graph in which each edge has a direction. That is, the nodes are ordered pairs in the definition of every edge.

5. Connected Graph A graph in which any node can be reached from any other node is known as a connected graph.
6. Disconnected Graph A graph in which at least one node is not reachable from some other node is known as a disconnected graph.

7. Regular Graph A graph in which the degree of every vertex is equal to k is called a k-regular graph.
8. Complete Graph A graph in which each node has an edge to every other node.

9. Cycle Graph A graph that forms a cycle. The minimum degree of each vertex is 2.
10. Cyclic Graph A graph containing at least one cycle is known as a cyclic graph.

11. Directed Acyclic Graph (DAG) A directed graph that does not contain any cycles.
12. Bipartite Graph A graph in which vertices can be divided into two sets such that no two vertices within the same set are connected.

13. Weighted Graph
- A graph in which the edges have associated weights is known as a weighted graph.
- Weighted graphs can be further classified as directed weighted graphs and undirected weighted graphs.
Advantages of Graph Data Structure:
- Graphs can represent a wide range of relationships, unlike other data structures (e.g., trees cannot have cycles and must be hierarchical).
- They can be used to model and solve a wide range of problems, including pathfinding, data clustering, network analysis, and machine learning.
- Any real-world problem involving a set of items and their relationships can be easily modeled using a graph. Standard graph algorithms like BFS, DFS, Spanning Tree, Shortest Path, Topological Sorting, and Strongly Connected Components are widely applicable.
- Graphs can represent complex data structures in a simple and intuitive way, making them easier to understand and analyze.
Disadvantages of Graph Data Structure:
- Graphs can be complex and difficult to understand, especially for those unfamiliar with graph theory or related algorithms.
- Creating and manipulating graphs can be computationally expensive, particularly for very large or complex graphs.
- Graph algorithms can be challenging to design and implement correctly and are prone to bugs and errors.
- Visualizing and analyzing very large or complex graphs can be difficult, making it hard to extract meaningful insights.
Representation of Graph
There are multiple ways to store a graph. The following are the most common representations:
- Adjacency Matrix
- Adjacency List
Adjacency Matrix
In this method, the graph is stored as a 2D matrix where rows and columns represent vertices. Each entry in the matrix represents the weight of the edge between the corresponding vertices.

Below is the implementation of a Graph represented using an Adjacency Matrix:
Output:
Adjacency List
This graph is represented as a collection of linked lists. There is an array of pointers that point to the edges connected to each vertex.

Below is the implementation of a Graph represented using an Adjacency List:
Output:
Traversing a Graph
To traverse a graph means to start at one vertex and follow the edges to visit other vertices until all vertices (or as many as possible) have been visited.
The two most common graph traversal methods are:
- Depth First Search (DFS)
- Breadth First Search (BFS)
DFS is usually implemented using a stack or via recursion (which utilizes the call stack), while BFS is usually implemented using a queue.
DFS (Depth First Search Traversal)
Depth First Search is called "deep" because it explores a vertex, then its adjacent vertex, and continues down that path before backtracking. For example:




The DFS traversal starts at vertex D, marks it as visited, and recursively visits all adjacent, unvisited vertices. When vertex A is visited, traversal continues to vertex C or vertex E, depending on the implementation.
Output:
Explanation:
-
Class Design:
adj_matrix:
2D vector for representing graph edges.vertex_data:
stores a label (like 'A', 'B', etc.) for each vertex.dfs_util:
helper function for recursive DFS traversal.
-
Functionality Mapping:
add_edge(u, v):
Adds an undirected edge by updating both adj_matrix[u][v] and adj_matrix[v][u].add_vertex_data(i, label):
Assigns a string label to a vertex.dfs(start_label):
Begins DFS from the vertex whose label matches start_label.
-
DFS Mechanism:
- A visited array ensures we don’t visit the same vertex multiple times.
- DFS is implemented recursively via dfs_util.
BFS (Breadth-First Search Traversal)
Breadth-First Search visits all adjacent vertices of a vertex before visiting the neighbors of those adjacent vertices. This means that vertices at the same distance from the starting vertex are visited before vertices that are farther away.
For Example:





As shown in the images above, BFS traversal visits vertices that are the same distance from the starting vertex before visiting those farther away. For example, after visiting vertex A, vertices E and C are visited before B, F, and G because those vertices are farther away.
Breadth-First Search works by placing all adjacent vertices into a queue (if they haven’t already been visited) and then using the queue to visit the next vertex.
Code example for Breadth-First Search:
Output
The bfs()
method starts by creating a queue with the start vertex, initializing a visited array, and marking the start vertex as visited.
It then repeatedly dequeues a vertex, prints it, enqueues its unvisited adjacent vertices, and continues this process until all reachable vertices have been visited.
Dijkstra's Algorithm for Shortest Distance
Dijkstra's algorithm is a popular method for solving single-source shortest path problems in graphs with non-negative edge weights. It is used to find the shortest distance between two vertices in a graph. The algorithm was conceived by Dutch computer scientist Edsger W. Dijkstra in 1956.
The algorithm maintains two sets: one for visited vertices and another for unvisited vertices. It starts at the source vertex and iteratively selects the unvisited vertex with the smallest tentative distance from the source. Then it visits the neighbors of this vertex and updates their tentative distances if a shorter path is found. This process continues until the destination vertex is reached or all reachable vertices have been visited.
Algorithm
- Mark the source node with a current distance of 0 and all others with infinity.
- Set the unvisited node with the smallest current distance as the current node.
- For each neighbor
N
of the current node, add the current distance of the node to the weight of the edge connecting them. If this is smaller than the current distance ofN
, update it. - Mark the current node as visited.
- Repeat step 2 if there are still unvisited nodes.
Example:
Given a weighted undirected graph represented as an edge list and a source vertex src
, find the shortest path distances from the source vertex to all other vertices in the graph. The graph contains V
vertices, numbered from 0 to V - 1
.
Note: The given graph does not contain any negative edges.
Input:
src = 0
, V = 5
, edges[][] = [[0, 1, 4], [0, 2, 8], [1, 4, 6], [2, 3, 2], [3, 4, 10]]

Output:
Explanation:
Output:
Time Complexity: O(E * log V), where E is the number of edges and V is the number of vertices. Auxiliary Space: O(V), where V is the number of vertices. (The adjacency list is excluded from auxiliary space as it is part of the input representation.)
Minimum Spanning Tree
A spanning tree is a tree-like subgraph of a connected, undirected graph that includes all the vertices of the graph. In simple terms, it is a subset of the graph’s edges that forms an acyclic structure (tree), where every node in the graph is connected.
A minimum spanning tree (MST) possesses all the properties of a spanning tree, with the added requirement that the total edge weight must be the smallest among all possible spanning trees. Like spanning trees, a graph can have multiple valid MSTs.

Properties:
- The number of vertices (V) in the graph and the spanning tree is the same.
- The number of edges in the spanning tree is always one less than the number of vertices (E = V - 1).
- The spanning tree must be connected; there should be only one connected component.
- The spanning tree must be acyclic.
- The total cost (or weight) of a spanning tree is the sum of the weights of its edges.
- There can be multiple spanning trees for a graph.

Algorithms to Find Minimum Spanning Tree
Several algorithms can be used to find the MST of a graph. Some popular ones include:
Kruskal’s Minimum Spanning Tree Algorithm
A minimum spanning tree (MST) or minimum weight spanning tree for a weighted, connected, and undirected graph is a spanning tree (a tree with no cycles that connects all vertices) with the minimum possible total edge weight. The weight of a spanning tree is the sum of all the edge weights in the tree.
In Kruskal's algorithm, all edges of the given graph are first sorted in increasing order of their weights. Then, edges are added one by one to the MST if the addition of the edge does not form a cycle. It begins by selecting the edge with the smallest weight and continues up to the edge with the largest weight. Thus, it makes a locally optimal choice at each step in order to find the globally optimal solution. Hence, it is a Greedy Algorithm.
This is one of the most popular algorithms for finding the minimum spanning tree of a connected, undirected graph. The algorithm follows the Greedy approach. The workflow of the algorithm is as follows:
- First, sort all the edges of the graph by their weights.
- Then, iterate through the sorted edges.
- At each iteration, add the next lowest-weight edge, provided it does not form a cycle with the edges selected so far.
Example The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will have (9 - 1) = 8 edges.

Step 1: Pick all sorted edges one by one. Pick edge 7–6. No cycle is formed, include it.

Step 2: Pick edge 8–2. No cycle is formed, include it.

Step 3: Pick edge 6–5. No cycle is formed, include it.

Step 4: Pick edge 0–1. No cycle is formed, include it.

Step 5: Pick edge 2–5. No cycle is formed, include it.

Step 6: Pick edge 8–6. Since including this edge results in a cycle, discard it. Pick edge 2–3. No cycle is formed, include it.

Step 7: Pick edge 7–8. Since including this edge results in a cycle, discard it. Pick edge 0–7. No cycle is formed, include it.

Step 8: Pick edge 1–2. Since including this edge results in a cycle, discard it. Pick edge 3–4. No cycle is formed, include it.

Step 9: Final obtained graph: Minimum Spanning Tree.

Output
This algorithm can be implemented efficiently using a DSU (Disjoint Set Union) data structure to keep track of the connected components of the graph. It is used in a variety of practical applications such as network design, clustering, and data analysis.
Prim’s Minimum Spanning Tree Algorithm
Prim’s algorithm is a greedy algorithm like Kruskal’s algorithm. It always starts with a single node and expands through adjacent nodes to explore all the connected edges along the way.
- The algorithm starts with an empty spanning tree.
- The idea is to maintain two sets of vertices: one containing the vertices already included in the MST, and the other containing the vertices not yet included.
- At every step, it considers all the edges that connect the two sets and picks the minimum weight edge from these. After picking the edge, it adds the other endpoint of the edge to the MST set.
In graph theory, a group of edges that connects two sets of vertices in a graph is called a cut. The vertices that, when removed, increase the number of connected components are called articulation points (or cut vertices), but this is different from the edges used in Prim’s algorithm. So, at every step of Prim’s algorithm, a cut is formed, the minimum weight edge crossing the cut is selected, and the corresponding vertex is included in the MST set.
Step 1: Choose an arbitrary vertex as the starting vertex of the MST. We pick 0 in the diagram below.
Step 2: Follow steps 3 to 5 until all vertices are included in the MST (known as fringe vertices).
Step 3: Find edges connecting any tree vertex with the fringe vertices.
Step 4: Find the minimum among these edges.
Step 5: Add the chosen edge to the MST. Since we only consider edges that connect tree vertices with fringe vertices, no cycles are formed.
Step 6: Return the MST and exit.
Example

Step 1: Start with edges 0-1 and 0-7; pick the minimum 0-1 and add vertex 1.

Step 2: Choose between 0-7 and 1-2; pick 0-7 (or 1-2), adding vertex 7.

Step 3: Select 7-6 as it has the least weight (1), adding vertex 6.

Step 4: Among 7-8, 6-8, and 6-5 (weight 2), add vertex 5.

Step 5: Select 5-2 (weight 4), adding vertex 2.

Step 6: Choose 2-8 (weight 2), adding vertex 8.

Step 7: Pick 2-3 as the smallest edge, adding vertex 3.

Step 8: Finally, select 3-4, adding vertex 4.

Step 9: All vertices are included, so the MST is complete.

Step 10:

Output
Advantages and Disadvantages of Prim's Algorithm
Advantages:
- Prim's algorithm is guaranteed to find the MST in a connected, weighted graph.
- It has a time complexity of O((E + V) * log(V)) using a binary heap or Fibonacci heap, where E is the number of edges and V is the number of vertices.
- It is relatively simple to understand and implement compared to some other MST algorithms.
Disadvantages:
- Like Kruskal’s algorithm, Prim’s algorithm can be slow on dense graphs with many edges, as it requires iterating over many edges.
- It relies on a priority queue, which may use additional memory and slow down on very large graphs.
- The choice of starting node can affect the structure of the MST, which may not be ideal in all applications.
Boruvka’s Minimum Spanning Tree Algorithm
This is also a graph traversal algorithm used to find the minimum spanning tree of a connected, undirected graph. It is one of the oldest MST algorithms. It works by iteratively building the MST, starting with each vertex in the graph as its own tree. In each iteration, the algorithm finds the cheapest edge that connects a tree to another tree and adds that edge to the MST. It is somewhat similar to Prim’s algorithm in its goal.
The algorithm follows this workflow:
-
Initialize a forest of trees, with each vertex in the graph as its own tree.
-
For each tree in the forest:
- Find the cheapest edge that connects it to another tree.
- Add these edges to the MST.
- Merge the connected trees.
-
Repeat the above steps until the forest contains only one tree, which is the MST.
The algorithm can be implemented using data structures like a priority queue to efficiently find the cheapest edges. Boruvka’s algorithm is simple and easy to implement but may not be as efficient as others for large graphs with many edges.
Applications of Minimum Spanning Trees
- Network Design: Minimizing the total cost of connecting all nodes (e.g., cables, roads).
- Image Processing: Segmenting regions based on similar features.
- Biology: Building phylogenetic trees to show evolutionary relationships.
- Social Network Analysis: Identifying essential links in a network.