Analysis Module
Overview
The Analysis module provides visualization and plotting functions for PatchTrack classification results. It generates charts and graphs to help understand patch application patterns across pull requests and repositories.
Purpose
This module:
- Visualizes classification distributions (PA, PN, NE, CC, ERROR)
- Generates bar charts, pie charts, and other plots
- Creates statistical summaries
- Exports results for reporting and publication
Chart Types
| Chart Type | Purpose | Best For |
|---|---|---|
| Bar Chart | Absolute counts by classification | Overall distribution |
| Pie Chart | Percentage breakdown | Proportional understanding |
| Line Chart | Trends over time or across repositories | Trend analysis |
| Heatmap | 2D distribution | Multiple dimensions |
Key Visualizations
Classification Distribution
Shows how many patches fall into each category:
PA (Patch Applied) ████████████ 45%
PN (Patch Not Applied) ██████ 25%
NE (Not Existing) ███ 15%
CC (Cannot Classify) ██ 10%
ERROR █ 5%
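The percentages in a distribution like the one above can be derived from raw counts. The following is a minimal sketch, not part of the PatchTrack API: the helper names `to_percentages` and `text_bars` are hypothetical, and the category order PA, PN, NE, CC, ERROR is assumed from the list above.

```python
from typing import List

# Assumed category order, taken from the distribution shown above.
LABELS = ["PA", "PN", "NE", "CC", "ERROR"]


def to_percentages(counts: List[int]) -> List[float]:
    """Convert absolute counts to percentages of the total."""
    total = sum(counts)
    if total == 0:
        # Avoid division by zero when there is nothing to plot.
        return [0.0 for _ in counts]
    return [100.0 * c / total for c in counts]


def text_bars(counts: List[int], width: int = 24) -> List[str]:
    """Render one text bar per category, scaled to the largest count."""
    percentages = to_percentages(counts)
    largest = max(counts) if counts and max(counts) > 0 else 1
    lines = []
    for label, count, pct in zip(LABELS, counts, percentages):
        bar = "█" * round(width * count / largest)
        lines.append(f"{label:<5} {bar} {pct:.0f}%")
    return lines


if __name__ == "__main__":
    for line in text_bars([45, 25, 15, 10, 5]):
        print(line)
```

Scaling bars to the largest count rather than to the total keeps small categories visible even when one category dominates.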
Repository Comparison
Compares classification patterns across different repositories:
- X-axis: Repository name
- Y-axis: Count or percentage
- Groups: Classification type
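The axis layout described above can be sketched directly in Matplotlib. This is an illustrative example under stated assumptions, not the module's own implementation: `grouped_positions` and `plot_repo_comparison` are hypothetical names, and each repository is assumed to map to five counts in the order PA, PN, NE, CC, ERROR. Matplotlib is imported inside the plotting function so that the layout helper can be used without it.

```python
from typing import Dict, List

CLASSES = ["PA", "PN", "NE", "CC", "ERROR"]  # assumed group order
GROUP_WIDTH = 0.8  # fraction of each x-axis slot occupied by one repository's bars


def grouped_positions(n_repos: int, n_groups: int,
                      group_width: float = GROUP_WIDTH) -> List[List[float]]:
    """Return bar centres: result[g][r] is the x position of class g for repository r."""
    bar_width = group_width / n_groups
    positions = []
    for g in range(n_groups):
        # Centre the whole group on the integer tick of each repository.
        offset = (g - (n_groups - 1) / 2) * bar_width
        positions.append([r + offset for r in range(n_repos)])
    return positions


def plot_repo_comparison(repo_data: Dict[str, List[int]], path: str) -> None:
    """Save a grouped bar chart: repositories on the x-axis, one bar per class."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    repos = list(repo_data)
    positions = grouped_positions(len(repos), len(CLASSES))
    bar_width = GROUP_WIDTH / len(CLASSES)

    fig, ax = plt.subplots(figsize=(10, 6))
    for g, label in enumerate(CLASSES):
        heights = [repo_data[repo][g] for repo in repos]
        ax.bar(positions[g], heights, width=bar_width, label=label)
    ax.set_xticks(list(range(len(repos))))
    ax.set_xticklabels(repos)
    ax.set_xlabel("Repository")
    ax.set_ylabel("Count")
    ax.legend(title="Classification")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_repo_comparison({'repo1': [30, 20, 10, 5, 2], 'repo2': [15, 5, 5, 5, 3]}, 'comparison.png')` would write the chart to disk; the output file name is a placeholder.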
Trend Analysis
Tracks how patch application rates change:
- Over time
- Across project size ranges
- By programming language
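One way to obtain the rates that such a trend plot needs is to group classifications by a key (a time period, a size range, or a language) and compute the share of PA in each group. The sketch below assumes the patch application rate means "PA results divided by all results in the group"; that definition, the function name `application_rate_by_group`, and the record layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def application_rate_by_group(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the share of PA classifications per group.

    Each record is a (group_key, classification) pair, for example
    ("Python", "PA"). Groups are returned in sorted key order.
    """
    applied = defaultdict(int)
    total = defaultdict(int)
    for key, label in records:
        total[key] += 1
        if label == "PA":
            applied[key] += 1
    return {key: applied[key] / total[key] for key in sorted(total)}


if __name__ == "__main__":
    sample = [("Python", "PA"), ("Python", "PN"), ("Go", "PA"), ("Go", "PA")]
    print(application_rate_by_group(sample))
```

The resulting mapping can be passed as x and y values to a line chart, with the sorted keys on the x-axis.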
Usage Example
from analyzer import analysis
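# Classification totals, in the order PA, PN, NE, CC, ERROR
totals_list = [45, 25, 15, 10, 5]
pr_nr = 1  # placeholder value for the pr_nr argument (see API Reference)
# Bar chart of absolute counts; plotting=True displays the figure
analysis.all_class_bar(totals_list, pr_nr, plotting=True)
# As percentages: the documented signature has no percentage flag, so normalize first
percentages = [round(100 * v / sum(totals_list)) for v in totals_list]
analysis.all_class_bar(percentages, pr_nr, plotting=True)
# Pie chart
analysis.all_class_pie(totals_list, pr_nr, plotting=True)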
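# Repository comparison
# Note: repo_comparison_bar does not appear in the API Reference; confirm it exists in your version
repo_data = {
    'repo1': [30, 20, 10, 5, 2],
    'repo2': [15, 5, 5, 5, 3],
}
analysis.repo_comparison_bar(repo_data)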
Output Formats
Display
- Interactive plots in Jupyter notebooks
- Static images saved to disk
- Console output for quick analysis
File Exports
- PNG images for publications
- CSV data for further analysis
- JSON for programmatic access
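The CSV and JSON exports can be produced with the Python standard library alone. The following sketch is illustrative and does not reflect how the module itself writes files; the function names `to_csv` and `to_json` and the column names are assumptions.

```python
import csv
import io
import json
from typing import Dict


def to_csv(totals: Dict[str, int]) -> str:
    """Serialize classification totals as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["classification", "count"])
    for label, count in totals.items():
        writer.writerow([label, count])
    return buffer.getvalue()


def to_json(totals: Dict[str, int]) -> str:
    """Serialize classification totals as indented JSON with sorted keys."""
    return json.dumps(totals, indent=2, sort_keys=True)


if __name__ == "__main__":
    totals = {"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5}
    print(to_csv(totals))
    print(to_json(totals))
```

For PNG output, a Matplotlib figure can be written with `fig.savefig(path, dpi=300)`, which matches the export resolution listed under Configuration.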
Statistical Metrics
The module can calculate:
Descriptive Statistics
- Mean, median, standard deviation
- Min/max values
- Quartile ranges
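These descriptive statistics are all available in Python's `statistics` module. The sketch below shows one way to compute them for a list of per-repository counts; the function name `describe` is hypothetical, and the quartiles use the module's default (exclusive) method, which may differ slightly from other tools.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Return mean, median, sample standard deviation, range, and quartiles.

    Requires at least two values, because the sample standard deviation
    is undefined for a single observation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


if __name__ == "__main__":
    # Example: PA counts from four repositories (illustrative values).
    print(describe([30, 15, 45, 10]))
```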
Comparative Metrics
- Patch application rate
- Classification accuracy
- Inter-rater agreement (if multiple classifiers)
Aggregation Levels
- Per repository
- Per programming language
- Per project size
- Overall
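Aggregating at different levels amounts to summing the per-repository count vectors under a different key. The sketch below illustrates this for three of the levels listed above (per project size is omitted for brevity); the row layout and the function name `aggregate` are assumptions, not the module's own data model.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (repository, language, counts per classification)
Row = Tuple[str, str, List[int]]


def aggregate(rows: List[Row], level: str) -> Dict[str, List[int]]:
    """Sum classification counts per repository, per language, or overall."""
    key_index = {"repository": 0, "language": 1}
    totals: Dict[str, List[int]] = {}
    for row in rows:
        if level == "overall":
            key = "overall"
        elif level in key_index:
            key = row[key_index[level]]
        else:
            raise ValueError(f"unknown level: {level!r}")
        counts = row[2]
        # Start each new key with a zero vector of the right length.
        current = totals.setdefault(key, [0] * len(counts))
        for i, value in enumerate(counts):
            current[i] += value
    return totals


if __name__ == "__main__":
    rows = [
        ("repo1", "Python", [30, 20, 10, 5, 2]),
        ("repo2", "Python", [15, 5, 5, 5, 3]),
        ("repo3", "Go", [1, 1, 1, 1, 1]),
    ]
    print(aggregate(rows, "language"))
```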
Configuration
Common parameters:
# Chart styling
figsize = (10, 6) # Figure dimensions
dpi = 300 # Resolution for exports
style = 'seaborn-v0_8' # Matplotlib style (named 'seaborn' before Matplotlib 3.6)
colormap = 'viridis' # Color scheme
# Data options
normalize = False # Percentage vs absolute
log_scale = False # Linear vs log scale
sort_by = 'value' # Sort order
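The data options can be thought of as a preprocessing step applied before any chart is drawn. The sketch below groups some of the parameters above into a dataclass and applies `normalize` and `sort_by`; the class `PlotConfig`, the function `prepare`, the descending order for `sort_by = 'value'`, and the extra `'label'` sort mode are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PlotConfig:
    """Hypothetical container for the parameters listed above."""
    figsize: Tuple[float, float] = (10, 6)
    dpi: int = 300
    normalize: bool = False
    log_scale: bool = False
    sort_by: str = "value"


def prepare(labels: Sequence[str], counts: Sequence[float],
            config: PlotConfig) -> Tuple[List[str], List[float]]:
    """Apply the data options: sort the categories, then optionally normalize."""
    pairs = list(zip(labels, counts))
    if config.sort_by == "value":
        pairs.sort(key=lambda pair: pair[1], reverse=True)  # largest first (assumed)
    elif config.sort_by == "label":
        pairs.sort(key=lambda pair: pair[0])
    total = sum(counts)
    if config.normalize and total:
        pairs = [(label, 100.0 * value / total) for label, value in pairs]
    return [label for label, _ in pairs], [value for _, value in pairs]


if __name__ == "__main__":
    print(prepare(["PN", "PA"], [25, 75], PlotConfig(normalize=True)))
```

The styling fields map onto standard Matplotlib calls: `figsize` to `plt.subplots(figsize=...)`, `dpi` to `fig.savefig(..., dpi=...)`, and `log_scale` to `ax.set_yscale('log')`.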
API Reference
Visualization module for PatchTrack analysis.
Provides functions to generate bar charts, pie charts, and grouped visualizations for analyzing patch classification patterns and integration metrics.
analyzer.analysis.all_class_bar(height, pr_nr, plotting=False)
Generate a bar chart for basic patch classification categories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| height | List[int] | List of frequency values for each classification. | required |
| pr_nr | int | Pull request number for tracking. | required |
| plotting | bool | Whether to display the plot. | False |
Source code in analyzer/analysis.py
analyzer.analysis.create_pie(slices, ax)
Create a pie chart for modification type distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| slices | List[int] | Frequency values for each modification type. | required |
| ax | | Matplotlib pyplot module. | required |
Source code in analyzer/analysis.py
analyzer.analysis.create_bar(height, ax)
Create a bar chart for modification type distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| height | List[int] | Frequency values for each modification type. | required |
| ax | | Matplotlib pyplot module. | required |
Source code in analyzer/analysis.py
analyzer.analysis.grouped_bar_chart(y0, y1, y2, y3, y4, y5, repo_nr)
Generate a grouped bar chart comparing multiple classification metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y0 | List[int] | Missed Opportunity frequencies. | required |
| y1 | List[int] | Effort Duplication frequencies. | required |
| y2 | List[int] | Split (MO/ED) frequencies. | required |
| y3 | List[int] | Added File frequencies. | required |
| y4 | List[int] | Deleted File frequencies. | required |
| y5 | List[int] | Uninteresting frequencies. | required |
| repo_nr | int | Repository number for file naming. | required |
Source code in analyzer/analysis.py
analyzer.analysis.create_all_bars(data, repo_nr)
Create a grid of bar charts for all intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Dict | Dictionary mapping interval labels to frequency lists. | required |
| repo_nr | int | Repository number for file naming. | required |
Source code in analyzer/analysis.py
analyzer.analysis.create_all_pie(data, repo_nr)
Create a grid of pie charts for all intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Dict | Dictionary mapping interval labels to frequency lists. | required |
| repo_nr | int | Repository number for file naming. | required |
Source code in analyzer/analysis.py
analyzer.analysis.all_class_bar_w_even_d(height, pr_nr)
Generate a bar chart for extended patch classification with even distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| height | List[int] | List of frequency values for each classification. | required |
| pr_nr | int | Pull request number for tracking. | required |
Source code in analyzer/analysis.py
analyzer.analysis.all_class_pie(slices, pr_nr, plotting=False)
Generate a pie chart for patch classification distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| slices | List[int] | Frequency values for each classification category. | required |
| pr_nr | int | Pull request number for tracking. | required |
| plotting | bool | Whether to display the plot. | False |
Source code in analyzer/analysis.py
Integration Points
The analysis module is typically called:
- From main.py: After classification completes
- From notebooks: For exploratory analysis
- In CI/CD pipelines: For automated reporting
- Standalone: For custom analysis scripts
Example Pipeline
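from analyzer import main, analysis
# tokens and pr_pairs must be defined by the caller before this point
# Run classification
pt = main.PatchTrack(tokens)
pt.classify(pr_pairs)
# Get results
results = pt.get_results()
classifications = pt.pr_classifications
# Visualize
# Note: visualize_results does not appear in the API Reference; confirm it exists in your version
analysis.visualize_results(classifications)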
Output Examples
Bar Chart Output
Classification Distribution
┌─────────────────────────────┐
│ PA ████████████ 45 (45%) │
│ PN ██████ 25 (25%) │
│ NE ███ 15 (15%) │
│ CC ██ 10 (10%) │
│ ERR █ 5 (5%) │
└─────────────────────────────┘
Tips & Best Practices
✅ Do
- Use descriptive titles for charts
- Include legends and axis labels
- Export with high DPI for publication
- Validate data before visualization
❌ Don't
- Exceed 5-6 categories in one chart
- Use 3D charts (hard to read)
- Mix different metrics in one chart
- Forget to add axis units
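The advice to validate data before visualization can be made concrete with a small check run before any plotting call. The sketch below is one possible approach; the function name `validate_counts` and the specific rules (five categories, non-negative integers, at least one non-zero value) are assumptions based on the classification lists used in this document.

```python
from typing import List, Sequence


def validate_counts(counts: Sequence[int], expected_len: int = 5) -> List[str]:
    """Return a list of problems found in a count vector; empty means valid."""
    problems = []
    if len(counts) != expected_len:
        problems.append(f"expected {expected_len} values, got {len(counts)}")
    for index, value in enumerate(counts):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"value at index {index} is not an integer: {value!r}")
        elif value < 0:
            problems.append(f"value at index {index} is negative: {value}")
    if not problems and sum(counts) == 0:
        problems.append("all counts are zero, so there is nothing to plot")
    return problems


if __name__ == "__main__":
    print(validate_counts([45, 25, 15, 10, 5]))  # no problems
    print(validate_counts([45, -1, 15, 10, 5]))  # one negative value
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue at once.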
See Also
- Aggregator - Produces data for visualization
- Main - Calls analysis after classification