Analysis Module

Overview

The Analysis module provides visualization and plotting functions for PatchTrack classification results. It generates charts and graphs to help understand patch application patterns across pull requests and repositories.

Purpose

This module:

  • Visualizes classification distributions (PA, PN, NE, CC, ERROR)
  • Generates bar charts, pie charts, and other plots
  • Creates statistical summaries
  • Exports results for reporting and publication

Chart Types

Chart Type   Purpose                            Best For
Bar Chart    Absolute counts by classification  Overall distribution
Pie Chart    Percentage breakdown               Proportional understanding
Line Chart   Trends over repositories           Time-series analysis
Heatmap      2D distribution                    Multiple dimensions

Key Visualizations

Classification Distribution

Shows how many patches fall into each category:

PA (Patch Applied)     ████████████ 45%
PN (Not Applied)       ██████ 25%
NE (Not Existing)      ███ 15%
CC (Cannot Classify)   ██ 10%
ERROR                  █ 5%
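
A text rendering like the one above can be derived from raw counts. The following is a minimal sketch and not part of PatchTrack: the helper name `render_distribution` and the bar scaling (100% fills `width` blocks) are assumptions made for illustration, so bar lengths will differ slightly from the figure above.

```python
from typing import Dict, List


def render_distribution(counts: Dict[str, int], width: int = 20) -> List[str]:
    """Return one text line per category: label, bar, and percentage."""
    total = sum(counts.values())
    if total == 0:
        # Nothing classified yet: avoid dividing by zero.
        return [f"{label:<6} 0%" for label in counts]
    lines = []
    for label, count in counts.items():
        share = count / total
        # Scale the bar so that 100% would fill `width` blocks.
        bar = "█" * round(share * width)
        lines.append(f"{label:<6} {bar} {share:.0%}")
    return lines


counts = {"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5}
for line in render_distribution(counts):
    print(line)
```

Working from counts rather than hard-coded percentages keeps the percentages consistent with the data when totals change.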

Repository Comparison

Compares classification patterns across different repositories:

  • X-axis: Repository name
  • Y-axis: Count or percentage
  • Groups: Classification type
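
Grouped charts of this kind usually need one series per classification rather than one list per repository. The sketch below shows that reshaping step. The helper `to_series` and the category order in `CATEGORIES` are assumptions for illustration, not part of the module.

```python
from typing import Dict, List

# Assumed category order for the per-repository lists; adjust to match your data.
CATEGORIES = ["PA", "PN", "NE", "CC", "ERROR"]


def to_series(repo_data: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Transpose {repo: [count per category]} into {category: [count per repo]}."""
    for repo, counts in repo_data.items():
        if len(counts) != len(CATEGORIES):
            raise ValueError(
                f"{repo}: expected {len(CATEGORIES)} counts, got {len(counts)}"
            )
    return {
        category: [counts[i] for counts in repo_data.values()]
        for i, category in enumerate(CATEGORIES)
    }


repo_data = {"repo1": [30, 20, 10, 5, 2], "repo2": [15, 5, 5, 5, 3]}
series = to_series(repo_data)
print(series["PA"])  # [30, 15]
```

Each resulting list holds one value per repository, in the insertion order of `repo_data`, which is the X-axis order.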

Trend Analysis

Tracks how patch application rates change:

  • Over time
  • Across project size ranges
  • By programming language

Usage Example

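from analyzer import analysis

# Basic bar chart of classification totals.
# all_class_bar labels its bars in the order PA, CC, NE, PN, ERROR.
totals_list = [45, 10, 15, 25, 5]
pr_nr = 1  # placeholder pull request number
analysis.all_class_bar(totals_list, pr_nr, plotting=True)

# As percentage: the documented signature has no percentage flag,
# so convert the counts before plotting
total = sum(totals_list)
percentages = [100 * count / total for count in totals_list]
analysis.all_class_bar(percentages, pr_nr, plotting=True)

# Pie chart (slice labels are fixed inside all_class_pie; see the API reference
# for the order it expects)
analysis.all_class_pie(totals_list, pr_nr, plotting=True)

# Repository comparison: one bar chart per repository
repo_data = {
    'repo1': [30, 20, 10, 5, 2],
    'repo2': [15, 5, 5, 5, 3],
}
for repo_totals in repo_data.values():
    analysis.all_class_bar(repo_totals, pr_nr, plotting=True)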

Output Formats

Display

  • Interactive plots in Jupyter notebooks
  • Static images saved to disk
  • Console output for quick analysis

File Exports

  • PNG images for publications
  • CSV data for further analysis
  • JSON for programmatic access
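
PNG export is handled by the `plt.savefig` calls shown in the API reference. For CSV and JSON, the source lists the formats but no export function, so the following is only a sketch using the Python standard library. The helper names `to_csv` and `to_json` and the column names are assumptions.

```python
import csv
import io
import json
from typing import Dict


def to_csv(counts: Dict[str, int]) -> str:
    """Serialize classification counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["classification", "count"])
    for label, count in counts.items():
        writer.writerow([label, count])
    return buffer.getvalue()


def to_json(counts: Dict[str, int]) -> str:
    """Serialize classification counts as an indented JSON object."""
    return json.dumps(counts, indent=2)


counts = {"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5}
print(to_csv(counts))
print(to_json(counts))
```

Returning strings keeps the helpers easy to test; write the result to disk with `pathlib.Path(...).write_text(...)` when a file is needed.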

Statistical Metrics

The module can calculate:

Descriptive Statistics

  • Mean, median, standard deviation
  • Min/max values
  • Quartile ranges
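
The API reference does not show which function computes these values, so the sketch below uses Python's `statistics` module to illustrate them. The helper name `describe` is an assumption, and the input is taken to be one count per repository.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Descriptive statistics for a list of at least two values."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


pa_counts_per_repo = [30, 15, 45, 10]
summary = describe(pa_counts_per_repo)
print(summary["mean"], summary["median"])  # 25 22.5
```

`statistics.quantiles` defaults to the exclusive method; pass `method="inclusive"` if the data is the full population rather than a sample.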

Comparative Metrics

  • Patch application rate
  • Classification accuracy
  • Inter-rater agreement (if multiple classifiers)
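
The source does not define these metrics precisely. As a sketch under stated assumptions, the code below treats the patch application rate as the PA share of all classified patches and uses Cohen's kappa as one common measure of agreement between two classifiers. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def patch_application_rate(counts: Dict[str, int]) -> float:
    """Share of PA among all classified patches (assumed definition)."""
    total = sum(counts.values())
    return counts.get("PA", 0) / total if total else 0.0


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two label sequences of equal, non-zero length."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


counts = {"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5}
print(patch_application_rate(counts))  # 0.45
```

Kappa corrects raw agreement for chance, which matters when one classification (such as PA) dominates the data.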

Aggregation Levels

  • Per repository
  • Per programming language
  • Per project size
  • Overall

Configuration

Common parameters:

# Chart styling
figsize = (10, 6)           # Figure dimensions
dpi = 300                    # Resolution for exports
style = 'seaborn'            # Matplotlib style (renamed 'seaborn-v0_8' in Matplotlib 3.6+)
colormap = 'viridis'         # Color scheme

# Data options
normalize = False            # Percentage vs absolute
log_scale = False            # Linear vs log scale
sort_by = 'value'            # Sort order

API Reference

Visualization module for PatchTrack analysis.

Provides functions to generate bar charts, pie charts, and grouped visualizations for analyzing patch classification patterns and integration metrics.

analyzer.analysis.all_class_bar(height, pr_nr, plotting=False)

Generate a bar chart for basic patch classification categories.

Parameters:

Name      Type       Description                                        Default
height    List[int]  List of frequency values for each classification.  required
pr_nr     int        Pull request number for tracking.                  required
plotting  bool       Whether to display the plot.                       False

Source code in analyzer/analysis.py
def all_class_bar(height: List[int], pr_nr: int, plotting: bool = False) -> None:
    """Generate a bar chart for basic patch classification categories.

    Args:
        height: List of frequency values for each classification.
        pr_nr: Pull request number for tracking.
        plotting: Whether to display the plot.
    """
    x_positions = [1, 2, 3, 4, 5]
    x_labels = ['PA', 'CC', 'NE', 'PN', 'ERROR']
    colors = [
        COLOR_PATCH_APPLIED,
        COLOR_CANNOT_CLASSIFY,
        COLOR_NOT_EXISTING,
        COLOR_PATCH_NOT_APPLIED,
        COLOR_ERROR
    ]

    plt.figure(figsize=(FIGURE_WIDTH, FIGURE_HEIGHT), dpi=FIGURE_DPI)
    plt.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)

    patches = [
        mpatches.Patch(color=COLOR_PATCH_APPLIED, label=LABEL_PATCH_APPLIED),
        mpatches.Patch(color=COLOR_CANNOT_CLASSIFY, label=LABEL_CANNOT_CLASSIFY),
        mpatches.Patch(color=COLOR_NOT_EXISTING, label=LABEL_NOT_EXISTING),
        mpatches.Patch(color=COLOR_PATCH_NOT_APPLIED, label=LABEL_PATCH_NOT_APPLIED),
        mpatches.Patch(color=COLOR_ERROR, label=LABEL_ERROR)
    ]

    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)
    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    if plotting:
        plt.show()
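
Because `all_class_bar` assigns labels by position (`x_labels` above is PA, CC, NE, PN, ERROR), counts kept in a dictionary need to be ordered before the call. A minimal sketch follows; the helper `to_heights` is hypothetical and not part of the module.

```python
from typing import Dict, List

# Label order copied from x_labels in all_class_bar.
BAR_ORDER = ["PA", "CC", "NE", "PN", "ERROR"]


def to_heights(counts: Dict[str, int]) -> List[int]:
    """Order counts to match the bar labels; missing labels count as 0."""
    unknown = set(counts) - set(BAR_ORDER)
    if unknown:
        raise ValueError(f"unknown classifications: {sorted(unknown)}")
    return [counts.get(label, 0) for label in BAR_ORDER]


heights = to_heights({"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5})
print(heights)  # [45, 10, 15, 25, 5]
```

The resulting list can be passed as the `height` argument, for example `analysis.all_class_bar(heights, pr_nr, plotting=True)`.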

analyzer.analysis.create_pie(slices, ax)

Create a pie chart for modification type distribution.

Parameters:

Name    Type       Description                                   Default
slices  List[int]  Frequency values for each modification type.  required
ax                 Matplotlib pyplot module.                     required

Source code in analyzer/analysis.py
def create_pie(slices: List[int], ax) -> None:
    """Create a pie chart for modification type distribution.

    Args:
        slices: Frequency values for each modification type.
        ax: Matplotlib pyplot module.
    """
    labels = ['MO', 'ED', 'SP', 'AF', 'DF']
    colors = ['r', 'y', 'g', 'b', 'c']

    ax.pie(slices, labels=labels, colors=colors,
           startangle=90, shadow=True, explode=(0, 0, 0, 0, 0),
           radius=1, autopct='%1.1f%%')
    ax.legend()

analyzer.analysis.create_bar(height, ax)

Create a bar chart for modification type distribution.

Parameters:

Name    Type       Description                                   Default
height  List[int]  Frequency values for each modification type.  required
ax                 Matplotlib Axes to draw on.                   required

Source code in analyzer/analysis.py
def create_bar(height: List[int], ax) -> None:
    """Create a bar chart for modification type distribution.

    Args:
        height: Frequency values for each modification type.
        ax: Matplotlib Axes to draw on. The pyplot module will not work here
            because it has no set_xlabel/set_ylabel.
    """
    x_positions = [1, 2, 3, 4, 5]
    x_labels = ['MO', 'ED', 'SP', 'AF', 'DF']
    colors = ['red', 'yellow', 'green', 'blue', 'cyan']

    ax.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)
    ax.set_xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    ax.set_ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)

analyzer.analysis.grouped_bar_chart(y0, y1, y2, y3, y4, y5, repo_nr)

Generate a grouped bar chart comparing multiple classification metrics.

Parameters:

Name     Type       Description                         Default
y0       List[int]  Missed Opportunity frequencies.     required
y1       List[int]  Effort Duplication frequencies.     required
y2       List[int]  Split (MO/ED) frequencies.          required
y3       List[int]  Added File frequencies.             required
y4       List[int]  Deleted File frequencies.           required
y5       List[int]  Uninteresting frequencies.          required
repo_nr  int        Repository number for file naming.  required

Source code in analyzer/analysis.py
def grouped_bar_chart(y0: List[int], y1: List[int], y2: List[int], y3: List[int],
                       y4: List[int], y5: List[int], repo_nr: int) -> None:
    """Generate a grouped bar chart comparing multiple classification metrics.

    Args:
        y0: Missed Opportunity frequencies.
        y1: Effort Duplication frequencies.
        y2: Split (MO/ED) frequencies.
        y3: Added File frequencies.
        y4: Deleted File frequencies.
        y5: Uninteresting frequencies.
        repo_nr: Repository number for file naming.
    """
    bar_width = 0.15
    base_positions = np.arange(len(y0))
    position_offset_0 = base_positions
    position_offset_1 = [x + bar_width for x in position_offset_0]
    position_offset_2 = [x + bar_width for x in position_offset_1]
    position_offset_3 = [x + bar_width for x in position_offset_2]
    position_offset_4 = [x + bar_width for x in position_offset_3]
    position_offset_5 = [x + bar_width for x in position_offset_4]

    plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT), dpi=FIGURE_DPI)

    patches = [
        mpatches.Patch(color=COLOR_MISSED_OPPORTUNITY, label=LABEL_MISSED_OPPORTUNITY),
        mpatches.Patch(color=COLOR_EFFORT_DUPLICATION, label=LABEL_EFFORT_DUPLICATION),
        mpatches.Patch(color=COLOR_SPLIT, label=LABEL_SPLIT),
        mpatches.Patch(color=COLOR_ADDED_FILE, label=LABEL_ADDED_FILE),
        mpatches.Patch(color=COLOR_DELETED_FILE, label=LABEL_DELETED_FILE),
        mpatches.Patch(color=COLOR_UNINTERESTING, label=LABEL_UNINTERESTING)
    ]
    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)

    plt.bar(position_offset_0, y0, color=COLOR_MISSED_OPPORTUNITY, width=bar_width, edgecolor='white')
    plt.bar(position_offset_1, y1, color=COLOR_EFFORT_DUPLICATION, width=bar_width, edgecolor='white')
    plt.bar(position_offset_2, y2, color=COLOR_SPLIT, width=bar_width, edgecolor='white')
    plt.bar(position_offset_3, y3, color=COLOR_ADDED_FILE, width=bar_width, edgecolor='white')
    plt.bar(position_offset_4, y4, color=COLOR_DELETED_FILE, width=bar_width, edgecolor='white')
    plt.bar(position_offset_5, y5, color=COLOR_UNINTERESTING, width=bar_width, edgecolor='white')

    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    interval_labels = ['0-100', '10-100', '20-100', '30-100', '40-100',
                      '50-100', '60-100', '70-100', '80-100', '90-100']
    plt.xticks([r + bar_width for r in range(len(y0))], interval_labels)

    plt.savefig(f"Plots/Grouped_bar_{repo_nr}.png", format="PNG", dpi=FIGURE_DPI, bbox_inches='tight')
    plt.show()
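
The six `position_offset_*` lists above follow one rule: series `k` is shifted right by `k * bar_width` from the group's base position. The sketch below restates that rule in plain Python; `group_offsets` is a hypothetical helper, not a function of the module.

```python
from typing import List


def group_offsets(n_groups: int, n_series: int,
                  bar_width: float = 0.15) -> List[List[float]]:
    """Return x positions indexed as offsets[series][group]."""
    return [
        [group + series * bar_width for group in range(n_groups)]
        for series in range(n_series)
    ]


# 10 intervals and 6 series, as in grouped_bar_chart.
offsets = group_offsets(n_groups=10, n_series=6)
print(len(offsets), len(offsets[0]))  # 6 10
```

With six bars of width 0.15, each group spans 0.9 units, so adjacent groups do not overlap. The listing places tick labels at `r + bar_width`, which sits under the second bar; centering them under the group would use `r + 2.5 * bar_width` instead.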

analyzer.analysis.create_all_bars(data, repo_nr)

Create a grid of bar charts for all intervals.

Parameters:

Name     Type  Description                                             Default
data     Dict  Dictionary mapping interval labels to frequency lists.  required
repo_nr  int   Repository number for file naming.                      required

Source code in analyzer/analysis.py
def create_all_bars(data: Dict, repo_nr: int) -> None:
    """Create a grid of bar charts for all intervals.

    Args:
        data: Dictionary mapping interval labels to frequency lists.
        repo_nr: Repository number for file naming.
    """
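    fig = plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT))
    for idx, interval_label in enumerate(data, start=1):
        # Pass the Axes returned by plt.subplot: create_bar calls ax.set_xlabel,
        # which exists on Axes but not on the pyplot module.
        ax = plt.subplot(2, 5, idx)
        ax.set_title(f"Bar Chart for interval {interval_label}", fontsize=FONT_SIZE_TITLE)
        create_bar(data[interval_label], ax)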

    plt.savefig(f"Plots/All_Bars_{repo_nr}.png", format="PNG")
    plt.show()

analyzer.analysis.create_all_pie(data, repo_nr)

Create a grid of pie charts for all intervals.

Parameters:

Name     Type  Description                                             Default
data     Dict  Dictionary mapping interval labels to frequency lists.  required
repo_nr  int   Repository number for file naming.                      required

Source code in analyzer/analysis.py
def create_all_pie(data: Dict, repo_nr: int) -> None:
    """Create a grid of pie charts for all intervals.

    Args:
        data: Dictionary mapping interval labels to frequency lists.
        repo_nr: Repository number for file naming.
    """
    fig = plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT))
    for idx, interval_label in enumerate(data, start=1):
        plt.subplot(2, 5, idx)
        plt.title(f"Pie Chart for interval {interval_label}", fontsize=FONT_SIZE_TITLE)
        create_pie(data[interval_label], plt)

    plt.savefig(f"Plots/{repo_nr}_All_Pies.png", format="PNG")
    plt.show()

analyzer.analysis.all_class_bar_w_even_d(height, pr_nr)

Generate a bar chart for extended patch classification with even distribution.

Parameters:

Name    Type       Description                                        Default
height  List[int]  List of frequency values for each classification.  required
pr_nr   int        Pull request number for tracking.                  required

Source code in analyzer/analysis.py
def all_class_bar_w_even_d(height: List[int], pr_nr: int) -> None:
    """Generate a bar chart for extended patch classification with even distribution.

    Args:
        height: List of frequency values for each classification.
        pr_nr: Pull request number for tracking.
    """
    x_positions = [1, 2, 3, 4, 5, 6, 7, 8]
    x_labels = ['MO', 'ED', 'Split(MO/ED)', 'CC', 'NE', 'NA', 'EVEN_D', 'ERROR']
    colors = [
        COLOR_MISSED_OPPORTUNITY,
        COLOR_EFFORT_DUPLICATION,
        COLOR_SPLIT,
        COLOR_CANNOT_CLASSIFY,
        COLOR_NOT_EXISTING,
        COLOR_NOT_APPLICABLE,
        COLOR_EVEN_DISTRIBUTION,
        COLOR_ERROR
    ]

    plt.figure(figsize=(FIGURE_WIDTH, FIGURE_HEIGHT), dpi=FIGURE_DPI)
    plt.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)

    patches = [
        mpatches.Patch(color=COLOR_MISSED_OPPORTUNITY, label=LABEL_MISSED_OPPORTUNITY),
        mpatches.Patch(color=COLOR_EFFORT_DUPLICATION, label=LABEL_EFFORT_DUPLICATION),
        mpatches.Patch(color=COLOR_SPLIT, label=LABEL_SPLIT),
        mpatches.Patch(color=COLOR_CANNOT_CLASSIFY, label=LABEL_CANNOT_CLASSIFY),
        mpatches.Patch(color=COLOR_NOT_EXISTING, label="Not Existing Files"),
        mpatches.Patch(color=COLOR_NOT_APPLICABLE, label=LABEL_NOT_APPLICABLE),
        mpatches.Patch(color=COLOR_EVEN_DISTRIBUTION, label=LABEL_EVEN_DISTRIBUTION),
        mpatches.Patch(color=COLOR_ERROR, label=LABEL_ERROR)
    ]

    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)
    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    plt.savefig(f"Plots/All_Classes_Bar_70_EVED_{pr_nr}.png", format="PNG",
                dpi=FIGURE_DPI, bbox_inches='tight')

analyzer.analysis.all_class_pie(slices, pr_nr, plotting=False)

Generate a pie chart for patch classification distribution.

Parameters:

Name      Type       Description                                         Default
slices    List[int]  Frequency values for each classification category.  required
pr_nr     int        Pull request number for tracking.                   required
plotting  bool       Whether to display the plot.                        False

Source code in analyzer/analysis.py
def all_class_pie(slices: List[int], pr_nr: int, plotting: bool = False) -> None:
    """Generate a pie chart for patch classification distribution.

    Args:
        slices: Frequency values for each classification category.
        pr_nr: Pull request number for tracking.
        plotting: Whether to display the plot.
    """
    labels = ['Effort Duplication', 'Cannot Classify', 'Not Existing Files',
              'Not Applicable', 'Error']
    colors = [COLOR_EFFORT_DUPLICATION, COLOR_CANNOT_CLASSIFY, COLOR_NOT_EXISTING,
              COLOR_NOT_APPLICABLE, COLOR_ERROR]

    plt.pie(slices, labels=labels, colors=colors,
            startangle=0, shadow=True, explode=(0, 0, 0, 0, 0),
            radius=3, autopct='%1.1f%%')
    plt.rc('font', size=FONT_SIZE_LEGEND)
    plt.rc('legend', fontsize=FONT_SIZE_TICK)
    plt.legend(loc='center left', bbox_to_anchor=(2, 1.5))

    if plotting:
        plt.show()

Integration Points

The analysis module is typically called:

  1. From main.py: After classification completes
  2. From notebooks: For exploratory analysis
  3. In CI/CD pipelines: For automated reporting
  4. Standalone: For custom analysis scripts

Example Pipeline

from analyzer import main, analysis

# Run classification
pt = main.PatchTrack(tokens)
pt.classify(pr_pairs)

# Get results
results = pt.get_results()
classifications = pt.pr_classifications

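# Visualize
# NOTE: visualize_results is not listed in the API reference above. If your
# version lacks it, aggregate the counts and call analysis.all_class_bar instead.
analysis.visualize_results(classifications)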

Output Examples

Bar Chart Output

Classification Distribution
┌─────────────────────────────┐
│ PA  ████████████ 45 (45%)   │
│ PN  ██████ 25 (25%)         │
│ NE  ███ 15 (15%)            │
│ CC  ██ 10 (10%)             │
│ ERR █ 5 (5%)                │
└─────────────────────────────┘

Tips & Best Practices

Do

  • Use descriptive titles for charts
  • Include legends and axis labels
  • Export with high DPI for publication
  • Validate data before visualization

Don't

  • Exceed 5-6 categories in one chart
  • Use 3D charts (hard to read)
  • Mix different metrics in one chart
  • Forget to add axis units

See Also

  • Aggregator - Produces data for visualization
  • Main - Calls analysis after classification