Analysis Module

Overview

The Analysis module provides visualization and plotting functions for PatchTrack classification results. It generates charts and graphs to help understand patch application patterns across pull requests and repositories.

Purpose

This module:

Visualizes classification distributions (PA, PN, NE, CC, ERROR)
Generates bar charts, pie charts, and other plots
Creates statistical summaries
Exports results for reporting and publication

Chart Types

Chart Type	Purpose	Best For
Bar Chart	Absolute counts by classification	Overall distribution
Pie Chart	Percentage breakdown	Proportional understanding
Line Chart	Trends over repositories	Time-series analysis
Heatmap	2D distribution	Multiple dimensions

Key Visualizations

Classification Distribution

Shows how many patches fall into each category:

PA (Patch Applied)     ████████████ 45%
PN (Not Applied)       ██████ 25%
NE (Not Existing)      ███ 15%
CC (Cannot Classify)   ██ 10%
ERROR                  █ 5%
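The text chart above can be reproduced from raw counts. The following is a minimal sketch, not part of the module: `render_distribution` is a hypothetical helper, and the bar width of 24 characters is an arbitrary choice.

```python
def render_distribution(counts, width=24):
    """Return one text line per category: label, bar, and percentage share."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    lines = []
    for label, count in counts.items():
        share = count / total
        bar = "█" * round(share * width)  # bar length proportional to share
        lines.append(f"{label:<6}{bar} {share:.0%}")
    return lines


# Illustrative counts matching the percentages shown above
counts = {"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5}
for line in render_distribution(counts):
    print(line)
```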

Repository Comparison

Compares classification patterns across different repositories:

X-axis: Repository name
Y-axis: Count or percentage
Groups: Classification type
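A grouped comparison needs one series per classification type, with one value per repository. The sketch below reshapes per-repository counts into that form. `to_series` and the category order in `CATEGORIES` are assumptions made for illustration, not part of the module.

```python
CATEGORIES = ["PA", "PN", "NE", "CC", "ERROR"]  # assumed order of each count list


def to_series(repo_data, as_percentage=False):
    """Turn {repo: [count per category]} into (repos, {category: [value per repo]})."""
    repos = list(repo_data)
    series = {category: [] for category in CATEGORIES}
    for repo in repos:
        counts = repo_data[repo]
        if len(counts) != len(CATEGORIES):
            raise ValueError(f"{repo}: expected {len(CATEGORIES)} counts")
        total = sum(counts)
        for category, count in zip(CATEGORIES, counts):
            # Percentages are computed per repository (Y-axis: count or percentage)
            value = 100 * count / total if as_percentage and total else count
            series[category].append(value)
    return repos, series


repos, series = to_series({"repo1": [30, 20, 10, 5, 2], "repo2": [15, 5, 5, 5, 3]})
print(repos)         # X-axis: repository names
print(series["PA"])  # one group member per repository
```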

Trend Analysis

Tracks how patch application rates change:

Over time
Across project size ranges
By programming language
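One way to track such changes is to compute the patch application rate per period and its difference from the previous period. This is a sketch under assumptions: `rate_trend` is a hypothetical helper, and its input is taken to be pre-aggregated `(period, applied, total)` tuples in chronological order.

```python
def rate_trend(periods):
    """Return (period, rate, change_from_previous) for each input tuple.

    `periods` is a list of (period, applied, total) in chronological order.
    The first period has no predecessor, so its change is None.
    """
    trend = []
    previous = None
    for period, applied, total in periods:
        if total <= 0:
            raise ValueError(f"{period}: total must be positive")
        rate = applied / total
        change = None if previous is None else rate - previous
        trend.append((period, rate, change))
        previous = rate
    return trend


for period, rate, change in rate_trend([("2021", 10, 40), ("2022", 30, 60)]):
    print(period, f"{rate:.0%}", change)
```

The same function applies to project size ranges or languages if the tuples are ordered by that dimension instead of by time.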

Usage Example

from analyzer import analysis

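# Basic bar chart of classification totals.
# Order matches the tick labels in all_class_bar: PA, CC, NE, PN, ERROR
totals_list = [45, 10, 15, 25, 5]
pr_nr = 1  # example pull request number (required argument)
analysis.all_class_bar(totals_list, pr_nr, plotting=True)

# As percentage: all_class_bar takes no percentage flag, so convert first
total = sum(totals_list)
percentages = [100 * count / total for count in totals_list]
analysis.all_class_bar(percentages, pr_nr, plotting=True)

# Pie chart (slice labels are fixed inside all_class_pie)
analysis.all_class_pie(totals_list, pr_nr, plotting=True)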

# Repository comparison
repo_data = {
    'repo1': [30, 20, 10, 5, 2],
    'repo2': [15, 5, 5, 5, 3],
}
analysis.repo_comparison_bar(repo_data)
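The source of `all_class_bar` in the API Reference labels its bars in the order PA, CC, NE, PN, ERROR, so counts must be passed in that order. The sketch below builds such a list from a dictionary of counts. `totals_in_bar_order` is a hypothetical helper, not part of the module.

```python
BAR_ORDER = ["PA", "CC", "NE", "PN", "ERROR"]  # tick-label order in all_class_bar


def totals_in_bar_order(counts):
    """Return counts as a list ordered like BAR_ORDER; missing labels become 0."""
    unknown = set(counts) - set(BAR_ORDER)
    if unknown:
        raise ValueError(f"unknown classifications: {sorted(unknown)}")
    return [counts.get(label, 0) for label in BAR_ORDER]


heights = totals_in_bar_order({"PA": 45, "PN": 25, "NE": 15, "CC": 10, "ERROR": 5})
print(heights)
```

The resulting list can be passed as the `height` argument of `all_class_bar`.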

Output Formats

Display

Interactive plots in Jupyter notebooks
Static images saved to disk
Console output for quick analysis

File Exports

PNG images for publications
CSV data for further analysis
JSON for programmatic access
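CSV and JSON exports need only the standard library. The following sketch writes classification totals in both formats. `export_totals` and the file names are assumptions for illustration and do not reflect the module's own export code.

```python
import csv
import json
from pathlib import Path


def export_totals(counts, out_dir):
    """Write {classification: count} to CSV and JSON files; return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "classification_totals.csv"    # hypothetical file name
    json_path = out / "classification_totals.json"  # hypothetical file name

    # newline="" lets the csv module control line endings itself
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["classification", "count"])
        writer.writerows(counts.items())

    json_path.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return csv_path, json_path
```

Calling `export_totals({"PA": 45, "PN": 25}, "exports")` would create both files under an `exports` directory.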

Statistical Metrics

The module can calculate:

Descriptive Statistics

Mean, median, standard deviation
Min/max values
Quartile ranges
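These statistics are available in Python's `statistics` module. The sketch below summarizes a list of per-repository counts. `describe` is a hypothetical helper, and the choice of sample standard deviation and the "inclusive" quartile method are assumptions.

```python
import statistics


def describe(values):
    """Summarize a list of at least two numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


summary = describe([10, 20, 30, 40, 50])
for name, value in summary.items():
    print(name, value)
```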

Comparative Metrics

Patch application rate
Classification accuracy
Inter-rater agreement (if multiple classifiers)
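Inter-rater agreement between two classifiers is commonly measured with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The text above does not name a specific measure, so kappa is an assumption here, and `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items both raters labelled identically
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


rater_1 = ["PA", "PA", "PN", "PN"]
rater_2 = ["PA", "PN", "PN", "PN"]
print(cohens_kappa(rater_1, rater_2))
```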

Aggregation Levels

Per repository
Per programming language
Per project size
Overall
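Each aggregation level amounts to grouping classification records by a different key. The sketch below illustrates the idea. `aggregate` and the record fields `repository`, `language`, and `classification` are hypothetical names, not the module's data model.

```python
from collections import Counter, defaultdict


def aggregate(records, level=None):
    """Count classifications per group.

    `level` is the record field to group by, or None for a single
    "overall" group.
    """
    totals = defaultdict(Counter)
    for record in records:
        key = "overall" if level is None else record[level]
        totals[key][record["classification"]] += 1
    return {key: dict(counter) for key, counter in totals.items()}


records = [
    {"repository": "r1", "language": "Python", "classification": "PA"},
    {"repository": "r1", "language": "Python", "classification": "PN"},
    {"repository": "r2", "language": "Go", "classification": "PA"},
]
print(aggregate(records, "repository"))
print(aggregate(records, "language"))
print(aggregate(records))
```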

Configuration

Common parameters:

# Chart styling
figsize = (10, 6)           # Figure dimensions
dpi = 300                    # Resolution for exports
style = 'seaborn-v0_8'       # Matplotlib style (named 'seaborn' before Matplotlib 3.6)
colormap = 'viridis'         # Color scheme

# Data options
normalize = False            # Percentage vs absolute
log_scale = False            # Linear vs log scale
sort_by = 'value'            # Sort order
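The data options can be applied to the counts before anything is drawn. The sketch below shows one possible reading of `normalize` and `sort_by`. `ChartOptions` and `prepare` are hypothetical, and treating `sort_by = 'value'` as descending order is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartOptions:
    normalize: bool = False  # percentage vs absolute
    sort_by: str = "value"   # "value" (descending) or "label" (alphabetical)


def prepare(counts, options=ChartOptions()):
    """Return (label, value) pairs ready for plotting."""
    total = sum(counts.values())
    items = [
        (label, 100 * count / total if options.normalize and total else count)
        for label, count in counts.items()
    ]
    if options.sort_by == "value":
        items.sort(key=lambda item: item[1], reverse=True)
    elif options.sort_by == "label":
        items.sort(key=lambda item: item[0])
    else:
        raise ValueError(f"unsupported sort_by: {options.sort_by!r}")
    return items


print(prepare({"NE": 15, "PA": 45, "PN": 40}, ChartOptions(normalize=True)))
```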

API Reference

Visualization module for PatchTrack analysis.

Provides functions to generate bar charts, pie charts, and grouped visualizations for analyzing patch classification patterns and integration metrics.

`analyzer.analysis.all_class_bar(height, pr_nr, plotting=False)`

Generate a bar chart for basic patch classification categories.

Parameters:

Name	Type	Description	Default
`height`	`List[int]`	List of frequency values for each classification.	required
`pr_nr`	`int`	Pull request number for tracking.	required
`plotting`	`bool`	Whether to display the plot.	`False`

Source code in analyzer/analysis.py

def all_class_bar(height: List[int], pr_nr: int, plotting: bool = False) -> None:
    """Generate a bar chart for basic patch classification categories.

    Args:
        height: List of frequency values for each classification.
        pr_nr: Pull request number for tracking.
        plotting: Whether to display the plot.
    """
    x_positions = [1, 2, 3, 4, 5]
    x_labels = ['PA', 'CC', 'NE', 'PN', 'ERROR']
    colors = [
        COLOR_PATCH_APPLIED,
        COLOR_CANNOT_CLASSIFY,
        COLOR_NOT_EXISTING,
        COLOR_PATCH_NOT_APPLIED,
        COLOR_ERROR
    ]

    plt.figure(figsize=(FIGURE_WIDTH, FIGURE_HEIGHT), dpi=FIGURE_DPI)
    plt.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)

    patches = [
        mpatches.Patch(color=COLOR_PATCH_APPLIED, label=LABEL_PATCH_APPLIED),
        mpatches.Patch(color=COLOR_CANNOT_CLASSIFY, label=LABEL_CANNOT_CLASSIFY),
        mpatches.Patch(color=COLOR_NOT_EXISTING, label=LABEL_NOT_EXISTING),
        mpatches.Patch(color=COLOR_PATCH_NOT_APPLIED, label=LABEL_PATCH_NOT_APPLIED),
        mpatches.Patch(color=COLOR_ERROR, label=LABEL_ERROR)
    ]

    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)
    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    if plotting:
        plt.show()

`analyzer.analysis.create_pie(slices, ax)`

Create a pie chart for modification type distribution.

Parameters:

Name	Type	Description	Default
`slices`	`List[int]`	Frequency values for each modification type.	required
`ax`		Matplotlib pyplot module.	required

Source code in analyzer/analysis.py

def create_pie(slices: List[int], ax) -> None:
    """Create a pie chart for modification type distribution.

    Args:
        slices: Frequency values for each modification type.
        ax: Matplotlib pyplot module.
    """
    labels = ['MO', 'ED', 'SP', 'AF', 'DF']
    colors = ['r', 'y', 'g', 'b', 'c']

    ax.pie(slices, labels=labels, colors=colors,
           startangle=90, shadow=True, explode=(0, 0, 0, 0, 0),
           radius=1, autopct='%1.1f%%')
    ax.legend()

`analyzer.analysis.create_bar(height, ax)`

Create a bar chart for modification type distribution.

Parameters:

Name	Type	Description	Default
`height`	`List[int]`	Frequency values for each modification type.	required
`ax`		Matplotlib pyplot module.	required

Source code in analyzer/analysis.py

def create_bar(height: List[int], ax) -> None:
    """Create a bar chart for modification type distribution.

    Args:
        height: Frequency values for each modification type.
        ax: Matplotlib pyplot module.
    """
    x_positions = [1, 2, 3, 4, 5]
    x_labels = ['MO', 'ED', 'SP', 'AF', 'DF']
    colors = ['red', 'yellow', 'green', 'blue', 'cyan']

    ax.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)
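    # ax is the pyplot module (see docstring), which exposes xlabel/ylabel
    # rather than the Axes methods set_xlabel/set_ylabel
    ax.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    ax.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)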

`analyzer.analysis.grouped_bar_chart(y0, y1, y2, y3, y4, y5, repo_nr)`

Generate a grouped bar chart comparing multiple classification metrics.

Parameters:

Name	Type	Description	Default
`y0`	`List[int]`	Missed Opportunity frequencies.	required
`y1`	`List[int]`	Effort Duplication frequencies.	required
`y2`	`List[int]`	Split (MO/ED) frequencies.	required
`y3`	`List[int]`	Added File frequencies.	required
`y4`	`List[int]`	Deleted File frequencies.	required
`y5`	`List[int]`	Uninteresting frequencies.	required
`repo_nr`	`int`	Repository number for file naming.	required

Source code in analyzer/analysis.py

def grouped_bar_chart(y0: List[int], y1: List[int], y2: List[int], y3: List[int],
                       y4: List[int], y5: List[int], repo_nr: int) -> None:
    """Generate a grouped bar chart comparing multiple classification metrics.

    Args:
        y0: Missed Opportunity frequencies.
        y1: Effort Duplication frequencies.
        y2: Split (MO/ED) frequencies.
        y3: Added File frequencies.
        y4: Deleted File frequencies.
        y5: Uninteresting frequencies.
        repo_nr: Repository number for file naming.
    """
    bar_width = 0.15
    base_positions = np.arange(len(y0))
    position_offset_0 = base_positions
    position_offset_1 = [x + bar_width for x in position_offset_0]
    position_offset_2 = [x + bar_width for x in position_offset_1]
    position_offset_3 = [x + bar_width for x in position_offset_2]
    position_offset_4 = [x + bar_width for x in position_offset_3]
    position_offset_5 = [x + bar_width for x in position_offset_4]

    plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT), dpi=FIGURE_DPI)

    patches = [
        mpatches.Patch(color=COLOR_MISSED_OPPORTUNITY, label=LABEL_MISSED_OPPORTUNITY),
        mpatches.Patch(color=COLOR_EFFORT_DUPLICATION, label=LABEL_EFFORT_DUPLICATION),
        mpatches.Patch(color=COLOR_SPLIT, label=LABEL_SPLIT),
        mpatches.Patch(color=COLOR_ADDED_FILE, label=LABEL_ADDED_FILE),
        mpatches.Patch(color=COLOR_DELETED_FILE, label=LABEL_DELETED_FILE),
        mpatches.Patch(color=COLOR_UNINTERESTING, label=LABEL_UNINTERESTING)
    ]
    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)

    plt.bar(position_offset_0, y0, color=COLOR_MISSED_OPPORTUNITY, width=bar_width, edgecolor='white')
    plt.bar(position_offset_1, y1, color=COLOR_EFFORT_DUPLICATION, width=bar_width, edgecolor='white')
    plt.bar(position_offset_2, y2, color=COLOR_SPLIT, width=bar_width, edgecolor='white')
    plt.bar(position_offset_3, y3, color=COLOR_ADDED_FILE, width=bar_width, edgecolor='white')
    plt.bar(position_offset_4, y4, color=COLOR_DELETED_FILE, width=bar_width, edgecolor='white')
    plt.bar(position_offset_5, y5, color=COLOR_UNINTERESTING, width=bar_width, edgecolor='white')

    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    interval_labels = ['0-100', '10-100', '20-100', '30-100', '40-100',
                      '50-100', '60-100', '70-100', '80-100', '90-100']
    plt.xticks([r + bar_width for r in range(len(y0))], interval_labels)

    plt.savefig(f"Plots/Grouped_bar_{repo_nr}.png", format="PNG", dpi=FIGURE_DPI, bbox_inches='tight')
    plt.show()
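The offset arithmetic in `grouped_bar_chart` can be written for any number of series. The sketch below is an illustrative helper, not part of the module: `grouped_positions` is a hypothetical name. It also computes tick positions at the middle of each group. The listing above places ticks at `r + bar_width`, which with six centre-aligned bars falls under the second bar rather than the group centre.

```python
def grouped_positions(n_groups, n_series, bar_width=0.15):
    """Return (positions, tick_centres) for a grouped bar chart.

    positions[s][g] is the x coordinate of series s in group g.
    tick_centres[g] is the midpoint between the first and last bar of group g.
    """
    positions = [
        [group + series * bar_width for group in range(n_groups)]
        for series in range(n_series)
    ]
    centre_offset = (n_series - 1) * bar_width / 2
    tick_centres = [group + centre_offset for group in range(n_groups)]
    return positions, tick_centres


positions, ticks = grouped_positions(n_groups=10, n_series=6)
print(positions[1][:3])  # x coordinates of the second series, first three groups
print(ticks[:3])         # centred tick positions, first three groups
```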

`analyzer.analysis.create_all_bars(data, repo_nr)`

Create a grid of bar charts for all intervals.

Parameters:

Name	Type	Description	Default
`data`	`Dict`	Dictionary mapping interval labels to frequency lists.	required
`repo_nr`	`int`	Repository number for file naming.	required

Source code in analyzer/analysis.py

def create_all_bars(data: Dict, repo_nr: int) -> None:
    """Create a grid of bar charts for all intervals.

    Args:
        data: Dictionary mapping interval labels to frequency lists.
        repo_nr: Repository number for file naming.
    """
    fig = plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT))
    for idx, interval_label in enumerate(data, start=1):
        plt.subplot(2, 5, idx)
        plt.title(f"Bar Chart for interval {interval_label}", fontsize=FONT_SIZE_TITLE)
        create_bar(data[interval_label], plt)

    plt.savefig(f"Plots/All_Bars_{repo_nr}.png", format="PNG")
    plt.show()
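Because `create_all_bars` draws into a fixed 2×5 grid and `create_bar` uses five fixed positions, `data` should hold at most ten intervals with five values each. A check along these lines can catch mismatches before plotting. `validate_interval_data` is a hypothetical helper, not part of the module.

```python
def validate_interval_data(data, max_panels=10, n_types=5):
    """Check that `data` fits a 2x5 grid of five-bar charts.

    Raises ValueError on the first problem found; returns True otherwise.
    """
    if not data:
        raise ValueError("data is empty")
    if len(data) > max_panels:
        raise ValueError(f"{len(data)} intervals do not fit {max_panels} panels")
    for label, values in data.items():
        if len(values) != n_types:
            raise ValueError(f"{label}: expected {n_types} values, got {len(values)}")
        if any(value < 0 for value in values):
            raise ValueError(f"{label}: frequencies must be non-negative")
    return True


# Illustrative data: two intervals, five modification types (MO, ED, SP, AF, DF)
data = {"0-100": [4, 2, 1, 0, 3], "10-100": [3, 2, 1, 0, 2]}
print(validate_interval_data(data))
```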

`analyzer.analysis.create_all_pie(data, repo_nr)`

Create a grid of pie charts for all intervals.

Parameters:

Name	Type	Description	Default
`data`	`Dict`	Dictionary mapping interval labels to frequency lists.	required
`repo_nr`	`int`	Repository number for file naming.	required

Source code in analyzer/analysis.py

def create_all_pie(data: Dict, repo_nr: int) -> None:
    """Create a grid of pie charts for all intervals.

    Args:
        data: Dictionary mapping interval labels to frequency lists.
        repo_nr: Repository number for file naming.
    """
    fig = plt.figure(figsize=(GROUPED_FIGURE_WIDTH, GROUPED_FIGURE_HEIGHT))
    for idx, interval_label in enumerate(data, start=1):
        plt.subplot(2, 5, idx)
        plt.title(f"Pie Chart for interval {interval_label}", fontsize=FONT_SIZE_TITLE)
        create_pie(data[interval_label], plt)

    plt.savefig(f"Plots/{repo_nr}_All_Pies.png", format="PNG")
    plt.show()

`analyzer.analysis.all_class_bar_w_even_d(height, pr_nr)`

Generate a bar chart for extended patch classification with even distribution.

Parameters:

Name	Type	Description	Default
`height`	`List[int]`	List of frequency values for each classification.	required
`pr_nr`	`int`	Pull request number for tracking.	required

Source code in analyzer/analysis.py

def all_class_bar_w_even_d(height: List[int], pr_nr: int) -> None:
    """Generate a bar chart for extended patch classification with even distribution.

    Args:
        height: List of frequency values for each classification.
        pr_nr: Pull request number for tracking.
    """
    x_positions = [1, 2, 3, 4, 5, 6, 7, 8]
    x_labels = ['MO', 'ED', 'Split(MO/ED)', 'CC', 'NE', 'NA', 'EVEN_D', 'ERROR']
    colors = [
        COLOR_MISSED_OPPORTUNITY,
        COLOR_EFFORT_DUPLICATION,
        COLOR_SPLIT,
        COLOR_CANNOT_CLASSIFY,
        COLOR_NOT_EXISTING,
        COLOR_NOT_APPLICABLE,
        COLOR_EVEN_DISTRIBUTION,
        COLOR_ERROR
    ]

    plt.figure(figsize=(FIGURE_WIDTH, FIGURE_HEIGHT), dpi=FIGURE_DPI)
    plt.bar(x_positions, height, tick_label=x_labels, width=0.8, color=colors)

    patches = [
        mpatches.Patch(color=COLOR_MISSED_OPPORTUNITY, label=LABEL_MISSED_OPPORTUNITY),
        mpatches.Patch(color=COLOR_EFFORT_DUPLICATION, label=LABEL_EFFORT_DUPLICATION),
        mpatches.Patch(color=COLOR_SPLIT, label=LABEL_SPLIT),
        mpatches.Patch(color=COLOR_CANNOT_CLASSIFY, label=LABEL_CANNOT_CLASSIFY),
        mpatches.Patch(color=COLOR_NOT_EXISTING, label="Not Existing Files"),
        mpatches.Patch(color=COLOR_NOT_APPLICABLE, label=LABEL_NOT_APPLICABLE),
        mpatches.Patch(color=COLOR_EVEN_DISTRIBUTION, label=LABEL_EVEN_DISTRIBUTION),
        mpatches.Patch(color=COLOR_ERROR, label=LABEL_ERROR)
    ]

    plt.legend(fontsize=FONT_SIZE_LEGEND, loc="upper left", handles=patches)
    plt.xlabel('Classifications', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.ylabel('Frequency', fontsize=FONT_SIZE_AXIS_LABEL)
    plt.xticks(fontsize=FONT_SIZE_TICK)
    plt.yticks(fontsize=FONT_SIZE_TICK)

    plt.savefig(f"Plots/All_Classes_Bar_70_EVED_{pr_nr}.png", format="PNG",
                dpi=FIGURE_DPI, bbox_inches='tight')

`analyzer.analysis.all_class_pie(slices, pr_nr, plotting=False)`

Generate a pie chart for patch classification distribution.

Parameters:

Name	Type	Description	Default
`slices`	`List[int]`	Frequency values for each classification category.	required
`pr_nr`	`int`	Pull request number for tracking.	required
`plotting`	`bool`	Whether to display the plot.	`False`

Source code in analyzer/analysis.py

def all_class_pie(slices: List[int], pr_nr: int, plotting: bool = False) -> None:
    """Generate a pie chart for patch classification distribution.

    Args:
        slices: Frequency values for each classification category.
        pr_nr: Pull request number for tracking.
        plotting: Whether to display the plot.
    """
    labels = ['Effort Duplication', 'Cannot Classify', 'Not Existing Files',
              'Not Applicable', 'Error']
    colors = [COLOR_EFFORT_DUPLICATION, COLOR_CANNOT_CLASSIFY, COLOR_NOT_EXISTING,
              COLOR_NOT_APPLICABLE, COLOR_ERROR]

    plt.pie(slices, labels=labels, colors=colors,
            startangle=0, shadow=True, explode=(0, 0, 0, 0, 0),
            radius=3, autopct='%1.1f%%')
    plt.rc('font', size=FONT_SIZE_LEGEND)
    plt.rc('legend', fontsize=FONT_SIZE_TICK)
    plt.legend(loc='center left', bbox_to_anchor=(2, 1.5))

    if plotting:
        plt.show()

Integration Points

The analysis module is typically called:

From main.py: After classification completes
From notebooks: For exploratory analysis
In CI/CD pipelines: For automated reporting
Standalone: For custom analysis scripts

Example Pipeline

from analyzer import main, analysis

# Run classification
pt = main.PatchTrack(tokens)
pt.classify(pr_pairs)

# Get results
results = pt.get_results()
classifications = pt.pr_classifications

# Visualize
analysis.visualize_results(classifications)

Output Examples

Bar Chart Output

Classification Distribution
┌─────────────────────────────┐
│ PA  ████████████ 45 (45%)   │
│ PN  ██████ 25 (25%)         │
│ NE  ███ 15 (15%)            │
│ CC  ██ 10 (10%)             │
│ ERR █ 5 (5%)                │
└─────────────────────────────┘

Tips & Best Practices

✅ Do

Use descriptive titles for charts
Include legends and axis labels
Export with high DPI for publication
Validate data before visualization

❌ Don't

Exceed 5-6 categories in one chart
Use 3D charts (hard to read)
Mix different metrics in one chart
Forget to add axis units
