Common Module

Overview

The Common module provides shared configuration and global settings used throughout PatchTrack. It manages state variables and utilities needed across multiple modules.

Purpose

This module:

  • Stores global configuration variables
  • Manages n-gram size settings
  • Provides common utility constants
  • Enables cross-module communication

Key Configuration Variables

N-gram Size

Controls the granularity of code comparison:

import analyzer.common as common

# Set n-gram size (lines of code per token)
common.ngram_size = 1    # Default: compare line-by-line
common.ngram_size = 2    # Compare pairs of lines
common.ngram_size = 4    # Compare 4-line blocks

Impact on Classification:

  N-gram Size   Speed    Precision   Use Case
  1             Fast     Low         Quick scans
  2-3           Medium   Medium      Standard
  4+            Slow     High        Detailed analysis

Recommended

Use n-gram size of 1-4 for most analysis. Larger values may miss partial matches.


Usage Patterns

Setting Global State

from analyzer import common, classifier

# Configure before classification
common.ngram_size = 4

# Run classification with this setting
patch_loader, source_loader = classifier.process_patch(...)

Multiple Classifications with Different Settings

from analyzer import common, main

# First run: n-gram size 2
common.ngram_size = 2
pt1 = main.PatchTrack(tokens)
pt1.classify(pr_pairs)

# Second run: n-gram size 4
common.ngram_size = 4
pt2 = main.PatchTrack(tokens)
pt2.classify(pr_pairs)

# Compare results
compare(pt1.pr_classifications, pt2.pr_classifications)

Configuration Impact

How N-gram Size Affects Results

Low N-gram Size (1-2):

  • ✅ Fast processing
  • ✅ Catches small code snippets
  • ❌ More false positives
  • ❌ May match unrelated code

High N-gram Size (4+):

  • ✅ Fewer false positives
  • ✅ More precise matches
  • ❌ Slower processing
  • ❌ Misses small changes

Example

ChatGPT Code:
    x = 5
    y = 10
    z = x + y

N-gram Size 1: ["x = 5", "y = 10", "z = x + y"]
N-gram Size 2: [("x = 5", "y = 10"), ("y = 10", "z = x + y")]
N-gram Size 3: [("x = 5", "y = 10", "z = x + y")]

A window of 4 or more lines is longer than this 3-line snippet, so it produces no n-grams at all; this is why large sizes can miss short matches.
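The windows above can be generated with a simple sliding-window helper. This is an illustrative sketch, not a function exported by analyzer.common:

```python
def line_ngrams(code: str, n: int):
    """Yield overlapping n-line windows from a code snippet."""
    lines = [ln.strip() for ln in code.strip().splitlines()]
    # A window larger than the snippet yields nothing, which is
    # why large n-gram sizes can miss short matches.
    for i in range(len(lines) - n + 1):
        yield tuple(lines[i:i + n])

snippet = "x = 5\ny = 10\nz = x + y"
print(list(line_ngrams(snippet, 2)))
# [('x = 5', 'y = 10'), ('y = 10', 'z = x + y')]
```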

Best Practices

Do

  • Set n-gram size once at initialization
  • Use consistent settings for all PRs in a batch
  • Document your choice in results metadata
  • Experiment to find optimal value for your data

Don't

  • Change n-gram size during classification
  • Use values below 1 or above 10
  • Assume one setting works for all cases
  • Forget to reset settings between runs
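The set-and-reset discipline above can be wrapped in a context manager so a setting never leaks between runs. analyzer.common does not ship such a helper, so this is a sketch; a stand-in `common` namespace is used here for illustration:

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for analyzer.common; in real use, import the module itself.
common = SimpleNamespace(ngram_size=1)

@contextmanager
def ngram_setting(size: int):
    """Temporarily set common.ngram_size, restoring the old value on exit."""
    previous = common.ngram_size
    common.ngram_size = size
    try:
        yield
    finally:
        common.ngram_size = previous

with ngram_setting(4):
    pass  # run classification here
assert common.ngram_size == 1  # restored after the block
```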

Typical Configuration Values

For Different Project Types

  Project Type     Recommended   Reason
  Small snippets   1-2           Capture brief code changes
  Regular code     2-3           Balanced accuracy/speed
  Complex logic    4-5           Need more context
  Large methods    5+            Comprehensive matching

Integration with Other Modules

The common module is used by:

  • classifier.py: Sets n-gram size via common.ngram_size
  • patchLoader.py: Respects current n-gram setting
  • sourceLoader.py: Uses n-gram configuration
  • main.py: May set configuration before processing

main.py
  └── common.ngram_size = value
       ├── classifier.py
       ├── patchLoader.py
       └── sourceLoader.py

API Reference

Common variables and functions for patch analysis.

Initial version by Jiyong Jang, 2012. Modified by Daniel Ogenrwot, 2023.

analyzer.common.FileExt

Index for file types supported by the tool.

Source code in analyzer/common.py
class FileExt:
    """Index for file types supported by the tool."""

    NonText = 0
    Text = 1
    C = 2
    Java = 3
    ShellScript = 4
    Python = 5
    Perl = 6
    PHP = 7
    Ruby = 8
    yaml = 9
    Scala = 10
    ipynb = 11
    JavaScript = 12
    JSON = 13
    Kotlin = 14
    Xml = 15
    gradle = 16
    GEMFILE = 17
    REQ_TXT = 18
    TypeScript = 19
    CPP = 20
    CSHARP = 21
    VUE = 22
    REACT = 23
    Bash = 24
    markdown = 25
    goland = 26
    html = 27
    CSS = 28
    Fsharp = 29
    REGEX = 30
    conf = 31
    svelte = 32
    TSX = 33
    SQL = 34
    SWIFT = 35
    RUST = 36
    SOLIDITY = 37
    VB = 38

analyzer.common.fnv1a_hash(string)

FNV-1a 32-bit hash (http://isthe.com/chongo/tech/comp/fnv/).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def fnv1a_hash(string):
    """
    FNV-1a 32-bit hash (http://isthe.com/chongo/tech/comp/fnv/).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 2166136261
    for c in string:
        hash_value ^= ord(c)
        hash_value *= 16777619
        hash_value &= 0xFFFFFFFF
    return hash_value
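Copying the function above, a quick sanity check against the published FNV-1a 32-bit test vectors:

```python
def fnv1a_hash(string):
    """FNV-1a 32-bit hash, as in analyzer/common.py."""
    hash_value = 2166136261
    for c in string:
        hash_value ^= ord(c)
        hash_value *= 16777619
        hash_value &= 0xFFFFFFFF
    return hash_value

# Published FNV-1a 32-bit test vectors.
assert fnv1a_hash("") == 0x811C9DC5   # offset basis for the empty string
assert fnv1a_hash("a") == 0xE40C292C  # known vector for "a"
```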

analyzer.common.djb2_hash(string)

djb2 hash (http://www.cse.yorku.ca/~oz/hash.html).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def djb2_hash(string: str) -> int:
    """
    djb2 hash (http://www.cse.yorku.ca/~oz/hash.html).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 5381
    for c in string:
        hash_value = ((hash_value << 5) + hash_value) + ord(c)
        hash_value &= 0xFFFFFFFF
    return hash_value

analyzer.common.sdbm_hash(string)

sdbm hash (http://www.cse.yorku.ca/~oz/hash.html).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def sdbm_hash(string: str) -> int:
    """
    sdbm hash (http://www.cse.yorku.ca/~oz/hash.html).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 0
    for c in string:
        hash_value = ord(c) + (hash_value << 6) + (hash_value << 16) - hash_value
        hash_value &= 0xFFFFFFFF
    return hash_value
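The same kind of spot check works for the other two hashes. The expected values below follow directly from the algorithms as listed (functions copied from the source above):

```python
def djb2_hash(string: str) -> int:
    """djb2 hash, as in analyzer/common.py."""
    hash_value = 5381
    for c in string:
        hash_value = ((hash_value << 5) + hash_value) + ord(c)
        hash_value &= 0xFFFFFFFF
    return hash_value

def sdbm_hash(string: str) -> int:
    """sdbm hash, as in analyzer/common.py."""
    hash_value = 0
    for c in string:
        hash_value = ord(c) + (hash_value << 6) + (hash_value << 16) - hash_value
        hash_value &= 0xFFFFFFFF
    return hash_value

assert djb2_hash("") == 5381              # djb2 starts from 5381
assert djb2_hash("a") == 5381 * 33 + 97   # one round: (h << 5) + h + ord('a')
assert sdbm_hash("a") == ord("a")         # first round reduces to ord(c) when h == 0
```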

analyzer.common.file_type(file_path)

Get the file type of the given file path.

Delegates to helpers.get_file_type.

Source code in analyzer/common.py
def file_type(file_path: str) -> Any:
    """Get the file type of the given file path.

    Delegates to `helpers.get_file_type`.
    """
    return helpers.get_file_type(file_path)

analyzer.common.verbose_print(text)

Print text when verbose_mode is set.

Kept as a small helper for compatibility with existing call sites.

Source code in analyzer/common.py
def verbose_print(text: str) -> None:
    """Print text when `verbose_mode` is set.

    Kept as a small helper for compatibility with existing call sites.
    """
    if verbose_mode:
        print(text)

analyzer.common.read_prs(pair_nr, source)

Load pull request data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded pull request data.

Source code in analyzer/common.py
def read_prs(pair_nr: int, source: str) -> Any:
    """
    Load pull request data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded pull request data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_prs", "prs")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.read_results(pair_nr, source)

Load results data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded results data.

Source code in analyzer/common.py
def read_results(pair_nr: int, source: str) -> Any:
    """
    Load results data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded results data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_results", "results")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.read_totals(pair_nr, source)

Load metrics/totals data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded metrics data.

Source code in analyzer/common.py
def read_totals(pair_nr: int, source: str) -> Any:
    """
    Load metrics/totals data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded metrics data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_totals", "totals")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.pickle_file(file_path, data)

Save data to a pickle file.

Parameters:

  file_path (str): The file path (without .pkl extension). Required.
  data (object): The data to pickle. Required.

Source code in analyzer/common.py
def pickle_file(file_path: str, data: object) -> None:
    """
    Save data to a pickle file.

    Args:
        file_path (str): The file path (without .pkl extension).
        data: The data to pickle.
    """
    with open(f"{file_path}.pkl", 'wb') as f:
        pickle.dump(data, f)
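Note that pickle_file appends the .pkl extension itself, so any reader must add it back. A minimal round trip, using a temporary directory and re-reading with pickle.load (function copied from the source above):

```python
import os
import pickle
import tempfile

def pickle_file(file_path: str, data: object) -> None:
    """Save data to a pickle file (as in analyzer/common.py)."""
    with open(f"{file_path}.pkl", "wb") as f:
        pickle.dump(data, f)

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "results")
    pickle_file(base, {"ngram_size": 4})
    # The reader must append the .pkl suffix that pickle_file added.
    with open(f"{base}.pkl", "rb") as f:
        assert pickle.load(f) == {"ngram_size": 4}
```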

Configuration Checklist

Before running classification:

  • Import common module
  • Set appropriate n-gram size
  • Verify setting with print statement
  • Run classification
  • Document configuration in results
  • Reset if running multiple analyses

See Also