Common Module

Overview

The Common module provides shared configuration and global settings used throughout PatchTrack. It manages state variables and utilities needed across multiple modules.

Purpose

This module:

  • Stores global configuration variables
  • Manages n-gram size settings
  • Provides common utility constants
  • Enables cross-module communication

Key Configuration Variables

N-gram Size

Controls the granularity of code comparison:

import analyzer.common as common

# Set n-gram size (lines of code per token)
common.ngram_size = 1    # Default: compare line-by-line
common.ngram_size = 2    # Compare pairs of lines
common.ngram_size = 4    # Compare 4-line blocks

Impact on Classification:

  N-gram Size   Speed    Precision   Use Case
  1             Fast     Low         Quick scans
  2-3           Medium   Medium      Standard
  4+            Slow     High        Detailed analysis

Recommended

Use n-gram size of 1-4 for most analysis. Larger values may miss partial matches.


Usage Patterns

Setting Global State

from analyzer import common, classifier

# Configure before classification
common.ngram_size = 4

# Run classification with this setting
patch_loader, source_loader = classifier.process_patch(...)

Multiple Classifications with Different Settings

from analyzer import common, main

# First run: n-gram size 2
common.ngram_size = 2
pt1 = main.PatchTrack(tokens)
pt1.classify(pr_pairs)

# Second run: n-gram size 4
common.ngram_size = 4
pt2 = main.PatchTrack(tokens)
pt2.classify(pr_pairs)

# Compare results
compare(pt1.pr_classifications, pt2.pr_classifications)

Configuration Impact

How N-gram Size Affects Results

Low N-gram Size (1-2):

  • ✅ Fast processing
  • ✅ Catches small code snippets
  • ❌ More false positives
  • ❌ May match unrelated code

High N-gram Size (4+):

  • ✅ Fewer false positives
  • ✅ More precise matches
  • ❌ Slower processing
  • ❌ Misses small changes

Example

ChatGPT Code:
    x = 5
    y = 10
    z = x + y

N-gram Size 1: ["x = 5", "y = 10", "z = x + y"]
N-gram Size 2: [("x = 5", "y = 10"), ("y = 10", "z = x + y")]
N-gram Size 3: [("x = 5", "y = 10", "z = x + y")]

A window of 4 or more lines is longer than this 3-line snippet, so it produces no n-grams at all; this is why large sizes can miss short matches.
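The windows above can be generated with a simple sliding-window helper. This is an illustrative sketch, not a function exported by analyzer.common:

```python
def line_ngrams(code: str, n: int):
    """Yield overlapping n-line windows from a code snippet."""
    lines = [ln.strip() for ln in code.strip().splitlines()]
    # A window larger than the snippet yields nothing, which is
    # why large n-gram sizes can miss short matches.
    for i in range(len(lines) - n + 1):
        yield tuple(lines[i:i + n])

snippet = "x = 5\ny = 10\nz = x + y"
print(list(line_ngrams(snippet, 2)))
# [('x = 5', 'y = 10'), ('y = 10', 'z = x + y')]
```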

Best Practices

Do

  • Set n-gram size once at initialization
  • Use consistent settings for all PRs in a batch
  • Document your choice in results metadata
  • Experiment to find optimal value for your data

Don't

  • Change n-gram size during classification
  • Use values below 1 or above 10
  • Assume one setting works for all cases
  • Forget to reset settings between runs
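The set-and-reset discipline above can be wrapped in a context manager so a setting never leaks between runs. analyzer.common does not ship such a helper, so this is a sketch; a stand-in `common` namespace is used here for illustration:

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for analyzer.common; in real use, import the module itself.
common = SimpleNamespace(ngram_size=1)

@contextmanager
def ngram_setting(size: int):
    """Temporarily set common.ngram_size, restoring the old value on exit."""
    previous = common.ngram_size
    common.ngram_size = size
    try:
        yield
    finally:
        common.ngram_size = previous

with ngram_setting(4):
    pass  # run classification here
assert common.ngram_size == 1  # restored after the block
```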

Typical Configuration Values

For Different Project Types

  Project Type     Recommended   Reason
  Small snippets   1-2           Capture brief code changes
  Regular code     2-3           Balanced accuracy/speed
  Complex logic    4-5           Need more context
  Large methods    5+            Comprehensive matching

Integration with Other Modules

The common module is used by:

  • classifier.py: Sets n-gram size via common.ngram_size
  • patchLoader.py: Respects current n-gram setting
  • sourceLoader.py: Uses n-gram configuration
  • main.py: May set configuration before processing

main.py
  └── common.ngram_size = value
       ├── classifier.py
       ├── patchLoader.py
       └── sourceLoader.py

API Reference

Common variables and functions for patch analysis.

Initial version by Jiyong Jang, 2012. Modified by Daniel Ogenrwot, 2023.

analyzer.common.FileExt

Index for file types supported by the tool.

Source code in analyzer/common.py
class FileExt:
    """Index for file types supported by the tool."""

    NonText = 0
    Text = 1
    C = 2
    Java = 3
    ShellScript = 4
    Python = 5
    Perl = 6
    PHP = 7
    Ruby = 8
    yaml = 9
    Scala = 10
    ipynb = 11
    JavaScript = 12
    JSON = 13
    Kotlin = 14
    Xml = 15
    gradle = 16
    GEMFILE = 17
    REQ_TXT = 18
    TypeScript = 19
    CPP = 20
    CSHARP = 21
    VUE = 22
    REACT = 23
    Bash = 24
    markdown = 25
    goland = 26
    html = 27
    CSS = 28
    Fsharp = 29
    REGEX = 30
    conf = 31
    svelte = 32
    TSX = 33
    SQL = 34
    SWIFT = 35
    RUST = 36
    SOLIDITY = 37
    VB = 38

analyzer.common.fnv1a_hash(string)

FNV-1a 32-bit hash (http://isthe.com/chongo/tech/comp/fnv/).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def fnv1a_hash(string):
    """
    FNV-1a 32-bit hash (http://isthe.com/chongo/tech/comp/fnv/).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 2166136261
    for c in string:
        hash_value ^= ord(c)
        hash_value *= 16777619
        hash_value &= 0xFFFFFFFF
    return hash_value
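Copying the function above, a quick sanity check against the published FNV-1a 32-bit test vectors:

```python
def fnv1a_hash(string):
    """FNV-1a 32-bit hash, as in analyzer/common.py."""
    hash_value = 2166136261
    for c in string:
        hash_value ^= ord(c)
        hash_value *= 16777619
        hash_value &= 0xFFFFFFFF
    return hash_value

# Published FNV-1a 32-bit test vectors.
assert fnv1a_hash("") == 0x811C9DC5   # offset basis for the empty string
assert fnv1a_hash("a") == 0xE40C292C  # known vector for "a"
```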

analyzer.common.djb2_hash(string)

djb2 hash (http://www.cse.yorku.ca/~oz/hash.html).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def djb2_hash(string: str) -> int:
    """
    djb2 hash (http://www.cse.yorku.ca/~oz/hash.html).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 5381
    for c in string:
        hash_value = ((hash_value << 5) + hash_value) + ord(c)
        hash_value &= 0xFFFFFFFF
    return hash_value

analyzer.common.sdbm_hash(string)

sdbm hash (http://www.cse.yorku.ca/~oz/hash.html).

Parameters:

  string (str): The string to be hashed. Required.

Returns:

  int: The hash value.

Source code in analyzer/common.py
def sdbm_hash(string: str) -> int:
    """
    sdbm hash (http://www.cse.yorku.ca/~oz/hash.html).

    Args:
        string (str): The string to be hashed.

    Returns:
        int: The hash value.
    """
    hash_value = 0
    for c in string:
        hash_value = ord(c) + (hash_value << 6) + (hash_value << 16) - hash_value
        hash_value &= 0xFFFFFFFF
    return hash_value
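The same kind of spot check works for the other two hashes. The expected values below follow directly from the algorithms as listed (functions copied from the source above):

```python
def djb2_hash(string: str) -> int:
    """djb2 hash, as in analyzer/common.py."""
    hash_value = 5381
    for c in string:
        hash_value = ((hash_value << 5) + hash_value) + ord(c)
        hash_value &= 0xFFFFFFFF
    return hash_value

def sdbm_hash(string: str) -> int:
    """sdbm hash, as in analyzer/common.py."""
    hash_value = 0
    for c in string:
        hash_value = ord(c) + (hash_value << 6) + (hash_value << 16) - hash_value
        hash_value &= 0xFFFFFFFF
    return hash_value

assert djb2_hash("") == 5381              # djb2 starts from 5381
assert djb2_hash("a") == 5381 * 33 + 97   # one round: (h << 5) + h + ord('a')
assert sdbm_hash("a") == ord("a")         # first round reduces to ord(c) when h == 0
```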

analyzer.common.file_type(file_path)

Get the file type of the given file path.

Delegates to helpers.get_file_type.

Source code in analyzer/common.py
def file_type(file_path: str) -> Any:
    """Get the file type of the given file path.

    Delegates to `helpers.get_file_type`.
    """
    return helpers.get_file_type(file_path)

analyzer.common.verbose_print(text)

Print text when verbose_mode is set.

Kept as a small helper for compatibility with existing call sites.

Source code in analyzer/common.py
def verbose_print(text: str) -> None:
    """Print text when `verbose_mode` is set.

    Kept as a small helper for compatibility with existing call sites.
    """
    if verbose_mode:
        print(text)

analyzer.common.read_prs(pair_nr, source)

Load pull request data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded pull request data.

Source code in analyzer/common.py
def read_prs(pair_nr: int, source: str) -> Any:
    """
    Load pull request data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded pull request data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_prs", "prs")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.read_results(pair_nr, source)

Load results data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded results data.

Source code in analyzer/common.py
def read_results(pair_nr: int, source: str) -> Any:
    """
    Load results data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded results data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_results", "results")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.read_totals(pair_nr, source)

Load metrics/totals data from pickle file.

Parameters:

  pair_nr (int): The pair number. Required.
  source (str): The source in 'org/repo' format. Required.

Returns:

  dict: The loaded metrics data.

Source code in analyzer/common.py
def read_totals(pair_nr: int, source: str) -> Any:
    """
    Load metrics/totals data from pickle file.

    Args:
        pair_nr (int): The pair number.
        source (str): The source in 'org/repo' format.

    Returns:
        dict: The loaded metrics data.
    """
    file_path = _repo_pickle_path(pair_nr, source, "Repos_totals", "totals")
    with open(file_path, 'rb') as f:
        return pickle.load(f)

analyzer.common.pickle_file(file_path, data)

Save data to a pickle file.

Parameters:

  file_path (str): The file path (without .pkl extension). Required.
  data (object): The data to pickle. Required.

Source code in analyzer/common.py
def pickle_file(file_path: str, data: object) -> None:
    """
    Save data to a pickle file.

    Args:
        file_path (str): The file path (without .pkl extension).
        data: The data to pickle.
    """
    with open(f"{file_path}.pkl", 'wb') as f:
        pickle.dump(data, f)
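Note that pickle_file appends the .pkl extension itself, so any reader must add it back. A minimal round trip, using a temporary directory and re-reading with pickle.load (function copied from the source above):

```python
import os
import pickle
import tempfile

def pickle_file(file_path: str, data: object) -> None:
    """Save data to a pickle file (as in analyzer/common.py)."""
    with open(f"{file_path}.pkl", "wb") as f:
        pickle.dump(data, f)

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "results")
    pickle_file(base, {"ngram_size": 4})
    # The reader must append the .pkl suffix that pickle_file added.
    with open(f"{base}.pkl", "rb") as f:
        assert pickle.load(f) == {"ngram_size": 4}
```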

Configuration Checklist

Before running classification:

  • Import common module
  • Set appropriate n-gram size
  • Verify setting with print statement
  • Run classification
  • Document configuration in results
  • Reset if running multiple analyses

See Also