Common Module
Overview
The Common module provides shared configuration and global settings used throughout PatchTrack. It manages state variables and utilities needed across multiple modules.
Purpose
This module:
- Stores global configuration variables
- Manages n-gram size settings
- Provides common utility constants
- Enables cross-module communication
Key Configuration Variables
N-gram Size
Controls the granularity of code comparison:
import analyzer.common as common
# Set n-gram size (lines of code per token)
common.ngram_size = 1 # Default: compare line-by-line
common.ngram_size = 2 # Compare pairs of lines
common.ngram_size = 4 # Compare 4-line blocks
Impact on Classification:
| N-gram Size | Speed | Precision | Use Case |
|---|---|---|---|
| 1 | Fast | Low | Quick scans |
| 2-3 | Medium | Medium | Standard |
| 4+ | Slow | High | Detailed analysis |
Recommended
Use an n-gram size of 1-4 for most analyses. Larger values require longer runs of identical lines, so they may miss partial matches.
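To make the effect of the setting concrete, here is a minimal sketch of line-based n-grams. It assumes the reading given in the comment above ("lines of code per token") and uses sliding windows over non-blank lines. The helper `line_ngrams` is hypothetical and is not part of `analyzer.common`; PatchTrack's own tokenization may normalize lines differently.

```python
def line_ngrams(code: str, n: int) -> list[tuple[str, ...]]:
    """Return every window of n consecutive non-blank lines (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Drop blank lines and surrounding whitespace before windowing.
    lines = [line.strip() for line in code.splitlines() if line.strip()]
    # A snippet shorter than n lines yields no windows at all.
    return [tuple(lines[i:i + n]) for i in range(len(lines) - n + 1)]


snippet = "x = 5\ny = 10\nz = x + y\n"
for n in (1, 2, 4):
    print(n, line_ngrams(snippet, n))
```

With a three-line snippet, size 4 produces no n-grams at all, which is why larger values can miss short changes.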
Usage Patterns
Setting Global State
from analyzer import common, classifier
# Configure before classification
common.ngram_size = 4
# Run classification with this setting
patch_loader, source_loader = classifier.process_patch(...)
Multiple Classifications with Different Settings
from analyzer import common, main
# First run: n-gram size 2
common.ngram_size = 2
pt1 = main.PatchTrack(tokens)
pt1.classify(pr_pairs)
# Second run: n-gram size 4
common.ngram_size = 4
pt2 = main.PatchTrack(tokens)
pt2.classify(pr_pairs)
# Compare results (compare() is not defined in this snippet; substitute your own comparison logic)
compare(pt1.pr_classifications, pt2.pr_classifications)
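The `compare` call above is left undefined, so the following is only one possible shape for it. It assumes `pr_classifications` behaves like a dictionary mapping a PR identifier to a label, which the source does not confirm. The labels used here are placeholders, not PatchTrack's actual classes.

```python
def compare(first: dict, second: dict) -> dict:
    """Return {key: (label_in_first, label_in_second)} for keys whose labels differ.

    Hypothetical helper: assumes both arguments map PR identifiers to labels.
    """
    keys = first.keys() | second.keys()
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(keys, key=str)
        if first.get(key) != second.get(key)
    }


# Placeholder data standing in for two runs with different n-gram sizes.
run_ngram_2 = {101: "label_a", 102: "label_b"}
run_ngram_4 = {101: "label_a", 102: "label_a", 103: "label_b"}
differences = compare(run_ngram_2, run_ngram_4)
print(differences)
```

A key missing from one run shows up with `None` on that side, which makes gaps between the two runs visible.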
Configuration Impact
How N-gram Size Affects Results
Low N-gram Size (1-2):
- ✅ Fast processing
- ✅ Catches small code snippets
- ❌ More false positives
- ❌ May match unrelated code
High N-gram Size (4+):
- ✅ Fewer false positives
- ✅ More precise matches
- ❌ Slower processing
- ❌ Misses small changes
Example
ChatGPT Code:
x = 5
y = 10
z = x + y
N-gram Size 1: ["x = 5", "y = 10", "z = x + y"]
N-gram Size 2: [("x = 5", "y = 10"), ("y = 10", "z = x + y")]
N-gram Size 4: [] (the snippet has only 3 lines, so no 4-line window fits)
Best Practices
✅ Do
- Set n-gram size once at initialization
- Use consistent settings for all PRs in a batch
- Document your choice in results metadata
- Experiment to find the optimal value for your data
❌ Don't
- Change n-gram size during classification
- Use values below 1 or above 10
- Assume one setting works for all cases
- Forget to reset settings between runs
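One way to follow the "set once" and "reset between runs" advice is to scope the setting with a context manager. This is a sketch, not part of PatchTrack: `temporary_ngram_size` is a hypothetical helper, and a `SimpleNamespace` stands in for `analyzer.common` so the example runs on its own. The 1-10 bound mirrors the range given in the list above.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for `analyzer.common`; in real use, pass the imported module instead.
common = SimpleNamespace(ngram_size=1)


@contextmanager
def temporary_ngram_size(module, size: int):
    """Set module.ngram_size for the duration of a block, then restore it."""
    if not 1 <= size <= 10:
        raise ValueError("ngram_size should be between 1 and 10")
    previous = module.ngram_size
    module.ngram_size = size
    try:
        yield module
    finally:
        # Restore the old value even if classification raises.
        module.ngram_size = previous


with temporary_ngram_size(common, 4):
    inside = common.ngram_size  # classification would run here
after = common.ngram_size
print(inside, after)
```

Because the restore happens in `finally`, a failed run cannot leak its setting into the next one.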
Typical Configuration Values
For Different Project Types
| Project Type | Recommended | Reason |
|---|---|---|
| Small snippets | 1-2 | Capture brief code changes |
| Regular code | 2-3 | Balanced accuracy/speed |
| Complex logic | 4-5 | Need more context |
| Large methods | 5+ | Comprehensive matching |
Integration with Other Modules
The common module is used by:
- classifier.py: Sets n-gram size via common.ngram_size
- patchLoader.py: Respects current n-gram setting
- sourceLoader.py: Uses n-gram configuration
- main.py: May set configuration before processing
API Reference
Common variables and functions for patch analysis.
Initial version by Jiyong Jang, 2012.
Modified by Daniel Ogenrwot, 2023.
analyzer.common.FileExt
Index for file types supported by the tool.
Source code in analyzer/common.py
analyzer.common.fnv1a_hash(string)
FNV-1a 32-bit hash (http://isthe.com/chongo/tech/comp/fnv/).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The string to be hashed. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| int | | The hash value. |
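For reference, the published 32-bit FNV-1a algorithm can be sketched as follows. This shows the standard algorithm, not necessarily the exact code in analyzer/common.py. The function name `fnv1a_32` and the choice to hash the UTF-8 bytes of the string are assumptions.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of text (illustrative sketch)."""
    value = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        value ^= byte                                # FNV-1a: XOR first...
        value = (value * FNV_PRIME_32) & 0xFFFFFFFF  # ...then multiply, kept to 32 bits
    return value


print(hex(fnv1a_32("a")))
```

The XOR-then-multiply order is what distinguishes FNV-1a from the older FNV-1 variant, which multiplies first.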
analyzer.common.djb2_hash(string)
djb2 hash (http://www.cse.yorku.ca/~oz/hash.html).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The string to be hashed. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The hash value. |
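The classic djb2 recurrence (start at 5381, multiply by 33, add the next byte) can be sketched like this. The name `djb2_32`, the UTF-8 encoding, and the 32-bit mask are assumptions; the module's own function may handle width and encoding differently.

```python
def djb2_32(text: str) -> int:
    """djb2 hash over the UTF-8 bytes of text, truncated to 32 bits (sketch)."""
    value = 5381
    for byte in text.encode("utf-8"):
        # hash * 33 + byte, masked so the result stays within 32 bits
        value = (value * 33 + byte) & 0xFFFFFFFF
    return value


print(djb2_32("a"))
```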
analyzer.common.sdbm_hash(string)
sdbm hash (http://www.cse.yorku.ca/~oz/hash.html).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The string to be hashed. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The hash value. |
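The sdbm recurrence from the page linked above can be sketched in the same style. As before, the name `sdbm_32`, the UTF-8 encoding, and the 32-bit mask are assumptions, not details confirmed by the source.

```python
def sdbm_32(text: str) -> int:
    """sdbm hash over the UTF-8 bytes of text, truncated to 32 bits (sketch)."""
    value = 0
    for byte in text.encode("utf-8"):
        # Equivalent to value * 65599 + byte, written with shifts as in the original.
        value = (byte + (value << 6) + (value << 16) - value) & 0xFFFFFFFF
    return value


print(sdbm_32("ab"))
```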
analyzer.common.file_type(file_path)
analyzer.common.verbose_print(text)
Print text when verbose_mode is set.
Kept as a small helper for compatibility with existing call sites.
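A helper of this kind is typically a thin wrapper around a module-level flag. The sketch below assumes `verbose_mode` is a plain boolean global, as the description suggests; the default value of `False` is an assumption.

```python
import io
from contextlib import redirect_stdout

verbose_mode = False  # assumed module-level flag; the default is a guess


def verbose_print(text: str) -> None:
    """Print text only when verbose_mode is set."""
    if verbose_mode:
        print(text)


# Capture stdout to show which calls actually print.
buffer = io.StringIO()
with redirect_stdout(buffer):
    verbose_print("hidden")  # flag is off: nothing is printed
    verbose_mode = True
    verbose_print("shown")   # flag is on: the text is printed
captured = buffer.getvalue()
print(repr(captured))
```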
analyzer.common.read_prs(pair_nr, source)
Load pull request data from pickle file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair_nr | int | The pair number. | required |
| source | str | The source in 'org/repo' format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | Any | The loaded pull request data. |
analyzer.common.read_results(pair_nr, source)
Load results data from pickle file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair_nr | int | The pair number. | required |
| source | str | The source in 'org/repo' format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | Any | The loaded results data. |
analyzer.common.read_totals(pair_nr, source)
Load metrics/totals data from pickle file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pair_nr | int | The pair number. | required |
| source | str | The source in 'org/repo' format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | Any | The loaded metrics data. |
analyzer.common.pickle_file(file_path, data)
Save data to a pickle file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | The file path (without .pkl extension). | required |
| data | object | The data to pickle. | required |
Configuration Checklist
Before running classification:
- Import common module
- Set appropriate n-gram size
- Verify setting with print statement
- Run classification
- Document configuration in results
- Reset if running multiple analyses
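The checklist above can be folded into a small wrapper that sets the value, verifies it, runs the work, records the configuration with the results, and resets afterwards. This is a sketch under assumptions: `run_configured` is a hypothetical helper, a `SimpleNamespace` stands in for `analyzer.common`, and the `classify` callable and its return value are placeholders for a real classification call.

```python
import json
from types import SimpleNamespace

# Stand-in for `analyzer.common`; in real use, pass the imported module instead.
common = SimpleNamespace(ngram_size=1)


def run_configured(module, size: int, classify) -> dict:
    """Run classify() with module.ngram_size set to size, then restore it."""
    previous = module.ngram_size
    module.ngram_size = size
    print(f"ngram_size = {module.ngram_size}")  # verify the setting
    try:
        results = classify()
    finally:
        module.ngram_size = previous  # reset for the next analysis
    # Store the configuration next to the results it produced.
    return {"config": {"ngram_size": size}, "results": results}


report = run_configured(common, 4, lambda: {"pr-1": "example-label"})
print(json.dumps(report, indent=2))
```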
See Also
- Classifier - Uses n-gram configuration
- Patch Loader - Tokenizes based on n-grams
- Source Loader - Processes code with n-grams
- Main - May initialize configuration