Getting Started with PatchTrack
Python Version
PatchTrack requires Python >= 3.10. Please verify your Python version before proceeding.
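If you prefer to check from within Python, here is a minimal sketch that compares the running interpreter against the 3.10 minimum. The function name `python_version_ok` is illustrative only and is not part of PatchTrack.

```python
import sys

# Minimum version stated in this guide.
MIN_VERSION = (3, 10)


def python_version_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if (major, minor) of `version_info` meets `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"Python {current}: OK")
    else:
        sys.exit(f"Python {current} found, but PatchTrack requires >= 3.10")
```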
Quick Start
Get PatchTrack up and running in just 3 steps:
```bash
# 1. Clone the repository
git clone https://github.com/replication-pack/PatchTrack.git
cd PatchTrack

# 2. Create and activate a virtual environment
# (on Windows: python -m venv venv, then venv\Scripts\activate)
python3 -m venv venv
source venv/bin/activate

# 3. Initialize PatchTrack (installs dependencies and extracts datasets)
python PatchTrack.py --init
```
That's it! You can now start using PatchTrack; continue with the Running PatchTrack section below.
System Requirements
Minimum Specifications
- Operating System: macOS, Linux, or Windows
- Python: >= 3.10
- RAM: >= 4 GB
- Storage: >= 1 GB
- Processor: CPU 1.18 GHz or greater
- Git: Latest version
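A minimal pre-flight sketch for two of the requirements above, Git and free disk space, using only the standard library. It does not check RAM or CPU speed, and the helper names are illustrative rather than part of PatchTrack.

```python
import shutil
import subprocess
from typing import Optional

# Storage requirement from the list above (">= 1 GB"), expressed in bytes.
MIN_FREE_BYTES = 1024 ** 3


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is not on PATH."""
    git = shutil.which("git")
    if git is None:
        return None
    result = subprocess.run([git, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


def enough_disk(path: str = ".", minimum: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `minimum` free bytes."""
    return shutil.disk_usage(path).free >= minimum


if __name__ == "__main__":
    print("git:", git_version() or "NOT FOUND")
    print("free disk >= 1 GB:", enough_disk())
```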
Installation
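Step 1: Clone the Repository
Clone the repository and move into it: `git clone https://github.com/replication-pack/PatchTrack.git`, then `cd PatchTrack`.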
Step 2: Create Python Virtual Environment
=== "macOS & Linux"

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    ```

=== "Windows"

    ```powershell
    python -m venv venv
    venv\Scripts\activate
    ```
Step 3: Install Dependencies
PatchTrack has two types of dependencies:
- OS-specific: the `libmagic` library (must be installed before the Python dependencies)
- Python packages: installed automatically in the next step
Install OS Dependencies
=== "macOS"

    ```bash
    # Using Homebrew
    brew install libmagic
    ```

=== "Ubuntu/Debian"

    ```bash
    sudo apt-get update
    sudo apt-get install libmagic1
    ```

=== "Fedora/RHEL"

    ```bash
    sudo dnf install file-libs
    ```

=== "Windows"

    Windows users can skip this step: `libmagic` is handled by the Python packages.
Alternative: Automated Script
You can also run the automated installation script, `bin/os-package.sh` (for example, `bash bin/os-package.sh` from the repository root). It detects your OS and installs the appropriate dependencies.
Step 4: Initialize PatchTrack
Run `python PatchTrack.py --init`. This command installs all Python dependencies and extracts the datasets.
Installation Complete!
Your PatchTrack environment is ready to use.
Verify Installation
Confirm everything is set up correctly by running the verification script:
You should also see a `data/` directory with the extracted datasets.
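As a rough check of the extracted datasets, the sketch below looks for `data/patches` and `data/extracted`, the default `-p`/`-s` paths listed in the command reference. It assumes these are the directories that `--init` populates, which this guide does not state explicitly; run it from the repository root. `missing_or_empty_dirs` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence

# Defaults of -p/--patch_path and -s/--source_path (assumed to be created by --init).
EXPECTED_DIRS = ("data/patches", "data/extracted")


def missing_or_empty_dirs(root: str = ".", expected: Sequence[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected directories that are missing or contain no entries."""
    problems = []
    for rel in expected:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems), "(try: python PatchTrack.py --init)")
    else:
        print("Dataset directories look populated.")
```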
Project Structure
```text
.
├── LICENSE # MIT License
├── PatchTrack.py # Main entry point
├── README.md # Project README
├── requirements.txt # Python dependencies
├── mkdocs.yml # Documentation config
│
├── analyzer/ # Core analysis modules
│ ├── __init__.py
│ ├── main.py # Main PatchTrack class
│ ├── classifier.py # Patch classification (PA/PN/NE)
│ ├── patchLoader.py # Parse PR patches (diff format)
│ ├── sourceLoader.py # Parse ChatGPT code snippets
│ ├── analysis.py # Result visualization & plotting
│ ├── aggregator.py # Aggregate PR-level decisions
│ ├── helpers.py # Utility functions (API, normalization)
│ ├── common.py # Shared settings (n-grams, file types)
│ ├── constant.py # Global constants
│ └── dataDict.py # Track PR-project pair info
│
├── dataprep/ # Data preparation
│ ├── __init__.py
│ ├── load.py # Dataset loading functions
│ ├── allPullRequestSharings.zip # Main dataset
│ ├── patches.zip # Extracted patches
│ └── manual/ # Custom dataset generation docs
│
├── notebooks/ # Jupyter experiments
│ ├── __init__.py
│ └── run_experiment.ipynb # Reproduce paper results
│
├── bin/ # Installation scripts
│ └── os-package.sh # OS-specific dependency installer
│
├── docs/ # Documentation
│ ├── index.md
│ ├── getting_started.md
│ └── reference/
│
├── output/ # Results & visualizations
├── tests/ # Unit tests (WIP)
├── RQ1_2_3_4/ # Research question results
│
└── tokens-example.txt # GitHub tokens template
```
Running PatchTrack
Option 1: Jupyter Notebook (Recommended)
The easiest way to test and reproduce the paper results:
```bash
# Make sure your virtual environment is activated
cd notebooks/
jupyter notebook run_experiment.ipynb
```
Recommended for:
- Reproducing published results
- Interactive exploration
- Learning how PatchTrack works
Option 2: Command Line
Use PatchTrack with customizable command-line arguments:
Command Reference
| Command | Description | Default |
|---|---|---|
| `-h, --help` | Show help message | - |
| `-i, --init` | Set up datasets and directories (run once) | - |
| `-n, --ngram NUM` | N-gram size in lines | 1 |
| `-c, --context NUM` | Context lines for output | 10 |
| `-v, --verbose` | Enable verbose logging | False |
| `-p, --patch_path STR` | Path to ChatGPT/PR patches | `data/patches` |
| `-s, --source_path STR` | Path to extracted conversations | `data/extracted` |
| `-r, --restore` | Restore default settings and directories | - |
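The `-n/--ngram` option sets the n-gram size in lines. As an illustration only (this is not PatchTrack's own matching code, and the whitespace-stripping normalization is an assumption), the sketch below builds line n-grams from two code fragments and counts the ones they share. A larger `n` requires longer runs of identical lines before anything matches.

```python
# Illustration only: line-level n-grams with simple whitespace stripping.
# Not PatchTrack's implementation.


def line_ngrams(code: str, n: int = 1) -> set[tuple[str, ...]]:
    """Return every run of n consecutive non-blank lines, each line stripped."""
    if n < 1:
        raise ValueError("n must be >= 1")
    lines = [line.strip() for line in code.splitlines() if line.strip()]
    return {tuple(lines[i:i + n]) for i in range(len(lines) - n + 1)}


PATCH = """
x = load()
y = x + 1
print(y)
"""

SNIPPET = """
x = load()
y = x + 2
print(y)
"""

if __name__ == "__main__":
    for n in (1, 2):
        shared = line_ngrams(PATCH, n) & line_ngrams(SNIPPET, n)
        print(f"n={n}: {len(shared)} shared n-gram(s)")
    # n=1: 2 shared n-gram(s)
    # n=2: 0 shared n-gram(s)
```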
Example Usage
```bash
# Run with custom n-gram size and verbose output
python PatchTrack.py -n 4 -v

# Use custom patch directory
python PatchTrack.py -p /path/to/patches

# Restore defaults
python PatchTrack.py -r
```
For detailed help, run `python PatchTrack.py --help`.
GitHub Tokens Configuration
Why GitHub Tokens?
The GitHub API is rate-limited. Authenticating with tokens raises your limit from 60 to 5,000 requests per hour, which is essential for processing many PRs.
Setup Instructions
- Create tokens at GitHub Settings → Tokens (classic)
- Select these scopes: `public_repo`, `read:user`
- Save your tokens in a secure location
- Configure PatchTrack with your tokens:
    - Copy `tokens-example.txt` to `tokens.txt`
    - Add your tokens to `tokens.txt`, separated by commas (for example, `<token_1>,<token_2>`)
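A minimal sketch of reading `tokens.txt`, assuming the comma-separated format described above; treating newlines as extra separators is an added assumption. `load_tokens` is a hypothetical helper, not PatchTrack's own loader, and it deliberately never prints the tokens themselves.

```python
from pathlib import Path
from typing import List


def load_tokens(path: str = "tokens.txt") -> List[str]:
    """Read comma-separated tokens, ignoring surrounding whitespace and empty entries."""
    text = Path(path).read_text(encoding="utf-8")
    # Newlines are treated like commas so a trailing newline does not create a bad token.
    return [tok.strip() for tok in text.replace("\n", ",").split(",") if tok.strip()]


if __name__ == "__main__":
    try:
        tokens = load_tokens()
    except FileNotFoundError:
        print("tokens.txt not found; copy tokens-example.txt to tokens.txt first")
    else:
        print(f"Loaded {len(tokens)} token(s)")
        if len(tokens) < 2:
            print("Warning: at least 2 tokens are recommended")
```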
Security Note
- Never commit `tokens.txt` to version control
- Use multiple tokens (at least 2 recommended) to avoid rate limit issues
- Rotate tokens regularly
- Keep tokens private and secure
Rate Limiting
With rotating tokens, PatchTrack can process:
- ~500 PRs per token without hitting rate limits
- More PRs overall with multiple tokens, which provide redundancy and higher throughput
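This guide does not show how PatchTrack rotates tokens internally. One possible approach, as a minimal sketch under stated assumptions: cycle through the tokens round-robin and skip any token whose most recent `X-RateLimit-Remaining` response header reported 0. `TokenRotator` and its methods are hypothetical names.

```python
from itertools import cycle
from typing import Dict, Iterable, Optional


class TokenRotator:
    """Hypothetical helper: round-robin over GitHub tokens, skipping any token
    whose last observed remaining quota was 0."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = list(tokens)
        if not self._tokens:
            raise ValueError("at least one token is required")
        # None means "not observed yet", so the token is assumed usable.
        self._remaining: Dict[str, Optional[int]] = {t: None for t in self._tokens}
        self._order = cycle(self._tokens)

    def update(self, token: str, remaining: int) -> None:
        """Record the X-RateLimit-Remaining header value seen for `token`."""
        self._remaining[token] = remaining

    def next_token(self) -> str:
        """Return the next token not known to be exhausted."""
        for _ in range(len(self._tokens)):
            token = next(self._order)
            if self._remaining[token] != 0:
                return token
        raise RuntimeError("all tokens exhausted; wait for the rate limit to reset")
```

After each API response, record the header, e.g. `rotator.update(token, int(response.headers["X-RateLimit-Remaining"]))`. Round-robin spreads requests evenly across tokens; note that an exhausted token stays skipped until you record a fresh value for it after its reset time.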
Troubleshooting
libmagic Not Found
Error: ImportError: libmagic not found
Solution: Install libmagic using the OS-specific method above, or run the automated script `bin/os-package.sh`.
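To check whether libmagic is visible to the system loader, here is a minimal sketch using the standard library's `ctypes.util.find_library`. A `None` result strongly suggests libmagic is missing from the standard search paths, but it is not conclusive: Homebrew's `/opt/homebrew/lib` on Apple Silicon, or a copy bundled by a Windows Python package, may not be found this way. `find_libmagic` is an illustrative name.

```python
import ctypes.util
from typing import Optional


def find_libmagic() -> Optional[str]:
    """Return the libmagic library name/path as ctypes can locate it, else None."""
    return ctypes.util.find_library("magic")


if __name__ == "__main__":
    found = find_libmagic()
    print(f"libmagic: {found}" if found else "libmagic: not found by ctypes")
```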
ModuleNotFoundError: No module named 'X'
Error: Missing Python dependencies
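Solutions:
1. Ensure the virtual environment is activated: `source venv/bin/activate` (Windows: `venv\Scripts\activate`)
2. Reinstall dependencies: `pip install -r requirements.txt`
3. Run initialization again: `python PatchTrack.py --init`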
Permission Denied (macOS/Linux)
Error: PermissionError: [Errno 13]
Solution: Ensure you have read/write permissions to the directory and activate your virtual environment.
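To narrow down a `PermissionError`, here is a minimal sketch that checks whether the current user can read, write and enter a few project directories. The directory list is an assumption based on the project structure shown earlier; run it from the repository root. `can_read_write` is an illustrative name.

```python
import os


def can_read_write(path: str = ".") -> bool:
    """True if the current user can read, write and enter (traverse) `path`."""
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Directories taken from the project structure; run from the repository root.
    for name in (".", "data", "output"):
        if not os.path.exists(name):
            print(f"{name}: does not exist yet")
        elif can_read_write(name):
            print(f"{name}: OK")
        else:
            print(f"{name}: insufficient permissions")
```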
Python Version Mismatch
Error: SyntaxError or version-related issues
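Solution: Verify your Python version with `python --version`.
If it is older than 3.10, use `python3.10` or `python3.11` instead of `python` (for example, recreate the virtual environment with `python3.10 -m venv venv`).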
GitHub API Rate Limit
Error: HTTP 403: API rate limit exceeded
Solutions:
1. Add more tokens to `tokens.txt` and restart PatchTrack
2. Wait for the rate limit to reset (typically within an hour)
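To see how long you have to wait, you can query GitHub's `GET /rate_limit` REST endpoint, which reports each bucket's `limit`, `remaining`, and `reset` (a UTC epoch timestamp) and does not itself count against the limit. A minimal sketch using only the standard library; the `GITHUB_TOKEN` environment variable and the helper names are assumptions for illustration.

```python
import json
import os
import time
import urllib.request
from typing import Optional


def core_limit(payload: dict) -> dict:
    """Extract the REST 'core' bucket from a /rate_limit response body."""
    return payload["resources"]["core"]


def seconds_until_reset(core: dict, now: Optional[float] = None) -> int:
    """Seconds until the bucket resets; 'reset' is a UTC epoch timestamp."""
    now = time.time() if now is None else now
    return max(0, int(core["reset"] - now))


def fetch_rate_limit(token: str) -> dict:
    """Query https://api.github.com/rate_limit with the given token."""
    req = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    token = os.environ.get("GITHUB_TOKEN")  # hypothetical variable name
    if not token:
        print("Set GITHUB_TOKEN to one of the tokens from tokens.txt")
    else:
        core = core_limit(fetch_rate_limit(token))
        print(f"{core['remaining']}/{core['limit']} requests left; "
              f"resets in {seconds_until_reset(core)} s")
```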
Jupyter Notebook Issues
Error: No module named 'jupyter'
Solution: Install Jupyter in the activated virtual environment with `pip install jupyter`.
Then restart the notebook kernel.
Next Steps
- Explore Results: Check `notebooks/run_experiment.ipynb` for data analysis
- View Output: Results are saved in the `output/` directory
- Research Questions: See `RQ1_2_3_4/` for detailed findings
- Customize: Modify command-line arguments to analyze different datasets
Need Help?
- 📖 See the Reference Documentation
- 🐛 Report issues on GitHub
- 💬 Check the README for more details