
Helpers

This page documents the analyzer.helpers utilities used across PatchTrack.

The module provides small, well-scoped helpers for common tasks in the analysis pipeline: API request handling, file and path utilities, comment removal for many languages, lightweight timing instrumentation, and simple file I/O helpers.

Notes

  • The module uses a lazy _get_extension_map() helper to avoid import-time circular dependencies with analyzer.common.
  • The remove_comments alias exists for backward compatibility; remove_comment and remove_comments are interchangeable.

Examples

Remove comments from a Python source string:

from analyzer import helpers, common

code = """# comment\nprint('hello')\n"""
clean = helpers.remove_comment(code, common.FileExt.Python)
print(clean)

Get the file type for a filename and count lines in a file:

ft = helpers.get_file_type('app.py')
lines = helpers.count_loc('/path/to/file.py')

Use the timing decorator to measure function time:

from analyzer.helpers import timing

@timing
def expensive():
    # work
    pass

expensive()

API Reference

Helper utilities for PatchTrack analysis.

Provides functions for API requests, file handling, comment removal, and type detection across multiple programming languages.

analyzer.helpers.unique(items)

Get unique items from a list while preserving order.

Parameters:

  • items (List): Input list with potential duplicates. Required.

Returns:

  • List: List with duplicate entries removed.

Source code in analyzer/helpers.py
def unique(items: List) -> List:
    """Get unique items from a list while preserving order.

    Args:
        items: Input list with potential duplicates.

    Returns:
        List with duplicate entries removed.
    """
    unique_list = pd.Series(items).drop_duplicates().to_list()
    return unique_list
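The pandas-based implementation above relies on Series.drop_duplicates to keep the first occurrence of each item. For hashable items, the same order-preserving de-duplication can be sketched with the standard library alone (unique_stdlib is a hypothetical name used here for illustration):

```python
# Order-preserving de-duplication without the pandas dependency. Like
# pd.Series(...).drop_duplicates(), dict.fromkeys keeps the first occurrence
# of each item; dicts preserve insertion order in Python 3.7+.
def unique_stdlib(items):
    return list(dict.fromkeys(items))

print(unique_stdlib([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Unlike the pandas version, this sketch requires every item to be hashable.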

analyzer.helpers.api_request(url, token)

Make an authenticated API request to GitHub.

Parameters:

  • url (str): The URL endpoint to request. Required.
  • token (str): GitHub API token for authentication. Required.

Returns:

  • Any: Parsed JSON response on success, or the raw response object on a parse error.

Source code in analyzer/helpers.py
def api_request(url: str, token: str) -> Any:
    """Make an authenticated API request to GitHub.

    Args:
        url: The URL endpoint to request.
        token: GitHub API token for authentication.

    Returns:
        Parsed JSON response or response object on error.
    """
    header = {'Authorization': f'token {token}'}
    response = requests.get(url, headers=header)
    try:
        json_response = json.loads(response.content)
        return json_response
    except Exception:
        return response
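The notable behavior here is the fallback: the helper returns parsed JSON when the body is valid JSON, and the raw response otherwise. That parse-or-passthrough pattern can be exercised without a live GitHub call (parse_or_passthrough is an illustrative stand-in, operating on canned payloads rather than a requests.Response):

```python
import json

# Sketch of api_request's parse-or-fallback behavior using canned byte
# payloads instead of a network response body.
def parse_or_passthrough(content: bytes):
    try:
        return json.loads(content)   # valid JSON -> parsed dict/list
    except Exception:
        return content               # anything else -> raw payload unchanged

print(parse_or_passthrough(b'{"login": "octocat"}'))  # {'login': 'octocat'}
print(parse_or_passthrough(b'not json'))              # b'not json'
```

Callers of api_request should therefore be prepared to receive either a parsed object or a response object, e.g. by checking isinstance before indexing into the result.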

analyzer.helpers.get_response(url, token_list, ct)

Retrieve JSON response from API endpoint using token rotation.

Parameters:

  • url (str): API endpoint URL. Required.
  • token_list (List[str]): List of available GitHub API tokens. Required.
  • ct (int): Current token index counter. Required.

Returns:

  • tuple: Tuple of (json_data, updated_token_counter).

Source code in analyzer/helpers.py
def get_response(url: str, token_list: List[str], ct: int) -> tuple:
    """Retrieve JSON response from API endpoint using token rotation.

    Args:
        url: API endpoint URL.
        token_list: List of available GitHub API tokens.
        ct: Current token index counter.

    Returns:
        Tuple of (json_data, updated_token_counter).
    """
    json_data = None
    len_tokens = len(token_list)

    try:
        ct = ct % len_tokens
        headers = {'Authorization': f'Bearer {token_list[ct]}'}
        request = requests.get(url, headers=headers)
        json_data = json.loads(request.content)
        ct += 1
    except Exception as e:
        print(f"Error in get_response: {e}")

    return json_data, ct
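The rotation arithmetic is the core of this helper: the counter wraps with modulo, so successive calls cycle through the token list regardless of how large ct grows. A minimal sketch of just that arithmetic (next_token is a hypothetical name; no network involved):

```python
# Sketch of get_response's token rotation: wrap the counter with modulo,
# pick the token at that index, and hand back an incremented counter.
def next_token(token_list, ct):
    ct = ct % len(token_list)
    token = token_list[ct]
    return token, ct + 1

tokens = ['tok_a', 'tok_b', 'tok_c']
ct = 0
for _ in range(4):
    tok, ct = next_token(tokens, ct)
    print(tok)  # tok_a, tok_b, tok_c, tok_a
```

Note that on a failed request get_response returns (None, ct) without incrementing the counter, so callers should check json_data for None before using it.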

analyzer.helpers.file_name(name)

Extract the file name from a file path.

Parameters:

  • name (str): File path or name string. Required.

Returns:

  • str: Extracted file name.

Source code in analyzer/helpers.py
def file_name(name: str) -> str:
    """Extract the file name from a file path.

    Args:
        name: File path or name string.

    Returns:
        Extracted file name.
    """
    if name.startswith('.'):
        return name[1:]
    elif '/' in name:
        return name.split('/')[-1]
    else:
        return name

analyzer.helpers.file_dir(name)

Extract the directory path from a file path.

Parameters:

  • name (str): File path string. Required.

Returns:

  • str: Directory path (empty string if no directory).

Source code in analyzer/helpers.py
def file_dir(name: str) -> str:
    """Extract the directory path from a file path.

    Args:
        name: File path string.

    Returns:
        Directory path (empty string if no directory).
    """
    if name.startswith('.'):
        # dotfiles have no directory component
        return ''
    elif '/' in name:
        return '/'.join(name.split('/')[:-1])
    else:
        return ''
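For plain '/'-separated paths, the split logic above agrees with the stdlib's os.path helpers, which is a quick way to sanity-check it. The dotfile branch is the exception: names starting with '.' are handled before the '/' check.

```python
import os.path

# Check the '/'-splitting behavior documented above against the stdlib
# equivalents for an ordinary relative path.
path = 'src/analyzer/helpers.py'
print(path.split('/')[-1])             # helpers.py   (file_name's '/' branch)
print('/'.join(path.split('/')[:-1]))  # src/analyzer (file_dir's '/' branch)

# The stdlib gives the same answers for plain '/' paths:
print(os.path.basename(path))          # helpers.py
print(os.path.dirname(path))           # src/analyzer
```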

analyzer.helpers.save_file(file, storage_dir, file_name)

Save binary file to specified directory.

Parameters:

  • file (bytes): Binary file content. Required.
  • storage_dir (str): Directory path for storage. Required.
  • file_name (str): Name of file to save. Required.
Source code in analyzer/helpers.py
def save_file(file: bytes, storage_dir: str, file_name: str) -> None:
    """Save binary file to specified directory.

    Args:
        file: Binary file content.
        storage_dir: Directory path for storage.
        file_name: Name of file to save.
    """
    if not os.path.exists(storage_dir):
        os.makedirs(storage_dir)

    file_path = os.path.join(storage_dir, file_name)
    mode = 'xb' if not os.path.exists(file_path) else 'wb'

    with open(file_path, mode) as f:
        f.write(file)
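A minimal round-trip of the save pattern, using a temporary directory so it leaves nothing behind (save_bytes is a hypothetical stand-in for the helper). Note that plain 'wb' both creates and truncates, so the 'xb'/'wb' switch in the helper ends up behaving the same way in practice:

```python
import os
import tempfile

# Sketch of the save_file pattern: ensure the directory exists, then write
# the bytes. Returns the full path so the caller can read it back.
def save_bytes(data: bytes, storage_dir: str, name: str) -> str:
    os.makedirs(storage_dir, exist_ok=True)
    file_path = os.path.join(storage_dir, name)
    with open(file_path, 'wb') as f:
        f.write(data)
    return file_path

with tempfile.TemporaryDirectory() as tmp:
    path = save_bytes(b'patch contents', os.path.join(tmp, 'downloads'), 'fix.patch')
    with open(path, 'rb') as f:
        print(f.read())  # b'patch contents'
```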

analyzer.helpers.get_file_type(file_path)

Detect file type based on extension.

Parameters:

  • file_path (str): Path or name of the file. Required.

Returns:

  • int: FileExt enum value indicating the file type.

Source code in analyzer/helpers.py
def get_file_type(file_path: str) -> int:
    """Detect file type based on extension.

    Args:
        file_path: Path or name of the file.

    Returns:
        FileExt enum value indicating the file type.
    """
    name = file_name(file_path)

    if name.lower() in _SPECIAL_FILES:
        return common.FileExt.REQ_TXT

    ext = file_path.split('.')[-1].lower()
    return _get_extension_map().get(ext, common.FileExt.Text)

analyzer.helpers.remove_comment(source, file_ext)

Remove comments from source code based on file type.

Parameters:

  • source (str): Source code text. Required.
  • file_ext (int): FileExt enum value indicating the language type. Required.

Returns:

  • str: Source code with comments removed.

Source code in analyzer/helpers.py
def remove_comment(source: str, file_ext: int) -> str:
    """Remove comments from source code based on file type.

    Args:
        source: Source code text.
        file_ext: FileExt enum value indicating language type.

    Returns:
        Source code with comments removed.
    """
    # C-like languages (C, Java, Go, CSS)
    if file_ext in [common.FileExt.C, common.FileExt.Java, common.FileExt.goland, common.FileExt.CSS]:
        source = _extract_noncomments_with_newlines(source, common.C_REGEX)

    # Python and config files
    elif file_ext in [common.FileExt.Python, common.FileExt.conf]:
        source = _extract_noncomments(source, common.PY_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_1_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_2_REGEX)

    # Shell scripts
    elif file_ext == common.FileExt.ShellScript:
        source = _extract_noncomments(source, common.SHELLSCRIPT_REGEX)

    # Perl
    elif file_ext == common.FileExt.Perl:
        source = _extract_noncomments(source, common.PERL_REGEX)

    # SQL
    elif file_ext == common.FileExt.SQL:
        source = _extract_noncomments(source, common.SQL_REGEX)

    # Rust
    elif file_ext == common.FileExt.RUST:
        source = _extract_noncomments(source, common.RUST_REGEX)

    # TypeScript/TSX
    elif file_ext == common.FileExt.TSX:
        source = _extract_noncomments(source, common.TSX_REGEX)

    # Solidity
    elif file_ext == common.FileExt.SOLIDITY:
        source = _extract_noncomments(source, common.SOLIDITY_REGEX)

    # Visual Basic
    elif file_ext == common.FileExt.VB:
        source = _extract_noncomments(source, common.VB_REGEX)

    # PHP
    elif file_ext == common.FileExt.PHP:
        source = _extract_noncomments_with_newlines(source, common.PHP_REGEX)

    # Ruby
    elif file_ext in [common.FileExt.Ruby, common.FileExt.GEMFILE]:
        source = _extract_noncomments_with_newlines(source, common.RUBY_REGEX)

    # JavaScript-like languages
    elif file_ext in [common.FileExt.Scala, common.FileExt.JavaScript, common.FileExt.TypeScript,
                      common.FileExt.Kotlin, common.FileExt.gradle, common.FileExt.svelte]:
        source = _extract_noncomments(source, common.JS_REGEX)
        source = _extract_noncomments(source, common.JS_PARTIAL_COMMENT_REGEX)

    # YAML
    elif file_ext == common.FileExt.yaml:
        source = _extract_noncomments(source, common.YAML_REGEX)
        source = re.sub(common.YAML_DOUBLE_QUOTE_REGEX, "", source)
        source = re.sub(common.YAML_SINGLE_QUOTE_REGEX, "", source)

    # Jupyter Notebook
    elif file_ext == common.FileExt.ipynb:
        json_data = json.loads(source)
        python_code = ''

        for cell in json_data['cells']:
            for line in cell['source']:
                python_code += line if line.endswith('\n') else line + '\n'

        source = _extract_noncomments(python_code, common.PY_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_1_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_2_REGEX)

    # JSON
    elif file_ext == common.FileExt.JSON:
        source = common.WHITESPACE_REGEX.sub("", source)
        source = source.lower()

    # XML-like languages
    elif file_ext in [common.FileExt.Xml, common.FileExt.markdown, common.FileExt.html]:
        source = _extract_noncomments(source, common.XML_REGEX)

    return source
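Each branch above delegates to a regex defined in analyzer.common (C_REGEX, PY_REGEX, and so on). To illustrate the idea behind the C-like branch without depending on those patterns, here is a deliberately simplified stand-in (SIMPLE_C_COMMENT is not the real C_REGEX):

```python
import re

# Simplified illustration of C-style comment stripping: drop '//' line
# comments and '/* ... */' block comments, keep everything else. re.DOTALL
# lets '.' match newlines inside block comments; '*?' keeps the match
# non-greedy so it stops at the first '*/'.
SIMPLE_C_COMMENT = re.compile(r'//[^\n]*|/\*.*?\*/', re.DOTALL)

src = "int x = 1; // counter\n/* block\n   comment */int y = 2;\n"
print(SIMPLE_C_COMMENT.sub('', src))  # int x = 1; \nint y = 2;\n
```

Unlike the production patterns, this sketch does not respect string literals (a "//" inside a string would be stripped), which is exactly the kind of edge case the real regexes in analyzer.common have to handle.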

analyzer.helpers.remove_comments(source, file_ext)

Alias for remove_comment to maintain backward compatibility.

Source code in analyzer/helpers.py
def remove_comments(source: str, file_ext: int) -> str:
    """Alias for `remove_comment` to maintain backward compatibility."""
    return remove_comment(source, file_ext)

analyzer.helpers.timing(func)

Decorator to measure and print function execution time.

Parameters:

  • func: Function to decorate. Required.

Returns:

  • Decorated function with timing output.

Source code in analyzer/helpers.py
def timing(func):
    """Decorator to measure and print function execution time.

    Args:
        func: Function to decorate.

    Returns:
        Decorated function with timing output.
    """
    @wraps(func)
    def wrap(*args, **kwargs):
        start_time = time()
        result = func(*args, **kwargs)
        end_time = time()
        elapsed = end_time - start_time
        print(f'func: {func.__name__} args: [{args}, {kwargs}] took: {elapsed:.4f} sec')
        return result

    return wrap

analyzer.helpers.count_loc(file_path)

Count total lines of code in a file.

Parameters:

  • file_path (str): Path to the file. Required.

Returns:

  • int: Total number of lines in the file.

Source code in analyzer/helpers.py
def count_loc(file_path: str) -> int:
    """Count total lines of code in a file.

    Args:
        file_path: Path to the file.

    Returns:
        Total number of lines in the file.
    """
    with open(file_path, 'r') as f:
        return sum(1 for _ in f)

See Also