
Helpers

This page documents the analyzer.helpers utilities used across PatchTrack.

The module provides small, well-scoped helpers for common tasks in the analysis pipeline: API request handling, file and path utilities, comment removal for many languages, lightweight timing instrumentation, and simple file I/O helpers.

Notes

  • The module uses a lazy _get_extension_map() helper to avoid import-time circular dependencies with analyzer.common.
  • The remove_comments alias exists for backward compatibility; remove_comment and remove_comments are interchangeable.

Examples

Remove comments from a Python source string:

from analyzer import helpers, common

code = """# comment\nprint('hello')\n"""
clean = helpers.remove_comment(code, common.FileExt.Python)
print(clean)

Get the file type for a filename and count lines in a file:

ft = helpers.get_file_type('app.py')
lines = helpers.count_loc('/path/to/file.py')

Use the timing decorator to measure function time:

from analyzer.helpers import timing

@timing
def expensive():
    # work
    pass

expensive()

API Reference

Helper utilities for PatchTrack analysis.

Provides functions for API requests, file handling, comment removal, and type detection across multiple programming languages.

analyzer.helpers.unique(items)

Get unique items from a list while preserving order.

Parameters:

  • items (List): Input list with potential duplicates. Required.

Returns:

  • List: List with duplicate entries removed.

Source code in analyzer/helpers.py
def unique(items: List) -> List:
    """Get unique items from a list while preserving order.

    Args:
        items: Input list with potential duplicates.

    Returns:
        List with duplicate entries removed.
    """
    unique_list = pd.Series(items).drop_duplicates().to_list()
    return unique_list
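The pandas-based implementation above relies on Series.drop_duplicates to keep the first occurrence of each item. For hashable items, the same order-preserving de-duplication can be sketched with the standard library alone (unique_stdlib is a hypothetical name used here for illustration):

```python
# Order-preserving de-duplication without the pandas dependency. Like
# pd.Series(...).drop_duplicates(), dict.fromkeys keeps the first occurrence
# of each item; dicts preserve insertion order in Python 3.7+.
def unique_stdlib(items):
    return list(dict.fromkeys(items))

print(unique_stdlib([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Unlike the pandas version, this sketch requires every item to be hashable.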

analyzer.helpers.api_request(url, token)

Make an authenticated API request to GitHub.

Parameters:

  • url (str): The URL endpoint to request. Required.
  • token (str): GitHub API token for authentication. Required.

Returns:

  • Any: Parsed JSON response on success, or the raw response object on a parse error.

Source code in analyzer/helpers.py
def api_request(url: str, token: str) -> Any:
    """Make an authenticated API request to GitHub.

    Args:
        url: The URL endpoint to request.
        token: GitHub API token for authentication.

    Returns:
        Parsed JSON response or response object on error.
    """
    header = {'Authorization': f'token {token}'}
    response = requests.get(url, headers=header)
    try:
        json_response = json.loads(response.content)
        return json_response
    except Exception:
        return response
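The notable behavior here is the fallback: the helper returns parsed JSON when the body is valid JSON, and the raw response otherwise. That parse-or-passthrough pattern can be exercised without a live GitHub call (parse_or_passthrough is an illustrative stand-in, operating on canned payloads rather than a requests.Response):

```python
import json

# Sketch of api_request's parse-or-fallback behavior using canned byte
# payloads instead of a network response body.
def parse_or_passthrough(content: bytes):
    try:
        return json.loads(content)   # valid JSON -> parsed dict/list
    except Exception:
        return content               # anything else -> raw payload unchanged

print(parse_or_passthrough(b'{"login": "octocat"}'))  # {'login': 'octocat'}
print(parse_or_passthrough(b'not json'))              # b'not json'
```

Callers of api_request should therefore be prepared to receive either a parsed object or a response object, e.g. by checking isinstance before indexing into the result.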

analyzer.helpers.get_response(url, token_list, ct)

Retrieve JSON response from API endpoint using token rotation.

Parameters:

  • url (str): API endpoint URL. Required.
  • token_list (List[str]): List of available GitHub API tokens. Required.
  • ct (int): Current token index counter. Required.

Returns:

  • tuple: Tuple of (json_data, updated_token_counter).

Source code in analyzer/helpers.py
def get_response(url: str, token_list: List[str], ct: int) -> tuple:
    """Retrieve JSON response from API endpoint using token rotation.

    Args:
        url: API endpoint URL.
        token_list: List of available GitHub API tokens.
        ct: Current token index counter.

    Returns:
        Tuple of (json_data, updated_token_counter).
    """
    json_data = None
    len_tokens = len(token_list)

    try:
        ct = ct % len_tokens
        headers = {'Authorization': f'Bearer {token_list[ct]}'}
        request = requests.get(url, headers=headers)
        json_data = json.loads(request.content)
        ct += 1
    except Exception as e:
        print(f"Error in get_response: {e}")

    return json_data, ct
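The rotation arithmetic is the core of this helper: the counter wraps with modulo, so successive calls cycle through the token list regardless of how large ct grows. A minimal sketch of just that arithmetic (next_token is a hypothetical name; no network involved):

```python
# Sketch of get_response's token rotation: wrap the counter with modulo,
# pick the token at that index, and hand back an incremented counter.
def next_token(token_list, ct):
    ct = ct % len(token_list)
    token = token_list[ct]
    return token, ct + 1

tokens = ['tok_a', 'tok_b', 'tok_c']
ct = 0
for _ in range(4):
    tok, ct = next_token(tokens, ct)
    print(tok)  # tok_a, tok_b, tok_c, tok_a
```

Note that on a failed request get_response returns (None, ct) without incrementing the counter, so callers should check json_data for None before using it.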

analyzer.helpers.file_name(name)

Extract the file name from a file path.

Parameters:

  • name (str): File path or name string. Required.

Returns:

  • str: Extracted file name.

Source code in analyzer/helpers.py
def file_name(name: str) -> str:
    """Extract the file name from a file path.

    Args:
        name: File path or name string.

    Returns:
        Extracted file name.
    """
    if name.startswith('.'):
        return name[1:]
    elif '/' in name:
        return name.split('/')[-1]
    else:
        return name

analyzer.helpers.file_dir(name)

Extract the directory path from a file path.

Parameters:

  • name (str): File path string. Required.

Returns:

  • str: Directory path (empty string if no directory).

Source code in analyzer/helpers.py
def file_dir(name: str) -> str:
    """Extract the directory path from a file path.

    Args:
        name: File path string.

    Returns:
        Directory path (empty string if no directory).
    """
    if name.startswith('.'):
        # dotfiles have no directory component
        return ''
    elif '/' in name:
        return '/'.join(name.split('/')[:-1])
    else:
        return ''
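For plain '/'-separated paths, the split logic above agrees with the stdlib's os.path helpers, which is a quick way to sanity-check it. The dotfile branch is the exception: names starting with '.' are handled before the '/' check.

```python
import os.path

# Check the '/'-splitting behavior documented above against the stdlib
# equivalents for an ordinary relative path.
path = 'src/analyzer/helpers.py'
print(path.split('/')[-1])             # helpers.py   (file_name's '/' branch)
print('/'.join(path.split('/')[:-1]))  # src/analyzer (file_dir's '/' branch)

# The stdlib gives the same answers for plain '/' paths:
print(os.path.basename(path))          # helpers.py
print(os.path.dirname(path))           # src/analyzer
```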

analyzer.helpers.save_file(file, storage_dir, file_name)

Save binary file to specified directory.

Parameters:

  • file (bytes): Binary file content. Required.
  • storage_dir (str): Directory path for storage. Required.
  • file_name (str): Name of file to save. Required.
Source code in analyzer/helpers.py
def save_file(file: bytes, storage_dir: str, file_name: str) -> None:
    """Save binary file to specified directory.

    Args:
        file: Binary file content.
        storage_dir: Directory path for storage.
        file_name: Name of file to save.
    """
    if not os.path.exists(storage_dir):
        os.makedirs(storage_dir)

    file_path = os.path.join(storage_dir, file_name)
    mode = 'xb' if not os.path.exists(file_path) else 'wb'

    with open(file_path, mode) as f:
        f.write(file)
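A minimal round-trip of the save pattern, using a temporary directory so it leaves nothing behind (save_bytes is a hypothetical stand-in for the helper). Note that plain 'wb' both creates and truncates, so the 'xb'/'wb' switch in the helper ends up behaving the same way in practice:

```python
import os
import tempfile

# Sketch of the save_file pattern: ensure the directory exists, then write
# the bytes. Returns the full path so the caller can read it back.
def save_bytes(data: bytes, storage_dir: str, name: str) -> str:
    os.makedirs(storage_dir, exist_ok=True)
    file_path = os.path.join(storage_dir, name)
    with open(file_path, 'wb') as f:
        f.write(data)
    return file_path

with tempfile.TemporaryDirectory() as tmp:
    path = save_bytes(b'patch contents', os.path.join(tmp, 'downloads'), 'fix.patch')
    with open(path, 'rb') as f:
        print(f.read())  # b'patch contents'
```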

analyzer.helpers.get_file_type(file_path)

Detect file type based on extension.

Parameters:

  • file_path (str): Path or name of the file. Required.

Returns:

  • int: FileExt enum value indicating the file type.

Source code in analyzer/helpers.py
def get_file_type(file_path: str) -> int:
    """Detect file type based on extension.

    Args:
        file_path: Path or name of the file.

    Returns:
        FileExt enum value indicating the file type.
    """
    name = file_name(file_path)

    if name.lower() in _SPECIAL_FILES:
        return common.FileExt.REQ_TXT

    ext = file_path.split('.')[-1].lower()
    return _get_extension_map().get(ext, common.FileExt.Text)

analyzer.helpers.remove_comment(source, file_ext)

Remove comments from source code based on file type.

Parameters:

  • source (str): Source code text. Required.
  • file_ext (int): FileExt enum value indicating the language type. Required.

Returns:

  • str: Source code with comments removed.

Source code in analyzer/helpers.py
def remove_comment(source: str, file_ext: int) -> str:
    """Remove comments from source code based on file type.

    Args:
        source: Source code text.
        file_ext: FileExt enum value indicating language type.

    Returns:
        Source code with comments removed.
    """
    # C-like languages (C, Java, Go, CSS)
    if file_ext in [common.FileExt.C, common.FileExt.Java, common.FileExt.goland, common.FileExt.CSS]:
        source = _extract_noncomments_with_newlines(source, common.C_REGEX)

    # Python and config files
    elif file_ext in [common.FileExt.Python, common.FileExt.conf]:
        source = _extract_noncomments(source, common.PY_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_1_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_2_REGEX)

    # Shell scripts
    elif file_ext == common.FileExt.ShellScript:
        source = _extract_noncomments(source, common.SHELLSCRIPT_REGEX)

    # Perl
    elif file_ext == common.FileExt.Perl:
        source = _extract_noncomments(source, common.PERL_REGEX)

    # SQL
    elif file_ext == common.FileExt.SQL:
        source = _extract_noncomments(source, common.SQL_REGEX)

    # Rust
    elif file_ext == common.FileExt.RUST:
        source = _extract_noncomments(source, common.RUST_REGEX)

    # TypeScript/TSX
    elif file_ext == common.FileExt.TSX:
        source = _extract_noncomments(source, common.TSX_REGEX)

    # Solidity
    elif file_ext == common.FileExt.SOLIDITY:
        source = _extract_noncomments(source, common.SOLIDITY_REGEX)

    # Visual Basic
    elif file_ext == common.FileExt.VB:
        source = _extract_noncomments(source, common.VB_REGEX)

    # PHP
    elif file_ext == common.FileExt.PHP:
        source = _extract_noncomments_with_newlines(source, common.PHP_REGEX)

    # Ruby
    elif file_ext in [common.FileExt.Ruby, common.FileExt.GEMFILE]:
        source = _extract_noncomments_with_newlines(source, common.RUBY_REGEX)

    # JavaScript-like languages
    elif file_ext in [common.FileExt.Scala, common.FileExt.JavaScript, common.FileExt.TypeScript,
                      common.FileExt.Kotlin, common.FileExt.gradle, common.FileExt.svelte]:
        source = _extract_noncomments(source, common.JS_REGEX)
        source = _extract_noncomments(source, common.JS_PARTIAL_COMMENT_REGEX)

    # YAML
    elif file_ext == common.FileExt.yaml:
        source = _extract_noncomments(source, common.YAML_REGEX)
        source = re.sub(common.YAML_DOUBLE_QUOTE_REGEX, "", source)
        source = re.sub(common.YAML_SINGLE_QUOTE_REGEX, "", source)

    # Jupyter Notebook
    elif file_ext == common.FileExt.ipynb:
        json_data = json.loads(source)
        python_code = ''

        for cell in json_data['cells']:
            for line in cell['source']:
                python_code += line if line.endswith('\n') else line + '\n'

        source = _extract_noncomments(python_code, common.PY_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_1_REGEX)
        source = _extract_noncomments(source, common.PY_MULTILINE_2_REGEX)

    # JSON
    elif file_ext == common.FileExt.JSON:
        source = common.WHITESPACE_REGEX.sub("", source)
        source = source.lower()

    # XML-like languages
    elif file_ext in [common.FileExt.Xml, common.FileExt.markdown, common.FileExt.html]:
        source = _extract_noncomments(source, common.XML_REGEX)

    return source
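Each branch above delegates to a regex defined in analyzer.common (C_REGEX, PY_REGEX, and so on). To illustrate the idea behind the C-like branch without depending on those patterns, here is a deliberately simplified stand-in (SIMPLE_C_COMMENT is not the real C_REGEX):

```python
import re

# Simplified illustration of C-style comment stripping: drop '//' line
# comments and '/* ... */' block comments, keep everything else. re.DOTALL
# lets '.' match newlines inside block comments; '*?' keeps the match
# non-greedy so it stops at the first '*/'.
SIMPLE_C_COMMENT = re.compile(r'//[^\n]*|/\*.*?\*/', re.DOTALL)

src = "int x = 1; // counter\n/* block\n   comment */int y = 2;\n"
print(SIMPLE_C_COMMENT.sub('', src))  # int x = 1; \nint y = 2;\n
```

Unlike the production patterns, this sketch does not respect string literals (a "//" inside a string would be stripped), which is exactly the kind of edge case the real regexes in analyzer.common have to handle.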

analyzer.helpers.remove_comments(source, file_ext)

Alias for remove_comment to maintain backward compatibility.

Source code in analyzer/helpers.py
def remove_comments(source: str, file_ext: int) -> str:
    """Alias for `remove_comment` to maintain backward compatibility."""
    return remove_comment(source, file_ext)

analyzer.helpers.timing(func)

Decorator to measure and print function execution time.

Parameters:

  • func: Function to decorate. Required.

Returns:

  • Decorated function with timing output.

Source code in analyzer/helpers.py
def timing(func):
    """Decorator to measure and print function execution time.

    Args:
        func: Function to decorate.

    Returns:
        Decorated function with timing output.
    """
    @wraps(func)
    def wrap(*args, **kwargs):
        start_time = time()
        result = func(*args, **kwargs)
        end_time = time()
        elapsed = end_time - start_time
        print(f'func: {func.__name__} args: [{args}, {kwargs}] took: {elapsed:.4f} sec')
        return result

    return wrap

analyzer.helpers.count_loc(file_path)

Count total lines of code in a file.

Parameters:

  • file_path (str): Path to the file. Required.

Returns:

  • int: Total number of lines in the file.

Source code in analyzer/helpers.py
def count_loc(file_path: str) -> int:
    """Count total lines of code in a file.

    Args:
        file_path: Path to the file.

    Returns:
        Total number of lines in the file.
    """
    with open(file_path, 'r') as f:
        return sum(1 for _ in f)

See Also