AI-Powered Testing: Generate Unit Tests, Find Bugs, and Automate QA with LLMs

Use AI to write unit tests, generate test cases from requirements, and find bugs before users do. Complete tutorial with Python examples for pytest, coverage analysis, and CI integration.

By EgoistAI

Testing is the part of software development that everyone agrees is important and nobody wants to do. Writing tests is tedious. Maintaining tests is worse. And the coverage gap — the code paths you didn’t test because you didn’t think of them — is where production bugs hide.

AI can help with all three problems. It can generate unit tests from source code, create test cases from requirements documents, identify untested edge cases, and even find bugs by reasoning about code logic. It won’t replace a QA engineer, but it will make every developer’s testing faster and more thorough.

Here’s how to integrate AI into your testing workflow.


What AI Testing Can (and Can’t) Do

Can do:

  • Generate unit tests from function signatures and docstrings
  • Identify edge cases humans miss (null inputs, boundary values, race conditions)
  • Create test data fixtures
  • Convert requirements into test cases
  • Explain what existing tests do (and don’t) cover
  • Suggest improvements to existing test suites

Can’t do (reliably):

  • Replace integration testing (AI doesn’t know your infrastructure)
  • Guarantee 100% correctness (generated tests may have bugs too)
  • Understand business logic without context
  • Replace human judgment on test priorities

Setup

pip install pytest pytest-cov anthropic python-dotenv
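
The code in this tutorial reads the API key from the `ANTHROPIC_API_KEY` environment variable, optionally loaded from a `.env` file by python-dotenv. A minimal sketch of a fail-fast check follows; `require_api_key` is a hypothetical helper for illustration, not part of any library.

```python
import os
from collections.abc import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "ANTHROPIC_API_KEY",
) -> str:
    """Return the API key from env, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it or add it to a .env file."
        )
    return key
```

Calling this once at startup, after `load_dotenv()`, surfaces a missing key immediately instead of as an authentication error on the first request.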

Step 1: Generate Unit Tests from Code

# test_generator.py
"""Generate unit tests from source code using Claude."""

import os
import json
import anthropic
from dotenv import load_dotenv

load_dotenv()


class TestGenerator:
    def __init__(self):
        self.client = anthropic.Anthropic(
            api_key=os.getenv('ANTHROPIC_API_KEY')
        )
    
    def generate_tests(
        self, 
        source_code: str, 
        test_framework: str = "pytest",
        coverage_targets: list[str] | None = None
    ) -> str:
        """
        Generate unit tests for given source code.
        
        Args:
            source_code: The Python code to test
            test_framework: Testing framework (pytest, unittest)
            coverage_targets: Specific functions/methods to focus on
        
        Returns:
            Generated test code as a string
        """
        targets = ""
        if coverage_targets:
            targets = f"\nFocus on testing these functions: {', '.join(coverage_targets)}"
        
        prompt = f"""Generate comprehensive {test_framework} unit tests for this code:

```python
{source_code}
```
{targets}

Requirements:

  1. Test all public functions and methods
  2. Include tests for:
    • Normal/happy path cases
    • Edge cases (empty inputs, None, zero, negative numbers)
    • Boundary values
    • Error handling (expected exceptions)
    • Type validation
  3. Use descriptive test names that explain what's being tested
  4. Include docstrings for each test explaining intent
  5. Use {test_framework} fixtures where appropriate
  6. Add parametrize decorators for similar test cases
  7. Mock external dependencies if present

Return ONLY the test code. No explanation. Include all necessary imports."""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=(
                "You are a senior test engineer. Generate thorough, "
                "production-quality unit tests. Every test should have "
                "a clear purpose and test exactly one behavior."
            ),
            messages=[{"role": "user", "content": prompt}]
        )

        test_code = response.content[0].text

        # Extract code from markdown if the model wrapped it in a fence
        if "```python" in test_code:
            test_code = test_code.split("```python")[1].split("```")[0]
        elif "```" in test_code:
            test_code = test_code.split("```")[1].split("```")[0]

        return test_code.strip()

    def find_missing_coverage(
        self,
        source_code: str,
        existing_tests: str
    ) -> str:
        """Identify untested code paths and generate additional tests."""

        prompt = f"""Analyze this source code and its existing tests.

Identify code paths that are NOT covered by the existing tests.
Generate additional tests for the uncovered paths.

Source code:

{source_code}

Existing tests:

{existing_tests}

Return:

  1. A comment block listing the uncovered code paths
  2. Additional pytest tests covering those paths

Return ONLY Python code (comments + tests). No prose."""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system="You are a test coverage analyst. Find and fill testing gaps.",
            messages=[{"role": "user", "content": prompt}]
        )

        result = response.content[0].text
        # Extract code from markdown if the model wrapped it in a fence
        if "```python" in result:
            result = result.split("```python")[1].split("```")[0]

        return result.strip()
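
Both methods above strip markdown fences from the model's reply with slightly different string splits. One way to keep that logic in a single place is a small regex-based helper. This is a sketch: `extract_code` is a hypothetical name, and it assumes the reply contains at most one fenced block that matters (the first one).

```python
import re

# Built from single backticks so this snippet never contains a literal fence.
FENCE = "`" * 3

# First fenced block, with or without a language tag such as "python".
_FENCE_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response_text: str) -> str:
    """Return the body of the first fenced block, or the whole text if unfenced."""
    match = _FENCE_RE.search(response_text)
    if match:
        return match.group(1).strip()
    return response_text.strip()
```

With this in place, `generate_tests` and `find_missing_coverage` could both end with `return extract_code(response.content[0].text)`.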

Step 2: Example: Generating Tests

Given this source code:

```python
# calculator.py
"""A calculator module with various operations."""

from typing import Union

Number = Union[int, float]


class Calculator:
    def __init__(self, precision: int = 2):
        self.precision = precision
        self.history: list[str] = []
    
    def add(self, a: Number, b: Number) -> Number:
        """Add two numbers."""
        result = round(a + b, self.precision)
        self.history.append(f"{a} + {b} = {result}")
        return result
    
    def divide(self, a: Number, b: Number) -> float:
        """Divide a by b. Raises ZeroDivisionError if b is 0."""
        if b == 0:
            raise ZeroDivisionError("Cannot divide by zero")
        result = round(a / b, self.precision)
        self.history.append(f"{a} / {b} = {result}")
        return result
    
    def percentage(self, value: Number, total: Number) -> float:
        """Calculate what percentage value is of total."""
        if total == 0:
            raise ValueError("Total cannot be zero")
        result = round((value / total) * 100, self.precision)
        return result
    
    def get_history(self) -> list[str]:
        """Return operation history."""
        return self.history.copy()
    
    def clear_history(self) -> None:
        """Clear operation history."""
        self.history.clear()
```

The AI generates tests along these lines (the exact output varies from run to run):

# test_calculator.py (AI-generated)
import pytest
from calculator import Calculator


@pytest.fixture
def calc():
    """Create a Calculator instance for testing."""
    return Calculator(precision=2)


class TestAdd:
    """Tests for Calculator.add method."""
    
    def test_add_positive_integers(self, calc):
        """Adding two positive integers returns their sum."""
        assert calc.add(2, 3) == 5
    
    def test_add_negative_numbers(self, calc):
        """Adding negative numbers works correctly."""
        assert calc.add(-2, -3) == -5
    
    def test_add_mixed_sign(self, calc):
        """Adding positive and negative numbers works."""
        assert calc.add(5, -3) == 2
    
    def test_add_floats(self, calc):
        """Adding floats respects precision setting."""
        assert calc.add(1.111, 2.222) == 3.33
    
    def test_add_zero(self, calc):
        """Adding zero returns the other number."""
        assert calc.add(5, 0) == 5
        assert calc.add(0, 5) == 5
    
    @pytest.mark.parametrize("a,b,expected", [
        (0, 0, 0),
        (1e10, 1e10, 2e10),
        (-1e10, 1e10, 0),
        (0.1, 0.2, 0.3),
    ])
    def test_add_parametrized(self, calc, a, b, expected):
        """Test add with various input combinations."""
        assert calc.add(a, b) == expected
    
    def test_add_records_history(self, calc):
        """Adding records the operation in history."""
        calc.add(2, 3)
        assert len(calc.get_history()) == 1
        assert "2 + 3 = 5" in calc.get_history()[0]


class TestDivide:
    """Tests for Calculator.divide method."""
    
    def test_divide_normal(self, calc):
        """Normal division returns correct result."""
        assert calc.divide(10, 2) == 5.0
    
    def test_divide_by_zero_raises(self, calc):
        """Dividing by zero raises ZeroDivisionError."""
        with pytest.raises(ZeroDivisionError, match="Cannot divide by zero"):
            calc.divide(10, 0)
    
    def test_divide_zero_numerator(self, calc):
        """Dividing zero by any non-zero number returns 0."""
        assert calc.divide(0, 5) == 0.0
    
    def test_divide_float_result(self, calc):
        """Division with non-integer result respects precision."""
        assert calc.divide(1, 3) == 0.33
    
    def test_divide_negative(self, calc):
        """Division with negative numbers has correct sign."""
        assert calc.divide(-10, 2) == -5.0
        assert calc.divide(10, -2) == -5.0
        assert calc.divide(-10, -2) == 5.0


class TestPercentage:
    """Tests for Calculator.percentage method."""
    
    def test_percentage_normal(self, calc):
        """Normal percentage calculation."""
        assert calc.percentage(25, 100) == 25.0
    
    def test_percentage_zero_total_raises(self, calc):
        """Zero total raises ValueError."""
        with pytest.raises(ValueError, match="Total cannot be zero"):
            calc.percentage(10, 0)
    
    def test_percentage_over_100(self, calc):
        """Value greater than total gives percentage over 100."""
        assert calc.percentage(150, 100) == 150.0
    
    def test_percentage_zero_value(self, calc):
        """Zero value gives 0 percentage."""
        assert calc.percentage(0, 100) == 0.0


class TestHistory:
    """Tests for history tracking."""
    
    def test_empty_history(self, calc):
        """New calculator has empty history."""
        assert calc.get_history() == []
    
    def test_history_records_operations(self, calc):
        """Multiple operations are recorded in order."""
        calc.add(1, 2)
        calc.divide(10, 5)
        history = calc.get_history()
        assert len(history) == 2
    
    def test_clear_history(self, calc):
        """Clear history empties the list."""
        calc.add(1, 2)
        calc.clear_history()
        assert calc.get_history() == []
    
    def test_history_returns_copy(self, calc):
        """get_history returns a copy, not a reference."""
        calc.add(1, 2)
        history = calc.get_history()
        history.clear()
        assert len(calc.get_history()) == 1


class TestPrecision:
    """Tests for precision configuration."""
    
    def test_custom_precision(self):
        """Custom precision affects rounding."""
        calc = Calculator(precision=4)
        assert calc.add(1.11111, 2.22222) == 3.3333
    
    def test_zero_precision(self):
        """Zero precision rounds to the nearest whole number."""
        calc = Calculator(precision=0)
        assert calc.add(1.4, 2.3) == 4
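
Before running generated tests, a cheap sanity check is to confirm that the code parses and actually contains test functions. The sketch below uses only the standard-library `ast` module; `summarize_generated_tests` is a hypothetical helper, and it does not replace reading the tests.

```python
import ast


def summarize_generated_tests(test_code: str) -> list[str]:
    """Parse generated test code and return the names of its test functions.

    Raises SyntaxError if the code does not parse.
    """
    tree = ast.parse(test_code)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]


sample = '''
import pytest

class TestAdd:
    def test_add_zero(self, calc):
        assert calc.add(5, 0) == 5

def helper():
    return 1

def test_divide_by_zero(calc):
    with pytest.raises(ZeroDivisionError):
        calc.divide(10, 0)
'''

print(sorted(summarize_generated_tests(sample)))
# ['test_add_zero', 'test_divide_by_zero']
```

An empty list or a `SyntaxError` is a signal to regenerate rather than to write the file to disk.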

Step 3: Run and Measure Coverage

# Run tests with coverage
pytest test_calculator.py -v --cov=calculator --cov-report=term-missing

# Output:
# Name            Stmts   Miss  Cover   Missing
# -----------------------------------------------
# calculator.py      25      0   100%
# -----------------------------------------------
# TOTAL              25      0   100%
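
When coverage is below 100%, the uncovered lines are the natural input for `find_missing_coverage`. Running `coverage json` after the test run writes a machine-readable report (`coverage.json` by default). The sketch below pulls the uncovered line numbers out of that report; `missing_lines_by_file` and `load_report` are hypothetical helpers, and the sample dict is made up and includes only the fields used here.

```python
import json
from pathlib import Path


def missing_lines_by_file(report: dict) -> dict[str, list[int]]:
    """Map each file in a coverage.py JSON report to its uncovered line numbers."""
    return {
        path: data["missing_lines"]
        for path, data in report.get("files", {}).items()
        if data.get("missing_lines")
    }


def load_report(path: str = "coverage.json") -> dict:
    """Load the report written by `coverage json`."""
    return json.loads(Path(path).read_text())


# Trimmed-down sample in the shape coverage.py emits (values are invented).
sample_report = {
    "files": {
        "calculator.py": {"missing_lines": [27, 28], "executed_lines": [1, 3]},
        "utils.py": {"missing_lines": [], "executed_lines": [1]},
    }
}

print(missing_lines_by_file(sample_report))
# {'calculator.py': [27, 28]}
```

The resulting line numbers can be appended to the prompt so the model targets specific gaps instead of guessing at them.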

Step 4: Integrate into CI/CD

# .github/workflows/ai-tests.yml
name: AI-Assisted Testing

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      
      - name: Install dependencies
        run: pip install pytest pytest-cov

      - name: Run tests
        run: pytest --cov --cov-report=xml -v
      
      - name: Check coverage threshold
        run: |
          coverage report --fail-under=80

Step 5: Bug Finding with AI

# Add this method to the TestGenerator class from Step 1.
def find_bugs(self, source_code: str) -> list[dict]:
    """Analyze code for potential bugs."""

    prompt = f"""Analyze this Python code for potential bugs,
edge cases, and logic errors.

```python
{source_code}
```

For each issue found, return JSON:
[
  {{
    "severity": "critical|high|medium|low",
    "line": "approximate line number or function name",
    "issue": "description of the problem",
    "example": "input that triggers the bug",
    "fix": "suggested fix"
  }}
]

Look for:

  • Off-by-one errors
  • Null/None handling gaps
  • Type coercion issues
  • Race conditions
  • Resource leaks
  • Integer overflow
  • Incorrect error handling
  • Missing input validation

Return ONLY the JSON array."""

    response = self.client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a senior code reviewer specializing in bug detection.",
        messages=[{"role": "user", "content": prompt}]
    )

    result_text = response.content[0].text
    # Strip a markdown fence (optionally tagged "json") if present
    if "```" in result_text:
        result_text = result_text.split("```")[1]
        if result_text.startswith("json"):
            result_text = result_text[4:]

    try:
        return json.loads(result_text.strip())
    except json.JSONDecodeError:
        return [{"error": "Failed to parse bug analysis"}]
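
`find_bugs` returns whatever JSON the model produced, or a single error entry, so the result is worth validating before anyone acts on it. The sketch below drops malformed entries and sorts the rest by severity, assuming the five keys and four severity levels requested in the prompt above. `triage_findings` is a hypothetical helper.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
REQUIRED_KEYS = {"severity", "line", "issue", "example", "fix"}


def triage_findings(findings: list) -> list[dict]:
    """Keep well-formed findings and sort them from most to least severe."""
    valid = [
        f for f in findings
        if isinstance(f, dict)
        and REQUIRED_KEYS.issubset(f)
        and isinstance(f["severity"], str)
        and f["severity"] in SEVERITY_ORDER
    ]
    return sorted(valid, key=lambda f: SEVERITY_ORDER[f["severity"]])
```

Entries that fail validation, including the parse-error placeholder, are silently dropped here; logging them instead may be preferable in a real pipeline.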

Best Practices

  1. Always review generated tests. AI-generated tests can contain errors: wrong assertions, incorrect test logic, or tests that pass for the wrong reasons.

  2. Use AI for the first draft, humans for the final version. Let AI draft the bulk of your tests, then manually add the business-logic-specific cases that require domain knowledge.

  3. Run coverage analysis after AI test generation. Use the coverage report to identify gaps, then ask the AI to fill them specifically.

  4. Don't trust AI for security testing. AI can find basic input validation issues but often misses complex security vulnerabilities. Use dedicated security tools (Bandit, Safety, SAST tools) for that.

  5. Version control your test generation prompts. As you refine your prompts to produce better tests, track those changes just like code.
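
One lightweight way to connect generated tests to the prompt that produced them is to stamp each generated file with a fingerprint of the prompt template. This is a sketch under assumptions: the template text, the helper names, and the header comment format are all hypothetical.

```python
import hashlib

# Hypothetical template; in practice this would live in a tracked file.
PROMPT_TEMPLATE = (
    "Generate comprehensive {framework} unit tests for this code:\n{source}"
)


def prompt_version(template: str) -> str:
    """Short, stable fingerprint of a prompt template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def stamp_generated_tests(test_code: str, template: str) -> str:
    """Prefix generated test code with a comment naming the prompt version."""
    return f"# generated-with-prompt: {prompt_version(template)}\n{test_code}"
```

When a prompt change makes the generated tests better or worse, the stamp shows which files came from which version.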

The goal isn't AI-generated tests that replace human testing. It's AI-generated tests that give you a solid baseline, so human testing time is spent on the complex, creative test scenarios that actually catch the bugs that matter.

Write the boring tests with AI. Save your brain for the interesting ones.
