Parallel SCOAP Computation for Reconvergent Fanout Regions

Overview

The enhanced SCOAP module now supports parallel computation of testability metrics for reconvergent fanout regions while maintaining sequential computation for the rest of the circuit.

Architecture

Key Concepts

Reconvergent Fanout: When a signal fans out to multiple paths that later reconverge at a common gate
Reconvergent Cone: All gates reachable from a fanout point until reconvergence
Parallel Regions: Independent reconvergent cones that can be computed simultaneously
Sequential Regions: Non-reconvergent portions of the circuit

Implementation Strategy

┌─────────────────────────────────────────────────────────────┐
│                    PARALLEL SCOAP FLOW                       │
└─────────────────────────────────────────────────────────────┘

1. Parse Circuit & Identify Gates
   ↓
2. Initialize Primary Inputs (CC0=1, CC1=1)
   ↓
3. Classify Gates:
   ├─ Sequential Gates (non-reconvergent)
   └─ Reconvergent Cones (parallel regions)
   ↓
4. Compute Sequential Gates First
   │  (Provides prerequisites for cones)
   ↓
5. ┌─────────────────────────────────────┐
   │  PARALLEL COMPUTATION PHASE          │
   │                                      │
   │  Thread 1: Compute Cone A            │
   │  Thread 2: Compute Cone B            │
   │  Thread 3: Compute Cone C            │
   │  ...                                 │
   │                                      │
   │  [Synchronization Barrier]           │
   └─────────────────────────────────────┘
   ↓
6. Merge Results from All Cones
   ↓
7. Final Convergence Pass
   (Resolve cross-cone dependencies)
   ↓
8. Compute Observability (Sequential)
   ↓
9. Output Results

Example: Reconvergent DAG

Circuit Topology

Primary Inputs: a, b, c, d
Primary Outputs: z

         a ────┐
               ├─→ AND(n1) ────┐
         b ────┤                │
               │                ├─→ XOR(n5) ───→ BUF(z)
               └─→ AND(n2) ────┤
                  ↑             │
         c ───────┘             │
                                │
         d ─────────────────────┘

Reconvergent Fanout:
- Signal 'b' fans out to gates n1 and n2
- These paths reconverge at n5
- Therefore, 'b' is a reconvergent root

ASCII Diagram with Reconvergence Highlighted

                    [RECONVERGENT REGION rooted at 'b']
                    ╔═══════════════════════════════════╗
    a ──────────┐   ║                                   ║
                AND ║n1 ────────┐                       ║
    b ──────────┤   ║           │                       ║
       ║        │   ║           XOR n5 ─── BUF ─── z   ║
       ║        └───║─→ AND ────┘                       ║
       ║            ║    n2  ↑                          ║
       ║            ║        │                          ║
       ║            ╚════════│══════════════════════════╝
       ║                     │
       ╚═══════════════════> c
       
    d ──────────────────────┘

Legend:
  ║═══║  Reconvergent fanout path from 'b'
  ────   Normal signal path
  
Sequential Gates: none (all are in reconvergent cone)
Parallel Gates: n1, n2, n5, z (all computed in parallel)

SCOAP Computation Flow

Without Parallel Mode (Sequential)

Iteration 1:
  CC0_a = 1, CC1_a = 1  (primary input)
  CC0_b = 1, CC1_b = 1  (primary input)
  CC0_c = 1, CC1_c = 1  (primary input)
  CC0_d = 1, CC1_d = 1  (primary input)
  
  Compute n1: CC0_n1 = 1 + (1+1) = 3, CC1_n1 = 1 + min(1,1) = 2
  Compute n2: CC0_n2 = 1 + (1+1) = 3, CC1_n2 = 1 + min(1,1) = 2
  
Iteration 2:
  Compute n5: CC0_n5 = 1 + min(3+2, 2+3) = 6, CC1_n5 = 1 + min(3+3, 2+2) = 5
  
Iteration 3:
  Compute z: CC0_z = 1 + 6 = 7, CC1_z = 1 + 5 = 6
  
Total: 3 iterations, all gates in single thread

With Parallel Mode (Parallel)

Phase 1: Sequential Prerequisites
  ✓ Inputs initialized: a=1, b=1, c=1, d=1
  ✓ No sequential gates outside cone
  
Phase 2: Parallel Cone Computation
  [Thread 1] Computing cone rooted at 'b':
    └─ Gates: {n1, n2, n5, z}
    └─ Iteration 1: n1, n2 computed
    └─ Iteration 2: n5 computed  
    └─ Iteration 3: z computed
    └─ Result: CC0/CC1 for n1, n2, n5, z
    
  [Main Thread] Waiting for all cones...
  [Main Thread] ✓ Cone 'b' completed
  
Phase 3: Merge & Final Pass
  ✓ All results merged into global CC0/CC1
  ✓ Final convergence check (no changes)
  
Total: Parallel execution, 1 worker thread

API Usage

Basic Usage

from opentestability.core.scoap import run

# Sequential mode (default)
run('circuit.txt', 'output.txt')

# Parallel mode with reconvergent roots
run('circuit.txt', 'output.txt',
    reconvergent_roots=['b', 'n10', 'n23'],
    parallel=True,
    max_workers=4)

Finding Reconvergent Roots

from opentestability.core.dag_builder import create_dag_from_netlist
from opentestability.core.simple_reconvergence import SimpleReconvergenceDetector
import json

# Step 1: Create DAG
dag_path = create_dag_from_netlist('circuit.json')

# Step 2: Detect reconvergence
with open(dag_path) as f:
    dag_data = json.load(f)

detector = SimpleReconvergenceDetector(dag_data)
results = detector.run_complete_algorithm()

# Step 3: Extract fanout points
reconvergent_roots = list(detector.fanout_points)

# Step 4: Run parallel SCOAP
from opentestability.core.scoap import run
run('circuit.txt', 'output.txt',
    reconvergent_roots=reconvergent_roots,
    parallel=True,
    max_workers=len(reconvergent_roots))

Complete Workflow

from opentestability.parsers.verilog_parser import parse
from opentestability.parsers.json_converter import txt_to_json
from opentestability.core.dag_builder import create_dag_from_netlist
from opentestability.core.simple_reconvergence import SimpleReconvergenceDetector
from opentestability.core.scoap import run
import json

# 1. Parse Verilog
parse('circuit.v', 'circuit.txt')
txt_to_json('circuit.txt')

# 2. Create DAG
dag_path = create_dag_from_netlist('circuit.json')

# 3. Detect reconvergence
with open(dag_path) as f:
    dag_data = json.load(f)
detector = SimpleReconvergenceDetector(dag_data)
fanout_points = list(detector.fanout_points)

# 4. Run parallel SCOAP
run('circuit.txt', 'scoap_parallel.txt',
    reconvergent_roots=fanout_points,
    parallel=True,
    max_workers=4,
    json_flag=True)

print("Parallel SCOAP computation complete!")

Implementation Details

Thread Safety

Shared Dictionaries: CC0 and CC1 are shared across threads (read-only during parallel phase)
Local Copies: Each cone creates local copies for computation
Synchronization: as_completed() barrier ensures all cones finish before merging
Result Merging: Thread-safe update of global dictionaries after completion

Dependency Management

def _ensure_cone_prerequisites_ready(cone_gates, cones, ...):
    """
    Ensures all inputs to reconvergent cones have their 
    controllability computed BEFORE parallel execution starts.
    
    Critical for correctness - prevents race conditions.
    """

Cone Identification

def identify_reconvergent_cones(gates, reconvergent_roots):
    """
    Forward traversal from each root to identify all gates
    in the cone (all reachable nodes).
    """

Performance Characteristics

When to Use Parallel Mode

Good Use Cases:

Large circuits (>1000 gates)
Multiple independent reconvergent regions (>3 cones)
Complex reconvergent structures
Multi-core hardware available

Poor Use Cases:

Small circuits (<500 gates)
Single reconvergent cone
Simple topologies
Limited CPU cores

Overhead Analysis

Parallel Overhead:
- Thread pool creation: ~1-5ms
- Cone identification: O(V + E) per cone
- Synchronization: ~1ms per cone
- Final convergence: O(iterations × gates)

Speedup = T_sequential / (T_prereq + T_parallel + T_merge)

Theoretical maximum: N cores → N× speedup
Practical: 1.5-3× for typical circuits with 4 cores

Testing

Run the example:

cd OpenTestability
python examples/parallel_scoap_example.py

Expected output:

[⚡] Parallel mode: 1 reconvergent roots, 2 workers
[→] Computing 0 sequential gates...
[→] Checking cone prerequisites...
[⚡] Launching parallel computation for 1 cones...
[✓] Cone 'b' completed (3 iterations)
[→] Final convergence pass...
[✓] SCOAP results written to: data/results/example_parallel.txt

Limitations

Observability: Still computed sequentially (depends on all controllability values)
Cross-Cone Dependencies: Rare cases may require additional convergence iterations
Thread Count: Optimal workers = number of independent cones (not more)
Memory: Each cone creates local copies of CC0/CC1 (increased memory usage)

Future Enhancements

Parallel observability computation
Dynamic work stealing for unbalanced cones
Profiling and auto-tuning of worker count
GPU acceleration for very large cones
Distributed computation across machines

References

SCOAP Algorithm: Goldstein (1979), “SCOAP: Sandia Controllability/Observability Analysis Program”
Reconvergent Fanout: Xu & Edirisuriya (2004)
Python Threading: concurrent.futures documentation