Overview
The enhanced SCOAP module now supports parallel computation of testability metrics for reconvergent fanout regions while maintaining sequential computation for the rest of the circuit.
Architecture
Key Concepts
- Reconvergent Fanout: When a signal fans out to multiple paths that later reconverge at a common gate
- Reconvergent Cone: All gates reachable from a fanout point until reconvergence
- Parallel Regions: Independent reconvergent cones that can be computed simultaneously
- Sequential Regions: Non-reconvergent portions of the circuit
Implementation Strategy
┌─────────────────────────────────────────────────────────────┐
│ PARALLEL SCOAP FLOW │
└─────────────────────────────────────────────────────────────┘
1. Parse Circuit & Identify Gates
↓
2. Initialize Primary Inputs (CC0=1, CC1=1)
↓
3. Classify Gates:
├─ Sequential Gates (non-reconvergent)
└─ Reconvergent Cones (parallel regions)
↓
4. Compute Sequential Gates First
│ (Provides prerequisites for cones)
↓
5. ┌─────────────────────────────────────┐
│ PARALLEL COMPUTATION PHASE │
│ │
│ Thread 1: Compute Cone A │
│ Thread 2: Compute Cone B │
│ Thread 3: Compute Cone C │
│ ... │
│ │
│ [Synchronization Barrier] │
└─────────────────────────────────────┘
↓
6. Merge Results from All Cones
↓
7. Final Convergence Pass
(Resolve cross-cone dependencies)
↓
8. Compute Observability (Sequential)
↓
9. Output Results
Example: Reconvergent DAG
Circuit Topology
Primary Inputs: a, b, c, d
Primary Outputs: z
a ────┐
├─→ AND(n1) ────┐
b ────┤ │
│ ├─→ XOR(n5) ───→ BUF(z)
└─→ AND(n2) ────┤
↑ │
c ───────┘ │
│
d ─────────────────────┘
Reconvergent Fanout:
- Signal 'b' fans out to gates n1 and n2
- These paths reconverge at n5
- Therefore, 'b' is a reconvergent root
ASCII Diagram with Reconvergence Highlighted
[RECONVERGENT REGION rooted at 'b']
╔═══════════════════════════════════╗
a ──────────┐ ║ ║
AND ║n1 ────────┐ ║
b ──────────┤ ║ │ ║
║ │ ║ XOR n5 ─── BUF ─── z ║
║ └───║─→ AND ────┘ ║
║ ║ n2 ↑ ║
║ ║ │ ║
║ ╚════════│══════════════════════════╝
║ │
╚═══════════════════> c
d ──────────────────────┘
Legend:
║═══║ Reconvergent fanout path from 'b'
──── Normal signal path
Sequential Gates: none (all are in reconvergent cone)
Parallel Gates: n1, n2, n5, z (all computed in parallel)
SCOAP Computation Flow
Without Parallel Mode (Sequential)
Iteration 1:
CC0_a = 1, CC1_a = 1 (primary input)
CC0_b = 1, CC1_b = 1 (primary input)
CC0_c = 1, CC1_c = 1 (primary input)
CC0_d = 1, CC1_d = 1 (primary input)
Compute n1: CC0_n1 = 1 + (1+1) = 3, CC1_n1 = 1 + min(1,1) = 2
Compute n2: CC0_n2 = 1 + (1+1) = 3, CC1_n2 = 1 + min(1,1) = 2
Iteration 2:
Compute n5: CC0_n5 = 1 + min(3+2, 2+3) = 6, CC1_n5 = 1 + min(3+3, 2+2) = 5
Iteration 3:
Compute z: CC0_z = 1 + 6 = 7, CC1_z = 1 + 5 = 6
Total: 3 iterations, all gates in single thread
With Parallel Mode (Parallel)
Phase 1: Sequential Prerequisites
✓ Inputs initialized: a=1, b=1, c=1, d=1
✓ No sequential gates outside cone
Phase 2: Parallel Cone Computation
[Thread 1] Computing cone rooted at 'b':
└─ Gates: {n1, n2, n5, z}
└─ Iteration 1: n1, n2 computed
└─ Iteration 2: n5 computed
└─ Iteration 3: z computed
└─ Result: CC0/CC1 for n1, n2, n5, z
[Main Thread] Waiting for all cones...
[Main Thread] ✓ Cone 'b' completed
Phase 3: Merge & Final Pass
✓ All results merged into global CC0/CC1
✓ Final convergence check (no changes)
Total: Parallel execution, 1 worker thread
API Usage
Basic Usage
from opentestability.core.scoap import run
# Sequential mode (default)
run('circuit.txt', 'output.txt')
# Parallel mode with reconvergent roots
run('circuit.txt', 'output.txt',
reconvergent_roots=['b', 'n10', 'n23'],
parallel=True,
max_workers=4)
Finding Reconvergent Roots
from opentestability.core.dag_builder import create_dag_from_netlist
from opentestability.core.simple_reconvergence import SimpleReconvergenceDetector
import json
# Step 1: Create DAG
dag_path = create_dag_from_netlist('circuit.json')
# Step 2: Detect reconvergence
with open(dag_path) as f:
dag_data = json.load(f)
detector = SimpleReconvergenceDetector(dag_data)
results = detector.run_complete_algorithm()
# Step 3: Extract fanout points
reconvergent_roots = list(detector.fanout_points)
# Step 4: Run parallel SCOAP
from opentestability.core.scoap import run
run('circuit.txt', 'output.txt',
reconvergent_roots=reconvergent_roots,
parallel=True,
max_workers=len(reconvergent_roots))
Complete Workflow
from opentestability.parsers.verilog_parser import parse
from opentestability.parsers.json_converter import txt_to_json
from opentestability.core.dag_builder import create_dag_from_netlist
from opentestability.core.simple_reconvergence import SimpleReconvergenceDetector
from opentestability.core.scoap import run
import json
# 1. Parse Verilog
parse('circuit.v', 'circuit.txt')
txt_to_json('circuit.txt')
# 2. Create DAG
dag_path = create_dag_from_netlist('circuit.json')
# 3. Detect reconvergence
with open(dag_path) as f:
dag_data = json.load(f)
detector = SimpleReconvergenceDetector(dag_data)
fanout_points = list(detector.fanout_points)
# 4. Run parallel SCOAP
run('circuit.txt', 'scoap_parallel.txt',
reconvergent_roots=fanout_points,
parallel=True,
max_workers=4,
json_flag=True)
print("Parallel SCOAP computation complete!")
Implementation Details
Thread Safety
- Shared Dictionaries: CC0 and CC1 are shared across threads (read-only during parallel phase)
- Local Copies: Each cone creates local copies for computation
- Synchronization:
as_completed()barrier ensures all cones finish before merging - Result Merging: Thread-safe update of global dictionaries after completion
Dependency Management
def _ensure_cone_prerequisites_ready(cone_gates, cones, ...):
"""
Ensures all inputs to reconvergent cones have their
controllability computed BEFORE parallel execution starts.
Critical for correctness - prevents race conditions.
"""
Cone Identification
def identify_reconvergent_cones(gates, reconvergent_roots):
"""
Forward traversal from each root to identify all gates
in the cone (all reachable nodes).
"""
Performance Characteristics
When to Use Parallel Mode
Good Use Cases:
- Large circuits (>1000 gates)
- Multiple independent reconvergent regions (>3 cones)
- Complex reconvergent structures
- Multi-core hardware available
Poor Use Cases:
- Small circuits (<500 gates)
- Single reconvergent cone
- Simple topologies
- Limited CPU cores
Overhead Analysis
Parallel Overhead:
- Thread pool creation: ~1-5ms
- Cone identification: O(V + E) per cone
- Synchronization: ~1ms per cone
- Final convergence: O(iterations × gates)
Speedup = T_sequential / (T_prereq + T_parallel + T_merge)
Theoretical maximum: N cores → N× speedup
Practical: 1.5-3× for typical circuits with 4 cores
Testing
Run the example:
cd OpenTestability
python examples/parallel_scoap_example.py
Expected output:
[⚡] Parallel mode: 1 reconvergent roots, 2 workers
[→] Computing 0 sequential gates...
[→] Checking cone prerequisites...
[⚡] Launching parallel computation for 1 cones...
[✓] Cone 'b' completed (3 iterations)
[→] Final convergence pass...
[✓] SCOAP results written to: data/results/example_parallel.txt
Limitations
- Observability: Still computed sequentially (depends on all controllability values)
- Cross-Cone Dependencies: Rare cases may require additional convergence iterations
- Thread Count: Optimal workers = number of independent cones (not more)
- Memory: Each cone creates local copies of CC0/CC1 (increased memory usage)
Future Enhancements
- Parallel observability computation
- Dynamic work stealing for unbalanced cones
- Profiling and auto-tuning of worker count
- GPU acceleration for very large cones
- Distributed computation across machines
References
- SCOAP Algorithm: Goldstein (1979), “SCOAP: Sandia Controllability/Observability Analysis Program”
- Reconvergent Fanout: Xu & Edirisuriya (2004)
- Python Threading: concurrent.futures documentation