TRICYS Concurrent Execution¶
TRICYS is designed to fully leverage the computational power of modern multi-core processors by using concurrent execution to significantly reduce the total time required for parameter sweeps and batch simulations. Depending on the simulation type, TRICYS intelligently adopts two different concurrency strategies: multi-threading (for standard simulations) and multi-processing (for co-simulations).
You can control the concurrency behavior in config.json with the following parameters:
"simulation": {
"concurrent": true, // Set to true to enable concurrent mode
"max_workers": 8 // Set the maximum number of concurrent workers, defaults to the number of CPU cores
}
1. Standard Simulation: Multi-threaded Concurrency¶
For standard parameter sweep tasks that do not involve co-simulation, TRICYS defaults to a multi-threading model.
1.1. How It Works¶
TRICYS uses Python's concurrent.futures.ThreadPoolExecutor to manage a pool of worker threads. Each simulation task (i.e., a specific set of parameters) is submitted to the thread pool and executed by an available thread.
# Simplified conceptual code
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as executor:
# _run_single_job is the task to be executed by each thread
futures = [executor.submit(_run_single_job, config, params, i) for i, params in enumerate(jobs)]
# ... wait for and collect results ...
1.2. Why Use Multi-threading?¶
Although Python has a Global Interpreter Lock (GIL) that prevents true parallelism for CPU-bound code within a single process, multi-threading is still an efficient choice for standard Modelica simulation scenarios. Here's why:
- I/O-Bound: Calling
OMPythonto run a simulation is essentially launching an externalsimulate.exeexecutable. During this time, the Python thread is mostly in a waiting state (I/O-bound), waiting for the external process to complete its calculations and write back the result file. The GIL is released during this period, allowing other threads to run. - Low Overhead: Threads are more lightweight than processes. They are created and destroyed faster and consume fewer system resources. For a large number of tasks where each simulation is not particularly time-consuming, using threads can reduce the overhead of task scheduling.
2. Co-simulation: Multi-process Concurrency¶
For complex co-simulation tasks, TRICYS switches to a more robust multi-processing model.
2.1. How It Works¶
TRICYS uses concurrent.futures.ProcessPoolExecutor to create a process pool. Each co-simulation task (_run_co_simulation) runs in a completely separate child process.
# Simplified conceptual code
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=8) as executor:
# _run_co_simulation is executed in a separate process
futures = [executor.submit(_run_co_simulation, config, params, i) for i, params in enumerate(jobs)]
# ... wait for and collect results ...
2.2. Why Must Multi-processing Be Used?¶
The co-simulation workflow involves extensive reading, writing, and modification of the file system, which is the core reason why it must use multi-processing:
- Environment Isolation: Each co-simulation task requires a clean, independent execution environment. It copies the original Modelica model package to a temporary directory and then modifies it (e.g., by generating an interceptor or replacing a model directly). If multi-threading were used, multiple tasks would simultaneously modify the shared model files, leading to file corruption and incorrect results.
- State Conflict Avoidance: Processes have independent memory spaces and file handles. This ensures that modifications to model code, the execution of external Python handlers, and all generated intermediate files by one task will not interfere with other parallel tasks.
- Bypassing the GIL: The Python handlers in co-simulation can themselves be computationally intensive. In a multi-processing model, each process has its own Python interpreter and GIL, enabling true parallel computation and full utilization of all CPU cores.
3. Performance and Practical Recommendations¶
-
Set
max_workersReasonably:- For I/O-bound standard simulations, you can set
max_workersto be slightly higher than the number of CPU cores (e.g.,cores * 1.5). - For CPU-bound co-simulations,
max_workersis typically set to be equal to or slightly less than your number of CPU cores to avoid excessive context-switching overhead. - Always consider memory limits. Each process consumes a significant amount of memory. If
max_workersis set too high, it may lead to memory exhaustion.
- For I/O-bound standard simulations, you can set
-
Task Granularity:
- Concurrency itself has overhead. If a single simulation is very short (e.g., less than a second), the performance gain from concurrency may not be enough to offset the scheduling overhead. In this case, serial execution (
"concurrent": false) might be faster.
- Concurrency itself has overhead. If a single simulation is very short (e.g., less than a second), the performance gain from concurrency may not be enough to offset the scheduling overhead. In this case, serial execution (
-
Monitor System Resources:
- When running large-scale concurrent tasks, it is advisable to use system monitoring tools (like Task Manager or
htop) to observe CPU and memory usage, which can help in tuningmax_workersto its optimal value.
- When running large-scale concurrent tasks, it is advisable to use system monitoring tools (like Task Manager or