Software

We regularly release our software artifacts on the team's GitHub account, SPEAR-UIC.

Recent Release

  • CQSim: a trace-based, discrete-event-driven scheduling simulator

    If you use CQSim in your work, please cite the paper:
    X. Yang, Z. Zhou, S. Wallace, Z. Lan, W. Tang, S. Coghlan, and M. Papka, “Integrating Dynamic Pricing of Electricity into Energy Aware Scheduling for HPC Systems”, Proc. of SC’13, 2013.

  • CQGym: Gym environment for reinforcement-learned scheduling

    The team developed an automated cluster scheduling agent called DRAS (Deep Reinforcement Agent for Scheduling) by leveraging various deep reinforcement learning algorithms. CQGym, built on the Gym interface, provides a platform for evaluating different RL algorithms in a trace-based, event-driven scheduling environment. If you use CQGym in your work, please cite the paper:

    Y. Fan, et al., “DRAS: Deep Reinforcement Learning for Cluster Scheduling in High-Performance Computing”, IEEE Trans. on Parallel and Distributed Systems, 2022.

  • Union: in-situ skeleton workload module for CODES

    If you use Union in your work, please cite the paper:
    X. Wang, M. Mubarak, Y. Kang, R. Ross, and Z. Lan, “Union: An Automatic Workload Manager for Accelerating Network Simulations”, Proc. of IPDPS, 2020.

  • Q-adaptive: a multi-agent reinforcement learning based routing for Dragonfly systems

    If you use Q-adaptive/SST in your work, please cite the paper:
    Y. Kang, X. Wang, and Z. Lan, “Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network”, Proc. of HPDC, 2021.

  • CODES: parallel discrete-event-driven simulation toolkit for large-scale systems

    We are part of the CODES organization on GitHub.

  • DNPC: dynamic power capping library for parallel applications

    If you use DNPC in your work, please cite the paper:

    S. Sharma, Z. Lan, X. Wu, and V. Taylor, “A Dynamic Power Capping Library for HPC Applications”, IEEE Cluster, 2021 (2-page research poster).

  • Mantis: a unified performance and power profiling interface on heterogeneous systems

    Mantis is a suite for code-external profiling that drives a range of standard profiling tools and produces a unified data format. It is under active development by Melanie Cornelius.

  • MRSch: a multi-resource scheduling framework that leverages a multi-objective reinforcement learning algorithm

    If you use MRSch in your work, please cite the paper:

    B. Li, Y. Fan, M. Dearing, Z. Lan, P. Rich, W. Allcock, and M. Papka, “MRSch: Multi-Resource Scheduling for HPC”, Proc. of IEEE Cluster, 2022.
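
For readers unfamiliar with the idea behind CQGym, the sketch below shows roughly what a Gym-style, trace-based, event-driven scheduling environment can look like. It is not the CQGym or CQSim API: the class ToySchedEnv, the Job record, the observation layout, the negative-wait-time reward, and the first_fit policy are all hypothetical names and choices made for illustration, and the sketch follows only the classic reset()/step() convention without importing any Gym package.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    submit: int   # submission time in seconds
    runtime: int  # execution time in seconds
    nodes: int    # number of nodes requested


class ToySchedEnv:
    """Hypothetical Gym-style scheduling environment (not the CQGym API)."""

    def __init__(self, trace, total_nodes):
        if any(j.nodes > total_nodes for j in trace):
            raise ValueError("a job requests more nodes than the system has")
        self.trace = sorted(trace, key=lambda j: j.submit)
        self.total_nodes = total_nodes

    def reset(self):
        self.now = 0
        self.free = self.total_nodes
        self.pending = list(self.trace)  # jobs not yet submitted
        self.queue = []                  # submitted jobs waiting to start
        self.running = []                # min-heap of (end_time, nodes)
        self.waits = {}                  # job_id -> wait time
        self._advance()
        return self._obs()

    def _obs(self):
        return {"now": self.now, "free": self.free, "queue": list(self.queue)}

    def _advance(self):
        """Jump the clock from event to event until a queued job fits."""
        while True:
            # Release nodes of jobs that have finished by the current time.
            while self.running and self.running[0][0] <= self.now:
                self.free += heapq.heappop(self.running)[1]
            # Move newly submitted jobs from the trace into the wait queue.
            while self.pending and self.pending[0].submit <= self.now:
                self.queue.append(self.pending.pop(0))
            if any(j.nodes <= self.free for j in self.queue):
                return
            events = [e[0] for e in self.running[:1]]
            events += [j.submit for j in self.pending[:1]]
            if not events:
                return  # nothing left to simulate
            self.now = min(events)

    def step(self, action):
        """Start the queued job at index `action`; reward is minus its wait."""
        job = self.queue[action]
        if job.nodes > self.free:
            raise ValueError("selected job does not fit on the free nodes")
        self.queue.pop(action)
        self.free -= job.nodes
        heapq.heappush(self.running, (self.now + job.runtime, job.nodes))
        wait = self.now - job.submit
        self.waits[job.job_id] = wait
        self._advance()
        done = not self.queue and not self.pending
        return self._obs(), -wait, done, {}


def first_fit(obs):
    """Baseline policy: pick the first waiting job that fits right now."""
    return next(i for i, j in enumerate(obs["queue"]) if j.nodes <= obs["free"])


trace = [Job(0, 0, 10, 4), Job(1, 1, 5, 4), Job(2, 2, 3, 2)]
env = ToySchedEnv(trace, total_nodes=4)
obs, done, total_reward = env.reset(), False, 0
while not done:
    obs, reward, done, _ = env.step(first_fit(obs))
    total_reward += reward
print(total_reward)  # -22
```

An RL agent would take the place of first_fit, mapping each observation to a queue index, which is the sense in which such an environment lets different algorithms be compared on the same job trace.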