Determining number of nodes or cores available in an SGE Queue

To check the status of a queue in SGE, one can issue the command qstat -g c, which prints a per-queue summary such as the number of slots (CPUs) used and available and the queue’s average load. However, this information can be misleading when nodes are cross-listed in multiple Q’s: a Q can report X nodes as unused when, in reality, they are in use through a different Q. Consequently, a submitted parallel job asking for X cores can wait in limbo for quite some time, depending on the cluster’s load. The following sgeQload.R R script uses some commands explained in the cheat sheet to output the number of cores really available, estimated for each Q as the total NCPU minus the total LOAD of its hosts:

 <pre class="example">#! /bin/env Rscript

## This script shows me the number of cores available for each Q.
## Since many Q's on BDUC contain overlapping nodes, information from
## "qstat -g c" could be misleading and lead to submitted jobs that are waiting...
## This script utilizes R, qconf
## References
## http://moo.nac.uci.edu/~hjm/bduc/sge-quick-reference_v3_cheatsheet.pdf
## http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

qstatgc <- system("qstat -g c", intern=TRUE)
qstatgc.list <- strsplit(qstatgc, split="\\s+", perl=TRUE)
## remove --- line and all.q
## (the statement for this step is missing from the post; the next line is an assumed reconstruction)
qstatgc.list <- qstatgc.list[!sapply(qstatgc.list, function(x) grepl("^-+$", x[1]) || x[1] %in% "all.q")]
qstatgc.list[[1]] <- qstatgc.list[[1]][-1] ## CLUSTER QUEUE is one thing -> QUEUE
qstat <- t(sapply(qstatgc.list[-1], function(x) as.numeric(x[-1])))
colnames(qstat) <- qstatgc.list[[1]][-1]
rownames(qstat) <- sapply(qstatgc.list[-1], function(x) x[1])
qstat <- cbind(qstat, NCPU=NA, LOAD=NA, AVAILABLE=NA)

for(Q in rownames(qstat)){
  ## host groups listed on the queue's "hostlist" line
  host.list <- strsplit(grep("hostlist", system(paste("qconf -sq", Q), intern=TRUE), value=TRUE),
                        split="\\s+", perl=TRUE)[[1]][-1]
  host.vec <- NULL
  for(host in host.list){
    ## expand each host group; drop the line-continuation backslashes first
    shgrp <- paste(system(paste("qconf -shgrp", host, sep=" "), intern=TRUE), collapse=" ")
    shgrp <- gsub("\\", "", shgrp, fixed=TRUE)
    host.vec <- c(host.vec,
                  strsplit(strsplit(shgrp, "hostlist", fixed=TRUE)[[1]][2], "\\s+", perl=TRUE)[[1]])
  }
  host.vec <- unique(host.vec)
  host.vec <- host.vec[host.vec != ""]
  host.vec <- gsub(".bduc", "", host.vec, fixed=TRUE)

  ## sum NCPU and LOAD from qhost over the hosts in this Q
  qhost <- system("qhost", intern=TRUE)
  qhost.matrix <- do.call(rbind, strsplit(qhost[-1], "\\s+", perl=TRUE))
  colnames(qhost.matrix) <- strsplit(qhost[1], "\\s+", perl=TRUE)[[1]]
  NCPU <- sum(as.numeric(qhost.matrix[qhost.matrix[, "HOSTNAME"] %in% host.vec, "NCPU"]))
  LOAD <- sum(as.numeric(qhost.matrix[qhost.matrix[, "HOSTNAME"] %in% host.vec, "LOAD"]))
  qstat[Q, "NCPU"] <- NCPU
  qstat[Q, "LOAD"] <- LOAD
  qstat[Q, "AVAILABLE"] <- NCPU-LOAD
}

qstat
</pre>

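Note that this script is specific to the cluster I use (BDUC). For example, it strips the .bduc suffix from host names and assumes that each Q’s hostlist is made up of host groups that qconf -shgrp can expand. It should be modified for other clusters; it does not work as-is on another cluster I have access to.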
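To make the NCPU minus LOAD bookkeeping easier to check without access to a cluster, here is a minimal sketch on toy data. Everything in it is made up for illustration: the queue names long.q and short.q, the hosts node1 to node3, their NCPU and LOAD figures, and the helper queue.summary are all hypothetical. Like the script, it treats load average as a rough proxy for cores in use.

```r
## Toy data (hypothetical): which hosts belong to which Q.
## node2 is cross-listed in both Q's.
queue.hosts <- list(
  long.q  = c("node1", "node2"),
  short.q = c("node2", "node3")
)

## Toy per-host figures, in the spirit of the NCPU and LOAD columns of qhost.
hosts <- data.frame(
  HOSTNAME = c("node1", "node2", "node3"),
  NCPU     = c(8, 8, 4),
  LOAD     = c(0.5, 7.5, 1.0),
  stringsAsFactors = FALSE
)

## Hypothetical helper: sum NCPU and LOAD over the unique hosts of one Q
## and report AVAILABLE = NCPU - LOAD.
queue.summary <- function(host.vec, hosts) {
  sel <- hosts$HOSTNAME %in% unique(host.vec)
  ncpu <- sum(hosts$NCPU[sel])
  load <- sum(hosts$LOAD[sel])
  c(NCPU = ncpu, LOAD = load, AVAILABLE = ncpu - load)
}

## One row per Q, one column per figure.
avail <- t(sapply(queue.hosts, queue.summary, hosts = hosts))
print(avail)
```

Here node2 is nearly saturated, and because it belongs to both Q’s, its load counts against each of them: long.q ends up with 8 cores available out of 16, and short.q with 3.5 out of 12, no matter which Q the jobs on node2 were submitted through.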

About Vinh Nguyen

Statistician
