High-Throughput Computing (HTC)

Introduction

For many research and engineering projects, the quality of the research or the product is heavily dependent upon the quantity of computing cycles available. It is not uncommon to find problems that require weeks or months of computation to solve. Scientists and engineers engaged in this sort of work need a computing environment that delivers large amounts of computational power over a long period of time. Such an environment is called a High-Throughput Computing (HTC) environment. In contrast, High Performance Computing (HPC) environments deliver a tremendous amount of compute power over a short period of time. HPC environments are often measured in terms of FLoating point Operations Per Second (FLOPS). A growing community is not concerned about operations per second, but operations per month or per year. Their problems are of a much larger scale. They are more interested in how many jobs they can complete over a long period of time instead of how fast an individual job can complete.

The key to HTC is to efficiently harness the use of all available resources. Years ago, the engineering and scientific community relied on a large, centralized mainframe or a supercomputer to do computational work. A large number of individuals and groups needed to pool their financial resources to afford such a machine. Users had to wait for their turn on the mainframe, and they had a limited amount of time allocated. While this environment was inconvenient for users, the utilization of the mainframe was high; it was busy nearly all the time.

As computers became smaller, faster, and cheaper, users moved away from centralized mainframes and purchased personal desktop workstations and PCs. An individual or small group could afford a computing resource that was available whenever they wanted it. The personal computer is slower than the large centralized machine, but it provides exclusive access. Now, instead of one giant computer for a large institution, there may be hundreds or thousands of personal computers. This is an environment of distributed ownership, where individuals throughout an organization own their own resources. The total computational power of the institution as a whole may rise dramatically as the result of such a change, but because of distributed ownership, individuals have not been able to capitalize on the institutional growth of computing power. And, while distributed ownership is more convenient for the users, the utilization of the computing power is lower. Many personal desktop machines sit idle for very long periods of time while their owners are busy doing other things (such as being away at lunch, in meetings, or at home sleeping).

Such is the case for many computer workstations in Annenberg, and throughout the CMS Department.

Condor is a software system that creates a High-Throughput Computing (HTC) environment. It effectively utilizes the computing power of workstations that communicate over a network. Condor can manage a dedicated cluster of workstations. Its power comes from the ability to effectively harness non-dedicated, preexisting resources under distributed ownership.

A user submits the job to Condor. Condor finds an available machine on the network and begins running the job on that machine. Condor has the capability to detect that a machine running a Condor job is no longer available (perhaps because the owner of the machine came back from lunch and started typing on the keyboard).

Condor can be a real time saver when a job must be run many (hundreds of) different times, perhaps with hundreds of different data sets. With one command, all of the hundreds of jobs are submitted to Condor. Depending upon the number of machines in the Condor pool, dozens or even hundreds of otherwise idle machines can be running the job at any given moment.

Condor provides powerful resource management by match-making resource owners with resource consumers. This is the cornerstone of a successful HTC environment. Other compute cluster resource management systems attach properties to the job queues themselves, resulting in user confusion over which queue to use as well as administrative hassle in constantly adding and editing queue properties to satisfy user demands. Condor implements ClassAds, a clean design that simplifies the user's submission of jobs.

ClassAds work in a fashion similar to the newspaper classified advertising want-ads. All machines in the Condor pool advertise their resource properties, both static and dynamic, such as available RAM memory, CPU type, CPU speed, virtual memory size, physical location, and current load average, in a resource offer ad. A user specifies a resource request ad when submitting a job. The request defines both the required and a desired set of properties of the resource to run the job. Condor acts as a broker by matching and ranking resource offer ads with resource request ads, making certain that all requirements in both ads are satisfied. During this match-making process, Condor also considers several layers of priority values: the priority the user assigned to the resource request ad, the priority of the user which submitted the ad, and desire of machines in the pool to accept certain types of ads over others.

Limitations

Condor imposes several limitations on jobs in the context of checkpointing -- allowing a running program to be frozen, migrated to another resource, and restarted from the point it left off -- however, Condor in the CMS Department was not built with support for checkpointing, so those limitations do not apply. Therefore, the main limitation in Annenberg is that when a running job needs to be migrated the job is started over from the beginning after migration is complete.

Condor in CMS

Currently, the only supported Condor HTC pool is in the ANN 104 lab, and LOGIN.CMS.CALTECH.EDU. You may freely submit jobs from any ANN 104 machine as well as LOGIN.CMS, however LOGIN.CMS is not an execute host as submitted jobs will run on idle machines in the ANN 104 lab. Any of the below commands or procedures can be followed only within the CMS Condor Pool.

How to Use Condor

Matchmaking with ClassAds

Before you learn about how to submit a job, it is important to understand how Condor allocates resources. Understanding the unique framework by which Condor matches submitted jobs with machines is the key to getting the most from Condor's scheduling algorithm.

Condor simplifies job submission by acting as a matchmaker of ClassAds. Condor's ClassAds are analogous to the classified advertising section of the newspaper. Sellers advertise specifics about what they have to sell, hoping to attract a buyer. Buyers may advertise specifics about what they wish to purchase. Both buyers and sellers list constraints that need to be satisfied. For instance, a buyer has a maximum spending limit, and a seller requires a minimum purchase price. Furthermore, both want to rank requests to their own advantage. Certainly a seller would rank one offer of $50 dollars higher than a different offer of $25. In Condor, users submitting jobs can be thought of as buyers of compute resources and machine owners are sellers.

All machines in a Condor pool advertise their attributes, such as available memory, CPU type and speed, virtual memory size, current load average, along with other static and dynamic properties. This machine ClassAd also advertises under what conditions it is willing to run a Condor job and what type of job it would prefer. These policy attributes can reflect the individual terms and preferences by which all the different owners have graciously allowed their machine to be part of the Condor pool. You may advertise that your machine is only willing to run jobs at night and when there is no keyboard activity on your machine. In addition, you may advertise a preference (rank) for running jobs submitted by you or one of your co-workers.

Likewise, when submitting a job, you specify a ClassAd with your requirements and preferences. The ClassAd includes the type of machine you wish to use. For instance, perhaps you are looking for the fastest floating point performance available. You want Condor to rank available machines based upon floating point performance. Or, perhaps you care only that the machine has a minimum of 128 Mbytes of RAM. Or, perhaps you will take any machine you can get! These job attributes and requirements are bundled up into a job ClassAd.

Condor plays the role of a matchmaker by continuously reading all the job ClassAds and all the machine ClassAds, matching and ranking job ads with machine ads. Condor makes certain that all requirements in both ClassAds are satisfied.

Inspecting Machine ClassAds with condor_status

Try the condor_status command to get a summary of information from ClassAds about the resources available in your pool. Type condor_status and hit enter to see a summary similar to the following:

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@arquebus.cms LINUX      INTEL  Claimed   Busy     0.000  1971  0+07:15:14
slot2@arquebus.cms LINUX      INTEL  Claimed   Busy     0.000  1971  0+07:15:15
slot1@ballista.cms LINUX      INTEL  Unclaimed Idle     0.000  1971  0+03:55:05
slot2@ballista.cms LINUX      INTEL  Unclaimed Idle     0.000  1971  0+15:45:23
slot1@bardiche.cms LINUX      INTEL  Owner     Idle     0.000  1971  0+00:55:06
slot2@bardiche.cms LINUX      INTEL  Claimed   Busy     0.000  1971  0+00:55:07
slot1@cadence.cms. LINUX      INTEL  Owner     Idle     0.050  1744  0+00:40:04
slot2@cadence.cms. LINUX      INTEL  Owner     Idle     0.000  1744  0+00:40:05

The condor_status command has options that summarize machine ads in a variet of ways. For example,

condor_status -available

shows only machines which are willing to run jobs now.

condor_status -run

shows only machines which are currently running jobs.

condor_status -l

lists the machine ClassAds for all machines in the pool.

Below shows the complete machine ClassAd for a single workstation: arquebus.cms.caltech.edu. Some of the listed attributes are used by Condor for scheduling. Other attributes are for information purposes. An important point is that any of the attributes in a machine ad can be utilized at job submission time as part of a request or preference on what machine to use. Additional attributes can be easily added. For example, your system administrator can add a physical location attribute to your machine ClassAds.

Machine = "arquebus.cms.caltech.edu"
LastHeardFrom = 1305928205
UpdateSequenceNumber = 896
JavaVersion = "1.6.0_18"
HasMPI = true
CpuIsBusy = false
HasVM = false
FileSystemDomain = "cms.caltech.edu"
JavaVendor = "Sun Microsystems Inc."
Name = "slot1@arquebus.cms.caltech.edu"
NumPids = 0
MonitorSelfTime = 1305927900
KeyboardIdle = 22501
TimeToLive = 2147483647
LastBenchmark = 1305698898
TotalDisk = 211690420
MaxJobRetirementTime = 0
Unhibernate = MY.MachineLastMatchTime =!= undefined
CondorPlatform = "$CondorPlatform: I686-Unknown_ $"
LastUpdate = 1305698898
HasJICLocalStdin = true
UpdatesTotal = 886
Cpus = 1
MonitorSelfCPUUsage = 0.004175
ClockDay = 5
IsWakeOnLanEnabled = true
JavaSpecificationVersion = "1.6"
StarterAbilityList = "HasMPI,HasVM,HasJICLocalStdin,HasJICLocalConfig,HasJava,HasJobDeferral,HasTDP,HasFileTransfer,HasPerFileEncryption,HasReconnect"
TotalTimeUnclaimedIdle = 243648
CondorVersion = "$CondorVersion: 7.6.0 May 16 2011 BuildID: UW_development $"
HasIOProxy = true
TotalTimeClaimedBusy = 3168
TotalTimeOwnerIdle = 13176
MonitorSelfImageSize = 9548.000000
HibernationSupportedStates = "S3,S4"
LastFetchWorkSpawned = 0
Requirements = ( START ) && ( IsValidCheckpointPlatform )
TotalMemory = 3943
DaemonStartTime = 1305667660
EnteredCurrentActivity = 1305906831
MyAddress = "<131.215.141.115:53895>"
HasJICLocalConfig = true
HasJava = true
EnteredCurrentState = 1305906831
CpuBusyTime = 0
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
COLLECTOR_HOST_STRING = "constantine.cms.caltech.edu"
Memory = 1971
IsWakeAble = true
MyCurrentTime = 1305928138
MonitorSelfRegisteredSocketCount = 1
TotalTimeUnclaimedBenchmarking = 42
TotalCpus = 2
ClockMin = 888
CurrentRank = 0.0
NextFetchWorkDelay = -1
AuthenticatedIdentity = "unauthenticated@unmapped"
OpSys = "LINUX"
State = "Unclaimed"
KFlops = 490745
UpdatesSequenced = 885
UpdatesHistory = "0x00000000000000000000000000000000"
Start = ( ( ( KeyboardIdle > 15 * 60 ) || ( ConsoleIdle > 15 * 60 ) ) && ( ( ( LoadAvg - CondorLoadAvg ) <= 0.300000 ) || ( State != "Unclaimed" && Stat e != "Owner" ) ) )
HasJobDeferral = true
MonitorSelfResidentSetSize = 4280
Arch = "INTEL"
Mips = 3993
Activity = "Idle"
IsWakeOnLanSupported = true
ConsoleIdle = 22501
HasTDP = true
SubnetMask = "255.255.255.0"
LastFetchWorkCompleted = 0
UpdatesLost = 0
StartdIpAddr = "<131.215.141.115:53895>"
WakeOnLanEnabledFlags = "Magic Packet"
TargetType = "Job"
TotalLoadAvg = 0.0
HasFileTransfer = true
HibernationLevel = 0
TotalTimeClaimedSuspended = 437
Rank = 0.0
HibernationState = "NONE"
MonitorSelfSecuritySessions = 3
JavaMFlops = 855.486877
MonitorSelfAge = 0
LoadAvg = 0.0
WakeOnLanSupportedFlags = "Magic Packet"
CheckpointPlatform = "LINUX INTEL 2.6.x normal N/A"
HasPerFileEncryption = true
CurrentTime = time()
Disk = 105845210
VirtualMemory = 1051646
TotalVirtualMemory = 2103292
TotalSlots = 2
UidDomain = "cms.caltech.edu"
SlotWeight = Cpus
SlotID = 1
HasReconnect = true
HardwareAddress = "84:2b:2b:a7:ef:65"
MyType = "Machine"
CanHibernate = true
CondorLoadAvg = 0.0
TotalCondorLoadAvg = 0.0

Road-map for Running Jobs

Here are all the steps needed to run a job in the CMS Condor pool.

Code preparation

A job run under Condor must be able to run as a background batch job. Condor runs the program unattended and in the background. A program that runs in the background will not be able to do interactive input and output. Condor can redirect console output (stdout and stderr) and keyboard input (stdin) to and from files for you. Create any needed files that contain the proper keystrokes needed for program input. Make certain the program will run correctly with the files.

The Condor Universe

Condor has several runtime environments (called universe) from which to choose. Of the universes, one likely choice when learning to submit a job to Condor: the vanilla universe.

Submit description file

Controlling the details of a job submission is a submit description file. The file contains information about the job such as what executable to run, the files to use for keyboard and screen data, the platform type required to run the program, and where to send e-mail when the job completes. You can also tell Condor how many times to run a program; it is simple to run the same program multiple times with multiple data sets.

Write a submit description file to go with the job. See below for examples.

Submit the Job

Submit the program to Condor with the condor_submit command.

Once submitted, Condor does the rest toward running the job. Monitor the job's progress with the condor_q and condor_status commands. You may modify the order in which Condor will run your jobs with condor_prio. If desired, Condor can even inform you in a log file every time your job is migrated to a different machine.

When your program completes, Condor will tell you (by e-mail) the exit status of your program and various statistics about its performances, including time used and I/O performed. If you are using a log file for the job (which is recommended) the exit status will be recorded in the log file. you can remove a job from the queue prematurely with condor_rm.

Choosing a Condor Universe

A universe in Condor defines an execution environment. Supported universes in CMS include:

  • Vanilla
  • Java

Vanilla Universe

Unfortunately, jobs run under the vanilla universe cannot checkpoint or use remote system calls. This has unfortunate consequences for a job that is partially completed when the remote machine running a job must be returned to its owner. Condor has only two choices. It can suspend the job, hoping to complete it at a later time, or it can give up and restart the job from the beginning on another machine in the pool.

Since Condor's remote system call features cannot be used with the vanilla universe, access to the job's input and output files becomes a concern. One option is for Condor to rely on a shared file system, such as NFS. Alternatively, Condor has a mechanism for transferring files on behalf of the user. In this case, Condor will transfer any files needed by a job to the execution site, run the job, and transfer the output back to the submitting machine.

Under Unix, Condor presumes a shared file system for vanilla jobs. However, if a shared file system is unavailable, a user can enable the Condor File Transfer mechanism.

Java Universe

A program submitted to the Java universe may run on any sort of machine with a JVM regardless of its location, owner, or JVM version. Condor will take care of all the details such as finding the JVM binary and setting the classpath.

Submitting a Job

A job is submitted for execution to Condor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. This file contains commands and keywords to direct the queuing of jobs. In the submit description file, Condor finds everything it needs to know about the job. Items such as the name of the executable to run, the initial working directory, and command-line arguments to the program all go into the submit description file. condor_submit creates a job ClassAd based upon the information, and Condor works toward running the job.

The contents of a submit file can save time for Condor users. It is easy to submit multiple runs of a program to Condor. To run the same program 500 times on 500 different input data sets, arrange your data files accordingly so that each run reads its own input, and each run writes its own output. Each individual run may have its own initial working directory, stdin, stdout, stderr, command-line arguments, and shell environment. A program that directly opens its own files will read the file names to use either from stdin or from the command line. A program that opens a static filename every time will need to use a separate subdirectory for the output of each run. For more information about the condor_submit command, see man condor_submit.

Sample submit description files

Example 1

####################
#
# Example 1: demonstrate use of multiple
# directories for data organization.
#
####################

Executable = mathematica
Universe   = vanilla
input      = test.data
output     = loop.out
error      = loop.error
Log        = loop.log

Initialdir = run_1
Queue

Initialdir = run_2
Queue

This example runs two copies of the program mathematica. The first copy will run in the directory run_1, and the second will run in the directory run_2. For both queued copies, stdin will be test.data, stdout will be loop.out, and stderr will be loop.error. There will be two sets of files written, as the files are each written to their own directories. This is a convenient way to organize data if you have a large group of Condor jobs to run. The example file shows program submission of mathematica as a vanilla universe job. This may be necessary if the source and/or object code to mathematica is not available.

Example 2

####################
# Show use of command line arguments.
####################

executable     = fib
universe       = vanilla

arguments      = 40
output         = fib.out
error          = fib.error
log            = fib.log

queue

This submit description file submits program fib one time for execution under Condor. The vanilla universe is explicitly specified. In those cases where the universe is not explicitly specified, a configuration variable may specify a universe to use. Where the configuration variable is not present, the vanilla universe is assumed. When program fib is executed, it is given as command-line arguments the string from the arguments command. So, the command line of the submitted job would appear as

fib 40

Matlab Jobs

Introduction

Condor is well suited to running large numbers of Matlab jobs concurrently. If the application applies the same kind of analysis to large data sets (so-called "embarrassingly parallel" applications) or carries out similar calculations based on different random initial data (e.g. applications based on Monte Carlo methods), Condor can significantly reduce the time needed to generate the results by processing the data in parallel on different hosts. In some cases, simulations and analyses that would have taken years on a single PC can be completed in a matter of days.

The application will need to perform three main steps:

  1. Create the initial input data and store it to file.
  2. Process the input data using the Condor pool and write the output to file.
  3. Collate the data in the output files to generate the results.

Licensing

Perhaps the biggest challenge to running Matlab under Condor is licensing. Matlab is proprietary software with strict licensing terms. Caltech has a sitewide license for Matlab, but this does not represent an infinite resource -- there is still a limit on the number of licenses for the campus that does not include a license for every computer workstation at Caltech.

The usual way of executing Matlab jobs is to create a M-file and then run this through the Matlab interpreter. This poses problems with a parallel implementation using Condor as each running job will require a licence to be checked out and is inefficient. To circumvent this, the Matlab M-file needs to be compiled into a standalone application which can run without the Matlab interpreter and without the need for a Matlab licence. This is described later.

Creating the M-files

As a trivial example, to see how Condor can be used to run Matlab jobs in paralllel, consider the case where we want to form the sum of p matrix-matrix products, i.e. calculate C where:

[[!img Error: Image::Magick is not installed]]

and A, B, and C are square matrices of order n. It is easy to see that the p matrix products could be calculated independently and therefor potentially in parallel.

In the first step we need to store A and B to Matlab data files which can be distributed by Condor for processing. Each Condor job will then need to read A and B from file, form the product AB and write the output to another Condor data file. The final step will sum all of the partial sums read from the output files to form the complete sum C.

The first step can be accomplished using an M-file similar to initialize.m:

function initialize(n)
  n = str2num(n);
  for i = 1:n
    index = i;
    A=rand(2,2)
    B=rand(2,2)
    filename=strcat('input',int2str(i));
    save( filename, 'A', 'B', 'index');
end

The elements of A and B are given random initial values and are saved to files using the Matlab 'save' command. Condor needs the input files to be indexed from 1:p-1 so the above code generates 'n' input files named input1.mat, input2.mat ... inputn.mat. The M-file also saves the index variable to file as this will be needed by the Condor job.

The second script will need to form the matrix-matrix products and will be run as a standalone application. A suitable M-file, product.m is:

function product(n)
  n = str2num(n);
  for i = 1:n
    filenameI = strcat( 'input', int2str( i ) );
    load( filenameI );
    S = A * B
    filenameO = strcat( 'output', int2str( i ) );
    save( filenameO, 'S' );
  end

Note that the function name must be the same as the M-file name (minus the extension).

Since the same executable is used for each parallel Condor job, the job will not know the specific input filename it has to deal with. Once the matrices have been loaded and the product formed, the output is written to file.

The final step is to collect all of the output files together to form the sum. This can be achieved using another M-file collect.m such as this:

function collect(n)
  n = str2num(n);
  C = zeros(2);
  for i = 1:n
    filename = strcat( 'output', int2str( i ) );
    load( filename );
    C = C + S
  end

This loads each output file in turn and forms the sum of the matrix-matrix products in the C variable.

Creating the standalone application

As indicated earlier, the M-file used by the Condor job needs to be compiled into a standalone application.

mcc -mv product.m

If the main M-file calls other functions stored in different M-files then these should be listed after the main M-file. Alternatively the -a option can be used to specify the directory they are held in. The mcc MATLAB compiler will create runtime shell executables of the form run_<function>.sh where is the name of the function and original binary such that MATLAB can load any dependant libraries/binaries in order to execute the compiled function.

Additionally, you could also submit the 'mcc -mv product.m' process as a Condor job to compile the M-file into an executable. An example of a Condor job to compile product.m with a submission script named build_product.submit:

universe = vanilla
executable = /opt/matlab/bin/mcc
arguments = "-mv product.m"
output = build_product.out
error = build_product.err
log = build_product.log
queue

And then submitting the Condor job with condor_submit build_product.submit. The arguments for mcc, "-mv product.m", state to compile the file product.m verbosely (-mv).. This produces two files, aside from the Condor output: product and run_product.sh.

Note that before using the compiler for the first time it needs to be configured using:

mbuild -setup

Creating the Condor files

Each Condor job needs a submission file to describe how the job should be run. For this example a submission file such as the one below can be used:

universe = vanilla
executable = run_product.sh
arguments = "/opt/matlab 10"
output = product$(PROCESS).out
log = product$(PROCESS).log
error = product$(PROCESS).err
notification = Error
queue

The 10 value in the arguments line is passed along to the run_product.sh script, and consequently also passed to the product compiled executable. The $(PROCESS) macro takes on the values of the Condor Job Process ID (typically zero for single-queued jobs). The input and output file lists can be modified to suit other applications. For production runs, the output files should always be specified just in case there is a run-time problem and they are not created. In this case, Condor will place tje job in the held ('H') state. To release these jobs and run them elsewhere, use:

condor_release -all

To find out why jobs have been held, use:

condor_q -held

Once the jobs have been completed, the directory should contain ten output files named outputN.mat, output1.mat ... output9.mat which can then be processed using collect.m to generate the final result. There will also be one log file (.log), one stdout files (.out) and one error files (*.err). These are largely insignificant if everything goes according to expectation, but can be used to track problems when things have gone unexpectedly.

This is a rather simplified example, however, as a rough benchmark with matrices of order n=4000, a serial implementation requires approximately 90 minutes of time whereas the Condor system requires on the order of 10 minutes.

Finally, the output of our final MATLAB example process, collect:

------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/opt/matlab/sys/java/jre/glnx86/jre/lib/i386/client:/opt/matlab/sys/java/jre/glnx86/jre/lib/i386
Warning: No display specified.  You will not be able to display graphics on the screen.

C =
    0.5276    0.2963
    0.6619    0.7518
C =
    1.5206    1.0850
    2.0565    1.6617
C =
    1.8254    2.1830
    2.6913    3.3354
C =
    2.3975    2.6859
    3.2551    3.8829
C =
    2.6208    2.9524
    3.5483    3.9551
C =
    3.0777    3.6426
    4.0091    4.6391
C =
    3.5873    3.8646
    4.6140    5.0676
C =
    4.4576    4.7594
    4.9268    5.3963
C =
    4.7315    5.6475
    5.1057    6.2381
C =
    5.1249    6.5890
    5.2415    6.5537