You can have a Condor pool that consists of both Unix and Windows machines.
Your central manager can be either Windows or Unix. For example, even if you had a pool consisting strictly of Unix machines, you could use a Windows box for your central manager, and vice versa.
Submitted jobs can originate from either a Windows or a Unix machine, and be destined to run on a Windows or a Unix machine. Note that there are still restrictions on the supported universes for jobs executed on Windows machines.
So, in summary:
See Section 1.5.
First, make sure that the program really does work outside of Condor under Windows, that the disk is not full, and that the system is not out of user resources.
Next, be aware that some Windows programs do not run properly because they are dynamically linked and cannot find the .dll files they depend on. Version 6.4.x of Condor sets the PATH to be empty when running a job. To avoid these difficulties, do one of the following:
Place getenv = true in the submit description file. This copies your environment into the job's environment.
Start Condor by running net start condor, or start the Condor service from the Service Control Manager located in the Windows Control Panel.
Jobs submitted from a Windows machine require a stashed password in order for Condor to perform certain operations on the user's behalf. Refer to section 6.2.3 for information about password storage on Windows. The command that stashes a password for a user is condor_store_cred. See the condor_store_cred manual page for usage details.
The error message that Condor gives if a user has not stashed a password is of the form:
ERROR: No credential stored for username@machinename
Correct this by running:
condor_store_cred add
A difficulty with defaults causes jobs submitted from Unix for execution on a Windows platform to remain in the queue, but make no progress. For jobs with this problem, log files will contain error messages pointing to shadow exceptions.
This difficulty stems from the defaults for whether file transfer takes place. The workaround for this problem is to place the lines
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
into the submit description file for jobs submitted from a Unix machine for execution on a Windows machine.
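As an illustration, the following Python sketch produces a complete submit description file that includes the two workaround lines. The vanilla universe, the helper name windows_submit_description, and the file names myjob.exe and input.dat are assumptions made for the example. Only should_transfer_files and when_to_transfer_output come from the text above.

```python
# Sketch only: the vanilla universe and the file names are assumptions;
# the two file-transfer commands are the workaround described in the text.


def windows_submit_description(executable, arguments=""):
    """Return the text of a submit description file for a job that is
    submitted from a Unix machine but runs on a Windows machine."""
    lines = [
        "universe = vanilla",
        "executable = " + executable,
    ]
    if arguments:
        lines.append("arguments = " + arguments)
    # Make file transfer explicit rather than relying on the defaults,
    # which can leave Unix-to-Windows jobs sitting in the queue.
    lines.append("should_transfer_files = YES")
    lines.append("when_to_transfer_output = ON_EXIT")
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: myjob.exe reading input.dat.
    print(windows_submit_description("myjob.exe", "input.dat"), end="")
```

Redirect the output to a file (for example, python make_submit.py > myjob.sub) and pass that file to condor_submit. Generating the file from one helper keeps the two file-transfer lines from being forgotten on individual jobs.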
Condor uses the first network interface it sees on your machine. This problem usually means you have an extra, inactive network interface (such as a RAS dial up interface) defined before the regular network interface.
To solve this problem, either change the order of the network interfaces in the Control Panel, or explicitly set which network interface Condor should use by adding the following definition to the Condor configuration file:
NETWORK_INTERFACE = <ip-address>
where <ip-address> is the IP address of the interface that Condor is to use.
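A small Python sketch can sanity-check an address before it goes into the configuration file. The helper name network_interface_setting is hypothetical. Rejecting loopback, link-local (169.254.x.x, which Windows assigns when an adapter cannot obtain an address) and unspecified addresses is this sketch's own assumption about which addresses are unlikely to be the interface you want. It is not a Condor rule.

```python
import ipaddress


def network_interface_setting(address):
    """Return a NETWORK_INTERFACE configuration line for the given address.

    Raises ValueError for malformed addresses, and (an assumption of this
    sketch) for loopback, link-local and unspecified addresses."""
    ip = ipaddress.ip_address(address.strip())  # ValueError if malformed
    if ip.is_loopback or ip.is_link_local or ip.is_unspecified:
        raise ValueError("%s is a loopback, link-local or unspecified address" % ip)
    return "NETWORK_INTERFACE = %s" % ip


if __name__ == "__main__":
    import sys

    # Usage: python netif.py 192.168.1.10
    for arg in sys.argv[1:]:
        print(network_interface_setting(arg))
```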
This can occur when the machine your job is running on is missing a DLL (dynamic-link library) required by your program. The solution is to find the DLL file the program needs and put it in the TRANSFER_INPUT_FILES list in the job's submit description file.
To find out what DLLs your program depends on, right-click the program in Explorer, choose Quick View, and look under "Import List".
Five methods for making access of network files work with Condor are given in section 6.2.10.
Given the command
condor_off hostname2
an error message of the form
Can't find address for master hostname2.somewhere.edu
appears. Yet, when looking at the host names with
condor_status -master
the output is of the form
hostname1.somewhere.edu
hostname2
hostname3.somewhere.edu
To correct this incomplete host name, add an entry to the configuration file for DEFAULT_DOMAIN_NAME that specifies the domain name to be used. For the example given, the configuration entry will be
DEFAULT_DOMAIN_NAME = somewhere.edu
After adding this configuration file entry, use condor_restart to restart the Condor daemons and effect the change.
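In a large pool, spotting the unqualified name by eye is tedious. The Python sketch below finds names without a domain part and suggests a DEFAULT_DOMAIN_NAME value, using the most common domain among the fully qualified names. It assumes the output is just a list of host names like the example above. Real output may include header lines or extra columns that would need filtering first. The helper names are hypothetical.

```python
from collections import Counter


def unqualified_hosts(master_output):
    """Names in condor_status -master output that lack a domain part."""
    return [name for name in master_output.split() if "." not in name]


def suggested_default_domain(master_output):
    """Most common domain among the fully qualified names, or None."""
    domains = Counter(
        name.split(".", 1)[1] for name in master_output.split() if "." in name
    )
    if not domains:
        return None
    return domains.most_common(1)[0][0]


if __name__ == "__main__":
    # Example output from the text; in practice, capture condor_status -master.
    sample = "hostname1.somewhere.edu\nhostname2\nhostname3.somewhere.edu\n"
    print("Unqualified:", ", ".join(unqualified_hosts(sample)))
    domain = suggested_default_domain(sample)
    if domain:
        print("DEFAULT_DOMAIN_NAME = " + domain)
```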
The following example batch script sets environment variables and then runs the job given as its arguments:
REM set some environment variables
set LICENSE_SERVER=192.168.1.202:5012
set MY_PARAMS=2

REM Run the actual job now
%*
First, make sure the condor_schedd daemon is running.
Next, check the log file written by the condor_schedd daemon. It will contain more detailed information about the failure. Frequently, the failure is the result of PERMISSION DENIED errors. More information about the proper configuration of security settings is in the security section of this manual.
Windows is likely to be running out of desktop heap. Confirm this to be the case by looking in the log for the condor_schedd daemon to see if condor_shadow daemons are immediately exiting with status 128. If this is the case, increase the desktop heap size. Open the registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows
The SharedSection setting in this entry consists of up to three numbers separated by commas. The third number controls the desktop heap size for non-interactive desktops, which the Condor service uses. The default is 512 Kbytes. Sixty condor_shadow daemons consume about 256 Kbytes, so 120 shadows can run with the default value. To be able to run a maximum of 300 condor_shadow daemons, set this value to 1280.
Reboot the system for the changes to take effect. For more information, see Microsoft Article Q184802.
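The heap arithmetic above can be checked with a short Python sketch. It uses only the figures from the text (about 256 Kbytes per 60 shadows, default 512). The helper names are hypothetical. The registry string is an abbreviated, assumed example, and your system's value will contain additional settings. The sketch only computes and rewrites a string. It does not touch the registry, so make the actual change with the Registry Editor.

```python
import re

# Figures quoted in the text: about 256 Kbytes of desktop heap per 60 shadows.
SHADOWS_PER_BLOCK = 60
KB_PER_BLOCK = 256

# SharedSection=first,second[,third]; the third number is the
# non-interactive desktop heap size in Kbytes.
_SHARED_SECTION = re.compile(r"SharedSection=(\d+),(\d+)(?:,(\d+))?")


def max_shadows(heap_kb):
    """Approximate number of condor_shadow daemons a heap of heap_kb supports."""
    return heap_kb * SHADOWS_PER_BLOCK // KB_PER_BLOCK


def heap_kb_for(shadows):
    """Smallest heap size in Kbytes that supports the given number of shadows."""
    return -(-shadows * KB_PER_BLOCK // SHADOWS_PER_BLOCK)  # ceiling division


def set_noninteractive_heap(windows_value, heap_kb):
    """Return windows_value with the third SharedSection number set to heap_kb."""
    match = _SHARED_SECTION.search(windows_value)
    if match is None:
        raise ValueError("no SharedSection setting found")
    new = "SharedSection=%s,%s,%d" % (match.group(1), match.group(2), heap_kb)
    return windows_value[: match.start()] + new + windows_value[match.end():]


if __name__ == "__main__":
    # Abbreviated example of the registry value (an assumption).
    current = r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On"
    print(max_shadows(512))    # shadows supported by the default heap
    print(heap_kb_for(300))    # heap needed for 300 shadows
    print(set_noninteractive_heap(current, heap_kb_for(300)))
```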
Usually when Condor daemons exit in this manner, it is because the system in question has a non-standard Winsock Layered Service Provider (LSP) installed on it. An LSP is, in effect, a plug-in for the TCP/IP protocol stack. LSPs have been installed as part of anti-virus software and other security-related packages.
There are several tools available to check your system for the presence of LSPs. One with which we have had success is LSP-Fix, available at http://www.cexx.org/lspfix.htm. Any non-Microsoft LSPs identified by this tool may be causing the WSAENOTSOCK error in Condor. Although the LSP-Fix tool allows the direct removal of an LSP, it is probably better to remove the entire application of which the LSP is a part, using the Control Panel.
Another approach is to completely reset the TCP/IP stack to its original state. This can be done using the netsh tool:
netsh int ip reset reset-stack.log
The command returns the TCP/IP stack to the state it was in when the OS was first installed. The log file named in the command (reset-stack.log) records all the configuration changes made by netsh.
Condor on Windows platforms relies on built-in performance counters for its operation. If performance counters that Condor requires are disabled, daemons may exit with a message such as
1/26 09:16:42 (fd:2) (pid:5732) ERROR: "Unexpected performance counter size for total CPU: 0 (expected: 8)" at line 2846 in file ..\src\condor_procapi\procapi.cpp
or
1/20 15:29:14 (pid:2484) ERROR "unable to spawn the ProcD" at line 136 in file ..\src\condor_c++_util\proc_family_proxy.C
and even
4/16 10:49:13 loadavg thread died, restarting. (exit code=2)
To enable the performance counters, check the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PerfProc\Performance
If a value named Disable Performance Counters exists, delete it or set it to 0.
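To check whether a machine shows any of these symptoms, a daemon's log can be scanned for the messages quoted above. The Python sketch below does a plain substring match on those three fragments. The helper name is hypothetical, and the log file names in the comment (MasterLog, StartLog) are examples whose location depends on your configuration. These messages can have other causes too, so a match is a hint that the performance counters may be disabled, not a diagnosis.

```python
# Message fragments quoted in the text above.
PERF_COUNTER_SYMPTOMS = (
    "Unexpected performance counter size",
    "unable to spawn the ProcD",
    "loadavg thread died",
)


def perf_counter_symptoms(lines):
    """Return the lines that contain one of the known symptom messages."""
    return [line for line in lines if any(s in line for s in PERF_COUNTER_SYMPTOMS)]


if __name__ == "__main__":
    import sys

    # Pass daemon log files on the command line, for example MasterLog StartLog.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for line in perf_counter_symptoms(log):
                print("%s: %s" % (path, line.rstrip()))
```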
This error results when the VBScript engine is not registered. Since Condor's installer depends on the VBScript engine for custom steps, the installer will fail if it cannot find the VBScript engine.
The fix is to register the VBScript engine. With administrative privileges, run:
regsvr32 vbscript.dll
If successful, the message
DllRegisterServer in vbscript.dll succeeded.
is printed.
Condor assumes that all floating point numbers are of the form x.y, which, depending on a computer's current locale, may not always be the case. This problem occurs even if Condor is running under an account that has had the locale configured correctly. The problem lies in the template user account that is used to create Condor's dynamic accounts. Even if the entire system is configured to use a new locale, this template account seems to retain the original system locale. The following steps can be used to fix this problem.
To create a default user profile, you must be logged on as Administrator or be a member of the Administrators group. Create a new user profile on which all new user accounts on the computer will be based. To create subsequent profiles, you can use the new user account as a template. Here is how to use the new user profile as the template for new users' profiles. The default user profile directory is one of the following, depending on the version of Windows:
%WinDir%\Profiles\Default; %SystemDrive%\Documents and Settings\Default; %SystemDrive%\Users\Default.
If Condor has already created some dynamic accounts, you will need to remove them so that Condor can re-create them with the new template account.