Next: 6.5 Troubleshooting Up: 6. Frequently Asked Questions Previous: 6.3 Running Condor Jobs

6.4 Condor on Windows NT / Windows 2000

6.4.1 Will Condor work on a network of mixed Unix and NT machines?

You can have a Condor pool that consists of both Unix and NT machines.

Your central manager can be either Windows NT or Unix. For example, even if you had a pool consisting strictly of Unix machines, you could use an NT box for your central manager, and vice versa.

You can submit jobs destined to run on Windows NT from either an NT machine or a Unix machine. However, you cannot currently submit jobs destined to run on Unix from an NT machine. We plan to add this functionality.

So, in summary:

1. A single Condor pool can consist of both Windows NT and Unix machines.

2. It does not matter at all if your Central Manager is Unix or NT.

3. Unix machines can submit jobs to run on other Unix or Windows NT machines.

4. Windows NT machines can only submit jobs which will run on Windows NT machines.
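
The rules above amount to a small compatibility check. The following Python sketch restates them; the function name can_submit and the platform labels "unix" and "nt" are illustrative only and are not part of Condor.

```python
def can_submit(submit_platform: str, execute_platform: str) -> bool:
    """Return True if a job submitted from one platform can run on the other."""
    platforms = {"unix", "nt"}
    if submit_platform not in platforms or execute_platform not in platforms:
        raise ValueError("platform must be 'unix' or 'nt'")
    # Unix submit machines can target Unix or NT execute machines.
    if submit_platform == "unix":
        return True
    # NT submit machines can only target NT execute machines.
    return execute_platform == "nt"
```

The platform of the central manager does not appear in the check, since it does not affect which jobs can be submitted.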

6.4.2 When I run condor_status I get a communication error, or the Condor daemon log files report a failure to bind.

Condor uses the first network interface it sees on your machine. This problem usually means you have an extra, inactive network interface (such as a RAS dialup interface) defined before your regular network interface.

To solve this problem, either change the order of your network interfaces in the Control Panel, or explicitly set which network interface Condor should use by adding the following parameter to your Condor config file:

NETWORK_INTERFACE = ip-address

Where ``ip-address'' is the IP address of the interface you wish Condor to use.

6.4.3 My job starts but exits right away with status 128.

 

This can occur when the machine your job is running on is missing a DLL (Dynamic Link Library) required by your program. The solution is to find the DLL file the program needs and add it to the TRANSFER_INPUT_FILES list in the job's submit file.

To find out what DLLs your program depends on, right-click the program in Explorer, choose Quickview, and look under ``Import List''.

 

6.4.4 How can I access network files with Condor on NT?

Features to allow Condor NT to work well with a network file server are coming very soon. However, there are a couple of work-arounds which you can do immediately with the current version of Condor NT in order to access a file server.

The heart of the problem is that on the execute machine, Condor creates a "temporary" user which will run the job, and your file server has never heard of this user before. So the workaround is to access the file server as a different user, as guest, through a NULL session share, or through a contrib module that runs the job under a domain account.

All of these workarounds have disadvantages, but they may be able to hold you until our code to support shared file servers in Condor is officially released.

Here are the four methods in more detail:

METHOD A - access the file server as a different user via a net use command with a login and password

Example: you want to copy a program from a server and then run it.

   @echo off
   net use \\myserver\someshare MYPASSWORD /USER:MYLOGIN
   copy \\myserver\someshare\my-program.exe
   my-program.exe

The idea here is to simply authenticate to the file server with a different login than the temporary Condor login. This is easy with the "net use" command as shown above. Of course, the obvious disadvantage is this user's password is stored and transferred as cleartext.

METHOD B - access the file server as guest

Example: you want to copy a file off of a server before running it as GUEST

   @echo off
   net use \\myserver\someshare
   copy \\myserver\someshare\my-program.exe
   my-program.exe

In this example, you'd contact the server MYSERVER as the Condor temporary user. However, if you have the GUEST account enabled on MYSERVER, you will be authenticated to the server as user "GUEST". If your file permissions (ACLs) are set up so that either user GUEST or group EVERYONE has access to the share "someshare" and the directories and files that live there, you can use this method. The downside of this method is that you need to enable the GUEST account on your file server. WARNING: This should be done *with extreme caution* and only if your file server is well protected behind a firewall that blocks SMB traffic.

METHOD C - access the file server with a "NULL" descriptor

One more option is to use NULL Security Descriptors. With this method, you specify which shares are accessible by NULL Descriptor by adding them to your registry. You can then use a batch file wrapper like:

net use z: \\myserver\someshare /USER:""
z:\my-program.exe

so long as 'someshare' is in the list of allowed NULL session shares. To edit this list, run regedit.exe and navigate to the key:

HKEY_LOCAL_MACHINE\
   SYSTEM\
     CurrentControlSet\
       Services\
         LanmanServer\
           Parameters\
             NullSessionShares

and edit it. Unfortunately, it is a binary value, so you'll need to type in the hex ASCII codes to spell out your share. Each share is separated by a null (0x00), and the last in the list is terminated with two nulls.
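
To avoid working out the hex codes by hand, the byte layout described above can be generated with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it only prints the bytes to type into the registry editor. It does not modify the registry.

```python
def encode_null_session_shares(shares):
    """Encode share names as ASCII, null-separated, ending in two nulls."""
    if not shares:
        raise ValueError("at least one share name is required")
    encoded = [name.encode("ascii") for name in shares]
    # A single null separates entries; two nulls terminate the list.
    return b"\x00".join(encoded) + b"\x00\x00"


def to_hex(data):
    """Format bytes as space-separated two-digit hex codes."""
    return " ".join(f"{byte:02X}" for byte in data)
```

For the share used in the example, to_hex(encode_null_session_shares(["someshare"])) gives 73 6F 6D 65 73 68 61 72 65 00 00.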

Although a little more difficult to set up, this method of sharing is a relatively safe way to have one quasi-public share without opening the whole guest account. You can control specifically which shares can be accessed via the registry value mentioned above.

METHOD D - access with the contrib module from Bristol

Another option: some hardcore Condor users at Bristol University developed their own module for starting jobs under Condor NT so that the jobs can access file servers. It involves storing the submitting users' passwords on a centralized server. The contrib module should appear on our website within a week or two. If you want it before that, let me know, and I can email it to you.

Here is the README from the Bristol Condor NT contrib module:

README
Compilation Instructions
Build the projects in the following order

CondorCredSvc
CondorAuthSvc
Crun
Carun
AfsEncrypt
RegisterService
DeleteService
Only the first 3 need to be built in order. This just makes sure that the 
RPC stubs are correctly rebuilt if required. The last 2 are only helper 
applications to install/remove the services. All projects are Visual Studio 
6 projects. The nmakefiles have been exported for each. Only the project 
for Carun should need to be modified to change the location of the AFS 
libraries if needed.

Details
CondorCredSvc
CondorCredSvc is a simple RPC service that serves the domain account 
credentials. It reads the account name and password from the registry of 
the machine it's running on. At the moment these details are stored in 
clear text under the key

HKEY_LOCAL_MACHINE\Software\Condor\CredService

The account name and password are held in REG_SZ values "Account" and 
"Password" respectively. In addition there is an optional REG_SZ value 
"Port" which holds the clear text port number (e.g. "1234"). If this value 
is not present the service defaults to using port 3654.

At the moment there is no attempt to encrypt the username/password when it 
is sent over the wire - but this should be reasonably straightforward to 
change. This service can sit on any machine so keeping the registry entries 
secure ought to be fine. Certainly the ACL on the key could be set to only 
allow administrators and SYSTEM access.

CondorAuthSvc and Crun
These two programs do the hard work of getting the job authenticated and 
running in the right place. CondorAuthSvc actually handles the process 
creation while Crun deals with getting the winstation/desktop/working 
directory and grabbing the console output from the job so that Condor's 
output handling mechanisms still work as advertised. Probably the easiest 
way to see how the two interact is to run through the job creation process:

The first thing to realize is that condor itself only runs Crun.exe. Crun 
treats its command line parameters as the program to really run. e.g. "Crun 
\\mymachine\myshare\myjob.exe" actually causes 
\\mymachine\myshare\myjob.exe to be executed in the context of the domain 
account served by CondorCredSvc. This is how it works:

When Crun starts up it gets its window station and desktop - these are the 
ones created by condor. It also gets its current directory - again already 
created by condor. It then makes sure that SYSTEM has permission to modify 
the DACL on the window station, desktop and directory. Next it creates a 
shared memory section and copies its environment variable block into it. 
Then, so that it can get hold of STDOUT and STDERR from the job it makes 
two named pipes on the machine it's running on and attaches a thread to 
each which just prints out anything that comes in on the pipe to the 
appropriate stream. These pipes currently have a NULL DACL, but only one 
instance of each is allowed so there shouldn't be any issues involving 
malicious people putting garbage into them. The shared memory section and 
both named pipes are tagged with the ID of Crun's process in case we're on 
a multi-processor machine that might be running more than one job. Crun 
then makes an RPC call to CondorAuthSvc to actually start the job, passing 
the names of the window station, desktop, executable to run, current 
directory, pipes and shared memory section (it only attempts to call 
CondorAuthSvc on the same machine as it is running on). If the job starts 
successfully it gets the process ID back from the RPC call and then just 
waits for the new process to finish before closing the pipes and exiting. 
Technically, it does this by synchronizing on a handle to the process and 
waiting for it to exit. CondorAuthSvc sets the ACL on the process to allow 
EVERYONE  to synchronize on it.

[ Technical note: Crun adds "C:\WINNT\SYSTEM32\CMD.EXE /C" to the start of 
the command line. This is because the process is created with the network 
context of the caller i.e. LOCALSYSTEM. Prepending cmd.exe gets round any 
unexpected "Access Denied" errors. ]
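
As an illustration of the technical note above, the following Python sketch shows how the final command line could be assembled from Crun's own arguments. It is not taken from the Crun source: the function name is hypothetical, and joining the arguments with single spaces (with no quoting of arguments that contain spaces) is an assumption.

```python
# Prefix quoted in the technical note above.
CMD_PREFIX = r"C:\WINNT\SYSTEM32\CMD.EXE /C"


def build_job_command_line(crun_args):
    """Prepend the cmd.exe prefix to the program and arguments given to Crun."""
    if not crun_args:
        raise ValueError("no program was given on the command line")
    # Assumption: arguments are joined with single spaces and passed through as-is.
    return CMD_PREFIX + " " + " ".join(crun_args)
```

For example, the arguments of "Crun \\mymachine\myshare\myjob.exe" would become "C:\WINNT\SYSTEM32\CMD.EXE /C \\mymachine\myshare\myjob.exe".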

If Crun gets a WM_CLOSE (CTRL_CLOSE_EVENT) while the job is running it 
attempts to stop the job, again with an RPC call to CondorAuthSvc passing 
the job's process ID.

CondorAuthSvc runs as a service under the LOCALSYSTEM account and does the 
work of starting the job. By default it listens on port 3655, but this can 
be changed by setting the optional REG_SZ value "Port" under the registry key

HKEY_LOCAL_MACHINE\Software\Condor\AuthService

(Crun also checks this registry key when attempting to contact 
CondorAuthSvc.) When it gets the RPC to start a job CondorAuthSvc first 
connects to the pipes for STDOUT and STDERR to prevent anyone else sending 
data to them. It also opens the shared memory section with the environment 
stored by Crun.  It then makes an RPC call to CondorCredSvc (to get the 
name and password of the domain account) which is most likely running on 
another system. The location information is stored in the registry under 
the key

HKEY_LOCAL_MACHINE\Software\Condor\CredService

The name of the machine running CondorCredSvc must be held in the REG_SZ 
value "Host". This should be the fully qualified domain name of the 
machine. You can also specify the optional "Port" REG_SZ value in case you 
are running CondorCredSvc on a different port.
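
The lookup rules for the CredService key can be summarized in a short Python sketch. It assumes the REG_SZ values have already been read into a dictionary; the function name and the dictionary representation are illustrative and are not part of the contrib module.

```python
DEFAULT_CRED_PORT = 3654  # default port for CondorCredSvc stated in this README


def cred_service_endpoint(values):
    """Return (host, port) from the REG_SZ values under the CredService key."""
    host = values.get("Host")
    if not host:
        # "Host" is required; it should be the fully qualified domain name.
        raise KeyError("the 'Host' value is required")
    # "Port" is optional and is stored as clear text, e.g. "1234".
    port_text = values.get("Port")
    port = int(port_text) if port_text else DEFAULT_CRED_PORT
    return host, port
```

On a Windows machine, the dictionary could be filled from the registry with Python's standard winreg module; that step is left out here so the sketch runs on any platform.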

Once the domain account credentials have been received the account is 
logged on through a call to LogonUser. The DACLs on the window station, 
desktop and current directory are then modified to allow the domain account 
access to them and the job is started in that window station and desktop 
with a call to CreateProcessAsUser. The starting directory is set to the 
same as sent by Crun, STDOUT and STDERR handles are set to the named pipes 
and the environment sent by Crun is used. CondorAuthSvc also starts a 
thread which waits on the new process handle until it terminates to close 
the named pipes. If the process starts correctly the process ID is returned 
to Crun.

If Crun requests that the job be stopped (again via RPC), CondorAuthSvc 
loops over all windows on the window station and desktop specified until it 
finds the one associated with the required process ID. It then sends that 
window a WM_CLOSE message, so any termination handling built in to the job 
should work correctly.

[Security Note: CondorAuthSvc currently makes no attempt to verify the 
origin of the call starting the job. This is, in principle, a bad thing 
since if the format of the RPC call is known it could let anyone start a 
job on the machine in the context of the domain user. If sensible security 
practices have been followed and the ACLs on sensitive system directories 
(such as C:\WINNT) do not allow write access to anyone other than trusted 
users the problem should not be too serious.]

Carun and AFSEncrypt
Carun and AFSEncrypt are a couple of utilities to allow jobs to access AFS 
without any special recompilation. AFSEncrypt encrypts an AFS 
username/password into a file (called .afs.xxx) using a simple XOR 
algorithm. It's not a particularly secure way to do it, but it's simple and 
self-inverse. Carun reads this file and gets an AFS token before running 
whatever job is on its command line as a child process. It waits on the 
process handle and a 24 hour timer. If the timer expires first it briefly 
suspends the primary thread of the child process and attempts to get a new 
AFS token before restarting the job, the idea being that the job should 
have uninterrupted access to AFS if it runs for more than 25 hours (the 
default token lifetime). As a security measure, the AFS credentials are 
cached by Carun in memory and the .afs.xxx file deleted as soon as the 
username/password have been read for the first time.
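
The "self-inverse" property mentioned above means that applying the same XOR operation twice returns the original data, so one routine serves for both encryption and decryption. The Python sketch below demonstrates the property only. The key and the function name are hypothetical, and the actual key and .afs.xxx file layout used by AFSEncrypt are not described in this README.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical credentials and key, for illustration only.
secret = b"username/password"
key = b"k3y"
scrambled = xor_bytes(secret, key)
restored = xor_bytes(scrambled, key)  # the same call undoes the first one
```

As the README notes, this is not a particularly secure scheme: anyone who knows or guesses the key can recover the credentials.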

Carun needs the machine to be running either the IBM AFS client or the 
OpenAFS client to work. It also needs the client libraries if you want to 
rebuild it.

For example, if you wanted to get a list of your AFS tokens under Condor 
you would run the following:

Crun \\mymachine\myshare\Carun tokens.exe

Running a job
To run a job using this mechanism specify the following in your job 
submission (assuming Crun is in C:\CondorAuth):

Executable = c:\CondorAuth\Crun.exe
Arguments = \\mymachine\myshare\carun.exe \\anothermachine\anothershare\myjob.exe
Transfer_Input_Files = .afs.xxx

along with your usual settings.

Installation
A basic installation script for use with the Inno Setup installation 
package compiler can be found in the Install folder.

6.4.5 Does Condor run under Windows 2000?

Condor does run under Windows 2000 Professional.

Condor does not run under Windows 2000 Server.

There will be problems if you have more than 2 Gigabytes of RAM or swap space.

A Personal Condor installation will not work.


condor-admin@cs.wisc.edu