Database scheme

Database scheme

Database scheme (red lines seem PRIMARY KEY, blue lines seem INDEX)

Note : all dates and duration are stored in an integer manner (number of seconds since the EPOCH).

accounting

Fields

Types

Descriptions

window_start

INT UNSIGNED

start date of the accounting interval

window_stop

INT UNSIGNED

stop date of the accounting interval

accounting_user

VARCHAR(20)

user name

accounting_project

VARCHAR(255)

name of the related project

queue_name

VARCHAR(100)

queue name

consumption_type

ENUM(“ASKED”, “USED”)

“ASKED” corresponds to the walltimes specified by the user. “USED” corresponds to the effective time used by the user.

consumption

INT UNSIGNED

number of seconds used

Primary key

window_start, window_stop, accounting_user, queue_name, accounting_project, consumption_type

Index fields

window_start, window_stop, accounting_user, queue_name, accounting_project, consumption_type

This table is a summary of the consumption for each user on each queue. This increases the speed of queries about user consumptions and statistic generation.

Data are inserted through the command oaraccounting (when a job is treated the field accounted in table jobs is passed into “YES”). So it is possible to regenerate this table completely in this way :

  • Delete all data of the table:

    DELETE FROM accounting;
    
  • Set the field accounted in the table jobs to “NO” for each row:

    UPDATE jobs SET accounted = "NO";
    
  • Run the oaraccounting command.

You can change the amount of time for each window : edit the oar configuration file and change the value of the tag ACCOUNTING_WINDOW.

schema

Fields

Types

Descriptions

version

VARCHAR(255)

database schema version number

name

VARCHAR(255)

optional name

This table is used to store the version of the database schema.

So the oar-database command be used to automatically upgrade the schema from any version with:

oar-database --setup

admission_rules

Fields

Types

Descriptions

id

INT UNSIGNED

id number

rule

TEXT

rule written in Perl applied when a job is going to be registered

Primary key

id

Index fields

None

You can use these rules to change some values of some properties when a job is submitted. So each admission rule is executed in the order of the id field and it can set several variables. If one of them exits then the others will not be evaluated and oarsub returns an error.

The rules can be added with the following command:

oaradmissionrules -n

Some examples are better than a long description:

  • Specify the default value for queue parameter

    if (not defined($queue_name)) {
        $queue_name="default";
    }
    
  • Avoid users except oar to go in the admin queue

    if (($queue_name eq "admin") && ($user ne "oar")) {
      die("[ADMISSION RULE] Only oar user can submit jobs in the admin queue\n");
    }
    
  • Restrict the maximum of the walltime for interactive jobs

    my $max_walltime = OAR::IO::sql_to_duration("12:00:00");
    if ($jobType eq "INTERACTIVE"){
      foreach my $mold (@{$ref_resource_list}){
        if (
          (defined($mold->[1])) and
          ($max_walltime < $mold->[1])
        ){
          print("[ADMISSION RULE] Walltime to big for an INTERACTIVE job so it is set to $max_walltime.\n");
          $mold->[1] = $max_walltime;
        }
      }
    }
    
  • Specify the default walltime

    my $default_wall = OAR::IO::sql_to_duration("2:00:00");
    foreach my $mold (@{$ref_resource_list}){
      if (!defined($mold->[1])){
        print("[ADMISSION RULE] Set default walltime to $default_wall.\n");
        $mold->[1] = $default_wall;
      }
    }
    
  • How to perform actions if the user name is in a file

    open(FILE, "/tmp/users.txt");
    while (($queue_name ne "admin") and ($_ = <FILE>)){
      if ($_ =~ m/^\\s*$user\\s*$/m){
        print("[ADMISSION RULE] Change assigned queue into admin\n");
        $queue_name = "admin";
      }
    }
    close(FILE);
    
  • How to automatically add a job type depending of the walltime and an estimation of the number of resources of the job

    foreach my $e (estimate_job_nb_resources($dbh_ro, $ref_resource_list, $jobproperties)){
      #print("AREA: $e->{nbresources} x $e->{walltime} = ".$e->{nbresources} * $e->{walltime}."\n");
      if ($e->{nbresources} * $e->{walltime} > 24*3600*1){
        print("[ADMISSION RULE] Your job is of the 'big' type\n");
        push(@{$type_list},"big");
        last;
      }
    }
    

You can print all the admission rules with:

oaradmissionrules -S -f

event_logs

Fields

Types

Descriptions

event_id

INT UNSIGNED

event identifier

type

VARCHAR(50)

event type

job_id

INT UNSIGNED

job related of the event

date

INT UNSIGNED

event date

description

VARCHAR(255)

textual description of the event

to_check

ENUM(‘YES’, ‘NO’)

specify if the module NodeChangeState must check this event to Suspect or not some nodes

Primary key

event_id

Index fields

type, to_check

The different event types are:

  • “PING_CHECKER_NODE_SUSPECTED” : the system detected via the module “finaud” that a node is not responding.

  • “PROLOGUE_ERROR” : an error occurred during the execution of the job prologue (exit code != 0).

  • “EPILOGUE_ERROR” : an error occurred during the execution of the job epilogue (exit code != 0).

  • “CANNOT_CREATE_TMP_DIRECTORY” : OAR cannot create the directory where all information files will be stored.

  • “CAN_NOT_WRITE_NODE_FILE” : the system was not able to write file which had to contain the node list on the first node (/tmp/OAR_job_id).

  • “CAN_NOT_WRITE_PID_FILE” : the system was not able to write the file which had to contain the pid of oarexec process on the first node (/tmp/pid_of_oarexec_for_job_id).

  • “USER_SHELL” : the system was not able to get informations about the user shell on the first node.

  • “EXIT_VALUE_OAREXEC” : the oarexec process terminated with an unknown exit code.

  • “SEND_KILL_JOB” : signal that OAR has transmitted a kill signal to the oarexec of the specified job.

  • “LEON_KILL_BIPBIP_TIMEOUT” : Leon module has detected that something wrong occurred during the kill of a job and so kill the local bipbip process.

  • “EXTERMINATE_JOB” : Leon module has detected that something wrong occurred during the kill of a job and so clean the database and terminate the job artificially.

  • “WORKING_DIRECTORY” : the directory from which the job was submitted does not exist on the node assigned by the system.

  • “OUTPUT_FILES” : OAR cannot write the output files (stdout and stderr) in the working directory.

  • “CANNOT_NOTIFY_OARSUB” : OAR cannot notify the oarsub process for an interactive job (maybe the user has killed this process).

  • “WALLTIME” : the job has reached its walltime.

  • “SCHEDULER_REDUCE_NB_NODES_FOR_RESERVATION” : this means that there is not enough nodes for the reservation and so the scheduler do the best and gives less nodes than the user wanted (this occurres when nodes become Suspected or Absent).

  • “BESTEFFORT_KILL” : the job is of the type besteffort and was killed because a normal job wanted the nodes.

  • “FRAG_JOB_REQUEST” : someone wants to delete a job.

  • “CHECKPOINT” : the checkpoint signal was sent to the job.

  • “CHECKPOINT_ERROR” : OAR cannot send the signal to the job.

  • “CHECKPOINT_SUCCESS” : system has sent the signal correctly.

  • “SERVER_EPILOGUE_TIMEOUT” : epilogue server script has time outed.

  • “SERVER_EPILOGUE_EXIT_CODE_ERROR” : epilogue server script did not return 0.

  • “SERVER_EPILOGUE_ERROR” : cannot find epilogue server script file.

  • “SERVER_PROLOGUE_TIMEOUT” : prologue server script has time outed.

  • “SERVER_PROLOGUE_EXIT_CODE_ERROR” : prologue server script did not return 0.

  • “SERVER_PROLOGUE_ERROR” : cannot find prologue server script file.

  • “CPUSET_CLEAN_ERROR” : OAR cannot clean correctly cpuset files for a job on the remote node.

  • “MAIL_NOTIFICATION_ERROR” : a mail cannot be sent.

  • “USER_MAIL_NOTIFICATION” : user mail notification cannot be performed.

  • “USER_EXEC_NOTIFICATION_ERROR” : user script execution notification cannot be performed.

  • “BIPBIP_BAD_JOBID” : error when retrieving informations about a running job.

  • “BIPBIP_CHALLENGE” : OAR is configured to detach jobs when they are launched on compute nodes and the job return a bad challenge number.

  • “RESUBMIT_JOB_AUTOMATICALLY” : the job was automatically resubmitted.

  • “WALLTIME” : the job reached its walltime.

  • “REDUCE_RESERVATION_WALLTIME” : the reservation job was shrunk.

  • “SSH_TRANSFER_TIMEOUT” : node OAR part script was too long to transfer.

  • “BAD_HASHTABLE_DUMP” : OAR transfered a bad hashtable.

  • “LAUNCHING_OAREXEC_TIMEOUT” : oarexec was too long to initialize itself.

  • “RESERVATION_NO_NODE” : All nodes were detected as bad for the reservation job.

event_log_hostnames

Fields

Types

Descriptions

event_id

INT UNSIGNED

event identifier

hostname

VARCHAR(255)

name of the node where the event has occured

Primary key

event_id

Index fields

hostname

This table stores hostnames related to events like “PING_CHECKER_NODE_SUSPECTED”.

files

Fields

Types

Descriptions

idFile

INT UNSIGNED

md5sum

VARCHAR(255)

location

VARCHAR(255)

method

VARCHAR(255)

compression

VARCHAR(255)

size

INT UNSIGNED

Primary key

idFile

Index fields

md5sum

frag_jobs

Fields

Types

Descriptions

frag_id_job

INT UNSIGNED

job id

frag_date

INT UNSIGNED

kill job decision date

frag_state

ENUM(‘LEON’, ‘TIMER_ARMED’ , ‘LEON_EXTERMINATE’, ‘FRAGGED’) DEFAULT ‘LEON’

state to tell Leon what to do

Primary key

frag_id_job

Index fields

frag_state

What do these states mean:

  • “LEON” : the Leon module must try to kill the job and change the state into “TIMER_ARMED”.

  • “TIMER_ARMED” : the Sarko module must wait a response from the job during a timeout (default is 60s)

  • “LEON_EXTERMINATE” : the Sarko module has decided that the job time outed and asked Leon to clean up the database.

  • “FRAGGED” : job is fragged.

gantt_jobs_resources

Fields

Types

Descriptions

moldable_job_id

INT UNSIGNED

moldable job id

resource_id

INT UNSIGNED

resource assigned to the job

Primary key

moldable_job_id, resource_id

Index fields

None

This table specifies which resources are attributed to which jobs.

gantt_jobs_resources_visu

Fields

Types

Descriptions

moldable_job_id

INT UNSIGNED

moldable job id

resource_id

INT UNSIGNED

resource assigned to the job

Primary key

moldable_job_id, resource_id

Index fields

None

This table is the same as gantt_jobs_resources and is used by visualisation tools. It is updated atomically (a lock is used).

gantt_jobs_predictions

Fields

Types

Descriptions

moldable_job_id

INT UNSIGNED

job id

start_time

INT UNSIGNED

date when the job is scheduled to start

Primary key

moldable_job_id

Index fields

None

With this table and gantt_jobs_resources you can know exactly what are the decisions taken by the schedulers for each waiting jobs.

note

The special job id “0” is used to store the scheduling reference date.

gantt_jobs_predictions_visu

Fields

Types

Descriptions

moldable_job_id

INT UNSIGNED

job id

start_time

INT UNSIGNED

date when the job is scheduled to start

Primary key

job_id

Index fields

None

This table is the same as gantt_jobs_predictions and is used by visualisation tools. It is made up to date in an atomic action (with a lock).

jobs

Fields

Types

Descriptions

job_id

INT UNSIGNED

job identifier

array_id

INT

array identifier

array_index

INT

index of the job in the array

initial_request

TEXT

oarsub initial arguments

job_name

VARCHAR(100)

name given by the user

cpuset_name

VARCHAR(255)

name of the cpuset directory used for this job on each nodes

job_type

ENUM(‘INTERACTIVE’, ‘PASSIVE’) DEFAULT ‘PASSIVE’

specify if the user wants to launch a program or get an interactive shell

info_type

VARCHAR(255)

some informations about oarsub command

state

ENUM(‘Waiting’,’Hold’, ‘toLaunch’, ‘toError’, ‘toAckReservation’, ‘Launching’, ‘Running’ ‘Suspended’, ‘Resuming’, , ‘Finishing’, ‘Terminated’, ‘Error’)

job state

reservation

ENUM(‘None’, ‘toSchedule’, ‘Scheduled’) DEFAULT ‘None’

specify if the job is a reservation and the state of this one

message

VARCHAR(255)

readable information message for the user

job_user

VARCHAR(255)

user name

command

TEXT

program to run

queue_name

VARCHAR(100)

queue name

properties

TEXT

properties that assigned nodes must match

launching_directory

TEXT

path of the directory where to launch the user process

submission_time

INT UNSIGNED

date when the job was submitted

start_time

INT UNSIGNED

date when the job was launched

stop_time

INT UNSIGNED

date when the job was stopped

file_id

INT UNSIGNED

accounted

ENUM(“YES”, “NO”) DEFAULT “NO”

specify if the job was considered by the accounting mechanism or not

notify

VARCHAR(255)

gives the way to notify the user about the job (mail or script )

assigned_moldable_job

INT UNSIGNED

moldable job chosen by the scheduler

checkpoint

INT UNSIGNED

number of seconds before the walltime to send the checkpoint signal to the job

checkpoint_signal

INT UNSIGNED

signal to use when checkpointing the job

stdout_file

TEXT

file name where to redirect program STDOUT

stderr_file

TEXT

file name where to redirect program STDERR

resubmit_job_id

INT UNSIGNED

if a job is resubmitted then the new one store the previous

project

VARCHAR(255)

arbitrary name given by the user or an admission rule

suspended

ENUM(“YES”,”NO”)

specify if the job was suspended (oarhold)

job_env

TEXT

environment variables to set for the job

exit_code

INT DEFAULT 0

exit code for passive jobs

job_group

VARCHAR(255)

not used

Primary key

job_id

Index fields

state, reservation, queue_name, accounted, suspended

Explications about the “state” field:

  • “Waiting” : the job is waiting OAR scheduler decision.

  • “Hold” : user or administrator wants to hold the job (oarhold command). So it will not be scheduled by the system.

  • “toLaunch” : the OAR scheduler has attributed some nodes to the job. So it will be launched.

  • “toError” : something wrong occurred and the job is going into the error state.

  • “toAckReservation” : the OAR scheduler must say “YES” or “NO” to the waiting oarsub command because it requested a reservation.

  • “Launching” : OAR has launched the job and will execute the user command on the first node.

  • “Running” : the user command is executing on the first node.

  • “Suspended” : the job was in Running state and there was a request (oarhold with “-r” option) to suspend this job. In this state other jobs can be scheduled on the same resources (these resources has the “suspended_jobs” field to “YES”).

  • “Finishing” : the user command has terminated and OAR is doing work internally

  • “Terminated” : the job has terminated normally.

  • “Error” : a problem has occurred.

Explications about the “reservation” field:

  • “None” : the job is not a reservation.

  • “toSchedule” : the job is a reservation and must be approved by the scheduler.

  • “Scheduled” : the job is a reservation and is scheduled by OAR.

job_dependencies

Fields

Types

Descriptions

job_id

INT UNSIGNED

job identifier

job_id_required

INT UNSIGNED

job needed to be completed before launching job_id

Primary key

job_id, job_id_required

Index fields

job_id, job_id_required

This table is feeded by oarsub command with the “-a” option.

moldable_job_descriptions

Fields

Types

Descriptions

moldable_id

INT UNSIGNED

moldable job identifier

moldable_job_id

INT UNSIGNED

corresponding job identifier

moldable_walltime

INT UNSIGNED

instance duration

Primary key

moldable_id

Index fields

moldable_job_id

A job can be described with several instances. Thus OAR scheduler can choose one of them. For example it can calculate which instance will finish first. So this table stores all instances for all jobs.

job_resource_groups

Fields

Types

Descriptions

res_group_id

INT UNSIGNED

group identifier

res_group_moldable_id

INT UNSIGNED

corresponding moldable job identifier

res_group_property

TEXT

SQL constraint properties

Primary key

res_group_id

Index fields

res_group_moldable_id

As you can specify job global properties with oarsub and the “-p” option, you can do the same thing for each resource groups that you define with the “-l” option.

job_resource_descriptions

Fields

Types

Descriptions

res_job_group_id

INT UNSIGNED

corresponding group identifier

res_job_resource_type

VARCHAR(255)

resource type (name of a field in resources)

res_job_value

INT

wanted resource number

res_job_order

INT UNSIGNED

order of the request

Primary key

res_job_group_id, res_job_resource_type, res_job_order

Index fields

res_job_group_id

This table store the hierarchical resource description given with oarsub and the “-l” option.

job_state_logs

Fields

Types

Descriptions

job_state_log_id

INT UNSIGNED

identifier

job_id

INT UNSIGNED

corresponding job identifier

job_state

ENUM(‘Waiting’, ‘Hold’, ‘toLaunch’, ‘toError’, ‘toAckReservation’, ‘Launching’, ‘Finishing’, ‘Running’, ‘Suspended’, ‘Resuming’, ‘Terminated’, ‘Error’)

job state during the interval

date_start

INT UNSIGNED

start date of the interval

date_stop

INT UNSIGNED

end date of the interval

Primary key

job_state_log_id

Index fields

job_id, job_state

This table keeps informations about state changes of jobs.

job_types

Fields

Types

Descriptions

job_type_id

INT UNSIGNED

identifier

job_id

INT UNSIGNED

corresponding job identifier

type

VARCHAR(255)

job type like “deploy”, “timesharing”, …

type_index

ENUM(‘CURRENT’, ‘LOG’)

index field

Primary key

job_type_id

Index fields

job_id, type

This table stores job types given with the oarsub command and “-t” options.

resources

Fields

Types

Descriptions

resource_id

INT UNSIGNED

resource identifier

type

VARCHAR(100) DEFAULT “default”

resource type (used for licence resources for example)

network_address

VARCHAR(100)

node name (used to connect via SSH)

state

ENUM(‘Alive’, ‘Dead’ , ‘Suspected’, ‘Absent’)

resource state

next_state

ENUM(‘UnChanged’, ‘Alive’, ‘Dead’, ‘Absent’, ‘Suspected’) DEFAULT ‘UnChanged’

state for the resource to switch

finaud_decision

ENUM(‘YES’, ‘NO’) DEFAULT ‘NO’

tell if the actual state results in a “finaud” module decision

next_finaud_decision

ENUM(‘YES’, ‘NO’) DEFAULT ‘NO’

tell if the next node state results in a “finaud” module decision

state_num

INT

corresponding state number (useful with the SQL “ORDER” query)

suspended_jobs

ENUM(‘YES’,’NO’)

specify if there is at least one suspended job on the resource

scheduler_priority

INT UNSIGNED

arbitrary number given by the system to select resources with more intelligence

switch

VARCHAR(50)

name of the switch

cpu

INT UNSIGNED

global cluster cpu number

cpuset

INT UNSIGNED

field used with the JOB_RESOURCE_MANAGER_PROPERTY_DB_FIELD

besteffort

ENUM(‘YES’,’NO’)

accept or not besteffort jobs

deploy

ENUM(‘YES’,’NO’)

specify if the resource is deployable

expiry_date

INT UNSIGNED

field used for the desktop computing feature

desktop_computing

ENUM(‘YES’,’NO’)

tell if it is a desktop computing resource (with an agent)

last_job_date

INT UNSIGNED

store the date when the resource was used for the last time

available_upto

INT UNSIGNED

used with compute mode features to know if an Absent resource can be switch on

Primary key

resource_id

Index fields

state, next_state, type, suspended_jobs

State explications:

  • “Alive” : the resource is ready to accept a job.

  • “Absent” : the oar administrator has decided to pull out the resource. This computer can come back.

  • “Suspected” : OAR system has detected a problem on this resource and so has suspected it (you can look in the event_logs table to know what has happened). This computer can come back (automatically if this is a “finaud” module decision).

  • “Dead” : The oar administrator considers that the resource will not come back and will be removed from the pool.

This table permits to specify different properties for each resources. These can be used with the oarsub command (“-p” and “-l” options).

You can add your own properties with oarproperty command.

These properties can be updated with the oarnodesetting command (“-p” option).

Several properties are added by default:

  • switch : you have to register the name of the switch where the node is plugged.

  • cpu : this is a unique name given to each cpus. This enables OAR scheduler to distinguish all cpus.

  • cpuset : this is the name of the cpu on the node. The Linux kernel sets this to an integer beginning at 0. This field is linked to the configuration tag JOB_RESOURCE_MANAGER_PROPERTY_DB_FIELD.

resource_logs

Fields

Types

Descriptions

resource_log_id

INT UNSIGNED

unique id

resource_id

INT UNSIGNED

resource identifier

attribute

VARCHAR(255)

name of corresponding field in resources

value

VARCHAR(255)

value of the field

date_start

INT UNSIGNED

interval start date

date_stop

INT UNSIGNED

interval stop date

finaud_decision

ENUM(‘YES’,’NO’)

store if this is a system change or a human one

Primary key

None

Index fields

resource_id, attribute

This table permits to keep a trace of every property changes (consequence of the oarnodesetting command with the “-p” option).

assigned_resources

Fields

Types

Descriptions

moldable_job_id

INT UNSIGNED

job id

resource_id

INT UNSIGNED

resource assigned to the job

Primary key

moldable_job_id, resource_id

Index fields

moldable_job_id

This table keeps informations for jobs on which resources they were scheduled.

queues

Fields

Types

Descriptions

queue_name

VARCHAR(100)

queue name

priority

INT UNSIGNED

the scheduling priority

scheduler_policy

VARCHAR(100)

path of the associated scheduler

state

ENUM(‘Active’, ‘notActive’) DEFAULT ‘Active’

permits to stop the scheduling for a queue

Primary key

queue_name

Index fields

None

This table contains the schedulers executed by the oar_meta_scheduler module. Executables are launched one after one in the specified priority.

challenges

Fields

Types

Descriptions

job_id

INT UNSIGNED

job identifier

challenge

VARCHAR(255)

challenge string

ssh_private_key

TEXT DEFAULT NULL

ssh private key given by the user (in grid usage it enables to connect onto all nodes of the job of all clusers with oarsh)

ssh_public_key

TEXT DEFAULT NULL

ssh public key

Primary key

job_id

Index fields

None

This table is used to share a secret between OAR server and oarexec process on computing nodes (avoid a job id being stolen/forged by malicious user).

For security reasons, this table must not be readable for a database account given to users who want to access OAR internal informations(like statistics).