diff -pNaru linux-2.4.27/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-2.4.27/Documentation/Configure.help	2004-08-07 16:26:04 -07:00
+++ linux/Documentation/Configure.help	2004-12-01 11:30:26 -08:00
@@ -22257,6 +22257,56 @@ CONFIG_MAGIC_SYSRQ
   keys are documented in <file:Documentation/sysrq.txt>. Don't say Y
   unless you really know what this hack does.
 
+Process Aggregates support
+CONFIG_PAGG
+  Say Y here if you will be loading modules which provide support
+  for process aggregate containers.  Currently, this option is only
+  applicable to Intel architectures. Examples of such modules include the
+  Linux Jobs module and the Linux Array Sessions module.  If you will not be
+  using such modules, say N.
+
+Process Aggregates Job support
+CONFIG_PAGG_JOB
+  The Job feature implements a type of process aggregate,
+  or grouping.  A job is the collection of all processes that
+  are descended from a point-of-entry process.  Examples of such
+  points-of-entry include telnet, rlogin, and console logins.
+  A job differs from a session and process group since the job
+  container (or group) is inescapable.  Only root level processes,
+  or those with the CAP_SYS_RESOURCE capability, can create new jobs
+  or escape from a job.
+
+  A job is identified by a unique job identifier (jid).  Currently,
+  that jid can be used to obtain status information about the job
+  and the processes it contains.  The jid can also be used to send
+  signals to all processes contained in the job.  In addition,
+  other processes can wait for the completion of a job - the event
+  where the last process contained in the job has exited.
+
+  If you want to compile support for jobs into the kernel, select
+  this entry using Y.  If you want the support for jobs provided as
+  a module, select this entry using M.  If you do not want support
+  for jobs, select N.
+
+CSA Job Accounting
+CONFIG_CSA
+  Comprehensive System Accounting (CSA) provides job level accounting
+  of resource usage.  The accounting records are written by the
+  kernel into a file.  CSA user level scripts and commands process
+  the binary accounting records and combine them by job identifier
+  within system boot uptime periods.  These accounting records are
+  then used to produce reports and charge fees to users.
+
+  Say Y here if you want job level accounting to be compiled into
+  the kernel.  Say M here if you want the part of this feature that
+  writes accounting records to be a loadable module.  Say N here if
+  you do not want job level accounting (the default).
+
+  The CSA commands and scripts package needs to be installed to
+  process the CSA accounting records.  See http://oss.sgi.com/projects/csa
+  for further information about CSA and download instructions for the CSA
+  commands package and documentation.
+
 ISDN support
 CONFIG_ISDN
   ISDN ("Integrated Services Digital Networks", called RNIS in France)
diff -pNaru linux-2.4.27/Documentation/job.txt linux/Documentation/job.txt
--- linux-2.4.27/Documentation/job.txt	1969-12-31 16:00:00 -08:00
+++ linux/Documentation/job.txt	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,104 @@
+Linux Jobs - A Process Aggregate (PAGG) Module
+----------------------------------------------
+
+1. Overview
+
+This document provides two additional sections.  Section 2 provides a
+listing of the manual page that describes the particulars of the Linux
+job implementation.  Section 3 provides some information about using 
+the user job library to interface to jobs.
+
+2. Job Man Page
+
+
+JOB(7)		       Linux User's Manual		   JOB(7)
+
+
+NAME
+       job - Linux Jobs kernel module overview
+
+DESCRIPTION
+       A job is a group of related processes all descended from a
+       point of entry process and  identified  by  a  unique  job
+       identifier  (jid).   A  job  can	 contain multiple process
+       groups or sessions, and all processes in one of these sub-
+       groups can only be contained within a single job.
+
+       The  primary  purpose  for  having  jobs is to provide job
+       based resource limits.  The  current  implementation  only
+       provides	 the  job  container  and resource limits will be
+       provided in a later implementation.  When  an  implementa-
+       tion  that provides job limits is available, this descrip-
+       tion will be expanded to provide	 further  explanation  of
+       job based limits.
+
+       Not  every  process  on the system is part of a job.  That
+       is, only processes which are started by a login	initiator
+       like  login, rlogin, rsh and so on, get assigned a job ID.
+       In the Linux environment, jobs are created via a PAM  mod-
+       ule.
+
+       Jobs on Linux are provided using a loadable kernel module.
+       Linux jobs have the following characteristics:
+
+       o   A job is an inescapable container.  A  process  cannot
+	   leave the job nor can a new process be created outside
+	   the job without explicit action,  that  is,	a  system
+	   call with root privilege.
+
+       o   Each	 new  process  inherits	 the jid and limits [when
+	   implemented] from its parent process.
+
+       o   All point of entry processes (job initiators) create a
+	   new	job  and  set  the  job limits [when implemented]
+	   appropriately.
+
+       o   Job initiation on Linux is performed via a PAM session
+	   module.
+
+       o   The job initiator performs authentication and security
+	   checks.
+
+       o   Users can raise and lower their own job limits  within
+	   maximum  values  specified by the system administrator
+	   [when implemented].
+
+       o   Not all processes on a system need be members of a job.
+
+       o   The process control initialization process (init(8))
+	   and startup scripts called by init are not part  of	a
+	   job.
+
+
+       Job initiators can be categorized as either interactive or
+       batch processes.	 Limit domain names are	 defined  by  the
+       system  administrator when the user limits database (ULDB)
+       is created.  [The ULDB will be implemented in  conjunction
+       with future job limits work.]
+
+       Note: The existing command jobs(1) applies to shell "jobs"
+       and is not related to the Linux kernel module jobs.  The
+       at(1), atd(8), atq(1), batch(1), atrun(8), and atrm(1) man
+       pages also use the term "job", but there it refers to a
+       shell script.
+
+SEE ALSO
+       job(1), jwait(1), jstat(1), jkill(1)
+
+
+
+
+
+
+
+
+
+3. User Job Library
+
+For developers who wish to write software that uses Linux Jobs, there is
+a user job library.  This library contains functions for obtaining information
+about running jobs, creating jobs, detaching, and so on.
+
+The library is part of the job package and can be obtained from oss.sgi.com
+using anonymous ftp.  Look in the /projects/pagg/download directory.  See the
+README in the job source package for more information.
diff -pNaru linux-2.4.27/Documentation/pagg.txt linux/Documentation/pagg.txt
--- linux-2.4.27/Documentation/pagg.txt	1969-12-31 16:00:00 -08:00
+++ linux/Documentation/pagg.txt	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,32 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for implementing arbitrary process groups in Linux.  PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG.  If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers.  The PAGG containers involved during the fork are notified
+that a new process has been attached.  The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered.  If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The do_execve function has been modified to support an optional callout
+that can be run when a process in a PAGG list does an exec.  It can be
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems.
+
diff -pNaru linux-2.4.27/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.4.27/arch/i386/config.in	2004-02-18 05:36:30 -08:00
+++ linux/arch/i386/config.in	2004-12-01 11:30:26 -08:00
@@ -320,6 +320,11 @@ fi
 bool 'System V IPC' CONFIG_SYSVIPC
 bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
 bool 'Sysctl support' CONFIG_SYSCTL
+bool 'Support for process aggregates (PAGGs)' CONFIG_PAGG
+if [ "$CONFIG_PAGG" = "y" ]; then
+    tristate '  Process aggregate based jobs' CONFIG_PAGG_JOB
+fi
+dep_tristate '    CSA Job Accounting' CONFIG_CSA $CONFIG_PAGG_JOB
 if [ "$CONFIG_PROC_FS" = "y" ]; then
    choice 'Kernel core (/proc/kcore) format' \
 	"ELF		CONFIG_KCORE_ELF	\
diff -pNaru linux-2.4.27/arch/ia64/config.in linux/arch/ia64/config.in
--- linux-2.4.27/arch/ia64/config.in	2004-02-18 05:36:30 -08:00
+++ linux/arch/ia64/config.in	2004-12-01 11:30:26 -08:00
@@ -123,6 +123,11 @@ bool 'Networking support' CONFIG_NET
 bool 'System V IPC' CONFIG_SYSVIPC
 bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
 bool 'Sysctl support' CONFIG_SYSCTL
+bool 'Support for process aggregates (PAGGs)' CONFIG_PAGG
+if [ "$CONFIG_PAGG" = "y" ]; then
+    tristate '  Process aggregate based jobs' CONFIG_PAGG_JOB
+fi
+dep_tristate '    CSA Job Accounting' CONFIG_CSA $CONFIG_PAGG_JOB
 tristate 'Kernel support for ELF binaries' CONFIG_BINFMT_ELF
 tristate 'Kernel support for MISC binaries' CONFIG_BINFMT_MISC
 
diff -pNaru linux-2.4.27/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c
--- linux-2.4.27/drivers/block/ll_rw_blk.c	2004-04-14 06:05:29 -07:00
+++ linux/drivers/block/ll_rw_blk.c	2004-12-01 11:30:26 -08:00
@@ -633,6 +633,7 @@ static struct request *get_request(reque
 static struct request *__get_request_wait(request_queue_t *q, int rw)
 {
 	register struct request *rq;
+	unsigned long start_wait = jiffies;
 	DECLARE_WAITQUEUE(wait, current);
 
 	add_wait_queue_exclusive(&q->wait_for_requests, &wait);
@@ -651,6 +652,7 @@ static struct request *__get_request_wai
 	} while (rq == NULL);
 	remove_wait_queue(&q->wait_for_requests, &wait);
 	current->state = TASK_RUNNING;
+	current->bwtime += jiffies - start_wait;
 
 	return rq;
 }
@@ -705,9 +707,11 @@ inline void drive_stat_acct (kdev_t dev,
 	if (rw == READ) {
 		kstat.dk_drive_rio[major][index] += new_io;
 		kstat.dk_drive_rblk[major][index] += nr_sectors;
+		current->rblk += nr_sectors;
 	} else if (rw == WRITE) {
 		kstat.dk_drive_wio[major][index] += new_io;
 		kstat.dk_drive_wblk[major][index] += nr_sectors;
+		current->wblk += nr_sectors;
 	} else
 		printk(KERN_ERR "drive_stat_acct: cmd not R/W?\n");
 }
diff -pNaru linux-2.4.27/fs/exec.c linux/fs/exec.c
--- linux-2.4.27/fs/exec.c	2004-02-18 05:36:31 -08:00
+++ linux/fs/exec.c	2004-12-01 11:30:26 -08:00
@@ -39,7 +39,8 @@
 #include <linux/utsname.h>
 #define __NO_VERSION__
 #include <linux/module.h>
-
+#include <linux/pagg.h>
+#include <linux/csa_internal.h>
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
 #include <asm/mmu_context.h>
@@ -957,9 +958,14 @@ int do_execve(char * filename, char ** a
 		goto out; 
 
 	retval = search_binary_handler(&bprm,regs);
-	if (retval >= 0)
+	if (retval >= 0) {
+		pagg_exec(current);
 		/* execve success */
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
 		return retval;
+	}
 
 out:
 	/* Something went wrong, return the inode and free the argument pages*/
diff -pNaru linux-2.4.27/fs/read_write.c linux/fs/read_write.c
--- linux-2.4.27/fs/read_write.c	2003-08-25 04:44:43 -07:00
+++ linux/fs/read_write.c	2004-12-01 11:30:26 -08:00
@@ -173,8 +173,13 @@ asmlinkage ssize_t sys_read(unsigned int
 			if (!ret) {
 				ssize_t (*read)(struct file *, char *, size_t, loff_t *);
 				ret = -EINVAL;
-				if (file->f_op && (read = file->f_op->read) != NULL)
+				if (file->f_op && (read = file->f_op->read) != NULL) {
 					ret = read(file, buf, count, &file->f_pos);
+					if (ret > 0) {
+						current->rchar += ret;
+					}
+					current->syscr++;
+				}
 			}
 		}
 		if (ret > 0)
@@ -199,8 +204,13 @@ asmlinkage ssize_t sys_write(unsigned in
 			if (!ret) {
 				ssize_t (*write)(struct file *, const char *, size_t, loff_t *);
 				ret = -EINVAL;
-				if (file->f_op && (write = file->f_op->write) != NULL)
+				if (file->f_op && (write = file->f_op->write) != NULL) {
 					ret = write(file, buf, count, &file->f_pos);
+					if (ret > 0) {
+						current->wchar += ret;
+					}
+					current->syscw++;
+				}
 			}
 		}
 		if (ret > 0)
@@ -341,6 +351,10 @@ asmlinkage ssize_t sys_readv(unsigned lo
 	    (file->f_op->readv || file->f_op->read))
 		ret = do_readv_writev(VERIFY_WRITE, file, vector, count);
 	fput(file);
+	if (ret > 0) {
+		current->rchar += ret;
+	}
+	current->syscr++;
 
 bad_file:
 	return ret;
@@ -361,6 +375,10 @@ asmlinkage ssize_t sys_writev(unsigned l
 	    (file->f_op->writev || file->f_op->write))
 		ret = do_readv_writev(VERIFY_READ, file, vector, count);
 	fput(file);
+	if (ret > 0) {
+		current->wchar += ret;
+	}
+	current->syscw++;
 
 bad_file:
 	return ret;
@@ -393,8 +411,11 @@ asmlinkage ssize_t sys_pread(unsigned in
 	if (pos < 0)
 		goto out;
 	ret = read(file, buf, count, &pos);
-	if (ret > 0)
+	if (ret > 0) {
 		dnotify_parent(file->f_dentry, DN_ACCESS);
+		current->rchar += ret;
+	}
+	current->syscr++;
 out:
 	fput(file);
 bad_file:
@@ -425,8 +446,11 @@ asmlinkage ssize_t sys_pwrite(unsigned i
 		goto out;
 
 	ret = write(file, buf, count, &pos);
-	if (ret > 0)
+	if (ret > 0) {
 		dnotify_parent(file->f_dentry, DN_MODIFY);
+		current->wchar += ret;
+	}
+	current->syscw++;
 out:
 	fput(file);
 bad_file:
diff -pNaru linux-2.4.27/include/linux/csa.h linux/include/linux/csa.h
--- linux-2.4.27/include/linux/csa.h	1969-12-31 16:00:00 -08:00
+++ linux/include/linux/csa.h	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,554 @@
+/*
+ * Copyright (c) 2000 Silicon Graphics, Inc and LANL  All Rights Reserved.
+ * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as 
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ * 
+ * This program is distributed in the hope that it would be useful, but 
+ * WITHOUT ANY WARRANTY; without even the implied warranty of 
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details. 
+ *
+ * You should have received a copy of the GNU General Public License along 
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ *
+ * Contact information:  Silicon Graphics, Inc., 1600 Amphitheatre Pkwy, 
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ */
+/*
+ *  CSA (Comprehensive System Accounting)
+ *  Job Accounting for Linux
+ *
+ *  This header file contains the definitions needed for job
+ *  accounting. The kernel CSA accounting module code and all
+ *  user-level programs that try to write or process the binary job 
+ *  accounting data must include this file.
+ *
+ *
+ */
+
+#ifndef _LINUX_CSA_H
+#define _LINUX_CSA_H
+
+#ifndef __KERNEL__
+#include <stdint.h>
+#include <sys/types.h>
+#endif
+
+/*
+ *  accounting flags per-process
+ */
+#define AFORK		0x01	/* fork, but did not exec */
+#define ASU		0x02	/* super-user privileges */
+#define ACKPT   	0x04	/* process has been checkpointed */
+#define ACORE		0x08	/* produced corefile */
+#define AXSIG		0x10	/* killed by a signal */
+#define AMORE		0x20	/* more CSA acct records for this process */
+#define AINC		0x40	/* incremental accounting record */
+
+#define AHZ		100
+
+/*
+ * Magic number - for achead.ah_magic in the 1st header.  The magic number
+ *                in the 2nd header is the inverse of this.
+ */
+#define ACCT_MAGIC_BIG          030510  /* big-endian */
+#define ACCT_MAGIC_LITTLE       030512  /* little-endian */
+#ifdef __LITTLE_ENDIAN
+#define ACCT_MAGIC ACCT_MAGIC_LITTLE
+#else
+#define ACCT_MAGIC ACCT_MAGIC_BIG
+#endif
+
+/*
+ * Record types - for achead.ah_type in the 1st header.
+ */
+#define	ACCT_KERNEL_CSA		0001	/* Kernel: CSA base record */
+#define	ACCT_KERNEL_MEM		0002	/* Kernel: memory record */
+#define	ACCT_KERNEL_IO		0004	/* Kernel: input/output record */
+#define	ACCT_KERNEL_MT 		0006	/* Kernel: multi-tasking record */
+#define	ACCT_KERNEL_MPP		0010	/* Kernel: multi-PE appl record */
+#define	ACCT_KERNEL_SOJ		0012	/* Kernel: start-of-job record */
+#define	ACCT_KERNEL_EOJ		0014	/* Kernel: end-of-job record */
+#define	ACCT_KERNEL_CFG		0020	/* Kernel: configuration record */
+
+#define	ACCT_KERNEL_SITE0	0100	/* Kernel: reserved for site */
+#define	ACCT_KERNEL_SITE1	0101	/* Kernel: reserved for site */
+
+#define	ACCT_DAEMON_NQS		0120	/* Daemon: NQS record */
+#define	ACCT_DAEMON_WKMG      	0122	/* Daemon: workload management record,
+					           e.g., LSF */
+#define	ACCT_DAEMON_TAPE	0124	/* Daemon: tape record */
+#define	ACCT_DAEMON_DMIG	0126	/* Daemon: data migration record */
+#define	ACCT_DAEMON_SOCKET	0130	/* Daemon: socket record */
+
+#define	ACCT_DAEMON_SITE0	0200	/* Daemon: reserved for site */
+#define	ACCT_DAEMON_SITE1	0201	/* Daemon: reserved for site */
+
+#define	ACCT_JOB_HEADER		0220	/* csabuild: job header record */
+#define	ACCT_CACCT		0222	/* cacct:    consolidated data */
+#define	ACCT_CMS		0224	/* cms:      command summary data */
+
+/* Record types - for achead.ah_type in the 2nd header. */
+#define	ACCT_MEM	(1<<0)	/* Process generated memory record */
+#define	ACCT_IO		(1<<1)	/* Process generated I/O record */
+#define	ACCT_MT		(1<<2)	/* Process used multi-tasking */
+#define	ACCT_MPP	(1<<3)	/* Process used multi-PE */
+
+/*
+ * Record revision levels.
+ *
+ * These are incremented to indicate that a record's format has changed since
+ * a previous release.
+ */
+#define	REV_CSA		02400	/* Kernel: CSA base record */
+#define	REV_MEM		02400	/* Kernel: memory record */
+#define	REV_IO		02400	/* Kernel: I/O record */
+#define	REV_MT 		02400	/* Kernel: multi-tasking record */
+#define	REV_MPP		02400	/* Kernel: multi-PE appl record */
+#define	REV_SOJ		02400	/* Kernel: start-of-job record */
+#define	REV_EOJ		02400	/* Kernel: end-of-job record */
+#define	REV_CFG		02400	/* Kernel: configuration record */
+
+#define REV_NQS		02400 	/* Daemon: NQS record */
+#define REV_WKMG	02400 	/* Daemon: workload management (e.g., LSF)
+				           record */
+#define REV_TAPE	02400	/* Daemon: tape record */
+#define REV_DMIG	02400	/* Daemon: data migration record */
+#define REV_SOCKET	02400	/* Daemon: socket record */
+
+#define REV_JOB		02400	/* csabuild: job header record */
+#define REV_CACCT	02400	/* cacct:    consolidated data */
+#define REV_CMS		02400	/* cms:      command summary data */
+
+/*
+ * Record header
+ */
+struct achead
+{
+	unsigned int	ah_magic:17;	/* Magic */
+	unsigned int	ah_revision:15;	/* Revision */
+	unsigned int	ah_type:8;	/* Record type */
+	unsigned int	ah_flag:8;	/* Record flags */
+	unsigned int	ah_size:16;	/* Size of record */
+};
+
+/*
+ *  In order to keep the accounting records the same size across different
+ *  machine types, record fields will be defined to types that won't
+ *  vary (e.g. uint32_t instead of uid_t).
+*/
+
+/*
+ * Per process base accounting record.
+ */
+struct acctcsa
+{
+	struct achead	ac_hdr1;	/* Header */
+	struct achead	ac_hdr2;	/* 2nd header for continued records */ 
+	double 		ac_sbu;		/* System billing units */
+	unsigned int	ac_stat:8;	/* Exit status */
+	unsigned int	ac_nice:8;	/* Nice value */
+	unsigned char	ac_sched;	/* Scheduling discipline */
+	unsigned int	:8;		/* Unused */
+	uint32_t	ac_uid;		/* User ID */
+	uint32_t	ac_gid;		/* Group ID */
+	uint64_t	ac_ash;		/* Array session handle */
+	uint64_t	ac_jid;		/* Job ID */
+	uint64_t	ac_prid;	/* Project ID -> account ID */
+	uint32_t	ac_pid;		/* Process ID */
+	uint32_t	ac_ppid;	/* Parent process ID */
+	time_t		ac_btime;	/* Beginning time [sec since 1970] */
+	char		ac_comm[16];	/* Command name */
+/*	CPU resource usage information. */
+	uint64_t	ac_etime;	/* Elapsed time [usecs] */
+	uint64_t	ac_utime;	/* User CPU time [usec] */
+	uint64_t	ac_stime;	/* System CPU time [usec] */
+	uint64_t	ac_spare;	/* Spare field */
+	uint64_t        ac_spare1;	/* Spare field */
+};
+
+/*
+ * Memory accounting structure
+ * This structure is part of the acctmem record.
+ */
+struct memint
+{
+	uint64_t	himem;	/* Hiwater memory usage [Kbytes] */
+	uint64_t	mem1;	/* Memory integral 1 [Mbytes/uSec] */
+	uint64_t	mem2;	/* Memory integral 2 - not used */
+	uint64_t	mem3;	/* Memory integral 3 - not used */
+};
+
+/*
+ * Memory accounting record
+ */
+struct acctmem
+{
+	struct achead	ac_hdr;		/* Header */
+	double 		ac_sbu;		/* System billing units */
+	struct memint	ac_core;	/* Core memory integrals */
+	struct memint	ac_virt;	/* Virtual memory integrals */
+	uint64_t	ac_pgswap;	/* # of pages swapped  */
+	uint64_t	ac_minflt;	/* # of minor page faults */
+	uint64_t	ac_majflt;	/* # of major page faults */
+	uint64_t	ac_spare;	/* Spare field */
+};
+
+/*
+ * Input/Output accounting record
+ */
+struct acctio
+{
+	struct achead		ac_hdr;	   /* Header */
+	double 			ac_sbu;	   /* System billing units */
+	uint64_t	ac_bwtime; /* Block I/O wait time [usecs] */
+	uint64_t	ac_rwtime; /* Raw I/O wait time [usecs] */
+	uint64_t	ac_chr;    /* Number of chars (bytes) read */
+	uint64_t	ac_chw;	   /* Number of chars (bytes) written */
+	uint64_t	ac_bkr;	   /* Number of blocks read */
+	uint64_t	ac_bkw;	   /* Number of blocks written */
+	uint64_t	ac_scr;	   /* Number of read system calls */
+	uint64_t	ac_scw;	   /* Number of write system calls */
+	uint64_t	ac_spare;  /* Spare field */
+};
+
+/*
+ * Multi-tasking accounting structure
+ * This structure is part of the acctmt record.
+ */
+struct mtask
+{
+	uint64_t	mt;		/* CPU+1 connect time [usecs] */
+	uint64_t	spare1;		/* Spare field */
+	uint64_t	spare2;		/* Spare field */
+};
+
+/*
+ * Multi-tasking accounting record - currently not used, adapted from UNICOS.
+ */
+#define	ACCT_MAXCPUS	512	/* Maximum number of CPUs supported */
+
+struct acctmt
+{
+	struct achead	ac_hdr;		/* Header */
+	double 		ac_sbu;		/* System billing units */
+	unsigned int	ac_numcpu:16;	/* Max number of CPUs used */
+	unsigned int	ac_maxcpu:16;	/* Max number of CPUs available */
+	unsigned int	:32;		/* Unused */
+	int64_t		ac_smwtime;	/* Semaphore wait time [usec] */
+	struct mtask	ac_mttime[ACCT_MAXCPUS]; /* Time connected to (i+1)
+						    CPUs [usec] */
+};
+
+/*
+ * MPP PE accounting structure - MPP hardware specific.
+ * This structure is part of the acctmpp record.
+ */
+struct acctpe
+{
+	uint64_t	utime;	 /* User CPU time [usecs] */
+	uint64_t	srtime;	 /* System & remote CPU time [usecs] */
+	uint64_t	io;	 /* Number of chars transferred */
+};
+
+/*
+ * MPP accounting record - MPP hardware specific; currently not used.
+ */
+#define	ACCT_MAXPES	1024	/* Maximum number of PEs */
+
+struct acctmpp
+{
+	struct achead 	ac_hdr;		/* Header */
+	double 		ac_sbu;		/* System billing units */
+	unsigned int	ac_mpbesu:8;	/* Number of BESUs used	*/
+	unsigned int	ac_mppe:24;	/* Number of PEs used */
+	uint64_t	ac_himem; /* Maximum memory hiwater [Mbytes] */
+
+	struct acctpe	ac_mpp[ACCT_MAXPES];	/* Per PE information */
+};
+
+/*
+ * MPP Detailed PE accounting structure - currently not used
+ */
+struct acctdpe
+{
+	struct achead 	ac_hdr;		/* Header */
+
+	uint64_t	utime;		/* User CPU time [usecs] */
+	uint64_t	stime;		/* System CPU time [usecs] */
+	uint64_t	rtime;		/* Remote CPU time [usecs] */
+
+	uint64_t	ctime;		/* Connect CPU time [usecs] */
+	uint64_t	io;		/* Number of chars transferred */
+	uint64_t	spare;		/* Spare field */
+};
+
+/*
+ * Start-of-job record
+ * Written when a job is created.
+ */
+
+typedef enum
+{
+        AC_INIT_LOGIN,          /* Initiated by login */
+        AC_INIT_NQS,            /* Initiated by NQS */
+        AC_INIT_LSF,            /* Initiated by LSF */
+        AC_INIT_CROND,          /* Initiated by crond */
+        AC_INIT_FTPD,           /* Initiated by ftpd */
+        AC_INIT_INETD,          /* Initiated by inetd */
+        AC_INIT_TELNETD,        /* Initiated by telnetd */
+        AC_INIT_MAX
+} ac_inittype;
+
+
+#define AC_SOJ	1	/* Start-of-job record type */
+#define AC_ROJ	2	/* Restart-of-job record type */
+
+struct acctsoj
+{
+	struct achead 	ac_hdr;		/* Header */
+	unsigned int	ac_type:8;	/* Record type (AC_SOJ, AC_ROJ) */
+	ac_inittype	ac_init:8;	/* Initiator - currently not used */
+	unsigned int	:16;		/* Unused */
+	uint32_t	ac_uid;		/* User ID */
+	uint64_t	ac_jid;		/* Job ID */
+	time_t	 	ac_btime;	/* Start time [secs since 1970] */
+	time_t	 	ac_rstime;	/* Restart time [secs since 1970] */
+};
+
+/*
+ * End-of-job record
+ * Written when the last process of a job exits.
+ */
+struct accteoj
+{
+	struct achead	ac_hdr1;	/* Header */ 
+	struct achead	ac_hdr2;	/* 2nd header for continued records */ 
+	double 		ac_sbu;		/* System billing units */
+	ac_inittype	ac_init:8;	/* Initiator - currently not used */
+	unsigned int	ac_nice:8;	/* Nice value */
+	unsigned int	:16;		/* Unused */
+	uint32_t	ac_uid;		/* User ID */
+	uint32_t	ac_gid;		/* Group ID */
+	uint64_t	ac_ash;		/* Array session handle; not used */
+	uint64_t	ac_jid;		/* Job ID */
+	uint64_t	ac_prid;	/* Project ID; not used */
+	time_t	 	ac_btime;	/* Job start time [secs since 1970] */
+	time_t  	ac_etime;	/* Job end time   [secs since 1970] */
+	uint64_t	ac_corehimem;	/* Hiwater core mem [Kbytes] */
+	uint64_t	ac_virthimem;	/* Hiwater virt mem [Kbytes] */
+/*	CPU resource usage information. */
+	uint64_t	ac_utime;  /* User CPU time [usec]  */
+	uint64_t	ac_stime; /* System CPU time [usec] */
+	uint32_t	ac_spare;	
+};
+
+/*
+ * Accounting configuration uname structure
+ * This structure is part of the acctcfg record.
+ */
+struct ac_utsname
+{
+	char	 sysname[26];
+	char	 nodename[26];
+	char	 release[42];
+	char	 version[41];
+	char	 machine[26];
+};
+
+/*
+ * Accounting configuration record
+ * Written for accounting configuration changes.
+ */
+typedef enum
+{
+        AC_CONFCHG_BOOT,        /* Boot time (always first) */
+        AC_CONFCHG_FILE,        /* Reporting pacct file change */
+        AC_CONFCHG_ON,          /* Reporting xxx ON */
+        AC_CONFCHG_OFF,         /* Reporting xxx OFF */
+        AC_CONFCHG_INC_DELTA,   /* Report incremental acct clock delta change */        AC_CONFCHG_INC_EVENT,   /* Report incremental accounting event */
+        AC_CONFCHG_MAX
+} ac_eventtype;
+
+struct acctcfg
+{
+	struct achead	ac_hdr;		/* Header */
+	unsigned int	ac_kdmask;	/* Kernel and daemon config mask */
+	unsigned int	ac_rmask;	/* Record configuration mask */
+	int64_t		ac_uptimelen;	/* Bytes from the end of the boot
+					   record to the next boot record */
+	ac_eventtype	ac_event:8;	/* Accounting configuration event */
+	unsigned int	:24;		/* Unused */
+	time_t		ac_boottime;	/* System boot time [secs since 1970]*/
+	time_t		ac_curtime;	/* Current time [secs since 1970] */
+	struct ac_utsname  ac_uname;	/* Condensed uname information */
+};
+
+
+/*
+ * Accounting control status values.
+ */
+typedef	enum
+{
+	ACS_OFF,	/* Accounting stopped for this entry */
+	ACS_ERROFF,	/* Accounting turned off by kernel */
+	ACS_ON		/* Accounting started for this entry */
+} ac_status;
+
+/*
+ * Function codes for CSA library interface
+ */
+typedef	enum
+{
+	AC_START,	/* Start kernel, daemon, or record accounting */
+	AC_STOP,	/* Stop kernel, daemon, or record accounting */
+	AC_HALT,	/* Stop all kernel, daemon, and record accounting */
+	AC_CHECK,	/* Check a kernel, daemon, or record accounting state*/
+	AC_KDSTAT,	/* Check all kernel & daemon accounting states */
+	AC_RCDSTAT,	/* Check all record accounting states */
+	AC_JASTART,	/* Start user job accounting  */
+	AC_JASTOP,	/* Stop user job accounting */
+	AC_WRACCT,	/* Write accounting record for daemon */
+	AC_AUTH,	/* Verify executing user is authorized */
+	AC_INCACCT,	/* Control incremental accounting */
+	AC_MREQ
+} ac_request;
+
+/*
+ * Define the CSA accounting record type indices.
+ */
+typedef	enum
+{
+	ACCT_KERN_CSA,		/* Kernel CSA accounting */
+	ACCT_KERN_JOB_PROC,	/* Kernel job process summary accounting */
+	ACCT_KERN_ASH,		/* Kernel array session summary accounting */
+	ACCT_DMD_NQS, 		/* Daemon NQS accounting */
+	ACCT_DMD_WKMG, 		/* Daemon workload management (e.g. LSF) acct */
+	ACCT_DMD_TAPE,		/* Daemon tape accounting */
+	ACCT_DMD_DMIG,		/* Daemon data migration accounting */
+	ACCT_DMD_SOCKET,	/* Daemon socket accounting */
+	ACCT_DMD_SITE1,		/* Site reserved daemon acct */
+	ACCT_DMD_SITE2,		/* Site reserved daemon acct */
+	ACCT_MAXKDS,		/* Max # kernel and daemon entries */
+
+	ACCT_RCD_MPPDET,	/* Record acct for MPP detail exit info */
+	ACCT_RCD_MEM,		/* Record acct for memory */
+	ACCT_RCD_IO,		/* Record acct for input/output */
+	ACCT_RCD_MT,		/* Record acct for multi-tasking */
+	ACCT_RCD_MPP,		/* Record acct for MPP accumulated info */
+	ACCT_THD_MEM,		/* Record acct for memory size threshold */
+	ACCT_THD_TIME,		/* Record acct for CPU time threshold */
+	ACCT_RCD_INCACCT,	/* Record acct for incremental accounting */
+	ACCT_RCD_APPACCT,	/* Record acct for application accounting */
+	ACCT_RCD_SITE1,		/* Site reserved record acct */
+	ACCT_RCD_SITE2,		/* Site reserved record acct */
+	ACCT_MAXRCDS		/* Max # record entries */
+} ac_kdrcd;
+
+#define	ACCT_RCDS	ACCT_RCD_MPPDET /* Record acct low range definition */
+#define	NUM_KDS		(ACCT_MAXKDS - ACCT_KERN_CSA)
+#define	NUM_RCDS	(ACCT_MAXRCDS - ACCT_RCDS)
+#define	NUM_KDRCDS	(NUM_KDS + NUM_RCDS)
+
+
+/*
+ * The following structures are used to get status of a CSA accounting type.
+ */
+
+/*
+ * Accounting entry status structure
+ */
+struct actstat
+{
+	ac_kdrcd	ac_ind;		/* Entry index */
+	ac_status	ac_state;	/* Entry status */
+	int64_t		ac_param;	/* Entry parameter */
+};
+
+/*
+ * Accounting control and status structure
+ */
+#define	ACCT_PATH	128	/* Max path length for accounting file */
+
+struct actctl
+{
+	int	ac_sttnum;		/* Number of status array entries */
+	char	ac_path[ACCT_PATH];	/* Path name for accounting file */
+	struct actstat	ac_stat[NUM_KDRCDS];	/* Entry status array */
+};
+
+/*
+ * Function codes for incremental accounting; currently not used
+ */
+typedef	enum
+{
+	IA_NONE,	/* Zero entry place holder */
+	IA_DELTA,	/* Change clock delta for incremental accounting */
+	IA_EVENT,	/* Cause incremental accounting event now */
+	IA_MAX
+} ac_iafnc;
+
+/*
+ * Incremental accounting structure; currently not used
+ */
+struct actinc
+{
+	int		ac_ind;		/* Entry index */
+	ac_iafnc	ac_fnc;		/* Entry function */
+	int64_t		ac_param;	/* Entry parameter */
+};
+
+/*
+ * Daemon write accounting structure
+ */
+#define	MAX_WRACCT	1024	/* Maximum buffer size of wracct() */
+
+struct actwra
+{
+	int	 ac_did;		/* Daemon index */
+	int	 ac_len;		/* Length of buffer (bytes) */
+	uint64_t ac_jid;		/* Job ID */
+	char	*ac_buf;		/* Daemon accounting buffer */
+};
+
+/* These definitions are used with the CSA /proc IOCTL interface */
+#define CSA_PROC	"csa"
+#define CSA_IOCTL_NUM	'A'
+
+/* Start kernel, daemon, or record accounting */
+#define CSA_IOC_START	_IOWR(CSA_IOCTL_NUM,AC_START,void *)
+/* Stop kernel, daemon, or record accounting */
+#define CSA_IOC_STOP	_IOWR(CSA_IOCTL_NUM,AC_STOP,void *)
+/* Stop all kernel, daemon, and record accounting */
+#define CSA_IOC_HALT	_IOW(CSA_IOCTL_NUM,AC_HALT,void *)
+/* Check a kernel, daemon, or record accounting state */
+#define CSA_IOC_CHECK	_IOWR(CSA_IOCTL_NUM,AC_CHECK,void *)
+/* Check all kernel & daemon accounting states */
+#define CSA_IOC_KDSTAT	_IOWR(CSA_IOCTL_NUM,AC_KDSTAT,void *)
+/* Check all record accounting states */
+#define CSA_IOC_RCDSTAT	_IOWR(CSA_IOCTL_NUM,AC_RCDSTAT,void *)
+/* Start user job accounting  */
+#define CSA_IOC_JASTART	_IOW(CSA_IOCTL_NUM,AC_JASTART,void *)
+/* Stop user job accounting */
+#define CSA_IOC_JASTOP	_IOW(CSA_IOCTL_NUM,AC_JASTOP,void *)
+/* Write accounting record for daemon */
+#define CSA_IOC_WRACCT	_IOW(CSA_IOCTL_NUM,AC_WRACCT,void *)
+/* Verify executing user is authorized */
+#define CSA_IOC_AUTH	_IOW(CSA_IOCTL_NUM,AC_AUTH,void *)
+/* Control incremental accounting */
+#define CSA_IOC_INCACCT	_IOW(CSA_IOCTL_NUM,AC_INCACCT,void *)
+#define CSA_IOC_MREQ	_IOW(CSA_IOCTL_NUM,AC_MREQ,void *)
+
+#ifndef __KERNEL__
+extern int    acctctl(int func, void *act);
+#endif
+
+
+#endif	/* _LINUX_CSA_H */
diff -pNaru linux-2.4.27/include/linux/csa_internal.h linux/include/linux/csa_internal.h
--- linux-2.4.27/include/linux/csa_internal.h	1969-12-31 16:00:00 -08:00
+++ linux/include/linux/csa_internal.h	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2000-2002 Silicon Graphics, Inc and LANL  All Rights Reserved.
+ * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ *
+ * http://www.sgi.com
+ */
+
+/*
+ *  CSA (Comprehensive System Accounting)
+ *  Job Accounting for Linux
+ *
+ *  This header file contains the definitions needed for communication
+ *  between the kernel and the CSA module.
+ */
+
+#ifndef _LINUX_CSA_INTERNAL_H
+#define _LINUX_CSA_INTERNAL_H
+
+#include <linux/config.h>
+
+extern void (*do_csa_acct) (int, struct task_struct *);
+
+#if defined (CONFIG_CSA) || defined (CONFIG_CSA_MODULE)
+
+#include <linux/linkage.h>
+#include <linux/ptrace.h>
+
+static inline void csa_update_integrals(void)
+{
+	long delta;
+
+	if (current->mm) {
+		delta = current->times.tms_stime - current->csa_stimexpd;
+		current->csa_stimexpd = current->times.tms_stime;
+		current->csa_rss_mem1 += delta * current->mm->rss;
+		current->csa_vm_mem1 += delta * current->mm->total_vm;
+	}
+}
+
+static inline void csa_clear_integrals(struct task_struct *tsk)
+{
+	if (tsk) {
+		tsk->csa_stimexpd = 0;
+		tsk->csa_rss_mem1 = 0;
+		tsk->csa_vm_mem1 = 0;
+	}	
+}
+
+/*
+ *  This is the wrapper for the CSA end-of-process accounting record, which
+ *  is written by the CSA csa.c code when a task within a job exits.
+ */
+static inline void
+csa_acct(int exitcode, struct task_struct *p)
+{
+	if (do_csa_acct != NULL) {
+		do_csa_acct(exitcode, p);
+	}
+}
+
+#else	/* CONFIG_CSA || CONFIG_CSA_MODULE */
+
+#define csa_update_integrals()		do { } while (0)
+#define csa_clear_integrals(task)	do { } while (0)
+#define csa_acct(exitcode, task)	do { } while (0)
+#endif	/* CONFIG_CSA || CONFIG_CSA_MODULE */
+
+#endif	/* _LINUX_CSA_INTERNAL_H */
diff -pNaru linux-2.4.27/include/linux/job.h linux/include/linux/job.h
--- linux-2.4.27/include/linux/job.h	1969-12-31 16:00:00 -08:00
+++ linux/include/linux/job.h	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,122 @@
+/*
+ * PAGG Job kernel definitions & interfaces
+ *
+ *
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ * 
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:  This file, include/linux/job.h, contains the data 
+ * 		 structure definitions and function prototypes used
+ * 		 by other kernel bits that communicate with the job
+ * 		 module.  One such example is Comprehensive System 
+ * 		 Accounting  (CSA).
+ */
+
+#ifndef _LINUX_JOB_H
+#define _LINUX_JOB_H
+
+/* 
+ * ================
+ * GENERAL USE INFO
+ * ================
+ */
+
+/* 
+ * The job start/stop events: These identify the reason
+ * the jobstart and jobend callbacks are being
+ * called.
+ */
+enum {
+    JOB_EVENT_IGNORE =  0,
+    JOB_EVENT_START =   1,
+    JOB_EVENT_RESTART = 2,
+    JOB_EVENT_END =  3,
+};
+
+
+/* 
+ * =========================================
+ * INTERFACE INFO FOR ACCOUNTING SUBSCRIBERS 
+ * =========================================
+ */
+
+/* To register as a job dependent accounting module */
+struct job_acctmod {
+	int     	type;   /* CSA or something else */
+	int     	(*jobstart)(int event, void *data);
+	int     	(*jobend)(int event, void *data);
+	struct module	*module;
+};
+
+
+/* 
+ * Subscriber type: Each module that registers as an accounting data
+ * "subscriber" has to have a type.  This type will identify the
+ * appropriate structs and macros to use when exchanging data.
+ */
+#define JOB_ACCT_CSA	0
+#define JOB_ACCT_COUNT	1 /* Number of entries available */	
+
+
+/*
+ * --------------
+ * CSA ACCOUNTING 
+ * --------------
+ */
+
+/* 
+ * For data exchange between job and csa.  The embedded defines
+ * identify the sub-fields
+ */
+struct job_csa {
+#define                 JOB_CSA_JID             001
+	u64		job_id;
+#define                 JOB_CSA_UID             002
+	uid_t		job_uid;
+#define                 JOB_CSA_START           004
+	time_t		job_start;
+#define                 JOB_CSA_COREHIMEM       010
+	u64		job_corehimem;
+#define                 JOB_CSA_VIRTHIMEM       020
+	u64		job_virthimem;
+#define                 JOB_CSA_ACCTFILE        040
+	struct file	*job_acctfile;
+};
+
+
+/* 
+ * ===================
+ * FUNCTION PROTOTYPES
+ * ===================
+ */
+int job_register_acct(struct job_acctmod *);
+int job_unregister_acct(struct job_acctmod *);
+u64 job_getjid(struct task_struct *);
+int job_getacct(u64, int, void *);
+int job_setacct(u64, int, int, void *);
+
+#endif /* _LINUX_JOB_H */
diff -pNaru linux-2.4.27/include/linux/mm.h linux/include/linux/mm.h
--- linux-2.4.27/include/linux/mm.h	2003-11-28 10:26:21 -08:00
+++ linux/include/linux/mm.h	2004-12-01 11:30:26 -08:00
@@ -12,6 +12,7 @@
 #include <linux/mmzone.h>
 #include <linux/swap.h>
 #include <linux/rbtree.h>
+#include <linux/csa_internal.h>
 
 extern unsigned long max_mapnr;
 extern unsigned long num_physpages;
@@ -661,6 +662,9 @@ static inline int expand_stack(struct vm
 	vma->vm_mm->total_vm += grow;
 	if (vma->vm_flags & VM_LOCKED)
 		vma->vm_mm->locked_vm += grow;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
diff -pNaru linux-2.4.27/include/linux/pagg.h linux/include/linux/pagg.h
--- linux-2.4.27/include/linux/pagg.h	1969-12-31 16:00:00 -08:00
+++ linux/include/linux/pagg.h	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,191 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Data structure definitions and function prototypes used to implement
+ * process aggregates (paggs).
+ *
+ * Paggs provides a generalized way to implement process groupings or
+ * containers.  Modules use these functions to register with the kernel as
+ * providers of process aggregation containers. The pagg data structures
+ * define the callback functions and data access pointers back into the
+ * pagg modules.
+ */
+
+#ifndef _LINUX_PAGG_H
+#define _LINUX_PAGG_H
+
+#include <linux/sched.h>
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN	32		/* Max chars in PAGG module name */
+
+
+/* Macro used to initialize a pagg_list structure after declaration 
+ *
+ * Macro arguments:
+ * 	_l:	Task struct to init the pagg_list and semaphore in
+ */
+#define INIT_PAGG_LIST(_l)						\
+do {									\
+	INIT_LIST_HEAD(&(_l)->pagg_list);					\
+	init_rwsem(&(_l)->pagg_sem);						\
+} while(0)
+	
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.  
+ * Each pagg provides the link between the process and the 
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ *     hook:	Reference to pagg module structure.  That struct
+ *     		holds the name key and function pointers.
+ *     data:	Opaque data pointer - defined by pagg modules.
+ *     entry:	List pointers
+ */
+struct pagg {
+       struct pagg_hook	*hook;
+       void		*data;
+       struct list_head	entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the 
+ * module.
+ *
+ * STRUCT MEMBERS:
+ *     name:           The name of the pagg container type provided by
+ *                     the module. This will be set by the pagg module.
+ *     attach:         Function pointer to function used when attaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     detach:         Function pointer to function used when detaching
+ *                     a process from the pagg container referenced by
+ *                     this struct.
+ *     init:           Function pointer to initialization function.  This
+ *                     function is used when the module is loaded to attach
+ *                     existing processes to a default container as defined by
+ *                     the pagg module. This is optional and may be set to 
+ *                     NULL if it is not needed by the pagg module.
+ *     data:           Opaque data pointer - defined by pagg modules.
+ *     module:         Pointer to kernel module struct.  Used to increment & 
+ *                     decrement the use count for the module.
+ *     entry:	       List pointers
+ *     exec:           Function pointer to function used when a process
+ *                     in the pagg container exec's a new process. This
+ *                     is optional and may be set to NULL if it is not 
+ *                     needed by the pagg module.
+ *     refcnt:         Keep track of user count of the pagg hook
+ */
+struct pagg_hook {
+       struct module	*module;
+       char		*name;	/* Name Key - restricted to 32 characters */
+       void		*data;	/* Opaque module specific data */
+       struct list_head	entry;	/* List pointers */
+       atomic_t		refcnt;	/* Usage counter */
+       int		(*init)(struct task_struct *, struct pagg *);
+       int		(*attach)(struct task_struct *, struct pagg *, void*);
+       int		(*detach)(struct task_struct *, struct pagg *);
+       void		(*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *pagg_get(struct task_struct *task, char *key);
+extern struct pagg *pagg_alloc(struct task_struct *task, 
+			       struct pagg_hook *pt);
+extern void pagg_free(struct pagg *pagg);
+extern int pagg_hook_register(struct pagg_hook *pt_new);
+extern int pagg_hook_unregister(struct pagg_hook *pt_old);
+extern int __pagg_attach(struct task_struct *to_task, 
+			 struct task_struct *from_task);
+extern int __pagg_detach(struct task_struct *task);
+extern int __pagg_exec(struct task_struct *task);
+
+/* function used when a child process must inherit attachment to pagg
+ * containers from the parent.
+ */
+static inline int pagg_attach(struct task_struct *child, 
+			      struct task_struct *parent)
+{
+	INIT_PAGG_LIST(child);
+	if (!list_empty(&parent->pagg_list))
+		return __pagg_attach(child, parent);
+	return 0;
+}
+
+
+/* 
+ * Function used when a process must detach from pagg containers of which it
+ * is currently a member.
+ *
+ */
+static inline void pagg_detach(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list))
+		__pagg_detach(task);
+}
+
+/* 
+ * function used when a process exec's.
+ *
+ */
+static inline void pagg_exec(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list))
+		__pagg_exec(task);
+}
+
+/*
+ * Macro used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ */
+#define INIT_TASK_PAGG(tsk) \
+	.pagg_list = LIST_HEAD_INIT(tsk.pagg_list),     \
+	.pagg_sem  = __RWSEM_INITIALIZER(tsk.pagg_sem)  
+
+#else  /* CONFIG_PAGG */
+
+/* 
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l) do { } while(0)
+#define pagg_attach(ct, pt)  do { } while(0)
+#define pagg_detach(t)  do {  } while(0)     
+#define pagg_exec(t)  do {  } while(0)     
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _LINUX_PAGG_H */
diff -pNaru linux-2.4.27/include/linux/paggctl.h linux/include/linux/paggctl.h
--- linux-2.4.27/include/linux/paggctl.h	1969-12-31 16:00:00 -08:00
+++ linux/include/linux/paggctl.h	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,179 @@
+/* 
+ * 
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane, 
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ * 
+ * 
+ * Description:   This file, include/linux/paggctl.h, contains the data
+ *       definitions used by job to communicate with pagg via the /proc/job
+ *       ioctl interface.
+ * 
+ */
+
+#ifndef _LINUX_PAGGCTL_H
+#define _LINUX_PAGGCTL_H
+#ifndef __KERNEL__
+#include <stdint.h>
+#include <sys/types.h>
+#include <asm/unistd.h>
+#endif
+
+#define PAGG_NAMELN  32    /* Max chars in PAGG module name */
+#define PAGG_NAMESTR (PAGG_NAMELN+1)  /* PAGG mod name string including
+				       * room for end-of-string = '\0' */
+
+/*
+ * ====================
+ * JOB PAGG definitions
+ * ====================
+ */
+#define PAGG_JOB 	"job"	/* PAGG module identifier string */
+
+
+
+/* 
+ * ================
+ * KERNEL INTERFACE
+ * ================
+ */
+#define JOB_PROC_ENTRY	"job"	/* /proc entry name */
+#define JOB_IOCTL_NUM	'A'
+
+
+/*
+ * 
+ * Define ioctl options available in the job module 
+ *
+ */
+
+#define JOB_NOOP	_IOWR(JOB_IOCTL_NUM, 0, void *)	/* No-op options */
+
+#define JOB_CREATE	_IOWR(JOB_IOCTL_NUM, 1, void *)	/* Create a job - uid = 0 only */
+#define JOB_ATTACH	_IOWR(JOB_IOCTL_NUM, 2, void *)	/* RESERVED */
+#define JOB_DETACH	_IOWR(JOB_IOCTL_NUM, 3, void *)	/* RESERVED */
+#define JOB_GETJID	_IOWR(JOB_IOCTL_NUM, 4, void *)	/* Get Job ID for specified pid */
+#define JOB_WAITJID	_IOWR(JOB_IOCTL_NUM, 5, void *)	/* Wait for job to complete */	
+#define JOB_KILLJID	_IOWR(JOB_IOCTL_NUM, 6, void *)	/* Send signal to job */
+#define JOB_GETJIDCNT	_IOWR(JOB_IOCTL_NUM, 9, void *)	/* Get number of JIDs on system */
+#define JOB_GETJIDLST	_IOWR(JOB_IOCTL_NUM, 10, void *)	/* Get list of JIDs on system */
+#define JOB_GETPIDCNT	_IOWR(JOB_IOCTL_NUM, 11, void *)	/* Get number of PIDs in JID */
+#define JOB_GETPIDLST	_IOWR(JOB_IOCTL_NUM, 12, void *)	/* Get list of PIDs in JID */
+#define JOB_SETJLIMIT	_IOWR(JOB_IOCTL_NUM, 13, void *)	/* Future: set job limits info */
+#define JOB_GETJLIMIT	_IOWR(JOB_IOCTL_NUM, 14, void *)	/* Future: get job limits info */
+#define JOB_GETJUSAGE	_IOWR(JOB_IOCTL_NUM, 15, void *)	/* Future: get job res. usage */
+#define JOB_FREE	_IOWR(JOB_IOCTL_NUM, 16, void *)	/* Future: Free job entry */
+#define JOB_GETUSER	_IOWR(JOB_IOCTL_NUM, 17, void *)	/* Get owner for job */
+#define JOB_GETPRIMEPID	_IOWR(JOB_IOCTL_NUM, 18, void *)	/* Get prime pid for job */
+#define JOB_SETHID	_IOWR(JOB_IOCTL_NUM, 19, void *)	/* Set HID for jid values */
+#define JOB_DETACHJID	_IOWR(JOB_IOCTL_NUM, 20, void *)	/* Detach all tasks from job */
+#define JOB_DETACHPID	_IOWR(JOB_IOCTL_NUM, 21, void *)	/* Detach a task from job */
+#define JOB_OPT_MAX	_IOWR(JOB_IOCTL_NUM, 22 , void *)	/* Should always be highest number */	
+
+
+/*
+ * Define ioctl request structures for job module 
+ */
+
+struct job_create {
+	u64 	r_jid;	/* Return value of JID */
+	u64 	jid;	/* Jid value requested */
+	int 	user;	/* UID of user associated with job */
+	int 	options;/* creation options - unused */
+};
+
+
+struct job_getjid {
+	u64 	r_jid;	/* Returned value of JID */
+	pid_t 	pid;	/* Info requested for PID */
+};
+
+
+struct job_waitjid {
+	u64 	r_jid;	/* Returned value of JID */
+	u64 	jid;	/* Waiting on specified JID */
+	int 	stat;	/* Status information on JID */
+	int 	options;/* Waiting options */ 
+};
+
+
+struct job_killjid {
+	int	r_val;	/* Return value of kill request */
+	u64	jid;	/* Sending signal to all PIDs in JID */
+	int	sig;	/* Signal to send */
+};
+
+
+struct job_jidcnt {
+	int	r_val;	/* Number of JIDs on system */
+};
+
+
+struct job_jidlst {
+	int	r_val;	/* Number of JIDs in list */
+	u64	*jid;	/* List of JIDs */
+};
+
+
+struct job_pidcnt {
+	int	r_val;	/* Number of PIDs in JID */
+	u64	jid;	/* Getting count of JID */
+};
+
+
+struct job_pidlst {
+	int	r_val;	/* Number of PIDs in list */
+	pid_t	*pid;	/* List of PIDs */
+	u64	jid;
+};
+
+
+struct job_user {
+	int	r_user; /* The UID of the owning user */
+	u64	jid;    /* Get the UID for this job */
+};
+
+struct job_primepid {
+	pid_t	r_pid; /* The prime pid */
+	u64	jid;   /* Get the prime pid for this job */
+};
+
+struct job_sethid {
+	unsigned long	r_hid; /* Value that was set */
+	unsigned long	hid;   /* Value to set to */
+};
+
+
+struct job_detachjid {
+	int	r_val; /* Number of tasks detached from job */
+	u64	jid;   /* Job to detach processes from */
+};
+
+struct job_detachpid {
+	u64	r_jid; /* Job ID task was attached to */
+	pid_t	pid;   /* Task to detach from job */
+};
+
+#endif /* _LINUX_PAGGCTL_H */
diff -pNaru linux-2.4.27/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.4.27/include/linux/sched.h	2004-08-07 16:26:06 -07:00
+++ linux/include/linux/sched.h	2004-12-01 11:30:26 -08:00
@@ -231,6 +231,7 @@ struct mm_struct {
 
 	/* Architecture-specific MM context */
 	mm_context_t context;
+	unsigned long hiwater_rss, hiwater_vm;
 };
 
 extern int mmlist_nr;
@@ -415,6 +416,18 @@ struct task_struct {
 
 /* journalling filesystem info */
 	void *journal_info;
+
+#ifdef CONFIG_PAGG
+/* List of pagg (process aggregate) attachments */
+	struct list_head pagg_list;
+	struct rw_semaphore pagg_sem;
+#endif
+/* i/o counters (bytes read/written, blocks read/written, #syscalls, waittime) */
+	unsigned long rchar, wchar, rblk, wblk, syscr, syscw, bwtime;
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+	unsigned long csa_rss_mem1, csa_vm_mem1;
+	clock_t csa_stimexpd;
+#endif
 };
 
 /*
@@ -465,6 +478,9 @@ extern void yield(void);
  */
 extern struct exec_domain	default_exec_domain;
 
+#include <linux/pagg.h>  /* needed for INIT_TASK_PAGG, included here to
+			  * avoid conflicts with pagg in the task_struct */
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -510,6 +526,7 @@ extern struct exec_domain	default_exec_d
     blocked:		{{0}},						\
     alloc_lock:		SPIN_LOCK_UNLOCKED,				\
     journal_info:	NULL,						\
+    INIT_TASK_PAGG(tsk) 						\
 }
 
 
@@ -774,6 +791,19 @@ static inline void mmdrop(struct mm_stru
 extern void mmput(struct mm_struct *);
 /* Remove the current tasks stale references to the old mm_struct */
 extern void mm_release(void);
+
+/* Update highwater values */
+static inline void update_mem_hiwater(void)
+{
+	if (current->mm) {
+		if (current->mm->hiwater_rss < current->mm->rss) {
+			current->mm->hiwater_rss = current->mm->rss;
+		}
+		if (current->mm->hiwater_vm < current->mm->total_vm) {
+			current->mm->hiwater_vm = current->mm->total_vm;
+		}
+	}
+}
 
 /*
  * Routines for handling the fd arrays
diff -pNaru linux-2.4.27/kernel/Makefile linux/kernel/Makefile
--- linux-2.4.27/kernel/Makefile	2001-09-16 21:22:40 -07:00
+++ linux/kernel/Makefile	2004-12-01 11:30:26 -08:00
@@ -9,7 +9,7 @@
 
 O_TARGET := kernel.o
 
-export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o exec_domain.o printk.o
+export-objs = signal.o sys.o kmod.o context.o ksyms.o pm.o exec_domain.o printk.o pagg.o job.o
 
 obj-y     = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
 	    module.o exit.o itimer.o info.o time.o softirq.o resource.o \
@@ -19,6 +19,9 @@ obj-y     = sched.o dma.o fork.o exec_do
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += ksyms.o
 obj-$(CONFIG_PM) += pm.o
+obj-$(CONFIG_PAGG) += pagg.o
+obj-$(CONFIG_PAGG_JOB) += job.o
+obj-$(CONFIG_CSA) += csa.o
 
 ifneq ($(CONFIG_IA64),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -pNaru linux-2.4.27/kernel/csa.c linux/kernel/csa.c
--- linux-2.4.27/kernel/csa.c	1969-12-31 16:00:00 -08:00
+++ linux/kernel/csa.c	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,2182 @@
+/*
+ * Copyright (c) 2000 Silicon Graphics, Inc and LANL  All Rights Reserved.
+ * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ *
+ * http://www.sgi.com
+ */
+
+/*
+ *  Description:
+ *	This file, csa.c, contains the procedures that handle kernel CSA
+ *	job accounting. It configures CSA, writes CSA accounting
+ *	records, and processes the acctctl /proc ioctl.  This code can
+ *	either be compiled directly into the kernel or compiled as
+ *	a loadable module.
+ *
+ *	During initialization, this code registers procedure callbacks
+ *	with the PAGG job code.
+ *
+ *  Author:
+ *	Marlys Kohnke (kohnke@sgi.com)
+ *	Jay Lan (jlan@sgi.com)
+ *
+ *  Contributors:
+ *
+ *  Changes:
+ *	January 31, 2001  (kohnke)  Changed to use semaphores rather than
+ *	spinlocks.  Was seeing a spinlock deadlock sometimes when an accounting
+ *	record was being written to disk with 2.4.0 (didn't happen with 
+ *	2.4.0-test7).
+ *
+ *	February 2, 2001  (kohnke)  Changed to handle being compiled directly
+ *	into the kernel, not just compiled as a loadable module. Renamed
+ *	init_module() as init_csa() and cleanup_module() as cleanup_csa().
+ *	Added calls to module_init() and module_exit().
+ *
+ *	January 21, 2003 (jlan)  Changed to provide /proc ioctl interface.
+ *	Also, provided MODULE_* clause.
+ */
+
+
+#include <linux/config.h>
+
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/csa_internal.h>
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/file.h>
+#include <linux/utsname.h>
+#include <linux/proc_fs.h>
+#include <asm/uaccess.h>
+#include <asm/semaphore.h>
+
+#include <linux/csa.h>
+#include <linux/job.h>
+
+#define CSA_SYSCALL	1
+
+#ifdef CSA_SYSCALL
+int do_acctctl(int, void *);
+#endif
+
+static int csa_registered = 0;
+
+MODULE_AUTHOR("Silicon Graphics, Inc.");
+MODULE_DESCRIPTION("CSA Kernel Module");
+MODULE_LICENSE("GPL");
+
+int		csa_jstart(int, void *);
+int		csa_jexit(int, void *);
+void		csa_acct_eop(int, struct task_struct *);
+static int	csa_modify_buf(char *, struct acctcsa *, struct acctmem *,
+			struct acctio *, int, int);
+static int	csa_write(char *, int, int, uint64_t, int, struct job_csa *);
+static void	csa_config_make(ac_eventtype, struct acctcfg *);
+static int	csa_config_write(ac_eventtype,struct file *);
+static void	csa_header(struct achead *, int, int, int);
+static long int sc_CLK(long int);
+
+#if defined __ia64__
+#define JID_ERR1 "do_csa_acct:  No job table entry for jid 0x%lx.\n"
+#define JID_ERR2 "csa user job accounting write error %d, jid 0x%lx\n"
+#define JID_ERR3 "Can't disable csa user job accounting jid 0x%lx\n"
+#define JID_ERR4 "csa user job accounting disabled, jid 0x%lx\n"
+#else
+#define JID_ERR1 "do_csa_acct:  No job table entry for jid 0x%llx.\n"
+#define JID_ERR2 "csa user job accounting write error %d, jid 0x%llx\n"
+#define JID_ERR3 "Can't disable csa user job accounting jid 0x%llx\n"
+#define JID_ERR4 "csa user job accounting disabled, jid 0x%llx\n"
+#endif
+
+/* #define CSA_DEBUG 1 */
+
+#ifdef CSA_DEBUG
+#define PRINTK(args...) printk(args)
+#else
+#define PRINTK(args...)
+#endif /* CSA_DEBUG */
+
+/* these defines can be removed once they're available in kernel header files */
+#define USEC_PER_SEC	1000000L	/* number of usecs for 1 second */
+#define USEC_PER_TICK	(USEC_PER_SEC/HZ)
+#define NBPC		PAGE_SIZE 	/* Number of bytes per click */
+#define ctob(x) ((uint64_t)(x)*NBPC)
+
+static struct file	*csa_acctvp = (struct file *)NULL;
+static time_t boottime = 0;
+
+struct  timeval acct_now;               /* present time (sec, usec) */
+
+static DECLARE_MUTEX(csa_sem);
+static DECLARE_MUTEX(csa_write_sem);
+
+static int     csa_flag = 0;           /* accounting start state flag */
+char    csa_path[ACCT_PATH] = "";      /* current accounting file path name */
+char    new_path[ACCT_PATH] = "";       /* new accounting file path name */
+
+static int csa_ioctl( struct inode *, struct file *, unsigned int, 
+		unsigned long);
+/* /proc dir entry */
+static struct proc_dir_entry *csa_proc_entry;
+
+/* File Operations for our proc file. */
+static struct file_operations csa_file_ops = {
+	owner: THIS_MODULE,
+	ioctl: csa_ioctl
+};
+
+static struct job_acctmod csa_job_callbacks = {
+	.type = JOB_ACCT_CSA,
+	.jobstart = csa_jstart,
+	.jobend = csa_jexit,
+	.module = THIS_MODULE
+};
+
+
+/* modify this when changes are made to ac_kdrcd in csa.h */ 
+char *acct_dmd_name[ACCT_MAXKDS] = 
+		{"CSA",
+		 "JOB",
+		 "ASH",
+		 "NQS",
+		 "WORKLOAD MGMT",
+		 "TAPE",
+		 "DATA MIGRATION",
+		 "SOCKET",
+		 "SITE1",
+		 "SITE2" };
+
+typedef enum {
+        A_SYS,          /* system accounting action     (0) */
+        A_CJA,          /* Job accounting action        (1) */
+        A_DMD,          /* daemon accounting action     (2) */
+        A_MAX} a_fnc;
+
+struct  actstat acct_dmd[ACCT_MAXKDS][A_MAX];
+struct  actstat acct_rcd[ACCT_MAXRCDS-ACCT_RCDS][A_MAX];
+
+/*  Initialize the CSA accounting state information. */
+#define INIT_DMD(t, i, s, p)    acct_dmd[i][t].ac_ind = i;              \
+                                acct_dmd[i][t].ac_state = s;            \
+                                acct_dmd[i][t].ac_param = p;
+#define INIT_RCD(t, i, s, p)    acct_rcd[i-ACCT_RCDS][t].ac_ind = i;    \
+                                acct_rcd[i-ACCT_RCDS][t].ac_state = s;  \
+                                acct_rcd[i-ACCT_RCDS][t].ac_param = p;
+
+
+/*
+ *	register procedure callbacks with the kernel/csa.c CSA
+ *	code and with the PAGG job code
+ */
+static int __init
+init_csa(void)
+{
+	int retval = 0;
+
+	if (csa_registered) {
+		/*
+		 *
+		 * incorrectly using csa_job_acct.c as a loadable module and
+		 * compiled into the kernel??
+		 */     
+		printk(KERN_WARNING "init_csa: %s\n",
+			"Multiple attempts to register CSA support");
+		return -EBUSY;
+	} else {
+		csa_registered = 1;
+	}
+
+	/*
+	 * register callbacks with the PAGG job code to process 
+	 * start-of-job and end-of-job accounting records.  If this is a
+	 * module, this registration will also increment the job module
+	 * use count so the job module won't be unloaded out from under
+	 * the CSA module.
+	 */
+	retval = job_register_acct(&csa_job_callbacks);
+	if (retval != 0) {
+		printk(KERN_INFO "CSA: failed to register job\n");
+		return retval;
+	}
+
+	/* setup our /proc entry file */
+	csa_proc_entry = create_proc_entry(CSA_PROC, S_IFREG|S_IRUGO,
+				&proc_root);
+	if (!csa_proc_entry) {
+		csa_registered = 0;
+		job_unregister_acct(&csa_job_callbacks);
+		return -ENOMEM;
+	}
+
+	csa_proc_entry->proc_fops = &csa_file_ops;
+	csa_proc_entry->proc_iops = NULL;
+
+	do_csa_acct = csa_acct_eop;
+#ifdef CSA_SYSCALL
+	do_csa_syscalls = do_acctctl;
+#endif
+
+	printk(KERN_INFO "CSA: initialized\n");
+
+	return retval;
+}
+
+
+/*
+ *	Do module cleanup before the module is removed; unregister
+ *	procedure callbacks with the kernel non-module CSA code and
+ *	with the PAGG job module (which decrements the job module use count).
+ */
+static void __exit
+cleanup_csa(void)
+{
+	int retval = 0;
+
+	csa_registered = 0;
+	do_csa_acct = NULL;
+#ifdef CSA_SYSCALL
+	do_csa_syscalls = NULL;
+#endif
+
+	retval = job_unregister_acct(&csa_job_callbacks);
+	if (retval < 0) {
+		printk(KERN_ERR "CSA module can't unregister with job module."
+		       " Continuing with CSA module cleanup.\n");
+	} 
+	remove_proc_entry(CSA_PROC, &proc_root);
+
+	printk(KERN_INFO "CSA removed\n");
+	return;
+}
+
+/*
+ *	Initialize the CSA accounting state table.
+ *	Modify this when changes are made to ac_kdrcd in csa.h
+ *	
+ */
+static void
+csa_init_acct(int flag)
+{
+	csa_flag = flag;
+
+	boottime = xtime.tv_sec - (jiffies / HZ);
+
+	/*  Initialize system accounting states. */
+	INIT_DMD(A_SYS, ACCT_KERN_CSA,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_KERN_JOB_PROC,	ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_KERN_ASH,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_NQS,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_WKMG,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_TAPE,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_SOCKET,	ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_DMIG,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_SITE1,		ACS_OFF, 0);
+	INIT_DMD(A_SYS, ACCT_DMD_SITE2,		ACS_OFF, 0);
+
+	INIT_RCD(A_SYS, ACCT_RCD_MPPDET,	ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_MEM,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_IO,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_MT,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_MPP,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_THD_MEM,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_THD_TIME,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_INCACCT,	ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_APPACCT,	ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_SITE1,		ACS_OFF, 0);
+	INIT_RCD(A_SYS, ACCT_RCD_SITE2,		ACS_OFF, 0);
+
+	return;
+}
+
+/*
+ *	convert ticks into microseconds; necessary kernel math ops not
+ *	available on 32-bit systems, so can't use uint64_t
+ */
+static long int
+sc_CLK(long int clock)
+{
+	long int sec, split;
+
+	sec = clock / HZ;
+	split = (clock % HZ) * 1000000 / HZ;
+
+	return ((sec * 1000000) + split);
+}
+
+/*  Initialize CSA accounting header. */
+static void
+csa_header(struct achead *head, int revision, int type, int size)
+{
+	head->ah_magic = ACCT_MAGIC;
+	head->ah_revision = revision;
+	head->ah_type = type;
+	head->ah_flag = 0;
+	head->ah_size = size;
+
+	return;
+}
+
+/*
+ *  Create a CSA end-of-process accounting record and write it to 
+ *  appropriate file(s)
+ */
+void
+csa_acct_eop(int exitcode, struct task_struct *p)
+{
+	char	acctent[sizeof(struct acctcsa) +
+			sizeof(struct acctmem) +
+			sizeof(struct acctio) ];
+	char	modacctent[sizeof(struct acctcsa) +
+			   sizeof(struct acctmem) +
+			   sizeof(struct acctio) ];
+	struct	acctcsa	*csa = NULL;
+	struct  acctmem *mem = NULL;
+	struct  acctio  *io = NULL;
+	struct	achead	*hdr1, *hdr2;
+	char	*cb = acctent;
+	struct job_csa job_acctbuf;
+	uint64_t jid = 0;
+	int	len = 0;
+	int	csa_enabled = 0;
+	int	ja_enabled = 0;
+	int	io_enabled = 0;
+	int	mem_enabled = 0;
+	int	retval = 0;
+	uint64_t memtime;
+
+	if (p == NULL) {
+		printk(KERN_ERR "csa_acct_eop: CSA null task pointer\n");
+		return;
+	}
+	jid = job_getjid(p);
+	if (jid <= 0) {
+		/* no job table entry; not all processes are part of a job */
+		return;
+	}
+	memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+	retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+	if (retval != 0) {
+		/* couldn't get accounting info stored in the job table entry */
+		printk(KERN_WARNING JID_ERR1, jid);
+		return;
+	}
+
+	down(&csa_sem);
+	/*
+	 * figure out what's turned on, which determines which record types
+	 * need to be written.  All records are written to a user job
+	 * accounting file.  Only those record types configured on are
+	 * written to the system pacct file
+	 */
+	if (job_acctbuf.job_acctfile != (struct file *)NULL) {
+		ja_enabled = 1;
+	}
+	if (acct_dmd[ACCT_KERN_CSA][A_SYS].ac_state == ACS_ON) {
+		csa_enabled = 1;
+	}
+	if (acct_rcd[ACCT_RCD_IO-ACCT_RCDS][A_SYS].ac_state == ACS_ON) {
+		io_enabled = 1;
+	}
+	if (acct_rcd[ACCT_RCD_MEM-ACCT_RCDS][A_SYS].ac_state == ACS_ON) {
+		mem_enabled = 1;
+	}
+
+	if (!ja_enabled && !csa_enabled) {
+		/* nothing to do */
+		up(&csa_sem);
+		return;
+	}
+	up(&csa_sem);
+
+	csa = (struct acctcsa *)acctent;
+	memset(csa, 0, sizeof(struct acctcsa));
+	hdr1 = &csa->ac_hdr1;
+	csa_header(hdr1, REV_CSA, ACCT_KERNEL_CSA, sizeof(struct acctcsa) );
+	hdr2 = &csa->ac_hdr2;
+	csa_header(hdr2, REV_CSA, ACCT_KERNEL_CSA, 0 );
+	hdr2->ah_magic = ~ACCT_MAGIC;
+ 
+	csa->ac_stat = exitcode;
+	csa->ac_uid  = p->uid;
+	csa->ac_gid  = p->gid;
+
+	/* XXX change this when array session handle info available */
+	csa->ac_ash  = 0;
+	csa->ac_jid  = job_acctbuf.job_id;
+	/* XXX change this when project ids are available */
+	csa->ac_prid = 0;
+	csa->ac_nice = task_nice(p);
+	csa->ac_sched = p->policy;
+
+	csa->ac_pid  = p->pid;
+	csa->ac_ppid = (p->parent) ? p->parent->pid : 0;
+	if (p->flags & PF_FORKNOEXEC) {
+		csa->ac_hdr1.ah_flag |= AFORK;
+	}
+	if (p->flags & PF_SUPERPRIV) {
+		csa->ac_hdr1.ah_flag |= ASU;
+	}
+	if (p->flags & PF_DUMPCORE) {
+		csa->ac_hdr1.ah_flag |= ACORE;
+	}
+	if (p->flags & PF_SIGNALED) {
+		csa->ac_hdr1.ah_flag |= AXSIG;
+	}
+	csa->ac_hdr1.ah_flag &= ~ACKPT;
+
+	strncpy(csa->ac_comm, p->comm, sizeof(csa->ac_comm));
+	csa->ac_btime = CT_TO_SECS(p->start_time) + (xtime.tv_sec -
+		(jiffies / HZ));
+	/*
+	 * cpu usage is accumulated by the kernel in ticks. 
+	 * convert from clock ticks to microseconds; each process gets
+	 * a minimum of a tick for elapsed time.  If the granularity
+	 * changes to something finer than a tick in the future,
+	 * then these zero cpu and elapsed time modifications should be 
+	 * looked at again.
+	 */
+	csa->ac_etime = (jiffies - p->start_time == 0) ? (USEC_PER_TICK) : 
+		((uint64_t)(jiffies - p->start_time) * USEC_PER_TICK);
+
+	cb += sizeof(struct acctcsa);
+	len += sizeof(struct acctcsa);
+
+	/* microseconds */
+	csa->ac_utime = (p->utime.tv_sec * USEC_PER_SEC) + p->utime.tv_usec;
+	csa->ac_stime = (p->stime.tv_sec * USEC_PER_SEC) + p->stime.tv_usec;
+	/* Each process gets a minimum of a half tick cpu time */
+	if ((csa->ac_utime == 0) && (csa->ac_stime == 0)) {
+		csa->ac_stime = USEC_PER_TICK/2;
+	}
+
+	/*   Create the memory record if needed */
+	if (ja_enabled || mem_enabled) {
+		mem = (struct acctmem *)cb;
+		memset(mem, 0, sizeof(struct acctmem));
+		hdr1->ah_flag |= AMORE;
+		hdr2->ah_type |= ACCT_MEM;
+		hdr1 = &mem->ac_hdr;
+		csa_header(hdr1, REV_MEM, ACCT_KERNEL_MEM,
+			sizeof(struct acctmem) );
+
+		/* adjust from pages/ticks to Mb/usec */
+		memtime = sc_CLK((long int)p->csa_rss_mem1);
+		mem->ac_core.mem1 = ctob(memtime) / (1024 * 1024);
+		memtime = sc_CLK((long int)p->csa_vm_mem1);
+		mem->ac_virt.mem1 = ctob(memtime) / (1024 * 1024);
+
+		/* adjust page size to 1K units */
+		if (p->mm) {
+		    mem->ac_virt.himem = p->mm->hiwater_vm * (PAGE_SIZE / 1024);
+		    mem->ac_core.himem = p->mm->hiwater_rss * (PAGE_SIZE/1024);
+		    /*
+		     * For processes with zero systime, set the integral
+		     * to the highwater mark rather than leave at zero
+		     */
+		    if (mem->ac_core.mem1 == 0) {
+			mem->ac_core.mem1 = mem->ac_core.himem / 1024;
+		    }
+		    if (mem->ac_virt.mem1 == 0) {
+			mem->ac_virt.mem1 = mem->ac_virt.himem / 1024;
+		    }
+		}
+
+		mem->ac_pgswap = p->nswap;
+		mem->ac_minflt = p->min_flt;
+		mem->ac_majflt = p->maj_flt;
+
+		cb += sizeof(struct acctmem);
+		hdr2->ah_size += sizeof(struct acctmem);
+		len += sizeof(struct acctmem);
+	}
+	/*  Create the I/O record */
+	if (ja_enabled || io_enabled) {
+		io = (struct acctio *)cb;
+		memset(io, 0, sizeof(struct acctio));	
+		hdr1->ah_flag |= AMORE;
+		hdr2->ah_type |= ACCT_IO;
+		hdr1 = &io->ac_hdr;
+		csa_header(hdr1, REV_IO, ACCT_KERNEL_IO,
+			sizeof(struct acctio) );
+
+		/* convert from ticks to microseconds */
+		/* XXX when able to do kernel 64 bit divide, change type */
+		PRINTK(KERN_INFO "CSA: block wait time %lu\n",(unsigned long int)p->bwtime);
+		io->ac_bwtime = CT_TO_USECS((unsigned long int)p->bwtime);
+		PRINTK(KERN_INFO "CSA: converted bwtime %lu\n",io->ac_bwtime);
+
+		io->ac_bkr = p->rblk;
+		io->ac_bkw = p->wblk;
+
+		/* raw wait time; currently not used */
+		io->ac_rwtime = 0;
+
+		io->ac_chr = p->rchar;
+		io->ac_chw = p->wchar;
+		io->ac_scr  = p->syscr;
+		io->ac_scw  = p->syscw;
+
+		cb += sizeof(struct acctio);
+		hdr2->ah_size += sizeof(struct acctio);
+		len += sizeof(struct acctio);
+	}
+
+	/* record always written to a user job accounting file */
+	if ((len > 0) && (job_acctbuf.job_acctfile != (struct file *)NULL) ) {
+		csa_write((caddr_t)&acctent, ACCT_KERN_CSA,
+			len, jid, A_CJA, &job_acctbuf);
+	}
+	/*
+	 * check the cpu time and virtual memory thresholds before writing
+	 * this record to the system pacct file
+	 */
+	if ((acct_rcd[ACCT_THD_MEM-ACCT_RCDS][A_SYS].ac_state == ACS_ON) &&
+	    (ja_enabled || mem_enabled)) {
+		if (mem->ac_virt.himem < 
+	            acct_rcd[ACCT_THD_MEM-ACCT_RCDS][A_SYS].ac_param) {
+			/* don't write record to pacct */
+			return;
+		}
+	}
+	if (acct_rcd[ACCT_THD_TIME-ACCT_RCDS][A_SYS].ac_state == ACS_ON) {
+		if ((csa->ac_utime + csa->ac_stime) <
+		    acct_rcd[ACCT_THD_TIME-ACCT_RCDS][A_SYS].ac_param) {
+			/* don't write record to pacct */
+			return;
+		}
+	}
+				
+	if ((len > 0) && (csa_acctvp != (struct file *)NULL) && csa_enabled ) {
+		if (io_enabled && mem_enabled) {
+			/* write out buffer as is to system pacct file */
+			csa_write((caddr_t)&acctent, ACCT_KERN_CSA,
+				len, jid, A_SYS, &job_acctbuf);
+		} else {
+			/* only write out record types turned on */
+			len = csa_modify_buf(modacctent, csa, mem, io,
+				io_enabled, mem_enabled);
+			csa_write((caddr_t)&modacctent, ACCT_KERN_CSA,
+				len, jid, A_SYS, &job_acctbuf);
+		}
+	}
+	return;
+}
+
+/*
+ *	Copy the needed accounting records into the buffer, skipping
+ *	record types which are not enabled.  If the memory or io
+ *	continuation record is left out, the second header's size and
+ *	type are reduced and the AMORE flags in the headers are
+ *	recomputed to match.
+ */
+static int
+csa_modify_buf(char *modacctent, struct acctcsa *csa, struct acctmem *mem,
+	       struct acctio *io, int io_enabled, int mem_enabled)
+{
+	int size = 0;
+	int len = 0;
+	char *bufptr;
+	struct achead *hdr1, *hdr2;
+
+	size = sizeof(struct acctcsa) + sizeof(struct acctmem) +
+		sizeof(struct acctio);
+	memset(modacctent, 0, size);
+	bufptr = modacctent;
+	/*
+	 * adjust values that might not be correct anymore if all of
+	 * the continuation records aren't written out to the pacct file
+	 */
+	hdr1 = &csa->ac_hdr1;
+	hdr2 = &csa->ac_hdr2;
+	hdr1->ah_flag &= ~AMORE;
+	hdr2->ah_type = ACCT_KERNEL_CSA;
+	hdr2->ah_size = 0;
+	if (mem_enabled) {
+		hdr1->ah_flag |= AMORE;
+		hdr2->ah_type |= ACCT_MEM;
+		hdr2->ah_size += sizeof(struct acctmem);
+		hdr1 = &mem->ac_hdr;
+		hdr1->ah_flag &= ~AMORE;
+	}
+	if (io_enabled) {
+		hdr1->ah_flag |= AMORE;
+		hdr2->ah_type |= ACCT_IO;
+		hdr2->ah_size += sizeof(struct acctio);
+		hdr1 = &io->ac_hdr;
+		hdr1->ah_flag &= ~AMORE;
+	}	
+	memcpy(bufptr, csa, sizeof(struct acctcsa));
+	bufptr += sizeof(struct acctcsa);
+	len += sizeof(struct acctcsa);
+
+	if (mem_enabled) {
+		memcpy(bufptr, mem, sizeof(struct acctmem));
+		len += sizeof(struct acctmem);
+		bufptr += sizeof(struct acctmem);
+	}
+	if(io_enabled) {
+		memcpy(bufptr, io, sizeof(struct acctio));
+		len += sizeof(struct acctio);
+	}
+
+	return len;
+}
+
+
+/*
+ * csa_ioctl
+ *	Dispatch a CSA control request (start, stop, halt, status, ...).
+ */
+static int
+csa_ioctl(
+	struct inode *inode,
+	struct file *file,
+	unsigned int req,
+	unsigned long data)
+{
+	struct	actctl	actctl;
+	struct	actstat	actstat;
+
+	int	daemon = 0;
+	int	error = 0;
+	int	err = 0;
+	static	int	flag = 010000;
+	int	ind;
+	int	id;
+	int	len;
+	int	num;
+
+	PRINTK(KERN_INFO "CSA: csa_ioctl, req=%d\n", req);
+	down(&csa_sem);
+	if (!csa_flag) {
+		csa_init_acct(flag++);
+	}
+	up(&csa_sem);
+
+	num = (req & 0x0ff);
+	if ((num < AC_START) || (num >= AC_MREQ) ) {
+		return -EINVAL;
+	}
+
+	memset(&actctl, 0, sizeof(struct actctl));
+	memset(&actstat, 0, sizeof(struct actstat));
+
+	switch (req) {
+	/*
+	 *  Start specified types of accounting.
+	 */
+	case CSA_IOC_START:
+	    {
+		int id, ind;
+		struct file *newvp;
+
+		PRINTK(KERN_INFO "CSA: CSA_IOC_START\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+
+		if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum;
+		if ((num < 0) || (num > NUM_KDRCDS) ) {
+			error = -EINVAL;
+			break;
+
+		}
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_from_user(&actctl, (void*)data, len)) {
+			error = -EFAULT;
+			break;
+		}
+		/*
+		 *	Verify all indexes in actstat structures specified.
+	 	 */
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if ((id < 0) || (id >= ACCT_MAXRCDS) ||
+			    (id == ACCT_MAXKDS)) {
+				error = -EINVAL;
+				break;
+			}
+		}
+		if (error) {
+			/* bad index: fail before touching any state */
+			break;
+		}
+		down(&csa_sem);
+		/*
+		 *	If an accounting file was specified, make sure
+		 *	that we can access it.
+		 */
+		if (strlen(actctl.ac_path) ) {
+			strncpy(new_path, actctl.ac_path, ACCT_PATH);
+			newvp = filp_open(new_path,O_WRONLY|O_APPEND, 0);
+			if (IS_ERR(newvp)) {
+				error = PTR_ERR(newvp);
+				up(&csa_sem);
+				break;
+			} else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) {
+				error = -EACCES;
+				filp_close(newvp, NULL);
+				up(&csa_sem);
+				break;
+			} else if (!newvp->f_op->write) {
+				error = -EIO;
+				filp_close(newvp, NULL);
+				up(&csa_sem);
+				break;
+			}
+			if ((csa_acctvp != (struct file *)NULL) &&
+					csa_acctvp == newvp) {
+				/*
+				 * this file already being used, so ignore
+				 * request to use this file; just continue on
+				 */
+				filp_close(newvp, NULL);
+				newvp = (struct file *)NULL;
+			}
+
+		} else {
+			newvp = (struct file *)NULL;
+		}
+		/*
+		 *	If a new accounting file was specified and there's
+		 *	an old accounting file, stop writing to it.
+		 */
+		if (newvp != (struct file *)NULL) {
+			if (csa_acctvp != (struct file *)NULL) {
+				error = csa_config_write(AC_CONFCHG_FILE,NULL);
+				filp_close(csa_acctvp, NULL);
+			} else if (!csa_flag) {
+				csa_init_acct(flag++);
+			}
+
+			strncpy(csa_path, new_path, ACCT_PATH);
+			down(&csa_write_sem);
+			csa_acctvp = newvp;
+			up(&csa_write_sem);
+
+		} else {
+			if (csa_acctvp == (struct file *)NULL) {
+				error = -EINVAL;
+				up(&csa_sem);
+				break;
+			}
+		}
+
+		/*
+		 *  Loop through each actstat block and turn ON that accounting.
+		 */
+		for(ind = 0; ind < num; ind++) {
+			struct	actstat	*stat;
+
+			id = actctl.ac_stat[ind].ac_ind;
+			stat = &actctl.ac_stat[ind];
+			if (id < ACCT_RCDS)  {
+				acct_dmd[id][A_SYS].ac_state = ACS_ON;
+				acct_dmd[id][A_SYS].ac_param = stat->ac_param;
+
+				stat->ac_state = acct_dmd[id][A_SYS].ac_state;
+				stat->ac_param = acct_dmd[id][A_SYS].ac_param;
+			} else {
+				int	tid = id -ACCT_RCDS;
+
+				acct_rcd[tid][A_SYS].ac_state = ACS_ON;
+				acct_rcd[tid][A_SYS].ac_param = stat->ac_param;
+
+				stat->ac_state = acct_rcd[tid][A_SYS].ac_state;
+				stat->ac_param = acct_rcd[tid][A_SYS].ac_param;
+			}
+		}
+
+		up(&csa_sem);
+		error = csa_config_write(AC_CONFCHG_ON, NULL);
+		/*
+		 *  Return the accounting states to the user.
+	 	 */
+		if (copy_to_user((void*)data, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Stop specified types of accounting.
+	 */
+	case CSA_IOC_STOP:
+	    {
+		int	id, ind;
+
+		PRINTK(KERN_INFO "CSA: CSA_IOC_STOP\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+
+		if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum;
+		if ((num <= 0) || (num > NUM_KDRCDS) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_from_user(&actctl, (void*)data, len)) {
+			error = -EFAULT;
+			break;
+		}
+
+		/*
+		 *  Verify all of the indexes in actstat structures specified.
+	 	 */
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if ((id < 0) || (id >= NUM_KDRCDS) ) {
+				error = -EINVAL;
+				break;
+			}
+		}
+		if (error) {
+			break;
+		}
+
+		/*
+		 * Loop through each actstat block and turn off that accounting.
+		 */
+		down(&csa_sem);
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if (id < ACCT_RCDS) {
+				acct_dmd[id][A_SYS].ac_state = ACS_OFF;
+				acct_dmd[id][A_SYS].ac_param = 0;
+
+				actctl.ac_stat[ind].ac_state =
+					acct_dmd[id][A_SYS].ac_state;
+				actctl.ac_stat[ind].ac_param = 0;
+			} else {
+				int	tid = id -ACCT_RCDS;
+
+				acct_rcd[tid][A_SYS].ac_state = ACS_OFF;
+				acct_rcd[tid][A_SYS].ac_param = 0;
+				actctl.ac_stat[ind].ac_state =
+					acct_rcd[tid][A_SYS].ac_state;
+				actctl.ac_stat[ind].ac_param = 
+					acct_rcd[tid][A_SYS].ac_param;
+			}
+		}		/* end of for(ind) */
+		/*
+		 *  Check the daemons to see if any are still on.
+	 	 */
+		for(ind = 0; ind < ACCT_MAXKDS; ind++) {
+			if (acct_dmd[ind][A_SYS].ac_state == ACS_ON) {
+				daemon += 1<<ind;
+			}
+		}
+		up(&csa_sem);
+		/*
+		 *  If all daemons are off and there's an old accounting file,
+		 *	stop writing to it.
+	 	*/
+		if (!daemon && (csa_acctvp != (struct file *)NULL) ) {
+			error = csa_config_write(AC_CONFCHG_OFF,NULL);
+			filp_close(csa_acctvp, NULL);
+			down(&csa_write_sem);
+			csa_acctvp = (struct file *)NULL;
+			up(&csa_write_sem);
+		} else {
+			error = csa_config_write(AC_CONFCHG_OFF, NULL);
+		}
+		/*
+		 *  Return the accounting states to the user.
+	 	*/
+		if (copy_to_user((void*)data, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Halt all accounting.
+	 */
+	case CSA_IOC_HALT:
+	    {
+		int	ind;
+
+		PRINTK(KERN_INFO "CSA: CSA_IOC_HALT\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		down(&csa_sem);
+	 	/*  Turn off all accounting if any is on. */
+		for(ind = 0; ind <ACCT_MAXKDS; ind++) {
+			acct_dmd[ind][A_SYS].ac_state = ACS_OFF;
+			acct_dmd[ind][A_SYS].ac_param = 0;
+		}
+
+		for(ind = ACCT_RCDS; ind < ACCT_MAXRCDS; ind++) {
+			int	tid = ind -ACCT_RCDS;
+
+			acct_rcd[tid][A_SYS].ac_state = ACS_OFF;
+			acct_rcd[tid][A_SYS].ac_param = 0;
+		}
+ 
+		up(&csa_sem);
+	 	/*  If there's an old accounting file, stop writing to it. */
+		if (csa_acctvp != (struct file *)NULL) {
+			error = csa_config_write(AC_CONFCHG_OFF,NULL);
+			filp_close(csa_acctvp, NULL);
+			down(&csa_write_sem);
+			csa_acctvp = (struct file *)NULL;
+			up(&csa_write_sem);
+		}
+	    }
+	    break;
+
+	/*
+	 * Process daemon/record status function.
+	 */
+	case CSA_IOC_CHECK:
+	    {
+		PRINTK(KERN_INFO "CSA: CSA_IOC_CHECK\n");
+		if (copy_from_user(&actstat, (void*)data, sizeof(struct actstat)) ) {
+			error = -EFAULT;
+			break;
+		}
+		id = actstat.ac_ind;
+		if ((id >= 0) && (id < ACCT_MAXKDS) ) {
+			actstat.ac_state = acct_dmd[id][A_SYS].ac_state;
+			actstat.ac_param = acct_dmd[id][A_SYS].ac_param;
+
+		} else if ((id >= ACCT_RCDS) && (id < ACCT_MAXRCDS) ) {
+			int	tid = id-ACCT_RCDS;
+
+			actstat.ac_state = acct_rcd[tid][A_SYS].ac_state;
+			actstat.ac_param = acct_rcd[tid][A_SYS].ac_param;
+
+		} else {
+			error = -EINVAL;
+			break;
+		}
+		if (copy_to_user((void*)data, &actstat, sizeof(struct actstat)) ) {
+			error = -EFAULT;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Process daemon status function.
+	 */
+	case CSA_IOC_KDSTAT:
+	    {
+		PRINTK(KERN_INFO "CSA: CSA_IOC_KDSTAT\n");
+		if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = actctl.ac_sttnum;
+
+		if (num <= 0) {
+			error = -EINVAL;
+			break;
+		} else if (num > NUM_KDS) {
+			num = NUM_KDS;
+		}
+		for(ind = 0; ind < num; ind++) {
+			actctl.ac_stat[ind].ac_ind   =
+				acct_dmd[ind][A_SYS].ac_ind;
+			actctl.ac_stat[ind].ac_state =
+				acct_dmd[ind][A_SYS].ac_state;
+			actctl.ac_stat[ind].ac_param =
+				acct_dmd[ind][A_SYS].ac_param;
+		}		/* end of for(ind) */
+		actctl.ac_sttnum = num;
+		strncpy(actctl.ac_path, csa_path, ACCT_PATH);
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_to_user((void*)data, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Process record status function.
+	 */
+	case CSA_IOC_RCDSTAT:
+	    {
+		PRINTK(KERN_INFO "CSA: CSA_IOC_RCDSTAT\n");
+		if (copy_from_user(&actctl, (void*)data, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+		num = actctl.ac_sttnum;
+
+		if (num <= 0) {
+			error = -EINVAL;
+			break;
+		} else if (num > NUM_RCDS) {
+			num = NUM_RCDS;
+		}
+		for(ind = 0; ind < num; ind++) {
+			actctl.ac_stat[ind].ac_ind =
+				acct_rcd[ind][A_SYS].ac_ind;
+			actctl.ac_stat[ind].ac_state =
+				acct_rcd[ind][A_SYS].ac_state;
+			actctl.ac_stat[ind].ac_param =
+				acct_rcd[ind][A_SYS].ac_param;
+		}
+		actctl.ac_sttnum = num;
+		strncpy(actctl.ac_path, csa_path, ACCT_PATH);
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_to_user((void*)data, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Turn user job accounting ON or OFF.
+	 */
+	case CSA_IOC_JASTART:
+	case CSA_IOC_JASTOP:	
+	    {
+		char	localpath[ACCT_PATH];
+		struct	file	*newvp = NULL;
+		struct	file	*oldvp;
+		uint64_t	jid;
+		struct job_csa job_acctbuf;
+		int retval = 0;
+
+		if (req == CSA_IOC_JASTART)
+			PRINTK(KERN_INFO "CSA: CSA_IOC_JASTART\n");
+		else
+			PRINTK(KERN_INFO "CSA: CSA_IOC_JASTOP\n");
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * (NUM_KDRCDS -1);
+		if (copy_from_user(&actctl, (void*)data, len)) {
+			error = -EFAULT;
+			break;
+		}
+		/*
+		 * If an accounting file was specified, make sure
+		 * that we can access it.
+		 */
+		if (strlen(actctl.ac_path)) {
+			strncpy(localpath, actctl.ac_path, ACCT_PATH);
+			newvp = filp_open(localpath,O_WRONLY|O_APPEND,0);
+			if (IS_ERR(newvp)) {
+				error = PTR_ERR(newvp);
+				break;
+			} else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) {
+				error = -EACCES;
+				filp_close(newvp, NULL);
+				break;
+			} else if (!newvp->f_op->write) {
+				error = -EIO;
+				filp_close(newvp, NULL);
+				break;
+			}
+		} else if (req == CSA_IOC_JASTART) {
+			error = -EINVAL;
+			break;
+		}
+		if (req == CSA_IOC_JASTOP) {
+			newvp = (struct file *)NULL;
+		}
+		jid = job_getjid(current);
+		if (jid <= 0) {
+			/* no job table entry */
+			error = -ENOENT;
+			break;
+		}
+		memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+		retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+		if (retval != 0) {
+			/* couldn't get csa info in the job table entry */
+			error = retval;
+			break;
+		}
+		/* Use this semaphore since csa_write() can also change this
+		 * file pointer.
+		 */
+		down(&csa_write_sem);
+		if ((oldvp = job_acctbuf.job_acctfile) != (struct file *)NULL) {
+			/* Stop writing to the old job accounting file */
+			filp_close(oldvp, NULL);
+		}
+
+	 	/* Establish new job accounting file or stop job accounting */
+		job_acctbuf.job_acctfile = newvp;
+
+		retval = job_setacct(jid, JOB_ACCT_CSA, JOB_CSA_ACCTFILE,
+			&job_acctbuf);
+		if (retval != 0) {
+			/* couldn't set the new file name in the job entry */
+			error = retval;
+			up(&csa_write_sem);
+			break;
+		}
+		up(&csa_write_sem);
+		/* Write a config record so ja has uname info */
+		if (req == CSA_IOC_JASTART) {
+			error = csa_config_write(AC_CONFCHG_ON,
+				 job_acctbuf.job_acctfile);
+		}
+	    }
+	    break;
+
+	/*
+	 *  Write an accounting record for a system daemon.
+	 */
+	case CSA_IOC_WRACCT:
+	    {
+		int	len;
+		int retval = 0;
+		uint64_t	jid;
+		struct job_csa job_acctbuf;
+		struct	actwra	actwra;
+
+		PRINTK(KERN_INFO "CSA: CSA_IOC_WRACCT\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		if (copy_from_user(&actwra, (void*)data, sizeof(struct actwra))) {
+			error = -EFAULT;
+			break;
+		}
+	 	/*  Verify the parameters. */
+		jid = actwra.ac_jid;
+		if (jid < 0) {
+			error = -EINVAL;
+			break;
+		}
+
+		id = actwra.ac_did;
+		if ((id < 0) || (id >= ACCT_MAXKDS) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		len = actwra.ac_len;
+		if ((len <= 0) || (len > MAX_WRACCT) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		if (actwra.ac_buf == (char *)NULL) {
+			error = -EINVAL;
+			break;
+		}
+
+		/*  If the daemon type is on, write out the daemon buffer. */
+		if ((acct_dmd[id][A_SYS].ac_state == ACS_ON) &&
+				(csa_acctvp != (struct file *)NULL) ) {
+			error = csa_write(actwra.ac_buf, id, len,
+				jid, A_DMD, NULL);
+		}
+
+		/* get the job table entry for this jid */
+		memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+		retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+		if (retval != 0) {
+			/* couldn't get accounting info stored in job table */
+			error = retval;
+			break;
+		}
+
+		/* maybe write out daemon record to ja user accounting file */
+		if (job_acctbuf.job_acctfile != NULL) {
+			error = csa_write(actwra.ac_buf, id, len, jid, A_CJA,
+					&job_acctbuf);
+		}
+	    }
+	    break;
+
+	/*
+	 *  Return authorized state information.
+	 */
+	case CSA_IOC_AUTH:
+	    {
+		PRINTK(KERN_INFO "CSA: CSA_IOC_AUTH\n");
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		/*
+		 *  Process user authorization request.  If we get to this
+		 *  spot, the user is authorized.
+		 */
+	    }
+	    break;
+
+	/*
+	 *  Process the incremental accounting request.
+	 */
+	case CSA_IOC_INCACCT:
+		PRINTK(KERN_INFO "CSA: CSA_IOC_INCACCT\n");
+		error = -EINVAL;
+		break;
+
+	default:
+		PRINTK(KERN_INFO "CSA: Unknown request %d\n", req);
+		error = -EINVAL;
+
+	}  /* end of switch(req) */
+
+	return(error ? error : err);
+}
+
+
+/*
+ *	Create a configuration change accounting record.
+ */
+static void
+csa_config_make(ac_eventtype event, struct acctcfg *cfg)
+{
+	int	daemon = 0;
+	int	record = 0;
+	int	ind;
+	int	nmsize = 0;
+
+	memset(cfg, 0, sizeof(struct acctcfg));
+	/*  Setup the record and header. */
+	csa_header(&cfg->ac_hdr, REV_CFG, ACCT_KERNEL_CFG,
+		sizeof(struct acctcfg) );
+	cfg->ac_event = event;
+	if (!boottime) {
+		boottime = xtime.tv_sec - (jiffies / HZ);
+	}
+	cfg->ac_boottime = boottime;
+	cfg->ac_curtime  = xtime.tv_sec;
+
+	/*
+	 *  Create the masks of the types that are on.
+	 */
+	for(ind = 0; ind < ACCT_MAXKDS; ind++) {
+		if (acct_dmd[ind][A_SYS].ac_state == ACS_ON) {
+			daemon += 1<<ind;
+		}
+	}
+	for(ind = ACCT_RCDS; ind < ACCT_MAXRCDS; ind++) {
+		int	tid = ind -ACCT_RCDS;
+
+		if (acct_rcd[tid][A_SYS].ac_state == ACS_ON) {
+			record += 1<<tid;
+		}
+	}
+	cfg->ac_kdmask = daemon;
+	cfg->ac_rmask = record;
+
+	nmsize = sizeof(cfg->ac_uname.sysname);
+	memcpy(cfg->ac_uname.sysname, system_utsname.sysname, nmsize-1);
+	cfg->ac_uname.sysname[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.nodename);
+	memcpy(cfg->ac_uname.nodename, system_utsname.nodename, nmsize-1);
+	cfg->ac_uname.nodename[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.release);
+	memcpy(cfg->ac_uname.release, system_utsname.release, nmsize-1);
+	cfg->ac_uname.release[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.version);
+	memcpy(cfg->ac_uname.version, system_utsname.version, nmsize-1);
+	cfg->ac_uname.version[nmsize-1] = '\0';
+	nmsize = sizeof(cfg->ac_uname.machine);
+	memcpy(cfg->ac_uname.machine, system_utsname.machine, nmsize-1);
+	cfg->ac_uname.machine[nmsize-1] = '\0';
+
+	return;
+}
+
+
+/*
+ *      Create and write a configuration change accounting record.
+ */
+static int
+csa_config_write(ac_eventtype event, struct file *job_acctfile)
+{
+	int	error = 0;	/* errno */
+	struct	acctcfg	acctcfg;
+	mm_segment_t fs;
+
+	/* write record to process accounting file. */
+	csa_config_make(event, &acctcfg);
+
+	down(&csa_write_sem);
+	if (csa_acctvp != (struct file *)NULL) {
+		fs = get_fs();
+		set_fs(KERNEL_DS);
+		error = csa_acctvp->f_op->write(csa_acctvp, (char *)&acctcfg,
+			sizeof(struct acctcfg), &csa_acctvp->f_pos);
+		set_fs(fs);
+	}
+	if (job_acctfile != (struct file *)NULL) {
+		fs = get_fs();
+		set_fs(KERNEL_DS);
+		error = job_acctfile->f_op->write(job_acctfile,(char *)&acctcfg,
+			sizeof(struct acctcfg), &job_acctfile->f_pos);
+		set_fs(fs);
+	}
+	if (error >= 0) {
+		error = 0;
+	}
+	up(&csa_write_sem);
+	return(error);
+}
+
+
+
+/*
+ *	When first process in a job is created, write an SOJ or ROJ record
+ */
+int
+csa_jstart(int event, void *data)
+{
+	struct job_csa *job_sojbuf = (struct job_csa *)data;
+	struct acctsoj	acctsoj;	/* start of job record */
+
+	 /*  Are we doing any accounting?  */
+	if (csa_acctvp == (struct file *)NULL) {
+		return 0;
+	}
+
+	if (!job_sojbuf) {
+		/* bad pointer */
+		printk(KERN_ERR
+		    "csa_jstart: Received bad soj pointer, pid %d.\n",
+		     current->pid);
+		return -1;
+	}
+		
+	memset(&acctsoj, 0, sizeof(struct acctsoj));
+	csa_header(&acctsoj.ac_hdr, REV_SOJ, ACCT_KERNEL_SOJ,
+		sizeof(struct acctsoj));
+	acctsoj.ac_jid = job_sojbuf->job_id;
+	acctsoj.ac_uid = job_sojbuf->job_uid;
+	if (event == JOB_EVENT_START) {
+		acctsoj.ac_type = AC_SOJ;
+		acctsoj.ac_btime = CT_TO_SECS(job_sojbuf->job_start) +
+			(xtime.tv_sec - (jiffies / HZ) );
+	} else if (event == JOB_EVENT_RESTART) {
+		acctsoj.ac_type = AC_ROJ;
+		acctsoj.ac_rstime = CT_TO_SECS(job_sojbuf->job_start) +
+			(xtime.tv_sec - (jiffies / HZ) );
+	} else {
+		return -1;
+	}
+
+	/*
+	 *  Write the accounting record to the process accounting
+	 *  file if any accounting is enabled.
+ 	 */
+	if (csa_acctvp != (struct file *)NULL) {
+		(void)csa_write((caddr_t)&acctsoj, ACCT_KERN_CSA,
+			sizeof(acctsoj), job_sojbuf->job_id, A_SYS, job_sojbuf);
+	}
+
+	return 0;
+}
+
+/*
+ *	When last process in a job is done, write an EOJ record
+ */
+int
+csa_jexit(int event, void *data)
+{
+	struct	achead	*hdr1, *hdr2;
+	struct	accteoj	eoj;	/* end of job record */
+	struct job_csa *job_eojbuf = (struct job_csa *)data;
+
+	/*  Are we doing any accounting? */
+	if (csa_acctvp == (struct file *)NULL) {
+		return 0;
+	}
+
+	if (!job_eojbuf) {
+		/* bad pointer */
+		printk(KERN_ERR 
+		    "csa_jexit: Received bad eoj pointer, pid %d.\n",
+		    current->pid);
+		return -1;
+	}
+
+	memset(&eoj, 0, sizeof(struct accteoj));
+
+	/*  Set up record. */
+	hdr1 = &eoj.ac_hdr1;
+	csa_header(hdr1, REV_EOJ, ACCT_KERNEL_EOJ, sizeof(struct accteoj) );
+	hdr2 = &eoj.ac_hdr2;
+	csa_header(hdr2, REV_EOJ, ACCT_KERNEL_EOJ, 0 );
+	hdr2->ah_magic = ~ACCT_MAGIC;
+
+	eoj.ac_nice = task_nice(current);
+	eoj.ac_uid = job_eojbuf->job_uid;
+	eoj.ac_gid = current->gid;
+
+	eoj.ac_jid = job_eojbuf->job_id;
+
+	eoj.ac_btime = CT_TO_SECS(job_eojbuf->job_start) +
+		(xtime.tv_sec - (jiffies / HZ) );
+	eoj.ac_etime = xtime.tv_sec;
+
+	/*
+	 * XXX Once we have real values in these two fields, convert them
+	 * to Kbytes.
+	 */
+	eoj.ac_corehimem = job_eojbuf->job_corehimem;
+	eoj.ac_virthimem = job_eojbuf->job_virthimem;
+
+	/*
+	 *  Write the accounting record to the process accounting
+	 *  file if job accounting is enabled.
+ 	 */
+	if (csa_acctvp != (struct file *)NULL) {
+		(void) csa_write((caddr_t)&eoj, ACCT_KERN_CSA,
+			sizeof(struct accteoj), job_eojbuf->job_id, A_SYS,
+			job_eojbuf);
+	}
+
+	return 0;
+}
+
+/*
+ *	Write buf out to the accounting file.
+ *	If an error occurs, return the error code to the caller
+ */
+int
+csa_write(char *buf, int did, int nbyte, uint64_t jid, int type,
+	struct job_csa *jp)
+{
+	int	error = 0;	/* errno */
+	int	retval = 0;
+	struct file	*vp;	/* acct file */
+	mm_segment_t fs;
+	unsigned long limit;
+
+	down(&csa_write_sem);
+	 /*  Locate the accounting type. */
+	switch (type) {
+	case A_SYS:
+	case A_DMD:
+		vp = csa_acctvp;
+		break;
+
+	case A_CJA:
+		if (jp != (struct job_csa *)NULL) {
+			vp = jp->job_acctfile;
+		} else {
+			vp = (struct file *)NULL;
+		}
+		break;
+
+	default:
+		up(&csa_write_sem);
+		return -EINVAL;
+
+	}	/* end of switch(type) */
+
+	/*  Check if this type of accounting is turned on. */
+	if (vp == (struct file *)NULL) {
+		up(&csa_write_sem);
+		return 0;
+	}
+	fs = get_fs();
+	set_fs(KERNEL_DS);
+
+	/* make sure we don't get hit by a process file size limit */
+	limit = current->rlim[RLIMIT_FSIZE].rlim_cur;
+	current->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
+	error = vp->f_op->write(vp,buf, nbyte, &vp->f_pos);
+	current->rlim[RLIMIT_FSIZE].rlim_cur = limit;
+
+	set_fs(fs);
+	if (error >= 0) {
+		error = 0;
+	}
+	/*  If an error occurred, disable this type of accounting. */
+	if (error) {
+		switch(type) {
+
+		case A_SYS:
+		case A_DMD:
+			csa_acctvp = (struct file *)NULL;
+			acct_dmd[did][A_SYS].ac_state = ACS_ERROFF;
+			acct_dmd[ACCT_KERN_CSA][A_SYS].ac_state = ACS_ERROFF;
+			printk(KERN_ALERT
+			   "csa accounting pacct write error %d; %s disabled\n",
+			    error, acct_dmd_name[did]);
+			filp_close(vp, NULL);
+			break;
+		case A_CJA:
+			jp->job_acctfile = (struct file *)NULL;
+			retval = job_setacct(jid, JOB_ACCT_CSA,
+				JOB_CSA_ACCTFILE, jp);
+			printk(KERN_WARNING JID_ERR2, error, jid);
+			if (retval != 0) {
+			    printk(KERN_WARNING JID_ERR3, jid);
+			} else {
+			    printk(KERN_WARNING JID_ERR4, jid);
+			}
+			filp_close(vp, NULL);
+			break;
+		}
+		up(&csa_write_sem);
+		return(error);
+	} 
+	up(&csa_write_sem);
+	return(error);
+}
+
+
+#ifdef CSA_SYSCALL
+/* CSA syscalls */
+int
+do_acctctl(int req, void *act)
+{
+	struct	actctl	actctl;
+	struct	actstat	actstat;
+
+	int	daemon = 0;
+	int	error = 0;
+	int	err = 0;
+	static	int	flag = 010000;
+	int	ind;
+	int	id;
+	int	len;
+	int	num;
+
+	down(&csa_sem);
+	if (!csa_flag) {
+		csa_init_acct(flag++);
+	}
+	up(&csa_sem);
+
+	if ((req < 0) || (req >= AC_MREQ) ) {
+		return -EINVAL;
+	}
+
+	memset(&actctl, 0, sizeof(struct actctl));
+	memset(&actstat, 0, sizeof(struct actstat));
+
+	switch (req) {
+	/*
+	 *  Start specified types of accounting.
+	 */
+	case AC_START:
+	    {
+		int id, ind;
+		struct file *newvp;
+
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+
+		if (copy_from_user(&actctl, act, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum;
+		if ((num < 0) || (num > NUM_KDRCDS) ) {
+			error = -EINVAL;
+			break;
+
+		}
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_from_user(&actctl, act, len)) {
+			error = -EFAULT;
+			break;
+		}
+		/*
+		 *	Verify all indexes in actstat structures specified.
+	 	 */
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if ((id < 0) || (id >= ACCT_MAXRCDS) ||
+			    (id == ACCT_MAXKDS) ) {
+				error = -EINVAL;
+				break;
+			}
+		}
+		if (error) {
+			/* the break above only left the for loop */
+			break;
+		}
+		down(&csa_sem);
+		/*
+		 *	If an accounting file was specified, make sure
+		 *	that we can access it.
+		 */
+		if (strlen(actctl.ac_path) ) {
+			strncpy(new_path, actctl.ac_path, ACCT_PATH);
+			newvp = filp_open(new_path,O_WRONLY|O_APPEND, 0);
+			if (IS_ERR(newvp)) {
+				error = PTR_ERR(newvp);
+				up(&csa_sem);
+				break;
+			} else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) {
+				error = -EACCES;
+				filp_close(newvp, NULL);
+				up(&csa_sem);
+				break;
+			} else if (!newvp->f_op->write) {
+				error = -EIO;
+				filp_close(newvp, NULL);
+				up(&csa_sem);
+				break;
+			}
+			if ((csa_acctvp != (struct file *)NULL) &&
+					csa_acctvp == newvp) {
+				/*
+				 * this file already being used, so ignore
+				 * request to use this file; just continue on
+				 */
+				filp_close(newvp, NULL);
+				newvp = (struct file *)NULL;
+			}
+
+		} else {
+			newvp = (struct file *)NULL;
+		}
+		/*
+		 *	If a new accounting file was specified and there's
+		 *	an old accounting file, stop writing to it.
+		 */
+		if (newvp != (struct file *)NULL) {
+			if (csa_acctvp != (struct file *)NULL) {
+				error = csa_config_write(AC_CONFCHG_FILE,NULL);
+				filp_close(csa_acctvp, NULL);
+			} else if (!csa_flag) {
+				csa_init_acct(flag++);
+			}
+
+			strncpy(csa_path, new_path, ACCT_PATH);
+			down(&csa_write_sem);
+			csa_acctvp = newvp;
+			up(&csa_write_sem);
+
+		} else {
+			if (csa_acctvp == (struct file *)NULL) {
+				error = -EINVAL;
+				up(&csa_sem);
+				break;
+			}
+		}
+
+		/*
+		 *  Loop through each actstat block and turn ON that accounting.
+		 */
+		for(ind = 0; ind < num; ind++) {
+			struct	actstat	*stat;
+
+			id = actctl.ac_stat[ind].ac_ind;
+			stat = &actctl.ac_stat[ind];
+			if (id < ACCT_RCDS)  {
+				acct_dmd[id][A_SYS].ac_state = ACS_ON;
+				acct_dmd[id][A_SYS].ac_param = stat->ac_param;
+
+				stat->ac_state = acct_dmd[id][A_SYS].ac_state;
+				stat->ac_param = acct_dmd[id][A_SYS].ac_param;
+			} else {
+				int	tid = id -ACCT_RCDS;
+
+				acct_rcd[tid][A_SYS].ac_state = ACS_ON;
+				acct_rcd[tid][A_SYS].ac_param = stat->ac_param;
+
+				stat->ac_state = acct_rcd[tid][A_SYS].ac_state;
+				stat->ac_param = acct_rcd[tid][A_SYS].ac_param;
+			}
+		}
+
+		up(&csa_sem);
+		error = csa_config_write(AC_CONFCHG_ON, NULL);
+		/*
+		 *  Return the accounting states to the user.
+	 	 */
+		if (copy_to_user(act, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Stop specified types of accounting.
+	 */
+	case AC_STOP:
+	    {
+		int	id, ind;
+
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+
+		if (copy_from_user(&actctl, act, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = (actctl.ac_sttnum == 0) ? 1 : actctl.ac_sttnum;
+		if ((num <= 0) || (num > NUM_KDRCDS) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_from_user(&actctl, act, len)) {
+			error = -EFAULT;
+			break;
+		}
+
+		/*
+		 *  Verify all of the indexes in actstat structures specified.
+	 	 */
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if ((id < 0) || (id >= NUM_KDRCDS) ) {
+				error = -EINVAL;
+				break;
+			}
+		}
+		if (error) {
+			/* the break above only left the for loop */
+			break;
+		}
+		/*
+		 * Loop through each actstat block and turn off that accounting.
+		 */
+		down(&csa_sem);
+		for(ind = 0; ind < num; ind++) {
+			id = actctl.ac_stat[ind].ac_ind;
+			if (id < ACCT_RCDS) {
+				acct_dmd[id][A_SYS].ac_state = ACS_OFF;
+				acct_dmd[id][A_SYS].ac_param = 0;
+
+				actctl.ac_stat[ind].ac_state =
+					acct_dmd[id][A_SYS].ac_state;
+				actctl.ac_stat[ind].ac_param = 0;
+			} else {
+				int	tid = id -ACCT_RCDS;
+
+				acct_rcd[tid][A_SYS].ac_state = ACS_OFF;
+				acct_rcd[tid][A_SYS].ac_param = 0;
+				actctl.ac_stat[ind].ac_state =
+					acct_rcd[tid][A_SYS].ac_state;
+				actctl.ac_stat[ind].ac_param = 
+					acct_rcd[tid][A_SYS].ac_param;
+			}
+		}		/* end of for(ind) */
+		/*
+		 *  Check the daemons to see if any are still on.
+	 	 */
+		for(ind = 0; ind < ACCT_MAXKDS; ind++) {
+			if (acct_dmd[ind][A_SYS].ac_state == ACS_ON) {
+				daemon += 1<<ind;
+			}
+		}
+		up(&csa_sem);
+		/*
+		 *  If all daemons are off and there's an old accounting file,
+		 *	stop writing to it.
+	 	*/
+		if (!daemon && (csa_acctvp != (struct file *)NULL) ) {
+			error = csa_config_write(AC_CONFCHG_OFF,NULL);
+			filp_close(csa_acctvp, NULL);
+			down(&csa_write_sem);
+			csa_acctvp = (struct file *)NULL;
+			up(&csa_write_sem);
+		} else {
+			error = csa_config_write(AC_CONFCHG_OFF, NULL);
+		}
+		/*
+		 *  Return the accounting states to the user.
+	 	*/
+		if (copy_to_user(act, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Halt all accounting.
+	 */
+	case AC_HALT:
+	    {
+		int	ind;
+
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		down(&csa_sem);
+	 	/*  Turn off all accounting if any is on. */
+		for(ind = 0; ind <ACCT_MAXKDS; ind++) {
+			acct_dmd[ind][A_SYS].ac_state = ACS_OFF;
+			acct_dmd[ind][A_SYS].ac_param = 0;
+		}
+
+		for(ind = ACCT_RCDS; ind < ACCT_MAXRCDS; ind++) {
+			int	tid = ind -ACCT_RCDS;
+
+			acct_rcd[tid][A_SYS].ac_state = ACS_OFF;
+			acct_rcd[tid][A_SYS].ac_param = 0;
+		}
+ 
+		up(&csa_sem);
+	 	/*  If there's an old accounting file, stop writing to it. */
+		if (csa_acctvp != (struct file *)NULL) {
+			error = csa_config_write(AC_CONFCHG_OFF,NULL);
+			filp_close(csa_acctvp, NULL);
+			down(&csa_write_sem);
+			csa_acctvp = (struct file *)NULL;
+			up(&csa_write_sem);
+		}
+	    }
+	    break;
+
+	/*
+	 * Process daemon/record status function.
+	 */
+	case AC_CHECK:
+	    {
+		if (copy_from_user(&actstat, act, sizeof(struct actstat)) ) {
+			error = -EFAULT;
+			break;
+		}
+		id = actstat.ac_ind;
+		if ((id >= 0) && (id < ACCT_MAXKDS) ) {
+			actstat.ac_state = acct_dmd[id][A_SYS].ac_state;
+			actstat.ac_param = acct_dmd[id][A_SYS].ac_param;
+
+		} else if ((id >= ACCT_RCDS) && (id < ACCT_MAXRCDS) ) {
+			int	tid = id-ACCT_RCDS;
+
+			actstat.ac_state = acct_rcd[tid][A_SYS].ac_state;
+			actstat.ac_param = acct_rcd[tid][A_SYS].ac_param;
+
+		} else {
+			error = -EINVAL;
+			break;
+		}
+		if (copy_to_user(act, &actstat, sizeof(struct actstat)) ) {
+			error = -EFAULT;
+		}
+	    }
+		break;
+
+	/*
+	 *  Process daemon status function.
+	 */
+	case AC_KDSTAT:
+	    {
+		if (copy_from_user(&actctl, act, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+
+		num = actctl.ac_sttnum;
+
+		if (num <= 0) {
+			error = -EINVAL;
+			break;
+		} else if (num > NUM_KDS) {
+			num = NUM_KDS;
+		}
+		for(ind = 0; ind < num; ind++) {
+			actctl.ac_stat[ind].ac_ind   =
+				acct_dmd[ind][A_SYS].ac_ind;
+			actctl.ac_stat[ind].ac_state =
+				acct_dmd[ind][A_SYS].ac_state;
+			actctl.ac_stat[ind].ac_param =
+				acct_dmd[ind][A_SYS].ac_param;
+		}		/* end of for(ind) */
+		actctl.ac_sttnum = num;
+		strncpy(actctl.ac_path, csa_path, ACCT_PATH);
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_to_user(act, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Process record status function.
+	 */
+	case AC_RCDSTAT:
+	    {
+		if (copy_from_user(&actctl, act, sizeof(int)) ) {
+			error = -EFAULT;
+			break;
+		}
+		num = actctl.ac_sttnum;
+
+		if (num <= 0) {
+			error = -EINVAL;
+			break;
+		} else if (num > NUM_RCDS) {
+			num = NUM_RCDS;
+		}
+		for(ind = 0; ind < num; ind++) {
+			actctl.ac_stat[ind].ac_ind =
+				acct_rcd[ind][A_SYS].ac_ind;
+			actctl.ac_stat[ind].ac_state =
+				acct_rcd[ind][A_SYS].ac_state;
+			actctl.ac_stat[ind].ac_param =
+				acct_rcd[ind][A_SYS].ac_param;
+		}
+		actctl.ac_sttnum = num;
+		strncpy(actctl.ac_path, csa_path, ACCT_PATH);
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * NUM_KDRCDS + 
+		    sizeof(struct actstat) * num;
+		if (copy_to_user(act, &actctl, len)) {
+			error = -EFAULT;
+			break;
+		}
+	    }
+	    break;
+
+	/*
+	 *  Turn user job accounting ON or OFF.
+	 */
+	case AC_JASTART:
+	case AC_JASTOP:	
+	    {
+		char	localpath[ACCT_PATH];
+		struct	file	*newvp = NULL;
+		struct	file	*oldvp;
+		uint64_t	jid;
+		struct job_csa job_acctbuf;
+		int retval = 0;
+
+		len = sizeof(struct actctl) -
+		    sizeof(struct actstat) * (NUM_KDRCDS -1);
+		if (copy_from_user(&actctl, act, len)) {
+			error = -EFAULT;
+			break;
+		}
+		/*
+		 * If an accounting file was specified, make sure
+		 * that we can access it.
+		 */
+		if (strlen(actctl.ac_path)) {
+			strncpy(localpath, actctl.ac_path, ACCT_PATH);
+			newvp = filp_open(localpath,O_WRONLY|O_APPEND,0);
+			if (IS_ERR(newvp)) {
+				error = PTR_ERR(newvp);
+				break;
+			} else if (!S_ISREG(newvp->f_dentry->d_inode->i_mode)) {
+				error = -EACCES;
+				filp_close(newvp, NULL);
+				break;
+			} else if (!newvp->f_op->write) {
+				error = -EIO;
+				filp_close(newvp, NULL);
+				break;
+			}
+		} else if (req == AC_JASTART) {
+			error = -EINVAL;
+			break;
+		}
+		if (req == AC_JASTOP) {
+			newvp = (struct file *)NULL;
+		}
+		jid = job_getjid(current);
+		if (jid <= 0) {
+			/* no job table entry */
+			error = -ENOENT;
+			break;
+		}
+		memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+		retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+		if (retval != 0) {
+			/* couldn't get csa info in the job table entry */
+			error = retval;
+			break;
+		}
+		/* Use this semaphore since csa_write() can also change this
+		 * file pointer.
+		 */
+		down(&csa_write_sem);
+		if ((oldvp = job_acctbuf.job_acctfile) != (struct file *)NULL) {
+			/* Stop writing to the old job accounting file */
+			filp_close(oldvp, NULL);
+		}
+
+	 	/* Establish new job accounting file or stop job accounting */
+		job_acctbuf.job_acctfile = newvp;
+
+		retval = job_setacct(jid, JOB_ACCT_CSA, JOB_CSA_ACCTFILE,
+			&job_acctbuf);
+		if (retval != 0) {
+			/* couldn't set the new file name in the job entry */
+			error = retval;
+			up(&csa_write_sem);
+			break;
+		}
+		up(&csa_write_sem);
+		/* Write a config record so ja has uname info */
+		if (req == AC_JASTART) {
+			error = csa_config_write(AC_CONFCHG_ON,
+				 job_acctbuf.job_acctfile);
+		}
+	    }
+	    break;
+
+	/*
+	 *  Write an accounting record for a system daemon.
+	 */
+	case AC_WRACCT:
+	    {
+		int	len;
+		int retval = 0;
+		uint64_t	jid;
+		struct job_csa  job_acctbuf;
+		struct	actwra	actwra;
+
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		if (copy_from_user(&actwra, act, sizeof(struct actwra))) {
+			error = -EFAULT;
+			break;
+		}
+	 	/*  Verify the parameters. */
+		jid = actwra.ac_jid;
+		if (jid < 0) {
+			error = -EINVAL;
+			break;
+		}
+
+		id = actwra.ac_did;
+		if ((id < 0) || (id >= ACCT_MAXKDS) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		len = actwra.ac_len;
+		if ((len <= 0) || (len > MAX_WRACCT) ) {
+			error = -EINVAL;
+			break;
+		}
+
+		if (actwra.ac_buf == (char *)NULL) {
+			error = -EINVAL;
+			break;
+		}
+
+		/*  If the daemon type is on, write out the daemon buffer. */
+		if ((acct_dmd[id][A_SYS].ac_state == ACS_ON) &&
+				(csa_acctvp != (struct file *)NULL) ) {
+			error = csa_write(actwra.ac_buf, id, len,
+				jid, A_DMD, NULL);
+		}
+
+		/* get the job table entry for this jid */
+		memset(&job_acctbuf, 0, sizeof(job_acctbuf));
+		retval = job_getacct(jid, JOB_ACCT_CSA, &job_acctbuf);
+		if (retval != 0) {
+			/* couldn't get accounting info stored in job table */
+			error = retval;
+			break;
+		}
+
+		/* maybe write out daemon record to ja user accounting file */
+		if (job_acctbuf.job_acctfile != NULL) {
+			error = csa_write(actwra.ac_buf, id, len, jid, A_CJA,
+					&job_acctbuf);
+		}
+	    }
+	    break;
+
+	/*
+	 *  Return authorized state information.
+	 */
+	case AC_AUTH:
+	    {
+		if (!capable(CAP_SYS_PACCT) ) {
+			error = -EPERM;
+			break;
+		}
+		/*
+		 *  Process user authorization request...If we get to this spot,
+		 *  the user is authorized.
+		 */
+	    }
+	    break;
+
+	/*
+	 *  Process the incremental accounting request.
+	 */
+	case AC_INCACCT:
+		error = -EINVAL;
+		break;
+
+	default:
+		error = -EINVAL;
+
+	}  /* end of switch(req) */
+
+	return(error ? error : err);
+}
+#endif
+
+module_init(init_csa);
+module_exit(cleanup_csa);
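
Note on the request length used by do_acctctl() above: AC_START, AC_STOP,
AC_KDSTAT and AC_RCDSTAT all copy sizeof(struct actctl) minus the unused
tail of the ac_stat[] array. The user-space sketch below reproduces that
arithmetic. The struct layouts, the field order and the values of
NUM_KDRCDS and ACCT_PATH are assumptions made for illustration only; the
real definitions live in the CSA headers, which are not part of this
section. actctl_len() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real values come from the CSA headers. */
#define NUM_KDRCDS 8
#define ACCT_PATH  128

struct actstat {
	int  ac_ind;
	int  ac_state;
	long ac_param;
};

/* Assumed layout: fixed part first, status array last. */
struct actctl {
	int            ac_sttnum;
	char           ac_path[ACCT_PATH];
	struct actstat ac_stat[NUM_KDRCDS];
};

/* Bytes to copy for a request that carries num status entries. */
static size_t actctl_len(int num)
{
	return sizeof(struct actctl)
		- sizeof(struct actstat) * NUM_KDRCDS
		+ sizeof(struct actstat) * (size_t)num;
}
```

With num equal to NUM_KDRCDS the result is the full structure size, and
each additional entry adds exactly sizeof(struct actstat) bytes, which is
why the kernel code range-checks num before computing len.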
diff -pNaru linux-2.4.27/kernel/exit.c linux/kernel/exit.c
--- linux-2.4.27/kernel/exit.c	2002-11-28 15:53:15 -08:00
+++ linux/kernel/exit.c	2004-12-01 11:30:26 -08:00
@@ -16,6 +16,8 @@
 #ifdef CONFIG_BSD_PROCESS_ACCT
 #include <linux/acct.h>
 #endif
+#include <linux/pagg.h>
+#include <linux/csa_internal.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -23,6 +25,7 @@
 
 extern void sem_exit (void);
 extern struct task_struct *child_reaper;
+void (*do_csa_acct) (int, struct task_struct *) = NULL;
 
 int getrusage(struct task_struct *, int, struct rusage *);
 
@@ -436,9 +439,14 @@ NORET_TYPE void do_exit(long code)
 	del_timer_sync(&tsk->real_timer);
 
 fake_volatile:
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	acct_process(code);
 #endif
+	/* no-op if CONFIG_CSA not set */
+	csa_acct(code, tsk);
 	__exit_mm(tsk);
 
 	lock_kernel();
@@ -457,6 +465,9 @@ fake_volatile:
 		__MOD_DEC_USE_COUNT(tsk->binfmt->module);
 
 	tsk->exit_code = code;
+
+	pagg_detach(tsk);
+
 	exit_notify();
 	schedule();
 	BUG();
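
Note on the exit.c hunks above: the patch adds a do_csa_acct function
pointer that starts out NULL, and calls csa_acct() with the comment
"no-op if CONFIG_CSA not set". The definition of csa_acct() is in
csa_internal.h, which is not shown here, so the following is only a
user-space sketch of one common way such a hook is wired up. All names
in it (struct task, do_acct_hook, acct_call, sample_handler) are
hypothetical stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct task { int pid; };	/* stand-in for struct task_struct */

/* Hook pointer; stays NULL until an accounting module registers. */
static void (*do_acct_hook)(int, struct task *) = NULL;

/* Called on the exit path; does nothing while no hook is registered. */
static void acct_call(int code, struct task *tsk)
{
	if (do_acct_hook != NULL)
		do_acct_hook(code, tsk);
}

/* Example handler that just records the exit code it was given. */
static int last_code = -1;

static void sample_handler(int code, struct task *tsk)
{
	(void)tsk;
	last_code = code;
}
```

The design keeps the core exit path free of any dependency on the
accounting code: the caller pays one pointer test when accounting is
absent, and a module can install or clear the handler at load and
unload time.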
diff -pNaru linux-2.4.27/kernel/fork.c linux/kernel/fork.c
--- linux-2.4.27/kernel/fork.c	2004-04-14 06:05:40 -07:00
+++ linux/kernel/fork.c	2004-12-01 11:30:26 -08:00
@@ -22,6 +22,7 @@
 #include <linux/namespace.h>
 #include <linux/personality.h>
 #include <linux/compiler.h>
+#include <linux/pagg.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -79,6 +80,9 @@ void __init fork_init(unsigned long memp
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+	/* Initialize the pagg list in pid 0 before it can clone itself. */
+	INIT_PAGG_LIST(current);
 }
 
 /* Protects next_safe and last_pid. */
@@ -367,6 +371,9 @@ static int copy_mm(unsigned long clone_f
 	 */
 	copy_segments(tsk, mm);
 
+	mm->hiwater_rss = mm->rss;
+	mm->hiwater_vm = mm->total_vm;
+
 good_mm:
 	tsk->mm = mm;
 	tsk->active_mm = mm;
@@ -724,6 +731,10 @@ int do_fork(unsigned long clone_flags, u
 	p->tty_old_pgrp = 0;
 	p->times.tms_utime = p->times.tms_stime = 0;
 	p->times.tms_cutime = p->times.tms_cstime = 0;
+	p->rchar = p->wchar = p->rblk = p->wblk = p->syscr = p->syscw = 0;
+	p->bwtime = 0;
+	/* no-op if CONFIG_CSA not set */
+	csa_clear_integrals(p);
 #ifdef CONFIG_SMP
 	{
 		int i;
@@ -762,6 +773,12 @@ int do_fork(unsigned long clone_flags, u
 	   These must match for thread signalling to apply */
 	   
 	p->parent_exec_id = p->self_exec_id;
+
+	/*
+	 * call pagg modules to properly attach new process to the same
+	 * process aggregate containers as the parent process.
+	 */
+	pagg_attach(p, current);
 
 	/* ok, now we should be set up.. */
 	p->swappable = 1;
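
Note on the job ID layout used by kernel/job.c, which follows: the
iptr_hid() and iptr_sid() macros in that file pick the upper and lower
32-bit halves of a 64-bit jid with endian-specific pointer arithmetic,
and jid_hash() takes the lower half modulo HASH_SIZE. The sketch below
expresses the same split with shifts, which gives the same halves on
either byte order. The helper names hid_of, sid_of and bucket_of are
hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SIZE 1024	/* same bucket count as in job.c */

/* Upper 32 bits: the host ID part of the jid. */
static uint32_t hid_of(uint64_t jid)
{
	return (uint32_t)(jid >> 32);
}

/* Lower 32 bits: the part that jid_hash() feeds to the modulo. */
static uint32_t sid_of(uint64_t jid)
{
	return (uint32_t)(jid & 0xffffffffu);
}

/* Hash bucket index, equivalent to jid_hash() in job.c. */
static uint32_t bucket_of(uint64_t jid)
{
	return sid_of(jid) % HASH_SIZE;
}
```

Because only the lower half enters the hash, two jobs that differ only
in host ID land in the same chain, and job_getjob() tells them apart by
comparing the full 64-bit jid.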
diff -pNaru linux-2.4.27/kernel/job.c linux/kernel/job.c
--- linux-2.4.27/kernel/job.c	1969-12-31 16:00:00 -08:00
+++ linux/kernel/job.c	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,2053 @@
+/*
+ * Linux Job kernel module
+ *
+ *
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane, 
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:	This file implements a type of process grouping called jobs.
+ * 		For further information about jobs, consult the file
+ * 		Documentation/job.txt. Jobs are implemented as a type of PAGG
+ * 		(process aggregate).  For further information about PAGGs,
+ * 		consult the file Documentation/pagg.txt.
+ */
+
+/*
+ * LOCKING INFO
+ *
+ * There are currently two levels of locking in this module.  So, we
+ * have two classes of locks: 
+ *
+ *	(1) job table lock (always, job_table_sem)
+ *	(2) job entry lock (usually, job->sem)
+ *
+ * Most of the locking uses read/write semaphores.  In rare cases, a
+ * spinlock is also used.  Those cases arise when the tasklist_lock
+ * must be held (such as when looping over all tasks on the
+ * system).
+ *
+ * There is only one job_table_sem.  There is a job->sem for each job
+ * entry in the job_table.  This job module is a PAGG module (Process
+ * Aggregation).  Each task has a special lock that protects its PAGG
+ * information - this is called the pagg list lock. There are special macros
+ * used to lock/unlock a task's pagg list lock.  The pagg list lock is really
+ * a semaphore.
+ *
+ * Purpose:
+ *
+ *	(1) The job_table_sem protects all entries in the table.
+ *	(2) The job->sem protects all data and task attachments for the job.
+ *
+ * Truths we hold to be self-evident:
+ *
+ * Only the holder of a write lock on job_table_sem may add or
+ * delete a job entry from the job_table. The job_table includes all job
+ * entries in the hash table and chains off the hash table locations.
+ *
+ * Only the holder of a write lock on a job->sem may attach or detach
+ * processes/tasks from the attached list for the job.
+ *
+ * If you hold a read lock on job_table_sem, you can assume that the
+ * job entries in the table will not change.  The link pointers for
+ * the chains of job entries will not change, the job ID (jid) value
+ * will not change, and data changes will be (mostly) atomic.
+ *
+ * If you hold a read lock on a job->sem, you can assume that the
+ * attachments to the job will not change.  The link pointers for the
+ * attachment list will not change and the attachments will not change.
+ *
+ * If you are going to grab nested locks, the nesting order is:
+ *
+ *	down_write/up_write/down_read/up_read(&task->pagg_sem)
+ *	job_table_sem
+ *	job->sem
+ *
+ * However, it is not strictly necessary to down the job_table_sem
+ * before downing job->sem. 
+ *
+ * Also, the nesting order allows you to lock in this order:
+ *
+ *	down_write/up_write/down_read/up_read(&task->pagg_sem)
+ *	job->sem
+ *
+ * without locking job_table_sem between the two.
+ *
+ */
+
+/* standard for kernel modules */
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kmod.h>
+#include <linux/init.h>
+#include <linux/list.h>
+
+#include <asm/uaccess.h>	/* for get_user & put_user */
+
+#include <linux/sched.h>	/* for current */
+#include <linux/tty.h>		/* for the tty declarations */
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <linux/proc_fs.h>
+
+#include <linux/string.h>
+#include <asm/semaphore.h>
+
+#include <linux/pagg.h>		/* to use pagg hooks */
+#include <linux/job.h>
+#include <linux/paggctl.h>
+
+MODULE_AUTHOR("Silicon Graphics, Inc.");
+MODULE_DESCRIPTION("PAGG-based inescapable jobs");
+MODULE_LICENSE("GPL");
+
+#define HASH_SIZE	1024
+
+/* The states for a job */ 
+#define FETAL	1	/* being born, not ready for attachments yet */
+#define RUNNING 2	/* Running job */
+#define STOPPED 3  	/* Stopped job */
+#define ZOMBIE  4	/* Dead job */
+
+/* Job creation tags for the job HID (host ID) */ 
+#define DISABLED	0xffffffff	/* New job creation disabled */
+#define LOCAL		0x0		/* Only creating local sys jobs */
+
+
+#ifdef 	__BIG_ENDIAN
+#define		iptr_hid(ll) 	((u32 *)&(ll))
+#define		iptr_sid(ll) 	(((u32 *)(&(ll) + 1)) - 1)
+#else	/* __LITTLE_ENDIAN */
+#define		iptr_hid(ll) 	(((u32 *)(&(ll) + 1)) - 1)
+#define		iptr_sid(ll) 	((u32 *)&(ll))
+#endif	/* __BIG_ENDIAN */
+
+#define		jid_hash(ll) 	(*(iptr_sid(ll)) % HASH_SIZE)
+
+
+/* Job info entry for member tasks */
+struct job_attach {
+	struct task_struct	*task;	/* task we are attaching to job */
+	struct pagg		*pagg;	/* our pagg entry in the task */
+	struct job_entry	*job;	/* the job we are attaching task to */
+	struct list_head	entry; 	/* list stuff */
+};
+
+struct job_waitinfo {
+	int		status;		/* For tasks waiting on job exit */
+};
+
+struct job_csainfo {
+	u64		corehimem;	/* Accounting - highpoint, phys mem */
+	u64		virthimem;	/* Accounting - highpoint, virt mem */
+	struct file	*acctfile;	/* The accounting file for job */
+}; 
+
+/* Job table entry type */
+struct job_entry {
+	u64		    jid;	/* Our job ID */
+	int	    	    refcnt;	/* Number of tasks attached to job */
+	int		    state;	/* State of job - RUNNING,... */
+	struct rw_semaphore sem;	/* lock for the job */
+	uid_t		    user;	/* user that owns the job */
+	time_t		    start;	/* When the job began */
+	struct job_csainfo  csa;	/* CSA accounting info */
+	wait_queue_head_t   zombie;	/* queue last task - during wait */
+	wait_queue_head_t   wait;	/* queue of tasks waiting on job */
+	int		    waitcnt;	/* Number of tasks waiting on job */
+	struct job_waitinfo waitinfo;	/* Status info for waiting tasks */ 
+	struct list_head    attached;	/* List of attached tasks */
+	struct list_head    entry;	/* List of other jobs - same hash */
+};
+
+
+/* Job container tables */
+static struct list_head  job_table[HASH_SIZE];
+static int	    	 job_table_refcnt = 0;
+static 			 DECLARE_RWSEM(job_table_sem);
+
+
+/* Accounting subscriber list */
+static struct job_acctmod 	*acct_list[JOB_ACCT_COUNT];
+static 				DECLARE_RWSEM(acct_list_sem);
+
+
+/* Host ID for the localhost */
+static u32   jid_hid = DISABLED;
+
+static char 	   *hid = NULL;	    
+MODULE_PARM(hid, "s");
+
+/* Function prototypes */
+static int job_sys_create(struct job_create *);
+static int job_sys_getjid(struct job_getjid *);
+static int job_sys_waitjid(struct job_waitjid *);
+static int job_sys_killjid(struct job_killjid *);
+static int job_sys_getjidcnt(struct job_jidcnt *);
+static int job_sys_getjidlst(struct job_jidlst *);
+static int job_sys_getpidcnt(struct job_pidcnt *);
+static int job_sys_getpidlst(struct job_pidlst *);
+static int job_sys_getuser(struct job_user *);
+static int job_sys_getprimepid(struct job_primepid *);
+static int job_sys_sethid(struct job_sethid *);
+static int job_sys_detachjid(struct job_detachjid *);
+static int job_sys_detachpid(struct job_detachpid *);
+static int job_attach(struct task_struct *, struct pagg *, void *);
+static int job_detach(struct task_struct *, struct pagg *);
+static struct job_entry *job_getjob(u64 jid);
+static int job_syscall(unsigned int, unsigned long);
+
+u64 job_getjid(struct task_struct *);
+
+int job_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
+
+/* Job container kernel pagg entry */
+static struct pagg_hook pagg_hook = {
+	.module	= THIS_MODULE,
+	.name	= PAGG_JOB,
+	.data	= &job_table,
+	.init	= NULL,
+	.entry	= LIST_HEAD_INIT(pagg_hook.entry),
+	.attach	= job_attach,
+	.detach	= job_detach,
+	.exec		= NULL,
+};
+
+/* proc dir entry */
+struct proc_dir_entry *job_proc_entry;
+
+/* file operations for proc file */
+static struct file_operations job_file_ops = {
+	.owner	= THIS_MODULE,
+	.ioctl	= job_ioctl
+};
+
+#ifdef DEBUG
+
+#define DBG_PRINTINIT(s)	\
+	char *dbg_fname = s		
+
+#define DBG_PRINTENTRY()					\
+do {								\
+	printk(KERN_DEBUG "job: %s: entry\n", dbg_fname);	\
+} while(0)
+
+#define DBG_PRINTEXIT(c)				 		\
+do {							 		\
+	printk(KERN_DEBUG "job: %s: exit, code = %d\n", dbg_fname, c);	\
+} while(0)
+
+/* write lock semaphore */
+#define JOB_WLOCK(l)					\
+do {							\
+	printk(KERN_DEBUG "job: wlock = %p\n", l);	\
+	down_write(l);					\
+} while(0);
+
+/* write unlock semaphore */
+#define JOB_WUNLOCK(l)					\
+do {							\
+	printk(KERN_DEBUG "job: wunlock = %p\n", l);	\
+	up_write(l);					\
+} while(0);
+
+/* read lock semaphore */
+#define JOB_RLOCK(l)					\
+do {							\
+	printk(KERN_DEBUG "job: rlock = %p\n", l);	\
+	down_read(l);					\
+} while(0);
+
+/* read unlock semaphore */
+#define JOB_RUNLOCK(l)					\
+do {							\
+	printk(KERN_DEBUG "job: runlock = %p\n", l);	\
+	up_read(l);					\
+} while(0);
+
+
+#else /* #ifdef DEBUG */
+
+#define DBG_PRINTINIT(s)	
+
+#define DBG_PRINTENTRY() 	\
+do {				\
+} while(0)
+
+#define DBG_PRINTEXIT(c)	\
+do {				\
+} while(0)
+
+/* write lock semaphore */
+#define JOB_WLOCK(l)	\
+do {			\
+	down_write(l);	\
+} while(0);
+
+/* write unlock semaphore */
+#define JOB_WUNLOCK(l)	\
+do {			\
+	up_write(l);	\
+} while(0);
+
+/* read lock semaphore */
+#define JOB_RLOCK(l)	\
+do {			\
+	down_read(l);	\
+} while(0);
+
+/* read unlock semaphore */
+#define JOB_RUNLOCK(l)	\
+do {			\
+	up_read(l);	\
+} while(0);
+
+
+#endif /* #ifdef DEBUG */
+
+
+
+/* 
+ * job_getjob
+ *
+ * Given a jid value, find the entry in the job_table and return a pointer
+ * to the job entry or NULL if not found.
+ *
+ * You should normally JOB_RLOCK the job_table_sem before calling this 
+ * function. 
+ */
+struct job_entry *
+job_getjob(u64 jid)
+{
+	struct list_head *entry = NULL;
+	struct job_entry *tjob = NULL;
+	struct job_entry *job = NULL;
+
+	list_for_each(entry,  &job_table[ jid_hash(jid) ]) {
+		tjob = list_entry(entry, struct job_entry, entry);
+		if (tjob->jid == jid) {
+			job = tjob;
+			break;
+		}
+	}
+	return job;
+}
+
+	
+/*
+ * job_attach
+ *
+ * Attach the task to the job specified in the target data (old_data).
+ * This function will add the task to the list of attached tasks for the job.
+ * In addition, a link from the task to the job is created and added to the 
+ * task via the data pointer reference.  
+ *
+ * The process that owns the target data should be at least read locked (using
+ * down_read(&task->pagg_sem)) during this call.  This helps ensure
+ * that the job cannot be removed, since at least one process will
+ * still be referencing the job (the one owning old_data).
+ *
+ * It is expected that this function will be called from within the
+ * pagg_attach() function in the kernel, when forking (do_fork) a child
+ * process represented by task.
+ *
+ * If this function is called from some other point, then it is possible that
+ * task and data could be altered while going through this function.  In such
+ * a case, the caller should also lock the pagg list for the task
+ * task_struct.
+ *
+ * The function returns 0 upon success and a negative errno value upon failure.
+ */
+static int
+job_attach(struct task_struct *task, struct pagg *new_pagg, 
+		void  *old_data)
+{
+	struct job_entry  *job        = ((struct job_attach *)old_data)->job;
+	struct job_attach *attached   = NULL;
+	int          errcode     = 0;
+	DBG_PRINTINIT("job_attach");
+
+	DBG_PRINTENTRY();
+
+	/* 
+	 * Lock the job for writing. The task owning old_data has its
+	 * pagg_sem locked, so we know there is at least one active reference
+	 * to the job - therefore, it cannot have been removed before we
+	 * have gotten this write lock established.
+	 */
+	JOB_WLOCK(&job->sem);
+
+	if (job->state == ZOMBIE) {
+		/* If the job is a zombie (dying), bail out of the attach */
+		printk(KERN_WARNING "Attach task(pid=%d) to job"
+				" failed - job is ZOMBIE\n", 
+				task->pid);
+		errcode = -EINPROGRESS;
+		JOB_WUNLOCK(&job->sem);
+		goto error_return;
+	}
+
+
+	/* Allocate memory that we will need */
+
+	attached = (struct job_attach *)kmalloc(sizeof(struct job_attach), 
+			GFP_KERNEL);
+	if (!attached) {
+		printk(KERN_ERR "Attach task(pid=%d) to job"
+				" failed on memory error in kernel\n", 
+				task->pid);
+		errcode = -ENOMEM;
+		JOB_WUNLOCK(&job->sem);
+		goto error_return;
+	}
+
+
+	attached->task  = task;
+	attached->pagg  = new_pagg;
+	attached->job   = job;
+	new_pagg->data  = (void *)attached;
+	list_add_tail(&attached->entry, &job->attached);
+	++job->refcnt;  
+
+	JOB_WUNLOCK(&job->sem);  
+
+	DBG_PRINTEXIT(0);
+	return 0;
+
+error_return:
+	DBG_PRINTEXIT(errcode);
+	if (attached) kfree(attached);
+	return errcode;
+}
+
+
+/*
+ * job_detach 
+ *
+ * Detach the task from the job it is attached to via the pagg reference.
+ * This function will remove the task from the list of attached tasks for the
+ * job specified via the pagg pointer.  In addition, the link to the job
+ * provided via the data pointer will also be removed.
+ *
+ * The pagg_list should be write locked for task before entering
+ * this function (using down_write(&task->pagg_sem)).
+ *
+ * The function always returns 0; there is currently no failure path.
+ */
+static int
+job_detach(struct task_struct *task, struct pagg *pagg)
+{
+	struct job_attach *attached   = ((struct job_attach *)(pagg->data));
+	struct job_entry  *job        = attached->job;
+	DBG_PRINTINIT("job_detach");
+
+	DBG_PRINTENTRY();
+
+	/*
+	 * Obtain the lock on the job_table_sem and the job->sem for
+	 * this job.
+	 */
+	JOB_WLOCK(&job_table_sem);
+	JOB_WLOCK(&job->sem);  
+
+	job->refcnt--;
+	list_del(&attached->entry);
+	pagg->data = NULL;
+	kfree(attached);
+
+	if (job->refcnt == 0) {
+		int waitcnt;
+
+		list_del(&job->entry);
+		--job_table_refcnt;
+
+		/* 
+		 * The job is removed from the job_table.
+		 * We can release the job_table_sem now since
+		 * nobody can access the job via the table.
+		 */
+		JOB_WUNLOCK(&job_table_sem);
+
+		job->state = ZOMBIE;
+		job->waitinfo.status = task->exit_code;
+
+		waitcnt = job->waitcnt;
+
+		/* 
+		 * Release the job semaphore.  You cannot hold
+		 * this lock if you want the wakeup to work
+		 * properly.
+		 */
+		JOB_WUNLOCK(&job->sem);
+
+		if (waitcnt > 0) {
+			wake_up_interruptible(&job->wait);
+			wait_event(job->zombie, job->waitcnt == 0);
+		} 
+
+		/* 
+		 * Job is exiting, all processes waiting for job to exit
+		 * have been notified.  Now we call the accounting
+		 * subscribers.
+		 */
+
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+		/* - CSA accounting */
+		if (acct_list[JOB_ACCT_CSA]) {
+			struct job_acctmod *acct = acct_list[JOB_ACCT_CSA];
+			if (acct->module) 
+				__MOD_INC_USE_COUNT(acct->module);
+			if (acct->jobend) {
+				int res = 0;
+				struct job_csa csa;
+
+				csa.job_id = job->jid;
+				csa.job_uid = job->user;
+				csa.job_start = job->start;
+				csa.job_corehimem = job->csa.corehimem;
+				csa.job_virthimem = job->csa.virthimem;
+				csa.job_acctfile = job->csa.acctfile;
+
+				res = acct->jobend(JOB_EVENT_END,
+						&csa);
+				if (res) {
+					printk(KERN_WARNING
+						"job_detach: CSA -"
+						" jobend failed.\n");
+				}
+			}
+			if (acct->module) 
+				__MOD_DEC_USE_COUNT(acct->module);
+		} else {
+			printk(KERN_WARNING "job_detach: CSA - attempt"
+					" to lock CSA module failed.\n");
+		}
+#endif /* CONFIG_CSA || defined(CONFIG_CSA_MODULE) */
+
+
+		/* 
+		 * Every process attached or waiting on this job should be
+		 * detached and finished waiting, so now we can free the
+		 * memory for the job.
+		 */
+		kfree(job);
+		MOD_DEC_USE_COUNT; 
+
+	} else {
+		/* This is the case where job->refcnt was greater than 1, so
+		 * we are not going to delete the job after the detach.
+		 * Both the job->sem and the job_table_sem are still held
+		 * here, so release both of them.
+		 */
+		JOB_WUNLOCK(&job->sem);
+		JOB_WUNLOCK(&job_table_sem);
+	}
+
+	DBG_PRINTEXIT(0);
+
+	return 0;
+}
+
+/* 
+ * job_sys_create
+ *
+ * This function is used to create a new job and attach the calling process
+ * to that new job.
+ *
+ * Returns 0 on success, and negative on failure (negative errno value).
+ */
+static int
+job_sys_create(struct job_create *create_args)
+{
+	struct job_create		create;
+	struct job_entry		*job 	      = NULL;
+	struct job_attach		*attached     = NULL;
+	struct pagg		*pagg	      = NULL;
+	struct pagg		*old_pagg	= NULL;
+	int			errcode       = 0;
+	DBG_PRINTINIT("job_sys_create");
+
+	DBG_PRINTENTRY();
+
+	/* We are creating a new job.  Increment the module use
+	 * count to reflect that we have another user of the module
+	 * (the new job container).  If we have an error when creating
+	 * the new job, this count will be decremented - since no new 
+	 * job will have been created.
+	 */
+	MOD_INC_USE_COUNT; 
+
+	/* 
+	 * If the job ID - host ID segment is set to DISABLED, we will
+	 * not be creating new jobs.  We don't mark it as an error, but
+	 * the jid value returned will be 0.
+	 */
+	if (jid_hid == DISABLED) {
+		errcode = 0;
+		goto error_return;
+	}
+
+
+#if 0	/* XXX - Use if capable is not present */
+	if (current->euid != 0)
+		return -EPERM;
+#else	
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		goto error_return;
+	}
+#endif
+	if (!create_args) {
+		errcode = -EINVAL;
+		goto error_return;
+	}
+
+	if (copy_from_user(&create, create_args, sizeof(create)))  {
+		errcode = -EFAULT;
+		goto error_return;
+	}
+		
+	/* 
+	 * Allocate some of the memory we might need, before we start
+	 * locking
+	 */
+
+	attached = (struct job_attach *)kmalloc(sizeof(struct job_attach), GFP_KERNEL);
+	if (!attached) {
+		/* error */
+		errcode = -ENOMEM;
+		goto error_return;
+	}
+
+	job = (struct job_entry *)kmalloc(sizeof(struct job_entry), GFP_KERNEL);
+	if (!job) {
+		/* error */
+		errcode = -ENOMEM;
+		goto error_return;
+	}
+
+	/* We keep the old pagg around in case we need it in an error condition.
+	 * If, for example, a job_getjob call fails because the requested JID is
+	 * already in use, we don't want to detach that job.  Having this ability 
+	 * is complicated by the locking.
+	 */
+	down_write(&current->pagg_sem); /* write lock pagg list */
+	old_pagg = pagg_get(current, pagg_hook.name);
+
+	/* 
+	 * Lock the job_table and add the pointers for the new job.
+	 * Since the job is new, we won't need to lock the job.
+	 */
+	JOB_WLOCK(&job_table_sem);  
+
+	/*
+	 * Determine if create should use specified JID or one that is
+	 * generated.
+	 */
+	if (create.jid != 0) {
+		/* We use the specified JID value */
+
+		if (job_getjob(create.jid)) { 
+			/* JID already in use, bail */
+			/* error_return doesn't do JOB_WUNLOCK */
+			JOB_WUNLOCK(&job_table_sem);
+			/* we haven't allocated a new pagg yet so error_return won't unlock 
+			 * this.  We'll unlock here */
+			up_write(&current->pagg_sem);
+			errcode = -EBUSY;
+			/* error_return doesn't touch old_pagg so we don't detach */
+			goto error_return;
+		} else {
+			/* Using specified JID */
+			job->jid = create.jid;
+		}
+
+	} else {	
+
+		/* We generate a new JID value */
+		*(iptr_hid(job->jid)) = jid_hid;
+		*(iptr_sid(job->jid)) = current->pid;
+	}
+
+	pagg = pagg_alloc(current, &pagg_hook);
+	if (!pagg) {
+		JOB_WUNLOCK(&job_table_sem); /* error_return doesn't unlock */
+		up_write(&current->pagg_sem); /* write unlock pagg list */
+		errcode = -ENOMEM;
+		goto error_return;
+	}
+
+	/* Initialize job entry values & lists */
+	job->refcnt = 1;
+	job->user = create.user;
+	job->start = jiffies;
+	job->csa.corehimem = 0;
+	job->csa.virthimem = 0;
+	job->csa.acctfile  = NULL;
+	job->state = RUNNING;
+	init_rwsem(&job->sem);
+	INIT_LIST_HEAD(&job->attached);
+	list_add_tail(&attached->entry, &job->attached);
+	init_waitqueue_head(&job->wait);
+	init_waitqueue_head(&job->zombie);
+	job->waitcnt = 0;
+	job->waitinfo.status = 0;
+
+	/* set link from entry in attached list to task and job entry */
+	attached->task = current;
+	attached->job = job;
+	attached->pagg = pagg;
+	pagg->data = (void *)attached;
+
+	/* Append new job to the tail of its hash chain */
+	list_add_tail(&job->entry, &job_table[ jid_hash(job->jid) ]);
+	++job_table_refcnt;
+
+	JOB_WUNLOCK(&job_table_sem); 
+	/* At this point, the possible error conditions where we would need the
+	 * old pagg are gone.  So we can remove it.  We remove after we unlock
+	 * because the pagg hook detach function does job table lock of its own.
+	 */
+	if (old_pagg) {
+		/* 
+		 * Detaching paggs for jobs never has a failure case,
+		 * so we don't need to worry about error codes.
+		 */
+		old_pagg->hook->detach(current, old_pagg);
+		pagg_free(old_pagg);
+	} 
+	up_write(&current->pagg_sem); /* write unlock pagg list */
+
+	/* Issue callbacks into accounting subscribers */
+
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+	/* - CSA subscriber */
+	if (acct_list[JOB_ACCT_CSA]) {
+		struct job_acctmod *acct = acct_list[JOB_ACCT_CSA];
+		if (acct->module) 
+			__MOD_INC_USE_COUNT(acct->module);
+		if (acct->jobstart) {
+			int res;
+			struct job_csa csa;
+
+			csa.job_id = job->jid;
+			csa.job_uid = job->user;
+			csa.job_start = job->start;
+			csa.job_corehimem = job->csa.corehimem;
+			csa.job_virthimem = job->csa.virthimem;
+			csa.job_acctfile = job->csa.acctfile;
+
+			res = acct->jobstart(JOB_EVENT_START, &csa);
+			if (res < 0) {
+				printk(KERN_WARNING "job_sys_create: CSA -"
+						" jobstart failed.\n");
+			}
+		}
+		if (acct->module) 
+			__MOD_DEC_USE_COUNT(acct->module);
+	}
+#endif /* CONFIG_CSA || defined(CONFIG_CSA_MODULE) */
+
+
+	create.r_jid = job->jid;
+	if (copy_to_user(create_args, &create, sizeof(create))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	DBG_PRINTEXIT(0);
+	return 0;
+
+error_return:
+	DBG_PRINTEXIT(errcode);
+	MOD_DEC_USE_COUNT;	/* no new job, so decrement use count */
+	if (attached) kfree(attached);
+	if (job) kfree(job);
+	if (pagg) {
+		pagg->hook->detach(current, pagg);  /* detach the pagg */
+		pagg_free(pagg);
+		/* This was locked at pagg_alloc call */
+		up_write(&current->pagg_sem); /* write unlock pagg list */
+	}
+	create.r_jid = 0;
+	if (copy_to_user(create_args, &create, sizeof(create))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	return errcode;
+}
+
+
+/*
+ * job_sys_getjid
+ *
+ * Function retrieves the job ID (jid) for the specified process (pid).
+ *
+ * returns 0 on success, negative errno value on failure.
+ */
+static int
+job_sys_getjid(struct job_getjid *getjid_args) 
+{
+	struct job_getjid	   getjid;
+	int		   errcode = 0;
+	struct task_struct *task;
+	DBG_PRINTINIT("job_sys_getjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&getjid, getjid_args, sizeof(getjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	/* lock the tasklist until we grab the specific task */
+	read_lock(&tasklist_lock);
+
+	if (getjid.pid == current->pid) {
+		task = current;
+	} else {
+		task = find_task_by_pid(getjid.pid);
+	}
+	if (task) {
+		get_task_struct(task); /* Ensure the task doesn't vanish on us */
+		read_unlock(&tasklist_lock); /* unlock the task list */
+		getjid.r_jid = job_getjid(task);
+		free_task_struct(task); /* We're done accessing the task */
+		if (getjid.r_jid == 0) {
+			errcode = -ENODATA;
+		}
+	} else {
+		read_unlock(&tasklist_lock);
+		getjid.r_jid = 0;
+		errcode = -ESRCH;
+	}
+
+
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(getjid_args, &getjid, sizeof(getjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/* 
+ * job_sys_waitjid
+ *
+ * This function allows a process to wait until a job exits; it returns the
+ * status information for the last process to exit the job.
+ *
+ * Returns 0 on success, or a negative errno value on failure.
+ */
+static int
+job_sys_waitjid(struct job_waitjid *waitjid_args)
+{
+	struct job_waitjid 	waitjid;
+	struct job_entry	*job;
+	int		retcode = 0;
+	DBG_PRINTINIT("job_sys_waitjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&waitjid, waitjid_args, sizeof(waitjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	waitjid.r_jid = waitjid.stat = 0;
+
+	if (waitjid.options != 0) {
+		retcode = -EINVAL;
+		goto general_return;
+	}
+
+	/* Lock the job table so that the current jobs don't change */
+	JOB_RLOCK(&job_table_sem);
+
+
+	if ((job = job_getjob(waitjid.jid)) == NULL ) {
+		JOB_RUNLOCK(&job_table_sem);
+		retcode = -ENODATA;
+		goto general_return;
+	} 
+
+	/* 
+	 * We got the job we need, we can release the job_table_sem
+	 */
+	JOB_WLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+
+	++job->waitcnt; 
+
+	JOB_WUNLOCK(&job->sem);
+
+	/* We shouldn't hold any locks at this point! The increment of the
+	 * job's waitcnt will ensure that the job is not removed without
+	 * first notifying this current task */
+	retcode = wait_event_interruptible(job->wait, 
+			job->refcnt == 0);
+
+	if (!retcode) {
+		/* 
+		 * This data is static at this point, we will 
+		 * not need a lock to read it.
+		 */
+		waitjid.stat = job->waitinfo.status;
+		waitjid.r_jid = job->jid;
+	}
+
+	JOB_WLOCK(&job->sem);
+	--job->waitcnt;
+	
+	if (job->waitcnt == 0)  {
+		JOB_WUNLOCK(&job->sem);
+
+		/* 
+		 * We shouldn't hold any locks at this point!  Else, the
+		 * last process in the job will not be able to remove the
+		 * job entry.
+		 *
+		 * That process is stuck waiting for this wake_up, so the
+		 * job shouldn't disappear until after this function call.
+		 * The job entry is no longer in the job table, so no
+		 * other process can get to the entry to foul things up.
+		 */
+		wake_up(&job->zombie);
+	} else {
+		JOB_WUNLOCK(&job->sem);
+	}
+
+general_return:
+
+	DBG_PRINTEXIT(retcode);
+	if (copy_to_user(waitjid_args, &waitjid, sizeof(waitjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+
+/*
+ * job_sys_killjid
+ *
+ * This function allows a signal to be sent to all processes in a job.
+ *
+ * returns 0 on success, negative of errno on failure.
+ */
+static int
+job_sys_killjid(struct job_killjid *killjid_args)
+{
+	struct job_killjid	 killjid;
+	struct job_entry	 *job;
+	struct list_head *attached_entry;
+	struct siginfo   info;
+	int retcode = 0;
+	DBG_PRINTINIT("job_sys_killjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&killjid, killjid_args, sizeof(killjid))) {
+		retcode = -EFAULT;
+		goto cleanup_0locks_return;
+	}
+
+	killjid.r_val = -1;
+
+	/* A signal of zero is really a status check and is handled as such
+	 * by send_sig_info.  So we have < 0 instead of <= 0 here.
+	 */
+	if (killjid.sig < 0) {
+		retcode = -EINVAL;
+		goto cleanup_0locks_return;
+	} 
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(killjid.jid);
+	if (!job) {
+		/* Job not found, copy back data & bail with error */
+		retcode = -ENODATA;
+		goto cleanup_1locks_return;
+	}
+
+	JOB_RLOCK(&job->sem);
+
+	/* 
+	 * Check capability to signal job.  The signaling user must be
+	 * the owner of the job or have CAP_SYS_RESOURCE capability.
+	 */
+#if 0		/* Use this if capable() is not available */
+	if (current->uid != 0) { 
+#else
+	if (!capable(CAP_SYS_RESOURCE)) {
+#endif
+		if (current->uid != job->user) {
+			retcode = -EPERM;
+			goto cleanup_2locks_return;
+		}
+	}
+
+	info.si_signo = killjid.sig;
+	info.si_errno = 0;
+	info.si_code = SI_USER;
+	info.si_pid = current->pid;
+	info.si_uid = current->uid;
+
+	list_for_each(attached_entry, &job->attached) {
+		int err;
+		struct job_attach *attached;
+
+		attached = list_entry(attached_entry, struct job_attach, entry);
+		err = send_sig_info(killjid.sig, &info, 
+				attached->task);
+		if (err != 0) {
+			/* 
+			 * XXX - the "prime" process, or initiating process
+			 * for the job may not be owned by the user.  So,
+			 * we would get an error in this case.  However, we
+			 * ignore the error for that specific process - it
+			 * should exit when all the child processes exit. It 
+			 * should ignore all signals from the user.
+			 *
+			 */
+			if (attached->entry.prev != &job->attached) {
+				retcode = err;
+			}
+		}
+
+	}
+
+cleanup_2locks_return:
+	JOB_RUNLOCK(&job->sem);
+cleanup_1locks_return:
+	JOB_RUNLOCK(&job_table_sem);
+cleanup_0locks_return:
+	killjid.r_val = retcode;
+	
+	DBG_PRINTEXIT(retcode);
+	if (copy_to_user(killjid_args, &killjid, sizeof(killjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+
+/*
+ * job_sys_getjidcnt
+ *
+ * Return the number of jobs currently on the system.
+ *
+ * returns 0 on success & it always succeeds.
+ */ 
+static int
+job_sys_getjidcnt(struct job_jidcnt *jidcnt_args)
+{
+	struct job_jidcnt 	jidcnt;
+	DBG_PRINTINIT("job_sys_getjidcnt");
+
+	DBG_PRINTENTRY();
+
+	/* read lock might be overdoing it in this case */
+	JOB_RLOCK(&job_table_sem);
+	jidcnt.r_val = job_table_refcnt;
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(0);
+
+	if (copy_to_user(jidcnt_args, &jidcnt, sizeof(jidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+		
+
+/*
+ * job_sys_getjidlst
+ *
+ * Get the list of all jids currently on the system (limited by the number of
+ * jobs there are and the number you say you can accept).
+ */
+static int
+job_sys_getjidlst(struct job_jidlst *jidlst_args)
+{
+	struct job_jidlst	 jidlst;
+	u64	 *jid;
+	struct job_entry	 *job;
+	struct list_head *job_entry;
+	int		 i;
+	int 		 count;
+	DBG_PRINTINIT("job_sys_getjidlst");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&jidlst, jidlst_args, sizeof(jidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	if (jidlst.r_val == 0)  {
+		DBG_PRINTEXIT(0);
+		return 0;
+	}
+
+	jid = (u64 *)kmalloc(sizeof(u64)*jidlst.r_val, GFP_KERNEL);
+	if (!jid) {
+		jidlst.r_val = 0;
+		DBG_PRINTEXIT(-ENOMEM);
+		if (copy_to_user(jidlst_args, &jidlst, sizeof(jidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENOMEM;
+	}
+
+
+	count = 0;
+	JOB_RLOCK(&job_table_sem);
+	for (i = 0; i < HASH_SIZE && count < jidlst.r_val; i++) {
+		list_for_each(job_entry, &job_table[i]) {
+			job = list_entry(job_entry, struct job_entry, entry);
+			jid[count++] = job->jid;
+			if (count == jidlst.r_val) {
+				break;
+			}
+		}
+	}
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(0);
+	jidlst.r_val = count;
+
+	/* Copy the jids out in one pass; free the buffer if the copy faults */
+	if (copy_to_user(jidlst.jid, jid, sizeof(u64) * count)) {
+		kfree(jid);
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	kfree(jid);
+
+	if (copy_to_user(jidlst_args, &jidlst, sizeof(jidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+
+/*
+ * job_sys_getpidcnt
+ *
+ * Get the number of processes currently attached to a specific job.
+ *
+ * returns 0 on success, or negative errno value on failure.
+ */
+static int
+job_sys_getpidcnt(struct job_pidcnt *pidcnt_args)
+{
+	struct job_pidcnt pidcnt;
+	struct job_entry  *job;
+	int	     retcode = 0;
+	DBG_PRINTINIT("job_sys_getpidcnt");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&pidcnt, pidcnt_args, sizeof(pidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	pidcnt.r_val = 0;
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(pidcnt.jid);
+	if (!job) {
+		retcode = -ENODATA;
+	} else {
+		/* Read lock might be overdoing it for this case */
+		JOB_RLOCK(&job->sem);
+		pidcnt.r_val = job->refcnt;
+		JOB_RUNLOCK(&job->sem);
+	}
+	JOB_RUNLOCK(&job_table_sem);
+
+	DBG_PRINTEXIT(retcode);
+
+	if (copy_to_user(pidcnt_args, &pidcnt, sizeof(pidcnt))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return retcode;
+}
+
+/*
+ * job_sys_getpidlst
+ *
+ * Get the list of processes (pids) currently attached to the specified
+ * job.  The number of processes provided is limited by the number the user
+ * specifies that they can accept (have memory for) and the number currently
+ * attached.
+ *
+ * returns 0 on success, negative errno value on failure.
+ */
+static int
+job_sys_getpidlst(struct job_pidlst *pidlst_args)
+{
+	struct job_pidlst	 pidlst;
+	struct job_entry	 *job;
+	struct job_attach	 *attached;
+	struct list_head *attached_entry;
+	pid_t		 *pid;
+	int		 max;
+	int		 i;
+	DBG_PRINTINIT("job_sys_getpidlst");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&pidlst, pidlst_args, sizeof(pidlst))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+
+	if (pidlst.r_val == 0) {
+		DBG_PRINTEXIT(0);
+		return 0;
+	}
+
+	max = pidlst.r_val;
+	pidlst.r_val = 0;
+	pid = (pid_t *)kmalloc(sizeof(pid_t)*max, GFP_KERNEL);
+	if (!pid) {
+		DBG_PRINTEXIT(-ENOMEM);
+		if (copy_to_user(pidlst_args, &pidlst, sizeof(pidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENOMEM;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(pidlst.jid);
+	if (!job) {
+		/* Job not found: drop the lock and free the pid buffer */
+		JOB_RUNLOCK(&job_table_sem);
+		kfree(pid);
+		DBG_PRINTEXIT(-ENODATA);
+		if (copy_to_user(pidlst_args, &pidlst, sizeof(pidlst))) {
+			DBG_PRINTEXIT(-EFAULT);
+			return -EFAULT;
+		}
+		return -ENODATA;
+	} else {
+
+		JOB_RLOCK(&job->sem);
+		JOB_RUNLOCK(&job_table_sem);
+
+		i = 0;
+		list_for_each(attached_entry, &job->attached) {
+			if (i == max) {
+				break;
+			}
+			attached = list_entry(attached_entry, struct job_attach, 
+					entry);
+			pid[i++] = attached->task->pid;
+		}
+		pidlst.r_val = i;
+
+		JOB_RUNLOCK(&job->sem);
+	}
+
+	/* Copy the pids out in one pass; free the buffer if the copy faults */
+	if (copy_to_user(pidlst.pid, pid, sizeof(pid_t) * pidlst.r_val)) {
+		kfree(pid);
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	kfree(pid);
+
+	DBG_PRINTEXIT(0);
+	copy_to_user(pidlst_args, &pidlst, sizeof(pidlst));
+	return 0;
+}
+
+
+/*
+ * job_sys_getuser
+ *
+ * Get the uid of the user that owns the job.
+ *
+ * returns 0 on success, returns negative errno on failure.
+ */
+static int
+job_sys_getuser(struct job_user *user_args)
+{
+	struct job_entry *job;
+	struct job_user user;
+	int        retcode = 0;
+	DBG_PRINTINIT("job_sys_getuser");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&user, user_args, sizeof(user))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return(-EFAULT);
+	}
+
+	user.r_user = 0;
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(user.jid);
+	if (!job) {
+		retcode = -ENODATA;
+	} else {
+		JOB_RLOCK(&job->sem);
+		user.r_user = job->user;
+		JOB_RUNLOCK(&job->sem);
+	}
+
+	JOB_RUNLOCK(&job_table_sem);
+
+	if (copy_to_user(user_args, &user, sizeof(user))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	DBG_PRINTEXIT(retcode);
+	return retcode;
+}
+
+
+/* 
+ * job_sys_getprimepid
+ *
+ * Get the primary process - the oldest process in the job.
+ *
+ * returns 0 on success, negative errno on failure.
+ */
+static int
+job_sys_getprimepid(struct job_primepid *primepid_args)
+{
+	struct job_primepid   primepid;
+	struct job_entry      *job = NULL;
+	struct job_attach     *attached = NULL;
+	int              retcode = 0;
+	DBG_PRINTINIT("job_sys_getprimepid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&primepid, primepid_args, sizeof(primepid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	primepid.r_pid = 0;
+
+	JOB_RLOCK(&job_table_sem);
+
+	job = job_getjob(primepid.jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		/* Job not found, return INVALID VALUE */
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	/* 
+	 * Job found, now look at first pid entry in the 
+	 * attached list.
+	 */
+	JOB_RLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+	if (list_empty(&job->attached)) {
+		retcode = -ESRCH;
+		primepid.r_pid = 0;
+	}  else {
+		attached = list_entry(job->attached.next, struct job_attach, entry);
+		if (!attached->task) {
+			retcode = -ESRCH;
+		} else {
+			primepid.r_pid = attached->task->pid;
+		}
+	}
+	JOB_RUNLOCK(&job->sem);
+
+	if (copy_to_user(primepid_args, &primepid, sizeof(primepid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	DBG_PRINTEXIT(retcode);
+	return retcode;
+}
+
+
+/* 
+ * job_sys_sethid
+ *
+ * This function is used to set the host ID segment for the job IDs (jid).
+ * If this does not get set, then the jid's upper 32 bits will be set to
+ * 0 and the jid cannot be used reliably in a cluster environment.
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_sethid(struct job_sethid *sethid_args)
+{
+	struct job_sethid	sethid;
+	int			errcode = 0;
+	DBG_PRINTINIT("job_sys_sethid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&sethid, sethid_args, sizeof(sethid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		sethid.r_hid = 0;
+		goto cleanup_return;
+	}
+
+	/* 
+	 * Set job_table_sem, so no jobs can be deleted while doing
+	 * this operation.
+	 */
+	JOB_WLOCK(&job_table_sem); 
+
+	sethid.r_hid = jid_hid = sethid.hid;
+
+	JOB_WUNLOCK(&job_table_sem);
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(sethid_args, &sethid, sizeof(sethid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/* 
+ * job_sys_detachjid
+ *
+ * This function detaches all the processes from a job, but allows the
+ * processes to continue running.  You need CAP_SYS_RESOURCE capability
+ * for this to succeed. Since all processes will be detached, the job will
+ * exit.
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_detachjid(struct job_detachjid *detachjid_args)
+{
+	struct job_detachjid	   detachjid;
+	struct job_entry	   *job;
+	struct list_head   *entry;
+	int		   count;
+	int		   errcode = 0;
+	struct task_struct *task;
+	struct pagg *pagg;
+
+	DBG_PRINTINIT("job_sys_detachjid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&detachjid, detachjid_args, sizeof(detachjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	detachjid.r_val = 0;
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		goto cleanup_return;
+	}
+
+	/* 
+	 * Set job_table_sem, so no jobs can be deleted while doing
+	 * this operation.
+	 */
+	JOB_WLOCK(&job_table_sem); 
+
+	job = job_getjob(detachjid.jid);
+
+	if (job) {
+
+		JOB_WLOCK(&job->sem);
+
+		/* Mark job as ZOMBIE so no new processes can attach to it */	
+		job->state = ZOMBIE;
+
+		count = job->refcnt;
+
+		/* Okay, no new processes can attach to the job.  We can 
+		 * release the locks on the job_table and job since the only
+		 * way for the job to change now is for tasks to detach and
+		 * the job to be removed.  And this is what we want to happen
+		 */
+		JOB_WUNLOCK(&job_table_sem);
+		JOB_WUNLOCK(&job->sem);
+
+
+		/* Walk through list of attached tasks and unset the 
+		 * pagg entries. 
+		 * 
+		 * We don't test with list_empty because that actually means NO tasks
+		 * left rather than one task.  If we used !list_empty or list_for_each,
+		 * we could reference memory freed by the pagg hook detach function 
+		 * (job_detach).
+		 * 
+		 * We know there is only one task left when job->attached.next and
+		 * job->attached.prev both point to the same place.
+		 */
+		while (job->attached.next != job->attached.prev) {
+			entry = job->attached.next;
+
+			task = (list_entry(entry, struct job_attach, entry))->task;
+			pagg = (list_entry(entry, struct job_attach, entry))->pagg;
+
+			down_write(&task->pagg_sem); /* write lock pagg list */
+			pagg->hook->detach(task, pagg);
+			pagg_free(pagg);
+			up_write(&task->pagg_sem); /* write unlock pagg list */
+
+		}
+		/* At this point, there is only one task left */
+
+		entry = job->attached.next;
+
+		task = (list_entry(entry, struct job_attach, entry))->task;
+		pagg = (list_entry(entry, struct job_attach, entry))->pagg;
+
+		down_write(&task->pagg_sem); /* write lock pagg list */
+		pagg->hook->detach(task, pagg);
+		pagg_free(pagg);
+		up_write(&task->pagg_sem); /* write unlock pagg list */
+			
+		detachjid.r_val = count;
+
+	} else {
+		errcode = -ENODATA;
+		JOB_WUNLOCK(&job_table_sem);
+	}
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(detachjid_args, &detachjid, sizeof(detachjid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/* 
+ * job_sys_detachpid
+ *
+ * This function detaches a process from the job it is attached to,
+ * but allows the process to continue running.  You need
+ * CAP_SYS_RESOURCE capability for this to succeed. 
+ *
+ * returns -errno value on fail, 0 on success
+ */
+static int
+job_sys_detachpid(struct job_detachpid *detachpid_args)
+{
+	struct job_detachpid	   detachpid;
+	struct task_struct *task;
+	struct pagg *pagg;
+	int		   errcode = 0;
+	DBG_PRINTINIT("job_sys_detachpid");
+
+	DBG_PRINTENTRY();
+
+	if (copy_from_user(&detachpid, detachpid_args, sizeof(detachpid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+
+	detachpid.r_jid = 0;
+
+	if (!capable(CAP_SYS_RESOURCE)) {
+		errcode = -EPERM;
+		goto cleanup_return;
+	}
+
+	/* Lock the task list while we find a specific task */
+	read_lock(&tasklist_lock);
+	task = find_task_by_pid(detachpid.pid);
+	if (!task) {
+		errcode = -ESRCH;
+		/* We need to unlock the tasklist here too or the lock is held forever */
+		read_unlock(&tasklist_lock);
+		goto cleanup_return;
+	}
+
+	/* We have a valid task now */
+	get_task_struct(task); /* Ensure the task doesn't vanish on us */
+	read_unlock(&tasklist_lock);
+	down_write(&task->pagg_sem); /* write lock pagg list */
+
+	pagg = pagg_get(task, pagg_hook.name);
+	if (pagg) {
+		detachpid.r_jid = ((struct job_attach *)pagg->data)->job->jid;
+		pagg->hook->detach(task, pagg);
+		pagg_free(pagg);
+	} else {
+		errcode = -ENODATA;
+	}
+	up_write(&task->pagg_sem); /* write unlock pagg list */
+	free_task_struct(task);  /* Done accessing the task */
+
+cleanup_return:
+	DBG_PRINTEXIT(errcode);
+	if (copy_to_user(detachpid_args, &detachpid, sizeof(detachpid))) {
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;
+	}
+	return errcode;
+}
+
+
+/*
+ * job_register_acct
+ *
+ * This function is used by modules that are registering to provide job 
+ * accounting services.
+ *
+ * returns -errno value on fail, 0 on success.
+ */
+int 
+job_register_acct(struct job_acctmod *am)
+{
+	DBG_PRINTINIT("job_register_acct");
+
+	DBG_PRINTENTRY();
+
+	if (!am) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;	/* error, invalid value */
+	}
+	if (am->type < 0 || am->type > (JOB_ACCT_COUNT-1)) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;	/* error, invalid value */
+	}
+
+	JOB_WLOCK(&acct_list_sem);
+	if (acct_list[am->type] != NULL) {
+		JOB_WUNLOCK(&acct_list_sem);
+		DBG_PRINTEXIT(-EBUSY);
+		return -EBUSY;	/* error, duplicate entry */
+	}
+
+	acct_list[am->type] = am;
+	JOB_WUNLOCK(&acct_list_sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+
+/*
+ * job_unregister_acct
+ *
+ * This is used by accounting modules to unregister with the job module as
+ * subscribers for job accounting information.
+ *
+ * Returns -errno on failure and 0 on success.
+ */
+int 
+job_unregister_acct(struct job_acctmod *am)
+{
+	DBG_PRINTINIT("job_unregister_acct");
+
+	DBG_PRINTENTRY();
+
+	if (!am) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;	/* error, invalid value */
+	}
+	if (am->type < 0 || am->type > (JOB_ACCT_COUNT-1))  {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;	/* error, invalid value */
+	}
+
+	JOB_WLOCK(&acct_list_sem);
+	if (acct_list[am->type] != am) {
+		JOB_WUNLOCK(&acct_list_sem);
+		DBG_PRINTEXIT(-EFAULT);
+		return -EFAULT;	/* error, not matching entry */
+	}
+
+	acct_list[am->type] = NULL;
+	JOB_WUNLOCK(&acct_list_sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+/*
+ * job_getjid
+ *
+ * This function will return the Job ID for the given task.  If
+ * the task is not attached to a job, then 0 is returned.
+ *
+ */
+u64 job_getjid(struct task_struct *task)
+{
+	struct pagg *pagg = NULL;
+	struct job_entry	   *job = NULL;
+	u64	   jid = 0;
+	DBG_PRINTINIT("job_getjid");
+
+	DBG_PRINTENTRY();
+
+	down_read(&task->pagg_sem); /* lock pagg list */
+	pagg = pagg_get(task, pagg_hook.name);
+	if (pagg) {
+		job = ((struct job_attach *)pagg->data)->job;
+		JOB_RLOCK(&job->sem);
+		jid = job->jid;
+		JOB_RUNLOCK(&job->sem);
+	}
+	up_read(&task->pagg_sem);
+
+	DBG_PRINTEXIT((int)jid);
+	return jid;
+}
+
+
+/*
+ * job_getacct
+ *
+ * This function is used by accounting subscribers to get accounting 
+ * information about a job.
+ *
+ * The caller must supply the Job ID (jid) that specifies the job. The
+ * "type" argument indicates the type of accounting data to be returned.
+ * The data will be returned in the memory accessed via the data pointer
+ * argument.  The data pointer is void so that this function interface
+ * can handle different types of accounting data.
+ */
+int job_getacct(u64 jid, int type, void *data)
+{
+	struct job_entry	*job;
+	DBG_PRINTINIT("job_getacct");
+
+	DBG_PRINTENTRY();
+
+	if (!data) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	if (!jid) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	JOB_RLOCK(&job->sem);
+	JOB_RUNLOCK(&job_table_sem);
+
+	switch (type) {
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+		case JOB_ACCT_CSA: 
+		{
+			struct job_csa *csa = (struct job_csa *)data;
+
+			csa->job_id = job->jid;
+			csa->job_uid = job->user;
+			csa->job_start = job->start;
+			csa->job_corehimem = job->csa.corehimem;
+			csa->job_virthimem = job->csa.virthimem;
+			csa->job_acctfile = job->csa.acctfile;
+			break;
+		}
+#endif
+		default:
+			JOB_RUNLOCK(&job->sem);
+			DBG_PRINTEXIT(-EINVAL);
+			return -EINVAL;
+			break;
+	}
+	JOB_RUNLOCK(&job->sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+/*
+ * job_setacct
+ *
+ * This function is used by accounting subscribers to store specific
+ * accounting information in a job, so that the job entry retains it
+ * for later retrieval with job_getacct.
+ *
+ * The job is identified by the jid argument.  The type indicates the
+ * type of accounting the information is associated with.  The subfield
+ * is a bitmask that indicates exactly what subfields are to be changed.
+ * The data that is used to set the values is supplied by the data pointer.
+ * The data pointer is a void type so that the interface can be used for
+ * different types of accounting information.
+ */
+int job_setacct(u64 jid, int type, int subfield, void *data)
+{
+	struct job_entry	*job;
+	DBG_PRINTINIT("job_setacct");
+
+	DBG_PRINTENTRY();
+
+	if (!data) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	if (!jid) {
+		DBG_PRINTEXIT(-EINVAL);
+		return -EINVAL;
+	}
+
+	JOB_RLOCK(&job_table_sem);
+	job = job_getjob(jid);
+	if (!job) {
+		JOB_RUNLOCK(&job_table_sem);
+		DBG_PRINTEXIT(-ENODATA);
+		return -ENODATA;
+	}
+
+	JOB_WLOCK(&job->sem);	/* write lock: the job entry is modified below */
+	JOB_RUNLOCK(&job_table_sem);
+
+	switch (type) {
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+		case JOB_ACCT_CSA:
+		{
+			struct job_csa *csa = (struct job_csa *)data;
+
+			if (subfield & JOB_CSA_ACCTFILE) {
+				job->csa.acctfile = csa->job_acctfile;
+			}
+			break;
+		}
+#endif
+		default:
+			JOB_WUNLOCK(&job->sem);
+			DBG_PRINTEXIT(-EINVAL);
+			return -EINVAL;
+			break;
+	}
+	JOB_WUNLOCK(&job->sem);
+	DBG_PRINTEXIT(0);
+	return 0;
+}
+
+
+
+/*
+ * job_syscall
+ *
+ * Function to handle job syscall requests.
+ *
+ * Returns 0 on success and -(ERRNO VALUE) upon failure.
+ */
+int
+job_syscall(unsigned int request, unsigned long data)
+{                 
+	int rc=0;
+
+	DBG_PRINTINIT("job_syscall");
+
+	DBG_PRINTENTRY();
+
+	switch (request) {
+		case JOB_CREATE:
+			rc = job_sys_create((struct job_create *)data);
+			break;
+		case JOB_ATTACH:
+		case JOB_DETACH:
+			/* RESERVED */
+			rc = -EBADRQC;
+			break;
+		case JOB_GETJID:
+			rc = job_sys_getjid((struct job_getjid *)data);
+			break;
+		case JOB_WAITJID:
+			rc = job_sys_waitjid((struct job_waitjid *)data);
+			break;
+		case JOB_KILLJID:
+			rc = job_sys_killjid((struct job_killjid *)data);
+			break;
+		case JOB_GETJIDCNT:
+			rc = job_sys_getjidcnt((struct job_jidcnt *)data);
+			break;
+		case JOB_GETJIDLST:
+			rc = job_sys_getjidlst((struct job_jidlst *)data);
+			break;
+		case JOB_GETPIDCNT:
+			rc = job_sys_getpidcnt((struct job_pidcnt *)data);
+			break;
+		case JOB_GETPIDLST:
+			rc = job_sys_getpidlst((struct job_pidlst *)data);
+			break;
+		case JOB_GETUSER:
+			rc = job_sys_getuser((struct job_user *)data);
+			break;
+		case JOB_GETPRIMEPID:
+			rc = job_sys_getprimepid((struct job_primepid *)data);
+			break;
+		case JOB_SETHID:
+			rc = job_sys_sethid((struct job_sethid *)data);
+			break;
+		case JOB_DETACHJID:
+			rc = job_sys_detachjid((struct job_detachjid *)data);
+			break;
+		case JOB_DETACHPID:
+			rc = job_sys_detachpid((struct job_detachpid *)data);
+			break;
+		case JOB_SETJLIMIT:
+		case JOB_GETJLIMIT:
+		case JOB_GETJUSAGE:
+		case JOB_FREE:
+		default:
+			rc = -EBADRQC;
+			break;
+	}
+
+	DBG_PRINTEXIT(rc);
+	return rc;
+}
+
+
+/*
+ * job_ioctl
+ *
+ * Function to handle job ioctl call requests.
+ *
+ * Returns 0 on success and -(ERRNO VALUE) upon failure.
+ */
+int
+job_ioctl(struct inode *inode, struct file *file, unsigned int request,
+	  unsigned long data)        
+{                 
+	return job_syscall(request, data);
+}
+
+
+/* 
+ * init_job
+ *
+ * This function is called when a module is inserted into a kernel. This
+ * function allocates any necessary structures and sets initial values for
+ * module data.
+ *
+ * If the function succeeds, then 0 is returned.  On failure, -1 is returned.
+ */
+static int __init
+init_job(void) 
+{
+	int i,rc;
+
+
+	/* Initialize the job table chains */
+	for (i = 0; i < HASH_SIZE; i++) {
+		INIT_LIST_HEAD(&job_table[i]);
+	}
+
+	/* Initialize the list for accounting subscribers */
+	for (i = 0; i < JOB_ACCT_COUNT; i++) {
+		acct_list[i] = NULL;
+	}
+
+	/* Get hostID string and fill in jid_template hostID segment */
+	if (hid) {
+		jid_hid = (int)simple_strtoul(hid, &hid, 16);
+	} else {
+		jid_hid = 0;
+	}
+
+	rc = pagg_hook_register(&pagg_hook);
+	if (rc < 0) {
+		return -1;
+	}
+
+	/* Setup our /proc entry file */
+	job_proc_entry = create_proc_entry(JOB_PROC_ENTRY,
+		S_IFREG | S_IRUGO, &proc_root);
+
+	if (!job_proc_entry) {
+		pagg_hook_unregister(&pagg_hook);
+		return -1;
+	}
+
+	job_proc_entry->proc_fops = &job_file_ops;
+	job_proc_entry->proc_iops = NULL;
+
+
+	return 0;
+}
+module_init(init_job);
+
+/*
+ * cleanup_job
+ *
+ * This function is called to cleanup after a module when it is removed.
+ * All memory allocated for this module will be freed.
+ *
+ * This function does not take any inputs or produce any output.
+ */
+static void __exit
+cleanup_job(void)
+{
+	remove_proc_entry(JOB_PROC_ENTRY, &proc_root);
+	pagg_hook_unregister(&pagg_hook);
+	return;
+}
+module_exit(cleanup_job);
+
+EXPORT_SYMBOL(job_register_acct);
+EXPORT_SYMBOL(job_unregister_acct);
+EXPORT_SYMBOL(job_getjid);
+EXPORT_SYMBOL(job_getacct);
+EXPORT_SYMBOL(job_setacct);
diff -pNaru linux-2.4.27/kernel/ksyms.c linux/kernel/ksyms.c
--- linux-2.4.27/kernel/ksyms.c	2004-02-18 05:36:32 -08:00
+++ linux/kernel/ksyms.c	2004-12-01 11:30:26 -08:00
@@ -52,6 +52,8 @@
 #include <linux/firmware.h>
 #include <asm/checksum.h>
 
+#include <linux/csa_internal.h>
+
 #if defined(CONFIG_PROC_FS)
 #include <linux/proc_fs.h>
 #endif
@@ -618,6 +620,9 @@ EXPORT_SYMBOL(unshare_files);
 
 /* debug */
 EXPORT_SYMBOL(dump_stack);
+
+/* csa job accounting */
+EXPORT_SYMBOL(do_csa_acct);
 
 /* To match ksyms with System.map */
 extern const char _end[];
diff -pNaru linux-2.4.27/kernel/pagg.c linux/kernel/pagg.c
--- linux-2.4.27/kernel/pagg.c	1969-12-31 16:00:00 -08:00
+++ linux/kernel/pagg.c	2004-12-01 11:30:26 -08:00
@@ -0,0 +1,385 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:  This file, kernel/pagg.c, contains the routines used
+ *               to implement process aggregates (paggs).  The pagg
+ *               extends the task_struct to allow for various process
+ *               aggregation containers.  Examples of such containers
+ *               include "jobs" and cluster application IDs.  Process
+ *               sessions and groups could have been implemented using
+ *               paggs (although there would be little purpose in
+ *               making that change at this juncture).  The pagg
+ *               structure maintains pointers to callback functions and
+ *               data structures maintained in modules that have
+ *               registered with the kernel as pagg container
+ *               providers.
+ */
+
+#include <linux/config.h>
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/pagg.h>
+#include <asm/semaphore.h>
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/* 
+ * pagg_get
+ *
+ * Given a task, this function searches the task's pagg_list and returns
+ * a pointer to the pagg struct whose hook name matches the search
+ * key.  If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for the task using down_read(&task->pagg_sem).
+ */
+struct pagg *
+pagg_get(struct task_struct *task, char *key)
+{
+	struct pagg *pagg;
+
+	list_for_each_entry(pagg, &task->pagg_list, entry) {
+		if (!strcmp(pagg->hook->name,key))
+			return pagg;
+	}
+	return NULL;
+}
+
+
+/*
+ * pagg_alloc
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be 
+ * removed. If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock. The caller should hold
+ * a write lock on the task's pagg_sem.  This can be locked using
+ * down_write(&task->pagg_sem)
+ */
+struct pagg *
+pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL);
+	if (!pagg)
+		return NULL;
+
+	pagg->hook = pagg_hook;
+	pagg->data = NULL;
+	atomic_inc(&pagg_hook->refcnt);  /* Increase hook's reference count */
+	list_add_tail(&pagg->entry, &task->pagg_list);
+	return pagg;
+}
+
+
+/*
+ * pagg_free
+ *
+ * This function will ensure the pagg is deleted from
+ * the list of pagg entries for the task. Finally, the memory for the 
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_sem
+ * for the task. This can be locked using down_write(&task->pagg_sem).
+ *
+ * Prior to calling pagg_free, the pagg should have been detached from the
+ * pagg container represented by this pagg.  That is usually done using
+ * pagg->hook->detach(task, pagg);
+ */
+void
+pagg_free(struct pagg *pagg) 
+{
+	atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */
+	list_del(&pagg->entry);
+	kfree(pagg);
+}
+
+
+/*
+ * get_pagg_hook
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that matches the name.
+ * 
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function.  This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+	struct pagg_hook *pagg_hook;
+
+	list_for_each_entry(pagg_hook, &pagg_hook_list, entry) {
+		if (!strcmp(pagg_hook->name, key)) {
+			return pagg_hook;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * pagg_hook_register
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * In the future an initialization function may also be defined so that all
+ * existing tasks can be assigned to a default pagg entry for the hook.
+ * However, this would require iterating through the tasklist.  To do that
+ * requires that the tasklist_lock be read locked.  Since the initialization
+ * function might be in a module, and therefore it might sleep (implementor's
+ * decision), holding the tasklist_lock seems like a bad idea. It may be a
+ * requirement that the initialization function will be strictly forbidden
+ * from locking - by gentleman's agreement...
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ */
+int
+pagg_hook_register(struct pagg_hook *pagg_hook_new)
+{
+	struct pagg_hook *pagg_hook = NULL;
+
+	/* ADD NEW PAGG MODULE TO ACCESS LIST */
+	if (!pagg_hook_new)
+		return -EINVAL;			/* error */
+	if (!list_empty(&pagg_hook_new->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) 
+		return -EINVAL;			/* error */
+
+	/* Try to insert new hook entry into the pagg hook list */
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+	if (pagg_hook) {
+		up_write(&pagg_hook_list_sem);
+		printk(KERN_WARNING "Attempt to register duplicate"
+				" PAGG support (name=%s)\n", pagg_hook_new->name);
+		return -EBUSY;
+	}
+
+	/* Okay, we can insert into the pagg hook list */
+	list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+	/* set the ref count to zero */
+	atomic_set(&pagg_hook_new->refcnt, 0);
+	/* printk("DEBUG - pagg hook register - refcnt now: %d\n", 
+		     atomic_read(&pagg_hook_new->refcnt)); */
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+			pagg_hook_new->name);
+
+	return 0;					/* success */
+
+}
+
+
+/*
+ * pagg_hook_unregister
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, we check if 
+ * the pagg hook is still in use.  
+ */
+int
+pagg_hook_unregister(struct pagg_hook *pagg_hook_old)
+{
+	struct pagg_hook *pagg_hook;
+
+	/* Check the validity of the arguments */
+	if (!pagg_hook_old)
+		return -EINVAL;			/* error */
+	if (list_empty(&pagg_hook_old->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_old->name == NULL)
+		return -EINVAL;			/* error */
+
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_old->name);
+
+	/* printk("DEBUG - pagg hook unregister - refcnt now: %d\n", 
+	 *	     atomic_read(&pagg_hook->refcnt));
+	 */
+
+	if (pagg_hook && pagg_hook == pagg_hook_old) {
+		/* Is the pagg hook busy?  Check if the refcnt is zero */
+		if (atomic_read(&pagg_hook->refcnt) != 0) {
+			up_write(&pagg_hook_list_sem);
+			printk(KERN_INFO "Failed attempt to unregister a PAGG hook from: %s\n", pagg_hook_old->name);
+			return -EBUSY;
+		}
+		list_del_init(&pagg_hook->entry);
+		up_write(&pagg_hook_list_sem);
+
+		printk(KERN_INFO "Unregistering PAGG support for"
+				" (name=%s)\n", pagg_hook_old->name);
+
+		return 0;			/* success */
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+			" failed - not found\n", pagg_hook_old->name);
+	
+	return -EINVAL;				/* error */
+}
+
+
+/*
+ * __pagg_attach
+ *
+ * Used to attach a new task to the same pagg containers to which its parent
+ * is attached.
+ *
+ * The "from" argument is the parent task.  The "to" argument is the child
+ * task. 
+ *
+ */
+int __pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
+{
+	int  		   retcode = 0;
+	struct pagg *from_pagg;
+
+	/* lock the parent's pagg_list we are copying from */
+	down_read(&from_task->pagg_sem); /* read lock the pagg list */
+
+	list_for_each_entry(from_pagg, &from_task->pagg_list, entry) {
+		struct pagg *to_pagg = NULL;
+
+		to_pagg = pagg_alloc(to_task, from_pagg->hook);
+		if (!to_pagg) {
+			retcode = -ENOMEM;
+			goto error_return;
+		}
+		retcode = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data);
+		if (retcode != 0) {
+			/* attach should issue error message */
+			goto error_return;
+		}
+	}
+
+	up_read(&from_task->pagg_sem); /* unlock the pagg list */
+
+	return 0;					/* success */
+
+  error_return:
+	/* 
+	 * Clean up all the pagg attachments made on behalf of the new
+	 * task.  Set new task pagg ptr to NULL for return.
+	 */
+	up_read(&from_task->pagg_sem); /* unlock the pagg list */
+	__pagg_detach(to_task);
+	return retcode;				/* failure */
+}
+
+/*
+ * __pagg_detach
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ * 
+ * list_for_each_entry_safe is used here because pagg_free removes
+ * each pagg from the list while we are iterating over it.
+ */
+int
+__pagg_detach(struct task_struct *task)
+{
+	struct pagg *pagg;
+	struct pagg *paggtmp;
+	int retcode = 0;
+	int rettmp = 0;
+
+	/* Remove ref. to paggs from task immediately */
+	down_write(&task->pagg_sem); /* write lock pagg list */
+
+	list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) {
+		rettmp = pagg->hook->detach(task, pagg);
+		if (rettmp) {
+			/* an error message should be logged in free_pagg */
+			retcode = rettmp;
+		}
+		pagg_free(pagg);
+	}
+
+	up_write(&task->pagg_sem); /* write unlock the pagg list */
+
+	return retcode;   /* 0 = success, else return last code for failure */
+}
+
+
+/*
+ * __pagg_exec
+ *
+ * Used when a process that is in a pagg container does an exec.
+ *
+ * The "task" argument is the task doing the exec.  The optional exec
+ * callback of each pagg hook attached to the task is invoked.
+ *
+ */
+int __pagg_exec(struct task_struct *task) 
+{
+	struct pagg	*pagg;
+
+	/* lock the task's pagg_list while we walk it */
+	down_read(&task->pagg_sem); /* lock the pagg list */
+
+	list_for_each_entry(pagg, &task->pagg_list, entry) {
+		if (pagg->hook->exec) /* conditional because it's optional */
+			pagg->hook->exec(task, pagg);
+	}
+
+	up_read(&task->pagg_sem); /* unlock the pagg list */
+	return 0;
+}
+
+
+EXPORT_SYMBOL(pagg_get);
+EXPORT_SYMBOL(pagg_alloc);
+EXPORT_SYMBOL(pagg_free);
+EXPORT_SYMBOL(pagg_hook_register);
+EXPORT_SYMBOL(pagg_hook_unregister);
diff -pNaru linux-2.4.27/mm/memory.c linux/mm/memory.c
--- linux-2.4.27/mm/memory.c	2003-11-28 10:26:21 -08:00
+++ linux/mm/memory.c	2004-12-01 11:30:26 -08:00
@@ -45,6 +45,7 @@
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/module.h>
+#include <linux/csa_internal.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -396,6 +397,8 @@ void zap_page_range(struct mm_struct *mm
 		mm->rss -= freed;
 	else
 		mm->rss = 0;
+	/* no-op unless CONFIG_CSA is set */
+	csa_update_integrals();
 	spin_unlock(&mm->page_table_lock);
 }
 
@@ -981,8 +984,12 @@ static int do_wp_page(struct mm_struct *
 	 */
 	spin_lock(&mm->page_table_lock);
 	if (pte_same(*page_table, pte)) {
-		if (PageReserved(old_page))
+		if (PageReserved(old_page)) {
 			++mm->rss;
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
+		}
 		break_cow(vma, new_page, address, page_table);
 		lru_cache_add(new_page);
 
@@ -1167,6 +1174,10 @@ static int do_swap_page(struct mm_struct
 		remove_exclusive_swap_page(page);
 
 	mm->rss++;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
+
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && can_share_swap_page(page))
 		pte = pte_mkdirty(pte_mkwrite(pte));
@@ -1213,6 +1224,9 @@ static int do_anonymous_page(struct mm_s
 			return 1;
 		}
 		mm->rss++;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
 		flush_page_to_ram(page);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 		lru_cache_add(page);
@@ -1289,6 +1303,10 @@ static int do_no_page(struct mm_struct *
 	if (pte_none(*page_table)) {
 		if (!PageReserved(new_page))
 			++mm->rss;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
+
 		flush_page_to_ram(new_page);
 		flush_icache_page(vma, new_page);
 		entry = mk_pte(new_page, vma->vm_page_prot);
diff -pNaru linux-2.4.27/mm/mmap.c linux/mm/mmap.c
--- linux-2.4.27/mm/mmap.c	2004-02-18 05:36:32 -08:00
+++ linux/mm/mmap.c	2004-12-01 11:30:26 -08:00
@@ -591,6 +591,9 @@ out:	
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 
 unmap_and_free_vma:
@@ -1113,6 +1116,9 @@ out:
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 }
 
diff -pNaru linux-2.4.27/mm/mremap.c linux/mm/mremap.c
--- linux-2.4.27/mm/mremap.c	2004-04-14 06:05:41 -07:00
+++ linux/mm/mremap.c	2004-12-01 11:30:26 -08:00
@@ -9,6 +9,7 @@
 #include <linux/shm.h>
 #include <linux/mman.h>
 #include <linux/swap.h>
+#include <linux/csa_internal.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -205,6 +206,10 @@ static inline unsigned long move_vma(str
 				make_pages_present(new_addr + old_len,
 						   new_addr + new_len);
 		}
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
+
 		return new_addr;
 	}
 	if (allocated_vma)
@@ -332,6 +337,9 @@ unsigned long do_mremap(unsigned long ad
 				make_pages_present(addr + old_len,
 						   addr + new_len);
 			}
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
 			ret = addr;
 			goto out;
 		}
diff -pNaru linux-2.4.27/mm/swapfile.c linux/mm/swapfile.c
--- linux-2.4.27/mm/swapfile.c	2003-08-25 04:44:44 -07:00
+++ linux/mm/swapfile.c	2004-12-01 11:30:26 -08:00
@@ -14,6 +14,7 @@
 #include <linux/vmalloc.h>
 #include <linux/pagemap.h>
 #include <linux/shm.h>
+#include <linux/csa_internal.h>
 
 #include <asm/pgtable.h>
 
@@ -375,6 +376,9 @@ static inline void unuse_pte(struct vm_a
 	set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	swap_free(entry);
 	++vma->vm_mm->rss;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 }
 
 /* mmlist_lock and vma->vm_mm->page_table_lock are held */
diff -pNaru linux-2.4.27/mm/vmscan.c linux/mm/vmscan.c
--- linux-2.4.27/mm/vmscan.c	2004-02-18 05:36:32 -08:00
+++ linux/mm/vmscan.c	2004-12-01 11:30:26 -08:00
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/file.h>
+#include <linux/csa_internal.h>
 
 #include <asm/pgalloc.h>
 
@@ -120,6 +121,8 @@ set_swap_pte:
 		set_pte(page_table, swp_entry_to_pte(entry));
 drop_pte:
 		mm->rss--;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
 		UnlockPage(page);
 		{
 			int freeable = page_count(page) - !!page->buffers <= 2;
