EECS Instructional Support, University of California at Berkeley


                   University of California at Berkeley
         Department of Electrical Engineering & Computer Science
                   Instructional Systems Support Group

/share/b/pub/reports/manager/Spring_1994


           Report on EECS Instructional Computing Facilities
           -------------------------------------------------
                          Spring Semester 1994


by: Kevin Mullally, Manager of EECS Instructional Systems


week of March 28-April 3, 1994:

  This was the week of Spring Break.  Systems work was done to create more
  user disk space, and to reduce the loads on Ara and Cory.  A new file
  server, Bard, was added.  Bard is a Sun4 that was given to Instruction
  by the Robotics group (it was formerly Zeus and Zephyr).

  The /share/a, /share/b and /cad partitions were moved from Ara to Bard,
  and a second file server (Congo) was set up for the Ara clients.  The
  central mail service for the Instructional system in Cory Hall was moved
  off of Cory to a dedicated mail server machine (Pasteur).

  A new 670 MB filesystem was provided for sole use by CS162.  The /cad
  filesystem was increased by 300 MB for use by CS250.

  Danube crashed on 4/3 at 11pm and was back in service on 4/4 at about 8am.
  Cory crashed and rebooted on 4/3 at 5:30pm.


week of April 4-10, 1994:

  Danube crashed and rebooted 4/7 at 2pm, 4/10 at 8pm.
  Cory crashed and rebooted on 4/9 at 6pm.

  A number of users complained about email forgeries.

  We received a complaint from outside the university about an offensive
  posting to net news.  Legal action was threatened.  We disabled the
  account of the student who was responsible.
  

week of April 11-17, 1994:
  
  Bard was down from 11am-7pm on 4/11 due to a hardware failure.  These
  symptoms resulted:
    - X11 not available for Ara clients (fixed around 3pm)
    - Workview not available (fixed about 6pm)
    - scm not available (fixed about 6pm)
    - some users' logins would fail (fixed about 6pm)
    - Hspice not available (fixed about 7pm)
  
  Volga crashed and rebooted itself on the afternoon of 4/12.
  Danube crashed and rebooted itself on 4/14 at 4am.
  The Snake clients have been experiencing reboots (from crashes or from 
  users?) frequently at night.  This was observed after the fact to have
  happened on 4/6 at about 10:05am,  4/9 at about 11:10am and 2:05pm, and
  4/16 between 12am and 11pm.

  The Internet address for Moby was changed, and that broke the license for
  "codecenter" for a couple of days. 

  There were a number of stolen mouse balls; most were returned.  This is
  happening with increasing frequency.

  Five or six users complained about people hogging the workstations, either by
  playing games or running screen locks.  These behaviors are prohibited,
  but it is difficult to police against them.


week of April 18-24, 1994:

  Danube crashed and rebooted:
	4/17 at 1am, 4/19 at 1am, 4/20 at 7pm, 4/22 at 3pm
  Volga crashed and rebooted 4/22 at 11am.
  Po crashed and rebooted 4/23 at 4pm.
  There were a number of "borrowed" mouse balls and one stolen mouse.
  The /home/e filesystem (contains the CS9E accounts) was full for about 24
  hours between 4/20 and 4/21.  The sys admins cleared some space.
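
  A full filesystem like this one can be flagged before users hit it.
  The sketch below is illustrative only and is not a tool we run: the
  function names and the 90% threshold are assumptions, and the path
  would be whatever filesystem is of interest (for example /home/e).

```python
import shutil


def percent_used(used_bytes, total_bytes):
    """Return the percentage of a filesystem that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def nearly_full(path, threshold=90.0):
    """Return True if the filesystem holding `path` is at or above
    `threshold` percent used. The default threshold is an assumption."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.used, usage.total) >= threshold


if __name__ == "__main__":
    # Hypothetical usage: check the current directory's filesystem.
    print("nearly full" if nearly_full(".") else "ok")
```

  Run periodically, a check of this kind would give the sys admins
  warning before a filesystem reaches 100%.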
  
week of April 25-May 1, 1994:

  Danube crashed and rebooted:
	4/25 at 2am, 4/25 at 12:30pm, 4/27 at 1am, 4/29 at 2pm

  Snake crashed and rebooted 4/26 at 4pm (caused by a full system disk).

  Parker started denying much of its NFS service on 4/29 at about 3pm.
  At 4:30pm it was rebooted to clear the problem.   Users with home dirs
  on Parker were affected:  their attempts to compile (gcc) and run
  HSPICE were thwarted by "NFS time out" and "can't start a new shell"
  error messages.  The problem was cleared by 4:45pm.

  Po crashed and rebooted: 4/30 at 1pm, 5/1 at 8:30am.

  Users are complaining increasingly about other users who leave screen
  lock programs running on idle workstations.  There is no safe way for 
  the sys admins to detect this and prevent it, so we must rely upon
  peer pressure and reports to prevent it.


week of May 2-May 8, 1994:

  This is the final week of the semester, before exams, and the workstation
  labs are extremely crowded.   There have been dozens of complaints to
  "root" about other students who leave screen locks running on our
  workstations for hours.   When we catch offenders, we temporarily
  disable their accounts.

  Volga crashed and rebooted: 5/3 at 5pm, 5/4 at 10:30pm.

  Danube crashed and rebooted: 5/1 at 8:30am and 4pm, 5/2 at 10am.

  On May 2, we disabled further logins to Danube, which should reduce the 
  crashes (to 0, we hope).  The users will get an explanation when they try
  to log in.  They can still access their directories on Danube over the net
  from other systems.  Danube is running an older version of the operating
  system, and until we upgrade that, the vendor is unable to diagnose the
  cause of the crashes.

  Po crashed and rebooted: 5/3 at 3pm, 5/4 at 11pm.

  We are pushing DEC to look closely at the diagnostic kernel that we have
  installed on Po, in the hopes that they will identify and fix the cause
  of these crashes.   We suspect that they are triggered by socket-related
  coding in the final CS162 project, and we have asked those students to
  avoid executing their "nachos" code on Po.
	
  Torus started denying much of its NFS service on the evening of 5/1.  It
  failed to reboot properly the next morning; it seems to have a bad system
  disk.   We replaced it with another disk (taken from a workstation) and
  the CS184 and CS284 home directories were available again by about 4pm 
  on 5/2.   The Iris workstations were available again by about noon on
  5/3.


week of May 9-May 15, 1994:

  Po crashed and rebooted: 5/11 at 8am.

  Volga crashed and rebooted: 5/09 at 4pm, 5/14 at 8:30pm.

  Danube crashed and rebooted: 5/09 at 10pm.   Users had been allowed to
  log in there again, by accident.   While logins have been denied, Danube
  has not crashed.  Logins are now denied again.

  We disciplined perhaps a dozen students for leaving screen locks on the
  workstations.  We have posted larger signs saying that screen locks are
  prohibited.


week of May 16-May 22, 1994:
  
  Parker started denying much of its NFS service again and was rebooted
  on 5/17 at 11:30am.  Users with home dirs on Parker were affected.  
  The problem was cleared by 11:45am.

  We disciplined perhaps a dozen students for leaving screen locks and
  playing games on the workstations.  We have posted larger signs saying 
  that these behaviors are prohibited.



Improvements, Spring '94:

   - converted Parker HPs to dataful (improved performance)
   - freed 2 GB for home directory disk space
   - freed 670MB for CS162
   - freed 400MB for CS250
   - improved password file distribution routines
   - installed a Sun4 file server
   - decreased load on Ara file server; added second cluster server (Congo)
   - moved mail server off of Cory onto a dedicated system
   - worked with individual instructors to customize setups (CS61A, CS162)
   - decreased NFS dependency: duplicated some critical software, moved
     some disks to extra servers
   - decreased NFS network load: installed automounters ("amd")
   - set up X11 (xdm) default user interface on HPs; Vue is an option
   - set up keymappings for CS classes on HPs;  provide help files for 
     users about keyboard and windowing on each architecture
   - obtained software for new CS2 course
   - obtained HSPICE
   - ported Berkeley scm, SPIM, etc. to the HPs
   - ported Berkeley scm to PCs and Macs (Gambit)


Problems, Spring '94:

   - user disks filling up; downtime to fix that
   - system crashes (mostly Danube, once Bard)
   - performance delays (mostly from server and network bottlenecks)
   - failed to survey the faculty in December about Spring computing needs
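
  As a rough cross-check of the crash item above, the weekly entries can
  be tallied per host.  The sketch below is illustrative only: the CRASHES
  table is transcribed by hand from the weekly entries in this report, and
  the function names are hypothetical.  The operator-initiated reboots of
  Parker and the Torus disk failure are left out, as are the Snake client
  reboots whose count is unknown.

```python
from collections import Counter

# Crash events transcribed from the weekly entries (host -> dates as
# written in the report). This table is a hand transcription, not a log.
CRASHES = {
    "Danube": ["4/3", "4/7", "4/10", "4/14", "4/17", "4/19", "4/20", "4/22",
               "4/25", "4/25", "4/27", "4/29", "5/1", "5/1", "5/2", "5/9"],
    "Volga": ["4/12", "4/22", "5/3", "5/4", "5/9", "5/14"],
    "Po": ["4/23", "4/30", "5/1", "5/3", "5/4", "5/11"],
    "Cory": ["4/3", "4/9"],
    "Snake": ["4/26"],
    "Bard": ["4/11"],  # hardware failure
}


def crash_counts(crashes):
    """Return a Counter mapping each host to its number of crash events."""
    return Counter({host: len(dates) for host, dates in crashes.items()})


def share(counts, host):
    """Return the fraction of all recorded crashes attributed to `host`."""
    total = sum(counts.values())
    return counts[host] / total if total else 0.0


if __name__ == "__main__":
    counts = crash_counts(CRASHES)
    for host, n in counts.most_common():
        print(f"{host:8s} {n:2d}  {share(counts, host):.0%}")
```

  By this count Danube accounts for 16 of the 32 recorded crashes, with
  Volga and Po at 6 each.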


Improvements (pending), Fall '94:

   - expanding to 4 nets (2 in Soda, 2 in Cory)
   - adding 100+ HP workstations
   - moving CS61B and CS61C from WEB to Soda labs
   - CAP improvements to 199 Cory, adding networking for workstations
   - adding "Instructional Reports" documentation of events and performance
   - porting nachos to HPs

