EECS Instructional Support, University of California at Berkeley

                            College of Engineering
                   EECS Instructional Systems Support Group

  /share/b/pub/reports/manager/Spring_1998
  June 22, 1998

                EECS Instructional Computing - Review and Plans
		-----------------------------------------------
			       Spring 1998


  Standard Procedures:

    Computer accounts: We use two types of accounts for students,
    "class" accounts and "named" accounts:

        Class  accounts  ("cs61a-xx",  etc)  are  typically  used  when  an
        instructor  needs to control the login environment of the students'
        accounts or when the students need group accounts.  Class  accounts
        are  deleted  after  the  semester ends.   We give a stack of class
        account forms to the instructor,  and  the  instructor  distributes
        them.

        Named accounts ("jdoe", etc) are issued whenever class accounts are
        not. All EECS/CS majors have permanent named accounts, and non-majors
        get named accounts that expire at the end of the semester. Students
        request named accounts by logging in as 'newacct' (no password) in
        271 Soda or 105 Cory.

    Cardkey procedures:  Students who do not have an EECS cardkey should go
    to  391  Cory  (8:30am-noon,  1pm-4:30pm) to sign up for a new cardkey.
    Students who already have an EECS cardkey can go to either 391 Cory  or
    387  Soda.    We use TeleBears enrollment lists to pre-approve students
    for the access they need.  Instructors are encouraged to  send  updated
    enrollment lists to newacct@cory.eecs as the enrollments change.

UNIX labs:

    Room 199 Cory now houses our 20 new DEC Alpha workstations, which are
    used primarily by EE classes for HSPICE and other applications. We
    have finally retired all the DEC MIPS workstations running Ultrix.
    Two Ultrix CPU servers (cory.eecs and po.eecs) remain available for
    any legacy applications.

    Room 310 Davis becomes the new lab for CS61A on Jan 20.  We are  moving
    30  HP  workstations  from  275 Soda to 310 Davis.  CS61A students will
    have 24-hour cardkey access to 310 Davis, as  well  as  to  the  second
    floor  labs  in  Soda, but the class will no longer need to reserve lab
    time in 275 Soda.  This frees up 275 Soda for 24-hour drop-in access.

    Room 275 Soda will house 30 Intel PCs that were donated to Instruction
    through a grant that Prof Canny won from Intel in Spring 1997. These
    PCs run Solaris x86, which is our preferred UNIX for PCs because of its
    performance and support. It runs all Instructional UNIX applications
    except Matlab.

Windows NT and Mac OS labs:

    Room 119 Cory, formerly a DEC workstation lab, will soon have 23 HP
    Vectra and Intel PCs that are equipped for video editing and multimedia
    applications. The Vectras were donated to us through a grant that Prof
    Zakhor won from HP in August 1998, and the Intel PCs were donated
    through a grant that Prof Canny won from Intel in Spring 1997. This
    will be a general drop-in lab for students with NT accounts in the
    INSTRUCTIONAL domain, and it may be reserved at times for specific
    multimedia classes.
  
    Room 111 Cory now has 8 Macintosh PowerPCs with 3D QuickTime video
    boards. Four of those Macs were recently donated through a grant that
    Prof Barsky won from Apple.  The systems are intended primarily for use
    by  CS39A  and  related  courses,  but  other  Instructional  users may
    schedule access as well.  EECS1 will probably  run  LogicWorks  in  111
    Cory this semester.

Electronics labs:

    353 Cory was renovated and dedicated in May 1998. Generous donations
    from Rockwell International provided new flooring, lab furniture,
    electrical and network conduit, and LCD overhead projection. The lab
    will be used by several EE classes.

Notable events this semester:
    -----------------------------------------------------------------------
    (June 1-5)  Pasteur.EECS.Berkeley.EDU, the mail server for the EECS
    instructional machines, was upgraded from an old DEC Ultrix system to
    a new dual-processor Pentium PC running Solaris x86. IMAP and POP3
    server support were added. Stricter security measures were also
    introduced, limiting email filter programs to filter, procmail, slocal
    and vacation. A backup mail server was set up to spool incoming mail
    if Pasteur.EECS is disabled.
    
    Mail delivery was delayed during some of this time.
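
    The report does not say how the filter restriction is enforced. As a
    rough illustration only, here is a minimal Python sketch of an
    allowlist check on a single .forward entry, assuming that filters are
    invoked as "|program args" pipes; the function name filter_allowed and
    the .forward parsing are hypothetical, not Pasteur's actual code. It
    inspects only the first word of the pipe command, so a wrapper such as
    "|exec procmail" would be rejected.

```python
import os
import shlex

# Hypothetical allowlist: the four filter programs named in this report.
ALLOWED_FILTERS = {"filter", "procmail", "slocal", "vacation"}


def filter_allowed(forward_entry: str) -> bool:
    """Return True if a .forward entry is a plain forwarding address or
    pipes mail into one of the allowed filter programs (sketch only)."""
    entry = forward_entry.strip()
    # Pipe entries are often wrapped in double quotes; strip them first.
    if len(entry) >= 2 and entry.startswith('"') and entry.endswith('"'):
        entry = entry[1:-1].strip()
    if not entry.startswith("|"):
        return True  # not a pipe, so no filter program is involved
    try:
        words = shlex.split(entry[1:])
    except ValueError:
        return False  # unbalanced quotes: reject rather than guess
    if not words:
        return False  # a bare "|" names no program
    # Compare only the basename, e.g. /usr/sww/lib/mh/slocal -> slocal.
    return os.path.basename(words[0]) in ALLOWED_FILTERS
```

    For example, filter_allowed('"|/usr/sww/lib/mh/slocal -user jdoe"')
    returns True, while filter_allowed('|/home/jdoe/bin/myscript') returns
    False.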
    -----------------------------------------------------------------------
    (May 21-22) Parker.EECS was down Thu May 21 (3-5 p.m.) and Fri May 22
    to reinstall the operating system.
    -----------------------------------------------------------------------
    (May 19) Campus-wide POWER OUTAGE: 12pm - 6pm.  Most of the Instructional 
    servers were back online by 8:30pm.  Cory.EECS was down for 2 days.
    -----------------------------------------------------------------------
    (May 15)  Moved the "www-inst.eecs" Web Server from parker.eecs (an
    HP server) to saidin.eecs (a more powerful DEC Alpha server). CGI
    programs and the search engine were unavailable for a week or two
    after the move.
    -----------------------------------------------------------------------
    (May 15) Parker.eecs experienced frequent crashes.  TMA and Suprem software
    were unavailable for most of the day.
    -----------------------------------------------------------------------
    (Apr 29) server downtime: saidar.eecs (9:20am-10:30am)
    Symptom: failed logins and hung-up login sessions for accounts with home 
    directories on saidar.eecs
    Cause: network maintenance caused unintentional interruption
    Login Downtime: about 9:20am-10:30am
    -----------------------------------------------------------------------
    (Apr 15) server downtime: Franklin.CS (10:30pm-8am)
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in 
    Cause: unknown (required a reboot of Franklin)
    Login Downtime: about 10:30pm-8am
    -----------------------------------------------------------------------
    (Apr 14) Franklin's load was excessive and users could not access
    NFS-mounted disks.
    Cause: A cs61c user was running multiple copies of a program
    designed to figure out the amount of swap on the machine, and doing so
    exhausted the machine's resources.
    Solution:  cs61c users have been prevented from logging in
    to Franklin.CS and Parker.EECS.
    -----------------------------------------------------------------------
    (Apr 14) server downtime: Franklin.CS (@4:00-8:15 a.m.)
    (Apr 12) server downtime: Franklin.CS (@8:00-9:45 p.m.)
    Symptom: Franklin quits exporting its filesystems; users
    in cs152 and cs162 cannot find their home directories.
    -----------------------------------------------------------------------
    (Apr 13) server downtime: Franklin.CS scheduled for reinstallation 4/14
    Symptom: Franklin has been increasingly unstable over the last few
    weeks. We expect a reinstallation of the operating system to fix this.
    Affects: home directories for cs162 and cs152 will be offline, as will
    applications such as emacs-19.34 in /share/d and powerview.
    Login Downtime: about 5pm-10pm
    -----------------------------------------------------------------------
    (Apr 09) server downtime: Franklin.CS (10:30pm-11:15pm) 
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in 
    Cause: unknown (required a reboot of Franklin)
    Login Downtime: about 10:30pm-11:15pm
    -----------------------------------------------------------------------
    (Apr 08) server downtime: Franklin.CS (midnight-9:00am)
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in 
    Cause: unknown (required a reboot of Franklin)
    Login Downtime: about midnight-8:50am
    -----------------------------------------------------------------------
    (Apr 07) slow network, delayed logins on 2nd floor of Soda (6pm-7pm)
    Symptom: very slow logins and file access on Franklin and on workstations 
    on the second floor of Soda
    Response: Franklin was rebooted
    Cause: the 42 subnet was saturated by a flood of packets directed at one 
    of our computers from several off-campus sites.  This attack has been 
    reported to campus security staff for further analysis.  
    Login Downtime: approx 6:30-7pm
    -----------------------------------------------------------------------
    (Mar 31) server downtime: Franklin.CS (midnight-9:00am; 11:30am-1:30pm)
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in
    Cause: the file server franklin.cs stopped exporting its filesystems
    ("NFS server Franklin not responding") at around midnight, and again
    at 11:30am.
    Response: We have installed a new HP-UX patch that claims to fix a bug
    caused by the last patch...
    Login Downtime: Tues Mar 31 (about midnight-9:00am, 11:30am-1:30pm)
    -----------------------------------------------------------------------
    (Mar 29) server downtime: Franklin.CS (1am-12:15pm)
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in 
    Cause: the file server franklin.cs stopped exporting its filesystems 
    ("NFS server Franklin not responding") sometime after midnight
    Login Downtime: Sun Mar 29 (about 1am-12:15pm)
    -----------------------------------------------------------------------
    (Mar 27) server downtime: Cochise.CS (1:30-3:40pm)
    Symptom: failed logins for classes on cochise.cs 
    (cs61a, cs61b, cs61c, cs186, etc) 
    Symptom: Accounts that had logged in before 1:30pm continued to have 
    access to their files between 1:30pm and 3:20pm.
    Symptom: no printing in Soda Hall
    Cause: the file server cochise.cs froze up at 1:30pm.  It was 
    overwhelmed with processes and was rebooted at 3:20pm.   
    Response: We have installed a new kernel on Cochise in an effort to 
    stop the problems.  We also plan to move the /usr/sww filesystem from 
    Cochise to another server, to reduce the load on Cochise.
    Login Downtime: Fri Mar 27 (1:30-3:40pm)
    -----------------------------------------------------------------------
    (Mar 27) server downtime: Franklin.CS (4:30 & 6:30pm)
    Symptom: failed logins for classes on franklin.cs (cs152, cs162, etc)
    Symptom: the login sessions froze up for users who were already logged in 
    Cause: the file server franklin.cs stopped exporting its filesystems 
    ("NFS server Franklin not responding") at 4:30pm and was rebooted.   
    Repeated at 6:30pm.  
    Login Downtime: Fri Mar 27 (4:30pm-5pm)(6:30-7pm)
    -----------------------------------------------------------------------
    (Mar 16) Problems in CS61A lab (Sat/Sun Mar 14/15)
    Symptom: no logins or printing in 310 Davis
    Cause: the dv310.cs server kept crashing; was rebooted Sat 5pm, Sun 3am, 
    Sun 8pm, Mon 8am.
    Davis 310 Downtime: Sat Mar 14 (1:30pm-4pm); Sun Mar 15  (1:30am-8pm)
    -----------------------------------------------------------------------
    Symptom: no printing in Soda Hall
    Cause: (1) the cochise.cs server had a full disk, caused by excessive 
    error logging from some faulty workstations.  This was fixed on Sat at 
    7pm; (2) the cochise.cs server froze up on Sun and was rebooted on Mon 
    at 8am.
    Printer Downtime: Sat (?-7pm); Sun/Mon (10pm-8:30am)
    -----------------------------------------------------------------------
    Symptom: failed logins for classes on cochise.cs
    Cause: the cochise.cs server froze up on Sun and was rebooted on Mon at 8am.
    Login Downtime: Sun/Mon (10pm-8:30am)
    -----------------------------------------------------------------------
    (Mar 09) Problems accessing /usr/sww in Cory Hall
    At about 4pm Nexus.eecs was cut off from the Instructional subnets and
    some other subnets in Cory Hall. Nexus serves /usr/sww and /usr/eesww
    to the UNIX systems in Cory Hall, so rlogins, email and access to some
    application programs were blocked for Po, Cory, Pasteur, Saidar,
    Parker, Saidin, Volga, and the systems in 199 and 105 Cory.
    -----------------------------------------------------------------------
    (Mar 05) Problems with CAD tools?
    We have had reports of errors running HSPICE and the CAD programs 
    (such as mwaves, irsim and ext2sim), and we have been making changes
    in an effort to make it easier for users to find these programs on
    our various UNIX environments.  Details are in /share/b/pub/cad.help.
    -----------------------------------------------------------------------
    (Feb 12) Saidar & Pasteur were down for 30 minutes
    Saidar (home dirs and /share/ in Cory) and Pasteur (mail server)
    were down from about 2:50pm - 3:20pm on Thu Feb 12.    Symptoms:

    1) login sessions hung up for users whose home dirs are on Saidar
    2) email was inaccessible
    3) new logins hung up (waiting for the mail server)

    This was the unexpected result of an intentional ("3-minute") reboot
    of Saidar, which was intended to clear a lesser problem. We apologize
    for the inconvenience.
    -----------------------------------------------------------------------
    (Feb 4) New restrictions on "pasteur" POP server
    Starting Feb 6, the "pasteur.eecs.berkeley.edu" POP server will
    require that you configure your host computer name to be an EECS 
    or CS computer, such as "cory.eecs.berkeley.edu".  You do this in 
    your POP client (Eudora, etc), so that the "From" line in your 
    outgoing mail says that you are using an EECS or CS computer.

    Otherwise, our POP server will reject your requests to connect to 
    it and download your mail.  This is an added security feature.
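
    The report does not describe how pasteur performs this test. The
    following is a minimal Python sketch of one plausible check, a
    domain-suffix match on the hostname the client reports, assuming that
    "an EECS or CS computer" means a name under eecs.berkeley.edu or
    cs.berkeley.edu; the function host_allowed is hypothetical. Matching
    on ".eecs.berkeley.edu" rather than "eecs.berkeley.edu" keeps a name
    such as "evil-eecs.berkeley.edu" from slipping through.

```python
# Hypothetical list of accepted domains ("an EECS or CS computer").
ALLOWED_DOMAINS = ("eecs.berkeley.edu", "cs.berkeley.edu")


def host_allowed(hostname: str) -> bool:
    """Return True if hostname is one of ALLOWED_DOMAINS or a host under one."""
    # DNS names are case-insensitive; also drop a trailing root dot.
    name = hostname.strip().rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in ALLOWED_DOMAINS)
```

    With this check, "cory.eecs.berkeley.edu" and "Pasteur.EECS.Berkeley.EDU"
    are accepted, while a home machine such as "myhost.example.com" is
    rejected.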
    -----------------------------------------------------------------------
    (Feb 1) Network Failure in Cory
    (6pm) From about 6pm on Sun Feb 1 until 8am on Mon Feb 2, the 134
    and 138 subnets were cut off from some other systems, notably the
    /usr/sww server (Nexus).   Some symptoms:

    - the Instructional Web server (http://inst.eecs) was unreachable
      from most other computers 
    - some email filter programs failed on the Instructional mail server 
      (pasteur.eecs)
    - logins to home directories on po.eecs and saidar.eecs were so slow
      that they appeared to fail
    - services for application programs over the 134 or 138 networks 
      failed (UNIX commands would hang up, license servers would be
      unreachable)

    This was caused by a bug in the new Cory Hall ATM switch, and we are
    told that the vendor is providing a solution.
    -----------------------------------------------------------------------
    (Feb 1) Cochise.CS server crash
    (Sun afternoon) Cochise.cs crashed sometime this afternoon and was
    rebooted at about 7pm.  The symptoms included:

    - login failures for users with home directories on cochise.cs
      (cs61* and other class accounts)
    - loss of the programs in /usr/sww on the Instructional HP systems
      in Soda Hall (also prevented some logins)
    - loss of the YP password service on some Instructional systems in
      Soda Hall (also prevented some logins)

    We do not know why Cochise crashed (second time in 7 days); HP was
    unable to diagnose the cause of a similar problem on Franklin last
    semester. We suspect that we need to either add or remove an HP-UX
    kernel patch; the vendor does not cross-check the various patches
    for compatibility, so the poor customer (us!) is left guessing.
    -----------------------------------------------------------------------
    (Jan 26) Network delays in Cory
    (4pm) Cory Hall is having network problems.   Some systems cannot
    access the /usr/sww server. This is causing:

    - slow logins on our Cory Hall systems and for users whose 
      home directories are on saidar.eecs or po.eecs
    - delays in email delivery through pasteur.eecs (when a filter 
      such as /usr/sww/lib/mh/slocal is needed)

    Departmental staff are working with the vendor of our new network
    hardware to correct the problem.

    (6:30pm) We believe this problem has been corrected now.
    -----------------------------------------------------------------------


For additional information, please contact us or visit our WWW home page:
http://inst.eecs.berkeley.edu

    Kevin Mullally, Manager                   Ferenc Kovac, Associate Manager
    EECS Instructional & Electronics          EECS Instructional & Electronics
    378 Cory Hall, (510) 643-6141             377 Cory Hall, (510) 642-6952
    kevinm@eecs.berkeley.edu                  ferenc@eecs.berkeley.edu


source: /share/b/pub/reports/manager/Spring.1998 - revised June 22, 1998