College of Engineering
EECS Instructional & Electronics Support Group
/usr/pub/Instructional_Plans
/usr/pub/reports/manager/Fall_1998
January 31, 1999

EECS Instructional Computing - Review and Plans
-----------------------------------------------
Fall 1998

References about Common IESG Services:

http://inst.eecs.berkeley.edu/
    Instructional WEB server, links to class home pages, student home
    pages, information about Instructional UNIX accounts, modem access,
    cardkey access, computers and labs, software.

http://www-iesg.eecs.berkeley.edu/
    Electronics Support WEB server, links to information about
    electronics labs, AV services and Windows NT services.

Plans:

Prof Sequin plans to reclaim some Instructional labs in Soda for grad
student offices and upgrade other Instructional labs for upper division
labs such as multimedia authoring and 3D animation. The plan would be
implemented over a number of years. We are starting to search for
alternate labs on campus for CS Lower Division classes.

Profs Neureuther and Lee have requested that room 105 Cory be converted
to an electronics lab for use by a new course, EECS20, in Spring 1999.

Software Changes this summer:

- upgraded UNIX password service to NIS+ for improved performance and
  security.
- upgraded all UNIX PCs to Solaris x86 v 2.6 with NFS3.
- upgraded HP-UX and DEC Alpha systems to NFS3.
- converted the Pasteur mail server from an old DEC system to a new
  dual-processor PC running Solaris v 2.6.
- added IMAP and POP3 capability to Pasteur mail server.
- adapted Prof Hilfinger's versions of "submit" and "grade" for use by
  any class on UNIX.
- reviewed and enhanced UNIX network security (o/s patches and upgrades)

Lab Changes this summer:

347 Soda: Instruction has vacated; formerly shared predominantly by
CS184 and CS152. The 14 HPs from 347 Soda, which are high-end graphics
systems, have been moved to 273 Soda. CS184 will continue to use
349 Soda; CS152 may use the HPs in 273 Soda or new NT systems in
119 Cory.

310 Davis: 9 other HP workstations have been moved from the second
floor of Soda to 310 Davis, the CS61A lab, for a total of 38 systems.

105 Cory: Older HP workstations have been removed and the lab is being
converted into an electronics lab for EECS20. EECS20 is a new course
this Fall, taught by Prof Lee, and will require computing for 300
students in Spring 1999. 27 new HP Kayak PCs running NT are being
installed. The HP PCs were donated via a grant won by Prof Zakhor.

117 Cory: 16 of the HP UNIX workstations formerly in 105 Cory will be
moved to 117 Cory. 7 other HP UNIX workstations are in 199 Cory. These
UNIX systems will continue to meet the needs of EE classes that run
SUPREM and other UNIX applications.

353 Cory was renovated and dedicated in May 1998. Generous donations
from Rockwell International provided for new flooring, lab furniture,
electrical and network conduit and LCD overhead projection. The lab
will be used by several EE classes.

140 Cory and 143 Cory are being upgraded and repainted.

New Cardkey System In Cory:

New cardkey readers will be installed in Cory Hall in 1998-99. Alex
Para is the project manager. The third floor of Cory will be converted
in the Fall, and the first floor is scheduled to be converted in
January 1999. These will require a different type of cardkey than is
used now, which means that users who have access to both Soda and Cory
Halls will need 2 different cardkeys. Users will be charged $10
($5 refundable) for the new Cory cardkeys. For additional information,
please see http://inst.eecs.berkeley.edu or contact us.

Notable events this semester:
-----------------------------------------------------------------------
(Dec 22) workstations off; SERVERS remain on

We have shut down most workstations for the holiday, until about Jan 11.
These UNIX servers will remain up and available for remote access
("telnet", "rlogin", "ssh"):

    cory.eecs.berkeley.edu    (DEC UNIX)
    parker.eecs.berkeley.edu  (HP-UX)
    torus.cs.berkeley.edu     (SOLARIS x86)
-----------------------------------------------------------------------
(Nov 25) Saidar down 4:15 - 5:20pm

symptom - home directories not accessible for named accounts,
          ee class accounts
cause   - saidar stopped exporting and we rebooted it
-----------------------------------------------------------------------
(Nov 25) - Franklin down midnight - 8:00a.m.

symptom - home directories not accessible for some named accounts,
          CS162/CS164 class accounts
cause   - franklin stopped exporting and we rebooted it
-----------------------------------------------------------------------
(Nov 20) - Saidar down 11:00 - 11:30

symptom - home directories not accessible for named accounts,
          ee class accounts
cause   - saidar stopped exporting and we rebooted it
-----------------------------------------------------------------------
(Nov 17) - Zip Drives on Solaris PCs are once again functional

Please read /usr/pub/Solaris.help for more info on how to use the
drives.
-----------------------------------------------------------------------
(Nov 15-16) - franklin down - 11:45 PM - 1:30 AM

symptom - unable to log in for upper-div. CS class & named accounts
cause   - /home/ll failed to time out (i.e., same deal as usual)
-----------------------------------------------------------------------
(Nov 13) - cochise down 12:45 - 2:15 AM

symptom - unable to log in for class (cs61abc) accounts
cause   - repeated kernel panics on cochise, probably from disk usage
          being more than what cochise can handle
-----------------------------------------------------------------------
(Nov 5) - Web Server downtime 9:00am-9:30am on Fri 11/6

Moving contents of web server to a new disk so that we will avoid any
further disk crunches.
-----------------------------------------------------------------------
(Nov 2) - Saidar crash, 11:30am-12:30pm

Symptom:  home directories inaccessible, can't login.
Affected: named accounts, ee class accounts, DEC Alpha workstations
          in 199 Cory
Cause:    faulty memory, probably; we are working with the vendor to
          identify the problem.
Duration: about 11:30am-12:30pm
-----------------------------------------------------------------------
(Oct 26) - network failure in 117 Cory and 2 HP workstations in 199 Cory

Symptom:  existing login sessions freeze up; new logins denied
Affected: login attempts to all HP workstations in 117 Cory and
          2 HP workstations in 199 Cory
Cause:    Failure of a network hub
Duration: about 1:30pm-3pm
-----------------------------------------------------------------------
(Oct 30) - modems will be down for maintenance on Mon Nov 2

The EECS Instructional modems (642-0070 and 642-6679) will be
unavailable during the day on Mon, Nov 2 while they are tested and
repaired. We have received many reports of problems in the last month.
Faulty modems in the modem pool have failed to answer; passwords have
not worked; connections have been lost soon after the modem answers.
Apparently the modem bank lost power altogether for several hours on
Wednesday Oct 28. We hope these problems will be corrected on Monday.
-----------------------------------------------------------------------
(Oct 27) - Pasteur (mail server) unresponsiveness

Symptom:  can't check mail on any Instructional workstation or via POP
          or IMAP; if you check mail on login, then you can't login
          (unless you cancel the mail check). Some workstations refused
          logins completely as well (pasteur is a backup NIS+ server).
Affected: all EECS Instructional users
Cause:    pasteur started hundreds of mail processes to attempt to
          deliver the mail that was failing to get to saidar; each new
          process reduced the CPU cycles available to the others, and
          eventually the process table filled.
          Eventually pasteur also ran out of memory entirely.
Duration: 10:00PM - 11:00PM
-----------------------------------------------------------------------
(Oct 27) - Saidar crash from memory failure

Symptom:  home directories inaccessible, can't login.
Affected: named accounts, ee class accounts, DEC Alpha workstations
          in 199 Cory
Cause:    saidar ran out of memory .... again
Duration: 7:45 to 10:45 PM
-----------------------------------------------------------------------
(Oct 25) - franklin.cs was down twice on Sun Oct 25

Symptom:  "no home directory"
Affected: CS162, CS164, users with home directories on
          /home/{hh,ii,jj,kk,ll,mm,nn,pp}
Cause:    Recurrent freezing of a disk on the differential SCSI bus.
Duration: 2:30am-12:30pm, 5:10pm-5:35pm
Response: We will replace Franklin with a new file server soon; we
          have set up a PC for it and have ordered a new disk tower.
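
The "Affected" line in the franklin.cs entry above uses shell brace
shorthand for eight separate filesystems. As a rough sketch only, the
Python below expands that shorthand and checks whether a home directory
falls on one of them. The /home/<filesystem>/<login> layout and the
login "jdoe" are assumptions made for illustration, not a statement of
how accounts are actually laid out.

```python
def expand_braces(pattern):
    """Expand a single {a,b,c} group the way a shell would."""
    head, _, rest = pattern.partition("{")
    if not rest:
        return [pattern]          # no brace group present
    body, _, tail = rest.partition("}")
    return [head + item.strip() + tail for item in body.split(",")]


# Filesystem list copied from the Oct 25 entry.
AFFECTED = expand_braces("/home/{hh,ii,jj,kk,ll,mm,nn,pp}")


def is_affected(home_dir):
    """True if home_dir sits on one of the affected filesystems.

    Assumes (hypothetically) a /home/<filesystem>/<login> layout.
    """
    parts = home_dir.split("/")
    return "/".join(parts[:3]) in AFFECTED


if __name__ == "__main__":
    print(AFFECTED)
    print(is_affected("/home/ll/jdoe"))
    print(is_affected("/home/aa/jdoe"))
```

The same helper works for the cochise list in the Oct 11 entry if the
pattern string is swapped.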
-----------------------------------------------------------------------
(Oct 21) - loss of power in Cory labs, loss of access to servers
           (3-4:15pm)

Symptoms: all power died in 1st floor Cory labs
          loss of access to MAIL server (pasteur.eecs)
          loss of access to WEB servers (http://www-inst.eecs,
          http://iesg.eecs)
          loss of access to UNIX servers (cory.eecs, parker.eecs,
          po.eecs, saidar.eecs)
          loss of access to NT servers (\\fischer, \\ntsww)
          no UNIX and NT home directories for many users
Affected: most Instructional UNIX and NT accounts
Cause:    power failure in Cory Hall first floor
-----------------------------------------------------------------------
(Oct 11) - cochise.cs crashed (4-4:30pm)

Symptoms: no home directories, active logins froze up
          loss of access to /usr/sww for Instructional HP UNIX systems
          in Soda and Davis Halls
Affected: CS61A, CS61B and others
Cause:    Cochise.eecs file server crashed, perhaps due to excess NFS
          activity.
Response: We are working to replace the old NFS servers with a new
          server using a RAID disk array by Jan 1999.
-----------------------------------------------------------------------
(Oct 11) - cochise.cs was down Sun Oct. 11 2:15 - 4:15 PM

Symptom:  no home directory (class accounts unavailable)
          can't run emacs/java/netscape on HPs (/usr/sww on HPs
          unavailable)
Affected: CS61A,B,C, CS184, CS186, other users on
          /home/{aa,bb,cc,dd,ee,ff,gg,qq,rr}
Cause:    Same as franklin's recurrent disk bug, this time affecting
          /home/bb.

A bunch of HP workstations could not access /usr/sww on cochise after
the left-hand disk tower was power-cycled; after these workstations
were rebooted, everything seemed fine again.
--brg@cory.eecs
-----------------------------------------------------------------------
(Sep 26) - franklin.cs was down Sat Sep 26, 10:15-10:45 am, 11am-noon
(Sep 24) - franklin.cs was down Thu Sep 24, 12:15pm-1:30pm.

Symptom:  "no home directory"
Affected: CS162, CS164, users with home directories on
          /home/{hh,ii,jj,kk,ll,mm,nn,pp}
Cause:    Recurrent freezing of a disk on the differential SCSI bus.
History:  It's not always the same disk, but it causes the entire bus
          to freeze. It started happening last spring and has increased
          in frequency. It is probably load-related (didn't happen over
          the summer). Last spring, we asked HP tech support to look at
          it. They did not diagnose it; the only course seems to be to
          replace parts until it stops. We replaced the SCSI controller
          (the entire motherboard in fact) on Sep 15. Later, we
          suspected one disk and took it out of service. Nevertheless,
          the problem keeps happening. Next, we may focus our attention
          on the 2 disk expansion towers and their power supplies. In
          parallel, we will work to replace the entire file server.
          (kevinm@eecs)
-----------------------------------------------------------------------
(Sep 22) - Saidar.eecs was down Mon Sep 21, 12:15pm-4:30pm.

Symptom:  "no home directory" when you login; "command not found"
          errors
Affected: users with home directories on /home/{b,c,d,e,f}; computers
          that use /share/b from Saidar (UNIX computers in 199 & 117
          Cory, cory.eecs, parker.eecs)
Downtime: Mon Sep 21, 12:15pm-4:30pm
Cause:    The problem was the apparent failure on saidar.eecs of all
          5 disks on one channel of the RAID controller. First we
          investigated possible hardware failures. In fact, the cause
          seems to have been a transient software problem. It seems
          that the RAID controller detected a problem on the channel
          and shut off all the disks. We were able to restore the disks
          by resetting the controller and running parity checks on the
          RAID logical drives (each is a collection of disks that look
          like one big disk to the operating system). The RAID parity
          checks and subsequent UNIX file checks took about 45 minutes
          for each of the 3 RAID logical drives.
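
To put the recovery time in the Saidar entry above in proportion, the
arithmetic can be sketched as follows. The figures come from the entry;
the assumption that the three logical drives were checked one after
another (rather than in parallel) is ours and is not stated in the
report.

```python
def to_minutes(hour24, minute):
    """Convert a 24-hour clock time to minutes since midnight."""
    return hour24 * 60 + minute


# Figures taken from the Sep 22 Saidar entry.
outage_start = to_minutes(12, 15)   # 12:15pm
outage_end = to_minutes(16, 30)     # 4:30pm
drives = 3                          # RAID logical drives
minutes_per_drive = 45              # "about 45 minutes" each

outage_total = outage_end - outage_start

# Assumption: checks ran sequentially, so the times add up.
check_total = drives * minutes_per_drive

# Whatever is left went to diagnosis, the controller reset, etc.
remainder = outage_total - check_total

print(f"total outage:        {outage_total} min")
print(f"parity/file checks:  {check_total} min")
print(f"everything else:     {remainder} min")
```

Under that assumption, roughly 135 of the 255 minutes of downtime went
to the parity and file checks.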
-----------------------------------------------------------------------
(Sep 22) - How to access the EECS modems and home IP

1) Get a password for the modems by using the on-line service
   'telnet home-ip.berkeley.edu'
2) Students enrolled in EECS classes automatically have dialin access
   to 642-6679, 642-0070, 643-9600
3) See http://inst.eecs.berkeley.edu/modems.html for more info.
-----------------------------------------------------------------------
(Sep 22) - UNIX email and restrictions on "pasteur" POP server

In June 1998, new security restrictions were implemented on the
pasteur.eecs mail server. Here are some restrictions and usage tips
when using POP or IMAP:

What's a mail server?

    "pasteur.eecs.berkeley.edu" is the EECS Instructional email server.
    On the Instructional UNIX systems, programs such as "pine" and
    "mailx" access Pasteur directly. On NT and Mac systems, programs
    such as "Eudora" can read your email from Pasteur using the POP or
    IMAP protocols. In Eudora, you enter "pasteur.eecs.berkeley.edu" as
    your mail host and enter your UNIX account name as user name. You
    can also enter anything you want in Eudora as your own computer
    name, which is where replies to your outgoing mail are sent. See
    below for restrictions on that.

Setting your computer name:

    Pasteur requires that you configure your own computer name to be an
    EECS or CS computer, such as "cory.eecs.berkeley.edu". You do this
    in your POP client (Eudora, etc), so that the "From" line in your
    outgoing mail says that you are using an EECS or CS computer.
    Otherwise, our POP server will reject your requests to connect to
    it and download your mail. This is an added security feature.

Your .forward file:

    ...may not work like it used to. Specific problems:

    1. The following filter programs are the only ones supported:
       filter, procmail, slocal and vacation. If there is something
       you need to run, contact root@cory.
    2. .forward files may not be symbolic links
    3.
       If you use procmail and your .forward file looked like:
           "|IFS=' '&&exec /usr/sww/bin/procmail -f-||exit 75 #login"
       it needs to be changed to:
           "|/usr/sww/bin/procmail -f- #login"
    4. '||', '$' and '&&' are no longer valid in .forward files

For more information, please see /usr/pub/email.help, send email to
root@cory.eecs.berkeley.edu or visit 384/386 Cory or 333 Soda.
-----------------------------------------------------------------------
(Sep 16) - Franklin.cs filesystems were down, until about 9:45am
(Sep 16) - Pasteur.eecs (mail server) was down, until about 9:30am

The recurrent problem on franklin.cs also affected pasteur.eecs this
morning, while pasteur waited for disk access to franklin. We will be
replacing the disk controller and motherboard on franklin this week.
-----------------------------------------------------------------------
(Sep 12) - Franklin.cs filesystems were down, 5:00pm-6:45pm (Saturday)
(Sep 06) - Franklin.cs filesystems were down, 4:15pm-9:45pm (Sunday)

The server franklin.cs stopped exporting its filesystems; this has been
a recurring problem that has been impossible to diagnose. We will
replace disk controllers soon in an effort to resolve it.
-----------------------------------------------------------------------
(Sep 05) - Cory.eecs was down, 1:10am-1:45am 09/05

CORY.EECS (includes http://inst.eecs) was down from about 1:10am-1:45am
for installation of a larger swap disk.
-----------------------------------------------------------------------
(Aug 27) - Franklin.cs was down, 10:45pm-11pm

Franklin.cs was rebooted to clear a problem that prevented its file
systems from being exported.
-----------------------------------------------------------------------
(Aug 03) - Cochise downtime 4:30-5:30pm Tue Aug 4

Cochise will be down from 4:30 until about 5:30 on Tuesday, Aug. 4 to
replace a bad disk drive.
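
Returning to the .forward restrictions listed in the (Sep 22) UNIX
email entry above: the rules lend themselves to a quick self-check
before you contact root@cory. The Python below is only a sketch of such
a check, not the logic the mail server itself uses. The function names
are hypothetical, and it assumes a filter line is one that begins with
"|" after any surrounding double quotes are removed.

```python
import os

# Rule 1: the only supported filter programs.
ALLOWED_FILTERS = {"filter", "procmail", "slocal", "vacation"}
# Rule 4: sequences that are no longer valid.
FORBIDDEN_TOKENS = ("||", "$", "&&")


def check_forward_text(text):
    """Return a list of problems found in the contents of a .forward."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip().strip('"')
        for token in FORBIDDEN_TOKENS:
            if token in line:
                problems.append(f"line {lineno}: contains {token!r}")
        if line.startswith("|"):
            # Assumption: the program is the first word after the pipe.
            words = line[1:].split()
            program = os.path.basename(words[0]) if words else ""
            if program not in ALLOWED_FILTERS:
                problems.append(
                    f"line {lineno}: unsupported filter {program!r}")
    return problems


def check_forward_file(path):
    """Apply rule 2 (no symbolic links), then check the contents."""
    if os.path.islink(path):
        return ["file is a symbolic link"]
    with open(path) as handle:
        return check_forward_text(handle.read())


if __name__ == "__main__":
    old = "\"|IFS=' '&&exec /usr/sww/bin/procmail -f-||exit 75 #login\""
    new = '"|/usr/sww/bin/procmail -f- #login"'
    print(check_forward_text(old))   # old style: problems reported
    print(check_forward_text(new))   # new style: empty list
```

Run against your own file with check_forward_file(".forward") from your
home directory; an empty list means none of the listed rules was
tripped, which is not a guarantee that the server will accept the file.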
-----------------------------------------------------------------------
(Jul 21) - Cory.eecs changes into an OSF1/Alpha system on Mon July 27

On Monday July 27 at about noon, the computer name "cory.eecs" will be
changed from the current Ultrix operating system running on a computer
with a MIPS processor to a DEC UNIX operating system (also called
"OSF1") running on a computer with a DEC Alpha processor. "Cory.eecs"
will become the same computer as "saidin.eecs".

Note that Ultrix binaries do not run on OSF/Alpha systems, so this will
affect any programs you have compiled for Ultrix only. We will continue
to support the Ultrix operating system on "volga.eecs". Please email
root@cory.eecs.berkeley.edu if this creates any problems.
-----------------------------------------------------------------------
(Jul 15) - Po.eecs downtime and password service changes July 16/17

PO.EECS: new computer
---------------------
Po.eecs will become a new computer at about noon on Fri Jul 17. Po.eecs
has the master password file, and you have been told to login there to
change your password. Po.eecs is now a DEC Ultrix system; it will
become a Solaris X86 system. The benefits are: faster computer, new
password server software. Your current 'login' password will still be
valid after this change.

On Fri July 17, Po.eecs may be down (off the net) at times. From
Thu July 16 - Mon July 20, we may prevent any password changes.

The New Password Service
------------------------
The new password software is called "NIS+". It will allow users at the
Solaris X86 PCs to change their passwords without logging in to
po.eecs. Users on the HP, DEC and SGI systems will still have to login
to po.eecs. But in all cases, you won't get "password file is busy"
messages any more, and the changes will take effect within 5-10 minutes
instead of 40-60 minutes. NIS+ has better efficiency and security
features than our current password service.
For more technical info about NIS+, please type "man nis+" on any of
our Solaris systems (for lists, please see
http://inst.eecs.berkeley.edu/clients).

YOU'LL HAVE A SECOND PASSWORD:
-----------------------------
In addition to the 'login' password that you now use, NIS+ uses a
second 'secure RPC' (also called 'secret key' and 'NIS+ credential')
password. The default 'secure RPC' password for all users is "nisplus".
When logging into one of our Solaris X86 computers, you may see a
warning message such as

    This password differs from your secure RPC password.
or
    Password does not decrypt secret key for unix.291@Inst.nisplus.

This is not a problem, but it will be an advantage to you to make your
'secure RPC' password be the same as your 'login' password. You can do
that by logging into "po.eecs" or "torus.cs" and running the Solaris
X86 command "chkey". For example:

    % chkey -p
    Updating nisplus publickey database.
    Generating new key for 'unix.3232@Inst.nisplus'.
    Please enter the Secure-RPC password for jdoe: nisplus
    Please enter the login password for jdoe: {jdoe's password}

This sets the 'secure RPC' password to match the 'login' password, so
you won't have to type it when you change your password, shell or
'finger' information on a Solaris X86 system.

We plan to install "wrapper" programs for the UNIX 'passwd', 'chsh' and
'chfn' programs to automate the entry of the 'secure RPC' password, but
you may still see messages about it when you use these programs. Please
notify 'root@cory.eecs' if you have difficulty using these new
programs.

Use 'ssh' for better security:
-----------------------------
Users on other UNIX computers will still need to run those commands on
po.eecs. For added security, we recommend that you log in to po.eecs
using the "ssh" program rather than rlogin or telnet. "Ssh" is
available on our UNIX computers and is used like "rlogin":

    ssh po.eecs -l {your_login}

"Ssh" is commercial software that unfortunately is not available for
free.
It can be purchased for PCs and Macs: please see
http://inst.eecs.berkeley.edu/usr/pub/ssh.help for details.
-----------------------------------------------------------------------
(Jul 15) - /home/tmp on po.eecs will be unavailable Thu/Fri, July 16/17

/home/tmp on po.eecs will be unavailable from about 1pm Thu Jul 16
until 1pm on Fri Jul 17, while we move it to the new po.eecs.
-----------------------------------------------------------------------
(July 01) - Saidar /home/d will be inaccessible from 8am-9:30am on
            July 1

/home/d will undergo full dumps between 8AM and 9:30AM on July 1.
During this time the filesystem will not be mounted by any of our
client systems. This was work that was not completed last Friday as
originally scheduled. We apologize for the abrupt notice.
-----------------------------------------------------------------------
Kevin Mullally, Manager              Ferenc Kovac, Associate Manager
EECS Instructional & Electronics     EECS Instructional & Electronics
378 Cory Hall, (510) 643-6141        377 Cory Hall, (510) 642-6952
kevinm@eecs.berkeley.edu             ferenc@eecs.berkeley.edu

source: /usr/pub/reports/manager/Fall.1998 - revised January 31, 1999