University of California at Berkeley
Department of Electrical Engineering & Computer Science
Instructional Systems Support Group

/usr/pub/reports/manager/Spring_1994

Report on EECS Instructional Computing Facilities
-------------------------------------------------
Spring Semester 1994

by: Kevin Mullally, Manager of EECS Instructional Systems


week of March 28-April 3, 1994:

This was the week of Spring Break. Systems work was done to create more
user disk space and to reduce the loads on Ara and Cory.

A new file server, Bard, was added. Bard is a Sun4 that was given to
Instruction by the Robotics group (it was formerly Zeus and Zephyr).
The /share/a, /share/b and /cad partitions were moved from Ara to Bard,
and a second file server (Congo) was set up for the Ara clients.

The central mail service for the Instructional system in Cory Hall was
moved off of Cory to a dedicated mail server machine (Pasteur).

A new 670 MB filesystem was provided for sole use by CS162. The /cad
filesystem was increased by 300 MB for use by CS250.

Danube crashed on 4/3 at 11pm and was back in service on 4/4 at about
8am.
Cory crashed and rebooted on 4/3 at 5:30pm.


week of April 4-10, 1994:

Danube crashed and rebooted: 4/7 at 2pm, 4/10 at 8pm.
Cory crashed and rebooted on 4/9 at 6pm.

A number of users complained about email forgeries.

We received a complaint from outside the university about an offensive
posting to net news. Legal action was threatened. We disabled the
account of the student who was responsible.


week of April 11-17, 1994:

Bard was down from 11am to 7pm on 4/11 due to a hardware failure.
These symptoms resulted:
  - X11 not available for Ara clients (fixed around 3pm)
  - Workview not available (fixed about 6pm)
  - scm not available (fixed about 6pm)
  - some users' logins would fail (fixed about 6pm)
  - Hspice not available (fixed about 7pm)

Volga crashed and rebooted itself on the afternoon of 4/12.
Danube crashed and rebooted itself on 4/14 at 4am.
The Snake clients have been experiencing reboots (from crashes or from
users?) frequently at night. This was observed after the fact to have
happened on 4/6 at about 10:05am, 4/9 at about 11:10am and 2:05pm, and
4/16 between 12am and 11pm.

The Internet address for Moby was changed, and that broke the license
for "codecenter" for a couple of days.

A number of mouse balls were stolen; most were returned. This is
happening with increasing frequency.

Five or six users complained about people hogging the workstations,
either by playing games or by running screen locks. These behaviors are
prohibited, but they are difficult to police.


week of April 18-24, 1994:

Danube crashed and rebooted: 4/17 at 1am, 4/19 at 1am, 4/20 at 7pm,
4/22 at 3pm.
Volga crashed and rebooted: 4/22 at 11am.
Po crashed and rebooted: 4/23 at 4pm.

There were a number of "borrowed" mouse balls and one stolen mouse.

The /home/e filesystem (which contains the CS9E accounts) was full for
about 24 hours between 4/20 and 4/21. The sys admins cleared some space.


week of April 25-May 1, 1994:

Danube crashed and rebooted: 4/25 at 2am, 4/25 at 12:30pm, 4/27 at 1am,
4/29 at 2pm.
Snake crashed and rebooted: 4/26 at 4pm (caused by a full system disk).

Parker started denying much of its NFS service on 4/29 at about 3pm.
At 4:30pm it was rebooted to clear the problem. Users with home dirs on
Parker were affected: their attempts to compile (gcc) and run HSPICE
were thwarted by "NFS time out" and "can't start a new shell" error
messages. The problem was cleared by 4:45pm.

Po crashed and rebooted: 4/30 at 1pm, 5/1 at 8:30am.

Users are complaining increasingly about other users who leave screen
lock programs running on idle workstations. There is no safe way for
the sys admins to detect and prevent this, so we must rely upon peer
pressure and reports.


week of May 2-May 8, 1994:

This is the final week of the semester, before exams, and the
workstation labs are extremely crowded.
There have been dozens of complaints to "root" about other students who
leave screen locks running on our workstations for hours. When we catch
them, we temporarily turn off their accounts.

Volga crashed and rebooted: 5/3 at 5pm, 5/4 at 10:30pm.

Danube crashed and rebooted: 5/1 at 8:30am and 4pm, 5/2 at 10am.
On May 2, we disabled further logins to Danube, which should reduce the
crashes (to 0, we hope). The users will get an explanation when they
try to log in. They can still access their directories on Danube over
the net from other systems. Danube is running an older version of the
operating system, and until we upgrade that, the vendor is unable to
diagnose the cause of the crashes.

Po crashed and rebooted: 5/3 at 3pm, 5/4 at 11pm.
We are pushing DEC to look closely at the diagnostic kernel that we
have installed on Po, in the hope that they will identify and fix the
cause of these crashes. We suspect that the crashes are triggered by
socket-related coding in the final CS162 project, and we have asked
those students to avoid executing their "nachos" code on Po.

Torus started denying much of its NFS service on the evening of 5/1.
It failed to reboot properly the next morning; it seems to have a bad
system disk. We replaced that disk with another (taken from a
workstation), and the CS184 and CS284 home directories were available
again by about 4pm on 5/2. The Iris workstations were available again
by about noon on 5/3.


week of May 9-May 15, 1994:

Po crashed and rebooted: 5/11 at 8am.
Volga crashed and rebooted: 5/09 at 4pm, 5/14 at 8:30pm.

Danube crashed and rebooted: 5/09 at 10pm.
Users had been allowed to log in there again, by accident. While logins
were denied, Danube did not crash. Logins are now denied again.

We disciplined perhaps a dozen students for leaving screen locks on the
workstations. We have posted larger signs saying that screen locks are
prohibited.
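As a rough cross-check of the crash entries logged above, they can be
tallied per host. The following is a minimal sketch in Python, not part
of our tooling: the CRASHES table is transcribed by hand from the weekly
entries in this report (times of day omitted), Bard's hardware outage on
4/11 is counted as one event, the unexplained reboots of the Snake
clients are left out, and the names CRASHES, tally_by_host and
tally_by_month are hypothetical.

```python
from collections import Counter

# Crash dates (month/day, 1994), transcribed by hand from the weekly
# entries of this report. Assumptions: Bard's 4/11 hardware outage is
# counted as one event; reboots of the Snake *clients* are not counted.
CRASHES = {
    "Danube": ["4/3", "4/7", "4/10", "4/14", "4/17", "4/19", "4/20",
               "4/22", "4/25", "4/25", "4/27", "4/29", "5/1", "5/1",
               "5/2", "5/9"],
    "Volga": ["4/12", "4/22", "5/3", "5/4", "5/9", "5/14"],
    "Po": ["4/23", "4/30", "5/1", "5/3", "5/4", "5/11"],
    "Cory": ["4/3", "4/9"],
    "Snake": ["4/26"],
    "Bard": ["4/11"],
}


def tally_by_host(crashes):
    """Return a Counter mapping each host to its number of logged crashes."""
    return Counter({host: len(dates) for host, dates in crashes.items()})


def tally_by_month(crashes, host):
    """Return a Counter mapping month number (as a string) to crash count."""
    return Counter(date.split("/")[0] for date in crashes[host])


if __name__ == "__main__":
    # Print hosts from most to fewest logged crashes.
    for host, count in tally_by_host(CRASHES).most_common():
        print(f"{host:8s}{count:3d}")
```

Under these assumptions Danube accounts for half of the logged events,
and only one of its crashes (5/9, when logins were re-enabled by
accident) falls after logins were disabled on May 2.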


week of May 16-May 22, 1994:

Parker started denying much of its NFS service again and was rebooted
on 5/17 at 11:30am. Users with home dirs on Parker were affected. The
problem was cleared by 11:45am.

We disciplined perhaps a dozen students for leaving screen locks and
playing games on the workstations. We have posted larger signs saying
that these behaviors are prohibited.


Improvements, Spring '94:

  - converted Parker HPs to dataful (improved performance)
  - freed 2 GB for home directory disk space
  - freed 670MB for CS162
  - freed 400MB for CS250
  - improved password file distribution routines
  - installed a Sun4 file server
  - decreased load on Ara file server; added second cluster server
    (Congo)
  - moved mail server off of Cory onto a dedicated system
  - worked with individual instructors to customize setups (CS61A,
    CS162)
  - decreased NFS dependency: duplicated some critical software, moved
    some disks to extra servers
  - decreased NFS network load: installed automounters ("amd")
  - set up X11 (xdm) default user interface on HPs; Vue is an option
  - set up keymappings for CS classes on HPs; provided help files for
    users about keyboard and windowing on each architecture
  - obtained software for new CS2 course
  - obtained HSPICE
  - ported Berkeley scm, SPIM, etc. to the HPs
  - ported Berkeley scm to PCs and Macs (Gambit)


Problems, Spring '94:

  - user disks filling up; downtime to fix that
  - system crashes (mostly Danube, once Bard)
  - performance delays (mostly from server and network bottlenecks)
  - failed to survey the faculty in December about Spring computing
    needs


Improvements (pending), Fall '94:

  - expanding to 4 nets (2 in Soda, 2 in Cory)
  - adding 100+ HP workstations
  - moving CS61B and CS61C from WEB to Soda labs
  - CAP improvements to 199 Cory, adding networking for workstations
  - adding "Instructional Reports" documentation of events and
    performance
  - porting nachos to HPs