College of Engineering
EECS Instructional Support Group
Sep 18, 2012

EECS Instructional Computing - Review and Plans
-----------------------------------------------
Fall 2011 & Spring 2012

CONTENTS:
    Mission Statement
    ISG Organizational Scope
    New Lab for Parallel Computing
    Budget Cuts
    Current Concerns
    Current Development Goals
    Funding Wish List
    Notable Events

Mission Statement
-----------------
The EECS Instructional Support Group (ISG) installs and maintains networked
computers that are used by EECS classes.  ISG provides computer accounts for
instructors and students in the Instructional labs and on Instructional
servers.  ISG purchases, installs and maintains application software needed
for classes.  ISG supports instructional labs in Cory Hall, Soda Hall and
Sutardja Dai Hall.

ISG wishes to meet the computing needs of instructors and students in EECS
courses and to provide support for new and innovative learning environments.
We wish to be accessible and responsive to requests for service.  We also
wish to learn about new and interesting technologies that may be of value in
this service.
ISG Organizational Scope
------------------------
These are the functions in which ISG interacts with other UCB support groups:

 - we use EECS department services (IDSG) for Active Directory, disk space,
   network access and security scans
 - we synchronize our user accounts with the EECS department (IDSG)
 - we obtain enrollment lists from the Registrar (Student Information Services)
 - we obtain cardkey numbers from the Cal 1 Card office
 - we submit cardkey authorization to our labs in batch uploads to UCPD
 - we bill students' voluntary printer charges to CARS
 - we coordinate our use of the EECS Network Node Bank with IDSG and CNS
 - we manage the computers in engineering labs with ESG
 - we manage the licenses for Synopsys/TCAD/HSPICE with the Device Group
 - we manage the licenses for Cadence with the BSAC group
 - we manage the licenses for Maya and Renderman with the BCAM group

New Lab for Parallel Computing
------------------------------
Thanks to generous support from Prof Yelick and Intel, we have obtained 28
new Dell Precision T5500 Workstations in 330 Soda for use by CS194-15
(Engineering Parallel Software) and other classes (August 2010).

The workstations are our most powerful.  Each has

    Dell Precision T5500 Workstation
    Dual Quad Core E5620 2.40GHz cpu
    6X1GB DDR3 ECC SDRAM Memory 1067MHz
    512MB PCIe x16 NVIDIA Quadro FX 580 Dual Monitor DVI
    320GB SATA 3.0 Gb/s hard drive
    NVidia Tesla C1060 graphics processor card
    Ubuntu Linux

These replaced 5-year-old Sun W1100Z Windows computers.  Many of those
computers were relocated to 275 and 349 Soda.

Budget Cuts
-----------
The funding that supports Instructional computing was cut by 20% ($110K)
over 2 years, starting July 1 2009.  This funding supports computers,
software, service contracts, printing, network fees, furniture, supplies and
salaries.  We are now in a very austere spending mode, and more than ever we
are relying on donations for any major new equipment.
In FY 2009-10, our budget was cut by $63K (not including the temporary cuts
due to furloughs).  To match that cut, we reduced capital and operating
costs by eliminating service contracts and the purchases of new equipment.
We reduced salaries by taking voluntary leaves without pay and by
eliminating our student employee positions.

In FY 2010-11, the budget was cut an additional $47K.  To match that cut, we
had to lay off a career staff member in July 2010.

Reference:

As a result, we have reduced these services since July 2009:

 1) CS student developer position eliminated: no development work on Scheme,
    CS61A Complaint/Resolve WEB service, CS61A Hadoop/MapReduce.
 2) No lab maintenance on weekends.
 3) No primary support staff for UNIX CAD tools such as Cadence and Synopsys.
 4) Slower responses to some service requests.
 5) Canceled service contracts on SUN servers, Legato tape archive software,
    CoventorWare, RedHat Linux.
 6) Deferred maintenance: chairs, printers, computers all getting old.

We considered using course materials fees to offset some of this loss of
operating funds, but we withdrew our request for these fees when the student
fees were increased by 33%.  (The increased student fees do not increase our
operating funds, but we could not in good conscience add to the burden on
the students.)

In the spirit of the campus Operational Excellence initiative, these
permanent budget cuts drive us to look for creative solutions to support and
improve our services at lower costs.  Some possibilities include:

 1) Reduce the cost of compute servers by supporting low-cost virtual servers
    (such as the Amazon EC2 cloud) for students to do course assignments.
 2) Reduce the cost of workstations by providing better services for
    student-owned laptops (access to network, files, printers, licenses).
 3) Reduce the cost of software by sharing licenses with other departments.
 4) Gain revenue by recharging for the use of our computer resources by
    workshops and classes from other departments.
 5) Establish partnerships with industry, other UCB IT groups and with UCB
    student organizations to share solutions of common interest.

Current Concerns
----------------
These are current policy and procedural concerns for ISG:

 1) How to provide support for permanent archiving of WEB sites & videos
    (disk/tape space, access rights, ownership, bSpace, labor, cost)?
    ISG has adopted the IRIS tape restore policy: 13 months max.
 2) Department policy about desktop computers in the meeting rooms may be
    unclear to instructors.  No repairs/replacements?  (i.e., they'll vanish
    randomly?)
 3) ISG needs to plan farther ahead for new services and potential course
    fees for classes.  We will survey the instructors before each semester
    to determine needs and priorities.
 4) ISG needs to communicate better with the GSIs and instructors at the
    start of the semester, to understand their needs and to guide them to
    best practices for managing the grading software, file security, WEB
    content and files suitable for printing.  We will offer help sessions
    for GSIs and advertise the instructions that are on our WEB site.
 5) Can we retire the ISG (imail.eecs) email server and bounce incoming
    email?  Should we manage the communications services to our students?
    Alternatives: bSpace (IST), lists.eecs (IDSG), news.berkeley.edu (CSUA),
    Google groups, ...

Current Development Goals
-------------------------
We conduct periodic surveys of students for their opinions about the
computing resources.  (http://inst.eecs.berkeley.edu/~inst/surveys)

Based on these surveys, on requests from our instructors and on our own
observations, our group has these current development goals:

 1) Improve service to users who bring their own laptops.
    Install power strips and wired network connectivity for laptop users in
    199 Cory, 277 Soda and elsewhere.
    Enable authenticated access to floating software licenses for laptop
    users when permissible.
    Set up laptop stations in each lab (seating, table space, power, faster
    wireless).
 2) Install power-saving features on the workstations and lighting in our
    labs.  The UCB Operational Excellence (http://www.berkeley.edu/oe)
    initiative also mandates this.  (With campus and departmental funds, we
    did replace all the old CRTs with LCD screens in FY 2009-2010.)
 3) Provide virtual servers and VM images for classes to allow for more
    rapid implementation of new computing environments for class projects.
 4) Provide collaborative software such as a webcast of the instructor
    displayed onto the students' workstations or laptops, or the console of
    any workstation or laptop in a lab displayed onto the instructor's
    screen.
 5) Provide clean and safe labs, including the chairs, keyboards and mice.
 6) Gain technical expertise in virtual, clustered and cloud computing.
 7) Improve the WEB interfaces for ISG services such as user documentation,
    user account self-maintenance, SVN, course WEB site development and
    grading (submit and glookup).

Budget cuts of over 21% since 2008 have caused us to nearly eliminate
maintenance contracts, proactive maintenance and repairs to non-essential
infrastructure (such as chairs in the labs).  We must set our budget
priorities very strictly now.

Funding Wish List
-----------------
Here is a wish list for which we lack funding:

 1) upgrade workstations in Instructional labs ($234K total)
     - replace 30 PCs (circa 2002) in 105 Cory for EE20N ($90K)
     - replace 8 PCs (circa 2003) in 111 Cory for EE117 ($24K)
     - replace 8 PCs (circa 2002) in 218 Cory for EE143 ($24K)
    Benefit to approx 750 students in CS152, CS160, CS162, CS164, CS169,
    CS170, CS172, CS184, CS186, CS188, CS194, CS198, EE117, EE143, EE20N.

 2) renovate 105 Cory: new furniture, wiring, A/V ($30K)
    105 Cory is a teaching lab that is used primarily by EE20N but is also
    available to other classes.
    It has an overhead projector and is well-suited to hands-on instruction.
    The lab benches, chairs, network and electrical cabling are in disrepair
    from years of use.

 3) salaries for 2 student staff for 1 year ($8K)
    To meet the FY 09-10 budget cut (21%), we have reduced our student
    staff.  As a result, we have a higher incidence of delayed repairs and
    deferred maintenance in the labs.  This funding would support staff for
    projects such as:
     - check physical condition of labs and do routine maintenance
     - install new equipment (such as with item #2 above)
     - update our UCB Scheme software (CS3, CS61A) for home users
     - install an RSS feed server for use by instructors
    This would benefit all 6800+ students who use our computers.

 4) Replace old chairs in 1 lab per year (about $6K per lab).

 5) Build a WEB front end to the CS grading software (submit/glookup).

Notable Events
--------------
See http://inst.eecs.berkeley.edu/notices.html for current events.

NOTICES for EECS Instructional Users (Fall 2011-Spring 2012):

--------------------------------------------
Feb 9: Login and disk quota failures last night

From about 6pm to 11pm on Feb 8, these problems prevented some users from
logging onto our UNIX systems or from saving files in their home
directories.  Some CS186 students were unable to write to their home
directories until about 9am on Feb 9.

Starting at about 6pm, these 3 unrelated events happened concurrently:

 1) An LDAP server failed, preventing logins in 330 Soda (Hive*), 125 Cory
    (T7400*) and on some other Instructional UNIX systems.  The symptom was
    "permission denied" even though you typed the right password.
 2) The home directory disk for class accounts (/home/cc) filled up.  The
    symptom was "no space left on device" and zero-length files when you
    tried to write them in your home directory.
 3) The CS186 accounts appeared to be over their disk quotas.  The symptom
    was "disk quota exceeded" and zero-length files.
We fixed #1 by forcing the systems to use our backup LDAP server.  Later, we
repaired the primary LDAP server.

We fixed #2 by deleting the expired Fall 2011 class accounts, which we kept
on the disk until we could verify the tape archives.

We fixed #3 by 11pm (Feb 8), but it reoccurred from about 2am-9am.

Problem #3 started when we tried to raise the quota on CS186 accounts.  The
behavior of the quota system is poorly documented (/home/cc is on a NetApp
that serves nearly all of the home directories in EECS).  Until now, all
Instructional users had the same, default quota on UNIX (800MB).  To raise
just CS186, we thought we had a 'group' directive that meant "give each
member of the group a 2GB quota", but it turns out it meant "give the entire
group a 2GB quota".  It takes several hours for the quota system to rebuild,
so the problem did not surface right away.  The solution was to add a
directive for each user.

In an effort to redo this while most users were asleep, we restarted the
quota rebuild at about 2am.  Unfortunately, we had not removed the offending
directive, so the problem reappeared.  We killed that rebuild, and rebuilt
the quotas one more time with the correct entries.

Alternate file storage:
    The only UNIX network shares are the home dirs (~$USER) and /home/tmp.
    There is also /var/tmp on the local UNIX disks.
    You could copy files you have written to USB flash drives for backup.
    You could also store files in an off-site service, such as
    http://box.com.

File recovery:
    Files that previously existed in an earlier form may be found in the
    hidden .snapshot/ directory that is in each home directory.  The oldest
    snapshots are overwritten after 2 days.  If the file had never been
    saved before, it is, unfortunately, lost.

I apologize for this interruption in service and for wasting the students'
time.  I appreciate the calm (resigned?) attitude that you have all
demonstrated despite this.
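The two readings of the 'group' directive in the Feb 9 notice can be compared
with a small sketch.  This is a toy model only: it is not NetApp's quota
syntax or implementation, and the account names and usage figures are
hypothetical.  It shows why a 2GB limit shared by the whole group can lock
out every account even when each one is well under 2GB.

```python
MB_PER_GB = 1024


def over_quota_shared(usage_mb, group_limit_mb):
    """One limit applied to the group's combined usage.

    If the total exceeds the limit, every member is reported over quota.
    """
    blocked = sum(usage_mb.values()) > group_limit_mb
    return {user: blocked for user in usage_mb}


def over_quota_per_user(usage_mb, user_limit_mb):
    """A separate limit applied to each account (the intended behavior)."""
    return {user: used > user_limit_mb for user, used in usage_mb.items()}


# Hypothetical class accounts, each using 900MB.
usage = {"cs186-aa": 900, "cs186-ab": 900, "cs186-ac": 900}

shared = over_quota_shared(usage, 2 * MB_PER_GB)
per_user = over_quota_per_user(usage, 2 * MB_PER_GB)
```

In this model the shared limit flags all three accounts, because their
combined 2700MB exceeds 2048MB, while the per-user limit flags none.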
Kevin Mullally
Manager, EECS Instructional Support Group
kevinm@berkeley.edu

--------------------------------------------
Nov 18: What if you are threatened in a lab?

Hello students, instructors and staff:

The shooting in a computer lab at Haas School of Business on Nov 15 is a
serious concern for us all.

Ways to call for help in an emergency in one of our labs:

 * from a campus phone: call 911 (UCPD)
 * from a cell phone: call 911 (CA Highway Patrol)
 * from a cell phone: call 510-642-3333 (UCPD)
 * from a cell phone: text 510-664-8477 (UCPD)
 * email cal@tipnow.com (http://police.berkeley.edu/caltip/) (UCPD)

Access to every EECS instructional lab is restricted by cardkey on evenings,
weekends and holidays.  Many labs are restricted 24x7.  Students in EECS
classes are automatically enabled for cardkey access to the labs for their
classes.  (This is a good time to remember the rule not to let unauthorized
people into the buildings or labs.)

The UC Police Department has posted an informative video, "Shots Fired", to
help us be prepared in the event of a shooter or other threat in our
confined lab spaces.  (Log in to bSpace.  The video is in Flash, so it may
not work on Apple devices.)

Ways you can be automatically notified of emergencies:

 * Subscribe to http://warnme.berkeley.edu/ (UC Berkeley)
 * Subscribe to http://www.ci.berkeley.ca.us/ContentDisplay.aspx?id=25416
   (City of Berkeley)
 * https://alertsf.org/, http://twitter.com/#!/alertsf
   (City of San Francisco)

Please report any observations or concerns you have about safety in our labs.
You can report to EECS staff in any of these ways:

 * visit 251 Cory or 387 Soda (main dept offices)
 * visit 377/378/380/384/386 Cory or 333 Soda (instructional support)
 * email inst@eecs.berkeley.edu
 * call 510-643-6141

Thank you,

Kevin Mullally
Manager, EECS Instructional Support Group
http://inst.eecs.berkeley.edu/~kevinm

--------------------------------------------
July 12: Instructional computers will be down today, 7PM-midnight

We'll need to deny access to all of our Solaris computers for several hours
tonight while the department server for the /usr/sww filesystem is being
replaced.  While that filesystem is offline, many common UNIX programs will
be unavailable and users who are logged in are likely to experience frozen
login sessions or long delays from timeouts.

So at 7PM, we will boot everyone off the computers on the 2nd floor of Soda,
including the servers Nova, Star, Solar, Stella, Cory, C199, Torus and
Pentagon.  The downtime may last until midnight.

Updates about the dept systems are posted at https://iris.eecs.berkeley.edu/.
We will post signs in the labs and in a message that appears at the login
session.

I apologize for the short notice about this.

Kevin Mullally
Manager, EECS Instructional Support Group
378 Cory Hall, UC Berkeley, (510) 643-6141
http://inst.eecs.berkeley.edu/~kevinm

--------------------------------------------
May 9, 8:50pm-9:15pm: UNIX home dirs (on Project) were down

The department file server called Project was down for about 25 minutes
tonight.  More information may be available on http://iris.eecs.berkeley.edu.

--------------------------------------------
This server was down Mon May 3, 3:15pm-5pm

The http://inst.eecs.berkeley.edu WEB server and subordinate sites were
unresponsive on Mon May 3 from about 3:15pm-5pm.  This was caused by an
overload of WEB processes, probably caused by excessive hits on the server
or by a runaway CGI script of one of our users.  We restarted the WEB server
and ended the problem.
--------------------------------------------
Downtime Thursday afternoon

Some Instructional computer services stalled on Thursday afternoon as a side
effect of the unplanned downtime of some department servers (see "290
Machine Room Outage" on https://iris.eecs.berkeley.edu/).

Services that were affected included:

    logins to UNIX systems
    access to UNIX home directories
    login servers: pulsar.eecs, quasar.eecs (rebooted)
    WEB server (inst.eecs)
    email server (imail.eecs)

Email deliveries were postponed but not lost.  Service was restored by 5pm.

--------------------------------------------
UNIX disk space filled up (Sun Feb 13)

The UNIX /home/cc filesystem filled at about 1:30am.  This filesystem
includes all of the home directories of EECS class accounts.  Users were
getting error messages such as "disk quota exceeded" and "error 28" from
Firefox and Java apps.  Some old user files were removed at about 2pm to
free up space.

--------------------------------------------
Jan 10 - this server was updated; you may need to update your .htaccess files

Please see http://inst.eecs.berkeley.edu/setup.html#modauth.

--------------------------------------------
Jan 4 - Instructional login problems are fixed

Some of our security certificates expired on Jan 1, 2011.  From Friday night
until Monday noon, users could not log in to our UNIX computers (such as
cory.eecs and torus.cs).  Our WEB servers denied URLs that use the "https"
or "/~user" syntax.  The WEB servers include:

    https://inst.eecs.berkeley.edu
    https://isvn.eecs.berkeley.edu
    https://imail.eecs.berkeley.edu
    https://acropolis.eecs.berkeley.edu

Also, these services were not working until Tuesday morning (Jan 4):

    https://imail.eecs.berkeley.edu (SquirrelMail WEB client)
    IMAP on imail.eecs.berkeley.edu (port 993)
    https://isvn.eecs.berkeley.edu (SVN WEB interface)

All of these services have been restored now.  We apologize for the lapse in
service.
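A lapse like the Jan 1 certificate expiry can be caught ahead of time by
checking each certificate's expiry date on a schedule.  The sketch below is
one possible approach, not a description of how ISG monitors its servers.
The function names and the 30-day warning window are assumptions, and the
timestamp in the example is illustrative (the notice gives only the date).
It uses Python's standard ssl.cert_time_to_seconds() to parse the "notAfter"
string that ssl.SSLSocket.getpeercert() reports.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Return whole days from `now` until the certificate expires.

    not_after: a certificate "notAfter" string such as
               "Jan  1 00:00:00 2011 GMT".
    now:       a timezone-aware datetime.
    A negative result means the certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after, now, warn_days=30):
    """True if the certificate expires within `warn_days` (hypothetical window)."""
    return days_until_expiry(not_after, now) <= warn_days


# Illustrative expiry timestamp for a certificate ending on Jan 1, 2011.
EXPIRY = "Jan  1 00:00:00 2011 GMT"
early = datetime(2010, 11, 1, tzinfo=timezone.utc)
late = datetime(2010, 12, 15, tzinfo=timezone.utc)
```

With these inputs, needs_renewal(EXPIRY, early) is false and
needs_renewal(EXPIRY, late) is true, so a daily check would have raised a
warning a couple of weeks before the outage.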
--------------------------------------------
20" LCD Displays Replace 122 Old CRT Monitors

Please see the Fall 2009 Manager's Report
(https://inst.eecs.berkeley.edu/~inst/reports/?file=Fall_2009#4) for
information about the Dell LCD monitors that were installed in the
Instructional labs in Fall 2009.

--------------------------------------------
Symptoms when UNIX email or home directories are missing:

 - when you try to log in, the screen freezes
 - you see the error message "home directory is /"
 - session hangs up if you try to 'ssh' into an Instructional computer
 - unable to read WEB pages from http://inst.eecs.berkeley.edu
 - lots of annoying "NFS timeout" error messages on your screen
 - new email deliveries will be delayed on imail.eecs

While the server is down, you may not be able to log out in our labs because
you can't type any commands.  On a SunRay, even turning it off doesn't log
you out.

The support staff check the labs after events like this to be sure everyone
gets logged out.  We also post information about the problem at
http://inst.eecs.berkeley.edu to help students find out when the problem has
been fixed.

So all you can really do in this case is to wait until the problem is fixed,
go back to the lab (or log in to the SunRay server for that lab) and log
yourself out, or let us log you out.

We disable email receipt and relaying through imail.eecs when the home
directory server (mamba.cs.berkeley.edu) is down.  No mail is lost.
Computers that send mail queue any messages that are not accepted by a
remote server, and they resend the messages periodically until they are
received.

--------------------------------------------
For additional information, please contact me:

    Kevin Mullally, ISG Manager
    EECS Instructional Support Group
    378 Cory Hall, (510) 643-6141
    kevinm@eecs.berkeley.edu
    http://inst.eecs.berkeley.edu/

source: ~inst/public_html/reports/managers/Spring_2012