CS61C Fall 2018 Lab 0 - Intro, Git, Shell

Goals

Reading

Policies and Partners

There is no checkoff this week. Starting next week, you are REQUIRED to have a partner for lab checkoffs. This will reduce the number of checkoffs we have to perform (allowing us to answer more of your questions) and give you someone to discuss class material with. BOTH partners will need to be present at checkoff to receive credit, and both partners will be asked to participate during the checkoff. Try your best to find someone in your lab section with work habits similar to your own.

How Checkoffs Work

At the end of each exercise, there is a section labelled "Checkoff." The items in this section are what you must successfully demonstrate to the course staff in order to receive credit for completing the lab. Once you and your partner finish ALL of the exercises, you should put your names and logins on the checkoff list on the board, and a course staff member will come and check you off.

Checkoff

Labs in CS 61C are graded out of 2 points. Labs are due for full points by the next lab session (which is 1 week after the lab was assigned). If they're another week late, then you get half credit. Any later than that and it's 0 points. You can always ask for help on the lab, but you can only ask to be checked off once. If you ask to be checked off and you don't pass the checkoff, you'll get 0 points.

This semester, to encourage students to use lab time more efficiently by starting the lab at home, you can also earn extra credit by getting checked off within the first hour of your assigned lab time. For every lab in which you and your partner get checked off within the first hour of your assigned lab section, both of you will receive 1 extra credit point. After getting checked off early on a lab, a good way to spend the rest of your lab time is to help other students with the lab, which will also solidify your own understanding of the material.

Exercises

In today's lab only, everyone will need to find a partner, preferably with the same operating system, but both students in the partnership will need to complete the setup steps in Exercises 0, 1, and 2. Later exercises only need to be done on one computer.

Before beginning the exercises, familiarize yourself with Professor Hilfinger's Simple UNIX Commands guide. All of the exercises in this lab (as well as future labs) will rely on using the terminal to navigate and manipulate files and programs on your computer. Oftentimes, as in this lab, we'll also be connected remotely to another computer, so it's especially important to learn an efficient workflow for the terminal shell environment. In this lab, we omit a few commands because we assume that you have some experience working with git before and have learned how to navigate to different directories, make new directories, and edit and remove files. If you're not as comfortable, it's especially important to find a partner to work through the lab!

Exercise 0: Instructional Account Setup

To obtain your CS 61C login, go to webacct and log in using your CalNet ID. Once logged in, create a new account for CS 61C. This should give you a username and a temporary password. Now, you can log in to your instructional account by running the command ssh cs61c-xxx@hiveYY.cs.berkeley.edu on your laptop (where xxx is your CS 61C login, and YY is any number between 1 and 29), and entering your temporary password. Congratulations! You are now remotely accessing the "Hive" computer located in 330 Soda. You can also log in directly on one of the lab computers with that username and password.

If you're having trouble logging into a particular Hive machine, make sure that it's connected to the internet. You can check the availability of all of the EECS instructional computers by using the Hivemind web tool to monitor current computer usage. This will be particularly handy when it comes time to run projects on the Hive machines, since it helps identify which computers are under heavy use and which ones are not.

To change your password from the temporary one, stay logged into your instructional account (i.e. in the same terminal window in which you ran ssh cs61c-xxx@hiveYY.cs.berkeley.edu), enter ssh cs61c-xxx@update.cs.berkeley.edu, and follow the prompts.

Now you're ready to start the lab!

If you're unable to obtain a CS 61C account through webacct, complete the Instructional Account Request Form and we will manually process your request at the end of each week prior to the drop deadline. You can still complete the rest of the activities in this lab with your partner, and set up your GitHub repository as well.

Exercise 1: GitHub Account Setup

Please read the following instructions carefully before proceeding. Almost all issues students run into during this lab can be prevented by carefully following the steps provided. Even if you have experience with git from previous CS 61-series classes, the process we use to set up your accounts may be different in this class.

This semester, we will be requiring that you use git, a distributed version control system. Version control systems are better tools for sharing code than emailing files, using flash drives, or even other file sharing mechanisms like Dropbox.

We'll be using GitHub Classroom to host private repositories in which you'll store your code. If the previous sentence means nothing to you, don't be alarmed! We'll walk you through the process.

Setting up your lab repository

Fill out the Lab Repository Registration form. It will help you create a private GitHub Classroom repository and link your student ID to your CS 61C login and GitHub repository. We need this link to identify your work and assign grades.

Later, we'll also help you set up separate GitHub Classroom repositories for your coding homework exercises, as well as for each of the projects. Use this lab repository only for lab assignments!

Remember that you are not allowed to post your code publicly. We provide this GitHub repository to make it easy to back up your work online and share it with the course staff privately.

Setting up git

Now that we have created our repository, let's configure git so that it knows who you are.

While logged into your instructional account, run the commands listed below, replacing YOUR NAME with your first and last name (inside quotes) and YOUR EMAIL ADDRESS with the email address registered with your GitHub account.

Your terminal might use a different prompt than $. Type out or paste only the part after the prompt, and modify the name and email fields.

$ git config --global user.name "YOUR NAME"
$ git config --global user.email "YOUR EMAIL ADDRESS"
$ git config --global push.default simple
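
To confirm that the settings took effect, you can read them back with git config --get. The following is a minimal sketch, not part of the required steps: the name and email are placeholders, and the throwaway HOME directory is an assumption made only so that the demo leaves your real ~/.gitconfig alone. On your instructional account you would skip the HOME line and run the commands as shown above.

```shell
# Assumption: a throwaway HOME keeps this demo away from your real ~/.gitconfig.
export HOME="$(mktemp -d)"

# Placeholder identity; substitute your own name and GitHub email.
git config --global user.name "Oski Bear"
git config --global user.email "oski@example.com"
git config --global push.default simple

# Read the values back to verify them.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"
```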

Exercise 2: Git Remotes and the Hive machines

First, some quick definitions.

Throughout this class, you will regularly work with three different computers that may very well have three different versions of your code. These three are your local machine (your personal computer), one of the hive machines (while logged into your instructional account), and a remote (your GitHub repositories). For the least pain throughout the semester, it's essential that you understand the difference between these three and how you can share code between them.

  1. Your local machine. Just your good ole personal computer -- nothing new here!
  2. The "Hive" machines, or other instructional computers. We'll be using a lot of different software and libraries throughout this class that might require different versions than the ones on your local machine (such as python2.7 versus python3). Therefore, you'll need to log into your instructional account on a Hive machine (hiveYY.cs.berkeley.edu) so that you can run your assignment code in an environment with all of the correct software/library versions.
  3. The GitHub remote. Conceptually, you can also think of the GitHub remote as another machine that only stores your code (and doesn't do much else). Pushing changes to GitHub updates files on GitHub, and pulling changes updates files on your current machine: either your local machine, or a Hive machine, depending on what you're logged into.

Cloning the repository

While logged into your instructional account, clone your GitHub remote repository.

Starter code for labs will be distributed through the fa18-lab-starter repository. Every time you clone your repository, you will need to add the starter code repository as a remote.

cd into your git repository, and run the following command to add it as a remote.

$ git remote add starter https://github.com/61c-teach/fa18-lab-starter.git

Let's double check to make sure that your repository is up-to-date with the starter code before moving on. Run the following command to pull changes from the starter repository to our master branch.

$ git pull starter master

If everything is set up correctly, git should report, Already up to date.
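
If you ever lose track of which remotes a repository knows about, git remote -v lists them. Here is a small offline sketch of the idea. It assumes two local bare repositories (origin.git and starter.git, both made up for this demo) standing in for your GitHub repository and the starter repository; in the lab itself the URLs would be the GitHub ones.

```shell
# Assumption: local bare repositories stand in for GitHub so this runs offline.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/origin.git"
git init -q --bare "$sandbox/starter.git"

# "Clone" the (still empty) lab repository, then register the starter remote.
# Cloning an empty repository prints a harmless warning, silenced here.
git clone -q "$sandbox/origin.git" "$sandbox/lab" 2>/dev/null
cd "$sandbox/lab"
git remote add starter "$sandbox/starter.git"

# List every remote with its fetch and push URLs.
git remote -v
remote_names="$(git remote | sort | tr '\n' ' ')"
echo "remotes: $remote_names"
```

The clone sets up origin automatically, which is why only starter has to be added by hand.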

Pushing

Now let's practice pushing some code to our GitHub repository! A common setup step for new repositories is to add a README.md file which describes the contents of the repository.

While logged into your instructional account, run the following commands, and make sure that you understand what each of them will do. It's important to think through how to translate what you want to do into individual steps for git to carry out.

Whenever using a Git repository, the first thing to do is determine the state of our repository. We can do this by running git status.

$ git status

We should see that everything is up-to-date, which means that the GitHub remote repository has all of our code. In order to add a README.md file, we need to follow these steps:

  1. Create a new file with your text editor.
  2. Save the file with the name, README.md.
  3. Tell Git to track the new README.md file, commit it to our local repository history, and finally synchronize our local history with the GitHub remote repository.

Since we're currently accessing the Hive machine remotely, it is somewhat tricky to display windowed, graphical editors. We'll be learning how to edit a file in a terminal text editor, a skill which will be essential to any programmer's toolkit.

If you've never edited in the terminal before, we suggest that you use nano, which works a lot like a windowed text editor. If you prefer a more powerful text editor, two other popular choices include vim and emacs. Launch nano (or your preferred text editor) in the terminal.

$ nano README.md

Then, add your description. md is the file type for the Markdown format, a way of writing prose without getting bogged down by worrying too much about details like font sizes or spacing. Writing Markdown is incredibly simple: in fact, you've probably already written many documents in a style similar to Markdown. Here's an example you can use; feel free to modify it to your liking.

# CS 61C Fall 2018 Labs

**Hello, world!** This repository contains all of my lab work in CS 61C.

- [CS 61C Fall 2018](http://inst.eecs.berkeley.edu/~cs61c/fa18/)

Quit the editor with ^X (Ctrl+X). You'll be prompted to save first: a good idea!

Now, we need to communicate these changes to both our local repository and the GitHub remote repository. Remember the following sequence of commands; you'll be using them regularly to commit changes to your code.

  1. Stage all changes for commit with git add -A. This tells git to start tracking the files, but doesn't yet treat them as an official part of the local repository history.
  2. Verify that all of your desired changes have been staged for committing with git status. We should see the README.md file listed under "Changes to be committed".
  3. Commit the changes, recording README.md as an official part of history with git commit -m "Add README.md". You can change the message to anything you like.
  4. Push the updated, official history to the GitHub remote repository with git push origin master.

If you now visit your GitHub repository, you'll see the contents of the README.md file beautifully rendered in your browser!
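
The four steps above can be rehearsed offline before you try them on your real repository. This is only a sketch: it assumes a local bare repository (origin.git, a made-up name) in place of GitHub and a placeholder identity, and it writes the README with printf rather than an editor.

```shell
# Assumption: a local bare repository stands in for the GitHub remote.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/origin.git"
git --git-dir="$sandbox/origin.git" symbolic-ref HEAD refs/heads/master
git init -q "$sandbox/lab"
cd "$sandbox/lab"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git remote add origin "$sandbox/origin.git"
git config user.name "Oski Bear"          # placeholder identity for this demo
git config user.email "oski@example.com"

printf '# CS 61C Fall 2018 Labs\n' > README.md

git add -A                          # 1. stage all changes
git status --short                  # 2. verify: "A  README.md" means staged
git commit -q -m "Add README.md"    # 3. commit to the local history
git push -q origin master           # 4. push the history to the remote

# The remote now holds the commit.
pushed_subject="$(git --git-dir="$sandbox/origin.git" log -1 --format=%s master)"
echo "remote has: $pushed_subject"
```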

The git version control is built around commits, or checkpoints in development of different versions/stages of your code. To explain the above steps a little further, and define some of the key terms:

If you'd like a deeper refresher, learn more about Using Git from CS 61B. This lab only touches on the bare minimum; if you feel shaky at all, now is the perfect time to run through the Using Git guide and ask questions!

Exercise 3: Working on Projects

This semester, you'll be working on the first project individually, but all other projects will be done in pairs. You may sometimes be working on the Hive machines, other times on your local machine, and yet other times on your partner's machine. But how can you smoothly change code in the same files without having to delete and copy files back and forth?

In this part, you'll learn the process you will use for every project for obtaining the starter files for the project and then also working on the code on both machines. For the rest of this part, you will work on both your laptop and your instructional account to simulate a real environment. It will be easiest to open up multiple tabs in your terminal.

Multiple Repositories

Now that our GitHub remote repository is up-to-date with the local repository on the Hive machine, a common thing students like to do is maintain a local copy of the code on their personal computer as well. If you don't have your own computer, we still think learning this process will be helpful. Since git doesn't have knowledge about all the different repositories on a computer, you can simulate "using a local machine" by working in a separate directory on the lab computer.

Open a terminal window on your local machine (not Hive!), and navigate to the directory where you'd like to store your labs on your local machine (such as ~/cs61c). Then, clone your GitHub repository like we did earlier.

Just like that, the repository, along with the README.md file you just pushed to GitHub, are now also on your local machine!

Conflicting Changes

At this time, all three of your repositories agree on the official history. They all see the README.md file containing your latest changes. We'll now see what happens when either your local machine or the Hive machine diverges in history, and figure out how to handle it. This is a common scenario students working on the project will run into, even on solo projects, as they'll often make changes on Hive or on their local computer, but forget to push or pull to synchronize with the GitHub remote repository before making further changes, resulting in divergent histories.

Let's pretend you are working on the project on your local machine. Open up README.md on your local machine and change the title to read "CS 61C Fall 2018 Lab Assignments". Then stage, commit, and push the changes to the GitHub remote repository.

At this time, both your local machine and the GitHub remote repository should agree that the title ends in "Lab Assignments", while the Hive machine is still one commit behind, with its latest version ending in "Labs".

Now, switch back to the Hive machine, but forget to pull the new changes. Let's modify README.md. Change the title to read "CS 61C Fall 2018 Lab Exercises". Then stage, commit, and push the changes to the GitHub remote repository.

This time, your push command will result in an error because your GitHub remote repository "contains work that you do not have locally" on your Hive machine repository. It will reject the push with the following message.

error: failed to push some refs to '...'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

To fix this, let's do as the instructions say. Git is full of instructive messages like these, so take the time to read each one. If you're not sure how to interpret the instructions, search online or ask a course staff member for help!

$ git pull origin master

You should see a message saying that the automatic merge failed ("Automatic merge failed; fix conflicts and then commit the result."). What does it mean?

Merge Conflicts

In the previous section, you worked on the outdated directory on the Hive machine and tried to push the new changes without pulling the most recent code. When you try to pull code that conflicts with the code you are working on, this situation is called a merge conflict and poses a problem because git doesn't know which modifications it should accept for the conflicting line(s). Let's see how to resolve this.

If you open up README.md on the Hive machine, you'll notice that some arrows and other numbers have been added by git to signify a merge conflict. The top half (before the =======) contains the changes you made locally on the Hive machine, and the bottom half is the code from the GitHub remote repository. Resolve the conflict by deleting the extra lines git used to draw your attention (like <<<<<<<, =======, >>>>>>>, and the commit IDs next to the arrows), and leave the correct text: the title ending in "Lab Exercises".

Once that's done, stage the changes, commit to register it as part of the history, and push.

$ git add README.md
$ git commit -m "Resolved merge conflict"
$ git push origin master
$ git log

This time push should succeed. Take a look at the log output and make sure you understand the source for each line in the log. Which one was introduced by your commit on your local machine? The one introduced by the Hive machine? The one which resolved the inconsistency?

Finally, switch back to your local machine and pull the changes you just made.

$ git pull origin master

You should see the updated title ending in "Lab Exercises" on your local machine.
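
If you'd like to replay the whole scenario without touching your real repositories, it can be simulated on a single computer. The sketch below is one way to do it under a few assumptions: a local bare repository stands in for GitHub, the directories hive and laptop stand in for the two machines, the identity is a placeholder, and the helper functions commit_title and set_identity are made up for this demo. It also passes --no-rebase to git pull, because recent versions of git refuse to pull divergent branches until you say whether to merge or rebase; the lab's instructions predate that behavior.

```shell
# Assumptions: origin.git plays GitHub; "hive" and "laptop" play the two machines.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# Hypothetical helpers for this demo only.
commit_title() {   # usage: commit_title <repo-dir> <title> <commit message>
    printf '# %s\n' "$2" > "$1/README.md"
    git -C "$1" add README.md
    git -C "$1" commit -q -m "$3"
}
set_identity() {   # placeholder name and email
    git -C "$1" config user.name "Oski Bear"
    git -C "$1" config user.email "oski@example.com"
}

# Hive: create the README and push it.
git init -q hive
git -C hive symbolic-ref HEAD refs/heads/master
git -C hive remote add origin "$sandbox/origin.git"
set_identity hive
commit_title hive "CS 61C Fall 2018 Labs" "Add README.md"
git -C hive push -q origin master

# Laptop: clone, retitle, and push.
git clone -q origin.git laptop
set_identity laptop
commit_title laptop "CS 61C Fall 2018 Lab Assignments" "Retitle on laptop"
git -C laptop push -q origin master

# Hive: retitle without pulling first, so the push is rejected.
commit_title hive "CS 61C Fall 2018 Lab Exercises" "Retitle on Hive"
if git -C hive push -q origin master 2>/dev/null; then
    push_rejected=no
else
    push_rejected=yes
fi

# Pulling tries to merge and stops at the conflicting title line.
git -C hive pull --no-rebase origin master || true
conflict_markers="$(grep -c '^<<<<<<<' hive/README.md)"
cat hive/README.md   # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by rewriting the file with the intended title, then commit and push.
commit_title hive "CS 61C Fall 2018 Lab Exercises" "Resolved merge conflict"
git -C hive push -q origin master

# Laptop: a plain pull now fast-forwards to the resolved history.
git -C laptop pull -q origin master
git -C laptop log --oneline
```

The final log on the laptop should list four commits: the original README, one retitle from each machine, and the merge commit that reconciled them.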

Exercise 4: System Recovery in Unix

This exercise draws heavily from CS 107 at Stanford.

For this exercise, consult the internet and search online to help answer each of the questions using Unix commands.

Situation: You would like to help a friend whose Unix-based system has been affected by an unauthorized access. Your friend was worried about the few days that the hacker had access to the system and has made a backup copy of several key directories on the system as evidence. They've made a copy of this evidence for you, and would like you to look through it to try to piece together some of the details of what happened.

These evidence files are in the lab00.zip file. Download and unzip it into the lab repository.

Your friend has determined that one of the first things that the intruder did is add themselves (their username) to the list of users of the system. This list is kept in a file, config/users.list. Whenever this file is edited, a backup copy of its contents before the edit is automatically made. This backup copy from the most recent edit is also in the config directory.

The malicious intruder is the only person whose username was added between these two versions. Based on this information, discuss a process for finding out the intruder's username and answer the following two questions.
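
As a warm-up for this kind of comparison, here is a sketch on toy data. Everything in it is made up for illustration: the usernames, and in particular the backup file name users.list.bak, which may not match the name used in the lab files. It shows two standard ways to find lines that appear in one file but not in another.

```shell
# Toy data only: these file names and usernames are made up for illustration.
sandbox="$(mktemp -d)"
printf 'alice\nbob\ncarol\n'          > "$sandbox/users.list.bak"   # before the edit
printf 'alice\nbob\nmallory\ncarol\n' > "$sandbox/users.list"       # after the edit

# diff marks lines found only in the second (newer) file with ">".
# It exits with status 1 when the files differ, hence the "|| true".
diff "$sandbox/users.list.bak" "$sandbox/users.list" || true

# Alternative: print lines of the new file that match no whole line (-x)
# of the old file, treating the old file's lines as fixed strings (-F -f).
added="$(grep -v -x -F -f "$sandbox/users.list.bak" "$sandbox/users.list")"
echo "added: $added"
```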

Your friend suspects that the intruder was trying to install malicious programs on the system. The system's programs are located in the bin directory. Knowing that the intruder was the only person logged in to the system around the time that they edited the users.list, look at the programs and determine which ones may have been edited or installed by the intruder, based on the timestamps of the files.
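
Timestamps can be compared from the command line as well. The sketch below uses toy files with made-up names and dates (set with touch -t) to show ls -lt, which sorts by modification time, and find -newer, which selects files modified after a reference file. Note that -newer only catches files changed after the reference, so files modified shortly before it still need a look in the ls -lt listing.

```shell
# Toy data only: program names and dates are made up for illustration.
sandbox="$(mktemp -d)"
mkdir "$sandbox/bin" "$sandbox/config"
touch -t 201808010900 "$sandbox/bin/ls" "$sandbox/bin/cat"   # long before
touch -t 201808201200 "$sandbox/config/users.list"           # time of the edit
touch -t 201808201230 "$sandbox/bin/backdoor"                # shortly after

# Newest files first, with their modification times.
ls -lt "$sandbox/bin"

# Files in bin modified more recently than users.list.
suspects="$(find "$sandbox/bin" -type f -newer "$sandbox/config/users.list" | sort)"
echo "suspects: $suspects"
```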

Having the malicious code present on the system is of little use (from the intruder's perspective) if it is not executed. Your friend's system has a way that each user can configure certain programs to be automatically launched whenever they log in. This convenience is something the intruder may have tried to exploit, by editing other users' configuration of this feature to execute the malicious programs they installed or modified.

Each user has a folder called init.d in their home directory. The users' home directories are located in the user directory. You can open a couple of the files in the init.d folders to see what they look like, but the main thing to know is that if the name of one of the malicious programs you identified appears anywhere in the init.d file, that file should be considered compromised.

Then, using Unix commands only (not by editing files with a text editor!) recover the system by removing all traces of the intruder's activity. At the end of the exercise, your lab00 folder should meet the following requirements:

  1. The config folder should only contain one file, users.list, without the malicious intruder's username.
  2. The malicious programs in bin should be removed, leaving only the programs that were last modified before the intrusion.
  3. All compromised lines in the user home directories should be removed, leaving as much of the original files intact as possible.
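
For the last requirement, one possible approach is to search recursively for a program's name and rewrite each matching file without the offending lines. The sketch below demonstrates this on toy data: the directory layout, the file name startup, and the program name backdoor are all assumptions for illustration, so adapt the pattern and paths to what you actually find in the lab files.

```shell
# Toy data only: layout, file names, and program name are made up.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/user/alice/init.d" "$sandbox/user/bob/init.d"
printf 'mail\nbackdoor --quiet\ncalendar\n' > "$sandbox/user/alice/init.d/startup"
printf 'mail\n'                             > "$sandbox/user/bob/init.d/startup"

# List the files that mention the program (-r: recursive, -l: file names only).
compromised="$(grep -rl 'backdoor' "$sandbox/user")"
echo "compromised: $compromised"

# Rewrite each compromised file, keeping only the lines that do not match.
# grep -v exits with status 1 if it prints nothing, hence the "|| true".
printf '%s\n' "$compromised" | while IFS= read -r file; do
    grep -v 'backdoor' "$file" > "$file.clean" || true
    mv "$file.clean" "$file"
done

# Confirm that nothing under user/ mentions the program any more.
if grep -rq 'backdoor' "$sandbox/user"; then
    still_compromised=yes
else
    still_compromised=no
fi
cleaned="$(cat "$sandbox/user/alice/init.d/startup")"
echo "still compromised: $still_compromised"
```

Writing to a temporary file and renaming it avoids reading and truncating the same file in one command, which would destroy its contents.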