Elementary Git with GitHub

This is a quick and simple introduction to the essentials of the Git version control system, as well as the use of GitHub for hosting and sharing your source code.

Parts of this tutorial have been adapted from a course on reproducible research, by Karl Broman, of the Department of Biostatistics & Medical Informatics, at the University of Wisconsin - Madison, found here.

What/Why/How Version Control?

  • Save the entire history of changes to a project.

  • Go back to any point in time (and see what changed between any two times).

  • Some uses: software, data analysis projects, papers, talks, and web sites.

  • My code was working…now it’s not – figure out exactly when it became broken using Git tracking.

Git for Formal Version Control

  • Created by Linus Torvalds, who also created the Linux kernel.

  • Tracks any content, and works best with plain-text files: source code, data analyses, manuscripts, websites, presentations.

  • Git is fast, works without a central server, and merges most concurrent changes automatically, flagging only genuine conflicts for you to resolve by hand.

GitHub for Hosting/Sharing Code

  • Free hosting for public Git repositories, with an open-source community.

  • Provides an interface for exploring the commit history for public projects.

  • Accessible graphical interface for Git, lowering barriers for collaboration.

Setting up Your System for Git + GitHub

  • Install Git and create a GitHub account.

  • Set up a .gitconfig file in your home directory for use with Git and GitHub (this file defines many aspects of Git's behavior); for an easy way to do this (and much more), see my working example here.

  • Set your name, to match that of your GitHub account (quote it if it contains spaces), via:

  git config --global user.name "NAME"

  • Set your email, to match that of your GitHub account, via:

  git config --global user.email "EMAIL"
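
The two settings can be written and then read back to confirm they took effect. The sketch below is an illustration only: "Example User" and "user@example.com" are placeholder values, and the first lines point HOME at a scratch directory (an assumption made so that the demo leaves your real ~/.gitconfig untouched).

```shell
set -eu

# Sandbox: use a throwaway HOME so the real ~/.gitconfig is not modified.
# Drop these lines when configuring your actual account.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Placeholder identity; substitute the name and email of your GitHub account.
git config --global user.name "Example User"
git config --global user.email "user@example.com"

# Read the values back to confirm what Git will record in each commit.
git config --global --get user.name
git config --global --get user.email

# List everything currently stored in the global configuration file.
git config --global --list
```

Outside the sandbox, the same git config commands write to the .gitconfig file in your real home directory.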

Key Commands for using Git with GitHub

  • git init - initialize the current directory for tracking changes (creates .git subdirectory).

  • git remote - to view remote repos (-v); to add a remote (add NAME URL); and to change the URL of a remote (set-url NAME URL).

  • git status - see which files have changed since the last commit, staged or not (and, when an upstream branch is set, the number of commits not yet pushed to the remote).

  • git diff - see changes that have not yet been staged (working files vs. the staging area); add --staged to see staged changes, or give a commit ID to compare the working files against that commit. To see what a given commit itself changed, use git show COMMIT.

  • git log - see a history of commits that have been made in the current repository, including basic details.

  • git add - stage changes to ready them for committing (use . or the -A flag to stage all changed files).

  • git commit - commit staged changes to add them to the tracking history (git commit -m 'MESSAGE' supplies the commit message).

  • git push - send committed changes to GitHub repo (correct syntax is git push REMOTE BRANCH).

  • git fetch - download new commits from the GitHub repository into the local repository, without changing your working files.

  • git merge - integrate changes that have been fetched into the current branch.

  • git pull = git fetch+git merge (integrates collecting/merging changes); use in place of above two.

  • git clone - create DIR, initialize Git, add remote, and perform git pull (see the minimal example below).

  • git mv OLD NEW - conveniently rename/move a file while maintaining tracking by Git (analogous to UNIX mv).

  • git rm FILE - delete a given file and stage the deletion so that it is recorded in Git history (analogous to UNIX rm); use git rm --cached FILE to stop tracking a file while keeping it on disk.

  • git branch - list the branches of the project; pass NAME to create a branch with the given name.

  • git checkout COMMIT/BRANCH - switch to versions of files at given commit ID or switch branches.
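
Several of the commands above are easiest to tell apart by running them side by side. The following is a minimal sketch in a throwaway repository; the directory name demo, the file names, the branch name experiment, and the identity values are placeholders chosen for illustration.

```shell
set -eu
export GIT_PAGER=cat                 # print output directly instead of paging

scratch="$(mktemp -d)"
cd "$scratch"
git init -q demo
cd demo
git config user.name "Example User"          # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second line" >> notes.txt
git status --short                   # " M notes.txt": modified, not yet staged
git diff                             # unstaged edits: working files vs. staging area
git add notes.txt
git diff --staged                    # staged edits: staging area vs. last commit
git commit -q -m "Extend notes"

git show --stat HEAD                 # what the most recent commit changed
git log --oneline                    # one line per commit, newest first

git branch experiment                # create a branch...
git checkout -q experiment           # ...and switch to it

git mv notes.txt journal.txt         # rename while keeping the file tracked
git commit -q -m "Rename notes.txt to journal.txt"
git log --oneline --follow journal.txt   # history that follows the rename

echo "scratch" > tmp.txt
git add tmp.txt
git commit -q -m "Add tmp.txt by mistake"
git rm -q --cached tmp.txt           # stop tracking, but keep the file on disk
git commit -q -m "Stop tracking tmp.txt"
```

Note the difference between git diff (changes you have not committed yet) and git show (changes already recorded by a commit).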

The Cycle of Git: Work, Add, Commit, Push (and Pull)

  • As work on a project evolves, Git allows you to create a coherent history of changes that have been implemented; the goal is to track the logical and conceptual steps that led to your new feature/revision.

  • This workflow makes it easier for you (and collaborators) to understand what you have done when trying to figure it out months later.

  • To effectively use Git for tracking work on projects, it is best to stage changes (via git add) and make small commits consisting of related changes.

  • A few simple guidelines/steps to follow when using Git/GitHub:

    1. Make changes that improve some aspect of a given project controlled by Git.
    2. Stage changes to ready them for commits (via git add . or git add FILE).
    3. Commit the staged changes with a relevant message to add them to the version history.
    4. After work has been completed, push the committed changes to GitHub.
    5. When revisiting a project repo after some period of time, use git pull to synchronize changes.
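
The five steps can be rehearsed without a network connection. The sketch below makes some assumptions: a local bare repository (remote.git) stands in for the GitHub repo, a second clone (collab) plays the part of a collaborator, and the file name and identity values are placeholders.

```shell
set -eu
scratch="$(mktemp -d)"

# Stand-in for the GitHub repository (a bare repo has no working files).
git init -q --bare "$scratch/remote.git"

git init -q "$scratch/work"
cd "$scratch/work"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$scratch/remote.git"
branch="$(git symbolic-ref --short HEAD)"    # "master" or "main", depending on your Git setup

echo "draft analysis" > analysis.txt         # 1. work
git add analysis.txt                         # 2. stage
git commit -q -m "Add draft analysis"        # 3. commit
git push -q origin "$branch"                 # 4. push

# Meanwhile, a collaborator clones the repo, commits, and pushes a change.
git clone -q "$scratch/remote.git" "$scratch/collab"
cd "$scratch/collab"
git config user.name "Example Collaborator"
git config user.email "collab@example.com"
echo "reviewed" >> analysis.txt
git commit -q -am "Record review"
git push -q origin "$branch"

# 5. Back in the original directory, pull to synchronize before new work.
cd "$scratch/work"
git pull -q origin "$branch"
git log --oneline
```

With a real GitHub repository, only the remote URL changes; the add, commit, push, and pull commands stay the same.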

A Minimal Example: Local Set Up of a GitHub Repo

  1. On the GitHub website, use the “+” icon in the top right corner to create a new repository.

  2. Name the repository “testing”, do not initialize with a README, and click “create repository.”

  3. From the comfort of the command line, use the following to set up a local directory for the repository just created above:

   mkdir testing
   cd testing
   git init

  4. Within the local directory for the repo, add the remote URL, in the following manner (substituting USER as appropriate):

   git remote add origin https://github.com/USER/testing.git

     Alternatively, git clone URL testing takes care of steps 3 and 4 in a single command.

  5. Add a “readme” file with minimal text just for testing purposes:

   echo "# testing new repo" >> README.md

  6. Stage changes made to the new README.md, using git add . or git add -A or git add README.md.

  7. Commit the changes to README.md that have been staged, using git commit -m "initial commit".

  8. Finally, push the committed changes to GitHub, using git push origin master (or git push origin main, if your default branch is named main).

  9. Examine the updated repository on GitHub to easily view the changes made.
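
For reference, here is a sketch of the git clone route through the same example, with a few commands added to check the result from the command line. It assumes a local bare repository (testing.git) as a stand-in for https://github.com/USER/testing.git so that it runs offline; the identity values are placeholders.

```shell
set -eu
scratch="$(mktemp -d)"

# Local stand-in for the empty repository created on the GitHub website.
git init -q --bare "$scratch/testing.git"

cd "$scratch"
# One command in place of mkdir + cd + git init + git remote add.
# Git warns that the cloned repository is empty; that is expected here.
git clone -q "$scratch/testing.git" testing
cd testing
git config user.name "Example User"
git config user.email "user@example.com"

git remote -v                                # origin was added by the clone

echo "# testing new repo" >> README.md
git add README.md
git commit -q -m "initial commit"

branch="$(git symbolic-ref --short HEAD)"    # "master" or "main", depending on your Git setup
git push -q -u origin "$branch"              # -u remembers origin as the upstream branch

git status -sb                               # branch line shows the tracked remote branch
git ls-remote --heads origin                 # branches that now exist on the remote
```

Reading the branch name from the repository avoids hard-coding master or main, which differ between Git installations.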

(My) Best Practices for using Git with GitHub

  • Perform a git pull prior to making any changes, especially if the repo is shared with collaborators.

  • Keep commits small and frequent, each themed around one specific change and described by a clear message.

  • Commit source files (not derived files) - e.g., the code (.R) rather than the images (.png) it produces.

  • Add a .gitignore file (per repository) with lists of files or types of files that Git should ignore.

  • Use a .gitignore_global file in the home directory for types of files that should always be ignored (register it once with git config --global core.excludesFile ~/.gitignore_global).

  • Use origin as the name of the remote of a Git repository (or of a fork) that you control exclusively.

  • Use upstream as the name of the remote of a Git repository from which you have created a fork.
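
Two of these practices, ignoring derived files and naming remotes, can be sketched in a scratch repository. The file names and ignore patterns are illustrative, and OWNER is a hypothetical placeholder for the account you forked from (USER being your own account); no network access is needed, since git remote add only records the URLs.

```shell
set -eu
scratch="$(mktemp -d)"
git init -q "$scratch/project"
cd "$scratch/project"

# Per-repository ignore rules: derived files stay out of version control.
printf '%s\n' '*.png' '*.pdf' '.Rhistory' > .gitignore

echo 'plot(1:10)' > figure.R         # source file: should be committed
: > figure.png                       # empty placeholder for a derived image

git status --short                   # lists .gitignore and figure.R, but not figure.png
git check-ignore -v figure.png       # shows which .gitignore rule matched

# Remote naming for a fork: origin is your copy, upstream is the original.
git remote add origin https://github.com/USER/project.git
git remote add upstream https://github.com/OWNER/project.git
git remote -v
```

With both remotes in place, a typical pattern is to pull from upstream to stay current and push to origin, which you control.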

Further reading/resources