Octocat says Hi!

The first time I came across Git was when I discovered the Arch User Repository (AUR). I didn’t know much about it at the time, but since I only needed to install community-maintained packages, I quickly got the hang of it.

To install Spotify for example I’d type:

And when a more recent version of Spotify would be released, I’d update with:

Pretty straightforward, isn’t it? But then I started writing code, and version control got a lot more…

The basics

First, let’s make sure we understand the difference between:

  • Git, the version control system that tracks changes in files. You need to install it on your system. Alternatives to Git include Mercurial and Apache Subversion.
  • GitHub, the hosting platform. You need to create an account, as it is cloud-based. Alternatives to GitHub include GitLab and Bitbucket.

Let’s do this then! Let’s install Git and create an account on GitHub.

Configure Git

Once installed, run these commands to finalise the initial setup:
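The commands themselves did not survive in this copy of the post, so here is a minimal sketch of what such a setup usually looks like. The name and email are placeholders, and the HOME override is my own addition so the example can be tried without touching your real configuration.

```shell
# Sandbox: point HOME at a throwaway directory so this demo does not
# overwrite your real ~/.gitconfig. Remove these two lines to apply for real.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Identity recorded in every commit (placeholder values).
git config --global user.name "Octo Cat"
git config --global user.email "octocat@example.com"

# Name the first branch of new repositories "main" (needs Git 2.28 or newer).
git config --global init.defaultBranch main

# Check what was stored.
git config --global --list
```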

The init.defaultBranch main line makes new repositories start on a branch named main instead of master, which avoids an unnecessary reference to slavery.

The username and email are going to be public (if you use GitHub services), but I’d still recommend using the email address you created your GitHub account with, as it is used to attribute your contributions and populate the contribution graph on your profile page:

Busy bee!

(There are options to hide/anonymise your email address in the GitHub settings if you value your privacy)

Creating a project with Git

Here is the typical workflow when you start a project (all the commands need to be run from your [project_name] directory):

Initialise the repository with git init. (Run from the parent directory instead, git init [project_name] creates the directory for you.)

Create a .gitignore file. It lists all the files that won’t be tracked. This is especially useful with Jupyter Notebook checkpoints or if you want to keep some files private, like a to-do list for your project. The easiest way to do this is using the gitignore.io command line alias.

Add the files you want to track with git add [file]. [file] can be replaced with . if all the files in the folder need to be tracked. (By all I mean “all but the ones listed in the .gitignore”).

Now that the files are tracked, it is time to commit them with git commit -m "Changes made/Comments". Committing captures a snapshot of the files. If you were not using Git, it would be similar to saving the files in a different (local) directory named YYMMDD-project_name. If something goes wrong, you can always reopen that directory and start coding from there.
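The four steps above can be tried in a throwaway directory. This is only a sketch: the project name demo, the file names and the identity are placeholders I made up, and the .gitignore is written by hand rather than with the gitignore.io alias.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"

# Step 1: create the project and start tracking it ("demo" is a placeholder).
git init -q -b main demo                     # -b needs Git 2.28 or newer
cd demo
git config user.name "Octo Cat"              # repo-local identity, demo only
git config user.email "octocat@example.com"

# Some files: one to track, one to keep private.
echo "print('hello')" > app.py
echo "- refactor everything" > todo.txt

# Step 2: list what Git should ignore.
printf '%s\n' 'todo.txt' '.ipynb_checkpoints/' > .gitignore

# Step 3: stage everything that is not ignored.
git add .

# Step 4: take the snapshot.
git commit -q -m "Initial commit"

# The commit contains .gitignore and app.py, but not todo.txt.
git ls-files
```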

There you go, 4 steps later and you have a project with version control!

Now, if you want to share your wisdom, or collaborate with other people, the easiest way is to link this local repository of yours with the cloud version, so let’s head to GitHub!

Sharing a project on GitHub

Once you have committed the files to the project, you need to let Git know that this project will have a cloud version with git remote add origin [url_of_project].

To get the [url_of_project], you need to leave the terminal and log in to GitHub. Click on the + sign in the top right corner, then New repository. Fill in the form and give your repository a public name. There is no need to initialise it with a README.md or a .gitignore (as the project already has files).

Push the files to GitHub with git push -u origin main. Pushing uploads your commits to the cloud. To recycle the backup directory analogy, after transferring the files to the YYMMDD-project_name directory, pushing would be like uploading the directory to Google/Amazon/One Drive.

That’s it, your many project versions are safe and sound!

Past this initial setup, the workflow is going to be:

  • git add . to stage all the changes.
  • git commit -m "comment" to commit the changes to your local repository.
  • git push to push the commits from the local repository to the cloud.
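To try the whole sequence without a GitHub account, a local bare repository can play the role of the cloud. That substitution is an assumption of this sketch: cloud.git, project and the identity are made-up names, and with a real repository the remote URL would be the one GitHub shows you after creating it.

```shell
cd "$(mktemp -d)"

# Stand-in for the empty repository created on GitHub.
git init -q --bare -b main cloud.git         # -b needs Git 2.28 or newer
remote="$PWD/cloud.git"

# A local project with one commit, as built in the previous section.
git init -q -b main project
cd project
git config user.name "Octo Cat"
git config user.email "octocat@example.com"
echo "# My project" > README.md
git add .
git commit -q -m "Initial commit"

# One-off setup: link the local repository to its remote, then upload.
git remote add origin "$remote"
git push -q -u origin main     # -u remembers origin/main as the upstream

# Day-to-day loop: stage, commit, push (the short form is now enough).
echo "More notes" >> README.md
git add .
git commit -q -m "Expand README"
git push -q

git remote -v
```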

Using version control

There is a lot that could be said here, as Git is incredibly powerful and designed to cope with many scenarios, so I’ll stick to a few common concepts. Before that, let’s have a look at Git’s structure:

Git Structure, Credits to /u/stamminator/ on Reddit.

Git is designed for collaborative software development: multiple people working on the same project at the same time. To allow the whole team to work without breaking the codebase, developers “branch” the codebase. They code on that branch and, once the feature is working, the branch is merged back into the codebase.

  • Create a branch: git branch [branch_name].
  • List branches: git branch for local branches only, git branch -av for local and remote branches.
  • Start working on a branch: git checkout [branch_name]. Use git checkout main (or master in older repositories) to return to the main branch.
  • Delete a branch: git branch -d [branch_name].
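Here is a small sketch of a branch's life cycle in a throwaway repository. The branch name feature, the file and the identity are placeholders, and I added a git merge step (not in the list above) to show the feature landing on main.

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo    # -b needs Git 2.28 or newer
git config user.name "Octo Cat"
git config user.email "octocat@example.com"
echo "v1" > app.txt
git add . && git commit -q -m "Initial commit"

# Create a branch and start working on it.
git branch feature
git checkout -q feature
echo "new feature" >> app.txt
git commit -q -am "Add feature"

# Back on main, the file is untouched until the branch is merged.
git checkout -q main
cat app.txt                  # still only "v1"
git merge -q feature         # fast-forward: main now includes the feature

# The branch is merged, so -d agrees to delete it.
git branch -d feature
git branch                   # lists the remaining local branches
```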

To understand the difference between the concepts below (and more), have a look at this brilliant blog post from Lydia Hallie.

In a nutshell:

  • Fetch
    git fetch downloads the latest changes from the remote without merging them. Your local branches and files are left untouched, but the changes are available locally. To apply them, you then need to merge.
  • Merge
    git merge [branch] integrates the changes from another branch (for example the ones you just fetched) into the current branch.
  • Pull
    git pull is used to update your local repository with the changes from the cloud. It is essentially git fetch followed by git merge FETCH_HEAD. Pulling is usually more convenient than the fetch-then-merge approach.
  • Rebase
    git pull --rebase is another way to update the local files. Instead of creating a merge commit, it replays your local commits on top of the fetched ones, which keeps the history linear.
  • Push
    git push, as seen above, uploads your local commits to the cloud.
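The difference between these commands is easier to see with two people working on the same remote. The sketch below simulates that locally, which is an assumption on my part: cloud.git is a bare repository standing in for GitHub, and alice and bob are two made-up clones of it.

```shell
cd "$(mktemp -d)"
root="$PWD"

# Local stand-in for GitHub, seeded by a first repository called "alice".
git init -q --bare -b main cloud.git        # -b needs Git 2.28 or newer
git init -q -b main alice
cd alice
git config user.name "Alice"; git config user.email "alice@example.com"
echo "line 1" > notes.txt
git add . && git commit -q -m "Alice: first commit"
git remote add origin "$root/cloud.git"
git push -q -u origin main

# A second clone, "bob", starts from the same history.
git clone -q "$root/cloud.git" "$root/bob"
cd "$root/bob"
git config user.name "Bob"; git config user.email "bob@example.com"

# Alice publishes a new commit.
cd "$root/alice"
echo "line 2" >> notes.txt
git commit -q -am "Alice: second commit"
git push -q

# Bob fetches: origin/main moves, his own main and files do not.
cd "$root/bob"
git fetch -q
git log --oneline main..origin/main     # the commit waiting to be merged
git merge -q origin/main                # now notes.txt has two lines

# Bob commits locally while Alice pushes again...
echo "bob was here" > bob.txt
git add . && git commit -q -m "Bob: add bob.txt"
cd "$root/alice"
echo "line 3" >> notes.txt
git commit -q -am "Alice: third commit"
git push -q

# ...so Bob replays his commit on top of hers instead of merging.
cd "$root/bob"
git pull -q --rebase
git push -q
git log --oneline                       # linear history, Bob's commit on top
```

Replacing git pull --rebase with a plain git pull in the last step would typically produce a merge commit instead, since both sides have new commits.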

Sometimes synchronizing can be a real nightmare, so here are a few useful commands designed to figure that mess out.

  • Status
    git status tells you which files have been modified, staged or left untracked, and whether your branch is ahead of or behind its remote.
  • Log
    git log displays the commit history of the current branch.
  • Diff
    git diff shows changes to files not staged yet.
    git diff --cached shows the changes to staged files.
    git diff [commit1] [commit2] shows the changes between two commits.
  • Show
    git show [commit] shows the message and file changes of a commit; git show [commit]:[file] prints a file as it was at that commit.
  • Blame
    git blame [file] shows who last changed each line of a file.
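A quick way to get a feel for these commands is to run them on a repository with one staged and one unstaged change. The sketch below sets that up in a throwaway directory; the file name and identity are placeholders.

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo    # -b needs Git 2.28 or newer
git config user.name "Octo Cat"
git config user.email "octocat@example.com"
echo "first" > notes.txt
git add . && git commit -q -m "Add notes"
echo "second" >> notes.txt
git commit -q -am "Extend notes"

# One change staged, one not.
echo "staged" >> notes.txt
git add notes.txt
echo "unstaged" >> notes.txt

git status --short            # "MM notes.txt": staged and unstaged edits
git diff                      # only the "unstaged" line
git diff --cached             # only the "staged" line
git diff HEAD~1 HEAD          # what the second commit changed
git log --oneline             # one line per commit, newest first
git show --stat HEAD          # message and file summary of the last commit
git blame notes.txt           # who last touched each line
```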

  • Reset
    git reset [commit] moves the current branch back to the desired [commit] and keeps the later changes in your files, unstaged.
    git reset --hard [commit] moves back to the desired [commit] and discards all the changes made after it, including uncommitted work.
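Since --hard throws work away, it is worth trying both forms somewhere harmless first. A minimal sketch with three placeholder commits:

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo    # -b needs Git 2.28 or newer
git config user.name "Octo Cat"
git config user.email "octocat@example.com"
for n in 1 2 3; do
  echo "version $n" > file.txt
  git add . && git commit -q -m "Commit $n"
done

# Default (mixed) reset: the branch moves back one commit,
# but file.txt keeps its latest content as an unstaged change.
git reset -q HEAD~1
cat file.txt                  # version 3
git status --short            # " M file.txt"

# Hard reset: branch, index and working files all go back.
git reset -q --hard HEAD~1
cat file.txt                  # version 1
git log --oneline             # only "Commit 1" remains
```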

Working with big files

GitHub blocks files larger than 100 MB, so you need to install Git Large File Storage (LFS) if you are working with big files.

  • Install Git LFS
  • Run git lfs install.
  • From the local repository ([project_name]), run: git lfs track [big_file] then git add .gitattributes (to start tracking .gitattributes).
  • git add [big_file].
  • git commit -m "start tracking big_file".
  • git push origin main.
  • After this, you can use Git as usual; Git LFS will deal with the details.
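The local part of these steps can be tried without a GitHub account. This sketch assumes the git-lfs extension is installed (it does nothing otherwise), uses a 2 KB placeholder file instead of a genuinely big one, and skips the push because that needs a server with LFS support.

```shell
if git lfs version >/dev/null 2>&1; then
  cd "$(mktemp -d)"
  git init -q -b main demo && cd demo    # -b needs Git 2.28 or newer
  git config user.name "Octo Cat"
  git config user.email "octocat@example.com"
  git lfs install --local            # hooks and filters for this repo only

  # Track every .bin file; the rule is written to .gitattributes.
  git lfs track "*.bin"
  cat .gitattributes

  # A small file stands in for the big one.
  head -c 2048 /dev/zero > big_file.bin
  git add .gitattributes big_file.bin
  git commit -q -m "start tracking big_file"

  git lfs ls-files                   # files handled by LFS
  git cat-file -p HEAD:big_file.bin  # what Git itself stores: a text pointer
else
  echo "git-lfs not found, skipping demo"
fi
```

The last command prints the small pointer (version, oid and size) that Git stores in place of the file's content.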

If you clone a repository with LFS files, you may need to run git lfs pull before you can use them. Until then, Git LFS replaces each file with a small text file listing the information LFS needs to retrieve it:

Before and after a git lfs pull.

Conclusion

These few commands should be more than enough to get you started with Git/GitHub, at least for your personal projects.

Collaborating and dealing with merge conflicts is a lot more complex, but the good thing about Git is that it is designed to create safety nets. You can (almost) always revert to a previous commit, so you can confidently google your way out of merge troubles!

