Up and running with Git
--
The first time I came across Git was when I discovered the Arch User Repository (AUR). I didn’t know much about it at the time, but considering I only needed to install community-maintained packages, I quickly found my way around it.
To install Spotify, for example, I’d type:
$ cd /usr/local/aur # to get into the parent folder where all my community packages are
$ git clone https://aur.archlinux.org/spotify.git # to download the files
$ cd spotify # to get into the newly created folder
$ makepkg -si # to build and install the package
And when a more recent version of Spotify would be released, I’d update with:
$ cd /usr/local/aur/spotify # to get into the tracked folder
$ git pull # to download updated files
$ makepkg -si # to build and install the package
Pretty straightforward, isn’t it? But then I started writing code, and version control got a lot more…
The basics
First, let’s make sure we understand the difference between:
- Git, the version control system that tracks changes in files. You need to install it on your system. Alternatives to Git include Mercurial and Apache Subversion.
- GitHub, the hosting platform. As it is cloud-based, you need to create an account. Alternatives to GitHub include GitLab and Bitbucket.
Let’s do this then! Let’s install Git and create an account on GitHub.
Configure Git
Once installed, run these commands to finalise the initial setup:
git config --global user.name "YourUsername"
git config --global user.email YourEmailAddress
git config --global init.defaultBranch main
The init.defaultBranch main line makes new repositories start on a branch called main instead of master, avoiding an unnecessary reference to slavery (this setting needs Git 2.28 or later).
The username and email address are recorded in every commit, so they will be public once you push your work to GitHub. I’d still recommend using the email address you used to create your GitHub account, as GitHub uses it to attribute your contributions and populate the contribution graph on your profile page.
(There are options to hide/anonymise your email address in the GitHub settings if you value your privacy)
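If you want to check what these commands actually store, the sketch below runs them against a throwaway home directory, so your real ~/.gitconfig is left alone, and reads the values back. The gitdemo helper, the temporary directory and the you@example.com address are assumptions for the demo only; on your own machine you would just run the three commands above.

```shell
set -e
# Demo only: a throwaway home directory, so your real ~/.gitconfig is untouched.
# XDG_CONFIG_HOME is blanked so Git can't write to ~/.config/git/config instead.
demo_home="$(mktemp -d)"
gitdemo() { HOME="$demo_home" XDG_CONFIG_HOME="" git "$@"; }

gitdemo config --global user.name "YourUsername"
gitdemo config --global user.email you@example.com
gitdemo config --global init.defaultBranch main

# Read a single value back...
gitdemo config --global --get init.defaultBranch
# ...or list everything stored in the global config file
gitdemo config --global --list
```

The same --get and --list options work without the helper, which is a quick way to confirm your real setup.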
Creating a project with Git
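Here is the typical workflow when you start a project (after the first step, all the commands need to be run from your [project_name] directory):
Initialise the repository with git init [project_name], which creates the [project_name] directory, then move into it with cd [project_name].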
Create a .gitignore file listing all the files that Git should not track. This is especially useful for Jupyter Notebook checkpoints, or if you want to keep some files private, like a to-do list for your project. The easiest way to create one is with the gitignore.io command line alias.
Add the files you want to track with git add [file]. [file] can be replaced with . if all the files in the folder need to be tracked (by all I mean “all but the ones listed in the .gitignore”).
Now that the files are staged, it is time to commit them with git commit -m "Changes made/Comments". Committing captures a snapshot of the files. If you were not using Git, it would be similar to copying the files into a separate (local) directory named YYMMDD-project_name. If something goes wrong, you can always reopen that directory and start coding from there.
There you go, 4 steps later and you have a project with version control!
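Here are the four steps as one script you can paste into a terminal. It is a sketch under a few assumptions: it works in a temporary directory, the file names (analysis.py, todo.txt) and the "Demo User" identity are hypothetical, the identity is set per repository so it doesn't touch your global config, and git init -b needs Git 2.28 or later.

```shell
set -e
cd "$(mktemp -d)"                        # scratch area for the demo

# 1. Create the repository (and its directory), then move into it
git init -q -b main my_project
cd my_project
git config user.name "Demo User"         # demo-only identity, this repo only
git config user.email demo@example.com

# Some files: code, a Jupyter checkpoint folder and a private to-do list
echo 'print("hello")' > analysis.py
mkdir .ipynb_checkpoints
echo '{}' > .ipynb_checkpoints/analysis-checkpoint.ipynb
echo 'finish the README' > todo.txt

# 2. List what Git should ignore
printf '%s\n' '.ipynb_checkpoints/' 'todo.txt' > .gitignore

# 3. Stage everything except the ignored files
git add .
git status --short                       # only .gitignore and analysis.py are added

# 4. Take the snapshot
git commit -q -m "First commit"
git log --oneline
```

git check-ignore -v [file] is handy afterwards if you're unsure why a file isn't being picked up.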
Now, if you want to share your wisdom, or collaborate with other people, the easiest way is to link this local repository of yours with the cloud version, so let’s head to GitHub!
Sharing a project on GitHub
Once you have committed the files to the project, you need to let Git know that this project will have a cloud version with git remote add origin [url_of_project].
To get the [url_of_project], you need to leave the terminal and log in to GitHub. Click on the + sign in the top right corner, then New repository. Fill in the form and give your repository a public name. There is no need to initialise it with a README.md or a .gitignore, as the project already has files (and an empty GitHub repository keeps your first push simple).
Push the files to GitHub with git push -u origin main. Pushing uploads your local commits to the cloud, and -u remembers origin main as the default destination, so later you can simply type git push. To recycle the backup directory analogy, after copying the files into the YYMMDD-project_name directory, pushing would be like uploading that directory to Google Drive, Amazon Drive or OneDrive.
That’s it, your many project versions are safe and sound!
Past this initial setup, the workflow is going to be:
- git add . to stage all the changes.
- git commit -m "comment" to commit the changes to your local repository.
- git push to push the commits from the local repository to the cloud.
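To try the sharing steps without a GitHub account, the sketch below uses a bare repository on disk as a stand-in for GitHub; git remote add accepts a local path just like a URL. The directory names and the "Demo User" identity are hypothetical, and git init -b needs Git 2.28 or later. It runs the first push, then one round of the everyday add/commit/push cycle.

```shell
set -e
cd "$(mktemp -d)"
# Stand-in for GitHub: a bare repository (no working files) on disk
git init -q --bare github_stand_in.git
remote_url="$PWD/github_stand_in.git"

git init -q -b main my_project
cd my_project
git config user.name "Demo User"
git config user.email demo@example.com
echo "# My project" > README.md
git add .
git commit -q -m "First commit"

# Link the local repository to its "cloud" copy and make the first push;
# -u records origin/main as the upstream, so a plain `git push` works later
git remote add origin "$remote_url"
git push -q -u origin main

# The everyday cycle: edit, stage, commit, push
echo "Some notes" >> README.md
git add .
git commit -q -m "Add notes to README"
git push -q

git log --oneline origin/main            # both commits are now on the "cloud"
```

With GitHub, the only difference is that "$remote_url" would be the URL copied from the new repository page.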
Using version control
There is a lot that can be said here, as Git is incredibly powerful and designed to cope with many scenarios, but I’ll try to cover a few common concepts. Before that, let’s have a look at Git’s structure:
[Figure: Git’s structure]
Branching
Git is designed for collaborative software development: multiple people working on the same project at the same time. To allow the whole team to work without breaking the codebase, developers “branch” off the codebase. They code on that branch and, once the feature is working, merge it back into the main codebase.
- Create a branch: git branch [branch_name].
- List local and remote branches: git branch -av (git branch alone lists local branches).
- Start working on a branch: git checkout [branch_name] (git checkout main, or master, returns to the main branch).
- Delete a branch: git branch -d [branch_name].
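The branch commands fit together like this in a typical feature cycle. This is a sketch in a temporary repository; the branch name new-feature, the file names and the identity are hypothetical, and git init -b needs Git 2.28 or later. The merge step matters: git branch -d refuses to delete a branch whose work hasn’t been merged (-D forces it).

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main my_project
cd my_project
git config user.name "Demo User"
git config user.email demo@example.com
echo "v1" > app.txt
git add .
git commit -q -m "Initial version"

git branch new-feature          # create the branch
git checkout -q new-feature     # start working on it
echo "feature" > feature.txt
git add .
git commit -q -m "Add feature"

git checkout -q main            # back on main: feature.txt isn't here yet
git branch -av                  # list local (and remote, if any) branches

git merge -q new-feature        # bring the finished feature into main
git branch -d new-feature       # safe delete: the work is merged now
```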
Synchronizing
To understand the difference between all these concepts (and more), have a look at this brilliant blog post from Lydia Hallie.
In a nutshell:
- Fetch: git fetch downloads the latest changes without merging them. Your local branch and working files aren’t updated, but the new commits are now available locally (for example as origin/main). To apply the changes, you’ll then need to merge.
- Merge: git merge integrates changes from another branch (such as origin/main after a fetch) into your current branch and its files.
- Pull: git pull updates your local repository and files from the cloud. It is essentially git fetch followed by git merge FETCH_HEAD, which is why pulling is usually preferred to running fetch and merge separately.
- Rebase: git pull --rebase is another way to update the local files. Instead of creating a merge commit that joins the two histories, it sets your local commits aside, applies the changes from the cloud, then replays your commits on top of them, giving a linear history.
- Push: git push, as seen above, uploads your local commits to the cloud.
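The difference between these commands is easiest to see with two copies of the same project. In this sketch, a local bare repository stands in for GitHub and two hypothetical clones, laptop and desktop, play the part of two computers; names, file contents and identity are made up, and git init -b needs Git 2.28 or later.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare remote.git            # stand-in for GitHub
remote_url="$PWD/remote.git"

git init -q -b main laptop
cd laptop
git config user.name "Demo User"
git config user.email demo@example.com
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"
git remote add origin "$remote_url"
git push -q -u origin main

# A second copy of the project pushes a change
git clone -q -b main "$remote_url" ../desktop
cd ../desktop
git config user.name "Demo User"
git config user.email demo@example.com
echo "v2" > notes.txt
git commit -q -am "v2"
git push -q

# Back on the laptop: fetch downloads v2 but leaves the files alone
cd ../laptop
git fetch -q
echo "after fetch: $(cat notes.txt)"     # still v1
git merge -q origin/main
echo "after merge: $(cat notes.txt)"     # now v2

# The desktop pushes v3 while the laptop commits something else
(cd ../desktop && echo "v3" > notes.txt && git commit -q -am "v3" && git push -q)
echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Add to-do list"

# Fetch v3, then replay the laptop's commit on top of it (no merge commit)
git pull -q --rebase
git log --oneline
```

Running git pull without --rebase at the last step would instead create a merge commit joining the two histories.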
Monitoring
Sometimes synchronizing can be a real nightmare, so here are a few useful commands designed to figure that mess out.
- Status: git status tells you which files have changed, which changes are staged but not committed yet, and whether your branch is ahead of or behind its cloud counterpart (as of the last fetch).
- Log: git log displays the complete commit history.
- Diff: git diff shows changes to files that are not staged yet; git diff --cached shows the changes that are staged; git diff [commit1] [commit2] shows the changes between 2 commits.
- Show: git show [commit] shows the changes introduced by a commit, and git show [commit]:[file] shows a file as it was in that commit.
- Blame: git blame [file] shows, line by line, who last changed each line of a file and in which commit.
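To see what each of these commands reports, the sketch below builds a tiny history in a temporary repository and runs them one after the other. The file notes.txt and the identity are hypothetical, and git init -b needs Git 2.28 or later.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com
printf 'line one\n' > notes.txt
git add notes.txt
git commit -q -m "First version"
printf 'line two\n' >> notes.txt
git add notes.txt
git commit -q -m "Second version"

printf 'line three\n' >> notes.txt   # a change that isn't staged yet
git status --short                   # " M notes.txt": modified, not staged
git diff                             # shows the unstaged "line three"

git add notes.txt
git status --short                   # "M  notes.txt": staged
git diff --cached                    # the same change, now staged
git diff HEAD~1 HEAD                 # what the last commit changed
git log --oneline                    # compact history, newest first
git show HEAD~1                      # message and changes of the first commit
git show HEAD~1:notes.txt            # notes.txt as it was in the first commit
git blame notes.txt                  # last commit and author for each line
```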
Correcting mistakes
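- git reset [commit] takes you back to the desired [commit] and keeps your local changes in the working directory (they are simply no longer committed or staged).
- git reset --hard [commit] resets to the desired [commit] and deletes all the changes that happened after that commit, including uncommitted ones.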
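The difference between the two resets shows up directly in the file contents. This sketch uses a temporary repository with a hypothetical file.txt and identity (git init -b needs Git 2.28 or later); it makes a "questionable" commit, undoes it with the default reset, then with --hard.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email demo@example.com
echo "good" > file.txt
git add .
git commit -q -m "Good state"
good_commit=$(git rev-parse HEAD)
echo "questionable" >> file.txt
git commit -q -am "Questionable change"

# Default reset: back to the good commit, but the questionable line
# is still in file.txt, just no longer committed
git reset -q "$good_commit"
git status --short                   # " M file.txt"

# --hard reset: back to the good commit and the questionable line is gone
git reset -q --hard "$good_commit"
cat file.txt                         # good
```

Even after --hard, the discarded commit usually stays reachable through git reflog for a while, which is part of Git's safety net.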
Working with big files
GitHub blocks files larger than 100 MB, so you need to install Git Large File Storage (Git LFS) if you are working with big files.
- Install Git LFS.
- Run git lfs install.
- From the local repository ([project_name]), run git lfs track [big_file], then git add .gitattributes (to start tracking .gitattributes), git add [big_file], git commit -m "start tracking big_file" and git push origin main.
- After this, you can use Git as usual; Git LFS will deal with the details.
If you clone a repository with LFS files, you may need to run git lfs pull (for example, if Git LFS wasn’t installed when you cloned) before you can use the files: until then, each big file is replaced by a small text file (a pointer) listing the information Git LFS needs to retrieve it.
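To picture what such a pointer contains, the sketch below builds one by hand for a local file, following the Git LFS pointer format: a version line, the file’s SHA-256 hash (oid) and its size in bytes. It is only an illustration, since git lfs writes these files itself; the 1 MiB big_file.bin is a hypothetical stand-in for a really big file, and the hash falls back from sha256sum (GNU coreutils) to shasum (macOS).

```shell
set -e
cd "$(mktemp -d)"
# Hypothetical "big" file, only 1 MiB to keep the demo quick
head -c 1048576 /dev/zero > big_file.bin

# SHA-256 of the content: sha256sum on Linux, shasum on macOS
if command -v sha256sum >/dev/null 2>&1; then
  oid=$(sha256sum big_file.bin | cut -d' ' -f1)
else
  oid=$(shasum -a 256 big_file.bin | cut -d' ' -f1)
fi
size=$(wc -c < big_file.bin | tr -d ' ')

# The three lines that stand in for the file until Git LFS downloads it
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size" > pointer.txt
cat pointer.txt
```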
Conclusion
These few commands should be more than enough to get you started with Git/GitHub, at least for your personal projects.
Collaborating and dealing with merge conflicts is a lot more complex, but the good thing about Git is that it is a tool designed to create safety nets. You can (almost) always revert to a previous commit, so you can confidently google your way out of merge troubles!