Understanding Git: A Deep Dive into Essential Concepts
Git is one of the most powerful tools in a developer’s arsenal, but its flexibility and depth can sometimes make it challenging to master. This post explores some of the fundamental Git concepts that every developer should understand to work effectively with version control.
Local vs. Remote: Understanding the Relationship
When working with Git, one of the first things to grasp is the relationship between your local repository and the remote repository (typically hosted on platforms like GitHub, GitLab, or Bitbucket).
The message Your branch is ahead of 'origin/master' by 2 commits
is a common sight for Git users. This simply means your local repository contains two commits that haven’t been pushed to the remote repository yet. These commits exist only on your machine until you synchronize them with the remote using git push origin master
.
Handling Divergent Histories
This relationship becomes more complex when changes occur in both locations. Consider this scenario:
Developer: What happens when one also cleans up code that was part of a pull request on GitHub and not locally?
When you edit files on GitHub directly but also have local changes, your histories diverge. This creates a situation where both repositories have different changes to the same codebase. When you try to synchronize them, Git will inform you of this divergence:
$ git pull origin symbolic-tensor-constants
From https://github.com/c5huracan/tinygrad
* branch symbolic-tensor-constants -> FETCH_HEAD
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint: git config pull.rebase false # merge
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
In this case, you need to decide how to reconcile these divergent histories. For branches that are part of pull requests (which are considered public), merging is typically the safer approach:
git pull --no-rebase origin symbolic-tensor-constants
Fetch vs. Pull: Different Ways to Get Remote Changes
When you need to get changes from the remote repository, Git offers two primary commands:
Developer: What is the difference between a fetch and a pull?
Git Fetch
git fetch
downloads changes from the remote repository but doesn’t integrate them into your working files. It’s like checking what’s new without actually applying those changes.
git fetch origin
This command:
- Updates your remote-tracking branches (like origin/master)
- Doesn’t modify your working directory
- Allows you to review changes before integrating them
Git Pull
git pull
is more direct - it downloads changes and immediately integrates them into your current branch.
git pull origin master
Essentially, git pull
is a combination of git fetch
followed by either git merge
or git rebase
, depending on your configuration.
A safer workflow is often to fetch first, review the changes (using git diff origin/master
), and then decide how to integrate them.
Navigating Git’s Interactive Features
Developer: What is the : prompt when I run the git diff trying to tell us?
When running commands like git diff
, you might encounter a colon (:
) prompt. This indicates that Git is using a pager (typically “less” on Unix-like systems) to display the output.
This interactive mode allows you to:
- Navigate through large outputs with Space (page down) or b (page up)
- Search for specific text with
/
- Quit and return to the command line with
q
If you prefer to see all output at once, you can use the --no-pager
option:
git --no-pager diff
Merge vs. Rebase: Two Paths to Integration
Developer: What is the difference between merge and rebase?
When integrating changes from one branch into another, Git offers two distinct approaches:
Git Merge
Merging creates a new “merge commit” that combines changes from both branches, preserving the complete history.
A---B---C (master)
\
D---E (feature)
After git merge feature
while on master:
A---B---C---F (master)
\ /
D---E (feature)
Where F is a new merge commit that combines the changes.
Merging is non-destructive and clearly shows when and where branches were integrated, but it can create a more complex, non-linear history.
Git Rebase
Rebasing takes your commits from one branch and replays them on top of another branch, creating a linear history.
A---B---C (master)
\
D---E (feature)
After git rebase master
while on feature:
A---B---C (master)
\
D'---E' (feature)
Rebasing creates a cleaner, linear history but rewrites commit history in the process.
Developer: So if a branch is part of a pull request, am I correct to say that it is public and therefore I need to merge it?
Yes! The golden rule is never to rebase branches that others might be working on. Once a branch is part of a pull request:
- It should be considered public
- Other developers may have already reviewed it
- Comments might be tied to specific commits
- Others might have based their work on it
In these cases, using merge preserves the history that reviewers have already seen and commented on in the PR.
Handling Changes Made on GitHub
Developer: What if I made updates to the file already committed on GitHub?
If you’ve edited files directly on GitHub’s web interface, you have several options:
- Create a new commit (safest approach):
git add <updated-files> git commit -m "Update to address review comments" git push origin <branch-name>
- Amend the last commit (if your changes are small fixes):
git add <updated-files> git commit --amend git push --force origin <branch-name>
- Interactive rebase to modify specific commits:
git rebase -i HEAD~n # where n is the number of commits to go back # Then edit the commits as needed git push --force origin <branch-name>
If the PR has already been reviewed, adding a new commit is generally best as it preserves the review context.
Developing an Effective Git Workflow
Based on these concepts, here’s a recommended workflow for handling common scenarios:
- Before starting new work:
git fetch origin git diff origin/master # Review changes git pull # Or merge/rebase as appropriate
- When integrating remote changes with local work:
git fetch origin git diff origin/master # Review differences # Choose either: git merge origin/master # Preserve history (for public branches) # OR git rebase origin/master # Create linear history (for private branches)
- When resolving divergent histories:
git pull --no-rebase origin <branch-name> # For public/PR branches # OR git pull --rebase origin <branch-name> # For private branches git push origin <branch-name> # Push your integrated changes
Conclusion
Git’s power comes from its flexibility, but with that flexibility comes complexity. By understanding the relationship between local and remote repositories, the differences between fetch and pull, and when to use merge versus rebase, you’ll be better equipped to handle version control effectively.
Remember that Git is designed to protect your work, not lose it. When in doubt, you can always create a backup branch before attempting complex operations:
git branch backup-before-rebase
This ensures you can always return to a known good state if things don’t go as planned.