Git Fundamentals: What Your Tutorials Might Have Missed
As I’ve navigated the sometimes confusing world of Git, I’ve realized there’s a gap between knowing commands and understanding them. While plenty of resources teach you what to type, fewer explain the why behind each command. Let me share some insights that have made Git more than just a mysterious tool I use daily.
The Reality of git fetch vs. git pull
Many developers (myself included, initially) use git pull as their go-to command for updates:
# The command everyone knows
git pull
# Sometimes followed by: "Wait, what just happened to my code?"
What’s often overlooked is that git pull is actually two operations in one:
# What git pull actually does behind the scenes:
git fetch            # First, download the remote changes
git merge FETCH_HEAD # Then, immediately merge them into your current branch
This automatic merge can be convenient—but also surprisingly disruptive when you’re in the middle of something. A more intentional approach:
# A more deliberate workflow:
git fetch # Get the changes without applying them
git diff HEAD..origin/main # See what you're about to integrate
git merge origin/main # Merge when you're actually ready
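If a raw diff is too noisy, listing the incoming commits first can be easier to scan:
# Show commits that exist on origin/main but not on your current branch
git log --oneline HEAD..origin/main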
This small change in workflow has saved me from countless interruptions and merge headaches.
Why git push origin HEAD Is Worth the Extra Typing
Another command worth reconsidering:
git push
# Sometimes works perfectly, sometimes fails with cryptic messages
I’ve found that git push origin HEAD offers more reliability:
git push origin HEAD # Explicitly push current branch to matching remote branch
This approach is better because:
- It works consistently regardless of your Git configuration
- It’s clear about what’s happening (current branch → same-named remote branch)
- It avoids confusion when branch tracking isn’t set up
The extra few keystrokes have saved me more debugging time than I care to admit.
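If you’d rather keep typing plain git push, you can make its behavior explicit instead. A small configuration sketch, assuming you want the current-branch behavior:
# See which push.default behavior plain `git push` uses (unset means "simple" on modern Git)
git config push.default
# Make plain `git push` always push the current branch to a same-named remote branch
git config --global push.default current
# Alternatively, set up tracking once; plain `git push` then works for this branch
git push -u origin HEAD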
Protecting Your Secrets: A Non-Negotiable Practice
One mistake that can have serious consequences: committing API keys or credentials to a public repository.
# The dangerous habit:
git add .
git commit -m "Finished the API integration"
# Your API keys might now be public
A better approach that’s become second nature for me:
# Step 1: Always add .env to .gitignore
echo ".env" >> .gitignore
# Step 2: Create a template for documentation
cp .env .env.example
# (Then remove actual keys from .env.example)
# Step 3: Verify it's working as expected
git status
# .env should NOT appear in the files to be committed
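For reference, a .env.example usually keeps the variable names but swaps the real values for placeholders. The names here are made up for illustration:
# .env.example -- safe to commit; documents what each developer must supply
API_KEY=your-api-key-here
DATABASE_URL=postgres://user:password@localhost:5432/mydb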
If you’ve already committed sensitive data:
# Remove from Git tracking but keep the local file
git rm --cached .env
# Check if it's in your history
git log --all -- .env
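It’s also worth searching the history for the secret’s value itself, not just the file name. Git’s pickaxe search does this; MY_SECRET_VALUE below is a placeholder for the actual leaked string:
# List every commit whose diff added or removed the leaked string
git log -S "MY_SECRET_VALUE" --all --oneline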
Properly Cleaning Sensitive Data from Git History
When sensitive data like API keys or credentials has been committed to a Git repository, removing it completely requires careful steps. Here’s a detailed guide:
For Local Repositories (Not Yet Pushed)
If you’ve only committed sensitive data locally and haven’t pushed to a remote repository:
- Use interactive rebase to edit history:
# Replace N with the number of commits to examine
git rebase -i HEAD~N
- In the editor that opens, change “pick” to “edit” for commits containing sensitive data:
edit 2c3a8b1 Add API integration with keys
pick 8d7f235 Update UI components
- When the rebase stops at the commit to edit:
# Remove the file from Git but keep it locally
git rm --cached .env
# Update the commit
git commit --amend
# Continue the rebase process
git rebase --continue
- Create a proper .gitignore file if you haven’t already:
echo ".env" >> .gitignore git add .gitignore git commit -m "Add .env to .gitignore"
For Repositories Already Pushed to Remote
If sensitive data has been pushed to a remote repository, the process is more complex:
Option 1: Using BFG Repo-Cleaner (Recommended)
BFG is faster and simpler than git filter-branch:
- Download BFG Repo-Cleaner from https://rtyley.github.io/bfg-repo-cleaner/
- Create a fresh clone of your repository:
# Create a mirror clone
git clone --mirror git://example.com/my-repo.git
- Run BFG to remove the sensitive file:
# Replace .env with your sensitive file
java -jar bfg.jar --delete-files .env my-repo.git
- Clean up and update references:
cd my-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
- Push the cleaned repository:
git push --force
Option 2: Using git filter-branch
If you can’t use BFG:
- Remove the file from all commits:
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch .env" \
  --prune-empty --tag-name-filter cat -- --all
- Force Git to remove references to old commits:
git reflog expire --expire=now --all
git gc --prune=now --aggressive
- Force push to remote:
git push origin --force --all
git push origin --force --tags
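As an aside, the Git project itself now steers users toward git filter-repo instead of filter-branch. If you can install that separate tool, removing the file from all history is a one-liner (it expects to run inside a fresh clone by default):
# Requires the git-filter-repo tool; run it in a fresh clone
git filter-repo --invert-paths --path .env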
Critical Follow-up Steps
After cleaning the repository:
- Consider all exposed credentials compromised:
- Immediately rotate all API keys, passwords, and tokens that were exposed
- Generate new credentials for all services
- Notify team members:
- All collaborators must re-clone the repository or carefully reset their local copies
- Instructions for collaborators:
# Fetch the latest state
git fetch --all
# Reset to match the remote
git reset --hard origin/main
# OR clone fresh
git clone https://github.com/user/repo.git new-clone
- Check hosting platform policies:
- GitLab, GitHub, and other platforms may cache data
- Some platforms like GitLab require waiting periods (30+ minutes) after history cleaning
- Contact support if necessary for complete purging
- Implement prevention measures:
- Use .gitignore properly
- Consider pre-commit hooks to catch sensitive patterns (see the sketch after this list)
- Use environment variables or secure credential management tools
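As a sketch of the pre-commit idea mentioned above: a shell hook saved as .git/hooks/pre-commit (and made executable with chmod +x) can scan staged changes for suspicious patterns before every commit. The patterns here are illustrative; tune them to your stack:
#!/bin/sh
# .git/hooks/pre-commit -- abort the commit if staged changes look like credentials
if git diff --cached -U0 | grep -iE '(api_key|secret|password|token)[[:space:]]*=' >/dev/null; then
  echo "Possible credential detected in staged changes. Commit aborted." >&2
  echo "If this is a false positive, rerun with: git commit --no-verify" >&2
  exit 1
fi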
Remember: The safest approach is to treat any credentials that have been committed to a repository as compromised, even after cleaning the history. This isn’t just good practice—it’s essential security hygiene.
Notes about GitLab: Platforms like GitLab require waiting periods after history cleaning for several important technical reasons:
- Distributed Caching Systems: Large Git hosting platforms use distributed caching systems across multiple servers. When you rewrite history and force push, the changes need time to propagate through all these caches. Some servers might still have the old data cached temporarily.
- Background Processing: When you push changes, especially force pushes that rewrite history, platforms like GitLab don’t process everything immediately. They queue certain operations like rebuilding repository data, updating search indexes, and regenerating project statistics to run asynchronously in the background.
- Garbage Collection Cycles: Git repositories have their own garbage collection cycles on the server side. When you remove commits through history rewriting, the objects aren’t immediately deleted but marked for collection during the next garbage collection run.
- Reference Updates: All references to the old commits (including in issues, merge requests, and CI/CD pipelines) need to be updated across the entire platform, which is done in batches rather than instantly.
- Backup Systems: Many platforms maintain point-in-time backups or snapshots. The waiting period ensures that your changes have been fully processed before new backups are created, preventing the sensitive data from being included in new backups.
This waiting period helps ensure that when the process completes, the sensitive data is truly gone from all active systems. However, it’s still important to remember that older backups might contain the sensitive data, which is why credential rotation is always recommended regardless of history cleaning.
Notes about GitHub: Similar principles apply to GitHub, although there are some differences in implementation:
GitHub also has distributed systems that don’t update instantaneously after history rewrites. While GitHub doesn’t explicitly state a specific waiting period like GitLab’s 30+ minutes, they do acknowledge that:
- Cached Data Persistence: GitHub maintains various caches that might not immediately reflect history changes. This includes caches for the web interface, API responses, and search indexes.
- Background Processing: GitHub processes force pushes and history rewrites in the background, particularly for larger repositories.
- GitHub Pages: If you use GitHub Pages, there’s a documented delay (often 10+ minutes) before changes propagate after force pushes.
- Repository Rebuilding: For significant history rewrites, GitHub may need to rebuild internal repository data structures.
GitHub provides specific documentation for removing sensitive data through their help pages, recommending tools such as BFG Repo-Cleaner or git filter-branch. Their documentation notes that after using these tools, you should contact GitHub Support to request removal of cached views and references to the sensitive data from their servers.
The key difference is that GitHub doesn’t specify an exact waiting period, but the underlying technical reasons for delays are similar to GitLab’s. GitHub Support may need to take additional actions on their end to fully purge sensitive data from all systems.
Regardless of platform, the best practice remains the same: consider any pushed credentials compromised and rotate them immediately, even after cleaning the repository history.
Branching: Your Safety Net for Experimentation
Working directly on the main branch feels efficient until something breaks. I’ve learned that branching isn’t just for teams:
# Before making significant changes:
git checkout -b feature-new-ui
# Work, commit, test without pressure...
# When confident, push your branch
git push -u origin feature-new-ui
This approach has given me the freedom to experiment without the anxiety of breaking production code.
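And when the experiment pans out, folding the branch back in is just as painless:
# Merge the finished branch into main
git checkout main
git merge feature-new-ui
# Delete the branch once it's merged
git branch -d feature-new-ui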
Commit Messages: Future You Will Thank Present You
I used to write commit messages like:
"Fixed bug"
"Updated styling"
"Changes from review"
These messages tell almost nothing about what actually changed. Now I aim for:
# Check what you're about to commit
git status
# Add changes deliberately
git add file1.js file2.js
# Write a message that actually helps later
git commit -m "Fix user session timeout after password change"
When I look back at projects months later, these detailed messages have proven invaluable.
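For larger changes, a subject line plus a body gives even more context; passing -m twice creates separate paragraphs in the message:
git commit -m "Fix user session timeout after password change" \
           -m "Sessions were keyed on the old credentials; regenerate the session ID whenever the password changes."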
Final Thoughts
Git is powerful but not always intuitive. It’s like having a sophisticated tool that can either make your work significantly easier or occasionally more complicated—depending on how you use it.
I’ve found that taking the time to understand Git commands rather than just memorizing them has made me more confident and prevented many frustrating situations. These habits aren’t just about following convention; they make development more predictable and less stressful.
What Git practices have you found most helpful in your development work?
This post reflects practical Git experiences and lessons learned along the way.