Author Topic: Real-world git use (Read 3313 times)

Bassman59 · « **on:** August 12, 2021, 07:12:28 pm »

I was going to post in this big thread, but perhaps a separate thread is better.

As I said in that thread, we are an SVN house (and I have an SVN repo for personal projects). It works well for us, and it's worth noting that there are no situations where the engineers do not have direct access to the repo.

I primarily do FPGAs. I've built up a collection of synthesizable modules and another collection of simulation models. I'm big on re-use. I also do the occasional microcontroller design. The other engineers here do FPGAs and MCUs too. (Board designs are in Altium and not under version control.)

We have one company-wide SVN repository served by a local machine in the closet. Each design gets a part number in our ERP system, so each design is kept in the repo under that part number. The designs have the usual trunk/branches/tags tree. Modules and models are also in the repo, with "modules" and "models" as clever categories, each with its own trunk/branches/tags tree. My designs tend to have a standard tree format: source, testbench, fitter, docs. If I want to use my uart module in a design, I include it using the svn:externals property on the source directory. If I want to use a particular ADC bus-functional model in my test bench, I include it using the svn:externals property on the testbench directory. Design releases and prereleases get tagged after ensuring all externals are pegged.

Common modules for microcontroller designs are put in the repo the same way. For example, there's an lwip project in the repo, and that gets included with svn:externals in the source tree as you would expect.

Put another way, our work consists of FPGA designs and microcontroller designs which are small and mostly self-contained except for the shared modules. We are not doing ongoing development of a large software product like, say, Kicad, with developers spread across the world. The products I work on are separate from what the others work on (things talk together at the system level, well above the board level) so it's rare that I will change another engineer's design. The usual thing is to say to the responsible engineer, "hey, XYZ does ABC and it needs to do D, too, so can you add that?" and a short time later, I get an email saying "done."

SVN remains under active development so I don't see it going away any time soon. But you never know. So, help me to understand the git workflow. With git:

Is the common use to have one company-wide (or department, or whatever) repository, or is it common to have one repository per project?

If the one-repo-per-project idiom is used, where are the repositories stored? How is the concept of shared modules and models implemented? Is every module (like a UART) its own repository?

I get that git allows the user to clone a repo and do work on the clone, committing changes to it and then only when the user thinks the work is done, pushing the final change back to the repo. I suppose that private branches in a Subversion repository do the same thing, except that with SVN with every commit you're hitting the main repo over the network rather than the clone on the local machine. In this way, it seems like other engineers can't see the "work in progress" with git because that work isn't back in the main repo.

Thanks. I expect this to be an interesting discussion.

brucehoult · « **Reply #1 on:** August 12, 2021, 09:53:05 pm »

You'd probably want a different repo for each component, and a repo for each project, with submodules for the components it uses.

Like this example:

https://github.com/riscv/riscv-gnu-toolchain

Here we have a repo for building a complete toolchain for RISC-V. The various components such as binutils, gcc, gdb, glibc, newlib, qemu are large projects in their own right.

The main purpose of this git repo is to coordinate versions of the other components that are known to work together.
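A minimal sketch of that layout, using throwaway local repos in a temp directory. The names `uart` and `design` and the path `source/uart` are hypothetical, chosen to mirror the original question; a real setup would point at repos on a shared server instead of local paths.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical component repo: a reusable UART module.
git init -q uart
git -C uart config user.name demo
git -C uart config user.email demo@example.com
echo "-- uart rtl placeholder" > uart/uart.vhd
git -C uart add uart.vhd
git -C uart commit -q -m "uart: initial version"

# Hypothetical project repo that pulls the component in as a submodule.
git init -q design
cd design
git config user.name demo
git config user.email demo@example.com
# protocol.file.allow is only needed here because the "remote" is a local path.
git -c protocol.file.allow=always submodule add "$work/uart" source/uart
git commit -q -m "Add uart as a submodule"

# The project records one exact commit of the component, not "latest".
pinned=$(git ls-tree HEAD source/uart | awk '{print $3}')
actual=$(git -C source/uart rev-parse HEAD)
echo "design pins uart at $pinned"
```

The commit recorded in the project repo plays roughly the role of a pegged svn:externals revision: checking out an old project commit also tells you which component commit it was built against.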

MrMatthias · « **Reply #2 on:** August 14, 2021, 09:35:20 pm »

There is a great, somewhat dated talk by Linus that addresses your points: https://youtu.be/4XpnKHJAok8

sokoloff · « **Reply #3 on:** August 14, 2021, 11:56:28 pm »

I’ve seen it done both ways, but the many-repo (repo per project/module) model seems much more common.

We also tend to push to upstream repos frequently. (You can push to a remote repo on a branch if you want others to see/participate in your work without it being on the default branch.)
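A rough sketch of what that looks like, with a local bare repo standing in for the shared server. The repo name `central.git`, the users `alice` and `bob`, and the branch name `wip/uart-fifo` are all made up for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the company server: a bare repo.
git init -q --bare central.git

# Alice clones it and publishes an initial commit on the default branch.
git clone -q central.git alice
cd alice
git config user.name alice
git config user.email alice@example.com
git commit -q --allow-empty -m "Initial commit"
default=$(git symbolic-ref --short HEAD)
git push -q origin "$default"

# Work in progress goes on its own branch and is pushed as often as she likes.
git checkout -q -b wip/uart-fifo
git commit -q --allow-empty -m "WIP: sketch FIFO interface"
git push -q -u origin wip/uart-fifo

# Bob clones later: the default branch is untouched, but the WIP branch is visible.
cd "$work"
git clone -q central.git bob
on_default=$(git -C bob rev-list --count HEAD)
on_wip=$(git -C bob rev-list --count origin/wip/uart-fifo)
echo "default branch: $on_default commit(s), wip branch: $on_wip commit(s)"
```

So "other engineers can't see work in progress" is a matter of habit rather than a limit of the tool: anything pushed to a branch on the shared repo is visible to everyone who fetches.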

AntiProtonBoy · « **Reply #4 on:** August 16, 2021, 12:42:54 am »

I also vote for the submodule approach, but keep in mind it can become burdensome if dependencies become large. Keep submodule hierarchies simple and flat.

ejeffrey · « **Reply #5 on:** August 16, 2021, 04:57:29 pm »

I think in the pure software world the monorepo approach is becoming more common where you have one repository or a small number for your entire company, although completely independent code bases might be kept separate for various reasons. MS published an article a few years back about the extensions they made to git to support their absolutely enormous source trees with distributed development.

The idea here is that you are always supposed to be able to build all projects against the head. If you have really good test coverage and a good CI system, any potentially breaking changes to a sub-module ideally get detected immediately and you spend your effort fixing those rather than backporting bug fixes or dealing with projects using older versions of dependencies with known unfixed bugs. This also makes org-wide refactoring easier as you can make an atomic commit to the entire code base. The disadvantage is that it in turn becomes harder to maintain older releases. If you make a major change to an internal dependency then a typical approach is to treat the new major version as a new project in a separate folder within the VC tree, and then tools can depend on the old version until/unless they get ported to use the new version.

If your projects are largely independent, and especially if they are also externally distributed separately such as brucehoult's example on the development toolchain, then keeping separate repositories probably makes more sense. I do personally prefer to keep as few repositories as possible. As a general rule, I would try not to divide up your repositories finer than you intend to "ship", whatever that may mean.

Mattjd · « **Reply #6 on:** September 24, 2021, 10:45:33 pm »

Monolithic repos are certainly not a good thing or the way to go about it.

At work we have an online Bitbucket server (a git hosting service). We have projects which house repos and each repo is a code base for a different "sub component" of the project. In standard git you just have separate repos and the idea of project doesn't really exist.

At one point there was a single repo for multiple code bases and it was fucking awful. You clone or do a "pull origin master/develop" and you're downloading 10s of megabytes of data.

mac.6 · « **Reply #7 on:** October 04, 2021, 07:42:59 pm »

10s MB?

It's quite small, I have worked with multi-gigabyte repository trees (with an awful lot of submodules)...

Power-Electronics · « **Reply #8 on:** October 04, 2021, 09:31:40 pm »

You don't want a single git repository for your whole company. Git doesn't do partial check-out or externals, so you're always checking out everything. Branches and tags are not folders, they apply to the whole repo, so you'd be branching and tagging the state of your company but not the individual projects.

With one repo per project, you need a separate tool to manage your repos. Some people think they can use sub-modules like externals, but they end up pulling in whole repositories that they might not need. Sub-modules are perilous because git doesn't try to coordinate broken references across repositories, which happens for example if you "squash". Also the filesystem details leak into the sub-module references, so it's possible to name something in a way that works on Linux but not Windows or something.

With git, it's entirely up to the engineer to remember to push their code. Committing only makes local changes. Pushing shares those changes.

brucehoult · « **Reply #9 on:** October 04, 2021, 10:03:15 pm »

Quote from: Power-Electronics on October 04, 2021, 09:31:40 pm

You don't want a single git repository for your whole company. Git doesn't do partial check-out or externals, so you're always checking out everything. Branches and tags are not folders, they apply to the whole repo, so you'd be branching and tagging the state of your company but not the individual projects.

Modern git supports both partial cloning (so your local repo contains only part of the remote repo) and sparse checkout (so your working directory contains only part of the files in the repo).
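Something along these lines, as a sketch only. It assumes a fairly recent git (the `sparse-checkout` command needs 2.25 or later, as far as I know), and the repo name `company` and the part-number directories are invented. The two `uploadpack.*` settings are there only because a local `file://` repo is standing in for a server that supports partial clone.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical company-wide repo holding two unrelated projects.
git init -q company
mkdir -p company/projects/100-1234 company/projects/100-5678
echo "fpga top level" > company/projects/100-1234/top.vhd
echo "mcu firmware"   > company/projects/100-5678/main.c
git -C company add .
git -C company -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Two projects"

# Let this local stand-in "server" honour partial-clone requests.
git -C company config uploadpack.allowFilter true
git -C company config uploadpack.allowAnySHA1InWant true

# Partial clone: file contents are fetched on demand, not up front.
# --sparse: start with almost nothing in the working directory.
git clone -q --filter=blob:none --sparse "file://$work/company" mine

# Sparse checkout: ask for just the one project you work on.
git -C mine sparse-checkout set projects/100-1234
ls mine/projects
```

The history still covers the whole repo, so branches and tags remain repo-wide; only the downloaded file contents and the working directory are narrowed.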

Yes -- folders are *not* branches or tags, even if you use a naming convention to pretend they are. With git you can if you wish use a different branch (starting from the empty root commit) in the same repo for each project.

SVN users often complain that git commit hashes aren't a sequential number. If you have your entire company in one SVN repo then the commit numbers for a given project are not sequential but have huge gaps between them.

Quote

With git, it's entirely up to the engineer to remember to push their code. Committing only makes local changes. Pushing shares those changes.

With SVN there is no way for an engineer to save work in progress as a series of small logical changes, or to experiment with several different approaches before deciding on the best one, without permanently having all of them in the repository for everyone forever. There is no way for several engineers to cooperatively work on something without sharing each change with everyone -- they have to resort to a shared working directory, or emailing patch files around.

What is extremely common in the places I've worked in the last dozen years that still use SVN or perforce or others is engineers in a team using git locally and between themselves before eventually pushing the finished, approved, change set to SVN or perforce or whatever the company-approved system is.

PlainName · « **Reply #10 on:** October 04, 2021, 11:03:40 pm »

Quote

the commit numbers for a given project are not sequential but have huge gaps between them.

Doesn't matter. What does matter is that '125' comes after '93'.

Time and date could be used (and are, for GUIDs), but time and date alone aren't unique.

PlainName · « **Reply #11 on:** October 04, 2021, 11:07:06 pm »

Quote

With SVN there is no way for an engineer to save work in progress as a series of small logical changes, or to experiment with several different approaches before deciding on the best one, without permanently having all of them in the repository for everyone forever.

You do come out with some rubbish. Take a look at Shelving.

But, as pointed out before, with SVN it is trivial to have several checkouts on the go all at the same time. After the fact, you can check them in as logical progressions, or branches, or merges - you don't have to decide up front what you're going to want to do later, but do it as it occurs to you.

brucehoult · « **Reply #12 on:** October 04, 2021, 11:28:30 pm »

Quote from: dunkemhigh on October 04, 2021, 11:03:40 pm

Quote
the commit numbers for a given project are not sequential but have huge gaps between them.

Doesn't matter. What does matter is that '125' comes after '93'.

If you want to know which git commit comes earlier then 'git merge-base f62660a 6c18ef5' will return whichever of them is the ancestor of the other. If it returns something else then they aren't on the same branch and the returned thing is the common ancestor.
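A small self-contained demonstration, using a scratch repo with two commits so the hashes are whatever git generates on your machine. The variable names `older` and `newer` are just for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m "first"
older=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
newer=$(git rev-parse HEAD)

# merge-base prints the common ancestor; on a straight line that is the older commit.
base=$(git merge-base "$older" "$newer")
echo "common ancestor: $base"

# --is-ancestor answers the yes/no question through the exit status.
if git merge-base --is-ancestor "$older" "$newer"; then
    echo "$older comes before $newer"
fi
```

Abbreviated hashes work in place of the full ones, as in the command quoted above.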

PlainName · « **Reply #13 on:** October 05, 2021, 12:19:21 am »

Since when was finding something to type in a command and then finding a repo to type that command against and then trying to type a 14-digit number accurately simpler than just looking at two short numbers?

Power-Electronics · « **Reply #14 on:** October 05, 2021, 12:52:23 am »

Quote from: brucehoult on October 04, 2021, 10:03:15 pm

Modern git supports both partial cloning (so your local repo contains only part of the remote repo) and sparse checkout (so your working directory contains only part of the files in the repo).

Neat, I hadn't heard of it. The couple articles I skimmed about it just now cautioned against using it with submodules, at least at the time of writing. But I'm glad they're adding that feature.

brucehoult · « **Reply #15 on:** October 05, 2021, 12:55:36 am »

Since when is this something anyone seriously cares about? I don't recall ever caring about this. And if you do care then, as demonstrated, you can find the information you want.

And don't tell me it's because I only know git or am some kind of git bigot. I've used probably a dozen different source code control systems over the last 30 years. SVN was arguably the best for about five years in the early 00's -- and I converted a number of open source projects and several employers to it during that time -- but it's really clear to me that it's not been the best since about 2006. I resisted git at first because it was weird compared to what I was used to (CVS and SVN for quite some years), but once I learned how it works it is CLEARLY better.

SiliconWizard · « **Reply #16 on:** October 05, 2021, 01:12:37 am »

A distributed VCS both scales up much better and is also better for each individual developer. Centralized systems may still have their uses in specific cases, but that's becoming a niche.

I personally use Mercurial when I don't have to use git, because I like it better, but that's personal preference. Main functionalities are pretty similar.

I don't know well how git submodules work exactly. I do know the equivalent in Mercurial with subrepos, and with them, when set up properly, a pull from a main repo certainly doesn't trigger a pull in all subrepos! Actually, that's even the main point of subrepos. They essentially act as links to separate repos. As subrepos, you still have a local copy of sources in the main repo's hierarchy, but pulls can still be handled separately. The point is, for instance, that you can use some libraries with a given version while continuing development of your project. In many cases, automatically pulling the latest revision for all libraries/sub-projects you use is NOT something you'd want to do. So you can still manage your main repo independently of them. Incidentally, it also limits the amount of transferred data at each pull, but that's not even the main point.

Oh, and in case you *do* want some sub-project to get automatically pulled when you pull from your main repo, you can specifically set it up for this as well.

But again, automatically 'updating' all dependencies of a given project is quite often not a good idea at all.
And although I don't know git as well, I'm pretty sure you can do what I said above with it too.
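For what it's worth, here is a hedged sketch of how that plays out with git submodules, using throwaway local repos. The names `lib` and `app` are hypothetical, and `protocol.file.allow=always` is only needed because the submodule's "remote" is a local path.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical library repo.
git init -q lib
git -C lib config user.name demo
git -C lib config user.email demo@example.com
git -C lib commit -q --allow-empty -m "lib v1"

# Hypothetical application repo that uses the library as a submodule.
git init -q app
cd app
git config user.name demo
git config user.email demo@example.com
git -c protocol.file.allow=always submodule add "$work/lib" lib
git commit -q -m "Use lib"
pinned=$(git -C lib rev-parse HEAD)

# The library moves on upstream.
git -C "$work/lib" commit -q --allow-empty -m "lib v2"

# A plain submodule update restores the recorded commit; it does not chase upstream.
git -c protocol.file.allow=always submodule update --init
still=$(git -C lib rev-parse HEAD)

# Moving to the newer library version is an explicit, separate step.
git -c protocol.file.allow=always submodule update --remote lib
bumped=$(git -C lib rev-parse HEAD)
echo "pinned=$pinned still=$still bumped=$bumped"
```

After the `--remote` step the application repo shows the submodule as modified, and the bump only becomes part of the project once someone commits it.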

ledtester · « **Reply #17 on:** October 05, 2021, 01:13:33 am »

Quote from: Bassman59 on August 12, 2021, 07:12:28 pm

...
Is the common use to have one company-wide (or department, or whatever) repository, or is it common to have one repository per project?
...

One data point:

https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext

Of course, what works for Google may not work for your company.

PlainName · « **Reply #18 on:** October 05, 2021, 01:31:55 am »

Quote

One data point:

https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext

Interesting. I suspect that wouldn't work for most, though, since the developers essentially have a copy of everything. Except they don't, because only changed files are stored locally, but if it weren't for Google's Piper setup then they would each have a copy of the lot (as they would with a git/svn/cvs setup).

They also develop on the trunk, with branches for release, so arse about face for many, if not most, git practisers. And I think that implies that there aren't submodules in the sense you can use an older version of some part.

Seems to work for them, but I get the feeling that if one were asking how to set up afresh nowadays, a right-on developer might point to Google and say "not like that".

thinkfat · « **Reply #19 on:** October 05, 2021, 01:30:08 pm »

Quote from: dunkemhigh on October 04, 2021, 11:07:06 pm

Quote
With SVN there is no way for an engineer to save work in progress as a series of small logical changes, or to experiment with several different approaches before deciding on the best one, without permanently having all of them in the repository for everyone forever.

You do come out with some rubbish. Take a look at Shelving.

But, as pointed out before, with SVN it is trivial to have several checkouts on the go all at the same time. After the fact, you can check them in as logical progressions, or branches, or merges - you don't have to decide up front what you're going to want to do later, but do it as it occurs to you.

Shelving is not equal to creating local branches. It's more like "git stash". Multiple checkouts ("workspaces") are a pain to work with. Yes, you can "make do with what you have" but it's better to have a tool supporting your workflow.
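To make the difference concrete, a minimal sketch in a scratch repo. The file name `filter.vhd` and the branch name `try/fir-variant` are invented for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

echo "v1" > filter.vhd
git add filter.vhd
git commit -q -m "Baseline filter"
main=$(git symbolic-ref --short HEAD)

# Stash (the shelving-like feature): parks uncommitted edits as one lump.
echo "half-finished idea" >> filter.vhd
git stash push -q -m "half-finished idea"
stashes=$(git stash list | wc -l | tr -d ' ')

# Local branch: a series of real commits that stays private until pushed.
git checkout -q -b try/fir-variant
echo "step 1" >> filter.vhd
git commit -q -am "Try FIR variant: step 1"
echo "step 2" >> filter.vhd
git commit -q -am "Try FIR variant: step 2"

# Back on the main line nothing has changed; the experiment lives on its branch.
git checkout -q "$main"
ahead=$(git rev-list --count "$main"..try/fir-variant)
echo "stashes: $stashes, commits on experiment branch: $ahead"
```

The branch can later be merged, pushed for review, or deleted, all in the same working directory and without a second checkout.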

