04 August 2014

Problem

The majority of the code that my team members and I work on is managed in company-internal GitHub Enterprise repositories. As a general guideline, we try to include external 3rd party code via Git Submodules instead of copying it into our own repository. This is straightforward since a great number of open source projects are already managed with Git and most of them are also hosted on GitHub.com.
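As a minimal sketch of what including a library as a Submodule involves, the helper below only builds the Git command lines; it does not run them. The function names, the example URL, and the `third_party/` path are assumptions made for illustration and not part of our actual setup.

```python
from typing import List


def submodule_add_command(url: str, path: str) -> List[str]:
    """Build the argv for `git submodule add <url> <path>`.

    Run inside the parent repository, this records the 3rd party
    repository at `path` and pins it to a specific commit.
    """
    return ["git", "submodule", "add", url, path]


def submodule_checkout_command() -> List[str]:
    """Build the argv that fetches all Submodules after a fresh clone."""
    return ["git", "submodule", "update", "--init", "--recursive"]


if __name__ == "__main__":
    # Hypothetical repository URL and target path, for illustration only.
    print(" ".join(submodule_add_command(
        "https://github.com/example/somelib.git", "third_party/somelib")))
    print(" ".join(submodule_checkout_command()))
```

A clean CI build has to run the second command against the Submodule's remote, which is why the availability of that remote matters.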

This approach has a number of obvious advantages. First of all, you keep your own repository small and clean. Even if you reference gigantic binaries or gazillions of files in these Submodules, your repository will always stay lean and fast. Furthermore, you keep the history of the 3rd party code. This is especially useful if you want to understand an odd line of code via Git blame or if you want to fix a bug. Since the bug fix is just a commit on the project's source tree, it is very easy to contribute this change back to the community, too. Bottom line, 3rd party projects as Submodules are a great thing. But, as always in life, there is a catch.

Although GitHub is an awesome service with great availability, it tends to be a target for DoS attacks. If you have updated a 3rd party reference and your CI server creates a clean build while a DoS attack is under way, then the CI machine might not be able to download the proper 3rd party sources. The result is a build break, and according to Murphy’s Law this will happen (it happened to us!).

Solution

The obvious solution is to host these external Git repositories on an internal Git server and link the 3rd party references to the internal Git repositories. However, this approach is cumbersome because you need to manually create the repository and import all its history. Furthermore, you need to repeat this step whenever you want to update a 3rd party component. Last but not least, if you fix a bug, then the bug fix commit will likely remain in your internal repository and never see the shining light of an open source contribution.
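To make the manual effort concrete, here is a rough sketch of the Git commands such an import and a later update boil down to. It assumes the empty internal repository has already been created by hand; the function names, URLs, and working directory are hypothetical. Only branches and tags are pushed, not every ref of the mirror clone.

```python
import subprocess
from typing import List

# Push branches and tags only; other refs in the mirror clone are left out.
REFSPECS = ["+refs/heads/*:refs/heads/*", "+refs/tags/*:refs/tags/*"]


def import_commands(source_url: str, internal_url: str, workdir: str) -> List[List[str]]:
    """Commands for the first import of a repository with its full history."""
    return [
        ["git", "clone", "--mirror", source_url, workdir],
        ["git", "-C", workdir, "push", "--prune", internal_url] + REFSPECS,
    ]


def update_commands(internal_url: str, workdir: str) -> List[List[str]]:
    """Commands to repeat whenever the 3rd party component should be updated."""
    return [
        ["git", "-C", workdir, "fetch", "--prune", "origin"],
        ["git", "-C", workdir, "push", "--prune", internal_url] + REFSPECS,
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(import_commands(...))` once and `run_all(update_commands(...))` for every update is exactly the repetitive work described above, and nothing in it helps a local bug fix find its way back upstream.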

In order to address this problem in the GitHub world, I wrote HubSync. It syncs GitHub.com repositories to corresponding, read-only GitHub Enterprise repositories. To use HubSync, you create an organization on GitHub.com (e.g. we created AIM360) and a personal API token. You do the same on your GitHub Enterprise instance. With the organization names and the tokens, you start HubSync on a server that runs 24/7. That’s it. All repositories of your GitHub.com organization will be synced to your internal GitHub Enterprise instance. If you want to use a new open source library, you can just fork it to your GitHub.com organization and it will automagically appear in your internal network. The same applies to updates.
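The core decision in each sync pass can be sketched in a few lines. This is not HubSync's actual code, only an illustration of the idea under the assumption that both organizations can be listed by repository name; `plan_sync` and the repository names are hypothetical.

```python
from typing import Dict, Iterable, List


def plan_sync(public_repos: Iterable[str], internal_repos: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the GitHub.com organization with the internal one.

    Returns which repositories are new and need to be created internally,
    which already exist and only need fresh commits, and which exist
    internally without a public counterpart.
    """
    public = set(public_repos)
    internal = set(internal_repos)
    return {
        "create": sorted(public - internal),
        "update": sorted(public & internal),
        "orphaned": sorted(internal - public),
    }


if __name__ == "__main__":
    # Hypothetical repository names, for illustration only.
    plan = plan_sync(["zlib", "libgit2", "curl"], ["zlib", "old-lib"])
    for action, names in plan.items():
        print(action, names)
```

A freshly forked library shows up under "create" on the next pass, which matches the behaviour described above; what to do with "orphaned" repositories is a policy choice this sketch leaves open.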

You can download HubSync here: https://github.com/larsxschneider/hubsync

Suggestions, comments, critique? Please drop me a note. Feedback is always appreciated!

