feat: decentralized registries and mirrors #2386
Draft
Aslemammad wants to merge 1 commit into npmx-dev:main from
This pull request is the result of my two months of research on how we can decentralize npm without breaking any mainstream behavior, so that adopting the new paradigm is as easy as possible for users, with no new conventions to learn, or at least not an inconvenient number of them.
I won't make this pull request description any shorter, not for lack of time, but because I can't contain my excitement. So forgive me for the obvious writing mistakes.
I went through a journey of ideas. Initially I visualized this as a localhost server called denpm that would store its local URL in `~/.npmrc` as `registry=http://denpm.local` and then distribute the package requests from there, mapping each package to a random registry. Something like `npm add vite` would go through `denpm.local -> registry.npmjs.org`, `denpm.local -> registry.yarnpkg.com`, `denpm.local -> registry.npmmirror.co`, `denpm.local -> r.cnpmjs.org`, or perhaps any other registry the user might want to provide.
This is totally possible due to the nature of redirects in npm. So `npm add vite --registry=http://denpm.local` would result in a `package-lock.json` that points at `registry.npmjs.org` if the localhost server decides to just redirect the request to `registry.npmjs.org`.
Before the redirect, the source, or the proxy server, which in this case is `denpm.local`, can do a whole lot of stuff. It can check the signatures behind the package to make sure it hasn't been tampered with, and that the destination server (the registry or mirror, in this case `registry.npmjs.org`) does not serve the user something different from what it previously claimed through the signature.
Or, to stimulate decentralization, the proxy can just randomly assign each package to a distinct registry. This would potentially remove the single-point-of-failure nature of npm and our overreliance on it.
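The redirect step described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `pickRegistry` and `redirectFor` are hypothetical names, the 302 status is an assumption, and the signature checks mentioned above are left out.

```typescript
// Upstreams taken from the examples above; any user-provided list would work.
const REGISTRIES: readonly string[] = [
  "https://registry.npmjs.org",
  "https://registry.yarnpkg.com",
  "https://r.cnpmjs.org",
];

interface Redirect {
  status: 302; // assumption: a plain temporary redirect
  location: string;
}

// Picks one upstream at random. `random` is injectable so the choice can be
// made deterministic in tests.
function pickRegistry(
  registries: readonly string[],
  random: () => number = Math.random,
): string {
  const index = Math.min(
    Math.floor(random() * registries.length),
    registries.length - 1,
  );
  const chosen = registries[index];
  if (chosen === undefined) throw new Error("no registries configured");
  return chosen;
}

// Maps an incoming request path (e.g. "/vite") to a redirect that points at
// the same path on the chosen upstream.
function redirectFor(
  requestPath: string,
  registries: readonly string[] = REGISTRIES,
  random: () => number = Math.random,
): Redirect {
  return { status: 302, location: pickRegistry(registries, random) + requestPath };
}
```

With this sketch, `redirectFor("/vite")` returns a `location` on one of the three upstreams, and the client follows it as it would any other redirect.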
I'm bringing all of this up just to mention that the possibility of opting out of the npm registry is there and, unbelievably, it's as simple as `npm config set registry http://denpm.local/`.
We all love npm and it's the giant whose shoulders everyone is standing on, BUT if there's an opportunity to ease the work of the npm servers, distribute the load, increase security, and a whole lot of other stuff, then why not explore those wins?
The recent growth of npmx showed that all of this is possible as long as we make something smoother than what's available.
The thing that struck me after researching denpm was that the golang ecosystem had nearly solved the package management issue through a mix of centralization and a lot of decentralization. That led me to dig even more into how they did it and how they leveraged transparency logs to allow proxies to act in an authentic manner. At that point I realized a new CLI is not only not enough, but might be unnecessary.
So the biggest inspiration for this effort is the golang ecosystem. Centralization at that point would be part of npmx itself, specifically the checksum database it'd maintain. Decentralization would be basically everything else, like the npm registry and other registries and mirrors.
I keep separating registries and mirrors. There might be only a slight technical difference between them, but both should be advertised, and users should know the difference and the fact that spinning up a new registry is way cheaper than spinning up a full mirror.
The community might decide to maintain servers that are one-to-one replications of the npm registry itself, or at least of a portion of it. That's what I'd call a mirror, like `registry.npmmirror.co` by cnpm.
Registries, though, are more important: they might want to host exclusive packages. For instance, `registry.viteplus.dev` would decide to host only packages like vite or vitest, or even better, their supply chain.
So registries are for ownership, and mirrors are for distribution and, obviously, mirroring. Imagine a world where each maintainer can host their own packages under their own domain if they prefer, which is totally possible but hasn't become mainstream yet, due to friction, I'd argue.
That's where VSR or Verdaccio can join the effort as well, to ease the hosting side.
Back to the solution: in the next few sections I'll go into detail about how the puzzle pieces are going to fit together.
Checksum Database
Something like `sum.npmx.dev`.
This is the point of centralization in the puzzle. It'd solve the problems of package unpublishes, mutability, and version replacements in the new decentralized package management world. Two mirrors won't be able to ship different bytes for the same version of the same package; if one acts unfaithfully, it'd be easily caught by what's already recorded in the checksum database.
The initial and main consumer of this checksum database would be the npmx proxy, but after gaining momentum, it might be something that package managers want to rely on independently.
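To make the "same version, same bytes" rule concrete, here is a minimal in-memory sketch. The `ChecksumDb` class and the `integrityOf` helper are hypothetical names, not the implementation in this PR; the only thing taken from npm is the `sha512-<base64>` shape of the `integrity` string.

```typescript
import { createHash } from "node:crypto";

// Computes an npm-style integrity string (sha512, base64) for tarball bytes.
function integrityOf(tarball: Uint8Array): string {
  return "sha512-" + createHash("sha512").update(tarball).digest("base64");
}

// Hypothetical in-memory stand-in for the checksum database.
class ChecksumDb {
  private readonly records = new Map<string, string>();

  // The first integrity seen for name@version is frozen; any later
  // submission for the same key must match it exactly.
  record(name: string, version: string, integrity: string): "recorded" | "match" {
    const key = `${name}@${version}`;
    const known = this.records.get(key);
    if (known === undefined) {
      this.records.set(key, integrity);
      return "recorded";
    }
    if (known !== integrity) {
      throw new Error(`integrity mismatch for ${key}`);
    }
    return "match";
  }
}
```

In this sketch a mirror that serves different bytes for an already recorded version is rejected with an error, which is the behavior the paragraph above describes. A real service would persist the records and expose them for auditing.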
Merkle Trees and Transparency Logs
More details in Russ Cox's blog post. Briefly, this data structure would allow us to create a tamper-evident database, so a released package would be cryptographically frozen and therefore cannot be tampered with.
And similar to the golang checksum database, we'd expose APIs that'd allow any user or service to verify the Merkle tree we're hosting.
The checksum database itself allows for the auditability of registries and proxies. The Merkle tree, in turn, allows for the auditability of the checksum database itself.
So it's not an unverifiable point of centralization, but rather a totally verifiable and consistent one.
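As a rough illustration of why such a log is tamper-evident, the sketch below computes a Merkle tree root over a list of records using the hashing rules of RFC 6962 (a `0x00` prefix for leaves, `0x01` for interior nodes). The record strings and function names are assumptions for illustration; the sumdb in this PR may lay out its records differently.

```typescript
import { createHash } from "node:crypto";

function sha256(...parts: Uint8Array[]): Buffer {
  const hash = createHash("sha256");
  for (const part of parts) hash.update(part);
  return hash.digest();
}

// RFC 6962 domain separation: leaves and interior nodes use different
// prefixes, so a leaf can never be passed off as an interior node.
function leafHash(record: string): Buffer {
  return sha256(Buffer.from([0x00]), Buffer.from(record, "utf8"));
}

function nodeHash(left: Uint8Array, right: Uint8Array): Buffer {
  return sha256(Buffer.from([0x01]), left, right);
}

// Root hash over the ordered list of records. The list is split at the
// largest power of two strictly smaller than its length.
function treeHash(records: readonly string[]): Buffer {
  if (records.length === 0) return sha256(); // hash of the empty string
  if (records.length === 1) return leafHash(records[0]);
  let split = 1;
  while (split * 2 < records.length) split *= 2;
  return nodeHash(
    treeHash(records.slice(0, split)),
    treeHash(records.slice(split)),
  );
}
```

Changing, removing, or reordering any record changes the root, so anyone who has stored an earlier root can detect a rewritten history. Inclusion and consistency proofs, which let clients verify this without downloading every record, are not shown here.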
Proxy
`registry.npmx.dev` or `proxy.npmx.dev`. This is the same URL that the user would pass to `npm config set registry`.
It'd handle redirecting to the right registries, making sure that they serve the right content, returning consistent manifests, and all the security improvements we can make over npm.
In the current MVP, the proxy only allows packages with the `integrity` field to be stored in the checksum database and returned to the user, to increase security.
So packages with no `integrity` (not signed by the registry) are not allowed to be stored. This can be changed, but it would also mean less security, even though we sign each field in the checksum database too.
/-/npm/v1/keys
This is what `npm audit signatures` uses to audit the signatures of packages and verify that we're consuming what the registry has actually signed.
`registry.npmx.dev/-/npm/v1/keys` can host not only the keys from the npm registry, but also all the keys from all other registries and mirrors.
I assume this file won't be hundreds of megabytes or more, but if my assumption is wrong, we can cherry-pick the keys we return to the user based on which registries they prefer in a potential dashboard, using the `authorization` HTTP header.
The New world
By avoiding overdependence on the npm registry, new kinds of registries and mirrors would emerge. One I keep dreaming of, which would mitigate most of the attacks happening on npm (remember that most attacks happen on npm LINK ARTICLE), is a `mirror.socket.dev` that would only host what's available on npm once it passes their in-house audits, which are pretty good. They have been able to catch most of the recent attacks on npm before anyone else, but since there's no way for them to affect user workflows directly, for example through explicit errors and failures in `npm add` by refusing to serve a particular package, a lot of potential is being missed.
Another kind of registry I imagine is the organization-backed registry that hosts only what the organization ships or relies on, like the `registry.viteplus.dev` example mentioned above. That'd be the same story with maintainer-backed registries, like `registry.roe.dev` hosting packages that Daniel maintains.
FAQ
I am adding these as prompts for myself. If the PR is going to persuade anyone, it should answer these directly instead of assuming the reader will fill the gaps for me.
Open FAQ prompts
What exact npm failure modes am I trying to solve first?
Which important npm security problems am I explicitly not solving in this PR?
What is the concrete win for users if they adopt this?
Why a proxy instead of a new CLI?
Why a sumdb instead of relying only on lockfile `integrity`?
Why keep lockfiles pointing at the upstream tarball URL instead of the proxy URL?
What is the trust root in this system?
What exactly does `keyId` mean here?
What does a successful verification prove?
What does it not prove?
What does “decentralization” mean in this proposal?
What is decentralized today and what is still centralized?
If the proxy currently fetches from the first configured registry, how should readers interpret that limitation?
Are mirrors and registries different in principle, or just in operational practice?
Are package names still the global npm names?
If two sources claim the same package and version but offer different tarballs, what happens?
Does this stop malicious maintainers?
Does this stop `preinstall` and `postinstall` malware?
Does this stop unpublishing or version replacement?
What does this system say about content authenticity versus content safety?
Who runs the sumdb?
Which npm client flows already work with this prototype?
Which npm flows are intentionally unsupported right now?
Why is this better than a plain mirror?
Is the long-term goal to decentralize hosting, trust, naming, or all three?
What is prototype-only in this PR?
What is already real and verified?
What would have to change next to support real multi-registry fetch selection?
What is the minimal next milestone that would prove this idea is viable?
“This is still centralized.” What is my answer?
“This doesn’t solve install-time code execution.” What is my answer?
“This just re-wraps npm trust.” What is my answer?
“Why not just use npm mirrors?” What is my answer?
“Why should anyone trust `sum.npmx.dev`?” What is my answer?
“If the proxy isn’t in the lockfile, what value is it adding?” What is my answer?