feat: unstable SHA256 support#1206
Conversation
dd35da4 to
a332c64
Compare
a332c64 to
fd975c7
Compare
c07b123 to
5c04d59
Compare
|
I believe to have found something that you missed in this PR. Note that Lines 271 to 275 in e6919b7 I believe that this means that the The function Line 231 in e6919b7 will hit the Line 29 in e6919b7 This is because I discovered this experimenting in code that is downstream of Then I started to think about how much #[inline]
pub(crate) fn zeroed_raw_oid(format: ObjectFormat) -> raw::git_oid {
raw::git_oid {
kind: format.raw(),
id: [0; raw::GIT_OID_MAX_SIZE],
}
}In fact one might even ask why not to have impl Oid {
pub const ZERO_SHA1: Self = Self {
raw: raw::git_oid {
kind: raw::GIT_OID_SHA1,
id: [0; raw::GIT_OID_MAX_SIZE],
}
};
pub const ZERO_SHA256: Self = Self {
raw: raw::git_oid {
kind: raw::GIT_OID_SHA256,
id: [0; raw::GIT_OID_MAX_SIZE],
}
};
// NOTE: Maybe call this `zero_with_object_format` to not break `zero`.
pub fn zero(format: ObjectFormat) -> Self {
// TODO: Match on `format` and return `Oid::ZERO_SHA1` or `Oid::ZERO_SHA256`.
}
}This way, users that know at compile time which kind of zero they want can just use the constant, while those that decide at run time would call If you do not want For |
|
Another comment regarding |
| #[cfg(not(feature = "unstable-sha256"))] | ||
| { | ||
| if bytes.len() != raw::GIT_OID_SHA1_SIZE { | ||
| return Err(Error::from_str("raw byte array must be 20 bytes")); |
There was a problem hiding this comment.
maybe a clearer error for trying to accidentally use 32 bytes when the feature isn't enabled?
Also, why have different ways of creating the SHA1 oid depending on if the SHA256 feature is available?
let oid_type = match bytes.len() {
raw::GIT_OID_SHA1_SIZE => raw::GIT_OID_SHA1,
#[cfg(feature = "unstable-sha256")]
raw::GIT_OID_SHA256_SIZE => raw::GIT_OID_SHA256,
#[cfg(not(feature = "unstable-sha256"))]
raw::GIT_OID_SHA256_SIZE => {
return Err(Error::from_str(
"raw byte array of 32 bytes requires unstable-sha256 feature to be enabled",
))
}
_ => {
return Err(Error::from_str(
"raw byte array must be 20 bytes (SHA1) or 32 bytes (SHA256, if enabled)",
))
}
};
unsafe {
try_call!(raw::git_oid_from_raw(&mut raw, bytes.as_ptr(), oid_type));
}
Ok(Oid { raw })There was a problem hiding this comment.
Good suggestion, though I am not sure if here we should talk about unstable-sha256 feature, downstream consumer of git2-rs may just propagate error message to end-users, and that might be weird to see that.
There was a problem hiding this comment.
If the error message is propagated to end users, presumably the end users had some kind of say in the input value
| } | ||
| #[cfg(not(feature = "unstable-sha256"))] | ||
| { | ||
| let _ = oid_type; |
There was a problem hiding this comment.
hmm, even if the git2-rs bindings don't have SHA256 enabled, if the oid_type corresponds to SHA256 we should say that and probably return an error, rather than just claiming that the repository is formatted with SHA1?
There was a problem hiding this comment.
IIUC, without unstable-sha256, the underlying libgit2-sys can't return a SHA256 oid or even open a SHA256 repo.
There was a problem hiding this comment.
What if the underlying library was built with sha256 enabled, but the git2 crate wasn't?
There was a problem hiding this comment.
AFAIK none of the mainstream packagers distribute libgit2.so with SHA256 support. The reason being it was broken before libgit2 1.9.3 (released a week ago). Also when sha256 is enabled, libgit2 built a shared libray with -experimental suffix. Unless packager intentionally patch that build script to remove the suffix, it is pretty distinguishable. However, it is unlikely someone does that, since the two are ABI incompatible and doing that breaks other programs dynamically linking to it.
I also raised this concern in rust-lang/cargo#16939 (comment) but from the other angle – Cargo may need to build from source always as nobody ships libgit2-experimental.so.
This comment has been minimized.
This comment has been minimized.
This is still ambiguous for those not 41-64 char hex strings. I'd still consider deprecating it and eventually remove. |
9b1f12d to
5778238
Compare
There was a problem hiding this comment.
New revision is up.
The major difference is that we stop follow libgit2@main but 1.9.3. The other is API design, which is stated in the PR description:
Introduce
ObjectFormataccessors and*_extAPI variants unconditionally,
The actual SHA256 wiring are gated behind theunstable-sha256Cargo feature
in order to keep public API additive.The
*_extsuffix follows libgit2's convention for extended function variants.
At next major version, the original methods may absorb the format parameter.
The actual change will leave for future.
|
cc @ehuss |
There was a problem hiding this comment.
I will double down on what I explained in #1206 (comment): If the feature unstable-sha256 is enabled, the implementation of the unsafe function crate::util::zeroed_raw_oid is incorrect. It allows to create OIDs in memory, which are invalid. Their kind of value zero does not correspond to a known git2::ObjectFormat and also neither to libgit2's GIT_OID_SHA1 nor GIT_OID_SHA256. That is because the function uses std::mem::zeroed but the object formats are non-zero bytes. This is a huge footgun. Please consider these options:
- Remove
crate::util::zeroed_raw_oid. - Remove the
unsafekeyword fromcrate::util::zeroed_raw_oidand reimplement it withoutunsafeblocks. This will force you to add an argumentformat: git2::ObjectFormat, otherwise, ifunstable-sha256is enabled, you will not be able to construct a legal zero OID. - Use
#[cfg(not(feature = "unstable-sha256"))] crate::util::zeroed_raw_oid.
| Sha1, | ||
| /// SHA256 object format (32-byte object IDs) | ||
| #[cfg(feature = "unstable-sha256")] | ||
| Sha256, |
There was a problem hiding this comment.
Consider using an explicit discriminant that matches with libgit2. Like
| Sha256, | |
| Sha256 = raw::GIT_OID_SHA256, |
or
| Sha256, | |
| Sha256 = 2, |
There was a problem hiding this comment.
Thanks for the suggestion. Some of the pre-existing translation enums in git2-rs follow the plain enum + Binding + explicit match pattern. libgit2_sys::git_oid_t is a numeric type alias, and ObjectFormat is the safe abstraction over it. Explicit discriminants would definitly simplify ObjectFormat::raw(), though from_raw still needs the match for safety against invalid C values. I'd keep ObjectFormat as-is for now.
|
|
||
| /// Object ID format (hash algorithm). | ||
| #[derive(Debug, Copy, Clone, PartialEq, Eq)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
I should have been more explicit when I suggested this in #1090 (comment), sorry.
Adding #[non_exhaustive] breaks downstream crates, since their matches will fail to compile. Thus I think this is a breaking change. This might be completely fine, as this crate is 0.x anyway, but I think it's worth noting. That being said...
Adding #[non_exhaustive] even when unstable-sha256 is enabled, might be "too much". This way you are preparing your users for whatever comes after SHA-256. Which might be your intention, and then that's fine. But you might as well consider that it might take another decade or two until that happens, and you could spare your dependents the _ => unimplemented!("...") arm. This would mean that adding the next object format is a breaking change. And that might be fine as well.
So one approach might be to have
| #[non_exhaustive] | |
| #[cfg_attr(not(feature = "unstable-sha256"), non_exhaustive)] |
now, and whenever SHA-256 becomes stable, to remove non_exhaustive.
This way, all dependents not enabling unstable-sha256 will be forced to think about further object formats (which might make sense, since SHA-256 is already out in the wild), and everyone that opts in to unstable-sha256 can enjoy that this enum is exhaustive. The migration (as SHA-256 stabilizes) also works out nicely: The _ => unimplemented!("...") arm nicely takes the spot of the Sha256 variant, but since now the non_exhaustive attribute would be removed no third arm is required once the dependent decides to implement support for SHA-256.
There was a problem hiding this comment.
Then this is probably not worthy. We will anyway have another major version bump when stabilizing sha256 feature.
There was a problem hiding this comment.
Think about it more, yeah this sounds good.
There was a problem hiding this comment.
Fixed and squashed into 17d0f51. Thanks again for the forward-looking suggestion!
| #[non_exhaustive] | ||
| pub enum ObjectFormat { | ||
| /// SHA1 object format (20-byte object IDs) | ||
| Sha1, |
There was a problem hiding this comment.
| Sha1, | |
| Sha1 = raw::GIT_OID_SHA1, |
or
| Sha1, | |
| Sha1 = 1, |
(See https://github.com/rust-lang/git2-rs/pull/1206/changes#r3207277482.)
| @@ -65,36 +106,78 @@ impl Oid { | |||
| pub fn from_bytes(bytes: &[u8]) -> Result<Oid, Error> { | |||
| crate::init(); | |||
| let mut raw = crate::util::zeroed_raw_oid(); | |||
There was a problem hiding this comment.
If unstable-sha256 is enabled, raw.kind will be zero and thus not a valid object format!
There was a problem hiding this comment.
@lorenzleutgeb commented in #1206 (review)
(inline comment is better for discussion)
I will double down on what I explained in #1206 (comment): If the feature
unstable-sha256is enabled, the implementation of the unsafe functioncrate::util::zeroed_raw_oidis incorrect. It allows to create OIDs in memory, which are invalid. Theirkindof value zero does not correspond to a knowngit2::ObjectFormatand also neither to libgit2'sGIT_OID_SHA1norGIT_OID_SHA256. That is because the function usesstd::mem::zeroedbut the object formats are non-zero bytes. This is a huge footgun. Please consider these options:
- Remove
crate::util::zeroed_raw_oid.- Remove the
unsafekeyword fromcrate::util::zeroed_raw_oidand reimplement it withoutunsafeblocks. This will force you to add an argumentformat: git2::ObjectFormat, otherwise, ifunstable-sha256is enabled, you will not be able to construct a legal zero OID.- Use
#[cfg(not(feature = "unstable-sha256"))] crate::util::zeroed_raw_oid.
There was a problem hiding this comment.
Thanks for the detailed follow-up. Will do a cleanup then, though let me clarify the current state:
Oid::zero()is now deprecated and returnsOid::ZERO_SHA1always. The reasoning being high-level binding should not have this kind of zeroed struct as output param. If there is any probably an API design flaw. If people want oid placeholder, they probalby already know the object format so can choice either const. Tthanks for the const suggestion!zeroed_raw_oid()ispub(crate). Every internal call site uses it as an output parameter that libgit2 overwrites before any read. Agreed it may become an internal footgun if misused.- FWIW, libgit2 itself uses uninitialized
git_oidas output buffers in lots of places. Likemaybe_sha_or_abbrev:git_oid oid;is declared uninitialized, then filled bygit_oid__fromstrn(&oid, ...)before any read.
Though thanks again, removing one more unsafe is a good thing! Fixed in 8c17e75
The only one concern is that if libgit2 somehow start checking kind for output parameter cases, it might become buggy. Though that will be considered an upstream libgit2 bug.
This is a preparation of SHA256 support * Added `Oid::object_format()` * Added `Oid::raw_bytes()` * Changed `Oid::from_bytes()` to use `GIT_OID_SHA1_SIZE` * Added `Repository::object_format()` * Added `Remote::object_format()` * Added `RepositoryInitOptions::object_format()`
Add `*_ext` method variants that accept an `ObjectFormat` param. The short-hand methods now delegate to `_ext` variants with Sha1. * Added `Oid::from_str_ext` * Added `Oid::hash_object_ext` * Added `Oid::hash_file_ext` * Added `Diff::from_buffer_ext` * Added `Index::new_ext` * Added `Index::open_ext` * Added `Indexer::new_ext` * Added `Odb::new_ext`
This makes these call sites automatically handle the correct object format once SHA256 support is wired up.
Declare the `unstable-sha256` Cargo feature,
reflecting upstream libgit2's `GIT_EXPERIMENTAL_SHA256`.
This is an ABI-breaking experimental feature.
Implement the SHA256 support gated behind this feature:
Things gated:
* `ObjectFormat::Sha256` and non_exhaustive when `unstable-sha256` is off
* all FFI call sites for the overloaded libgit2 function signatures
* git_oid_fromstrn
* git_oid_fromraw,
* git_odb_hash
* git_odb_hashfile
* git_diff_from_buffer,
* git_index_new
* git_index_open
* git_indexer_new
* git_odb_new
* `Repository::object_format` is format-aware
* `Remote::object_format` is format-aware
* `Oid::{as_bytes,object_format,from_bytes}` are format-aware
`Oid::zero().object_format()` panics with `unstable-sha256` because the raw `kind` field was left as zero. The zeroed struct is generally use for output param. Not really meaning to switch a `zero_ext` variant. Replace it with const values for both kinds.
Construct `git_oid` with a struct literal instead of `mem::zeroed`. Under `unstable-sha256`, default `kind` to `GIT_OID_SHA1` so the value is a valid `git_oid_t` even before libgit2 overwrites it as an output buffer.
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
| /// Object ID format (hash algorithm). | ||
| #[derive(Debug, Copy, Clone, PartialEq, Eq)] | ||
| #[cfg_attr(not(feature = "unstable-sha256"), non_exhaustive)] | ||
| pub enum ObjectFormat { |
There was a problem hiding this comment.
Downstream I just attempted to print an ObjectFormat and realized that it does not impl Display. Perhaps that might be a useful addition?
| /// permissions to create the repository with | ||
| flag_shared: Option<String>, | ||
| #[structopt(name = "object-format", long, value_parser = parse_object_format)] | ||
| /// object format to use (sha1 or sha256, requires unstable-sha256 feature) |
There was a problem hiding this comment.
| /// object format to use (sha1 or sha256, requires unstable-sha256 feature) | |
| /// object format to use (sha1 or sha256, requires unstable-sha256 feature to use the sha256 format) |
| /// This parses the diff assuming SHA1 object IDs. Use | ||
| /// [`Diff::from_buffer_ext`] to specify a different format. | ||
| pub fn from_buffer(buffer: &[u8]) -> Result<Diff<'static>, Error> { | ||
| Self::from_buffer_ext(buffer, crate::ObjectFormat::Sha1) |
There was a problem hiding this comment.
I can't make suggestions on the unmodified line, but I suggest adding an import for crate::ObjectFormat rather than needing to fully qualify it
| let data = buffer.as_ptr() as *const c_char; | ||
| let len = buffer.len(); | ||
| unsafe { | ||
| // NOTE: Doesn't depend on repo, so lifetime can be 'static |
There was a problem hiding this comment.
this explanation of why things can be 'static is lost
| /// This always creates a SHA1 index. | ||
| /// Use [`Index::new_ext`] to create an index with a specific object format. | ||
| pub fn new() -> Result<Index, Error> { | ||
| Self::new_ext(crate::ObjectFormat::Sha1) |
There was a problem hiding this comment.
same suggestion to add a use statement for the ObjectFormat, also applies in some other files, not going to keep repeating the suggestion
| /// Parses a hex-formatted object id | ||
| /// with a specific object format. | ||
| /// | ||
| /// See [`Oid::from_str`] for more details. |
There was a problem hiding this comment.
except ::from_str documents that it is an error if the string is longer than 40 hex characters, and presumably for this new method the maximum length is format-dependent, so this method should have some extra documentation
| #[cfg(not(feature = "unstable-sha256"))] | ||
| { | ||
| if bytes.len() != raw::GIT_OID_SHA1_SIZE { | ||
| return Err(Error::from_str("raw byte array must be 20 bytes")); |
There was a problem hiding this comment.
If the error message is propagated to end users, presumably the end users had some kind of say in the input value
| Ok(Oid { raw: out }) | ||
| } | ||
|
|
||
| /// View this OID as a byte-slice 20 bytes in length. |
There was a problem hiding this comment.
can we document that this is going to truncate sha256 ids?
There was a problem hiding this comment.
I'm not sure I understand, why would this truncate?
I would agree that the doc comment should probably be updated, though, to mention it can be 32 bytes for sha256.
|
|
||
| /// View the full underlying byte buffer of this OID. | ||
| /// | ||
| /// Currently equivalent to [`Oid::as_bytes`]. |
There was a problem hiding this comment.
This is a bin confusing - the "Currently equivalent" assumes that the feature is not enabled, but then the next line discusses that it could be enabled. Maybe
| /// Currently equivalent to [`Oid::as_bytes`]. | |
| /// For SHA1 hashes, this is equivalent to [`Oid::as_bytes`]. |
| use tempfile::TempDir; | ||
|
|
||
| #[test] | ||
| #[cfg(not(feature = "unstable-sha256"))] |
There was a problem hiding this comment.
so I noticed this on a few other files too - why are various tests disabled when using unstable-sha256? If they would be failing, then that is indicative of a BC break, and if they would be passing, then we should presumably keep the tests enabled?
|
|
||
| // SHA1 OID comparisons with explicit format | ||
| assert_eq!( | ||
| Oid::from_str_ext("decbf2b", ObjectFormat::Sha1)?, |
There was a problem hiding this comment.
similarly, for the *_ext methods when using Sha1, those parts can also be tested even when sha256 support is not enabled
|
Note: Git 3 will switch to reftable by default (source). Reftable support was only just merged in libgit2/libgit2#7117 which presumably will be in libgit2 2.0. SHA256 support is a major step towards being compatible with git3, but reftable will be another large piece that will be required. Fortunately I don't think that requires any API changes, though we should probably plan to do some testing ahead of the libgit2 2.0 release. |
| - run: cargo test --features https,ssh | ||
| - run: cargo run -p systest | ||
| - run: cargo run -p systest --features unstable-sha256 | ||
| - run: cargo test --locked |
There was a problem hiding this comment.
Moving these breaks the --locked check since the earlier cargo run will update the lockfile. Can you either move them back, or add --locked to the other commands?
| - run: cargo run -p systest --features unstable-sha256 | ||
| - run: cargo test --locked | ||
| - run: cargo test --features https,ssh | ||
| - run: cargo test --features unstable-sha256 |
There was a problem hiding this comment.
Can we also make sure this covers networking (and examples)? Maybe something like:
| - run: cargo test --features unstable-sha256 | |
| - run: cargo test --features https,ssh,unstable-sha256 |
| #[cfg(not(feature = "unstable-sha256"))] | ||
| id: Oid::from_bytes(&[0; 20]).unwrap(), | ||
| #[cfg(feature = "unstable-sha256")] | ||
| id: Oid::from_bytes(&[0; 32]).unwrap(), |
There was a problem hiding this comment.
Just a suggestion, take it or leave it:
| #[cfg(not(feature = "unstable-sha256"))] | |
| id: Oid::from_bytes(&[0; 20]).unwrap(), | |
| #[cfg(feature = "unstable-sha256")] | |
| id: Oid::from_bytes(&[0; 32]).unwrap(), | |
| #[cfg(not(feature = "unstable-sha256"))] | |
| id: Oid::ZERO_SHA1, | |
| #[cfg(feature = "unstable-sha256")] | |
| id: Oid::ZERO_SHA256, |
There was a problem hiding this comment.
I suggested these constants, so that will surprise no one, but I second that suggestion. More readable and less unwraping!
|
|
||
| #[test] | ||
| #[cfg(feature = "unstable-sha256")] | ||
| fn smooke_in_memory_index_sha256() { |
There was a problem hiding this comment.
| fn smooke_in_memory_index_sha256() { | |
| fn smoke_in_memory_index_sha256() { |
| Ok(Oid { raw: out }) | ||
| } | ||
|
|
||
| /// View this OID as a byte-slice 20 bytes in length. |
There was a problem hiding this comment.
I'm not sure I understand, why would this truncate?
I would agree that the doc comment should probably be updated, though, to mention it can be 32 bytes for sha256.
| /// With the `unstable-sha256` feature, these two methods will differ. | ||
| pub fn raw_bytes(&self) -> &[u8] { |
There was a problem hiding this comment.
Why was this method added? I don't see it used anywhere.
| #[cfg(feature = "unstable-sha256")] | ||
| pub fn repo_init_sha256() -> (TempDir, Repository) { |
There was a problem hiding this comment.
Instead of duplicating the code here, can we merge them somehow?
I might suggest having a repo_init_ext that takes an ObjectFormat as a parameter, and rewrite repo_init to just call repo_init_ext(ObjectFormat::Sha1).
Or any other approach, just try to remove the duplicate code.
This adds an
unstable-sha256Cargo feature, as a follow-up of #1201.Also adds smoke tests for affected operations/types.
Part of #1090
Design
Introduce
ObjectFormataccessors and*_extAPI variants unconditionally,The actual SHA256 wiring are gated behind the
unstable-sha256Cargo featurein order to keep public API additive.
The
*_extsuffix follows libgit2's convention for extended function variants.At next major version, the original methods may absorb the format parameter.
The actual change will leave for future.
Insta-stable
ObjectFormatenum (#[non_exhaustive]when unstable-sha256 is not enabled)Oid::ZERO_SHA1constOid::object_format()to access underlying OID typeOid::raw_bytes()to return the full underlying byte bufferRepository::object_format()to get hash algo on a repoRemote::object_format()to get hash algo on a remoteRepositoryInitOptions::object_format()setterOid::from_str_ext/Oid::hash_object_ext/Oid::hash_file_extDiff::from_buffer_extIndex::new_ext/Index::open_extIndexer::new_extOdb::new_extOid::zero()in favor ofOid::ZERO_SHA1Behind
unstable-sha256ObjectFormat::Sha256enum variantOid::ZERO_SHA256const_extmethods dispatch to SHA256-aware FFI pathsOid::as_bytesreturns logical bytes (SHA1 -> 20; SHA256 -> 32)Oid::from_bytesaccepts 20 or 32 bytes