Make scancode.api.get_licenses more widely useful by reducing dependencies #5115

@gotmax23

Short Description

In go-vendor-tools (https://fedora.gitlab.io/sigs/go/go-vendor-tools/), we use the scancode.api.get_licenses() Python API to detect the license texts contained in license files and then produce an SPDX expression (relevant code). The scancode license detector is very powerful thanks to its large corpus of license texts, but its huge dependency footprint has prevented us from making it the default license scanner in go-vendor-tools (currently, askalono is the default). Many of these dependencies exist to support package scanning features that we don't use. I wonder what it would look like to extract the license scanning–related APIs into a smaller subpackage (or to make more dependencies optional Python extras) without disrupting scancode-toolkit as a whole. Do you think that'd be feasible?
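For context, here is a minimal sketch of the kind of call we depend on. It is not the actual go-vendor-tools code: the helper names `spdx_from_result` and `detect_spdx` are hypothetical, and the `detected_license_expression_spdx` key is an assumption based on what recent scancode-toolkit releases appear to return from `get_licenses()`.

```python
from typing import Any, Mapping, Optional


def spdx_from_result(result: Mapping[str, Any]) -> Optional[str]:
    """Pull the SPDX expression out of a get_licenses()-style mapping."""
    # Assumption: the mapping carries a "detected_license_expression_spdx"
    # key whose value is None (or missing) when nothing was detected.
    expression = result.get("detected_license_expression_spdx")
    return expression or None


def detect_spdx(path: str) -> Optional[str]:
    """Scan one license file and return its SPDX expression, if any."""
    # Imported lazily so that merely loading this module does not
    # require scancode-toolkit and its dependencies to be installed.
    from scancode.api import get_licenses

    return spdx_from_result(get_licenses(path))
```

This single function is essentially the whole surface we need from scancode, which is why the full dependency set feels heavy for our use case.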

Possible Labels

  • new feature
  • improve-license-detection
  • installation and packaging
  • core and api

Select Category

  • Enhancement
  • Add License/Copyright
  • Scan Feature
  • Packaging
  • Documentation
  • Expand Support
  • Other

Describe the Update

A mechanism to use the scancode/licensedcode file license detection API (scancode.api.get_licenses()) without pulling in so many dependencies.

How This Feature will help you/your organization

See earlier comment about Go Vendor Tools' license scanning functionality.

Possible Solution/Implementation Details

See above. Reduction of dependencies in the main scancode-toolkit package (e.g., by using extras) or extraction of the license scanning API into a separate package are possible solutions.
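To illustrate the extras idea, here is a rough sketch of how a feature could check for an optional dependency at the point of use instead of requiring it at install time. The helper names and the extra name are hypothetical; none of this reflects how scancode-toolkit is organized today.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module is importable in this environment."""
    return importlib.util.find_spec(name) is not None


def require(name: str, extra: str) -> None:
    """Fail with an actionable message when an optional dependency is absent."""
    if not has_module(name):
        # "extra" is a hypothetical extras name used only for the message.
        raise RuntimeError(
            f"{name!r} is not installed; install the {extra!r} extra "
            "to enable this feature"
        )
```

A package scanning code path could call something like `require("some_dependency", "packages")` before doing its work, while license detection would keep working with only the core dependencies installed.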

Example/Links if Any

Can you help with this Feature
