Adapt to extended alerts source data#5
Conversation
| # Try to extract CWE from help_uri (e.g. cwe.mitre.org/data/definitions/78.html) | ||
| help_uri = str(alert.get("help_uri") or "") | ||
| m = re.search(r"cwe\.mitre\.org/data/definitions/(\d+)", help_uri, flags=re.IGNORECASE) | ||
| if m: | ||
| return f"CWE-{m.group(1)}" |
There was a problem hiding this comment.
I do not understand the purpose of this block of code. I would like to get it explained.
There was a problem hiding this comment.
The m.group(1) represents extracted value after "CVE-". This value is extracted in IGNORECASE regime.
So in f"CWE-{m.group(1)}" we define expected CASE and add only extracted part in regex: (CVE-\d{4}-\d+).
I have changed the code to stop use group with () in regexes as this code solution is not easy readable.
| # Semver-like token: digits.digits with optional pre-release suffix | ||
| _VERSION_RE = r"\d+(?:\.\d+)+(?:[-.]\w+)*" | ||
|
|
||
| # Common verb group shared across patterns. | ||
| _FIX_VERBS = r"(?:fixed|patched|resolved|addressed)" | ||
|
|
||
| # Ordered list of patterns – first match wins. | ||
| _FIX_VERSION_PATTERNS: list[re.Pattern[str]] = [ | ||
| # "fixed in version 10.2.1" / "patched in version 4.1.124.Final" | ||
| # "resolved in version 3.0.0" / "addressed in version 2.1.0" | ||
| re.compile( | ||
| rf"{_FIX_VERBS}\s+in\s+versions?\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "patched on 4.17.23" | ||
| re.compile( | ||
| rf"{_FIX_VERBS}\s+on\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "fixed in jsPDF@4.2.0" / "fixed in jspdf@4.1.0" | ||
| re.compile( | ||
| rf"{_FIX_VERBS}\s+in\s+\S+@({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "fixed in jsPDF 4.2.0" (package-space-version, no @) | ||
| re.compile( | ||
| rf"{_FIX_VERBS}\s+in\s+[A-Za-z][\w-]*\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "fixed in 1.10.1" (bare version right after "in") | ||
| re.compile( | ||
| rf"{_FIX_VERBS}\s+in\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "upgrade/upgrading to version 2.0" / "update/updating to version 2.0" | ||
| re.compile( | ||
| rf"(?:upgrad|updat)(?:e|ing)\s+to\s+version\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "upgrade to 2.0" / "update to 2.0" (no "version" keyword) | ||
| re.compile( | ||
| rf"(?:upgrad|updat)(?:e|ing)\s+to\s+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "first patched version: 4.3.0" / "first_patched_version: 4.3.0" | ||
| # (GitHub Dependabot / advisory format) | ||
| re.compile( | ||
| rf"first[_\s]patched[_\s]version[:\s]+({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "fix available in 3.2.1" / "fix is available in version 3.2.1" | ||
| re.compile( | ||
| rf"fix\s+(?:is\s+)?available\s+in\s+(?:version\s+)?({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "remediation: upgrade to 2.1.0" (Snyk-style) | ||
| re.compile( | ||
| rf"remediation[:\s]+(?:upgrad|updat)(?:e|ing)\s+to\s+(?:version\s+)?({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # ">= 1.5.0" / ">=1.5.0" (version-constraint notation) | ||
| re.compile( | ||
| rf">=\s*({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "versions 3.0.0 and later are not affected" | ||
| re.compile( | ||
| rf"versions?\s+({_VERSION_RE})\s+and\s+(?:later|above|newer)\s+(?:are\s+)?not\s+affected", | ||
| re.IGNORECASE, | ||
| ), | ||
| # "starting from version 2.4.0" / "starting from 2.4.0" | ||
| re.compile( | ||
| rf"starting\s+from\s+(?:version\s+)?({_VERSION_RE})", | ||
| re.IGNORECASE, | ||
| ), | ||
| ] |
There was a problem hiding this comment.
I would move these constants elsewhere than to code module with methods.
There was a problem hiding this comment.
This block of code were removed. There were misunderstanding and missing values in collected alerts from Security tab.
HuvarVer
left a comment
There was a problem hiding this comment.
I am not sure (from README), if the solution is manipulating data or if it just describe, what is the meaning of the fields.
There was a problem hiding this comment.
I know it is not placed in this PR, but here are two 1.
| ### Impact | ||
|
|
||
| **Derived** from `severity` × `rule_name`. | ||
| Exploitable code issues (`vulnerabilities`, `sast`) carry the severity rating directly as impact, while configuration issues (`iacMisconfigurations`, `pipelineMisconfigurations`) are down-ranked one level because they typically require additional conditions to cause damage. |
There was a problem hiding this comment.
What is the source of this decision? Does this mean that you are degrading the information that comes from the source data, or is this degradation already visible in the source data?
There was a problem hiding this comment.
There were misunderstanding and missing values in collected alerts from Security tab.
It is fixed now in code.
The readme were updated - I have removed that chapter as there is no extra logic now.
I did strict review and documented all places where data mutation is done. See README.md.
|
|
||
| Returns a CVE identifier when ``rule_id`` or ``help_uri`` contains one. | ||
| """ | ||
| for field in ("rule_id", "help_uri"): |
There was a problem hiding this comment.
look at this logic once again please.
There was a problem hiding this comment.
Code refactored.
| }.get(severity, "Unknown") | ||
|
|
||
|
|
||
| def assess_likelihood(alert: dict[str, Any]) -> str: |
There was a problem hiding this comment.
is already in the data IMO.
- Removed not needed code from bad understanding of dada source. - Fixed alert collector to mine all required data.
Data manipulation was an error in received data. I was missing some inputs and created evaluation. Fixed by fix alerts collector logic. |
…WE and CVE identifiers; update default values for impact, likelihood, and confidence.
…e alert processing by removing unused CWE extraction functions
…d enhance extra data extraction
Fixed wrong work with alert message vs inner message. Fixed run-time issue in model.
…change notifications and improve code clarity
…g in issue body construction
…ing new workflows, refactoring issue body handling, and removing deprecated dependencies
…d refactor related logic
…replace print statements with logging calls
…isibility in issue synchronization and GraphQL calls
| help_uri = alert_value(alert, "help_uri") | ||
| rule_id = str(alert.get("rule_id") or "") | ||
| extra = { | ||
| "cve": rule_id if rule_id.upper().startswith("CVE-") else NOT_AVAILABLE, |
There was a problem hiding this comment.
What about all rule ids that does not start with CVE?
There was a problem hiding this comment.
I see your observation. There is required to get CVE info, there is no field for value when CVE is not part of it. Due to another review comments I used rule_id in name of parent issues. So this information is not lost, just duplicated in case of CVE type.
| "impact": alert_value(alert, "impact") or NOT_AVAILABLE, | ||
| "likelihood": alert_value(alert, "likelihood") or NOT_AVAILABLE, | ||
| "confidence": alert_value(alert, "confidence") or NOT_AVAILABLE, | ||
| "remediation": _msg_param(alert, AlertMessageKey.MESSAGE) or NOT_AVAILABLE, |
There was a problem hiding this comment.
remediation is under remediation not message if I understand that correctly.
There was a problem hiding this comment.
Work "Remediation" is not present in received alert data. Usable information for remediation is in message's inner "Message" part.
| or rule_id | ||
| ) | ||
|
|
||
| title = alert_value(alert, "title", "rule_name", "rule_id") or rule_id |
There was a problem hiding this comment.
I see no case, when you would need multiple keys to fill the one title value. It will be always set by the output I am giving you with the previous step or with N/A.
There was a problem hiding this comment.
After re-evaluating this line of code, with all latest experiences I have removed it.
The title will be showing rule_id as there is more specific information usable for name.
| avd_id = ( | ||
| alert_value(alert, "avd_id", "rule_id") | ||
| or _msg_param(alert, AlertMessageKey.VULNERABILITY) | ||
| ) |
There was a problem hiding this comment.
This is the same thing, in what scenario you need three different tries to fill the template. When you have deterministic input. To me it looks like you are having too many options to fill the issue template with different data. And be inconsistent with the trust of the same source.
… issue body to use rule_id
tmikula-dev
left a comment
There was a problem hiding this comment.
The PoC is currently working. The code quality is not ideal. The refactored map is scheduled and much needed. Approving to unblock the following development.
Release Notes:
Closes #4