Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2373 +/- ##
=======================================
Coverage 47% 47%
- Complexity 6560 6571 +11
=======================================
Files 780 780
Lines 64398 64398
Branches 9628 9628
=======================================
+ Hits 30509 30517 +8
+ Misses 31569 31555 -14
- Partials 2320 2326 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
rodinaarssen
left a comment
There was a problem hiding this comment.
Cool! Just a few remarks and questions.
DavyLandman
left a comment
There was a problem hiding this comment.
Pinging @jurgenvinju since he has in the past done some more stuff around formatting & indentation.
I have the feeling we're missing something in this PR, a feature we already have in rascal. (aside from the fact I don't really like all these regexps and char loops)
| } | ||
|
|
||
| @synopsis{Split a string to an indentation prefix and the remainder of the string.} | ||
| tuple[str indentation, str rest] splitIndentation(/^<indentation:\s*><rest:.*>/) |
There was a problem hiding this comment.
what if the string contains multiple lines?
There was a problem hiding this comment.
I think that should be an explicit exception (invalid argument or similar)
| int count = size(findAll(input, nl)); | ||
| linesepCounts[nl] = count; | ||
| // subtract all occurrences of substrings of newline characters that we counted before | ||
| for (str snl <- substrings(nl), linesepCounts[snl]?) { |
There was a problem hiding this comment.
this almost looks like pattern matching on strings? (which we only have reasonable support over)
for example:
rascal>visit("abcd") { case str m : println(m); }
abcd
bcd
cd
d
come to think of it, this whole function smells like an parsing automata. Where we build a big state table of all the possible matches and then iterate through all the chars and count the matches based on their state.
In java this would be 20/30 lines, but in rascal we might be missing some primitives (as we don't have a character loop).
There was a problem hiding this comment.
If we hard-code the set of newline characters (e.g. to all Unicode newline chars), we could write it as a grammar and use the parser generator. Downside (as we discussed) is that all (transitive) imports of this module will trigger generation of a parser. We could also move some of those to a specific Format module.
fa8fe8b to
836f117
Compare
836f117 to
8d8de3e
Compare
This PR adds many generic tools that can be used to process/format strings.
TODO