---
title: 'PHP’s New URI Extension: An Open Source Success Story'
layout: post
tags:
  - stories
author:
  - name: Tim Düsterhus
    url: https://github.com/TimWolla/

published_at: 10 October 2025

---

URLs are a fundamental building block of the Web we rely on every day.

Their familiarity makes them appear deceptively simple: seemingly well-delineated
components like scheme, hostname, path, and some others suggest that
it’s trivial to extract information from a URL. In reality, there are thousands
of custom parsers built over the years, each with their own take on details.

For us web developers, there are two main standards specifying how URLs are
supposed to work: [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986),
the original URI standard from 2005, and the [WHATWG URL Living
Standard](https://url.spec.whatwg.org/), which is followed by web browsers.
Because things are not as simple as they appear at first glance, these two
commonly used standards are incompatible with each other! Mixing and matching
different standards and their parsers, especially when they do not *exactly*
follow the standard, is something that [commonly leads to security
issues](https://daniel.haxx.se/blog/2022/01/10/dont-mix-url-parsers/).

## Why Change Was Needed

Despite the importance of correctly parsing URLs, PHP did not include any
standards-compliant parser within the standard library for the longest time.
There is the
[`parse_url()`](https://www.php.net/manual/en/function.parse-url.php) function,
which has existed since PHP 4, but it does not follow any standard and is
explicitly documented not to be used with untrusted or malformed URLs.
Nevertheless, it is commonly used, both for lack of a readily available
alternative and because it appears to work correctly for the majority
of well-formed inputs that developers encounter in day-to-day work. This can
mislead developers into believing that the security issues of `parse_url()` are a
purely theoretical problem rather than something that *will* cause issues
sooner or later.

As an example, the input URL `example.com/example/:8080/foo` is a valid URL
consisting of only a relative path according to RFC 3986. It is invalid
according to the WHATWG URL standard when not resolved against a base URL.
However, according to `parse_url()` it is a URL for the host `example.com`,
port 8080 and path `/example/:8080/foo`, thus including the 8080 in *two* of
the resulting components:

```php
<?php

var_dump(parse_url('example.com/example/:8080/foo'));

/*
array(3) {
  ["host"]=> string(11) "example.com"
  ["port"]=> int(8080)
  ["path"]=> string(18) "/example/:8080/foo"
}
*/
```

## Introducing a New API

This changes with PHP 8.5. Going forward, PHP will include standards-compliant
parsers for both RFC 3986 and the WHATWG URL standard as an *always-available*
part of its standard library within a new “URI” extension. Not only will this
enable easy, correct, and secure parsing of URLs according to the respective
standard, but the URI extension also includes functionality to modify
individual components of a URL.

```php
<?php

use Uri\Rfc3986\Uri;

$url = new Uri('HTTPS://thephp.foundation:443/sp%6Fnsor/');

$defaultPortForScheme = match ($url->getScheme()) {
    'http' => 80,
    'https' => 443,
    'ssh' => 22,
    default => null,
};

// Remove default ports from URLs.
if ($url->getPort() === $defaultPortForScheme) {
    $url = $url->withPort(null);
}

// Getters normalize the URL by default. The `Raw`
// variants return the input unchanged.

echo $url->toString(), PHP_EOL;
// Prints: https://thephp.foundation/sponsor/
echo $url->toRawString(), PHP_EOL;
// Prints: HTTPS://thephp.foundation/sp%6Fnsor/
```

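For comparison with the `parse_url()` result shown earlier, the following is a minimal sketch of how the two new parsers might be applied to the same input. It assumes PHP 8.5 with the URI extension and the static `parse()` methods described in the RFC, which return `null` instead of throwing when the input is invalid. The base URL `https://thephp.foundation/` is only an illustrative choice.

```php
<?php

use Uri\Rfc3986\Uri;
use Uri\WhatWg\Url;

$input = 'example.com/example/:8080/foo';

// RFC 3986: the input is a relative reference consisting only of a path,
// so there is no host and no port.
$rfc3986 = Uri::parse($input);
var_dump($rfc3986->getHost()); // NULL
var_dump($rfc3986->getPort()); // NULL
var_dump($rfc3986->getPath()); // "example.com/example/:8080/foo"

// WHATWG URL: without a base URL the input is invalid.
$whatwg = Url::parse($input);
var_dump($whatwg); // NULL

// WHATWG URL with a base URL (illustrative): the input is resolved
// as a relative path against the base.
$base = new Url('https://thephp.foundation/');
$resolved = Url::parse($input, $base);
var_dump($resolved->getAsciiHost()); // "thephp.foundation"
var_dump($resolved->getPath());      // "/example.com/example/:8080/foo"
```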
## Thoughtfully Built to Last

In this post we not only want to showcase the functionality but also tell you
the story of how this project developed and how work gets done in PHP to keep
the language modern and a great choice for web development. There is often more
work behind new PHP features than meets the eye. We hope to provide some
insight into why we prefer doing things right rather than fast.

[Máté Kocsis](https://github.com/kocsismate) from The PHP Foundation’s dev team
started the discussion of his [RFC for a new URL parsing
API](https://wiki.php.net/rfc/url_parsing_api) in June 2024. Given PHP’s
strong backwards compatibility promise, the new API needed to get things right
on the first attempt in order to serve the PHP community well for the decade to
come without introducing disruptive changes. Thus, over the course of *almost
one year*, [more than 150 emails on the PHP Internals
list](https://news-web.php.net/php.internals/123997) were sent. Additionally,
several off-list discussions took place in various chat rooms. Throughout
this process, various experts from the PHP community continuously refined the
RFC. They discussed even seemingly insignificant details, to provide not just a
standards-compliant implementation, but also a clean and robust API that will
guide developers towards the right solution for their use case. We also planned
ahead and made sure that the new URI extension with its dedicated `Uri`
namespace provides a clear path forward to add additional URI/URL-related
functionality in future versions of PHP.

The RFC ultimately went to vote in May 2025 and was accepted with a 30:1 vote.
But work didn’t stop there: The proposed API also had to be implemented and
reviewed. Instead of building a PHP-specific solution, Máté opted to stand on
the shoulders of giants and selected two libraries to perform the heavy
lifting. The [uriparser library](https://uriparser.github.io/) provides the RFC
3986 parser, and the [Lexbor library](https://lexbor.com/), which is already
used by PHP 8.4’s new DOM API, provides the WHATWG parser.

## Open Source Collaboration

As part of the integration, Máté and The PHP Foundation worked together with
the upstream maintainers to include missing functionality in the respective
libraries. As an example, neither library included functionality to cheaply
duplicate the internal data structures, which was necessary to support cloning
the readonly PHP objects representing the parsed URL when attempting to modify
individual components with the so-called with-er methods (e.g.,
`->withPort(8080)`). The uriparser library also did not include any functions
for modifying components of a parsed URL. All this functionality is now
available in the upstream libraries for everyone to use and benefit from.

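From the PHP side, the with-er behavior described above can be sketched as follows. This is a small illustration that assumes PHP 8.5 with the URI extension; the variable names are arbitrary. Each with-er call returns a new object and leaves the original untouched, which is why the underlying parsed structure has to be duplicated.

```php
<?php

use Uri\Rfc3986\Uri;

$original = new Uri('https://thephp.foundation/sponsor/');

// withPort() does not modify $original; it returns a modified copy.
$modified = $original->withPort(8080);

var_dump($original->getPort());    // NULL
var_dump($modified->getPort());    // int(8080)
var_dump($original === $modified); // bool(false)

echo $original->toString(), PHP_EOL;
// Prints: https://thephp.foundation/sponsor/
echo $modified->toString(), PHP_EOL;
// Prints: https://thephp.foundation:8080/sponsor/
```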
The review and testing of Máté’s PHP implementation was carried out by PHP
community contributors [Niels Dossche](https://github.com/nielsdos/) and
[Ignace Nyamagana Butera](https://github.com/nyamsprod/). This included
reviewing and testing the new functionality that had been added to the two
upstream libraries. [Tideways, a founding member and Silver
sponsor](https://thephp.foundation/#sponsors_silver) of The PHP Foundation,
also sponsored engineering time; their contribution came in the form of [Tim
Düsterhus](https://github.com/TimWolla/). During the review and testing, these
reviewers discovered several pre-existing bugs in the upstream libraries. They
submitted fixes to the upstream maintainers, [Sebastian
Pipping](https://github.com/hartwork) (uriparser) and [Alexander
Borisov](https://github.com/lexborisov) (Lexbor), who quickly reviewed and
applied them.

## Test It Now

This work paid off, and PHP’s new URI extension with not just one but two
feature-rich and standards-compliant URI implementations is fully available for
testing with PHP 8.5 RC 1.

If you’d like to see further improvements to PHP’s standard library, please
consider [sponsoring The PHP Foundation](https://thephp.foundation/sponsor/).