Commit 73a347c

TimWolla authored and pronskiy committed
Add “PHP’s New URI Extension” blog post
1 parent 44138a0 commit 73a347c

1 file changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
---
title: 'PHP’s New URI Extension: An Open Source Success Story'
layout: post
tags:
  - stories
author:
  - name: Tim Düsterhus
    url: https://github.com/TimWolla/

published_at: 10 October 2025

---

URLs are a fundamental building block of the Web we rely on every day.

Their familiarity makes them appear deceptively simple: seemingly clearly
delineated components such as scheme, hostname, and path suggest that it’s
trivial to extract information from a URL. In reality, thousands of custom
parsers have been built over the years, each with its own take on the details.

For us web developers, there are two main standards specifying how URLs are
supposed to work: [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986),
the original URI standard from 2005, and the [WHATWG URL Living
Standard](https://url.spec.whatwg.org/), which web browsers follow.
Because things are not as simple as they appear at first glance, these two
commonly used standards are incompatible with each other! Mixing and matching
different standards and their parsers, especially when the parsers do not
*exactly* follow the standard, [commonly leads to security
issues](https://daniel.haxx.se/blog/2022/01/10/dont-mix-url-parsers/).

## Why Change Was Needed

Despite the importance of correctly parsing URLs, PHP did not include any
standards-compliant parser within the standard library for the longest time.
There is the
[`parse_url()`](https://www.php.net/manual/en/function.parse-url.php) function,
which has existed since PHP 4, but it does not follow any standard and is
explicitly documented not to be used with untrusted or malformed URLs.
Nevertheless, it is commonly used, both for lack of a readily available
alternative and because it appears to work correctly for the majority of
well-formed inputs that developers encounter in day-to-day work. This can
mislead developers into believing that the security issues of `parse_url()`
are a purely theoretical problem rather than something that *will* cause
issues sooner or later.

As an example, the input URL `example.com/example/:8080/foo` is a valid URL
consisting of only a relative path according to RFC 3986. It is invalid
according to the WHATWG URL standard when not resolved against a base URL.
However, according to `parse_url()` it is a URL for the host `example.com`,
port 8080 and path `/example/:8080/foo`, thus including the 8080 in *two* of
the resulting components:

```php
<?php

var_dump(parse_url('example.com/example/:8080/foo'));

/*
array(3) {
  ["host"]=> string(11) "example.com"
  ["port"]=> int(8080)
  ["path"]=> string(18) "/example/:8080/foo"
}
*/
```

## Introducing a New API

This changes with PHP 8.5. Going forward, PHP will include standards-compliant
parsers for both RFC 3986 and the WHATWG URL standard as an *always-available*
part of its standard library within a new “URI” extension. Not only will this
enable easy, correct, and secure parsing of URLs according to the respective
standard, but the URI extension also includes functionality to modify
individual components of a URL.

```php
<?php

use Uri\Rfc3986\Uri;

$url = new Uri('HTTPS://thephp.foundation:443/sp%6Fnsor/');

$defaultPortForScheme = match ($url->getScheme()) {
    'http' => 80,
    'https' => 443,
    'ssh' => 22,
    default => null,
};

// Remove default ports from URLs.
if ($url->getPort() === $defaultPortForScheme) {
    $url = $url->withPort(null);
}

// Getters normalize the URL by default. The `Raw`
// variants return the input unchanged.

echo $url->toString(), PHP_EOL;
// Prints: https://thephp.foundation/sponsor/
echo $url->toRawString(), PHP_EOL;
// Prints: HTTPS://thephp.foundation/sp%6Fnsor/
```
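
As a rough sketch, the input from the earlier `parse_url()` example can also be
passed to both new parsers. The static `parse()` methods used here are taken
from the URL parsing API RFC and are assumed to return `null` for invalid input
instead of throwing; the base URL is an arbitrary choice for illustration. The
exact output is best verified against a PHP 8.5 build.

```php
<?php

use Uri\Rfc3986\Uri;
use Uri\WhatWg\Url;

$input = 'example.com/example/:8080/foo';

// RFC 3986: the input is a relative reference consisting only of a path,
// so neither a host nor a port should be reported.
$rfc3986 = Uri::parse($input);
var_dump($rfc3986?->getHost(), $rfc3986?->getPort(), $rfc3986?->getPath());

// WHATWG: the input cannot be parsed without a base URL,
// so parse() is expected to return null.
$whatwg = Url::parse($input);
var_dump($whatwg);

// WHATWG with a base URL (arbitrary base, chosen for illustration):
// the input is resolved as a path relative to the base.
$resolved = Url::parse($input, new Url('https://thephp.foundation/'));
var_dump($resolved?->toAsciiString());
```

Unlike `parse_url()`, neither parser should report `8080` as a port here,
because each one applies the rules of a single standard consistently.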

## Thoughtfully Built to Last

In this post we not only want to showcase the functionality but also tell you
the story of how this project developed and how work gets done in PHP to keep
the language modern and a great choice for web development. There is often more
work behind new PHP features than meets the eye. We hope to provide some
insight into why we prefer doing things right rather than fast.

[Máté Kocsis](https://github.com/kocsismate) from The PHP Foundation’s dev team
started the discussion of his [RFC for a new URL parsing
API](https://wiki.php.net/rfc/url_parsing_api) in June 2024. Given PHP’s
strong backwards compatibility promise, the new API needed to get things right
on the first attempt in order to serve the PHP community well for the decade to
come without introducing disruptive changes. Thus, over the course of *almost
one year*, [more than 150 emails on the PHP Internals
list](https://news-web.php.net/php.internals/123997) were sent. Additionally,
several off-list discussions took place in various chat rooms. Throughout
this process, various experts from the PHP community continuously refined the
RFC. They discussed even seemingly insignificant details to provide not just a
standards-compliant implementation, but also a clean and robust API that will
guide developers towards the right solution for their use case. We also planned
ahead and made sure that the new URI extension with its dedicated `Uri`
namespace provides a clear path forward to add additional URI/URL-related
functionality in future versions of PHP.

The RFC ultimately went to vote in May 2025 and was accepted with a 30:1 vote.
But work didn’t stop there: the proposed API also had to be implemented and
reviewed. Instead of building a PHP-specific solution, Máté opted to stand on
the shoulders of giants and selected two libraries to perform the heavy
lifting. The [uriparser library](https://uriparser.github.io/) provides the RFC
3986 parser, and the [Lexbor library](https://lexbor.com/), which is already
used by PHP 8.4’s new DOM API, provides the WHATWG parser.

## Open Source Collaboration

As part of the integration, Máté and The PHP Foundation worked together with
the upstream maintainers to include missing functionality in the respective
libraries. As an example, neither library included functionality to cheaply
duplicate the internal data structures, which was necessary to support cloning
the readonly PHP objects representing the parsed URL when attempting to modify
individual components with the so-called with-er methods (e.g.,
`->withPort(8080)`). The uriparser library also did not include any functions
for modifying components of a parsed URL. All this functionality is now
available in the upstream libraries for everyone to use and benefit from.
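
A minimal sketch of what this means on the PHP side, assuming a PHP 8.5 build
with the URI extension: a with-er method leaves the original object untouched
and returns a modified copy. The URL is chosen purely for illustration.

```php
<?php

use Uri\Rfc3986\Uri;

$original = new Uri('https://thephp.foundation/sponsor/');

// withPort() does not change $original. It returns a new object,
// which is why the parsed data has to be duplicated internally.
$modified = $original->withPort(8080);

echo $original->toString(), PHP_EOL;
// Prints: https://thephp.foundation/sponsor/
echo $modified->toString(), PHP_EOL;
// Prints: https://thephp.foundation:8080/sponsor/
```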

The review and testing of Máté’s PHP implementation was carried out by PHP
community contributors [Niels Dossche](https://github.com/nielsdos/) and
[Ignace Nyamagana Butera](https://github.com/nyamsprod/). This included
reviewing and testing the new functionality that had been added to the two
upstream libraries. [Tideways, a founding member and Silver
sponsor](https://thephp.foundation/#sponsors_silver) of The PHP Foundation,
also sponsored engineering time; their contribution came in the form of [Tim
Düsterhus](https://github.com/TimWolla/). During the review and testing, these
reviewers discovered several pre-existing bugs in the upstream libraries. They
submitted fixes to the upstream maintainers, [Sebastian
Pipping](https://github.com/hartwork) (uriparser) and [Alexander
Borisov](https://github.com/lexborisov) (Lexbor), who quickly reviewed and
applied them.

## Test It Now

This work paid off, and PHP’s new URI extension with not just one but two
feature-rich and standards-compliant URI implementations is fully available for
testing with PHP 8.5 RC 1.

If you’d like to see further improvements to PHP’s standard library, please
consider [sponsoring The PHP Foundation](https://thephp.foundation/sponsor/).