[Enhancement] (FE) Convert intra FE-to-FE calls to HTTPS when enabled#60921
[Enhancement] (FE) Convert intra FE-to-FE calls to HTTPS when enabled#60921nsivarajan wants to merge 5 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 28728 ms |
TPC-DS: Total hot run time: 183423 ms |
|
run buildall |
TPC-H: Total hot run time: 29084 ms |
TPC-DS: Total hot run time: 184255 ms |
FE UT Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Code Review Summary
This PR converts intra FE-to-FE HTTP communication to HTTPS when enable_https=true. The core approach is sound — centralizing URL construction in HttpURLUtil.buildInternalFeUrl() and adding SSL-aware HTTP clients in InternalHttpsUtils. However, I found several issues that should be addressed:
Critical Issues
-
JVM-wide global state mutation (
installTrustManagerForUrlConnection) —setDefaultSSLSocketFactory()andsetDefaultHostnameVerifier()modify JVM-wide defaults that affect ALLHttpsURLConnectioninstances in the process, not just FE-to-FE connections. This is called on everygetConnectionWithNodeIdent()invocation, which is wasteful and thread-unsafe. Additionally, the hostname verifier(hostname, session) -> truecompletely disables hostname verification globally, which could affect other HTTPS connections in the JVM (e.g., external catalog connections). -
Resource leak in
HttpUtils.executeRequest()—CloseableHttpClientis created but never closed. TheResponseHandlervariant ofexecute()closes the HTTP response, but NOT the client itself. This leaks connection pool resources. Other callers in the codebase (e.g.,MysqlLoadManager,HFUtils) properly usetry-with-resources. -
Incomplete conversion — Several FE-to-FE communication paths still hardcode
http://andConfig.http_port, which will break whenenable_https=trueandhttp_port=0. Key missed locations:HttpUtils.concatUrl()(line 75) — hardcodeshttp://HttpUtils.getFeList()(line 65) — hardcodesConfig.http_portSessionController.getOtherSessionInfo()— hardcodeshttp://NodeAction.concatFeSetConfigUrl()— hardcodeshttp://NodeAction.getFeList()andhandleConfigurationInfo()— hardcodeshttp://andConfig.http_port
Minor Issues
-
Removed useful comments — Two comments about using
Env.getServingEnv()instead ofgetCurrentEnv()were deleted. These documented a non-obvious requirement and should be preserved. -
Removed LOG.info in MetaHelper.doGet — The debug logging for URL, timeout, and headers was removed. This is useful for troubleshooting connectivity issues.
Checkpoints
- Correctness: Issues found (global state mutation, incomplete conversion, resource leak)
- Security: The approach of certificate validation + disabled hostname verification for internal communication is reasonable and documented, but the JVM-wide global mutation is a concern
- Performance:
installTrustManagerForUrlConnection()re-reads the truststore from disk and re-initializes the SSLContext on every call — should be cached/initialized once - Tests: Unit tests for
buildInternalFeUrl()are good; no tests for the SSL client creation or the HTTPS integration path
| sslContext.init(null, tmf.getTrustManagers(), null); | ||
|
|
||
| javax.net.ssl.HttpsURLConnection.setDefaultSSLSocketFactory(sslContext.getSocketFactory()); | ||
| javax.net.ssl.HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true); |
There was a problem hiding this comment.
Critical: JVM-wide global state mutation
setDefaultSSLSocketFactory() and setDefaultHostnameVerifier() modify JVM-wide defaults that affect ALL HttpsURLConnection instances in the entire process — not just the FE-to-FE connection being created. This is problematic because:
- Thread safety: Multiple threads calling this concurrently could race. There's no synchronization.
- Global hostname verification bypass:
(hostname, session) -> truedisables hostname verification for ALL futureHttpsURLConnectioninstances, including any external catalog connections or other HTTPS clients usingHttpsURLConnection. - Redundant re-initialization: This is called on every
getConnectionWithNodeIdent()invocation, re-reading the truststore from disk and re-initializing theSSLContexteach time.
Recommendation: Initialize the SSLContext and install the defaults once at startup (e.g., via a static initializer or an init() method called during FE bootstrap). Alternatively, set the SSL factory on the individual HttpsURLConnection instance rather than the global default:
HttpsURLConnection httpsConn = (HttpsURLConnection) url.openConnection();
httpsConn.setSSLSocketFactory(sslContext.getSocketFactory());
httpsConn.setHostnameVerifier(...);This avoids polluting the JVM-wide defaults.
|
|
||
| private static String executeRequest(HttpRequestBase request) throws IOException { | ||
| CloseableHttpClient client = getHttpClient(); | ||
| CloseableHttpClient client; |
There was a problem hiding this comment.
Bug: Resource leak — CloseableHttpClient is never closed
The CloseableHttpClient created here (both the HTTPS and plain HTTP paths) is never closed. The ResponseHandler variant of execute() closes the HTTP response, but NOT the client itself. This leaks the client's internal PoolingHttpClientConnectionManager and associated socket file descriptors.
Other callers in the codebase do this correctly (e.g., MysqlLoadManager.java uses try-with-resources).
Fix:
private static String executeRequest(HttpRequestBase request) throws IOException {
try (CloseableHttpClient client = request.getURI().getScheme().equalsIgnoreCase("https")
&& Config.enable_https
? InternalHttpsUtils.createValidatedHttpClient()
: HttpClientBuilder.create().build()) {
return client.execute(request, httpResponse -> EntityUtils.toString(httpResponse.getEntity()));
}
}Note: this resource leak also existed before this PR (the old code called getHttpClient() without closing it), but since this PR rewrites the method, it's a good opportunity to fix it.
| // because here we need to obtain selfNode through the official service catalog. | ||
| HostInfo selfNode = Env.getServingEnv().getSelfNode(); | ||
| conn.setRequestProperty(Env.CLIENT_NODE_HOST_KEY, selfNode.getHost()); | ||
| conn.setRequestProperty(Env.CLIENT_NODE_PORT_KEY, selfNode.getPort() + ""); |
There was a problem hiding this comment.
Deleted useful comments
The comments // Must use Env.getServingEnv() instead of getCurrentEnv(), // because here we need to obtain selfNode through the official service catalog. were removed here and in getNodeIdentHeaders(). These documented a non-obvious architectural requirement. Please preserve them — they explain why getServingEnv() is used instead of the more common getCurrentEnv().
| import java.util.Map; | ||
|
|
||
| public class MetaHelper { | ||
| public static final Logger LOG = LogManager.getLogger(MetaHelper.class); |
There was a problem hiding this comment.
Removed useful debug logging
The LOG.info("meta helper, url: {}, timeout{}, headers: {}", url, timeout, headers) was removed. While the response wrapping in try-catch is a good improvement, the debug logging for URL, timeout, and headers is valuable for troubleshooting connectivity issues (especially with this HTTPS change). Consider keeping it, perhaps at LOG.debug level.
| private static final Logger LOG = LogManager.getLogger(InternalHttpsUtils.class); | ||
|
|
||
| public static CloseableHttpClient createValidatedHttpClient() { | ||
| try { |
There was a problem hiding this comment.
Performance: Truststore re-read on every call
Both createValidatedHttpClient() and installTrustManagerForUrlConnection() re-read the truststore file from disk and re-initialize the SSLContext on every invocation. For internal FE-to-FE calls (checkpoint sync, journal ID queries, etc.), this could be called frequently.
Consider caching the SSLContext in a static field, initialized lazily or at startup. The truststore file doesn't change at runtime.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Currently, Doris provides an enable_https switch to enforce HTTPS connections. In hardened deployments, HTTP is completely disabled by setting http_port = 0. However, intra FE-to-FE communication still relies on HTTP, causing failures in edit log and checkpoint synchronisation when HTTPS is enabled.
This PR enhances HTTPS support by automatically converting intra FE-to-FE communication to HTTPS when enable_https=true, without introducing any new configuration, leveraging mysql_ssl_default_ca_certificate created for https connection. When enable_https=false, behavior remains unchanged. This ensures secure and seamless FE-to-FE communication without breaking existing workflows.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)