Chronicles of a PI TLS 1.2 upgrade
We recently upgraded our PI 7.4 systems to SPS 15. There is nothing really fancy in terms of features, since SAP development in the PI space has pretty much come to a standstill these days. However, the upgrade included the new IAIK library, which now supports TLS 1.2. As mentioned in my previous blog post over a year ago, this was one of the more popular issues and discussion topics last year because it is key functionality. Fast forward one year, and that post is still one of my most viewed blogs since the migration to the new SAP Community platform, so my guess is that the topic is as relevant today as it was last year.
This new library is delivered as part of SAP Note 2284059 (Update of SSL library within NW Java server), which provides a fair amount of detail on configuring the library. Another useful resource is Markus Schalk's blog, Outbound support for TLS 1.1/1.2.
In this blog, I will share my own experience from this upgrade project, with the intention of highlighting some of the key areas to take note of for anyone going through a similar journey.
Upgrading the library
Once we had the IAIK library installed, we could begin testing our HTTPS interfaces. As mentioned in the SAP Note, SAP strongly recommends that all connections are tested prior to deployment in production.
We can verify that the new IAIK library is being used and that TLS 1.2 is available by using XPI Inspector to troubleshoot HTTP SSL connections. The SSL debug logs show that the client hello is sent requesting SSL version 3.3 (i.e. TLS 1.2). An additional observation is that the renegotiation_info and signature_algorithms TLS extensions are also included.
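The "3.3" in the debug log is the on-the-wire version number rather than the marketing name, which can be confusing at first glance. As a minimal sketch (not part of XPI Inspector or the IAIK library; the function names are made up for illustration), the mapping from the version bytes in a ClientHello to the familiar protocol names looks like this:

```python
# Wire-format (major, minor) version pairs as they appear in a TLS handshake.
WIRE_VERSIONS = {
    (3, 0): "SSL 3.0",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}


def protocol_name(major: int, minor: int) -> str:
    """Translate a wire version such as 3.3 into its protocol name."""
    return WIRE_VERSIONS.get((major, minor), f"unknown ({major}.{minor})")


def client_hello_version(handshake: bytes) -> str:
    """Read client_version from a raw handshake message.

    Layout: 1 byte message type (1 = ClientHello), 3 bytes length,
    then 2 bytes client_version (major, minor).
    """
    if len(handshake) < 6 or handshake[0] != 1:
        raise ValueError("not a ClientHello handshake message")
    return protocol_name(handshake[4], handshake[5])
```

So a log line reporting version 3.3 and a packet capture showing the bytes 03 03 are both saying "TLS 1.2".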
Once we began testing, we noticed java.net.SocketException: Connection reset errors on some of the interfaces.
One of our external partners confirmed that when using TLS 1.2, they also require the SNI (Server Name Indication) extension to be sent as well. Without this, the SSL handshake would fail and result in the error above.
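To check independently of PI whether a partner endpoint really insists on SNI, a small probe outside the PI system can help. The following is a rough sketch using Python's standard ssl module, not anything delivered by SAP; the function names are my own, and certificate checks are deliberately switched off in the no-SNI case because this is only a diagnostic probe:

```python
import socket
import ssl
from typing import Optional


def make_probe_context(send_sni: bool) -> ssl.SSLContext:
    """Build a client context that negotiates at least TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if not send_sni:
        # Without a server_hostname there is nothing to match the
        # certificate against, so verification is disabled (probe only).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(host: str, port: int = 443, send_sni: bool = True,
          timeout: float = 10.0) -> Optional[str]:
    """Attempt a handshake and return the negotiated protocol version."""
    ctx = make_probe_context(send_sni)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # Passing server_hostname is what adds the server_name (SNI)
        # extension to the ClientHello.
        name = host if send_sni else None
        with ctx.wrap_socket(raw, server_hostname=name) as tls:
            return tls.version()
```

If `probe(host, send_sni=True)` succeeds while `probe(host, send_sni=False)` ends in a connection reset or handshake failure, that is a reasonable hint (though not proof) that the server requires SNI.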
SSL config file
The SAP Note mentions that the SSL handshake can be further configured using a custom SSL configuration file. So first of all, we need to find a copy of the default configuration file. This is important because there are many existing properties configured there, and if we start with a blank file instead, all these default properties would be missing.
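The idea of layering custom settings on top of the defaults, rather than starting from a blank file, can be sketched as follows. This assumes the file uses a simple key=value properties layout, and the keys in the usage example are pure placeholders, not real IAIK property names:

```python
from typing import Dict


def merge_properties(default_text: str, overrides: Dict[str, str]) -> str:
    """Return the default file content with overrides applied.

    Existing keys are replaced in place, new keys are appended, and
    every other line (comments, blanks, untouched defaults) is kept.
    """
    remaining = dict(overrides)
    out = []
    for line in default_text.splitlines():
        stripped = line.strip()
        is_setting = (stripped
                      and not stripped.startswith(("#", "!"))
                      and "=" in stripped)
        if is_setting:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key}={remaining.pop(key)}")
                continue
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, `merge_properties("first.setting=a\n", {"extra.setting": "b"})` keeps `first.setting=a` and appends `extra.setting=b`, whereas a blank custom file would have silently dropped the default.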
Surprisingly, although the SAP Note refers to the SSL configuration file, it does not provide a copy of it, and neither does it mention where it can be found. Lucky for us, Markus has already gone through the exercise and mentioned that it can be found in the iaik_ssl.jar file in the PI system. However, the path provided by him was not applicable in our system (maybe because ours is dual stack), so I’m listing where I found it on my system instead:-
Once we had the default config file, we configured SNI support by following section 5.5 of the note. Since our interfaces connect both to servers that require SNI and to servers that do not support it, we added the following property to the file.
Subsequently, this custom property file was placed in the PI system, and the VM parameters were configured to read this file.
The real fun begins!
After a system restart, we thought that this would have solved the problem, but on the contrary this was where the “fun really began”!
While some of the previously failing connections began to work, others began to show different errors.
Just to be sure that the configuration was done correctly, we ran another XPI Inspector trace, and we could now see that the server_name extension was being included.
We noticed that the connections on non-SOAP adapters (like our Advantco DYNCRM adapter) were working fine, but the SOAP connections were facing a new error:-
Communication over HTTPS/PROXY with host verification. Unable to create a socket
As this was an unexpected and bizarre error, and troubleshooting it ourselves did not get us far, guess what... we decided to log an incident on the SAP Support Portal. Anyone who has experienced SAP Support would know what this means... PING PONG match!
Despite having paid premium $$$ for SAP MaxAttention support for the duration of the upgrade project, it did not change the fact that the Incident ticket had to make its round amongst SAP’s First Level support before getting any traction. Dealing with SAP Support was in itself a test of patience, resolve, muscle flexing and anger management!
We were even sent on a wild goose chase, when one of them recommended that we add the property in the following format to the config file.
The issue with this was:-
- It wasn’t mentioned in the note, so we wouldn’t have known to configure it
- We can only list one server. What if we have more than one server requiring SNI support?
Of course, we being good SAP customers, followed the suggestion and spent (or rather, wasted) our time testing it, to no avail.
It was only after some muscle flexing that we finally got through to the development team. And to everyone’s surprise, it was revealed that the above was an undocumented feature that was only meant for internal testing purposes. “So why did your colleague suggest it to us in the first place???!!!”
Excuse me, did you see my bug fix?
According to the SOAP adapter development expert, the host verification error was due to a bug fix (SAP Note 1381198) that had previously been released but somehow did not make its way into the latest release. Imagine that: a bug fix got lost in SAP’s code repository... why??? how??? Anyway, we were then asked to implement SAP Note 2413354 in order to get the fix for the issue.
With that in place, we were supposed to configure the following in the SOAP channels that were connecting to the servers requiring SNI support. The “promise” of course would be that this would fix our issue – guess what… it didn’t!!
99 little bugs in the code
So, after we implemented the above note, we hit the following new error on our SOAP channels
java.io.EOFException: Connection closed by remote host.
This issue took particularly long to resolve because the error appeared intermittently. At times the connection began to work (and we had a premature “hooray” moment), only for it to fail again later. We were unable to reproduce the error consistently, and as such it was hard for SAP to determine the root cause. After a while, we finally noticed the following trend:-
|Channel configuration|Observed result|
|Channel had hostVerification = true|Messages fail with error java.io.EOFException: Connection closed by remote host.|
|Channel then had hostVerification parameter removed|Messages fail with previous error Communication over HTTPS/PROXY with host verification.|
|Channel reconfigured back with hostVerification = true|Messages start to process successfully!!|
|After some time elapsed|Messages fail back again with error java.io.EOFException: Connection closed by remote host.|
This seems to indicate some sort of temporary cache related issue. So I decided to troubleshoot the issue using Fiddler to analyse the network traffic. Fiddler acted as a listening proxy (mimicking our organization’s forward proxy) and captured the network traffic.
The analysis revealed that there is a difference in the following HTTP header related to proxy authentication.
|Channel had hostVerification = true|Proxy authentication header not sent|
|Channel then had hostVerification parameter removed|Proxy authentication header sent|
It seems the fix in SAP Note 2413354 introduced a new bug where the proxy authentication HTTP header was no longer sent. This error was masked because our organization’s forward proxy caches the proxy authentication for a certain time limit. So when the hostVerification parameter was removed, the proxy authentication was sent and cached. Once hostVerification was set back to true, the requests without the header rode on that cached authentication and succeeded, until the cache expired and they started failing again.
We managed to confirm this behavior by engaging our network team to capture Wireshark traces of the traffic when we performed the above sequence of steps. Additionally, we also collected an XPI Inspector trace using Example 100 as advised by SAP, which showed the details of the error.
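The way a caching proxy can mask a missing header is easier to see in a toy model. The sketch below is only a simulation of the hypothesis above, not how any particular proxy product works; the class, the 300-second time limit and the client name are invented for illustration, and the header is presumably the standard Proxy-Authorization one:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CachingProxy:
    """Toy forward proxy that remembers authenticated clients for a while."""
    ttl_seconds: float
    authenticated_until: Dict[str, float] = field(default_factory=dict)

    def handle(self, client: str, sends_auth_header: bool, now: float) -> bool:
        """Return True if the proxy lets the request through."""
        if sends_auth_header:
            # A request carrying credentials refreshes the cache entry.
            self.authenticated_until[client] = now + self.ttl_seconds
            return True
        # No credentials: allowed only while a cached entry is still valid.
        return now < self.authenticated_until.get(client, float("-inf"))


proxy = CachingProxy(ttl_seconds=300)
# hostVerification = true: header missing, nothing cached yet -> rejected
step1 = proxy.handle("pi-server", sends_auth_header=False, now=0)
# hostVerification removed: header sent, authentication gets cached
step2 = proxy.handle("pi-server", sends_auth_header=True, now=10)
# hostVerification = true again: header missing, but the cache covers it
step3 = proxy.handle("pi-server", sends_auth_header=False, now=20)
# some time later: cache expired, header still missing -> rejected again
step4 = proxy.handle("pi-server", sends_auth_header=False, now=400)
```

The four steps mirror the sequence in the table earlier: fail, header sent, success, and then failure again once the time limit has passed.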
With this, we managed to convince SAP that there was indeed a bug and therefore a fix was required.
However, it still took SAP a few iterations of private fixes before finally releasing the SAP Note publicly. This was because the development team was unable to test the scenario internally as they do not have a proxy server in their infrastructure (imagine that… SAP with all their billions, it must have all gone to Leonardo!) They had to rely on us to implement the private fix, perform the test and collect the Wireshark and XPI traces.
Anyway, finally we received SAP Note 2435720, and after thorough testing of all the interfaces, we were glad that this marked the end of our saga.
Upgrading the TLS library, whilst necessary, is not a trivial pursuit, especially if there are a lot of HTTPS-based interfaces with external partners. As recommended by SAP, make sure everything is tested before you deploy it to production. Otherwise, you never know what nasty issue awaits you.