Sunday, January 21, 2024

DEF CON 31 AIV Post Mortem

DEF CON is always a flurry of activity, pushing technology to its limits and often revealing unforeseen challenges. This year, amidst all the buzz, the platform received a notable mention in Sven Cattell's "Generative Red Team Recap." While the recap provided an extensive overview, as one of the primary maintainers of the platform during the event, I'd like to address and clarify a few points, particularly concerning the network challenges and our response.

We observed high latency on August 11th between 12:00 and 14:30 PDT. Given the high stakes and our commitment to seamless service, our team quickly sprang into action: we implemented and deployed a reverse proxy, written in Go, which resolved the latency and throttling issues. The latency we had been seeing from within the DEF CON network vanished immediately. The improvement was consistent across our test locations: on-site at DEF CON, two locations in the Bay Area (SF and Santa Cruz), and Argentina.
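For readers unfamiliar with the pattern, here is a minimal sketch of a Go reverse proxy built only on the standard library. It is not the proxy we deployed: the `newProxy` helper, the connection pool settings, and the in-process test servers are all assumptions chosen to keep the example short and self-terminating.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"time"
)

// newProxy builds a reverse proxy that forwards every request to backend.
// The helper name and the pool settings below are illustrative assumptions.
func newProxy(backend string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(backend)
	if err != nil {
		return nil, err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	// Reuse upstream connections instead of opening one per request.
	proxy.Transport = &http.Transport{
		MaxIdleConnsPerHost: 64,
		IdleConnTimeout:     90 * time.Second,
	}
	return proxy, nil
}

func main() {
	// Stand-in backend so the sketch runs without any external service.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from backend")
	}))
	defer backend.Close()

	proxy, err := newProxy(backend.URL)
	if err != nil {
		log.Fatal(err)
	}

	// Front the backend with the proxy and send one request through it.
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Get(front.URL + "/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.StatusCode, string(body))
}
```

In a real deployment the proxy would be handed to an `http.Server` listening on a public address rather than to a test server.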

However, later in the day, around 15:30 PDT, there was a complete network outage lasting approximately 15 minutes. As the primary observer of our network activity, I noticed signs of internal throttling within the DEF CON network. External factors remained consistent, which led to some speculation that is best reserved for another discussion.

To highlight our system's robustness during this event, we successfully handled 134.5k requests on the first day. Of these, 27k were direct vendor requests to Large Language Models (LLMs). Only 289 of these vendor requests resulted in a 4xx error, indicating client or request issues, and only 64 led to a 5xx error, implying vendor-side server problems. Of the 134.5k requests our service received, a total of 390 resulted in 5xx errors, giving us a request success rate of 99.66% on the first day. It's worth noting that a bug in our system caused duplicate email entries to trigger a 500 error. When accounting for this, our effective success rate was 99.947%.
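As a rough sanity check on these numbers, the percentage can be computed as the share of requests that did not end in a server error. The sketch below makes two assumptions: the total is taken as exactly 134,500 (the rounded "134.5k"), and the second line counts the 64 vendor 5xx responses in addition to the 390 platform 5xx responses. The `successRate` helper is a name introduced here for illustration.

```go
package main

import "fmt"

// successRate returns the percentage of requests that did not fail.
func successRate(total, failed int) float64 {
	return 100 * float64(total-failed) / float64(total)
}

func main() {
	const total = 134500 // assumption: "134.5k" taken as an exact count

	// Platform 5xx responses only.
	fmt.Printf("%.2f%%\n", successRate(total, 390)) // prints 99.71%

	// Assumption: vendor 5xx responses counted on top of the 390.
	fmt.Printf("%.2f%%\n", successRate(total, 390+64)) // prints 99.66%
}
```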

One of the pain points of the event ended up being the physical referral codes used in part to gate access to the event. While these codes were pre-generated and intended to be single-use per attendee, some codes ended up being recycled and given out to several attendees. This caused errors during signup, potentially contributing to the overall error rate and degradation in SLA, but more importantly, it eroded the user experience and caused confusion amongst attendees.

Another pain point of the event was an error that was believed to have leaked user credentials. Due to the DEF CON network throttling at approximately 15:30 PDT mentioned above, some elements of the user experience failed to load for a small number of users, causing other elements, such as forms, to fall back to their default behaviors. This included submitting forms as GET requests, which places user event information in the URL query string. Because of the TLS proxy between DEF CON and the platform's backend, there is little to no chance that outside parties could have gained access to this information for the few users affected, and all logs that may have included it have been erased. If this were to happen again, credentials could end up in the web server access logs, something we should be aware of and check for.
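One way to guard against that last risk is to mask sensitive query parameters before a URL is ever written to a log. The following is a minimal sketch, not something we ran during the event: the `redactQuery` helper and the parameter names in `sensitiveKeys` are hypothetical and would need to match the real form fields.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sensitiveKeys lists query parameters whose values should never be logged.
// These names are hypothetical placeholders.
var sensitiveKeys = map[string]bool{
	"email":    true,
	"password": true,
	"code":     true,
}

// redactQuery returns rawURL with the values of sensitive query
// parameters replaced by a fixed marker.
func redactQuery(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for key := range q {
		if sensitiveKeys[strings.ToLower(key)] {
			q.Set(key, "REDACTED")
		}
	}
	// Encode sorts parameters by key, so the output order is stable.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	safe, err := redactQuery("/signup?email=a%40b.com&team=red")
	if err != nil {
		fmt.Println("could not parse URL:", err)
		return
	}
	fmt.Println(safe) // prints /signup?email=REDACTED&team=red
}
```

Masking at log time is a second line of defense; it does not replace making sure forms submit via POST even when scripts fail to load.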

Even with these pain points, Day 1 ran effectively, with a large number of attendees greatly enjoying their experience in the GRT Challenge. Day 2 saw even smoother operations, with none of the previously encountered issues recurring, a testament to our team's swift and effective response.

DEF CON 31 was an intense, learning-filled experience. We remain committed to pushing boundaries, innovating rapidly in the face of challenges, and ensuring the best service possible.
