The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge

Electrical engineer Gilbert Herrera was appointed research director of the US National Security Agency in late 2021, just as an AI revolution was brewing inside the US tech industry.

The NSA, sometimes jokingly said to stand for No Such Agency, has long hired top math and computer science talent. Its technical leaders have been early and avid users of advanced computing and AI. And yet when Herrera spoke with me by phone about the implications of the latest AI boom from NSA headquarters in Fort Meade, Maryland, it seemed that, like many others, the agency has been stunned by the recent success of the large language models behind ChatGPT and other hit AI products. The conversation has been lightly edited for clarity and length.

How big of a surprise was the ChatGPT moment to the NSA?

Oh, I thought your first question was going to be “what did the NSA learn from the Ark of the Covenant?” That’s been a recurring one since about 1939. I’d love to tell you, but I can’t.

What I think everybody learned from the ChatGPT moment is that if you throw enough data and enough computing resources at AI, these emergent properties appear.

The NSA really views artificial intelligence as the frontier of a long history of using automation to perform our missions with computing. AI has long been viewed as a way that we could operate smarter and faster and at scale. And so we’ve been involved in research leading to this moment for well over 20 years.

How could commercially available large language models be useful to the NSA?

One of the things that these large models have demonstrated they are pretty good at is reverse engineering and automating cyber defenses. And those things can be accomplished without being overly constrained when it comes to laws related to personal privacy [since it could be trained on software code that isn’t as sensitive].

Let’s say that we wanted to create an analyst “copilot,” something that uses a GPT-type thing to help an analyst analyze data. If we wanted to do that, then we’d need something with analytical skills in American culture and the English language, and that would be really hard for us to do, given the various laws [about accessing US data].

Hypothetically, we could use something like RAG [retrieval augmented generation, a technique in which a language model responds to a query by summarizing trusted information] so that an LLM only looks at data that has been through our compliance scrutiny.

How would the law complicate the development of language models at the NSA?

We might need to keep certain datasets that were used to train models for very long periods of time, and that raises questions about data retention. The other issue is, imagine getting a lot of information and it was the entire internet. You might have US persons’ data on it and might have copyrighted data. But you don’t look at it [when feeding it to an AI model]. At what point do all the laws apply?

I think it will be difficult for the intelligence community to replicate something like GPT-10, because we already know the scale of investment they have. And they can do things with data that nobody in government would ever think of doing.

Does widespread use of AI create new security problems for the US?

On day one of the release of ChatGPT, there was evidence of improved phishing attacks. And if it improves their success rate from one in 100,000 to one in 10,000, that’s an order-of-magnitude improvement. Artificial intelligence is always going to favor people who don’t have to worry about quantifying margins and uncertainties in the usage of the product.

Is AI opening a new frontier of information security then?

There are going to be huge new security threats. That’s one of the reasons why we formed an AI Security Center. There are a lot of things you can do to harm a model. You can steal models and engineer on them, and there are inversion attacks where you can try to steal some of the private data out of them.

The first line of defense in AI security is good cybersecurity. It means protecting your models, protecting the data that’s in there, protecting them from being stolen or manipulated.

Source: Wired