ISMG Editors: Will AI Survive the Data Drought?
Also: ISMG's Summit in Chicago; Navigating Regulatory Change

Anna Delaney (annamadeline) • June 14, 2024

In the latest weekly update, Information Security Media Group editors discussed ISMG's upcoming North America Midwest Cybersecurity Summit, challenges and solutions regarding AI training data, and the implications of the new European Union Artificial Intelligence Act for CISOs.
The panelists - Anna Delaney, director, productions; Tony Morbin, executive news editor, EU; Tom Field, senior vice president, editorial; and Rashmi Ramesh, assistant editor, global news desk - discussed:
- An overview of ISMG's North America Midwest Cybersecurity Summit, which will include interactive sessions, workshops and networking focused on current cybersecurity tools, technologies and strategies;
- Why researchers are warning that AI models are rapidly depleting available training data, which could soon slow advancements;
- Strategies for CISOs to navigate the regulatory landscape and maintain privacy, safety and security while fostering innovation in AI technologies, especially as the new EU AI Act comes into force this month.
The ISMG Editors' Panel runs weekly. Don't miss our previous installments, including the June 5 edition on the opening day overview of Infosecurity Europe 2024 and the June 7 edition on the Infosecurity Europe Conference 2024 wrap-up.
Transcript
This transcript has been edited and refined for clarity.

Anna Delaney: Hello and welcome to the ISMG Editors' Panel. I'm Anna Delaney, and this week we'll be discussing ISMG's upcoming North America Midwest Cybersecurity Summit, the challenges and solutions in AI training data, and the implications of the new European Union Artificial Intelligence Act for CISOs. Our panel today features Tom Field, senior vice president of editorial; Rashmi Ramesh, assistant editor, global news desk; and Tony Morbin, executive news editor for the EU. Tom, you'll be hosting ISMG's North America Midwest Cybersecurity Summit. Tell us about it and what attendees can look forward to.
Tom Field: We've been preparing for it for the past few weeks, and I put together some questions for our keynote speaker, Congressman Bill Foster, a Democrat from Illinois, who is also a member of the Congressional AI Task Force. We'll have the opportunity to sit with him and talk about the task force, its mission, the prospect of AI regulation anytime soon in the U.S., and, in an election year, concerns about election security, as well as whether cybersecurity remains a nonpartisan issue or is starting to get politicized. We've got some terrific panels as well. You've participated in the past in the deepfakes exercise we do with the Secret Service and with our attendees; we have one of those coming up. I'm working with some CISOs to have discussions about traversing the compliance maze and preparing strategies for regulatory success amid all the new regulations on data and on incident reporting. There's so much going on there that we're going to sit with the CISOs and talk about how they prioritize and make sense of it. We're going to talk about strategies for overcoming the cybersecurity skills shortage and some of the creative things organizations do: recruiting nontraditional employees, reskilling the ones they have, and bringing people in to help them tap their AI potential while those skills are still developing. What that portends is that we're going to see some new faces in cybersecurity over the next several years. How's that going to shake out for us? We have to start thinking about the type of content that we create and how we reach this new audience. We expect a good number of attendees, and I'm only giving you the tip of the iceberg in terms of the topics we're going to discuss. We'll have our studio there as well; our colleague Michael Novinson will be joining me in Chicago. I look forward to being able to sit down with some of our thought leaders and produce some engaging content that we can come back here and share.
Delaney: What makes this unique compared to other cybersecurity events? Because there are a few out there.
Field: Here's how I always pitch this to our speakers and to the attendees: "You're not going to an event like RSA, InfoSec or Black Hat, where you're one among thousands, marching along, being spoken to and taking the opportunity to sit and listen. Here, you've got the opportunity to engage." Our stage is right beside the attendees. There are no bright lights in the speaker's eyes; you can see the people you're talking to, and you can engage with them. We pass microphones around the room so that the attendees can speak with the speakers, and we provide plenty of opportunities between sessions for folks to go off and have a private conversation. The intimacy of it sets us apart. It's not just a didactic "we present to the audience" - the audience and the speakers truly engage in a summit. One of our big differentiators is the deepfakes exercise that brings people together throughout the room. You sit at separate tables with Secret Service agents and other moderators to solve a tabletop exercise about an organization that suffered fraud as a result of a deepfake. You've been there and seen this. You have watched the dynamic of the room change once people go through that exercise; it brings people together in a way that traditional networking doesn't. Friendships and relationships are formed because of it, and knowing that it happens at our event brings people back to our events. I think some of that sets it apart. We try to have the most engaging and most up-to-date content as well. We have teams that work to ensure that we've got relevant agendas and that we're bringing in speakers with true, hands-on expertise to share. You're not going to see recycled presentations; it's unique for each of these destinations.
Delaney: It sounds like a very good event ahead. We look forward to hearing all about it next week. Rashmi, moving on to AI training data. There's a lot happening in the AI arms race at the moment, but a key dilemma is that AI models are rapidly depleting available training data, and despite innovations in compute efficiency and synthetic data, research has warned that data bottlenecks and diminishing returns could slow advancements very soon. Could you help us understand what's going on, and what are the challenges and the solutions in the field of AI training?
Rashmi Ramesh: To put it concisely, AI developers are running out of high-quality public text data to train models on, and quite quickly - in as little as two to eight years. To set some context on how much data we're talking about here, the amount of training data AI models need depends on the complexity of the problems the models are built to solve, the model architecture itself and its performance metrics. Generally, when you increase the training compute of a model by 100x, you increase the size of its training data set by about 10x. To give you a sense of how much data that is, the largest published training data set so far is the one behind Meta's Llama 3, which was trained on 15 trillion segments of text called tokens. It's possible that some closed-source models have used larger data sets, but the companies haven't published those numbers, so we don't know. This is what we know, and if we look at history, the size of data sets for training LLMs has increased about 2.2 times per year. If that trend continues, by the end of the decade we will have models that need to be trained on a quadrillion tokens - roughly 70 times the size of Llama 3's training set. The quality of the text also matters: Better training data means better-performing models. Currently, there's no standard for what counts as high-quality versus low-quality text, so it comes down to empirical judgment. Text that has passed through some curation and editing, such as news articles, books or academic papers, is considered high quality; social media posts are not. Then we get to the heart of the issue. We know that there's a problem, and we know its scope. Now let's see what researchers have to say about potential solutions. One very promising way out of this data bottleneck is to use synthetic data to train AI models. We have had cases, such as DeepMind's AlphaZero, which a few years ago was able to surpass human players at the game of Go using only synthetic data. It's possible to apply this at a wider scale, and it's a viable solution, but like everything else, it comes with its own set of drawbacks. The best case is that the model doesn't learn anything new; the worst case is that it produces inconsistent, biased and inaccurate output. An expert I was speaking to yesterday shared a witty example: He said that using AI-generated content to train AI models with no human intervention is like asking a student to learn by grading their own exam without outside help or information. There are other alternatives as well, but maybe we can leave something for everyone to go back to the story and read.
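To make that extrapolation concrete, here is a minimal back-of-the-envelope sketch of the arithmetic Ramesh describes. The 15-trillion-token baseline and the 2.2x annual growth rate are the figures quoted above; the start year and output format are illustrative assumptions.

```python
# Back-of-the-envelope extrapolation of LLM training-set sizes,
# using the figures quoted in the discussion.

BASELINE_TOKENS = 15e12   # Llama 3's published training set: ~15 trillion tokens
GROWTH_PER_YEAR = 2.2     # historical growth in training-set size per year
START_YEAR = 2024         # assumed starting point for the projection

for year in range(START_YEAR, 2031):
    tokens = BASELINE_TOKENS * GROWTH_PER_YEAR ** (year - START_YEAR)
    print(f"{year}: ~{tokens / 1e12:,.0f} trillion tokens")

# By 2030 the compounding passes a quadrillion (1,000 trillion) tokens --
# the scale at which researchers warn public text data runs out.
```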
Delaney: That's a great explanation, Rashmi, and if AI models reach a point of diminishing returns, due to data limitations, what are the long-term implications? How might this affect AI development and deployment?
Ramesh: The obvious result is that many AI applications will probably reach a performance plateau, which makes it harder to achieve the levels of efficiency or effectiveness that we need for certain tasks. However, the implications cut both ways. For businesses, this might mean slower improvements in AI capabilities, higher costs for smaller gains, and a situation in which only well-funded organizations and countries can afford cutting-edge AI. But it also means that others might begin to develop more specialized applications, researchers could focus on creating smarter algorithms that do more with less data, and we could see improvements in how we collect and use data, such as generating synthetic data. On a larger scale, industries that rely on AI might see slower growth, which could have an impact on the economy, but slower advancement could also help ease fears about massive job losses from AI-driven automation. It'll have an impact, but likely not a linear one.
Delaney: No doubt we'll be returning to this topic, Rashmi. Tony, the new EU AI Act officially comes into force this month after being approved by the European Parliament and the Council of the European Union earlier this year. I think you want to revisit a theme which we've discussed a bit in the past. How can CISOs navigate the regulatory landscape, maintain privacy, security and safety, while fostering AI innovation? Over to you.
Tony Morbin: There are going to be some echoes of what Rashmi has been talking about, but you don't have to be an AI accelerationist to want rapid advancement and deployment of artificial intelligence, because it promises transformative benefits for society. Equally, you don't need to be an AI Luddite or a safety-first regulator who bans AI to have concerns about the safeguards needed to protect the rights, privacy and safety of both individuals and society, especially when those safeguards potentially stand in the way of big tech achieving enormous profits. Unfortunately, there is an inevitable conflict between AI's insatiable demand for ever more data - which Rashmi has illustrated - to deliver ever more innovative services, and the individual's right to safeguard their personal data from being used without informed consent. As security professionals, we're right in the middle: tasked by the organization with securing fast, friction-free deployment of new AI apps, while simultaneously being the ones who need to ensure compliance with all relevant security regulations, including the new EU AI Act, which comes into force this month and does indeed ban some uses of AI that are deemed too risky. It's no wonder that CISOs might sometimes feel like they're in the Charge of the Light Brigade, "rushing headlong into the valley of death," believing "theirs is not to reason why, theirs is but to do or die." While the Tennyson poem says of the soldiers, "theirs is not to make reply," the CISO's role is precisely to reply: to explain to management exactly what the risks are and their potential consequences, to understand those risks in their social and ethical context, and then to deploy the appropriate technology and best-practice strategies to achieve compliance with minimal friction. AI regulations - the EU act, executive orders, guidelines from various territories - do vary, but the steps that cybersecurity professionals should be taking remain fairly consistent. Beyond understanding the regulations that apply in your geography and industry, CISOs also need to implement technical steps for compliance. First is data anonymization to protect user privacy while training AI models. This should include removing or encrypting personally identifiable information in training data, and we can also use techniques such as differential privacy to add noise to data. For example, when building a recommendation system, we can anonymize user profiles by hashing user IDs and excluding sensitive attributes. Organizations also need to undertake privacy impact assessments, evaluating risks and mitigating them: Identify any AI projects that will require a PIA, describe the information flows, assess the privacy risks and propose solutions to address the risks identified. For example, before deploying a chatbot that processes personal data, we need to conduct a PIA to assess the privacy implications. Possibly the most difficult step is model explainability. We need to ensure transparency for high-risk systems, though that is easier for me to say than for teams to deliver. In some cases, we will be able to use interpretable models, such as decision tree and linear regression approaches. We can also generate feature importance scores and provide explanations for model predictions.
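As a concrete illustration of that last step, here is a minimal, hypothetical sketch of generating feature importance scores from an interpretable model, in the spirit of the credit-scoring example that follows. The feature names and data are invented, and a production credit model would need far more rigor:

```python
# Hypothetical sketch: surfacing feature importance from an interpretable
# credit-scoring model, so a denial can be explained in terms of its inputs.
# Feature names and data are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "debt_ratio", "years_employed", "missed_payments"]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))        # stand-in applicant data
y = (X[:, 0] - X[:, 3] + rng.normal(size=500)) > 0    # stand-in approve/deny labels

model = LogisticRegression().fit(X, y)

# For a linear model on standardized inputs, coefficient magnitude is a
# simple, defensible global importance score.
for name, coef in sorted(zip(feature_names, model.coef_[0]),
                         key=lambda item: -abs(item[1])):
    print(f"{name:>16}: {coef:+.2f}")

# A per-applicant explanation: which feature pushed this decision the most.
applicant = X[0]
contributions = model.coef_[0] * applicant
top = max(zip(feature_names, contributions), key=lambda item: abs(item[1]))
print(f"Largest factor for applicant 0: {top[0]} ({top[1]:+.2f})")
```

This is the kind of artifact that makes "explain why this applicant was denied" an answerable question rather than a black box.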
In a credit scoring model, for example, we need to be able to explain why a particular applicant was denied credit based on specific features. One area where we do have control is consent management; we need to ensure that we obtain user consent appropriately. This includes clearly explaining data usage and processing in consent forms, allowing users to opt in or out, and then recording the consent details. For example, when deploying a health-monitoring app, we need to obtain explicit consent for collecting health data and use it only for the purpose for which consent has been given. Then, we need to monitor our compliance by conducting regular audits of AI systems, checking whether they adhere to the latest privacy regulations and updating policies based on audit findings - for example, regularly reviewing the AI algorithms used in our credit scoring to ensure fairness and compliance. When it comes to balancing compliance and innovation, we need to encourage collaboration and communication between our cybersecurity professionals and developers: implementing shift-left approaches so that security is built in at the development stage of an AI project, highlighting the importance of iterative improvement and continuous enhancement, and considering the latest best practices for minimizing friction on users. We can deploy AI tools and automation to scan the AI models we ingest for vulnerabilities or inappropriate training data, and create SBOMs to facilitate patching when necessary. The privacy-innovation risk balance may well be set by others, but we will need to play our part, providing the technology that keeps deployed AI systems compliant without blocking innovation.
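To ground the consent-management step, here is a minimal sketch of a purpose-bound consent record. The class, field names and `may_process` check are hypothetical simplifications, not a compliance-grade implementation:

```python
# Hypothetical sketch of purpose-bound consent records: store what each user
# agreed to, and refuse any processing whose purpose was never consented to.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    user_id: str
    purposes: set[str] = field(default_factory=set)          # e.g. {"health_monitoring"}
    granted_at: dict[str, datetime] = field(default_factory=dict)

    def grant(self, purpose: str) -> None:
        self.purposes.add(purpose)
        self.granted_at[purpose] = datetime.now(timezone.utc)  # audit trail

    def revoke(self, purpose: str) -> None:
        self.purposes.discard(purpose)

def may_process(record: ConsentRecord, purpose: str) -> bool:
    """Only allow processing for a purpose the user explicitly consented to."""
    return purpose in record.purposes

record = ConsentRecord(user_id="u-123")
record.grant("health_monitoring")
assert may_process(record, "health_monitoring")
assert not may_process(record, "model_training")   # never consented to this
```

The design point is purpose limitation: data collected for health monitoring is never silently reused for something else, such as model training.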
Delaney: A rich answer there, Tony - lots to think about. Thank you for weaving in some poetry; I think that's a first on the Editors' Panel. Tony, as AI becomes more integrated into business processes and the regulatory landscape grows increasingly complex, how do you think the role of the CISO will evolve?
Morbin: It's going to be different in different places. Some will take the "theirs is not to reason why" approach and go ahead and do as they are told, whatever that may be. Some will err on the side of the board; some will err on the side of making sure they meet compliance. Ultimately, as an individual, you need to comply, because you're now potentially personally liable, and compliance is increasingly being enforced. But then, as Tom was saying, there's the question of whether other territories will see a benefit in a lighter-touch regulatory approach - whether that's the U.S., which is very light touch, or the U.K., which sits somewhere between the EU and the U.S. Even within the EU, there are exceptions for areas such as military use and research. We are going to see different approaches, but the general thrust is that people do want some transparency over where their data is being used. As Rashmi said, there's an insatiable demand for data by AI, even while Apple is trumpeting Apple Intelligence as having greater privacy. At the same time, it's obvious that if your phone can identify that it's your daughter speaking, that's because it has looked inside your emails, knows that this person is your daughter and is looking at all the interactions you've ever had. There's a great deal of information that you're going to have to consent to give if you want the functionality. The main thing is that you have the option of giving that consent or not.
Delaney: I think we agree there, and as there are more questions than answers, we'll of course be returning to that. Tony, thank you. Finally, just for fun: What nostalgic technology, game or item from your childhood do you think would be fascinating to see recreated or enhanced by AI today? Tom, go for it.
Field: You may remember the old Choose Your Own Adventure books, where you could read along, get to a critical place in the story and choose to go in this direction or that. Think about how AI could enhance that experience. Digitally, you wouldn't be bound by the limitations of what the author and the publisher gave you; you could explore without limits, led by your own imagination and the directions you might take. I think that would be a fabulous application, and I wouldn't mind being a part of it.
Delaney: A lot of fun there. Rashmi?
Ramesh: For me, it would be a game called Antakshari. It's the song version of word building, where one person sings a couple of lines of a song and the next person has to come up with a song that begins with the syllable the previous player ended with. AI could help by offering suggestions when a participant is stuck - it's not easy to remember thousands of songs, right? It could keep score, host a virtual multiplayer session where friends and family across the world could join in, and even be a co-player, because with Antakshari, the more the merrier.
Delaney: That's great. Tony?
Morbin: You might view this as a failure of imagination, but as I thought about it, AI would take all the fun out of things, because it would reduce the skill required and the unpredictability of the outcome. I was thinking in terms of competitive games: If I had all the information needed to win any game, so would my opponent, and then the maths would tell us who the winner was before we began - or everything would just become too easy. Although, perhaps fewer of my pets would have died. However, AI is a great predictor. Perhaps the "I predict your future" game would be more accurate with AI, but even there it could become too serious and turn into more of a career adviser than a game. I'm more in favor of sending the kids outside to get on with it.
Delaney: You've thought about that one, Tony. I'm reimagining the game of conkers. Around autumn - or the fall - you take these horse chestnuts that drop from the trees, attach them to a string and whack them together! The one that remains intact is the winner. You could upgrade the experience by transporting yourself into an AI-powered VR game called Conkerverse. Imagine the setting: stunning landscapes, enchanted forests, mystical castles and magical conkers, with AI opponents and allies, and all the twists and turns of tournaments. I think that's a significant upgrade.
Morbin: I was even thinking of conkers as one of those things where, if you've got this thing on a string and you're trying to smash the other one, you could work out the velocity, the size, the angle and all the rest of it and be sure to win - but then, as I say, it becomes maths. I want the unpredictability of real life!
Delaney: Thank you so much, everybody. This has been excellent, great fun and very informative. Until next time, thank you so much.