Editor’s note: This story is part of a series about the impacts of disinformation, how to guard against it and what researchers are doing to stop its spread.
It seems like magic: you read one article about exercising and suddenly you’re bombarded with stories and advertisements promoting new running routines or the best active wear. This is not by accident; it’s by algorithm.
Machine learning and artificial intelligence (AI) are becoming increasingly integrated into how we interact with technology, especially in the kind of news we see. This becomes an issue when people create and share news that isn’t true.
“There's a lot of potential for AI in this area, but there's no way you're going to be able to just make AI stop everything, partially because this is a really perfect storm of people and technology together,” says Nadya Bliss, executive director of Arizona State University’s Global Security Initiative. “There has to be an understanding of the impact on people and what makes people spread, look at and absorb disinformation.”
GSI works across disciplines to develop new approaches to security challenges. The initiative’s research effort in disinformation leverages ASU strengths in narrative framing, journalism and computer science to tackle this pervasive problem.
ASU professor and GSI affiliate Huan Liu and doctoral student Kai Shu are helping address disinformation by developing an algorithm to detect “fake news.” They co-edited a book with two researchers from Penn State University, titled “Disinformation, Misinformation and Fake News in Social Media,” which was published in July 2020.
Liu is a professor of computer science and engineering with the School of Computing, Informatics, and Decision Systems Engineering in the Ira A. Fulton Schools of Engineering. Shu is a final-year PhD candidate in the Department of Computer Science and Engineering at ASU, working under Liu's supervision.
The pair spoke with a Knowledge Enterprise writer to discuss disinformation from a computer science perspective.
Is disinformation the same as fake news?
Liu: Disinformation is an umbrella, and fake news is just one branch. There are other branches, like rumors, spam and trolls.
You’re developing an algorithm to defend against fake news. What is this algorithm and how does it work?
Liu: There is no magic; we have to learn from data to make the learning algorithm work. You have some news pieces that are fact-checked, or agreed upon, as fake, and you have some that are fact-checked and true. We can “train” machine learning algorithms with this kind of data set in a very simple manner. The challenge with learning from news data, however, is that topics are always changing. While we could just train a machine learning algorithm to learn from the past and then predict what's new, changing topics present new challenges.
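As a rough illustration of the training idea Liu describes, the sketch below fits a tiny naive Bayes word-count classifier on a handful of labeled headlines. This is not the researchers' model; the class name `NaiveBayesNewsClassifier`, the helper `tokenize` and the four example headlines are all invented for illustration, and a real system would learn from thousands of fact-checked articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesNewsClassifier:
    """Tiny multinomial naive Bayes over word counts (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = {}               # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)   # label -> number of training articles
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior: how common this label is in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one smoothing so unseen words do not zero out the score.
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up examples standing in for fact-checked articles (assumption).
train_texts = [
    "shocking miracle cure doctors hate revealed",
    "secret plot exposed shocking truth they hide",
    "city council approves budget for road repairs",
    "health officials report seasonal flu numbers",
]
train_labels = ["fake", "fake", "real", "real"]

model = NaiveBayesNewsClassifier().fit(train_texts, train_labels)
prediction = model.predict("shocking secret cure revealed")
print(prediction)  # prints "fake"
```

The sketch also hints at the topic-change problem: a headline made up entirely of words the model never saw in training gives it almost nothing to go on, so a classifier trained only on past topics can struggle with new ones.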
Shu: We proposed a model called “Defend,” which can predict fake news accurately and with explanation. The idea of Defend is to create a transparent fake news detection algorithm so that decision-makers, journalists and stakeholders can understand why a machine learning algorithm makes a given prediction. I am using explainable algorithms that not only provide prediction results to indicate whether a piece of news is fake, but also provide additional explanations to the users.
This is important because if we can find explanations from the dataset, then journalists can understand which part of the news is more fake than others. That’s why we started this study.
The explanations from the datasets might include two perspectives, actual news content and user comments. In terms of the news content, fake news often includes some sentences that are fake, but others may not be fake. We want to extract the sentences that are fake to be able to fact check.
On the other hand, from user comments, we can see how people talk about the news, and that can provide additional information about the false claims in the original news. Some comments, however, are not very informative, so we want to find the ones that are. For example, if a piece of fake news claims that the president is giving citizenship to some Iranians, and we find a user comment offering additional evidence, such as, “The president does not have the power to give citizenship,” then this comment can help explain why the news is fake.
Basically, we define and extract explanations from two perspectives — news sentences and user comments.
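To make the two-perspective idea concrete, here is a minimal sketch that ranks news sentences and user comments by how strongly they engage with each other. It is only a stand-in: the interview does not describe how Defend scores sentences and comments, so plain word-overlap (cosine similarity over word counts) is used here as an assumption, and the function names, the second sentence and the second comment are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in a string."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_explanations(sentences, comments):
    """Rank sentences and comments by their strongest mutual word overlap."""
    sent_vecs = [bag_of_words(s) for s in sentences]
    comm_vecs = [bag_of_words(c) for c in comments]
    # affinity[i][j]: how strongly comment j engages with sentence i.
    affinity = [[cosine(s, c) for c in comm_vecs] for s in sent_vecs]
    sent_scores = [max(row, default=0.0) for row in affinity]
    comm_scores = [
        max((row[j] for row in affinity), default=0.0)
        for j in range(len(comments))
    ]
    ranked_sents = sorted(zip(sentences, sent_scores),
                          key=lambda pair: pair[1], reverse=True)
    ranked_comms = sorted(zip(comments, comm_scores),
                          key=lambda pair: pair[1], reverse=True)
    return ranked_sents, ranked_comms


sentences = [
    "The president is giving citizenship to some Iranians.",
    "The announcement was made on Tuesday.",      # invented filler sentence
]
comments = [
    "The president does not have the power to give citizenship.",
    "Wow, unbelievable!",                         # invented uninformative comment
]

ranked_sents, ranked_comms = rank_explanations(sentences, comments)
top_sentence = ranked_sents[0][0]
top_comment = ranked_comms[0][0]
print(top_sentence)
print(top_comment)
```

In this toy case the citizenship claim and the comment disputing it rise to the top, while the uninformative comment scores zero. Word overlap cannot tell whether a sentence is actually false; it only surfaces the sentence and comment pairs a fact-checker might want to read first.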
What is FakeNewsNet?
Shu: FakeNewsNet is the data repository of fake news that we are trying to build. Most existing datasets have very limited information. In this data repository, we are trying to provide different dimensions of information so that researchers and practitioners can benefit from our dataset for their own studies on this topic. The dataset contains three types of information: news content; social context, which indicates users' engagement with these news pieces; and information indicating the location or time at which a news piece is spread on social media. So, we have various dimensions of information included in this data repository.
We start by collecting news content from fact checking websites, so we do not decide ourselves whether this piece of news is fake or not. We're just collecting these data sets from fact checking websites. They have journalists and experts who carefully read each piece of news and will assign a label, which indicates if it is fake or not. We collect that information.
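One way to picture a record with these three dimensions is sketched below. The class and field names (`NewsRecord`, `SocialEngagement`, `source_fact_checker` and so on) are hypothetical and are not taken from the actual FakeNewsNet repository; the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SocialEngagement:
    """One user interaction with a news piece (social context)."""
    user_id: str
    text: str
    timestamp: Optional[str] = None  # ISO 8601 UTC string (time dimension)
    location: Optional[str] = None   # free-text place name (location dimension)


@dataclass
class NewsRecord:
    """A news piece plus the label assigned by a fact-checking website."""
    news_id: str
    title: str
    body: str
    label: str                 # "fake" or "real", copied from the fact-checker
    source_fact_checker: str   # where the label came from (placeholder value below)
    engagements: List[SocialEngagement] = field(default_factory=list)

    def first_seen(self) -> Optional[str]:
        """Earliest engagement timestamp, or None if no timestamps are known."""
        times = [e.timestamp for e in self.engagements if e.timestamp]
        # ISO 8601 strings in the same time zone sort chronologically.
        return min(times) if times else None


record = NewsRecord(
    news_id="example-001",
    title="President grants citizenship to Iranians",
    body="(article text)",
    label="fake",
    source_fact_checker="example-fact-checker",
    engagements=[
        SocialEngagement("u2", "Is this real?", "2020-03-02T09:30:00Z", "Phoenix"),
        SocialEngagement("u1", "The president does not have the power to give "
                               "citizenship.", "2020-03-01T18:00:00Z"),
    ],
)
print(record.first_seen())  # prints "2020-03-01T18:00:00Z"
```

Keeping the label and its source as plain fields mirrors the point Shu makes: the label is recorded from the fact-checkers rather than decided by the dataset builders.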
Is your algorithm designed for a specific social media platform?
Shu: The way we utilize the data does not have to be from specific platforms. We started with Twitter to develop the model. On this platform, we can leverage user comments and replies. We can apply our model in different platforms, but for now our data is coming from Twitter.
What makes Defend different from other fake news detection software?
Shu: The uniqueness of our tool is that we are looking at different dimensions or different perspectives of social media information. For example, we are looking at user profiles and user comments. We're looking at the propagation networks of how this news piece is spreading on social media.
And also, we're the first to look at how we can detect fake news at the earliest stage, using only news content without any user engagement.
So, we are studying these different challenging scenarios very early, ahead of other people. We are not only trying to detect fake news, we are looking beyond detecting in that we want to provide explanations, we want to detect at the earliest stage. We are also studying how we can adapt our traditional models into different domains.
For example, in the COVID-19 scenario, there is a lot of misleading information and fake news, mostly in the public health domain, while existing models might only look at the political domain. So how can we create a powerful prediction model that can generalize across different domains, across different topics?
You have a book coming out on July 20, “Disinformation, Misinformation and Fake News in Social Media.” Why do you think it was important to publish this book?
Liu: Many experts contributed to this book. It's not just our work. We noticed a lot of people are working on this important topic, and we thought that it would be good to have a convenient point for researchers from different disciplines to share and get access to what's going on in this domain. And also, we try to put these three related concepts together. So, it will help advance the field and also it will help practitioners to know what's going on, to take advantage of the state-of-the-art research findings.
Why do I see what I do on my social media pages?
Liu: Online platforms, like Google, Facebook or Twitter, will try to feed you something you like. And that's part of how machine learning works. They learn what you like because they cannot give you everything. The goal for them is to get you to stay on their website as long as possible. They also want to serve you information, so they have to make a trade-off and then they will select the things you like.
Of course, they can also send things you don't like. Sometimes you see the ads, right? But they still try to relate their ads to your readings. So that’s why everyone sees different things. It’s not just you or computer scientists; algorithms affect everyone. Even Kai’s and my feeds will look different. You see what you see on social media because everyone has a profile, and the articles you read help them build your extended profile.
Nobody likes to see something they dislike. So that's how they get you addicted to their sites: if they do the trade-off well, you don't get fed up. Gradually, without your noticing, you will be in a world of your own, or a filter bubble.
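The feedback loop Liu describes can be sketched in a few lines. This toy simulation is an assumption-laden simplification, not how any real platform works: the functions `rank_feed` and `simulate`, the candidate items and the rule that the user always clicks the top item are all invented for illustration.

```python
from collections import Counter


def rank_feed(candidates, profile, k):
    """Return the k candidate items whose topic the user has clicked most."""
    return sorted(candidates,
                  key=lambda item: profile[item["topic"]],
                  reverse=True)[:k]


def simulate(candidates, first_click_topic, rounds, k):
    """Repeatedly build a feed and update the profile from the user's click."""
    profile = Counter({first_click_topic: 1})  # one article read so far
    feeds = []
    for _ in range(rounds):
        feed = rank_feed(candidates, profile, k)
        feeds.append([item["topic"] for item in feed])
        # Assumption: the user always clicks the first item shown.
        profile[feed[0]["topic"]] += 1
    return feeds, profile


candidates = [
    {"title": "Local election results", "topic": "politics"},
    {"title": "Five new running routines", "topic": "running"},
    {"title": "Weeknight pasta ideas", "topic": "cooking"},
    {"title": "Best active wear this year", "topic": "running"},
    {"title": "Budget vote delayed", "topic": "politics"},
]

feeds, profile = simulate(candidates, first_click_topic="running", rounds=3, k=2)
print(feeds)    # prints [['running', 'running'], ['running', 'running'], ['running', 'running']]
print(profile)  # prints Counter({'running': 4})
```

After a single article about running, every feed in this toy model is all running and the other topics never surface, which is the filter bubble in miniature. Real systems weigh many more signals and, as Liu notes, have to make a trade-off rather than showing only what you already like.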
What advice would you have for media consumers?
Liu: Don't be too lazy. You should be open-minded and look at other sources, not just the sources presented to you. But human beings are lazy, so without noticing, we settle into this filter bubble comfortably or happily. That's a mistake that will make you more and more narrow-minded.
That's why our fake news detection or disinformation detection algorithms could help. If we just simply tell people, “Hey, this is disputed,” then if they are open-minded, they will probably say, “Okay, let me look at it.” It's a very complicated issue. A computer scientist cannot solve this problem of disinformation mitigation alone. We can help social scientists to try to mitigate this effect.
Shu: I talk with journalists a lot. Learning from them, there are some very simple heuristics that the public could use to avoid some fake news pieces, but not all. For example, if you look at the URL of a website, there can be malicious signals you should be careful about. Or if the headlines are very catchy or very sensational, you have to be careful. So, there are different rules from journalists that you can use to educate the general public not to fall for fake news.
Liu: We should be more diligent when we consume information online, keep in mind that not all information is genuine, and be aware that we can be easily fooled.