We need to rethink what we consider as “content”

The fight against misinformation can be compared to a big clean-up initiative. Imagine your neighbourhood covered with trash, not only from last week but from several years. Littering happens quickly, but cleaning up takes much longer.

What is important here is to rethink, critically, what we consider content. Yes, a news article is clearly content. But today there are many other pieces of information that either precede or follow traditional news media content. All of these added pieces of information must be reliable and trustworthy as well.

When falsified content goes viral, lives might be in danger

One major criticism of large social media platforms is that they did not foresee, and later did not act against, dynamically generated information based on posts, comments and discussions. When falsified content goes viral, lives can be in danger. There are documented cases where false information was used to incite anger in a group, sometimes leading to mobs in the streets burning down the house of the person targeted by such allegations.

We must, as a result, be clear that every element of what we consider trustworthy information must be verifiable. Content components of course include written text. But the definition of content must also include pictures, artwork and visuals, videos, video stills, numerical data, system data, raw or aggregated data, and algorithms. In addition, we must include user-generated content, such as discussions, opinions and other interactions on social media. So far, on a technical level, it is very difficult to separate one from the other if only the words are analysed. One takeaway is that all kinds of metadata must be included in the analysis, too.

Most difficult: Mixing true and false information

A particularly difficult problem arises when correct content is taken out of context and combined or mixed with false or fabricated information. The boundary between the two is hard to detect, especially for automated searches. Humans might be able to see the difference, but there is no feasible way for a human to check every item of information for plausibility or truth. The search strings and search methods used to identify new information are as important to evaluate as other forms of information detection.

One example of how technical systems can be tricked is the manipulation of publication dates. A malicious actor might have written false content with catchy headlines. To make a search spider pick up the content as new, it is sufficient to re-publish it, potentially on a different website and under a different URL and IP address.

Fraud detection can be tricked

Much of the data used by Google for search depends on what website owners and content creators provide. Of course, there are all kinds of fraud detection. But as long as few of the data points for content published via Content Management Systems can be verified at the source, search engines do not have many options. Further, the content sources that intentionally falsify what they publish are only ever a fraction of the total. But because they have a chance of going undetected, they can do a great deal of harm.

If a website methodically refreshes dates for the content on its pages there are not many ways for search engines to detect this. Or, in other words: With the right motivation and a little bit of know-how, it is possible to trick search spider software into believing that a recycled article has been published very recently.
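One way a crawler could make this kind of date manipulation visible is to remember when a given text was first seen, independent of the URL it appears under. The following Python snippet is a minimal sketch of that idea, not a description of how any real search engine works. The names `fingerprint`, `Sighting` and `FirstSeenIndex` are hypothetical and only serve as illustration. The sketch also assumes that the text is recycled more or less verbatim.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Sighting:
    url: str
    claimed_date: date  # the publication date the page itself reports


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and dropping punctuation and spacing."""
    words = re.findall(r"\w+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


class FirstSeenIndex:
    """Remembers the oldest known sighting of each distinct text (hypothetical helper)."""

    def __init__(self) -> None:
        self._oldest: Dict[str, Sighting] = {}

    def check(self, url: str, text: str, claimed_date: date) -> Optional[Sighting]:
        """Return the older sighting if the same text is already known with an
        earlier date, otherwise record this sighting and return None."""
        key = fingerprint(text)
        known = self._oldest.get(key)
        if known is None or claimed_date < known.claimed_date:
            # First time we see this text, or we found an even older copy.
            self._oldest[key] = Sighting(url, claimed_date)
            return None
        if claimed_date > known.claimed_date:
            # Same text, newer date: possibly a recycled article.
            return known
        return None


if __name__ == "__main__":
    index = FirstSeenIndex()
    index.check("https://a.example/story", "Shocking news! Read now.", date(2019, 5, 1))
    older = index.check("https://b.example/fresh", "shocking news: read NOW", date(2021, 3, 2))
    if older is not None:
        print("Text already seen at", older.url, "dated", older.claimed_date)
```

An exact fingerprint like this only catches copies that differ in case, punctuation or spacing. A lightly reworded article would slip through, so a practical system would need some form of near-duplicate detection, and it would still depend on having crawled the older copy in the first place.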

What is the main motivation to invest work into falsified content? Presumably, the main and the most frequent motivation is simply to make money. Running a partially automated fake news system and connecting it to an advertising platform can result in a very good payout. 

Perspective: $50 billion lost to ad fraud by 2025

A report published by industry organisation IAB Europe says: “According to the World Federation of Advertisers (WFA), it is estimated that by 2025, over $50 billion will be wasted annually on ad fraud.” 

After the 2016 US election, researchers found that a considerable number of entirely faked articles came from a region as far away from the US as Macedonia. Some people there had learned how to make money through digital advertising, and the key was to write entirely falsified but outrageous articles about political candidates. The wilder the allegations, the better the click rates and shares for such content. This created a mini-industry based on “fake news” in the region.

The techniques that make ad fraud successful can also be used for political or criminal disinformation campaigns. The financial motivation behind ad fraud helps its operators build experience and a lot of practical knowledge on how to mislead existing platforms, which can then be reused for targeted disinformation campaigns.

These are some, but by no means all, of the arguments for broadening our understanding of what content is. Over time there should be detection measures, even at the source where the information is published, to enable 100% verification.

Examples of funded projects from TruBlo

Among the ten projects that received funding in the 1st open call are several that are explicitly looking for new ways to detect falsified information and content. Some examples are listed below; the full list can be found here.

CONTOUR – Trusted content for tourism marketing purposes

LEDGEAIR – Aircraft data mining framework

ShoppEx – to restore the trust between retailers/brands and consumers

More information:

IAB Europe: Guide to ad fraud, 2020