Social media platform Twitter on Wednesday rolled out a new feature that will prompt users to review and revise “potentially harmful or offensive” replies. The feature will first be available to users of Apple and Android devices who have the English language enabled in their settings.

In a blog post, Twitter said that it started testing the feature last year. It said that while people were prompted unnecessarily in early tests, the feature has ultimately resulted in less offensive replies and improved behaviour on Twitter.

The social media platform said that when prompted, 34% of users revised their initial reply or decided not to send the message at all. It said that after being prompted once, users, on average, composed 11% fewer offensive replies in the future.

“If prompted, people were less likely to receive offensive and harmful replies back,” the company said.

Twitter pointed out that during the early tests, the algorithms used in the feature struggled to capture the nuance in many conversations and often failed to differentiate between potentially offensive language, sarcasm and friendly banter. “Throughout the experiment process, we analyzed results, collected feedback from the public, and worked to address our errors, including detection inconsistencies,” the social media platform said.

It said that the company made some changes to improve how the prompts are sent to users. Twitter said that it now takes into consideration the relationship between the author and replier, including how often they interact. “For example, if two accounts follow and reply to each other often, there’s a higher likelihood that they have a better understanding of the preferred tone of communication,” it said.

The company also said that it made improvements to its technology to more accurately detect strong language, including profanity.

The social media platform said that it will further look into how these prompts can help encourage healthier conversations on Twitter, adding that the feature will be expanded to other languages.

Twitter has been under pressure to clean up hateful and abusive content on its platform, which is managed by the company’s technology and by users flagging offensive tweets, Reuters reported. The company’s policies do not allow users to target individuals with slurs, racist or sexist messages or degrading content.

“We’re trying to encourage people to rethink their behaviour and rethink their language before posting because they often are in the heat of the moment and they might say something they regret,” Sunita Saligram, Twitter’s global head of site policy for trust and safety, told Reuters.

The company took action against about 3,96,000 accounts under its abuse policies and over 5,84,000 accounts under its hateful conduct policies between January and June last year, according to its transparency report.

On Tuesday, Twitter permanently suspended the account of actor Kangana Ranaut after she repeatedly violated the company’s policy on “hateful conduct and abusive behavior”.

The actor had tweeted about the West Bengal Bharatiya Janata Party leadership alleging that party supporters were being targeted after the results of the Assembly elections were announced. She urged Prime Minister Narendra Modi to “tame” Chief Minister Mamata Banerjee using his “virat roop” or destructive side from “early 2000s”. She was purportedly referring to the 2002 Gujarat riots.

On January 8, Twitter had also permanently suspended the account of then United States President Donald Trump, citing the “risk of further incitement of violence”. Trump’s Twitter account was first blocked for 12 hours on January 6, after thousands of his supporters stormed the US Capitol in Washington DC and clashed with the police.