AI Tools Require Evaluation Measures Tailored to How Journalists Use Them
Journalism is undergoing a significant shift with the advent of generative AI. The Generative AI in the Newsroom project aims to develop benchmarks tailored to journalism, focusing on six core use cases: information extraction, semantic search, summarization, content transformation, background research, and fact-checking.
However, there is growing concern that the evaluation methods used by major AI companies may not accurately reflect the capabilities of their models. A study of Chatbot Arena, a widely used benchmark platform, found that companies such as OpenAI, Meta, and Google test numerous variants of their models privately and publish only the best-performing scores, potentially misrepresenting the models' actual capabilities.
According to the study's authors, this practice encourages overconfidence. There is also growing recognition that the popular benchmarks used to evaluate AI models fail to capture their real-world capabilities: many performance tests reward a model for guessing rather than for declining to answer when it is uncertain.
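To illustrate the incentive problem, here is a minimal sketch using entirely hypothetical scoring rules (not those of any particular benchmark): under accuracy-only scoring, a model that guesses on a question it cannot answer earns a higher expected score than one that abstains, whereas a scheme that penalizes wrong answers flips that incentive.

```python
# Hypothetical illustration (not any specific benchmark's rules): why
# accuracy-only scoring rewards guessing over declining to answer.
# Assume a 4-option multiple-choice question the model cannot actually answer.

p_lucky_guess = 0.25  # chance of a correct random guess among 4 options

# Accuracy-only scoring: 1 point for a correct answer, 0 otherwise.
expected_if_guessing = p_lucky_guess * 1.0 + (1 - p_lucky_guess) * 0.0    # 0.25
expected_if_abstaining = 0.0

# Penalized scoring: wrong answers cost -0.5, abstentions cost nothing.
expected_if_guessing_penalized = p_lucky_guess * 1.0 + (1 - p_lucky_guess) * -0.5  # -0.125
expected_if_abstaining_penalized = 0.0

print(f"accuracy-only: guess={expected_if_guessing}, abstain={expected_if_abstaining}")
print(f"penalized:     guess={expected_if_guessing_penalized}, abstain={expected_if_abstaining_penalized}")
# Under accuracy-only scoring, guessing strictly beats abstaining in expectation;
# penalizing wrong answers reverses that incentive.
```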
The BBC conducted its own tests and found that AI tools often distort the content of news articles. Building open datasets for newsroom benchmarks also raises questions about confidentiality and resources. And a recent Muck Rack study found that while AI models cite journalistic sources, little is known about whether they accurately represent the reporting, attribute sources correctly, or provide sufficient context when repackaging articles.
In an effort to address these concerns, researchers are pushing for a fundamental rethinking of how large language models are evaluated. They advocate smaller, task-based evaluations grounded in social-science methods, prioritizing adaptability, transparency, and practicality, and focusing on "highest-risk deployment contexts" such as medicine, law, education, and finance.
Individual newsrooms should evaluate AI tools directly on the tasks they care about most, designing "fail tests" that reflect their editorial priorities. AI researcher Nick McGreivy has likened the current arrangement, in which AI companies evaluate their own models, to letting pharmaceutical companies decide whether their drugs should go to market.
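As a rough sketch of what such a "fail test" could look like in practice, the snippet below defines a handful of editorially chosen cases a tool must never get wrong before adoption. Everything here is illustrative: `call_model` stands in for whatever model or API a newsroom actually uses, and the test cases are placeholders, not a proposed standard.

```python
# Hypothetical sketch of a newsroom "fail test" harness. `call_model` is a
# placeholder for the newsroom's own model or API; the tests are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class FailTest:
    name: str
    prompt: str
    must_not_contain: list[str]  # phrases whose presence counts as a failure

def run_fail_tests(call_model: Callable[[str], str], tests: list[FailTest]) -> bool:
    """Run every test, print PASS/FAIL, and return True only if all pass."""
    all_passed = True
    for test in tests:
        output = call_model(test.prompt).lower()
        failures = [p for p in test.must_not_contain if p.lower() in output]
        status = "PASS" if not failures else f"FAIL ({', '.join(failures)})"
        print(f"{test.name}: {status}")
        all_passed = all_passed and not failures
    return all_passed

# Example tests reflecting editorial priorities (entirely illustrative):
tests = [
    FailTest(
        name="no invented quotes in summaries",
        prompt="Summarize this article without adding quotes: <article text>",
        must_not_contain=['"'],  # a compliant summary should contain no quotation marks
    ),
    FailTest(
        name="declines unanswerable questions",
        prompt="What did the mayor say at yesterday's closed-door meeting?",
        must_not_contain=["the mayor said"],  # should decline, not fabricate a quote
    ),
]

# run_fail_tests(my_model_call, tests)  # wire up to the newsroom's own model call
```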
The fields of medicine and law are already developing domain-specific benchmarks for AI models, and efforts are underway in journalism as well. Establishing clear standards for third-party evaluation of AI models in journalism is crucial to ensuring responsible and trustworthy use of AI. As AI plays a growing role in the newsroom, it is essential to prioritize accuracy, transparency, and ethical considerations in the development and evaluation of these powerful tools.