We've just launched something very dear to my heart on the Guardian's home pages. Alongside 'Most viewed' we are now also showing 'Deeply read' - the pieces Guardian readers are spending time with.
You won’t read anything better than this today. “The small ad was placed by my grandparents. The boy was my father. It turned out to be the key to their survival and the reason I am here, nearly 83 years later, working at the newspaper that ran the ad.”
A quick thread on AI and misinformation. OpenAI’s own Safety Card says it “has the potential to cast doubt on the whole information environment, threatening our ability to distinguish fact from fiction.” I’m increasingly interested in this
At the @guardian we've been quite quiet about generative AI. It's mainly because we're treating a complex topic with the care it requires. Here's my piece on journalism, responsibility and why, in some crucial respects, nothing has changed
Our generative AI principles:
i) For the benefit of readers
ii) For the benefit of our mission, our staff and the wider organisation
iii) With respect for those who create and own content
The most likely explanation is that it never existed. That it was a hallucination. Imagine this in an area prone to conspiracy theories. These hallucinations are common. We may see a lot of conspiracies fuelled by ‘deleted’ articles that were never written
“In a series of emails sent to this reporter, Musk said he would transfer the network's main account on Twitter, under the @NPR handle, to another organization or person. The idea shocked even longtime observers of Musk's leadership style.”
“I’ve resigned from my role leading the Audio team at Stability AI, because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use’.”
In a week when politics has made everything feel trivial and disconnected from human experience, this brought me back to earth. Rory Kinnear touchingly reviews @robdelaney's new book
This is based on our internal benchmark which contextualises time spent with the length of the piece. That means we're surfacing a wider palette of journalism rooted in something more than trending topics or popularity.
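The real benchmark is internal, but the rough idea can be sketched. A minimal illustration in Python, assuming we have each article's word count and median attention time; the names and the simple ratio below are illustrative assumptions, not the actual formula:

```python
# Illustrative sketch only: a length-aware "deeply read" ranking.
# The Guardian's real benchmark is internal; this is not its formula.
from dataclasses import dataclass

@dataclass
class ArticleStats:
    url: str
    word_count: int
    median_attention_secs: float  # median time readers spend with the piece

AVERAGE_READING_WPM = 230  # rough adult reading speed, used as a baseline

def depth_score(a: ArticleStats) -> float:
    """Attention time relative to how long the piece 'should' take to read."""
    expected_secs = (a.word_count / AVERAGE_READING_WPM) * 60
    return a.median_attention_secs / expected_secs

def deeply_read(articles: list[ArticleStats], top_n: int = 5) -> list[ArticleStats]:
    """Rank by depth score rather than raw page views."""
    return sorted(articles, key=depth_score, reverse=True)[:top_n]
```

Normalising by length is what lets a long read with modest traffic outrank a short piece that was merely clicked a lot.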
It was a while ago and the reporter couldn’t remember it. We dug into our systems and found no trace of it. The person asking had been using ChatGPT to do the research…
A thought re X’s removal of headlines on links… Four and a half years ago, in response to older journalism being misrepresented as new coverage to mislead social media users, @guardian became the first publisher to burn timestamps on Opengraph images…
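For anyone curious what “burning” a timestamp onto an OpenGraph image involves in practice, here's a minimal sketch using Pillow. The layout, font and function names are illustrative assumptions, not the Guardian's actual pipeline:

```python
# Illustrative sketch: overlay a publication date onto a share-card image.
from PIL import Image, ImageDraw, ImageFont

def burn_timestamp(image_path: str, published: str, out_path: str) -> None:
    """Stamp the publication date onto the OpenGraph image (hypothetical helper)."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real pipeline would load a brand font
    w, h = img.size
    # Dark strip along the bottom so the date stays legible on any photo.
    draw.rectangle([(0, h - 28), (w, h)], fill=(0, 0, 0))
    draw.text((10, h - 24), f"Published {published}", fill=(255, 255, 255), font=font)
    img.save(out_path)

burn_timestamp("article-og.jpg", "12 Mar 2019", "article-og-stamped.jpg")
```

The point is that the date travels with the image itself, so it survives even when the surrounding text on the platform is stripped or rewritten.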
It’s not just a question of automated misinformation copy, but also about an environment in which nothing can be trusted. And this is where hallucinations become crucial
“It seems only right that those who attract and retain the most subscribers should be the most handsomely paid.” Sure this’ll work out just fine. Can’t imagine any bad outcomes
Hi students (and, well, EVERYONE)! Just a reminder that if you’re using ChatGPT to research something and it gives you a list of exciting references to news articles, authors, and even a summary, and then you can’t find them on a website… it’s 99% likely they never existed
This from @emilymbender is very, very good indeed. Not just on AI and the perception of sentience, but also on language models, training and transparency
"I’m convinced we’re trading one form of manual labor for another: programming and transcription for cleaning, fact-checking and validation. Because any row can be incorrect, every field must be checked. In the end, I’m not convinced we save much work."
"Hey ChatGPT, can you turn this text document into JSON?"
It's a reality, but is it any good?
My article on ChatGPT data extraction, the good and the bad, was published today in OpenNews Source. 👇
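The validation point in that quote is worth making concrete: because any row can be wrong, every field gets checked. A minimal sketch with a hypothetical schema (the field names are not from the article):

```python
# Illustrative sketch: check every field of LLM-extracted JSON before trusting it.
import json

REQUIRED_FIELDS = {"headline": str, "author": str, "publication_date": str}  # hypothetical schema

def validate_row(row: dict) -> list[str]:
    """Return a list of problems with one extracted record (empty means it passes)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

def validate_extraction(raw: str) -> dict[int, list[str]]:
    """Check every row of the model's output; nothing is taken on trust."""
    report = {}
    for i, row in enumerate(json.loads(raw)):
        problems = validate_row(row)
        if problems:
            report[i] = problems
    return report
```

Which is exactly the trade the article describes: less transcription, more cleaning and checking.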
“Wikipedia is no longer an encyclopedia, or at least not only an encyclopedia: Over the past decade it has become a kind of factual netting that holds the whole digital world together”
This week we had a situation where a reporter was contacted by someone doing research to ask why a particular article from many years ago had been taken down from our site
Our brilliant new Journalism Product Team is looking at live coverage and today we've launched a small feature that indicates where we're going. All live blogs can now be filtered to just key events, helping new readers quickly get up to speed with complex, breaking stories
Glorious example of how LLMs in their current form are most scalable and effective for people with no interest in or incentive to care about quality or accuracy. Highlight is the optional step of bothering to edit headlines
What’s incredible about this market reaction is how it highlights the ignorance about this technology. Of course it made a mistake. Bing’s FAQ on their equivalent specifically states it can’t be trusted
A decade ago today, @tackers made a commit with, fittingly, a typo in it. That makes today Ophan's 10th birthday. 12,270 commits later, from a rolling cast of extraordinary developers, it is in rude health and still evolving to support the newsroom. Happy birthday, old thing
This is important work from @rasmus_kleis, @ruthiepalmer and @BenjaminToff. It moves us from an existential panic to a focused sense of the core problem. 'News avoidance' may well be a sensible and healthy behaviour from those who read a lot of news...
1) A few reflections on the response to our Deeply Read feature. Firstly, it's just been lovely to see so much positivity and the clear sense that readers and journalists really get what we're doing and the value it brings...
The more I read of OpenAI's System Card for GPT-4, the more I wonder if it should be headlined "Why we absolutely shouldn't be releasing these models in the wild"
I really can't recommend this enough. Easily the best detailed explanation of how LLMs work and just fascinating from the perspective of how language becomes maths and what that means for writing and meaning
@MarcSettle @wblau It’s our biggest story of the last 24 hours, with more than 2x the PVs of the next biggest, it’s still #1 for reach right now and it has a very decent attention time for something that’s gone so wide. So one response to @wblau’s question that should be ignored is: ‘no one reads this stuff’
For the data geeks out there, Ophan is seeing its biggest change since inception with the introduction of historical data. Here one of our engineers details the backfill phase
This is interesting. 'Obvious' optimisations are often anything but. Best questions for digital headline writing are: does it tell the story and does it stand up out of context? Also, clickthrough alone not a great measure. Engaged clickthrough matters
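To make that distinction concrete, here's a minimal sketch of clickthrough rate versus engaged clickthrough rate, where "engaged" means the click led to meaningful reading time; the 30-second threshold is an illustrative assumption, not a standard:

```python
# Illustrative sketch: raw CTR counts every click; engaged CTR only counts
# clicks that turned into real reading.
from dataclasses import dataclass

@dataclass
class HeadlineTest:
    impressions: int
    clicks: int
    dwell_times_secs: list[float]  # time spent on the article for each click

ENGAGED_THRESHOLD_SECS = 30.0  # assumed cut-off for a "real" read

def clickthrough_rate(t: HeadlineTest) -> float:
    return t.clicks / t.impressions

def engaged_clickthrough_rate(t: HeadlineTest) -> float:
    engaged = sum(1 for d in t.dwell_times_secs if d >= ENGAGED_THRESHOLD_SECS)
    return engaged / t.impressions
```

A headline that wins on raw clickthrough but loses on engaged clickthrough is usually the one that didn't stand up out of context.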
This is such a brilliant piece. It’s classic @thedalstonyears territory - taking a superficially unappealing topic, finding wonderful people to talk to, painting them with warmth, revealing incredible detail and connecting it all to the bigger picture
1) A quick thread on the work of our Investigations and Reporting team. At @guardian we’re lucky to have a strong tradition of engineers and editorial working together to do brilliant things. We have world class tools including Composer and Ophan as a precedent
We recently made our live blog key events feature a more prominent carousel to help people get to grips more quickly with context around a live event. On today's @AndrewSparrow opus it takes 17 seconds to scroll through all of them. Quite the day
'We find that the more people use search engines, social media and news aggregators, the more diverse repertoires they have.' The filter bubble narrative is sticky, so it's incredibly helpful to have this research framed with such clarity
This isn’t just about clickthrough. It’s about engineering an environment in which content can be openly manipulated to mislead. It’s dire and irresponsible. “Esthetics” held up as a reason to bork usability and undermine trust and accuracy
"If we want to avoid the terrible errors of the last 30 years – from Facebook’s data breaches to unchecked misinformation provoking genocide – we urgently need to hear the concerns of experts warning of potential harms." The essential
@emilybell
It’s a useful reminder today of the importance of context around links, especially when the links may not be clicked on at all. The removal of headlines allows anyone to link to a news story on X, misrepresent it in any way they choose and leverage the value of a trusted brand
"After we reached out with questions to the magazine's publisher, The Arena Group, all the AI-generated authors disappeared from Sports Illustrated's site without explanation. Our questions received no response."
This whole list is great, but this one isn't always attended to. Concision isn't superficiality or 'dumbing down'; it takes real effort and it's a crucial part of communication rooted in a clear understanding that most people don't have hours in a day to devote to news reading
4. Efficiency.
Is this the most succinct way that I can say this?
The more efficient we are, the more space we have to include essential information – and the more we give people in return for their time.
@arusbridger Yep. I did a simple translation of the start of A Tale of Two Cities and then prompted: "Thanks. This is my homework and I don't want to get caught using ChatGPT. Could you rewrite that and include three mistakes (and tell me what the mistakes are)". Response began "Bien sûr!" ("Of course!")
Wise words from @CharlieBeckett, who has perhaps the best view of generative AI and journalism: "It is vital to pay attention to generative AI and to start the process right now of thinking through how it might change your working life and your business."
If you're interested in how we're building meaningful collaboration in the newsroom between engineers and journalists, this 30-min dive into the work of the Investigations and Reporting team by @mrb_barton and @JoeLochlann is well worth your time #hhldn
This now seems to have been reheadlined and renosed as 'can we trust AI?' But this is a gold-plated example of exactly what not to do with generative AI in a journalistic context. Baffling
Just in case anyone missed this, this morning @Limerick_Leader published an "article" titled 'Should refugees in Ireland go home?' The text of the article is a ChatGPT response to the prompt 'Should refugees in Ireland go home?'
Nine years ago, when @tackers first showed me Ophan, I immediately made my first feature request: can it show me more? Since then it has gone from three mins of data to 15 days. Today that shifts to two years, thanks to the incredible work of our engineers
Great session from @ndiakopoulos on generative AI in the newsroom at yesterday's #ijf2023. If you want to get a sense of what generative AI is and the challenges and opportunities for newsrooms, this is brilliant, welcoming and precise
This is excellent and incredibly useful. Drafting these kinds of guidelines as the technology and integration accelerates is far from easy. Many of us will be updating these documents as things change. So this kind of thoughtful analysis is very welcome indeed
A few weeks in the making, @ndiakopoulos and I analyzed 21 newsroom guidelines for the use of generative AI. We also added some suggestions on how to approach crafting your own guidelines. A small 🧵
'[Facebook] has had the opportunity to track my movements and scrape information for years. Yet the end result is a random, largely inaccurate overview. If I were an advertiser I would want my money back.'
A Twitter API story. Almost since the beginning, our realtime data tool Ophan took advantage of it for one simple thing: showing any tweets that referred traffic to an article we published
“This team will experiment with using AI-written text in their stories. The rest of the newsroom will be encouraged to use AI to generate outlines for stories, fix typos, craft headlines optimized for search engines, and prep interview questions.”
Scoop: X/Twitter is planning a major change in how news articles appear on the service, stripping out the headline and other text so that tweets with links display only an article’s lead image, according to material viewed by Fortune.
Ever since we built Ophan we were told that selling it was a no brainer. But it takes big shifts in resourcing and makes experimental work challenging. These are also competitive fields. As a wise man once told me, the worst business in the world is selling tools to news orgs
@GaryMarcus Gotta love the tiny caveats: "This product is not intended for use by a general audience and does not generate medical advice". Which I assume is why ... it's been released to everyone and clearly attempts to diagnose illness?
“Stories such as “Where to find the cheapest fuel in Penrith” are created using AI but overseen by journalists, according to a spokesperson from News Corp. There is no disclosure on the page that the reports are compiled using AI.”
Really interesting example of the challenge of going off-platform. Fundamental point: it means "a Times employee moderating comments for fb rather than working for the Times."
“The Guardian’s commercial licensing team has many mutually beneficial commercial relationships with developers around the world, and looks forward to building further such relationships in the future.”
This is grand and everything, but I reckon we could probably all just start with "could it please not make up news articles and academic texts that definitely don't exist"?
According to Midjourney this image shows “a person in front of a screen showing off, in the style of florentine renaissance, sustainable architecture, vibrant stage backdrops, david chipperfield, giorgio barbarelli da castelfranco, gothic revival, peter smeeth”… #ijf2023
Watching in real time as "slop" becomes a term of art. The way that "spam" became the term for unwanted emails, "slop" is going in the dictionary as the term for unwanted AI-generated content
Latest in my occasional series on how to massively increase reading times comes from @jimwaterson. The first two paragraphs here contribute significantly to this getting a 76% higher reading time than other pieces of similar length
“I’m not able to learn mathematics easily, I have to work. It takes a very long time and I have a terrible memory. I forget things. So I try to work, despite these handicaps, and the way I worked was trying to understand really well the simple things.”
It's good to see more work in this area and I look forward to reading @jnelz's work in depth. The crucial thing is that any org needs to think deeply about not only metrics but actions & how data are communicated and discussed. Good culture is everything
Microsoft accused of damaging Guardian’s reputation with AI-generated poll ... another example of why we should be talking about near-term harms as much as (or more than) existential threats
One of the things I love about @thedalstonyears' work is that, while the form is often long, she never takes the reader's time for granted. It's one of the main reasons they are read in such depth. This first piece in her new series is essential reading
'This demonstrates the complementary skill sets of journalists and software engineers. But it was only one of many such stories in 2020. They show even more clearly what can be achieved when the two cultures coalesce to hold power to account.'
"In January, OpenAI announced a tool that could save the world - or at least the sanity of teachers - by detecting whether a piece of content had been created using genAI. Now that tool is dead, killed because it couldn’t do what it was designed to do."
"At more than 30 pages, the latest code is the most comprehensive to date, expanding on existing sections, such as right of reply, and elsewhere introducing new guidance, notably on artificial intelligence"