Professor at Waseda University, Tokyo, Japan. Interested in language data science, AI, corpus linguistics, educational technology, and the world at large.
I'm happy to say AntConc 4.3.0 for Win, MacOS, and Linux is now released, including the new ChatAI feature from recent beta versions. ChatAI allows you to interact with various LLMs and get insights on traditional corpus results. I hope you like it!
I'm very happy to announce that AntConc 4.0 is now officially released! I hope you like its many new features, its much improved speed, and especially its new sorting of KWIC concordance lines by pattern frequency. You can get it as a free download here:
I'm happy to announce that AntConc 4.1.0 for Windows, MacOS, and Linux is now released. It includes a new *online* corpus repository, a new wordcloud tool, and various tweaks to improve the user experience. You can get it as a free download here:
I'm happy to announce that AntConc 4.2.0 for Windows, MacOS, and Linux is now released. Highlights include a much faster startup time, an easier-to-use and more powerful corpus manager, and a new, simpler way to create corpora from wordlists:
I'm happy to announce a new release of AntConc (ver. 4.3.0) that integrates traditional corpus methods with Large Language Models (LLMs) via a new 'ChatAI' tool. I've kept it as a beta version for now, so I can get everybody's feedback. I hope you like it!
I'm happy to announce that AntConc 4.0.7 for Win, MacOS, and Linux is now released. You can get it here:
For a complete list of changes, see here:
I hope you like the new version!
I'm very happy to announce the release of a completely new version of AntWordProfiler (version 2.0) for Windows, MacOS, and Linux, written from the ground up in Python. The new version offers many, many new features. You can get it free at the following URL:
I'm happy to announce AntConc 4.2.4 for Windows, MacOS (Intel and Silicon), and Linux. The new version fixes a few bugs and adds the LogDice and LogRatio stats. You can get #AntConc as a free download here:
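For readers curious about the two new statistics, here is a minimal Python sketch of the formulas as they are commonly defined in the corpus linguistics literature: logDice as 14 + log2(2 * f_xy / (f_x + f_y)), and Log Ratio as the binary log of the ratio of two relative frequencies. The function names and arguments are hypothetical, and this is not AntConc's own code; in particular, how AntConc counts window versus reference frequencies and handles zero frequencies may differ.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of the node word and the collocate
    f_x:  frequency of the node word
    f_y:  frequency of the collocate
    The theoretical maximum is 14 (when f_xy == f_x == f_y).
    """
    if f_xy <= 0:
        raise ValueError("logDice is undefined when f_xy is 0")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def log_ratio(f1: int, n1: int, f2: int, n2: int) -> float:
    """Log Ratio = log2((f1 / n1) / (f2 / n2)).

    f1, n1: frequency of the item and size of the target data
    f2, n2: frequency of the item and size of the reference data
    A value of 1 means the item is twice as frequent (relatively) in the target.
    """
    if f1 <= 0 or f2 <= 0:
        # Many implementations add a small constant here instead; this sketch does not.
        raise ValueError("zero frequency: a smoothing constant would be needed")
    return math.log2((f1 / n1) / (f2 / n2))


if __name__ == "__main__":
    print(log_dice(10, 10, 10))          # 14.0 (the maximum)
    print(log_ratio(10, 1000, 5, 1000))  # 1.0 (twice as frequent)
```

Because logDice depends only on the three frequencies and not on corpus size, its scores are often compared across corpora of different sizes, while Log Ratio reads directly as a doubling scale.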
I'm happy to announce AntConc 4.2.1 for Windows, MacOS (Intel and Silicon), and Linux. This is the first release that natively supports MacOS-Silicon, so please let me know if it works well. You can get #AntConc as a free download here:
Very proud to have a chapter on corpus software in the 2nd Ed. of the Routledge Handbook of Corpus Linguistics. Check it out if you are interested in how online, offline, and DIY tools can help you do a corpus analysis. Big thanks to the editors @Anne0Keeffe and @ProfMMcCarthy.
If you’re interested in using AntConc 4, I’ve now created some video tutorials to help. You can find the first three on my YouTube channel, with more coming. All the videos will eventually appear on the following playlist:
With the release of AntWordProfiler 2.0, I was asked if it worked with Arabic. The answer is yes! AntWordProfiler now works smoothly with all right-to-left languages. See the screenshot below where I created Arabic wordlists in AntConc 4 and imported them into AntWordProfiler.
Very happy to announce that our monograph, "Knowledge-based Vocabulary Lists", is now available from Equinox Publishing. It was a pleasure to work with Norbert, Karen, Barry, and Benjamin on this project.
I'm happy to announce AntConc 4.2.2 for Windows, MacOS (Intel and Silicon), and Linux. The MacOS-Silicon version should now run on Big Sur and newer operating systems. I've also fixed a few more bugs. You can get #AntConc as a free download here:
For anyone interested, I'll be presenting on AntConc 4 for the first time at the LADAL Opening Webinar Series 2021 (online) on Sept 27 (Mon) at 5pm Brisbane, 4pm Tokyo, and 9am Berlin time. Anyone can attend. Just click the Zoom link at:
See you there!
A little self-promotion... Today, I was very happy to receive the @waseda_univ 6th e-Teaching Award for innovative uses of corpus methods (Data-Driven Learning) in doctoral student technical writing for scientists and engineers. 😄🏆
It was such an honor to give the opening plenary of @aacl2022 today. I cannot express how much it meant to me when @jesse_egbert, in his wonderful opening, asked people to stand if they had used my tools, and the whole audience got to their feet. It was very moving.
If anybody is interested in presenting at a virtual corpus linguistics conference, the deadline for abstracts for JAECS2020 is July 31. The conference will be completely free, and presenters only need to submit a 300-word abstract. (I am the Chair.)
Today, I finally completed all the work I needed to do on my book: Introducing English for Specific Purposes (Routledge). It should be available online and in bookshops from May 2018.
Just updated AntConc 4.3.0 to beta2 with GPT-4o as the default LLM in the ChatAI tool. The AI responses are now much faster and of much higher quality. Remember that AntConc uses API calls, and each call is priced by OpenAI.
As I announced a few weeks ago, I'll be presenting on "Corpus AI: Integrating large language models (LLMs) into a corpus analysis toolkit" at the JAECS 2023 conference tomorrow. Check out the screenshot of #AntConc for a sneak preview!
It’s always nice to work with others. So, I was very happy to see this new book edited by Vander Viana arrive over the weekend. It features two short chapters that Jenny Kemp and I wrote together.
For anyone interested, my chapter on programming for corpus linguistics in the book "A practical handbook of corpus linguistics" (Eds: Paquot & Gries) is now out:
Thanks to @MagaliPaquot and @stgries for all their work getting the book to print.
I'm still in the UK, now attending #ICAME43. Today, I'll be presenting on "KWIC patterns", hopefully a 'new normal' (the conference theme) for concordance line displays, which I recently added to AntConc 4. If you are able to come, it will be lovely to see you! @icame43
I'm super happy to announce that my new book "Introducing English for Specific Purposes" has now been published. You should be able to get it at all major bookstores and online sites like Amazon.
Big thanks to everyone who came to my talk at JAECS 2023 on integrating Large Language Models (LLMs) into #AntConc. For anyone interested in seeing the slides, you can find them here:
I'm happy to announce that AntConc 4.0.3 for Win, MacOS, and Linux is ready. This release addresses a few issues I was contacted about. See here:
Two big improvements are that KWIC columns are now resizable and that N-Gram open-slot variants can now be viewed.
I’ve now released three more tutorial videos for AntConc 4, covering the Collocate, Word, and Keyword tools. You can find them on my YouTube channel. A playlist of all the tutorials is here:
Hope you like them!
If you're interested in Data-Driven Learning (DDL), I'll be giving an online talk on April 29, 2022 (Friday) about how AntConc 4 was designed to address some of the common challenges. Registration is free and open to anybody. You can find the link here:
If anybody is interested in working with me here at Waseda Uni., Tokyo, Japan, we have announced an opening for an Assistant Prof. (contract). You'll need some basic Japanese to manage daily duties, but the uni is one of the most international in the country.
I’ve now released three new tutorial videos for AntConc 4, covering the File, Cluster, and N-Gram tools. You can find them on my YouTube channel. A playlist of all the tutorial videos is here:
Hi everyone! This weekend (Oct 3/4), I'm chairing the #JAECS2020 corpus conference (online). It's free to attend, and we have two great plenaries from @jesse_egbert and @adyantalamadhya, and 40 great papers. If you have time, come along!
For a long time, people have said that the Plot tool of AntConc was quite limited. Is this better? The view shows a dispersion plot of "I" *overlaid* with a dispersion plot of "we" in the "learned" sub-corpus of AmE06 (Potts & Baker, 2012), sorted by Juilland's D measure.
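Juilland's D is commonly defined as D = 1 - V / sqrt(n - 1), where V is the coefficient of variation (standard deviation divided by mean) of a word's frequencies across n equal-sized corpus parts. Here is a minimal Python illustration of that textbook formula, assuming equal-sized parts and the population standard deviation; the function name is hypothetical, and this is not the code AntConc uses to sort its Plot results.

```python
import math
from statistics import mean, pstdev


def juilland_d(part_freqs: list[float]) -> float:
    """Juilland's D for a word's frequencies in n equal-sized corpus parts.

    D = 1 - V / sqrt(n - 1), where V = sd / mean (coefficient of variation).
    Assumes equal-sized parts and uses the population standard deviation.
    """
    n = len(part_freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    m = mean(part_freqs)
    if m == 0:
        raise ValueError("the word does not occur in any part")
    v = pstdev(part_freqs) / m  # coefficient of variation
    return 1 - v / math.sqrt(n - 1)


if __name__ == "__main__":
    # Hypothetical counts of one word in four equal-sized parts
    print(juilland_d([5, 5, 5, 5]))     # 1.0 (perfectly even spread)
    print(juilland_d([12, 9, 11, 10]))  # roughly 0.94 (fairly even)
```

With this definition, a word spread evenly over the parts scores 1, and a word concentrated in a single part scores 0, so sorting by D puts the most evenly dispersed words first.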
After 15 years developing #AntConc, I've finally put donate buttons on my download page! If you've ever wondered how you can support the development of my tools, it's now possible using @Paypal or @Patreon. I really hope the links are not too obtrusive and the services work well.
Very honored to be giving the opening plenary of the AACL 2022 conference. If you're interested in my thoughts on how texts, discourse analysis, corpus methods, and machine learning all connect, please come along! @aacl2022
The end of a great first day at the #LSBC2024 conference in Kuwait, where I gave my keynote on LLM-assisted corpus linguistics. I also got to talk with many of the wonderful Kuwait University student volunteers. They did an excellent job making sure everything went smoothly.
For anybody interested, I'll be starting my upcoming one-year sabbatical with a plenary in Kuwait at the #LSBC2024 conference, where I'll be discussing my recent work on integrating corpus methods with AI. Really looking forward to going and also hearing the other great speakers.
#LSBC2024 programme is now shaping up. Here is a near-finalised version. A stellar line-up of keynote speakers and a group of top-notch scholars presenting on a wide range of AI- and corpus linguistics-related topics.
Follow for the PDF version:
The program for #TaLC2022 in Limerick, Ireland is finally out. I'll be there in person to talk about the latest updates to AntConc 4 on Friday, July 15 from 10:00-10:30. First time to travel overseas in 3 years! Hope to see some of you there, too.
Recently, I've been working on a new wordcloud tool for AntConc. The problem is that changing the settings can dramatically affect the results. If you had a choice, which plot (if any) in the following figure would you want as the default? Comments/suggestions are very welcome!
Getting ready for my plenary at #AESLA2024 on "Understanding and explaining discourse with the help of Large Language Models". It's my first time in Valencia, so I'm looking forward to seeing everybody there. The talk includes some quite thought-provoking new results. Hope you like them!
If you are in Japan on September 9-10, 2023, I'll be presenting something quite new at the JAECS 2023 conference. My paper is: "Corpus AI: Integrating large language models (LLMs) into a corpus analysis toolkit".
Very happy to announce that my joint paper with @natalie_eloise and Emma Marsden has just been published and is available as a free download. It describes our #Multilingprofiler tool for adaptive vocabulary profiling of French, German, and Spanish texts.
I'm very honored to be invited to give a plenary talk at #LSBC2024 together with many top experts in the field. With a focus on corpus and AI perspectives, it's sure to be a really important and timely event.
#LSBC2024 Announcement 📣
Thrilled to announce that Prof. Laurence Anthony @antlabjp will be joining us as a keynote speaker at #LSBC2024.
Don’t miss this opportunity to learn from the expert and register now!
Attendee:
Participating: lsbc@ku.edu.kw
This is such great news! Many of the improvements I made to AntConc under the hood were to handle very large corpora, but I didn't expect it would be given a billion word corpus so soon! I'm so happy! 😊😊
Thank you for the very positive response to AntConc 4. To address a few issues, I've now updated the software to 4.0.2. The changes are: 1) RTL language support, 2) .dmg images for Mac users, and 3) a small bug fix in the Ngram tool. You can get it here:
Setting off shortly to spend a week as a visiting professor at Kuwait University, where I will also give a plenary at #LSBC2024 on refining and redefining corpus linguistics with Large Language Models (LLMs). Very much looking forward to meeting the faculty and staff there.
Third trip to Europe in around 6 weeks! This time, I'll be presenting at @CLconf2019 in a few joint papers with different authors, including @schtepf, @lovermob, and @nottyknight. What's nice is that all the papers are related to data and tools... as well as... AntConc 4.0.
Big thanks to the organizers of the Frontiers of Corpus-Based Interdisciplinary Research Conference at Xi’an Jiaotong University. I really appreciated the fantastic support of all the staff and students, and was impressed every day by the incredible energy of the participants.
It was really nice to see so many people at my #CLSS2021 session on creating custom corpora for research yesterday. I think we had almost 180 people in the room! Thank you all for coming. Thanks also to @mrkm_a and the @CCR_UoB team for arranging everything.
#JAECS2020 has now finished. Thank you to the 2 plenary speakers @jesse_egbert and @adyantalamadhya, the 40 presenters, and the 315 participants, who all contributed to the success of the conference. It was a true pleasure meeting you all in the Zoom rooms and the chat sessions.
For anyone interested, I'll be presenting on methodological considerations in the selection and design of corpus tools with @DrCrosthwaite tomorrow (March 27, 2024). This is our first co-authored presentation, so it should be fun. Registration is free.
If you are interested in my presentations at #ICAME44, you can find the slides for "Understanding corpus text prototypicality: A multifaceted problem" that I co-presented with Nick Smith, Sebastian Hoffmann, and @perayson here:
On a bus to Tampere, Finland for #icame39. Will be presenting on #Kaleidographic tomorrow at the corpus data visualization workshop. Hope to see some of you there!
The Vocab@Vic conference finished today. Really great to see so many friends and colleagues over the past 5 days. If you're interested in the slides from my presentation with @natalie_eloise on MultilingProfiler, you can get them here:
We had a great response to the JAECS2020 call for abstracts, more than ever before! Because of the interest around the world, we've extended the call until Aug 18. If you were thinking of submitting, you still can! The event is online and free for all.
Getting ready for a week of conferences. First, I'll be talking about Data-Driven Learning at #APCLC2018 in Takamatsu, Japan, and then I'll be flying over to Atlanta, US for @AACL2018, where I'll be presenting on #AntGram with @uroemer. Hope to see some of you this next week!
Heads-up for translators, editors, researchers and other language nerds working with #corpora:
Blog post just published on @antlabjp's #AntConc v.4, released Dec 2021. Read all about the new version here:
Wishing everybody a very happy new year from here in Japan, where it is now 2019! Last year was a quite incredible year with so many great collaborations and chances to meet fantastic people around the world. My New Year's resolution: start sticking to deadlines!! Wish me luck!!
Getting ready for my workshop on Data-Driven Learning (DDL) in the Technical Writing Classroom for #talc2018. Looks like lots of people have signed up. I'll be sending out some information on how to access the materials over the weekend. See you all in Cambridge!
Thanks to everybody who came to my presentation with @DrCrosthwaite on the selection and design of corpus tools yesterday. The comments in the chat were great. The slides are now on OSF and ResearchGate. Here's the OSF link. I hope you find them useful!
Flying back to Japan today after a fun @CLconf2019 conference. Great to see so many old friends and also make a few new ones. Thanks to everyone who came to my talks. I hope to see you somewhere in the world at the next corpus conference.
Hi everyone. It seems that there was a problem with the video of my #CorpusCast interview, so they had to upload a new version. Here’s the revised link:
Happy new year, everyone! If you’re interested in listening to my #CorpusCast interview with @lovermob, you can access it at the links below. It was an honor to appear on the 25th episode of the series.
Lovely to see so many people live at #TaLC2022. Just finished chairing the Day 2 early afternoon session. I'd like to thank the presenters and the audience members for all the great questions.
@Noun_Fraze AntConc 4 comes with an online repository. There are only two corpora there now, but I have unlimited space, so if you know of other corpora that I can add and you think lots of people would use, let me know. BE21, BASE, and BAWE will be added soon.
Great news just in. @natalie_eloise has won the 2023 System Early Career Research Award for her work on our (open access) paper describing @MLProfiler. You can read the news here, and read the paper here. Congrats Natalie!
Arrived back in Japan today after a fantastic month at @Sydney_Uni hosted by @Corpusling. Everyone I met in Australia was incredibly kind and helpful, and there is a happy, positive vibe that seems to run through the whole country. It's a truly special, beautiful place.
Looking forward to giving a plenary talk on vocabulary profiling at the "1st workshop on the Readability for Low Resourced Languages" in about an hour's time. Hope to see some of you there!
For anybody wondering what happened to my main site earlier today, I can happily tell you that everything is working again. As a bonus, you'll see that I updated #ProtAnt and #AntWordProfiler, which both now come with MacOS native silicon versions.
Back in the UK for #CL2023. Just finished presenting on prototypicality with Nick Smith, @perayson, and Sebastian Hoffmann. Great to see so many people interested in the topic! I'll next be presenting on vocabulary profiling with @natalie_eloise on Wed, July 5 at 11:30.
Finally, my copies of Corpus Approaches to Discourse have arrived with my chapter on visualization. Thanks for sending two copies so that I could take this great photo (I think I was supposed to only get one)!! @_ctaylor_ @journolinguist
I just heard that the 13 billion parameter Llama-v2 large language model (LLM) has been released by @Meta, fine-tuned for chat completions. This is currently the most powerful open source model. So, I used my standard test to see how much it hallucinates... (1/4)
Just received news that my co-authored paper with @Medlec and @Corpusling on #Kaleidographic (the brainchild of @Medlec) will be published in the @IJCL_journal in Vol. 24 (2), in Aug 2019. This has been a really fun project to work on. See the link here:
Really enjoyed giving an online talk on corpus methods in the classroom today at EduHK. We had around 300 people attend, with many staying on after the end as I responded to all the great questions. A big thanks to Angel Ma and her team for organizing everything so nicely.
Just finished my first joint presentation with @schtepf at @CLconf2019 on data interoperability. Lots of fun working together on this project. Also, really happy to see so many people at the session. If you came and want the slides, just let me know by email.
Very happy to receive the following book in the post this week. I was also honored to be asked to write the foreword for it. I love the cover photo! Mt Fuji is really breathtaking, especially at this time of year.
Heading to Xi'an Jiaotong University, China today to give a plenary talk at the Int. Conf. on Frontiers of Corpus-Based Interdisciplinary Research (FoCIR). I haven't been back for a long time, so very much looking forward to seeing old friends there.
I'm very honored to serve as one of the plenary speakers at this event. I'm planning to talk about multidisciplinary vocabulary research and the impact of AI, but if you plan to come and have related topics you want me to include, do let me know!
A very kind blog post about my research has just been published at @AllAboutCorpora. Really happy to see that the post mentions the collaborations I've had with others in the field. Working with such great people is one of the most enjoyable parts of all my software projects.
If you want to hear an intro to #AntConc and also learn about other corpus tools, you might be interested in the following event. I'll be giving my talk just a few hours after landing in Japan following @aacl2022, so hopefully no flight delays!
For anyone interested, I've now created a test release of AntConc 4.0.2 for Linux to go with the Win and MacOS versions. Just extract the tar.gz archive and double-click the AntConc.sh file inside to launch the app. You can get it here:
Happy New Year!
The #LSBC2024 conference in Kuwait finished yesterday afternoon, with @lovermob giving the final keynote, which included quotes from across the field. We then heard the conference chair Mohammad Alenezi closing the event. Congratulations to his whole team for a very successful two days.
Here in Sapporo (Japan) today to give a keynote lecture "Remembering the texts in corpus linguistics" at the 1st Symposium of the Corpus Tools and Statistical Methods (TASM) SIG of the Japan Association for English Corpus Studies (JAECS).
Had the pleasure to talk with @lovermob as part of the #CorpusCast podcast earlier this week. If you are interested in hearing about my goals in life and why I decided to create #AntConc, check it out in early January.
As #CorpusCast enters its third year, we have a fun episode to kick off 2024: coming up in January, I'm joined by Professor Laurence Anthony (@antlabjp) to hear the story behind AntConc, a widely used tool for corpus analysis 🐜
Lots of talk about ChatGPT recently, so I decided to do some experiments. Results show that the language model dramatically affects the accuracy of info generated. Simple models generate perfect sentences with completely false info! Better models generate increasingly correct info!...
It was really nice to present with @uroemer, and I was so happy to see so many people interested in p-frames. If you were at the session, thanks for coming!
Getting on a plane in a few hours, flying to @CCR_UoB for a week of research as part of the @waseda_univ @unibirmingham partnership. I'll be giving a talk next Friday. Hope to see some of you there!