Since the arrival of GPT-3, content generators have multiplied the use cases for SEO. It seems a bi-monthly update to review the new progress in the field of language models is in order.
First of all, at the end of 2021, the very large language models club grew significantly.
Each country has tried to showcase its technologies and make them accessible through research papers and public or private demonstrations.
Here are the main competitors in the race:
- US: OpenAI (GPT-3) – Microsoft (Turing-NLG).
- China: Wu Dao 2.0 – PanGu-Alpha.
- South Korea: HyperCLOVA.
- Israel: AI21 Labs (Jurassic-1).
- Europe: Aleph Alpha.
- Open Source: EleutherAI.
Each model has its strengths and weaknesses.
To test them, many SEO software vendors and SEO agencies are now trialing these models.
How to Choose a GPT-3 Model?
You may think that the more parameters a model has, the better it would be (Editor’s note: parameters are the numerical weights a model learns during training).
But you would be wrong.
The number one criterion is absolutely not the number of parameters, because you can obtain great results with lighter models.
Rather, it is the data on which the model was trained.
In fact, to be effective, a model must be able to understand a large number of disparate domains.
The first thing to do is to find out how the model was trained. For GPT-3, the following diagram helps:
We can see that GPT-3 was mainly trained with data from:
- Common Crawl (filtered), covering web archives from 2016 to 2019.
- WebText2, which corresponds to data retrievals on the web.
- Books in English (Books1).
- Books in other languages (Books2).
- English-language Wikipedia.
Now, if we look at how the open-source models are trained, we see that the sources are quite different.
Everything is based on the project The Pile, which is a data set of 825 GB of diversified English texts that are free and accessible to the public.
With The Pile, we find very varied data such as books, GitHub repositories, webpages, discussion journals, articles in medicine, physics, mathematics, computer science, and philosophy.
In general, it will be important to test the language model in your language and especially on your website’s specific vocabulary.
Before we look at specific SEO use cases, let’s look at the pitfalls.
GPT-3 Content Generation Pitfalls for SEO
To generate quality texts that interest your users, it is important to know the pitfalls to avoid.
First of all, whatever model you choose, you must provide it with quality examples as input so that it can imitate them and, above all, respect a specific type of text.
If you ask a language model to generate content on “New York plumbers,” the model will head down various and often unsuitable paths:
- Should it create a made-up directory?
- Should it create content about a New York plumber?
- Maybe a poem about plumbing in New York?
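The fix is to pin down the text type with a few quality examples before the new subject. Here is a minimal sketch of that few-shot prompting pattern; the `build_prompt` helper and the example descriptions are illustrative assumptions, not part of any specific model's API:

```python
# Minimal sketch: constrain a language model by prepending quality examples
# (few-shot prompting). The build_prompt helper and example texts are
# illustrative assumptions, not a specific vendor's API.

def build_prompt(examples, new_subject):
    """Assemble a few-shot prompt: each example pins down the text type
    (here, a short service description) so the model imitates it instead
    of drifting toward directories, poems, or other formats."""
    parts = []
    for subject, text in examples:
        parts.append(f"Subject: {subject}\nDescription: {text}\n")
    # End with the new subject and an open "Description:" for the model to complete.
    parts.append(f"Subject: {new_subject}\nDescription:")
    return "\n".join(parts)

examples = [
    ("Chicago electricians", "Our licensed electricians serve the Chicago "
     "area with 24/7 emergency call-outs and free quotes."),
    ("Boston locksmiths", "Boston's trusted locksmiths: fast lockout "
     "service, key cutting, and lock upgrades for homes and businesses."),
]

prompt = build_prompt(examples, "New York plumbers")
print(prompt)
```

Sent to the model, this prompt strongly biases the completion toward a third service description rather than a directory or a poem.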
Second, language models offer no protection against duplicate content.
Therefore, whatever text you generate, you will have to use a third-party tool to check that the model has not reproduced something it learned during training – and, more specifically, that the text does not already exist elsewhere and is unique.
There are many tools available to confirm whether your content is unique. If it is not, simply regenerate the content.
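The idea behind such a check can be sketched with word n-gram overlap (Jaccard similarity) against a set of known texts; the threshold and reference corpus here are illustrative assumptions, while real tools compare against web-scale indexes:

```python
# Minimal sketch of a duplicate-content check: compare a generated text
# against known reference texts using word n-gram overlap (Jaccard
# similarity). The 0.3 threshold is an illustrative assumption.

def ngrams(text, n=3):
    """Set of word n-grams in the text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(generated, reference, n=3):
    """Jaccard similarity between the n-gram sets of two texts."""
    a, b = ngrams(generated, n), ngrams(reference, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_unique(generated, corpus, threshold=0.3):
    """Flag the text as non-unique if it overlaps too heavily with any
    known document; in that case, simply regenerate the content."""
    return all(similarity(generated, doc) < threshold for doc in corpus)
```

An identical text scores 1.0 and is rejected; an unrelated one scores near 0.0 and passes.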
In addition, content generation models do not optimize text for search at all.
Again, they are trained on a wide variety of sources, so you will have to guide them with the semantic SEO tools available on the market.
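One simple way to apply that guidance is to score a generated draft against a target vocabulary (e.g. terms suggested by a semantic SEO tool) and feed the missing terms back into the next prompt. A minimal sketch, where the draft and term list are illustrative:

```python
# Minimal sketch: since language models do not optimize for search on
# their own, check a generated draft against a target vocabulary.
# The draft and term list below are illustrative assumptions.

def vocabulary_coverage(text, target_terms):
    """Return the fraction of target terms present in the text, plus
    the missing terms to feed back into the next generation prompt."""
    lowered = text.lower()
    missing = [t for t in target_terms if t.lower() not in lowered]
    covered = 1 - len(missing) / len(target_terms)
    return covered, missing

draft = ("Our New York plumbers handle leak repair and drain cleaning "
         "across all five boroughs.")
terms = ["plumber", "leak repair", "drain cleaning", "water heater", "emergency"]
score, missing = vocabulary_coverage(draft, terms)
print(score, missing)
```

Here the draft covers 3 of the 5 terms, so "water heater" and "emergency" would be worked into the next regeneration prompt.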