Introduction To Vector Databases And How To Use AI For SEO via @sejournal, @vahandev
Boost your AI SEO strategy with vector databases. Find out how to leverage vector embeddings to improve your content relevance and search engine rankings. The post Introduction To Vector Databases And How To Use AI For SEO appeared first...
A vector database is a collection of data where each piece of data is stored as a (numerical) vector. A vector represents an object or entity, such as an image, person, place etc. in the abstract N-dimensional space.
Vectors, as explained in the previous chapter, are crucial for identifying how entities are related and can be used to find their semantic similarity. This can be applied in several ways for SEO – such as grouping similar keywords or content (using kNN).
In this article, we are going to learn a few ways to apply AI to SEO, including finding semantically similar content for internal linking. This can help you refine your content strategy in an era where search engines increasingly rely on LLMs.
You can also read a previous article in this series about how to find keyword cannibalization using OpenAI’s text embeddings.
Let’s dive in here to start building the basis of our tool.
Understanding Vector Databases
If you have thousands of articles and want to find the closest semantic similarity for your target query, you can’t create vector embeddings for all of them on the fly to compare, as it is highly inefficient.
For that to happen, we would need to generate vector embeddings only once and keep them in a database we can query and find the closest match article.
And that is what vector databases do: They are special types of databases that store embeddings (vectors).
When you query the database, unlike traditional databases, they perform cosine similarity match and return vectors (in this case articles) closest to another vector (in this case a keyword phrase) being queried.
Here is what it looks like:
Text embedding record example in the vector database.
In the vector database, you can see vectors alongside metadata stored, which we can easily query using a programming language of our choice.
In this article, we will be using Pinecone due to its ease of understanding and simplicity of use, but there are other providers such as Chroma, BigQuery, or Qdrant you may want to check out.
Let’s dive in.
1. Understanding Vector Databases2. Create A Vector Database3. Export Your Articles From Your CMS4. Inserting OpenAi's Text Embeddings Into The Vector Database5. Finding An Article Match For A Keyword6. Inserting Google Vertex AI Text Embeddings Into The Vector Database7. Finding An Article Match For A Keyword Using Google Vertex AI8. Try Testing The Relevance Of Your Article Writing1. Create A Vector Database
First, register an account at Pinecone and create an index with a configuration of “text-embedding-ada-002” with ‘cosine’ as a metric to measure vector distance. You can name the index anything, we will name itarticle-index-all-ada‘.
Creating a vector database.
This helper UI is only for assisting you during the setup, in case you want to store Vertex AI vector embedding you need to set ‘dimensions’ to 768 in the config screen manually to match default dimensionality and you can store Vertex AI text vectors (you can set dimension value anything from 1 to 768 to save memory).
In this article we will learn how to use OpenAi’s ‘text-embedding-ada-002’ and Google’s Vertex AI ‘text-embedding-005’ models.
Once created, we need an API key to be able to connect to the database using a host URL of the vector database.
Next, you will need to use Jupyter Notebook. If you don’t have it installed, follow this guide to install it and run this command (below) afterward in your PC’s terminal to install all necessary packages.
And remember ChatGPT is very useful when you encounter issues during coding!
2. Export Your Articles From Your CMS
Next, we need to prepare a CSV export file of articles from your CMS. If you use WordPress, you can use a plugin to do customized exports.
As our ultimate goal is to build an internal linking tool, we need to decide which data should be pushed to the vector database as metadata. Essentially, metadata-based filtering acts as an additional layer of retrieval guidance, aligning it with the general RAG framework by incorporating external knowledge, which will help to improve retrieval quality.
For instance, if we are editing an article on “PPC” and want to insert a link to the phrase “Keyword Research,” we can specify in our tool that “Category=PPC.” This will allow the tool to query only articles within the “PPC” category, ensuring accurate and contextually relevant linking, or we may want to link to the phrase “most recent google update” and limit the match only to news articles by using ‘Type’ and published this year.
In our case, we will be exporting:
Title. Category. Type. Publish Date. Publish Year. Permalink. Meta Description. Content.To help return the best results, we would concatenate the title and meta descriptions fields as they are the best representation of the article that we can vectorize and ideal for embedding and internal linking purposes.
Using the full article content for embeddings may reduce precision and dilute the relevance of the vectors.
This happens because a single large embedding tries to represent multiple topics covered in the article at once, leading to a less focused and relevant representation. Chunking strategies (splitting the article by natural headings or semantically meaningful segments) need to be applied, but these are not the focus of this article.
Here’s the sample export file you can download and use for our code sample below.
2. Inserting OpenAi’s Text Embeddings Into The Vector Database
Assuming you already have an OpenAI API key, this code will generate vector embeddings from the text and insert them into the vector database in Pinecone.
You need to create a notebook file and copy and paste it in there, then upload the CSV file ‘Sample Export File.csv’ in the same folder.
Jupyter project.
Once done, click on the Run button and it will start pushing all text embedding vectors into the index article-index-all-ada we created in the first step.
Running the script.
You will see an output log text of embedding vectors. Once finished, it will show the message at the end that it was successfully finished. Now go and check your index in the Pinecone and you will see your records are there.
3. Finding An Article Match For A Keyword
Okay now, let’s try to find an article match for the Keyword.
Create a new notebook file and copy and paste this code.
We’re trying to find a match for these keywords:
SEO Tools. TikTok. SEO Consultant.And this is the result we get after executing the code:
Find a match for the keyword phrase from vector database
The table formatted output at the bottom shows the closest article matches to our keywords.
4. Inserting Google Vertex AI Text Embeddings Into The Vector Database
Now let’s do the same but with Google Vertex AI ‘text-embedding-005’embedding. This model is notable because it’s developed by Google, powers Vertex AI Search, and is specifically trained to handle retrieval and query-matching tasks, making it well-suited for our use case.
You can even build an internal search widget and add it to your website.
Start by signing in to Google Cloud Console and create a project. Then from the API library find Vertex AI API and enable it.
Screenshot from Google Cloud Console, December 2024
Set up your billing account to be able to use Vertex AI as pricing is $0.0002 per 1,000 characters (and it offers $300 credits for new users).
Once you set it, you need to navigate to API Services > Credentials create a service account, generate a key, and download them as JSON.
Rename the JSON file to config.json and upload it (via the arrow up icon) to your Jupyter Notebook project folder.
Screenshot from Google Cloud Console, December 2024
In the setup first step, create a new vector database called article-index-vertex by setting dimension 768 manually.
Once created you can run this script to start generating vector embeddings from the the same sample file using Google Vertex AI text-embedding-005 model (you can choose text-multilingual-embedding-002 if you have non-English text).
You will see below in logs of created embeddings.
Screenshot from Google Cloud Console, December 2024
4. Finding An Article Match For A Keyword Using Google Vertex AI
Now, let’s do the same keyword matching with Vertex AI. There is a small nuance as you need to use ‘RETRIEVAL_QUERY’ vs. ‘RETRIEVAL_DOCUMENT’ as an argument when generating embeddings of keywords as we are trying to perform a search for an article (aka document) that best matches our phrase.
Task types are one of the important advantages that Vertex AI has over OpenAI’s models.
It ensures that the embeddings capture the intent of the keywords which is important for internal linking, and improves the relevance and accuracy of the matches found in your vector database.
Use this script for matching the keywords to vectors.
And you will see scores generated:
Keyword Matche Scores produced by Vertex AI text embedding model
Try Testing The Relevance Of Your Article Writing
Think of this as a simplified (broad) way to check how semantically similar your writing is to the head keyword. Create a vector embedding of your head keyword and entire article content via Google’s Vertex AI and calculate a cosine similarity.
If your text is too long you may need to consider implementing chunking strategies.
A close score (cosine similarity) to 1.0 (like 0.8 or 0.7) means you’re pretty close on that subject. If your score is lower you may find that an excessively long intro which has a lot of fluff may be causing dilution of the relevance and cutting it helps to increase it.
But remember, any edits made should make sense from an editorial and user experience perspective as well.
You can even do a quick comparison by embedding a competitor’s high-ranking content and seeing how you stack up.
Doing this helps you to more accurately align your content with the target subject, which may help you rank better.
There are already tools that perform such tasks, but learning these skills means you can take a customized approach tailored to your needs—and, of course, to do it for free.
Experimenting for yourself and learning these skills will help you to keep ahead with AI SEO and to make informed decisions.
As additional readings, I recommend you dive into these great articles:
GraphRAG 2.0 Improves AI Search Results Introducing SEOntology: The Future Of SEO In The Age Of AI Unlocking The Power Of LLM And Knowledge Graph (An Introduction)More resources:
AI Has Changed How Search Works AI For SEO: Can You Work Faster & Smarter? Leveraging Generative AI Tools For SEOFeatured Image: Aozorastock/Shutterstock