The CyberSEO Pro plugin includes a powerful capability that allows you to extract full-text articles from HTML pages using the container tag. This functionality can help you display complete articles on your site rather than just snippets or summaries, ensuring your site’s content is unique and informative.

In previous versions of CyberSEO Pro, you could use the universal Full-Text RSS script to extract full-text articles from arbitrary webpages. While the Full-Text RSS script is powerful and can extract almost any article, it is not infallible: it sometimes fails outright or cannot correctly extract certain parts of the HTML, such as embedded videos. That method is still available, but there is now a second one that can extract a full-text article from any webpage with a fixed HTML layout, even when the Full-Text RSS script struggles with it. Note that the new method must be tailored to each website according to its HTML layout. As a result, it won’t work with sources like Google News RSS feeds, which link to many different websites, each with its own layout.

This article guides you through the container tag method using the Inspector tool built into browsers such as Firefox and Chrome. Because not everyone is familiar with HTML or web development, the guide is designed to be as simple and easy to follow as possible.

Step 1: Find the container tag

  1. Open the webpage containing the article you want to extract in either Firefox or Chrome.
  2. Right-click on the main content area of the article and select “Inspect” from the context menu (older browser versions label this item “Inspect Element”). This will open the browser’s Inspector tool, displaying the HTML structure of the page.
  3. In the Inspector tool, the element you clicked is highlighted. This is often an inner element such as a <p>, so move up through its parent elements until the highlight on the page covers the whole article. The element that wraps the main content is the container tag. It could be a <div>, <section>, <article>, or another similar HTML element.

Step 2: Identify the attributes

  1. Examine the highlighted container tag in the Inspector tool to find its attributes. Attributes are properties of an HTML element that provide additional information about it. Common attributes include class, id, style, etc.
  2. Make a note of the attribute(s) and their corresponding values. For example, if the container tag is <div id="main" class="article-content">, the attributes are class with the value "article-content" and id with the value "main".
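
If you would rather not type the JSON by hand, the note-taking step above can be sketched in a few lines of Python. This is only an illustration and is not part of CyberSEO Pro: the class OpeningTagReader and the function describe_container are hypothetical names, and the sketch assumes you have copied the container's opening tag from the Inspector as a string.

```python
import json
from html.parser import HTMLParser


class OpeningTagReader(HTMLParser):
    """Records the name and attributes of the first opening tag it sees."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # Only the first opening tag matters; ignore anything after it.
        if self.tag is None:
            self.tag = tag
            self.attrs = dict(attrs)


def describe_container(opening_tag):
    """Return (tag name, attributes as a JSON string) for a copied opening tag."""
    reader = OpeningTagReader()
    reader.feed(opening_tag)
    reader.close()
    return reader.tag, json.dumps(reader.attrs)


# The example tag from this guide, as copied from the Inspector.
tag_name, attributes_json = describe_container('<div id="main" class="article-content">')
print(tag_name)         # div
print(attributes_json)  # {"id": "main", "class": "article-content"}
```

The order of the keys in the JSON object follows the order of the attributes in the tag and carries no meaning of its own.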

[Screenshot: Chrome Inspector]

Step 3: Configure the CyberSEO Pro plugin

  1. In your feed settings, navigate to the “Advanced” tab.
  2. Select “Use custom settings” in the “Extract full text articles” drop-down menu.
  3. In the “Container tag” field, enter the tag name you found in Step 1 (e.g., div, article, section). For the example above it’s div.
  4. In the “Attributes (JSON format)” field, enter the attributes and their values in JSON format, as found in Step 2. For our example, you would enter {"class": "article-content", "id": "main"}.
  5. Choose whether to include the container tag and its attributes in the extracted content by checking or unchecking the “Inclusive” option.
  6. Save your changes and pull the feed to test the extraction.
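
To get a feel for what the “Container tag”, “Attributes (JSON format)” and “Inclusive” settings describe, here is a minimal Python sketch of container-based extraction. It is not the plugin's actual code: ContainerExtractor and extract_container are hypothetical names, and the sketch assumes that every attribute in the JSON must match the tag's attribute value exactly, which may differ from how CyberSEO Pro matches attributes.

```python
import json
from html.parser import HTMLParser


class ContainerExtractor(HTMLParser):
    """Collects the HTML inside the first tag matching a name and attributes."""

    def __init__(self, tag, attrs, inclusive=False):
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; as written
        self.tag = tag
        self.wanted = attrs
        self.inclusive = inclusive
        self.depth = 0      # nesting level of same-named tags inside the container
        self.parts = []
        self.done = False

    def _matches(self, tag, attrs):
        # Assumption: each wanted attribute must equal the tag's value exactly.
        found = dict(attrs)
        return tag == self.tag and all(found.get(k) == v for k, v in self.wanted.items())

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if self._matches(tag, attrs):
                self.depth = 1
                if self.inclusive:
                    self.parts.append(self.get_starttag_text())
            return
        self.parts.append(self.get_starttag_text())
        if tag == self.tag:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as written.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # This closes the container itself.
                self.done = True
                if self.inclusive:
                    self.parts.append(f"</{tag}>")
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_container(html, tag, attrs_json, inclusive=False):
    """Return the matched container's HTML, or None if no tag matches."""
    parser = ContainerExtractor(tag, json.loads(attrs_json), inclusive)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


# A made-up page that uses the container from this guide's example.
page = (
    '<html><body><div id="nav">Menu</div>'
    '<div id="main" class="article-content"><p>Hello &amp; welcome</p><div>nested</div></div>'
    '<footer>bye</footer></body></html>'
)
settings = '{"class": "article-content", "id": "main"}'
inner = extract_container(page, "div", settings)
outer = extract_container(page, "div", settings, inclusive=True)
print(inner)  # <p>Hello &amp; welcome</p><div>nested</div>
print(outer)  # the same content wrapped in the original <div ...> and </div>
```

In this sketch, the “Inclusive” option only decides whether the container's own opening and closing tags are kept around the extracted content. Comments inside the container are dropped for brevity.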

[Screenshot: CyberSEO Pro full-text extraction settings]

By following these steps, you can easily configure the CyberSEO Pro plugin to extract full text articles from HTML pages using the container tag feature. With a little practice, you’ll be able to identify container tags and their attributes with ease, ensuring your site has the most comprehensive and unique content possible.

This approach does not work for aggregator feeds like Google News or Bing News, but it is an effective way to import full-text articles into your WordPress site from individual websites with consistent HTML layouts.




By admin
