Info
This function is one of the Plugin operations. You can find the video in the ARGOS RPA+ video tutorial.

We will release this doc soon. Please visit again later.


Web Extract
You can build a bot to extract data from websites (web scraping) using this tool. To use this operation, you need a working knowledge of HTML and YAML.

Tip 1. This operation is used after extracting the HTML source file from your browser.

[Image]

Need help?

Technical contact: tech@argos-labs.com

Tip 2. The Parameters.

[Image: parameters]

1) Specify your HTML source file here.

2) Specify your Rule file (YAML) here. Always check the check-box, because this file is mandatory.

3) If your data has many occurrences, you can limit the number of items extracted by setting the number here (0, the default, means no limit).

4) Define the preferred encoding of your HTML file here. If your choice does not work, Web Extract falls back to auto-detect mode.

5) Define the HTML parsing standard here, or leave it unchecked for auto-detect mode.

6) Choose your output format (String, CSV, or File).

7) You must set your variable in the Settings menu of the Main menu.
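The behavior described for parameters 3 and 4 can be pictured with a short Python sketch. It only illustrates the idea and is not how Web Extract is implemented; the function names `decode_html` and `apply_limit` and the list of fallback encodings are assumptions made for this example.

```python
def decode_html(raw: bytes, preferred: str = "utf-8") -> str:
    """Decode raw HTML bytes, falling back when the preferred encoding fails."""
    # Try the caller's choice first, then a few common fallbacks
    # (the fallback list is an assumption for this sketch).
    for encoding in (preferred, "utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte to a character, so this last resort never raises.
    return raw.decode("latin-1")


def apply_limit(items: list, limit: int = 0) -> list:
    """Return at most `limit` items; 0 means no limit."""
    return list(items) if limit == 0 else list(items[:limit])
```

For example, `decode_html("café".encode("utf-8"), "ascii")` fails on the ASCII attempt and then succeeds with UTF-8, and `apply_limit(rows, 0)` returns every row.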

Tip 3. A simple example below should help you build the web scraping bot.

Below is the target web page.

[Image: target web page]

Below is its HTML source file.

[Image: HTML source file]

Below is the Rule file (YAML).

[Image: Rule file (YAML)]

And finally, the output file with the extracted data.

[Image: output file with extracted data]

Tip 4. The construction (syntax) of the Rule file is explained below.

[Image: Rule file syntax]

1) Document the Rule file with comments.

2) Regardless of the desired final output format, always start with [csv].

3) [or] is used when the website can return more than one type of HTML source. It is optional.

4) [header] defines the labels of your output data table.

5) The rest of the YAML specifies the data to be extracted. Use combinations of tag (name) and attribute (key + value) to identify the data.
You may use multiple attributes if needed. Note that the Rule file also supports “split” and “re-replace” for cleaning up the data.
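To make the idea of identifying data by tag (name) plus attribute (key + value) concrete, here is a minimal Python sketch using only the standard library. It is not the plugin's implementation, and the names `TagAttrExtractor` and `extract` are hypothetical. It collects the text of every element whose tag name matches and whose attributes include all the requested key/value pairs.

```python
from html.parser import HTMLParser


class TagAttrExtractor(HTMLParser):
    """Collect the text of every element matching a tag name and attributes."""

    def __init__(self, name, attrs):
        super().__init__()
        self.name = name
        self.wanted = attrs   # attribute key/value pairs that must all match
        self.depth = 0        # > 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.name:
                self.depth += 1  # nested element with the same tag name
        elif tag == self.name and self.wanted.items() <= dict(attrs).items():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.name:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract(html, name, attrs):
    """Return the text of all elements matching the tag name and attributes."""
    parser = TagAttrExtractor(name, attrs)
    parser.feed(html)
    parser.close()
    return parser.results
```

Passing more than one attribute narrows the match, which mirrors the note above about using multiple attributes when a single one is not specific enough.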

Tip 5. You can also use XPath to specify the target area in the HTML source file, as in the example below.

[Image: XPath example]
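For readers new to XPath, the following Python sketch shows how an XPath expression selects a target area. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, so real-world HTML usually needs a more lenient parser. The sample page and the expression are made up for illustration and are not taken from the Rule file syntax.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page (hypothetical data).
PAGE = """
<html><body>
  <table id="prices">
    <tr><td class="item">Pen</td><td class="cost">1.20</td></tr>
    <tr><td class="item">Ink</td><td class="cost">3.50</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Select every "cost" cell inside the table whose id is "prices".
xpath = ".//table[@id='prices']/tr/td[@class='cost']"
costs = [td.text for td in root.findall(xpath)]
```

Here `costs` holds the text of the two cost cells, in document order.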

Additional explanations are provided below.

[Image]

The split command can take an integer, or you can define a separator as shown in this example.
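The exact semantics of split are not spelled out on this page. The Python sketch below assumes that the integer selects which piece to keep (zero-based) and that the separator defaults to whitespace when none is given; both are assumptions, and the helper name `split_pick` is hypothetical.

```python
def split_pick(value: str, index: int, separator=None) -> str:
    """Split `value` and keep one piece (assumed meaning of the integer)."""
    # separator=None splits on runs of whitespace, as str.split does by default.
    parts = value.split(separator)
    return parts[index].strip()
```

Under these assumptions, `split_pick("Price: 1,200 KRW", 1)` keeps `1,200`, and `split_pick("Seoul | Korea", 0, "|")` keeps `Seoul`.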

The re-replace command replaces every match of the “from” value (a regular expression) with the “to” value (a string).
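The effect of a regular-expression replacement can be tried out in Python with `re.sub`. This is only an analogy for experimenting with patterns; the helper name `re_replace` is hypothetical, and the plugin's regular-expression dialect may differ from Python's.

```python
import re


def re_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every match of `pattern` (a regex) in `value` with `replacement`."""
    return re.sub(pattern, replacement, value)
```

For example, `re_replace("1,200 KRW", r"[^0-9]", "")` strips everything except digits and yields `1200`.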

Global options can be added at the bottom of the Rule file.

In this example, the data reads “There is no Result” when nothing is found (the default is “No Result”), and skip-empty-row takes a true/false parameter.
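One way to picture these two global options is the Python sketch below. It assumes that an empty row is one whose cells are all blank and that the no-result text is returned as a single one-cell row; both are assumptions, and the function name `finalize` and its defaults are hypothetical rather than the plugin's actual behavior.

```python
def finalize(rows, no_result="No Result", skip_empty_row=True):
    """Apply the two global options to a list of extracted rows."""
    if skip_empty_row:
        # Drop rows in which every cell is blank (assumed meaning of "empty").
        rows = [row for row in rows if any(cell.strip() for cell in row)]
    # When nothing is left, report the configured no-result text instead.
    return rows if rows else [[no_result]]
```

Calling `finalize([], no_result="There is no Result")` returns a single row containing that message.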

Version	Old Version 1	New Version 2
Changes made by	ahnhyunyoung (Unlicensed)	Former user
Saved on	Aug 20, 2019	Aug 21, 2019
