
Info

This function is one of the Plugin Operations. You can find a video walk-through in the ARGOS RPA+ video tutorials.


Scrapy Basic

Author: Jerry Chae

Description

This plugin is for web-scraping processes. It uses the Python framework Scrapy as its engine.

https://scrapy.org/ 

Prerequisite

This plugin requires the user to have Python coding skills.


Need help?

For technical issues, contact tech@argos-labs.com.


You may search all operations.



Required Input

• Parsing code (a Python program that defines what data is to be extracted from the HTML)

    https://docs.scrapy.org/en/latest/intro/tutorial.html

• URL or a list of URLs (a text file can be used as well; see the sketch below)
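
A hypothetical urls.txt, assuming the common one-URL-per-line layout (check the plugin's parameter help in STU for the exact format it expects):

https://finance.yahoo.com/most-active?offset=0&count=25
https://finance.yahoo.com/most-active?offset=25&count=25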


Output/Return Value

• A CSV will be returned. (Preferred)

  The headers are defined in the Parsing code (see the sketch after this list).

• Any string can be returned, depending on the user's purpose.
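
For instance, the returned CSV text can be consumed downstream like this (a minimal sketch; the result value and column names are only illustrative):

import csv
import io

result = 'symbol,price\nAAPL,227.50\nTSLA,242.84\n'
for record in csv.DictReader(io.StringIO(result)):
    print(record['symbol'], record['price'])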


Advanced Feature

• Parameters

  You can pass "values" to either your Parsing code or to the URL(s).


The syntax follows the standard Python string format named-placeholder rule (see the sketch below).

https://riptutorial.com/python/example/13577/named-placeholders

https://pyformat.info/
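
A minimal sketch of the named-placeholder syntax (the names offset and count here are only illustrative):

url_template = 'https://finance.yahoo.com/most-active?offset={offset}&count={count}'
print(url_template.format(offset=0, count=25))
# -> https://finance.yahoo.com/most-active?offset=0&count=25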



CAUTION

In STU, variables are written with double curly brackets, e.g. {{variable.variable}}. Python string format named placeholders use single curly brackets, e.g. {placeholder}. You can use both STU variables and Python named placeholders with this plugin.
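
A minimal sketch of why the sample spider below doubles its braces, assuming the plugin applies Python's str.format() substitution to the Parsing code (the names here are only illustrative):

code_template = """
header = ('{symbol}', 'price')   # {symbol} is replaced by a plugin parameter
custom_settings = {{}}           # literal braces are doubled so they survive .format()
"""
print(code_template.format(symbol='Ticker'))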


How to set parameters.





Sample Spider Code

import sys
import csv
import scrapy


################################################################################
class MySpider(scrapy.Spider):
    name = 'finance_yahoo_most_active'
    # START_URLS is expected to be supplied by the plugin at run time
    # (the URL or list of URLs given as Required Input).
    start_urls = START_URLS

    # Literal braces are doubled ({{ }}) so they survive the plugin's
    # named-placeholder substitution.
    custom_settings = {{
    }}

    # '{symbol}' and '{name}' are named placeholders filled in by plugin
    # parameters; the remaining entries are fixed column headers.
    header = (
        '{symbol}', '{name}', 'price',
        'change', 'p_change', 'volume',
        'avg_vol_3m', 'market_cap', 'pe_ratio'
    )
    # Write the CSV (header row first) to stdout, which becomes the
    # plugin's return value.
    csv_writer = csv.writer(sys.stdout, lineterminator='\n')
    csv_writer.writerow(header)

    # --------------------------------------------------------------------------
    # noinspection PyMethodOverriding
    def parse(self, response):
        # Each row of the "Most Active" table yields nine text cells.
        texts = response.xpath('//*[@id="scr-res-table"]/div[1]/table/tbody/tr//text()').getall()
        n_rows = len(texts) // 9
        for i in range(n_rows):
            row = (
                texts[i*9 + 0], texts[i*9 + 1], texts[i*9 + 2],
                texts[i*9 + 3], texts[i*9 + 4], texts[i*9 + 5],
                texts[i*9 + 6], texts[i*9 + 7], texts[i*9 + 8],
            )
            row_info = {{
                '{symbol}': texts[i*9 + 0],
                '{name}': texts[i*9 + 1],
                'price': texts[i*9 + 2],
                'change': texts[i*9 + 3],
                'p_change': texts[i*9 + 4],
                'volume': texts[i*9 + 5],
                'avg_vol_3m': texts[i*9 + 6],
                'market_cap': texts[i*9 + 7],
                'pe_ratio': texts[i*9 + 8],
            }}
            # Write the row as CSV and also yield it as a Scrapy item.
            self.csv_writer.writerow(row)
            yield row_info
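
To try the Parsing code outside the plugin (a sketch only, not the plugin's own mechanism), first resolve the placeholders yourself: replace the doubled braces with single ones, fill in '{symbol}' and '{name}' with literal header names, and define the URL list above the class, for example:

START_URLS = ['https://finance.yahoo.com/most-active']

Then save the file (e.g. as my_spider.py, an illustrative name) and run it with Scrapy's built-in runner:

scrapy runspider my_spider.py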