Scrapy Basic
Required Input
• Parsing code (a Python program that defines which data to extract from the HTML)
https://docs.scrapy.org/en/latest/intro/tutorial.html
• URL or a list of URLs (You can use a text file as well)
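As a rough illustration of the text-file option, the sketch below reads URLs from a file. The one-URL-per-line layout, the skipping of blank and `#` lines, the helper name `load_urls`, and the example.com addresses are all assumptions made for this example; the format the plugin actually expects is not stated here.

```python
from pathlib import Path
import tempfile


def load_urls(path):
    """Return the URLs found in a text file, assuming one URL per line.

    Blank lines and lines starting with '#' are skipped (an assumption
    made for this sketch, not documented plugin behavior).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


# Demo with a temporary file holding two made-up URLs.
with tempfile.TemporaryDirectory() as tmp:
    url_file = Path(tmp) / "urls.txt"
    url_file.write_text(
        "https://example.com/a\n\n# a comment line\nhttps://example.com/b\n",
        encoding="utf-8",
    )
    urls = load_urls(url_file)

print(urls)
```

A list built this way has the same shape as the `start_urls` list that a Scrapy spider reads.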
Output/Return Value
• A CSV is returned (preferred).
Headers are defined in the parsing code.
• Any other string can be returned if it suits the user’s purpose.
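A minimal sketch of how parsing code might define the headers and produce CSV text, using only the standard `csv` module. The column names and row values are made up, and an in-memory buffer stands in for standard output so the result can be inspected.

```python
import csv
import io

# Hypothetical header and rows, invented for this example.
header = ("symbol", "price")
rows = [("AAA", "1.23"), ("BBB", "4.56")]

# Write to an in-memory buffer; real parsing code could pass sys.stdout instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)   # the header line comes first
writer.writerows(rows)    # then one line per data row

csv_text = buffer.getvalue()
print(csv_text, end="")
```

Setting `lineterminator="\n"` avoids the `\r\n` line endings that `csv.writer` emits by default.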
Advanced Feature
• Parameters
You can pass values to either your parsing code or the URL(s).
The syntax follows the standard Python string-format rule for named placeholders.
https://riptutorial.com/python/example/13577/named-placeholders
CAUTION
In STU, variables are written with double curly brackets, as in {{variable.variable}}. Python string-format named placeholders use single curly brackets, as in {placeholder}. This plugin accepts both STU variables and standard Python named placeholders.
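The named-placeholder rule can be sketched with plain `str.format`. Whether the plugin applies exactly this call internally is an assumption. The parameter names, their values, and the example.com URL are invented for illustration.

```python
# Hypothetical parameter values supplied by the user.
params = {"symbol": "Symbol", "name": "Name"}

# A named placeholder inside a URL.
url_template = "https://example.com/quote/{symbol}"
url = url_template.format(symbol="AAA")

# A named placeholder inside parsing code. With str.format, a literal brace
# must be doubled, so "{{}}" renders as "{}" while "{symbol}" is substituted.
code_template = "custom_settings = {{}}\nheader = ('{symbol}', '{name}', 'price')"
rendered = code_template.format(**params)

print(url)
print(rendered)
```

Under this rule, literal braces in the parsing code (for example a Python dict) need to be doubled, which is consistent with the `{{` and `}}` that appear in the sample spider code.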
How to set parameters.
Sample Spider Code
import sys
import csv
import scrapy
from random import randint
################################################################################
class MySpider(scrapy.Spider):
    name = 'finance_yahoo_most_active'
    start_urls = START_URLS
    custom_settings = {{
    }}
    header = (
        '{symbol}', '{name}', 'price',
        'change', 'p_change', 'volume',
        'avg_vol_3m', 'market_cap', 'pe_ratio'
    )
    csv_writer = csv.writer(sys.stdout, lineterminator='\n')
    csv_writer.writerow(header)
    # --------------------------------------------------------------------------
    # noinspection PyMethodOverriding
    def parse(self, response):
        texts = response.xpath('//*[@id="scr-res-table"]/div[1]/table/tbody/tr//text()').getall()
        n_rows = len(texts) // 9
        for i in range(n_rows):
            row = (
                texts[i*9 + 0], texts[i*9 + 1], texts[i*9 + 2],
                texts[i*9 + 3], texts[i*9 + 4], texts[i*9 + 5],
                texts[i*9 + 6], texts[i*9 + 7], texts[i*9 + 8],
            )
            row_info = {{
                '{symbol}': texts[i*9 + 0],
                '{name}': texts[i * 9 + 1],
                'price': texts[i * 9 + 2],
                'change': texts[i * 9 + 3],
                'p_change': texts[i * 9 + 4],
                'volume': texts[i * 9 + 5],
                'avg_vol_3m': texts[i * 9 + 6],
                'market_cap': texts[i * 9 + 7],
                'pe_ratio': texts[i * 9 + 8],
            }}
            self.csv_writer.writerow(row)
            yield row_info
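The row-building step in `parse` takes a flat list of cell texts and cuts it into fixed-width rows. The sketch below isolates that idea in plain Python, without Scrapy, so it can be tried on its own. The helper name `chunk_rows`, the three-column layout, and the sample values are hypothetical; the sample spider uses nine columns.

```python
def chunk_rows(texts, n_cols):
    """Split a flat list of cell texts into tuples of n_cols items.

    Trailing items that do not fill a complete row are dropped, which
    mirrors the integer division used in the sample spider.
    """
    n_rows = len(texts) // n_cols
    return [tuple(texts[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]


# Hypothetical column names and cell texts.
columns = ("symbol", "name", "price")
texts = ["AAA", "Alpha Inc.", "1.00", "BBB", "Beta Ltd.", "2.00", "stray"]

rows = chunk_rows(texts, len(columns))
# Pair each row with the column names to get dict items like the ones
# the spider yields.
items = [dict(zip(columns, row)) for row in rows]

print(rows)
print(items)
```

Deriving both the row tuple and the dict from one `columns` tuple keeps the CSV header and the yielded items in step if a column is added or removed.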