PDF2Table
From version 3.819.1007
Additional advanced parameters, described below, were added under the “TABLE” option.
To use these options, you need to understand the functions and terminology of pdfplumber, the Python library on which this plugin is built.
For more resources, please visit these websites:
- https://pypi.org/project/pdfplumber/0.5.28/
- https://newbedev.com/how-to-extract-text-from-pdf-in-python-3
Input (Required)
- Operation mode: either TEXT mode or TABLE mode
- Input PDF file path (.pdf; digitally generated PDFs only)
- Output file name and path (for the TABLE option, you can choose .csv or .txt; for the TEXT option, only .txt is available)
Input (Optional)
- Page number
- Table number (when the selected page contains multiple tables, they are numbered from top to bottom)
- Separator used between values in the table
- Horizontal and Vertical Strategies: how the boundaries of values in the table are determined when the “lines” are not very clear
  - Lines
  - Lines – strict
  - Text
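As a rough illustration of how the page number and table number might map onto pdfplumber calls, the sketch below opens a PDF, takes one page, and picks one table by its position. The function names `pick_table` and `extract_table_from_pdf` are hypothetical, and the 1-based numbering of pages and tables is an assumption; this document does not show how the plugin itself is implemented.

```python
def pick_table(tables, table_number=1):
    """Return one table from a list of tables (1-based numbering is assumed)."""
    if not 1 <= table_number <= len(tables):
        raise IndexError(
            f"table {table_number} not found; page has {len(tables)} table(s)"
        )
    return tables[table_number - 1]


def extract_table_from_pdf(pdf_path, page_number=1, table_number=1):
    """Hypothetical helper: extract one table from one page of a PDF."""
    import pdfplumber  # third-party package: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is a 0-based list
        tables = page.extract_tables()     # each table is a list of rows
    return pick_table(tables, table_number)
```

Calling `extract_table_from_pdf("report.pdf", page_number=2, table_number=1)` would return the first table found on the second page as a list of rows, where `report.pdf` stands in for any digitally generated PDF.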
Output/Return Value
Return Value
String: Full file path of the output file
CSV: Full file path of the output file
File: Full file path of the output file
Return Code
0: Execution successful
1: The table is not included in the PDF file
9: All other responses from the plugin
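A calling script might turn these return codes into readable messages along the following lines. This is a minimal sketch: the names `RETURN_CODES` and `describe_return_code` are hypothetical, and how the return code actually reaches your script depends on how the plugin is run.

```python
# Return codes as listed above; the messages paraphrase this page.
RETURN_CODES = {
    0: "Execution successful",
    1: "The table is not included in the PDF file",
    9: "Other response from the plugin",
}


def describe_return_code(code):
    """Map a plugin return code to a message; unknown codes are reported as such."""
    return RETURN_CODES.get(code, f"Undocumented return code: {code}")
```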
Parameter Settings
For TABLE option
For TEXT option
More tips for TABLE option
Table index: Select the table within the selected page.
Separator: Enter the separator to insert between values in the exported .txt file (default=‘,’)
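To make the effect of the separator concrete, the sketch below writes extracted rows with a chosen separator using Python's standard `csv` module. The function name `write_table` is hypothetical, and whether the plugin quotes values or handles empty cells the same way is an assumption. pdfplumber returns empty cells as `None`, so they are written here as empty strings.

```python
import csv
import io


def write_table(rows, out, separator=","):
    """Write table rows to a file-like object, one line per row."""
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    for row in rows:
        # Replace missing cells (None) with empty strings before writing.
        writer.writerow(["" if cell is None else cell for cell in row])


buffer = io.StringIO()
write_table([["Name", "Qty"], ["Widget", None]], buffer, separator="|")
print(buffer.getvalue())
```

With the separator set to `|`, the two rows are written as `Name|Qty` and `Widget|`.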
What are VERTICAL and HORIZONTAL Strategies
Strategy | Description |
"lines" | Use the page's graphical lines — including the sides of rectangle objects — as the borders of potential table-cells. |
"lines_strict" | Use the page's graphical lines — but not the sides of rectangle objects — as the borders of potential table-cells. |
"text" | For vertical_strategy: Deduce the (imaginary) lines that connect the left, right, or center of words on the page, and use those lines as the borders of potential table-cells. For horizontal_strategy, the same but using the tops of words. |
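In pdfplumber, these strategies are passed as the `vertical_strategy` and `horizontal_strategy` keys of a `table_settings` dictionary. The sketch below shows one way that might look; the helper names `build_table_settings` and `extract_tables_with_strategies` are hypothetical, the 1-based page number is an assumption, and only the three strategies listed in the table above are accepted here.

```python
VALID_STRATEGIES = ("lines", "lines_strict", "text")


def build_table_settings(vertical="lines", horizontal="lines"):
    """Build a pdfplumber table_settings dict from two strategy names."""
    for axis, value in (("vertical", vertical), ("horizontal", horizontal)):
        if value not in VALID_STRATEGIES:
            raise ValueError(f"unknown {axis} strategy: {value!r}")
    return {"vertical_strategy": vertical, "horizontal_strategy": horizontal}


def extract_tables_with_strategies(pdf_path, page_number=1,
                                   vertical="lines", horizontal="lines"):
    """Hypothetical helper: extract all tables on one page with the given strategies."""
    import pdfplumber  # third-party package: pip install pdfplumber

    settings = build_table_settings(vertical, horizontal)
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is a 0-based list
        return page.extract_tables(table_settings=settings)
```

For a table with no ruled lines, trying `vertical="text"` and `horizontal="text"` is a reasonable first step, since both axes then rely on word positions rather than drawn lines.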