AWS textract-trp package issue: cannot extract key-value pairs
I'm using AWS Lambda (Python 3.8 runtime) to run the code example below:
```python
import boto3
from trp import Document

# Document
documentName = "employmentapp.png"

# Amazon Textract client
textract = boto3.client('textract')

# Call Amazon Textract
with open(documentName, "rb") as document:
    response = textract.analyze_document(
        Document={
            'Bytes': document.read(),
        },
        FeatureTypes=["FORMS"])

#print(response)

doc = Document(response)

for page in doc.pages:
    # Print fields
    print("Fields:")
    for field in page.form.fields:
        print("Key: {}, Value: {}".format(field.key, field.value))

    # Get field by key
    print("\nGet Field by Key:")
    key = "Phone Number:"
    field = page.form.getFieldByKey(key)
    if(field):
        print("Key: {}, Value: {}".format(field.key, field.value))

    # Search fields by key
    print("\nSearch Fields:")
    key = "address"
    fields = page.form.searchFieldsByKey(key)
    for field in fields:
        print("Key: {}, Value: {}".format(field.key, field.value))
```
I'm getting this error:
```
Traceback (most recent call last):
  File "/Users/shimon_zouzout/CloudZoneRepos/Projects/CloudZone/cloudzoneprod_lambdas/billing/BILLING_invoices-email-ocr/tests/queries.py", line 30, in <module>
    doc = Document(response)
  File "/Users/shimon_zouzout/Library/Python/3.9/lib/python/site-packages/trp/__init__.py", line 633, in __init__
    self._parse()
  File "/Users/shimon_zouzout/Library/Python/3.9/lib/python/site-packages/trp/__init__.py", line 667, in _parse
    page = Page(documentPage["Blocks"], self._blockMap)
  File "/Users/shimon_zouzout/Library/Python/3.9/lib/python/site-packages/trp/__init__.py", line 516, in __init__
    self._parse(blockMap)
  File "/Users/shimon_zouzout/Library/Python/3.9/lib/python/site-packages/trp/__init__.py", line 530, in _parse
    l = Line(item, blockMap)
  File "/Users/shimon_zouzout/Library/Python/3.9/lib/python/site-packages/trp/__init__.py", line 142, in __init__
    if(blockMap[cid]["BlockType"] == "WORD"):
KeyError: '73d47382-4f5a-4423-9665-124380736c2a'
```
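The traceback shows trp's `Line` constructor looking up a CHILD ID in its block map and not finding it, which suggests the response contains a relationship pointing at a block that is not present in the response trp was given. The helper below is a minimal diagnostic sketch, not part of trp; `find_missing_child_ids` is a hypothetical name, and it only assumes the documented `Blocks`, `Id`, `BlockType` and `Relationships` fields of a Textract response. It lists every relationship ID that does not resolve to a block, so you can see which block types are affected before deciding whether the problem lies in the response or in the parser.

```python
def find_missing_child_ids(response):
    """Hypothetical diagnostic helper (not part of trp).

    Returns (parent_id, parent_block_type, relationship_type, missing_id)
    tuples for every relationship ID in a Textract response that does not
    match the Id of any block in that same response.
    """
    blocks = response.get("Blocks", [])
    # Index every block ID present in the response.
    block_ids = {block["Id"] for block in blocks}
    missing = []
    for block in blocks:
        # Some blocks (e.g. WORD) have no Relationships key at all.
        for rel in block.get("Relationships") or []:
            for cid in rel.get("Ids", []):
                if cid not in block_ids:
                    missing.append(
                        (block["Id"], block.get("BlockType"), rel.get("Type"), cid)
                    )
    return missing
```

Running `print(find_missing_child_ids(response))` on the `response` from the code above should show whether `'73d47382-4f5a-4423-9665-124380736c2a'` is among the unresolved IDs and which kind of block references it.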
Can someone please help? I want to extract key-value pairs from PDF invoices without torturing myself with regular expressions.
- I've added the textract-trp package through an AWS Lambda layer.
- The same error occurs when I run this code locally (the traceback above is from a local run).
Thanks in advance!
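If the goal is only key-value pairs, one option is to read them straight from the raw `analyze_document` response instead of going through trp. The sketch below is an alternative approach, not a fix for trp, and the function names `get_text` and `extract_key_values` are hypothetical. It assumes the documented FORMS output shape: `KEY_VALUE_SET` blocks whose `EntityTypes` is `KEY` or `VALUE`, a `VALUE` relationship from each key to its value, and `CHILD` relationships to `WORD` blocks (with `Text`) or `SELECTION_ELEMENT` blocks (with `SelectionStatus`). Unlike trp, it skips IDs that do not resolve instead of raising `KeyError`.

```python
def get_text(block, block_map):
    """Join the text of a block's CHILD WORD blocks (hypothetical helper)."""
    words = []
    for rel in block.get("Relationships") or []:
        if rel["Type"] != "CHILD":
            continue
        for cid in rel["Ids"]:
            child = block_map.get(cid)
            if child is None:
                # Unresolved ID (the case that crashes trp): skip it.
                continue
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif (child["BlockType"] == "SELECTION_ELEMENT"
                  and child.get("SelectionStatus") == "SELECTED"):
                # Arbitrary marker chosen for this sketch for a ticked checkbox.
                words.append("X")
    return " ".join(words)


def extract_key_values(response):
    """Map each KEY's text to its VALUE's text (hypothetical helper).

    Duplicate key texts overwrite earlier ones in this sketch.
    """
    block_map = {b["Id"]: b for b in response.get("Blocks", [])}
    pairs = {}
    for block in block_map.values():
        if (block["BlockType"] != "KEY_VALUE_SET"
                or "KEY" not in block.get("EntityTypes", [])):
            continue
        key_text = get_text(block, block_map)
        value_texts = []
        for rel in block.get("Relationships") or []:
            if rel["Type"] != "VALUE":
                continue
            for vid in rel["Ids"]:
                value_block = block_map.get(vid)
                if value_block is not None:
                    value_texts.append(get_text(value_block, block_map))
        pairs[key_text] = " ".join(value_texts)
    return pairs
```

With the `response` from the code above, `for key, value in extract_key_values(response).items(): print("Key: {}, Value: {}".format(key, value))` mirrors the "Fields" loop. Silently skipping unresolved IDs means any text behind them is dropped, so it is worth running a check for missing IDs first rather than relying on this to hide the problem.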
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow