Recognizing and Extracting Text from Images in Python with the Baidu API

Published: 2023/12/16

Below is a complete guide to "Recognizing and Extracting Text from Images in Python with the Baidu API", including two practical examples:

1. Preparation

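First, install Python; Python 3.x is recommended.
Make sure pip, the Python package manager, is available; the standard Python installer usually bundles it.
Register a Baidu API account, enable the text recognition (OCR) service, and obtain an API Key and Secret Key.
Install an HTTP library such as requests (pip install requests) to send HTTP requests and handle responses; urllib is part of the standard library and needs no installation.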
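Once you have the API Key and Secret Key, it helps to keep them out of the source code. The following is a minimal sketch, assuming the keys are stored in two environment variables; the names BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY and the helper load_credentials() are hypothetical and chosen only for this illustration.

```python
import os


def load_credentials(env=None):
    """Return (api_key, secret_key) read from environment variables.

    The variable names below are assumptions made for this sketch,
    not names required by Baidu.
    """
    env = os.environ if env is None else env
    api_key = env.get("BAIDU_OCR_API_KEY", "").strip()
    secret_key = env.get("BAIDU_OCR_SECRET_KEY", "").strip()
    if not api_key or not secret_key:
        raise RuntimeError("Set BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY first")
    return api_key, secret_key
```

Calling load_credentials() at the start of a script fails early with a readable message when a key is missing, instead of failing later inside an HTTP call.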

2. Code Implementation

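In the Python code, read the image as binary data, Base64-encode it, send it to the Baidu API in an HTTP request, and extract the recognized text from the JSON response.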
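The encoding step can be sketched on its own. This assumes the endpoint takes the image as a Base64 string in a form field named image, sent as application/x-www-form-urlencoded; build_ocr_body() is a hypothetical helper name, and it uses only the standard library.

```python
import base64
from urllib.parse import urlencode


def build_ocr_body(image_bytes):
    """Return the form-encoded request body for one image."""
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    # Base64 turns raw bytes into ASCII text
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=' so they survive form decoding
    return urlencode({"image": encoded})


body = build_ocr_body(b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG")
print(body[:40])
```

When a dict is passed to requests.post(..., data=...), requests performs the URL-encoding itself, so in that case only the Base64 step has to be done by hand.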

Here is a simple example:

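import base64
import requests

# Set the API Key and Secret Key (placeholders; use your own credentials)
API_KEY = 'your api key'
SECRET_KEY = 'your secret key'

# Set the image path
img_path = 'test.jpg'

# Exchange the API Key and Secret Key for an access_token
token_url = 'https://aip.baidubce.com/oauth/2.0/token'
token_params = {'grant_type': 'client_credentials', 'client_id': API_KEY, 'client_secret': SECRET_KEY}
access_token = requests.post(token_url, params=token_params).json()['access_token']

# Read the image's binary data and Base64-encode it
with open(img_path, 'rb') as f:
    image = base64.b64encode(f.read())

# Send the HTTP request; requests URL-encodes the form field
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
response = requests.post('https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=' + access_token, headers=headers, data={'image': image})

# Parse the JSON response
result = response.json()
if 'words_result' in result:
    for words in result['words_result']:
        print(words['words'])
else:
    print('No result')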

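The code above uses the requests library to send a POST request that carries the image data in the request body. The endpoint expects the image as a Base64-encoded, URL-encoded form field named image, with the Content-Type header set to application/x-www-form-urlencoded. The API Key and Secret Key are not sent in the request headers: they are exchanged for an access_token at Baidu's OAuth token endpoint, and the access_token is passed as a URL query parameter. The URL path selects the API endpoint; here we use the general_basic endpoint of Baidu OCR text recognition.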
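The response handling can be made a little more defensive. The sketch below assumes that a failed call returns a JSON object containing error_code and error_msg fields, which is how Baidu's error responses are commonly described; OcrError and extract_lines() are hypothetical names introduced for this illustration.

```python
class OcrError(Exception):
    """Raised when the response carries an error instead of text."""


def extract_lines(result):
    """Return the recognized lines from a parsed JSON response (a dict)."""
    # Assumption: error responses contain 'error_code' and 'error_msg'
    if "error_code" in result:
        raise OcrError("{}: {}".format(result["error_code"], result.get("error_msg", "")))
    # A successful response lists one dict per recognized line
    return [item["words"] for item in result.get("words_result", [])]


# Hand-written sample data shaped like a successful response
sample = {"words_result_num": 2, "words_result": [{"words": "hello"}, {"words": "world"}]}
print("\n".join(extract_lines(sample)))
```

Raising an exception on error keeps an invalid or expired access_token from being mistaken for an image that simply contains no text.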

3. Example 1: Extracting Text from a Screenshot

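We can use Python to capture what is on the screen, or process image files that already exist on disk. Here is sample code that extracts text from a screenshot copied to the clipboard: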

import base64
import io
from PIL import ImageGrab
import requests

def get_access_token():
    """
    Get the API access_token
    """
    api_key = 'your api key'
    secret_key = 'your secret key'
    url = 'https://aip.baidubce.com/oauth/2.0/token'
    params = {'grant_type': 'client_credentials', 'client_id': api_key, 'client_secret': secret_key}
    res = requests.post(url, params=params)
    x = res.json()
    return x['access_token']


def capture_screen():
    """
    Get the screenshot currently stored on the clipboard
    """
    img = ImageGrab.grabclipboard()
    # grabclipboard() returns None (no image) or a list of file names (copied files)
    if img is None or isinstance(img, list):
        raise RuntimeError('No image found on the clipboard')
    return img


def get_text(access_token, img_bin):
    """
    Get the text in the image
    """
    url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=' + access_token
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    # The API expects a Base64-encoded image; requests URL-encodes the form field
    data = {'image': base64.b64encode(img_bin)}
    res = requests.post(url, headers=headers, data=data)
    x = res.json()
    # Join every recognized line; an error response has no words_result
    return '\n'.join(item['words'] for item in x.get('words_result', []))


if __name__ == '__main__':
    access_token = get_access_token()
    img = capture_screen()
    # Convert the image to JPEG bytes in memory
    buf = io.BytesIO()
    img.convert('RGB').save(buf, format='JPEG')
    img_bin = buf.getvalue()
    text = get_text(access_token, img_bin)
    print(text)

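The code above uses ImageGrab.grabclipboard() to fetch a screenshot from the clipboard, converts it to JPEG binary data, and passes that data to get_text() to obtain the text in the image. Copy a screenshot to the clipboard before running the script; grabclipboard() returns None when the clipboard holds no image.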
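The script above requests a new access_token on every run. If the token is reused across many calls, a small cache can avoid the extra round trip. This is only a sketch: it assumes the token response includes an expires_in field in seconds, as OAuth 2.0 token responses normally do, and TokenCache with its fetch, clock and margin parameters is a hypothetical helper, not part of any Baidu SDK.

```python
import time


class TokenCache:
    """Caches an access token until shortly before it expires."""

    def __init__(self, fetch, clock=time.time, margin=60):
        self._fetch = fetch      # callable returning the parsed token response (dict)
        self._clock = clock      # injectable clock, handy for testing
        self._margin = margin    # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            # Assumption: 'expires_in' is the token lifetime in seconds
            self._expires_at = now + payload.get("expires_in", 0) - self._margin
        return self._token


# Demo with a stand-in fetch function instead of a real HTTP call
cache = TokenCache(lambda: {"access_token": "demo-token", "expires_in": 3600})
print(cache.get())
```

In a real script, fetch would be a function that posts to the token endpoint and returns res.json(); taking it as a parameter keeps the cache testable without network access.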

4. Example 2: Batch Recognition of Images

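If we need to process a folder containing many images, we can use Python's os library and the PIL library to read the images in bulk and recognize the text in each one automatically. Here is the sample code: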

import base64
import os
import requests
from PIL import Image

def get_access_token():
    """
    Get the API access_token
    """
    api_key = 'your api key'
    secret_key = 'your secret key'
    url = 'https://aip.baidubce.com/oauth/2.0/token'
    params = {'grant_type': 'client_credentials', 'client_id': api_key, 'client_secret': secret_key}
    res = requests.post(url, params=params)
    x = res.json()
    return x['access_token']


def get_text(access_token, img_bin):
    """
    Get the text in the image
    """
    url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=' + access_token
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    # The API expects a Base64-encoded image; requests URL-encodes the form field
    data = {'image': base64.b64encode(img_bin)}
    res = requests.post(url, headers=headers, data=data)
    x = res.json()
    # Join every recognized line; an error response has no words_result
    return '\n'.join(item['words'] for item in x.get('words_result', []))

if __name__ == '__main__':
    access_token = get_access_token()
    img_dir = 'image_dir'
    for filename in os.listdir(img_dir):
        img_path = os.path.join(img_dir, filename)
        # Use PIL to skip anything that is not a readable image
        try:
            with Image.open(img_path):
                pass
        except OSError:
            continue
        # Read the whole file separately so no bytes are consumed by Image.open()
        with open(img_path, 'rb') as f:
            img_bin = f.read()
        text = get_text(access_token, img_bin)
        print('{}: {}'.format(filename, text))

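The code above uses os.listdir() to get the names of all files in the image folder, opens each one with Image.open(), passes the file's binary data to get_text() to obtain the text in the image, and finally prints each file name together with the extracted text.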
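For larger folders, it can help to filter by file extension and to keep going when a single file fails. The sketch below is one way to do that under stated assumptions: the extension list (jpg, jpeg, png, bmp) is an assumed set of supported formats, and iter_image_files() and recognize_folder() are hypothetical helpers. The recognize argument stands for any function that takes image bytes and returns text, for example a wrapper around get_text().

```python
from pathlib import Path

# Assumption: formats accepted by the OCR endpoint
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def iter_image_files(folder):
    """Yield image files in `folder` (non-recursive), sorted by name."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def recognize_folder(folder, recognize):
    """Map each file name to its recognized text, or to an error message."""
    results = {}
    for path in iter_image_files(folder):
        try:
            results[path.name] = recognize(path.read_bytes())
        except Exception as exc:  # keep going if one file fails
            results[path.name] = "ERROR: {}".format(exc)
    return results
```

A call might look like recognize_folder('image_dir', lambda img_bin: get_text(access_token, img_bin)). Recording the failure per file means one unreadable image or one rejected request does not abort the whole batch.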
