PyAuto Desktop Documentation

Welcome to the official documentation for PyAuto Desktop. This library provides high-performance, thread-safe desktop automation tools, including computer vision and input control.

Starter Guide

1. The Inspector Tool

PyAuto Desktop includes a GUI tool called the Inspector. Use this to capture screenshots (“needles”), test matches against your screen, and automatically generate the Python code for your script.

import pyauto_desktop

# Open the Inspector GUI for snipping, editing, testing, and code generation
pyauto_desktop.inspector()

2. Simple Example

Here is a basic example of locating an image and clicking it.

import pyauto_desktop

# Initialize a session for the primary monitor
session = pyauto_desktop.Session(
    screen=0,
    source_resolution=(2560, 1440),
    source_dpr=1.25,
    scaling_type="dpr"
)

# Find an image and click it
submit_btn = session.locateOnScreen("submit_btn.png", confidence=0.8)
if submit_btn:
    print("Button found!")
    session.click(submit_btn)

Screen & Vision

class Session(screen=0, source_resolution=None, source_dpr=None, scaling_type=None, direct_input=False)

The core class that manages screen capture and coordinate translation.

Parameters:

Parameter	Type	Default	Description
screen	`int`	`0`	The logical screen index (0 is primary, 1 is secondary, etc.).
source_resolution	`tuple`	`None`	The width and height of the monitor where your template images were originally captured.
source_dpr	`float`	`None`	The DPI scaling factor of the source monitor. If None, it is auto-detected.
scaling_type	`str`	`None`	The strategy for coordinate translation: `'dpr'` or `'resolution'`. See Scaling Strategies below for details.
direct_input	`bool`	`False`	Windows only. Uses `'pydirectinput'` (hardware scancodes) instead of `'pynput'` (virtual keys). Required for applications (including many games) and Citrix/RDP sessions that ignore standard software-simulated input.

Scaling Strategies

The scaling_type determines how coordinates are translated when moving between screens with different properties. The correct choice depends on how the target application renders its UI.

‘dpr’ (DPI Dependent)
Best for standard Windows applications (e.g., web browsers, Office, Discord). These apps respect the Windows "Scale and layout" setting (100%, 125%, 150%). When the scale changes, these apps resize and reposition their UI elements accordingly.
‘resolution’ (Resolution Dependent)
Best for full-screen applications and games. These applications generally ignore Windows DPI scaling and render 1:1 with the screen’s pixel resolution.
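To make the two strategies concrete, here is a minimal sketch of how a point captured on the source monitor could be mapped to the target monitor. It assumes that 'resolution' scales each axis by the ratio of target to source pixel dimensions and that 'dpr' scales both axes by the ratio of target to source DPI factors, with the window origin unchanged. `translate_point` is a hypothetical helper, not part of the PyAuto Desktop API, and the library's actual translation may differ.

```python
def translate_point(point, scaling_type, source_resolution=None, target_resolution=None,
                    source_dpr=None, target_dpr=None):
    """Map an (x, y) point from the source monitor to the target monitor.

    Hypothetical helper illustrating the two strategies; not the library's code.
    """
    x, y = point
    if scaling_type == "resolution":
        # UI renders 1:1 with pixels, so scale each axis by the pixel-size ratio.
        sx = target_resolution[0] / source_resolution[0]
        sy = target_resolution[1] / source_resolution[1]
    elif scaling_type == "dpr":
        # UI follows the Windows DPI factor, so scale both axes by the DPI ratio.
        sx = sy = target_dpr / source_dpr
    else:
        raise ValueError(f"unknown scaling_type: {scaling_type!r}")
    return round(x * sx), round(y * sy)


# A point captured at 125% scaling, replayed on a monitor set to 150%.
print(translate_point((400, 300), "dpr", source_dpr=1.25, target_dpr=1.5))
# A point captured at 2560x1440, replayed at 1920x1080.
print(translate_point((1280, 720), "resolution", (2560, 1440), (1920, 1080)))
```

The sketch only scales coordinates; a real implementation would also have to account for where the window sits on each monitor.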

Tip

How to choose the right type

To determine if your target application is DPR or Resolution dependent, perform this simple test:

1. Open the application.
2. Go to Windows Display Settings -> Scale and layout.
3. Change the scaling percentage (e.g., from 100% to 125%).

If the application window or UI elements physically resize or move on your screen, use 'dpr'. If the application UI looks exactly the same (common in full-screen games), use 'resolution'.

locateOnScreen(image, region=None, grayscale=False, confidence=0.9, source_resolution=None, time_out=0, downscale=3, use_pyramid=True)

Locates the single best instance of an image on the screen and returns it as a match tuple (x, y, w, h), or None if no match is found.

Parameters:

Parameter	Type	Default	Description
image	`str` \| `object`	Required	File path to the template image or a PIL image object.
region	`tuple`	`None`	A bounding box `(x, y, w, h)` to limit the search area.
grayscale	`bool`	`False`	If `True`, converts images to grayscale. Faster, but ignores color information.
confidence	`float`	`0.9`	Match strictness (0.0 to 1.0). Higher is stricter.
source_resolution	`tuple`	`None`	Overrides the session source resolution for this specific search.
source_dpr	`float`	`None`	Overrides the session source DPR for this specific search.
scaling_type	`str`	`None`	Overrides the session scaling type (`'dpr'` or `'resolution'`) for this specific search.
time_out	`float`	`0`	Maximum seconds to keep retrying if the image is not found immediately.
downscale	`int`	`3`	Scale factor for the initial pyramid search pass (higher is faster).
use_pyramid	`bool`	`True`	Whether to use multi-scale pyramid optimization.

Example:

start_button = session.locateOnScreen(image='images/start_button.png', region=(100, 200, 400, 500), grayscale=True, confidence=0.85, time_out=3)
if start_button:
    print("start_button found")
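The `time_out` parameter keeps retrying the search until a match appears. The polling loop below is a minimal sketch of that behavior, assuming a fixed pause between attempts; `retry_until_found`, `poll_interval`, and the simulated search are hypothetical, and the library's own retry timing is not documented here.

```python
import time


def retry_until_found(search, time_out=0.0, poll_interval=0.1):
    """Call search() until it returns a non-None result or time_out seconds elapse."""
    deadline = time.monotonic() + time_out
    while True:
        result = search()
        if result is not None:
            return result
        # With time_out=0 this returns after a single attempt.
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


# Simulated search that only "sees" the image on its third attempt.
attempts = []


def fake_search():
    attempts.append(1)
    return (100, 200, 50, 20) if len(attempts) >= 3 else None


found = retry_until_found(fake_search, time_out=1.0, poll_interval=0.01)
print(found, "after", len(attempts), "attempts")
```

A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.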

locateAllOnScreen(image, region=None, grayscale=False, confidence=0.9, overlap_threshold=0.5, scaling_type=None, source_resolution=None, time_out=0, downscale=3, use_pyramid=True)

Locates all instances of an image on the screen.

Parameters:

Parameter	Type	Default	Description
image	`str` \| `object`	Required	File path to the template image or a PIL image object.
region	`tuple`	`None`	A bounding box `(x, y, w, h)` to limit the search area.
grayscale	`bool`	`False`	If `True`, converts images to grayscale. Faster, but ignores color information.
confidence	`float`	`0.9`	Match strictness (0.0 to 1.0). Higher is stricter.
overlap_threshold	`float`	`0.5`	Maximum allowed overlap (0.0 to 1.0) between found matches before they are merged.
source_resolution	`tuple`	`None`	Overrides the session source resolution for this specific search.
source_dpr	`float`	`None`	Overrides the session source DPR for this specific search.
scaling_type	`str`	`None`	Overrides the session scaling type (`'dpr'` or `'resolution'`) for this specific search.
time_out	`float`	`0`	Maximum seconds to keep retrying if the image is not found immediately.
downscale	`int`	`3`	Scale factor for the initial pyramid search pass (higher is faster).
use_pyramid	`bool`	`True`	Whether to use multi-scale pyramid optimization.

Returns: A list of tuples [(x, y, w, h), ...].

Example:

start_buttons = session.locateAllOnScreen(image='images/start_button.png', region=(100, 200, 400, 500), grayscale=True, confidence=0.85, time_out=3)
for start_button in start_buttons:
    print(f'Found start_button at {start_button}')
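The `overlap_threshold` parameter controls how overlapping hits of the same image are collapsed. One common technique for this is greedy non-maximum suppression using intersection-over-union (IoU); the sketch below assumes that technique and assumes each candidate carries a match score. `iou` and `suppress_overlaps` are hypothetical helpers, and the library may measure overlap or merge boxes differently.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def suppress_overlaps(scored_boxes, overlap_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop later boxes that overlap it too much."""
    kept = []
    for score, box in sorted(scored_boxes, key=lambda sb: sb[0], reverse=True):
        if all(iou(box, k) <= overlap_threshold for k in kept):
            kept.append(box)
    return kept


candidates = [
    (0.97, (100, 100, 40, 20)),
    (0.91, (102, 101, 40, 20)),  # near-duplicate of the first match
    (0.88, (300, 100, 40, 20)),  # a separate button
]
print(suppress_overlaps(candidates))
```

Lowering the threshold removes more near-duplicates; raising it keeps matches that sit closer together, such as tightly packed icons.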

locateAny(tasks, time_out=0)

Searches for a list of different images and returns the first one found. Useful for detecting application states (e.g., loading finished).

Parameters:

Parameter	Type	Default	Description
tasks	`list[dict]`	Required	A list of task dictionaries. Each dict has a `label` key (returned when that image is found) and a `task` key holding the search arguments: `image` (required; str or object) plus optional `confidence`, `grayscale`, `region`, and `downscale`.
time_out	`float`	`0`	Maximum seconds to keep retrying if the image is not found immediately.

Returns: A tuple (label, match_tuple) or None.

Example:

tasks = [{'label': 'connected', 'task': dict(image='images/connected.png', confidence=0.8, grayscale=False)},
         {'label': 'disconnected', 'task': dict(image='images/disconnected.png', confidence=0.8, grayscale=False)}]
result = session.locateAny(tasks)
if result:
    label, match = result
    if label == 'connected':
        pass  # start automation
    elif label == 'disconnected':
        pass  # reconnect
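The "first one found" semantics can be pictured with the sketch below, which uses the same task-dictionary shape as the example above. It assumes tasks are checked in list order and the first hit wins; `locate_any` and `fake_locate` are hypothetical stand-ins, and since the library may search tasks concurrently, do not rely on list order to break ties.

```python
def locate_any(tasks, locate):
    """Return (label, match) for the first task whose image is found, else None."""
    for task in tasks:
        # Forward the search arguments (image, confidence, ...) to the locator.
        match = locate(**task["task"])
        if match is not None:
            return task["label"], match
    return None


# Stand-in for a real screen search: only the "disconnected" banner is on screen.
def fake_locate(image, confidence=0.9, grayscale=False):
    return (40, 60, 120, 30) if image == "images/disconnected.png" else None


tasks = [
    {'label': 'connected', 'task': dict(image='images/connected.png', confidence=0.8)},
    {'label': 'disconnected', 'task': dict(image='images/disconnected.png', confidence=0.8)},
]
print(locate_any(tasks, fake_locate))
```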

locateAll(tasks, time_out=0)

Searches for multiple different images simultaneously and returns all matches for every image found.

Parameters:

Parameter	Type	Default	Description
tasks	`list[dict]`	Required	List of task dictionaries defining what images to look for.
time_out	`float`	`0`	Maximum seconds to keep retrying if the image is not found immediately.

Returns: A dictionary {label: (match)}.

Example:

tasks = [{'label': 'low_health', 'task': dict(image='images/low_health.png', confidence=0.8, grayscale=False)},
         {'label': 'low_mana', 'task': dict(image='images/low_mana.png', confidence=0.8, grayscale=False)}]
result = session.locateAll(tasks)
for label, match in result.items():
    if label == 'low_health':
        pass  # use health potion
    elif label == 'low_mana':
        pass  # use mana potion

read_text(region=None, mode='clean', use_det=False)

Captures a screen region and returns the text lines found in it, using Windows native OCR.

Parameters:

Parameter	Type	Default	Description
region	`tuple`	`None`	A bounding box `(x, y, w, h)` to limit the OCR area.
mode	`str`	`'clean'`	Post-processing mode for the text.
use_det	`bool`	`False`	Whether to use additional detection models.

Returns: A list of strings containing the text lines found.
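Because `read_text` returns plain strings, checking for on-screen text is ordinary string work. The helper below is a hypothetical sketch (not part of the library) that normalizes whitespace and matches case-insensitively, since OCR output can contain irregular spacing; it does not reproduce whatever `mode='clean'` does internally, and the sample lines are invented.

```python
import re


def find_text(lines, keyword):
    """Return the first OCR line containing keyword (case-insensitive, whitespace-normalized)."""
    needle = " ".join(keyword.split()).casefold()
    for line in lines:
        normalized = re.sub(r"\s+", " ", line).strip()
        if needle in normalized.casefold():
            return normalized
    return None


# Sample OCR output, e.g. what session.read_text(region=...) might return.
lines = ["Order  #1042", "Status:   Shipped", ""]
print(find_text(lines, "status:"))
```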

get_pixel(x, y)

Returns the RGB color of the pixel at the specified coordinates relative to the current screen.

Parameters:

Parameter	Type	Default	Description
x	`int`	Required	The X-coordinate of the pixel.
y	`int`	Required	The Y-coordinate of the pixel.

Returns: A tuple (R, G, B) integers representing the color values. Returns None if capture fails.
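Rendered colors can shift slightly (anti-aliasing, color profiles, compression over RDP), so comparing a pixel for exact equality is brittle. This sketch compares each channel within a tolerance and treats a failed capture (`None`) as no match; `color_matches` and the default tolerance of 10 are illustrative assumptions, not library features.

```python
def color_matches(rgb, expected, tolerance=10):
    """True if every channel of rgb is within tolerance of expected; False if rgb is None."""
    if rgb is None:  # get_pixel returns None when the capture fails
        return False
    return all(abs(c - e) <= tolerance for c, e in zip(rgb, expected))


print(color_matches((250, 32, 28), (255, 30, 30)))  # a slightly different red
print(color_matches((30, 200, 40), (255, 30, 30)))  # green, far from red
print(color_matches(None, (255, 30, 30)))           # failed capture
```

In a script this might wrap a call such as `color_matches(session.get_pixel(500, 300), (255, 30, 30))`, where the coordinates and color are placeholders.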

screenshot(imageFilename=None, region=None)

Takes a screenshot of the screen or a specific region, optionally saves it to a file, and returns a PIL Image object.

Parameters:

Parameter	Type	Default	Description
imageFilename	`str`	`None`	Optional file path to save the captured image (e.g., `'screen.png'`).
region	`tuple`	`None`	A bounding box `(left, top, width, height)` to limit the capture area.

Example:

# Keep the image in memory as a PIL Image object
im = session.screenshot()

# Save the entire screen to a file and return the PIL Image
im_saved = session.screenshot('my_capture.png')

# Capture and save a specific region of the screen
im_region = session.screenshot('my_region.png', region=(0, 0, 300, 400))
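Note that `region` uses `(left, top, width, height)`, whereas Pillow's `Image.crop()` expects `(left, upper, right, lower)`. If you crop a returned image yourself, convert the box first. `region_to_box` is a hypothetical helper shown only to illustrate the conversion.

```python
def region_to_box(region):
    """Convert (left, top, width, height) to the (left, top, right, bottom) box Pillow uses."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return left, top, left + width, top + height


print(region_to_box((100, 50, 300, 400)))
# With a captured image: im.crop(region_to_box((100, 50, 300, 400)))
```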

Mouse & Keyboard

click(target=None, y=None, offset=(0, 0), button='left', clicks=1, interval=0.2, hold_time=0)

Performs a mouse click. It can target a coordinate, a match result, a list of matches, or just click where the mouse currently is.

Parameters:

Parameter	Type	Default	Description
target	`tuple` \| `list` \| `int`	`None`	A match tuple `(x, y, w, h)`, a coordinate tuple `(x, y)`, a list of match tuples `[(x, y, w, h), (x, y, w, h)]`, an X-coordinate `int` (paired with `y`), or `None` to click at the current mouse position.
y	`int`	`None`	The Y-coordinate (only used if `target` is passed as an X integer).
offset	`tuple`	`(0, 0)`	Shifts the click position by `(x, y)` pixels relative to the target’s center.
button	`str`	`'left'`	Which mouse button to use: `'left'`, `'right'`, `'mouse4'`, `'mouse5'`, or `'middle'`.
clicks	`int`	`1`	Number of times to click (e.g., 2 for double-click).
interval	`float`	`0.2`	Delay in seconds between clicks if clicking multiple times or a list of targets.
hold_time	`float`	`0`	Seconds to wait between mouseDown and mouseUp. Only applies when `direct_input=True` is set on the Session.
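The way `target` and `offset` combine can be sketched as follows: a match tuple resolves to its center, a coordinate tuple is used as-is, and the offset is added afterwards. This assumes the center is computed with integer division; `click_point` is a hypothetical helper and the library's rounding may differ.

```python
def click_point(target, offset=(0, 0)):
    """Resolve a match (x, y, w, h) or a point (x, y) to a click position plus offset."""
    if len(target) == 4:
        x, y, w, h = target
        cx, cy = x + w // 2, y + h // 2  # center of the match box
    elif len(target) == 2:
        cx, cy = target
    else:
        raise ValueError("target must be (x, y) or (x, y, w, h)")
    return cx + offset[0], cy + offset[1]


match = (100, 200, 80, 30)
print(click_point(match))                  # center of the box
print(click_point(match, offset=(0, 25)))  # 25 px below the center
```

An offset like this is handy when the matched image is a label and the element you actually want to click (for example a text field) sits just beside or below it.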

moveTo(target=None, y=None, offset=(0, 0), duration=0.0)

Moves the mouse cursor to a specific location.

Parameters:

Parameter	Type	Default	Description
target	`tuple` \| `list` \| `int`	`None`	A match tuple `(x, y, w, h)`, a coordinate tuple `(x, y)`, a list of match tuples `[(x, y, w, h), (x, y, w, h)]`, or an X-coordinate `int` (paired with `y`).
y	`int`	`None`	The Y-coordinate (only used if `target` is passed as an X integer).
offset	`tuple`	`(0, 0)`	Shifts the destination by `(x, y)` pixels relative to the target's center.
duration	`float`	`0.0`	Time in seconds to slide the mouse to the target. `0.0` moves instantly.
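A non-zero `duration` means the cursor passes through intermediate positions instead of jumping. The sketch below generates evenly spaced waypoints, assuming linear motion at a fixed step interval; `interpolate_path` and `step` are hypothetical, and the library may use a different easing curve or timing.

```python
def interpolate_path(start, end, duration, step=0.01):
    """Evenly spaced (x, y) waypoints for a linear move lasting about duration seconds."""
    if duration <= 0:
        return [end]  # duration 0.0 moves instantly
    steps = max(1, round(duration / step))
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


path = interpolate_path((0, 0), (100, 50), duration=0.05)
print(path)
```

A caller would move to each waypoint in turn and sleep `step` seconds between moves; the final waypoint is always exactly the destination.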

write(message, interval=0.0)

Types a string of text using the keyboard.

Parameters:

Parameter	Type	Default	Description
message	`str`	Required	The text string to type.
interval	`float`	`0.0`	Delay in seconds between each character typed.

press(key)

Presses a single keyboard key. Key list can be found here: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key

Parameters:

Parameter	Type	Default	Description
key	`str`	Required	The name of the key (e.g., `"esc"`, `"f1"`, `"space"`) or a single character.

scroll(clicks, duration=0.0)

Scrolls the mouse wheel vertically.

Parameters:

Parameter	Type	Default	Description
clicks	`int`	Required	The number of steps to scroll. Positive values scroll up, negative values scroll down.
duration	`float`	`0.0`	The total time (in seconds) to complete the scroll. If greater than 0, the scroll is performed incrementally over the specified duration.
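Spreading a scroll over `duration` amounts to sending one wheel step at a time with pauses in between. This is a minimal sketch under that assumption (one notch per step, equal pauses); `scroll_schedule` is hypothetical and the library's own pacing may differ.

```python
def scroll_schedule(clicks, duration=0.0):
    """Split a scroll into (step, pause_after) pairs, one wheel notch per step."""
    if clicks == 0:
        return []
    if duration <= 0:
        return [(clicks, 0.0)]  # no duration: send the whole scroll at once
    direction = 1 if clicks > 0 else -1  # positive scrolls up, negative scrolls down
    pause = duration / abs(clicks)
    return [(direction, pause)] * abs(clicks)


plan = scroll_schedule(-5, duration=0.5)
print(plan)
```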

keyDown(key)

Presses and holds down a single keyboard key. Key list can be found here: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key

Parameters:

Parameter	Type	Default	Description
key	`str`	Required	The name of the key (e.g., `"esc"`, `"f1"`, `"space"`) or a single character.

keyUp(key)

Releases a held keyboard key. Key list can be found here: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key

Parameters:

Parameter	Type	Default	Description
key	`str`	Required	The name of the key (e.g., `"esc"`, `"f1"`, `"space"`) or a single character.

mouseDown(button='left')

Presses and holds a mouse button. The button will remain pressed until mouseUp() is called.

Parameters:

Parameter	Type	Default	Description
button	`str`	`"left"`	The mouse button to press. Accepted values: `"left"`, `"right"`, `"middle"`, `"mouse4"`, `"mouse5"`.

mouseUp(button='left')

Releases a previously held mouse button.

Parameters:

Parameter	Type	Default	Description
button	`str`	`"left"`	The mouse button to release. Accepted values: `"left"`, `"right"`, `"middle"`, `"mouse4"`, `"mouse5"`.
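Because a button (or key) stays pressed until mouseUp() (or keyUp()) runs, an exception between the two calls can leave it stuck down. A small context manager built on the standard library's `contextlib.contextmanager` guarantees the release. `held` is a hypothetical helper, and the recording lambdas stand in for real session calls.

```python
from contextlib import contextmanager


@contextmanager
def held(press, release):
    """Call press() on entry and always call release() on exit, even if an error occurs."""
    press()
    try:
        yield
    finally:
        release()


# Stand-ins that record calls. With a real session you might pass
# lambda: session.mouseDown('left') and lambda: session.mouseUp('left').
events = []
try:
    with held(lambda: events.append("down"), lambda: events.append("up")):
        events.append("drag")
        raise RuntimeError("target disappeared mid-drag")
except RuntimeError:
    pass
print(events)
```

The same pattern works for keyDown()/keyUp() when holding a modifier such as Shift across several actions.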

Window Control

Helper functions to manage application windows. These functions can target windows by Title (string) or PID (integer).

find_window(target)

Finds a window by Title (string) or PID (int). Returns the first matching window object or None.
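The title-or-PID dispatch can be sketched as below: a `str` target is compared against titles and an `int` target against PIDs. This assumes exact title equality, whereas the library may match titles differently (for example partially or case-insensitively), so check its behavior before relying on it. `match_window` and the sample window list are hypothetical.

```python
def match_window(windows, target):
    """Return the first window dict whose title (str target) or pid (int target) matches."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(target, bool) or not isinstance(target, (str, int)):
        raise TypeError("target must be a window title (str) or PID (int)")
    key = "title" if isinstance(target, str) else "pid"
    for window in windows:
        if window.get(key) == target:
            return window
    return None


windows = [
    {"title": "Untitled - Notepad", "pid": 4120},
    {"title": "Calculator", "pid": 7788},
]
print(match_window(windows, "Calculator")["pid"])
print(match_window(windows, 4120)["title"])
print(match_window(windows, "Paint"))
```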

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or Process ID (PID).

move_window(target, x, y)

Moves the top-left corner of the window to (x, y).

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.
x	`int`	Required	The target X-coordinate.
y	`int`	Required	The target Y-coordinate.

resize_window(target, width, height)

Resizes the window to the specified width and height.

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.
width	`int`	Required	The target width in pixels.
height	`int`	Required	The target height in pixels.

focus_window(target)

Brings the window to the foreground. Automatically un-minimizes (restores) it if hidden.

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.

maximize_window(target)

Maximizes the specified window.

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.

minimize_window(target)

Minimizes the specified window.

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.

get_window_info(target)

Returns a dictionary of window properties (x, y, width, height, title, pid).

Parameters:

Parameter	Type	Default	Description
target	`str` \| `int`	Required	The Window Title or PID.

get_focused_window()

Returns a dictionary of active window properties (title, x, y, width, height, pid).

Example:

import pyauto_desktop

active_window = pyauto_desktop.get_focused_window()
title = active_window.get('title')
x = active_window.get('x')
y = active_window.get('y')
width = active_window.get('width')
height = active_window.get('height')
pid = active_window.get('pid')  # Windows only
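The window properties map directly onto the `(x, y, w, h)` region accepted by locateOnScreen and read_text, which lets you restrict a search to a single application. The sketch assumes the dictionary keys listed above and that window coordinates are in the same space as the session's screen coordinates, which may not hold on multi-monitor or mixed-DPI setups. `window_region` and its `inset` parameter (for skipping window borders) are hypothetical.

```python
def window_region(info, inset=0):
    """Turn a window-info dict into an (x, y, w, h) region, optionally shrunk by inset px."""
    if not info:
        return None
    x, y = info["x"] + inset, info["y"] + inset
    w, h = info["width"] - 2 * inset, info["height"] - 2 * inset
    if w <= 0 or h <= 0:
        return None  # inset larger than the window itself
    return x, y, w, h


info = {"title": "Calculator", "x": 200, "y": 120, "width": 320, "height": 500, "pid": 7788}
print(window_region(info))
print(window_region(info, inset=8))
```

A possible usage is `session.locateOnScreen('images/ok.png', region=window_region(pyauto_desktop.get_focused_window()))`, where the image path is a placeholder.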