PyAuto Desktop Documentation
Welcome to the official documentation for PyAuto Desktop. This library offers high-performance, thread-safe desktop automation tools, including computer vision and input control.
Starter Guide
1. The Inspector Tool
PyAuto Desktop includes a GUI tool called the Inspector. Use this to capture screenshots (“needles”), test matches against your screen, and automatically generate the Python code for your script.
# Opens the GUI for snipping, editing, testing, and code generation
pyauto_desktop.inspector()
2. Simple Example
Here is a basic example of locating an image and clicking it.
import pyauto_desktop
# Initialize a session for the primary monitor
session = pyauto_desktop.Session(
screen=0,
source_resolution=(2560, 1440),
source_dpr=1.25,
scaling_type="dpr"
)
# Find an image and click it
submit_btn = session.locateOnScreen("submit_btn.png", confidence=0.8)
if submit_btn:
    print("Button found!")
    # Click the match that was found (click() takes a match tuple, not an image path)
    session.click(submit_btn)
Screen & Vision
- class Session(screen=0, source_resolution=None, source_dpr=None, scaling_type=None, direct_input=False)
The core class that manages screen capture and coordinate translation.
Parameters:
Parameter | Type | Default | Description
screen | int | 0 | The logical screen index (0 is primary, 1 is secondary, etc.).
source_resolution | tuple | None | The width and height of the monitor on which your template images were originally captured.
source_dpr | float | None | The DPI scaling factor of the source monitor. If None, it is auto-detected.
scaling_type | str | None | The strategy for coordinate translation: 'dpr' or 'resolution'. See Scaling Strategies below for details.
direct_input | bool | False | Windows only. Uses 'pydirectinput' (hardware scancodes) instead of 'pynput' (virtual keys). Essential for applications, games, or Citrix/RDP sessions that ignore standard software-simulated input.
Scaling Strategies
The scaling_type determines how coordinates are translated when moving between screens with different properties. The correct choice depends on how the target application renders its UI.
- 'dpr' (DPI Dependent)
Best for standard Windows applications (e.g., web browsers, Office, Discord). These apps respect the Windows "Scale and layout" setting (100%, 125%, 150%). When the scale changes, these apps resize and reposition their UI elements accordingly.
- 'resolution' (Resolution Dependent)
Best for full-screen applications and games. These applications generally ignore Windows DPI scaling and render 1:1 with the screen's pixel resolution.
Tip
How to choose the right type
To determine whether your target application is DPR-dependent or resolution-dependent, perform this simple test:
1. Open the application.
2. Go to Windows Display Settings -> Scale and layout.
3. Change the scaling percentage (e.g., from 100% to 125%).
4. If the application window or UI elements physically resize or move on your screen, use 'dpr'. If the application UI looks exactly the same (common in full-screen games), use 'resolution'.
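To make the difference between the two strategies concrete, here is a minimal sketch of how a point captured on the source screen might be mapped onto a target screen. It assumes that 'resolution' scales each axis by the ratio of target to source resolution, that 'dpr' scales by the ratio of target to source DPR, and that coordinates are measured from a fixed origin (for example, a window's top-left corner). The translate_point function is hypothetical and is not PyAuto Desktop's actual implementation.

```python
def translate_point(point, scaling_type, source_resolution=None, target_resolution=None,
                    source_dpr=None, target_dpr=None):
    """Hypothetical mapping of an (x, y) point from the source screen to the target screen."""
    x, y = point
    if scaling_type == "resolution":
        # Assumption: the UI renders 1:1 with pixels, so scale each axis by the resolution ratio
        sx = target_resolution[0] / source_resolution[0]
        sy = target_resolution[1] / source_resolution[1]
    elif scaling_type == "dpr":
        # Assumption: the UI follows the Windows scale setting, so scale by the DPR ratio
        sx = sy = target_dpr / source_dpr
    else:
        raise ValueError("scaling_type must be 'dpr' or 'resolution'")
    return round(x * sx), round(y * sy)


# A point captured on a 2560x1440 monitor, replayed on a 1920x1080 monitor
print(translate_point((1280, 720), "resolution", (2560, 1440), (1920, 1080)))
# A point captured at 125% scaling, replayed at 150% scaling
print(translate_point((400, 300), "dpr", source_dpr=1.25, target_dpr=1.5))
```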
- locateOnScreen(image, region=None, grayscale=False, confidence=0.9, source_resolution=None, time_out=0, downscale=3, use_pyramid=True)
Locates the single best instance of an image on the screen.
Parameters:
Parameter | Type | Default | Description
image | str or object | Required | File path to the template image or a PIL image object.
region | tuple | None | A bounding box (x, y, w, h) to limit the search area.
grayscale | bool | False | If True, converts images to grayscale. Faster, but ignores color information.
confidence | float | 0.9 | Match strictness (0.0 to 1.0). Higher is stricter.
source_resolution | tuple | None | Overrides the session source resolution for this specific search.
source_dpr | float | None | Overrides the session source DPR for this specific search.
scaling_type | str | None | Overrides the session scaling type for this specific search.
time_out | float | 0 | Maximum seconds to keep retrying if the image is not found immediately.
downscale | int | 3 | Scale factor for the initial pyramid search pass (higher is faster).
use_pyramid | bool | True | Whether to use multi-scale pyramid optimization.
Example:
start_button = session.locateOnScreen(
    image='images/start_button.png',
    region=(100, 200, 400, 500),
    grayscale=True,
    confidence=0.85,
    time_out=3,
)
if start_button:
    print("start_button found")
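The time_out parameter means the search is retried until a match appears or the time budget runs out. The sketch below models that behaviour with a hypothetical retry_until helper wrapped around any zero-argument search function. The polling interval is an assumption, and this is not the library's internal loop; in a real script you would simply pass time_out to locateOnScreen.

```python
import time


def retry_until(search, time_out=0.0, poll_interval=0.1):
    """Call search() until it returns a truthy result or time_out seconds have elapsed."""
    deadline = time.monotonic() + time_out
    while True:
        result = search()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None  # Give up, mirroring a "not found" result
        time.sleep(poll_interval)


# Stand-in for a screen search that only succeeds on the third attempt
attempts = []


def fake_search():
    attempts.append(1)
    return (100, 200, 40, 20) if len(attempts) >= 3 else None


print(retry_until(fake_search, time_out=1.0, poll_interval=0.01))
```

With time_out=0 the search runs exactly once, which matches the documented default of not retrying.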
- locateAllOnScreen(image, region=None, grayscale=False, confidence=0.9, overlap_threshold=0.5, scaling_type=None, source_resolution=None, time_out=0, downscale=3, use_pyramid=True)
Locates all instances of an image on the screen.
Parameters:
Parameter | Type | Default | Description
image | str or object | Required | File path to the template image or a PIL image object.
region | tuple | None | A bounding box (x, y, w, h) to limit the search area.
grayscale | bool | False | If True, converts images to grayscale. Faster, but ignores color information.
confidence | float | 0.9 | Match strictness (0.0 to 1.0). Higher is stricter.
overlap_threshold | float | 0.5 | Maximum allowed overlap (0.0 to 1.0) between found matches before they are merged.
source_resolution | tuple | None | Overrides the session source resolution for this specific search.
source_dpr | float | None | Overrides the session source DPR for this specific search.
scaling_type | str | None | Overrides the session scaling type for this specific search.
time_out | float | 0 | Maximum seconds to keep retrying if the image is not found immediately.
downscale | int | 3 | Scale factor for the initial pyramid search pass (higher is faster).
use_pyramid | bool | True | Whether to use multi-scale pyramid optimization.
Returns: A list of tuples [(x, y, w, h), ...].
Example:
start_buttons = session.locateAllOnScreen(
    image='images/start_button.png',
    region=(100, 200, 400, 500),
    grayscale=True,
    confidence=0.85,
    time_out=3,
)
for start_button in start_buttons:
    print(f'Found start_button at {start_button}')
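overlap_threshold controls when overlapping matches are merged. A common way to do this is greedy non-maximum suppression: keep the highest-scoring box and drop any other box whose overlap with an already kept box exceeds the threshold. The sketch below assumes overlap is measured as intersection-over-union (IoU) and that each candidate carries a score; both are assumptions, and the library may measure or merge overlaps differently. The iou and suppress_overlaps helpers are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def suppress_overlaps(candidates, overlap_threshold=0.5):
    """Greedy non-maximum suppression over (score, (x, y, w, h)) candidates."""
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Keep a box only if it does not overlap too much with any box already kept
        if all(iou(box, k) <= overlap_threshold for k in kept):
            kept.append(box)
    return kept


candidates = [
    (0.95, (100, 100, 50, 20)),
    (0.90, (102, 101, 50, 20)),  # near-duplicate of the first match
    (0.88, (300, 100, 50, 20)),  # a separate button
]
print(suppress_overlaps(candidates))
```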
- locateAny(tasks, time_out=0)
Searches for a list of different images and returns the first one found. Useful for detecting application states (e.g., loading finished).
Parameters:
Parameter | Type | Default | Description
tasks | list[dict] | Required | A list of task dictionaries. Each dict must have an image key (str or object). Optional keys: label, confidence, grayscale, region, downscale.
time_out | float | 0 | Maximum seconds to keep retrying if none of the images is found immediately.
Returns: A tuple (label, match_tuple), or None.
Example:
tasks = [
    {'label': 'connected', 'task': dict(image='images/connected.png', confidence=0.8, grayscale=False)},
    {'label': 'disconnected', 'task': dict(image='images/disconnected.png', confidence=0.8, grayscale=False)},
]
result = session.locateAny(tasks)
if result:
    label, match = result
    if label == 'connected':
        pass  # start automation
    elif label == 'disconnected':
        pass  # reconnect
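As the number of states grows, an if/elif chain on the returned label gets long. One alternative is a dispatch table that maps each label to a handler. The handler names below are hypothetical placeholders, and the (label, match) tuple is simulated so the sketch runs without a screen; in a real script it would come from session.locateAny(tasks).

```python
def start_automation(match):
    return f"starting automation near {match}"


def reconnect(match):
    return f"reconnecting (indicator at {match})"


# Map each task label to the function that should handle it
HANDLERS = {
    "connected": start_automation,
    "disconnected": reconnect,
}


def handle_state(result):
    """Dispatch a locateAny-style result: (label, match_tuple) or None."""
    if result is None:
        return "no known state on screen"
    label, match = result
    handler = HANDLERS.get(label)
    if handler is None:
        raise KeyError(f"no handler registered for label {label!r}")
    return handler(match)


# Simulated result; in a real script: result = session.locateAny(tasks)
print(handle_state(("disconnected", (10, 20, 30, 40))))
```

A lookup table also makes a typo in a label fail loudly with a KeyError instead of silently skipping a branch.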
- locateAll(tasks, time_out=0)
Searches for multiple different images simultaneously and returns all matches for every image found.
Parameters:
Parameter | Type | Default | Description
tasks | list[dict] | Required | List of task dictionaries defining which images to look for.
time_out | float | 0 | Maximum seconds to keep retrying if no image is found immediately.
Returns: A dictionary {label: (match)}.
Example:
tasks = [
    {'label': 'low_health', 'task': dict(image='images/low_health.png', confidence=0.8, grayscale=False)},
    {'label': 'low_mana', 'task': dict(image='images/low_mana.png', confidence=0.8, grayscale=False)},
]
result = session.locateAll(tasks)
# Several labels can be found at once, so check each one independently
if result.get('low_health'):
    pass  # use health potion
if result.get('low_mana'):
    pass  # use mana potion
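Because locateAll can report several labels at once, a script often needs to decide which one to act on first. Below is a minimal sketch of a priority order. It assumes that a label which was not found is either missing from the dictionary or maps to an empty value; the PRIORITY list, the labels_to_handle helper, and the sample result are hypothetical.

```python
# Labels in the order they should be handled (most urgent first)
PRIORITY = ["low_health", "low_mana"]


def labels_to_handle(result, priority=PRIORITY):
    """Return the found labels from a locateAll-style dict, most urgent first."""
    return [label for label in priority if result.get(label)]


# Simulated result; in a real script: result = session.locateAll(tasks)
result = {"low_mana": [(500, 40, 30, 30)], "low_health": [(400, 40, 30, 30)]}
print(labels_to_handle(result))
```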
- read_text(region=None, mode='clean', use_det=False)
Captures a screen region and returns the text lines found, using Windows native OCR.
Parameters:
Parameter | Type | Default | Description
region | tuple | None | A bounding box (x, y, w, h) to limit the OCR area.
mode | str | 'clean' | Post-processing mode for the text.
use_det | bool | False | Whether to use additional detection models.
Returns: A list of strings containing the text lines found.
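Since read_text returns plain strings, matching its output is ordinary string processing. This sketch, using a hypothetical find_line helper, searches the returned lines for a keyword case-insensitively (OCR capitalisation is not always reliable); the sample lines stand in for real OCR output.

```python
def find_line(lines, keyword):
    """Return the first OCR line containing keyword (case-insensitive), or None."""
    needle = keyword.casefold()
    for line in lines:
        if needle in line.casefold():
            return line
    return None


# Simulated OCR output; in a real script: lines = session.read_text(region=(0, 0, 800, 200))
lines = ["Welcome back", "Status: CONNECTED", "Last sync 10:42"]
print(find_line(lines, "connected"))
```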
- get_pixel(x, y)
Returns the RGB color of the pixel at the specified coordinates relative to the current screen.
Parameters:
Parameter | Type | Default | Description
x | int | Required | The X-coordinate of the pixel.
y | int | Required | The Y-coordinate of the pixel.
Returns: A tuple (R, G, B) of integers representing the color values. Returns None if the capture fails.
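Exact color equality can be brittle because of anti-aliasing and display color settings, so pixel checks usually allow a small tolerance per channel. The color_matches helper below is a hypothetical sketch; the default tolerance of 10 is an arbitrary assumption, and a None result (failed capture) is treated as no match.

```python
def color_matches(rgb, expected, tolerance=10):
    """True if every channel of rgb is within tolerance of expected; None never matches."""
    if rgb is None:
        return False  # get_pixel returns None when the capture fails
    return all(abs(c - e) <= tolerance for c, e in zip(rgb, expected))


# Simulated reading; in a real script: rgb = session.get_pixel(640, 360)
rgb = (250, 6, 8)
print(color_matches(rgb, (255, 0, 0)))
```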
- screenshot(imageFilename=None, region=None)
Takes a screenshot of the screen or a specific region, optionally saves it to a file, and returns a PIL Image object.
Parameters:
Parameter | Type | Default | Description
imageFilename | str | None | Optional file path to save the captured image (e.g., 'screen.png').
region | tuple | None | A bounding box (left, top, width, height) to limit the capture area.
Example:
# Keep the image in memory as a PIL Image object
im = session.screenshot()
# Save the entire screen to a file and return the PIL Image
im_saved = session.screenshot('my_capture.png')
# Capture and save a specific region of the screen
im_region = session.screenshot('my_region.png', region=(0, 0, 300, 400))
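Note that region here is (left, top, width, height), whereas Pillow's Image.crop() expects a box of (left, upper, right, lower). If you crop an already captured screenshot yourself, convert between the two forms; region_to_box is a hypothetical helper for that.

```python
def region_to_box(region):
    """Convert a (left, top, width, height) region into a Pillow-style (left, upper, right, lower) box."""
    left, top, width, height = region
    return (left, top, left + width, top + height)


print(region_to_box((100, 200, 400, 500)))
# With a captured PIL image: im.crop(region_to_box((100, 200, 400, 500)))
```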
Mouse & Keyboard
- click(target=None, y=None, offset=(0, 0), button='left', clicks=1, interval=0.2, hold_time=0)
Performs a mouse click. It can target a coordinate, a match result, a list of matches, or just click where the mouse currently is.
Parameters:
Parameter | Type | Default | Description
target | tuple, list, or int | None | A match tuple (x, y, w, h), a coordinate tuple (x, y), a list of match tuples [(x, y, w, h), (x, y, w, h)], an X-coordinate integer (used together with y), or None to click at the current mouse position.
y | int | None | The Y-coordinate (only used if target is passed as an X integer).
offset | tuple | (0, 0) | Shifts the click position by (x, y) pixels relative to the target's center.
button | str | 'left' | Which mouse button to use: 'left', 'right', 'middle', 'mouse4', or 'mouse5'.
clicks | int | 1 | Number of times to click (e.g., 2 for a double-click).
interval | float | 0.2 | Delay in seconds between clicks when clicking multiple times or clicking a list of targets.
hold_time | float | 0 | Delay between mouse down and mouse up. Only applies when direct_input=True is set on the Session.
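The offset is applied relative to the center of the target. Below is a minimal sketch of how a click point could be derived from each single-target form described above; click_point is a hypothetical helper, and using integer division to find the center is an assumption about rounding.

```python
def click_point(target, y=None, offset=(0, 0)):
    """Resolve a click target into an (x, y) screen point (hypothetical helper)."""
    if isinstance(target, int):
        # X integer plus a separate y argument
        px, py = target, y
    elif len(target) == 4:
        # Match tuple (x, y, w, h): aim at the center of the box
        bx, by, bw, bh = target
        px, py = bx + bw // 2, by + bh // 2
    elif len(target) == 2:
        # Plain coordinate tuple (x, y)
        px, py = target
    else:
        raise ValueError("expected an int, an (x, y) tuple, or an (x, y, w, h) tuple")
    return px + offset[0], py + offset[1]


print(click_point((100, 200, 40, 20), offset=(5, -3)))
```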
- moveTo(x, y, duration=0.0)
Moves the mouse cursor to a specific location.
Parameters:
Parameter | Type | Default | Description
target | tuple, list, or int | None | A match tuple (x, y, w, h), a coordinate tuple (x, y), a list of match tuples [(x, y, w, h), (x, y, w, h)], or an X-coordinate integer (used together with y).
y | int | None | The Y-coordinate (only used if target is passed as an X integer).
offset | tuple | (0, 0) | Shifts the destination by (x, y) pixels relative to the target's center.
duration | float | 0.0 | Time in seconds to slide the mouse to the target. 0.0 moves instantly.
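With a non-zero duration the cursor slides rather than jumps. One simple way to produce such a motion is linear interpolation between the start and end points, issuing one small move per step. The sketch below only computes the intermediate points; the step count, the straight-line path, and the slide_path helper are assumptions and not necessarily what moveTo does.

```python
def slide_path(start, end, steps):
    """Return `steps` evenly spaced points from start (exclusive) to end (inclusive)."""
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


# Four steps from (0, 0) to (100, 50); with duration=0.4, each step would take about 0.1 s
print(slide_path((0, 0), (100, 50), 4))
```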
- write(message, interval=0.0)
Types a string of text using the keyboard.
Parameters:
Parameter | Type | Default | Description
message | str | Required | The text string to type.
interval | float | 0.0 | Delay in seconds between each character typed.
- press(key)
Presses a single keyboard key. For the list of key names, see: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key
Parameters:
Parameter | Type | Default | Description
key | str | Required | The name of the key (e.g., "esc", "f1", "space") or a single character.
- scroll(clicks, duration=0.0)
Scrolls the mouse wheel vertically.
Parameters:
Parameter | Type | Default | Description
clicks | int | Required | The number of steps to scroll. Positive values scroll up, negative values scroll down.
duration | float | 0.0 | The total time (in seconds) to complete the scroll. If greater than 0, the scroll is performed incrementally over the specified duration.
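When duration is greater than 0, the scroll is spread over time. A minimal sketch of one way to split it: send one wheel step at a time and wait an equal share of the duration between steps. scroll_plan is hypothetical, and the library's actual step size and timing may differ.

```python
def scroll_plan(clicks, duration=0.0):
    """Split a scroll of `clicks` steps into (step, delay) pairs spread over `duration` seconds."""
    if clicks == 0:
        return []
    if duration <= 0:
        return [(clicks, 0.0)]  # Instant: one scroll of the full amount
    step = 1 if clicks > 0 else -1  # Positive scrolls up, negative scrolls down
    delay = duration / abs(clicks)
    return [(step, delay)] * abs(clicks)


print(scroll_plan(-3, duration=0.6))
```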
- keyDown(key)
Presses and holds down a single keyboard key. For the list of key names, see: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key
Parameters:
Parameter | Type | Default | Description
key | str | Required | The name of the key (e.g., "esc", "f1", "space") or a single character.
- keyUp(key)
Releases a held keyboard key. For the list of key names, see: https://pynput.readthedocs.io/en/latest/keyboard.html#pynput.keyboard.Key
Parameters:
Parameter | Type | Default | Description
key | str | Required | The name of the key (e.g., "esc", "f1", "space") or a single character.
- mouseDown(button='left')
Presses and holds a mouse button. The button will remain pressed until mouseUp() is called.
Parameters:
Parameter | Type | Default | Description
button | str | "left" | The mouse button to press. Accepted values: "left", "right", "middle", "mouse4", "mouse5".
- mouseUp(button='left')
Releases a previously held mouse button.
Parameters:
Parameter | Type | Default | Description
button | str | "left" | The mouse button to release. Accepted values: "left", "right", "middle", "mouse4", "mouse5".
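Because a pressed key or button stays down until it is explicitly released, it is worth guaranteeing the release even if something in between raises an exception. A context manager built on try/finally does this. The sketch takes the press and release functions as arguments, so it works with keyDown/keyUp as well as mouseDown/mouseUp. The held helper is hypothetical, and a recording stub replaces a real Session so the example runs anywhere.

```python
from contextlib import contextmanager


@contextmanager
def held(press, release, *args):
    """Press on entry and always release on exit, even if the body raises."""
    press(*args)
    try:
        yield
    finally:
        release(*args)


# Recording stub standing in for a real Session, so the sketch runs without a desktop
log = []


class FakeSession:
    def mouseDown(self, button="left"):
        log.append(("down", button))

    def mouseUp(self, button="left"):
        log.append(("up", button))

    def moveTo(self, x, y, duration=0.0):
        log.append(("move", x, y))


session = FakeSession()

# Drag: hold the left button while moving, then release
with held(session.mouseDown, session.mouseUp, "left"):
    session.moveTo(400, 300, duration=0.5)

print(log)
```

With a real session, the same pattern holds a modifier key: with held(session.keyDown, session.keyUp, "shift"): session.press("a").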
Window Control
Helper functions to manage application windows. These functions can target windows by Title (string) or PID (integer).
- find_window(target)
Finds a window by Title (string) or PID (int). Returns the first matching window object or None.
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or Process ID (PID).
- move_window(target, x, y)
Moves the top-left corner of the window to (x, y).
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
x | int | Required | The target X-coordinate.
y | int | Required | The target Y-coordinate.
- resize_window(target, width, height)
Resizes the window to the specified width and height.
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
width | int | Required | The target width in pixels.
height | int | Required | The target height in pixels.
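move_window and resize_window combine naturally into window layouts. The sketch below computes the (x, y, width, height) of two windows tiled side by side on a screen of a given size. The side_by_side helper, the screen size, and the window titles are hypothetical, and the commented calls assume these functions are exposed at module level like pyauto_desktop.get_focused_window.

```python
def side_by_side(screen_width, screen_height):
    """Return (x, y, width, height) for a left and a right half-screen window."""
    half = screen_width // 2
    left = (0, 0, half, screen_height)
    right = (half, 0, screen_width - half, screen_height)  # absorbs any odd pixel
    return left, right


left, right = side_by_side(1920, 1080)
print(left, right)
# Applying the layout (assumed module-level calls, hypothetical window titles):
# for title, (x, y, w, h) in zip(["Notepad", "Calculator"], (left, right)):
#     pyauto_desktop.move_window(title, x, y)
#     pyauto_desktop.resize_window(title, w, h)
```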
- focus_window(target)
Brings the window to the foreground. Automatically un-minimizes (restores) it if hidden.
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
- maximize_window(target)
Maximizes the specified window.
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
- minimize_window(target)
Minimizes the specified window.
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
- get_window_info(target)
Returns a dictionary of window properties (x, y, width, height, title, pid).
Parameters:
Parameter | Type | Default | Description
target | str or int | Required | The Window Title or PID.
- get_focused_window()
Returns a dictionary of active window properties (title, x, y, width, height, pid).
Example:
active_window = pyauto_desktop.get_focused_window()
title = active_window.get('title')
x = active_window.get('x')
y = active_window.get('y')
width = active_window.get('width')
height = active_window.get('height')
pid = active_window.get('pid')  # Windows only
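The window dictionary can be turned into a search region, which keeps locateOnScreen from matching look-alike elements in other windows and usually speeds up the search. window_region is a hypothetical helper; it assumes x and y are the window's top-left corner in the same coordinate space that the region parameter expects.

```python
def window_region(info):
    """Build an (x, y, w, h) region tuple from a window-info dictionary."""
    return (info["x"], info["y"], info["width"], info["height"])


# Simulated dictionary; in a real script: info = pyauto_desktop.get_focused_window()
info = {"title": "Untitled - Notepad", "x": 100, "y": 50, "width": 800, "height": 600, "pid": 4242}
print(window_region(info))
# Then, for example: session.locateOnScreen("images/save.png", region=window_region(info))
```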