Frequently Asked Questions

How does browser-based AI upscaling work?

imagemochi loads a lightweight deep learning model (RealESRGAN, ~5MB) directly into your browser using ONNX Runtime Web. The AI runs on your device's GPU via WebGPU, or falls back to CPU via WebAssembly. Your images never leave your computer — all processing happens locally.

What's the difference between 2x and 4x upscaling?

2x doubles each image dimension (e.g., 500×500 → 1000×1000, four times the pixels) and 4x quadruples each dimension (500×500 → 2000×2000, sixteen times the pixels). 4x takes longer but produces much larger output. Both use the same AI model for detail enhancement.

Which AI model should I choose?

General (5MB) works best for photographs, screenshots, and most images. Illustration (18MB) is optimized for anime, digital art, and illustrations with flat colors and sharp lines. When in doubt, start with General.

Is there a file size or resolution limit?

Input images up to 4096×4096 pixels are supported. Larger images are automatically resized before upscaling. For best speed, keep inputs under 2048×2048. The AI processes images in small tiles, so even large images work without running out of memory.
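
As a rough illustration of the automatic resize, the sketch below fits an oversized input inside the 4096×4096 limit. It assumes the resize preserves aspect ratio and rounds to whole pixels; the page does not say how imagemochi actually does this, and the function name fit_within is hypothetical.

```python
def fit_within(width, height, limit=4096):
    """Return (width, height) scaled down so neither side exceeds `limit`.

    Assumption: aspect ratio is preserved and images already inside the
    limit are left untouched. This is an illustrative guess, not the
    site's documented behaviour.
    """
    scale = min(1.0, limit / width, limit / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h


# A 6000x4000 photo is over the limit on its long side only.
print(fit_within(6000, 4000))
# An image already inside the limit is returned unchanged.
print(fit_within(2000, 1000))
```

Under these assumptions only the longer side decides the scale factor, so a wide panorama shrinks more than a square image with the same pixel count.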

Does this work on mobile devices?

Yes! On mobile, the AI uses WebAssembly (CPU) instead of WebGPU. It's slower than desktop but fully functional. A 512×512 image takes about 10-15 seconds on a modern phone.

Is this really free? What's the catch?

100% free, no signup required. Since the AI runs in your browser (not on our servers), there's no GPU cost for us. Your images never leave your device, so there's no privacy concern either.

AI Upscaling — how it works, when to use it, and where it breaks

This page is more than a button — it's a real RealESRGAN deep-learning model running in your browser. This section explains what the model does, when a 2× pass beats a 4× pass, why anime art gets a separate model, and what limits the algorithm cannot cross. If you want to go deeper into super-resolution theory, this is the place.

Bicubic resize vs. neural upscaling — the core difference

A bicubic or Lanczos resize takes each existing pixel and interpolates the values between it and its neighbours using a fixed mathematical formula. The output image has more pixels, but no additional information — a straight line stays straight, but an edge that was blurred at the source will still be blurred. Neural upscalers flip that model. RealESRGAN was trained on millions of paired low- and high-resolution examples until it learned to synthesize plausible detail: the shape of an eyelash, the pattern of a wood grain, the micro-contrast of skin texture. The output is mathematically "invented," but because the invention is statistically consistent with what high-resolution versions of similar images actually look like, the result reads as natural rather than hallucinated.
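
To make the "fixed mathematical formula" concrete, here is a minimal one-dimensional sketch of cubic-convolution resampling (the Keys kernel with a = -0.5, a common choice for bicubic resizing). It is an illustration only, not the code any particular image library or this site uses, and the function names are made up for the example.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Keys cubic-convolution kernel; non-zero only for |x| < 2."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def resample_1d(samples, factor):
    """Upsample a 1D signal by an integer factor with cubic interpolation.

    Each output value is a weighted sum of the four nearest input samples.
    Indices past either end are clamped to the edge sample.
    """
    n = len(samples)
    out = []
    for j in range(n * factor):
        src = (j + 0.5) / factor - 0.5  # output pixel centre in input coordinates
        base = math.floor(src)
        acc = 0.0
        for k in range(base - 1, base + 3):
            idx = min(max(k, 0), n - 1)
            acc += samples[idx] * cubic_kernel(src - k)
        out.append(acc)
    return out


# A straight ramp stays a straight ramp away from the borders.
ramp = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(resample_1d(ramp, 2))
```

Every output value depends on at most four neighbouring input samples and the same fixed weights, which is why a resize of this kind cannot add detail that the samples do not already contain.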

Model architecture and file sizes

RealESRGAN-general-x4v3 (4.6 MB, used for photos): the lightweight general-purpose model from the Real-ESRGAN project (Tencent ARC Lab), fine-tuned to handle real-world degradation patterns (JPEG compression, blur, noise) rather than the pristine synthetic downsamples older models were trained on.
RealESRGAN-x4-anime (18 MB, used automatically if the image is detected as anime or illustration): specialized for line art and flat-shaded illustration where photographic textures would create unwanted noise. Automatic detection uses edge density and colour histogram.
Face-slim-320 (utility model, not an upscaler itself): runs only when the "Face Auto" preset is on, detecting facial regions so that the upscaler can be conditioned to preserve identity and avoid the "plastic face" problem common in older SR pipelines.

All three run via ONNX Runtime Web with tile-based inference. A 2048×2048 input is split into 384-pixel tiles on WebGPU (or 192-pixel tiles on the WebAssembly fallback), each tile is upscaled independently, and the tiles are feathered back together at the seams. This is what lets a 20-megapixel photo be processed without exceeding the browser's memory limit.
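
The tiling and feathering described above can be sketched in a few lines. This is a simplified one-dimensional illustration under stated assumptions: the overlap width, the linear feather ramp, and the helper names tile_starts and blend_1d are all hypothetical, since the page gives only the tile sizes (384 and 192 pixels).

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles covering `length` pixels along one axis.

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the far edge so no padding is needed.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def blend_1d(length, tiles, overlap):
    """Feather overlapping 1D tiles back into one strip.

    `tiles` is a list of (start, values). Each value is weighted by a
    linear ramp that rises from the tile edge over `overlap` pixels, then
    the weighted sum is normalised by the total weight at each position.
    """
    acc = [0.0] * length
    wsum = [0.0] * length
    for start, values in tiles:
        n = len(values)
        for i, v in enumerate(values):
            edge = min(i, n - 1 - i)  # distance to the nearest tile edge
            w = min(1.0, (edge + 1) / (overlap + 1))
            acc[start + i] += w * v
            wsum[start + i] += w
    return [a / w for a, w in zip(acc, wsum)]


# Tiles per axis for a 2048-pixel side, assuming a 16-pixel overlap.
for tile in (384, 192):
    per_axis = len(tile_starts(2048, tile, overlap=16))
    print(tile, per_axis, per_axis * per_axis)

# Two flat tiles with different values blend smoothly across the seam.
print(blend_1d(10, [(0, [10.0] * 6), (4, [20.0] * 6)], overlap=2))
```

Because only one tile's worth of pixels passes through the model at a time, peak memory is driven by the tile size rather than by the full image, which is the property the paragraph above relies on.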

When to use 2× and when to use 4×

Use 2× when the source is already decent but needs sharper detail. A 1600×1200 photo going to a 4×6 print at 300 DPI needs 1800×1200 pixels, so the source is already within about 11% of the target width and upscaling adds little. But if you want that same photo at 11×14 or poster size, 2× gets you to 3200×2400, which works out to roughly 220 DPI at 11×14. That covers most home-printer needs, with sharper edges and clearer text than a bicubic resize.
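
The print arithmetic is easy to check yourself. The helper names below are hypothetical, and the sketch assumes the image is scaled to fill the paper and cropped on the longer axis, so the effective DPI is set by the tighter of the two dimensions.

```python
def required_pixels(inches_w, inches_h, dpi=300):
    """Pixel dimensions needed to print at the given size and DPI."""
    return round(inches_w * dpi), round(inches_h * dpi)


def print_dpi(px_w, px_h, inches_w, inches_h):
    """Effective DPI when the image fills the paper (cropping any excess)."""
    return min(px_w / inches_w, px_h / inches_h)


# 4x6 print in landscape orientation (6 inches wide, 4 inches tall).
print(required_pixels(6, 4))
print(round(print_dpi(1600, 1200, 6, 4)))

# The same photo after a 2x pass, printed at 11x14 (14 inches wide).
print(round(print_dpi(3200, 2400, 14, 11)))
```

Around 200 to 300 DPI is the usual working range for photo prints viewed at arm's length, so a result near the bottom of that range is where a 2× pass starts to pay off.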

Use 4× when the source is small, compressed, or degraded. Thumbnails from old websites, social media screenshots, phone photos saved at low quality, and vintage digital camera shots all benefit dramatically from 4×. The model has more "room" to synthesize detail, and the result is more forgiving of the input's deficiencies at that scale. 4× is also the right choice when you're planning to crop aggressively, because every pixel counts once you start cutting.

Don't use 4× on already-large files. Going from 4000×3000 to 16000×12000 produces a file that's hard to work with, fills up memory, and rarely has enough source detail to justify the scale. The extra synthesized detail becomes visible as patterns or textures that weren't in the original — it's technically "more data" but perceptually less convincing. 2× is usually the better choice above ~2 megapixels.
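
To see why such files are hard to work with, the sketch below estimates the uncompressed size of an upscaled image. It assumes 4 bytes per pixel (8-bit RGBA, the layout of a browser canvas buffer); the function name output_footprint is hypothetical.

```python
def output_footprint(width, height, factor, bytes_per_pixel=4):
    """Return (out_width, out_height, pixels, bytes) for an upscaled image.

    `bytes` is the raw, uncompressed buffer size, not the size of the
    PNG or JPEG that eventually gets saved.
    """
    out_w, out_h = width * factor, height * factor
    pixels = out_w * out_h
    return out_w, out_h, pixels, pixels * bytes_per_pixel


for factor in (2, 4):
    w, h, pixels, raw = output_footprint(4000, 3000, factor)
    print(f"{factor}x: {w}x{h}, {pixels / 1e6:.0f} MP, {raw / 1e6:.0f} MB raw")
```

A 4× pass multiplies the pixel count by sixteen, so the raw buffer grows sixteen-fold as well, while a 2× pass only quadruples it.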

Where neural upscalers still fail

Heavily compressed JPEGs with visible block artefacts get sharpened in the wrong way — the block edges themselves become "features" the model preserves. Solution: pre-process with the Compress tool at high quality first to smooth blocks, then upscale.
Text on complex backgrounds can produce visible warping where the model tries to synthesize letter forms that aren't quite in the training distribution. Solution: stay at 2× for text-heavy images.
Faces under heavy blur or motion smear are the hardest case — the model will produce a face, but it may not look like the same person. Solution: use Face Auto preset, which conditions the model to be more conservative on facial regions.
Patterns with strong regularity (fabric, chain-link fences, roof tiles) can develop moiré-like interference. Solution: try the anime model even on photos with such patterns — it's more conservative on repetition.

The TTA preset and why it's slower

TTA stands for "test-time augmentation." When you enable it, the model runs the image through four rotations, rotates each result back, and averages them. Each rotation is a separate full tiled pass (384-pixel tiles on WebGPU), so total compute is roughly 4× higher. The payoff is noticeably sharper edges and fewer micro-artefacts, because the averaging cancels out direction-dependent errors. Turn it on when you care more about quality than time: portrait prints, product photography, anything destined for physical reproduction. Turn it off for batch-processing social-media thumbnails where the extra quality wouldn't be visible at the final display size anyway.
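
A toy version of rotation-based TTA shows why the averaging helps. Everything here is a stand-in: biased_model is a hypothetical 2× nearest-neighbour upscaler with an artificial left-to-right error added, not RealESRGAN, and the 0°, 90°, 180°, 270° rotation set is an assumption about which four rotations are used.

```python
def rot90(img):
    """Rotate a 2D list of numbers 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img, k):
    """Apply k clockwise quarter turns."""
    for _ in range(k % 4):
        img = rot90(img)
    return img


def nearest_2x(img):
    """Plain 2x nearest-neighbour upscale, used as the 'ideal' output."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def biased_model(img):
    """Stand-in model: a 2x upscale plus an error that grows left to right."""
    up = nearest_2x(img)
    centre = (len(up[0]) - 1) / 2
    return [[v + (x - centre) for x, v in enumerate(row)] for row in up]


def tta_upscale(img, model):
    """Run the model on four rotations, undo each rotation, and average."""
    outputs = []
    for k in range(4):
        out = model(rotate(img, k))
        outputs.append(rotate(out, 4 - k))  # rotate the result back
    h, w = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[y][x] for o in outputs) / 4 for x in range(w)]
            for y in range(h)]


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
print(biased_model(image)[0])              # first row, with the directional error
print(tta_upscale(image, biased_model)[0])  # first row after TTA
```

In this toy case the error points in a different direction for each rotation, so the four copies cancel exactly. A real model's errors are not perfectly antisymmetric, so in practice TTA reduces them rather than removing them, at the cost of four full passes.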

Privacy — does the model "see" my image?

The model runs entirely in your browser via WebGPU or WebAssembly. The model weights are downloaded once and cached in IndexedDB so subsequent uses don't re-download. The image pixels never leave your machine. We log anonymous performance stats (time to process, device tier, tile count) but never the image content or any derivative of it. You can verify this yourself: open DevTools → Network, clear it, then upscale an image, and observe that no outbound request contains binary image data.

What Pro unlocks and why

4× upscale is Pro-gated because it's the most compute-intensive operation on the site: a 4× pass on a 10-megapixel photo can run 30+ seconds on a mid-range laptop, which across a batch of 50 images adds up to 25 minutes or more. Pro also unlocks batch mode (drop 50 images, get a ZIP back), the offline PWA (process when your internet is out), and 30-day workflow history (Free accounts keep 30 minutes). See pricing for details. No feature on this site is artificially crippled to push upgrades: Free gives you unlimited 2× upscales forever.

AI Upscale

How to Upscale Images with AI Online