
[js/webgpu] Enable GroupedConvVectorize path #19791

Merged (6 commits) · Mar 13, 2024
Changes from 1 commit
[js/webgpu] Enable GroupedConvVectorize path
The vectorize path hit 2 test failures on a CI bot with an NVIDIA GPU, but we
couldn't reproduce them on any GPU at hand, including NVIDIA GPUs.
This PR introduces GPUAdapterInfo and enables this optimization only on
non-NVIDIA GPUs to keep the bots green.
No obvious perf gain is seen when vectorize is enabled on NVIDIA.
However, it shows a big perf improvement on Intel. On my Gen12 Intel
GPU, mobilenetv2-12 inference time improved from 11.14 ms to 7.1 ms.
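The vendor-based gating described above can be sketched as a stand-alone TypeScript snippet. `GPUAdapterInfoLike` is a stand-in type for the browser's `GPUAdapterInfo`, which is unavailable outside a browser; the class mirrors the `AdapterInfo` added in this PR.

```typescript
// Stand-in for the browser's GPUAdapterInfo (assumption: only `vendor` is needed here).
interface GPUAdapterInfoLike {
  vendor: string;
}

// Minimal sketch of vendor-based feature gating, mirroring the PR's AdapterInfo.
class AdapterInfo {
  private vendor = '';

  constructor(adapterInfo?: GPUAdapterInfoLike) {
    if (adapterInfo) {
      this.vendor = adapterInfo.vendor;
    }
  }

  // Vendor strings are lowercase, e.g. 'intel', 'nvidia', 'amd'.
  isVendor(vendor: string): boolean {
    return this.vendor === vendor;
  }
}

// Gate the optimization the way the PR does: enable it everywhere except
// on NVIDIA, where the CI bot saw unreproducible failures.
const info = new AdapterInfo({vendor: 'intel'});
const enableGroupedConvVectorize = !info.isVendor('nvidia');
console.log(enableGroupedConvVectorize); // true for the 'intel' mock
```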
gyagp committed Mar 6, 2024
commit f98e9fe4030f1d3ef082d0bc308c689027ed3631
18 changes: 18 additions & 0 deletions js/web/lib/wasm/jsep/backend-webgpu.ts
@@ -94,11 +94,27 @@ const getProgramInfoUniqueKey =
return key;
};

export class AdapterInfo {
private vendor: string;

constructor(adapterInfo: GPUAdapterInfo) {
if (adapterInfo) {
this.vendor = adapterInfo.vendor;
}
}

// vendor could be intel, nvidia, amd, etc.
isVendor(vendor: string): boolean {
return this.vendor === vendor;
}
}

/**
* this class is designed to store status and being used as a singleton for JSEP. It will be passed to jsepInit() as
* the first parameter so that it is stored for future use.
*/
export class WebGpuBackend {
adapterInfo: AdapterInfo;
device: GPUDevice;
/**
* an instance of GpuDataManager to manage a GpuDataId -> GpuBuffer mapping
@@ -212,6 +228,8 @@ export class WebGpuBackend {
}

this.device = await adapter.requestDevice(deviceDescriptor);
const adapterInfo = await adapter.requestAdapterInfo();
this.adapterInfo = new AdapterInfo(adapterInfo);
this.gpuDataManager = createGpuDataManager(this);
this.programManager = new ProgramManager(this);
this.kernels = new Map();
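One caveat around the `requestAdapterInfo()` call above: newer revisions of the WebGPU spec expose adapter info as a `GPUAdapter.info` attribute and have removed `requestAdapterInfo()`, so forward-compatible code may need to probe both. This is a hedged sketch, not part of the PR; `AdapterLike` is a minimal stand-in type so it runs outside a browser.

```typescript
// Minimal stand-in for GPUAdapter: either the newer `info` attribute or the
// older `requestAdapterInfo()` method may be present (assumption for this sketch).
interface AdapterLike {
  info?: {vendor: string};
  requestAdapterInfo?: () => Promise<{vendor: string}>;
}

// Prefer the attribute, fall back to the method, default to an empty vendor.
async function getVendor(adapter: AdapterLike): Promise<string> {
  const info = adapter.info ?? await adapter.requestAdapterInfo?.();
  return info?.vendor ?? '';
}
```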
4 changes: 3 additions & 1 deletion js/web/lib/wasm/jsep/init.ts
@@ -6,7 +6,7 @@ import {Env} from 'onnxruntime-common';
import {OrtWasmModule} from '../binding/ort-wasm';
import {DataType, getTensorElementSize} from '../wasm-common';

import {WebGpuBackend} from './backend-webgpu';
import {AdapterInfo, WebGpuBackend} from './backend-webgpu';
import {LOG_DEBUG} from './log';
import {TensorView} from './tensor-view';
import {ShapeUtil} from './util';
@@ -54,6 +54,7 @@ class TensorViewImpl implements TensorView {
}

class ComputeContextImpl implements ComputeContext {
readonly adapterInfo: AdapterInfo;
readonly opKernelContext: number;
readonly inputs: readonly TensorView[];
readonly outputCount: number;
@@ -66,6 +67,7 @@ class ComputeContextImpl implements ComputeContext {
private customDataOffset = 0;
private customDataSize = 0;
constructor(private module: OrtWasmModule, private backend: WebGpuBackend, contextDataOffset: number) {
this.adapterInfo = backend.adapterInfo;
const heapU32 = module.HEAPU32;

// extract context data
7 changes: 4 additions & 3 deletions js/web/lib/wasm/jsep/webgpu/ops/conv.ts
@@ -148,11 +148,12 @@ const conv2d = (context: ComputeContext, inputs: readonly TensorView[], attribut
// const hasPreluActivationWeights = false; /* TODO: add support for prelu activation weights */
const isChannelsLast = attributes.format === 'NHWC';
if (attributes.group !== 1) {
// Temporarily disable createGroupedConvVectorizeProgramInfo path due to bots failures with below two cases:
// One CI bot with NVIDIA GPU fails with below 2 cases, but we couldn't repro them with any other GPUs, including NVIDIA ones.
// [webgpu]Conv - conv - vectorize group - B
// [webgpu]Conv - conv - vectorize group - D
const disableGroupedConvVectorize = true;
if (!disableGroupedConvVectorize && isChannelsLast && inputs[1].dims[0] === attributes.group &&
// Disable vectorize on NVIDIA to make bots happy. BTW, no obvious perf gain with vectorize is seen on NVIDIA GPUs.
const enableGroupedConvVectorize = context.adapterInfo.isVendor('nvidia') ? false : true;
if (enableGroupedConvVectorize && isChannelsLast && inputs[1].dims[0] === attributes.group &&
inputs[1].dims[1] === 1 && attributes.dilations[0] === 1 && attributes.dilations[1] === 1) {
const outputShape = calculateOutputShape(
inputs[0].dims, inputs[1].dims, attributes.dilations, adjustedAttributes.pads, attributes.strides,
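The conditions the conv.ts change checks before taking the grouped-conv vectorize path can be collected into one predicate. This helper is hypothetical (it does not exist in the PR); `ConvAttributes` is a trimmed stand-in for the real attributes type, keeping only the fields used here.

```typescript
// Trimmed stand-in for the conv attributes consulted by the vectorize check.
interface ConvAttributes {
  format: 'NHWC' | 'NCHW';
  group: number;
  dilations: readonly number[];
}

// Hypothetical predicate mirroring the gate in conv.ts after this change:
// vectorize only off-NVIDIA, channels-last, depthwise-shaped weights, no dilation.
const canUseGroupedConvVectorize =
    (attrs: ConvAttributes, weightDims: readonly number[], isNvidia: boolean): boolean =>
        !isNvidia &&                      // disabled on NVIDIA to keep CI green
        attrs.format === 'NHWC' &&        // channels-last layout only
        weightDims[0] === attrs.group &&  // one filter per group
        weightDims[1] === 1 &&            // single input channel per filter
        attrs.dilations[0] === 1 && attrs.dilations[1] === 1;  // no dilation

// A depthwise-style conv on a non-NVIDIA GPU qualifies:
console.log(canUseGroupedConvVectorize(
    {format: 'NHWC', group: 32, dilations: [1, 1]}, [32, 1, 3, 3], false));  // true
```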
6 changes: 6 additions & 0 deletions js/web/lib/wasm/jsep/webgpu/types.ts
@@ -2,6 +2,7 @@
// Licensed under the MIT License.

import {DataType} from '../../wasm-common';
import {AdapterInfo} from '../backend-webgpu';
import {TensorView} from '../tensor-view';

import {ShaderHelper} from './ops/common';
@@ -146,6 +147,11 @@ export interface ComputeContextInputsOutputsMapping {
* A ComputeContext instance carries the states that representing the current running of a kernel.
*/
export interface ComputeContext {
/**
* gpu adapter info
*/
readonly adapterInfo: AdapterInfo;

/**
* stores the pointer to OpKernelContext
*/