Introduction to Partial Derivatives
Partial derivatives extend the concept of derivatives to functions of multiple variables. While ordinary derivatives measure how a function changes with respect to one variable, partial derivatives measure how a multivariable function changes when only one variable is varied while others are held constant.
Why Partial Derivatives Matter:
- Essential for understanding surfaces and functions in higher dimensions
- Foundation for gradient descent optimization algorithms
- Critical in physics for describing rates of change in multiple dimensions
- Key component in machine learning for backpropagation
- Used in economics for marginal analysis with multiple variables
Real-World Example:
The temperature T(x, y, z, t) at a point (x, y, z) at time t has partial derivatives:
∂T/∂x: Rate of temperature change in x-direction
∂T/∂t: Rate of temperature change over time
Definition & Notation
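The partial derivative of a function f(x, y) with respect to x at a point (a, b) is defined as:
∂f/∂x(a, b) = lim(h→0) [f(a + h, b) - f(a, b)] / h
Similarly, the partial derivative with respect to y is:
∂f/∂y(a, b) = lim(h→0) [f(a, b + h) - f(a, b)] / h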
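The limit definition can be checked numerically. The sketch below is an illustration, not part of the original text: it swaps the one-sided difference quotient for a central difference [f(a + h, b) - f(a - h, b)] / (2h), which converges to the same limit with a smaller error. The helper names `partial_x` and `partial_y`, the step size h = 1e-6, and the test function (the Basic Rule example f(x, y) = x²y + sin(x)) are assumptions.

```python
import math

def partial_x(f, a, b, h=1e-6):
    """Central-difference estimate of ∂f/∂x at (a, b); y is held fixed at b."""
    return (f(a + h, b) - f(a - h, b)) / (2 * h)

def partial_y(f, a, b, h=1e-6):
    """Central-difference estimate of ∂f/∂y at (a, b); x is held fixed at a."""
    return (f(a, b + h) - f(a, b - h)) / (2 * h)

def f(x, y):
    return x**2 * y + math.sin(x)

print(partial_x(f, 1.0, 2.0))  # close to 2·1·2 + cos(1) ≈ 4.5403
print(partial_y(f, 1.0, 2.0))  # close to 1² = 1
```

Holding one argument fixed inside the difference quotient is exactly the "treat the other variable as a constant" idea in code form.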
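Common Notations for Partial Derivatives
The partial derivative with respect to x is written ∂f/∂x or fx; with respect to y, ∂f/∂y or fy. Its value at a point is written ∂f/∂x(a, b) or fx(a, b).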
The partial derivative ∂f/∂x(a, b) represents the slope of the tangent line to the curve formed by intersecting the surface z = f(x, y) with the plane y = b.
Visualization: the partial derivative as a slope in the x-direction
Slice the surface with the vertical plane y = b.
The slope of the resulting curve at x = a is ∂f/∂x(a, b).
Computation Methods
Computing partial derivatives follows the same rules as ordinary derivatives, with one key difference: treat all other variables as constants.
Basic Rule
To compute ∂f/∂x, treat y as a constant and differentiate with respect to x using ordinary derivative rules.
Example: f(x, y) = x²y + sin(x)
∂f/∂x = 2xy + cos(x)
Product Rule
For f(x, y) = u(x, y)·v(x, y):
∂f/∂x = (∂u/∂x)·v + u·(∂v/∂x)
Example: f(x, y) = x·e^(xy)
∂f/∂x = 1·e^(xy) + x·y·e^(xy)
Quotient Rule
For f(x, y) = u(x, y)/v(x, y):
∂f/∂x = [(∂u/∂x)·v - u·(∂v/∂x)] / v²
Example: f(x, y) = x/(x² + y²)
∂f/∂x = [1·(x² + y²) - x·2x] / (x² + y²)² = (y² - x²)/(x² + y²)²
Chain Rule
For f(x, y) = g(h(x, y)):
∂f/∂x = g'(h(x, y))·(∂h/∂x)
Example: f(x, y) = sin(x² + y)
∂f/∂x = cos(x² + y)·2x
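A quick numerical sanity check of the three rule examples, offered as a hedged sketch: it compares each hand-derived ∂f/∂x (for the quotient example, (y² - x²)/(x² + y²)²) with a central-difference estimate at one arbitrarily chosen point. The helper `d_dx` and the test point (0.7, -0.4) are illustrative assumptions.

```python
import math

def d_dx(f, x, y, h=1e-6):
    """Central-difference estimate of ∂f/∂x with y held constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

# Each entry: (function, hand-derived ∂f/∂x from the rule examples)
examples = {
    "product": (lambda x, y: x * math.exp(x * y),
                lambda x, y: math.exp(x * y) + x * y * math.exp(x * y)),
    "quotient": (lambda x, y: x / (x**2 + y**2),
                 lambda x, y: (y**2 - x**2) / (x**2 + y**2) ** 2),
    "chain": (lambda x, y: math.sin(x**2 + y),
              lambda x, y: math.cos(x**2 + y) * 2 * x),
}

x0, y0 = 0.7, -0.4  # arbitrary test point
for name, (f, fx) in examples.items():
    print(f"{name}: rule = {fx(x0, y0):.6f}, numeric = {d_dx(f, x0, y0):.6f}")
```

Agreement at one point does not prove a formula correct, but a mismatch reliably flags an algebra slip.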
Example: f(x, y) = x³y² + e^(xy) + ln(x² + y)
Compute ∂f/∂x:
1. Differentiate x³y²: ∂/∂x(x³y²) = 3x²y² (treat y² as constant)
2. Differentiate e^(xy): ∂/∂x(e^(xy)) = y·e^(xy) (chain rule)
3. Differentiate ln(x² + y): ∂/∂x(ln(x² + y)) = 2x/(x² + y) (chain rule)
Result: ∂f/∂x = 3x²y² + y·e^(xy) + 2x/(x² + y)
Compute ∂f/∂y:
1. Differentiate x³y²: ∂/∂y(x³y²) = 2x³y (treat x³ as constant)
2. Differentiate e^(xy): ∂/∂y(e^(xy)) = x·e^(xy) (chain rule)
3. Differentiate ln(x² + y): ∂/∂y(ln(x² + y)) = 1/(x² + y) (chain rule)
Result: ∂f/∂y = 2x³y + x·e^(xy) + 1/(x² + y)
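The two results for f(x, y) = x³y² + e^(xy) + ln(x² + y) can be spot-checked numerically. This minimal sketch assumes the test point (1, 0.5), chosen so that x² + y > 0 and the logarithm is defined; the function names are illustrative.

```python
import math

def f(x, y):
    return x**3 * y**2 + math.exp(x * y) + math.log(x**2 + y)

def f_x(x, y):
    # 3x²y² + y·e^(xy) + 2x/(x² + y)
    return 3 * x**2 * y**2 + y * math.exp(x * y) + 2 * x / (x**2 + y)

def f_y(x, y):
    # 2x³y + x·e^(xy) + 1/(x² + y)
    return 2 * x**3 * y + x * math.exp(x * y) + 1 / (x**2 + y)

h = 1e-6
x0, y0 = 1.0, 0.5  # any point with x² + y > 0
num_fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)  # vary x only
num_fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)  # vary y only
print(f_x(x0, y0), num_fx)
print(f_y(x0, y0), num_fy)
```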
The Gradient Vector
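The gradient of a scalar function f(x, y) is the vector of its partial derivatives, ∇f(x, y) = (∂f/∂x, ∂f/∂y). As a vector field, it points at each point in the direction of the greatest rate of increase of the function.
For functions of three variables: ∇f(x, y, z) = (∂f/∂x, ∂f/∂y, ∂f/∂z)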
Direction of Steepest Ascent
The gradient vector ∇f points in the direction where f increases most rapidly.
Magnitude: ||∇f|| gives the rate of increase in that direction.
Direction of Steepest Descent
-∇f points in the direction where f decreases most rapidly.
This is the basis for gradient descent optimization algorithms.
Level Curves & Surfaces
The gradient is perpendicular to level curves (2D) or level surfaces (3D).
For f(x, y) = c, ∇f is normal to the level curve at each point.
Directional Derivative
The rate of change of f in direction u (unit vector):
Dᵤf = ∇f·u = ||∇f|| cos θ, where θ is the angle between ∇f and u
Example: f(x, y) = x² + y²
Compute the gradient:
∂f/∂x = 2x
∂f/∂y = 2y
∇f(x, y) = (2x, 2y)
At point (1, 2):
∇f(1, 2) = (2, 4)
Direction of steepest ascent: (2, 4) (or normalized: (1/√5, 2/√5))
Rate of increase: ||∇f|| = √(2² + 4²) = √20 ≈ 4.47
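The gradient of f(x, y) = x² + y² at (1, 2), its normalization, and the directional derivative Dᵤf = ∇f·u can be sketched in a few lines. The function names are hypothetical, and the scan over 360 unit directions is only a brute-force illustration that no direction beats the normalized gradient (since cos θ ≤ 1).

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x² + y², derived by hand: (2x, 2y)."""
    return (2 * x, 2 * y)

def directional_derivative(grad, u):
    """Dᵤf = ∇f · u for a 2D unit vector u."""
    if not math.isclose(math.hypot(*u), 1.0):
        raise ValueError("u must be a unit vector")
    return grad[0] * u[0] + grad[1] * u[1]

g = grad_f(1, 2)                        # (2, 4)
mag = math.hypot(*g)                    # ||∇f|| = √20
u_steepest = (g[0] / mag, g[1] / mag)   # (1/√5, 2/√5)

print(g, round(mag, 2))                       # (2, 4) 4.47
print(directional_derivative(g, u_steepest))  # ≈ 4.472, i.e. ||∇f|| (θ = 0)
print(directional_derivative(g, (1.0, 0.0)))  # 2.0, which is ∂f/∂x

# Brute force: among 360 unit directions, none exceeds the gradient direction
best = max(
    directional_derivative(g, (math.cos(math.radians(d)), math.sin(math.radians(d))))
    for d in range(360)
)
print(best <= mag + 1e-12)                    # True
```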
Gradient Vector Field Visualization: for f(x, y) = x² + y², the gradient vectors point radially outward, and their length increases with distance from the origin.
Chain Rule for Partial Derivatives
The chain rule for multivariable functions has several forms depending on how variables depend on each other.
Case 1: z = f(x, y), x = x(t), y = y(t)
dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
Example: f(x, y) = x²y, x = cos(t), y = sin(t)
dz/dt = 2xy·(-sin(t)) + x²·cos(t)
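Substituting x = cos(t), y = sin(t) directly gives z(t) = cos²(t)·sin(t), so the chain-rule formula can be compared against a finite-difference derivative of z(t). A hedged sketch; t = 0.8 is an arbitrary test value.

```python
import math

def z_of_t(t):
    x, y = math.cos(t), math.sin(t)
    return x**2 * y  # f(x, y) = x²y

def dz_dt_chain(t):
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    # (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt), with ∂f/∂x = 2xy and ∂f/∂y = x²
    return 2 * x * y * dx_dt + x**2 * dy_dt

t0, h = 0.8, 1e-6
numeric = (z_of_t(t0 + h) - z_of_t(t0 - h)) / (2 * h)
print(dz_dt_chain(t0), numeric)
```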
Case 2: z = f(x, y), x = x(s, t), y = y(s, t)
∂z/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)
∂z/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)
Case 3: Implicit Differentiation
For F(x, y) = 0, wherever ∂F/∂y ≠ 0:
dy/dx = -(∂F/∂x)/(∂F/∂y)
Example: x² + y² = 1
dy/dx = -x/y
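For the unit circle, the implicit formula can be checked against the explicit upper half y = √(1 - x²). This sketch assumes the point (0.6, 0.8) on the circle; `implicit_slope` is a hypothetical helper that refuses points where ∂F/∂y = 0 (vertical tangents).

```python
import math

def implicit_slope(F_x, F_y, x, y):
    """dy/dx = -(∂F/∂x)/(∂F/∂y) at a point on F(x, y) = 0; needs ∂F/∂y ≠ 0."""
    fy = F_y(x, y)
    if fy == 0:
        raise ValueError("∂F/∂y = 0 here, so the formula does not apply")
    return -F_x(x, y) / fy

# F(x, y) = x² + y² - 1, so ∂F/∂x = 2x and ∂F/∂y = 2y
def F_x(x, y):
    return 2 * x

def F_y(x, y):
    return 2 * y

slope = implicit_slope(F_x, F_y, 0.6, 0.8)  # -x/y = -0.75

# Cross-check on the upper half of the circle, where y = √(1 - x²) explicitly
h = 1e-6
explicit = (math.sqrt(1 - (0.6 + h) ** 2) - math.sqrt(1 - (0.6 - h) ** 2)) / (2 * h)
print(slope, explicit)  # both approximately -0.75
```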
Tree Diagram Method
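Draw branches from the dependent variable to the intermediate variables, and from those to the independent variables.
Multiply the derivatives along each path from z down to the variable of interest (for example t), then sum over all such paths.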
Example: z = x²y, where x = st and y = s + t
Compute ∂z/∂s:
1. ∂z/∂x = 2xy
2. ∂z/∂y = x²
3. ∂x/∂s = t
4. ∂y/∂s = 1
Chain rule: ∂z/∂s = (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s)
∂z/∂s = (2xy)(t) + (x²)(1) = 2xyt + x²
Substitute x = st, y = s + t: ∂z/∂s = 2(st)(s + t)(t) + (st)²
Final: ∂z/∂s = 2s²t² + 2st³ + s²t² = 3s²t² + 2st³
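The tree-diagram sum for z = x²y with x = st, y = s + t can be written almost literally, one product per path, and compared with the closed form and a finite difference. The function names and the test point (s, t) = (1.5, 2) are illustrative assumptions.

```python
def z(s, t):
    x, y = s * t, s + t
    return x**2 * y

def dz_ds_tree(s, t):
    x, y = s * t, s + t
    dz_dx, dz_dy = 2 * x * y, x**2   # ∂z/∂x, ∂z/∂y
    dx_ds, dy_ds = t, 1              # ∂x/∂s, ∂y/∂s
    # Sum over the two paths z→x→s and z→y→s
    return dz_dx * dx_ds + dz_dy * dy_ds

def dz_ds_closed_form(s, t):
    return 3 * s**2 * t**2 + 2 * s * t**3

s0, t0, h = 1.5, 2.0, 1e-6
numeric = (z(s0 + h, t0) - z(s0 - h, t0)) / (2 * h)
print(dz_ds_tree(s0, t0), dz_ds_closed_form(s0, t0))  # 51.0 51.0
print(numeric)                                        # close to 51
```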
Higher-Order Partial Derivatives
Just as with single-variable functions, we can take partial derivatives of partial derivatives.
Notation for Second Partial Derivatives
fxx = ∂²f/∂x², fyy = ∂²f/∂y², fxy = ∂/∂y(∂f/∂x), fyx = ∂/∂x(∂f/∂y)
Clairaut's Theorem (Equality of Mixed Partials):
If fxy and fyx are both continuous on an open region containing a point, then they are equal at that point:
∂²f/∂x∂y = ∂²f/∂y∂x
Example: f(x, y) = x³y² + sin(xy)
First partials:
fx = 3x²y² + y·cos(xy)
fy = 2x³y + x·cos(xy)
Second partials:
fxx = ∂/∂x(3x²y² + y·cos(xy)) = 6xy² - y²·sin(xy)
fyy = ∂/∂y(2x³y + x·cos(xy)) = 2x³ - x²·sin(xy)
fxy = ∂/∂y(3x²y² + y·cos(xy)) = 6x²y + cos(xy) - xy·sin(xy)
fyx = ∂/∂x(2x³y + x·cos(xy)) = 6x²y + cos(xy) - xy·sin(xy)
Note: fxy = fyx as expected from Clairaut's theorem
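Clairaut's theorem can be observed numerically for f(x, y) = x³y² + sin(xy): differentiating fx with respect to y and fy with respect to x, in opposite orders, should land on the same value. A hedged sketch; the point (0.9, -1.3) is arbitrary and the first partials are the hand-derived ones.

```python
import math

# f(x, y) = x³y² + sin(xy): first partials derived by hand
def f_x(x, y):
    return 3 * x**2 * y**2 + y * math.cos(x * y)

def f_y(x, y):
    return 2 * x**3 * y + x * math.cos(x * y)

def mixed_analytic(x, y):
    # Hand-derived fxy (= fyx): 6x²y + cos(xy) - xy·sin(xy)
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)

x0, y0, h = 0.9, -1.3, 1e-6
f_xy = (f_x(x0, y0 + h) - f_x(x0, y0 - h)) / (2 * h)  # ∂/∂y of f_x
f_yx = (f_y(x0 + h, y0) - f_y(x0 - h, y0)) / (2 * h)  # ∂/∂x of f_y
print(f_xy, f_yx, mixed_analytic(x0, y0))  # the three values agree closely
```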
Applications of Partial Derivatives
Partial derivatives have wide-ranging applications across science, engineering, and economics.
Physics: Heat Equation
The heat equation describes temperature distribution:
∂u/∂t = α(∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²)
where u(x, y, z, t) is temperature and α is thermal diffusivity.
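To show how the partial derivatives in the heat equation become an update rule, here is a minimal one-dimensional sketch (∂u/∂t = α·∂²u/∂x²) using the explicit forward-time, centered-space finite-difference scheme. The scheme, the fixed-temperature ends, the grid values, and the stability bound α·dt/dx² ≤ 1/2 are standard numerical choices assumed here, not taken from the text.

```python
def heat_step(u, alpha, dx, dt):
    """One explicit (forward-Euler) step of ∂u/∂t = α·∂²u/∂x² on a 1D grid.

    The two end values are left unchanged (fixed-temperature boundaries).
    The scheme is stable only when r = α·dt/dx² ≤ 1/2.
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError(f"unstable time step: α·dt/dx² = {r} > 0.5")
    new = u[:]  # copy; boundary entries keep their old values
    for i in range(1, len(u) - 1):
        # Centered second difference approximates ∂²u/∂x² at grid point i
        new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new

# Hypothetical rod: 11 grid points at temperature 0 with a hot spot in the middle
u = [0.0] * 11
u[5] = 100.0
for _ in range(50):
    u = heat_step(u, alpha=1.0, dx=1.0, dt=0.25)  # r = 0.25
print([round(v, 2) for v in u])
```

The peak flattens and spreads symmetrically over time, which is the qualitative behavior the equation describes.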
Fluid Dynamics
Navier-Stokes momentum equation for incompressible flow of a Newtonian fluid:
ρ(∂v/∂t + (v·∇)v) = -∇p + μ∇²v + f
where v is velocity, p is pressure, ρ is density, μ is dynamic viscosity, and f is the external body force per unit volume.
Economics: Marginal Analysis
For production function Q(K, L):
∂Q/∂K: Marginal product of capital
∂Q/∂L: Marginal product of labor
Used to optimize production with limited resources.
Signal Processing
Image processing uses partial derivatives for edge detection:
Gradient magnitude: G = √[(∂I/∂x)² + (∂I/∂y)²], where the Sobel operator estimates ∂I/∂x and ∂I/∂y by convolving the image with 3×3 kernels
where I(x, y) is image intensity.
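A small pure-Python sketch of Sobel edge detection on a hypothetical 5×6 image with a vertical edge. The kernels are the standard Sobel kernels; applying them without flipping (cross-correlation rather than true convolution) only changes the sign of the estimates, so G is unaffected. Border pixels are simply left at 0, which is an assumption of this sketch.

```python
import math

# Sobel kernels: estimate ∂I/∂x (change along columns) and ∂I/∂y (change along rows)
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]]

def sobel_magnitude(img):
    """G = √(Gx² + Gy²) for interior pixels of a 2D list; border pixels stay 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += KX[i][j] * p
                    gy += KY[i][j] * p
            out[r][c] = math.hypot(gx, gy)
    return out

# Hypothetical image: dark left half (0), bright right half (10)
img = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
G = sobel_magnitude(img)
for row in G:
    print(row)  # rows 1 to 3: [0.0, 0.0, 40.0, 40.0, 0.0, 0.0]
```

G is large only in the two columns that straddle the edge and zero in the flat regions.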
Example: Cobb-Douglas production function Q(K, L) = A·K^α·L^β
where K = capital, L = labor, A = total factor productivity
Marginal products:
MPK = ∂Q/∂K = α·A·K^(α-1)·L^β
MPL = ∂Q/∂L = β·A·K^α·L^(β-1)
Interpretation:
MPK measures approximately how much additional output one more unit of capital yields, with labor held fixed
MPL measures approximately how much additional output one more unit of labor yields, with capital held fixed
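The "one more unit" reading of a marginal product is a linear approximation. This sketch uses illustrative parameter values A = 1, α = 0.3, β = 0.7, K = 100, L = 50 (assumptions, not from the text) and compares each partial derivative with the actual output gain from one extra unit.

```python
# Illustrative parameter values (assumptions, not from the text)
A, ALPHA, BETA = 1.0, 0.3, 0.7

def Q(K, L):
    """Cobb-Douglas output Q = A·K^α·L^β."""
    return A * K**ALPHA * L**BETA

def mpk(K, L):
    """∂Q/∂K = α·A·K^(α-1)·L^β."""
    return ALPHA * A * K**(ALPHA - 1) * L**BETA

def mpl(K, L):
    """∂Q/∂L = β·A·K^α·L^(β-1)."""
    return BETA * A * K**ALPHA * L**(BETA - 1)

K0, L0 = 100.0, 50.0
# Each line: marginal product, then the exact gain from one extra unit
print(mpk(K0, L0), Q(K0 + 1, L0) - Q(K0, L0))
print(mpl(K0, L0), Q(K0, L0 + 1) - Q(K0, L0))
```

With these values the two numbers on each line differ by well under 1%, because one unit is small relative to K and L.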
Optimization with Partial Derivatives
Partial derivatives are essential for finding maximum and minimum values of multivariable functions.
Critical Points
A point (a, b) is critical if:
∂f/∂x(a, b) = 0 and ∂f/∂y(a, b) = 0
or if either partial derivative does not exist.
Second Derivative Test
Let D = fxx·fyy - (fxy)², evaluated at the critical point
• D > 0, fxx > 0: Local minimum
• D > 0, fxx < 0: Local maximum
• D < 0: Saddle point
• D = 0: Test inconclusive
Lagrange Multipliers
For constrained optimization: maximize f(x, y) subject to g(x, y) = k
Solve: ∇f = λ∇g, g(x, y) = k
where λ is the Lagrange multiplier.
Gradient Descent
Iterative optimization algorithm:
xₙ₊₁ = xₙ - γ∇f(xₙ)
where γ is the learning rate.
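The update rule translates directly into a loop. This is a minimal sketch on a hypothetical test function f(x, y) = (x - 3)² + 2(y + 1)², whose minimum is at (3, -1); the step count and learning rates are assumptions chosen to show both convergence and what happens when γ is too large.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeat xₙ₊₁ = xₙ - γ∇f(xₙ) with γ = lr for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def grad_example(p):
    """∇f for the hypothetical f(x, y) = (x - 3)² + 2(y + 1)²."""
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]

print(gradient_descent(grad_example, [0.0, 0.0]))  # close to [3, -1]

# Too large a step: the y-error is multiplied by (1 - 4·0.6) = -1.4 each step
diverged = gradient_descent(grad_example, [0.0, 0.0], lr=0.6, steps=20)
print(diverged)
```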
Example: f(x, y) = x³ + y³ - 3xy
Step 1: Find critical points
fx = 3x² - 3y = 0 → y = x²
fy = 3y² - 3x = 0 → x = y²
Substitute y = x² into x = y²: x = (x²)² = x⁴
x⁴ - x = x(x³ - 1) = 0 → x = 0 or x = 1
Critical points: (0, 0) and (1, 1)
Step 2: Second derivative test
fxx = 6x, fyy = 6y, fxy = -3
D = fxx·fyy - (fxy)² = 36xy - 9
At (0, 0): D = 36·0·0 - 9 = -9 < 0 → Saddle point
At (1, 1): D = 36·1·1 - 9 = 27 > 0, fxx = 6 > 0 → Local minimum
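The second derivative test is a small decision procedure. This sketch (the name `classify` is hypothetical) takes the second partials already evaluated at a critical point; note that D > 0 forces fxx ≠ 0, so the two D > 0 branches cover every case. It is applied here to f(x, y) = x³ + y³ - 3xy.

```python
def classify(fxx, fyy, fxy):
    """Second derivative test from the second partials at a critical point."""
    D = fxx * fyy - fxy**2
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x³ + y³ - 3xy: fxx = 6x, fyy = 6y, fxy = -3
for (x, y) in [(0, 0), (1, 1)]:
    print((x, y), classify(6 * x, 6 * y, -3))
# (0, 0) saddle point
# (1, 1) local minimum
```

The same function can classify the critical points in the practice problems by passing in their second partials.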
Partial Derivatives in Machine Learning
Partial derivatives are fundamental to training neural networks through backpropagation.
Gradient Descent for Neural Networks
For loss function L(θ) where θ = (θ₁, θ₂, ..., θₙ):
θᵢ(t+1) = θᵢ(t) - η·∂L/∂θᵢ
where η is the learning rate.
Backpropagation
Chain rule applied to compute gradients through network layers:
∂L/∂wᵢⱼ = (∂L/∂aⱼ)(∂aⱼ/∂zⱼ)(∂zⱼ/∂wᵢⱼ)
where aⱼ is the activation of neuron j and zⱼ is its weighted sum of inputs.
Stochastic Gradient Descent
Uses gradient computed on random mini-batches:
θ ← θ - η∇L_B(θ)
where L_B is the loss on mini-batch B.
Each update costs far less than a full-batch gradient, so training on large datasets is usually much faster.
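A minimal mini-batch SGD sketch, assuming a one-parameter model y ≈ w·x, a mean-squared-error mini-batch loss L_B, and synthetic data generated from y = 2x; the function name, learning rate, batch size, and seed are all illustrative assumptions.

```python
import random

def sgd_fit_slope(xs, ys, lr=0.05, batch_size=4, epochs=50, seed=0):
    """Fit y ≈ w·x by mini-batch SGD on the mean squared error (sketch)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random mini-batches every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # L_B(w) = mean over B of (w·x - y)², so dL_B/dw = mean of 2(w·x - y)·x
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

# Synthetic data (hypothetical): y = 2x exactly, x = 0.0, 0.1, ..., 1.9
xs = [i / 10 for i in range(20)]
ys = [2 * x for x in xs]
w_hat = sgd_fit_slope(xs, ys)
print(w_hat)  # close to 2
```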
Optimizers
Advanced gradient-based optimizers:
• Adam: Adaptive moment estimation
• RMSprop: Root mean square propagation
• Adagrad: Adaptive gradient algorithm
Network: Single neuron with sigmoid activation
z = w·x + b
a = σ(z) = 1/(1 + e^(-z))
L = ½(y - a)² (squared-error loss for a single example)
Gradients for backpropagation:
∂L/∂a = -(y - a)
∂a/∂z = σ(z)(1 - σ(z)) = a(1 - a)
∂z/∂w = x
∂z/∂b = 1
Chain rule:
∂L/∂w = (∂L/∂a)(∂a/∂z)(∂z/∂w) = -(y - a)·a(1 - a)·x
∂L/∂b = (∂L/∂a)(∂a/∂z)(∂z/∂b) = -(y - a)·a(1 - a)
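The single-neuron gradients can be coded directly and verified with a finite-difference "gradient check", a common debugging step for backpropagation. The input values, target, and learning rate η = 0.5 below are hypothetical.

```python
import math

def forward(w, b, x, y):
    z = w * x + b
    a = 1 / (1 + math.exp(-z))   # sigmoid activation
    loss = 0.5 * (y - a) ** 2    # L = ½(y - a)²
    return z, a, loss

def backward(w, b, x, y):
    """∂L/∂w and ∂L/∂b via the chain rule."""
    _, a, _ = forward(w, b, x, y)
    dL_da = -(y - a)
    da_dz = a * (1 - a)
    return dL_da * da_dz * x, dL_da * da_dz * 1  # ∂z/∂w = x, ∂z/∂b = 1

# Hypothetical values for one training example
w, b, x, y = 0.5, -0.2, 1.5, 1.0
dw, db = backward(w, b, x, y)

# Gradient check: central differences of the loss itself
h = 1e-6
dw_num = (forward(w + h, b, x, y)[2] - forward(w - h, b, x, y)[2]) / (2 * h)
db_num = (forward(w, b + h, x, y)[2] - forward(w, b - h, x, y)[2]) / (2 * h)
print(dw, dw_num)
print(db, db_num)

# One gradient-descent update with η = 0.5 should lower the loss
eta = 0.5
loss_before = forward(w, b, x, y)[2]
w, b = w - eta * dw, b - eta * db
loss_after = forward(w, b, x, y)[2]
print(loss_before, loss_after)
```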
Example: Partial Derivatives on a Surface
Function: f(x, y) = x² + y²
Point: (1.0, 1.0)
∂f/∂x = 2x = 2.0
∂f/∂y = 2y = 2.0
∇f = (2.0, 2.0)
Interpretation
The gradient vector (∂f/∂x, ∂f/∂y) points in the direction of steepest ascent.
At (1, 1) for f(x, y) = x² + y², the surface increases most rapidly in the direction (1, 1).
Practice Problems & Solutions
Problem 1: Find ∂f/∂x and ∂f/∂y for f(x, y) = x³e^y + y·ln(x).
Solution:
∂f/∂x = ∂/∂x(x³e^y) + ∂/∂x(y·ln(x)) = 3x²e^y + y/x
∂f/∂y = ∂/∂y(x³e^y) + ∂/∂y(y·ln(x)) = x³e^y + ln(x)
Problem 2: Find ∇f(1, 2, 0) for f(x, y, z) = x² + yz + sin(xz).
Solution:
∂f/∂x = 2x + z·cos(xz)
∂f/∂y = z
∂f/∂z = y + x·cos(xz)
At (1, 2, 0):
∂f/∂x = 2·1 + 0·cos(0) = 2
∂f/∂y = 0
∂f/∂z = 2 + 1·cos(0) = 2 + 1 = 3
∇f(1, 2, 0) = (2, 0, 3)
Problem 3: Find ∂z/∂t for z = x²y, where x = s² + t and y = s - t².
Solution:
∂z/∂t = (∂z/∂x)(∂x/∂t) + (∂z/∂y)(∂y/∂t)
∂z/∂x = 2xy, ∂z/∂y = x²
∂x/∂t = 1, ∂y/∂t = -2t
∂z/∂t = (2xy)(1) + (x²)(-2t) = 2xy - 2tx²
Substitute x = s² + t, y = s - t²:
∂z/∂t = 2(s² + t)(s - t²) - 2t(s² + t)²
Problem 4: Find and classify the critical points of f(x, y) = x³ + y³ - 3x - 3y.
Solution:
∂f/∂x = 3x² - 3 = 0 → x² = 1 → x = ±1
∂f/∂y = 3y² - 3 = 0 → y² = 1 → y = ±1
Critical points: (1, 1), (1, -1), (-1, 1), (-1, -1)
Second derivatives: fxx = 6x, fyy = 6y, fxy = 0
D = fxx·fyy - (fxy)² = 36xy
• (1, 1): D = 36 > 0, fxx = 6 > 0 → Local minimum
• (1, -1): D = -36 < 0 → Saddle point
• (-1, 1): D = -36 < 0 → Saddle point
• (-1, -1): D = 36 > 0, fxx = -6 < 0 → Local maximum
Find the maximum value of f(x, y) = xy subject to the constraint x² + y² = 1 using Lagrange multipliers.
Solution using Lagrange multipliers:
Constraint: g(x, y) = x² + y² = 1
Lagrangian: L(x, y, λ) = xy - λ(x² + y² - 1)
∂L/∂x = y - 2λx = 0 → y = 2λx
∂L/∂y = x - 2λy = 0 → x = 2λy
∂L/∂λ = -(x² + y² - 1) = 0 → x² + y² = 1
From the first two equations, 2λ = y/x = x/y (neither x nor y can be 0: x = 0 forces y = 2λx = 0, which violates the constraint) → y² = x² → y = ±x
Substitute into constraint: x² + x² = 1 → 2x² = 1 → x = ±1/√2
Critical points: (1/√2, 1/√2), (1/√2, -1/√2), (-1/√2, 1/√2), (-1/√2, -1/√2)
f values: f(±1/√2, ±1/√2) = 1/2, f(±1/√2, ∓1/√2) = -1/2
Maximum value: 1/2 at points (1/√2, 1/√2) and (-1/√2, -1/√2)
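The Lagrange solution can be double-checked in two independent ways: verify that ∇f = λ∇g holds with one consistent λ at each candidate point, and brute-force the constraint by walking the unit circle as (cos θ, sin θ). This is a hedged sketch; the grid of 100,000 angles is an arbitrary choice.

```python
import math

r = 1 / math.sqrt(2)
candidates = [(r, r), (r, -r), (-r, r), (-r, -r)]

results = []
for x, y in candidates:
    grad_f = (y, x)              # ∇(xy)
    grad_g = (2 * x, 2 * y)      # ∇(x² + y²)
    lam = grad_f[0] / grad_g[0]  # λ solved from the x-component of ∇f = λ∇g
    # The y-component must give the same λ, and the point must lie on the circle
    ok = math.isclose(grad_f[1], lam * grad_g[1]) and math.isclose(x**2 + y**2, 1.0)
    results.append(((x, y), x * y, lam, ok))
    print(f"({x:+.4f}, {y:+.4f})  f = {x * y:+.4f}  λ = {lam:+.2f}  conditions hold: {ok}")

# Brute-force cross-check along the constraint circle
n = 100_000
best = max(math.cos(2 * math.pi * k / n) * math.sin(2 * math.pi * k / n) for k in range(n))
print(round(best, 6))  # 0.5
```

The maximizing points have λ = 1/2 and the minimizing points λ = -1/2, consistent with the stated maximum value of 1/2.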