Cloud Operations Junior Engineer - AI - HPC - GPU Fabric - NVIDIA

Exclusive opportunity

Urgent

Remote

Cloud AI Company

Skills

DevOps, Site reliability engineering, Customer Support

Important information

Contract type: Freelance

Daily rate: 400€

This job is at 0% commission 🎉

Starting date: Urgent

Work mode: Remote

Published on: 10 June 2026

What they need

Note: Our team is international, but during your shift you will mainly collaborate with colleagues based in France.

[Summary]

We are seeking a Cloud Operations Junior Engineer to join our team. The role involves supporting cloud operations, assisting with root cause analysis, and working with various engineering teams to drive continuous process improvement. You will be paired with senior engineers, both to meet operational needs and to support your professional development. The role emphasizes learning by doing rather than observation-only participation: you will be trusted early with measurable deliverables and held accountable alongside your senior counterpart, so that you grow in both technical rigor and professional confidence.

[General Information]

The company is involved in cutting-edge development and greenfield projects, focusing on helping clients build processes and technology around their workflows. This position involves working directly with our immediate clients' engineering teams and requires strong communication skills. We are a security-oriented organization, and we value input from our engineers on technology selection and approaches.

[Tasks and Deliverables]

Tasks are performed in collaboration with a more senior engineer:

• Participate in client communications: listen, take notes, draft summaries, and facilitate dialogue under supervision.

• Analyze telemetry streams (logs/metrics/traces) for anomalies using Grafana/Prometheus/ELK toolchains—starting with pattern recognition and progressing toward hypothesis generation.

• Localize failures across compute/storage/network layers via structured diagnostic workflows (e.g., `journalctl`, `dmesg`, `strace`, network packet captures).

• Execute root cause analysis on incidents affecting production workloads—including GPU-accelerated inference services—and document findings in standardized incident reports.

• Collaborate with vendor support teams during escalation paths; prepare reproducible test cases and gather diagnostic artifacts per SLA requirements.

• Maintain human-readable documentation: runbooks, change logs, architecture decision records (ADRs), configuration baselines—all under version control using Git-based systems-of-record.

• Shadow domain architects during design reviews and operational readiness assessments—no passive attendance. Expect to contribute questions, proposed mitigations, or implementation trade-offs.

[Required Experience]

We expect a foundational understanding; expertise will be developed on the job.

• Working familiarity with Linux system administration: process lifecycle management (`ps`, `top`, `systemd`), filesystem layout (`/proc`, `/sys`), user/group models.

• Exposure to containerization concepts: Docker/Podman CLI usage, basic image building (Dockerfile syntax), Kubernetes pod manifests (YAML structure).

• Basic virtualization awareness: KVM/QEMU command-line invocation for VM provisioning and debugging (`virsh list`, `qemu-system-x86_64 -enable-kvm ...`).

• Introductory knowledge of storage subsystems: understanding RAID levels, LVM volume groups, NFS mount semantics.

• Networking fundamentals: IP addressing/subnetting, TCP/UDP behavior, firewall rules (`iptables/nftables`), DNS resolution flow.

• Logging & monitoring literacy: interpreting structured logs (JSON vs syslog), reading metrics from Prometheus exporters.

• English fluency sufficient to read technical specs and write concise internal documentation.

[Nice-to-Have Skills]

• Experience with OpenStack components (Nova/Cinder/Neutron) — especially CLI interaction via `openstack` client.

• Security-conscious mindset: awareness of least privilege principles, RBAC models, audit logging expectations.

• GPU-related experience: CUDA toolkit familiarity, NVIDIA driver version management on Linux hosts.

• Scripting literacy in Rust/Go—basic syntax understanding; no requirement for production-grade code yet.

• Bilingual ability (e.g., Mandarin/Cantonese): useful when interfacing with regional vendors or offshore engineering teams.

[Engagement Highlights]

• Pair-Mentored Execution: Every task has dual ownership—one senior engineer accountable for correctness and one junior engineer accountable for execution fidelity.

• Career Opportunities: This role is designed as the first rung on a ladder that can lead to senior engineering leadership—or specialized roles in GPU infrastructure optimization, cloud-native security auditing, cross-border operations coordination, etc.

Two positions with two shifts:

- Shift 1: 00:00 - 09:00 EST, Monday to Friday

- Shift 2: 00:00 - 09:00 EST, Monday, Tuesday, Friday, Saturday, Sunday

Expected duration: 6-12 months

[Final Note]

We do not hire for credentials alone—we hire for curiosity, discipline, and resilience. If you are ready to operate at scale alongside engineers who built foundational technologies—and want to leave your mark on what comes next—this is where you begin.
