Hao Xu

Data Scientist/Engineer | IT Professional | Supply Chain Analytics

Data Scientist/Engineer with 4+ years of experience in analytics, business intelligence, and IT solutions. Skilled in transforming complex data into actionable insights across industries including supply chain, logistics, and port operations. Expertise in statistical analysis, data pipelines, modeling, machine learning, and visualization.

1998 • 27 years old
Male
Master's Degree
4+ years experience
Gothenburg, Sweden
Chinese (Native) • English (Fluent)

Email

xuhao.cth@gmail.com

LinkedIn

https://www.linkedin.com/in/hao-xu-b277231b8/

Phone

(+46) 0724430299

(+86) 1812798743

WeChat

Magnetrician

Technical Skills

Data Analytics & Data Science

Python
SQL
A/B Testing
Machine Learning
Semantic Modeling
R

Data Engineering & Development

AWS Cloud
Git Version Control
Data Pipeline
Database Design
Full-Stack Web Development
Distributed Computing (Hadoop & Spark)
Shell Scripting (PowerShell, Bash)

Visualization & BI

Power BI (M & DAX) | Tableau | Looker
Google Looker Studio | BigQuery
Excel/Google Sheets

Professional Experience

Data Scientist

Stena Line

2022.08 - 2025.09

Focused primarily on end-to-end business data analysis, from data extraction and processing to visualization and reporting, to optimize operational processes and support decision-making. Selected projects:

  • Improved the accuracy of Estimated Ready for Pick-up Time by integrating real-time ETA updates and applying a zone-based unloading model, while balancing prediction precision with terminal yard efficiency.
  • Designed a charging scheme for trailers left at the port for long periods before being picked up, to increase revenue and reduce port congestion.
  • Developed Power BI dashboards to visualize, monitor and track operations at Ports and Terminals, which have been in continuous use for over two years.
  • Applied causal inference (specifically, a Two-Way Fixed Effects DiD model) to evaluate the revenue impact of a pricing system.

Research Assistant (Data Analyst)

Chalmers University of Technology

2021.04 - 2022.06

Customer Behavior & Delivery Demand Analysis Project — Västra Götaland, Sweden. Analyzed how COVID-19 and socio-demographic factors shaped online shopping behavior, integrating multi-source datasets to uncover demand drivers and support forecasting.

  • Built a unified dataset by merging delivery records, population statistics, and weekly COVID-19 reports through data cleaning, feature engineering, and spatial–temporal alignment.
  • Conducted descriptive analysis and multiple regression modeling to quantify key demand drivers, revealing a strong correlation between weekly new cases and order volume (r > 0.70), and significant influence from the 25–44 age group and higher-education segments.
  • Results showed that order volume grew by over 40% after the COVID-19 outbreak, led by the fashion and electronics categories.
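The correlation figure reported above can be illustrated with a minimal sketch. The weekly series below are made-up placeholder values, not project data, and `pearson_r` is a hypothetical helper written only for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder weekly values (illustrative only, not the project's data).
weekly_new_cases = [120, 340, 560, 610, 480, 300, 150]
weekly_orders = [1000, 1250, 1500, 1600, 1400, 1300, 1050]

r = pearson_r(weekly_new_cases, weekly_orders)
print(f"r = {r:.2f}")
```

A high r only indicates that the two series move together; the regression step described above is what separates the contribution of each driver.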

Supply Chain Planner (Intern)

Midea Property Group

2019.06 - 2019.11

Supported a supply chain management internship project focused on demand forecasting, inventory optimization, logistics coordination, and procurement analysis to enhance overall planning efficiency and strategic decision-making.

  • Conducted demand forecasting and inventory analysis to support supply chain planning, improving forecast accuracy and reducing stock imbalances across multiple business units.
  • Performed market analysis on procurement categories, identifying and evaluating potential suppliers to optimize cost, quality, and lead time, supporting strategic sourcing decisions.

Education

Master of Science

Supply Chain Management

Chalmers University of Technology

2020.09 - 2022.08

Gothenburg, Sweden

Erasmus Exchange

Operations Management and Logistics

Eindhoven University of Technology

2021.09 - 2022.02

Eindhoven, Netherlands

Bachelor of Engineering

Industrial Engineering

South China University of Technology

2016.09 - 2020.07

Guangzhou, China

Featured Projects

Evaluate the impact of a Pricing System on the Revenue using Two-Way Fixed Effects DID model

Stena Line, Sweden 2025.01 - 2025.02
  • Project Background & Objective
    In 2020, Stena Line launched an EMSR-based automatic pricing system to help Regional Managers set ticket prices. The system was not mandatory, which led to very different adoption rates across routes: some routes barely used it (usage ≈ 0%), while others used it almost all the time (usage ≈ 100%).
    Management wanted to assess whether this system actually increased passenger revenue. We selected 15 routes for analysis. However, COVID-19 also broke out in 2020 — at the exact same time the pricing system was introduced — causing dramatic industry-wide revenue fluctuations. This made it impossible to simply compare "pre-2020 vs. post-2020" or predict post-pandemic revenue based on historical trends.
  • Methodology / Solution
    We applied a Two-Way Fixed Effects Difference-in-Differences (DID) model:
    $$\ln(\text{Revenue}_{it} + 1) = \beta \cdot (\text{UsageRate}_{it} \times \text{Post}_t) + \gamma_i + \delta_t + \varepsilon_{it}$$
    Where:
    • γ_i (Route Fixed Effects): controls for time-invariant route-specific characteristics
    • δ_t (Time Fixed Effects): controls for shocks affecting all routes at the same time
    • UsageRate_it × Post_t: the key interaction term capturing post-COVID treatment intensity
  • Reasons:
    • The adoption of the pricing system is continuous (0%–100%) rather than binary (0/1).
    • Without route fixed effects, a traditional specification (β * usage_rate) would incorrectly attribute baseline revenue differences between routes to the pricing system. γ_i removes these inherent cross-route differences that are unrelated to usage.
    • COVID-19 and the subsequent recovery significantly impacted passenger revenue over time. Time fixed effects δ_t eliminate industry-wide fluctuations and prevent falsely attributing common shocks to the pricing system.
    • Revenue levels differ greatly across routes, and using raw values can let outliers dominate the regression. Taking the natural logarithm stabilizes variance and allows the coefficient to be interpreted as a percentage effect rather than an absolute change.
  • Implementation Steps
    1. Data Setup
    • Routes: 15
    • Time Range: 2015–2025 (monthly)
    • Variables: Revenue, usage rate, time dummy (post)
    2. Variable Construction
    • post = 1 if year ≥ 2020, else 0
    • ur_post = usage_rate × post
    • Apply natural log transformation to revenue: ln(Revenue + 1)
    3. Model Estimation
    • Use statsmodels OLS with cluster-robust standard errors
  • Results
    The interaction coefficient β is estimated at 0.08, meaning that a route with full adoption of the pricing system would see approximately an 8% increase in passenger revenue compared with a route with no adoption. However, the p-value is 0.12, so the effect is positive in direction but not statistically significant at conventional thresholds (0.05 or 0.10).
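The specification and steps above can be sketched in a dependency-free way. The project itself used statsmodels OLS. This sketch instead relies on the fact that, for a balanced panel with a single regressor, OLS with route and time dummies gives the same β as regressing two-way demeaned ln(Revenue + 1) on two-way demeaned ur_post. All names (`simulate_panel`, `twfe_did`) and all data are hypothetical: the panel is synthetic, the usage rate is assumed constant within a route, and the clustered standard error omits the small-sample correction that statistical packages usually apply.

```python
import math
import random
from collections import defaultdict

def simulate_panel(true_beta, noise_sd, seed=0, n_routes=15):
    """Synthetic balanced monthly panel: rows of (route, period, log_revenue, ur_post)."""
    rng = random.Random(seed)
    periods = [(year, month) for year in range(2015, 2025) for month in range(1, 13)]
    rows = []
    for route in range(n_routes):
        usage_rate = route / (n_routes - 1)      # adoption spread from 0% to 100%
        route_effect = rng.uniform(10.0, 14.0)   # baseline log-revenue level of the route
        for year, month in periods:
            post = 1 if year >= 2020 else 0
            ur_post = usage_rate * post
            common_shock = -1.0 if year in (2020, 2021) else 0.0  # dip hitting all routes
            log_rev = (route_effect + common_shock + true_beta * ur_post
                       + rng.gauss(0.0, noise_sd))
            rows.append((route, (year, month), log_rev, ur_post))
    return rows

def twfe_did(rows):
    """Return (beta, route-clustered SE) for log_revenue ~ ur_post + route FE + time FE."""
    routes = sorted({r[0] for r in rows})
    periods = sorted({r[1] for r in rows})
    assert len(rows) == len(routes) * len(periods), "balanced panel required"

    def two_way_demean(values):
        # Subtract route mean and period mean, add back the grand mean.
        grand = sum(values.values()) / len(values)
        r_mean = {i: sum(values[(i, t)] for t in periods) / len(periods) for i in routes}
        p_mean = {t: sum(values[(i, t)] for i in routes) / len(routes) for t in periods}
        return {k: v - r_mean[k[0]] - p_mean[k[1]] + grand for k, v in values.items()}

    y = two_way_demean({(i, t): lr for i, t, lr, _ in rows})
    x = two_way_demean({(i, t): up for i, t, _, up in rows})
    sxx = sum(v * v for v in x.values())
    beta = sum(x[k] * y[k] for k in x) / sxx

    # Cluster-robust (sandwich) variance: sum x*residual within each route, then square.
    score = defaultdict(float)
    for k in x:
        score[k[0]] += x[k] * (y[k] - beta * x[k])
    variance = sum(s * s for s in score.values()) / sxx ** 2
    return beta, math.sqrt(variance)

rows = simulate_panel(true_beta=0.08, noise_sd=0.03, seed=42)
beta_hat, se = twfe_did(rows)
print(f"beta = {beta_hat:.3f}, clustered SE = {se:.3f}, "
      f"implied effect at full adoption = {math.exp(beta_hat) - 1:.1%}")
```

Because the outcome is in logs, the exact percentage effect at full adoption is exp(β) − 1, which for β = 0.08 is about 8.3%, close to the 8% approximation used in the results.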
Python | Econometrics | Statistical Analysis | Difference-in-Differences | Fixed Effects Models | Regression Analysis | Data Analysis

Estimated Ready for Pick-up Time (Estimated RPT) Optimization

Stena Line, Sweden 2024.09 - 2024.12

This project increased the accuracy of the Estimated RPT from 10% to 90%.

  • 1. Problem Breakdown
    Estimated RPT = ETA + Unloading Duration. Improving accuracy requires optimizing both parts.
    • ETA: Highly uncertain, influenced by route, weather, and vessel speed, often with deviations of several hours.
    • Unloading Duration: Large vessels take 6–10 hours to unload, creating significant gaps between the first and last Real RPT.
  • 2. ETA Optimization
    • Use Stena Line's offshore real-time ETA system via API integration.
    • Refresh ETA at key stages: departure, hourly during voyage, and upon arrival.
  • 3. Unloading Duration Optimization
    3.1 Deck and Subsection
    • Current deck-level grouping is too coarse; unloading times vary within a deck.
    • Further divide decks into ~6 subsections (front/middle/rear × left/right).
    • Smaller variance within subsections improves Estimated RPT accuracy.
    • Zone-level averages provide a baseline, but unloading times also need to be adjusted for the vessel's time of arrival, which significantly affects unloading duration.
    3.2 Subsection Trade-off
    • Too large → high variance, low accuracy.
    • Too small → sparse data, risk of overfitting, higher operational complexity.
    • Optimal size determined through analysis and simulation.
  • 4. Accuracy vs. Effectiveness
    Estimated RPT is an interval, Real RPT a time point. Accuracy means the Real RPT falls within the interval. Expanding the interval improves accuracy but reduces prediction effectiveness. Different ports adopt tailored accuracy targets and interval widths.
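The accuracy versus effectiveness trade-off above can be made concrete with a minimal sketch. It assumes that each deck subsection has a history of observed offsets (minutes between vessel arrival and Real RPT) and builds the Estimated RPT interval from percentiles of that history. The function names, the percentile approach, and the placeholder history are assumptions for illustration, not the production logic.

```python
from datetime import datetime, timedelta
from statistics import quantiles

def unloading_interval(offsets_min, coverage=0.9):
    """Central interval (minutes after arrival) covering `coverage` of past offsets."""
    cuts = quantiles(offsets_min, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    lo_pct = round((1 - coverage) / 2 * 100)
    hi_pct = round((1 + coverage) / 2 * 100)
    if not 1 <= lo_pct < hi_pct <= 99:
        raise ValueError("coverage must map to percentiles between 1 and 99")
    return cuts[lo_pct - 1], cuts[hi_pct - 1]

def estimated_rpt(eta, offsets_min, coverage=0.9):
    """Estimated RPT interval = ETA + unloading-offset interval for the subsection."""
    lo, hi = unloading_interval(offsets_min, coverage)
    return eta + timedelta(minutes=lo), eta + timedelta(minutes=hi)

def hit_rate(offsets_min, interval):
    """Share of observed offsets that fall inside the interval (the accuracy measure)."""
    lo, hi = interval
    return sum(lo <= m <= hi for m in offsets_min) / len(offsets_min)

# Placeholder history for one subsection: offsets of 30..130 minutes (not real data).
history = [30 + k for k in range(101)]
eta = datetime(2024, 10, 1, 6, 0)

for coverage in (0.5, 0.9):
    start, end = estimated_rpt(eta, history, coverage)
    rate = hit_rate(history, unloading_interval(history, coverage))
    print(f"target {coverage:.0%}: {start:%H:%M}-{end:%H:%M}, in-sample hit rate {rate:.0%}")
```

Raising the coverage target widens the interval, which is the reason a port may prefer a lower accuracy target with a narrower, more actionable pick-up window.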
Python | Data Analysis | API Integration | Predictive Modeling | Statistical Analysis | Simulation | Operations Research

Full-Stack Serverless Photography Portfolio Website

Personal Project — www.haoexplore.com 2025.07 - 2025.08

Photography is one of my passions, and I had always wanted to build a personal website for it. It is now complete. The website is more than a static gallery: it is an interactive, cloud-powered platform that showcases my photos and demonstrates my end-to-end full-stack development skills.

  • 🎨 Frontend
    I designed and implemented a fully responsive, modern web application using HTML5 | CSS3 | JavaScript ES6+, featuring:
    1. Interactive Photo Gallery with smooth animations, lazy loading, and intuitive navigation
    2. 360° Spherical Panoramic View using Pannellum.js for immersive experiences
    3. Photo Rating System with 5-star ratings and cloud synchronization
    4. Leaflet.js Map Integration displaying geographic footprints with year-based filtering
    5. Smart Image Processing with automatic WebP conversion and thumbnail generation
    6. User Engagement Tools including email subscription and social media integration
  • ⚙️ Backend (Serverless on AWS)
    Built a highly scalable, cost-efficient serverless architecture on AWS:
    1. Amazon API Gateway: RESTful API endpoints with CORS configuration and request validation
    2. AWS Lambda (Python): business logic, including gallery and photo management with CRUD operations; advanced image processing using the Pillow library; direct S3 upload using presigned URLs (bypassing the 10 MB API Gateway payload limit); and a photo rating system with device-based user identification
    3. Amazon S3: optimized photo storage with WebP format and intelligent tiering
    4. Amazon DynamoDB: three-table NoSQL architecture for galleries, photos, and ratings
    5. AWS Lambda Layers: pre-built layers for the Pillow and requests libraries
    6. Performance optimization: WebP format conversion (95% quality originals, 40% thumbnails); parallel photo processing and uploads; automatic thumbnail generation with smart dimensions; metadata synchronization between S3 and DynamoDB
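The presigned-URL upload path mentioned above can be sketched as a small Lambda handler. This is an illustrative outline, not the site's actual code: the handler factory, the `uploads/` key prefix, the allowed content types, and the bucket name are all hypothetical. The S3 client is injected so the logic can be exercised without AWS; in a deployment it would be a boto3 S3 client, whose `generate_presigned_url` method is the call used here.

```python
import json
import uuid

# Hypothetical whitelist mapping content types to file extensions.
ALLOWED_TYPES = {"image/jpeg": "jpg", "image/png": "png", "image/webp": "webp"}

def _response(status, payload):
    """API Gateway proxy-style response with a permissive CORS header."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json", "Access-Control-Allow-Origin": "*"},
        "body": json.dumps(payload),
    }

def make_upload_handler(s3_client, bucket, expires_in=300):
    """Build a Lambda handler that returns a presigned S3 PUT URL for one photo."""
    def handler(event, context):
        body = json.loads(event.get("body") or "{}")
        content_type = body.get("contentType")
        if content_type not in ALLOWED_TYPES:
            return _response(400, {"error": "unsupported contentType"})
        key = f"uploads/{uuid.uuid4().hex}.{ALLOWED_TYPES[content_type]}"
        url = s3_client.generate_presigned_url(
            "put_object",
            Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
            ExpiresIn=expires_in,
        )
        return _response(200, {"uploadUrl": url, "key": key})
    return handler

# In a deployed Lambda the wiring would look like this (requires boto3 and AWS credentials):
#   import boto3
#   lambda_handler = make_upload_handler(boto3.client("s3"), "my-photo-bucket")
```

The browser then sends the file with an HTTP PUT directly to `uploadUrl`, so the image bytes never pass through API Gateway and its payload limit does not apply.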
Full-Stack Web Development | End-to-End Development | AWS Cloud (API Gateway, Lambda, S3, DynamoDB, SES) | UI/UX Design | Serverless Architecture | Python

Power BI-Based Port Operations Monitoring System

Stena Line, Sweden 2023.03 - 2023.10

At the request of management, I designed and developed a Power BI Operations Monitoring System to provide multi-dimensional analysis of port efficiency (daily, weekly, monthly, as well as vessel-level and port-level granularities). The dashboard later served as the foundation for multiple spin-off projects that further improved port efficiency.

  • Data Integration & Extraction:
    • Wrote complex SQL queries to extract vessel arrival/departure, loading/unloading, gate operations, and trailer movement data.
    • Built automated ETL pipelines using Python scripts and Power Query for bulk data ingestion, cleaning, and transformation. Tasks included filling missing timestamps, validating voyage IN/OUT sequences, deduplication, and outlier handling.
    • Applied Python for advanced ETL logic, such as irregular timestamp conversion and batch correction of abnormal voyages.
  • Data Modeling:
    • Designed a star schema in Power BI, separating fact tables (shipping operations, port activities) from dimension tables (vessel, port, date, weekday).
    • Created key DAX measures, including unloading/loading efficiency, average port turnaround time (PT), and gate exit rate.
  • Visualization & Analytics:
    • Sailing Level Report: monitored unloading/loading ratios and time consumption for individual voyages.
    • Port Level Report: aggregated efficiency at the port level, with time-series comparisons and trend analysis.
    • Trend Charts: visualized efficiency changes over time, supporting anomaly detection and KPI monitoring.
    • Delivered interactive slicers (by vessel, port, date) to support flexible, ad-hoc analysis by operations teams.
  • Deployment & Optimization:
    • Deployed to Power BI Service with scheduled auto-refresh (twice daily).
    • Reduced data refresh time by ~40% through SQL pre-aggregation and Power Query optimization.
    • Built a usage monitoring report to track user adoption and continuously improve functionality.
  • Impact:
    • System has been running reliably for over 2 years, becoming the department's core operational tool.
    • Significantly improved operational visibility and decision-making efficiency compared to fragmented legacy reports.
    • Enabled effective monitoring of loading/unloading efficiency and gate processes, uncovering multiple issues that led to follow-up optimization projects.
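The cleaning tasks listed under Data Integration & Extraction (deduplication and IN/OUT sequence validation) can be sketched as follows. The event layout `(unit_id, direction, timestamp)`, the function name, and the sample records are assumptions for illustration; the real pipeline combined SQL, Python, and Power Query and is not reproduced here.

```python
from datetime import datetime

def clean_gate_events(events):
    """Deduplicate events and pair each IN with the next OUT for the same unit.

    Returns (visits, rejected): visits are (unit_id, time_in, time_out) tuples,
    rejected are events that break the IN -> OUT sequence.
    """
    # set() drops exact duplicates; sorting by unit, time, direction puts IN before OUT on ties.
    unique = sorted(set(events), key=lambda e: (e[0], e[2], e[1]))
    visits, rejected = [], []
    open_in = {}  # unit_id -> timestamp of an IN that has not been closed yet
    for unit, direction, ts in unique:
        if direction == "IN":
            if unit in open_in:
                rejected.append((unit, "IN", open_in[unit]))  # earlier IN never closed
            open_in[unit] = ts
        elif direction == "OUT":
            if unit in open_in:
                visits.append((unit, open_in.pop(unit), ts))
            else:
                rejected.append((unit, "OUT", ts))            # OUT without a preceding IN
    rejected.extend((unit, "IN", ts) for unit, ts in open_in.items())
    return visits, rejected

# Placeholder records (not real operational data).
raw = [
    ("T1", "IN", datetime(2023, 5, 1, 8, 0)),
    ("T1", "IN", datetime(2023, 5, 1, 8, 0)),    # exact duplicate
    ("T1", "OUT", datetime(2023, 5, 1, 10, 30)),
    ("T2", "OUT", datetime(2023, 5, 1, 9, 0)),   # orphan OUT
    ("T3", "IN", datetime(2023, 5, 1, 7, 0)),    # still in the terminal
]
visits, rejected = clean_gate_events(raw)
for unit, t_in, t_out in visits:
    print(unit, "dwell hours:", (t_out - t_in).total_seconds() / 3600)
```

Keeping the rejected events rather than silently dropping them makes it possible to review abnormal voyages in batches, in line with the batch correction step described above.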
Python | Power BI | SQL | Data & Semantic Modeling | Data Pipeline | Data Visualization | Data Analysis

Master's Thesis – Economic and Environmental Impacts of Dry Ports and Triangulation Transport on the Empty Container Repositioning Problem

Chalmers University, Sweden 2022.01 - 2022.08

This thesis focused on the Empty Container Repositioning (ECR) problem within Sweden's inland container transport network, assessing the economic and environmental impacts of introducing dry ports and adopting triangulation transport strategies. Using the case of Gothenburg Port and Eskilstuna Dry Port, the study developed an agent-based discrete-event simulation model in AnyLogic to compare multiple scenarios with and without dry ports and different repositioning strategies.

Key Contributions & Technical Details

  • Research Design & Data Collection
    • Conducted a systematic literature review covering container logistics, ECR strategies, and intermodal transport.
    • Collected case data from importers/exporters and calibrated it against historical transport records.
    • Defined key variables including container flows, transport costs, and CO₂ emission factors (train vs. truck).
  • Simulation Modeling
    • Built four scenarios in AnyLogic:
       ◦ With Dry Port: Introduces a dry port as an inland consolidation node to relieve seaport congestion and optimize ECR.
       ◦ Without Dry Port: Baseline case where all containers move directly between the seaport and customers without inland terminals.
       ◦ With Dry Port + Triangulation: Uses triangulation strategies under the dry port model, assigning import containers directly to export shipments to reduce empty repositioning.
       ◦ With Dry Port + Street-turn: Applies a street-turn strategy under the dry port model, where import containers are reused immediately by exporters, minimizing storage and repositioning needs.
    • Combined Agent-Based Modeling (ABM) for network actors (shipping lines, ports, customers) with Discrete-Event Simulation (DES) for facility-level operations.
    • Incorporated stochastic parameters (e.g., demand fluctuations, transport time variability) to improve realism.
  • Analysis & Results
    • Introducing a dry port reduced inland transport costs by ~62–66% and CO₂ emissions by ~71–79%.
    • Adding triangulation strategies provided further reductions (~25–27% in costs, ~7–10% in emissions) and significantly decreased the share of empty container movements.
    • The street-turn strategy also produced benefits but was less effective than triangulation.
  • Impact
    • Demonstrated the strategic value of dry ports in lowering inland transport costs and emissions.
    • Provided quantitative insights for sustainable intermodal transport planning in Sweden.
    • Advanced simulation methodology by combining ABM and DES with stochastic variables, improving both realism and applicability in logistics research.
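The intuition behind the triangulation result can be shown with a toy Monte Carlo sketch. This is not the thesis model, which was built in AnyLogic: the daily demand ranges, the single customer cluster, and the function names are placeholder assumptions. Without triangulation, every emptied import container returns to the depot and every export needs an empty sent from the depot; with triangulation, a matched import/export pair needs only one empty trip.

```python
import random

def empty_moves(imports, exports, triangulation):
    """Empty-container trips for one day at one inland customer cluster."""
    if triangulation:
        matched = min(imports, exports)  # emptied import boxes reused directly for exports
        return (imports - matched) + (exports - matched) + matched
    return imports + exports             # every box makes its own empty trip via the depot

def simulate(days=365, seed=7):
    """Total empty trips over the horizon, keyed by whether triangulation is used."""
    rng = random.Random(seed)
    totals = {False: 0, True: 0}
    for _ in range(days):
        imports = rng.randint(0, 20)     # placeholder daily demand, not thesis data
        exports = rng.randint(0, 20)
        for use_triangulation in totals:
            totals[use_triangulation] += empty_moves(imports, exports, use_triangulation)
    return totals

totals = simulate()
saving = 1 - totals[True] / totals[False]
print(f"empty trips without: {totals[False]}, with: {totals[True]}, saving: {saving:.0%}")
```

The saving depends on how balanced import and export flows are, which is one reason stochastic demand matters in the full simulation.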
Python | Java | Logistics Network Modeling | Agent-based Discrete-event Simulation (AnyLogic) | Academic Research & Writing

Interests & Hobbies

Traveling

The essence of my love for traveling lies in a deep curiosity and passion for the world. Experiencing diverse natural landscapes, learning about different lifestyles and local cultures makes me feel that the world is truly wonderful, authentic, and full of life.

Photography

I believe everything eventually fades away — nothing can truly be kept except memories. Memories are the only things that are real and uniquely ours. That's why I love using my lens to capture the beauty of the world, making those memories clearer and longer-lasting.

www.haoexplore.com

Hiking & Nature

I am someone who deeply loves nature. In modern cities, where we live surrounded by concrete and steel, hiking and camping offer the best and most accessible way to reconnect with the natural world.

Reading & Learning

Reading and learning share the same essence as traveling: both are explorations of the world driven by curiosity. To me, the true sign of aging is not years, but losing the ability to learn new knowledge, embrace new ideas, and refresh one's perspectives.