init

2026-01-11 15:01:01 -03:00 · 2025-12-03 20:37:36 -03:00
parent 970ffa8856
commit 85a5cc75d1
20 changed files with 1265 additions and 0 deletions
--- a/Dockerfile
+++ b/Dockerfile
@@ -0,0 +1,29 @@
+FROM python:3.13-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
+# Install dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY src/ ./src/
+COPY gunicorn.conf.py ./
+
+# Create non-root user
+RUN useradd -m -u 1000 appuser
+
+RUN mkdir -p /app/tmp && \
+    chown -R appuser:appuser /app
+
+USER appuser
+EXPOSE 5000
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/')" || exit 1
+
+CMD ["gunicorn", "-c", "gunicorn.conf.py", "src.app:app"]
--- a/README.md
+++ b/README.md
@@ -0,0 +1,123 @@
+# ML Converter
+
+Python web application for processing Excel files: it automatically converts numeric values stored as text to the proper numeric format.  
+Especially useful for Mercado Pago statements.
+
+## Features
+
+- **Smart Number Detection and Conversion**: Identifies columns whose numeric values are stored as text (e.g. "$1,234.56", "1.234,56", "(123.45)") and converts them to real numbers, handling international formats, currency symbols, and negatives in parentheses.
+- **Preserves Text Data**: Text columns (names, categories) remain unchanged; columns whose name contains "fecha" or "liberación" are parsed as dates.
+- **Simple Web Interface**: Responsive UI in Spanish with drag & drop support, clear success/error messages, and a summary of total, income, and expenses after processing.
+- **Safe File Handling**: Temporary storage in the project's `tmp/` directory (`/app/tmp` in the Docker image); a job that runs every 10 minutes deletes files older than 30 minutes.
+- **Real File Validation**: Verifies that the uploaded file really is an Excel file, not just by extension.
+- **Supported Formats**: Accepts `.xlsx` and `.xls` (max. 16 MB).
+- **Automated Tests**: Includes tests for endpoints and validations.
+- **Security Headers**: Additional HTTP headers (HSTS, CSP, etc.).
+
+## Tech Stack
+
+- **Backend**: Flask (Python), pandas, openpyxl, xlrd, XlsxWriter, Gunicorn
+- **Frontend**: HTML5, Tailwind CSS, Jinja2
+- **File Cleanup**: APScheduler
+- **Containerization**: Docker Compose
+
+## Environment Setup
+
+1. **Copy and edit the environment file:**
+
+   ```bash
+   cp env.example .env
+   nvim .env
+   ```
+
+2. **Generate a secure SECRET_KEY:**
+
+   ```bash
+   python3 -c "import secrets; print('SECRET_KEY=' + secrets.token_urlsafe(32))"
+   ```
+
+   Update `SECRET_KEY` and `DOMAIN` in `.env`.
+
+3. **Important variables:**
+   - `FLASK_ENV`: `production` for production (Flask 2.3+ no longer reads this variable, and the app itself does not depend on it)
+   - `MAX_CONTENT_LENGTH`: Maximum upload size in bytes (default: 16777216, i.e. 16 MB)
+
+## Quick Start
+
+### Option 1: Python Virtual Environment
+
+```bash
+./run.sh
+```
+
+Or manually:
+
+```bash
+cd ml-converter
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+python src/app.py
+```
+
+Visit [http://localhost:5000](http://localhost:5000)
+
+### Option 2: Docker
+
+```bash
+docker compose up --build
+```
+
+The container joins the external `proxy` network and does not publish port 5000 on the host, so the app is reached through Traefik at `https://<DOMAIN>`.
+
+## Usage Flow
+
+1. **Upload** your Excel file (.xlsx/.xls) by dragging it in or selecting it.
+2. **Process**: Click "Procesar Archivo". Numeric columns stored as text are converted automatically.
+3. **Download** the processed file.
+4. **Cleanup**: Temporary files are deleted automatically after 30 minutes.
+
+## API Endpoints
+
+- `GET /` – Upload page
+- `POST /upload` – File upload and processing
+- `GET /download/<filename>` – Download of the processed file
+
+## Production Deployment
+
+### Gunicorn
+
+```bash
+gunicorn -w 4 -b 0.0.0.0:5000 src.app:app
+```
+
+### Docker
+
+```bash
+docker compose up -d
+```
+
+## File Structure
+
+```
+ml-converter/
+├── src/
+│   ├── app.py              # Main Flask application
+│   ├── converters.py       # Helpers
+│   └── templates/
+│       ├── index.html      # Upload page
+│       └── download.html   # Download page
+├── tests/                  # Automated tests
+├── requirements.txt        # Dependencies
+├── Dockerfile
+├── compose.yml
+└── README.md
+```
+
+## Security
+
+- **Safe file names** (`secure_filename`)
+- **File type validation** (.xlsx/.xls extension and file signature)
+- **Size limits** (16 MB)
+- **Automatic cleanup** (30 minutes)
+- **Unique file names** (UUIDs to prevent conflicts)
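Note on the README's security list: it mentions checking the file signature in addition to the extension. A minimal, standard-library-only sketch of that idea follows. The helper name `looks_like_excel` and the `demo` function are hypothetical and only illustrate the magic-byte comparison; the application additionally opens the file with pandas before accepting it.

```python
import tempfile
from pathlib import Path

# Leading bytes of the two container formats Excel uses.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")  # .xlsx (ZIP)
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # .xls (OLE2)


def looks_like_excel(path):
    """Hypothetical helper: True if the file starts with a ZIP or OLE2 signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(ZIP_SIGNATURES) or header.startswith(OLE2_SIGNATURE)


def demo():
    """Write one renamed text file and one ZIP-like file, then check both."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "renamed.xlsx"
        fake.write_bytes(b"just text, not a spreadsheet")
        zipped = Path(tmp) / "real.xlsx"
        zipped.write_bytes(b"PK\x03\x04" + b"\x00" * 16)
        return looks_like_excel(fake), looks_like_excel(zipped)
```

A signature match only shows that the container format is plausible (any ZIP archive passes), which is why a parse attempt is still needed afterwards.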
--- a/compose.yml
+++ b/compose.yml
@@ -0,0 +1,22 @@
+services:
+  ml-converter:
+    build: .
+    environment:
+      - FLASK_ENV=${FLASK_ENV:-production}
+      - SECRET_KEY=${SECRET_KEY:?set SECRET_KEY in .env}
+    restart: unless-stopped
+    networks:
+      - proxy
+    labels:
+      - "traefik.enable=true"
+      # HTTP Router
+      - "traefik.http.routers.ml-converter.rule=Host(`${DOMAIN:-localhost}`)"
+      - "traefik.http.routers.ml-converter.entrypoints=websecure"
+      - "traefik.http.routers.ml-converter.tls.certresolver=letsencrypt"
+      - "traefik.http.services.ml-converter.loadbalancer.server.port=5000"
+      # Optional: File size limit for uploads (16MB)
+      - "traefik.http.middlewares.ml-converter-limit.buffering.maxRequestBodyBytes=16777216"
+      - "traefik.http.routers.ml-converter.middlewares=ml-converter-limit"
+networks:
+  proxy:
+    external: true
--- a/env.example
+++ b/env.example
@@ -0,0 +1,12 @@
+# Copy this file to .env and update the values
+
+# Domain 
+DOMAIN=your-domain.com
+
+# Flask Configuration
+FLASK_ENV=production
+SECRET_KEY=change-this-very-long-random-secret-key-in-production
+
+# File Upload Limits
+# 16 MB = 16 * 1024 * 1024 = 16777216 bytes
+MAX_CONTENT_LENGTH=16777216
--- a/gunicorn.conf.py
+++ b/gunicorn.conf.py
@@ -0,0 +1,36 @@
+# Gunicorn configuration file for ML Converter
+# Usage: gunicorn -c gunicorn.conf.py src.app:app
+
+# Server socket
+bind = "0.0.0.0:5000"
+backlog = 2048
+
+# Worker processes
+workers = 4
+worker_class = "sync"
+worker_connections = 1000  # only used by async worker classes; ignored by "sync"
+timeout = 30
+keepalive = 2
+
+# Restart workers after this many requests, plus a random jitter of 0 to 50 requests
+max_requests = 1000
+max_requests_jitter = 50
+
+# Logging
+accesslog = "-"
+errorlog = "-"
+loglevel = "info"
+
+# Process naming
+proc_name = "ml-converter"
+
+# Server mechanics
+daemon = False
+pidfile = "/tmp/ml-converter.pid"
+user = None
+group = None
+tmp_upload_dir = None
+
+# SSL (uncomment and configure for HTTPS)
+# keyfile = "/path/to/keyfile"
+# certfile = "/path/to/certfile"
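Note on `max_requests_jitter`: the value is a number of requests, not a percentage. Each worker adds a random integer between 0 and that value to `max_requests`, so the workers do not all restart at the same moment. A rough sketch of the resulting per-worker limit follows; `worker_restart_limit` is a hypothetical helper that mirrors the documented behaviour and is not Gunicorn's own code.

```python
import random


def worker_restart_limit(max_requests, max_requests_jitter, rng=random):
    """Hypothetical helper: requests a worker serves before it is restarted."""
    if max_requests <= 0:
        return None  # 0 disables automatic worker restarts
    return max_requests + rng.randint(0, max_requests_jitter)


# With the values from this config, each of the 4 workers gets its own limit
# somewhere between 1000 and 1050 requests.
limits = [worker_restart_limit(1000, 50) for _ in range(4)]
```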
--- a/main.py
+++ b/main.py
@@ -0,0 +1,6 @@
+def main():
+    print("Hello from ml-converter!")
+
+
+if __name__ == "__main__":
+    main()
--- a/pytest.ini
+++ b/pytest.ini
@@ -0,0 +1,3 @@
+[pytest]
+testpaths = tests
+python_files = test_*.py
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,9 @@
+Flask==3.1.2
+pandas==2.3.3
+openpyxl==3.1.5
+xlrd==2.0.2
+APScheduler==3.11.1
+Werkzeug==3.1.4
+gunicorn==23.0.0
+XlsxWriter==3.2.9
+pytest==9.0.1
--- a/run.sh
+++ b/run.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+echo "Starting ML Converter Development Server"
+
+if [ ! -d "venv" ]; then
+    echo "Creating virtual environment..."
+    python -m venv venv
+fi
+
+echo "Activating virtual environment..."
+source venv/bin/activate
+
+echo "Installing dependencies..."
+pip install -r requirements.txt
+
+mkdir -p tmp
+
+echo "Starting Flask development server..."
+echo "Access the application at: http://localhost:5000"
+python src/app.py
--- a/src/app.py
+++ b/src/app.py
@@ -0,0 +1,408 @@
+import atexit
+import logging
+import os
+import uuid
+from datetime import datetime, timedelta
+from pathlib import Path
+from threading import Lock
+
+import pandas as pd
+from apscheduler.schedulers.background import BackgroundScheduler
+from flask import Flask, flash, redirect, render_template, request, send_file, url_for
+from werkzeug.utils import secure_filename
+
+try:
+    from src.converters import (
+        convert_text_columns_to_numbers,
+        find_columns_with_keywords,
+        normalize_column_name,
+    )
+except ModuleNotFoundError as exc:
+    if exc.name == "src":
+        from converters import (
+            convert_text_columns_to_numbers,
+            find_columns_with_keywords,
+            normalize_column_name,
+        )
+    else:
+        raise
+
+app = Flask(__name__)
+app.secret_key = os.environ.get("SECRET_KEY", "dev-key-change-in-production")
+
+# Logging setup
+logging.basicConfig(
+    level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s"
+)
+logger = logging.getLogger(__name__)
+
+UPLOAD_FOLDER = os.path.join(os.path.dirname(os.path.dirname(__file__)), "tmp")
+ALLOWED_EXTENSIONS = {"xlsx", "xls"}
+MAX_CONTENT_LENGTH_DEFAULT = 16 * 1024 * 1024  # 16MB max file size
+MAX_CONTENT_LENGTH = int(
+    os.environ.get("MAX_CONTENT_LENGTH", str(MAX_CONTENT_LENGTH_DEFAULT))
+)
+
+app.config["UPLOAD_FOLDER"] = UPLOAD_FOLDER
+app.config["MAX_CONTENT_LENGTH"] = MAX_CONTENT_LENGTH
+
+# Ensure upload directory exists for all entrypoints (app import, gunicorn workers, tests)
+os.makedirs(UPLOAD_FOLDER, exist_ok=True)
+
+HSTS_POLICY = "max-age=31536000; includeSubDomains"
+CSP_POLICY = (
+    "default-src 'self'; "
+    "style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; "
+    "script-src 'self' 'unsafe-inline'; "
+    "img-src 'self' data:; "
+    "font-src 'self' data:; "
+    "connect-src 'self'; "
+    "form-action 'self'; "
+    "frame-ancestors 'none'; "
+    "base-uri 'self'"
+)
+PERMISSIONS_POLICY = "geolocation=(), microphone=(), camera=()"
+
+
+@app.after_request
+def apply_security_headers(response):
+    """Apply modern security headers to every response."""
+    response.headers["Strict-Transport-Security"] = HSTS_POLICY
+    response.headers["Content-Security-Policy"] = CSP_POLICY
+    response.headers["Permissions-Policy"] = PERMISSIONS_POLICY
+    response.headers["X-Content-Type-Options"] = "nosniff"
+    response.headers["X-Frame-Options"] = "DENY"
+    response.headers["X-XSS-Protection"] = "1; mode=block"
+    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
+    return response
+
+
+def allowed_file(filename):
+    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
+
+
+def is_valid_excel_file(file_path):
+    """
+    Validate that the file is actually an Excel file by checking file signature (magic bytes)
+    and attempting to read it with pandas.
+    """
+    try:
+        file_size = os.path.getsize(file_path)
+        if file_size == 0:
+            logger.warning(f"Empty file rejected: {file_path}")
+            return False
+
+        if file_size > MAX_CONTENT_LENGTH:
+            logger.warning(f"File too large rejected: {file_path} ({file_size} bytes)")
+            return False
+
+        # Check file signature (magic bytes)
+        with open(file_path, "rb") as f:
+            header = f.read(8)
+
+        # Excel file signatures
+        # .xlsx files start with PK (ZIP format)
+        # .xls files start with specific OLE signatures
+        xlsx_signature = (
+            header.startswith(b"PK\x03\x04")
+            or header.startswith(b"PK\x05\x06")
+            or header.startswith(b"PK\x07\x08")
+        )
+        xls_signature = header.startswith(
+            b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
+        )  # OLE2 signature
+
+        if not (xlsx_signature or xls_signature):
+            logger.warning(f"Invalid file signature for {file_path}: {header.hex()}")
+            return False
+
+        # Try to read with pandas as additional validation
+        # Use nrows=1 to minimize resource usage and prevent potential DoS
+        df = pd.read_excel(file_path, nrows=1)
+
+        if df is None:
+            return False
+
+        return True
+
+    except Exception as e:
+        logger.warning(f"File validation failed for {file_path}: {str(e)}")
+        return False
+
+
+def cleanup_old_files():
+    """Remove files older than 30 minutes from the temp directory."""
+    try:
+        current_time = datetime.now()
+        for filename in os.listdir(UPLOAD_FOLDER):
+            file_path = os.path.join(UPLOAD_FOLDER, filename)
+            if os.path.isfile(file_path):
+                file_time = datetime.fromtimestamp(os.path.getctime(file_path))
+                if current_time - file_time > timedelta(minutes=30):
+                    os.remove(file_path)
+                    logger.info("Deleted old file: %s", filename)
+    except Exception as e:
+        logger.exception("Error during cleanup: %s", e)
+
+
+_scheduler_lock = Lock()
+_scheduler = None
+_scheduler_shutdown_registered = False
+
+
+def _shutdown_scheduler():
+    global _scheduler
+    with _scheduler_lock:
+        if _scheduler and _scheduler.running:
+            logger.info("Shutting down cleanup scheduler")
+            _scheduler.shutdown()
+
+
+def start_cleanup_scheduler():
+    """Ensure the cleanup scheduler starts only once per process."""
+    global _scheduler, _scheduler_shutdown_registered
+    with _scheduler_lock:
+        if _scheduler is None:
+            _scheduler = BackgroundScheduler()
+            _scheduler.add_job(
+                func=cleanup_old_files,
+                trigger="interval",
+                minutes=10,
+                id="cleanup-old-files",
+                replace_existing=True,
+            )
+        if not _scheduler.running:
+            logger.info("Starting cleanup scheduler")
+            _scheduler.start()
+            if not _scheduler_shutdown_registered:
+                atexit.register(_shutdown_scheduler)
+                _scheduler_shutdown_registered = True
+    return _scheduler
+
+
+start_cleanup_scheduler()
+
+
+@app.route("/")
+def index():
+    return render_template("index.html")
+
+
+@app.route("/upload", methods=["GET", "POST"])
+def upload_file():
+    if request.method == "GET":
+        return redirect(url_for("index"))
+
+    if "file" not in request.files:
+        flash("No se seleccionó ningún archivo")
+        logger.info("Upload attempted with no file in request")
+        return redirect(request.url)
+
+    file = request.files["file"]
+    if file.filename == "":
+        flash("No se seleccionó ningún archivo")
+        logger.info("Upload attempted with empty filename")
+        return redirect(request.url)
+
+    if file and allowed_file(file.filename):
+        try:
+            original_filename = secure_filename(file.filename)
+            unique_id = str(uuid.uuid4())
+            upload_path = os.path.join(
+                app.config["UPLOAD_FOLDER"], f"{unique_id}_original_{original_filename}"
+            )
+            file.save(upload_path)
+            logger.info("File uploaded: %s -> %s", original_filename, upload_path)
+
+            if not is_valid_excel_file(upload_path):
+                os.remove(upload_path)
+                flash(
+                    "El archivo no es un archivo Excel válido. Por favor sube un archivo Excel real."
+                )
+                logger.warning("Invalid Excel file rejected: %s", original_filename)
+                return redirect(url_for("index"))
+
+            logger.info("Starting processing of %s", upload_path)
+            df = pd.read_excel(upload_path)
+            processed_df, converted_columns = convert_text_columns_to_numbers(df)
+            date_keywords = ["fecha", "liberacion", "liberación"]
+            date_cols = find_columns_with_keywords(processed_df.columns, date_keywords)
+
+            for col in date_cols:
+                processed_df[col] = pd.to_datetime(processed_df[col], errors="coerce")
+                # Remove timezone info if present (Excel does not support tz-aware datetimes)
+                if pd.api.types.is_datetime64_any_dtype(processed_df[col]):
+                    try:
+                        processed_df[col] = processed_df[col].dt.tz_localize(None)
+                    except (AttributeError, TypeError):
+                        pass
+
+            sum_h = sum_h_pos = sum_h_neg = None
+            if processed_df.shape[1] > 7:
+                col_h = processed_df.iloc[:, 7]
+                col_h_numeric = pd.to_numeric(col_h, errors="coerce")
+                sum_h = col_h_numeric.sum(skipna=True)
+                sum_h_pos = col_h_numeric[col_h_numeric > 0].sum(skipna=True)
+                sum_h_neg = col_h_numeric[col_h_numeric < 0].sum(skipna=True)
+
+            processed_filename = f"{unique_id}_processed_{original_filename}"
+            processed_path = os.path.join(
+                app.config["UPLOAD_FOLDER"], processed_filename
+            )
+
+            # Use ExcelWriter to set date, ID, and money column formats
+            with pd.ExcelWriter(
+                processed_path, engine="xlsxwriter", date_format="yyyy-mm-dd"
+            ) as writer:
+                processed_df.to_excel(writer, index=False)
+                workbook = writer.book
+                worksheet = writer.sheets["Sheet1"]
+                date_format = workbook.add_format({"num_format": "yyyy-mm-dd"})
+                id_format = workbook.add_format({"num_format": "0", "align": "left"})
+                money_format = workbook.add_format({"num_format": "$ #,##0.00"})
+
+                header_format = workbook.add_format(
+                    {
+                        "text_wrap": True,
+                        "bold": True,
+                        "align": "center",
+                        "valign": "vcenter",
+                    }
+                )
+                worksheet.set_row(0, 40)
+                # Set all columns to width 20
+                for col_idx in range(len(processed_df.columns)):
+                    worksheet.set_column(col_idx, col_idx, 20)
+                # Overwrite header row with header_format to ensure wrap
+                for col_idx, value in enumerate(processed_df.columns):
+                    worksheet.write(0, col_idx, value, header_format)
+
+                # Define normalized money columns
+                money_col_targets = [
+                    "valor de la compra",
+                    "comision mas iva",
+                    "comisión más iva",
+                    "monto neto de operacion",
+                    "monto neto de operación",
+                    "impuestos cobrados por retenciones iibb",
+                ]
+
+                # Set date columns
+                for col in date_cols:
+                    col_idx = processed_df.columns.get_loc(col)
+                    worksheet.set_column(col_idx, col_idx, 20, date_format)
+                # Set ID columns to integer format, wide enough to avoid scientific notation
+                for col in processed_df.columns:
+                    norm_col = normalize_column_name(col)
+                    if "id" in norm_col:
+                        col_idx = processed_df.columns.get_loc(col)
+                        worksheet.set_column(col_idx, col_idx, 15, id_format)
+                # Set money columns to currency format
+                for col in processed_df.columns:
+                    norm_col = normalize_column_name(col)
+                    if norm_col in money_col_targets:
+                        col_idx = processed_df.columns.get_loc(col)
+                        worksheet.set_column(col_idx, col_idx, 15, money_format)
+            logger.info("Processed file saved: %s", processed_path)
+
+            os.remove(upload_path)
+            logger.info("Removed original uploaded file: %s", upload_path)
+
+            return render_template(
+                "download.html",
+                filename=processed_filename,
+                original_name=original_filename,
+                sum_h=sum_h,
+                sum_h_pos=sum_h_pos,
+                sum_h_neg=sum_h_neg,
+            )
+
+        except Exception as e:
+            # Clean up uploaded file in case of any error
+            try:
+                if "upload_path" in locals() and os.path.exists(upload_path):
+                    os.remove(upload_path)
+                    logger.info("Cleaned up file after error: %s", upload_path)
+            except Exception as cleanup_error:
+                logger.exception("Error during cleanup: %s", cleanup_error)
+
+            # Generic error message to avoid information disclosure
+            flash(
+                "Error procesando el archivo. Por favor verifica que sea un archivo Excel válido."
+            )
+            logger.exception(
+                "File processing error for %s: %s",
+                original_filename if "original_filename" in locals() else "unknown",
+                str(e),
+            )
+            return redirect(url_for("index"))
+    else:
+        flash(
+            "Tipo de archivo inválido. Por favor sube un archivo Excel (.xlsx o .xls)"
+        )
+        logger.info(
+            "Rejected upload - invalid file type: %s", file.filename if file else None
+        )
+        return redirect(url_for("index"))
+
+
+@app.route("/download/<filename>")
+def download_file(filename):
+    try:
+        logger.info("Download requested for: %s", filename)
+        normalized_filename = secure_filename(filename)
+
+        if not normalized_filename:
+            logger.warning(
+                "Rejected download with empty normalized filename: %s", filename
+            )
+            flash("Archivo no encontrado o ha expirado")
+            return redirect(url_for("index"))
+
+        if normalized_filename != filename:
+            logger.info(
+                "Normalized download filename from %s to %s",
+                filename,
+                normalized_filename,
+            )
+
+        upload_root = Path(app.config["UPLOAD_FOLDER"]).resolve()
+        requested_path = upload_root / normalized_filename
+
+        try:
+            resolved_path = requested_path.resolve(strict=True)
+        except FileNotFoundError:
+            logger.info("File not found or expired: %s", requested_path)
+            flash("Archivo no encontrado o ha expirado")
+            return redirect(url_for("index"))
+
+        try:
+            resolved_path.relative_to(upload_root)
+        except ValueError:
+            logger.warning(
+                "Rejected download outside upload directory: %s -> %s",
+                filename,
+                resolved_path,
+            )
+            flash("Archivo no encontrado o ha expirado")
+            return redirect(url_for("index"))
+
+        if resolved_path.is_file():
+            logger.info("Serving file: %s", resolved_path)
+            download_name = f"convertido_{normalized_filename.split('_', 2)[-1]}"
+            return send_file(
+                resolved_path, as_attachment=True, download_name=download_name
+            )
+
+        logger.info("Path is not a regular file or has expired: %s", resolved_path)
+        flash("Archivo no encontrado o ha expirado")
+        return redirect(url_for("index"))
+    except Exception as e:
+        logger.exception("Error serving download: %s", e)
+        flash("Error descargando el archivo")
+        return redirect(url_for("index"))
+
+
+if __name__ == "__main__":
+    os.makedirs(UPLOAD_FOLDER, exist_ok=True)
+    app.run(debug=True, host="127.0.0.1", port=5000)  # debugger stays local-only
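Note on the download route: it resolves the requested name under the upload folder and rejects anything that ends up outside it. The same containment check can be sketched on its own with `pathlib`. The names `resolve_inside` and `demo` are hypothetical, and the sketch leaves out the `secure_filename` normalization that the route applies first.

```python
import tempfile
from pathlib import Path


def resolve_inside(root, name):
    """Hypothetical helper: resolved path of `name` under `root`, or None if
    it is missing, not a regular file, or resolves outside `root`."""
    root = Path(root).resolve()
    try:
        candidate = (root / name).resolve(strict=True)  # raises if missing
        candidate.relative_to(root)  # raises ValueError if outside root
    except (FileNotFoundError, ValueError):
        return None
    return candidate if candidate.is_file() else None


def demo():
    """Check an existing file, a traversal attempt, and a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        uploads = Path(tmp) / "uploads"
        uploads.mkdir()
        (uploads / "report.xlsx").write_bytes(b"data")
        (Path(tmp) / "secret.txt").write_text("nope")
        inside = resolve_inside(uploads, "report.xlsx")
        outside = resolve_inside(uploads, "../secret.txt")
        missing = resolve_inside(uploads, "gone.xlsx")
        return inside is not None, outside, missing
```

Resolving both the root and the candidate before comparing them is what makes the check robust to `..` segments and symlinks.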
--- a/src/converters.py
+++ b/src/converters.py
@@ -0,0 +1,176 @@
+"""Utilities for normalizing and converting tabular data columns."""
+
+from __future__ import annotations
+import math
+import unicodedata
+from typing import Iterable, List, Optional, Tuple
+import pandas as pd
+
+
+_ID_KEYWORDS: Tuple[str, ...] = ("id",)
+_CURRENCY_SYMBOLS: Tuple[str, ...] = ("$", "€", "£", "¥", "₽", "₱", "₹")
+
+
+def normalize_column_name(name: object) -> str:
+    """Return a normalized, accent-free column identifier."""
+    if not isinstance(name, str):
+        return ""
+    normalized = unicodedata.normalize("NFKD", name.strip().lower())
+    return "".join(char for char in normalized if not unicodedata.combining(char))
+
+
+def _strip_currency_symbols(value: str) -> str:
+    cleaned = value
+    for symbol in _CURRENCY_SYMBOLS:
+        cleaned = cleaned.replace(symbol, "")
+    return cleaned
+
+
+def _coerce_to_string(value: object) -> Optional[str]:
+    if value is None:
+        return None
+    if isinstance(value, (int, float)) and not isinstance(value, bool):
+        if isinstance(value, float) and math.isnan(value):
+            return None
+        return str(value)
+    text = str(value).strip()
+    return text or None
+
+
+def _parse_numeric_text(text_value: object) -> Tuple[Optional[str], bool]:
+    """Clean a numeric-like string and return (normalized_value, is_negative)."""
+    text = _coerce_to_string(text_value)
+    if text is None:
+        return None, False
+
+    cleaned = unicodedata.normalize("NFKC", text)
+    cleaned = cleaned.replace("\xa0", "")
+    cleaned = _strip_currency_symbols(cleaned)
+
+    is_negative = False
+    if cleaned.startswith("(") and cleaned.endswith(")"):
+        cleaned = cleaned[1:-1]
+        is_negative = True
+
+    if cleaned.endswith("-"):
+        cleaned = cleaned[:-1]
+        is_negative = True
+
+    if cleaned.startswith("-"):
+        cleaned = cleaned[1:]
+        is_negative = True
+
+    if cleaned.startswith("+"):
+        cleaned = cleaned[1:]
+
+    cleaned = cleaned.replace(" ", "")
+
+    if "." in cleaned and "," in cleaned:
+        last_dot = cleaned.rfind(".")
+        last_comma = cleaned.rfind(",")
+        if last_dot > last_comma:
+            cleaned = cleaned.replace(",", "")
+        else:
+            cleaned = cleaned.replace(".", "")
+            cleaned = cleaned.replace(",", ".")
+    elif cleaned.count(",") == 1 and len(cleaned.split(",")[1]) <= 2:
+        cleaned = cleaned.replace(",", ".")
+    else:
+        cleaned = cleaned.replace(",", "")
+
+    if cleaned.count(".") > 1:
+        parts = cleaned.split(".")
+        cleaned = "".join(parts[:-1]) + "." + parts[-1]
+
+    cleaned = cleaned.replace("'", "")
+
+    try:
+        float(cleaned)
+    except (TypeError, ValueError):
+        return None, False
+
+    return cleaned, is_negative
+
+
+def is_numeric_like(text_value: object) -> bool:
+    """Return True if a value can be safely interpreted as a number."""
+    cleaned, _ = _parse_numeric_text(text_value)
+    return cleaned is not None
+
+
+def convert_numeric_text(text_value: object) -> Optional[float]:
+    """Convert numeric-like text into a float. Returns pandas NA on failure."""
+    if text_value is None:
+        return pd.NA
+
+    if isinstance(text_value, (int, float)) and not isinstance(text_value, bool):
+        if isinstance(text_value, float) and math.isnan(text_value):
+            return pd.NA
+        return float(text_value)
+
+    cleaned, is_negative = _parse_numeric_text(text_value)
+    if cleaned is None:
+        return pd.NA
+
+    try:
+        result = float(cleaned)
+    except (TypeError, ValueError):
+        return pd.NA
+
+    return -result if is_negative else result
+
+
+def _should_force_numeric(norm_column_name: str) -> bool:
+    return any(keyword in norm_column_name for keyword in _ID_KEYWORDS)
+
+
+def convert_text_columns_to_numbers(df: pd.DataFrame) -> Tuple[pd.DataFrame, List[str]]:
+    """Convert numeric-like object columns in ``df`` into numeric dtypes."""
+    converted_columns: List[str] = []
+
+    for column in df.columns:
+        series = df[column]
+        if pd.api.types.is_numeric_dtype(series):
+            continue
+
+        normalized_name = normalize_column_name(column)
+        force_numeric = _should_force_numeric(normalized_name)
+
+        if not (
+            force_numeric
+            or series.dtype == object
+            or pd.api.types.is_string_dtype(series)
+        ):
+            continue
+
+        non_null = series.dropna()
+        if non_null.empty and not force_numeric:
+            continue
+
+        cleaned_non_null = non_null.map(_coerce_to_string).dropna()
+        if cleaned_non_null.empty and not force_numeric:
+            continue
+
+        if force_numeric or cleaned_non_null.map(is_numeric_like).all():
+            numeric_series = series.map(convert_numeric_text)
+            df[column] = pd.to_numeric(numeric_series, errors="coerce")
+            converted_columns.append(column)
+
+    return df, converted_columns
+
+
+def find_columns_with_keywords(
+    columns: Iterable[str], keywords: Iterable[str]
+) -> List[str]:
+    """Return columns whose normalized name contains any of the provided keywords."""
+    normalized_keywords = tuple(normalize_column_name(keyword) for keyword in keywords)
+    matches: List[str] = []
+
+    for column in columns:
+        normalized_column = normalize_column_name(column)
+        if any(
+            keyword and keyword in normalized_column for keyword in normalized_keywords
+        ):
+            matches.append(column)
+
+    return matches
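Note on separator handling in `_parse_numeric_text`: when a value contains both `.` and `,`, whichever appears last is treated as the decimal mark; a single comma followed by at most two characters is read as a decimal comma; any other comma is dropped as a thousands separator. The rule can be isolated as below. `normalize_separators` is a hypothetical standalone helper that leaves out the currency, sign, and whitespace handling.

```python
def normalize_separators(text):
    """Hypothetical helper: rewrite `text` so '.' is the only decimal mark."""
    if "." in text and "," in text:
        # Whichever separator appears last is taken as the decimal mark.
        if text.rfind(".") > text.rfind(","):
            return text.replace(",", "")
        return text.replace(".", "").replace(",", ".")
    if text.count(",") == 1 and len(text.split(",")[1]) <= 2:
        # A single comma followed by one or two digits reads as a decimal comma.
        return text.replace(",", ".")
    # Any remaining commas are treated as thousands separators.
    return text.replace(",", "")


examples = {
    sample: float(normalize_separators(sample))
    for sample in ("1,234.56", "1.234,56", "12,5", "1,234")
}
```

A value such as "1.234", with only a dot, stays ambiguous under this rule and is parsed as 1.234 rather than 1234.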
--- a/src/static/globals.css
+++ b/src/static/globals.css
@@ -0,0 +1,32 @@
+:root {
+    color-scheme: dark;
+}
+
+html,
+body {
+    font-family: 'Inter', 'Nunito Sans', 'Segoe UI', sans-serif;
+    -webkit-font-smoothing: antialiased;
+    -moz-osx-font-smoothing: grayscale;
+}
+
+body {
+    background-color: #111827;
+}
+
+a {
+    color: #60a5fa;
+    transition: color 120ms ease-in-out;
+}
+
+a:hover {
+    color: #3b82f6;
+}
+
+button {
+    cursor: pointer;
+}
+
+.focus-ring {
+    outline: 2px solid #2563eb;
+    outline-offset: 2px;
+}
--- a/src/templates/download.html
+++ b/src/templates/download.html
@@ -0,0 +1,73 @@
+<!DOCTYPE html>
+<html lang="es">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Descargar Archivo Procesado - ML Converter</title>
+    <link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css" rel="stylesheet">
+</head>
+<body class="bg-gradient-to-br from-gray-900 to-gray-800 min-h-screen py-12">
+    <div class="max-w-2xl mx-auto">
+        <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-8">
+            <div class="text-center mb-8">
+                <div class="mx-auto flex items-center justify-center h-14 w-14 rounded-full bg-green-200 mb-4">
+                    <svg class="h-8 w-8 text-green-600" fill="none" viewBox="0 0 24 24" stroke="currentColor">
+                        <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7" />
+                    </svg>
+                </div>
+                <h1 class="text-3xl font-extrabold text-white mb-2">¡Procesamiento Completado!</h1>
+            </div>
+
+            {% with messages = get_flashed_messages() %}
+                {% if messages %}
+                    <div class="mb-4">
+                        {% for message in messages %}
+                            <div class="bg-green-100 border border-green-400 text-green-700 px-4 py-3 rounded mb-2">
+                                {{ message }}
+                            </div>
+                        {% endfor %}
+                    </div>
+                {% endif %}
+            {% endwith %}
+
+            <div class="bg-gray-800 border border-gray-700 rounded-lg shadow p-4 mb-6">
+                <h2 class="text-lg font-semibold text-gray-200 mb-3 tracking-wide">Archivo procesado: {{ original_name }}</h2>
+                {% if sum_h is not none %}
+                <div class="grid grid-cols-1 sm:grid-cols-3 gap-2 text-sm text-gray-200 text-center">
+                    <div>
+                        <span class="block font-medium text-gray-400">Total</span>
+                        <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h) }}</span>
+                    </div>
+                    <div>
+                        <span class="block font-medium text-green-400">Ingresos</span>
+                        <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h_pos) }}</span>
+                    </div>
+                    <div>
+                        <span class="block font-medium text-red-400">Egresos</span>
+                        <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h_neg) }}</span>
+                    </div>
+                </div>
+                {% endif %}
+            </div>
+
+            <div class="space-y-4">
+                <a href="/download/{{ filename }}"
+                   class="w-full flex justify-center items-center py-3 px-4 border border-transparent rounded-md shadow text-base font-semibold text-white bg-green-600 hover:bg-green-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-green-400">
+                    <svg class="w-5 h-5 mr-2" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                        <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 10v6m0 0l-3-3m3 3l3-3m2 8H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
+                    </svg>
+                    Descargar Archivo Procesado
+                </a>
+                <a href="/"
+                   class="w-full flex justify-center items-center py-3 px-4 border border-gray-600 rounded-md shadow text-base font-semibold text-gray-200 bg-gray-800 hover:bg-gray-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-400">
+                    Procesar Otro Archivo
+                </a>
+            </div>
+
+            <div class="mt-8 text-xs text-gray-400 text-center">
+                <p>⚠️ Los archivos se eliminan automáticamente después de 30 minutos por seguridad.</p>
+            </div>
+        </div>
+    </div>
+</body>
+</html>
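The summary card in this template renders three values, `sum_h`, `sum_h_pos` and `sum_h_neg`, through `"$ {:,.2f}".format(...)`. The following is a minimal Python sketch of how such totals could be derived and what that format string produces. `summarize` and `as_currency` are hypothetical helpers, and the reading of the three values as the overall, positive-only and negative-only sums of a single amount column is an assumption based on the labels (Total, Ingresos, Egresos), not on the view code.

```python
from typing import Iterable, Tuple


def summarize(amounts: Iterable[float]) -> Tuple[float, float, float]:
    """Return (total, income, expenses) for a sequence of signed amounts.

    Assumption: income is the sum of positive values and expenses is the
    sum of negative values, so expenses comes back as a negative number.
    """
    values = list(amounts)
    total = sum(values)
    income = sum(v for v in values if v > 0)
    expenses = sum(v for v in values if v < 0)
    return total, income, expenses


def as_currency(value: float) -> str:
    """Apply the same format string the template uses."""
    return "$ {:,.2f}".format(value)


if __name__ == "__main__":
    total, income, expenses = summarize([100.0, -40.5, 25.25])
    print(as_currency(total), as_currency(income), as_currency(expenses))
```

Note that `{:,.2f}` groups thousands with a comma and uses a dot for decimals, which differs from the `1.234,56` convention the converter accepts as input.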
--- a/src/templates/index.html
+++ b/src/templates/index.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html>
+<html lang="es">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>ML Converter - Resumen de Mercado Libre/Pago para Excel</title>
+    <link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css" rel="stylesheet">
+    <link rel="stylesheet" href="{{ url_for('static', filename='globals.css') }}">
+</head>
+
+<body class="bg-gradient-to-br from-gray-900 to-gray-800 min-h-screen py-12">
+    <div class="max-w-2xl mx-auto">
+        <!-- Card: Upload -->
+        <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-8 mb-8">
+            <div class="text-center mb-8">
+                <h1 class="text-4xl font-extrabold text-white mb-2">ML Converter</h1>
+                <p class="text-gray-300 text-lg">Convertí tu resumen de Mercado Pago para que sea legible en Excel</p>
+            </div>
+
+            {% with messages = get_flashed_messages() %}
+                {% if messages %}
+                    <div class="mb-4">
+                        {% for message in messages %}
+                            <div class="bg-red-100 border border-red-400 text-red-700 px-4 py-3 rounded mb-2">
+                                {{ message }}
+                            </div>
+                        {% endfor %}
+                    </div>
+                {% endif %}
+            {% endwith %}
+
+            <form action="/upload" method="post" enctype="multipart/form-data" class="space-y-6">
+                <div class="flex justify-center px-6 pt-8 pb-8 border-2 border-dashed border-gray-500 rounded-xl bg-gray-800">
+                    <div class="space-y-4 text-center">
+                        <div class="flex justify-center">
+                            <svg class="h-14 w-14 text-blue-400" fill="none" stroke="currentColor" viewBox="0 0 48 48">
+                                <path d="M28 8H12a4 4 0 00-4 4v20m32-12v8m0 0v8a4 4 0 01-4 4H12a4 4 0 01-4-4v-4m32-4l-3.172-3.172a4 4 0 00-5.656 0L28 28M8 32l9.172-9.172a4 4 0 015.656 0L28 28m0 0l4 4m4-24h8m-4-4v8m-12 4h.02" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" />
+                            </svg>
+                        </div>
+                        <div class="flex flex-col items-center text-gray-300">
+                            <label for="file" class="relative cursor-pointer bg-blue-600 hover:bg-blue-700 text-white rounded-md font-semibold py-2 px-6 text-base shadow focus:outline-none focus:ring-2 focus:ring-blue-400 focus:ring-offset-2">
+                                <span>Elegir Archivo</span>
+                                <input id="file" name="file" type="file" class="sr-only" accept=".xlsx,.xls" required>
+                            </label>
+                            <span class="mt-2 text-sm">o arrastrá y soltá tu archivo Excel aquí</span>
+                        </div>
+                        <p class="text-xs text-gray-400">Archivos Excel hasta 16MB</p>
+                    </div>
+                </div>
+                <div class="mt-4">
+                    <button type="submit" class="w-full flex justify-center py-3 px-4 border border-transparent rounded-md shadow text-base font-semibold text-white bg-blue-600 hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-400">
+                        Subir Archivo Excel
+                    </button>
+                </div>
+            </form>
+        </div>
+
+        <!-- Card: Cómo funciona -->
+        <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-6">
+            <h2 class="text-lg font-bold text-white mb-4">Cómo funciona:</h2>
+            <ul class="space-y-2 text-gray-200 text-sm pl-4 list-disc">
+                <li>Subí tu archivo de resumen de Mercado Libre/Pago (.xlsx o .xls).</li>
+                <li>Las columnas como "VALOR DE LA COMPRA" y "MONTO NETO" se convierten automáticamente.</li>
+                <li>Los datos quedan listos para análisis en Excel con formato numérico correcto.</li>
+                <li>Las columnas de texto (tipos de pago, estados) permanecen sin cambios.</li>
+                <li>Los archivos se eliminan automáticamente después de 30 minutos por seguridad.</li>
+            </ul>
+        </div>
+    </div>
+
+    <script>
+        (function() {
+            const fileInput = document.getElementById('file');
+            if (!fileInput) return;
+
+            const label = document.querySelector('[for="file"]');
+            const dropZone = label ? label.closest('.border-dashed') : document.querySelector('.border-dashed');
+
+            function preventDefaults(e) {
+                e.preventDefault();
+                e.stopPropagation();
+            }
+
+            function highlight() {
+                if (dropZone) dropZone.classList.add('border-indigo-500', 'border-solid');
+            }
+
+            function unhighlight() {
+                if (dropZone) dropZone.classList.remove('border-indigo-500', 'border-solid');
+            }
+
+            function handleDrop(e) {
+                const dt = e.dataTransfer;
+                const files = dt && dt.files;
+                if (files && files.length > 0) {
+                    fileInput.files = files;
+                    setTimeout(() => fileInput.form && fileInput.form.submit(), 10);
+                }
+            }
+
+            if (dropZone) {
+                ['dragenter', 'dragover', 'dragleave', 'drop'].forEach(eventName => {
+                    dropZone.addEventListener(eventName, preventDefaults, false);
+                });
+
+                ['dragenter', 'dragover'].forEach(eventName => {
+                    dropZone.addEventListener(eventName, highlight, false);
+                });
+
+                ['dragleave', 'drop'].forEach(eventName => {
+                    dropZone.addEventListener(eventName, unhighlight, false);
+                });
+
+                dropZone.addEventListener('drop', handleDrop, false);
+            }
+
+            const uploadButton = document.querySelector('button');
+            if (uploadButton) {
+                uploadButton.addEventListener('click', function(e) {
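+                    // With no file chosen, open the picker; otherwise let the form submit normally.
+                    if (!fileInput.files.length) { e.preventDefault(); fileInput.click(); }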
+                }, false);
+            }
+
+            fileInput.addEventListener('change', function() {
+                if (fileInput.files && fileInput.files.length > 0) {
+                    fileInput.form && fileInput.form.submit();
+                }
+            });
+        })();
+    </script>
+</body>
+</html>
--- a/tests/__init__.py
+++ b/tests/__init__.py
@@ -0,0 +1 @@
+# Init for tests package
--- a/tests/test_app.py
+++ b/tests/test_app.py
@@ -0,0 +1,13 @@
+import pytest
+from src.app import app
+
+@pytest.fixture
+def client():
+    app.config['TESTING'] = True
+    with app.test_client() as client:
+        yield client
+
+def test_index(client):
+    response = client.get('/')
+    assert response.status_code == 200
+    assert b"ML Converter" in response.data or b"Subir Archivo" in response.data
--- a/tests/test_converters.py
+++ b/tests/test_converters.py
@@ -0,0 +1,50 @@
+import pandas as pd
+import pytest
+
+from src import converters
+
+
+def test_converts_currency_strings_to_numbers():
+    df = pd.DataFrame(
+        {
+            'Monto Neto de Operacion': ['\u20ac1.234,56', '$ 1,234.56', '(1.234,56)'],
+            'descripcion': ['uno', 'dos', 'tres'],
+        }
+    )
+
+    processed, converted = converters.convert_text_columns_to_numbers(df)
+
+    assert 'Monto Neto de Operacion' in converted
+    assert processed['Monto Neto de Operacion'].iloc[0] == pytest.approx(1234.56)
+    assert processed['Monto Neto de Operacion'].iloc[1] == pytest.approx(1234.56)
+    assert processed['Monto Neto de Operacion'].iloc[2] == pytest.approx(-1234.56)
+
+
+def test_force_converts_id_columns_even_with_padding():
+    df = pd.DataFrame(
+        {
+            'Operacion ID': ['000123', ' 456 ', None],
+        }
+    )
+
+    processed, converted = converters.convert_text_columns_to_numbers(df)
+
+    assert 'Operacion ID' in converted
+    assert processed['Operacion ID'].dropna().tolist() == [123.0, 456.0]
+
+
+def test_mixed_content_column_is_not_converted():
+    df = pd.DataFrame(
+        {
+            'monto': ['$123', 'no aplicar', '$456'],
+        }
+    )
+
+    processed, converted = converters.convert_text_columns_to_numbers(df)
+
+    assert 'monto' not in converted
+    assert processed['monto'].dtype == object
+
+
+def test_convert_numeric_text_returns_na_for_invalid_strings():
+    assert pd.isna(converters.convert_numeric_text('no es numero'))
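The converter tests above expect `1.234,56`, `1,234.56` and parenthesised negatives to parse to the same magnitude, and free text to be rejected. One way to get that behaviour is sketched below in plain Python. `parse_amount` is a hypothetical name and this is not the code in `src/converters.py`; the rule for strings that contain only one kind of separator is an assumption, since the tests do not cover that case.

```python
import re
from typing import Optional

# Only digits and separators are allowed once symbols are stripped.
_NUMERIC_SHAPE = re.compile(r"-?\d[\d.,]*")


def parse_amount(text: str) -> Optional[float]:
    """Parse a currency-like string; return None when it is not numeric."""
    s = text.strip()
    # Accounting style: (123.45) means -123.45.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    # Drop common currency symbols and any whitespace.
    s = re.sub(r"[$€\s]", "", s)
    if not _NUMERIC_SHAPE.fullmatch(s):
        return None

    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both present: whichever comes last is the decimal separator.
        if last_comma > last_dot:
            s = s.replace(".", "").replace(",", ".")  # 1.234,56
        else:
            s = s.replace(",", "")                    # 1,234.56
    elif last_comma != -1:
        # Assumption: a single comma is a decimal comma, repeated commas group thousands.
        s = s.replace(",", "") if s.count(",") > 1 else s.replace(",", ".")
    elif s.count(".") > 1:
        # Assumption: repeated dots group thousands.
        s = s.replace(".", "")

    try:
        value = float(s)
    except ValueError:
        return None
    return -value if negative else value
```

Returning `None` for anything that fails the shape check is what lets a caller leave a mixed column such as `['$123', 'no aplicar', '$456']` untouched: if any cell fails to parse, the column is not converted.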
--- a/tests/test_errors.py
+++ b/tests/test_errors.py
@@ -0,0 +1,29 @@
+import io
+import pandas as pd
+import pytest
+from src.app import app
+
+@pytest.fixture
+def client():
+    app.config['TESTING'] = True
+    with app.test_client() as client:
+        yield client
+
+def test_upload_no_file(client):
+    response = client.post('/upload', data={}, follow_redirects=True)
+    assert response.status_code == 200
+    assert b"Archivo" in response.data
+
+def test_upload_invalid_extension(client):
+    response = client.post('/upload', data={
+        'file': (io.BytesIO(b"fake data"), 'test.txt')
+    }, content_type='multipart/form-data', follow_redirects=True)
+    assert response.status_code == 200
+    assert b"Archivo" in response.data
+
+def test_upload_empty_file(client):
+    response = client.post('/upload', data={
+        'file': (io.BytesIO(), '')
+    }, content_type='multipart/form-data', follow_redirects=True)
+    assert response.status_code == 200
+    assert b"Archivo" in response.data
--- a/tests/test_security.py
+++ b/tests/test_security.py
@@ -0,0 +1,31 @@
+from src import app as app_module
+
+
+def test_rejects_invalid_signature(tmp_path):
+    """Files with non-Excel signatures should be blocked early."""
+    bogus_excel = tmp_path / "malicious.xlsx"
+    bogus_excel.write_text("not really an excel file", encoding="utf-8")
+
+    assert app_module.is_valid_excel_file(str(bogus_excel)) is False
+
+
+def test_rejects_empty_file(tmp_path):
+    """Empty uploads fail validation."""
+    empty_excel = tmp_path / "empty.xlsx"
+    empty_excel.touch()
+
+    assert app_module.is_valid_excel_file(str(empty_excel)) is False
+
+
+def test_rejects_oversized_file(tmp_path, monkeypatch):
+    """Respect the MAX_CONTENT_LENGTH guardrail for large uploads."""
+    oversized_limit = 10
+    monkeypatch.setattr(app_module, "MAX_CONTENT_LENGTH", oversized_limit)
+    monkeypatch.setitem(app_module.app.config, "MAX_CONTENT_LENGTH", oversized_limit)
+
+    large_excel = tmp_path / "huge.xlsx"
+    large_excel.write_bytes(
+        b"PK\x03\x040" * 4
+    )  # ZIP signature plus one filler byte, repeated: 20 bytes > the 10-byte limit
+
+    assert app_module.is_valid_excel_file(str(large_excel)) is False
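The three security tests above exercise a validator that rejects wrong signatures, empty files and oversized files. A minimal sketch of such a check follows, assuming it works from file size and leading magic bytes. `looks_like_excel` and its `max_bytes` parameter are hypothetical and do not reproduce `is_valid_excel_file`; the 16 MB default mirrors the limit stated in the upload form.

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx files are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls (OLE2 compound file)


def looks_like_excel(path: str, max_bytes: int = 16 * 1024 * 1024) -> bool:
    """Cheap pre-check: size within bounds and a known Excel container signature."""
    p = Path(path)
    try:
        size = p.stat().st_size
    except OSError:
        return False  # missing or unreadable file
    if size == 0 or size > max_bytes:
        return False
    with p.open("rb") as fh:
        header = fh.read(len(OLE2_MAGIC))
    return header.startswith(ZIP_MAGIC) or header == OLE2_MAGIC
```

A signature match only shows that the container type is plausible. Any ZIP archive passes the `.xlsx` branch, so a full parse (for example by the library that later reads the workbook) is still the real validation.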
--- a/tests/test_upload_download.py
+++ b/tests/test_upload_download.py
@@ -0,0 +1,60 @@
+import io
+import os
+import pandas as pd
+import pytest
+from src.app import app
+
+@pytest.fixture
+def client():
+    app.config['TESTING'] = True
+    with app.test_client() as client:
+        yield client
+
+def test_index_page(client):
+    response = client.get('/')
+    assert response.status_code == 200
+    assert b"ML Converter" in response.data or b"Subir Archivo" in response.data
+
+def test_upload_and_download(client):
+    # Create a simple Excel file in memory
+    df = pd.DataFrame({'words': ['one', 'two', 'three']})
+    excel_file = io.BytesIO()
+    df.to_excel(excel_file, index=False)
+    excel_file.seek(0)
+
+    # Upload the file
+    response = client.post('/upload', data={
+        'file': (excel_file, 'test.xlsx')
+    }, content_type='multipart/form-data', follow_redirects=True)
+    assert response.status_code == 200
+    assert b"Descargar Archivo Procesado" in response.data or b"Procesamiento Completado" in response.data
+
+def test_download_normalizes_and_confines_filename(client, tmp_path, monkeypatch):
+    upload_dir = tmp_path / "uploads"
+    upload_dir.mkdir()
+    monkeypatch.setitem(app.config, 'UPLOAD_FOLDER', str(upload_dir))
+
+    safe_name = '123_processed_test.xlsx'
+    file_path = upload_dir / safe_name
+    file_path.write_bytes(b'dummy excel bytes')
+
+    response = client.get(f"/download/..%5C{safe_name}")
+    assert response.status_code == 200
+    assert b'dummy excel bytes' in response.data
+    content_disposition = response.headers.get('Content-Disposition', '')
+    assert "attachment;" in content_disposition
+    assert "convertido_test.xlsx" in content_disposition
+
+def test_download_rejects_symlink_escape(client, tmp_path, monkeypatch):
+    upload_dir = tmp_path / "uploads"
+    upload_dir.mkdir()
+    outside_file = tmp_path / "outside.txt"
+    outside_file.write_text("secret")
+    monkeypatch.setitem(app.config, 'UPLOAD_FOLDER', str(upload_dir))
+
+    symlink_path = upload_dir / "escape"
+    os.symlink(outside_file, symlink_path)
+
+    response = client.get("/download/escape", follow_redirects=False)
+    # Should redirect back to index instead of serving the symlink target
+    assert response.status_code == 302
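The last two tests require the download route to reduce `..%5C123_processed_test.xlsx` to a plain file name and to refuse a symlink that points outside the upload folder. The confinement logic could look roughly like the standard-library sketch below. `resolve_download` is a hypothetical helper, not the handler in `src/app.py`, and the choice to reject every symlink (rather than only escaping ones) is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_download(upload_dir: str, requested: str) -> Optional[Path]:
    """Map a requested name to a file inside upload_dir, or None if unsafe."""
    # Keep only the last path component, treating both / and \ as separators.
    name = requested.replace("\\", "/").rsplit("/", 1)[-1]
    if name in ("", ".", ".."):
        return None

    base = Path(upload_dir).resolve()
    candidate = base / name

    # Assumption: symlinks are never legitimate in the upload folder.
    if candidate.is_symlink() or not candidate.is_file():
        return None
    # Defence in depth: the resolved file must sit directly in the upload folder.
    if candidate.resolve().parent != base:
        return None
    return candidate
```

A route built on this helper would serve the returned path as an attachment and redirect to the index page when it gets `None`, which matches the 200 and 302 responses the tests assert.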