mirror of
https://github.com/brockar/ml-converter.git
synced 2026-01-11 15:01:01 -03:00
init
29
Dockerfile
Normal file
@@ -0,0 +1,29 @@
FROM python:3.13-slim

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY gunicorn.conf.py ./

# Create non-root user
RUN useradd -m -u 1000 appuser

RUN mkdir -p /app/tmp && \
    chown -R appuser:appuser /app

USER appuser
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/')" || exit 1

CMD ["gunicorn", "-c", "gunicorn.conf.py", "src.app:app"]
123
README.md
Normal file
@@ -0,0 +1,123 @@
# ML Converter

Python web application for processing Excel files. It automatically converts numeric values stored as text into proper numeric cells.
Especially useful for Mercado Pago statements.

## Features

- **Smart number detection and conversion**: Identifies columns whose numeric values are stored as text (e.g. "$1,234.56", "1.234,56", "(123.45)") and converts them to real numbers, handling international formats, currency symbols, and negatives in parentheses.
- **Preserves text data**: Text columns (names, categories) remain unchanged. Columns whose name contains "fecha" or "liberación" are parsed as dates.
- **Simple web interface**: Responsive UI in Spanish with drag & drop support, clear success/error messages, and a summary of total, income, and expenses after processing.
- **Safe file handling**: Temporary storage in a `tmp/` directory at the project root (`/app/tmp` inside the container), with automatic cleanup of files older than 30 minutes.
- **Real file validation**: Verifies that the uploaded file is actually an Excel file, not just by its extension.
- **Supported formats**: Accepts `.xlsx` and `.xls` (max. 16 MB).
- **Automated tests**: Includes tests for endpoints and validations.
- **Security headers**: Additional HTTP headers (HSTS, CSP, etc.).

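The number-detection rule in the first feature can be sketched as follows. This is a simplified, standalone illustration of the idea and not the project's `src/converters.py`; the name `parse_amount` and the short currency list are assumptions made for the example.

```python
from typing import Optional

# Assumption: a short list of symbols is enough for this illustration.
CURRENCY_SYMBOLS = "$€£¥"


def parse_amount(text: str) -> Optional[float]:
    """Parse a number stored as text; return None if it is not numeric."""
    cleaned = text.strip().replace("\xa0", "").replace(" ", "")
    for symbol in CURRENCY_SYMBOLS:
        cleaned = cleaned.replace(symbol, "")

    # Accounting-style negatives: "(123.45)" means -123.45.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    if cleaned.startswith("-"):
        negative = True
        cleaned = cleaned[1:]

    if "." in cleaned and "," in cleaned:
        # Whichever separator appears last is treated as the decimal mark.
        if cleaned.rfind(".") > cleaned.rfind(","):
            cleaned = cleaned.replace(",", "")
        else:
            cleaned = cleaned.replace(".", "").replace(",", ".")
    elif cleaned.count(",") == 1 and len(cleaned.split(",")[1]) <= 2:
        # A single comma followed by one or two digits is a decimal comma.
        cleaned = cleaned.replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")

    try:
        value = float(cleaned)
    except ValueError:
        return None
    return -value if negative else value


if __name__ == "__main__":
    for sample in ["$1,234.56", "1.234,56", "(123.45)", "Juan Pérez"]:
        print(sample, "->", parse_amount(sample))
```

A column would be converted only when every non-empty cell parses this way, which is why names and categories are left alone.
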
## Tech Stack

- **Backend**: Flask (Python), pandas, openpyxl
- **Frontend**: HTML5, Tailwind CSS, Jinja2
- **File Cleanup**: APScheduler
- **Containerization**: Docker Compose

## Environment Configuration

1. **Copy and edit the environment file:**

   ```bash
   cp env.example .env
   nvim .env
   ```

2. **Generate a secure SECRET_KEY:**

   ```bash
   python3 -c "import secrets; print('SECRET_KEY=' + secrets.token_urlsafe(32))"
   ```

   Update `SECRET_KEY` and `DOMAIN` in `.env`.

3. **Important variables:**
   - `FLASK_ENV`: 'production' for production
   - `MAX_CONTENT_LENGTH`: Maximum upload size in bytes (default: 16777216, i.e. 16 MB)

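As a rough sketch of how these variables can be consumed, the snippet below reads them from a mapping with the documented defaults. The helper name `load_settings` is hypothetical and only mirrors the pattern of `os.environ.get(...)` with a fallback; it is not a function of the project.

```python
import os
import secrets

DEFAULT_MAX_CONTENT_LENGTH = 16 * 1024 * 1024  # 16 MB, the documented default


def load_settings(environ=None):
    """Read the documented variables from a mapping such as os.environ."""
    env = os.environ if environ is None else environ
    return {
        # Fallback key is for local development only.
        "SECRET_KEY": env.get("SECRET_KEY", "dev-key-change-in-production"),
        # Environment values are strings, so convert to an integer byte count.
        "MAX_CONTENT_LENGTH": int(
            env.get("MAX_CONTENT_LENGTH", DEFAULT_MAX_CONTENT_LENGTH)
        ),
    }


if __name__ == "__main__":
    print(load_settings({"MAX_CONTENT_LENGTH": "1048576"}))
    # Same generator as the shell one-liner in step 2.
    print("SECRET_KEY=" + secrets.token_urlsafe(32))
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.
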
## Quick Start

### Option 1: Python Virtual Environment

```bash
./run.sh
```

Or manually:

```bash
cd ml-converter
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python src/app.py
```

Visit [http://localhost:5000](http://localhost:5000)

### Option 2: Docker

```bash
docker compose up --build
```

Note that `compose.yml` does not publish port 5000 to the host. It attaches the container to an external `proxy` network and routes traffic through Traefik, so the app is reached at the host configured in `DOMAIN` rather than at `localhost:5000`.

## Usage Flow

1. **Upload** your Excel file (.xlsx/.xls) by dragging it in or selecting it.
2. **Process**: Click "Procesar Archivo" (Process File). Text columns that contain numbers are converted automatically.
3. **Download** the processed file.
4. **Cleanup**: Temporary files are deleted automatically after 30 minutes.

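The cleanup step can be illustrated with a small standalone function. This is a sketch under assumptions: the name `remove_expired_files` is hypothetical, and it compares modification times so the demo can backdate a file, whereas the application itself may use a different timestamp and runs the job from a scheduler.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 60  # 30 minutes, as stated in the usage flow


def remove_expired_files(folder, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete regular files older than max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(folder).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        old = Path(folder) / "old.xlsx"
        new = Path(folder) / "new.xlsx"
        old.write_bytes(b"x")
        new.write_bytes(b"x")
        # Backdate one file by an hour so it counts as expired.
        hour_ago = time.time() - 3600
        os.utime(old, (hour_ago, hour_ago))
        print(remove_expired_files(folder))
```

Because the job only runs periodically, a file may live somewhat longer than 30 minutes before it is actually removed.
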
## API Endpoints

- `GET /` – Main upload page
- `POST /upload` – File upload and processing
- `GET /download/<filename>` – Download of the processed file

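The `<filename>` segment of the download route is the name under which the processed file is stored. A minimal sketch of that naming scheme, assuming the stored name has the form `<uuid>_processed_<original name>` and that the original name was already sanitized, could look like this. The helper names `stored_name` and `download_name` are hypothetical.

```python
import uuid


def stored_name(original_filename: str) -> str:
    """Build a unique server-side name for a processed file."""
    # UUIDs contain hyphens but no underscores, so "_" is a safe delimiter.
    return f"{uuid.uuid4()}_processed_{original_filename}"


def download_name(stored_filename: str) -> str:
    """Derive the user-facing name offered in the browser download."""
    # Split at most twice: "<uuid>", "processed", "<original name>".
    original = stored_filename.split("_", 2)[-1]
    return f"convertido_{original}"


if __name__ == "__main__":
    name = stored_name("resumen_enero.xlsx")
    print(name)
    print(download_name(name))
```

Limiting the split to two separators keeps underscores that belong to the original filename intact.
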
## Production Deployment

### Gunicorn

```bash
gunicorn -w 4 -b 0.0.0.0:5000 src.app:app
```

### Docker

```bash
docker compose up -d
```

## File Structure

```
ml-converter/
├── src/
│   ├── app.py             # Main Flask application
│   ├── converters.py      # Helpers
│   └── templates/
│       ├── index.html     # Upload page
│       └── download.html  # Download page
├── tests/                 # Automated tests
├── requirements.txt       # Dependencies
├── Dockerfile
├── compose.yml
└── README.md
```

## Security

- **Safe filenames** (`secure_filename`)
- **File type validation** (.xlsx/.xls and internal signature)
- **Size limits** (16 MB)
- **Automatic cleanup** (30 minutes)
- **Unique filenames** (UUIDs to prevent conflicts)

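The "internal signature" check can be sketched with the well-known leading bytes of the two container formats: `.xlsx` files are ZIP archives and `.xls` files are OLE2 compound documents. The function name `looks_like_excel` is an assumption for this example. A signature match only shows that the container format is plausible, so a real validator would also try to open the file.

```python
import io
import zipfile

# ZIP local-file, empty-archive and spanned-archive markers (.xlsx is a ZIP).
XLSX_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
# OLE2 compound document header used by legacy .xls files.
XLS_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_excel(header: bytes) -> bool:
    """Return True if the first bytes match a ZIP or OLE2 signature."""
    return header.startswith(XLSX_SIGNATURES) or header.startswith(XLS_SIGNATURE)


if __name__ == "__main__":
    # Build a tiny ZIP archive in memory to stand in for an .xlsx upload.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hi")
    print(looks_like_excel(buffer.getvalue()[:8]))
    print(looks_like_excel(b"%PDF-1.7"))
```

Any ZIP archive passes this test, which is why the signature check is only the first of several validation steps.
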
22
compose.yml
Normal file
@@ -0,0 +1,22 @@
services:
  ml-converter:
    build: .
    environment:
      - FLASK_ENV=${FLASK_ENV:-production}
      - SECRET_KEY=${SECRET_KEY:-change-this-in-production}
    restart: unless-stopped
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      # HTTP Router
      - "traefik.http.routers.ml-converter.rule=Host(`${DOMAIN:-localhost}`)"
      - "traefik.http.routers.ml-converter.entrypoints=websecure"
      - "traefik.http.routers.ml-converter.tls.certresolver=letsencrypt"
      - "traefik.http.services.ml-converter.loadbalancer.server.port=5000"
      # Optional: File size limit for uploads (16MB)
      - "traefik.http.middlewares.ml-converter-limit.buffering.maxRequestBodyBytes=16777216"
      - "traefik.http.routers.ml-converter.middlewares=ml-converter-limit"
networks:
  proxy:
    external: true
12
env.example
Normal file
@@ -0,0 +1,12 @@
# Copy this file to .env and update the values

# Domain
DOMAIN=your-domain.com

# Flask Configuration
FLASK_ENV=production
SECRET_KEY=change-this-very-long-random-secret-key-in-production

# File Upload Limits
# 16 MB = 16 * 1024 * 1024 = 16777216 bytes
MAX_CONTENT_LENGTH=16777216
36
gunicorn.conf.py
Normal file
@@ -0,0 +1,36 @@
# Gunicorn configuration file for ML Converter
# Usage: gunicorn -c gunicorn.conf.py src.app:app

# Server socket
bind = "0.0.0.0:5000"
backlog = 2048

# Worker processes
workers = 4
worker_class = "sync"
worker_connections = 1000
timeout = 30
keepalive = 2

# Restart workers after this many requests, plus a random jitter of 0 to 50 requests
max_requests = 1000
max_requests_jitter = 50

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"

# Process naming
proc_name = "ml-converter"

# Server mechanics
daemon = False
pidfile = "/tmp/ml-converter.pid"
user = None
group = None
tmp_upload_dir = None

# SSL (uncomment and configure for HTTPS)
# keyfile = "/path/to/keyfile"
# certfile = "/path/to/certfile"
6
main.py
Normal file
@@ -0,0 +1,6 @@
def main():
    print("Hello from ml-converter!")


if __name__ == "__main__":
    main()
3
pytest.ini
Normal file
@@ -0,0 +1,3 @@
[pytest]
testpaths = tests
python_files = test_*.py
9
requirements.txt
Normal file
@@ -0,0 +1,9 @@
Flask==3.1.2
pandas==2.3.3
openpyxl==3.1.5
xlrd==2.0.2
APScheduler==3.11.1
Werkzeug==3.1.4
gunicorn==23.0.0
XlsxWriter==3.2.9
pytest==9.0.1
19
run.sh
Executable file
@@ -0,0 +1,19 @@
#!/bin/bash
echo "Starting ML Converter Development Server"

if [ ! -d "venv" ]; then
    echo "Creating virtual environment..."
    python -m venv venv
fi

echo "Activating virtual environment..."
source venv/bin/activate

echo "Installing dependencies..."
pip install -r requirements.txt

mkdir -p tmp

echo "Starting Flask development server..."
echo "Access the application at: http://localhost:5000"
python src/app.py
408
src/app.py
Normal file
@@ -0,0 +1,408 @@
import atexit
import logging
import os
import uuid
from datetime import datetime, timedelta
from pathlib import Path
from threading import Lock

import pandas as pd
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask, flash, redirect, render_template, request, send_file, url_for
from werkzeug.utils import secure_filename

try:
    from src.converters import (
        convert_text_columns_to_numbers,
        find_columns_with_keywords,
        normalize_column_name,
    )
except ModuleNotFoundError as exc:
    if exc.name == "src":
        from converters import (
            convert_text_columns_to_numbers,
            find_columns_with_keywords,
            normalize_column_name,
        )
    else:
        raise

app = Flask(__name__)
app.secret_key = os.environ.get("SECRET_KEY", "dev-key-change-in-production")

# Logging setup
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s"
)
logger = logging.getLogger(__name__)

UPLOAD_FOLDER = os.path.join(os.path.dirname(os.path.dirname(__file__)), "tmp")
ALLOWED_EXTENSIONS = {"xlsx", "xls"}
MAX_CONTENT_LENGTH_DEFAULT = 16 * 1024 * 1024  # 16MB max file size
MAX_CONTENT_LENGTH = int(
    os.environ.get("MAX_CONTENT_LENGTH", str(MAX_CONTENT_LENGTH_DEFAULT))
)

app.config["UPLOAD_FOLDER"] = UPLOAD_FOLDER
app.config["MAX_CONTENT_LENGTH"] = MAX_CONTENT_LENGTH

# Ensure upload directory exists for all entrypoints (app import, gunicorn workers, tests)
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

HSTS_POLICY = "max-age=31536000; includeSubDomains"
CSP_POLICY = (
    "default-src 'self'; "
    "style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; "
    "script-src 'self' 'unsafe-inline'; "
    "img-src 'self' data:; "
    "font-src 'self' data:; "
    "connect-src 'self'; "
    "form-action 'self'; "
    "frame-ancestors 'none'; "
    "base-uri 'self'"
)
PERMISSIONS_POLICY = "geolocation=(), microphone=(), camera=()"


@app.after_request
def apply_security_headers(response):
    """Apply modern security headers to every response."""
    response.headers["Strict-Transport-Security"] = HSTS_POLICY
    response.headers["Content-Security-Policy"] = CSP_POLICY
    response.headers["Permissions-Policy"] = PERMISSIONS_POLICY
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response


def allowed_file(filename):
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def is_valid_excel_file(file_path):
    """
    Validate that the file is actually an Excel file by checking file signature (magic bytes)
    and attempting to read it with pandas.
    """
    try:
        file_size = os.path.getsize(file_path)
        if file_size == 0:
            logger.warning(f"Empty file rejected: {file_path}")
            return False

        if file_size > MAX_CONTENT_LENGTH:
            logger.warning(f"File too large rejected: {file_path} ({file_size} bytes)")
            return False

        # Check file signature (magic bytes)
        with open(file_path, "rb") as f:
            header = f.read(8)

        # Excel file signatures
        # .xlsx files start with PK (ZIP format)
        # .xls files start with specific OLE signatures
        xlsx_signature = (
            header.startswith(b"PK\x03\x04")
            or header.startswith(b"PK\x05\x06")
            or header.startswith(b"PK\x07\x08")
        )
        xls_signature = header.startswith(
            b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
        )  # OLE2 signature

        if not (xlsx_signature or xls_signature):
            logger.warning(f"Invalid file signature for {file_path}: {header.hex()}")
            return False

        # Try to read with pandas as additional validation
        # Use nrows=1 to minimize resource usage and prevent potential DoS
        df = pd.read_excel(file_path, nrows=1)

        if df is None:
            return False

        return True

    except Exception as e:
        logger.warning(f"File validation failed for {file_path}: {str(e)}")
        return False


def cleanup_old_files():
    """Remove files older than 30 minutes from the temp directory."""
    try:
        current_time = datetime.now()
        for filename in os.listdir(UPLOAD_FOLDER):
            file_path = os.path.join(UPLOAD_FOLDER, filename)
            if os.path.isfile(file_path):
                file_time = datetime.fromtimestamp(os.path.getctime(file_path))
                if current_time - file_time > timedelta(minutes=30):
                    os.remove(file_path)
                    logger.info("Deleted old file: %s", filename)
    except Exception as e:
        logger.exception("Error during cleanup: %s", e)


_scheduler_lock = Lock()
_scheduler = None
_scheduler_shutdown_registered = False


def _shutdown_scheduler():
    global _scheduler
    with _scheduler_lock:
        if _scheduler and _scheduler.running:
            logger.info("Shutting down cleanup scheduler")
            _scheduler.shutdown()


def start_cleanup_scheduler():
    """Ensure the cleanup scheduler starts only once per process."""
    global _scheduler, _scheduler_shutdown_registered
    with _scheduler_lock:
        if _scheduler is None:
            _scheduler = BackgroundScheduler()
            # The job runs every 10 minutes and deletes files older than 30 minutes.
            _scheduler.add_job(
                func=cleanup_old_files,
                trigger="interval",
                minutes=10,
                id="cleanup-old-files",
                replace_existing=True,
            )
        if not _scheduler.running:
            logger.info("Starting cleanup scheduler")
            _scheduler.start()
        if not _scheduler_shutdown_registered:
            atexit.register(_shutdown_scheduler)
            _scheduler_shutdown_registered = True
    return _scheduler


start_cleanup_scheduler()


@app.route("/")
def index():
    return render_template("index.html")


@app.route("/upload", methods=["GET", "POST"])
def upload_file():
    if request.method == "GET":
        return redirect(url_for("index"))

    if "file" not in request.files:
        flash("No se seleccionó ningún archivo")
        logger.info("Upload attempted with no file in request")
        return redirect(request.url)

    file = request.files["file"]
    if file.filename == "":
        flash("No se seleccionó ningún archivo")
        logger.info("Upload attempted with empty filename")
        return redirect(request.url)

    if file and allowed_file(file.filename):
        try:
            original_filename = secure_filename(file.filename)
            unique_id = str(uuid.uuid4())
            upload_path = os.path.join(
                app.config["UPLOAD_FOLDER"], f"{unique_id}_original_{original_filename}"
            )
            file.save(upload_path)
            logger.info("File uploaded: %s -> %s", original_filename, upload_path)

            if not is_valid_excel_file(upload_path):
                os.remove(upload_path)
                flash(
                    "El archivo no es un archivo Excel válido. Por favor sube un archivo Excel real."
                )
                logger.warning("Invalid Excel file rejected: %s", original_filename)
                return redirect(url_for("index"))

            logger.info("Starting processing of %s", upload_path)
            df = pd.read_excel(upload_path)
            processed_df, converted_columns = convert_text_columns_to_numbers(df)
            date_keywords = ["fecha", "liberacion", "liberación"]
            date_cols = find_columns_with_keywords(processed_df.columns, date_keywords)

            for col in date_cols:
                processed_df[col] = pd.to_datetime(processed_df[col], errors="coerce")
                # Remove timezone info if present (Excel does not support tz-aware datetimes)
                if pd.api.types.is_datetime64_any_dtype(processed_df[col]):
                    try:
                        processed_df[col] = processed_df[col].dt.tz_localize(None)
                    except (AttributeError, TypeError):
                        pass

            sum_h = sum_h_pos = sum_h_neg = None
            if processed_df.shape[1] > 7:
                col_h = processed_df.iloc[:, 7]
                col_h_numeric = pd.to_numeric(col_h, errors="coerce")
                sum_h = col_h_numeric.sum(skipna=True)
                sum_h_pos = col_h_numeric[col_h_numeric > 0].sum(skipna=True)
                sum_h_neg = col_h_numeric[col_h_numeric < 0].sum(skipna=True)

            processed_filename = f"{unique_id}_processed_{original_filename}"
            processed_path = os.path.join(
                app.config["UPLOAD_FOLDER"], processed_filename
            )

            # Use ExcelWriter to set date, ID, and money column formats
            with pd.ExcelWriter(
                processed_path, engine="xlsxwriter", date_format="yyyy-mm-dd"
            ) as writer:
                processed_df.to_excel(writer, index=False)
                workbook = writer.book
                worksheet = writer.sheets["Sheet1"]
                date_format = workbook.add_format({"num_format": "yyyy-mm-dd"})
                id_format = workbook.add_format({"num_format": "0", "align": "left"})
                money_format = workbook.add_format({"num_format": "$ #,##0.00"})

                header_format = workbook.add_format(
                    {
                        "text_wrap": True,
                        "bold": True,
                        "align": "center",
                        "valign": "vcenter",
                    }
                )
                worksheet.set_row(0, 40)
                # Set all columns to width 20
                for col_idx in range(len(processed_df.columns)):
                    worksheet.set_column(col_idx, col_idx, 20)
                # Overwrite header row with header_format to ensure wrap
                for col_idx, value in enumerate(processed_df.columns):
                    worksheet.write(0, col_idx, value, header_format)

                # Define normalized money columns
                money_col_targets = [
                    "valor de la compra",
                    "comision mas iva",
                    "comisión más iva",
                    "monto neto de operacion",
                    "monto neto de operación",
                    "impuestos cobrados por retenciones iibb",
                ]

                # Set date columns
                for col in date_cols:
                    col_idx = processed_df.columns.get_loc(col)
                    worksheet.set_column(col_idx, col_idx, 20, date_format)
                # Set ID columns to integer format, wide enough to avoid scientific notation
                for col in processed_df.columns:
                    norm_col = normalize_column_name(col)
                    if "id" in norm_col:
                        col_idx = processed_df.columns.get_loc(col)
                        worksheet.set_column(col_idx, col_idx, 15, id_format)
                # Set money columns to currency format
                for col in processed_df.columns:
                    norm_col = normalize_column_name(col)
                    if norm_col in money_col_targets:
                        col_idx = processed_df.columns.get_loc(col)
                        worksheet.set_column(col_idx, col_idx, 15, money_format)
            logger.info("Processed file saved: %s", processed_path)

            os.remove(upload_path)
            logger.info("Removed original uploaded file: %s", upload_path)

            return render_template(
                "download.html",
                filename=processed_filename,
                original_name=original_filename,
                sum_h=sum_h,
                sum_h_pos=sum_h_pos,
                sum_h_neg=sum_h_neg,
            )

        except Exception as e:
            # Clean up uploaded file in case of any error
            try:
                if "upload_path" in locals() and os.path.exists(upload_path):
                    os.remove(upload_path)
                    logger.info("Cleaned up file after error: %s", upload_path)
            except Exception as cleanup_error:
                logger.exception("Error during cleanup: %s", cleanup_error)

            # Generic error message to avoid information disclosure
            flash(
                "Error procesando el archivo. Por favor verifica que sea un archivo Excel válido."
            )
            logger.exception(
                "File processing error for %s: %s",
                original_filename if "original_filename" in locals() else "unknown",
                str(e),
            )
            return redirect(url_for("index"))
    else:
        flash(
            "Tipo de archivo inválido. Por favor sube un archivo Excel (.xlsx o .xls)"
        )
        logger.info(
            "Rejected upload - invalid file type: %s", file.filename if file else None
        )
        return redirect(url_for("index"))


@app.route("/download/<filename>")
def download_file(filename):
    try:
        logger.info("Download requested for: %s", filename)
        normalized_filename = secure_filename(filename)

        if not normalized_filename:
            logger.warning(
                "Rejected download with empty normalized filename: %s", filename
            )
            flash("Archivo no encontrado o ha expirado")
            return redirect(url_for("index"))

        if normalized_filename != filename:
            logger.info(
                "Normalized download filename from %s to %s",
                filename,
                normalized_filename,
            )

        upload_root = Path(app.config["UPLOAD_FOLDER"]).resolve()
        requested_path = upload_root / normalized_filename

        try:
            resolved_path = requested_path.resolve(strict=True)
        except FileNotFoundError:
            logger.info("File not found or expired: %s", requested_path)
            flash("Archivo no encontrado o ha expirado")
            return redirect(url_for("index"))

        try:
            resolved_path.relative_to(upload_root)
        except ValueError:
            logger.warning(
                "Rejected download outside upload directory: %s -> %s",
                filename,
                resolved_path,
            )
            flash("Archivo no encontrado o ha expirado")
            return redirect(url_for("index"))

        if resolved_path.is_file():
            logger.info("Serving file: %s", resolved_path)
            download_name = f"convertido_{normalized_filename.split('_', 2)[-1]}"
            return send_file(
                resolved_path, as_attachment=True, download_name=download_name
            )

        logger.info("Path is not a regular file or has expired: %s", resolved_path)
        flash("Archivo no encontrado o ha expirado")
        return redirect(url_for("index"))
    except Exception as e:
        logger.exception("Error serving download: %s", e)
        # Generic error message to avoid information disclosure; details stay in the log
        flash("Error descargando el archivo. Por favor intenta nuevamente.")
        return redirect(url_for("index"))


if __name__ == "__main__":
    os.makedirs(UPLOAD_FOLDER, exist_ok=True)
    # Development server only: debug mode must not be exposed on an untrusted network
    app.run(debug=True, host="0.0.0.0", port=5000)
176
src/converters.py
Normal file
@@ -0,0 +1,176 @@
"""Utilities for normalizing and converting tabular data columns."""

from __future__ import annotations

import math
import unicodedata
from typing import Iterable, List, Optional, Tuple

import pandas as pd


_ID_KEYWORDS: Tuple[str, ...] = ("id",)
_CURRENCY_SYMBOLS: Tuple[str, ...] = ("$", "€", "£", "¥", "₽", "₱", "₹")


def normalize_column_name(name: object) -> str:
    """Return a normalized, accent-free column identifier."""
    if not isinstance(name, str):
        return ""
    normalized = unicodedata.normalize("NFKD", name.strip().lower())
    return "".join(char for char in normalized if not unicodedata.combining(char))


def _strip_currency_symbols(value: str) -> str:
    cleaned = value
    for symbol in _CURRENCY_SYMBOLS:
        cleaned = cleaned.replace(symbol, "")
    return cleaned


def _coerce_to_string(value: object) -> Optional[str]:
    if value is None:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if isinstance(value, float) and math.isnan(value):
            return None
        return str(value)
    text = str(value).strip()
    return text or None


def _parse_numeric_text(text_value: object) -> Tuple[Optional[str], bool]:
    """Clean a numeric-like string and return (normalized_value, is_negative)."""
    text = _coerce_to_string(text_value)
    if text is None:
        return None, False

    cleaned = unicodedata.normalize("NFKC", text)
    cleaned = cleaned.replace("\xa0", "")
    cleaned = _strip_currency_symbols(cleaned)

    is_negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = cleaned[1:-1]
        is_negative = True

    if cleaned.endswith("-"):
        cleaned = cleaned[:-1]
        is_negative = True

    if cleaned.startswith("-"):
        cleaned = cleaned[1:]
        is_negative = True

    if cleaned.startswith("+"):
        cleaned = cleaned[1:]

    cleaned = cleaned.replace(" ", "")

    if "." in cleaned and "," in cleaned:
        last_dot = cleaned.rfind(".")
        last_comma = cleaned.rfind(",")
        if last_dot > last_comma:
            cleaned = cleaned.replace(",", "")
        else:
            cleaned = cleaned.replace(".", "")
            cleaned = cleaned.replace(",", ".")
    elif cleaned.count(",") == 1 and len(cleaned.split(",")[1]) <= 2:
        cleaned = cleaned.replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")

    if cleaned.count(".") > 1:
        parts = cleaned.split(".")
        cleaned = "".join(parts[:-1]) + "." + parts[-1]

    cleaned = cleaned.replace("'", "")

    try:
        float(cleaned)
    except (TypeError, ValueError):
        return None, False

    return cleaned, is_negative


def is_numeric_like(text_value: object) -> bool:
    """Return True if a value can be safely interpreted as a number."""
    cleaned, _ = _parse_numeric_text(text_value)
    return cleaned is not None


def convert_numeric_text(text_value: object) -> Optional[float]:
    """Convert numeric-like text into a float. Returns pandas NA on failure."""
    if text_value is None:
        return pd.NA

    if isinstance(text_value, (int, float)) and not isinstance(text_value, bool):
        if isinstance(text_value, float) and math.isnan(text_value):
            return pd.NA
        return float(text_value)

    cleaned, is_negative = _parse_numeric_text(text_value)
    if cleaned is None:
        return pd.NA

    try:
        result = float(cleaned)
    except (TypeError, ValueError):
        return pd.NA

    return -result if is_negative else result


def _should_force_numeric(norm_column_name: str) -> bool:
    return any(keyword in norm_column_name for keyword in _ID_KEYWORDS)


def convert_text_columns_to_numbers(df: pd.DataFrame) -> Tuple[pd.DataFrame, List[str]]:
    """Convert numeric-like object columns in ``df`` into numeric dtypes."""
    converted_columns: List[str] = []

    for column in df.columns:
        series = df[column]
        if pd.api.types.is_numeric_dtype(series):
            continue

        normalized_name = normalize_column_name(column)
        force_numeric = _should_force_numeric(normalized_name)

        if not (
            force_numeric
            or series.dtype == object
            or pd.api.types.is_string_dtype(series)
        ):
            continue

        non_null = series.dropna()
        if non_null.empty and not force_numeric:
            continue

        cleaned_non_null = non_null.map(_coerce_to_string).dropna()
        if cleaned_non_null.empty and not force_numeric:
            continue

        if force_numeric or cleaned_non_null.map(is_numeric_like).all():
            numeric_series = series.map(convert_numeric_text)
            df[column] = pd.to_numeric(numeric_series, errors="coerce")
            converted_columns.append(column)

    return df, converted_columns


def find_columns_with_keywords(
    columns: Iterable[str], keywords: Iterable[str]
) -> List[str]:
    """Return columns whose normalized name contains any of the provided keywords."""
    normalized_keywords = tuple(normalize_column_name(keyword) for keyword in keywords)
    matches: List[str] = []

    for column in columns:
        normalized_column = normalize_column_name(column)
        if any(
            keyword and keyword in normalized_column for keyword in normalized_keywords
        ):
            matches.append(column)

    return matches
32
src/static/globals.css
Normal file
@@ -0,0 +1,32 @@
:root {
  color-scheme: dark;
}

html,
body {
  font-family: 'Inter', 'Nunito Sans', 'Segoe UI', sans-serif;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

body {
  background-color: #111827;
}

a {
  color: #60a5fa;
  transition: color 120ms ease-in-out;
}

a:hover {
  color: #3b82f6;
}

button {
  cursor: pointer;
}

.focus-ring {
  outline: 2px solid #2563eb;
  outline-offset: 2px;
}
73
src/templates/download.html
Normal file
@@ -0,0 +1,73 @@
<!DOCTYPE html>
<html lang="es">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Descargar Archivo Procesado - ML Converter</title>
  <link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css" rel="stylesheet">
</head>
<body class="bg-gradient-to-br from-gray-900 to-gray-800 min-h-screen py-12">
  <div class="max-w-2xl mx-auto">
    <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-8">
      <div class="text-center mb-8">
        <div class="mx-auto flex items-center justify-center h-14 w-14 rounded-full bg-green-200 mb-4">
          <svg class="h-8 w-8 text-green-600" fill="none" viewBox="0 0 24 24" stroke="currentColor">
            <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7" />
          </svg>
        </div>
        <h1 class="text-3xl font-extrabold text-white mb-2">¡Procesamiento Completado!</h1>
      </div>

      {% with messages = get_flashed_messages() %}
        {% if messages %}
          <div class="mb-4">
            {% for message in messages %}
              <div class="bg-green-100 border border-green-400 text-green-700 px-4 py-3 rounded mb-2">
                {{ message }}
              </div>
            {% endfor %}
          </div>
        {% endif %}
      {% endwith %}

      <div class="bg-gray-800 border border-gray-700 rounded-lg shadow p-4 mb-6">
        <h2 class="text-lg font-semibold text-gray-200 mb-3 tracking-wide">Archivo procesado: {{ original_name }}</h2>
        {% if sum_h is not none %}
        <div class="grid grid-cols-1 sm:grid-cols-3 gap-2 text-sm text-gray-200 text-center">
          <div>
            <span class="block font-medium text-gray-400">Total</span>
            <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h) }}</span>
          </div>
          <div>
            <span class="block font-medium text-green-400">Ingresos</span>
            <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h_pos) }}</span>
          </div>
          <div>
            <span class="block font-medium text-red-400">Egresos</span>
            <span class="block font-semibold text-gray-100">{{ "$ {:,.2f}".format(sum_h_neg) }}</span>
          </div>
        </div>
        {% endif %}
      </div>

      <div class="space-y-4">
        <a href="/download/{{ filename }}"
           class="w-full flex justify-center items-center py-3 px-4 border border-transparent rounded-md shadow text-base font-semibold text-white bg-green-600 hover:bg-green-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-green-400">
          <svg class="w-5 h-5 mr-2" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 10v6m0 0l-3-3m3 3l3-3m2 8H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
          </svg>
          Descargar Archivo Procesado
        </a>
        <a href="/"
           class="w-full flex justify-center items-center py-3 px-4 border border-gray-600 rounded-md shadow text-base font-semibold text-gray-200 bg-gray-800 hover:bg-gray-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-400">
          Procesar Otro Archivo
        </a>
      </div>

      <div class="mt-8 text-xs text-gray-400 text-center">
        <p>⚠️ Los archivos se eliminan automáticamente después de 30 minutos por seguridad.</p>
      </div>
    </div>
  </div>
</body>
</html>
133
src/templates/index.html
Normal file
@@ -0,0 +1,133 @@
<!DOCTYPE html>
<html lang="es">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>ML Converter - Resumen de Mercado Libre/Pago para Excel</title>
  <link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css" rel="stylesheet">
  <link rel="stylesheet" href="{{ url_for('static', filename='globals.css') }}">
</head>

<body class="bg-gradient-to-br from-gray-900 to-gray-800 min-h-screen py-12">
  <div class="max-w-2xl mx-auto">
    <!-- Card: Upload -->
    <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-8 mb-8">
      <div class="text-center mb-8">
        <h1 class="text-4xl font-extrabold text-white mb-2">ML Converter</h1>
        <p class="text-gray-300 text-lg">Convertí tu resumen de Mercado Pago para que sea legible en Excel</p>
      </div>

      {% with messages = get_flashed_messages() %}
        {% if messages %}
          <div class="mb-4">
            {% for message in messages %}
              <div class="bg-red-100 border border-red-400 text-red-700 px-4 py-3 rounded mb-2">
                {{ message }}
              </div>
            {% endfor %}
          </div>
        {% endif %}
      {% endwith %}

      <form action="/upload" method="post" enctype="multipart/form-data" class="space-y-6">
        <div class="flex justify-center px-6 pt-8 pb-8 border-2 border-dashed border-gray-500 rounded-xl bg-gray-800">
          <div class="space-y-4 text-center">
            <div class="flex justify-center">
              <svg class="h-14 w-14 text-blue-400" fill="none" stroke="currentColor" viewBox="0 0 48 48">
                <path d="M28 8H12a4 4 0 00-4 4v20m32-12v8m0 0v8a4 4 0 01-4 4H12a4 4 0 01-4-4v-4m32-4l-3.172-3.172a4 4 0 00-5.656 0L28 28M8 32l9.172-9.172a4 4 0 015.656 0L28 28m0 0l4 4m4-24h8m-4-4v8m-12 4h.02" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" />
              </svg>
            </div>
            <div class="flex flex-col items-center text-gray-300">
              <label for="file" class="relative cursor-pointer bg-blue-600 hover:bg-blue-700 text-white rounded-md font-semibold py-2 px-6 text-base shadow focus:outline-none focus:ring-2 focus:ring-blue-400 focus:ring-offset-2">
                <span>Elegir Archivo</span>
                <input id="file" name="file" type="file" class="sr-only" accept=".xlsx,.xls" required>
              </label>
              <span class="mt-2 text-sm">o arrastrá y soltá tu archivo Excel aquí</span>
            </div>
            <p class="text-xs text-gray-400">Archivos Excel hasta 16MB</p>
          </div>
        </div>
        <div class="mt-4">
          <button type="submit" class="w-full flex justify-center py-3 px-4 border border-transparent rounded-md shadow text-base font-semibold text-white bg-blue-600 hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-400">
            Subir Archivo Excel
          </button>
        </div>
      </form>
    </div>

    <!-- Card: Cómo funciona -->
    <div class="bg-gray-900 bg-opacity-80 border border-gray-700 rounded-2xl shadow-xl p-6">
      <h2 class="text-lg font-bold text-white mb-4">Cómo funciona:</h2>
      <ul class="space-y-2 text-gray-200 text-sm pl-4 list-disc">
        <li>Subí tu archivo de resumen de Mercado Libre/Pago (.xlsx o .xls).</li>
        <li>Las columnas como "VALOR DE LA COMPRA" y "MONTO NETO" se convierten automáticamente.</li>
        <li>Los datos quedan listos para análisis en Excel con formato numérico correcto.</li>
        <li>Las columnas de texto (tipos de pago, estados) permanecen sin cambios.</li>
        <li>Los archivos se eliminan automáticamente después de 30 minutos por seguridad.</li>
      </ul>
    </div>
  </div>

  <script>
    (function() {
      const fileInput = document.getElementById('file');
      if (!fileInput) return;

      const label = document.querySelector('[for="file"]');
      const dropZone = label ? label.closest('.border-dashed') : document.querySelector('.border-dashed');

      function preventDefaults(e) {
        e.preventDefault();
        e.stopPropagation();
      }

      function highlight() {
        if (dropZone) dropZone.classList.add('border-indigo-500', 'border-solid');
      }

      function unhighlight() {
        if (dropZone) dropZone.classList.remove('border-indigo-500', 'border-solid');
      }

      function handleDrop(e) {
        const dt = e.dataTransfer;
        const files = dt && dt.files;
        if (files && files.length > 0) {
          fileInput.files = files;
          setTimeout(() => fileInput.form && fileInput.form.submit(), 10);
        }
      }

      if (dropZone) {
        ['dragenter', 'dragover', 'dragleave', 'drop'].forEach(eventName => {
          dropZone.addEventListener(eventName, preventDefaults, false);
        });

        ['dragenter', 'dragover'].forEach(eventName => {
          dropZone.addEventListener(eventName, highlight, false);
        });

        ['dragleave', 'drop'].forEach(eventName => {
          dropZone.addEventListener(eventName, unhighlight, false);
        });

        dropZone.addEventListener('drop', handleDrop, false);
      }

      const uploadButton = document.querySelector('button');
      if (uploadButton) {
        uploadButton.addEventListener('click', function(e) {
          e.preventDefault();
          fileInput.click();
        }, false);
      }

      fileInput.addEventListener('change', function() {
        if (fileInput.files && fileInput.files.length > 0) {
          fileInput.form && fileInput.form.submit();
        }
      });
    })();
  </script>
</body>
</html>
1
tests/__init__.py
Normal file
@@ -0,0 +1 @@
# Init for tests package
13
tests/test_app.py
Normal file
@@ -0,0 +1,13 @@
import pytest
from src.app import app

@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client

def test_index(client):
    response = client.get('/')
    assert response.status_code == 200
    assert b"ML Converter" in response.data or b"Subir Archivo" in response.data
50
tests/test_converters.py
Normal file
@@ -0,0 +1,50 @@
import pandas as pd
import pytest

from src import converters


def test_converts_currency_strings_to_numbers():
    df = pd.DataFrame(
        {
            'Monto Neto de Operacion': ['\u20ac1.234,56', '$ 1,234.56', '(1.234,56)'],
            'descripcion': ['uno', 'dos', 'tres'],
        }
    )

    processed, converted = converters.convert_text_columns_to_numbers(df)

    assert 'Monto Neto de Operacion' in converted
    assert processed['Monto Neto de Operacion'].iloc[0] == pytest.approx(1234.56)
    assert processed['Monto Neto de Operacion'].iloc[1] == pytest.approx(1234.56)
    assert processed['Monto Neto de Operacion'].iloc[2] == pytest.approx(-1234.56)


def test_force_converts_id_columns_even_with_padding():
    df = pd.DataFrame(
        {
            'Operacion ID': ['000123', ' 456 ', None],
        }
    )

    processed, converted = converters.convert_text_columns_to_numbers(df)

    assert 'Operacion ID' in converted
    assert processed['Operacion ID'].dropna().tolist() == [123.0, 456.0]


def test_mixed_content_column_is_not_converted():
    df = pd.DataFrame(
        {
            'monto': ['$123', 'no aplicar', '$456'],
        }
    )

    processed, converted = converters.convert_text_columns_to_numbers(df)

    assert 'monto' not in converted
    assert processed['monto'].dtype == object


def test_convert_numeric_text_returns_na_for_invalid_strings():
    assert pd.isna(converters.convert_numeric_text('no es numero'))
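The converter tests above pin down the expected parsing behaviour (currency symbols, European and US separators, parenthesised negatives) without showing the parser. A minimal sketch of one way to satisfy those cases follows. `parse_amount` is a hypothetical helper written for illustration; it is not the project's `convert_numeric_text`, and the rule for a lone separator is an assumption.

```python
import math
import re


def parse_amount(text):
    """Hypothetical parser: return a float, or NaN when text is not numeric."""
    if text is None:
        return math.nan
    s = str(text).strip()
    # Accounting-style negatives: "(123,45)" means -123.45.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    # Drop whitespace and the currency symbols used in the tests.
    s = re.sub(r"[\s$€]", "", s)
    if s.startswith("-"):
        negative, s = True, s[1:]
    # Anything other than digits and separators is treated as plain text.
    if not re.fullmatch(r"\d[\d.,]*", s):
        return math.nan
    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both kinds present: the right-most one is the decimal mark.
        decimal_mark = "." if last_dot > last_comma else ","
    elif s.count(",") == 1:
        decimal_mark = ","  # assumption: a lone comma is a decimal mark
    elif s.count(".") == 1:
        decimal_mark = "."
    else:
        decimal_mark = None  # no separator, or repeated thousands marks
    if decimal_mark is None:
        integer_part, fraction = s, ""
    else:
        integer_part, _, fraction = s.rpartition(decimal_mark)
    integer_part = integer_part.replace(".", "").replace(",", "")
    value = float(f"{integer_part}.{fraction or '0'}")
    return -value if negative else value
```

A value such as "1,234" is inherently ambiguous (1.234 or 1234); this sketch reads it as a decimal, and a real converter may need column-level context to decide.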
29
tests/test_errors.py
Normal file
@@ -0,0 +1,29 @@
import io
import pandas as pd
import pytest
from src.app import app

@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client

def test_upload_no_file(client):
    response = client.post('/upload', data={}, follow_redirects=True)
    assert response.status_code == 200
    assert b"Archivo" in response.data or b"Subir Archivo" in response.data

def test_upload_invalid_extension(client):
    response = client.post('/upload', data={
        'file': (io.BytesIO(b"fake data"), 'test.txt')
    }, content_type='multipart/form-data', follow_redirects=True)
    assert response.status_code == 200
    assert b"Archivo" in response.data or b"Subir Archivo" in response.data

def test_upload_empty_file(client):
    response = client.post('/upload', data={
        'file': (io.BytesIO(), '')
    }, content_type='multipart/form-data', follow_redirects=True)
    assert response.status_code == 200
    assert b"Archivo" in response.data or b"Subir Archivo" in response.data
31
tests/test_security.py
Normal file
@@ -0,0 +1,31 @@
from src import app as app_module


def test_rejects_invalid_signature(tmp_path):
    """Files with non-Excel signatures should be blocked early."""
    bogus_excel = tmp_path / "malicious.xlsx"
    bogus_excel.write_text("not really an excel file", encoding="utf-8")

    assert app_module.is_valid_excel_file(str(bogus_excel)) is False


def test_rejects_empty_file(tmp_path):
    """Empty uploads fail validation."""
    empty_excel = tmp_path / "empty.xlsx"
    empty_excel.touch()

    assert app_module.is_valid_excel_file(str(empty_excel)) is False


def test_rejects_oversized_file(tmp_path, monkeypatch):
    """Respect the MAX_CONTENT_LENGTH guardrail for large uploads."""
    oversized_limit = 10
    monkeypatch.setattr(app_module, "MAX_CONTENT_LENGTH", oversized_limit)
    monkeypatch.setitem(app_module.app.config, "MAX_CONTENT_LENGTH", oversized_limit)

    large_excel = tmp_path / "huge.xlsx"
    large_excel.write_bytes(
        b"PK\x03\x040" * 4
    )  # Valid ZIP header repeated; file > limit

    assert app_module.is_valid_excel_file(str(large_excel)) is False
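The security tests above describe three checks: a recognisable file signature, a non-empty file, and a size limit. The sketch below shows how such a check could look using only the standard library. `looks_like_excel` is a hypothetical name, not the project's `is_valid_excel_file`, and the 16 MB default is taken from the upload hint in the index template rather than from the application code.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"  # .xlsx files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)


def looks_like_excel(path, max_bytes=16 * 1024 * 1024):
    """Hypothetical check: size within limits and a known Excel container signature."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return False  # missing or unreadable file
    if size == 0 or size > max_bytes:
        return False
    with open(path, "rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    return header.startswith(ZIP_MAGIC) or header == OLE2_MAGIC
```

A signature check only confirms the container type; a ZIP archive that is not a workbook would still pass, so the actual parse step has to handle failures as well.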
60
tests/test_upload_download.py
Normal file
@@ -0,0 +1,60 @@
import io
import os
import pandas as pd
import pytest
from src.app import app

@pytest.fixture
def client():
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client

def test_index_page(client):
    response = client.get('/')
    assert response.status_code == 200
    assert b"ML Converter" in response.data or b"Subir Archivo" in response.data

def test_upload_and_download(client):
    # Create a simple Excel file in memory
    df = pd.DataFrame({'words': ['one', 'two', 'three']})
    excel_file = io.BytesIO()
    df.to_excel(excel_file, index=False)
    excel_file.seek(0)

    # Upload the file
    response = client.post('/upload', data={
        'file': (excel_file, 'test.xlsx')
    }, content_type='multipart/form-data', follow_redirects=True)
    assert response.status_code == 200
    assert b"Descargar Archivo Procesado" in response.data or b"Procesamiento Completado" in response.data

def test_download_normalizes_and_confines_filename(client, tmp_path, monkeypatch):
    upload_dir = tmp_path / "uploads"
    upload_dir.mkdir()
    monkeypatch.setitem(app.config, 'UPLOAD_FOLDER', str(upload_dir))

    safe_name = '123_processed_test.xlsx'
    file_path = upload_dir / safe_name
    file_path.write_bytes(b'dummy excel bytes')

    response = client.get(f"/download/..%5C{safe_name}")
    assert response.status_code == 200
    assert b'dummy excel bytes' in response.data
    content_disposition = response.headers.get('Content-Disposition', '')
    assert "attachment;" in content_disposition
    assert "convertido_test.xlsx" in content_disposition

def test_download_rejects_symlink_escape(client, tmp_path, monkeypatch):
    upload_dir = tmp_path / "uploads"
    upload_dir.mkdir()
    outside_file = tmp_path / "outside.txt"
    outside_file.write_text("secret")
    monkeypatch.setitem(app.config, 'UPLOAD_FOLDER', str(upload_dir))

    symlink_path = upload_dir / "escape"
    os.symlink(outside_file, symlink_path)

    response = client.get("/download/escape", follow_redirects=False)
    # Should redirect back to index instead of serving the symlink target
    assert response.status_code == 302
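The two download tests above expect a requested name such as `..\123_processed_test.xlsx` to be reduced to its final component, and a symlink pointing outside the upload folder to be refused. A minimal sketch of that confinement logic, assuming a POSIX filesystem and a hypothetical helper `resolve_inside` (the project's actual download route is not shown in this part of the diff):

```python
import os


def resolve_inside(base_dir, requested_name):
    """Hypothetical helper: real path of requested_name inside base_dir, or None."""
    # Keep only the final path component, treating "\" like "/".
    name = os.path.basename(requested_name.replace("\\", "/"))
    if not name or name in (".", ".."):
        return None
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    # realpath follows symlinks, so a link that points outside base
    # resolves to a path that no longer shares base as its common prefix.
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate if os.path.isfile(candidate) else None
```

Comparing resolved paths, rather than the raw strings, is what makes the symlink case fail closed; a caller would redirect or return an error whenever the helper yields `None`.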