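Preface

Need to convert a file format? Turn CSV files into a dictionary?
Go hunting for a dedicated tool?
It is easier to let an AI handle it directly.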

Concatenate CSVs into Dict

July 8, 2024
https://chatgpt.com/share/bc99c46b-d1a8-48ea-8d46-411182f415d6

🧑‍🦰User

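Find all the CSV files in the "data/" directory and merge them into a single dictionary. The files have headers; make sure all the headers are consistent, and remove the headers.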

🤖 Assistant

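Sorry, there may be a misunderstanding: I cannot access your file system directly. However, I can show you how to do this in Python.

To merge multiple CSV files from the "data/" directory into one dictionary, you can use the os module to list the files in the directory and pandas to read and process them. The steps are as follows:

First, make sure pandas is installed:

pip install pandas

Then use the following Python code to merge all the CSV files in the "data/" directory into one dictionary: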

import os
import pandas as pd

path = 'data/'
csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]

data_dict = {}

for file in csv_files:
    file_path = os.path.join(path, file)
    
    # Assuming all CSVs have a header
    df = pd.read_csv(file_path)
    
    # Check if data_dict is empty; if yes, add the CSV's header as key
    if not data_dict:
        data_dict = {column: [] for column in df.columns}
    
    # Check if headers match
    if set(df.columns) != set(data_dict.keys()):
        raise ValueError(f"Headers in {file} don't match with the previous files.")
    
    # Append the data from the current CSV to the data_dict
    for column in df.columns:
        data_dict[column].extend(df[column].tolist())

print(data_dict)

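If the headers of the CSV files do not match, the code raises a ValueError. If everything runs normally, data_dict will contain the merged data from all the CSV files, keyed by column name.

Note: adapt the code to handle the problems that may arise in your situation, such as missing files or inconsistent headers.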
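
The script above compares headers as sets, so it accepts files whose columns appear in a different order, and os.listdir does not guarantee a file order. As a minimal sketch of a stricter variant that uses only the standard library, the function below (the name merge_csvs_with_headers is hypothetical, not part of the original reply) compares headers in order and processes files in sorted order. It assumes every data row has as many fields as the header, and it keeps all values as strings rather than inferring types the way pandas does.

```python
import csv
from pathlib import Path


def merge_csvs_with_headers(directory):
    """Merge every *.csv file in `directory` into {column name: list of values}."""
    merged = {}
    header = None
    # sorted() makes the resulting row order deterministic across runs
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                # The first non-empty file defines the expected header
                header = file_header
                merged = {name: [] for name in header}
            elif file_header != header:
                # List comparison: names AND order must match
                raise ValueError(f"Header mismatch in {csv_path.name}")
            for row in reader:
                # Assumption: each row has one field per header column
                for name, value in zip(header, row):
                    merged[name].append(value)
    return merged
```

Calling merge_csvs_with_headers("data/") would return the same kind of dictionary as data_dict above, with each header consumed as a key and not stored as a data row.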

🧑‍🦰User

Oh, it turns out these files have no column headers

🤖 Assistant

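If your CSV files have no column headers, you can read each file line by line and append the contents to a list. Because there are no headers to compare, I will assume the columns appear in the same order in every file.

Here is a simple Python script that merges the contents of all the CSV files in the "data/" directory into a list of lists: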

import os

path = 'data/'
csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]

all_data = []

for file in csv_files:
    file_path = os.path.join(path, file)
    
    with open(file_path, 'r') as f:
        # Read the file line by line
        for line in f:
            # Assuming the data is comma-separated
            # You can strip off any whitespace or newline characters and split by comma
            row = line.strip().split(',')
            all_data.append(row)

print(all_data)

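This script produces a list of lists (all_data), in which each sublist represents one row of a CSV file. If the data in your files is not comma-separated, change the argument passed to split() accordingly.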

🧑‍🦰User

Don't just use `strip` and `split`; use the `csv` module

🤖 Assistant

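Certainly. Using Python's csv module to process CSV files is more robust, especially when the data may contain commas or newlines inside a field. Here is how to read and merge the files with the csv module: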

import os
import csv

path = 'data/'
csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]

all_data = []

for file in csv_files:
    file_path = os.path.join(path, file)
    
    # newline='' lets the csv module handle line endings inside quoted fields
    with open(file_path, 'r', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            all_data.append(row)

print(all_data)
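
To make the difference concrete, here is a small sketch that parses the same text both ways. The sample record is invented for illustration; it has a comma and a newline inside quoted fields, which is where the strip/split approach breaks.

```python
import csv
import io

# One logical record with three fields; two of them are quoted because
# they contain a comma or a newline (made-up sample data).
sample = '1,"Smith, Jane","line one\nline two"\n'

# Naive approach: split the text into lines, then split each line on commas
naive_rows = [line.strip().split(",") for line in sample.splitlines()]

# csv module: respects the quoting rules, so quoted commas and newlines survive
csv_rows = list(csv.reader(io.StringIO(sample)))

print(naive_rows)
# [['1', '"Smith', ' Jane"', '"line one'], ['line two"']]
print(csv_rows)
# [['1', 'Smith, Jane', 'line one\nline two']]
```

The naive version turns one record into two malformed rows and leaves stray quote characters in the values, while csv.reader returns a single row with three clean fields.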