Python script: remove all script and style content from an HTML file

Answer:

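Sure! The Python script below removes every <script> element and the style content defined in the document itself (<style> elements and inline style attributes) from an HTML file. It uses the BeautifulSoup library to parse and modify the HTML.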

First, make sure the BeautifulSoup and lxml libraries are installed. If they are not, install them with:

pip install beautifulsoup4 lxml

Here is the Python script:

from bs4 import BeautifulSoup

def remove_scripts_and_styles(html_content):
    soup = BeautifulSoup(html_content, 'lxml')

    # Remove all <script> tags
    for script in soup.find_all('script'):
        script.decompose()

    # Remove all <style> tags
    for style in soup.find_all('style'):
        style.decompose()

    # Remove inline styles
    for tag in soup.find_all(True):  # True matches all tags
        if 'style' in tag.attrs:
            del tag.attrs['style']

    return str(soup)

def main():
    input_file = 'input.html'
    output_file = 'output.html'

    # Read the input HTML file
    with open(input_file, 'r', encoding='utf-8') as file:
        html_content = file.read()

    # Remove scripts and styles
    cleaned_html = remove_scripts_and_styles(html_content)

    # Write the cleaned HTML to the output file
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(cleaned_html)

    print(f"Cleaned HTML has been saved to {output_file}")

if __name__ == "__main__":
    main()
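
A quick way to see what the function does is to run it on a small in-memory string instead of a file. The sketch below restates the same three removal steps in condensed form; the `parser` argument and the `sample` markup are additions for this illustration, and the default is switched to Python's built-in `html.parser` so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup


def remove_scripts_and_styles(html_content, parser="html.parser"):
    """Condensed restatement of the script's function; `parser` is an assumption added here."""
    soup = BeautifulSoup(html_content, parser)

    # Drop <script> and <style> elements together with their contents
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()

    # Drop the inline style attribute from every remaining element
    for tag in soup.find_all(True):
        tag.attrs.pop("style", None)

    return str(soup)


# Hypothetical sample markup, used only for this demonstration
sample = (
    '<div style="color: red">'
    "<script>alert('hi')</script>"
    "<style>p { margin: 0 }</style>"
    "<p>Hello</p>"
    "</div>"
)

cleaned = remove_scripts_and_styles(sample)
print(cleaned)
```

Only the div and its paragraph should survive. Note that passing `'lxml'` as the parser, as the script does, makes BeautifulSoup wrap a fragment like this in <html> and <body> elements, which is harmless for full HTML files but visible when cleaning snippets.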

Notes

Installing dependencies:
- Install the required libraries with:
```
pip install beautifulsoup4 lxml
```
Script description:
- The script reads an HTML file, removes all <script> tags, <style> tags, and inline style attributes, then saves the cleaned HTML to a new file.
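
The three removals described above leave two related things in place: external stylesheets referenced through <link rel="stylesheet"> and inline event-handler attributes such as onclick. If those also count as script or style content for your use case, one possible extension is sketched below. The function name `strip_external_styles_and_handlers` is hypothetical, and treating every attribute whose name starts with "on" as an event handler is a heuristic assumption, not something the original script does.

```python
from bs4 import BeautifulSoup


def strip_external_styles_and_handlers(html_content, parser="html.parser"):
    """Hypothetical add-on pass: remove stylesheet links and on* attributes."""
    soup = BeautifulSoup(html_content, parser)

    # <link rel="stylesheet"> pulls in external CSS. HTML parsers in
    # BeautifulSoup return rel as a list; handle a plain string as well.
    for link in soup.find_all("link"):
        rel = link.get("rel") or []
        if isinstance(rel, str):
            rel = rel.split()
        if any(value.lower() == "stylesheet" for value in rel):
            link.decompose()

    # Attributes such as onclick or onload carry inline JavaScript.
    # Heuristic: any attribute name beginning with "on" is removed.
    for tag in soup.find_all(True):
        for name in [n for n in tag.attrs if n.lower().startswith("on")]:
            del tag.attrs[name]

    return str(soup)


# Hypothetical sample markup for illustration
sample = '<link rel="stylesheet" href="a.css"><button onclick="go()">Go</button>'
result = strip_external_styles_and_handlers(sample)
print(result)
```

This pass can be run on the output of the main function, or its two loops can be merged into it.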
Usage:
- Name your HTML file input.html and place it in the same directory as the script.
- Run the script:
```
python your_script_name.py
```
- The cleaned HTML is saved to output.html.
Customization:
- To process different files, change the values of the input_file and output_file variables.
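
Instead of editing the variables each time, the file names could be taken from the command line. The following is a minimal sketch using the standard-library `argparse` module; the helper names `clean_file`, `parse_args`, and the positional arguments are assumptions introduced here, and the cleaning function is a condensed copy that uses `html.parser` so the sketch has no lxml dependency.

```python
import argparse
from pathlib import Path

from bs4 import BeautifulSoup


def remove_scripts_and_styles(html_content):
    # Condensed copy of the script's function; swap in 'lxml' to match it
    soup = BeautifulSoup(html_content, "html.parser")
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    for tag in soup.find_all(True):
        tag.attrs.pop("style", None)
    return str(soup)


def clean_file(input_path, output_path):
    """Read input_path, strip scripts and styles, write the result to output_path."""
    html = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(remove_scripts_and_styles(html), encoding="utf-8")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Strip scripts and styles from an HTML file"
    )
    # Both arguments are optional and fall back to the script's original names
    parser.add_argument("input_file", nargs="?", default="input.html")
    parser.add_argument("output_file", nargs="?", default="output.html")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if not Path(args.input_file).is_file():
        print(f"Input file not found: {args.input_file}")
        return 1
    clean_file(args.input_file, args.output_file)
    print(f"Cleaned HTML has been saved to {args.output_file}")
    return 0


if __name__ == "__main__":
    main()
```

With this variant, `python your_script_name.py page.html page_clean.html` would process page.html, while running it with no arguments keeps the original input.html and output.html behavior.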

I hope this script helps! If you have any questions, feel free to ask.

Answered by ChatGPT4o using ChatGPT 4o
