Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

读取大文件耗时很久 #4041

Open
cengxw opened this issue Oct 31, 2024 · 4 comments
Open

读取大文件耗时很久 #4041

cengxw opened this issue Oct 31, 2024 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@cengxw
Copy link

cengxw commented Oct 31, 2024

建议先去看文档

快速开始常见问题

异常代码

EasyExcel.read(file, new ReadListener() {
  @Override
  public void invoke(Object data, AnalysisContext context) {
	  System.out.println(data);
	  throw new ExcelAnalysisStopException();
  }
  
  @Override
  public void doAfterAllAnalysed(AnalysisContext context) {
  
  }
}).sheet().doRead();

异常提示

问题描述

就这段代码,读取一个160MB的xlsx,100W行、48列,耗时40秒。我只想读取第一行然后就结束读取,这耗时让我怀疑我是不是用错方式了

@cengxw cengxw added the help wanted Extra attention is needed label Oct 31, 2024
@cengxw
Copy link
Author

cengxw commented Oct 31, 2024

我用以下方法来读取就很快,2~4秒就能出结果,能说说差异在哪吗?

public static Map<String, String> firstMap(File file) {
        OPCPackage pkg = null;
        InputStream sheet = null;
        final Map<String, String> titleMap = new HashMap<>();
        try {
            pkg = OPCPackage.open(file, PackageAccess.READ);
            XSSFReader reader = new XSSFReader(pkg);
            sheet = reader.getSheet("rId1");
            XMLReader parser = XMLReaderFactory.createXMLReader();
            ReadOnlySharedStringsTable rosst = new ReadOnlySharedStringsTable(pkg);
            parser.setContentHandler(new XSSFSheetXMLHandler(reader.getStylesTable(),rosst,new XSSFSheetXMLHandler.SheetContentsHandler() {

                @Override
                public void cell(String index, String value, XSSFComment comment) {
                    titleMap.put(index, value);
                }

                @Override
                public void startRow(int arg0) {
                }

                @Override
                public void endRow(int arg0) {
                    throw new ArithmeticException();
                }

            },
            false));
            try {
                parser.parse(new InputSource(sheet));
            } catch(ArithmeticException e) {
                
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                sheet.close();
            } catch (IOException e) {

            }
            try {
                pkg.close();
            } catch (IOException e) {

            }
            return titleMap;
        }
    }

@wyctxwd1
Copy link

wyctxwd1 commented Nov 4, 2024

因为他源码里在读取的时候在一开始就解析了完整的sharestring,你的代码是自己处理了sharestring

@wangguanquan
Copy link

可以尝试换成EEC来处理这类问题, https://github.com/wangguanquan/eec
它支持filter, map, collect等标准的流式处理,可以像操作集合类一样读取Excel

try (ExcelReader reader = ExcelReader.read(Paths.get("/export/excel-1000k.xlsx"))) {
    // 指定第1行为表头并获取表头
    HeaderRow row = (HeaderRow) reader.sheet(0).header(1).getHeader();
    System.out.println(row);

    // 获取第100行的数据
    Row row100 = reader.sheet(0).rows().filter(r -> r.getRowNum() == 100).findAny().get();
    System.out.println(row100);

    // 将行数据转换为指定对象
    Item item = row100.to(Item.class);
}

@cengxw
Copy link
Author

cengxw commented Nov 6, 2024

感谢各位,我去研究研究

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants