How to Read Both .xls and .xlsx Excel Files in Python

Reading Excel Files in Python

by Sajjad Hussain

In Python data manipulation, handling Excel files is a common task. While libraries like pandas excel at reading modern .xlsx files, older .xls formats might require a different approach. Traditionally, you'd need the latest version of xlrd to handle .xls files. However, this blog presents a solution that lets you read both .xls and .xlsx formats using any version of xlrd.

Why Use xlrd?

Even with many alternative libraries, xlrd remains a reliable workhorse for reading legacy Excel files. Here are its key strengths:

Efficiency

Known for its swift file processing capabilities.

Lightweight

Its compact size makes it ideal for environments with limited resources.

Stability

It's well-established and consistently reliable.

The Secret Weapon: BytesIO

The solution revolves around the clever use of the built-in Python module, BytesIO. This module allows you to create a virtual file-like object in memory, which converts byte data into a stream that xlrd can easily interpret. Here's the revamped code that enables format flexibility:

Explanation

Decode Base64 Data

Decode the base64-encoded data (assuming it's stored in self.file) to retrieve the raw byte content.

Create BytesIO Stream

Create a BytesIO object using the decoded byte content. This transforms it into a stream that xlrd can understand.

Open Workbook

Open the workbook directly from the BytesIO stream using xlrd.open_workbook(file_contents=file_stream.read()).

Read Sheet Data

Proceed with iterating through the desired sheet and extracting cell values, as in the original code.

Benefits and Considerations

Format Independence

Read both .xls and .xlsx files using a single codebase.

No Version Dependency

No need to maintain strict xlrd version requirements.

Data Security

If handling sensitive data, ensure that you account for potential risks in base64 decoding.

Additional Tips

Error Handling

Implement error-handling mechanisms to manage potential issues during file processing.

Performance Optimisation

For very large files, consider alternative libraries or optimise your code for better performance.

Conclusion

By leveraging BytesIO, you've unlocked a powerful approach to reading Excel files in Python, regardless of format or xlrd version. This solution offers flexibility, efficiency, and compatibility, making it a valuable addition to your data processing arsenal.

Need more tips?


Keep up with our blog for useful tips.

Sajjad Hussain 24 September, 2024
Archive
Sign in to leave a comment
How to Restore a PostgreSQL Database Dump