Faster model inference pipelines with improved binary file data source and scalar iterator pandas UDF (Public Preview)

Machine learning tasks, especially in the image and video domain, often have to operate on a large number of files. In Databricks Runtime 5.4, we already made available the binary file data source to help ETL arbitrary files, such as images, into Spark tables. In Databricks Runtime 5.5, we have added an option, recursiveFileLookup, to load files recursively from nested input directories. The binary file data source enables you to run model inference tasks in parallel from Spark tables using a scalar pandas UDF. However, you might have to initialize the model for every record batch, which introduces overhead. In Databricks Runtime 5.5, we backported a new pandas UDF type called “scalar iterator” from Apache Spark master, which lets you initialize the model once and reuse it across all record batches in a partition.
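As a minimal sketch of the scalar iterator pattern: the UDF body receives an iterator of pandas Series (one per record batch) and yields one result Series per batch, so expensive setup such as model loading runs once rather than per batch. The `load_model` function and the length-based "model" below are hypothetical stand-ins for a real model; the Spark registration and `binaryFile` read shown in the trailing comments assume Databricks Runtime 5.5 / Spark's `PandasUDFType.SCALAR_ITER` API.

```python
from typing import Iterator

import pandas as pd

def load_model():
    # Hypothetical stand-in for loading a real (expensive) model,
    # e.g. a TensorFlow or PyTorch checkpoint. Here the "model" just
    # returns the byte length of each binary payload.
    return lambda batch: batch.map(len)

def predict_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Scalar iterator pandas UDF pattern: the model is initialized once,
    # before the first batch, instead of once per record batch.
    model = load_model()
    for batch in batches:
        yield model(batch)

# On Databricks Runtime 5.5 this function would be registered roughly as:
#   predict = pandas_udf(predict_batches, "long", PandasUDFType.SCALAR_ITER)
# and applied to the `content` column of a binary file DataFrame:
#   df = (spark.read.format("binaryFile")
#         .option("recursiveFileLookup", "true")
#         .load("/path/to/images"))
#   df.select(predict(df.content))
```

The same iterator shape works without Spark, which makes the initialization-once behavior easy to verify locally before wiring it into a cluster job.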